Parav Pandit [Thu, 2 May 2019 07:48:04 +0000 (10:48 +0300)]
RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev
To access the netdevice of the GID attribute, use an existing API
rdma_read_gid_attr_ndev_rcu().
This further reduces dependency on open access to netdevice of GID
attribute.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Thu, 2 May 2019 07:48:03 +0000 (10:48 +0300)]
RDMA: Introduce and use GID attr helper to read RoCE L2 fields
Instead of RoCE drivers figuring out vlan, smac fields while working on
QP/AH, provide a helper routine to read the L2 fields such as vlan_id and
source mac address.
This moves logic from mlx5 driver to core for wider usage for RoCE ports.
This is a preparation patch to allow detaching netdev in subsequent patch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Thu, 2 May 2019 07:48:02 +0000 (10:48 +0300)]
IB/cm: Reduce dependency on gid attribute ndev check
GID type to path record type conversion can be done directly based on port
type and gid attribute type. There is no need to find out using indirect
way by its GID attribute's ndev field.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Thu, 2 May 2019 07:48:01 +0000 (10:48 +0300)]
RDMA/rxe: Consider skb reserve space based on netdev of GID
Always consider the skb reserve space based on netdevice of the GID
attribute, regardless of vlan or non vlan netdevice.
Fixes:
43c9fc509fa5 ("rdma_rxe: make rxe work over 802.1q VLAN devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kamal Heib [Mon, 29 Apr 2019 11:59:06 +0000 (14:59 +0300)]
RDMA: Get rid of iw_cm_verbs
Integrate iw_cm_verbs data members into ib_device_ops and ib_device
structs, this is done to achieve the following:
1) Avoid memory related bugs durring error unwind
2) Make the code more cleaner
3) Reduce code duplication
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Wed, 1 May 2019 05:46:55 +0000 (08:46 +0300)]
RDMA/core: Do not invoke init_port on compat devices
The driver interface cannot manipulate the sysfs of the compat device,
only of the full device so we must avoid calling the driver sysfs APIs on
compat devices.
This prevents an oops:
Call Trace:
dump_stack+0x5a/0x73
kobject_init+0x74/0x80
kobject_init_and_add+0x35/0xb0
hfi1_create_port_files+0x6e/0x3c0 [hfi1]
ib_setup_port_attrs+0x43b/0x560 [ib_core]
add_one_compat_dev+0x16a/0x230 [ib_core]
rdma_dev_init_net+0x110/0x160 [ib_core]
ops_init+0x38/0xf0
setup_net+0xcf/0x1e0
copy_net_ns+0xb7/0x130
create_new_namespaces+0x11a/0x1b0
unshare_nsproxy_namespaces+0x55/0xa0
ksys_unshare+0x1a7/0x340
__x64_sys_unshare+0xe/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
5417783eabb2 ("RDMA/core: Support core port attributes in non init_net")
Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Artemy Kovalyov [Wed, 1 May 2019 05:39:48 +0000 (08:39 +0300)]
IB/core: Set qp->real_qp before it may be accessed
real_qp should be initialized before ib_destroy_qp() is called.
ib_destroy_qp() may be called in the error flow if ib_create_qp_security()
failed.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jack Morgenstein [Wed, 1 May 2019 05:38:30 +0000 (08:38 +0300)]
IB/mlx5: Add missing XRC options to QP optional params mask
The QP transition optional parameters for the various transition for XRC
QPs are identical to those for RC QPs.
Many of the XRC QP transition optional parameter bits are missing from the
QP optional mask table. These omissions caused failures when doing XRC QP
state transitions.
For example, when trying to change the response timer of an XRC receive QP
via the RTS2RTS transition, the new timer value was ignored because
MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask
for XRC qps for the RTS2RTS transition.
Fix this by adding the missing XRC optional parameters for all QP
transitions to the opt_mask table.
Fixes:
e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Fixes:
a4774e9095de ("IB/mlx5: Fix opt param mask according to firmware spec")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Tue, 30 Apr 2019 14:23:21 +0000 (17:23 +0300)]
RDMA/uverbs: Initialize uverbs_attr_bundle ucontext in ib_uverbs_get_context
ib_uverbs_get_context does not have a uobject so it does not call the
rdma_lookup_get_uobject which is used to set up the uverbs_attr_bundle
ucontext. For ib_uverbs_get_context we need to set up this manually before
we send the uverbs_attr_bundle down to the driver layer.
This completes the change that was done in commit
70f06b26f07e ("IB:
ucontext should be set properly for all cmd & ioctl paths")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Gal Pressman [Tue, 30 Apr 2019 08:46:39 +0000 (11:46 +0300)]
RDMA/uverbs: Initialize udata struct on destroy flows
Cited commit introduced the udata parameter to different destroy flows
but the uapi method definition does not have udata (i.e has_udata flag
is not set). As a result, an uninitialized udata struct is being passed
down to the driver callbacks.
Fix that by clearing the driver udata even in cases where has_udata flag
is not set.
Fixes:
c4367a26357b ("IB: Pass uverbs_attr_bundle down ib_x destroy path")
Cc: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Co-developed-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Mon, 29 Apr 2019 21:32:04 +0000 (16:32 -0500)]
RDMA/umem: Handle page combining avoidance correctly in ib_umem_add_sg_table()
The flag update_cur_sg tracks whether contiguous pages from a new set of
page_list pages can be merged into the SGE passed into
ib_umem_add_sg_table(). If this flag is true, but the total segment length
exceeds the max_seg_size supported by HW, we avoid combining to this SGE
and move to a new SGE (x) and merge 'len' pages to it. However, if i <
npages, the next iteration can incorrectly merge 'len' contiguous pages
into x instead of into a new SGE since update_cur_sg is still true.
Reset update_cur_sg to false always after the check to merge pages into
the first SGE passed in to ib_umem_add_sg_table(). Also, prevent a new
SGE's segment length from ever exceeding HW max_seg_sz.
There is a crash on hfi1 as result of this where-in max_seg_sz is
defaulting to 64K. Due to above bug, unfolding SGE's in __ib_umem_release
points to a bad page ptr.
TEST comp-wfr.perfnative.STL-22166-WDT _ perftest native 2-Write_4097QP_4MB STARTING at
1555387093
BUG: Bad page state in process ib_write_bw pfn:7ebca0
page:
ffffcd675faf2800 count:0 mapcount:1 mapping:
0000000000000000 index:0x1
flags: 0x17ffffc0000000()
raw:
0017ffffc0000000 dead000000000100 dead000000000200 0000000000000000
raw:
0000000000000001 0000000000000000 0000000000000000 0000000000000000
page dumped because: nonzero mapcount
CPU: 18 PID: 15853 Comm: ib_write_bw Tainted: G B 5.1.0-rc4 #1
Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.
121820151719 12/18/2015
Call Trace:
dump_stack+0x5a/0x73
bad_page+0xf5/0x10f
free_pcppages_bulk+0x62c/0x680
free_unref_page+0x54/0x70
__ib_umem_release+0x148/0x1a0 [ib_uverbs]
ib_umem_release+0x22/0x80 [ib_uverbs]
rvt_dereg_mr+0x67/0xb0 [rdmavt]
ib_dereg_mr_user+0x37/0x60 [ib_core]
destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
uobj_destroy+0x4d/0x60 [ib_uverbs]
__uobj_get_destroy+0x33/0x50 [ib_uverbs]
__uobj_perform_destroy+0xa/0x30 [ib_uverbs]
ib_uverbs_dereg_mr+0x66/0x90 [ib_uverbs]
ib_uverbs_write+0x3e1/0x500 [ib_uverbs]
vfs_write+0xad/0x1b0
ksys_write+0x5a/0xd0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
d10bcf947a3e ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Gal Pressman [Wed, 1 May 2019 10:48:13 +0000 (13:48 +0300)]
RDMA/core: Introduce RDMA subsystem ibdev_* print functions
Similarly to dev/netdev/etc printk helpers, add standard printk helpers
for the RDMA subsystem.
Example output:
efa 0000:00:06.0 efa_0: Hello World!
efa_0: Hello World! (no parent device set)
(NULL ib_device): Hello World! (ibdev is NULL)
Cc: Jason Baron <jbaron@akamai.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:45 +0000 (16:20 -0800)]
uverbs: Convert idr to XArray
The word 'idr' is scattered throughout the API, so I haven't changed it,
but the 'idr' variable is now an XArray.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:07 +0000 (16:21 -0800)]
ib/bnxt: Remove mention of idr_alloc from comment
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Thu, 25 Apr 2019 13:26:02 +0000 (10:26 -0300)]
Merge branch 'mlx5_tir_icm' into rdma.git for-next
Ariel Levkovich says:
====================
The series exposes the ICM address of the receive transport
interface (TIR) of Raw Packet and RSS QPs to the user since they are
required to properly create and insert steering rules that direct flows to
these QPs.
====================
For dependencies this branch is based on mlx5-next from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* branch 'mlx5_tir_icm':
IB/mlx5: Expose TIR ICM address to user space
net/mlx5: Introduce new TIR creation core API
net/mlx5: Expose TIR ICM address in command outbox
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ariel Levkovich [Sun, 31 Mar 2019 16:44:50 +0000 (19:44 +0300)]
IB/mlx5: Expose TIR ICM address to user space
This patch exposes the TIR ICM address of raw packet and RSS
QPs to user space.
In order to pass the new field, the patch extends the mlx5
specific QP creation response structure and fills it with
the icm address returned by the FW command, if available.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ariel Levkovich [Sun, 31 Mar 2019 16:44:49 +0000 (19:44 +0300)]
net/mlx5: Introduce new TIR creation core API
Introducing new TIR creation core API which allows caller
to receive back from the call the full command outbox.
This comes as a preparation for the next patch that will
retrieve the TIR ICM address from the command outbox.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ariel Levkovich [Sun, 31 Mar 2019 16:44:48 +0000 (19:44 +0300)]
net/mlx5: Expose TIR ICM address in command outbox
Adding the TIR ICM address to the create_tir command outbox
through which the device reports the ICM address of the newly
created TIR.
The TIR address can be used for direct attachment to a steering
rule in SW managed steering mode.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ariel Levkovich [Sun, 31 Mar 2019 16:44:43 +0000 (19:44 +0300)]
net/mlx5: Expose SW ICM related device memory capabilities
Add SW ICM related fields to the device memory capabilities
structure and sw ownership capability in flow table properties.
The currently supported SW ICM types are steering and header modify
and the changes exposes the device memory capabilities for each
of these two types.
SW ICM memory can be allocated by SW and then be accessed by RDMA
operations for direct management of the HW packet handling tables.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jason Gunthorpe [Wed, 24 Apr 2019 19:20:34 +0000 (16:20 -0300)]
Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:
====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
* BAR pages intended for read-only can be switched to writable via mprotect.
* Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
* Disassociate causes SIGBUS when touching the pages.
* CPU pages are being mapped through to the process via remap_pfn_range
instead of the more appropriate vm_insert_page, causing weird behaviors
during disassociation.
This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================
For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
* branch 'rdma_mmap':
RDMA: Remove rdma_user_mmap_page
RDMA/mlx5: Use get_zeroed_page() for clock_info
RDMA/ucontext: Fix regression with disassociate
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
RDMA/mlx5: Do not allow the user to write to the clock page
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:30 +0000 (14:07 +0300)]
RDMA: Remove rdma_user_mmap_page
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:29 +0000 (14:07 +0300)]
RDMA/mlx5: Use get_zeroed_page() for clock_info
get_zeroed_page() returns a virtual address for the page which is better
than allocating a struct page and doing a permanent kmap on it.
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:28 +0000 (14:07 +0300)]
RDMA/ucontext: Fix regression with disassociate
When this code was consolidated the intention was that the VMA would
become backed by anonymous zero pages after the zap_vma_pte - however this
very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
bits to transform the VMA into an anonymous VMA. Since the vm_ops was
removed this broke.
Now userspace gets a SIGBUS if it touches the vma after disassociation.
Instead of converting the VMA to anonymous provide a fault handler that
puts a zero'd page into the VMA when user-space touches it after
disassociation.
Cc: stable@vger.kernel.org
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Fixes:
5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:26 +0000 (14:07 +0300)]
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.
Cc: stable@vger.kernel.org
Fixes:
5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:25 +0000 (14:07 +0300)]
RDMA/mlx5: Do not allow the user to write to the clock page
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.
Cc: stable@vger.kernel.org
Fixes:
5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
John Fleck [Thu, 11 Apr 2019 14:17:21 +0000 (07:17 -0700)]
IB/hfi1: Remove reference to RHF.VCRCErr
The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Thu, 11 Apr 2019 14:17:10 +0000 (07:17 -0700)]
IB/hfi1: Add selected Rcv counters
These counters are required for error analysis and debug.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Fri, 12 Apr 2019 13:41:42 +0000 (06:41 -0700)]
IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts
The reference count adjustments on reference count completion
are open coded throughout.
Add a routine to do all reference count adjustments and use.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Thu, 11 Apr 2019 14:16:23 +0000 (07:16 -0700)]
IB/rdmavt: Use more efficient allowed_ops
QP creation already records the allowed_ops.
Take advantage of that single field to replace multiple qp_type
specific tests.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Thu, 11 Apr 2019 14:16:11 +0000 (07:16 -0700)]
IB/rdmavt: Fix ab/ba include issues
The currently include file ordering for rdmavt headers has an
ab/ba include issue the precludes using inlines from rdma_vt.h
in rdmavt_qp.h.
At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h.
Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h
and move qp related inlines to rdmavt_qp.h.
Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared
by rdmavt_cq.h and rdmavt_qp.h.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Thu, 11 Apr 2019 14:16:00 +0000 (07:16 -0700)]
IB/hfi1: Make opfn.h self sufficient
The opfn.h include file build-ablility depends on the including file
having the correct includes.
Fix by making opfn.h self sufficient.
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Thu, 11 Apr 2019 14:15:49 +0000 (07:15 -0700)]
IB/{rdmavt, hfi1): Miscellaneous comment fixes
This patch fixes miscellaneous comment errors.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Josh Collier [Thu, 11 Apr 2019 14:07:42 +0000 (07:07 -0700)]
IB/hfi1: Add debugfs to control expansion ROM write protect
Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple
handles to PCI resource0. In order to continue to support expansion ROM
updates while the driver is loaded, the driver must now provide an
interface to control the expansion ROM write protection.
This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom
user tool to disable the expansion ROM write protection by opening the
file and writing a '1'. The write protection is released when writing a
'0' or automatically re-enabled when the file handle is closed. The
current implementation will only allow one handle to be opened at a time
across all hfi1 devices.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Thu, 4 Apr 2019 06:56:38 +0000 (09:56 +0300)]
RDMA/hns: Remove asynchronic QP destroy
Verbs destroy callbacks are synchronous operations and can't be delayed.
The expectation is that after driver returned from destroy function, the
memory can be freed and user won't be able to access it again.
Ditch workqueue implementation used in HNS driver.
Fixes:
d838c481e025 ("IB/hns: Fix the bug when destroy qp")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: oulijun <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Wed, 10 Apr 2019 08:23:04 +0000 (11:23 +0300)]
RDMA/cma: Consider scope_id while binding to ipv6 ll address
When two netdev have same link local addresses (such as vlan and non
vlan), two rdma cm listen id should be able to bind to following different
addresses.
listener-1: addr=lla, scope_id=A, port=X
listener-2: addr=lla, scope_id=B, port=X
However while comparing the addresses only addr and port are considered,
due to which 2nd listener fails to listen.
In below example of two listeners, 2nd listener is failing with address in
use error.
$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545&
$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545
rdma_bind_addr: Address already in use
To overcome this, consider the scope_ids as well which forms the accurate
IPv6 link local address.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Wed, 10 Apr 2019 08:23:03 +0000 (11:23 +0300)]
IB/core: Allow vlan link local address based RoCE GIDs
IPv6 link local address for a VLAN netdevice has nothing to do with its
resemblance with the default GID, because VLAN link local GID is in
different layer 2 domain.
Now that RoCE MAD packet processing and route resolution consider the
right GID index, there is no need for an unnecessary check which prevents
the addition of vlan based IPv6 link local GIDs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Saeed Mahameed [Mon, 22 Apr 2019 22:25:39 +0000 (15:25 -0700)]
Merge tag 'v5.1-rc1' of git://git./linux/kernel/git/torvalds/linux into mlx5-next
Linux 5.1-rc1
We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:27 +0000 (15:46 +0200)]
RDMA/mlx5: Don't create IB representors when in multiport RoCE mode
Switchdev mode and mutiport RoCE mode aren't compatible at this point.
Don't create IB reps when a user switches to switchdev mode and the driver
operates in that mode.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:26 +0000 (15:46 +0200)]
RDMA/mlx5: Initialize roce port info before multiport master init
When working in mutliport RoCE mode it is possible to attach a slave
before the master. In that case the slave is waiting for a master to be
attached. When the master is attached it goes over the list of waiting
slaves, finds a slave that is compatible and tries to bind it to itself.
The call stack is:
mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port()
In the bind function we will create a netdev notifier, but this is done
before we initialize the RoCE structure (this is done at a later stage by
the master in the ROCE stage).
Once events are delivered to that notifier we will use
mlx5_ib_get_native_port_mdev() to get the actual port and as the native
port is zero we will access an invalid index in the port structure.
Move the RoCE structure initialization to an earlier stage.
Fixes:
32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:25 +0000 (15:46 +0200)]
RDMA/mlx5: Allow DEVX and raw creation flow on reps
Remove the limitations that were in place and provide support for DEVX and
raw flow creation on reps.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Maor Gottlieb [Thu, 28 Mar 2019 13:46:24 +0000 (15:46 +0200)]
RDMA/mlx5: Add query e-switch vport context to devx white list
Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed
only if HCA_CAP.eswitch_manager==1.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:23 +0000 (15:46 +0200)]
RDMA/mlx5: Allow inserting a steering rule to the FDB
Allow this only via mlx5 raw create flow API, legacy verbs are not
supported. To accommodate that, we add a new attribute to matcher creation
to indicate the type of flow table to be used.
MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE
With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer
needed, we keep it for compatibility but at most only a single attribute can
be passed of the two.
When inserting a flow rule to the FDB we require that a DEVX FT is
provided as a destination, no other configuration is allowed.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:22 +0000 (15:46 +0200)]
RDMA/mlx5: Create flow table with max size supported
Instead of failing the request, just use the supported number of flow
entries.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Thu, 28 Mar 2019 13:46:21 +0000 (15:46 +0200)]
RDMA/mlx5: Access the prio bypass inside the FDB flow table namespace
Now that we have a specific prio inside the FDB namespace allow retrieving
it from the RDMA side.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Mon, 15 Apr 2019 10:22:51 +0000 (13:22 +0300)]
RDMA/core: Add a netlink command to change net namespace of rdma device
Provide an option to change the net namespace of a rdma device through a
netlink command. When multiple rdma devices exists in a system, and when
containers are used, this will limit rdma device visibility to a specified
net namespace.
An example command to change net namespace of mlx5_1 device to the
previously created net namespace 'foo' is:
$ ip netns add foo
$ rdma dev set mlx5_1 netns foo
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Mon, 15 Apr 2019 10:22:50 +0000 (13:22 +0300)]
RDMA/core: Introduce a helper function to change net namespace of rdma device
Introduce a helper function that changes rdma device's net namespace which
performs mini disable/enable sequence to have device visible only in
assigned net namespace.
Device unregistration, device rename and device change net namespace
may be invoked concurrently.
(a) device unregistration needs to wait if a device change (rename or net
namespace change) operation is in progress.
(b) device net namespace change should not proceed if the unregistration
has started.
(c) while one cpu is changing device net namespace, other cpu should not
be able to rename or change net namespace.
To address above concurrency,
(a) Use unreg_mutex to synchronize between ib_unregister_device() and net
namespace change operation
(b) In cases where unregister_device() has started unregistration before
change_netns got chance to acquire unreg_mutex, validate the refcount
- if it dropped to zero, abort the net namespace change operation.
Finally use the helper function to change net namespace of ib device to
move the device back to init_net when such net is deleted.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Mon, 15 Apr 2019 10:22:49 +0000 (13:22 +0300)]
RDMA/core: Avoid freeing netdevs in disable_device()
So we can use the disable_device() helper while changing the net namespace
of the rdma device in a subsequent patch, move free_netdevs() out of it.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Chengguang Xu [Sat, 20 Apr 2019 07:55:12 +0000 (15:55 +0800)]
infiniband/qib: Fix typo in comment
Fix typo 'faspath' -> 'pastpath'.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Tue, 16 Apr 2019 14:38:04 +0000 (15:38 +0100)]
RDMA/cxgb4: Fix spelling mistake "immedate" -> "immediate"
There is a spelling mistake in a module parameter description. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Guy Levi [Wed, 10 Apr 2019 07:59:45 +0000 (10:59 +0300)]
IB/mlx5: Fix scatter to CQE in DCT QP creation
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.
The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.
A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.
Fixes:
5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Sat, 13 Apr 2019 16:00:26 +0000 (17:00 +0100)]
RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.
Addresses-Coverity: ("Dereference null return")
Fixes:
b38a0ad8ec11 ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Josh Collier [Mon, 15 Apr 2019 18:34:22 +0000 (11:34 -0700)]
IB/rdmavt: Fix frwr memory registration
Current implementation was not properly handling frwr memory
registrations. This was uncovered by commit 27f26cec761das ("xprtrdma:
Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is
used for NFS over RDMA, started failing as it was the first ULP to modify
the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR
when attempting to perform RDMA Writes to the client.
The fix is to properly capture the true iova, offset, and length in the
call to ib_map_mr_sg, and then update the iova when processing the
IB_WR_REG_MEM on the send queue.
Fixes:
a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linus Torvalds [Sun, 14 Apr 2019 22:17:41 +0000 (15:17 -0700)]
Linux 5.1-rc5
Linus Torvalds [Sun, 14 Apr 2019 22:09:40 +0000 (15:09 -0700)]
Merge branch 'page-refs' (page ref overflow)
Merge page ref overflow branch.
Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).
Admittedly it's not exactly easy. To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers. Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).
Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication. So let's just do that.
* branch page-refs:
fs: prevent page refcount overflow in pipe_buf_get
mm: prevent get_user_pages() from overflowing page refcount
mm: add 'try_get_page()' helper function
mm: make page ref count overflow check tighter and more explicit
Matthew Wilcox [Fri, 5 Apr 2019 21:02:10 +0000 (14:02 -0700)]
fs: prevent page refcount overflow in pipe_buf_get
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount. All
callers converted to handle a failure.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Apr 2019 17:49:19 +0000 (10:49 -0700)]
mm: prevent get_user_pages() from overflowing page refcount
If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Apr 2019 17:14:59 +0000 (10:14 -0700)]
mm: add 'try_get_page()' helper function
This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe". It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).
Also like 'get_page()', you can't use this function unless you already
had a reference to the page. The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.
The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.
NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Apr 2019 17:06:20 +0000 (10:06 -0700)]
mm: make page ref count overflow check tighter and more explicit
We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.
That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).
Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 13 Apr 2019 23:23:16 +0000 (16:23 -0700)]
Merge tag 'for-linus-
20190412' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Set of fixes that should go into this round. This pull is larger than
I'd like at this time, but there's really no specific reason for that.
Some are fixes for issues that went into this merge window, others are
not. Anyway, this contains:
- Hardware queue limiting for virtio-blk/scsi (Dongli)
- Multi-page bvec fixes for lightnvm pblk
- Multi-bio dio error fix (Jason)
- Remove the cache hint from the io_uring tool side, since we didn't
move forward with that (me)
- Make io_uring SETUP_SQPOLL root restricted (me)
- Fix leak of page in error handling for pc requests (Jérôme)
- Fix BFQ regression introduced in this merge window (Paolo)
- Fix break logic for bio segment iteration (Ming)
- Fix NVMe cancel request error handling (Ming)
- NVMe pull request with two fixes (Christoph):
- fix the initial CSN for nvme-fc (James)
- handle log page offsets properly in the target (Keith)"
* tag 'for-linus-
20190412' of git://git.kernel.dk/linux-block:
block: fix the return errno for direct IO
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
block: do not leak memory in bio_copy_user_iov()
lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
nvme: cancel request synchronously
blk-mq: introduce blk_mq_complete_request_sync()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
virtio-blk: limit number of hw queues by nr_cpu_ids
block, bfq: fix use after free in bfq_bfqq_expire
io_uring: restrict IORING_SETUP_SQPOLL to root
tools/io_uring: remove IOCQE_FLAG_CACHEHIT
block: don't use for-inside-for in bio_for_each_segment_all
Linus Torvalds [Sat, 13 Apr 2019 21:47:06 +0000 (14:47 -0700)]
Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fix:
- Fix a deadlock in close() due to incorrect draining of RDMA queues
Bugfixes:
- Revert "SUNRPC: Micro-optimise when the task is known not to be
sleeping" as it is causing stack overflows
- Fix a regression where NFSv4 getacl and fs_locations stopped
working
- Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
- Fix xfstests failures due to incorrect copy_file_range() return
values"
* tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
NFSv4.1 fix incorrect return value in copy_file_range
xprtrdma: Fix helper that drains the transport
NFS: Fix handling of reply page vector
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
Linus Torvalds [Sat, 13 Apr 2019 21:37:49 +0000 (14:37 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One obvious fix for a ciostor data corruption on error bug"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
Linus Torvalds [Sat, 13 Apr 2019 21:33:56 +0000 (14:33 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Here's more than a handful of clk driver fixes for changes that came
in during the merge window:
- Fix the AT91 sama5d2 programmable clk prescaler formula
- A bunch of Amlogic meson clk driver fixes for the VPU clks
- A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
clks as critical only when really needed
- Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
implementation
- Use the right structure to test for a frequency table in i.MX's
PLL_1416x driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx: Fix PLL_1416X not rounding rates
clk: mediatek: fix clk-gate flag setting
platform/x86: pmc_atom: Drop __initconst on dmi table
clk: x86: Add system specific quirk to mark clocks as critical
clk: meson: vid-pll-div: remove warning and return 0 on invalid config
clk: meson: pll: fix rounding and setting a rate that matches precisely
clk: meson-g12a: fix VPU clock parents
clk: meson: g12a: fix VPU clock muxes mask
clk: meson-gxbb: round the vdec dividers to closest
clk: at91: fix programmable clock for sama5d2
Linus Torvalds [Sat, 13 Apr 2019 21:29:21 +0000 (14:29 -0700)]
Merge tag 'pci-v5.1-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Add a DMA alias quirk for another Marvell SATA device (Andre
Przywara)
- Fix a pciehp regression that broke safe removal of devices (Sergey
Miroshnichenko)
* tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: pciehp: Ignore Link State Changes after powering off a slot
PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Linus Torvalds [Sat, 13 Apr 2019 16:03:09 +0000 (09:03 -0700)]
Merge tag 'powerpc-5.1-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A minor build fix for 64-bit FLATMEM configs.
A fix for a boot failure on 32-bit powermacs.
My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
64-bit kernels, ie. compat mode, which is only used on big endian.
The rewrite of the SLB code we merged in 4.20 missed the fact that the
0x380 exception is also used with the Radix MMU to report out of range
accesses. This could lead to an oops if userspace tried to read from
addresses outside the user or kernel range.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
Piggin"
* tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
powerpc/64s/radix: Fix radix segment exception handling
powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
powerpc/32: Fix early boot failure with RTAS built-in
Linus Torvalds [Sat, 13 Apr 2019 15:57:00 +0000 (08:57 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main thing is a fix to our FUTEX_WAKE_OP implementation which was
unbelievably broken, but did actually work for the one scenario that
GLIBC used to use.
Summary:
- Fix stack unwinding so we ignore user stacks
- Fix ftrace module PLT trampoline initialisation checks
- Fix terminally broken implementation of FUTEX_WAKE_OP atomics"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
arm64: backtrace: Don't bother trying to unwind the userspace stack
arm64/ftrace: fix inadvertent BUG() in trampoline check
Linus Torvalds [Sat, 13 Apr 2019 03:54:40 +0000 (20:54 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix typos in user-visible resctrl parameters, and also fix assembly
constraint bugs that might result in miscompilation"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use stricter assembly constraints in bitops
x86/resctrl: Fix typos in the mba_sc mount option
Linus Torvalds [Sat, 13 Apr 2019 03:52:28 +0000 (20:52 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix the alarm_timer_remaining() return value"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Return correct remaining time
Linus Torvalds [Sat, 13 Apr 2019 03:50:43 +0000 (20:50 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a NULL pointer dereference crash in certain environments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Linus Torvalds [Sat, 13 Apr 2019 03:42:30 +0000 (20:42 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Six kernel side fixes: three related to NMI handling on AMD systems, a
race fix, a kexec initialization fix and a PEBS sampling fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix perf_event_disable_inatomic() race
x86/perf/amd: Remove need to check "running" bit in NMI handler
x86/perf/amd: Resolve NMI latency issues for active PMCs
x86/perf/amd: Resolve race condition when disabling PMC
perf/x86/intel: Initialize TFA MSR
perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
Linus Torvalds [Sat, 13 Apr 2019 03:31:08 +0000 (20:31 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fixes a crash when accessing /proc/lockdep"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Zap lock classes even with lock debugging disabled
Linus Torvalds [Sat, 13 Apr 2019 03:21:59 +0000 (20:21 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two genirq fixes, plus an irqchip driver error handling fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init()
Linus Torvalds [Sat, 13 Apr 2019 03:13:13 +0000 (20:13 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
bug"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add rewind_stack_do_exit() to the noreturn list
linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
Leonard Crestez [Fri, 12 Apr 2019 14:10:03 +0000 (14:10 +0000)]
clk: imx: Fix PLL_1416X not rounding rates
Code which initializes the "clk_init_data.ops" checks pll->rate_table
before that field is ever assigned to so it always picks
"clk_pll1416x_min_ops".
This breaks dynamic rate rounding for features such as cpufreq.
Fix by checking pll_clk->rate_table instead, here pll_clk refers to
the constant initialization data coming from per-soc clk driver.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Fixes:
8646d4dcc7fb ("clk: imx: Add PLLs driver for imx8mm soc")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Weiyi Lu [Fri, 12 Apr 2019 03:30:27 +0000 (11:30 +0800)]
clk: mediatek: fix clk-gate flag setting
CLK_SET_RATE_PARENT would be dropped.
Merge two flag setting together to correct the error.
Fixes:
5a1cc4c27ad2 ("clk: mediatek: Add flags to mtk_gate")
Cc: <stable@vger.kernel.org>
Signed-off-by: Weiyi Lu <weiyi.lu@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Linus Torvalds [Fri, 12 Apr 2019 15:25:16 +0000 (08:25 -0700)]
Merge tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix a sparc64 sun4v_pci regression introduced in this merged window,
and a dma-debug stracktrace regression from the big refactor last
merge window"
* tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: only skip one stackframe entry
sparc64/pci_sun4v: fix ATU checks for large DMA masks
Linus Torvalds [Fri, 12 Apr 2019 15:21:15 +0000 (08:21 -0700)]
Merge tag 'iommu-fix-v5.1-rc5' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Fix an AMD IOMMU issue where the driver didn't correctly setup the
exclusion range in the hardware registers, resulting in exclusion
ranges being one page too big.
This can cause data corruption of the address of that last page is
used by DMA operations"
* tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Set exclusion range correctly
Linus Torvalds [Fri, 12 Apr 2019 15:18:37 +0000 (08:18 -0700)]
Merge tag 'clang-format-for-linus-v5.1-rc5' of git://github.com/ojeda/linux
Pull clang-format update from Miguel Ojeda:
"The usual roughly-per-release .clang-format macro list update"
* tag 'clang-format-for-linus-v5.1-rc5' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
Linus Torvalds [Fri, 12 Apr 2019 15:16:40 +0000 (08:16 -0700)]
Merge tag 'mmc-v5.1-rc2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- alcor: Stabilize data write requests
- sdhci-omap: Fix command error path during tuning
* tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
mmc: alcor: don't write data before command has completed
Linus Torvalds [Fri, 12 Apr 2019 15:11:59 +0000 (08:11 -0700)]
Merge tag 'sound-5.1-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Well, this one became unpleasantly larger than previous pull requests,
but it's a kind of usual pattern: now it contains a collection of ASoC
fixes, and nothing to worry too much.
The fixes for ASoC core (DAPM, DPCM, topology) are all small and just
covering corner cases. The rest changes are driver-specific, many of
which are for x86 platforms and new drivers like STM32, in addition to
the usual fixups for HD-audio"
* tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (66 commits)
ASoC: wcd9335: Fix missing regmap requirement
ALSA: hda: Fix racy display power access
ASoC: pcm: fix error handling when try_module_get() fails.
ASoC: stm32: sai: fix master clock management
ASoC: Intel: kbl: fix wrong number of channels
ALSA: hda - Add two more machines to the power_save_blacklist
ASoC: pcm: update module refcount if module_get_upon_open is set
ASoC: core: conditionally increase module refcount on component open
ASoC: stm32: fix sai driver name initialisation
ASoC: topology: Use the correct dobj to free enum control values and texts
ALSA: seq: Fix OOB-reads from strlcpy
ASoC: intel: skylake: add remove() callback for component driver
ASoC: cs35l35: Disable regulators on driver removal
ALSA: xen-front: Do not use stream buffer size before it is set
ASoC: rockchip: pdm: change dma burst to 8
ASoC: rockchip: pdm: fix regmap_ops hang issue
ASoC: simple-card: don't select DPCM via simple-audio-card
ASoC: audio-graph-card: don't select DPCM via audio-graph-card
ASoC: tlv320aic32x4: Change author's name
ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
...
Linus Torvalds [Fri, 12 Apr 2019 15:07:46 +0000 (08:07 -0700)]
Merge tag 'acpi-5.1-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA issue introduced during the 4.20 development cycle and
causing some systems to crash because of leftover operation region
data still maintained after the operation region in question has gone
away (Erik Schmauss)"
* tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: remove address node from global list after method termination
Linus Torvalds [Fri, 12 Apr 2019 15:04:01 +0000 (08:04 -0700)]
Merge tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fixes across the driver spectrum this week, the mediatek fbdev support
might be a bit late for this round, but I looked over it and it's not
very large and seems like a useful feature for them.
Otherwise the main thing is a regression fix for i915 5.0 bug that
caused black screens on a bunch of Dell XPS 15s I think, I know at
least Fedora is waiting for this to land, and the udl fix is also for
a regression since 5.0 where unplugging the device would end badly.
core:
- make atomic hooks optional
i915:
- Revert a 5.0 regression where some eDP panels stopped working
- DSI related fixes for platforms up to IceLake
- GVT (regression fix, warning fix, use-after free fix)
amdgpu:
- Cursor fixes
- missing PCI ID fix for KFD
- XGMI fix
- shadow buffer handling after reset fix
udl:
- fix unplugging device crashes.
mediatek:
- stabilise MT2701 HDMI support
- fbdev support
tegra:
- fix for build regression in rc1.
sun4i:
- Allwinner A6 max freq improvements
- null ptr deref fix
dw-hdmi:
- SCDC configuration improvements
omap:
- CEC clock management policy fix"
* tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm: (32 commits)
gpu: host1x: Fix compile error when IOMMU API is not available
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
drm/i915/dp: revert back to max link rate and lane count on eDP
drm/i915/icl: Fix port disable sequence for mipi-dsi
drm/i915/icl: Ungate ddi clocks before IO enable
drm/mediatek: no change parent rate in round_rate() for MT2701 hdmi phy
drm/mediatek: using new factor for tvdpll for MT2701 hdmi phy
drm/mediatek: remove flag CLK_SET_RATE_PARENT for MT2701 hdmi phy
drm/mediatek: make implementation of recalc_rate() for MT2701 hdmi phy
drm/mediatek: fix the rate and divder of hdmi phy for MT2701
drm/mediatek: fix possible object reference leak
drm/i915: Get power refs in encoder->get_power_domains()
drm/i915: Fix pipe_bpp readout for BXT/GLK DSI
drm/amd/display: Fix negative cursor pos programming (v2)
drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
drm/udl: add a release method and delay modeset teardown
drm/i915/gvt: Prevent use-after-free in ppgtt_free_all_spt()
drm/i915/gvt: Annotate iomem usage
drm/sun4i: DW HDMI: Lower max. supported rate for H6
Revert "Documentation/gpu/meson: Remove link to meson_canvas.c"
...
Colin Ian King [Fri, 12 Apr 2019 10:40:17 +0000 (11:40 +0100)]
RDMA/mlx5: Check for error return in flow_rule rather than err
Currently when the call to create_flow_rule_vport_sq fails, the error
check is being performed on err rather than on the return pointer
flow_rule. The return flow_rule maybe NULL (which is not considered an
error) or an error code, so check for the error on flow_rule.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes:
d5ed8ac34cef ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Will Deacon [Mon, 8 Apr 2019 11:45:09 +0000 (12:45 +0100)]
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
Cc: <stable@kernel.org>
Fixes:
6170a97460db ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Devesh Sharma [Wed, 10 Apr 2019 09:10:07 +0000 (05:10 -0400)]
RDMA/ocrdma: Remove use of idr use pci bdf instead
Removing the use of IDR variable just to name the function ids. Using the
PCI_FUNC(pdev->devfn) instead to create the device name, associated
resources and to print driver into at various places.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Joerg Roedel [Fri, 12 Apr 2019 10:50:31 +0000 (12:50 +0200)]
iommu/amd: Set exclusion range correctly
The exlcusion range limit register needs to contain the
base-address of the last page that is part of the range, as
bits 0-11 of this register are treated as 0xfff by the
hardware for comparisons.
So correctly set the exclusion range in the hardware to the
last page which is _in_ the range.
Fixes:
b2026aa2dce44 ('x86, AMD IOMMU: add functions for programming IOMMU MMIO space')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Miguel Ojeda [Sat, 30 Mar 2019 08:20:16 +0000 (09:20 +0100)]
clang-format: Update with the latest for_each macro list
Re-run the shell fragment that generated the original list now that
there are two dozens of new entries after v5.1's merge window.
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Peter Zijlstra [Thu, 4 Apr 2019 13:03:00 +0000 (15:03 +0200)]
perf/core: Fix perf_event_disable_inatomic() race
Thomas-Mich Richter reported he triggered a WARN()ing from event_function_local()
on his s390. The problem boils down to:
CPU-A CPU-B
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1
irq_work_queue();
sched-out
event_sched_out()
@pending_disable = 0
sched-in
perf_event_overflow()
perf_event_disable_inatomic()
@pending_disable = 1;
irq_work_queue(); // FAILS
irq_work_run()
perf_pending_event()
if (@pending_disable)
perf_event_disable_local(); // WHOOPS
The problem exists in generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from it's PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding a irq_work_sync() to event_sched_in() would work for all hardare
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 12 Apr 2019 03:39:22 +0000 (13:39 +1000)]
Merge tag 'drm-intel-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Revert back to max link rate and lane count on eDP.
- DSI related fixes for all platforms including Ice Lake.
- GVT Fixes including one vGPU display plane size regression fix,
one for preventing use-after-free in ppgtt shadow free function,
and another warning fix for iomem access annotation.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411235832.GA6476@intel.com
Jason Yan [Fri, 12 Apr 2019 02:09:16 +0000 (10:09 +0800)]
block: fix the return errno for direct IO
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Fixes:
542ff7bf18c6 ("block: new direct I/O implementation")
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 11 Apr 2019 21:19:02 +0000 (14:19 -0700)]
Merge tag 'for-5.1-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix parsing of compression algorithm when set as a inode property,
this could end up with eg. 'zst' or 'zli' in the value
- don't allow trim on a filesystem with unreplayed log, this could
cause data loss if there are pending updates to the block groups that
would not be subject to trim after replay
* tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prop: fix vanished compression property after failed set
btrfs: prop: fix zstd compression parameter validation
Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
Dave Airlie [Thu, 11 Apr 2019 20:55:20 +0000 (06:55 +1000)]
Merge tag 'drm-misc-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- core: Make atomic_enable and disable optional for CRTC
- dw-hdmi: Lower max frequency for the Allwinner H6, SCDC configuration
improvements for older controller versions
- omap: a fix for the CEC clock management policy
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411151658.orm46ccd5zmrw27l@flea
Trond Myklebust [Thu, 11 Apr 2019 19:16:52 +0000 (15:16 -0400)]
Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
This reverts commit
009a82f6437490c262584d65a14094a818bcb747.
The ability to optimise here relies on compiler being able to optimise
away tail calls to avoid stack overflows. Unfortunately, we are seeing
reports of problems, so let's just revert.
Reported-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Olga Kornievskaia [Thu, 11 Apr 2019 18:34:18 +0000 (14:34 -0400)]
NFSv4.1 fix incorrect return value in copy_file_range
According to the NFSv4.2 spec if the input and output file is the
same file, operation should fail with EINVAL. However, linux
copy_file_range() system call has no such restrictions. Therefore,
in such case let's return EOPNOTSUPP and allow VFS to fallback
to doing do_splice_direct(). Also when copy_file_range is called
on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
COPY functionality), we also need to return EOPNOTSUPP and
fallback to a regular copy.
Fixes xfstest generic/075, generic/091, generic/112, generic/263
for all NFSv4.x versions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chuck Lever [Tue, 9 Apr 2019 21:04:09 +0000 (17:04 -0400)]
xprtrdma: Fix helper that drains the transport
We want to drain only the RQ first. Otherwise the transport can
deadlock on ->close if there are outstanding Send completions.
Fixes:
6d2d0ee27c7a ("xprtrdma: Replace rpcrdma_receive_wq ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chuck Lever [Tue, 9 Apr 2019 14:44:16 +0000 (10:44 -0400)]
NFS: Fix handling of reply page vector
NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-rc.
These two need the extra padding to be added directly to the reply
length.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes:
02ef04e432ba ("NFS: Account for XDR pad of buf->pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tetsuo Handa [Sat, 30 Mar 2019 01:21:07 +0000 (10:21 +0900)]
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
[1] https://syzkaller.appspot.com/bug?id=
599993614e7cbbf66bc2656a919ab2a95fb5d75c
Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Scott Wood [Wed, 10 Apr 2019 21:59:25 +0000 (16:59 -0500)]
dma-debug: only skip one stackframe entry
With skip set to 1, I get a traceback like this:
[ 106.867637] DMA-API: Mapped at:
[ 106.870784] afu_dma_map_region+0x2cd/0x4f0 [dfl_afu]
[ 106.875839] afu_ioctl+0x258/0x380 [dfl_afu]
[ 106.880108] do_vfs_ioctl+0xa9/0x720
[ 106.883688] ksys_ioctl+0x60/0x90
[ 106.887007] __x64_sys_ioctl+0x16/0x20
With the previous value of 2, afu_dma_map_region was being omitted. I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc. In any case,
it's best to err on the side of skipping less.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stephen Boyd [Thu, 11 Apr 2019 17:22:43 +0000 (10:22 -0700)]
platform/x86: pmc_atom: Drop __initconst on dmi table
It's used by probe and that isn't an init function. Drop this so that we
don't get a section mismatch.
Reported-by: kbuild test robot <lkp@intel.com>
Cc: David Müller <dave.mueller@gmx.ch>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes:
7c2e07130090 ("clk: x86: Add system specific quirk to mark clocks as critical")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Rodrigo Vivi [Thu, 11 Apr 2019 16:18:13 +0000 (09:18 -0700)]
Merge tag 'gvt-fixes-2019-04-11' of https://github.com/intel/gvt-linux into drm-intel-fixes
gvt-fixes-2019-04-11
- Fix sparse warning on iomem usage (Chris)
- Prevent use-after-free for ppgtt shadow table free (Chris)
- Fix display plane size regression for tiled surface (Xiong)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411064910.GF17995@zhen-hp.sh.intel.com
Jens Axboe [Thu, 11 Apr 2019 15:36:41 +0000 (09:36 -0600)]
Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph:
"Two nvme fixes for 5.1 - fixing the initial CSN for nvme-fc, and handle
log page offsets properly in the target."
* 'nvme-5.1' of git://git.infradead.org/nvme:
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error