Bob Pearson [Thu, 7 Oct 2021 20:40:50 +0000 (15:40 -0500)]
RDMA/rxe: Replace ah->pd by ah->ibah.pd
The pd field in struct rxe_ah is redundant with the pd field in the
rdma-core's ib_ah. Eliminate the pd field in rxe_ah and add an inline to
extract the pd from the ibah field.
Link: https://lore.kernel.org/r/20211007204051.10086-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Thu, 7 Oct 2021 20:40:49 +0000 (15:40 -0500)]
RDMA/rxe: Create AH index and return to user space
Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the
index in UD send WRs to the driver and returning the index to the rxe
provider.
Modify rxe_create_ah() to add an index to AH when created and if called
from a new user provider return it to user space. If called from an old
provider mark the AH as not having a useful index. Modify rxe_destroy_ah
to drop the index before deleting the object.
Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Thu, 7 Oct 2021 20:40:48 +0000 (15:40 -0500)]
RDMA/rxe: Change AH objects to indexed
Make changes to rxe_param.h and rxe_pool.c to allow indexing of AH
objects. Valid indices are non-zero so older providers can be detected.
Link: https://lore.kernel.org/r/20211007204051.10086-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Thu, 7 Oct 2021 20:40:47 +0000 (15:40 -0500)]
RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr
placing it in wr.ud at the same offset as it was previously. This has the
effect of increasing the size of struct rxe_send_wr while keeping the size
of struct rxe_send_wqe the same. This better reflects the use of this
field which is only used for UD sends. This change has no effect on ABI
compatibility so the modified rxe driver will operate with older versions
of rdma-core.
Link: https://lore.kernel.org/r/20211007204051.10086-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky [Tue, 12 Oct 2021 07:28:43 +0000 (10:28 +0300)]
RDMA/mlx4: Return missed an error if device doesn't support steering
The error flow fixed in this patch is not possible because all kernel
users of create QP interface check that device supports steering before
set IB_QP_CREATE_NETIF_QP flag.
Fixes:
c1c98501121e ("IB/mlx4: Add support for steerable IB UD QPs")
Link: https://lore.kernel.org/r/91c61f6e60eb0240f8bbc321fda7a1d2986dd03c.1634023677.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Zhu Yanjun [Mon, 11 Oct 2021 11:01:28 +0000 (07:01 -0400)]
RDMA/irdma: Remove irdma_cqp_up_map_cmd()
The function irdma_cqp_up_map_cmd() is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-5-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Zhu Yanjun [Mon, 11 Oct 2021 11:01:27 +0000 (07:01 -0400)]
RDMA/irdma: Remove irdma_get_hw_addr()
The function irdma_get_hw_addr() is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-4-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Zhu Yanjun [Mon, 11 Oct 2021 11:01:26 +0000 (07:01 -0400)]
RDMA/irdma: Remove irdma_sc_send_lsmm_nostag()
The function irdma_sc_send_lsmm_nostag is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Zhu Yanjun [Mon, 11 Oct 2021 11:01:25 +0000 (07:01 -0400)]
RDMA/irdma: Remove irdma_uk_mw_bind()
The function irdma_uk_mw_bind is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-2-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Christophe JAILLET [Sun, 10 Oct 2021 14:08:10 +0000 (16:08 +0200)]
RDMA: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Colin Ian King [Thu, 7 Oct 2021 17:39:42 +0000 (18:39 +0100)]
RDMA/iwpm: Remove redundant initialization of pointer err_str
The pointer err_str is being initialized with a value that is never read,
it is being updated later on. The assignment is redundant and can be
removed.
Link: https://lore.kernel.org/r/20211007173942.21933-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cai Huoqing [Sun, 26 Sep 2021 06:11:15 +0000 (14:11 +0800)]
RDMA/hns: Use dma_alloc_coherent() instead of kmalloc/dma_map_single()
Replacing kmalloc/kfree/dma_map_single/dma_unmap_single() with
dma_alloc_coherent/dma_free_coherent() helps to reduce code size, and
simplify the code, and coherent DMA will not clear the cache every time.
The SOC that this driver supports does not have incoherent DMA, so this
makes the code follow the DMA API properly with no performance
impact. Currently there are missing dma sync calls around the DMA
transfers.
Link: https://lore.kernel.org/r/20210926061116.282-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Wenpeng Liang <liangwenpeng@huawei.com>
Tested-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:39 +0000 (15:24 +0300)]
RDMA/mlx5: Add optional counter support in get_hw_stats callback
When get_hw_stats is called, query and return the optional counter
statistic as well.
Link: https://lore.kernel.org/r/20211008122439.166063-14-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:38 +0000 (15:24 +0300)]
RDMA/mlx5: Add modify_op_stat() support
Add support for ib callback modify_op_stat() to add or remove an optional
counter. When adding, a steering flow table is created with a rule that
catches and counts all the matching packets. When removing, the table and
flow counter are destroyed.
Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:37 +0000 (15:24 +0300)]
RDMA/mlx5: Add steering support in optional flow counters
Adding steering infrastructure for adding and removing optional counter.
This allows to add and remove the counters dynamically in order not to
hurt performance.
Link: https://lore.kernel.org/r/20211008122439.166063-12-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:36 +0000 (15:24 +0300)]
RDMA/mlx5: Support optional counters in hw_stats initialization
Add optional counter support when allocate and initialize hw_stats
structure. Optional counters have IB_STAT_FLAG_OPTIONAL flag set and are
disabled by default.
Link: https://lore.kernel.org/r/20211008122439.166063-11-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:35 +0000 (15:24 +0300)]
RDMA/nldev: Allow optional-counter status configuration through RDMA netlink
Provide an option to allow users to enable/disable optional counters
through RDMA netlink. Limiting it to users with ADMIN capability only.
Examples:
1. Enable optional counters cc_rx_ce_pkts and cc_rx_cnp_pkts (and
disable all others):
$ sudo rdma statistic set link rocep8s0f0/1 optional-counters \
cc_rx_ce_pkts,cc_rx_cnp_pkts
2. Remove all optional counters:
$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
Link: https://lore.kernel.org/r/20211008122439.166063-10-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:34 +0000 (15:24 +0300)]
RDMA/nldev: Split nldev_stat_set_mode_doit out of nldev_stat_set_doit
In order to allow expansion of the set command with more set options, take
the set mode out of the main set function.
Link: https://lore.kernel.org/r/20211008122439.166063-9-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:33 +0000 (15:24 +0300)]
RDMA/nldev: Add support to get status of all counters
This patch adds the ability to get the name, index and status of all
counters for each link through RDMA netlink. This can be used for
user-space to get the current optional-counter mode.
Examples:
$ rdma statistic mode
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts
$ rdma statistic mode supported
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
link rocep8s0f1/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
Link: https://lore.kernel.org/r/20211008122439.166063-8-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:32 +0000 (15:24 +0300)]
RDMA/counter: Add optional counter support
An optional counter is a driver-specific counter that may be dynamically
enabled/disabled. This enhancement allows drivers to expose counters
which are, for example, mutually exclusive and cannot be enabled at the
same time, counters that might degrades performance, optional debug
counters, etc.
Optional counters are marked with IB_STAT_FLAG_OPTIONAL flag. They are not
exported in sysfs, and must be at the end of all stats, otherwise the
attr->show() in sysfs would get wrong indexes for hwcounters that are
behind optional counters.
Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:31 +0000 (15:24 +0300)]
RDMA/counter: Add an is_disabled field in struct rdma_hw_stats
Add a bitmap in rdma_hw_stat structure, with each bit indicates whether
the corresponding counter is currently disabled or not. By default
hwcounters are enabled.
Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Mark Zhang [Fri, 8 Oct 2021 12:24:30 +0000 (15:24 +0300)]
RDMA/core: Add a helper API rdma_free_hw_stats_struct
Add a new API rdma_free_hw_stats_struct to pair with
rdma_alloc_hw_stats_struct (which is also de-inlined).
This will be useful when there are more alloc/free works in following
patches.
Link: https://lore.kernel.org/r/20211008122439.166063-5-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:29 +0000 (15:24 +0300)]
RDMA/counter: Add a descriptor in struct rdma_hw_stats
Add a counter statistic descriptor structure in rdma_hw_stats. In addition
to the counter name, more meta-information will be added. This code
extension is needed for optional-counter support in the following patches.
Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Tue, 12 Oct 2021 15:19:24 +0000 (12:19 -0300)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux
For dependencies in the following patches.
* mellanox/mlx5-next:
net/mlx5: Add priorities for counters in RDMA namespaces
net/mlx5: Add ifc bits to support optional counters
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:28 +0000 (15:24 +0300)]
net/mlx5: Add priorities for counters in RDMA namespaces
Add additional flow steering priorities in the RDMA namespace.
This allows adding flow counters to count filtered RDMA traffic and then
continue processing in the regular RDMA steering flow.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Aharon Landau [Fri, 8 Oct 2021 12:24:27 +0000 (15:24 +0300)]
net/mlx5: Add ifc bits to support optional counters
Adding bth_opcode field and the relevant bits. This field will be used
to capture and count congestion notification packets (CNP).
Adding source_vhca_port support bit.
This field will be used to check the capability to use the
source_vhca_port as a match criteria in cases of dual port.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Gal Pressman [Sun, 3 Oct 2021 10:56:04 +0000 (13:56 +0300)]
RDMA/efa: CQ notifications
This patch adds support for CQ notifications through the standard verbs
api.
In order to achieve that, a new event queue (EQ) object is introduced,
which is in charge of reporting completion events to the driver. On
driver load, EQs are allocated and their affinity is set to a single
cpu. When a user app creates a CQ with a completion channel, the
completion vector number is converted to a EQ number, which is in charge
of reporting the CQ events.
In addition, the CQ creation admin command now returns an offset for the
CQ doorbell, which is mapped to the userspace provider and is used to arm
the CQ when requested by the user.
The EQs use a single doorbell (located on the registers BAR), which
encodes the EQ number and arm as part of the doorbell value. The EQs are
polled by the driver on each new EQE, and arm it when the poll is
completed.
Link: https://lore.kernel.org/r/20211003105605.29222-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xiao Yang [Thu, 30 Sep 2021 09:48:13 +0000 (17:48 +0800)]
RDMA/rxe: Remove duplicate settings
Remove duplicate settings for vendor_err and qp_num.
Link: https://lore.kernel.org/r/20210930094813.226888-5-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xiao Yang [Thu, 30 Sep 2021 09:48:12 +0000 (17:48 +0800)]
RDMA/rxe: Set partial attributes when completion status != IBV_WC_SUCCESS
As ibv_poll_cq()'s manual said, only partial attributes are valid when
completion status != IBV_WC_SUCCESS.
Link: https://lore.kernel.org/r/20210930094813.226888-4-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xiao Yang [Thu, 30 Sep 2021 09:48:11 +0000 (17:48 +0800)]
RDMA/rxe: Change the is_user member of struct rxe_cq to bool
Make the is_user members of struct rxe_qp/rxe_cq has the same type.
Link: https://lore.kernel.org/r/20210930094813.226888-3-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xiao Yang [Thu, 30 Sep 2021 09:48:10 +0000 (17:48 +0800)]
RDMA/rxe: Remove the is_user members of struct rxe_sq/rxe_rq/rxe_srq
The is_user members of struct rxe_sq/rxe_rq/rxe_srq are unsed since
commit
ae6e843fe08d ("RDMA/rxe: Add memory barriers to kernel queues").
In this case, it is fine to remove them directly.
Link: https://lore.kernel.org/r/20210930094813.226888-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Zhu Yanjun [Wed, 6 Oct 2021 20:15:31 +0000 (16:15 -0400)]
RDMA/irdma: Delete unused struct irdma_bth
The struct irdma_bth is not used, so remove it.
Link: https://lore.kernel.org/r/20211006201531.469650-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Andy Shevchenko [Fri, 1 Oct 2021 12:31:53 +0000 (15:31 +0300)]
IB/hf1: Use string_upper() instead of an open coded variant
Use string_upper() from the string helper module instead of an open coded
variant.
Link: https://lore.kernel.org/r/20211001123153.67379-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Logan Gunthorpe [Fri, 1 Oct 2021 21:32:15 +0000 (15:32 -0600)]
RDMA/rw: switch to dma_map_sgtable()
There are a couple of subtle error path bugs related to mapping the sgls:
- In rdma_rw_ctx_init(), dma_unmap would be called with an sg that could
have been incremented from the original call, as well as an nents that
is the dma mapped entries not the original number of nents called when
mapped.
- Similarly in rdma_rw_ctx_signature_init, both sg and prot_sg were
unmapped with the incorrect number of nents.
To fix this, switch to the sgtable interface for mapping which
conveniently stores the original nents for unmapping. This will get
cleaned up further once the dma mapping interface supports P2PDMA and
pci_p2pdma_map_sg() can be removed.
Fixes:
0e353e34e1e7 ("IB/core: add RW API support for signature MRs")
Fixes:
a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/20211001213215.3761-1-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Sun, 26 Sep 2021 08:31:43 +0000 (11:31 +0300)]
RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
Currently, if a cache entry is empty, the driver will try to take MRs
from larger cache entries. This behavior consumes a lot of memory.
In addition, when searching for an mkey in an entry, the entry is locked.
When using a multithreaded application with the old behavior, the threads
will block each other more often, which can hurt performance as can be
seen in the table below.
Therefore, avoid it by creating a new mkey when the requested cache entry
is empty.
The test was performed on a machine with
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.
Here are the time measures for allocating MRs of 2^6 pages. The search in
the cache started from entry 6.
+------------+---------------------+---------------------+
| | Old behavior | New behavior |
| +----------+----------+----------+----------+
| | 1 thread | 5 thread | 1 thread | 5 thread |
+============+==========+==========+==========+==========+
| 1,000 MRs | 14 ms | 30 ms | 14 ms | 80 ms |
+------------+----------+----------+----------+----------+
| 10,000 MRs | 135 ms | 6 sec | 173 ms | 880 ms |
+------------+----------+----------+----------+----------+
|100,000 MRs | 11.2 sec | 57 sec | 1.74 sec | 8.8 sec |
+------------+----------+----------+----------+----------+
Link: https://lore.kernel.org/r/71af2770c737b936f7b10f457f0ef303ffcf7ad7.1632644527.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Md Haris Iqbal [Wed, 22 Sep 2021 12:53:33 +0000 (14:53 +0200)]
RDMA/rtrs-clt: Follow "one entry one value" rule for IO migration stats
This commit divides the sysfs entry cpu_migration into 2 different entries
One for "from cpus" and the other for "to cpus".
Link: https://lore.kernel.org/r/20210922125333.351454-8-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Md Haris Iqbal [Wed, 22 Sep 2021 12:53:32 +0000 (14:53 +0200)]
RDMA/rtrs: Do not allow sessname to contain special symbols / and .
Allowing these characters in sessname can lead to unexpected results,
particularly because / is used as a separator between files in a path, and
. points to the current directory.
Link: https://lore.kernel.org/r/20210922125333.351454-7-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Md Haris Iqbal [Wed, 22 Sep 2021 12:53:31 +0000 (14:53 +0200)]
RDMA/rtrs: Introduce destroy_cq helper
The same code snip used twice, to avoid duplicate, replace it with a
destroy_cq helper.
Link: https://lore.kernel.org/r/20210922125333.351454-6-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Wed, 22 Sep 2021 12:53:30 +0000 (14:53 +0200)]
RDMA/rtrs: Replace duplicate check with is_pollqueue helper
if (con->cid >= con->sess->irq_con_num) check can be replaced with a
is_pollqueue helper.
Link: https://lore.kernel.org/r/20210922125333.351454-5-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Wed, 22 Sep 2021 12:53:29 +0000 (14:53 +0200)]
RDMA/rtrs: Fix warning when use poll mode on client side.
When testing with poll mode, it will fail and lead to warning below on
client side:
$ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \
sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device
rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8)
WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core]
The problem is in case of poll queue, we need to still call
ib_alloc_cq/ib_free_cq, we can't use cq_poll api for poll queue.
As both client and server use shared function from rtrs, set irq_con_num
to con_num on server side, which is number of total connection of the
session, this way we can differ if the rtrs_con requires pollqueue.
Following up patches will replace the duplicate code with helpers.
Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Md Haris Iqbal [Wed, 22 Sep 2021 12:53:28 +0000 (14:53 +0200)]
RDMA/rtrs: Remove len parameter from helper print functions of sysfs
Since we have changed all sysfs show functions to use sysfs_emit, we do
not require the len (PAGE_SIZE) in our helper print functions. So remove
it from the function parameter.
Link: https://lore.kernel.org/r/20210922125333.351454-3-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Md Haris Iqbal [Wed, 22 Sep 2021 12:53:27 +0000 (14:53 +0200)]
RDMA/rtrs: Use sysfs_emit instead of s*printf function for sysfs show
sysfs_emit function was added to be aware of the PAGE_SIZE maximum of the
temporary buffer used for outputting sysfs content, so there is no
possible overruns. So replace the uses of any s*printf functions for the
sysfs show functions with sysfs_emit.
Link: https://lore.kernel.org/r/20210922125333.351454-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Wed, 15 Sep 2021 16:25:19 +0000 (13:25 -0300)]
RDMA/cma: Split apart the multiple uses of the same list heads
Two list heads in the rdma_id_private are being used for multiple
purposes, to save a few bytes of memory. Give the different purposes
different names and union the memory that is clearly exclusive.
list splits into device_item and listen_any_item. device_item is threaded
onto the cma_device's list and listen_any goes onto the
listen_any_list. IDs doing any listen cannot have devices.
listen_list splits into listen_item and listen_list. listen_list is on the
parent listen any rdma_id_private and listen_item is on child listen that
is bound to a specific cma_dev.
Which name should be used in which case depends on the state and other
factors of the rdma_id_private. Remap all the confusing references to make
sense with the new names, so at least there is some hope of matching the
necessary preconditions with each access.
Link: https://lore.kernel.org/r/0-v1-a5ead4a0c19d+c3a-cma_list_head_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Mon, 4 Oct 2021 18:59:49 +0000 (15:59 -0300)]
Merge tag 'v5.15-rc4' into rdma.get for-next
Merged due to dependencies in following patches.
Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take
the %p change and txq stats rename together.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Rao Shoaib [Wed, 15 Sep 2021 01:12:20 +0000 (18:12 -0700)]
RDMA/rxe: Bump up default maximum values used via uverbs
In our internal testing we have found that default maximum values are too
small. Ideally there should be no limits, but since maximum values are
reported via ibv_query_device, we have to return some value. So, the
default maximums have been changed to large values.
Link: https://lore.kernel.org/r/20210915011220.307585-1-Rao.Shoaib@oracle.com
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Sun, 3 Oct 2021 21:08:47 +0000 (14:08 -0700)]
Linux 5.15-rc4
Chen Jingwen [Tue, 28 Sep 2021 12:56:57 +0000 (20:56 +0800)]
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
In commit
b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.
Unfortunately, this will cause kernel to fail to start with:
1 (init): Uhuuh, elf segment at
00003ffff7ffd000 requested but the memory is mapped already
Failed to execute /init (error -17)
The reason is that the elf interpreter (ld.so) has overlapping segments.
readelf -l ld-2.31.so
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000002c94c 0x000000000002c94c R E 0x10000
LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
0x00000000000021e8 0x0000000000002320 RW 0x10000
LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
0x00000000000011ac 0x0000000000001328 RW 0x10000
The reason for this problem is the same as described in commit
ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").
Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.
Fixes:
4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 20:56:53 +0000 (13:56 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of ext4 bugs in fast_commit, inline data, and delayed
allocation.
Also fix error handling code paths in ext4_dx_readdir() and
ext4_fill_super().
Finally, avoid a grabbing a journal head in the delayed allocation
write in the common cases where we are overwriting a pre-existing
block or appending to an inode"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: recheck buffer uptodate bit under buffer lock
ext4: fix potential infinite loop in ext4_dx_readdir()
ext4: flush s_error_work before journal destroy in ext4_fill_super
ext4: fix loff_t overflow in ext4_max_bitmap_size()
ext4: fix reserved space counter leakage
ext4: limit the number of blocks in one ADD_RANGE TLV
ext4: enforce buffer head state assertion in ext4_da_map_blocks
ext4: remove extent cache entries when truncating inline data
ext4: drop unnecessary journal handle in delalloc write
ext4: factor out write end code of inline file
ext4: correct the error path of ext4_write_inline_data_end()
ext4: check and update i_disksize properly
ext4: add error checking to ext4_ext_replay_set_iblocks()
Linus Torvalds [Sun, 3 Oct 2021 20:45:48 +0000 (13:45 -0700)]
objtool: print out the symbol type when complaining about it
The objtool warning that the kvm instruction emulation code triggered
wasn't very useful:
arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
in that it helpfully tells you which symbol name it had trouble figuring
out the relocation for, but it doesn't actually say what the unknown
symbol type was that triggered it all.
In this case it was because of missing type information (type 0, aka
STT_NOTYPE), but on the whole it really should just have printed that
out as part of the message.
Because if this warning triggers, that's very much the first thing you
want to know - why did reloc2sec_off() return failure for that symbol?
So rather than just saying you can't handle some type of symbol without
saying what the type _was_, just print out the type number too.
Fixes:
24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 20:34:19 +0000 (13:34 -0700)]
kvm: fix objtool relocation warning
The recent change to make objtool aware of more symbol relocation types
(commit
24ff65257375: "objtool: Teach get_alt_entry() about more
relocation types") also added another check, and resulted in this
objtool warning when building kvm on x86:
arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
The reason seems to be that kvm_fastop_exception() is marked as a global
symbol, which causes the relocation to ke kept around for objtool. And
at the same time, the kvm_fastop_exception definition (which is done as
an inline asm statement) doesn't actually set the type of the global,
which then makes objtool unhappy.
The minimal fix is to just not mark kvm_fastop_exception as being a
global symbol. It's only used in that one compilation unit anyway, so
it was always pointless. That's how all the other local exception table
labels are done.
I'm not entirely happy about the kinds of games that the kvm code plays
with doing its own exception handling, and the fact that it confused
objtool is most definitely a symptom of the code being a bit too subtle
and ad-hoc. But at least this trivial one-liner makes objtool no longer
upset about what is going on.
Fixes:
24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 18:19:02 +0000 (11:19 -0700)]
Merge tag 'char-misc-5.15-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small misc driver fixes for 5.15-rc4. They are in two
"groups":
- ipack driver fixes for issues found by Johan Hovold
- interconnect driver fixes for reported problems
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ipack: ipoctal: fix module reference leak
ipack: ipoctal: fix missing allocation-failure check
ipack: ipoctal: fix tty-registration error handling
ipack: ipoctal: fix tty registration race
ipack: ipoctal: fix stack information leak
interconnect: qcom: sdm660: Add missing a2noc qos clocks
dt-bindings: interconnect: sdm660: Add missing a2noc qos clocks
interconnect: qcom: sdm660: Correct NOC_QOS_PRIORITY shift and mask
interconnect: qcom: sdm660: Fix id of slv_cnoc_mnoc_cfg
Linus Torvalds [Sun, 3 Oct 2021 18:10:09 +0000 (11:10 -0700)]
Merge tag 'driver-core-5.15-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core and kernfs fixes for reported issues for
5.15-rc4. These fixes include:
- kernfs positive dentry bugfix
- debugfs_create_file_size error path fix
- cpumask sysfs file bugfix to preserve the user/kernel abi (has been
reported multiple times.)
- devlink fixes for mdiobus devices as reported by the subsystem
maintainers.
Also included in here are some devlink debugging changes to make it
easier for people to report problems when asked. They have already
helped with the mdiobus and other subsystems reporting issues.
All of these have been linux-next for a while with no reported issues"
* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: also call kernfs_set_rev() for positive dentry
driver core: Add debug logs when fwnode links are added/deleted
driver core: Create __fwnode_link_del() helper function
driver core: Set deferred probe reason when deferred by driver core
net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
driver core: fw_devlink: Improve handling of cyclic dependencies
cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
debugfs: debugfs_create_file_size(): use IS_ERR to check for error
Linus Torvalds [Sun, 3 Oct 2021 17:49:00 +0000 (10:49 -0700)]
Merge tag 'sched_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Tell the compiler to always inline is_percpu_thread()
- Make sure tunable_scaling buffer is null-terminated after an update
in sysfs
- Fix LTP named regression due to cgroup list ordering
* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Always inline is_percpu_thread()
sched/fair: Null terminate buffer when updating tunable_scaling
sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Linus Torvalds [Sun, 3 Oct 2021 17:32:27 +0000 (10:32 -0700)]
Merge tag 'perf_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure the destroy callback is reset when a event initialization
fails
- Update the event constraints for Icelake
- Make sure the active time of an event is updated even for inactive
events
* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix userpage->time_enabled of inactive events
perf/x86/intel: Update event constraints for ICX
perf/x86: Reset destroy callback on event init failure
Linus Torvalds [Sun, 3 Oct 2021 17:23:54 +0000 (10:23 -0700)]
Merge tag 'objtool_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Handle symbol relocations properly due to changes in the toolchains
which remove section symbols now
* tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Teach get_alt_entry() about more relocation types
Linus Torvalds [Sun, 3 Oct 2021 00:51:01 +0000 (17:51 -0700)]
Merge tag 'hwmon-for-v5.15-rc4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fixed various potential NULL pointer accesses in w8379* drivers
- Improved error handling, fault reporting, and fixed rounding in
thmp421 driver
- Fixed error handling in ltc2947 driver
- Added missing attribute to pmbus/mp2975 driver
- Fixed attribute values in pbus/ibm-cffps, occ, and mlxreg-fan
drivers
- Removed unused residual code from k10temp driver
* tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
hwmon: (pmbus/ibm-cffps) max_power_out swap changes
hwmon: (occ) Fix P10 VRM temp sensors
hwmon: (ltc2947) Properly handle errors when looking for the external clock
hwmon: (tmp421) fix rounding for negative values
hwmon: (tmp421) report /PVLD condition as fault
hwmon: (tmp421) handle I2C errors
hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
hwmon: (k10temp) Remove residues of current and voltage
Linus Torvalds [Sun, 3 Oct 2021 00:43:54 +0000 (17:43 -0700)]
Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
"Eleven fixes for the ksmbd kernel server, mostly security related:
- an important fix for disabling weak NTLMv1 authentication
- seven security (improved buffer overflow checks) fixes
- fix for wrong infolevel struct used in some getattr/setattr paths
- two small documentation fixes"
* tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: missing check for NULL in convert_to_nt_pathname()
ksmbd: fix transform header validation
ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
ksmbd: add validation in smb2 negotiate
ksmbd: add request buffer validation in smb2_set_info
ksmbd: use correct basic info level in set_file_basic_info()
ksmbd: remove NTLMv1 authentication
ksmbd: fix documentation for 2 functions
MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
ksmbd: fix invalid request buffer access in compound
ksmbd: remove RFC1002 check in smb2 request
Linus Torvalds [Sat, 2 Oct 2021 19:56:03 +0000 (12:56 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five fairly minor fixes and spelling updates, all in drivers. Even
though the ufs fix is in tracing, it's a potentially exploitable use
beyond end of array bug"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: csiostor: Add module softdep on cxgb4
scsi: qla2xxx: Fix excessive messages during device logout
scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
scsi: ses: Fix unsigned comparison with less than zero
scsi: ufs: Fix illegal offset in UPIU event trace
Linus Torvalds [Sat, 2 Oct 2021 18:00:36 +0000 (11:00 -0700)]
Merge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few block fixes for this release:
- Revert a BFQ commit that causes breakage for people. Unfortunately
it was auto-selected for stable as well, so now 5.14.7 suffers from
it too. Hopefully stable will pick up this revert quickly too, so
we can remove the issue on that end as well.
- Add a quirk for Apple NVMe controllers, which due to their
non-compliance broke due to the introduction of command sequences
(Keith)
- Use shifts in nbd, fixing a __divdi3 issue (Nick)"
* tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
nbd: use shifts rather than multiplies
Revert "block, bfq: honor already-setup queue merges"
nvme: add command id quirk for apple controllers
Linus Torvalds [Sat, 2 Oct 2021 17:26:19 +0000 (10:26 -0700)]
Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes in here:
- The signal issue that was discussed start of this week (me).
- Kill dead fasync support in io_uring. Looks like it was broken
since io_uring was initially merged, and given that nobody has ever
complained about it, let's just kill it (Pavel)"
* tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
io_uring: kill fasync
io-wq: exclusively gate signal based exit on get_signal() return
Linus Torvalds [Sat, 2 Oct 2021 17:08:35 +0000 (10:08 -0700)]
Merge tag 'libnvdimm-fixes-5.15-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix for a regression added this cycle in the pmem driver, and for a
long standing bug for failed NUMA node lookups on ARM64.
This has appeared in -next for several days with no reported issues.
Summary:
- Fix a regression that caused the sysfs ABI for pmem block devices
to not be registered. This fails the nvdimm unit tests and dax
xfstests.
- Fix numa node lookups for dax-kmem memory (device-dax memory
assigned to the page allocator) on ARM64"
* tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/pmem: fix creating the dax group
ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect
Dave Wysochanski [Fri, 1 Oct 2021 14:37:31 +0000 (15:37 +0100)]
cachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object
In cachefiles_mark_object_buried, the dentry in question may not have an
owner, and thus our cachefiles_object pointer may be NULL when calling
the tracepoint, in which case we will also not have a valid debug_id to
print in the tracepoint.
Check for NULL object in the tracepoint and if so, just set debug_id to
MAX_UINT as was done in
2908f5e101e3 ("fscache: Add a cookie debug ID
and use that in traces").
This fixes the following oops:
FS-Cache: Cache "mycache" added (type cachefiles)
CacheFiles: File cache on vdc registered
...
Workqueue: fscache_object fscache_object_work_func [fscache]
RIP: 0010:trace_event_raw_event_cachefiles_mark_buried+0x4e/0xa0 [cachefiles]
....
Call Trace:
cachefiles_mark_object_buried+0xa5/0xb0 [cachefiles]
cachefiles_bury_object+0x270/0x430 [cachefiles]
cachefiles_walk_to_object+0x195/0x9c0 [cachefiles]
cachefiles_lookup_object+0x5a/0xc0 [cachefiles]
fscache_look_up_object+0xd7/0x160 [fscache]
fscache_object_work_func+0xb2/0x340 [fscache]
process_one_work+0x1f1/0x390
worker_thread+0x53/0x3e0
kthread+0x127/0x150
Fixes:
2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 2 Oct 2021 10:17:29 +0000 (03:17 -0700)]
drm/i915: fix blank screen booting crashes
5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915. Bisections converge convincingly, but arrive at different
and suprising "culprits", none of them the actual culprit.
netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:
kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI:
ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.
Fixes:
62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:53 +0000 (18:51 +0300)]
hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
If driver read tmp value sufficient for
(tmp & 0x08) && (!(tmp & 0x80)) && ((tmp & 0x7) == ((tmp >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-3-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignments]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:52 +0000 (18:51 +0300)]
hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-2-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multipline alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:51 +0000 (18:51 +0300)]
hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-1-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Vadim Pasternak [Mon, 27 Sep 2021 07:07:40 +0000 (10:07 +0300)]
hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
Add missed attribute for reading POUT from page 1.
It is supported by device, but has been missed in initial commit.
Fixes:
2c6fcbb21149 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210927070740.2149290-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Brandon Wyman [Tue, 28 Sep 2021 20:50:51 +0000 (20:50 +0000)]
hwmon: (pmbus/ibm-cffps) max_power_out swap changes
The bytes for max_power_out from the ibm-cffps devices differ in byte
order for some power supplies.
The Witherspoon power supply returns the bytes in MSB/LSB order.
The Rainier power supply returns the bytes in LSB/MSB order.
The Witherspoon power supply uses version cffps1. The Rainier power
supply should use version cffps2. If version is cffps1, swap the bytes
before output to max_power_out.
Tested:
Witherspoon before: 3148. Witherspoon after: 3148.
Rainier before: 53255. Rainier after: 2000.
Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210928205051.1222815-1-bjwyman@gmail.com
[groeck: Replaced yoda programming]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eddie James [Wed, 29 Sep 2021 15:36:04 +0000 (10:36 -0500)]
hwmon: (occ) Fix P10 VRM temp sensors
The P10 (temp sensor version 0x10) doesn't do the same VRM status
reporting that was used on P9. It just reports the temperature, so
drop the check for VRM fru type in the sysfs show function, and don't
set the name to "alarm".
Fixes:
db4919ec86 ("hwmon: (occ) Add new temperature sensor type")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210929153604.14968-1-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Fri, 1 Oct 2021 21:45:23 +0000 (14:45 -0700)]
Merge tag 's390-5.15-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fix from Vasily Gorbik:
"One fix for 5.15-rc4: Avoid CIO excessive path-verification requests,
which might cause unwanted delays"
* tag 's390-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: avoid excessive path-verification requests
Rafael J. Wysocki [Fri, 1 Oct 2021 17:14:28 +0000 (19:14 +0200)]
thermal: Update information in MAINTAINERS
Because Rui is now going to focus on work that is not related to the
maintenance of the thermal subsystem in the kernel, Rafael will start
to help Daniel with handling the development process as a new member
of the thermal maintainers team. Rui will continue to review patches
in that area.
The thermal development process flow will change so that the material
from the thermal git tree will be merged into the thermal branch of
the linux-pm.git tree before going into the mainline.
Update the information in MAINTAINERS accordingly.
Signed-off-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 1 Oct 2021 18:08:07 +0000 (11:08 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more kvm fixes from Paolo Bonzini:
"Small x86 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Ensure all migrations are performed when test is affined
KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks
ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm
x86/kvmclock: Move this_cpu_pvti into kvmclock.h
selftests: KVM: Don't clobber XMM register when read
KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
Linus Torvalds [Fri, 1 Oct 2021 17:27:44 +0000 (10:27 -0700)]
Merge tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Dave is out on a long w/e, should be back next week.
Nothing nefarious, just a bunch of driver fixes: amdgpu, i915, tegra,
and one exynos driver fix"
* tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix
drm/amdgpu: check tiling flags when creating FB on GFX8-
drm/amd/display: Pass PCI deviceid into DC
drm/amd/display: initialize backlight_ramping_override to false
drm/amdgpu: correct initial cp_hqd_quantum for gfx9
drm/amd/display: Fix Display Flicker on embedded panels
drm/amdgpu: fix gart.bo pin_count leak
drm/i915: Remove warning from the rps worker
drm/i915/request: fix early tracepoints
drm/i915/guc, docs: Fix pdfdocs build error by removing nested grid
gpu: host1x: Plug potential memory leak
gpu/host1x: fence: Make spinlock static
drm/tegra: uapi: Fix wrong mapping end address in case of disabled IOMMU
drm/tegra: dc: Remove unused variables
drm/exynos: Make use of the helper function devm_platform_ioremap_resource()
drm/i915/gvt: fix the usage of ww lock in gvt scheduler.
Pavel Begunkov [Fri, 1 Oct 2021 09:39:33 +0000 (10:39 +0100)]
io_uring: kill fasync
We have never supported fasync properly, it would only fire when there
is something polling io_uring making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 1 Oct 2021 17:14:29 +0000 (10:14 -0700)]
Merge tag 'iommu-fixes-v5.15-rc3' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Two fixes for the new Apple DART driver to fix a kernel panic and a
stale data usage issue
- Intel VT-d fix for how PCI device ids are printed
* tag 'iommu-fixes-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dart: Clear sid2group entry when a group is freed
iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
iommu/dart: Remove iommu_flush_ops
Daniel Vetter [Fri, 1 Oct 2021 16:14:38 +0000 (18:14 +0200)]
Merge tag 'exynos-drm-fixes-for-v5.15-rc4' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
One cleanup
- Use devm_platform_ioremap_resource() helper function instead of old
one.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210928074158.2942-1-inki.dae@samsung.com
Daniel Vetter [Fri, 1 Oct 2021 14:59:21 +0000 (16:59 +0200)]
Merge tag 'amd-drm-fixes-5.15-2021-09-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.15-2021-09-29:
amdgpu:
- gart pin count fix
- eDP flicker fix
- GFX9 MQD fix
- Display fixes
- Tiling flags fix for pre-GFX9
- SDMA resume fix for S0ix
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210930023013.5207-1-alexander.deucher@amd.com
Daniel Vetter [Fri, 1 Oct 2021 14:47:18 +0000 (16:47 +0200)]
Merge tag 'drm-intel-fixes-2021-09-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.15-rc4:
- Fix GVT scheduler ww lock usage
- Fix pdfdocs documentation build
- Fix request early tracepoints
- Fix an invalid warning from rps worker
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87lf3ev44z.fsf@intel.com
Jason Gunthorpe [Fri, 3 Sep 2021 08:48:15 +0000 (16:48 +0800)]
IB/mlx5: Flow through a more detailed return code from get_prefetchable_mr()
The error returns for various cases detected by get_prefetchable_mr() get
confused as it flows back to userspace. Properly label each error path and
flow the error code properly back to the system call.
Link: https://lore.kernel.org/r/20210928170846.GA1721590@nvidia.com
Suggested-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Peter Zijlstra [Mon, 20 Sep 2021 13:31:11 +0000 (15:31 +0200)]
sched: Always inline is_percpu_thread()
vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
Mel Gorman [Mon, 27 Sep 2021 11:46:35 +0000 (12:46 +0100)]
sched/fair: Null terminate buffer when updating tunable_scaling
This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not return failure and checks the value is valid.
Before:
$ cat /sys/kernel/debug/sched/tunable_scaling
1
$ echo 0 > /sys/kernel/debug/sched/tunable_scaling
-bash: echo: write error: Invalid argument
$ cat /sys/kernel/debug/sched/tunable_scaling
1
After:
$ cat /sys/kernel/debug/sched/tunable_scaling
1
$ echo 0 > /sys/kernel/debug/sched/tunable_scaling
$ cat /sys/kernel/debug/sched/tunable_scaling
0
$ echo 3 > /sys/kernel/debug/sched/tunable_scaling
-bash: echo: write error: Invalid argument
Fixes:
8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
Michal Koutný [Fri, 17 Sep 2021 15:30:37 +0000 (17:30 +0200)]
sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Since commit
a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit
fdaba61ef8a2
("sched/fair: Ensure that the CFS parent is added after unthrottling").
I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.
Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.
Fixes:
a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
Song Liu [Wed, 29 Sep 2021 19:43:13 +0000 (12:43 -0700)]
perf/core: fix userpage->time_enabled of inactive events
Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage->time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage->time_enabled. This can
be reproduced with something like:
/* open 20 task perf_event "cycles", to create multiplexing */
fd = perf_event_open(); /* open task perf_event "cycles" */
userpage = mmap(fd); /* use mmap and rdmpc */
while (true) {
time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
time_enabled_read = read(fd).time_enabled;
if (time_enabled_mmap > time_enabled_read)
BUG();
}
Fix this by updating userpage for inactive events in merge_sched_in.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-and-tested-by: Lucian Grijincu <lucian@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
Kan Liang [Tue, 28 Sep 2021 15:19:03 +0000 (08:19 -0700)]
perf/x86/intel: Update event constraints for ICX
According to the latest event list, the event encoding 0xEF is only
available on the first 4 counters. Add it into the event constraints
table.
Fixes:
6017608936c1 ("perf/x86/intel: Add Icelake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
Anand K Mistry [Wed, 29 Sep 2021 07:04:21 +0000 (17:04 +1000)]
perf/x86: Reset destroy callback on event init failure
perf_init_event tries multiple init callbacks and does not reset the
event state between tries. When x86_pmu_event_init runs, it
unconditionally sets the destroy callback to hw_perf_event_destroy. On
the next init attempt after x86_pmu_event_init, in perf_try_init_event,
if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy
callback will be run. However, if the next init didn't set the destroy
callback, hw_perf_event_destroy will be run (since the callback wasn't
reset).
Looking at other pmu init functions, the common pattern is to only set
the destroy callback on a successful init. Resetting the callback on
failure tries to replicate that pattern.
This was discovered after commit
f11dd0d80555 ("perf/x86/amd/ibs: Extend
PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second)
run of the perf tool after a reboot results in 0 samples being
generated. The extra run of hw_perf_event_destroy results in
active_events having an extra decrement on each perf run. The second run
has active_events == 0 and every subsequent run has active_events < 0.
When active_events == 0, the NMI handler will early-out and not record
any samples.
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid
Peter Zijlstra [Thu, 30 Sep 2021 10:43:10 +0000 (12:43 +0200)]
objtool: Teach get_alt_entry() about more relocation types
Occasionally objtool encounters symbol (as opposed to section)
relocations in .altinstructions. Typically they are the alternatives
written by elf_add_alternative() as encountered on a noinstr
validation run on vmlinux after having already ran objtool on the
individual .o files.
Basically this is the counterpart of commit
44f6a7c0755d ("objtool:
Fix seg fault with Clang non-section symbols"), because when these new
assemblers (binutils now also does this) strip the section symbols,
elf_add_reloc_to_insn() is forced to emit symbol based relocations.
As such, teach get_alt_entry() about different relocation types.
Fixes:
9bc0bb50727c ("objtool/x86: Rewrite retpoline thunk calls")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/YVWUvknIEVNkPvnP@hirez.programming.kicks-ass.net
Zhang Yi [Fri, 10 Sep 2021 08:03:16 +0000 (16:03 +0800)]
ext4: recheck buffer uptodate bit under buffer lock
Commit
8e33fadf945a ("ext4: remove an unnecessary if statement in
__ext4_get_inode_loc()") forget to recheck buffer's uptodate bit again
under buffer lock, which may overwrite the buffer if someone else have
already brought it uptodate and changed it.
Fixes:
8e33fadf945a ("ext4: remove an unnecessary if statement in __ext4_get_inode_loc()")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210910080316.70421-1-yi.zhang@huawei.com
yangerkun [Tue, 14 Sep 2021 11:14:15 +0000 (19:14 +0800)]
ext4: fix potential infinite loop in ext4_dx_readdir()
When ext4_htree_fill_tree() fails, ext4_dx_readdir() can run into an
infinite loop since if info->last_pos != ctx->pos this will reset the
directory scan and reread the failing entry. For example:
1. a dx_dir which has 3 block, block 0 as dx_root block, block 1/2 as
leaf block which own the ext4_dir_entry_2
2. block 1 read ok and call_filldir which will fill the dirent and update
the ctx->pos
3. block 2 read fail, but we has already fill some dirent, so we will
return back to userspace will a positive return val(see ksys_getdents64)
4. the second ext4_dx_readdir will reset the world since info->last_pos
!= ctx->pos, and will also init the curr_hash which pos to block 1
5. So we will read block1 too, and once block2 still read fail, we can
only fill one dirent because the hash of the entry in block1(besides
the last one) won't greater than curr_hash
6. this time, we forget update last_pos too since the read for block2
will fail, and since we has got the one entry, ksys_getdents64 can
return success
7. Latter we will trapped in a loop with step 4~6
Cc: stable@kernel.org
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210914111415.3921954-1-yangerkun@huawei.com
yangerkun [Fri, 24 Sep 2021 09:39:17 +0000 (17:39 +0800)]
ext4: flush s_error_work before journal destroy in ext4_fill_super
The error path in ext4_fill_super forget to flush s_error_work before
journal destroy, and it may trigger the follow bug since
flush_stashed_error_work can run concurrently with journal destroy
without any protection for sbi->s_journal.
[32031.740193] EXT4-fs (loop66): get root inode failed
[32031.740484] EXT4-fs (loop66): mount failed
[32031.759805] ------------[ cut here ]------------
[32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
[32031.760075] invalid opcode: 0000 [#1] SMP PTI
[32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded
4.18.0
[32031.765112] Call Trace:
[32031.765375] ? __switch_to_asm+0x35/0x70
[32031.765635] ? __switch_to_asm+0x41/0x70
[32031.765893] ? __switch_to_asm+0x35/0x70
[32031.766148] ? __switch_to_asm+0x41/0x70
[32031.766405] ? _cond_resched+0x15/0x40
[32031.766665] jbd2__journal_start+0xf1/0x1f0 [jbd2]
[32031.766934] jbd2_journal_start+0x19/0x20 [jbd2]
[32031.767218] flush_stashed_error_work+0x30/0x90 [ext4]
[32031.767487] process_one_work+0x195/0x390
[32031.767747] worker_thread+0x30/0x390
[32031.768007] ? process_one_work+0x390/0x390
[32031.768265] kthread+0x10d/0x130
[32031.768521] ? kthread_flush_work_fn+0x10/0x10
[32031.768778] ret_from_fork+0x35/0x40
static int start_this_handle(...)
BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this
Besides, after we enable fast commit, ext4_fc_replay can add work to
s_error_work but return success, so the latter journal destroy in
ext4_load_journal can trigger this problem too.
Fix this problem with two steps:
1. Call ext4_commit_super directly in ext4_handle_error for the case
that called from ext4_fc_replay
2. Since it's hard to pair the init and flush for s_error_work, we'd
better add a extras flush_work before journal destroy in
ext4_fill_super
Besides, this patch will call ext4_commit_super in ext4_handle_error for
any nojournal case too. But it seems safe since the reason we call
schedule_work was that we should save error info to sb through journal
if available. Conversely, for the nojournal case, it seems useless delay
commit superblock to s_error_work.
Fixes:
c92dc856848f ("ext4: defer saving error info from atomic context")
Fixes:
2d01ddc86606 ("ext4: save error info to sb through journal if available")
Cc: stable@kernel.org
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com
Ritesh Harjani [Sat, 5 Jun 2021 05:09:32 +0000 (10:39 +0530)]
ext4: fix loff_t overflow in ext4_max_bitmap_size()
We should use unsigned long long rather than loff_t to avoid
overflow in ext4_max_bitmap_size() for comparison before returning.
w/o this patch sbi->s_bitmap_maxbytes was becoming a negative
value due to overflow of upper_limit (with has_huge_files as true)
Below is a quick test to trigger it on a 64KB pagesize system.
sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2
sudo mount /dev/loop2 /mnt
sudo echo "hello" > /mnt/hello -> This will error out with
"echo: write error: File too large"
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/594f409e2c543e90fd836b78188dfa5c575065ba.1622867594.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jeffle Xu [Mon, 23 Aug 2021 06:13:58 +0000 (14:13 +0800)]
ext4: fix reserved space counter leakage
When ext4_insert_delayed block receives and recovers from an error from
ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
space it has reserved for that block insertion as it should. One effect
of this bug is that s_dirtyclusters_counter is not decremented and
remains incorrectly elevated until the file system has been unmounted.
This can result in premature ENOSPC returns and apparent loss of free
space.
Another effect of this bug is that
/sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
after syncfs has been executed on the filesystem.
Besides, add check for s_dirtyclusters_counter when inode is going to be
evicted and freed. s_dirtyclusters_counter can still keep non-zero until
inode is written back in .evict_inode(), and thus the check is delayed
to .destroy_inode().
Fixes:
51865fda28e5 ("ext4: let ext4 maintain extent status tree")
Cc: stable@kernel.org
Suggested-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210823061358.84473-1-jefflexu@linux.alibaba.com
Hou Tao [Fri, 20 Aug 2021 04:45:05 +0000 (12:45 +0800)]
ext4: limit the number of blocks in one ADD_RANGE TLV
Now EXT4_FC_TAG_ADD_RANGE uses ext4_extent to track the
newly-added blocks, but the limit on the max value of
ee_len field is ignored, and it can lead to BUG_ON as
shown below when running command "fallocate -l 128M file"
on a fast_commit-enabled fs:
kernel BUG at fs/ext4/ext4_extents.h:199!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 624 Comm: fallocate Not tainted 5.14.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:ext4_fc_write_inode_data+0x1f3/0x200
Call Trace:
? ext4_fc_write_inode+0xf2/0x150
ext4_fc_commit+0x93b/0xa00
? ext4_fallocate+0x1ad/0x10d0
ext4_sync_file+0x157/0x340
? ext4_sync_file+0x157/0x340
vfs_fsync_range+0x49/0x80
do_fsync+0x3d/0x70
__x64_sys_fsync+0x14/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Simply fixing it by limiting the number of blocks
in one EXT4_FC_TAG_ADD_RANGE TLV.
Fixes:
aa75f4d3daae ("ext4: main fast-commit commit path")
Cc: stable@kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210820044505.474318-1-houtao1@huawei.com
Dan Carpenter [Thu, 30 Sep 2021 12:24:56 +0000 (15:24 +0300)]
ksmbd: missing check for NULL in convert_to_nt_pathname()
The kmalloc() does not have a NULL check. This code can be re-written
slightly cleaner to just use the kstrdup().
Fixes:
265fd1991c1d ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 30 Sep 2021 21:28:05 +0000 (14:28 -0700)]
Merge tag 'net-5.15-rc4' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from mac80211, netfilter and bpf.
Current release - regressions:
- bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
interrupt
- mdio: revert mechanical patches which broke handling of optional
resources
- dev_addr_list: prevent address duplication
Previous releases - regressions:
- sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
(NULL deref)
- Revert "mac80211: do not use low data rates for data frames with no
ack flag", fixing broadcast transmissions
- mac80211: fix use-after-free in CCMP/GCMP RX
- netfilter: include zone id in tuple hash again, minimize collisions
- netfilter: nf_tables: unlink table before deleting it (race -> UAF)
- netfilter: log: work around missing softdep backend module
- mptcp: don't return sockets in foreign netns
- sched: flower: protect fl_walk() with rcu (race -> UAF)
- ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup
- smsc95xx: fix stalled rx after link change
- enetc: fix the incorrect clearing of IF_MODE bits
- ipv4: fix rtnexthop len when RTA_FLOW is present
- dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
SKU
- e100: fix length calculation & buffer overrun in ethtool::get_regs
Previous releases - always broken:
- mac80211: fix using stale frag_tail skb pointer in A-MSDU tx
- mac80211: drop frames from invalid MAC address in ad-hoc mode
- af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
-> UAF)
- bpf, x86: Fix bpf mapping of atomic fetch implementation
- bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
- netfilter: ip6_tables: zero-initialize fragment offset
- mhi: fix error path in mhi_net_newlink
- af_unix: return errno instead of NULL in unix_create1() when over
the fs.file-max limit
Misc:
- bpf: exempt CAP_BPF from checks against bpf_jit_limit
- netfilter: conntrack: make max chain length random, prevent
guessing buckets by attackers
- netfilter: nf_nat_masquerade: make async masq_inet6_event handling
generic, defer conntrack walk to work queue (prevent hogging RTNL
lock)"
* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
net: stmmac: fix EEE init issue when paired with EEE capable PHYs
net: dev_addr_list: handle first address in __hw_addr_add_ex
net: sched: flower: protect fl_walk() with rcu
net: introduce and use lock_sock_fast_nested()
net: phy: bcm7xxx: Fixed indirect MMD operations
net: hns3: disable firmware compatible features when uninstall PF
net: hns3: fix always enable rx vlan filter problem after selftest
net: hns3: PF enable promisc for VF when mac table is overflow
net: hns3: fix show wrong state when add existing uc mac address
net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
net: hns3: don't rollback when destroy mqprio fail
net: hns3: remove tc enable checking
net: hns3: do not allow call hns3_nic_net_open repeatedly
ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
net: bridge: mcast: Associate the seqcount with its protecting lock.
net: mdio-ipq4019: Fix the error for an optional regs resource
net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
net: mdio: mscc-miim: Fix the mdio controller
af_unix: Return errno instead of NULL in unix_create1().
...
Linus Torvalds [Thu, 30 Sep 2021 19:11:35 +0000 (12:11 -0700)]
Merge tag 'gpio-fixes-for-v5.15-rc4' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A single fix for the gpio-pca953x driver and two commits updating the
MAINTAINERS entries for Mun Yew Tham (GPIO specific) and myself
(treewide after a change in professional situation).
Summary:
- don't ignore I2C errors in gpio-pca953x
- update MAINTAINERS entries for Mun Yew Tham and myself"
* tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer
MAINTAINERS: update my email address
gpio: pca953x: do not ignore i2c errors
Linus Torvalds [Thu, 30 Sep 2021 19:00:46 +0000 (12:00 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Not much too exciting here, although two syzkaller bugs that seem to
have 9 lives may have finally been squashed.
Several core bugs and a batch of driver bug fixes:
- Fix compilation problems in qib and hfi1
- Do not corrupt the joined multicast group state when using
SEND_ONLY
- Several CMA bugs, a reference leak for listening and two syzkaller
crashers
- Various bug fixes for irdma
- Fix a Sleeping while atomic bug in usnic
- Properly sanitize kernel pointers in dmesg
- Two bugs in the 64b CQE support for hns"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Add the check of the CQE size of the user space
RDMA/hns: Fix the size setting error when copying CQE in clean_cq()
RDMA/hfi1: Fix kernel pointer leak
RDMA/usnic: Lock VF with mutex instead of spinlock
RDMA/hns: Work around broken constant propagation in gcc 8
RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
RDMA/cma: Do not change route.addr.src_addr.ss_family
RDMA/irdma: Report correct WC error when there are MW bind errors
RDMA/irdma: Report correct WC error when transport retry counter is exceeded
RDMA/irdma: Validate number of CQ entries on create CQ
RDMA/irdma: Skip CQP ring during a reset
MAINTAINERS: Update Broadcom RDMA maintainers
RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure
IB/cma: Do not send IGMP leaves for sendonly Multicast groups
IB/qib: Fix clang confusion of NULL pointer comparison
Namjae Jeon [Wed, 29 Sep 2021 10:52:51 +0000 (19:52 +0900)]
ksmbd: fix transform header validation
Validate that the transform and smb request headers are present
before checking OriginalMessageSize and SessionId fields.
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Hyunchul Lee [Fri, 24 Sep 2021 13:22:22 +0000 (22:22 +0900)]
ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
Add buffer validation for SMB2_CREATE_CONTEXT.
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Wed, 29 Sep 2021 06:44:32 +0000 (15:44 +0900)]
ksmbd: add validation in smb2 negotiate
This patch add validation to check request buffer check in smb2
negotiate and fix null pointer deferencing oops in smb3_preauth_hash_rsp()
that found from manual test.
Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Wed, 29 Sep 2021 06:41:48 +0000 (15:41 +0900)]
ksmbd: add request buffer validation in smb2_set_info
Add buffer validation in smb2_set_info, and remove unused variable
in set_file_basic_info. and smb2_set_info infolevel functions take
structure pointer argument.
Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>