Shiraz Saleem [Fri, 25 Jun 2021 16:23:29 +0000 (11:23 -0500)]
RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
Coverity reports a signed 32-bit overflow on "1 << pprm->pble_shift" when
used expression to compute bits_needed that expects 64bit, unsigned.
Fix this by using the 1ULL in the left shift operator and convert mem_size
to u64.
Link: https://lore.kernel.org/r/20210625162329.1654-3-tatyana.e.nikolova@intel.com
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505157 ("Integer handling issues")
Fixes:
915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Shiraz Saleem [Fri, 25 Jun 2021 16:23:28 +0000 (11:23 -0500)]
RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
The contents of user-space req object is used in array indexing in
irdma_handle_q_mem without checking for valid values.
Guard against bad input on each of these req object pages by limiting them
to number of pages that make up the region.
Link: https://lore.kernel.org/r/20210625162329.1654-2-tatyana.e.nikolova@intel.com
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505160 ("TAINTED_SCALAR")
Fixes:
b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Dan Carpenter [Fri, 25 Jun 2021 13:03:06 +0000 (16:03 +0300)]
RDMA/rxe: Missing unlock on error in get_srq_wqe()
This error path needs to unlock before returning.
Fixes:
ec0fa2445c18 ("RDMA/rxe: Fix over copying in get_srq_wqe")
Link: https://lore.kernel.org/r/YNXUCmnPsSkPyhkm@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Majd Dibbiny <majd@nvidia.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Gerd Rausch [Thu, 24 Jun 2021 18:55:31 +0000 (11:55 -0700)]
RDMA/cma: Fix rdma_resolve_route() memory leak
Fix a memory leak when "mda_resolve_route() is called more than once on
the same "rdma_cm_id".
This is possible if cma_query_handler() triggers the
RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and
allows rdma_resolve_route() to be called again.
Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Håkon Bugge [Fri, 25 Jun 2021 12:30:57 +0000 (14:30 +0200)]
RDMA/core/sa_query: Remove unused argument
Fixes:
4c33bd1926cc ("IB/SA: Add support to query OPA path records")
Link: https://lore.kernel.org/r/1624624257-3677-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Håkon Bugge [Tue, 22 Jun 2021 14:13:27 +0000 (16:13 +0200)]
RDMA/cma: Fix incorrect Packet Lifetime calculation
An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers are logarithmic.
If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.
Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).
Fixed by explicitly testing for timeout being zero.
Fixes:
e1ee1e62bec4 ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Håkon Bugge [Tue, 22 Jun 2021 13:39:57 +0000 (15:39 +0200)]
RDMA/cma: Protect RMW with qp_mutex
The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.
Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.
The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.
This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
Fixes:
2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes:
3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Håkon Bugge [Tue, 22 Jun 2021 13:39:56 +0000 (15:39 +0200)]
RDMA/cma: Remove unnecessary INIT->INIT transition
In rdma_create_qp(), a connected QP will be transitioned to the INIT
state.
Afterwards, the QP will be transitioned to the RTR state by the
cma_modify_qp_rtr() function. But this function starts by performing an
ib_modify_qp() to the INIT state again, before another ib_modify_qp() is
performed to transition the QP to the RTR state.
Hence, there is no need to transition the QP to the INIT state in
rdma_create_qp().
Link: https://lore.kernel.org/r/1624369197-24578-2-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Tue, 22 Jun 2021 12:16:03 +0000 (20:16 +0800)]
RDMA/hns: Add window selection field of congestion control
The window selection field is necessary for congestion control of HIP09,
it is got from firmware and then filled into QPC. Some algorithms need it
to decide whether to limit the number of windows.
Fixes:
f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW")
Link: https://lore.kernel.org/r/1624364163-44185-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Ira Weiny [Tue, 22 Jun 2021 06:14:19 +0000 (23:14 -0700)]
RDMA/hfi1: Remove use of kmap()
kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]
The kmap() used in sdma does not need to be global. Use the new
kmap_local_page() which works with PKS and may provide better performance
for this thread local mapping anyway.
[1] https://lore.kernel.org/lkml/
20201009195033.3208459-59-ira.weiny@intel.com/
Link: https://lore.kernel.org/r/20210622061422.2633501-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Ira Weiny [Tue, 22 Jun 2021 16:56:22 +0000 (09:56 -0700)]
RDMA/irdma: Remove use of kmap()
kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]
The kmap() used in the irdma CM driver is thread local. Therefore
kmap_local_page() is sufficient to use and may provide performance
benefits as well. kmap_local_page() will work with device dax and pgmap
protected pages.
Use kmap_local_page() instead of kmap().
[1] https://lore.kernel.org/lkml/
20201009195033.3208459-59-ira.weiny@intel.com/
Link: https://lore.kernel.org/r/20210622165622.2638628-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Colin Ian King [Wed, 23 Jun 2021 18:24:37 +0000 (19:24 +0100)]
RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
The bit field rsvd1 in resp is not being initialized and garbage data is
being copied from the stack back to userspace via the ib_copy_to_udata
call. Fix this by setting the entire struct resp to zero; this will ensure
that further new bit fields in the future will be zero'd too.
Link: https://lore.kernel.org/r/20210623182437.163801-1-colin.king@canonical.com
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes:
879740517dab ("RDMA/bnxt_re: Update ABI to pass wqe-mode to user space")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
[jgg: remove extra zeroing]
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Max Gurtovoy [Mon, 24 May 2021 08:52:15 +0000 (11:52 +0300)]
IB/isert: Align target max I/O size to initiator size
Since the Linux iser initiator default max I/O size set to 512KB and since
there is no handshake procedure for this size in iser protocol, set the
default max IO size of the target to 512KB as well.
For changing the default values, there is a module parameter for both
drivers.
Link: https://lore.kernel.org/r/20210524085215.29005-1-mgurtovoy@nvidia.com
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Weihang Li [Wed, 23 Jun 2021 08:50:01 +0000 (16:50 +0800)]
RDMA/hns: Fix incorrect vlan enable bit in QPC
The QPC_RQ/SQ_VLAN_EN bit in QPC should be enabled, not the QPC mask.
Fixes:
f0cb411aad23 ("RDMA/hns: Use new interface to modify QP context")
Link: https://lore.kernel.org/r/1624438201-11915-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Selvin Xavier [Wed, 23 Jun 2021 08:14:49 +0000 (01:14 -0700)]
MAINTAINERS: Update Broadcom RDMA maintainers
Updating the maintainers file as Devesh decidied to leave Broadcom.
Link: https://lore.kernel.org/r/1624436089-28263-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Kamal Heib [Sun, 20 Jun 2021 20:15:03 +0000 (23:15 +0300)]
RDMA/irdma: Use the queried port attributes
Instead of hard code the gid_table_len value, use the value from the
ib_query_port() attributes.
Fixes:
b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210620201503.67055-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:43 +0000 (23:57 -0500)]
RDMA/rxe: Fix redundant skb_put_zero
rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space for
the payload and zero it out. All these bytes are then re-written with RoCE
headers and payload. Remove this useless extra copy.
Fixes:
ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
Link: https://lore.kernel.org/r/20210618045742.204195-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:42 +0000 (23:57 -0500)]
RDMA/rxe: Fix extra copy in prepare_ack_packet
Currently prepare_ack_packet writes almost all the fields of the BTH in
the ack packet twice. Replace code with the subroutine init_bth().
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:41 +0000 (23:57 -0500)]
RDMA/rxe: Fix over copying in get_srq_wqe
Currently get_srq_wqe() in rxe_resp.c copies the maximum possible number
of bytes from the wqe into the QPs copy of the SRQ wqe. This is usually
extra work and risks reading past the end of the SRQ circular buffer if
the SRQ is configured with less than the maximum possible number of SGEs.
Check the number of SGEs is not too large.
Compute the actual number of bytes in the WR and copy only those.
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:40 +0000 (23:57 -0500)]
RDMA/rxe: Fix extra copies in build_rdma_network_hdr
build_rdma_network_hdr() in rxe_resp.c does more copying than is
needed. Remove this subroutine and eliminate the extra copies for IPV6 and
reduce the extra copying for IPV4.
Fixes:
e404f945a610 ("IB/rxe: improved debug prints & code cleanup")
Link: https://lore.kernel.org/r/20210618045742.204195-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:39 +0000 (23:57 -0500)]
RDMA/rxe: Fix redundant call to ip_send_check
For IPV4 packets sent on the wire the rxe driver calls ip_local_out()
which immediately calls __ip_local_out() which sets iph->tot_len and calls
ip_send_check(). This code is duplicated in prepare4(). On the loopback
path the IP header checksum and tot_len fields are not used so they do not
need to be set.
Remove this redundant code.
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Bob Pearson [Fri, 18 Jun 2021 04:57:38 +0000 (23:57 -0500)]
RDMA/rxe: Fix useless copy in send_atomic_ack
In send_atomic_ack() in rxe_resp.c there is code copying ack_pkt into the
skb->cb[]. This doesn't do anything useful because the cb[] is not used in
the transmit path by the rxe driver.
Remove this code.
Fixes:
4c93496f18ce ("IB/rxe: do not copy extra stack memory to skb")
Link: https://lore.kernel.org/r/20210618045742.204195-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Tue, 22 Jun 2021 11:53:56 +0000 (19:53 +0800)]
RDMA/hns: Add vendor_err info to error WC
ULP can get more error information of CQ through verbs instead of prints.
Link: https://lore.kernel.org/r/1624362836-11631-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Shiraz Saleem [Tue, 22 Jun 2021 17:52:31 +0000 (12:52 -0500)]
RDMA/irdma: Check return value from ib_umem_find_best_pgsz
iwmr->page_size stores the return from ib_umem_find_best_pgsz and maybe
zero when used in ib_umem_num_dma_blocks thus causing a divide by zero
error.
Fix this by erroring out of irdma_reg_user when 0 is returned from
ib_umem_find_best_pgsz.
Link: https://lore.kernel.org/r/20210622175232.439-3-tatyana.e.nikolova@intel.com
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505149 ("Integer handling issues")
Fixes:
b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Fri, 18 Jun 2021 10:10:20 +0000 (18:10 +0800)]
RDMA/hns: Fix spelling mistakes of original
'orignal' should be 'original'.
Fixes:
9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1624011020-16992-11-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Fri, 18 Jun 2021 10:10:19 +0000 (18:10 +0800)]
RDMA/hns: Simplify the judgment in hns_roce_v2_post_send()
The QP type has been checked in check_send_valid(), if it's not RC, it
will process the UD/GSI branch.
Link: https://lore.kernel.org/r/1624011020-16992-10-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Wenpeng Liang [Fri, 18 Jun 2021 10:10:18 +0000 (18:10 +0800)]
RDMA/hns: Encapsulate flushing CQE as a function
The process of flushing CQE can be encapsultated into a function, which
can reduce duplicate code.
Link: https://lore.kernel.org/r/1624011020-16992-9-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Fri, 18 Jun 2021 10:10:17 +0000 (18:10 +0800)]
RDMA/hns: Modify function return value type
hns_roce_init_qp_table() will only return 0, because this function does
not need to return a value, so it is modified to void type.
Link: https://lore.kernel.org/r/1624011020-16992-8-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xi Wang [Fri, 18 Jun 2021 10:10:16 +0000 (18:10 +0800)]
RDMA/hns: Clean definitions of EQC structure
Remove unused members in EQ context structure.
Fixes:
782832f25404 ("RDMA/hns: Simplify the function config_eqc()")
Link: https://lore.kernel.org/r/1624011020-16992-7-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Fri, 18 Jun 2021 10:10:15 +0000 (18:10 +0800)]
RDMA/hns: Delete unnecessary branch of hns_roce_v2_query_qp
When query_qp is called by userspace, max_send_wr and max_send_sge are set
to 0 by the kernel driver. However, the userspace does not use these two
return values from the kernel driver, but uses its own calculated values.
So there is no need for special treatment.
Fixes:
926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1624011020-16992-6-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Fri, 18 Jun 2021 10:10:14 +0000 (18:10 +0800)]
RDMA/hns: Add member assignments for qp_init_attr
Some kernel ULPs need to use the return value of qp_init_attr, so add
member assignments for qp_init_attr.
Fixes:
926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1624011020-16992-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Fri, 18 Jun 2021 10:10:13 +0000 (18:10 +0800)]
RDMA/hns: Fix some print issues
Remove redundant print and fix a character type mismatch.
Fixes:
0e0ab04b5bbe ("RDMA/hns: Refactor the MTR creation flow")
Link: https://lore.kernel.org/r/1624011020-16992-4-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Fri, 18 Jun 2021 10:10:12 +0000 (18:10 +0800)]
RDMA/hns: Fix uninitialized variable
A random value will be returned if the condition below is not met, so it
needs to be initialized.
Fixes:
9ea9a53ea93b ("RDMA/hns: Add mapped page count checking for MTR")
Link: https://lore.kernel.org/r/1624011020-16992-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Fri, 18 Jun 2021 10:10:11 +0000 (18:10 +0800)]
RDMA/hns: Force rewrite inline flag of WQE
When a non-inline WR reuses a WQE that was used for inline last time, the
remaining inline flag should be cleared.
Fixes:
62490fd5a865 ("RDMA/hns: Avoid unnecessary memset on WQEs in post_send")
Link: https://lore.kernel.org/r/1624011020-16992-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Tue, 22 Jun 2021 18:08:39 +0000 (15:08 -0300)]
Merge branch 'mlx5_realtime_ts' into rdma.git for-next
Aharon Landau says:
====================
In case device supports only real-time timestamp, the kernel will fail to
create QP despite rdma-core requested such timestamp type.
It is because device returns free-running timestamp, and the conversion
from free-running to real-time is performed in the user space.
This series fixes it, by returning real-time timestamp.
====================
* mlx5_realtime_ts:
RDMA/mlx5: Support real-time timestamp directly from the device
RDMA/mlx5: Refactor get_ts_format functions to simplify code
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Tue, 22 Jun 2021 17:42:52 +0000 (14:42 -0300)]
Merge tag 'v5.13-rc7' into rdma.git for-next
Linux 5.13-rc7
Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Wed, 16 Jun 2021 07:57:39 +0000 (10:57 +0300)]
RDMA/mlx5: Support real-time timestamp directly from the device
Currently, if the user asks for a real-time timestamp, the device will
return a free-running one, and the timestamp will be translated to
real-time in the user-space.
When the device supports only real-time timestamp and not free-running,
the creation of the QP will fail even though the user needs supported the
real-time one. To prevent this, we will return the real-time timestamp
directly from the device.
Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Kees Cook [Wed, 16 Jun 2021 20:26:15 +0000 (13:26 -0700)]
RDMA/core: Use flexible array for mad data
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
Without a flexible array, this looks like an attempt to perform a memcpy()
read beyond the end of the packet->mad.data array:
drivers/infiniband/core/user_mad.c:
memcpy(packet->msg->mad, packet->mad.data, IB_MGMT_MAD_HDR);
Switch from [0] to [] to use the appropriately handled type for trailing
bytes.
Link: https://lore.kernel.org/r/20210616202615.1247242-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Aharon Landau [Wed, 16 Jun 2021 07:57:38 +0000 (10:57 +0300)]
RDMA/mlx5: Refactor get_ts_format functions to simplify code
QPC, SQC and RQC timestamp formats and capabilities are always equal
because they represent general hardware support. So instead of code
duplication, let's merge them into general enum and logic.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Xiao Yang [Mon, 21 Jun 2021 07:14:56 +0000 (15:14 +0800)]
RDMA/rxe: Don't overwrite errno from ib_umem_get()
rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get()
fails so it's hard for user to know which actual error happens in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.
Return actual error as mlx4/mlx5 does.
Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Kees Cook [Wed, 16 Jun 2021 20:37:44 +0000 (13:37 -0700)]
IB/mlx4: Avoid field-overflowing memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring array fields.
Use the ether_addr_copy() helper instead, as already done for smac.
Link: https://lore.kernel.org/r/20210616203744.1248551-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Mon, 21 Jun 2021 05:53:40 +0000 (07:53 +0200)]
rnbd/rtrs-clt: Query and use max_segments from rtrs-clt.
With fast memory registration on write request, rnbd-clt
can do bigger IO without split. rnbd-clt now can query
rtrs-clt to get the max_segments, instead of using
BMAX_SEGMENTS.
BMAX_SEGMENTS is not longer needed, so remove it.
Link: https://lore.kernel.org/r/20210621055340.11789-6-jinpu.wang@ionos.com
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Mon, 21 Jun 2021 05:53:39 +0000 (07:53 +0200)]
RDMA/rtrs-clt: Raise MAX_SEGMENTS
As we can do fast memory registration on write, we can increase
the max_segments, default to 512K.
Link: https://lore.kernel.org/r/20210621055340.11789-5-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Mon, 21 Jun 2021 05:53:38 +0000 (07:53 +0200)]
RDMA/rtrs_clt: Alloc less memory with write path fast memory registration
With write path fast memory registration, we need less memory for
each request.
With fast memory registration, we can reduce max_send_sge to save
memory usage.
Also convert the kmalloc_array to kcalloc.
Link: https://lore.kernel.org/r/20210621055340.11789-4-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Mon, 21 Jun 2021 05:53:37 +0000 (07:53 +0200)]
RDMA/rtrs-clt: Write path fast memory registration
With fast memory registration in write path, we can reduce
the memory consumption by using less max_send_sge, support IO bigger
than 116 KB (29 segments * 4 KB) without splitting, and it also
make the IO path more symmetric.
To avoid some times MR reg failed, waiting for the invalidation to finish
before the new mr reg. Introduce a refcount, only finish the request
when both local invalidation and io reply are there.
Link: https://lore.kernel.org/r/20210621055340.11789-3-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jack Wang [Mon, 21 Jun 2021 05:53:36 +0000 (07:53 +0200)]
RDMA/rtrs: Introduce head/tail wr
Introduce tail wr, we can send as the last wr, we want to send the local
invalidate wr after rdma wr in later patch.
While at it, also fix coding style issue.
Link: https://lore.kernel.org/r/20210621055340.11789-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Devesh Sharma [Wed, 16 Jun 2021 20:28:17 +0000 (01:58 +0530)]
RDMA/bnxt_re: Update ABI to pass wqe-mode to user space
Changing ucontext ABI response structure to pass wqe_mode to user library.
A flag in comp_mask has been set to indicate presence of wqe_mode.
Moved wqe-mode ABI to uapi/rdma/bnxt_re-abi.h
Link: https://lore.kernel.org/r/20210616202817.1185276-1-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Anand Khoje [Wed, 16 Jun 2021 15:45:08 +0000 (21:15 +0530)]
IB/core: Shuffle locks in ib_port_data to save memory
pahole shows two 4-byte holes in struct ib_port_data after pkey_list_lock
and netdev_lock respectively.
Shuffling the netdev_lock to be after pkey_list_lock, this shaves off
eight bytes from the struct.
Link: https://lore.kernel.org/r/20210616154509.1047-3-anand.a.khoje@oracle.com
Suggested-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Anand Khoje [Wed, 16 Jun 2021 15:45:07 +0000 (21:15 +0530)]
IB/core: Removed port validity check from ib_get_cached_subnet_prefix
Removed port validity check from ib_get_cached_subnet_prefix() as this
check is not needed because "port_num" is valid.
Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com
Suggested-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky [Wed, 16 Jun 2021 06:09:47 +0000 (09:09 +0300)]
RDMA: Fix kernel-doc warnings about wrong comment
Compilation with W=1 produces warnings similar to the below.
drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment
starts with '/**', but isn't a kernel-doc comment. Refer
Documentation/doc-guide/kernel-doc.rst
All such occurrences were found with the following one line
git grep -A 1 "\/\*\*" drivers/infiniband/
Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:14 +0000 (19:50 +0800)]
RDMA/hns: Use IDA interface to manage xrcd index
Switch xrcd index allocation and release from hns own bitmap interface
to IDA interface.
Link: https://lore.kernel.org/r/1623325814-55737-7-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:13 +0000 (19:50 +0800)]
RDMA/hns: Use IDA interface to manage pd index
Switch pd index allocation and release from hns own bitmap interface
to IDA interface.
Link: https://lore.kernel.org/r/1623325814-55737-6-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:12 +0000 (19:50 +0800)]
RDMA/hns: Use IDA interface to manage mtpt index
Switch mtpt index allocation and release from hns own bitmap interface
to IDA interface.
Link: https://lore.kernel.org/r/1623325814-55737-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:11 +0000 (19:50 +0800)]
RDMA/hns: Remove unused RR mechanism
Round-robin (RR) is no longer used in the allocation of the bitmap table,
and all the function input parameters that use this mechanism are
BITMAP_NO_RR. The code that defines and uses the RR needs to be deleted.
Link: https://lore.kernel.org/r/1623325814-55737-4-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:10 +0000 (19:50 +0800)]
RDMA/hns: Remove the unused hns_roce_bitmap_free_range function
hns_roce_bitmap_free_range() is only called inside hns_roce_bitmap_free(),
and the input parameter "cnt" is set to a constant 1. In addition, the
driver does not use alloc_range scenarios, so free_range does not need to
exist.
Link: https://lore.kernel.org/r/1623325814-55737-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yangyang Li [Thu, 10 Jun 2021 11:50:09 +0000 (19:50 +0800)]
RDMA/hns: Remove the unused hns_roce_bitmap_alloc_range function
The function is no longer used.
Link: https://lore.kernel.org/r/1623325814-55737-2-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Wenpeng Liang [Thu, 10 Jun 2021 11:40:32 +0000 (19:40 +0800)]
RDMA/core: Fix incorrect print format specifier
There are some '%u' for 'int' and '%d' for 'unsigend int', they should be
fixed.
Link: https://lore.kernel.org/r/1623325232-30900-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Xi Wang [Mon, 21 Jun 2021 08:00:43 +0000 (16:00 +0800)]
RDMA/hns: Clean SRQC structure definition
Remove unused members in srq context structure.
Link: https://lore.kernel.org/r/1624262443-24528-10-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Mon, 21 Jun 2021 08:00:42 +0000 (16:00 +0800)]
RDMA/hns: Use new interface to write DB related fields
Use hr_write_reg() instead of roce_set_field().
Link: https://lore.kernel.org/r/1624262443-24528-9-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Mon, 21 Jun 2021 08:00:41 +0000 (16:00 +0800)]
RDMA/hns: Use new interface to write FRMR fields
Use "hr_reg_write" to replace "roce_set_filed".
Link: https://lore.kernel.org/r/1624262443-24528-8-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Mon, 21 Jun 2021 08:00:40 +0000 (16:00 +0800)]
RDMA/hns: Use new interface to get CQE fields
WQE_INDEX and OPCODE and QPN of CQE use redundant masks. Just remove them.
Link: https://lore.kernel.org/r/1624262443-24528-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Mon, 21 Jun 2021 08:00:39 +0000 (16:00 +0800)]
RDMA/hns: Use new interface to modify QP context
Fill all QPC fileds with hr_reg_*() instead of roce_set_*(). SQPN is used
for HIP08 ES only, it should be removed.
Link: https://lore.kernel.org/r/1624262443-24528-6-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Yixing Liu [Mon, 21 Jun 2021 08:00:38 +0000 (16:00 +0800)]
RDMA/hns: Use new interface to write CQ context.
Use hr_reg_*() to write CQ context, it's simpler than roce_set_*().
Link: https://lore.kernel.org/r/1624262443-24528-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Lang Cheng [Mon, 21 Jun 2021 08:00:37 +0000 (16:00 +0800)]
RDMA/hns: Add hr_reg_write_bool()
In order to avoid to do bitwise operations on a boolean value, add a new
register interface to avoid sparse comlaint about "dubious: x & !y" when
calling hr_reg_write(ctx, field, !!val).
Fixes:
dc504774408b ("RDMA/hns: Use new interface to set MPT related fields")
Fixes:
495c24808ce7 ("RDMA/hns: Add XRC subtype in QPC and XRC type in SRQC")
Link: https://lore.kernel.org/r/1624262443-24528-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Weihang Li [Mon, 21 Jun 2021 08:00:36 +0000 (16:00 +0800)]
RDMA/hns: Add a check to ensure integer mtu is positive
GCC may reports an running time assert error when a value calculated from
ib_mtu_enum_to_int() is using as 'val' in FIELD_PREDP:
include/linux/compiler_types.h:328:38: error: call to
'__compiletime_assert_1524' declared with attribute error: FIELD_PREP:
value too large for the field
So a check is added about whether integer mtu from ib_mtu_enum_to_int() is
negative to avoid this warning.
Link: https://lore.kernel.org/r/1624262443-24528-3-git-send-email-liweihang@huawei.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Weihang Li [Mon, 21 Jun 2021 08:00:35 +0000 (16:00 +0800)]
RDMA/hns: Do not use !! for values that are already bool when calling hr_reg_write()
There is no need to use "!!" before "eq->eqe_size ==
HNS_ROCE_V3_EQE_SIZE", or sparse will complain about "dubious: x & !y".
Fixes:
782832f25404 ("RDMA/hns: Simplify the function config_eqc()")
Link: https://lore.kernel.org/r/1624262443-24528-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Avihai Horon [Wed, 9 Jun 2021 11:05:03 +0000 (14:05 +0300)]
RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs
Relaxed Ordering is a capability that can only benefit users that support
it. All kernel ULPs should support Relaxed Ordering, as they are designed
to read data only after observing the CQE and use the DMA API correctly.
Hence, implicitly enable Relaxed Ordering by default for MR transfers in
kernel ULPs.
Link: https://lore.kernel.org/r/b7e820aab7402b8efa63605f4ea465831b3b1e5e.1623236426.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Sun, 20 Jun 2021 22:03:15 +0000 (15:03 -0700)]
Linux 5.13-rc7
Linus Torvalds [Sun, 20 Jun 2021 16:44:52 +0000 (09:44 -0700)]
Merge tag 'sched_urgent_for_v5.13_rc6' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"A single fix to restore fairness between control groups with equal
priority"
* tag 'sched_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Correctly insert cfs_rq's to list on unthrottle
Linus Torvalds [Sun, 20 Jun 2021 16:38:14 +0000 (09:38 -0700)]
Merge tag 'irq_urgent_for_v5.13_rc6' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
"A single fix for GICv3 to not take an interrupt in an NMI context"
* tag 'irq_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Workaround inconsistent PMR setting on NMI entry
Linus Torvalds [Sun, 20 Jun 2021 16:09:58 +0000 (09:09 -0700)]
Merge tag 'x86_urgent_for_v5.13_rc6' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A first set of urgent fixes to the FPU/XSTATE handling mess^W code.
(There's a lot more in the pipe):
- Prevent corruption of the XSTATE buffer in signal handling by
validating what is being copied from userspace first.
- Invalidate other task's preserved FPU registers on XRSTOR failure
(#PF) because latter can still modify some of them.
- Restore the proper PKRU value in case userspace modified it
- Reset FPU state when signal restoring fails
Other:
- Map EFI boot services data memory as encrypted in a SEV guest so
that the guest can access it and actually boot properly
- Two SGX correctness fixes: proper resources freeing and a NUMA fix"
* tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Avoid truncating memblocks for SGX memory
x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
x86/fpu: Reset state for all signal restore failures
x86/pkru: Write hardware init value to PKRU when xstate is init
x86/process: Check PF_KTHREAD and not current->mm for kernel threads
x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
x86/fpu: Prevent state corruption in __fpu__restore_sig()
x86/ioremap: Map EFI-reserved memory as encrypted for SEV
Linus Torvalds [Sat, 19 Jun 2021 23:50:23 +0000 (16:50 -0700)]
Merge tag 'powerpc-5.13-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix initrd corruption caused by our recent change to use relative jump
labels.
Fix a crash using perf record on systems without a hardware PMU
backend.
Rework our 64-bit signal handling slighty to make it more closely
match the old behaviour, after the recent change to use unsafe user
accessors.
Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
Axtens, Greg Kurz, and Roman Bolshakov"
* tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
powerpc: Fix initrd corruption with relative jump labels
powerpc/signal64: Copy siginfo before changing regs->nip
powerpc/mem: Add back missing header to fix 'no previous prototype' error
Linus Torvalds [Sat, 19 Jun 2021 21:50:43 +0000 (14:50 -0700)]
Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix refcount usage when processing PERF_RECORD_KSYMBOL.
- 'perf stat' metric group fixes.
- Fix 'perf test' non-bash issue with stat bpf counters.
- Update unistd, in.h and socket.h with the kernel sources, silencing
perf build warnings.
* tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
tools headers UAPI: Sync linux/in.h copy with the kernel sources
tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
perf beauty: Update copy of linux/socket.h with the kernel sources
perf test: Fix non-bash issue with stat bpf counters
perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
perf metricgroup: Fix find_evsel_group() event selector
Linus Torvalds [Sat, 19 Jun 2021 15:45:34 +0000 (08:45 -0700)]
Merge tag 'riscv-for-linus-5.13-rc7' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A build fix to always build modules with the 'medany' code model, as
the module loader doesn't support 'medlow'.
- A Kconfig warning fix for the SiFive errata.
- A pair of fixes that for regressions to the recent memory layout
changes.
- A fix for the FU740 device tree.
* tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: fu740: fix cache-controller interrupts
riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
riscv: sifive: fix Kconfig errata warning
riscv32: Use medany C model for modules
Linus Torvalds [Sat, 19 Jun 2021 15:39:13 +0000 (08:39 -0700)]
Merge tag 's390-5.13-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix zcrypt ioctl hang due to AP queue msg counter dropping below 0
when pending requests are purged.
- Two fixes for the machine check handler in the entry code.
* tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ap: Fix hanging ioctl caused by wrong msg counter
s390/mcck: fix invalid KVM guest condition check
s390/mcck: fix calculation of SIE critical section size
Arnaldo Carvalho de Melo [Sat, 19 Jun 2021 13:15:22 +0000 (10:15 -0300)]
tools headers UAPI: Sync linux/in.h copy with the kernel sources
To pick the changes in:
321827477360934d ("icmp: don't send out ICMP messages with a source address of 0.0.0.0")
That don't result in any change in tooling, as INADDR_ are not used to
generate id->string tables used by 'perf trace'.
This addresses this build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
Cc: David S. Miller <davem@davemloft.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 19 Jun 2021 13:11:46 +0000 (10:11 -0300)]
tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
To pick the changes in:
8b1462b67f23da54 ("quota: finish disable quotactl_path syscall")
Those headers are used in some arches to generate the syscall table used
in 'perf trace' to translate syscall numbers into strings.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Cc: Jan Kara <jack@suse.cz>
Cc: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 19 Jun 2021 13:09:08 +0000 (10:09 -0300)]
perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in:
ea6932d70e223e02 ("net: make get_net_ns return error if NET_NS is disabled")
That don't result in any changes in the tables generated from that
header.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Cc: Changbin Du <changbin.du@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 17 Jun 2021 18:42:13 +0000 (11:42 -0700)]
perf test: Fix non-bash issue with stat bpf counters
$(( .. )) is a bash feature but the test's interpreter is !/bin/sh,
switch the code to use expr.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210617184216.2075588-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Riccardo Mancini [Sat, 12 Jun 2021 17:37:48 +0000 (19:37 +0200)]
perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
ASan reported a memory leak of BPF-related ksymbols map and dso. The
leak is caused by refount never reaching 0, due to missing __put calls
in the function machine__process_ksymbol_register.
Once the dso is inserted in the map, dso__put() should be called
(map__new2() increases the refcount to 2).
The same thing applies for the map when it's inserted into maps
(maps__insert() increases the refcount to 2).
$ sudo ./perf record -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]
=================================================================
==297735==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 6992 byte(s) in 19 object(s) allocated from:
#0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
#1 0x8e4e53 in map__new2 /home/user/linux/tools/perf/util/map.c:216:20
#2 0x8cf68c in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:778:10
[...]
Indirect leak of 8702 byte(s) in 19 object(s) allocated from:
#0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
#1 0x8728d7 in dso__new_id /home/user/linux/tools/perf/util/dso.c:1256:20
#2 0x872015 in dso__new /home/user/linux/tools/perf/util/dso.c:1295:9
#3 0x8cf623 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:774:21
[...]
Indirect leak of 1520 byte(s) in 19 object(s) allocated from:
#0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
#1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
#2 0x888954 in map__process_kallsym_symbol /home/user/linux/tools/perf/util/symbol.c:710:8
[...]
Indirect leak of 1406 byte(s) in 19 object(s) allocated from:
#0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
#1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
#2 0x8cfbd8 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:803:8
[...]
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
Link: http://lore.kernel.org/lkml/20210612173751.188582-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
John Garry [Thu, 10 Jun 2021 14:33:00 +0000 (22:33 +0800)]
perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
The error code is not set at all in the sys event iter function.
This may lead to an uninitialized value of "ret" in
metricgroup__add_metric() when no CPU metric is added.
Fix by properly setting the error code.
It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
if we have no CPU or sys event metric matching, then "has_match" should
be 0 and "ret" is set to -EINVAL.
However gcc cannot detect that it may not have been set after the
map_for_each_metric() loop for CPU metrics, which is strange.
Fixes:
be335ec28efa8 ("perf metricgroup: Support adding metrics for system PMUs")
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1623335580-187317-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
John Garry [Thu, 10 Jun 2021 14:32:59 +0000 (22:32 +0800)]
perf metricgroup: Fix find_evsel_group() event selector
The following command segfaults on my x86 broadwell:
$ ./perf stat -M frontend_bound,retiring,backend_bound,bad_speculation sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { raw 0x10e }
anon group { raw 0x10e }
perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' failed.
Aborted (core dumped)
The issue shows itself as a use-after-free in evlist__check_cpu_maps(),
whereby the leader of an event selector (evsel) has been deleted (yet we
still attempt to verify for an evsel).
Fundamentally the problem comes from metricgroup__setup_events() ->
find_evsel_group(), and has developed from the previous fix attempt in
commit
9c880c24cb0d ("perf metricgroup: Fix for metrics containing
duration_time").
The problem now is that the logic in checking if an evsel is in the same
group is subtly broken for the "cycles" event. For the "cycles" event,
the pmu_name is NULL; however the logic in find_evsel_group() may set an
event matched against "cycles" as used, when it should not be.
This leads to a condition where an evsel is set, yet its leader is not.
Fix the check for evsel pmu_name by not matching evsels when either has a
NULL pmu_name.
There is still a pre-existing metric issue whereby the ordering of the
metrics may break the 'stat' function, as discussed at:
https://lore.kernel.org/lkml/
49c6fccb-b716-1bf0-18a6-
cace1cdb66b9@huawei.com/
Fixes:
9c880c24cb0d ("perf metricgroup: Fix for metrics containing duration_time")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> # On a Thinkpad T450S
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1623335580-187317-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Abdurachmanov [Sun, 13 Jun 2021 00:43:57 +0000 (17:43 -0700)]
riscv: dts: fu740: fix cache-controller interrupts
The order of interrupt numbers is incorrect.
The order for FU740 is: DirError, DataError, DataFail, DirFail
From SiFive FU740-C000 Manual:
19 - L2 Cache DirError
20 - L2 Cache DirFail
21 - L2 Cache DataError
22 - L2 Cache DataFail
Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Fri, 18 Jun 2021 14:09:13 +0000 (22:09 +0800)]
riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
Andreas reported commit
fc8504765ec5 ("riscv: bpf: Avoid breaking W^X")
breaks booting with one kind of defconfig, I reproduced a kernel panic
with the defconfig:
[ 0.138553] Unable to handle kernel paging request at virtual address
ffffffff81201220
[ 0.139159] Oops [#1]
[ 0.139303] Modules linked in:
[ 0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
[ 0.139934] Hardware name: riscv-virtio,qemu (DT)
[ 0.140193] epc : __memset+0xc4/0xfc
[ 0.140416] ra : skb_flow_dissector_init+0x1e/0x82
[ 0.140609] epc :
ffffffff8029806c ra :
ffffffff8033be78 sp :
ffffffe001647da0
[ 0.140878] gp :
ffffffff81134b08 tp :
ffffffe001654380 t0 :
ffffffff81201158
[ 0.141156] t1 :
0000000000000002 t2 :
0000000000000154 s0 :
ffffffe001647dd0
[ 0.141424] s1 :
ffffffff80a43250 a0 :
ffffffff81201220 a1 :
0000000000000000
[ 0.141654] a2 :
000000000000003c a3 :
ffffffff81201258 a4 :
0000000000000064
[ 0.141893] a5 :
ffffffff8029806c a6 :
0000000000000040 a7 :
ffffffffffffffff
[ 0.142126] s2 :
ffffffff81201220 s3 :
0000000000000009 s4 :
ffffffff81135088
[ 0.142353] s5 :
ffffffff81135038 s6 :
ffffffff8080ce80 s7 :
ffffffff80800438
[ 0.142584] s8 :
ffffffff80bc6578 s9 :
0000000000000008 s10:
ffffffff806000ac
[ 0.142810] s11:
0000000000000000 t3 :
fffffffffffffffc t4 :
0000000000000000
[ 0.143042] t5 :
0000000000000155 t6 :
00000000000003ff
[ 0.143220] status:
0000000000000120 badaddr:
ffffffff81201220 cause:
000000000000000f
[ 0.143560] [<
ffffffff8029806c>] __memset+0xc4/0xfc
[ 0.143859] [<
ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
[ 0.144092] [<
ffffffff800010fc>] do_one_initcall+0x3e/0x168
[ 0.144278] [<
ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
[ 0.144479] [<
ffffffff804868a8>] kernel_init+0x12/0x110
[ 0.144658] [<
ffffffff800022de>] ret_from_exception+0x0/0xc
[ 0.145124] ---[ end trace
f1e9643daa46d591 ]---
After some investigation, I think I found the root cause: commit
2bfc6cd81bd ("move kernel mapping outside of linear mapping") moves
BPF JIT region after the kernel:
| #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end)
The &_end is unlikely aligned with PMD size, so the front bpf jit
region sits with part of kernel .data section in one PMD size mapping.
But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is
called to make the first bpf jit prog ROX, we will make part of kernel
.data section RO too, so when we write to, for example memset the
.data section, MMU will trigger a store page fault.
To fix the issue, we need to ensure the BPF JIT region is PMD size
aligned. This patch acchieve this goal by restoring the BPF JIT region
to original position, I.E the 128MB before kernel .text section. The
modification to kasan_init.c is inspired by Alexandre.
Fixes:
fc8504765ec5 ("riscv: bpf: Avoid breaking W^X")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Fri, 18 Jun 2021 14:01:36 +0000 (22:01 +0800)]
riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
commit
2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear
mapping") makes use of MODULES_VADDR to populate kernel, BPF, modules
mapping. Currently, MODULES_VADDR is defined as below for RV64:
| #define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
But kasan_init() has two local variables which are also named as _start,
_end, so MODULES_VADDR is evaluated with the local variable _end
rather than the global "_end" as we expected. Fix this issue by
renaming the two local variables.
Fixes:
2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Sat, 19 Jun 2021 01:55:29 +0000 (18:55 -0700)]
Merge tag 'net-5.13-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
queue should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking (staging:
rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets are
validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing
wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel
egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming"
* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
net: ethernet: fix potential use-after-free in ec_bhf_remove
selftests/net: Add icmp.sh for testing ICMP dummy address responses
icmp: don't send out ICMP messages with a source address of 0.0.0.0
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
net: ll_temac: Fix TX BD buffer overwrite
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Make sure to free skb when it is completely used
MAINTAINERS: add Guvenc as SMC maintainer
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_en: Fix TQM fastpath ring backing store computation
bnxt_en: Rediscover PHY capabilities after firmware reset
cxgb4: fix wrong shift.
mac80211: handle various extensible elements correctly
mac80211: reset profile_periodicity/ema_ap
cfg80211: avoid double free of PMSR request
cfg80211: make certificate generation more robust
mac80211: minstrel_ht: fix sample time check
net: qed: Fix memcpy() overflow of qed_dcbx_params()
net: cdc_eem: fix tx fixup skb leak
net: hamradio: fix memory leak in mkiss_close
...
Linus Torvalds [Fri, 18 Jun 2021 23:39:03 +0000 (16:39 -0700)]
Merge tag 'for-5.13-rc6-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix, for a space accounting bug in zoned mode. It happens
when a block group is switched back rw->ro and unusable bytes (due to
zoned constraints) are subtracted twice.
It has user visible effects so I consider it important enough for late
-rc inclusion and backport to stable"
* tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix negative space_info->bytes_readonly
Linus Torvalds [Fri, 18 Jun 2021 20:54:11 +0000 (13:54 -0700)]
Merge tag 'pci-v5.13-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Clear 64-bit flag for host bridge windows below 4GB to fix a resource
allocation regression added in -rc1 (Punit Agrawal)
- Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter)
- Avoid secondary bus resets on TI KeyStone C667X devices (Antti
Järvinen)
- Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni)
- Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun)
- Avoid broken ATS on AMD Navi14 GPU (Evan Quan)
- Trust Broadcom BCM57414 NIC to isolate functions even though it
doesn't advertise ACS support (Sriharsha Basavapatna)
- Work around AMD RS690 BIOSes that don't configure DMA above 4GB
(Mikel Rychliski)
- Fix panic during PIO transfer on Aardvark controller (Pali Rohár)
* tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: aardvark: Fix kernel panic during PIO transfer
PCI: Add AMD RS690 quirk to enable 64-bit DMA
PCI: Add ACS quirk for Broadcom BCM57414 NIC
PCI: Mark AMD Navi14 GPU ATS as broken
PCI: Work around Huawei Intelligent NIC VF FLR erratum
PCI: Mark some NVIDIA GPUs to avoid bus reset
PCI: Mark TI C667X to avoid bus reset
PCI: tegra194: Fix MCFG quirk build regressions
PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
Matthew Wilcox (Oracle) [Wed, 16 Jun 2021 21:22:28 +0000 (22:22 +0100)]
afs: Re-enable freezing once a page fault is interrupted
If a task is killed during a page fault, it does not currently call
sb_end_pagefault(), which means that the filesystem cannot be frozen
at any time thereafter. This may be reported by lockdep like this:
====================================
WARNING: fsstress/10757 still has locks held!
5.13.0-rc4-build4+ #91 Not tainted
------------------------------------
1 lock held by fsstress/10757:
#0:
ffff888104eac530
(
sb_pagefaults
as filesystem freezing is modelled as a lock.
Fix this by removing all the direct returns from within the function,
and using 'ret' to indicate whether we were interrupted or successful.
Fixes:
1cf7a1518aef ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Skripkin [Fri, 18 Jun 2021 13:49:02 +0000 (16:49 +0300)]
net: ethernet: fix potential use-after-free in ec_bhf_remove
static void ec_bhf_remove(struct pci_dev *dev)
{
...
struct ec_bhf_priv *priv = netdev_priv(net_dev);
unregister_netdev(net_dev);
free_netdev(net_dev);
pci_iounmap(dev, priv->dma_io);
pci_iounmap(dev, priv->io);
...
}
priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after pci_iounmap()
calls.
Fixes:
6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:22:55 +0000 (12:22 -0700)]
Merge tag 'mac80211-for-net-2021-06-18' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of straggler fixes:
* a minstrel HT sample check fix
* peer measurement could double-free on races
* certificate file generation at build time could
sometimes hang
* some parameters weren't reset between connections
in mac80211
* some extensible elements were treated as non-
extensible, possibly causuing bad connections
(or failures) if the AP adds data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:36 +0000 (13:04 +0200)]
selftests/net: Add icmp.sh for testing ICMP dummy address responses
This adds a new icmp.sh selftest for testing that the kernel will respond
correctly with an ICMP unreachable message with the dummy (192.0.0.8)
source address when there are no IPv4 addresses configured to use as source
addresses.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:35 +0000 (13:04 +0200)]
icmp: don't send out ICMP messages with a source address of 0.0.0.0
When constructing ICMP response messages, the kernel will try to pick a
suitable source address for the outgoing packet. However, if no IPv4
addresses are configured on the system at all, this will fail and we end up
producing an ICMP message with a source address of 0.0.0.0. This can happen
on a box routing IPv4 traffic via v6 nexthops, for instance.
Since 0.0.0.0 is not generally routable on the internet, there's a good
chance that such ICMP messages will never make it back to the sender of the
original packet that the ICMP message was sent in response to. This, in
turn, can create connectivity and PMTUd problems for senders. Fortunately,
RFC7600 reserves a dummy address to be used as a source for ICMP
messages (192.0.0.8/32), so let's teach the kernel to substitute that
address as a last resort if the regular source address selection procedure
fails.
Below is a quick example reproducing this issue with network namespaces:
ip netns add ns0
ip l add type veth peer netns ns0
ip l set dev veth0 up
ip a add 10.0.0.1/24 dev veth0
ip a add fc00:dead:cafe:42::1/64 dev veth0
ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
ip -n ns0 l set dev veth0 up
ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
tcpdump -tpni veth0 -c 2 icmp &
ping -w 1 10.1.0.1 > /dev/null
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
2 packets captured
2 packets received by filter
0 packets dropped by kernel
With this patch the above capture changes to:
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Juliusz Chroboczek <jch@irif.fr>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:38 +0000 (12:52 +0200)]
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:33 +0000 (12:52 +0200)]
net: ll_temac: Fix TX BD buffer overwrite
Just as the initial check, we need to ensure num_frag+1 buffers available,
as that is the number of buffers we are going to use.
This fixes a buffer overflow, which might be seen during heavy network
load. Complete lockup of TEMAC was reproducible within about 10 minutes of
a particular load.
Fixes:
84823ff80f74 ("net: ll_temac: Fix race condition causing TX hang")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:28 +0000 (12:52 +0200)]
net: ll_temac: Add memory-barriers for TX BD access
Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.
In xmit_done, we should ensure that reading the additional BD fields are
only done after STS_CTRL_APP0_CMPLT bit is set.
When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.
Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:23 +0000 (12:52 +0200)]
net: ll_temac: Make sure to free skb when it is completely used
With the skb pointer piggy-backed on the TX BD, we have a simple and
efficient way to free the skb buffer when the frame has been transmitted.
But in order to avoid freeing the skb while there are still fragments from
the skb in use, we need to piggy-back on the TX BD of the skb, not the
first.
Without this, we are doing use-after-free on the DMA side, when the first
BD of a multi TX BD packet is seen as completed in xmit_done, and the
remaining BDs are still being processed.
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 18 Jun 2021 07:00:30 +0000 (09:00 +0200)]
MAINTAINERS: add Guvenc as SMC maintainer
Add Guvenc as maintainer for Shared Memory Communications (SMC)
Sockets.
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:00:27 +0000 (12:00 -0700)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
This patchset includes 3 small bug fixes to reinitialize PHY capabilities
after firmware reset, setup the chip's internal TQM fastpath ring
backing memory properly for RoCE traffic, and to free ethtool related
memory if driver probe fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Fri, 18 Jun 2021 06:07:27 +0000 (02:07 -0400)]
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_ethtool_init() may have allocated some memory and we need to
call bnxt_ethtool_free() to properly unwind if bnxt_init_one()
fails.
Fixes:
7c3809181468 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>