platform/kernel/linux-starfive.git
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:58 +0000 (11:09 +0100)]
bpf: introduce frags support to bpf_prog_test_run_xdp()

Introduce the capability to allocate xdp frags in the
bpf_prog_test_run_xdp routine. This is a preliminary patch for
introducing the selftests for the new xdp frags eBPF helpers.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/b7c0e425a9287f00f601c4fc0de54738ec6ceeea.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:57 +0000 (11:09 +0100)]
bpf: move user_size out of bpf_test_init

Rely on data_size_in in the bpf_test_init routine signature. This is a
preliminary patch for introducing the xdp frags selftest.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/6b48d38ed3d60240d7d6bb15e6fa7fabfac8dfb2.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Eelco Chaudron [Fri, 21 Jan 2022 10:09:56 +0000 (11:09 +0100)]
bpf: add frags support to xdp copy helpers

This patch adds support for frags for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/340b4a99cdc24337b40eaf8bb597f9f9e7b0373e.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Eelco Chaudron [Fri, 21 Jan 2022 10:09:55 +0000 (11:09 +0100)]
bpf: add frags support to the bpf_xdp_adjust_tail() API

This change adds support for tail growing and shrinking for XDP frags.

When called on a non-linear packet with a grow request, it works
on the last fragment of the packet. The maximum grow size is therefore
the last fragment's tailroom, i.e. no new buffer is allocated.
An XDP frags capable driver is expected to set frag_size in the
xdp_rxq_info data structure to notify the XDP core of the fragment size.
A frag_size of 0 is interpreted by the XDP core as tail growing not
being allowed.
Introduce the __xdp_rxq_info_reg utility routine to initialize the
frag_size field.

When shrinking, it works from the last fragment all the way down to
the base buffer, depending on the shrinking size. Note that fragments
emptied by a shrink are freed, so you cannot grow back to the original
size afterwards.
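
The grow and shrink rules above can be pictured with a small user-space
model. This is only a sketch: struct pkt_model and model_adjust_tail() are
hypothetical names, fragment offsets are assumed to be zero, growth of
linear packets is not modelled, and none of this is the kernel code.

```c
#include <assert.h>

#define MODEL_MAX_FRAGS 4

/* Hypothetical stand-in for a multi-buffer XDP packet. */
struct pkt_model {
	unsigned int linear_len;                /* bytes in the base buffer */
	unsigned int nr_frags;                  /* fragments still attached */
	unsigned int frag_len[MODEL_MAX_FRAGS]; /* bytes used per fragment */
	unsigned int frag_size;                 /* buffer size per fragment, 0 = no growing */
};

/* Grow (delta >= 0) or shrink (delta < 0) the tail. Returns 0 or -1. */
int model_adjust_tail(struct pkt_model *p, int delta)
{
	unsigned int remaining, total, i;

	if (delta >= 0) {
		unsigned int last, tailroom;

		/* Growing needs a last fragment and a known fragment size. */
		if (p->nr_frags == 0 || p->frag_size == 0)
			return -1;
		last = p->nr_frags - 1;
		tailroom = p->frag_size - p->frag_len[last];
		if ((unsigned int)delta > tailroom)
			return -1; /* no new buffer is ever allocated */
		p->frag_len[last] += (unsigned int)delta;
		return 0;
	}

	remaining = 0u - (unsigned int)delta;
	total = p->linear_len;
	for (i = 0; i < p->nr_frags; i++)
		total += p->frag_len[i];
	if (remaining >= total)
		return -1; /* refuse to shrink the packet to nothing */

	/* Consume fragments from the end; emptied ones are released. */
	while (remaining > 0 && p->nr_frags > 0) {
		unsigned int last = p->nr_frags - 1;

		if (p->frag_len[last] > remaining) {
			p->frag_len[last] -= remaining;
			return 0;
		}
		remaining -= p->frag_len[last];
		p->frag_len[last] = 0;
		p->nr_frags--;
	}
	/* Whatever is left comes out of the base buffer. */
	p->linear_len -= remaining;
	return 0;
}

void adjust_tail_demo(void)
{
	struct pkt_model p = { 100, 2, { 4096, 1000 }, 4096 };

	assert(model_adjust_tail(&p, 3096) == 0);  /* fills the last fragment */
	assert(model_adjust_tail(&p, 1) == -1);    /* no tailroom left */
	assert(model_adjust_tail(&p, -5000) == 0); /* frees frag 1, trims frag 0 */
	assert(p.nr_frags == 1 && p.frag_len[0] == 3192);
	assert(model_adjust_tail(&p, 904) == 0);   /* only frag 0's tailroom remains */
	assert(model_adjust_tail(&p, 1) == -1);    /* the freed fragment is gone */
}
```

The last two assertions in adjust_tail_demo() show the asymmetry described
above: after a shrink has released a fragment, a later grow is bounded by
the tailroom of whichever fragment is now last.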

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/eabda3485dda4f2f158b477729337327e609461d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:54 +0000 (11:09 +0100)]
bpf: introduce bpf_xdp_get_buff_len helper

Introduce the bpf_xdp_get_buff_len helper, which returns the total size
of the xdp buffer (linear and paged area).
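
As a rough illustration of "linear and paged area", the total can be
modelled as the linear length plus the fragment bytes, where the latter
only count for a non-linear buffer. struct buff_model and
model_get_buff_len() are made-up names for this sketch, not kernel or
BPF-side API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the length-related state of an xdp buffer. */
struct buff_model {
	unsigned int linear_len; /* bytes between data and data_end */
	bool has_frags;          /* stands in for the frags bit */
	unsigned int frags_size; /* bytes held in the paged area */
};

/* Total size = linear area, plus the paged area if there is one. */
unsigned int model_get_buff_len(const struct buff_model *b)
{
	unsigned int len = b->linear_len;

	if (b->has_frags)
		len += b->frags_size;
	return len;
}

void buff_len_demo(void)
{
	struct buff_model linear = { 1500, false, 0 };
	struct buff_model jumbo = { 3000, true, 6000 };

	assert(model_get_buff_len(&linear) == 1500);
	assert(model_get_buff_len(&jumbo) == 9000);
}
```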

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/aac9ac3504c84026cf66a3c71b7c5ae89bc991be.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:53 +0000 (11:09 +0100)]
net: mvneta: enable jumbo frames if the loaded XDP program support frags

Enable the capability to receive jumbo frames even if the interface is
running in XDP mode, provided the loaded program declares that it
properly supports xdp frags. At the same time, reject an xdp program
that does not support xdp frags if the driver is running in xdp frags
mode.
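
Both directions of this rule reduce to one predicate, sketched below.
The function name, the max_linear_mtu parameter and the choice of
-EOPNOTSUPP are assumptions made for illustration; the sketch also
assumes that "xdp frags mode" means the MTU no longer fits in a single
receive buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy check. max_linear_mtu stands in for the largest MTU
 * the driver can receive into one buffer.
 */
int model_xdp_mtu_check(unsigned int mtu, unsigned int max_linear_mtu,
			bool prog_has_frags)
{
	/* Multi-buffer frames are only safe for frags-aware programs. */
	if (mtu > max_linear_mtu && !prog_has_frags)
		return -EOPNOTSUPP;
	return 0;
}

void xdp_mtu_check_demo(void)
{
	assert(model_xdp_mtu_check(1500, 3000, false) == 0);
	assert(model_xdp_mtu_check(9000, 3000, true) == 0);
	assert(model_xdp_mtu_check(9000, 3000, false) == -EOPNOTSUPP);
}
```

The same check would apply whether the MTU is raised while a program is
attached or a program is attached while the MTU is already large.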

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/6909f81a3cbb8fb6b88e914752c26395771b882a.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:52 +0000 (11:09 +0100)]
bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program

Introduce BPF_F_XDP_HAS_FRAGS and the related field in bpf_prog_aux
in order to notify the driver that the loaded program supports xdp frags.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/db2e8075b7032a356003f407d1b0deb99adaa0ed.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:51 +0000 (11:09 +0100)]
net: mvneta: add frags support to XDP_TX

Introduce the capability to map a non-linear xdp buffer when running
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/5d46ab63870ffe96fb95e6075a7ff0c81ef6424d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:50 +0000 (11:09 +0100)]
xdp: add frags support to xdp_return_{buff/frame}

Take into account whether the received xdp_buff/xdp_frame is non-linear
when recycling/returning the frame memory to the allocator or into
xdp_frame_bulk.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/a961069febc868508ce1bdf5e53a343eb4e57cb2.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:49 +0000 (11:09 +0100)]
net: marvell: rely on xdp_update_skb_shared_info utility routine

Rely on the xdp_update_skb_shared_info routine in order to avoid
resetting the frags array in the skb_shared_info structure when building
the skb in mvneta_swbm_build_skb(). The frags array is expected to
be initialized by the receiving driver when building the xdp_buff,
so here we just need to update the memory metadata.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/e0dad97f5d02b13f189f99f1e5bc8e61bef73412.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:48 +0000 (11:09 +0100)]
net: xdp: add xdp_update_skb_shared_info utility routine

Introduce the xdp_update_skb_shared_info routine to update the frags array
metadata in the skb_shared_info data structure when converting an
xdp_buff or xdp_frame to an skb.
Given the current skb_shared_info architecture in xdp_frame/xdp_buff and
the xdp frags support, there is no need to run skb_add_rx_frag() and reset
the frags array when converting the buffer to an skb: the frags array is in
the same position for xdp_buff/xdp_frame and for the skb, so we just need
to update the memory metadata.
Introduce the XDP_FLAGS_PF_MEMALLOC flag in xdp_buff_flags in order to mark
the xdp_buff or xdp_frame as under memory pressure if pages of the frags
array are under memory pressure. This avoids looping over all fragments in
the xdp_update_skb_shared_info routine. The driver is expected to set the
flag when constructing the xdp_buff, using the xdp_buff_set_frag_pfmemalloc
utility routine.
Rely on xdp_update_skb_shared_info in the __xdp_build_skb_from_frame routine
when converting a non-linear xdp_frame to an skb after performing an
XDP_REDIRECT.
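
The point of the paragraph above is that the conversion only touches
counters and never walks the fragments. A simplified user-space model of
such a metadata update might look as follows; struct skb_model and the
function signature are stand-ins chosen for this sketch, not the kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the skb fields that need updating. */
struct skb_model {
	unsigned int len;       /* total packet bytes */
	unsigned int data_len;  /* bytes held in fragments */
	unsigned int truesize;  /* memory accounted to the packet */
	unsigned char nr_frags; /* fragments already sitting in the frags array */
	bool pfmemalloc;        /* packet memory came from emergency reserves */
};

/*
 * Update metadata only. The frags array itself is assumed to be in place
 * already, and pfmemalloc is one precomputed bit rather than a per-page scan.
 */
void model_update_skb_shared_info(struct skb_model *skb,
				  unsigned char nr_frags,
				  unsigned int frags_size,
				  unsigned int frags_truesize,
				  bool pfmemalloc)
{
	skb->nr_frags = nr_frags;
	skb->len += frags_size;
	skb->data_len += frags_size;
	skb->truesize += frags_truesize;
	skb->pfmemalloc = skb->pfmemalloc || pfmemalloc;
}

void update_shared_info_demo(void)
{
	struct skb_model skb = { 128, 0, 512, 0, false };

	model_update_skb_shared_info(&skb, 2, 6000, 8192, false);
	assert(skb.len == 6128 && skb.data_len == 6000);
	assert(skb.truesize == 8704 && skb.nr_frags == 2);
	assert(!skb.pfmemalloc);
}
```

The cost is constant regardless of how many fragments the frame carries,
which is what the pfmemalloc flag is meant to make possible.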

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/bfd23fb8a8d7438724f7819c567cdf99ffd6226f.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:47 +0000 (11:09 +0100)]
net: mvneta: simplify mvneta_swbm_add_rx_fragment management

Relying on the xdp frags bit, remove the skb_shared_info structure
allocated on the stack in the mvneta_rx_swbm routine and simplify
mvneta_swbm_add_rx_fragment by accessing skb_shared_info in the
xdp_buff structure directly. There is no performance penalty in
this approach since mvneta_swbm_add_rx_fragment is run only
for the xdp frags use case.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/45f050c094ccffce49d6bc5112939ed35250ba90.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:46 +0000 (11:09 +0100)]
net: mvneta: update frags bit before passing the xdp buffer to eBPF layer

Update the frags bit (XDP_FLAGS_HAS_FRAGS) in xdp_buff to notify the
XDP/eBPF layer and XDP remote drivers whether this is a "non-linear"
XDP buffer. Access skb_shared_info only if the XDP_FLAGS_HAS_FRAGS flag
is set, in order to avoid possible cache misses.

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c00a73097f8a35860d50dae4a36e6cc9ef7e172f.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:45 +0000 (11:09 +0100)]
xdp: introduce flags field in xdp_buff/xdp_frame

Introduce a flags field in the xdp_frame and xdp_buff data structures
to define additional buffer features. At the moment the only
supported buffer feature is the frags bit (XDP_FLAGS_HAS_FRAGS).
The frags bit specifies whether this is a linear buffer
(XDP_FLAGS_HAS_FRAGS not set) or a frags frame (XDP_FLAGS_HAS_FRAGS
set). In the latter case the driver is expected to initialize the
skb_shared_info structure at the end of the first buffer to link together
subsequent buffers belonging to the same frame.
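
A feature bit of this kind is usually wrapped in small set/clear/test
helpers. The sketch below models that pattern in user space; the
MODEL_-prefixed constant, the struct and the helper names are
illustrative assumptions, and the bit position is chosen arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XDP_FLAGS_HAS_FRAGS (1u << 0) /* bit position assumed */

/* Hypothetical stand-in carrying only the new flags field. */
struct xdp_buff_model {
	uint32_t flags;
};

bool model_buff_has_frags(const struct xdp_buff_model *xdp)
{
	return (xdp->flags & MODEL_XDP_FLAGS_HAS_FRAGS) != 0;
}

void model_buff_set_frags_flag(struct xdp_buff_model *xdp)
{
	xdp->flags |= MODEL_XDP_FLAGS_HAS_FRAGS;
}

void model_buff_clear_frags_flag(struct xdp_buff_model *xdp)
{
	xdp->flags &= ~MODEL_XDP_FLAGS_HAS_FRAGS;
}

void frags_flag_demo(void)
{
	struct xdp_buff_model xdp = { 0 };

	assert(!model_buff_has_frags(&xdp)); /* linear buffer by default */
	model_buff_set_frags_flag(&xdp);     /* driver attached a fragment */
	assert(model_buff_has_frags(&xdp));
	model_buff_clear_frags_flag(&xdp);   /* buffer reused for a linear frame */
	assert(!model_buff_has_frags(&xdp));
}
```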

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/e389f14f3a162c0a5bc6a2e1aa8dd01a90be117d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Lorenzo Bianconi [Fri, 21 Jan 2022 10:09:44 +0000 (11:09 +0100)]
net: skbuff: add size metadata to skb_shared_info for xdp

Introduce the xdp_frags_size field in the skb_shared_info data structure
to store the paged size of an xdp_buff/xdp_frame (xdp_frags_size will
be used in xdp frags support). In order not to increase the
skb_shared_info size, the field occupies a hole left by skb_shared_info
alignment.
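
The "hole" argument can be checked on a toy layout. The two structs below
are deliberately simplified stand-ins (they are not the real
skb_shared_info layout); they only show that a 4-byte member placed into
alignment padding leaves both the struct size and the following member's
offset unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy layout before the change: 12 bytes, then padding up to offset 16. */
struct shinfo_before {
	uint32_t gso_type;
	uint32_t tskey;
	uint32_t dataref;
	/* 4-byte hole: the next member is 8-byte aligned */
	_Alignas(8) uint64_t destructor_arg;
};

/* Toy layout after the change: the new field sits in the former hole. */
struct shinfo_after {
	uint32_t gso_type;
	uint32_t tskey;
	uint32_t dataref;
	uint32_t xdp_frags_size;
	_Alignas(8) uint64_t destructor_arg;
};

void struct_hole_demo(void)
{
	assert(sizeof(struct shinfo_before) == sizeof(struct shinfo_after));
	assert(offsetof(struct shinfo_before, destructor_arg) ==
	       offsetof(struct shinfo_after, destructor_arg));
}
```

On a real kernel build, a layout tool such as pahole is the usual way to
confirm where such holes are.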

Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/8a849819a3e0a143d540f78a3a5add76e17e980d.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
David S. Miller [Fri, 21 Jan 2022 14:32:21 +0000 (14:32 +0000)]
Merge branch 'octeontx2-af-fixes'

Subbaraya Sundeep says:

====================
octeontx2-af: Fixes for CN10K and CN9xxx platforms

This patchset consolidates fixes in the Octeontx2 driver
for CN10K and CN9xxx platforms. Testing the new CN10K
hardware exposed some issues, such as accessing the
wrong register for CN10K and enabling loopback on unsupported
interfaces. Some fixes are needed for CN9xxx platforms as well.

Below is a description of the patches:

Patch 1: AF sets the RX RSS action for all the VFs when a VF is
brought up. But when a PF sets an RX action for its VF, such as Drop or
Direct to a queue in an ntuple filter, it is not retained because of the
AF fixup. This patch skips modifying the VF RX RSS action if the PF has
already set its action.

Patch 2: When configuring backpressure, the wrong register is read for
LBKs. Fix it.

Patch 3: Some RVU blocks may take longer to reset but are guaranteed
to complete the reset. Hence wait until the reset is complete.

Patch 4: To enable LMAC, CN10K needs a different register than
CN9xxx platforms. Use it.

Patch 5: Add a missing barrier before submitting a memory pointer
to the aura hardware.

Patch 6: Increase the polling time during link credit restore and
return a proper error code when a timeout occurs.

Patch 7: Internal loopback is not supported on LPCS interfaces like
SGMII/QSGMII, so do not enable it.

Patch 8: When there is an error in message processing, AF sets the error
response and replies to the requestor. PF forwards an invalid message
back to the VF if the AF reply has an error in it. As a result, the VF
never sees the actual error set by AF for its message. Change this so
that PF simply forwards the actual reply and lets the VF handle the error.

Patch 9: An ntuple filter with "flow-type ether proto 0x8842 vlan 0x92e"
was not working since ethertype 0x8842 is the NGIO protocol. The hardware
parser explicitly parses such NGIO packets and marks the packet as
NGIO but not as a tagged packet. Fix this by changing the parser
so that it marks the packet as both NGIO and tagged, using
separate layer types.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Kiran Kumar K [Fri, 21 Jan 2022 06:34:47 +0000 (12:04 +0530)]
octeontx2-af: Add KPU changes to parse NGIO as separate layer

With the current KPU profile, NGIO is parsed along with CTAG as
a single layer. Because of this, MCAM/ntuple rules installed with
ethertype 0x8842 are not hit. Add KPU profile changes
to parse NGIO and CTAG as separate ltypes.

Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes")
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Subbaraya Sundeep [Fri, 21 Jan 2022 06:34:46 +0000 (12:04 +0530)]
octeontx2-pf: Forward error codes to VF

PF forwards its VF messages to AF and the corresponding
replies from AF to the VF. AF sets the proper error code in the
replies after processing message requests. Currently PF
checks the error codes in replies and sends an invalid
message to the VF. As a result, the VF lacks the
error code set by AF for its messages. This patch
changes that so that PF simply forwards the AF replies
and the VF can handle the error codes.

Fixes: d424b6c02415 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Geetha sowjanya [Fri, 21 Jan 2022 06:34:45 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces

Internal loopback is not supported on low-rate LPCS interfaces like
SGMII/QSGMII. Hence do not allow enabling it for such interfaces.

Fixes: 3ad3f8f93c81 ("octeontx2-af: cn10k: MAC internal loopback support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Geetha sowjanya [Fri, 21 Jan 2022 06:34:44 +0000 (12:04 +0530)]
octeontx2-af: Increase link credit restore polling timeout

It has been observed that link credit restore sometimes takes
longer than the current timeout. This patch increases
the default timeout value and returns the proper error value
on failure.

Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Geetha sowjanya [Fri, 21 Jan 2022 06:34:43 +0000 (12:04 +0530)]
octeontx2-pf: cn10k: Ensure valid pointers are freed to aura

While freeing SQB pointers to the aura, the driver first memcpys to the
target address and then triggers an lmtst operation to free the pointer
to the aura. We need to ensure (by adding a dmb barrier) that the memcpy
has finished before pointers are freed to the aura. This patch also
adds the missing sq context structure entry in debugfs.

Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Geetha sowjanya [Fri, 21 Jan 2022 06:34:42 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Use appropriate register for LMAC enable

CN10K platforms use the RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
register for lmac TX/RX enable, whereas CN9xxx platforms use the
CGX_CMRX_CONFIG register. This config change was missed when
adding support for CN10K RPM.

Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Geetha sowjanya [Fri, 21 Jan 2022 06:34:41 +0000 (12:04 +0530)]
octeontx2-af: Retry until RVU block reset complete

A few RVU blocks, such as SSO, require more time for reset on some
silicons. Hence retry the block reset until it succeeds.

Fixes: c0fa2cff8822c ("octeontx2-af: Handle return value in block reset")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Sunil Goutham [Fri, 21 Jan 2022 06:34:40 +0000 (12:04 +0530)]
octeontx2-af: Fix LBK backpressure id count

In rvu_nix_get_bpid(), lbk_bpid_cnt is read from the
wrong register. Because of this, backpressure enable fails
for LBK VF32 onwards. This patch fixes that.

Fixes: fe1939bb2340 ("octeontx2-af: Add SDP interface support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Subbaraya Sundeep [Fri, 21 Jan 2022 06:34:39 +0000 (12:04 +0530)]
octeontx2-af: Do not fixup all VF action entries

AF modifies all the rules destined for a VF to use
the same action as the default RSS action. This fixup
was needed because AF only installs default rules with
an RSS action. But the action in rules installed by a PF
for its VFs should not be changed by this fixup,
because the action can be drop or direct to
queue as specified by the user (ntuple filters).
This patch fixes that problem.

Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
David S. Miller [Fri, 21 Jan 2022 11:23:33 +0000 (11:23 +0000)]
Merge tag 'wireless-2022-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.17

First set of fixes for v5.17. This is the first pull request from the
new wireless tree and contains only changes to the MAINTAINERS file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
David S. Miller [Fri, 21 Jan 2022 10:30:30 +0000 (10:30 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-20

This series contains updates to i40e driver only.

Jedrzej increases delay for EMP reset and adds checks to ensure a VF
request to change queues can be met.

Sylwester moves the placement of the Flow Director queue so as not to
fragment the queue pile, which would cause later re-allocation issues.

Karen prevents VF reset being invoked while another is still occurring
to avoid reading invalid data.

Joe Damato fixes some statistics fields to match the values of the
fields they are based on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years ago
Di Zhu [Wed, 19 Jan 2022 01:40:05 +0000 (09:40 +0800)]
selftests: bpf: test BPF_PROG_QUERY for progs attached to sockmap

Add a test for querying progs attached to a sockmap. We use an existing
libbpf query interface to query the prog count before and after attaching
progs to the sockmap, and check whether the queried prog id is right.

Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-2-zhudi2@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Di Zhu [Wed, 19 Jan 2022 01:40:04 +0000 (09:40 +0800)]
bpf: support BPF_PROG_QUERY for progs attached to sockmap

Right now there is no way to query whether BPF programs are
attached to a sockmap or not.

We can use the standard interface in libbpf to query, such as:
bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...);
where mapFd is the fd of the sockmap.

Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Alexei Starovoitov [Fri, 21 Jan 2022 05:22:03 +0000 (21:22 -0800)]
Merge branch 'libbpf: streamline netlink-based XDP APIs'

Andrii Nakryiko says:

====================

Revamp existing low-level XDP APIs provided by libbpf to follow more
consistent naming (new APIs follow bpf_tc_xxx() approach where it makes
sense) and be extensible without ABI breakages (OPTS-based). See patch #1 for
details, remaining patches switch bpftool, selftests/bpf and samples/bpf to
new APIs.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:22 +0000 (22:14 -0800)]
samples/bpf: adapt samples/bpf to bpf_xdp_xxx() APIs

Use new bpf_xdp_*() APIs across all XDP-related BPF samples.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:21 +0000 (22:14 -0800)]
selftests/bpf: switch to new libbpf XDP APIs

Switch to using new bpf_xdp_*() APIs across all selftests. Take
advantage of the more straightforward and user-friendly semantics of
old_prog_fd (0 means "don't care") in a few places.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:20 +0000 (22:14 -0800)]
bpftool: use new API for attaching XDP program

Switch to new bpf_xdp_attach() API to avoid deprecation warnings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:14:19 +0000 (22:14 -0800)]
libbpf: streamline low-level XDP APIs

Introduce 4 new netlink-based XDP APIs for attaching, detaching, and
querying XDP programs:
  - bpf_xdp_attach;
  - bpf_xdp_detach;
  - bpf_xdp_query;
  - bpf_xdp_query_id.

These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts,
bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter
don't follow a consistent naming pattern and some of them use
non-extensible approaches (e.g., struct xdp_link_info which can't be
modified without breaking libbpf ABI).

The approach I took with these low-level XDP APIs is similar to what we
did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs
bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching
when -1 is specified for prog_fd for generality and convenience, but
bpf_xdp_detach() is preferred due to clearer naming and associated
semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same
opts struct, which allows specifying the expected old_prog_fd.

While doing the refactoring, I noticed that old APIs require users to
specify opts with old_fd == -1 to declare "don't care about already
attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is
essentially never an intended behavior. So I made this behavior
consistent with other kernel and libbpf APIs, in which zero FD means "no
FD". This seems to be more in line with the latest thinking in BPF land
and should cause less user confusion, hopefully.
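
The "zero FD means no FD" convention can be sketched as a small
normalization step. Everything here is a model: struct attach_request and
model_build_attach() are hypothetical, and tying a non-zero old_prog_fd to
a replace flag (with a locally defined bit value) is an assumption about
how an implementation might express "expect this program to be attached".

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_XDP_FLAGS_REPLACE (1u << 4) /* local stand-in, value assumed */

/* Hypothetical request as it might be handed to the netlink layer. */
struct attach_request {
	uint32_t flags;
	int expected_fd; /* -1 when no particular program is expected */
};

/* New-style semantics: old_prog_fd == 0 means "don't care". */
struct attach_request model_build_attach(uint32_t flags, int old_prog_fd)
{
	struct attach_request req = { flags, -1 };

	if (old_prog_fd != 0) {
		req.flags |= MODEL_XDP_FLAGS_REPLACE;
		req.expected_fd = old_prog_fd;
	}
	return req;
}

void attach_request_demo(void)
{
	struct attach_request any = model_build_attach(0, 0);
	struct attach_request exact = model_build_attach(0, 7);

	assert(any.expected_fd == -1 && any.flags == 0);
	assert(exact.expected_fd == 7);
	assert(exact.flags & MODEL_XDP_FLAGS_REPLACE);
}
```

With this convention a zero-initialized opts struct does the harmless
thing, whereas under the old convention described above the caller had to
remember to write -1.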

For querying, I left two APIs: the more generic bpf_xdp_query(),
which allows querying multiple IDs and the attach mode, and
a specialization of it, bpf_xdp_query_id(), which returns only the requested
prog_id. Uses of the prog_id-returning bpf_get_link_xdp_id() were so
prevalent across selftests and samples that it seemed a very common use
case, and using bpf_xdp_query() for it felt very cumbersome, with
a highly branched if/else chain based on flags and attach mode.
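
To give a feel for the if/else chain that the specialized query call
spares its users, here is a user-space model of selecting one program ID
from a multi-ID query result. The struct, the MODEL_-prefixed constants
and their values, and the selection order are assumptions for
illustration, not a copy of libbpf.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for attach-mode flags and the "multiple modes" marker. */
#define MODEL_XDP_FLAGS_SKB_MODE (1u << 1)
#define MODEL_XDP_FLAGS_DRV_MODE (1u << 2)
#define MODEL_XDP_FLAGS_HW_MODE  (1u << 3)
#define MODEL_XDP_ATTACHED_MULTI 4

/* Hypothetical result of the generic, multi-ID query. */
struct query_model {
	uint32_t prog_id;     /* the single attached program, if only one */
	uint32_t drv_prog_id;
	uint32_t hw_prog_id;
	uint32_t skb_prog_id;
	uint8_t attach_mode;
};

/* Pick the ID matching the requested mode; 0 means "nothing attached". */
uint32_t model_pick_prog_id(const struct query_model *q, uint32_t flags)
{
	if (q->attach_mode != MODEL_XDP_ATTACHED_MULTI && flags == 0)
		return q->prog_id;
	if (flags & MODEL_XDP_FLAGS_DRV_MODE)
		return q->drv_prog_id;
	if (flags & MODEL_XDP_FLAGS_HW_MODE)
		return q->hw_prog_id;
	if (flags & MODEL_XDP_FLAGS_SKB_MODE)
		return q->skb_prog_id;
	return 0;
}

void pick_prog_id_demo(void)
{
	struct query_model q = { 42, 42, 0, 0, 1 };

	assert(model_pick_prog_id(&q, 0) == 42);
	assert(model_pick_prog_id(&q, MODEL_XDP_FLAGS_HW_MODE) == 0);
}
```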

Old APIs are scheduled for deprecation in libbpf 0.8 release.

  [0] Closes: https://github.com/libbpf/libbpf/issues/309

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Alexei Starovoitov [Fri, 21 Jan 2022 05:19:05 +0000 (21:19 -0800)]
Merge branch 'libbpf: deprecate legacy BPF map definitions'

Andrii Nakryiko says:

====================

Officially deprecate legacy BPF map definitions in libbpf. They've been slated
for deprecation for a while in favor of more powerful BTF-defined map
definitions and this patch set adds warnings and a way to enforce this in
libbpf through LIBBPF_STRICT_MAP_DEFINITIONS strict mode flag.

Selftests are fixed up and updated, BPF documentation is updated, bpftool's
strict mode usage is adjusted to avoid breaking users unnecessarily.

v1->v2:
  - replace missed bpf_map_def case in Documentation/bpf/btf.rst (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:29 +0000 (22:05 -0800)]
docs/bpf: update BPF map definition example

Use BTF-defined map definition in the documentation example.
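
For readers unfamiliar with the BTF-defined style, the sketch below shows
the underlying trick in plain C: map attributes are encoded in member
types (an array length or a pointee type) instead of in integer values.
map_uint() and map_type() are local stand-ins written for this example,
modelled on the kind of macros libbpf provides; the map type constant is
likewise a local assumption. A real BPF program would use libbpf's own
macros and place the variable in the ".maps" section.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins: encode an integer as an array length, a type as a pointee. */
#define map_uint(name, val) int (*name)[val]
#define map_type(name, val) __typeof__(val) *name

#define MODEL_BPF_MAP_TYPE_HASH 1 /* value assumed for the example */

/* A BTF-style map definition: no field holds data, the types carry it all. */
struct {
	map_uint(type, MODEL_BPF_MAP_TYPE_HASH);
	map_uint(max_entries, 1024);
	map_type(key, uint32_t);
	map_type(value, uint64_t);
} demo_map;

void btf_map_demo(void)
{
	/* The attributes can be recovered from type information alone. */
	assert(sizeof(*demo_map.max_entries) / sizeof(int) == 1024);
	assert(sizeof(*demo_map.key) == sizeof(uint32_t));
	assert(sizeof(*demo_map.value) == sizeof(uint64_t));
}
```

Because key and value are described by real C types rather than by sizes,
type information about them is available to tooling without any extra
annotation macro.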

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:28 +0000 (22:05 -0800)]
libbpf: deprecate legacy BPF map definitions

Enact deprecation of legacy BPF map definitions in SEC("maps") ([0]). For
the definitions themselves, introduce the LIBBPF_STRICT_MAP_DEFINITIONS flag
for libbpf strict mode. If it is set, error out on any struct
bpf_map_def-based map definition. If not set, libbpf will print out
a warning for each legacy BPF map to raise awareness that it is going away.
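
The error-versus-warning behaviour described above amounts to a single
branch on the strict-mode bit. The model below uses a made-up bit value,
a made-up result struct and a placeholder error code; it only illustrates
the decision, not libbpf internals.

```c
#include <assert.h>

#define MODEL_STRICT_MAP_DEFINITIONS (1u << 0) /* hypothetical bit value */

/* Hypothetical outcome of opening an object file. */
struct open_result {
	int err;               /* 0 on success, negative on failure */
	unsigned int warnings; /* number of deprecation warnings printed */
};

/* Decide what happens when an object contains legacy map definitions. */
struct open_result model_open_object(unsigned int strict_mode,
				     unsigned int nr_legacy_maps)
{
	struct open_result res = { 0, 0 };

	if (nr_legacy_maps == 0)
		return res;
	if (strict_mode & MODEL_STRICT_MAP_DEFINITIONS) {
		res.err = -1; /* placeholder error code */
		return res;
	}
	res.warnings = nr_legacy_maps; /* one warning per legacy map */
	return res;
}

void strict_mode_demo(void)
{
	struct open_result lax = model_open_object(0, 2);
	struct open_result strict =
		model_open_object(MODEL_STRICT_MAP_DEFINITIONS, 2);

	assert(lax.err == 0 && lax.warnings == 2);
	assert(strict.err < 0 && strict.warnings == 0);
}
```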

For any use of BPF_ANNOTATE_KV_PAIR() macro providing a legacy way to
associate BTF key/value type information with legacy BPF map definition,
warn through libbpf's pr_warn() error message (but don't fail BPF object
open).

BPF-side struct bpf_map_def is marked as deprecated. User-space struct
bpf_map_def has to be used internally in libbpf, so it is left
untouched. It should be enough for bpf_map__def() to be marked
deprecated to raise awareness that it is going away.

bpftool is an interesting case that utilizes libbpf to open a BPF ELF
object to generate a skeleton. As such, even though bpftool itself uses
full-on strict libbpf mode (LIBBPF_STRICT_ALL), it has to relax it a bit
for BPF map definition handling to minimize unnecessary disruptions. So
opt out of LIBBPF_STRICT_MAP_DEFINITIONS for bpftool. User code that
later uses the generated skeleton will make its own decision whether to
enforce LIBBPF_STRICT_MAP_DEFINITIONS or not.

There are a few tests in selftests/bpf that consciously use legacy
BPF map definitions to test libbpf functionality. For those, temporarily
opt out of LIBBPF_STRICT_MAP_DEFINITIONS mode for the duration of those
tests.

  [0] Closes: https://github.com/libbpf/libbpf/issues/272

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years ago
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:27 +0000 (22:05 -0800)]
selftests/bpf: convert remaining legacy map definitions

Convert the few remaining legacy BPF map definitions to BTF-defined ones.
For the remaining two bpf_map_def-based legacy definitions that we want
to keep for testing purposes until the libbpf 1.0 release, guard them in
a pragma to suppress the deprecation warnings that will be added to libbpf
in the next commit.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: fail build on compilation warning
Andrii Nakryiko [Thu, 20 Jan 2022 06:05:26 +0000 (22:05 -0800)]
selftests/bpf: fail build on compilation warning

It's very easy to miss compilation warnings without -Werror, which is
not set for selftests. libbpf and bpftool are already strict about this,
so make selftests/bpf also treat compilation warnings as errors to catch
such regressions early.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoMerge branch 'mptcp-a-few-fixes'
Jakub Kicinski [Fri, 21 Jan 2022 04:24:03 +0000 (20:24 -0800)]
Merge branch 'mptcp-a-few-fixes'

Mat Martineau says:

====================
mptcp: A few fixes

Patch 1 fixes a RCU locking issue when processing a netlink command that
updates endpoint flags in the in-kernel MPTCP path manager.

Patch 2 fixes a typo affecting available endpoint id tracking.

Patch 3 fixes IPv6 routing in the MPTCP self tests.
====================

Link: https://lore.kernel.org/r/20220121003529.54930-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: fix ipv6 routing setup
Paolo Abeni [Fri, 21 Jan 2022 00:35:29 +0000 (16:35 -0800)]
selftests: mptcp: fix ipv6 routing setup

MPJ ipv6 selftests currently lack per link route to the server
net. Additionally, ipv6 subflows endpoints are created without any
interface specified. The end-result is that in ipv6 self-tests
subflows are created all on the same link, leading to expected delays
and sporadic self-tests failures.

Fix the issue by adding the missing setup bits.

Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases")
Reported-and-tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: fix removing ids bitmap setting
Geliang Tang [Fri, 21 Jan 2022 00:35:28 +0000 (16:35 -0800)]
mptcp: fix removing ids bitmap setting

In mptcp_pm_nl_rm_addr_or_subflow(), the bit of rm_list->ids[i] in the
id_avail_bitmap should be set, not rm_list->ids[1]. This patch fixes it.
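
The difference between the two indices can be sketched in userspace as
follows. This is a minimal illustration, not the kernel code: the helper
names (set_bit_simple(), release_ids_buggy(), release_ids_fixed()) and the
flat bitmap are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Simplified userspace stand-in for a kernel-style set-bit helper. */
static void set_bit_simple(unsigned int nr, unsigned long *bitmap)
{
	bitmap[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static int test_bit_simple(unsigned int nr, const unsigned long *bitmap)
{
	return (int)((bitmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* Buggy shape: every iteration marks ids[1], whatever the loop index is. */
static void release_ids_buggy(const uint8_t *ids, size_t nr,
			      unsigned long *avail)
{
	assert(ids != NULL && avail != NULL && nr >= 2);
	for (size_t i = 0; i < nr; i++)
		set_bit_simple(ids[1], avail);
}

/* Fixed shape: each removed id is marked as available again. */
static void release_ids_fixed(const uint8_t *ids, size_t nr,
			      unsigned long *avail)
{
	assert(ids != NULL && avail != NULL);
	for (size_t i = 0; i < nr; i++)
		set_bit_simple(ids[i], avail);
}
```

With ids {3, 7, 42}, the buggy loop only ever marks id 7, so ids 3 and
42 would stay unavailable.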

Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
Paolo Abeni [Fri, 21 Jan 2022 00:35:27 +0000 (16:35 -0800)]
mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()

The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without holding the spinlock and outside any RCU critical section.

This change addresses the issue by performing the lookup and the
endpoint update under the pernet spinlock.

Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Jakub Kicinski [Fri, 21 Jan 2022 04:22:30 +0000 (20:22 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Incorrect helper module alias in netbios_ns, from Florian Westphal.

2) Remove unused variable in nf_tables.

3) Uninitialized last expression in nf_tables register tracking.

4) Memleak in nft_connlimit after moving stateful data out of the
   expression data area.

5) Bogus invalid stats update when NF_REPEAT is returned, from Florian.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: conntrack: don't increment invalid counter on NF_REPEAT
  netfilter: nft_connlimit: memleak if nf_ct_netns_get() fails
  netfilter: nf_tables: set last expression in register tracking area
  netfilter: nf_tables: remove unused variable
  netfilter: nf_conntrack_netbios_ns: fix helper module alias
====================

Link: https://lore.kernel.org/r/20220120125212.991271-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: annotate accesses to fn->fn_sernum
Eric Dumazet [Thu, 20 Jan 2022 17:41:12 +0000 (09:41 -0800)]
ipv6: annotate accesses to fn->fn_sernum

struct fib6_node's fn_sernum field can be
read while other threads change it.

Add READ_ONCE()/WRITE_ONCE() annotations.

Do not change existing smp barriers in fib6_get_cookie_safe()
and __fib6_update_sernum_upto_root()
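
The annotation pattern can be sketched in userspace as follows. The
macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE()
(assuming GNU C __typeof__ is available), and struct node_sketch is a
hypothetical, cut-down replacement for struct fib6_node.

```c
#include <assert.h>

/*
 * Simplified stand-ins: the volatile access asks the compiler to emit
 * exactly one load or store and not to tear, fuse or repeat it. They do
 * not add any memory barrier.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical node that carries only the serial number. */
struct node_sketch {
	int sernum;
};

/* Writer side: may run while readers look at the field. */
static void node_update_sernum(struct node_sketch *n, int sernum)
{
	assert(n != 0);
	WRITE_ONCE(n->sernum, sernum);
}

/* Reader side: lockless snapshot of the current value. */
static int node_get_sernum(const struct node_sketch *n)
{
	return READ_ONCE(n->sernum);
}
```

The annotations document that the race is intentional and keep the
compiler from tearing or re-reading the value; ordering against other
accesses still depends on the existing barriers.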

syzbot reported:

BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket

write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
 fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
 fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
 fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
 fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
 __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
 fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
 rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
 addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
 addrconf_dad_work+0x908/0x1170
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:359
 ret_from_fork+0x1f/0x30

read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
 fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
 rt6_get_cookie include/net/ip6_fib.h:306 [inline]
 ip6_dst_store include/net/ip6_route.h:234 [inline]
 inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
 inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
 __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
 mptcp_push_release net/mptcp/protocol.c:1491 [inline]
 __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
 mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 kernel_sendmsg+0x97/0xd0 net/socket.c:745
 sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
 inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
 kernel_sendpage+0x187/0x200 net/socket.c:3492
 sock_sendpage+0x5a/0x70 net/socket.c:1007
 pipe_to_sendpage+0x128/0x160 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x207/0x500 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0x106/0x190 fs/splice.c:979
 do_sendfile+0x675/0xc40 fs/read_write.c:1245
 __do_sys_sendfile64 fs/read_write.c:1310 [inline]
 __se_sys_sendfile64 fs/read_write.c:1296 [inline]
 __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000026f -> 0x00000271

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

The Fixes tag I chose is probably arbitrary, I do not think
we need to backport this patch to older kernels.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: add a missing sk_defer_free_flush() in tcp_splice_read()
Eric Dumazet [Thu, 20 Jan 2022 12:45:30 +0000 (04:45 -0800)]
tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

Without it, splice users can hit the warning
added in commit 79074a72d335 ("net: Flush deferred skb free on socket destroy")

Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
Fixes: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20220120124530.925607-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Add a stub for sk_defer_free_flush()
Gal Pressman [Thu, 20 Jan 2022 12:34:40 +0000 (14:34 +0200)]
tcp: Add a stub for sk_defer_free_flush()

When compiling the kernel with CONFIG_INET disabled, the
sk_defer_free_flush() should be defined as a nop.

This resolves the following compilation error:
  ld: net/core/sock.o: in function `sk_defer_free_flush':
  ./include/net/tcp.h:1378: undefined reference to `__sk_defer_free_flush'
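
The stub pattern looks roughly like the sketch below. CONFIG_FEATURE_SKETCH,
feature_flush() and feature_flush_slow() are hypothetical names used only
for illustration; in this sketch the option is left undefined, so the
no-op branch is the one that gets compiled.

```c
#include <assert.h>

/* Counts how often the real implementation ran (illustration only). */
static int flush_calls;

#ifdef CONFIG_FEATURE_SKETCH
/* Real implementation, only built when the feature is configured in. */
static void feature_flush_slow(void)
{
	flush_calls++;
}

static inline void feature_flush(void)
{
	feature_flush_slow();
}
#else
/* Feature configured out: callers still compile and link, nothing runs. */
static inline void feature_flush(void)
{
}
#endif
```

Because the stub is a static inline with an empty body, callers need no
#ifdef of their own and no out-of-line symbol is referenced at link time.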

Fixes: 79074a72d335 ("net: Flush deferred skb free on socket destroy")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220120123440.9088-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agophylib: fix potential use-after-free
Marek Behún [Wed, 19 Jan 2022 16:27:48 +0000 (17:27 +0100)]
phylib: fix potential use-after-free

Commit bafbdd527d56 ("phylib: Add device reset GPIO support") added call
to phy_device_reset(phydev) after the put_device() call in phy_detach().

The comment before the put_device() call says that the phydev might go
away with put_device().

Fix potential use-after-free by calling phy_device_reset() before
put_device().
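
The ordering rule can be sketched with a toy reference-counted object.
struct dev_sketch and the dev_*() helpers are hypothetical and only model
the idea that dropping the last reference may free the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted device. */
struct dev_sketch {
	int refcount;
	int in_reset;
};

static int resets_seen;

static struct dev_sketch *dev_alloc(void)
{
	struct dev_sketch *d = calloc(1, sizeof(*d));

	if (d)
		d->refcount = 1;
	return d;
}

/* Touches the object, so the caller must still hold a reference. */
static void dev_reset(struct dev_sketch *d, int value)
{
	assert(d->refcount > 0);
	d->in_reset = value;
	resets_seen++;
}

/* Drops a reference; the object is freed when the count reaches zero. */
static void dev_put(struct dev_sketch *d)
{
	if (--d->refcount == 0)
		free(d);
}

/* Fixed order: last access first, reference drop afterwards. */
static void dev_detach(struct dev_sketch *d)
{
	dev_reset(d, 1);
	dev_put(d);	/* d must not be used after this point */
}
```

Swapping the two calls in dev_detach() would make dev_reset() operate on
memory that dev_put() may already have freed.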

Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220119162748.32418-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n
Kumar Kartikeya Dwivedi [Thu, 20 Jan 2022 16:49:32 +0000 (22:19 +0530)]
selftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n

Some users have complained that selftests fail to build when
CONFIG_NF_CONNTRACK=m. It would be useful to allow building as long as
it is set to module or built-in, even though in case of building as
module, user would need to load it before running the selftest. Note
that this also allows building selftest when CONFIG_NF_CONNTRACK is
disabled.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220120164932.2798544-1-memxor@gmail.com
2 years agoselftests: bpf: Fix bind on used port
Felix Maurer [Tue, 18 Jan 2022 15:11:56 +0000 (16:11 +0100)]
selftests: bpf: Fix bind on used port

The bind_perm BPF selftest failed when port 111/tcp was already in use
during the test. To fix this, the test now runs in its own network name
space.

To use unshare, it is necessary to reorder the includes. The style of
the includes is adapted to be consistent with the other prog_tests.

v2: Replace deprecated CHECK macro with ASSERT_OK

Fixes: 8259fdeb30326 ("selftests/bpf: Verify that rebinding to port < 1024 from BPF works")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/551ee65533bb987a43f93d88eaf2368b416ccd32.1642518457.git.fmaurer@redhat.com
2 years agoMerge branch 'rely on ASSERT marcos in xdp_bpf2bpf.c/xdp_adjust_tail.c'
Andrii Nakryiko [Thu, 20 Jan 2022 21:54:57 +0000 (13:54 -0800)]
Merge branch 'rely on ASSERT marcos in xdp_bpf2bpf.c/xdp_adjust_tail.c'

Lorenzo Bianconi says:

====================

Rely on ASSERT* macros and get rid of deprecated CHECK ones in xdp_bpf2bpf and
xdp_adjust_tail bpf selftests.
This is a preliminary series for XDP multi-frags support.

Changes since v1:
- run each ASSERT test separately
- drop unnecessary return statements
- drop unnecessary if condition in test_xdp_bpf2bpf()
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2 years agobpf: selftests: Get rid of CHECK macro in xdp_bpf2bpf.c
Lorenzo Bianconi [Thu, 20 Jan 2022 11:50:27 +0000 (12:50 +0100)]
bpf: selftests: Get rid of CHECK macro in xdp_bpf2bpf.c

Rely on ASSERT* macros and get rid of deprecated CHECK ones in
xdp_bpf2bpf bpf selftest.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/df7e5098465016e27d91f2c69a376a35d63a7621.1642679130.git.lorenzo@kernel.org
2 years agobpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c
Lorenzo Bianconi [Thu, 20 Jan 2022 11:50:26 +0000 (12:50 +0100)]
bpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c

Rely on ASSERT* macros and get rid of deprecated CHECK ones in
xdp_adjust_tail bpf selftest.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/c0ab002ffa647a20ec9e584214bf0d4373142b54.1642679130.git.lorenzo@kernel.org
2 years agoi40e: fix unsigned stat widths
Joe Damato [Thu, 9 Dec 2021 01:56:33 +0000 (17:56 -0800)]
i40e: fix unsigned stat widths

Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match
the width of the stats counters in struct i40e_rx_queue_stats.

Update debugfs code to use the correct format specifier for u64.
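
A minimal userspace sketch of why the widths matter is shown below. The
function names are hypothetical, and PRIu64 is used here because the
kernel's u64 format specifier does not carry over to portable userspace
code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Narrow accumulator: wraps silently once the sum passes 2^32 - 1. */
static uint32_t sum_stats_narrow(const uint64_t *per_queue, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += (uint32_t)per_queue[i];
	return total;
}

/* Accumulator that matches the width of the per-queue counters. */
static uint64_t sum_stats_wide(const uint64_t *per_queue, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += per_queue[i];
	return total;
}

/* Prints the counter with a specifier that matches its 64-bit type. */
static int format_stat(char *buf, size_t len, uint64_t value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64, value);
}
```

Summing the two counters UINT32_MAX and 1 gives 0 in the narrow version
and 4294967296 in the wide one.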

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoi40e: Fix for failed to init adminq while VF reset
Karen Sornek [Thu, 2 Dec 2021 11:52:01 +0000 (12:52 +0100)]
i40e: Fix for failed to init adminq while VF reset

Fix the "failed to init adminq: -53" error seen while a VF is resetting
via the MAC address changing procedure.
Add a sync module to avoid reading the deadbeef value in reinit adminq
during software reset.
Without this patch it is possible to trigger the VF reset procedure
during reinit adminq. This results in an incorrect value being read
from the AQP registers and generates the -53 error.

Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoi40e: Fix queues reservation for XDP
Sylwester Dziedziuch [Fri, 26 Nov 2021 10:11:22 +0000 (11:11 +0100)]
i40e: Fix queues reservation for XDP

When XDP was configured on a system with large number of CPUs
and X722 NIC there was a call trace with NULL pointer dereference.

i40e 0000:87:00.0: failed to get tracking for 256 queues for VSI 0 err -12
i40e 0000:87:00.0: setup of MAIN VSI failed

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:i40e_xdp+0xea/0x1b0 [i40e]
Call Trace:
? i40e_reconfig_rss_queues+0x130/0x130 [i40e]
dev_xdp_install+0x61/0xe0
dev_xdp_attach+0x18a/0x4c0
dev_change_xdp_fd+0x1e6/0x220
do_setlink+0x616/0x1030
? ahci_port_stop+0x80/0x80
? ata_qc_issue+0x107/0x1e0
? lock_timer_base+0x61/0x80
? __mod_timer+0x202/0x380
rtnl_setlink+0xe5/0x170
? bpf_lsm_binder_transaction+0x10/0x10
? security_capable+0x36/0x50
rtnetlink_rcv_msg+0x121/0x350
? rtnl_calcit.isra.0+0x100/0x100
netlink_rcv_skb+0x50/0xf0
netlink_unicast+0x1d3/0x2a0
netlink_sendmsg+0x22a/0x440
sock_sendmsg+0x5e/0x60
__sys_sendto+0xf0/0x160
? __sys_getsockname+0x7e/0xc0
? _copy_from_user+0x3c/0x80
? __sys_setsockopt+0xc8/0x1a0
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f83fa7a39e0

This was caused by PF queue pile fragmentation due to
flow director VSI queue being placed right after main VSI.
Because of this main VSI was not able to resize its
queue allocation for XDP resulting in no queues allocated
for main VSI when XDP was turned on.

Fix this by always allocating last queue in PF queue pile
for a flow director VSI.
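
The fragmentation effect can be reproduced with a toy first-fit
allocator. Everything below is a simplified assumption made for
illustration (an 8-slot pile, owner id 1 for the main VSI, owner id 2 for
the flow director VSI) and does not mirror the driver's data structures.

```c
#include <assert.h>

#define PILE_SIZE 8
#define PILE_FREE 0

struct pile_sketch {
	int owner[PILE_SIZE];	/* PILE_FREE or the id of the owner */
};

/* First fit: claim the first run of `needed` contiguous free slots. */
static int pile_get(struct pile_sketch *p, int needed, int id)
{
	int run = 0;

	assert(id != PILE_FREE && needed > 0);
	for (int i = 0; i < PILE_SIZE; i++) {
		run = (p->owner[i] == PILE_FREE) ? run + 1 : 0;
		if (run == needed) {
			int start = i - needed + 1;

			for (int j = start; j <= i; j++)
				p->owner[j] = id;
			return start;
		}
	}
	return -1;	/* no contiguous run is large enough */
}

/* Alternative placement: claim the very last slot of the pile. */
static int pile_get_last(struct pile_sketch *p, int id)
{
	if (p->owner[PILE_SIZE - 1] != PILE_FREE)
		return -1;
	p->owner[PILE_SIZE - 1] = id;
	return PILE_SIZE - 1;
}

static void pile_put(struct pile_sketch *p, int id)
{
	for (int i = 0; i < PILE_SIZE; i++)
		if (p->owner[i] == id)
			p->owner[i] = PILE_FREE;
}

/* Release an allocation and request a larger one, as enabling XDP would. */
static int pile_resize(struct pile_sketch *p, int id, int needed)
{
	pile_put(p, id);
	return pile_get(p, needed, id);
}
```

With the single-slot owner placed directly behind a 4-slot main
allocation, growing the main allocation to 7 slots fails; with the single
slot parked at the end of the pile, the same request succeeds.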

Fixes: 41c445ff0f48 ("i40e: main driver core")
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoi40e: Fix issue when maximum queues is exceeded
Jedrzej Jagielski [Fri, 5 Nov 2021 11:17:00 +0000 (11:17 +0000)]
i40e: Fix issue when maximum queues is exceeded

Before this patch VF interface vanished when
maximum queue number was exceeded. Driver tried
to add next queues even if there was not enough
space. PF sent incorrect number of queues to
the VF when there were not enough of them.

Add an additional condition to check the available space in
'qp_pile' before proceeding.
This condition makes it impossible to add queues
if their number is greater than the number resulting
from the available space.
Also add the search for free space in PF queue
pair piles.

Without this patch VF interfaces are not seen
when the available space for queues has been
exceeded, and the following logs appear permanently
in dmesg:
"VF 62 failed opcode 3, retval: -5"
"Unable to get VF config due to PF error condition, not retrying"

Fixes: 7daa6bf3294e ("i40e: driver core headers")
Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoi40e: Increase delay to 1 s after global EMP reset
Jedrzej Jagielski [Thu, 28 Oct 2021 13:51:14 +0000 (13:51 +0000)]
i40e: Increase delay to 1 s after global EMP reset

With the recently simplified i40e_rebuild, the FW is sometimes
not ready after an NVM update and the ping does not return.

Increase the delay in case of EMP reset.
The old delay of 300 ms was introduced for specific cards of the 710 series.
Now the delay applies to all cards and has been increased.

Fixes: 1fa51a650e1d ("i40e: Add delay after EMP reset for firmware to recover")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoMerge branch 'stmmac-fixes'
David S. Miller [Thu, 20 Jan 2022 11:58:45 +0000 (11:58 +0000)]
Merge branch 'stmmac-fixes'

Yuji Ishikawa says:

====================
net: stmmac: dwmac-visconti: Fix bit definitions and clock configuration for RMII mode

This series is a fix for RMII/MII operation mode of the dwmac-visconti driver.
It is composed of two parts:

* 1/2: fix constant definitions for cleared bits in ETHER_CLK_SEL register
* 2/2: fix configuration of ETHER_CLK_SEL register for running in RMII operation mode.

  net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
    v1 -> v2:
      - added Fixes tag to commit message

  net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
    v1 -> v2:
      - added Fixes tag to commit message
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
Yuji Ishikawa [Wed, 19 Jan 2022 04:46:48 +0000 (13:46 +0900)]
net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode

Bit pattern of the ETHER_CLOCK_SEL register for RMII/MII mode should be fixed.
Also, some control bits should be modified with a specific sequence.

Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
Yuji Ishikawa [Wed, 19 Jan 2022 04:46:47 +0000 (13:46 +0900)]
net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL

Just 0 should be used to represent cleared bits in these definitions:

* ETHER_CLK_SEL_DIV_SEL_20
* ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN
* ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN
* ETHER_CLK_SEL_TX_CLK_O_TX_I
* ETHER_CLK_SEL_RMII_CLK_SEL_IN
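
Why a "cleared" selector has to be plain 0 can be sketched as follows.
The register layout and every name below are hypothetical, and the buggy
definition is only an assumed example of how such a constant can go
wrong; it is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)	(UINT32_C(1) << (n))

/* Hypothetical one-bit divider selector at bit 0 of a clock register. */
#define CLK_DIV_SEL_FAST	BIT(0)	/* bit set */
#define CLK_DIV_SEL_SLOW	0	/* bit cleared: ORing it in adds nothing */
#define CLK_DIV_SEL_SLOW_BUGGY	BIT(0)	/* assumed mistake: "0" written as bit 0 */
#define CLK_TX_ENABLE		BIT(4)	/* hypothetical unrelated control bit */

/* Builds the register value from the enable bit and a divider selector. */
static uint32_t clk_sel_compose(uint32_t div_sel)
{
	/* The selector may only touch its own bit. */
	assert((div_sel & ~BIT(0)) == 0);
	return CLK_TX_ENABLE | div_sel;
}
```

With the buggy constant, requesting the slow divider produces the same
register value as requesting the fast one.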

Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6_tunnel: Rate limit warning messages
Ido Schimmel [Thu, 20 Jan 2022 08:05:46 +0000 (10:05 +0200)]
ipv6_tunnel: Rate limit warning messages

The warning messages can be invoked from the data path for every packet
transmitted through an ip6gre netdev, leading to high CPU utilization.

Fix that by rate limiting the messages.
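
The idea behind rate limiting can be sketched as a small interval/burst
limiter. This is a userspace illustration with hypothetical names and
caller-supplied time, not the kernel's ratelimit implementation.

```c
#include <assert.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit_sketch {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed so far */
};

/* Returns 1 if the caller may emit a message at time `now`, 0 otherwise. */
static int ratelimit_allow(struct ratelimit_sketch *rl, long now)
{
	assert(rl != 0 && rl->interval > 0 && rl->burst > 0);

	/* Open a fresh window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

A per-packet warning guarded this way costs a few comparisons instead of
a log write once the burst for the current window is used up.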

Fixes: 09c6bbf090ec ("[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoethtool: Fix link extended state for big endian
Moshe Tal [Thu, 20 Jan 2022 09:55:50 +0000 (11:55 +0200)]
ethtool: Fix link extended state for big endian

The link extended sub-states are assigned as an enum, which is integer
sized, but read from a union as u8. This works for small values on
little endian systems, but on big endian systems it always gives 0.
Fix the variable in the union to match the enum size.
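
The effect can be reproduced in userspace with the sketch below. The enum
and union names are hypothetical, and the example assumes the common case
of a 4-byte enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sub-state values, small enough to fit in one byte. */
enum substate_sketch {
	SUBSTATE_NONE = 0,
	SUBSTATE_A = 1,
	SUBSTATE_B = 2,
};

/* Buggy shape: the enum is read back through a one-byte member. */
union substate_buggy {
	enum substate_sketch as_enum;
	uint8_t raw;
};

/* Fixed shape: the raw member is as wide as the enum. */
union substate_fixed {
	enum substate_sketch as_enum;
	uint32_t raw;
};

static int host_is_little_endian(void)
{
	uint32_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/* Reads the first byte in memory: the low byte only on little endian. */
static unsigned int read_buggy(enum substate_sketch v)
{
	union substate_buggy u = { .as_enum = v };

	return u.raw;
}

static unsigned int read_fixed(enum substate_sketch v)
{
	union substate_fixed u = { .as_enum = v };

	assert(sizeof(enum substate_sketch) == sizeof(uint32_t));
	return u.raw;
}
```

On a little endian host read_buggy(SUBSTATE_B) still returns 2, which is
why the problem stays hidden there; on a big endian host it returns 0.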

Fixes: ecc31c60240b ("ethtool: Add link extended state")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: broadcom: hook up soft_reset for BCM54616S
Robert Hancock [Tue, 18 Jan 2022 21:52:43 +0000 (15:52 -0600)]
net: phy: broadcom: hook up soft_reset for BCM54616S

A problem was encountered with the Bel-Fuse 1GBT-SFP05 SFP module (which
is a 1 Gbps copper module operating in SGMII mode with an internal
BCM54616S PHY device) using the Xilinx AXI Ethernet MAC core, where the
module would work properly on the initial insertion or boot of the
device, but after the device was rebooted, the link would either only
come up at 100 Mbps speeds or go up and down erratically.

I found no meaningful changes in the PHY configuration registers between
the working and non-working boots, but the status registers seemed to
have a lot of error indications set on the SERDES side of the device on
the non-working boot. I suspect the problem is that whatever happens on
the SGMII link when the device is rebooted and the FPGA logic gets
reloaded ends up putting the module's onboard PHY into a bad state.

Since commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
the genphy_soft_reset call is not made automatically by the PHY core
unless the callback is explicitly specified in the driver structure. For
most of these Broadcom devices, there is probably a hardware reset that
gets asserted to reset the PHY during boot, however for SFP modules
(where the BCM54616S is commonly found) no such reset line exists, so if
the board keeps the SFP cage powered up across a reboot, it will end up
with no reset occurring during reboots.

Hook up the genphy_soft_reset callback for BCM54616S to ensure that a
PHY reset is performed before the device is initialized. This appears to
fix the issue with erratic operation after a reboot with this SFP
module.

Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: Clarify error message when qdisc kind is unknown
Victor Nogueira [Tue, 18 Jan 2022 17:19:09 +0000 (14:19 -0300)]
net: sched: Clarify error message when qdisc kind is unknown

When adding a tc rule with a qdisc kind that is not supported or not
compiled into the kernel, the kernel emits the following error: "Error:
Specified qdisc not found.". Found via tdc testing when ETS qdisc was not
compiled in and it was not obvious right away what the message meant
without looking at the kernel code.

Change the error message to be more explicit and say the qdisc kind is
unknown.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: fix information leakage in /proc/net/ptype
Congyu Liu [Tue, 18 Jan 2022 19:20:13 +0000 (14:20 -0500)]
net: fix information leakage in /proc/net/ptype

In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading `/proc/net/ptype`
file. This is minor information leakage as packet socket is
namespace aware.

Add a net pointer in `packet_type` to keep the net namespace of
the corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.
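
The visibility rule described above can be sketched as a small predicate.
The struct and field names are hypothetical stand-ins, not the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace. */
struct net_sketch {
	int id;
};

/* Hypothetical stand-in for a packet type entry. */
struct packet_type_sketch {
	/* NULL means the entry is not tied to a specific namespace. */
	const struct net_sketch *owner_net;
};

/*
 * An entry is shown to a reader if it has no owning namespace recorded,
 * or if the owning namespace is the reader's own.
 */
static int ptype_visible(const struct packet_type_sketch *pt,
			 const struct net_sketch *reader_net)
{
	assert(pt != NULL && reader_net != NULL);
	return pt->owner_net == NULL || pt->owner_net == reader_net;
}
```

Entries without an owner stay visible everywhere, so only entries created
by packet sockets in another namespace are filtered out.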

Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.")
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 20 Jan 2022 08:57:05 +0000 (10:57 +0200)]
Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf.

  Quite a handful of old regression fixes but most of those are
  pre-5.16.

  Current release - regressions:

   - fix memory leaks in the skb free deferral scheme if upper layer
     protocols are used, i.e. in-kernel TCP readers like TLS

  Current release - new code bugs:

   - nf_tables: fix NULL check typo in _clone() functions

   - change the default to y for Vertexcom vendor Kconfig

   - a couple of fixes to incorrect uses of ref tracking

   - two fixes for constifying netdev->dev_addr

  Previous releases - regressions:

   - bpf:
      - various verifier fixes mainly around register offset handling
        when passed to helper functions
      - fix mount source displayed for bpffs (none -> bpffs)

   - bonding:
      - fix extraction of ports for connection hash calculation
      - fix bond_xmit_broadcast return value when some devices are down

   - phy: marvell: add Marvell specific PHY loopback

   - sch_api: don't skip qdisc attach on ingress, prevent ref leak

   - htb: restore minimal packet size handling in rate control

   - sfp: fix high power modules without diagnostic monitoring

   - mscc: ocelot:
      - don't let phylink re-enable TX PAUSE on the NPI port
      - don't dereference NULL pointers with shared tc filters

   - smsc95xx: correct reset handling for LAN9514

   - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account

   - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
     avoid races with the interrupt

  Previous releases - always broken:

   - xdp: check prog type before updating BPF link

   - smc: resolve various races around abnormal connection termination

   - sit: allow encapsulated IPv6 traffic to be delivered locally

   - axienet: fix init/reset handling, add missing barriers, read the
     right status words, stop queues correctly

   - add missing dev_put() in sock_timestamping_bind_phc()

  Misc:

   - ipv4: prevent accidentally passing RTO_ONLINK to
     ip_route_output_key_hash() by sanitizing flags

   - ipv4: avoid quadratic behavior in netns dismantle

   - stmmac: dwmac-oxnas: add support for OX810SE

   - fsl: xgmac_mdio: add workaround for erratum A-009885"

* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  ipv4: avoid quadratic behavior in netns dismantle
  net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
  powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
  dt-bindings: net: Document fsl,erratum-a009885
  net/fsl: xgmac_mdio: Add workaround for erratum A-009885
  net: mscc: ocelot: fix using match before it is set
  net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
  net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
  nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
  net: axienet: increase default TX ring size to 128
  net: axienet: fix for TX busy handling
  net: axienet: fix number of TX ring slots for available check
  net: axienet: Fix TX ring slot available check
  net: axienet: limit minimum TX ring size
  net: axienet: add missing memory barriers
  net: axienet: reset core on initialization prior to MDIO access
  net: axienet: Wait for PhyRstCmplt after core reset
  net: axienet: increase reset timeout
  bpf, selftests: Add ringbuf memory type confusion test
  ...

2 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 20 Jan 2022 08:41:01 +0000 (10:41 +0200)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...

2 years agolib: remove redundant assignment to variable ret
Colin Ian King [Thu, 20 Jan 2022 02:10:38 +0000 (18:10 -0800)]
lib: remove redundant assignment to variable ret

The variable ret is being assigned a value that is never read.  If the
for-loop is entered then ret is immediately re-assigned a new value.  If
the for-loop is not executed ret is never read.  The assignment is
redundant and can be removed.

Link: https://lkml.kernel.org/r/20211230134557.83633-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
Kees Cook [Thu, 20 Jan 2022 02:10:35 +0000 (18:10 -0800)]
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE

The object-size sanitizer is redundant to -Warray-bounds, and
inappropriately performs its checks at run-time when all information
needed for the evaluation is available at compile-time, making it quite
difficult to use:

  https://bugzilla.kernel.org/show_bug.cgi?id=214861

With -Warray-bounds almost enabled globally, it doesn't make sense to
keep this around.

Link: https://lkml.kernel.org/r/20211203235346.110809-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
Marco Elver [Thu, 20 Jan 2022 02:10:31 +0000 (18:10 -0800)]
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR

Until recent versions of GCC and Clang, it was not possible to disable
KCOV instrumentation via a function attribute.  The relevant function
attribute was introduced in 540540d06e9d9 ("kcov: add
__no_sanitize_coverage to fix noinstr for all architectures").

x86 was the first architecture to want a working noinstr, and at the
time no compiler support for the attribute existed yet.  Therefore,
commit 0f1441b44e823 ("objtool: Fix noinstr vs KCOV") introduced the
ability to NOP __sanitizer_cov_*() calls in .noinstr.text.

However, this doesn't work for other architectures like arm64 and s390
that want a working noinstr per ARCH_WANTS_NO_INSTR.

At the time of 0f1441b44e823, we didn't yet have ARCH_WANTS_NO_INSTR,
but now we can move the Kconfig dependency checks to the generic KCOV
option.  KCOV will be available if:

- architecture does not care about noinstr, OR
- we have objtool support (like on x86), OR
- GCC is 12.0 or newer, OR
- Clang is 13.0 or newer.
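
The four conditions above combine into a single boolean expression. The following is a minimal Python sketch of that rule, modeled outside Kconfig. The function name `kcov_available` and its parameters are illustrative assumptions, not kernel symbols, and only the noinstr-related part of the dependency is covered.

```python
def kcov_available(arch_wants_no_instr: bool,
                   has_objtool: bool,
                   gcc_version: tuple = (0, 0),
                   clang_version: tuple = (0, 0)) -> bool:
    """Model of the availability rule listed in the commit message.

    Versions are (major, minor) tuples; pass (0, 0) for a compiler
    that is not in use.
    """
    return (not arch_wants_no_instr          # arch does not care about noinstr
            or has_objtool                   # objtool can NOP the calls
            or gcc_version >= (12, 0)        # GCC has the attribute
            or clang_version >= (13, 0))     # Clang has the attribute
```

Any one satisfied condition is enough, so an architecture that wants noinstr and lacks objtool support still gets KCOV with a sufficiently recent compiler.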

Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
Nathan Chancellor [Thu, 20 Jan 2022 02:10:28 +0000 (18:10 -0800)]
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB

Commit b05fbcc36be1 ("btrfs: disable build on platforms having page size
256K") disabled btrfs for configurations that used a 256kB page size.
However, it did not fully solve the problem because CONFIG_TEST_KMOD
selects CONFIG_BTRFS, which does not account for the dependency.  This
results in a Kconfig warning and the failed BUILD_BUG_ON error
returning.

  WARNING: unmet direct dependencies detected for BTRFS_FS
    Depends on [n]: BLOCK [=y] && !PPC_256K_PAGES && !PAGE_SIZE_256KB [=y]
    Selected by [m]:
    - TEST_KMOD [=m] && RUNTIME_TESTING_MENU [=y] && m && MODULES [=y] && NETDEVICES [=y] && NET_CORE [=y] && INET [=y] && BLOCK [=y]

To resolve this, add CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency of
CONFIG_TEST_KMOD so there is no more invalid configuration or build
errors.
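
The warning arises because a Kconfig "select" turns a symbol on without evaluating that symbol's own "depends on" line. The Python sketch below models that behaviour with simplified stand-ins for the symbols in the warning. The helper `unmet_selects` and the reduced dependency sets are assumptions made for illustration, not part of Kconfig.

```python
def unmet_selects(enabled, depends, selects):
    """Return (selector, selected) pairs where an enabled symbol selects
    a symbol whose own dependencies are not satisfied.

    enabled: set of symbol names that are on
    depends: dict mapping symbol -> predicate over the enabled set
    selects: dict mapping symbol -> list of symbols it selects
    """
    problems = []
    for selector in sorted(enabled):
        for target in selects.get(selector, []):
            if not depends.get(target, lambda on: True)(enabled):
                problems.append((selector, target))
    return problems


# Simplified stand-ins: BTRFS_FS needs BLOCK and a page size below 256kB.
depends = {
    "BTRFS_FS": lambda on: "BLOCK" in on and "PAGE_SIZE_256KB" not in on,
}
selects = {"TEST_KMOD": ["BTRFS_FS"]}

before_fix = {"BLOCK", "PAGE_SIZE_256KB", "TEST_KMOD"}
print(unmet_selects(before_fix, depends, selects))
# prints [('TEST_KMOD', 'BTRFS_FS')]
```

Once TEST_KMOD carries the same page-size dependency, it can no longer be enabled in that configuration, so it drops out of the enabled set and the list of unmet selections is empty.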

Link: https://lkml.kernel.org/r/20211129230141.228085-4-nathan@kernel.org
Fixes: b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago btrfs: use generic Kconfig option for 256kB page size limit
Nathan Chancellor [Thu, 20 Jan 2022 02:10:25 +0000 (18:10 -0800)]
btrfs: use generic Kconfig option for 256kB page size limit

Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe
the dependency introduced by commit b05fbcc36be1 ("btrfs: disable build
on platforms having page size 256K").

Link: https://lkml.kernel.org/r/20211129230141.228085-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
Nathan Chancellor [Thu, 20 Jan 2022 02:10:22 +0000 (18:10 -0800)]
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB

Patch series "Fix CONFIG_TEST_KMOD with 256kB page size".

The kernel test robot reported a build error [1] from a failed assertion
in fs/btrfs/inode.c with a hexagon randconfig that includes
CONFIG_PAGE_SIZE_256KB.  This error is the same one that was addressed
by commit b05fbcc36be1 ("btrfs: disable build on platforms having page
size 256K") but CONFIG_TEST_KMOD selects CONFIG_BTRFS without having the
"page size less than 256kB" dependency, which results in the error
reappearing.

The first patch introduces CONFIG_PAGE_SIZE_LESS_THAN_256KB by splitting
it off from CONFIG_PAGE_SIZE_LESS_THAN_64KB, which was introduced in
commit 1f0e290cc5fd ("arch: Add generic Kconfig option indicating page
size smaller than 64k") for a similar reason in 5.16-rc3.

The second patch uses that configuration option for CONFIG_BTRFS to
reduce duplication.

The third patch resolves the build error by adding
CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency to CONFIG_TEST_KMOD so
that CONFIG_BTRFS does not get enabled under that invalid configuration.

[1]: https://lore.kernel.org/r/202111270255.UYOoN5VN-lkp@intel.com/

This patch (of 3):

btrfs requires a page size smaller than 256kB.  To use that dependency
in other places, introduce CONFIG_PAGE_SIZE_LESS_THAN_256KB and reuse
that dependency in CONFIG_PAGE_SIZE_LESS_THAN_64KB.

Link: https://lkml.kernel.org/r/20211129230141.228085-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20211129230141.228085-2-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago configs: introduce debug.config for CI-like setup
Qian Cai [Thu, 20 Jan 2022 02:10:18 +0000 (18:10 -0800)]
configs: introduce debug.config for CI-like setup

Some general debugging features like kmemleak, KASAN, lockdep, UBSAN etc.
help find many bugs, much as a microscope would.  On the other hand, those
features are scattered around and mixed up with more situational debugging
options, making them difficult to consume properly.  A shared debug.config
could help amplify the general debugging/testing efforts and help establish
sensible default values for those options across the board.  It could
also help different distros to collaborate on maintaining debug-flavored
kernels.

The config is based on years of experience running daily CI inside the
largest enterprise Linux distro company to look for regressions in
linux-next builds on different bare-metal and virtual platforms.  It can
be used, for example, as follows:

  $ make ARCH=arm64 defconfig debug.config

Since KASAN and KCSAN can't be enabled together, we will need to create
a separate one for KCSAN later as well.

Link: https://lkml.kernel.org/r/20211115134754.7334-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: "Stephen Rothwell" <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago delayacct: track delays from memory compact
wangyong [Thu, 20 Jan 2022 02:10:15 +0000 (18:10 -0800)]
delayacct: track delays from memory compact

Delay accounting does not track the delay caused by memory compaction.
When there is not enough free memory, tasks can spend some of their time
waiting for compaction.

To measure the impact of direct memory compaction on tasks, record the
delay incurred when memory allocation goes through compaction.

Also update tools/accounting/getdelays.c:

    / # ./getdelays_next  -di -p 304
    print delayacct stats ON
    printing IO accounting
    PID     304

    CPU             count     real total  virtual total    delay total  delay average
                      277      780000000      849039485       18877296          0.068ms
    IO              count    delay total  delay average
                        0              0              0ms
    SWAP            count    delay total  delay average
                        0              0              0ms
    RECLAIM         count    delay total  delay average
                        5    11088812685           2217ms
    THRASHING       count    delay total  delay average
                        0              0              0ms
    COMPACT         count    delay total  delay average
                        3          72758              0ms
    watch: read=0, write=0, cancelled_write=0
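
The "delay average" column in the sample output follows from dividing the total delay, reported in nanoseconds, by the event count. The Python sketch below reproduces the figures shown above. It assumes integer arithmetic for the RECLAIM and COMPACT rows and floating point for the CPU row, which is consistent with the sample but is not taken from the tool's source. The function names are illustrative.

```python
NSEC_PER_MSEC = 1_000_000


def average_ms(total_ns: int, count: int) -> int:
    """Integer average in milliseconds; a zero count is treated as 1."""
    return total_ns // NSEC_PER_MSEC // (count if count else 1)


def average_ms_float(total_ns: int, count: int) -> float:
    """Floating-point average in milliseconds, as in the CPU row."""
    return total_ns / NSEC_PER_MSEC / (count if count else 1)


print(average_ms(11088812685, 5))                 # RECLAIM row: 2217
print(average_ms(72758, 3))                       # COMPACT row: 0
print(f"{average_ms_float(18877296, 277):.3f}")   # CPU row: 0.068
```

The COMPACT row shows 0ms because three compaction stalls totalling about 73 microseconds round down to zero at millisecond resolution.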

Link: https://lkml.kernel.org/r/1638619795-71451-1-git-send-email-wang.yong12@zte.com.cn
Signed-off-by: wangyong <wang.yong12@zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Zhang Wenya <zhang.wenya1@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct...
wangyong [Thu, 20 Jan 2022 02:10:12 +0000 (18:10 -0800)]
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact

Add thrashing page cache and direct compact related descriptions and
update the usage of getdelays userspace utility.

The documentation reflects the changes made by the following patches:
https://lore.kernel.org/all/20190312102002.31737-4-jinpuwang@gmail.com/
https://lore.kernel.org/all/1638619795-71451-1-git-send-email-wang.yong12@zte.com.cn/

Link: https://lkml.kernel.org/r/1639583021-92977-1-git-send-email-wang.yong12@zte.com.cn
Signed-off-by: wangyong <wang.yong12@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago delayacct: cleanup flags in struct task_delay_info and functions use it
Yang Yang [Thu, 20 Jan 2022 02:10:09 +0000 (18:10 -0800)]
delayacct: cleanup flags in struct task_delay_info and functions use it

The flags field in struct task_delay_info is used to distinguish
between swapin and blkio delay accounting.  But after the patch "delayacct:
support swapin delay accounting for swapping without blkio", there is no
need to do that, since swapin and blkio delay accounting use their own
functions.

Link: https://lkml.kernel.org/r/20211124065958.36703-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago delayacct: fix incomplete disable operation when switch enable to disable
Yang Yang [Thu, 20 Jan 2022 02:10:06 +0000 (18:10 -0800)]
delayacct: fix incomplete disable operation when switch enable to disable

When a task is created after delayacct is enabled, the kernel does all
the delay accounting for that task.  The problem is that if the user
disables delayacct by setting /proc/sys/kernel/task_delayacct to zero,
only blkio delay accounting is disabled.

Now disable all kinds of delay accounting when
/proc/sys/kernel/task_delayacct is set to zero.
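
The switch discussed here is an ordinary sysctl file, so it can be toggled from userspace. The Python sketch below shows one way to do that. The helper names are illustrative assumptions, and writing to the real file requires root and a kernel built with delay accounting.

```python
from pathlib import Path

TASK_DELAYACCT = Path("/proc/sys/kernel/task_delayacct")


def set_delayacct(enabled: bool, path: Path = TASK_DELAYACCT) -> None:
    """Write 1 or 0 to the sysctl file (needs root on a real system)."""
    path.write_text("1\n" if enabled else "0\n")


def delayacct_enabled(path: Path = TASK_DELAYACCT) -> bool:
    """Report whether the sysctl file currently holds a non-zero value."""
    return path.read_text().strip() != "0"
```

The `path` parameter exists so the helpers can be tried against a scratch file without touching the running kernel.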

Link: https://lkml.kernel.org/r/20211123140342.32962-1-ran.xiaokai@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago delayacct: support swapin delay accounting for swapping without blkio
Yang Yang [Thu, 20 Jan 2022 02:10:02 +0000 (18:10 -0800)]
delayacct: support swapin delay accounting for swapping without blkio

Currently delayacct accounts swapin delay only for swapping that causes
blkio.  If we use zram for swapping, tools/accounting/getdelays can't
get any SWAP delay.

It's useful to get zram swapin delay information, for example to adjust
the compression algorithm or /proc/sys/vm/swappiness.

For reference, PSI accounts any kind of swapping by doing its work in
swap_readpage(), no matter whether swapping causes blkio.  Let delayacct
do similar work.

Link: https://lkml.kernel.org/r/20211112083813.8559-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago panic: remove oops_id
Sebastian Andrzej Siewior [Thu, 20 Jan 2022 02:09:59 +0000 (18:09 -0800)]
panic: remove oops_id

The oops id was added as part of the end of trace marker for the
kerneloops.org project.  The id is used to automatically identify
duplicate submissions of the same report.  Identical looking reports
with a different id can be considered the same oops occurring again.

The early initialisation of the oops_id can create a warning if the
random core is not yet fully initialized.  On PREEMPT_RT it is
problematic if the id is initialized on demand from a non-preemptible
context.

The kernel oops project has not been available since 2017.  Remove the
oops_id and use 0 in the output in case parsers rely on it.
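
Keeping a zero id in the marker means a log parser that expects an id field continues to match. The Python sketch below illustrates this, assuming the marker has the form "---[ end trace <16 hex digits> ]---". The regular expression and the `oops_id` helper are hypothetical and not taken from any existing parser.

```python
import re

# Assumed marker layout: "---[ end trace <16 hex digits> ]---".
END_TRACE = re.compile(r"---\[ end trace ([0-9a-f]{16}) \]---")


def oops_id(line: str):
    """Return the id from an end-of-trace marker as an int, or None."""
    match = END_TRACE.search(line)
    return int(match.group(1), 16) if match else None


# With the id removed, the field is always sixteen zero digits.
marker = "---[ end trace %016x ]---" % 0
print(marker)            # ---[ end trace 0000000000000000 ]---
print(oops_id(marker))   # 0
```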

Link: https://bugs.debian.org/953172
Link: https://lkml.kernel.org/r/Ybdi16aP2NEugWHq@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago panic: use error_report_end tracepoint on warnings
Marco Elver [Thu, 20 Jan 2022 02:09:56 +0000 (18:09 -0800)]
panic: use error_report_end tracepoint on warnings

Introduce the error detector "warning" to the error_report event and use
the error_report_end tracepoint at the end of a warning report.

This allows both in-kernel tests and userspace to more easily determine
if a warning occurred without polling kernel logs.

[akpm@linux-foundation.org: add comma to enum list, per Andy]

Link: https://lkml.kernel.org/r/20211115085630.1756817-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago fs/adfs: remove unneeded variable make code cleaner
Minghao Chi [Thu, 20 Jan 2022 02:09:53 +0000 (18:09 -0800)]
fs/adfs: remove unneeded variable make code cleaner

Return the value directly instead of storing it in a variable first.

Link: https://lkml.kernel.org/r/20211210023211.424609-1-chi.minghao@zte.com.cn
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cm>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago FAT: use io_schedule_timeout() instead of congestion_wait()
NeilBrown [Thu, 20 Jan 2022 02:09:50 +0000 (18:09 -0800)]
FAT: use io_schedule_timeout() instead of congestion_wait()

congestion_wait() in this context is just a sleep - block devices do not
support congestion signalling any more.

The goal for this wait, which was introduced in commit ae78bf9c4f5f
("[PATCH] add -o flush for fat") is to wait for any recently written
data to get to storage.  We currently have no direct mechanism to do
this, so a simple wait that behaves identically to the current
congestion_wait() is the best we can do.

This is a step towards removing congestion_wait()

Link: https://lkml.kernel.org/r/163936544519.22433.13400436295732112065@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago hfsplus: use struct_group_attr() for memcpy() region
Kees Cook [Thu, 20 Jan 2022 02:09:47 +0000 (18:09 -0800)]
hfsplus: use struct_group_attr() for memcpy() region

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add struct_group() to mark the "info" region (containing struct DInfo
and struct DXInfo structs) in struct hfsplus_cat_folder and struct
hfsplus_cat_file that are written into directly, so the compiler can
correctly reason about the expected size of the writes.

"pahole" shows no size nor member offset changes to struct
hfsplus_cat_folder nor struct hfsplus_cat_file.  "objdump -d" shows no
object code changes.

Link: https://lkml.kernel.org/r/20211119192851.1046717-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago nilfs2: remove redundant pointer sbufs
Colin Ian King [Thu, 20 Jan 2022 02:09:44 +0000 (18:09 -0800)]
nilfs2: remove redundant pointer sbufs

Pointer sbufs is being assigned a value but it's not being used later
on.  The pointer is redundant and can be removed.  Cleans up scan-build
static analysis warning:

  fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs'
    is used in the enclosing expression, the value is never actually read
    from 'sbufs' [deadcode.DeadStores]
        sbh = sbufs = page_buffers(src);

Link: https://lkml.kernel.org/r/20211211180955.550380-1-colin.i.king@gmail.com
Link: https://lkml.kernel.org/r/1640712476-15136-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago fs/binfmt_elf: use PT_LOAD p_align values for static PIE
H.J. Lu [Thu, 20 Jan 2022 02:09:40 +0000 (18:09 -0800)]
fs/binfmt_elf: use PT_LOAD p_align values for static PIE

Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries.  This
fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=215275

Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.
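
The idea of deriving the load alignment from the PT_LOAD p_align values can be sketched as follows. This is a simplified Python model written for illustration, not the kernel's code. The function names, the 4096-byte default page size, and the tuple layout of the program headers are assumptions.

```python
PT_LOAD = 1  # ELF program header type for loadable segments


def maximum_alignment(program_headers, page_size=4096):
    """Largest power-of-two p_align among PT_LOAD segments.

    program_headers: iterable of (p_type, p_align) pairs.
    The result is never smaller than page_size.
    """
    alignment = 0
    for p_type, p_align in program_headers:
        if p_type != PT_LOAD:
            continue
        # Skip zero and non-power-of-two values as invalid.
        if p_align == 0 or p_align & (p_align - 1):
            continue
        alignment = max(alignment, p_align)
    return max(alignment, page_size)


def align_up(address, alignment):
    """Round address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For a binary linked with -Wl,-z,max-page-size=0x200000, the PT_LOAD segments carry a p_align of 0x200000, so a load address chosen with `align_up(candidate, maximum_alignment(headers))` lands on a 2 MiB boundary.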

Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago const_structs.checkpatch: add frequently used ops structs
Rikard Falkeborn [Thu, 20 Jan 2022 02:09:37 +0000 (18:09 -0800)]
const_structs.checkpatch: add frequently used ops structs

Add commonly used structs (>50 instances) which are always or almost
always const.

Link: https://lkml.kernel.org/r/20211127101134.33101-1-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago checkpatch: improve Kconfig help test
Joe Perches [Thu, 20 Jan 2022 02:09:34 +0000 (18:09 -0800)]
checkpatch: improve Kconfig help test

The Kconfig help test erroneously counts patch context lines as part of
the help text.

Fix that and improve the message block output.

Link: https://lkml.kernel.org/r/06c0cdc157ae1502e8e9eb3624b9ea995cf11e7a.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago checkpatch: relax regexp for COMMIT_LOG_LONG_LINE
Jerome Forissier [Thu, 20 Jan 2022 02:09:31 +0000 (18:09 -0800)]
checkpatch: relax regexp for COMMIT_LOG_LONG_LINE

One exception to the COMMIT_LOG_LONG_LINE rule is a file path followed
by ':'.  That is typically some sort of diagnostic message from a compiler
or a build tool, in which case we don't want to wrap the lines but keep
the message unmodified.

The regular expression used to match this pattern currently doesn't
accept absolute paths or + characters.  This can result in false
positives as in the following (out-of-tree) example:

  ...
  /home/jerome/work/optee_repo_qemu/build/../toolchains/aarch32/bin/arm-linux-gnueabihf-ld.bfd: /home/jerome/work/toolchains-gcc10.2/aarch32/bin/../lib/gcc/arm-none-linux-gnueabihf/10.2.1/../../../../arm-none-linux-gnueabihf/lib/libstdc++.a(eh_alloc.o): in function `__cxa_allocate_exception':
  /tmp/dgboter/bbs/build03--cen7x86_64/buildbot/cen7x86_64--arm-none-linux-gnueabihf/build/src/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:284: undefined reference to `malloc'
  ...

Update the regular expression to match the above paths.
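
The effect of the relaxation can be sketched with two Python patterns. These are approximations written for illustration, not checkpatch's actual Perl expressions. The names `OLD_PATH`, `NEW_PATH` and `is_path_diagnostic`, and the shortened sample paths, are assumptions.

```python
import re

# Before: each directory component must be non-empty and may not contain '+'.
OLD_PATH = re.compile(r"^\s*(?:[\w.\-]+/)+[\w.\-]+:")
# After: an empty leading component (absolute path) and '+' are accepted.
NEW_PATH = re.compile(r"^\s*(?:[\w.\-+]*/)+[\w.\-+]+:")


def is_path_diagnostic(line, pattern=NEW_PATH):
    """True if the line starts with a file path followed by ':'."""
    return pattern.match(line) is not None


sample = "/tmp/build/src/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:284: undefined reference"
print(is_path_diagnostic(sample, OLD_PATH))   # False
print(is_path_diagnostic(sample, NEW_PATH))   # True
```

Relative paths without '+' characters match both patterns, so the relaxed expression only widens what is accepted.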

Link: https://lkml.kernel.org/r/20210923143842.2837983-1-jerome@forissier.org
Signed-off-by: Jerome Forissier <jerome@forissier.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test
Andrey Konovalov [Thu, 20 Jan 2022 02:09:28 +0000 (18:09 -0800)]
lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test

Make do_kmem_cache_size_bulk() destroy the cache it creates.

Link: https://lkml.kernel.org/r/aced20a94bf04159a139f0846e41d38a1537debb.1640018297.git.andreyknvl@google.com
Fixes: 03a9349ac0e0 ("lib/test_meminit: add a kmem_cache_alloc_bulk() test")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago uuid: remove licence boilerplate text from the header
Andy Shevchenko [Thu, 20 Jan 2022 02:09:25 +0000 (18:09 -0800)]
uuid: remove licence boilerplate text from the header

Remove licence boilerplate text from the UAPI header.

Link: https://lkml.kernel.org/r/20211216113552.81199-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago uuid: discourage people from using UAPI header in new code
Andy Shevchenko [Thu, 20 Jan 2022 02:09:22 +0000 (18:09 -0800)]
uuid: discourage people from using UAPI header in new code

Discourage people from using UAPI header in new code by adding a note.

Link: https://lkml.kernel.org/r/20211216113552.81199-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago kunit: replace kernel.h with the necessary inclusions
Andy Shevchenko [Thu, 20 Jan 2022 02:09:19 +0000 (18:09 -0800)]
kunit: replace kernel.h with the necessary inclusions

When kernel.h is used in headers it adds a lot to dependency hell,
especially when circular dependencies are involved.

Replace kernel.h inclusion with the list of what is really being used.

Link: https://lkml.kernel.org/r/20211213204441.56204-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago test_hash.c: refactor into kunit
Isabella Basso [Thu, 20 Jan 2022 02:09:15 +0000 (18:09 -0800)]
test_hash.c: refactor into kunit

Use the KUnit framework to make tests more easily integrable with CIs.  Even
though these tests are not yet properly written as unit tests, this
change should help in debugging.

Also remove kernel messages (i.e.  through pr_info), as KUnit handles all
debugging output, and let it handle module init and exit details.

Link: https://lkml.kernel.org/r/20211208183711.390454-6-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: David Gow <davidgow@google.com>
Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago lib/Kconfig.debug: properly split hash test kernel entries
Isabella Basso [Thu, 20 Jan 2022 02:09:12 +0000 (18:09 -0800)]
lib/Kconfig.debug: properly split hash test kernel entries

Split TEST_HASH so that each entry only has one file.

Note that there's no stringhash test file, but actually
<linux/stringhash.h> tests are performed in lib/test_hash.c.

Link: https://lkml.kernel.org/r/20211208183711.390454-5-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Enzo Ferreira <ferreiraenzoa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years ago test_hash.c: split test_hash_init
Isabella Basso [Thu, 20 Jan 2022 02:09:09 +0000 (18:09 -0800)]
test_hash.c: split test_hash_init

Split up test_hash_init so that it calls each test more explicitly
insofar as it is possible without rewriting the entire file.  This aims at
improving readability.

Split tests performed on string_or as they don't interfere with those
performed in hash_or.  Also separate pr_info calls about skipped tests
as they're not part of the tests themselves, but only warn about
(un)defined arch-specific hash functions.

Link: https://lkml.kernel.org/r/20211208183711.390454-4-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Enzo Ferreira <ferreiraenzoa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agotest_hash.c: split test_int_hash into arch-specific functions
Isabella Basso [Thu, 20 Jan 2022 02:09:05 +0000 (18:09 -0800)]
test_hash.c: split test_int_hash into arch-specific functions

Split the test_int_hash function to keep its mainloop separate from
arch-specific chunks, which are only compiled as needed.  This aims at
improving readability.
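
As a rough illustration of the split described above (not the actual
lib/test_hash.c code), the main loop can call one small helper per
arch-specific function, and each helper is only given a real body when
the matching HAVE_ARCH_* macro is defined.  All names ending in _sketch,
and check_arch_hash_32(), are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Generic reference hash, always compiled. */
static inline uint32_t generic_hash_32_sketch(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

#ifdef HAVE_ARCH__HASH_32
/* Hypothetical arch override, provided elsewhere when the macro is set. */
uint32_t arch_hash_32_sketch(uint32_t val);

/* Arch-specific chunk: compare the override against the generic version. */
static inline bool check_arch_hash_32(uint32_t val)
{
	return arch_hash_32_sketch(val) == generic_hash_32_sketch(val);
}
#else
/* No override on this architecture, so there is nothing to compare. */
static inline bool check_arch_hash_32(uint32_t val)
{
	(void)val;
	return true;
}
#endif

/* Main loop: free of #ifdef blocks, it only calls the helpers. */
static inline bool test_int_hash_sketch(uint32_t seed, unsigned int rounds)
{
	uint32_t val = seed;

	assert(rounds > 0);
	for (unsigned int i = 0; i < rounds; i++) {
		if (!check_arch_hash_32(val))
			return false;
		val = generic_hash_32_sketch(val) + i; /* vary the input */
	}
	return true;
}
```

Keeping the preprocessor conditionals around the helpers rather than
inside the loop is the design choice this sketch is meant to show.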

Link: https://lkml.kernel.org/r/20211208183711.390454-3-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Enzo Ferreira <ferreiraenzoa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agohash.h: remove unused define directive
Isabella Basso [Thu, 20 Jan 2022 02:09:02 +0000 (18:09 -0800)]
hash.h: remove unused define directive

Patch series "test_hash.c: refactor into KUnit", v3.

We refactored the lib/test_hash.c file into KUnit as part of the student
group LKCAMP [1] introductory hackathon for kernel development.

This test was pointed out to our group by Daniel Latypov [2], so its full
conversion into a pure KUnit test was our goal in this patch series, but
we ran into many problems relating to it not being split as unit tests,
which complicated matters a bit, as the reasoning behind the original
tests is quite cryptic for those unfamiliar with hash implementations.

Some interesting developments we'd like to highlight are:

 - In patch 1/5 we noticed that there was an unused define directive
   that could be removed.

 - In patch 4/5 we noticed how stringhash and hash tests are all under
   the lib/test_hash.c file, which might cause some confusion, and we
   also broke those kernel config entries up.

Overall KUnit developments have been made in the other patches in this
series:

In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as
to make it more compatible with the KUnit style, whilst preserving the
original idea of the maintainer who designed it (i.e.  George Spelvin),
which might be undesirable for unit tests, but we assume it is enough
for a first patch.

This patch (of 5):

Currently, there exist hash_32() and __hash_32() functions, which were
introduced in a patch [1] targeting architecture specific optimizations.
These functions can be overridden on a per-architecture basis to achieve
such optimizations.  They must set their corresponding define directive
(HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header
files can deal with these overrides properly.

As the supported 32-bit architectures that have their own hash function
implementation (i.e.  m68k, Microblaze, H8/300, PA-RISC) have only been
making use of the (more general) __hash_32() function (which only lacks
a right shift operation when compared to the hash_32() function), remove
the define directive corresponding to the arch-specific hash_32()
implementation.
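
The relationship between the two functions can be sketched in user
space as follows.  This is a minimal illustration assuming the generic
multiplicative hash with the 32-bit golden-ratio constant; the
sketch_* names are hypothetical and stand in for __hash_32() and
hash_32():

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Stand-in for __hash_32(): the part an architecture may override. */
static inline uint32_t sketch_hash_32_full(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

/*
 * Stand-in for hash_32(): the same hash followed by a right shift
 * that keeps only the top `bits` bits.
 */
static inline uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return sketch_hash_32_full(val) >> (32 - bits);
}
```

Since the shifted variant is a thin wrapper, an architecture that
overrides only the full-width function gets both for free.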

[1] https://lore.kernel.org/lkml/20160525073311.5600.qmail@ns.sciencehorizons.net/

[akpm@linux-foundation.org: hash_32_generic() becomes hash_32()]

Link: https://lkml.kernel.org/r/20211208183711.390454-1-isabbasso@riseup.net
Link: https://lkml.kernel.org/r/20211208183711.390454-2-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agolib/list_debug.c: print more list debugging context in __list_del_entry_valid()
Zhen Lei [Thu, 20 Jan 2022 02:08:59 +0000 (18:08 -0800)]
lib/list_debug.c: print more list debugging context in __list_del_entry_valid()

Currently, entry->prev and entry->next are considered valid as long as
they are not LIST_POISON{1|2}.  However, the memory may be corrupted.
When prev->next looks wrong, it is probably because 'prev' itself is
invalid, not because the content of prev->next is illegal.

Unfortunately, printk() and the functions it calls modify the registers
that hold 'prev' and 'next', so this valuable information is missing
from the BUG context.

So print the contents of 'entry->prev' and 'entry->next'.

Here's an example:
  list_del corruption. prev->next should be c0ecbf74, but was c08410dc
  kernel BUG at lib/list_debug.c:53!
  ... ...
  PC is at __list_del_entry_valid+0x58/0x98
  LR is at __list_del_entry_valid+0x58/0x98
  psr: 60000093
  sp : c0ecbf30  ip : 00000000  fp : 00000001
  r10: c08410d0  r9 : 00000001  r8 : c0825e0c
  r7 : 20000013  r6 : c08410d0  r5 : c0ecbf74  r4 : c0ecbf74
  r3 : c0825d08  r2 : 00000000  r1 : df7ce6f4  r0 : 00000044
  ... ...
  Stack: (0xc0ecbf30 to 0xc0ecc000)
  bf20:                                     c0ecbf74 c0164fd0 c0ecbf70 c0165170
  bf40: c0eca000 c0840c00 c0840c00 c0824500 c0825e0c c0189bbc c088f404 60000013
  bf60: 60000013 c0e85100 000004ec 00000000 c0ebcdc0 c0ecbf74 c0ecbf74 c0825d08
  bf80: c0e807c0 c018965c 00000000 c013f2a0 c0e807c0 c013f154 00000000 00000000
  bfa0: 00000000 00000000 00000000 c01001b0 00000000 00000000 00000000 00000000
  bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
  (__list_del_entry_valid) from (__list_del_entry+0xc/0x20)
  (__list_del_entry) from (finish_swait+0x60/0x7c)
  (finish_swait) from (rcu_gp_kthread+0x560/0xa20)
  (rcu_gp_kthread) from (kthread+0x14c/0x15c)
  (kthread) from (ret_from_fork+0x14/0x24)

At first, I thought prev->next was overwritten.  Later, I carefully
analyzed the RCU code and the disassembly code.  The error occurred when
deleting a node from the list rcu_state.gp_wq.  The System.map shows
that the address of rcu_state is c0840c00.  Then I used gdb to obtain the
offset of rcu_state.gp_wq.task_list.

  (gdb) p &((struct rcu_state *)0)->gp_wq.task_list
  $1 = (struct list_head *) 0x4dc

Again:
  list_del corruption. prev->next should be c0ecbf74, but was c08410dc

  c08410dc = c0840c00 + 0x4dc = &rcu_state.gp_wq.task_list
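
The address arithmetic above can be reproduced with a small sketch.
The struct layout here is hypothetical (the real struct rcu_state is
different); only the 0x4dc offset and the c0840c00 base address are
taken from the text, and target pointers are modelled as uint32_t so
the layout does not depend on the host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A list_head as seen on the 32-bit target. */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

/* Hypothetical stand-in: padding places task_list at offset 0x4dc. */
struct state_sketch {
	uint8_t pad[0x4dc];
	struct list_head32 task_list;
};

/* Address of a member, given the object's base address on the target. */
static inline uint32_t member_addr(uint32_t base, size_t offset)
{
	assert(offset <= UINT32_MAX); /* guard against truncation */
	return base + (uint32_t)offset;
}
```

Calling member_addr(0xc0840c00u, offsetof(struct state_sketch,
task_list)) yields 0xc08410dc, the value reported for prev->next.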

Because rcu_state.gp_wq has at most one node, I could guess that "prev
= &rcu_state.gp_wq.task_list".  In other scenarios I might not be so
lucky and would be unable to figure out the value of 'prev'.
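
A user-space sketch of the idea, assuming a plain circular doubly
linked list: when a neighbour does not point back at the entry, the
report also prints the neighbour pointer itself, so it no longer has to
be guessed.  The function name and the exact message wording are
illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Returns true when both neighbours of `entry` point back at it. */
static inline bool list_del_entry_valid_sketch(struct list_head *entry)
{
	struct list_head *prev, *next;

	assert(entry && entry->prev && entry->next);
	prev = entry->prev;
	next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p. (prev=%p)\n",
			(void *)entry, (void *)prev->next, (void *)prev);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p. (next=%p)\n",
			(void *)entry, (void *)next->prev, (void *)next);
		return false;
	}
	return true;
}
```

For instance, if a list head has been reset to point at itself while a
node still refers to it as prev, the first check fails and the printed
prev value identifies the head directly.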

Link: https://lkml.kernel.org/r/20211207025835.1909-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>