Ido Schimmel [Mon, 15 May 2023 16:29:25 +0000 (19:29 +0300)]
devlink: Fix crash with CONFIG_NET_NS=n
'__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this
option is disabled it becomes '__initdata', which means the data can be
freed after the initialization phase. This annotation is obviously
incorrect for the devlink net device notifier block which is still
registered after the initialization phase [1].
Fix this crash by removing the '__net_initdata' annotation.
[1]
general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:notifier_call_chain+0x58/0xc0
[...]
Call Trace:
<TASK>
dev_set_mac_address+0x85/0x120
dev_set_mac_address_user+0x30/0x50
do_setlink+0x219/0x1270
rtnl_setlink+0xf7/0x1a0
rtnetlink_rcv_msg+0x142/0x390
netlink_rcv_skb+0x58/0x100
netlink_unicast+0x188/0x270
netlink_sendmsg+0x214/0x470
__sys_sendto+0x12f/0x1a0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes:
e93c9378e33f ("devlink: change per-devlink netdev notifier to static one")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/netdev/
600ddf9e-589a-2aa0-7b69-
a438f833ca10@samsung.com/
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Mon, 15 May 2023 02:56:07 +0000 (19:56 -0700)]
net: bcmgenet: Restore phy_stop() depending upon suspend/close
Removing the phy_stop() from bcmgenet_netif_stop() ended up causing
warnings from the PHY library that phy_start() is called from the
RUNNING state since we are no longer stopping the PHY state machine
during bcmgenet_suspend().
Restore the call to phy_stop() but make it conditional on being called
from the close or suspend path.
Fixes:
c96e731c93ff ("net: bcmgenet: connect and disconnect from the PHY state machine")
Fixes:
93e0401e0fc0 ("net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230515025608.2587012-1-f.fainelli@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Sat, 13 May 2023 21:03:45 +0000 (22:03 +0100)]
net: phylink: fix ksettings_set() ethtool call
While testing a Fiberstore SFP-10G-T module (which uses 10GBASE-R with
rate adaption) in a Clearfog platform (which can't do that) it was
found that the PHYs advertisement was not limited according to the
hosts capabilities when using ethtool to change it.
Fix this by ensuring that we mask the advertisement with the computed
support mask as the very first thing we do.
Fixes:
cbc1bb1e4689 ("net: phylink: simplify phy case for ksettings_set method")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 May 2023 09:21:21 +0000 (10:21 +0100)]
Merge branch 'tipc-fixes'
Xin Long says:
====================
tipc: fix the mtu update in link mtu negotiation
This patchset fixes a crash caused by a too small MTU carried in the
activate msg. Note that as such malicious packet does not exist in
the normal env, the fix won't break any application
The 1st patch introduces a function to calculate the minimum MTU for
the bearer, and the 2nd patch fixes the crash with this helper. While
at it, the 3rd patch fixes the udp bearer mtu update by netlink with
this helper.
====================
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 14 May 2023 19:52:29 +0000 (15:52 -0400)]
tipc: check the bearer min mtu properly when setting it by netlink
Checking the bearer min mtu with tipc_udp_mtu_bad() only works for
IPv4 UDP bearer, and IPv6 UDP bearer has a different value for the
min mtu. This patch checks with encap_hlen + TIPC_MIN_BEARER_MTU
for min mtu, which works for both IPv4 and IPv6 UDP bearer.
Note that tipc_udp_mtu_bad() is still used to check media min mtu
in __tipc_nl_media_set(), as m->mtu currently is only used by the
IPv4 UDP bearer as its default mtu value.
Fixes:
682cd3cf946b ("tipc: confgiure and apply UDP bearer MTU on running links")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 14 May 2023 19:52:28 +0000 (15:52 -0400)]
tipc: do not update mtu if msg_max is too small in mtu negotiation
When doing link mtu negotiation, a malicious peer may send Activate msg
with a very small mtu, e.g. 4 in Shuang's testing, without checking for
the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(), then
n->links[bearer_id].mtu is set to
4294967228, which is a overflow of
'4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss().
With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning:
tipc: Too large msg, purging xmit list 1 5 0 40 4!
tipc: Too large msg, purging xmit list 1 15 0 60 4!
And with tipc_link_entry.mtu
4294967228, a huge skb was allocated in
named_distribute(), and when purging it in tipc_link_xmit(), a crash
was even caused:
general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19
RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0
Call Trace:
<IRQ>
skb_release_data+0xf9/0x1d0
kfree_skb_reason+0x40/0x100
tipc_link_xmit+0x57a/0x740 [tipc]
tipc_node_xmit+0x16c/0x5c0 [tipc]
tipc_named_node_up+0x27f/0x2c0 [tipc]
tipc_node_write_unlock+0x149/0x170 [tipc]
tipc_rcv+0x608/0x740 [tipc]
tipc_udp_recv+0xdc/0x1f0 [tipc]
udp_queue_rcv_one_skb+0x33e/0x620
udp_unicast_rcv_skb.isra.72+0x75/0x90
__udp4_lib_rcv+0x56d/0xc20
ip_protocol_deliver_rcu+0x100/0x2d0
This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(),
and not updating mtu if it is too small.
Fixes:
ed193ece2649 ("tipc: simplify link mtu negotiation")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 14 May 2023 19:52:27 +0000 (15:52 -0400)]
tipc: add tipc_bearer_min_mtu to calculate min mtu
As different media may requires different min mtu, and even the
same media with different net family requires different min mtu,
add tipc_bearer_min_mtu() to calculate min mtu accordingly.
This API will be used to check the new mtu when doing the link
mtu negotiation in the next patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Sat, 13 May 2023 08:57:27 +0000 (09:57 +0100)]
net: mdio: i2c: fix rollball accessors
Commit
87e3bee0f247 ("net: mdio: i2c: Separate C22 and C45 transactions")
separated the non-rollball bus accessors, but left the rollball
accessors as is. As rollball accessors are clause 45, this results
in the rollball protocol being completely non-functional. Fix this.
Fixes:
87e3bee0f247 ("net: mdio: i2c: Separate C22 and C45 transactions")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Feng Liu [Fri, 12 May 2023 15:18:12 +0000 (11:18 -0400)]
virtio_net: Fix error unwinding of XDP initialization
When initializing XDP in virtnet_open(), some rq xdp initialization
may hit an error causing net device open failed. However, previous
rqs have already initialized XDP and enabled NAPI, which is not the
expected behavior. Need to roll back the previous rq initialization
to avoid leaks in error unwinding of init code.
Also extract helper functions of disable and enable queue pairs.
Use newly introduced disable helper function in error unwinding and
virtnet_close. Use enable helper function in virtnet_open.
Fixes:
754b8a21a96d ("virtio_net: setup xdp_rxq_info")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shenwei Wang [Fri, 12 May 2023 13:38:43 +0000 (08:38 -0500)]
net: fec: remove the xdp_return_frame when lack of tx BDs
In the implementation, the sent_frame count does not increment when
transmit errors occur. Therefore, bq_xmit_all() will take care of
returning the XDP frames.
Fixes:
26312c685ae0 ("net: fec: correct the counting of XDP sent frames")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dong Chenchen [Thu, 11 May 2023 12:54:40 +0000 (20:54 +0800)]
net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment()
As the call trace shows, skb_panic was caused by wrong skb->mac_header
in nsh_gso_segment():
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-
20230505 #1
RIP: 0010:skb_panic+0xda/0xe0
call Trace:
skb_push+0x91/0xa0
nsh_gso_segment+0x4f3/0x570
skb_mac_gso_segment+0x19e/0x270
__skb_gso_segment+0x1e8/0x3c0
validate_xmit_skb+0x452/0x890
validate_xmit_skb_list+0x99/0xd0
sch_direct_xmit+0x294/0x7c0
__dev_queue_xmit+0x16f0/0x1d70
packet_xmit+0x185/0x210
packet_snd+0xc15/0x1170
packet_sendmsg+0x7b/0xa0
sock_sendmsg+0x14f/0x160
The root cause is:
nsh_gso_segment() use skb->network_header - nhoff to reset mac_header
in skb_gso_error_unwind() if inner-layer protocol gso fails.
However, skb->network_header may be reset by inner-layer protocol
gso function e.g. mpls_gso_segment. skb->mac_header reset by the
inaccurate network_header will be larger than skb headroom.
nsh_gso_segment
nhoff = skb->network_header - skb->mac_header;
__skb_pull(skb,nsh_len)
skb_mac_gso_segment
mpls_gso_segment
skb_reset_network_header(skb);//skb->network_header+=nsh_len
return -EINVAL;
skb_gso_error_unwind
skb_push(skb, nsh_len);
skb->mac_header = skb->network_header - nhoff;
// skb->mac_header > skb->headroom, cause skb_push panic
Use correct mac_offset to restore mac_header and get rid of nhoff.
Fixes:
c411ed854584 ("nsh: add GSO support")
Reported-by: syzbot+632b5d9964208bfef8c0@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 May 2023 16:12:24 +0000 (17:12 +0100)]
Merge branch 'hns3-fixes'
Hao Lan says:
====================
net: hns3: fix some bug for hns3
There are some bugfixes for the HNS3 ethernet driver. patch#1 fix miss
checking for rx packet. patch#2 fixes VF promisc mode not update
when mac table full bug, and patch#3 fixes a nterrupts not
initialization in VF FLR bug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jijie Shao [Fri, 12 May 2023 10:00:14 +0000 (18:00 +0800)]
net: hns3: fix reset timeout when enable full VF
The timeout of the cmdq reset command has been increased to
resolve the reset timeout issue in the full VF scenario.
The timeout of other cmdq commands remains unchanged.
Fixes:
8d307f8e8cf1 ("net: hns3: create new set of unified hclge_comm_cmd_send APIs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Fri, 12 May 2023 10:00:13 +0000 (18:00 +0800)]
net: hns3: fix reset delay time to avoid configuration timeout
Currently the hns3 vf function reset delays 5000ms before vf rebuild
process. In product applications, this delay is too long for application
configurations and causes configuration timeout.
According to the tests, 500ms delay is enough for reset process except PF
FLR. So this patch modifies delay to 500ms in these scenarios.
Fixes:
6988eb2a9b77 ("net: hns3: Add support to reset the enet/ring mgmt layer")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jijie Shao [Fri, 12 May 2023 10:00:12 +0000 (18:00 +0800)]
net: hns3: fix sending pfc frames after reset issue
To prevent the system from abnormally sending PFC frames after an
abnormal reset. The hns3 driver notifies the firmware to disable pfc
before reset.
Fixes:
35d93a30040c ("net: hns3: adjust the process of PF reset")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Wang [Fri, 12 May 2023 10:00:11 +0000 (18:00 +0800)]
net: hns3: fix output information incomplete for dumping tx queue info with debugfs
In function hns3_dump_tx_queue_info, The print buffer is not enough when
the tx BD number is configured to 32760. As a result several BD
information wouldn't be displayed.
So fix it by increasing the tx queue print buffer length.
Fixes:
630a6738da82 ("net: hns3: adjust string spaces of some parameters of tx bd info in debugfs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 May 2023 16:06:38 +0000 (17:06 +0100)]
Merge branch 'dsa-rzn1-a5psw-stp'
Alexis Lothoré says:
====================
net: dsa: rzn1-a5psw: fix STP states handling
This small series fixes STP support and while adding a new function to
enable/disable learning, use that to disable learning on standalone ports
at switch setup as reported by Vladimir Oltean.
This series was initially submitted on net-next by Clement Leger, but some
career evolutions has made him hand me over those topics.
Also, this new revision is submitted on net instead of net-next for V1
based on Vladimir Oltean's suggestion
Changes since v2:
- fix commit split by moving A5PSW_MGMT_CFG_ENABLE in relevant commit
- fix reverse christmas tree ordering in a5psw_port_stp_state_set
Changes since v1:
- fix typos in commit messages and doc
- re-split STP states handling commit
- add Fixes: tag and new Signed-off-by
- submit series as fix on net instead of net-next
- split learning and blocking setting functions
- remove unused define A5PSW_PORT_ENA_TX_SHIFT
- add boolean for tx/rx enabled for clarity
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Clément Léger [Fri, 12 May 2023 07:27:12 +0000 (09:27 +0200)]
net: dsa: rzn1-a5psw: disable learning for standalone ports
When ports are in standalone mode, they should have learning disabled to
avoid adding new entries in the MAC lookup table which might be used by
other bridge ports to forward packets. While adding that, also make sure
learning is enabled for CPU port.
Fixes:
888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexis Lothoré [Fri, 12 May 2023 07:27:11 +0000 (09:27 +0200)]
net: dsa: rzn1-a5psw: fix STP states handling
stp_set_state() should actually allow receiving BPDU while in LEARNING
mode which is not the case. Additionally, the BLOCKEN bit does not
actually forbid sending forwarded frames from that port. To fix this, add
a5psw_port_tx_enable() function which allows to disable TX. However, while
its name suggest that TX is totally disabled, it is not and can still
allow to send BPDUs even if disabled. This can be done by using forced
forwarding with the switch tagging mechanism but keeping "filtering"
disabled (which is already the case in the rzn1-a5sw tag driver). With
these fixes, STP support is now functional.
Fixes:
888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clément Léger [Fri, 12 May 2023 07:27:10 +0000 (09:27 +0200)]
net: dsa: rzn1-a5psw: enable management frames for CPU port
Currently, management frame were discarded before reaching the CPU port due
to a misconfiguration of the MGMT_CONFIG register. Enable them by setting
the correct value in this register in order to correctly receive management
frame and handle STP.
Fixes:
888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 11 May 2023 23:22:11 +0000 (19:22 -0400)]
erspan: get the proto with the md version for collect_md
In commit
20704bd1633d ("erspan: build the header with the right proto
according to erspan_ver"), it gets the proto with t->parms.erspan_ver,
but t->parms.erspan_ver is not used by collect_md branch, and instead
it should get the proto with md->version for collect_md.
Thanks to Kevin for pointing this out.
Fixes:
20704bd1633d ("erspan: build the header with the right proto according to erspan_ver")
Fixes:
94d7d8f29287 ("ip6_gre: add erspan v2 support")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 11 May 2023 11:47:49 +0000 (11:47 +0000)]
tcp: fix possible sk_priority leak in tcp_v4_send_reset()
When tcp_v4_send_reset() is called with @sk == NULL,
we do not change ctl_sk->sk_priority, which could have been
set from a prior invocation.
Change tcp_v4_send_reset() to set sk_priority and sk_mark
fields before calling ip_send_unicast_reply().
This means tcp_v4_send_reset() and tcp_v4_send_ack()
no longer have to clear ctl_sk->sk_mark after
their call to ip_send_unicast_reply().
Fixes:
f6c0f5d209fa ("tcp: honor SO_PRIORITY in TIME_WAIT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhuang Shengen [Thu, 11 May 2023 11:34:30 +0000 (19:34 +0800)]
vsock: avoid to close connected socket after the timeout
When client and server establish a connection through vsock,
the client send a request to the server to initiate the connection,
then start a timer to wait for the server's response. When the server's
RESPONSE message arrives, the timer also times out and exits. The
server's RESPONSE message is processed first, and the connection is
established. However, the client's timer also times out, the original
processing logic of the client is to directly set the state of this vsock
to CLOSE and return ETIMEDOUT. It will not notify the server when the port
is released, causing the server port remain.
when client's vsock_connect timeout,it should check sk state is
ESTABLISHED or not. if sk state is ESTABLISHED, it means the connection
is established, the client should not set the sk state to CLOSE
Note: I encountered this issue on kernel-4.18, which can be fixed by
this patch. Then I checked the latest code in the community
and found similar issue.
Fixes:
d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Zhuang Shengen <zhuangshengen@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Thu, 11 May 2023 09:43:33 +0000 (10:43 +0100)]
sfc: disable RXFCS and RXALL features by default
By default we would not want RXFCS and RXALL features enabled as they are
mainly intended for debugging purposes. This does not stop users from
enabling them later on as needed.
Fixes:
8e57daf70671 ("sfc_ef100: RX path for EF100")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Sokolowski [Thu, 11 May 2023 15:53:19 +0000 (08:53 -0700)]
ice: Fix undersized tx_flags variable
As not all ICE_TX_FLAGS_* fit in current 16-bit limited
tx_flags field that was introduced in the Fixes commit,
VLAN-related information would be discarded completely.
As such, creating a vlan and trying to run ping through
would result in no traffic passing.
Fix that by refactoring tx_flags variable into flags only and
a separate variable that holds VLAN ID. As there is some space left,
type variable can fit between those two. Pahole reports no size
change to ice_tx_buf struct.
Fixes:
aa1d3faf71a6 ("ice: Robustify cleaning/completing XDP Tx buffers")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 11 May 2023 16:03:10 +0000 (09:03 -0700)]
MAINTAINERS: exclude wireless drivers from netdev
It seems that we mostly get netdev CCed on wireless patches
which are written by people who don't know any better and
CC everything that get_maintainers spits out. Rather than
patches which indeed could benefit from general networking
review.
Marking them down in patchwork as Awaiting Upstream is
a bit tedious.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huayu Chen [Thu, 11 May 2023 06:50:56 +0000 (08:50 +0200)]
nfp: fix NFP_NET_MAX_DSCP definition error
The patch corrects the NFP_NET_MAX_DSCP definition in the main.h file.
The incorrect definition result DSCP bits not being mapped properly when
DCB is set. When NFP_NET_MAX_DSCP was defined as 4, the next 60 DSCP
bits failed to be set.
Fixes:
9b7fe8046d74 ("nfp: add DCB IEEE support")
Cc: stable@vger.kernel.org
Signed-off-by: Huayu Chen <huayu.chen@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 11 May 2023 01:43:39 +0000 (18:43 -0700)]
MAINTAINERS: don't CC docs@ for netlink spec changes
Documentation/netlink/ contains machine-readable protocol
specs in YAML. Those are much like device tree bindings,
no point CCing docs@ for the changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Marcelo Ricardo Leitner [Wed, 10 May 2023 22:42:43 +0000 (19:42 -0300)]
MAINTAINERS: sctp: move Neil to CREDITS
Neil moved away from SCTP related duties.
Move him to CREDITS then and while at it, update SCTP
project website.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Wed, 10 May 2023 12:51:39 +0000 (18:21 +0530)]
net: phy: dp83867: add w/a for packet errors seen with short cables
Introduce the W/A for packet errors seen with short cables (<1m) between
two DP83867 PHYs.
The W/A recommended by DM requires FFE Equalizer Configuration tuning by
writing value 0x0E81 to DSP_FFE_CFG register (0x012C), surrounded by hard
and soft resets as follows:
write_reg(0x001F, 0x8000); //hard reset
write_reg(DSP_FFE_CFG, 0x0E81);
write_reg(0x001F, 0x4000); //soft reset
Since DP83867 PHY DM says "Changing this register to 0x0E81, will not
affect Long Cable performance.", enable the W/A by default.
Fixes:
2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Wed, 10 May 2023 20:00:20 +0000 (22:00 +0200)]
net: fec: Better handle pm_runtime_get() failing in .remove()
In the (unlikely) event that pm_runtime_get() (disguised as
pm_runtime_resume_and_get()) fails, the remove callback returned an
error early. The problem with this is that the driver core ignores the
error value and continues removing the device. This results in a
resource leak. Worse the devm allocated resources are freed and so if a
callback of the driver is called later the register mapping is already
gone which probably results in a crash.
Fixes:
a31eda65ba21 ("net: fec: fix clock count mis-match")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230510200020.1534610-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 10 May 2023 15:46:46 +0000 (15:46 +0000)]
ipv6: remove nexthop_fib6_nh_bh()
After blamed commit, nexthop_fib6_nh_bh() and nexthop_fib6_nh()
are the same.
Delete nexthop_fib6_nh_bh(), and convert /proc/net/ipv6_route
to standard rcu to avoid this splat:
[ 5723.180080] WARNING: suspicious RCU usage
[ 5723.180083] -----------------------------
[ 5723.180084] include/net/nexthop.h:516 suspicious rcu_dereference_check() usage!
[ 5723.180086]
other info that might help us debug this:
[ 5723.180087]
rcu_scheduler_active = 2, debug_locks = 1
[ 5723.180089] 2 locks held by cat/55856:
[ 5723.180091] #0:
ffff9440a582afa8 (&p->lock){+.+.}-{3:3}, at: seq_read_iter (fs/seq_file.c:188)
[ 5723.180100] #1:
ffffffffaac07040 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire (include/linux/rcupdate.h:326)
[ 5723.180109]
stack backtrace:
[ 5723.180111] CPU: 14 PID: 55856 Comm: cat Tainted: G S I 6.3.0-dbx-DEV #528
[ 5723.180115] Call Trace:
[ 5723.180117] <TASK>
[ 5723.180119] dump_stack_lvl (lib/dump_stack.c:107)
[ 5723.180124] dump_stack (lib/dump_stack.c:114)
[ 5723.180126] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 5723.180132] ipv6_route_seq_show (include/net/nexthop.h:?)
[ 5723.180135] ? ipv6_route_seq_next (net/ipv6/ip6_fib.c:2605)
[ 5723.180140] seq_read_iter (fs/seq_file.c:272)
[ 5723.180145] seq_read (fs/seq_file.c:163)
[ 5723.180151] proc_reg_read (fs/proc/inode.c:316 fs/proc/inode.c:328)
[ 5723.180155] vfs_read (fs/read_write.c:468)
[ 5723.180160] ? up_read (kernel/locking/rwsem.c:1617)
[ 5723.180164] ksys_read (fs/read_write.c:613)
[ 5723.180168] __x64_sys_read (fs/read_write.c:621)
[ 5723.180170] do_syscall_64 (arch/x86/entry/common.c:?)
[ 5723.180174] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 5723.180177] RIP: 0033:0x7fa455677d2a
Fixes:
09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230510154646.370659-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Wed, 10 May 2023 14:46:21 +0000 (16:46 +0200)]
devlink: change per-devlink netdev notifier to static one
The commit
565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be
per-devlink instance. That fixed the issue of non-receiving events
of netdev uninit if that moved to a different namespace.
That worked fine in -net tree.
However, later on when commit
ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit
72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.
Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1
Resolve this by converting the notifier from per-devlink instance to
a static one registered during init phase and leaving it registered
forever. Use this notifier for all devlink port instances created
later on.
Note what a tree needs this fix only in case all of the cited fixes
commits are present.
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Fixes:
565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
Fixes:
ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes:
72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 12 May 2023 01:01:40 +0000 (18:01 -0700)]
Merge branch 'selftests-seg6-make-srv6_end_dt4_l3vpn_test-more-robust'
Andrea Mayer says:
====================
selftests: seg6: make srv6_end_dt4_l3vpn_test more robust
This pachset aims to improve and make more robust the selftests performed to
check whether SRv6 End.DT4 beahvior works as expected under different system
configurations.
Some Linux distributions enable Deduplication Address Detection and Reverse
Path Filtering mechanisms by default which can interfere with SRv6 End.DT4
behavior and cause selftests to fail.
The following patches improve selftests for End.DT4 by taking these two
mechanisms into account. Specifically:
- patch 1/2: selftests: seg6: disable DAD on IPv6 router cfg for
srv6_end_dt4_l3vpn_test
- patch 2/2: selftets: seg6: disable rp_filter by default in
srv6_end_dt4_l3vpn_test
====================
Link: https://lore.kernel.org/r/20230510111638.12408-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrea Mayer [Wed, 10 May 2023 11:16:38 +0000 (13:16 +0200)]
selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_test
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes:
2195444e09b4 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrea Mayer [Wed, 10 May 2023 11:16:37 +0000 (13:16 +0200)]
selftests: seg6: disable DAD on IPv6 router cfg for srv6_end_dt4_l3vpn_test
The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
several routers (rt-1, rt-2) and hosts.
When the IPv6 addresses of rt-{1,2} routers are configured, the Deduplicate
Address Detection (DAD) kicks in when enabled in the Linux distros running
the selftests. DAD is used to check whether an IPv6 address is already
assigned in a network. Such a mechanism consists of sending an ICMPv6 Echo
Request and waiting for a reply.
As the DAD process could take too long to complete, it may cause the
failing of some tests carried out by the srv6_end_dt4_l3vpn_test script.
To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
since we configure the virtual network manually and do not need any address
deduplication mechanism at all.
Fixes:
2195444e09b4 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 11 May 2023 13:42:47 +0000 (08:42 -0500)]
Merge tag 'net-6.4-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- mtk_eth_soc: fix NULL pointer dereference
Previous releases - regressions:
- core:
- skb_partial_csum_set() fix against transport header magic value
- fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
- annotate sk->sk_err write from do_recvmmsg()
- add vlan_get_protocol_and_depth() helper
- netlink: annotate accesses to nlk->cb_running
- netfilter: always release netdev hooks from notifier
Previous releases - always broken:
- core: deal with most data-races in sk_wait_event()
- netfilter: fix possible bug_on with enable_hooks=1
- eth: bonding: fix send_peer_notif overflow
- eth: xpcs: fix incorrect number of interfaces
- eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
- eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"
* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
af_unix: Fix data races around sk->sk_shutdown.
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
net: datagram: fix data-races in datagram_poll()
net: mscc: ocelot: fix stat counter register values
ipvlan:Fix out-of-bounds caused by unclear skb->cb
docs: networking: fix x25-iface.rst heading & index order
gve: Remove the code of clearing PBA bit
tcp: add annotations around sk->sk_shutdown accesses
net: add vlan_get_protocol_and_depth() helper
net: pcs: xpcs: fix incorrect number of interfaces
net: deal with most data-races in sk_wait_event()
net: annotate sk->sk_err write from do_recvmmsg()
netlink: annotate accesses to nlk->cb_running
kselftest: bonding: add num_grat_arp test
selftests: forwarding: lib: add netns support for tc rule handle stats get
Documentation: bonding: fix the doc of peer_notif_delay
bonding: fix send_peer_notif overflow
net: ethernet: mtk_eth_soc: fix NULL pointer dereference
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
...
Linus Torvalds [Thu, 11 May 2023 13:35:52 +0000 (08:35 -0500)]
Merge tag 'media/v6.4-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix some unused-variable warning in mtk-mdp3
- ignore unused suspend operations in nxp
- some driver fixes in rcar-vin
* tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: platform: mtk-mdp3: work around unused-variable warning
media: nxp: ignore unused suspend operations
media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
media: rcar-vin: Fix NV12 size alignment
media: rcar-vin: Gen3 can not scale NV12
Jakub Kicinski [Thu, 11 May 2023 02:08:58 +0000 (19:08 -0700)]
Merge tag 'nf-23-05-10' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: use /proc for pid checking
netfilter: conntrack: fix possible bug_on with enable_hooks=1
netfilter: nf_tables: always release netdev hooks from notifier
====================
Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 11 May 2023 02:06:55 +0000 (19:06 -0700)]
Merge branch 'af_unix-fix-two-data-races-reported-by-kcsan'
Kuniyuki Iwashima says:
====================
af_unix: Fix two data races reported by KCSAN.
KCSAN reported data races around these two fields for AF_UNIX sockets.
* sk->sk_receive_queue->qlen
* sk->sk_shutdown
Let's annotate them properly.
====================
Link: https://lore.kernel.org/r/20230510003456.42357-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 10 May 2023 00:34:56 +0000 (17:34 -0700)]
af_unix: Fix data races around sk->sk_shutdown.
KCSAN found a data race around sk->sk_shutdown where unix_release_sock()
and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll()
and unix_dgram_poll() read it locklessly.
We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
BUG: KCSAN: data-race in unix_poll / unix_release_sock
write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0:
unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
unix_release+0x59/0x80 net/unix/af_unix.c:1042
__sock_release+0x7d/0x170 net/socket.c:653
sock_close+0x19/0x30 net/socket.c:1397
__fput+0x179/0x5e0 fs/file_table.c:321
____fput+0x15/0x20 fs/file_table.c:349
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1:
unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170
sock_poll+0xcf/0x2b0 net/socket.c:1385
vfs_poll include/linux/poll.h:88 [inline]
ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855
ep_send_events fs/eventpoll.c:1694 [inline]
ep_poll fs/eventpoll.c:1823 [inline]
do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258
__do_sys_epoll_wait fs/eventpoll.c:2270 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2265 [inline]
__x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes:
3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 10 May 2023 00:34:55 +0000 (17:34 -0700)]
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
updates qlen under the queue lock and sendmsg() checks qlen under
unix_state_sock(), not the queue lock, so the reader side needs
READ_ONCE().
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
__skb_unlink include/linux/skbuff.h:2347 [inline]
__skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
__skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
__unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
sock_recvmsg_nosec net/socket.c:1019 [inline]
____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
___sys_recvmsg+0xc8/0x150 net/socket.c:2764
do_recvmmsg+0x182/0x560 net/socket.c:2858
__sys_recvmmsg net/socket.c:2937 [inline]
__do_sys_recvmmsg net/socket.c:2960 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
skb_queue_len include/linux/skbuff.h:2127 [inline]
unix_recvq_full net/unix/af_unix.c:229 [inline]
unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:747
____sys_sendmsg+0x20e/0x620 net/socket.c:2503
___sys_sendmsg+0xc6/0x140 net/socket.c:2557
__sys_sendmmsg+0x11d/0x370 net/socket.c:2643
__do_sys_sendmmsg net/socket.c:2672 [inline]
__se_sys_sendmmsg net/socket.c:2669 [inline]
__x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0000000b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 9 May 2023 17:31:31 +0000 (17:31 +0000)]
net: datagram: fix data-races in datagram_poll()
datagram_poll() runs locklessly, we should add READ_ONCE()
annotations while reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 11 May 2023 01:04:04 +0000 (20:04 -0500)]
MAINTAINERS: re-sort all entries and fields
It's been a few years since we've sorted this thing, and the end result
is that we've added MAINTAINERS entries in the wrong order, and a number
of entries have their fields in non-canonical order too.
So roll this boulder up the hill one more time by re-running
./scripts/parse-maintainers.pl --order
on it.
This file ends up being fairly painful for merge conflicts even
normally, since unlike almost all other kernel files it's one of those
"everybody touches the same thing", and re-ordering all entries is only
going to make that worse. But the alternative is to never do it at all,
and just let it all rot..
The rc2 week is likely the quietest and least painful time to do this.
Requested-by: Randy Dunlap <rdunlap@infradead.org>
Requested-by: Joe Perches <joe@perches.com> # "Please use --order"
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 10 May 2023 22:07:42 +0000 (17:07 -0500)]
Merge tag 'fsnotify_for_v6.4-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull inotify fix from Jan Kara:
"A fix for possibly reporting invalid watch descriptor with inotify
event"
* tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify: Avoid reporting event with invalid wd
Linus Torvalds [Wed, 10 May 2023 21:48:35 +0000 (16:48 -0500)]
Merge tag 'gfs2-v6.3-fix' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- Fix a NULL pointer dereference when mounting corrupted filesystems
* tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't deref jdesc in evict
Bob Peterson [Fri, 28 Apr 2023 16:07:46 +0000 (12:07 -0400)]
gfs2: Don't deref jdesc in evict
On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:
init_journal()
...
fail_jindex:
gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
if (gfs2_holder_initialized(&ji_gh))
gfs2_glock_dq_uninit(&ji_gh);
fail:
iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
evict()
gfs2_evict_inode()
evict_linked_inode()
ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.
The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.
This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.
Reported-by: Yang Lan <lanyang0908@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Linus Torvalds [Wed, 10 May 2023 14:36:42 +0000 (09:36 -0500)]
Merge tag 'platform-drivers-x86-v6.4-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Nothing special to report just various small fixes:
- thinkpad_acpi: Fix profile (performance/bal/low-power) regression
on T490
- misc other small fixes / hw-id additions"
* tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: fix potential race in mlxbf-tmfifo driver
platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
platform/x86: thinkpad_acpi: Add profile force ability
platform/x86: thinkpad_acpi: Fix platform profiles on T490
platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
platform/x86/intel-uncore-freq: Return error on write frequency
platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
Colin Foster [Wed, 10 May 2023 04:48:51 +0000 (21:48 -0700)]
net: mscc: ocelot: fix stat counter register values
Commit
d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg
address, not offset") organized the stats counters for Ocelot chips, namely
the VSC7512 and VSC7514. A few of the counter offsets were incorrect, and
were caught by this warning:
WARNING: CPU: 0 PID: 24 at drivers/net/ethernet/mscc/ocelot_stats.c:909
ocelot_stats_init+0x1fc/0x2d8
reg 0x5000078 had address 0x220 but reg 0x5000079 has address 0x214,
bulking broken!
Fix these register offsets.
Fixes:
d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
t.feng [Wed, 10 May 2023 03:50:44 +0000 (11:50 +0800)]
ipvlan:Fix out-of-bounds caused by unclear skb->cb
If skb enqueue the qdisc, fq_skb_cb(skb)->time_to_send is changed which
is actually skb->cb, and IPCB(skb_in)->opt will be used in
__ip_options_echo. It is possible that memcpy is out of bounds and lead
to stack overflow.
We should clear skb->cb before ip_local_out or ip6_local_out.
v2:
1. clean the stack info
2. use IPCB/IP6CB instead of skb->cb
crash on stable-5.10(reproduce in kasan kernel).
Stack info:
[ 2203.651571] BUG: KASAN: stack-out-of-bounds in
__ip_options_echo+0x589/0x800
[ 2203.653327] Write of size 4 at addr
ffff88811a388f27 by task
swapper/3/0
[ 2203.655460] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted
5.10.0-60.18.0.50.h856.kasan.eulerosv2r11.x86_64 #1
[ 2203.655466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
[ 2203.655475] Call Trace:
[ 2203.655481] <IRQ>
[ 2203.655501] dump_stack+0x9c/0xd3
[ 2203.655514] print_address_description.constprop.0+0x19/0x170
[ 2203.655530] __kasan_report.cold+0x6c/0x84
[ 2203.655586] kasan_report+0x3a/0x50
[ 2203.655594] check_memory_region+0xfd/0x1f0
[ 2203.655601] memcpy+0x39/0x60
[ 2203.655608] __ip_options_echo+0x589/0x800
[ 2203.655654] __icmp_send+0x59a/0x960
[ 2203.655755] nf_send_unreach+0x129/0x3d0 [nf_reject_ipv4]
[ 2203.655763] reject_tg+0x77/0x1bf [ipt_REJECT]
[ 2203.655772] ipt_do_table+0x691/0xa40 [ip_tables]
[ 2203.655821] nf_hook_slow+0x69/0x100
[ 2203.655828] __ip_local_out+0x21e/0x2b0
[ 2203.655857] ip_local_out+0x28/0x90
[ 2203.655868] ipvlan_process_v4_outbound+0x21e/0x260 [ipvlan]
[ 2203.655931] ipvlan_xmit_mode_l3+0x3bd/0x400 [ipvlan]
[ 2203.655967] ipvlan_queue_xmit+0xb3/0x190 [ipvlan]
[ 2203.655977] ipvlan_start_xmit+0x2e/0xb0 [ipvlan]
[ 2203.655984] xmit_one.constprop.0+0xe1/0x280
[ 2203.655992] dev_hard_start_xmit+0x62/0x100
[ 2203.656000] sch_direct_xmit+0x215/0x640
[ 2203.656028] __qdisc_run+0x153/0x1f0
[ 2203.656069] __dev_queue_xmit+0x77f/0x1030
[ 2203.656173] ip_finish_output2+0x59b/0xc20
[ 2203.656244] __ip_finish_output.part.0+0x318/0x3d0
[ 2203.656312] ip_finish_output+0x168/0x190
[ 2203.656320] ip_output+0x12d/0x220
[ 2203.656357] __ip_queue_xmit+0x392/0x880
[ 2203.656380] __tcp_transmit_skb+0x1088/0x11c0
[ 2203.656436] __tcp_retransmit_skb+0x475/0xa30
[ 2203.656505] tcp_retransmit_skb+0x2d/0x190
[ 2203.656512] tcp_retransmit_timer+0x3af/0x9a0
[ 2203.656519] tcp_write_timer_handler+0x3ba/0x510
[ 2203.656529] tcp_write_timer+0x55/0x180
[ 2203.656542] call_timer_fn+0x3f/0x1d0
[ 2203.656555] expire_timers+0x160/0x200
[ 2203.656562] run_timer_softirq+0x1f4/0x480
[ 2203.656606] __do_softirq+0xfd/0x402
[ 2203.656613] asm_call_irq_on_stack+0x12/0x20
[ 2203.656617] </IRQ>
[ 2203.656623] do_softirq_own_stack+0x37/0x50
[ 2203.656631] irq_exit_rcu+0x134/0x1a0
[ 2203.656639] sysvec_apic_timer_interrupt+0x36/0x80
[ 2203.656646] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2203.656654] RIP: 0010:default_idle+0x13/0x20
[ 2203.656663] Code: 89 f0 5d 41 5c 41 5d 41 5e c3 cc cc cc cc cc cc cc
cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 0f 00 2d 9f 32 57 00 fb
f4 <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 be 08
[ 2203.656668] RSP: 0018:
ffff88810036fe78 EFLAGS:
00000256
[ 2203.656676] RAX:
ffffffffaf2a87f0 RBX:
ffff888100360000 RCX:
ffffffffaf290191
[ 2203.656681] RDX:
0000000000098b5e RSI:
0000000000000004 RDI:
ffff88811a3c4f60
[ 2203.656686] RBP:
0000000000000000 R08:
0000000000000001 R09:
ffff88811a3c4f63
[ 2203.656690] R10:
ffffed10234789ec R11:
0000000000000001 R12:
0000000000000003
[ 2203.656695] R13:
ffff888100360000 R14:
0000000000000000 R15:
0000000000000000
[ 2203.656729] default_idle_call+0x5a/0x150
[ 2203.656735] cpuidle_idle_call+0x1c6/0x220
[ 2203.656780] do_idle+0xab/0x100
[ 2203.656786] cpu_startup_entry+0x19/0x20
[ 2203.656793] secondary_startup_64_no_verify+0xc2/0xcb
[ 2203.657409] The buggy address belongs to the page:
[ 2203.658648] page:
0000000027a9842f refcount:1 mapcount:0
mapping:
0000000000000000 index:0x0 pfn:0x11a388
[ 2203.658665] flags:
0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
[ 2203.658675] raw:
0017ffffc0001000 ffffea000468e208 ffffea000468e208
0000000000000000
[ 2203.658682] raw:
0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
[ 2203.658686] page dumped because: kasan: bad access detected
To reproduce(ipvlan with IPVLAN_MODE_L3):
Env setting:
=======================================================
modprobe ipvlan ipvlan_default_mode=1
sysctl net.ipv4.conf.eth0.forwarding=1
iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j
MASQUERADE
ip link add gw link eth0 type ipvlan
ip -4 addr add 20.0.0.254/24 dev gw
ip netns add net1
ip link add ipv1 link eth0 type ipvlan
ip link set ipv1 netns net1
ip netns exec net1 ip link set ipv1 up
ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
ip netns exec net1 route add default gw 20.0.0.254
ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
ifconfig gw up
iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with
icmp-port-unreachable
=======================================================
And then excute the shell(curl any address of eth0 can reach):
for((i=1;i<=100000;i++))
do
ip netns exec net1 curl x.x.x.x:8888
done
=======================================================
Fixes:
2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: "t.feng" <fengtao40@huawei.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Wed, 10 May 2023 02:29:14 +0000 (19:29 -0700)]
docs: networking: fix x25-iface.rst heading & index order
Fix the chapter heading for "X.25 Device Driver Interface" so that it
does not contain a trailing '-' character, which makes Sphinx
omit this heading from the contents.
Reverse the order of the x25.rst and x25-iface.rst files in the index
so that the project introduction (x25.rst) comes first.
Fixes:
883780af7209 ("docs: networking: convert x25-iface.txt to ReST")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Martin Schiller <ms@dev.tdt.de>
Cc: linux-x25@vger.kernel.org
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ziwei Xiao [Tue, 9 May 2023 22:51:23 +0000 (15:51 -0700)]
gve: Remove the code of clearing PBA bit
Clearing the PBA bit from the driver is race prone and it may lead to
dropped interrupt events. This could potentially lead to the traffic
being completely halted.
Fixes:
5e8c5adf95f8 ("gve: DQO: Add core netdev features")
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 May 2023 20:36:56 +0000 (20:36 +0000)]
tcp: add annotations around sk->sk_shutdown accesses
Now sk->sk_shutdown is no longer a bitfield, we can add
standard READ_ONCE()/WRITE_ONCE() annotations to silence
KCSAN reports like the following:
BUG: KCSAN: data-race in tcp_disconnect / tcp_poll
write to 0xffff88814588582c of 1 bytes by task 3404 on cpu 1:
tcp_disconnect+0x4d6/0xdb0 net/ipv4/tcp.c:3121
__inet_stream_connect+0x5dd/0x6e0 net/ipv4/af_inet.c:715
inet_stream_connect+0x48/0x70 net/ipv4/af_inet.c:727
__sys_connect_file net/socket.c:2001 [inline]
__sys_connect+0x19b/0x1b0 net/socket.c:2018
__do_sys_connect net/socket.c:2028 [inline]
__se_sys_connect net/socket.c:2025 [inline]
__x64_sys_connect+0x41/0x50 net/socket.c:2025
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814588582c of 1 bytes by task 3374 on cpu 0:
tcp_poll+0x2e6/0x7d0 net/ipv4/tcp.c:562
sock_poll+0x253/0x270 net/socket.c:1383
vfs_poll include/linux/poll.h:88 [inline]
io_poll_check_events io_uring/poll.c:281 [inline]
io_poll_task_func+0x15a/0x820 io_uring/poll.c:333
handle_tw_list io_uring/io_uring.c:1184 [inline]
tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
task_work_run+0x123/0x160 kernel/task_work.c:179
get_signal+0xe64/0xff0 kernel/signal.c:2635
arch_do_signal_or_restart+0x89/0x2a0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop+0x6f/0xe0 kernel/entry/common.c:168
exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x03 -> 0x00
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 May 2023 13:18:57 +0000 (13:18 +0000)]
net: add vlan_get_protocol_and_depth() helper
Before blamed commit, pskb_may_pull() was used instead
of skb_header_pointer() in __vlan_get_protocol() and friends.
Few callers depended on skb->head being populated with MAC header,
syzbot caught one of them (skb_mac_gso_segment())
Add vlan_get_protocol_and_depth() to make the intent clearer
and use it where sensible.
This is a more generic fix than commit
e9d3f80935b6
("net/af_packet: make sure to pull mac header") which was
dealing with a similar issue.
kernel BUG at include/linux/skbuff.h:2655 !
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline]
RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136
Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
RSP: 0018:
ffffc90001bd7520 EFLAGS:
00010286
RAX:
ffffffff8469736a RBX:
ffff88810f31dac0 RCX:
ffff888115a18b00
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffffc90001bd75e8 R08:
ffffffff84697183 R09:
fffff5200037adf9
R10:
0000000000000000 R11:
dffffc0000000001 R12:
0000000000000012
R13:
000000000000fee5 R14:
0000000000005865 R15:
000000000000fed7
FS:
000055555633f300(0000) GS:
ffff8881f6a00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000020000000 CR3:
0000000116fea000 CR4:
00000000003506f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
[<
ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419
[<
ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline]
[<
ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725
[<
ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313
[<
ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029
[<
ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline]
[<
ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142
[<
ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline]
[<
ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline]
[<
ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139
[<
ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline]
[<
ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline]
[<
ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147
[<
ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<
ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
[<
ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes:
469aceddfa3e ("vlan: consolidate VLAN parsing code and limit max parsing depth")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Tue, 9 May 2023 11:50:04 +0000 (12:50 +0100)]
net: pcs: xpcs: fix incorrect number of interfaces
In synopsys_xpcs_compat[], the DW_XPCS_2500BASEX entry was setting
the number of interfaces using the xpcs_2500basex_features array
rather than xpcs_2500basex_interfaces. This causes us to overflow
the array of interfaces. Fix this.
Fixes:
f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 May 2023 18:29:48 +0000 (18:29 +0000)]
net: deal with most data-races in sk_wait_event()
__condition is evaluated twice in sk_wait_event() macro.
First invocation is lockless, and reads can race with writes,
as spotted by syzbot.
BUG: KCSAN: data-race in sk_stream_wait_connect / tcp_disconnect
write to 0xffff88812d83d6a0 of 4 bytes by task 9065 on cpu 1:
tcp_disconnect+0x2cd/0xdb0
inet_shutdown+0x19e/0x1f0 net/ipv4/af_inet.c:911
__sys_shutdown_sock net/socket.c:2343 [inline]
__sys_shutdown net/socket.c:2355 [inline]
__do_sys_shutdown net/socket.c:2363 [inline]
__se_sys_shutdown+0xf8/0x140 net/socket.c:2361
__x64_sys_shutdown+0x31/0x40 net/socket.c:2361
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88812d83d6a0 of 4 bytes by task 9040 on cpu 0:
sk_stream_wait_connect+0x1de/0x3a0 net/core/stream.c:75
tcp_sendmsg_locked+0x2e4/0x2120 net/ipv4/tcp.c:1266
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1484
inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:651
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
__sys_sendto+0x246/0x300 net/socket.c:2142
__do_sys_sendto net/socket.c:2154 [inline]
__se_sys_sendto net/socket.c:2150 [inline]
__x64_sys_sendto+0x78/0x90 net/socket.c:2150
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0x00000068
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 May 2023 16:35:53 +0000 (16:35 +0000)]
net: annotate sk->sk_err write from do_recvmmsg()
do_recvmmsg() can write to sk->sk_err from multiple threads.
As said before, many other points reading or writing sk_err
need annotations.
Fixes:
34b88a68f26a ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 May 2023 16:56:34 +0000 (16:56 +0000)]
netlink: annotate accesses to nlk->cb_running
Both netlink_recvmsg() and netlink_native_seq_show() read
nlk->cb_running locklessly. Use READ_ONCE() there.
Add corresponding WRITE_ONCE() to netlink_dump() and
__netlink_dump_start()
syzbot reported:
BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg
write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
__netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
netlink_dump_start include/linux/netlink.h:308 [inline]
rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1138
call_write_iter include/linux/fs.h:1851 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x463/0x760 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
____sys_recvmsg+0x2db/0x310 net/socket.c:2718
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00 -> 0x01
Fixes:
16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 May 2023 08:27:21 +0000 (09:27 +0100)]
Merge branch 'bonding-overflow'
Hangbin Liu says:
====================
bonding: fix send_peer_notif overflow
Bonding send_peer_notif was defined as u8. But the value is
num_peer_notif multiplied by peer_notif_delay, which is u8 * u32.
This would cause the send_peer_notif overflow.
Before the fix:
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ]
4 garp packets sent on active slave eth1
TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [FAIL]
24 garp packets sent on active slave eth1
TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [FAIL]
After the fix:
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [ OK ]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 9 May 2023 03:12:00 +0000 (11:12 +0800)]
kselftest: bonding: add num_grat_arp test
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 9 May 2023 03:11:59 +0000 (11:11 +0800)]
selftests: forwarding: lib: add netns support for tc rule handle stats get
When run the test in netns, it's not easy to get the tc stats via
tc_rule_handle_stats_get(). With the new netns parameter, we can get
stats from specific netns like
num=$(tc_rule_handle_stats_get "dev eth0 ingress" 101 ".packets" "-n ns")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 9 May 2023 03:11:58 +0000 (11:11 +0800)]
Documentation: bonding: fix the doc of peer_notif_delay
Bonding only supports setting peer_notif_delay with miimon set.
Fixes:
0307d589c4d6 ("bonding: add documentation for peer_notif_delay")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Tue, 9 May 2023 03:11:57 +0000 (11:11 +0800)]
bonding: fix send_peer_notif overflow
Bonding send_peer_notif was defined as u8. Since commit
07a4ddec3ce9
("bonding: add an option to specify a delay between peer notifications").
the bond->send_peer_notif will be num_peer_notif multiplied by
peer_notif_delay, which is u8 * u32. This would cause the send_peer_notif
overflow easily. e.g.
ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000
To fix the overflow, let's set the send_peer_notif to u32 and limit
peer_notif_delay to 300s.
Reported-by: Liang Li <liali@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053
Fixes:
07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Golle [Tue, 9 May 2023 01:20:06 +0000 (03:20 +0200)]
net: ethernet: mtk_eth_soc: fix NULL pointer dereference
Check for NULL pointer to avoid kernel crashing in case of missing WO
firmware in case only a single WEDv2 device has been initialized, e.g. on
MT7981 which can connect just one wireless frontend.
Fixes:
86ce0d09e424 ("net: ethernet: mtk_eth_soc: use WO firmware for MT7981")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 9 May 2023 14:47:24 +0000 (16:47 +0200)]
selftests: nft_flowtable.sh: check ingress/egress chain too
Make sure flowtable interacts correctly with ingress and egress
chains, i.e. those get handled before and after flow table respectively.
Adds three more tests:
1. repeat flowtable test, but with 'ip dscp set cs3' done in
inet forward chain.
Expect that some packets have been mangled (before flowtable offload
became effective) while some pass without mangling (after offload
succeeds).
2. repeat flowtable test, but with 'ip dscp set cs3' done in
veth0:ingress.
Expect that all packets pass with cs3 dscp field.
3. same as 2, but use veth1:egress. Expect the same outcome.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Boris Sukholitko [Thu, 4 May 2023 08:48:14 +0000 (11:48 +0300)]
selftests: nft_flowtable.sh: monitor result file sizes
When running nft_flowtable.sh in VM on a busy server we've found that
the time of the netcat file transfers vary wildly.
Therefore replace hardcoded 3 second sleep with the loop checking for
a change in the file sizes. Once no change in detected we test the results.
Nice side effect is that we shave 1 second sleep in the fast case
(hard-coded 3 second sleep vs two 1 second sleeps).
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Boris Sukholitko [Thu, 4 May 2023 08:48:13 +0000 (11:48 +0300)]
selftests: nft_flowtable.sh: wait for specific nc pids
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Boris Sukholitko [Thu, 4 May 2023 08:48:12 +0000 (11:48 +0300)]
selftests: nft_flowtable.sh: no need for ps -x option
Some ps commands (e.g. busybox derived) have no -x option. For the
purposes of hash calculation of the list of processes this option is
inessential.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Boris Sukholitko [Thu, 4 May 2023 08:48:11 +0000 (11:48 +0300)]
selftests: nft_flowtable.sh: use /proc for pid checking
Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
pid existence check.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 4 May 2023 12:55:02 +0000 (14:55 +0200)]
netfilter: conntrack: fix possible bug_on with enable_hooks=1
I received a bug report (no reproducer so far) where we trip over
712 rcu_read_lock();
713 ct_hook = rcu_dereference(nf_ct_hook);
714 BUG_ON(ct_hook == NULL); // here
In nf_conntrack_destroy().
First turn this BUG_ON into a WARN. I think it was triggered
via enable_hooks=1 flag.
When this flag is turned on, the conntrack hooks are registered
before nf_ct_hook pointer gets assigned.
This opens a short window where packets enter the conntrack machinery,
can have skb->_nfct set up and a subsequent kfree_skb might occur
before nf_ct_hook is set.
Call nf_conntrack_init_end() to set nf_ct_hook before we register the
pernet ops.
Fixes:
ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 4 May 2023 12:20:21 +0000 (14:20 +0200)]
netfilter: nf_tables: always release netdev hooks from notifier
This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
The problem is that when a veth device is released, the veth release
callback will also queue the peer netns device for removal.
Its possible that the peer netns is also slated for removal. In this
case, the device memory is already released before the pre_exit hook of
the peer netns runs:
BUG: KASAN: slab-use-after-free in nf_hook_entry_head+0x1b8/0x1d0
Read of size 8 at addr
ffff88812c0124f0 by task kworker/u8:1/45
Workqueue: netns cleanup_net
Call Trace:
nf_hook_entry_head+0x1b8/0x1d0
__nf_unregister_net_hook+0x76/0x510
nft_netdev_unregister_hooks+0xa0/0x220
__nft_release_hook+0x184/0x490
nf_tables_pre_exit_net+0x12f/0x1b0
..
Order is:
1. First netns is released, veth_dellink() queues peer netns device
for removal
2. peer netns is queued for removal
3. peer netns device is released, unreg event is triggered
4. unreg event is ignored because netns is going down
5. pre_exit hook calls nft_netdev_unregister_hooks but device memory
might be free'd already.
Fixes:
68a3765c659f ("netfilter: nf_tables: skip netdev events generated on netns removal")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Fainelli [Mon, 8 May 2023 23:17:49 +0000 (16:17 -0700)]
net: phy: bcm7xx: Correct read from expansion register
Since the driver works in the "legacy" addressing mode, we need to write
to the expansion register (0x17) with bits 11:8 set to 0xf to properly
select the expansion register passed as argument.
Fixes:
f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230508231749.1681169-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 8 May 2023 17:55:43 +0000 (10:55 -0700)]
net: Fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
KCSAN found a data race in sock_recv_cmsgs() where the read access
to sk->sk_stamp needs READ_ONCE().
BUG: KCSAN: data-race in packet_recvmsg / packet_recvmsg
write (marked) to 0xffff88803c81f258 of 8 bytes by task 19171 on cpu 0:
sock_write_timestamp include/net/sock.h:2670 [inline]
sock_recv_cmsgs include/net/sock.h:2722 [inline]
packet_recvmsg+0xb97/0xd00 net/packet/af_packet.c:3489
sock_recvmsg_nosec net/socket.c:1019 [inline]
sock_recvmsg+0x11a/0x130 net/socket.c:1040
sock_read_iter+0x176/0x220 net/socket.c:1118
call_read_iter include/linux/fs.h:1845 [inline]
new_sync_read fs/read_write.c:389 [inline]
vfs_read+0x5e0/0x630 fs/read_write.c:470
ksys_read+0x163/0x1a0 fs/read_write.c:613
__do_sys_read fs/read_write.c:623 [inline]
__se_sys_read fs/read_write.c:621 [inline]
__x64_sys_read+0x41/0x50 fs/read_write.c:621
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88803c81f258 of 8 bytes by task 19183 on cpu 1:
sock_recv_cmsgs include/net/sock.h:2721 [inline]
packet_recvmsg+0xb64/0xd00 net/packet/af_packet.c:3489
sock_recvmsg_nosec net/socket.c:1019 [inline]
sock_recvmsg+0x11a/0x130 net/socket.c:1040
sock_read_iter+0x176/0x220 net/socket.c:1118
call_read_iter include/linux/fs.h:1845 [inline]
new_sync_read fs/read_write.c:389 [inline]
vfs_read+0x5e0/0x630 fs/read_write.c:470
ksys_read+0x163/0x1a0 fs/read_write.c:613
__do_sys_read fs/read_write.c:623 [inline]
__se_sys_read fs/read_write.c:621 [inline]
__x64_sys_read+0x41/0x50 fs/read_write.c:621
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0xffffffffc4653600 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 19183 Comm: syz-executor.5 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes:
6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230508175543.55756-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Vasut [Sat, 6 May 2023 23:58:45 +0000 (01:58 +0200)]
net: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register
Initialize MAC_ONEUS_TIC_COUNTER register with correct value derived
from CSR clock, otherwise EEE is unstable on at least NXP i.MX8M Plus
and Micrel KSZ9131RNX PHY, to the point where not even ARP request can
be sent out.
i.MX 8M Plus Applications Processor Reference Manual, Rev. 1, 06/2021
11.7.6.1.34 One-microsecond Reference Timer (MAC_ONEUS_TIC_COUNTER)
defines this register as:
"
This register controls the generation of the Reference time (1 microsecond
tic) for all the LPI timers. This timer has to be programmed by the software
initially.
...
The application must program this counter so that the number of clock cycles
of CSR clock is 1us. (Subtract 1 from the value before programming).
For example if the CSR clock is 100MHz then this field needs to be programmed
to value 100 - 1 = 99 (which is 0x63).
This is required to generate the 1US events that are used to update some of
the EEE related counters.
"
The reset value is 0x63 on i.MX8M Plus, which means expected CSR clock are
100 MHz. However, the i.MX8M Plus "enet_qos_root_clk" are 266 MHz instead,
which means the LPI timers reach their count much sooner on this platform.
This is visible using a scope by monitoring e.g. exit from LPI mode on TX_CTL
line from MAC to PHY. This should take 30us per STMMAC_DEFAULT_TWT_LS setting,
during which the TX_CTL line transitions from tristate to low, and 30 us later
from low to high. On i.MX8M Plus, this transition takes 11 us, which matches
the 30us * 100/266 formula for misconfigured MAC_ONEUS_TIC_COUNTER register.
Configure MAC_ONEUS_TIC_COUNTER based on CSR clock, so that the LPI timers
have correct 1us reference. This then fixes EEE on i.MX8M Plus with Micrel
KSZ9131RNX PHY.
Fixes:
477286b53f55 ("stmmac: add GMAC4 core support")
Signed-off-by: Marek Vasut <marex@denx.de>
Tested-by: Harald Seiler <hws@denx.de>
Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Verdin iMX8MP
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20230506235845.246105-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 9 May 2023 17:22:13 +0000 (10:22 -0700)]
nfs: fix another case of NULL/IS_ERR confusion wrt folio pointers
Dan has been improving on the smatch error pointer checks, and pointed
at another case where the __filemap_get_folio() conversion to error
pointers had been overlooked. This time because it was hidden behind
the filemap_grab_folio() helper function that is a wrapper around it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 9 May 2023 16:53:41 +0000 (09:53 -0700)]
Merge tag 'for-6.4-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix backward leaf iteration which could possibly return the same key
- fix assertion when device add and balance race for exclusive
operation
- fix regression when freeing device, state tree would leak after
device replace
- fix attempt to clear space cache v1 when block-group-tree is enabled
- fix potential i_size corruption when encoded write races with send v2
and enabled no-holes (the race is hard to hit though, the window is a
few instructions wide)
- fix wrong bitmap API use when checking empty zones, parameters were
swapped but not causing a bug due to other code
- prevent potential qgroup leak if subvolume create does not commit
transaction (which is pending in the development queue)
- error handling and reporting:
- abort transaction when sibling keys check fails for leaves
- print extent buffers when sibling keys check fails
* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't free qgroup space unless specified
btrfs: fix encoded write i_size corruption with no-holes
btrfs: zoned: fix wrong use of bitops API in btrfs_ensure_empty_zones
btrfs: properly reject clear_cache and v1 cache for block-group-tree
btrfs: print extent buffers when sibling keys check fails
btrfs: abort transaction when sibling keys check fails for leaves
btrfs: fix leak of source device allocation state after device replace
btrfs: fix assertion of exclop condition when starting balance
btrfs: fix btrfs_prev_leaf() to not return the same key twice
Liming Sun [Wed, 26 Apr 2023 14:23:44 +0000 (10:23 -0400)]
platform/mellanox: fix potential race in mlxbf-tmfifo driver
This commit adds memory barrier for the 'vq' update in function
mlxbf_tmfifo_virtio_find_vqs() to avoid potential race due to
out-of-order memory write. It also adds barrier for the 'is_ready'
flag to make sure the initializations are visible before this flag
is checked.
Signed-off-by: Liming Sun <limings@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/b98c0ab61d644ba38fa9b3fd1607b138b0dd820b.1682518748.git.limings@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Roy Novich [Sun, 7 May 2023 13:57:43 +0000 (16:57 +0300)]
linux/dim: Do nothing if no time delta between samples
Add return value for dim_calc_stats. This is an indication for the
caller if curr_stats was assigned by the function. Avoid using
curr_stats uninitialized over {rdma/net}_dim, when no time delta between
samples. Coverity reported this potential use of an uninitialized
variable.
Fixes:
4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Fixes:
cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://lore.kernel.org/r/20230507135743.138993-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Andrey Avdeev [Sun, 30 Apr 2023 08:01:10 +0000 (11:01 +0300)]
platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
Add touchscreen info for the Dexp Ursus KX210i
Signed-off-by: Andrey Avdeev <jamesstoun@gmail.com>
Link: https://lore.kernel.org/r/ZE4gRgzRQCjXFYD0@avdeevavpc
Cc: stable@vger.kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Hans de Goede [Fri, 5 May 2023 21:03:23 +0000 (23:03 +0200)]
platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
The Juno Computers Juno Tablet has an upside-down mounted Goodix
touchscreen. Add a quirk to invert both axis to correct for this.
Link: https://junocomputers.com/us/product/juno-tablet/
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230505210323.43177-1-hdegoede@redhat.com
Mark Pearson [Fri, 5 May 2023 13:25:23 +0000 (09:25 -0400)]
platform/x86: thinkpad_acpi: Add profile force ability
There has been a lot of confusion around which platform profiles are
supported on various platforms and it would be useful to have a debug
method to be able to override the profile mode that is selected.
I don't expect this to be used in anything other than debugging in
conjunction with Lenovo engineers - but it does give a way to get a
system working whilst we wait for either FW fixes, or a driver fix
to land upstream, if something is wonky in the mode detection logic
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230505132523.214338-2-mpearson-lenovo@squebb.ca
Cc: stable@vger.kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Mark Pearson [Fri, 5 May 2023 13:25:22 +0000 (09:25 -0400)]
platform/x86: thinkpad_acpi: Fix platform profiles on T490
I had incorrectly thought that PSC profiles were not usable on Intel
platforms so had blocked them in the driver initialistion. This broke
platform profiles on the T490.
After discussion with the FW team PSC does work on Intel platforms and
should be allowed.
Note - it's possible this may impact other platforms where it is advertised
but special driver support that only Windows has is needed. But if it does
then they will need fixing via quirks. Please report any issues to me so I
can get them addressed - but I haven't found any problems in testing...yet
Fixes:
bce6243f767f ("platform/x86: thinkpad_acpi: do not use PSC mode on Intel platforms")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2177962
Cc: stable@vger.kernel.org
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230505132523.214338-1-mpearson-lenovo@squebb.ca
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fae [Tue, 25 Apr 2023 06:36:44 +0000 (01:36 -0500)]
platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
Fixes micmute key of HP Envy X360 ey0xxx.
Signed-off-by: Fae <faenkhauser@gmail.com>
Link: https://lore.kernel.org/r/20230425063644.11828-1-faenkhauser@gmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Srinivas Pandruvada [Tue, 18 Apr 2023 15:32:30 +0000 (08:32 -0700)]
platform/x86/intel-uncore-freq: Return error on write frequency
Currently when the uncore_write() returns error, it is silently
ignored. Return error to user space when uncore_write() fails.
Fixes:
49a474c7ba51 ("platform/x86: Add support for Uncore frequency control")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230418153230.679094-1-srinivas.pandruvada@linux.intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Arnd Bergmann [Tue, 18 Apr 2023 06:11:43 +0000 (08:11 +0200)]
media: nxp: imx8-isi: fix buiding on 32-bit
The #if check is wrong, leading to a build failure:
drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c: In function 'mxc_isi_channel_set_inbuf':
drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c:33:5: error: "CONFIG_ARCH_DMA_ADDR_T_64BIT" is not defined, evaluates to 0 [-Werror=undef]
33 | #if CONFIG_ARCH_DMA_ADDR_T_64BIT
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This could just be an #ifdef, but it seems nicer to just remove the
check entirely. Apparently the only reason for the #ifdef is to avoid
another warning:
drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c:55:24: error: right shift count >= width of type [-Werror=shift-count-overflow]
But this is best avoided by using the lower_32_bits()/upper_32_bits()
helpers.
Fixes:
cf21f328fcaf ("media: nxp: Add i.MX8 ISI driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julian Winkler [Sun, 16 Apr 2023 15:49:32 +0000 (17:49 +0200)]
platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
This id was removed in commit
b47018a778c1 ("platform/x86: intel_scu_ipc:
Remove Lincroft support"), saying it is only used on Moorestown,
but apparently the same id is also used on Medfield.
Tested on the Medfield based Motorola RAZR i smartphone.
Signed-off-by: Julian Winkler <julian.winkler1@web.de>
Link: https://lore.kernel.org/r/20230416154932.6579-1-julian.winkler1@web.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Arnd Bergmann [Tue, 18 Apr 2023 09:15:48 +0000 (11:15 +0200)]
media: platform: mtk-mdp3: work around unused-variable warning
When CONFIG_OF is disabled, the 'data' variable is not used at all
because of_match_node() turns into a dummy macro:
drivers/media/platform/mediatek/mdp3/mtk-mdp3-comp.c: In function 'mdp_comp_sub_create':
drivers/media/platform/mediatek/mdp3/mtk-mdp3-comp.c:1038:36: error: unused variable 'data' [-Werror=unused-variable]
1038 | const struct mtk_mdp_driver_data *data = mdp->mdp_data;
| ^~~~
Remove the variable again by moving the pointer dereference into the
of_match_node call.
Fixes:
b385b991ef2f ("media: platform: mtk-mdp3: chip config split about subcomponents")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Arnd Bergmann [Tue, 18 Apr 2023 09:19:55 +0000 (11:19 +0200)]
media: nxp: ignore unused suspend operations
gcc warns about some functions being unused when CONFIG_PM_SLEEP
is disabled:
drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c:328:12: error: 'mxc_isi_pm_resume' defined but not used [-Werror=unused-function]
328 | static int mxc_isi_pm_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~
drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c:314:12: error: 'mxc_isi_pm_suspend' defined but not used [-Werror=unused-function]
314 | static int mxc_isi_pm_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~
Use the modern SYSTEM_SLEEP_PM_OPS()/RUNTIME_PM_OPS() helpers in place
of the old SET_SYSTEM_SLEEP_PM_OPS()/SET_RUNTIME_PM_OPS() ones.
By convention, use pm_ptr() to guard the reference to the operations.
This makes no difference as long as the driver requires CONFIG_PM,
but is what users of SET_RUNTIME_PM_OPS() are supposed to do.
Fixes:
cf21f328fcaf ("media: nxp: Add i.MX8 ISI driver")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Niklas Söderlund [Sat, 11 Feb 2023 20:55:34 +0000 (21:55 +0100)]
media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
When adding proper support for V4L2_FIELD_ALTERNATE it was missed that
this field format should trigger an interrupt for each field, not just
for the whole frame. Fix this by marking it as progressive in the
capture setup, which will then select the correct interrupt mode.
Tested on both Gen2 and Gen3 with the result of a doubling of the frame
rate for V4L2_FIELD_ALTERNATE. From a PAL video source the frame rate is
now 50, which is expected for alternate field capture.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Niklas Söderlund [Sat, 11 Feb 2023 20:54:32 +0000 (21:54 +0100)]
media: rcar-vin: Fix NV12 size alignment
When doing format validation for NV12 the width and height should be
aligned to 32 pixels.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Niklas Söderlund [Sat, 11 Feb 2023 20:54:31 +0000 (21:54 +0100)]
media: rcar-vin: Gen3 can not scale NV12
The VIN modules on Gen3 can not scale NV12, fail format validation if
the user tries. Currently no frames are produced if this is attempted.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Linus Torvalds [Sun, 7 May 2023 20:34:35 +0000 (13:34 -0700)]
Linux 6.4-rc1
Linus Torvalds [Sun, 7 May 2023 18:32:18 +0000 (11:32 -0700)]
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git./linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles:
ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles:
ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles:
ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles:
ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles:
7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us
ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us
ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms
ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us
ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms
ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms
ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us
ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us
ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us
ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms
ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms
ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us
ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us
ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us
ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us
ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us
ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us
ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since
d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
Linus Torvalds [Sun, 7 May 2023 18:04:26 +0000 (11:04 -0700)]
Merge tag 'core-debugobjects-2023-05-06' of git://git./linux/kernel/git/tip/tip
Pull debugobjects fix from Thomas Gleixner:
"A single fix for debugobjects:
The recent fix to ensure atomicity of lookup and allocation
inadvertently broke the pool refill mechanism, so that debugobject
OOMs now in certain situations. The reason is that the functions which
got updated no longer invoke debug_objecs_init(), which is now the
only place to care about refilling the tracking object pool.
Restore the original behaviour by adding explicit refill opportunities
to those places"
* tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Ensure pool refill (again)
Linus Torvalds [Sun, 7 May 2023 17:57:14 +0000 (10:57 -0700)]
Merge tag 'v6.4-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- A long-standing bug in crypto_engine
- A buggy but harmless check in the sun8i-ss driver
- A regression in the CRYPTO_USER interface
* tag 'v6.4-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: api - Fix CRYPTO_USER checks for report function
crypto: engine - fix crypto_queue backlog handling
crypto: sun8i-ss - Fix a test in sun8i_ss_setup_ivs()
Linus Torvalds [Sun, 7 May 2023 17:46:21 +0000 (10:46 -0700)]
Merge tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"smb3 client fixes, mostly DFS or reconnect related:
- Two DFS connection sharing fixes
- DFS refresh fix
- Reconnect fix
- Two potential use after free fixes
- Also print prefix patch in mount debug msg
- Two small cleanup fixes"
* tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Remove unneeded semicolon
cifs: fix sharing of DFS connections
cifs: avoid potential races when handling multiple dfs tcons
cifs: protect access of TCP_Server_Info::{origin,leaf}_fullpath
cifs: fix potential race when tree connecting ipc
cifs: fix potential use-after-free bugs in TCP_Server_Info::hostname
cifs: print smb3_fs_context::source when mounting
cifs: protect session status check in smb2_reconnect()
SMB3.1.1: correct definition for app_instance_id create contexts
Linus Torvalds [Sun, 7 May 2023 17:31:45 +0000 (10:31 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A couple more patches that would be good to get into -rc1:
- Revert an i.MX patch that's causing video failures because division
math goes sideways
- Fix a clang + W=1 build isue where FIELD_PREP() is taking a 32-bit
variable instead of the usual u64 type
- Fix a Kconfig bug in the StarFive JH7110 clk config that selects a
reset controller when it can't be selected"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: starfive: Fix RESET_STARFIVE_JH7110 can't be selected in a specified case
clk: sp7021: Adjust width of _m in HWM_FIELD_PREP()
Revert "clk: imx: composite-8m: Add support to determine_rate"
Linus Torvalds [Sun, 7 May 2023 17:17:33 +0000 (10:17 -0700)]
Merge tag 'mailbox-v6.4' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
- mailbox api: allow direct registration to a channel and convert omap
and pcc to use mbox_bind_client
- omap and hi6220 : use of_property_read_bool
- test: fix double-free and use spinlock header
- rockchip and bcm-pdc: drop of_match_ptr
- mpfs: change config symbol
- mediatek gce: support MT6795
- qcom apcs: consolidate of_device_id and support IPQ9574
* tag 'mailbox-v6.4' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
dt-bindings: mailbox: qcom: add compatible for IPQ9574 SoC
mailbox: qcom-apcs-ipc: do not grow the of_device_id
dt-bindings: mailbox: qcom,apcs-kpss-global: use fallbacks for few variants
dt-bindings: mailbox: mediatek,gce-mailbox: Add support for MT6795
mailbox: mpfs: convert SOC_MICROCHIP_POLARFIRE to ARCH_MICROCHIP_POLARFIRE
mailbox: bcm-pdc: drop of_match_ptr for ID table
mailbox: rockchip: drop of_match_ptr for ID table
mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()
mailbox: mailbox-test: Explicitly include header for spinlock support
mailbox: Use of_property_read_bool() for boolean properties
mailbox: pcc: Use mbox_bind_client
mailbox: omap: Use mbox_bind_client
mailbox: Allow direct registration to a channel
Linus Torvalds [Sun, 7 May 2023 17:00:09 +0000 (10:00 -0700)]
Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Nothing major in here, just two different parts:
- A small series from Breno that enables passing the full SQE down
for ->uring_cmd().
This is a prerequisite for enabling full network socket operations.
Queued up a bit late because of some stylistic concerns that got
resolved, would be nice to have this in 6.4-rc1 so the dependent
work will be easier to handle for 6.5.
- Fix for the huge page coalescing, which was a regression introduced
in the 6.3 kernel release (Tobias)"
* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
io_uring: Remove unnecessary BUILD_BUG_ON
io_uring: Pass whole sqe to commands
io_uring: Create a helper to return the SQE size
io_uring/rsrc: check for nonconsecutive pages
Christophe JAILLET [Fri, 5 May 2023 18:39:33 +0000 (20:39 +0200)]
net: mdio: mvusb: Fix an error handling path in mvusb_mdio_probe()
Should of_mdiobus_register() fail, a previous usb_get_dev() call should be
undone as in the .disconnect function.
Fixes:
04e37d92fbed ("net: phy: add marvell usb to mdio controller")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>