From a955318fe67ec0d962760b5ee58e74bffaf649b8 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Mon, 14 Jun 2021 04:25:04 +0200 Subject: [PATCH] stmmac: align RX buffers On RX an SKB is allocated and the received buffer is copied into it. But on some architectures, the memcpy() needs the source and destination buffers to have the same alignment to be efficient. This is not our case, because SKB data pointer is misaligned by two bytes to compensate the ethernet header. Align the RX buffer the same way as the SKB one, so the copy is faster. An iperf3 RX test gives a decent improvement on a RISC-V machine: before: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 733 MBytes 615 Mbits/sec 88 sender [ 5] 0.00-10.01 sec 730 MBytes 612 Mbits/sec receiver after: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver And the memcpy() overhead during the RX drops dramatically. before: Overhead Shared O Symbol 43.35% [kernel] [k] memcpy 33.77% [kernel] [k] __asm_copy_to_user 3.64% [kernel] [k] sifive_l2_flush64_range after: Overhead Shared O Symbol 45.40% [kernel] [k] __asm_copy_to_user 28.09% [kernel] [k] memcpy 4.27% [kernel] [k] sifive_l2_flush64_range Signed-off-by: Matteo Croce Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 6655cb8e24cf..e735134e8487 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -339,9 +339,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv) static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv) { if (stmmac_xdp_is_enabled(priv)) - return XDP_PACKET_HEADROOM; + return XDP_PACKET_HEADROOM + NET_IP_ALIGN; - return 0; + return NET_SKB_PAD + NET_IP_ALIGN; } void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue); -- 2.34.1