net/mlx5e: RX, Fix page_pool allocation failure recovery for striding rq
When a page allocation fails during refill in mlx5e_post_rx_mpwqes, the
page will be released again on the next refill call. This triggers the
page_pool negative page fragment count warning below:
[ 2436.447717] WARNING: CPU: 1 PID: 2419 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 2436.447895] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.447991] Call Trace:
[ 2436.447975] mlx5e_post_rx_mpwqes+0x1d5/0xcf0 [mlx5_core]
[ 2436.447994] <IRQ>
[ 2436.447996] ? __warn+0x7d/0x120
[ 2436.448009] ? mlx5e_handle_rx_cqe_mpwrq+0x109/0x1d0 [mlx5_core]
[ 2436.448002] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448044] ? mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
[ 2436.448061] ? report_bug+0x155/0x180
[ 2436.448065] ? handle_bug+0x36/0x70
[ 2436.448067] ? exc_invalid_op+0x13/0x60
[ 2436.448070] ? asm_exc_invalid_op+0x16/0x20
[ 2436.448079] mlx5e_napi_poll+0x122/0x6b0 [mlx5_core]
[ 2436.448077] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448113] ? generic_exec_single+0x35/0x100
[ 2436.448117] __napi_poll+0x25/0x1a0
[ 2436.448120] net_rx_action+0x28a/0x300
[ 2436.448122] __do_softirq+0xcd/0x279
[ 2436.448126] irq_exit_rcu+0x6a/0x90
[ 2436.448128] sysvec_apic_timer_interrupt+0x6e/0x90
[ 2436.448130] </IRQ>
This patch fixes the striding rq case by setting the skip flag on all
the wqe pages that were expected to have new pages allocated.
Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>