review.tizen.org Git - platform/kernel/linux-rpi.git/commit

author	Dragos Tatulea <dtatulea@nvidia.com>
	Tue, 21 Feb 2023 19:05:07 +0000 (21:05 +0200)
committer	Saeed Mahameed <saeedm@nvidia.com>
	Tue, 28 Mar 2023 20:43:59 +0000 (13:43 -0700)
commit	cd640b050368d5be6bccf1edb51b1e4c553555e6
tree	4911c91ea21916632daf68dd26846b8ed72771b3
parent	4ba2b4988c98ce9b56b77a1610c3a7b70ee30b57

net/mlx5e: RX, Break the wqe bulk refill in smaller chunks

To avoid overflowing the page pool's cache, don't release the
whole bulk at once, as it is usually larger than the cache refill size.
Group release+alloc instead into cache refill units that
allow releasing to the cache and then allocating from the cache.

A refill_unit variable is added as an iteration unit over the
wqe_bulk when doing release+alloc.
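
The effect of iterating in refill units can be illustrated with a toy
model. This is only a sketch, not the driver code: struct pool_sim, the
sim_* helpers and the sizes CACHE_CAP, REFILL_UNIT and WQE_BULK are all
hypothetical names and values chosen for illustration.

```c
#include <assert.h>

/* Illustrative sizes only; assumptions for this sketch, not driver values. */
enum { CACHE_CAP = 128, REFILL_UNIT = 64, WQE_BULK = 256 };

/* Toy model of a page pool: a bounded cache plus event counters. */
struct pool_sim {
	int cached;       /* pages currently held in the cache */
	long to_cache;    /* releases absorbed by the cache */
	long overflow;    /* releases that found the cache full */
	long from_cache;  /* allocations served from the cache */
	long from_slow;   /* allocations that missed the cache */
};

/* Release n pages; anything beyond the cache capacity overflows. */
static void sim_release(struct pool_sim *p, int n)
{
	for (int i = 0; i < n; i++) {
		if (p->cached < CACHE_CAP) {
			p->cached++;
			p->to_cache++;
		} else {
			p->overflow++;
		}
	}
}

/* Allocate n pages, taking from the cache while it has any. */
static void sim_alloc(struct pool_sim *p, int n)
{
	for (int i = 0; i < n; i++) {
		if (p->cached > 0) {
			p->cached--;
			p->from_cache++;
		} else {
			p->from_slow++;
		}
	}
}

/* Walk the bulk in steps of `unit`, releasing then allocating per step. */
static void sim_refill(struct pool_sim *p, int bulk, int unit)
{
	assert(unit > 0);
	for (int done = 0; done < bulk; done += unit) {
		int n = (bulk - done < unit) ? bulk - done : unit;

		sim_release(p, n);
		sim_alloc(p, n);
	}
}
```

In this model, sim_refill(&p, WQE_BULK, WQE_BULK) overflows the cache on
half of the releases, while sim_refill(&p, WQE_BULK, REFILL_UNIT) keeps
every release and every allocation inside the cache.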

For a single-ring, single-core, default-MTU (1500) TCP stream
test, the share of pages recycled directly to the cache
(rx_pp_recycle_cached) increases from 0% to 52%:

+---------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After |
+-------------------------+---------+---------+
|rx_pp_alloc_fast         | 2145422 | 2193802 |
|rx_pp_alloc_slow         |       2 |       0 |
|rx_pp_alloc_empty        |       2 |       0 |
|rx_pp_alloc_refill       |   34059 |   16634 |
|rx_pp_alloc_waive        |       0 |       0 |
|rx_pp_recycle_cached     |       0 | 1145818 |
|rx_pp_recycle_cache_full |       0 |       0 |
|rx_pp_recycle_ring       | 2179361 | 1064616 |
|rx_pp_recycle_ring_full  |     121 |       0 |
+---------------------------------------------+
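
The 52% figure appears to be the ratio of rx_pp_recycle_cached to the sum
of rx_pp_recycle_cached and rx_pp_recycle_ring in the After column. A
minimal sketch of that arithmetic, where cached_share_pct is a
hypothetical helper and not part of the driver:

```c
/* Percentage of recycled pages that went to the cache rather than the ring. */
static double cached_share_pct(unsigned long cached, unsigned long ring)
{
	unsigned long total = cached + ring;

	/* Guard against an idle interval with no recycling at all. */
	return total ? 100.0 * (double)cached / (double)total : 0.0;
}
```

Calling cached_share_pct(1145818, 1064616) gives roughly 51.8, which
rounds to the 52% quoted above; the Before column gives 0.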

With this patch, the performance for legacy rq for the above test is
back to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en/params.c
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c