Saeed Mahameed says:
====================
mlx5-updates-2023-03-28
Dragos Tatulea says:
====================
net/mlx5e: RX, Drop page_cache and fully use page_pool
For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling, which suffers from head-of-queue
blocking.
This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before (see the sketch after this list):
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
page ref.
* Enable skb recycling.
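As a rough illustration of how these pieces fit together (a hypothetical
sketch, not the mlx5e code itself; the rq_* names are made up, while the
page_pool, skb_mark_for_recycle() and napi_gro_receive() calls are the
upstream APIs this series builds on):

  /* Hypothetical sketch, not the mlx5e implementation: a page_pool that
   * owns DMA mapping/sync, hands out fragment-counted pages, and relies
   * on skb recycling to return them.
   */
  #include <linux/dma-mapping.h>
  #include <linux/netdevice.h>
  #include <linux/skbuff.h>
  #include <net/page_pool.h>

  static struct page_pool *rq_create_pool(struct device *dev, int node,
                                          u32 pool_size)
  {
          struct page_pool_params pp = {
                  /* pool does dma_map_page()/dma_unmap_page() and sync */
                  .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                  .order     = 0,
                  .pool_size = pool_size,
                  .nid       = node,
                  .dev       = dev,
                  .dma_dir   = DMA_FROM_DEVICE,
                  .max_len   = PAGE_SIZE,
                  .offset    = 0,
          };

          return page_pool_create(&pp); /* ERR_PTR() on failure */
  }

  /* Take a page and pre-load a fragment budget instead of bumping the
   * page refcount once per packet built on it.
   */
  static struct page *rq_get_page(struct page_pool *pool, long frags)
  {
          struct page *page = page_pool_dev_alloc_pages(pool);

          if (page)
                  page_pool_fragment_page(page, frags);
          return page;
  }

  /* On completion, mark the skb so the stack returns its pages to the
   * pool instead of freeing them.
   */
  static void rq_deliver_skb(struct napi_struct *napi, struct sk_buff *skb)
  {
          skb_mark_for_recycle(skb);
          napi_gro_receive(napi, skb);
  }

The fragment budget is what "fragment counter instead of tracking via
page ref" refers to above: building several skbs from one page only
touches a driver-side counter rather than the atomic page refcount.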
The patch series has the following effects on the rx path:
* Improved performance for cases where page recycling was low due to
head-of-queue blocking in the internal page_cache. The test for this
was running a single iperf TCP stream to an rx queue bound to the same
cpu as the application.
|-------------+--------+--------+------+---------|
| rq type     | before | after  | unit | diff    |
|-------------+--------+--------+------+---------|
| striding rq |   30.1 |   31.4 | Gbps |  4.14 % |
| legacy rq   |   30.2 |   33.0 | Gbps |  8.48 % |
|-------------+--------+--------+------+---------|
* Small XDP performance degradation. The test was an XDP drop program
running on a single rx queue with small incoming packets. The results
look like this:
|-------------+----------+----------+------+---------|
| rq type     | before   | after    | unit | diff    |
|-------------+----------+----------+------+---------|
| striding rq | 19725449 | 18544617 | pps  | -6.37 % |
| legacy rq   | 19879931 | 18631841 | pps  | -6.70 % |
|-------------+----------+----------+------+---------|
This will be handled in a different patch series by adding support for
multi-packet per page.
* For other cases the performance is roughly the same.
The above numbers were obtained on the following system:
24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
32 GB RAM
ConnectX-7 single port
The breakdown of the patch series is as follows:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
page_pool cache utilization (sketched below).
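To make the deferred-release point concrete, below is an
illustrative-only sketch of the idea under the same assumptions as the
earlier sketch (rq_* and RQ_PAGECNT_BIAS_MAX are hypothetical names,
not the driver's actual symbols): completions leave the page attached
to its ring slot, and the refill path releases it right before
allocating the replacement, so the release happens in the same NAPI
context as the allocation and can hit the page_pool's hot cache.

  #include <linux/limits.h>
  #include <linux/types.h>
  #include <net/page_pool.h>

  #define RQ_PAGECNT_BIAS_MAX  USHRT_MAX  /* hypothetical budget cap */

  struct rq_frag_page {                   /* hypothetical per-slot state */
          struct page *page;
          u16 frags;                      /* fragments handed to the stack */
  };

  static int rq_slot_alloc(struct page_pool *pool, struct rq_frag_page *fp)
  {
          fp->page = page_pool_dev_alloc_pages(pool);
          if (!fp->page)
                  return -ENOMEM;

          /* Pre-charge a large fragment budget; RX completions only
           * increment fp->frags, no atomics on the page itself.
           */
          page_pool_fragment_page(fp->page, RQ_PAGECNT_BIAS_MAX);
          fp->frags = 0;
          return 0;
  }

  static void rq_slot_release(struct page_pool *pool, struct rq_frag_page *fp)
  {
          long drain = RQ_PAGECNT_BIAS_MAX - fp->frags;

          /* Return the unused budget; whoever drops the last fragment
           * (here or the skb recycling path) puts the page back.
           */
          if (page_pool_defrag_page(fp->page, drain) == 0)
                  page_pool_put_defragged_page(pool, fp->page, -1, true);
          fp->page = NULL;
  }

  /* Deferred release: drop the slot's old page only when refilling it. */
  static int rq_slot_refill(struct page_pool *pool, struct rq_frag_page *fp)
  {
          if (fp->page)
                  rq_slot_release(pool, fp);
          return rq_slot_alloc(pool, fp);
  }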
====================
* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
net/mlx5e: RX, Increase WQE bulk size for legacy rq
net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
net/mlx5e: RX, Defer page release in legacy rq for better recycling
net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
net/mlx5e: RX, Defer page release in striding rq for better recycling
net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
net/mlx5e: RX, Enable skb page recycling through the page_pool
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
net/mlx5e: RX, Remove internal page_cache
net/mlx5e: RX, Store SHAMPO header pages in array
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================
Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>