io_uring: recycle apoll_poll entries
Particularly for networked workloads, io_uring intensively uses its
poll based backend to get a notification when data/space is available.
Profiling workloads, we see 3-4% of alloc+free that is directly attributed
to just the apoll allocation and free (and the rest being skb alloc+free).
For the fast path, we have ctx->uring_lock held already for both issue
and the inline completions, and we can utilize that to avoid any extra
locking needed to have a basic recycling cache for the apoll entries on
both the alloc and free side.
Double poll still requires an allocation. But those are rare and not
a fast path item.
With the simple cache in place, we see a 3-4% reduction in overhead for
the workload.
Signed-off-by: Jens Axboe <axboe@kernel.dk>