io_uring: reduce latency by reissueing the operation
It is quite frequent that when an operation fails and returns EAGAIN,
the data becomes available between that failure and the call to
vfs_poll() done by io_arm_poll_handler().
Detecting the situation and reissuing the operation is much faster
than going ahead and push the operation to the io-wq.
Performance improvement testing has been performed with:
Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll.
4 measurements have been taken:
1. The time it takes to process a read request when data is already available
2. The time it takes to process by calling twice io_issue_sqe() after vfs_poll() indicated that data was available
3. The time it takes to execute io_queue_async_work()
4. The time it takes to complete a read request asynchronously
2.25% of all the read operations did use the new path.
ready data (baseline)
avg 3657.
94182918628
min 580
max 20098
stddev 1213.
15975908162
reissue completion
average 7882.
67567567568
min 2316
max 28811
stddev 1982.
79172973284
insert io-wq time
average 8983.
82276995305
min 3324
max 87816
stddev 2551.
60056552038
async time completion
average 24670.
4758861127
min 10758
max 102612
stddev 3483.
92416873804
Conclusion:
On average reissuing the sqe with the patch code is 1.1uSec faster and
in the worse case scenario 59uSec faster than placing the request on
io-wq
On average completion time by reissuing the sqe with the patch code is
16.79uSec faster and in the worse case scenario 73.8uSec faster than
async completion.
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e8441419bb1b8f3c3fcc607b2713efecdef2136.1624364038.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>