io_uring: add a priority tw list for irq completion work
author Hao Xu <haoxu@linux.alibaba.com>
Tue, 7 Dec 2021 09:39:48 +0000 (17:39 +0800)
committer Jens Axboe <axboe@kernel.dk>
Tue, 7 Dec 2021 22:01:57 +0000 (15:01 -0700)
commit 4813c3779261fab4067edea28155a98c65a41b5f
tree 011e44584854b03d58de1aa42ff1f905bfb3824f
parent 24115c4e95e137b73954bbbd94354889552a4b08

There are now many task_work users, and some of them only complete a req
and generate a cqe. Put that work on a new tw list with a higher
priority so that it is handled quickly, which reduces avg req latency
and lets users issue the next round of sqes earlier.
An explanatory case:

original timeline:
    submit_sqe-->irq-->add completion task_work
    -->run heavy work0~n-->run completion task_work
new timeline:
    submit_sqe-->irq-->add completion task_work
    -->run completion task_work-->run heavy work0~n
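
The reordering in the timeline above can be modelled in userspace as a
rough sketch. This is not the kernel implementation: every name here
(tw_node, tw_list, tw_ctx, prio_list, normal_list, tw_add, tw_run) is
hypothetical and chosen for illustration only. The sketch keeps two FIFO
lists and always drains the priority list before the normal one, so
completion work that was queued last still runs first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct tw_node {
	struct tw_node *next;
	char tag;		/* 'C' = completion work, 'H' = heavy work */
};

struct tw_list {
	struct tw_node *first;
	struct tw_node *last;
};

struct tw_ctx {
	struct tw_list prio_list;	/* completion work, drained first */
	struct tw_list normal_list;	/* all other task work */
};

static void tw_init(struct tw_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

/* Append to the tail so each list keeps FIFO order. */
static void tw_list_add_tail(struct tw_list *l, struct tw_node *n)
{
	n->next = NULL;
	if (l->last)
		l->last->next = n;
	else
		l->first = n;
	l->last = n;
}

/* Queue a work item; a non-zero 'priority' selects the priority list. */
static void tw_add(struct tw_ctx *ctx, struct tw_node *n, int priority)
{
	tw_list_add_tail(priority ? &ctx->prio_list : &ctx->normal_list, n);
}

/*
 * Drain the priority list, then the normal list. The tag of each item is
 * written to 'out' in run order as a NUL-terminated string; the number of
 * items run is returned.
 */
static size_t tw_run(struct tw_ctx *ctx, char *out, size_t cap)
{
	struct tw_list *lists[2] = { &ctx->prio_list, &ctx->normal_list };
	size_t n = 0;

	assert(cap > 0);
	for (int i = 0; i < 2; i++) {
		struct tw_node *node = lists[i]->first;

		lists[i]->first = lists[i]->last = NULL;
		while (node) {
			struct tw_node *next = node->next;

			assert(n + 1 < cap);	/* room for tag and NUL */
			out[n++] = node->tag;
			node = next;
		}
	}
	out[n] = '\0';
	return n;
}
```

In this model, queueing two 'H' items with priority 0 and then one 'C'
item with priority 1 makes tw_run() report the completion item ahead of
both heavy items, which corresponds to the new timeline.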

Limitation: this optimization only helps when submission and reaping
are done in different threads. Otherwise new sqes can only be submitted
after returning to userspace anyway, so the order of TWs doesn't
matter.

Tested this patch (and the following ones) by manually replacing
__io_queue_sqe() in io_queue_sqe() with io_req_task_queue() to construct
'heavy' task works, then tested with fio:

ioengine=io_uring
sqpoll=1
thread=1
bs=4k
direct=1
rw=randread
time_based=1
runtime=600
randrepeat=0
group_reporting=1
filename=/dev/nvme0n1

Tried various iodepths.
The peak IOPS with this patch is 710K, versus 665K without it.
For avg latency, the difference shows as iodepth grows:
depth and avg latency (usec):
depth      new          old
 1        7.05         7.10
 2        8.47         8.60
 4        10.42        10.42
 8        13.78        13.22
 16       27.41        24.33
 32       49.40        53.08
 64       102.53       103.36
 128      196.98       205.61
 256      372.99       414.88
 512      747.23       791.30
 1024     1472.59      1538.72
 2048     3153.49      3329.01
 4096     6387.86      6682.54
 8192     12150.25     12774.14
 16384    23085.58     26044.71

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20211207093951.247840-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fs/io_uring.c