io_uring: optimise io_free_batch_list()
author	Pavel Begunkov <asml.silence@gmail.com>
	Mon, 4 Oct 2021 19:02:55 +0000 (20:02 +0100)
committer	Jens Axboe <axboe@kernel.dk>
	Tue, 19 Oct 2021 11:49:54 +0000 (05:49 -0600)
Delay reading the next node in io_free_batch_list(); this allows the
compiler to load the value a bit later, reducing register spilling in
some cases. With gcc 11.1 it helped to move the @task_refs variable from
the stack to a register and optimises out a couple of per-request
instructions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cc9fdfb6f72a4e8bc9918a5e9f2d97869a263ae4.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fs/io_uring.c

index e404c98..1b007f7 100644
@@ -2280,9 +2280,10 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    comp_list);
 
-               node = req->comp_list.next;
-               if (!req_ref_put_and_test(req))
+               if (!req_ref_put_and_test(req)) {
+                       node = req->comp_list.next;
                        continue;
+               }
 
                io_queue_next(req);
                io_dismantle_req(req);
@@ -2294,6 +2295,7 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
                        task_refs = 0;
                }
                task_refs++;
+               node = req->comp_list.next;
                wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
        } while (node);