Single thread fio test (read, bs=4k, ioengine=libaio, iodepth=128,
numjobs=1) over dm-thin device has poor performance versus bare nvme
device.
Further investigation with perf indicates that queue_work_on() consumes
over 20% CPU time when doing IO over dm-thin device. The call stack is
as follows.
- 40.57% thin_map
+ 22.07% queue_work_on
+ 9.95% dm_thin_find_block
+ 2.80% cell_defer_no_holder
1.91% inc_all_io_entry.isra.33.part.34
+ 1.78% bio_detain.isra.35
In cell_defer_no_holder(), wakeup_worker() is always called, no matter
whether the tc->deferred_bio_list list is empty or not. In single thread
IO model, this list is most likely empty. So skip waking up worker thread
if tc->deferred_bio_list list is empty.
Single thread IO performance improves from 448 MiB/s to 646 MiB/s (+44%)
once the needless wake_worker() calls are properly skipped.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
{
struct pool *pool = tc->pool;
unsigned long flags;
+ int has_work;
spin_lock_irqsave(&tc->lock, flags);
cell_release_no_holder(pool, cell, &tc->deferred_bio_list);
+ has_work = !bio_list_empty(&tc->deferred_bio_list);
spin_unlock_irqrestore(&tc->lock, flags);
- wake_worker(pool);
+ if (has_work)
+ wake_worker(pool);
}
static void thin_defer_bio(struct thin_c *tc, struct bio *bio);