iomap: Add per-block dirty state tracking to improve performance
authorRitesh Harjani (IBM) <ritesh.list@gmail.com>
Mon, 10 Jul 2023 21:12:43 +0000 (14:12 -0700)
committerRitesh Harjani (IBM) <ritesh.list@gmail.com>
Tue, 25 Jul 2023 05:25:56 +0000 (10:55 +0530)
commit4ce02c67972211be488408c275c8fbf19faf29b3
tree976b4a0fa2e9d6f6dd88a2a25567f5d52c5ff2ff
parenta01b8f225248e86f3328a48c3311882148a8c5d3
iomap: Add per-block dirty state tracking to improve performance

When the filesystem blocksize is less than the folio size (either with
mapping_large_folio_support() or with blocksize < pagesize) and the
folio is uptodate in the pagecache, even a single-byte write can cause
the entire folio to be written to disk during writeback. This happens
because we currently have no mechanism to track per-block dirty state
within struct iomap_folio_state; we only track uptodate state.
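
To make the amplification concrete, below is a minimal userspace C
model (hypothetical code, not the kernel implementation; folio_model
and its helpers are illustrative names): with only a folio-granular
dirty flag, writeback has no choice but to submit every uptodate block,
even though only one byte changed.

#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_FOLIO 16	/* e.g. a 64K folio with 4k blocks */

/* Hypothetical model: one dirty flag for the whole folio. */
struct folio_model {
	bool uptodate[BLOCKS_PER_FOLIO];
	bool dirty;			/* no per-block dirty state */
};

/* A one-byte write into a single block dirties the entire folio. */
static void write_one_byte(struct folio_model *f, int block)
{
	f->uptodate[block] = true;
	f->dirty = true;
}

/* Returns the number of blocks writeback must submit. */
static int writeback(const struct folio_model *f)
{
	int submitted = 0;

	if (!f->dirty)
		return 0;
	/*
	 * Without per-block dirty bits, every uptodate block gets
	 * written back: this is the write amplification.
	 */
	for (int i = 0; i < BLOCKS_PER_FOLIO; i++)
		if (f->uptodate[i])
			submitted++;
	return submitted;
}

int main(void)
{
	struct folio_model f = { .dirty = false };

	/* Folio fully uptodate in pagecache (e.g. after a pre-read). */
	for (int i = 0; i < BLOCKS_PER_FOLIO; i++)
		f.uptodate[i] = true;

	write_one_byte(&f, 3);
	printf("blocks submitted: %d (only 1 block was modified)\n",
	       writeback(&f));
	return 0;
}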

This patch implements support for tracking per-block dirty state in the
iomap_folio_state->state bitmap. This should improve filesystem write
performance and reduce write amplification.
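
The bitmap is split into two regions: bits [0, blocks_per_folio) hold
the per-block uptodate state and bits [blocks_per_folio,
2 * blocks_per_folio) hold the per-block dirty state. A minimal
userspace sketch of that split-bitmap scheme follows (ifs_model and the
*_model helpers are hypothetical stand-ins; the real helpers in
fs/iomap/buffered-io.c operate on iomap_folio_state->state under
state_lock):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_FOLIO 16	/* assumed: 64K folio, 4k blocks */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define STATE_WORDS	((2 * BLOCKS_PER_FOLIO + BITS_PER_LONG - 1) / \
			 BITS_PER_LONG)

/*
 * Hypothetical model of the state bitmap:
 * bit b                    => block b is uptodate
 * bit b + BLOCKS_PER_FOLIO => block b is dirty
 */
struct ifs_model {
	unsigned long state[STATE_WORDS];
};

static void set_bit_model(unsigned long *map, unsigned int bit)
{
	map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static bool test_bit_model(const unsigned long *map, unsigned int bit)
{
	return map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG));
}

/* Dirty bits live after the uptodate bits in the same bitmap. */
static bool ifs_block_is_dirty(const struct ifs_model *ifs,
			       unsigned int block)
{
	return test_bit_model(ifs->state, block + BLOCKS_PER_FOLIO);
}

static void ifs_set_block_dirty(struct ifs_model *ifs, unsigned int block)
{
	set_bit_model(ifs->state, block + BLOCKS_PER_FOLIO);
}

int main(void)
{
	struct ifs_model ifs = { { 0 } };

	ifs_set_block_dirty(&ifs, 3);	/* a write landed in block 3 */
	printf("block 3 dirty: %d, block 4 dirty: %d\n",
	       ifs_block_is_dirty(&ifs, 3), ifs_block_is_dirty(&ifs, 4));
	return 0;
}

With per-block dirty bits available, writeback can consult them and
submit only the blocks that actually changed, instead of every uptodate
block in the folio.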

Performance testing of the below fio workload reveals a ~16x
performance improvement using nvme with XFS (4k blocksize) on Power
(64K pagesize): the FIO-reported write bandwidth improved from
~28 MBps to ~452 MBps.

1. <test_randwrite.fio>
[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1

[fio-run]

2. Also, our internal performance team reported that this patch improves
   their database workload performance by ~83% (with XFS on Power).

Reported-by: Aravinda Herle <araherle@in.ibm.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
fs/gfs2/aops.c
fs/iomap/buffered-io.c
fs/xfs/xfs_aops.c
fs/zonefs/file.c
include/linux/iomap.h