block: fix bio_add_page for non trivial merge_bvec_fn case
authorDmitry Monakhov <dmonakhov@openvz.org>
Wed, 27 Jan 2010 19:44:36 +0000 (22:44 +0300)
committerJens Axboe <jens.axboe@oracle.com>
Thu, 28 Jan 2010 14:08:29 +0000 (15:08 +0100)
We have to properly decrease bi_size in order to merge_bvec_fn return
right result.  Otherwise this result in false merge rejects for two
absolutely valid bio_vecs.  This may cause significant performance
penalty for example fs_block_size == 1k and block device is raid0 with
small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in
to bio which already has 6 fs-blocks.

Cc: <stable@kernel.org>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
fs/bio.c

index 12429c9..88094af 100644 (file)
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -542,13 +542,18 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 
                if (page == prev->bv_page &&
                    offset == prev->bv_offset + prev->bv_len) {
+                       unsigned int prev_bv_len = prev->bv_len;
                        prev->bv_len += len;
 
                        if (q->merge_bvec_fn) {
                                struct bvec_merge_data bvm = {
+                                       /* prev_bvec is already charged in
+                                          bi_size, discharge it in order to
+                                          simulate merging updated prev_bvec
+                                          as new bvec. */
                                        .bi_bdev = bio->bi_bdev,
                                        .bi_sector = bio->bi_sector,
-                                       .bi_size = bio->bi_size,
+                                       .bi_size = bio->bi_size - prev_bv_len,
                                        .bi_rw = bio->bi_rw,
                                };