md raid10: fix NULL dereference in handle_write_completed()
author     Yufen Yu <yuyufen@huawei.com>
           Tue, 6 Feb 2018 09:39:15 +0000 (17:39 +0800)
committer  Shaohua Li <sh.li@alibaba-inc.com>
           Mon, 19 Feb 2018 17:40:36 +0000 (09:40 -0800)
commit     01a69cab01c184d3786af09e9339311123d63d22
tree       6741de4025476a9d0db6e1811d7b5e838fa65da8
parent     39772f0a7be3b3dc26c74ea13fe7847fd1522c8b

In the 'recover' case, an r10bio with both R10BIO_WriteError and
R10BIO_IsRecover set is processed by handle_write_completed().
This function iterates over all conf->copies entries of r10bio->devs[].
If devs[m].repl_bio != NULL, it assumes that conf->mirrors[dev].replacement
is also non-NULL. However, this is not always true.

When any rdev in a raid10 array has a replacement, every r10bio in
conf->r10buf_pool is allocated with ->devs[m].repl_bio != NULL. However,
during 'recover', ->devs[m].repl_bio is not cleared even when the
corresponding replacement is NULL, which leads to a NULL pointer
dereference of the replacement.
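The mismatch can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the struct layouts, the `write_errors` field, `MODEL_COPIES`, and the `model_*` functions are all hypothetical simplifications. It shows how a pool that hands out a repl_bio for every slot (because some mirror has a replacement) leaves slots whose own mirror has no replacement, so treating "repl_bio != NULL" as proof that the replacement exists would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's structures. */
struct bio { void (*bi_end_io)(struct bio *bio); };
struct md_rdev { int write_errors; };
struct raid10_info { struct md_rdev *rdev; struct md_rdev *replacement; };

#define MODEL_COPIES 2 /* hypothetical; the kernel loops over conf->copies */

struct r10dev { int devnum; struct bio *bio; struct bio *repl_bio; };
struct r10bio { struct r10dev devs[MODEL_COPIES]; };

/* Models the buffer pool: if ANY mirror has a replacement, every slot gets a
 * repl_bio, whether or not that slot's own mirror has a replacement. */
static void model_pool_alloc(struct r10bio *r10, const struct raid10_info *mirrors,
                             int nmirrors, struct bio *bios, struct bio *repl_bios)
{
    int any_replacement = 0;

    assert(nmirrors >= MODEL_COPIES); /* slot m maps to mirror m in this model */
    for (int i = 0; i < nmirrors; i++)
        if (mirrors[i].replacement != NULL)
            any_replacement = 1;

    for (int m = 0; m < MODEL_COPIES; m++) {
        r10->devs[m].devnum = m;
        r10->devs[m].bio = &bios[m];
        r10->devs[m].repl_bio = any_replacement ? &repl_bios[m] : NULL;
    }
}

/* The unsafe assumption: "repl_bio != NULL" is taken to mean the replacement
 * exists. Counts the slots where that assumption would dereference NULL. */
static int model_count_null_derefs(const struct r10bio *r10,
                                   const struct raid10_info *mirrors)
{
    int bad = 0;

    for (int m = 0; m < MODEL_COPIES; m++) {
        if (r10->devs[m].repl_bio == NULL)
            continue;
        if (mirrors[r10->devs[m].devnum].replacement == NULL)
            bad++; /* the unfixed handler would dereference NULL here */
    }
    return bad;
}
```

With mirror 0 having a replacement and mirror 1 having none, both slots receive a repl_bio, and slot 1 is the one where the unfixed logic would crash.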

This bug was introduced when replacement support for raid10 was
added in Linux 3.3.

As NeilBrown suggested:
Elsewhere the determination of "is this device part of the
resync/recovery" is made by testing bio->bi_end_io.
If this is end_sync_write, then we tried to write here.
If it is NULL, then we didn't try to write.
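The test NeilBrown describes can be sketched in the same simplified user-space model. Again this is an illustration under assumptions, not the actual patch: `model_end_sync_write` is a hypothetical stand-in for end_sync_write(), the `write_errors` counter stands in for the real error handling, and the `model_*` names and `MODEL_COPIES` are invented for the example. Only the replacement slots are modelled; a slot is handled only when its repl_bio has bi_end_io set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's structures. */
struct bio { void (*bi_end_io)(struct bio *bio); };
struct md_rdev { int write_errors; };
struct raid10_info { struct md_rdev *rdev; struct md_rdev *replacement; };

#define MODEL_COPIES 2 /* hypothetical; the kernel loops over conf->copies */

struct r10dev { int devnum; struct bio *bio; struct bio *repl_bio; };
struct r10bio { struct r10dev devs[MODEL_COPIES]; };

/* Stand-in for end_sync_write(): only its address is used, as a marker. */
static void model_end_sync_write(struct bio *bio) { (void)bio; }

/* "Did we try to write here?" A bio used for the recovery write has
 * bi_end_io set (to end_sync_write); an unused one has bi_end_io == NULL. */
static int model_was_written(const struct bio *bio)
{
    return bio != NULL && bio->bi_end_io != NULL;
}

/* Fixed traversal: a pre-allocated but unused repl_bio is skipped, so the
 * replacement pointer is only followed for slots that were really written.
 * Returns the number of replacements whose error was recorded. */
static int model_handle_recover_write_error(const struct r10bio *r10,
                                            const struct raid10_info *mirrors)
{
    int handled = 0;

    for (int m = 0; m < MODEL_COPIES; m++) {
        const struct r10dev *d = &r10->devs[m];
        struct md_rdev *rep;

        if (!model_was_written(d->repl_bio))
            continue; /* we did not try to write to a replacement here */

        rep = mirrors[d->devnum].replacement;
        assert(rep != NULL); /* holds if bi_end_io is set only for real replacements */
        rep->write_errors++; /* stand-in for the real error handling */
        handled++;
    }
    return handled;
}
```

In this model, keying the decision off bi_end_io rather than off "repl_bio != NULL" means a stale repl_bio left over from the pool is simply ignored, so the replacement pointer is never followed for a mirror that has no replacement.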

Fixes: 9ad1aefc8ae8 ("md/raid10:  Handle replacement devices during resync.")
Cc: stable (V3.3+)
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
drivers/md/raid10.c