md/raid5: fix bug that could result in reads from a failed device.
authorNeilBrown <neilb@suse.de>
Tue, 25 Oct 2011 23:31:04 +0000 (10:31 +1100)
committerNeilBrown <neilb@suse.de>
Tue, 25 Oct 2011 23:31:04 +0000 (10:31 +1100)
This bug was introduced in 415e72d034c50520ddb7ff79e7d1792c1306f0c9
which was in 2.6.36.

There is a small window of time between when a device fails and when
it is removed from the array.  During this time we might still read
from it, but we won't write to it - so it is possible that we could
read stale data.

We didn't need the test of 'Faulty' before because the test on
In_sync is sufficient.  Since we started allowing reads from the early
part of non-In_sync devices we need a test on Faulty too.

This is suitable for any kernel from 2.6.36 onwards, though the patch
might need a bit of tweaking in 3.0 and earlier.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
drivers/md/raid5.c

index eea9379..521bf26 100644 (file)
@@ -3062,7 +3062,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
                        }
                } else if (test_bit(In_sync, &rdev->flags))
                        set_bit(R5_Insync, &dev->flags);
-               else {
+               else if (!test_bit(Faulty, &rdev->flags)) {
                        /* in sync if before recovery_offset */
                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
                                set_bit(R5_Insync, &dev->flags);