crush: fix off-by-one errors in total_tries refactor
author Ilya Dryomov <ilya.dryomov@inktank.com>
Wed, 19 Mar 2014 14:58:36 +0000 (16:58 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 13 May 2014 11:32:52 +0000 (13:32 +0200)
commit e24b7822c7fea8d23d30baea8aabf96ec8522936
tree 4bd2bc8e42e6f0cfbb1a6319629c28193a962305
parent 2d1871289ce4af1bdee5eb46992f0ad3e005a508

commit 48a163dbb517eba13643bf404a0d695c1ab0a60d upstream.

Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis.  That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 to mean that
there should be no retries.

Change the ftotal vs tries comparison to use < instead of <= to fix the
problem.  Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.
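
To illustrate the arithmetic described above, here is a minimal sketch
of the two comparisons.  It is a simplified model, not the mapper.c
code: the function names are hypothetical, and every attempt is assumed
to fail so that the return value is the total number of attempts made.

```c
/*
 * Simplified, hypothetical model of a CRUSH-style retry loop.  Every
 * attempt is assumed to fail, so each function returns the total number
 * of attempts made before giving up.  This is not the mapper.c code.
 */

/* Old comparison: retry while ftotal <= tries, so "tries" counts retries. */
unsigned int attempts_with_le(unsigned int tries)
{
	unsigned int ftotal = 0;	/* failed attempts so far */

	do {
		ftotal++;		/* one more attempt, which fails */
	} while (ftotal <= tries);

	return ftotal;
}

/* New comparison: retry while ftotal < tries, so "tries" counts attempts. */
unsigned int attempts_with_lt(unsigned int tries)
{
	unsigned int ftotal = 0;	/* failed attempts so far */

	do {
		ftotal++;		/* one more attempt, which fails */
	} while (ftotal < tries);

	return ftotal;
}

/* Hypothetical helper: turn the retry-counting tunable into a try count. */
unsigned int tries_from_tunable(unsigned int choose_total_tries)
{
	return choose_total_tries + 1;
}
```

In this model, passing 1 gives two attempts with <= but only one with <,
and attempts_with_lt(tries_from_tunable(n)) equals attempts_with_le(n)
for any n below UINT_MAX.  That equality is what the +1 compensation is
meant to preserve.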

This was noticed while looking at output from a user-provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does.  The crush debug
output now matches that of prior versions, though.

Reflects ceph.git commit 795704fd615f0b008dcc81aa088a859b2d075138.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
net/ceph/crush/mapper.c