multipath: add fast_io_fail and dev_loss_tmo config parameters
author Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Fri, 30 Jul 2010 09:13:14 +0000 (18:13 +0900)
committer Christophe Varoqui <christophe.varoqui@opensvc.com>
Thu, 2 Sep 2010 07:02:01 +0000 (09:02 +0200)
commit dab80d6aa7decf2fe76971f23dbe41d1c3e5ae95
tree ece0f1a41e37fa055528832ea34c09a0e6829ddb
parent 5af359eda5ae08881424d90acc59cbfbcd6c1e28

Hi,

(03/23/10 11:44), Benjamin Marzinski wrote:
> This patch adds two new configuration parameters to multipath.conf,
> fast_io_fail_tmo and dev_loss_tmo which set
>
> /sys/class/fc_remote_ports/rport-<host>:<channel>-<rport_id>/fast_io_fail_tmo and
> /sys/class/fc_remote_ports/rport-<host>:<channel>-<rport_id>/dev_loss_tmo
...

This is a nice feature, but the code uses scsi_id instead of rport_id:

> +sysfs_set_scsi_tmo (struct multipath *mpp)
...
> +	vector_foreach_slot(mpp->paths, pp, i) {
> +		if (safe_snprintf(attr_path, SYSFS_PATH_SIZE,
> +				  "/class/fc_remote_ports/rport-%d:%d-%d",
> +				  pp->sg_id.host_no, pp->sg_id.channel,
> +				  pp->sg_id.scsi_id)) {
> +			condlog(0, "attr_path '/class/fc_remote_ports/rport-%d:%d-%d' too large",
> +				pp->sg_id.host_no, pp->sg_id.channel,
> +				pp->sg_id.scsi_id);
> +			return 1;
> +		}

So it sets fast_io_fail_tmo/dev_loss_tmo for the wrong rport.
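
To make the mismatch concrete, here is a minimal sketch of the path construction. It is not the libmultipath code: build_rport_path is a hypothetical helper that only mirrors the format string quoted above, and the id values in the usage note are assumed for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not part of libmultipath: format the
 * fc_remote_ports directory name for a given host, channel and id.
 * Returns 0 on success, 1 if the buffer is too small. */
static int build_rport_path(char *buf, size_t len,
			    int host, int channel, int id)
{
	int n = snprintf(buf, len, "/class/fc_remote_ports/rport-%d:%d-%d",
			 host, channel, id);

	/* snprintf reports the length it wanted; treat truncation as an error. */
	return (n < 0 || (size_t)n >= len) ? 1 : 0;
}
```

If a SCSI target with scsi_id 0 sits behind remote port 2 (an assumed mapping), passing the scsi_id yields ".../rport-0:0-0", while the remote port that actually carries the target is ".../rport-0:0-2".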

For example, I have a storage device with node_name 0x2000003013842bcb
connected via a switch whose node_name is 0x100000051e09ee30.
When I set 'fast_io_fail_tmo = 8' in multipath.conf,
the multipath command sets the timeout like this:
  # for f in /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
  rport-0:0-0:0x100000051e09ee30:8
  rport-0:0-1:0x100000051e09ee30:8
  rport-0:0-2:0x2000003013842bcb:off
  rport-0:0-3:0x2000003013842bcb:off
  rport-1:0-0:0x100000051e09ee30:8
  rport-1:0-1:0x100000051e09ee30:8
  rport-1:0-2:0x2000003013842bcb:off
  rport-1:0-3:0x2000003013842bcb:off
As a result, the storage rports keep fast_io_fail_tmo at "off", so when
a link to the storage goes down, I/O is still blocked even after
fast_io_fail_tmo has passed.

Attached is a quick patch for this problem.

With this patch, fast_io_fail_tmo is set like this:
  rport-0:0-0:0x100000051e09ee30:off
  rport-0:0-1:0x100000051e09ee30:off
  rport-0:0-2:0x2000003013842bcb:8
  rport-0:0-3:0x2000003013842bcb:8
  rport-1:0-0:0x100000051e09ee30:off
  rport-1:0-1:0x100000051e09ee30:off
  rport-1:0-2:0x2000003013842bcb:8
  rport-1:0-3:0x2000003013842bcb:8

Others might have a better idea about resolving the rport_id from the target.
Mike, Hannes, any comments?

Thanks,
--
Jun'ichi Nomura, NEC Corporation

rport_id != scsi_id

multipath should find rport_id from the target_id.
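
One possible way to do this lookup is sketched below, under assumptions. It assumes that the sysfs device path of an FC SCSI target contains its parent remote port as a path component, for example ".../host0/rport-0:0-2/target0:0:0". parse_rport_id is a hypothetical helper and is not the code from this patch; obtaining the device path itself is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the code from this patch.
 * Scan a sysfs device path for an "rport-<host>:<channel>-<id>"
 * component and return its three numbers.
 * Returns 0 on success, 1 if no such component is found. */
static int parse_rport_id(const char *devpath,
			  int *host, int *channel, int *rport_id)
{
	const char *p = devpath;

	while ((p = strstr(p, "rport-")) != NULL) {
		/* Only accept a match at the start of a path component. */
		if ((p == devpath || p[-1] == '/') &&
		    sscanf(p, "rport-%d:%d-%d", host, channel, rport_id) == 3)
			return 0;
		p += strlen("rport-");
	}
	return 1;
}
```

The numbers returned this way, rather than pp->sg_id.scsi_id, would then be used to build the fc_remote_ports attribute path. On failure the output parameters may have been partially written and should not be used.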
libmultipath/discovery.c