staging/lustre/ptlrpc: make ptlrpcd threads cpt-aware
On NUMA systems, the placement of worker threads relative to the
memory they use greatly affects performance. The CPT mechanism can be
used to constrain a number of Lustre thread types, and this change
makes it possible to configure the placement of ptlrpcd threads in a
similar manner.
To simplify the code changes, the global structures used to manage
ptlrpcd threads are changed to one per CPT. In particular this means
there will be one ptlrpcd recovery thread per CPT.
To prevent ptlrpcd threads from wandering all over the system, all
ptlrpcd thread are bound to a CPT. Note that some CPT configuration
is always created, but the defaults are not likely to be correct for
a NUMA system. After discussing the options with Liang Zhen we
decided that we would not bind ptlrpcd threads to specific CPUs,
and rather trust the kernel scheduler to migrate ptlrpcd threads.
With all ptlrpcd threads bound to a CPT, but not to specific CPUs,
the load policy mechanism can be radically simplified:
- PDL_POLICY_LOCAL and PDL_POLICY_ROUND are currently identical.
- PDL_POLICY_ROUND, if fully implemented, would cost us the locality
we are trying to achieve, so most or all calls using this policy
would have to be changed to PDL_POLICY_LOCAL.
- PDL_POLICY_PREFERRED is not used, and cannot be implemented without
binding ptlrpcd threads to individual CPUs.
- PDL_POLICY_SAME is rarely used, and cannot be implemented without
binding ptlrpcd threads to individual CPUs.
The partner mechanism is also updated, because now all ptlrpcd
threads are "bound" threads. The only difference between the various
bind policies, PDB_POLICY_NONE, PDB_POLICY_FULL, PDB_POLICY_PAIR, and
PDB_POLICY_NEIGHBOR, is the number of partner threads. The bind
policy is replaced with a tunable that directly specifies the size of
the groups of ptlrpcd partner threads.
Ensure that the ptlrpc_request_set for a ptlrpcd thread is created on
the same CPT that the thread will work on. When threads are bound to
specific nodes and/or CPUs in a NUMA system, it pays to ensure that
the datastructures used by these threads are also on the same node.
Visible changes:
* ptlrpcd thread names include the CPT number, for example
"ptlrpcd_02_07". In this case the "07" is relative to the CPT, and
not a CPU number.
Tunables added:
* ptlrpcd_cpts (string): A CPT string describing the CPU partitions
that ptlrpcd threads should run on. Used to make ptlrpcd threads
run on a subset of all CPTs.
* ptlrpcd_per_cpt_max (int): The maximum number of ptlrpcd threads
to run in a CPT.
* ptlrpcd_partner_group_size (int): The desired number of threads
in each ptlrpcd partner thread group. Default is 2, corresponding
to the old PDB_POLICY_PAIR. A negative value makes all ptlrpcd
threads in a CPT partners of each other.
Tunables obsoleted:
* max_ptlrpcds: The new ptlrcpd_per_cpt_max can be used to obtain the
same effect.
* ptlrpcd_bind_policy: The new ptlrpcd_partner_group_size can be used
to obtain the same effect.
Internal interface changes:
* pdb_policy_t and related code have been removed. Groups of partner
ptlrpcd threads are still created, and all threads in a partner
group are bound on the same CPT. The ptlrpcd threads bound to a
CPT are typically divided into several partner groups. The partner
groups on a CPT all have an equal number of ptlrpcd threads.
* pdl_policy_t and related code have been removed. Since ptlrpcd
threads are not bound to a specific CPU, all the code that avoids
scheduling on the current CPU (or attempts to do so) has been
removed as non-functional. A simplified form of PDL_POLICY_LOCAL
is kept as the only load policy.
* LIOD_BIND and related code have been removed. All ptlrpcd threads
are now bound to a CPT, and no additional binding policy is
implemented.
* ptlrpc_prep_set(): Changed to allocate a ptlrpc_request_set
on the current CPT.
* ptlrpcd(): If an error is encountered before entering the main loop
store the error in pc_error before exiting.
* ptlrpcd_start(): Check pc_error to verify that the ptlrpcd thread
has successfully entered its main loop.
* ptlrpcd_init(): Initialize the struct ptlrpcd_ctl for all threads
for a CPT before starting any of them. This closes a race during
startup where a partner thread could reference a non-initialized
struct ptlrpcd_ctl.
Signed-off-by: Olaf Weber <olaf@sgi.com>
Reviewed-on: http://review.whamcloud.com/13972
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6325
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Stephen Champion <schamp@sgi.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 files changed: