staging: lustre: osc: Track and limit "unstable" pages
authorPrakash Surya <surya1@llnl.gov>
Wed, 27 Apr 2016 22:21:04 +0000 (18:21 -0400)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 29 Apr 2016 04:51:58 +0000 (21:51 -0700)
commitac5b14810952b659438c015478348c93eda91c25
tree49d8c33d4a9ef93951ddd96c1b9b91d52c4197b4
parent7bbe9f838e17a1abcfc74c874739e2b0418899b0
staging: lustre: osc: Track and limit "unstable" pages

This change adds a global counter to track the number of "unstable"
pages held by a given client, along with per file system counters. An
"unstable" page is defined as a page which has been sent to the server
as part of a bulk request, but is uncommitted to stable storage.

In addition to simply tracking the unstable pages, they now also count
towards the maximum number of "pinned" pages on the system at any given
time. Thus, a client will now be bound on the number of dirty and
unstable pages it can pin in memory. Previously only dirty pages were
accounted for in this limit.

In addition to tracking the number of unstable pages in Lustre, the
NR_UNSTABLE_NFS memory zone is also incremented and decremented for
easy monitoring using the "NFS_Unstable:" field in /proc/meminfo.
This field is also used internally by the kernel to limit the total
amount of unstable pages on the system.

The motivation for this change is twofold. First, the client must not
allow itself to disconnect from an OST while still holding unstable
pages. Otherwise, these unstable pages can get lost due to an OST
failure, and replay is not possible due to the disconnect via unmount.

Secondly, the client needs a mechanism to prevent it from allocating too
much of its available RAM to unreclaimable pages pinned by the ptlrpc
layer. If this case occurs, out of memory events can trigger as a side
effect, which we need to avoid.

The current number of unstable pages accounted for on a per file system
granularity is exported by the unstable_stats proc file, contained under
each file system's llite namespace. An example of retrieving this
information is below:

$ lctl get_param llite.*.unstable_stats

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2139
Reviewed-on: http://review.whamcloud.com/6284
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/staging/lustre/lustre/include/cl_object.h
drivers/staging/lustre/lustre/include/lustre_net.h
drivers/staging/lustre/lustre/include/obd.h
drivers/staging/lustre/lustre/include/obd_support.h
drivers/staging/lustre/lustre/llite/llite_internal.h
drivers/staging/lustre/lustre/llite/llite_lib.c
drivers/staging/lustre/lustre/llite/lproc_llite.c
drivers/staging/lustre/lustre/obdclass/class_obd.c
drivers/staging/lustre/lustre/osc/osc_cache.c
drivers/staging/lustre/lustre/osc/osc_internal.h
drivers/staging/lustre/lustre/osc/osc_request.c