Documentation/filesystems/netfs_library.rst

   1 .. SPDX-License-Identifier: GPL-2.0
   2
   3 =================================
   4 NETWORK FILESYSTEM HELPER LIBRARY
   5 =================================
   6
   7 .. Contents:
   8
   9  - Overview.
  10  - Buffered read helpers.
  11    - Read helper functions.
  12    - Read helper structures.
  13    - Read helper operations.
  14    - Read helper procedure.
  15    - Read helper cache API.
  16
  17
  18 Overview
  19 ========
  20
  21 The network filesystem helper library is a set of functions designed to aid a
  22 network filesystem in implementing VM/VFS operations.  For the moment, that
  23 just includes turning various VM buffered read operations into requests to read
  24 from the server.  The helper library, however, can also interpose other
  25 services, such as local caching or local data encryption.
  26
  27 Note that the library module doesn't link against local caching directly, so
  28 access must be provided by the netfs.
  29
  30
  31 Buffered Read Helpers
  32 =====================
  33
  34 The library provides a set of read helpers that handle the ->readpage(),
  35 ->readahead() and much of the ->write_begin() VM operations and translate them
  36 into a common call framework.
  37
  38 The following services are provided:
  39
  40  * Handles transparent huge pages (THPs).
  41
  42  * Insulates the netfs from VM interface changes.
  43
  44  * Allows the netfs to arbitrarily split reads up into pieces, even ones that
  45    don't match page sizes or page alignments and that may cross pages.
  46
  47  * Allows the netfs to expand a readahead request in both directions to meet
  48    its needs.
  49
  50  * Allows the netfs to partially fulfil a read, which will then be resubmitted.
  51
  52  * Handles local caching, allowing cached data and server-read data to be
  53    interleaved for a single request.
  54
  55  * Handles clearing of bufferage that isn't on the server.
  56
  57  * Handles retrying of reads that failed, switching reads from the cache to the
  58    server as necessary.
  59
  60  * In the future, this is a place where other services can be performed, such as
  61    local encryption of data to be stored remotely or in the cache.
  62
  63 From the network filesystem, the helpers require a table of operations.  This
  64 includes a mandatory method to issue a read operation along with a number of
  65 optional methods.
  66
  67
  68 Read Helper Functions
  69 ---------------------
  70
  71 Three read helpers are provided::
  72
  73  * void netfs_readahead(struct readahead_control *ractl,
  74                         const struct netfs_read_request_ops *ops,
  75                         void *netfs_priv);
  76  * int netfs_readpage(struct file *file,
  77                       struct page *page,
  78                       const struct netfs_read_request_ops *ops,
  79                       void *netfs_priv);
  80  * int netfs_write_begin(struct file *file,
  81                          struct address_space *mapping,
  82                          loff_t pos,
  83                          unsigned int len,
  84                          unsigned int flags,
  85                          struct page **_page,
  86                          void **_fsdata,
  87                          const struct netfs_read_request_ops *ops,
  88                          void *netfs_priv);
  89
  90 Each corresponds to a VM operation, with the addition of a couple of parameters
  91 for the use of the read helpers:
  92
  93  * ``ops``
  94
  95    A table of operations through which the helpers can talk to the filesystem.
  96
  97  * ``netfs_priv``
  98
  99    Filesystem private data (can be NULL).
 100
 101 Both of these values will be stored into the read request structure.
 102
 103 For ->readahead() and ->readpage(), the network filesystem should just jump
 104 into the corresponding read helper; whereas for ->write_begin(), it may be a
 105 little more complicated as the network filesystem might want to flush
 106 conflicting writes or track dirty data and needs to put the acquired page if an
 107 error occurs after calling the helper.
 108
 109 The helpers manage the read request, calling back into the network filesystem
 110 through the supplied table of operations.  Waits will be performed as
 111 necessary before returning for helpers that are meant to be synchronous.
 112
 113 If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
 114 deal with it.  If some parts of the request are in progress when an error
 115 occurs, the request will get partially completed if sufficient data is read.
 116
 117 Additionally, there is::
 118
 119   * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 120                                  ssize_t transferred_or_error,
 121                                  bool was_async);
 122
 123 which should be called to complete a read subrequest.  This is given the number
 124 of bytes transferred or a negative error code, plus a flag indicating whether
 125 the operation was asynchronous (ie. whether the follow-on processing can be
 126 done in the current context, given this may involve sleeping).
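
As a rough illustration of that contract, the following is a user-space model
rather than kernel code.  A stand-in read routine copies data from an in-memory
"server" and reports the outcome through a stand-in for
netfs_subreq_terminated().  All ``model_*`` names are hypothetical, the
``was_async`` argument is left out for brevity, and the way the result is
recorded in the model is an assumption rather than a description of the
helpers' internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* User-space stand-in for a read subrequest; not the kernel structure. */
struct model_subreq {
	long long	start;		/* File position of the slice */
	size_t		len;		/* Length of the slice */
	size_t		transferred;	/* Bytes read so far */
	int		error;		/* Model-only: last error reported */
	unsigned char	*buf;		/* Model-only: destination buffer */
};

/* Stand-in for the server's copy of the file (20 bytes of data). */
static const unsigned char model_server[] = "0123456789abcdefghij";

/* Model of the completion call: a negative value is an error code, anything
 * else is a byte count that is added to ->transferred. */
void model_subreq_terminated(struct model_subreq *subreq,
			     ssize_t transferred_or_error)
{
	if (transferred_or_error < 0) {
		subreq->error = (int)transferred_or_error;
		return;
	}
	subreq->transferred += (size_t)transferred_or_error;
	assert(subreq->transferred <= subreq->len);
}

/* Model of a read routine: start at start + transferred, read at most
 * len - transferred bytes, then report the outcome.  Nothing is returned. */
void model_issue_op(struct model_subreq *subreq)
{
	long long pos = subreq->start + (long long)subreq->transferred;
	size_t want = subreq->len - subreq->transferred;
	size_t size = sizeof(model_server) - 1;
	size_t avail;

	if (pos < 0 || (size_t)pos >= size) {
		/* Nothing stored at this position: report a zero-length read. */
		model_subreq_terminated(subreq, 0);
		return;
	}
	avail = size - (size_t)pos;
	if (want > avail)
		want = avail;
	memcpy(subreq->buf + subreq->transferred, model_server + pos, want);
	model_subreq_terminated(subreq, (ssize_t)want);
}
```

In this model, a slice that runs past the end of the stored data ends up with
``transferred`` smaller than ``len``, which is the short-read case the helpers
have to assess.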
 127
 128
 129 Read Helper Structures
 130 ----------------------
 131
 132 The read helpers make use of a couple of structures to maintain the state of
 133 the read.  The first is a structure that manages a read request as a whole::
 134
 135         struct netfs_read_request {
 136                 struct inode            *inode;
 137                 struct address_space    *mapping;
 138                 struct netfs_cache_resources cache_resources;
 139                 void                    *netfs_priv;
 140                 loff_t                  start;
 141                 size_t                  len;
 142                 loff_t                  i_size;
 143                 const struct netfs_read_request_ops *netfs_ops;
 144                 unsigned int            debug_id;
 145                 ...
 146         };
 147
 148 The above fields are the ones the netfs can use.  They are:
 149
 150  * ``inode``
 151  * ``mapping``
 152
 153    The inode and the address space of the file being read from.  The mapping
 154    may or may not point to inode->i_data.
 155
 156  * ``cache_resources``
 157
 158    Resources for the local cache to use, if present.
 159
 160  * ``netfs_priv``
 161
 162    The network filesystem's private data.  The value for this can be passed in
 163    to the helper functions or set during the request.  The ->cleanup() op will
 164    be called if this is non-NULL at the end.
 165
 166  * ``start``
 167  * ``len``
 168
 169    The file position of the start of the read request and the length.  These
 170    may be altered by the ->expand_readahead() op.
 171
 172  * ``i_size``
 173
 174    The size of the file at the start of the request.
 175
 176  * ``netfs_ops``
 177
 178    A pointer to the operation table.  The value for this is passed into the
 179    helper functions.
 180
 181  * ``debug_id``
 182
 183    A number allocated to this operation that can be displayed in trace lines
 184    for reference.
 185
 186
 187 The second structure is used to manage individual slices of the overall read
 188 request::
 189
 190         struct netfs_read_subrequest {
 191                 struct netfs_read_request *rreq;
 192                 loff_t                  start;
 193                 size_t                  len;
 194                 size_t                  transferred;
 195                 unsigned long           flags;
 196                 unsigned short          debug_index;
 197                 ...
 198         };
 199
 200 Each subrequest is expected to access a single source, though the helpers will
 201 handle falling back from one source type to another.  The members are:
 202
 203  * ``rreq``
 204
 205    A pointer to the read request.
 206
 207  * ``start``
 208  * ``len``
 209
 210    The file position of the start of this slice of the read request and the
 211    length.
 212
 213  * ``transferred``
 214
 215    The amount of data transferred so far of the length of this slice.  The
 216    network filesystem or cache should start the operation this far into the
 217    slice.  If a short read occurs, the helpers will call again, having updated
 218    this to reflect the amount read so far.
 219
 220  * ``flags``
 221
 222    Flags pertaining to the read.  There are two of interest to the filesystem
 223    or cache:
 224
 225    * ``NETFS_SREQ_CLEAR_TAIL``
 226
 227      This can be set to indicate that the remainder of the slice, from
 228      transferred to len, should be cleared.
 229
 230    * ``NETFS_SREQ_SEEK_DATA_READ``
 231
 232      This is a hint to the cache that it might want to try skipping ahead to
 233      the next data (ie. using SEEK_DATA).
 234
 235  * ``debug_index``
 236
 237    A number allocated to this slice that can be displayed in trace lines for
 238    reference.
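
The relationship between ``start``, ``len``, ``transferred`` and
``NETFS_SREQ_CLEAR_TAIL`` can be sketched in user space as below.  This is a
model only: ``struct model_subreq``, the ``model_*`` functions and the bit
value chosen for ``MODEL_SREQ_CLEAR_TAIL`` are all hypothetical and are not
taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the subrequest fields described above. */
struct model_subreq {
	long long	start;		/* File position of the slice */
	size_t		len;		/* Length of the slice */
	size_t		transferred;	/* Bytes already read into the slice */
	unsigned long	flags;
};

#define MODEL_SREQ_CLEAR_TAIL	0x1UL	/* Hypothetical bit value */

/* File position at which the next I/O on this slice should begin. */
long long model_resume_pos(const struct model_subreq *s)
{
	return s->start + (long long)s->transferred;
}

/* Number of bytes of the slice that are still outstanding. */
size_t model_remaining(const struct model_subreq *s)
{
	assert(s->transferred <= s->len);
	return s->len - s->transferred;
}

/* If CLEAR_TAIL is set, zero the buffer from transferred to len and mark the
 * slice complete.  Returns the number of bytes cleared. */
size_t model_clear_tail(struct model_subreq *s, unsigned char *buf)
{
	size_t n = model_remaining(s);

	if (!(s->flags & MODEL_SREQ_CLEAR_TAIL))
		return 0;
	memset(buf + s->transferred, 0, n);
	s->transferred = s->len;
	return n;
}
```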
 239
 240
 241 Read Helper Operations
 242 ----------------------
 243
 244 The network filesystem must provide the read helpers with a table of operations
 245 through which it can issue requests and negotiate::
 246
 247         struct netfs_read_request_ops {
 248                 void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
 249                 bool (*is_cache_enabled)(struct inode *inode);
 250                 int (*begin_cache_operation)(struct netfs_read_request *rreq);
 251                 void (*expand_readahead)(struct netfs_read_request *rreq);
 252                 bool (*clamp_length)(struct netfs_read_subrequest *subreq);
 253                 void (*issue_op)(struct netfs_read_subrequest *subreq);
 254                 bool (*is_still_valid)(struct netfs_read_request *rreq);
 255                 int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
 256                                          struct page *page, void **_fsdata);
 257                 void (*done)(struct netfs_read_request *rreq);
 258                 void (*cleanup)(struct address_space *mapping, void *netfs_priv);
 259         };
 260
 261 The operations are as follows:
 262
 263  * ``init_rreq()``
 264
 265    [Optional] This is called to initialise the request structure.  It is given
 266    the file for reference and can modify the ->netfs_priv value.
 267
 268  * ``is_cache_enabled()``
 269
 270    [Required] This is called by netfs_write_begin() to ask if the file is being
 271    cached.  It should return true if it is being cached and false otherwise.
 272
 273  * ``begin_cache_operation()``
 274
 275    [Optional] This is called to ask the network filesystem to call into the
 276    cache (if present) to initialise the caching state for this read.  The netfs
 277    library module cannot access the cache directly, so the network filesystem
 278    should call something like fscache_begin_read_operation() to do this.
 279
 280    The cache gets to store its state in ->cache_resources and must set a table
 281    of operations of its own there (though of a different type).
 282
 283    This should return 0 on success and an error code otherwise.  If an error is
 284    reported, the operation may proceed anyway, just without local caching (only
 285    out of memory and interruption errors cause failure here).
 286
 287  * ``expand_readahead()``
 288
 289    [Optional] This is called to allow the filesystem to expand the size of a
 290    readahead read request.  The filesystem gets to expand the request in both
 291    directions, though it's not permitted to reduce it as the numbers may
 292    represent an allocation already made.  If local caching is enabled, it gets
 293    to expand the request first.
 294
 295    Expansion is communicated by changing ->start and ->len in the request
 296    structure.  Note that if any change is made, ->len must be increased by at
 297    least as much as ->start is reduced.
 298
 299  * ``clamp_length()``
 300
 301    [Optional] This is called to allow the filesystem to reduce the size of a
 302    subrequest.  The filesystem can use this, for example, to chop up a request
 303    that has to be split across multiple servers or to put multiple reads in
 304    flight.
 305
 306    This should return true on success and false on error.
 307
 308  * ``issue_op()``
 309
 310    [Required] The helpers use this to dispatch a subrequest to the server for
 311    reading.  In the subrequest, ->start, ->len and ->transferred indicate what
 312    data should be read from the server.
 313
 314    There is no return value; the netfs_subreq_terminated() function should be
 315    called to indicate whether or not the operation succeeded and how much data
 316    it transferred.  The filesystem also should not deal with setting pages
 317    uptodate, unlocking them or dropping their refs - the helpers need to deal
 318    with this as they have to coordinate with copying to the local cache.
 319
 320    Note that the helpers have the pages locked, but not pinned.  It is possible
 321    to use the ITER_XARRAY iov iterator to refer to the range of the inode that
 322    is being operated upon without the need to allocate large bvec tables.
 323
 324  * ``is_still_valid()``
 325
 326    [Optional] This is called to find out if the data just read from the local
 327    cache is still valid.  It should return true if it is still valid and false
 328    if not.  If it's not still valid, it will be reread from the server.
 329
 330  * ``check_write_begin()``
 331
 332    [Optional] This is called from the netfs_write_begin() helper once it has
 333    allocated/grabbed the page to be modified to allow the filesystem to flush
 334    conflicting state before allowing it to be modified.
 335
 336    It should return 0 if everything is now fine, -EAGAIN if the page should be
 337    regrabbed and any other error code to abort the operation.
 338
 339  * ``done()``
 340
 341    [Optional] This is called after the pages in the request have all been
 342    unlocked (and marked uptodate if applicable).
 343
 344  * ``cleanup()``
 345
 346    [Optional] This is called as the request is being deallocated so that the
 347    filesystem can clean up ->netfs_priv.
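
As an illustration of the ->expand_readahead() rule that ->len must grow by at
least as much as ->start shrinks, the user-space sketch below rounds a request
outwards to a granule boundary.  It is a model, not kernel code:
``struct model_rreq``, ``model_expand_readahead()`` and the 256KiB granule are
hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the ->start and ->len fields of the read request. */
struct model_rreq {
	long long	start;
	size_t		len;
};

/* Hypothetical granule; a real filesystem would use its own unit. */
#define MODEL_GRANULE	(256 * 1024LL)

/* Round the request outwards to granule boundaries.  The start only moves
 * down and the end only moves up, so the original range stays covered and
 * the request is never reduced. */
void model_expand_readahead(struct model_rreq *rreq)
{
	long long old_start = rreq->start;
	long long old_end = old_start + (long long)rreq->len;
	long long new_start, new_end;

	assert(old_start >= 0);

	new_start = old_start - old_start % MODEL_GRANULE;
	new_end = old_end + MODEL_GRANULE - 1;
	new_end -= new_end % MODEL_GRANULE;

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);

	/* The rule from the text: the expanded range contains the old one. */
	assert(rreq->start <= old_start);
	assert(rreq->start + (long long)rreq->len >= old_end);
}
```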
 348
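For ->clamp_length(), a filesystem might limit each slice to its maximum read
size and keep it from crossing a striping boundary.  The following user-space
sketch shows one way that could look.  The rsize and stripe values and all
``model_*`` names are hypothetical, and the bool return simply follows the
declaration in the operations table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the ->start and ->len fields of a subrequest. */
struct model_subreq {
	long long	start;
	size_t		len;
};

#define MODEL_RSIZE	(64 * 1024UL)		/* Hypothetical max read size */
#define MODEL_STRIPE	(1024 * 1024LL)		/* Hypothetical stripe unit */

/* Shrink the slice so that it neither exceeds rsize nor crosses a stripe
 * boundary.  The slice is only ever reduced.  Returns false if nothing is
 * left to read. */
bool model_clamp_length(struct model_subreq *subreq)
{
	long long stripe_end;
	size_t to_boundary;

	assert(subreq->start >= 0);

	stripe_end = (subreq->start / MODEL_STRIPE + 1) * MODEL_STRIPE;
	to_boundary = (size_t)(stripe_end - subreq->start);

	if (subreq->len > to_boundary)
		subreq->len = to_boundary;
	if (subreq->len > MODEL_RSIZE)
		subreq->len = MODEL_RSIZE;
	return subreq->len > 0;
}
```

Because the next slice begins where the previous one ended, repeated clamping
of this kind naturally walks a large request across stripes in rsize-sized
pieces.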
 349
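The return convention of ->check_write_begin() (0 to proceed, -EAGAIN to regrab
the page, anything else to abort) implies a retry loop in the caller.  A
minimal user-space model of such a loop is sketched below.  The callback type
and the ``model_*`` names are hypothetical, and page handling is reduced to a
counter, so this shows only the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for ->check_write_begin(). */
typedef int (*model_check_fn)(void *ctx);

/* Model of the grab/check loop: go round again on -EAGAIN, stop on 0 or on
 * any other error.  *grabs counts how often the page was (re)grabbed. */
int model_write_begin(model_check_fn check, void *ctx, int *grabs)
{
	int ret;

	assert(grabs != NULL);
	*grabs = 0;
	for (;;) {
		(*grabs)++;			/* Stands in for grabbing the page */
		ret = check ? check(ctx) : 0;	/* The op is optional */
		if (ret != -EAGAIN)
			return ret;
		/* -EAGAIN: loop and grab the page again. */
	}
}

/* Example check: ask for a regrab until a countdown reaches zero. */
int model_check_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```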
 350
 351 Read Helper Procedure
 352 ---------------------
 353
 354 The read helpers work by the following general procedure:
 355
 356  * Set up the request.
 357
 358  * For readahead, allow the local cache and then the network filesystem to
 359    propose expansions to the read request.  This is then proposed to the VM.
 360    If the VM cannot fully perform the expansion, a partially expanded read will
 361    be performed, though this may not get written to the cache in its entirety.
 362
 363  * Loop around slicing chunks off of the request to form subrequests:
 364
 365    * If a local cache is present, it gets to do the slicing, otherwise the
 366      helpers just try to generate maximal slices.
 367
 368    * The network filesystem gets to clamp the size of each slice if it is to be
 369      the source.  This allows rsize and chunking to be implemented.
 370
 371    * The helpers issue a read from the cache or a read from the server or just
 372      clear the slice as appropriate.
 373
 374    * The next slice begins at the end of the last one.
 375
 376    * As slices finish being read, they terminate.
 377
 378  * When all the subrequests have terminated, the subrequests are assessed and
 379    any that are short or have failed are reissued:
 380
 381    * Failed cache requests are issued against the server instead.
 382
 383    * Failed server requests just fail.
 384
 385    * Short reads against either source will be reissued against that source
 386      provided they have transferred some more data:
 387
 388      * The cache may need to skip holes that it can't do DIO from.
 389
 390      * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
 391        end of the slice instead of reissuing.
 392
 393  * Once the data is read, the pages that have been fully read/cleared:
 394
 395    * Will be marked uptodate.
 396
 397    * If a cache is present, will be marked with PG_fscache.
 398
 399    * Will be unlocked.
 400
 401  * Any pages that need writing to the cache will then have DIO writes issued.
 402
 403  * Synchronous operations will wait for reading to be complete.
 404
 405  * Writes to the cache will proceed asynchronously and the pages will have the
 406    PG_fscache mark removed when that completes.
 407
 408  * The request structures will be cleaned up when everything has completed.
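
The assessment step in the procedure above can be summarised as a small
decision function.  The sketch below is a user-space model with hypothetical
names throughout.  In particular, treating a short read that made no progress
as a failure is an assumption: the text only says that reissue happens when
more data was transferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_source { MODEL_FROM_CACHE, MODEL_FROM_SERVER };

enum model_action {
	MODEL_DONE,		/* Slice is complete */
	MODEL_CLEAR_TAIL,	/* Zero the rest of the slice */
	MODEL_REISSUE_SAME,	/* Retry against the same source */
	MODEL_REISSUE_SERVER,	/* Fall back from the cache to the server */
	MODEL_FAIL,		/* Give up on this slice */
};

struct model_result {
	enum model_source source;
	int	error;		/* 0 or a negative error code */
	size_t	len;		/* Slice length */
	size_t	transferred;	/* Total read so far */
	size_t	last_transfer;	/* Bytes moved by the most recent attempt */
	bool	clear_tail;	/* NETFS_SREQ_CLEAR_TAIL was set */
};

/* Decide what to do with a terminated subrequest. */
enum model_action model_assess(const struct model_result *r)
{
	assert(r->transferred <= r->len);

	/* Failed cache reads go to the server; failed server reads just fail. */
	if (r->error < 0)
		return r->source == MODEL_FROM_CACHE ?
			MODEL_REISSUE_SERVER : MODEL_FAIL;
	if (r->transferred == r->len)
		return MODEL_DONE;
	/* Short read: clear to the end of the slice instead of reissuing. */
	if (r->clear_tail)
		return MODEL_CLEAR_TAIL;
	/* Otherwise retry only if the last attempt made progress (assumption:
	 * no progress is treated as failure). */
	return r->last_transfer > 0 ? MODEL_REISSUE_SAME : MODEL_FAIL;
}
```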
 409
 410
 411 Read Helper Cache API
 412 ---------------------
 413
 414 When implementing a local cache to be used by the read helpers, two things are
 415 required: some way for the network filesystem to initialise the caching for a
 416 read request and a table of operations for the helpers to call.
 417
 418 The network filesystem's ->begin_cache_operation() method is called to set up a
 419 cache and this must call into the cache to do the work.  If using fscache, for
 420 example, the network filesystem would call::
 421
 422         int fscache_begin_read_operation(struct netfs_read_request *rreq,
 423                                          struct fscache_cookie *cookie);
 424
 425 passing in the request pointer and the cookie corresponding to the file.
 426
 427 The netfs_read_request object contains a place for the cache to hang its
 428 state::
 429
 430         struct netfs_cache_resources {
 431                 const struct netfs_cache_ops    *ops;
 432                 void                            *cache_priv;
 433                 void                            *cache_priv2;
 434         };
 435
 436 This contains an operations table pointer and two private pointers.  The
 437 operation table looks like the following::
 438
 439         struct netfs_cache_ops {
 440                 void (*end_operation)(struct netfs_cache_resources *cres);
 441
 442                 void (*expand_readahead)(struct netfs_cache_resources *cres,
 443                                          loff_t *_start, size_t *_len, loff_t i_size);
 444
 445                 enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
 446                                                        loff_t i_size);
 447
 448                 int (*read)(struct netfs_cache_resources *cres,
 449                             loff_t start_pos,
 450                             struct iov_iter *iter,
 451                             bool seek_data,
 452                             netfs_io_terminated_t term_func,
 453                             void *term_func_priv);
 454
 455                 int (*write)(struct netfs_cache_resources *cres,
 456                              loff_t start_pos,
 457                              struct iov_iter *iter,
 458                              netfs_io_terminated_t term_func,
 459                              void *term_func_priv);
 460         };
 461
 462 With a termination handler function pointer::
 463
 464         typedef void (*netfs_io_terminated_t)(void *priv,
 465                                               ssize_t transferred_or_error,
 466                                               bool was_async);
 467
 468 The methods defined in the table are:
 469
 470  * ``end_operation()``
 471
 472    [Required] Called to clean up the resources at the end of the read request.
 473
 474  * ``expand_readahead()``
 475
 476    [Optional] Called at the beginning of a netfs_readahead() operation to allow
 477    the cache to expand a request in either direction.  This allows the cache to
 478    size the request appropriately for the cache granularity.
 479
 480    The function is passed pointers to the start and length, plus the size of
 481    the file for reference, and adjusts the start and length appropriately.
 482
 483  * ``prepare_read()``
 484
 485    [Required] Called to configure the next slice of a request.  ->start and
 486    ->len in the subrequest indicate where and how big the next slice can be;
 487    the cache gets to reduce the length to match its granularity requirements.
 488    It should return one of:
 489
 490    * ``NETFS_FILL_WITH_ZEROES``
 491    * ``NETFS_DOWNLOAD_FROM_SERVER``
 492    * ``NETFS_READ_FROM_CACHE``
 493    * ``NETFS_INVALID_READ``
 494
 495    to indicate whether the slice should just be cleared or whether it should be
 496    downloaded from the server or read from the cache - or whether slicing
 497    should be given up at the current point.
 498
 499  * ``read()``
 500
 501    [Required] Called to read from the cache.  The start file offset is given
 502    along with an iterator to read to, which gives the length also.  It can be
 503    given a hint requesting that it seek forward from that start position for
 504    data.
 505
 506    Also provided is a pointer to a termination handler function and private
 507    data to pass to that function.  The termination function should be called
 508    with the number of bytes transferred or an error code, plus a flag
 509    indicating whether the termination is definitely happening in the caller's
 510    context.
 511
 512  * ``write()``
 513
 514    [Required] Called to write to the cache.  The start file offset is given
 515    along with an iterator to write from, which gives the length also.
 516
 517    Also provided is a pointer to a termination handler function and private
 518    data to pass to that function.  The termination function should be called
 519    with the number of bytes transferred or an error code, plus a flag
 520    indicating whether the termination is definitely happening in the caller's
 521    context.
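
To make the role of ->prepare_read() more concrete, here is a user-space sketch
of a cache that trims each slice to its own granule and picks a source for it.
Everything in it is hypothetical: the ``model_*`` names, the 4KiB granule, the
"first N granules are cached" state and the conditions used to choose each
source.  The real method is passed only the subrequest and ``i_size``; the
extra cache argument exists only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the four netfs_read_source values; numeric values are arbitrary. */
enum model_read_source {
	MODEL_FILL_WITH_ZEROES,
	MODEL_DOWNLOAD_FROM_SERVER,
	MODEL_READ_FROM_CACHE,
	MODEL_INVALID_READ,
};

/* Stand-in for the ->start and ->len fields of a subrequest. */
struct model_subreq {
	long long	start;
	size_t		len;
};

#define MODEL_GRANULE	4096LL	/* Hypothetical cache granularity */

/* Hypothetical cache state: the first cached_granules granules are present. */
struct model_cache {
	long long	cached_granules;
};

enum model_read_source model_prepare_read(const struct model_cache *cache,
					  struct model_subreq *subreq,
					  long long i_size)
{
	long long granule, granule_end;
	size_t max;

	assert(subreq->start >= 0);

	if (subreq->len == 0)
		return MODEL_INVALID_READ;	/* Stop slicing here */
	if (subreq->start >= i_size)
		return MODEL_FILL_WITH_ZEROES;	/* Beyond EOF: nothing to fetch */

	granule = subreq->start / MODEL_GRANULE;
	granule_end = (granule + 1) * MODEL_GRANULE;
	max = (size_t)(granule_end - subreq->start);

	/* Only ever shrink the slice, here to the end of the current granule. */
	if (subreq->len > max)
		subreq->len = max;

	return granule < cache->cached_granules ?
		MODEL_READ_FROM_CACHE : MODEL_DOWNLOAD_FROM_SERVER;
}
```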
 522
 523 Note that these methods are passed a pointer to the cache resource structure,
 524 not the read request structure as they could be used in other situations where
 525 there isn't a read request structure as well, such as writing dirty data to the
 526 cache.
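
The termination-handler convention used by ->read() and ->write() can be
modelled in user space as follows.  This is only a sketch: the iterator is
replaced by a plain buffer and length, the operation completes synchronously
(so ``was_async`` is always passed as false), and all ``model_*`` names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the netfs_io_terminated_t typedef shown above. */
typedef void (*model_io_terminated_t)(void *priv,
				      ssize_t transferred_or_error,
				      bool was_async);

/* Stand-in for the cache's backing store. */
struct model_cache {
	const unsigned char	*data;
	size_t			size;
};

/* Model of a ->read() method that completes synchronously: copy what is
 * available, then report the result through the termination function. */
int model_cache_read(const struct model_cache *cache, long long start_pos,
		     unsigned char *dst, size_t count,
		     model_io_terminated_t term_func, void *term_func_priv)
{
	ssize_t result;

	assert(cache != NULL && dst != NULL);

	if (start_pos < 0) {
		result = -EINVAL;
	} else if ((size_t)start_pos >= cache->size) {
		result = 0;			/* Nothing stored here */
	} else {
		size_t avail = cache->size - (size_t)start_pos;

		if (count > avail)
			count = avail;
		memcpy(dst, cache->data + start_pos, count);
		result = (ssize_t)count;
	}

	if (term_func)
		term_func(term_func_priv, result, false);
	return result < 0 ? (int)result : 0;
}

/* Example handler: record the outcome in the caller's private data. */
struct model_outcome {
	ssize_t	result;
	bool	was_async;
	int	calls;
};

void model_record_outcome(void *priv, ssize_t transferred_or_error,
			  bool was_async)
{
	struct model_outcome *o = priv;

	o->result = transferred_or_error;
	o->was_async = was_async;
	o->calls++;
}
```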