mm: fadvise: avoid fadvise for fs without backing device
authorShakeel Butt <shakeelb@google.com>
Fri, 8 Sep 2017 23:13:05 +0000 (16:13 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Sat, 9 Sep 2017 01:26:47 +0000 (18:26 -0700)
The fadvise() manpage is silent on fadvise()'s effect on memory-based
filesystems (shmem, hugetlbfs & ramfs) and pseudo file systems (procfs,
sysfs, kernfs).  The current implementaion of fadvise is mostly a noop
for such filesystems except for FADV_DONTNEED which will trigger
expensive remote LRU cache draining.  This patch makes the noop of
fadvise() on such file systems very explicit.

However this change has two side effects for ramfs and one for tmpfs.
First fadvise(FADV_DONTNEED) could remove the unmapped clean zero'ed
pages of ramfs (allocated through read, readahead & read fault) and
tmpfs (allocated through read fault).  Also fadvise(FADV_WILLNEED) could
create such clean zero'ed pages for ramfs.  This change removes those
possibilities.

One of our generic libraries does fadvise(FADV_DONTNEED).  Recently we
observed high latency in fadvise() and noticed that the users have
started using tmpfs files and the latency was due to expensive remote
LRU cache draining.  For normal tmpfs files (have data written on them),
fadvise(FADV_DONTNEED) will always trigger the unneeded remote cache
draining.

Link: http://lkml.kernel.org/r/20170818011023.181465-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/fadvise.c

index a430131..702f239 100644 (file)
@@ -52,7 +52,9 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
                goto out;
        }
 
-       if (IS_DAX(inode)) {
+       bdi = inode_to_bdi(mapping->host);
+
+       if (IS_DAX(inode) || (bdi == &noop_backing_dev_info)) {
                switch (advice) {
                case POSIX_FADV_NORMAL:
                case POSIX_FADV_RANDOM:
@@ -75,8 +77,6 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
        else
                endbyte--;              /* inclusive */
 
-       bdi = inode_to_bdi(mapping->host);
-
        switch (advice) {
        case POSIX_FADV_NORMAL:
                f.file->f_ra.ra_pages = bdi->ra_pages;