platform/kernel/linux-rpi.git
3 years agolib: inline _find_next_bit() wrappers
Yury Norov [Fri, 7 May 2021 01:03:03 +0000 (18:03 -0700)]
lib: inline _find_next_bit() wrappers

lib/find_bit.c declares five single-line wrappers for _find_next_bit().
We may turn those wrappers to inline functions.  It eliminates unneeded
function calls and opens room for compile-time optimizations.

Link: https://lkml.kernel.org/r/20210401003153.97325-8-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools: sync small_const_nbits() macro with the kernel
Yury Norov [Fri, 7 May 2021 01:03:00 +0000 (18:03 -0700)]
tools: sync small_const_nbits() macro with the kernel

Sync implementation with the kernel and move the macro from
tools/include/linux/bitmap.h to tools/include/asm-generic/bitsperlong.h

Link: https://lkml.kernel.org/r/20210401003153.97325-7-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib: extend the scope of small_const_nbits() macro
Yury Norov [Fri, 7 May 2021 01:02:56 +0000 (18:02 -0700)]
lib: extend the scope of small_const_nbits() macro

find_bit would also benefit from small_const_nbits() optimizations.  The
detailed comment is provided by Rasmus Villemoes.

Link: https://lkml.kernel.org/r/20210401003153.97325-6-yury.norov@gmail.com
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoarch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300
Yury Norov [Fri, 7 May 2021 01:02:53 +0000 (18:02 -0700)]
arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300

m68k and sh include bitmap/{find,le}.h prior to ffs/fls headers.  New
fast-path implementation in find.h requires ffs/fls.  Reordering the
headers inclusion sequence helps to prevent compile-time implicit function
declaration error.

[yury.norov@gmail.com: h8300: rearrange headers inclusion order in asm/bitops]
Link: https://lkml.kernel.org/r/20210406183625.794227-1-yury.norov@gmail.com
Link: https://lkml.kernel.org/r/20210401003153.97325-5-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
Yury Norov [Fri, 7 May 2021 01:02:49 +0000 (18:02 -0700)]
tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel

Kernel version generates better code.

Link: https://lkml.kernel.org/r/20210401003153.97325-4-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools: bitmap: sync function declarations with the kernel
Yury Norov [Fri, 7 May 2021 01:02:46 +0000 (18:02 -0700)]
tools: bitmap: sync function declarations with the kernel

Some functions in tools/include/linux/bitmap.h declare nbits as int.  In
the kernel nbits is declared as unsigned int.

Link: https://lkml.kernel.org/r/20210401003153.97325-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotools: disable -Wno-type-limits
Yury Norov [Fri, 7 May 2021 01:02:42 +0000 (18:02 -0700)]
tools: disable -Wno-type-limits

Patch series "lib/find_bit: fast path for small bitmaps", v6.

Bitmap operations are much simpler and faster in case of small bitmaps
which fit into a single word.  In linux/bitmap.c we have a machinery that
allows compiler to replace actual function call with a few instructions if
bitmaps passed into the function are small and their size is known at
compile time.

find_*_bit() API lacks this functionality; but users will benefit from it
a lot.  One important example is cpumask subsystem when NR_CPUS <=
BITS_PER_LONG.

This patch (of 12):

GENMASK(h, l) may be passed with unsigned types.  In such case,
type-limits warning is generated for example in case of GENMASK(h, 0).

Link: https://lkml.kernel.org/r/20210401003153.97325-1-yury.norov@gmail.com
Link: https://lkml.kernel.org/r/20210401003153.97325-2-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/cred.c: make init_groups static
Rasmus Villemoes [Fri, 7 May 2021 01:02:39 +0000 (18:02 -0700)]
kernel/cred.c: make init_groups static

init_groups is declared in both cred.h and init_task.h, but it is not
actually referenced anywhere outside of cred.c where it is defined.  So
make it static and remove the declarations.

Link: https://lkml.kernel.org/r/20210310220102.2484201-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/async.c: fix pr_debug statement
Rasmus Villemoes [Fri, 7 May 2021 01:02:36 +0000 (18:02 -0700)]
kernel/async.c: fix pr_debug statement

An async_func_t returns void - any errors encountered it has to stash
somewhere for consumers to discover later.

Link: https://lkml.kernel.org/r/20210226124355.2503524-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolinux/profile.h: remove unnecessary declaration
Wan Jiabing [Fri, 7 May 2021 01:02:33 +0000 (18:02 -0700)]
linux/profile.h: remove unnecessary declaration

Declaring struct pt_regs is unnecessary.  On the one hand, there is no
function using it; on the other hand, struct pt_regs has been declared in
linux/kernel.h.  Remove them.

Link: https://lkml.kernel.org/r/20210401104834.1009157-1-wanjiabing@vivo.com
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel.h: drop inclusion in bitmap.h
Andy Shevchenko [Fri, 7 May 2021 01:02:30 +0000 (18:02 -0700)]
kernel.h: drop inclusion in bitmap.h

The bitmap.h header is used in a lot of code around the kernel.  Besides
that it includes kernel.h which sometimes makes a loop.

The problem here is many unneeded loops that make header hell
dependencies.  For example, how may you move bitmap_zalloc() from C-file
to the header?  Currently it's impossible.  And bitmap.h here is only the
tip of an iceberg.

kerne.h is a dump of everything that even has nothing in common at all.
We may still have it, but in my new code I prefer to include only the
headers that I want to use, without the bulk of unneeded kernel code.

Break the loop by introducing align.h, including it in kernel.h and
bitmap.h followed by replacing kernel.h with limits.h.

Link: https://lkml.kernel.org/r/20210326170347.37441-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoinclude: remove pagemap.h from blkdev.h
Matthew Wilcox (Oracle) [Fri, 7 May 2021 01:02:27 +0000 (18:02 -0700)]
include: remove pagemap.h from blkdev.h

My UEK-derived config has 1030 files depending on pagemap.h before this
change.  Afterwards, just 326 files need to be rebuilt when I touch
pagemap.h.  I think blkdev.h is probably included too widely, but
untangling that dependency is harder and this solves my problem.  x86
allmodconfig builds, but there may be implicit include problems on other
architectures.

Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com> [nvdimm]
Acked-by: Jens Axboe <axboe@kernel.dk> [block]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> [scsi]
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc/sysctl: fix function name error in comments
zhouchuangao [Fri, 7 May 2021 01:02:24 +0000 (18:02 -0700)]
proc/sysctl: fix function name error in comments

The function name should be modified to register_sysctl_paths instead of
register_sysctl_table_path.

Link: https://lkml.kernel.org/r/1615807194-79646-1-git-send-email-zhouchuangao@vivo.com
Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoselftests: proc: test subset=pid
Alexey Dobriyan [Fri, 7 May 2021 01:02:21 +0000 (18:02 -0700)]
selftests: proc: test subset=pid

Test that /proc instance mounted with

mount -t proc -o subset=pid

contains only ".", "..", "self", "thread-self" and pid directories.

Note:
Currently "subset=pid" doesn't return "." and ".." via readdir.
This must be a bug.

Link: https://lkml.kernel.org/r/YFYZZ7WGaZlsnChS@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: delete redundant subset=pid check
Alexey Dobriyan [Fri, 7 May 2021 01:02:18 +0000 (18:02 -0700)]
proc: delete redundant subset=pid check

Two checks in lookup and readdir code should be enough to not have third
check in open code.

Can't open what can't be looked up?

Link: https://lkml.kernel.org/r/YFYYwIBIkytqnkxP@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: mandate ->proc_lseek in "struct proc_ops"
Alexey Dobriyan [Fri, 7 May 2021 01:02:16 +0000 (18:02 -0700)]
proc: mandate ->proc_lseek in "struct proc_ops"

Now that proc_ops are separate from file_operations and other operations
it easy to check all instances to have ->proc_lseek hook and remove check
in main code.

Note:
nonseekable_open() files naturally don't require ->proc_lseek.

Garbage collect pde_lseek() function.

[adobriyan@gmail.com: smoke test lseek()]
Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain
Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoproc: save LOC in __xlate_proc_name()
Alexey Dobriyan [Fri, 7 May 2021 01:02:13 +0000 (18:02 -0700)]
proc: save LOC in __xlate_proc_name()

Can't look at this verbosity anymore.

Link: https://lkml.kernel.org/r/YFYXAp/fgq405qcy@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agofs/proc/generic.c: fix incorrect pde_is_permanent check
Colin Ian King [Fri, 7 May 2021 01:02:10 +0000 (18:02 -0700)]
fs/proc/generic.c: fix incorrect pde_is_permanent check

Currently the pde_is_permanent() check is being run on root multiple times
rather than on the next proc directory entry.  This looks like a
copy-paste error.  Fix this by replacing root with next.

Addresses-Coverity: ("Copy-paste error")
Link: https://lkml.kernel.org/r/20210318122633.14222-1-colin.king@canonical.com
Fixes: d919b33dafb3 ("proc: faster open/read/close with "permanent" files")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoalpha: csum_partial_copy.c: add function prototypes from <net/checksum.h>
Randy Dunlap [Fri, 7 May 2021 01:02:07 +0000 (18:02 -0700)]
alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h>

Fix "no previous prototype" W=1 warnings from the kernel test robot:

  arch/alpha/lib/csum_partial_copy.c:349:1: error: no previous prototype for 'csum_and_copy_from_user' [-Werror=missing-prototypes]
  349 | csum_and_copy_from_user(const void __user *src, void *dst, int len)
      | ^~~~~~~~~~~~~~~~~~~~~~~
  arch/alpha/lib/csum_partial_copy.c:358:1: error: no previous prototype for 'csum_partial_copy_nocheck' [-Werror=missing-prototypes]
  358 | csum_partial_copy_nocheck(const void *src, void *dst, int len)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20210425235749.19113-1-rdunlap@infradead.org
Fixes: 808b49da54e6 ("alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoalpha: eliminate old-style function definitions
Randy Dunlap [Fri, 7 May 2021 01:02:04 +0000 (18:02 -0700)]
alpha: eliminate old-style function definitions

'make ARCH=alpha W=1' reports a couple of old-style function
definitions with missing parameter list, so fix those.

  arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_base':
  arch/alpha/kernel/pc873xx.c:16:21: warning: old-style function definition [-Wold-style-definition]
   16 | unsigned int __init pc873xx_get_base()

  arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_model':
  arch/alpha/kernel/pc873xx.c:21:14: warning: old-style function definition [-Wold-style-definition]
   21 | char *__init pc873xx_get_model()

Link: https://lkml.kernel.org/r/20210421061312.30097-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agotcp: Specify cmsgbuf is user pointer for receive zerocopy.
Arjun Roy [Thu, 6 May 2021 22:35:30 +0000 (15:35 -0700)]
tcp: Specify cmsgbuf is user pointer for receive zerocopy.

A prior change (1f466e1f15cf) introduces separate handling for
->msg_control depending on whether the pointer is a kernel or user
pointer. However, while tcp receive zerocopy is using this field, it
is not properly annotating that the buffer in this case is a user
pointer. This can cause faults when the improper mechanism is used
within put_cmsg().

This patch simply annotates tcp receive zerocopy's use as explicitly
being a user pointer.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210506223530.2266456-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_mr: Update egress RIF list before route's action
Ido Schimmel [Thu, 6 May 2021 07:23:08 +0000 (10:23 +0300)]
mlxsw: spectrum_mr: Update egress RIF list before route's action

Each multicast route that is forwarding packets (as opposed to trapping
them) points to a list of egress router interfaces (RIFs) through which
packets are replicated.

A route's action can transition from trap to forward when a RIF is
created for one of the route's egress virtual interfaces (eVIF). When
this happens, the route's action is first updated and only later the
list of egress RIFs is committed to the device.

This results in the route pointing to an invalid list. In case the list
pointer is out of range (due to uninitialized memory), the device will
complain:

mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=5733bf490000905c,reg_id=300f(pefa),type=write,status=7(bad parameter))

Fix this by first committing the list of egress RIFs to the device and
only later update the route's action.

Note that a fix is not needed in the reverse function (i.e.,
mlxsw_sp_mr_route_evif_unresolve()), as there the route's action is
first updated and only later the RIF is removed from the list.

Cc: stable@vger.kernel.org
Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20210506072308.3834303-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: fix inter-EE IRQ register definitions
Alex Elder [Wed, 5 May 2021 22:36:36 +0000 (17:36 -0500)]
net: ipa: fix inter-EE IRQ register definitions

In gsi_irq_setup(), two registers are written with the intention of
disabling inter-EE channel and event IRQs.

But the wrong registers are used (and defined); the ones used are
read-only registers that indicate whether the interrupt condition is
present.

Define the mask registers instead of the status registers, and use
them to disable the inter-EE interrupt types.

Fixes: 46f748ccaf01 ("net: ipa: explicitly disallow inter-EE interrupts")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20210505223636.232527-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'linux-can-fixes-for-5.13-20210506' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 6 May 2021 23:24:31 +0000 (16:24 -0700)]
Merge tag 'linux-can-fixes-for-5.13-20210506' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-05-06

The first two patches target the mcp251xfd driver. Dan Carpenter's
patch fixes a NULL pointer dereference in the probe function's error
path. A patch by me adds the missing can_rx_offload_del() in error
path of the probe function.

Frieder Schrempf contributes a patch for the mcp251x driver, the patch
fixes the resume from sleep before interface was brought up.

The last patch is by me and fixes a race condition in the TX path of
the m_can driver for peripheral (SPI) based m_can cores.

* tag 'linux-can-fixes-for-5.13-20210506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
  can: mcp251x: fix resume from sleep before interface was brought up
  can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
  can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
====================

Link: https://lore.kernel.org/r/20210506074015.1300591-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoInput: xpad - add support for Amazon Game Controller
Matt Reynolds [Thu, 29 Apr 2021 22:29:37 +0000 (15:29 -0700)]
Input: xpad - add support for Amazon Game Controller

The Amazon Luna controller (product name "Amazon Game Controller") behaves
like an Xbox 360 controller when connected over USB.

Signed-off-by: Matt Reynolds <mattreynolds@chromium.org>
Reviewed-by: Harry Cutts <hcutts@chromium.org>
Link: https://lore.kernel.org/r/20210429103548.1.If5f9a44cb81e25b9350f7c6c0b3c88b4ecd81166@changeid
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 years agoInput: ili210x - add missing negation for touch indication on ili210x
Hansem Ro [Thu, 6 May 2021 20:27:10 +0000 (13:27 -0700)]
Input: ili210x - add missing negation for touch indication on ili210x

This adds the negation needed for proper finger detection on Ilitek
ili2107/ili210x. This fixes polling issues (on Amazon Kindle Fire)
caused by returning false for the cooresponding finger on the touchscreen.

Signed-off-by: Hansem Ro <hansemro@outlook.com>
Fixes: e3559442afd2a ("ili210x - rework the touchscreen sample processing")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 years agoMerge tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 6 May 2021 21:39:50 +0000 (14:39 -0700)]
Merge tag 's390-5.13-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - add support for system call stack randomization

 - handle stale PCI deconfiguration events

 - couple of defconfig updates

 - some fixes and cleanups

* tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
  s390/entry: add support for syscall stack randomization
  s390/configs: change CONFIG_VIRTIO_CONSOLE to "m"
  s390/cio: remove invalid condition on IO_SCH_UNREG
  s390/cpumf: remove call to perf_event_update_userpage
  s390/cpumf: move counter set size calculation to common place
  s390/cpumf: beautify if-then-else indentation
  s390/configs: enable CONFIG_PCI_IOV
  s390/pci: handle stale deconfiguration events
  s390/pci: rename zpci_configure_device()

3 years agoMerge tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Thu, 6 May 2021 21:22:58 +0000 (14:22 -0700)]
Merge tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio

Pull more VFIO updates from Alex Williamson:
 "A second small set of commits for this merge window, primarily to
  unbreak some deletions from our uAPI header.

   - Additional mdev sample driver cleanup (Dan Carpenter)

   - Doc fix (Alyssa Ross)

   - Unbreak uAPI from NVLink2 support removal (Alex Williamson)"

* tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio:
  docs: vfio: fix typo
  vfio/pci: Revert nvlink removal uAPI breakage
  vfio/mdev: remove unnecessary NULL check in mbochs_create()

3 years agoMerge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo...
Linus Torvalds [Thu, 6 May 2021 18:08:50 +0000 (11:08 -0700)]
Merge branch 'pcmcia-next' of git://git./linux/kernel/git/brodo/linux

Pull pcmcia updates from Dominik Brodowski:
 "A number of patches fixing W=1 kernel build warnings, and one patch
  removing an always-false NULL check"

* 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
  pcmcia: rsrc_nonstatic: Fix call-back function as reference formatting
  pcmcia: pcmcia_resource: Fix some kernel-doc formatting/disparities and demote others
  pcmcia: ds: Fix function name disparity in header
  pcmcia: pcmcia_cis: Demote non-conforming kernel-doc headers to standard kernel-doc
  pcmcia: cistpl: Demote non-conformant kernel-doc headers to standard comments
  pcmcia: rsrc_nonstatic: Demote kernel-doc abuses
  pcmcia: ds: Remove if with always false condition

3 years agoMerge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client
Linus Torvalds [Thu, 6 May 2021 17:27:02 +0000 (10:27 -0700)]
Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Notable items here are

   - a series to take advantage of David Howells' netfs helper library
     from Jeff

   - three new filesystem client metrics from Xiubo

   - ceph.dir.rsnaps vxattr from Yanhu

   - two auth-related fixes from myself, marked for stable.

  Interspersed is a smattering of assorted fixes and cleanups across the
  filesystem"

* tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits)
  libceph: allow addrvecs with a single NONE/blank address
  libceph: don't set global_id until we get an auth ticket
  libceph: bump CephXAuthenticate encoding version
  ceph: don't allow access to MDS-private inodes
  ceph: fix up some bare fetches of i_size
  ceph: convert some PAGE_SIZE invocations to thp_size()
  ceph: support getting ceph.dir.rsnaps vxattr
  ceph: drop pinned_page parameter from ceph_get_caps
  ceph: fix inode leak on getattr error in __fh_to_dentry
  ceph: only check pool permissions for regular files
  ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon
  ceph: avoid counting the same request twice or more
  ceph: rename the metric helpers
  ceph: fix kerneldoc copypasta over ceph_start_io_direct
  ceph: use attach/detach_page_private for tracking snap context
  ceph: don't use d_add in ceph_handle_snapdir
  ceph: don't clobber i_snap_caps on non-I_NEW inode
  ceph: fix fall-through warnings for Clang
  ceph: convert ceph_readpages to ceph_readahead
  ceph: convert ceph_write_begin to netfs_write_begin
  ...

3 years agoMerge tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 6 May 2021 17:06:39 +0000 (10:06 -0700)]
Merge tag 'ecryptfs-5.13-rc1-updates' of git://git./linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
 "Code cleanups and a bug fix

   - W=1 compiler warning cleanups

   - Mutex initialization simplification

   - Protect against NULL pointer exception during mount"

* tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: fix kernel panic with null dev_name
  ecryptfs: remove unused helpers
  ecryptfs: Fix typo in message
  eCryptfs: Use DEFINE_MUTEX() for mutex lock
  ecryptfs: keystore: Fix some kernel-doc issues and demote non-conformant headers
  ecryptfs: inode: Help out nearly-there header and demote non-conformant ones
  ecryptfs: mmap: Help out one function header and demote other abuses
  ecryptfs: crypto: Supply some missing param descriptions and demote abuses
  ecryptfs: miscdev: File headers are not good kernel-doc candidates
  ecryptfs: main: Demote a bunch of non-conformant kernel-doc headers
  ecryptfs: messaging: Add missing param descriptions and demote abuses
  ecryptfs: super: Fix formatting, naming and kernel-doc abuses
  ecryptfs: file: Demote kernel-doc abuses
  ecryptfs: kthread: Demote file header and provide description for 'cred'
  ecryptfs: dentry: File headers are not good candidates for kernel-doc
  ecryptfs: debug: Demote a couple of kernel-doc abuses
  ecryptfs: read_write: File headers do not make good candidates for kernel-doc
  ecryptfs: use DEFINE_MUTEX() for mutex lock
  eCryptfs: add a semicolon

3 years agoMerge tag 'trace-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Thu, 6 May 2021 17:03:38 +0000 (10:03 -0700)]
Merge tag 'trace-v5.13-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix probes written to the set_ftrace_filter file

  Now that there's a library that accesses the tracefs file system
  (libtracefs), the way the files are interacted with is slightly
  different than the command line. For instance, the write() system call
  is used directly instead of an echo. This exposes some old bugs.

  If a probe is written to "set_ftrace_filter" without any white space
  after it, it will be ignored. This is because the write expects that a
  string written to it that does not end with white spaces thinks there
  is more to come. But if the file is closed, the release function needs
  to finish it. The "set_ftrace_filter" release function handles the
  filter part of the "set_ftrace_filter" file, but did not handle the
  probe part"

* tag 'trace-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Handle commands when closing set_ftrace_filter file

3 years agoMerge tag 'acpi-5.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 6 May 2021 16:56:26 +0000 (09:56 -0700)]
Merge tag 'acpi-5.13-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert one recent commit that turned out to be problematic,
  address two issues in the ACPI "custom method" interface and update
  GPIO properties documentation.

  Specifics:

   - Revent recent commit related to the handling of ACPI power
     resources during initialization, because it turned out to cause
     problems to occur on some systems (Rafael Wysocki).

   - Fix potential use-after-free and potential memory leak in the ACPI
     "custom method" debugfs interface (Mark Langsdorf).

   - Update ACPI GPIO properties documentation to cover assumptions
     regarding GPIO polarity (Andy Shevchenko)"

* tag 'acpi-5.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: scan: Turn off unused power resources during initialization"
  ACPI: custom_method: fix a possible memory leak
  ACPI: custom_method: fix potential use-after-free issue
  Documentation: firmware-guide: gpio-properties: Add note to SPI CS case

3 years agoMerge tag 'devicetree-fixes-for-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 6 May 2021 16:53:40 +0000 (09:53 -0700)]
Merge tag 'devicetree-fixes-for-5.13-1' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Several Renesas binding fixes to fix warnings

 - Remove duplicate compatibles in 8250 binding

 - Remove orphaned Sigma Designs Tango bindings

 - Fix bcm2711-hdmi binding to use 'additionalProperties'

 - Fix idt,32434-pic warning for missing 'interrupts' property

 - Fix 'stored but not read' warnings in DT overlay code

* tag 'devicetree-fixes-for-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: net: renesas,etheravb: Fix optional second clock name
  dt-bindings: display: renesas,du: Add missing power-domains property
  dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
  dt-bindings: PCI: rcar-pci-host: Document missing R-Car H1 support
  of: overlay: Remove redundant assignment to ret
  dt-bindings: serial: 8250: Remove duplicated compatible strings
  dt-bindings: Remove unused Sigma Designs Tango bindings
  dt-bindings: bcm2711-hdmi: Fix broken schema
  dt-bindings: interrupt-controller: idt,32434-pic: Add missing interrupts property

3 years agoMerge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Thu, 6 May 2021 16:28:07 +0000 (09:28 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

 - Fix BSS size calculation for LLVM

 - Improve robustness of kernel entry around v7_invalidate_l1

 - Fix and update kprobes assembly

 - Correct breakpoint overflow handler check

 - Pause function graph tracer when suspending a CPU

 - Switch to generic syscallhdr.sh and syscalltbl.sh

 - Remove now unused set_kernel_text_r[wo] functions

 - Updates for ptdump (__init marking and using DEFINE_SHOW_ATTRIBUTE)

 - Fix for interrupted SMC (secure) calls

 - Remove Compaq Personal Server platform

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: footbridge: remove personal server platform
  ARM: 9075/1: kernel: Fix interrupted SMC calls
  ARM: 9074/1: ptdump: convert to DEFINE_SHOW_ATTRIBUTE
  ARM: 9073/1: ptdump: add __init section marker to three functions
  ARM: 9072/1: mm: remove set_kernel_text_r[ow]()
  ARM: 9067/1: syscalls: switch to generic syscallhdr.sh
  ARM: 9068/1: syscalls: switch to generic syscalltbl.sh
  ARM: 9066/1: ftrace: pause/unpause function graph tracer in cpu_suspend()
  ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
  ARM: 9062/1: kprobes: rewrite test-arm.c in UAL
  ARM: 9061/1: kprobes: fix UNPREDICTABLE warnings
  ARM: 9060/1: kexec: Remove unused kexec_reinit callback
  ARM: 9059/1: cache-v7: get rid of mini-stack
  ARM: 9058/1: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6
  ARM: 9057/1: cache-v7: add missing ISB after cache level selection
  ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld

3 years agoMerge tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 6 May 2021 16:24:18 +0000 (09:24 -0700)]
Merge tag 'riscv-for-linus-5.13-mw0' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the memtest= kernel command-line argument.

 - Support for building the kernel with FORTIFY_SOURCE.

 - Support for generic clockevent broadcasts.

 - Support for the buildtar build target.

 - Some build system cleanups to pass more LLVM-friendly arguments.

 - Support for kprobes.

 - A rearranged kernel memory map, the first part of supporting sv48
   systems.

 - Improvements to kexec, along with support for kdump and crash
   kernels.

 - An alternatives-based errata framework, along with support for
   handling a pair of errata that manifest on some SiFive designs
   (including the HiFive Unmatched).

 - Support for XIP.

 - A device tree for the Microchip PolarFire ICICLE SoC and associated
   dev board.

... along with a bunch of cleanups.  There are already a handful of fixes
on the list so there will likely be a part 2.

* tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits)
  RISC-V: Always define XIP_FIXUP
  riscv: Remove 32b kernel mapping from page table dump
  riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y
  RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
  RISC-V: Enable Microchip PolarFire ICICLE SoC
  RISC-V: Initial DTS for Microchip ICICLE board
  dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC
  RISC-V: Add Microchip PolarFire SoC kconfig option
  RISC-V: enable XIP
  RISC-V: Add crash kernel support
  RISC-V: Add kdump support
  RISC-V: Improve init_resources()
  RISC-V: Add kexec support
  RISC-V: Add EM_RISCV to kexec UAPI header
  riscv: vdso: fix and clean-up Makefile
  riscv/mm: Use BUG_ON instead of if condition followed by BUG.
  riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe
  riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU
  riscv: module: Create module allocations without exec permissions
  riscv: bpf: Avoid breaking W^X
  ...

3 years agoMerge tag 'hexagon-5.13-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain...
Linus Torvalds [Thu, 6 May 2021 15:40:38 +0000 (08:40 -0700)]
Merge tag 'hexagon-5.13-0' of git://git./linux/kernel/git/bcain/linux

Pull Hexagon updates from Brian Cain:
 "Hexagon architecture build fixes + builtins

  Small build fixes applied:

   - use -mlong-calls to build

   - extend jumps in futex_atomic_*

   - etc

  Also, for convenience and portability, the hexagon compiler builtin
  functions like memcpy etc have been added to the kernel -- following
  the idiom used by other architectures"

* tag 'hexagon-5.13-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
  Hexagon: add target builtins to kernel
  Hexagon: remove DEBUG from comet config
  Hexagon: change jumps to must-extend in futex_atomic_*
  Hexagon: fix build errors

3 years agoMerge tag 'docs-5.13-2' of git://git.lwn.net/linux
Linus Torvalds [Thu, 6 May 2021 15:33:54 +0000 (08:33 -0700)]
Merge tag 'docs-5.13-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A few late-arriving documentation fixes, including some oprofile
  cleanup, a kernel-doc fix, some regression-reporting updates, and the
  usual minor fixes"

* tag 'docs-5.13-2' of git://git.lwn.net/linux:
  Enlisted oprofile version line removed
  oprofiled version output line removed from the list
  Removed the oprofiled version option
  docs: reporting-issues.rst: CC subsystem and maintainers on regressions
  docs: correct URL to bios and kernel developer's guide
  docs/core-api: Consistent code style
  docs/zh_CN: Adjust order and content of zh_CN/index.rst
  Documentation: input: joydev file corrections
  docs: Fix typo in Documentation/x86/x86_64/5level-paging.rst
  kernel-doc: Add support for __deprecated

3 years agoblock: reexpand iov_iter after read/write
yangerkun [Thu, 1 Apr 2021 07:18:07 +0000 (15:18 +0800)]
block: reexpand iov_iter after read/write

We get a bug:

BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
lib/iov_iter.c:1139
Read of size 8 at addr ffff0000d3fb11f8 by task

CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
5.10.0-00843-g352c8610ccd2 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x110/0x164 lib/dump_stack.c:118
 print_address_description+0x78/0x5c8 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report+0x148/0x1e4 mm/kasan/report.c:562
 check_memory_region_inline mm/kasan/generic.c:183 [inline]
 __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
 io_read fs/io_uring.c:3421 [inline]
 io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
 io_submit_sqe fs/io_uring.c:6395 [inline]
 io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
 __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Allocated by task 12570:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
 __kmalloc+0x23c/0x334 mm/slub.c:3970
 kmalloc include/linux/slab.h:557 [inline]
 __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
 io_setup_async_rw fs/io_uring.c:3229 [inline]
 io_read fs/io_uring.c:3436 [inline]
 io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
 io_submit_sqe fs/io_uring.c:6395 [inline]
 io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
 __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Freed by task 12570:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track+0x38/0x6c mm/kasan/common.c:56
 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kfree+0x104/0x38c mm/slub.c:4124
 io_dismantle_req fs/io_uring.c:1855 [inline]
 __io_free_req+0x70/0x254 fs/io_uring.c:1867
 io_put_req_find_next fs/io_uring.c:2173 [inline]
 __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
 task_work_run+0xdc/0x128 kernel/task_work.c:151
 get_signal+0x6f8/0x980 kernel/signal.c:2562
 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
 work_pending+0xc/0x180

blkdev_read_iter can truncate iov_iter's count since the count + pos may
exceed the size of the blkdev. This will confuse io_read that we have
consume the iovec. And once we do the iov_iter_revert in io_read, we
will trigger the slab-out-of-bounds. Fix it by reexpand the count with
size has been truncated.

blkdev_write_iter can trigger the problem too.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Acked-by: Pavel Begunkov <asml.silencec@gmail.com>
Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branches 'acpi-pm' and 'acpi-docs'
Rafael J. Wysocki [Thu, 6 May 2021 15:21:42 +0000 (17:21 +0200)]
Merge branches 'acpi-pm' and 'acpi-docs'

* acpi-pm:
  Revert "ACPI: scan: Turn off unused power resources during initialization"

* acpi-docs:
  Documentation: firmware-guide: gpio-properties: Add note to SPI CS case

3 years agoarm64: kernel: Update the stale comment
Shaokun Zhang [Thu, 6 May 2021 05:54:22 +0000 (13:54 +0800)]
arm64: kernel: Update the stale comment

Commit af391b15f7b5 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm")
has used @index instead of @arg, but the comment is stale, update it.

Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/1620280462-21937-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP
Hui Wang [Tue, 4 May 2021 07:39:17 +0000 (15:39 +0800)]
ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP

Without this change, the DAC ctl's name could be changed only when
the machine has both Speaker and Headphone, but we met some machines
which only has Lineout and Headhpone, and the Lineout and Headphone
share the Audio Mixer0 and DAC0, the ctl's name is set to "Front".

On most of machines, the "Front" is used for Speaker only or Lineout
only, but on this machine it is shared by Lineout and Headphone,
This introduces an issue in the pipewire and pulseaudio, suppose users
want the Headphone to be on and the Speaker/Lineout to be off, they
could turn off the "Front", this works on most of the machines, but on
this machine, the "Front" couldn't be turned off otherwise the
headphone will be off too. Here we do some change to let the ctl's
name change to "Headphone+LO" on this machine, and pipewire and
pulseaudio already could handle "Headphone+LO" and "Speaker+LO".
(https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/747)

BugLink: http://bugs.launchpad.net/bugs/804178
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20210504073917.22406-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agocan: m_can: m_can_tx_work_queue(): fix tx_skb race condition
Marc Kleine-Budde [Wed, 5 May 2021 11:32:27 +0000 (13:32 +0200)]
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition

The m_can_start_xmit() function checks if the cdev->tx_skb is NULL and
returns with NETDEV_TX_BUSY in case tx_sbk is not NULL.

There is a race condition in the m_can_tx_work_queue(), where first
the skb is send to the driver and then the case tx_sbk is set to NULL.
A TX complete IRQ might come in between and wake the queue, which
results in tx_skb not being cleared yet.

Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework")
Tested-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251x: fix resume from sleep before interface was brought up
Frieder Schrempf [Wed, 5 May 2021 07:14:15 +0000 (09:14 +0200)]
can: mcp251x: fix resume from sleep before interface was brought up

Since 8ce8c0abcba3 the driver queues work via priv->restart_work when
resuming after suspend, even when the interface was not previously
enabled. This causes a null dereference error as the workqueue is only
allocated and initialized in mcp251x_open().

To fix this we move the workqueue init to mcp251x_can_probe() as there
is no reason to do it later and repeat it whenever mcp251x_open() is
called.

Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required")
Link: https://lore.kernel.org/r/17d5d714-b468-482f-f37a-482e3d6df84e@kontron.de
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[mkl: fix error handling in mcp251x_stop()]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
Marc Kleine-Budde [Sun, 2 May 2021 09:34:34 +0000 (11:34 +0200)]
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path

This patch adds the missing can_rx_offload_del(), that must be called
if mcp251xfd_register() fails.

Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Link: https://lore.kernel.org/r/20210504091838.1109047-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
Dan Carpenter [Mon, 3 May 2021 14:49:09 +0000 (17:49 +0300)]
can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe

When we converted this code to use dev_err_probe() we accidentally
removed a return. It means that if devm_clk_get() it will lead to an
Oops when we call clk_get_rate() on the next line.

Fixes: cf8ee6de2543 ("can: mcp251xfd: mcp251xfd_probe(): use dev_err_probe() to simplify error handling")
Link: https://lore.kernel.org/r/YJANZf13Qxd5Mhr1@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agox86/process: setup io_threads more like normal user space threads
Stefan Metzmacher [Wed, 5 May 2021 11:03:10 +0000 (13:03 +0200)]
x86/process: setup io_threads more like normal user space threads

As io_threads are fully set up USER threads it's clearer to separate the
code path from the KTHREAD logic.

The only remaining difference to user space threads is that io_threads
never return to user space again. Instead they loop within the given
worker function.

The fact that they never return to user space means they don't have an
user space thread stack. In order to indicate that to tools like gdb we
reset the stack and instruction pointers to 0.

This allows gdb attach to user space processes using io-uring, which like
means that they have io_threads, without printing worrying message like
this:

  warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386

  warning: Architecture rejected target-supplied description

The output will be something like this:

  (gdb) info threads
    Id   Target Id                  Frame
  * 1    LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    2    LWP 4864 "iou-mgr-4863"    0x0000000000000000 in ?? ()
    3    LWP 4865 "iou-wrk-4863"    0x0000000000000000 in ?? ()
  (gdb) thread 3
  [Switching to thread 3 (LWP 4865)]
  #0  0x0000000000000000 in ?? ()
  (gdb) bt
  #0  0x0000000000000000 in ?? ()
  Backtrace stopped: Cannot access memory at address 0x0

Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Link: https://lore.kernel.org/io-uring/044d0bad-6888-a211-e1d3-159a4aeed52d@polymtl.ca/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267
Signed-off-by: Stefan Metzmacher <metze@samba.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Andy Lutomirski <luto@kernel.org>
cc: linux-kernel@vger.kernel.org
cc: io-uring@vger.kernel.org
cc: x86@kernel.org
Link: https://lore.kernel.org/r/20210505110310.237537-1-metze@samba.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonetfilter: nftables: Fix a memleak from userdata error path in new objects
Pablo Neira Ayuso [Wed, 5 May 2021 21:06:43 +0000 (23:06 +0200)]
netfilter: nftables: Fix a memleak from userdata error path in new objects

Release object name if userdata allocation fails.

Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: remove BUG_ON() after skb_header_pointer()
Pablo Neira Ayuso [Wed, 5 May 2021 20:30:49 +0000 (22:30 +0200)]
netfilter: remove BUG_ON() after skb_header_pointer()

Several conntrack helpers and the TCP tracker assume that
skb_header_pointer() never fails based on upfront header validation.
Even if this should not ever happen, BUG_ON() is a too drastic measure,
remove them.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoMAINTAINERS: add io_uring tool to IO_URING
Lukas Bulwahn [Wed, 5 May 2021 05:37:28 +0000 (07:37 +0200)]
MAINTAINERS: add io_uring tool to IO_URING

The files in ./tools/io_uring/ are maintained by the IO_URING maintainers.
Reflect that fact in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20210505053728.3868-1-lukas.bulwahn@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers
Thadeu Lima de Souza Cascardo [Wed, 5 May 2021 12:47:06 +0000 (09:47 -0300)]
io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers

Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.

Truncate those lengths when doing io_add_buffers, so buffer addresses still
use the uncapped length.

Also, take the chance and change struct io_buffer len member to __u32, so
it matches struct io_provide_buffer len member.

This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Reported-by: Billy Jheng Bing-Jhong (@st424204)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 5 May 2021 20:50:15 +0000 (13:50 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "The remainder of the main mm/ queue.

  143 patches.

  Subsystems affected by this patch series (all mm): pagecache, hugetlb,
  userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
  kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
  kfence"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
  kfence: use power-efficient work queue to run delayed work
  kfence: maximize allocation wait timeout duration
  kfence: await for allocation using wait_event
  kfence: zero guard page after out-of-bounds access
  mm/process_vm_access.c: remove duplicate include
  mm/mempool: minor coding style tweaks
  mm/highmem.c: fix coding style issue
  btrfs: use memzero_page() instead of open coded kmap pattern
  iov_iter: lift memzero_page() to highmem.h
  mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
  mm/zswap.c: switch from strlcpy to strscpy
  arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
  acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
  mm,memory_hotplug: allocate memmap from the added memory range
  mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
  mm,memory_hotplug: relax fully spanned sections check
  drivers/base/memory: introduce memory_block_{online,offline}
  mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
  ...

3 years agoMerge tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Wed, 5 May 2021 20:44:19 +0000 (13:44 -0700)]
Merge tag 'nfsd-5.13-1' of git://git./linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
 "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13,
  including a fix to grant read delegations for files open for writing"

* tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix null pointer dereference in svc_rqst_free()
  SUNRPC: fix ternary sign expansion bug in tracing
  nfsd: Fix fall-through warnings for Clang
  nfsd: grant read delegations to clients holding writes
  nfsd: reshuffle some code
  nfsd: track filehandle aliasing in nfs4_files
  nfsd: hash nfs4_files by inode number
  nfsd: ensure new clients break delegations
  nfsd: removed unused argument in nfsd_startup_generic()
  nfsd: remove unused function
  svcrdma: Pass a useful error code to the send_err tracepoint
  svcrdma: Rename goto labels in svc_rdma_sendto()
  svcrdma: Don't leak send_ctxt on Send errors

3 years agoMerge tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Wed, 5 May 2021 20:37:07 +0000 (13:37 -0700)]
Merge tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "Ten CIFS/SMB3 changes - including two marked for stable - including
  some important multichannel fixes, as well as support for handle
  leases (deferred close) and shutdown support:

   - some important multichannel fixes

   - support for handle leases (deferred close)

   - shutdown support (which is also helpful since it enables multiple
     xfstests)

   - enable negotiating stronger encryption by default (GCM256)

   - improve wireshark debugging by allowing more options for root to
     dump decryption keys

  SambaXP and the SMB3 Plugfest test event are going on now so I am
  expecting more patches over the next few days due to extra testing
  (including more multichannel fixes)"

* tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6:
  fs/cifs: Fix resource leak
  Cifs: Fix kernel oops caused by deferred close for files.
  cifs: fix regression when mounting shares with prefix paths
  cifs: use echo_interval even when connection not ready.
  cifs: detect dead connections only when echoes are enabled.
  smb3.1.1: allow dumping keys for multiuser mounts
  smb3.1.1: allow dumping GCM256 keys to improve debugging of encrypted shares
  cifs: add shutdown support
  cifs: Deferred close for files
  smb3.1.1: enable negotiating stronger encryption by default

3 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Wed, 5 May 2021 20:31:39 +0000 (13:31 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "A bunch of new drivers including vdpa support for block and
  virtio-vdpa.

  Beginning of vq kick (aka doorbell) mapping support.

  Misc fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  virtio_pci_modern: correct sparse tags for notify
  virtio_pci_modern: __force cast the notify mapping
  vDPA/ifcvf: get_config_size should return dev specific config size
  vDPA/ifcvf: enable Intel C5000X-PL virtio-block for vDPA
  vDPA/ifcvf: deduce VIRTIO device ID when probe
  vdpa_sim_blk: add support for vdpa management tool
  vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID
  vdpa_sim_blk: implement ramdisk behaviour
  vdpa: add vdpa simulator for block device
  vhost/vdpa: Remove the restriction that only supports virtio-net devices
  vhost/vdpa: use get_config_size callback in vhost_vdpa_config_validate()
  vdpa: add get_config_size callback in vdpa_config_ops
  vdpa_sim: cleanup kiovs in vdpasim_free()
  vringh: add vringh_kiov_length() helper
  vringh: implement vringh_kiov_advance()
  vringh: explain more about cleaning riov and wiov
  vringh: reset kiov 'consumed' field in __vringh_iov()
  vringh: add 'iotlb_lock' to synchronize iotlb accesses
  vdpa_sim: use iova module to allocate IOVA addresses
  vDPA/ifcvf: deduce VIRTIO device ID from pdev ids
  ...

3 years agonetfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
Pablo Neira Ayuso [Wed, 5 May 2021 20:25:24 +0000 (22:25 +0200)]
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check

Do not assume that the tcph->doff field is correct when parsing for TCP
options, skb_header_pointer() might fail to fetch these bits.

Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoMerge tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Wed, 5 May 2021 20:24:11 +0000 (13:24 -0700)]
Merge tag 'pci-v5.13-changes' of git://git./linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Release OF node when pci_scan_device() fails (Dmitry Baryshkov)
   - Add pci_disable_parity() (Bjorn Helgaas)
   - Disable Mellanox Tavor parity reporting (Heiner Kallweit)
   - Disable N2100 r8169 parity reporting (Heiner Kallweit)
   - Fix RCiEP device to RCEC association (Qiuxu Zhuo)
   - Convert sysfs "config", "rom", "reset", "label", "index",
     "acpi_index" to static attributes to help fix races in device
     enumeration (Krzysztof Wilczyński)
   - Convert sysfs "vpd" to static attribute (Heiner Kallweit, Krzysztof
     Wilczyński)
   - Use sysfs_emit() in "show" functions (Krzysztof Wilczyński)
   - Remove unused alloc_pci_root_info() return value (Krzysztof
     Wilczyński)

  PCI device hotplug:
   - Fix acpiphp reference count leak (Feilong Lin)

  Power management:
   - Fix acpi_pci_set_power_state() debug message (Rafael J. Wysocki)
   - Fix runtime PM imbalance (Dinghao Liu)

  Virtualization:
   - Increase delay after FLR to work around Intel DC P4510 NVMe erratum
     (Raphael Norwitz)

  MSI:
   - Convert rcar, tegra, xilinx to MSI domains (Marc Zyngier)
   - For rcar, xilinx, use controller address as MSI doorbell (Marc
     Zyngier)
   - Remove unused hv msi_controller struct (Marc Zyngier)
   - Remove unused PCI core msi_controller support (Marc Zyngier)
   - Remove struct msi_controller altogether (Marc Zyngier)
   - Remove unused default_teardown_msi_irqs() (Marc Zyngier)
   - Let host bridges declare their reliance on MSI domains (Marc
     Zyngier)
   - Make pci_host_common_probe() declare its reliance on MSI domains
     (Marc Zyngier)
   - Advertise mediatek lack of built-in MSI handling (Thomas Gleixner)
   - Document ways of ending up with NO_MSI (Marc Zyngier)
   - Refactor HT advertising of NO_MSI flag (Marc Zyngier)

  VPD:
   - Remove obsolete Broadcom NIC VPD length-limiting quirk (Heiner
     Kallweit)
   - Remove sysfs VPD size checking dead code (Heiner Kallweit)
   - Convert VPF sysfs file to static attribute (Heiner Kallweit)
   - Remove unnecessary pci_set_vpd_size() (Heiner Kallweit)
   - Tone down "missing VPD" message (Heiner Kallweit)

  Endpoint framework:
   - Fix NULL pointer dereference when epc_features not implemented
     (Shradha Todi)
   - Add missing destroy_workqueue() in endpoint test (Yang Yingliang)

  Amazon Annapurna Labs PCIe controller driver:
   - Fix compile testing without CONFIG_PCI_ECAM (Arnd Bergmann)
   - Fix "no symbols" warnings when compile testing with
     CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)

  APM X-Gene PCIe controller driver:
   - Fix cfg resource mapping regression (Dejin Zheng)

  Broadcom iProc PCIe controller driver:
   - Return zero for success of iproc_msi_irq_domain_alloc() (Pali
     Rohár)

  Broadcom STB PCIe controller driver:
   - Add reset_control_rearm() stub for !CONFIG_RESET_CONTROLLER (Jim
     Quinlan)
   - Fix use of BCM7216 reset controller (Jim Quinlan)
   - Use reset/rearm for Broadcom STB pulse reset instead of
     deassert/assert (Jim Quinlan)
   - Fix brcm_pcie_probe() error return for unsupported revision (Wei
     Yongjun)

  Cavium ThunderX PCIe controller driver:
   - Fix compile testing (Arnd Bergmann)
   - Fix "no symbols" warnings when compile testing with
     CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann)

  Freescale Layerscape PCIe controller driver:
   - Fix ls_pcie_ep_probe() syntax error (comma for semicolon)
     (Krzysztof Wilczyński)
   - Remove layerscape-gen4 dependencies on OF and ARM64, add dependency
     on ARCH_LAYERSCAPE (Geert Uytterhoeven)

  HiSilicon HIP PCIe controller driver:
   - Remove obsolete HiSilicon PCIe DT description (Dongdong Liu)

  Intel Gateway PCIe controller driver:
   - Remove unused pcie_app_rd() (Jiapeng Chong)

  Intel VMD host bridge driver:
   - Program IRTE with Requester ID of VMD endpoint, not child device
     (Jon Derrick)
   - Disable VMD MSI-X remapping when possible so children can use more
     MSI-X vectors (Jon Derrick)

  MediaTek PCIe controller driver:
   - Configure FC and FTS for functions other than 0 (Ryder Lee)
   - Add YAML schema for MediaTek (Jianjun Wang)
   - Export pci_pio_to_address() for module use (Jianjun Wang)
   - Add MediaTek MT8192 PCIe controller driver (Jianjun Wang)
   - Add MediaTek MT8192 INTx support (Jianjun Wang)
   - Add MediaTek MT8192 MSI support (Jianjun Wang)
   - Add MediaTek MT8192 system power management support (Jianjun Wang)
   - Add missing MODULE_DEVICE_TABLE (Qiheng Lin)

  Microchip PolarFlare PCIe controller driver:
   - Make several symbols static (Wei Yongjun)

  NVIDIA Tegra PCIe controller driver:
   - Add MCFG quirks for Tegra194 ECAM errata (Vidya Sagar)
   - Make several symbols const (Rikard Falkeborn)
   - Fix Kconfig host/endpoint typo (Wesley Sheng)

  SiFive FU740 PCIe controller driver:
   - Add pcie_aux clock to prci driver (Greentime Hu)
   - Use reset-simple in prci driver for PCIe (Greentime Hu)
   - Add SiFive FU740 PCIe host controller driver and DT binding (Paul
     Walmsley, Greentime Hu)

  Synopsys DesignWare PCIe controller driver:
   - Move MSI Receiver init to dw_pcie_host_init() so it is
     re-initialized along with the RC in resume (Jisheng Zhang)
   - Move iATU detection earlier to fix regression (Hou Zhiqiang)

  TI J721E PCIe driver:
   - Add DT binding and TI j721e support for refclk to PCIe connector
     (Kishon Vijay Abraham I)
   - Add host mode and endpoint mode DT bindings for TI AM64 SoC (Kishon
     Vijay Abraham I)

  TI Keystone PCIe controller driver:
   - Use generic config accessors for TI AM65x (K3) to fix regression
     (Kishon Vijay Abraham I)

  Xilinx NWL PCIe controller driver:
   - Add support for coherent PCIe DMA traffic using CCI (Bharat Kumar
     Gogada)
   - Add optional "dma-coherent" DT property (Bharat Kumar Gogada)

  Miscellaneous:
   - Fix kernel-doc warnings (Krzysztof Wilczyński)
   - Remove unused MicroGate SyncLink device IDs (Jiri Slaby)
   - Remove redundant dev_err() for devm_ioremap_resource() failure
     (Chen Hui)
   - Remove redundant initialization (Colin Ian King)
   - Drop redundant dev_err() for platform_get_irq() errors (Krzysztof
     Wilczyński)"

* tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (98 commits)
  riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC
  PCI: fu740: Add SiFive FU740 PCIe host controller driver
  dt-bindings: PCI: Add SiFive FU740 PCIe host controller
  MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver
  clk: sifive: Use reset-simple in prci driver for PCIe driver
  clk: sifive: Add pcie_aux clock in prci driver for PCIe driver
  PCI: brcmstb: Use reset/rearm instead of deassert/assert
  ata: ahci_brcm: Fix use of BCM7216 reset controller
  reset: add missing empty function reset_control_rearm()
  PCI: Allow VPD access for QLogic ISP2722
  PCI/VPD: Add helper pci_get_func0_dev()
  PCI/VPD: Remove pci_vpd_find_tag() SRDT handling
  PCI/VPD: Remove pci_vpd_find_tag() 'offset' argument
  PCI/VPD: Change pci_vpd_init() return type to void
  PCI/VPD: Make missing VPD message less alarming
  PCI/VPD: Remove pci_set_vpd_size()
  x86/PCI: Remove unused alloc_pci_root_info() return value
  MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer
  PCI: mediatek-gen3: Add system PM support
  PCI: mediatek-gen3: Add MSI support
  ...

3 years agoMerge tag 'pwm/for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Wed, 5 May 2021 19:53:16 +0000 (12:53 -0700)]
Merge tag 'pwm/for-5.13-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "This adds support for the PWM controller found on Toshiba Visconti
  SoCs and converts a couple of drivers to the atomic API.

  There's also a bunch of cleanups and minor fixes across the board"

* tag 'pwm/for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (35 commits)
  pwm: Reword docs about pwm_apply_state()
  pwm: atmel: Improve duty cycle calculation in .apply()
  pwm: atmel: Fix duty cycle calculation in .get_state()
  pwm: visconti: Add Toshiba Visconti SoC PWM support
  dt-bindings: pwm: Add bindings for Toshiba Visconti PWM Controller
  arm64: dts: rockchip: Remove clock-names from PWM nodes
  ARM: dts: rockchip: Remove clock-names from PWM nodes
  dt-bindings: pwm: rockchip: Add more compatible strings
  dt-bindings: pwm: Convert pwm-rockchip.txt to YAML
  pwm: mediatek: Remove unused function
  pwm: pca9685: Improve runtime PM behavior
  pwm: pca9685: Support hardware readout
  pwm: pca9685: Switch to atomic API
  pwm: lpss: Don't modify HW state in .remove callback
  pwm: sti: Free resources only after pwmchip_remove()
  pwm: sti: Don't modify HW state in .remove callback
  pwm: lpc3200: Don't modify HW state in .remove callback
  pwm: lpc18xx-sct: Free resources only after pwmchip_remove()
  pwm: bcm-kona: Don't modify HW state in .remove callback
  pwm: bcm2835: Free resources only after pwmchip_remove()
  ...

3 years agosmc: disallow TCP_ULP in smc_setsockopt()
Cong Wang [Wed, 5 May 2021 19:40:48 +0000 (12:40 -0700)]
smc: disallow TCP_ULP in smc_setsockopt()

syzbot is able to setup kTLS on an SMC socket which coincidentally
uses sk_user_data too. Later, kTLS treats it as psock so triggers a
refcnt warning. The root cause is that smc_setsockopt() simply calls
TCP setsockopt() which includes TCP_ULP. I do not think it makes
sense to setup kTLS on top of SMC sockets, so we should just disallow
this setup.

It is hard to find a commit to blame, but we can apply this patch
since the beginning of TCP_ULP.

Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fix nla_strcmp to handle more then one trailing null character
Maciej Żenczykowski [Wed, 5 May 2021 16:58:31 +0000 (09:58 -0700)]
net: fix nla_strcmp to handle more then one trailing null character

Android userspace has been using TCA_KIND with a char[IFNAMESIZ]
many-null-terminated buffer containing the string 'bpf'.

This works on 4.19 and ceases to work on 5.10.

I'm not entirely sure what fixes tag to use, but I think the issue
was likely introduced in the below mentioned 5.4 commit.

Reported-by: Nucca Chen <nuccachen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: 62794fc4fbf5 ("net_sched: add max len check for TCA_KIND")
Change-Id: I66dc281f165a2858fc29a44869a270a2d698a82b
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/therma...
Linus Torvalds [Wed, 5 May 2021 19:46:48 +0000 (12:46 -0700)]
Merge tag 'thermal-v5.13-rc1' of git://git./linux/kernel/git/thermal/linux

Pull thermal updates from Daniel Lezcano:

 - Remove duplicate error message for the amlogic driver (Tang Bin)

 - Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury)

 - Add the missing fifth node on the rcar_gen3 sensor (Niklas Söderlund)

 - Remove duplicate include in ti-bandgap (Zhang Yunkai)

 - Assign error code in the error path in the function
   thermal_of_populate_bind_params() (Jia-Ju Bai)

 - Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian
   King)

 - Use the device name instead of auto-numbering for a better
   identification of the cooling device (Daniel Lezcano)

 - Improve a bit the division accuracy in the power allocator governor
   (Jeson Gao)

 - Enable the missing third sensor on msm8976 (Konrad Dybcio)

 - Add QCom tsens driver co-maintainer (Thara Gopinath)

 - Fix memory leak and use after free errors in the core code (Daniel
   Lezcano)

 - Add the MDM9607 compatible bindings (Konrad Dybcio)

 - Fix trivial spello in the copyright name for Hisilicon (Hao Fang)

 - Fix negative index array access when converting the frequency to
   power in the energy model (Brian-sy Yang)

 - Add support for Gen2 new PMIC support for Qcom SPMI (David Collins)

 - Update maintainer file for CPU cooling device section (Lukasz Luba)

 - Fix missing put_device on error in the Qcom tsens driver (Guangqing
   Zhu)

 - Add compatible DT binding for sm8350 (Robert Foss)

 - Add support for the MDM9607's tsens driver (Konrad Dybcio)

 - Remove duplicate error messages in thermal_mmio and the bcm2835
   driver (Ruiqi Gong)

 - Add the Thermal Temperature Cooling driver (Zhang Rui)

 - Remove duplicate error messages in the Hisilicon sensor driver (Ye
   Bin)

 - Use the devm_platform_ioremap_resource_byname() function instead of a
   couple of corresponding calls (dingsenjie)

 - Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei)

 - Add missing property in the DT thermal sensor binding (Rafał Miłecki)

 - Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe)

 - Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki)

 - Replace the thermal_notify_framework() call by a call to the
   thermal_zone_device_update() function. Remove the function as well as
   the corresponding documentation (Thara Gopinath)

 - Add support for the ipq8064-tsens sensor along with a set of cleanups
   and code preparation (Ansuel Smith)

 - Add a lockless __thermal_cdev_update() function to improve the
   locking scheme in the core code and governors (Lukasz Luba)

 - Fix multiple cooling device notification changes (Lukasz Luba)

 - Remove unneeded variable initialization (Colin Ian King)

* tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (55 commits)
  thermal/drivers/mtk_thermal: Remove redundant initializations of several variables
  thermal/core/power allocator: Use the lockless __thermal_cdev_update() function
  thermal/core/fair share: Use the lockless __thermal_cdev_update() function
  thermal/core/fair share: Lock the thermal zone while looping over instances
  thermal/core/power_allocator: Update once cooling devices when temp is low
  thermal/core/power_allocator: Maintain the device statistics from going stale
  thermal/core: Create a helper __thermal_cdev_update() without a lock
  dt-bindings: thermal: tsens: Document ipq8064 bindings
  thermal/drivers/tsens: Add support for ipq8064-tsens
  thermal/drivers/tsens: Drop unused define for msm8960
  thermal/drivers/tsens: Replace custom 8960 apis with generic apis
  thermal/drivers/tsens: Fix bug in sensor enable for msm8960
  thermal/drivers/tsens: Use init_common for msm8960
  thermal/drivers/tsens: Add VER_0 tsens version
  thermal/drivers/tsens: Convert msm8960 to reg_field
  thermal/drivers/tsens: Don't hardcode sensor slope
  Documentation: driver-api: thermal: Remove thermal_notify_framework from documentation
  thermal/core: Remove thermal_notify_framework
  iwlwifi: mvm: tt: Replace thermal_notify_framework
  dt-bindings: thermal: brcm,ns-thermal: Convert to the json-schema
  ...

3 years agonet:CXGB4: fix leak if sk_buff is not used
Íñigo Huguet [Wed, 5 May 2021 12:54:50 +0000 (14:54 +0200)]
net:CXGB4: fix leak if sk_buff is not used

An sk_buff is allocated to send a flow control message, but it's not
sent in all cases: in case the state is not appropiate to send it or if
it can't be enqueued.

In the first of these 2 cases, the sk_buff was discarded but not freed,
producing a memory leak.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoFix spelling error from "eleminate" to "eliminate"
Sean Gloumeau [Wed, 5 May 2021 04:15:39 +0000 (00:15 -0400)]
Fix spelling error from "eleminate" to "eliminate"

Spelling error "eleminate" amended to "eliminate".

Signed-off-by: Sean Gloumeau <sajgloumeau@gmail.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethtool: fix missing NLM_F_MULTI flag when dumping
Fernando Fernandez Mancera [Tue, 4 May 2021 22:47:14 +0000 (00:47 +0200)]
ethtool: fix missing NLM_F_MULTI flag when dumping

When dumping the ethtool information from all the interfaces, the
netlink reply should contain the NLM_F_MULTI flag. This flag allows
userspace tools to identify that multiple messages are expected.

Link: https://bugzilla.redhat.com/1953847
Fixes: 365f9ae4ee36 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'gpio-updates-for-v5.13-v2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 5 May 2021 19:39:29 +0000 (12:39 -0700)]
Merge tag 'gpio-updates-for-v5.13-v2' of git://git./linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:

 - new driver for the Realtek Otto GPIO controller

 - ACPI support for gpio-mpc8xxx

 - edge event support for gpio-sch (+ Kconfig fixes)

 - Kconfig improvements in gpio-ich

 - fixes to older issues in gpio-mockup

 - ACPI quirk for ignoring EC wakeups on Dell Venue 10 Pro 5055

 - improve the GPIO aggregator code by using more generic interfaces
   instead of reimplementing them in the driver

 - convert the DT bindings for gpio-74x164 to yaml

 - documentation improvements

 - a slew of other minor fixes and improvements to GPIO drivers

* tag 'gpio-updates-for-v5.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (34 commits)
  dt-bindings: gpio: add YAML description for rockchip,gpio-bank
  gpio: mxs: remove useless function
  dt-bindings: gpio: fairchild,74hc595: Convert to json-schema
  gpio: it87: remove unused code
  gpio: 104-dio-48e: Fix coding style issues
  gpio: mpc8xxx: Add ACPI support
  gpio: ich: Switch to be dependent on LPC_ICH
  gpio: sch: Drop MFD_CORE selection
  gpio: sch: depends on LPC_SCH
  gpiolib: acpi: Add quirk to ignore EC wakeups on Dell Venue 10 Pro 5055
  gpio: sch: Hook into ACPI GPE handler to catch GPIO edge events
  gpio: sch: Add edge event support
  gpio: aggregator: Replace custom get_arg() with a generic next_arg()
  lib/cmdline: Export next_arg() for being used in modules
  gpio: omap: Use device_get_match_data() helper
  gpio: Add Realtek Otto GPIO support
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  docs: kernel-parameters: Add gpio_mockup_named_lines
  docs: kernel-parameters: Move gpio-mockup for alphabetic order
  lib: bitmap: provide devm_bitmap_alloc() and devm_bitmap_zalloc()
  ...

3 years agoMerge tag 'char-misc-5.13-rc1-round2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 5 May 2021 19:29:37 +0000 (12:29 -0700)]
Merge tag 'char-misc-5.13-rc1-round2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are two char/misc fixes for 5.13-rc1 to resolve reported issues.

  The first is a bugfix for the nitro_enclaves driver that fixed some
  important problems. The second was a dyndbg bugfix that resolved some
  reported problems in dynamic debugging control.

  Both have been in linux-next for a while with no reported issues"

* tag 'char-misc-5.13-rc1-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  dyndbg: fix parsing file query without a line-range suffix
  nitro_enclaves: Fix stale file descriptors on failed usercopy

3 years agoMerge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Linus Torvalds [Wed, 5 May 2021 19:24:29 +0000 (12:24 -0700)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
 "Bug fixes and a smattering of features"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (21 commits)
  tools/power turbostat: version 2021.05.04
  tools/power turbostat: Support "turbostat --hide idle"
  tools/power turbostat: elevate priority of interval mode
  tools/power turbostat: formatting
  tools/power turbostat: rename tcc variables
  tools/power turbostat: add TCC Offset support
  tools/power turbostat: save original CPU model
  tools/power turbostat: Fix Core C6 residency on Atom CPUs
  tools/power turbostat: Print the C-state Pre-wake settings
  tools/power turbostat: Enable tsc_tweak for Elkhart Lake and Jasper Lake
  tools/power turbostat: unmark non-kernel-doc comment
  tools/power/turbostat: Remove Package C6 Retention on Ice Lake Server
  tools/power turbostat: Fix offset overflow issue in index converting
  tools/power/turbostat: Fix turbostat for AMD Zen CPUs
  tools/power turbostat: update version number
  tools/power turbostat: Fix DRAM Energy Unit on SKX
  Revert "tools/power turbostat: adjust for temperature offset"
  tools/power turbostat: Support Ice Lake D
  tools/power turbostat: Support Alder Lake Mobile
  tools/power turbostat: print microcode patch level
  ...

3 years agoMerge tag 'ktest-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 5 May 2021 19:15:20 +0000 (12:15 -0700)]
Merge tag 'ktest-v5.13' of git://git./linux/kernel/git/rostedt/linux-ktest

Pull ktest updates from Steven Rostedt:

 - Added a KTEST section in the MAINTAINERS file

 - Included John Hawley as a co-maintainer

 - Add an example config that would work with VMware workstation guests

 - Cleanups to the code

* tag 'ktest-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest: Add KTEST section to MAINTAINERS file
  ktest: Re-arrange the code blocks for better discoverability
  ktest: Further consistency cleanups
  ktest: Fixing indentation to match expected pattern
  ktest: Adding editor hints to improve consistency
  ktest: Add example config for using VMware VMs
  ktest: Minor cleanup with uninitialized variable $build_options

3 years agoMerge tag 'safesetid-5.13' of git://github.com/micah-morton/linux
Linus Torvalds [Wed, 5 May 2021 19:08:06 +0000 (12:08 -0700)]
Merge tag 'safesetid-5.13' of git://github.com/micah-morton/linux

Pull SafeSetID update from Micah Morton:
 "Simple code cleanup

  This just has a single three-line code cleanup to eliminate some
  unnecessary 'break' statements"

* tag 'safesetid-5.13' of git://github.com/micah-morton/linux:
  LSM: SafeSetID: Fix code specification by scripts/checkpatch.pl

3 years agokfence: use power-efficient work queue to run delayed work
Marco Elver [Wed, 5 May 2021 01:40:27 +0000 (18:40 -0700)]
kfence: use power-efficient work queue to run delayed work

Use the power-efficient work queue, to avoid the pathological case where
we keep pinning ourselves on the same possibly idle CPU on systems that
want to be power-efficient (https://lwn.net/Articles/731052/).

Link: https://lkml.kernel.org/r/20210421105132.3965998-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokfence: maximize allocation wait timeout duration
Marco Elver [Wed, 5 May 2021 01:40:24 +0000 (18:40 -0700)]
kfence: maximize allocation wait timeout duration

The allocation wait timeout was initially added because of warnings due to
CONFIG_DETECT_HUNG_TASK=y [1].  While the 1 sec timeout is sufficient to
resolve the warnings (given the hung task timeout must be 1 sec or larger)
it may cause unnecessary wake-ups if the system is idle:

  https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com

Fix it by computing the timeout duration in terms of the current
sysctl_hung_task_timeout_secs value.

Link: https://lkml.kernel.org/r/20210421105132.3965998-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokfence: await for allocation using wait_event
Marco Elver [Wed, 5 May 2021 01:40:21 +0000 (18:40 -0700)]
kfence: await for allocation using wait_event

Patch series "kfence: optimize timer scheduling", v2.

We have observed that mostly-idle systems with KFENCE enabled wake up
otherwise idle CPUs, preventing such to enter a lower power state.
Debugging revealed that KFENCE spends too much active time in
toggle_allocation_gate().

While the first version of KFENCE was using all the right bits to be
scheduling optimal, and thus power efficient, by simply using wait_event()
+ wake_up(), that code was unfortunately removed.

As KFENCE was exposed to various different configs and tests, the
scheduling optimal code slowly disappeared.  First because of hung task
warnings, and finally because of deadlocks when an allocation is made by
timer code with debug objects enabled.  Clearly, the "fixes" were not too
friendly for devices that want to be power efficient.

Therefore, let's try a little harder to fix the hung task and deadlock
problems that we have with wait_event() + wake_up(), while remaining as
scheduling friendly and power efficient as possible.

Crucially, we need to defer the wake_up() to an irq_work, avoiding any
potential for deadlock.

The result with this series is that on the devices where we observed a
power regression, power usage returns back to baseline levels.

This patch (of 3):

On mostly-idle systems, we have observed that toggle_allocation_gate() is
a cause of frequent wake-ups, preventing an otherwise idle CPU to go into
a lower power state.

A late change in KFENCE's development, due to a potential deadlock [1],
required changing the scheduling-friendly wait_event_timeout() and
wake_up() to an open-coded wait-loop using schedule_timeout().  [1]
https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com

To avoid unnecessary wake-ups, switch to using wait_event_timeout().

Unfortunately, we still cannot use a version with direct wake_up() in
__kfence_alloc() due to the same potential for deadlock as in [1].
Instead, add a level of indirection via an irq_work that is scheduled if
we determine that the kfence_timer requires a wake_up().

Link: https://lkml.kernel.org/r/20210421105132.3965998-1-elver@google.com
Link: https://lkml.kernel.org/r/20210421105132.3965998-2-elver@google.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokfence: zero guard page after out-of-bounds access
Marco Elver [Wed, 5 May 2021 01:40:18 +0000 (18:40 -0700)]
kfence: zero guard page after out-of-bounds access

After an out-of-bounds accesses, zero the guard page before re-protecting
in kfence_guarded_free().  On one hand this helps make the failure mode of
subsequent out-of-bounds accesses more deterministic, but could also
prevent certain information leaks.

Link: https://lkml.kernel.org/r/20210312121653.348518-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/process_vm_access.c: remove duplicate include
Zhang Yunkai [Wed, 5 May 2021 01:40:15 +0000 (18:40 -0700)]
mm/process_vm_access.c: remove duplicate include

'linux/compat.h' included in 'process_vm_access.c' is duplicated.

Link: https://lkml.kernel.org/r/20210306132122.220431-1-zhang.yunkai@zte.com.cn
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/mempool: minor coding style tweaks
Zhiyuan Dai [Wed, 5 May 2021 01:40:12 +0000 (18:40 -0700)]
mm/mempool: minor coding style tweaks

Various coding style tweaks to various files under mm/

[daizhiyuan@phytium.com.cn: mm/swapfile: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614223624-16055-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/sparse: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614227288-19363-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmscan: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614227649-19853-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/compaction: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228218-20770-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/oom_kill: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228360-21168-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/shmem: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228504-21491-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/page_alloc: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228613-21754-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/filemap: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1614228936-22337-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mlock: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613956588-2453-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/frontswap: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613962668-15045-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/vmalloc: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613963379-15988-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/memory_hotplug: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613971784-24878-1-git-send-email-daizhiyuan@phytium.com.cn
[daizhiyuan@phytium.com.cn: mm/mempolicy: minor coding style tweaks]
Link: https://lkml.kernel.org/r/1613972228-25501-1-git-send-email-daizhiyuan@phytium.com.cn
Link: https://lkml.kernel.org/r/1614222374-13805-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/highmem.c: fix coding style issue
songqiang [Wed, 5 May 2021 01:40:09 +0000 (18:40 -0700)]
mm/highmem.c: fix coding style issue

Delete/add some blank lines and some blank spaces

Link: https://lkml.kernel.org/r/20210311095015.14277-1-songqiang@uniontech.com
Signed-off-by: songqiang <songqiang@uniontech.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobtrfs: use memzero_page() instead of open coded kmap pattern
Ira Weiny [Wed, 5 May 2021 01:40:07 +0000 (18:40 -0700)]
btrfs: use memzero_page() instead of open coded kmap pattern

There are many places where kmap/memset/kunmap patterns occur.

Use the newly lifted memzero_page() to eliminate direct uses of kmap and
leverage the new core functions use of kmap_local_page().

The development of this patch was aided by the following coccinelle
script:

// <smpl>
// SPDX-License-Identifier: GPL-2.0-only
// Find kmap/memset/kunmap pattern and replace with memset*page calls
//
// NOTE: Offsets and other expressions may be more complex than what the script
// will automatically generate.  Therefore a catchall rule is provided to find
// the pattern which then must be evaluated by hand.
//
// Confidence: Low
// Copyright: (C) 2021 Intel Corporation
// URL: http://coccinelle.lip6.fr/
// Comments:
// Options:

//
// Then the memset pattern
//
@ memset_rule1 @
expression page, V, L, Off;
identifier ptr;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
-memset(ptr, 0, L);
+memzero_page(page, 0, L);
|
-memset(ptr + Off, 0, L);
+memzero_page(page, Off, L);
|
-memset(ptr, V, L);
+memset_page(page, V, 0, L);
|
-memset(ptr + Off, V, L);
+memset_page(page, V, Off, L);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memset_rule1
@
identifier memset_rule1.ptr;
type VP, VP1;
@@

-VP ptr;
... when != ptr;
? VP1 ptr;

//
// Catch all
//
@ memset_rule2 @
expression page;
identifier ptr;
expression GenTo, GenSize, GenValue;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
//
// Some call sites have complex expressions within the memset/memcpy
// The follow are catch alls which need to be evaluated by hand.
//
-memset(GenTo, 0, GenSize);
+memzero_pageExtra(page, GenTo, GenSize);
|
-memset(GenTo, GenValue, GenSize);
+memset_pageExtra(page, GenValue, GenTo, GenSize);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memset_rule2
@
identifier memset_rule2.ptr;
type VP, VP1;
@@

-VP ptr;
... when != ptr;
? VP1 ptr;

// </smpl>

Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoiov_iter: lift memzero_page() to highmem.h
Ira Weiny [Wed, 5 May 2021 01:40:03 +0000 (18:40 -0700)]
iov_iter: lift memzero_page() to highmem.h

Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()".

Lifting memzero_user(), convert it to kmap_local_page() and then use it
in btrfs.

This patch (of 3):

memzero_page() can replace the kmap/memset/kunmap pattern in other
places in the code.  While zero_user() has the same interface it is not
the same call and its use should be limited and some of those calls may
be better converted from zero_user() to memzero_page().[1] But that is
not addressed in this series.

Lift memzero_page() to highmem.

[1] https://lore.kernel.org/lkml/CAHk-=wijdojzo56FzYqE5TOYw2Vws7ik3LEMGj9SPQaJJ+Z73Q@mail.gmail.com/

Link: https://lkml.kernel.org/r/20210309212137.2610186-1-ira.weiny@intel.com
Link: https://lkml.kernel.org/r/20210309212137.2610186-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
zhouchuangao [Wed, 5 May 2021 01:40:00 +0000 (18:40 -0700)]
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.

It can be optimized at compile time.

Link: https://lkml.kernel.org/r/1616727798-9110-1-git-send-email-zhouchuangao@vivo.com
Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/zswap.c: switch from strlcpy to strscpy
Zhiyuan Dai [Wed, 5 May 2021 01:39:57 +0000 (18:39 -0700)]
mm/zswap.c: switch from strlcpy to strscpy

strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value).  strscpy is relatively better as
it also avoids scanning the whole source string.

Link: https://lkml.kernel.org/r/1614227981-20367-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoarm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
Oscar Salvador [Wed, 5 May 2021 01:39:54 +0000 (18:39 -0700)]
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

Enable arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature.

Link: https://lkml.kernel.org/r/20210421102701.25051-9-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agox86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
Oscar Salvador [Wed, 5 May 2021 01:39:51 +0000 (18:39 -0700)]
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.

Link: https://lkml.kernel.org/r/20210421102701.25051-8-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm,memory_hotplug: add kernel boot option to enable memmap_on_memory
Oscar Salvador [Wed, 5 May 2021 01:39:48 +0000 (18:39 -0700)]
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory

Self stored memmap leads to a sparse memory situation which is
unsuitable for workloads that requires large contiguous memory chunks,
so make this an opt-in which needs to be explicitly enabled.

To control this, let memory_hotplug have its own memory space, as
suggested by David, so we can add memory_hotplug.memmap_on_memory
parameter.

Link: https://lkml.kernel.org/r/20210421102701.25051-7-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoacpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
Oscar Salvador [Wed, 5 May 2021 01:39:45 +0000 (18:39 -0700)]
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported

Let the caller check whether it can pass MHP_MEMMAP_ON_MEMORY by
checking mhp_supports_memmap_on_memory().  MHP_MEMMAP_ON_MEMORY can only
be set in case ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE is enabled, the
architecture supports altmap, and the range to be added spans a single
memory block.

Link: https://lkml.kernel.org/r/20210421102701.25051-6-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm,memory_hotplug: allocate memmap from the added memory range
Oscar Salvador [Wed, 5 May 2021 01:39:42 +0000 (18:39 -0700)]
mm,memory_hotplug: allocate memmap from the added memory range

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section.  Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) an existing memory is consumed for that purpose
    (eg: ~2MB per 128MB memory section on x86_64)
    This can even lead to extreme cases where system goes OOM because
    the physically hotplugged memory depletes the available memory before
    it is onlined.
 b) if the whole node is movable then we have off-node struct pages
    which has performance drawbacks.
 c) It might be there are no PMD_ALIGNED chunks so memmap array gets
    populated with base pages.

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemap page tables can map arbitrary memory.  That means that we can
reserve a part of the physically hotadded memory to back vmemmap page
tables.  This implementation uses the beginning of the hotplugged memory
for that purpose.

There are some non-obviously things to consider though.

Vmemmap pages are allocated/freed during the memory hotplug events
(add_memory_resource(), try_remove_memory()) when the memory is
added/removed.  This means that the reserved physical range is not
online although it is used.  The most obvious side effect is that
pfn_to_online_page() returns NULL for those pfns.  The current design
expects that this should be OK as the hotplugged memory is considered a
garbage until it is onlined.  For example hibernation wouldn't save the
content of those vmmemmaps into the image so it wouldn't be restored on
resume but this should be OK as there no real content to recover anyway
while metadata is reachable from other data structures (e.g.  vmemmap
page tables).

The reserved space is therefore (de)initialized during the {on,off}line
events (mhp_{de}init_memmap_on_memory).  That is done by extracting page
allocator independent initialization from the regular onlining path.
The primary reason to handle the reserved space outside of
{on,off}line_pages is to make each initialization specific to the
purpose rather than special case them in a single function.

As per above, the functions that are introduced are:

 - mhp_init_memmap_on_memory:
   Initializes vmemmap pages by calling move_pfn_range_to_zone(), calls
   kasan_add_zero_shadow(), and onlines as many sections as vmemmap pages
   fully span.

 - mhp_deinit_memmap_on_memory:
   Offlines as many sections as vmemmap pages fully span, removes the
   range from zhe zone by remove_pfn_range_from_zone(), and calls
   kasan_remove_zero_shadow() for the range.

The new function memory_block_online() calls mhp_init_memmap_on_memory()
before doing the actual online_pages().  Should online_pages() fail, we
clean up by calling mhp_deinit_memmap_on_memory().  Adjusting of
present_pages is done at the end once we know that online_pages()
succedeed.

On offline, memory_block_offline() needs to unaccount vmemmap pages from
present_pages() before calling offline_pages().  This is necessary because
offline_pages() tears down some structures based on the fact whether the
node or the zone become empty.  If offline_pages() fails, we account back
vmemmap pages.  If it succeeds, we call mhp_deinit_memmap_on_memory().

Hot-remove:

 We need to be careful when removing memory, as adding and
 removing memory needs to be done with the same granularity.
 To check that this assumption is not violated, we check the
 memory range we want to remove and if a) any memory block has
 vmemmap pages and b) the range spans more than a single memory
 block, we scream out loud and refuse to proceed.

 If all is good and the range was using memmap on memory (aka vmemmap pages),
 we construct an altmap structure so free_hugepage_table does the right
 thing and calls vmem_altmap_free instead of free_pagetable.

Link: https://lkml.kernel.org/r/20210421102701.25051-5-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
David Hildenbrand [Wed, 5 May 2021 01:39:39 +0000 (18:39 -0700)]
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()

Let's have a single place (inspired by adjust_managed_page_count())
where we adjust present pages.

In contrast to adjust_managed_page_count(), only memory onlining or
offlining is allowed to modify the number of present pages.

Link: https://lkml.kernel.org/r/20210421102701.25051-4-osalvador@suse.de
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm,memory_hotplug: relax fully spanned sections check
Oscar Salvador [Wed, 5 May 2021 01:39:36 +0000 (18:39 -0700)]
mm,memory_hotplug: relax fully spanned sections check

We want {online,offline}_pages to operate on whole memblocks, but
memmap_on_memory will poke pageblock_nr_pages aligned holes in the
beginning, which is a special case we want to allow.  Relax the check to
account for that case.

Link: https://lkml.kernel.org/r/20210421102701.25051-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agodrivers/base/memory: introduce memory_block_{online,offline}
Oscar Salvador [Wed, 5 May 2021 01:39:33 +0000 (18:39 -0700)]
drivers/base/memory: introduce memory_block_{online,offline}

Patch series "Allocate memmap from hotadded memory (per device)", v10.

The primary goal of this patchset is to reduce memory overhead of the
hot-added memory (at least for SPARSEMEM_VMEMMAP memory model).  The
current way we use to populate memmap (struct page array) has two main
drawbacks:

a) it consumes an additional memory until the hotadded memory itself is
   onlined and

b) memmap might end up on a different numa node which is especially
   true for movable_node configuration.

c) due to fragmentation we might end up populating memmap with base
   pages

One way to mitigate all these issues is to simply allocate memmap array
(which is the largest memory footprint of the physical memory hotplug)
from the hot-added memory itself.  SPARSEMEM_VMEMMAP memory model allows
us to map any pfn range so the memory doesn't need to be online to be
usable for the array.  See patch 4 for more details.  This feature is
only usable when CONFIG_SPARSEMEM_VMEMMAP is set.

[Overall design]:

Implementation wise we reuse vmem_altmap infrastructure to override the
default allocator used by vmemap_populate.  memory_block structure gains a
new field called nr_vmemmap_pages, which accounts for the number of
vmemmap pages used by that memory_block.  E.g: On x86_64, that is 512
vmemmap pages on small memory bloks and 4096 on large memory blocks (1GB)

We also introduce new two functions: memory_block_{online,offline}.  These
functions take care of initializing/unitializing vmemmap pages prior to
calling {online,offline}_pages, so the latter functions can remain totally
untouched.

More details can be found in the respective changelogs.

This patch (of 8):

This is a preparatory patch that introduces two new functions:
memory_block_online() and memory_block_offline().

For now, these functions will only call online_pages() and offline_pages()
respectively, but they will be later in charge of preparing the vmemmap
pages, carrying out the initialization and proper accounting of such
pages.

Since memory_block struct contains all the information, pass this struct
down the chain till the end functions.

Link: https://lkml.kernel.org/r/20210421102701.25051-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20210421102701.25051-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
Mel Gorman [Wed, 5 May 2021 01:39:30 +0000 (18:39 -0700)]
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove

zone_pcp_reset allegedly protects against a race with drain_pages using
local_irq_save but this is bogus.  local_irq_save only operates on the
local CPU.  If memory hotplug is running on CPU A and drain_pages is
running on CPU B, disabling IRQs on CPU A does not affect CPU B and
offers no protection.

This patch deletes IRQ disable/enable on the grounds that IRQs protect
nothing and assumes the existing hotplug paths guarantees the PCP cannot
be used after zone_pcp_enable().  That should be the case already
because all the pages have been freed and there is no page to put on the
PCP lists.

Link: https://lkml.kernel.org/r/20210412090346.GQ3697@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoselftests/vm: gup_test: test faulting in kernel, and verify pinnable pages
Pavel Tatashin [Wed, 5 May 2021 01:39:27 +0000 (18:39 -0700)]
selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages

When pages are pinned they can be faulted in userland and migrated, and
they can be faulted right in kernel without migration.

In either case, the pinned pages must end-up being pinnable (not
movable).

Add a new test to gup_test, to help verify that the gup/pup
(get_user_pages() / pin_user_pages()) behavior with respect to pinnable
and movable pages is reasonable and correct.  Specifically, provide a
way to:

1) Verify that only "pinnable" pages are pinned.  This is checked
   automatically for you.

2) Verify that gup/pup performance is reasonable.  This requires
   comparing benchmarks between doing gup/pup on pages that have been
   pre-faulted in from user space, vs.  doing gup/pup on pages that are
   not faulted in until gup/pup time (via FOLL_TOUCH).  This decision is
   controlled with the new -z command line option.

Link: https://lkml.kernel.org/r/20210215161349.246722-15-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoselftests/vm: gup_test: fix test flag
Pavel Tatashin [Wed, 5 May 2021 01:39:23 +0000 (18:39 -0700)]
selftests/vm: gup_test: fix test flag

In gup_test both gup_flags and test_flags use the same flags field.
This is broken.

Farther, in the actual gup_test.c all the passed gup_flags are erased
and unconditionally replaced with FOLL_WRITE.

Which means that test_flags are ignored, and code like this always
performs pin dump test:

155   if (gup->flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
156   nr = pin_user_pages(addr, nr, gup->flags,
157       pages + i, NULL);
158   else
159   nr = get_user_pages(addr, nr, gup->flags,
160       pages + i, NULL);
161   break;

Add a new test_flags field, to allow raw gup_flags to work.  Add a new
subcommand for DUMP_USER_PAGES_TEST to specify that pin test should be
performed.

Remove unconditional overwriting of gup_flags via FOLL_WRITE.  But,
preserve the previous behaviour where FOLL_WRITE was the default flag,
and add a new option "-W" to unset FOLL_WRITE.

Rename flags with gup_flags.

With the fix, dump works like this:

  root@virtme:/# gup_test  -c
  ---- page #0, starting from user virt addr: 0x7f8acb9e4000
  page:00000000d3d2ee27 refcount:2 mapcount:1 mapping:0000000000000000
  index:0x0 pfn:0x100bcf
  anon flags: 0x300000000080016(referenced|uptodate|lru|swapbacked)
  raw: 0300000000080016 ffffd0e204021608 ffffd0e208df2e88 ffff8ea04243ec61
  raw: 0000000000000000 0000000000000000 0000000200000000 0000000000000000
  page dumped because: gup_test: dump_pages() test
  DUMP_USER_PAGES_TEST: done

  root@virtme:/# gup_test  -c -p
  ---- page #0, starting from user virt addr: 0x7fd19701b000
  page:00000000baed3c7d refcount:1025 mapcount:1 mapping:0000000000000000
  index:0x0 pfn:0x108008
  anon flags: 0x300000000080014(uptodate|lru|swapbacked)
  raw: 0300000000080014 ffffd0e204200188 ffffd0e205e09088 ffff8ea04243ee71
  raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000
  page dumped because: gup_test: dump_pages() test
  DUMP_USER_PAGES_TEST: done

Refcount shows the difference between pin vs no-pin case.
Also change type of nr from int to long, as it counts number of pages.

Link: https://lkml.kernel.org/r/20210215161349.246722-14-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/gup: longterm pin migration cleanup
Pavel Tatashin [Wed, 5 May 2021 01:39:19 +0000 (18:39 -0700)]
mm/gup: longterm pin migration cleanup

When pages are longterm pinned, we must migrated them out of movable zone.
The function that migrates them has a hidden loop with goto.  The loop is
to retry on isolation failures, and after successful migration.

Make this code better by moving this loop to the caller.

Link: https://lkml.kernel.org/r/20210215161349.246722-13-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/gup: change index type to long as it counts pages
Pavel Tatashin [Wed, 5 May 2021 01:39:15 +0000 (18:39 -0700)]
mm/gup: change index type to long as it counts pages

In __get_user_pages_locked() i counts number of pages which should be
long, as long is used in all other places to contain number of pages, and
32-bit becomes increasingly small for handling page count proportional
values.

Link: https://lkml.kernel.org/r/20210215161349.246722-12-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomemory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
Pavel Tatashin [Wed, 5 May 2021 01:39:12 +0000 (18:39 -0700)]
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning

Document the special handling of page pinning when ZONE_MOVABLE present.

Link: https://lkml.kernel.org/r/20210215161349.246722-11-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/gup: migrate pinned pages out of movable zone
Pavel Tatashin [Wed, 5 May 2021 01:39:08 +0000 (18:39 -0700)]
mm/gup: migrate pinned pages out of movable zone

We should not pin pages in ZONE_MOVABLE.  Currently, we do not pin only
movable CMA pages.  Generalize the function that migrates CMA pages to
migrate all movable pages.  Use is_pinnable_page() to check which pages
need to be migrated

Link: https://lkml.kernel.org/r/20210215161349.246722-10-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/gup: do not migrate zero page
Pavel Tatashin [Wed, 5 May 2021 01:39:04 +0000 (18:39 -0700)]
mm/gup: do not migrate zero page

On some platforms ZERO_PAGE(0) might end-up in a movable zone.  Do not
migrate zero page in gup during longterm pinning as migration of zero page
is not allowed.

For example, in x86 QEMU with 16G of memory and kernelcore=5G parameter, I
see the following:

Boot#1: zero_pfn  0x48a8d zero_pfn zone: ZONE_DMA32
Boot#2: zero_pfn 0x20168d zero_pfn zone: ZONE_MOVABLE

On x86, empty_zero_page is declared in .bss and depending on the loader
may end up in different physical locations during boots.

Also, move is_zero_pfn() my_zero_pfn() functions under CONFIG_MMU, because
zero_pfn that they are using is declared in memory.c which is compiled
with CONFIG_MMU.

Link: https://lkml.kernel.org/r/20210215161349.246722-9-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: honor PF_MEMALLOC_PIN for all movable pages
Pavel Tatashin [Wed, 5 May 2021 01:39:00 +0000 (18:39 -0700)]
mm: honor PF_MEMALLOC_PIN for all movable pages

PF_MEMALLOC_PIN is only honored for CMA pages, extend this flag to work
for any allocations from ZONE_MOVABLE by removing __GFP_MOVABLE from
gfp_mask when this flag is passed in the current context.

Add is_pinnable_page() to return true if page is in a pinnable page.  A
pinnable page is not in ZONE_MOVABLE and not of MIGRATE_CMA type.

Link: https://lkml.kernel.org/r/20210215161349.246722-8-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: apply per-task gfp constraints in fast path
Pavel Tatashin [Wed, 5 May 2021 01:38:57 +0000 (18:38 -0700)]
mm: apply per-task gfp constraints in fast path

Function current_gfp_context() is called after fast path.  However, soon
we will add more constraints which will also limit zones based on
context.  Move this call into fast path, and apply the correct
constraints for all allocations.

Also update .reclaim_idx based on value returned by
current_gfp_context() because it soon will modify the allowed zones.

Note:
With this patch we will do one extra current->flags load during fast path,
but we already load current->flags in fast-path:

__alloc_pages()
 prepare_alloc_pages()
  current_alloc_flags(gfp_mask, *alloc_flags);

Later, when we add the zone constrain logic to current_gfp_context() we
will be able to remove current->flags load from current_alloc_flags, and
therefore return fast-path to the current performance level.

Link: https://lkml.kernel.org/r/20210215161349.246722-7-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
Pavel Tatashin [Wed, 5 May 2021 01:38:53 +0000 (18:38 -0700)]
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN

PF_MEMALLOC_NOCMA is used ot guarantee that the allocator will not
return pages that might belong to CMA region.  This is currently used
for long term gup to make sure that such pins are not going to be done
on any CMA pages.

When PF_MEMALLOC_NOCMA has been introduced we haven't realized that it
is focusing on CMA pages too much and that there is larger class of
pages that need the same treatment.  MOVABLE zone cannot contain any
long term pins as well so it makes sense to reuse and redefine this flag
for that usecase as well.  Rename the flag to PF_MEMALLOC_PIN which
defines an allocation context which can only get pages suitable for
long-term pins.

Also rename: memalloc_nocma_save()/memalloc_nocma_restore to
memalloc_pin_save()/memalloc_pin_restore() and make the new functions
common.

[rppt@linux.ibm.com: fix renaming of PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN]
Link: https://lkml.kernel.org/r/20210331163816.11517-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210215161349.246722-6-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/gup: check for isolation errors
Pavel Tatashin [Wed, 5 May 2021 01:38:49 +0000 (18:38 -0700)]
mm/gup: check for isolation errors

It is still possible that we pin movable CMA pages if there are
isolation errors and cma_page_list stays empty when we check again.

Check for isolation errors, and return success only when there are no
isolation errors, and cma_page_list is empty after checking.

Because isolation errors are transient, we retry indefinitely.

Link: https://lkml.kernel.org/r/20210215161349.246722-5-pasha.tatashin@soleen.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>