platform/kernel/linux-rpi.git
4 years agoMerge tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 15 Aug 2020 15:18:22 +0000 (08:18 -0700)]
Merge tag 'acpi-5.9-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "Add new hardware support to the ACPI driver for AMD SoCs, the x86 clk
  driver and the Designware i2c driver (changes from Akshu Agrawal and
  Pu Wen)"

* tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  clk: x86: Support RV architecture
  ACPI: APD: Add a fmw property is_raven
  clk: x86: Change name from ST to FCH
  ACPI: APD: Change name from ST to FCH
  i2c: designware: Add device HID for Hygon I2C controller

4 years agoMerge tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 15 Aug 2020 15:17:01 +0000 (08:17 -0700)]
Merge tag 'pm-5.9-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm

Pull one more power management update from Rafael Wysocki:
 "Modify the intel_pstate driver to allow it to work in the passive mode
  with hardware-managed P-states (HWP) enabled"

* tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Implement passive mode with HWP enabled

4 years agoMerge tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Sat, 15 Aug 2020 15:09:38 +0000 (08:09 -0700)]
Merge tag 'mfd-next-5.9-1' of git://git./linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "Core Frameworks
   - Make better attempt at matching device with the correct OF node
   - Allow batch removal of hierarchical sub-devices

  New Drivers
   - Add STM32 Clocksource driver
   - Add support for Khadas System Control Microcontroller

  Driver Removal
   - Remove unused driver for TI's SMSC ECE1099

  New Device Support
   - Add support for Intel Emmitsburg PCH to Intel LPSS PCI
   - Add support for Intel Tiger Lake PCH-H to Intel LPSS PCI
   - Add support for Dialog DA revision to Dialog DA9063

  New Functionality
   - Add support for AXP803 to be probed by I2C

  Fix-ups
   - Numerous W=1 warning fixes
   - Device Tree changes (stm32-lptimer, gateworks-gsc, khadas,mcu, stmfx, cros-ec, j721e-system-controller)
   - Enabled Regmap 'fast I/O' in stm32-lptimer
   - Change BUG_ON to WARN_ON in arizona-core
   - Remove superfluous code/initialisation (madera, max14577)
   - Trivial formatting/spelling issues (madera-core, madera-i2c, da9055, max77693-private)
   - Switch to of_platform_populate() in sprd-sc27xx-spi
   - Expand out set/get brightness/pwm macros in lm3533-ctrlbank
   - Disable IRQs on suspend in motorola-cpcap
   - Clean-up error handling in intel_soc_pmic_mrfld
   - Ensure correct removal order of sub-devices in madera
   - Many s/HTTP/HTTPS/ link changes
   - Ensure name used with Regmap is unique in syscon

  Bug Fixes
   - Properly 'put' clock on unbind and error in arizona-core
   - Fix revision handling in da9063
   - Fix 'assignment of read-only location' error in kempld-core
   - Avoid using the Regmap API when atomic in rn5t618
   - Redefine volatile register description in rn5t618
   - Use locking to protect event handler in dln2"

* tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (76 commits)
  mfd: syscon: Use a unique name with regmap_config
  mfd: Replace HTTP links with HTTPS ones
  mfd: dln2: Run event handler loop under spinlock
  mfd: madera: Improve handling of regulator unbinding
  mfd: mfd-core: Add mechanism for removal of a subset of children
  mfd: intel_soc_pmic_mrfld: Simplify the return expression of intel_scu_ipc_dev_iowrite8()
  mfd: max14577: Remove redundant initialization of variable current_bits
  mfd: rn5t618: Fix caching of battery related registers
  mfd: max77693-private: Drop a duplicated word
  mfd: da9055: pdata.h: Drop a duplicated word
  mfd: rn5t618: Make restart handler atomic safe
  mfd: kempld-core: Fix 'assignment of read-only location' error
  mfd: axp20x: Allow the AXP803 to be probed by I2C
  mfd: da9063: Add support for latest DA silicon revision
  mfd: da9063: Fix revision handling to correctly select reg tables
  dt-bindings: mfd: st,stmfx: Remove I2C unit name
  dt-bindings: mfd: ti,j721e-system-controller.yaml: Add J721e system controller
  mfd: motorola-cpcap: Disable interrupt for suspend
  mfd: smsc-ece1099: Remove driver
  mfd: core: Add OF_MFD_CELL_REG() helper
  ...

4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 15 Aug 2020 15:02:03 +0000 (08:02 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "Subsystems affected by this patch series: mm/hotfixes, lz4, exec,
  mailmap, mm/thp, autofs, sysctl, mm/kmemleak, mm/misc and lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
  ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
  rtl818x: constify ioreadX() iomem argument (as in generic implementation)
  iomap: constify ioreadX() iomem argument (as in generic implementation)
  sh: use generic strncpy()
  sh: clkfwk: remove r8/r16/r32
  include/asm-generic/vmlinux.lds.h: align ro_after_init
  mm: annotate a data race in page_zonenum()
  mm/swap.c: annotate data races for lru_rotate_pvecs
  mm/rmap: annotate a data race at tlb_flush_batched
  mm/mempool: fix a data race in mempool_free()
  mm/list_lru: fix a data race in list_lru_count_one
  mm/memcontrol: fix a data race in scan count
  mm/page_counter: fix various data races at memsw
  mm/swapfile: fix and annotate various data races
  mm/filemap.c: fix a data race in filemap_fault()
  mm/swap_state: mark various intentional data races
  mm/page_io: mark various intentional data races
  mm/frontswap: mark various intentional data races
  mm/kmemleak: silence KCSAN splats in checksum
  ...

4 years agovirtio: pci: constify ioreadX() iomem argument (as in generic implementation)
Krzysztof Kozlowski [Sat, 15 Aug 2020 00:32:20 +0000 (17:32 -0700)]
virtio: pci: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200709072837.5869-5-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agontb: intel: constify ioreadX() iomem argument (as in generic implementation)
Krzysztof Kozlowski [Sat, 15 Aug 2020 00:32:15 +0000 (17:32 -0700)]
ntb: intel: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200709072837.5869-4-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agortl818x: constify ioreadX() iomem argument (as in generic implementation)
Krzysztof Kozlowski [Sat, 15 Aug 2020 00:32:11 +0000 (17:32 -0700)]
rtl818x: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200709072837.5869-3-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoiomap: constify ioreadX() iomem argument (as in generic implementation)
Krzysztof Kozlowski [Sat, 15 Aug 2020 00:32:07 +0000 (17:32 -0700)]
iomap: constify ioreadX() iomem argument (as in generic implementation)

Patch series "iomap: Constify ioreadX() iomem argument", v3.

The ioread8/16/32() and others have inconsistent interface among the
architectures: some taking address as const, some not.

It seems there is nothing really stopping all of them to take pointer to
const.

This patch (of 4):

The ioreadX() and ioreadX_rep() helpers have inconsistent interface.  On
some architectures void *__iomem address argument is a pointer to const,
on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

[krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
[akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agosh: use generic strncpy()
Kuninori Morimoto [Sat, 15 Aug 2020 00:32:04 +0000 (17:32 -0700)]
sh: use generic strncpy()

Current SH will get below warning at strncpy()

In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                 from ${LINUX}/include/linux/string.h:20,
                 from ${LINUX}/include/linux/bitmap.h:9,
                 from ${LINUX}/include/linux/nodemask.h:95,
                 from ${LINUX}/include/linux/mmzone.h:17,
                 from ${LINUX}/include/linux/gfp.h:6,
                 from ${LINUX}/innclude/linux/slab.h:15,
                 from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
  80 is above array bounds of 'char[26]' [-Warray-bounds]
   : "0" (__dest), "1" (__src), "r" (__src+__n)
                                     ~~~~~^~~~

In general, strncpy() should behave like below.

char dest[10];
char *src = "12345";

strncpy(dest, src, 10);
// dest = {'1', '2', '3', '4', '5',
           '\0','\0','\0','\0','\0'}

But, current SH strnpy() has 2 issues.
1st is it will access to out-of-memory (= src + 10).
2nd is it needs big fixup for it, and maintenance __asm__
code is difficult.

To solve these issues, this patch simply uses generic strncpy()
instead of architecture specific one.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Romain Naour <romain.naour@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agosh: clkfwk: remove r8/r16/r32
Kuninori Morimoto [Sat, 15 Aug 2020 00:32:00 +0000 (17:32 -0700)]
sh: clkfwk: remove r8/r16/r32

SH will get below warning

${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
 discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  return ioread8(addr);
                 ^~~~
In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                 from ${LINUX}/include/linux/io.h:13,
                 from ${LINUX}/drivers/sh/clk/cpg.c:14:
${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
argument is of type 'const void *'
 extern unsigned int ioread8(void __iomem *);
                             ^~~~~~~~~~~~~~

We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
themselvs.  This patch cleanup these.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Romain Naour <romain.naour@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinclude/asm-generic/vmlinux.lds.h: align ro_after_init
Romain Naour [Sat, 15 Aug 2020 00:31:57 +0000 (17:31 -0700)]
include/asm-generic/vmlinux.lds.h: align ro_after_init

Since the patch [1], building the kernel using a toolchain built with
binutils 2.33.1 prevents booting a sh4 system under Qemu.  Apply the patch
provided by Alan Modra [2] that fix alignment of rodata.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
[2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html

Signed-off-by: Romain Naour <romain.naour@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Link: https://marc.info/?l=linux-sh&m=158429470221261
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: annotate a data race in page_zonenum()
Qian Cai [Sat, 15 Aug 2020 00:31:53 +0000 (17:31 -0700)]
mm: annotate a data race in page_zonenum()

 BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A page never changes its zone number. The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write, so it can accept
a reload of the zone bits after an intervening write and it don't need
to use READ_ONCE(). Thus, annotate this data race using
ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
writes to it.

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/swap.c: annotate data races for lru_rotate_pvecs
Qian Cai [Sat, 15 Aug 2020 00:31:50 +0000 (17:31 -0700)]
mm/swap.c: annotate data races for lru_rotate_pvecs

Read to lru_add_pvec->nr could be interrupted and then write to the same
variable.  The write has local interrupt disabled, but the plain reads
result in data races.  However, it is unlikely the compilers could do much
damage here given that lru_add_pvec->nr is a "unsigned char" and there is
an existing compiler barrier.  Thus, annotate the reads using the
data_race() macro.  The data races were reported by KCSAN,

 BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page

 write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
  rotate_reclaimable_page+0x2df/0x490
  pagevec_add at include/linux/pagevec.h:81
  (inlined by) rotate_reclaimable_page at mm/swap.c:259
  end_page_writeback+0x1b5/0x2b0
  end_swap_bio_write+0x1d0/0x280
  bio_endio+0x297/0x560
  dec_pending+0x218/0x430 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x297/0x560
  blk_update_request+0x201/0x920
  scsi_end_request+0x6b/0x4a0
  scsi_io_completion+0xb7/0x7e0
  scsi_finish_command+0x1ed/0x2a0
  scsi_softirq_done+0x1c9/0x1d0
  blk_done_softirq+0x181/0x1d0
  __do_softirq+0xd9/0x57c
  irq_exit+0xa2/0xc0
  do_IRQ+0x8b/0x190
  ret_from_intr+0x0/0x42
  delay_tsc+0x46/0x80
  __const_udelay+0x3c/0x40
  __udelay+0x10/0x20
  kcsan_setup_watchpoint+0x202/0x3a0
  __tsan_read1+0xc2/0x100
  lru_add_drain_cpu+0xb8/0x3f0
  lru_add_drain+0x25/0x40
  shrink_active_list+0xe1/0xc80
  shrink_lruvec+0x766/0xb70
  shrink_node+0x2d6/0xca0
  do_try_to_free_pages+0x1f7/0x9a0
  try_to_free_pages+0x252/0x5b0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
  lru_add_drain_cpu+0xb8/0x3f0
  lru_add_drain_cpu at mm/swap.c:602
  lru_add_drain+0x25/0x40
  shrink_active_list+0xe1/0xc80
  shrink_lruvec+0x766/0xb70
  shrink_node+0x2d6/0xca0
  do_try_to_free_pages+0x1f7/0x9a0
  try_to_free_pages+0x252/0x5b0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 2 locks held by oom02/37761:
  #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
  #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
 irq event stamp: 1949217
 trace_hardirqs_on_thunk+0x1a/0x1c
 __do_softirq+0x2e7/0x57c
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
 Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/rmap: annotate a data race at tlb_flush_batched
Qian Cai [Sat, 15 Aug 2020 00:31:47 +0000 (17:31 -0700)]
mm/rmap: annotate a data race at tlb_flush_batched

mm->tlb_flush_batched could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one

 write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
  try_to_unmap_one+0x59a/0x1ab0
  set_tlb_ubc_flush_pending at mm/rmap.c:635
  (inlined by) try_to_unmap_one at mm/rmap.c:1538
  rmap_walk_anon+0x296/0x650
  rmap_walk+0xdf/0x100
  try_to_unmap+0x18a/0x2f0
  shrink_page_list+0xef6/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
  flush_tlb_batched_pending+0x29/0x90
  flush_tlb_batched_pending at mm/rmap.c:682
  change_p4d_range+0x5dd/0x1030
  change_pte_range at mm/mprotect.c:44
  (inlined by) change_pmd_range at mm/mprotect.c:212
  (inlined by) change_pud_range at mm/mprotect.c:240
  (inlined by) change_p4d_range at mm/mprotect.c:260
  change_protection+0x222/0x310
  change_prot_numa+0x3e/0x60
  task_numa_work+0x219/0x350
  task_work_run+0xed/0x140
  prepare_exit_to_usermode+0x2cc/0x2e0
  ret_from_intr+0x32/0x42

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

flush_tlb_batched_pending() is under PTL but the write is not, but
mm->tlb_flush_batched is only a bool type, so the value is unlikely to be
shattered.  Thus, mark it as an intentional data race by using the data
race macro.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/mempool: fix a data race in mempool_free()
Qian Cai [Sat, 15 Aug 2020 00:31:44 +0000 (17:31 -0700)]
mm/mempool: fix a data race in mempool_free()

mempool_t pool.curr_nr could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in mempool_free / remove_element

 write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
  remove_element+0x4a/0x1c0
  remove_element at mm/mempool.c:132
  mempool_alloc+0x102/0x210
  (inlined by) mempool_alloc at mm/mempool.c:399
  bio_alloc_bioset+0x106/0x2c0
  get_swap_bio+0x49/0x230
  __swap_writepage+0x680/0xc30
  swap_writepage+0x9c/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  <snip>

 read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
  mempool_free+0x3e/0x150
  mempool_free at mm/mempool.c:492
  bio_free+0x192/0x280
  bio_put+0x91/0xd0
  end_swap_bio_write+0x1d8/0x280
  bio_endio+0x2c2/0x5b0
  dec_pending+0x22b/0x440 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x2c2/0x5b0
  blk_update_request+0x217/0x940
  scsi_end_request+0x6b/0x4d0
  scsi_io_completion+0xb7/0x7e0
  scsi_finish_command+0x223/0x310
  scsi_softirq_done+0x1d5/0x210
  blk_mq_complete_request+0x224/0x250
  scsi_mq_done+0xc2/0x250
  pqi_raid_io_complete+0x5a/0x70 [smartpqi]
  pqi_irq_handler+0x150/0x1410 [smartpqi]
  __handle_irq_event_percpu+0x90/0x540
  handle_irq_event_percpu+0x49/0xd0
  handle_irq_event+0x85/0xca
  handle_edge_irq+0x13f/0x3e0
  do_IRQ+0x86/0x190
  <snip>

Since the write is under pool->lock but the read is done as lockless.
Even though the commit 5b990546e334 ("mempool: fix and document
synchronization and memory barrier usage") introduced the smp_wmb() and
smp_rmb() pair to improve the situation, it is adequate to protect it
from data races which could lead to a logic bug, so fix it by adding
READ_ONCE() for the read.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/list_lru: fix a data race in list_lru_count_one
Qian Cai [Sat, 15 Aug 2020 00:31:41 +0000 (17:31 -0700)]
mm/list_lru: fix a data race in list_lru_count_one

struct list_lru_one l.nr_items could be accessed concurrently as noticed
by KCSAN,

 BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move

 write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
  list_lru_isolate_move+0xf9/0x130
  list_lru_isolate_move at mm/list_lru.c:180
  inode_lru_isolate+0x12b/0x2a0
  __list_lru_walk_one+0x122/0x3d0
  list_lru_walk_one+0x75/0xa0
  prune_icache_sb+0x8b/0xc0
  super_cache_scan+0x1b8/0x250
  do_shrink_slab+0x256/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
  list_lru_count_one+0x116/0x2f0
  list_lru_count_one at mm/list_lru.c:193
  super_cache_count+0xe8/0x170
  do_shrink_slab+0x95/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A shattered l.nr_items could affect the shrinker behaviour due to a data
race. Fix it by adding READ_ONCE() for the read. Since the writes are
aligned and up to word-size, assume those are safe from data races to
avoid readability issues of writing WRITE_ONCE(var, var + val).

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memcontrol: fix a data race in scan count
Qian Cai [Sat, 15 Aug 2020 00:31:37 +0000 (17:31 -0700)]
mm/memcontrol: fix a data race in scan count

struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

 write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
  mem_cgroup_update_lru_size+0x11c/0x1d0
  mem_cgroup_update_lru_size at mm/memcontrol.c:1266
  isolate_lru_pages+0x6a9/0xf30
  shrink_active_list+0x123/0xcc0
  shrink_lruvec+0x8fd/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
  lruvec_lru_size+0xbb/0x270
  mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
  (inlined by) lruvec_lru_size at mm/vmscan.c:326
  shrink_lruvec+0x1d0/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_current+0xa6/0x120
  alloc_slab_page+0x3b1/0x540
  allocate_slab+0x70/0x660
  new_slab+0x46/0x70
  ___slab_alloc+0x4ad/0x7d0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2c3/0x420
  getname_flags+0x4c/0x230
  getname+0x22/0x30
  do_sys_openat2+0x205/0x3b0
  do_sys_open+0x9a/0xf0
  __x64_sys_openat+0x62/0x80
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is under lru_lock, but the read is done as lockless.  The scan
count is used to determine how aggressively the anon and file LRU lists
should be scanned.  Load tearing could generate an inefficient heuristic,
so fix it by adding READ_ONCE() for the read.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_counter: fix various data races at memsw
Qian Cai [Sat, 15 Aug 2020 00:31:34 +0000 (17:31 -0700)]
mm/page_counter: fix various data races at memsw

Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could had
memcg->memsw->watermark and memcg->memsw->failcnt been accessed
concurrently as reported by KCSAN,

 BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

 read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
  page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x58/0x140
  __memcg_kmem_charge+0xcc/0x280
  __alloc_pages_nodemask+0x1e1/0x450
  alloc_pages_current+0xa6/0x120
  pte_alloc_one+0x17/0xd0
  __pte_alloc+0x3a/0x1f0
  copy_p4d_range+0xc36/0x1990
  copy_page_range+0x21d/0x360
  dup_mmap+0x5f5/0x7a0
  dup_mm+0xa2/0x240
  copy_process+0x1b3f/0x3460
  _do_fork+0xaa/0xa20
  __x64_sys_clone+0x13b/0x170
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
  page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  mem_cgroup_try_charge+0x159/0x460
  mem_cgroup_try_charge_delay+0x3d/0xa0
  wp_page_copy+0x14d/0x930
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xce6/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

 write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
  page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

 read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
  page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

Since watermark could be compared or set to garbage due to a data race
which would change the code logic, fix it by adding a pair of READ_ONCE()
and WRITE_ONCE() in those places.

The "failcnt" counter is tolerant of some degree of inaccuracy and is only
used to report stats, a data race will not be harmful, thus mark it as an
intentional data race using the data_race() macro.

Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/swapfile: fix and annotate various data races
Qian Cai [Sat, 15 Aug 2020 00:31:31 +0000 (17:31 -0700)]
mm/swapfile: fix and annotate various data races

swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
be accessed concurrently separately as noticed by KCSAN,

=== si.highest_bit ===

 write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
  swap_range_alloc+0x81/0x130
  swap_range_alloc at mm/swapfile.c:681
  scan_swap_map_slots+0x371/0xb90
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
  scan_swap_map_slots+0x4a6/0xb90
  scan_swap_map_slots at mm/swapfile.c:892
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

=== si.swap_map[offset] ===

 write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
  __swap_entry_free_locked+0x8c/0x100
  __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
  __swap_entry_free.constprop.20+0x69/0xb0
  free_swap_and_cache+0x53/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

 read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
  _swap_info_get+0x81/0xa0
  _swap_info_get at mm/swapfile.c:1140
  free_swap_and_cache+0x40/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

=== si.flags ===

 write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
  _swap_info_get+0x41/0xa0
  __swap_info_get at mm/swapfile.c:1114
  put_swap_page+0x84/0x490
  __remove_mapping+0x384/0x5f0
  shrink_page_list+0xff1/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

The writes are under si->lock but the reads are not. For si.highest_bit
and si.swap_map[offset], data race could trigger logic bugs, so fix them
by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
except those isolated reads where they compare against zero which a data
race would cause no harm. Thus, annotate them as intentional data races
using the data_race() macro.

For si.flags, the readers are only interested in a single bit where a
data race there would cause no issue there.

[cai@lca.pw: add a missing annotation for si->flags in memory.c]
Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/filemap.c: fix a data race in filemap_fault()
Kirill A. Shutemov [Sat, 15 Aug 2020 00:31:27 +0000 (17:31 -0700)]
mm/filemap.c: fix a data race in filemap_fault()

struct file_ra_state ra.mmap_miss could be accessed concurrently during
page faults as noticed by KCSAN,

 BUG: KCSAN: data-race in filemap_fault / filemap_map_pages

 write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
  filemap_fault+0x920/0xfc0
  do_sync_mmap_readahead at mm/filemap.c:2384
  (inlined by) filemap_fault at mm/filemap.c:2486
  __xfs_filemap_fault+0x112/0x3e0 [xfs]
  xfs_filemap_fault+0x74/0x90 [xfs]
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
  filemap_map_pages+0xc2e/0xd80
  filemap_map_pages at mm/filemap.c:2625
  do_fault+0x3da/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

ra.mmap_miss is used to contribute the readahead decisions, a data race
could be undesirable.  Both the read and write is only under non-exclusive
mmap_sem, two concurrent writers could even underflow the counter.  Fix
the underflow by writing to a local variable before committing a final
store to ra.mmap_miss given a small inaccuracy of the counter should be
acceptable.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Qian Cai <cai@lca.pw>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/swap_state: mark various intentional data races
Qian Cai [Sat, 15 Aug 2020 00:31:24 +0000 (17:31 -0700)]
mm/swap_state: mark various intentional data races

swap_cache_info.* could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache

 write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
  lookup_swap_cache+0x12e/0x460
  lookup_swap_cache at mm/swap_state.c:322
  do_swap_page+0x112/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
  lookup_swap_cache+0x117/0x460
  lookup_swap_cache at mm/swap_state.c:322
  shmem_swapin_page+0xc7/0x9e0
  shmem_getpage_gfp+0x2ca/0x16c0
  shmem_fault+0xef/0x3c0
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

 write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
   __delete_from_swap_cache+0x681/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

 read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
   __delete_from_swap_cache+0x66e/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

Both the read and write are done as lockless. Since swap_cache_info.*
are only used to print out counter information, even if any of them
missed a few incremental due to data races, it will be harmless, so just
mark it as an intentional data race using the data_race() macro.

While at it, fix a checkpatch.pl warning,

WARNING: Single statement macros should not use a do {} while (0) loop

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_io: mark various intentional data races
Qian Cai [Sat, 15 Aug 2020 00:31:20 +0000 (17:31 -0700)]
mm/page_io: mark various intentional data races

struct swap_info_struct si.flags could be accessed concurrently as noticed
by KCSAN,

 BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage

 write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1740/0x2820
  shrink_inactive_list+0x316/0x8b0
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
  swap_readpage+0x204/0x6a0
  swap_readpage at mm/page_io.c:380
  read_swap_cache_async+0xa2/0xb0
  swapin_readahead+0x6a0/0x890
  do_swap_page+0x465/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Other reads,

 read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
  __swap_writepage+0x140/0xc20
  __swap_writepage at mm/page_io.c:289

 read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
  swap_set_page_dirty+0x44/0x1f4
  swap_set_page_dirty at mm/page_io.c:442

The write is under &si->lock, but the reads are done as lockless.  Since
the reads only check for a specific bit in the flag, it is harmless even
if load tearing happens.  Thus, just mark them as intentional data races
using the data_race() macro.

[cai@lca.pw: add a missing annotation]
Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/frontswap: mark various intentional data races
Qian Cai [Sat, 15 Aug 2020 00:31:17 +0000 (17:31 -0700)]
mm/frontswap: mark various intentional data races

There are a few information counters that are intentionally not protected
against increment races, so just annotate them using the data_race()
macro.

 BUG: KCSAN: data-race in __frontswap_store / __frontswap_store

 write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
  __frontswap_store+0x2d0/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
  __frontswap_store+0x2b9/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/kmemleak: silence KCSAN splats in checksum
Qian Cai [Sat, 15 Aug 2020 00:31:14 +0000 (17:31 -0700)]
mm/kmemleak: silence KCSAN splats in checksum

Even if KCSAN is disabled for kmemleak, update_checksum() could still call
crc32() (which is outside of kmemleak.c) to dereference object->pointer.
Thus, the value of object->pointer could be accessed concurrently as
noticed by KCSAN,

 BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock

 write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
  do_raw_spin_lock+0x114/0x200
  debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
  (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
  _raw_spin_lock+0x40/0x50
  __handle_mm_fault+0xa9e/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
  crc32_le_base+0x67/0x350
  crc32_le_base+0x67/0x350:
  crc32_body at lib/crc32.c:106
  (inlined by) crc32_le_generic at lib/crc32.c:179
  (inlined by) crc32_le at lib/crc32.c:197
  kmemleak_scan+0x528/0xd90
  update_checksum at mm/kmemleak.c:1172
  (inlined by) kmemleak_scan at mm/kmemleak.c:1497
  kmemleak_scan_thread+0xcc/0xfa
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

If a shattered value was returned due to a data race, it will be corrected
in the next scan.  Thus, let KCSAN ignore all reads in the region to
silence KCSAN in case the write side is non-atomic.

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoall arch: remove system call sys_sysctl
Xiaoming Ni [Sat, 15 Aug 2020 00:31:07 +0000 (17:31 -0700)]
all arch: remove system call sys_sysctl

Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users.  Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.

[nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org> [arm/arm64]
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: chenzefeng <chenzefeng2@huawei.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kars de Jong <jongk@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paulburton@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs: autofs: delete repeated words in comments
Randy Dunlap [Sat, 15 Aug 2020 00:30:46 +0000 (17:30 -0700)]
fs: autofs: delete repeated words in comments

Drop duplicated words {the, at} in comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ian Kent <raven@themaw.net>
Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: introduce offset_in_thp
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:43 +0000 (17:30 -0700)]
mm: introduce offset_in_thp

Mirroring offset_in_page(), this gives you the offset within this
particular page, no matter what size page it is.  It optimises down to
offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-8-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: add thp_head
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:40 +0000 (17:30 -0700)]
mm: add thp_head

This is like compound_head() but compiles away when
CONFIG_TRANSPARENT_HUGEPAGE is not enabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-7-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: replace hpage_nr_pages with thp_nr_pages
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:37 +0000 (17:30 -0700)]
mm: replace hpage_nr_pages with thp_nr_pages

The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: add thp_size
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:33 +0000 (17:30 -0700)]
mm: add thp_size

This function returns the number of bytes in a THP.  It is like
page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: add thp_order
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:30 +0000 (17:30 -0700)]
mm: add thp_order

This function returns the order of a transparent huge page.  It compiles
to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: move page-flags include to top of file
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:26 +0000 (17:30 -0700)]
mm: move page-flags include to top of file

Give up on the notion that we can remove page-flags.h from mm.h.  There
are currently 14 inline functions which use a PageFoo function.  Also, two
of the files directly included by mm.h include page-flags.h themselves,
and there are probably more indirect inclusions.  So just include it at
the top like any other header file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-3-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: store compound_nr as well as compound_order
Matthew Wilcox (Oracle) [Sat, 15 Aug 2020 00:30:23 +0000 (17:30 -0700)]
mm: store compound_nr as well as compound_order

Patch series "THP prep patches".

These are some generic cleanups and improvements, which I would like
merged into mmotm soon.  The first one should be a performance improvement
for all users of compound pages, and the others are aimed at getting code
to compile away when CONFIG_TRANSPARENT_HUGEPAGE is disabled (ie small
systems).  Also better documented / less confusing than the current prefix
mixture of compound, hpage and thp.

This patch (of 7):

This removes a few instructions from functions which need to know how many
pages are in a compound page.  The storage used is either page->mapping on
64-bit or page->index on 32-bit.  Both of these are fine to overlay on
tail pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-1-willy@infradead.org
Link: http://lkml.kernel.org/r/20200629151959.15779-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomailmap: add entry for Greg Kurz
Greg Kurz [Sat, 15 Aug 2020 00:30:20 +0000 (17:30 -0700)]
mailmap: add entry for Greg Kurz

I had stopped using gkurz@linux.vnet.ibm.com a while back already but this
email address was shutdown last June when I quit IBM.  It's about time to
map it to groug@kaod.org.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/159724692879.76040.4938578139173154028.stgit@bahia.lan
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoselftests/exec: add file type errno tests
Kees Cook [Sat, 15 Aug 2020 00:30:17 +0000 (17:30 -0700)]
selftests/exec: add file type errno tests

Make sure execve() returns the expected errno values for non-regular
files.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: http://lkml.kernel.org/r/20200813231723.2725102-3-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoexec: restore EACCES of S_ISDIR execve()
Kees Cook [Sat, 15 Aug 2020 00:30:14 +0000 (17:30 -0700)]
exec: restore EACCES of S_ISDIR execve()

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Greg Kroah-Hartman <gregkh@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@google.com>
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolz4: fix kernel decompression speed
Nick Terrell [Sat, 15 Aug 2020 00:30:10 +0000 (17:30 -0700)]
lz4: fix kernel decompression speed

This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.

LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
doesn't get defined as __builtin_memcpy().

An equivalent patch has been applied upstream so that the next import
won't lose this change [1].

I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures.  The speed-up is about
10x as shown below.

Code Arch Kernel Size Time Speed
v5.8 x86_64 11504832 B 148 ms  79 MB/s
patch x86_64 11503872 B  13 ms 885 MB/s
v5.8 i386  9621216 B  91 ms 106 MB/s
patch i386  9620224 B  10 ms 962 MB/s

I also measured the time to decompress the initramfs on x86_64, i386, and
arm.  All three show the same decompression speed before and after, as
expected.

[1] https://github.com/lz4/lz4/pull/890

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRevert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
Baoquan He [Sat, 15 Aug 2020 00:30:07 +0000 (17:30 -0700)]
Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

This reverts commit 26e7deadaae175.

Sonny reported that one of their tests started failing on the latest
kernel on their Chrome OS platform.  The root cause is that the above
commit removed the protection line of empty zone, while the parser used in
the test relies on the protection line to mark the end of each zone.

Let's revert it to avoid breaking userspace testing or applications.

Fixes: 26e7deadaae175 ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone)"
Reported-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [5.8.x]
Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoasm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()
Mike Rapoport [Sat, 15 Aug 2020 00:30:04 +0000 (17:30 -0700)]
asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

The #ifdef statement that guards the generic version of pud_alloc_one() by
mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.

Fix it.

Fixes: d9e8b929670b ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200812191415.GE163101@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 14 Aug 2020 23:01:59 +0000 (16:01 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "This is the set of patches which arrived too late to stabilise in
  -next for the first pull.

  It's really just an lpfc driver update and an assortment of minor
  fixes, all in drivers. The only core update is to the zone block
  device driver, which isn't the one most people use"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Update lpfc version to 12.8.0.3
  scsi: lpfc: Fix LUN loss after cable pull
  scsi: lpfc: Fix validation of bsg reply lengths
  scsi: lpfc: Fix retry of PRLI when status indicates its unsupported
  scsi: lpfc: Fix oops when unloading driver while running mds diags
  scsi: lpfc: Fix RSCN timeout due to incorrect gidft counter
  scsi: lpfc: Fix no message shown for lpfc_hdw_queue out of range value
  scsi: lpfc: Fix FCoE speed reporting
  scsi: lpfc: Add missing misc_deregister() for lpfc_init()
  scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport
  scsi: scsi_transport_sas: Add spaces around binary operator "|"
  scsi: sd_zbc: Improve zone revalidation
  scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
  scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del()
  scsi: target: Make iscsit_register_transport() return void

4 years agoMerge tag 'pwm/for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Fri, 14 Aug 2020 23:00:09 +0000 (16:00 -0700)]
Merge tag 'pwm/for-5.9-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "The majority of this batch is conversion of the PWM period and duty
  cycle to 64-bit unsigned integers, which is required so that some
  types of hardware can generate the full range of signals that they're
  capable of.

  The remainder is mostly minor fixes and cleanups"

* tag 'pwm/for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: bcm-iproc: handle clk_get_rate() return
  pwm: Replace HTTP links with HTTPS ones
  pwm: omap-dmtimer: Repair pwm_omap_dmtimer_chip's broken kerneldoc header
  pwm: mediatek: Provide missing kerneldoc description for 'soc' arg
  pwm: bcm-kona: Remove impossible comparison when validating duty cycle
  pwm: bcm-iproc: Remove impossible comparison when validating duty cycle
  pwm: iqs620a: Use lowercase hexadecimal literals for consistency
  pwm: Convert period and duty cycle to u64
  clk: pwm: Use 64-bit division function
  backlight: pwm_bl: Use 64-bit division function
  pwm: sun4i: Use nsecs_to_jiffies to avoid a division
  pwm: sifive: Use 64-bit division macro
  pwm: iqs620a: Use 64-bit division
  pwm: imx27: Use 64-bit division macro
  pwm: imx-tpm: Use 64-bit division macro
  pwm: clps711x: Use 64-bit division macro
  hwmon: pwm-fan: Use 64-bit division macro
  drm/i915: Use 64-bit division macro

4 years agoMerge tag 'sound-fix-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 14 Aug 2020 22:58:57 +0000 (15:58 -0700)]
Merge tag 'sound-fix-5.9-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "All device-specific small fixes and quirks mostly for usual suspects,
  USB-audio and HD-audio"

* tag 'sound-fix-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: echoaudio: Fix potential Oops in snd_echo_resume()
  ALSA: hda/hdmi: Use force connectivity quirk on another HP desktop
  ALSA: hda/realtek - Fix unused variable warning
  ALSA: hda - reverse the setting value in the micmute_led_set
  ALSA: echoaduio: Drop superfluous volatile modifier
  ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control
  ALSA: usb-audio: add quirk for Pioneer DDJ-RB
  ALSA: usb-audio: work around streaming quirk for MacroSilicon MS2109
  ALSA: hda - fix the micmute led status for Lenovo ThinkCentre AIO
  ALSA: usb-audio: fix overeager device match for MacroSilicon MS2109
  ALSA: hda/realtek: Fix pin default on Intel NUC 8 Rugged
  ALSA: usb-audio: Creative USB X-Fi Pro SB1095 volume knob support
  ALSA: usb-audio: fix spelling mistake "buss" -> "bus"

4 years agodma-debug: remove debug_dma_assert_idle() function
Linus Torvalds [Fri, 14 Aug 2020 22:22:43 +0000 (15:22 -0700)]
dma-debug: remove debug_dma_assert_idle() function

This remoes the code from the COW path to call debug_dma_assert_idle(),
which was added many years ago.

Google shows that it hasn't caught anything in the 6+ years we've had it
apart from a false positive, and Hugh just noticed how it had a very
unfortunate spinlock serialization in the COW path.

He fixed that issue the previous commit (a85ffd59bd36: "dma-debug: fix
debug_dma_assert_idle(), use rcu_read_lock()"), but let's see if anybody
even notices when we remove this function entirely.

NOTE! We keep the dma tracking infrastructure that was added by the
commit that introduced it.  Partly to make it easier to resurrect this
debug code if we ever deside to, and partly because that tracking by pfn
and offset looks quite reasonable.

The problem with this debug code was simply that it was expensive and
didn't seem worth it, not that it was wrong per se.

Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()
Hugh Dickins [Thu, 13 Aug 2020 03:17:00 +0000 (20:17 -0700)]
dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()

Since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
logic") improved unlock_page(), it has become more noticeable how
cow_user_page() in a kernel with CONFIG_DMA_API_DEBUG=y can create and
suffer from heavy contention on DMA debug's radix_lock in
debug_dma_assert_idle().

It is only doing a lookup: use rcu_read_lock() and rcu_read_unlock()
instead; though that does require the static ents[] to be moved
onstack...

...but, hold on, isn't that radix_tree_gang_lookup() and loop doing
quite the wrong thing: searching CACHELINES_PER_PAGE entries for an
exact match with the first cacheline of the page in question?
radix_tree_gang_lookup() is the right tool for the job, but we need
nothing more than to check the first entry it can find, reporting if
that falls anywhere within the page.

(Is RCU safe here? As safe as using the spinlock was. The entries are
never freed, so don't need to be freed by RCU. They may be reused, and
there is a faint chance of a race, with an offending entry reused while
printing its error info; but the spinlock did not prevent that either,
and I agree that it's not worth worrying about. ]

[ Side noe: this patch is a clear improvement to the status quo, but the
  next patch will be removing this debug function entirely.

  But just in case we decide we want to resurrect the debugging code
  some day, I'm first applying this improvement patch so that it doesn't
  get lost    - Linus ]

Fixes: 3b7a6418c749 ("dma debug: account for cachelines and read-only mappings in overlap tracking")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Aug 2020 21:26:08 +0000 (14:26 -0700)]
Merge tag 'timers-urgent-2020-08-14' of git://git./linux/kernel/git/tip/tip

Pull timekeeping updates from Thomas Gleixner:
 "A set of timekeeping/VDSO updates:

   - Preparatory work to allow S390 to switch over to the generic VDSO
     implementation.

     S390 requires that the VDSO data pointer is handed in to the
     counter read function when time namespace support is enabled.
     Adding the pointer is a NOOP for all other architectures because
     the compiler is supposed to optimize that out when it is unused in
     the architecture specific inline. The change also solved a similar
     problem for MIPS which fortunately has time namespaces not yet
     enabled.

     S390 needs to update clock related VDSO data independent of the
     timekeeping updates. This was solved so far with yet another
     sequence counter in the S390 implementation. A better solution is
     to utilize the already existing VDSO sequence count for this. The
     core code now exposes helper functions which allow to serialize
     against the timekeeper code and against concurrent readers.

     S390 needs extra data for their clock readout function. The initial
     common VDSO data structure did not provide a way to add that. It
     now has an embedded architecture specific struct embedded which
     defaults to an empty struct.

     Doing this now avoids tree dependencies and conflicts post rc1 and
     allows all other architectures which work on generic VDSO support
     to work from a common upstream base.

   - A trivial comment fix"

* tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Delete repeated words in comments
  lib/vdso: Allow to add architecture-specific vdso data
  timekeeping/vsyscall: Provide vdso_update_begin/end()
  vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()

4 years agoMerge tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Aug 2020 21:17:51 +0000 (14:17 -0700)]
Merge tag 'timers-core-2020-08-14' of git://git./linux/kernel/git/tip/tip

Pull more timer updates from Thomas Gleixner:
 "A set of posix CPU timer changes which allows to defer the heavy work
  of posix CPU timers into task work context. The tick interrupt is
  reduced to a quick check which queues the work which is doing the
  heavy lifting before returning to user space or going back to guest
  mode. Moving this out is deferring the signal delivery slightly but
  posix CPU timers are inaccurate by nature as they depend on the tick
  so there is no real damage. The relevant test cases all passed.

  This lifts the last offender for RT out of the hard interrupt context
  tick handler, but it also has the general benefit that the actual
  heavy work is accounted to the task/process and not to the tick
  interrupt itself.

  Further optimizations are possible to break long sighand lock hold and
  interrupt disabled (on !RT kernels) times when a massive amount of
  posix CPU timers (which are unpriviledged) is armed for a
  task/process.

  This is currently only enabled for x86 because the architecture has to
  ensure that task work is handled in KVM before entering a guest, which
  was just established for x86 with the new common entry/exit code which
  got merged post 5.8 and is not the case for other KVM architectures"

* tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Select POSIX_CPU_TIMERS_TASK_WORK
  posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  posix-cpu-timers: Split run_posix_cpu_timers()

4 years agoMerge tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Aug 2020 21:14:28 +0000 (14:14 -0700)]
Merge tag 'irq-urgent-2020-08-14' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "Two fixes in the core interrupt code which ensure that all error exits
  unlock the descriptor lock"

* tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Unlock irq descriptor after errors
  genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()

4 years agoMerge tag 'for-linus' of git://github.com/openrisc/linux
Linus Torvalds [Fri, 14 Aug 2020 21:04:53 +0000 (14:04 -0700)]
Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
 "A few patches all over the place during this cycle, mostly bug and
  sparse warning fixes for OpenRISC, but a few enhancements too. Note,
  there are 2 non OpenRISC specific fixups.

  Non OpenRISC fixes:

   - In init we need to align the init_task correctly to fix an issue
     with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
     kept it on my tree.

   - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
     Arnd asked to merge it via my tree.

  OpenRISC fixes:

   - Many fixes for OpenRISC sprase warnings.

   - Add support OpenRISC SMP tlb flushing rather than always flushing
     the entire TLB on every CPU.

   - Fix bug when dumping stack via /proc/xxx/stack of user threads"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: uaccess: Add user address space check to access_ok
  openrisc: signal: Fix sparse address space warnings
  openrisc: uaccess: Remove unused macro __addr_ok
  openrisc: uaccess: Use static inline function in access_ok
  openrisc: uaccess: Fix sparse address space warnings
  openrisc: io: Fixup defines and move include to the end
  asm-generic/io.h: Fix sparse warnings on big-endian architectures
  openrisc: Implement proper SMP tlb flushing
  openrisc: Fix oops caused when dumping stack
  openrisc: Add support for external initrd images
  init: Align init_task to avoid conflict with MUTEX_FLAGS
  openrisc: fix __user in raw_copy_to_user()'s prototype

4 years agoMerge tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 14 Aug 2020 20:40:27 +0000 (13:40 -0700)]
Merge tag 'powerpc-5.9-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One fix for a boot crash on some platforms introduced by the recent
  pkey refactoring.

  Thanks to Christian Zigotzky and Aneesh Kumar K.V"

* tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne X1000)

4 years agoMerge tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Aug 2020 20:34:37 +0000 (13:34 -0700)]
Merge tag 'for-linus-5.9-rc1b-tag' of git://git./linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:

 - Remove support for running as 32-bit Xen PV-guest.

   32-bit PV guests are rarely used, are lacking security fixes for
   Meltdown, and can be easily replaced by PVH mode. Another series for
   doing more cleanup will follow soon (removal of 32-bit-only pvops
   functionality).

 - Fixes and additional features for the Xen display frontend driver.

* tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drm/xen-front: Pass dumb buffer data offset to the backend
  xen: Sync up with the canonical protocol definition in Xen
  drm/xen-front: Add YUYV to supported formats
  drm/xen-front: Fix misused IS_ERR_OR_NULL checks
  xen/gntdev: Fix dmabuf import with non-zero sgt offset
  x86/xen: drop tests for highmem in pv code
  x86/xen: eliminate xen-asm_64.S
  x86/xen: remove 32-bit Xen PV guest support

4 years agoMerge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Aug 2020 20:31:25 +0000 (13:31 -0700)]
Merge tag 'hyperv-fixes-signed' of git://git./linux/kernel/git/hyperv/linux

Pull hyper-v fixes from Wei Liu:

 - fix oops reporting on Hyper-V

 - make objtool happy

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Make hv_setup_sched_clock inline
  Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops

4 years agox86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task
Eric Dumazet [Fri, 14 Aug 2020 18:16:17 +0000 (11:16 -0700)]
x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task

syzbot found its way in 86_fsgsbase_read_task() and triggered this oops:

   KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
   CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
   RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
   Call Trace:
     putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
     genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
     genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
     copy_regset_from_user include/linux/regset.h:326 [inline]
     ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
     compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
     __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
     __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
     __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
     do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
     __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
     do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
     entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
or GS for a task with no LDT and something tries to read the base before
a return to usermode notices the bad selector and fixes it.

The fix is to make sure ldt pointer is not NULL.

Fixes: 07e1d88adaae ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 14 Aug 2020 20:09:15 +0000 (13:09 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a regression in af_alg"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_aead - fix uninitialized ctx->init

4 years agoMerge tag 'modules-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Fri, 14 Aug 2020 18:07:02 +0000 (11:07 -0700)]
Merge tag 'modules-for-v5.9' of git://git./linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "The most important change would be Christoph Hellwig's patch
  implementing proprietary taint inheritance, in an effort to discourage
  the creation of GPL "shim" modules that interface between GPL symbols
  and proprietary symbols.

  Summary:

   - Have modules that use symbols from proprietary modules inherit the
     TAINT_PROPRIETARY_MODULE taint, in an effort to prevent GPL shim
     modules that are used to circumvent _GPL exports. These are modules
     that claim to be GPL licensed while also using symbols from
     proprietary modules. Such modules will be rejected while non-GPL
     modules will inherit the proprietary taint.

   - Module export space cleanup. Unexport symbols that are unused
     outside of module.c or otherwise used in only built-in code"

* tag 'modules-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  modules: inherit TAINT_PROPRIETARY_MODULE
  modules: return licensing information from find_symbol
  modules: rename the licence field in struct symsearch to license
  modules: unexport __module_address
  modules: unexport __module_text_address
  modules: mark each_symbol_section static
  modules: mark find_symbol static
  modules: mark ref_module static
  modules: linux/moduleparam.h: drop duplicated word in a comment

4 years agoMerge tag 'kconfig-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Fri, 14 Aug 2020 18:04:45 +0000 (11:04 -0700)]
Merge tag 'kconfig-v5.9' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig updates from Masahiro Yamada:

 - remove '---help---' keyword support

 - fix mouse events for 'menuconfig' symbols in search view of qconf

 - code cleanups of qconf

* tag 'kconfig-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits)
  kconfig: qconf: move setOptionMode() to ConfigList from ConfigView
  kconfig: qconf: do not limit the pop-up menu to the first row
  kconfig: qconf: refactor icon setups
  kconfig: qconf: remove unused voidPix, menuInvPix
  kconfig: qconf: remove ConfigItem::text/setText
  kconfig: qconf: remove ConfigList::addColumn/removeColumn
  kconfig: qconf: remove ConfigItem::pixmap/setPixmap
  kconfig: qconf: drop more localization code
  kconfig: qconf: remove 'parent' from ConfigList::updateMenuList()
  kconfig: qconf: remove unused argument from ConfigView::updateList()
  kconfig: qconf: remove unused argument from ConfigList::updateList()
  kconfig: qconf: omit parent to QHBoxLayout()
  kconfig: qconf: remove name from ConfigSearchWindow constructor
  kconfig: qconf: remove unused ConfigList::listView()
  kconfig: qconf: overload addToolBar() to create and insert toolbar
  kconfig: qconf: remove toolBar from ConfigMainWindow members
  kconfig: qconf: use 'menu' variable for (QMenu *)
  kconfig: qconf: do not use 'menu' variable for (QMenuBar *)
  kconfig: qconf: remove ->addSeparator() to menuBar
  kconfig: add 'static' to some file-local data
  ...

4 years agoMerge branch 'pm-cpufreq'
Rafael J. Wysocki [Fri, 14 Aug 2020 17:39:35 +0000 (19:39 +0200)]
Merge branch 'pm-cpufreq'

* pm-cpufreq:
  cpufreq: intel_pstate: Implement passive mode with HWP enabled

4 years agokconfig: qconf: move setOptionMode() to ConfigList from ConfigView
Masahiro Yamada [Fri, 7 Aug 2020 09:19:09 +0000 (18:19 +0900)]
kconfig: qconf: move setOptionMode() to ConfigList from ConfigView

ConfigView::setOptionMode() only gets access to the 'list' member.

Move it to the more relevant ConfigList class.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: do not limit the pop-up menu to the first row
Masahiro Yamada [Fri, 7 Aug 2020 09:19:08 +0000 (18:19 +0900)]
kconfig: qconf: do not limit the pop-up menu to the first row

If you right-click the first row in the option tree, the pop-up menu
shows up, but if you right-click the second row or below, the event
is ignored due to the following check:

  if (e->y() <= header()->geometry().bottom()) {

Perhaps, the intention was to show the pop-menu only when the tree
header was right-clicked, but this handler is not called in that case.

Since the origin of e->y() starts from the bottom of the header,
this check is odd.

Going forward, you can right-click anywhere in the tree to get the
pop-up menu.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: refactor icon setups
Masahiro Yamada [Fri, 7 Aug 2020 09:19:07 +0000 (18:19 +0900)]
kconfig: qconf: refactor icon setups

These icon data are used by ConfigItem, but stored in each instance
of ConfigView. There is no point to keep the same data in each of 3
instances, "menu", "config", and "search".

Move the icon data to the more relevant ConfigItem class, and make
them static members.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove unused voidPix, menuInvPix
Masahiro Yamada [Fri, 7 Aug 2020 09:19:06 +0000 (18:19 +0900)]
kconfig: qconf: remove unused voidPix, menuInvPix

These are initialized, but not used by anyone.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove ConfigItem::text/setText
Masahiro Yamada [Fri, 7 Aug 2020 09:19:05 +0000 (18:19 +0900)]
kconfig: qconf: remove ConfigItem::text/setText

Use QTreeWidgetItem::text/setText directly

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove ConfigList::addColumn/removeColumn
Masahiro Yamada [Fri, 7 Aug 2020 09:19:04 +0000 (18:19 +0900)]
kconfig: qconf: remove ConfigList::addColumn/removeColumn

Use QTreeView::showColumn/hideColumn directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove ConfigItem::pixmap/setPixmap
Masahiro Yamada [Fri, 7 Aug 2020 09:19:03 +0000 (18:19 +0900)]
kconfig: qconf: remove ConfigItem::pixmap/setPixmap

Use QTreeWidgetItem::icon/setIcon directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: drop more localization code
Masahiro Yamada [Fri, 7 Aug 2020 09:19:02 +0000 (18:19 +0900)]
kconfig: qconf: drop more localization code

This is a remnant of commit 694c49a7c01c ("kconfig: drop localization
support").

Get it back to the code prior to commit 3b9fa0931dd8 ("[PATCH] Kconfig
i18n support").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove 'parent' from ConfigList::updateMenuList()
Masahiro Yamada [Fri, 7 Aug 2020 09:19:01 +0000 (18:19 +0900)]
kconfig: qconf: remove 'parent' from ConfigList::updateMenuList()

All the call-sites of this function pass 'this' to the first argument.

So, 'parent' is always the 'this' pointer.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove unused argument from ConfigView::updateList()
Masahiro Yamada [Fri, 7 Aug 2020 09:19:00 +0000 (18:19 +0900)]
kconfig: qconf: remove unused argument from ConfigView::updateList()

Now that ConfigList::updateList() takes no argument, the 'item' argument
ConfigView::updateList() is no longer used.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove unused argument from ConfigList::updateList()
Masahiro Yamada [Fri, 7 Aug 2020 09:18:59 +0000 (18:18 +0900)]
kconfig: qconf: remove unused argument from ConfigList::updateList()

This function allocates 'item' before using it, so the argument 'item'
is always shadowed.

Remove the meaningless argument.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: omit parent to QHBoxLayout()
Masahiro Yamada [Fri, 7 Aug 2020 09:18:58 +0000 (18:18 +0900)]
kconfig: qconf: omit parent to QHBoxLayout()

Instead of passing 0 (i.e. nullptr), leave it empty.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove name from ConfigSearchWindow constructor
Masahiro Yamada [Fri, 7 Aug 2020 09:18:57 +0000 (18:18 +0900)]
kconfig: qconf: remove name from ConfigSearchWindow constructor

This constructor is only called with "search" as the second argument.

Hard-code the name in the constructor, and drop it from the function
argument.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove unused ConfigList::listView()
Masahiro Yamada [Fri, 7 Aug 2020 09:18:56 +0000 (18:18 +0900)]
kconfig: qconf: remove unused ConfigList::listView()

I do not know how this function can be useful. In fact, it is unsed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: overload addToolBar() to create and insert toolbar
Masahiro Yamada [Fri, 7 Aug 2020 09:18:55 +0000 (18:18 +0900)]
kconfig: qconf: overload addToolBar() to create and insert toolbar

Use the overloaded function, addToolBar(const QString &title)
to create a QToolBar object, setting its window title, and inserts
it into the toolbar area.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove toolBar from ConfigMainWindow members
Masahiro Yamada [Fri, 7 Aug 2020 09:18:54 +0000 (18:18 +0900)]
kconfig: qconf: remove toolBar from ConfigMainWindow members

This pointer is only used in the ConfigMainWindow constructor.

Drop it from the private members.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: use 'menu' variable for (QMenu *)
Masahiro Yamada [Fri, 7 Aug 2020 09:18:53 +0000 (18:18 +0900)]
kconfig: qconf: use 'menu' variable for (QMenu *)

The variable 'config' for the file menu is inconsistent.

You do not need to use different variables. Use 'menu' for every menu.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: do not use 'menu' variable for (QMenuBar *)
Masahiro Yamada [Fri, 7 Aug 2020 09:18:52 +0000 (18:18 +0900)]
kconfig: qconf: do not use 'menu' variable for (QMenuBar *)

I think it is a bit confusing to use 'menu' to hold a QMenuBar pointer.
I want to use 'menu' for a QMenu pointer.

You do not need to use a local variable here. Use menuBar() directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: remove ->addSeparator() to menuBar
Masahiro Yamada [Fri, 7 Aug 2020 09:18:51 +0000 (18:18 +0900)]
kconfig: qconf: remove ->addSeparator() to menuBar

I do not understand the purpose of this ->addSeparator().
It does not make any difference.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: add 'static' to some file-local data
Masahiro Yamada [Wed, 29 Jul 2020 03:18:37 +0000 (12:18 +0900)]
kconfig: add 'static' to some file-local data

Fix some warnings from sparce like follows:

  warning: symbol '...' was not declared. Should it be static?

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: qconf: Fix mouse events in search view
Maxime Chretien [Wed, 8 Jul 2020 13:32:15 +0000 (15:32 +0200)]
kconfig: qconf: Fix mouse events in search view

On menu properties mouse events didn't do anything in search view
(listMode).

As there are no menus in listMode we can add an exception in tests to
always change the value on mouse events if we are in listMode.

Signed-off-by: Maxime Chretien <maxime.chretien@bootlin.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: constify XPM data
Masahiro Yamada [Sat, 18 Jul 2020 06:38:47 +0000 (15:38 +0900)]
kconfig: constify XPM data

Constify arrays as well as strings.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agoRevert "checkpatch: kconfig: prefer 'help' over '---help---'"
Masahiro Yamada [Wed, 17 Jun 2020 03:02:20 +0000 (12:02 +0900)]
Revert "checkpatch: kconfig: prefer 'help' over '---help---'"

This reverts commit 84af7a6194e493fae312a2b7fa5a3b51f76d9282.

The conversion is done.

Cc: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agokconfig: remove '---help---' support
Masahiro Yamada [Wed, 17 Jun 2020 03:02:19 +0000 (12:02 +0900)]
kconfig: remove '---help---' support

The conversion is done. No more user of '---help---'.

Cc: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 14 Aug 2020 03:03:11 +0000 (20:03 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:
 "Some merge window fallout, some longer term fixes:

   1) Handle headroom properly in lapbether and x25_asy drivers, from
      Xie He.

   2) Fetch MAC address from correct r8152 device node, from Thierry
      Reding.

   3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
      from Rouven Czerwinski.

   4) Correct fdputs in socket layer, from Miaohe Lin.

   5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

   6) Fix TCP TFO key reading on big endian, from Jason Baron.

   7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

   8) Fix inet fastreuse optimization with tproxy sockets, from Tim
      Froidcoeur.

   9) Fix 64-bit divide in new SFC driver, from Edward Cree.

  10) Add a tracepoint for prandom_u32 so that we can more easily
      perform usage analysis. From Eric Dumazet.

  11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  net: openvswitch: introduce common code for flushing flows
  af_packet: TPACKET_V3: fix fill status rwlock imbalance
  random32: add a tracepoint for prandom_u32()
  Revert "ipv4: tunnel: fix compilation on ARCH=um"
  net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
  net: ethernet: stmmac: Disable hardware multicast filter
  net: stmmac: dwmac1000: provide multicast filter fallback
  ipv4: tunnel: fix compilation on ARCH=um
  vsock: fix potential null pointer dereference in vsock_poll()
  sfc: fix ef100 design-param checking
  net: initialize fastreuse on inet_inherit_port
  net: refactor bind_bucket fastreuse into helper
  net: phy: marvell10g: fix null pointer dereference
  net: Fix potential memory leak in proto_register()
  net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
  ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
  net/nfc/rawsock.c: add CAP_NET_RAW check.
  hinic: fix strncpy output truncated compile warnings
  drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
  net/tls: Fix kmap usage
  ...

4 years agoMerge branch 'i2c/for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds [Fri, 14 Aug 2020 01:41:00 +0000 (18:41 -0700)]
Merge branch 'i2c/for-5.9' of git://git./linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:

 - bus recovery can now be given a pinctrl handle and the I2C core will
   do all the steps to switch to/from GPIO which can save quite some
   boilerplate code from drivers

 - "fallthrough" conversion

 - driver updates, mostly ID additions

* 'i2c/for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (32 commits)
  i2c: iproc: fix race between client unreg and isr
  i2c: eg20t: use generic power management
  i2c: eg20t: Drop PCI wakeup calls from .suspend/.resume
  i2c: mediatek: Fix i2c_spec_values description
  i2c: mediatek: Add i2c compatible for MediaTek MT8192
  dt-bindings: i2c: update bindings for MT8192 SoC
  i2c: mediatek: Add access to more than 8GB dram in i2c driver
  i2c: mediatek: Add apdma sync in i2c driver
  i2c: i801: Add support for Intel Tiger Lake PCH-H
  i2c: i801: Add support for Intel Emmitsburg PCH
  i2c: bcm2835: Replace HTTP links with HTTPS ones
  Documentation: i2c: dev: 'block process call' is supported
  i2c: at91: Move to generic GPIO bus recovery
  i2c: core: treat EPROBE_DEFER when acquiring SCL/SDA GPIOs
  i2c: core: add generic I2C GPIO recovery
  dt-bindings: i2c: add generic properties for GPIO bus recovery
  i2c: rcar: avoid race when unregistering slave
  i2c: tegra: Avoid tegra_i2c_init_dma() for Tegra210 vi i2c
  i2c: tegra: Fix runtime resume to re-init VI I2C
  i2c: tegra: Fix the error path in tegra_i2c_runtime_resume
  ...

4 years agonet: openvswitch: introduce common code for flushing flows
Tonghao Zhang [Wed, 12 Aug 2020 09:56:39 +0000 (17:56 +0800)]
net: openvswitch: introduce common code for flushing flows

To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoaf_packet: TPACKET_V3: fix fill status rwlock imbalance
John Ogness [Thu, 13 Aug 2020 19:39:25 +0000 (21:45 +0206)]
af_packet: TPACKET_V3: fix fill status rwlock imbalance

After @blk_fill_in_prog_lock is acquired there is an early out vnet
situation that can occur. In that case, the rwlock needs to be
released.

Also, since @blk_fill_in_prog_lock is only acquired when @tp_version
is exactly TPACKET_V3, only release it on that exact condition as
well.

And finally, add sparse annotation so that it is clearer that
prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring
and releasing @blk_fill_in_prog_lock, respectively. sparse is still
unable to understand the balance, but the warnings are now on a
higher level that make more sense.

Fixes: 632ca50f2cbd ("af_packet: TPACKET_V3: replace busy-wait loop")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agorandom32: add a tracepoint for prandom_u32()
Eric Dumazet [Thu, 13 Aug 2020 17:06:43 +0000 (10:06 -0700)]
random32: add a tracepoint for prandom_u32()

There has been some heat around prandom_u32() lately, and some people
were wondering if there was a simple way to determine how often
it was used, before considering making it maybe 10 times more expensive.

This tracepoint exports the generated pseudo random value.

Tested:

perf list | grep prandom_u32
  random:prandom_u32                                 [Tracepoint event]

perf record -a [-g] [-C1] -e random:prandom_u32 sleep 1
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 259.748 MB perf.data (924087 samples) ]

perf report --nochildren
    ...
    97.67%  ksoftirqd/1     [kernel.vmlinux]  [k] prandom_u32
            |
            ---prandom_u32
               prandom_u32
               |
               |--48.86%--tcp_v4_syn_recv_sock
               |          tcp_check_req
               |          tcp_v4_rcv
               |          ...
                --48.81%--tcp_conn_request
                          tcp_v4_conn_request
                          tcp_rcv_state_process
                          ...
perf script

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'docs-5.9-2' of git://git.lwn.net/linux
Linus Torvalds [Thu, 13 Aug 2020 20:57:45 +0000 (13:57 -0700)]
Merge tag 'docs-5.9-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of obvious fixes that wandered in during the merge window"

* tag 'docs-5.9-2' of git://git.lwn.net/linux:
  Documentation/locking/locktypes: fix the typo
  doc/zh_CN: resolve undefined label warning in admin-guide index
  doc/zh_CN: fix title heading markup in admin-guide cpu-load
  docs: remove the 2.6 "Upgrading I2C Drivers" guide
  docs: Correct the release date of 5.2 stable
  mailmap: Update comments for with format and more detalis
  docs: cdrom: Fix a typo and rst markup
  Doc: admin-guide: use correct legends in kernel-parameters.txt
  Documentation/features: refresh RISC-V arch support files
  documentation: coccinelle: Improve command example for make C={1,2}
  Core-api: Documentation: Replace deprecated :c:func: Usage
  Dev-tools: Documentation: Replace deprecated :c:func: Usage
  Filesystems: Documentation: Replace deprecated :c:func: Usage
  docs: trace: fix a typo

4 years agoDocumentation/locking/locktypes: fix the typo
Huang Shijie [Thu, 13 Aug 2020 06:02:20 +0000 (14:02 +0800)]
Documentation/locking/locktypes: fix the typo

We have three categories locks, not two.

Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200813060220.18199-1-sjhuang@iluvatar.ai
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
4 years agoMerge tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 13 Aug 2020 19:38:32 +0000 (12:38 -0700)]
Merge tag 's390-5.9-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - Allow s390 debug feature to handle finally more than 256 CPU numbers,
   instead of truncating the most significant bits.

 - Improve THP splitting required by qemu processes by making use of
   walk_page_vma() instead of calling follow_page() for every single
   page within each vma.

 - Add missing ZCRYPT dependency to VFIO_AP to fix potential compile
   problems.

 - Remove not required select CLOCKSOURCE_VALIDATE_LAST_CYCLE again.

 - Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
   translates a node distance of 0 to "no NUMA support available".

 - Couple of other minor fixes and improvements.

* tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/numa: move code to arch/s390/kernel
  s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
  s390/debug: debug feature version 3
  s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP
  s390/numa: set node distance to LOCAL_DISTANCE
  s390/pkey: remove redundant variable initialization
  s390/test_unwind: fix possible memleak in test_unwind()
  s390/gmap: improve THP splitting
  s390/atomic: circumvent gcc 10 build regression

4 years agoMerge tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Thu, 13 Aug 2020 19:26:18 +0000 (12:26 -0700)]
Merge tag 'for-5.9-tag' of git://git./linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "One minor update, the rest are fixes that have arrived a bit late for
  the first batch. There are also some recent fixes for bugs that were
  discovered during the merge window and pop up during testing.

  User visible change:

   - show correct subvolume path in /proc/mounts for bind mounts

  Fixes:

   - fix compression messages when remounting with different level or
     compression algorithm

   - tree-log: fix some memory leaks on error handling paths

   - restore I_VERSION on remount

   - fix return values and error code mixups

   - fix umount crash with quotas enabled when removing sysfs files

   - fix trim range on a shrunk device"

* tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
  btrfs: fix return value mixup in btrfs_get_extent
  btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups()
  btrfs: check correct variable after allocation in btrfs_backref_iter_alloc
  btrfs: make sure SB_I_VERSION doesn't get unset by remount
  btrfs: fix memory leaks after failure to lookup checksums during inode logging
  btrfs: don't show full path of bind mounts in subvol=
  btrfs: fix messages after changing compression level by remount
  btrfs: only search for left_info if there is no right_info in try_merge_free_space
  btrfs: inode: fix NULL pointer dereference if inode doesn't need compression

4 years agoMerge tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 13 Aug 2020 19:22:19 +0000 (12:22 -0700)]
Merge tag 'xfs-5.9-merge-8' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Two small fixes that have come in during the past week:

   - Fix duplicated words in comments

   - Fix an ubsan complaint about null pointer arithmetic"

* tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
  xfs: delete duplicated words + other fixes

4 years agoMerge tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkin...
Linus Torvalds [Thu, 13 Aug 2020 19:18:07 +0000 (12:18 -0700)]
Merge tag 'exfat-for-5.9-rc1' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - don't clear MediaFailure and VolumeDirty bit in volume flags if these
   were already set before mounting

 - write multiple dirty buffers at once in sync mode

 - remove unneeded EXFAT_SB_DIRTY bit set

* tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: retain 'VolumeFlags' properly
  exfat: optimize exfat_zeroed_cluster()
  exfat: add error check when updating dir-entries
  exfat: write multiple sectors at once
  exfat: remove EXFAT_SB_DIRTY flag

4 years agomm: memcontrol: fix warning when allocating the root cgroup
Johannes Weiner [Thu, 13 Aug 2020 14:40:54 +0000 (10:40 -0400)]
mm: memcontrol: fix warning when allocating the root cgroup

Commit 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the
parent cgroup") adds memory tracking to the memcg kernel structures
themselves to make cgroups liable for the memory they are consuming
through the allocation of child groups (which can be significant).

This code is a bit awkward as it's spread out through several functions:
The outermost function does memalloc_use_memcg(parent) to set up
current->active_memcg, which designates which cgroup to charge, and the
inner functions pass GFP_ACCOUNT to request charging for specific
allocations.  To make sure this dependency is satisfied at all times -
to make sure we don't randomly charge whoever is calling the functions -
the inner functions warn on !current->active_memcg.

However, this triggers a false warning when the root memcg itself is
allocated.  No parent exists in this case, and so current->active_memcg
is rightfully NULL.  It's a false positive, not indicative of a bug.

Delete the warnings for now, we can revisit this later.

Fixes: 3e38e0aaca9e ("mm: memcg: charge memcg percpu memory to the parent cgroup")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrm/xen-front: Pass dumb buffer data offset to the backend
Oleksandr Andrushchenko [Thu, 13 Aug 2020 12:00:38 +0000 (14:00 +0200)]
drm/xen-front: Pass dumb buffer data offset to the backend

While importing a dmabuf it is possible that the data of the buffer
is put with offset which is indicated by the SGT offset.
Respect the offset value and forward it to the backend.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agoxen: Sync up with the canonical protocol definition in Xen
Oleksandr Andrushchenko [Thu, 13 Aug 2020 06:21:12 +0000 (09:21 +0300)]
xen: Sync up with the canonical protocol definition in Xen

This is the sync up with the canonical definition of the
display protocol in Xen.

1. Add protocol version as an integer

Version string, which is in fact an integer, is hard to handle in the
code that supports different protocol versions. To simplify that
also add the version as an integer.

2. Pass buffer offset with XENDISPL_OP_DBUF_CREATE

There are cases when display data buffer is created with non-zero
offset to the data start. Handle such cases and provide that offset
while creating a display buffer.

3. Add XENDISPL_OP_GET_EDID command

Add an optional request for reading Extended Display Identification
Data (EDID) structure which allows better configuration of the
display connectors over the configuration set in XenStore.
With this change connectors may have multiple resolutions defined
with respect to detailed timing definitions and additional properties
normally provided by displays.

If this request is not supported by the backend then visible area
is defined by the relevant XenStore's "resolution" property.

If backend provides extended display identification data (EDID) with
XENDISPL_OP_GET_EDID request then EDID values must take precedence
over the resolutions defined in XenStore.

4. Bump protocol version to 2.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200813062113.11030-5-andr2000@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agodrm/xen-front: Add YUYV to supported formats
Oleksandr Andrushchenko [Thu, 13 Aug 2020 06:21:11 +0000 (09:21 +0300)]
drm/xen-front: Add YUYV to supported formats

Add YUYV to supported formats, so the frontend can work with the
formats used by cameras and other HW.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Link: https://lore.kernel.org/r/20200813062113.11030-4-andr2000@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agodrm/xen-front: Fix misused IS_ERR_OR_NULL checks
Oleksandr Andrushchenko [Thu, 13 Aug 2020 06:21:10 +0000 (09:21 +0300)]
drm/xen-front: Fix misused IS_ERR_OR_NULL checks

The patch c575b7eeb89f: "drm/xen-front: Add support for Xen PV
display frontend" from Apr 3, 2018, leads to the following static
checker warning:

drivers/gpu/drm/xen/xen_drm_front_gem.c:140 xen_drm_front_gem_create()
warn: passing zero to 'ERR_CAST'

drivers/gpu/drm/xen/xen_drm_front_gem.c
   133  struct drm_gem_object *xen_drm_front_gem_create(struct drm_device *dev,
   134                                                  size_t size)
   135  {
   136          struct xen_gem_object *xen_obj;
   137
   138          xen_obj = gem_create(dev, size);
   139          if (IS_ERR_OR_NULL(xen_obj))
   140                  return ERR_CAST(xen_obj);

Fix this and the rest of misused places with IS_ERR_OR_NULL in the
driver.

Fixes: c575b7eeb89f: "drm/xen-front: Add support for Xen PV display frontend"
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200813062113.11030-3-andr2000@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agoxen/gntdev: Fix dmabuf import with non-zero sgt offset
Oleksandr Andrushchenko [Thu, 13 Aug 2020 06:21:09 +0000 (09:21 +0300)]
xen/gntdev: Fix dmabuf import with non-zero sgt offset

It is possible that the scatter-gather table during dmabuf import has
non-zero offset of the data, but user-space doesn't expect that.
Fix this by failing the import, so user-space doesn't access wrong data.

Fixes: bf8dc55b1358 ("xen/gntdev: Implement dma-buf import functionality")
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200813062113.11030-2-andr2000@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agoALSA: echoaudio: Fix potential Oops in snd_echo_resume()
Dinghao Liu [Thu, 13 Aug 2020 07:46:30 +0000 (15:46 +0800)]
ALSA: echoaudio: Fix potential Oops in snd_echo_resume()

Freeing chip on error may lead to an Oops at the next time
the system goes to resume. Fix this by removing all
snd_echo_free() calls on error.

Fixes: 47b5d028fdce8 ("ALSA: Echoaudio - Add suspend support #2")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20200813074632.17022-1-dinghao.liu@zju.edu.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
4 years agogenirq: Unlock irq descriptor after errors
Guenter Roeck [Tue, 11 Aug 2020 18:00:12 +0000 (11:00 -0700)]
genirq: Unlock irq descriptor after errors

In irq_set_irqchip_state(), the irq descriptor is not unlocked after an
error is encountered. While that should never happen in practice, a buggy
driver may trigger it. This would result in a lockup, so fix it.

Fixes: 1d0326f352bb ("genirq: Check irq_data_get_irq_chip() return value before use")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200811180012.80269-1-linux@roeck-us.net
4 years agocrypto: algif_aead - fix uninitialized ctx->init
Ondrej Mosnacek [Wed, 12 Aug 2020 12:58:25 +0000 (14:58 +0200)]
crypto: algif_aead - fix uninitialized ctx->init

In skcipher_accept_parent_nokey() the whole af_alg_ctx structure is
cleared by memset() after allocation, so add such memset() also to
aead_accept_parent_nokey() so that the new "init" field is also
initialized to zero. Without that the initial ctx->init checks might
randomly return true and cause errors.

While there, also remove the redundant zero assignments in both
functions.

Found via libkcapi testsuite.

Cc: Stephan Mueller <smueller@chronox.de>
Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when ctx->more is zero")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>