review.tizen.org Git - platform/kernel/linux-rpi.git/log

projects / platform / kernel / linux-rpi.git / log

Chuck Lever [Mon, 19 Aug 2019 22:48:43 +0000 (18:48 -0400)]

xprtrdma: Use an llist to manage free rpcrdma_reps

rpcrdma_rep objects are removed from their free list by only a
single thread: the Receive completion handler. Thus that free list
can be converted to an llist, where a single-threaded consumer and
a multi-threaded producer (rpcrdma_buffer_put) can both access the
llist without the need for any serialization.

This eliminates spin lock contention between the Receive completion
handler and rpcrdma_buffer_get, and makes the rep consumer wait-
free.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:47:57 +0000 (18:47 -0400)]

xprtrdma: Remove rpcrdma_buffer::rb_mrlock

Clean up: Now that the free list is used sparingly, get rid of the
separate spin lock protecting it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:47:10 +0000 (18:47 -0400)]

xprtrdma: Cache free MRs in each rpcrdma_req

Instead of a globally-contended MR free list, cache MRs in each
rpcrdma_req as they are released. This means acquiring and releasing
an MR will be lock-free in the common case, even outside the
transport send lock.

The original idea of per-rpcrdma_req MR free lists was suggested by
Shirley Ma <shirley.ma@oracle.com> several years ago. I just now
figured out how to make that idea work with on-demand MR allocation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:46:24 +0000 (18:46 -0400)]

xprtrdma: Ensure creating an MR does not trigger FS writeback

Probably would be good to also pass GFP flags to ib_alloc_mr.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:45:37 +0000 (18:45 -0400)]

xprtrdma: Move rpcrdma_mr_get out of frwr_map

Refactor: Retrieve an MR and handle error recovery entirely in
rpc_rdma.c, as this is not a device-specific function.

Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to
xprt_transmit() to drain the entire transmit queue"), the
xprt_transmit function handles the cond_resched. The transport no
longer has to do this itself.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:44:50 +0000 (18:44 -0400)]

xprtrdma: Combine rpcrdma_mr_put and rpcrdma_mr_unmap_and_put

Clean up. There is only one remaining rpcrdma_mr_put call site, and
it can be directly replaced with unmap_and_put because mr->mr_dir is
set to DMA_NONE just before the call.

Now all the call sites do a DMA unmap, and we can just rename
mr_unmap_and_put to mr_put, which nicely matches mr_get.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:44:04 +0000 (18:44 -0400)]

xprtrdma: Simplify rpcrdma_mr_pop

Clean up: rpcrdma_mr_pop call sites check if the list is empty
first. Let's replace the list_empty with less costly logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:43:17 +0000 (18:43 -0400)]

xprtrdma: Toggle XPRT_CONGESTED in xprtrdma's slot methods

Commit 48be539dd44a ("xprtrdma: Introduce ->alloc_slot call-out for
xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA
transport. Later, commit 75891f502f5f ("SUNRPC: Support for
congestion control when queuing is enabled") modified the generic
alloc/free_slot methods, but neglected the methods in xprtrdma.

Found via code review.

Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:42:31 +0000 (18:42 -0400)]

xprtrdma: Rename rpcrdma_buffer::rb_all

Clean up: There are other "all" list heads. For code clarity
distinguish this one as for use only for MRs by renaming it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:41:44 +0000 (18:41 -0400)]

xprtrdma: Rename CQE field in Receive trace points

Make the field name the same for all trace points that handle
pointers to struct rpcrdma_rep. That makes it easy to grep for
matching rep points in trace output.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:40:58 +0000 (18:40 -0400)]

xprtrdma: Boost client's max slot table size to match Linux server

I've heard rumors of an NFS/RDMA server implementation that has a
default credit limit of 1024. The client's default setting remains
at 128.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:40:11 +0000 (18:40 -0400)]

xprtrdma: Boost maximum transport header size

Although I haven't seen any performance results that justify it,
I've received several complaints that NFS/RDMA no longer supports
a maximum rsize and wsize of 1MB. These days it is somewhat smaller.

To simplify the logic that determines whether a chunk list is
necessary, the implementation uses a fixed maximum size of the
transport header. Currently that maximum size is 256 bytes, one
quarter of the default inline threshold size for RPC/RDMA v1.

Since commit a78868497c2e ("xprtrdma: Reduce max_frwr_depth"), the
size of chunks is also smaller to take advantage of inline page
lists in device internal MR data structures.

The combination of these two design choices has reduced the maximum
NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing
the maximum transport header size and the maximum number of RDMA
segments it can contain increases the negotiated maximum rsize/wsize
on common RNIC/HCAs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:39:25 +0000 (18:39 -0400)]

xprtrdma: Fix calculation of ri_max_segs again

Commit 302d3deb206 ("xprtrdma: Prevent inline overflow") added this
calculation back in 2016, but got it wrong. I tested only the lower
bound, which is why there is a max_t there. The upper bound should be
rounded up too.

Now, when using DIV_ROUND_UP, that takes care of the lower bound as
well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:38:38 +0000 (18:38 -0400)]

xprtrdma: Update obsolete comment

Comment was made obsolete by commit 8cec3dba76a4 ("xprtrdma:
rpcrdma_regbuf alignment").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:37:52 +0000 (18:37 -0400)]

xprtrdma: Refresh the documenting comment in frwr_ops.c

Things have changed since this comment was written. In particular,
the reworking of connection closing, on-demand creation of MRs, and
the removal of fr_state all mean that deferring MR recovery to
frwr_map is no longer needed. The description is obsolete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:37:05 +0000 (18:37 -0400)]

SUNRPC: Inline xdr_commit_encode

Micro-optimization: For xdr_commit_encode call sites in
net/sunrpc/xdr.c, eliminate the extra calling sequence. On my
client, this change saves about a microsecond for every 30 calls
to xdr_reserve_space().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Chuck Lever [Mon, 19 Aug 2019 22:36:19 +0000 (18:36 -0400)]

SUNRPC: Remove rpc_wake_up_queued_task_on_wq()

Clean up: commit c544577daddb ("SUNRPC: Clean up transport write
space handling") appears to have removed the last caller of
rpc_wake_up_queued_task_on_wq().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Jia-Ju Bai [Fri, 26 Jul 2019 07:48:53 +0000 (15:48 +0800)]

fs: nfs: Fix possible null-pointer dereferences in encode_attrs()

In encode_attrs(), there is an if statement on line 1145 to check
whether label is NULL:
    if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))

When label is NULL, it is used on lines 1178-1181:
    *p++ = cpu_to_be32(label->lfs);
    *p++ = cpu_to_be32(label->pi);
    *p++ = cpu_to_be32(label->len);
    p = xdr_encode_opaque_fixed(p, label->label, label->len);

To fix these bugs, label is checked before being used.

These bugs are found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 21:31:08 +0000 (14:31 -0700)]

Linux 5.3-rc5

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 19:56:42 +0000 (12:56 -0700)]

Merge tag 'fixes-for-5.3-rc5' of git://git./linux/kernel/git/mtd/linux

Pull MTD fix from Richard Weinberger:
"A single fix for MTD to correctly set the spi-nor WP pin"

* tag 'fixes-for-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spi-nor: Fix the disabling of write protection at init

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:51:48 +0000 (09:51 -0700)]

Merge tag 'for-5.3-rc4-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Two fixes that popped up during testing:

   - fix for sysfs-related code that adds/removes block groups, warnings
     appear during several fstests in connection with sysfs updates in
     5.3, the fix essentially replaces a workaround with scope NOFS and
     applies to 5.2-based branch too

   - add sanity check of trim range"

* tag 'for-5.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: trim: Check the range passed into to prevent overflow
  Btrfs: fix sysfs warning and missing raid sysfs directories

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:45:42 +0000 (09:45 -0700)]

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:

   - Fix the inconsistent error handling in the umwait init code

   - Rework the boot param zeroing so gcc9 stops complaining about out
     of bound memset. The resulting source code is actually more sane to
     read than the smart solution we had

   - Maintainers update so Tony gets involved when Intel models are
     added

   - Some more fallthrough fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Save fields explicitly, zero out everything else
  MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h
  x86/fpu/math-emu: Address fallthrough warnings
  x86/apic/32: Fix yet another implicit fallthrough warning
  x86/umwait: Fix error handling in umwait_init()

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:36:51 +0000 (09:36 -0700)]

Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fix from Thomas Gleixner:
"A single fix for a EFI mixed mode regression caused by recent rework
which did not take the firmware bitwidth into account"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi-stub: Fix get_efi_config_table on mixed-mode setups

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:26:16 +0000 (09:26 -0700)]

Merge tag 'spdx-5.3-rc5' of git://git./linux/kernel/git/gregkh/spdx

Pull SPDX fixes from Greg KH:
"Here are four small SPDX fixes for 5.3-rc5.

  A few style fixes for some SPDX comments, added an SPDX tag for one
  file, and fix up some GPL boilerplate for another file.

  All of these have been in linux-next for a few weeks with no reported
  issues (they are comment changes only, so that's to be expected...)"

* tag 'spdx-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  i2c: stm32: Use the correct style for SPDX License Identifier
  intel_th: Use the correct style for SPDX License Identifier
  coccinelle: api/atomic_as_refcounter: add SPDX License Identifier
  kernel/configs: Replace GPL boilerplate code with SPDX identifier

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:17:41 +0000 (09:17 -0700)]

Merge tag 'char-misc-5.3-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.3-rc5.

  These are two different subsystems needing some fixes, the habanalabs
  driver which is has some more big endian fixes for problems found. The
  other are some small soundwire fixes, including some Kconfig
  dependencies needed to resolve reported build errors.

  All of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: xilinx-sdfec: fix dependency and build error
  habanalabs: fix device IRQ unmasking for BE host
  habanalabs: fix endianness handling for internal QMAN submission
  habanalabs: fix completion queue handling when host is BE
  habanalabs: fix endianness handling for packets from user
  habanalabs: fix DRAM usage accounting on context tear down
  habanalabs: Avoid double free in error flow
  soundwire: fix regmap dependencies and align with other serial links
  soundwire: cadence_master: fix definitions for INTSTAT0/1
  soundwire: cadence_master: fix register definition for SLAVE_STATE

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:14:56 +0000 (09:14 -0700)]

Merge tag 'staging-5.3-rc5' of git://git./linux/kernel/git/gregkh/staging

Pull staging/IIO fixes from Greg KH:
"Here are four small staging and iio driver fixes for 5.3-rc5

  Two are for the dt3000 comedi driver for some reported problems found
  in that codebase, and two are some small iio fixes.

  All of these have been in linux-next this week with no reported
  issues"

* tag 'staging-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: dt3000: Fix rounding up of timer divisor
  staging: comedi: dt3000: Fix signed integer overflow 'divider * base'
  iio: adc: max9611: Fix temperature reading in probe
  iio: frequency: adf4371: Fix output frequency setting

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 16:11:29 +0000 (09:11 -0700)]

Merge tag 'usb-5.3-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are number of small USB fixes for 5.3-rc5.

  Syzbot has been on a tear recently now that it has some good USB
  debugging hooks integrated, so there's a number of fixes in here found
  by those tools for some _very_ old bugs. Also a handful of gadget
  driver fixes for reported issues, some hopefully-final dma fixes for
  host controller drivers, and some new USB serial gadget driver ids.

  All of these have been in linux-next this week with no reported issues
  (the usb-serial ones were in linux-next in its own branch, but merged
  into mine on Friday)"

* tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: add a hcd_uses_dma helper
  usb: don't create dma pools for HCDs with a localmem_pool
  usb: chipidea: imx: fix EPROBE_DEFER support during driver probe
  usb: host: fotg2: restart hcd after port reset
  USB: CDC: fix sanity checks in CDC union parser
  usb: cdc-acm: make sure a refcount is taken early enough
  USB: serial: option: add the BroadMobi BM818 card
  USB: serial: option: Add Motorola modem UARTs
  USB: core: Fix races in character device registration and deregistraion
  usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt
  usb: gadget: composite: Clear "suspended" on reset/disconnect
  usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
  USB: serial: option: add D-Link DWM-222 device ID
  USB: serial: option: Add support for ZTE MF871A

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 02:39:54 +0000 (19:39 -0700)]

Merge tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A collection of fixes that should go into this series. This contains:

   - Revert of the REQ_NOWAIT_INLINE and associated dio changes. There
     were still corner cases there, and even though I had a solution for
     it, it's too involved for this stage. (me)

   - Set of NVMe fixes (via Sagi)

   - io_uring fix for fixed buffers (Anthony)

   - io_uring defer issue fix (Jackie)

   - Regression fix for queue sync at exit time (zhengbin)

   - xen blk-back memory leak fix (Wenwen)"

* tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block:
  io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list
  block: remove REQ_NOWAIT_INLINE
  io_uring: fix manual setup of iov_iter for fixed buffers
  xen/blkback: fix memory leaks
  blk-mq: move cancel of requeue_work to the front of blk_exit_queue
  nvme-pci: Fix async probe remove race
  nvme: fix controller removal race with scan work
  nvme-rdma: fix possible use-after-free in connect error flow
  nvme: fix a possible deadlock when passthru commands sent to a multipath device
  nvme-core: Fix extra device_put() call on error path
  nvmet-file: fix nvmet_file_flush() always returning an error
  nvmet-loop: Flush nvme_delete_wq when removing the port
  nvmet: Fix use-after-free bug when a port is removed
  nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns

commit | commitdiff | tree

Linus Torvalds [Sun, 18 Aug 2019 02:31:30 +0000 (19:31 -0700)]

Merge tag 'hyperv-fixes-signed' of git://git./linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Sasha Levin:

- A few fixes for the userspace hyper-v tools from Adrian Vladu.

- A fix for the hyper-v MAINTAINERs entry from Lan Tianyu.

- Fix for SPDX license identifier in the userspace tools from Nishad
   Kamdar.

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  MAINTAINERS: Fix Hyperv vIOMMU driver file name
  tools: hv: Use the correct style for SPDX License Identifier
  tools: hv: fix typos in toolchain
  tools: hv: fix KVP and VSS daemons exit code
  tools: hv: fixed Python pep8/flake8 warnings for lsvmbus

commit | commitdiff | tree

Lan Tianyu [Tue, 26 Mar 2019 06:28:21 +0000 (14:28 +0800)]

MAINTAINERS: Fix Hyperv vIOMMU driver file name

The Hyperv vIOMMU file name should be "hyperv-iommu.c" rather
than "hyperv_iommu.c". This patch is to fix it.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit | commitdiff | tree

Nishad Kamdar [Mon, 22 Jul 2019 13:31:17 +0000 (19:01 +0530)]

tools: hv: Use the correct style for SPDX License Identifier

This patch corrects the SPDX License Identifier style
in the trace header file related to Microsoft Hyper-V
client drivers.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit | commitdiff | tree

Adrian Vladu [Mon, 6 May 2019 16:51:24 +0000 (16:51 +0000)]

tools: hv: fix typos in toolchain

Fix typos in the HyperV toolchain.

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit | commitdiff | tree

Adrian Vladu [Mon, 6 May 2019 16:50:58 +0000 (16:50 +0000)]

tools: hv: fix KVP and VSS daemons exit code

HyperV KVP and VSS daemons should exit with 0 when the '--help'
or '-h' flags are used.

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit | commitdiff | tree

Adrian Vladu [Mon, 6 May 2019 17:27:37 +0000 (17:27 +0000)]

tools: hv: fixed Python pep8/flake8 warnings for lsvmbus

Fixed pep8/flake8 python style code for lsvmbus tool.

The TAB indentation was on purpose ignored (pep8 rule W191) to make
sure the code is complying with the Linux code guideline.
The following command doe not show any warnings now:
pep8 --ignore=W191 lsvmbus
flake8 --ignore=W191 lsvmbus

Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 17 Aug 2019 17:44:50 +0000 (10:44 -0700)]

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"I2C has one revert because of a regression, two fixes for tiny race
  windows (which we were not able to trigger), a MAINTAINERS addition,
  and a SPDX fix"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: stm32: Use the correct style for SPDX License Identifier
  i2c: emev2: avoid race when unregistering slave client
  i2c: rcar: avoid race when unregistering slave client
  MAINTAINERS: i2c-imx: take over maintainership
  Revert "i2c: imx: improve the error handling in i2c_imx_dma_request()"

commit | commitdiff | tree

Linus Torvalds [Sat, 17 Aug 2019 17:36:47 +0000 (10:36 -0700)]

Merge tag 'riscv/for-v5.3-rc5' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- Two patches to fix significant bugs in floating point register
   context handling

- A minor fix in RISC-V flush_tlb_page(), to supply a valid end address
   to flush_tlb_range()

- Two minor defconfig additions: to build the virtio hwrng driver by
   default (for QEMU targets), and to partially synchronize the 32-bit
   defconfig with the 64-bit defconfig

* tag 'riscv/for-v5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Make __fstate_clean() work correctly.
  riscv: Correct the initialized flow of FP register
  riscv: defconfig: Update the defconfig
  riscv: rv32_defconfig: Update the defconfig
  riscv: fix flush_tlb_range() end address for flush_tlb_page()

commit | commitdiff | tree

Greg Kroah-Hartman [Sat, 17 Aug 2019 15:09:33 +0000 (17:09 +0200)]

Merge tag 'usb-serial-5.3-rc5' of https://git./linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for 5.3-rc5

Here are some new modem device ids.

All have been in linux-next with no reported issues.

Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.3-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add the BroadMobi BM818 card
  USB: serial: option: Add Motorola modem UARTs
  USB: serial: option: add D-Link DWM-222 device ID
  USB: serial: option: Add support for ZTE MF871A

commit | commitdiff | tree

Linus Torvalds [Sat, 17 Aug 2019 00:27:55 +0000 (17:27 -0700)]

Merge tag 'xtensa-20190816' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fix from Max Filippov:
"Add missing isync into cpu_reset to make sure ITLB changes are
effective"

* tag 'xtensa-20190816' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: add missing isync to the cpu_reset TLB code

commit | commitdiff | tree

Linus Torvalds [Fri, 16 Aug 2019 17:51:47 +0000 (10:51 -0700)]

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Don't taint the kernel if CPUs have different sets of page sizes
   supported (other than the one in use).

- Issue I-cache maintenance for module ftrace trampoline.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
  arm64: cpufeature: Don't treat granule sizes as strict

commit | commitdiff | tree

Will Deacon [Fri, 16 Aug 2019 13:57:43 +0000 (14:57 +0100)]

arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side

The initial support for dynamic ftrace trampolines in modules made use
of an indirect branch which loaded its target from the beginning of
a special section (e71a4e1bebaf7 ("arm64: ftrace: add support for far
branches to dynamic ftrace")). Since no instructions were being patched,
no cache maintenance was needed. However, later in be0f272bfc83 ("arm64:
ftrace: emit ftrace-mod.o contents through code") this code was reworked
to output the trampoline instructions directly into the PLT entry but,
unfortunately, the necessary cache maintenance was overlooked.

Add a call to __flush_icache_range() after writing the new trampoline
instructions but before patching in the branch to the trampoline.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 16 Aug 2019 16:13:16 +0000 (09:13 -0700)]

Merge tag 'pm-5.3-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These add a check to avoid recent suspend-to-idle power regression on
  systems with NVMe drives where the PCIe ASPM policy is "performance"
  (or when the kernel is built without ASPM support), fix an issue
  related to frequency limits in the schedutil cpufreq governor and fix
  a mistake related to the PM QoS usage in the cpufreq core introduced
  recently.

  Specifics:

   - Disable NVMe power optimization related to suspend-to-idle added
     recently on systems where PCIe ASPM is not able to put PCIe links
     into low-power states to prevent excess power from being drawn by
     the system while suspended (Rafael Wysocki).

   - Make the schedutil governor handle frequency limits changes
     properly in all cases (Viresh Kumar).

   - Prevent the cpufreq core from treating positive values returned by
     dev_pm_qos_update_request() as errors (Viresh Kumar)"

* tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled
  PCI/ASPM: Add pcie_aspm_enabled()
  cpufreq: schedutil: Don't skip freq update when limits change
  cpufreq: dev_pm_qos_update_request() can return 1 on success

commit | commitdiff | tree

Linus Torvalds [Fri, 16 Aug 2019 15:59:33 +0000 (08:59 -0700)]

Merge tag 'dmaengine-fix-5.3-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"Fixes in dmaengine drivers for:

   - dw-edma: endianess, _iomem type and stack usages

   - ste_dma40: unneeded variable and null-pointer dereference

   - tegra210-adma: unused function

   - omap-dma: off-by-one fix"

* tag 'dmaengine-fix-5.3-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  omap-dma/omap_vout_vrfb: fix off-by-one fi value
  dmaengine: stm32-mdma: Fix a possible null-pointer dereference in stm32_mdma_irq_handler()
  dmaengine: tegra210-adma: Fix unused function warnings
  dmaengine: ste_dma40: fix unneeded variable warning
  dmaengine: dw-edma: fix endianess confusion
  dmaengine: dw-edma: fix __iomem type confusion
  dmaengine: dw-edma: fix unnecessary stack usage

commit | commitdiff | tree

Linus Torvalds [Fri, 16 Aug 2019 15:49:45 +0000 (08:49 -0700)]

Merge tag 'sound-5.3-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"All small fixes targeted for stable:

   - Two fixes for USB-audio with malformed descriptor, spotted by
     fuzzers

   - Two fixes Conexant HD-audio codec wrt power management

   - Quirks for HD-audio AMD platform and HP laptop

   - HD-audio memory leak fix"

* tag 'sound-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term
  ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit
  ALSA: hda - Add a generic reboot_notify
  ALSA: hda - Let all conexant codec enter D3 when rebooting
  ALSA: hda/realtek - Add quirk for HP Envy x360
  ALSA: hda - Fix a memory leak bug
  ALSA: hda - Apply workaround for another AMD chip 1022:1487

commit | commitdiff | tree

Linus Torvalds [Fri, 16 Aug 2019 15:41:15 +0000 (08:41 -0700)]

Merge tag 'drm-fixes-2019-08-16' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Nothing too crazy this week, one amdgpu fix to use vmalloc for a
  struct that grew in size, and another MST fix for nouveau, and some
  other misc fixes:

  i915:
   - single GVT use after free fix

  scheduler:
   - entity destruction race fix

  amdgpu:
   - struct allocation fix
   - gfx9 soft recovery fix

  nouveau:
   - followup MST fix

  ast:
   - vga register race fix"

* tag 'drm-fixes-2019-08-16' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes
  drm/ast: Fixed reboot test may cause system hanged
  drm/scheduler: use job count instead of peek
  drm/amd/display: use kvmalloc for dc_state (v2)
  drm/amdgpu: fix gfx9 soft recovery
  drm/i915: Use after free in error path in intel_vgpu_create_workload()

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 16 Aug 2019 12:24:51 +0000 (14:24 +0200)]

Merge branch 'pm-cpufreq'

* pm-cpufreq:
cpufreq: schedutil: Don't skip freq update when limits change
cpufreq: dev_pm_qos_update_request() can return 1 on success

commit | commitdiff | tree

John Hubbard [Wed, 31 Jul 2019 05:46:27 +0000 (22:46 -0700)]

x86/boot: Save fields explicitly, zero out everything else

Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
memset, if the memset goes accross several fields of a struct. This
generated a couple of warnings on x86_64 builds in sanitize_boot_params().

Fix this by explicitly saving the fields in struct boot_params
that are intended to be preserved, and zeroing all the rest.

[ tglx: Tagged for stable as it breaks the warning free build there as well ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com

commit | commitdiff | tree

Greg Kroah-Hartman [Fri, 16 Aug 2019 10:35:56 +0000 (12:35 +0200)]

Merge tag 'soundwire-5.3-rc5' of git://git./linux/kernel/git/vkoul/soundwire into char-misc-linus

Vinod writes:

soundwire fixes for v5.3-rc5

Pierre sent fixes which are queued now for v5.3-rc5 are:
- regmap dependecy
- cadence register definitions

* tag 'soundwire-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: fix regmap dependencies and align with other serial links
  soundwire: cadence_master: fix definitions for INTSTAT0/1
  soundwire: cadence_master: fix register definition for SLAVE_STATE

commit | commitdiff | tree

Dave Airlie [Fri, 16 Aug 2019 02:41:52 +0000 (12:41 +1000)]

Merge tag 'drm-intel-fixes-2019-08-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.4-rc5:
- GVT use-after-free fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zhkag9ic.fsf@intel.com

commit | commitdiff | tree

Hui Peng [Thu, 15 Aug 2019 04:31:34 +0000 (00:31 -0400)]

ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term

`check_input_term` recursively calls itself with input from
device side (e.g., uac_input_terminal_descriptor.bCSourceID)
as argument (id). In `check_input_term`, if `check_input_term`
is called with the same `id` argument as the caller, it triggers
endless recursive call, resulting kernel space stack overflow.

This patch fixes the bug by adding a bitmap to `struct mixer_build`
to keep track of the checked ids and stop the execution if some id
has been checked (similar to how parse_audio_unit handles unitid
argument).

Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Linus Torvalds [Thu, 15 Aug 2019 19:29:36 +0000 (12:29 -0700)]

Merge tag 'xfs-5.3-fixes-2' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

- Fix crashes when the attr fork isn't present due to errors but inode
   inactivation tries to zap the attr data anyway.

- Convert more directory corruption debugging asserts to actual
   EFSCORRUPTED returns instead of blowing up later on.

- Don't fail writeback just because we ran out of memory allocating
   metadata log data.

* tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: don't crash on null attr fork xfs_bmapi_read
  xfs: remove more ondisk directory corruption asserts
  fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().

commit | commitdiff | tree

Linus Torvalds [Thu, 15 Aug 2019 19:15:45 +0000 (12:15 -0700)]

Merge tag 'iomap-5.3-fixes-1' of git://git./fs/xfs/xfs-linux

Pull iomap fixlet from Darrick Wong:
"A single update to the MAINTAINERS entry for iomap now that we've
removed fs/iomap.c"

* tag 'iomap-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
MAINTAINERS: iomap: Remove fs/iomap.c record

commit | commitdiff | tree

Jackie Liu [Wed, 14 Aug 2019 09:35:22 +0000 (17:35 +0800)]

io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list

This patch may fix two issues:

First, when IOSQE_IO_DRAIN set, the next IOs need to be inserted into
defer list to delay execution, but link io will be actively scheduled to
run by calling io_queue_sqe.

Second, when multiple LINK_IOs are inserted together with defer_list,
the LINK_IO is no longer keep order.

   |-------------|
   |   LINK_IO   |      ----> insert to defer_list  -----------
   |-------------|                                            |
   |   LINK_IO   |      ----> insert to defer_list  ----------|
   |-------------|                                            |
   |   LINK_IO   |      ----> insert to defer_list  ----------|
   |-------------|                                            |
   |   NORMAL_IO |      ----> insert to defer_list  ----------|
   |-------------|                                            |
                                                              |
                              queue_work at same time   <-----|

Fixes: 9e645e1105c ("io_uring: add support for sqe links")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jens Axboe [Thu, 15 Aug 2019 17:09:16 +0000 (11:09 -0600)]

block: remove REQ_NOWAIT_INLINE

We had a few issues with this code, and there's still a problem around
how we deal with error handling for chained/split bios. For now, just
revert the code and we'll try again with a thoroug solution. This
reverts commits:

e15c2ffa1091 ("block: fix O_DIRECT error handling for bio fragments")
0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments")
6a43074e2f46 ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
893a1c97205a ("blk-mq: allow REQ_NOWAIT to return an error inline")

Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Aleix Roca Nonell [Thu, 15 Aug 2019 12:03:22 +0000 (14:03 +0200)]

io_uring: fix manual setup of iov_iter for fixed buffers

Commit bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed
buffers") introduced an optimization to avoid using the slow
iov_iter_advance by manually populating the iov_iter iterator in some
cases.

However, the computation of the iterator count field was erroneous: The
first bvec was always accounted for an extent of page size even if the
bvec length was smaller.

In consequence, some I/O operations on fixed buffers were unable to
operate on the full extent of the buffer, consistently skipping some
bytes at the end of it.

Fixes: bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed buffers")
Cc: stable@vger.kernel.org
Signed-off-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Linus Torvalds [Thu, 15 Aug 2019 16:20:17 +0000 (09:20 -0700)]

Merge tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux

Pull auxdisplay fixes from Miguel Ojeda:
"A few minor auxdisplay improvements:

   - A couple of small header cleanups for charlcd (Masahiro Yamada)

   - A trivial typo fix for the examples of cfag12864b (Masahiro Yamada)

   - An Kconfig help text improvement for charlcd (Mans Rullgard)

   - An error path fix for panel (zhengbin)"

* tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux:
  auxdisplay: Fix a typo in cfag12864b-example.c
  auxdisplay: charlcd: add include guard to charlcd.h
  auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay
  auxdisplay: charlcd: add help text for backlight initial state
  auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach

commit | commitdiff | tree

Linus Torvalds [Thu, 15 Aug 2019 16:18:56 +0000 (09:18 -0700)]

Merge tag 'devicetree-fixes-for-5.3-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix building DT binding examples for in tree builds

- Correct some refcounting in adjust_local_phandle_references()

- Update FSL FEC binding with deprecated properties

- Schema fix in stm32 pinctrl

- Fix typo in of_irq_parse_one docbook comment

* tag 'devicetree-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: irq: fix a trivial typo in a doc comment
  dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema
  dt-bindings: fec: explicitly mark deprecated properties
  of: resolver: Add of_node_put() before return and break
  dt-bindings: Fix generated example files getting added to schemas

commit | commitdiff | tree

Randy Dunlap [Tue, 13 Aug 2019 23:01:20 +0000 (16:01 -0700)]

misc: xilinx-sdfec: fix dependency and build error

lib/devres.c, which implements devm_ioremap_resource(), is only built
when CONFIG_HAS_IOMEM is set/enabled, so XILINX_SDFEC should depend
on HAS_IOMEM. Fixes this build error (as seen on UML builds):

ERROR: "devm_ioremap_resource" [drivers/misc/xilinx_sdfec.ko] undefined!

Fixes: 76d83e1c3233 ("misc: xilinx-sdfec: add core driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Derek Kiernan <derek.kiernan@xilinx.com>
Cc: Dragan Cvetic <dragan.cvetic@xilinx.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/f9004be5-9925-327b-3ec2-6506e46fe565@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Christoph Hellwig [Sun, 11 Aug 2019 08:05:16 +0000 (10:05 +0200)]

usb: add a hcd_uses_dma helper

The USB buffer allocation code is the only place in the usb core (and in
fact the whole kernel) that uses is_device_dma_capable, while the URB
mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer
allocation to use the uses_dma flag used by the rest of the USB code,
and create a helper in hcd.h that checks this flag as well as the
CONFIG_HAS_DMA to simplify the caller a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Christoph Hellwig [Sun, 11 Aug 2019 08:05:15 +0000 (10:05 +0200)]

usb: don't create dma pools for HCDs with a localmem_pool

If the HCD provides a localmem pool we will never use the DMA pools, so
don't create them.

Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20190811080520.21712-2-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

André Draszik [Sat, 10 Aug 2019 15:07:58 +0000 (16:07 +0100)]

usb: chipidea: imx: fix EPROBE_DEFER support during driver probe

If driver probe needs to be deferred, e.g. because ci_hdrc_add_device()
isn't ready yet, this driver currently misbehaves badly:
    a) success is still reported to the driver core (meaning a 2nd
       probe attempt will never be done), leaving the driver in
       a dysfunctional state and the hardware unusable

    b) driver remove / shutdown OOPSes:
    [  206.786916] Unable to handle kernel paging request at virtual address fffffdff
    [  206.794148] pgd = 880b9f82
    [  206.796890] [fffffdff] *pgd=abf5e861, *pte=00000000, *ppte=00000000
    [  206.803179] Internal error: Oops: 37 [#1] PREEMPT SMP ARM
    [  206.808581] Modules linked in: wl18xx evbug
    [  206.813308] CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 4.19.35+gf345c93b4195 #1
    [  206.821053] Hardware name: Freescale i.MX7 Dual (Device Tree)
    [  206.826813] PC is at ci_hdrc_remove_device+0x4/0x20
    [  206.831699] LR is at ci_hdrc_imx_remove+0x20/0xe8
    [  206.836407] pc : [<805cd4b0>]    lr : [<805d62cc>]    psr: 20000013
    [  206.842678] sp : a806be40  ip : 00000001  fp : 80adbd3c
    [  206.847906] r10: 80b1b794  r9 : 80d5dfe0  r8 : a8192c44
    [  206.853136] r7 : 80db93a0  r6 : a8192c10  r5 : a8192c00  r4 : a93a4a00
    [  206.859668] r3 : 00000000  r2 : a8192ce4  r1 : ffffffff  r0 : fffffdfb
    [  206.866201] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    [  206.873341] Control: 10c5387d  Table: a9e0c06a  DAC: 00000051
    [  206.879092] Process systemd-shutdow (pid: 1, stack limit = 0xb271353c)
    [  206.885624] Stack: (0xa806be40 to 0xa806c000)
    [  206.889992] be40: a93a4a00 805d62cc a8192c1c a8170e10 a8192c10 8049a490 80d04d08 00000000
    [  206.898179] be60: 00000000 80d0da2c fee1dead 00000000 a806a000 00000058 00000000 80148b08
    [  206.906366] be80: 01234567 80148d8c a9858600 00000000 00000000 00000000 00000000 80d04d08
    [  206.914553] bea0: 00000000 00000000 a82741e0 a9858600 00000024 00000002 a9858608 00000005
    [  206.922740] bec0: 0000001e 8022c058 00000000 00000000 a806bf14 a9858600 00000000 a806befc
    [  206.930927] bee0: a806bf78 00000000 7ee12c30 8022c18c a806bef8 a806befc 00000000 00000001
    [  206.939115] bf00: 00000000 00000024 a806bf14 00000005 7ee13b34 7ee12c68 00000004 7ee13f20
    [  206.947302] bf20: 00000010 7ee12c7c 00000005 7ee12d04 0000000a 76e7dc00 00000001 80d0f140
    [  206.955490] bf40: ab637880 a974de40 60000013 80d0f140 ab6378a0 80d04d08 a8080470 a9858600
    [  206.963677] bf60: a9858600 00000000 00000000 8022c24c 00000000 80144310 00000000 00000000
    [  206.971864] bf80: 80101204 80d04d08 00000000 80d04d08 00000000 00000000 00000003 00000058
    [  206.980051] bfa0: 80101204 80101000 00000000 00000000 fee1dead 28121969 01234567 00000000
    [  206.988237] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000
    [  206.996425] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6 60000030 fee1dead 00000000 00000000
    [  207.004622] [<805cd4b0>] (ci_hdrc_remove_device) from [<805d62cc>] (ci_hdrc_imx_remove+0x20/0xe8)
    [  207.013509] [<805d62cc>] (ci_hdrc_imx_remove) from [<8049a490>] (device_shutdown+0x16c/0x218)
    [  207.022050] [<8049a490>] (device_shutdown) from [<80148b08>] (kernel_restart+0xc/0x50)
    [  207.029980] [<80148b08>] (kernel_restart) from [<80148d8c>] (sys_reboot+0xf4/0x1f0)
    [  207.037648] [<80148d8c>] (sys_reboot) from [<80101000>] (ret_fast_syscall+0x0/0x54)
    [  207.045308] Exception stack(0xa806bfa8 to 0xa806bff0)
    [  207.050368] bfa0:                   00000000 00000000 fee1dead 28121969 01234567 00000000
    [  207.058554] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000
    [  207.066737] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6
    [  207.071799] Code: ebffffa8 e3a00000 e8bd8010 e92d4010 (e5904004)
    [  207.078021] ---[ end trace be47424e3fd46e9f ]---
    [  207.082647] Kernel panic - not syncing: Fatal exception
    [  207.087894] ---[ end Kernel panic - not syncing: Fatal exception ]---

    c) the error path in combination with driver removal causes
       imbalanced calls to the clk_*() and pm_()* APIs

a) happens because the original intended return value is
   overwritten (with 0) by the return code of
   regulator_disable() in ci_hdrc_imx_probe()'s error path
b) happens because ci_pdev is -EPROBE_DEFER, which causes
   ci_hdrc_remove_device() to OOPS

Fix a) by being more careful in ci_hdrc_imx_probe()'s error
path and not overwriting the real error code

Fix b) by calling the respective cleanup functions during
remove only when needed (when ci_pdev != NULL, i.e. when
everything was initialised correctly). This also has the
side effect of not causing imbalanced clk_*() and pm_*()
API calls as part of the error code path.

Fixes: 7c8e8909417e ("usb: chipidea: imx: add HSIC support")
Signed-off-by: André Draszik <git@andred.net>
Cc: stable <stable@vger.kernel.org>
CC: Peter Chen <Peter.Chen@nxp.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Shawn Guo <shawnguo@kernel.org>
CC: Sascha Hauer <s.hauer@pengutronix.de>
CC: Pengutronix Kernel Team <kernel@pengutronix.de>
CC: Fabio Estevam <festevam@gmail.com>
CC: NXP Linux Team <linux-imx@nxp.com>
CC: linux-usb@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20190810150758.17694-1-git@andred.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Hans Ulli Kroll [Sat, 10 Aug 2019 15:04:58 +0000 (17:04 +0200)]

usb: host: fotg2: restart hcd after port reset

On the Gemini SoC the FOTG2 stalls after port reset
so restart the HCD after each port reset.

Signed-off-by: Hans Ulli Kroll <ulli.kroll@googlemail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20190810150458.817-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Oliver Neukum [Tue, 13 Aug 2019 09:35:41 +0000 (11:35 +0200)]

USB: CDC: fix sanity checks in CDC union parser

A few checks checked for the size of the pointer to a structure
instead of the structure itself. Copy & paste issue presumably.

Fixes: e4c6fb7794982 ("usbnet: move the CDC parser into USB core")
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot+45a53506b65321c1fe91@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/20190813093541.18889-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Oliver Neukum [Thu, 8 Aug 2019 14:21:19 +0000 (16:21 +0200)]

usb: cdc-acm: make sure a refcount is taken early enough

destroy() will decrement the refcount on the interface, so that
it needs to be taken so early that it never undercounts.

Fixes: 7fb57a019f94e ("USB: cdc-acm: Fix potential deadlock (lockdep warning)")
Cc: stable <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+1b2449b7b5dc240d107a@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/20190808142119.7998-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Bob Ham [Wed, 24 Jul 2019 14:52:26 +0000 (07:52 -0700)]

USB: serial: option: add the BroadMobi BM818 card

Add a VID:PID for the BroadMobi BM818 M.2 card

T:  Bus=01 Lev=03 Prnt=40 Port=03 Cnt=01 Dev#= 44 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2020 ProdID=2060 Rev=00.00
S:  Manufacturer=Qualcomm, Incorporated
S:  Product=Qualcomm CDMA Technologies MSM
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#=0x0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I:  If#=0x1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=(none)
I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)

Signed-off-by: Bob Ham <bob.ham@puri.sm>
Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca>
Cc: stable <stable@vger.kernel.org>
[ johan: use USB_DEVICE_INTERFACE_CLASS() ]
Signed-off-by: Johan Hovold <johan@kernel.org>

commit | commitdiff | tree

Tony Lindgren [Thu, 15 Aug 2019 08:26:02 +0000 (01:26 -0700)]

USB: serial: option: Add Motorola modem UARTs

On Motorola Mapphone devices such as Droid 4 there are five USB ports
that do not use the same layout as Gobi 1K/2K/etc devices listed in
qcserial.c. So we should use qcaux.c or option.c as noted by
Dan Williams <dan.j.williams@intel.com>.

As the Motorola USB serial ports have an interrupt endpoint as shown
with lsusb -v, we should use option.c instead of qcaux.c as pointed out
by Johan Hovold <johan@kernel.org>.

The ff/ff/ff interfaces seem to always be UARTs on Motorola devices.
For the other interfaces, class 0x0a (CDC Data) should not in general
be added as they are typically part of a multi-interface function as
noted earlier by Bjørn Mork <bjorn@mork.no>.

However, looking at the Motorola mapphone kernel code, the mdm6600 0x0a
class is only used for flashing the modem firmware, and there are no
other interfaces. So I've added that too with more details below as it
works just fine.

The ttyUSB ports on Droid 4 are:

ttyUSB0 DIAG, CQDM-capable
ttyUSB1 MUX or NMEA, no response
ttyUSB2 MUX or NMEA, no response
ttyUSB3 TCMD
ttyUSB4 AT-capable

The ttyUSB0 is detected as QCDM capable by ModemManager. I think
it's only used for debugging with ModemManager --debug for sending
custom AT commands though. ModemManager already can manage data
connection using the USB QMI ports that are already handled by the
qmi_wwan.c driver.

To enable the MUX or NMEA ports, it seems that something needs to be
done additionally to enable them, maybe via the DIAG or TCMD port.
It might be just a NVRAM setting somewhere, but I have no idea what
NVRAM settings may need changing for that.

The TCMD port seems to be a Motorola custom protocol for testing
the modem and to configure it's NVRAM and seems to work just fine
based on a quick test with a minimal tcmdrw tool I wrote.

The voice modem AT-capable port seems to provide only partial
support, and no PM support compared to the TS 27.010 based UART
wired directly to the modem.

The UARTs added with this change are the same product IDs as the
Motorola Mapphone Android Linux kernel mdm6600_id_table. I don't
have any mdm9600 based devices, so I have only tested these on
mdm6600 based droid 4.

Then for the class 0x0a (CDC Data) mode, the Motorola Mapphone Android
Linux kernel driver moto_flashqsc.c just seems to change the
port->bulk_out_size to 8K from the default. And is only used for
flashing the modem firmware it seems.

I've verified that flashing the modem with signed firmware works just
fine with the option driver after manually toggling the GPIO pins, so
I've added droid 4 modem flashing mode to the option driver. I've not
added the other devices listed in moto_flashqsc.c in case they really
need different port->bulk_out_size. Those can be added as they get
tested to work for flashing the modem.

After this patch the output of /sys/kernel/debug/usb/devices has
the following for normal 22b8:2a70 mode including the related qmi_wwan
interfaces:

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=22b8 ProdID=2a70 Rev= 0.00
S:  Manufacturer=Motorola, Incorporated
S:  Product=Flash MZ600
C:* #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=83(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=84(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=85(I) Atr=03(Int.) MxPS=  64 Ivl=5ms
E:  Ad=86(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=5ms
E:  Ad=88(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=06(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E:  Ad=89(I) Atr=03(Int.) MxPS=  64 Ivl=5ms
E:  Ad=8a(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=07(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E:  Ad=8b(I) Atr=03(Int.) MxPS=  64 Ivl=5ms
E:  Ad=8c(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=08(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan
E:  Ad=8d(I) Atr=03(Int.) MxPS=  64 Ivl=5ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=09(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms

In 22b8:900e "qc_dload" mode the device shows up as:

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=22b8 ProdID=900e Rev= 0.00
S:  Manufacturer=Motorola, Incorporated
S:  Product=Flash MZ600
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms

And in 22b8:4281 "ram_downloader" mode the device shows up as:

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=22b8 ProdID=4281 Rev= 0.00
S:  Manufacturer=Motorola, Incorporated
S:  Product=Flash MZ600
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=fc Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms

Cc: Bjørn Mork <bjorn@mork.no>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Lars Melin <larsm17@gmail.com>
Cc: Marcel Partap <mpartap@gmx.net>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Michael Scott <hashcode0f@gmail.com>
Cc: NeKit <nekit1000@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sre@kernel.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>

commit | commitdiff | tree

Tony Luck [Wed, 14 Aug 2019 23:40:30 +0000 (16:40 -0700)]

MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h

There are a few different subsystems in the kernel that depend on model
specific behaviour (perf, EDAC, power, ...). Easier for just one person
to have the task to get new model numbers included instead of having
these groups trip over each other to do it.

[ bp: s/Cpu/CPU/ and add x86@kernel.org so that it gets CCed too as
FYI. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190814234030.30817-1-tony.luck@intel.com

commit | commitdiff | tree

Dave Airlie [Thu, 15 Aug 2019 03:29:18 +0000 (13:29 +1000)]

Merge tag 'drm-fixes-5.3-2019-08-14' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

drm-fixes-5.3-2019-08-14:

amdgpu:
- Use kvalloc for dc_state to avoid allocation
failures in some cases.
- Fix gfx9 soft recovery

scheduler:
- Fix a race condition when destroying entities

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815024919.3434-1-alexander.deucher@amd.com

commit | commitdiff | tree

Lyude Paul [Fri, 9 Aug 2019 00:53:05 +0000 (20:53 -0400)]

drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes

I -thought- I had fixed this entirely, but it looks like that I didn't
test this thoroughly enough as we apparently still make one big mistake
with nv50_msto_atomic_check() - we don't handle the following scenario:

* CRTC #1 has n VCPI allocated to it, is attached to connector DP-4
  which is attached to encoder #1. enabled=y active=n
* CRTC #1 is changed from DP-4 to DP-5, causing:
  * DP-4 crtc=#1→NULL (VCPI n→0)
  * DP-5 crtc=NULL→#1
  * CRTC #1 steals encoder #1 back from DP-4 and gives it to DP-5
  * CRTC #1 maintains the same mode as before, just with a different
    connector
* mode_changed=n connectors_changed=y
  (we _SHOULD_ do VCPI 0→n here, but don't)

Once the above scenario is repeated once, we'll attempt freeing VCPI
from the connector that we didn't allocate due to the connectors
changing, but the mode staying the same. Sigh.

Since nv50_msto_atomic_check() has broken a few times now, let's rethink
things a bit to be more careful: limit both VCPI/PBN allocations to
mode_changed || connectors_changed, since neither VCPI or PBN should
ever need to change outside of routing and mode changes.

Changes since v1:
* Fix accidental reversal of clock and bpp arguments in
  drm_dp_calc_pbn_mode() - William Lewis

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reported-by: Bohdan Milar <bmilar@redhat.com>
Tested-by: Bohdan Milar <bmilar@redhat.com>
Fixes: 232c9eec417a ("drm/nouveau: Use atomic VCPI helpers for MST")
References: 412e85b60531 ("drm/nouveau: Only release VCPI slots on mode changes")
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Juston Li <juston.li@intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Karol Herbst <karolherbst@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <stable@vger.kernel.org> # v5.1+
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809005307.18391-1-lyude@redhat.com

commit | commitdiff | tree

Y.C. Chen [Wed, 11 Apr 2018 01:27:39 +0000 (09:27 +0800)]

drm/ast: Fixed reboot test may cause system hanged

There is another thread still access standard VGA I/O while loading drm driver.
Disable standard VGA I/O decode to avoid this issue.

Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1523410059-18415-1-git-send-email-yc_chen@aspeedtech.com

commit | commitdiff | tree

Lubomir Rintel [Wed, 7 Aug 2019 13:22:31 +0000 (15:22 +0200)]

of: irq: fix a trivial typo in a doc comment

Diverged from what the code does with commit 530210c7814e ("of/irq: Replace
of_irq with of_phandle_args").

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Rob Herring <robh@kernel.org>

commit | commitdiff | tree

Rob Herring [Tue, 13 Aug 2019 20:47:54 +0000 (14:47 -0600)]

dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema

The proper way to add additional contraints to an existing json-schema
is using 'allOf' to reference the base schema. Using just '$ref' doesn't
work. Fix this for the 'st,syscfg' property.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: linux-gpio@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 22:29:53 +0000 (15:29 -0700)]

Merge tag 'Wimplicit-fallthrough-5.3-rc5' of git://git./linux/kernel/git/gustavoars/linux

Pull fallthrough fixes from Gustavo A. R. Silva:
"Fix sh mainline builds:

   - Fix fall-through warning in sh.

   - Fix missing break bug in sh (this is a 10-year-old bug)

  Currently, mainline builds for sh are broken. These patches fix that"

* tag 'Wimplicit-fallthrough-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  sh: kernel: hw_breakpoint: Fix missing break in switch statement
  sh: kernel: disassemble: Mark expected switch fall-throughs

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 21:21:14 +0000 (14:21 -0700)]

Merge tag 'afs-fixes-20190814' of git://git./linux/kernel/git/dhowells/linux-fs

Pull afs fixes from David Howells:

- Fix the CB.ProbeUuid handler to generate its reply correctly.

- Fix a mix up in indices when parsing a Volume Location entry record.

- Fix a potential NULL-pointer deref when cleaning up a read request.

- Fix the expected data version of the destination directory in
   afs_rename().

- Fix afs_d_revalidate() to only update d_fsdata if it's not the same
   as the directory data version to reduce the likelihood of overwriting
   the result of a competing operation. (d_fsdata carries the directory
   DV or the least-significant word thereof).

- Fix the tracking of the data-version on a directory and make sure
   that dentry objects get properly initialised, updated and
   revalidated.

   Also fix rename to update d_fsdata to match the new directory's DV if
   the dentry gets moved over and unhash the dentry to stop
   afs_d_revalidate() from interfering.

* tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix missing dentry data version updating
  afs: Only update d_fsdata if different in afs_d_revalidate()
  afs: Fix off-by-one in afs_rename() expected data version calculation
  fs: afs: Fix a possible null-pointer dereference in afs_put_read()
  afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()
  afs: Fix the CB.ProbeUuid service handler to reply correctly

commit | commitdiff | tree

Christian König [Fri, 9 Aug 2019 15:27:21 +0000 (17:27 +0200)]

drm/scheduler: use job count instead of peek

The spsc_queue_peek function is accessing queue->head which belongs to
the consumer thread and shouldn't be accessed by the producer

This is fixing a rare race condition when destroying entities.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk.liu@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Vincent Chen [Wed, 14 Aug 2019 08:23:53 +0000 (16:23 +0800)]

riscv: Make __fstate_clean() work correctly.

Make the __fstate_clean() function correctly set the
state of sstatus.FS in pt_regs to SR_FS_CLEAN.

Fixes: 7db91e57a0acd ("RISC-V: Task implementation")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: expanded "Fixes" commit ID]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>

commit | commitdiff | tree

Vincent Chen [Wed, 14 Aug 2019 08:23:52 +0000 (16:23 +0800)]

riscv: Correct the initialized flow of FP register

  The following two reasons cause FP registers are sometimes not
initialized before starting the user program.
1. Currently, the FP context is initialized in flush_thread() function
   and we expect these initial values to be restored to FP register when
   doing FP context switch. However, the FP context switch only occurs in
   switch_to function. Hence, if this process does not be scheduled out
   and scheduled in before entering the user space, the FP registers
   have no chance to initialize.
2. In flush_thread(), the state of reg->sstatus.FS inherits from the
   parent. Hence, the state of reg->sstatus.FS may be dirty. If this
   process is scheduled out during flush_thread() and initializing the
   FP register, the fstate_save() in switch_to will corrupt the FP context
   which has been initialized until flush_thread().

  To solve the 1st case, the initialization of the FP register will be
completed in start_thread(). It makes sure all FP registers are initialized
before starting the user program. For the 2nd case, the state of
reg->sstatus.FS in start_thread will be set to SR_FS_OFF to prevent this
process from corrupting FP context in doing context save. The FP state is
set to SR_FS_INITIAL in start_trhead().

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 7db91e57a0acd ("RISC-V: Task implementation")
Cc: stable@vger.kernel.org
[paul.walmsley@sifive.com: fixed brace alignment issue reported by
checkpatch]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 18:10:38 +0000 (11:10 -0700)]

Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
"Fairly small pull request for -rc3. I'm out of town the rest of this
  week, so I made sure to clean out as much as possible from patchworks
  in enough time for 0-day to chew through it (Yay! for 0-day being back
  online! :-)). Jason might send through any emergency stuff that could
  pop up, otherwise I'm back next week.

  The only real thing of note is the siw ABI change. Since we just
  merged siw *this* release, there are no prior kernel releases to
  maintain kernel ABI with. I told Bernard that if there is anything
  else about the siw ABI he thinks he might want to change before it
  goes set in stone, he should get it in ASAP. The siw module was around
  for several years outside the kernel tree, and it had to be revamped
  considerably for inclusion upstream, so we are making no attempts to
  be backward compatible with the out of tree version. Once 5.3 is
  actually released, we will have our baseline ABI to maintain.

  Summary:

   - Fix a memory registration release flow issue that was causing a
     WARN_ON (mlx5)

   - If the counters for a port aren't allocated, then we can't do
     operations on the non-existent counters (core)

   - Check the right variable for error code result (mlx5)

   - Fix a use after free issue (mlx5)

   - Fix an off by one memory leak (siw)

   - Actually return an error code on error (core)

   - Allow siw to be built on 32bit arches (siw, ABI change, but OK
     since siw was just merged this merge window and there is no prior
     released kernel to maintain compatibility with and we also updated
     the rdma-core user space package to match)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/siw: Change CQ flags from 64->32 bits
  RDMA/core: Fix error code in stat_get_doit_qp()
  RDMA/siw: Fix a memory leak in siw_init_cpulist()
  IB/mlx5: Fix use-after-free error while accessing ev_file pointer
  IB/mlx5: Check the correct variable in error handling code
  RDMA/counter: Prevent QP counter binding if counters unsupported
  IB/mlx5: Fix implicit MR release flow

commit | commitdiff | tree

Hui Peng [Wed, 14 Aug 2019 02:34:04 +0000 (22:34 -0400)]

ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit

The `uac_mixer_unit_descriptor` shown as below is read from the
device side. In `parse_audio_mixer_unit`, `baSourceID` field is
accessed from index 0 to `bNrInPins` - 1, the current implementation
assumes that descriptor is always valid (the length of descriptor
is no shorter than 5 + `bNrInPins`). If a descriptor read from
the device side is invalid, it may trigger out-of-bound memory
access.

```
struct uac_mixer_unit_descriptor {
__u8 bLength;
__u8 bDescriptorType;
__u8 bDescriptorSubtype;
__u8 bUnitID;
__u8 bNrInPins;
__u8 baSourceID[];
}
```

This patch fixes the bug by add a sanity check on the length of
the descriptor.

Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 17:31:11 +0000 (10:31 -0700)]

Merge tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

- fix the handling of the bus_dma_mask in dma_get_required_mask, which
   caused a regression in this merge window (Lucas Stach)

- fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me)

- fix dma_mmap_coherent to not cause page attribute mismatches on
   coherent architectures like x86 (me)

* tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix page attributes for dma_mmap_*
  dma-direct: don't truncate dma_required_mask to bus addressing capabilities
  dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 17:16:59 +0000 (10:16 -0700)]

Merge tag 'iommu-fixes-v5.3-rc4' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

- A couple more fixes for the Intel VT-d driver for bugs introduced
   during the recent conversion of this driver to use IOMMU core default
   domains.

- Fix for common dma-iommu code to make sure MSI mappings happen in the
   correct domain for a device.

- Fix a corner case in the handling of sg-lists in dma-iommu code that
   might cause dma_length to be truncated.

- Mark a switch as fall-through in arm-smmu code.

* tag 'iommu-fixes-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix possible use-after-free of private domain
  iommu/vt-d: Detach domain before using a private one
  iommu/dma: Handle SG length overflow better
  iommu/vt-d: Correctly check format of page table in debugfs
  iommu/vt-d: Detach domain when move device out of group
  iommu/arm-smmu: Mark expected switch fall-through
  iommu/dma: Handle MSI mappings separately

commit | commitdiff | tree

Linus Torvalds [Wed, 14 Aug 2019 16:53:46 +0000 (09:53 -0700)]

Merge branch 'akpm' (patches from Andrew)

Merge misc VM fixes from Andrew Morton:
"A bunch of hotfixes, all affecting mm/.

  The two-patch series from Andrea may be controversial. This restores
  patches which were reverted in Dec 2018 due to a regression report [*].

  After extensive discussion it is evident that the problems which these
  patches solved were significantly more serious than the problems they
  introduced. I am told that major distros are already carrying these
  two patches for this reason"

[*] See

      https://lore.kernel.org/lkml/alpine.DEB.2.21.1812061343240.144733@chino.kir.corp.google.com/
      https://lore.kernel.org/lkml/alpine.DEB.2.21.1812031545560.161134@chino.kir.corp.google.com/

  for the google-specific issues brought up by David Rijentes. And as
  Andrew says:

    "I'm unaware of anyone else who will be adversely affected by this,
     and google already carries over a thousand kernel patches - another
     won't kill them.

     There has been sporadic discussion about fixing these things for
     real but it's clear that nobody apart from David is particularly
     motivated"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
  mm, vmscan: do not special-case slab reclaim when watermarks are boosted
  Revert "mm, thp: restore node-local hugepage allocations"
  Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
  include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used
  seq_file: fix problem when seeking mid-record
  mm: workingset: fix vmstat counters for shadow nodes
  mm/usercopy: use memory range to be accessed for wraparound check
  mm: kmemleak: disable early logging in case of error
  mm/vmalloc.c: fix percpu free VM area search criteria
  mm/memcontrol.c: fix use after free in mem_cgroup_iter()
  mm/z3fold.c: fix z3fold_destroy_pool() race condition
  mm/z3fold.c: fix z3fold_destroy_pool() ordering
  mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
  mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
  mm/hmm: fix bad subpage pointer in try_to_unmap_one
  mm/hmm: fix ZONE_DEVICE anon page mapping reuse
  mm: document zone device struct page field usage

commit | commitdiff | tree

Nishad Kamdar [Sat, 3 Aug 2019 14:13:35 +0000 (19:43 +0530)]

i2c: stm32: Use the correct style for SPDX License Identifier

This patch corrects the SPDX License Identifier style
in header file related to STM32 Driver for I2C hardware
bus support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit | commitdiff | tree

Wolfram Sang [Thu, 8 Aug 2019 19:54:17 +0000 (21:54 +0200)]

i2c: emev2: avoid race when unregistering slave client

After we disabled interrupts, there might still be an active one
running. Sync before clearing the pointer to the slave device.

Fixes: c31d0a00021d ("i2c: emev2: add slave support")
Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit | commitdiff | tree

Wolfram Sang [Thu, 8 Aug 2019 19:39:10 +0000 (21:39 +0200)]

i2c: rcar: avoid race when unregistering slave client

After we disabled interrupts, there might still be an active one
running. Sync before clearing the pointer to the slave device.

Fixes: de20d1857dd6 ("i2c: rcar: add slave support")
Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Krzysztof Adamski <krzysztof.adamski@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit | commitdiff | tree

Oleksij Rempel [Mon, 12 Aug 2019 05:08:17 +0000 (07:08 +0200)]

MAINTAINERS: i2c-imx: take over maintainership

I would like to maintain the i2c-imx driver. Since I work with
different i.MX variants and have access to the hardware, I can spend
some time on the reviewing of this driver.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit | commitdiff | tree

Fabio Estevam [Thu, 8 Aug 2019 21:01:36 +0000 (18:01 -0300)]

Revert "i2c: imx: improve the error handling in i2c_imx_dma_request()"

Since commit e1ab9a468e3b ("i2c: imx: improve the error handling in
i2c_imx_dma_request()") when booting with the DMA driver as module (such
as CONFIG_FSL_EDMA=m) the following endless clk warnings are seen:

[  153.077831] ------------[ cut here ]------------
[  153.082528] WARNING: CPU: 0 PID: 15 at drivers/clk/clk.c:924 clk_core_disable_lock+0x18/0x24
[  153.093077] i2c0 already disabled
[  153.096416] Modules linked in:
[  153.099521] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G        W         5.2.0+ #321
[  153.107290] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[  153.113772] Workqueue: events deferred_probe_work_func
[  153.118979] [<c0019560>] (unwind_backtrace) from [<c0014734>] (show_stack+0x10/0x14)
[  153.126778] [<c0014734>] (show_stack) from [<c083f8dc>] (dump_stack+0x9c/0xd4)
[  153.134051] [<c083f8dc>] (dump_stack) from [<c0031154>] (__warn+0xf8/0x124)
[  153.141056] [<c0031154>] (__warn) from [<c0031248>] (warn_slowpath_fmt+0x38/0x48)
[  153.148580] [<c0031248>] (warn_slowpath_fmt) from [<c040fde0>] (clk_core_disable_lock+0x18/0x24)
[  153.157413] [<c040fde0>] (clk_core_disable_lock) from [<c058f520>] (i2c_imx_probe+0x554/0x6ec)
[  153.166076] [<c058f520>] (i2c_imx_probe) from [<c04b9178>] (platform_drv_probe+0x48/0x98)
[  153.174297] [<c04b9178>] (platform_drv_probe) from [<c04b7298>] (really_probe+0x1d8/0x2c0)
[  153.182605] [<c04b7298>] (really_probe) from [<c04b7554>] (driver_probe_device+0x5c/0x174)
[  153.190909] [<c04b7554>] (driver_probe_device) from [<c04b58c8>] (bus_for_each_drv+0x44/0x8c)
[  153.199480] [<c04b58c8>] (bus_for_each_drv) from [<c04b746c>] (__device_attach+0xa0/0x108)
[  153.207782] [<c04b746c>] (__device_attach) from [<c04b65a4>] (bus_probe_device+0x88/0x90)
[  153.215999] [<c04b65a4>] (bus_probe_device) from [<c04b6a04>] (deferred_probe_work_func+0x60/0x90)
[  153.225003] [<c04b6a04>] (deferred_probe_work_func) from [<c004f190>] (process_one_work+0x204/0x634)
[  153.234178] [<c004f190>] (process_one_work) from [<c004f618>] (worker_thread+0x20/0x484)
[  153.242315] [<c004f618>] (worker_thread) from [<c0055c2c>] (kthread+0x118/0x150)
[  153.249758] [<c0055c2c>] (kthread) from [<c00090b4>] (ret_from_fork+0x14/0x20)
[  153.257006] Exception stack(0xdde43fb0 to 0xdde43ff8)
[  153.262095] 3fa0:                                     00000000 00000000 00000000 00000000
[  153.270306] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  153.278520] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  153.285159] irq event stamp: 3323022
[  153.288787] hardirqs last  enabled at (3323021): [<c0861c4c>] _raw_spin_unlock_irq+0x24/0x2c
[  153.297261] hardirqs last disabled at (3323022): [<c040d7a0>] clk_enable_lock+0x10/0x124
[  153.305392] softirqs last  enabled at (3322092): [<c000a504>] __do_softirq+0x344/0x540
[  153.313352] softirqs last disabled at (3322081): [<c00385c0>] irq_exit+0x10c/0x128
[  153.320946] ---[ end trace a506731ccd9bd703 ]---

This endless clk warnings behaviour is well explained by Andrey Smirnov:

"Allocating DMA after registering I2C adapter can lead to infinite
probing loop, for example, consider the following scenario:

    1. i2c_imx_probe() is called and successfully registers an I2C
       adapter via i2c_add_numbered_adapter()

    2. As a part of i2c_add_numbered_adapter() new I2C slave devices
       are added from DT which results in a call to
       driver_deferred_probe_trigger()

    3. i2c_imx_probe() continues and calls i2c_imx_dma_request() which
       due to lack of proper DMA driver returns -EPROBE_DEFER

    4. i2c_imx_probe() fails, removes I2C adapter and returns
       -EPROBE_DEFER, which places it into deferred probe list

    5. Deferred probe work triggered in #2 above kicks in and calls
       i2c_imx_probe() again thus bringing us to step #1"

So revert commit e1ab9a468e3b ("i2c: imx: improve the error handling in
i2c_imx_dma_request()") and restore the old behaviour, in order to
avoid regressions on existing setups.

Cc: <stable@vger.kernel.org>
Reported-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Fixes: e1ab9a468e3b ("i2c: imx: improve the error handling in i2c_imx_dma_request()")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

commit | commitdiff | tree

Hui Wang [Wed, 14 Aug 2019 04:09:08 +0000 (12:09 +0800)]

ALSA: hda - Add a generic reboot_notify

Make codec enter D3 before rebooting or poweroff can fix the noise
issue on some laptops. And in theory it is harmless for all codecs
to enter D3 before rebooting or poweroff, let us add a generic
reboot_notify, then realtek and conexant drivers can call this
function.

Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Hui Wang [Wed, 14 Aug 2019 04:09:07 +0000 (12:09 +0800)]

ALSA: hda - Let all conexant codec enter D3 when rebooting

We have 3 new lenovo laptops which have conexant codec 0x14f11f86,
these 3 laptops also have the noise issue when rebooting, after
letting the codec enter D3 before rebooting or poweroff, the noise
disappers.

Instead of adding a new ID again in the reboot_notify(), let us make
this function apply to all conexant codec. In theory make codec enter
D3 before rebooting or poweroff is harmless, and I tested this change
on a couple of other Lenovo laptops which have different conexant
codecs, there is no side effect so far.

Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Alistair Francis [Tue, 13 Aug 2019 23:32:30 +0000 (16:32 -0700)]

riscv: defconfig: Update the defconfig

Update the defconfig:
- Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable
VirtIORNG when running on QEMU

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>

commit | commitdiff | tree

Alistair Francis [Tue, 13 Aug 2019 23:32:29 +0000 (16:32 -0700)]

riscv: rv32_defconfig: Update the defconfig

Update the rv32_defconfig:
- Add 'CONFIG_DEVTMPFS_MOUNT=y' to match the RISC-V defconfig
- Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable
VirtIORNG when running on QEMU

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>

commit | commitdiff | tree

Mike Kravetz [Tue, 13 Aug 2019 22:38:00 +0000 (15:38 -0700)]

hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in
the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
page migration and page fault.

If a hugetlb page can not be allocated to satisfy a page fault, the task
is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page.  This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate.  Without the mutex,
one would fail and SIGBUS even though the other fault would be
successful.

There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration.  It
will then unmap the original page from all page tables.  It does this
unmap by first clearing the pte and then writing a migration entry.  The
page table lock is held for the duration of this clear and write
operation.  However, the beginnings of the hugetlb page fault code
optimistically checks the pte without taking the page table lock.  If
clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault.  Note that the page
which will eventually satisfy this fault was already allocated by the
migration code.  However, the allocation within the fault path could
fail which would result in the task incorrectly being sent SIGBUS.

Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables.  However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock.  This would
require significant rework of the migration code.  Instead, the issue is
addressed in the hugetlb fault code.  After failing to allocate a huge
page, take the page table lock and check for huge_pte_none before
returning an error.  This is the same check that must be made further in
the code even if page allocation is successful.

Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mel Gorman [Tue, 13 Aug 2019 22:37:57 +0000 (15:37 -0700)]

mm, vmscan: do not special-case slab reclaim when watermarks are boosted

Dave Chinner reported a problem pointing a finger at commit 1c30844d2dfe
("mm: reclaim small amounts of memory when an external fragmentation
event occurs").

The report is extensive:

  https://lore.kernel.org/linux-mm/20190807091858.2857-1-david@fromorbit.com/

and it's worth recording the most relevant parts (colorful language and
typos included).

When running a simple, steady state 4kB file creation test to
simulate extracting tarballs larger than memory full of small
files into the filesystem, I noticed that once memory fills up
the cache balance goes to hell.

The workload is creating one dirty cached inode for every dirty
page, both of which should require a single IO each to clean and
reclaim, and creation of inodes is throttled by the rate at which
dirty writeback runs at (via balance dirty pages). Hence the ingest
rate of new cached inodes and page cache pages is identical and
steady. As a result, memory reclaim should quickly find a steady
balance between page cache and inode caches.

The moment memory fills, the page cache is reclaimed at a much
faster rate than the inode cache, and evidence suggests that
the inode cache shrinker is not being called when large batches
of pages are being reclaimed. In roughly the same time period
that it takes to fill memory with 50% pages and 50% slab caches,
memory reclaim reduces the page cache down to just dirty pages
and slab caches fill the entirety of memory.

The LRU is largely full of dirty pages, and we're getting spikes
of random writeback from memory reclaim so it's all going to shit.
Behaviour never recovers, the page cache remains pinned at just
dirty pages, and nothing I could tune would make any difference.
vfs_cache_pressure makes no difference - I would set it so high
it should trim the entire inode caches in a single pass, yet it
didn't do anything. It was clear from tracing and live telemetry
that the shrinkers were pretty much not running except when
there was absolutely no memory free at all, and then they did
the minimum necessary to free memory to make progress.

So I went looking at the code, trying to find places where pages
got reclaimed and the shrinkers weren't called. There's only one
- kswapd doing boosted reclaim as per commit 1c30844d2dfe ("mm:
reclaim small amounts of memory when an external fragmentation
event occurs").

The watermark boosting introduced by the commit is triggered in response
to an allocation "fragmentation event".  The boosting was not intended
to target THP specifically and triggers even if THP is disabled.
However, with Dave's perfectly reasonable workload, fragmentation events
can be very common given the ratio of slab to page cache allocations so
boosting remains active for long periods of time.

As high-order allocations might use compaction and compaction cannot
move slab pages the decision was made in the commit to special-case
kswapd when watermarks are boosted -- kswapd avoids reclaiming slab as
reclaiming slab does not directly help compaction.

As Dave notes, this decision means that slab can be artificially
protected for long periods of time and messes up the balance with slab
and page caches.

Removing the special casing can still indirectly help avoid
fragmentation by avoiding fragmentation-causing events due to slab
allocation as pages from a slab pageblock will have some slab objects
freed.  Furthermore, with the special casing, reclaim behaviour is
unpredictable as kswapd sometimes examines slab and sometimes does not
in a manner that is tricky to tune or analyse.

This patch removes the special casing.  The downside is that this is not
a universal performance win.  Some benchmarks that depend on the
residency of data when rereading metadata may see a regression when slab
reclaim is restored to its original behaviour.  Similarly, some
benchmarks that only read-once or write-once may perform better when
page reclaim is too aggressive.  The primary upside is that slab
shrinker is less surprising (arguably more sane but that's a matter of
opinion), behaves consistently regardless of the fragmentation state of
the system and properly obeys VM sysctls.

A fsmark benchmark configuration was constructed similar to what Dave
reported and is codified by the mmtest configuration
config-io-fsmark-small-file-stream.  It was evaluated on a 1-socket
machine to avoid dealing with NUMA-related issues and the timing of
reclaim.  The storage was an SSD Samsung Evo and a fresh trimmed XFS
filesystem was used for the test data.

This is not an exact replication of Dave's setup.  The configuration
scales its parameters depending on the memory size of the SUT to behave
similarly across machines.  The parameters mean the first sample
reported by fs_mark is using 50% of RAM which will barely be throttled
and look like a big outlier.  Dave used fake NUMA to have multiple
kswapd instances which I didn't replicate.  Finally, the number of
iterations differ from Dave's test as the target disk was not large
enough.  While not identical, it should be representative.

  fsmark
                                     5.3.0-rc3              5.3.0-rc3
                                       vanilla          shrinker-v1r1
  Min       1-files/sec     4444.80 (   0.00%)     4765.60 (   7.22%)
  1st-qrtle 1-files/sec     5005.10 (   0.00%)     5091.70 (   1.73%)
  2nd-qrtle 1-files/sec     4917.80 (   0.00%)     4855.60 (  -1.26%)
  3rd-qrtle 1-files/sec     4667.40 (   0.00%)     4831.20 (   3.51%)
  Max-1     1-files/sec    11421.50 (   0.00%)     9999.30 ( -12.45%)
  Max-5     1-files/sec    11421.50 (   0.00%)     9999.30 ( -12.45%)
  Max-10    1-files/sec    11421.50 (   0.00%)     9999.30 ( -12.45%)
  Max-90    1-files/sec     4649.60 (   0.00%)     4780.70 (   2.82%)
  Max-95    1-files/sec     4491.00 (   0.00%)     4768.20 (   6.17%)
  Max-99    1-files/sec     4491.00 (   0.00%)     4768.20 (   6.17%)
  Max       1-files/sec    11421.50 (   0.00%)     9999.30 ( -12.45%)
  Hmean     1-files/sec     5004.75 (   0.00%)     5075.96 (   1.42%)
  Stddev    1-files/sec     1778.70 (   0.00%)     1369.66 (  23.00%)
  CoeffVar  1-files/sec       33.70 (   0.00%)       26.05 (  22.71%)
  BHmean-99 1-files/sec     5053.72 (   0.00%)     5101.52 (   0.95%)
  BHmean-95 1-files/sec     5053.72 (   0.00%)     5101.52 (   0.95%)
  BHmean-90 1-files/sec     5107.05 (   0.00%)     5131.41 (   0.48%)
  BHmean-75 1-files/sec     5208.45 (   0.00%)     5206.68 (  -0.03%)
  BHmean-50 1-files/sec     5405.53 (   0.00%)     5381.62 (  -0.44%)
  BHmean-25 1-files/sec     6179.75 (   0.00%)     6095.14 (  -1.37%)

                     5.3.0-rc3   5.3.0-rc3
                       vanillashrinker-v1r1
  Duration User         501.82      497.29
  Duration System      4401.44     4424.08
  Duration Elapsed     8124.76     8358.05

This is showing a slight skew for the max result representing a large
outlier for the 1st, 2nd and 3rd quartile are similar indicating that
the bulk of the results show little difference.  Note that an earlier
version of the fsmark configuration showed a regression but that
included more samples taken while memory was still filling.

Note that the elapsed time is higher.  Part of this is that the
configuration included time to delete all the test files when the test
completes -- the test automation handles the possibility of testing
fsmark with multiple thread counts.  Without the patch, many of these
objects would be memory resident which is part of what the patch is
addressing.

There are other important observations that justify the patch.

1. With the vanilla kernel, the number of dirty pages in the system is
   very low for much of the test. With this patch, dirty pages is
   generally kept at 10% which matches vm.dirty_background_ratio which
   is normal expected historical behaviour.

2. With the vanilla kernel, the ratio of Slab/Pagecache is close to
   0.95 for much of the test i.e. Slab is being left alone and
   dominating memory consumption. With the patch applied, the ratio
   varies between 0.35 and 0.45 with the bulk of the measured ratios
   roughly half way between those values. This is a different balance to
   what Dave reported but it was at least consistent.

3. Slabs are scanned throughout the entire test with the patch applied.
   The vanille kernel has periods with no scan activity and then
   relatively massive spikes.

4. Without the patch, kswapd scan rates are very variable. With the
   patch, the scan rates remain quite steady.

4. Overall vmstats are closer to normal expectations

                                5.3.0-rc3      5.3.0-rc3
                                  vanilla  shrinker-v1r1
    Ops Direct pages scanned             99388.00      328410.00
    Ops Kswapd pages scanned          45382917.00    33451026.00
    Ops Kswapd pages reclaimed        30869570.00    25239655.00
    Ops Direct pages reclaimed           74131.00        5830.00
    Ops Kswapd efficiency %                 68.02          75.45
    Ops Kswapd velocity                   5585.75        4002.25
    Ops Page reclaim immediate         1179721.00      430927.00
    Ops Slabs scanned                 62367361.00    73581394.00
    Ops Direct inode steals               2103.00        1002.00
    Ops Kswapd inode steals             570180.00     5183206.00

o Vanilla kernel is hitting direct reclaim more frequently,
  not very much in absolute terms but the fact the patch
  reduces it is interesting
o "Page reclaim immediate" in the vanilla kernel indicates
  dirty pages are being encountered at the tail of the LRU.
  This is generally bad and means in this case that the LRU
  is not long enough for dirty pages to be cleaned by the
  background flush in time. This is much reduced by the
  patch.
o With the patch, kswapd is reclaiming 10 times more slab
  pages than with the vanilla kernel. This is indicative
  of the watermark boosting over-protecting slab

A more complete set of tests were run that were part of the basis for
introducing boosting and while there are some differences, they are well
within tolerances.

Bottom line, the special casing kswapd to avoid slab behaviour is
unpredictable and can lead to abnormal results for normal workloads.

This patch restores the expected behaviour that slab and page cache is
balanced consistently for a workload with a steady allocation ratio of
slab/pagecache pages.  It also means that if there are workloads that
favour the preservation of slab over pagecache that it can be tuned via
vm.vfs_cache_pressure where as the vanilla kernel effectively ignores
the parameter when boosting is active.

Link: http://lkml.kernel.org/r/20190808182946.GM2739@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Andrea Arcangeli [Tue, 13 Aug 2019 22:37:53 +0000 (15:37 -0700)]

Revert "mm, thp: restore node-local hugepage allocations"

This reverts commit 2f0799a0ffc033b ("mm, thp: restore node-local
hugepage allocations").

commit 2f0799a0ffc033b was rightfully applied to avoid the risk of a
severe regression that was reported by the kernel test robot at the end
of the merge window.  Now we understood the regression was a false
positive and was caused by a significant increase in fairness during a
swap trashing benchmark.  So it's safe to re-apply the fix and continue
improving the code from there.  The benchmark that reported the
regression is very useful, but it provides a meaningful result only when
there is no significant alteration in fairness during the workload.  The
removal of __GFP_THISNODE increased fairness.

__GFP_THISNODE cannot be used in the generic page faults path for new
memory allocations under the MPOL_DEFAULT mempolicy, or the allocation
behavior significantly deviates from what the MPOL_DEFAULT semantics are
supposed to be for THP and 4k allocations alike.

Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag
set to "madvise") has never meant to provide an implicit MPOL_BIND on
the "current" node the task is running on, causing swap storms and
providing a much more aggressive behavior than even zone_reclaim_node =
3.

Any workload who could have benefited from __GFP_THISNODE has now to
enable zone_reclaim_mode=1||2||3.  __GFP_THISNODE implicitly provided
the zone_reclaim_mode behavior, but it only did so if THP was enabled:
if THP was disabled, there would have been no chance to get any 4k page
from the current node if the current node was full of pagecache, which
further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE.
MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode
semantics, in fact the two are orthogonal, zone_reclaim_mode = 1|2|3
must work exactly the same with MADV_HUGEPAGE set or not.

The performance characteristic of memory depends on the hardware
details.  The numbers below are obtained on Naples/EPYC architecture and
the N/A projection extends them to show what we should aim for in the
future as a good THP NUMA locality default.  The benchmark used
exercises random memory seeks (note: the cost of the page faults is not
part of the measurement).

  D0 THP | D0 4k | D1 THP | D1 4k | D2 THP | D2 4k | D3 THP | D3 4k | ...
  0%     | +43%  | +45%   | +106% | +131%  | +224% | N/A    | N/A

D0 means distance zero (i.e.  local memory), D1 means distance one (i.e.
intra socket memory), D2 means distance two (i.e.  inter socket memory),
etc...

For the guest physical memory allocated by qemu and for guest mode
kernel the performance characteristic of RAM is more complex and an
ideal default could be:

  D0 THP | D1 THP | D0 4k | D2 THP | D1 4k | D3 THP | D2 4k | D3 4k | ...
  0%     | +58%   | +101% | N/A    | +222% | N/A    | N/A   | N/A

NOTE: the N/A are projections and haven't been measured yet, the
measurement in this case is done on a 1950x with only two NUMA nodes.
The THP case here means THP was used both in the host and in the guest.

After applying this commit the THP NUMA locality order that we'll get
out of MADV_HUGEPAGE is this:

  D0 THP | D1 THP | D2 THP | D3 THP | ... | D0 4k | D1 4k | D2 4k | D3 4k | ...

Before this commit it was:

  D0 THP | D0 4k | D1 4k | D2 4k | D3 4k | ...

Even if we ignore the breakage of large workloads that can't fit in a
single node that the __GFP_THISNODE implicit "current node" mbind
caused, the THP NUMA locality order provided by __GFP_THISNODE was still
not the one we shall aim for in the long term (i.e.  the first one at
the top).

After this commit is applied, we can introduce a new allocator multi
order API and to replace those two alloc_pages_vmas calls in the page
fault path, with a single multi order call:

        unsigned int order = (1 << HPAGE_PMD_ORDER) | (1 << 0);
        page = alloc_pages_multi_order(..., &order);
        if (!page)
         goto out;
        if (!(order & (1 << 0))) {
         VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER);
         /* THP fault */
        } else {
         VM_WARN_ON(order != 1 << 0);
         /* 4k fallback */
        }

The page allocator logic has to be altered so that when it fails on any
zone with order 9, it has to try again with a order 0 before falling
back to the next zone in the zonelist.

After that we need to do more measurements and evaluate if adding an
opt-in feature for guest mode is worth it, to swap "DN 4k | DN+1 THP"
with "DN+1 THP | DN 4k" at every NUMA distance crossing.

Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Andrea Arcangeli [Tue, 13 Aug 2019 22:37:50 +0000 (15:37 -0700)]

Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""

Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings".

The fixes for what was originally reported as "pathological THP
behavior" we rightfully reverted to be sure not to introduced
regressions at end of a merge window after a severe regression report
from the kernel bot.  We can safely re-apply them now that we had time
to analyze the problem.

The mm process worked fine, because the good fixes were eventually
committed upstream without excessive delay.

The regression reported by the kernel bot however forced us to revert
the good fixes to be sure not to introduce regressions and to give us
the time to analyze the issue further.  The silver lining is that this
extra time allowed to think more at this issue and also plan for a
future direction to improve things further in terms of THP NUMA
locality.

This patch (of 2):

This reverts commit 356ff8a9a78fb35d ("Revert "mm, thp: consolidate THP
gfp handling into alloc_hugepage_direct_gfpmask").  So it reapplies
89c83fb539f954 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask").

Consolidation of the THP allocation flags at the same place was meant to
be a clean up to easier handle otherwise scattered code which is
imposing a maintenance burden.  There were no real problems observed
with the gfp mask consolidation but the reversion was rushed through
without a larger consensus regardless.

This patch brings the consolidation back because this should make the
long term maintainability easier as well as it should allow future
changes to be less error prone.

[mhocko@kernel.org: changelog additions]
Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Qian Cai [Tue, 13 Aug 2019 22:37:47 +0000 (15:37 -0700)]

include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used

A compiler throws a warning on an arm64 system since commit 9849a5697d3d
("arch, mm: convert all architectures to use 5level-fixup.h"),

  mm/kasan/init.c: In function 'kasan_free_p4d':
  mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable]
   p4d_t *p4d;
          ^~~

because p4d_none() in "5level-fixup.h" is compiled away while it is a
static inline function in "pgtable-nopud.h".

However, if converted p4d_none() to a static inline there, powerpc would
be unhappy as it reads those in assembler language in
"arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip
assembly include for the static inline C function.

While at it, converted a few similar functions to be consistent with the
ones in "pgtable-nopud.h".

Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

NeilBrown [Tue, 13 Aug 2019 22:37:44 +0000 (15:37 -0700)]

seq_file: fix problem when seeking mid-record

If you use lseek or similar (e.g.  pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the
next record).  When seeking to a record boundary, the next record is
correctly returned.

This bug was introduced by a recent patch (identified below).  Before
that patch, seq_read() would increment m->index when the last of the
buffer was returned (m->count == 0).  After that patch, we rely on
->next to increment m->index after filling the buffer - but there was
one place where that didn't happen.

Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Sergei Turchanov <turchanov@farpost.com>
Tested-by: Sergei Turchanov <turchanov@farpost.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Roman Gushchin [Tue, 13 Aug 2019 22:37:41 +0000 (15:37 -0700)]

mm: workingset: fix vmstat counters for shadow nodes

Memcg counters for shadow nodes are broken because the memcg pointer is
obtained in a wrong way. The following approach is used:
        virt_to_page(xa_node)->mem_cgroup

Since commit 4d96ba353075 ("mm: memcg/slab: stop setting
page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't
set for slab pages, so memcg_from_slab_page() should be used instead.

Also I doubt that it ever worked correctly: virt_to_head_page() should
be used instead of virt_to_page().  Otherwise objects residing on tail
pages are not accounted, because only the head page contains a valid
mem_cgroup pointer.  That was a case since the introduction of these
counters by the commit 68d48e6a2df5 ("mm: workingset: add vmstat counter
for shadow nodes").

Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Isaac J. Manjarres [Tue, 13 Aug 2019 22:37:37 +0000 (15:37 -0700)]

mm/usercopy: use memory range to be accessed for wraparound check

Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the
range of addresses that will be accessed is [ptr, ptr + (n - 1)].

This can lead to incorrectly detecting a wraparound in the memory
address, when trying to read 4 KB from memory that is mapped to the the
last possible page in the virtual address space, when in fact, accessing
that range of memory would not cause a wraparound to occur.

Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to
wrap around.

Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Trilok Soni <tsoni@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Catalin Marinas [Tue, 13 Aug 2019 22:37:34 +0000 (15:37 -0700)]

mm: kmemleak: disable early logging in case of error

If an error occurs during kmemleak_init() (e.g. kmem cache cannot be
created), kmemleak is disabled but kmemleak_early_log remains enabled.
Subsequently, when the .init.text section is freed, the log_early()
function no longer exists. To avoid a page fault in such scenario,
ensure that kmemleak_disable() also disables early logging.

Link: http://lkml.kernel.org/r/20190731152302.42073-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Kuppuswamy Sathyanarayanan [Tue, 13 Aug 2019 22:37:31 +0000 (15:37 -0700)]

mm/vmalloc.c: fix percpu free VM area search criteria

Recent changes to the vmalloc code by commit 68ad4a330433
("mm/vmalloc.c: keep track of free blocks for vmap allocation") can
cause spurious percpu allocation failures.  These, in turn, can result
in panic()s in the slub code.  One such possible panic was reported by
Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939.
Another related panic observed is,

RIP: 0033:0x7f46f7441b9b
Call Trace:
  dump_stack+0x61/0x80
  pcpu_alloc.cold.30+0x22/0x4f
  mem_cgroup_css_alloc+0x110/0x650
  cgroup_apply_control_enable+0x133/0x330
  cgroup_mkdir+0x41b/0x500
  kernfs_iop_mkdir+0x5a/0x90
  vfs_mkdir+0x102/0x1b0
  do_mkdirat+0x7d/0xf0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START
to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly
uses two lists (vmap_area_list & free_vmap_area_list) to track the used
and free VM areas in VMALLOC space.  And pcpu_get_vm_areas(offsets[],
sizes[], nr_vms, align) function is used for allocating congruent VM
areas for percpu memory allocator.  In order to not conflict with
VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the
VMALLOC space.  So the search for free vm_area for the given requirement
starts near VMALLOC_END and moves upwards towards VMALLOC_START.

Prior to commit 68ad4a330433, the search for free vm_area in
pcpu_get_vm_areas() involves following two main steps.

Step 1:
    Find a aligned "base" adress near VMALLOC_END.
    va = free vm area near VMALLOC_END
Step 2:
    Loop through number of requested vm_areas and check,
        Step 2.1:
           if (base < VMALLOC_START)
              1. fail with error
        Step 2.2:
           // end is offsets[area] + sizes[area]
           if (base + end > va->vm_end)
               1. Move the base downwards and repeat Step 2
        Step 2.3:
           if (base + start < va->vm_start)
              1. Move to previous free vm_area node, find aligned
                 base address and repeat Step 2

But Commit 68ad4a330433 removed Step 2.2 and modified Step 2.3 as below:

        Step 2.3:
           if (base + start < va->vm_start || base + end > va->vm_end)
              1. Move to previous free vm_area node, find aligned
                 base address and repeat Step 2

Above change is the root cause of spurious percpu memory allocation
failures.  For example, consider a case where a relatively large vm_area
(~ 30 TB) was ignored in free vm_area search because it did not pass the
base + end < vm->vm_end boundary check.  Ignoring such large free
vm_area's would lead to not finding free vm_area within boundary of
VMALLOC_start to VMALLOC_END which in turn leads to allocation failures.

So modify the search algorithm to include Step 2.2.

Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com
Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Domain: System / Kernel;

RSS Atom