Pavel Shilovsky [Thu, 26 Sep 2019 19:31:20 +0000 (12:31 -0700)]
CIFS: Fix oplock handling for SMB 2.1+ protocols
commit
a016e2794fc3a245a91946038dd8f34d65e53cc3 upstream.
There may be situations when a server negotiates SMB 2.1
protocol version or higher but responds to a CREATE request
with an oplock rather than a lease.
Currently the client doesn't handle such a case correctly:
when another CREATE comes in the server sends an oplock
break to the initial CREATE and the client doesn't send
an ack back due to a wrong caching level being set (READ
instead of RWH). Missing an oplock break ack makes the
server wait until the break times out which dramatically
increases the latency of the second CREATE.
Fix this by properly detecting oplocks when using SMB 2.1
protocol version and higher.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Murphy Zhou [Sat, 21 Sep 2019 11:26:00 +0000 (19:26 +0800)]
CIFS: fix max ea value size
commit
63d37fb4ce5ae7bf1e58f906d1bf25f036fe79b2 upstream.
It should not be larger then the slab max buf size. If user
specifies a larger size, it passes this check and goes
straightly to SMB2_set_info_init performing an insecure memcpy.
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris Brandt [Thu, 26 Sep 2019 12:19:09 +0000 (07:19 -0500)]
i2c: riic: Clear NACK in tend isr
commit
a71e2ac1f32097fbb2beab098687a7a95c84543e upstream.
The NACKF flag should be cleared in INTRIICNAKI interrupt processing as
description in HW manual.
This issue shows up quickly when PREEMPT_RT is applied and a device is
probed that is not plugged in (like a touchscreen controller). The result
is endless interrupts that halt system boot.
Fixes:
310c18a41450 ("i2c: riic: add driver")
Cc: stable@vger.kernel.org
Reported-by: Chien Nguyen <chien.nguyen.eb@rvc.renesas.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Laurent Vivier [Tue, 17 Sep 2019 09:54:50 +0000 (11:54 +0200)]
hwrng: core - don't wait on add_early_randomness()
commit
78887832e76541f77169a24ac238fccb51059b63 upstream.
add_early_randomness() is called by hwrng_register() when the
hardware is added. If this hardware and its module are present
at boot, and if there is no data available the boot hangs until
data are available and can't be interrupted.
For instance, in the case of virtio-rng, in some cases the host can be
not able to provide enough entropy for all the guests.
We can have two easy ways to reproduce the problem but they rely on
misconfiguration of the hypervisor or the egd daemon:
- if virtio-rng device is configured to connect to the egd daemon of the
host but when the virtio-rng driver asks for data the daemon is not
connected,
- if virtio-rng device is configured to connect to the egd daemon of the
host but the egd daemon doesn't provide data.
The guest kernel will hang at boot until the virtio-rng driver provides
enough data.
To avoid that, call rng_get_data() in non-blocking mode (wait=0)
from add_early_randomness().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Fixes:
d9e797261933 ("hwrng: add randomness to system from rng...")
Cc: <stable@vger.kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chao Yu [Wed, 11 Sep 2019 09:36:50 +0000 (17:36 +0800)]
quota: fix wrong condition in is_quota_modification()
commit
6565c182094f69e4ffdece337d395eb7ec760efc upstream.
Quoted from
commit
3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize")
" At LSF we decided that if we truncate up from isize we shouldn't trim
fallocated blocks that were fallocated with KEEP_SIZE and are past the
new i_size. This patch fixes ext4 to do this. "
And generic/092 of fstest have covered this case for long time, however
is_quota_modification() didn't adjust based on that rule, so that in
below condition, we will lose to quota block change:
- fallocate blocks beyond EOF
- remount
- truncate(file_path, file_size)
Fix it.
Link: https://lore.kernel.org/r/20190911093650.35329-1-yuchao0@huawei.com
Fixes:
3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize")
CC: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Theodore Ts'o [Sat, 24 Aug 2019 02:38:00 +0000 (22:38 -0400)]
ext4: fix punch hole for inline_data file systems
commit
c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream.
If a program attempts to punch a hole on an inline data file, we need
to convert it to a normal file first.
This was detected using ext4/032 using the adv configuration. Simple
reproducer:
mke2fs -Fq -t ext4 -O inline_data /dev/vdc
mount /vdc
echo "" > /vdc/testfile
xfs_io -c 'truncate
33554432' /vdc/testfile
xfs_io -c 'fpunch 0 1048576' /vdc/testfile
umount /vdc
e2fsck -fy /dev/vdc
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rakesh Pandit [Fri, 23 Aug 2019 02:53:46 +0000 (22:53 -0400)]
ext4: fix warning inside ext4_convert_unwritten_extents_endio
commit
e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream.
Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing
first argument. This was introduced in commit
ff95ec22cd7f ("ext4:
add warning to ext4_convert_unwritten_extents_endio") and splitting
extents inside endio would trigger it.
Fixes:
ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tetsuo Handa [Mon, 26 Aug 2019 13:13:25 +0000 (22:13 +0900)]
/dev/mem: Bail out upon SIGKILL.
commit
8619e5bdeee8b2c685d686281f2d2a6017c4bc15 upstream.
syzbot found that a thread can stall for minutes inside read_mem() or
write_mem() after that thread was killed by SIGKILL [1]. Reading from
iomem areas of /dev/mem can be slow, depending on the hardware.
While reading 2GB at one read() is legal, delaying termination of killed
thread for minutes is bad. Thus, allow reading/writing /dev/mem and
/dev/kmem to be preemptible and killable.
[ 1335.912419][T20577] read_mem: sz=4096 count=
2134565632
[ 1335.943194][T20577] read_mem: sz=4096 count=
2134561536
[ 1335.978280][T20577] read_mem: sz=4096 count=
2134557440
[ 1336.011147][T20577] read_mem: sz=4096 count=
2134553344
[ 1336.041897][T20577] read_mem: sz=4096 count=
2134549248
Theoretically, reading/writing /dev/mem and /dev/kmem can become
"interruptible". But this patch chose "killable". Future patch will make
them "interruptible" so that we can revert to "killable" if some program
regressed.
[1] https://syzkaller.appspot.com/bug?id=
a0e3436829698d5824231251fad9d8e998f94f5e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/1566825205-10703-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Denis Kenzior [Wed, 28 Aug 2019 21:11:10 +0000 (16:11 -0500)]
cfg80211: Purge frame registrations on iftype change
commit
c1d3ad84eae35414b6b334790048406bd6301b12 upstream.
Currently frame registrations are not purged, even when changing the
interface type. This can lead to potentially weird situations where
frames possibly not allowed on a given interface type remain registered
due to the type switching happening after registration.
The kernel currently relies on userspace apps to actually purge the
registrations themselves, this is not something that the kernel should
rely on.
Add a call to cfg80211_mlme_purge_registrations() to forcefully remove
any registrations left over prior to switching the iftype.
Cc: stable@vger.kernel.org
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Tue, 20 Aug 2019 00:21:09 +0000 (10:21 +1000)]
md: only call set_in_sync() when it is expected to succeed.
commit
480523feae581ab714ba6610388a3b4619a2f695 upstream.
Since commit
4ad23a976413 ("MD: use per-cpu counter for
writes_pending"), set_in_sync() is substantially more expensive: it
can wait for a full RCU grace period which can be 10s of milliseconds.
So we should only call it when the cost is justified.
md_check_recovery() currently calls set_in_sync() every time it finds
anything to do (on non-external active arrays). For an array
performing resync or recovery, this will be quite often.
Each call will introduce a delay to the md thread, which can noticeable
affect IO submission latency.
In md_check_recovery() we only need to call set_in_sync() if
'safemode' was non-zero at entry, meaning that there has been not
recent IO. So we save this "safemode was nonzero" state, and only
call set_in_sync() if it was non-zero.
This measurably reduces mean and maximum IO submission latency during
resync/recovery.
Reported-and-tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Fixes:
4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Tue, 20 Aug 2019 00:21:09 +0000 (10:21 +1000)]
md: don't report active array_state until after revalidate_disk() completes.
commit
9d4b45d6af442237560d0bb5502a012baa5234b7 upstream.
Until revalidate_disk() has completed, the size of a new md array will
appear to be zero.
So we shouldn't report, through array_state, that the array is active
until that time.
udev rules check array_state to see if the array is ready. As soon as
it appear to be zero, fsck can be run. If it find the size to be
zero, it will fail.
So add a new flag to provide an interlock between do_md_run() and
array_state_show(). This flag is set while do_md_run() is active and
it prevents array_state_show() from reporting that the array is
active.
Before do_md_run() is called, ->pers will be NULL so array is
definitely not active.
After do_md_run() is called, revalidate_disk() will have run and the
array will be completely ready.
We also move various sysfs_notify*() calls out of md_run() into
do_md_run() after MD_NOT_READY is cleared. This ensure the
information is ready before the notification is sent.
Prior to v4.12, array_state_show() was called with the
mddev->reconfig_mutex held, which provided exclusion with do_md_run().
Note that MD_NOT_READY cleared twice. This is deliberate to cover
both success and error paths with minimal noise.
Fixes:
b7b17c9b67e5 ("md: remove mddev_lock() from md_attr_show()")
Cc: stable@vger.kernel.org (v4.12++)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xiao Ni [Mon, 8 Jul 2019 02:14:32 +0000 (10:14 +0800)]
md/raid6: Set R5_ReadError when there is read failure on parity disk
commit
143f6e733b73051cd22dcb80951c6c929da413ce upstream.
7471fb77ce4d ("md/raid6: Fix anomily when recovering a single device in
RAID6.") avoids rereading P when it can be computed from other members.
However, this misses the chance to re-write the right data to P. This
patch sets R5_ReadError if the re-read fails.
Also, when re-read is skipped, we also missed the chance to reset
rdev->read_errors to 0. It can fail the disk when there are many read
errors on P member disk (other disks don't have read error)
V2: upper layer read request don't read parity/Q data. So there is no
need to consider such situation.
This is Reported-by: kbuild test robot <lkp@intel.com>
Fixes:
7471fb77ce4d ("md/raid6: Fix anomily when recovering a single device in RAID6.")
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Tue, 24 Sep 2019 09:49:54 +0000 (10:49 +0100)]
Btrfs: fix race setting up and completing qgroup rescan workers
commit
13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream.
There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.
CPU 1 CPU 2 CPU 3
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts the worker
btrfs_qgroup_rescan_worker()
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_flags &=
~BTRFS_QGROUP_STATUS_FLAG_RESCAN
mutex_unlock(&fs_info->qgroup_rescan_lock)
starts transaction, updates qgroup status
item, etc
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts another worker
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_rescan_running = false
mutex_unlock(&fs_info->qgroup_rescan_lock)
complete_all(&fs_info->qgroup_rescan_completion)
Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info->qgroup_flags, which is expected and correct behaviour.
However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info->qgroup_rescan_running is set to false by CPU 2.
This race is making test case btrfs/171 (from fstests) to fail often:
btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
# --- tests/btrfs/171.out 2018-09-16 21:30:48.
505104287 +0100
# +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.
938486039 +0100
# @@ -1,2 +1,3 @@
# QA output created by 171
# +ERROR: quota rescan failed: Operation now in progress
# Silence is golden
# ...
# (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff)
That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.
Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info->qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info->qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.
Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info->qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).
Fixes:
57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes:
d2c609b834d62f ("btrfs: properly track when rescan worker is running")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Mon, 16 Sep 2019 12:02:39 +0000 (20:02 +0800)]
btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
commit
d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 upstream.
[BUG]
The following script can cause btrfs qgroup data space leak:
mkfs.btrfs -f $dev
mount $dev -o nospace_cache $mnt
btrfs subv create $mnt/subv
btrfs quota en $mnt
btrfs quota rescan -w $mnt
btrfs qgroup limit 128m $mnt/subv
for (( i = 0; i < 3; i++)); do
# Create 3 64M holes for latter fallocate to fail
truncate -s 192m $mnt/subv/file
xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
sync
# it's supposed to fail, and each failure will leak at least 64M
# data space
xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
rm $mnt/subv/file
sync
done
# Shouldn't fail after we removed the file
xfs_io -f -c "falloc 0 64m" $mnt/subv/file
[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.
The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.
However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.
This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
This is the correct behavior, but it prevents
btrfs_qgroup_check_reserved_leak() to catch the leakage as the
detector is purely EXTENT_QGROUP_RESERVED flag based.
- Leak the previously reserved data bytes.
The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.
[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.
Fixes:
524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Mon, 16 Sep 2019 12:02:38 +0000 (20:02 +0800)]
btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
commit
bab32fc069ce8829c416e8737c119f62a57970f9 upstream.
[BUG]
Under the following case with qgroup enabled, if some error happened
after we have reserved delalloc space, then in error handling path, we
could cause qgroup data space leakage:
From btrfs_truncate_block() in inode.c:
ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
block_start, blocksize);
if (ret)
goto out;
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
btrfs_delalloc_release_space(inode, data_reserved,
block_start, blocksize, true);
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
ret = -ENOMEM;
goto out;
}
[CAUSE]
In the above case, btrfs_delalloc_reserve_space() will call
btrfs_qgroup_reserve_data() and mark the io_tree range with
EXTENT_QGROUP_RESERVED flag.
In the error handling path, we have the following call stack:
btrfs_delalloc_release_space()
|- btrfs_free_reserved_data_space()
|- btrsf_qgroup_free_data()
|- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
|- qgroup_free_reserved_data(reserved=@reserved)
|- clear_record_extent_bits();
|- freed += changeset.bytes_changed;
However due to a completion bug, qgroup_free_reserved_data() will clear
EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other
than the correct BTRFS_I(inode)->io_tree.
Since io_failure_tree is never marked with that flag,
btrfs_qgroup_free_data() will not free any data reserved space at all,
causing a leakage.
This type of error handling can only be triggered by errors outside of
qgroup code. So EDQUOT error from qgroup can't trigger it.
[FIX]
Fix the wrong target io_tree.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes:
bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikolay Borisov [Wed, 4 Sep 2019 16:33:58 +0000 (19:33 +0300)]
btrfs: Relinquish CPUs in btrfs_compare_trees
commit
6af112b11a4bc1b560f60a618ac9c1dcefe9836e upstream.
When doing any form of incremental send the parent and the child trees
need to be compared via btrfs_compare_trees. This can result in long
loop chains without ever relinquishing the CPU. This causes softlockup
detector to trigger when comparing trees with a lot of items. Example
report:
watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153]
CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate:
40000005 (nZcv daif -PAN -UAO)
pc : __ll_sc_arch_atomic_sub_return+0x14/0x20
lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs]
sp :
ffff00001273b7e0
Call trace:
__ll_sc_arch_atomic_sub_return+0x14/0x20
release_extent_buffer+0xdc/0x120 [btrfs]
free_extent_buffer.part.0+0xb0/0x118 [btrfs]
free_extent_buffer+0x24/0x30 [btrfs]
btrfs_release_path+0x4c/0xa0 [btrfs]
btrfs_free_path.part.0+0x20/0x40 [btrfs]
btrfs_free_path+0x24/0x30 [btrfs]
get_inode_info+0xa8/0xf8 [btrfs]
finish_inode_if_needed+0xe0/0x6d8 [btrfs]
changed_cb+0x9c/0x410 [btrfs]
btrfs_compare_trees+0x284/0x648 [btrfs]
send_subvol+0x33c/0x520 [btrfs]
btrfs_ioctl_send+0x8a0/0xaf0 [btrfs]
btrfs_ioctl+0x199c/0x2288 [btrfs]
do_vfs_ioctl+0x4b0/0x820
ksys_ioctl+0x84/0xb8
__arm64_sys_ioctl+0x28/0x38
el0_svc_common.constprop.0+0x7c/0x188
el0_svc_handler+0x34/0x90
el0_svc+0x8/0xc
Fix this by adding a call to cond_resched at the beginning of the main
loop in btrfs_compare_trees.
Fixes:
7069830a9e38 ("Btrfs: add btrfs_compare_trees function")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 12 Aug 2019 18:14:29 +0000 (19:14 +0100)]
Btrfs: fix use-after-free when using the tree modification log
commit
efad8a853ad2057f96664328a0d327a05ce39c76 upstream.
At ctree.c:get_old_root(), we are accessing a root's header owner field
after we have freed the respective extent buffer. This results in an
use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
is set, results in a stack trace like the following:
[ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
[ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
(...)
[ 3876.799502] RSP: 0018:
ffff9f08c1a2f9f0 EFLAGS:
00010286
[ 3876.799518] RAX:
ffff8dd300000000 RBX:
ffff8dd85a7a9348 RCX:
000000038da26000
[ 3876.799538] RDX:
0000000000000000 RSI:
ffffe522ce368980 RDI:
0000000000000246
[ 3876.799559] RBP:
dae1922adadad000 R08:
0000000008020000 R09:
ffffe522c0000000
[ 3876.799579] R10:
ffff8dd57fd788c8 R11:
000000007511b030 R12:
ffff8dd781ddc000
[ 3876.799599] R13:
ffff8dd9e6240578 R14:
ffff8dd6896f7a88 R15:
ffff8dd688cf90b8
[ 3876.799620] FS:
00007f23ddd97700(0000) GS:
ffff8dda20200000(0000) knlGS:
0000000000000000
[ 3876.799643] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 3876.799660] CR2:
00007f23d4024000 CR3:
0000000710bb0005 CR4:
00000000003606f0
[ 3876.799682] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 3876.799703] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 3876.799723] Call Trace:
[ 3876.799735] ? do_raw_spin_unlock+0x49/0xc0
[ 3876.799749] ? _raw_spin_unlock+0x24/0x30
[ 3876.799779] resolve_indirect_refs+0x1eb/0xc80 [btrfs]
[ 3876.799810] find_parent_nodes+0x38d/0x1180 [btrfs]
[ 3876.799841] btrfs_check_shared+0x11a/0x1d0 [btrfs]
[ 3876.799870] ? extent_fiemap+0x598/0x6e0 [btrfs]
[ 3876.799895] extent_fiemap+0x598/0x6e0 [btrfs]
[ 3876.799913] do_vfs_ioctl+0x45a/0x700
[ 3876.799926] ksys_ioctl+0x70/0x80
[ 3876.799938] ? trace_hardirqs_off_thunk+0x1a/0x20
[ 3876.799953] __x64_sys_ioctl+0x16/0x20
[ 3876.799965] do_syscall_64+0x62/0x220
[ 3876.799977] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3876.799993] RIP: 0033:0x7f23e0013dd7
(...)
[ 3876.800056] RSP: 002b:
00007f23ddd96ca8 EFLAGS:
00000246 ORIG_RAX:
0000000000000010
[ 3876.800078] RAX:
ffffffffffffffda RBX:
00007f23d80210f8 RCX:
00007f23e0013dd7
[ 3876.800099] RDX:
00007f23d80210f8 RSI:
00000000c020660b RDI:
0000000000000003
[ 3876.800626] RBP:
000055fa2a2a2440 R08:
0000000000000000 R09:
00007f23ddd96d7c
[ 3876.801143] R10:
00007f23d8022000 R11:
0000000000000246 R12:
00007f23ddd96d80
[ 3876.801662] R13:
00007f23ddd96d78 R14:
00007f23d80210f0 R15:
00007f23ddd96d80
(...)
[ 3876.805107] ---[ end trace
e53161e179ef04f9 ]---
Fix that by saving the root's header owner field into a local variable
before freeing the root's extent buffer, and then use that local variable
when needed.
Fixes:
30b0463a9394d9 ("Btrfs: fix accessing the root pointer in tree mod log functions")
CC: stable@vger.kernel.org # 3.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe Leroy [Wed, 21 Aug 2019 15:05:55 +0000 (15:05 +0000)]
btrfs: fix allocation of free space cache v1 bitmap pages
commit
3acd48507dc43eeeb0a1fe965b8bad91cab904a7 upstream.
Various notifications of type "BUG kmalloc-4096 () : Redzone
overwritten" have been observed recently in various parts of the kernel.
After some time, it has been made a relation with the use of BTRFS
filesystem and with SLUB_DEBUG turned on.
[ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten
[ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
[ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
[ 22.811193] __slab_alloc.constprop.26+0x44/0x70
[ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec
[ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs]
[ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs]
[ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs]
[ 22.812321] find_free_extent+0x680/0x12a4 [btrfs]
[ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs]
[ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
[ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs]
[ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs]
[ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs]
[ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs]
[ 22.813973] close_ctree+0xf8/0x2a4 [btrfs]
[ 22.814107] generic_shutdown_super+0x80/0x110
[ 22.814250] kill_anon_super+0x18/0x30
[ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs]
[ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
[ 22.814841] proc_cgroup_show+0xc0/0x248
[ 22.814967] proc_single_show+0x54/0x98
[ 22.815086] seq_read+0x278/0x45c
[ 22.815190] __vfs_read+0x28/0x17c
[ 22.815289] vfs_read+0xa8/0x14c
[ 22.815381] ksys_read+0x50/0x94
[ 22.815475] ret_from_syscall+0x0/0x38
Commit
69d2480456d1 ("btrfs: use copy_page for copying pages instead of
memcpy") changed the way bitmap blocks are copied. But allthough bitmaps
have the size of a page, they were allocated with kzalloc().
Most of the time, kzalloc() allocates aligned blocks of memory, so
copy_page() can be used. But when some debug options like SLAB_DEBUG are
activated, kzalloc() may return unaligned pointer.
On powerpc, memcpy(), copy_page() and other copying functions use
'dcbz' instruction which provides an entire zeroed cacheline to avoid
memory read when the intention is to overwrite a full line. Functions
like memcpy() are writen to care about partial cachelines at the start
and end of the destination, but copy_page() assumes it gets pages. As
pages are naturally cache aligned, copy_page() doesn't care about
partial lines. This means that when copy_page() is called with a
misaligned pointer, a few leading bytes are zeroed.
To fix it, allocate bitmaps through kmem_cache instead of using kzalloc()
The cache pool is created with PAGE_SIZE alignment constraint.
Reported-by: Erhard F. <erhard_f@mailbox.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371
Fixes:
69d2480456d1 ("btrfs: use copy_page for copying pages instead of memcpy")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
[ rename to btrfs_free_space_bitmap ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Salyzyn [Thu, 29 Aug 2019 18:30:14 +0000 (11:30 -0700)]
ovl: filter of trusted xattr results in audit
commit
5c2e9f346b815841f9bed6029ebcb06415caf640 upstream.
When filtering xattr list for reading, presence of trusted xattr
results in a security audit log. However, if there is other content
no errno will be set, and if there isn't, the errno will be -ENODATA
and not -EPERM as is usually associated with a lack of capability.
The check does not block the request to list the xattrs present.
Switch to ns_capable_noaudit to reflect a more appropriate check.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org # v3.18+
Fixes:
a082c6f680da ("ovl: filter trusted xattr for non-admin")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ding Xiang [Mon, 9 Sep 2019 08:29:56 +0000 (16:29 +0800)]
ovl: Fix dereferencing possible ERR_PTR()
commit
97f024b9171e74c4443bbe8a8dce31b917f97ac5 upstream.
if ovl_encode_real_fh() fails, no memory was allocated
and the error in the error-valued pointer should be returned.
Fixes:
9b6faee07470 ("ovl: check ERR_PTR() return value from ovl_encode_fh()")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve French [Thu, 12 Sep 2019 02:46:20 +0000 (21:46 -0500)]
smb3: allow disabling requesting leases
commit
3e7a02d47872081f4b6234a9f72500f1d10f060c upstream.
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens. Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yufen Yu [Fri, 27 Sep 2019 08:19:55 +0000 (16:19 +0800)]
block: fix null pointer dereference in blk_mq_rq_timed_out()
commit
8d6996630c03d7ceeabe2611378fea5ca1c3f1b3 upstream.
We got a null pointer deference BUG_ON in blk_mq_rq_timed_out()
as following:
[ 108.825472] BUG: kernel NULL pointer dereference, address:
0000000000000040
[ 108.827059] PGD 0 P4D 0
[ 108.827313] Oops: 0000 [#1] SMP PTI
[ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[ 108.829503] Workqueue: kblockd blk_mq_timeout_work
[ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[ 108.838191] Call Trace:
[ 108.838406] bt_iter+0x74/0x80
[ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
[ 108.839074] ? __switch_to_asm+0x34/0x70
[ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
[ 108.840732] blk_mq_timeout_work+0x74/0x200
[ 108.841151] process_one_work+0x297/0x680
[ 108.841550] worker_thread+0x29c/0x6f0
[ 108.841926] ? rescuer_thread+0x580/0x580
[ 108.842344] kthread+0x16a/0x1a0
[ 108.842666] ? kthread_flush_work+0x170/0x170
[ 108.843100] ret_from_fork+0x35/0x40
The bug is caused by the race between timeout handle and completion for
flush request.
When timeout handle function blk_mq_rq_timed_out() try to read
'req->q->mq_ops', the 'req' have completed and reinitiated by next
flush request, which would call blk_rq_init() to clear 'req' as 0.
After commit
12f5b93145 ("blk-mq: Remove generation seqeunce"),
normal requests lifetime are protected by refcount. Until 'rq->ref'
drop to zero, the request can really be free. Thus, these requests
cannot been reused before timeout handle finish.
However, flush request has defined .end_io and rq->end_io() is still
called even if 'rq->ref' doesn't drop to zero. After that, the 'flush_rq'
can be reused by the next flush request handle, resulting in null
pointer deference BUG ON.
We fix this problem by covering flush request with 'rq->ref'.
If the refcount is not zero, flush_end_io() return and wait the
last holder recall it. To record the request status, we add a new
entry 'rq_status', which will be used in flush_end_io().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-------
v2:
- move rq_status from struct request to struct blk_flush_queue
v3:
- remove unnecessary '{}' pair.
v4:
- let spinlock to protect 'fq->rq_status'
v5:
- move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stefan Assmann [Wed, 21 Aug 2019 14:09:29 +0000 (16:09 +0200)]
i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
commit
a7542b87607560d0b89e7ff81d870bd6ff8835cb upstream.
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at
0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable@vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Wed, 25 Sep 2019 23:45:53 +0000 (16:45 -0700)]
memcg, kmem: do not fail __GFP_NOFAIL charges
commit
e55d9d9bfb69405bd7615c0f8d229d8fafb3e9b8 upstream.
Thomas has noticed the following NULL ptr dereference when using cgroup
v1 kmem limit:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
PGD 0
P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
RIP: 0010:create_empty_buffers+0x24/0x100
Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
RSP: 0018:
ffff927ac1b37bf8 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
fffff2d4429fd740 RCX:
0000000100097149
RDX:
0000000000000000 RSI:
0000000000000082 RDI:
ffff9075a99fbe00
RBP:
0000000000000000 R08:
fffff2d440949cc8 R09:
00000000000960c0
R10:
0000000000000002 R11:
0000000000000000 R12:
0000000000000000
R13:
ffff907601f18360 R14:
0000000000002000 R15:
0000000000001000
FS:
00007fb55b288bc0(0000) GS:
ffff90761f8c0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000008 CR3:
000000007aebc002 CR4:
00000000001606e0
Call Trace:
create_page_buffers+0x4d/0x60
__block_write_begin_int+0x8e/0x5a0
? ext4_inode_attach_jinode.part.82+0xb0/0xb0
? jbd2__journal_start+0xd7/0x1f0
ext4_da_write_begin+0x112/0x3d0
generic_perform_write+0xf1/0x1b0
? file_update_time+0x70/0x140
__generic_file_write_iter+0x141/0x1a0
ext4_file_write_iter+0xef/0x3b0
__vfs_write+0x17e/0x1e0
vfs_write+0xa5/0x1a0
ksys_write+0x57/0xd0
do_syscall_64+0x55/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Tetsuo then noticed that this is because the __memcg_kmem_charge_memcg
fails __GFP_NOFAIL charge when the kmem limit is reached. This is a wrong
behavior because nofail allocations are not allowed to fail. Normal
charge path simply forces the charge even if that means to cross the
limit. Kmem accounting should be doing the same.
Link: http://lkml.kernel.org/r/20190906125608.32129-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tetsuo Handa [Mon, 23 Sep 2019 22:37:08 +0000 (15:37 -0700)]
memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
commit
f9c645621a28e37813a1de96d9cbd89cde94a1e4 upstream.
Masoud Sharbiani noticed that commit
29ef680ae7c21110 ("memcg, oom: move
out_of_memory back to the charge path") broke memcg OOM called from
__xfs_filemap_fault() path. It turned out that try_charge() is retrying
forever without making forward progress because mem_cgroup_oom(GFP_NOFS)
cannot invoke the OOM killer due to commit
3da88fb3bacfaa33 ("mm, oom:
move GFP_NOFS check to out_of_memory").
Allowing forced charge due to being unable to invoke memcg OOM killer will
lead to global OOM situation. Also, just returning -ENOMEM will be risky
because OOM path is lost and some paths (e.g. get_user_pages()) will leak
-ENOMEM. Therefore, invoking memcg OOM killer (despite GFP_NOFS) will be
the only choice we can choose for now.
Until
29ef680ae7c21110, we were able to invoke memcg OOM killer when
GFP_KERNEL reclaim failed [1]. But since
29ef680ae7c21110, we need to
invoke memcg OOM killer when GFP_NOFS reclaim failed [2]. Although in the
past we did invoke memcg OOM killer for GFP_NOFS [3], we might get
pre-mature memcg OOM reports due to this patch.
[1]
leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
dump_stack+0x63/0x88
dump_header+0x67/0x27a
? mem_cgroup_scan_tasks+0x91/0xf0
oom_kill_process+0x210/0x410
out_of_memory+0x10a/0x2c0
mem_cgroup_out_of_memory+0x46/0x80
mem_cgroup_oom_synchronize+0x2e4/0x310
? high_work_func+0x20/0x20
pagefault_out_of_memory+0x31/0x76
mm_fault_error+0x55/0x115
? handle_mm_fault+0xfd/0x220
__do_page_fault+0x433/0x4e0
do_page_fault+0x22/0x30
? page_fault+0x8/0x30
page_fault+0x1e/0x30
RIP: 0033:0x4009f0
Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
RSP: 002b:
00007ffe29ae96f0 EFLAGS:
00010206
RAX:
000000000000001b RBX:
0000000000000000 RCX:
0000000001ce1000
RDX:
0000000000000000 RSI:
000000007fffffe5 RDI:
0000000000000000
RBP:
000000000000000c R08:
0000000000000000 R09:
00007f94be09220d
R10:
0000000000000002 R11:
0000000000000246 R12:
00000000000186a0
R13:
0000000000000003 R14:
00007f949d845000 R15:
0000000002800000
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 158965
memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB
Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child
Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB
oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[2]
leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0
CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
dump_stack+0x63/0x88
dump_header+0x67/0x27a
? mem_cgroup_scan_tasks+0x91/0xf0
oom_kill_process+0x210/0x410
out_of_memory+0x109/0x2d0
mem_cgroup_out_of_memory+0x46/0x80
try_charge+0x58d/0x650
? __radix_tree_replace+0x81/0x100
mem_cgroup_try_charge+0x7a/0x100
__add_to_page_cache_locked+0x92/0x180
add_to_page_cache_lru+0x4d/0xf0
iomap_readpages_actor+0xde/0x1b0
? iomap_zero_range_actor+0x1d0/0x1d0
iomap_apply+0xaf/0x130
iomap_readpages+0x9f/0x150
? iomap_zero_range_actor+0x1d0/0x1d0
xfs_vm_readpages+0x18/0x20 [xfs]
read_pages+0x60/0x140
__do_page_cache_readahead+0x193/0x1b0
ondemand_readahead+0x16d/0x2c0
page_cache_async_readahead+0x9a/0xd0
filemap_fault+0x403/0x620
? alloc_set_pte+0x12c/0x540
? _cond_resched+0x14/0x30
__xfs_filemap_fault+0x66/0x180 [xfs]
xfs_filemap_fault+0x27/0x30 [xfs]
__do_fault+0x19/0x40
__handle_mm_fault+0x8e8/0xb60
handle_mm_fault+0xfd/0x220
__do_page_fault+0x238/0x4e0
do_page_fault+0x22/0x30
? page_fault+0x8/0x30
page_fault+0x1e/0x30
RIP: 0033:0x4009f0
Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
RSP: 002b:
00007ffda45c9290 EFLAGS:
00010206
RAX:
000000000000001b RBX:
0000000000000000 RCX:
0000000001a1e000
RDX:
0000000000000000 RSI:
000000007fffffe5 RDI:
0000000000000000
RBP:
000000000000000c R08:
0000000000000000 R09:
00007f6d061ff20d
R10:
0000000000000002 R11:
0000000000000246 R12:
00000000000186a0
R13:
0000000000000003 R14:
00007f6ce59b2000 R15:
0000000002800000
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 7221
memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB
Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child
Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB
oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[3]
leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0
leaker cpuset=/ mems_allowed=0
CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
[<
ffffffffaf364147>] dump_stack+0x19/0x1b
[<
ffffffffaf35eb6a>] dump_header+0x90/0x229
[<
ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0
[<
ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[<
ffffffffaedbb904>] oom_kill_process+0x254/0x3d0
[<
ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570
[<
ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0
[<
ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90
[<
ffffffffaf35d072>] mm_fault_error+0x6a/0x157
[<
ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0
[<
ffffffffaf371925>] do_page_fault+0x35/0x90
[<
ffffffffaf36d768>] page_fault+0x28/0x30
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 20628
memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB
Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child
Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB
Bisected by Masoud Sharbiani.
Link: http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
Fixes:
3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory") [necessary after
29ef680ae7c21110]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Masoud Sharbiani <msharbiani@apple.com>
Tested-by: Masoud Sharbiani <msharbiani@apple.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bob Peterson [Thu, 12 Sep 2019 17:54:27 +0000 (13:54 -0400)]
gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
commit
f0b444b349e33ae0d3dd93e25ca365482a5d17d4 upstream.
In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.
This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.
Fixes:
d552a2b9b33e ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede [Sun, 21 Jul 2019 13:19:18 +0000 (15:19 +0200)]
efifb: BGRT: Improve efifb_bgrt_sanity_check
commit
51677dfcc17f88ed754143df670ff064eae67f84 upstream.
For various reasons, at least with x86 EFI firmwares, the xoffset and
yoffset in the BGRT info are not always reliable.
Extensive testing has shown that when the info is correct, the
BGRT image is always exactly centered horizontally (the yoffset variable
is more variable and not always predictable).
This commit simplifies / improves the bgrt_sanity_check to simply
check that the BGRT image is exactly centered horizontally and skips
(re)drawing it when it is not.
This fixes the BGRT image sometimes being drawn in the wrong place.
Cc: stable@vger.kernel.org
Fixes:
88fe4ceb2447 ("efifb: BGRT: Do not copy the boot graphics for non native resolutions")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Peter Jones <pjones@redhat.com>,
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190721131918.10115-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Brown [Wed, 4 Sep 2019 12:42:50 +0000 (13:42 +0100)]
regulator: Defer init completion for a while after late_initcall
commit
55576cf1853798e86f620766e23b604c9224c19c upstream.
The kernel has no way of knowing when we have finished instantiating
drivers, between deferred probe and systems that build key drivers as
modules we might be doing this long after userspace has booted. This has
always been a bit of an issue with regulator_init_complete since it can
power off hardware that's not had it's driver loaded which can result in
user visible effects, the main case is powering off displays. Practically
speaking it's not been an issue in real systems since most systems that
use the regulator API are embedded and build in key drivers anyway but
with Arm laptops coming on the market it's becoming more of an issue so
let's do something about it.
In the absence of any better idea just defer the powering off for 30s
after late_initcall(), this is obviously a hack but it should mask the
issue for now and it's no more arbitrary than late_initcall() itself.
Ideally we'd have some heuristics to detect if we're on an affected
system and tune or skip the delay appropriately, and there may be some
need for a command line option to be added.
Link: https://lore.kernel.org/r/20190904124250.25844-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Lee Jones <lee.jones@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thadeu Lima de Souza Cascardo [Tue, 3 Sep 2019 17:18:02 +0000 (14:18 -0300)]
alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
commit
f18ddc13af981ce3c7b7f26925f099e7c6929aba upstream.
ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.
On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
"524 Unknown error 524"
Replace it with EOPNOTSUPP which results in the expected "95 Operation not
supported" error.
Fixes:
1c6b39ad3f01 (alarmtimers: Return -ENOTSUPP if no RTC device is present)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190903171802.28314-1-cascardo@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shawn Lin [Fri, 30 Aug 2019 00:26:47 +0000 (08:26 +0800)]
arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
commit
03e61929c0d227ed3e1c322fc3804216ea298b7e upstream.
150MHz is a fundamental limitation of RK3328 Soc, w/o this limitation,
eMMC, for instance, will run into 200MHz clock rate in HS200 mode, which
makes the RK3328 boards not always boot properly. By adding it in
rk3328.dtsi would also obviate the worry of missing it when adding new
boards.
Fixes:
52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
Cc: stable@vger.kernel.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Liang Chen <cl@rock-chips.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Thu, 22 Aug 2019 14:03:45 +0000 (15:03 +0100)]
arm64: tlb: Ensure we execute an ISB following walk cache invalidation
commit
51696d346c49c6cf4f29e9b20d6e15832a2e3408 upstream.
05f2d2f83b5a ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
added a new TLB invalidation helper which is used when freeing
intermediate levels of page table used for kernel mappings, but is
missing the required ISB instruction after completion of the TLBI
instruction.
Add the missing barrier.
Cc: <stable@vger.kernel.org>
Fixes:
05f2d2f83b5a ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Thu, 22 Aug 2019 13:58:37 +0000 (14:58 +0100)]
Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
commit
d0b7a302d58abe24ed0f32a0672dd4c356bb73db upstream.
This reverts commit
24fe1b0efad4fcdd32ce46cffeab297f22581707.
Commit
24fe1b0efad4fcdd ("arm64: Remove unnecessary ISBs from
set_{pte,pmd,pud}") removed ISB instructions immediately following updates
to the page table, on the grounds that they are not required by the
architecture and a DSB alone is sufficient to ensure that subsequent data
accesses use the new translation:
DDI0487E_a, B2-128:
| ... no instruction that appears in program order after the DSB
| instruction can alter any state of the system or perform any part of
| its functionality until the DSB completes other than:
|
| * Being fetched from memory and decoded
| * Reading the general-purpose, SIMD and floating-point,
| Special-purpose, or System registers that are directly or indirectly
| read without causing side-effects.
However, the same document also states the following:
DDI0487E_a, B2-125:
| DMB and DSB instructions affect reads and writes to the memory system
| generated by Load/Store instructions and data or unified cache
| maintenance instructions being executed by the PE. Instruction fetches
| or accesses caused by a hardware translation table access are not
| explicit accesses.
which appears to claim that the DSB alone is insufficient. Unfortunately,
some CPU designers have followed the second clause above, whereas in Linux
we've been relying on the first. This means that our mapping sequence:
MOV X0, <valid pte>
STR X0, [Xptep] // Store new PTE to page table
DSB ISHST
LDR X1, [X2] // Translates using the new PTE
can actually raise a translation fault on the load instruction because the
translation can be performed speculatively before the page table update and
then marked as "faulting" by the CPU. For user PTEs, this is ok because we
can handle the spurious fault, but for kernel PTEs and intermediate table
entries this results in a panic().
Revert the offending commit to reintroduce the missing barriers.
Cc: <stable@vger.kernel.org>
Fixes:
24fe1b0efad4fcdd ("arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Luis Araneda [Thu, 8 Aug 2019 12:52:43 +0000 (08:52 -0400)]
ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
commit
b7005d4ef4f3aa2dc24019ffba03a322557ac43d upstream.
This fixes a kernel panic on memcpy when
FORTIFY_SOURCE is enabled.
The initial smp implementation on commit
aa7eb2bb4e4a
("arm: zynq: Add smp support")
used memcpy, which worked fine until commit
ee333554fed5
("ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE")
enabled overflow checks at runtime, producing a read
overflow panic.
The computed size of memcpy args are:
- p_size (dst):
4294967295 = (size_t) -1
- q_size (src): 1
- size (len): 8
Additionally, the memory is marked as __iomem, so one of
the memcpy_* functions should be used for read/write.
Fixes:
aa7eb2bb4e4a ("arm: zynq: Add smp support")
Signed-off-by: Luis Araneda <luaraneda@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lihua Yao [Sat, 7 Sep 2019 03:30:01 +0000 (03:30 +0000)]
ARM: samsung: Fix system restart on S3C6410
commit
16986074035cc0205472882a00d404ed9d213313 upstream.
S3C6410 system restart is triggered by watchdog reset.
Cc: <stable@vger.kernel.org>
Fixes:
9f55342cc2de ("ARM: dts: s3c64xx: Fix infinite interrupt in soft mode")
Signed-off-by: Lihua Yao <ylhuajnu@outlook.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amadeusz Sławiński [Tue, 27 Aug 2019 14:17:08 +0000 (16:17 +0200)]
ASoC: Intel: Fix use of potentially uninitialized variable
commit
810f3b860850148788fc1ed8a6f5f807199fed65 upstream.
If ipc->ops.reply_msg_match is NULL, we may end up using uninitialized
mask value.
reported by smatch:
sound/soc/intel/common/sst-ipc.c:266 sst_ipc_reply_find_msg() error: uninitialized symbol 'mask'.
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@intel.com>
Link: https://lore.kernel.org/r/20190827141712.21015-3-amadeuszx.slawinski@linux.intel.com
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amadeusz Sławiński [Tue, 27 Aug 2019 14:17:07 +0000 (16:17 +0200)]
ASoC: Intel: Skylake: Use correct function to access iomem space
commit
17d29ff98fd4b70e9ccdac5e95e18a087e2737ef upstream.
For copying from __iomem, we should use __ioread32_copy.
reported by sparse:
sound/soc/intel/skylake/skl-debug.c:437:34: warning: incorrect type in argument 1 (different address spaces)
sound/soc/intel/skylake/skl-debug.c:437:34: expected void [noderef] <asn:2> *to
sound/soc/intel/skylake/skl-debug.c:437:34: got unsigned char *
sound/soc/intel/skylake/skl-debug.c:437:51: warning: incorrect type in argument 2 (different address spaces)
sound/soc/intel/skylake/skl-debug.c:437:51: expected void const *from
sound/soc/intel/skylake/skl-debug.c:437:51: got void [noderef] <asn:2> *[assigned] fw_reg_addr
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@intel.com>
Link: https://lore.kernel.org/r/20190827141712.21015-2-amadeuszx.slawinski@linux.intel.com
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amadeusz Sławiński [Tue, 27 Aug 2019 14:17:12 +0000 (16:17 +0200)]
ASoC: Intel: NHLT: Fix debug print format
commit
855a06da37a773fd073d51023ac9d07988c87da8 upstream.
oem_table_id is 8 chars long, so we need to limit it, otherwise it
may print some unprintable characters into dmesg.
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@intel.com>
Link: https://lore.kernel.org/r/20190827141712.21015-7-amadeuszx.slawinski@linux.intel.com
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees Cook [Thu, 26 Sep 2019 17:15:25 +0000 (10:15 -0700)]
binfmt_elf: Do not move brk for INTERP-less ET_EXEC
commit
7be3cb019db1cbd5fd5ffe6d64a23fefa4b6f229 upstream.
When brk was moved for binaries without an interpreter, it should have
been limited to ET_DYN only. In other words, the special case was an
ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
The bug manifested for giant static executables, where the brk would end
up in the middle of the text area on 32-bit architectures.
Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in>
Fixes:
bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Wed, 19 Jun 2019 13:24:17 +0000 (10:24 -0300)]
media: don't drop front-end reference count for ->detach
commit
14e3cdbb00a885eedc95c0cf8eda8fe28d26d6b4 upstream.
A bugfix introduce a link failure in configurations without CONFIG_MODULES:
In file included from drivers/media/usb/dvb-usb/pctv452e.c:20:0:
drivers/media/usb/dvb-usb/pctv452e.c: In function 'pctv452e_frontend_attach':
drivers/media/dvb-frontends/stb0899_drv.h:151:36: error: weak declaration of 'stb0899_attach' being applied to a already existing, static definition
The problem is that the !IS_REACHABLE() declaration of stb0899_attach()
is a 'static inline' definition that clashes with the weak definition.
I further observed that the bugfix was only done for one of the five users
of stb0899_attach(), the other four still have the problem. This reverts
the bugfix and instead addresses the problem by not dropping the reference
count when calling '->detach()', instead we call this function directly
in dvb_frontend_put() before dropping the kref on the front-end.
I first submitted this in early 2018, and after some discussion it
was apparently discarded. While there is a long-term plan in place,
that plan is obviously not nearing completion yet, and the current
kernel is still broken unless this patch is applied.
Link: https://patchwork.kernel.org/patch/10140175/
Link: https://patchwork.linuxtv.org/patch/54831/
Cc: Max Kellermann <max.kellermann@gmail.com>
Cc: Wolfgang Rohdewald <wolfgang@rohdewald.de>
Cc: stable@vger.kernel.org
Fixes:
f686c14364ad ("[media] stb0899: move code to "detach" callback")
Fixes:
6cdeaed3b142 ("media: dvb_usb_pctv452e: module refcount changes were unbalanced")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede [Sun, 18 Aug 2019 15:03:23 +0000 (12:03 -0300)]
media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
commit
7e0bb5828311f811309bed5749528ca04992af2f upstream.
Like a bunch of other MSI laptops the MS-1039 uses a 0c45:627b
SN9C201 + OV7660 webcam which is mounted upside down.
Add it to the sn9c20x flip_dmi_table to deal with this.
Cc: stable@vger.kernel.org
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Christopherson [Tue, 3 Sep 2019 23:36:45 +0000 (16:36 -0700)]
KVM: x86: Manually calculate reserved bits when loading PDPTRS
commit
16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
Manually generate the PDPTR reserved bit mask when explicitly loading
PDPTRs. The reserved bits that are being tracked by the MMU reflect the
current paging mode, which is unlikely to be PAE paging in the vast
majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
__set_sregs(), etc... This can cause KVM to incorrectly signal a bad
PDPTR, or more likely, miss a reserved bit check and subsequently fail
a VM-Enter due to a bad VMCS.GUEST_PDPTR.
Add a one off helper to generate the reserved bits instead of sharing
code across the MMU's calculations and the PDPTR emulation. The PDPTR
reserved bits are basically set in stone, and pushing a helper into
the MMU's calculation adds unnecessary complexity without improving
readability.
Oppurtunistically fix/update the comment for load_pdptrs().
Note, the buggy commit also introduced a deliberate functional change,
"Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
effectively (and correctly) reverted by commit
cd9ae5fe47df ("KVM: x86:
Fix page-tables reserved bits"). A bit of SDM archaeology shows that
the SDM from late 2008 had a bug (likely a copy+paste error) where it
listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and
always have been reserved.
Fixes:
20c466b56168d ("KVM: Use rsvd_bits_mask in load_pdptrs()")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Reported-by: Doug Reiland <doug.reiland@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Dakinevich [Tue, 27 Aug 2019 13:07:08 +0000 (13:07 +0000)]
KVM: x86: set ctxt->have_exception in x86_decode_insn()
commit
c8848cee74ff05638e913582a476bde879c968ad upstream.
x86_emulate_instruction() takes into account ctxt->have_exception flag
during instruction decoding, but in practice this flag is never set in
x86_decode_insn().
Fixes:
6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: stable@vger.kernel.org
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Dakinevich [Tue, 27 Aug 2019 13:07:09 +0000 (13:07 +0000)]
KVM: x86: always stop emulation on page fault
commit
8530a79c5a9f4e29e6ffb35ec1a79d81f4968ec8 upstream.
inject_emulated_exception() returns true if and only if nested page
fault happens. However, page fault can come from guest page tables
walk, either nested or not nested. In both cases we should stop an
attempt to read under RIP and give guest to step over its own page
fault handler.
This is also visible when an emulated instruction causes a #GP fault
and the VMware backdoor is enabled. To handle the VMware backdoor,
KVM intercepts #GP faults; with only the next patch applied,
x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL
instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi()
(or gp_interception() for SVM) to re-inject the original #GP because it
thinks emulation failed due to a non-VMware opcode. This patch prevents
the issue as x86_emulate_instruction() will return EMULATE_DONE after
injecting the #GP.
Fixes:
6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: stable@vger.kernel.org
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Helge Deller [Thu, 5 Sep 2019 14:44:17 +0000 (16:44 +0200)]
parisc: Disable HP HSC-PCI Cards to prevent kernel crash
commit
5fa1659105fac63e0f3c199b476025c2e04111ce upstream.
The HP Dino PCI controller chip can be used in two variants: as on-board
controller (e.g. in B160L), or on an Add-On card ("Card-Mode") to bridge
PCI components to systems without a PCI bus, e.g. to a HSC/GSC bus. One
such Add-On card is the HP HSC-PCI Card which has one or more DEC Tulip
PCI NIC chips connected to the on-card Dino PCI controller.
Dino in Card-Mode has a big disadvantage: All PCI memory accesses need
to go through the DINO_MEM_DATA register, so Linux drivers will not be
able to use the ioremap() function. Without ioremap() many drivers will
not work, one example is the tulip driver which then simply crashes the
kernel if it tries to access the ports on the HP HSC card.
This patch disables the HP HSC card if it finds one, and as such
fixes the kernel crash on a HP D350/2 machine.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Phil Scarr <phil.scarr@pm.me>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vasily Averin [Fri, 13 Sep 2019 15:17:11 +0000 (18:17 +0300)]
fuse: fix missing unlock_page in fuse_writepage()
commit
d5880c7a8620290a6c90ced7a0e8bd0ad9419601 upstream.
unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes:
ff17be086477 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Madhavan Srinivasan [Tue, 27 Aug 2019 10:16:35 +0000 (15:46 +0530)]
powerpc/imc: Dont create debugfs files for cpu-less nodes
commit
41ba17f20ea835c489e77bd54e2da73184e22060 upstream.
Commit <
684d984038aa> ('powerpc/powernv: Add debugfs interface for
imc-mode and imc') added debugfs interface for the nest imc pmu
devices to support changing of different ucode modes. Primarily adding
this capability for debug. But when doing so, the code did not
consider the case of cpu-less nodes. So when reading the _cmd_ or
_mode_ file of a cpu-less node will create this crash.
Faulting instruction address: 0xc0000000000d0d58
Oops: Kernel access of bad area, sig: 11 [#1]
...
CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-
20190627+ #19
NIP:
c0000000000d0d58 LR:
c00000000049aa18 CTR:
c0000000000d0d50
REGS:
c00020194548f9e0 TRAP: 0300 Not tainted (5.2.0-rc6-next-
20190627+)
MSR:
9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:
28022822 XER:
00000000
CFAR:
c00000000049aa14 DAR:
000000000003fc08 DSISR:
40000000 IRQMASK: 0
...
NIP imc_mem_get+0x8/0x20
LR simple_attr_read+0x118/0x170
Call Trace:
simple_attr_read+0x70/0x170 (unreliable)
debugfs_attr_read+0x6c/0xb0
__vfs_read+0x3c/0x70
vfs_read+0xbc/0x1a0
ksys_read+0x7c/0x140
system_call+0x5c/0x70
Patch fixes the issue with a more robust check for vbase to NULL.
Before patch, ls output for the debugfs imc directory
# ls /sys/kernel/debug/powerpc/imc/
imc_cmd_0 imc_cmd_251 imc_cmd_253 imc_cmd_255 imc_mode_0 imc_mode_251 imc_mode_253 imc_mode_255
imc_cmd_250 imc_cmd_252 imc_cmd_254 imc_cmd_8 imc_mode_250 imc_mode_252 imc_mode_254 imc_mode_8
After patch, ls output for the debugfs imc directory
# ls /sys/kernel/debug/powerpc/imc/
imc_cmd_0 imc_cmd_8 imc_mode_0 imc_mode_8
Actual bug here is that, we have two loops with potentially different
loop counts. That is, in imc_get_mem_addr_nest(), loop count is
obtained from the dt entries. But in case of export_imc_mode_and_cmd(),
loop was based on for_each_nid() count. Patch fixes the loop count in
latter based on the struct mem_info. Ideally it would be better to
have array size in struct imc_pmu.
Fixes:
684d984038aa ('powerpc/powernv: Add debugfs interface for imc-mode and imc')
Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190827101635.6942-1-maddy@linux.vnet.ibm.com
Cc: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ming Lei [Thu, 25 Jul 2019 02:05:00 +0000 (10:05 +0800)]
scsi: implement .cleanup_rq callback
[ Upstream commit
b7e9e1fb7a9227be34ad4a5e778022c3164494cf ]
Implement .cleanup_rq() callback for freeing driver private part
of the request. Then we can avoid to leak this part if the request isn't
completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes:
396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ming Lei [Thu, 25 Jul 2019 02:04:59 +0000 (10:04 +0800)]
blk-mq: add callback of .cleanup_rq
[ Upstream commit
226b4fc75c78f9c497c5182d939101b260cfb9f3 ]
SCSI maintains its own driver private data hooked off of each SCSI
request, and the pridate data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq). Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq. A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes:
396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jan-Marek Glogowski [Sun, 15 Sep 2019 14:57:28 +0000 (16:57 +0200)]
ALSA: hda/realtek - PCI quirk for Medion E4254
[ Upstream commit
bd9c10bc663dd2eaac8fe39dad0f18cd21527446 ]
The laptop has a combined jack to attach headsets on the right.
The BIOS encodes them as two different colored jacks at the front,
but otherwise it seems to be configured ok. But any adaption of
the pins config on its own doesn't fix the jack detection to work
in Linux. Still Windows works correct.
This is somehow fixed by chaining ALC256_FIXUP_ASUS_HEADSET_MODE,
which seems to register the microphone jack as a headset part and
also results in fixing jack sensing, visible in dmesg as:
-snd_hda_codec_realtek hdaudioC0D0: Mic=0x19
+snd_hda_codec_realtek hdaudioC0D0: Headset Mic=0x19
[ Actually the essential change is the location of the jack; the
driver created "Front Mic Jack" without the matching volume / mute
control element due to its jack location, which confused PA.
-- tiwai ]
Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/8f4f9b20-0aeb-f8f1-c02f-fd53c09679f1@fbihome.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yan, Zheng [Tue, 1 Oct 2019 21:24:25 +0000 (17:24 -0400)]
ceph: use ceph_evict_inode to cleanup inode's resource
[ Upstream commit
87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
freeing inode to remove its caps. But VFS wakes freeing inode waiters
before calling destroy_inode().
[ jlayton: mainline moved to ->free_inode before the original patch was
merged. This backport reinstates ceph_destroy_inode and just
has it do the call_rcu call. ]
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/40102
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sasha Levin [Tue, 1 Oct 2019 22:01:07 +0000 (18:01 -0400)]
Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
This reverts commit
812810399999a673d30f9d04d38659030a28051a.
The backport was incorrect and was causing kernel panics. Revert and
re-apply a correct backport from Jeff Layton.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Joonwon Kang [Sat, 27 Jul 2019 15:58:41 +0000 (00:58 +0900)]
randstruct: Check member structs in is_pure_ops_struct()
commit
60f2c82ed20bde57c362e66f796cf9e0e38a6dbb upstream.
While no uses in the kernel triggered this case, it was possible to have
a false negative where a struct contains other structs which contain only
function pointers because of unreachable code in is_pure_ops_struct().
Signed-off-by: Joonwon Kang <kjw1627@gmail.com>
Link: https://lore.kernel.org/r/20190727155841.GA13586@host
Fixes:
313dd1b62921 ("gcc-plugins: Add the randstruct plugin")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ira Weiny [Wed, 11 Sep 2019 11:30:53 +0000 (07:30 -0400)]
IB/hfi1: Define variables as unsigned long to fix KASAN warning
commit
f8659d68e2bee5b86a1beaf7be42d942e1fc81f4 upstream.
Define the working variables to be unsigned long to be compatible with
for_each_set_bit and change types as needed.
While we are at it remove unused variables from a couple of functions.
This was found because of the following KASAN warning:
==================================================================
BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
Read of size 8 at addr
ffff888362d778d0 by task kworker/u308:2/1889
CPU: 21 PID: 1889 Comm: kworker/u308:2 Tainted: G W 5.3.0-rc2-mm1+ #2
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.
102320141138 10/23/2014
Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
Call Trace:
dump_stack+0x9a/0xf0
? find_first_bit+0x19/0x70
print_address_description+0x6c/0x332
? find_first_bit+0x19/0x70
? find_first_bit+0x19/0x70
__kasan_report.cold.6+0x1a/0x3b
? find_first_bit+0x19/0x70
kasan_report+0xe/0x12
find_first_bit+0x19/0x70
pma_get_opa_portstatus+0x5cc/0xa80 [hfi1]
? ret_from_fork+0x3a/0x50
? pma_get_opa_port_ectrs+0x200/0x200 [hfi1]
? stack_trace_consume_entry+0x80/0x80
hfi1_process_mad+0x39b/0x26c0 [hfi1]
? __lock_acquire+0x65e/0x21b0
? clear_linkup_counters+0xb0/0xb0 [hfi1]
? check_chain_key+0x1d7/0x2e0
? lock_downgrade+0x3a0/0x3a0
? match_held_lock+0x2e/0x250
ib_mad_recv_done+0x698/0x15e0 [ib_core]
? clear_linkup_counters+0xb0/0xb0 [hfi1]
? ib_mad_send_done+0xc80/0xc80 [ib_core]
? mark_held_locks+0x79/0xa0
? _raw_spin_unlock_irqrestore+0x44/0x60
? rvt_poll_cq+0x1e1/0x340 [rdmavt]
__ib_process_cq+0x97/0x100 [ib_core]
ib_cq_poll_work+0x31/0xb0 [ib_core]
process_one_work+0x4ee/0xa00
? pwq_dec_nr_in_flight+0x110/0x110
? do_raw_spin_lock+0x113/0x1d0
worker_thread+0x57/0x5a0
? process_one_work+0xa00/0xa00
kthread+0x1bb/0x1e0
? kthread_create_on_node+0xc0/0xc0
ret_from_fork+0x3a/0x50
The buggy address belongs to the page:
page:
ffffea000d8b5dc0 refcount:0 mapcount:0 mapping:
0000000000000000 index:0x0
flags: 0x17ffffc0000000()
raw:
0017ffffc0000000 0000000000000000 ffffea000d8b5dc8 0000000000000000
raw:
0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
addr
ffff888362d778d0 is located in stack of task kworker/u308:2/1889 at offset 32 in frame:
pma_get_opa_portstatus+0x0/0xa80 [hfi1]
this frame has 1 object:
[32, 36) 'vl_select_mask'
Memory state around the buggy address:
ffff888362d77780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888362d77800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
ffff888362d77880: 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00
^
ffff888362d77900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888362d77980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
==================================================================
Cc: <stable@vger.kernel.org>
Fixes:
7724105686e7 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20190911113053.126040.47327.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Danit Goldberg [Mon, 16 Sep 2019 06:48:18 +0000 (09:48 +0300)]
IB/mlx5: Free mpi in mp_slave mode
commit
5d44adebbb7e785939df3db36ac360f5e8b73e44 upstream.
ib_add_slave_port() allocates a multiport struct but never frees it.
Don't leak memory, free the allocated mpi struct during driver unload.
Cc: <stable@vger.kernel.org>
Fixes:
32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vincent Whitchurch [Thu, 11 Jul 2019 14:29:37 +0000 (16:29 +0200)]
printk: Do not lose last line in kmsg buffer dump
commit
c9dccacfccc72c32692eedff4a27a4b0833a2afd upstream.
kmsg_dump_get_buffer() is supposed to select all the youngest log
messages which fit into the provided buffer. It determines the correct
start index by using msg_print_text() with a NULL buffer to calculate
the size of each entry. However, when performing the actual writes,
msg_print_text() only writes the entry to the buffer if the written len
is lesser than the size of the buffer. So if the lengths of the
selected youngest log messages happen to precisely fill up the provided
buffer, the last log message is not included.
We don't want to modify msg_print_text() to fill up the buffer and start
returning a length which is equal to the size of the buffer, since
callers of its other users, such as kmsg_dump_get_line(), depend upon
the current behaviour.
Instead, fix kmsg_dump_get_buffer() to compensate for this.
For example, with the following two final prints:
[ 6.427502]
AAAAAAAAAAAAA
[ 6.427769]
BBBBBBBB12345
A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
patch:
00000000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37 <0>[ 6.522197
00000010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a ]
AAAAAAAAAAAAA.
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
After this patch:
00000000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38 <0>[ 6.456678
00000010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a ]
BBBBBBBB12345.
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Link: http://lkml.kernel.org/r/20190711142937.4083-1-vincent.whitchurch@axis.com
Fixes:
e2ae715d66bf4bec ("kmsg - kmsg_dump() use iterator to receive log buffer content")
To: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quinn Tran [Fri, 26 Jul 2019 16:07:32 +0000 (09:07 -0700)]
scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
commit
8b5292bcfcacf15182a77a973a98d310e76fd58b upstream.
Relogin fails to move forward due to scan_state flag indicating device is
not there. Before relogin process, Session delete process accidently
modified the scan_state flag.
[mkp: typos plus corrected Fixes: sha as reported by sfr]
Fixes:
2dee5521028c ("scsi: qla2xxx: Fix login state machine freeze")
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Martin Wilck [Wed, 4 Sep 2019 15:52:29 +0000 (15:52 +0000)]
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
commit
57adf5d4cfd3198aa480e7c94a101fc8c4e6109d upstream.
cdb in send_mode_select() is not zeroed and is only partially filled in
rdac_failover_get(), which leads to some random data getting to the
device. Users have reported storage responding to such commands with
INVALID FIELD IN CDB. Code before commit
327825574132 was not affected, as
it called blk_rq_set_block_pc().
Fix this by zeroing out the cdb first.
Identified & fix proposed by HPE.
Fixes:
327825574132 ("scsi_dh_rdac: switch to scsi_execute_req_flags()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20190904155205.1666-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Ales Novak <alnovak@suse.cz>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Sakamoto [Tue, 10 Sep 2019 13:51:52 +0000 (22:51 +0900)]
ALSA: firewire-tascam: check intermediate state of clock status and retry
commit
e1a00b5b253a4f97216b9a33199a863987075162 upstream.
2 bytes in MSB of register for clock status is zero during intermediate
state after changing status of sampling clock in models of TASCAM FireWire
series. The duration of this state differs depending on cases. During the
state, it's better to retry reading the register for current status of
the clock.
In current implementation, the intermediate state is checked only when
getting current sampling transmission frequency, then retry reading.
This care is required for the other operations to read the register.
This commit moves the codes of check and retry into helper function
commonly used for operations to read the register.
Fixes:
e453df44f0d6 ("ALSA: firewire-tascam: add PCM functionality")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20190910135152.29800-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Sakamoto [Tue, 10 Sep 2019 13:51:51 +0000 (22:51 +0900)]
ALSA: firewire-tascam: handle error code when getting current source of clock
commit
2617120f4de6d0423384e0e86b14c78b9de84d5a upstream.
The return value of snd_tscm_stream_get_clock() is ignored. This commit
checks the value and handle error.
Fixes:
e453df44f0d6 ("ALSA: firewire-tascam: add PCM functionality")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20190910135152.29800-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Luca Coelho [Tue, 24 Sep 2019 10:30:57 +0000 (13:30 +0300)]
iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
commit
fddbfeece9c7882cc47754c7da460fe427e3e85b upstream.
The intention was to have the GEO_TX_POWER_LIMIT command in FW version
36 as well, but not all 8000 family got this feature enabled. The
8000 family is the only one using version 36, so skip this version
entirely. If we try to send this command to the firmwares that do not
support it, we get a BAD_COMMAND response from the firmware.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=204151.
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MyungJoo Ham [Mon, 26 Aug 2019 12:37:37 +0000 (21:37 +0900)]
PM / devfreq: passive: fix compiler warning
[ Upstream commit
0465814831a926ce2f83e8f606d067d86745234e ]
The recent commit of
PM / devfreq: passive: Use non-devm notifiers
had incurred compiler warning, "unused variable 'dev'".
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sakari Ailus [Wed, 7 Aug 2019 14:19:00 +0000 (11:19 -0300)]
media: omap3isp: Set device on omap3isp subdevs
[ Upstream commit
e9eb103f027725053a4b02f93d7f2858b56747ce ]
The omap3isp driver registered subdevs without the dev field being set. Do
that now.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qu Wenruo [Tue, 16 Jul 2019 09:00:33 +0000 (17:00 +0800)]
btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
[ Upstream commit
2a28468e525f3924efed7f29f2bc5a2926e7e19a ]
[BUG]
With fuzzed image and MIXED_GROUPS super flag, we can hit the following
BUG_ON():
kernel BUG at fs/btrfs/delayed-ref.c:491!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 1849 Comm: sync Tainted: G O 5.2.0-custom #27
RIP: 0010:update_existing_head_ref.cold+0x44/0x46 [btrfs]
Call Trace:
add_delayed_ref_head+0x20c/0x2d0 [btrfs]
btrfs_add_delayed_tree_ref+0x1fc/0x490 [btrfs]
btrfs_free_tree_block+0x123/0x380 [btrfs]
__btrfs_cow_block+0x435/0x500 [btrfs]
btrfs_cow_block+0x110/0x240 [btrfs]
btrfs_search_slot+0x230/0xa00 [btrfs]
? __lock_acquire+0x105e/0x1e20
btrfs_insert_empty_items+0x67/0xc0 [btrfs]
alloc_reserved_file_extent+0x9e/0x340 [btrfs]
__btrfs_run_delayed_refs+0x78e/0x1240 [btrfs]
? kvm_clock_read+0x18/0x30
? __sched_clock_gtod_offset+0x21/0x50
btrfs_run_delayed_refs.part.0+0x4e/0x180 [btrfs]
btrfs_run_delayed_refs+0x23/0x30 [btrfs]
btrfs_commit_transaction+0x53/0x9f0 [btrfs]
btrfs_sync_fs+0x7c/0x1c0 [btrfs]
? __ia32_sys_fdatasync+0x20/0x20
sync_fs_one_sb+0x23/0x30
iterate_supers+0x95/0x100
ksys_sync+0x62/0xb0
__ia32_sys_sync+0xe/0x20
do_syscall_64+0x65/0x240
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
This situation is caused by several factors:
- Fuzzed image
The extent tree of this fs missed one backref for extent tree root.
So we can allocated space from that slot.
- MIXED_BG feature
Super block has MIXED_BG flag.
- No mixed block groups exists
All block groups are just regular ones.
This makes data space_info->block_groups[] contains metadata block
groups. And when we reserve space for data, we can use space in
metadata block group.
Then we hit the following file operations:
- fallocate
We need to allocate data extents.
find_free_extent() choose to use the metadata block to allocate space
from, and choose the space of extent tree root, since its backref is
missing.
This generate one delayed ref head with is_data = 1.
- extent tree update
We need to update extent tree at run_delayed_ref time.
This generate one delayed ref head with is_data = 0, for the same
bytenr of old extent tree root.
Then we trigger the BUG_ON().
[FIX]
The quick fix here is to check block_group->flags before using it.
The problem can only happen for MIXED_GROUPS fs. Regular filesystems
won't have space_info with DATA|METADATA flag, and no way to hit the
bug.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203255
Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kai-Heng Feng [Wed, 21 Aug 2019 05:10:04 +0000 (13:10 +0800)]
iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
[ Upstream commit
93d051550ee02eaff9a2541d825605a7bd778027 ]
Raven Ridge systems may have malfunction touchpad or hang at boot if
incorrect IVRS IOAPIC is provided by BIOS.
Users already found correct "ivrs_ioapic=" values, let's put them inside
kernel to workaround buggy BIOS.
BugLink: https://bugs.launchpad.net/bugs/1795292
BugLink: https://bugs.launchpad.net/bugs/1837688
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Takashi Iwai [Thu, 22 Aug 2019 07:58:07 +0000 (09:58 +0200)]
ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
[ Upstream commit
051c78af14fcd74a22b5af45548ad9d588247cc7 ]
Lenovo ThinkCentre M73 and M93 don't seem to have a proper beep
although the driver tries to probe and set up blindly.
Blacklist these machines for suppressing the beep creation.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204635
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Tomas Bortoli [Wed, 31 Jul 2019 15:19:05 +0000 (12:19 -0300)]
media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
[ Upstream commit
a10feaf8c464c3f9cfdd3a8a7ce17e1c0d498da1 ]
The function at issue does not always initialize each byte allocated
for 'b' and can therefore leak uninitialized memory to a USB device in
the call to usb_bulk_msg()
Use kzalloc() instead of kmalloc()
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+0522702e9d67142379f1@syzkaller.appspotmail.com
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ahzo [Mon, 5 Aug 2019 19:14:18 +0000 (21:14 +0200)]
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
[ Upstream commit
f659bb6dae58c113805f92822e4c16ddd3156b79 ]
This fixes screen corruption/flickering on 75 Hz displays.
v2: make print statement debug only (Alex)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102646
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Ahzo <Ahzo@tutanota.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Takashi Iwai [Tue, 13 Aug 2019 15:11:28 +0000 (17:11 +0200)]
ALSA: hda - Drop unsol event handler for Intel HDMI codecs
[ Upstream commit
f2dbe87c5ac1f88e6007ba1f1374f4bd8a197fb6 ]
We don't need to deal with the unsol events for Intel chips that are
tied with the graphics via audio component notifier. Although the
presence of the audio component is checked at the beginning of
hdmi_unsol_event(), better to short cut by dropping unsol_event ops.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204565
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kai-Heng Feng [Mon, 8 Jul 2019 04:55:45 +0000 (12:55 +0800)]
e1000e: add workaround for possible stalled packet
[ Upstream commit
e5e9a2ecfe780975820e157b922edee715710b66 ]
This works around a possible stalled packet issue, which may occur due to
clock recovery from the PCH being too slow, when the LAN is transitioning
from K1 at 1G link speed.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204057
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kevin Easton [Wed, 10 Jul 2019 13:31:38 +0000 (13:31 +0000)]
libertas: Add missing sentinel at end of if_usb.c fw_table
[ Upstream commit
764f3f1ecffc434096e0a2b02f1a6cc964a89df6 ]
This sentinel tells the firmware loading process when to stop.
Reported-and-tested-by: syzbot+98156c174c5a2cad9f8f@syzkaller.appspotmail.com
Signed-off-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nigel Croxon [Fri, 6 Sep 2019 13:21:33 +0000 (09:21 -0400)]
raid5: don't increment read_errors on EILSEQ return
[ Upstream commit
b76b4715eba0d0ed574f58918b29c1b2f0fa37a8 ]
While MD continues to count read errors returned by the lower layer.
If those errors are -EILSEQ, instead of -EIO, it should NOT increase
the read_errors count.
When RAID6 is set up on dm-integrity target that detects massive
corruption, the leg will be ejected from the array. Even if the
issue is correctable with a sector re-write and the array has
necessary redundancy to correct it.
The leg is ejected because it runs up the rdev->read_errors beyond
conf->max_nr_stripes. The return status in dm-drypt when there is
a data integrity error is -EILSEQ (BLK_STS_PROTECTION).
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ulf Hansson [Sun, 8 Sep 2019 10:12:27 +0000 (12:12 +0200)]
mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
[ Upstream commit
7c526608d5afb62cbc967225e2ccaacfdd142e9d ]
In cases when SDIO IRQs have been enabled, runtime suspend is prevented by
the driver. However, this still means dw_mci_runtime_suspend|resume() gets
called during system suspend/resume, via pm_runtime_force_suspend|resume().
This means during system suspend/resume, the register context of the dw_mmc
device most likely loses its register context, even in cases when SDIO IRQs
have been enabled.
To re-enable the SDIO IRQs during system resume, the dw_mmc driver
currently relies on the mmc core to re-enable the SDIO IRQs when it resumes
the SDIO card, but this isn't the recommended solution. Instead, it's
better to deal with this locally in the dw_mmc driver, so let's do that.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ulf Hansson [Sun, 8 Sep 2019 10:12:26 +0000 (12:12 +0200)]
mmc: core: Add helper function to indicate if SDIO IRQs is enabled
[ Upstream commit
bd880b00697befb73eff7220ee20bdae4fdd487b ]
To avoid each host driver supporting SDIO IRQs, from keeping track
internally about if SDIO IRQs has been claimed, let's introduce a common
helper function, sdio_irq_claimed().
The function returns true if SDIO IRQs are claimed, via using the
information about the number of claimed irqs. This is safe, even without
any locks, as long as the helper function is called only from
runtime/system suspend callbacks of the host driver.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Al Cooper [Tue, 3 Sep 2019 11:51:14 +0000 (07:51 -0400)]
mmc: sdhci: Fix incorrect switch to HS mode
[ Upstream commit
c894e33ddc1910e14d6f2a2016f60ab613fd8b37 ]
When switching from any MMC speed mode that requires 1.8v
(HS200, HS400 and HS400ES) to High Speed (HS) mode, the system
ends up configured for SDR12 with a 50MHz clock which is an illegal
mode.
This happens because the SDHCI_CTRL_VDD_180 bit in the
SDHCI_HOST_CONTROL2 register is left set and when this bit is
set, the speed mode is controlled by the SDHCI_CTRL_UHS field
in the SDHCI_HOST_CONTROL2 register. The SDHCI_CTRL_UHS field
will end up being set to 0 (SDR12) by sdhci_set_uhs_signaling()
because there is no UHS mode being set.
The fix is to change sdhci_set_uhs_signaling() to set the
SDHCI_CTRL_UHS field to SDR25 (which is the same as HS) for
any switch to HS mode.
This was found on a new eMMC controller that does strict checking
of the speed mode and the corresponding clock rate. It caused the
switch to HS400 mode to fail because part of the sequence to switch
to HS400 requires a switch from HS200 to HS before going to HS400.
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ulf Hansson [Sun, 8 Sep 2019 10:12:30 +0000 (12:12 +0200)]
mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
[ Upstream commit
36d57efb4af534dd6b442ea0b9a04aa6dfa37abe ]
The sdio_irq_pending flag is used to let host drivers indicate that it has
signaled an IRQ. If that is the case and we only have a single SDIO func
that have claimed an SDIO IRQ, our assumption is that we can avoid reading
the SDIO_CCCR_INTx register and just call the SDIO func irq handler
immediately. This makes sense, but the flag is set/cleared in a somewhat
messy order, let's fix that up according to below.
First, the flag is currently set in sdio_run_irqs(), which is executed as a
work that was scheduled from sdio_signal_irq(). To make it more implicit
that the host have signaled an IRQ, let's instead immediately set the flag
in sdio_signal_irq(). This also makes the behavior consistent with host
drivers that uses the legacy, mmc_signal_sdio_irq() API. This have no
functional impact, because we don't expect host drivers to call
sdio_signal_irq() until after the work (sdio_run_irqs()) have been executed
anyways.
Second, currently we never clears the flag when using the sdio_run_irqs()
work, but only when using the sdio_irq_thread(). Let make the behavior
consistent, by moving the flag to be cleared inside the common
process_sdio_pending_irqs() function. Additionally, tweak the behavior of
the flag slightly, by avoiding to clear it unless we processed the SDIO
IRQ. The purpose with this at this point, is to keep the information about
whether there have been an SDIO IRQ signaled by the host, so at system
resume we can decide to process it without reading the SDIO_CCCR_INTx
register.
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Guoqing Jiang [Wed, 11 Sep 2019 08:06:29 +0000 (10:06 +0200)]
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
[ Upstream commit
6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ]
If stripe in batch list is set with STRIPE_HANDLE flag, then the stripe
could be set with STRIPE_ACTIVE by the handle_stripe function. And if
error happens to the batch_head at the same time, break_stripe_batch_list
is called, then below warning could happen (the same report in [1]), it
means a member of batch list was set with STRIPE_ACTIVE.
[7028915.431770] stripe state: 2001
[7028915.431815] ------------[ cut here ]------------
[7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456]
[...]
[7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G O 4.14.86-1-storage #4.14.86-1.2~deb9
[7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[7028915.431888] Workqueue: raid5wq raid5_do_work [raid456]
[7028915.431890] task:
ffff9ab0ef36d7c0 task.stack:
ffffb72926f84000
[7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456]
[7028915.431898] RSP: 0018:
ffffb72926f87ba8 EFLAGS:
00010286
[7028915.431900] RAX:
0000000000000012 RBX:
ffff9aaa84a98000 RCX:
0000000000000000
[7028915.431901] RDX:
0000000000000000 RSI:
ffff9ab2bfa15458 RDI:
ffff9ab2bfa15458
[7028915.431902] RBP:
ffff9aaa8fb4e900 R08:
0000000000000001 R09:
0000000000002eb4
[7028915.431903] R10:
00000000ffffffff R11:
0000000000000000 R12:
ffff9ab1736f1b00
[7028915.431904] R13:
0000000000000000 R14:
ffff9aaa8fb4e900 R15:
0000000000000001
[7028915.431906] FS:
0000000000000000(0000) GS:
ffff9ab2bfa00000(0000) knlGS:
0000000000000000
[7028915.431907] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[7028915.431908] CR2:
00007ff953b9f5d8 CR3:
0000000bf4009002 CR4:
00000000003606e0
[7028915.431909] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[7028915.431910] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[7028915.431910] Call Trace:
[7028915.431923] handle_stripe+0x8e7/0x2020 [raid456]
[7028915.431930] ? __wake_up_common_lock+0x89/0xc0
[7028915.431935] handle_active_stripes.isra.58+0x35f/0x560 [raid456]
[7028915.431939] raid5_do_work+0xc6/0x1f0 [raid456]
Also commit
59fc630b8b5f9f ("RAID5: batch adjacent full stripe write")
said "If a stripe is added to batch list, then only the first stripe
of the list should be put to handle_list and run handle_stripe."
So don't set STRIPE_HANDLE to stripe which is already in batch list,
otherwise the stripe could be put to handle_list and run handle_stripe,
then the above warning could be triggered.
[1]. https://www.spinics.net/lists/raid/msg62552.html
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Peter Ujfalusi [Fri, 6 Sep 2019 05:55:24 +0000 (08:55 +0300)]
ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
[ Upstream commit
2ec42f3147e1610716f184b02e65d7f493eed925 ]
Some tools use the snd_pcm_info_get_name() to try to identify PCMs or for
other purposes.
Currently it is left empty with the dmaengine-pcm, in this case copy the
pcm->id string as pcm->name.
For example IGT is using this to find the HDMI PCM for testing audio on it.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reported-by: Arthur She <arthur.she@linaro.org>
Link: https://lore.kernel.org/r/20190906055524.7393-1-peter.ujfalusi@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
M. Vefa Bicakci [Fri, 16 Aug 2019 01:41:39 +0000 (21:41 -0400)]
platform/x86: intel_pmc_core: Do not ioremap RAM
[ Upstream commit
7d505758b1e556cdf65a5e451744fe0ae8063d17 ]
On a Xen-based PVH virtual machine with more than 4 GiB of RAM,
intel_pmc_core fails initialization with the following warning message
from the kernel, indicating that the driver is attempting to ioremap
RAM:
ioremap on RAM at 0x00000000fe000000 - 0x00000000fe001fff
WARNING: CPU: 1 PID: 434 at arch/x86/mm/ioremap.c:186 __ioremap_caller.constprop.0+0x2aa/0x2c0
...
Call Trace:
? pmc_core_probe+0x87/0x2d0 [intel_pmc_core]
pmc_core_probe+0x87/0x2d0 [intel_pmc_core]
This issue appears to manifest itself because of the following fallback
mechanism in the driver:
if (lpit_read_residency_count_address(&slp_s0_addr))
pmcdev->base_addr = PMC_BASE_ADDR_DEFAULT;
The validity of address PMC_BASE_ADDR_DEFAULT (i.e., 0xFE000000) is not
verified by the driver, which is what this patch introduces. With this
patch, if address PMC_BASE_ADDR_DEFAULT is in RAM, then the driver will
not attempt to ioremap the aforementioned address.
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Gayatri Kammela [Thu, 5 Sep 2019 19:30:17 +0000 (12:30 -0700)]
x86/cpu: Add Tiger Lake to Intel family
[ Upstream commit
6e1c32c5dbb4b90eea8f964c2869d0bde050dbe0 ]
Add the model numbers/CPUIDs of Tiger Lake mobile and desktop to the
Intel family.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190905193020.14707-2-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Harald Freudenberger [Thu, 5 Sep 2019 07:38:17 +0000 (09:38 +0200)]
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
[ Upstream commit
9e323d45ba94262620a073a3f9945ca927c07c71 ]
With 'extra run-time crypto self tests' enabled, the selftest
for s390-xts fails with
alg: skcipher: xts-aes-s390 encryption unexpectedly succeeded on
test vector "random: len=0 klen=64"; expected_error=-22,
cfg="random: inplace use_digest nosimd src_divs=[2.61%@+4006,
84.44%@+21, 1.55%@+13, 4.50%@+344, 4.26%@+21, 2.64%@+27]"
This special case with nbytes=0 is not handled correctly and this
fix now makes sure that -EINVAL is returned when there is en/decrypt
called with 0 bytes to en/decrypt.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Masami Hiramatsu [Tue, 3 Sep 2019 11:08:21 +0000 (20:08 +0900)]
kprobes: Prohibit probing on BUG() and WARN() address
[ Upstream commit
e336b4027775cb458dc713745e526fa1a1996b2a ]
Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to
get the address where the BUG() has occurred, kprobes can not
do single-step out-of-line that instruction. So prohibit
probing on such address.
Without this fix, if someone put a kprobe on WARN(), the
kernel will crash with invalid opcode error instead of
outputing warning message, because kernel can not find
correct bug address.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Peter Ujfalusi [Fri, 23 Aug 2019 12:56:14 +0000 (15:56 +0300)]
dmaengine: ti: edma: Do not reset reserved paRAM slots
[ Upstream commit
c5dbe60664b3660f5ac5854e21273ea2e7ff698f ]
Skip resetting paRAM slots marked as reserved as they might be used by
other cores.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20190823125618.8133-2-peter.ujfalusi@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yufen Yu [Tue, 3 Sep 2019 13:12:41 +0000 (21:12 +0800)]
md/raid1: fail run raid1 array when active disk less than one
[ Upstream commit
07f1a6850c5d5a65c917c3165692b5179ac4cb6b ]
When run test case:
mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal
mdadm -S /dev/md1
mdadm -A /dev/md1 /dev/sd[b-c] --run --force
mdadm --zero /dev/sda
mdadm /dev/md1 -a /dev/sda
echo offline > /sys/block/sdc/device/state
echo offline > /sys/block/sdb/device/state
sleep 5
mdadm -S /dev/md1
echo running > /sys/block/sdb/device/state
echo running > /sys/block/sdc/device/state
mdadm -A /dev/md1 /dev/sd[a-c] --run --force
mdadm run fail with kernel message as follow:
[ 172.986064] md: kicking non-fresh sdb from array!
[ 173.004210] md: kicking non-fresh sdc from array!
[ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors
[ 173.022406] md1: failed to create bitmap (-5)
In fact, when active disk in raid1 array less than one, we
need to return fail in raid1_run().
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Wang Shenran [Wed, 24 Jul 2019 08:01:10 +0000 (11:01 +0300)]
hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
[ Upstream commit
6e4d91aa071810deac2cd052161aefb376ecf04e ]
At boot time, the acpi_power_meter driver logs the following error level
message: "Ignoring unsafe software power cap". Having read about it from
a few sources, it seems that the error message can be quite misleading.
While the message can imply that Linux is ignoring the fact that the
system is operating in potentially dangerous conditions, the truth is
the driver found an ACPI_PMC object that supports software power
capping. The driver simply decides not to use it, perhaps because it
doesn't support the object.
The best solution is probably changing the log level from error to warning.
All sources I have found, regarding the error, have downplayed its
significance. There is not much of a reason for it to be on error level,
while causing potential confusions or misinterpretations.
Signed-off-by: Wang Shenran <shenran268@gmail.com>
Link: https://lore.kernel.org/r/20190724080110.6952-1-shenran268@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kent Overstreet [Tue, 3 Sep 2019 13:25:45 +0000 (21:25 +0800)]
closures: fix a race on wakeup from closure_sync
[ Upstream commit
a22a9602b88fabf10847f238ff81fde5f906fef7 ]
The race was when a thread using closure_sync() notices cl->s->done == 1
before the thread calling closure_put() calls wake_up_process(). Then,
it's possible for that thread to return and exit just before
wake_up_process() is called - so we're trying to wake up a process that
no longer exists.
rcu_read_lock() is sufficient to protect against this, as there's an rcu
barrier somewhere in the process teardown path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Wenwen Wang [Wed, 21 Aug 2019 03:44:19 +0000 (22:44 -0500)]
ACPI / PCI: fix acpi_pci_irq_enable() memory leak
[ Upstream commit
29b49958cf73b439b17fa29e9a25210809a6c01c ]
In acpi_pci_irq_enable(), 'entry' is allocated by kzalloc() in
acpi_pci_irq_check_entry() (invoked from acpi_pci_irq_lookup()). However,
it is not deallocated if acpi_pci_irq_valid() returns false, leading to a
memory leak. To fix this issue, free 'entry' before returning 0.
Fixes:
e237a5518425 ("x86/ACPI/PCI: Recognize that Interrupt Line 255 means "not connected"")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Wenwen Wang [Fri, 16 Aug 2019 05:08:27 +0000 (00:08 -0500)]
ACPI: custom_method: fix memory leaks
[ Upstream commit
03d1571d9513369c17e6848476763ebbd10ec2cb ]
In cm_write(), 'buf' is allocated through kzalloc(). In the following
execution, if an error occurs, 'buf' is not deallocated, leading to memory
leaks. To fix this issue, free 'buf' before returning the error.
Fixes:
526b4af47f44 ("ACPI: Split out custom_method functionality into an own driver")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Marek Szyprowski [Fri, 30 Aug 2019 12:52:42 +0000 (14:52 +0200)]
ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
[ Upstream commit
5b0eeeaa37615df37a9a30929b73e9defe61ca84 ]
Commit
aff138bf8e37 ("ARM: dts: exynos: Add TMU nodes regulator supply
for Peach boards") assigned LDO10 to Exynos Thermal Measurement Unit,
but it turned out that it supplies also some other critical parts and
board freezes/crashes when it is turned off.
The mentioned commit made Exynos TMU a consumer of that regulator and in
typical case Exynos TMU driver keeps it enabled from early boot. However
there are such configurations (example is multi_v7_defconfig), in which
some of the regulators are compiled as modules and are not available
from early boot. In such case it may happen that LDO10 is turned off by
regulator core, because it has no consumers yet (in this case consumer
drivers cannot get it, because the supply regulators for it are not yet
available). This in turn causes the board to crash. This patch restores
'always-on' property for the LDO10 regulator.
Fixes:
aff138bf8e37 ("ARM: dts: exynos: Add TMU nodes regulator supply for Peach boards")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Tzvetomir Stoyanov [Mon, 5 Aug 2019 20:43:15 +0000 (16:43 -0400)]
libtraceevent: Change users plugin directory
[ Upstream commit
e97fd1383cd77c467d2aed7fa4e596789df83977 ]
To be compliant with XDG user directory layout, the user's plugin
directory is changed from ~/.traceevent/plugins to
~/.local/lib/traceevent/plugins/
Suggested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick McLean <chutzpah@gentoo.org>
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/linux-trace-devel/20190313144206.41e75cf8@patrickm/
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-4-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.344622683@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Wed, 28 Aug 2019 13:13:38 +0000 (06:13 -0700)]
iommu/iova: Avoid false sharing on fq_timer_on
[ Upstream commit
0d87308cca2c124f9bce02383f1d9632c9be89c4 ]
In commit
14bd9a607f90 ("iommu/iova: Separate atomic variables
to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
in queue_iova() was causing a performance loss and moved critical fields
so that the false sharing would not impact them.
However, avoiding the false sharing in the first place seems easy.
We should attempt the atomic_cmpxchg() no more than 100 times
per second. Adding an atomic_read() will keep the cache
line mostly shared.
This false sharing came with commit
9a005a800ae8
("iommu/iova: Add flush timer").
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
9a005a800ae8 ('iommu/iova: Add flush timer')
Cc: Jinyu Qi <jinyuqi@huawei.com>
Cc: Joerg Roedel <jroedel@suse.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Dan Williams [Thu, 29 Aug 2019 23:30:34 +0000 (16:30 -0700)]
libata/ahci: Drop PCS quirk for Denverton and beyond
[ Upstream commit
c312ef176399e04fc5f7f2809d9a589751fbf6d9 ]
The Linux ahci driver has historically implemented a configuration fixup
for platforms / platform-firmware that fails to enable the ports prior
to OS hand-off at boot. The fixup was originally implemented way back
before ahci moved from drivers/scsi/ to drivers/ata/, and was updated in
2007 via commit
49f290903935 "ahci: update PCS programming". The quirk
sets a port-enable bitmap in the PCS register at offset 0x92.
This quirk could be applied generically up until the arrival of the
Denverton (DNV) platform. The DNV AHCI controller architecture supports
more than 6 ports and along with that the PCS register location and
format were updated to allow for more possible ports in the bitmap. DNV
AHCI expands the register to 32-bits and moves it to offset 0x94.
As it stands there are no known problem reports with existing Linux
trying to set bits at offset 0x92 which indicates that the quirk is not
applicable. Likely it is not applicable on a wider range of platforms,
but it is difficult to discern which platforms if any still depend on
the quirk.
Rather than try to fix the PCS quirk to consider the DNV register layout
instead require explicit opt-in. The assumption is that the OS driver
need not touch this register, and platforms can be added with a new
boad_ahci_pcs7 board-id when / if problematic platforms are found in the
future. The logic in ahci_intel_pcs_quirk() looks for all Intel AHCI
instances with "legacy" board-ids and otherwise skips the quirk if the
board was matched by class-code.
Reported-by: Stephen Douthit <stephend@silicom-usa.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qian Cai [Wed, 28 Aug 2019 21:39:43 +0000 (17:39 -0400)]
iommu/amd: Silence warnings under memory pressure
[ Upstream commit
3d708895325b78506e8daf00ef31549476e8586a ]
When running heavy memory pressure workloads, the system is throwing
endless warnings,
smartpqi 0000:23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages:
5 reason: -12)
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0,4
Call Trace:
<IRQ>
dump_stack+0x62/0x9a
warn_alloc.cold.43+0x8a/0x148
__alloc_pages_nodemask+0x1a5c/0x1bb0
get_zeroed_page+0x16/0x20
iommu_map_page+0x477/0x540
map_sg+0x1ce/0x2f0
scsi_dma_map+0xc6/0x160
pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
do_IRQ+0x81/0x170
common_interrupt+0xf/0xf
</IRQ>
because the allocation could fail from iommu_map_page(), and the volume
of this call could be huge which may generate a lot of serial console
output and cosumes all CPUs.
Fix it by silencing the warning in this call site, and there is still a
dev_err() later to notify the failure.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Takashi Sakamoto [Fri, 30 Aug 2019 01:14:56 +0000 (10:14 +0900)]
ALSA: firewire-motu: add support for MOTU 4pre
[ Upstream commit
6af86bdb8ad41f4cf1292d3b10857dc322758328 ]
MOTU 4pre was launched in 2012 by MOTU, Inc. This commit allows userspace
applications can transmit and receive PCM frames and MIDI messages for
this model via ALSA PCM interface and RawMidi/Sequencer interfaces.
The device supports MOTU protocol version 3. Unlike the other devices, the
device is simply designed. The size of data block is fixed to 10 quadlets
during available sampling rates (44.1 - 96.0 kHz). Each data block
includes 1 source packet header, 2 data chunks for messages, 8 data chunks
for PCM samples and 2 data chunks for padding to quadlet alignment. The
device has no MIDI, optical, BNC and AES/EBU interfaces.
Like support for the other MOTU devices, the quality of playback sound
is not enough good with periodical noise yet.
$ python2 crpp < ~/git/am-config-rom/motu/motu-4pre.img
ROM header and bus information block
-----------------------------------------------------------------
400
041078cc bus_info_length 4, crc_length 16, crc 30924
404
31333934 bus_name "1394"
408
20ff7000 irmc 0, cmc 0, isc 1, bmc 0, cyc_clk_acc 255, max_rec 7 (256)
40c
0001f200 company_id 0001f2 |
410
000a41c5 device_id
00000a41c5 | EUI-64
0001f200000a41c5
root directory
-----------------------------------------------------------------
414
0004ef04 directory_length 4, crc 61188
418
030001f2 vendor
41c
0c0083c0 node capabilities per IEEE 1394
420
d1000002 --> unit directory at 428
424
8d000005 --> eui-64 leaf at 438
unit directory at 428
-----------------------------------------------------------------
428
0003ceda directory_length 3, crc 52954
42c
120001f2 specifier id
430
13000045 version
434
17103800 model
eui-64 leaf at 438
-----------------------------------------------------------------
438
0002d248 leaf_length 2, crc 53832
43c
0001f200 company_id 0001f2 |
440
000a41c5 device_id
00000a41c5 | EUI-64
0001f200000a41c5
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Anton Eidelman [Fri, 16 Aug 2019 20:00:10 +0000 (13:00 -0700)]
nvme-multipath: fix ana log nsid lookup when nsid is not found
[ Upstream commit
e01f91dff91c7b16a6e3faf2565017d497a73f83 ]
ANA log parsing invokes nvme_update_ana_state() per ANA group desc.
This updates the state of namespaces with nsids in desc->nsids[].
Both ctrl->namespaces list and desc->nsids[] array are sorted by nsid.
Hence nvme_update_ana_state() performs a single walk over ctrl->namespaces:
- if current namespace matches the current desc->nsids[n],
this namespace is updated, and n is incremented.
- the process stops when it encounters the end of either
ctrl->namespaces end or desc->nsids[]
In case desc->nsids[n] does not match any of ctrl->namespaces,
the remaining nsids following desc->nsids[n] will not be updated.
Such situation was considered abnormal and generated WARN_ON_ONCE.
However ANA log MAY contain nsids not (yet) found in ctrl->namespaces.
For example, lets consider the following scenario:
- nvme0 exposes namespaces with nsids = [2, 3] to the host
- a new namespace nsid = 1 is added dynamically
- also, a ANA topology change is triggered
- NS_CHANGED aen is generated and triggers scan_work
- before scan_work discovers nsid=1 and creates a namespace, a NOTICE_ANA
aen was issues and ana_work receives ANA log with nsids=[1, 2, 3]
Result: ana_work fails to update ANA state on existing namespaces [2, 3]
Solution:
Change the way nvme_update_ana_state() namespace list walk
checks the current namespace against desc->nsids[n] as follows:
a) ns->head->ns_id < desc->nsids[n]: keep walking ctrl->namespaces.
b) ns->head->ns_id == desc->nsids[n]: match, update the namespace
c) ns->head->ns_id >= desc->nsids[n]: skip to desc->nsids[n+1]
This enables correct operation in the scenario described above.
This also allows ANA log to contain nsids currently invisible
to the host, i.e. inactive nsids.
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Tom Wu [Thu, 8 Aug 2019 02:22:36 +0000 (02:22 +0000)]
nvmet: fix data units read and written counters in SMART log
[ Upstream commit
3bec2e3754becebd4c452999adb49bc62c575ea4 ]
In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
This value is reported in thousands (i.e., a value of 1
corresponds to 1000 units of 512 bytes read) and is rounded up.
However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.
Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Song Liu [Wed, 28 Aug 2019 21:54:55 +0000 (23:54 +0200)]
x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
[ Upstream commit
825d0b73cd7526b0bb186798583fae810091cbac ]
pti_clone_pmds() assumes that the supplied address is either:
- properly PUD/PMD aligned
or
- the address is actually mapped which means that independently
of the mapping level (PUD/PMD/PTE) the next higher mapping
exists.
If that's not the case the unaligned address can be incremented by PUD or
PMD size incorrectly. All callers supply mapped and/or aligned addresses,
but for the sake of robustness it's better to handle that case properly and
to emit a warning.
[ tglx: Rewrote changelog and added WARN_ON_ONCE() ]
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282352470.1938@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Shengjiu Wang [Wed, 28 Aug 2019 17:20:17 +0000 (13:20 -0400)]
ASoC: fsl_ssi: Fix clock control issue in master mode
[ Upstream commit
696d05225cebffd172008d212657be90e823eac0 ]
The test case is
arecord -Dhw:0 -d 10 -f S16_LE -r 48000 -c 2 temp.wav &
aplay -Dhw:0 -d 30 -f S16_LE -r 48000 -c 2 test.wav
There will be error after end of arecord:
aplay: pcm_write:2051: write error: Input/output error
Capture and Playback work in parallel in master mode, one
substream stops, the other substream is impacted, the
reason is that clock is disabled wrongly.
The clock's reference count is not increased when second
substream starts, the hw_param() function returns in the
beginning because first substream is enabled, then in end
of first substream, the hw_free() disables the clock.
This patch is to move the clock enablement to the place
before checking of the device enablement in hw_param().
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1567012817-12625-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Thomas Gleixner [Wed, 28 Aug 2019 14:24:47 +0000 (16:24 +0200)]
x86/mm/pti: Do not invoke PTI functions when PTI is disabled
[ Upstream commit
990784b57731192b7d90c8d4049e6318d81e887d ]
When PTI is disabled at boot time either because the CPU is not affected or
PTI has been disabled on the command line, the boot code still calls into
pti_finalize() which then unconditionally invokes:
pti_clone_entry_text()
pti_clone_kernel_text()
pti_clone_kernel_text() was called unconditionally before the 32bit support
was added and 32bit added the call to pti_clone_entry_text().
The call has no side effects as cloning the page tables into the available
second one, which was allocated for PTI does not create damage. But it does
not make sense either and in case that this functionality would be extended
later this might actually lead to hard to diagnose issues.
Neither function should be called when PTI is runtime disabled. Make the
invocation conditional.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190828143124.063353972@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mark Rutland [Tue, 27 Aug 2019 17:12:57 +0000 (18:12 +0100)]
arm64: kpti: ensure patched kernel text is fetched from PoU
[ Upstream commit
f32c7a8e45105bd0af76872bf6eef0438ff12fb2 ]
While the MMUs is disabled, I-cache speculation can result in
instructions being fetched from the PoC. During boot we may patch
instructions (e.g. for alternatives and jump labels), and these may be
dirty at the PoU (and stale at the PoC).
Thus, while the MMU is disabled in the KPTI pagetable fixup code we may
load stale instructions into the I-cache, potentially leading to
subsequent crashes when executing regions of code which have been
modified at runtime.
Similarly to commit:
8ec41987436d566f ("arm64: mm: ensure patched kernel text is fetched from PoU")
... we can invalidate the I-cache after enabling the MMU to prevent such
issues.
The KPTI pagetable fixup code itself should be clean to the PoC per the
boot protocol, so no maintenance is required for this code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Neil Horman [Thu, 22 Aug 2019 14:34:21 +0000 (10:34 -0400)]
x86/apic/vector: Warn when vector space exhaustion breaks affinity
[ Upstream commit
743dac494d61d991967ebcfab92e4f80dc7583b3 ]
On x86, CPUs are limited in the number of interrupts they can have affined
to them as they only support 256 interrupt vectors per CPU. 32 vectors are
reserved for the CPU and the kernel reserves another 22 for internal
purposes. That leaves 202 vectors for assignement to devices.
When an interrupt is set up or the affinity is changed by the kernel or the
administrator, the vector assignment code attempts to honor the requested
affinity mask. If the vector space on the CPUs in that affinity mask is
exhausted the code falls back to a wider set of CPUs and assigns a vector
on a CPU outside of the requested affinity mask silently.
While the effective affinity is reflected in the corresponding
/proc/irq/$N/effective_affinity* files the silent breakage of the requested
affinity can lead to unexpected behaviour for administrators.
Add a pr_warn() when this happens so that adminstrators get at least
informed about it in the syslog.
[ tglx: Massaged changelog and made the pr_warn() more informative ]
Reported-by: djuran@redhat.com
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: djuran@redhat.com
Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>