Filipe Manana [Tue, 11 Aug 2020 11:43:48 +0000 (12:43 +0100)]
btrfs: do not commit logs and transactions during link and rename operations
Since commit
d4682ba03ef618 ("Btrfs: sync log after logging new name") we
started to commit logs, and fallback to transaction commits when we failed
to log the new names or commit the logs, after link and rename operations
when the target inodes (or their parents) were previously logged in the
current transaction. This was to avoid losing directories despite an
explicit fsync on them when they are ancestors of some inode that got a
new named logged, due to a link or rename operation. However that adds the
cost of starting IO and waiting for it to complete, which can cause higher
latencies for applications.
Instead of doing that, just make sure that when we log a new name for an
inode we don't mark any of its ancestors as logged, so that if any one
does an fsync against any of them, without doing any other change on them,
the fsync commits the log. This way we only pay the cost of a log commit
(or a transaction commit if something goes wrong or a new block group was
created) if the application explicitly asks to fsync any of the parent
directories.
Using dbench, which mixes several filesystems operations including renames,
revealed some significant latency gains. The following script that uses
dbench was used to test this:
#!/bin/bash
DEV=/dev/nvme0n1
MNT=/mnt/btrfs
MOUNT_OPTIONS="-o ssd -o space_cache=v2"
MKFS_OPTIONS="-m single -d single"
THREADS=16
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
mkfs.btrfs -f $MKFS_OPTIONS $DEV
mount $MOUNT_OPTIONS $DEV $MNT
dbench -t 300 -D $MNT $THREADS
umount $MNT
The test was run on bare metal, no virtualization, on a box with 12 cores
(Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
configuration that is the default of typical distributions (debian in this
case), without debug options enabled (kasan, kmemleak, slub debug, debug
of page allocations, lock debugging, etc).
Results before this patch:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX
10750455 0.011 155.088
Close 7896674 0.001 0.243
Rename 455222 2.158 1101.947
Unlink 2171189 0.067 121.638
Deltree 256 2.425 7.816
Mkdir 128 0.002 0.003
Qpathinfo 9744323 0.006 21.370
Qfileinfo 1707092 0.001 0.146
Qfsinfo 1786756 0.001 11.228
Sfileinfo 875612 0.003 21.263
Find 3767281 0.025 9.617
WriteX 5356924 0.011 211.390
ReadX
16852694 0.003 9.442
LockX 35008 0.002 0.119
UnlockX 35008 0.001 0.138
Flush 753458 4.252 1102.249
Throughput 1128.35 MB/sec 16 clients 16 procs max_latency=1102.255 ms
Results after this patch:
16 clients, after
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX
11471098 0.012 448.281
Close 8426396 0.001 0.925
Rename 485746 0.123 267.183
Unlink 2316477 0.080 63.433
Deltree 288 2.830 11.144
Mkdir 144 0.003 0.010
Qpathinfo
10397420 0.006 10.288
Qfileinfo 1822039 0.001 0.169
Qfsinfo 1906497 0.002 14.039
Sfileinfo 934433 0.004 2.438
Find 4019879 0.026 10.200
WriteX 5718932 0.011 200.985
ReadX
17981671 0.003 10.036
LockX 37352 0.002 0.076
UnlockX 37352 0.001 0.109
Flush 804018 5.015 778.033
Throughput 1201.98 MB/sec 16 clients 16 procs max_latency=778.036 ms
(+6.5% throughput, -29.4% max latency, -75.8% rename latency)
Test case generic/498 from fstests tests the scenario that the previously
mentioned commit fixed.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 11 Aug 2020 11:43:37 +0000 (12:43 +0100)]
btrfs: do not take the log_mutex of the subvolume when pinning the log
During a rename we pin the log to make sure no one commits a log that
reflects an ongoing rename operation, as it might result in a committed
log where it recorded the unlink of the old name without having recorded
the new name. However we are taking the subvolume's log_mutex before
incrementing the log_writers counter, which is not necessary since that
counter is atomic and we only remove the old name from the log and add
the new name to the log after we have incremented log_writers, ensuring
that no one can commit the log after we have removed the old name from
the log and before we added the new name to the log.
By taking the log_mutex lock we are just adding unnecessary contention on
the lock, which can become visible for workloads that mix renames with
fsyncs, writes for files opened with O_SYNC and unlink operations (if the
inode or its parent were fsynced before in the current transaction).
So just remove the lock and unlock of the subvolume's log_mutex at
btrfs_pin_log_trans().
Using dbench, which mixes different types of operations that end up taking
that mutex (fsyncs, renames, unlinks and writes into files opened with
O_SYNC) revealed some small gains. The following script that calls dbench
was used:
#!/bin/bash
DEV=/dev/nvme0n1
MNT=/mnt/btrfs
MOUNT_OPTIONS="-o ssd -o space_cache=v2"
MKFS_OPTIONS="-m single -d single"
THREADS=32
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
mkfs.btrfs -f $MKFS_OPTIONS $DEV
mount $MOUNT_OPTIONS $DEV $MNT
dbench -s -t 600 -D $MNT $THREADS
umount $MNT
The test was run on bare metal, no virtualization, on a box with 12 cores
(Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
configuration that is the default of typical distributions (debian in this
case), without debug options enabled (kasan, kmemleak, slub debug, debug
of page allocations, lock debugging, etc).
Results before this patch:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 4410848 0.017 738.640
Close 3240222 0.001 0.834
Rename 186850 7.478 1272.476
Unlink 890875 0.128 785.018
Deltree 128 2.846 12.081
Mkdir 64 0.002 0.003
Qpathinfo 3997659 0.009 11.171
Qfileinfo 701307 0.001 0.478
Qfsinfo 733494 0.002 1.103
Sfileinfo 359362 0.004 3.266
Find 1546226 0.041 4.128
WriteX 2202803 7.905 1376.989
ReadX 6917775 0.003 3.887
LockX 14392 0.002 0.043
UnlockX 14392 0.001 0.085
Flush 309225 0.128 1033.936
Throughput 231.555 MB/sec (sync open) 32 clients 32 procs max_latency=1376.993 ms
Results after this patch:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 4603244 0.017 232.776
Close 3381299 0.001 1.041
Rename 194871 7.251 1073.165
Unlink 929730 0.133 119.233
Deltree 128 2.871 10.199
Mkdir 64 0.002 0.004
Qpathinfo 4171343 0.009 11.317
Qfileinfo 731227 0.001 1.635
Qfsinfo 765079 0.002 3.568
Sfileinfo 374881 0.004 1.220
Find 1612964 0.041 4.675
WriteX 2296720 7.569 1178.204
ReadX 7213633 0.003 3.075
LockX 14976 0.002 0.076
UnlockX 14976 0.001 0.061
Flush 322635 0.102 579.505
Throughput 241.4 MB/sec (sync open) 32 clients 32 procs max_latency=1178.207 ms
(+4.3% throughput, -14.4% max latency)
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Aug 2020 10:16:57 +0000 (12:16 +0200)]
btrfs: send: remove indirect callback parameter for changed_cb
There's a custom callback passed to btrfs_compare_trees which happens to
be named exactly same as the existing function implementing it. This is
confusing and the indirection is not necessary for our needs. Compiler
is clever enough to call it directly so there's effectively no change.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Aug 2020 10:12:38 +0000 (12:12 +0200)]
btrfs: scrub: rename ratelimit state varaible to avoid shadowing
There's already defined _rs within ctree.h:btrfs_printk_ratelimited,
local variables should not use _ to avoid such name clashes with
macro-local variables.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Aug 2020 10:08:37 +0000 (12:08 +0200)]
btrfs: remove unnecessarily shadowed variables
In btrfs_orphan_cleanup, there's another instance of fs_info, but it's
the same as the one we already have.
In btrfs_backref_finish_upper_links, rb_node is same type and used
as temporary cursor to the tree.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Aug 2020 08:58:38 +0000 (10:58 +0200)]
btrfs: compression: move declarations to header
The declarations of compression algorithm callbacks are defined in the
.c file as they're used from there. Compiler warns that there are no
declarations for public functions when compiling lzo.c/zlib.c/zstd.c.
Fix that by moving the declarations to the header as it's the common
place for all of them.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Aug 2020 08:56:00 +0000 (10:56 +0200)]
btrfs: remove const from btrfs_feature_set_name
The function btrfs_feature_set_name returns a const char pointer, the
second const is not necessary and reported as a warning:
In file included from fs/btrfs/space-info.c:6:
fs/btrfs/sysfs.h:16:1: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
16 | const char * const btrfs_feature_set_name(enum btrfs_feature_set set);
| ^~~~~
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 13 Aug 2020 06:33:52 +0000 (14:33 +0800)]
btrfs: cleanup calculation of lockend in lock_and_cleanup_extent_if_need()
We're just doing rounding up to sectorsize to calculate the lockend.
There is no need to do the unnecessary length calculation, just direct
round_up() is enough.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 25 Aug 2020 20:56:59 +0000 (16:56 -0400)]
btrfs: fix possible infinite loop in data async reclaim
Dave reported an issue where generic/102 would sometimes hang. This
turned out to be because we'd get into this spot where we were no longer
making progress on data reservations because our exit condition was not
met. The log is basically
while (!space_info->full && !list_empty(&space_info->tickets))
flush_space(space_info, flush_state);
where flush state is our various flush states, but doesn't include
ALLOC_CHUNK_FORCE. This is because we actually lead with allocating
chunks, and so the assumption was that once you got to the actual
flushing states you could no longer allocate chunks. This was a stupid
assumption, because you could have deleted block groups that would be
reclaimed by a transaction commit, thus unsetting space_info->full.
This is essentially what happens with generic/102, and so sometimes
you'd get stuck in the flushing loop because we weren't allocating
chunks, but flushing space wasn't giving us what we needed to make
progress.
Fix this by adding ALLOC_CHUNK_FORCE to the end of our flushing states,
that way we will eventually bail out because we did end up with
space_info->full if we free'd a chunk previously. Otherwise, as is the
case for this test, we'll allocate our chunk and continue on our happy
merry way.
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:34 +0000 (10:22 -0400)]
btrfs: add a comment explaining the data flush steps
The data flushing steps are not obvious to people other than myself and
Chris. Write a giant comment explaining the reasoning behind each flush
step for data as well as why it is in that particular order.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:33 +0000 (10:22 -0400)]
btrfs: do async reclaim for data reservations
Now that we have the data ticketing stuff in place, move normal data
reservations to use an async reclaim helper to satisfy tickets. Before
we could have multiple tasks race in and both allocate chunks, resulting
in more data chunks than we would necessarily need. Serializing these
allocations and making a single thread responsible for flushing will
only allocate chunks as needed, as well as cut down on transaction
commits and other flush related activities.
Priority reservations will still work as they have before, simply
trying to allocate a chunk until they can make their reservation.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:32 +0000 (10:22 -0400)]
btrfs: flush delayed refs when trying to reserve data space
We can end up with freed extents in the delayed refs, and thus
may_commit_transaction() may not think we have enough pinned space to
commit the transaction and we'll ENOSPC early. Handle this by running
the delayed refs in order to make sure pinned is uptodate before we try
to commit the transaction.
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:31 +0000 (10:22 -0400)]
btrfs: run delayed iputs before committing the transaction for data
Before we were waiting on iputs after we committed the transaction, but
this doesn't really make much sense. We want to reclaim any space we
may have in order to be more likely to commit the transaction, due to
pinned space being added by running the delayed iputs. Fix this by
making delayed iputs run before committing the transaction.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:30 +0000 (10:22 -0400)]
btrfs: don't force commit if we are data
We used to unconditionally commit the transaction at least 2 times and
then on the 3rd try check against pinned space to make sure committing
the transaction was worth the effort. This is overkill, we know nobody
is going to steal our reservation, and if we can't make our reservation
with the pinned amount simply bail out.
This also cleans up the passing of bytes_needed to
may_commit_transaction, as that was the thing we added into place in
order to accomplish this behavior. We no longer need it so remove that
mess.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:29 +0000 (10:22 -0400)]
btrfs: drop the commit_cycles stuff for data reservations
This was an old wart left over from how we previously did data
reservations. Before we could have people race in and take a
reservation while we were flushing space, so we needed to make sure we
looped a few times before giving up. Now that we're using the ticketing
infrastructure we don't have to worry about this and can drop the logic
altogether.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:28 +0000 (10:22 -0400)]
btrfs: use the same helper for data and metadata reservations
Now that data reservations follow the same pattern as metadata
reservations we can simply rename __reserve_metadata_bytes to
__reserve_bytes and use that helper for data reservations.
Things to keep in mind, btrfs_can_overcommit() returns 0 for data,
because we can never overcommit. We also will never pass in FLUSH_ALL
for data, so we'll simply be added to the priority list and go straight
into handle_reserve_ticket.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:27 +0000 (10:22 -0400)]
btrfs: serialize data reservations if we are flushing
Nikolay reported a problem where generic/371 would fail sometimes with a
slow drive. The gist of the test is that we fallocate a file in
parallel with a pwrite of a different file. These two files combined
are smaller than the file system, but sometimes the pwrite would ENOSPC.
A fair bit of investigation uncovered the fact that the fallocate
workload was racing in and grabbing the free space that the pwrite
workload was trying to free up so it could make its own reservation.
After a few loops of this eventually the pwrite workload would error out
with an ENOSPC.
We've had the same problem with metadata as well, and we serialized all
metadata allocations to satisfy this problem. This wasn't usually a
problem with data because data reservations are more straightforward,
but obviously could still happen.
Fix this by not allowing reservations to occur if there are any pending
tickets waiting to be satisfied on the space info.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:26 +0000 (10:22 -0400)]
btrfs: use ticketing for data space reservations
Now that we have all the infrastructure in place, use the ticketing
infrastructure to make data allocations. This still maintains the exact
same flushing behavior, but now we're using tickets to get our
reservations satisfied.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:25 +0000 (10:22 -0400)]
btrfs: add btrfs_reserve_data_bytes and use it
Create a new function btrfs_reserve_data_bytes() in order to handle data
reservations. This uses the new flush types and flush states to handle
making data reservations.
This patch specifically does not change any functionality, and is
purposefully not cleaned up in order to make bisection easier for the
future patches. The new helper is identical to the old helper in how it
handles data reservations. We first try to force a chunk allocation,
and then we run through the flush states all at once and in the same
order that they were done with the old helper.
Subsequent patches will clean this up and change the behavior of the
flushing, and it is important to keep those changes separate so we can
easily bisect down to the patch that caused the regression, rather than
the patch that made us start using the new infrastructure.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:24 +0000 (10:22 -0400)]
btrfs: add the data transaction commit logic into may_commit_transaction
Data space flushing currently unconditionally commits the transaction
twice in a row, and the last time it checks if there's enough pinned
extents to satisfy its reservation before deciding to commit the
transaction for the 3rd and final time.
Encode this logic into may_commit_transaction(). In the next patch we
will pass in U64_MAX for bytes_needed the first two times, and the final
time we will pass in the actual bytes we need so the normal logic will
apply.
This patch exists solely to make the logical changes I will make to the
flushing state machine separate to make it easier to bisect any
performance related regressions.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:23 +0000 (10:22 -0400)]
btrfs: add flushing states for handling data reservations
Currently the way we do data reservations is by seeing if we have enough
space in our space_info. If we do not and we're a normal inode we'll
1) Attempt to force a chunk allocation until we can't anymore.
2) If that fails we'll flush delalloc, then commit the transaction, then
run the delayed iputs.
If we are a free space inode we're only allowed to force a chunk
allocation. In order to use the normal flushing mechanism we need to
encode this into a flush state array for normal inodes. Since both will
start with allocating chunks until the space info is full there is no
need to add this as a flush state, this will be handled specially.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:22 +0000 (10:22 -0400)]
btrfs: check tickets after waiting on ordered extents
Right now if the space is freed up after the ordered extents complete
(which is likely since the reservations are held until they complete),
we would do extra delalloc flushing before we'd notice that we didn't
have any more tickets. Fix this by moving the tickets check after our
wait_ordered_extents check.
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:21 +0000 (10:22 -0400)]
btrfs: use btrfs_start_delalloc_roots in shrink_delalloc
The original iteration of flushing had us flushing delalloc and then
checking to see if we could make our reservation, thus we were very
careful about how many pages we would flush at once.
But now that everything is async and we satisfy tickets as the space
becomes available we don't have to keep track of any of this, simply
try and flush the number of dirty inodes we may have in order to
reclaim space to make our reservation. This cleans up our delalloc
flushing significantly.
The async_pages stuff is dropped because btrfs_start_delalloc_roots()
handles the case that we generate async extents for us, so we no longer
require this extra logic.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:20 +0000 (10:22 -0400)]
btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc
We are going to use the ticket infrastructure for data, so use the
btrfs_space_info_free_bytes_may_use() helper in
btrfs_free_reserved_data_space_noquota() so we get the
btrfs_try_granting_tickets call when we free our reservation.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:19 +0000 (10:22 -0400)]
btrfs: call btrfs_try_granting_tickets when reserving space
If we have compression on we could free up more space than we reserved,
and thus be able to make a space reservation. Add the call for this
scenario.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:18 +0000 (10:22 -0400)]
btrfs: call btrfs_try_granting_tickets when unpinning anything
When unpinning we were only calling btrfs_try_granting_tickets() if
global_rsv->space_info == space_info, which is problematic because we
use ticketing for SYSTEM chunks, and want to use it for DATA as well.
Fix this by moving this call outside of that if statement.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:17 +0000 (10:22 -0400)]
btrfs: call btrfs_try_granting_tickets when freeing reserved bytes
We were missing a call to btrfs_try_granting_tickets in
btrfs_free_reserved_bytes, so add it to handle the case where we're able
to satisfy an allocation because we've freed a pending reservation.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:16 +0000 (10:22 -0400)]
btrfs: make ALLOC_CHUNK use the space info flags
We have traditionally used flush_space() to flush metadata space, so
we've been unconditionally using btrfs_metadata_alloc_profile() for our
profile to allocate a chunk. However if we're going to use this for
data we need to use btrfs_get_alloc_profile() on the space_info we pass
in.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:15 +0000 (10:22 -0400)]
btrfs: make shrink_delalloc take space_info as an arg
Currently shrink_delalloc just looks up the metadata space info, but
this won't work if we're trying to reclaim space for data chunks. We
get the right space_info we want passed into flush_space, so simply pass
that along to shrink_delalloc.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:14 +0000 (10:22 -0400)]
btrfs: handle U64_MAX for shrink_delalloc
Data allocations are going to want to pass in U64_MAX for flushing
space, adjust shrink_delalloc to handle this properly.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:13 +0000 (10:22 -0400)]
btrfs: remove orig from shrink_delalloc
We don't use this anywhere inside of shrink_delalloc since
17024ad0a0fd
("Btrfs: fix early ENOSPC due to delalloc"), remove it.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 21 Jul 2020 14:22:12 +0000 (10:22 -0400)]
btrfs: change nr to u64 in btrfs_start_delalloc_roots
We have btrfs_wait_ordered_roots() which takes a u64 for nr, but
btrfs_start_delalloc_roots() that takes an int for nr, which makes using
them in conjunction, especially for something like (u64)-1, annoying and
inconsistent. Fix btrfs_start_delalloc_roots() to take a u64 for nr and
adjust start_delalloc_inodes() and it's callers appropriately.
This means we've adjusted start_delalloc_inodes() to take a pointer of
nr since we want to preserve the ability for start-delalloc_inodes() to
return an error, so simply make it do the nr adjusting as necessary.
Part of adjusting the callers to this means changing
btrfs_writeback_inodes_sb_nr() to take a u64 for items. This may be
confusing because it seems unrelated, but the caller of
btrfs_writeback_inodes_sb_nr() already passes in a u64, it's just the
function variable that needs to be changed.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 12 Aug 2020 13:18:51 +0000 (16:18 +0300)]
btrfs: remove fsid argument from btrfs_sysfs_update_sprout_fsid
It can be accessed from 'fs_devices' as it's identical to
fs_info->fs_devices. Also add a comment about why we are calling the
function. No semantic changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Mon, 3 Aug 2020 09:43:18 +0000 (12:43 +0300)]
btrfs: remove spurious BUG_ON in btrfs_get_extent
That BUG_ON cannot ever trigger because as the comment there states -
'err' is always set. Simply remove it as it brings no value.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Randy Dunlap [Wed, 5 Aug 2020 02:48:34 +0000 (19:48 -0700)]
btrfs: delete duplicated words + other fixes in comments
Delete repeated words in fs/btrfs/.
{to, the, a, and old}
and change "into 2 part" to "into 2 parts".
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 28 Jul 2020 01:42:49 +0000 (09:42 +0800)]
btrfs: tracepoints: output proper root owner for trace_find_free_extent()
The current trace event always output result like this:
find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)
find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)
T's saying we're allocating data extent for EXTENT tree, which is not
even possible.
It's because we always use EXTENT tree as the owner for
trace_find_free_extent() without using the @root from
btrfs_reserve_extent().
This patch will change the parameter to use proper @root for
trace_find_free_extent():
Now it looks much better:
find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=1(DATA)
find_free_extent: root=5(FS_TREE) len=4096 empty_size=0 flags=1(DATA)
find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
find_free_extent: root=7(CSUM_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
find_free_extent: root=1(ROOT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
Reported-by: Hans van Kranenburg <hans@knorrie.org>
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 4 Oct 2020 23:04:34 +0000 (16:04 -0700)]
Linux 5.9-rc8
Linus Torvalds [Sat, 3 Oct 2020 19:19:23 +0000 (12:19 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest
Linus Torvalds [Sat, 3 Oct 2020 18:57:39 +0000 (11:57 -0700)]
Merge tag 'for-linus-5.9b-rc8-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"Fix a regression introduced in 5.9-rc3 which caused a system running
as fully virtualized guest under Xen to crash when using legacy
devices like a floppy"
* tag 'for-linus-5.9b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: don't use chip_data for legacy IRQs
Linus Torvalds [Sat, 3 Oct 2020 18:47:35 +0000 (11:47 -0700)]
Merge tag 'usb-5.9-rc8' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some small USB and PHY driver fixes for 5.9-rc8
The PHY driver fix resolves an issue found by Dan Carpenter for a
memory leak.
The USB fixes fall into two groups:
- usb gadget fix from Bryan that is a fix for a previous security fix
that showed up in in-the-wild testing
- usb core driver matching bugfixes. This fixes a bug that has
plagued the both the usbip driver and syzbot testing tools this -rc
release cycle. All is now working properly so usbip connections
will work, and syzbot can get back to fuzzing USB drivers properly.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usbcore/driver: Accommodate usbip
usbcore/driver: Fix incorrect downcast
usbcore/driver: Fix specific driver selection
Revert "usbip: Implement a match function to fix usbip"
USB: gadget: f_ncm: Fix NDP16 datagram validation
phy: ti: am654: Fix a leak in serdes_am654_probe()
Linus Torvalds [Sat, 3 Oct 2020 18:40:22 +0000 (11:40 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some more driver fixes for i2c"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: npcm7xx: Clear LAST bit after a failed transaction.
i2c: cpm: Fix i2c_ram structure
i2c: i801: Exclude device from suspend direct complete optimization
Linus Torvalds [Sat, 3 Oct 2020 18:37:23 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A couple more driver quirks, now enabling newer trackpoints from
Synaptics for real"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
Input: trackpoint - enable Synaptics trackpoints
Eric Biggers [Sat, 3 Oct 2020 05:21:48 +0000 (22:21 -0700)]
scripts/spelling.txt: fix malformed entry
One of the entries has three fields "mistake||correction||correction"
rather than the expected two fields "mistake||correction". Fix it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200930234359.255295-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Sat, 3 Oct 2020 05:21:45 +0000 (22:21 -0700)]
mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.
For now, these APIs are used to prevent long-term pinning on the CMA
page. When the long-term pinning is requested on the CMA page, it is
migrated to the non-CMA page before pinning. This non-CMA page is
allocated by using memalloc_nocma_{save/restore} APIs. If APIs doesn't
work as intended, the CMA page is allocated and it is pinned for a long
time. This long-term pin for the CMA page causes cma_alloc() failure
and it could result in wrong behaviour on the device driver who uses the
cma_alloc().
Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA
isn't specified.
Fixes:
8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/1601429472-12599-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Farman [Sat, 3 Oct 2020 05:21:41 +0000 (22:21 -0700)]
mm, slub: restore initial kmem_cache flags
The routine that applies debug flags to the kmem_cache slabs
inadvertantly prevents non-debug flags from being applied to those
same objects. That is, if slub_debug=<flag>,<slab> is specified,
non-debugged slabs will end up having flags of zero, and the slabs
may be unusable.
Fix this by including the input flags for non-matching slabs with the
contents of slub_debug, so that the caches are created as expected
alongside any debugging options that may be requested. With this, we
can remove the check for a NULL slub_debug_string, since it's covered
by the loop itself.
Fixes:
e17f1dfba37b ("mm, slub: extend slub_debug syntax for multiple blocks")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lkml.kernel.org/r/20200930161931.28575-1-farman@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Sat, 3 Oct 2020 09:07:59 +0000 (05:07 -0400)]
Merge tag 'kvmarm-fixes-5.9-3' of git://git./linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm64 fixes for 5.9, take #3
- Fix synchronization of VTTBR update on TLB invalidation for nVHE systems
Paolo Bonzini [Tue, 29 Sep 2020 12:31:32 +0000 (08:31 -0400)]
KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of
the #PF intercept bit in the exception bitmap when they do not match.
This means that, if PFEC_MASK and/or PFEC_MATCH are set, the
hypervisor can get a vmexit for #PF exceptions even when the
corresponding bit is clear in the exception bitmap.
This is unexpected and is promptly detected by a WARN_ON_ONCE.
To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept
is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr).
Reported-by: Qian Cai <cai@redhat.com>>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 2 Oct 2020 21:51:34 +0000 (14:51 -0700)]
Merge tag 'pinctrl-v5.9-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some pin control fixes here. All of them are driver fixes, the Intel
Cherryview being the most interesting one.
- Fix a mux problem for I2C in the MVEBU driver.
- Fix a really hairy inversion problem in the Intel Cherryview
driver.
- Fix the register for the sdc2_clk in the Qualcomm SM8250 driver.
- Check the virtual GPIO boot failur in the Mediatek driver"
* tag 'pinctrl-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: check mtk_is_virt_gpio input parameter
pinctrl: qcom: sm8250: correct sdc2_clk
pinctrl: cherryview: Preserve CHV_PADCTRL1_INVRXTX_TXDATA flag on GPIOs
pinctrl: mvebu: Fix i2c sda definition for 98DX3236
Linus Torvalds [Fri, 2 Oct 2020 21:48:25 +0000 (14:48 -0700)]
Merge tag 'pci-v5.9-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix rockchip regression in rockchip_pcie_valid_device() (Lorenzo
Pieralisi)
- Add Pali Rohár as aardvark PCI maintainer (Pali Rohár)
* tag 'pci-v5.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Add Pali Rohár as aardvark PCI maintainer
PCI: rockchip: Fix bus checks in rockchip_pcie_valid_device()
Linus Torvalds [Fri, 2 Oct 2020 21:42:13 +0000 (14:42 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two patches in driver frameworks. The iscsi one corrects a bug induced
by a BPF change to network locking and the other is a regression we
introduced"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()
scsi: target: Fix lun lookup for TARGET_SCF_LOOKUP_LUN_FROM_TAG case
Linus Torvalds [Fri, 2 Oct 2020 21:38:10 +0000 (14:38 -0700)]
Merge tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- fix for async buffered reads if read-ahead is fully disabled (Hao)
- double poll match fix
- ->show_fdinfo() potential ABBA deadlock complaint fix
* tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
io_uring: fix async buffered reads when readahead is disabled
io_uring: fix potential ABBA deadlock in ->show_fdinfo()
io_uring: always delete double poll wait entry on match
Linus Torvalds [Fri, 2 Oct 2020 21:34:52 +0000 (14:34 -0700)]
Merge tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Single fix for a ->commit_rqs failure case"
* tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
blk-mq: call commit_rqs while list empty but error happen
Linus Torvalds [Fri, 2 Oct 2020 17:37:08 +0000 (10:37 -0700)]
Merge branch 'work.epoll' of git://git./linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
"Several race fixes in epoll"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ep_create_wakeup_source(): dentry name can change under you...
epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
epoll: replace ->visited/visited_list with generation count
epoll: do not insert into poll queues until all sanity checks are done
Linus Torvalds [Fri, 2 Oct 2020 17:13:05 +0000 (10:13 -0700)]
Merge tag 'riscv-for-linus-5.9-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes for this week:
- The addition of a symbol export for clint_time_val, which has been
inlined into some timex functions and can be used by drivers.
- A fix to avoid calling get_cycles() before the timers have been
probed.
These both only effect !MMU systems"
* tag 'riscv-for-linus-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Check clint_time_val before use
clocksource: clint: Export clint_time_val for modules
Linus Torvalds [Fri, 2 Oct 2020 17:09:40 +0000 (10:09 -0700)]
Merge tag 'for-5.9-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two more fixes.
One is for a lockdep warning/lockup (also caught by syzbot), that one
has been seen in practice. Regarding the other syzbot reports
mentioned last time, they don't seem to be urgent and reliably
reproducible so they'll be fixed later.
The second fix is for a potential corruption when device replace
finishes and the in-memory state of trim is not copied to the new
device"
* tag 'for-5.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix filesystem corruption after a device replace
btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks
btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing
Linus Torvalds [Fri, 2 Oct 2020 17:05:56 +0000 (10:05 -0700)]
Merge tag 'pm-5.9-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix one more issue related to the recent RCU-lockdep changes, a
typo in documentation and add a missing return statement to
intel_pstate.
Specifics:
- Fix up RCU usage for cpuidle on the ARM imx6q platform (Ulf
Hansson)
- Fix typo in the PM documentation (Yoann Congal)
- Add return statement that is missing after recent changes in the
intel_pstate driver (Zhang Rui)"
* tag 'pm-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ARM: imx6q: Fixup RCU usage for cpuidle
Documentation: PM: Fix a reStructuredText syntax error
cpufreq: intel_pstate: Fix missing return statement
Linus Torvalds [Fri, 2 Oct 2020 17:01:00 +0000 (10:01 -0700)]
Merge tag 'staging-5.9-rc8' of git://git./linux/kernel/git/gregkh/staging
Pull IIO fixes from Greg KH:
"Here are two small IIO driver fixes for 5.9-rc8 that resolve some
reported issues:
- driver name fixed in one driver
- device name typo fixed
Both have been in linux-next for a while with no reported problems"
* tag 'staging-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: qcom-spmi-adc5: fix driver name
iio: adc: ad7124: Fix typo in device name
Linus Torvalds [Fri, 2 Oct 2020 16:51:42 +0000 (09:51 -0700)]
Merge tag 'gpio-v5.9-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Some late GPIO fixes for the v5.9 series:
- Fix compiler warnings on the OMAP when PM is disabled
- Clear the interrupt when setting edge sensitivity on the Spreadtrum
driver.
- Fix up spurious interrupts on the TC35894.
- Support threaded interrupts on the Siox controller.
- Fix resource leaks on the mockup driver.
- Fix line event handling in syscall compatible mode for the
character device.
- Fix an unitialized variable in the PCA953A driver.
- Fix access to all GPIO IRQs on the Aspeed AST2600.
- Fix line direction on the AMD FCH driver.
- Use the bitmap API instead of compiler intrinsics for bit
manipulation in the PCA953x driver"
* tag 'gpio-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: Correctly initialize registers 6 and 7 for PCA957x
gpio: pca953x: Use bitmap API over implicit GCC extension
gpio: amd-fch: correct logic of GPIO_LINE_DIRECTION
gpio: aspeed: fix ast2600 bank properties
gpio/aspeed-sgpio: don't enable all interrupts by default
gpio/aspeed-sgpio: enable access to all 80 input & output sgpios
gpio: pca953x: Fix uninitialized pending variable
gpiolib: Fix line event handling in syscall compatible mode
gpio: mockup: fix resource leak in error path
gpio: siox: explicitly support only threaded irqs
gpio: tc35894: fix up tc35894 interrupt configuration
gpio: sprd: Clear interrupt when setting the type as edge
gpio: omap: Fix warnings if PM is disabled
Linus Torvalds [Fri, 2 Oct 2020 16:40:09 +0000 (09:40 -0700)]
Merge tag 'mmc-v5.9-rc4-3' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- Fix deadlock when removing MEMSTICK host
- Workaround broken CMDQ on Intel GLK based IRBIS models
* tag 'mmc-v5.9-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models
memstick: Skip allocating card when removing host
Thibaut Sautereau [Fri, 2 Oct 2020 15:16:11 +0000 (17:16 +0200)]
random32: Restore __latent_entropy attribute on net_rand_state
Commit
f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") broke compilation and was temporarily fixed by Linus in
83bdc7275e62 ("random32: remove net_rand_state from the latent entropy
gcc plugin") by entirely moving net_rand_state out of the things handled
by the latent_entropy GCC plugin.
From what I understand when reading the plugin code, using the
__latent_entropy attribute on a declaration was the wrong part and
simply keeping the __latent_entropy attribute on the variable definition
was the correct fix.
Fixes:
83bdc7275e62 ("random32: remove net_rand_state from the latent entropy gcc plugin")
Acked-by: Willy Tarreau <w@1wt.eu>
Cc: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Fri, 2 Oct 2020 16:30:30 +0000 (18:30 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq: intel_pstate: Fix missing return statement
Roman Gushchin [Thu, 1 Oct 2020 20:07:49 +0000 (13:07 -0700)]
mm: memcg/slab: fix slab statistics in !SMP configuration
Since commit
ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat
items") the write side of slab counters accepts a value in bytes and
converts it to pages. It happens in __mod_node_page_state().
However a non-SMP version of __mod_node_page_state() doesn't perform
this conversion. It leads to incorrect (unrealistically high) slab
counters values. Fix this by adding a similar conversion to the non-SMP
version of __mod_node_page_state().
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-and-tested-by: Bastian Bittorf <bb@npl.de>
Fixes:
ea426c2a7de8 ("mm: memcg: prepare for byte-sized vmstat items")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 2 Oct 2020 02:14:36 +0000 (19:14 -0700)]
pipe: remove pipe_wait() and fix wakeup race with splice
The pipe splice code still used the old model of waiting for pipe IO by
using a non-specific "pipe_wait()" that waited for any pipe event to
happen, which depended on all pipe IO being entirely serialized by the
pipe lock. So by checking the state you were waiting for, and then
adding yourself to the wait queue before dropping the lock, you were
guaranteed to see all the wakeups.
Strictly speaking, the actual wakeups were not done under the lock, but
the pipe_wait() model still worked, because since the waiter held the
lock when checking whether it should sleep, it would always see the
current state, and the wakeup was always done after updating the state.
However, commit
0ddad21d3e99 ("pipe: use exclusive waits when reading or
writing") split the single wait-queue into two, and in the process also
made the "wait for event" code wait for _two_ wait queues, and that then
showed a race with the wakers that were not serialized by the pipe lock.
It's only splice that used that "pipe_wait()" model, so the problem
wasn't obvious, but Josef Bacik reports:
"I hit a hang with fstest btrfs/187, which does a btrfs send into
/dev/null. This works by creating a pipe, the write side is given to
the kernel to write into, and the read side is handed to a thread that
splices into a file, in this case /dev/null.
The box that was hung had the write side stuck here [pipe_write] and
the read side stuck here [splice_from_pipe_next -> pipe_wait].
[ more details about pipe_wait() scenario ]
The problem is we're doing the prepare_to_wait, which sets our state
each time, however we can be woken up either with reads or writes. In
the case above we race with the WRITER waking us up, and re-set our
state to INTERRUPTIBLE, and thus never break out of schedule"
Josef had a patch that avoided the issue in pipe_wait() by just making
it set the state only once, but the deeper problem is that pipe_wait()
depends on a level of synchonization by the pipe mutex that it really
shouldn't. And the whole "wait for any pipe state change" model really
isn't very good to begin with.
So rather than trying to work around things in pipe_wait(), remove that
legacy model of "wait for arbitrary pipe event" entirely, and actually
create functions that wait for the pipe actually being readable or
writable, and can do so without depending on the pipe lock serializing
everything.
Fixes:
0ddad21d3e99 ("pipe: use exclusive waits when reading or writing")
Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 1 Oct 2020 19:59:36 +0000 (12:59 -0700)]
Merge tag 'iommu-fixes-v5.9-rc7' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix a device reference counting bug in the Exynos IOMMU driver.
- Lockdep fix for the Intel VT-d driver.
- Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS
ACPI table and caused IOMMU driver initialization failures in kdump
kernels.
* tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
iommu/amd: Fix the overwritten field in IVMD header
iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
Linus Torvalds [Thu, 1 Oct 2020 18:49:01 +0000 (11:49 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"A previous commit to prevent AML memory opregions from accessing the
kernel memory turned out to be too restrictive. Relax the permission
check to permit the ACPI core to map kernel memory used for table
overrides"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: permit ACPI core to map kernel memory used for table overrides
Linus Torvalds [Thu, 1 Oct 2020 16:45:37 +0000 (09:45 -0700)]
Merge tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"AMD and vmwgfx fixes.
Just dequeuing these a bit early as the AMD ones are bit larger than
I'd prefer, but Alex missed last week so it's a double set of fixes.
The larger ones are just register header fixes for the new chips that
were just introduced in rc1 along with some new PCI IDs for new hw.
Otherwise it is usual fixes.
The vmwgfx fix was due to some testing I was doing and found we
weren't booting properly, vmware had the fix internally so hurried it
vmwgfx:
- fix a regression due to TTM refactor
amdgpu:
- Fix potential double free in userptr handling
- Sienna Cichlid and Navy Flounder udpates
- Add Sienna Cichlid PCI IDs
- Drop experimental flag for navi12
- Raven fixes
- Renoir fixes
- HDCP fix
- DCN3 fix for clang and older versions of gcc
- Fix a runtime pm refcount issue"
* tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: disable gfxoff temporarily for navy_flounder
drm/amd/pm: setup APU dpm clock table in SMU HW initialization
drm/vmwgfx: Fix error handling in get_node
drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
drm/amdgpu/swsmu/smu12: fix force clock handling for mclk
drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
drm/amdgpu/display: fix CFLAGS setup for DCN30
drm/amd/display: fix return value check for hdcp_work
drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc.
drm/amd/pm: Removed fixed clock in auto mode DPM
drm/amdgpu: remove experimental flag from navi12
drm/amdgpu: add device ID for sienna_cichlid (v2)
drm/amdgpu: use the AV1 defines for VCN 3.0
drm/amdgpu: add VCN 3.0 AV1 registers
drm/amdgpu: add the GC 10.3 VRS registers
drm/amdgpu: prevent double kfree ttm->sg
Linus Torvalds [Thu, 1 Oct 2020 16:41:02 +0000 (09:41 -0700)]
Merge tag 'trace-v5.9-rc6' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two tracing fixes:
- Fix temp buffer accounting that caused a WARNING for
ftrace_dump_on_opps()
- Move the recursion check in one of the function callback helpers to
the beginning of the function, as if the rcu_is_watching() gets
traced, it will cause a recursive loop that will crash the kernel"
* tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Move RCU is watching check after recursion check
tracing: Fix trace_find_next_entry() accounting of temp buffer size
Lu Baolu [Sun, 27 Sep 2020 06:24:28 +0000 (14:24 +0800)]
iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
Lock(&iommu->lock) without disabling irq causes lockdep warnings.
[ 12.703950] ========================================================
[ 12.703962] WARNING: possible irq lock inversion dependency detected
[ 12.703975] 5.9.0-rc6+ #659 Not tainted
[ 12.703983] --------------------------------------------------------
[ 12.703995] systemd-udevd/284 just changed the state of lock:
[ 12.704007]
ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at:
iommu_flush_dev_iotlb.part.57+0x2e/0x90
[ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past:
[ 12.704043] (&iommu->lock){+.+.}-{2:2}
[ 12.704045]
and interrupts could create inverse lock ordering between
them.
[ 12.704073]
other info that might help us debug this:
[ 12.704085] Possible interrupt unsafe locking scenario:
[ 12.704097] CPU0 CPU1
[ 12.704106] ---- ----
[ 12.704115] lock(&iommu->lock);
[ 12.704123] local_irq_disable();
[ 12.704134] lock(device_domain_lock);
[ 12.704146] lock(&iommu->lock);
[ 12.704158] <Interrupt>
[ 12.704164] lock(device_domain_lock);
[ 12.704174]
*** DEADLOCK ***
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Juergen Gross [Wed, 30 Sep 2020 09:16:14 +0000 (11:16 +0200)]
xen/events: don't use chip_data for legacy IRQs
Since commit
c330fb1ddc0a ("XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.")
Xen is using the chip_data pointer for storing IRQ specific data. When
running as a HVM domain this can result in problems for legacy IRQs, as
those might use chip_data for their own purposes.
Use a local array for this purpose in case of legacy IRQs, avoiding the
double use.
Cc: stable@vger.kernel.org
Fixes:
c330fb1ddc0a ("XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20200930091614.13660-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Adrian Huang [Sat, 26 Sep 2020 10:26:02 +0000 (18:26 +0800)]
iommu/amd: Fix the overwritten field in IVMD header
Commit
387caf0b759a ("iommu/amd: Treat per-device exclusion
ranges as r/w unity-mapped regions") accidentally overwrites
the 'flags' field in IVMD (struct ivmd_header) when the I/O
virtualization memory definition is associated with the
exclusion range entry. This leads to the corrupted IVMD table
(incorrect checksum). The kdump kernel reports the invalid checksum:
ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (
20200717/tbprint-177)
AMD-Vi: [Firmware Bug]: IVRS invalid checksum
Fix the above-mentioned issue by modifying the 'struct unity_map_entry'
member instead of the IVMD header.
Cleanup: The *exclusion_range* functions are not used anymore, so
get rid of them.
Fixes:
387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
Reported-and-tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20200926102602.19177-1-adrianhuang0701@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Marc Zyngier [Mon, 13 Jul 2020 14:15:14 +0000 (15:15 +0100)]
KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest
Commit
a0e50aa3f4a8 ("KVM: arm64: Factor out stage 2 page table
data from struct kvm") dropped the ISB after __load_guest_stage2(),
only leaving the one that is required when the speculative AT
workaround is in effect.
As Andrew points it: "This alternative is 'backwards' to avoid a
double ISB as there is one in __load_guest_stage2 when the workaround
is active."
Restore the missing ISB, conditionned on the AT workaround not being
active.
Fixes:
a0e50aa3f4a8 ("KVM: arm64: Factor out stage 2 page table data from struct kvm")
Reported-by: Andrew Scull <ascull@google.com>
Reported-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Andy Shevchenko [Wed, 30 Sep 2020 14:20:13 +0000 (17:20 +0300)]
gpio: pca953x: Correctly initialize registers 6 and 7 for PCA957x
When driver has been converted to the bitmap API the non-bitmap functions
started behaving differently on 32-bit BE architectures since the bytes in
two consequent unsigned longs are in different order in comparison to byte
array. Hence if the chip had had more than 32 lines the memset() call over
it would have not set up upper lines correctly.
Although it's currently a theoretical case (no supported chips of this type
has 32+ lines), it's better to provide a clean code to avoid people thinking
this is okay and potentially producing not fully working things.
Fixes:
35d13d94893f ("gpio: pca953x: convert to use bitmap API")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20200930142013.59247-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Andy Shevchenko [Wed, 30 Sep 2020 14:20:12 +0000 (17:20 +0300)]
gpio: pca953x: Use bitmap API over implicit GCC extension
In IRQ handler we have to clear bitmap before use. Currently
the GCC extension has been used for that. For sake of the consistency
switch to bitmap API. As expected bloat-o-meter shows no difference
in the object size.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20200930142013.59247-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Hanks Chen [Thu, 20 Aug 2020 11:22:25 +0000 (19:22 +0800)]
pinctrl: mediatek: check mtk_is_virt_gpio input parameter
check mtk_is_virt_gpio input parameter,
virtual gpio need to support eint mode.
add error handler for the ko case
to fix this boot fail:
pc : mtk_is_virt_gpio+0x20/0x38 [pinctrl_mtk_common_v2]
lr : mtk_gpio_get_direction+0x44/0xb0 [pinctrl_paris]
Fixes:
edd546465002 ("pinctrl: mediatek: avoid virtual gpio trying to set reg")
Signed-off-by: Hanks Chen <hanks.chen@mediatek.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Singed-off-by: Jie Yang <sin_jieyang@mediatek.com>
Link: https://lore.kernel.org/r/1597922546-29633-1-git-send-email-hanks.chen@mediatek.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dmitry Baryshkov [Mon, 14 Sep 2020 09:18:46 +0000 (12:18 +0300)]
pinctrl: qcom: sm8250: correct sdc2_clk
Correct sdc2_clk pin definition (register offset is wrong, verified by
the msm-4.19 driver).
Fixes:
4e3ec9e407ad ("pinctrl: qcom: Add sm8250 pinctrl driver.")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20200914091846.55204-1-dmitry.baryshkov@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dave Airlie [Thu, 1 Oct 2020 05:25:32 +0000 (15:25 +1000)]
Merge tag 'amd-drm-fixes-5.9-2020-09-30' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-09-30:
amdgpu:
- Fix potential double free in userptr handling
- Sienna Cichlid and Navy Flounder udpates
- Add Sienna Cichlid PCI IDs
- Drop experimental flag for navi12
- Raven fixes
- Renoir fixes
- HDCP fix
- DCN3 fix for clang and older versions of gcc
- Fix a runtime pm refcount issue
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200930161326.4243-1-alexander.deucher@amd.com
Pali Rohár [Fri, 25 Sep 2020 09:21:15 +0000 (11:21 +0200)]
MAINTAINERS: Add Pali Rohár as aardvark PCI maintainer
Link: https://lore.kernel.org/r/20200925092115.16546-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Petazzzoni <thomas.petazzoni@bootlin.com>
Ard Biesheuvel [Tue, 29 Sep 2020 13:25:22 +0000 (15:25 +0200)]
arm64: permit ACPI core to map kernel memory used for table overrides
Jonathan reports that the strict policy for memory mapped by the
ACPI core breaks the use case of passing ACPI table overrides via
initramfs. This is due to the fact that the memory type used for
loading the initramfs in memory is not recognized as a memory type
that is typically used by firmware to pass firmware tables.
Since the purpose of the strict policy is to ensure that no AML or
other ACPI code can manipulate any memory that is used by the kernel
to keep its internal state or the state of user tasks, we can relax
the permission check, and allow mappings of memory that is reserved
and marked as NOMAP via memblock, and therefore not covered by the
linear mapping to begin with.
Fixes:
1583052d111f ("arm64/acpi: disallow AML memory opregions to access kernel memory")
Fixes:
325f5585ec36 ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Wed, 30 Sep 2020 21:18:38 +0000 (14:18 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Another batch of clk driver fixes:
- Make sure DRAM and ChipID region doesn't get disabled on Exynos
- Fix a SATA failure on Tegra
- Fix the emac_ptp clk divider on stratix10"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk
clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
clk: tegra: Fix missing prototype for tegra210_clk_register_emc()
clk: tegra: Always program PLL_E when enabled
clk: tegra: Capitalization fixes
clk: samsung: Keep top BPLL mux on Exynos542x enabled
Anup Patel [Sun, 27 Sep 2020 05:39:16 +0000 (11:09 +0530)]
RISC-V: Check clint_time_val before use
The NoMMU kernel is broken for QEMU virt machine from Linux-5.9-rc6
because clint_time_val is used even before CLINT driver is probed
at following places:
1. rand_initialize() calls get_cycles() which in-turn uses
clint_time_val
2. boot_init_stack_canary() calls get_cycles() which in-turn
uses clint_time_val
The issue#1 (above) is fixed by providing custom random_get_entropy()
for RISC-V NoMMU kernel. For issue#2 (above), we remove dependency of
boot_init_stack_canary() on get_cycles() and this is aligned with the
boot_init_stack_canary() implementations of ARM, ARM64 and MIPS kernel.
Fixes:
d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Filipe Manana [Wed, 23 Sep 2020 14:30:16 +0000 (15:30 +0100)]
btrfs: fix filesystem corruption after a device replace
We use a device's allocation state tree to track ranges in a device used
for allocated chunks, and we set ranges in this tree when allocating a new
chunk. However after a device replace operation, we were not setting the
allocated ranges in the new device's allocation state tree, so that tree
is empty after a device replace.
This means that a fitrim operation after a device replace will trim the
device ranges that have allocated chunks and extents, as we trim every
range for which there is not a range marked in the device's allocation
state tree. It is also important during chunk allocation, since the
device's allocation state is used to determine if a range is already
allocated when allocating a new chunk.
This is trivial to reproduce and the following script triggers the bug:
$ cat reproducer.sh
#!/bin/bash
DEV1="/dev/sdg"
DEV2="/dev/sdh"
DEV3="/dev/sdi"
wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null
# Create a raid1 test fs on 2 devices.
mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null
mount $DEV1 /mnt/btrfs
xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo
echo "Starting to replace $DEV1 with $DEV3"
btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs
echo
echo "Running fstrim"
fstrim /mnt/btrfs
echo
echo "Unmounting filesystem"
umount /mnt/btrfs
echo "Mounting filesystem in degraded mode using $DEV3 only"
wipefs -a $DEV1 $DEV2 &> /dev/null
mount -o degraded $DEV3 /mnt/btrfs
if [ $? -ne 0 ]; then
dmesg | tail
echo
echo "Failed to mount in degraded mode"
exit 1
fi
echo
echo "File foo data (expected all bytes = 0xab):"
od -A d -t x1 /mnt/btrfs/foo
umount /mnt/btrfs
When running the reproducer:
$ ./replace-test.sh
wrote
10485760/
10485760 bytes at offset 0
10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec)
Starting to replace /dev/sdg with /dev/sdi
Running fstrim
Unmounting filesystem
Mounting filesystem in degraded mode using /dev/sdi only
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error.
[19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started
[19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished
[19582.208293] BTRFS info (device sdi): allowing degraded mounts
[19582.208298] BTRFS info (device sdi): disk space caching is enabled
[19582.208301] BTRFS info (device sdi): has skinny extents
[19582.212853] BTRFS warning (device sdi): devid 2 uuid
1f731f47-e1bb-4f00-bfbb-
9e5a0cb4ba9f is missing
[19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed
[19582.213907] BTRFS error (device sdi): bad tree block start, want
30490624 have 0
[19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5
[19582.231576] BTRFS error (device sdi): open_ctree failed
Failed to mount in degraded mode
So fix by setting all allocated ranges in the replace target device when
the replace operation is finishing, when we are holding the chunk mutex
and we can not race with new chunk allocations.
A test case for fstests follows soon.
Fixes:
1c11b63eff2a67 ("btrfs: replace pending/pinned chunks lists with io tree")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Thu, 20 Aug 2020 15:18:27 +0000 (11:18 -0400)]
btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks
When closing and freeing the source device we could end up doing our
final blkdev_put() on the bdev, which will grab the bd_mutex. As such
we want to be holding as few locks as possible, so move this call
outside of the dev_replace->lock_finishing_cancel_unmount lock. Since
we're modifying the fs_devices we need to make sure we're holding the
uuid_mutex here, so take that as well.
There's a report from syzbot probably hitting one of the cases where
the bd_mutex and device_list_mutex are taken in the wrong order, however
it's not with device replace, like this patch fixes. As there's no
reproducer available so far, we can't verify the fix.
https://lore.kernel.org/lkml/
000000000000fc04d105afcf86d7@google.com/
link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419
WARNING: possible circular locking dependency detected
5.9.0-rc5-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.0/6878 is trying to acquire lock:
ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804
but task is already holding lock:
ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255
btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109
__btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916
find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline]
find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127
btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
__extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
do_writepages+0xec/0x290 mm/page-writeback.c:2352
__writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
wb_do_writeback fs/fs-writeback.c:2039 [inline]
wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #3 (sb_internal#2){.+.+}-{0:0}:
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write+0x234/0x470 fs/super.c:1672
sb_start_intwrite include/linux/fs.h:1690 [inline]
start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624
find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline]
find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127
btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
__extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
do_writepages+0xec/0x290 mm/page-writeback.c:2352
__writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
wb_do_writeback fs/fs-writeback.c:2039 [inline]
wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
__flush_work+0x60e/0xac0 kernel/workqueue.c:3041
wb_shutdown+0x180/0x220 mm/backing-dev.c:355
bdi_unregister+0x174/0x590 mm/backing-dev.c:872
del_gendisk+0x820/0xa10 block/genhd.c:933
loop_remove drivers/block/loop.c:2192 [inline]
loop_control_ioctl drivers/block/loop.c:2291 [inline]
loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (loop_ctl_mutex){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
lo_open+0x19/0xd0 drivers/block/loop.c:1893
__blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507
blkdev_get fs/block_dev.c:1639 [inline]
blkdev_open+0x227/0x300 fs/block_dev.c:1753
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3251 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3368
do_filp_open+0x17e/0x3c0 fs/namei.c:3395
do_sys_openat2+0x16d/0x420 fs/open.c:1168
do_sys_open fs/open.c:1184 [inline]
__do_sys_open fs/open.c:1192 [inline]
__se_sys_open fs/open.c:1188 [inline]
__x64_sys_open+0x119/0x1c0 fs/open.c:1188
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
blkdev_put+0x30/0x520 fs/block_dev.c:1804
btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
close_fs_devices fs/btrfs/volumes.c:1193 [inline]
btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
generic_shutdown_super+0x144/0x370 fs/super.c:464
kill_anon_super+0x36/0x60 fs/super.c:1108
btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
deactivate_locked_super+0x94/0x160 fs/super.c:335
deactivate_super+0xad/0xd0 fs/super.c:366
cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
&bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_devs->device_list_mutex);
lock(sb_internal#2);
lock(&fs_devs->device_list_mutex);
lock(&bdev->bd_mutex);
*** DEADLOCK ***
3 locks held by syz-executor.0/6878:
#0:
ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365
#1:
ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178
#2:
ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
stack backtrace:
CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
blkdev_put+0x30/0x520 fs/block_dev.c:1804
btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
close_fs_devices fs/btrfs/volumes.c:1193 [inline]
btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
generic_shutdown_super+0x144/0x370 fs/super.c:464
kill_anon_super+0x36/0x60 fs/super.c:1108
btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
deactivate_locked_super+0x94/0x160 fs/super.c:335
deactivate_super+0xad/0xd0 fs/super.c:366
cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x460027
RSP: 002b:
00007fff59216328 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
RAX:
0000000000000000 RBX:
0000000000076035 RCX:
0000000000460027
RDX:
0000000000403188 RSI:
0000000000000002 RDI:
00007fff592163d0
RBP:
0000000000000333 R08:
0000000000000000 R09:
000000000000000b
R10:
0000000000000005 R11:
0000000000000246 R12:
00007fff59217460
R13:
0000000002df2a60 R14:
0000000000000000 R15:
00007fff59217460
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add syzbot reference ]
Signed-off-by: David Sterba <dsterba@suse.com>
Ulf Hansson [Wed, 30 Sep 2020 11:20:23 +0000 (13:20 +0200)]
ARM: imx6q: Fixup RCU usage for cpuidle
The commit
eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
expose us for tracepoints. For imx6q cpuidle, this leads to an RCU splat
according to below.
[6.870684] [<
c0db7690>] (_raw_spin_lock) from [<
c011f6a4>] (imx6q_enter_wait+0x18/0x9c)
[6.878846] [<
c011f6a4>] (imx6q_enter_wait) from [<
c09abfb0>] (cpuidle_enter_state+0x168/0x5e4)
To fix the problem, let's assign the corresponding idlestate->flags the
CPUIDLE_FLAG_RCU_IDLE bit, which enables us to call rcu_idle_enter|exit()
at the proper point.
Reported-by: Dong Aisheng <aisheng.dong@nxp.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Yoann Congal [Tue, 29 Sep 2020 20:41:58 +0000 (22:41 +0200)]
Documentation: PM: Fix a reStructuredText syntax error
Fix a reStructuredText syntax error in the cpuidle PM admin-guide
documentation: the ``...'' quotation marks are parsed as partial ''...''
reStructuredText markup and break the output formatting.
This change them to "...".
Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Zhang Rui [Mon, 28 Sep 2020 03:33:42 +0000 (11:33 +0800)]
cpufreq: intel_pstate: Fix missing return statement
Fix missing return statement when writing "off" to intel_pstate status
sysfs I/F.
Fixes:
55671ea3257a ("cpufreq: intel_pstate: Free memory only when turning off")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jiansong Chen [Wed, 30 Sep 2020 07:30:24 +0000 (15:30 +0800)]
drm/amdgpu: disable gfxoff temporarily for navy_flounder
gfxoff is temporarily disabled for navy_flounder, since
at present the feature caused some tdr when performing
display operations.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 30 Sep 2020 04:07:56 +0000 (12:07 +0800)]
drm/amd/pm: setup APU dpm clock table in SMU HW initialization
As the dpm clock table is needed during DC HW initialization.
And that (DC HW initialization) comes before smu_late_init()
where current APU dpm clock table setup is performed. So, NULL
pointer dereference will be triggered. By moving APU dpm clock
table setup to smu_hw_init(), this can be avoided.
Fixes:
02cf91c113ea ("drm/amd/powerplay: postpone operations not required for hw setup to late_init")
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Walleij [Wed, 30 Sep 2020 09:38:04 +0000 (11:38 +0200)]
Merge tag 'gpio-fixes-for-v5.9' of git://git./linux/kernel/git/brgl/linux into fixes
gpio fixes for v5.9
- correct logic of GPIO_LINE_DIRECTION in gpio-amd-fch
Palmer Dabbelt [Wed, 30 Sep 2020 06:48:47 +0000 (23:48 -0700)]
clocksource: clint: Export clint_time_val for modules
clint_time_val will soon be used by the RISC-V implementation of
random_get_entropy(), which is a static inline function that may be used by
modules (at least CRYPTO_JITTERENTROPY=m).
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Dave Airlie [Wed, 30 Sep 2020 04:21:22 +0000 (14:21 +1000)]
Merge branch 'vmwgfx-fixes-5.9' of git://people.freedesktop.org/~sroland/linux into drm-fixes
One vmwgfx regression fix.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Roland Scheidegger (VMware)" <rscheidegger.oss@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200930041000.2423-1-rscheidegger.oss@gmail.com
Zack Rusin [Fri, 25 Sep 2020 15:54:10 +0000 (11:54 -0400)]
drm/vmwgfx: Fix error handling in get_node
ttm_mem_type_manager_func.get_node was changed to return -ENOSPC
instead of setting the node pointer to NULL. Unfortunately
vmwgfx still had two places where it was explicitly converting
-ENOSPC to 0 causing regressions. This fixes those spots by
allowing -ENOSPC to be returned. That seems to fix recent
regressions with vmwgfx.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Sigend-off-by: Roland Scheidegger <sroland@vmware.com>
Mark Mielke [Mon, 28 Sep 2020 04:33:29 +0000 (00:33 -0400)]
scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()
The kernel may fail to boot or devices may fail to come up when
initializing iscsi_tcp devices starting with Linux 5.8.
Commit
a79af8a64d39 ("[SCSI] iscsi_tcp: use iscsi_conn_get_addr_param
libiscsi function") introduced getpeername() within the session spinlock.
Commit
1b66d253610c ("bpf: Add get{peer, sock}name attach types for
sock_addr") introduced BPF_CGROUP_RUN_SA_PROG_LOCK() within getpeername(),
which acquires a mutex and when used from iscsi_tcp devices can now lead to
"BUG: scheduling while atomic:" and subsequent damage.
Ensure that the spinlock is released before calling getpeername() or
getsockname(). sock_hold() and sock_put() are used to ensure that the
socket reference is preserved until after the getpeername() or
getsockname() complete.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1877345
Link: https://lkml.org/lkml/2020/7/28/1085
Link: https://lkml.org/lkml/2020/8/31/459
Link: https://lore.kernel.org/r/20200928043329.606781-1-mark.mielke@gmail.com
Fixes:
a79af8a64d39 ("[SCSI] iscsi_tcp: use iscsi_conn_get_addr_param libiscsi function")
Fixes:
1b66d253610c ("bpf: Add get{peer, sock}name attach types for sock_addr")
Cc: stable@vger.kernel.org
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mark Mielke <mark.mielke@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Wed, 30 Sep 2020 00:56:30 +0000 (17:56 -0700)]
Merge tag 'devicetree-fixes-for-5.9-3' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix handling of HOST_EXTRACFLAGS for dtc
- Several warning fixes for DT bindings
* tag 'devicetree-fixes-for-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
scripts/dtc: only append to HOST_EXTRACFLAGS instead of overwriting
dt-bindings: Fix 'reg' size issues in zynqmp examples
ARM: dts: bcm2835: Change firmware compatible from simple-bus to simple-mfd
dt-bindings: leds: cznic,turris-omnia-leds: fix error in binding
dt-bindings: crypto: sa2ul: fix a DT binding check warning
Linus Torvalds [Wed, 30 Sep 2020 00:18:34 +0000 (17:18 -0700)]
autofs: use __kernel_write() for the autofs pipe writing
autofs got broken in some configurations by commit
13c164b1a186
("autofs: switch to kernel_write") because there is now an extra LSM
permission check done by security_file_permission() in rw_verify_area().
autofs is one if the few places that really does want the much more
limited __kernel_write(), because the write is an internal kernel one
that shouldn't do any user permission checks (it also doesn't need the
file_start_write/file_end_write logic, since it's just a pipe).
There are a couple of other cases like that - accounting, core dumping,
and splice - but autofs stands out because it can be built as a module.
As a result, we need to export this internal __kernel_write() function
again.
We really don't want any other module to use this, but we don't have a
"EXPORT_SYMBOL_FOR_AUTOFS_ONLY()". But we can mark it GPL-only to at
least approximate that "internal use only" for licensing.
While in this area, make autofs pass in NULL for the file position
pointer, since it's always a pipe, and we now use a NULL file pointer
for streaming file descriptors (see file_ppos() and commit
438ab720c675:
"vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files")
This effectively reverts commits
9db977522449 ("fs: unexport
__kernel_write") and
13c164b1a186 ("autofs: switch to kernel_write").
Fixes:
13c164b1a186 ("autofs: switch to kernel_write")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dirk Gouders [Sun, 27 Sep 2020 09:39:45 +0000 (11:39 +0200)]
drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
Commit
78fe9f63947a2b ("drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions")
added a call to rn_vbios_smu_get_smu_version() to set clk_mgr->smu_ver.
That field is initialized prior to the if-statement, already.
Fixes:
78fe9f63947a2b (drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions)
Signed-off-by: Dirk Gouders <dirk@gouders.net>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sung Lee <sung.lee@amd.com>
Cc: Yongqiang Sun <yongqiang.sun@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 28 Sep 2020 18:16:25 +0000 (14:16 -0400)]
drm/amdgpu/swsmu/smu12: fix force clock handling for mclk
The state array is in the reverse order compared to other asics
(high to low rather than low to high).
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1313
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jean Delvare [Mon, 28 Sep 2020 09:10:37 +0000 (11:10 +0200)]
drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
A recent attempt to fix a ref count leak in
amdgpu_display_crtc_set_config() turned out to be doing too much and
"fixed" an intended decrease as if it were a leak. Undo that part to
restore the proper balance. This is the very nature of this function
to increase or decrease the power reference count depending on the
situation.
Consequences of this bug is that the power reference would
eventually get down to 0 while the display was still in use,
resulting in that display switching off unexpectedly.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
e008fa6fb415 ("drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config")
Cc: stable@vger.kernel.org
Cc: Navid Emamdoost <navid.emamdoost@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 22 Sep 2020 15:34:16 +0000 (11:34 -0400)]
drm/amdgpu/display: fix CFLAGS setup for DCN30
Properly handle clang and older versions of gcc.
Fixes:
e77165bf7b02a3 ("drm/amd/display: Add DCN3 blocks to Makefile")
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Wed, 23 Sep 2020 06:42:59 +0000 (14:42 +0800)]
drm/amd/display: fix return value check for hdcp_work
max_caps might be 0, thus hdcp_work might be ZERO_SIZE_PTR
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jiansong Chen [Wed, 23 Sep 2020 03:58:23 +0000 (11:58 +0800)]
drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc.
Remove gpu_info fw support for sienna_cichlid etc., since the
information can be retrieved from discovery binary.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>