Naohiro Aota [Thu, 19 Aug 2021 12:19:14 +0000 (21:19 +0900)]
btrfs: zoned: finish superblock zone once no space left for new SB
If there is no more space left for a new superblock in a superblock zone,
then it is better to ZONE_FINISH the zone and frees up the active zone
count.
Since btrfs_advance_sb_log() can now issue REQ_OP_ZONE_FINISH, we also need
to convert it to return int for the error case.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:13 +0000 (21:19 +0900)]
btrfs: zoned: locate superblock position using zone capacity
sb_write_pointer() returns the write position of next superblock. For READ,
we need a previous location. When the pointer is at the head, the previous
one is the last one of the other zone. Calculate the last one's position
from zone capacity.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:12 +0000 (21:19 +0900)]
btrfs: zoned: consider zone as full when no more SB can be written
We cannot write beyond zone capacity. So, we should consider a zone as
"full" when the write pointer goes beyond capacity - the size of super
info.
Also, take this opportunity to replace a subtle duplicated code with a loop
and fix a typo in comment.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:11 +0000 (21:19 +0900)]
btrfs: zoned: tweak reclaim threshold for zone capacity
With the introduction of zone capacity, the range [capacity, length] is
always zone unusable. Counting this region as a reclaim target will
cause reclaiming too early. Reclaim block groups based on bytes that can
be usable after resetting.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:10 +0000 (21:19 +0900)]
btrfs: zoned: calculate free space from zone capacity
Now that we introduced capacity in a block group, we need to calculate free
space using the capacity instead of the length. Thus, bytes we account
capacity - alloc_pointer as free, and account bytes [capacity, length] as
zone unusable.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:09 +0000 (21:19 +0900)]
btrfs: zoned: move btrfs_free_excluded_extents out of btrfs_calc_zone_unusable
btrfs_free_excluded_extents() is not neccessary for
btrfs_calc_zone_unusable() and it makes btrfs_calc_zone_unusable()
difficult to reuse. Move it out and call btrfs_free_excluded_extents()
in proper context.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Naohiro Aota [Thu, 19 Aug 2021 12:19:08 +0000 (21:19 +0900)]
btrfs: zoned: load zone capacity information from devices
The ZNS specification introduces the concept of a Zone Capacity. A zone
capacity is an additional per-zone attribute that indicates the number of
usable logical blocks within each zone, starting from the first logical
block of each zone. It is always smaller or equal to the zone size.
With the SINGLE profile, we can set a block group's "capacity" as the same
as the underlying zone's Zone Capacity. We will limit the allocation not
to exceed in a following commit.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:42 +0000 (16:12 +0800)]
btrfs: defrag: enable defrag for subpage case
With the new infrastructure which has taken subpage into consideration,
now we should be safe to allow defrag to work for subpage case.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:41 +0000 (16:12 +0800)]
btrfs: defrag: remove the old infrastructure
Now the old infrastructure can all be removed, defrag
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 27 May 2021 12:33:22 +0000 (20:33 +0800)]
btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()
The function defrag_one_cluster() is able to defrag one range well
enough, we only need to do preparation for it, including:
- Clamp and align the defrag range
- Exclude invalid cases
- Proper inode locking
The old infrastructures will not be removed in this patch, as it would
be too noisy to review.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:39 +0000 (16:12 +0800)]
btrfs: defrag: introduce helper to defrag one cluster
This new helper, defrag_one_cluster(), will defrag one cluster (at most
256K):
- Collect all initial targets
- Kick in readahead when possible
- Call defrag_one_range() on each initial target
With some extra range clamping.
- Update @sectors_defragged parameter
This involves one behavior change, the defragged sectors accounting is
no longer as accurate as old behavior, as the initial targets are not
consistent.
We can have new holes punched inside the initial target, and we will
skip such holes later.
But the defragged sectors accounting doesn't need to be that accurate
anyway, thus I don't want to pass those extra accounting burden into
defrag_one_range().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:38 +0000 (16:12 +0800)]
btrfs: defrag: introduce helper to defrag a range
A new helper, defrag_one_range(), is introduced to defrag one range.
This function will mostly prepare the needed pages and extent status for
defrag_one_locked_target().
As we can only have a consistent view of extent map with page and extent
bits locked, we need to re-check the range passed in to get a real
target list for defrag_one_locked_target().
Since defrag_collect_targets() will call defrag_lookup_extent() and lock
extent range, we also need to teach those two functions to skip extent
lock. Thus new parameter, @locked, is introduced to skip extent lock if
the caller has already locked the range.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:37 +0000 (16:12 +0800)]
btrfs: defrag: introduce helper to defrag a contiguous prepared range
A new helper, defrag_one_locked_target(), introduced to do the real part
of defrag.
The caller needs to ensure both page and extents bits are locked, and no
ordered extent exists for the range, and all writeback is finished.
The core defrag part is pretty straight-forward:
- Reserve space
- Set extent bits to defrag
- Update involved pages to be dirty
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:36 +0000 (16:12 +0800)]
btrfs: defrag: introduce helper to collect target file extents
Introduce a helper, defrag_collect_targets(), to collect all possible
targets to be defragged.
This function will not consider things like max_sectors_to_defrag, thus
caller should be responsible to ensure we don't exceed the limit.
This function will be the first stage of later defrag rework.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:35 +0000 (16:12 +0800)]
btrfs: defrag: factor out page preparation into a helper
In cluster_pages_for_defrag(), we have complex code block inside one
for() loop.
The code block is to prepare one page for defrag, this will ensure:
- The page is locked and set up properly.
- No ordered extent exists in the page range.
- The page is uptodate.
This behavior is pretty common and will be reused by later defrag
rework.
So factor out the code into its own helper, defrag_prepare_one_page(),
for later usage, and cleanup the code by a little.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:34 +0000 (16:12 +0800)]
btrfs: defrag: replace hard coded PAGE_SIZE with sectorsize
When testing subpage defrag support, I always find some strange inode
nbytes error, after a lot of debugging, it turns out that
defrag_lookup_extent() is using PAGE_SIZE as size for
lookup_extent_mapping().
Since lookup_extent_mapping() is calling __lookup_extent_mapping() with
@strict == 1, this means any extent map smaller than one page will be
ignored, prevent subpage defrag to grab a correct extent map.
There are quite some PAGE_SIZE usage in ioctl.c, but most of them are
correct usages, and can be one of the following cases:
- ioctl structure size check
We want ioctl structure to be contained inside one page.
- real page operations
The remaining cases in defrag_lookup_extent() and
check_defrag_in_cache() will be addressed in this patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:33 +0000 (16:12 +0800)]
btrfs: defrag: also check PagePrivate for subpage cases in cluster_pages_for_defrag()
In function cluster_pages_for_defrag() we have a window where we unlock
page, either start the ordered range or read the content from disk.
When we re-lock the page, we need to make sure it still has the correct
page->private for subpage.
Thus add the extra PagePrivate check here to handle subpage cases
properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 6 Aug 2021 08:12:32 +0000 (16:12 +0800)]
btrfs: defrag: pass file_ra_state instead of file to btrfs_defrag_file()
Currently btrfs_defrag_file() accepts both "struct inode" and "struct
file" as parameter. We can easily grab "struct inode" from "struct
file" using file_inode() helper.
The reason why we need "struct file" is just to re-use its f_ra.
Change this to pass "struct file_ra_state" parameter, so that it's more
clear what we really want. Since we're here, also add some comments on
the function btrfs_defrag_file().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 24 Aug 2021 05:27:42 +0000 (13:27 +0800)]
btrfs: rename and switch to bool btrfs_chunk_readonly
btrfs_chunk_readonly() checks if the given chunk is writeable. It
returns 1 for readonly, and 0 for writeable. So the return argument type
bool shall suffice instead of the current type int.
Also, rename btrfs_chunk_readonly() to btrfs_chunk_writeable() as we
check if the bg is writeable, and helps to keep the logic at the parent
function simpler to understand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sidong Yang [Thu, 26 Aug 2021 14:44:36 +0000 (14:44 +0000)]
btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
Fix a warning reported by smatch that ret could be returned without
initialized. The dedupe operations are supposed to to return 0 for a 0
length range but the caller does not pass olen == 0. To keep this
behaviour and also fix the warning initialize ret to 0.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 17 Aug 2021 09:38:52 +0000 (17:38 +0800)]
btrfs: subpage: pack all subpage bitmaps into a larger bitmap
Currently we use u16 bitmap to make 4k sectorsize work for 64K page
size.
But this u16 bitmap is not large enough to contain larger page size like
128K, nor is space efficient for 16K page size.
To handle both cases, here we pack all subpage bitmaps into a larger
bitmap, now btrfs_subpage::bitmaps[] will be the ultimate bitmap for
subpage usage.
Each sub-bitmap will has its start bit number recorded in
btrfs_subpage_info::*_start, and its bitmap length will be recorded in
btrfs_subpage_info::bitmap_nr_bits.
All subpage bitmap operations will be converted from using direct u16
operations to bitmap operations, with above *_start calculated.
For 64K page size with 4K sectorsize, this should not cause much
difference.
While for 16K page size, we will only need 1 unsigned long (u32) to
store all the bitmaps, which saves quite some space.
Furthermore, this allows us to support larger page size like 128K and
258K.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 17 Aug 2021 09:38:51 +0000 (17:38 +0800)]
btrfs: subpage: introduce btrfs_subpage_bitmap_info
Currently we use fixed size u16 bitmap for subpage bitmap. This is fine
for 4K sectorsize with 64K page size.
But for 4K sectorsize and larger page size, the bitmap is too small,
while for smaller page size like 16K, u16 bitmaps waste too much space.
Here we introduce a new helper structure, btrfs_subpage_bitmap_info, to
record the proper bitmap size, and where each bitmap should start at.
By this, we can later compact all subpage bitmaps into one u32 bitmap.
This patch is the first step.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 17 Aug 2021 09:38:50 +0000 (17:38 +0800)]
btrfs: subpage: make btrfs_alloc_subpage() return btrfs_subpage directly
The existing calling convention of btrfs_alloc_subpage() is pretty
awful. Change it to a more common pattern by returning struct
btrfs_subpage directly and let the caller to determine if the call
succeeded.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 17 Aug 2021 09:38:49 +0000 (17:38 +0800)]
btrfs: subpage: only call btrfs_alloc_subpage() when sectorsize is smaller than PAGE_SIZE
There are two call sites of btrfs_alloc_subpage():
- btrfs_attach_subpage()
We have ensured sectorsize is smaller than PAGE_SIZE
- alloc_extent_buffer()
We call btrfs_alloc_subpage() unconditionally.
The alloc_extent_buffer() forces us to check the sectorsize size against
page size inside btrfs_alloc_subpage().
Since the function name, btrfs_alloc_subpage(), already indicates it
should only get called for subpage cases, do the check in
alloc_extent_buffer() and add an ASSERT() in btrfs_alloc_subpage().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Su Yue [Wed, 18 Aug 2021 04:15:48 +0000 (12:15 +0800)]
btrfs: update comment for fs_devices::seed_list in btrfs_rm_device
Update it since commit
944d3f9fac61 ("btrfs: switch seed device to
list api") did conversion from fs_devices::seed to fs_devices::seed_list.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Wed, 28 Jul 2021 06:20:41 +0000 (14:20 +0800)]
btrfs: drop unnecessary ret in ioctl_quota_rescan_status
There is no need for the variable ret after
d66105cfa873 ("btrfs:
allocate btrfs_ioctl_quota_rescan_args on stack"), remove it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Marcos Paulo de Souza [Sun, 1 Aug 2021 23:35:49 +0000 (20:35 -0300)]
btrfs: send: simplify send_create_inode_if_needed
The out label is being overused, we can simply return if the condition
permits.
No functional changes.
Reviewed-by: Su Yue <l@damenly.su>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 18 Aug 2021 10:41:19 +0000 (13:41 +0300)]
btrfs: rename btrfs_alloc_chunk to btrfs_create_chunk
The user facing function used to allocate new chunks is
btrfs_chunk_alloc, unfortunately there is yet another similar sounding
function - btrfs_alloc_chunk. This creates confusion, especially since
the latter function can be considered "private" in the sense that it
implements the first stage of chunk creation and as such is called by
btrfs_chunk_alloc.
To avoid the awkwardness that comes with having similarly named but
distinctly different in their purpose function rename btrfs_alloc_chunk
to btrfs_create_chunk, given that the main purpose of this function is
to orchestrate the whole process of allocating a chunk - reserving space
into devices, deciding on characteristics of the stripe size and
creating the in-memory structures.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Mon, 25 Oct 2021 18:30:31 +0000 (11:30 -0700)]
Linux 5.15-rc7
Matthew Wilcox (Oracle) [Mon, 25 Oct 2021 18:16:34 +0000 (19:16 +0100)]
secretmem: Prevent secretmem_users from wrapping to zero
Commit
110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t")
attempted to fix the problem of secretmem_users wrapping to zero and
allowing suspend once again.
But it was reverted in commit
87066fdd2e30 ("Revert 'mm/secretmem: use
refcount_t instead of atomic_t'") because of the problems it caused - a
refcount_t was not semantically the right type to use.
Instead prevent secretmem_users from wrapping to zero by forbidding new
users if the number of users has wrapped from positive to negative.
This stops a long way short of reaching the necessary 4 billion users
where it wraps to zero again, so there's no need to be clever with
special anti-wrap types or checking the return value from atomic_inc().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jordy Zomer <jordy@pwning.systems>
Cc: Kees Cook <keescook@chromium.org>,
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 25 Oct 2021 17:46:41 +0000 (10:46 -0700)]
spi: Fix tegra20 build with CONFIG_PM=n once again
Commit
efafec27c565 ("spi: Fix tegra20 build with CONFIG_PM=n") already
fixed the build without PM support once. There was an alternative fix
by Guenter in commit
2bab94090b01 ("spi: tegra20-slink: Declare runtime
suspend and resume functions conditionally"), and Mark then merged the
two correctly in
ffb1e76f4f32 ("Merge tag 'v5.15-rc2' into spi-5.15").
But for some inexplicable reason, Mark then merged things _again_ in
commit
59c4e190b10c ("Merge tag 'v5.15-rc3' into spi-5.15"), and screwed
things up at that point, and the __maybe_unused attribute on
tegra_slink_runtime_resume() went missing.
Reinstate it, so that alpha (and other architectures without PM support)
builds cleanly again.
Btw, this is another prime example of how random back-merges are not
good. Just don't do them. Subsystem developers should not merge my
tree in any normal circumstances. Both of those merge commits pointed
to above are bad: even the one that got the merge result right doesn't
even mention _why_ it was done, and the one that got it wrong is
obviously broken.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 25 Oct 2021 17:28:52 +0000 (10:28 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Fix clang-related relocation warning in futex code
- Fix incorrect use of get_kernel_nofault()
- Fix bad code generation in __get_user_check() when kasan is enabled
- Ensure TLB function table is correctly aligned
- Remove duplicated string function definitions in decompressor
- Fix link-time orphan section warnings
- Fix old-style function prototype for arch_init_kprobes()
- Only warn about XIP address when not compile testing
- Handle BE32 big endian for keystone2 remapping
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S
ARM: 9141/1: only warn about XIP address when not compile testing
ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype
ARM: 9138/1: fix link warning with XIP + frame-pointer
ARM: 9134/1: remove duplicate memcpy() definition
ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned
ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images
ARM: 9125/1: fix incorrect use of get_kernel_nofault()
ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
Linus Torvalds [Mon, 25 Oct 2021 16:57:28 +0000 (09:57 -0700)]
Merge tag 'libata-5.15-rc7' of git://git./linux/kernel/git/dlemoal/libata
Pull libata fix from Damien Le Moal:
"A single fix in this pull request addressing an invalid error code
return in the sata_mv driver (from Zheyu)"
* tag 'libata-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: sata_mv: Fix the error handling of mv_chip_id()
Linus Torvalds [Mon, 25 Oct 2021 16:47:18 +0000 (09:47 -0700)]
Merge tag 'pinctrl-v5.15-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some late pin control fixes, the most generally annoying will probably
be the AMD IRQ storm fix affecting the Microsoft surface.
Summary:
- Three fixes pertaining to Broadcom DT bindings. Some stuff didn't
work out as inteded, we need to back out
- A resume bug fix in the STM32 driver
- Disable and mask the interrupts on probe in the AMD pinctrl driver,
affecting Microsoft surface"
* tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: disable and mask interrupts on probe
pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume()
Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode"
dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example
Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
LABBE Corentin [Thu, 21 Oct 2021 09:26:57 +0000 (10:26 +0100)]
ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S
My intel-ixp42x-welltech-epbx100 no longer boot since 4.14.
This is due to commit
463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel
mapping regression")
which forgot to handle CONFIG_CPU_ENDIAN_BE32 as possible BE config.
Suggested-by: Krzysztof Hałasa <khalasa@piap.pl>
Fixes:
463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel mapping regression")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Zheyu Ma [Fri, 22 Oct 2021 09:12:26 +0000 (09:12 +0000)]
ata: sata_mv: Fix the error handling of mv_chip_id()
mv_init_host() propagates the value returned by mv_chip_id() which in turn
gets propagated by mv_pci_init_one() and hits local_pci_probe().
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
Since this is a bug rather than a recoverable runtime error we should
use dev_alert() instead of dev_err().
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Sun, 24 Oct 2021 19:48:33 +0000 (09:48 -1000)]
Revert "mm/secretmem: use refcount_t instead of atomic_t"
This reverts commit
110860541f443f950c1274f217a1a3e298670a33.
Converting the "secretmem_users" counter to a refcount is incorrect,
because a refcount is special in zero and can't just be incremented (but
a count of users is not, and "no users" is actually perfectly valid and
not a sign of a free'd resource).
Reported-by: syzbot+75639e6a0331cd61d3e2@syzkaller.appspotmail.com
Cc: Jordy Zomer <jordy@pwning.systems>
Cc: Kees Cook <keescook@chromium.org>,
Cc: Jordy Zomer <jordy@jordyzomer.github.io>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 24 Oct 2021 19:36:06 +0000 (09:36 -1000)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull autofs fix from Al Viro:
"Fix for a braino of mine (in getting rid of open-coded
dentry_path_raw() in autofs a couple of cycles ago).
Mea culpa... Obvious -stable fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs: fix wait name hash calculation in autofs_wait()
Linus Torvalds [Sun, 24 Oct 2021 17:04:21 +0000 (07:04 -1000)]
Merge tag 'sched_urgent_for_v5.15_rc7' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Reset clang's Shadow Call Stack on hotplug to prevent it from
overflowing"
* tag 'sched_urgent_for_v5.15_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/scs: Reset the shadow stack when idle_task_exit
Linus Torvalds [Sun, 24 Oct 2021 17:00:15 +0000 (07:00 -1000)]
Merge tag 'x86_urgent_for_v5.15_rc7' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"A single change adding Dave Hansen to our maintainers team"
* tag 'x86_urgent_for_v5.15_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add Dave Hansen to the x86 maintainer team
Linus Torvalds [Sun, 24 Oct 2021 16:43:59 +0000 (06:43 -1000)]
Merge tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Ten fixes for the ksmbd kernel server, for improved security and
additional buffer overflow checks:
- a security improvement to session establishment to reduce the
possibility of dictionary attacks
- fix to ensure that maximum i/o size negotiated in the protocol is
not less than 64K and not more than 8MB to better match expected
behavior
- fix for crediting (flow control) important to properly verify that
sufficient credits are available for the requested operation
- seven additional buffer overflow, buffer validation checks"
* tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add buffer validation in session setup
ksmbd: throttle session setup failures to avoid dictionary attacks
ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests
ksmbd: validate credit charge after validating SMB2 PDU body size
ksmbd: add buffer validation for smb direct
ksmbd: limit read/write/trans buffer size not to exceed 8MB
ksmbd: validate compound response buffer
ksmbd: fix potencial 32bit overflow from data area check in smb2_write
ksmbd: improve credits management
ksmbd: add validation in smb2_ioctl
Linus Torvalds [Sun, 24 Oct 2021 16:23:48 +0000 (06:23 -1000)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Ten fixes, seven of which are in drivers.
The core fixes are one to fix a potential crash on resume, one to sort
out our reference count releases to avoid releasing in-use modules and
one to adjust the cmd per lun calculation to avoid an overflow in
hyper-v"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufs-pci: Force a full restore after suspend-to-disk
scsi: qla2xxx: Fix unmap of already freed sgl
scsi: qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()
scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
scsi: sd: Fix crashes in sd_resume_runtime()
scsi: mpi3mr: Fix duplicate device entries when scanning through sysfs
scsi: core: Put LLD module refcnt after SCSI device is released
scsi: storvsc: Fix validation for unsolicited incoming packets
scsi: iscsi: Fix set_param() handling
scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
Linus Torvalds [Sat, 23 Oct 2021 03:42:13 +0000 (17:42 -1000)]
Merge tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Fix for the cgroup code not ussing irq safe stats updates, and one fix
for an error handling condition in add_partition()"
* tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
block: fix incorrect references to disk objects
blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu
Linus Torvalds [Sat, 23 Oct 2021 03:34:31 +0000 (17:34 -1000)]
Merge tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes for the max workers limit API that was introduced this
series: one fix for an issue with that code, and one fixing a linked
timeout regression in this series"
* tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
io_uring: apply worker limits to previous users
io_uring: fix ltimeout unprep
io_uring: apply max_workers limit to all future users
io-wq: max_worker fixes
Linus Torvalds [Fri, 22 Oct 2021 20:39:47 +0000 (10:39 -1000)]
Merge tag 'fuse-fixes-5.15-rc7' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Syzbot discovered a race in case of reusing the fuse sb (introduced in
this cycle).
Fix it by doing the s_fs_info initialization at the proper place"
* tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up error exits in fuse_fill_super()
fuse: always initialize sb->s_fs_info
fuse: clean up fuse_mount destruction
fuse: get rid of fuse_put_super()
fuse: check s_root when destroying sb
Linus Torvalds [Fri, 22 Oct 2021 20:31:32 +0000 (10:31 -1000)]
Merge tag 'hyperv-fixes-signed-
20211022' of git://git./linux/kernel/git/hyperv/linux
Pull hyper-v fix from Wei Liu:
- Fix vmbus ARM64 build (Arnd Bergmann)
* tag 'hyperv-fixes-signed-
20211022' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyperv/vmbus: include linux/bitops.h
Arnd Bergmann [Mon, 18 Oct 2021 13:19:08 +0000 (15:19 +0200)]
hyperv/vmbus: include linux/bitops.h
On arm64 randconfig builds, hyperv sometimes fails with this
error:
In file included from drivers/hv/hv_trace.c:3:
In file included from drivers/hv/hyperv_vmbus.h:16:
In file included from arch/arm64/include/asm/sync_bitops.h:5:
arch/arm64/include/asm/bitops.h:11:2: error: only <linux/bitops.h> can be included directly
In file included from include/asm-generic/bitops/hweight.h:5:
include/asm-generic/bitops/arch_hweight.h:9:9: error: implicit declaration of function '__sw_hweight32' [-Werror,-Wimplicit-function-declaration]
include/asm-generic/bitops/atomic.h:17:7: error: implicit declaration of function 'BIT_WORD' [-Werror,-Wimplicit-function-declaration]
Include the correct header first.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211018131929.2260087-1-arnd@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Linus Torvalds [Fri, 22 Oct 2021 19:08:08 +0000 (09:08 -1000)]
Merge tag 'acpi-5.15-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two regressions, one related to ACPI power resources
management and one that broke ACPI tools compilation.
Specifics:
- Stop turning off unused ACPI power resources in an unknown state to
address a regression introduced during the 5.14 cycle (Rafael
Wysocki).
- Fix an ACPI tools build issue introduced recently when the minimal
stdarg.h was added (Miguel Bernal Marin)"
* tag 'acpi-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Do not turn off power resources in unknown state
ACPI: tools: fix compilation error
Linus Torvalds [Fri, 22 Oct 2021 19:02:15 +0000 (09:02 -1000)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more x86 kvm fixes from Paolo Bonzini:
- Cache coherency fix for SEV live migration
- Fix for instruction emulation with PKU
- fixes for rare delaying of interrupt delivery
- fix for SEV-ES buffer overflow
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
KVM: SEV-ES: keep INS functions together
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
KVM: x86: split the two parts of emulator_pio_in
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
KVM: nVMX: promptly process interrupts delivered while in guest mode
KVM: x86: check for interrupts before deciding whether to exit the fast path
Rafael J. Wysocki [Fri, 22 Oct 2021 18:45:10 +0000 (20:45 +0200)]
Merge branch 'acpi-tools'
Merge a fix for a recent ACPI tools bild regresson.
* acpi-tools:
ACPI: tools: fix compilation error
Paolo Bonzini [Tue, 12 Oct 2021 15:33:03 +0000 (11:33 -0400)]
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
The PIO scratch buffer is larger than a single page, and therefore
it is not possible to copy it in a single step to vcpu->arch/pio_data.
Bound each call to emulator_pio_in/out to a single page; keep
track of how many I/O operations are left in vcpu->arch.sev_pio_count,
so that the operation can be restarted in the complete_userspace_io
callback.
For OUT, this means that the previous kvm_sev_es_outs implementation
becomes an iterator of the loop, and we can consume the sev_pio_data
buffer before leaving to userspace.
For IN, instead, consuming the buffer and decreasing sev_pio_count
is always done in the complete_userspace_io callback, because that
is when the memcpy is done into sev_pio_data.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 15:25:45 +0000 (11:25 -0400)]
KVM: SEV-ES: keep INS functions together
Make the diff a little nicer when we actually get to fixing
the bug. No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 16:35:20 +0000 (12:35 -0400)]
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in,
and therefore does not need the size and count arguments. This makes things
nicer when the function is called directly from a complete_userspace_io
callback.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 13 Oct 2021 16:32:02 +0000 (12:32 -0400)]
KVM: x86: split the two parts of emulator_pio_in
emulator_pio_in handles both the case where the data is pending in
vcpu->arch.pio.count, and the case where I/O has to be done via either
an in-kernel device or a userspace exit. For SEV-ES we would like
to split these, to identify clearly the moment at which the
sev_pio_data is consumed. To this end, create two different
functions: __emulator_pio_in fills in vcpu->arch.pio.count, while
complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.
Because this patch has to be backported, things are left a bit messy.
kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in()
having with two calls to complete_emulator_pio_in(). It will be fixed
in the next release.
While at it, remove the unused void* val argument of emulator_pio_in_out.
The function currently hardcodes vcpu->arch.pio_data as the
source/destination buffer, which sucks but will be fixed after the more
severe SEV-ES buffer overflow.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 14:51:55 +0000 (10:51 -0400)]
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
A few very small cleanups to the functions, smushed together because
the patch is already very small like this:
- inline emulator_pio_in_emulated and emulator_pio_out_emulated,
since we already have the vCPU
- remove the data argument and pull setting vcpu->arch.sev_pio_data into
the caller
- remove unnecessary clearing of vcpu->arch.pio.count when
emulation is done by the kernel (and therefore vcpu->arch.pio.count
is already clear on exit from emulator_pio_in and emulator_pio_out).
No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 13 Oct 2021 16:29:42 +0000 (12:29 -0400)]
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
Currently emulator_pio_in clears vcpu->arch.pio.count twice if
emulator_pio_in_out performs kernel PIO. Move the clear into
emulator_pio_out where it is actually necessary.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 14:22:34 +0000 (10:22 -0400)]
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
We will be using this field for OUTS emulation as well, in case the
data that is pushed via OUTS spans more than one page. In that case,
there will be a need to save the data pointer across exits to userspace.
So, change the name to something that refers to any kind of PIO.
Also spell out what it is used for, namely SEV-ES.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes:
7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 22 Oct 2021 05:06:08 +0000 (19:06 -1000)]
Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Nothing too crazy at the end of the cycle, the kmb modesetting fixes
are probably a bit large but it's not a major driver, and its fixing
monitor doesn't turn on type problems.
Otherwise it's just a few minor patches, one ast regression revert, an
msm power stability fix.
ast:
- fix regression with connector detect
msm:
- fix power stability issue
msxfb:
- fix crash on unload
panel:
- sync fix
kmb:
- modesetting fixes"
* tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/ast: Add detect function support"
drm/kmb: Enable ADV bridge after modeset
drm/kmb: Corrected typo in handle_lcd_irq
drm/kmb: Disable change of plane parameters
drm/kmb: Remove clearing DPHY regs
drm/kmb: Limit supported mode to 1080p
drm/kmb: Work around for higher system clock
drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
drm: mxsfb: Fix NULL pointer dereference crash on unload
drm/msm/devfreq: Restrict idle clamping to a618 for now
Mike Rapoport [Thu, 21 Oct 2021 07:09:29 +0000 (10:09 +0300)]
memblock: exclude MEMBLOCK_NOMAP regions from kmemleak
Vladimir Zapolskiy reports:
Commit
a7259df76702 ("memblock: make memblock_find_in_range method
private") invokes a kernel panic while running kmemleak on OF platforms
with nomaped regions:
Unable to handle kernel paging request at virtual address
fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
The memory allocated from memblock is registered with kmemleak, but if
it is marked MEMBLOCK_NOMAP it won't have linear map entries so an
attempt to scan such areas will fault.
Ideally, memblock_mark_nomap() would inform kmemleak to ignore
MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces
operating on physical addresses can use __va() conversion.
Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP
take care of informing kmemleak to ignore such memory.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com
Fixes:
a7259df76702 ("memblock: make memblock_find_in_range method private")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Thu, 21 Oct 2021 07:09:28 +0000 (10:09 +0300)]
Revert "memblock: exclude NOMAP regions from kmemleak"
Commit
6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak")
breaks boot on EFI systems with kmemleak and VM_DEBUG enabled:
efi: Processing EFI memory map:
efi: 0x000090000000-0x000091ffffff [Conventional| | | | | | | | | | |WB|WT|WC|UC]
efi: 0x000092000000-0x0000928fffff [Runtime Data|RUN| | | | | | | | | |WB|WT|WC|UC]
------------[ cut here ]------------
kernel BUG at mm/kmemleak.c:1140!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-
20211019+ #104
pstate:
600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kmemleak_free_part_phys+0x64/0x8c
lr : kmemleak_free_part_phys+0x38/0x8c
sp :
ffff800011eafbc0
x29:
ffff800011eafbc0 x28:
1fffff7fffb41c0d x27:
fffffbfffda0e068
x26:
0000000092000000 x25:
1ffff000023d5f94 x24:
ffff800011ed84d0
x23:
ffff800011ed84c0 x22:
ffff800011ed83d8 x21:
0000000000900000
x20:
ffff800011782000 x19:
0000000092000000 x18:
ffff800011ee0730
x17:
0000000000000000 x16:
0000000000000000 x15:
1ffff0000233252c
x14:
ffff800019a905a0 x13:
0000000000000001 x12:
ffff7000023d5ed7
x11:
1ffff000023d5ed6 x10:
ffff7000023d5ed6 x9 :
dfff800000000000
x8 :
ffff800011eaf6b7 x7 :
0000000000000001 x6 :
ffff800011eaf6b0
x5 :
00008ffffdc2a12a x4 :
ffff7000023d5ed7 x3 :
1ffff000023dbf99
x2 :
1ffff000022f0463 x1 :
0000000000000000 x0 :
ffffffffffffffff
Call trace:
kmemleak_free_part_phys+0x64/0x8c
memblock_mark_nomap+0x5c/0x78
reserve_regions+0x294/0x33c
efi_init+0x2d0/0x490
setup_arch+0x80/0x138
start_kernel+0xa0/0x3ec
__primary_switched+0xc0/0xc8
Code:
34000041 97d526e7 f9418e80 36000040 (
d4210000)
random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0
---[ end trace
0000000000000000 ]---
The crash happens because kmemleak_free_part_phys() tries to use __va()
before memstart_addr is initialized and this triggers a VM_BUG_ON() in
arch/arm64/include/asm/memory.h:
Revert
6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak"),
the issue it is fixing will be fixed differently.
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Oct 2021 03:27:17 +0000 (17:27 -1000)]
Merge branch 'ucount-fixes-for-v5.15' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull ucounts fixes from Eric Biederman:
"There has been one very hard to track down bug in the ucount code that
we have been tracking since roughly v5.14 was released. Alex managed
to find a reliable reproducer a few days ago and then I was able to
instrument the code and figure out what the issue was.
It turns out the sigqueue_alloc single atomic operation optimization
did not play nicely with ucounts multiple level rlimits. It turned out
that either sigqueue_alloc or sigqueue_free could be operating on
multiple levels and trigger the conditions for the optimization on
more than one level at the same time.
To deal with that situation I have introduced inc_rlimit_get_ucounts
and dec_rlimit_put_ucounts that just focuses on the optimization and
the rlimit and ucount changes.
While looking into the big bug I found I couple of other little issues
so I am including those fixes here as well.
When I have time I would very much like to dig into process ownership
of the shared signal queue and see if we could pick a single owner for
the entire queue so that all of the rlimits can count to that owner.
That should entirely remove the need to call get_ucounts and
put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult
because Linux unlike POSIX supports setuid that works on a single
thread"
* 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
ucounts: Proper error handling in set_cred_ucounts
ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
ucounts: Fix signal ucount refcounting
Linus Torvalds [Fri, 22 Oct 2021 01:36:50 +0000 (15:36 -1000)]
Merge tag 'net-5.15-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and can.
We'll have one more fix for a socket accounting regression, it's still
getting polished. Otherwise things look fine.
Current release - regressions:
- revert "vrf: reset skb conntrack connection on VRF rcv", there are
valid uses for previous behavior
- can: m_can: fix iomap_read_fifo() and iomap_write_fifo()
Current release - new code bugs:
- mlx5: e-switch, return correct error code on group creation failure
Previous releases - regressions:
- sctp: fix transport encap_port update in sctp_vtag_verify
- stmmac: fix E2E delay mechanism (in PTP timestamping)
Previous releases - always broken:
- netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr
- netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of
init
- netfilter: ipvs: make global sysctl read-only in non-init netns
- tcp: md5: fix selection between vrf and non-vrf keys
- ipv6: count rx stats on the orig netdev when forwarding
- bridge: mcast: use multicast_membership_interval for IGMPv3
- can:
- j1939: fix UAF for rx_kref of j1939_priv abort sessions on
receiving bad messages
- isotp: fix TX buffer concurrent access in isotp_sendmsg() fix
return error on FC timeout on TX path
- ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited
- hns3: schedule the polling again when allocation fails, prevent
stalls
- drivers: add missing of_node_put() when aborting
for_each_available_child_of_node()
- ptp: fix possible memory leak and UAF in ptp_clock_register()
- e1000e: fix packet loss in burst mode on Tiger Lake and later
- mlx5e: ipsec: fix more checksum offload issues"
* tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
usbnet: sanity check for maxpacket
net: enetc: make sure all traffic classes can send large frames
net: enetc: fix ethtool counter name for PM0_TERR
ptp: free 'vclock_index' in ptp_clock_release()
sfc: Don't use netif_info before net_device setup
sfc: Export fibre-specific supported link modes
net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
net/mlx5e: IPsec: Fix a misuse of the software parser's fields
net/mlx5e: Fix vlan data lost during suspend flow
net/mlx5: E-switch, Return correct error code on group creation failure
net/mlx5: Lag, change multipath and bonding to be mutually exclusive
ice: Add missing E810 device ids
igc: Update I226_K device ID
e1000e: Fix packet loss on Tiger Lake and later
e1000e: Separate TGP board type from SPT
ptp: Fix possible memory leak in ptp_clock_register()
net: stmmac: Fix E2E delay mechanism
nfc: st95hf: Make spi remove() callback return zero
net: hns3: disable sriov before unload hclge layer
net: hns3: fix vf reset workqueue cannot exit
...
Linus Torvalds [Fri, 22 Oct 2021 01:30:09 +0000 (15:30 -1000)]
Merge tag 'powerpc-5.15-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a bug exposed by a previous fix, where running guests with
certain SMT topologies could crash the host on Power8.
- Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is
enabled.
Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider.
* tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/smp: do not decrement idle task preempt count in CPU offline
powerpc/idle: Don't corrupt back chain when going idle
Kim Phillips [Thu, 21 Oct 2021 15:30:06 +0000 (10:30 -0500)]
Revert "drm/ast: Add detect function support"
This reverts commit
aae74ff9caa8de9a45ae2e46068c417817392a26,
since it prevents my AMD Milan system from booting, with:
[ 27.189558] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 27.197506] #PF: supervisor write access in kernel mode
[ 27.203333] #PF: error_code(0x0002) - not-present page
[ 27.209064] PGD 0 P4D 0
[ 27.211885] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 27.216744] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6+ #15
[ 27.223928] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[ 27.232955] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[ 27.238397] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[ 27.259350] RSP: 0018:
ffffc42d00003ee8 EFLAGS:
00010086
[ 27.265176] RAX:
dead000000000122 RBX:
0000000000000000 RCX:
0000000000000101
[ 27.273134] RDX:
0000000000000000 RSI:
0000000000000087 RDI:
0000000000000001
[ 27.281084] RBP:
ffffc42d00003f70 R08:
0000000000000000 R09:
00000000000003eb
[ 27.289043] R10:
ffffa0860cb300d0 R11:
ffffa0c44de290b0 R12:
ffffc42d00003ef8
[ 27.297002] R13:
00000000fffef200 R14:
ffffa0c44de18dc0 R15:
ffffa0867a882350
[ 27.304961] FS:
0000000000000000(0000) GS:
ffffa0c44de00000(0000) knlGS:
0000000000000000
[ 27.313988] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 27.320396] CR2:
0000000000000000 CR3:
000000014569c001 CR4:
0000000000770ef0
[ 27.328346] PKRU:
55555554
[ 27.331359] Call Trace:
[ 27.334073] <IRQ>
[ 27.336314] ? __queue_work+0x420/0x420
[ 27.340589] ? lapic_next_event+0x21/0x30
[ 27.345060] ? clockevents_program_event+0x8f/0xe0
[ 27.350402] __do_softirq+0xfb/0x2db
[ 27.354388] irq_exit_rcu+0x98/0xd0
[ 27.358275] sysvec_apic_timer_interrupt+0xac/0xd0
[ 27.363620] </IRQ>
[ 27.365955] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 27.371685] RIP: 0010:cpuidle_enter_state+0xcc/0x390
[ 27.377292] Code: 3d 01 79 0a 50 e8 44 ed 77 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 f8 77 ff 80 7d d7 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 17 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d
[ 27.398243] RSP: 0018:
ffffffffb0e03dc8 EFLAGS:
00000246
[ 27.404069] RAX:
ffffa0c44de00000 RBX:
0000000000000001 RCX:
000000000000001f
[ 27.412028] RDX:
0000000000000000 RSI:
ffffffffb0bafc1f RDI:
ffffffffb0bbdb81
[ 27.419986] RBP:
ffffffffb0e03e00 R08:
00000006549f8f3f R09:
ffffffffb1065200
[ 27.427935] R10:
ffffa0c44de27ae4 R11:
ffffa0c44de27ac4 R12:
ffffa0c5634cb000
[ 27.435894] R13:
ffffffffb1065200 R14:
00000006549f8f3f R15:
0000000000000001
[ 27.443854] ? cpuidle_enter_state+0xbb/0x390
[ 27.448712] cpuidle_enter+0x2e/0x40
[ 27.452695] call_cpuidle+0x23/0x40
[ 27.456584] do_idle+0x1f0/0x270
[ 27.460181] cpu_startup_entry+0x20/0x30
[ 27.464553] rest_init+0xd4/0xe0
[ 27.468149] arch_call_rest_init+0xe/0x1b
[ 27.472619] start_kernel+0x6bc/0x6e2
[ 27.476764] x86_64_start_reservations+0x24/0x26
[ 27.481912] x86_64_start_kernel+0x75/0x79
[ 27.486477] secondary_startup_64_no_verify+0xb0/0xbb
[ 27.492111] Modules linked in: kvm_amd(+) kvm ipmi_si(+) ipmi_devintf rapl wmi_bmof ipmi_msghandler input_leds ccp k10temp mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea aesni_intel sysfillrect crypto_simd sysimgblt fb_sys_fops cryptd hid_generic cec nvme ahci usbhid drm e1000e nvme_core hid libahci i2c_piix4 wmi
[ 27.551789] CR2:
0000000000000000
[ 27.555482] ---[ end trace
897987dfe93dccc6 ]---
[ 27.560630] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[ 27.566069] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[ 27.587021] RSP: 0018:
ffffc42d00003ee8 EFLAGS:
00010086
[ 27.592848] RAX:
dead000000000122 RBX:
0000000000000000 RCX:
0000000000000101
[ 27.600808] RDX:
0000000000000000 RSI:
0000000000000087 RDI:
0000000000000001
[ 27.608765] RBP:
ffffc42d00003f70 R08:
0000000000000000 R09:
00000000000003eb
[ 27.616716] R10:
ffffa0860cb300d0 R11:
ffffa0c44de290b0 R12:
ffffc42d00003ef8
[ 27.624673] R13:
00000000fffef200 R14:
ffffa0c44de18dc0 R15:
ffffa0867a882350
[ 27.632624] FS:
0000000000000000(0000) GS:
ffffa0c44de00000(0000) knlGS:
0000000000000000
[ 27.641650] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 27.648159] CR2:
0000000000000000 CR3:
000000014569c001 CR4:
0000000000770ef0
[ 27.656119] PKRU:
55555554
[ 27.659133] Kernel panic - not syncing: Fatal exception in interrupt
[ 29.030411] Shutting down cpus with NMI
[ 29.034699] Kernel Offset: 0x2e600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 29.046790] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Since unreliable, found by bisecting for KASAN's use-after-free in
enqueue_timer+0x4f/0x1e0, where the timer callback is called.
Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes:
aae74ff9caa8 ("drm/ast: Add detect function support")
Link: https://lore.kernel.org/lkml/0f7871be-9ca6-5ae4-3a40-5db9a8fb2365@amd.com/
Cc: Ainux <ainux.wang@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: sterlingteng@gmail.com
Cc: chenhuacai@kernel.org
Cc: Chuck Lever III <chuck.lever@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021153006.92983-1-kim.phillips@amd.com
Dave Airlie [Thu, 21 Oct 2021 19:34:54 +0000 (05:34 +1000)]
Merge tag 'drm-misc-fixes-2021-10-21-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.15-rc7:
- Rebased, to remove vc4 patches.
- Fix mxsfb crash on unload.
- Use correct sync parameters for Feixin K101-IM2BYL02.
- Assorted kmb modeset/atomic fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e66eaf89-b9b9-41f5-d0d2-dad7e59fabb5@linux.intel.com
Dave Airlie [Thu, 21 Oct 2021 19:22:10 +0000 (05:22 +1000)]
Merge tag 'drm-msm-fixes-2021-10-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
One more fix for v5.15, to work around a power stability issue on a630
(and possibly others)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs1WPLthmd=ToDcEHm=u-7O38RAVJ2XwRoS8xPmC520vg@mail.gmail.com
Pavel Begunkov [Thu, 21 Oct 2021 12:20:29 +0000 (13:20 +0100)]
io_uring: apply worker limits to previous users
Another change to the API io-wq worker limitation API added in 5.15,
apply the limit to all prior users that already registered a tctx. It
may be confusing as it's now, in particular the change covers the
following 2 cases:
TASK1 | TASK2
_________________________________________________
ring = create() |
| limit_iowq_workers()
*not limited* |
TASK1 | TASK2
_________________________________________________
ring = create() |
| issue_requests()
limit_iowq_workers() |
| *not limited*
A note on locking, it's safe to traverse ->tctx_list as we hold
->uring_lock, but do that after dropping sqd->lock to avoid possible
problems. It's also safe to access tctx->io_wq there because tasks
kill it only after removing themselves from tctx_list, see
io_uring_cancel_generic() -> io_uring_clean_tctx()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masahiro Kozuka [Tue, 14 Sep 2021 21:09:51 +0000 (14:09 -0700)]
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
Flush the destination page before invoking RECEIVE_UPDATE_DATA, as the
PSP encrypts the data with the guest's key when writing to guest memory.
If the target memory was not previously encrypted, the cache may contain
dirty, unecrypted data that will persist on non-coherent systems.
Fixes:
15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Masahiro Kozuka <masa.koz@kozuka.jp>
[sean: converted bug report to changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20210914210951.2994260-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chenyi Qiang [Thu, 21 Oct 2021 07:10:22 +0000 (15:10 +0800)]
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
When updating mmu->pkru_mask, the value can only be added but it isn't
reset in advance. This will make mmu->pkru_mask keep the stale data.
Fix this issue.
Fixes:
2d344105f57c ("KVM, pkeys: introduce pkru_mask to cache conditions")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <
20211021071022.1140-1-chenyi.qiang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Neukum [Thu, 21 Oct 2021 12:29:44 +0000 (14:29 +0200)]
usbnet: sanity check for maxpacket
maxpacket of 0 makes no sense and oopses as we need to divide
by it. Give up.
V2: fixed typo in log and stylistic issues
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+76bb1d34ffa0adc03baa@syzkaller.appspotmail.com
Reviewed-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211021122944.21816-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 20 Oct 2021 17:33:40 +0000 (20:33 +0300)]
net: enetc: make sure all traffic classes can send large frames
The enetc driver does not implement .ndo_change_mtu, instead it
configures the MAC register field PTC{Traffic Class}MSDUR[MAXSDU]
statically to a large value during probe time.
The driver used to configure only the max SDU for traffic class 0, and
that was fine while the driver could only use traffic class 0. But with
the introduction of mqprio, sending a large frame into any other TC than
0 is broken.
This patch fixes that by replicating per traffic class the static
configuration done in enetc_configure_port_mac().
Fixes:
cbe9e835946f ("enetc: Enable TC offloading with mqprio")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020173340.1089992-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 20 Oct 2021 16:52:06 +0000 (19:52 +0300)]
net: enetc: fix ethtool counter name for PM0_TERR
There are two counters named "MAC tx frames", one of them is actually
incorrect. The correct name for that counter should be "MAC tx error
frames", which is symmetric to the existing "MAC rx error frames".
Fixes:
16eb4c85c964 ("enetc: Add ethtool statistics")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020165206.1069889-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Gleixner [Wed, 20 Oct 2021 21:08:16 +0000 (23:08 +0200)]
MAINTAINERS: Add Dave Hansen to the x86 maintainer team
Dave is already listed as x86/mm maintainer, has a profund knowledge
of the x86 architecture in general and a good taste in terms of kernel
programming in general.
Add him as a full x86 maintainer with all rights and duties.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87zgr3flq7.ffs@tglx
Yang Yingliang [Thu, 21 Oct 2021 09:13:53 +0000 (17:13 +0800)]
ptp: free 'vclock_index' in ptp_clock_release()
'vclock_index' is accessed from sysfs, it shouled be freed
in release function, so move it from ptp_clock_unregister()
to ptp_clock_release().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Ekman [Tue, 19 Oct 2021 22:40:16 +0000 (00:40 +0200)]
sfc: Don't use netif_info before net_device setup
Use pci_info instead to avoid unnamed/uninitialized noise:
[197088.688729] sfc 0000:01:00.0: Solarflare NIC detected
[197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F
[197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed
[197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support
Inspired by
fa44821a4ddd ("sfc: don't use netif_info et al before
net_device is registered") from Heiner Kallweit.
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Ekman [Tue, 19 Oct 2021 21:13:32 +0000 (23:13 +0200)]
sfc: Export fibre-specific supported link modes
The 1/10GbaseT modes were set up for cards with SFP+ cages in
3497ed8c852a5 ("sfc: report supported link speeds on SFP connections").
10GbaseT was likely used since no 10G fibre mode existed.
The missing fibre modes for 1/10G were added to ethtool.h in
5711a9822144
("net: ethtool: add support for 1000BaseX and missing 10G link modes")
shortly thereafter.
The user guide available at https://support-nic.xilinx.com/wp/drivers
lists support for the following cable and transceiver types in section 2.9:
- QSFP28 100G Direct Attach Cables
- QSFP28 100G SR Optical Transceivers (with SR4 modules listed)
- SFP28 25G Direct Attach Cables
- SFP28 25G SR Optical Transceivers
- QSFP+ 40G Direct Attach Cables
- QSFP+ 40G Active Optical Cables
- QSFP+ 40G SR4 Optical Transceivers
- QSFP+ to SFP+ Breakout Direct Attach Cables
- QSFP+ to SFP+ Breakout Active Optical Cables
- SFP+ 10G Direct Attach Cables
- SFP+ 10G SR Optical Transceivers
- SFP+ 10G LR Optical Transceivers
- SFP 1000BASE‐T Transceivers
- 1G Optical Transceivers
(From user guide issue 28. Issue 16 which also includes older cards like
SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.)
Regarding SFP+ 10GBASE‐T transceivers the latest guide says:
"Solarflare adapters do not support 10GBASE‐T transceiver modules."
Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change
depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR).
Before:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
After:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
10000baseLR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Supports Wake-on: g
Wake-on: d
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:32:41 +0000 (12:32 +0100)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter fixes for net:
1) Crash due to missing initialization of timer data in
xt_IDLETIMER, from Juhee Kang.
2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum.
3) Skip netdev events on netns removal, from Florian Westphal.
4) Add testcase to show port shadowing via UDP, also from Florian.
5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to
unsafe access to non-linear skbuff, from Xin Long.
6) Make net/ipv4/vs/debug_level read-only from non-init netns,
from Antoine Tenart.
7) Remove bogus invocation to bash in selftests/netfilter/nft_flowtable.sh
also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:11:26 +0000 (12:11 +0100)]
Merge tag 'mlx5-fixes-2021-10-20' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2021-10-20
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:10:29 +0000 (12:10 +0100)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-20
This series contains updates to e1000e, igc, and ice drivers.
Sasha fixes an issue with dropped packets on Tiger Lake platforms for
e1000e and corrects a device ID for igc.
Tony adds missing E810 device IDs for ice.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Anitha Chrisanthus [Mon, 7 Jun 2021 21:17:11 +0000 (14:17 -0700)]
drm/kmb: Enable ADV bridge after modeset
On KMB, ADV bridge must be programmed and powered on prior to
MIPI DSI HW initialization.
v2: changed to atomic_bridge_chain_enable (Sam)
Fixes:
98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211019230719.789958-1-anitha.chrisanthus@intel.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Mon, 19 Jul 2021 23:28:51 +0000 (16:28 -0700)]
drm/kmb: Corrected typo in handle_lcd_irq
Check for Overflow bits for layer3 in the irq handler.
Fixes:
7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-5-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Edmund Dea [Wed, 6 Oct 2021 23:03:48 +0000 (16:03 -0700)]
drm/kmb: Disable change of plane parameters
Due to HW limitations, KMB cannot change height, width, or
pixel format after initial plane configuration.
v2: removed memset disp_cfg as it is already zero.
Fixes:
7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-4-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Edmund Dea [Tue, 20 Apr 2021 22:31:53 +0000 (15:31 -0700)]
drm/kmb: Remove clearing DPHY regs
Don't clear the shared DPHY registers common to MIPI Rx and MIPI Tx during
DSI initialization since this was causing MIPI Rx reset. Rest of the
writes are bitwise, so will not affect Mipi Rx side.
Fixes:
98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-3-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Fri, 8 Jan 2021 22:34:13 +0000 (14:34 -0800)]
drm/kmb: Limit supported mode to 1080p
KMB only supports single resolution(1080p), this commit checks for
1920x1080x60 or 1920x1080x59 in crtc_mode_valid.
Also, modes with vfp < 4 are not supported in KMB display. This change
prunes display modes with vfp < 4.
v2: added vfp check
Fixes:
7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link:https://patchwork.freedesktop.org/patch/msgid/
20211013233632.471892-2-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Tue, 15 Dec 2020 19:13:09 +0000 (11:13 -0800)]
drm/kmb: Work around for higher system clock
Use a different value for system clock offset in the
ppl/llp ratio calculations for clocks higher than 500 Mhz.
Fixes:
98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-1-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Dan Johansen [Wed, 18 Aug 2021 21:48:18 +0000 (23:48 +0200)]
drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
This adjusts sync values according to the datasheet
Fixes:
1c243751c095 ("drm/panel: ilitek-ili9881c: add support for Feixin K101-IM2BYL02 panel")
Co-developed-by: Marius Gripsgard <marius@ubports.com>
Signed-off-by: Dan Johansen <strit@manjaro.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210818214818.298089-1-strit@manjaro.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Marek Vasut [Sat, 16 Oct 2021 21:04:46 +0000 (23:04 +0200)]
drm: mxsfb: Fix NULL pointer dereference crash on unload
The mxsfb->crtc.funcs may already be NULL when unloading the driver,
in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
mxsfb_unload() leads to NULL pointer dereference.
Since all we care about is masking the IRQ and mxsfb->base is still
valid, just use that to clear and mask the IRQ.
Fixes:
ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Daniel Abrecht <public@danielabrecht.ch>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211016210446.171616-1-marex@denx.de
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: clean up error exits in fuse_fill_super()
Instead of "goto err", return error directly, since there's no error
cleanup to do now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: always initialize sb->s_fs_info
Syzkaller reports a null pointer dereference in fuse_test_super() that is
caused by sb->s_fs_info being NULL.
This is due to the fact that fuse_fill_super() is initializing s_fs_info,
which is too late, it's already on the fs_supers list. The initialization
needs to be done in sget_fc() with the sb_lock held.
Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
fuse_get_tree().
After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
hence fuse_mount_destroy() can drop the test for non-NULL "fm".
Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
Fixes:
5d5b74aa9c76 ("fuse: allow sharing existing sb")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: clean up fuse_mount destruction
1. call fuse_mount_destroy() for open coded variants
2. before deactivate_locked_super() don't need fuse_mount destruction since
that will now be done (if ->s_fs_info is not cleared)
3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
regular pattern can be used
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:38 +0000 (10:01 +0200)]
fuse: get rid of fuse_put_super()
The ->put_super callback is called from generic_shutdown_super() in case of
a fully initialized sb. This is called from kill_***_super(), which is
called from ->kill_sb instances.
Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
reference to the fuse_conn, while it does the same on each error case
during sb setup.
This patch moves the destruction from fuse_put_super() to
fuse_mount_destroy(), called at the end of all ->kill_sb instances. A
follup patch will clean up the error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:38 +0000 (10:01 +0200)]
fuse: check s_root when destroying sb
Checking "fm" works because currently sb->s_fs_info is cleared on error
paths; however, sb->s_root is what generic_shutdown_super() checks to
determine whether the sb was fully initialized or not.
This change will allow cleanup of sb setup error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Paolo Bonzini [Wed, 20 Oct 2021 10:22:59 +0000 (06:22 -0400)]
KVM: nVMX: promptly process interrupts delivered while in guest mode
Since commit
c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with
more precise fix") there is no longer the certainty that check_nested_events()
tries to inject an external interrupt vmexit to L1 on every call to vcpu_enter_guest.
Therefore, even in that case we need to set KVM_REQ_EVENT. This ensures
that inject_pending_event() is called, and from there kvm_check_nested_events().
Fixes:
c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with more precise fix")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 20 Oct 2021 10:27:36 +0000 (06:27 -0400)]
KVM: x86: check for interrupts before deciding whether to exit the fast path
The kvm_x86_sync_pir_to_irr callback can sometimes set KVM_REQ_EVENT.
If that happens exactly at the time that an exit is handled as
EXIT_FASTPATH_REENTER_GUEST, vcpu_enter_guest will go incorrectly
through the loop that calls kvm_x86_run, instead of processing
the request promptly.
Fixes:
379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ian Kent [Thu, 23 Sep 2021 07:13:39 +0000 (15:13 +0800)]
autofs: fix wait name hash calculation in autofs_wait()
There's a mistake in commit
2be7828c9fefc ("get rid of autofs_getpath()")
that affects kernels from v5.13.0, basically missed because of me not
fully testing the change for Al.
The problem is that the hash calculation for the wait name qstr hasn't
been updated to account for the change to use dentry_path_raw(). This
prevents the correct matching an existing wait resulting in multiple
notifications being sent to the daemon for the same mount which must
not occur.
The problem wasn't discovered earlier because it only occurs when
multiple processes trigger a request for the same mount concurrently
so it only shows up in more aggressive testing.
Fixes:
2be7828c9fefc ("get rid of autofs_getpath()")
Cc: stable@vger.kernel.org
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Wed, 20 Oct 2021 20:23:05 +0000 (10:23 -1000)]
Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two important filesystem fixes, marked for stable.
The blocklisted superblocks issue was particularly annoying because
for unexperienced users it essentially exacted a reboot to establish a
new functional mount in that scenario"
* tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client:
ceph: fix handling of "meta" errors
ceph: skip existing superblocks that are blocklisted or shut down when mounting
Linus Torvalds [Wed, 20 Oct 2021 20:16:51 +0000 (10:16 -1000)]
Merge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz)
- fix a kerneldoc warning (Logan Gunthorpe)
* tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC
dma-debug: fix sg checks in debug_dma_map_sg()
dma-mapping: fix the kerneldoc for dma_map_sgtable()
Emeel Hakim [Mon, 18 Oct 2021 12:31:19 +0000 (15:31 +0300)]
net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet
segment (eseg) in case of IPsec crypto offload datapath are not aligned
with PRM/HW expectations.
Currently the driver always sets the l3_inner_csum flag in case of IPsec
because of the wrong usage of skb->encapsulation as indicator for inner
IPsec header since skb->encapsulation is always ON for IPsec packets
since IPsec itself is an encapsulation protocol. The above forced a
failing attempts of calculating csum of non-existing segments (like in
the IP|ESP|TCP packet case which does not have an l3_inner) which led
to lots of packet drops hence the low throughput.
Fix by using xo->inner_ipproto as indicator for inner IPsec header
instead of skb->encapsulation in addition to setting the csum flags
as following:
* Tunnel Mode:
* Pkt: MAC IP ESP IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
*
* Transport Mode:
* Pkt: MAC IP ESP L4
* CSUM: l3_cs [ | l4_cs (checksum partial case)]
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* Pkt: MAC IP ESP UDP VXLAN IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
Fixes:
f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Emeel Hakim [Mon, 18 Oct 2021 12:30:09 +0000 (15:30 +0300)]
net/mlx5e: IPsec: Fix a misuse of the software parser's fields
IPsec crypto offload current Software Parser (SWP) fields settings in
the ethernet segment (eseg) are not aligned with PRM/HW expectations.
Among others in case of IP|ESP|TCP packet, current driver sets the
offsets for inner_l3 and inner_l4 although there is no inner l3/l4
headers relative to ESP header in such packets.
SWP provides the offsets for HW ,so it can be used to find csum fields
to offload the checksum, however these are not necessarily used by HW
and are used as fallback in case HW fails to parse the packet, e.g
when performing IPSec Transport Aware (IP | ESP | TCP) there is no
need to add SW parse on inner packet. So in some cases packets csum
was calculated correctly , whereas in other cases it failed. The later
faced csum errors (caused by wrong packet length calculations) which
led to lots of packet drops hence the low throughput.
Fix by setting the SWP fields as expected in a IP|ESP|TCP packet.
the following describe the expected SWP offsets:
* Tunnel Mode:
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP IP L4
*
* Transport Mode:
* SWP: OutL3 OutL4
* Pkt: MAC IP ESP L4
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP UDP VXLAN IP L4
Fixes:
f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Sat, 2 Oct 2021 08:15:35 +0000 (11:15 +0300)]
net/mlx5e: Fix vlan data lost during suspend flow
During suspend flow the driver calls mlx5e_destroy_vlan_table() which
does not only delete the vlans steering flow rules, but also frees the
data on currently active vlans, thus it is not restored during resume
flow.
This fix keeps the vlan data on suspend flow and frees it only on driver
remove flow.
Fixes:
6783f0a21a3c ("net/mlx5e: Dynamic alloc vlan table for netdev when needed")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>