Wolfram Sang [Fri, 11 Dec 2020 22:23:30 +0000 (23:23 +0100)]
Merge tag 'at24-fixes-for-v5.10' of git://git./linux/kernel/git/brgl/linux into i2c/for-current
at24 fixes for v5.10
- fix NVMEM name with custom AT24 device name
Linus Torvalds [Sun, 6 Dec 2020 22:25:12 +0000 (14:25 -0800)]
Linux 5.10-rc7
Linus Torvalds [Sun, 6 Dec 2020 19:48:17 +0000 (11:48 -0800)]
Merge tag 'char-misc-5.10-rc7' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes, and one "large" revert, for
5.10-rc7.
They include:
- revert mei patch from 5.10-rc1 that was using a reserved userspace
value. It will be resubmitted once the proper id has been assigned
by the virtio people.
- habanalabs fixes found by the fall-through audit from Gustavo
- speakup driver fixes for reported issues
- fpga config build fix for reported issue.
All of these except the revert have been in linux-next with no
reported issues. The revert is "clean" and just removes a
previously-added driver, so no real issue there"
* tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "mei: virtio: virtualization frontend driver"
fpga: Specify HAS_IOMEM dependency for FPGA_DFL
habanalabs: put devices before driver removal
habanalabs: free host huge va_range if not used
speakup: Reject setting the speakup line discipline outside of speakup
Linus Torvalds [Sun, 6 Dec 2020 19:43:50 +0000 (11:43 -0800)]
Merge tag 'tty-5.10-rc7' of git://git./linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty core fixes for 5.10-rc7.
They resolve some reported locking issues in the tty core. While they
have not been in a released linux-next yet, they have passed all of
the 0-day bot testing as well as the submitter's testing"
* tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix ->session locking
tty: Fix ->pgrp locking in tiocspgrp()
Linus Torvalds [Sun, 6 Dec 2020 19:38:36 +0000 (11:38 -0800)]
Merge tag 'usb-5.10-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.10-rc7 that resolve a number of
reported issues, and add some new device ids.
Nothing major here, but these solve some problems that people were
having with the 5.10-rc tree:
- reverts for USB storage dma settings that broke working devices
- thunderbolt use-after-free fix
- cdns3 driver fixes
- gadget driver userspace copy fix
- new device ids
All of these except for the reverts have been in linux-next with no
reported issues. The reverts are "clean" and were tested by Hans, as
well as passing the 0-day tests"
* tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: Use local copy of descriptors for userspace copy
usb: ohci-omap: Fix descriptor conversion
Revert "usb-storage: fix sdev->host->dma_dev"
Revert "uas: fix sdev->host->dma_dev"
Revert "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
USB: serial: kl5kusb105: fix memleak on open
USB: serial: ch341: sort device-id entries
USB: serial: ch341: add new Product ID for CH341A
USB: serial: option: fix Quectel BG96 matching
usb: cdns3: core: fix goto label for error path
usb: cdns3: gadget: clear trb->length as zero after preparing every trb
usb: cdns3: Fix hardware based role switch
USB: serial: option: add support for Thales Cinterion EXS82
USB: serial: option: add Fibocom NL668 variants
thunderbolt: Fix use-after-free in remove_unplugged_switch()
Linus Torvalds [Sun, 6 Dec 2020 19:22:39 +0000 (11:22 -0800)]
Merge tag 'x86-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data priorization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
Linus Torvalds [Sun, 6 Dec 2020 19:20:18 +0000 (11:20 -0800)]
Merge tag 'perf-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for performance monitoring on X86:
- Add recursion protection to another callchain invoked from
x86_pmu_stop() which can recurse back into x86_pmu_stop(). The
first attempt to fix this missed this extra code path.
- Use the already filtered status variable to check for PEBS counter
overflow bits and not the unfiltered full status read from
IA32_PERF_GLOBAL_STATUS which can have unrelated bits check which
would be evaluated incorrectly"
* tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Check PEBS status correctly
perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
Linus Torvalds [Sun, 6 Dec 2020 19:15:55 +0000 (11:15 -0800)]
Merge tag 'irq-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- Make multiqueue devices which use the managed interrupt affinity
infrastructure work on PowerPC/Pseries. PowerPC does not use the
generic infrastructure for setting up PCI/MSI interrupts and the
multiqueue changes failed to update the legacy PCI/MSI
infrastructure. Make this work by passing the affinity setup
information down to the mapping and allocation functions.
- Move Jason Cooper from MAINTAINERS to CREDITS as his mail is
bouncing and he's not reachable. We hope all is well with him and
say thanks for his work over the years"
* tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
powerpc/pseries: Pass MSI affinity to irq_create_mapping()
genirq/irqdomain: Add an irq_create_mapping_affinity() function
MAINTAINERS: Move Jason Cooper to CREDITS
Linus Torvalds [Sun, 6 Dec 2020 19:11:32 +0000 (11:11 -0800)]
Merge tag 'locking-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip
Pull intel_idle build fix from Thomas Gleixner:
"A tiny build fix for a recent change in the intel_idle driver which
missed a CONFIG dependency and broke the build for certain
configurations"
* tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Build fix
Linus Torvalds [Sun, 6 Dec 2020 18:31:39 +0000 (10:31 -0800)]
Merge tag 'kbuild-fixes-v5.10-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
preprocessor option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
* tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid split lines in .mod files
kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
kbuild: Hoist '--orphan-handling' into Kconfig
Kbuild: do not emit debug info for assembly with LLVM_IAS=1
kbuild: use -fmacro-prefix-map for .S sources
Makefile.extrawarn: move -Wcast-align to W=3
Linus Torvalds [Sun, 6 Dec 2020 18:20:59 +0000 (10:20 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: mm (memcg, zsmalloc, swap,
mailmap, selftests, pagecache, hugetlb, pagemap), lib, and coredump"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
mm/filemap: add static for function __add_to_page_cache_locked
userfaultfd: selftests: fix SIGSEGV if huge mmap fails
tools/testing/selftests/vm: fix build error
mailmap: add two more addresses of Uwe Kleine-König
mm/swapfile: do not sleep with a spin lock held
mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
mm: list_lru: set shrinker map bit when child nr_items is not zero
mm: memcg/slab: fix obj_cgroup_charge() return value handling
coredump: fix core_pattern parse error
zlib: export S390 symbols for zlib modules
Liu Zixian [Sun, 6 Dec 2020 06:15:15 +0000 (22:15 -0800)]
mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
On success, mmap should return the begin address of newly mapped area,
but patch "mm: mmap: merge vma after call_mmap() if possible" set
vm_start of newly merged vma to return value addr. Users of mmap will
get wrong address if vma is merged after call_mmap(). We fix this by
moving the assignment to addr before merging vma.
We have a driver which changes vm_flags, and this bug is found by our
testcases.
Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Sun, 6 Dec 2020 06:15:12 +0000 (22:15 -0800)]
hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
Adrian Moreno was ruuning a kubernetes 1.19 + containerd/docker workload
using hugetlbfs. In this environment the issue is reproduced by:
- Start a simple pod that uses the recently added HugePages medium
feature (pod yaml attached)
- Start a DPDK app. It doesn't need to run successfully (as in transfer
packets) nor interact with real hardware. It seems just initializing
the EAL layer (which handles hugepage reservation and locking) is
enough to trigger the issue
- Delete the Pod (or let it "Complete").
This would result in a kworker thread going into a tight loop (top output):
1425 root 20 0 0 0 0 R 99.7 0.0 5:22.45 kworker/28:7+cgroup_destroy
'perf top -g' reports:
- 63.28% 0.01% [kernel] [k] worker_thread
- 49.97% worker_thread
- 52.64% process_one_work
- 62.08% css_killed_work_fn
- hugetlb_cgroup_css_offline
41.52% _raw_spin_lock
- 2.82% _cond_resched
rcu_all_qs
2.66% PageHuge
- 0.57% schedule
- 0.57% __schedule
We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
infinitely spinning. Little else can be done on the system as the
cgroup_mutex can not be acquired.
Do note that the issue can be reproduced by simply offlining a hugetlb
cgroup containing pages with reservation counts.
The loop in hugetlb_cgroup_css_offline is moving page counts from the
cgroup being offlined to the parent cgroup. This is done for each
hstate, and is repeated until hugetlb_cgroup_have_usage returns false.
The routine moving counts (hugetlb_cgroup_move_parent) is only moving
'usage' counts. The routine hugetlb_cgroup_have_usage is checking for
both 'usage' and 'reservation' counts. Discussion about what to do with
reservation counts when reparenting was discussed here:
https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/
The decision was made to leave a zombie cgroup for with reservation
counts. Unfortunately, the code checking reservation counts was
incorrectly added to hugetlb_cgroup_have_usage.
To fix the issue, simply remove the check for reservation counts. While
fixing this issue, a related bug in hugetlb_cgroup_css_offline was
noticed. The hstate index is not reinitialized each time through the
do-while loop. Fix this as well.
Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Shi [Sun, 6 Dec 2020 06:15:09 +0000 (22:15 -0800)]
mm/filemap: add static for function __add_to_page_cache_locked
mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Link: https://lkml.kernel.org/r/1604661895-5495-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Rasmussen [Sun, 6 Dec 2020 06:15:05 +0000 (22:15 -0800)]
userfaultfd: selftests: fix SIGSEGV if huge mmap fails
The error handling in hugetlb_allocate_area() was incorrect for the
hugetlb_shared test case.
Previously the behavior was:
- mmap a hugetlb area
- If this fails, set the pointer to NULL, and carry on
- mmap an alias of the same hugetlb fd
- If this fails, munmap the original area
If the original mmap failed, it's likely the second one did too. If
both failed, we'd blindly try to munmap a NULL pointer, causing a
SIGSEGV. Instead, "goto fail" so we return before trying to mmap the
alias.
This issue can be hit "in real life" by forgetting to set
/proc/sys/vm/nr_hugepages (leaving it at 0), and then trying to run the
hugetlb_shared test.
Another small improvement is, when the original mmap fails, don't just
print "it failed": perror(), so we can see *why*. :)
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Link: https://lkml.kernel.org/r/20201204203443.2714693-1-axelrasmussen@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xingxing Su [Sun, 6 Dec 2020 06:15:02 +0000 (22:15 -0800)]
tools/testing/selftests/vm: fix build error
Only x86 and PowerPC implement the pkey-xxx.h, and an error was reported
when compiling protection_keys.c.
Add a Arch judgment to compile "protection_keys" in the Makefile.
If other arch implement this, add the arch name to the Makefile.
eg:
ifneq (,$(findstring $(ARCH),powerpc mips ... ))
Following build errors:
pkey-helpers.h:93:2: error: #error Architecture not supported
#error Architecture not supported
pkey-helpers.h:96:20: error: `PKEY_DISABLE_ACCESS' undeclared
#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
^
protection_keys.c:218:45: error: `PKEY_DISABLE_WRITE' undeclared
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
^
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Link: https://lkml.kernel.org/r/1606826876-30656-1-git-send-email-suxingxing@loongson.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Sun, 6 Dec 2020 06:14:58 +0000 (22:14 -0800)]
mailmap: add two more addresses of Uwe Kleine-König
This fixes attribution for the commits (among others)
-
d4097456cd1d ("video/framebuffer: move the probe func into
.devinit.text in Blackfin LCD driver")
-
0312e024d6cd ("mfd: mc13xxx: Add support for mc34708")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201127213358.3440830-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Sun, 6 Dec 2020 06:14:55 +0000 (22:14 -0800)]
mm/swapfile: do not sleep with a spin lock held
We can't call kvfree() with a spin lock held, so defer it. Fixes a
might_sleep() runtime warning.
Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
Signed-off-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Sun, 6 Dec 2020 06:14:51 +0000 (22:14 -0800)]
mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
While I was doing zram testing, I found sometimes decompression failed
since the compression buffer was corrupted. With investigation, I found
below commit calls cond_resched unconditionally so it could make a
problem in atomic context if the task is reschedule.
BUG: sleeping function called from invalid context at mm/vmalloc.c:108
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
3 locks held by memhog/946:
#0:
ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
#1:
ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
#2:
ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
CPU: 0 PID: 946 Comm: memhog Not tainted
5.9.3-00011-gc5bfc0287345-dirty #316
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
unmap_kernel_range_noflush+0x2eb/0x350
unmap_kernel_range+0x14/0x30
zs_unmap_object+0xd5/0xe0
zram_bvec_rw.isra.0+0x38c/0x8e0
zram_rw_page+0x90/0x101
bdev_write_page+0x92/0xe0
__swap_writepage+0x94/0x4a0
pageout+0xe3/0x3a0
shrink_page_list+0xb94/0xd60
shrink_inactive_list+0x158/0x460
We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
contains the offending calling code) from zsmalloc.
Even though this option showed some amount improvement(e.g., 30%) in
some arm32 platforms, it has been headache to maintain since it have
abused APIs[1](e.g., unmap_kernel_range in atomic context).
Since we are approaching to deprecate 32bit machines and already made
the config option available for only builtin build since v5.8, lastly it
has been not default option in zsmalloc, it's time to drop the option
for better maintenance.
[1] http://lore.kernel.org/linux-mm/
20201105170249.387069-1-minchan@kernel.org
Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Harish Sriram <harish@linux.ibm.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sun, 6 Dec 2020 06:14:48 +0000 (22:14 -0800)]
mm: list_lru: set shrinker map bit when child nr_items is not zero
When investigating a slab cache bloat problem, significant amount of
negative dentry cache was seen, but confusingly they neither got shrunk
by reclaimer (the host has very tight memory) nor be shrunk by dropping
cache. The vmcore shows there are over 14M negative dentry objects on
lru, but tracing result shows they were even not scanned at all.
Further investigation shows the memcg's vfs shrinker_map bit is not set.
So the reclaimer or dropping cache just skip calling vfs shrinker. So
we have to reboot the hosts to get the memory back.
I didn't manage to come up with a reproducer in test environment, and
the problem can't be reproduced after rebooting. But it seems there is
race between shrinker map bit clear and reparenting by code inspection.
The hypothesis is elaborated as below.
The memcg hierarchy on our production environment looks like:
root
/ \
system user
The main workloads are running under user slice's children, and it
creates and removes memcg frequently. So reparenting happens very often
under user slice, but no task is under user slice directly.
So with the frequent reparenting and tight memory pressure, the below
hypothetical race condition may happen:
CPU A CPU B
reparent
dst->nr_items == 0
shrinker:
total_objects == 0
add src->nr_items to dst
set_bit
return SHRINK_EMPTY
clear_bit
child memcg offline
replace child's kmemcg_id with
parent's (in memcg_offline_kmem())
list_lru_del() between shrinker runs
see parent's kmemcg_id
dec dst->nr_items
reparent again
dst->nr_items may go negative
due to concurrent list_lru_del()
The second run of shrinker:
read nr_items without any
synchronization, so it may
see intermediate negative
nr_items then total_objects
may return 0 coincidently
keep the bit cleared
dst->nr_items != 0
skip set_bit
add scr->nr_item to dst
After this point dst->nr_item may never go zero, so reparenting will not
set shrinker_map bit anymore. And since there is no task under user
slice directly, so no new object will be added to its lru to set the
shrinker map bit either. That bit is kept cleared forever.
How does list_lru_del() race with reparenting? It is because reparenting
replaces children's kmemcg_id to parent's without protecting from
nlru->lock, so list_lru_del() may see parent's kmemcg_id but actually
deleting items from child's lru, but dec'ing parent's nr_items, so the
parent's nr_items may go negative as commit
2788cf0c401c ("memcg:
reparent list_lrus and free kmemcg_id on css offline") says.
Since it is impossible that dst->nr_items goes negative and
src->nr_items goes zero at the same time, so it seems we could set the
shrinker map bit iff src->nr_items != 0. We could synchronize
list_lru_count_one() and reparenting with nlru->lock, but it seems
checking src->nr_items in reparenting is the simplest and avoids lock
contention.
Fixes: fae91d6d8be5 ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance")
Suggested-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.19]
Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Sun, 6 Dec 2020 06:14:45 +0000 (22:14 -0800)]
mm: memcg/slab: fix obj_cgroup_charge() return value handling
Commit
10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches
for all allocations") introduced a regression into the handling of the
obj_cgroup_charge() return value. If a non-zero value is returned
(indicating of exceeding one of memory.max limits), the allocation
should fail, instead of falling back to non-accounted mode.
To make the code more readable, move memcg_slab_pre_alloc_hook() and
memcg_slab_post_alloc_hook() calling conditions into bodies of these
hooks.
Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Menglong Dong [Sun, 6 Dec 2020 06:14:42 +0000 (22:14 -0800)]
coredump: fix core_pattern parse error
'format_corename()' will splite 'core_pattern' on spaces when it is in
pipe mode, and take helper_argv[0] as the path to usermode executable.
It works fine in most cases.
However, if there is a space between '|' and '/file/path', such as
'| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will
be parsed as '', and users will get a 'Core dump to | disabled'.
It is not friendly to users, as the pattern above was valid previously.
Fix this by ignoring the spaces between '|' and '/file/path'.
Fixes: 315c69261dd3 ("coredump: split pipe command whitespace before expanding template")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sun, 6 Dec 2020 06:14:38 +0000 (22:14 -0800)]
zlib: export S390 symbols for zlib modules
Fix build errors when ZLIB_INFLATE=m and ZLIB_DEFLATE=m and ZLIB_DFLTCC=y
by exporting the 2 needed symbols in dfltcc_inflate.c.
Fixes these build errors:
ERROR: modpost: "dfltcc_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
ERROR: modpost: "dfltcc_can_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
Fixes: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lkml.kernel.org/r/20201123191712.4882-1-rdunlap@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Thu, 3 Dec 2020 17:55:51 +0000 (02:55 +0900)]
kbuild: avoid split lines in .mod files
"xargs echo" is not a safe way to remove line breaks because the input
may exceed the command line limit and xargs may break it up into
multiple invocations of echo. This should never happen because
scripts/gen_autoksyms.sh expects all undefined symbols are placed in
the second line of .mod files.
One possible way is to replace "xargs echo" with
"sed ':x;N;$!bx;s/\n/ /g'" or something, but I rewrote the code by
using awk because it is more readable.
This issue was reported by Sami Tolvanen; in his Clang LTO patch set,
$(multi-used-m) is no longer an ELF object, but a thin archive that
contains LLVM bitcode files. llvm-nm prints out symbols for each
archive member separately, which results a lot of dupications, in some
places, beyond the system-defined limit.
This problem must be fixed irrespective of LTO, and we must ensure
zero possibility of having this issue.
Link: https://lkml.org/lkml/2020/12/1/1658
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Michael S. Tsirkin [Sat, 5 Dec 2020 19:38:46 +0000 (14:38 -0500)]
Revert "mei: virtio: virtualization frontend driver"
This reverts commit
d162219c655c8cf8003128a13840d6c1e183fb80.
The device uses a VIRTIO device ID out of a not-for-production range.
Releasing Linux using an ID out of this range will make it conflict with
development setups. An official request to reserve an ID for an MEI
device is yet to be submitted to the virtio TC, thus there's no chance
it will be reserved and fixed in time before the next release.
Once requested it usually takes 2-3 weeks to land in the spec, which
means the device can be supported with the official ID in the next Linux
version if contributors act quickly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Wang Yu <yu1.wang@intel.com>
Cc: Liu Shuo <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20201205193625.469773-1-mst@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masami Hiramatsu [Thu, 3 Dec 2020 04:51:01 +0000 (13:51 +0900)]
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper
check must be:
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 25189d08e516 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
Masami Hiramatsu [Thu, 3 Dec 2020 04:50:50 +0000 (13:50 +0900)]
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 32d0b95300db ("x86/insn-eval: Add utility functions to get segment selector")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
Masami Hiramatsu [Thu, 3 Dec 2020 04:50:37 +0000 (13:50 +0900)]
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes.
Introduce a for_each_insn_prefix() macro for this purpose. Debugged by
Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message, sync with the respective header in tools/
and drop "we". ]
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
Linus Torvalds [Sun, 6 Dec 2020 00:16:34 +0000 (16:16 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A fix for 'RETRIGEN' handling in Atmel touch controllers that was
causing lost interrupts on systems using edge-triggered interrupts, a
quirk for i8042 driver, and a couple more fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: atmel_mxt_ts - fix lost interrupts
Input: xpad - support Ardwiino Controllers
Input: i8042 - add ByteSpeed touchpad to noloop table
Input: i8042 - fix error return code in i8042_setup_aux()
Input: soc_button_array - add missing include
Linus Torvalds [Sat, 5 Dec 2020 23:27:22 +0000 (15:27 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some more I2C driver updates. IMX updates are a tad bigger, but not
exceptionally big"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mlxbf: Fix the return check of devm_ioremap and ioremap
i2c: mlxbf: select CONFIG_I2C_SLAVE
i2c: imx: Don't generate STOP condition if arbitration has been lost
i2c: imx: Check for I2SR_IAL after every byte
i2c: imx: Fix reset of I2SR_IAL flag
i2c: qcom: Fix IRQ error misassignement
i2c: qup: Fix error return code in qup_i2c_bam_schedule_desc()
Linus Torvalds [Sat, 5 Dec 2020 22:45:30 +0000 (14:45 -0800)]
Merge tag 'block-5.10-2020-12-05' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Single fix for an issue with chunk_sectors and stacked devices"
* tag 'block-5.10-2020-12-05' of git://git.kernel.dk/linux-block:
block: use gcd() to fix chunk_sectors limit stacking
Linus Torvalds [Sat, 5 Dec 2020 22:39:59 +0000 (14:39 -0800)]
Merge tag 'io_uring-5.10-2020-12-05' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a small fix this time, for an issue with 32-bit compat apps and
buffer selection with recvmsg"
* tag 'io_uring-5.10-2020-12-05' of git://git.kernel.dk/linux-block:
io_uring: fix recvmsg setup with compat buf-select
Linus Torvalds [Sat, 5 Dec 2020 19:16:21 +0000 (11:16 -0800)]
Merge tag 'powerpc-5.10-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.10:
- Three commits fixing possible missed TLB invalidations for
multi-threaded processes when CPUs are hotplugged in and out.
- A fix for a host crash triggerable by host userspace (qemu) in KVM
on Power9.
- A fix for a host crash in machine check handling when running HPT
guests on a HPT host.
- One commit fixing potential missed TLB invalidations when using the
hash MMU on Power9 or later.
- A regression fix for machines with CPUs on node 0 but no memory.
Thanks to Aneesh Kumar K.V, Cédric Le Goater, Greg Kurz, Milan
Mohanty, Milton Miller, Nicholas Piggin, Paul Mackerras, and Srikar
Dronamraju"
* tag 'powerpc-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE
KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check
powerpc/numa: Fix a regression on memoryless node 0
powerpc/64s: Trim offlined CPUs from mm_cpumasks
kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling
powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernels
powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generation
Linus Torvalds [Sat, 5 Dec 2020 19:09:07 +0000 (11:09 -0800)]
Merge tag '5.10-rc6-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three smb3 fixes (two for stable) fixing
- a null pointer issue in a DFS error path
- a problem with excessive padding when mounted with "idsfromsid"
causing owner fields to get corrupted
- a more recent problem with compounded reparse point query found in
testing to the Linux kernel server"
* tag '5.10-rc6-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: refactor create_sd_buf() and and avoid corrupting the buffer
cifs: add NULL check for ses->tcon_ipc
smb3: set COMPOUND_FID to FileID field of subsequent compound request
Linus Torvalds [Sat, 5 Dec 2020 18:59:21 +0000 (10:59 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes in two drivers.
The mpt3sas fixes are all problems with timeout under unusual
conditions, and the storvsc is a missed incoming packet validation
and a missed error return"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Increase IOCInit request timeout to 30s
scsi: mpt3sas: Fix ioctl timeout
scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()
scsi: storvsc: Fix error return in storvsc_probe()
Linus Torvalds [Sat, 5 Dec 2020 18:51:25 +0000 (10:51 -0800)]
Merge tag 'for-5.10/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull fix for device mapper fixes from Mike Snitzer:
"Apologies for the glaring bug I introduced with my previous pull
request!
Fix incorrect branching at top of blk_max_size_offset()"
* tag 'for-5.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
block: fix incorrect branching in blk_max_size_offset()
Wang Xiaojun [Thu, 3 Dec 2020 01:46:47 +0000 (20:46 -0500)]
i2c: mlxbf: Fix the return check of devm_ioremap and ioremap
devm_ioremap and ioremap may return NULL which cannot be checked
by IS_ERR.
Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Arnd Bergmann [Thu, 3 Dec 2020 22:32:50 +0000 (23:32 +0100)]
i2c: mlxbf: select CONFIG_I2C_SLAVE
If this is not enabled, the interfaces used in this driver do not work:
drivers/i2c/busses/i2c-mlxbf.c:1888:3: error: implicit declaration of function 'i2c_slave_event' [-Werror,-Wimplicit-function-declaration]
i2c_slave_event(slave, I2C_SLAVE_WRITE_REQUESTED, &value);
^
drivers/i2c/busses/i2c-mlxbf.c:1888:26: error: use of undeclared identifier 'I2C_SLAVE_WRITE_REQUESTED'
i2c_slave_event(slave, I2C_SLAVE_WRITE_REQUESTED, &value);
^
drivers/i2c/busses/i2c-mlxbf.c:1890:32: error: use of undeclared identifier 'I2C_SLAVE_WRITE_RECEIVED'
ret = i2c_slave_event(slave, I2C_SLAVE_WRITE_RECEIVED,
^
drivers/i2c/busses/i2c-mlxbf.c:1892:26: error: use of undeclared identifier 'I2C_SLAVE_STOP'
i2c_slave_event(slave, I2C_SLAVE_STOP, &value);
^
Fixes: b5b5b32081cd ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Mike Snitzer [Fri, 4 Dec 2020 22:21:03 +0000 (17:21 -0500)]
block: fix incorrect branching in blk_max_size_offset()
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that
override will be incorrectly ignored.
Old blk_max_size_offset() branching, prior to commit
3ee16db390b4,
must be used only if passed 'chunk_sectors' override is zero.
Fixes: 3ee16db390b4 ("dm: fix IO splitting")
Cc: stable@vger.kernel.org # 5.9
Reported-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Fri, 4 Dec 2020 21:28:39 +0000 (13:28 -0800)]
Merge tag 'for-5.10/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM's bio splitting changes that were made during v5.9. This
restores splitting in terms of varied per-target ti->max_io_len
rather than use block core's single stacked 'chunk_sectors' limit.
- Like DM crypt, update DM integrity to not use crypto drivers that
have CRYPTO_ALG_ALLOCATES_MEMORY set.
- Fix DM writecache target's argument parsing and status display.
- Remove needless BUG() from dm writecache's persistent_memory_claim()
- Remove old gcc workaround in DM cache target's block_div() for ARM
link errors now that gcc >= 4.9 is required.
- Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
- Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
dm_table_event().
- Remove invalid sparse annotations from dm_prepare_ioctl() and
dm_unprepare_ioctl().
* tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: remove invalid sparse __acquires and __releases annotations
dm: fix double RCU unlock in dm_dax_zero_page_range() error path
dm: fix IO splitting
dm writecache: remove BUG() and fail gracefully instead
dm table: Remove BUG_ON(in_interrupt())
dm: fix bug with RCU locking in dm_blk_report_zones
Revert "dm cache: fix arm link errors with inline"
dm writecache: fix the maximum number of arguments
dm writecache: advance the number of arguments when reporting max_age
dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
Mike Snitzer [Fri, 4 Dec 2020 20:25:18 +0000 (15:25 -0500)]
dm: remove invalid sparse __acquires and __releases annotations
Fixes sparse warnings:
drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit
Fixes: 971888c46993f ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Fri, 4 Dec 2020 20:19:27 +0000 (15:19 -0500)]
dm: fix double RCU unlock in dm_dax_zero_page_range() error path
Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error
path to fix sparse warning:
drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock
Fixes: cdf6cdcd3b99a ("dm,dax: Add dax zero_page_range operation")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Mon, 30 Nov 2020 15:57:43 +0000 (10:57 -0500)]
dm: fix IO splitting
Commit
882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
chunk_sectors must reflect the most limited of all devices in the
IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
.iterate_devices method no longer had there IO split properly.
And commit
5091cdec56fa ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).
Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.
Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 4 Dec 2020 17:25:22 +0000 (09:25 -0800)]
Merge tag 'drm-fixes-2020-12-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This week's regular fixes.
i915 has fixes for a few races, use-after-free, and gpu hangs. Tegra
just has some minor fixes that I didn't see much point in hanging on
to. The nouveau fix is for all pre-nv50 cards and was reported a few
times. Otherwise it's just some amdgpu, and a few misc fixes.
Summary:
amdgpu:
- SMU11 manual fan fix
- Renoir display clock fix
- VCN3 dynamic powergating fix
i915:
- Program mocs:63 for cache eviction on gen9 (Chris)
- Protect context lifetime with RCU (Chris)
- Split the breadcrumb spinlock between global and contexts (Chris)
- Retain default context state across shrinking (Venkata)
- Limit frequency drop to RPe on parking (Chris)
- Return earlier from intel_modeset_init() without display (Jani)
- Defer initial modeset until after GGTT is initialized (Chris)
nouveau:
- pre-nv50 regression fix
rockchip:
- uninitialised LVDS property fix
omap:
- bridge fix
panel:
- race fix
mxsfb:
- fence sync fix
- modifiers fix
tegra:
- idr init fix
- sor fixes
- output/of cleanup fix"
* tag 'drm-fixes-2020-12-04' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/amdgpu/vcn3.0: remove old DPG workaround
drm/amdgpu/vcn3.0: stall DPG when WPTR/RPTR reset
drm/amd/display: Init clock value by current vbios CLKs
drm/amdgpu/pm/smu11: Fix fan set speed bug
drm/i915/display: Defer initial modeset until after GGTT is initialised
drm/i915/display: return earlier from intel_modeset_init() without display
drm/i915/gt: Limit frequency drop to RPe on parking
drm/i915/gt: Retain default context state across shrinking
drm/i915/gt: Split the breadcrumb spinlock between global and contexts
drm/i915/gt: Protect context lifetime with RCU
drm/i915/gt: Program mocs:63 for cache eviction on gen9
drm/omap: sdi: fix bridge enable/disable
drm/panel: sony-acx565akm: Fix race condition in probe
drm/rockchip: Avoid uninitialized use of endpoint id in LVDS
drm/tegra: sor: Disable clocks on error in tegra_sor_init()
drm/nouveau: make sure ret is initialized in nouveau_ttm_io_mem_reserve
drm: mxsfb: Implement .format_mod_supported
drm: mxsfb: fix fence synchronization
drm/tegra: output: Do not put OF node twice
drm/tegra: replace idr_init() by idr_init_base()
...
Jann Horn [Thu, 3 Dec 2020 01:25:05 +0000 (02:25 +0100)]
tty: Fix ->session locking
Currently, locking of ->session is very inconsistent; most places
protect it using the legacy tty mutex, but disassociate_ctty(),
__do_SAK(), tiocspgrp() and tiocgsid() don't.
Two of the writers hold the ctrl_lock (because they already need it for
->pgrp), but __proc_set_tty() doesn't do that yet.
On a PREEMPT=y system, an unprivileged user can theoretically abuse
this broken locking to read 4 bytes of freed memory via TIOCGSID if
tiocgsid() is preempted long enough at the right point. (Other things
might also go wrong, especially if root-only ioctls are involved; I'm
not sure about that.)
Change the locking on ->session such that:
- tty_lock() is held by all writers: By making disassociate_ctty()
hold it. This should be fine because the same lock can already be
taken through the call to tty_vhangup_session().
The tricky part is that we need to shorten the area covered by
siglock to be able to take tty_lock() without ugly retry logic; as
far as I can tell, this should be fine, since nothing in the
signal_struct is touched in the `if (tty)` branch.
- ctrl_lock is held by all writers: By changing __proc_set_tty() to
hold the lock a little longer.
- All readers that aren't holding tty_lock() hold ctrl_lock: By
adding locking to tiocgsid() and __do_SAK(), and expanding the area
covered by ctrl_lock in tiocspgrp().
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn [Thu, 3 Dec 2020 01:25:04 +0000 (02:25 +0100)]
tty: Fix ->pgrp locking in tiocspgrp()
tiocspgrp() takes two tty_struct pointers: One to the tty that userspace
passed to ioctl() (`tty`) and one to the TTY being changed (`real_tty`).
These pointers are different when ioctl() is called with a master fd.
To properly lock real_tty->pgrp, we must take real_tty->ctrl_lock.
This bug makes it possible for racing ioctl(TIOCSPGRP, ...) calls on
both sides of a PTY pair to corrupt the refcount of `struct pid`,
leading to use-after-free errors.
Fixes: 47f86834bbd4 ("redo locking of tty->pgrp")
CC: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vamsi Krishna Samavedam [Mon, 30 Nov 2020 20:34:53 +0000 (12:34 -0800)]
usb: gadget: f_fs: Use local copy of descriptors for userspace copy
The function may be unbound causing the ffs_ep and its descriptors
to be freed while userspace is in the middle of an ioctl requesting
the same descriptors. Avoid dangling pointer reference by first
making a local copy of desctiptors before releasing the spinlock.
Fixes: c559a3534109 ("usb: gadget: f_fs: add ioctl returning ep descriptor")
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Vamsi Krishna Samavedam <vskrishn@codeaurora.org>
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201130203453.28154-1-jackp@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Walleij [Mon, 30 Nov 2020 08:30:33 +0000 (09:30 +0100)]
usb: ohci-omap: Fix descriptor conversion
There were a bunch of issues with the patch converting the
OMAP1 OSK board to use descriptors for controlling the USB
host:
- The chip label was incorrect
- The GPIO offset was off-by-one
- The code should use sleeping accessors
This patch tries to fix all issues at the same time.
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 15d157e87443 ("usb: ohci-omap: Convert to use GPIO descriptors")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20201130083033.29435-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 4 Dec 2020 15:01:23 +0000 (16:01 +0100)]
Revert "usb-storage: fix sdev->host->dma_dev"
This reverts commit
0154012f8018bba4d9971d1007c12ffd48539ddb as Hans
reports it causes problems on some systems. Until a "real" fix for this
can be found, revert this change to get normal functionality back.
Link: https://lore.kernel.org/r/70ca74c2-4a80-e25b-eca9-a63a75516673@redhat.com
Cc: Tom Yan <tom.ty89@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 4 Dec 2020 15:00:34 +0000 (16:00 +0100)]
Revert "uas: fix sdev->host->dma_dev"
This reverts commit
558033c2828f832ab3b68c6f8b8710e0de6faef0 as Hans
reports it causes problems on some systems. Until a "real" fix for this
can be found, revert this change to get normal functionality back.
Link: https://lore.kernel.org/r/70ca74c2-4a80-e25b-eca9-a63a75516673@redhat.com
Cc: Tom Yan <tom.ty89@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 4 Dec 2020 14:59:27 +0000 (15:59 +0100)]
Revert "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
This reverts commit
5df7ef7d32fec1d6d1c34dbec019b461a12ce870 as Hans
reports it causes problems on some systems. Until a "real" fix for this
can be found, revert this change to get normal functionality back.
Link: https://lore.kernel.org/r/70ca74c2-4a80-e25b-eca9-a63a75516673@redhat.com
Cc: Tom Yan <tom.ty89@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 4 Dec 2020 12:15:55 +0000 (13:15 +0100)]
Merge tag 'usb-serial-5.10-rc7' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.10-rc7
Here's a fix for a regression in the option driver which has been
backported to the stable trees and fix for a small memory leak on open
in the kl5kusb105 driver.
Included are also various new device ids.
All but the memleak fix has been in linux-next and with no reported
issues.
* tag 'usb-serial-5.10-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: kl5kusb105: fix memleak on open
USB: serial: ch341: sort device-id entries
USB: serial: ch341: add new Product ID for CH341A
USB: serial: option: fix Quectel BG96 matching
USB: serial: option: add support for Thales Cinterion EXS82
USB: serial: option: add Fibocom NL668 variants
Johan Hovold [Fri, 4 Dec 2020 08:55:19 +0000 (09:55 +0100)]
USB: serial: kl5kusb105: fix memleak on open
Fix memory leak of control-message transfer buffer on successful open().
Fixes: 6774d5f53271 ("USB: serial: kl5kusb105: fix open error path")
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Diego Santa Cruz [Thu, 3 Dec 2020 21:47:03 +0000 (22:47 +0100)]
misc: eeprom: at24: fix NVMEM name with custom AT24 device name
When the "label" property is set on the AT24 EEPROM the NVMEM devid is
set to NVMEM_DEVID_NONE, but it is not effective since there is a
leftover line setting it back to NVMEM_DEVID_AUTO a few lines after.
Fixes: 61f764c307f6 ("eeprom: at24: Support custom device names for AT24 EEPROMs")
Signed-off-by: Diego Santa Cruz <Diego.SantaCruz@spinetix.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Dave Airlie [Fri, 4 Dec 2020 01:53:43 +0000 (11:53 +1000)]
Merge tag 'drm-misc-fixes-2020-12-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One bridge fix for OMAP, one for a race condition in a panel, two for
uninitialized variables in rockchip and nouveau, and two fixes for mxsfb
to fix a regression with modifiers and a fix for a fence synchronization
issue.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20201203125943.h2ft2xoywunt5orl@gilmour
Dave Airlie [Fri, 4 Dec 2020 01:48:37 +0000 (11:48 +1000)]
Merge tag 'amd-drm-fixes-5.10-2020-12-02' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.10-2020-12-02:
amdgpu:
- SMU11 manual fan fix
- Renoir display clock fix
- VCN3 dynamic powergating fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201203044815.41257-1-alexander.deucher@amd.com
Dave Airlie [Fri, 4 Dec 2020 01:45:37 +0000 (11:45 +1000)]
Merge tag 'drm-intel-fixes-2020-12-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Fixes for GPU hang, null dereference, suspend-resume, power consumption, and use-after-free.
- Program mocs:63 for cache eviction on gen9 (Chris)
- Protect context lifetime with RCU (Chris)
- Split the breadcrumb spinlock between global and contexts (Chris)
- Retain default context state across shrinking (Venkata)
- Limit frequency drop to RPe on parking (Chris)
- Return earlier from intel_modeset_init() without display (Jani)
- Defer initial modeset until after GGTT is initialized (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201203134705.GA1575873@intel.com
Ronnie Sahlberg [Mon, 30 Nov 2020 01:29:20 +0000 (11:29 +1000)]
cifs: refactor create_sd_buf() and and avoid corrupting the buffer
When mounting with "idsfromsid" mount option, Azure
corrupted the owner SIDs due to excessive padding
caused by placing the owner fields at the end of the
security descriptor on create. Placing owners at the
front of the security descriptor (rather than the end)
is also safer, as the number of ACEs (that follow it)
are variable.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Suggested-by: Rohith Surabattula <rohiths@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.8
Signed-off-by: Steve French <stfrench@microsoft.com>
Aurelien Aptel [Thu, 3 Dec 2020 18:46:08 +0000 (19:46 +0100)]
cifs: add NULL check for ses->tcon_ipc
In some scenarios (DFS and BAD_NETWORK_NAME) set_root_set() can be
called with a NULL ses->tcon_ipc.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Thu, 3 Dec 2020 03:31:36 +0000 (12:31 +0900)]
smb3: set COMPOUND_FID to FileID field of subsequent compound request
For an operation compounded with an SMB2 CREATE request, client must set
COMPOUND_FID(0xFFFFFFFFFFFFFFFF) to FileID field of smb2 ioctl.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Fixes: 2e4564b31b645 ("smb3: add support stat of WSL reparse points for special file types")
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 3 Dec 2020 21:10:11 +0000 (13:10 -0800)]
Merge tag 'net-5.10-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc7, including fixes from bpf, netfilter,
wireless drivers, wireless mesh and can.
Current release - regressions:
- mt76: usb: fix crash on device removal
Current release - always broken:
- xsk: Fix umem cleanup from wrong context in socket destruct
Previous release - regressions:
- net: ip6_gre: set dev->hard_header_len when using header_ops
- ipv4: Fix TOS mask in inet_rtm_getroute()
- net, xsk: Avoid taking multiple skbuff references
Previous release - always broken:
- net/x25: prevent a couple of overflows
- netfilter: ipset: prevent uninit-value in hash_ip6_add
- geneve: pull IP header before ECN decapsulation
- mpls: ensure LSE is pullable in TC and openvswitch paths
- vxlan: respect needed_headroom of lower device
- batman-adv: Consider fragmentation for needed packet headroom
- can: drivers: don't count arbitration loss as an error
- netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal
- inet_ecn: Fix endianness of checksum update when setting ECT(1)
- ibmvnic: fix various corner cases around reset handling
- net/mlx5: fix rejecting unsupported Connect-X6DX SW steering
- net/mlx5: Enforce HW TX csum offload with kTLS"
* tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering
net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS
net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled
net/mlx5: Fix wrong address reclaim when command interface is down
net/sched: act_mpls: ensure LSE is pullable before reading it
net: openvswitch: ensure LSE is pullable before reading it
net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
net: mvpp2: Fix error return code in mvpp2_open()
chelsio/chtls: fix a double free in chtls_setkey()
rtw88: debug: Fix uninitialized memory in debugfs code
vxlan: fix error return code in __vxlan_dev_create()
net: pasemi: fix error return code in pasemi_mac_open()
cxgb3: fix error return code in t3_sge_alloc_qset()
net/x25: prevent a couple of overflows
dpaa_eth: copy timestamp fields to new skb in A-050385 workaround
net: ip6_gre: set dev->hard_header_len when using header_ops
mt76: usb: fix crash on device removal
iwlwifi: pcie: add some missing entries for AX210
iwlwifi: pcie: invert values of NO_160 device config entries
iwlwifi: pcie: add one missing entry for AX210
...
Linus Torvalds [Thu, 3 Dec 2020 19:58:26 +0000 (11:58 -0800)]
Merge tag 's390-5.10-6' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
"One commit is fixing lockdep irq state tracing which broke with -rc6.
The other one fixes logical vs physical CPU address mixup in our PCI
code.
Summary:
- fix lockdep irq state tracing
- fix logical vs physical CPU address confusion in PCI code"
* tag 's390-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix irq state tracing
s390/pci: fix CPU address in MSI for directed IRQ
Linus Torvalds [Thu, 3 Dec 2020 19:47:13 +0000 (11:47 -0800)]
Merge tag '9p-for-5.10-rc7' of git://github.com/martinetd/linux
Pull 9p fixes from Dominique Martinet:
"Restore splice functionality for 9p"
* tag '9p-for-5.10-rc7' of git://github.com/martinetd/linux:
fs: 9p: add generic splice_write file operation
fs: 9p: add generic splice_read file operations
Jakub Kicinski [Thu, 3 Dec 2020 19:18:37 +0000 (11:18 -0800)]
Merge branch 'mlx5-fixes-2020-12-01'
Saeed Mahameed says:
====================
mlx5 fixes 2020-12-01
This series introduces some fixes to mlx5 driver.
====================
Link: https://lore.kernel.org/r/20201203043946.235385-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 3 Dec 2020 04:39:46 +0000 (20:39 -0800)]
net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering
STEs format for Connect-X5 and Connect-X6DX different. Currently, on
Connext-X6DX the SW steering would break at some point when building STEs
w/o giving a proper error message. Fix this by checking the STE format of
the current device when initializing domain: add mlx5_ifc definitions for
Connect-X6DX SW steering, read FW capability to get the current format
version, and check this version when domain is being created.
Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan [Thu, 3 Dec 2020 04:39:45 +0000 (20:39 -0800)]
net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS
Checksum calculation cannot be done in SW for TX kTLS HW offloaded
packets.
Offload it to the device, disregard the declared state of the TX
csum offload feature.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Thu, 3 Dec 2020 04:39:44 +0000 (20:39 -0800)]
net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled
Fix build when CONFIG_IPV6 is not enabled by making a function
be built conditionally.
Fixes these build errors and warnings:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c: In function 'accel_fs_tcp_set_ipv6_flow':
../include/net/sock.h:380:34: error: 'struct sock_common' has no member named 'skc_v6_daddr'; did you mean 'skc_daddr'?
380 | #define sk_v6_daddr __sk_common.skc_v6_daddr
| ^~~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:55:14: note: in expansion of macro 'sk_v6_daddr'
55 | &sk->sk_v6_daddr, 16);
| ^~~~~~~~~~~
At top level:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:47:13: warning: 'accel_fs_tcp_set_ipv6_flow' defined but not used [-Wunused-function]
47 | static void accel_fs_tcp_set_ipv6_flow(struct mlx5_flow_spec *spec, struct sock *sk)
Fixes: 5229a96e59ec ("net/mlx5e: Accel, Expose flow steering API for rules add/del")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eran Ben Elisha [Thu, 3 Dec 2020 04:39:43 +0000 (20:39 -0800)]
net/mlx5: Fix wrong address reclaim when command interface is down
When command interface is down, driver to reclaim all 4K page chucks that
were hold by the Firmeware. Fix a bug for 64K page size systems, where
driver repeatedly released only the first chunk of the page.
Define helper function to fill 4K chunks for a given Firmware pages.
Iterate over all unreleased Firmware pages and call the hepler per each.
Fixes: 5adff6a08862 ("net/mlx5: Fix incorrect page count when in internal error")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Thu, 3 Dec 2020 09:37:52 +0000 (10:37 +0100)]
net/sched: act_mpls: ensure LSE is pullable before reading it
when 'act_mpls' is used to mangle the LSE, the current value is read from
the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is
contained in the skb "linear" area.
Found by code inspection.
v2:
- use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Thu, 3 Dec 2020 09:46:06 +0000 (10:46 +0100)]
net: openvswitch: ensure LSE is pullable before reading it
when openvswitch is configured to mangle the LSE, the current value is
read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that
the label is contained in the skb "linear" area.
Found by code inspection.
Fixes: d27cf5c59a12 ("net: core: add MPLS update core helper and use in OvS")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Thu, 3 Dec 2020 09:58:21 +0000 (10:58 +0100)]
net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
skb_mpls_dec_ttl() reads the LSE without ensuring that it is contained in
the skb "linear" area. Fix this calling pskb_may_pull() before reading the
current ttl.
Found by code inspection.
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/53659f28be8bc336c113b5254dc637cc76bbae91.1606987074.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Dec 2020 18:59:28 +0000 (10:59 -0800)]
Merge tag 'wireless-drivers-2020-12-03' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.10
Second, and most likely final, set of fixes for v5.10. Small fixes and
PCI id addtions.
iwlwifi
* PCI id additions
mt76
* fix a kernel crash during device removal
rtw88
* fix uninitialized memory in debugfs code
* tag 'wireless-drivers-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
rtw88: debug: Fix uninitialized memory in debugfs code
mt76: usb: fix crash on device removal
iwlwifi: pcie: add some missing entries for AX210
iwlwifi: pcie: invert values of NO_160 device config entries
iwlwifi: pcie: add one missing entry for AX210
iwlwifi: update MAINTAINERS entry
====================
Link: https://lore.kernel.org/r/20201203183408.EE88AC43461@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Hai [Thu, 3 Dec 2020 14:18:06 +0000 (22:18 +0800)]
net: mvpp2: Fix error return code in mvpp2_open()
Fix to return negative error code -ENOENT from invalid configuration
error handling case instead of 0, as done elsewhere in this function.
Fixes: 4bb043262878 ("net: mvpp2: phylink support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201203141806.37966-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Thu, 3 Dec 2020 08:44:31 +0000 (11:44 +0300)]
chelsio/chtls: fix a double free in chtls_setkey()
The "skb" is freed by the transmit code in cxgb4_ofld_send() and we
shouldn't use it again. But in the current code, if we hit an error
later on in the function then the clean up code will call kfree_skb(skb)
and so it causes a double free.
Set the "skb" to NULL and that makes the kfree_skb() a no-op.
Fixes: d25f2f71f653 ("crypto: chtls - Program the TLS session Key")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8ilb6PtBRLWiSHp@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Sandeen [Tue, 1 Dec 2020 23:21:40 +0000 (17:21 -0600)]
uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT
STATX_ATTR_MOUNT_ROOT and STATX_ATTR_DAX got merged with the same value,
so one of them needs fixing. Move STATX_ATTR_DAX.
While we're in here, clarify the value-matching scheme for some of the
attributes, and explain why the value for DAX does not match.
Fixes: 80340fe3605c ("statx: add mount_root")
Fixes: 712b2698e4c0 ("fs/stat: Define DAX statx attribute")
Link: https://lore.kernel.org/linux-fsdevel/7027520f-7c79-087e-1d00-743bdefa1a1e@redhat.com/
Link: https://lore.kernel.org/lkml/20201202214629.1563760-1-ira.weiny@intel.com/
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: <stable@vger.kernel.org> # 5.8
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Thu, 3 Dec 2020 08:41:42 +0000 (09:41 +0100)]
pwm: sl28cpld: fix getting driver data in pwm callbacks
Currently .get_state() and .apply() use dev_get_drvdata() on the struct
device related to the pwm chip. This only works after .probe() called
platform_set_drvdata() which in this driver happens only after
pwmchip_add() and so comes possibly too late.
Instead of setting the driver data earlier use the traditional
container_of approach as this way the driver data is conceptually and
computational nearer.
Fixes: 9db33d221efc ("pwm: Add support for sl28cpld PWM controller")
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Willy Tarreau [Mon, 30 Nov 2020 07:36:48 +0000 (08:36 +0100)]
lib/syscall: fix syscall registers retrieval on 32-bit platforms
Lilith >_> and Claudio Bozzato of Cisco Talos security team reported
that collect_syscall() improperly casts the syscall registers to 64-bit
values leaking the uninitialized last 24 bytes on 32-bit platforms, that
are visible in /proc/self/syscall.
The cause is that info->data.args are u64 while syscall_get_arguments()
uses longs, as hinted by the bogus pointer cast in the function.
Let's just proceed like the other call places, by retrieving the
registers into an array of longs before assigning them to the caller's
array. This was successfully tested on x86_64, i386 and ppc32.
Reference: CVE-2020-28588, TALOS-2020-1211
Fixes: 631b7abacd02 ("ptrace: Remove maxargs from task_current_syscall()")
Cc: Greg KH <greg@kroah.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (ppc32)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Travis [Thu, 3 Dec 2020 15:22:52 +0000 (09:22 -0600)]
x86/platform/uv: Fix UV4 hub revision adjustment
Currently, UV4 is incorrectly identified as UV4A and UV4A as UV5. Hub
chip starts with revision 1, fix it.
[ bp: Massage commit message. ]
Fixes: 647128f1536e ("x86/platform/uv: Update UV MMRs for UV5")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Link: https://lkml.kernel.org/r/20201203152252.371199-1-mike.travis@hpe.com
Dan Carpenter [Thu, 3 Dec 2020 08:43:37 +0000 (11:43 +0300)]
rtw88: debug: Fix uninitialized memory in debugfs code
This code does not ensure that the whole buffer is initialized and none
of the callers check for errors so potentially none of the buffer is
initialized. Add a memset to eliminate this bug.
Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/X8ilOfVz3pf0T5ec@mwanda
Johan Hovold [Thu, 3 Dec 2020 09:11:59 +0000 (10:11 +0100)]
USB: serial: ch341: sort device-id entries
Keep the device-id entries sorted to make it easier to add new ones in
the right spot.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Jan-Niklas Burfeind [Thu, 3 Dec 2020 03:03:59 +0000 (04:03 +0100)]
USB: serial: ch341: add new Product ID for CH341A
Add PID for CH340 that's found on a ch341 based Programmer made by keeyees.
The specific device that contains the serial converter is described
here: http://www.keeyees.com/a/Products/ej/36.html
The driver works flawlessly as soon as the new PID (0x5512) is added to
it.
Signed-off-by: Jan-Niklas Burfeind <kernel@aiyionpri.me>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Stephane Eranian [Thu, 26 Nov 2020 11:09:22 +0000 (20:09 +0900)]
perf/x86/intel: Check PEBS status correctly
The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests. However,
I see the comparison is done with the unfiltered p->status which is a
copy of IA32_PERF_GLOBAL_STATUS at the time of the sample. This
register contains more than the PEBS counter overflow bits. It also
includes many other bits which could also be set.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-2-namhyung@kernel.org
Namhyung Kim [Thu, 26 Nov 2020 11:09:21 +0000 (20:09 +0900)]
perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
The commit
3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this. It seems x86_pmu_stop can be
called recursively (like when it losts some samples) like below:
x86_pmu_stop
intel_pmu_disable_event (x86_pmu_disable)
intel_pmu_pebs_disable
intel_pmu_drain_pebs_nhm (x86_pmu_drain_pebs_buffer)
x86_pmu_stop
While commit
35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
warning for large PEBS") fixed it for the normal cases, there's
another path to call x86_pmu_stop() recursively when a PEBS error was
detected (like two or more counters overflowed at the same time).
Like in the Kan's previous fix, we can skip the interrupt accounting
for large PEBS, so check the iregs which is set for PMI only.
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-1-namhyung@kernel.org
Peter Zijlstra [Mon, 30 Nov 2020 11:54:34 +0000 (12:54 +0100)]
intel_idle: Build fix
Because CONFIG_ soup.
Fixes: 6e1d2bc675bd ("intel_idle: Fix intel_idle() vs tracing")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201130115402.GO3040@hirez.programming.kicks-ass.net
Boyuan Zhang [Tue, 19 May 2020 15:38:44 +0000 (11:38 -0400)]
drm/amdgpu/vcn3.0: remove old DPG workaround
Port from VCN2.5
SCRATCH2 is used to keep decode wptr as a workaround
which fix a hardware DPG decode wptr update bug for
vcn2.5 beforehand.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
Boyuan Zhang [Sun, 10 May 2020 19:47:03 +0000 (15:47 -0400)]
drm/amdgpu/vcn3.0: stall DPG when WPTR/RPTR reset
Port from VCN2.5
Add vcn dpg harware synchronization to fix race condition
issue between vcn driver and hardware.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
Brandon Syu [Thu, 12 Nov 2020 07:35:52 +0000 (15:35 +0800)]
drm/amd/display: Init clock value by current vbios CLKs
[Why]
While booting into OS, driver updates DPP/DISP CLKs.
But init clock value is zero which is invalid.
[How]
Get current clocks value to update init clocks.
To avoid underflow.
Signed-off-by: Brandon Syu <Brandon.Syu@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arunpravin [Fri, 27 Nov 2020 16:10:24 +0000 (21:40 +0530)]
drm/amdgpu/pm/smu11: Fix fan set speed bug
Fix fan set speed calculation.
Suggested-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Arunpravin <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Zhang Changzhong [Wed, 2 Dec 2020 09:58:42 +0000 (17:58 +0800)]
vxlan: fix error return code in __vxlan_dev_create()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1606903122-2098-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhang Changzhong [Wed, 2 Dec 2020 09:57:15 +0000 (17:57 +0800)]
net: pasemi: fix error return code in pasemi_mac_open()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 72b05b9940f0 ("pasemi_mac: RX/TX ring management cleanup")
Fixes: 8d636d8bc5ff ("pasemi_mac: jumbo frame support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1606903035-1838-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhang Changzhong [Wed, 2 Dec 2020 09:56:05 +0000 (17:56 +0800)]
cxgb3: fix error return code in t3_sge_alloc_qset()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: b1fb1f280d09 ("cxgb3 - Fix dma mapping error path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/1606902965-1646-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Thu, 3 Dec 2020 01:34:06 +0000 (11:34 +1000)]
Merge tag 'drm/tegra/for-5.10-rc7' of ssh://git.freedesktop.org/git/tegra/linux into drm-fixes
drm/tegra: Fixes for v5.10-rc7
This is a set of small fixes for various issues found during the last
couple of weeks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201127145324.125776-1-thierry.reding@gmail.com
Dan Carpenter [Tue, 1 Dec 2020 15:15:12 +0000 (18:15 +0300)]
net/x25: prevent a couple of overflows
The .x25_addr[] address comes from the user and is not necessarily
NUL terminated. This leads to a couple problems. The first problem is
that the strlen() in x25_bind() can read beyond the end of the buffer.
The second problem is more subtle and could result in memory corruption.
The call tree is:
x25_connect()
--> x25_write_internal()
--> x25_addr_aton()
The .x25_addr[] buffers are copied to the "addresses" buffer from
x25_write_internal() so it will lead to stack corruption.
Verify that the strings are NUL terminated and return -EINVAL if they
are not.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: a9288525d2ae ("X25: Dont let x25_bind use addresses containing characters")
Reported-by: "kiyin(尹亮)" <kiyin@tencent.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/X8ZeAKm8FnFpN//B@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 3 Dec 2020 01:25:23 +0000 (17:25 -0800)]
Merge tag 'gfs2-v5.10-rc5-fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Various gfs2 fixes"
* tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func
gfs2: Upgrade shared glocks for atime updates
gfs2: Don't freeze the file system during unmount
gfs2: check for empty rgrp tree in gfs2_ri_update
gfs2: set lockdep subclass for iopen glocks
gfs2: Fix deadlock dumping resource group glocks
Chris Wilson [Wed, 25 Nov 2020 19:30:32 +0000 (19:30 +0000)]
drm/i915/display: Defer initial modeset until after GGTT is initialised
Prior to sanitizing the GGTT, the only operations allowed in
intel_display_init_nogem() are those to reserve the preallocated (and
active) regions in the GGTT leftover from the BIOS. Trying to allocate a
GGTT vma (such as intel_pin_and_fence_fb_obj during the initial modeset)
may then conflict with other preallocated regions that have not yet been
protected.
Move the initial modesetting from the end of init_nogem to the beginning
of init so that any vma pinning (either framebuffers or DSB, for example),
is after the GGTT is ready to handle it.
This will prevent the DSB object from being destroyed too early:
[ 53.449241] BUG: KASAN: use-after-free in i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.449309] Read of size 8 at addr
ffff88811b1e8070 by task systemd-udevd/345
[ 53.449399] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G W 5.10.0-rc5+ #12
[ 53.449409] Call Trace:
[ 53.449418] dump_stack+0x9a/0xcc
[ 53.449558] ? i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.449565] print_address_description.constprop.0+0x3e/0x60
[ 53.449577] ? _raw_spin_lock_irqsave+0x4e/0x50
[ 53.449718] ? i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.449849] ? i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.449857] kasan_report.cold+0x1f/0x37
[ 53.449993] ? i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.450130] i915_init_ggtt+0x324/0x9e0 [i915]
[ 53.450273] ? i915_ggtt_suspend+0x1f0/0x1f0 [i915]
[ 53.450281] ? static_obj+0x69/0x80
[ 53.450289] ? lockdep_init_map_waits+0xa9/0x310
[ 53.450431] ? intel_wopcm_init+0x96/0x3d0 [i915]
[ 53.450581] ? i915_gem_init+0x75/0x2d0 [i915]
[ 53.450720] i915_gem_init+0x75/0x2d0 [i915]
[ 53.450852] i915_driver_probe+0x8c2/0x1210 [i915]
[ 53.450993] ? i915_pm_prepare+0x630/0x630 [i915]
[ 53.451006] ? check_chain_key+0x1e7/0x2e0
[ 53.451025] ? __pm_runtime_resume+0x58/0xb0
[ 53.451157] i915_pci_probe+0xa6/0x2b0 [i915]
[ 53.451285] ? i915_pci_remove+0x40/0x40 [i915]
[ 53.451295] ? lockdep_hardirqs_on_prepare+0x124/0x230
[ 53.451302] ? _raw_spin_unlock_irqrestore+0x42/0x50
[ 53.451309] ? lockdep_hardirqs_on+0xbf/0x130
[ 53.451315] ? preempt_count_sub+0xf/0xb0
[ 53.451321] ? _raw_spin_unlock_irqrestore+0x2f/0x50
[ 53.451335] pci_device_probe+0xf9/0x190
[ 53.451350] really_probe+0x17f/0x5b0
[ 53.451365] driver_probe_device+0x13a/0x1c0
[ 53.451376] device_driver_attach+0x82/0x90
[ 53.451386] ? device_driver_attach+0x90/0x90
[ 53.451391] __driver_attach+0xab/0x190
[ 53.451401] ? device_driver_attach+0x90/0x90
[ 53.451407] bus_for_each_dev+0xe4/0x140
[ 53.451414] ? subsys_dev_iter_exit+0x10/0x10
[ 53.451423] ? __list_add_valid+0x2b/0xa0
[ 53.451440] bus_add_driver+0x227/0x2e0
[ 53.451454] driver_register+0xd3/0x150
[ 53.451585] i915_init+0x92/0xac [i915]
[ 53.451592] ? 0xffffffffa0a20000
[ 53.451598] do_one_initcall+0xb6/0x3b0
[ 53.451606] ? trace_event_raw_event_initcall_finish+0x150/0x150
[ 53.451614] ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 53.451627] ? kmem_cache_alloc_trace+0x4a4/0x8e0
[ 53.451634] ? kasan_unpoison_shadow+0x33/0x40
[ 53.451649] do_init_module+0xf8/0x350
[ 53.451662] load_module+0x43de/0x47f0
[ 53.451716] ? module_frob_arch_sections+0x20/0x20
[ 53.451731] ? rw_verify_area+0x5f/0x130
[ 53.451780] ? __do_sys_finit_module+0x10d/0x1a0
[ 53.451785] __do_sys_finit_module+0x10d/0x1a0
[ 53.451792] ? __ia32_sys_init_module+0x40/0x40
[ 53.451800] ? seccomp_do_user_notification.isra.0+0x5c0/0x5c0
[ 53.451829] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 53.451835] ? mark_held_locks+0x24/0x90
[ 53.451856] do_syscall_64+0x33/0x80
[ 53.451863] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.451868] RIP: 0033:0x7fde09b4470d
[ 53.451875] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 53 f7 0c 00 f7 d8 64 89 01 48
[ 53.451880] RSP: 002b:
00007ffd6abc1718 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
[ 53.451890] RAX:
ffffffffffffffda RBX:
000056444e528150 RCX:
00007fde09b4470d
[ 53.451895] RDX:
0000000000000000 RSI:
00007fde09a21ded RDI:
000000000000000f
[ 53.451899] RBP:
0000000000020000 R08:
0000000000000000 R09:
0000000000000000
[ 53.451904] R10:
000000000000000f R11:
0000000000000246 R12:
00007fde09a21ded
[ 53.451909] R13:
0000000000000000 R14:
000056444e329200 R15:
000056444e528150
[ 53.451957] Allocated by task 345:
[ 53.451995] kasan_save_stack+0x1b/0x40
[ 53.452001] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 53.452006] kmem_cache_alloc+0x1cd/0x8d0
[ 53.452146] i915_vma_instance+0x126/0xb70 [i915]
[ 53.452304] i915_gem_object_ggtt_pin_ww+0x222/0x3f0 [i915]
[ 53.452446] intel_dsb_prepare+0x14f/0x230 [i915]
[ 53.452588] intel_atomic_commit+0x183/0x690 [i915]
[ 53.452730] intel_initial_commit+0x2bc/0x2f0 [i915]
[ 53.452871] intel_modeset_init_nogem+0xa02/0x2af0 [i915]
[ 53.452995] i915_driver_probe+0x8af/0x1210 [i915]
[ 53.453120] i915_pci_probe+0xa6/0x2b0 [i915]
[ 53.453125] pci_device_probe+0xf9/0x190
[ 53.453131] really_probe+0x17f/0x5b0
[ 53.453136] driver_probe_device+0x13a/0x1c0
[ 53.453142] device_driver_attach+0x82/0x90
[ 53.453148] __driver_attach+0xab/0x190
[ 53.453153] bus_for_each_dev+0xe4/0x140
[ 53.453158] bus_add_driver+0x227/0x2e0
[ 53.453164] driver_register+0xd3/0x150
[ 53.453286] i915_init+0x92/0xac [i915]
[ 53.453292] do_one_initcall+0xb6/0x3b0
[ 53.453297] do_init_module+0xf8/0x350
[ 53.453302] load_module+0x43de/0x47f0
[ 53.453307] __do_sys_finit_module+0x10d/0x1a0
[ 53.453312] do_syscall_64+0x33/0x80
[ 53.453318] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.453345] Freed by task 82:
[ 53.453379] kasan_save_stack+0x1b/0x40
[ 53.453384] kasan_set_track+0x1c/0x30
[ 53.453389] kasan_set_free_info+0x1b/0x30
[ 53.453394] __kasan_slab_free+0x112/0x160
[ 53.453399] kmem_cache_free+0xb2/0x3f0
[ 53.453536] i915_gem_flush_free_objects+0x31a/0x3b0 [i915]
[ 53.453542] process_one_work+0x519/0x9f0
[ 53.453547] worker_thread+0x75/0x5c0
[ 53.453552] kthread+0x1da/0x230
[ 53.453557] ret_from_fork+0x22/0x30
[ 53.453584] The buggy address belongs to the object at
ffff88811b1e8040
which belongs to the cache i915_vma of size 968
[ 53.453692] The buggy address is located 48 bytes inside of
968-byte region [
ffff88811b1e8040,
ffff88811b1e8408)
[ 53.453792] The buggy address belongs to the page:
[ 53.453842] page:
00000000b35f7048 refcount:1 mapcount:0 mapping:
0000000000000000 index:0xffff88811b1ef940 pfn:0x11b1e8
[ 53.453847] head:
00000000b35f7048 order:3 compound_mapcount:0 compound_pincount:0
[ 53.453853] flags: 0x8000000000010200(slab|head)
[ 53.453860] raw:
8000000000010200 ffff888115596248 ffff888115596248 ffff8881155b6340
[ 53.453866] raw:
ffff88811b1ef940 0000000000170001 00000001ffffffff 0000000000000000
[ 53.453870] page dumped because: kasan: bad access detected
[ 53.453895] Memory state around the buggy address:
[ 53.453944]
ffff88811b1e7f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 53.454011]
ffff88811b1e7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 53.454079] >
ffff88811b1e8000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[ 53.454146] ^
[ 53.454211]
ffff88811b1e8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 53.454279]
ffff88811b1e8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 53.454347] ==================================================================
[ 53.454414] Disabling lock debugging due to kernel taint
[ 53.454434] general protection fault, probably for non-canonical address 0xdead0000000000d0: 0000 [#1] PREEMPT SMP KASAN PTI
[ 53.454446] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G B W 5.10.0-rc5+ #12
[ 53.454592] RIP: 0010:i915_init_ggtt+0x26f/0x9e0 [i915]
[ 53.454602] Code: 89 8d 48 ff ff ff 4c 8d 60 d0 49 39 c7 0f 84 37 02 00 00 4c 89 b5 40 ff ff ff 4d 8d bc 24 90 00 00 00 4c 89 ff e8 c1 97 f8 e0 <49> 83 bc 24 90 00 00 00 00 0f 84 0f 02 00 00 49 8d 7c 24 08 e8 a8
[ 53.454618] RSP: 0018:
ffff88812247f430 EFLAGS:
00010286
[ 53.454625] RAX:
0000000000000000 RBX:
ffff888136440000 RCX:
ffffffffa03fb78f
[ 53.454633] RDX:
0000000000000000 RSI:
0000000000000008 RDI:
dead000000000160
[ 53.454641] RBP:
ffff88812247f500 R08:
ffffffff8113589f R09:
0000000000000000
[ 53.454648] R10:
ffffffff83063843 R11:
fffffbfff060c708 R12:
dead0000000000d0
[ 53.454656] R13:
ffff888136449ba0 R14:
0000000000002000 R15:
dead000000000160
[ 53.454664] FS:
00007fde095c4880(0000) GS:
ffff88840c880000(0000) knlGS:
0000000000000000
[ 53.454672] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 53.454679] CR2:
00007fef132b4f28 CR3:
000000012245c002 CR4:
00000000003706e0
[ 53.454686] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 53.454693] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 53.454700] Call Trace:
[ 53.454833] ? i915_ggtt_suspend+0x1f0/0x1f0 [i915]
Reported-by: Matthew Auld <matthew.auld@intel.com>
Fixes: afeda4f3b1c8 ("drm/i915/dsb: Pre allocate and late cleanup of cmd buffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201125193032.29282-1-chris@chris-wilson.co.uk
(cherry picked from commit
b3bf99daaee96a141536ce5c60a0d6dba6ec1d23)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Jani Nikula [Fri, 6 Nov 2020 22:55:27 +0000 (14:55 -0800)]
drm/i915/display: return earlier from intel_modeset_init() without display
!HAS_DISPLAY() implies !HAS_OVERLAY(), skipping overlay setup anyway, so
return earlier from intel_modeset_init() for clarity.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106225531.920641-4-lucas.demarchi@intel.com
(cherry picked from commit
71c8415d0daa78ef1295743d0e11ba0214d0a9b9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Tue, 24 Nov 2020 18:35:21 +0000 (18:35 +0000)]
drm/i915/gt: Limit frequency drop to RPe on parking
We treat idling the GT (intel_rps_park) as a downclock event, and reduce
the frequency we intend to restart the GT with. Since the two workloads
are likely related (e.g. a compositor rendering every 16ms), we want to
carry the frequency and load information from across the idling.
However, we do also need to update the frequencies so that workloads
that run for less than 1ms are autotuned by RPS (otherwise we leave
compositors running at max clocks, draining excess power). Conversely,
if we try to run too slowly, the next workload has to run longer. Since
there is a hysteresis in the power graph, below a certain frequency
running a short workload for longer consumes more energy than running it
slightly higher for less time. The exact balance point is unknown
beforehand, but measurements with 30fps media playback indicate that RPe
is a better choice.
Reported-by: Edward Baker <edward.baker@intel.com>
Tested-by: Edward Baker <edward.baker@intel.com>
Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk
(cherry picked from commit
f7ed83cc1925f0b8ce2515044d674354035c3af9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Venkata Ramana Nayana [Fri, 27 Nov 2020 12:07:16 +0000 (12:07 +0000)]
drm/i915/gt: Retain default context state across shrinking
As we use a shmemfs file to hold the context state, when not in use it
may be swapped out, such as across suspend. Since we wrote into the
shmemfs without marking the pages as dirty, the contents may be dropped
instead of being written back to swap. On re-using the shmemfs file,
such as creating a new context after resume, the contents of that file
were likely garbage and so the new context could then hang the GPU.
Simply mark the page as being written when copying into the shmemfs
file, and it the new contents will be retained across swapout.
Fixes: be1cb55a07bf ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.8+
Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com
(cherry picked from commit
a9d71f76ccfd309f3bd5f7c9b60e91a4decae792)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Thu, 26 Nov 2020 14:04:06 +0000 (14:04 +0000)]
drm/i915/gt: Split the breadcrumb spinlock between global and contexts
As we funnel more and more contexts into the breadcrumbs on an engine,
the hold time of b->irq_lock grows. As we may then contend with the
b->irq_lock during request submission, this increases the burden upon
the engine->active.lock and so directly impacts both our execution
latency and client latency. If we split the b->irq_lock by introducing a
per-context spinlock to manage the signalers within a context, we then
only need the b->irq_lock for enabling/disabling the interrupt and can
avoid taking the lock for walking the list of contexts within the signal
worker. Even with the current setup, this greatly reduces the number of
times we have to take and fight for b->irq_lock.
Furthermore, this closes the race between enabling the signaling context
while it is in the process of being signaled and removed:
<4>[ 416.208555] list_add corruption. prev->next should be next (
ffff8881951d5910), but was
dead000000000100. (prev=
ffff8882781bb870).
<4>[ 416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4>[ 416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[ 416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G U 5.8.0-CI-CI_DRM_8852+ #1
<4>[ 416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.
1905212112 05/21/2019
<4>[ 416.208627] RIP: 0010:__list_add_valid+0x4d/0x70
<4>[ 416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8
<4>[ 416.208633] RSP: 0018:
ffffc90000280e18 EFLAGS:
00010086
<4>[ 416.208636] RAX:
0000000000000000 RBX:
ffff888250a44880 RCX:
0000000000000105
<4>[ 416.208639] RDX:
0000000000000105 RSI:
ffffffff82320c5b RDI:
00000000ffffffff
<4>[ 416.208641] RBP:
ffff8882781bb870 R08:
0000000000000000 R09:
0000000000000001
<4>[ 416.208643] R10:
00000000054d2957 R11:
000000006abbd991 R12:
ffff8881951d58c8
<4>[ 416.208646] R13:
ffff888286073880 R14:
ffff888286073848 R15:
ffff8881951d5910
<4>[ 416.208669] FS:
0000000000000000(0000) GS:
ffff88829c180000(0000) knlGS:
0000000000000000
<4>[ 416.208671] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
<4>[ 416.208673] CR2:
0000556231326c48 CR3:
0000000005610001 CR4:
0000000000760ee0
<4>[ 416.208675] PKRU:
55555554
<4>[ 416.208677] Call Trace:
<4>[ 416.208679] <IRQ>
<4>[ 416.208751] i915_request_enable_breadcrumb+0x278/0x400 [i915]
<4>[ 416.208839] __i915_request_submit+0xca/0x2a0 [i915]
<4>[ 416.208892] __execlists_submission_tasklet+0x480/0x1830 [i915]
<4>[ 416.208942] execlists_submission_tasklet+0xc4/0x130 [i915]
<4>[ 416.208947] tasklet_action_common.isra.17+0x6c/0x1c0
<4>[ 416.208954] __do_softirq+0xdf/0x498
<4>[ 416.208960] ? handle_fasteoi_irq+0x150/0x150
<4>[ 416.208964] asm_call_on_stack+0xf/0x20
<4>[ 416.208966] </IRQ>
<4>[ 416.208969] do_softirq_own_stack+0xa1/0xc0
<4>[ 416.208972] irq_exit_rcu+0xb5/0xc0
<4>[ 416.208976] common_interrupt+0xf7/0x260
<4>[ 416.208980] asm_common_interrupt+0x1e/0x40
<4>[ 416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410
<4>[ 416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48
<4>[ 416.208989] RSP: 0018:
ffffc90000143e70 EFLAGS:
00000206
<4>[ 416.208991] RAX:
0000000000000007 RBX:
ffffe8ffffda8070 RCX:
0000000000000000
<4>[ 416.208993] RDX:
0000000000000000 RSI:
ffffffff8238b4ee RDI:
ffffffff8233184f
<4>[ 416.208995] RBP:
ffffffff826b4e00 R08:
0000000000000000 R09:
0000000000000000
<4>[ 416.208997] R10:
0000000000000001 R11:
0000000000000000 R12:
00000060e7f24a8f
<4>[ 416.208998] R13:
0000000000000003 R14:
0000000000000003 R15:
0000000000000003
<4>[ 416.209012] cpuidle_enter+0x24/0x40
<4>[ 416.209016] do_idle+0x22f/0x2d0
<4>[ 416.209022] cpu_startup_entry+0x14/0x20
<4>[ 416.209025] start_secondary+0x158/0x1a0
<4>[ 416.209030] secondary_startup_64+0xa4/0xb0
<4>[ 416.209039] irq event stamp:
10186977
<4>[ 416.209042] hardirqs last enabled at (
10186976): [<
ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0
<4>[ 416.209044] hardirqs last disabled at (
10186977): [<
ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50
<4>[ 416.209047] softirqs last enabled at (
10186968): [<
ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70
<4>[ 416.209049] softirqs last disabled at (
10186969): [<
ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20
<4>[ 416.209317] list_del corruption,
ffff8882781bb870->next is LIST_POISON1 (
dead000000000100)
<4>[ 416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
<4>[ 416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[ 416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G U W 5.8.0-CI-CI_DRM_8852+ #1
<4>[ 416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.
1905212112 05/21/2019
<4>[ 416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90
<4>[ 416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b
<4>[ 416.209317] RSP: 0018:
ffffc90000280de8 EFLAGS:
00010086
<4>[ 416.209317] RAX:
0000000000000000 RBX:
ffff8882781bb848 RCX:
0000000000010104
<4>[ 416.209317] RDX:
0000000000010104 RSI:
ffffffff8238b4ee RDI:
00000000ffffffff
<4>[ 416.209317] RBP:
ffff8882781bb880 R08:
0000000000000000 R09:
0000000000000001
<4>[ 416.209317] R10:
000000009fb6666e R11:
00000000feca9427 R12:
ffffc90000280e18
<4>[ 416.209317] R13:
ffff8881951d5930 R14:
dead0000000000d8 R15:
ffff8882781bb880
<4>[ 416.209317] FS:
0000000000000000(0000) GS:
ffff88829c180000(0000) knlGS:
0000000000000000
<4>[ 416.209317] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
<4>[ 416.209317] CR2:
0000556231326c48 CR3:
0000000005610001 CR4:
0000000000760ee0
<4>[ 416.209317] PKRU:
55555554
<4>[ 416.209317] Call Trace:
<4>[ 416.209317] <IRQ>
<4>[ 416.209317] remove_signaling_context.isra.13+0xd/0x70 [i915]
<4>[ 416.209513] signal_irq_work+0x1f7/0x4b0 [i915]
This is caused by virtual engines where although we take the breadcrumb
lock on each of the active engines, they may be different engines on
different requests, It turns out that the b->irq_lock was not a
sufficient proxy for the engine->active.lock in the case of more than
one request, so introduce an explicit lock around ce->signals.
v2: ce->signal_lock is acquired with only RCU protection and so must be
treated carefully and not cleared during reallocation. We also then need
to confirm that the ce we lock is the same as we found in the breadcrumb
list.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276
Fixes: c18636f76344 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs")
Fixes: 2854d866327a ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-4-chris@chris-wilson.co.uk
(cherry picked from commit
c744d50363b714783bbc88d986cc16def13710f7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Thu, 26 Nov 2020 14:04:05 +0000 (14:04 +0000)]
drm/i915/gt: Protect context lifetime with RCU
Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.
This is required for a later patch to handle locking around virtual
requests in the signaler, as those requests may want to move between
engines and be destroyed while we are holding b->irq_lock on a physical
engine.
v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we
don't need to reset the debug code, at the loss of having the mutex
debug code spot us attempting to destroy a locked mutex.
v3: As the intended use will remain strongly referenced counted, with
very little inflight access across reuse, drop the ctor.
v4: Drop the unrequired change to remove the temporary reference around
dropping the active context, and add back some more missing ctor
operations.
v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
later] maybe accessed under RCU and so needs special care not to be
reinitialised.
v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk
(cherry picked from commit
14d1eaf08845c534963c83f754afe0cb14cb2512)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>