platform/kernel/linux-starfive.git
3 years agochtls: Replace skb_dequeue with skb_peek
Ayush Sawal [Wed, 6 Jan 2021 04:29:10 +0000 (09:59 +0530)]
chtls: Replace skb_dequeue with skb_peek

The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agochtls: Avoid unnecessary freeing of oreq pointer
Ayush Sawal [Wed, 6 Jan 2021 04:29:09 +0000 (09:59 +0530)]
chtls: Avoid unnecessary freeing of oreq pointer

In chtls_pass_accept_request(), removing the chtls_reqsk_free()
call to avoid oreq freeing twice. Here oreq is the pointer to
struct request_sock.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agochtls: Fix panic when route to peer not configured
Ayush Sawal [Wed, 6 Jan 2021 04:29:08 +0000 (09:59 +0530)]
chtls: Fix panic when route to peer not configured

If route to peer is not configured, we might get non tls
devices from dst_neigh_lookup() which is invalid, adding a
check to avoid it.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agochtls: Remove invalid set_tcb call
Ayush Sawal [Wed, 6 Jan 2021 04:29:07 +0000 (09:59 +0530)]
chtls: Remove invalid set_tcb call

At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agochtls: Fix hardware tid leak
Ayush Sawal [Wed, 6 Jan 2021 04:29:06 +0000 (09:59 +0530)]
chtls: Fix hardware tid leak

send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 8 Jan 2021 00:03:19 +0000 (16:03 -0800)]
Merge tag 'gcc-plugins-v5.11-rc3' of git://git./linux/kernel/git/kees/linux

Pull gcc-plugins fix from Kees Cook:
 "Bump c++ standard version for latest GCC versions (Valdis Kletnieks)"

* tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: fix gcc 11 indigestion with plugins...

3 years agoKVM: SVM: Add support for booting APs in an SEV-ES guest
Tom Lendacky [Mon, 4 Jan 2021 20:20:01 +0000 (14:20 -0600)]
KVM: SVM: Add support for booting APs in an SEV-ES guest

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
Maxim Levitsky [Thu, 7 Jan 2021 09:38:51 +0000 (11:38 +0200)]
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit

It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
if the nested run was not pending during the migration.

In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
Maxim Levitsky [Thu, 7 Jan 2021 09:38:54 +0000 (11:38 +0200)]
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode

We overwrite most of vmcb fields while doing so, so we must
mark it as dirty.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: nSVM: correctly restore nested_run_pending on migration
Maxim Levitsky [Thu, 7 Jan 2021 09:38:52 +0000 (11:38 +0200)]
KVM: nSVM: correctly restore nested_run_pending on migration

The code to store it on the migration exists, but no code was restoring it.

One of the side effects of fixing this is that L1->L2 injected events
are no longer lost when migration happens with nested run pending.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Clarify TDP MMU page list invariants
Ben Gardon [Thu, 7 Jan 2021 00:19:35 +0000 (16:19 -0800)]
KVM: x86/mmu: Clarify TDP MMU page list invariants

The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain
pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any
pages with a non-zero root_count and tdp_mmu_roots should only contain
pages with a positive root_count, unless a thread holds the MMU lock and
is in the process of modifying the list. Various functions expect these
invariants to be maintained, but they are not explictily documented. Add
to the comments on both fields to document the above invariants.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Ensure TDP MMU roots are freed after yield
Ben Gardon [Thu, 7 Jan 2021 00:19:34 +0000 (16:19 -0800)]
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield

Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.

To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.

Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.

Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agokvm: check tlbs_dirty directly
Lai Jiangshan [Thu, 17 Dec 2020 15:41:18 +0000 (23:41 +0800)]
kvm: check tlbs_dirty directly

In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
        need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.

It means that tlbs_dirty is always used as int and the higher 32 bits
is useless.  We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.

Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice.  It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.

Cc: stable@vger.kernel.org
Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: change in pv_eoi_get_pending() to make code more readable
Stephen Zhang [Fri, 18 Dec 2020 07:51:37 +0000 (15:51 +0800)]
KVM: x86: change in pv_eoi_get_pending() to make code more readable

Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 7 Jan 2021 23:10:26 +0000 (15:10 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2021-01-07

We've added 4 non-merge commits during the last 10 day(s) which contain
a total of 4 files changed, 14 insertions(+), 7 deletions(-).

The main changes are:

1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong.

2) Fix resolve_btfids for multiple type hierarchies, from Jiri.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpftool: Fix compilation failure for net.o with older glibc
  tools/resolve_btfids: Warn when having multiple IDs for single type
  bpf: Fix a task_iter bug caused by a merge conflict resolution
  selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC
====================

Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMAINTAINERS: Really update email address for Sean Christopherson
Sean Christopherson [Wed, 6 Jan 2021 18:29:16 +0000 (10:29 -0800)]
MAINTAINERS: Really update email address for Sean Christopherson

Use my @google.com address in MAINTAINERS, somehow only the .mailmap
entry was added when the original update patch was applied.

Fixes: c2b1209d852f ("MAINTAINERS: Update email address for Sean Christopherson")
Cc: kvm@vger.kernel.org
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210106182916.331743-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86: fix shift out of bounds reported by UBSAN
Paolo Bonzini [Tue, 22 Dec 2020 10:20:43 +0000 (05:20 -0500)]
KVM: x86: fix shift out of bounds reported by UBSAN

Since we know that e >= s, we can reassociate the left shift,
changing the shifted number from 1 to 2 in exchange for
decreasing the right hand side by 1.

Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Implement perf_test_util more conventionally
Andrew Jones [Fri, 18 Dec 2020 14:17:34 +0000 (15:17 +0100)]
KVM: selftests: Implement perf_test_util more conventionally

It's not conventional C to put non-inline functions in header
files. Create a source file for the functions instead. Also
reduce the amount of globals and rename the functions to
something less generic.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201218141734.54359-4-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Use vm_create_with_vcpus in create_vm
Andrew Jones [Fri, 18 Dec 2020 14:17:33 +0000 (15:17 +0100)]
KVM: selftests: Use vm_create_with_vcpus in create_vm

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201218141734.54359-3-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: selftests: Factor out guest mode code
Andrew Jones [Fri, 18 Dec 2020 14:17:32 +0000 (15:17 +0100)]
KVM: selftests: Factor out guest mode code

demand_paging_test, dirty_log_test, and dirty_log_perf_test have
redundant guest mode code. Factor it out.

Also, while adding a new include, remove the ones we don't need.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201218141734.54359-2-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
Uros Bizjak [Sun, 20 Dec 2020 20:03:39 +0000 (21:03 +0100)]
KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c

Commit 16809ecdc1e8a moved __svm_vcpu_run the prototype to svm.h,
but forgot to remove the original from svm.c.

Fixes: 16809ecdc1e8a ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20201220200339.65115-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
Nathan Chancellor [Sat, 19 Dec 2020 06:37:11 +0000 (23:37 -0700)]
KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load

When using LLVM's integrated assembler (LLVM_IAS=1) while building
x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build
error occurs:

 $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
 arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
         asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
                      ^
 arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
                 ^
 ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
         "666: \n\t"                                                     \
                 ^
 <inline asm>:2:2: note: instantiated into assembly here
         vmsave
         ^
 1 error generated.

This happens because LLVM currently does not support calling vmsave
without the fixed register operand (%rax for 64-bit and %eax for
32-bit). This will be fixed in LLVM 12 but the kernel currently supports
LLVM 10.0.1 and newer so this needs to be handled.

Add the proper register using the _ASM_AX macro, which matches the
vmsave call in vmenter.S.

Fixes: 861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
Link: https://reviews.llvm.org/D93524
Link: https://github.com/ClangBuiltLinux/linux/issues/1216
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge branch 'kvm-master' into kvm-next
Paolo Bonzini [Thu, 7 Jan 2021 23:06:52 +0000 (18:06 -0500)]
Merge branch 'kvm-master' into kvm-next

Fixes to get_mmio_spte, destined to 5.10 stable branch.

3 years agoKVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
Sean Christopherson [Fri, 18 Dec 2020 00:31:39 +0000 (16:31 -0800)]
KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()

Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking
for reserved bits on valid, non-MMIO SPTEs.  The get_walk() helpers
terminate their walks if a not-present or MMIO SPTE is encountered, i.e.
the non-terminal SPTEs have already been verified to be regular SPTEs.
This eliminates an extra check-and-branch in a relatively hot loop.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
Sean Christopherson [Fri, 18 Dec 2020 00:31:38 +0000 (16:31 -0800)]
KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array

Bump the size of the sptes array by one and use the raw level of the
SPTE to index into the sptes array.  Using the SPTE level directly
improves readability by eliminating the need to reason out why the level
is being adjusted when indexing the array.  The array is on the stack
and is not explicitly initialized; bumping its size is nothing more than
a superficial adjustment to the stack frame.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-4-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
Sean Christopherson [Fri, 18 Dec 2020 00:31:37 +0000 (16:31 -0800)]
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE

Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root".  Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk().  This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.

Opportunistically nuke a few extra newlines.

Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoKVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
Sean Christopherson [Fri, 18 Dec 2020 00:31:36 +0000 (16:31 -0800)]
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()

Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR.  Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data).  In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.

Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.

Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoMerge branch 'net-fix-netfilter-defrag-ip-tunnel-pmtu-blackhole'
Jakub Kicinski [Thu, 7 Jan 2021 22:42:36 +0000 (14:42 -0800)]
Merge branch 'net-fix-netfilter-defrag-ip-tunnel-pmtu-blackhole'

Florian Westphal says:

====================
net: fix netfilter defrag/ip tunnel pmtu blackhole

Christian Perle reported a PMTU blackhole due to unexpected interaction
between the ip defragmentation that comes with connection tracking and
ip tunnels.

Unfortunately setting 'nopmtudisc' on the tunnel breaks the test
scenario even without netfilter.

Christinas setup looks like this:
     +--------+       +---------+       +--------+
     |Router A|-------|Wanrouter|-------|Router B|
     |        |.IPIP..|         |..IPIP.|        |
     +--------+       +---------+       +--------+
          /             mtu 1400           \
         /                                  \
 +--------+                                  +--------+
 |Client A|                                  |Client B|
 +--------+                                  +--------+

MTU is 1500 everywhere, except on Router A to Wanrouter and
Wanrouter to Router B.

Router A and Router B use IPIP tunnel interfaces to tunnel traffic
between Client A and Client B over WAN.

Client A sends a 1400 byte UDP datagram to Client B.
This packet gets encapsulated in the IPIP tunnel.

This works, packet is received on client B.

When conntrack (or anything else that forces ip defragmentation) is
enabled on Router A, the packet gets dropped on Router A after
encapsulation because they exceed the link MTU.

Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse,
no packets pass even in the no-netfilter scenario.

Patch one is a reproducer script for selftest infra.

Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send
an icmp error to Client A.  This allows 'nopmtudisc' tunnel to forward
the UDP datagrams.

Patch three enables ip refragmentation for all reassembled packets, just
like ipv6.
====================

Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ip: always refragment ip defragmented packets
Florian Westphal [Tue, 5 Jan 2021 23:15:23 +0000 (00:15 +0100)]
net: ip: always refragment ip defragmented packets

Conntrack reassembly records the largest fragment size seen in IPCB.
However, when this gets forwarded/transmitted, fragmentation will only
be forced if one of the fragmented packets had the DF bit set.

In that case, a flag in IPCB will force fragmentation even if the
MTU is large enough.

This should work fine, but this breaks with ip tunnels.
Consider client that sends a UDP datagram of size X to another host.

The client fragments the datagram, so two packets, of size y and z, are
sent. DF bit is not set on any of these packets.

Middlebox netfilter reassembles those packets back to single size-X
packet, before routing decision.

packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit
isn't set.  At output time, ip refragmentation is skipped as well
because x is still smaller than the mtu of the output device.

If ttransmit device is an ip tunnel, the packet size increases to
x+overhead.

Also, tunnel might be configured to force DF bit on outer header.

In this case, packet will be dropped (exceeds MTU) and an ICMP error is
generated back to sender.

But sender already respects the announced MTU, all the packets that
it sent did fit the announced mtu.

Force refragmentation as per original sizes unconditionally so ip tunnel
will encapsulate the fragments instead.

The only other solution I see is to place ip refragmentation in
the ip_tunnel code to handle this case.

Fixes: d6b915e29f4ad ("ip_fragment: don't forward defragmented DF packet")
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: fix pmtu check in nopmtudisc mode
Florian Westphal [Tue, 5 Jan 2021 23:15:22 +0000 (00:15 +0100)]
net: fix pmtu check in nopmtudisc mode

For some reason ip_tunnel insist on setting the DF bit anyway when the
inner header has the DF bit set, EVEN if the tunnel was configured with
'nopmtudisc'.

This means that the script added in the previous commit
cannot be made to work by adding the 'nopmtudisc' flag to the
ip tunnel configuration. Doing so breaks connectivity even for the
without-conntrack/netfilter scenario.

When nopmtudisc is set, the tunnel will skip the mtu check, so no
icmp error is sent to client. Then, because inner header has DF set,
the outer header gets added with DF bit set as well.

IP stack then sends an error to itself because the packet exceeds
the device MTU.

Fixes: 23a3647bc4f93 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: netfilter: add selftest for ipip pmtu discovery with enabled connection...
Florian Westphal [Tue, 5 Jan 2021 23:15:21 +0000 (00:15 +0100)]
selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking

Convert Christians bug description into a reproducer.

Cc: Shuah Khan <shuah@kernel.org>
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling
Vineet Gupta [Wed, 6 Jan 2021 20:34:36 +0000 (12:34 -0800)]
ARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling

Linux 5.11.rcX was failing to boot on ARC HSDK board. Turns out we have
a couple of issues, this being the first one, and I'm to blame as I
didn't pay attention during review.

TIF_NOTIFY_SIGNAL support requires checking multiple TIF_* bits in
kernel return code path. Old code only needed to check a single bit so
BBIT0 <TIF_SIGPENDING> worked. New code needs to check multiple bits so
AND <bit-mask> instruction. So needs to use bit mask variant _TIF_SIGPENDING

Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 53855e12588743ea128 ("arc: add support for TIF_NOTIFY_SIGNAL")
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/34
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
3 years agodocs: admin-guide: bootconfig: Fix feils to fails
Bhaskar Chowdhury [Thu, 7 Jan 2021 12:56:10 +0000 (18:26 +0530)]
docs: admin-guide: bootconfig: Fix feils to fails

s/feils/fails/p

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210107125610.1576368-1-unixbhaskar@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agoDocumentation/admin-guide: kernel-parameters: hyphenate comma-separated
Randy Dunlap [Fri, 1 Jan 2021 04:08:31 +0000 (20:08 -0800)]
Documentation/admin-guide: kernel-parameters: hyphenate comma-separated

Hyphenate "comma separated" when it is used as a compound adjective.
hyphenated.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210101040831.4148-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: binfmt-misc: Fix .rst formatting
Jonathan Neuschäfer [Fri, 1 Jan 2021 21:14:47 +0000 (22:14 +0100)]
docs: binfmt-misc: Fix .rst formatting

"name below" is not part of the /proc path and should not be formatted
in monospace.

"doesn``t" is rendered in HTML with a double backtick. Revert it back to
"doesn't".

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20210101211447.1021412-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: remove mention of ENABLE_MUST_CHECK
Miguel Ojeda [Tue, 5 Jan 2021 05:58:15 +0000 (06:58 +0100)]
docs: remove mention of ENABLE_MUST_CHECK

We removed ENABLE_MUST_CHECK in 196793946264 ("Compiler Attributes:
remove CONFIG_ENABLE_MUST_CHECK"), so let's remove docs' mentions.

At the same time, fix the outdated text related to
ENABLE_WARN_DEPRECATED that wasn't removed in 3337d5cfe5e08
("configs: get rid of obsolete CONFIG_ENABLE_WARN_DEPRECATED").

Finally, reflow the paragraph.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20210105055815.GA5173@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
3 years agodocs: octeontx2: tune rst markup
Lukas Bulwahn [Wed, 6 Jan 2021 16:17:35 +0000 (17:17 +0100)]
docs: octeontx2: tune rst markup

Commit 80b9414832a1 ("docs: octeontx2: Add Documentation for NPA health
reporters") added new documentation with improper formatting for rst, and
caused a few new warnings for make htmldocs in octeontx2.rst:169--202.

Tune markup and formatting for better presentation in the HTML view.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: George Cherian <george.cherian@marvell.com>
Link: https://lore.kernel.org/r/20210106161735.21751-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
Dinghao Liu [Mon, 21 Dec 2020 11:27:31 +0000 (19:27 +0800)]
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups

When mlx5_create_flow_group() fails, ft->g should be
freed just like when kvzalloc() fails. The caller of
mlx5e_create_l2_table_groups() does not catch this
issue on failure, which leads to memleak.

Fixes: 33cfaaa8f36f ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix two double free cases
Dinghao Liu [Mon, 28 Dec 2020 08:48:40 +0000 (16:48 +0800)]
net/mlx5e: Fix two double free cases

mlx5e_create_ttc_table_groups() frees ft->g on failure of
kvzalloc(), but such failure will be caught by its caller
in mlx5e_create_ttc_table() and ft->g will be freed again
in mlx5e_destroy_flow_table(). The same issue also occurs
in mlx5e_create_ttc_table_groups(). Set ft->g to NULL after
kfree() to avoid double free.

Fixes: 7b3722fa9ef6 ("net/mlx5e: Support RSS for GRE tunneled packets")
Fixes: 33cfaaa8f36f ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Release devlink object if adev fails
Leon Romanovsky [Mon, 4 Jan 2021 08:08:36 +0000 (10:08 +0200)]
net/mlx5: Release devlink object if adev fails

Add missed freeing previously allocated devlink object.

Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: ethtool, Fix restriction of autoneg with 56G
Aya Levin [Sun, 27 Dec 2020 14:33:19 +0000 (16:33 +0200)]
net/mlx5e: ethtool, Fix restriction of autoneg with 56G

Prior to this patch, configuring speed to 50G with autoneg off over
devices supporting 50G per lane failed.
Support for 50G per lane introduced a new set of link-modes, on which
driver always performed a speed validation as if only legacy link-modes
were configured. Fix driver speed validation to force setting autoneg
over 56G only if in legacy link-mode.

Fixes: 3d7cadae51f1 ("net/mlx5e: ethtool, Fix analysis of speed setting")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: In skb build skip setting mark in switchdev mode
Maor Dickman [Mon, 14 Dec 2020 11:53:03 +0000 (13:53 +0200)]
net/mlx5e: In skb build skip setting mark in switchdev mode

sop_drop_qpn field in the cqe is used by two features, in SWITCHDEV mode
to restore the chain id in case of a miss and in LEGACY mode to support
skbedit mark action. In build RX skb, the skb mark field is set regardless
of the configured mode which cause a corruption of the mark field in case
of switchdev mode.

Fix by overriding the mark value back to 0 in the representor tc update
skb flow.

Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-Switch, fix changing vf VLANID
Alaa Hleihel [Mon, 4 Jan 2021 10:54:40 +0000 (12:54 +0200)]
net/mlx5: E-Switch, fix changing vf VLANID

Adding vf VLANID for the first time, or after having cleared previously
defined VLANID works fine, however, attempting to change an existing vf
VLANID clears the rules on the firmware, but does not add new rules for
the new vf VLANID.

Fix this by changing the logic in function esw_acl_egress_lgcy_setup()
so that it will always configure egress rules.

Fixes: ea651a86d468 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix SWP offsets when vlan inserted by driver
Moshe Shemesh [Fri, 13 Nov 2020 04:06:28 +0000 (06:06 +0200)]
net/mlx5e: Fix SWP offsets when vlan inserted by driver

In case WQE includes inline header the vlan is inserted by driver even
if vlan offload is set. On geneve over vlan interface where software
parser is used the SWP offsets should be updated according to the added
vlan.

Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
Oz Shlomo [Mon, 7 Dec 2020 08:15:18 +0000 (08:15 +0000)]
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled

Connection counters may be shared for both directions when the counter
is used for connection aging purposes. However, if TC flow
accounting is enabled then a unique counter is required per direction.

Instantiate a unique counter per direction if the conntrack accounting
extension is enabled. Use a shared counter when the connection accounting
extension is disabled.

Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
Mark Zhang [Mon, 14 Dec 2020 01:38:40 +0000 (03:38 +0200)]
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address

In multi-port mode, FW reports syndrome 0x2ea48 (invalid vhca_port_number)
if the port_num is not 1 or 2.

Fixes: 80f09dfc237f ("net/mlx5: Eswitch, enable RoCE loopback traffic")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Add missing capability check for uplink follow
Aya Levin [Tue, 24 Nov 2020 20:16:23 +0000 (22:16 +0200)]
net/mlx5e: Add missing capability check for uplink follow

Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.

Fixes: 7d0314b11cdd ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Check if lag is supported before creating one
Mark Zhang [Mon, 30 Nov 2020 02:38:11 +0000 (04:38 +0200)]
net/mlx5: Check if lag is supported before creating one

This patch fixes a memleak issue by preventing to create a lag and
add PFs if lag is not supported.

comm “python3”, pid 349349, jiffies 4296985507 (age 1446.976s)
hex dump (first 32 bytes):
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  …………….
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  …………….
 backtrace:
  [<000000005b216ae7>] mlx5_lag_add+0x1d5/0×3f0 [mlx5_core]
  [<000000000445aa55>] mlx5e_nic_enable+0x66/0×1b0 [mlx5_core]
  [<00000000c56734c3>] mlx5e_attach_netdev+0x16e/0×200 [mlx5_core]
  [<0000000030439d1f>] mlx5e_attach+0x5c/0×90 [mlx5_core]
  [<0000000018fd8615>] mlx5e_add+0x1a4/0×410 [mlx5_core]
  [<0000000068bc504b>] mlx5_add_device+0x72/0×120 [mlx5_core]
  [<000000009fce51f9>] mlx5_register_device+0x77/0xb0 [mlx5_core]
  [<00000000d0d81ff3>] mlx5_load_one+0xc58/0×1eb0 [mlx5_core]
  [<0000000045077adc>] init_one+0x3ea/0×920 [mlx5_core]
  [<0000000043287674>] pci_device_probe+0xcd/0×150
  [<00000000dafd3279>] really_probe+0x1c9/0×4b0
  [<00000000f06bdd84>] driver_probe_device+0x5d/0×140
  [<00000000e3d508b6>] device_driver_attach+0x4f/0×60
  [<0000000084fba0f0>] bind_store+0xbf/0×120
  [<00000000bf6622b3>] kernfs_fop_write+0x114/0×1b0

Fixes: 9b412cc35f00 ("net/mlx5e: Add LAG warning if bond slave is not lag master")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge tag 'spi-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Thu, 7 Jan 2021 20:21:32 +0000 (12:21 -0800)]
Merge tag 'spi-fix-v5.11-rc2' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A couple of core fixes here, both to do with handling of drivers which
  don't report their maximum speed since we factored some of the
  handling for transfer speeds out into the core in the previous
  release.

  There's also some driver specific fixes, including a relatively large
  set for some races around timeouts in spi-geni-qcom"

* tag 'spi-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: fix the divide by 0 error when calculating xfer waiting time
  spi: Fix the clamping of spi->max_speed_hz
  spi: altera: fix return value for altera_spi_txrx()
  spi: stm32: FIFO threshold level - fix align packet size
  spi: spi-geni-qcom: Print an error when we timeout setting the CS
  spi: spi-geni-qcom: Don't try to set CS if an xfer is pending
  spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending
  spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case

3 years agoMerge tag 'regulator-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 7 Jan 2021 20:18:22 +0000 (12:18 -0800)]
Merge tag 'regulator-fix-v5.11-rc2' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A few minor driver specific fixes, mostly DT bindings document bits,
  plus a new device ID"

* tag 'regulator-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom-rpmh: add QCOM_COMMAND_DB dependency
  regulator: qcom-rpmh-regulator: correct hfsmps515 definition
  dt-bindings: regulator: qcom,rpmh-regulator: add pm8009 revision
  regulator: bd718x7: Add enable times
  regulator: pf8x00: Use specific compatible strings for devices

3 years agoFonts: font_ter16x32: Update font with new upstream Terminus release
Amanoel Dawod [Sat, 26 Dec 2020 23:58:40 +0000 (18:58 -0500)]
Fonts: font_ter16x32: Update font with new upstream Terminus release

This is just a maintenance patch to update font_ter16x32.c with changes
and minor fixes added in new upstream Terminus v4.49.

>From release notes of new version 4.49, this brings:
- Altered ascii grave in some sizes to be more useful as a back quote.
- Fixed 21B5, added 21B2 and 21B3.

Just as my initial submission of the font, above changes were obtained from
new ter-i32b.psf font source.

Terminus font sources are available for download at SourceForge:
https://sourceforge.net/projects/terminus-font/files/terminus-font-4.49/

Simply running `make` in source directory will build the .psf font files.

Signed-off-by: Amanoel Dawod <kernel@amanoeldawod.com>
Link: https://lore.kernel.org/r/20201226235840.26290-1-kernel@amanoeldawod.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotools: selftests: add test for changing routes with PTMU exceptions
Sean Tranchetti [Wed, 6 Jan 2021 00:22:26 +0000 (16:22 -0800)]
tools: selftests: add test for changing routes with PTMU exceptions

Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change.

These tests explicitly test for a recently discovered problem in the
IPv6 routing framework where PMTU exceptions were not properly released
when replacing a route via "ip route change ...".

After creating PMTU exceptions, the route from the device A to R1 will be
replaced with a new route, then device A will be deleted. If the PMTU
exceptions were properly cleaned up by the kernel, this device deletion
will succeed. Otherwise, the unregistration of the device will stall, and
messages such as the following will be logged in dmesg:

unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/1609892546-11389-2-git-send-email-stranche@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipv6: fib: flush exceptions when purging route
Sean Tranchetti [Wed, 6 Jan 2021 00:22:25 +0000 (16:22 -0800)]
net: ipv6: fib: flush exceptions when purging route

Route removal is handled by two code paths. The main removal path is via
fib6_del_route() which will handle purging any PMTU exceptions from the
cache, removing all per-cpu copies of the DST entry used by the route, and
releasing the fib6_info struct.

The second removal location is during fib6_add_rt2node() during a route
replacement operation. This path also calls fib6_purge_rt() to handle
cleaning up the per-cpu copies of the DST entries and releasing the
fib6_info associated with the older route, but it does not flush any PMTU
exceptions that the older route had. Since the older route is removed from
the tree during the replacement, we lose any way of accessing it again.

As these lingering DSTs and the fib6_info struct are holding references to
the underlying netdevice struct as well, unregistering that device from the
kernel can never complete.

Fixes: 2b760fcf5cfb3 ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'regmap-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 7 Jan 2021 19:57:56 +0000 (11:57 -0800)]
Merge tag 'regmap-fix-v5.11-rc2' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "A couple of small fixes for leaks when attaching a device to a
  preexisting regmap"

* tag 'regmap-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init()
  regmap: debugfs: Fix a memory leak when calling regmap_attach_dev

3 years agomisc: pvpanic: Check devm_ioport_map() for NULL
Andy Shevchenko [Mon, 28 Dec 2020 18:43:13 +0000 (20:43 +0200)]
misc: pvpanic: Check devm_ioport_map() for NULL

Inconveniently devm_ioport_map() and devm_ioremap_resource()
return errors differently, i.e. former uses simply NULL pointer,
while the latter an error pointer.

Due to this, we have to check each of them separately.

Fixes: f104060813fe ("misc: pvpanic: Combine ACPI and platform drivers")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20201228184313.57610-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoMerge tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 7 Jan 2021 19:08:08 +0000 (11:08 -0800)]
Merge tag 'linux-can-fixes-for-5.11-20210107' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-01-07

The first patch is by me for the m_can driver and removes an erroneous
m_can_clk_stop() from the driver's unregister function.

The second patch targets the tcan4x5x driver, is by me, and fixes the bit
timing constant parameters.

The next two patches are by me, target the mcp251xfd driver, and fix a race
condition in the optimized TEF path (which was added in net-next for v5.11).
The similar code in the RX path is changed to look the same, although it
doesn't suffer from the race condition.

A patch by Lad Prabhakar updates the description and help text for the rcar CAN
driver to reflect all supported SoCs.

In the last patch Sriram Dash transfers the maintainership of the m_can driver
to Pankaj Sharma.

* tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  MAINTAINERS: Update MCAN MMIO device driver maintainer
  can: rcar: Kconfig: update help description for CAN_RCAR config
  can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver
  can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
  can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
  can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
====================

Link: https://lore.kernel.org/r/20210107103451.183477-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme into block-5.11
Jens Axboe [Thu, 7 Jan 2021 17:57:54 +0000 (10:57 -0700)]
Merge tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme into block-5.11

Pull NVMe updates from Christoph:

"nvme updates for 5.11:

 - fix a race in the nvme-tcp send code (Sagi Grimberg)
 - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
 - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
 - add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
 - fix two compiler warnings in nvme-fcloop (James Smart)
 - don't call sleeping functions from irq context in nvme-fc (James Smart)
 - remove an unused argument (Max Gurtovoy)
 - remove unused exports (Minwoo Im)"

* tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme:
  nvme: remove the unused status argument from nvme_trace_bio_complete
  nvmet-rdma: Fix list_del corruption on queue establishment failure
  nvme: unexport functions with no external caller
  nvme: avoid possible double fetch in handling CQE
  nvme-tcp: Fix possible race of io_work and direct send
  nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
  nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
  nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context

3 years agodrm/msm: Only enable A6xx LLCC code on A6xx
Konrad Dybcio [Mon, 4 Jan 2021 19:30:41 +0000 (20:30 +0100)]
drm/msm: Only enable A6xx LLCC code on A6xx

Using this code on A5xx (and probably older too) causes a
smmu bug.

Fixes: 474dadb8b0d5 ("drm/msm/a6xx: Add support for using system cache(LLC)")
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
3 years agocpufreq: intel_pstate: remove obsolete functions
Lukas Bulwahn [Mon, 21 Dec 2020 05:13:20 +0000 (06:13 +0100)]
cpufreq: intel_pstate: remove obsolete functions

percent_fp() was used in intel_pstate_pid_reset(), which was removed in
commit 9d0ef7af1f2d ("cpufreq: intel_pstate: Do not use PID-based P-state
selection") and hence, percent_fp() is unused since then.

percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
was refactored in commit 1a4fe38add8b ("cpufreq: intel_pstate: Remove
max/min fractions to limit performance"), and hence, percent_ext_fp() is
unused since then.

make CC=clang W=1 points us those unused functions:

drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
static inline int32_t percent_fp(int percent)
                      ^

drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
static inline int32_t percent_ext_fp(int percent)
                      ^

Remove those obsolete functions.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agodrm/msm: Add modparam to allow vram carveout
Iskren Chernev [Wed, 30 Dec 2020 15:29:43 +0000 (17:29 +0200)]
drm/msm: Add modparam to allow vram carveout

Using the GPU with a VRAM Carveout is a security vulnerability.
Nevertheless it is sometimes required, especially when no IOMMU
implementation is available for a certain platform.

Signed-off-by: Iskren Chernev <iskren.chernev@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
3 years agodrm/msm: Call msm_init_vram before binding the gpu
Craig Tatlor [Wed, 30 Dec 2020 15:29:42 +0000 (17:29 +0200)]
drm/msm: Call msm_init_vram before binding the gpu

vram.size is needed when binding a gpu without an iommu and is defined
in msm_init_vram(), so run that before binding it.

Signed-off-by: Craig Tatlor <ctatlor97@gmail.com>
Reviewed-by: Brian Masney <masneyb@onstation.org>
Tested-by: Alexey Minnekhanov <alexeymin@postmarketos.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
3 years agodrm/msm/dp: postpone irq_hpd event during connection pending state
Kuogee Hsieh [Fri, 18 Dec 2020 17:53:40 +0000 (09:53 -0800)]
drm/msm/dp: postpone irq_hpd event during connection pending state

irq_hpd event can only be executed at connected state. Therefore
irq_hpd event should be postponed if it happened at connection
pending state. This patch also make sure both link rate and lane
are valid before start link training.

Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
3 years agodevice property: add description of fwnode cases
Bard Liao [Tue, 5 Jan 2021 09:11:46 +0000 (17:11 +0800)]
device property: add description of fwnode cases

There are only four valid fwnode cases which are
- primary --> secondary --> -ENODEV
- primary --> NULL
- secondary --> -ENODEV
- NULL

dev->fwnode should be converted between the 4 cases above no matter
how/when set_primary_fwnode() and set_secondary_fwnode() are called.
Describe it in the code so people will keep it in mind.

Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[ rjw: Comment edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoRevert "device property: Keep secondary firmware node secondary by type"
Bard Liao [Tue, 5 Jan 2021 09:11:45 +0000 (17:11 +0800)]
Revert "device property: Keep secondary firmware node secondary by type"

While commit d5dcce0c414f ("device property: Keep secondary firmware
node secondary by type") describes everything correct in its commit
message, the change it made does the opposite and original commit
c15e1bdda436 ("device property: Fix the secondary firmware node handling
in set_primary_fwnode()") was fully correct.

Revert the former one here and improve documentation in the next patch.

Fixes: d5dcce0c414f ("device property: Keep secondary firmware node secondary by type")
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoACPI: Update Kconfig help text for items that are no longer modular
Peter Robinson [Tue, 29 Dec 2020 11:17:59 +0000 (11:17 +0000)]
ACPI: Update Kconfig help text for items that are no longer modular

The CONTAINER and HOTPLUG_MEMORY options mention modules but are bool
only, so if selected are always built in.

Drop the help text about modules.

Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Shawn Guo [Thu, 31 Dec 2020 11:35:25 +0000 (19:35 +0800)]
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI

It adds a stub acpi_create_platform_device() for !CONFIG_ACPI build, so
that caller doesn't have to deal with !CONFIG_ACPI build issue.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoACPI: PM: s2idle: Drop unused local variables and related code
Rafael J. Wysocki [Tue, 5 Jan 2021 18:19:18 +0000 (19:19 +0100)]
ACPI: PM: s2idle: Drop unused local variables and related code

Two local variables in drivers/acpi/x86/s2idle.c are never read, so
drop them along with the code updating their values (in vain).

Fixes: fef98671194b ("ACPI: PM: s2idle: Move x86-specific code to the x86 directory")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agocpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()
Colin Ian King [Tue, 5 Jan 2021 10:19:57 +0000 (10:19 +0000)]
cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()

Currently there is an unlikely case where cpufreq_cpu_get() returns a
NULL policy and this will cause a NULL pointer dereference later on.

Fix this by passing the policy to transition_frequency_fidvid() from
the caller and hence eliminating the need for the cpufreq_cpu_get()
and cpufreq_cpu_put().

Thanks to Viresh Kumar for suggesting the fix.

Addresses-Coverity: ("Dereference null return")
Fixes: b43a7ffbf33b ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()")
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agocpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()
Rafael J. Wysocki [Tue, 5 Jan 2021 18:20:29 +0000 (19:20 +0100)]
cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()

If turbo P-states cannot be used, either due to the configuration of
the processor, or because intel_pstate is not allowed to used them,
the maximum available P-state with HWP enabled corresponds to the
HWP_CAP.GUARANTEED value which is not static.  It can be adjusted by
an out-of-band agent or during an Intel Speed Select performance
level change, so long as it remains less than or equal to
HWP_CAP.MAX.

However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
always uses pstate.max_pstate (set during the initialization of the
driver only) as the maximum available P-state, so it may miss a change
of the HWP_CAP.GUARANTEED value.

Prevent that from happening by modifyig intel_cpufreq_adjust_perf()
to always read the "guaranteed" and "maximum turbo" performance
levels from the cached HWP_CAP value.

Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
3 years agofs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
Satya Tangirala [Thu, 24 Dec 2020 04:49:54 +0000 (04:49 +0000)]
fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb

freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
bd_fsfreeze_sb.

But this means a freeze_bdev() call followed by a thaw_bdev() call can
leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
zero. If freeze_bdev() is called again, and this time
get_active_super() returns NULL (e.g. because the FS is unmounted),
we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
*untouched* - it stays the same (now garbage) value. A subsequent
thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
(since bd_fsfreeze_count > 0), and attempt to use it.

Fix this by always setting bd_fsfreeze_sb to NULL when
bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
Alternatively, we could set bd_fsfreeze_sb to whatever
get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
is successfully incremented to 1 from 0 (which can be achieved cleanly
by moving the line currently setting bd_fsfreeze_sb to immediately
after the "sync:" label, but it might be a little too subtle/easily
overlooked in future).

This fixes the currently panicking xfstests generic/085.

Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: synchronise ev_posted() with waitqueues
Pavel Begunkov [Thu, 7 Jan 2021 03:15:43 +0000 (03:15 +0000)]
io_uring: synchronise ev_posted() with waitqueues

waitqueue_active() needs smp_mb() to be in sync with waitqueues
modification, but we miss it in io_cqring_ev_posted*() apart from
cq_wait() case.

Take an smb_mb() out of wq_has_sleeper() making it waitqueue_active(),
and place it a few lines before, so it can synchronise other
waitqueue_active() as well.

The patch doesn't add any additional overhead, so even if there are
no problems currently, it's just safer to have it this way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: dont kill fasync under completion_lock
Pavel Begunkov [Thu, 7 Jan 2021 03:15:42 +0000 (03:15 +0000)]
io_uring: dont kill fasync under completion_lock

      CPU0                    CPU1
       ----                    ----
  lock(&new->fa_lock);
                               local_irq_disable();
                               lock(&ctx->completion_lock);
                               lock(&new->fa_lock);
  <Interrupt>
    lock(&ctx->completion_lock);

 *** DEADLOCK ***

Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
so it doesn't hold completion_lock while doing it. That saves from the
reported deadlock, and it's just nice to shorten the locking time and
untangle nested locks (compl_lock -> wq_head::lock).

Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: trigger eventfd for IOPOLL
Pavel Begunkov [Thu, 7 Jan 2021 03:15:41 +0000 (03:15 +0000)]
io_uring: trigger eventfd for IOPOLL

Make sure io_iopoll_complete() tries to wake up eventfd, which currently
is skipped together with io_cqring_ev_posted() for non-SQPOLL IOPOLL.

Add an iopoll version of io_cqring_ev_posted(), duplicates a bit of
code, but they actually use different sets of wait queues may be for
better.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoiommu/vt-d: Fix ineffective devTLB invalidation for subdevices
Liu Yi L [Wed, 6 Jan 2021 16:03:57 +0000 (00:03 +0800)]
iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

iommu_flush_dev_iotlb() is called to invalidate caches on a device but
only loops over the devices which are fully-attached to the domain. For
sub-devices, this is ineffective and can result in invalid caching
entries left on the device.

Fix the missing invalidation by adding a loop over the subdevices and
ensuring that 'domain->has_iotlb_device' is updated when attaching to
subdevices.

Fixes: 67b8e02b5e76 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-4-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoiommu/vt-d: Fix general protection fault in aux_detach_device()
Liu Yi L [Wed, 6 Jan 2021 16:03:56 +0000 (00:03 +0800)]
iommu/vt-d: Fix general protection fault in aux_detach_device()

The aux-domain attach/detach are not tracked, some data structures might
be used after free. This causes general protection faults when multiple
subdevices are created and assigned to a same guest machine:

  | general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
  | RIP: 0010:intel_iommu_aux_detach_device+0x12a/0x1f0
  | [...]
  | Call Trace:
  |  iommu_aux_detach_device+0x24/0x70
  |  vfio_mdev_detach_domain+0x3b/0x60
  |  ? vfio_mdev_set_domain+0x50/0x50
  |  iommu_group_for_each_dev+0x4f/0x80
  |  vfio_iommu_detach_group.isra.0+0x22/0x30
  |  vfio_iommu_type1_detach_group.cold+0x71/0x211
  |  ? find_exported_symbol_in_section+0x4a/0xd0
  |  ? each_symbol_section+0x28/0x50
  |  __vfio_group_unset_container+0x4d/0x150
  |  vfio_group_try_dissolve_container+0x25/0x30
  |  vfio_group_put_external_user+0x13/0x20
  |  kvm_vfio_group_put_external_user+0x27/0x40 [kvm]
  |  kvm_vfio_destroy+0x45/0xb0 [kvm]
  |  kvm_put_kvm+0x1bb/0x2e0 [kvm]
  |  kvm_vm_release+0x22/0x30 [kvm]
  |  __fput+0xcc/0x260
  |  ____fput+0xe/0x10
  |  task_work_run+0x8f/0xb0
  |  do_exit+0x358/0xaf0
  |  ? wake_up_state+0x10/0x20
  |  ? signal_wake_up_state+0x1a/0x30
  |  do_group_exit+0x47/0xb0
  |  __x64_sys_exit_group+0x18/0x20
  |  do_syscall_64+0x57/0x1d0
  |  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix the crash by tracking the subdevices when attaching and detaching
aux-domains.

Fixes: 67b8e02b5e76 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Co-developed-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-3-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoiommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
Liu Yi L [Wed, 6 Jan 2021 16:03:55 +0000 (00:03 +0800)]
iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev

'struct intel_svm' is shared by all devices bound to a give process,
but records only a single pointer to a 'struct intel_iommu'. Consequently,
cache invalidations may only be applied to a single DMAR unit, and are
erroneously skipped for the other devices.

In preparation for fixing this, rework the structures so that the iommu
pointer resides in 'struct intel_svm_dev', allowing 'struct intel_svm'
to track them in its device list.

Fixes: 1c4f88b7f1f9 ("iommu/vt-d: Shared virtual address in scalable mode")
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Raj Ashok <ashok.raj@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Reported-by: Guo Kaijie <Kaijie.Guo@intel.com>
Reported-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Cc: stable@vger.kernel.org # v5.0+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-2-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoiommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
Bjorn Andersson [Wed, 6 Jan 2021 00:50:38 +0000 (16:50 -0800)]
iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context

On SM8150 it's occasionally observed that the boot hangs in between the
writing of SMEs and context banks in arm_smmu_device_reset().

The problem seems to coincide with a display refresh happening after
updating the stream mapping, but before clearing - and there by
disabling translation - the context bank picked to emulate translation
bypass.

Resolve this by explicitly disabling the bypass context already in
cfg_probe.

Fixes: f9081b8ff593 ("iommu/arm-smmu-qcom: Implement S2CR quirk")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210106005038.4152731-1-bjorn.andersson@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agousb: usbip: Use DEFINE_SPINLOCK() for spinlock
Zheng Yongjun [Wed, 23 Dec 2020 14:14:31 +0000 (22:14 +0800)]
usb: usbip: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Link: https://lore.kernel.org/r/20201223141431.835-1-zhengyongjun3@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agousb: gadget: configfs: Add a specific configFS reset callback
Wesley Cheng [Tue, 29 Dec 2020 23:03:31 +0000 (15:03 -0800)]
usb: gadget: configfs: Add a specific configFS reset callback

In order for configFS based USB gadgets to set the proper charge current
for bus reset scenarios, expose a separate reset callback to set the
current to 100mA based on the USB battery charging specification.

Reviewed-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/1609283011-21997-4-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoiommu/vt-d: Fix lockdep splat in sva bind()/unbind()
Lu Baolu [Thu, 31 Dec 2020 00:53:23 +0000 (08:53 +0800)]
iommu/vt-d: Fix lockdep splat in sva bind()/unbind()

Lock(&iommu->lock) without disabling irq causes lockdep warnings.

========================================================
WARNING: possible irq lock inversion dependency detected
5.11.0-rc1+ #828 Not tainted
--------------------------------------------------------
kworker/0:1H/120 just changed the state of lock:
ffffffffad9ea1b8 (device_domain_lock){..-.}-{2:2}, at:
iommu_flush_dev_iotlb.part.0+0x32/0x120
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (&iommu->lock){+.+.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&iommu->lock);
                               local_irq_disable();
                               lock(device_domain_lock);
                               lock(&iommu->lock);
  <Interrupt>
    lock(device_domain_lock);

 *** DEADLOCK ***

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-5-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoRevert "iommu: Add quirk for Intel graphic devices in map_sg"
Lu Baolu [Thu, 31 Dec 2020 00:53:22 +0000 (08:53 +0800)]
Revert "iommu: Add quirk for Intel graphic devices in map_sg"

This reverts commit 65f746e8285f0a67d43517d86fedb9e29ead49f2.

As commit 8a473dbadccf ("drm/i915: Fix DMA mapped scatterlist walks") and
commit 934941ed5a30 ("drm/i915: Fix DMA mapped scatterlist lookup") fixed
the DMA scatterlist limitations in the i915 driver, remove this temporary
workaround.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-4-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoiommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
Lu Baolu [Thu, 31 Dec 2020 00:53:19 +0000 (08:53 +0800)]
iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()

Use IS_ALIGNED() instead. Otherwise, an unaligned address will be ignored.

Fixes: 33cd6e642d6a ("iommu/vt-d: Flush PASID-based iotlb for iova over first level")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-1-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodrm/ttm: unexport ttm_pool_init/fini
Christian König [Tue, 5 Jan 2021 17:56:56 +0000 (18:56 +0100)]
drm/ttm: unexport ttm_pool_init/fini

Drivers are not supposed to use this directly any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Link: https://patchwork.freedesktop.org/patch/412156/
3 years agodrm/radeon: stop re-init the TTM page pool
Christian König [Tue, 5 Jan 2021 17:55:47 +0000 (18:55 +0100)]
drm/radeon: stop re-init the TTM page pool

Drivers are not supposed to init the page pool directly any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Link: https://patchwork.freedesktop.org/patch/412153/
3 years agousb: dwc3: gadget: Clear wait flag on dequeue
Thinh Nguyen [Tue, 5 Jan 2021 06:42:39 +0000 (22:42 -0800)]
usb: dwc3: gadget: Clear wait flag on dequeue

If an active transfer is dequeued, then the endpoint is freed to start a
new transfer. Make sure to clear the endpoint's transfer wait flag for
this case.

Fixes: e0d19563eb6c ("usb: dwc3: gadget: Wait for transfer completion")
Cc: stable@vger.kernel.org
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/b81cd5b5281cfbfdadb002c4bcf5c9be7c017cfd.1609828485.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agousb: typec: Send uevent for num_altmodes update
Prashant Malani [Thu, 7 Jan 2021 03:49:04 +0000 (19:49 -0800)]
usb: typec: Send uevent for num_altmodes update

Generate a change uevent when the "number_of_alternate_modes" sysfs file
for partners and plugs is updated by a port driver.

Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Benson Leung <bleung@chromium.org>
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Link: https://lore.kernel.org/r/20210107034904.4112029-1-pmalani@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agousb: typec: Fix copy paste error for NVIDIA alt-mode description
Peter Robinson [Wed, 6 Jan 2021 00:16:05 +0000 (00:16 +0000)]
usb: typec: Fix copy paste error for NVIDIA alt-mode description

The name of the module for the NVIDIA alt-mode is incorrect as it
looks to be a copy-paste error from the entry above, update it to
the correct typec_nvidia module name.

Cc: Ajay Gupta <ajayg@nvidia.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20210106001605.167917-1-pbrobinson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agousb: gadget: enable super speed plus
taehyun.cho [Wed, 6 Jan 2021 15:46:25 +0000 (00:46 +0900)]
usb: gadget: enable super speed plus

Enable Super speed plus in configfs to support USB3.1 Gen2.
This ensures that when a USB gadget is plugged in, it is
enumerated as Gen 2 and connected at 10 Gbps if the host and
cable are capable of it.

Many in-tree gadget functions (fs, midi, acm, ncm, mass_storage,
etc.) already have SuperSpeed Plus support.

Tested: plugged gadget into Linux host and saw:
[284907.385986] usb 8-2: new SuperSpeedPlus Gen 2 USB device number 3 using xhci_hcd

Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: taehyun.cho <taehyun.cho@samsung.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Link: https://lore.kernel.org/r/20210106154625.2801030-1-lorenzo@google.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agokcov, usb: hide in_serving_softirq checks in __usb_hcd_giveback_urb
Andrey Konovalov [Tue, 5 Jan 2021 19:53:42 +0000 (20:53 +0100)]
kcov, usb: hide in_serving_softirq checks in __usb_hcd_giveback_urb

Done opencode in_serving_softirq() checks in in_serving_softirq() to avoid
cluttering the code, hide them in kcov helpers instead.

Fixes: aee9ddb1d371 ("kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/aeb430c5bb90b0ccdf1ec302c70831c1a47b9c45.1609876340.git.andreyknvl@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodmabuf: fix use-after-free of dmabuf's file->f_inode
Charan Teja Reddy [Tue, 5 Jan 2021 14:36:39 +0000 (20:06 +0530)]
dmabuf: fix use-after-free of dmabuf's file->f_inode

It is observed 'use-after-free' on the dmabuf's file->f_inode with the
race between closing the dmabuf file and reading the dmabuf's debug
info.

Consider the below scenario where P1 is closing the dma_buf file
and P2 is reading the dma_buf's debug info in the system:

P1 P2
dma_buf_debug_show()
dma_buf_put()
  __fput()
    file->f_op->release()
    dput()
    ....
      dentry_unlink_inode()
        iput(dentry->d_inode)
        (where the inode is freed)
mutex_lock(&db_list.lock)
read 'dma_buf->file->f_inode'
(the same inode is freed by P1)
mutex_unlock(&db_list.lock)
      dentry->d_op->d_release()-->
        dma_buf_release()
          .....
          mutex_lock(&db_list.lock)
          removes the dmabuf from the list
          mutex_unlock(&db_list.lock)

In the above scenario, when dma_buf_put() is called on a dma_buf, it
first frees the dma_buf's file->f_inode(=dentry->d_inode) and then
removes this dma_buf from the system db_list. In between P2 traversing
the db_list tries to access this dma_buf's file->f_inode that was freed
by P1 which is a use-after-free case.

Since, __fput() calls f_op->release first and then later calls the
d_op->d_release, move the dma_buf's db_list removal from d_release() to
f_op->release(). This ensures that dma_buf's file->f_inode is not
accessed after it is released.

Cc: <stable@vger.kernel.org> # 5.4.x-
Fixes: 4ab59c3c638c ("dma-buf: Move dma_buf_release() from fops to dentry_ops")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1609857399-31549-1-git-send-email-charante@codeaurora.org
3 years agoarm64: Move PSTATE.TCO setting to separate functions
Catalin Marinas [Mon, 4 Jan 2021 12:36:14 +0000 (12:36 +0000)]
arm64: Move PSTATE.TCO setting to separate functions

For consistency with __uaccess_{disable,enable}_hw_pan(), move the
PSTATE.TCO setting into dedicated __uaccess_{disable,enable}_tco()
functions.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
3 years agoMAINTAINERS: Update MCAN MMIO device driver maintainer
Sriram Dash [Mon, 4 Jan 2021 12:31:34 +0000 (18:01 +0530)]
MAINTAINERS: Update MCAN MMIO device driver maintainer

Update Pankaj Sharma as maintainer for mcan mmio device driver as I
will be moving to a different role.

Signed-off-by: Sriram Dash <sriram.dash@samsung.com>
Acked-by: Pankaj Sharma <pankj.sharma@samsung.com>
Link: https://lore.kernel.org/r/20210104123134.16930-1-sriram.dash@samsung.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: rcar: Kconfig: update help description for CAN_RCAR config
Lad Prabhakar [Mon, 4 Jan 2021 09:03:27 +0000 (09:03 +0000)]
can: rcar: Kconfig: update help description for CAN_RCAR config

The rcar_can driver also supports RZ/G SoC's, update the description to reflect
this.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210104090327.6547-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in...
Marc Kleine-Budde [Tue, 5 Jan 2021 21:41:38 +0000 (22:41 +0100)]
can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver

The previous patch fixes a TEF vs. TX race condition, by first updating the TEF
tail pointer in hardware, and then updating the driver internal pointer.

The same pattern exists in the RX-path, too. This should be no problem, as the
driver accesses the RX-FIFO from the interrupt handler only, thus the access is
properly serialized. Fix the order here, too, so that the TEF- and RX-path look
similar.

Fixes: 1f652bb6bae7 ("can: mcp25xxfd: rx-path: reduce number of SPI core requests to set UINC bit")
Link: https://lore.kernel.org/r/20210105214138.3150886-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
Marc Kleine-Budde [Tue, 5 Jan 2021 21:41:37 +0000 (22:41 +0100)]
can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition

The mcp251xfd driver uses a TX FIFO for sending CAN frames and a TX Event FIFO
(TEF) for completed TX-requests.

The TEF event handling in the mcp251xfd_handle_tefif() function has a race
condition. It first increments the tx-ring's tail counter to signal that
there's room in the TX and TEF FIFO, then it increments the TEF FIFO in
hardware.

A running mcp251xfd_start_xmit() on a different CPU might not stop the txqueue
(as the tx-ring still shows free space). The next mcp251xfd_start_xmit() will
push a message into the chip and the TX complete event might overflow the TEF
FIFO.

This patch changes the order to fix the problem.

Fixes: 68c0c1c7f966 ("can: mcp251xfd: tef-path: reduce number of SPI core requests to set UINC bit")
Link: https://lore.kernel.org/r/20210105214138.3150886-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoMerge tag 'drm-intel-fixes-2021-01-07' of git://anongit.freedesktop.org/drm/drm-intel...
Daniel Vetter [Thu, 7 Jan 2021 09:26:05 +0000 (10:26 +0100)]
Merge tag 'drm-intel-fixes-2021-01-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.11-rc3:
- Use per-connector PM QoS tracking for DP aux communication
- GuC firmware fix for older Cometlakes
- Clear the gpu reloc and shadow batches

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/877dop18zf.fsf@intel.com
3 years agoMerge tag 'amd-drm-fixes-5.11-2021-01-06' of https://gitlab.freedesktop.org/agd5f...
Daniel Vetter [Thu, 7 Jan 2021 09:02:30 +0000 (10:02 +0100)]
Merge tag 'amd-drm-fixes-5.11-2021-01-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.11-2021-01-06:

amdgpu:
- Telemetry fix for VGH
- Powerplay fixes for RV
- Powerplay fixes for RN
- RAS fixes for Sienna Cichlid
- Blank screen regression fix
- Drop DCN support for aarch64
- Misc other fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210106222721.3934-1-alexander.deucher@amd.com
3 years agocan: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
Marc Kleine-Budde [Tue, 15 Dec 2020 10:32:38 +0000 (11:32 +0100)]
can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver

According to the TCAN4550 datasheet "SLLSF91 - DECEMBER 2018" the tcan4x5x has
the same bittiming constants as a m_can revision 3.2.x/3.3.0.

The tcan4x5x chip I'm using identifies itself as m_can revision 3.2.1, so
remove the tcan4x5x specific bittiming values and rely on the values in the
m_can driver, which are selected according to core revision.

Fixes: 5443c226ba91 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Cc: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Link: https://lore.kernel.org/r/20201215103238.524029-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
Marc Kleine-Budde [Tue, 15 Dec 2020 10:32:37 +0000 (11:32 +0100)]
can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()

In m_can_class_register() the clock is started, but stopped on exit. When
calling m_can_class_unregister(), the clock is stopped a second time.

This patch removes the erroneous m_can_clk_stop() in  m_can_class_unregister().

Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework")
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Sriram Dash <sriram.dash@samsung.com>
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Link: https://lore.kernel.org/r/20201215103238.524029-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoptp: ptp_ines: prevent build when HAS_IOMEM is not set
Randy Dunlap [Wed, 6 Jan 2021 04:25:31 +0000 (20:25 -0800)]
ptp: ptp_ines: prevent build when HAS_IOMEM is not set

ptp_ines.c uses devm_platform_ioremap_resource(), which is only
built/available when CONFIG_HAS_IOMEM is enabled.
CONFIG_HAS_IOMEM is not enabled for arch/s390/, so builds on S390
have a build error:

s390-linux-ld: drivers/ptp/ptp_ines.o: in function `ines_ptp_ctrl_probe':
ptp_ines.c:(.text+0x17e6): undefined reference to `devm_platform_ioremap_resource'

Prevent builds of ptp_ines.c when HAS_IOMEM is not set.

Fixes: bad1eaa6ac31 ("ptp: Add a driver for InES time stamping IP core.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20210106042531.1351-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>