David Woodhouse [Tue, 27 Oct 2020 14:39:43 +0000 (14:39 +0000)]
sched/wait: Add add_wait_queue_priority()
This allows an exclusive wait_queue_entry to be added at the head of the
queue, instead of the tail as normal. Thus, it gets to consume events
first without allowing non-exclusive waiters to be woken at all.
The (first) intended use is for KVM IRQFD, which currently has
inconsistent behaviour depending on whether posted interrupts are
available or not. If they are, KVM will bypass the eventfd completely
and deliver interrupts directly to the appropriate vCPU. If not, events
are delivered through the eventfd and userspace will receive them when
polling on the eventfd.
By using add_wait_queue_priority(), KVM will be able to consistently
consume events within the kernel without accidentally exposing them
to userspace when they're supposed to be bypassed. This, in turn, means
that userspace doesn't have to jump through hoops to avoid listening
on the erroneously noisy eventfd and injecting duplicate interrupts.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20201027143944.648769-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yadong Qi [Fri, 6 Nov 2020 06:51:22 +0000 (14:51 +0800)]
KVM: x86: emulate wait-for-SIPI and SIPI-VMExit
Background: We have a lightweight HV, it needs INIT-VMExit and
SIPI-VMExit to wake-up APs for guests since it do not monitor
the Local APIC. But currently virtual wait-for-SIPI(WFS) state
is not supported in nVMX, so when running on top of KVM, the L1
HV cannot receive the INIT-VMExit and SIPI-VMExit which cause
the L2 guest cannot wake up the APs.
According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
SIPIs cause VM exits when a logical processor is in
wait-for-SIPI state.
In this patch:
1. introduce SIPI exit reason,
2. introduce wait-for-SIPI state for nVMX,
3. advertise wait-for-SIPI support to guest.
When L1 hypervisor is not monitoring Local APIC, L0 need to emulate
INIT-VMExit and SIPI-VMExit to L1 to emulate INIT-SIPI-SIPI for
L2. L2 LAPIC write would be traped by L0 Hypervisor(KVM), L0 should
emulate the INIT/SIPI vmexit to L1 hypervisor to set proper state
for L2's vcpu state.
Handle procdure:
Source vCPU:
L2 write LAPIC.ICR(INIT).
L0 trap LAPIC.ICR write(INIT): inject a latched INIT event to target
vCPU.
Target vCPU:
L0 emulate an INIT VMExit to L1 if is guest mode.
L1 set guest VMCS, guest_activity_state=WAIT_SIPI, vmresume.
L0 set vcpu.mp_state to INIT_RECEIVED if (vmcs12.guest_activity_state
== WAIT_SIPI).
Source vCPU:
L2 write LAPIC.ICR(SIPI).
L0 trap LAPIC.ICR write(INIT): inject a latched SIPI event to traget
vCPU.
Target vCPU:
L0 emulate an SIPI VMExit to L1 if (vcpu.mp_state == INIT_RECEIVED).
L1 set CS:IP, guest_activity_state=ACTIVE, vmresume.
L0 resume to L2.
L2 start-up.
Signed-off-by: Yadong Qi <yadong.qi@intel.com>
Message-Id: <
20200922052343.84388-1-yadong.qi@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20201106065122.403183-1-yadong.qi@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 5 Nov 2020 16:20:49 +0000 (11:20 -0500)]
KVM: x86: fix apic_accept_events vs check_nested_events
vmx_apic_init_signal_blocked is buggy in that it returns true
even in VMX non-root mode. In non-root mode, however, INITs
are not latched, they just cause a vmexit. Previously,
KVM was waiting for them to be processed when kvm_apic_accept_events
and in the meanwhile it ate the SIPIs that the processor received.
However, in order to implement the wait-for-SIPI activity state,
KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
and it will not be possible anymore to disregard SIPIs in non-root
mode as the code is currently doing.
By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
(with the side-effect of latching INITs) before incorrectly injecting
an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
can do the right thing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:17 +0000 (18:44 -0700)]
KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2
Extend the KVM_SET_SREGS test to verify that all supported CR4 bits, as
enumerated by KVM, can be set before KVM_SET_CPUID2, i.e. without first
defining the vCPU model. KVM is supposed to skip guest CPUID checks
when host userspace is stuffing guest state.
Check the inverse as well, i.e. that KVM rejects KVM_SET_REGS if CR4
has one or more unsupported bits set.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:16 +0000 (18:44 -0700)]
KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
Rework the common CR4 and SREGS checks to return a bool instead of an
int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
clarify the polarity of the return value (which is effectively inverted
by this change).
No functional changed intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:15 +0000 (18:44 -0700)]
KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4(). This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.
Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.
Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics. This will be remedied in a future patch.
Fixes:
5e1746d6205d ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:14 +0000 (18:44 -0700)]
KVM: SVM: Drop VMXE check from svm_set_cr4()
Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps. SVM obviously does not set X86_FEATURE_VMX.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:13 +0000 (18:44 -0700)]
KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, via kvm_cpu_caps. X86_FEATURE_VMX is set in kvm_cpu_caps
(by vmx_set_cpu_caps()), if and only if 'nested' is true.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Oct 2020 01:44:12 +0000 (18:44 -0700)]
KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits. This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.
Fixes:
5e1746d6205d ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <
20201007014417.29276-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 15 Nov 2020 13:55:43 +0000 (08:55 -0500)]
kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
In some cases where shadow paging is in use, the root page will
be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will
not have an associated struct kvm_mmu_page, because it is allocated
with alloc_page instead of kvm_mmu_alloc_page.
Just return false quickly from is_tdp_mmu_root if the TDP MMU is
not in use, which also includes the case where shadow paging is
enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Babu Moger [Thu, 12 Nov 2020 22:18:03 +0000 (16:18 -0600)]
KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <
160521948301.32054.
5783800787423231162.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Babu Moger [Thu, 12 Nov 2020 22:17:56 +0000 (16:17 -0600)]
KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
SEV guests fail to boot on a system that supports the PCID feature.
While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.
Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).
If the architecture has any special bits(like AMD SEV encryption bit)
that needs to be masked from the reserved bits, should be cleared
in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.
Fixes:
a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <
160521947657.32054.
3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Edmondson [Tue, 3 Nov 2020 12:04:00 +0000 (12:04 +0000)]
KVM: x86: clflushopt should be treated as a no-op by emulation
The instruction emulator ignores clflush instructions, yet fails to
support clflushopt. Treat both similarly.
Fixes:
13e457e0eebf ("KVM: x86: Emulator does not decode clflush well")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <
20201103120400.240882-1-david.edmondson@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 13 Nov 2020 11:28:23 +0000 (06:28 -0500)]
Merge tag 'kvmarm-fixes-5.10-3' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for v5.10, take #3
- Allow userspace to downgrade ID_AA64PFR0_EL1.CSV2
- Inject UNDEF on SCXTNUM_ELx access
Linus Torvalds [Fri, 13 Nov 2020 00:39:58 +0000 (16:39 -0800)]
Merge tag 'fscrypt-for-linus' of git://git./fs/fscrypt/fscrypt
Pull fscrypt fix from Eric Biggers:
"Fix a regression where new files weren't using inline encryption when
they should be"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: fix inline encryption not used on new files
Linus Torvalds [Fri, 13 Nov 2020 00:37:14 +0000 (16:37 -0800)]
Merge tag 'gfs2-v5.10-rc3-fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Fix jdata data corruption and glock reference leak"
* tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix case in which ail writes are done to jdata holes
Revert "gfs2: Ignore journal log writes for jdata holes"
gfs2: fix possible reference leak in gfs2_check_blk_type
Linus Torvalds [Thu, 12 Nov 2020 22:02:04 +0000 (14:02 -0800)]
Merge tag 'net-5.10-rc4' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
ENETC
Current release - bugs in new features:
- mptcp: provide rmem[0] limit offset to fix oops
Previous release - regressions:
- IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
calculations
- lan743x: correctly handle chips with internal PHY
- bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
- mlx5e: Fix VXLAN port table synchronization after function reload
Previous release - always broken:
- bpf: Zero-fill re-used per-cpu map element
- fix out-of-order UDP packets when forwarding with UDP GSO fraglists
turned on:
- fix UDP header access on Fast/frag0 UDP GRO
- fix IP header access and skb lookup on Fast/frag0 UDP GRO
- ethtool: netlink: add missing netdev_features_change() call
- net: Update window_clamp if SOCK_RCVBUF is set
- igc: Fix returning wrong statistics
- ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
- tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
- r8169: disable hw csum for short packets on all chip versions
- vrf: Fix fast path output packet handling with async Netfilter
rules"
* tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
lan743x: fix use of uninitialized variable
net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
net: udp: fix UDP header access on Fast/frag0 UDP GRO
devlink: Avoid overwriting port attributes of registered port
vrf: Fix fast path output packet handling with async Netfilter rules
cosa: Add missing kfree in error path of cosa_write
net: switch to the kernel.org patchwork instance
ch_ktls: stop the txq if reaches threshold
ch_ktls: tcb update fails sometimes
ch_ktls/cxgb4: handle partial tag alone SKBs
ch_ktls: don't free skb before sending FIN
ch_ktls: packet handling prior to start marker
ch_ktls: Correction in middle record handling
ch_ktls: missing handling of header alone
ch_ktls: Correction in trimmed_len calculation
cxgb4/ch_ktls: creating skbs causes panic
ch_ktls: Update cheksum information
ch_ktls: Correction in finding correct length
cxgb4/ch_ktls: decrypted bit is not enough
net/x25: Fix null-ptr-deref in x25_connect
...
Linus Torvalds [Thu, 12 Nov 2020 21:49:12 +0000 (13:49 -0800)]
Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Stable fixes:
- Fix failure to unregister shrinker
Other fixes:
- Fix unnecessary locking to clear up some contention
- Fix listxattr receive buffer size
- Fix default mount options for nfsroot"
* tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Remove unnecessary inode lock in nfs_fsync_dir()
NFS: Remove unnecessary inode locking in nfs_llseek_dir()
NFS: Fix listxattr receive buffer size
NFSv4.2: fix failure to unregister shrinker
nfsroot: Default mount option should ask for built-in NFS version
Marc Zyngier [Tue, 10 Nov 2020 14:13:08 +0000 (14:13 +0000)]
KVM: arm64: Handle SCXTNUM_ELx traps
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
Marc Zyngier [Tue, 10 Nov 2020 14:13:07 +0000 (14:13 +0000)]
KVM: arm64: Unify trap handlers injecting an UNDEF
A large number of system register trap handlers only inject an
UNDEF exeption, and yet each class of sysreg seems to provide its
own, identical function.
Let's unify them all, saving us introducing yet another one later.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
Marc Zyngier [Tue, 10 Nov 2020 14:13:06 +0000 (14:13 +0000)]
KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.
However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.
In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).
Fixes:
e1026237f9067 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
Marc Zyngier [Thu, 12 Nov 2020 21:20:43 +0000 (21:20 +0000)]
Merge tag 'v5.10-rc1' into kvmarm-master/next
Linux 5.10-rc1
Signed-off-by: Marc Zyngier <maz@kernel.org>
Linus Torvalds [Thu, 12 Nov 2020 19:06:53 +0000 (11:06 -0800)]
Merge tag 'acpi-5.10-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These are mostly docmentation fixes and janitorial changes plus some
new device IDs and a new quirk.
Specifics:
- Fix documentation regarding GPIO properties (Andy Shevchenko)
- Fix spelling mistakes in ACPI documentation (Flavio Suligoi)
- Fix white space inconsistencies in ACPI code (Maximilian Luz)
- Fix string formatting in the ACPI Generic Event Device (GED) driver
(Nick Desaulniers)
- Add Intel Alder Lake device IDs to the ACPI drivers used by the
Dynamic Platform and Thermal Framework (Srinivas Pandruvada)
- Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
button driver (Hans de Goede)"
* tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: DPTF: Support Alder Lake
Documentation: ACPI: fix spelling mistakes
ACPI: button: Add DMI quirk for Medion Akoya E2228T
ACPI: GED: fix -Wformat
ACPI: Fix whitespace inconsistencies
ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
Documentation: firmware-guide: gpio-properties: Clarify initial output state
Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
Documentation: firmware-guide: gpio-properties: Fix factual mistakes
Linus Torvalds [Thu, 12 Nov 2020 19:03:38 +0000 (11:03 -0800)]
Merge tag 'pm-5.10-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Make the intel_pstate driver behave as expected when it operates in
the passive mode with HWP enabled and the 'powersave' governor on top
of it"
* tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
cpufreq: Add strict_target to struct cpufreq_policy
cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
cpufreq: Introduce governor flags
Sven Van Asbroeck [Thu, 12 Nov 2020 15:25:13 +0000 (10:25 -0500)]
lan743x: fix use of uninitialized variable
When no devicetree is present, the driver will use an
uninitialized variable.
Fix by initializing this variable.
Fixes:
902a66e08cea ("lan743x: correctly handle chips with internal PHY")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Nov 2020 17:55:58 +0000 (09:55 -0800)]
Merge branch 'net-udp-fix-fast-frag0-udp-gro'
Alexander Lobakin says:
====================
net: udp: fix Fast/frag0 UDP GRO
While testing UDP GSO fraglists forwarding through driver that uses
Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
iperf packets:
[ ID] Interval Transfer Bitrate Jitter
[SUM] 0.0-40.0 sec 12106 datagrams received out-of-order
Simple switch to napi_gro_receive() or any other method without frag0
shortcut completely resolved them.
I've found two incorrect header accesses in GRO receive callback(s):
- udp_hdr() (instead of udp_gro_udphdr()) that always points to junk
in "fast" mode and could probably do this in "regular".
This was the actual bug that caused all out-of-order delivers;
- udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of
skb_gro_network_header()) that potentionally might return odd
pointers in both modes.
Each patch addresses one of these two issues.
This doesn't cover a support for nested tunnels as it's out of the
subject and requires more invasive changes. It will be handled
separately in net-next series.
Credits:
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Since v4 [0]:
- split the fix into two logical ones (Willem);
- replace ternaries with plain ifs to beautify the code (Jakub);
- drop p->data part to reintroduce it later in abovementioned set.
Since v3 [1]:
- restore the original {,__}udp{4,6}_lib_lookup_skb() and use
private versions of them inside GRO code (Willem).
Since v2 [2]:
- dropped redundant check introduced in v2 as it's performed right
before (thanks to Eric);
- udp_hdr() switched to data + off for skbs from list (also Eric);
- fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with
Fast/frag0 due to ip{,v6}_hdr() usage (Willem).
Since v1 [3]:
- added a NULL pointer check for "uh" as suggested by Willem.
[0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch
[1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch
[2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch
[3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
====================
Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Lobakin [Wed, 11 Nov 2020 20:45:38 +0000 (20:45 +0000)]
net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the
packet. While it's probably OK for non-frag0 paths, this helpers
will also point to junk on Fast/frag0 GRO when all headers are
located in frags. As a result, sk/skb lookup may fail or give wrong
results. To support both GRO modes, skb_gro_network_header() might
be used. To not modify original functions, add private versions of
udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO.
Present since the introduction of "application-level" UDP GRO
in 4.7-rc1.
Misc: replace totally unneeded ternaries with plain ifs.
Fixes:
a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Suggested-by: Willem de Bruijn <willemb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Lobakin [Wed, 11 Nov 2020 20:45:25 +0000 (20:45 +0000)]
net: udp: fix UDP header access on Fast/frag0 UDP GRO
UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's
probably OK for non-frag0 paths (when all headers or even the entire
frame are already in skb head), this inline points to junk when
using Fast GRO (napi_gro_frags() or napi_gro_receive() with only
Ethernet header in skb head and all the rest in the frags) and breaks
GRO packet compilation and the packet flow itself.
To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
are typically used. UDP even has an inline helper that makes use of
them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
to get rid of the out-of-order delivers.
Present since the introduction of plain UDP GRO in 5.0-rc1.
Fixes:
e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bob Peterson [Thu, 12 Nov 2020 16:02:48 +0000 (10:02 -0600)]
gfs2: Fix case in which ail writes are done to jdata holes
Patch
b2a846dbef4e ("gfs2: Ignore journal log writes for jdata holes")
tried (unsuccessfully) to fix a case in which writes were done to jdata
blocks, the blocks are sent to the ail list, then a punch_hole or truncate
operation caused the blocks to be freed. In other words, the ail items
are for jdata holes. Before
b2a846dbef4e, the jdata hole caused function
gfs2_block_map to return -EIO, which was eventually interpreted as an
IO error to the journal, and then withdraw.
This patch changes function gfs2_get_block_noalloc, which is only used
for jdata writes, so it returns -ENODATA rather than -EIO, and when
-ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
We can safely ignore it because gfs2_ail1_start_one is only called
when the jdata pages have already been written and truncated, so the
ail1 content no longer applies.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Bob Peterson [Wed, 11 Nov 2020 17:09:55 +0000 (11:09 -0600)]
Revert "gfs2: Ignore journal log writes for jdata holes"
This reverts commit
b2a846dbef4ef54ef032f0f5ee188c609a0278a7.
That commit changed the behavior of function gfs2_block_map to return
-ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
false. While that fixed the intended problem for jdata, it also broke
other callers of gfs2_block_map such as some jdata block reads. Before
the patch, an encountered hole would be skipped and the buffer seen as
unmapped by the caller. The patch changed the behavior to return
-ENODATA, which is interpreted as an error by the caller.
The -ENODATA return code should be restricted to the specific case where
jdata holes are encountered during ail1 writes. That will be done in a
later patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Jakub Kicinski [Thu, 12 Nov 2020 16:47:22 +0000 (08:47 -0800)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-11-10
This series contains updates to i40e and igc drivers and the MAINTAINERS
file.
Slawomir fixes updating VF MAC addresses to fix various issues related
to reporting and setting of these addresses for i40e.
Dan Carpenter fixes a possible used before being initialized issue for
i40e.
Vinicius fixes reporting of netdev stats for igc.
Tony updates repositories for Intel Ethernet Drivers.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
MAINTAINERS: Update repositories for Intel Ethernet Drivers
igc: Fix returning wrong statistics
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
i40e: Fix MAC address setting for a VF via Host/VM
====================
Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parav Pandit [Wed, 11 Nov 2020 03:47:44 +0000 (05:47 +0200)]
devlink: Avoid overwriting port attributes of registered port
Cited commit in fixes tag overwrites the port attributes for the
registered port.
Avoid such error by checking registered flag before setting attributes.
Fixes:
71ad8d55f8e5 ("devlink: Replace devlink_port_attrs_set parameters with a struct")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Willi [Fri, 6 Nov 2020 07:30:30 +0000 (08:30 +0100)]
vrf: Fix fast path output packet handling with async Netfilter rules
VRF devices use an optimized direct path on output if a default qdisc
is involved, calling Netfilter hooks directly. This path, however, does
not consider Netfilter rules completing asynchronously, such as with
NFQUEUE. The Netfilter okfn() is called for asynchronously accepted
packets, but the VRF never passes that packet down the stack to send
it out over the slave device. Using the slower redirect path for this
seems not feasible, as we do not know beforehand if a Netfilter hook
has asynchronously completing rules.
Fix the use of asynchronously completing Netfilter rules in OUTPUT and
POSTROUTING by using a special completion function that additionally
calls dst_output() to pass the packet down the stack. Also, slightly
adjust the use of nf_reset_ct() so that is called in the asynchronous
case, too.
Fixes:
dcdd43c41e60 ("net: vrf: performance improvements for IPv4")
Fixes:
a9ec54d1b0cd ("net: vrf: performance improvements for IPv6")
Signed-off-by: Martin Willi <martin@strongswan.org>
Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trond Myklebust [Fri, 30 Oct 2020 21:57:30 +0000 (17:57 -0400)]
NFS: Remove unnecessary inode lock in nfs_fsync_dir()
nfs_inc_stats() is already thread-safe, and there are no other reasons
to hold the inode lock here.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Fri, 30 Oct 2020 21:57:29 +0000 (17:57 -0400)]
NFS: Remove unnecessary inode locking in nfs_llseek_dir()
Remove the contentious inode lock, and instead provide thread safety
using the file->f_lock spinlock.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Sat, 31 Oct 2020 16:44:25 +0000 (12:44 -0400)]
NFS: Fix listxattr receive buffer size
Certain NFSv4.2/RDMA tests fail with v5.9-rc1.
rpcrdma_convert_kvec() runs off the end of the rl_segments array
because rq_rcv_buf.tail[0].iov_len holds a very large positive
value. The resultant kernel memory corruption is enough to crash
the client system.
Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT
in the maximum decode size for a possible XDR pad of the contents
of the xdr_buf's pages. That guarantees the allocated receive buffer
will be large enough to accommodate the usual contents plus that XDR
pad word.
encode_op_hdr() cannot add that extra word. If it does,
xdr_inline_pages() underruns the length of the tail iovec.
Fixes:
3e1f02123fba ("NFSv4.2: add client side XDR handling for extended attributes")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
J. Bruce Fields [Wed, 21 Oct 2020 14:34:15 +0000 (10:34 -0400)]
NFSv4.2: fix failure to unregister shrinker
We forgot to unregister the nfs4_xattr_large_entry_shrinker.
That leaves the global list of shrinkers corrupted after unload of the
nfs module, after which possibly unrelated code that calls
register_shrinker() or unregister_shrinker() gets a BUG() with
"supervisor write access in kernel mode".
And similarly for the nfs4_xattr_large_entry_lru.
Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
Tested-By: Kris Karas <bugs-a17@moonlit-rail.com>
Fixes:
95ad37f90c33 "NFSv4.2: add client side xattr caching."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Rafael J. Wysocki [Thu, 12 Nov 2020 15:11:48 +0000 (16:11 +0100)]
Merge branches 'acpi-scan', 'acpi-misc', 'acpi-button' and 'acpi-dptf'
* acpi-scan:
ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
* acpi-misc:
ACPI: GED: fix -Wformat
ACPI: Fix whitespace inconsistencies
* acpi-button:
ACPI: button: Add DMI quirk for Medion Akoya E2228T
* acpi-dptf:
ACPI: DPTF: Support Alder Lake
Zhang Qilong [Sun, 8 Nov 2020 09:27:41 +0000 (17:27 +0800)]
gfs2: fix possible reference leak in gfs2_check_blk_type
In the fail path of gfs2_check_blk_type, forgetting to call
gfs2_glock_dq_uninit will result in rgd_gh reference leak.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Eric Biggers [Wed, 11 Nov 2020 01:52:24 +0000 (17:52 -0800)]
fscrypt: fix inline encryption not used on new files
The new helper function fscrypt_prepare_new_inode() runs before
S_ENCRYPTED has been set on the new inode. This accidentally made
fscrypt_select_encryption_impl() never enable inline encryption on newly
created files, due to its use of fscrypt_needs_contents_encryption()
which only returns true when S_ENCRYPTED is set.
Fix this by using S_ISREG() directly instead of
fscrypt_needs_contents_encryption(), analogous to what
select_encryption_mode() does.
I didn't notice this earlier because by design, the user-visible
behavior is the same (other than performance, potentially) regardless of
whether inline encryption is used or not.
Fixes:
a992b20cd4ee ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()")
Reviewed-by: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/r/20201111015224.303073-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Wang Hai [Tue, 10 Nov 2020 14:46:14 +0000 (22:46 +0800)]
cosa: Add missing kfree in error path of cosa_write
If memory allocation for 'kbuf' succeed, cosa_write() doesn't have a
corresponding kfree() in exception handling. Thus add kfree() for this
function implementation.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Link: https://lore.kernel.org/r/20201110144614.43194-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Nov 2020 03:51:20 +0000 (19:51 -0800)]
net: switch to the kernel.org patchwork instance
Move to the kernel.org patchwork instance, it has significantly
lower latency for accessing from Europe and the US. Other quirks
include the reply bot.
Link: https://lore.kernel.org/r/20201110035120.642746-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Nov 2020 00:30:41 +0000 (16:30 -0800)]
Merge branch 'cxgb4-ch_ktls-fixes-in-nic-tls-code'
Rohit Maheshwari says:
====================
cxgb4/ch_ktls: Fixes in nic tls code
This series helps in fixing multiple nic ktls issues. Series is broken
into 12 patches.
Patch 1 avoids deciding tls packet based on decrypted bit. If its a
retransmit packet which has tls handshake and finish (for encryption),
decrypted bit won't be set there, and so we can't rely on decrypted
bit.
Patch 2 helps supporting linear skb. SKBs were assumed non-linear.
Corrected the length extraction.
Patch 3 fixes the checksum offload update in WR.
Patch 4 fixes kernel panic happening due to creating new skb for each
record. As part of fix driver will use same skb to send out one tls
record (partial data) of the same SKB.
Patch 5 fixes the problem of skb data length smaller than remaining data
of the record.
Patch 6 fixes the handling of SKBs which has tls header alone pkt, but
not starting from beginning.
Patch 7 avoids sending extra data which is used to make a record 16 byte
aligned. We don't need to retransmit those extra few bytes.
Patch 8 handles the cases where retransmit packet has tls starting
exchanges which are prior to tls start marker.
Patch 9 fixes the problem os skb free before HW knows about tcp FIN.
Patch 10 handles the small packet case which has partial TAG bytes only.
HW can't handle those, hence using sw crypto for such pkts.
Patch 11 corrects the potential tcb update problem.
Patch 12 stops the queue if queue reaches threshold value.
v1->v2:
- Corrected fixes tag issue.
- Marked chcr_ktls_sw_fallback() static.
v2->v3:
- Replaced GFP_KERNEL with GFP_ATOMIC.
- Removed mixed fixes.
v3->v4:
- Corrected fixes tag issue.
v4->v5:
- Separated mixed fixes from patch 4.
v5-v6:
- Fixes tag should be at the end.
====================
Link: https://lore.kernel.org/r/20201109105142.15398-1-rohitm@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:42 +0000 (16:21 +0530)]
ch_ktls: stop the txq if reaches threshold
Stop the queue and ask for the credits if queue reaches to
threashold.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:41 +0000 (16:21 +0530)]
ch_ktls: tcb update fails sometimes
context id and port id should be filled while sending tcb update.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:40 +0000 (16:21 +0530)]
ch_ktls/cxgb4: handle partial tag alone SKBs
If TCP congestion caused a very small packets which only has some
part fo the TAG, and that too is not till the end. HW can't handle
such case, so falling back to sw crypto in such cases.
v1->v2:
- Marked chcr_ktls_sw_fallback() static.
Fixes:
dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:39 +0000 (16:21 +0530)]
ch_ktls: don't free skb before sending FIN
If its a last packet and fin is set. Make sure FIN is informed
to HW before skb gets freed.
Fixes:
429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:38 +0000 (16:21 +0530)]
ch_ktls: packet handling prior to start marker
There could be a case where ACK for tls exchanges prior to start
marker is missed out, and by the time tls is offloaded. This pkt
should not be discarded and handled carefully. It could be
plaintext alone or plaintext + finish as well.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:37 +0000 (16:21 +0530)]
ch_ktls: Correction in middle record handling
If a record starts in middle, reset TCB UNA so that we could
avoid sending out extra packet which is needed to make it 16
byte aligned to start AES CTR.
Check also considers prev_seq, which should be what is
actually sent, not the skb data length.
Avoid updating partial TAG to HW at any point of time, that's
why we need to check if remaining part is smaller than TAG
size, then reset TX_MAX to be TAG starting sequence number.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:36 +0000 (16:21 +0530)]
ch_ktls: missing handling of header alone
If an skb has only header part which doesn't start from
beginning, is not being handled properly.
Fixes:
dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:35 +0000 (16:21 +0530)]
ch_ktls: Correction in trimmed_len calculation
trimmed length calculation goes wrong if skb has only tag part
to send. It should be zero if there is no data bytes apart from
TAG.
Fixes:
dc05f3df8fac ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:34 +0000 (16:21 +0530)]
cxgb4/ch_ktls: creating skbs causes panic
Creating SKB per tls record and freeing the original one causes
panic. There will be race if connection reset is requested. By
freeing original skb, refcnt will be decremented and that means,
there is no pending record to send, and so tls_dev_del will be
requested in control path while SKB of related connection is in
queue.
Better approach is to use same SKB to send one record (partial
data) at a time. We still have to create a new SKB when partial
last part of a record is requested.
This fix introduces new API cxgb4_write_partial_sgl() to send
partial part of skb. Present cxgb4_write_sgl can only provide
feasibility to start from an offset which limits to header only
and it can write sgls for the whole skb len. But this new API
will help in both. It can start from any offset and can end
writing in middle of the skb.
v4->v5:
- Removed extra changes.
Fixes:
429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:33 +0000 (16:21 +0530)]
ch_ktls: Update cheksum information
Checksum update was missing in the WR.
Fixes:
429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:32 +0000 (16:21 +0530)]
ch_ktls: Correction in finding correct length
There is a possibility of linear skbs coming in. Correcting
the length extraction logic.
v2->v3:
- Separated un-related changes from this patch.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohit Maheshwari [Mon, 9 Nov 2020 10:51:31 +0000 (16:21 +0530)]
cxgb4/ch_ktls: decrypted bit is not enough
If skb has retransmit data starting before start marker, e.g. ccs,
decrypted bit won't be set for that, and if it has some data to
encrypt, then it must be given to crypto ULD. So in place of
decrypted, check if socket is tls offloaded. Also, unless skb has
some data to encrypt, no need to give it for tls offload handling.
v2->v3:
- Removed ifdef.
Fixes:
5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Schiller [Mon, 9 Nov 2020 06:54:49 +0000 (07:54 +0100)]
net/x25: Fix null-ptr-deref in x25_connect
This fixes a regression for blocking connects introduced by commit
4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect").
The x25->neighbour is already set to "NULL" by x25_disconnect() now,
while a blocking connect is waiting in
x25_wait_for_connection_establishment(). Therefore x25->neighbour must
not be accessed here again and x25->state is also already set to
X25_STATE_0 by x25_disconnect().
Fixes:
4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Walle [Mon, 9 Nov 2020 11:04:36 +0000 (12:04 +0100)]
arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC
Since commit
71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX") the
network port of the Kontron sl28 board is broken. After the migration to
phylink the device tree has to specify the in-band-mode property. Add
it.
Fixes:
71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201109110436.5906-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Hai [Mon, 9 Nov 2020 14:09:13 +0000 (22:09 +0800)]
tipc: fix memory leak in tipc_topsrv_start()
kmemleak report a memory leak as follows:
unreferenced object 0xffff88810a596800 (size 512):
comm "ip", pid 21558, jiffies
4297568990 (age 112.120s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 00 83 60 b0 ff ff ff ff ..........`.....
backtrace:
[<
0000000022bbe21f>] tipc_topsrv_init_net+0x1f3/0xa70
[<
00000000fe15ddf7>] ops_init+0xa8/0x3c0
[<
00000000138af6f2>] setup_net+0x2de/0x7e0
[<
000000008c6807a3>] copy_net_ns+0x27d/0x530
[<
000000006b21adbd>] create_new_namespaces+0x382/0xa30
[<
00000000bb169746>] unshare_nsproxy_namespaces+0xa1/0x1d0
[<
00000000fe2e42bc>] ksys_unshare+0x39c/0x780
[<
0000000009ba3b19>] __x64_sys_unshare+0x2d/0x40
[<
00000000614ad866>] do_syscall_64+0x56/0xa0
[<
00000000a1b5ca3c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
'srv' is malloced in tipc_topsrv_start() but not free before
leaving from the error handling cases. We need to free it.
Fixes:
5c45ab24ac77 ("tipc: make struct tipc_server private for server.c")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20201109140913.47370-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 11 Nov 2020 22:15:06 +0000 (14:15 -0800)]
Merge branch 'stable/for-linus-5.10-rc2' of git://git./linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"Two tiny fixes for issues that make drivers under Xen unhappy under
certain conditions"
* 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Jakub Kicinski [Wed, 11 Nov 2020 02:08:21 +0000 (18:08 -0800)]
Merge branch 'net-iucv-fixes-2020-11-09'
Julian Wiedmann says:
====================
net/iucv: fixes 2020-11-09
One fix in the shutdown path for af_iucv sockets. This is relevant for
stable as well.
Also sending along an update for the Maintainers file.
v1 -> v2: use the correct Fixes tag in patch 1 (Jakub)
====================
Link: https://lore.kernel.org/r/20201109075706.56573-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ursula Braun [Mon, 9 Nov 2020 07:57:06 +0000 (08:57 +0100)]
MAINTAINERS: remove Ursula Braun as s390 network maintainer
I am retiring soon. Thus this patch removes myself from the
MAINTAINERS file (s390 network).
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: fix up the subject]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ursula Braun [Mon, 9 Nov 2020 07:57:05 +0000 (08:57 +0100)]
net/af_iucv: fix null pointer dereference on shutdown
syzbot reported the following KASAN finding:
BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
Read of size 2 at addr
000000000000021e by task syz-executor907/519
CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<
00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<
00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135
[<
00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline]
[<
00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118
[<
00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383
[<
00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline]
[<
00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534
[<
00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
[<
00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457
[<
00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204
[<
00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline]
[<
00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210
[<
00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
There is nothing to shutdown if a connection has never been established.
Besides that iucv->hs_dev is not yet initialized if a socket is in
IUCV_OPEN state and iucv->path is not yet initialized if socket is in
IUCV_BOUND state.
So, just skip the shutdown calls for a socket in these states.
Fixes:
eac3731bd04c ("[S390]: Add AF_IUCV socket support")
Fixes:
82492a355fac ("af_iucv: add shutdown for HS transport")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: correct one Fixes tag]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Van Asbroeck [Mon, 9 Nov 2020 20:38:28 +0000 (15:38 -0500)]
lan743x: fix "BUG: invalid wait context" when setting rx mode
In the net core, the struct net_device_ops -> ndo_set_rx_mode()
callback is called with the dev->addr_list_lock spinlock held.
However, this driver's ndo_set_rx_mode callback eventually calls
lan743x_dp_write(), which acquires a mutex. Mutex acquisition
may sleep, and this is not allowed when holding a spinlock.
Fix by removing the dp_lock mutex entirely. Its purpose is to
prevent concurrent accesses to the data port. No concurrent
accesses are possible, because the dev->addr_list_lock
spinlock in the core only lets through one thread at a time.
Fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
zhangxiaoxu [Mon, 9 Nov 2020 14:44:16 +0000 (09:44 -0500)]
net: dsa: mv88e6xxx: Fix memleak in mv88e6xxx_region_atu_snapshot
When mv88e6xxx_fid_map return error, we lost free the table.
Fix it.
Fixes:
bfb255428966 ("net: dsa: mv88e6xxx: Add devlink regions")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhangxiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201109144416.1540867-1-zhangxiaoxu5@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mao Wenan [Tue, 10 Nov 2020 00:16:31 +0000 (08:16 +0800)]
net: Update window_clamp if SOCK_RCVBUF is set
When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.
Fixes:
e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 8 Nov 2020 21:44:02 +0000 (22:44 +0100)]
net: phy: realtek: support paged operations on RTL8201CP
The RTL8401-internal PHY identifies as RTL8201CP, and the init
sequence in r8169, copied from vendor driver r8168, uses paged
operations. Therefore set the same paged operation callbacks as
for the other Realtek PHY's.
Fixes:
cdafdc29ef75 ("r8169: sync support for RTL8401 with vendor driver")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/69882f7a-ca2f-e0c7-ae83-c9b6937282cd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Van Asbroeck [Sun, 8 Nov 2020 17:12:24 +0000 (12:12 -0500)]
lan743x: correctly handle chips with internal PHY
Commit
6f197fb63850 ("lan743x: Added fixed link and RGMII support")
assumes that chips with an internal PHY will never have a devicetree
entry. This is incorrect: even for these chips, a devicetree entry
can be useful e.g. to pass the mac address from bootloader to chip:
&pcie {
status = "okay";
host@0 {
reg = <0 0 0 0 0>;
#address-cells = <3>;
#size-cells = <2>;
lan7430: ethernet@0 {
/* LAN7430 with internal PHY */
compatible = "microchip,lan743x";
status = "okay";
reg = <0 0 0 0 0>;
/* filled in by bootloader */
local-mac-address = [00 00 00 00 00 00];
};
};
};
If a devicetree entry is present, the driver will not attach the chip
to its internal phy, and the chip will be non-operational.
Fix by tweaking the phy connection algorithm:
- first try to connect to a phy specified in the devicetree
(could be 'real' phy, or just a 'fixed-link')
- if that doesn't succeed, try to connect to an internal phy, even
if the chip has a devnode
Tested on a LAN7430 with internal PHY. I cannot test a device using
fixed-link, as I do not have access to one.
Fixes:
6f197fb63850 ("lan743x: Added fixed link and RGMII support")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201108171224.23829-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Moore [Sun, 8 Nov 2020 14:08:26 +0000 (09:08 -0500)]
netlabel: fix our progress tracking in netlbl_unlabel_staticlist()
The current NetLabel code doesn't correctly keep track of the netlink
dump state in some cases, in particular when multiple interfaces with
large configurations are loaded. The problem manifests itself by not
reporting the full configuration to userspace, even though it is
loaded and active in the kernel. This patch fixes this by ensuring
that the dump state is properly reset when necessary inside the
netlbl_unlabel_staticlist() function.
Fixes:
8cc44579d1bd ("NetLabel: Introduce static network labels for unlabeled connections")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@sifl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen [Tue, 10 Nov 2020 00:07:35 +0000 (16:07 -0800)]
MAINTAINERS: Update repositories for Intel Ethernet Drivers
Update Intel Ethernet Drivers repositories to new locations.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vinicius Costa Gomes [Fri, 25 Sep 2020 18:35:37 +0000 (11:35 -0700)]
igc: Fix returning wrong statistics
'igc_update_stats()' was not updating 'netdev->stats', so the returned
statistics, for example, requested by:
$ ip -s link show dev enp3s0
were not being updated and were always zero.
Fix by returning a set of statistics that are actually being
updated (adapter->stats64).
Fixes:
c9a11c23ceb6 ("igc: Add netdev")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dan Carpenter [Wed, 16 Sep 2020 14:32:28 +0000 (17:32 +0300)]
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
The "failure" variable is used without being initialized. It should be
set to false.
Fixes:
8cbf74149903 ("i40e, xsk: move buffer allocation out of the Rx processing loop")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Slawomir Laba [Wed, 14 Oct 2020 08:54:09 +0000 (08:54 +0000)]
i40e: Fix MAC address setting for a VF via Host/VM
Fix MAC setting flow for the PF driver.
Update the unicast VF's MAC address in VF structure if it is
a new setting in i40e_vc_add_mac_addr_msg.
When unicast MAC address gets deleted, record that and
set the new unicast MAC address that is already waiting in the filter
list. This logic is based on the order of messages arriving to
the PF driver.
Without this change the MAC address setting was interpreted
incorrectly in the following use cases:
1) Print incorrect VF MAC or zero MAC
ip link show dev $pf
2) Don't preserve MAC between driver reload
rmmod iavf; modprobe iavf
3) Update VF MAC when macvlan was set
ip link add link $vf address $mac $vf.1 type macvlan
4) Failed to update mac address when VF was trusted
ip link set dev $vf address $mac
This includes all other configurations including above commands.
Fixes:
f657a6e1313b ("i40e: Fix VF driver MAC address configuration")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vlad Buslov [Sat, 7 Nov 2020 11:19:28 +0000 (13:19 +0200)]
selftest: fix flower terse dump tests
Iproute2 tc classifier terse dump has been accepted with modified syntax.
Update the tests accordingly.
Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Fixes:
e7534fd42a99 ("selftests: implement flower classifier terse dump tests")
Link: https://lore.kernel.org/r/20201107111928.453534-1-vlad@buslov.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 10 Nov 2020 18:33:55 +0000 (10:33 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull core dump fix from Al Viro:
"Fix for multithreaded coredump playing fast and loose with getting
registers of secondary threads; if a secondary gets caught in the
middle of exit(2), the conditition it will be stopped in for dumper to
examine might be unusual enough for things to go wrong.
Quite a few architectures are fine with that, but some are not."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
don't dump the threads that had been already exiting when zapped.
Linus Torvalds [Tue, 10 Nov 2020 18:07:15 +0000 (10:07 -0800)]
Merge tag 'for-5.10-rc3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of minor fixes and updates:
- handle missing device replace item on mount (syzbot report)
- fix space reservation calculation when finishing relocation
- fix memory leak on error path in ref-verify (debugging feature)
- fix potential overflow during defrag on 32bit arches
- minor code update to silence smatch warning
- minor error message updates"
* tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod
btrfs: dev-replace: fail mount if we don't have replace item with target device
btrfs: scrub: update message regarding read-only status
btrfs: clean up NULL checks in qgroup_unreserve_range()
btrfs: fix min reserved size calculation in merge_reloc_root
btrfs: print the block rsv type when we fail our reservation
btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
Linus Torvalds [Tue, 10 Nov 2020 18:05:37 +0000 (10:05 -0800)]
Merge tag 'fscrypt-for-linus' of git://git./fs/fscrypt/fscrypt
Pull fscrypt fix from Eric Biggers:
"Fix a regression where a new WARN_ON() was reachable when using
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 on ext4, causing xfstest
generic/602 to sometimes fail on ext4"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: remove reachable WARN in fscrypt_setup_iv_ino_lblk_32_key()
Linus Torvalds [Tue, 10 Nov 2020 18:02:31 +0000 (10:02 -0800)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
"Update update to version 20.09.30, one kernel side fix"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
powercap: restrict energy meter to root access
tools/power turbostat: harden against cpu hotplug
tools/power turbostat: adjust for temperature offset
tools/power turbostat: Build with _FILE_OFFSET_BITS=64
tools/power turbostat: Support AMD Family 19h
tools/power turbostat: Remove empty columns for Jacobsville
tools/power turbostat: Add a new GFXAMHz column that exposes gt_act_freq_mhz.
tools/power x86_energy_perf_policy: Input/output error in a VM
tools/power turbostat: Skip pc8, pc9, pc10 columns, if they are disabled
tools/power turbostat: Support additional CPU model numbers
tools/power turbostat: Fix output formatting for ACPI CST enumeration
tools/power turbostat: Replace HTTP links with HTTPS ones: TURBOSTAT UTILITY
tools/power turbostat: Use sched_getcpu() instead of hardcoded cpu 0
tools/power turbostat: Enable accumulate RAPL display
tools/power turbostat: Introduce functions to accumulate RAPL consumption
tools/power turbostat: Make the energy variable to be 64 bit
tools/power turbostat: Always print idle in the system configuration header
tools/power turbostat: Print /dev/cpu_dma_latency
Srinivas Pandruvada [Tue, 10 Nov 2020 17:50:58 +0000 (09:50 -0800)]
ACPI: DPTF: Support Alder Lake
Add Alder Lake ACPI IDs for DPTF devices.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Flavio Suligoi [Tue, 10 Nov 2020 13:03:38 +0000 (14:03 +0100)]
Documentation: ACPI: fix spelling mistakes
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Tue, 10 Nov 2020 17:27:40 +0000 (18:27 +0100)]
cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
Make intel_pstate take the new CPUFREQ_GOV_STRICT_TARGET governor
flag into account when it operates in the passive mode with HWP
enabled, so as to fix the "powersave" governor behavior in that
case (currently, HWP is allowed to scale the performance all the
way up to the policy max limit when the "powersave" governor is
used, but it should be constrained to the policy min limit then).
Fixes:
f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 9a2a9ebc0a75 cpufreq: Introduce governor flags
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 218f66870181 cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: ea9364bbadf1 cpufreq: Add strict_target to struct cpufreq_policy
Rafael J. Wysocki [Tue, 10 Nov 2020 17:26:37 +0000 (18:26 +0100)]
cpufreq: Add strict_target to struct cpufreq_policy
Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Rafael J. Wysocki [Tue, 10 Nov 2020 17:26:10 +0000 (18:26 +0100)]
cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Rafael J. Wysocki [Tue, 10 Nov 2020 17:25:57 +0000 (18:25 +0100)]
cpufreq: Introduce governor flags
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Len Brown [Thu, 1 Oct 2020 01:04:05 +0000 (21:04 -0400)]
tools/power turbostat: update version number
goodbye summer...
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Tue, 10 Nov 2020 21:00:00 +0000 (13:00 -0800)]
powercap: restrict energy meter to root access
Remove non-privileged user access to power data contained in
/sys/class/powercap/intel-rapl*/*/energy_uj
Non-privileged users currently have read access to power data and can
use this data to form a security attack. Some privileged
drivers/applications need read access to this data, but don't expose it
to non-privileged users.
For example, thermald uses this data to ensure that power management
works correctly. Thus removing non-privileged access is preferred over
completely disabling this power reporting capability with
CONFIG_INTEL_RAPL=n.
Fixes:
95677a9a3847 ("PowerCap: Fix mode for energy counter")
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Paolo Abeni [Sun, 8 Nov 2020 18:49:59 +0000 (19:49 +0100)]
mptcp: provide rmem[0] limit
The mptcp proto struct currently does not provide the
required limit for forward memory scheduling. Under
pressure sk_rmem_schedule() will unconditionally try
to use such field and will oops.
Address the issue inheriting the tcp limit, as we already
do for the wmem one.
Fixes:
9c3f94e1681b ("mptcp: add missing memory scheduling in the rx path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/37af798bd46f402fb7c79f57ebbdd00614f5d7fa.1604861097.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Neuschäfer [Sat, 7 Nov 2020 22:08:21 +0000 (23:08 +0100)]
docs: networking: phy: s/2.5 times faster/2.5 times as fast/
2.5 times faster would be 3.5 Gbps (4.375 Gbaud after 8b/10b encoding).
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20201107220822.1291215-1-j.neuschaefer@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Lobakin [Sun, 8 Nov 2020 00:46:15 +0000 (00:46 +0000)]
ethtool: netlink: add missing netdev_features_change() call
After updating userspace Ethtool from 5.7 to 5.9, I noticed that
NETDEV_FEAT_CHANGE is no more raised when changing netdev features
through Ethtool.
That's because the old Ethtool ioctl interface always calls
netdev_features_change() at the end of user request processing to
inform the kernel that our netdevice has some features changed, but
the new Netlink interface does not. Instead, it just notifies itself
with ETHTOOL_MSG_FEATURES_NTF.
Replace this ethtool_notify() call with netdev_features_change(), so
the kernel will be aware of any features changes, just like in case
with the ioctl interface. This does not omit Ethtool notifications,
as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops
ETHTOOL_MSG_FEATURES_NTF on it
(net/ethtool/netlink.c:ethnl_netdev_event()).
From v1 [1]:
- dropped extra new line as advised by Jakub;
- no functional changes.
[1] https://lore.kernel.org/netdev/AlZXQ2o5uuTVHCfNGOiGgJ8vJ3KgO5YIWAnQjH0cDE@cp3-web-009.plabs.ch
Fixes:
0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/ahA2YWXYICz5rbUSQqNG4roJ8OlJzzYQX7PTiG80@cp4-web-028.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefano Brivio [Fri, 6 Nov 2020 16:59:52 +0000 (17:59 +0100)]
tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
Jianlin reports that a bridged IPv6 VXLAN endpoint, carrying IPv6
packets over a link with a PMTU estimation of exactly 1350 bytes,
won't trigger ICMPv6 Packet Too Big replies when the encapsulated
datagrams exceed said PMTU value. VXLAN over IPv6 adds 70 bytes of
overhead, so an ICMPv6 reply indicating 1280 bytes as inner MTU
would be legitimate and expected.
This comes from an off-by-one error I introduced in checks added
as part of commit
4cb47a8644cc ("tunnels: PMTU discovery support
for directly bridged IP packets"), whose purpose was to prevent
sending ICMPv6 Packet Too Big messages with an MTU lower than the
smallest permissible IPv6 link MTU, i.e. 1280 bytes.
In iptunnel_pmtud_check_icmpv6(), avoid triggering a reply only if
the advertised MTU would be less than, and not equal to, 1280 bytes.
Also fix the analogous comparison for IPv4, that is, skip the ICMP
reply only if the resulting MTU is strictly less than 576 bytes.
This becomes apparent while running the net/pmtu.sh bridged VXLAN
or GENEVE selftests with adjusted lower-link MTU values. Using
e.g. GENEVE, setting ll_mtu to the values reported below, in the
test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() test
function, we can see failures on the following tests:
test | ll_mtu
-------------------------------|--------
pmtu_ipv4_br_geneve4_exception | 626
pmtu_ipv6_br_geneve4_exception | 1330
pmtu_ipv6_br_geneve6_exception | 1350
owing to the different tunneling overheads implied by the
corresponding configurations.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes:
4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/r/4f5fc2f33bfdf8409549fafd4f952b008bf04d63.1604681709.git.sbrivio@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver Herms [Tue, 3 Nov 2020 10:41:33 +0000 (11:41 +0100)]
IPv6: Set SIT tunnel hard_header_len to zero
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.
As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.
This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.
Fixes:
c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 9 Nov 2020 21:58:10 +0000 (13:58 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- fix compilation error when PMD and PUD are folded
- fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1
- add aarch64 get-reg-list test
x86:
- fix semantic conflict between two series merged for 5.10
- fix (and test) enforcement of paravirtual cpuid features
selftests:
- various cleanups to memory management selftests
- new selftests testcase for performance of dirty logging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
KVM: selftests: allow two iterations of dirty_log_perf_test
KVM: selftests: Introduce the dirty log perf test
KVM: selftests: Make the number of vcpus global
KVM: selftests: Make the per vcpu memory size global
KVM: selftests: Drop pointless vm_create wrapper
KVM: selftests: Add wrfract to common guest code
KVM: selftests: Simplify demand_paging_test with timespec_diff_now
KVM: selftests: Remove address rounding in guest code
KVM: selftests: Factor code out of demand_paging_test
KVM: selftests: Use a single binary for dirty/clear log test
KVM: selftests: Always clear dirty bitmap after iteration
KVM: selftests: Add blessed SVE registers to get-reg-list
KVM: selftests: Add aarch64 get-reg-list test
selftests: kvm: test enforcement of paravirtual cpuid features
selftests: kvm: Add exception handling to selftests
selftests: kvm: Clear uc so UCALL_NONE is being properly reported
selftests: kvm: Fix the segment descriptor layout to match the actual layout
KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
kvm: x86: request masterclock update any time guest uses different msr
kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
...
Linus Torvalds [Mon, 9 Nov 2020 20:43:12 +0000 (12:43 -0800)]
Merge tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
refactoring"
* tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux:
net/sunrpc: fix useless comparison in proc_do_xprt()
net/sunrpc: return 0 on attempt to write to "transports"
NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy
NFSD: Fix use-after-free warning when doing inter-server copy
NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL
SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
NFSD: NFSv3 PATHCONF Reply is improperly formed
Linus Torvalds [Mon, 9 Nov 2020 20:36:58 +0000 (12:36 -0800)]
Merge tag 'ext4_for_linus_cleanups' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes and cleanups from Ted Ts'o:
"More fixes and cleanups for the new fast_commit features, but also a
few other miscellaneous bug fixes and a cleanup for the MAINTAINERS
file"
* tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (28 commits)
jbd2: fix up sparse warnings in checkpoint code
ext4: fix sparse warnings in fast_commit code
ext4: cleanup fast commit mount options
jbd2: don't start fast commit on aborted journal
ext4: make s_mount_flags modifications atomic
ext4: issue fsdev cache flush before starting fast commit
ext4: disable fast commit with data journalling
ext4: fix inode dirty check in case of fast commits
ext4: remove unnecessary fast commit calls from ext4_file_mmap
ext4: mark buf dirty before submitting fast commit buffer
ext4: fix code documentatioon
ext4: dedpulicate the code to wait on inode that's being committed
jbd2: don't read journal->j_commit_sequence without taking a lock
jbd2: don't touch buffer state until it is filled
jbd2: add todo for a fast commit performance optimization
jbd2: don't pass tid to jbd2_fc_end_commit_fallback()
jbd2: don't use state lock during commit path
jbd2: drop jbd2_fc_init documentation
ext4: clean up the JBD2 API that initializes fast commits
jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
...
Linus Torvalds [Mon, 9 Nov 2020 20:23:01 +0000 (12:23 -0800)]
Merge tag 'erofs-for-5.10-rc4-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"A week ago, Vladimir reported an issue that the kernel log would
become polluted if the page allocation debug option is enabled. I also
found this when I cleaned up magical page->mapping and originally
planned to submit these all for 5.11 but it seems the impact can be
noticed so submit the fix in advance.
In addition, nl6720 also reported that atime is empty although EROFS
has the only one on-disk timestamp as a practical consideration for
now but it's better to derive it as what we did for the other
timestamps.
Summary:
- fix setting up pcluster improperly for temporary pages
- derive atime instead of leaving it empty"
* tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix setting up pcluster for temporary pages
erofs: derive atime instead of leaving it empty
Hans de Goede [Sat, 7 Nov 2020 13:32:54 +0000 (14:32 +0100)]
ACPI: button: Add DMI quirk for Medion Akoya E2228T
The Medion Akoya E2228T's ACPI _LID implementation is quite broken,
it has the same issues as the one from the Medion Akoya E2215T:
1. For notifications it uses an ActiveLow Edge GpioInt, rather then
an ActiveBoth one, meaning that the device is only notified when the
lid is closed, not when it is opened.
2. Matching with this its _LID method simply always returns 0 (closed)
In order for the Linux LID code to work properly with this implementation,
the lid_init_state selection needs to be set to ACPI_BUTTON_LID_INIT_OPEN,
add a DMI quirk for this.
While working on this I also found out that the MD60### part of the model
number differs per country/batch while all of the E2215T and E2228T models
have this issue, so also remove the " MD60198" part from the E2215T quirk.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Nick Desaulniers [Sat, 7 Nov 2020 08:49:39 +0000 (00:49 -0800)]
ACPI: GED: fix -Wformat
Clang is more aggressive about -Wformat warnings when the format flag
specifies a type smaller than the parameter. It turns out that gsi is an
int. Fixes:
drivers/acpi/evged.c:105:48: warning: format specifies type 'unsigned
char' but the argument has type 'unsigned int' [-Wformat]
trigger == ACPI_EDGE_SENSITIVE ? 'E' : 'L', gsi);
^~~
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Fixes:
ea6f3af4c5e6 ("ACPI: GED: add support for _Exx / _Lxx handler methods")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Maximilian Luz [Thu, 5 Nov 2020 02:06:00 +0000 (03:06 +0100)]
ACPI: Fix whitespace inconsistencies
Replaces spaces with tabs where spaces have been (inconsistently) used
for indentation and removes trailing whitespaces.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
John Garry [Mon, 2 Nov 2020 11:19:31 +0000 (19:19 +0800)]
ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
For some reason building with W=1 doesn't pick up on this, but the
kerneldoc name for acpi_dma_configure_id() is not right, so fix it up.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Andy Shevchenko [Thu, 29 Oct 2020 19:32:43 +0000 (21:32 +0200)]
Documentation: firmware-guide: gpio-properties: Clarify initial output state
GpioIo() doesn't provide an explicit state for an output pin.
Linux tries to be smart and uses a common sense based on other
parameters. Document how it looks like in the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Andy Shevchenko [Thu, 29 Oct 2020 19:32:42 +0000 (21:32 +0200)]
Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
It appears that people may misinterpret active_low field in _DSD
for GpioInt() resource. Add a paragraph to clarify this.
Reported-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>