Marc Zyngier [Tue, 24 Mar 2020 12:45:27 +0000 (12:45 +0000)]
Merge branch 'kvm-arm64/gic-v4.1' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 4 Mar 2020 20:33:30 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
The vgic-state debugfs file could do with showing the pending state
of the HW-backed SGIs. Plug it into the low-level code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-24-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:29 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
Just like for VLPIs, it is beneficial to avoid trapping on WFI when the
vcpu is using the GICv4.1 SGIs.
Add such a check to vcpu_clear_wfx_traps().
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-23-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:28 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Reload VLPI configuration on distributor enable/disable
Each time a Group-enable bit gets flipped, the state of these bits
needs to be forwarded to the hardware. This is a pretty heavy
handed operation, requiring all vcpus to reload their GICv4
configuration. It is thus implemented as a new request type.
These enable bits are programmed into the HW by setting the VGrp{0,1}En
fields of GICR_VPENDBASER when the vPEs are made resident again.
Of course, we only support Group-1 for now...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-22-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:27 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor
The GICv4.1 architecture gives the hypervisor the option to let
the guest choose whether it wants the good old SGIs with an
active state, or the new, HW-based ones that do not have one.
For this, plumb the configuration of SGIs into the GICv3 MMIO
handling, present the GICD_TYPER2.nASSGIcap to the guest,
and handle the GICD_CTLR.nASSGIreq setting.
In order to be able to deal with the restore of a guest, also
apply the GICD_CTLR.nASSGIreq setting at first run so that we
can move the restored SGIs to the HW if that's what the guest
had selected in a previous life.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-21-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:26 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interrupts
In order to let a guest buy in the new, active-less SGIs, we
need to be able to switch between the two modes.
Handle this by stopping all guest activity, transfer the state
from one mode to the other, and resume the guest. Nothing calls
this code so far, but a later patch will plug it into the MMIO
emulation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-20-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:25 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Add direct injection capability to SGI registers
Most of the GICv3 emulation code that deals with SGIs now has to be
aware of the v4.1 capabilities in order to benefit from it.
Add such support, keyed on the interrupt having the hw flag set and
being a SGI.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-19-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:24 +0000 (20:33 +0000)]
KVM: arm64: GICv4.1: Let doorbells be auto-enabled
As GICv4.1 understands the life cycle of doorbells (instead of
just randomly firing them at the most inconvenient time), just
enable them at irq_request time, and be done with it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-18-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:23 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Eagerly vmap vPEs
Now that we have HW-accelerated SGIs being delivered to VPEs, it
becomes required to map the VPEs on all ITSs instead of relying
on the lazy approach that we would use when using the ITS-list
mechanism.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-17-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:22 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Add VSGI property setup
Add the SGI configuration entry point for KVM to use.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-16-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:21 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Add VSGI allocation/teardown
Allocate per-VPE SGIs when initializing the GIC-specific part of the
VPE data structure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-15-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:20 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
In order to hide some of the differences between v4.0 and v4.1, move
the doorbell management out of the KVM code, and into the GICv4-specific
layer. This allows the calling code to ask for the doorbell when blocking,
and otherwise to leave the doorbell permanently disabled.
This matches the v4.1 code perfectly, and only results in a minor
refactoring of the v4.0 code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:19 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
Just like for vLPIs, there is some configuration information that cannot
be directly communicated through the normal irqchip API, and we have to
use our good old friend set_vcpu_affinity as a side-band communication
mechanism.
This is used to configure group and priority for a given vSGI.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-13-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:18 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
To implement the get/set_irqchip_state callbacks (limited to the
PENDING state), we have to use a particular set of hacks:
- Reading the pending state is done by using a pair of new redistributor
registers (GICR_VSGIR, GICR_VSGIPENDR), which allow the 16 interrupts
state to be retrieved.
- Setting the pending state is done by generating it as we'd otherwise do
for a guest (writing to GITS_SGIR).
- Clearing the pending state is done by emitting a VSGI command with the
"clear" bit set.
This requires some interesting locking though:
- When talking to the redistributor, we must make sure that the VPE
affinity doesn't change, hence taking the VPE lock.
- At the same time, we must ensure that nobody accesses the same
redistributor's GICR_VSGIR registers for a different VPE, which
would corrupt the reading of the pending bits. We thus take the
per-RD spinlock. Much fun.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-12-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:17 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
Implement mask/unmask for virtual SGIs by calling into the
configuration helper.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-11-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:16 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Add initial SGI configuration
The GICv4.1 ITS has yet another new command (VSGI) which allows
a VPE-targeted SGI to be configured (or have its pending state
cleared). Add support for this command and plumb it into the
activate irqdomain callback so that it is ready to be used.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-10-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:15 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
Since GICv4.1 has the capability to inject 16 SGIs into each VPE,
and that I'm keen not to invent too many specific interfaces to
manipulate these interrupts, let's pretend that each of these SGIs
is an actual Linux interrupt.
For that matter, let's introduce a minimal irqchip and irqdomain
setup that will get fleshed up in the following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-9-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:14 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Map the ITS SGIR register page
One of the new features of GICv4.1 is to allow virtual SGIs to be
directly signaled to a VPE. For that, the ITS has grown a new
64kB page containing only a single register that is used to
signal a SGI to a given VPE.
Add a second mapping covering this new 64kB range, and take this
opportunity to limit the original mapping to 64kB, which is enough
to cover the span of the ITS registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-8-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:13 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Advertise support v4.1 to KVM
Tell KVM that we support v4.1. Nothing uses this information so far.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-7-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:12 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Ensure mutual exclusion betwen invalidations on the same RD
The GICv4.1 spec says that it is CONTRAINED UNPREDICTABLE to write to
any of the GICR_INV{LPI,ALL}R registers if GICR_SYNCR.Busy == 1.
To deal with it, we must ensure that only a single invalidation can
happen at a time for a given redistributor. Add a per-RD lock to that
effect and take it around the invalidation/syncr-read to deal with this.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-6-maz@kernel.org
Zenghui Yu [Wed, 4 Mar 2020 20:33:11 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Wait for completion of redistributor's INVALL operation
In GICv4.1, we emulate a guest-issued INVALL command by a direct write
to GICR_INVALLR. Before we finish the emulation and go back to guest,
let's make sure the physical invalidate operation is actually completed
and no stale data will be left in redistributor. Per the specification,
this can be achieved by polling the GICR_SYNCR.Busy bit (to zero).
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200302092145.899-1-yuzenghui@huawei.com
Link: https://lore.kernel.org/r/20200304203330.4967-5-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:10 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access
Before GICv4.1, all operations would be serialized with the affinity
changes by virtue of using the same ITS command queue. With v4.1, things
change, as invalidations (and a number of other operations) are issued
using the redistributor MMIO frame.
We must thus make sure that these redistributor accesses cannot race
against aginst the affinity change, or we may end-up talking to the
wrong redistributor.
To ensure this, we expand the irq_to_cpuid() helper to take a spinlock
when the LPI is mapped to a vLPI (a new per-VPE lock) on each operation
that requires mutual exclusion.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-4-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:09 +0000 (20:33 +0000)]
irqchip/gic-v4.1: Skip absent CPUs while iterating over redistributors
In a system that is only sparsly populated with CPUs, we can end-up with
redistributors structures that are not initialized. Let's make sure we
don't try and access those when iterating over them (in this case when
checking we have a L2 VPE table).
Fixes:
4e6437f12d6e ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-3-maz@kernel.org
Marc Zyngier [Wed, 4 Mar 2020 20:33:08 +0000 (20:33 +0000)]
irqchip/gic-v3: Use SGIs without active state if offered
To allow the direct injection of SGIs into a guest, the GICv4.1
architecture has to sacrifice the Active state so that SGIs look
a lot like LPIs (they are injected by the same mechanism).
In order not to break existing software, the architecture gives
offers guests OSs the choice: SGIs with or without an active
state. It is the hypervisors duty to honor the guest's choice.
For this, the architecture offers a discovery bit indicating whether
the GIC supports GICv4.1 SGIs (GICD_TYPER2.nASSGIcap), and another
bit indicating whether the guest wants Active-less SGIs or not
(controlled by GICD_CTLR.nASSGIreq).
A hypervisor not supporting GICv4.1 SGIs would leave nASSGIcap
clear, and a guest not knowing about GICv4.1 SGIs (or definitely
wanting an Active state) would leave nASSGIreq clear (both being
thankfully backward compatible with older revisions of the GIC).
Since Linux is perfectly happy without an active state on SGIs,
inform the hypervisor that we'll use that if offered.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-2-maz@kernel.org
KarimAllah Ahmed [Mon, 16 Mar 2020 09:39:06 +0000 (10:39 +0100)]
KVM: arm64: Use the correct timer structure to access the physical counter
Use the physical timer structure when reading the physical counter
instead of using the virtual timer structure. Thankfully, nothing is
accessing this code path yet (at least not until we enable save/restore
of the physical counter). It doesn't hurt for this to be correct though.
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[maz: amended commit log]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Fixes:
84135d3d18da ("KVM: arm/arm64: consolidate arch timer trap handlers")
Link: https://lore.kernel.org/r/1584351546-5018-1-git-send-email-karahmed@amazon.de
Linus Torvalds [Sun, 1 Mar 2020 22:38:46 +0000 (16:38 -0600)]
Linux 5.6-rc4
Linus Torvalds [Sun, 1 Mar 2020 22:35:08 +0000 (16:35 -0600)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Two more bug fixes (including a regression) for 5.6"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
jbd2: fix data races at struct journal_head
Linus Torvalds [Sun, 1 Mar 2020 21:16:35 +0000 (15:16 -0600)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"More bugfixes, including a few remaining "make W=1" issues such as too
large frame sizes on some configurations.
On the ARM side, the compiler was messing up shadow stacks between EL1
and EL2 code, which is easily fixed with __always_inline"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: check descriptor table exits on instruction emulation
kvm: x86: Limit the number of "kvm: disabled by bios" messages
KVM: x86: avoid useless copy of cpufreq policy
KVM: allow disabling -Werror
KVM: x86: allow compiling as non-module with W=1
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
KVM: Introduce pv check helpers
KVM: let declaration of kvm_get_running_vcpus match implementation
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
arm64: Ask the compiler to __always_inline functions used by KVM at HYP
KVM: arm64: Define our own swab32() to avoid a uapi static inline
KVM: arm64: Ask the compiler to __always_inline functions used at HYP
kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
KVM: arm/arm64: Fix up includes for trace.h
Oliver Upton [Sat, 29 Feb 2020 19:30:14 +0000 (11:30 -0800)]
KVM: VMX: check descriptor table exits on instruction emulation
KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.
Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.
Fixes:
07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sun, 1 Mar 2020 01:16:46 +0000 (19:16 -0600)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has three driver bugfixes for you. We agreed on the Mac regression
to go in via I2C"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
macintosh: therm_windtunnel: fix regression when instantiating devices
i2c: altera: Fix potential integer overflow
i2c: jz4780: silence log flood on txabrt
Dan Carpenter [Fri, 28 Feb 2020 09:22:56 +0000 (12:22 +0300)]
ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
If sbi->s_flex_groups_allocated is zero and the first allocation fails
then this code will crash. The problem is that "i--" will set "i" to
-1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX
is more than zero, the condition is true so we call kvfree(new_groups[-1]).
The loop will carry on freeing invalid memory until it crashes.
Fixes:
7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access")
Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Wolfram Sang [Tue, 25 Feb 2020 14:12:29 +0000 (15:12 +0100)]
macintosh: therm_windtunnel: fix regression when instantiating devices
Removing attach_adapter from this driver caused a regression for at
least some machines. Those machines had the sensors described in their
DT, too, so they didn't need manual creation of the sensor devices. The
old code worked, though, because manual creation came first. Creation of
DT devices then failed later and caused error logs, but the sensors
worked nonetheless because of the manually created devices.
When removing attach_adaper, manual creation now comes later and loses
the race. The sensor devices were already registered via DT, yet with
another binding, so the driver could not be bound to it.
This fix refactors the code to remove the race and only manually creates
devices if there are no DT nodes present. Also, the DT binding is updated
to match both, the DT and manually created devices. Because we don't
know which device creation will be used at runtime, the code to start
the kthread is moved to do_probe() which will be called by both methods.
Fixes:
3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Tested-by: Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org # v4.19+
Qian Cai [Sat, 22 Feb 2020 04:31:11 +0000 (23:31 -0500)]
jbd2: fix data races at struct journal_head
journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,
LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50
read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe
5 locks held by fsync04/25724:
#0:
ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1:
ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2:
ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3:
ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4:
ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp: 1407125
hardirqs last enabled at (1407125): [<
ffffffff980da9b7>] __find_get_block+0x107/0x790
hardirqs last disabled at (1407124): [<
ffffffff980da8f9>] __find_get_block+0x49/0x790
softirqs last enabled at (1405528): [<
ffffffff98a0034c>] __do_softirq+0x34c/0x57c
softirqs last disabled at (1405521): [<
ffffffff97cc67a2>] irq_exit+0xa2/0xc0
Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-
20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 29 Feb 2020 15:58:47 +0000 (09:58 -0600)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes.
Three are in drivers for fairly obvious bugs. The fourth is a set of
regressions introduced by the compat_ioctl changes because some of the
compat updates wrongly replaced .ioctl instead of .compat_ioctl"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
scsi: zfcp: fix wrong data and display format of SFP+ temperature
scsi: sd_sbc: Fix sd_zbc_report_zones()
scsi: libfc: free response frame from GPN_ID
Linus Torvalds [Fri, 28 Feb 2020 19:51:53 +0000 (11:51 -0800)]
Merge tag 'pci-v5.6-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix build issue on 32-bit ARM with old compilers (Marek Szyprowski)
- Update MAINTAINERS for recent Cadence driver file move (Lukas
Bulwahn)
* tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Correct Cadence PCI driver path
PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers
Linus Torvalds [Fri, 28 Feb 2020 19:43:30 +0000 (11:43 -0800)]
Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Passthrough insertion fix (Ming)
- Kill off some unused arguments (John)
- blktrace RCU fix (Jan)
- Dead fields removal for null_blk (Dongli)
- NVMe polled IO fix (Bijan)
* tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
nvme-pci: Hold cq_poll_lock while completing CQEs
blk-mq: Remove some unused function arguments
null_blk: remove unused fields in 'nullb_cmd'
blktrace: Protect q->blk_trace with RCU
blk-mq: insert passthrough request into hctx->dispatch directly
Linus Torvalds [Fri, 28 Feb 2020 19:39:14 +0000 (11:39 -0800)]
Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fix for a race with IOPOLL used with SQPOLL (Xiaoguang)
- Only show ->fdinfo if procfs is enabled (Tobias)
- Fix for a chain with multiple personalities in the SQEs
- Fix for a missing free of personality idr on exit
- Removal of the spin-for-work optimization
- Fix for next work lookup on request completion
- Fix for non-vec read/write result progation in case of links
- Fix for a fileset references on switch
- Fix for a recvmsg/sendmsg 32-bit compatability mode
* tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
io_uring: fix 32-bit compatability with sendmsg/recvmsg
io_uring: define and set show_fdinfo only if procfs is enabled
io_uring: drop file set ref put/get on switch
io_uring: import_single_range() returns 0/-ERROR
io_uring: pick up link work on submit reference drop
io-wq: ensure work->task_pid is cleared on init
io-wq: remove spin-for-work optimization
io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL
io_uring: fix personality idr leak
io_uring: handle multiple personalities in link chains
Jens Axboe [Fri, 28 Feb 2020 17:02:36 +0000 (10:02 -0700)]
Merge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6
Pull NVMe fix from Keith.
* 'nvme-5.6-rc4' of git://git.infradead.org/nvme:
nvme-pci: Hold cq_poll_lock while completing CQEs
Linus Torvalds [Fri, 28 Feb 2020 17:02:18 +0000 (09:02 -0800)]
Merge tag 'acpi-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix a couple of configuration issues in the ACPI watchdog (WDAT)
driver (Mika Westerberg) and make it possible to disable that driver
at boot time in case it still does not work as expected (Jean
Delvare)"
* tag 'acpi-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: watchdog: Set default timeout in probe
ACPI: watchdog: Fix gas->access_width usage
ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro
ACPI: watchdog: Allow disabling WDAT at boot
Linus Torvalds [Fri, 28 Feb 2020 16:49:52 +0000 (08:49 -0800)]
Merge tag 'pm-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix a recent cpufreq initialization regression (Rafael Wysocki),
revert a devfreq commit that made incompatible changes and broke user
land on some systems (Orson Zhai), drop a stale reference to a
document that has gone away recently (Jonathan Neuschäfer), and fix a
typo in a hibernation code comment (Alexandre Belloni)"
* tag 'pm-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Fix policy initialization for internal governor drivers
Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
Documentation: power: Drop reference to interface.rst
Linus Torvalds [Fri, 28 Feb 2020 16:34:47 +0000 (08:34 -0800)]
Merge tag 'zonefs-5.6-rc4' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
"Two fixes in here:
- Revert the initial decision to silently ignore IOCB_NOWAIT for
asynchronous direct IOs to sequential zone files. Instead, return
an error to the user to signal that the feature is not supported
(from Christoph)
- A fix to zonefs Kconfig to select FS_IOMAP to avoid build failures
if no other file system already selected this option (from
Johannes)"
* tag 'zonefs-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: select FS_IOMAP
zonefs: fix IOCB_NOWAIT handling
Paolo Bonzini [Fri, 28 Feb 2020 10:46:59 +0000 (11:46 +0100)]
Merge tag 'kvmarm-fixes-5.6-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm fixes for 5.6, take #1
- Fix compilation on 32bit
- Move VHE guest entry/exit into the VHE-specific entry code
- Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
Erwan Velu [Thu, 27 Feb 2020 18:00:46 +0000 (19:00 +0100)]
kvm: x86: Limit the number of "kvm: disabled by bios" messages
In older version of systemd(219), at boot time, udevadm is called with :
/usr/bin/udevadm trigger --type=devices --action=add"
This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.
On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.
This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.
This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.
Note that recent versions of systemd (>239) do not have trigger this behavior.
This patch will be useful at least for some using older systemd with recent Kernels.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rafael J. Wysocki [Fri, 28 Feb 2020 10:00:50 +0000 (11:00 +0100)]
Merge branches 'pm-sleep' and 'pm-devfreq'
* pm-sleep:
PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
Documentation: power: Drop reference to interface.rst
* pm-devfreq:
Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
Paolo Bonzini [Fri, 28 Feb 2020 09:49:10 +0000 (10:49 +0100)]
KVM: x86: avoid useless copy of cpufreq policy
struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack. Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 28 Feb 2020 09:42:31 +0000 (10:42 +0100)]
KVM: allow disabling -Werror
Restrict -Werror to well-tested configurations and allow disabling it
via Kconfig.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Valdis Klētnieks [Fri, 28 Feb 2020 02:49:52 +0000 (21:49 -0500)]
KVM: x86: allow compiling as non-module with W=1
Compile error with CONFIG_KVM_INTEL=y and W=1:
CC arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
68 | static const struct x86_cpu_id vmx_cpu_id[] = {
| ^~~~~~~~~~
cc1: all warnings being treated as errors
When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all). This makes W=1 compiles
unhappy.
Wrap both in a #ifdef to avoid the issue.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 18 Feb 2020 01:08:24 +0000 (09:08 +0800)]
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
Nick Desaulniers Reported:
When building with:
$ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000
The following warning is observed:
arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in
function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=]
static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int
vector)
^
Debugging with:
https://github.com/ClangBuiltLinux/frame-larger-than
via:
$ python3 frame_larger_than.py arch/x86/kernel/kvm.o \
kvm_send_ipi_mask_allbutself
points to the stack allocated `struct cpumask newmask` in
`kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is
potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for
the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as
8192, making a single instance of a `struct cpumask` 1024 B.
This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for
both pv tlb and pv ipis..
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 18 Feb 2020 01:08:23 +0000 (09:08 +0800)]
KVM: Introduce pv check helpers
Introduce some pv check helpers for consistency.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Christian Borntraeger [Fri, 28 Feb 2020 08:49:41 +0000 (09:49 +0100)]
KVM: let declaration of kvm_get_running_vcpus match implementation
Sparse notices that declaration and implementation do not match:
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces)
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> **
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu *[noderef] <asn:3> *
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 25 Feb 2020 07:54:26 +0000 (08:54 +0100)]
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later. The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).
Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 28 Feb 2020 05:52:18 +0000 (21:52 -0800)]
Merge tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just some fixes for this week: amdgpu, radeon and i915.
The main i915 one is a regression Gen7 (Ivybridge/Haswell), this moves
them back from trying to use the full-ppgtt support to the aliasing
version it used to use due to gpu hangs. Otherwise it's pretty quiet.
amdgpu:
- Drop DRIVER_USE_AGP
- Fix memory leak in GPU reset
- Resume fix for raven
radeon:
- Drop DRIVER_USE_AGP
i915:
- downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
- shrinker fix
- pmu leak and double free fixes
- gvt user after free and virtual display reset fixes
- randconfig build fix"
* tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm:
drm/radeon: Inline drm_get_pci_dev
drm/amdgpu: Drop DRIVER_USE_AGP
drm/i915: Avoid recursing onto active vma from the shrinker
drm/i915/pmu: Avoid using globals for PMU events
drm/i915/pmu: Avoid using globals for CPU hotplug state
drm/i915/gtt: Downgrade gen7 (ivb, byt, hsw) back to aliasing-ppgtt
drm/i915: fix header test with GCOV
amdgpu/gmc_v9: save/restore sdpif regs during S3
drm/amdgpu: fix memory leak during TDR test(v2)
drm/i915/gvt: Fix orphan vgpu dmabuf_objs' lifetime
drm/i915/gvt: Separate display reset from ALL_ENGINES reset
Dave Airlie [Fri, 28 Feb 2020 02:40:41 +0000 (12:40 +1000)]
Merge tag 'drm-intel-fixes-2020-02-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.6-rc4:
- downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
- shrinker fix
- pmu leak and double free fixes
- gvt user after free and virtual display reset fixes
- randconfig build fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/874kvcsh00.fsf@intel.com
Dave Airlie [Fri, 28 Feb 2020 02:30:18 +0000 (12:30 +1000)]
Merge tag 'amd-drm-fixes-5.6-2020-02-26' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.6-2020-02-26:
amdgpu:
- Drop DRIVER_USE_AGP
- Fix memory leak in GPU reset
- Resume fix for raven
radeon:
- Drop DRIVER_USE_AGP
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227034106.3912-1-alexander.deucher@amd.com
Linus Torvalds [Fri, 28 Feb 2020 00:34:41 +0000 (16:34 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix leak in nl80211 AP start where we leak the ACL memory, from
Johannes Berg.
2) Fix double mutex unlock in mac80211, from Andrei Otcheretianski.
3) Fix RCU stall in ipset, from Jozsef Kadlecsik.
4) Fix devlink locking in devlink_dpipe_table_register, from Madhuparna
Bhowmik.
5) Fix race causing TX hang in ll_temac, from Esben Haabendal.
6) Stale eth hdr pointer in br_dev_xmit(), from Nikolay Aleksandrov.
7) Fix TX hash calculation bounds checking wrt. tc rules, from Amritha
Nambiar.
8) Size netlink responses properly in schedule action code to take into
consideration TCA_ACT_FLAGS. From Jiri Pirko.
9) Fix firmware paths for mscc PHY driver, from Antoine Tenart.
10) Don't register stmmac notifier multiple times, from Aaro Koskinen.
11) Various rmnet bug fixes, from Taehee Yoo.
12) Fix vsock deadlock in vsock transport release, from Stefano
Garzarella.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: dsa: mv88e6xxx: Fix masking of egress port
mlxsw: pci: Wait longer before accessing the device after reset
sfc: fix timestamp reconstruction at 16-bit rollover points
vsock: fix potential deadlock in transport->release()
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
net: rmnet: fix packet forwarding in rmnet bridge mode
net: rmnet: fix bridge mode bugs
net: rmnet: use upper/lower device infrastructure
net: rmnet: do not allow to change mux id if mux id is duplicated
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
net: rmnet: fix suspicious RCU usage
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
net: phy: marvell: don't interpret PHY status unless resolved
mlx5: register lag notifier for init network namespace only
unix: define and set show_fdinfo only if procfs is enabled
hinic: fix a bug of rss configuration
hinic: fix a bug of setting hw_ioctxt
hinic: fix a irq affinity bug
net/smc: check for valid ib_client_data
...
Lukas Bulwahn [Fri, 21 Feb 2020 18:54:02 +0000 (19:54 +0100)]
MAINTAINERS: Correct Cadence PCI driver path
de80f95ccb9c ("PCI: cadence: Move all files to per-device cadence
directory") moved files of the PCI cadence drivers, but did not update the
MAINTAINERS entry.
Since then, ./scripts/get_maintainer.pl --self-test complains:
warning: no file matches F: drivers/pci/controller/pcie-cadence*
Repair the MAINTAINERS entry.
Link: https://lore.kernel.org/r/20200221185402.4703-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Jens Axboe [Thu, 27 Feb 2020 21:17:49 +0000 (14:17 -0700)]
io_uring: fix 32-bit compatability with sendmsg/recvmsg
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise
the iovec import for these commands will not do the right thing and fail
the command with -EINVAL.
Found by running the test suite compiled as 32-bit.
Cc: stable@vger.kernel.org
Fixes:
aa1fa28fc73e ("io_uring: add support for recvmsg()")
Fixes:
0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Andrew Lunn [Thu, 27 Feb 2020 20:20:49 +0000 (21:20 +0100)]
net: dsa: mv88e6xxx: Fix masking of egress port
Add missing ~ to the usage of the mask.
Reported-by: Kevin Benson <Kevin.Benson@zii.aero>
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Fixes:
5c74c54ce6ff ("net: dsa: mv88e6xxx: Split monitor port configuration")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 27 Feb 2020 20:07:53 +0000 (21:07 +0100)]
mlxsw: pci: Wait longer before accessing the device after reset
During initialization the driver issues a reset to the device and waits
for 100ms before checking if the firmware is ready. The waiting is
necessary because before that the device is irresponsive and the first
read can result in a completion timeout.
While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is
insufficient for Spectrum-3.
Fix this by increasing the timeout to 200ms.
Fixes:
da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Maftei (amaftei) [Wed, 26 Feb 2020 17:33:19 +0000 (17:33 +0000)]
sfc: fix timestamp reconstruction at 16-bit rollover points
We can't just use the top bits of the last sync event as they could be
off-by-one every 65,536 seconds, giving an error in reconstruction of
65,536 seconds.
This patch uses the difference in the bottom 16 bits (mod 2^16) to
calculate an offset that needs to be applied to the last sync event to
get to the current time.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 26 Feb 2020 10:58:18 +0000 (11:58 +0100)]
vsock: fix potential deadlock in transport->release()
Some transports (hyperv, virtio) acquire the sock lock during the
.release() callback.
In the vsock_stream_connect() we call vsock_assign_transport(); if
the socket was previously assigned to another transport, the
vsk->transport->release() is called, but the sock lock is already
held in the vsock_stream_connect(), causing a deadlock reported by
syzbot:
INFO: task syz-executor280:9768 blocked for more than 143 seconds.
Not tainted 5.6.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor280 D27912 9768 9766 0x00000000
Call Trace:
context_switch kernel/sched/core.c:3386 [inline]
__schedule+0x934/0x1f90 kernel/sched/core.c:4082
schedule+0xdc/0x2b0 kernel/sched/core.c:4156
__lock_sock+0x165/0x290 net/core/sock.c:2413
lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
__sys_connect_file+0x161/0x1c0 net/socket.c:1857
__sys_connect+0x174/0x1b0 net/socket.c:1874
__do_sys_connect net/socket.c:1885 [inline]
__se_sys_connect net/socket.c:1882 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1882
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
To avoid this issue, this patch remove the lock acquiring in the
.release() callback of hyperv and virtio transports, and it holds
the lock when we call vsk->transport->release() in the vsock core.
Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
Fixes:
408624af4c89 ("vsock: use local transport when it is loaded")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:52:35 +0000 (11:52 -0800)]
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
Fixes:
3a12500ed5dd ("unix: define and set show_fdinfo only if procfs is enabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:45:07 +0000 (11:45 -0800)]
Merge branch 'net-rmnet-fix-several-bugs'
Taehee Yoo says:
====================
net: rmnet: fix several bugs
This patchset is to fix several bugs in RMNET module.
1. The first patch fixes NULL-ptr-deref in rmnet_newlink().
When rmnet interface is being created, it uses IFLA_LINK
without checking NULL.
So, if userspace doesn't set IFLA_LINK, panic will occur.
In this patch, checking NULL pointer code is added.
2. The second patch fixes NULL-ptr-deref in rmnet_changelink().
To get real device in rmnet_changelink(), it uses IFLA_LINK.
But, IFLA_LINK should not be used in rmnet_changelink().
3. The third patch fixes suspicious RCU usage in rmnet_get_port().
rmnet_get_port() uses rcu_dereference_rtnl().
But, rmnet_get_port() is used by datapath.
So, rcu_dereference_bh() should be used instead of rcu_dereference_rtnl().
4. The fourth patch fixes suspicious RCU usage in
rmnet_force_unassociate_device().
RCU critical section should not be scheduled.
But, unregister_netdevice_queue() in the rmnet_force_unassociate_device()
would be scheduled.
So, the RCU warning occurs.
In this patch, the rcu_read_lock() in the rmnet_force_unassociate_device()
is removed because it's unnecessary.
5. The fifth patch fixes duplicate MUX ID case.
RMNET MUX ID is unique.
So, rmnet interface isn't allowed to be created, which have
a duplicate MUX ID.
But, only rmnet_newlink() checks this condition, rmnet_changelink()
doesn't check this.
So, duplicate MUX ID case would happen.
6. The sixth patch fixes upper/lower interface relationship problems.
When IFLA_LINK is used, the upper/lower infrastructure should be used.
Because it checks the maximum depth of upper/lower interfaces and it also
checks circular interface relationship, etc.
In this patch, netdev_upper_dev_link() is used.
7. The seventh patch fixes bridge related problems.
a) ->ndo_del_slave() doesn't work.
b) It couldn't detect circular upper/lower interface relationship.
c) It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
d) It doesn't check the number of lower interfaces.
e) Panics because of several reasons.
These problems are actually the same problem.
So, this patch fixes these problems.
8. The eighth patch fixes packet forwarding issue in bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.
Change log:
- update commit logs.
- drop two patches in this patchset because of wrong target branch.
- ("net: rmnet: add missing module alias")
- ("net: rmnet: print error message when command fails")
- remove unneessary rcu_read_lock() in the third patch.
- use rcu_dereference_bh() instead of rcu_dereference in third patch.
- do not allow to add a bridge device if rmnet interface is already
bridge mode in the seventh patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:15 +0000 (12:26 +0000)]
net: rmnet: fix packet forwarding in rmnet bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.
Test commands:
modprobe rmnet
ip netns add nst
ip netns add nst2
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst2
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip link set veth2 master rmnet0
ip link set veth0 up
ip link set veth2 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ip netns exec nst ip link set veth1 up
ip netns exec nst ip a a 192.168.100.2/24 dev veth1
ip netns exec nst2 ip link set veth3 up
ip netns exec nst2 ip a a 192.168.100.3/24 dev veth3
ip netns exec nst2 ping 192.168.100.2
Fixes:
60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:02 +0000 (12:26 +0000)]
net: rmnet: fix bridge mode bugs
In order to attach a bridge interface to the rmnet interface,
"master" operation is used.
(e.g. ip link set dummy1 master rmnet0)
But, in the rmnet_add_bridge(), which is a callback of ->ndo_add_slave()
doesn't register lower interface.
So, ->ndo_del_slave() doesn't work.
There are other problems too.
1. It couldn't detect circular upper/lower interface relationship.
2. It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
3. It doesn't check the number of lower interfaces.
4. Panics because of several reasons.
The root problem of these issues is actually the same.
So, in this patch, these all problems will be fixed.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add dummy1 master rmnet0 type dummy
ip link add dummy2 master rmnet0 type dummy
ip link del rmnet0
ip link del dummy2
ip link del dummy1
Splat looks like:
[ 41.867595][ T1164] general protection fault, probably for non-canonical address 0xdffffc0000000101I
[ 41.869993][ T1164] KASAN: null-ptr-deref in range [0x0000000000000808-0x000000000000080f]
[ 41.872950][ T1164] CPU: 0 PID: 1164 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 41.873915][ T1164] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 41.875161][ T1164] RIP: 0010:rmnet_unregister_bridge.isra.6+0x71/0xf0 [rmnet]
[ 41.876178][ T1164] Code: 48 89 ef 48 89 c6 5b 5d e9 fc fe ff ff e8 f7 f3 ff ff 48 8d b8 08 08 00 00 48 ba 00 7
[ 41.878925][ T1164] RSP: 0018:
ffff8880c4d0f188 EFLAGS:
00010202
[ 41.879774][ T1164] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000101
[ 41.887689][ T1164] RDX:
dffffc0000000000 RSI:
ffffffffb8cf64f0 RDI:
0000000000000808
[ 41.888727][ T1164] RBP:
ffff8880c40e4000 R08:
ffffed101b3c0e3c R09:
0000000000000001
[ 41.889749][ T1164] R10:
0000000000000001 R11:
ffffed101b3c0e3b R12:
1ffff110189a1e3c
[ 41.890783][ T1164] R13:
ffff8880c4d0f200 R14:
ffffffffb8d56160 R15:
ffff8880ccc2c000
[ 41.891794][ T1164] FS:
00007f4300edc0c0(0000) GS:
ffff8880d9c00000(0000) knlGS:
0000000000000000
[ 41.892953][ T1164] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 41.893800][ T1164] CR2:
00007f43003bc8c0 CR3:
00000000ca53e001 CR4:
00000000000606f0
[ 41.894824][ T1164] Call Trace:
[ 41.895274][ T1164] ? rcu_is_watching+0x2c/0x80
[ 41.895895][ T1164] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 41.896687][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.897611][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.898508][ T1164] ? __module_text_address+0x13/0x140
[ 41.899162][ T1164] notifier_call_chain+0x90/0x160
[ 41.899814][ T1164] rollback_registered_many+0x660/0xcf0
[ 41.900544][ T1164] ? netif_set_real_num_tx_queues+0x780/0x780
[ 41.901316][ T1164] ? __lock_acquire+0xdfe/0x3de0
[ 41.901958][ T1164] ? memset+0x1f/0x40
[ 41.902468][ T1164] ? __nla_validate_parse+0x98/0x1ab0
[ 41.903166][ T1164] unregister_netdevice_many.part.133+0x13/0x1b0
[ 41.903988][ T1164] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes:
60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:43 +0000 (12:25 +0000)]
net: rmnet: use upper/lower device infrastructure
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
And this function internally validates looping, maximum depth.
All or most virtual interfaces that could have a real interface
(e.g. macsec, macvlan, ipvlan etc.) use lower/upper infrastructure.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet1 link dummy0 type rmnet mux_id 1
for i in {2..100}
do
let A=$i-1
ip link add rmnet$i link rmnet$A type rmnet mux_id $i
done
ip link del dummy0
The purpose of the test commands is to make stack overflow.
Splat looks like:
[ 52.411438][ T1395] BUG: KASAN: slab-out-of-bounds in find_busiest_group+0x27e/0x2c00
[ 52.413218][ T1395] Write of size 64 at addr
ffff8880c774bde0 by task ip/1395
[ 52.414841][ T1395]
[ 52.430720][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.496511][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.513597][ T1395] Call Trace:
[ 52.546516][ T1395]
[ 52.558773][ T1395] Allocated by task
3171537984:
[ 52.588290][ T1395] BUG: unable to handle page fault for address:
ffffffffb999e260
[ 52.589311][ T1395] #PF: supervisor read access in kernel mode
[ 52.590529][ T1395] #PF: error_code(0x0000) - not-present page
[ 52.591374][ T1395] PGD
d6818067 P4D
d6818067 PUD
d6819063 PMD 0
[ 52.592288][ T1395] Thread overran stack, or stack corrupted
[ 52.604980][ T1395] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 52.605856][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.611764][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.621520][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.622296][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 52.627887][ T1395] RSP: 0018:
ffff8880c774bb60 EFLAGS:
00010006
[ 52.628735][ T1395] RAX:
00000000001f8880 RBX:
ffff8880c774d140 RCX:
0000000000000000
[ 52.631773][ T1395] RDX:
000000000000001d RSI:
ffff8880c774bb68 RDI:
0000000000003ff0
[ 52.649584][ T1395] RBP:
ffffea00031dd200 R08:
ffffed101b43e403 R09:
ffffed101b43e403
[ 52.674857][ T1395] R10:
0000000000000001 R11:
ffffed101b43e402 R12:
ffff8880d900e5c0
[ 52.678257][ T1395] R13:
ffff8880c774c000 R14:
0000000000000000 R15:
dffffc0000000000
[ 52.694541][ T1395] FS:
00007fe867f6e0c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 52.764039][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 52.815008][ T1395] CR2:
ffffffffb999e260 CR3:
00000000c26aa005 CR4:
00000000000606e0
[ 52.862312][ T1395] Call Trace:
[ 52.887133][ T1395] Modules linked in: dummy rmnet veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_dex
[ 52.936749][ T1395] CR2:
ffffffffb999e260
[ 52.965695][ T1395] ---[ end trace
7e32ca99482dbb31 ]---
[ 52.966556][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.971083][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 53.003650][ T1395] RSP: 0018:
ffff8880c774bb60 EFLAGS:
00010006
[ 53.043183][ T1395] RAX:
00000000001f8880 RBX:
ffff8880c774d140 RCX:
0000000000000000
[ 53.076480][ T1395] RDX:
000000000000001d RSI:
ffff8880c774bb68 RDI:
0000000000003ff0
[ 53.093858][ T1395] RBP:
ffffea00031dd200 R08:
ffffed101b43e403 R09:
ffffed101b43e403
[ 53.112795][ T1395] R10:
0000000000000001 R11:
ffffed101b43e402 R12:
ffff8880d900e5c0
[ 53.139837][ T1395] R13:
ffff8880c774c000 R14:
0000000000000000 R15:
dffffc0000000000
[ 53.141500][ T1395] FS:
00007fe867f6e0c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 53.143343][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 53.152007][ T1395] CR2:
ffffffffb999e260 CR3:
00000000c26aa005 CR4:
00000000000606e0
[ 53.156459][ T1395] Kernel panic - not syncing: Fatal exception
[ 54.213570][ T1395] Shutting down cpus with NMI
[ 54.354112][ T1395] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 54.355687][ T1395] Rebooting in 5 seconds..
Fixes:
b37f78f234bf ("net: qualcomm: rmnet: Fix crash on real dev unregistration")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:19 +0000 (12:25 +0000)]
net: rmnet: do not allow to change mux id if mux id is duplicated
Basically, duplicate mux id isn't be allowed.
So, the creation of rmnet will be failed if there is duplicate mux id
is existing.
But, changelink routine doesn't check duplicate mux id.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add rmnet1 link dummy0 type rmnet mux_id 2
ip link set rmnet1 type rmnet mux_id 1
Fixes:
23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:05 +0000 (12:25 +0000)]
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
The notifier_call() of the slave interface removes rmnet interface with
unregister_netdevice_queue().
But, before calling unregister_netdevice_queue(), it acquires
rcu readlock.
In the RCU critical section, sleeping isn't be allowed.
But, unregister_netdevice_queue() internally calls synchronize_net(),
which would sleep.
So, suspicious RCU usage warning occurs.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set dummy1 master rmnet0
ip link del dummy0
Splat looks like:
[ 79.639245][ T1195] =============================
[ 79.640134][ T1195] WARNING: suspicious RCU usage
[ 79.640852][ T1195] 5.6.0-rc1+ #447 Not tainted
[ 79.641657][ T1195] -----------------------------
[ 79.642472][ T1195] ./include/linux/rcupdate.h:273 Illegal context switch in RCU read-side critical section!
[ 79.644043][ T1195]
[ 79.644043][ T1195] other info that might help us debug this:
[ 79.644043][ T1195]
[ 79.645682][ T1195]
[ 79.645682][ T1195] rcu_scheduler_active = 2, debug_locks = 1
[ 79.646980][ T1195] 2 locks held by ip/1195:
[ 79.647629][ T1195] #0:
ffffffffa3cf64f0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
[ 79.649312][ T1195] #1:
ffffffffa39256c0 (rcu_read_lock){....}, at: rmnet_config_notify_cb+0xf0/0x590 [rmnet]
[ 79.651717][ T1195]
[ 79.651717][ T1195] stack backtrace:
[ 79.652650][ T1195] CPU: 3 PID: 1195 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 79.653702][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 79.655037][ T1195] Call Trace:
[ 79.655560][ T1195] dump_stack+0x96/0xdb
[ 79.656252][ T1195] ___might_sleep+0x345/0x440
[ 79.656994][ T1195] synchronize_net+0x18/0x30
[ 79.661132][ T1195] netdev_rx_handler_unregister+0x40/0xb0
[ 79.666266][ T1195] rmnet_unregister_real_device+0x42/0xb0 [rmnet]
[ 79.667211][ T1195] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 79.668121][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.669166][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.670286][ T1195] ? __module_text_address+0x13/0x140
[ 79.671139][ T1195] notifier_call_chain+0x90/0x160
[ 79.671973][ T1195] rollback_registered_many+0x660/0xcf0
[ 79.672893][ T1195] ? netif_set_real_num_tx_queues+0x780/0x780
[ 79.675091][ T1195] ? __lock_acquire+0xdfe/0x3de0
[ 79.675825][ T1195] ? memset+0x1f/0x40
[ 79.676367][ T1195] ? __nla_validate_parse+0x98/0x1ab0
[ 79.677290][ T1195] unregister_netdevice_many.part.133+0x13/0x1b0
[ 79.678163][ T1195] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes:
ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:45 +0000 (12:24 +0000)]
net: rmnet: fix suspicious RCU usage
rmnet_get_port() internally calls rcu_dereference_rtnl(),
which checks RTNL.
But rmnet_get_port() could be called by packet path.
The packet path is not protected by RTNL.
So, the suspicious RCU usage problem occurs.
Test commands:
modprobe rmnet
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link set veth1 netns nst
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set rmnet1 up
ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1
ip link set veth0 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ping 192.168.100.2
Splat looks like:
[ 146.630958][ T1174] WARNING: suspicious RCU usage
[ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted
[ 146.632387][ T1174] -----------------------------
[ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() !
[ 146.634742][ T1174]
[ 146.634742][ T1174] other info that might help us debug this:
[ 146.634742][ T1174]
[ 146.645992][ T1174]
[ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1
[ 146.646937][ T1174] 5 locks held by ping/1174:
[ 146.647609][ T1174] #0:
ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980
[ 146.662463][ T1174] #1:
ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150
[ 146.671696][ T1174] #2:
ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940
[ 146.673064][ T1174] #3:
ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150
[ 146.690358][ T1174] #4:
ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020
[ 146.699875][ T1174]
[ 146.699875][ T1174] stack backtrace:
[ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447
[ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 146.706565][ T1174] Call Trace:
[ 146.707102][ T1174] dump_stack+0x96/0xdb
[ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet]
[ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet]
[ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020
[ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet]
[ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740
[ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020
[ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0
[ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0
[ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940
[ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0
[ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940
[ ... ]
Fixes:
ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:26 +0000 (12:24 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
In the rmnet_changelink(), it uses IFLA_LINK without checking
NULL pointer.
tb[IFLA_LINK] could be NULL pointer.
So, NULL-ptr-deref could occur.
rmnet already has a lower interface (real_dev).
So, after this patch, rmnet_changelink() does not use IFLA_LINK anymore.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set rmnet0 type rmnet mux_id 2
Splat looks like:
[ 90.578726][ T1131] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 90.581121][ T1131] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 90.582380][ T1131] CPU: 2 PID: 1131 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 90.584285][ T1131] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 90.587506][ T1131] RIP: 0010:rmnet_changelink+0x5a/0x8a0 [rmnet]
[ 90.588546][ T1131] Code: 83 ec 20 48 c1 ea 03 80 3c 02 00 0f 85 6f 07 00 00 48 8b 5e 28 48 b8 00 00 00 00 00 0
[ 90.591447][ T1131] RSP: 0018:
ffff8880ce78f1b8 EFLAGS:
00010247
[ 90.592329][ T1131] RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
ffff8880ce78f8b0
[ 90.593253][ T1131] RDX:
0000000000000000 RSI:
ffff8880ce78f4a0 RDI:
0000000000000004
[ 90.594058][ T1131] RBP:
ffff8880cf543e00 R08:
0000000000000002 R09:
0000000000000002
[ 90.594859][ T1131] R10:
ffffffffc0586a40 R11:
0000000000000000 R12:
ffff8880ca47c000
[ 90.595690][ T1131] R13:
ffff8880ca47c000 R14:
ffff8880cf545000 R15:
0000000000000000
[ 90.596553][ T1131] FS:
00007f21f6c7e0c0(0000) GS:
ffff8880da400000(0000) knlGS:
0000000000000000
[ 90.597504][ T1131] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 90.599418][ T1131] CR2:
0000556e413db458 CR3:
00000000c917a002 CR4:
00000000000606e0
[ 90.600289][ T1131] Call Trace:
[ 90.600631][ T1131] __rtnl_newlink+0x922/0x1270
[ 90.601194][ T1131] ? lock_downgrade+0x6e0/0x6e0
[ 90.601724][ T1131] ? rtnl_link_unregister+0x220/0x220
[ 90.602309][ T1131] ? lock_acquire+0x164/0x3b0
[ 90.602784][ T1131] ? is_bpf_image_address+0xff/0x1d0
[ 90.603331][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.603810][ T1131] ? kernel_text_address+0x111/0x140
[ 90.604419][ T1131] ? __kernel_text_address+0xe/0x30
[ 90.604981][ T1131] ? unwind_get_return_address+0x5f/0xa0
[ 90.605616][ T1131] ? create_prof_cpu_mask+0x20/0x20
[ 90.606304][ T1131] ? arch_stack_walk+0x83/0xb0
[ 90.606985][ T1131] ? stack_trace_save+0x82/0xb0
[ 90.607656][ T1131] ? stack_trace_consume_entry+0x160/0x160
[ 90.608503][ T1131] ? deactivate_slab.isra.78+0x2c5/0x800
[ 90.609336][ T1131] ? kasan_unpoison_shadow+0x30/0x40
[ 90.610096][ T1131] ? kmem_cache_alloc_trace+0x135/0x350
[ 90.610889][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.611512][ T1131] rtnl_newlink+0x65/0x90
[ ... ]
Fixes:
23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:23:52 +0000 (12:23 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
rmnet registers IFLA_LINK interface as a lower interface.
But, IFLA_LINK could be NULL.
In the current code, rmnet doesn't check IFLA_LINK.
So, panic would occur.
Test commands:
modprobe rmnet
ip link add rmnet0 type rmnet mux_id 1
Splat looks like:
[ 36.826109][ T1115] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 36.838817][ T1115] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 36.839908][ T1115] CPU: 1 PID: 1115 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 36.840569][ T1115] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 36.841408][ T1115] RIP: 0010:rmnet_newlink+0x54/0x510 [rmnet]
[ 36.841986][ T1115] Code: 83 ec 18 48 c1 e9 03 80 3c 01 00 0f 85 d4 03 00 00 48 8b 6a 28 48 b8 00 00 00 00 00 c
[ 36.843923][ T1115] RSP: 0018:
ffff8880b7e0f1c0 EFLAGS:
00010247
[ 36.844756][ T1115] RAX:
dffffc0000000000 RBX:
ffff8880d14cca00 RCX:
1ffff11016fc1e99
[ 36.845859][ T1115] RDX:
0000000000000000 RSI:
ffff8880c3d04000 RDI:
0000000000000004
[ 36.846961][ T1115] RBP:
0000000000000000 R08:
ffff8880b7e0f8b0 R09:
ffff8880b6ac2d90
[ 36.848020][ T1115] R10:
ffffffffc0589a40 R11:
ffffed1016d585b7 R12:
ffffffff88ceaf80
[ 36.848788][ T1115] R13:
ffff8880c3d04000 R14:
ffff8880b7e0f8b0 R15:
ffff8880c3d04000
[ 36.849546][ T1115] FS:
00007f50ab3360c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 36.851784][ T1115] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 36.852422][ T1115] CR2:
000055871afe5ab0 CR3:
00000000ae246001 CR4:
00000000000606e0
[ 36.853181][ T1115] Call Trace:
[ 36.853514][ T1115] __rtnl_newlink+0xbdb/0x1270
[ 36.853967][ T1115] ? lock_downgrade+0x6e0/0x6e0
[ 36.854420][ T1115] ? rtnl_link_unregister+0x220/0x220
[ 36.854936][ T1115] ? lock_acquire+0x164/0x3b0
[ 36.855376][ T1115] ? is_bpf_image_address+0xff/0x1d0
[ 36.855884][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.856304][ T1115] ? kernel_text_address+0x111/0x140
[ 36.856857][ T1115] ? __kernel_text_address+0xe/0x30
[ 36.857440][ T1115] ? unwind_get_return_address+0x5f/0xa0
[ 36.858063][ T1115] ? create_prof_cpu_mask+0x20/0x20
[ 36.858644][ T1115] ? arch_stack_walk+0x83/0xb0
[ 36.859171][ T1115] ? stack_trace_save+0x82/0xb0
[ 36.859710][ T1115] ? stack_trace_consume_entry+0x160/0x160
[ 36.860357][ T1115] ? deactivate_slab.isra.78+0x2c5/0x800
[ 36.860928][ T1115] ? kasan_unpoison_shadow+0x30/0x40
[ 36.861520][ T1115] ? kmem_cache_alloc_trace+0x135/0x350
[ 36.862125][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.864073][ T1115] rtnl_newlink+0x65/0x90
[ ... ]
Fixes:
ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:26:33 +0000 (11:26 -0800)]
Merge tag 'kbuild-fixes-v5.6-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix missed rebuild of DT schema check
- add some phony targets to PHONY
- fix comments and documents
* tag 'kbuild-fixes-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: get rid of trailing slash from subdir- example
kbuild: add dt_binding_check to PHONY in a correct place
kbuild: add dtbs_check to PHONY
kbuild: remove unneeded semicolon at the end of cmd_dtb_check
kbuild: fix DT binding schema rule to detect command line changes
kbuild: remove wrong documentation about mandatory-y
kbuild: add comment for V=2 mode
Russell King [Thu, 27 Feb 2020 09:44:49 +0000 (09:44 +0000)]
net: phy: marvell: don't interpret PHY status unless resolved
Don't attempt to interpret the PHY specific status register unless
the PHY is indicating that the resolution is valid.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 27 Feb 2020 07:22:10 +0000 (08:22 +0100)]
mlx5: register lag notifier for init network namespace only
The current code causes problems when the unregistering netdevice could
be different then the registering one.
Since the check in mlx5_lag_netdev_event() does not allow any other
network namespace anyway, fix this by registerting the lag notifier
per init network namespace only.
Fixes:
d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Aya Levin <ayal@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:13:27 +0000 (11:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid
Pull HID subsystem fixes from Jiri Kosina:
- syzkaller-reported error handling fixes in various drivers, from
various people
- increase of HID report buffer size to 8K, which is apparently needed
by certain modern devices
- a few new device-ID-specific fixes / quirks
- battery charging status reporting fix in logitech-hidpp, from Filipe
LaÃns
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: hid-bigbenff: fix race condition for scheduled work during removal
HID: hid-bigbenff: call hid_hw_stop() in case of error
HID: hid-bigbenff: fix general protection fault caused by double kfree
HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
HID: alps: Fix an error handling path in 'alps_input_configured()'
HID: hiddev: Fix race in in hiddev_disconnect()
HID: core: increase HID report buffer size to 8KiB
HID: core: fix off-by-one memset in hid_report_raw_event()
HID: apple: Add support for recent firmware on Magic Keyboards
HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock
HID: logitech-hidpp: BatteryVoltage: only read chargeStatus if extPower is active
Tobias Klauser [Wed, 26 Feb 2020 17:29:53 +0000 (18:29 +0100)]
unix: define and set show_fdinfo only if procfs is enabled
Follow the pattern used with other *_show_fdinfo functions and only
define unix_show_fdinfo and set it in proto_ops if CONFIG_PROCFS
is set.
Fixes:
3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:08:01 +0000 (11:08 -0800)]
Merge branch 'hinic-BugFixes'
Luo bin says:
====================
hinic: BugFixes
the bug fixed in patch #2 has been present since the first commit.
the bugs fixed in patch #1 and patch #3 have been present since the
following commits:
patch #1:
352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
patch #3:
421e9526288b ("hinic: add rss support")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:44 +0000 (06:34 +0000)]
hinic: fix a bug of rss configuration
should use real receive queue number to configure hw rss
indirect table rather than maximal queue number
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:43 +0000 (06:34 +0000)]
hinic: fix a bug of setting hw_ioctxt
a reserved field is used to signify prime physical function index
in the latest firmware version, so we must assign a value to it
correctly
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:42 +0000 (06:34 +0000)]
hinic: fix a irq affinity bug
can not use a local variable as an input parameter of
irq_set_affinity_hint
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:07:13 +0000 (11:07 -0800)]
Merge tag 'docs-5.6-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A pair of docs-build fixes"
* tag 'docs-5.6-fixes' of git://git.lwn.net/linux:
docs: Fix empty parallelism argument
docs: remove MPX from the x86 toc
Linus Torvalds [Thu, 27 Feb 2020 19:01:22 +0000 (11:01 -0800)]
Merge tag 'audit-pr-
20200226' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two fixes for problems found by syzbot:
- Moving audit filter structure fields into a union caused some
problems in the code which populates that filter structure.
We keep the union (that idea is a good one), but we are fixing the
code so that it doesn't needlessly set fields in the union and mess
up the error handling.
- The audit_receive_msg() function wasn't validating user input as
well as it should in all cases, we add the necessary checks"
* tag 'audit-pr-
20200226' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: always check the netlink payload length in audit_receive_msg()
audit: fix error handling in audit_data_to_entry()
Bijan Mottahedeh [Thu, 27 Feb 2020 02:53:43 +0000 (18:53 -0800)]
nvme-pci: Hold cq_poll_lock while completing CQEs
Completions need to consumed in the same order the controller submitted
them, otherwise future completion entries may overwrite ones we haven't
handled yet. Hold the nvme queue's poll lock while completing new CQEs to
prevent another thread from freeing command tags for reuse out-of-order.
Fixes:
dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Marek Szyprowski [Thu, 27 Feb 2020 11:51:46 +0000 (12:51 +0100)]
PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers
Some older compilers have no implementation for the helper for 64-bit
unsigned division/modulo, so linking pcie-brcmstb driver causes the
"undefined reference to `__aeabi_uldivmod'" error.
*rc_bar2_size is always a power of two, because it is calculated as:
"1ULL << fls64(entry->res->end - entry->res->start)", so the modulo
operation in the subsequent check can be replaced by a simple logical
AND with a proper mask.
Link: https://lore.kernel.org/r/20200227115146.24515-1-m.szyprowski@samsung.com
Fixes:
c0452137034b ("PCI: brcmstb: Add Broadcom STB PCIe host controller driver")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tobias Klauser [Wed, 26 Feb 2020 17:38:32 +0000 (18:38 +0100)]
io_uring: define and set show_fdinfo only if procfs is enabled
Follow the pattern used with other *_show_fdinfo functions and only
define and use io_uring_show_fdinfo and its helper functions if
CONFIG_PROC_FS is set.
Fixes:
87ce955b24c9 ("io_uring: add ->show_fdinfo() for the io_uring file descriptor")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rafael J. Wysocki [Thu, 27 Feb 2020 10:21:23 +0000 (11:21 +0100)]
Merge tag 'devfreq-fixes-for-5.6-rc4' of git://git./linux/kernel/git/chanwoo/linux
Pull a devfreq fix for 5.6-rc4 from Chanwoo Choi:
"Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
- This changes as devfreq(X) cause break some user space applications
such as Android HAL from Unisoc and Hikey. In result, decide to revert it
for preventing the HAL layer problem."
* tag 'devfreq-fixes-for-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
Rafael J. Wysocki [Wed, 26 Feb 2020 21:39:27 +0000 (22:39 +0100)]
cpufreq: Fix policy initialization for internal governor drivers
Before commit
1e4f63aecb53 ("cpufreq: Avoid creating excessively
large stack frames") the initial value of the policy field in struct
cpufreq_policy set by the driver's ->init() callback was implicitly
passed from cpufreq_init_policy() to cpufreq_set_policy() if the
default governor was neither "performance" nor "powersave". After
that commit, however, cpufreq_init_policy() must take that case into
consideration explicitly and handle it as appropriate, so make that
happen.
Fixes:
1e4f63aecb53 ("cpufreq: Avoid creating excessively large stack frames")
Link: https://lore.kernel.org/linux-pm/39fb762880c27da110086741315ca8b111d781cd.camel@gmail.com/
Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Karsten Graul [Wed, 26 Feb 2020 16:52:46 +0000 (17:52 +0100)]
net/smc: check for valid ib_client_data
In smc_ib_remove_dev() check if the provided ib device was actually
initialized for SMC before.
Reported-by: syzbot+84484ccebdd4e5451d91@syzkaller.appspotmail.com
Fixes:
a4cf0443c414 ("smc: introduce SMC as an IB-client")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 26 Feb 2020 16:49:01 +0000 (18:49 +0200)]
net: stmmac: fix notifier registration
We cannot register the same netdev notifier multiple times when probing
stmmac devices. Register the notifier only once in module init, and also
make debugfs creation/deletion safe against simultaneous notifier call.
Fixes:
481a7d154cbb ("stmmac: debugfs entry name is not be changed when udev rename device name.")
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 26 Feb 2020 15:26:50 +0000 (16:26 +0100)]
net: phy: mscc: fix firmware paths
The firmware paths for the VSC8584 PHYs not not contain the leading
'microchip/' directory, as used in linux-firmware, resulting in an
error when probing the driver. This patch fixes it.
Fixes:
a5afc1678044 ("net: phy: mscc: add support for VSC8584 PHY")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 26 Feb 2020 11:19:03 +0000 (12:19 +0100)]
mptcp: add dummy icsk_sync_mss()
syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
callback, and was able to trigger a null pointer dereference:
BUG: kernel NULL pointer dereference, address:
0000000000000000
PGD
8e171067 P4D
8e171067 PUD
93fa2067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:
ffffc900020b7b80 EFLAGS:
00010246
RAX:
1ffff110124ba600 RBX:
0000000000000000 RCX:
ffff88809fefa600
RDX:
ffff8880994cdb18 RSI:
0000000000000000 RDI:
ffff8880925d3140
RBP:
ffffc900020b7bd8 R08:
ffffffff870225be R09:
fffffbfff140652a
R10:
fffffbfff140652a R11:
0000000000000000 R12:
ffff8880925d35d0
R13:
ffff8880925d3140 R14:
dffffc0000000000 R15:
1ffff110124ba6ba
FS:
0000000001a0b880(0000) GS:
ffff8880aea00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
00000000a6d6f000 CR4:
00000000001406f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
smack_netlabel security/smack/smack_lsm.c:2425 [inline]
smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
security_inode_setsecurity+0xb2/0x140 security/security.c:1364
__vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
vfs_setxattr fs/xattr.c:224 [inline]
setxattr+0x335/0x430 fs/xattr.c:451
__do_sys_fsetxattr fs/xattr.c:506 [inline]
__se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
__x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440199
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007ffcadc19e48 EFLAGS:
00000246 ORIG_RAX:
00000000000000be
RAX:
ffffffffffffffda RBX:
00000000004002c8 RCX:
0000000000440199
RDX:
0000000020000200 RSI:
00000000200001c0 RDI:
0000000000000003
RBP:
00000000006ca018 R08:
0000000000000003 R09:
00000000004002c8
R10:
0000000000000009 R11:
0000000000000246 R12:
0000000000401a20
R13:
0000000000401ab0 R14:
0000000000000000 R15:
0000000000000000
Modules linked in:
CR2:
0000000000000000
Address the issue adding a dummy icsk_sync_mss callback.
To properly sync the subflows mss and options list we need some
additional infrastructure, which will land to net-next.
Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
Fixes:
2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudheesh Mavila [Wed, 26 Feb 2020 07:10:45 +0000 (12:40 +0530)]
net: phy: corrected the return value for genphy_check_and_restart_aneg and genphy_c45_check_and_restart_aneg
When auto-negotiation is not required, return value should be zero.
Changes v1->v2:
- improved comments and code as Andrew Lunn and Heiner Kallweit suggestion
- fixed issue in genphy_c45_check_and_restart_aneg as Russell King
suggestion.
Fixes:
2a10ab043ac5 ("net: phy: add genphy_check_and_restart_aneg()")
Fixes:
1af9f16840e9 ("net: phy: add genphy_c45_check_and_restart_aneg()")
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
yangerkun [Wed, 26 Feb 2020 03:54:35 +0000 (11:54 +0800)]
slip: not call free_netdev before rtnl_unlock in slip_open
As the description before netdev_run_todo, we cannot call free_netdev
before rtnl_unlock, fix it by reorder the code.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 25 Feb 2020 19:52:29 +0000 (11:52 -0800)]
ipv6: restrict IPV6_ADDRFORM operation
IPV6_ADDRFORM is able to transform IPv6 socket to IPv4 one.
While this operation sounds illogical, we have to support it.
One of the things it does for TCP socket is to switch sk->sk_prot
to tcp_prot.
We now have other layers playing with sk->sk_prot, so we should make
sure to not interfere with them.
This patch makes sure sk_prot is the default pointer for TCP IPv6 socket.
syzbot reported :
BUG: kernel NULL pointer dereference, address:
0000000000000000
PGD
a0113067 P4D
a0113067 PUD
a8771067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:
ffffc9000281fce0 EFLAGS:
00010246
RAX:
1ffffffff15f48ac RBX:
ffffffff8afa4560 RCX:
dffffc0000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff8880a69a8f40
RBP:
ffffc9000281fd10 R08:
ffffffff86ed9b0c R09:
ffffed1014d351f5
R10:
ffffed1014d351f5 R11:
0000000000000000 R12:
ffff8880920d3098
R13:
1ffff1101241a613 R14:
ffff8880a69a8f40 R15:
0000000000000000
FS:
00007f2ae75db700(0000) GS:
ffff8880aea00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
00000000a3b85000 CR4:
00000000001406f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
__sock_release net/socket.c:605 [inline]
sock_close+0xe1/0x260 net/socket.c:1283
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x176/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45c429
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007f2ae75dac78 EFLAGS:
00000246 ORIG_RAX:
0000000000000036
RAX:
0000000000000000 RBX:
00007f2ae75db6d4 RCX:
000000000045c429
RDX:
0000000000000001 RSI:
000000000000011a RDI:
0000000000000004
RBP:
000000000076bf20 R08:
0000000000000038 R09:
0000000000000000
R10:
0000000020000180 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000000a9d R14:
00000000004ccfb4 R15:
000000000076bf2c
Modules linked in:
CR2:
0000000000000000
---[ end trace
82567b5207e87bae ]---
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:
ffffc9000281fce0 EFLAGS:
00010246
RAX:
1ffffffff15f48ac RBX:
ffffffff8afa4560 RCX:
dffffc0000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff8880a69a8f40
RBP:
ffffc9000281fd10 R08:
ffffffff86ed9b0c R09:
ffffed1014d351f5
R10:
ffffed1014d351f5 R11:
0000000000000000 R12:
ffff8880920d3098
R13:
1ffff1101241a613 R14:
ffff8880a69a8f40 R15:
0000000000000000
FS:
00007f2ae75db700(0000) GS:
ffff8880aea00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
00000000a3b85000 CR4:
00000000001406f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Fixes:
604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Tue, 25 Feb 2020 15:34:36 +0000 (16:34 +0100)]
net/smc: fix cleanup for linkgroup setup failures
If an SMC connection to a certain peer is setup the first time,
a new linkgroup is created. In case of setup failures, such a
linkgroup is unusable and should disappear. As a first step the
linkgroup is removed from the linkgroup list in smc_lgr_forget().
There are 2 problems:
smc_listen_decline() might be called before linkgroup creation
resulting in a crash due to calling smc_lgr_forget() with
parameter NULL.
If a setup failure occurs after linkgroup creation, the connection
is never unregistered from the linkgroup, preventing linkgroup
freeing.
This patch introduces an enhanced smc_lgr_cleanup_early() function
which
* contains a linkgroup check for early smc_listen_decline()
invocations
* invokes smc_conn_free() to guarantee unregistering of the
connection.
* schedules fast linkgroup removal of the unusable linkgroup
And the unused function smcd_conn_free() is removed from smc_core.h.
Fixes:
3b2dec2603d5b ("net/smc: restructure client and server code in af_smc")
Fixes:
2a0674fffb6bc ("net/smc: improve abnormal termination of link groups")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Saenz Julienne [Tue, 25 Feb 2020 13:11:59 +0000 (14:11 +0100)]
net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not needed
Outdated Raspberry Pi 4 firmware might configure the external PHY as
rgmii although the kernel currently sets it as rgmii-rxid. This makes
connections unreliable as ID_MODE_DIS is left enabled. To avoid this,
explicitly clear that bit whenever we don't need it.
Fixes:
da38802211cc ("net: bcmgenet: Add RGMII_RXID support")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 25 Feb 2020 12:54:12 +0000 (13:54 +0100)]
sched: act: count in the size of action flags bitfield
The put of the flags was added by the commit referenced in fixes tag,
however the size of the message was not extended accordingly.
Fix this by adding size of the flags bitfield to the message size.
Fixes:
e38226786022 ("net: sched: update action implementations to support flags")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahiro Yamada [Wed, 26 Feb 2020 17:44:58 +0000 (02:44 +0900)]
kbuild: get rid of trailing slash from subdir- example
obj-* needs a trailing slash for a directory, but subdir-* does not.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Madhuparna Bhowmik [Tue, 25 Feb 2020 12:27:45 +0000 (17:57 +0530)]
net: core: devlink.c: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
The devlink->lock is held when devlink_dpipe_table_find()
is called in non RCU read side section. Therefore, pass struct devlink
to devlink_dpipe_table_find() for lockdep checking.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 24 Feb 2020 23:56:32 +0000 (15:56 -0800)]
net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
We are still experiencing some packet loss with the existing advanced
congestion buffering (ACB) settings with the IMP port configured for
2Gb/sec, so revert to conservative link speeds that do not produce
packet loss until this is resolved.
Fixes:
8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Fixes:
de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>