Dave Martin [Thu, 21 Feb 2019 11:42:32 +0000 (11:42 +0000)]
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
Due to what looks like a typo dating back to the original addition
of FPEXC32_EL2 handling, KVM currently initialises this register to
an architecturally invalid value.
As a result, the VECITR field (RES1) in bits [10:8] is initialised
with 0, and the two reserved (RES0) bits [6:5] are initialised with
1. (In the Common VFP Subarchitecture as specified by ARMv7-A,
these two bits were IMP DEF. ARMv8-A removes them.)
This patch changes the reset value from 0x70 to 0x700, which
reflects the architectural constraints and is presumably what was
originally intended.
Cc: <stable@vger.kernel.org> # 4.12.x-
Cc: Christoffer Dall <christoffer.dall@arm.com>
Fixes:
62a89c44954f ("arm64: KVM: 32bit handling of coprocessor traps")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Shaokun Zhang [Fri, 22 Feb 2019 06:34:29 +0000 (14:34 +0800)]
KVM: arm/arm64: Remove unused timer variable
The 'timer' local variable became unused after commit
bee038a67487
("KVM: arm/arm64: Rework the timer code to use a timer_map").
Remove it to avoid [-Wunused-but-set-variable] warning.
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Shaokun Zhang [Tue, 19 Feb 2019 09:22:21 +0000 (17:22 +0800)]
KVM: arm/arm64: Remove unused gpa_end variable
The 'gpa_end' local variable is never used and let's remove it.
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Zenghui Yu [Thu, 14 Feb 2019 01:45:46 +0000 (01:45 +0000)]
KVM: arm64: Fix comment for KVM_PHYS_SHIFT
Since Suzuki K Poulose's work on Dynamic IPA support, KVM_PHYS_SHIFT will
be used only when machine_type's bits[7:0] equal to 0 (by default). Thus
the outdated comment should be fixed.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Colin Ian King [Sun, 17 Feb 2019 22:36:26 +0000 (22:36 +0000)]
KVM: arm/arm64: fix spelling mistake: "auxilary" -> "auxiliary"
There is a spelling mistake in a kvm_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Tue, 12 Feb 2019 10:07:47 +0000 (10:07 +0000)]
KVM: arm/arm64: Update MAINTAINERS entries
For historical reasons, KVM/arm and KVM/arm64 have had different
entries in the MAINTAINER file. This makes little sense, as they are
maintained together.
On top of that, we have a bunch of talented people helping with
the reviewing, and they deserve to be mentioned in the consolidated
entry.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Masahiro Yamada [Fri, 25 Jan 2019 07:57:30 +0000 (16:57 +0900)]
KVM: arm/arm64: Prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit
48f6e3cf5bc6
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/9632347/
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Masahiro Yamada [Fri, 25 Jan 2019 07:57:29 +0000 (16:57 +0900)]
KVM: arm/arm64: Remove -I. header search paths
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.
I was able to build without these extra header search paths.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Masahiro Yamada [Fri, 25 Jan 2019 07:57:28 +0000 (16:57 +0900)]
KVM: arm/arm64: Fix TRACE_INCLUDE_PATH
As the comment block in include/trace/define_trace.h says,
TRACE_INCLUDE_PATH should be a relative path to the define_trace.h
../../virt/kvm/arm is the correct relative path.
../../../virt/kvm/arm is working by coincidence because the top
Makefile adds -I$(srctree)/arch/$(SRCARCH)/include as a header
search path, but we should not rely on it.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 20 Jan 2019 20:32:31 +0000 (20:32 +0000)]
KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual interrupt is pending
When a guest gets scheduled, KVM performs a "load" operation,
which for the timer includes evaluating the virtual "active" state
of the interrupt, and replicating it on the physical side. This
ensures that the deactivation in the guest will also take place
in the physical GIC distributor.
If the interrupt is not yet active, we flag it as inactive on the
physical side. This means that on restoring the timer registers,
if the timer has expired, we'll immediately take an interrupt.
That's absolutely fine, as the interrupt will then be flagged as
active on the physical side. What this assumes though is that we'll
enter the guest right after having taken the interrupt, and that
the guest will quickly ACK the interrupt, making it active at on
the virtual side.
It turns out that quite often, this assumption doesn't really hold.
The guest may be preempted on the back on this interrupt, either
from kernel space or whilst running at EL1 when a host interrupt
fires. When this happens, we repeat the whole sequence on the
next load (interrupt marked as inactive, timer registers restored,
interrupt fires). And if it takes a really long time for a guest
to activate the interrupt (as it does with nested virt), we end-up
with many such events in quick succession, leading to the guest only
making very slow progress.
This can also be seen with the number of virtual timer interrupt on the
host being far greater than the same number in the guest.
An easy way to fix this is to evaluate the timer state when performing
the "load" operation, just like we do when the interrupt actually fires.
If the timer has a pending virtual interrupt at this stage, then we
can safely flag the physical interrupt as being active, which prevents
spurious exits.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Thu, 31 Jan 2019 13:17:18 +0000 (14:17 +0100)]
arm64: KVM: Describe data or unified caches as having 1 set and 1 way
On SMP ARM systems, cache maintenance by set/way should only ever be
done in the context of onlining or offlining CPUs, which is typically
done by bare metal firmware and never in a virtual machine. For this
reason, we trap set/way cache maintenance operations and replace them
with conditional flushing of the entire guest address space.
Due to this trapping, the set/way arguments passed into the set/way
ops are completely ignored, and thus irrelevant. This also means that
the set/way geometry is equally irrelevant, and we can simply report
it as 1 set and 1 way, so that legacy 32-bit ARM system software (i.e.,
the kind that only receives odd fixes) doesn't take a performance hit
due to the trapping when iterating over the cachelines.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Thu, 31 Jan 2019 13:17:17 +0000 (14:17 +0100)]
arm64: KVM: Expose sanitised cache type register to guest
We currently permit CPUs in the same system to deviate in the exact
topology of the caches, and we subsequently hide this fact from user
space by exposing a sanitised value of the cache type register CTR_EL0.
However, guests running under KVM see the bare value of CTR_EL0, which
could potentially result in issues with, e.g., JITs or other pieces of
code that are sensitive to misreported cache line sizes.
So let's start trapping cache ID instructions if there is a mismatch,
and expose the sanitised version of CTR_EL0 to guests. Note that CTR_EL0
is treated as an invariant to KVM user space, so update that part as well.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Christoffer Dall [Sun, 1 May 2016 20:29:58 +0000 (22:29 +0200)]
KVM: arm/arm64: Move kvm_is_write_fault to header file
Move this little function to the header files for arm/arm64 so other
code can make use of it directly.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Christoffer Dall [Fri, 4 Jan 2019 12:31:22 +0000 (13:31 +0100)]
KVM: arm/arm64: Rework the timer code to use a timer_map
We are currently emulating two timers in two different ways. When we
add support for nested virtualization in the future, we are going to be
emulating either two timers in two diffferent ways, or four timers in a
single way.
We need a unified data structure to keep track of how we map virtual
state to physical state and we need to cleanup some of the timer code to
operate more independently on a struct arch_timer_context instead of
trying to consider the global state of the VCPU and recomputing all
state.
Co-written with Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Christoffer Dall [Tue, 19 Feb 2019 13:04:30 +0000 (14:04 +0100)]
KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systems
VHE systems don't have to emulate the physical timer, we can simply
assign the EL1 physical timer directly to the VM as the host always
uses the EL2 timers.
In order to minimize the amount of cruft, AArch32 gets definitions for
the physical timer too, but is should be generally unused on this
architecture.
Co-written with Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Christoffer Dall [Tue, 18 Sep 2018 17:08:18 +0000 (10:08 -0700)]
KVM: arm/arm64: timer: Rework data structures for multiple timers
Prepare for having 4 timer data structures (2 for now).
Move loaded to the cpu data structure and not the individual timer
structure, in preparation for assigning the EL1 phys timer as well.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Andre Przywara [Thu, 5 Jul 2018 15:48:23 +0000 (16:48 +0100)]
KVM: arm/arm64: consolidate arch timer trap handlers
At the moment we have separate system register emulation handlers for
each timer register. Actually they are quite similar, and we rely on
kvm_arm_timer_[gs]et_reg() for the actual emulation anyways, so let's
just merge all of those handlers into one function, which just marshalls
the arguments and then hands off to a set of common accessors.
This makes extending the emulation to include EL2 timers much easier.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[Fixed 32-bit VM breakage and reduced to reworking existing code]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[Fixed 32bit host, general cleanup]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Wed, 9 Jan 2019 19:18:40 +0000 (19:18 +0000)]
KVM: arm64: Reuse sys_reg() macro when searching the trap table
Instead of having an open-coded macro, reuse the sys_reg() macro
that does the exact same thing (the encoding is slightly different,
but the ordering property is the same).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Fri, 4 Jan 2019 10:33:42 +0000 (11:33 +0100)]
KVM: arm64: Fix ICH_ELRSR_EL2 sysreg naming
We previously incorrectly named the define for this system register.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Christoffer Dall [Mon, 26 Nov 2018 17:21:22 +0000 (18:21 +0100)]
KVM: arm/arm64: Simplify bg_timer programming
Instead of calling into kvm_timer_[un]schedule from the main kvm
blocking path, test if the VCPU is on the wait queue from the load/put
path and perform the background timer setup/cancel in this path.
This has the distinct advantage that we no longer race between load/put
and schedule/unschedule and programming and canceling of the bg_timer
always happens when the timer state is not loaded.
Note that we must now remove the checks in kvm_timer_blocking that do
not schedule a background timer if one of the timers can fire, because
we no longer have a guarantee that kvm_vcpu_check_block() will be called
before kvm_timer_blocking.
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Christoffer Dall [Tue, 11 Dec 2018 14:26:31 +0000 (15:26 +0100)]
KVM: arm/arm64: Factor out VMID into struct kvm_vmid
In preparation for nested virtualization where we are going to have more
than a single VMID per VM, let's factor out the VMID data into a
separate VMID data structure and change the VMID allocator to operate on
this new structure instead of using a struct kvm.
This also means that udate_vttbr now becomes update_vmid, and that the
vttbr itself is generated on the fly based on the stage 2 page table
base address and the vmid.
We cache the physical address of the pgd when allocating the pgd to
avoid doing the calculation on every entry to the guest and to avoid
calling into potentially non-hyp-mapped code from hyp/EL2.
If we wanted to merge the VMID allocator with the arm64 ASID allocator
at some point in the future, it should actually become easier to do that
after this patch.
Note that to avoid mapping the kvm_vmid_bits variable into hyp, we
simply forego the masking of the vmid value in kvm_get_vttbr and rely on
update_vmid to always assign a valid vmid value (within the supported
range).
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: minor cleanups]
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sat, 19 Jan 2019 15:29:54 +0000 (15:29 +0000)]
arm/arm64: KVM: Statically configure the host's view of MPIDR
We currently eagerly save/restore MPIDR. It turns out to be
slightly pointless:
- On the host, this value is known as soon as we're scheduled on a
physical CPU
- In the guest, this value cannot change, as it is set by KVM
(and this is a read-only register)
The result of the above is that we can perfectly avoid the eager
saving of MPIDR_EL1, and only keep the restore. We just have
to setup the host contexts appropriately at boot time.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Fri, 11 Jan 2019 14:57:58 +0000 (14:57 +0000)]
ARM: KVM: Teach some form of type-safety to kvm_call_hyp
Just like on arm64, and for the same reasons, kvm_call_hyp removes
any form of type safety when calling into HYP. But we can still
try to tell the compiler what we're trying to achieve.
Here, we can add code that would do the function call if it wasn't
guarded by an always-false predicate. Hopefully, the compiler is
dumb enough to do the type checking and clever enough to not emit
the corresponding code...
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Sat, 5 Jan 2019 16:06:53 +0000 (16:06 +0000)]
arm64: KVM: Drop VHE-specific HYP call stub
We now call VHE code directly, without going through any central
dispatching function. Let's drop that code.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Sat, 5 Jan 2019 15:57:56 +0000 (15:57 +0000)]
arm64: KVM: Allow for direct call of HYP functions when using VHE
When running VHE, there is no need to jump via some stub to perform
a "HYP" function call, as there is a single address space.
Let's thus change kvm_call_hyp() and co to perform a direct call
in this case. Although this results in a bit of code expansion,
it allows the compiler to check for type compatibility, something
that we are missing so far.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Marc Zyngier [Sat, 5 Jan 2019 15:49:50 +0000 (15:49 +0000)]
arm/arm64: KVM: Introduce kvm_call_hyp_ret()
Until now, we haven't differentiated between HYP calls that
have a return value and those who don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Andre Przywara [Fri, 6 Jul 2018 08:11:50 +0000 (09:11 +0100)]
clocksource/arm_arch_timer: Store physical timer IRQ number for KVM on VHE
A host running in VHE mode gets the EL2 physical timer as its time
source (accessed using the EL1 sysreg accessors, which get re-directed
to the EL2 sysregs by VHE).
The EL1 physical timer remains unused by the host kernel, allowing us to
pass that on directly to a KVM guest and saves us from emulating this
timer for the guest on VHE systems.
Store the EL1 Physical Timer's IRQ number in
struct arch_timer_kvm_info on VHE systems to allow KVM to use it.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Linus Torvalds [Mon, 21 Jan 2019 00:14:44 +0000 (13:14 +1300)]
Linux 5.0-rc3
Linus Torvalds [Mon, 21 Jan 2019 00:12:03 +0000 (13:12 +1300)]
Merge tag 'pstore-v5.0-rc4' of git://git./linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Fix console ramoops to show the previous boot logs (Sai Prakash
Ranjan)
- Avoid allocation and leak of platform data
* tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Avoid allocation and leak of platform data
pstore/ram: Fix console ramoops to show the previous boot logs
Linus Torvalds [Mon, 21 Jan 2019 00:07:03 +0000 (13:07 +1300)]
Merge tag 'gcc-plugins-v5.0-rc4' of git://git./linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
"Fix ARM per-task stack protector plugin under GCC 9 (Ard Biesheuvel)"
* tag 'gcc-plugins-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: arm_ssp_per_task_plugin: fix for GCC 9+
gcc-plugins: arm_ssp_per_task_plugin: sign extend the SP mask
Linus Torvalds [Sun, 20 Jan 2019 23:52:31 +0000 (12:52 +1300)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix endless loop in nf_tables, from Phil Sutter.
2) Fix cross namespace ip6_gre tunnel hash list corruption, from
Olivier Matz.
3) Don't be too strict in phy_start_aneg() otherwise we might not allow
restarting auto negotiation. From Heiner Kallweit.
4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.
5) Memory leak in act_tunnel_key, from Davide Caratti.
6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.
7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.
8) Missing udplite rehash callbacks, from Alexey Kodanev.
9) Log dirty pages properly in vhost, from Jason Wang.
10) Use consume_skb() in neigh_probe() as this is a normal free not a
drop, from Yang Wei. Likewise in macvlan_process_broadcast().
11) Missing device_del() in mdiobus_register() error paths, from Thomas
Petazzoni.
12) Fix checksum handling of short packets in mlx5, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
bpf: in __bpf_redirect_no_mac pull mac only if present
virtio_net: bulk free tx skbs
net: phy: phy driver features are mandatory
isdn: avm: Fix string plus integer warning from Clang
net/mlx5e: Fix cb_ident duplicate in indirect block register
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
net/mlx5e: Fix wrong error code return on FEC query failure
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
tools: bpftool: Cleanup license mess
bpf: fix inner map masking to prevent oob under speculation
bpf: pull in pkt_sched.h header for tooling to fix bpftool build
selftests: forwarding: Add a test case for externally learned FDB entries
selftests: mlxsw: Test FDB offload indication
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
net: bridge: Mark FDB entries that were added by user as such
mlxsw: spectrum_fid: Update dummy FID index
mlxsw: pci: Return error on PCI reset timeout
mlxsw: pci: Increase PCI SW reset timeout
mlxsw: pci: Ring CQ's doorbell before RDQ's
MAINTAINERS: update email addresses of liquidio driver maintainers
...
Kees Cook [Sun, 20 Jan 2019 22:33:34 +0000 (14:33 -0800)]
pstore/ram: Avoid allocation and leak of platform data
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.
Reported-by: Yue Hu <huyue2@yulong.com>
Fixes:
35da60941e44 ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Ard Biesheuvel [Fri, 18 Jan 2019 10:58:07 +0000 (11:58 +0100)]
gcc-plugins: arm_ssp_per_task_plugin: fix for GCC 9+
GCC 9 reworks the way the references to the stack canary are
emitted, to prevent the value from being spilled to the stack
before the final comparison in the epilogue, defeating the
purpose, given that the spill slot is under control of the
attacker that we are protecting ourselves from.
Since our canary value address is obtained without accessing
memory (as opposed to pre-v7 code that will obtain it from a
literal pool), it is unlikely (although not guaranteed) that
the compiler will spill the canary value in the same way, so
let's just disable this improvement when building with GCC9+.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Ard Biesheuvel [Fri, 18 Jan 2019 10:58:06 +0000 (11:58 +0100)]
gcc-plugins: arm_ssp_per_task_plugin: sign extend the SP mask
The ARM per-task stack protector GCC plugin hits an assert in
the compiler in some case, due to the fact the the SP mask
expression is not sign-extended as it should be. So fix that.
Suggested-by: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Linus Torvalds [Sun, 20 Jan 2019 18:37:16 +0000 (07:37 +1300)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
"Fixes and cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/scsi: Use copy_to_iter() to send control queue response
vhost: return EINVAL if iovecs size does not match the message size
virtio-balloon: tweak config_changed implementation
virtio: don't allocate vqs when names[i] = NULL
virtio_pci: use queue idx instead of array idx to set up the vq
virtio: document virtio_config_ops restrictions
virtio: fix virtio_config_ops description
Linus Torvalds [Sun, 20 Jan 2019 18:35:26 +0000 (07:35 +1300)]
Merge tag 'for-5.0-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of fixes (some of them in testing for a long time):
- fix some test failures regarding cleanup after transaction abort
- revert of a patch that could cause a deadlock
- delayed iput fixes, that can help in ENOSPC situation when there's
low space and a lot data to write"
* tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: wakeup cleaner thread when adding delayed iput
btrfs: run delayed iputs before committing
btrfs: wait on ordered extents on abort cleanup
btrfs: handle delayed ref head accounting cleanup in abort
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
Linus Torvalds [Sun, 20 Jan 2019 18:23:42 +0000 (07:23 +1300)]
Merge tags 'compiler-attributes-for-linus-v5.0-rc3' and 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux
Pull misc clang fixes from Miguel Ojeda:
- A fix for OPTIMIZER_HIDE_VAR from Michael S Tsirkin
- Update clang-format with the latest for_each macro list from Jason
Gunthorpe
* tag 'compiler-attributes-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
* tag 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
clang-format: Update .clang-format with the latest for_each macro list
Florian La Roche [Sat, 19 Jan 2019 15:14:50 +0000 (16:14 +0100)]
fix int_sqrt64() for very large numbers
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway and
that's what also done in int_sqrt() implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Sat, 19 Jan 2019 21:56:05 +0000 (21:56 +0000)]
x86: uaccess: Inhibit speculation past access_ok() in user_access_begin()
Commit
594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
makes the access_ok() check part of the user_access_begin() preceding a
series of 'unsafe' accesses. This has the desirable effect of ensuring
that all 'unsafe' accesses have been range-checked, without having to
pick through all of the callsites to verify whether the appropriate
checking has been made.
However, the consolidated range check does not inhibit speculation, so
it is still up to the caller to ensure that they are not susceptible to
any speculative side-channel attacks for user addresses that ultimately
fail the access_ok() check.
This is an oversight, so use __uaccess_begin_nospec() to ensure that
speculation is inhibited until the access_ok() check has passed.
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Jan 2019 03:27:59 +0000 (15:27 +1200)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes for -rc3.
We've plugged a couple of nasty issues involving KASLR-enabled
kernels, and removed a redundant #define that was introduced as part
of the KHWASAN fixes from akpm at -rc2.
- Fix broken kpti page-table rewrite in bizarre KASLR configuration
- Fix module loading with KASLR
- Remove redundant definition of ARCH_SLAB_MINALIGN"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kasan, arm64: remove redundant ARCH_SLAB_MINALIGN define
arm64: kaslr: ensure randomized quantities are clean to the PoC
arm64: kpti: Update arm64_kernel_use_ng_mappings() when forced on
David S. Miller [Sun, 20 Jan 2019 00:38:12 +0000 (16:38 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-20
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a out-of-bounds access in __bpf_redirect_no_mac, from Willem.
2) Fix bpf_setsockopt to reset sock dst on SO_MARK changes, from Peter.
3) Fix map in map masking to prevent out-of-bounds access under
speculative execution, from Daniel.
4) Fix bpf_setsockopt's SO_MAX_PACING_RATE to support TCP internal
pacing, from Yuchung.
5) Fix json writer license in bpftool, from Thomas.
6) Fix AF_XDP to check if an actually queue exists during umem
setup, from Krzysztof.
7) Several fixes to BPF stackmap's build id handling. Another fix
for bpftool build to account for libbfd variations wrt linking
requirements, from Stanislav.
8) Fix BPF samples build with clang by working around missing asm
goto, from Yonghong.
9) Fix libbpf to retry program load on signal interrupt, from Lorenz.
10) Various minor compile warning fixes in BPF code, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 16 Jan 2019 01:19:22 +0000 (20:19 -0500)]
bpf: in __bpf_redirect_no_mac pull mac only if present
Syzkaller was able to construct a packet of negative length by
redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT:
BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
Read of size
4294967282 at addr
ffff8801d798009c by task syz-executor2/12942
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
memcpy+0x23/0x50 mm/kasan/kasan.c:302
memcpy include/linux/string.h:345 [inline]
skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
__pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
__pskb_copy include/linux/skbuff.h:1053 [inline]
pskb_copy include/linux/skbuff.h:2904 [inline]
skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539
ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline]
sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029
__netdev_start_xmit include/linux/netdevice.h:4325 [inline]
netdev_start_xmit include/linux/netdevice.h:4334 [inline]
xmit_one net/core/dev.c:3219 [inline]
dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235
__dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805
dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
__bpf_tx_skb net/core/filter.c:2016 [inline]
__bpf_redirect_common net/core/filter.c:2054 [inline]
__bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061
____bpf_clone_redirect net/core/filter.c:2094 [inline]
bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066
bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000
The generated test constructs a packet with mac header, network
header, skb->data pointing to network header and skb->len 0.
Redirecting to a sit0 through __bpf_redirect_no_mac pulls the
mac length, even though skb->data already is at skb->network_header.
bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2.
Update the offset calculation to pull only if skb->data differs
from skb->network_header, which is not true in this case.
The test itself can be run only from commit
1cf1cae963c2 ("bpf:
introduce BPF_PROG_TEST_RUN command"), but the same type of packets
with skb at network header could already be built from lwt xmit hooks,
so this fix is more relevant to that commit.
Also set the mac header on redirect from LWT_XMIT, as even after this
change to __bpf_redirect_no_mac that field is expected to be set, but
is not yet in ip_finish_output2.
Fixes:
3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Michael S. Tsirkin [Fri, 18 Jan 2019 04:20:07 +0000 (23:20 -0500)]
virtio_net: bulk free tx skbs
Use napi_consume_skb() to get bulk free. Note that napi_consume_skb is
safe to call in a non-napi context as long as the napi_budget flag is
correct.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 19 Jan 2019 22:33:18 +0000 (10:33 +1200)]
Merge tag 'mips_fixes_5.0_2' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
- Fix IPI handling for Lantiq SoCs, which was broken by changes made
back in v4.12.
- Enable OF/DT serial support in ath79_defconfig to give us working
serial by default.
- Fix 64b builds for the Jazz platform.
- Set up a struct device for the BCM47xx SoC to allow BCM47xx drivers
to perform DMA again following the major DMA mapping changes made in
v4.19.
- Disable MSI on Cavium Octeon systems when the pcie_disable command
line parameter introduced in v3.3 is used, in order to avoid
inadvetently accessing PCIe controller registers despite the command
line.
- Fix a build failure for Cavium Octeon kernels with kexec enabled,
introduced in v4.20.
- Fix a regression in the behaviour of semctl/shmctl/msgctl IPC
syscalls for kernels including n32 support but not o32 support caused
by some cleanup in v3.19.
* tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: OCTEON: fix kexec support
mips: fix n32 compat_ipc_parse_version
Disable MSI also when pcie-octeon.pcie_disable on
MIPS: BCM47XX: Setup struct device for the SoC
MIPS: jazz: fix 64bit build
MIPS: ath79: Enable OF serial ports in the default config
MIPS: lantiq: Use CP0_LEGACY_COMPARE_IRQ
MIPS: lantiq: Fix IPI interrupt handling
Linus Torvalds [Sat, 19 Jan 2019 22:28:46 +0000 (10:28 +1200)]
Merge tag 'devicetree-fixes-for-5.0-2' of git://git./linux/kernel/git/robh/linux
Pull Devicetree fix from Rob Herring:
"A single build fix for powerpc due to device_node.type removal"
* tag 'devicetree-fixes-for-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
powerpc: chrp: Use of_node_is_type to access device_type
Linus Torvalds [Sat, 19 Jan 2019 22:24:30 +0000 (10:24 +1200)]
Merge tag 'libnvdimm-fixes-5.0-rc3' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A crash fix, a build warning fix, a miscellaneous small cleanups.
In case anyone is looking for them, there was a regression caught by
testing that caused two patches to be dropped from this update. Those
patches have been reworked and will soak for another week / re-target
5.0-rc4.
- Fix driver initialization crash due to the inability to report an
'error' state for a DIMM's security capability.
- Build warning fix for little-endian ARM64 builds
- Fix a potential race between the EDAC driver's usage of the NFIT
SMBIOS id for a DIMM and the driver shutdown path.
- A small collection of one-line benign cleanups for duplicate
variable assignments, a duplicate header include and a mis-typed
function argument"
* tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/security: Fix nvdimm_security_state() state request selection
acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set()
acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMs
nfit: Mark some functions as __maybe_unused
ACPI/nfit: delete the function to_acpi_nfit_desc
ACPI/nfit: delete the redundant header file
Linus Torvalds [Sat, 19 Jan 2019 21:58:52 +0000 (09:58 +1200)]
Merge tag 'linux-watchdog-5.0-rc-fixes' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- mt7621_wdt/rt2880_wdt: Fix compilation problem
- tqmx86: Fix a couple IS_ERR() vs NULL bugs
* tag 'linux-watchdog-5.0-rc-fixes' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: tqmx86: Fix a couple IS_ERR() vs NULL bugs
watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
Linus Torvalds [Sat, 19 Jan 2019 21:27:38 +0000 (09:27 +1200)]
Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes for SUNRPC bugs, with a single v4.2
copy_file_range() fix mixed in.
Stable bugfixes:
- Fix TCP receive code on archs with flush_dcache_page()
Other bugfixes:
- Fix error code in rpcrdma_buffer_create()
- Fix a double free in rpcrdma_send_ctxs_create()
- Fix kernel BUG at kernel/cred.c:825
- Fix unnecessary retry in nfs42_proc_copy_file_range()
- Ensure rq_bytes_sent is reset before request transmission
- Ensure we respect the RPCSEC_GSS sequence number limit
- Address Kerberos performance/behavior regression"
* tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Address Kerberos performance/behavior regression
SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit
SUNRPC: Ensure rq_bytes_sent is reset before request transmission
NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
sunrpc: kernel BUG at kernel/cred.c:825!
SUNRPC: Fix TCP receive code on archs with flush_dcache_page()
xprtrdma: Double free in rpcrdma_sendctxs_create()
xprtrdma: Fix error code in rpcrdma_buffer_create()
Linus Torvalds [Sat, 19 Jan 2019 21:15:04 +0000 (09:15 +1200)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of 17 fixes. Most of these are minor or trivial.
The one fix that may be serious is the isci one: the bug can cause hba
parameters to be set from uninitialized memory. I don't think it's
exploitable, but you never know"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: add wait_for_completion()
scsi: qla1280: set 64bit coherent mask
scsi: ufs: Fix geometry descriptor size
scsi: megaraid_sas: Retry reads of outbound_intr_status reg
scsi: qedi: Add ep_state for login completion on un-reachable targets
scsi: ufs: Fix system suspend status
scsi: qla2xxx: Use correct number of vectors for online CPUs
scsi: hisi_sas: Set protection parameters prior to adding SCSI host
scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
scsi: isci: initialize shost fully before calling scsi_add_host()
scsi: lpfc: lpfc_sli: Mark expected switch fall-throughs
scsi: smartpqi_init: fix boolean expression in pqi_device_remove_start
scsi: core: Synchronize request queue PM status only on successful resume
scsi: pm80xx: reduce indentation
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
scsi: megaraid_sas: correct an info message
scsi: target/iscsi: fix error msg typo when create lio_qr_cache failed
scsi: sd: Fix cache_type_store()
Linus Torvalds [Sat, 19 Jan 2019 21:12:50 +0000 (09:12 +1200)]
Merge tag 'for-linus-
20190118' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- block size setting fixes for loop/nbd (Jan Kara)
- md bio_alloc_mddev() cleanup (Marcos)
- Ensure we don't lose the REQ_INTEGRITY flag (Ming)
- Two NVMe fixes by way of Christoph:
- Fix NVMe IRQ calculation (Ming)
- Uninitialized variable in nvmet-tcp (Sagi)
- BFQ comment fix (Paolo)
- License cleanup for recently added blk-mq-debugfs-zoned (Thomas)
* tag 'for-linus-
20190118' of git://git.kernel.dk/linux-block:
block: Cleanup license notice
nvme-pci: fix nvme_setup_irqs()
nvmet-tcp: fix uninitialized variable access
block: don't lose track of REQ_INTEGRITY flag
blockdev: Fix livelocks on loop device
nbd: Use set_blocksize() to set device blocksize
md: Make bio_alloc_mddev use bio_alloc_bioset
block, bfq: fix comments on __bfq_deactivate_entity
Jason Gunthorpe [Fri, 18 Jan 2019 22:57:04 +0000 (22:57 +0000)]
clang-format: Update .clang-format with the latest for_each macro list
Re-run the shell fragment that generated the original list. In particular
this adds the missing xarray related functions.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Camelia Groza [Thu, 17 Jan 2019 12:22:36 +0000 (14:22 +0200)]
net: phy: phy driver features are mandatory
Since phy driver features became a link_mode bitmap, phy drivers that
don't have a list of features configured will cause the kernel to crash
when probed.
Prevent the phy driver from registering if the features field is missing.
Fixes:
719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Thu, 10 Jan 2019 05:41:08 +0000 (22:41 -0700)]
isdn: avm: Fix string plus integer warning from Clang
A recent commit in Clang expanded the -Wstring-plus-int warning, showing
some odd behavior in this file.
drivers/isdn/hardware/avm/b1.c:426:30: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
cinfo->version[j] = "\0\0" + 1;
~~~~~~~^~~
drivers/isdn/hardware/avm/b1.c:426:30: note: use array indexing to silence this warning
cinfo->version[j] = "\0\0" + 1;
^
& [ ]
1 warning generated.
This is equivalent to just "\0". Nick pointed out that it is smarter to
use "" instead of "\0" because "" is used elsewhere in the kernel and
can be deduplicated at the linking stage.
Link: https://github.com/ClangBuiltLinux/linux/issues/309
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Fri, 18 Jan 2019 14:12:10 +0000 (08:12 -0600)]
powerpc: chrp: Use of_node_is_type to access device_type
Commit
8ce5f8415753 ("of: Remove struct device_node.type pointer")
removed struct device_node.type pointer, but the conversion to use
of_node_is_type() accessor was missed in chrp_init_IRQ().
Fixes:
8ce5f8415753 ("of: Remove struct device_node.type pointer")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rob Herring <robh@kernel.org>
David S. Miller [Sat, 19 Jan 2019 02:23:23 +0000 (18:23 -0800)]
Merge tag 'mlx5-fixes-2019-01-18' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-01-18
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.18
('net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames')
The patch doesn't apply cleanly to 4.18.y, but it is very simple to
resolve, what should be the procedure here ?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Britstein [Wed, 19 Dec 2018 05:36:51 +0000 (07:36 +0200)]
net/mlx5e: Fix cb_ident duplicate in indirect block register
Previously the identifier used for indirect block callback registry
and for block rule cb registry (when done via indirect blocks) was the
pointer to the tunnel netdev we were interested in receiving updates on.
This worked fine if a single PF existed that registered one callback for
the tunnel netdev of interest. However, if multiple PFs are in place then
the 2nd PF tries to register with the same tunnel netdev identifier. This
leads to EEXIST errors and/or incorrect cb deletions.
Prevent this conflict by using the rpriv pointer as the identifier for
netdev indirect block cb registry, allowing each PF to register a unique
callback per tunnel netdev. For block cb registry, the same PF may
register multiple cbs to the same block if using TC shared blocks.
Instead of the rpriv, use the pointer to the allocated indr_priv data as
the identifier here. This means that there can be a unique block callback
for each PF/tunnel netdev combo.
Fixes:
f5bc2c5de101 ("net/mlx5e: Support TC indirect block notifications
for eswitch uplink reprs")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 8 Nov 2018 10:06:53 +0000 (12:06 +0200)]
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
For representors, the TX dropped counter is not folded from the
per-ring counters. Fix it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shay Agroskin [Sun, 9 Dec 2018 10:00:13 +0000 (12:00 +0200)]
net/mlx5e: Fix wrong error code return on FEC query failure
Advertised and configured FEC query failure resulted in printing
wrong error code.
Fixes:
6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cong Wang [Tue, 4 Dec 2018 06:14:04 +0000 (22:14 -0800)]
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, we have a switch which pads non-zero octets, this
causes kernel hardware checksum fault repeatedly.
Prior to:
commit '
88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE ...")'
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
headers, it is not worthy the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.
Fixes:
88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Thomas Gleixner [Thu, 17 Jan 2019 23:14:24 +0000 (00:14 +0100)]
tools: bpftool: Cleanup license mess
Precise and non-ambiguous license information is important. The recent
relicensing of the bpftools introduced a license conflict.
The files have now:
SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause
and
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version
Amazingly about 20 people acked that change and neither they nor the
committer noticed. Oh well.
Digging deeper: The files were imported from the iproute2 repository with
the GPL V2 or later boiler plate text in commit
b66e907cfee2 ("tools:
bpftool: copy JSON writer from iproute2 repository")
Looking at the iproute2 repository at
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
the following commit is the equivivalent:
commit
d9d8c839 ("json_writer: add SPDX Identifier (GPL-2/BSD-2)")
That commit explicitly removes the boiler plate and relicenses the code
uner GPL-2.0-only and BSD-2-Clause. As Steven wrote the original code and
also the relicensing commit, it's assumed that the relicensing was intended
to do exaclty that. Just the kernel side update failed to remove the boiler
plate. Do so now.
Fixes:
907b22365115 ("tools: bpftool: dual license all files")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Sean Young <sean@mess.org>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: David Calavera <david.calavera@gmail.com>
Cc: Andrey Ignatov <rdna@fb.com>
Cc: Joe Stringer <joe@wand.net.nz>
Cc: David Ahern <dsahern@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Petar Penkov <ppenkov@stanford.edu>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
CC: okash.khawaja@gmail.com
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 17 Jan 2019 15:34:45 +0000 (16:34 +0100)]
bpf: fix inner map masking to prevent oob under speculation
During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.
In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.
Before patch (test_verifier's 'map in map access' dump):
# bpftool prog dump xla id 3
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:4]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking for
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+11
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!)
22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite
23: (67) r0 <<= 3 | map->unpriv_array set.
24: (0f) r0 += r1 |
25: (05) goto pc+1 |
26: (b7) r0 = 0 |
27: (b7) r0 = 0
28: (95) exit
After patch:
# bpftool prog dump xla id 1
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:2]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking due to
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+12
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) |
22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map
23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking
24: (67) r0 <<= 3 | for map->unpriv_array.
25: (0f) r0 += r1 |
26: (05) goto pc+1 |
27: (b7) r0 = 0 |
28: (b7) r0 = 0
29: (95) exit
Fixes:
b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 17 Jan 2019 15:15:09 +0000 (16:15 +0100)]
bpf: pull in pkt_sched.h header for tooling to fix bpftool build
Dan reported that bpftool does not compile for him:
$ make tools/bpf
DESCEND bpf
Auto-detecting system features:
.. libbfd: [ on ]
.. disassembler-four-args: [ OFF ]
DESCEND bpftool
Auto-detecting system features:
.. libbfd: [ on ]
.. disassembler-four-args: [ OFF ]
CC /opt/linux.git/tools/bpf/bpftool/net.o
In file included from /opt/linux.git/tools/include/uapi/linux/pkt_cls.h:6:0,
from /opt/linux.git/tools/include/uapi/linux/tc_act/tc_bpf.h:14,
from net.c:13:
net.c: In function 'show_dev_tc_bpf':
net.c:164:21: error: 'TC_H_CLSACT' undeclared (first use in this function)
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
[...]
Fix it by importing pkt_sched.h header copy into tooling
infrastructure.
Fixes:
49a249c38726 ("tools/bpftool: copy a few net uapi headers to tools directory")
Fixes:
f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
Reported-by: Dan Gilson <dan_gilson@yahoo.com>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=202315
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Fri, 18 Jan 2019 23:12:16 +0000 (15:12 -0800)]
Merge branch 'mlxsw-fixes'
Ido Schimmel says:
====================
mlxsw: Various fixes
This patchset contains small fixes in mlxsw and one fix in the bridge
driver.
Patches #1-#4 perform small adjustments in PCI and FID code following
recent tests that were performed on the Spectrum-2 ASIC.
Patch #5 fixes the bridge driver to mark FDB entries that were added by
user as such. Otherwise, these entries will be ignored by underlying
switch drivers.
Patch #6 fixes a long standing issue in mlxsw where the driver
incorrectly programmed static FDB entries as both static and sticky.
Patches #7-#8 add test cases for above mentioned bugs.
Please consider patches #1, #2 and #4 for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:03 +0000 (15:58 +0000)]
selftests: forwarding: Add a test case for externally learned FDB entries
Test that externally learned FDB entries can roam, but not age out.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:02 +0000 (15:58 +0000)]
selftests: mlxsw: Test FDB offload indication
Test that externally learned FDB entries added from user space are
marked as offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:01 +0000 (15:58 +0000)]
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
The driver currently treats static FDB entries as both static and
sticky. This is incorrect and prevents such entries from being roamed to
a different port via learning.
Fix this by configuring static entries with ageing disabled and roaming
enabled.
In net-next we can add proper support for the newly introduced 'sticky'
flag.
Fixes:
56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:00 +0000 (15:58 +0000)]
net: bridge: Mark FDB entries that were added by user as such
Externally learned entries can be added by a user or by a switch driver
that is notifying the bridge driver about entries that were learned in
hardware.
In the first case, the entries are not marked with the 'added_by_user'
flag, which causes switch drivers to ignore them and not offload them.
The 'added_by_user' flag can be set on externally learned FDB entries
based on the 'swdev_notify' parameter in br_fdb_external_learn_add(),
which effectively means if the created / updated FDB entry was added by
a user or not.
Fixes:
816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:59 +0000 (15:57 +0000)]
mlxsw: spectrum_fid: Update dummy FID index
When using a tc flower action of egress mirred redirect, the driver adds
an implicit FID setting action. This implicit action sets a dummy FID to
the packet and is used as part of a design for trapping unmatched flows
in OVS. While this implicit FID setting action is supposed to be a NOP
when a redirect action is added, in Spectrum-2 the FID record is
consulted as the dummy FID index is an 802.1D FID index and the packet
is dropped instead of being redirected.
Set the dummy FID index value to be within 802.1Q range. This satisfies
both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
as an 802.1Q FID and will then follow the redirect action.
Fixes:
c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:57 +0000 (15:57 +0000)]
mlxsw: pci: Return error on PCI reset timeout
Return an appropriate error in the case when the driver timeouts on waiting
for firmware to go out of PCI reset.
Fixes:
233fa44bd67a ("mlxsw: pci: Implement reset done check")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:56 +0000 (15:57 +0000)]
mlxsw: pci: Increase PCI SW reset timeout
Spectrum-2 PHY layer introduces a calibration period which is a part of the
Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
the firmware to come out of boot. This does not increase system boot time
in cases where the firmware PHY calibration process is done quickly.
Fixes:
c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:57:55 +0000 (15:57 +0000)]
mlxsw: pci: Ring CQ's doorbell before RDQ's
When a packet should be trapped to the CPU the device consumes a WQE
(work queue element) from an RDQ (receive descriptor queue) and copies
the packet to the address specified in the WQE. The device then tries to
post a CQE (completion queue element) that contains various metadata
(e.g., ingress port) about the packet to a CQ (completion queue).
In case the device managed to consume a WQE, but did not manage to post
the corresponding CQE, it will get stuck. This unlikely situation can be
triggered due to the scheme the driver is currently using to process
CQEs.
The driver will consume up to 512 CQEs at a time and after processing
each corresponding WQE it will ring the RDQ's doorbell, letting the
device know that a new WQE was posted for it to consume. Only after
processing all the CQEs (up to 512), the driver will ring the CQ's
doorbell, letting the device know that new ones can be posted.
Fix this by having the driver ring the CQ's doorbell for every processed
CQE, but before ringing the RDQ's doorbell. This guarantees that
whenever we post a new WQE, there is a corresponding CQE available. Copy
the currently processed CQE to prevent the device from overwriting it
with a new CQE after ringing the doorbell.
Note that the driver still arms the CQ only after processing all the
pending CQEs, so that interrupts for this CQ will only be delivered
after the driver finished its processing.
Before commit
8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1
and version 2") the issue was virtually impossible to trigger since the
number of CQEs was twice the number of WQEs and the number of CQEs
processed at a time was equal to the number of available WQEs.
Fixes:
8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1 and version 2")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Tested-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 17 Jan 2019 18:07:45 +0000 (18:07 +0000)]
MAINTAINERS: update email addresses of liquidio driver maintainers
Update email addresses of liquidio driver maintainers. Also remove a
former maintainer.
Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Acked-by: Derek Chickles <dchickles@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Thu, 17 Jan 2019 17:02:18 +0000 (18:02 +0100)]
net: Fix typo in NET_FAILOVER help text
"also enables" should not be spelled as one word.
Fixes:
cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross Lagerwall [Thu, 17 Jan 2019 15:34:38 +0000 (15:34 +0000)]
net: Fix usage of pskb_trim_rcsum
In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Jan 2019 19:34:10 +0000 (07:34 +1200)]
Merge tag 'media/v5.0-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a regression fix at v4l2 core, with affects multi-plane streams
- a fix at vim2m driver
* tag 'media/v5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vim2m: only cancel work if it is for right context
media: v4l: ioctl: Validate num_planes for debug messages
media: v4l: ioctl: Validate num_planes before using it
media: v4l2-ioctl: Clear only per-plane reserved fields
Linus Torvalds [Fri, 18 Jan 2019 19:26:16 +0000 (07:26 +1200)]
Merge tag 'pci-v5.0-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas::
- Fix PCI kconfig menu organization (Rob Herring)
- Fix pci_alloc_irq_vectors_affinity() error return to allow "reduce
and retry" for drivers using IRQ sets (Ming Lei)
- Fix "pci=disable_acs_redir" initdata use-after-free problem (Logan
Gunthorpe)
* tag 'pci-v5.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter
PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
PCI: Fix PCI kconfig menu organization
Linus Torvalds [Fri, 18 Jan 2019 19:23:25 +0000 (07:23 +1200)]
Merge tag 'i3c/fixes-for-5.0-rc3' of git://git./linux/kernel/git/i3c/linux
Pull i3c fixes from Boris Brezillon:
- Fix the error check on master->sysclk val in the Cadence driver
- Fix reattach implementation in the Designware driver
* tag 'i3c/fixes-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: dw-i3c-master: fix i3c_attach/reattach
i3c: master: Fix an error checking typo in 'cdns_i3c_master_probe()'
Linus Torvalds [Fri, 18 Jan 2019 19:21:43 +0000 (07:21 +1200)]
Merge tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
"Raw NAND changes:
- jz4740: fix a compilation warning
- fsmc: fix a regression introduced by ->select_chip() deprecation
- denali: fix a regression introduced by NAND_KEEP_TIMINGS addition"
* tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd:
mtd: rawnand: denali: get ->setup_data_interface() working again
mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
mtd: rawnand: fsmc: Keep bank enable bit set
Linus Torvalds [Fri, 18 Jan 2019 19:17:19 +0000 (07:17 +1200)]
Merge tag 'regmap-fix-v5.0-rc2' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"The cleanups for the way we handle type information introduced during
the merge window revealed that we'd been abusing the irq APIs for a
long time, causing breakage for systems.
This has a couple of minimal fixes for that which restore the previous
behaviour for the time being, we'll fix it properly for v5.1 but
that'd be a bit much to do as a bug fix"
* tag 'regmap-fix-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: do not write mask register if mask_base is zero
regmap: regmap-irq: silently ignore unsupported type settings
Thomas Petazzoni [Wed, 16 Jan 2019 09:53:58 +0000 (10:53 +0100)]
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
The current code in __mdiobus_register() doesn't properly handle
failures returned by the devm_gpiod_get_optional() call: it returns
immediately, without unregistering the device that was added by the
call to device_register() earlier in the function.
This leaves a stale device, which then causes a NULL pointer
dereference in the code that handles deferred probing:
[ 1.489982] Unable to handle kernel NULL pointer dereference at virtual address
00000074
[ 1.498110] pgd = (ptrval)
[ 1.500838] [
00000074] *pgd=
00000000
[ 1.504432] Internal error: Oops: 17 [#1] SMP ARM
[ 1.509133] Modules linked in:
[ 1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
[ 1.520708] Hardware name: Xilinx Zynq Platform
[ 1.525261] Workqueue: events deferred_probe_work_func
[ 1.530403] PC is at klist_next+0x10/0xfc
[ 1.534403] LR is at device_for_each_child+0x40/0x94
[ 1.539361] pc : [<
c0683fbc>] lr : [<
c0455d90>] psr:
200e0013
[ 1.545628] sp :
ceeefe68 ip :
00000001 fp :
ffffe000
[ 1.550863] r10:
00000000 r9 :
c0c66790 r8 :
00000000
[ 1.556079] r7 :
c0457d44 r6 :
00000000 r5 :
ceeefe8c r4 :
cfa2ec78
[ 1.562604] r3 :
00000064 r2 :
c0457d44 r1 :
ceeefe8c r0 :
00000064
[ 1.569129] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 1.576263] Control:
18c5387d Table:
0ed7804a DAC:
00000051
[ 1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
[ 1.588280] Stack: (0xceeefe68 to 0xceef0000)
[ 1.592630] fe60:
cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
[ 1.600814] fe80:
00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
[ 1.608998] fea0:
cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
[ 1.617182] fec0:
cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
[ 1.625366] fee0:
c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
[ 1.633550] ff00:
c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
[ 1.641734] ff20:
cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
[ 1.649936] ff40:
cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
[ 1.658120] ff60:
ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
[ 1.666304] ff80:
cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
[ 1.674488] ffa0:
00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[ 1.682671] ffc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.690855] ffe0:
00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 1.699058] [<
c0683fbc>] (klist_next) from [<
c0455d90>] (device_for_each_child+0x40/0x94)
[ 1.707241] [<
c0455d90>] (device_for_each_child) from [<
c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[ 1.716476] [<
c0457d7c>] (device_reorder_to_tail) from [<
c0455dac>] (device_for_each_child+0x5c/0x94)
[ 1.725692] [<
c0455dac>] (device_for_each_child) from [<
c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[ 1.734927] [<
c0457d7c>] (device_reorder_to_tail) from [<
c0457df4>] (device_pm_move_to_tail+0x28/0x40)
[ 1.744235] [<
c0457df4>] (device_pm_move_to_tail) from [<
c045a27c>] (deferred_probe_work_func+0x58/0x8c)
[ 1.753746] [<
c045a27c>] (deferred_probe_work_func) from [<
c013893c>] (process_one_work+0x210/0x4fc)
[ 1.762888] [<
c013893c>] (process_one_work) from [<
c0139a30>] (worker_thread+0x2a8/0x5c0)
[ 1.771072] [<
c0139a30>] (worker_thread) from [<
c013e598>] (kthread+0x14c/0x154)
[ 1.778482] [<
c013e598>] (kthread) from [<
c01010e8>] (ret_from_fork+0x14/0x2c)
[ 1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
[ 1.790739] ffa0:
00000000 00000000 00000000 00000000
[ 1.798923] ffc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.807107] ffe0:
00000000 00000000 00000000 00000000 00000013 00000000
[ 1.813724] Code:
e92d47f0 e1a05000 e8900048 e1a00003 (
e5937010)
[ 1.819844] ---[ end trace
3c2c0c8b65399ec9 ]---
The actual error that we had from devm_gpiod_get_optional() was
-EPROBE_DEFER, due to the GPIO being provided by a driver that is
probed later than the Ethernet controller driver.
To fix this, we simply add the missing device_del() invocation in the
error path.
Fixes:
69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otto Sabart [Mon, 14 Jan 2019 11:56:36 +0000 (12:56 +0100)]
doc: net: fix bad references to network drivers
Fix "reference to nonexisting document" warnings.
Fixes:
b255e500c8dc ("net: documentation: build a directory structure for drivers")
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Jan 2019 17:55:42 +0000 (05:55 +1200)]
Merge tag 'powerpc-5.0-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A couple of weeks of fixes.
There's one fix for an oops on Power9 machines with Open CAPI
adapters.
And a fix for probable memory corruption in some of the new NPU code,
caught by smatch though and not seen in the wild.
Plus a few other minor fixes.
There's one non-fix which is the perf_regs change. That was sent
during the merge window but I accidentally only merged the first of
two patches in the series. It's been in linux-next so hopefully
doesn't conflict with anything in acme's tree.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Breno Leitao,
Christian Lamparter, Christophe Leroy, Dan Carpenter, Frederic Barrat,
Greg Kurz, Jason A. Donenfeld, Madhavan Srinivasan"
* tag 'powerpc-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/syscalls: Fix syscall tracing
powerpc/pseries: Fix build break due to pnv_npu2_init()
powerpc/4xx/ocm: Fix fix for phys_addr_t printf warnings
powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()
powerpc/perf: Update perf_regs structure to include MMCRA
Linus Torvalds [Fri, 18 Jan 2019 17:53:41 +0000 (05:53 +1200)]
Merge tag 'for-linus-5.0-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- Several fixes for the Xen pvcalls drivers (1 fix for the backend and
8 for the frontend).
- A fix for a rather longstanding bug in the Xen sched_clock()
interface which led to weird time jumps when migrating the system.
- A fix for avoiding accesses to x2apic MSRs in Xen PV guests.
* tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix x86 sched_clock() interface for xen
pvcalls-front: fix potential null dereference
always clear the X2APIC_ENABLE bit for PV guest
pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
xen/pvcalls: remove set but not used variable 'intf'
pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
pvcalls-front: don't return error when the ring is full
pvcalls-front: properly allocate sk
pvcalls-front: don't try to free unallocated rings
pvcalls-front: read all data before closing the connection
Linus Torvalds [Fri, 18 Jan 2019 17:48:43 +0000 (05:48 +1200)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Zero-length DMA mapping in caam
- Invalidly mapping stack memory for DMA in talitos
- Use after free in cavium/nitrox
- Key parsing in authenc
- Undefined shift in sm3
- Bogus completion call in authencesn
- SHA support detection in caam"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sm3 - fix undefined shift by >= width of value
crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
crypto: talitos - reorder code in talitos_edesc_alloc()
crypto: adiantum - initialize crypto_spawn::inst
crypto: cavium/nitrox - Use after free in process_response_list()
crypto: authencesn - Avoid twice completion call in decrypt path
crypto: caam - fix SHA support detection
crypto: caam - fix zero-length buffer DMA mapping
crypto: ccree - convert to use crypto_authenc_extractkeys()
crypto: bcm - convert to use crypto_authenc_extractkeys()
crypto: authenc - fix parsing key with misaligned rta_len
Linus Torvalds [Fri, 18 Jan 2019 17:46:00 +0000 (05:46 +1200)]
Merge tag 'acpi-5.0-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI initialization ordering issue introduced in the 4.17
time frame and causing functional problems to appear on multiple
systems and fix some fallout of the recent change to enable building
kernels with ACPI support and without PCI.
Specifics:
- Restore the ACPI initialization ordering changed implicitly by the
module-level AML handling rework during the 4.17 development cycle
that caused the EC address space handler based on information from
ECDT to be set up before loading AML definition blocks, making it
effectively not accessible by AML on some systems that don't work
as expected any more (Rafael Wysocki).
- Add direct dependencies on PCI to Kconfig in multiple places for
code that depends on both ACPI and PCI, but the PCI dependency was
implicitly satisfied by the ACPI dependency before, to prevent
invalid configurations from being created, for example by
randconfig (Sinan Kaya)"
* tag 'acpi-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
drivers: thermal: int340x_thermal: Make PCI dependency explicit
x86/intel/lpss: Make PCI dependency explicit
platform/x86: apple-gmux: Make PCI dependency explicit
platform/x86: intel_pmc: Make PCI dependency explicit
platform/x86: intel_ips: make PCI dependency explicit
vga-switcheroo: make PCI dependency explicit
ata: pata_acpi: Make PCI dependency explicit
ACPI / LPSS: Make PCI dependency explicit
Linus Torvalds [Fri, 18 Jan 2019 17:43:05 +0000 (05:43 +1200)]
Merge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux
Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
- fix stack memory leak in omap2fb driver (Vlad Tsyrklevich)
- fix OF node name handling v4.20 regression in offb driver (Rob
Herring)
- convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into a
kernel parameter (Peter Rosin)
* tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux:
fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option
fbdev: offb: Fix OF node name handling
omap2fb: Fix stack memory disclosure
Linus Torvalds [Fri, 18 Jan 2019 17:41:38 +0000 (05:41 +1200)]
Merge tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm
Pull drm update from Dave Airlie:
"Add nouveau TU102 (RTX 2080 Ti) support"
* tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/core: recognise TU102
Josef Bacik [Fri, 11 Jan 2019 15:21:02 +0000 (10:21 -0500)]
btrfs: wakeup cleaner thread when adding delayed iput
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path. Delaying iputs
means we are potentially delaying the eviction of an inode and it's
respective space. The cleaner thread only gets woken up every 30
seconds, or when we require space. If there are a lot of inodes that
need to be deleted we could induce a serious amount of latency while we
wait for these inodes to be evicted. So instead wakeup the cleaner if
it's not already awake to process any new delayed iputs we add to the
list. If we suddenly need space we will less likely be backed up
behind a bunch of inodes that are waiting to be deleted, and we could
possibly free space before we need to get into the flushing logic which
will save us some latency.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 11 Jan 2019 15:21:01 +0000 (10:21 -0500)]
btrfs: run delayed iputs before committing
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd. So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 21 Nov 2018 19:05:45 +0000 (14:05 -0500)]
btrfs: wait on ordered extents on abort cleanup
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.
generic/475 can produce this warning:
[ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394
[ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.193082] RSP: 0018:
ffffb1ab86163d98 EFLAGS:
00010286
[ 8531.194198] RAX:
ffff9f3449494d18 RBX:
ffff9f34a2695000 RCX:
0000000000000000
[ 8531.195629] RDX:
0000000000000002 RSI:
0000000000000001 RDI:
0000000000000000
[ 8531.197315] RBP:
ffff9f344e930000 R08:
0000000000000001 R09:
0000000000000000
[ 8531.199095] R10:
0000000000000000 R11:
ffff9f34494d4ff8 R12:
ffffb1ab86163dc0
[ 8531.200870] R13:
ffff9f344e9300b0 R14:
ffffb1ab86163db8 R15:
0000000000000000
[ 8531.202707] FS:
00007fc68e949fc0(0000) GS:
ffff9f34bd800000(0000)knlGS:
0000000000000000
[ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8531.205942] CR2:
00007ffde8114dd8 CR3:
000000002dfbd000 CR4:
00000000000006e0
[ 8531.207516] Call Trace:
[ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs]
[ 8531.210209] ? wait_for_completion+0x5b/0x190
[ 8531.211303] close_ctree+0x157/0x350 [btrfs]
[ 8531.212412] generic_shutdown_super+0x64/0x100
[ 8531.213485] kill_anon_super+0x14/0x30
[ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8531.215539] deactivate_locked_super+0x29/0x60
[ 8531.216633] cleanup_mnt+0x3b/0x70
[ 8531.217497] task_work_run+0x98/0xc0
[ 8531.218397] exit_to_usermode_loop+0x83/0x90
[ 8531.219324] do_syscall_64+0x15b/0x180
[ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 8531.221286] RIP: 0033:0x7fc68e5e4d07
[ 8531.225621] RSP: 002b:
00007ffde8116608 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[ 8531.227512] RAX:
0000000000000000 RBX:
00005580c2175970 RCX:
00007fc68e5e4d07
[ 8531.229098] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
00005580c2175b80
[ 8531.230730] RBP:
0000000000000000 R08:
00005580c2175ba0 R09:
00007ffde8114e80
[ 8531.232269] R10:
0000000000000000 R11:
0000000000000246 R12:
00005580c2175b80
[ 8531.233839] R13:
00007fc68eac61c4 R14:
00005580c2175a68 R15:
0000000000000000
Leaving a tree in the rb-tree:
3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855 iput(root->ino_cache_inode);
3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add stacktrace ]
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 21 Nov 2018 19:05:41 +0000 (14:05 -0500)]
btrfs: handle delayed ref head accounting cleanup in abort
We weren't doing any of the accounting cleanup when we aborted
transactions. Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.
The test generic/475 on a 2G VM can trigger the problems eg.:
[ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
[ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
[ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
[ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
[ 8502.160623] RSP: 0018:
ffffb1ab84b93de8 EFLAGS:
00010206
[ 8502.161906] RAX:
0000000001000000 RBX:
ffff9f34b1756400 RCX:
0000000000000000
[ 8502.163448] RDX:
0000000000000002 RSI:
0000000000000001 RDI:
ffff9f34b1755400
[ 8502.164906] RBP:
ffff9f34b7e8c000 R08:
0000000000000001 R09:
0000000000000000
[ 8502.166716] R10:
0000000000000000 R11:
0000000000000001 R12:
ffff9f34b7e8c108
[ 8502.168498] R13:
ffff9f34b7e8c158 R14:
0000000000000000 R15:
dead000000000100
[ 8502.170296] FS:
00007fb1cf15ffc0(0000) GS:
ffff9f34bd400000(0000) knlGS:
0000000000000000
[ 8502.172439] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8502.173669] CR2:
00007fb1ced507b0 CR3:
000000002f7a6000 CR4:
00000000000006f0
[ 8502.175094] Call Trace:
[ 8502.175759] close_ctree+0x17f/0x350 [btrfs]
[ 8502.176721] generic_shutdown_super+0x64/0x100
[ 8502.177702] kill_anon_super+0x14/0x30
[ 8502.178607] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8502.179602] deactivate_locked_super+0x29/0x60
[ 8502.180595] cleanup_mnt+0x3b/0x70
[ 8502.181406] task_work_run+0x98/0xc0
[ 8502.182255] exit_to_usermode_loop+0x83/0x90
[ 8502.183113] do_syscall_64+0x15b/0x180
[ 8502.183919] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Corresponding to
release_global_block_rsv() {
...
WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
CC: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add log dump ]
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Jan 2019 14:02:23 +0000 (15:02 +0100)]
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
This reverts commit
e73e81b6d0114d4a303205a952ab2e87c44bd279.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workque, effectively deadlocking
12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.
Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rafael J. Wysocki [Fri, 18 Jan 2019 10:17:16 +0000 (11:17 +0100)]
Merge branch 'acpi-pci'
* acpi-pci:
drivers: thermal: int340x_thermal: Make PCI dependency explicit
x86/intel/lpss: Make PCI dependency explicit
platform/x86: apple-gmux: Make PCI dependency explicit
platform/x86: intel_pmc: Make PCI dependency explicit
platform/x86: intel_ips: make PCI dependency explicit
vga-switcheroo: make PCI dependency explicit
ata: pata_acpi: Make PCI dependency explicit
ACPI / LPSS: Make PCI dependency explicit
Masahiro Yamada [Fri, 18 Jan 2019 05:30:38 +0000 (14:30 +0900)]
mtd: rawnand: denali: get ->setup_data_interface() working again
Commit
7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to
nand_controller_ops") missed to invert the if-conditonal for denali.
Since then, the Denali NAND driver cannnot invoke setup_data_interface.
Fixes:
7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to nand_controller_ops")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Luc Van Oostenryck [Thu, 17 Jan 2019 17:39:07 +0000 (18:39 +0100)]
mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
The function jz_nand_ioremap_resource() needs a pointer to an __iomem
pointer as its last argument but this argument is declared as:
void * __iomem *base
Fix this by using the correct declaration:
void __iomem **base
which then also removes the following Sparse's warnings:
282:15: warning: incorrect type in assignment (different address spaces)
282:15: expected void *[noderef] <asn:2>
282:15: got void [noderef] <asn:2> *
322:57: warning: incorrect type in argument 4 (different address spaces)
322:57: expected void *[noderef] <asn:2> *base
322:57: got void [noderef] <asn:2> **
402:67: warning: incorrect type in argument 4 (different address spaces)
402:67: expected void *[noderef] <asn:2> *base
402:67: got void [noderef] <asn:2> **
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Yang Wei [Thu, 17 Jan 2019 15:30:03 +0000 (23:30 +0800)]
macvlan: replace kfree_skb by consume_skb for drop profiles
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Thu, 17 Jan 2019 15:11:30 +0000 (23:11 +0800)]
neighbour: Do not perturb drop profiles when neigh_probe
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 17 Jan 2019 14:20:14 +0000 (14:20 +0000)]
amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
The XGBE hardware has support for performing MDIO operations using an
MDIO command request. The driver mistakenly uses the mdio port address
as the MDIO command request device address instead of the MDIO command
request port address. Additionally, the driver does not properly check
for and create a clause 45 MDIO command.
Check the supplied MDIO register to determine if the request is a clause
45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
address and register number from the supplied MDIO register and use them
to set the MDIO command request device address and register number fields.
For a clause 22 operation, the MDIO request device address is set to zero
and the MDIO command request register number is set to the supplied MDIO
register. In either case, the supplied MDIO port address is used as the
MDIO command request port address.
Fixes:
732f2ab7afb9 ("amd-xgbe: Add support for MDIO attached PHYs")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Camelia Groza [Thu, 17 Jan 2019 12:33:33 +0000 (14:33 +0200)]
net: phy: add missing phy driver features
The phy drivers for CS4340 and TN2020 are missing their
features attributes. Add them.
Fixes:
719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Thu, 17 Jan 2019 09:42:27 +0000 (11:42 +0200)]
dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
As txq_trans_update() only updates trans_start when the lock is held,
trans_start does not get updated if NETIF_F_LLTX is declared.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>