Shanker Donthineni [Mon, 14 Jan 2019 09:50:19 +0000 (09:50 +0000)]
irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
The NUMA node information is visible to ITS driver but not being used
other than handling hardware errata. ITS/GICR hardware accesses to the
local NUMA node is usually quicker than the remote NUMA node. How slow
the remote NUMA accesses are depends on the implementation details.
This patch allocates memory for ITS management tables and command
queue from the corresponding NUMA node using the appropriate NUMA
aware functions. This change improves the performance of the ITS
tables read latency on systems where it has more than one ITS block,
and with the slower inter node accesses.
Apache Web server benchmarking using ab tool on a HiSilicon D06
board with multiple numa mem nodes shows Time per request and
Transfer rate improvements of ~3.6% with this patch.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Wed, 20 Feb 2019 08:59:23 +0000 (08:59 +0000)]
irqdomain: Allow the default irq domain to be retrieved
The default irq domain allows legacy code to create irqdomain
mappings without having to track the domain it is allocating
from. Setting the default domain is a one shot, fire and forget
operation, and no effort was made to be able to retrieve this
information at a later point in time.
Newer irqdomain APIs (the hierarchical stuff) relies on both
the irqchip code to track the irqdomain it is allocating from,
as well as some form of firmware abstraction to easily identify
which piece of HW maps to which irq domain (DT, ACPI).
For systems without such firmware (or legacy platform that are
getting dragged into the 21st century), things are a bit harder.
For these cases (and these cases only!), let's provide a way
to retrieve the default domain, allowing the use of the v2 API
without having to resort to platform-specific hacks.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Anup Patel [Tue, 12 Feb 2019 12:52:46 +0000 (18:22 +0530)]
irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
Currently on SMP host, all CPUs take external interrupts routed via
PLIC. All CPUs will try to claim a given external interrupt but only
one of them will succeed while other CPUs would simply resume whatever
they were doing before. This means if we have N CPUs then for every
external interrupt N-1 CPUs will always fail to claim it and waste
their CPU time.
Instead of above, external interrupts should be taken by only one CPU
and we should have provision to explicitly specify IRQ affinity from
kernel-space or user-space.
This patch provides irq_set_affinity() implementation for PLIC driver.
It also updates irq_enable() such that PLIC interrupts are only enabled
for one of CPUs specified in IRQ affinity mask.
With this patch in-place, we can change IRQ affinity at any-time from
user-space using procfs.
Example:
/ # cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
8: 44 0 0 0 SiFive PLIC 8 virtio0
10: 48 0 0 0 SiFive PLIC 10 ttyS0
IPI0: 55 663 58 363 Rescheduling interrupts
IPI1: 0 1 3 16 Function call interrupts
/ #
/ #
/ # echo 4 > /proc/irq/10/smp_affinity
/ #
/ # cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
8: 45 0 0 0 SiFive PLIC 8 virtio0
10: 160 0 17 0 SiFive PLIC 10 ttyS0
IPI0: 68 693 77 410 Rescheduling interrupts
IPI1: 0 2 3 16 Function call interrupts
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Anup Patel [Tue, 12 Feb 2019 12:52:45 +0000 (18:22 +0530)]
irqchip/sifive-plic: Differentiate between PLIC handler and context
We explicitly differentiate between PLIC handler and context because
PLIC context is for given mode of HART whereas PLIC handler is per-CPU
software construct meant for handling interrupts from a particular
PLIC context.
To achieve this differentiation, we rename "nr_handlers" to "nr_contexts"
and "nr_mapped" to "nr_handlers" in plic_init().
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Anup Patel [Tue, 12 Feb 2019 12:52:44 +0000 (18:22 +0530)]
irqchip/sifive-plic: Add warning in plic_init() if handler already present
We have two enteries (one for M-mode and another for S-mode) in the
interrupts-extended DT property of PLIC DT node for each HART. It is
expected that firmware/bootloader will set M-mode HWIRQ line of each
HART to 0xffffffff (i.e. -1) in interrupts-extended DT property
because Linux runs in S-mode only.
If firmware/bootloader is buggy then it will not correctly update
interrupts-extended DT property which might result in a plic_handler
configured twice. This patch adds a warning in plic_init() if a
plic_handler is already marked present. This warning provides us
a hint about incorrectly updated interrupts-extended DT property.
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Anup Patel [Tue, 12 Feb 2019 12:52:43 +0000 (18:22 +0530)]
irqchip/sifive-plic: Pre-compute context hart base and enable base
This patch does following optimizations:
1. Pre-compute hart base for each context handler
2. Pre-compute enable base for each context handler
3. Have enable lock for each context handler instead
of global plic_toggle_lock
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Atish Patra [Tue, 12 Feb 2019 11:10:11 +0000 (03:10 -0800)]
irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
riscv_hartid_to_cpuid can return invalid cpuid for a hart that is
present in DT but was never brought up.
Print the appropriate warning message and continue.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Aaro Koskinen [Wed, 6 Feb 2019 21:26:08 +0000 (23:26 +0200)]
irqchip/i8259: Fix shutdown order by moving syscore_ops registration
When using cpufreq on Loongson 2F MIPS platform, "poweroff"
command gets frequently stuck in syscore_shutdown(). The reason is
that i8259A_shutdown() gets called before cpufreq_suspend(), and if we
have pending work then irq_work_sync() in cpufreq_dbs_governor_stop()
gets stuck forever as we have all interrupts masked already.
irq-i8259 is registering syscore_ops using device_initcall(),
while cpufreq uses core_initcall(). Fix the shutdown order simply
by registering the irq syscore_ops during the early IRQ init instead
of using a separate initcall at later stage.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Jiaxun Yang [Fri, 1 Feb 2019 06:22:36 +0000 (14:22 +0800)]
dt-bindings: interrupt-controller: loongson ls1x intc
Dt-bindings doc about Loongson-1 interrupt controller.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Jiaxun Yang [Fri, 1 Feb 2019 06:22:35 +0000 (14:22 +0800)]
irqchip: Add driver for Loongson-1 interrupt controller
This controller appeared on Loongson-1 family MCUs
including Loongson-1B and Loongson-1C.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Zenghui Yu [Sun, 10 Feb 2019 05:24:10 +0000 (05:24 +0000)]
irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table
In current logic, its_parse_indirect_baser() will be invoked twice
when allocating Device tables. Add a *break* to omit the unnecessary
and annoying (might be ...) invoking.
Fixes:
32bd44dc19de ("irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size")
Cc: stable@vger.kernel.org
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Julien Thierry [Wed, 13 Feb 2019 10:09:19 +0000 (10:09 +0000)]
genirq: Fix wrong name in request_percpu_nmi() description
ready_percpu_nmi() was the previous name of prepare_percpu_nmi(). Update
request_percpu_nmi() comment with the correct function name.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reported-by: Li Wei <liwei391@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Julien Thierry [Thu, 31 Jan 2019 14:54:01 +0000 (14:54 +0000)]
irqdesc: Add domain handler for NMIs
NMI handling code should be executed between calls to nmi_enter and
nmi_exit.
Add a separate domain handler to properly setup NMI context when handling
an interrupt requested as NMI.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Julien Thierry [Thu, 31 Jan 2019 14:54:00 +0000 (14:54 +0000)]
genirq: Provide NMI handlers
Provide flow handlers that are NMI safe for interrupts and percpu_devid
interrupts.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Julien Thierry [Thu, 31 Jan 2019 14:53:59 +0000 (14:53 +0000)]
genirq: Provide NMI management for percpu_devid interrupts
Add support for percpu_devid interrupts treated as NMIs.
Percpu_devid NMIs need to be setup/torn down on each CPU they target.
The same restrictions as for global NMIs still apply for percpu_devid NMIs.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Julien Thierry [Thu, 31 Jan 2019 14:53:58 +0000 (14:53 +0000)]
genirq: Provide basic NMI management for interrupt lines
Add functionality to allocate interrupt lines that will deliver IRQs
as Non-Maskable Interrupts. These allocations are only successful if
the irqchip provides the necessary support and allows NMI delivery for the
interrupt line.
Interrupt lines allocated for NMI delivery must be enabled/disabled through
enable_nmi/disable_nmi_nosync to keep their state consistent.
To treat a PERCPU IRQ as NMI, the interrupt must not be shared nor threaded,
the irqchip directly managing the IRQ must be the root irqchip and the
irqchip cannot be behind a slow bus.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Linus Torvalds [Mon, 21 Jan 2019 00:14:44 +0000 (13:14 +1300)]
Linux 5.0-rc3
Linus Torvalds [Mon, 21 Jan 2019 00:12:03 +0000 (13:12 +1300)]
Merge tag 'pstore-v5.0-rc4' of git://git./linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Fix console ramoops to show the previous boot logs (Sai Prakash
Ranjan)
- Avoid allocation and leak of platform data
* tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Avoid allocation and leak of platform data
pstore/ram: Fix console ramoops to show the previous boot logs
Linus Torvalds [Mon, 21 Jan 2019 00:07:03 +0000 (13:07 +1300)]
Merge tag 'gcc-plugins-v5.0-rc4' of git://git./linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
"Fix ARM per-task stack protector plugin under GCC 9 (Ard Biesheuvel)"
* tag 'gcc-plugins-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: arm_ssp_per_task_plugin: fix for GCC 9+
gcc-plugins: arm_ssp_per_task_plugin: sign extend the SP mask
Linus Torvalds [Sun, 20 Jan 2019 23:52:31 +0000 (12:52 +1300)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix endless loop in nf_tables, from Phil Sutter.
2) Fix cross namespace ip6_gre tunnel hash list corruption, from
Olivier Matz.
3) Don't be too strict in phy_start_aneg() otherwise we might not allow
restarting auto negotiation. From Heiner Kallweit.
4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.
5) Memory leak in act_tunnel_key, from Davide Caratti.
6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.
7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.
8) Missing udplite rehash callbacks, from Alexey Kodanev.
9) Log dirty pages properly in vhost, from Jason Wang.
10) Use consume_skb() in neigh_probe() as this is a normal free not a
drop, from Yang Wei. Likewise in macvlan_process_broadcast().
11) Missing device_del() in mdiobus_register() error paths, from Thomas
Petazzoni.
12) Fix checksum handling of short packets in mlx5, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
bpf: in __bpf_redirect_no_mac pull mac only if present
virtio_net: bulk free tx skbs
net: phy: phy driver features are mandatory
isdn: avm: Fix string plus integer warning from Clang
net/mlx5e: Fix cb_ident duplicate in indirect block register
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
net/mlx5e: Fix wrong error code return on FEC query failure
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
tools: bpftool: Cleanup license mess
bpf: fix inner map masking to prevent oob under speculation
bpf: pull in pkt_sched.h header for tooling to fix bpftool build
selftests: forwarding: Add a test case for externally learned FDB entries
selftests: mlxsw: Test FDB offload indication
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
net: bridge: Mark FDB entries that were added by user as such
mlxsw: spectrum_fid: Update dummy FID index
mlxsw: pci: Return error on PCI reset timeout
mlxsw: pci: Increase PCI SW reset timeout
mlxsw: pci: Ring CQ's doorbell before RDQ's
MAINTAINERS: update email addresses of liquidio driver maintainers
...
Kees Cook [Sun, 20 Jan 2019 22:33:34 +0000 (14:33 -0800)]
pstore/ram: Avoid allocation and leak of platform data
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.
Reported-by: Yue Hu <huyue2@yulong.com>
Fixes:
35da60941e44 ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Ard Biesheuvel [Fri, 18 Jan 2019 10:58:07 +0000 (11:58 +0100)]
gcc-plugins: arm_ssp_per_task_plugin: fix for GCC 9+
GCC 9 reworks the way the references to the stack canary are
emitted, to prevent the value from being spilled to the stack
before the final comparison in the epilogue, defeating the
purpose, given that the spill slot is under control of the
attacker that we are protecting ourselves from.
Since our canary value address is obtained without accessing
memory (as opposed to pre-v7 code that will obtain it from a
literal pool), it is unlikely (although not guaranteed) that
the compiler will spill the canary value in the same way, so
let's just disable this improvement when building with GCC9+.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Ard Biesheuvel [Fri, 18 Jan 2019 10:58:06 +0000 (11:58 +0100)]
gcc-plugins: arm_ssp_per_task_plugin: sign extend the SP mask
The ARM per-task stack protector GCC plugin hits an assert in
the compiler in some case, due to the fact the the SP mask
expression is not sign-extended as it should be. So fix that.
Suggested-by: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Linus Torvalds [Sun, 20 Jan 2019 18:37:16 +0000 (07:37 +1300)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
"Fixes and cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/scsi: Use copy_to_iter() to send control queue response
vhost: return EINVAL if iovecs size does not match the message size
virtio-balloon: tweak config_changed implementation
virtio: don't allocate vqs when names[i] = NULL
virtio_pci: use queue idx instead of array idx to set up the vq
virtio: document virtio_config_ops restrictions
virtio: fix virtio_config_ops description
Linus Torvalds [Sun, 20 Jan 2019 18:35:26 +0000 (07:35 +1300)]
Merge tag 'for-5.0-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of fixes (some of them in testing for a long time):
- fix some test failures regarding cleanup after transaction abort
- revert of a patch that could cause a deadlock
- delayed iput fixes, that can help in ENOSPC situation when there's
low space and a lot data to write"
* tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: wakeup cleaner thread when adding delayed iput
btrfs: run delayed iputs before committing
btrfs: wait on ordered extents on abort cleanup
btrfs: handle delayed ref head accounting cleanup in abort
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
Linus Torvalds [Sun, 20 Jan 2019 18:23:42 +0000 (07:23 +1300)]
Merge tags 'compiler-attributes-for-linus-v5.0-rc3' and 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux
Pull misc clang fixes from Miguel Ojeda:
- A fix for OPTIMIZER_HIDE_VAR from Michael S Tsirkin
- Update clang-format with the latest for_each macro list from Jason
Gunthorpe
* tag 'compiler-attributes-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
* tag 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
clang-format: Update .clang-format with the latest for_each macro list
Florian La Roche [Sat, 19 Jan 2019 15:14:50 +0000 (16:14 +0100)]
fix int_sqrt64() for very large numbers
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway and
that's what also done in int_sqrt() implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Sat, 19 Jan 2019 21:56:05 +0000 (21:56 +0000)]
x86: uaccess: Inhibit speculation past access_ok() in user_access_begin()
Commit
594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
makes the access_ok() check part of the user_access_begin() preceding a
series of 'unsafe' accesses. This has the desirable effect of ensuring
that all 'unsafe' accesses have been range-checked, without having to
pick through all of the callsites to verify whether the appropriate
checking has been made.
However, the consolidated range check does not inhibit speculation, so
it is still up to the caller to ensure that they are not susceptible to
any speculative side-channel attacks for user addresses that ultimately
fail the access_ok() check.
This is an oversight, so use __uaccess_begin_nospec() to ensure that
speculation is inhibited until the access_ok() check has passed.
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Jan 2019 03:27:59 +0000 (15:27 +1200)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes for -rc3.
We've plugged a couple of nasty issues involving KASLR-enabled
kernels, and removed a redundant #define that was introduced as part
of the KHWASAN fixes from akpm at -rc2.
- Fix broken kpti page-table rewrite in bizarre KASLR configuration
- Fix module loading with KASLR
- Remove redundant definition of ARCH_SLAB_MINALIGN"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kasan, arm64: remove redundant ARCH_SLAB_MINALIGN define
arm64: kaslr: ensure randomized quantities are clean to the PoC
arm64: kpti: Update arm64_kernel_use_ng_mappings() when forced on
David S. Miller [Sun, 20 Jan 2019 00:38:12 +0000 (16:38 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-20
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a out-of-bounds access in __bpf_redirect_no_mac, from Willem.
2) Fix bpf_setsockopt to reset sock dst on SO_MARK changes, from Peter.
3) Fix map in map masking to prevent out-of-bounds access under
speculative execution, from Daniel.
4) Fix bpf_setsockopt's SO_MAX_PACING_RATE to support TCP internal
pacing, from Yuchung.
5) Fix json writer license in bpftool, from Thomas.
6) Fix AF_XDP to check if an actually queue exists during umem
setup, from Krzysztof.
7) Several fixes to BPF stackmap's build id handling. Another fix
for bpftool build to account for libbfd variations wrt linking
requirements, from Stanislav.
8) Fix BPF samples build with clang by working around missing asm
goto, from Yonghong.
9) Fix libbpf to retry program load on signal interrupt, from Lorenz.
10) Various minor compile warning fixes in BPF code, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 16 Jan 2019 01:19:22 +0000 (20:19 -0500)]
bpf: in __bpf_redirect_no_mac pull mac only if present
Syzkaller was able to construct a packet of negative length by
redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT:
BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
Read of size
4294967282 at addr
ffff8801d798009c by task syz-executor2/12942
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
memcpy+0x23/0x50 mm/kasan/kasan.c:302
memcpy include/linux/string.h:345 [inline]
skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
__pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
__pskb_copy include/linux/skbuff.h:1053 [inline]
pskb_copy include/linux/skbuff.h:2904 [inline]
skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539
ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline]
sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029
__netdev_start_xmit include/linux/netdevice.h:4325 [inline]
netdev_start_xmit include/linux/netdevice.h:4334 [inline]
xmit_one net/core/dev.c:3219 [inline]
dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235
__dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805
dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
__bpf_tx_skb net/core/filter.c:2016 [inline]
__bpf_redirect_common net/core/filter.c:2054 [inline]
__bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061
____bpf_clone_redirect net/core/filter.c:2094 [inline]
bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066
bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000
The generated test constructs a packet with mac header, network
header, skb->data pointing to network header and skb->len 0.
Redirecting to a sit0 through __bpf_redirect_no_mac pulls the
mac length, even though skb->data already is at skb->network_header.
bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2.
Update the offset calculation to pull only if skb->data differs
from skb->network_header, which is not true in this case.
The test itself can be run only from commit
1cf1cae963c2 ("bpf:
introduce BPF_PROG_TEST_RUN command"), but the same type of packets
with skb at network header could already be built from lwt xmit hooks,
so this fix is more relevant to that commit.
Also set the mac header on redirect from LWT_XMIT, as even after this
change to __bpf_redirect_no_mac that field is expected to be set, but
is not yet in ip_finish_output2.
Fixes:
3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Michael S. Tsirkin [Fri, 18 Jan 2019 04:20:07 +0000 (23:20 -0500)]
virtio_net: bulk free tx skbs
Use napi_consume_skb() to get bulk free. Note that napi_consume_skb is
safe to call in a non-napi context as long as the napi_budget flag is
correct.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 19 Jan 2019 22:33:18 +0000 (10:33 +1200)]
Merge tag 'mips_fixes_5.0_2' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
- Fix IPI handling for Lantiq SoCs, which was broken by changes made
back in v4.12.
- Enable OF/DT serial support in ath79_defconfig to give us working
serial by default.
- Fix 64b builds for the Jazz platform.
- Set up a struct device for the BCM47xx SoC to allow BCM47xx drivers
to perform DMA again following the major DMA mapping changes made in
v4.19.
- Disable MSI on Cavium Octeon systems when the pcie_disable command
line parameter introduced in v3.3 is used, in order to avoid
inadvetently accessing PCIe controller registers despite the command
line.
- Fix a build failure for Cavium Octeon kernels with kexec enabled,
introduced in v4.20.
- Fix a regression in the behaviour of semctl/shmctl/msgctl IPC
syscalls for kernels including n32 support but not o32 support caused
by some cleanup in v3.19.
* tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: OCTEON: fix kexec support
mips: fix n32 compat_ipc_parse_version
Disable MSI also when pcie-octeon.pcie_disable on
MIPS: BCM47XX: Setup struct device for the SoC
MIPS: jazz: fix 64bit build
MIPS: ath79: Enable OF serial ports in the default config
MIPS: lantiq: Use CP0_LEGACY_COMPARE_IRQ
MIPS: lantiq: Fix IPI interrupt handling
Linus Torvalds [Sat, 19 Jan 2019 22:28:46 +0000 (10:28 +1200)]
Merge tag 'devicetree-fixes-for-5.0-2' of git://git./linux/kernel/git/robh/linux
Pull Devicetree fix from Rob Herring:
"A single build fix for powerpc due to device_node.type removal"
* tag 'devicetree-fixes-for-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
powerpc: chrp: Use of_node_is_type to access device_type
Linus Torvalds [Sat, 19 Jan 2019 22:24:30 +0000 (10:24 +1200)]
Merge tag 'libnvdimm-fixes-5.0-rc3' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A crash fix, a build warning fix, a miscellaneous small cleanups.
In case anyone is looking for them, there was a regression caught by
testing that caused two patches to be dropped from this update. Those
patches have been reworked and will soak for another week / re-target
5.0-rc4.
- Fix driver initialization crash due to the inability to report an
'error' state for a DIMM's security capability.
- Build warning fix for little-endian ARM64 builds
- Fix a potential race between the EDAC driver's usage of the NFIT
SMBIOS id for a DIMM and the driver shutdown path.
- A small collection of one-line benign cleanups for duplicate
variable assignments, a duplicate header include and a mis-typed
function argument"
* tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/security: Fix nvdimm_security_state() state request selection
acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set()
acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMs
nfit: Mark some functions as __maybe_unused
ACPI/nfit: delete the function to_acpi_nfit_desc
ACPI/nfit: delete the redundant header file
Linus Torvalds [Sat, 19 Jan 2019 21:58:52 +0000 (09:58 +1200)]
Merge tag 'linux-watchdog-5.0-rc-fixes' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- mt7621_wdt/rt2880_wdt: Fix compilation problem
- tqmx86: Fix a couple IS_ERR() vs NULL bugs
* tag 'linux-watchdog-5.0-rc-fixes' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: tqmx86: Fix a couple IS_ERR() vs NULL bugs
watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
Linus Torvalds [Sat, 19 Jan 2019 21:27:38 +0000 (09:27 +1200)]
Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes for SUNRPC bugs, with a single v4.2
copy_file_range() fix mixed in.
Stable bugfixes:
- Fix TCP receive code on archs with flush_dcache_page()
Other bugfixes:
- Fix error code in rpcrdma_buffer_create()
- Fix a double free in rpcrdma_send_ctxs_create()
- Fix kernel BUG at kernel/cred.c:825
- Fix unnecessary retry in nfs42_proc_copy_file_range()
- Ensure rq_bytes_sent is reset before request transmission
- Ensure we respect the RPCSEC_GSS sequence number limit
- Address Kerberos performance/behavior regression"
* tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Address Kerberos performance/behavior regression
SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit
SUNRPC: Ensure rq_bytes_sent is reset before request transmission
NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
sunrpc: kernel BUG at kernel/cred.c:825!
SUNRPC: Fix TCP receive code on archs with flush_dcache_page()
xprtrdma: Double free in rpcrdma_sendctxs_create()
xprtrdma: Fix error code in rpcrdma_buffer_create()
Linus Torvalds [Sat, 19 Jan 2019 21:15:04 +0000 (09:15 +1200)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of 17 fixes. Most of these are minor or trivial.
The one fix that may be serious is the isci one: the bug can cause hba
parameters to be set from uninitialized memory. I don't think it's
exploitable, but you never know"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: add wait_for_completion()
scsi: qla1280: set 64bit coherent mask
scsi: ufs: Fix geometry descriptor size
scsi: megaraid_sas: Retry reads of outbound_intr_status reg
scsi: qedi: Add ep_state for login completion on un-reachable targets
scsi: ufs: Fix system suspend status
scsi: qla2xxx: Use correct number of vectors for online CPUs
scsi: hisi_sas: Set protection parameters prior to adding SCSI host
scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
scsi: isci: initialize shost fully before calling scsi_add_host()
scsi: lpfc: lpfc_sli: Mark expected switch fall-throughs
scsi: smartpqi_init: fix boolean expression in pqi_device_remove_start
scsi: core: Synchronize request queue PM status only on successful resume
scsi: pm80xx: reduce indentation
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
scsi: megaraid_sas: correct an info message
scsi: target/iscsi: fix error msg typo when create lio_qr_cache failed
scsi: sd: Fix cache_type_store()
Linus Torvalds [Sat, 19 Jan 2019 21:12:50 +0000 (09:12 +1200)]
Merge tag 'for-linus-
20190118' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- block size setting fixes for loop/nbd (Jan Kara)
- md bio_alloc_mddev() cleanup (Marcos)
- Ensure we don't lose the REQ_INTEGRITY flag (Ming)
- Two NVMe fixes by way of Christoph:
- Fix NVMe IRQ calculation (Ming)
- Uninitialized variable in nvmet-tcp (Sagi)
- BFQ comment fix (Paolo)
- License cleanup for recently added blk-mq-debugfs-zoned (Thomas)
* tag 'for-linus-
20190118' of git://git.kernel.dk/linux-block:
block: Cleanup license notice
nvme-pci: fix nvme_setup_irqs()
nvmet-tcp: fix uninitialized variable access
block: don't lose track of REQ_INTEGRITY flag
blockdev: Fix livelocks on loop device
nbd: Use set_blocksize() to set device blocksize
md: Make bio_alloc_mddev use bio_alloc_bioset
block, bfq: fix comments on __bfq_deactivate_entity
Jason Gunthorpe [Fri, 18 Jan 2019 22:57:04 +0000 (22:57 +0000)]
clang-format: Update .clang-format with the latest for_each macro list
Re-run the shell fragment that generated the original list. In particular
this adds the missing xarray related functions.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Camelia Groza [Thu, 17 Jan 2019 12:22:36 +0000 (14:22 +0200)]
net: phy: phy driver features are mandatory
Since phy driver features became a link_mode bitmap, phy drivers that
don't have a list of features configured will cause the kernel to crash
when probed.
Prevent the phy driver from registering if the features field is missing.
Fixes:
719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Thu, 10 Jan 2019 05:41:08 +0000 (22:41 -0700)]
isdn: avm: Fix string plus integer warning from Clang
A recent commit in Clang expanded the -Wstring-plus-int warning, showing
some odd behavior in this file.
drivers/isdn/hardware/avm/b1.c:426:30: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
cinfo->version[j] = "\0\0" + 1;
~~~~~~~^~~
drivers/isdn/hardware/avm/b1.c:426:30: note: use array indexing to silence this warning
cinfo->version[j] = "\0\0" + 1;
^
& [ ]
1 warning generated.
This is equivalent to just "\0". Nick pointed out that it is smarter to
use "" instead of "\0" because "" is used elsewhere in the kernel and
can be deduplicated at the linking stage.
Link: https://github.com/ClangBuiltLinux/linux/issues/309
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Fri, 18 Jan 2019 14:12:10 +0000 (08:12 -0600)]
powerpc: chrp: Use of_node_is_type to access device_type
Commit
8ce5f8415753 ("of: Remove struct device_node.type pointer")
removed struct device_node.type pointer, but the conversion to use
of_node_is_type() accessor was missed in chrp_init_IRQ().
Fixes:
8ce5f8415753 ("of: Remove struct device_node.type pointer")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rob Herring <robh@kernel.org>
David S. Miller [Sat, 19 Jan 2019 02:23:23 +0000 (18:23 -0800)]
Merge tag 'mlx5-fixes-2019-01-18' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-01-18
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.18
('net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames')
The patch doesn't apply cleanly to 4.18.y, but it is very simple to
resolve, what should be the procedure here ?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Britstein [Wed, 19 Dec 2018 05:36:51 +0000 (07:36 +0200)]
net/mlx5e: Fix cb_ident duplicate in indirect block register
Previously the identifier used for indirect block callback registry
and for block rule cb registry (when done via indirect blocks) was the
pointer to the tunnel netdev we were interested in receiving updates on.
This worked fine if a single PF existed that registered one callback for
the tunnel netdev of interest. However, if multiple PFs are in place then
the 2nd PF tries to register with the same tunnel netdev identifier. This
leads to EEXIST errors and/or incorrect cb deletions.
Prevent this conflict by using the rpriv pointer as the identifier for
netdev indirect block cb registry, allowing each PF to register a unique
callback per tunnel netdev. For block cb registry, the same PF may
register multiple cbs to the same block if using TC shared blocks.
Instead of the rpriv, use the pointer to the allocated indr_priv data as
the identifier here. This means that there can be a unique block callback
for each PF/tunnel netdev combo.
Fixes:
f5bc2c5de101 ("net/mlx5e: Support TC indirect block notifications
for eswitch uplink reprs")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 8 Nov 2018 10:06:53 +0000 (12:06 +0200)]
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
For representors, the TX dropped counter is not folded from the
per-ring counters. Fix it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shay Agroskin [Sun, 9 Dec 2018 10:00:13 +0000 (12:00 +0200)]
net/mlx5e: Fix wrong error code return on FEC query failure
Advertised and configured FEC query failure resulted in printing
wrong error code.
Fixes:
6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cong Wang [Tue, 4 Dec 2018 06:14:04 +0000 (22:14 -0800)]
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, we have a switch which pads non-zero octets, this
causes kernel hardware checksum fault repeatedly.
Prior to:
commit '
88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE ...")'
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
headers, it is not worthy the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.
Fixes:
88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Thomas Gleixner [Thu, 17 Jan 2019 23:14:24 +0000 (00:14 +0100)]
tools: bpftool: Cleanup license mess
Precise and non-ambiguous license information is important. The recent
relicensing of the bpftools introduced a license conflict.
The files have now:
SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause
and
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version
Amazingly about 20 people acked that change and neither they nor the
committer noticed. Oh well.
Digging deeper: The files were imported from the iproute2 repository with
the GPL V2 or later boiler plate text in commit
b66e907cfee2 ("tools:
bpftool: copy JSON writer from iproute2 repository")
Looking at the iproute2 repository at
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
the following commit is the equivivalent:
commit
d9d8c839 ("json_writer: add SPDX Identifier (GPL-2/BSD-2)")
That commit explicitly removes the boiler plate and relicenses the code
uner GPL-2.0-only and BSD-2-Clause. As Steven wrote the original code and
also the relicensing commit, it's assumed that the relicensing was intended
to do exaclty that. Just the kernel side update failed to remove the boiler
plate. Do so now.
Fixes:
907b22365115 ("tools: bpftool: dual license all files")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Sean Young <sean@mess.org>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: David Calavera <david.calavera@gmail.com>
Cc: Andrey Ignatov <rdna@fb.com>
Cc: Joe Stringer <joe@wand.net.nz>
Cc: David Ahern <dsahern@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Petar Penkov <ppenkov@stanford.edu>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
CC: okash.khawaja@gmail.com
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 17 Jan 2019 15:34:45 +0000 (16:34 +0100)]
bpf: fix inner map masking to prevent oob under speculation
During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.
In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.
Before patch (test_verifier's 'map in map access' dump):
# bpftool prog dump xla id 3
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:4]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking for
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+11
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!)
22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite
23: (67) r0 <<= 3 | map->unpriv_array set.
24: (0f) r0 += r1 |
25: (05) goto pc+1 |
26: (b7) r0 = 0 |
27: (b7) r0 = 0
28: (95) exit
After patch:
# bpftool prog dump xla id 1
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:2]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking due to
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+12
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) |
22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map
23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking
24: (67) r0 <<= 3 | for map->unpriv_array.
25: (0f) r0 += r1 |
26: (05) goto pc+1 |
27: (b7) r0 = 0 |
28: (b7) r0 = 0
29: (95) exit
Fixes:
b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Thu, 17 Jan 2019 15:15:09 +0000 (16:15 +0100)]
bpf: pull in pkt_sched.h header for tooling to fix bpftool build
Dan reported that bpftool does not compile for him:
$ make tools/bpf
DESCEND bpf
Auto-detecting system features:
.. libbfd: [ on ]
.. disassembler-four-args: [ OFF ]
DESCEND bpftool
Auto-detecting system features:
.. libbfd: [ on ]
.. disassembler-four-args: [ OFF ]
CC /opt/linux.git/tools/bpf/bpftool/net.o
In file included from /opt/linux.git/tools/include/uapi/linux/pkt_cls.h:6:0,
from /opt/linux.git/tools/include/uapi/linux/tc_act/tc_bpf.h:14,
from net.c:13:
net.c: In function 'show_dev_tc_bpf':
net.c:164:21: error: 'TC_H_CLSACT' undeclared (first use in this function)
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
[...]
Fix it by importing pkt_sched.h header copy into tooling
infrastructure.
Fixes:
49a249c38726 ("tools/bpftool: copy a few net uapi headers to tools directory")
Fixes:
f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
Reported-by: Dan Gilson <dan_gilson@yahoo.com>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=202315
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Fri, 18 Jan 2019 23:12:16 +0000 (15:12 -0800)]
Merge branch 'mlxsw-fixes'
Ido Schimmel says:
====================
mlxsw: Various fixes
This patchset contains small fixes in mlxsw and one fix in the bridge
driver.
Patches #1-#4 perform small adjustments in PCI and FID code following
recent tests that were performed on the Spectrum-2 ASIC.
Patch #5 fixes the bridge driver to mark FDB entries that were added by
user as such. Otherwise, these entries will be ignored by underlying
switch drivers.
Patch #6 fixes a long standing issue in mlxsw where the driver
incorrectly programmed static FDB entries as both static and sticky.
Patches #7-#8 add test cases for above mentioned bugs.
Please consider patches #1, #2 and #4 for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:03 +0000 (15:58 +0000)]
selftests: forwarding: Add a test case for externally learned FDB entries
Test that externally learned FDB entries can roam, but not age out.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:02 +0000 (15:58 +0000)]
selftests: mlxsw: Test FDB offload indication
Test that externally learned FDB entries added from user space are
marked as offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:01 +0000 (15:58 +0000)]
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
The driver currently treats static FDB entries as both static and
sticky. This is incorrect and prevents such entries from being roamed to
a different port via learning.
Fix this by configuring static entries with ageing disabled and roaming
enabled.
In net-next we can add proper support for the newly introduced 'sticky'
flag.
Fixes:
56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:58:00 +0000 (15:58 +0000)]
net: bridge: Mark FDB entries that were added by user as such
Externally learned entries can be added by a user or by a switch driver
that is notifying the bridge driver about entries that were learned in
hardware.
In the first case, the entries are not marked with the 'added_by_user'
flag, which causes switch drivers to ignore them and not offload them.
The 'added_by_user' flag can be set on externally learned FDB entries
based on the 'swdev_notify' parameter in br_fdb_external_learn_add(),
which effectively means if the created / updated FDB entry was added by
a user or not.
Fixes:
816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:59 +0000 (15:57 +0000)]
mlxsw: spectrum_fid: Update dummy FID index
When using a tc flower action of egress mirred redirect, the driver adds
an implicit FID setting action. This implicit action sets a dummy FID to
the packet and is used as part of a design for trapping unmatched flows
in OVS. While this implicit FID setting action is supposed to be a NOP
when a redirect action is added, in Spectrum-2 the FID record is
consulted as the dummy FID index is an 802.1D FID index and the packet
is dropped instead of being redirected.
Set the dummy FID index value to be within 802.1Q range. This satisfies
both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
as an 802.1Q FID and will then follow the redirect action.
Fixes:
c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:57 +0000 (15:57 +0000)]
mlxsw: pci: Return error on PCI reset timeout
Return an appropriate error in the case when the driver timeouts on waiting
for firmware to go out of PCI reset.
Fixes:
233fa44bd67a ("mlxsw: pci: Implement reset done check")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nir Dotan [Fri, 18 Jan 2019 15:57:56 +0000 (15:57 +0000)]
mlxsw: pci: Increase PCI SW reset timeout
Spectrum-2 PHY layer introduces a calibration period which is a part of the
Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
the firmware to come out of boot. This does not increase system boot time
in cases where the firmware PHY calibration process is done quickly.
Fixes:
c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 18 Jan 2019 15:57:55 +0000 (15:57 +0000)]
mlxsw: pci: Ring CQ's doorbell before RDQ's
When a packet should be trapped to the CPU the device consumes a WQE
(work queue element) from an RDQ (receive descriptor queue) and copies
the packet to the address specified in the WQE. The device then tries to
post a CQE (completion queue element) that contains various metadata
(e.g., ingress port) about the packet to a CQ (completion queue).
In case the device managed to consume a WQE, but did not manage to post
the corresponding CQE, it will get stuck. This unlikely situation can be
triggered due to the scheme the driver is currently using to process
CQEs.
The driver will consume up to 512 CQEs at a time and after processing
each corresponding WQE it will ring the RDQ's doorbell, letting the
device know that a new WQE was posted for it to consume. Only after
processing all the CQEs (up to 512), the driver will ring the CQ's
doorbell, letting the device know that new ones can be posted.
Fix this by having the driver ring the CQ's doorbell for every processed
CQE, but before ringing the RDQ's doorbell. This guarantees that
whenever we post a new WQE, there is a corresponding CQE available. Copy
the currently processed CQE to prevent the device from overwriting it
with a new CQE after ringing the doorbell.
Note that the driver still arms the CQ only after processing all the
pending CQEs, so that interrupts for this CQ will only be delivered
after the driver finished its processing.
Before commit
8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1
and version 2") the issue was virtually impossible to trigger since the
number of CQEs was twice the number of WQEs and the number of CQEs
processed at a time was equal to the number of available WQEs.
Fixes:
8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1 and version 2")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Tested-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 17 Jan 2019 18:07:45 +0000 (18:07 +0000)]
MAINTAINERS: update email addresses of liquidio driver maintainers
Update email addresses of liquidio driver maintainers. Also remove a
former maintainer.
Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Acked-by: Derek Chickles <dchickles@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Thu, 17 Jan 2019 17:02:18 +0000 (18:02 +0100)]
net: Fix typo in NET_FAILOVER help text
"also enables" should not be spelled as one word.
Fixes:
cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross Lagerwall [Thu, 17 Jan 2019 15:34:38 +0000 (15:34 +0000)]
net: Fix usage of pskb_trim_rcsum
In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Jan 2019 19:34:10 +0000 (07:34 +1200)]
Merge tag 'media/v5.0-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a regression fix at v4l2 core, with affects multi-plane streams
- a fix at vim2m driver
* tag 'media/v5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vim2m: only cancel work if it is for right context
media: v4l: ioctl: Validate num_planes for debug messages
media: v4l: ioctl: Validate num_planes before using it
media: v4l2-ioctl: Clear only per-plane reserved fields
Linus Torvalds [Fri, 18 Jan 2019 19:26:16 +0000 (07:26 +1200)]
Merge tag 'pci-v5.0-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas::
- Fix PCI kconfig menu organization (Rob Herring)
- Fix pci_alloc_irq_vectors_affinity() error return to allow "reduce
and retry" for drivers using IRQ sets (Ming Lei)
- Fix "pci=disable_acs_redir" initdata use-after-free problem (Logan
Gunthorpe)
* tag 'pci-v5.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter
PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
PCI: Fix PCI kconfig menu organization
Linus Torvalds [Fri, 18 Jan 2019 19:23:25 +0000 (07:23 +1200)]
Merge tag 'i3c/fixes-for-5.0-rc3' of git://git./linux/kernel/git/i3c/linux
Pull i3c fixes from Boris Brezillon:
- Fix the error check on master->sysclk val in the Cadence driver
- Fix reattach implementation in the Designware driver
* tag 'i3c/fixes-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: dw-i3c-master: fix i3c_attach/reattach
i3c: master: Fix an error checking typo in 'cdns_i3c_master_probe()'
Linus Torvalds [Fri, 18 Jan 2019 19:21:43 +0000 (07:21 +1200)]
Merge tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
"Raw NAND changes:
- jz4740: fix a compilation warning
- fsmc: fix a regression introduced by ->select_chip() deprecation
- denali: fix a regression introduced by NAND_KEEP_TIMINGS addition"
* tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd:
mtd: rawnand: denali: get ->setup_data_interface() working again
mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
mtd: rawnand: fsmc: Keep bank enable bit set
Linus Torvalds [Fri, 18 Jan 2019 19:17:19 +0000 (07:17 +1200)]
Merge tag 'regmap-fix-v5.0-rc2' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"The cleanups for the way we handle type information introduced during
the merge window revealed that we'd been abusing the irq APIs for a
long time, causing breakage for systems.
This has a couple of minimal fixes for that which restore the previous
behaviour for the time being, we'll fix it properly for v5.1 but
that'd be a bit much to do as a bug fix"
* tag 'regmap-fix-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: do not write mask register if mask_base is zero
regmap: regmap-irq: silently ignore unsupported type settings
Thomas Petazzoni [Wed, 16 Jan 2019 09:53:58 +0000 (10:53 +0100)]
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
The current code in __mdiobus_register() doesn't properly handle
failures returned by the devm_gpiod_get_optional() call: it returns
immediately, without unregistering the device that was added by the
call to device_register() earlier in the function.
This leaves a stale device, which then causes a NULL pointer
dereference in the code that handles deferred probing:
[ 1.489982] Unable to handle kernel NULL pointer dereference at virtual address
00000074
[ 1.498110] pgd = (ptrval)
[ 1.500838] [
00000074] *pgd=
00000000
[ 1.504432] Internal error: Oops: 17 [#1] SMP ARM
[ 1.509133] Modules linked in:
[ 1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
[ 1.520708] Hardware name: Xilinx Zynq Platform
[ 1.525261] Workqueue: events deferred_probe_work_func
[ 1.530403] PC is at klist_next+0x10/0xfc
[ 1.534403] LR is at device_for_each_child+0x40/0x94
[ 1.539361] pc : [<
c0683fbc>] lr : [<
c0455d90>] psr:
200e0013
[ 1.545628] sp :
ceeefe68 ip :
00000001 fp :
ffffe000
[ 1.550863] r10:
00000000 r9 :
c0c66790 r8 :
00000000
[ 1.556079] r7 :
c0457d44 r6 :
00000000 r5 :
ceeefe8c r4 :
cfa2ec78
[ 1.562604] r3 :
00000064 r2 :
c0457d44 r1 :
ceeefe8c r0 :
00000064
[ 1.569129] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 1.576263] Control:
18c5387d Table:
0ed7804a DAC:
00000051
[ 1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
[ 1.588280] Stack: (0xceeefe68 to 0xceef0000)
[ 1.592630] fe60:
cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
[ 1.600814] fe80:
00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
[ 1.608998] fea0:
cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
[ 1.617182] fec0:
cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
[ 1.625366] fee0:
c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
[ 1.633550] ff00:
c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
[ 1.641734] ff20:
cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
[ 1.649936] ff40:
cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
[ 1.658120] ff60:
ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
[ 1.666304] ff80:
cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
[ 1.674488] ffa0:
00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[ 1.682671] ffc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.690855] ffe0:
00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 1.699058] [<
c0683fbc>] (klist_next) from [<
c0455d90>] (device_for_each_child+0x40/0x94)
[ 1.707241] [<
c0455d90>] (device_for_each_child) from [<
c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[ 1.716476] [<
c0457d7c>] (device_reorder_to_tail) from [<
c0455dac>] (device_for_each_child+0x5c/0x94)
[ 1.725692] [<
c0455dac>] (device_for_each_child) from [<
c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[ 1.734927] [<
c0457d7c>] (device_reorder_to_tail) from [<
c0457df4>] (device_pm_move_to_tail+0x28/0x40)
[ 1.744235] [<
c0457df4>] (device_pm_move_to_tail) from [<
c045a27c>] (deferred_probe_work_func+0x58/0x8c)
[ 1.753746] [<
c045a27c>] (deferred_probe_work_func) from [<
c013893c>] (process_one_work+0x210/0x4fc)
[ 1.762888] [<
c013893c>] (process_one_work) from [<
c0139a30>] (worker_thread+0x2a8/0x5c0)
[ 1.771072] [<
c0139a30>] (worker_thread) from [<
c013e598>] (kthread+0x14c/0x154)
[ 1.778482] [<
c013e598>] (kthread) from [<
c01010e8>] (ret_from_fork+0x14/0x2c)
[ 1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
[ 1.790739] ffa0:
00000000 00000000 00000000 00000000
[ 1.798923] ffc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.807107] ffe0:
00000000 00000000 00000000 00000000 00000013 00000000
[ 1.813724] Code:
e92d47f0 e1a05000 e8900048 e1a00003 (
e5937010)
[ 1.819844] ---[ end trace
3c2c0c8b65399ec9 ]---
The actual error that we had from devm_gpiod_get_optional() was
-EPROBE_DEFER, due to the GPIO being provided by a driver that is
probed later than the Ethernet controller driver.
To fix this, we simply add the missing device_del() invocation in the
error path.
Fixes:
69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otto Sabart [Mon, 14 Jan 2019 11:56:36 +0000 (12:56 +0100)]
doc: net: fix bad references to network drivers
Fix "reference to nonexisting document" warnings.
Fixes:
b255e500c8dc ("net: documentation: build a directory structure for drivers")
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Jan 2019 17:55:42 +0000 (05:55 +1200)]
Merge tag 'powerpc-5.0-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A couple of weeks of fixes.
There's one fix for an oops on Power9 machines with Open CAPI
adapters.
And a fix for probable memory corruption in some of the new NPU code,
caught by smatch though and not seen in the wild.
Plus a few other minor fixes.
There's one non-fix which is the perf_regs change. That was sent
during the merge window but I accidentally only merged the first of
two patches in the series. It's been in linux-next so hopefully
doesn't conflict with anything in acme's tree.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Breno Leitao,
Christian Lamparter, Christophe Leroy, Dan Carpenter, Frederic Barrat,
Greg Kurz, Jason A. Donenfeld, Madhavan Srinivasan"
* tag 'powerpc-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/syscalls: Fix syscall tracing
powerpc/pseries: Fix build break due to pnv_npu2_init()
powerpc/4xx/ocm: Fix fix for phys_addr_t printf warnings
powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()
powerpc/perf: Update perf_regs structure to include MMCRA
Linus Torvalds [Fri, 18 Jan 2019 17:53:41 +0000 (05:53 +1200)]
Merge tag 'for-linus-5.0-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- Several fixes for the Xen pvcalls drivers (1 fix for the backend and
8 for the frontend).
- A fix for a rather longstanding bug in the Xen sched_clock()
interface which led to weird time jumps when migrating the system.
- A fix for avoiding accesses to x2apic MSRs in Xen PV guests.
* tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix x86 sched_clock() interface for xen
pvcalls-front: fix potential null dereference
always clear the X2APIC_ENABLE bit for PV guest
pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
xen/pvcalls: remove set but not used variable 'intf'
pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
pvcalls-front: don't return error when the ring is full
pvcalls-front: properly allocate sk
pvcalls-front: don't try to free unallocated rings
pvcalls-front: read all data before closing the connection
Linus Torvalds [Fri, 18 Jan 2019 17:48:43 +0000 (05:48 +1200)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Zero-length DMA mapping in caam
- Invalidly mapping stack memory for DMA in talitos
- Use after free in cavium/nitrox
- Key parsing in authenc
- Undefined shift in sm3
- Bogus completion call in authencesn
- SHA support detection in caam"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sm3 - fix undefined shift by >= width of value
crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
crypto: talitos - reorder code in talitos_edesc_alloc()
crypto: adiantum - initialize crypto_spawn::inst
crypto: cavium/nitrox - Use after free in process_response_list()
crypto: authencesn - Avoid twice completion call in decrypt path
crypto: caam - fix SHA support detection
crypto: caam - fix zero-length buffer DMA mapping
crypto: ccree - convert to use crypto_authenc_extractkeys()
crypto: bcm - convert to use crypto_authenc_extractkeys()
crypto: authenc - fix parsing key with misaligned rta_len
Linus Torvalds [Fri, 18 Jan 2019 17:46:00 +0000 (05:46 +1200)]
Merge tag 'acpi-5.0-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI initialization ordering issue introduced in the 4.17
time frame and causing functional problems to appear on multiple
systems and fix some fallout of the recent change to enable building
kernels with ACPI support and without PCI.
Specifics:
- Restore the ACPI initialization ordering changed implicitly by the
module-level AML handling rework during the 4.17 development cycle
that caused the EC address space handler based on information from
ECDT to be set up before loading AML definition blocks, making it
effectively not accessible by AML on some systems that don't work
as expected any more (Rafael Wysocki).
- Add direct dependencies on PCI to Kconfig in multiple places for
code that depends on both ACPI and PCI, but the PCI dependency was
implicitly satisfied by the ACPI dependency before, to prevent
invalid configurations from being created, for example by
randconfig (Sinan Kaya)"
* tag 'acpi-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
drivers: thermal: int340x_thermal: Make PCI dependency explicit
x86/intel/lpss: Make PCI dependency explicit
platform/x86: apple-gmux: Make PCI dependency explicit
platform/x86: intel_pmc: Make PCI dependency explicit
platform/x86: intel_ips: make PCI dependency explicit
vga-switcheroo: make PCI dependency explicit
ata: pata_acpi: Make PCI dependency explicit
ACPI / LPSS: Make PCI dependency explicit
Linus Torvalds [Fri, 18 Jan 2019 17:43:05 +0000 (05:43 +1200)]
Merge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux
Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
- fix stack memory leak in omap2fb driver (Vlad Tsyrklevich)
- fix OF node name handling v4.20 regression in offb driver (Rob
Herring)
- convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into a
kernel parameter (Peter Rosin)
* tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux:
fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option
fbdev: offb: Fix OF node name handling
omap2fb: Fix stack memory disclosure
Linus Torvalds [Fri, 18 Jan 2019 17:41:38 +0000 (05:41 +1200)]
Merge tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm
Pull drm update from Dave Airlie:
"Add nouveau TU102 (RTX 2080 Ti) support"
* tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/core: recognise TU102
Josef Bacik [Fri, 11 Jan 2019 15:21:02 +0000 (10:21 -0500)]
btrfs: wakeup cleaner thread when adding delayed iput
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path. Delaying iputs
means we are potentially delaying the eviction of an inode and it's
respective space. The cleaner thread only gets woken up every 30
seconds, or when we require space. If there are a lot of inodes that
need to be deleted we could induce a serious amount of latency while we
wait for these inodes to be evicted. So instead wakeup the cleaner if
it's not already awake to process any new delayed iputs we add to the
list. If we suddenly need space we will less likely be backed up
behind a bunch of inodes that are waiting to be deleted, and we could
possibly free space before we need to get into the flushing logic which
will save us some latency.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 11 Jan 2019 15:21:01 +0000 (10:21 -0500)]
btrfs: run delayed iputs before committing
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd. So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 21 Nov 2018 19:05:45 +0000 (14:05 -0500)]
btrfs: wait on ordered extents on abort cleanup
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.
generic/475 can produce this warning:
[ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394
[ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.193082] RSP: 0018:
ffffb1ab86163d98 EFLAGS:
00010286
[ 8531.194198] RAX:
ffff9f3449494d18 RBX:
ffff9f34a2695000 RCX:
0000000000000000
[ 8531.195629] RDX:
0000000000000002 RSI:
0000000000000001 RDI:
0000000000000000
[ 8531.197315] RBP:
ffff9f344e930000 R08:
0000000000000001 R09:
0000000000000000
[ 8531.199095] R10:
0000000000000000 R11:
ffff9f34494d4ff8 R12:
ffffb1ab86163dc0
[ 8531.200870] R13:
ffff9f344e9300b0 R14:
ffffb1ab86163db8 R15:
0000000000000000
[ 8531.202707] FS:
00007fc68e949fc0(0000) GS:
ffff9f34bd800000(0000)knlGS:
0000000000000000
[ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8531.205942] CR2:
00007ffde8114dd8 CR3:
000000002dfbd000 CR4:
00000000000006e0
[ 8531.207516] Call Trace:
[ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs]
[ 8531.210209] ? wait_for_completion+0x5b/0x190
[ 8531.211303] close_ctree+0x157/0x350 [btrfs]
[ 8531.212412] generic_shutdown_super+0x64/0x100
[ 8531.213485] kill_anon_super+0x14/0x30
[ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8531.215539] deactivate_locked_super+0x29/0x60
[ 8531.216633] cleanup_mnt+0x3b/0x70
[ 8531.217497] task_work_run+0x98/0xc0
[ 8531.218397] exit_to_usermode_loop+0x83/0x90
[ 8531.219324] do_syscall_64+0x15b/0x180
[ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 8531.221286] RIP: 0033:0x7fc68e5e4d07
[ 8531.225621] RSP: 002b:
00007ffde8116608 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[ 8531.227512] RAX:
0000000000000000 RBX:
00005580c2175970 RCX:
00007fc68e5e4d07
[ 8531.229098] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
00005580c2175b80
[ 8531.230730] RBP:
0000000000000000 R08:
00005580c2175ba0 R09:
00007ffde8114e80
[ 8531.232269] R10:
0000000000000000 R11:
0000000000000246 R12:
00005580c2175b80
[ 8531.233839] R13:
00007fc68eac61c4 R14:
00005580c2175a68 R15:
0000000000000000
Leaving a tree in the rb-tree:
3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855 iput(root->ino_cache_inode);
3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add stacktrace ]
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 21 Nov 2018 19:05:41 +0000 (14:05 -0500)]
btrfs: handle delayed ref head accounting cleanup in abort
We weren't doing any of the accounting cleanup when we aborted
transactions. Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.
The test generic/475 on a 2G VM can trigger the problems eg.:
[ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
[ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
[ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
[ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
[ 8502.160623] RSP: 0018:
ffffb1ab84b93de8 EFLAGS:
00010206
[ 8502.161906] RAX:
0000000001000000 RBX:
ffff9f34b1756400 RCX:
0000000000000000
[ 8502.163448] RDX:
0000000000000002 RSI:
0000000000000001 RDI:
ffff9f34b1755400
[ 8502.164906] RBP:
ffff9f34b7e8c000 R08:
0000000000000001 R09:
0000000000000000
[ 8502.166716] R10:
0000000000000000 R11:
0000000000000001 R12:
ffff9f34b7e8c108
[ 8502.168498] R13:
ffff9f34b7e8c158 R14:
0000000000000000 R15:
dead000000000100
[ 8502.170296] FS:
00007fb1cf15ffc0(0000) GS:
ffff9f34bd400000(0000) knlGS:
0000000000000000
[ 8502.172439] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8502.173669] CR2:
00007fb1ced507b0 CR3:
000000002f7a6000 CR4:
00000000000006f0
[ 8502.175094] Call Trace:
[ 8502.175759] close_ctree+0x17f/0x350 [btrfs]
[ 8502.176721] generic_shutdown_super+0x64/0x100
[ 8502.177702] kill_anon_super+0x14/0x30
[ 8502.178607] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8502.179602] deactivate_locked_super+0x29/0x60
[ 8502.180595] cleanup_mnt+0x3b/0x70
[ 8502.181406] task_work_run+0x98/0xc0
[ 8502.182255] exit_to_usermode_loop+0x83/0x90
[ 8502.183113] do_syscall_64+0x15b/0x180
[ 8502.183919] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Corresponding to
release_global_block_rsv() {
...
WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
CC: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add log dump ]
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Jan 2019 14:02:23 +0000 (15:02 +0100)]
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
This reverts commit
e73e81b6d0114d4a303205a952ab2e87c44bd279.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workque, effectively deadlocking
12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.
Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rafael J. Wysocki [Fri, 18 Jan 2019 10:17:16 +0000 (11:17 +0100)]
Merge branch 'acpi-pci'
* acpi-pci:
drivers: thermal: int340x_thermal: Make PCI dependency explicit
x86/intel/lpss: Make PCI dependency explicit
platform/x86: apple-gmux: Make PCI dependency explicit
platform/x86: intel_pmc: Make PCI dependency explicit
platform/x86: intel_ips: make PCI dependency explicit
vga-switcheroo: make PCI dependency explicit
ata: pata_acpi: Make PCI dependency explicit
ACPI / LPSS: Make PCI dependency explicit
Masahiro Yamada [Fri, 18 Jan 2019 05:30:38 +0000 (14:30 +0900)]
mtd: rawnand: denali: get ->setup_data_interface() working again
Commit
7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to
nand_controller_ops") missed to invert the if-conditonal for denali.
Since then, the Denali NAND driver cannnot invoke setup_data_interface.
Fixes:
7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to nand_controller_ops")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Luc Van Oostenryck [Thu, 17 Jan 2019 17:39:07 +0000 (18:39 +0100)]
mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
The function jz_nand_ioremap_resource() needs a pointer to an __iomem
pointer as its last argument but this argument is declared as:
void * __iomem *base
Fix this by using the correct declaration:
void __iomem **base
which then also removes the following Sparse's warnings:
282:15: warning: incorrect type in assignment (different address spaces)
282:15: expected void *[noderef] <asn:2>
282:15: got void [noderef] <asn:2> *
322:57: warning: incorrect type in argument 4 (different address spaces)
322:57: expected void *[noderef] <asn:2> *base
322:57: got void [noderef] <asn:2> **
402:67: warning: incorrect type in argument 4 (different address spaces)
402:67: expected void *[noderef] <asn:2> *base
402:67: got void [noderef] <asn:2> **
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Yang Wei [Thu, 17 Jan 2019 15:30:03 +0000 (23:30 +0800)]
macvlan: replace kfree_skb by consume_skb for drop profiles
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Thu, 17 Jan 2019 15:11:30 +0000 (23:11 +0800)]
neighbour: Do not perturb drop profiles when neigh_probe
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 17 Jan 2019 14:20:14 +0000 (14:20 +0000)]
amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
The XGBE hardware has support for performing MDIO operations using an
MDIO command request. The driver mistakenly uses the mdio port address
as the MDIO command request device address instead of the MDIO command
request port address. Additionally, the driver does not properly check
for and create a clause 45 MDIO command.
Check the supplied MDIO register to determine if the request is a clause
45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
address and register number from the supplied MDIO register and use them
to set the MDIO command request device address and register number fields.
For a clause 22 operation, the MDIO request device address is set to zero
and the MDIO command request register number is set to the supplied MDIO
register. In either case, the supplied MDIO port address is used as the
MDIO command request port address.
Fixes:
732f2ab7afb9 ("amd-xgbe: Add support for MDIO attached PHYs")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Camelia Groza [Thu, 17 Jan 2019 12:33:33 +0000 (14:33 +0200)]
net: phy: add missing phy driver features
The phy drivers for CS4340 and TN2020 are missing their
features attributes. Add them.
Fixes:
719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Thu, 17 Jan 2019 09:42:27 +0000 (11:42 +0200)]
dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
As txq_trans_update() only updates trans_start when the lock is held,
trans_start does not get updated if NETIF_F_LLTX is declared.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunjian Wang [Thu, 17 Jan 2019 01:46:41 +0000 (09:46 +0800)]
net: bridge: Fix ethernet header pointer before check skb forwardable
The skb header should be set to ethernet header before using
is_skb_forwardable. Because the ethernet header length has been
considered in is_skb_forwardable(including dev->hard_header_len
length).
To reproduce the issue:
1, add 2 ports on linux bridge br using following commands:
$ brctl addbr br
$ brctl addif br eth0
$ brctl addif br eth1
2, the MTU of eth0 and eth1 is 1500
3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4)
from eth0 to eth1
So the expect result is packet larger than 1500 cannot pass through
eth0 and eth1. But currently, the packet passes through success, it
means eth1's MTU limit doesn't take effect.
Fixes:
f6367b4660dd ("bridge: use is_skb_forwardable in forward path")
Cc: bridge@lists.linux-foundation.org
Cc: Nkolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 16 Jan 2019 08:54:42 +0000 (16:54 +0800)]
vhost: log dirty page correctly
Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.
To solve this issue, when logging with device IOTLB enabled, we will:
1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
get HVA, for writable descriptor, get HVA through iovec. For used
ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
through GPA. Pay attention this reverse mapping is not guaranteed
to be unique, so we should log each possible GPA in this case.
This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.
Fixes:
6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 12 Jan 2019 20:51:05 +0000 (12:51 -0800)]
Documentation: timestamping: correct path to net_tstamp.h
net_tstamp.h is an UAPI header, so it was moved under include/uapi.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 18 Jan 2019 05:38:13 +0000 (15:38 +1000)]
Merge branch 'linux-4.21' of git://github.com/skeggsb/linux into drm-fixes
nouveau support for TU102 (RTX 2080 Ti)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA=mQsRr0BpRpv3n6UjthHush4u_kQR3oUGHkBtAHTmyCYw@mail.gmail.com
Linus Torvalds [Fri, 18 Jan 2019 05:17:20 +0000 (17:17 +1200)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes frfom Jason Gunthorpe:
"Not much so far. We have the usual batch of bugs and two fixes to code
merged this cycle:
- Restore valgrind support for the ioctl verbs interface merged this
window, and fix a missed error code on an error path from that
conversion
- A user reported crash on obsolete mthca hardware
- pvrdma was using the wrong command opcode toward the hypervisor
- NULL pointer crash regression when dumping rdma-cm over netlink
- Be conservative about exposing the global rkey"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT
RDMA/mthca: Clear QP objects during their allocation
RDMA/vmw_pvrdma: Return the correct opcode when creating WR
RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
RDMA/nldev: Don't expose unsafe global rkey to regular user
RDMA/uverbs: Fix post send success return value in case of error
Linus Torvalds [Fri, 18 Jan 2019 05:14:02 +0000 (17:14 +1200)]
Merge tag 'drm-fixes-2019-01-18' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"The rc3 fixes are a bit scattered:
- meson, sun4i and rockchip all had missing of_node_put.
- qxl and virtio both were advertising dma-buf to userspace when they
really shouldn't have.
Otherwise:
meson:
- modesetting regression fix
i915 GVT:
- one cmd parser failure fix
- region cleanup fix in vGPU destroy
amdgpu:
- KFD fixes for arm64 mixed APU/DGPU
- vega12 powerplay fix
- raven DC fixes
- freesync fix"
* tag 'drm-fixes-2019-01-18' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Detach backlight from stream
drm/sun4i: backend: add missing of_node_puts
Revert "drm/amdgpu: validate user pitch alignment"
Revert "drm/amdgpu: validate user GEM object size"
drm/meson: Fix atomic mode switching regression
drm/i915/gvt: Fix mmap range check
drm/i915/gvt: free VFIO region space in vgpu detach
drm/amd/display: Fix disabled cursor on top screen edge
drm/amd/display: fix warning on raven hotplug
drm/amd/display: fix PME notification not working in RV desktop
drm/amd/display: Only get the connector state for VRR when toggled
drm/amd/display: Pack DMCU iRAM alignment
drm/amd/powerplay: run acg btc for Vega12
drm/amdkfd: Don't assign dGPUs to APU topology devices
drm/amdkfd: Allow building KFD on ARM64 (v2)
drm/meson: add missing of_node_put
drm/virtio: drop prime import/export callbacks
drm/qxl: drop prime import/export callbacks
drm/i915/gvt: Allow F_CMD_ACCESS on mmio 0x21f0
drm/rockchip: add missing of_node_put
Linus Torvalds [Fri, 18 Jan 2019 04:58:07 +0000 (16:58 +1200)]
Merge tag 'led-fix-for-5.0-rc3' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski.
* tag 'led-fix-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: lp5523: fix a missing check of return value of lp55xx_read
Linus Torvalds [Fri, 18 Jan 2019 04:55:49 +0000 (16:55 +1200)]
Merge tag 'hwmon-for-v5.0-rc3' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Minor fixes/regressions"
* tag 'hwmon-for-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
hwmon: (occ) Fix potential integer overflow
hwmon: (lm80) Fix missing unlock on error in set_fan_div()
hwmon: (nct6775) Enable IO mapping for NCT6797D and NCT6798D
hwmon: (nct6775) Fix chip ID for NCT6798D
Thomas Gleixner [Thu, 17 Jan 2019 23:14:17 +0000 (00:14 +0100)]
block: Cleanup license notice
Remove the imprecise and sloppy:
"This files is licensed under the GPL."
license notice in the top level comment.
1) The file already contains a SPDX license identifier which clearly
states that the license of the file is GPL V2 only
2) The notice resolves to GPL v1 or later for scanners which is just
contrary to the intent of SPDX identifiers to provide clear and non
ambiguous license information. Aside of that the value add of this
notice is below zero,
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Fixes:
6a5ac9846508 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ben Skeggs [Thu, 17 Jan 2019 01:39:55 +0000 (11:39 +1000)]
drm/nouveau/core: recognise TU102
Would usually do this split-out, verifying each component indivitually, but
this has been squashed together to be more palatable for merging in 5.0-rc.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Nicolas Dichtel [Thu, 17 Jan 2019 10:27:22 +0000 (11:27 +0100)]
af_packet: fix raw sockets over 6in4 tunnel
Since commit
cb9f1b783850, scapy (which uses an AF_PACKET socket in
SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel:
Here is a example of the setup:
$ ip link set ntfp2 up
$ ip addr add 10.125.0.1/24 dev ntfp2
$ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2
$ ip addr add fd00:cafe:cafe::1/128 dev tun1
$ ip link set dev tun1 up
$ ip route add fd00:200::/64 dev tun1
$ scapy
>>> p = []
>>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest()
>>> send(p, count=1, inter=0.1)
>>> quit()
$ ip -s link ls dev tun1 | grep -A1 "TX.*errors"
TX: bytes packets errors dropped carrier collsns
0 0 1 0 0 0
The problem is that the network offset is set to the hard_header_len of the
output device (tun1, ie 14 + 20) and in our case, because the packet is
small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes
(ipv6 header) starting from the network offset).
This problem is more generally related to device with variable hard header
length. To avoid a too intrusive patch in the current release, a (ugly)
workaround is proposed in this patch. It has to be cleaned up in net-next.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=993675a3100b1
Link: http://patchwork.ozlabs.org/patch/1024489/
Fixes:
cb9f1b783850 ("ip: validate header length on virtual device xmit")
CC: Willem de Bruijn <willemb@google.com>
CC: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>