Luca Coelho [Mon, 18 Dec 2017 18:13:07 +0000 (20:13 +0200)]
iwlwifi: mvm: check if mac80211_queue is valid in iwl_mvm_disable_txq
[ Upstream commit
9a233bb8025105db9a60b5d761005cc5a6c77f3d ]
Sometimes iwl_mvm_disable_txq() may be called with mac80211_queue ==
IEEE80211_INVAL_HW_QUEUE, and this would cause us to use BIT(0xFF)
which is way too large for the u16 we used to store it in
hw_queue_to_mac820211. If this happens the following UBSAN warning
will be generated:
[ 167.185167] UBSAN: Undefined behaviour in drivers/net/wireless/intel/iwlwifi/mvm/utils.c:838:5
[ 167.185171] shift exponent 255 is too large for 64-bit type 'long unsigned int'
Fix that by checking that it is not IEEE80211_INVAL_HW_QUEUE and,
while at it, add a warning if the queue number is larger than
IEEE80211_MAX_QUEUES.
Fixes:
34e10860ae8d ("iwlwifi: mvm: remove references to queue_info in new TX path")
Reported-by: Paul Menzel <pmenzel+linux-wireless@molgen.mpg.de>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Ungerer [Wed, 28 Mar 2018 07:12:18 +0000 (17:12 +1000)]
m68k: set dma and coherent masks for platform FEC ethernets
[ Upstream commit
f61e64310b75733d782e930d1fb404b84699eed6 ]
As of commit
205e1b7f51e4 ("dma-mapping: warn when there is no
coherent_dma_mask") the Freescale FEC driver is issuing the following
warning on driver initialization on ColdFire systems:
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 0x40159e20
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc7-dirty #4
Stack from
41833dd8:
41833dd8 40259c53 40025534 40279e26 00000003 00000000 4004e514 41827000
400255de 40244e42 00000204 40159e20 00000009 00000000 00000000 4024531d
40159e20 40244e42 00000204 00000000 00000000 00000000 00000007 00000000
00000000 40279e26 4028d040 40226576 4003ae88 40279e26 418273f6 41833ef8
7fffffff 418273f2 41867028 4003c9a2 4180ac6c 00000004 41833f8c 4013e71c
40279e1c 40279e26 40226c16 4013ced2 40279e26 40279e58 4028d040 00000000
Call Trace:
[<
40025534>] 0x40025534
[<
4004e514>] 0x4004e514
[<
400255de>] 0x400255de
[<
40159e20>] 0x40159e20
[<
40159e20>] 0x40159e20
It is not fatal, the driver and the system continue to function normally.
As per the warning the coherent_dma_mask is not set on this device.
There is nothing special about the DMA memory coherency on this hardware
so we can just set the mask to 32bits in the platform data for the FEC
ethernet devices.
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Shishkin [Thu, 1 Mar 2018 08:15:32 +0000 (10:15 +0200)]
intel_th: Use correct method of finding hub
[ Upstream commit
9ad577087165478c9d9be82b15ed9bf2db5835f5 ]
Since commit
8edc514b01e9 ("intel_th: Make SOURCE devices children of the
root device") the hub is not the parent of SOURCE devices any more, so the
new helper function should be used for that instead of always using the
parent. The intel_th_set_output() path, however, still uses the old
logic, leading to the hub driver structure being aliased with something
else, like struct pci_driver or struct acpi_driver, and an incorrect call
to an address inferred from that, potentially resulting in a crash.
Fixes:
8edc514b01e9 ("intel_th: Make SOURCE devices children of the root device")
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian Andrzej Siewior [Thu, 22 Mar 2018 15:22:33 +0000 (16:22 +0100)]
iommu/amd: Take into account that alloc_dev_data() may return NULL
[ Upstream commit
39ffe39545cd5cb5b8cee9f0469165cf24dc62c2 ]
find_dev_data() does not check whether the return value alloc_dev_data()
is NULL. This was okay once because the pointer was returned once as-is.
Since commit
df3f7a6e8e85 ("iommu/amd: Use is_attach_deferred
call-back") the pointer may be used within find_dev_data() so a NULL
check is required.
Cc: Baoquan He <bhe@redhat.com>
Fixes:
df3f7a6e8e85 ("iommu/amd: Use is_attach_deferred call-back")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anilkumar Kolli [Wed, 28 Mar 2018 09:19:40 +0000 (12:19 +0300)]
ath10k: advertize beacon_int_min_gcd
[ Upstream commit
8ebee73b574ad3dd1f14d461f65ceaffbd637650 ]
This patch fixes regression caused by
0c317a02ca98
("cfg80211: support virtual interfaces with different beacon intervals"),
with this change cfg80211 expects the driver to advertize
'beacon_int_min_gcd' to support different beacon intervals in multivap
scenario. This support is added for, QCA988X/QCA99X0/QCA9984/QCA4019.
Verifed AP + mesh bring up on QCA9984 with beacon interval 100msec and
1000msec respectively.
Frimware: firmware-5.bin_10.4-3.5.3-00053
Fixes:
0c317a02ca98 ("cfg80211: support virtual interfaces with different beacon intervals")
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Harry Morris [Wed, 28 Mar 2018 10:54:27 +0000 (11:54 +0100)]
ieee802154: ca8210: fix uninitialised data read
[ Upstream commit
86674a97f5055f4c7f406563408096e8cf9364ff ]
In ca8210_test_int_user_write() a user can request the transfer of a
frame with a length field (command.length) that is longer than the
actual buffer provided (len). In this scenario the driver will copy
the buffer contents into the uninitialised command[] buffer, then
transfer <data.length> bytes over the SPI even though only <len> bytes
had been populated, potentially leaking sensitive kernel memory.
Also the first 6 bytes of the command buffer must be initialised in case
a malformed, short packet is written and the uninitialised bytes are
read in ca8210_test_check_upstream.
Reported-by: Domen Puncer Kugler <domen.puncer@samsung.com>
Signed-off-by: Harry Morris <h.morris@cascoda.com>
Tested-by: Harry Morris <h.morris@cascoda.com>
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Ellerman [Fri, 30 Mar 2018 12:27:25 +0000 (23:27 +1100)]
powerpc/mpic: Check if cpu_possible() in mpic_physmask()
[ Upstream commit
0834d627fbea00c1444075eb3e448e1974da452d ]
In mpic_physmask() we loop over all CPUs up to 32, then get the hard
SMP processor id of that CPU.
Currently that's possibly walking off the end of the paca array, but
in a future patch we will change the paca array to be an array of
pointers, and in that case we will get a NULL for missing CPUs and
oops. eg:
Unable to handle kernel paging request for data at address 0x88888888888888b8
Faulting instruction address: 0xc00000000004e380
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP .mpic_set_affinity+0x60/0x1a0
LR .irq_do_set_affinity+0x48/0x100
Fix it by checking the CPU is possible, this also fixes the code if
there are gaps in the CPU numbering which probably never happens on
mpic systems but who knows.
Debugged-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lenny Szubowicz [Tue, 27 Mar 2018 13:56:40 +0000 (09:56 -0400)]
ACPI: acpi_pad: Fix memory leak in power saving threads
[ Upstream commit
8b29d29abc484d638213dd79a18a95ae7e5bb402 ]
Fix once per second (round_robin_time) memory leak of about 1 KB in
each acpi_pad kernel idling thread that is activated.
Found by testing with kmemleak.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Aaro Koskinen [Fri, 16 Mar 2018 20:17:28 +0000 (22:17 +0200)]
drivers: macintosh: rack-meter: really fix bogus memsets
[ Upstream commit
e283655b5abe26462d53d5196f186c5e8863af3b ]
We should zero an array using sizeof instead of number of elements.
Fixes the following compiler (GCC 7.3.0) warnings:
drivers/macintosh/rack-meter.c: In function 'rackmeter_do_pause':
drivers/macintosh/rack-meter.c:157:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
drivers/macintosh/rack-meter.c:158:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
Fixes:
4f7bef7a9f69 ("drivers: macintosh: rack-meter: fix bogus memsets")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 29 Mar 2018 09:01:53 +0000 (12:01 +0300)]
xen/acpi: off by one in read_acpi_id()
[ Upstream commit
c37a3c94775855567b90f91775b9691e10bd2806 ]
If acpi_id is == nr_acpi_bits, then we access one element beyond the end
of the acpi_psd[] array or we set one bit beyond the end of the bit map
when we do __set_bit(acpi_id, acpi_id_present);
Fixes:
59a568029181 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Fri, 30 Mar 2018 20:04:44 +0000 (21:04 +0100)]
rxrpc: Don't treat call aborts as conn aborts
[ Upstream commit
57b0c9d49b94bbeb53649b7fbd264603c1ebd585 ]
If a call-level abort is received for the previous call to complete on a
connection channel, then that abort is queued for the connection processor
to handle. Unfortunately, the connection processor then assumes without
checking that the abort is connection-level (ie. callNumber is 0) and
distributes it over all active calls on that connection, thereby
incorrectly aborting them.
Fix this by discarding aborts aimed at a completed call.
Further, discard all packets aimed at a call that's complete if there's
currently an active call on a channel, since the DATA packets associated
with the new call automatically terminate the old call.
Fixes:
18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Fri, 30 Mar 2018 20:04:43 +0000 (21:04 +0100)]
rxrpc: Fix Tx ring annotation after initial Tx failure
[ Upstream commit
03877bf6a30cca7d4bc3ffabd3c3e9464a7a1a19 ]
rxrpc calls have a ring of packets that are awaiting ACK or retransmission
and a parallel ring of annotations that tracks the state of those packets.
If the initial transmission of a packet on the underlying UDP socket fails
then the packet annotation is marked for resend - but the setting of this
mark accidentally erases the last-packet mark also stored in the same
annotation slot. If this happens, a call won't switch out of the Tx phase
when all the packets have been transmitted.
Fix this by retaining the last-packet mark and only altering the packet
state.
Fixes:
248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Tue, 19 Dec 2017 07:44:54 +0000 (15:44 +0800)]
btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
[ Upstream commit
4d31778aa2fa342f5f92ca4025b293a1729161d1 ]
When multiple pending snapshots referring to the same source subvolume
are executed, enabled quota will cause root item corruption, where root
items are using old bytenr (no backref in extent tree).
This can be triggered by fstests btrfs/152.
The cause is when source subvolume is still dirty, extra commit
(simplied transaction commit) of qgroup_account_snapshot() can skip
dirty roots not recorded in current transaction, making root item of
source subvolume not updated.
Fix it by forcing recording source subvolume in current transaction
before qgroup sub-transaction commit.
Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Mahoney [Fri, 16 Mar 2018 18:36:27 +0000 (14:36 -0400)]
btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
[ Upstream commit
8a5a916d9a35e13576d79cc16e24611821b13e34 ]
While running btrfs/011, I hit the following lockdep splat.
This is the important bit:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit.
This switches it to GFP_NOFS.
========================================================
WARNING: possible irq lock inversion dependency detected
4.12.14-kvmsmall #8 Tainted: G W
--------------------------------------------------------
kswapd0/50 just changed the state of lock:
(&delayed_node->mutex){+.+.-.}, at: [<
ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
(pcpu_alloc_mutex){+.+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcpu_alloc_mutex);
local_irq_disable();
lock(&delayed_node->mutex);
lock(&found->groups_sem);
<Interrupt>
lock(&delayed_node->mutex);
*** DEADLOCK ***
2 locks held by kswapd0/50:
#0: (shrinker_rwsem){++++..}, at: [<
ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
#1: (&type->s_umount_key#30){+++++.}, at: [<
ffffffff8126dec6>] trylock_super+0x16/0x50
the shortest dependencies between 2nd lock and 1st lock:
-> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
RECLAIM_FS-ON-W at:
__kmalloc+0x47/0x310
pcpu_extend_area_map+0x2b/0xc0
pcpu_alloc+0x3ec/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
__kmem_cache_create+0x1bf/0x390
create_cache+0xba/0x1b0
kmem_cache_create+0x1f8/0x2b0
ksm_init+0x6f/0x19d
do_one_initcall+0x50/0x1b0
kernel_init_freeable+0x201/0x289
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
setup_cpu_cache+0x2f/0x1f0
__kmem_cache_create+0x1bf/0x390
create_boot_cache+0x8b/0xb1
kmem_cache_init+0xa1/0x19e
start_kernel+0x270/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
}
... key at: [<
ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
... acquired at:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
transaction_kthread+0x176/0x1b0 [btrfs]
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
-> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
... acquired at:
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
btrfs_create_tree+0xbb/0x2a0 [btrfs]
btrfs_create_uuid_tree+0x37/0x140 [btrfs]
open_ctree+0x23c0/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
-> (&found->groups_sem){++++..} ops: 2134587 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
INITIAL USE at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
... acquired at:
find_free_extent+0xcb4/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
__btrfs_cow_block+0x110/0x5b0 [btrfs]
btrfs_cow_block+0xd7/0x290 [btrfs]
btrfs_search_slot+0x1f6/0x960 [btrfs]
btrfs_lookup_inode+0x2a/0x90 [btrfs]
__btrfs_update_delayed_inode+0x65/0x210 [btrfs]
btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
evict+0xc4/0x190
__dentry_kill+0xbf/0x170
dput+0x2ae/0x2f0
SyS_rename+0x2a6/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
-> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
IN-RECLAIM_FS-W at:
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
... acquired at:
__lock_acquire+0x264/0x11c0
lock_acquire+0xbd/0x1e0
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x78/0xb7
print_irq_inversion_bug.part.38+0x19f/0x1aa
check_usage_forwards+0x102/0x120
? ret_from_fork+0x3a/0x50
? check_usage_backwards+0x110/0x110
mark_lock+0x16c/0x270
__lock_acquire+0x264/0x11c0
? pagevec_lookup_entries+0x1a/0x30
? truncate_inode_pages_range+0x2b3/0x7f0
lock_acquire+0xbd/0x1e0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
__mutex_lock+0x4e/0x8c0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
? mem_cgroup_shrink_node+0x2c0/0x2c0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x3a/0x50
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 26 Mar 2018 22:59:12 +0000 (23:59 +0100)]
Btrfs: fix copy_items() return value when logging an inode
[ Upstream commit
8434ec46c6e3232cebc25a910363b29f5c617820 ]
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.
Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).
Fixes:
16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Tue, 27 Mar 2018 12:44:18 +0000 (20:44 +0800)]
btrfs: tests/qgroup: Fix wrong tree backref level
[ Upstream commit
3c0efdf03b2d127f0e40e30db4e7aa0429b1b79a ]
The extent tree of the test fs is like the following:
BTRFS info (device (null)): leaf
16327509003777336587 total ptrs 1 free space 3919
item 0 key (4096 168 4096) itemoff 3944 itemsize 51
extent refs 1 gen 1 flags 2
tree block key (
68719476736 0 0) level 1
^^^^^^^
ref#0: tree block backref root 5
And it's using an empty tree for fs tree, so there is no way that its
level can be 1.
For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:
item 3 key (
30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
refs 1 gen 4 flags TREE_BLOCK
tree block key (256 INODE_ITEM 0) level 0
^^^^^^^
tree block backref root 5
Fix the level to 0, so it won't break later tree level checker.
Fixes:
faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Piggin [Mon, 26 Mar 2018 15:01:16 +0000 (01:01 +1000)]
powerpc/64s: sreset panic if there is no debugger or crash dump handlers
[ Upstream commit
d40b6768e45bd9213139b2d91d30c7692b6007b1 ]
system_reset_exception does most of its own crash handling now,
invoking the debugger or crash dumps if they are registered. If not,
then it goes through to die() to print stack traces, and then is
supposed to panic (according to comments).
However after die() prints oopses, it does its own handling which
doesn't allow system_reset_exception to panic (e.g., it may just
kill the current process). This patch causes sreset exceptions to
return from die after it prints messages but before acting.
This also stops die from invoking the debugger on 0x100 crashes.
system_reset_exception similarly calls the debugger. It had been
thought this was harmless (because if the debugger was disabled,
neither call would fire, and if it was enabled the first call
would return). However in some cases like xmon 'X' command, the
debugger returns 0, which currently causes it to be entered
again (first in system_reset_exception, then in die), which is
confusing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Sun, 1 Apr 2018 17:26:29 +0000 (10:26 -0700)]
net: bgmac: Correctly annotate register space
[ Upstream commit
16a1c0646e55c3345bce8e4edfc06ad119d27c04 ]
All the members: base, idm_base and nicpm_base should be annotated with
__iomem since they are pointers to register space. This fixes a bunch of
sparse reported warnings.
Fixes:
f6a95a24957a ("net: ethernet: bgmac: Add platform device support")
Fixes:
dd5c5d037f5e ("net: ethernet: bgmac: add NS2 support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Sun, 1 Apr 2018 17:26:30 +0000 (10:26 -0700)]
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
[ Upstream commit
60d6e6f0b9e422dd01aeda39257ee0428e5e2a3f ]
bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian
32-bit word without using proper accessors, fix this, and because a
length cannot be negative, use unsigned int while at it.
Fixes:
9cde94506eac ("bgmac: implement scatter/gather support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Tue, 3 Apr 2018 15:24:35 +0000 (08:24 -0700)]
sparc64: Make atomic_xchg() an inline function rather than a macro.
[ Upstream commit
d13864b68e41c11e4231de90cf358658f6ecea45 ]
This avoids a lot of -Wunused warnings such as:
====================
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
^~~~
kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
atomic_xchg(&kgdb_active, cpu);
^~~~~~~~~~~
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Wed, 4 Apr 2018 12:41:26 +0000 (13:41 +0100)]
fscache: Fix hanging wait on page discarded by writeback
[ Upstream commit
2c98425720233ae3e135add0c7e869b32913502f ]
If the fscache asynchronous write operation elects to discard a page that's
pending storage to the cache because the page would be over the store limit
then it needs to wake the page as someone may be waiting on completion of
the write.
The problem is that the store limit may be updated by a different
asynchronous operation - and so may miss the write - and that the store
limit may not even get updated until later by the netfs.
Fix the kernel hang by making fscache_write_op() mark as written any pages
that are over the limit.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Graf [Tue, 3 Apr 2018 22:19:35 +0000 (00:19 +0200)]
lan78xx: Connect phy early
[ Upstream commit
92571a1aae40d291158d16e7142637908220f470 ]
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:
Unable to handle kernel NULL pointer dereference at virtual address
0000039c
pgd =
ffff800035b30000
[
0000039c] *pgd=
0000000000000000
Internal error: Oops:
96000004 [#1] SMP
Modules linked in: [...]
Supported: Yes
CPU: 3 PID: 638 Comm: wickedd Tainted: G E 4.12.14-0-default #1
Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
task:
ffff800035e74180 task.stack:
ffff800036718000
PC is at phy_ethtool_ksettings_get+0x20/0x98
LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
pc : [<
ffff0000086f7f30>] lr : [<
ffff000000dcca84>] pstate:
20000005
sp :
ffff80003671bb20
x29:
ffff80003671bb20 x28:
ffff800035e74180
x27:
ffff000008912000 x26:
000000000000001d
x25:
0000000000000124 x24:
ffff000008f74d00
x23:
0000004000114809 x22:
0000000000000000
x21:
ffff80003671bbd0 x20:
0000000000000000
x19:
ffff80003671bbd0 x18:
000000000000040d
x17:
0000000000000001 x16:
0000000000000000
x15:
0000000000000000 x14:
ffffffffffffffff
x13:
0000000000000000 x12:
0000000000000020
x11:
0101010101010101 x10:
fefefefefefefeff
x9 :
7f7f7f7f7f7f7f7f x8 :
fefefeff31677364
x7 :
0000000080808080 x6 :
ffff80003671bc9c
x5 :
ffff80003671b9f8 x4 :
ffff80002c296190
x3 :
0000000000000000 x2 :
0000000000000000
x1 :
ffff80003671bbd0 x0 :
ffff80003671bc00
Process wickedd (pid: 638, stack limit = 0xffff800036718000)
Call trace:
Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
b9e0:
ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
ba00:
ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
ba20:
fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
ba40:
0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
ba60:
0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
ba80:
0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
baa0:
ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
bac0:
ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
bae0:
ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
bb00:
0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
[<
ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
[<
ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
[<
ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
[<
ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
[<
ffff0000087e5008>] dev_ioctl+0x400/0x630
[<
ffff00000879dd00>] sock_do_ioctl+0x70/0x88
[<
ffff00000879f5f8>] sock_ioctl+0x208/0x368
[<
ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
[<
ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
bec0:
0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
bee0:
0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
bf00:
000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
bf20:
0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
bf40:
0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
bf60:
0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
bf80:
0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
bfa0:
0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
bfc0:
0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
bfe0:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.
The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.
With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Christopherson [Fri, 23 Mar 2018 16:34:00 +0000 (09:34 -0700)]
KVM: VMX: raise internal error for exception during invalid protected mode state
[ Upstream commit
add5ff7a216ee545a214013f26d1ef2f44a9c9f8 ]
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
an exception in Protected Mode while emulating guest due to invalid
guest state. Unlike Big RM, KVM doesn't support emulating exceptions
in PM, i.e. PM exceptions are always injected via the VMCS. Because
we will never do VMRESUME due to emulation_required, the exception is
never realized and we'll keep emulating the faulting instruction over
and over until we receive a signal.
Exit to userspace iff there is a pending exception, i.e. don't exit
simply on a requested event. The purpose of this check and exit is to
aid in debugging a guest that is in all likelihood already doomed.
Invalid guest state in PM is extremely limited in normal operation,
e.g. it generally only occurs for a few instructions early in BIOS,
and any exception at this time is all but guaranteed to be fatal.
Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
handled/emulated, while checking for vectored interrupts, e.g. INTR
and NMI, without hitting false positives would add a fair amount of
complexity for almost no benefit (getting hit by lightning seems
more likely than encountering this specific scenario).
Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
exception via the VMCS and emulation_required is true.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sai Praneeth [Wed, 4 Apr 2018 19:34:19 +0000 (12:34 -0700)]
x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
[ Upstream commit
162ee5a8ab49be40d253f90e94aef712470a3a24 ]
Linus reported the following boot warning:
WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
[...]
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0
... after merging:
03781e40890c: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set.
Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
perform the check but this_cpu_has() works only after SMP is initialized
(i.e. per cpu cpu_info's should be populated) and this happens to be very
late in the boot process (during rest_init()).
As efi_runtime_services() are called during (early) kernel boot time
and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
time. As suggested by Dave Hansen, this should be OK because all CPU's have
same capabilities on x86.
With this change the warning is fixed.
( Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still work in progress as it affects
MCE. )
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Lee Chun-Yi <jlee@suse.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Mon, 2 Apr 2018 16:49:54 +0000 (09:49 -0700)]
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
[ Upstream commit
d29a20645d5e929aa7e8616f28e5d8e1c49263ec ]
While running rt-tests' pi_stress program I got the following splat:
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
[...]
<IRQ>
enqueue_top_rt_rq+0xf4/0x150
? cpufreq_dbs_governor_start+0x170/0x170
sched_rt_rq_enqueue+0x65/0x80
sched_rt_period_timer+0x156/0x360
? sched_rt_rq_enqueue+0x80/0x80
__hrtimer_run_queues+0xfa/0x260
hrtimer_interrupt+0xcb/0x220
smp_apic_timer_interrupt+0x62/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
[...]
do_idle+0x183/0x1e0
cpu_startup_entry+0x5f/0x70
start_secondary+0x192/0x1d0
secondary_startup_64+0xa5/0xb0
We can get rid of it be the "traditional" means of adding an
update_rq_clock() call after acquiring the rq->lock in
do_sched_rt_period_timer().
The case for the RT task throttling (which this workload also hits)
can be ignored in that the skip_update call is actually bogus and
quite the contrary (the request bits are removed/reverted).
By setting RQCF_UPDATED we really don't care if the skip is happening
or not and will therefore make the assert_clock_updated() check happy.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Piggin [Thu, 5 Apr 2018 06:10:00 +0000 (16:10 +1000)]
powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
[ Upstream commit
c1b25a17d24925b0961c319cfc3fd7e1dc778914 ]
POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
because it does not go through the subcore restore.
Have POWER9 restore it in core restore.
Fixes:
ee97b6b99f42 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jun Piao [Thu, 5 Apr 2018 23:18:48 +0000 (16:18 -0700)]
ocfs2/dlm: don't handle migrate lockres if already in shutdown
[ Upstream commit
bb34f24c7d2c98d0c81838a7700e6068325b17a0 ]
We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain. At last other nodes will get stuck into infinite loop when
requsting lock from us.
The problem is caused by concurrency umount between nodes. Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target. So N2 will continue sending lockres to N1 even though
N1 has left domain.
N1 N2 (owner)
touch file
access the file,
and get pr lock
begin leave domain and
pick up N1 as new owner
begin leave domain and
migrate all lockres done
begin migrate lockres to N1
end leave domain, but
the lockres left
unexpectedly, because
migrate task has passed
[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mikhail Malygin [Mon, 2 Apr 2018 09:26:59 +0000 (12:26 +0300)]
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
[ Upstream commit
efc365e7290d040fbd43f60b0e97653489a739d4 ]
On ppc64le arch rxe_add command causes oops in kernel log:
[ 92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[ 92.499710] SMP NR_CPUS=2048 NUMA pSeries
[ 92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[ 92.499832] scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[ 92.499834] Supported: No, Unsupported modules are loaded
[ 92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G OE NX 4.4.120-ttln.17-default #1
[ 92.499841] task:
c0000000afe8a490 ti:
c0000000beba8000 task.ti:
c0000000beba8000
[ 92.499842] NIP:
c00000000008ba3c LR:
c000000000027644 CTR:
c00000000008ba10
[ 92.499844] REGS:
c0000000bebab750 TRAP: 0300 Tainted: G OE NX (4.4.120-ttln.17-default)
[ 92.499850] MSR:
8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR:
28424428 XER:
20000000
[ 92.499871] CFAR:
0000000000002424 DAR:
0000000000000208 DSISR:
40000000 SOFTE: 1
GPR00:
c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
GPR04:
d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
GPR08:
000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
GPR12:
c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
GPR16:
0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
GPR20:
c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
GPR24:
c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
GPR28:
d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[ 92.499881] NIP [
c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[ 92.499885] LR [
c000000000027644] dma_get_required_mask+0x44/0xac
[ 92.499886] Call Trace:
[ 92.499891] [
c0000000bebab9d0] [
c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[ 92.499894] [
c0000000bebaba10] [
c000000000027644] dma_get_required_mask+0x44/0xac
[ 92.499904] [
c0000000bebaba30] [
d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[ 92.499910] [
c0000000bebabab0] [
d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[ 92.499915] [
c0000000bebabb30] [
d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[ 92.499921] [
c0000000bebabb60] [
d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[ 92.499924] [
c0000000bebabbf0] [
c0000000000e78c0] param_attr_store+0xa0/0x180
[ 92.499927] [
c0000000bebabc70] [
c0000000000e6448] module_attr_store+0x48/0x70
[ 92.499932] [
c0000000bebabc90] [
c000000000391f60] sysfs_kf_write+0x70/0xb0
[ 92.499935] [
c0000000bebabcb0] [
c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[ 92.499939] [
c0000000bebabd00] [
c0000000002e22ac] __vfs_write+0x4c/0x1d0
[ 92.499942] [
c0000000bebabd90] [
c0000000002e2f94] vfs_write+0xc4/0x200
[ 92.499945] [
c0000000bebabde0] [
c0000000002e488c] SyS_write+0x6c/0x110
[ 92.499948] [
c0000000bebabe30] [
c000000000009384] system_call+0x38/0xe4
[ 92.499949] Instruction dump:
[ 92.499954]
4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[ 92.499958]
fbc1fff0 fbe1fff8 f8010010 f821ffc1 <
e9230208>
7c7e1b78 2fa90000 419e0078
[ 92.499962] ---[ end trace
bed077e15eb420cf ]---
It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL
Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikolay Borisov [Thu, 5 Apr 2018 07:40:15 +0000 (10:40 +0300)]
btrfs: Fix possible softlock on single core machines
[ Upstream commit
1e1c50a929bc9e49bc3f9935b92450d9e69f8158 ]
do_chunk_alloc implements a loop checking whether there is a pending
chunk allocation and if so causes the caller do loop. Generally this
loop is executed only once, however testing with btrfs/072 on a single
core vm machines uncovered an extreme case where the system could loop
indefinitely. This is due to a missing cond_resched when loop which
doesn't give a chance to the previous chunk allocator finish its job.
The fix is to simply add the missing cond_resched.
Fixes:
6d74119f1a3e ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Mon, 2 Apr 2018 17:59:47 +0000 (01:59 +0800)]
Btrfs: fix NULL pointer dereference in log_dir_items
[ Upstream commit
80c0b4210a963e31529e15bf90519708ec947596 ]
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
returned, path->nodes[0] could be NULL, log_dir_items lacks such a
check for <0 and we may run into a null pointer dereference panic.
Fixes:
e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Mon, 2 Apr 2018 17:59:48 +0000 (01:59 +0800)]
Btrfs: bail out on error during replay_dir_deletes
[ Upstream commit
b98def7ca6e152ee55e36863dddf6f41f12d1dc6 ]
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
to bail out, otherwise @ret would be forced to be 0 after 'break;' and
the caller won't be aware of it.
Fixes:
e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yang Shi [Thu, 5 Apr 2018 23:22:35 +0000 (16:22 -0700)]
mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one()
[ Upstream commit
f0849ac0b8e072073ec5fcc7fadd05a77434364e ]
For PTE-mapped THP, the compound THP has not been split to normal 4K
pages yet, the whole THP is considered referenced if any one of sub page
is referenced.
When walking PTE-mapped THP by pvmw, all relevant PTEs will be checked
to retrieve referenced bit. But, the current code just returns the
result of the last PTE. If the last PTE has not referenced, the
referenced flag will be cleared.
Just set referenced when ptep{pmdp}_clear_young_notify() returns true.
Link: http://lkml.kernel.org/r/1518212451-87134-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Huang Ying [Thu, 5 Apr 2018 23:23:20 +0000 (16:23 -0700)]
mm: fix races between address_space dereference and free in page_evicatable
[ Upstream commit
e92bb4dd9673945179b1fc738c9817dd91bfb629 ]
When page_mapping() is called and the mapping is dereferenced in
page_evicatable() through shrink_active_list(), it is possible for the
inode to be truncated and the embedded address space to be freed at the
same time. This may lead to the following race.
CPU1 CPU2
truncate(inode) shrink_active_list()
... page_evictable(page)
truncate_inode_page(mapping, page);
delete_from_page_cache(page)
spin_lock_irqsave(&mapping->tree_lock, flags);
__delete_from_page_cache(page, NULL)
page_cache_tree_delete(..)
... mapping = page_mapping(page);
page->mapping = NULL;
...
spin_unlock_irqrestore(&mapping->tree_lock, flags);
page_cache_free_page(mapping, page)
put_page(page)
if (put_page_testzero(page)) -> false
- inode now has no pages and can be freed including embedded address_space
mapping_unevictable(mapping)
test_bit(AS_UNEVICTABLE, &mapping->flags);
- we've dereferenced mapping which is potentially already free.
Similar race exists between swap cache freeing and page_evicatable()
too.
The address_space in inode and swap cache will be freed after a RCU
grace period. So the races are fixed via enclosing the page_mapping()
and address_space usage in rcu_read_lock/unlock(). Some comments are
added in code to make it clear what is protected by the RCU read lock.
Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Claudio Imbrenda [Thu, 5 Apr 2018 23:25:41 +0000 (16:25 -0700)]
mm/ksm: fix interaction with THP
[ Upstream commit
77da2ba0648a4fd52e5ff97b8b2b8dd312aec4b0 ]
This patch fixes a corner case for KSM. When two pages belong or
belonged to the same transparent hugepage, and they should be merged,
KSM fails to split the page, and therefore no merging happens.
This bug can be reproduced by:
* making sure ksm is running (in case disabling ksmtuned)
* enabling transparent hugepages
* allocating a THP-aligned 1-THP-sized buffer
e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
* filling it with the same values
e.g. memset(p, 42, 1<<21)
* performing madvise to make it mergeable
e.g. madvise(p, 1<<21, MADV_MERGEABLE)
* waiting for KSM to perform a few scans
The expected outcome is that the all the pages get merged (1 shared and
the rest sharing); the actual outcome is that no pages get merged (1
unshared and the rest volatile)
The reason of this behaviour is that we increase the reference count
once for both pages we want to merge, but if they belong to the same
hugepage (or compound page), the reference counter used in both cases is
the one of the head of the compound page. This means that
split_huge_page will find a value of the reference counter too high and
will fail.
This patch solves this problem by testing if the two pages to merge
belong to the same hugepage when attempting to merge them. If so, the
hugepage is split safely. This means that the hugepage is not split if
not necessary.
Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Co-authored-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Falcon [Fri, 6 Apr 2018 23:37:03 +0000 (18:37 -0500)]
ibmvnic: Zero used TX descriptor counter on reset
[ Upstream commit
41f714672f93608751dbd2fa2291d476a8ff0150 ]
The counter that tracks used TX descriptors pending completion
needs to be zeroed as part of a device reset. This change fixes
a bug causing transmit queues to be stopped unnecessarily and in
some cases a transmit queue stall and timeout reset. If the counter
is not reset, the remaining descriptors will not be "removed",
effectively reducing queue capacity. If the queue is over half full,
it will cause the queue to stall if stopped.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Esben Haabendal [Sun, 8 Apr 2018 20:17:01 +0000 (22:17 +0200)]
dp83640: Ensure against premature access to PHY registers after reset
[ Upstream commit
76327a35caabd1a932e83d6a42b967aa08584e5d ]
The datasheet specifies a 3uS pause after performing a software
reset. The default implementation of genphy_soft_reset() does not
provide this, so implement soft_reset with the needed pause.
Signed-off-by: Esben Haabendal <eha@deif.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sandipan Das [Wed, 4 Apr 2018 18:04:18 +0000 (23:34 +0530)]
perf clang: Add support for recent clang versions
[ Upstream commit
7854e499f33fd9c7e63288692ffb754d9b1d02fd ]
The clang API calls used by perf have changed in recent releases and
builds succeed with libclang-3.9 only. This introduces compatibility
with libclang-4.0 and above.
Without this patch, we will see the following compilation errors with
libclang-4.0+:
util/c++/clang.cpp: In function ‘clang::CompilerInvocation* perf::createCompilerInvocation(llvm::opt::ArgStringList, llvm::StringRef&, clang::DiagnosticsEngine&)’:
util/c++/clang.cpp:62:33: error: ‘IK_C’ was not declared in this scope
Opts.Inputs.emplace_back(Path, IK_C);
^~~~
util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::Module> perf::getModuleFromSource(llvm::opt::ArgStringList, llvm::StringRef, llvm::IntrusiveRefCntPtr<clang::vfs::FileSystem>)’:
util/c++/clang.cpp:75:26: error: no matching function for call to ‘clang::CompilerInstance::setInvocation(clang::CompilerInvocation*)’
Clang.setInvocation(&*CI);
^
In file included from util/c++/clang.cpp:14:0:
/usr/include/clang/Frontend/CompilerInstance.h:231:8: note: candidate: void clang::CompilerInstance::setInvocation(std::shared_ptr<clang::CompilerInvocation>)
void setInvocation(std::shared_ptr<CompilerInvocation> Value);
^~~~~~~~~~~~~
Committer testing:
Tested on Fedora 27 after installing the clang-devel and llvm-devel
packages, versions:
# rpm -qa | egrep llvm\|clang
llvm-5.0.1-6.fc27.x86_64
clang-libs-5.0.1-5.fc27.x86_64
clang-5.0.1-5.fc27.x86_64
clang-tools-extra-5.0.1-5.fc27.x86_64
llvm-libs-5.0.1-6.fc27.x86_64
llvm-devel-5.0.1-6.fc27.x86_64
clang-devel-5.0.1-5.fc27.x86_64
#
Make sure you don't have some older version lying around in /usr/local,
etc, then:
$ make LIBCLANGLLVM=1 -C tools/perf install-bin
And in the end perf will be linked agains these libraries:
# ldd ~/bin/perf | egrep -i llvm\|clang
libclangAST.so.5 => /lib64/libclangAST.so.5 (0x00007f8bb2eb4000)
libclangBasic.so.5 => /lib64/libclangBasic.so.5 (0x00007f8bb29e3000)
libclangCodeGen.so.5 => /lib64/libclangCodeGen.so.5 (0x00007f8bb23f7000)
libclangDriver.so.5 => /lib64/libclangDriver.so.5 (0x00007f8bb2060000)
libclangFrontend.so.5 => /lib64/libclangFrontend.so.5 (0x00007f8bb1d06000)
libclangLex.so.5 => /lib64/libclangLex.so.5 (0x00007f8bb1a3e000)
libclangTooling.so.5 => /lib64/libclangTooling.so.5 (0x00007f8bb17d4000)
libclangEdit.so.5 => /lib64/libclangEdit.so.5 (0x00007f8bb15c5000)
libclangSema.so.5 => /lib64/libclangSema.so.5 (0x00007f8bb0cc9000)
libclangAnalysis.so.5 => /lib64/libclangAnalysis.so.5 (0x00007f8bb0a23000)
libclangParse.so.5 => /lib64/libclangParse.so.5 (0x00007f8bb0725000)
libclangSerialization.so.5 => /lib64/libclangSerialization.so.5 (0x00007f8bb039a000)
libLLVM-5.0.so => /lib64/libLLVM-5.0.so (0x00007f8bace98000)
libclangASTMatchers.so.5 => /lib64/../lib64/libclangASTMatchers.so.5 (0x00007f8bab735000)
libclangFormat.so.5 => /lib64/../lib64/libclangFormat.so.5 (0x00007f8bab4b2000)
libclangRewrite.so.5 => /lib64/../lib64/libclangRewrite.so.5 (0x00007f8bab2a1000)
libclangToolingCore.so.5 => /lib64/../lib64/libclangToolingCore.so.5 (0x00007f8bab08e000)
#
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes:
00b86691c77c ("perf clang: Add builtin clang support ant test case")
Link: http://lkml.kernel.org/r/20180404180419.19056-2-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sandipan Das [Wed, 4 Apr 2018 18:04:17 +0000 (23:34 +0530)]
perf tools: Fix perf builds with clang support
[ Upstream commit
c2fb54a183cfe77c6fdc9d71e2d5299c1c302a6e ]
For libclang, some distro packages provide static libraries (.a) while
some provide shared libraries (.so). Currently, perf code can only be
linked with static libraries. This makes perf build possible for both
cases.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes:
d58ac0bf8d1e ("perf build: Add clang and llvm compile and linking support")
Link: http://lkml.kernel.org/r/20180404180419.19056-1-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anshuman Khandual [Thu, 29 Mar 2018 06:23:37 +0000 (11:53 +0530)]
powerpc/fscr: Enable interrupts earlier before calling get_user()
[ Upstream commit
709b973c844c0b4d115ac3a227a2e5a68722c912 ]
The function get_user() can sleep while trying to fetch instruction
from user address space and causes the following warning from the
scheduler.
BUG: sleeping function called from invalid context
Though interrupts get enabled back but it happens bit later after
get_user() is called. This change moves enabling these interrupts
earlier covering the function get_user(). While at this, lets check
for kernel mode and crash as this interrupt should not have been
triggered from the kernel context.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shunyong Yang [Fri, 6 Apr 2018 02:43:49 +0000 (10:43 +0800)]
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
[ Upstream commit
8913315e9459b146e5888ab5138e10daa061b885 ]
When multiple CPUs are related in one cpufreq policy, the first online
CPU will be chosen by default to handle cpufreq operations. Let's take
cpu0 and cpu1 as an example.
When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
capabilities should be initialized. Otherwise, perf capabilities are 0s
and speed change can not take effect.
This patch copies perf capabilities of the first online CPU to other
shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Carlos Maiolino [Wed, 11 Apr 2018 05:39:04 +0000 (22:39 -0700)]
Force log to disk before reading the AGF during a fstrim
[ Upstream commit
8c81dd46ef3c416b3b95e3020fb90dbd44e6140b ]
Forcing the log to disk after reading the agf is wrong, we might be
calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
This can cause a deadlock when racing a fstrim with a filesystem
shutdown.
The deadlock has been identified due a miscalculation bug in device-mapper
dm-thin, which returns lack of space to its users earlier than the device itself
really runs out of space, changing the device-mapper volume into an error state.
The problem happened while filling the filesystem with a single file,
triggering the bug in device-mapper, consequently causing an IO error
and shutting down the filesystem.
If such file is removed, and fstrim executed before the XFS finishes the
shut down process, the fstrim process will end up holding the buffer
lock, and going to sleep on the cil wait queue.
At this point, the shut down process will try to wake up all the threads
waiting on the cil wait queue, but for this, it will try to hold the
same buffer log already held my the fstrim, locking up the filesystem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens Axboe [Wed, 11 Apr 2018 17:26:09 +0000 (11:26 -0600)]
sr: get/drop reference to device in revalidate and check_events
[ Upstream commit
2d097c50212e137e7b53ffe3b37561153eeba87d ]
We can't just use scsi_cd() to get the scsi_cd structure, we have
to grab a live reference to the device. For both callbacks, we're
not inside an open where we already hold a reference to the device.
This fixes device removal/addition under concurrent device access,
which otherwise could result in the below oops.
NULL pointer dereference at
0000000000000010
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
sr 12:0:0:0: [sr2] scsi-1 drive
scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
sr 12:0:0:0: Attached scsi CD-ROM sr2
sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
sr 12:0:0:0: Attached scsi generic sg7 type 5
igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
RSP: 0018:
ffff883ff357bb58 EFLAGS:
00010292
RAX:
ffffffffa00b07d0 RBX:
ffff883ff3058000 RCX:
ffff883ff357bb66
RDX:
0000000000000003 RSI:
0000000000007530 RDI:
ffff881fea631000
RBP:
0000000000000000 R08:
ffff881fe4d38400 R09:
0000000000000000
R10:
0000000000000000 R11:
00000000000001b6 R12:
000000000800005d
R13:
000000000800005d R14:
ffff883ffd9b3790 R15:
0000000000000000
FS:
00007f7dc8e6d8c0(0000) GS:
ffff883fff340000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000010 CR3:
0000003ffda98005 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
? __invalidate_device+0x48/0x60
check_disk_change+0x4c/0x60
sr_block_open+0x16/0xd0 [sr_mod]
__blkdev_get+0xb9/0x450
? iget5_locked+0x1c0/0x1e0
blkdev_get+0x11e/0x320
? bdget+0x11d/0x150
? _raw_spin_unlock+0xa/0x20
? bd_acquire+0xc0/0xc0
do_dentry_open+0x1b0/0x320
? inode_permission+0x24/0xc0
path_openat+0x4e6/0x1420
? cpumask_any_but+0x1f/0x40
? flush_tlb_mm_range+0xa0/0x120
do_filp_open+0x8c/0xf0
? __seccomp_filter+0x28/0x230
? _raw_spin_unlock+0xa/0x20
? __handle_mm_fault+0x7d6/0x9b0
? list_lru_add+0xa8/0xc0
? _raw_spin_unlock+0xa/0x20
? __alloc_fd+0xaf/0x160
? do_sys_open+0x1a6/0x230
do_sys_open+0x1a6/0x230
do_syscall_64+0x5a/0x100
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xidong Wang [Tue, 10 Apr 2018 23:29:34 +0000 (16:29 -0700)]
z3fold: fix memory leak
[ Upstream commit
1ec6995d1290bfb87cc3a51f0836c889e857cef9 ]
In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not
released on the error path that pool->compact_wq , which holds the
return value of create_singlethread_workqueue(), is NULL. This will
result in a memory leak bug.
[akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
Signed-off-by: Xidong Wang <wangxidong_97@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tom Abraham [Tue, 10 Apr 2018 23:29:48 +0000 (16:29 -0700)]
swap: divide-by-zero when zero length swap file on ssd
[ Upstream commit
a06ad633a37c64a0cd4c229fc605cee8725d376e ]
Calling swapon() on a zero length swap file on SSD can lead to a
divide-by-zero.
Although creating such files isn't possible with mkswap and they woud be
considered invalid, it would be better for the swapon code to be more
robust and handle this condition gracefully (return -EINVAL).
Especially since the fix is small and straightforward.
To help with wear leveling on SSD, the swapon syscall calculates a
random position in the swap file using modulo p->highest_bit, which is
set to maxpages - 1 in read_swap_header.
If the swap file is zero length, read_swap_header sets maxpages=1 and
last_page=0, resulting in p->highest_bit=0 and we divide-by-zero when we
modulo p->highest_bit in swapon syscall.
This can be prevented by having read_swap_header return zero if
last_page is zero.
Link: http://lkml.kernel.org/r/5AC747C1020000A7001FA82C@prv-mh.provo.novell.com
Signed-off-by: Thomas Abraham <tabraham@suse.com>
Reported-by: <Mark.Landis@Teradata.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Danilo Krummrich [Tue, 10 Apr 2018 23:31:38 +0000 (16:31 -0700)]
fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
[ Upstream commit
a0b0d1c345d0317efe594df268feb5ccc99f651e ]
proc_sys_link_fill_cache() does not take currently unregistering sysctl
tables into account, which might result into a page fault in
sysctl_follow_link() - add a check to fix it.
This bug has been present since v3.4.
Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
Fixes:
0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave Hansen [Fri, 6 Apr 2018 20:55:14 +0000 (13:55 -0700)]
x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
[ Upstream commit
639d6aafe437a7464399d2a77d006049053df06f ]
__ro_after_init data gets stuck in the .rodata section. That's normally
fine because the kernel itself manages the R/W properties.
But, if we run __change_page_attr() on an area which is __ro_after_init,
the .rodata checks will trigger and force the area to be immediately
read-only, even if it is early-ish in boot. This caused problems when
trying to clear the _PAGE_GLOBAL bit for these area in the PTI code:
it cleared _PAGE_GLOBAL like I asked, but also took it up on itself
to clear _PAGE_RW. The kernel then oopses the next time it wrote to
a __ro_after_init data structure.
To fix this, add the kernel_set_to_readonly check, just like we have
for kernel text, just a few lines below in this function.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joerg Roedel [Wed, 11 Apr 2018 15:24:38 +0000 (17:24 +0200)]
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
[ Upstream commit
e3e288121408c3abeed5af60b87b95c847143845 ]
The pmd_set_huge() and pud_set_huge() functions are used from
the generic ioremap() code to establish large mappings where this
is possible.
But the generic ioremap() code does not check whether the
PMD/PUD entries are already populated with a non-leaf entry,
so that any page-table pages these entries point to will be
lost.
Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
BUG_ON() in vmalloc_sync_one() when PMD entries are synced
from swapper_pg_dir to the current page-table. This happens
because the PMD entry from swapper_pg_dir was promoted to a
huge-page entry while the current PGD still contains the
non-leaf entry. Because both entries are present and point
to a different page, the BUG_ON() triggers.
This was actually triggered with pti-x32 enabled in a KVM
virtual machine by the graphics driver.
A real and better fix for that would be to improve the
page-table handling in the generic ioremap() code. But that is
out-of-scope for this patch-set and left for later work.
Reported-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <llong@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Thu, 5 Apr 2018 21:55:12 +0000 (22:55 +0100)]
Btrfs: fix loss of prealloc extents past i_size after fsync log replay
[ Upstream commit
471d557afed155b85da237ec46c549f443eeb5de ]
Currently if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents but
after a power failure we replay them and then immediately drop them.
This behaviour happens since about 2009, commit
c71bf099abdd ("Btrfs:
Avoid orphan inodes cleanup while replaying log"), because it marks
the inode as an orphan instead of dropping any extents beyond i_size
before replaying logged extents, so after the log replay, and while
the mount operation is still ongoing, we find the inode marked as an
orphan and then perform a truncation (drop extents beyond the inode's
i_size). Because the processing of orphan inodes is still done
right after replaying the log and before the mount operation finishes,
the intention of that commit does not make any sense (at least as
of today). However reverting that behaviour is not enough, because
we can not simply discard all extents beyond i_size and then replay
logged extents, because we risk dropping extents beyond i_size created
in past transactions, for example:
add prealloc extent beyond i_size
fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
transaction commit
add another prealloc extent beyond i_size
fsync - triggers the fast fsync path
power failure
In that scenario, we would drop the first extent and then replay the
second one. To fix this just make sure that all prealloc extents
beyond i_size are logged, and if we find too many (which is far from
a common case), fallback to a full transaction commit (like we do when
logging regular extents in the fast fsync path).
Trivial reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
$ sync
$ xfs_io -c "falloc -k 256K 1M" /mnt/foo
$ xfs_io -c "fsync" /mnt/foo
<power failure>
# mount to replay log
$ mount /dev/sdb /mnt
# at this point the file only has one extent, at offset 0, size 256K
A test case for fstests follows soon, covering multiple scenarios that
involve adding prealloc extents with previous shrinking truncates and
without such truncates.
Fixes:
c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Fri, 30 Mar 2018 22:11:56 +0000 (06:11 +0800)]
Btrfs: clean up resources during umount after trans is aborted
[ Upstream commit
af7227338135d2f1b1552bf9a6d43e02dcba10b9 ]
Currently if some fatal errors occur, like all IO get -EIO, resources
would be cleaned up when
a) transaction is being committed or
b) BTRFS_FS_STATE_ERROR is set
However, in some rare cases, resources may be left alone after transaction
gets aborted and umount may run into some ASSERT(), e.g.
ASSERT(list_empty(&block_group->dirty_list));
For case a), in btrfs_commit_transaciton(), there're several places at the
beginning where we just call btrfs_end_transaction() without cleaning up
resources. For case b), it is possible that the trans handle doesn't have
any dirty stuff, then only trans hanlde is marked as aborted while
BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
all resources won't stay in memory after umount.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Thumshirn [Thu, 12 Apr 2018 15:16:06 +0000 (09:16 -0600)]
nvme: don't send keep-alives to the discovery controller
[ Upstream commit
74c6c71530847808d4e3be7b205719270efee80c ]
NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
Command Support" Figure 31 "Discovery Controller – Admin Commands"
explicitly listst all commands but "Get Log Page" and "Identify" as
reserved, but NetApp report the Linux host is sending Keep Alive
commands to the discovery controller, which is a violation of the
Spec.
We're already checking for discovery controllers when configuring the
keep alive timeout but when creating a discovery controller we're not
hard wiring the keep alive timeout to 0 and thus remain on
NVME_DEFAULT_KATO for the discovery controller.
This can be easily remproduced when issuing a direct connect to the
discovery susbsystem using:
'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes:
07bfcd09a288 ("nvme-fabrics: add a generic NVMe over Fabrics library")
Reported-by: Martin George <marting@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jean Delvare [Fri, 13 Apr 2018 13:37:59 +0000 (15:37 +0200)]
firmware: dmi_scan: Fix UUID length safety check
[ Upstream commit
90fe6f8ff00a07641ca893d64f75ca22ce77cca2 ]
The test which ensures that the DMI type 1 structure is long enough
to hold the UUID is off by one. It would fail if the structure is
exactly 24 bytes long, while that's sufficient to hold the UUID.
I don't expect this bug to cause problem in practice because all
implementations I have seen had length 8, 25 or 27 bytes, in line
with the SMBIOS specifications. But let's fix it still.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
a814c3597a6b ("firmware: dmi_scan: Check DMI structure length")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rich Felker [Fri, 16 Mar 2018 00:01:36 +0000 (20:01 -0400)]
sh: fix debug trap failure to process signals before return to user
[ Upstream commit
96a598996f6ac518ac79839ecbb17c91af91f4f7 ]
When responding to a debug trap (breakpoint) in userspace, the
kernel's trap handler raised SIGTRAP but returned from the trap via a
code path that ignored pending signals, resulting in an infinite loop
re-executing the trapping instruction.
Signed-off-by: Rich Felker <dalias@libc.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yelena Krivosheev [Fri, 30 Mar 2018 10:05:31 +0000 (12:05 +0200)]
net: mvneta: fix enable of all initialized RXQs
[ Upstream commit
e81b5e01c14add8395dfba7130f8829206bb507d ]
In mvneta_port_up() we enable relevant RX and TX port queues by write
queues bit map to an appropriate register.
q_map must be ZERO in the beginning of this process.
Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Toshiaki Makita [Thu, 29 Mar 2018 10:05:30 +0000 (19:05 +0900)]
vlan: Fix vlan insertion for packets without ethernet header
[ Upstream commit
c769accdf3d8a103940bea2979b65556718567e9 ]
In some situation vlan packets do not have ethernet headers. One example
is packets from tun devices. Users can specify vlan protocol in tun_pi
field instead of IP protocol. When we have a vlan device with reorder_hdr
disabled on top of the tun device, such packets from tun devices are
untagged in skb_vlan_untag() and vlan headers will be inserted back in
vlan_insert_inner_tag().
vlan_insert_inner_tag() however did not expect packets without ethernet
headers, so in such a case size argument for memmove() underflowed.
We don't need to copy headers for packets which do not have preceding
headers of vlan headers, so skip memmove() in that case.
Also don't write vlan protocol in skb->data when it does not have enough
room for it.
Fixes:
cbe7128c4b92 ("vlan: Fix out of order vlan headers with reorder header off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Toshiaki Makita [Thu, 29 Mar 2018 10:05:29 +0000 (19:05 +0900)]
net: Fix untag for vlan packets without ethernet header
[ Upstream commit
ae4745730cf8e693d354ccd4dbaf59ea440c09a9 ]
In some situation vlan packets do not have ethernet headers. One example
is packets from tun devices. Users can specify vlan protocol in tun_pi
field instead of IP protocol, and skb_vlan_untag() attempts to untag such
packets.
skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
however did not expect packets without ethernet headers, so in such a case
size argument for memmove() underflowed and triggered crash.
====
BUG: unable to handle kernel paging request at
ffff8801cccb8000
IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
PGD 9cee067 P4D 9cee067 PUD
1d9401063 PMD
1cccb7063 PTE
2810100028101
Oops: 000b [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
RSP: 0018:
ffff8801cc046e28 EFLAGS:
00010287
RAX:
ffff8801ccc244c4 RBX:
fffffffffffffffe RCX:
fffffffffff6c4c2
RDX:
fffffffffffffffe RSI:
ffff8801cccb7ffc RDI:
ffff8801cccb8000
RBP:
ffff8801cc046e48 R08:
ffff8801ccc244be R09:
ffffed0039984899
R10:
0000000000000001 R11:
ffffed0039984898 R12:
ffff8801ccc244c4
R13:
ffff8801ccc244c0 R14:
ffff8801d96b7c06 R15:
ffff8801d96b7b40
FS:
00007febd562d700(0000) GS:
ffff8801db300000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffff8801cccb8000 CR3:
00000001ccb2f006 CR4:
00000000001606e0
DR0:
0000000020000000 DR1:
0000000020000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000600
Call Trace:
memmove include/linux/string.h:360 [inline]
skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
__netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
netif_receive_skb+0xae/0x390 net/core/dev.c:4725
tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:482
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x454879
RSP: 002b:
00007febd562cc68 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
00007febd562d6d4 RCX:
0000000000454879
RDX:
0000000000000157 RSI:
0000000020000180 RDI:
0000000000000014
RBP:
000000000072bea0 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
00000000000006b0 R14:
00000000006fc120 R15:
0000000000000000
Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP:
ffff8801cc046e28
CR2:
ffff8801cccb8000
====
We don't need to copy headers for packets which do not have preceding
headers of vlan headers, so skip memmove() in that case.
Fixes:
4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manish Chopra [Wed, 28 Mar 2018 10:35:52 +0000 (03:35 -0700)]
qede: Do not drop rx-checksum invalidated packets.
[ Upstream commit
58f101bf87e32753342a6924772c6ebb0fbde24a ]
Today, driver drops received packets which are indicated as
invalid checksum by the device. Instead of dropping such packets,
pass them to the stack with CHECKSUM_NONE indication in skb.
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Hemminger [Tue, 27 Mar 2018 18:28:48 +0000 (11:28 -0700)]
hv_netvsc: enable multicast if necessary
[ Upstream commit
f03dbb06dc380274e351ca4b1ee1587ed4529e62 ]
My recent change to netvsc drive in how receive flags are handled
broke multicast. The Hyper-v/Azure virtual interface there is not a
multicast filter list, filtering is only all or none. The driver must
enable all multicast if any multicast address is present.
Fixes:
009f766ca238 ("hv_netvsc: filter multicast/broadcast")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vinayak Menon [Wed, 28 Mar 2018 23:01:16 +0000 (16:01 -0700)]
mm/kmemleak.c: wait for scan completion before disabling free
[ Upstream commit
914b6dfff790544d9b77dfd1723adb3745ec9700 ]
A crash is observed when kmemleak_scan accesses the object->pointer,
likely due to the following race.
TASK A TASK B TASK C
kmemleak_write
(with "scan" and
NOT "scan=on")
kmemleak_scan()
create_object
kmem_cache_alloc fails
kmemleak_disable
kmemleak_do_cleanup
kmemleak_free_enabled = 0
kfree
kmemleak_free bails out
(kmemleak_free_enabled is 0)
slub frees object->pointer
update_checksum
crash - object->pointer
freed (DEBUG_PAGEALLOC)
kmemleak_do_cleanup waits for the scan thread to complete, but not for
direct call to kmemleak_scan via kmemleak_write. So add a wait for
kmemleak_scan completion before disabling kmemleak_free, and while at it
fix the comment on stop_scan_thread.
[vinmenon@codeaurora.org: fix stop_scan_thread comment]
Link: http://lkml.kernel.org/r/1522219972-22809-1-git-send-email-vinmenon@codeaurora.org
Link: http://lkml.kernel.org/r/1522063429-18992-1-git-send-email-vinmenon@codeaurora.org
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven J. Hill [Wed, 28 Mar 2018 23:01:09 +0000 (16:01 -0700)]
mm/vmstat.c: fix vmstat_update() preemption BUG
[ Upstream commit
c7f26ccfb2c31eb1bf810ba13d044fcf583232db ]
Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can
cause vmstat_update() to report a BUG due to preemption not being
disabled around smp_processor_id().
Discovered on Ubiquiti EdgeRouter Pro with Cavium Octeon II processor.
BUG: using smp_processor_id() in preemptible [
00000000] code:
kworker/1:1/269
caller is vmstat_update+0x50/0xa0
CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
4.16.0-rc4-Cavium-Octeon-00009-gf83bbd5-dirty #1
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
show_stack+0x94/0x128
dump_stack+0xa4/0xe0
check_preemption_disabled+0x118/0x120
vmstat_update+0x50/0xa0
process_one_work+0x144/0x348
worker_thread+0x150/0x4b8
kthread+0x110/0x140
ret_from_kernel_thread+0x14/0x1c
Link: http://lkml.kernel.org/r/1520881552-25659-1-git-send-email-steven.hill@cavium.com
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maninder Singh [Wed, 28 Mar 2018 23:01:05 +0000 (16:01 -0700)]
mm/page_owner: fix recursion bug after changing skip entries
[ Upstream commit
299815a4fba9f3c7a81434dba0072148f1690608 ]
This patch fixes commit
5f48f0bd4e36 ("mm, page_owner: skip unnecessary
stack_trace entries").
Because if we skip first two entries then logic of checking count value
as 2 for recursion is broken and code will go in one depth recursion.
so we need to check only one call of _RET_IP(__set_page_owner) while
checking for recursion.
Current Backtrace while checking for recursion:-
(save_stack) from (__set_page_owner) // (But recursion returns true here)
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack) // recursion should return true here
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask+)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack)
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
Correct Backtrace with fix:
(save_stack) from (__set_page_owner) // recursion returned true here
(__set_page_owner) from (get_page_from_freelist)
(get_page_from_freelist) from (__alloc_pages_nodemask+)
(__alloc_pages_nodemask) from (depot_save_stack)
(depot_save_stack) from (save_stack)
(save_stack) from (__set_page_owner)
(__set_page_owner) from (get_page_from_freelist)
Link: http://lkml.kernel.org/r/1521607043-34670-1-git-send-email-maninder1.s@samsung.com
Fixes:
5f48f0bd4e36 ("mm, page_owner: skip unnecessary stack_trace entries")
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@techadventures.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ayush Mittal <ayush.m@samsung.com>
Cc: Prakash Gupta <guptap@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: <pankaj.m@samsung.com>
Cc: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shakeel Butt [Wed, 28 Mar 2018 23:00:57 +0000 (16:00 -0700)]
mm, slab: memcg_link the SLAB's kmem_cache
[ Upstream commit
880cd276dff17ea29e9a8404275c9502b265afa7 ]
All the root caches are linked into slab_root_caches which was
introduced by the commit
510ded33e075 ("slab: implement slab_root_caches
list") but it missed to add the SLAB's kmem_cache.
While experimenting with opt-in/opt-out kmem accounting, I noticed
system crashes due to NULL dereference inside cache_from_memcg_idx()
while deferencing kmem_cache.memcg_params.memcg_caches. The upstream
clean kernel will not see these crashes but SLAB should be consistent
with SLUB which does linked its boot caches (kmem_cache_node and
kmem_cache) into slab_root_caches.
Link: http://lkml.kernel.org/r/20180319210020.60289-1-shakeelb@google.com
Fixes:
510ded33e075c ("slab: implement slab_root_caches list")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manish Chopra [Tue, 27 Mar 2018 13:34:41 +0000 (06:34 -0700)]
qede: Fix barrier usage after tx doorbell write.
[ Upstream commit
b9fc828debc8ac2bb21b5819a44d2aea456f1c95 ]
Since commit
c5ad119fb6c09b0297446be05bd66602fa564758
("net: sched: pfifo_fast use skb_array") driver is exposed
to an issue where it is hitting NULL skbs while handling TX
completions. Driver uses mmiowb() to flush the writes to the
doorbell bar which is a write-combined bar, however on x86
mmiowb() does not flush the write combined buffer.
This patch fixes this problem by replacing mmiowb() with wmb()
after the write combined doorbell write so that writes are
flushed and synchronized from more than one processor.
V1->V2:
-------
This patch was marked as "superseded" in patchwork.
(Not really sure for what reason).Resending it as v2.
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Kiszka [Wed, 21 Mar 2018 05:15:28 +0000 (13:15 +0800)]
builddeb: Fix header package regarding dtc source links
[ Upstream commit
f8437520704cfd9cc442a99d73ed708a3cdadaf9 ]
Since
d5d332d3f7e8, a couple of links in scripts/dtc/include-prefixes
are additionally required in order to build device trees with the header
package.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cong Wang [Mon, 26 Mar 2018 22:08:33 +0000 (15:08 -0700)]
llc: properly handle dev_queue_xmit() return value
[ Upstream commit
b85ab56c3f81c5a24b5a5213374f549df06430da ]
llc_conn_send_pdu() pushes the skb into write queue and
calls llc_conn_send_pdus() to flush them out. However, the
status of dev_queue_xmit() is not returned to caller,
in this case, llc_conn_state_process().
llc_conn_state_process() needs hold the skb no matter
success or failure, because it still uses it after that,
therefore we should hold skb before dev_queue_xmit() when
that skb is the one being processed by llc_conn_state_process().
For other callers, they can just pass NULL and ignore
the return value as they are.
Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Dobriyan [Sun, 14 Jan 2018 12:05:04 +0000 (15:05 +0300)]
x86/alternatives: Fixup alternative_call_2
[ Upstream commit
bd6271039ee6f0c9b468148fc2d73e0584af6b4f ]
The following pattern fails to compile while the same pattern
with alternative_call() does:
if (...)
alternative_call_2(...);
else
alternative_call_2(...);
as it expands into
if (...)
{
}; <===
else
{
};
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephane Eranian [Fri, 23 Mar 2018 07:01:47 +0000 (00:01 -0700)]
perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
[ Upstream commit
71eb9ee9596d8df3d5723c3cfc18774c6235e8b1 ]
this patch fix a bug in how the pebs->real_ip is handled in the PEBS
handler. real_ip only exists in Haswell and later processor. It is
actually the eventing IP, i.e., where the event occurred. As opposed
to the pebs->ip which is the PEBS interrupt IP which is always off
by one.
The problem is that the real_ip just like the IP needs to be fixed up
because PEBS does not record all the machine state registers, and
in particular the code segement (cs). This is why we have the set_linear_ip()
function. The problem was that set_linear_ip() was only used on the pebs->ip
and not the pebs->real_ip.
We have profiles which ran into invalid callstacks because of this.
Here is an example:
..... 0:
ffffffffffffff80 recent entry, marker kernel v
..... 1:
000000000040044d <= user address in kernel space!
..... 2:
fffffffffffffe00 marker enter user v
..... 3:
000000000040044d
..... 4:
00000000004004b6 oldest entry
Debugging output in get_perf_callchain():
[ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0
The problem is that the kernel entry in 1: points to a user level
address. How can that be?
The reason is that with PEBS sampling the instruction that caused the event
to occur and the instruction where the CPU was when the interrupt was posted
may be far apart. And sometime during that time window, the privilege level may
change. This happens, for instance, when the PEBS sample is taken close to a
kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level
instruction. But by the time the PMU interrupt fired, the processor had already
entered kernel space. This is why the debug output shows a user address with
user_mode() false.
The problem comes from PEBS not recording the code segment (cs) register.
The register is used in x86_64 to determine if executing in kernel vs user
space. This is okay because the kernel has a software workaround called
set_linear_ip(). But the issue in setup_pebs_sample_data() is that
set_linear_ip() is never called on the real_ip value when it is available
(Haswell and later) and precise_ip > 1.
This patch fixes this problem and eliminates the callchain discrepancy.
The patch restructures the code around set_linear_ip() to minimize the number
of times the IP has to be set.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Or Gerlitz [Thu, 15 Feb 2018 10:39:55 +0000 (12:39 +0200)]
net/mlx5: Make eswitch support to depend on switchdev
[ Upstream commit
f125376b06bcc57dfb0216ac8d6ec6d5dcf81025 ]
Add dependancy for switchdev to be congfigured as any user-space control
plane SW is expected to use the HW switchdev ID to locate the representors
related to VFs of a certain PF and apply SW/offloaded switching on them.
Fixes:
e80541ecabd5 ('net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Wang [Mon, 26 Mar 2018 10:07:10 +0000 (18:07 +0800)]
net: dsa: mt7530: fix module autoloading for OF platform drivers
[ Upstream commit
3c82b372a9f44aa224b8d5106ff6f1ad516fa8a8 ]
It's required to create a modules.alias via MODULE_DEVICE_TABLE helper
for the OF platform driver. Otherwise, module autoloading cannot work.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Sun, 25 Mar 2018 17:16:45 +0000 (01:16 +0800)]
bonding: fix the err path for dev hwaddr sync in bond_enslave
[ Upstream commit
5c78f6bfae2b10ff70e21d343e64584ea6280c26 ]
vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.
Fixes:
1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pawel Dembicki [Sat, 24 Mar 2018 21:08:14 +0000 (22:08 +0100)]
net: qmi_wwan: add BroadMobi BM806U 2020:2033
[ Upstream commit
743989254ea9f132517806d8893ca9b6cf9dc86b ]
BroadMobi BM806U is an Qualcomm MDM9225 based 3G/4G modem.
Tested hardware BM806U is mounted on D-Link DWR-921-C3 router.
The USB id is added to qmi_wwan.c to allow QMI communication with
the BM806U.
Tested on 4.14 kernel and OpenWRT.
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Raghuram Chary J [Fri, 23 Mar 2018 10:18:08 +0000 (15:48 +0530)]
lan78xx: Set ASD in MAC_CR when EEE is enabled.
[ Upstream commit
e69647a19c870c2f919e4d5023af8a515e8ef25f ]
Description:
EEE does not work with lan7800 when AutoSpeed is not set.
(This can happen when EEPROM is not populated or configured incorrectly)
Root-Cause:
When EEE is enabled, the mac config register ASD is not set
i.e. in default state, causing EEE fail.
Fix:
Set the register when eeprom is not present.
Fixes:
55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jinbum Park [Tue, 6 Mar 2018 00:37:21 +0000 (01:37 +0100)]
ARM: 8748/1: mm: Define vdso_start, vdso_end as array
[ Upstream commit
73b9160d0dfe44dfdaffd6465dc1224c38a4a73c ]
Define vdso_start, vdso_end as array to avoid compile-time analysis error
for the case of built with CONFIG_FORTIFY_SOURCE.
and, since vdso_start, vdso_end are used in vdso.c only,
move extern-declaration from vdso.h to vdso.c.
If kernel is built with CONFIG_FORTIFY_SOURCE,
compile-time error happens at this code.
- if (memcmp(&vdso_start, "177ELF", 4))
The size of "&vdso_start" is recognized as 1 byte, but n is 4,
So that compile-time error is reported.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jinbum Park <jinb.park7@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Lüssing [Wed, 21 Mar 2018 23:21:32 +0000 (00:21 +0100)]
batman-adv: fix packet loss for broadcasted DHCP packets to a server
[ Upstream commit
a752c0a4524889cdc0765925258fd1fd72344100 ]
DHCP connectivity issues can currently occur if the following conditions
are met:
1) A DHCP packet from a client to a server
2) This packet has a multicast destination
3) This destination has a matching entry in the translation table
(FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
for IPv6)
4) The orig-node determined by TT for the multicast destination
does not match the orig-node determined by best-gateway-selection
In this case the DHCP packet will be dropped.
The "gateway-out-of-range" check is supposed to only be applied to
unicasted DHCP packets to a specific DHCP server.
In that case dropping the the unicasted frame forces the client to
retry via a broadcasted one, but now directed to the new best
gateway.
A DHCP packet with broadcast/multicast destination is already ensured to
always be delivered to the best gateway. Dropping a multicasted
DHCP packet here will only prevent completing DHCP as there is no
other fallback.
So far, it seems the unicast check was implicitly performed by
expecting the batadv_transtable_search() to return NULL for multicast
destinations. However, a multicast address could have always ended up in
the translation table and in fact is now common.
To fix this potential loss of a DHCP client-to-server packet to a
multicast address this patch adds an explicit multicast destination
check to reliably bail out of the gateway-out-of-range check for such
destinations.
The issue and fix were tested in the following three node setup:
- Line topology, A-B-C
- A: gateway client, DHCP client
- B: gateway server, hop-penalty increased: 30->60, DHCP server
- C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF
Without this patch, A would never transmit its DHCP Discover packet
due to an always "out-of-range" condition. With this patch,
a full DHCP handshake between A and B was possible again.
Fixes:
be7af5cf9cae ("batman-adv: refactoring gateway handling code")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Lüssing [Tue, 20 Mar 2018 02:13:27 +0000 (03:13 +0100)]
batman-adv: fix multicast-via-unicast transmission with AP isolation
[ Upstream commit
f8fb3419ead44f9a3136995acd24e35da4525177 ]
For multicast frames AP isolation is only supposed to be checked on
the receiving nodes and never on the originating one.
Furthermore, the isolation or wifi flag bits should only be intepreted
as such for unicast and never multicast TT entries.
By injecting flags to the multicast TT entry claimed by a single
target node it was verified in tests that this multicast address
becomes unreachable, leading to packet loss.
Omitting the "src" parameter to the batadv_transtable_search() call
successfully skipped the AP isolation check and made the target
reachable again.
Fixes:
1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felix Kuehling [Fri, 23 Mar 2018 19:30:33 +0000 (15:30 -0400)]
drm/amdkfd: Fix scratch memory with HWS enabled
[ Upstream commit
c70a36268799cf2f902b5a31e452571fcb96bfe9 ]
Program sh_hidden_private_base_vmid correctly in the map-process
PM4 packet.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masami Hiramatsu [Sat, 17 Mar 2018 12:40:31 +0000 (21:40 +0900)]
selftests: ftrace: Add a testcase for probepoint
[ Upstream commit
dfa453bc90eca0febff33c8d292a656e53702158 ]
Add a testcase for probe point definition. This tests
symbol, address and symbol+offset syntax. The offset
must be positive and smaller than UINT_MAX.
Link: http://lkml.kernel.org/r/152129043097.31874.14273580606301767394.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masami Hiramatsu [Sat, 17 Mar 2018 12:39:44 +0000 (21:39 +0900)]
selftests: ftrace: Add a testcase for string type with kprobe_event
[ Upstream commit
5fbdbed797b6d12d043a5121fdbc8d8b49d10e80 ]
Add a testcase for string type with kprobe event.
This tests good/bad syntax combinations and also
the traced data is correct in several way.
Link: http://lkml.kernel.org/r/152129038381.31874.9201387794548737554.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masami Hiramatsu [Sat, 17 Mar 2018 12:38:56 +0000 (21:38 +0900)]
selftests: ftrace: Add probe event argument syntax testcase
[ Upstream commit
871bef2000968c312a4000b2f56d370dcedbc93c ]
Add a testcase for probe event argument syntax which
ensures the kprobe_events interface correctly parses
given event arguments.
Link: http://lkml.kernel.org/r/152129033679.31874.12705519603869152799.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steffen Klassert [Mon, 19 Mar 2018 06:15:39 +0000 (07:15 +0100)]
xfrm: Fix transport mode skb control buffer usage.
[ Upstream commit
9a3fb9fb84cc30577c1b012a6a3efda944684291 ]
A recent commit introduced a new struct xfrm_trans_cb
that is used with the sk_buff control buffer. Unfortunately
it placed the structure in front of the control buffer and
overlooked that the IPv4/IPv6 control buffer is still needed
for some layer 4 protocols. As a result the IPv4/IPv6 control
buffer is overwritten with this structure. Fix this by setting
a apropriate header in front of the structure.
Fixes
acf568ee859f ("xfrm: Reinject transport-mode packets ...")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Rientjes [Thu, 22 Mar 2018 23:17:45 +0000 (16:17 -0700)]
mm, thp: do not cause memcg oom for thp
[ Upstream commit
9d3c3354bb85bab4d865fe95039443f09a4c8394 ]
Commit
2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
madvised allocations") changed the page allocator to no longer detect
thp allocations based on __GFP_NORETRY.
It did not, however, modify the mem cgroup try_charge() path to avoid
oom kill for either khugepaged collapsing or thp faulting. It is never
expected to oom kill a process to allocate a hugepage for thp; reclaim
is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
(and charging) should fallback instead of oom killing processes.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
Fixes:
2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yisheng Xie [Thu, 22 Mar 2018 23:17:02 +0000 (16:17 -0700)]
mm/mempolicy.c: avoid use uninitialized preferred_node
[ Upstream commit
8970a63e965b43288c4f5f40efbc2bbf80de7f16 ]
Alexander reported a use of uninitialized memory in __mpol_equal(),
which is caused by incorrect use of preferred_node.
When mempolicy in mode MPOL_PREFERRED with flags MPOL_F_LOCAL, it uses
numa_node_id() instead of preferred_node, however, __mpol_equal() uses
preferred_node without checking whether it is MPOL_F_LOCAL or not.
[akpm@linux-foundation.org: slight comment tweak]
Link: http://lkml.kernel.org/r/4ebee1c2-57f6-bcb8-0e2d-1833d1ee0bb7@huawei.com
Fixes:
fc36b8d3d819 ("mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy")
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Y.C. Chen [Mon, 12 Mar 2018 03:40:23 +0000 (11:40 +0800)]
drm/ast: Fixed 1280x800 Display Issue
[ Upstream commit
5a9f698feb11b198f17b2acebbfe0e2716a3beed ]
The original ast driver cannot display properly if the resolution is 1280x800 and the pixel clock is 83.5MHz.
Here is the update to fix it.
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Wed, 21 Mar 2018 00:31:10 +0000 (17:31 -0700)]
net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
[ Upstream commit
40013ff20b1beed31184935fc0aea6a859d4d4ef ]
We have a functional dependency on the FIXED_PHY MDIO bus because we register
fixed PHY devices "the old way" which only works if the code that does this has
had a chance to run before the fixed MDIO bus is probed. Make sure we account
for that and have dsa_loop_bdinfo.o be either built-in or modular depending on
whether CONFIG_FIXED_PHY reflects that too.
Fixes:
98cd1552ea27 ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:28 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_skbmod_init()
[ Upstream commit
f29cdfbe33d6915ba8056179b0041279a67e3647 ]
tcf_skbmod_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure skbmod rules
using the same idr value will systematically fail with -ENOSPC, unless
the first attempt was done using the 'replace' keyword:
# tc action add action skbmod swap mac index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action skbmod swap mac index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action skbmod swap mac index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
on the error path when the idr has been reserved, but not yet inserted.
Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
implicitly become a 'delete' that leaks refcount in act_skbmod module:
# rmmod act_skbmod; modprobe act_skbmod
# tc action add action skbmod swap mac index 100
# tc action add action skbmod swap mac continue index 100
RTNETLINK answers: File exists
We have an error talking to the kernel
# tc action replace action skbmod swap mac continue index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action list action skbmod
#
# rmmod act_skbmod
rmmod: ERROR: Module act_skbmod is in use
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:26 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of __tcf_ipt_init()
[ Upstream commit
1e46ef1762bb2e52f0f996131a4d16ed4e9fd065 ]
__tcf_ipt_init() can fail after the idr has been successfully reserved.
When this happens, subsequent attempts to configure xt/ipt rules using
the same idr value systematically fail with -ENOSPC:
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
Command "(null)" is unknown, try "tc actions help".
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
Command "(null)" is unknown, try "tc actions help".
# tc action add action xt -j LOG --log-prefix test1 index 100
tablename: mangle hook: NF_IP_POST_ROUTING
target: LOG level warning prefix "test1" index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
to avoid NULL pointer dereference.
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:25 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcp_pedit_init()
[ Upstream commit
94fa3f929ec0c048b1f3658cc335b940df4f6d22 ]
tcf_pedit_init() can fail to allocate 'keys' after the idr has been
successfully reserved. When this happens, subsequent attempts to configure
a pedit rule using the same idr value systematically fail with -ENOSPC:
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action pedit munge ip ttl set 63 index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of tcf_act_pedit_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:24 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_act_police_init()
[ Upstream commit
5bf7f8185f7c7112decdfe3d3e5c5d5e67f099a1 ]
tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematiclly fail with -ENOSPC:
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc action add action police rate 1000 burst 1000 drop index 100
RTNETLINK answers: No space left on device
...
Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:23 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_simp_init()
[ Upstream commit
60e10b3adc3bac0f6a894c28e0eb1f2d13607362 ]
if the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' will no more be usable by act_simple, until rmmod /
insmod of act_simple.ko is done:
# tc actions add action simple sdata hello index 100
# tc actions list action simple
action order 0: Simple <hello>
index 100 ref 1 bind 0
# tc actions flush action simple
# tc actions add action simple sdata hello index 100
RTNETLINK answers: Cannot allocate memory
We have an error talking to the kernel
# tc actions flush action simple
# tc actions add action simple sdata hello index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
# tc actions add action simple sdata hello index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
...
Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Mon, 19 Mar 2018 14:31:22 +0000 (15:31 +0100)]
net/sched: fix idr leak on the error path of tcf_bpf_init()
[ Upstream commit
bbc09e7842a5023ba5bc0f8d559b9dd464e44006 ]
when the following command sequence is entered
# tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
# tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
RTNETLINK answers: No space left on device
We have an error talking to the kernel
act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it refuses to install the second TC rule,
even if the BPF code is correct. Furthermore, it's no more possible to
install any other rule having the same value of 'index' until act_bpf
module is unloaded/inserted again. After the idr has been reserved, call
tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.
Fixes:
65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kalderon, Michal [Wed, 21 Mar 2018 12:51:52 +0000 (14:51 +0200)]
RDMA/qedr: Fix QP state initialization race
[ Upstream commit
caf61b1b8b88ccf1451f7321a176393797e8d292 ]
Once the FW is transitioned to error, FLUSH cqes can be received.
We want the driver to be aware of the fact that QP is already in error.
Without this fix, a user may see false error messages in the dmesg log,
mentioning that a FLUSH cqe was received while QP is not in error state.
Fixes:
cecbcddf ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kalderon, Michal [Wed, 21 Mar 2018 12:51:51 +0000 (14:51 +0200)]
RDMA/qedr: Fix rc initialization on CNQ allocation failure
[ Upstream commit
b15606f47b89b0b09936d7f45b59ba6275527041 ]
Return code wasn't set properly when CNQ allocation failed.
This only affect error message logging, currently user will
receive an error message that says the qedr driver load failed
with rc '0', instead of ENOMEM
Fixes:
ec72fce4 ("qedr: Add support for RoCE HW init")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kalderon, Michal [Wed, 21 Mar 2018 12:51:50 +0000 (14:51 +0200)]
RDMA/qedr: fix QP's ack timeout configuration
[ Upstream commit
c3594f22302cca5e924e47ec1cc8edd265708f41 ]
QPs that were configured with ack timeout value lower than 1
msec will not implement re-transmission timeout.
This means that if a packet / ACK were dropped, the QP
will not retransmit this packet.
This can lead to an application hang.
Fixes:
cecbcddf6 ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chien Tin Tung [Wed, 21 Mar 2018 18:09:25 +0000 (13:09 -0500)]
RDMA/ucma: Correct option size check using optlen
[ Upstream commit
5f3e3b85cc0a5eae1c46d72e47d3de7bf208d9e2 ]
The option size check is using optval instead of optlen
causing the set option call to fail. Use the correct
field, optlen, for size check.
Fixes:
6a21dfc0d0db ("RDMA/ucma: Limit possible option size")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolas Pitre [Thu, 15 Mar 2018 20:56:20 +0000 (16:56 -0400)]
kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
[ Upstream commit
825d487583089f9a33d31650c9c41f6474aab7fc ]
Some filesystems have timestamps with coarse precision that may allow
for a recently built object file to have the same timestamp as the
updated time on one of its dependency files. When that happens, the
object file doesn't get rebuilt as it should.
This is especially the case on filesystems that don't have sub-second
time precision, such as ext3 or Ext4 with 128B inodes.
Let's prevent that by making sure updated dependency files have a newer
timestamp than the first file we created (i.e. autoksyms.h.tmpnew).
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stefan Wahren [Wed, 14 Mar 2018 19:02:59 +0000 (20:02 +0100)]
brcmfmac: Fix check for ISO3166 code
[ Upstream commit
9b9322db5c5a1917a66c71fe47c3848a9a31227e ]
The commit "regulatory: add NUL to request alpha2" increases the length of
alpha2 to 3. This causes a regression on brcmfmac, because
brcmf_cfg80211_reg_notifier() expect valid ISO3166 codes in the complete
array. So fix this accordingly.
Fixes:
657308f73e67 ("regulatory: add NUL to request alpha2")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Song Liu [Mon, 12 Mar 2018 16:59:43 +0000 (09:59 -0700)]
perf/cgroup: Fix child event counting bug
[ Upstream commit
c917e0f259908e75bd2a65877e25f9d90c22c848 ]
When a perf_event is attached to parent cgroup, it should count events
for all children cgroups:
parent_group <---- perf_event
\
- child_group <---- process(es)
However, in our tests, we found this perf_event cannot report reliable
results. Here is an example case:
# create cgroups
mkdir -p /sys/fs/cgroup/p/c
# start perf for parent group
perf stat -e instructions -G "p"
# on another console, run test process in child cgroup:
stressapptest -s 2 -M 1000 & echo $! > /sys/fs/cgroup/p/c/cgroup.procs
# after the test process is done, stop perf in the first console shows
<not counted> instructions p
The instruction should not be "not counted" as the process runs in the
child cgroup.
We found this is because perf_event->cgrp and cpuctx->cgrp are not
identical, thus perf_event->cgrp are not updated properly.
This patch fixes this by updating perf_cgroup properly for ancestor
cgroup(s).
Reported-by: Ephraim Park <ephiepark@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <jolsa@redhat.com>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20180312165943.1057894-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thierry Reding [Sun, 18 Mar 2018 00:13:39 +0000 (01:13 +0100)]
drm/tegra: Shutdown on driver unbind
[ Upstream commit
192b4af6cd28cdad9b42fd79c21a90a2aeb0bec7 ]
Since commit
846c7dfc1193 ("drm/atomic: Try to preserve the crtc enabled
state in drm_atomic_remove_fb, v2."), removing the last framebuffer will
no longer disable the corresponding pipeline, which causes the KMS core
to complain about leaked connectors on driver unbind.
Fix this by calling drm_atomic_helper_shutdown() on driver unbind, which
will cause all display pipelines to be shut down and therefore drop the
extra references on the connectors.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avraham Stern [Mon, 5 Mar 2018 09:26:53 +0000 (11:26 +0200)]
iwlwifi: mvm: fix array out of bounds reference
[ Upstream commit
4a6d2e525b43eba5870ea7e360f59aa65de00705 ]
When starting aggregation, the code checks the status of the queue
allocated to the aggregation tid, which might not yet be allocated
and thus the queue index may be invalid.
Fix this by reserving a new queue in case the queue id is invalid.
While at it, clean up some unreachable code (a condition that is
already handled earlier) and remove all the non-DQA comments since
non-DQA mode is no longer supported.
Fixes:
cf961e16620f ("iwlwifi: mvm: support dqa-mode agg on non-shared queue")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avraham Stern [Tue, 6 Mar 2018 12:10:49 +0000 (14:10 +0200)]
iwlwifi: mvm: make sure internal station has a valid id
[ Upstream commit
df65c8d1728adee2a52c30287d33da83a8c87480 ]
If the driver failed to resume from D3, it is possible that it has
no valid aux station. In such case, fw restart will end up in sending
station related commands with an invalid station id, which will
result in an assert.
Fix this by allocating a new station id for the aux station if it
does not have a valid id even in the case of fw restart.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avraham Stern [Wed, 7 Mar 2018 08:41:18 +0000 (10:41 +0200)]
iwlwifi: mvm: clear tx queue id when unreserving aggregation queue
[ Upstream commit
4b387906b1c3692bb790388c335515c0cf098a23 ]
When a queue is reserved for aggregation, the queue id is assigned
to the tid_data. This is fine since iwl_mvm_sta_tx_agg_oper()
takes care of allocating the queue before actual tx starts.
When the reservation is cancelled (e.g. when the AP declined the
aggregation request) the tid_data is not cleared. As a result,
following tx for this tid was trying to use an unallocated queue.
Fix this by setting the txq_id for the tid to invalid when unreserving
the queue.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>