Haibo Chen [Tue, 8 Aug 2017 10:54:01 +0000 (18:54 +0800)]
mmc: mmc: correct the logic for setting HS400ES signal voltage
commit
92ddd95919466de5d34f3cb43635da9a7f9ab814 upstream.
Change the default err value to -EINVAL, make sure the card only
has type EXT_CSD_CARD_TYPE_HS400_1_8V also do the signal voltage
setting when select hs400es mode.
Fixes: commit
1720d3545b77 ("mmc: core: switch to 1V8 or 1V2 for hs400es mode")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miquel Raynal [Wed, 5 Jul 2017 06:51:09 +0000 (08:51 +0200)]
nand: fix wrong default oob layout for small pages using soft ecc
commit
f7f8c1756e9a5f1258a7cc6b663f8451b724900f upstream.
When using soft ecc, if no ooblayout is given, the core automatically
uses one of the nand_ooblayout_{sp,lp}*() functions to determine the
layout inside the out of band data.
Until kernel version 4.6, struct nand_ecclayout was used for that
purpose. During the migration from 4.6 to 4.7, an error shown up in the
small page layout, in the case oob section is only 8 bytes long.
The layout was using three bytes (0, 1, 2) for ecc, two bytes (3, 4)
as free bytes, one byte (5) for bad block marker and finally
two bytes (6, 7) as free bytes, as shown there:
[linux-4.6] drivers/mtd/nand/nand_base.c:52
static struct nand_ecclayout nand_oob_8 = {
.eccbytes = 3,
.eccpos = {0, 1, 2},
.oobfree = {
{.offset = 3,
.length = 2},
{.offset = 6,
.length = 2} }
};
This fixes the current implementation which is incoherent. It
references bit 3 at the same time as an ecc byte and a free byte.
Furthermore, it is clear with the previous implementation that there
is only one ecc section with 8 bytes oob sections. We shall return
-ERANGE in the nand_ooblayout_ecc_sp() function when asked for the
second section.
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Fixes:
41b207a70d3a ("mtd: nand: implement the default mtd_ooblayout_ops")
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mateusz Jurczyk [Wed, 7 Jun 2017 10:26:49 +0000 (12:26 +0200)]
fuse: initialize the flock flag in fuse_file on allocation
commit
68227c03cba84a24faf8a7277d2b1a03c8959c2c upstream.
Before the patch, the flock flag could remain uninitialized for the
lifespan of the fuse_file allocation. Unless set to true in
fuse_file_flock(), it would remain in an indeterminate state until read in
an if statement in fuse_release_common(). This could consequently lead to
taking an unexpected branch in the code.
The bug was discovered by a runtime instrumentation designed to detect use
of uninitialized memory in the kernel.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Fixes:
37fb3a30b462 ("fuse: fix flock")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Bellinger [Sun, 6 Aug 2017 23:10:03 +0000 (16:10 -0700)]
target: Fix node_acl demo-mode + uncached dynamic shutdown regression
commit
6f48655facfd7f7ccfe6d252ac0fe319ab02e4dd upstream.
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by
commit
01d4d673558985d9a118e1e05026633c3e2ade9b
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Wed Dec 7 12:55:54 2016 -0800
which originally had the proper list_del_init() usage, but was
dropped during list review as it was thought unnecessary by HCH.
However, list_del_init() usage is required during the special
generate_node_acls = 1 + cache_dynamic_acls = 0 case when
transport_free_session() does a list_del(&se_nacl->acl_list),
followed by target_complete_nacl() doing the same thing.
This was manifesting as a general protection fault as reported
by Justin:
kernel: general protection fault: 0000 [#1] SMP
kernel: Modules linked in:
kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.
050520141428 05/05/2014
kernel: task:
ffff88026939e800 task.stack:
ffffc90007884000
kernel: RIP: 0010:target_put_nacl+0x49/0xb0
kernel: RSP: 0018:
ffffc90007887d70 EFLAGS:
00010246
kernel: RAX:
dead000000000200 RBX:
ffff8802556ca000 RCX:
0000000000000000
kernel: RDX:
dead000000000100 RSI:
0000000000000246 RDI:
ffff8802556ce028
kernel: RBP:
ffffc90007887d88 R08:
0000000000000001 R09:
0000000000000000
kernel: R10:
ffffc90007887df8 R11:
ffffea0009986900 R12:
ffff8802556ce020
kernel: R13:
ffff8802556ce028 R14:
ffff8802556ce028 R15:
ffffffff88d85540
kernel: FS:
0000000000000000(0000) GS:
ffff88027fc00000(0000) knlGS:
0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
kernel: CR2:
00007fffe36f5f94 CR3:
0000000009209000 CR4:
00000000003406f0
kernel: DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
kernel: DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
kernel: Call Trace:
kernel: transport_free_session+0x67/0x140
kernel: transport_deregister_session+0x7a/0xc0
kernel: iscsit_close_session+0x92/0x210
kernel: iscsit_close_connection+0x5f9/0x840
kernel: iscsit_take_action_for_connection_exit+0xfe/0x110
kernel: iscsi_target_tx_thread+0x140/0x1e0
kernel: ? wait_woken+0x90/0x90
kernel: kthread+0x124/0x160
kernel: ? iscsit_thread_get_cpumask+0x90/0x90
kernel: ? kthread_create_on_node+0x40/0x40
kernel: ret_from_fork+0x22/0x30
kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
kernel: RIP: target_put_nacl+0x49/0xb0 RSP:
ffffc90007887d70
kernel: ---[ end trace
f12821adbfd46fed ]---
To address this, go ahead and use proper list_del_list() for all
cases of se_nacl->acl_list deletion.
Reported-by: Justin Maggard <jmaggard01@gmail.com>
Tested-by: Justin Maggard <jmaggard01@gmail.com>
Cc: Justin Maggard <jmaggard01@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Bellinger [Sat, 5 Aug 2017 06:59:31 +0000 (23:59 -0700)]
iscsi-target: Fix iscsi_np reset hung task during parallel delete
commit
978d13d60c34818a41fc35962602bdfa5c03f214 upstream.
This patch fixes a bug associated with iscsit_reset_np_thread()
that can occur during parallel configfs rmdir of a single iscsi_np
used across multiple iscsi-target instances, that would result in
hung task(s) similar to below where configfs rmdir process context
was blocked indefinately waiting for iscsi_np->np_restart_comp
to finish:
[ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds.
[ 6726.119440] Tainted: G W O 4.1.26-3321 #2
[ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6726.132927] dcp_proxy_node_ D
ffff8803f202bc88 0 15550 1 0x00000000
[ 6726.140058]
ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08
[ 6726.147593]
ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0
[ 6726.155132]
ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8
[ 6726.162667] Call Trace:
[ 6726.165150] [<
ffffffff8168ced2>] schedule+0x32/0x80
[ 6726.170156] [<
ffffffff8168f5b4>] schedule_timeout+0x214/0x290
[ 6726.176030] [<
ffffffff810caef2>] ? __send_signal+0x52/0x4a0
[ 6726.181728] [<
ffffffff8168d7d6>] wait_for_completion+0x96/0x100
[ 6726.187774] [<
ffffffff810e7c80>] ? wake_up_state+0x10/0x10
[ 6726.193395] [<
ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod]
[ 6726.201278] [<
ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod]
[ 6726.210033] [<
ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod]
[ 6726.218351] [<
ffffffff81260c5a>] configfs_write_file+0xaa/0x110
[ 6726.224392] [<
ffffffff811ea364>] vfs_write+0xa4/0x1b0
[ 6726.229576] [<
ffffffff811eb111>] SyS_write+0x41/0xb0
[ 6726.234659] [<
ffffffff8169042e>] system_call_fastpath+0x12/0x71
It would happen because each iscsit_reset_np_thread() sets state
to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting
for completion on iscsi_np->np_restart_comp.
However, if iscsi_np was active processing a login request and
more than a single iscsit_reset_np_thread() caller to the same
iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np
kthread process context in __iscsi_target_login_thread() would
flush pending signals and only perform a single completion of
np->np_restart_comp before going back to sleep within transport
specific iscsit_transport->iscsi_accept_np code.
To address this bug, add a iscsi_np->np_reset_count and update
__iscsi_target_login_thread() to keep completing np->np_restart_comp
until ->np_reset_count has reached zero.
Reported-by: Gary Guo <ghg@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Varun Prakash [Sun, 23 Jul 2017 14:33:33 +0000 (20:03 +0530)]
iscsi-target: fix memory leak in iscsit_setup_text_cmd()
commit
ea8dc5b4cd2195ee582cae28afa4164c6dea1738 upstream.
On receiving text request iscsi-target allocates buffer for
payload in iscsit_handle_text_cmd() and assigns buffer pointer
to cmd->text_in_ptr, this buffer is currently freed in
iscsit_release_cmd(), if iscsi-target sets 'C' bit in text
response then it will receive another text request from the
initiator with ttt != 0xffffffff in this case iscsi-target
will find cmd using itt and call iscsit_setup_text_cmd()
which will set cmd->text_in_ptr to NULL without freeing
previously allocated buffer.
This patch fixes this issue by calling kfree(cmd->text_in_ptr)
in iscsit_setup_text_cmd() before assigning NULL to it.
For the first text request cmd->text_in_ptr is NULL as
cmd is memset to 0 in iscsit_allocate_cmd().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Brezillon [Mon, 31 Jul 2017 08:29:56 +0000 (10:29 +0200)]
mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES
commit
a11bf5ed951f8900d244d09eb03a888b59c7fc82 upstream.
Some ONFI NANDs do not support the SET/GET FEATURES commands, which,
according to the spec, is perfectly valid.
On these NANDs we can't set a specific timing mode using the "timing
mode" feature, and we should assume the NAND does not require any setup
to enter a specific timing mode.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes:
d8e725dd8311 ("mtd: nand: automate NAND timings selection")
Reported-by: Alexander Dahl <ada@thorsis.com>
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Max Filippov [Tue, 1 Aug 2017 18:02:46 +0000 (11:02 -0700)]
xtensa: don't limit csum_partial export by CONFIG_NET
commit
7f81e55c737a8fa82c71f290945d729a4902f8d2 upstream.
csum_partial and csum_partial_copy_generic are defined unconditionally
and are available even when CONFIG_NET is disabled. They are used not
only by the network drivers, but also by scsi and media.
Don't limit these functions export by CONFIG_NET.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Max Filippov [Tue, 1 Aug 2017 18:15:15 +0000 (11:15 -0700)]
xtensa: mm/cache: add missing EXPORT_SYMBOLs
commit
bc652eb6a0d5cffaea7dc8e8ad488aab2a1bf1ed upstream.
Functions clear_user_highpage, copy_user_highpage, flush_dcache_page,
local_flush_cache_range and local_flush_cache_page may be used from
modules. Export them.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Max Filippov [Sat, 29 Jul 2017 00:42:59 +0000 (17:42 -0700)]
xtensa: fix cache aliasing handling code for WT cache
commit
6d0f581d1768d3eaba15776e7dd1fdfec10cfe36 upstream.
Currently building kernel for xtensa core with aliasing WT cache fails
with the following messages:
mm/memory.c:2152: undefined reference to `flush_dcache_page'
mm/memory.c:2332: undefined reference to `local_flush_cache_page'
mm/memory.c:1919: undefined reference to `local_flush_cache_range'
mm/memory.c:4179: undefined reference to `copy_to_user_page'
mm/memory.c:4183: undefined reference to `copy_from_user_page'
This happens because implementation of these functions is only compiled
when data cache is WB, which looks wrong: even when data cache doesn't
need flushing it still needs invalidation. The functions like
__flush_[invalidate_]dcache_* are correctly defined for both WB and WT
caches (and even if they weren't that'd still be ok, just slower).
Fix this by providing the same implementation of the above functions for
both WB and WT cache.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mel Gorman [Wed, 9 Aug 2017 07:27:11 +0000 (08:27 +0100)]
futex: Remove unnecessary warning from get_futex_key
commit
48fb6f4db940e92cfb16cd878cddd59ea6120d06 upstream.
Commit
65d8fc777f6d ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.
Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key. Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.
kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task:
ffff80001e271780 task.stack:
ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [<
ffff00000821ac14>] lr : [<
ffff00000821ac14>] pstate:
80000145
The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.
This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning. The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];
void *mem;
#define MEM_PROT (PROT_READ | PROT_WRITE)
#define MEM_SIZE 65536
static int futex_wrapper(int *uaddr, int op, int val,
const struct timespec *timeout,
int *uaddr2, int val3)
{
syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}
void *poll_futex(void *unused)
{
for (;;) {
futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
}
}
int main(int argc, char *argv[])
{
int i;
mem = mmap(NULL, MEM_SIZE, MEM_PROT,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
printf("Mapping @ %p\n", mem);
printf("Creating futex threads...\n");
for (i = 0; i < NR_FUTEX_THREADS; i++)
pthread_create(&threads[i], NULL, poll_futex, NULL);
printf("Flipping mapping...\n");
for (;;) {
mmap(mem, MEM_SIZE, MEM_PROT,
MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
return 0;
}
Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cong Wang [Thu, 10 Aug 2017 22:24:24 +0000 (15:24 -0700)]
mm: fix list corruptions on shmem shrinklist
commit
d041353dc98a6339182cd6f628b4c8f111278cb3 upstream.
We saw many list corruption warnings on shmem shrinklist:
WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0
list_del corruption. prev->next should be
ffff9ae5694b82d8, but was
ffff9ae5699ba960
Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1
Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
Call Trace:
dump_stack+0x4d/0x66
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
__list_del_entry+0x9e/0xc0
shmem_unused_huge_shrink+0xfa/0x2e0
shmem_unused_huge_scan+0x20/0x30
super_cache_scan+0x193/0x1a0
shrink_slab.part.41+0x1e3/0x3f0
shrink_slab+0x29/0x30
shrink_node+0xf9/0x2f0
kswapd+0x2d8/0x6c0
kthread+0xd7/0xf0
ret_from_fork+0x22/0x30
WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (
ffff9ae5699ba960), but was
ffff9ae5694b82d8. (prev=
ffff9ae5694b82d8).
Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G W 4.9.34-t3.el7.twitter.x86_64 #1
Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
Call Trace:
dump_stack+0x4d/0x66
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
__list_add+0x89/0xb0
shmem_setattr+0x204/0x230
notify_change+0x2ef/0x440
do_truncate+0x5d/0x90
path_openat+0x331/0x1190
do_filp_open+0x7e/0xe0
do_sys_open+0x123/0x200
SyS_open+0x1e/0x20
do_syscall_64+0x61/0x170
entry_SYSCALL64_slow_path+0x25/0x25
The problem is that shmem_unused_huge_shrink() moves entries from the
global sbinfo->shrinklist to its local lists and then releases the
spinlock. However, a parallel shmem_setattr() could access one of these
entries directly and add it back to the global shrinklist if it is
removed, with the spinlock held.
The logic itself looks solid since an entry could be either in a local
list or the global list, otherwise it is removed from one of them by
list_del_init(). So probably the race condition is that, one CPU is in
the middle of INIT_LIST_HEAD() but the other CPU calls list_empty()
which returns true too early then the following list_add_tail() sees a
corrupted entry.
list_empty_careful() is designed to fix this situation.
[akpm@linux-foundation.org: add comments]
Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com
Fixes:
779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan Toppins [Thu, 10 Aug 2017 22:23:35 +0000 (15:23 -0700)]
mm: ratelimit PFNs busy info message
commit
75dddef32514f7aa58930bde6a1263253bc3d4ba upstream.
The RDMA subsystem can generate several thousand of these messages per
second eventually leading to a kernel crash. Ratelimit these messages
to prevent this crash.
Doug said:
"I've been carrying a version of this for several kernel versions. I
don't remember when they started, but we have one (and only one) class
of machines: Dell PE R730xd, that generate these errors. When it
happens, without a rate limit, we get rcu timeouts and kernel oopses.
With the rate limit, we just get a lot of annoying kernel messages but
the machine continues on, recovers, and eventually the memory
operations all succeed"
And:
"> Well... why are all these EBUSY's occurring? It sounds inefficient
> (at least) but if it is expected, normal and unavoidable then
> perhaps we should just remove that message altogether?
I don't have an answer to that question. To be honest, I haven't
looked real hard. We never had this at all, then it started out of the
blue, but only on our Dell 730xd machines (and it hits all of them),
but no other classes or brands of machines. And we have our 730xd
machines loaded up with different brands and models of cards (for
instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
meant it wasn't tied to any particular brand/model of RDMA hardware.
To me, it always smelled of a hardware oddity specific to maybe the
CPUs or mainboard chipsets in these machines, so given that I'm not an
mm expert anyway, I never chased it down.
A few other relevant details: it showed up somewhere around 4.8/4.9 or
thereabouts. It never happened before, but the prinkt has been there
since the 3.18 days, so possibly the test to trigger this message was
changed, or something else in the allocator changed such that the
situation started happening on these machines?
And, like I said, it is specific to our 730xd machines (but they are
all identical, so that could mean it's something like their specific
ram configuration is causing the allocator to hit this on these
machine but not on other machines in the cluster, I don't want to say
it's necessarily the model of chipset or CPU, there are other bits of
identicalness between these machines)"
Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sun, 13 Aug 2017 02:31:45 +0000 (19:31 -0700)]
Linux 4.9.43
Greg Kroah-Hartman [Sat, 12 Aug 2017 16:08:48 +0000 (09:08 -0700)]
Revert "ARM: dts: sun8i: Support DTB build for NanoPi M1"
This reverts commit
1e9e71782f3462d5aecb0720d26298253bdbeca7 which is
commit
661ccdc1a95f18ab6c1373322fde09afd5b90a1f upstream.
It's not needed in 4.9, and it breaks the build.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Milo Kim <woogyom.kim@gmail.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki K Poulose [Wed, 5 Jul 2017 08:57:00 +0000 (09:57 +0100)]
KVM: arm/arm64: Handle hva aging while destroying the vm
commit
7e5a672289c9754d07e1c3b33649786d3d70f5e4 upstream.
The mmu_notifier_release() callback of KVM triggers cleaning up
the stage2 page table on kvm-arm. However there could be other
notifier callbacks in parallel with the mmu_notifier_release(),
which could cause the call backs ending up in an empty stage2
page table. Make sure we check it for all the notifier callbacks.
Fixes: commit
293f29363 ("kvm-arm: Unmap shadow pagetables properly")
Reported-by: Alex Graf <agraf@suse.de>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rob Gardner [Mon, 17 Jul 2017 15:22:27 +0000 (09:22 -0600)]
sparc64: Prevent perf from running during super critical sections
commit
fc290a114fc6034b0f6a5a46e2fb7d54976cf87a upstream.
This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.
Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).
But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.
One particular critical section occurs in switch_mm:
spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);
If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.
This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.
Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.
Orabug:
25577560
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Willem de Bruijn [Thu, 10 Aug 2017 16:29:19 +0000 (12:29 -0400)]
udp: consistently apply ufo or fragmentation
[ Upstream commit
85f1bd9a7b5a79d5baa8bf44af19658f7bf77bfa ]
When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.
Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.
Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.
A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.
Found by syzkaller.
Fixes:
e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 11 Aug 2017 16:19:02 +0000 (09:19 -0700)]
revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
This reverts commit
f102bb7164c9020e12662998f0fd99c3be72d4f6 which is
commit
0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 upstream as there is
another patch that needs to be applied instead of this one.
Cc: Zheng Li <james.z.li@ericsson.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 11 Aug 2017 16:14:09 +0000 (09:14 -0700)]
revert "net: account for current skb length when deciding about UFO"
This reverts commit
ef09c9ff343122a0b245416066992d096416ff19 which is
commit
a5cb659bbc1c8644efa0c3138a757a1e432a4880 upstream as it causes
merge issues with later patches that are much more important...
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Willem de Bruijn [Thu, 10 Aug 2017 16:41:58 +0000 (12:41 -0400)]
packet: fix tp_reserve race in packet_set_ring
[ Upstream commit
c27927e372f0785f3303e8fad94b85945e2c97b7 ]
Updates to tp_reserve can race with reads of the field in
packet_set_ring. Avoid this by holding the socket lock during
updates in setsockopt PACKET_RESERVE.
This bug was discovered by syzkaller.
Fixes:
8913336a7e8d ("packet: add PACKET_RESERVE sockopt")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikolay Borisov [Wed, 9 Aug 2017 11:38:04 +0000 (14:38 +0300)]
igmp: Fix regression caused by igmp sysctl namespace code.
[ Upstream commit
1714020e42b17135032c8606f7185b3fb2ba5d78 ]
Commit
dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
function is only called as part of per-namespace initialization, only if
CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
compiled out, casuing the igmp pernet ops to not be registerd and those sysctl
being left initialized with 0. However, there are certain functions, such as
ip_mc_join_group which are always compiled and make use of some of those
sysctls. Let's do a partial revert of the aforementioned commit and move the
sysctl initialization into inet_init_net, that way they will always have
sane values.
Fixes:
dcd87999d415 ("igmp: net: Move igmp namespace init to correct file")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595
Reported-by: Gerardo Exequiel Pozzi <vmlinuz386@gmail.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Willem de Bruijn [Tue, 8 Aug 2017 18:22:55 +0000 (14:22 -0400)]
net: avoid skb_warn_bad_offload false positives on UFO
[ Upstream commit
8d63bee643f1fb53e472f0e135cae4eb99d62d19 ]
skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.
Commit
b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.
When __skb_gso_segment is called from the receive path, this
triggers the warning again.
Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.
See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/
Fixes:
b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Tue, 8 Aug 2017 08:41:58 +0000 (01:41 -0700)]
tcp: fastopen: tcp_connect() must refresh the route
[ Upstream commit
8ba60924710cde564a3905588b6219741d6356d0 ]
With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
to call tcp_connect() while socket sk_dst_cache is either NULL
or invalid.
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 connect(4, ..., ...) = 0
<< sk->sk_dst_cache becomes obsolete, or even set to NULL >>
+1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
We need to refresh the route otherwise bad things can happen,
especially when syzkaller is running on the host :/
Fixes:
19f6d3f3c8422 ("net/tcp-fastopen: Add new API support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Wed, 9 Aug 2017 10:15:19 +0000 (18:15 +0800)]
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
[ Upstream commit
96d9703050a0036a3360ec98bb41e107c90664fe ]
Commit
55917a21d0cc ("netfilter: x_tables: add context to know if
extension runs from nft_compat") introduced a member nft_compat to
xt_tgchk_param structure.
But it didn't set it's value for ipt_init_target. With unexpected
value in par.nft_compat, it may return unexpected result in some
target's checkentry.
This patch is to set all it's fields as 0 and only initialize the
non-zero fields in ipt_init_target.
v1->v2:
As Wang Cong's suggestion, fix it by setting all it's fields as
0 and only initializing the non-zero fields.
Fixes:
55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davide Caratti [Thu, 3 Aug 2017 20:54:48 +0000 (22:54 +0200)]
net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
[ Upstream commit
e718fe450e616227b74d27a233cdf37b4df0c82b ]
if the NIC fails to validate the checksum on TCP/UDP, and validation of IP
checksum is successful, the driver subtracts the pseudo-header checksum
from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.
V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set
Reported-by: Shuang Li <shuali@redhat.com>
Fixes:
f8c6455bb04b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Fri, 4 Aug 2017 12:20:54 +0000 (14:20 +0200)]
bpf, s390: fix jit branch offset related to ldimm64
[ Upstream commit
b0a0c2566f28e71e5e32121992ac8060cec75510 ]
While testing some other work that required JIT modifications, I
run into test_bpf causing a hang when JIT enabled on s390. The
problematic test case was the one from
ddc665a4bb4b (bpf, arm64:
fix jit branch offset related to ldimm64), and turns out that we
do have a similar issue on s390 as well. In bpf_jit_prog() we
update next instruction address after returning from bpf_jit_insn()
with an insn_count. bpf_jit_insn() returns either -1 in case of
error (e.g. unsupported insn), 1 or 2. The latter is only the
case for ldimm64 due to spanning 2 insns, however, next address
is only set to i + 1 not taking actual insn_count into account,
thus fix is to use insn_count instead of 1. bpf_jit_enable in
mode 2 provides also disasm on s390:
Before fix:
000003ff800349b6:
a7f40003 brc 15,
3ff800349bc ; target
000003ff800349ba: 0000 unknown
000003ff800349bc:
e3b0f0700024 stg %r11,112(%r15)
000003ff800349c2:
e3e0f0880024 stg %r14,136(%r15)
000003ff800349c8: 0db0 basr %r11,%r0
000003ff800349ca:
c0ef00000000 llilf %r14,0
000003ff800349d0:
e320b0360004 lg %r2,54(%r11)
000003ff800349d6:
e330b03e0004 lg %r3,62(%r11)
000003ff800349dc:
ec23ffeda065 clgrj %r2,%r3,10,
3ff800349b6 ; jmp
000003ff800349e2:
e3e0b0460004 lg %r14,70(%r11)
000003ff800349e8:
e3e0b04e0004 lg %r14,78(%r11)
000003ff800349ee:
b904002e lgr %r2,%r14
000003ff800349f2:
e3b0f0700004 lg %r11,112(%r15)
000003ff800349f8:
e3e0f0880004 lg %r14,136(%r15)
000003ff800349fe: 07fe bcr 15,%r14
After fix:
000003ff80ef3db4:
a7f40003 brc 15,
3ff80ef3dba
000003ff80ef3db8: 0000 unknown
000003ff80ef3dba:
e3b0f0700024 stg %r11,112(%r15)
000003ff80ef3dc0:
e3e0f0880024 stg %r14,136(%r15)
000003ff80ef3dc6: 0db0 basr %r11,%r0
000003ff80ef3dc8:
c0ef00000000 llilf %r14,0
000003ff80ef3dce:
e320b0360004 lg %r2,54(%r11)
000003ff80ef3dd4:
e330b03e0004 lg %r3,62(%r11)
000003ff80ef3dda:
ec230006a065 clgrj %r2,%r3,10,
3ff80ef3de6 ; jmp
000003ff80ef3de0:
e3e0b0460004 lg %r14,70(%r11)
000003ff80ef3de6:
e3e0b04e0004 lg %r14,78(%r11) ; target
000003ff80ef3dec:
b904002e lgr %r2,%r14
000003ff80ef3df0:
e3b0f0700004 lg %r11,112(%r15)
000003ff80ef3df6:
e3e0f0880004 lg %r14,136(%r15)
000003ff80ef3dfc: 07fe bcr 15,%r14
test_bpf.ko suite runs fine after the fix.
Fixes:
054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Thu, 3 Aug 2017 06:10:46 +0000 (23:10 -0700)]
net: fix keepalive code vs TCP_FASTOPEN_CONNECT
[ Upstream commit
2dda640040876cd8ae646408b69eea40c24f9ae9 ]
syzkaller was able to trigger a divide by 0 in TCP stack [1]
Issue here is that keepalive timer needs to be updated to not attempt
to send a probe if the connection setup was deferred using
TCP_FASTOPEN_CONNECT socket option added in linux-4.11
[1]
divide error: 0000 [#1] SMP
CPU: 18 PID: 0 Comm: swapper/18 Not tainted
task:
ffff986f62f4b040 ti:
ffff986f62fa2000 task.ti:
ffff986f62fa2000
RIP: 0010:[<
ffffffff8409cc0d>] [<
ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
Call Trace:
<IRQ>
[<
ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
[<
ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
[<
ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
[<
ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
[<
ffffffff83b3f799>] call_timer_fn+0x39/0xf0
[<
ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
[<
ffffffff83a04ddb>] __do_softirq+0xcb/0x257
[<
ffffffff83ae03ac>] irq_exit+0x9c/0xb0
[<
ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
[<
ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
<EOI>
[<
ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
[<
ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
Tested:
Following packetdrill no longer crashes the kernel
`echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
// Cache warmup: send a Fast Open cookie request
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
+0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
+.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO
abcd1234,nop,nop>
+0 > . 1:1(0) ack 1
+0 close(3) = 0
+0 > F. 1:1(0) ack 1
+0 < F. 1:1(0) ack 2 win 92
+0 > . 2:2(0) ack 2
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
+.01 connect(4, ..., ...) = 0
+0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
+10 close(4) = 0
`echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
Fixes:
19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yuchung Cheng [Tue, 1 Aug 2017 20:22:32 +0000 (13:22 -0700)]
tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
[ Upstream commit
ed254971edea92c3ac5c67c6a05247a92aa6075e ]
If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guillaume Nault [Tue, 8 Aug 2017 09:43:24 +0000 (11:43 +0200)]
ppp: fix xmit recursion detection on ppp channels
[ Upstream commit
0a0e1a85c83775a648041be2b15de6d0a2f2b8eb ]
Commit
e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp
devices") dropped the xmit_recursion counter incrementation in
ppp_channel_push() and relied on ppp_xmit_process() for this task.
But __ppp_channel_push() can also send packets directly (using the
.start_xmit() channel callback), in which case the xmit_recursion
counter isn't incremented anymore. If such packets get routed back to
the parent ppp unit, ppp_xmit_process() won't notice the recursion and
will call ppp_channel_push() on the same channel, effectively creating
the deadlock situation that the xmit_recursion mechanism was supposed
to prevent.
This patch re-introduces the xmit_recursion counter incrementation in
ppp_channel_push(). Since the xmit_recursion variable is now part of
the parent ppp unit, incrementation is skipped if the channel doesn't
have any. This is fine because only packets routed through the parent
unit may enter the channel recursively.
Finally, we have to ensure that pch->ppp is not going to be modified
while executing ppp_channel_push(). Instead of taking this lock only
while calling ppp_xmit_process(), we now have to hold it for the full
ppp_channel_push() execution. This respects the ppp locks ordering
which requires locking ->upl before ->downl.
Fixes:
e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gao Feng [Mon, 17 Jul 2017 10:34:42 +0000 (18:34 +0800)]
ppp: Fix false xmit recursion detect with two ppp devices
[ Upstream commit
e5dadc65f9e0177eb649bcd9d333f1ebf871223e ]
The global percpu variable ppp_xmit_recursion is used to detect the ppp
xmit recursion to avoid the deadlock, which is caused by one CPU tries to
lock the xmit lock twice. But it would report false recursion when one CPU
wants to send the skb from two different PPP devices, like one L2TP on the
PPPoE. It is a normal case actually.
Now use one percpu member of struct ppp instead of the gloable variable to
detect the xmit recursion of one ppp device.
Fixes:
55454a565836 ("ppp: avoid dealock on recursive xmit")
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Liu Jianying <jianying.liu@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 11 Aug 2017 15:49:50 +0000 (08:49 -0700)]
Linux 4.9.42
Tejun Heo [Sun, 23 Jul 2017 12:36:15 +0000 (08:36 -0400)]
workqueue: implicit ordered attribute should be overridable
commit
0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream.
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1. Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.
This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes:
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Sat, 18 Feb 2017 00:07:33 +0000 (16:07 -0800)]
net: phy: Fix PHY unbind crash
commit
7b9a88a390dacb37b051a7b09b9a08f546edf5eb upstream.
The PHY library does not deal very well with bind and unbind events. The first
thing we would see is that we were not properly canceling the PHY state machine
workqueue, so we would be crashing while dereferencing phydev->drv since there
is no driver attached anymore.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Kubeček [Mon, 19 Jun 2017 11:03:43 +0000 (13:03 +0200)]
net: account for current skb length when deciding about UFO
[ Upstream commit
a5cb659bbc1c8644efa0c3138a757a1e432a4880 ]
Our customer encountered stuck NFS writes for blocks starting at specific
offsets w.r.t. page boundary caused by networking stack sending packets via
UFO enabled device with wrong checksum. The problem can be reproduced by
composing a long UDP datagram from multiple parts using MSG_MORE flag:
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 3000, 0, ...);
Assume this packet is to be routed via a device with MTU 1500 and
NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
this condition is tested (among others) to decide whether to call
ip_ufo_append_data():
((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
At the moment, we already have skb with 1028 bytes of data which is not
marked for GSO so that the test is false (fragheaderlen is usually 20).
Thus we append second 1000 bytes to this skb without invoking UFO. Third
sendto(), however, has sufficient length to trigger the UFO path so that we
end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
uses udp_csum() to calculate the checksum but that assumes all fragments
have correct checksum in skb->csum which is not true for UFO fragments.
When checking against MTU, we need to add skb->len to length of new segment
if we already have a partially filled skb and fragheaderlen only if there
isn't one.
In the IPv6 case, skb can only be null if this is the first segment so that
we have to use headersize (length of the first IPv6 header) rather than
fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
Fixes:
e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Fixes:
e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for
ip6 fragment between __ip6_append_data and ip6_finish_output")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zheng li [Mon, 12 Dec 2016 01:56:05 +0000 (09:56 +0800)]
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
[ Upstream commit
0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 ]
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.
That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.
Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Or Gerlitz [Sun, 15 Jan 2017 17:05:38 +0000 (19:05 +0200)]
net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy
[ Upstream commit
5bae8c031053c69b4aa74b7f1ba15d4ec8426208 ]
We must re-enable RoCE on the e-switch management port (PF) only after destroying
the FDB in its switchdev/offloaded mode. Otherwise, when encapsulation is supported,
this re-enablement will fail.
Also, it's more natural and symmetric to disable RoCE on the PF before we create
the FDB under switchdev mode, so do that as well and revert if getting into error
during the mode change later.
Fixes:
9da34cd34e85 ('net/mlx5: Disable RoCE on the e-switch management [..]')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ard Biesheuvel [Wed, 11 Jan 2017 00:58:00 +0000 (16:58 -0800)]
mm: don't dereference struct page fields of invalid pages
[ Upstream commit
f073bdc51771f5a5c7a8d1191bfc3ae371d44de7 ]
The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone. However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with. This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.
So reorder the VM_BUG_ON() with the pfn_valid_within() check.
Link: http://lkml.kernel.org/r/1481706707-6211-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Robert Richter <rrichter@cavium.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jamie Iles [Wed, 11 Jan 2017 00:57:54 +0000 (16:57 -0800)]
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
[ Upstream commit
2d39b3cd34e6d323720d4c61bd714f5ae202c022 ]
Since commit
00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we
can now trace init processes. init is initially protected with
SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but
there are a number of paths during tracing where SIGNAL_UNKILLABLE can
be implicitly cleared.
This can result in init becoming stoppable/killable after tracing. For
example, running:
while true; do kill -STOP 1; done &
strace -p 1
and then stopping strace and the kill loop will result in init being
left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but
init will now respond to future SIGSTOP signals rather than ignoring
them.
Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED
that we don't clear SIGNAL_UNKILLABLE.
Link: http://lkml.kernel.org/r/20170104122017.25047-1-jamie.iles@oracle.com
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sudip Mukherjee [Wed, 11 Jan 2017 00:57:45 +0000 (16:57 -0800)]
lib/Kconfig.debug: fix frv build failure
[ Upstream commit
da0510c47519fe0999cffe316e1d370e29f952be ]
The build of frv allmodconfig was failing with the errors like:
/tmp/cc0JSPc3.s: Assembler messages:
/tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined
/tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined
/tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined
/tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined
Commit
866ced950bcd ("kbuild: Support split debug info v4") introduced
splitting the debug info and keeping that in a separate file. Somehow,
the frv-linux gcc did not like that and I am guessing that instead of
splitting it started copying. The first report about this is at:
https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html.
I will try and see if this can work with frv and if still fails I will
open a bug report with gcc. But meanwhile this is the easiest option to
solve build failure of frv.
Fixes:
866ced950bcd ("kbuild: Support split debug info v4")
Link: http://lkml.kernel.org/r/1482062348-5352-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Wed, 11 Jan 2017 00:57:27 +0000 (16:57 -0800)]
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
[ Upstream commit
bb1107f7c6052c863692a41f78c000db792334bf ]
Andrey Konovalov has reported the following warning triggered by the
syzkaller fuzzer.
WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__alloc_pages_slowpath mm/page_alloc.c:3511
__alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781
alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072
alloc_pages include/linux/gfp.h:469
kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
kmalloc_large include/linux/slab.h:422
__kmalloc+0x210/0x2d0 mm/slub.c:3723
kmalloc include/linux/slab.h:495
ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664
new_sync_write fs/read_write.c:499
__vfs_write+0x483/0x760 fs/read_write.c:512
vfs_write+0x170/0x4e0 fs/read_write.c:560
SYSC_write fs/read_write.c:607
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2
The issue is caused by a lack of size check for the request size in
ep_write_iter which should be fixed. It, however, points to another
problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its
KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the
resulting page allocator request might be MAX_ORDER which is too large
(see __alloc_pages_slowpath).
The same applies to the SLOB allocator which allows even larger sizes.
Make sure that they are capped properly and never request more than
MAX_ORDER order.
Link: http://lkml.kernel.org/r/20161220130659.16461-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rabin Vincent [Wed, 23 Nov 2016 12:02:32 +0000 (13:02 +0100)]
ARM: 8632/1: ftrace: fix syscall name matching
[ Upstream commit
270c8cf1cacc69cb8d99dea812f06067a45e4609 ]
ARM has a few system calls (most notably mmap) for which the names of
the functions which are referenced in the syscall table do not match the
names of the syscall tracepoints. As a consequence of this, these
tracepoints are not made available. Implement
arch_syscall_match_sym_name to fix this and allow tracing even these
system calls.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Omar Sandoval [Mon, 9 Jan 2017 19:44:12 +0000 (11:44 -0800)]
virtio_blk: fix panic in initialization error path
[ Upstream commit
6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3 ]
If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Moyer [Mon, 9 Jan 2017 20:20:31 +0000 (15:20 -0500)]
nbd: blk_mq_init_queue returns an error code on failure, not NULL
[ Upstream commit
25b4acfc7de0fc4da3bfea3a316f7282c6fbde81 ]
Additionally, don't assign directly to disk->queue, otherwise
blk_put_queue (called via put_disk) will choke (panic) on the errno
stored there.
Bug found by code inspection after Omar found a similar issue in
virtio_blk. Compile-tested only.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Wise [Thu, 22 Dec 2016 15:40:37 +0000 (07:40 -0800)]
iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort
[ Upstream commit
3bcf96e0183f5c863657cb6ae9adad307a0f6071 ]
Function rx_data(), which handles ingress CPL_RX_DATA messages, was
always sending an RX_DATA_ACK with the goal of updating the credits.
However, if the RDMA connection is moved out of FPDU mode abruptly,
then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW
has aborted the connection. These CPLs should not trigger RX_DATA_ACKS.
If they do, HW can see a READ after DELETE of the DB_LE hash entry for
the tid and post a LE_DB HashTblMemCrcError.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Emmanuel Vadot [Wed, 14 Dec 2016 14:57:24 +0000 (15:57 +0100)]
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
[ Upstream commit
3116d37651d77125bf50f81f859b1278e02ccce6 ]
The node name for the power seq pin is mmc2@0 like the mmc2_pins_a one.
This makes the original node (mmc2_pins_a) scrapped out of the dtb and
result in a unusable eMMC if U-Boot didn't configured the pins to the
correct functions.
Signed-off-by: Emmanuel Vadot <manu@bidouilliste.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Milo Kim [Mon, 12 Dec 2016 23:18:15 +0000 (08:18 +0900)]
ARM: dts: sun8i: Support DTB build for NanoPi M1
[ Upstream commit
661ccdc1a95f18ab6c1373322fde09afd5b90a1f ]
The commit
10efbf5f1633 ("ARM: dts: sun8i: Add dts file for NanoPi M1 SBC")
introduced NanoPi M1 board but it's missing in Allwinner H3 DTB build.
Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gerd Hoffmann [Mon, 28 Nov 2016 07:52:20 +0000 (08:52 +0100)]
drm/virtio: fix framebuffer sparse warning
[ Upstream commit
71d3f6ef7f5af38dea2975ec5715c88bae92e92d ]
virtio uses normal ram as backing storage for the framebuffer, so we
should assign the address to new screen_buffer (added by commit
17a7b0b4d9749f80d365d7baff5dec2f54b0e992) instead of screen_base.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Milan P. Gandhi [Sat, 24 Dec 2016 16:32:46 +0000 (22:02 +0530)]
scsi: qla2xxx: Get mutex lock before checking optrom_state
[ Upstream commit
c7702b8c22712a06080e10f1d2dee1a133ec8809 ]
There is a race condition with qla2xxx optrom functions where one thread
might modify optrom buffer, optrom_state while other thread is still
reading from it.
In couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired mutex lock to proceed with
memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and crashed the system:
if (ha->optrom_state != QLA_SREADING)
return 0;
mutex_lock(&ha->optrom_mutex);
rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
ha->optrom_region_size);
mutex_unlock(&ha->optrom_mutex);
With current optrom function we get following crash due to a race
condition:
[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1479.466707] IP: [<
ffffffff81326756>] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296] [<
ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941] [<
ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571] [<
ffffffff8127e76b>] read+0xdb/0x1f0
[ 1479.476206] [<
ffffffff811fdf9e>] vfs_read+0x9e/0x170
[ 1479.476839] [<
ffffffff811feb6f>] SyS_read+0x7f/0xe0
[ 1479.477466] [<
ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Below patch modifies qla2x00_sysfs_read_optrom,
qla2x00_sysfs_write_optrom functions to get the mutex_lock before
checking ha->optrom_state to avoid similar crashes.
The patch was applied and tested and same crashes were no longer
observed again.
Tested-by: Milan P. Gandhi <mgandhi@redhat.com>
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek Szyprowski [Thu, 22 Dec 2016 09:44:30 +0000 (10:44 +0100)]
clk/samsung: exynos542x: mark some clocks as critical
[ Upstream commit
318fa46cc60d37fec1e87dbf03a82aca0f5ce695 ]
Some parent clocks of the Exynos542x clock blocks, which have separate
power domains (like DISP, MFC, MSC, GSC, FSYS and G2D) must be always
enabled to access any register related to power management unit or devices
connected to it. For the time being, until a proper solution based on
runtime PM is applied, mark those clocks as critical (instead of ignore
unused or even no flags) to prevent disabling them.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> [Exynos5800 Peach Pi Chromebook]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pavel Tikhomirov [Mon, 9 Jan 2017 07:45:49 +0000 (10:45 +0300)]
ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
[ Upstream commit
b007f09072ca8afa118ade333e717ba443e8d807 ]
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo
4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -
2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-
2147483648
but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"
v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zefir Kurtisi [Fri, 6 Jan 2017 11:14:48 +0000 (12:14 +0100)]
phy state machine: failsafe leave invalid RUNNING state
[ Upstream commit
811a919135b980bac8009d042acdccf10dc1ef5e ]
While in RUNNING state, phy_state_machine() checks for link changes by
comparing phydev->link before and after calling phy_read_status().
This works as long as it is guaranteed that phydev->link is never
changed outside the phy_state_machine().
If in some setups this happens, it causes the state machine to miss
a link loss and remain RUNNING despite phydev->link being 0.
This has been observed running a dsa setup with a process continuously
polling the link states over ethtool each second (SNMPD RFC-1213
agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
call phy_read_status() and with that modify the link status - and
with that bricking the phy state machine.
This patch adds a fail-safe check while in RUNNING, which causes to
move to CHANGELINK when the link is gone and we are still RUNNING.
Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pau Espin Pedrol [Fri, 6 Jan 2017 19:33:27 +0000 (20:33 +0100)]
netfilter: use fwmark_reflect in nf_send_reset
[ Upstream commit
cc31d43b4154ad5a7d8aa5543255a93b7e89edc2 ]
Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.
Fixes:
e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bard Liao [Tue, 27 Dec 2016 04:05:05 +0000 (12:05 +0800)]
ASoC: rt5645: set sel_i2s_pre_div1 to 2
[ Upstream commit
02c5c03283c52157d336abf5e44ffcda10579fbf ]
The i2s clock pre-divider 1 is used for both i2s1 and sysclk.
The i2s1 is usually used for the main i2s and the pre-divider
will be set in hw_params function.
However, if i2s2 is used, the pre-divider is not set in the hw_params
function and the default value of i2s clock pre-divider 1 is too high
for sysclk and DMIC usage. Fix by overriding default divider value to 2.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=95681
Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <bardliao@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe JAILLET [Mon, 9 Jan 2017 00:02:47 +0000 (01:02 +0100)]
spi: spi-axi: Free resources on error path
[ Upstream commit
9620ca90115d4bd700f05862d3b210a266a66efe ]
We should go to 'err_put_master' here instead of returning directly.
Otherwise a call to 'spi_master_put' is missing.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Mc Guire [Sat, 7 Jan 2017 09:38:31 +0000 (10:38 +0100)]
x86/boot: Add missing declaration of string functions
[ Upstream commit
fac69d0efad08fc15e4dbfc116830782acc0dc9a ]
Add the missing declarations of basic string functions to string.h to allow
a clean build.
Fixes:
5be865661516 ("String-handling functions for the new x86 setup code.")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Link: http://lkml.kernel.org/r/1483781911-21399-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Chan [Fri, 6 Jan 2017 21:18:53 +0000 (16:18 -0500)]
tg3: Fix race condition in tg3_get_stats64().
[ Upstream commit
f5992b72ebe0dde488fa8f706b887194020c66fc ]
The driver's ndo_get_stats64() method is not always called under RTNL.
So it can race with driver close or ethtool reconfigurations. Fix the
race condition by taking tp->lock spinlock in tg3_free_consistent()
when freeing the tp->hw_stats memory block. tg3_get_stats64() is
already taking tp->lock.
Reported-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Grygorii Strashko [Thu, 5 Jan 2017 20:48:07 +0000 (14:48 -0600)]
net: phy: dp83867: fix irq generation
[ Upstream commit
5ca7d1ca77dc23934504b95a96d2660d345f83c2 ]
For proper IRQ generation by DP83867 phy the INT/PWDN pin has to be
programmed as an interrupt output instead of a Powerdown input in
Configuration Register 3 (CFG3), Address 0x001E, bit 7 INT_OE = 1. The
current driver doesn't do this and as result IRQs will not be generated by
DP83867 phy even if they are properly configured in DT.
Hence, fix IRQ generation by properly configuring CFG3.INT_OE bit and
ensure that Link Status Change (LINK_STATUS_CHNG_INT) and Auto-Negotiation
Complete (AUTONEG_COMP_INT) interrupt are enabled. After this the DP83867
driver will work properly in interrupt enabled mode.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sergei Shtylyov [Wed, 4 Jan 2017 21:29:32 +0000 (00:29 +0300)]
sh_eth: R8A7740 supports packet shecksumming
[ Upstream commit
0f1f9cbc04dbb3cc310f70a11cba0cf1f2109d9c ]
The R8A7740 GEther controller supports the packet checksum offloading
but the 'hw_crc' (bad name, I'll fix it) flag isn't set in the R8A7740
data, thus CSMR isn't cleared...
Fixes:
73a0d907301e ("net: sh_eth: add support R8A7740")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sergei Shtylyov [Wed, 4 Jan 2017 19:18:24 +0000 (22:18 +0300)]
sh_eth: fix EESIPR values for SH77{34|63}
[ Upstream commit
978d3639fd13d987950e4ce85c8737ae92154b2c ]
As the SH77{34|63} manuals are freely available, I've checked the EESIPR
values written against the manuals, and they appeared to set the reserved
bits 11-15 (which should be 0 on write). Fix those EESIPR values.
Fixes:
380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct sh_eth_cpu_data"")
Fixes:
f5d12767c8fd ("sh_eth: get SH77{34|63} support out of #ifdef")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Wed, 11 Jan 2017 14:35:25 +0000 (15:35 +0100)]
wext: handle NULL extra data in iwe_stream_add_point better
commit
93be2b74279c15c2844684b1a027fdc71dd5d9bf upstream.
gcc-7 complains that wl3501_cs passes NULL into a function that
then uses the argument as the input for memcpy:
drivers/net/wireless/wl3501_cs.c: In function 'wl3501_get_scan':
include/net/iw_handler.h:559:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(stream + point_len, extra, iwe->u.data.length);
This works fine here because iwe->u.data.length is guaranteed to be 0
and the memcpy doesn't actually have an effect.
Making the length check explicit avoids the warning and should have
no other effect here.
Also check the pointer itself, since otherwise we get warnings
elsewhere in the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Fri, 4 Aug 2017 16:47:52 +0000 (09:47 -0700)]
sparc64: Fix exception handling in UltraSPARC-III memcpy.
[ Upstream commit
0ede1c401332173ab0693121dc6cde04a4dbf131 ]
Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.
After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.
The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.
Fixes:
ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jane Chu [Tue, 11 Jul 2017 18:00:54 +0000 (12:00 -0600)]
sparc64: Measure receiver forward progress to avoid send mondo timeout
[ Upstream commit
9d53caec84c7c5700e7c1ed744ea584fff55f9ac ]
A large sun4v SPARC system may have moments of intensive xcall activities,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as cpu count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.
But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.
This patch implements the receiver's forward progress meter by introducing
a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
of 0..NR_CPUS. The receiver increments its counter as soon as it receives
a mondo and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panic; otherwise, the receiver is allowed
to keep making forward progress.
In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by hypervisor and then OS. Hypervisor 'borrows'
a guest cpu strand briefly to provide the service. If the cpu strand is
simultaneously the only cpu targeted by a mondo, it may not be available
for the mondo in 20msec, causing SUN4V mondo timeout. It appears that 1 second
is the agreed wait time between hypervisor and guest OS, this patch makes
the adjustment.
Orabug:
25476541
Orabug:
26417466
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Liu [Wed, 21 Jun 2017 09:21:22 +0000 (10:21 +0100)]
xen-netback: correctly schedule rate-limited queues
[ Upstream commit
dfa523ae9f2542bee4cddaea37b3be3e157f6e6b ]
Add a flag to indicate if a queue is rate-limited. Test the flag in
NAPI poll handler and avoid rescheduling the queue if true, otherwise
we risk locking up the host. The rescheduling will be done in the
timer callback function.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Fri, 28 Jul 2017 18:58:36 +0000 (11:58 -0700)]
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
[ Upstream commit
7ad813f208533cebfcc32d3d7474dc1677d1b09a ]
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.
Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes:
a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eugenia Emantayev [Wed, 12 Jul 2017 14:44:07 +0000 (17:44 +0300)]
net/mlx5e: Schedule overflow check work to mlx5e workqueue
[ Upstream commit
f08c39ed0bfb503c7b3e013cd40d036ce6a0941a ]
This is done in order to ensure that work will not run after the cleanup.
Fixes:
ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eugenia Emantayev [Wed, 12 Jul 2017 14:27:18 +0000 (17:27 +0300)]
net/mlx5e: Fix wrong delay calculation for overflow check scheduling
[ Upstream commit
d439c84509a510e864fdc6166c760482cd03fc57 ]
The overflow_period is calculated in seconds. In order to use it
for delayed work scheduling translation to jiffies is needed.
Fixes:
ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ilan Tayari [Wed, 5 Jul 2017 07:17:04 +0000 (10:17 +0300)]
net/mlx5e: Fix outer_header_zero() check size
[ Upstream commit
0242f4a0bb03906010bbf80495512be00494a0ef ]
outer_header_zero() routine checks if the outer_headers match of a
flow-table entry are all zero.
This function uses the size of whole fte_match_param, instead of just
the outer_headers member, causing failure to detect all-zeros if
any other members of the fte_match_param are non-zero.
Use the correct size for zero check.
Fixes:
6dc6071cfcde ("net/mlx5e: Add ethtool flow steering support")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Moshe Shemesh [Sun, 25 Jun 2017 15:45:32 +0000 (18:45 +0300)]
net/mlx5: Fix command bad flow on command entry allocation failure
[ Upstream commit
219c81f7d1d5a89656cb3b53d3b4e11e93608d80 ]
When driver fail to allocate an entry to send command to FW, it must
notify the calling function and release the memory allocated for
this command.
Fixes:
e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Aviv Heller [Sun, 2 Jul 2017 16:13:43 +0000 (19:13 +0300)]
net/mlx5: Consider tx_enabled in all modes on remap
[ Upstream commit
dc798b4cc0f2a06e7ad7d522403de274b86a0a6f ]
The tx_enabled lag event field is used to determine whether a slave is
active.
Current logic uses this value only if the mode is active-backup.
However, LACP mode, although considered a load balancing mode, can mark
a slave as inactive in certain situations (e.g., LACP timeout).
This fix takes the tx_enabled value into account when remapping, with
no respect to the LAG mode (this should not affect the behavior in XOR
mode, since in this mode both slaves are marked as active).
Fixes:
7907f23adc18 (net/mlx5: Implement RoCE LAG feature)
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Wed, 26 Jul 2017 08:24:59 +0000 (16:24 +0800)]
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
[ Upstream commit
6b84202c946cd3da3a8daa92c682510e9ed80321 ]
Commit
b1f5bfc27a19 ("sctp: don't dereference ptr before leaving
_sctp_walk_{params, errors}()") tried to fix the issue that it
may overstep the chunk end for _sctp_walk_{params, errors} with
'chunk_end > offset(length) + sizeof(length)'.
But it introduced a side effect: When processing INIT, it verifies
the chunks with 'param.v == chunk_end' after iterating all params
by sctp_walk_params(). With the check 'chunk_end > offset(length)
+ sizeof(length)', it would return when the last param is not yet
accessed. Because the last param usually is fwdtsn supported param
whose size is 4 and 'chunk_end == offset(length) + sizeof(length)'
This is a badly issue even causing sctp couldn't process 4-shakes.
Client would always get abort when connecting to server, due to
the failure of INIT chunk verification on server.
The patch is to use 'chunk_end <= offset(length) + sizeof(length)'
instead of 'chunk_end < offset(length) + sizeof(length)' for both
_sctp_walk_params and _sctp_walk_errors.
Fixes:
b1f5bfc27a19 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Potapenko [Fri, 14 Jul 2017 16:32:45 +0000 (18:32 +0200)]
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
[ Upstream commit
b1f5bfc27a19f214006b9b4db7b9126df2dfdf5a ]
If the length field of the iterator (|pos.p| or |err|) is past the end
of the chunk, we shouldn't access it.
This bug has been detected by KMSAN. For the following pair of system
calls:
socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, 28) = 1
the tool has reported a use of uninitialized memory:
==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
__sctp_rcv_init_lookup net/sctp/input.c:1074
__sctp_rcv_lookup_harder net/sctp/input.c:1233
__sctp_rcv_lookup net/sctp/input.c:1255
sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq kernel/softirq.c:328
__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
rcu_read_unlock_bh ./include/linux/rcupdate.h:931
ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
NF_HOOK_COND ./include/linux/netfilter.h:246
ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
dst_output ./include/net/dst.h:486
NF_HOOK ./include/linux/netfilter.h:257
ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x401133
RSP: 002b:
00007fff6d99cd38 EFLAGS:
00000246 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
00000000004002b0 RCX:
0000000000401133
RDX:
0000000000000001 RSI:
0000000000494088 RDI:
0000000000000003
RBP:
00007fff6d99cd90 R08:
00007fff6d99cd50 R09:
000000000000001c
R10:
0000000000000001 R11:
0000000000000246 R12:
0000000000000000
R13:
00000000004063d0 R14:
0000000000406460 R15:
0000000000000000
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
slab_alloc_node mm/slub.c:2743
__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
__kmalloc_reserve net/core/skbuff.c:138
__alloc_skb+0x26b/0x840 net/core/skbuff.c:231
alloc_skb ./include/linux/skbuff.h:933
sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
sctp_side_effects net/sctp/sm_sideeffect.c:1773
sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Wed, 26 Jul 2017 06:20:15 +0000 (14:20 +0800)]
dccp: fix a memleak for dccp_feat_init err process
[ Upstream commit
e90ce2fc27cad7e7b1e72b9e66201a7a4c124c2b ]
In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc
memory for rx.val, it should free tx.val before returning an
error.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Wed, 26 Jul 2017 06:19:46 +0000 (14:19 +0800)]
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
[ Upstream commit
b7953d3c0e30a5fc944f6b7bd0bcceb0794bcd85 ]
The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.
This patch is to fix it for dccp_ipv4.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xin Long [Wed, 26 Jul 2017 06:19:09 +0000 (14:19 +0800)]
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
[ Upstream commit
0c2232b0a71db0ac1d22f751aa1ac0cadb950fd2 ]
In dccp_v6_conn_request, after reqsk gets alloced and hashed into
ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer,
one is for hlist, and the other one is for current using.
The problem is when dccp_v6_conn_request returns and finishes using
reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
reqsk obj never gets freed.
Jianlin found this issue when running dccp_memleak.c in a loop, the
system memory would run out.
dccp_memleak.c:
int s1 = socket(PF_INET6, 6, IPPROTO_IP);
bind(s1, &sa1, 0x20);
listen(s1, 0x9);
int s2 = socket(PF_INET6, 6, IPPROTO_IP);
connect(s2, &sa1, 0x20);
close(s1);
close(s2);
This patch is to put the reqsk before dccp_v6_conn_request returns,
just as what tcp_conn_request does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Gonzalez [Tue, 25 Jul 2017 12:35:03 +0000 (14:35 +0200)]
net: ethernet: nb8800: Handle all 4 RGMII modes identically
[ Upstream commit
4813497b537c6208c90d6cbecac5072d347de900 ]
Before commit
bf8f6952a233 ("Add blurb about RGMII") it was unclear
whose responsibility it was to insert the required clock skew, and
in hindsight, some PHY drivers got it wrong. The solution forward
is to introduce a new property, explicitly requiring skew from the
node to which it is attached. In the interim, this driver will handle
all 4 RGMII modes identically (no skew).
Fixes:
52dfc8301248 ("net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller")
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stefano Brivio [Mon, 24 Jul 2017 21:14:28 +0000 (23:14 +0200)]
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
[ Upstream commit
afce615aaabfbaad02550e75c0bec106dafa1adf ]
RFC 2465 defines ipv6IfStatsOutFragFails as:
"The number of IPv6 datagrams that have been discarded
because they needed to be fragmented at this output
interface but could not be."
The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.
This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.
Reported-by: Sabrina Dubroca <sdubroca@redhat.com>
Fixes:
1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WANG Cong [Mon, 24 Jul 2017 17:07:32 +0000 (10:07 -0700)]
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
[ Upstream commit
c800aaf8d869f2b9b47b10c5c312fe19f0a94042 ]
There are multiple reports showing we have a use-after-free in
the timer prb_retire_rx_blk_timer_expired(), where we use struct
tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
free_pg_vec().
The interesting part is it is not freed via packet_release() but
via packet_setsockopt(), which means we are not closing the socket.
Looking into the big and fat function packet_set_ring(), this could
happen if we satisfy the following conditions:
1. closing == 0, not on packet_release() path
2. req->tp_block_nr == 0, we don't allocate a new pg_vec
3. rx_ring->pg_vec is already set as V3, which means we already called
packet_set_ring() wtih req->tp_block_nr > 0 previously
4. req->tp_frame_nr == 0, pass sanity check
5. po->mapped == 0, never called mmap()
In this scenario we are clearing the old rx_ring->pg_vec, so we need
to free this pg_vec, but we don't stop the timer on this path because
of closing==0.
The timer has to be stopped as long as we need to free pg_vec, therefore
the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.
Thanks to liujian for testing different fixes.
Reported-by: alexander.levin@verizon.com
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: liujian (CE) <liujian56@huawei.com>
Tested-by: liujian (CE) <liujian56@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liping Zhang [Sun, 23 Jul 2017 09:52:23 +0000 (17:52 +0800)]
openvswitch: fix potential out of bound access in parse_ct
[ Upstream commit
69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ]
Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.
Fixes:
7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Jarosch [Sat, 22 Jul 2017 15:14:34 +0000 (17:14 +0200)]
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
[ Upstream commit
9476d393667968b4a02afbe9d35a3558482b943e ]
DMA transfers are not allowed to buffers that are on the stack.
Therefore allocate a buffer to store the result of usb_control_message().
Fixes these bugreports:
https://bugzilla.kernel.org/show_bug.cgi?id=195217
https://bugzilla.redhat.com/show_bug.cgi?id=1421387
https://bugzilla.redhat.com/show_bug.cgi?id=1427398
Shortened kernel backtrace from 4.11.9-200.fc25.x86_64:
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 3 PID: 2957 at drivers/usb/core/hcd.c:1587
kernel: transfer buffer not dma capable
kernel: Call Trace:
kernel: dump_stack+0x63/0x86
kernel: __warn+0xcb/0xf0
kernel: warn_slowpath_fmt+0x5a/0x80
kernel: usb_hcd_map_urb_for_dma+0x37f/0x570
kernel: ? try_to_del_timer_sync+0x53/0x80
kernel: usb_hcd_submit_urb+0x34e/0xb90
kernel: ? schedule_timeout+0x17e/0x300
kernel: ? del_timer_sync+0x50/0x50
kernel: ? __slab_free+0xa9/0x300
kernel: usb_submit_urb+0x2f4/0x560
kernel: ? urb_destroy+0x24/0x30
kernel: usb_start_wait_urb+0x6e/0x170
kernel: usb_control_msg+0xdc/0x120
kernel: mcs_get_reg+0x36/0x40 [mcs7780]
kernel: mcs_net_open+0xb5/0x5c0 [mcs7780]
...
Regression goes back to 4.9, so it's a good candidate for -stable.
Though it's the decision of the maintainer.
Thanks to Dan Williams for adding the "transfer buffer not dma capable"
warning in the first place. It instantly pointed me in the right direction.
Patch has been tested with transferring data from a Polar watch.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WANG Cong [Thu, 20 Jul 2017 18:27:57 +0000 (11:27 -0700)]
rtnetlink: allocate more memory for dev_set_mac_address()
[ Upstream commit
153711f9421be5dbc973dc57a4109dc9d54c89b1 ]
virtnet_set_mac_address() interprets mac address as struct
sockaddr, but upper layer only allocates dev->addr_len
which is ETH_ALEN + sizeof(sa_family_t) in this case.
We lack a unified definition for mac address, so just fix
the upper layer, this also allows drivers to interpret it
to struct sockaddr freely.
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mahesh Bandewar [Wed, 19 Jul 2017 22:41:33 +0000 (15:41 -0700)]
ipv4: initialize fib_trie prior to register_netdev_notifier call.
[ Upstream commit
8799a221f5944a7d74516ecf46d58c28ec1d1f75 ]
Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.
Fixes following crash
Call Trace:
? alternate_node_alloc+0x76/0xa0
fib_table_insert+0x1b7/0x4b0
fib_magic.isra.17+0xea/0x120
fib_add_ifaddr+0x7b/0x190
fib_netdev_event+0xc0/0x130
register_netdevice_notifier+0x1c1/0x1d0
ip_fib_init+0x72/0x85
ip_rt_init+0x187/0x1e9
ip_init+0xe/0x1a
inet_init+0x171/0x26c
? ipv4_offload_init+0x66/0x66
do_one_initcall+0x43/0x160
kernel_init_freeable+0x191/0x219
? rest_init+0x80/0x80
kernel_init+0xe/0x150
ret_from_fork+0x22/0x30
Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
RIP: kmem_cache_alloc+0xcf/0x1c0 RSP:
ffff9b1500017c28
CR2:
0000000000000014
Fixes:
7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes:
7f9b80529b8a ("[IPV4]: fib hash|trie initialization")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Thu, 20 Jul 2017 19:25:22 +0000 (12:25 -0700)]
net: dsa: b53: Add missing ARL entries for BCM53125
[ Upstream commit
be35e8c516c1915a3035d266a2015b41f73ba3f9 ]
The BCM53125 entry was missing an arl_entries member which would
basically prevent the ARL search from terminating properly. This switch
has 4 ARL entries, so add that.
Fixes:
1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sabrina Dubroca [Wed, 19 Jul 2017 20:28:55 +0000 (22:28 +0200)]
ipv6: avoid overflow of offset in ip6_find_1stfragopt
[ Upstream commit
6399f1fae4ec29fab5ec76070435555e256ca3a6 ]
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
This problem has been here since before the beginning of git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Wed, 19 Jul 2017 20:33:24 +0000 (13:33 -0700)]
net: Zero terminate ifr_name in dev_ifname().
[ Upstream commit
63679112c536289826fec61c917621de95ba2ade ]
The ifr.ifr_name is passed around and assumed to be NULL terminated.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Potapenko [Mon, 17 Jul 2017 10:35:58 +0000 (12:35 +0200)]
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
[ Upstream commit
18bcf2907df935981266532e1e0d052aff2e6fae ]
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():
==================================================================
BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
skb_set_hash_from_sk ./include/net/sock.h:2011
tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307
__run_timers+0xc13/0xf10 kernel/time/timer.c:1601
run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:
ffff880024917cd8 EFLAGS:
00000246 ORIG_RAX:
ffffffffffffff10
RAX:
0000000000000246 RBX:
ffff8800224c0000 RCX:
0000000000000005
RDX:
0000000000000004 RSI:
ffff880000000000 RDI:
ffffea0000b6d770
RBP:
ffff880024917d58 R08:
0000000000000dd8 R09:
0000000000000004
R10:
0000160000000000 R11:
0000000000000000 R12:
ffffffff85abf810
R13:
ffff880024917dd8 R14:
0000000000000010 R15:
ffffffff81cabde4
</IRQ>
poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
SYSC_select+0x4b4/0x4e0 fs/select.c:653
SyS_select+0x76/0xa0 fs/select.c:634
entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4597e7
RSP: 002b:
000000c420037ee0 EFLAGS:
00000246 ORIG_RAX:
0000000000000017
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00000000004597e7
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
000000c420037ef0 R08:
000000c420037ee0 R09:
0000000000000059
R10:
0000000000000000 R11:
0000000000000246 R12:
000000000042dc20
R13:
00000000000000f3 R14:
0000000000000030 R15:
0000000000000003
chained origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_save_stack mm/kmsan/kmsan.c:317
kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
__msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
reqsk_alloc ./include/net/request_sock.h:87
inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
==================================================================
Similar error is reported for cookie_v4_check().
Fixes:
58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Fri, 14 Jul 2017 21:49:25 +0000 (17:49 -0400)]
tcp_bbr: init pacing rate on first RTT sample
[ Upstream commit
32984565574da7ed3afa10647bb4020d7a9e6c93 ]
Fixes the following behavior: for connections that had no RTT sample
at the time of initializing congestion control, BBR was initializing
the pacing rate to a high nominal rate (based an a guess of RTT=1ms,
in case this is LAN traffic). Then BBR never adjusted the pacing rate
downward upon obtaining an actual RTT sample, if the connection never
filled the pipe (e.g. all sends were small app-limited writes()).
This fix adjusts the pacing rate upon obtaining the first RTT sample.
Fixes:
0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Fri, 14 Jul 2017 21:49:24 +0000 (17:49 -0400)]
tcp_bbr: remove sk_pacing_rate=0 transient during init
[ Upstream commit
1d3648eb5d1fe9ed3d095ed8fa19ad11ca4c8bc0 ]
Fix a corner case noticed by Eric Dumazet, where BBR's setting
sk->sk_pacing_rate to 0 during initialization could theoretically
cause packets in the sending host to hang if there were packets "in
flight" in the pacing infrastructure at the time the BBR congestion
control state is initialized. This could occur if the pacing
infrastructure happened to race with bbr_init() in a way such that the
pacer read the 0 rather than the immediately following non-zero pacing
rate.
Fixes:
0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Fri, 14 Jul 2017 21:49:23 +0000 (17:49 -0400)]
tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper
[ Upstream commit
79135b89b8af304456bd67916b80116ddf03d7b6 ]
Introduce a helper to initialize the BBR pacing rate unconditionally,
based on the current cwnd and RTT estimate. This is a pure refactor,
but is needed for two following fixes.
Fixes:
0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Fri, 14 Jul 2017 21:49:22 +0000 (17:49 -0400)]
tcp_bbr: introduce bbr_bw_to_pacing_rate() helper
[ Upstream commit
f19fd62dafaf1ed6cf615dba655b82fa9df59074 ]
Introduce a helper to convert a BBR bandwidth and gain factor to a
pacing rate in bytes per second. This is a pure refactor, but is
needed for two following fixes.
Fixes:
0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Fri, 14 Jul 2017 21:49:21 +0000 (17:49 -0400)]
tcp_bbr: cut pacing rate only if filled pipe
[ Upstream commit
4aea287e90dd61a48268ff2994b56f9799441b62 ]
In bbr_set_pacing_rate(), which decides whether to cut the pacing
rate, there was some code that considered exiting STARTUP to be
equivalent to the notion of filling the pipe (i.e.,
bbr_full_bw_reached()). Specifically, as the code was structured,
exiting STARTUP and going into PROBE_RTT could cause us to cut the
pacing rate down to something silly and low, based on whatever
bandwidth samples we've had so far, when it's possible that all of
them have been small app-limited bandwidth samples that are not
representative of the bandwidth available in the path. (The code was
correct at the time it was written, but the state machine changed
without this spot being adjusted correspondingly.)
Fixes:
0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Toth [Tue, 6 Jun 2017 12:30:27 +0000 (09:30 -0300)]
saa7164: fix double fetch PCIe access condition
commit
6fb05e0dd32e566facb96ea61a48c7488daa5ac3 upstream.
Avoid a double fetch by reusing the values from the prior transfer.
Originally reported via https://bugzilla.kernel.org/show_bug.cgi?id=195559
Thanks to Pengfei Wang <wpengfeinudt@gmail.com> for reporting.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Omar Sandoval [Thu, 20 Jul 2017 22:10:35 +0000 (15:10 -0700)]
Btrfs: fix early ENOSPC due to delalloc
commit
17024ad0a0fdfcfe53043afb969b813d3e020c21 upstream.
If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)
Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
Fixes:
957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jin Qian [Mon, 15 May 2017 17:45:08 +0000 (10:45 -0700)]
f2fs: sanity check checkpoint segno and blkoff
commit
15d3042a937c13f5d9244241c7a9c8416ff6e82a upstream.
Make sure segno and blkoff read from raw image are valid.
Cc: stable@vger.kernel.org
Signed-off-by: Jin Qian <jinqian@google.com>
[Jaegeuk Kim: adjust minor coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[AmitP: Found in Android Security bulletin for Aug'17, fixes CVE-2017-10663]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Young [Fri, 7 Jul 2017 21:49:18 +0000 (18:49 -0300)]
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
commit
9f5039ba440e499d85c29b1ddbc3cbc9dc90e44b upstream.
Since commit
e8f4818895b3 ("[media] lirc: advertise
LIRC_CAN_GET_REC_RESOLUTION and improve") lircd uses the ioctl
LIRC_GET_REC_RESOLUTION to determine the shortest pulse or space that
the hardware can detect. This breaks decoding in lirc because lircd
expects the answer in microseconds, but nanoseconds is returned.
Reported-by: Derek <user.vdr@gmail.com>
Tested-by: Derek <user.vdr@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Woods [Fri, 26 May 2017 21:53:21 +0000 (17:53 -0400)]
mmc: core: Use device_property_read instead of of_property_read
commit
73a47a9bb3e2c4a9c553c72456e63ab991b1a4d9 upstream.
Using the device_property interfaces allows mmc drivers to work
on platforms which run on either device tree or ACPI.
Signed-off-by: David Woods <dwoods@mellanox.com>
Reviewed-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Woods [Fri, 26 May 2017 21:53:20 +0000 (17:53 -0400)]
mmc: dw_mmc: Use device_property_read instead of of_property_read
commit
852ff5fea9eb6a9799f1881d6df2cd69a9e6eed5 upstream.
Using the device_property interfaces allows the dw_mmc driver to work
on platforms which run on either device tree or ACPI.
Signed-off-by: David Woods <dwoods@mellanox.com>
Reviewed-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Bellinger [Thu, 25 May 2017 04:47:09 +0000 (21:47 -0700)]
iscsi-target: Fix initial login PDU asynchronous socket close OOPs
commit
25cdda95fda78d22d44157da15aa7ea34be3c804 upstream.
This patch fixes a OOPs originally introduced by:
commit
bb048357dad6d604520c91586334c9c230366a14
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Thu Sep 5 14:54:04 2013 -0700
iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.
To address this issue, this patch makes the following changes.
First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.
Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running. For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().
The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed. For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.
Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.
Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prabhakar Lad [Thu, 20 Jul 2017 12:02:09 +0000 (08:02 -0400)]
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
commit
da05d52d2f0f6bd61094a0cd045fed94bf7d673a upstream.
this patch makes sure VPFE_CMD_S_CCDC_RAW_PARAMS ioctl no longer works
for vpfe_capture driver with a minimal patch suitable for backporting.
- This ioctl was never in public api and was only defined in kernel header.
- The function set_params constantly mixes up pointers and phys_addr_t
numbers.
- This is part of a 'VPFE_CMD_S_CCDC_RAW_PARAMS' ioctl command that is
described as an 'experimental ioctl that will change in future kernels'.
- The code to allocate the table never gets called after we copy_from_user
the user input over the kernel settings, and then compare them
for inequality.
- We then go on to use an address provided by user space as both the
__user pointer for input and pass it through phys_to_virt to come up
with a kernel pointer to copy the data to. This looks like a trivially
exploitable root hole.
Due to these reasons we make sure this ioctl now returns -EINVAL and backport
this patch as far as possible.
Fixes:
5f15fbb68fd7 ("V4L/DVB (12251): v4l: dm644x ccdc module for vpfe capture driver")
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Gonzalez [Fri, 28 Jul 2017 13:27:49 +0000 (15:27 +0200)]
ARM: dts: tango4: Request RGMII RX and TX clock delays
commit
985333b0eef8603b02181c4ec0a722b82be9642d upstream.
RX and TX clock delays are required. Request them explicitly.
Fixes:
cad008b8a77e6 ("ARM: dts: tango4: Initial device trees")
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>