Lepton Wu [Tue, 11 Dec 2018 19:12:55 +0000 (11:12 -0800)]
VSOCK: bind to random port for VMADDR_PORT_ANY
The old code always starts from fixed port for VMADDR_PORT_ANY. Sometimes
when VMM crashed, there is still orphaned vsock which is waiting for
close timer, then it could cause connection time out for new started VM
if they are trying to connect to same port with same guest cid since the
new packets could hit that orphaned vsock. We could also fix this by doing
more in vhost_vsock_reset_orphans, but any way, it should be better to start
from a random local port instead of a fixed one.
Signed-off-by: Lepton Wu <ytht.net@gmail.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mario Limonciello [Tue, 11 Dec 2018 14:16:14 +0000 (08:16 -0600)]
r8152: Add support for MAC address pass through on RTL8153-BND
All previous docks and dongles that have supported this feature use
the RTL8153-AD chip.
RTL8153-BND is a new chip that will be used in upcoming Dell type-C docks.
It should be added to the whitelist of devices to activate MAC address
pass through.
Per confirming with Realtek all devices containing RTL8153-BND should
activate MAC pass through and there won't use pass through bit on efuse
like in RTL8153-AD.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atul Gupta [Tue, 11 Dec 2018 10:20:53 +0000 (02:20 -0800)]
crypto/chelsio/chtls: send/recv window update
recalculated send and receive window using linkspeed.
Determine correct value of eck_ok from SYN received and
option configured on local system.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atul Gupta [Tue, 11 Dec 2018 10:20:40 +0000 (02:20 -0800)]
crypto/chelsio/chtls: macro correction in tx path
corrected macro used in tx path. removed redundant hdrlen
and check for !page in chtls_sendmsg
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atul Gupta [Tue, 11 Dec 2018 10:20:26 +0000 (02:20 -0800)]
crypto/chelsio/chtls: listen fails with multiadapt
listen fails when more than one tls capable device is
registered. tls_hw_hash is called for each dev which loops
again for each cdev_list causing listen failure. Hence
call chtls_listen_start/stop for specific device than loop over all
devices.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atul Gupta [Tue, 11 Dec 2018 10:20:09 +0000 (02:20 -0800)]
net/tls: sleeping function from invalid context
HW unhash within mutex for registered tls devices cause sleep
when called from tcp_set_state for TCP_CLOSE. Release lock and
re-acquire after function call with ref count incr/dec.
defined kref and fp release for tls_device to ensure device
is not released outside lock.
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:748
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7
INFO: lockdep is turned off.
CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O
Call Trace:
<IRQ>
dump_stack+0x5e/0x8b
___might_sleep+0x222/0x260
__mutex_lock+0x5c/0xa50
? vprintk_emit+0x1f3/0x440
? kmem_cache_free+0x22d/0x2a0
? tls_hw_unhash+0x2f/0x80
? printk+0x52/0x6e
? tls_hw_unhash+0x2f/0x80
tls_hw_unhash+0x2f/0x80
tcp_set_state+0x5f/0x180
tcp_done+0x2e/0xe0
tcp_rcv_state_process+0x92c/0xdd3
? lock_acquire+0xf5/0x1f0
? tcp_v4_rcv+0xa7c/0xbe0
? tcp_v4_do_rcv+0x70/0x1e0
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atul Gupta [Tue, 11 Dec 2018 10:19:40 +0000 (02:19 -0800)]
net/tls: Init routines in create_ctx
create_ctx is called from tls_init and tls_hw_prot
hence initialize function pointers in common routine.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Tue, 11 Dec 2018 04:20:30 +0000 (21:20 -0700)]
drivers: net: xgene: Remove unnecessary forward declarations
Clang warns:
drivers/net/ethernet/apm/xgene/xgene_enet_main.c:33:36: warning:
tentative array definition assumed to have one element
static const struct acpi_device_id xgene_enet_acpi_match[];
^
1 warning generated.
Both xgene_enet_acpi_match and xgene_enet_of_match are defined before
their uses at the bottom of the file so this is unnecessary. When
CONFIG_ACPI is disabled, ACPI_PTR becomes NULL so xgene_enet_acpi_match
doesn't need to be defined.
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 10 Dec 2018 23:23:30 +0000 (15:23 -0800)]
tipc: compare remote and local protocols in tipc_udp_enable()
When TIPC_NLA_UDP_REMOTE is an IPv6 mcast address but
TIPC_NLA_UDP_LOCAL is an IPv4 address, a NULL-ptr deref is triggered
as the UDP tunnel sock is initialized to IPv4 or IPv6 sock merely
based on the protocol in local address.
We should just error out when the remote address and local address
have different protocols.
Reported-by: syzbot+eb4da3a20fad2e52555d@syzkaller.appspotmail.com
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 10 Dec 2018 20:45:45 +0000 (12:45 -0800)]
tipc: fix a double kfree_skb()
tipc_udp_xmit() drops the packet on error, there is no
need to drop it again.
Fixes:
ef20cd4dd163 ("tipc: introduce UDP replicast")
Reported-and-tested-by: syzbot+eae585ba2cc2752d3704@syzkaller.appspotmail.com
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 10 Dec 2018 19:49:55 +0000 (11:49 -0800)]
tipc: use lock_sock() in tipc_sk_reinit()
lock_sock() must be used in process context to be race-free with
other lock_sock() callers, for example, tipc_release(). Otherwise
using the spinlock directly can't serialize a parallel tipc_release().
As it is blocking, we have to hold the sock refcnt before
rhashtable_walk_stop() and release it after rhashtable_walk_start().
Fixes:
07f6c4bc048a ("tipc: convert tipc reference table to use generic rhashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 14 Dec 2018 19:38:48 +0000 (11:38 -0800)]
net: netlink: rename NETLINK_DUMP_STRICT_CHK -> NETLINK_GET_STRICT_CHK
NETLINK_DUMP_STRICT_CHK can be used for all GET requests,
dumps as well as doit handlers. Replace the DUMP in the
name with GET make that clearer.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Mon, 10 Dec 2018 07:27:01 +0000 (23:27 -0800)]
qed: Fix command number mismatch between driver and the mfw
The value for OEM_CFG_UPDATE command differs between driver and the
Management firmware (mfw). Fix this gap with adding a reserved field.
Fixes:
cac6f691546b ("qed: Add support for Unified Fabric Port.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Dec 2018 03:21:53 +0000 (19:21 -0800)]
Merge tag 'mlx5-fixes-2018-12-13' of git://git./linux/kernel/git/saeed/linux
mlx5-fixes-2018-12-13
Subject: [pull request][net 0/9] Mellanox, mlx5 fixes 2018-12-13
Saeed Mahameed says:
====================
This series introduces some fixes to the mlx5 core and mlx5e netdevice
driver.
=======
Conflict with net-next: When merged with net-next this series will
cause a moderate conflict:
1) in drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (2 hunks)
Take hunks from net only and just replace *attr->mirror_count to *attr->split_count
1.1) there is one more instance of slow_attr->mirror_count to be replaced
with slow_attr->split_count, it doesn't appear in the conflict, it will
cause a compilation error if left out.
2) in mlx5_ifc.h, take hunks only from net.
Example for the merge resolution can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=merge/mlx5-fixes&id=
48830adf29804d85d77ed8a251d625db0eb5b8a8
branch merge/mlx5-fixes of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
(I simply merged this pull request tag into net-next and resolved the conflict)
I don't know if it's ok with you, but to save your time, you can just:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux merge/mlx5-fixes
Into net-next, before your next net merge, and you will have a clean
merge of net into net-next (at least for mlx5 files).
======
Please pull and let me know if there's any problem.
For -stable v4.18
338d615be484 ('net/mlx5e: Cancel DIM work on close SQ')
91f40f9904ad ('net/mlx5e: RX, Verify MPWQE stride size is in range')
For -stable v4.19
c5c7e1c41bbe ('net/mlx5e: Remove unused UDP GSO remaining counter')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tal Gilboa [Thu, 22 Nov 2018 12:20:45 +0000 (14:20 +0200)]
net/mlx5e: Cancel DIM work on close SQ
TXQ SQ closure is followed by closing the corresponding CQ. A pending
DIM work would try to modify the now non-existing CQ.
This would trigger an error:
[85535.835926] mlx5_core 0000:af:00.0: mlx5_cmd_check:769:(pid 124399):
MODIFY_CQ(0x403) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1d7771)
Fix by making sure to cancel any pending DIM work before destroying the SQ.
Fixes:
cbce4f444798 ("net/mlx5e: Enable adaptive-TX moderation")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mikhael Goikhman [Mon, 19 Nov 2018 17:11:12 +0000 (19:11 +0200)]
net/mlx5e: Remove unused UDP GSO remaining counter
Remove tx_udp_seg_rem counter from ethtool output, as it is no longer
being updated in the driver's data flow.
Fixes:
3f44899ef2ce ("net/mlx5e: Use PARTIAL_GSO for UDP segmentation")
Signed-off-by: Mikhael Goikhman <migo@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Mon, 10 Dec 2018 15:05:59 +0000 (17:05 +0200)]
net/mlx5e: Avoid encap flows deletion attempt the 1st time a neigh is resolved
Currently, we are deleting offloaded encap flows in case the relevant neigh
becomes unconnected while the encap is valid (a sign that it used to be
connected), or if the curr neigh mac is different from the cached mac
(a sign that the remote side changed their mac).
The 2nd check also applies when the neigh becomes connected on the 1st
time (we start with zero mac). Before the offending commit, the deleting
handler was practically no op, as no flows were offloaded. But since
that commit, we offload neigh-less encap flows to slow path.
Under mirroring scheme, we go into the delete handler, attempt to unoffload a
mirror rule which was never set (as we were offloading to slow path) and crash.
Fix that by calling the delete handler only when the encap is valid,
which covers both cases mentioned above.
Fixes:
5dbe906ff1d5 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Mon, 10 Dec 2018 10:31:42 +0000 (12:31 +0200)]
net/mlx5e: Properly initialize flow attributes for slow path eswitch rule deletion
When a neighbour is resolved, we delete the goto slow path rule from HW.
The eswitch flow attributes where not properly initialized on that case,
hence we mess up the eswitch refcounts for chain zero (the default one).
Fix that along with making sure to use semicolons and not commas on that code;
Fixes:
5dbe906ff1d5 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 9 Dec 2018 15:15:23 +0000 (17:15 +0200)]
net/mlx5e: Avoid overriding the user provided priority for offloaded tc rules
Just a leftover which was wrongly left there, remove it while spawning
a message to suggest firmware upgrade.
Fixes:
bf07aa730a04 ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 9 Dec 2018 15:03:36 +0000 (17:03 +0200)]
net/mlx5e: Err if asked to mirror a goto chain tc eswitch rule
Currently we are not supporting this and not err-ing on that either.
For now, just err if asked to do that.
Fixes:
bf07aa730a04 ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Fri, 2 Nov 2018 04:10:49 +0000 (06:10 +0200)]
net/mlx5e: RX, Verify MPWQE stride size is in range
Add check of MPWQE stride size is within range supported by HW. In case
calculated MPWQE stride size exceed range, linear SKB can't be used and
we should use non linear MPWQE instead.
Fixes:
619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Mon, 19 Nov 2018 15:22:30 +0000 (17:22 +0200)]
net/mlx5e: Fix default amount of channels for VF representors
The default amount of channels a representor opens was erroneously
changed from one to the maximum amount of channels, restore to its
intended value.
Fixes:
779d986d60de ("net/mlx5e: Do not ignore netdevice TX/RX queues number")
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Wed, 31 Oct 2018 14:03:21 +0000 (16:03 +0200)]
net/mlx5: E-Switch, Fix fdb cap bits swap
The cap bits locations for the fdb caps of multi path to table (used for
local mirroring) and multi encap (used for prio/chains) were wrongly used
in swapped locations. This went unnoted so far b/c we tested the offending
patch with CX5 FW that supports both of them. On different environments where
not both caps are supported, we will be messed up, fix that.
Fixes:
b9aa0ba17af5 ('net/mlx5: Add cap bits for multi fdb encap')
Signed-off-by: Vu Pham <vu@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Thu, 13 Dec 2018 05:56:20 +0000 (21:56 -0800)]
Merge branch 'vhost-fixes'
Jason Wang says:
====================
Fix various issue of vhost
This series tries to fix various issues of vhost:
- Patch 1 adds a missing write barrier between used idx updating and
logging.
- Patch 2-3 brings back the protection of device IOTLB through vq
mutex, this fixes possible use after free in device IOTLB entries.
Please consider them for -stable.
Changes from V2:
- drop dirty page fix and make it for net-next
Changes from V1:
- silent compiler warning for 32bit.
- use mutex_trylock() on slowpath instead of mutex_lock() even on fast
path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 13 Dec 2018 02:53:39 +0000 (10:53 +0800)]
Revert "net: vhost: lock the vqs one by one"
This reverts commit
78139c94dc8c96a478e67dab3bee84dc6eccb5fd. We don't
protect device IOTLB with vq mutex, which will lead e.g use after free
for device IOTLB entries. And since we've switched to use
mutex_trylock() in previous patch, it's safe to revert it without
having deadlock.
Fixes: commit
78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 13 Dec 2018 02:53:38 +0000 (10:53 +0800)]
vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()
We used to hold the mutex of paired virtqueue in
vhost_net_busy_poll(). But this will results an inconsistent lock
order which may cause deadlock if we try to bring back the protection
of device IOTLB with vq mutex that requires to hold mutex of all
virtqueues at the same time.
Fix this simply by switching to use mutex_trylock(), when fail just
skip the busy polling. This can happen when device IOTLB is under
updating which should be rare.
Fixes: commit
78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 13 Dec 2018 02:53:37 +0000 (10:53 +0800)]
vhost: make sure used idx is seen before log in vhost_add_used_n()
We miss a write barrier that guarantees used idx is updated and seen
before log. This will let userspace sync and copy used ring before
used idx is update. Fix this by adding a barrier before log_write().
Fixes:
8dd014adfea6f ("vhost-net: mergeable buffers support")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Dec 2018 05:43:43 +0000 (21:43 -0800)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2018-12-12
This series contains fixes to i40e and ixgbe.
Stefan Assmann fixes an issue created by a previous fix, where
ether_addr_copy() was moved to avoid a race but did not take into
account that it alters the MAC address being handed to
i40e_del_mac_filter().
Michał Mirosław provides 2 fixes for i40e, first resolves issues in the
hardware VLAN offload where VLAN.TCI equal to 0 was being dropped and a
race between disabling VLAN receive feature in hardware and processing
the receive queue, where packets could have their VLAN information
dropped.
Ross Lagerwall fixes a racy condition during a ixgbe VF reset, where
writing the register to issue a reset and sending the reset message via
the mailbox API could result of the mailbox memory getting cleared
during the reset before the message gets successfully sent which results
in a VF driver malfunction.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Dec 2018 05:36:12 +0000 (21:36 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix warnings suspicious rcu usage when handling base chain
statistics, from Taehee Yoo.
2) Refetch pointer to tcp header from nf_ct_sack_adjust() since
skb_make_writable() may reallocate data area, reported by Google
folks patch from Florian.
3) Incorrect netlink nest end after previous cancellation from error
path in ipset, from Pan Bian.
4) Use dst_hold_safe() from nf_xfrm_me_harder(), from Florian.
5) Use rb_link_node_rcu() for rcu-protected rbtree node in
nf_conncount, from Taehee Yoo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Dec 2018 00:25:14 +0000 (16:25 -0800)]
Merge branch 'bnx2x-Fix-series'
Sudarsana Reddy Kalluru says:
====================
bnx2x: Fix series
The patch series addresses few important issues in the bnx2x driver.
Please consider applying it 'net' tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 12 Dec 2018 16:57:03 +0000 (08:57 -0800)]
bnx2x: Send update-svid ramrod with retry/poll flags enabled
Driver sends update-SVID ramrod in the MFW notification path.
If there is a pending ramrod, driver doesn't retry the command
and storm firmware will never be updated with the SVID value.
The patch adds changes to send update-svid ramrod in process context with
retry/poll flags set.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 12 Dec 2018 16:57:02 +0000 (08:57 -0800)]
bnx2x: Enable PTP only on the PF that initializes the port
There will be only one PHC clock per port. PTP should be enabled only on
one PF per port. The change enables PTP functionality on the PF that
initializes the port. The change is useful in multi-function modes e.g.,
NPAR where a port can have more than one PF.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 12 Dec 2018 16:57:01 +0000 (08:57 -0800)]
bnx2x: Remove configured vlans as part of unload sequence.
Vlans are not getting removed when drivers are unloaded. The recent storm
firmware versions had added safeguards against re-configuring an already
configured vlan. As a result, PF inner reload flows (e.g., mtu change)
might trigger an assertion.
This change is going to remove vlans (same as we do for MACs) when doing
a chip cleanup during unload.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 12 Dec 2018 16:57:00 +0000 (08:57 -0800)]
bnx2x: Clear fip MAC when fcoe offload support is disabled
On some customer setups it was observed that shmem contains a non-zero fip
MAC for 57711 which would lead to enabling of SW FCoE.
Add a software workaround to clear the bad fip mac address if no FCoE
connections are supported.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Sat, 8 Dec 2018 02:03:01 +0000 (11:03 +0900)]
netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()
rbnode in insert_tree() is rcu protected pointer.
So, in order to handle this pointer, _rcu function should be used.
rb_link_node_rcu() is a rcu version of rb_link_node().
Fixes:
34848d5c896e ("netfilter: nf_conncount: Split insert and traversal")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 11 Dec 2018 06:45:29 +0000 (07:45 +0100)]
netfilter: nat: can't use dst_hold on noref dst
The dst entry might already have a zero refcount, waiting on rcu list
to be free'd. Using dst_hold() transitions its reference count to 1, and
next dst release will try to free it again -- resulting in a double free:
WARNING: CPU: 1 PID: 0 at include/net/dst.h:239 nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
RIP: 0010:nf_xfrm_me_harder+0xe7/0x130 [nf_nat]
Code: 48 8b 5c 24 60 65 48 33 1c 25 28 00 00 00 75 53 48 83 c4 68 5b 5d 41 5c c3 85 c0 74 0d 8d 48 01 f0 0f b1 0a 74 86 85 c0 75 f3 <0f> 0b e9 7b ff ff ff 29 c6 31 d2 b9 20 00 48 00 4c 89 e7 e8 31 27
Call Trace:
nf_nat_ipv4_out+0x78/0x90 [nf_nat_ipv4]
nf_hook_slow+0x36/0xd0
ip_output+0x9f/0xd0
ip_forward+0x328/0x440
ip_rcv+0x8a/0xb0
Use dst_hold_safe instead and bail out if we cannot take a reference.
Fixes:
a4c2fd7f7891 ("net: remove DST_NOCACHE flag")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pan Bian [Mon, 10 Dec 2018 13:39:37 +0000 (14:39 +0100)]
netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
In the error handling block, nla_nest_cancel(skb, atd) is called to
cancel the nest operation. But then, ipset_nest_end(skb, atd) is
unexpected called to end the nest operation. This patch calls the
ipset_nest_end only on the branch that nla_nest_cancel is not called.
Fixes:
45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ross Lagerwall [Wed, 5 Dec 2018 13:54:26 +0000 (13:54 +0000)]
ixgbe: Fix race when the VF driver does a reset
When the VF driver does a reset, it (at least the Linux one) writes to
the VFCTRL register to issue a reset and then immediately sends a reset
message using the mailbox API. This is racy because when the PF driver
detects that the VFCTRL register reset pin has been asserted, it clears
the mailbox memory. Depending on ordering, the reset message sent by
the VF could be cleared by the PF driver. It then responds to the
cleared message with a NACK which causes the VF driver to malfunction.
Fix this by deferring clearing the mailbox memory until the reset
message is received.
Fixes:
939b701ad633 ("ixgbe: fix driver behaviour after issuing VFLR")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Michał Mirosław [Tue, 4 Dec 2018 17:31:15 +0000 (18:31 +0100)]
i40e: DRY rx_ptype handling code
Move rx_ptype extracting to i40e_process_skb_fields() to avoid
duplicating the code.
Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Hui Peng [Wed, 12 Dec 2018 11:42:24 +0000 (12:42 +0100)]
USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
The function hso_probe reads if_num from the USB device (as an u8) and uses
it without a length check to index an array, resulting in an OOB memory read
in hso_probe or hso_get_config_data.
Add a length check for both locations and updated hso_probe to bail on
error.
This issue has been assigned CVE-2018-19985.
Reported-by: Hui Peng <benquike@gmail.com>
Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
Signed-off-by: Hui Peng <benquike@gmail.com>
Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 4 Dec 2018 17:31:14 +0000 (18:31 +0100)]
i40e: fix VLAN.TCI == 0 RX HW offload
This fixes two bugs in hardware VLAN offload:
1. VLAN.TCI == 0 was being dropped
2. there was a race between disabling of VLAN RX feature in hardware
and processing RX queue, where packets processed in this window
could have their VLAN information dropped
Fix moves the VLAN handling into i40e_process_skb_fields() to save on
duplicated code. i40e_receive_skb() becomes trivial and so is removed.
Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stefan Assmann [Tue, 4 Dec 2018 14:18:52 +0000 (15:18 +0100)]
i40e: fix mac filter delete when setting mac address
A previous commit moved the ether_addr_copy() in i40e_set_mac() before
the mac filter del/add to avoid a race. However it wasn't taken into
account that this alters the mac address being handed to
i40e_del_mac_filter().
Also changed i40e_add_mac_filter() to operate on netdev->dev_addr,
hopefully that makes the code easier to read.
Fixes:
458867b2ca0c ("i40e: don't remove netdev->dev_addr when syncing uc list")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Heiner Kallweit [Sun, 9 Dec 2018 21:05:11 +0000 (22:05 +0100)]
r8169: fix crash if CONFIG_DEBUG_SHIRQ is enabled
If CONFIG_DEBUG_SHIRQ is enabled __free_irq() intentionally fires
a spurious interrupt. This interrupt causes a crash because
tp->dev->phydev is NULL at that time.
Fixes:
38caff5a445b ("r8169: handle all interrupt events in the hard irq handler")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Dec 2018 19:04:22 +0000 (11:04 -0800)]
Merge branch 'ieee802154-for-davem-2018-12-11' of git://git./linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2018-12-11
An update from ieee802154 for your *net* tree.
Just two more fixes for ieee802154 dribver before the final 4.20 release.
Alexander Aring fixes a problem in the nested parsing code of the
hwsim driver interface.
A fix for a potential overflow in the ca8210 driver by Yue Habing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 11 Dec 2018 03:13:39 +0000 (11:13 +0800)]
ieee802154: ca8210: fix possible u8 overflow in ca8210_rx_done
gcc warning this:
drivers/net/ieee802154/ca8210.c:730:10: warning:
comparison is always false due to limited range of data type [-Wtype-limits]
'len' is u8 type, we get it from buf[1] adding 2, which can overflow.
This patch change the type of 'len' to unsigned int to avoid this,also fix
the gcc warning.
Fixes:
ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Pieter Jansen van Vuuren [Mon, 10 Dec 2018 23:03:43 +0000 (15:03 -0800)]
nfp: flower: ensure TCP flags can be placed in IPv6 frame
Previously we did not ensure tcp flags have a place to be stored
when using IPv6. We correct this by including IPv6 key layer when
we match tcp flags and the IPv6 key layer has not been included
already.
Fixes:
07e1671cfca5 ("nfp: flower: refactor shared ip header in match offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Dec 2018 01:34:26 +0000 (17:34 -0800)]
Merge branch 'ibmvnic-Fix-reset-work-item-locking-bugs'
Thomas Falcon says:
====================
net/ibmvnic: Fix reset work item locking bugs
This patch set fixes issues with scheduling reset work items in
a tasklet context. Since ibmvnic_reset can called in an interrupt,
it should not use a mutex or allocate memory non-atomically.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 10 Dec 2018 21:22:23 +0000 (15:22 -0600)]
ibmvnic: Fix non-atomic memory allocation in IRQ context
ibmvnic_reset allocated new reset work item objects in a non-atomic
context. This can be called from a tasklet, generating the output below.
Allocate work items with the GFP_ATOMIC flag instead.
BUG: sleeping function called from invalid context at mm/slab.h:421
in_atomic(): 1, irqs_disabled(): 1, pid: 93, name: kworker/0:2
INFO: lockdep is turned off.
irq event stamp: 66049
hardirqs last enabled at (66048): [<
c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0
hardirqs last disabled at (66049): [<
c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0
softirqs last enabled at (66044): [<
c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160
softirqs last disabled at (66045): [<
c0000000000306e0>] call_do_softirq+0x14/0x24
CPU: 0 PID: 93 Comm: kworker/0:2 Kdump: loaded Not tainted 4.20.0-rc6-00001-g1b50a8f03706 #7
Workqueue: events linkwatch_event
Call Trace:
[
c0000003fffe7ae0] [
c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable)
[
c0000003fffe7b30] [
c00000000015ba0c] ___might_sleep+0x2dc/0x320
[
c0000003fffe7bb0] [
c000000000391514] kmem_cache_alloc_trace+0x3e4/0x440
[
c0000003fffe7c30] [
d000000005b2309c] ibmvnic_reset+0x16c/0x360 [ibmvnic]
[
c0000003fffe7cc0] [
d000000005b29834] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic]
[
c0000003fffe7e00] [
c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0
[
c0000003fffe7e60] [
c000000000bf1238] __do_softirq+0x1a8/0x64c
[
c0000003fffe7f90] [
c0000000000306e0] call_do_softirq+0x14/0x24
[
c0000003f3967980] [
c00000000001ba50] do_softirq_own_stack+0x60/0xb0
[
c0000003f39679c0] [
c0000000001218a8] do_softirq+0xa8/0x100
[
c0000003f39679f0] [
c000000000121a74] __local_bh_enable_ip+0x174/0x180
[
c0000003f3967a60] [
c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80
[
c0000003f3967a90] [
c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160
[
c0000003f3967ad0] [
c000000000a8c8b0] dev_deactivate_many+0xd0/0x520
[
c0000003f3967b70] [
c000000000a8cd40] dev_deactivate+0x40/0x60
[
c0000003f3967ba0] [
c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0
[
c0000003f3967bd0] [
c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0
[
c0000003f3967c30] [
c000000000a5e728] linkwatch_event+0x48/0x60
[
c0000003f3967c50] [
c0000000001444e8] process_one_work+0x238/0x710
[
c0000003f3967d20] [
c000000000144a48] worker_thread+0x88/0x4e0
[
c0000003f3967db0] [
c00000000014e3a8] kthread+0x178/0x1c0
[
c0000003f3967e20] [
c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 10 Dec 2018 21:22:22 +0000 (15:22 -0600)]
ibmvnic: Convert reset work item mutex to spin lock
ibmvnic_reset can create and schedule a reset work item from
an IRQ context, so do not use a mutex, which can sleep. Convert
the reset work item mutex to a spin lock. Locking debugger generated
the trace output below.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
in_atomic(): 1, irqs_disabled(): 1, pid: 120, name: kworker/8:1
4 locks held by kworker/8:1/120:
#0:
0000000017c05720 ((wq_completion)"events"){+.+.}, at: process_one_work+0x188/0x710
#1:
00000000ace90706 ((linkwatch_work).work){+.+.}, at: process_one_work+0x188/0x710
#2:
000000007632871f (rtnl_mutex){+.+.}, at: rtnl_lock+0x30/0x50
#3:
00000000fc36813a (&(&crq->lock)->rlock){..-.}, at: ibmvnic_tasklet+0x88/0x2010 [ibmvnic]
irq event stamp: 26293
hardirqs last enabled at (26292): [<
c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0
hardirqs last disabled at (26293): [<
c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0
softirqs last enabled at (26288): [<
c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160
softirqs last disabled at (26289): [<
c0000000000306e0>] call_do_softirq+0x14/0x24
CPU: 8 PID: 120 Comm: kworker/8:1 Kdump: loaded Not tainted 4.20.0-rc6 #6
Workqueue: events linkwatch_event
Call Trace:
[
c0000003fffa7a50] [
c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable)
[
c0000003fffa7aa0] [
c00000000015ba0c] ___might_sleep+0x2dc/0x320
[
c0000003fffa7b20] [
c000000000be960c] __mutex_lock+0x8c/0xb40
[
c0000003fffa7c30] [
d000000006202ac8] ibmvnic_reset+0x78/0x330 [ibmvnic]
[
c0000003fffa7cc0] [
d0000000062097f4] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic]
[
c0000003fffa7e00] [
c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0
[
c0000003fffa7e60] [
c000000000bf1238] __do_softirq+0x1a8/0x64c
[
c0000003fffa7f90] [
c0000000000306e0] call_do_softirq+0x14/0x24
[
c0000003f3f87980] [
c00000000001ba50] do_softirq_own_stack+0x60/0xb0
[
c0000003f3f879c0] [
c0000000001218a8] do_softirq+0xa8/0x100
[
c0000003f3f879f0] [
c000000000121a74] __local_bh_enable_ip+0x174/0x180
[
c0000003f3f87a60] [
c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80
[
c0000003f3f87a90] [
c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160
[
c0000003f3f87ad0] [
c000000000a8c8b0] dev_deactivate_many+0xd0/0x520
[
c0000003f3f87b70] [
c000000000a8cd40] dev_deactivate+0x40/0x60
[
c0000003f3f87ba0] [
c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0
[
c0000003f3f87bd0] [
c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0
[
c0000003f3f87c30] [
c000000000a5e728] linkwatch_event+0x48/0x60
[
c0000003f3f87c50] [
c0000000001444e8] process_one_work+0x238/0x710
[
c0000003f3f87d20] [
c000000000144a48] worker_thread+0x88/0x4e0
[
c0000003f3f87db0] [
c00000000014e3a8] kthread+0x178/0x1c0
[
c0000003f3f87e20] [
c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 10 Dec 2018 18:41:24 +0000 (12:41 -0600)]
ipv4: Fix potential Spectre v1 vulnerability
vr.vifi is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/ipv4/ipmr.c:1616 ipmr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
net/ipv4/ipmr.c:1690 ipmr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
Fix this by sanitizing vr.vifi before using it to index mrt->vif_table'
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=
152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 10 Dec 2018 10:00:52 +0000 (18:00 +0800)]
sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
syzbot reported a kernel-infoleak, which is caused by an uninitialized
field(sin6_flowinfo) of addr->a.v6 in sctp_inet6addr_event().
The call trace is as below:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x19a/0x230 lib/usercopy.c:33
CPU: 1 PID: 8164 Comm: syz-executor2 Not tainted 4.20.0-rc3+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x32d/0x480 lib/dump_stack.c:113
kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
_copy_to_user+0x19a/0x230 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:183 [inline]
sctp_getsockopt_local_addrs net/sctp/socket.c:5998 [inline]
sctp_getsockopt+0x15248/0x186f0 net/sctp/socket.c:7477
sock_common_getsockopt+0x13f/0x180 net/core/sock.c:2937
__sys_getsockopt+0x489/0x550 net/socket.c:1939
__do_sys_getsockopt net/socket.c:1950 [inline]
__se_sys_getsockopt+0xe1/0x100 net/socket.c:1947
__x64_sys_getsockopt+0x62/0x80 net/socket.c:1947
do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
sin6_flowinfo is not really used by SCTP, so it will be fixed by simply
setting it to 0.
The issue exists since very beginning.
Thanks Alexander for the reproducer provided.
Reported-by: syzbot+ad5d327e6936a2e284be@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 9 Dec 2018 23:31:00 +0000 (15:31 -0800)]
Linux 4.20-rc6
Linus Torvalds [Sun, 9 Dec 2018 23:12:33 +0000 (15:12 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"A decent batch of fixes here. I'd say about half are for problems that
have existed for a while, and half are for new regressions added in
the 4.20 merge window.
1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach.
2) Revert bogus emac driver change, from Benjamin Herrenschmidt.
3) Handle BPF exported data structure with pointers when building
32-bit userland, from Daniel Borkmann.
4) Memory leak fix in act_police, from Davide Caratti.
5) Check RX checksum offload in RX descriptors properly in aquantia
driver, from Dmitry Bogdanov.
6) SKB unlink fix in various spots, from Edward Cree.
7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from
Eric Dumazet.
8) Fix FID leak in mlxsw driver, from Ido Schimmel.
9) IOTLB locking fix in vhost, from Jean-Philippe Brucker.
10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory
limits otherwise namespace exit can hang. From Jiri Wiesner.
11) Address block parsing length fixes in x25 from Martin Schiller.
12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan.
13) For tun interfaces, only iface delete works with rtnl ops, enforce
this by disallowing add. From Nicolas Dichtel.
14) Use after free in liquidio, from Pan Bian.
15) Fix SKB use after passing to netif_receive_skb(), from Prashant
Bhole.
16) Static key accounting and other fixes in XPS from Sabrina Dubroca.
17) Partially initialized flow key passed to ip6_route_output(), from
Shmulik Ladkani.
18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas
Falcon.
19) Several small TCP fixes (off-by-one on window probe abort, NULL
deref in tail loss probe, SNMP mis-estimations) from Yuchung
Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
net/sched: cls_flower: Reject duplicated rules also under skip_sw
bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
bnxt_en: Keep track of reserved IRQs.
bnxt_en: Fix CNP CoS queue regression.
net/mlx4_core: Correctly set PFC param if global pause is turned off.
Revert "net/ibm/emac: wrong bit is used for STA control"
neighbour: Avoid writing before skb->head in neigh_hh_output()
ipv6: Check available headroom in ip6_xmit() even without options
tcp: lack of available data can also cause TSO defer
ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl
mlxsw: spectrum_router: Relax GRE decap matching check
mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
mlxsw: spectrum_nve: Remove easily triggerable warnings
ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
sctp: frag_point sanity check
tcp: fix NULL ref in tail loss probe
tcp: Do not underestimate rwnd_limited
net: use skb_list_del_init() to remove from RX sublists
...
Linus Torvalds [Sun, 9 Dec 2018 23:09:55 +0000 (15:09 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact
fix and an LLVM workaround"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Drop implicit common-page-size linker flag
x86/build: Fix compiler support check for CONFIG_RETPOLINE
x86/boot: Clear RSDP address in boot_params for broken loaders
Linus Torvalds [Sun, 9 Dec 2018 22:21:33 +0000 (14:21 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull kprobes fixes from Ingo Molnar:
"Two kprobes fixes: a blacklist fix and an instruction patching related
corruption fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Blacklist non-attachable interrupt functions
kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
Linus Torvalds [Sun, 9 Dec 2018 22:03:56 +0000 (14:03 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two fixes: a large-system fix and an earlyprintk fix with certain
resolutions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/earlyprintk/efi: Fix infinite loop on some screen widths
x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
Or Gerlitz [Sun, 9 Dec 2018 16:10:24 +0000 (18:10 +0200)]
net/sched: cls_flower: Reject duplicated rules also under skip_sw
Currently, duplicated rules are rejected only for skip_hw or "none",
hence allowing users to push duplicates into HW for no reason.
Use the flower tables to protect for that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reported-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 9 Dec 2018 19:46:59 +0000 (11:46 -0800)]
Merge branch 'bnxt_en-Bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes.
The first patch fixes a regression on CoS queue setup, introduced
recently by the 57500 new chip support patches. The rest are
fixes related to ring and resource accounting on the new 57500 chips.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 9 Dec 2018 12:01:02 +0000 (07:01 -0500)]
bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
The CP rings are accounted differently on the new 57500 chips. There
must be enough CP rings for the sum of RX and TX rings on the new
chips. The current logic may be over-estimating the RX and TX rings.
The output parameter max_cp should be the maximum NQs capped by
MSIX vectors available for networking in the context of 57500 chips.
The existing code which uses CMPL rings capped by the MSIX vectors
works most of the time but is not always correct.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 9 Dec 2018 12:01:01 +0000 (07:01 -0500)]
bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
The new 57500 chips have introduced the NQ structure in addition to
the existing CP rings in all chips. We need to introduce a new
bnxt_nq_rings_in_use(). On legacy chips, the 2 functions are the
same and one will just call the other. On the new chips, they
refer to the 2 separate ring structures. The new function is now
called to determine the resource (NQ or CP rings) associated with
MSIX that are in use.
On 57500 chips, the RDMA driver does not use the CP rings so
we don't need to do the subtraction adjustment.
Fixes:
41e8d7983752 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 9 Dec 2018 12:01:00 +0000 (07:01 -0500)]
bnxt_en: Keep track of reserved IRQs.
The new 57500 chips use 1 NQ per MSIX vector, whereas legacy chips use
1 CP ring per MSIX vector. To better unify this, add a resv_irqs
field to struct bnxt_hw_resc. On legacy chips, we initialize resv_irqs
with resv_cp_rings. On new chips, we initialize it with the allocated
MSIX resources.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 9 Dec 2018 12:00:59 +0000 (07:00 -0500)]
bnxt_en: Fix CNP CoS queue regression.
Recent changes to support the 57500 devices have created this
regression. The bnxt_hwrm_queue_qportcfg() call was moved to be
called earlier before the RDMA support was determined, causing
the CoS queues configuration to be set before knowing whether RDMA
was supported or not. Fix it by moving it to the right place right
after RDMA support is determined.
Fixes:
98f04cf0f1fc ("bnxt_en: Check context memory requirements from firmware.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 9 Dec 2018 18:43:17 +0000 (10:43 -0800)]
Merge tag 'char-misc-4.20-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes for 4.20-rc6.
There is a hyperv fix that for some reaon took forever to get into a
shape that could be applied to the tree properly, but resolves a much
reported issue. The others are some gnss patches, one a bugfix and the
two others updates to the MAINTAINERS file to properly match the gnss
files in the tree.
All have been in linux-next for a while with no reported issues"
* tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
MAINTAINERS: add gnss scm tree
gnss: sirf: fix activation retry handling
Drivers: hv: vmbus: Offload the handling of channels to two workqueues
Linus Torvalds [Sun, 9 Dec 2018 18:35:33 +0000 (10:35 -0800)]
Merge tag 'staging-4.20-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are two staging driver bugfixes for 4.20-rc6.
One is a revert of a previously incorrect patch that was merged a
while ago, and the other resolves a possible buffer overrun that was
found by code inspection.
Both of these have been in the linux-next tree with no reported
issues"
* tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert commit
ef9209b642f "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c"
staging: rtl8712: Fix possible buffer overrun
Linus Torvalds [Sun, 9 Dec 2018 18:24:29 +0000 (10:24 -0800)]
Merge tag 'tty-4.20-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull tty driver fixes from Greg KH:
"Here are three small tty driver fixes for 4.20-rc6
Nothing major, just some bug fixes for reported issues. Full details
are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var()
tty: serial: 8250_mtk: always resume the device in probe.
tty: do not set TTY_IO_ERROR flag if console port
Linus Torvalds [Sun, 9 Dec 2018 18:18:24 +0000 (10:18 -0800)]
Merge tag 'usb-4.20-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 4.20-rc6
The "largest" here are some xhci fixes for reported issues. Also here
is a USB core fix, some quirk additions, and a usb-serial fix which
required the export of one of the tty layer's functions to prevent
code duplication. The tty maintainer agreed with this change.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Prevent U1/U2 link pm states if exit latency is too long
xhci: workaround CSS timeout on AMD SNPS 3.0 xHC
USB: check usb_get_extra_descriptor for proper size
USB: serial: console: fix reported terminal settings
usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device
USB: Fix invalid-free bug in port_over_current_notify()
usb: appledisplay: Add 27" Apple Cinema Display
Linus Torvalds [Sun, 9 Dec 2018 18:15:13 +0000 (10:15 -0800)]
Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for
stable and a minor cifs Kconfig fix"
* tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Avoid returning EBUSY to upper layer VFS
cifs: Fix separator when building path from dentry
cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Linus Torvalds [Sun, 9 Dec 2018 17:54:04 +0000 (09:54 -0800)]
Merge tag 'dax-fixes-4.20-rc6' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
"The last of the known regression fixes and fallout from the Xarray
conversion of the filesystem-dax implementation.
On the path to debugging why the dax memory-failure injection test
started failing after the Xarray conversion a couple more fixes for
the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced.
Those plus the bug that started the hunt are now addressed. These
patches have appeared in a -next release with no issues reported.
Note the touches to mm/memory-failure.c are just the conversion to the
new function signature for dax_lock_page().
Summary:
- Fix the Xarray conversion of fsdax to properly handle
dax_lock_mapping_entry() in the presense of pmd entries
- Fix inode destruction racing a new lock request"
* tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Fix unlock mismatch with updated API
dax: Don't access a freed inode
dax: Check page->mapping isn't NULL
Linus Torvalds [Sun, 9 Dec 2018 17:46:54 +0000 (09:46 -0800)]
Merge tag 'libnvdimm-fixes-4.20-rc6' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A regression fix for the Address Range Scrub implementation, yes
another one, and support for platforms that misalign persistent memory
relative to the Linux memory hotplug section constraint. Longer term,
support for sub-section memory hotplug would alleviate alignment
waste, but until then this hack allows a 'struct page' memmap to be
established for these misaligned memory regions.
These have all appeared in a -next release, and thanks to Patrick for
reporting and testing the alignment padding fix.
Summary:
- Unless and until the core mm handles memory hotplug units smaller
than a section (128M), persistent memory namespaces must be padded
to section alignment.
The libnvdimm core already handled section collision with "System
RAM", but some configurations overlap independent "Persistent
Memory" ranges within a section, so additional padding injection is
added for that case.
- The recent reworks of the ARS (address range scrub) state machine
to reduce the number of state flags inadvertantly missed a
conversion of acpi_nfit_ars_rescan() call sites. Fix the regression
whereby user-requested ARS results in a "short" scrub rather than a
"long" scrub.
- Fixup the unit tests to handle / test the 128M section alignment of
mocked test resources.
* tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
libnvdimm, pfn: Pad pfn namespaces relative to other regions
tools/testing/nvdimm: Align test resources to 128M
Tarick Bedeir [Fri, 7 Dec 2018 08:30:26 +0000 (00:30 -0800)]
net/mlx4_core: Correctly set PFC param if global pause is turned off.
rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1.
Fixes:
6e8814ceb7e8 ("net/mlx4_en: Fix mixed PFC and Global pause user control requests")
Signed-off-by: Tarick Bedeir <tarick@google.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 9 Dec 2018 01:45:20 +0000 (17:45 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC fixes from Eduardo Valentin:
"Fixes for armada and broadcom thermal drivers"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: broadcom: constify thermal_zone_of_device_ops structure
thermal: armada: constify thermal_zone_of_device_ops structure
thermal: bcm2835: Switch to SPDX identifier
thermal: armada: fix legacy resource fixup
thermal: armada: fix legacy validity test sense
Linus Torvalds [Sat, 8 Dec 2018 19:44:04 +0000 (11:44 -0800)]
Merge tag 'asm-generic-4.20' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic fix from Arnd Bergmann:
"Multiple people reported a bug I introduced in asm-generic/unistd.h in
4.20, this is the obvious bugfix to get glibc and others to correctly
build again on new architectures that no longer provide the old
fstatat64() family of system calls"
* tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: unistd.h: fixup broken macro include.
Linus Torvalds [Sat, 8 Dec 2018 19:33:26 +0000 (11:33 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few clk driver fixes this time:
- Introduce protected-clock DT binding to fix breakage on qcom
sdm845-mtp boards where the qspi clks introduced this merge window
cause the firmware on those boards to take down the system if we
try to read the clk registers
- Fix a couple off-by-one errors found by Dan Carpenter
- Handle failure in zynq fixed factor clk driver to avoid using
uninitialized data"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: zynqmp: Off by one in zynqmp_is_valid_clock()
clk: mmp: Off by one in mmp_clk_add()
clk: mvebu: Off by one bugs in cp110_of_clk_get()
arm64: dts: qcom: sdm845-mtp: Mark protected gcc clocks
clk: qcom: Support 'protected-clocks' property
dt-bindings: clk: Introduce 'protected-clocks' property
clk: zynqmp: handle fixed factor param query error
Linus Torvalds [Sat, 8 Dec 2018 19:25:02 +0000 (11:25 -0800)]
Merge tag 'xfs-4.20-fixes-3' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are hopefully the last set of fixes for 4.20.
There's a fix for a longstanding statfs reporting problem with project
quotas, a correction for page cache invalidation behaviors when
fallocating near EOF, and a fix for a broken metadata verifier return
code.
Finally, the most important fix is to the pipe splicing code (aka the
generic copy_file_range fallback) to avoid pointless short directio
reads by only asking the filesystem for as much data as there are
available pages in the pipe buffer. Our previous fix (simulated short
directio reads because the number of pages didn't match the length of
the read requested) caused subtle problems on overlayfs, so that part
is reverted.
Anyhow, this series passes fstests -g all on xfs and overlay+xfs, and
has passed 17 billion fsx operations problem-free since I started
testing
Summary:
- Fix broken project quota inode counts
- Fix incorrect PAGE_MASK/PAGE_SIZE usage
- Fix incorrect return value in btree verifier
- Fix WARN_ON remap flags false positive
- Fix splice read overflows"
* tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: partially revert
4721a601099 (simulated directio short read on EFAULT)
splice: don't read more than available pipe space
vfs: allow some remap flags to be passed to vfs_clone_file_range
xfs: fix inverted return from xfs_btree_sblock_verify_crc
xfs: fix PAGE_MASK usage in xfs_free_file_space
fs/xfs: fix f_ffree value for statfs when project quota is set
David Rientjes [Fri, 7 Dec 2018 22:50:16 +0000 (14:50 -0800)]
Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"
This reverts commit
89c83fb539f95491be80cdd5158e6f0ce329e317.
This should have been done as part of
2f0799a0ffc0 ("mm, thp: restore
node-local hugepage allocations"). The movement of the thp allocation
policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
intended to only set __GFP_THISNODE for mempolicies that are not
MPOL_BIND whereas the revert could set this regardless of mempolicy.
While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
and alloc_pages_vma() was racy, that has since been removed since the
revert. What is left is the possibility to use __GFP_THISNODE in
policy_node() when it is unexpected because the special handling for
hugepages in alloc_pages_vma() was removed as part of the consolidation.
Secondly, prior to
89c83fb539f9, alloc_pages_vma() implemented a somewhat
different policy for hugepage allocations, which were allocated through
alloc_hugepage_vma(). For hugepage allocations, if the allocating
process's node is in the set of allowed nodes, allocate with
__GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
__GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to
allow fallback to other nodes in
89c83fb539f9 as it did for new_page() in
mm/mempolicy.c which is functionally different behavior and removes the
requirement to only allocate hugepages locally.
So this commit does a full revert of
89c83fb539f9 instead of the partial
revert that was done in
2f0799a0ffc0. The result is the same thp
allocation policy for 4.20 that was in 4.19.
Fixes:
89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Fixes:
2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Fri, 7 Dec 2018 04:05:04 +0000 (15:05 +1100)]
Revert "net/ibm/emac: wrong bit is used for STA control"
This reverts commit
624ca9c33c8a853a4a589836e310d776620f4ab9.
This commit is completely bogus. The STACR register has two formats, old
and new, depending on the version of the IP block used. There's a pair of
device-tree properties that can be used to specify the format used:
has-inverted-stacr-oc
has-new-stacr-staopc
What this commit did was to change the bit definition used with the old
parts to match the new parts. This of course breaks the driver on all
the old ones.
Instead, the author should have set the appropriate properties in the
device-tree for the variant used on his board.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 8 Dec 2018 00:24:40 +0000 (16:24 -0800)]
Merge branch 'skb-headroom-slab-out-of-bounds'
Stefano Brivio says:
====================
Fix slab out-of-bounds on insufficient headroom for IPv6 packets
Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.
Patch 2/2 makes sure we avoid writing before the allocated buffer in
neigh_hh_output() in case the headroom is enough for the unaligned hardware
header size, but not enough for the aligned one, and that we warn if we hit
this condition.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Thu, 6 Dec 2018 18:30:37 +0000 (19:30 +0100)]
neighbour: Avoid writing before skb->head in neigh_hh_output()
While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.
In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.
Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.
v2:
- instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
(Eric Dumazet)
- if we avoid the panic, though, we need to explicitly check the headroom
before the memcpy(), otherwise we'll have corrupted slabs on a running
kernel, after we warn
- use __skb_push() instead of skb_push(), as the headroom check is
already implemented here explicitly (Eric Dumazet)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Thu, 6 Dec 2018 18:30:36 +0000 (19:30 +0100)]
ipv6: Check available headroom in ip6_xmit() even without options
Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.
On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().
KASan says:
[ 264.967848] ==================================================================
[ 264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
[ 264.967866] Write of size 16 at addr
000000006af1c7fe by task netperf/6201
[ 264.967870]
[ 264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[ 264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[ 264.967887] Call Trace:
[ 264.967896] ([<
00000000001347d6>] show_stack+0x56/0xa0)
[ 264.967903] [<
00000000017e379c>] dump_stack+0x23c/0x290
[ 264.967912] [<
00000000007bc594>] print_address_description+0xf4/0x290
[ 264.967919] [<
00000000007bc8fc>] kasan_report+0x13c/0x240
[ 264.967927] [<
000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[ 264.967935] [<
000000000163f890>] ip6_finish_output+0x430/0x7f0
[ 264.967943] [<
000000000163fe44>] ip6_output+0x1f4/0x580
[ 264.967953] [<
000000000163882a>] ip6_xmit+0xfea/0x1ce8
[ 264.967963] [<
00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
[ 264.968033] [<
000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[ 264.968037] [<
000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[ 264.968041] [<
0000000001220020>] dev_hard_start_xmit+0x268/0x928
[ 264.968069] [<
0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
[ 264.968071] [<
000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
[ 264.968075] [<
00000000013d2862>] ip_finish_output2+0xce2/0x11a0
[ 264.968078] [<
00000000013d9b14>] ip_finish_output+0x56c/0x8c8
[ 264.968081] [<
00000000013ddd1e>] ip_output+0x226/0x4c0
[ 264.968083] [<
00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
[ 264.968100] [<
000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[ 264.968116] [<
000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[ 264.968131] [<
000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[ 264.968146] [<
000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
[ 264.968161] [<
000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[ 264.968177] [<
000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[ 264.968192] [<
000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[ 264.968208] [<
000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[ 264.968212] [<
0000000001197942>] __sys_connect+0x21a/0x450
[ 264.968215] [<
000000000119aff8>] sys_socketcall+0x3d0/0xb08
[ 264.968218] [<
000000000184ea7a>] system_call+0x2a2/0x2c0
[...]
Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.
This issue is older than git history.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 6 Dec 2018 17:58:24 +0000 (09:58 -0800)]
tcp: lack of available data can also cause TSO defer
tcp_tso_should_defer() can return true in three different cases :
1) We are cwnd-limited
2) We are rwnd-limited
3) We are application limited.
Neal pointed out that my recent fix went too far, since
it assumed that if we were not in 1) case, we must be rwnd-limited
Fix this by properly populating the is_cwnd_limited and
is_rwnd_limited booleans.
After this change, we can finally move the silly check for FIN
flag only for the application-limited case.
The same move for EOR bit will be handled in net-next,
since commit
1c09f7d073b1 ("tcp: do not try to defer skbs
with eor mark (MSG_EOR)") is scheduled for linux-4.21
Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
and checking none of them was rwnd_limited in the chrono_stat
output from "ss -ti" command.
Fixes:
41727549de3e ("tcp: Do not underestimate rwnd_limited")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 7 Dec 2018 22:34:10 +0000 (14:34 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull vhost/virtio fixes from Michael Tsirkin:
"A couple of last-minute fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/vsock: fix use-after-free in network stack callers
virtio/s390: fix race in ccw_io_helper()
virtio/s390: avoid race on vcdev->config
vhost/vsock: fix reset orphans race with close timeout
Linus Torvalds [Fri, 7 Dec 2018 22:18:49 +0000 (14:18 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Avoid sending IPIs with interrupts disabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Avoid sending cross-calling with interrupts disabled
Linus Torvalds [Fri, 7 Dec 2018 21:13:07 +0000 (13:13 -0800)]
Merge tag 'gcc-plugins-v4.20-rc6' of git://git./linux/kernel/git/kees/linux
Pull gcc stackleak plugin fixes from Kees Cook:
- Remove tracing for inserted stack depth marking function (Anders
Roxell)
- Move gcc-plugin pass location to avoid objtool warnings (Alexander
Popov)
* tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass
stackleak: Mark stackleak_track_stack() as notrace
Linus Torvalds [Fri, 7 Dec 2018 21:07:10 +0000 (13:07 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Disable the new crypto stats interface as it's still being changed
- Fix potential uses-after-free in cbc/cfb/pcbc.
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: user - Disable statistics interface
crypto: do not free algorithm before using
Linus Torvalds [Fri, 7 Dec 2018 20:58:34 +0000 (12:58 -0800)]
Merge tag 'pci-v4.20-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Revert ASPM change that caused a regression"
* tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"
Shmulik Ladkani [Fri, 7 Dec 2018 07:50:17 +0000 (09:50 +0200)]
ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
initialization.
Fixes:
6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 7 Dec 2018 18:40:37 +0000 (10:40 -0800)]
Merge tag 'for-linus-
20181207' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Let's try this again...
We're finally happy with the DM livelock issue, and it's also passed
overnight testing and the corruption regression test. The end result
is much nicer now too, which is great.
Outside of that fix, there's a pull request for NVMe with two small
fixes, and a regression fix for BFQ from this merge window. The BFQ
fix looks bigger than it is, it's 90% comment updates"
* tag 'for-linus-
20181207' of git://git.kernel.dk/linux-block:
blk-mq: punt failed direct issue to dispatch list
nvmet-rdma: fix response use after free
nvme: validate controller state before rescheduling keep alive
block, bfq: fix decrement of num_active_groups
Linus Torvalds [Fri, 7 Dec 2018 18:31:31 +0000 (10:31 -0800)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A set of driver bugfixes for the I2C subsystem"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode
i2c: uniphier: fix violation of tLOW requirement for Fast-mode
i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated START
i2c: uniphier-f: fix timeout error after reading 8 bytes
i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
i2c: axxia: properly handle master timeout
i2c: rcar: check bus state before reinitializing
i2c: nvidia-gpu: limit reads also for combined messages
i2c: nvidia-gpu: adhere to I2C fault codes
Linus Torvalds [Fri, 7 Dec 2018 17:58:34 +0000 (09:58 -0800)]
Merge tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Another pull request for dmaengine. We got bunch of fixes early this
week and all are tagged to stable. Hope this is last fix for this
cycle:
- Fix imx-sdma handling of channel terminations, this involves
reverting two commits and implement async termination
- Fix cppi dma channel deletion from pending list on stop
- Fix FIFO size for dw controller in Intel Merrifield"
* tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: dw: Fix FIFO size for Intel Merrifield
dmaengine: cppi41: delete channel from pending list when stop channel
dmaengine: imx-sdma: use GFP_NOWAIT for dma descriptor allocations
dmaengine: imx-sdma: implement channel termination via worker
Revert "dmaengine: imx-sdma: alloclate bd memory from dma pool"
Revert "dmaengine: imx-sdma: Use GFP_NOWAIT for dma allocations"
Nick Desaulniers [Thu, 6 Dec 2018 19:12:31 +0000 (11:12 -0800)]
x86/vdso: Drop implicit common-page-size linker flag
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.
Fixes:
2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Bill Wendling <morbo@google.com>
Suggested-by: Dmitry Golovin <dima@golovin.in>
Suggested-by: Rui Ueyama <ruiu@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Fangrui Song <maskray@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31
Will Deacon [Fri, 7 Dec 2018 12:47:10 +0000 (12:47 +0000)]
arm64: hibernate: Avoid sending cross-calling with interrupts disabled
Since commit
3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the
I-cache for kernel mappings"), a call to flush_icache_range() will use
an IPI to cross-call other online CPUs so that any stale instructions
are flushed from their pipelines. This triggers a WARN during the
hibernation resume path, where flush_icache_range() is called with
interrupts disabled and is therefore prone to deadlock:
| Disabling non-boot CPUs ...
| CPU1: shutdown
| psci: CPU1 killed.
| CPU2: shutdown
| psci: CPU2 killed.
| CPU3: shutdown
| psci: CPU3 killed.
| WARNING: CPU: 0 PID: 1 at ../kernel/smp.c:416 smp_call_function_many+0xd4/0x350
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc4 #1
Since all secondary CPUs have been taken offline prior to invalidating
the I-cache, there's actually no need for an IPI and we can simply call
__flush_icache_range() instead.
Cc: <stable@vger.kernel.org>
Fixes:
3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Reported-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jens Axboe [Fri, 7 Dec 2018 15:40:13 +0000 (08:40 -0700)]
Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph.
* 'nvme-4.20' of git://git.infradead.org/nvme:
nvmet-rdma: fix response use after free
nvme: validate controller state before rescheduling keep alive
Jens Axboe [Fri, 7 Dec 2018 05:17:44 +0000 (22:17 -0700)]
blk-mq: punt failed direct issue to dispatch list
After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.
Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.
This basically reverts
ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.
Fixes:
ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Israel Rukshin [Wed, 5 Dec 2018 16:54:57 +0000 (16:54 +0000)]
nvmet-rdma: fix response use after free
nvmet_rdma_release_rsp() may free the response before using it at error
flow.
Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Wed, 28 Nov 2018 01:04:44 +0000 (17:04 -0800)]
nvme: validate controller state before rescheduling keep alive
Delete operations are seeing NULL pointer references in call_timer_fn.
Tracking these back, the timer appears to be the keep alive timer.
nvme_keep_alive_work() which is tied to the timer that is cancelled
by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
wait for it's completion. So nvme_stop_keep_alive() only stops a timer
when it's pending. When a keep alive is in flight, there is no timer
running and the nvme_stop_keep_alive() will have no affect on the keep
alive io. Thus, if the io completes successfully, the keep alive timer
will be rescheduled. In the failure case, delete is called, the
controller state is changed, the nvme_stop_keep_alive() is called while
the io is outstanding, and the delete path continues on. The keep
alive happens to successfully complete before the delete paths mark it
as aborted as part of the queue termination, so the timer is restarted.
The delete paths then tear down the controller, and later on the timer
code fires and the timer entry is now corrupt.
Fix by validating the controller state before rescheduling the keep
alive. Testing with the fix has confirmed the condition above was hit.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Paolo Valente [Thu, 6 Dec 2018 18:18:18 +0000 (19:18 +0100)]
block, bfq: fix decrement of num_active_groups
Since commit '
2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).
Unfortunately, commit '
2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.
A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.
To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.
This simplified scheme provides a fix to the unbalanced decrements
introduced by
2d29c9f89fcd. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.
Fixes:
2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")
Reported-by: Steven Barrett <steven@liquorix.net>
Tested-by: Steven Barrett <steven@liquorix.net>
Tested-by: Lucjan Lucjanov <lucjan.lucjanov@gmail.com>
Reviewed-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Greg Kroah-Hartman [Fri, 7 Dec 2018 13:05:28 +0000 (14:05 +0100)]
Merge tag 'gnss-4.20-rc6' of https://git./linux/kernel/git/johan/gnss into char-misc-linus
Johan writes:
GNSS fixes for 4.20-rc6
Here's a fix for a broken activation retry loop in the sirf driver.
Included are also two MAINTAINERS updates.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss:
MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
MAINTAINERS: add gnss scm tree
gnss: sirf: fix activation retry handling
Florian Westphal [Wed, 5 Dec 2018 13:12:19 +0000 (14:12 +0100)]
netfilter: seqadj: re-load tcp header pointer after possible head reallocation
When adjusting sack block sequence numbers, skb_make_writable() gets
called to make sure tcp options are all in the linear area, and buffer
is not shared.
This can cause tcp header pointer to get reallocated, so we must
reaload it to avoid memory corruption.
This bug pre-dates git history.
Reported-by: Neel Mehta <nmehta@google.com>
Reported-by: Shane Huntley <shuntley@google.com>
Reported-by: Heather Adkins <argv@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Long Li [Thu, 6 Dec 2018 04:51:06 +0000 (04:51 +0000)]
CIFS: Avoid returning EBUSY to upper layer VFS
EBUSY is not handled by VFS, and will be passed to user-mode. This is not
correct as we need to wait for more credits.
This patch also fixes a bug where rsize or wsize is used uninitialized when
the call to server->ops->wait_mtu_credits() fails.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Herbert Xu [Fri, 7 Dec 2018 05:56:08 +0000 (13:56 +0800)]
crypto: user - Disable statistics interface
Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>