Linus Torvalds [Fri, 28 Feb 2020 00:34:41 +0000 (16:34 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix leak in nl80211 AP start where we leak the ACL memory, from
Johannes Berg.
2) Fix double mutex unlock in mac80211, from Andrei Otcheretianski.
3) Fix RCU stall in ipset, from Jozsef Kadlecsik.
4) Fix devlink locking in devlink_dpipe_table_register, from Madhuparna
Bhowmik.
5) Fix race causing TX hang in ll_temac, from Esben Haabendal.
6) Stale eth hdr pointer in br_dev_xmit(), from Nikolay Aleksandrov.
7) Fix TX hash calculation bounds checking wrt. tc rules, from Amritha
Nambiar.
8) Size netlink responses properly in schedule action code to take into
consideration TCA_ACT_FLAGS. From Jiri Pirko.
9) Fix firmware paths for mscc PHY driver, from Antoine Tenart.
10) Don't register stmmac notifier multiple times, from Aaro Koskinen.
11) Various rmnet bug fixes, from Taehee Yoo.
12) Fix vsock deadlock in vsock transport release, from Stefano
Garzarella.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: dsa: mv88e6xxx: Fix masking of egress port
mlxsw: pci: Wait longer before accessing the device after reset
sfc: fix timestamp reconstruction at 16-bit rollover points
vsock: fix potential deadlock in transport->release()
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
net: rmnet: fix packet forwarding in rmnet bridge mode
net: rmnet: fix bridge mode bugs
net: rmnet: use upper/lower device infrastructure
net: rmnet: do not allow to change mux id if mux id is duplicated
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
net: rmnet: fix suspicious RCU usage
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
net: phy: marvell: don't interpret PHY status unless resolved
mlx5: register lag notifier for init network namespace only
unix: define and set show_fdinfo only if procfs is enabled
hinic: fix a bug of rss configuration
hinic: fix a bug of setting hw_ioctxt
hinic: fix a irq affinity bug
net/smc: check for valid ib_client_data
...
Andrew Lunn [Thu, 27 Feb 2020 20:20:49 +0000 (21:20 +0100)]
net: dsa: mv88e6xxx: Fix masking of egress port
Add missing ~ to the usage of the mask.
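
To illustrate why the missing ~ matters, here is a minimal user-space sketch of the read-modify-write idiom. The field position, the mask name and both helper names are hypothetical and are not taken from the mv88e6xxx driver.

```c
#include <stdint.h>

/* Hypothetical field: a 4-bit port number stored in bits 15:12. */
#define PORT_FIELD_SHIFT 12
#define PORT_FIELD_MASK  0xf000u

/* Buggy variant: without '~' the AND keeps only the old field value and
 * clears every other bit of the register. */
uint16_t set_port_buggy(uint16_t reg, unsigned int port)
{
	reg &= PORT_FIELD_MASK;
	reg |= (uint16_t)((port << PORT_FIELD_SHIFT) & PORT_FIELD_MASK);
	return reg;
}

/* Intended variant: '& ~mask' clears only the field, then the new value
 * is ORed in and all unrelated bits survive. */
uint16_t set_port_fixed(uint16_t reg, unsigned int port)
{
	reg &= (uint16_t)~PORT_FIELD_MASK;
	reg |= (uint16_t)((port << PORT_FIELD_SHIFT) & PORT_FIELD_MASK);
	return reg;
}
```

With a register value of 0x3abc and port 5, the fixed helper yields 0x5abc, while the buggy one loses the low 12 bits and mixes the old and new port numbers.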
Reported-by: Kevin Benson <Kevin.Benson@zii.aero>
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Fixes: 5c74c54ce6ff ("net: dsa: mv88e6xxx: Split monitor port configuration")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 27 Feb 2020 20:07:53 +0000 (21:07 +0100)]
mlxsw: pci: Wait longer before accessing the device after reset
During initialization the driver issues a reset to the device and waits
for 100ms before checking if the firmware is ready. The waiting is
necessary because before that the device is unresponsive and the first
read can result in a completion timeout.
While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is
insufficient for Spectrum-3.
Fix this by increasing the timeout to 200ms.
Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Maftei (amaftei) [Wed, 26 Feb 2020 17:33:19 +0000 (17:33 +0000)]
sfc: fix timestamp reconstruction at 16-bit rollover points
We can't just use the top bits of the last sync event as they could be
off-by-one every 65,536 seconds, giving an error in reconstruction of
65,536 seconds.
This patch uses the difference in the bottom 16 bits (mod 2^16) to
calculate an offset that needs to be applied to the last sync event to
get to the current time.
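
The arithmetic can be sketched in user-space C as follows. This is only an illustration of the idea, not the sfc driver code: the function names are hypothetical, and treating the 16-bit difference as a signed offset (so that a packet stamped shortly before the sync event is also handled) is an assumption of this sketch.

```c
#include <stdint.h>

/* Naive approach: splice the top 16 bits of the last sync event onto the
 * 16-bit value carried with the packet. This is wrong whenever the low
 * 16 bits rolled over between the sync event and the packet. */
uint32_t reconstruct_naive(uint32_t last_sync_sec, uint16_t pkt_low16)
{
	return (last_sync_sec & 0xffff0000u) | pkt_low16;
}

/* Difference-based approach: compute the distance between the two low
 * 16-bit values modulo 2^16 and apply it as an offset to the sync time. */
uint32_t reconstruct_by_delta(uint32_t last_sync_sec, uint16_t pkt_low16)
{
	uint16_t sync_low16 = (uint16_t)(last_sync_sec & 0xffffu);
	int32_t delta = (uint16_t)(pkt_low16 - sync_low16);

	/* Assumption of this sketch: offsets of 2^15 or more are read as
	 * negative, i.e. the packet is slightly older than the sync event. */
	if (delta >= 0x8000)
		delta -= 0x10000;

	return last_sync_sec + (uint32_t)delta;
}
```

For a sync event at 0x0001ffff and a packet carrying 0x0001, the naive splice gives 0x00010001 (65,536 seconds too early), while the difference-based version gives 0x00020001.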
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 26 Feb 2020 10:58:18 +0000 (11:58 +0100)]
vsock: fix potential deadlock in transport->release()
Some transports (hyperv, virtio) acquire the sock lock during the
.release() callback.
In the vsock_stream_connect() we call vsock_assign_transport(); if
the socket was previously assigned to another transport, the
vsk->transport->release() is called, but the sock lock is already
held in the vsock_stream_connect(), causing a deadlock reported by
syzbot:
INFO: task syz-executor280:9768 blocked for more than 143 seconds.
Not tainted 5.6.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor280 D27912 9768 9766 0x00000000
Call Trace:
context_switch kernel/sched/core.c:3386 [inline]
__schedule+0x934/0x1f90 kernel/sched/core.c:4082
schedule+0xdc/0x2b0 kernel/sched/core.c:4156
__lock_sock+0x165/0x290 net/core/sock.c:2413
lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
__sys_connect_file+0x161/0x1c0 net/socket.c:1857
__sys_connect+0x174/0x1b0 net/socket.c:1874
__do_sys_connect net/socket.c:1885 [inline]
__se_sys_connect net/socket.c:1882 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1882
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
To avoid this issue, this patch removes the lock acquisition in the
.release() callback of the hyperv and virtio transports, and holds
the lock when vsk->transport->release() is called in the vsock core.
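
The locking change can be modelled in a few lines of user-space C. Everything below is a toy: the struct, the non-recursive "lock" flag and all function names are hypothetical, and a second lock attempt is reported as an error where the real kernel would block forever.

```c
#include <stdbool.h>

struct model_sock {
	bool locked;             /* toy, non-recursive lock */
	bool transport_released;
};

/* Returns 0 on success, -1 if the lock is already held (would deadlock). */
int model_lock(struct model_sock *sk)
{
	if (sk->locked)
		return -1;
	sk->locked = true;
	return 0;
}

void model_unlock(struct model_sock *sk)
{
	sk->locked = false;
}

/* Old scheme: the transport callback takes the lock itself. */
int release_takes_lock(struct model_sock *sk)
{
	if (model_lock(sk))
		return -1;
	sk->transport_released = true;
	model_unlock(sk);
	return 0;
}

/* New scheme: the callback relies on the caller already holding the lock. */
int release_expects_lock(struct model_sock *sk)
{
	if (!sk->locked)
		return -1; /* caller violated the locking contract */
	sk->transport_released = true;
	return 0;
}

/* Connect path: holds the lock while the old transport is released. */
int model_connect(struct model_sock *sk, int (*release)(struct model_sock *))
{
	int ret;

	if (model_lock(sk))
		return -1;
	ret = release(sk);
	model_unlock(sk);
	return ret;
}
```

Passing release_takes_lock to model_connect() fails at the nested lock attempt, which corresponds to the hang in the trace above; release_expects_lock succeeds because the lock is taken exactly once, by the caller.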
Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:52:35 +0000 (11:52 -0800)]
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
Fixes: 3a12500ed5dd ("unix: define and set show_fdinfo only if procfs is enabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:45:07 +0000 (11:45 -0800)]
Merge branch 'net-rmnet-fix-several-bugs'
Taehee Yoo says:
====================
net: rmnet: fix several bugs
This patchset fixes several bugs in the RMNET module.
1. The first patch fixes a NULL-ptr-deref in rmnet_newlink().
When an rmnet interface is created, it uses IFLA_LINK
without checking it for NULL.
So, if userspace doesn't set IFLA_LINK, a panic occurs.
This patch adds the missing NULL pointer check.
2. The second patch fixes a NULL-ptr-deref in rmnet_changelink().
To get the real device in rmnet_changelink(), it uses IFLA_LINK.
But IFLA_LINK should not be used in rmnet_changelink().
3. The third patch fixes suspicious RCU usage in rmnet_get_port().
rmnet_get_port() uses rcu_dereference_rtnl().
But rmnet_get_port() is used by the datapath.
So, rcu_dereference_bh() should be used instead of rcu_dereference_rtnl().
4. The fourth patch fixes suspicious RCU usage in
rmnet_force_unassociate_device().
Scheduling is not allowed inside an RCU read-side critical section.
But unregister_netdevice_queue(), called in rmnet_force_unassociate_device(),
may schedule.
So, the RCU warning occurs.
This patch removes the rcu_read_lock() in rmnet_force_unassociate_device()
because it's unnecessary.
5. The fifth patch fixes the duplicate MUX ID case.
An RMNET MUX ID must be unique.
So, an rmnet interface with a duplicate MUX ID isn't allowed
to be created.
But only rmnet_newlink() checks this condition; rmnet_changelink()
doesn't.
So, a duplicate MUX ID could still be set.
6. The sixth patch fixes upper/lower interface relationship problems.
When IFLA_LINK is used, the upper/lower infrastructure should be used,
because it checks the maximum depth of upper/lower interfaces and
circular interface relationships, etc.
In this patch, netdev_upper_dev_link() is used.
7. The seventh patch fixes bridge related problems.
a) ->ndo_del_slave() doesn't work.
b) It couldn't detect a circular upper/lower interface relationship.
c) It couldn't prevent a stack overflow caused by too deep nesting
of upper/lower interfaces.
d) It doesn't check the number of lower interfaces.
e) It panics for several reasons.
These problems share the same root cause.
So, this patch fixes all of them.
8. The eighth patch fixes a packet forwarding issue in bridge mode.
Packet forwarding does not work in rmnet bridge mode,
because forwarding a packet requires skb_push() for the ethernet header,
but the code doesn't call skb_push().
So, the ethernet header is lost.
Change log:
- update commit logs.
- drop two patches in this patchset because of wrong target branch.
- ("net: rmnet: add missing module alias")
- ("net: rmnet: print error message when command fails")
- remove unnecessary rcu_read_lock() in the third patch.
- use rcu_dereference_bh() instead of rcu_dereference() in the third patch.
- do not allow adding a bridge device if the rmnet interface is already
in bridge mode in the seventh patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:15 +0000 (12:26 +0000)]
net: rmnet: fix packet forwarding in rmnet bridge mode
Packet forwarding does not work in rmnet bridge mode,
because forwarding a packet requires skb_push() for the ethernet header,
but the code doesn't call skb_push().
So, the ethernet header is lost.
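
As a rough user-space analogy of what skb_push() does to the data pointer, consider the toy buffer below. The struct and helper names are hypothetical and there is no bounds checking; it only shows that a pulled header stays in the buffer and becomes part of the packet again once it is pushed back.

```c
#include <stddef.h>

#define MODEL_ETH_HLEN 14 /* length of an ethernet header in bytes */

/* Toy packet buffer: 'data' points at the current start of the packet
 * inside 'storage'; bytes in front of 'data' are headroom. */
struct model_buf {
	unsigned char storage[64];
	unsigned char *data;
	size_t len;
};

void model_init(struct model_buf *b, size_t len)
{
	b->data = b->storage;
	b->len = len;
}

/* Pull: hide 'n' bytes at the front (e.g. after parsing a header). */
unsigned char *model_pull(struct model_buf *b, size_t n)
{
	b->data += n;
	b->len -= n;
	return b->data;
}

/* Push: expose 'n' bytes of headroom again so they are sent with the packet. */
unsigned char *model_push(struct model_buf *b, size_t n)
{
	b->data -= n;
	b->len += n;
	return b->data;
}
```

If a frame is forwarded after the pull but without the matching push, the receiver sees a packet that starts 14 bytes too late, which matches the lost ethernet header described above.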
Test commands:
modprobe rmnet
ip netns add nst
ip netns add nst2
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst2
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip link set veth2 master rmnet0
ip link set veth0 up
ip link set veth2 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ip netns exec nst ip link set veth1 up
ip netns exec nst ip a a 192.168.100.2/24 dev veth1
ip netns exec nst2 ip link set veth3 up
ip netns exec nst2 ip a a 192.168.100.3/24 dev veth3
ip netns exec nst2 ping 192.168.100.2
Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:02 +0000 (12:26 +0000)]
net: rmnet: fix bridge mode bugs
In order to attach a bridge interface to the rmnet interface,
the "master" operation is used.
(e.g. ip link set dummy1 master rmnet0)
But rmnet_add_bridge(), which is the ->ndo_add_slave() callback,
doesn't register the lower interface.
So, ->ndo_del_slave() doesn't work.
There are other problems too.
1. It couldn't detect a circular upper/lower interface relationship.
2. It couldn't prevent a stack overflow caused by too deep nesting
of upper/lower interfaces.
3. It doesn't check the number of lower interfaces.
4. It panics for several reasons.
The root cause of these issues is actually the same.
So, this patch fixes all of these problems.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add dummy1 master rmnet0 type dummy
ip link add dummy2 master rmnet0 type dummy
ip link del rmnet0
ip link del dummy2
ip link del dummy1
Splat looks like:
[ 41.867595][ T1164] general protection fault, probably for non-canonical address 0xdffffc0000000101I
[ 41.869993][ T1164] KASAN: null-ptr-deref in range [0x0000000000000808-0x000000000000080f]
[ 41.872950][ T1164] CPU: 0 PID: 1164 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 41.873915][ T1164] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 41.875161][ T1164] RIP: 0010:rmnet_unregister_bridge.isra.6+0x71/0xf0 [rmnet]
[ 41.876178][ T1164] Code: 48 89 ef 48 89 c6 5b 5d e9 fc fe ff ff e8 f7 f3 ff ff 48 8d b8 08 08 00 00 48 ba 00 7
[ 41.878925][ T1164] RSP: 0018:ffff8880c4d0f188 EFLAGS: 00010202
[ 41.879774][ T1164] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000101
[ 41.887689][ T1164] RDX: dffffc0000000000 RSI: ffffffffb8cf64f0 RDI: 0000000000000808
[ 41.888727][ T1164] RBP: ffff8880c40e4000 R08: ffffed101b3c0e3c R09: 0000000000000001
[ 41.889749][ T1164] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 1ffff110189a1e3c
[ 41.890783][ T1164] R13: ffff8880c4d0f200 R14: ffffffffb8d56160 R15: ffff8880ccc2c000
[ 41.891794][ T1164] FS: 00007f4300edc0c0(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[ 41.892953][ T1164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 41.893800][ T1164] CR2: 00007f43003bc8c0 CR3: 00000000ca53e001 CR4: 00000000000606f0
[ 41.894824][ T1164] Call Trace:
[ 41.895274][ T1164] ? rcu_is_watching+0x2c/0x80
[ 41.895895][ T1164] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 41.896687][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.897611][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.898508][ T1164] ? __module_text_address+0x13/0x140
[ 41.899162][ T1164] notifier_call_chain+0x90/0x160
[ 41.899814][ T1164] rollback_registered_many+0x660/0xcf0
[ 41.900544][ T1164] ? netif_set_real_num_tx_queues+0x780/0x780
[ 41.901316][ T1164] ? __lock_acquire+0xdfe/0x3de0
[ 41.901958][ T1164] ? memset+0x1f/0x40
[ 41.902468][ T1164] ? __nla_validate_parse+0x98/0x1ab0
[ 41.903166][ T1164] unregister_netdevice_many.part.133+0x13/0x1b0
[ 41.903988][ T1164] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:43 +0000 (12:25 +0000)]
net: rmnet: use upper/lower device infrastructure
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
This function also internally checks for loops and for the maximum
nesting depth.
All or most virtual interfaces that could have a real interface
(e.g. macsec, macvlan, ipvlan etc.) use the lower/upper infrastructure.
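
The two checks mentioned above (loops and maximum depth) can be sketched with a simplified user-space model. It is not the kernel implementation: each device has at most one lower device here, only the chain below the new upper device is measured, and the depth limit of 8 and all names are assumptions of this sketch.

```c
#include <stddef.h>

#define MODEL_MAX_DEPTH 8 /* assumed nesting limit for this sketch */

/* Simplified device: at most one lower device. */
struct model_dev {
	struct model_dev *lower;
};

/* Link 'upper' on top of 'lower'.
 * Returns 0 on success, -1 if 'upper' is already linked, if the link would
 * create a loop, or if the resulting chain would be deeper than the limit. */
int model_upper_dev_link(struct model_dev *upper, struct model_dev *lower)
{
	const struct model_dev *d;
	int depth = 1; /* counts 'upper' itself */

	if (upper->lower)
		return -1;

	for (d = lower; d; d = d->lower) {
		if (d == upper)
			return -1; /* 'upper' is already below 'lower': loop */
		if (++depth > MODEL_MAX_DEPTH)
			return -1; /* chain would become too deep */
	}

	upper->lower = lower;
	return 0;
}
```

Stacking devices one on top of another, as the test commands below do with rmnet interfaces, is rejected in this model as soon as the chain would exceed the limit, instead of growing until a recursive walk overflows the stack.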
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet1 link dummy0 type rmnet mux_id 1
for i in {2..100}
do
    let A=$i-1
    ip link add rmnet$i link rmnet$A type rmnet mux_id $i
done
ip link del dummy0
The purpose of the test commands is to trigger a stack overflow.
Splat looks like:
[ 52.411438][ T1395] BUG: KASAN: slab-out-of-bounds in find_busiest_group+0x27e/0x2c00
[ 52.413218][ T1395] Write of size 64 at addr ffff8880c774bde0 by task ip/1395
[ 52.414841][ T1395]
[ 52.430720][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.496511][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.513597][ T1395] Call Trace:
[ 52.546516][ T1395]
[ 52.558773][ T1395] Allocated by task 3171537984:
[ 52.588290][ T1395] BUG: unable to handle page fault for address: ffffffffb999e260
[ 52.589311][ T1395] #PF: supervisor read access in kernel mode
[ 52.590529][ T1395] #PF: error_code(0x0000) - not-present page
[ 52.591374][ T1395] PGD d6818067 P4D d6818067 PUD d6819063 PMD 0
[ 52.592288][ T1395] Thread overran stack, or stack corrupted
[ 52.604980][ T1395] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 52.605856][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.611764][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.621520][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.622296][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 52.627887][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006
[ 52.628735][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000
[ 52.631773][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0
[ 52.649584][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403
[ 52.674857][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0
[ 52.678257][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000
[ 52.694541][ T1395] FS: 00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 52.764039][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 52.815008][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0
[ 52.862312][ T1395] Call Trace:
[ 52.887133][ T1395] Modules linked in: dummy rmnet veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_dex
[ 52.936749][ T1395] CR2: ffffffffb999e260
[ 52.965695][ T1395] ---[ end trace 7e32ca99482dbb31 ]---
[ 52.966556][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.971083][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 53.003650][ T1395] RSP: 0018:ffff8880c774bb60 EFLAGS: 00010006
[ 53.043183][ T1395] RAX: 00000000001f8880 RBX: ffff8880c774d140 RCX: 0000000000000000
[ 53.076480][ T1395] RDX: 000000000000001d RSI: ffff8880c774bb68 RDI: 0000000000003ff0
[ 53.093858][ T1395] RBP: ffffea00031dd200 R08: ffffed101b43e403 R09: ffffed101b43e403
[ 53.112795][ T1395] R10: 0000000000000001 R11: ffffed101b43e402 R12: ffff8880d900e5c0
[ 53.139837][ T1395] R13: ffff8880c774c000 R14: 0000000000000000 R15: dffffc0000000000
[ 53.141500][ T1395] FS: 00007fe867f6e0c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 53.143343][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.152007][ T1395] CR2: ffffffffb999e260 CR3: 00000000c26aa005 CR4: 00000000000606e0
[ 53.156459][ T1395] Kernel panic - not syncing: Fatal exception
[ 54.213570][ T1395] Shutting down cpus with NMI
[ 54.354112][ T1395] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 54.355687][ T1395] Rebooting in 5 seconds..
Fixes: b37f78f234bf ("net: qualcomm: rmnet: Fix crash on real dev unregistration")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:19 +0000 (12:25 +0000)]
net: rmnet: do not allow to change mux id if mux id is duplicated
Basically, a duplicate mux id isn't allowed.
So, the creation of an rmnet interface fails if its mux id
already exists.
But the changelink routine doesn't check for a duplicate mux id.
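
The missing check can be illustrated with a small user-space model in which the create path and the change path both consult the same lookup. The fixed-size table and all names are hypothetical; the rmnet driver keeps its endpoints in a different structure.

```c
#include <stddef.h>

#define MODEL_MAX_LINKS 8

/* Toy port: a fixed table of links, each with its own mux id. */
struct model_port {
	int in_use[MODEL_MAX_LINKS];
	unsigned int mux_id[MODEL_MAX_LINKS];
};

/* Returns 1 if any existing link already carries 'mux_id'. */
int model_mux_id_taken(const struct model_port *p, unsigned int mux_id)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_LINKS; i++)
		if (p->in_use[i] && p->mux_id[i] == mux_id)
			return 1;
	return 0;
}

/* Create path: refuses a mux id that is already in use. */
int model_newlink(struct model_port *p, size_t slot, unsigned int mux_id)
{
	if (slot >= MODEL_MAX_LINKS || p->in_use[slot] ||
	    model_mux_id_taken(p, mux_id))
		return -1;
	p->in_use[slot] = 1;
	p->mux_id[slot] = mux_id;
	return 0;
}

/* Change path: must repeat the same duplicate check before updating. */
int model_changelink(struct model_port *p, size_t slot, unsigned int mux_id)
{
	if (slot >= MODEL_MAX_LINKS || !p->in_use[slot])
		return -1;
	if (p->mux_id[slot] == mux_id)
		return 0; /* nothing to change */
	if (model_mux_id_taken(p, mux_id))
		return -1;
	p->mux_id[slot] = mux_id;
	return 0;
}
```

In this model, creating links with mux ids 1 and 2 and then changing the second one to 1 is rejected, which mirrors the sequence in the test commands below.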
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add rmnet1 link dummy0 type rmnet mux_id 2
ip link set rmnet1 type rmnet mux_id 1
Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:05 +0000 (12:25 +0000)]
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
The notifier_call() of the slave interface removes the rmnet interface with
unregister_netdevice_queue().
But, before calling unregister_netdevice_queue(), it acquires
the RCU read lock.
Sleeping isn't allowed inside an RCU read-side critical section.
But unregister_netdevice_queue() internally calls synchronize_net(),
which may sleep.
So, a suspicious RCU usage warning occurs.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set dummy1 master rmnet0
ip link del dummy0
Splat looks like:
[ 79.639245][ T1195] =============================
[ 79.640134][ T1195] WARNING: suspicious RCU usage
[ 79.640852][ T1195] 5.6.0-rc1+ #447 Not tainted
[ 79.641657][ T1195] -----------------------------
[ 79.642472][ T1195] ./include/linux/rcupdate.h:273 Illegal context switch in RCU read-side critical section!
[ 79.644043][ T1195]
[ 79.644043][ T1195] other info that might help us debug this:
[ 79.644043][ T1195]
[ 79.645682][ T1195]
[ 79.645682][ T1195] rcu_scheduler_active = 2, debug_locks = 1
[ 79.646980][ T1195] 2 locks held by ip/1195:
[ 79.647629][ T1195] #0: ffffffffa3cf64f0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
[ 79.649312][ T1195] #1: ffffffffa39256c0 (rcu_read_lock){....}, at: rmnet_config_notify_cb+0xf0/0x590 [rmnet]
[ 79.651717][ T1195]
[ 79.651717][ T1195] stack backtrace:
[ 79.652650][ T1195] CPU: 3 PID: 1195 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 79.653702][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 79.655037][ T1195] Call Trace:
[ 79.655560][ T1195] dump_stack+0x96/0xdb
[ 79.656252][ T1195] ___might_sleep+0x345/0x440
[ 79.656994][ T1195] synchronize_net+0x18/0x30
[ 79.661132][ T1195] netdev_rx_handler_unregister+0x40/0xb0
[ 79.666266][ T1195] rmnet_unregister_real_device+0x42/0xb0 [rmnet]
[ 79.667211][ T1195] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 79.668121][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.669166][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.670286][ T1195] ? __module_text_address+0x13/0x140
[ 79.671139][ T1195] notifier_call_chain+0x90/0x160
[ 79.671973][ T1195] rollback_registered_many+0x660/0xcf0
[ 79.672893][ T1195] ? netif_set_real_num_tx_queues+0x780/0x780
[ 79.675091][ T1195] ? __lock_acquire+0xdfe/0x3de0
[ 79.675825][ T1195] ? memset+0x1f/0x40
[ 79.676367][ T1195] ? __nla_validate_parse+0x98/0x1ab0
[ 79.677290][ T1195] unregister_netdevice_many.part.133+0x13/0x1b0
[ 79.678163][ T1195] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:45 +0000 (12:24 +0000)]
net: rmnet: fix suspicious RCU usage
rmnet_get_port() internally calls rcu_dereference_rtnl(),
which checks RTNL.
But rmnet_get_port() can also be called from the packet path,
which is not protected by RTNL.
So, the suspicious RCU usage problem occurs.
Test commands:
modprobe rmnet
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link set veth1 netns nst
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set rmnet1 up
ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1
ip link set veth0 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ping 192.168.100.2
Splat looks like:
[ 146.630958][ T1174] WARNING: suspicious RCU usage
[ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted
[ 146.632387][ T1174] -----------------------------
[ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() !
[ 146.634742][ T1174]
[ 146.634742][ T1174] other info that might help us debug this:
[ 146.634742][ T1174]
[ 146.645992][ T1174]
[ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1
[ 146.646937][ T1174] 5 locks held by ping/1174:
[ 146.647609][ T1174] #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980
[ 146.662463][ T1174] #1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150
[ 146.671696][ T1174] #2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940
[ 146.673064][ T1174] #3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150
[ 146.690358][ T1174] #4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020
[ 146.699875][ T1174]
[ 146.699875][ T1174] stack backtrace:
[ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447
[ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 146.706565][ T1174] Call Trace:
[ 146.707102][ T1174] dump_stack+0x96/0xdb
[ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet]
[ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet]
[ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020
[ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet]
[ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740
[ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020
[ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0
[ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0
[ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940
[ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0
[ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:26 +0000 (12:24 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
rmnet_changelink() uses IFLA_LINK without checking for a
NULL pointer.
tb[IFLA_LINK] could be a NULL pointer.
So, a NULL-ptr-deref could occur.
rmnet already has a lower interface (real_dev).
So, after this patch, rmnet_changelink() does not use IFLA_LINK anymore.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set rmnet0 type rmnet mux_id 2
Splat looks like:
[ 90.578726][ T1131] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 90.581121][ T1131] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 90.582380][ T1131] CPU: 2 PID: 1131 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 90.584285][ T1131] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 90.587506][ T1131] RIP: 0010:rmnet_changelink+0x5a/0x8a0 [rmnet]
[ 90.588546][ T1131] Code: 83 ec 20 48 c1 ea 03 80 3c 02 00 0f 85 6f 07 00 00 48 8b 5e 28 48 b8 00 00 00 00 00 0
[ 90.591447][ T1131] RSP: 0018:ffff8880ce78f1b8 EFLAGS: 00010247
[ 90.592329][ T1131] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff8880ce78f8b0
[ 90.593253][ T1131] RDX: 0000000000000000 RSI: ffff8880ce78f4a0 RDI: 0000000000000004
[ 90.594058][ T1131] RBP: ffff8880cf543e00 R08: 0000000000000002 R09: 0000000000000002
[ 90.594859][ T1131] R10: ffffffffc0586a40 R11: 0000000000000000 R12: ffff8880ca47c000
[ 90.595690][ T1131] R13: ffff8880ca47c000 R14: ffff8880cf545000 R15: 0000000000000000
[ 90.596553][ T1131] FS: 00007f21f6c7e0c0(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000
[ 90.597504][ T1131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 90.599418][ T1131] CR2: 0000556e413db458 CR3: 00000000c917a002 CR4: 00000000000606e0
[ 90.600289][ T1131] Call Trace:
[ 90.600631][ T1131] __rtnl_newlink+0x922/0x1270
[ 90.601194][ T1131] ? lock_downgrade+0x6e0/0x6e0
[ 90.601724][ T1131] ? rtnl_link_unregister+0x220/0x220
[ 90.602309][ T1131] ? lock_acquire+0x164/0x3b0
[ 90.602784][ T1131] ? is_bpf_image_address+0xff/0x1d0
[ 90.603331][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.603810][ T1131] ? kernel_text_address+0x111/0x140
[ 90.604419][ T1131] ? __kernel_text_address+0xe/0x30
[ 90.604981][ T1131] ? unwind_get_return_address+0x5f/0xa0
[ 90.605616][ T1131] ? create_prof_cpu_mask+0x20/0x20
[ 90.606304][ T1131] ? arch_stack_walk+0x83/0xb0
[ 90.606985][ T1131] ? stack_trace_save+0x82/0xb0
[ 90.607656][ T1131] ? stack_trace_consume_entry+0x160/0x160
[ 90.608503][ T1131] ? deactivate_slab.isra.78+0x2c5/0x800
[ 90.609336][ T1131] ? kasan_unpoison_shadow+0x30/0x40
[ 90.610096][ T1131] ? kmem_cache_alloc_trace+0x135/0x350
[ 90.610889][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.611512][ T1131] rtnl_newlink+0x65/0x90
[ ... ]
Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:23:52 +0000 (12:23 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
rmnet registers the IFLA_LINK interface as a lower interface.
But IFLA_LINK could be NULL.
In the current code, rmnet doesn't check IFLA_LINK.
So, a panic would occur.
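
The kind of guard this calls for can be shown with a user-space model of an optional attribute table. The enum, struct and function names are hypothetical stand-ins; only the pattern of testing the table slot before dereferencing it reflects the description above.

```c
#include <stddef.h>

/* Hypothetical attribute indices, standing in for netlink attribute types. */
enum { MODEL_ATTR_UNSPEC, MODEL_ATTR_LINK, MODEL_ATTR_MAX };

struct model_attr {
	int value; /* e.g. the ifindex of the lower device */
};

/* Returns the link value, or -1 when userspace did not supply the
 * optional attribute. Without the NULL test the function would
 * dereference a NULL pointer in that case. */
int model_newlink_get_link(struct model_attr *const tb[])
{
	if (!tb[MODEL_ATTR_LINK])
		return -1;
	return tb[MODEL_ATTR_LINK]->value;
}
```

The test command below omits the "link" argument, which corresponds to the slot being NULL in this model.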
Test commands:
modprobe rmnet
ip link add rmnet0 type rmnet mux_id 1
Splat looks like:
[ 36.826109][ T1115] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 36.838817][ T1115] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 36.839908][ T1115] CPU: 1 PID: 1115 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 36.840569][ T1115] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 36.841408][ T1115] RIP: 0010:rmnet_newlink+0x54/0x510 [rmnet]
[ 36.841986][ T1115] Code: 83 ec 18 48 c1 e9 03 80 3c 01 00 0f 85 d4 03 00 00 48 8b 6a 28 48 b8 00 00 00 00 00 c
[ 36.843923][ T1115] RSP: 0018:ffff8880b7e0f1c0 EFLAGS: 00010247
[ 36.844756][ T1115] RAX: dffffc0000000000 RBX: ffff8880d14cca00 RCX: 1ffff11016fc1e99
[ 36.845859][ T1115] RDX: 0000000000000000 RSI: ffff8880c3d04000 RDI: 0000000000000004
[ 36.846961][ T1115] RBP: 0000000000000000 R08: ffff8880b7e0f8b0 R09: ffff8880b6ac2d90
[ 36.848020][ T1115] R10: ffffffffc0589a40 R11: ffffed1016d585b7 R12: ffffffff88ceaf80
[ 36.848788][ T1115] R13: ffff8880c3d04000 R14: ffff8880b7e0f8b0 R15: ffff8880c3d04000
[ 36.849546][ T1115] FS: 00007f50ab3360c0(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 36.851784][ T1115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.852422][ T1115] CR2: 000055871afe5ab0 CR3: 00000000ae246001 CR4: 00000000000606e0
[ 36.853181][ T1115] Call Trace:
[ 36.853514][ T1115] __rtnl_newlink+0xbdb/0x1270
[ 36.853967][ T1115] ? lock_downgrade+0x6e0/0x6e0
[ 36.854420][ T1115] ? rtnl_link_unregister+0x220/0x220
[ 36.854936][ T1115] ? lock_acquire+0x164/0x3b0
[ 36.855376][ T1115] ? is_bpf_image_address+0xff/0x1d0
[ 36.855884][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.856304][ T1115] ? kernel_text_address+0x111/0x140
[ 36.856857][ T1115] ? __kernel_text_address+0xe/0x30
[ 36.857440][ T1115] ? unwind_get_return_address+0x5f/0xa0
[ 36.858063][ T1115] ? create_prof_cpu_mask+0x20/0x20
[ 36.858644][ T1115] ? arch_stack_walk+0x83/0xb0
[ 36.859171][ T1115] ? stack_trace_save+0x82/0xb0
[ 36.859710][ T1115] ? stack_trace_consume_entry+0x160/0x160
[ 36.860357][ T1115] ? deactivate_slab.isra.78+0x2c5/0x800
[ 36.860928][ T1115] ? kasan_unpoison_shadow+0x30/0x40
[ 36.861520][ T1115] ? kmem_cache_alloc_trace+0x135/0x350
[ 36.862125][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.864073][ T1115] rtnl_newlink+0x65/0x90
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:26:33 +0000 (11:26 -0800)]
Merge tag 'kbuild-fixes-v5.6-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix missed rebuild of DT schema check
- add some phony targets to PHONY
- fix comments and documents
* tag 'kbuild-fixes-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: get rid of trailing slash from subdir- example
kbuild: add dt_binding_check to PHONY in a correct place
kbuild: add dtbs_check to PHONY
kbuild: remove unneeded semicolon at the end of cmd_dtb_check
kbuild: fix DT binding schema rule to detect command line changes
kbuild: remove wrong documentation about mandatory-y
kbuild: add comment for V=2 mode
Russell King [Thu, 27 Feb 2020 09:44:49 +0000 (09:44 +0000)]
net: phy: marvell: don't interpret PHY status unless resolved
Don't attempt to interpret the PHY specific status register unless
the PHY is indicating that the resolution is valid.
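
A minimal sketch of this guard, assuming a hypothetical status register layout (the bit positions and all names below are illustrative and are not claimed to match the Marvell register definitions):

```c
#include <stdint.h>

/* Hypothetical status register layout for this sketch. */
#define MODEL_STATUS_SPEED_MASK  0xc000u
#define MODEL_STATUS_SPEED_1000  0x8000u
#define MODEL_STATUS_SPEED_100   0x4000u
#define MODEL_STATUS_FULL_DUPLEX 0x2000u
#define MODEL_STATUS_RESOLVED    0x0800u

struct model_link {
	int valid;       /* 1 only if the status was resolved */
	int speed;       /* Mb/s, 0 when not valid */
	int full_duplex; /* 0 when not valid */
};

/* Decode speed and duplex only when the resolved bit is set; otherwise
 * the other fields carry no meaning and are left at zero. */
struct model_link model_decode_status(uint16_t status)
{
	struct model_link l = { 0, 0, 0 };

	if (!(status & MODEL_STATUS_RESOLVED))
		return l;

	l.valid = 1;
	l.full_duplex = !!(status & MODEL_STATUS_FULL_DUPLEX);

	switch (status & MODEL_STATUS_SPEED_MASK) {
	case MODEL_STATUS_SPEED_1000:
		l.speed = 1000;
		break;
	case MODEL_STATUS_SPEED_100:
		l.speed = 100;
		break;
	default:
		/* remaining encodings are treated as 10 Mb/s in this sketch */
		l.speed = 10;
		break;
	}
	return l;
}
```

A caller would keep its previous or unknown link parameters whenever the returned valid flag is 0, rather than trusting speed and duplex bits that have not been resolved yet.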
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 27 Feb 2020 07:22:10 +0000 (08:22 +0100)]
mlx5: register lag notifier for init network namespace only
The current code causes problems when the unregistering netdevice
differs from the registering one.
Since the check in mlx5_lag_netdev_event() does not allow any other
network namespace anyway, fix this by registering the lag notifier
per init network namespace only.
Fixes: d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Aya Levin <ayal@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:13:27 +0000 (11:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid
Pull HID subsystem fixes from Jiri Kosina:
- syzkaller-reported error handling fixes in various drivers, from
various people
- increase of HID report buffer size to 8K, which is apparently needed
by certain modern devices
- a few new device-ID-specific fixes / quirks
- battery charging status reporting fix in logitech-hidpp, from Filipe
Laíns
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: hid-bigbenff: fix race condition for scheduled work during removal
HID: hid-bigbenff: call hid_hw_stop() in case of error
HID: hid-bigbenff: fix general protection fault caused by double kfree
HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
HID: alps: Fix an error handling path in 'alps_input_configured()'
HID: hiddev: Fix race in in hiddev_disconnect()
HID: core: increase HID report buffer size to 8KiB
HID: core: fix off-by-one memset in hid_report_raw_event()
HID: apple: Add support for recent firmware on Magic Keyboards
HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock
HID: logitech-hidpp: BatteryVoltage: only read chargeStatus if extPower is active
Tobias Klauser [Wed, 26 Feb 2020 17:29:53 +0000 (18:29 +0100)]
unix: define and set show_fdinfo only if procfs is enabled
Follow the pattern used with other *_show_fdinfo functions and only
define unix_show_fdinfo and set it in proto_ops if CONFIG_PROCFS
is set.
Fixes: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:08:01 +0000 (11:08 -0800)]
Merge branch 'hinic-BugFixes'
Luo bin says:
====================
hinic: BugFixes
the bug fixed in patch #2 has been present since the first commit.
the bugs fixed in patch #1 and patch #3 have been present since the
following commits:
patch #1:
352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
patch #3:
421e9526288b ("hinic: add rss support")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:44 +0000 (06:34 +0000)]
hinic: fix a bug of rss configuration
should use real receive queue number to configure hw rss
indirect table rather than maximal queue number
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:43 +0000 (06:34 +0000)]
hinic: fix a bug of setting hw_ioctxt
a reserved field is used to signify prime physical function index
in the latest firmware version, so we must assign a value to it
correctly
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:42 +0000 (06:34 +0000)]
hinic: fix a irq affinity bug
can not use a local variable as an input parameter of
irq_set_affinity_hint
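The lifetime problem described above can be sketched in user space. This is a minimal illustration, assuming the callee keeps the pointer it is given rather than copying the mask; `set_affinity_hint()`, `struct rxq` and `rxq_request_irq()` are hypothetical stand-ins, not the hinic driver code or the kernel API.

```c
/* Hypothetical stand-in for an API that stores the caller's pointer. */
static const unsigned long *saved_hint;

static void set_affinity_hint(const unsigned long *mask)
{
	saved_hint = mask;	/* only the pointer is kept, not a copy */
}

/* Hypothetical per-queue state that lives as long as the IRQ does. */
struct rxq {
	unsigned long affinity_mask;
};

static void rxq_request_irq(struct rxq *q, unsigned int cpu)
{
	/*
	 * Store the mask in the long-lived queue structure. Passing the
	 * address of a local variable instead would leave saved_hint
	 * dangling as soon as this function returns.
	 */
	q->affinity_mask = 1UL << cpu;
	set_affinity_hint(&q->affinity_mask);
}
```

The point of the sketch is only that the mask must outlive the registration, so it belongs in per-queue state rather than on the stack.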
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:07:13 +0000 (11:07 -0800)]
Merge tag 'docs-5.6-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A pair of docs-build fixes"
* tag 'docs-5.6-fixes' of git://git.lwn.net/linux:
docs: Fix empty parallelism argument
docs: remove MPX from the x86 toc
Linus Torvalds [Thu, 27 Feb 2020 19:01:22 +0000 (11:01 -0800)]
Merge tag 'audit-pr-20200226' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two fixes for problems found by syzbot:
- Moving audit filter structure fields into a union caused some
problems in the code which populates that filter structure.
We keep the union (that idea is a good one), but we are fixing the
code so that it doesn't needlessly set fields in the union and mess
up the error handling.
- The audit_receive_msg() function wasn't validating user input as
well as it should in all cases, we add the necessary checks"
* tag 'audit-pr-20200226' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: always check the netlink payload length in audit_receive_msg()
audit: fix error handling in audit_data_to_entry()
Karsten Graul [Wed, 26 Feb 2020 16:52:46 +0000 (17:52 +0100)]
net/smc: check for valid ib_client_data
In smc_ib_remove_dev() check if the provided ib device was actually
initialized for SMC before.
Reported-by: syzbot+84484ccebdd4e5451d91@syzkaller.appspotmail.com
Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 26 Feb 2020 16:49:01 +0000 (18:49 +0200)]
net: stmmac: fix notifier registration
We cannot register the same netdev notifier multiple times when probing
stmmac devices. Register the notifier only once in module init, and also
make debugfs creation/deletion safe against simultaneous notifier call.
Fixes: 481a7d154cbb ("stmmac: debugfs entry name is not be changed when udev rename device name.")
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 26 Feb 2020 15:26:50 +0000 (16:26 +0100)]
net: phy: mscc: fix firmware paths
The firmware paths for the VSC8584 PHYs do not contain the leading
'microchip/' directory, as used in linux-firmware, resulting in an
error when probing the driver. This patch fixes it.
Fixes: a5afc1678044 ("net: phy: mscc: add support for VSC8584 PHY")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 26 Feb 2020 11:19:03 +0000 (12:19 +0100)]
mptcp: add dummy icsk_sync_mss()
syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
callback, and was able to trigger a null pointer dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
FS: 0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
smack_netlabel security/smack/smack_lsm.c:2425 [inline]
smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
security_inode_setsecurity+0xb2/0x140 security/security.c:1364
__vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
vfs_setxattr fs/xattr.c:224 [inline]
setxattr+0x335/0x430 fs/xattr.c:451
__do_sys_fsetxattr fs/xattr.c:506 [inline]
__se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
__x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440199
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
CR2: 0000000000000000
Address the issue by adding a dummy icsk_sync_mss callback.
To properly sync the subflows mss and options list we need some
additional infrastructure, which will land to net-next.
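The failure mode and the fix can be illustrated with a small user-space sketch: a caller invokes an operations-table callback unconditionally, so an unset pointer means a jump to address 0, while a no-op callback keeps the call safe. The names `struct conn_ops`, `dummy_sync_mss()` and `call_sync_mss()` are hypothetical and only mirror the shape of the problem; they are not the MPTCP code.

```c
#include <stddef.h>

/* Hypothetical operations table with one mandatory callback. */
struct conn_ops {
	unsigned int (*sync_mss)(void *sk, unsigned int pmtu);
};

/* No-op callback: does no MSS synchronization and reports 0. */
static unsigned int dummy_sync_mss(void *sk, unsigned int pmtu)
{
	(void)sk;
	(void)pmtu;
	return 0;
}

/* The caller does not check for NULL, so every table must set the hook. */
static unsigned int call_sync_mss(const struct conn_ops *ops, void *sk,
				  unsigned int pmtu)
{
	return ops->sync_mss(sk, pmtu);
}
```

Installing a dummy keeps callers free of NULL checks, at the cost of silently doing nothing until the real synchronization logic exists.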
Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudheesh Mavila [Wed, 26 Feb 2020 07:10:45 +0000 (12:40 +0530)]
net: phy: corrected the return value for genphy_check_and_restart_aneg and genphy_c45_check_and_restart_aneg
When auto-negotiation is not required, return value should be zero.
Changes v1->v2:
- improved comments and code as suggested by Andrew Lunn and Heiner Kallweit
- fixed issue in genphy_c45_check_and_restart_aneg as suggested by
Russell King.
Fixes: 2a10ab043ac5 ("net: phy: add genphy_check_and_restart_aneg()")
Fixes: 1af9f16840e9 ("net: phy: add genphy_c45_check_and_restart_aneg()")
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
yangerkun [Wed, 26 Feb 2020 03:54:35 +0000 (11:54 +0800)]
slip: not call free_netdev before rtnl_unlock in slip_open
As described in the comment before netdev_run_todo, we cannot call
free_netdev before rtnl_unlock; fix it by reordering the code.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 25 Feb 2020 19:52:29 +0000 (11:52 -0800)]
ipv6: restrict IPV6_ADDRFORM operation
IPV6_ADDRFORM is able to transform IPv6 socket to IPv4 one.
While this operation sounds illogical, we have to support it.
One of the things it does for TCP socket is to switch sk->sk_prot
to tcp_prot.
We now have other layers playing with sk->sk_prot, so we should make
sure to not interfere with them.
This patch makes sure sk_prot is the default pointer for TCP IPv6 socket.
syzbot reported :
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
__sock_release net/socket.c:605 [inline]
sock_close+0xe1/0x260 net/socket.c:1283
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x176/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45c429
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
Modules linked in:
CR2: 0000000000000000
---[ end trace 82567b5207e87bae ]---
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Tue, 25 Feb 2020 15:34:36 +0000 (16:34 +0100)]
net/smc: fix cleanup for linkgroup setup failures
If an SMC connection to a certain peer is set up for the first time,
a new linkgroup is created. In case of setup failures, such a
linkgroup is unusable and should disappear. As a first step the
linkgroup is removed from the linkgroup list in smc_lgr_forget().
There are 2 problems:
* smc_listen_decline() might be called before linkgroup creation
resulting in a crash due to calling smc_lgr_forget() with
parameter NULL.
* If a setup failure occurs after linkgroup creation, the connection
is never unregistered from the linkgroup, preventing linkgroup
freeing.
This patch introduces an enhanced smc_lgr_cleanup_early() function
which
* contains a linkgroup check for early smc_listen_decline()
invocations
* invokes smc_conn_free() to guarantee unregistering of the
connection.
* schedules fast linkgroup removal of the unusable linkgroup
And the unused function smcd_conn_free() is removed from smc_core.h.
Fixes: 3b2dec2603d5b ("net/smc: restructure client and server code in af_smc")
Fixes: 2a0674fffb6bc ("net/smc: improve abnormal termination of link groups")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Saenz Julienne [Tue, 25 Feb 2020 13:11:59 +0000 (14:11 +0100)]
net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not needed
Outdated Raspberry Pi 4 firmware might configure the external PHY as
rgmii although the kernel currently sets it as rgmii-rxid. This makes
connections unreliable as ID_MODE_DIS is left enabled. To avoid this,
explicitly clear that bit whenever we don't need it.
Fixes: da38802211cc ("net: bcmgenet: Add RGMII_RXID support")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 25 Feb 2020 12:54:12 +0000 (13:54 +0100)]
sched: act: count in the size of action flags bitfield
The put of the flags was added by the commit referenced in fixes tag,
however the size of the message was not extended accordingly.
Fix this by adding size of the flags bitfield to the message size.
Fixes: e38226786022 ("net: sched: update action implementations to support flags")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahiro Yamada [Wed, 26 Feb 2020 17:44:58 +0000 (02:44 +0900)]
kbuild: get rid of trailing slash from subdir- example
obj-* needs a trailing slash for a directory, but subdir-* does not.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Madhuparna Bhowmik [Tue, 25 Feb 2020 12:27:45 +0000 (17:57 +0530)]
net: core: devlink.c: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
The devlink->lock is held when devlink_dpipe_table_find()
is called in non RCU read side section. Therefore, pass struct devlink
to devlink_dpipe_table_find() for lockdep checking.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 24 Feb 2020 23:56:32 +0000 (15:56 -0800)]
net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
We are still experiencing some packet loss with the existing advanced
congestion buffering (ACB) settings with the IMP port configured for
2Gb/sec, so revert to conservative link speeds that do not produce
packet loss until this is resolved.
Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Fixes: de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 00:30:17 +0000 (16:30 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes:
1) Perform garbage collection from workqueue to fix rcu detected
stall in ipset hash set types, from Jozsef Kadlecsik.
2) Fix the forceadd evaluation path, also from Jozsef.
3) Fix nft_set_pipapo selftest, from Stefano Brivio.
4) Crash when add-flush-add element in pipapo set, also from Stefano.
Add test to cover this crash.
5) Remove sysctl entry under mutex in hashlimit, from Cong Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Feb 2020 23:54:52 +0000 (15:54 -0800)]
Merge tag 'tag-chrome-platform-fixes-for-v5.6-rc4' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
"Fix a build warning"
* tag 'tag-chrome-platform-fixes-for-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: wilco_ec: Include asm/unaligned instead of linux/ path
Jonathan Lemon [Mon, 24 Feb 2020 23:29:09 +0000 (15:29 -0800)]
bnxt_en: add newline to netdev_*() format strings
Add missing newlines to netdev_* format strings so the lines
aren't buffered by the printk subsystem.
Nitpicked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 13 Feb 2020 06:53:52 +0000 (22:53 -0800)]
netfilter: xt_hashlimit: unregister proc file before releasing mutex
Before releasing the global mutex, we only unlink the hashtable
from the hash list, its proc file is still not unregistered at
this point. So syzbot could trigger a race condition where a
parallel htable_create() could register the same file immediately
after the mutex is released.
Move htable_remove_proc_entry() back to mutex protection to
fix this. And, fold htable_destroy() into htable_put() to make
the code slightly easier to understand.
Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Michal Kubecek [Mon, 24 Feb 2020 19:42:12 +0000 (20:42 +0100)]
ethtool: limit bitset size
Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
attributes and even the message by passing a value between (u32)(-31)
and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.
The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
attribute "payload".
Prevent this overflow by limiting the bitset size. Technically, compact
bitset format would allow bitset sizes up to almost 2^18 (so that the
nest size does not exceed U16_MAX) but bitsets used by ethtool are much
shorter. S16_MAX, the largest value which can be directly used as an
upper limit in policy, should be a reasonable compromise.
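The wraparound described above is easy to reproduce in isolation. The sketch below assumes 32-bit unsigned arithmetic for the rounding, as the description implies; `DIV_ROUND_UP_U32`, `words_for_bits()` and `bitset_size_ok()` are illustrative names, not the ethtool netlink code, and the limit simply mirrors the S16_MAX bound mentioned in the text.

```c
#include <stdint.h>

/* Same shape as DIV_ROUND_UP(), forced into 32-bit unsigned arithmetic. */
#define DIV_ROUND_UP_U32(n, d) \
	((uint32_t)((uint32_t)(n) + (uint32_t)(d) - 1U) / (uint32_t)(d))

/* Number of 32-bit words needed to hold nbits bits. */
static uint32_t words_for_bits(uint32_t nbits)
{
	/* nbits + 31 wraps past zero for nbits in [(u32)-31, (u32)-1]. */
	return DIV_ROUND_UP_U32(nbits, 32U);
}

/* Reject sizes above S16_MAX before any word-count arithmetic is done. */
static int bitset_size_ok(uint32_t nbits)
{
	return nbits <= (uint32_t)INT16_MAX;
}
```

With the size check applied first, the values that make the word count collapse to 0 never reach the length comparison.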
Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling")
Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amritha Nambiar [Mon, 24 Feb 2020 18:56:00 +0000 (10:56 -0800)]
net: Fix Tx hash bound checking
Fixes the lower and upper bounds when there are multiple TCs and
traffic is on the same TC on the same device.
The lower bound is represented by 'qoffset' and the upper limit for
hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
mapping when there are multiple TCs, as the queue indices for upper TCs
will be offset by 'qoffset'.
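As a rough sketch of the bounds described above, the helper below maps a recorded Rx queue index into the half-open range [qoffset, qoffset + qcount). The function name `pick_tx_queue()` and the use of a modulo are assumptions made for illustration; this is not the kernel's skb_tx_hash() implementation.

```c
#include <stdint.h>

/*
 * Map an Rx queue index to a Tx queue inside one traffic class.
 * qoffset is the first queue of the TC, qcount the number of queues in it.
 */
static uint16_t pick_tx_queue(uint16_t rx_queue, uint16_t qoffset,
			      uint16_t qcount)
{
	uint16_t idx = rx_queue;

	if (qcount == 0)
		return qoffset;		/* degenerate TC: nothing to spread over */

	/* Make the index relative to the TC if it already lies above its base. */
	if (idx >= qoffset)
		idx -= qoffset;

	/* Fold into [0, qcount), then shift back into the TC's queue range. */
	idx %= qcount;
	return (uint16_t)(idx + qoffset);
}
```

For a TC with qoffset 4 and qcount 4, every input lands on one of queues 4 to 7, which is the mapping the description calls for.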
v2: Fixed commit description based on comments.
Fixes: 1b837d489e06 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Feb 2020 18:34:42 +0000 (10:34 -0800)]
Merge tag 'trace-v5.6-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing and bootconfig updates:
"Fixes and changes to bootconfig before it goes live in a release.
Change in API of bootconfig (before it comes live in a release):
- Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
exists
- Set CONFIG_BOOT_CONFIG to 'n' by default
- Show error if "bootconfig" on cmdline but not compiled in
- Prevent redefining the same value
- Have a way to append values
- Added a SELECT BLK_DEV_INITRD to fix a build failure
Synthetic event fixes:
- Switch to raw_smp_processor_id() for recording CPU value in preempt
section. (No care for what the value actually is)
- Fix samples always recording u64 values
- Fix endianness
- Check number of values matches number of fields
- Fix a printing bug
Fix of trace_printk() breaking postponed start up tests
Make a function static that is only used in a single file"
* tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
bootconfig: Add append value operator support
bootconfig: Prohibit re-defining value on same key
bootconfig: Print array as multiple commands for legacy command line
bootconfig: Reject subkey and value on same parent key
tools/bootconfig: Remove unneeded error message silencer
bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
bootconfig: Set CONFIG_BOOT_CONFIG=n by default
tracing: Clear trace_state when starting trace
bootconfig: Mark boot_config_checksum() static
tracing: Disable trace_printk() on post poned tests
tracing: Have synthetic event test use raw_smp_processor_id()
tracing: Fix number printing bug in print_synth_event()
tracing: Check that number of vals matches number of synth event fields
tracing: Make synth_event trace functions endian-correct
tracing: Make sure synth_event_trace() example always uses u64
Linus Torvalds [Wed, 26 Feb 2020 18:28:59 +0000 (10:28 -0800)]
Merge tag 'linux-kselftest-kunit-5.6-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kunit fixes from Shuah Khan:
"This Kselftest kunit update consists of fixes to documentation and
the run-time tool from Brendan Higgins and Heidi Fahim"
* tag 'linux-kselftest-kunit-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: run kunit_tool from any directory
kunit: test: Improve error messages for kunit_tool when kunitconfig is invalid
Documentation: kunit: fixed sphinx error in code block
Linus Torvalds [Wed, 26 Feb 2020 18:06:56 +0000 (10:06 -0800)]
Merge tag 'linux-kselftest-5.6-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fixes to TIMEOUT failures and out-of-tree compilation errors
from Michael Ellerman.
- declutter git status fix from Christophe Leroy
* tag 'linux-kselftest-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/rseq: Fix out-of-tree compilation
selftests: Install settings files to fix TIMEOUT failures
selftest/lkdtm: Don't pollute 'git status'
Christoph Hellwig [Wed, 26 Feb 2020 15:39:29 +0000 (07:39 -0800)]
Revert "KVM: x86: enable -Werror"
This reverts commit ead68df94d248c80fdbae220ae5425eb5af2e753.
Using the -Werror flag breaks the build for me due to mostly harmless
KASAN or similar warnings:
arch/x86/kvm/x86.c: In function ‘kvm_timer_init’:
arch/x86/kvm/x86.c:7209:1: error: the frame size of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Feel free to add a CONFIG_WERROR if you care strongly enough, but don't
break people's builds for absolutely no good reason.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 24 Feb 2020 20:47:14 +0000 (12:47 -0800)]
signal: avoid double atomic counter increments for user accounting
When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).
That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.
So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued. That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.
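The counting scheme in the paragraph above can be sketched with C11 atomics. This is a simplified user-space model under stated assumptions: `struct user_acct`, `queue_signal()` and `dequeue_signal()` are hypothetical names, and the RLIMIT_SIGPENDING check and the freeing of the user struct are left out.

```c
#include <stdatomic.h>

/* Hypothetical model of the two counters on the user struct. */
struct user_acct {
	atomic_int refcount;	/* references pinning the user struct */
	atomic_int sigpending;	/* number of pending signals */
};

static void queue_signal(struct user_acct *u)
{
	/*
	 * atomic_fetch_add() returns the previous value, so 0 means this
	 * is the first pending signal and the user must be pinned.
	 */
	if (atomic_fetch_add(&u->sigpending, 1) == 0)
		atomic_fetch_add(&u->refcount, 1);
}

static void dequeue_signal(struct user_acct *u)
{
	/* A previous value of 1 means the last pending signal is gone. */
	if (atomic_fetch_sub(&u->sigpending, 1) == 1)
		atomic_fetch_sub(&u->refcount, 1);
}
```

While any signal is pending, further queue and dequeue operations touch only the `sigpending` counter, which is the reduction in atomic traffic the text describes.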
This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, it seems for some reason to be much worse on that machine
when the contention happens in the upper 32-byte half of the cacheline.
As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).
Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.
Quoting Feng Tang:
"It makes a big difference, that the performance score is tripled! bump
from original 17000 to 54000. Also the gap between 5.0-rc6 and
5.0-rc6+Jiri's patch is reduced to around 2%"
[ The "2% gap" is the odd cacheline placement difference on that
platform: under the extreme contention case, the effect of which half
of the cacheline was hot was 5%, so with the reduced contention the
odd timing artifact is reduced too ]
It does help in the non-contended case too, but is not nearly as
noticeable.
Reported-and-tested-by: Feng Tang <feng.tang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Sat, 22 Feb 2020 19:04:34 +0000 (04:04 +0900)]
kbuild: add dt_binding_check to PHONY in a correct place
The dt_binding_check is added to PHONY, but it is invisible when
$(dtstree) is empty. So, it is not specified as phony for
ARCH=x86 etc.
Add it to PHONY outside the ifneq ... endif block.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Masahiro Yamada [Sat, 22 Feb 2020 19:04:33 +0000 (04:04 +0900)]
kbuild: add dtbs_check to PHONY
The dtbs_check should be a phony target, but currently it is not
specified so.
'make dtbs_check' works even if a file named 'dtbs_check' exists
because it depends on another phony target, scripts_dtc, but we
should not rely on it.
Add dtbs_check to PHONY.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Masahiro Yamada [Sat, 22 Feb 2020 19:04:32 +0000 (04:04 +0900)]
kbuild: remove unneeded semicolon at the end of cmd_dtb_check
This trailing semicolon is unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Masahiro Yamada [Sat, 22 Feb 2020 19:04:31 +0000 (04:04 +0900)]
kbuild: fix DT binding schema rule to detect command line changes
This if_change_rule is not working properly; it cannot detect any
command line change.
The reason is because cmd-check in scripts/Kbuild.include compares
$(cmd_$@) and $(cmd_$1), but cmd_dtc_dt_yaml does not exist here.
For if_change_rule to work properly, the stem part of cmd_* and rule_*
must match. Because this cmd_and_fixdep invokes cmd_dtc, this rule must
be named rule_dtc.
Fixes: 4f0e3a57d6eb ("kbuild: Add support for DT binding schema checks")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Masahiro Yamada [Wed, 19 Feb 2020 01:15:19 +0000 (10:15 +0900)]
kbuild: remove wrong documentation about mandatory-y
This sentence does not make sense in the section about mandatory-y.
This seems to be a copy-paste mistake of commit fcc8487d477a ("uapi:
export all headers under uapi directories").
The correct description would be "The convention is to list one
mandatory-y per line ...".
I just removed it instead of fixing it. If such information is needed,
it could be commented in include/asm-generic/Kbuild and
include/uapi/asm-generic/Kbuild.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Randy Dunlap [Thu, 13 Feb 2020 04:40:57 +0000 (20:40 -0800)]
kbuild: add comment for V=2 mode
Complete the comments for valid values of KBUILD_VERBOSE,
specifically for KBUILD_VERBOSE=2.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Stefano Brivio [Fri, 21 Feb 2020 02:04:22 +0000 (03:04 +0100)]
selftests: nft_concat_range: Add test for reported add/flush/add issue
Add a specific test for the crash reported by Phil Sutter and addressed
in the previous patch. The test cases that, in my intention, should
have covered these cases, that is, the ones from the 'concurrency'
section, don't run these sequences tightly enough and spectacularly
failed to catch this.
While at it, define a convenient way to add this kind of test, by
adding a "reported issues" test section.
It's more convenient, for this particular test, to execute the set
setup in its own function. However, future test cases like this one
might need to call setup functions, and will typically need no tools
other than nft, so allow for this in check_tools().
The original form of the reproducer used here was provided by Phil.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Fri, 21 Feb 2020 02:04:21 +0000 (03:04 +0100)]
nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()
Phil reports that adding elements, flushing and re-adding them
right away:
nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
nft flush set t s
nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
triggers, almost reliably, a crash like this one:
[ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
[ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192
[ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
[ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
[ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
[ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
[ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
[ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
[ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
[ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
[ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
[ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
[ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 71.339718] Call Trace:
[ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
[ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables]
[ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
[ 71.342916] process_one_work+0x1d5/0x3c0
[ 71.343601] worker_thread+0x4a/0x3c0
[ 71.344229] kthread+0xfb/0x130
[ 71.344780] ? process_one_work+0x3c0/0x3c0
[ 71.345477] ? kthread_park+0x90/0x90
[ 71.346129] ret_from_fork+0x35/0x40
[ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
[ 71.348153] ---[ end trace 2eaa8149ca759bcc ]---
[ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
[ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
[ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
[ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
[ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
[ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
[ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
[ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
[ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
[ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 71.350030] Kernel panic - not syncing: Fatal exception
[ 71.350412] Kernel Offset: disabled
[ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---
which is caused by dangling elements that have been deactivated, but
never removed.
On a flush operation, nft_pipapo_walk() walks through all the elements
in the mapping table, which are then deactivated by nft_flush_set(),
one by one, and added to the commit list for removal. Element data is
then freed.
On transaction commit, nft_pipapo_remove() is called, and fails to
remove these elements, leaving stale references in the mapping.
The first symptom of this, revealed by KASan, is a one-byte
use-after-free in subsequent calls to nft_pipapo_walk(), which is
usually not enough to trigger a panic. When stale elements are used
more heavily, though, such as double-free via nft_pipapo_destroy()
as in Phil's case, the problem becomes more noticeable.
The issue comes from the fact that, on a flush operation,
nft_pipapo_remove() won't get the actual key data via elem->key, so
elements to be deleted upon commit won't be found by the lookup via
pipapo_get(), and removal will be skipped. Key data should be fetched
via nft_set_ext_key() instead.
Reported-by: Phil Sutter <phil@nwl.cc>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 26 Feb 2020 12:55:15 +0000 (13:55 +0100)]
Merge branch 'master' of git://blackhole.kfki.hu/nf
Jozsef Kadlecsik says:
====================
ipset patches for nf
The first one is larger than usual, but the issue could not be solved more simply.
Also, it's a resend of the patch I submitted a few days ago, with a one line
fix on top of that: the size of the comment extensions was not taken into
account when reporting the full size of the set.
- Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot
by introducing region locking and using workqueue instead of timer based
gc of timed out entries in hash types of sets in ipset.
- Fix the forceadd evaluation path - the bug was also uncovered by the syzbot.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Masami Hiramatsu [Tue, 25 Feb 2020 14:36:41 +0000 (23:36 +0900)]
bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
Since commit d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
default") also changed CONFIG_BOOTTIME_TRACING to select
CONFIG_BOOT_CONFIG in order to show boot-time tracing on the menu,
it introduced a wrong dependency on BLK_DEV_INITRD, as shown below.
WARNING: unmet direct dependencies detected for BOOT_CONFIG
Depends on [n]: BLK_DEV_INITRD [=n]
Selected by [y]:
- BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]
This makes CONFIG_BOOT_CONFIG select CONFIG_BLK_DEV_INITRD to
fix this error, and makes CONFIG_BOOTTIME_TRACING=n by default, so
that both boot-time tracing and boot configuration are off by default
but still appear on the menu list.
Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2
Fixes: d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jason A. Donenfeld [Tue, 25 Feb 2020 10:05:35 +0000 (18:05 +0800)]
icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=n
The icmpv6_send function has long had a static inline implementation
with an empty body for CONFIG_IPV6=n, so that code calling it doesn't
need to be ifdef'd. The new icmpv6_ndo_send function, which is intended
for drivers as a drop-in replacement with an identical function
signature, should follow the same pattern. Without this patch, drivers
that used to work with CONFIG_IPV6=n now result in a linker error.
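The stub pattern described above can be sketched in plain C as follows. This is only an illustration, not the kernel's code: HAVE_FEATURE_V6 is a hypothetical stand-in for CONFIG_IPV6, and v6_send_error() and driver_drop_packet() are made-up names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical feature switch standing in for CONFIG_IPV6; build with
 * -DHAVE_FEATURE_V6=1 to get the real implementation instead of the stub. */
#ifndef HAVE_FEATURE_V6
#define HAVE_FEATURE_V6 0
#endif

static int v6_errors_sent;   /* counts errors that were really sent */

#if HAVE_FEATURE_V6
static void v6_send_error(int type, int code)
{
    printf("sending error type=%d code=%d\n", type, code);
    v6_errors_sent++;
}
#else
/* Same signature, empty body: callers compile and link unchanged. */
static inline void v6_send_error(int type, int code)
{
    (void)type;
    (void)code;
}
#endif

/* A "driver" that calls the helper without any #ifdef of its own. */
static int driver_drop_packet(void)
{
    v6_send_error(1, 0);
    return v6_errors_sent;
}
```

Because the stub has the same signature as the real function, the caller needs no conditional compilation, and with the feature disabled the call simply compiles to nothing.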
Cc: Chen Zhou <chenzhou10@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 25 Feb 2020 18:14:39 +0000 (10:14 -0800)]
Merge tag 'riscv-for-linux-5.6-rc4' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of RISC-V related fixes that I've collected
and would like to target for 5.6-rc4:
- A fix to set up the PMPs on boot, which allows the kernel to access
memory on systems that don't set up permissive PMPs before getting
to Linux. This only affects machine-mode kernels, which currently
means only NOMMU kernels.
- A fix to avoid enabling supervisor-mode interrupts when running in
machine-mode, also only for NOMMU kernels.
- A pair of fixes to our KASAN support to avoid corrupting memory.
- A gitignore fix.
This boots on QEMU's virt board for me"
* tag 'riscv-for-linux-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: adjust the indent
riscv: allocate a complete page size for each page table
riscv: Fix gitignore
RISC-V: Don't enable all interrupts in trap_init()
riscv: set pmp configuration if kernel is running in M-mode
Linus Torvalds [Tue, 25 Feb 2020 18:09:41 +0000 (10:09 -0800)]
Merge branch 'mips-fixes' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"Here are a few MIPS fixes, and a MAINTAINERS update to hand over MIPS
maintenance to Thomas Bogendoerfer - this will be my final pull
request as MIPS maintainer.
Thanks for your helpful comments, useful corrections & responsiveness
during the time I've fulfilled the role, and I'm sure I'll pop up
elsewhere in the tree somewhere down the line"
* 'mips-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MAINTAINERS: Hand MIPS over to Thomas
MIPS: ingenic: DTS: Fix watchdog nodes
MIPS: X1000: Fix clock of watchdog node.
MIPS: vdso: Wrap -mexplicit-relocs in cc-option
MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()'
MIPS: cavium_octeon: Fix syncw generation.
mips: vdso: add build time check that no 'jalr t9' calls left
MIPS: Disable VDSO time functionality on microMIPS
mips: vdso: fix 'jalr t9' crash in vdso code
Stefano Brivio [Fri, 21 Feb 2020 02:11:56 +0000 (03:11 +0100)]
selftests: nft_concat_range: Move option for 'list ruleset' before command
Before nftables commit fb9cea50e8b3 ("main: enforce options before
commands"), 'nft list ruleset -a' happened to work, but it's wrong
and won't work anymore. Replace it by 'nft -a list ruleset'.
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kees Cook [Sat, 22 Feb 2020 00:02:39 +0000 (16:02 -0800)]
docs: Fix empty parallelism argument
When there was no parallelism (no top-level -j arg and a pre-1.7
sphinx-build), the argument passed would be empty ("") instead of just
being missing, which would (understandably) badly confuse sphinx-build.
Fix this by removing the quotes.
Reported-by: Rafael J. Wysocki <rafael@kernel.org>
Fixes: 51e46c7a4007 ("docs, parallelism: Rearrange how jobserver reservations are made")
Cc: stable@vger.kernel.org # v5.5 only
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Stephen Kitt [Fri, 21 Feb 2020 20:57:33 +0000 (21:57 +0100)]
docs: remove MPX from the x86 toc
MPX was removed in commit 45fc24e89b7c ("x86/mpx: remove MPX from
arch/x86"); this removes the corresponding entry in the x86 toc.
This was suggested by a Sphinx warning.
Signed-off-by: Stephen Kitt <steve@sk2.org>
Fixes: 45fc24e89b7cc ("x86/mpx: remove MPX from arch/x86")
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Paul Burton [Sat, 22 Feb 2020 17:04:17 +0000 (09:04 -0800)]
MAINTAINERS: Hand MIPS over to Thomas
My time with MIPS the company has reached its end, and so at best I'll
have little time to spend on maintaining arch/mips/.
Ralf last authored a patch over 2 years ago, the last time he committed
one is even further back & activity was sporadic for a while before
that. The reality is that he isn't active.
Having a new maintainer with time to do things properly will be
beneficial all round. Thomas Bogendoerfer has been involved in MIPS
development for a long time & has offered to step up as maintainer, so
add Thomas and remove myself & Ralf from the MIPS entry.
Ralf already has an entry in CREDITS to honor his contributions, so this
just adds one for me.
Signed-off-by: Paul Burton <paulburton@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@vger.kernel.org
David S. Miller [Mon, 24 Feb 2020 23:43:38 +0000 (15:43 -0800)]
Merge tag 'mac80211-for-net-2020-02-24' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A few fixes:
* remove a double mutex-unlock
* fix a leak in an error path
* NULL pointer check
* include if_vlan.h where needed
* avoid RCU list traversal when not under RCU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Mon, 24 Feb 2020 21:38:57 +0000 (16:38 -0500)]
audit: always check the netlink payload length in audit_receive_msg()
This patch ensures that we always check the netlink payload length
in audit_receive_msg() before we take any action on the payload
itself.
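As a rough user-space illustration of "check the length before touching the payload", consider the sketch below. It is not the audit code: struct msg_hdr and checked_payload_len() are hypothetical names, and the header layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header, loosely modelled on a netlink header. */
struct msg_hdr {
    uint32_t len;    /* total length claimed by the sender, header included */
    uint16_t type;
};

/*
 * Return the payload length if the message is well formed, or -1 if the
 * claimed length is shorter than the header or longer than what was
 * actually received. The payload must not be read before this check.
 */
static long checked_payload_len(const unsigned char *buf, size_t received)
{
    struct msg_hdr hdr;

    assert(buf != NULL);
    if (received < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));   /* copy out to avoid unaligned access */
    if (hdr.len < sizeof(hdr) || hdr.len > received)
        return -1;
    return (long)(hdr.len - sizeof(hdr));
}
```

The point of routing every caller through one helper is that no message type can reach its handler with an unchecked length.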
Cc: stable@vger.kernel.org
Reported-by: syzbot+399c44bf1f43b8747403@syzkaller.appspotmail.com
Reported-by: syzbot+e4b12d8d202701f08b6d@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Zong Li [Fri, 7 Feb 2020 09:52:45 +0000 (17:52 +0800)]
riscv: adjust the indent
Adjust the indent to match Linux coding style.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Zong Li [Fri, 7 Feb 2020 09:52:44 +0000 (17:52 +0800)]
riscv: allocate a complete page size for each page table
Each page table should be created by allocating a complete page for
it. Otherwise, the content of the page table could be corrupted when a
later memory allocation hands out memory in the middle of the page
table for other use.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Mon, 24 Feb 2020 19:48:17 +0000 (11:48 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, including the fix for CVE-2020-2732 and a few issues found
by 'make W=1'"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: rstify new ioctls in api.rst
KVM: nVMX: Check IO instruction VM-exit conditions
KVM: nVMX: Refactor IO bitmap checks into helper function
KVM: nVMX: Don't emulate instructions in guest mode
KVM: nVMX: Emulate MTF when performing instruction emulation
KVM: fix error handling in svm_hardware_setup
KVM: SVM: Fix potential memory leak in svm_cpu_init()
KVM: apic: avoid calculating pending eoi from an uninitialized val
KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled
KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled
KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE
KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow
KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
kvm/emulate: fix a -Werror=cast-function-type
KVM: x86: fix incorrect comparison in trace event
KVM: nVMX: Fix some obsolete comments and grammar error
KVM: x86: fix missing prototypes
KVM: x86: enable -Werror
Linus Torvalds [Mon, 24 Feb 2020 19:40:23 +0000 (11:40 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a Kconfig-related build error and an integer overflow in
chacha20poly1305"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: chacha20poly1305 - prevent integer overflow on large input
tee: amdtee: amdtee depends on CRYPTO_DEV_CCP_DD
Linus Torvalds [Mon, 24 Feb 2020 19:32:15 +0000 (11:32 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull tmpfs fix from Al Viro:
"Regression from fs_parse series this cycle..."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
tmpfs: deny and force are not huge mount options
Linus Torvalds [Fri, 21 Feb 2020 20:43:35 +0000 (12:43 -0800)]
floppy: check FDC index for errors before assigning it
Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
wait_til_ready().
Which on the face of it can't happen, since as Willy Tarreau points out,
the function does no particular memory access. Except through the FDCS
macro, which just indexes a static allocation through the current fdc,
which is always checked against N_FDC.
Except the checking happens after we've already assigned the value.
The floppy driver is a disgrace (a lot of it going back to my original
horrid "design"), and has no real maintainer. Nobody has the hardware,
and nobody really cares. But it still gets used in virtual environments
because it's one of those things that everybody supports.
The whole thing should be re-written, or at least parts of it should be
seriously cleaned up. The 'current fdc' index, which is used by the
FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
prime example of how not to write code.
But because nobody has the hardware or the motivation, let's just fix up
the immediate problem with a nasty band-aid: test the fdc index before
actually assigning it to the static 'fdc' variable.
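The "validate first, assign afterwards" ordering can be sketched like this. The names are hypothetical stand-ins (N_CTRL for N_FDC, CTRL_STATE for the FDCS macro, select_ctrl() for the code that sets the index), and the sketch does not reproduce the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CTRL 2                 /* hypothetical stand-in for N_FDC */

static int ctrl_state[N_CTRL];   /* static per-controller state */
static int current_ctrl;         /* global index used by the accessor macro */

/* Accessor in the style of FDCS: indexes the array through the global. */
#define CTRL_STATE (ctrl_state[current_ctrl])

/*
 * Validate the candidate index first, and only then publish it to the
 * global. On failure the previous, known-good index is left in place.
 */
static bool select_ctrl(int candidate)
{
    if (candidate < 0 || candidate >= N_CTRL)
        return false;
    current_ctrl = candidate;
    return true;
}
```

With the check done before the assignment, any later use of CTRL_STATE can never index outside the array, even if a caller passes a bad value.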
Reported-by: Jordy Zomer <jordy@simplyhacker.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nikolay Aleksandrov [Mon, 24 Feb 2020 16:46:22 +0000 (18:46 +0200)]
net: bridge: fix stale eth hdr pointer in br_dev_xmit
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
if the packet has the vlan header inside (e.g. bridge with disabled
tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
to extract the vid before filtering which in turn calls pskb_may_pull()
and we may end up with a stale eth pointer. Moreover the cached eth header
pointer will generally be wrong after that operation. Remove the eth header
caching and just use eth_hdr() directly, the compiler does the right thing
and calculates it only once so we don't lose anything.
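A small user-space sketch of why a cached header pointer goes stale: here struct pkt, pkt_hdr() and pkt_grow() are hypothetical stand-ins for the skb, eth_hdr() and a pull-style helper that may move the data, not the kernel's implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; the header is located through an offset. */
struct pkt {
    unsigned char *data;
    size_t len;
    size_t hdr_off;
};

/* Always derive the header pointer from the current buffer. */
static unsigned char *pkt_hdr(const struct pkt *p)
{
    return p->data + p->hdr_off;
}

/*
 * Grow the buffer into fresh storage. The data moves, so any header
 * pointer computed before this call must be treated as stale.
 * Returns 0 on success, -1 on failure (the packet is left untouched).
 */
static int pkt_grow(struct pkt *p, size_t new_len)
{
    unsigned char *fresh;

    assert(p != NULL && p->data != NULL);
    if (new_len < p->len)
        return -1;
    fresh = malloc(new_len);
    if (!fresh)
        return -1;
    memcpy(fresh, p->data, p->len);
    free(p->data);
    p->data = fresh;
    p->len = new_len;
    return 0;
}
```

Calling pkt_hdr() again after pkt_grow() costs one addition and always yields a valid pointer, which is the same trade-off the commit makes by dropping the cached pointer.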
Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Feb 2020 18:58:57 +0000 (10:58 -0800)]
Merge branch 'net-ll_temac-Bugfixes'
Esben Haabendal says:
====================
net: ll_temac: Bugfixes
Fix a number of bugs which have been present since the first commit.
The bugs fixed in patches 1, 2 and 4 have all been observed in real systems, and
were relatively easy to reproduce given an appropriate stress setup.
Changes since v1:
- Changed error handling of dma_map_single() in temac_start_xmit() to drop
the packet instead of returning NETDEV_TX_BUSY.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 21 Feb 2020 06:47:58 +0000 (07:47 +0100)]
net: ll_temac: Handle DMA halt condition caused by buffer underrun
The SDMA engine used by TEMAC halts operation when it has finished
processing of the last buffer descriptor in the buffer ring.
Unfortunately, no interrupt event is generated when this happens,
so we need to setup another mechanism to make sure DMA operation is
restarted when enough buffers have been added to the ring.
Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 21 Feb 2020 06:47:45 +0000 (07:47 +0100)]
net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressure
Failures caused by GFP_ATOMIC memory pressure have been observed and,
due to the missing error handling, result in kernel crashes such as:
[1876998.350133] kernel BUG at mm/slub.c:3952!
[1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1
[1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017
[1876998.350160] RIP: 0010:kfree+0x1ca/0x220
[1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24 5b
[1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246
[1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372
[1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000
[1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a
[1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00
[1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870
[1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000
[1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0
[1876998.350210] Call Trace:
[1876998.350215] <IRQ>
[1876998.350224] ? __netif_receive_skb_core+0x70a/0x920
[1876998.350229] kfree_skb+0x32/0xb0
[1876998.350234] __netif_receive_skb_core+0x70a/0x920
[1876998.350240] __netif_receive_skb_one_core+0x36/0x80
[1876998.350245] process_backlog+0x8b/0x150
[1876998.350250] net_rx_action+0xf7/0x340
[1876998.350255] __do_softirq+0x10f/0x353
[1876998.350262] irq_exit+0xb2/0xc0
[1876998.350265] do_IRQ+0x77/0xd0
[1876998.350271] common_interrupt+0xf/0xf
[1876998.350274] </IRQ>
In order to handle such failures more gracefully, this change splits the
receive loop into one for consuming the received buffers, and one for
allocating new buffers.
When GFP_ATOMIC allocations fail, the receive will continue with the
buffers that are still there, with the expectation that the allocations
will succeed in a later call to receive.
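The two-pass structure can be sketched in user space as below. This is a simplified model under stated assumptions, not the driver: struct rx_ring, rx_consume(), rx_refill() and the two demo allocators are hypothetical, and every filled slot is treated as already received (real hardware status bits are ignored).

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE 4              /* hypothetical ring size */

struct rx_ring {
    void *buf[RING_SIZE];        /* NULL means the slot has no buffer */
    int next_rx;                 /* next slot to consume */
    int next_fill;               /* next slot to refill */
};

/* Demo allocators: one that works, one that simulates memory pressure. */
static void *alloc_ok(void)   { return malloc(64); }
static void *alloc_fail(void) { return NULL; }

/* Pass 1: hand filled buffers to `deliver`; returns how many were handed over. */
static int rx_consume(struct rx_ring *r, void (*deliver)(void *))
{
    int n = 0;

    assert(r != NULL);
    while (n < RING_SIZE && r->buf[r->next_rx]) {
        deliver(r->buf[r->next_rx]);
        r->buf[r->next_rx] = NULL;
        r->next_rx = (r->next_rx + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/* Pass 2: refill empty slots. An allocation failure just ends the pass;
 * the remaining slots are retried on the next call. */
static int rx_refill(struct rx_ring *r, void *(*alloc)(void))
{
    int n = 0;

    assert(r != NULL);
    while (n < RING_SIZE && !r->buf[r->next_fill]) {
        void *b = alloc();

        if (!b)
            break;
        r->buf[r->next_fill] = b;
        r->next_fill = (r->next_fill + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```

Keeping allocation out of the consume loop means a failed allocation can never leave a half-processed descriptor behind; the worst case is a temporarily shorter ring.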
Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 21 Feb 2020 06:47:33 +0000 (07:47 +0100)]
net: ll_temac: Add more error handling of dma_map_single() calls
This adds error handling to the remaining dma_map_single() calls, so that
behavior is well defined if/when we run out of DMA memory.
Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 21 Feb 2020 06:47:21 +0000 (07:47 +0100)]
net: ll_temac: Fix race condition causing TX hang
It is possible that the interrupt handler fires and frees up space in
the TX ring in between checking for sufficient TX ring space and
stopping the TX queue in temac_start_xmit. If this happens, the
queue wake from the interrupt handler will occur before the queue is
stopped, causing a lost wakeup and the adapter's transmit hanging.
To avoid this, after stopping the queue, check again whether there is
sufficient space in the TX ring. If so, wake up the queue again.
This is a port of the similar fix in axienet driver,
commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang").
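The "stop, then check again" sequence can be modelled in a single thread as follows. This is only a sketch: struct tx_state, tx_complete(), tx_reserve() and irq_frees_two() are hypothetical names, the interrupt is simulated by an optional callback placed in the race window, and the memory barriers a real driver needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX state; in a driver this is shared with the IRQ handler. */
struct tx_state {
    int free_slots;
    bool queue_stopped;
};

/* Completion path: frees ring slots and wakes the queue. */
static void tx_complete(struct tx_state *tx, int done)
{
    tx->free_slots += done;
    tx->queue_stopped = false;
}

/* Simulated interrupt that completes two descriptors. */
static void irq_frees_two(struct tx_state *tx)
{
    tx_complete(tx, 2);
}

/*
 * Transmit path: returns true if `needed` slots were reserved.
 * `race`, if not NULL, runs between the space check and the queue stop,
 * which is exactly where the lost wakeup could happen.
 */
static bool tx_reserve(struct tx_state *tx, int needed,
                       void (*race)(struct tx_state *))
{
    assert(tx != NULL);
    if (tx->free_slots < needed) {
        if (race)
            race(tx);
        tx->queue_stopped = true;
        /* Space may have been freed in between: check again. */
        if (tx->free_slots < needed)
            return false;            /* queue stays stopped */
        tx->queue_stopped = false;   /* wake the queue ourselves */
    }
    tx->free_slots -= needed;
    return true;
}
```

Without the second check, the simulated interrupt would wake the queue before it was stopped, and nothing would ever wake it again.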
Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Borntraeger [Mon, 24 Feb 2020 10:15:59 +0000 (11:15 +0100)]
KVM: s390: rstify new ioctls in api.rst
We also need to rstify the new ioctls that we added in parallel to the
rstification of the kvm docs.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Madhuparna Bhowmik [Sun, 23 Feb 2020 14:33:02 +0000 (20:03 +0530)]
mac80211: rx: avoid RCU list traversal under mutex
local->sta_mtx is held in __ieee80211_check_fast_rx_iface().
There is no need to use list_for_each_entry_rcu() here, as it would
require a cond argument to avoid false lockdep warnings when not used in
an RCU read-side section (with CONFIG_PROVE_RCU_LIST).
Therefore use list_for_each_entry().
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 24 Feb 2020 08:38:15 +0000 (09:38 +0100)]
nl80211: explicitly include if_vlan.h
We use that here, and do seem to get it through some recursive
include, but better include it explicitly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Madhuparna Bhowmik [Sun, 23 Feb 2020 11:22:33 +0000 (16:52 +0530)]
net: core: devlink.c: Hold devlink->lock from the beginning of devlink_dpipe_table_register()
devlink_dpipe_table_find() should be called under either
rcu_read_lock() or devlink->lock. devlink_dpipe_table_register()
calls devlink_dpipe_table_find() without holding the lock
and acquires it later. Therefore hold the devlink->lock
from the beginning of devlink_dpipe_table_register().
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 20 Feb 2020 23:34:53 +0000 (15:34 -0800)]
net: phy: Avoid multiple suspends
It is currently possible for a PHY device to be suspended as part of a
network device driver's suspend call while it is still being attached to
that net_device, either via phy_suspend() or implicitly via phy_stop().
Later on, when the MDIO bus controller gets suspended, we would attempt
to suspend the PHY again because it is still attached to a network
device.
This is both a waste of time and creates an opportunity for improper
clock/power management bugs to creep in.
Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Sun, 23 Feb 2020 13:38:40 +0000 (14:38 +0100)]
net: ks8851-ml: Fix IRQ handling and locking
The KS8851 requires that packet RX and TX are mutually exclusive.
Currently, the driver hopes to achieve this by disabling interrupts
from the card by writing the card registers and by disabling the
interrupt on the interrupt controller. This, however, is racy on SMP.
Replace this approach by expanding the spinlock used around the
ks_start_xmit() TX path to the ks_irq() RX path to ensure true mutual
exclusion, and remove the interrupt enabling/disabling, which is
no longer needed. Furthermore, disable interrupts also in
ks_net_stop(), which was missing before.
Note that a massive improvement here would be to re-use the KS8851
driver approach, which is to move the TX path into a worker thread,
interrupt handling to threaded interrupt, and synchronize everything
with mutexes, but that would be a much bigger rework, for a separate
patch.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Sun, 23 Feb 2020 17:46:31 +0000 (18:46 +0100)]
docs: networking: phy: Rephrase paragraph for clarity
Let's make it a little easier to read.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Sat, 22 Feb 2020 16:21:15 +0000 (11:21 -0500)]
tcp: fix TFO SYNACK undo to avoid double-timestamp-undo
In a rare corner case the new logic for undo of SYNACK RTO could
result in triggering the warning in tcp_fastretrans_alert() that says:
WARN_ON(tp->retrans_out != 0);
The warning looked like:
WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270
The sequence that tickles this bug is:
- Fast Open server receives TFO SYN with data, sends SYNACK
- (client receives SYNACK and sends ACK, but ACK is lost)
- server app sends some data packets
- (N of the first data packets are lost)
- server receives client ACK that has a TS ECR matching first SYNACK,
and also SACKs suggesting the first N data packets were lost
- server performs TS undo of SYNACK RTO, then immediately
enters recovery
- buggy behavior then performed a *second* undo that caused
the connection to be in CA_Open with retrans_out != 0
Basically, the incoming ACK packet with SACK blocks causes us to first
undo the cwnd reduction from the SYNACK RTO, but then immediately
enter fast recovery, which then makes us eligible for undo again. And
then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
using a "mash-up" of state from two different loss recovery phases: it
uses the timestamp info from the ACK of the original SYNACK, and the
undo_marker from the fast recovery.
This fix refines the logic to only invoke the tcp_try_undo_loss()
inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
CA_Loss. If peer SACKs triggered fast recovery, then
tcp_rcv_synrecv_state_fastopen() can't safely undo.
Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 21 Feb 2020 16:32:18 +0000 (08:32 -0800)]
hv_netvsc: Fix unwanted wakeup in netvsc_attach()
When netvsc_attach() is called by operations like changing MTU, etc.,
an extra wakeup may happen while netvsc_attach() is calling
rndis_filter_device_add(), which sends rndis messages while the queue is
stopped by netvsc_detach(). The completion message will wake up queue 0.
We can reproduce the issue by changing MTU etc., then the wake_queue
counter from "ethtool -S" will increase beyond stop_queue counter:
stop_queue: 0
wake_queue: 1
The issue causes a queue wakeup and a counter increment, with no other ill
effects in the current code, so we have not seen any network problem so far.
To fix this, initialize tx_disable to true, and set it to false when
the NIC is ready to be attached or registered.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 24 Feb 2020 00:17:42 +0000 (16:17 -0800)]
Linux 5.6-rc3
Daniele Palmas [Fri, 21 Feb 2020 13:17:05 +0000 (14:17 +0100)]
net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch
usbnet creates network interfaces with min_mtu = 0 and
max_mtu = ETH_MAX_MTU.
These values are not modified by qmi_wwan when the network interface
is created initially, so it is possible, for example, to set an mtu
greater than 1500.
When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu
values for the network interface are set through ether_setup, with
min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, so an mtu greater than
1500 can no longer be set (error: mtu greater than device maximum).
The patch restores the original min/max mtu values set by usbnet after a
raw_ip switch.
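The sequence can be illustrated with a standalone model. Everything prefixed sketch_ is hypothetical; the constants use the values of ETH_MIN_MTU (68), ETH_DATA_LEN (1500) and ETH_MAX_MTU (0xFFFF) from the kernel headers, and the range check only imitates the min/max validation done when an mtu is changed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ETH_MIN_MTU	68
#define SKETCH_ETH_DATA_LEN	1500
#define SKETCH_ETH_MAX_MTU	0xFFFFu

/* Hypothetical, minimal stand-in for struct net_device. */
struct sketch_netdev {
	unsigned int mtu, min_mtu, max_mtu;
};

/* Limits as created by usbnet, per the commit description. */
void sketch_usbnet_init(struct sketch_netdev *dev)
{
	assert(dev != NULL);
	dev->mtu = SKETCH_ETH_DATA_LEN;
	dev->min_mtu = 0;
	dev->max_mtu = SKETCH_ETH_MAX_MTU;
}

/* Model of ether_setup(): resets the limits to Ethernet defaults. */
void sketch_ether_setup(struct sketch_netdev *dev)
{
	assert(dev != NULL);
	dev->mtu = SKETCH_ETH_DATA_LEN;
	dev->min_mtu = SKETCH_ETH_MIN_MTU;
	dev->max_mtu = SKETCH_ETH_DATA_LEN;
}

/* Switching raw_ip back to 'N', with the restore step from the patch. */
void sketch_raw_ip_off(struct sketch_netdev *dev)
{
	sketch_ether_setup(dev);
	dev->min_mtu = 0;			/* restore usbnet values */
	dev->max_mtu = SKETCH_ETH_MAX_MTU;
}

/* Returns 0 on success, -1 if new_mtu is outside [min_mtu, max_mtu]. */
int sketch_change_mtu(struct sketch_netdev *dev, unsigned int new_mtu)
{
	assert(dev != NULL);
	if (new_mtu < dev->min_mtu || new_mtu > dev->max_mtu)
		return -1;
	dev->mtu = new_mtu;
	return 0;
}
```

Calling sketch_ether_setup() alone makes a request for mtu 2000 fail, which corresponds to the behaviour before the patch; sketch_raw_ip_off() accepts it again.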
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 23 Feb 2020 17:43:50 +0000 (09:43 -0800)]
Merge tag 'for-5.6-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"These are fixes that were found during testing with help of error
injection, plus some other stable material.
There's a fixup to a patch added in rc1 that caused locking-in-wrong-context
warnings, and tests found one more deadlock scenario. The patches are
tagged for stable, two of them now in the queue but we'd like all
three released at the same time.
I'm not happy about fixes to fixes in such a fast succession during
rcs, but I hope we found all the fallouts of commit 28553fa992cb
('Btrfs: fix race between shrinking truncate and fiemap')"
* tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof
Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
btrfs: fix bytes_may_use underflow in prealloc error condtition
btrfs: handle logged extent failure properly
btrfs: do not check delayed items are empty for single transaction cleanup
btrfs: reset fs_root to NULL on error in open_ctree
btrfs: destroy qgroup extent records on transaction abort
Linus Torvalds [Sun, 23 Feb 2020 17:42:19 +0000 (09:42 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"More miscellaneous ext4 bug fixes (all stable fodder)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix mount failure with quota configured as module
jbd2: fix ocfs2 corrupt when clearing block group bits
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix potential race between s_flex_groups online resizing and access
ext4: fix potential race between s_group_info online resizing and access
ext4: fix potential race between online resizing and write operations
ext4: add cond_resched() to __ext4_find_entry()
ext4: fix a data race in EXT4_I(inode)->i_disksize
Linus Torvalds [Sun, 23 Feb 2020 17:37:41 +0000 (09:37 -0800)]
Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux
Pull csky updates from Guo Ren:
"Sorry, I missed the 5.6-rc1 merge window, but in this pull request
most of the changes are fixes and the rest are between fixes and
features. The only outside modification is the MAINTAINERS file update
with our mailing list.
- cache flush implementation fixes
- ftrace modify panic fix
- CONFIG_SMP boot problem fix
- fix pt_regs saving for atomic.S
- fix fixaddr_init without highmem
- fix stack protector support
- fix fake Tightly-Coupled Memory code compile and use
- fix some typos and coding convention"
* tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits)
csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>
csky: Implement copy_thread_tls
csky: Add PCI support
csky: Minimize defconfig to support buildroot config.fragment
csky: Add setup_initrd check code
csky: Cleanup old Kconfig options
arch/csky: fix some Kconfig typos
csky: Fixup compile warning for three unimplemented syscalls
csky: Remove unused cache implementation
csky: Fixup ftrace modify panic
csky: Add flush_icache_mm to defer flush icache all
csky: Optimize abiv2 copy_to_user_page with VM_EXEC
csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)
csky: Remove unnecessary flush_icache_* implementation
csky: Support icache flush without specific instructions
csky/Kconfig: Add Kconfig.platforms to support some drivers
csky/smp: Fixup boot failed when CONFIG_SMP
csky: Set regs->usp to kernel sp, when the exception is from kernel
csky/mm: Fixup export invalid_pte_table symbol
csky: Separate fixaddr_init from highmem
...
Oliver Upton [Tue, 4 Feb 2020 23:26:31 +0000 (15:26 -0800)]
KVM: nVMX: Check IO instruction VM-exit conditions
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution
controls when checking instruction interception. If the 'use IO bitmaps'
VM-execution control is 1, check the instruction access against the IO
bitmaps to determine if the instruction causes a VM-exit.
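A bitmap check of this kind can be sketched in standalone C. The struct and function names are hypothetical and this is not the KVM implementation; the layout follows the VMX architecture, where bitmap A covers ports 0x0000-0x7FFF, bitmap B covers 0x8000-0xFFFF, one bit per port, and an access that wraps past port 0xFFFF causes a VM-exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IO_BITMAP_BYTES 4096	/* 32768 ports per bitmap */

/* Hypothetical container for the two 4 KiB I/O bitmaps. */
struct sketch_io_bitmaps {
	uint8_t a[SKETCH_IO_BITMAP_BYTES];	/* ports 0x0000-0x7fff */
	uint8_t b[SKETCH_IO_BITMAP_BYTES];	/* ports 0x8000-0xffff */
};

/*
 * Returns true if an I/O access of 'size' bytes starting at 'port'
 * would cause a VM-exit under the given VM-execution controls.
 */
bool sketch_io_access_exits(const struct sketch_io_bitmaps *bm,
			    bool unconditional_io_exiting,
			    bool use_io_bitmaps,
			    uint32_t port, unsigned int size)
{
	assert(bm != NULL);

	/* Without bitmaps, the unconditional control alone decides. */
	if (!use_io_bitmaps)
		return unconditional_io_exiting;

	/* With bitmaps, every port touched by the access is checked. */
	while (size > 0) {
		const uint8_t *bitmap;

		if (port < 0x8000)
			bitmap = bm->a;
		else if (port < 0x10000)
			bitmap = bm->b;
		else
			return true;	/* access wrapped past 0xffff */

		if (bitmap[(port & 0x7fff) / 8] & (1u << (port & 7)))
			return true;

		port++;
		size--;
	}
	return false;
}
```

For example, with only the bit for port 0x60 set in bitmap A, a one-byte access to 0x61 does not exit, while a two-byte access starting at 0x5f does because it also touches 0x60.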
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Tue, 4 Feb 2020 23:26:30 +0000 (15:26 -0800)]
KVM: nVMX: Refactor IO bitmap checks into helper function
Checks against the IO bitmap are useful for both instruction emulation
and VM-exit reflection. Refactor the IO bitmap checks into a helper
function.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 4 Feb 2020 23:26:29 +0000 (15:26 -0800)]
KVM: nVMX: Don't emulate instructions in guest mode
vmx_check_intercept is not yet fully implemented. To avoid emulating
instructions disallowed by the L1 hypervisor, refuse to emulate
instructions by default.
Cc: stable@vger.kernel.org
[Made commit, added commit msg - Oliver]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Fri, 7 Feb 2020 10:36:07 +0000 (02:36 -0800)]
KVM: nVMX: Emulate MTF when performing instruction emulation
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"),
KVM has allowed an L1 guest to use the monitor trap
flag processor-based execution control for its L2 guest. KVM simply
forwards any MTF VM-exits to the L1 guest, which works for normal
instruction execution.
However, when KVM needs to emulate an instruction on behalf of an L2
guest, the monitor trap flag is not emulated. Add the necessary logic to
kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
instruction emulation for L2.
Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Li RongQing [Sun, 23 Feb 2020 08:13:12 +0000 (16:13 +0800)]
KVM: fix error handling in svm_hardware_setup
Rename svm_hardware_unsetup to svm_hardware_teardown, move it before
svm_hardware_setup, and call it to free all memory if setup fails in
svm_hardware_setup; otherwise the memory is leaked.
Remove the __exit attribute from it, since it is now called from an
__init function.
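The error-handling pattern described here, one teardown routine shared by the normal exit path and the setup failure path, can be sketched in userspace C. The names and the two allocations are hypothetical and only stand in for the resources the real setup function allocates; the fail_second parameter exists purely to simulate a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical resources standing in for what setup allocates. */
struct sketch_hw {
	void *iopm;
	void *cpu_data;
};

/*
 * Frees everything setup may have allocated. It is defined before
 * setup so the error path can call it, and it is safe on partially
 * initialized state because free(NULL) is a no-op.
 */
void sketch_hardware_teardown(struct sketch_hw *hw)
{
	assert(hw != NULL);
	free(hw->cpu_data);
	free(hw->iopm);
	hw->cpu_data = NULL;
	hw->iopm = NULL;
}

/* Returns 0 on success, -1 on failure with nothing left allocated. */
int sketch_hardware_setup(struct sketch_hw *hw, int fail_second)
{
	assert(hw != NULL);
	hw->iopm = NULL;
	hw->cpu_data = NULL;

	hw->iopm = malloc(64);
	if (!hw->iopm)
		goto err;

	/* fail_second simulates an allocation failure mid-setup. */
	hw->cpu_data = fail_second ? NULL : malloc(64);
	if (!hw->cpu_data)
		goto err;

	return 0;
err:
	sketch_hardware_teardown(hw);
	return -1;
}
```

Because the failure path reuses the teardown routine, a failure after the first allocation does not leak it, which is the leak the commit addresses.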
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>