Andy Shevchenko [Wed, 29 Apr 2020 15:09:32 +0000 (18:09 +0300)]
stmmac: intel: Fix kernel crash due to wrong error path
Unfortunately sometimes ->probe() may fail. The commit
b9663b7ca6ff
("net: stmmac: Enable SERDES power up/down sequence")
messed up with error handling and thus:
[ 12.811311] ------------[ cut here ]------------
[ 12.811993] kernel BUG at net/core/dev.c:9937!
Fix this by properly crafted error path.
Fixes: b9663b7ca6ff ("net: stmmac: Enable SERDES power up/down sequence")
Cc: Voon Weifeng <weifeng.voon@intel.com>
Cc: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 27 Apr 2020 15:05:47 +0000 (18:05 +0300)]
mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly
Vregion helpers to get min and max priority depend on the correct
ordering of vchunks in the vregion list. However, the current code
always adds new chunk to the end of the list, no matter what the
priority is. Fix this by finding the correct place in the list and put
vchunk there.
Fixes: 22a677661f56 ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Mon, 27 Apr 2020 14:11:05 +0000 (16:11 +0200)]
tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040
RFC 6040 recommends propagating an ECT(1) mark from an outer tunnel header
to the inner header if that inner header is already marked as ECT(0). When
RFC 6040 decapsulation was implemented, this case of propagation was not
added. This simply appears to be an oversight, so let's fix that.
Fixes: eccc1bb8d4b4 ("tunnel: drop packet if ECN present with not-ECT")
Reported-by: Bob Briscoe <ietf@bobbriscoe.net>
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Mon, 27 Apr 2020 10:51:20 +0000 (13:51 +0300)]
net: macb: Fix runtime PM refcounting
The commit
e6a41c23df0d, while trying to fix an issue,
("net: macb: ensure interface is not suspended on at91rm9200")
introduced a refcounting regression, because in error case refcounter
must be balanced. Fix it by calling pm_runtime_put_noidle() in error case.
While here, fix the same mistake in other couple of places.
Fixes: e6a41c23df0d ("net: macb: ensure interface is not suspended on at91rm9200")
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 26 Apr 2020 20:59:21 +0000 (22:59 +0200)]
net: moxa: Fix a potential double 'free_irq()'
Should an irq requested with 'devm_request_irq' be released explicitly,
it should be done by 'devm_free_irq()', not 'free_irq()'.
Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Dial [Fri, 24 Apr 2020 22:51:08 +0000 (18:51 -0400)]
net: macsec: preserve ingress frame ordering
MACsec decryption always occurs in a softirq context. Since
the FPU may not be usable in the softirq context, the call to
decrypt may be scheduled on the cryptd work queue. The cryptd
work queue does not provide ordering guarantees. Therefore,
preserving order requires masking out ASYNC implementations
of gcm(aes).
For instance, an Intel CPU with AES-NI makes available the
generic-gcm-aesni driver from the aesni_intel module to
implement gcm(aes). However, this implementation requires
the FPU, so it is not always available to use from a softirq
context, and will fallback to the cryptd work queue, which
does not preserve frame ordering. With this change, such a
system would select gcm_base(ctr(aes-aesni),ghash-generic).
While the aes-aesni implementation prefers to use the FPU, it
will fallback to the aes-asm implementation if unavailable.
By using a synchronous version of gcm(aes), the decryption
will complete before returning from crypto_aead_decrypt().
Therefore, the macsec_decrypt_done() callback will be called
before returning from macsec_decrypt(). Thus, the order of
calls to macsec_post_decrypt() for the frames is preserved.
While it's presumable that the pure AES-NI version of gcm(aes)
is more performant, the hybrid solution is capable of gigabit
speeds on modest hardware. Regardless, preserving the order
of frames is paramount for many network protocols (e.g.,
triggering TCP retries). Within the MACsec driver itself, the
replay protection is tripped by the out-of-order frames, and
can cause frames to be dropped.
This bug has been present in this code since it was added in
v4.6, however it may not have been noticed since not all CPUs
have FPU offload available. Additionally, the bug manifests
as occasional out-of-order packets that are easily
misattributed to other network phenomena.
When this code was added in v4.6, the crypto/gcm.c code did
not restrict selection of the ghash function based on the
ASYNC flag. For instance, x86 CPUs with PCLMULQDQ would
select the ghash-clmulni driver instead of ghash-generic,
which submits to the cryptd work queue if the FPU is busy.
However, this bug was was corrected in v4.8 by commit
b30bdfa86431afbafe15284a3ad5ac19b49b88e3, and was backported
all the way back to the v3.14 stable branch, so this patch
should be applicable back to the v4.6 stable branch.
Signed-off-by: Scott Dial <scott@scottdial.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 May 2020 01:07:16 +0000 (18:07 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Do not update the UDP checksum when it's zero, from Guillaume Nault.
2) Fix return of local variable in nf_osf, from Arnd Bergmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 May 2020 01:04:58 +0000 (18:04 -0700)]
Merge branch 'net-ipa-three-bug-fixes'
Alex Elder says:
====================
net: ipa: three bug fixes
This series fixes three bugs in the Qualcomm IPA code. The third
adds a missing error code initialization step.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 30 Apr 2020 21:35:12 +0000 (16:35 -0500)]
net: ipa: zero return code before issuing generic EE command
Zero the result code stored in a field of the scratch 0 register
before issuing a generic EE command. This just guarantees that
the value we read later was actually written as a result of the
command.
Also add the definitions of two more possible result codes that can
be returned when issuing flow control enable or disable commands:
INCORRECT_CHANNEL_STATE: - channel must be in started state
INCORRECT_DIRECTION - flow control is only valid for TX channels
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 30 Apr 2020 21:35:11 +0000 (16:35 -0500)]
net: ipa: fix an error message in gsi_channel_init_one()
An error message about limiting the number of TREs used prints the
wrong value. Fix this bug.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Thu, 30 Apr 2020 21:35:10 +0000 (16:35 -0500)]
net: ipa: fix a bug in ipa_endpoint_stop()
In ipa_endpoint_stop(), for TX endpoints we set the number of retries
to 0. When we break out of the loop, retries being 0 means we return
EIO rather than the value of ret (which should be 0).
Fix this by using a non-zero retry count for both RX and TX
channels, and just break out of the loop after calling
gsi_channel_stop() for TX channels. This way only RX channels
will retry, and the retry count will be non-zero at the end
for TX channels (so the proper value gets returned).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 May 2020 01:02:46 +0000 (18:02 -0700)]
Merge branch 'ionic-fw-upgrade-bug-fixes'
Shannon Nelson says:
====================
ionic: fw upgrade bug fixes
These patches address issues found in additional internal
fw-upgrade testing.
v2:
- replaced extra state flag with postponing first link check
- added device reset patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 30 Apr 2020 21:33:43 +0000 (14:33 -0700)]
ionic: add device reset to fw upgrade down
Doing a device reset addresses an obscure FW timing issue in
the FW upgrade process.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 30 Apr 2020 21:33:42 +0000 (14:33 -0700)]
ionic: refresh devinfo after fw-upgrade
Make sure we can report the new FW version after a
fw-upgrade has finished by re-reading the device's
fw version information.
Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 30 Apr 2020 21:33:41 +0000 (14:33 -0700)]
ionic: no link check until after probe
Don't bother with the link check during probe, let
the watchdog notice the first link-up. This allows
probe to finish cleanly without any interruptions
from over excited user programs opening the device
as soon as it is registered.
Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Thu, 30 Apr 2020 19:51:32 +0000 (21:51 +0200)]
dp83640: reverse arguments to list_add_tail
In this code, it appears that phyter_clocks is a list head, based on
the previous list_for_each, and that clock->list is intended to be a
list element, given that it has just been initialized in
dp83640_clock_init. Accordingly, switch the arguments to
list_add_tail, which takes the list head as the second argument.
Fixes: cb646e2b02b27 ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 30 Apr 2020 19:38:45 +0000 (22:38 +0300)]
net: bridge: vlan: Add a schedule point during VLAN processing
User space can request to delete a range of VLANs from a bridge slave in
one netlink request. For each deleted VLAN the FDB needs to be traversed
in order to flush all the affected entries.
If a large range of VLANs is deleted and the number of FDB entries is
large or the FDB lock is contented, it is possible for the kernel to
loop through the deleted VLANs for a long time. In case preemption is
disabled, this can result in a soft lockup.
Fix this by adding a schedule point after each VLAN is deleted to yield
the CPU, if needed. This is safe because the VLANs are traversed in
process context.
Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Tested-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Juliet Kim [Thu, 30 Apr 2020 18:22:11 +0000 (13:22 -0500)]
ibmvnic: Skip fatal error reset after passive init
During MTU change, the following events may happen.
Client-driven CRQ initialization fails due to partner’s CRQ closed,
causing client to enqueue a reset task for FATAL_ERROR. Then passive
(server-driven) CRQ initialization succeeds, causing client to
release CRQ and enqueue a reset task for failover. If the passive
CRQ initialization occurs before the FATAL reset task is processed,
the FATAL error reset task would try to access a CRQ message queue
that was freed, causing an oops. The problem may be most likely to
occur during DLPAR add vNIC with a non-default MTU, because the DLPAR
process will automatically issue a change MTU request.
Fix this by not processing fatal error reset if CRQ is passively
initialized after client-driven CRQ initialization fails.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 30 Apr 2020 19:58:11 +0000 (12:58 -0700)]
Merge tag 'mlx5-fixes-2020-04-29' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-04-29
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
v2:
- Dropped the ktls patch, Tariq has to check if it is fixable in the stack
For -stable v4.12
('net/mlx5: Fix forced completion access non initialized command entry')
('net/mlx5: Fix command entry leak in Internal Error State')
For -stable v5.4
('net/mlx5: DR, On creation set CQ's arm_db member to right value')
For -stable v5.6
('net/mlx5e: Fix q counters on uplink representors')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:03:22 +0000 (15:03 +0200)]
mptcp: fix uninitialized value access
tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning
a not NULL 'child', let's check 'own_req' only if child is
available to avoid an - unharmful - UBSAN splat.
v1 -> v2:
- reference the correct hash
Fixes: 4c8941de781c ("mptcp: avoid flipping mp_capable field in syn_recv_sock()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 30 Apr 2020 19:23:23 +0000 (12:23 -0700)]
Merge branch 'mptcp-fix-incoming-options-parsing'
Paolo Abeni says:
====================
mptcp: fix incoming options parsing
This series addresses a serious issue in MPTCP option parsing.
This is bigger than the usual -net change, but I was unable to find a
working, sane, smaller fix.
The core change is inside patch 2/5 which moved MPTCP options parsing from
the TCP code inside existing MPTCP hooks and clean MPTCP options status on
each processed packet.
The patch 1/5 is a needed pre-requisite, and patches 3,4,5 are smaller,
related fixes.
v1 -> v2:
- cleaned-up patch 1/5
- rebased on top of current -net
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:01:55 +0000 (15:01 +0200)]
mptcp: initialize the data_fin field for mpc packets
When parsing MPC+data packets we set the dss field, so
we must also initialize the data_fin, or we can find stray
value there.
Fixes: 9a19371bf029 ("mptcp: fix data_fin handing in RX path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:01:54 +0000 (15:01 +0200)]
mptcp: fix 'use_ack' option access.
The mentioned RX option field is initialized only for DSS
packet, we must access it only if 'dss' is set too, or
the subflow will end-up in a bad status, leading to
RFC violations.
Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:01:53 +0000 (15:01 +0200)]
mptcp: avoid a WARN on bad input.
Syzcaller has found a way to trigger the WARN_ON_ONCE condition
in check_fully_established().
The root cause is a legit fallback to TCP scenario, so replace
the WARN with a plain message on a more strict condition.
Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:01:52 +0000 (15:01 +0200)]
mptcp: move option parsing into mptcp_incoming_options()
The mptcp_options_received structure carries several per
packet flags (mp_capable, mp_join, etc.). Such fields must
be cleared on each packet, even on dropped ones or packet
not carrying any MPTCP options, but the current mptcp
code clears them only on TCP option reset.
On several races/corner cases we end-up with stray bits in
incoming options, leading to WARN_ON splats. e.g.:
[ 171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
[ 171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[ 171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
[ 171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
[ 171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[ 171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
[ 171.220473] RSP: 0018:
ffffc90000150560 EFLAGS:
00010282
[ 171.221639] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 171.223108] RDX:
0000000000000000 RSI:
0000000000000008 RDI:
fffff5200002a09e
[ 171.224388] RBP:
ffff8880aa6e3c00 R08:
0000000000000001 R09:
fffffbfff2ec9955
[ 171.225706] R10:
ffffffff9764caa7 R11:
fffffbfff2ec9954 R12:
0000000000007fca
[ 171.227211] R13:
ffff8881066f4a7f R14:
ffff8880aa6e3c00 R15:
0000000000000020
[ 171.228460] FS:
00007f8623719740(0000) GS:
ffff88810be00000(0000) knlGS:
0000000000000000
[ 171.230065] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 171.231303] CR2:
00007ffdab190a50 CR3:
00000001038ea006 CR4:
0000000000160ee0
[ 171.232586] Call Trace:
[ 171.233109] <IRQ>
[ 171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
[ 171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
[ 171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
[ 171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
[ 171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
[ 171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
[ 171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
[ 171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
[ 171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
[ 171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
[ 171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
[ 171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
[ 171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
[ 171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
[ 171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
[ 171.282358] </IRQ>
We could address the issue clearing explicitly the relevant fields
in several places - tcp_parse_option, tcp_fast_parse_options,
possibly others.
Instead we move the MPTCP option parsing into the already existing
mptcp ingress hook, so that we need to clear the fields in a single
place.
This allows us dropping an MPTCP hook from the TCP code and
removing the quite large mptcp_options_received from the tcp_sock
struct. On the flip side, the MPTCP sockets will traverse the
option space twice (in tcp_parse_option() and in
mptcp_incoming_options(). That looks acceptable: we already
do that for syn and 3rd ack packets, plain TCP socket will
benefit from it, and even MPTCP sockets will experience better
code locality, reducing the jumps between TCP and MPTCP code.
v1 -> v2:
- rebased on current '-net' tree
Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 30 Apr 2020 13:01:51 +0000 (15:01 +0200)]
mptcp: consolidate synack processing.
Currently the MPTCP code uses 2 hooks to process syn-ack
packets, mptcp_rcv_synsent() and the sk_rx_dst_set()
callback.
We can drop the first, moving the relevant code into the
latter, reducing the hooking into the TCP code. This is
also needed by the next patch.
v1 -> v2:
- use local tcp sock ptr instead of casting the sk variable
several times - DaveM
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Thu, 23 Apr 2020 09:37:21 +0000 (12:37 +0300)]
net/mlx5e: Fix q counters on uplink representors
Need to allocate the q counters before init_rx which needs them
when creating the rq.
Fixes: 8520fa57a4e9 ("net/mlx5e: Create q counters on uplink representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Sun, 23 Feb 2020 01:27:41 +0000 (03:27 +0200)]
net/mlx5: Fix command entry leak in Internal Error State
Processing commands by cmd_work_handler() while already in Internal
Error State will result in entry leak, since the handler process force
completion without doorbell. Forced completion doesn't release the entry
and event completion will never arrive, so entry should be released.
Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Sun, 21 Jul 2019 05:40:13 +0000 (08:40 +0300)]
net/mlx5: Fix forced completion access non initialized command entry
mlx5_cmd_flush() will trigger forced completions to all valid command
entries. Triggered by an asynch event such as fast teardown it can
happen at any stage of the command, including command initialization.
It will trigger forced completion and that can lead to completion on an
uninitialized command entry.
Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is
initialized will ensure force completion is treated only if command
entry is initialized.
Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Erez Shitrit [Wed, 25 Mar 2020 15:19:43 +0000 (17:19 +0200)]
net/mlx5: DR, On creation set CQ's arm_db member to right value
In polling mode, set arm_db member to a value that will avoid CQ
event recovery by the HW.
Otherwise we might get event without completion function.
In addition,empty completion function to was added to protect from
unexpected events.
Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Tue, 21 Apr 2020 10:36:07 +0000 (05:36 -0500)]
net/mlx5: E-switch, Fix mutex init order
In cited patch mutex is initialized after its used.
Below call trace is observed.
Fix the order to initialize the mutex early enough.
Similarly follow mirror sequence during cleanup.
kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock)
kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938
__mutex_lock+0x7d6/0x8a0
kernel: Call Trace:
kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
kernel: ? mark_held_locks+0x55/0x70
kernel: ? __slab_free+0x274/0x400
kernel: ? lockdep_hardirqs_on+0x140/0x1d0
kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core]
kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core]
kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core]
kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core]
kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core]
kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core]
kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 20 Apr 2020 09:32:48 +0000 (04:32 -0500)]
net/mlx5: E-switch, Fix printing wrong error value
When mlx5_modify_header_alloc() fails, instead of printing the error
value returned, current error log prints 0.
Fix by printing correct error value returned by
mlx5_modify_header_alloc().
Fixes: 6724e66b90ee ("net/mlx5: E-Switch, Get reg_c1 value on miss")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 20 Apr 2020 09:20:41 +0000 (04:20 -0500)]
net/mlx5: E-switch, Fix error unwinding flow for steering init failure
Error unwinding is done incorrectly in the cited commit.
When steering init fails, there is no need to perform steering cleanup.
When vport error exists, error cleanup should be mirror of the setup
routine, i.e. to perform steering cleanup before metadata cleanup.
This avoids the call trace in accessing uninitialized objects which are
skipped during steering_init() due to failure in steering_init().
Call trace:
mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header
actions 1, max supported 0
E-Switch: Failed to create restore mod header
BUG: kernel NULL pointer dereference, address:
00000000000000d0
[ 677.263079] mlx5_destroy_flow_group+0x13/0x80 [mlx5_core]
[ 677.268921] esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core]
[ 677.275281] esw_offloads_enable+0x1a5/0x800 [mlx5_core]
[ 677.280949] mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
[ 677.287227] mlx5_devlink_eswitch_mode_set+0x1af/0x320
[ 677.293741] devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
[ 677.299217] genl_rcv_msg+0x1eb/0x430
Fixes: 7983a675ba65 ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Wed, 29 Apr 2020 21:23:05 +0000 (14:23 -0700)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 5.7-rc4
This series contains two fixes and a cleanup for wireguard:
1) Removal of a spurious newline, from Sultan Alsawaf.
2) Fix for a memory leak in an error path, in which memory allocated
prior to the error wasn't freed, reported by Sultan Alsawaf.
3) Fix to ECN support to use RFC6040 properly like all the other tunnel
drivers, from Toke Høiland-Jørgensen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Wed, 29 Apr 2020 20:59:22 +0000 (14:59 -0600)]
wireguard: receive: use tunnel helpers for decapsulating ECN markings
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 29 Apr 2020 20:59:21 +0000 (14:59 -0600)]
wireguard: queueing: cleanup ptr_ring in error path of packet_queue_init
Prior, if the alloc_percpu of packet_percpu_multicore_worker_alloc
failed, the previously allocated ptr_ring wouldn't be freed. This commit
adds the missing call to ptr_ring_cleanup in the error case.
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sultan Alsawaf [Wed, 29 Apr 2020 20:59:20 +0000 (14:59 -0600)]
wireguard: send: remove errant newline from packet_encrypt_worker
This commit removes a useless newline at the end of a scope, which
doesn't add anything in the way of organization or readability.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 29 Apr 2020 18:43:20 +0000 (20:43 +0200)]
mptcp: replace mptcp_disconnect with a stub
Paolo points out that mptcp_disconnect is bogus:
"lock_sock(sk);
looks suspicious (lock should be already held by the caller)
And call to: tcp_disconnect(sk, flags); too, sk is not a tcp
socket".
->disconnect() gets called from e.g. inet_stream_connect when
one tries to disassociate a connected socket again (to re-connect
without closing the socket first).
MPTCP however uses mptcp_stream_connect, not inet_stream_connect,
for the mptcp-socket connect call.
inet_stream_connect only gets called indirectly, for the tcp socket,
so any ->disconnect() calls end up calling tcp_disconnect for that
tcp subflow sk.
This also explains why syzkaller has not yet reported a problem
here. So for now replace this with a stub that doesn't do anything.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 29 Apr 2020 19:00:41 +0000 (21:00 +0200)]
netfilter: nf_osf: avoid passing pointer to local var
gcc-10 points out that a code path exists where a pointer to a stack
variable may be passed back to the caller:
net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init':
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
net/netfilter/nfnetlink_osf.c:171:16: note: declared here
171 | struct tcphdr _tcph;
| ^~~~~
I am not sure whether this can happen in practice, but moving the
variable declaration into the callers avoids the problem.
Fixes: 31a9c29210e2 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jason Yan [Wed, 29 Apr 2020 14:10:01 +0000 (22:10 +0800)]
net: dsa: mv88e6xxx: remove duplicate assignment of struct members
These struct members named 'phylink_validate' was assigned twice:
static const struct mv88e6xxx_ops mv88e6190_ops = {
......
.phylink_validate = mv88e6390_phylink_validate,
......
.phylink_validate = mv88e6390_phylink_validate,
};
static const struct mv88e6xxx_ops mv88e6190x_ops = {
......
.phylink_validate = mv88e6390_phylink_validate,
......
.phylink_validate = mv88e6390x_phylink_validate,
};
static const struct mv88e6xxx_ops mv88e6191_ops = {
......
.phylink_validate = mv88e6390_phylink_validate,
......
.phylink_validate = mv88e6390_phylink_validate,
};
static const struct mv88e6xxx_ops mv88e6290_ops = {
......
.phylink_validate = mv88e6390_phylink_validate,
......
.phylink_validate = mv88e6390_phylink_validate,
};
Remove all the first one and leave the second one which are been used in
fact. Be aware that for 'mv88e6190x_ops' the assignment functions is
different while the others are all the same. This fixes the following
coccicheck warning:
drivers/net/dsa/mv88e6xxx/chip.c:3911:48-49: phylink_validate: first
occurrence line 3965, second occurrence line 3967
drivers/net/dsa/mv88e6xxx/chip.c:3970:49-50: phylink_validate: first
occurrence line 4024, second occurrence line 4026
drivers/net/dsa/mv88e6xxx/chip.c:4029:48-49: phylink_validate: first
occurrence line 4082, second occurrence line 4085
drivers/net/dsa/mv88e6xxx/chip.c:4184:48-49: phylink_validate: first
occurrence line 4238, second occurrence line 4242
Fixes: 4262c38dc42e ("net: dsa: mv88e6xxx: Add SERDES stats counters to all 6390 family members")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 28 Apr 2020 08:12:08 +0000 (16:12 +0800)]
net/x25: Fix null-ptr-deref in x25_disconnect
We should check null before do x25_neigh_put in x25_disconnect,
otherwise may cause null-ptr-deref like this:
#include <sys/socket.h>
#include <linux/x25.h>
int main() {
int sck_x25;
sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0);
close(sck_x25);
return 0;
}
BUG: kernel NULL pointer dereference, address:
00000000000000d8
CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-
RIP: 0010:x25_disconnect+0x91/0xe0
Call Trace:
x25_release+0x18a/0x1b0
__sock_release+0x3d/0xc0
sock_close+0x13/0x20
__fput+0x107/0x270
____fput+0x9/0x10
task_work_run+0x6d/0xb0
exit_to_usermode_loop+0x102/0x110
do_syscall_64+0x23c/0x260
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com
Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 28 Apr 2020 04:49:45 +0000 (14:49 +1000)]
net/ena: Fix build warning in ena_xdp_set()
This fixes the following build warning in ena_xdp_set(), which is
observed on aarch64 with 64KB page size.
In file included from ./include/net/inet_sock.h:19,
from ./include/net/ip.h:27,
from drivers/net/ethernet/amazon/ena/ena_netdev.c:46:
drivers/net/ethernet/amazon/ena/ena_netdev.c: In function \
‘ena_xdp_set’: \
drivers/net/ethernet/amazon/ena/ena_netdev.c:557:6: warning: \
format ‘%lu’ \
expects argument of type ‘long unsigned int’, but argument 4 \
has type ‘int’ \
[-Wformat=] "Failed to set xdp program, the current MTU (%d) is \
larger than the maximum allowed MTU (%lu) while xdp is on",
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Apr 2020 20:02:55 +0000 (13:02 -0700)]
Merge tag 'batadv-net-for-davem-
20200427' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix random number generation in network coding, by George Spelvin
- fix reference counter leaks, by Xiyu Yang (3 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Mon, 27 Apr 2020 06:18:03 +0000 (08:18 +0200)]
net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
A call to 'dma_alloc_coherent()' is hidden in 'sonic_alloc_descriptors()',
called from 'sonic_probe1()'.
This is correctly freed in the remove function, but not in the error
handling path of the probe function.
Fix it and add the missing 'dma_free_coherent()' call.
While at it, rename a label in order to be slightly more informative.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anthony Felice [Mon, 27 Apr 2020 02:00:59 +0000 (22:00 -0400)]
net: tc35815: Fix phydev supported/advertising mask
Commit
3c1bcc8614db ("net: ethernet: Convert phydev advertize and
supported from u32 to link mode") updated ethernet drivers to use a
linkmode bitmap. It mistakenly dropped a bitwise negation in the
tc35815 ethernet driver on a bitmask to set the supported/advertising
flags.
Found by Anthony via code inspection, not tested as I do not have the
required hardware.
Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
Signed-off-by: Anthony Felice <tony.felice@timesys.com>
Reviewed-by: Akshay Bhat <akshay.bhat@timesys.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Apr 2020 01:19:07 +0000 (18:19 -0700)]
sch_sfq: validate silly quantum values
syzbot managed to set up sfq so that q->scaled_quantum was zero,
triggering an infinite loop in sfq_dequeue()
More generally, we must only accept quantum between 1 and 2^18 - 7,
meaning scaled_quantum must be in [1, 0x7FFF] range.
Otherwise, we also could have a loop in sfq_dequeue()
if scaled_quantum happens to be 0x8000, since slot->allot
could indefinitely switch between 0 and 0x8000.
Fixes: eeaeb068f139 ("sch_sfq: allow big packets and be fair")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Apr 2020 18:44:06 +0000 (11:44 -0700)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes.
A collection of 5 miscellaneous bug fixes covering VF anti-spoof setup
issues, devlink MSIX max value, AER, context memory allocation error
path, and VLAN acceleration logic.
Please queue for -stable. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 26 Apr 2020 20:24:42 +0000 (16:24 -0400)]
bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features().
The current logic in bnxt_fix_features() will inadvertently turn on both
CTAG and STAG VLAN offload if the user tries to disable both. Fix it
by checking that the user is trying to enable CTAG or STAG before
enabling both. The logic is supposed to enable or disable both CTAG and
STAG together.
Fixes: 5a9f6b238e59 ("bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 26 Apr 2020 20:24:41 +0000 (16:24 -0400)]
bnxt_en: Return error when allocating zero size context memory.
bnxt_alloc_ctx_pg_tbls() should return error when the memory size of the
context memory to set up is zero. By returning success (0), the caller
may proceed normally and may crash later when it tries to set up the
memory.
Fixes: 08fe9d181606 ("bnxt_en: Add Level 2 context memory paging support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 26 Apr 2020 20:24:40 +0000 (16:24 -0400)]
bnxt_en: Improve AER slot reset.
Improve the slot reset sequence by disabling the device to prevent bad
DMAs if slot reset fails. Return the proper result instead of always
PCI_ERS_RESULT_RECOVERED to the caller.
Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Sun, 26 Apr 2020 20:24:39 +0000 (16:24 -0400)]
bnxt_en: Reduce BNXT_MSIX_VEC_MAX value to supported CQs per PF.
Broadcom adapters support only maximum of 512 CQs per PF. If user sets
MSIx vectors more than supported CQs, firmware is setting incorrect value
for msix_vec_per_pf_max parameter. Fix it by reducing the BNXT_MSIX_VEC_MAX
value to 512, even though the maximum # of MSIx vectors supported by adapter
are 1280.
Fixes: f399e8497826 ("bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink params.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 26 Apr 2020 20:24:38 +0000 (16:24 -0400)]
bnxt_en: Fix VF anti-spoof filter setup.
Fix the logic that sets the enable/disable flag for the source MAC
filter according to firmware spec 1.7.1.
In the original firmware spec. before 1.7.1, the VF spoof check flags
were not latched after making the HWRM_FUNC_CFG call, so there was a
need to keep the func_flags so that subsequent calls would perserve
the VF spoof check setting. A change was made in the 1.7.1 spec
so that the flags became latched. So we now set or clear the anti-
spoof setting directly without retrieving the old settings in the
stored vf->func_flags which are no longer valid. We also remove the
unneeded vf->func_flags.
Fixes: 8eb992e876a8 ("bnxt_en: Update firmware interface spec to 1.7.6.2.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baruch Siach [Sun, 26 Apr 2020 06:22:06 +0000 (09:22 +0300)]
net: phy: marvell10g: fix temperature sensor on 2110
Read the temperature sensor register from the correct location for the
88E2110 PHY. There is no enable/disable bit on 2110, so make
mv3310_hwmon_config() run on 88X3310 only.
Fixes: 62d01535474b61 ("net: phy: marvell10g: add support for the 88x2110 PHY")
Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 25 Apr 2020 22:19:51 +0000 (15:19 -0700)]
sch_choke: avoid potential panic in choke_reset()
If choke_init() could not allocate q->tab, we would crash later
in choke_reset().
BUG: KASAN: null-ptr-deref in memset include/linux/string.h:366 [inline]
BUG: KASAN: null-ptr-deref in choke_reset+0x208/0x340 net/sched/sch_choke.c:326
Write of size 8 at addr
0000000000000000 by task syz-executor822/7022
CPU: 1 PID: 7022 Comm: syz-executor822 Not tainted 5.7.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
__kasan_report.cold+0x5/0x4d mm/kasan/report.c:515
kasan_report+0x33/0x50 mm/kasan/common.c:625
check_memory_region_inline mm/kasan/generic.c:187 [inline]
check_memory_region+0x141/0x190 mm/kasan/generic.c:193
memset+0x20/0x40 mm/kasan/common.c:85
memset include/linux/string.h:366 [inline]
choke_reset+0x208/0x340 net/sched/sch_choke.c:326
qdisc_reset+0x6b/0x520 net/sched/sch_generic.c:910
dev_deactivate_queue.constprop.0+0x13c/0x240 net/sched/sch_generic.c:1138
netdev_for_each_tx_queue include/linux/netdevice.h:2197 [inline]
dev_deactivate_many+0xe2/0xba0 net/sched/sch_generic.c:1195
dev_deactivate+0xf8/0x1c0 net/sched/sch_generic.c:1233
qdisc_graft+0xd25/0x1120 net/sched/sch_api.c:1051
tc_modify_qdisc+0xbab/0x1a00 net/sched/sch_api.c:1670
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454
netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
___sys_sendmsg+0x100/0x170 net/socket.c:2416
__sys_sendmsg+0xec/0x1b0 net/socket.c:2449
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
Fixes: 77e62da6e60c ("sch_choke: drop all packets in queue during reset")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 25 Apr 2020 19:40:25 +0000 (12:40 -0700)]
fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks
My intent was to not let users set a zero drop_batch_size,
it seems I once again messed with min()/max().
Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Sat, 25 Apr 2020 13:10:23 +0000 (21:10 +0800)]
net/tls: Fix sk_psock refcnt leak when in tls_data_ready()
tls_data_ready() invokes sk_psock_get(), which returns a reference of
the specified sk_psock object to "psock" with increased refcnt.
When tls_data_ready() returns, local variable "psock" becomes invalid,
so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
tls_data_ready(). When "psock->ingress_msg" is empty but "psock" is not
NULL, the function forgets to decrease the refcnt increased by
sk_psock_get(), causing a refcnt leak.
Fix this issue by calling sk_psock_put() on all paths when "psock" is
not NULL.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Sat, 25 Apr 2020 13:06:25 +0000 (21:06 +0800)]
net/x25: Fix x25_neigh refcnt leak when x25 disconnect
x25_connect() invokes x25_get_neigh(), which returns a reference of the
specified x25_neigh object to "x25->neighbour" with increased refcnt.
When x25 connect success and returns, the reference still be hold by
"x25->neighbour", so the refcount should be decreased in
x25_disconnect() to keep refcount balanced.
The reference counting issue happens in x25_disconnect(), which forgets
to decrease the refcnt increased by x25_get_neigh() in x25_connect(),
causing a refcnt leak.
Fix this issue by calling x25_neigh_put() before x25_disconnect()
returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Sat, 25 Apr 2020 12:54:37 +0000 (20:54 +0800)]
net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict()
bpf_exec_tx_verdict() invokes sk_psock_get(), which returns a reference
of the specified sk_psock object to "psock" with increased refcnt.
When bpf_exec_tx_verdict() returns, local variable "psock" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
bpf_exec_tx_verdict(). When "policy" equals to NULL but "psock" is not
NULL, the function forgets to decrease the refcnt increased by
sk_psock_get(), causing a refcnt leak.
Fix this issue by calling sk_psock_put() on this error path before
bpf_exec_tx_verdict() returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Clark [Sat, 25 Apr 2020 00:58:11 +0000 (08:58 +0800)]
aquantia: Fix the media type of AQC100 ethernet controller in the driver
The Aquantia AQC100 controller enables a SFP+ port, so the driver should
configure the media type as '_TYPE_FIBRE' instead of '_TYPE_TP'.
Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com>
Cc: Igor Russkikh <irusskikh@marvell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Apr 2020 17:18:01 +0000 (10:18 -0700)]
Merge branch 'vsock-virtio-fixes-about-packet-delivery-to-monitoring-devices'
Stefano Garzarella says:
====================
vsock/virtio: fixes about packet delivery to monitoring devices
During the review of v1, Stefan pointed out an issue introduced by
that patch, where replies can appear in the packet capture before
the transmitted packet.
While fixing my patch, reverting it and adding a new flag in
'struct virtio_vsock_pkt' (patch 2/2), I found that we already had
that issue in vhost-vsock, so I fixed it (patch 1/2).
v1 -> v2:
- reverted the v1 patch, to avoid that replies can appear in the
packet capture before the transmitted packet [Stefan]
- added patch to fix packet delivering to monitoring devices in
vhost-vsock
- added patch to check if the packet is already delivered to
monitoring devices
v1: https://patchwork.ozlabs.org/project/netdev/patch/
20200421092527.41651-1-sgarzare@redhat.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 24 Apr 2020 15:08:30 +0000 (17:08 +0200)]
vsock/virtio: fix multiple packet delivery to monitoring devices
In virtio_transport.c, if the virtqueue is full, the transmitting
packet is queued up and it will be sent in the next iteration.
This causes the same packet to be delivered multiple times to
monitoring devices.
We want to continue to deliver packets to monitoring devices before
it is put in the virtqueue, to avoid that replies can appear in the
packet capture before the transmitted packet.
This patch fixes the issue, adding a new flag (tap_delivered) in
struct virtio_vsock_pkt, to check if the packet is already delivered
to monitoring devices.
In vhost/vsock.c, we are splitting packets, so we must set
'tap_delivered' to false when we queue up the same virtio_vsock_pkt
to handle the remaining bytes.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 24 Apr 2020 15:08:29 +0000 (17:08 +0200)]
vhost/vsock: fix packet delivery order to monitoring devices
We want to deliver packets to monitoring devices before it is
put in the virtqueue, to avoid that replies can appear in the
packet capture before the transmitted packet.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Tue, 21 Apr 2020 00:42:19 +0000 (02:42 +0200)]
netfilter: nat: never update the UDP checksum when it's 0
If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
device has disabled UDP checksums and enabled Tx checksum offloading,
then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
checksum offloaded).
Because of the ->ip_summed value, udp_manip_pkt() tries to update the
outer checksum with the new address and port, leading to an invalid
checksum sent on the wire, as the original null checksum obviously
didn't take the old address and port into account.
So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
might not refer to the checksum we're acting on. Instead, we can base
the decision to update the UDP checksum entirely on the value of
hdr->check, because it's null if and only if checksum is disabled:
* A fully computed checksum can't be 0, since a 0 checksum is
represented by the CSUM_MANGLED_0 value instead.
* A partial checksum can't be 0, since the pseudo-header always adds
at least one non-zero value (the UDP protocol type 0x11) and adding
more values to the sum can't make it wrap to 0 as the carry is then
added to the wrapped number.
* A disabled checksum uses the special value 0.
The problem seems to be there from day one, although it was probably
not visible before UDP tunnels were implemented.
Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Sat, 25 Apr 2020 01:39:29 +0000 (18:39 -0700)]
net: remove obsolete comment
Commit
b656722906ef ("net: Increase the size of skb_frag_t")
removed the 16bit limitation of a frag on some 32bit arches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 24 Apr 2020 11:15:21 +0000 (13:15 +0200)]
mptcp: fix race in msk status update
Currently subflow_finish_connect() changes unconditionally
any msk socket status other than TCP_ESTABLISHED.
If an unblocking connect() races with close(), we can end-up
triggering:
IPv4: Attempt to release TCP socket in state 1
00000000e32b8b7e
when the msk socket is disposed.
Be sure to enter the established status only from SYN_SENT.
Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 25 Apr 2020 19:25:32 +0000 (12:25 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull pid leak fix from Eric Biederman:
"Oleg noticed that put_pid(thread_pid) was not getting called when proc
was not compiled in.
Let's get that fixed before 5.7 is released and causes problems for
anyone"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Put thread_pid in release_task not proc_flush_pid
Linus Torvalds [Sat, 25 Apr 2020 19:16:48 +0000 (12:16 -0700)]
Merge tag 'timers-urgent-2020-04-25' of git://git./linux/kernel/git/tip/tip
Pull timer fixlet from Ingo Molnar:
"A single fix for a comment that may show up in DocBook output"
* tag 'timers-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vdso/datapage: Use correct clock mode name in comment
Linus Torvalds [Sat, 25 Apr 2020 19:11:47 +0000 (12:11 -0700)]
Merge tag 'sched-urgent-2020-04-25' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
- an uclamp accounting fix
- three frequency invariance fixes and a readability improvement"
* tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix reset-on-fork from RT with uclamp
x86, sched: Move check for CPU type to caller function
x86, sched: Don't enable static key when starting secondary CPUs
x86, sched: Account for CPUs with less than 4 cores in freq. invariance
x86, sched: Bail out of frequency invariance if base frequency is unknown
Linus Torvalds [Sat, 25 Apr 2020 19:08:24 +0000 (12:08 -0700)]
Merge tag 'perf-urgent-2020-04-25' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two changes:
- fix exit event records
- extend x86 PMU driver enumeration to add Intel Jasper Lake CPU
support"
* tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix parent pid/tid in task exit events
perf/x86/cstate: Add Jasper Lake CPU support
Linus Torvalds [Sat, 25 Apr 2020 18:52:02 +0000 (11:52 -0700)]
Merge tag 'objtool-urgent-2020-04-25' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit
systems"
* tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix off-by-one in symbol_by_offset()
objtool: Fix 32bit cross builds
Linus Torvalds [Sat, 25 Apr 2020 02:17:30 +0000 (19:17 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
David S. Miller [Sat, 25 Apr 2020 01:26:14 +0000 (18:26 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2020-04-24
The following pull-request contains BPF updates for your *net* tree.
We've added 17 non-merge commits during the last 5 day(s) which contain
a total of 19 files changed, 203 insertions(+), 85 deletions(-).
The main changes are:
1) link_update fix, from Andrii.
2) libbpf get_xdp_id fix, from David.
3) xadd verifier fix, from Jann.
4) x86-32 JIT fixes, from Luke and Wang.
5) test_btf fix, from Stanislav.
6) freplace verifier fix, from Toke.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislav Fomichev [Wed, 22 Apr 2020 00:37:53 +0000 (17:37 -0700)]
selftests/bpf: Fix a couple of broken test_btf cases
Commit
51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
introduced function linkage flag and changed the error message from
"vlen != 0" to "Invalid func linkage" and broke some fake BPF programs.
Adjust the test accordingly.
AFACT, the programs don't really need any arguments and only look
at BTF for maps, so let's drop the args altogether.
Before:
BTF raw test[103] (func (Non zero vlen)): do_test_raw:3703:FAIL expected
err_str:vlen != 0
magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 72
str_off: 72
str_len: 10
btf_total_size: 106
[1] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=(none)
[3] FUNC_PROTO (anon) return=0 args=(1 a, 2 b)
[4] FUNC func type_id=3 Invalid func linkage
BTF libbpf test[1] (test_btf_haskv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit
1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_haskv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[2] (test_btf_newkv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit
1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_newkv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[3] (test_btf_nokv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit
1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_nokv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200422003753.124921-1-sdf@google.com
Andrii Nakryiko [Wed, 22 Apr 2020 01:24:07 +0000 (18:24 -0700)]
tools/runqslower: Ensure own vmlinux.h is picked up first
Reorder include paths to ensure that runqslower sources are picking up
vmlinux.h, generated by runqslower's own Makefile. When runqslower is built
from selftests/bpf, due to current -I$(BPF_INCLUDE) -I$(OUTPUT) ordering, it
might pick up not-yet-complete vmlinux.h, generated by selftests Makefile,
which could lead to compilation errors like [0]. So ensure that -I$(OUTPUT)
goes first and rely on runqslower's Makefile own dependency chain to ensure
vmlinux.h is properly completed before source code relying on it is compiled.
[0] https://travis-ci.org/github/libbpf/libbpf/jobs/
677905925
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200422012407.176303-1-andriin@fb.com
Zou Wei [Thu, 23 Apr 2020 02:32:40 +0000 (10:32 +0800)]
bpf: Make bpf_link_fops static
Fix the following sparse warning:
kernel/bpf/syscall.c:2289:30: warning: symbol 'bpf_link_fops' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1587609160-117806-1-git-send-email-zou_wei@huawei.com
Martin KaFai Lau [Fri, 24 Apr 2020 18:29:11 +0000 (11:29 -0700)]
bpftool: Respect the -d option in struct_ops cmd
In the prog cmd, the "-d" option turns on the verifier log.
This is missed in the "struct_ops" cmd and this patch fixes it.
Fixes: 65c93628599d ("bpftool: Add struct_ops support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200424182911.1259355-1-kafai@fb.com
Toke Høiland-Jørgensen [Fri, 24 Apr 2020 13:34:28 +0000 (15:34 +0200)]
selftests/bpf: Add test for freplace program with expected_attach_type
This adds a new selftest that tests the ability to attach an freplace
program to a program type that relies on the expected_attach_type of the
target program to pass verification.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158773526831.293902.16011743438619684815.stgit@toke.dk
Toke Høiland-Jørgensen [Fri, 24 Apr 2020 13:34:27 +0000 (15:34 +0200)]
bpf: Propagate expected_attach_type when verifying freplace programs
For some program types, the verifier relies on the expected_attach_type of
the program being verified in the verification process. However, for
freplace programs, the attach type was not propagated along with the
verifier ops, so the expected_attach_type would always be zero for freplace
programs.
This in turn caused the verifier to sometimes make the wrong call for
freplace programs. For all existing uses of expected_attach_type for this
purpose, the result of this was only false negatives (i.e., freplace
functions would be rejected by the verifier even though they were valid
programs for the target they were replacing). However, should a false
positive be introduced, this can lead to out-of-bounds accesses and/or
crashes.
The fix introduced in this patch is to propagate the expected_attach_type
to the freplace program during verification, and reset it after that is
done.
Fixes: be8704ff07d2 ("bpf: Introduce dynamic program extensions")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158773526726.293902.13257293296560360508.stgit@toke.dk
Andrii Nakryiko [Fri, 24 Apr 2020 05:20:44 +0000 (22:20 -0700)]
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
Fix bug of not putting bpf_link in LINK_UPDATE command.
Also enforce zeroed old_prog_fd if no BPF_F_REPLACE flag is specified.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200424052045.4002963-1-andriin@fb.com
Wang YanQing [Thu, 23 Apr 2020 05:06:37 +0000 (13:06 +0800)]
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
When verifier_zext is true, we don't need to emit code
for zero-extension.
Fixes: 836256bf5f37 ("x32: bpf: eliminate zero extension code-gen")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200423050637.GA4029@udknight
Luke Nelson [Wed, 22 Apr 2020 17:36:30 +0000 (10:36 -0700)]
bpf, x86_32: Fix clobbering of dst for BPF_JSET
The current JIT clobbers the destination register for BPF_JSET BPF_X
and BPF_K by using "and" and "or" instructions. This is fine when the
destination register is a temporary loaded from a register stored on
the stack but not otherwise.
This patch fixes the problem (for both BPF_K and BPF_X) by always loading
the destination register into temporaries since BPF_JSET should not
modify the destination register.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32")
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Wang YanQing <udknight@gmail.com>
Link: https://lore.kernel.org/bpf/20200422173630.8351-2-luke.r.nels@gmail.com
Luke Nelson [Wed, 22 Apr 2020 17:36:29 +0000 (10:36 -0700)]
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
The current JIT uses the following sequence to zero-extend into the
upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W},
when the destination register is not on the stack:
EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
The problem is that C7 /0 encodes a MOV instruction that requires a 4-byte
immediate; the current code emits only 1 byte of the immediate. This
means that the first 3 bytes of the next instruction will be treated as
the rest of the immediate, breaking the stream of instructions.
This patch fixes the problem by instead emitting "xor dst_hi,dst_hi"
to clear the upper 32 bits. This fixes the problem and is more efficient
than using MOV to load a zero immediate.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way, and the verifier implements a zero-extension optimization. But the
JIT should avoid emitting incorrect encodings regardless.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32")
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Wang YanQing <udknight@gmail.com>
Link: https://lore.kernel.org/bpf/20200422173630.8351-1-luke.r.nels@gmail.com
Jakub Wilk [Wed, 22 Apr 2020 08:23:24 +0000 (10:23 +0200)]
bpf: Fix reStructuredText markup
The patch fixes:
$ scripts/bpf_helpers_doc.py > bpf-helpers.rst
$ rst2man bpf-helpers.rst > bpf-helpers.7
bpf-helpers.rst:1105: (WARNING/2) Inline strong start-string without end-string.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200422082324.2030-1-jwilk@jwilk.net
Doug Berger [Thu, 23 Apr 2020 23:13:30 +0000 (16:13 -0700)]
net: systemport: suppress warnings on failed Rx SKB allocations
The driver is designed to drop Rx packets and reclaim the buffers
when an allocation fails, and the network interface needs to safely
handle this packet loss. Therefore, an allocation failure of Rx
SKBs is relatively benign.
However, the output of the warning message occurs with a high
scheduling priority that can cause excessive jitter/latency for
other high priority processing.
This commit suppresses the warning messages to prevent scheduling
problems while retaining the failure count in the statistics of
the network interface.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Thu, 23 Apr 2020 23:02:11 +0000 (16:02 -0700)]
net: bcmgenet: suppress warnings on failed Rx SKB allocations
The driver is designed to drop Rx packets and reclaim the buffers
when an allocation fails, and the network interface needs to safely
handle this packet loss. Therefore, an allocation failure of Rx
SKBs is relatively benign.
However, the output of the warning message occurs with a high
scheduling priority that can cause excessive jitter/latency for
other high priority processing.
This commit suppresses the warning messages to prevent scheduling
problems while retaining the failure count in the statistics of
the network interface.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 23 Apr 2020 13:40:47 +0000 (13:40 +0000)]
macsec: avoid to set wrong mtu
When a macsec interface is created, the mtu is calculated with the lower
interface's mtu value.
If the mtu of lower interface is lower than the length, which is needed
by macsec interface, macsec's mtu value will be overflowed.
So, if the lower interface's mtu is too low, macsec interface's mtu
should be set to 0.
Test commands:
ip link add dummy0 mtu 10 type dummy
ip link add macsec0 link dummy0 type macsec
ip link show macsec0
Before:
11: macsec0@dummy0: <BROADCAST,MULTICAST,M-DOWN> mtu
4294967274
After:
11: macsec0@dummy0: <BROADCAST,MULTICAST,M-DOWN> mtu 0
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 24 Apr 2020 23:23:24 +0000 (16:23 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes: one to update a Kconfig reference and the other to
fix a resource leak on an error path in sg"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Update referenced link to cdrtools
scsi: sg: add sg_remove_request in sg_write
Eric W. Biederman [Fri, 24 Apr 2020 20:41:20 +0000 (15:41 -0500)]
proc: Put thread_pid in release_task not proc_flush_pid
Oleg pointed out that in the unlikely event the kernel is compiled
with CONFIG_PROC_FS unset that release_task will now leak the pid.
Move the put_pid out of proc_flush_pid into release_task to fix this
and to guarantee I don't make that mistake again.
When possible it makes sense to keep get and put in the same function
so it can easily been seen how they pair up.
Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Linus Torvalds [Fri, 24 Apr 2020 20:43:37 +0000 (13:43 -0700)]
Merge tag 'pm-5.7-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Restore an optimization related to asynchronous suspend and resume of
devices during system-wide power transitions that was disabled by
mistake (Kai-Heng Feng) and update the pm-graph suite of power
management utilities (Todd Brandt)"
* tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Switch back to async_schedule_dev()
pm-graph v5.6
Linus Torvalds [Fri, 24 Apr 2020 20:41:29 +0000 (13:41 -0700)]
Merge tag 'pnp-5.7-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull PNP cleanup from Rafael Wysocki:
"Make the PNP code use list_for_each_entry() in a few places instead of
open-coding it (Jason Gunthorpe)"
* tag 'pnp-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
pnp: Use list_for_each_entry() instead of open coding
Linus Torvalds [Fri, 24 Apr 2020 20:37:19 +0000 (13:37 -0700)]
Merge tag 'acpi-5.7-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Drop a lid status quirk for Asus T200TA that is not necessary any more
and clean up a resource management inconsistency in the PCI IRQ link
configuration code.
Both changes from Hans de Goede"
* tag 'acpi-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: button: Drop no longer necessary Asus T200TA lid_init_state quirk
ACPI/PCI: pci_link: use extended_irq union member when setting ext-irq shareable
Linus Torvalds [Fri, 24 Apr 2020 18:10:58 +0000 (11:10 -0700)]
mm: check that mm is still valid in madvise()
IORING_OP_MADVISE can end up basically doing mprotect() on the VM of
another process, which means that it can race with our crazy core dump
handling which accesses the VM state without holding the mmap_sem
(because it incorrectly thinks that it is the final user).
This is clearly a core dumping problem, but we've never fixed it the
right way, and instead have the notion of "check that the mm is still
ok" using mmget_still_valid() after getting the mmap_sem for writing in
any situation where we're not the original VM thread.
See commit
04f5866e41fb ("coredump: fix race condition between
mmget_not_zero()/get_task_mm() and core dumping") for more background on
this whole mmget_still_valid() thing. You might want to have a barf bag
handy when you do.
We're discussing just fixing this properly in the only remaining core
dumping routines. But even if we do that, let's make do_madvise() do
the right thing, and then when we fix core dumping, we can remove all
these mmget_still_valid() checks.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Fixes: c1ca757bd6f4 ("io_uring: add IORING_OP_MADVISE")
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Fri, 24 Apr 2020 20:17:01 +0000 (13:17 -0700)]
Merge tag 'mac80211-for-net-2020-04-24' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just three changes:
* fix a wrong GFP_KERNEL in hwsim
* fix the debugfs mess after the mac80211 registration race fix
* suppress false-positive RCU list lockdep warnings
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Apr 2020 20:14:05 +0000 (13:14 -0700)]
Merge tag 'wireless-drivers-2020-04-24' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.7
Second set of fixes for v5.7. Quite a few iwlwifi fixes and some
maintainers file updates.
iwlwifi
* fix a bug with kmemdup() error handling
* fix a DMA pool warning about unfreed memory
* fix beacon statistics
* fix a theoritical bug in device initialisation
* fix queue limit handling and inactive TID removal
* disable ACK Enabled Aggregation which was enabled by accident
* fix transmit power setting reading from BIOS with certain versions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 24 Apr 2020 19:58:22 +0000 (12:58 -0700)]
Merge tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Single fixup for a change that went into -rc2"
* tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
io_uring: only restore req->work for req that needs do completion
Linus Torvalds [Fri, 24 Apr 2020 19:54:13 +0000 (12:54 -0700)]
Merge tag 'libata-5.7-2020-04-24' of git://git.kernel.dk/linux-block
Pull libata fixlet from Jens Axboe:
"Minor spelling error fix for libata"
* tag 'libata-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
ata: sata_inic162x fix a spelling issue
Linus Torvalds [Fri, 24 Apr 2020 19:44:19 +0000 (12:44 -0700)]
Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes/changes that should go into this release:
- null_blk zoned fixes (Damien)
- blkdev_close() sync improvement (Douglas)
- Fix regression in blk-iocost that impacted (at least) systemtap
(Waiman)
- Comment fix, header removal (Zhiqiang, Jianpeng)"
* tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
null_blk: Cleanup zoned device initialization
null_blk: Fix zoned command handling
block: remove unused header
blk-iocost: Fix error on iocost_ioc_vrate_adj
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
Linus Torvalds [Fri, 24 Apr 2020 19:39:21 +0000 (12:39 -0700)]
Merge tag 'trace-v5.7-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Two fixes for memory leaks detected by kmemleak
- Removal of some dead code
- A few local functions turned static"
* tag 'trace-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Convert local functions in tracing_map.c to static
tracing: Remove DECLARE_TRACE_NOARGS
ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
tracing: Fix memory leaks in trace_events_hist.c
Rafael J. Wysocki [Fri, 24 Apr 2020 19:03:57 +0000 (21:03 +0200)]
Merge branch 'acpi-pci'
* acpi-pci:
ACPI/PCI: pci_link: use extended_irq union member when setting ext-irq shareable
Linus Torvalds [Fri, 24 Apr 2020 18:34:43 +0000 (11:34 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Ensure context synchronisation after a write to APIAKey.
- Fix bullet list formatting in Documentation/arm64/amu.rst to
eliminate doc warnings.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Documentation: arm64: fix amu.rst doc warnings
arm64: sync kernel APIAKey when installing