Vladimir Oltean [Mon, 1 Mar 2021 11:18:18 +0000 (13:18 +0200)]
net: enetc: keep RX ring consumer index in sync with hardware
The RX rings have a producer index owned by hardware, where newly
received frame buffers are placed, and a consumer index owned by
software, where newly allocated buffers are placed, in expectation of
hardware being able to place frame data in them.
Hardware increments the producer index when a frame is received, however
it is not allowed to increment the producer index to match the consumer
index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
BDs. Whenever the producer index matches the value of the consumer
index, the ring has no unprocessed received frames and all BDs in the
ring have been initialized/prepared by software, i.e. hardware owns all
BDs in the ring.
The code uses the next_to_clean variable to keep track of the producer
index, and the next_to_use variable to keep track of the consumer index.
The RX rings are seeded from enetc_refill_rx_ring, which is called from
two places:
1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
.ndo_open
-> enetc_open
-> enetc_setup_bdrs
-> enetc_setup_rxbdr
-> enetc_refill_rx_ring
2. then during the data path processing, it is refilled with 16 buffers
at a time:
enetc_msix
-> napi_schedule
-> enetc_poll
-> enetc_clean_rx_ring
-> enetc_refill_rx_ring
There is just one problem: the initial seeding done during .ndo_open
updates just the producer index (ENETC_RBPIR) with 0, and the software
next_to_clean and next_to_use variables. Notably, it will not update the
consumer index to make the hardware aware of the newly added buffers.
Wait, what? So how does it work?
Well, the reset values of the producer index and of the consumer index
of a ring are both zero. As per the description in the second paragraph,
it means that the ring is full of buffers waiting for hardware to put
frames in them, which by coincidence is almost true, because we have in
fact seeded 511 buffers into the ring.
But will the hardware attempt to access the 512th entry of the ring,
which has an invalid BD in it? Well, no, because in order to do that, it
would have to first populate the first 511 entries, and the NAPI
enetc_poll will kick in by then. Eventually, after 16 processed slots
have become available in the RX ring, enetc_clean_rx_ring will call
enetc_refill_rx_ring and then will [ finally ] update the consumer index
with the new software next_to_use variable. From now on, the
next_to_clean and next_to_use variables are in sync with the producer
and consumer ring indices.
So the day is saved, right? Well, not quite. Freeing the memory
allocated for the rings is done in:
enetc_close
-> enetc_clear_bdrs
-> enetc_clear_rxbdr
-> this just disables the ring
-> enetc_free_rxtx_rings
-> enetc_free_rx_ring
-> sets next_to_clean and next_to_use to 0
but again, nothing is committed to the hardware producer and consumer
indices (yay!). The assumption is that the ring is disabled, so the
indices don't matter anyway, and it's the responsibility of the "open"
code path to set those up.
.. Except that the "open" code path does not set those up properly.
While initially, things almost work, during subsequent enetc_close ->
enetc_open sequences, we have problems. To be precise, the enetc_open
that is subsequent to enetc_close will again refill the ring with 511
entries, but it will leave the consumer index untouched. Untouched
means, of course, equal to the value it had before disabling the ring
and draining the old buffers in enetc_close.
But as mentioned, enetc_setup_rxbdr will at least update the producer
index though, through this line of code:
enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
so at this stage we'll have:
next_to_clean=0 (in hardware 0)
next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
Again, the next_to_clean and producer index are in sync and set to
correct values, so the driver manages to limp on. Eventually, 16 ring
entries will be consumed by enetc_poll, and the savior
enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
update the hardware consumer ring based upon the new next_to_use.
So.. it works?
Well, by coincidence, it almost does, but there's a circumstance where
enetc_clean_rx_ring won't be there to save us. If the previous value of
the consumer index was 15, there's a problem, because the NAPI poll
sequence will only issue a refill when 16 or more buffers have been
consumed.
It's easiest to illustrate this with an example:
ip link set eno0 up
ip addr add 192.168.100.1/24 dev eno0
ping 192.168.100.1 -c 20 # ping this port from another board
ip link set eno0 down
ip link set eno0 up
ping 192.168.100.1 -c 20 # ping it again from the same other board
One by one:
1. ip link set eno0 up
-> calls enetc_setup_rxbdr:
-> calls enetc_refill_rx_ring(511 buffers)
-> next_to_clean=0 (in hw 0)
-> next_to_use=511 (in hw 0)
2. ping 192.168.100.1 -c 20 # ping this port from another board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
20 packets transmitted, 20 packets received, 0% packet loss
3. ip link set eno0 down
enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
4. ip link set eno0 up
-> calls enetc_setup_rxbdr:
-> calls enetc_refill_rx_ring(511 buffers)
-> next_to_clean=0 (in hw 0)
-> next_to_use=511 (in hw 15)
5. ping 192.168.100.1 -c 20 # ping it again from the same other board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
20 packets transmitted, 12 packets received, 40% packet loss
And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
to 15 for that to happen), no nothing. The hardware enters the condition where
the producer (14) + 1 is equal to the consumer (15) index, which makes it
believe it has no more free buffers to put packets in, so it starts discarding
them:
ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
NIC statistics:
Rx ring 0 discarded frames: 8
Summarized, if the interface receives between 16 and 32 (mod 512) frames
and then there is a link flap, then the port will eventually die with no
way to recover. If it receives less than 16 (mod 512) frames, then the
initial NAPI poll [ before the link flap ] will not update the consumer
index in hardware (it will remain zero) which will be ok when the buffers
are later reinitialized. If more than 32 (mod 512) frames are received,
the initial NAPI poll has the chance to refill the ring twice, updating
the consumer index to at least 32. So after the link flap, the consumer
index is still wrong, but the post-flap NAPI poll gets a chance to
refill the ring once (because it passes through cleaned_cnt=15) and
makes the consumer index be again back in sync with next_to_use.
The solution to this problem is actually simple, we just need to write
next_to_use into the hardware consumer index at enetc_open time, which
always brings it back in sync after an initial buffer seeding process.
The simpler thing would be to put the write to the consumer index into
enetc_refill_rx_ring directly, but there are issues with the MDIO
locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
top-level and we use the unlocked enetc_wr_reg_hot, whereas in
enetc_open, the enetc_lock_mdio() is not taken at the top level, but
instead by each individual enetc_wr_reg, so we are forced to put an
additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
the code is left as a refactoring exercise.
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:17 +0000 (13:18 +0200)]
net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
The Station Interface Receive Interrupt Detect Register (SIRXIDR)
contains a 16-bit wide mask of 'interrupt detected' events for each ring
associated with a port. Bit i is write-1-to-clean for RX ring i.
I have no explanation whatsoever how this line of code came to be
inserted in the blamed commit. I checked the downstream versions of that
patch and none of them have it.
The somewhat comical aspect of it is that we're writing a binary number
to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
Since the RX rings have 512 buffer descriptors, we end up writing 511 to
this register, which is 0x1ff, so we are effectively clearing the
'interrupt detected' event for rings 0-8.
This register is not what is used for interrupt handling though - it
only provides a summary for the entire SI. The hardware provides one
separate Interrupt Detect Register per RX ring, which auto-clears upon
read. So there doesn't seem to be any adverse effect caused by this
bogus write.
There is, however, one reason why this should be handled as a bugfix:
next_to_clean _should_ be committed to hardware, just not to that
register, and this was obscuring the fact that it wasn't. This is fixed
in the next patch, and removing the bogus line now allows the fix patch
to be backported beyond that point.
Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:16 +0000 (13:18 +0200)]
net: enetc: force the RGMII speed and duplex instead of operating in inband mode
The ENETC port 0 MAC supports in-band status signaling coming from a PHY
when operating in RGMII mode, and this feature is enabled by default.
It has been reported that RGMII is broken in fixed-link, and that is not
surprising considering the fact that no PHY is attached to the MAC in
that case, but a switch.
This brings us to the topic of the patch: the enetc driver should have
not enabled the optional in-band status signaling for RGMII unconditionally,
but should have forced the speed and duplex to what was resolved by
phylink.
Note that phylink does not accept the RGMII modes as valid for in-band
signaling, and these operate a bit differently than 1000base-x and SGMII
(notably there is no clause 37 state machine so no ACK required from the
MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
not transmitting something else, so it should be safe to leave a PHY
with this option unconditionally enabled even if we ignore it). The spec
talks about this here:
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf
Fixes: 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:15 +0000 (13:18 +0200)]
net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
Quoting from the blamed commit:
In promiscuous mode, it is more intuitive that all traffic is received,
including VLAN tagged traffic. It appears that it is necessary to set
the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
also temporarily enabled. On exit from promiscuous mode, the setting
made by ethtool is restored.
Intuitive or not, there isn't any definition issued by a standards body
which says that promiscuity has anything to do with VLAN filtering - it
only has to do with accepting packets regardless of destination MAC address.
In fact people are already trying to use this misunderstanding/bug of
the enetc driver as a justification to transform promiscuity into
something it never was about: accepting every packet (maybe that would
be the "rx-all" netdev feature?):
https://lore.kernel.org/netdev/
20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
This is relevant because there are use cases in the kernel (such as
tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
(yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
such as enetc, so for those, disabling rx-vlan-filter is currently the
only right solution to make these setups work:
https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
The blamed patch has unintentionally introduced one more way for this to
work, which is to enable IFF_PROMISC, however this is non-portable
because port promiscuity is not meant to disable VLAN filtering.
Therefore, it could invite people to write broken scripts for enetc, and
then wonder why they are broken when migrating to other drivers that
don't handle promiscuity in the same way.
Fixes: 7070eea5e95a ("enetc: permit configuration of rx-vlan-filter with ethtool")
Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:14 +0000 (13:18 +0200)]
net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
When the enetc ports have rx-vlan-offload enabled, they report a TPID of
ETH_P_8021Q regardless of what was actually in the packet. When
rx-vlan-offload is disabled, packets have the proper TPID. Fix this
inconsistency by finishing the TODO left in the code.
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:13 +0000 (13:18 +0200)]
net: enetc: take the MDIO lock only once per NAPI poll cycle
The workaround for the ENETC MDIO erratum caused a performance
degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
64B packets). This is due to excessive locking and unlocking in the fast
path, which can be avoided.
By taking the MDIO read-side lock only once per NAPI poll cycle, we are
able to regain 54 Kpps (65%) of the performance hit. The rest of the
performance degradation comes from the TX data path, but unfortunately
it doesn't look like we can optimize that away easily, even with
netdev_xmit_more(), there just isn't any skb batching done, to help with
taking the MDIO lock less often than once per packet.
We need to change the register accessor type for enetc_get_tx_tstamp,
because it now runs under the enetc_lock_mdio as per the new call path
detailed below:
enetc_msix
-> napi_schedule
-> enetc_poll
-> enetc_lock_mdio
-> enetc_clean_tx_ring
-> enetc_get_tx_tstamp
-> enetc_clean_rx_ring
-> enetc_unlock_mdio
Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:12 +0000 (13:18 +0200)]
net: enetc: initialize RFS/RSS memories for unused ports too
Michael reports that since linux-next-
20210211, the AER messages for ECC
errors have started reappearing, and this time they can be reliably
reproduced with the first ping on one of his LS1028A boards.
$ ping 1[ 33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
72.16.0.1
PING [ 33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
172.16.0.1 (172.16.0.1): 56 data bytes
64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms
$ devmem 0x1f8010e10 32
0xC0000006
It isn't clear why this is necessary, but it seems that for the errors
to go away, we must clear the entire RFS and RSS memory, not just for
the ports in use.
Sadly the code is structured in such a way that we can't have unified
logic for the used and unused ports. For the minimal initialization of
an unused port, we need just to enable and ioremap the PF memory space,
and a control buffer descriptor ring. Unused ports must then free the
CBDR because the driver will exit, but used ports can not pick up from
where that code path left, since the CBDR API does not reinitialize a
ring when setting it up, so its producer and consumer indices are out of
sync between the software and hardware state. So a separate
enetc_init_unused_port function was created, and it gets called right
after the PF memory space is enabled.
Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories")
Reported-by: Michael Walle <michael@walle.cc>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 1 Mar 2021 11:18:11 +0000 (13:18 +0200)]
net: enetc: don't overwrite the RSS indirection table when initializing
After the blamed patch, all RX traffic gets hashed to CPU 0 because the
hashing indirection table set up in:
enetc_pf_probe
-> enetc_alloc_si_resources
-> enetc_configure_si
-> enetc_setup_default_rss_table
is overwritten later in:
enetc_pf_probe
-> enetc_init_port_rss_memory
which zero-initializes the entire port RSS table in order to avoid ECC errors.
The trouble really is that enetc_init_port_rss_memory really neads
enetc_alloc_si_resources to be called, because it depends upon
enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
thing could have been better thought out, it has nothing to do in a
function called "alloc_si_resources", especially since its counterpart,
"free_si_resources", does nothing to unwind the configuration of the SI.
The point is, we need to pull out enetc_configure_si out of
enetc_alloc_resources, and move it after enetc_init_port_rss_memory.
This allows us to set up the default RSS indirection table after
initializing the memory.
Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories")
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yejune Deng [Mon, 1 Mar 2021 06:05:48 +0000 (14:05 +0800)]
inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold
In inet_initpeers(), struct inet_peer on IA32 uses 128 bytes in nowdays.
Get rid of the cascade and use div64_ul() and clamp_val() calculate that
will not need to be adjusted in the future as suggested by Eric Dumazet.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Sun, 28 Feb 2021 23:22:40 +0000 (02:22 +0300)]
net/qrtr: fix __netdev_alloc_skb call
syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(),
which tries to allocate skb. Since the value comes from the untrusted source
there is no need to raise a warning in __alloc_pages_nodemask().
[1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
Call Trace:
__alloc_pages include/linux/gfp.h:511 [inline]
__alloc_pages_node include/linux/gfp.h:524 [inline]
alloc_pages_node include/linux/gfp.h:538 [inline]
kmalloc_large_node+0x60/0x110 mm/slub.c:3999
__kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
__kmalloc_reserve net/core/skbuff.c:150 [inline]
__alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
__netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
call_write_iter include/linux/fs.h:1901 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x791/0xa30 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Acked-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Mar 2021 21:22:34 +0000 (13:22 -0800)]
Merge branch 'sh_eth-masks'
Sergey Shtylyov says:
====================
Fix TRSCER masks in the Ether driver
Here are 3 patches against DaveM's 'net' repo. I'm fixing the TRSCER masks in
the driver to match the manuals...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Shtylyov [Sun, 28 Feb 2021 20:27:32 +0000 (23:27 +0300)]
sh_eth: fix TRSCER mask for R7S9210
According to the RZ/A2M Group User's Manual: Hardware, Rev. 2.00,
the TRSCER register has bit 9 reserved, hence we can't use the driver's
default TRSCER mask. Add the explicit initializer for sh_eth_cpu_data::
trscer_err_mask for R7S9210.
Fixes: 6e0bb04d0e4f ("sh_eth: Add R7S9210 support")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Shtylyov [Sun, 28 Feb 2021 20:26:34 +0000 (23:26 +0300)]
sh_eth: fix TRSCER mask for R7S72100
According to the RZ/A1H Group, RZ/A1M Group User's Manual: Hardware,
Rev. 4.00, the TRSCER register has bit 9 reserved, hence we can't use
the driver's default TRSCER mask. Add the explicit initializer for
sh_eth_cpu_data::trscer_err_mask for R7S72100.
Fixes: db893473d313 ("sh_eth: Add support for r7s72100")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Shtylyov [Sun, 28 Feb 2021 20:25:43 +0000 (23:25 +0300)]
sh_eth: fix TRSCER mask for SH771x
According to the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
differently), with all the other bits reserved. Apparently, this was not
the case with some early revisions of the manual as we have the other
bits declared (and set) in the original driver. Follow the suit and add
the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...
Fixes: 86a74ff21a7a ("net: sh_eth: add support for Renesas SuperH Ethernet")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Sun, 28 Feb 2021 03:55:50 +0000 (22:55 -0500)]
atm: lanai: dont run lanai_dev_close if not open
lanai_dev_open() can fail. When it fail, lanai->base is unmapped and the
pci device is disabled. The caller, lanai_init_one(), then tries to run
atm_dev_deregister(). This will subsequently call lanai_dev_close() and
use the already released MMIO area.
To fix this issue, set the lanai->base to NULL if open fail,
and test the flag in lanai_dev_close().
[ 8.324153] lanai: lanai_start() failed, err=19
[ 8.324819] lanai(itf 0): shutting down interface
[ 8.325211] BUG: unable to handle page fault for address:
ffffc90000180024
[ 8.325781] #PF: supervisor write access in kernel mode
[ 8.326215] #PF: error_code(0x0002) - not-present page
[ 8.326641] PGD
100000067 P4D
100000067 PUD
100139067 PMD
10013a067 PTE 0
[ 8.327206] Oops: 0002 [#1] SMP KASAN NOPTI
[ 8.327557] CPU: 0 PID: 95 Comm: modprobe Not tainted
5.11.0-rc7-00090-gdcc0b49040c7 #12
[ 8.328229] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-48-gd9c812dda519-4
[ 8.329145] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[ 8.329587] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[ 8.330917] RSP: 0018:
ffff8881029ef680 EFLAGS:
00010246
[ 8.331196] RAX:
000000000003fffe RBX:
ffff888102fb4800 RCX:
ffffffffc001a98a
[ 8.331572] RDX:
ffffc90000180000 RSI:
0000000000000246 RDI:
ffff888102fb4000
[ 8.331948] RBP:
ffff888102fb4000 R08:
ffffffff8115da8a R09:
ffffed102053deaa
[ 8.332326] R10:
0000000000000003 R11:
ffffed102053dea9 R12:
ffff888102fb48a4
[ 8.332701] R13:
ffffffffc00123c0 R14:
ffff888102fb4b90 R15:
ffff888102fb4b88
[ 8.333077] FS:
00007f08eb9056a0(0000) GS:
ffff88815b400000(0000) knlGS:
0000000000000000
[ 8.333502] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8.333806] CR2:
ffffc90000180024 CR3:
0000000102a28000 CR4:
00000000000006f0
[ 8.334182] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 8.334557] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 8.334932] Call Trace:
[ 8.335066] atm_dev_deregister+0x161/0x1a0 [atm]
[ 8.335324] lanai_init_one.cold+0x20c/0x96d [lanai]
[ 8.335594] ? lanai_send+0x2a0/0x2a0 [lanai]
[ 8.335831] local_pci_probe+0x6f/0xb0
[ 8.336039] pci_device_probe+0x171/0x240
[ 8.336255] ? pci_device_remove+0xe0/0xe0
[ 8.336475] ? kernfs_create_link+0xb6/0x110
[ 8.336704] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
[ 8.336983] really_probe+0x161/0x420
[ 8.337181] driver_probe_device+0x6d/0xd0
[ 8.337401] device_driver_attach+0x82/0x90
[ 8.337626] ? device_driver_attach+0x90/0x90
[ 8.337859] __driver_attach+0x60/0x100
[ 8.338065] ? device_driver_attach+0x90/0x90
[ 8.338298] bus_for_each_dev+0xe1/0x140
[ 8.338511] ? subsys_dev_iter_exit+0x10/0x10
[ 8.338745] ? klist_node_init+0x61/0x80
[ 8.338956] bus_add_driver+0x254/0x2a0
[ 8.339164] driver_register+0xd3/0x150
[ 8.339370] ? 0xffffffffc0028000
[ 8.339550] do_one_initcall+0x84/0x250
[ 8.339755] ? trace_event_raw_event_initcall_finish+0x150/0x150
[ 8.340076] ? free_vmap_area_noflush+0x1a5/0x5c0
[ 8.340329] ? unpoison_range+0xf/0x30
[ 8.340532] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 8.340806] ? unpoison_range+0xf/0x30
[ 8.341014] ? unpoison_range+0xf/0x30
[ 8.341217] do_init_module+0xf8/0x350
[ 8.341419] load_module+0x3fe6/0x4340
[ 8.341621] ? vm_unmap_ram+0x1d0/0x1d0
[ 8.341826] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 8.342101] ? module_frob_arch_sections+0x20/0x20
[ 8.342358] ? __do_sys_finit_module+0x108/0x170
[ 8.342604] __do_sys_finit_module+0x108/0x170
[ 8.342841] ? __ia32_sys_init_module+0x40/0x40
[ 8.343083] ? file_open_root+0x200/0x200
[ 8.343298] ? do_sys_open+0x85/0xe0
[ 8.343491] ? filp_open+0x50/0x50
[ 8.343675] ? exit_to_user_mode_prepare+0xfc/0x130
[ 8.343935] do_syscall_64+0x33/0x40
[ 8.344132] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.344401] RIP: 0033:0x7f08eb887cf7
[ 8.344594] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 41
[ 8.345565] RSP: 002b:
00007ffcd5c98ad8 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
[ 8.345962] RAX:
ffffffffffffffda RBX:
00000000008fea70 RCX:
00007f08eb887cf7
[ 8.346336] RDX:
0000000000000000 RSI:
00000000008fd9e0 RDI:
0000000000000003
[ 8.346711] RBP:
0000000000000003 R08:
0000000000000000 R09:
0000000000000001
[ 8.347085] R10:
00007f08eb8eb300 R11:
0000000000000246 R12:
00000000008fd9e0
[ 8.347460] R13:
0000000000000000 R14:
00000000008fddd0 R15:
0000000000000001
[ 8.347836] Modules linked in: lanai(+) atm
[ 8.348065] CR2:
ffffc90000180024
[ 8.348244] ---[ end trace
7fdc1c668f2003e5 ]---
[ 8.348490] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[ 8.348772] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[ 8.349745] RSP: 0018:
ffff8881029ef680 EFLAGS:
00010246
[ 8.350022] RAX:
000000000003fffe RBX:
ffff888102fb4800 RCX:
ffffffffc001a98a
[ 8.350397] RDX:
ffffc90000180000 RSI:
0000000000000246 RDI:
ffff888102fb4000
[ 8.350772] RBP:
ffff888102fb4000 R08:
ffffffff8115da8a R09:
ffffed102053deaa
[ 8.351151] R10:
0000000000000003 R11:
ffffed102053dea9 R12:
ffff888102fb48a4
[ 8.351525] R13:
ffffffffc00123c0 R14:
ffff888102fb4b90 R15:
ffff888102fb4b88
[ 8.351918] FS:
00007f08eb9056a0(0000) GS:
ffff88815b400000(0000) knlGS:
0000000000000000
[ 8.352343] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8.352647] CR2:
ffffc90000180024 CR3:
0000000102a28000 CR4:
00000000000006f0
[ 8.353022] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 8.353397] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 8.353958] modprobe (95) used greatest stack depth: 26216 bytes left
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Sat, 27 Feb 2021 21:15:06 +0000 (16:15 -0500)]
atm: eni: dont release is never initialized
label err_eni_release is reachable when eni_start() fail.
In eni_start() it calls dev->phy->start() in the last step, if start()
fail we don't need to call phy->stop(), if start() is never called, we
neither need to call phy->stop(), otherwise null-ptr-deref will happen.
In order to fix this issue, don't call phy->stop() in label err_eni_release
[ 4.875714] ==================================================================
[ 4.876091] BUG: KASAN: null-ptr-deref in suni_stop+0x47/0x100 [suni]
[ 4.876433] Read of size 8 at addr
0000000000000030 by task modprobe/95
[ 4.876778]
[ 4.876862] CPU: 0 PID: 95 Comm: modprobe Not tainted
5.11.0-rc7-00090-gdcc0b49040c7 #2
[ 4.877290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd94
[ 4.877876] Call Trace:
[ 4.878009] dump_stack+0x7d/0xa3
[ 4.878191] kasan_report.cold+0x10c/0x10e
[ 4.878410] ? __slab_free+0x2f0/0x340
[ 4.878612] ? suni_stop+0x47/0x100 [suni]
[ 4.878832] suni_stop+0x47/0x100 [suni]
[ 4.879043] eni_do_release+0x3b/0x70 [eni]
[ 4.879269] eni_init_one.cold+0x1152/0x1747 [eni]
[ 4.879528] ? _raw_spin_lock_irqsave+0x7b/0xd0
[ 4.879768] ? eni_ioctl+0x270/0x270 [eni]
[ 4.879990] ? __mutex_lock_slowpath+0x10/0x10
[ 4.880226] ? eni_ioctl+0x270/0x270 [eni]
[ 4.880448] local_pci_probe+0x6f/0xb0
[ 4.880650] pci_device_probe+0x171/0x240
[ 4.880864] ? pci_device_remove+0xe0/0xe0
[ 4.881086] ? kernfs_create_link+0xb6/0x110
[ 4.881315] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
[ 4.881594] really_probe+0x161/0x420
[ 4.881791] driver_probe_device+0x6d/0xd0
[ 4.882010] device_driver_attach+0x82/0x90
[ 4.882233] ? device_driver_attach+0x90/0x90
[ 4.882465] __driver_attach+0x60/0x100
[ 4.882671] ? device_driver_attach+0x90/0x90
[ 4.882903] bus_for_each_dev+0xe1/0x140
[ 4.883114] ? subsys_dev_iter_exit+0x10/0x10
[ 4.883346] ? klist_node_init+0x61/0x80
[ 4.883557] bus_add_driver+0x254/0x2a0
[ 4.883764] driver_register+0xd3/0x150
[ 4.883971] ? 0xffffffffc0038000
[ 4.884149] do_one_initcall+0x84/0x250
[ 4.884355] ? trace_event_raw_event_initcall_finish+0x150/0x150
[ 4.884674] ? unpoison_range+0xf/0x30
[ 4.884875] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 4.885150] ? unpoison_range+0xf/0x30
[ 4.885352] ? unpoison_range+0xf/0x30
[ 4.885557] do_init_module+0xf8/0x350
[ 4.885760] load_module+0x3fe6/0x4340
[ 4.885960] ? vm_unmap_ram+0x1d0/0x1d0
[ 4.886166] ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[ 4.886441] ? module_frob_arch_sections+0x20/0x20
[ 4.886697] ? __do_sys_finit_module+0x108/0x170
[ 4.886941] __do_sys_finit_module+0x108/0x170
[ 4.887178] ? __ia32_sys_init_module+0x40/0x40
[ 4.887419] ? file_open_root+0x200/0x200
[ 4.887634] ? do_sys_open+0x85/0xe0
[ 4.887826] ? filp_open+0x50/0x50
[ 4.888009] ? fpregs_assert_state_consistent+0x4d/0x60
[ 4.888287] ? exit_to_user_mode_prepare+0x2f/0x130
[ 4.888547] do_syscall_64+0x33/0x40
[ 4.888739] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4.889010] RIP: 0033:0x7ff62fcf1cf7
[ 4.889202] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f71
[ 4.890172] RSP: 002b:
00007ffe6644ade8 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
[ 4.890570] RAX:
ffffffffffffffda RBX:
0000000000f2ca70 RCX:
00007ff62fcf1cf7
[ 4.890944] RDX:
0000000000000000 RSI:
0000000000f2b9e0 RDI:
0000000000000003
[ 4.891318] RBP:
0000000000000003 R08:
0000000000000000 R09:
0000000000000001
[ 4.891691] R10:
00007ff62fd55300 R11:
0000000000000246 R12:
0000000000f2b9e0
[ 4.892064] R13:
0000000000000000 R14:
0000000000f2bdd0 R15:
0000000000000001
[ 4.892439] ==================================================================
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Sat, 27 Feb 2021 03:05:58 +0000 (11:05 +0800)]
net: phy: fix save wrong speed and duplex problem if autoneg is on
If phy uses generic driver and autoneg is on, enter command
"ethtool -s eth0 speed 50" will not change phy speed actually, but
command "ethtool eth0" shows speed is 50Mb/s because phydev->speed
has been set to 50 and no update later.
And duplex setting has same problem too.
However, if autoneg is on, phy only changes speed and duplex according to
phydev->advertising, but not phydev->speed and phydev->duplex. So in this
case, phydev->speed and phydev->duplex don't need to be set in function
phy_ethtool_ksettings_set() if autoneg is on.
Fixes: 51e2a3846eab ("PHY: Avoid unnecessary aneg restarts")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Sat, 27 Feb 2021 00:40:19 +0000 (01:40 +0100)]
net: always use icmp{,v6}_ndo_send from ndo_start_xmit
There were a few remaining tunnel drivers that didn't receive the prior
conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
memory corrution (see
ee576c47db60 ("net: icmp: pass zeroed opts from
icmp{,v6}_ndo_send before sending") for details), there's even more
imperative to have these all converted. So this commit goes through the
remaining cases that I could find and does a boring translation to the
ndo variety.
The Fixes: line below is the merge that originally added icmp{,v6}_
ndo_send and converted the first batch of icmp{,v6}_send users. The
rationale then for the change applies equally to this patch. It's just
that these drivers were left out of the initial conversion because these
network devices are hiding in net/ rather than in drivers/net/.
Cc: Florian Westphal <fw@strlen.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 803381f9f117 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Sun, 28 Feb 2021 17:08:23 +0000 (01:08 +0800)]
net: dsa: tag_rtl4_a: fix egress tags
Commit
86dd9868b878 has several issues, but was accepted too soon
before anyone could take a look.
- Double free. dsa_slave_xmit() will free the skb if the xmit function
returns NULL, but the skb is already freed by eth_skb_pad(). Use
__skb_put_padto() to avoid that.
- Unnecessary allocation. It has been done by DSA core since commit
a3b0b6479700.
- A u16 pointer points to skb data. It should be __be16 for network
byte order.
- Typo in comments. "numer" -> "number".
Fixes: 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Beulich [Thu, 25 Feb 2021 15:39:01 +0000 (16:39 +0100)]
xen-netback: use local var in xenvif_tx_check_gop() instead of re-calculating
shinfo already holds the result of skb_shinfo(skb) at this point - no
need to re-invoke the construct even twice.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 26 Feb 2021 15:30:20 +0000 (17:30 +0200)]
net: phy: ti: take into account all possible interrupt sources
The previous implementation of .handle_interrupt() did not take into
account the fact that all the interrupt status registers should be
acknowledged since multiple interrupt sources could be asserted.
Fix this by reading all the status registers before exiting with
IRQ_NONE or triggering the PHY state machine.
Fixes: 1d1ae3c6ca3f ("net: phy: ti: implement generic .handle_interrupt() callback")
Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210226153020.867852-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 28 Feb 2021 20:04:02 +0000 (12:04 -0800)]
Merge branch 'net-hns3-fixes-fot-net'
Huazhong Tan says:
====================
net: hns3: fixes fot -net
The patchset includes some fixes for the HNS3 ethernet driver.
====================
Link: https://lore.kernel.org/r/1614410693-8107-1-git-send-email-tanhuazhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Sat, 27 Feb 2021 07:24:53 +0000 (15:24 +0800)]
net: hns3: fix bug when calculating the TCAM table info
The function hclge_fd_convert_tuple() is used to convert tuples
and tuples mask to TCAM x and y. But it misuses the source mac
as source mac mask when convert INNER_SRC_MAC, which may cause
the flow director rule works unexpectedly. So fix it.
Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Sat, 27 Feb 2021 07:24:52 +0000 (15:24 +0800)]
net: hns3: fix query vlan mask value error for flow director
Currently, the driver returns VLAN_VID_MASK for vlan mask field,
when get flow director rule information for rule doesn't use vlan.
It may cause the vlan mask value display as 0xf000 in this
case, like below:
estuary:/$ ethtool -u eth1
50 RX rings available
Total 1 rules
Filter: 2
Rule Type: TCP over IPv4
Src IP addr: 0.0.0.0 mask: 255.255.255.255
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Src port: 0 mask: 0xffff
Dest port: 0 mask: 0xffff
VLAN EtherType: 0x0 mask: 0xffff
VLAN: 0x0 mask: 0xf000
User-defined: 0x1234 mask: 0x0
Action: Direct to queue 3
Fix it by return 0.
Fixes: 05c2314fe6a8 ("net: hns3: Add support for rule query of flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Sat, 27 Feb 2021 07:24:51 +0000 (15:24 +0800)]
net: hns3: fix error mask definition of flow director
Currently, some bit filed definitions of flow director TCAM
configuration command are incorrect. Since the wrong MSB is
always 0, and these fields are assgined in order, so it still works.
Fix it by redefine them.
Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Fri, 26 Feb 2021 21:22:48 +0000 (22:22 +0100)]
net: Fix gro aggregation for udp encaps with zero csum
We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
csum for the UDP header itself is 0. In that case, GRO aggregation does
not take place on the phys dev, but instead is deferred to the vxlan/geneve
driver (see trace below).
The reason is essentially that GRO aggregation bails out in udp_gro_receive()
for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
others) where for non-zero csums
2abb7cdc0dc8 ("udp: Add support for doing
checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
and napi context has csum_valid set. This is however not the case for zero
UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
At the same time
57c67ff4bd92 ("udp: additional GRO support") added matches
on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
so it certainly is expected to handle zero csums in udp_gro_receive(). The
purpose of the check added via
662880f44203 ("net: Allow GRO to use and set
levels of checksum unnecessary") seems to catch bad csum and stop aggregation
right away.
One way to fix aggregation in the zero case is to only perform the !csum_valid
check in udp_gro_receive() if uh->check is infact non-zero.
Before:
[...]
swapper 0 [008] 731.946506: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100400 len=1500 (1)
swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100200 len=1500
swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101100 len=1500
swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101700 len=1500
swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101b00 len=1500
swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100600 len=1500
swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100f00 len=1500
swapper 0 [008] 731.946509: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100a00 len=1500
swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100500 len=1500
swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100700 len=1500
swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101d00 len=1500 (2)
swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101000 len=1500
swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101c00 len=1500
swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101400 len=1500
swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100e00 len=1500
swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101600 len=1500
swapper 0 [008] 731.946521: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100800 len=774
swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112 (2)
[...]
# netperf -H 10.55.10.4 -t TCP_STREAM -l 20
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 20.01 13129.24
After:
[...]
swapper 0 [026] 521.862641: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479000 len=11286 (1)
swapper 0 [026] 521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d478500 len=2898 (2)
swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479f00 len=8490 (3)
swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848 (2)
swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440 (3)
[...]
# netperf -H 10.55.10.4 -t TCP_STREAM -l 20
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 20.01 24576.53
Fixes: 57c67ff4bd92 ("udp: additional GRO support")
Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Fri, 26 Feb 2021 13:20:38 +0000 (14:20 +0100)]
net: broadcom: bcm4908_enet: enable RX after processing packets
When receiving a lot of packets hardware may run out of free
descriptiors and stop RX ring. Enable it every time after handling
received packets.
Fixes: 4feffeadbcb2 ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210226132038.29849-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Thu, 25 Feb 2021 12:51:02 +0000 (13:51 +0100)]
ethtool: fix the check logic of at least one channel for RX/TX
The command "ethtool -L <intf> combined 0" may clean the RX/TX channel
count and skip the error path, since the attrs
tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
are NULL in this case when recent ethtool is used.
Tested using ethtool v5.10.
Fixes: 7be92514b99c ("ethtool: check if there is at least one channel for TX/RX in the core")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Link: https://lore.kernel.org/r/20210225125102.23989-1-simon.horman@netronome.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Feb 2021 23:50:25 +0000 (15:50 -0800)]
Merge branch 'bnxt_en-error-recovery-bug-fixes'
Michael Chan says:
====================
bnxt_en: Error recovery bug fixes.
Two error recovery related bug fixes for 2 corner cases.
Please queue patch #2 for -stable. Thanks.
====================
Link: https://lore.kernel.org/r/1614332590-17865-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edwin Peer [Fri, 26 Feb 2021 09:43:10 +0000 (04:43 -0500)]
bnxt_en: reliably allocate IRQ table on reset to avoid crash
The following trace excerpt corresponds with a NULL pointer dereference
of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
device resets:
Unable to handle kernel NULL pointer dereference at ...
000000d
...
pc : string+0x3c/0x80
lr : vsnprintf+0x294/0x7e0
sp :
ffff00000f61ba70 pstate :
20000145
x29:
ffff00000f61ba70 x28:
000000000000000d
x27:
ffff0000009c8b5a x26:
ffff00000f61bb80
x25:
ffff0000009c8b5a x24:
0000000000000012
x23:
00000000ffffffe0 x22:
ffff000008990428
x21:
ffff00000f61bb80 x20:
000000000000000d
x19:
000000000000001f x18:
0000000000000000
x17:
0000000000000000 x16:
ffff800b6d0fb400
x15:
0000000000000000 x14:
ffff800b7fe31ae8
x13:
00001ed16472c920 x12:
ffff000008c6b1c9
x11:
ffff000008cf0580 x10:
ffff00000f61bb80
x9 :
00000000ffffffd8 x8 :
000000000000000c
x7 :
ffff800b684b8000 x6 :
0000000000000000
x5 :
0000000000000065 x4 :
0000000000000001
x3 :
ffff0a00ffffff04 x2 :
000000000000001f
x1 :
0000000000000000 x0 :
000000000000000d
Call trace:
string+0x3c/0x80
vsnprintf+0x294/0x7e0
snprintf+0x44/0x50
__bnxt_open_nic+0x34c/0x928 [bnxt_en]
bnxt_open+0xe8/0x238 [bnxt_en]
__dev_open+0xbc/0x130
__dev_change_flags+0x12c/0x168
dev_change_flags+0x20/0x60
...
Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
would not be expected on a system supporting MSIX at all. However, if
bnxt_init_int_mode() does not end up being called after the call to
bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver will
think that only INTA is supported and bp->irq_tbl will be NULL,
causing the above crash.
In the error recovery scenario, we call bnxt_clear_int_mode() in
bnxt_fw_reset_close() early in the sequence. Ordinarily, we will
call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
reestablish communication with the firmware after reset. However,
if the sequence has to abort before we call bnxt_init_int_mode() and
if the user later attempts to re-open the device, then it will cause
the crash above.
We fix it in 2 ways:
1. Check for bp->irq_tbl in bnxt_setup_int_mode(). If it is NULL, call
bnxt_init_init_mode().
2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag. This
will cause more drastic recovery at the next attempt to re-open the
device, including a call to bnxt_init_int_mode().
Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vasundhara Volam [Fri, 26 Feb 2021 09:43:09 +0000 (04:43 -0500)]
bnxt_en: Fix race between firmware reset and driver remove.
The driver's error recovery reset sequence can take many seconds to
complete and only the critical sections are protected by rtnl_lock.
A recent change has introduced a regression in this sequence.
bnxt_remove_one() may be called while the recovery is in progress.
Normally, unregister_netdev() would cause bnxt_close_nic() to be
called and this would cause the error recovery to safely abort
with the BNXT_STATE_ABORT_ERR flag set in bnxt_close_nic().
Recently, we added bnxt_reinit_after_abort() to allow the user to
reopen the device after an aborted recovery. This causes the
regression in the scenario described above because we would
attempt to re-open even after the netdev has been unregistered.
Fix it by checking the netdev reg_state in
bnxt_reinit_after_abort() and abort if it is unregistered.
Fixes: 6882c36cf82e ("bnxt_en: attempt to reinitialize after aborted reset")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Feb 2021 23:47:55 +0000 (15:47 -0800)]
Merge branch 'mlxsw-various-fixes'
Ido Schimmel says:
====================
mlxsw: Various fixes
This patchset contains various fixes for mlxsw.
Patch #1 fixes a race condition in a selftest. The race and fix are
explained in detail in the changelog.
Patch #2 re-adds a link mode that was wrongly removed, resulting in a
regression in some setups.
Patch #3 fixes a race condition in route installation with nexthop
objects.
Please consider patches #2 and #3 for stable.
====================
Link: https://lore.kernel.org/r/20210225165721.1322424-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 25 Feb 2021 16:57:21 +0000 (18:57 +0200)]
mlxsw: spectrum_router: Ignore routes using a deleted nexthop object
Routes are currently processed from a workqueue whereas nexthop objects
are processed in system call context. This can result in the driver not
finding a suitable nexthop group for a route and issuing a warning [1].
Fix this by ignoring such routes earlier in the process. The subsequent
deletion notification will be ignored as well.
[1]
WARNING: CPU: 2 PID: 7754 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4853 mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
[...]
CPU: 2 PID: 7754 Comm: kworker/u8:0 Not tainted 5.11.0-rc6-cq-
20210207-1 #16
Hardware name: Mellanox Technologies Ltd. MSN2100/SA001390, BIOS 5.6.5 05/24/2018
Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib_event_work [mlxsw_spectrum]
RIP: 0010:mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
Fixes: cdd6cfc54c64 ("mlxsw: spectrum_router: Allow programming routes with nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Alex Veber <alexve@nvidia.com>
Tested-by: Alex Veber <alexve@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danielle Ratson [Thu, 25 Feb 2021 16:57:20 +0000 (18:57 +0200)]
mlxsw: spectrum_ethtool: Add an external speed to PTYS register
Currently, only external bits are added to the PTYS register, whereas
there is one external bit that is wrongly marked as internal, and so was
recently removed from the register.
Add that bit to the PTYS register again, as this bit is no longer
internal.
Its removal resulted in '100000baseLR4_ER4/Full' link mode no longer
being supported, causing a regression on some setups.
Fixes: 5bf01b571cf4 ("mlxsw: spectrum_ethtool: Remove internal speeds from PTYS register")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Eddie Shklaer <eddies@nvidia.com>
Tested-by: Eddie Shklaer <eddies@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danielle Ratson [Thu, 25 Feb 2021 16:57:19 +0000 (18:57 +0200)]
selftests: forwarding: Fix race condition in mirror installation
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent, so that it is
always valid.
Fixes: b5b029399fa6d ("selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arjun Roy [Thu, 25 Feb 2021 23:26:28 +0000 (15:26 -0800)]
tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a
user-provided "len" field of type signed int, and then compare the
value to the result of an "offsetofend" operation, which is unsigned.
Negative values provided by the user will be promoted to large
positive numbers; thus checking that len < offsetofend() will return
false when the intention was that it return true.
Note that while len is originally checked for negative values earlier
on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the
value from userspace which may have changed in the meantime.
Therefore, re-add the check for negative values after the call to
get_user in the handler code for TCP_ZEROCOPY_RECEIVE.
Fixes: c8856c051454 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiko Thiery [Thu, 25 Feb 2021 21:15:16 +0000 (22:15 +0100)]
net: fec: ptp: avoid register access when ipg clock is disabled
When accessing the timecounter register on an i.MX8MQ the kernel hangs.
This is only the case when the interface is down. This can be reproduced
by reading with 'phc_ctrl eth0 get'.
Like described in the change in
91c0d987a9788dcc5fe26baafd73bf9242b68900
the igp clock is disabled when the interface is down and leads to a
system hang.
So we check if the ptp clock status before reading the timecounter
register.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20210225211514.9115-1-heiko.thiery@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Thu, 25 Feb 2021 14:57:27 +0000 (15:57 +0100)]
net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
When CONFIG_PM_SLEEP is disabled, the compiler warns about unused
functions:
drivers/net/phy/phy_device.c:273:12: error: unused function 'mdio_bus_phy_suspend' [-Werror,-Wunused-function]
static int mdio_bus_phy_suspend(struct device *dev)
drivers/net/phy/phy_device.c:293:12: error: unused function 'mdio_bus_phy_resume' [-Werror,-Wunused-function]
static int mdio_bus_phy_resume(struct device *dev)
The logic is intentional, so just mark these two as __maybe_unused
and remove the incorrect #ifdef.
Fixes: 4c0d2e96ba05 ("net: phy: consider that suspend2ram may cut off PHY power")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210225145748.404410-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DENG Qingfang [Fri, 26 Feb 2021 06:32:26 +0000 (14:32 +0800)]
net: dsa: mt7530: don't build GPIO support if !GPIOLIB
The new GPIO support may be optional at runtime, but it requires
building against gpiolib:
ERROR: modpost: "gpiochip_get_data" [drivers/net/dsa/mt7530.ko]
undefined!
ERROR: modpost: "devm_gpiochip_add_data_with_key"
[drivers/net/dsa/mt7530.ko] undefined!
Add #ifdef to exclude GPIO support if GPIOLIB is not enabled.
Fixes: 429a0edeefd8 ("net: dsa: mt7530: MT7530 optional GPIO support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Link: https://lore.kernel.org/r/20210226063226.8474-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Thu, 25 Feb 2021 14:38:32 +0000 (15:38 +0100)]
net: dsa: tag_ocelot_8021q: fix driver dependency
When the ocelot driver code is in a library, the dsa tag
code cannot be built-in:
ld.lld: error: undefined symbol: ocelot_can_inject
>>> referenced by tag_ocelot_8021q.c
>>> dsa/tag_ocelot_8021q.o:(ocelot_xmit) in archive net/built-in.a
ld.lld: error: undefined symbol: ocelot_port_inject_frame
>>> referenced by tag_ocelot_8021q.c
>>> dsa/tag_ocelot_8021q.o:(ocelot_xmit) in archive net/built-in.a
Building the tag support only really makes sense for compile-testing
when the driver is available, so add a Kconfig dependency that prevents
the broken configuration while allowing COMPILE_TEST alternative when
MSCC_OCELOT_SWITCH_LIB is disabled entirely. This case is handled
through the #ifdef check in include/soc/mscc/ocelot.h.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210225143910.3964364-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Thu, 25 Feb 2021 14:38:31 +0000 (15:38 +0100)]
net: mscc: ocelot: select NET_DEVLINK
Without this option, the driver fails to link:
ld.lld: error: undefined symbol: devlink_sb_register
>>> referenced by ocelot_devlink.c
>>> net/ethernet/mscc/ocelot_devlink.o:(ocelot_devlink_sb_register) in archive drivers/built-in.a
>>> referenced by ocelot_devlink.c
>>> net/ethernet/mscc/ocelot_devlink.o:(ocelot_devlink_sb_register) in archive drivers/built-in.a
Fixes: f59fd9cab730 ("net: mscc: ocelot: configure watermarks using devlink-sb")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210225143910.3964364-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Feb 2021 23:17:13 +0000 (15:17 -0800)]
Merge branch 'ethernet-fixes-for-stmmac-driver'
Joakim Zhang says:
====================
ethernet: fixes for stmmac driver
Fixes for stmmac driver.
---
ChangeLogs:
V1->V2:
* subject prefix: ethernet: stmmac: -> net: stmmac:
* use dma_addr_t instead of unsigned int for physical address
* use cpu_to_le32()
V2->V3:
* fix the build issue pointed out by kbuild bot.
* add error handling for stmmac_reinit_rx_buffers() function.
V3->V4:
* remove patch (net: stmmac: remove redundant null check for ptp clock),
reviewer thinks it should target net-next.
V4->V5:
* use %pad format to print dma_addr_t.
* extend dwmac4_display_ring() to support all descriptor types.
* while() -> do-while()
====================
Link: https://lore.kernel.org/r/20210225090114.17562-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 25 Feb 2021 09:01:14 +0000 (17:01 +0800)]
net: stmmac: re-init rx buffers when mac resume back
During suspend/resume stress test, we found descriptor write back by DMA
could exhibit unusual behavior, e.g.:
003 [0xc4310030]: 0x0 0x40 0x0 0xb5010040
We can see that desc3 write back is 0xb5010040, it is still ownd by DMA,
so application would not recycle this buffer. It will trigger fatal bus
error when DMA try to use this descriptor again. To fix this issue, we
should re-init all rx buffers when mac resume back.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 25 Feb 2021 09:01:13 +0000 (17:01 +0800)]
net: stmmac: fix wrongly set buffer2 valid when sph unsupport
In current driver, buffer2 available only when hardware supports split
header. Wrongly set buffer2 valid in stmmac_rx_refill when refill buffer
address. You can see that desc3 is 0x81000000 after initialization, but
turn out to be 0x83000000 after refill.
Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 25 Feb 2021 09:01:12 +0000 (17:01 +0800)]
net: stmmac: fix dma physical address of descriptor when display ring
Driver uses dma_alloc_coherent to allocate dma memory for descriptors,
dma_alloc_coherent will return both the virtual address and physical
address. AFAIK, virt_to_phys could not convert virtual address to
physical address, for which memory is allocated by dma_alloc_coherent.
dwmac4_display_ring() function is broken for various descriptor, it only
support normal descriptor(struct dma_desc) now, this patch also extends to
support all descriptor types.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 25 Feb 2021 09:01:11 +0000 (17:01 +0800)]
net: stmmac: fix watchdog timeout during suspend/resume stress test
stmmac_xmit() call stmmac_tx_timer_arm() at the end to modify tx timer to
do the transmission cleanup work. Imagine such a situation, stmmac enters
suspend immediately after tx timer modified, it's expire callback
stmmac_tx_clean() would not be invoked. This could affect BQL, since
netdev_tx_sent_queue() has been called, but netdev_tx_completed_queue()
have not been involved, as a result, dql_avail(&dev_queue->dql) finally
always return a negative value.
__dev_queue_xmit->__dev_xmit_skb->qdisc_run->__qdisc_run->qdisc_restart->dequeue_skb:
if ((q->flags & TCQ_F_ONETXQUEUE) &&
netif_xmit_frozen_or_stopped(txq)) // __QUEUE_STATE_STACK_XOFF is set
Net core will stop transmitting any more. Finillay, net watchdong would timeout.
To fix this issue, we should call netdev_tx_reset_queue() in stmmac_resume().
Fixes: 54139cf3bb33 ("net: stmmac: adding multiple buffers for rx")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 25 Feb 2021 09:01:10 +0000 (17:01 +0800)]
net: stmmac: stop each tx channel independently
If clear GMAC_CONFIG_TE bit, it would stop all tx channels, but users
may only want to stop specific tx channel.
Fixes: 48863ce5940f ("stmmac: add DMA support for GMAC 4.xx")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Feb 2021 21:17:43 +0000 (13:17 -0800)]
Merge tag 'wireless-drivers-2021-02-26' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.12
First set of fixes for v5.12. One iwlwifi kernel crash fix and smaller
fixes to multiple drivers.
ath9k
* fix Spatial Multiplexing Power Save (SMPS) handling to improve thoughtput
mt76
* error handling fixes
* memory leax fixes
iwlwifi
* don't crash during debug collection on DVM devices
MAINTAINERS
* email address update
ath11k
* fix GCC warning about DMA address debug messages
* fix regression which broke QCA6390 AP mode
* tag 'wireless-drivers-2021-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
mt76: mt7915: fix unused 'mode' variable
mt76: dma: do not report truncated frames to mac80211
mt76: mt7921: remove incorrect error handling
iwlwifi: pcie: fix iwl_so_trans_cfg link error when CONFIG_IWLMVM is disabled
ath11k: fix AP mode for QCA6390
ath11k: qmi: use %pad to format dma_addr_t
MAINTAINERS: update for mwifiex driver maintainers
iwlwifi: avoid crash on unsupported debug collection
mt76: mt7915: only modify tx buffer list after allocating tx token id
mt76: fix tx skb error handling in mt76_dma_tx_queue_skb
ath9k: fix transmitting to stations in dynamic SMPS mode
====================
Link: https://lore.kernel.org/r/20210226164411.CDD03C433CA@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Feb 2021 21:16:29 +0000 (13:16 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2021-02-26
1) Fix for bpf atomic insns with src_reg=r0, from Brendan.
2) Fix use after free due to bpf_prog_clone, from Cong.
3) Drop imprecise verifier log message, from Dmitrii.
4) Remove incorrect blank line in bpf helper description, from Hangbin.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: No need to drop the packet when there is no geneve opt
bpf: Remove blank line in bpf helper description comment
tools/resolve_btfids: Fix build error with older host toolchains
selftests/bpf: Fix a compiler warning in global func test
bpf: Drop imprecise log message
bpf: Clear percpu pointers in bpf_prog_clone_free()
bpf: Fix a warning message in mark_ptr_not_null_reg()
bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src
====================
Link: https://lore.kernel.org/r/20210226193737.57004-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Fri, 26 Feb 2021 14:21:27 +0000 (15:21 +0100)]
mt76: mt7915: fix unused 'mode' variable
clang points out a possible corner case in the mt7915_tm_set_tx_cont()
function if called with invalid arguments:
drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:593:2: warning: variable 'mode' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:597:13: note: uninitialized use occurs here
rateval = mode << 6 | rate_idx;
^~~~
drivers/net/wireless/mediatek/mt76/mt7915/testmode.c:506:37: note: initialize the variable 'mode' to silence this warning
u8 rate_idx = td->tx_rate_idx, mode;
^
Change it to return an error instead of continuing with invalid data
here.
Fixes: 3f0caa3cbf94 ("mt76: mt7915: add support for continuous tx in testmode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Lorenzo Bianconi [Sun, 7 Feb 2021 11:48:31 +0000 (12:48 +0100)]
mt76: dma: do not report truncated frames to mac80211
Commit
b102f0c522cf6 ("mt76: fix array overflow on receiving too many
fragments for a packet") fixes a possible OOB access but it introduces a
memory leak since the pending frame is not released to page_frag_cache
if the frag array of skb_shared_info is full. Commit
93a1d4791c10
("mt76: dma: fix a possible memory leak in mt76_add_fragment()") fixes
the issue but does not free the truncated skb that is forwarded to
mac80211 layer. Fix the leftover issue discarding even truncated skbs.
Fixes: 93a1d4791c10 ("mt76: dma: fix a possible memory leak in mt76_add_fragment()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/a03166fcc8214644333c68674a781836e0f57576.1612697217.git.lorenzo@kernel.org
Arnd Bergmann [Thu, 25 Feb 2021 14:59:15 +0000 (15:59 +0100)]
mt76: mt7921: remove incorrect error handling
Clang points out a mistake in the error handling in
mt7921_mcu_tx_rate_report(), which tries to dereference a pointer that
cannot be initialized because of the error that is being handled:
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c:409:3: warning: variable 'stats' is uninitialized when used here [-Wuninitialized]
stats->tx_rate = rate;
^~~~~
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c:401:32: note: initialize the variable 'stats' to silence this warning
struct mt7921_sta_stats *stats;
^
Just remove the obviously incorrect line.
Fixes: 1c099ab44727 ("mt76: mt7921: add MCU support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210225145953.404859-2-arnd@kernel.org
Kalle Valo [Thu, 25 Feb 2021 07:04:21 +0000 (09:04 +0200)]
iwlwifi: pcie: fix iwl_so_trans_cfg link error when CONFIG_IWLMVM is disabled
Randy reported an error on his randconfig builds:
ERROR: modpost: "iwl_so_trans_cfg" [drivers/net/wireless/intel/iwlwifi/iwlwifi.ko] undefined!
The problem was that when CONFIG_IWLMVM was disabled we were still accessing
iwl_so_trans_cfg. Fix it by moving IS_ENABLED() check before the access.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 930be4e76f26 ("iwlwifi: add support for SnJ with Jf devices")
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1614236661-20274-1-git-send-email-kvalo@codeaurora.org
Linus Torvalds [Thu, 25 Feb 2021 20:23:49 +0000 (12:23 -0800)]
Merge tag 'pwm/for-5.12-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"The ZTE ZX platform is being removed, so the PWM driver is no longer
needed and removed as well.
Other than that this contains a small set of fixes and cleanups across
a couple of drivers"
* tag 'pwm/for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: lpc18xx-sct: remove unneeded semicolon
pwm: iqs620a: Correct a stale state variable
pwm: iqs620a: Fix overflow and optimize calculations
pwm: rockchip: Enable clock before calling clk_get_rate()
pwm: rockchip: Eliminate potential race condition when probing
pwm: rockchip: Replace "bus clk" with "PWM clk"
pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
pwm: rockchip: Enable APB clock during register access while probing
pwm: Remove ZTE ZX driver
Linus Torvalds [Thu, 25 Feb 2021 20:21:08 +0000 (12:21 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- new vdpa features to allow creation and deletion of new devices
- virtio-blk support per-device queue depth
- fixes, cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits)
virtio-input: add multi-touch support
virtio_mmio: fix one typo
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
virtio_net: Fix fall-through warnings for Clang
virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT.
virtio-blk: support per-device queue depth
virtio_vdpa: don't warn when fail to disable vq
virtio-pci: introduce modern device module
virito-pci-modern: rename map_capability() to vp_modern_map_capability()
virtio-pci-modern: introduce helper to get notification offset
virtio-pci-modern: introduce helper for getting queue nums
virtio-pci-modern: introduce helper for setting/geting queue size
virtio-pci-modern: introduce helper to set/get queue_enable
virtio-pci-modern: introduce vp_modern_queue_address()
virtio-pci-modern: introduce vp_modern_set_queue_vector()
virtio-pci-modern: introduce vp_modern_generation()
virtio-pci-modern: introduce helpers for setting and getting features
virtio-pci-modern: introduce helpers for setting and getting status
virtio-pci-modern: introduce helper to set config vector
virtio-pci-modern: introduce vp_modern_remove()
...
Linus Torvalds [Thu, 25 Feb 2021 20:18:21 +0000 (12:18 -0800)]
Merge tag 'mips_5.12_1' of git://git./linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:
- added n64 block driver
- fix for ubsan warnings
- fix for bcm63xx platform
- update of linux-mips mailinglist
* tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
arch: mips: update references to current linux-mips list
mips: bmips: init clocks earlier
vmlinux.lds.h: catch even more instrumentation symbols into .data
n64: store dev instance into disk private data
n64: cleanup n64cart_probe()
n64: cosmetics changes
n64: remove curly brackets
n64: use sector SECTOR_SHIFT instead 512
n64: use enums for reg
n64: move module param at the top
n64: move module info at the end
n64: use pr_fmt to avoid duplicate string
block: Add n64 cart driver
Linus Torvalds [Thu, 25 Feb 2021 20:10:22 +0000 (12:10 -0800)]
Merge tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Dave Airlie:
"This is mostly fixes but I missed msm-next pull last week. It's been
in drm-next.
Otherwise it's a selection of i915, amdgpu and misc fixes, one TTM
memory leak, nothing really major stands out otherwise.
core:
- vblank fence timing improvements
dma-buf:
- improve error handling
ttm:
- memory leak fix
msm:
- a6xx speedbin support
- a508, a509, a512 support
- various a5xx fixes
- various dpu fixes
- qseed3lite support for sm8250
- dsi fix for msm8994
- mdp5 fix for framerate bug with cmd mode panels
- a6xx GMU OOB race fixes that were showing up in CI
- various addition and removal of semicolons
- gem submit fix for legacy userspace relocs path
amdgpu:
- clang warning fix
- S0ix platform shutdown/poweroff fix
- misc display fixes
i915:
- color format fix
- -Wuninitialised reenabled
- GVT ww locking, cmd parser fixes
atyfb:
- fix build
rockchip:
- AFBC modifier fix"
* tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drm: (60 commits)
drm/panel: kd35t133: allow using non-continuous dsi clock
drm/rockchip: Require the YTR modifier for AFBC
drm/ttm: Fix a memory leak
drm/drm_vblank: set the dma-fence timestamp during send_vblank_event
dma-fence: allow signaling drivers to set fence timestamp
dma-buf: heaps: Rework heap allocation hooks to return struct dma_buf instead of fd
dma-buf: system_heap: Make sure to return an error if we abort
drm/amd/display: Fix system hang after multiple hotplugs (v3)
drm/amdgpu: fix shutdown and poweroff process failed with s0ix
drm/i915: Nuke INTEL_OUTPUT_FORMAT_INVALID
drm/i915: Enable -Wuninitialized
drm/amd/display: Remove Assert from dcn10_get_dig_frontend
drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
Revert "drm/amd/display: reuse current context instead of recreating one"
drm/amd/pm/swsmu: Avoid using structure_size uninitialized in smu_cmn_init_soft_gpu_metrics
fbdev: atyfb: add stubs for aty_{ld,st}_lcd()
drm/i915/gvt: Introduce per object locking in GVT scheduler.
drm/i915/gvt: Purge dev_priv->gt
drm/i915/gvt: Parse default state to update reg whitelist
dt-bindings: dp-connector: Drop maxItems from -supply
...
Linus Torvalds [Thu, 25 Feb 2021 20:06:25 +0000 (12:06 -0800)]
Merge tag 'net-5.12-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Rather small batch this time.
Current release - regressions:
- bcm63xx_enet: fix sporadic kernel panic due to queue length
mis-accounting
Current release - new code bugs:
- bcm4908_enet: fix RX path possible mem leak
- bcm4908_enet: fix NAPI poll returned value
- stmmac: fix missing spin_lock_init in visconti_eth_dwmac_probe()
- sched: cls_flower: validate ct_state for invalid and reply flags
Previous releases - regressions:
- net: introduce CAN specific pointer in the struct net_device to
prevent mis-interpreting memory
- phy: micrel: set soft_reset callback to genphy_soft_reset for
KSZ8081
- psample: fix netlink skb length with tunnel info
Previous releases - always broken:
- icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
- wireguard: device: do not generate ICMP for non-IP packets
- mptcp: provide subflow aware release function to avoid a mem leak
- hsr: add support for EntryForgetTime
- r8169: fix jumbo packet handling on RTL8168e
- octeontx2-af: fix an off by one in rvu_dbg_qsize_write()
- i40e: fix flow for IPv6 next header (extension header)
- phy: icplus: call phy_restore_page() when phy_select_page() fails
- dpaa_eth: fix the access method for the dpaa_napi_portal"
* tag 'net-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
r8169: fix jumbo packet handling on RTL8168e
net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081
net: psample: Fix netlink skb length with tunnel info
net: broadcom: bcm4908_enet: fix NAPI poll returned value
net: broadcom: bcm4908_enet: fix RX path possible mem leak
net: hsr: add support for EntryForgetTime
net: dsa: sja1105: Remove unneeded cast in sja1105_crc32()
ibmvnic: fix a race between open and reset
net: stmmac: Fix missing spin_lock_init in visconti_eth_dwmac_probe()
net: introduce CAN specific pointer in the struct net_device
net: usb: qmi_wwan: support ZTE P685M modem
wireguard: kconfig: use arm chacha even with no neon
wireguard: queueing: get rid of per-peer ring buffers
wireguard: device: do not generate ICMP for non-IP packets
wireguard: peer: put frequently used members above cache lines
wireguard: selftests: test multiple parallel streams
wireguard: socket: remove bogus __be32 annotation
wireguard: avoid double unlikely() notation when using IS_ERR()
net: qrtr: Fix memory leak in qrtr_tun_open
vxlan: move debug check after netdev unregister
...
Linus Torvalds [Thu, 25 Feb 2021 20:03:13 +0000 (12:03 -0800)]
Merge tag 'acpi-5.12-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These make additional changes to the platform profile interface merged
recently and add support for the FPDT ACPI table.
Specifics:
- Rearrange Kconfig handling of ACPI_PLATFORM_PROFILE, add
"balanced-performance" to the list of supported platform profiles
and fix up some file references in a comment (Maximilian Luz).
- Add support for parsing the ACPI Firmware Performance Data Table
(FPDT) and exposing the data from there via sysfs (Zhang Rui)"
* tag 'acpi-5.12-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: platform: Add balanced-performance platform profile
ACPI: platform: Fix file references in comment
ACPI: platform: Hide ACPI_PLATFORM_PROFILE option
ACPI: tables: introduce support for FPDT table
Dave Airlie [Thu, 25 Feb 2021 19:07:24 +0000 (05:07 +1000)]
Merge tag 'drm-intel-next-fixes-2021-02-25' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
A fix for color format check from Ville, plus the re-enable of -Wuninitialized
from Nathan, and the GVT fixes including fixes for ww locking, cmd parser and
a general cleanup of dev_priv->gt.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YDe3pBPV5Kx3hpk6@intel.com
Dave Airlie [Thu, 25 Feb 2021 19:05:39 +0000 (05:05 +1000)]
Merge tag 'amd-drm-fixes-5.12-2021-02-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-5.12-2021-02-24:
amdgpu:
- Clang warning fix
- S0ix platform shutdown/poweroff fix
- Misc display fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210225043853.3880-1-alexander.deucher@amd.com
Dave Airlie [Thu, 25 Feb 2021 18:42:51 +0000 (04:42 +1000)]
Merge tag 'drm-misc-next-fixes-2021-02-25' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next tasty fixes for v5.12:
- Cherry pick of drm-misc-fixes pull:
"here's this week's PR for drm-misc-fixes. One of the patches is a memory
leak; the rest is for hardware issues."
- Fix dt bindings for dp connector.
- Fix build error in atyfb.
- Improve error handling for dma-buf heaps.
- Make vblank timestamp more correct, by recording timestamp to be set when signaling.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0f60a68c-d562-7266-0815-ea75ff680b17@linux.intel.com
Linus Torvalds [Thu, 25 Feb 2021 18:17:31 +0000 (10:17 -0800)]
Merge tag 'kbuild-v5.12' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Fix false-positive build warnings for ARCH=ia64 builds
- Optimize dictionary size for module compression with xz
- Check the compiler and linker versions in Kconfig
- Fix misuse of extra-y
- Support DWARF v5 debug info
- Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
exceeded the limit
- Add generic syscall{tbl,hdr}.sh for cleanups across arches
- Minor cleanups of genksyms
- Minor cleanups of Kconfig
* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
kbuild: remove deprecated 'always' and 'hostprogs-y/m'
kbuild: parse C= and M= before changing the working directory
kbuild: reuse this-makefile to define abs_srctree
kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
kconfig: omit --oldaskconfig option for 'make config'
kconfig: fix 'invalid option' for help option
kconfig: remove dead code in conf_askvalue()
kconfig: clean up nested if-conditionals in check_conf()
kconfig: Remove duplicate call to sym_get_string_value()
Makefile: Remove # characters from compiler string
Makefile: reuse CC_VERSION_TEXT
kbuild: check the minimum linker version in Kconfig
kbuild: remove ld-version macro
scripts: add generic syscallhdr.sh
scripts: add generic syscalltbl.sh
arch: syscalls: remove $(srctree)/ prefix from syscall tables
arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
gen_compile_commands: prune some directories
kbuild: simplify access to the kernel's version
...
Linus Torvalds [Thu, 25 Feb 2021 18:06:55 +0000 (10:06 -0800)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Miscellaneous ext4 cleanups and bug fixes. Pretty boring this cycle..."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add .kunitconfig fragment to enable ext4-specific tests
ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
ext4: reset retry counter when ext4_alloc_file_blocks() makes progress
ext4: fix potential htree index checksum corruption
ext4: factor out htree rep invariant check
ext4: Change list_for_each* to list_for_each_entry*
ext4: don't try to processed freed blocks until mballoc is initialized
ext4: use DEFINE_MUTEX() for mutex lock
Rafael J. Wysocki [Thu, 25 Feb 2021 17:57:40 +0000 (18:57 +0100)]
Merge branch 'acpi-tables'
* acpi-tables:
ACPI: tables: introduce support for FPDT table
Linus Torvalds [Thu, 25 Feb 2021 17:56:08 +0000 (09:56 -0800)]
Merge tag 'pci-v5.12-changes' of git://git./linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Remove unnecessary locking around _OSC (Bjorn Helgaas)
- Clarify message about _OSC failure (Bjorn Helgaas)
- Remove notification of PCIe bandwidth changes (Bjorn Helgaas)
- Tidy checking of syscall user config accessors (Heiner Kallweit)
Resource management:
- Decline to resize resources if boot config must be preserved (Ard
Biesheuvel)
- Fix pci_register_io_range() memory leak (Geert Uytterhoeven)
Error handling (Keith Busch):
- Clear error status from the correct device
- Retain error recovery status so drivers can use it after reset
- Log the type of Port (Root or Switch Downstream) that we reset
- Always request a reset for Downstream Ports in frozen state
Endpoint framework and NTB (Kishon Vijay Abraham I):
- Make *_get_first_free_bar() take into account 64 bit BAR
- Add helper API to get the 'next' unreserved BAR
- Make *_free_bar() return error codes on failure
- Remove unused pci_epf_match_device()
- Add support to associate secondary EPC with EPF
- Add support in configfs to associate two EPCs with EPF
- Add pci_epc_ops to map MSI IRQ
- Add pci_epf_ops to expose function-specific attrs
- Allow user to create sub-directory of 'EPF Device' directory
- Implement ->msi_map_irq() ops for cadence
- Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence
- Add EP function driver to provide NTB functionality
- Add support for EPF PCI Non-Transparent Bridge
- Add specification for PCI NTB function device
- Add PCI endpoint NTB function user guide
- Add configfs binding documentation for pci-ntb endpoint function
Broadcom STB PCIe controller driver:
- Add support for BCM4908 and external PERST# signal controller
(Rafał Miłecki)
Cadence PCIe controller driver:
- Retrain Link to work around Gen2 training defect (Nadeem Athani)
- Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof
Wilczyński)
Freescale Layerscape PCIe controller driver:
- Add LX2160A rev2 EP mode support (Hou Zhiqiang)
- Convert to builtin_platform_driver() (Michael Walle)
MediaTek PCIe controller driver:
- Fix OF node reference leak (Krzysztof Wilczyński)
Microchip PolarFlare PCIe controller driver:
- Add Microchip PolarFire PCIe controller driver (Daire McNamara)
Qualcomm PCIe controller driver:
- Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith)
- Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov)
Renesas R-Car PCIe controller driver:
- Drop PCIE_RCAR config option (Lad Prabhakar)
- Always allocate MSI addresses in 32bit space (Marek Vasut)
Rockchip PCIe controller driver:
- Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai)
- Make 'ep-gpios' DT property optional (Chen-Yu Tsai)
Synopsys DesignWare PCIe controller driver:
- Work around ECRC configuration hardware defect (Vidya Sagar)
- Drop support for config space in DT 'ranges' (Rob Herring)
- Change size to u64 for EP outbound iATU (Shradha Todi)
- Add upper limit address for outbound iATU (Shradha Todi)
- Make dw_pcie ops optional (Jisheng Zhang)
- Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang)
Xilinx Versal CPM PCIe controller driver:
- Fix OF node reference leak (Pan Bian)
Miscellaneous:
- Remove tango host controller driver (Arnd Bergmann)
- Remove IRQ handler & data together (altera-msi, brcmstb, dwc)
(Martin Kaiser)
- Fix xgene-msi race in installing chained IRQ handler (Martin
Kaiser)
- Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He)
- Fix pci-bridge-emul array overruns (Russell King)
- Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej
Siewior)"
* tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits)
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: qcom: Add support for ddrss_sf_tbu clock
dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250
PCI: al: Remove useless dw_pcie_ops
PCI: dwc: Don't assume the ops in dw_pcie always exist
PCI: dwc: Add upper limit address for outbound iATU
PCI: dwc: Change size to u64 for EP outbound iATU
PCI: dwc: Drop support for config space in 'ranges'
PCI: layerscape: Convert to builtin_platform_driver()
PCI: layerscape: Add LX2160A rev2 EP mode support
dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings
PCI: dwc: Work around ECRC configuration issue
PCI/portdrv: Report reset for frozen channel
PCI/AER: Specify the type of Port that was reset
PCI/ERR: Retain status from error notification
PCI/AER: Clear AER status from Root Port when resetting Downstream Port
PCI/ERR: Clear status of the reporting device
dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B
PCI: rockchip: Make 'ep-gpios' DT property optional
Documentation: PCI: Add PCI endpoint NTB function user guide
...
Heiner Kallweit [Thu, 25 Feb 2021 15:05:19 +0000 (16:05 +0100)]
r8169: fix jumbo packet handling on RTL8168e
Josef reported [0] that using jumbo packets fails on RTL8168e.
Aligning the values for register MaxTxPacketSize with the
vendor driver fixes the problem.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=211827
Fixes: d58d46b5d851 ("r8169: jumbo fixes.")
Reported-by: Josef Oškera <joskera@redhat.com>
Tested-by: Josef Oškera <joskera@redhat.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/b15ddef7-0d50-4320-18f4-6a3f86fbfd3e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Melki [Wed, 24 Feb 2021 20:55:36 +0000 (21:55 +0100)]
net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081
Following a similar reinstate for the KSZ9031.
Older kernels would use the genphy_soft_reset if the PHY did not implement
a .soft_reset.
Bluntly removing that default may expose a lot of situations where various
PHYs/board implementations won't recover on various changes.
Like with this implementation during a 4.9.x to 5.4.x LTS transition.
I think it's a good thing to remove unwanted soft resets but wonder if it
did open a can of worms?
Atleast this fixes one iMX6 FEC/RMII/8081 combo.
Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Christian Melki <christian.melki@t2data.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210224205536.9349-1-christian.melki@t2data.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 25 Feb 2021 17:50:36 +0000 (09:50 -0800)]
Merge tag 'nds32-for-linux-5.12' of git://git./linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
"Code clean-up and refinement"
* tag 'nds32-for-linux-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: Fix bogus reference to <asm/procinfo.h>
nds32: use get_kernel_nofault in dump_mem
nds32: remove dump_instr
nds32: configs: Cleanup CONFIG_CROSS_COMPILE
nds32: Replace <linux/clk-provider.h> by <linux/of_clk.h>
Chris Mi [Thu, 25 Feb 2021 07:51:45 +0000 (15:51 +0800)]
net: psample: Fix netlink skb length with tunnel info
Currently, the psample netlink skb is allocated with a size that does
not account for the nested 'PSAMPLE_ATTR_TUNNEL' attribute and the
padding required for the 64-bit attribute 'PSAMPLE_TUNNEL_KEY_ATTR_ID'.
This can result in failure to add attributes to the netlink skb due
to insufficient tail room. The following error message is printed to
the kernel log: "Could not create psample log message".
Fix this by adjusting the allocation size to take into account the
nested attribute and the padding.
Fixes: d8bed686ab96 ("net: psample: Add tunnel support")
CC: Yotam Gigi <yotam.gi@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Link: https://lore.kernel.org/r/20210225075145.184314-1-cmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Wed, 24 Feb 2021 15:18:42 +0000 (16:18 +0100)]
net: broadcom: bcm4908_enet: fix NAPI poll returned value
Missing increment was resulting in poll function always returning 0
instead of amount of processed packets.
Fixes: 4feffeadbcb2 ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210224151842.2419-2-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Wed, 24 Feb 2021 15:18:41 +0000 (16:18 +0100)]
net: broadcom: bcm4908_enet: fix RX path possible mem leak
After filling RX ring slot with new skb it's required to free old skb.
Immediately on error or later in the net subsystem.
Fixes: 4feffeadbcb2 ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210224151842.2419-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marco Wenzel [Wed, 24 Feb 2021 09:46:49 +0000 (10:46 +0100)]
net: hsr: add support for EntryForgetTime
In IEC 62439-3 EntryForgetTime is defined with a value of 400 ms. When a
node does not send any frame within this time, the sequence number check
for can be ignored. This solves communication issues with Cisco IE 2000
in Redbox mode.
Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Marco Wenzel <marco.wenzel@a-eberle.de>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Tested-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210224094653.1440-1-marco.wenzel@a-eberle.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven [Tue, 23 Feb 2021 11:20:03 +0000 (12:20 +0100)]
net: dsa: sja1105: Remove unneeded cast in sja1105_crc32()
sja1105_unpack() takes a "const void *buf" as its first parameter, so
there is no need to cast away the "const" of the "buf" variable before
calling it.
Drop the cast, as it prevents the compiler performing some checks.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20210223112003.2223332-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiko Stuebner [Sat, 6 Feb 2021 13:50:20 +0000 (14:50 +0100)]
drm/panel: kd35t133: allow using non-continuous dsi clock
The panel is able to work when dsi clock is non-continuous, thus
the system power consumption can be reduced using such feature.
Add MIPI_DSI_CLOCK_NON_CONTINUOUS to panel's mode_flags.
Also the flag actually becomes necessary after
commit
c6d94e37bdbb ("drm/bridge/synopsys: dsi: add support for non-continuous HS clock")
and without it the panel only emits stripes instead of output.
Fixes: c6d94e37bdbb ("drm/bridge/synopsys: dsi: add support for non-continuous HS clock")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Christopher Morgan <macromorgan@hotmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210206135020.1991820-1-heiko@sntech.de
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Alyssa Rosenzweig [Tue, 11 Aug 2020 20:26:31 +0000 (16:26 -0400)]
drm/rockchip: Require the YTR modifier for AFBC
The AFBC decoder used in the Rockchip VOP assumes the use of the
YUV-like colourspace transform (YTR). YTR is lossless for RGB(A)
buffers, which covers the RGBA8 and RGB565 formats supported in
vop_convert_afbc_format. Use of YTR is signaled with the
AFBC_FORMAT_MOD_YTR modifier, which prior to this commit was missing. As
such, a producer would have to generate buffers that do not use YTR,
which the VOP would erroneously decode as YTR, leading to severe visual
corruption.
The upstream AFBC support was developed against a captured frame, which
failed to exercise modifier support. Prior to bring-up of AFBC in Mesa
(in the Panfrost driver), no open userspace respected modifier
reporting. As such, this change is not expected to affect broken
userspaces.
Tested on RK3399 with Panfrost and Weston.
Fixes: 7707f7227f09 ("drm/rockchip: Add support for afbc")
Cc: stable@vger.kernel.org
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811202631.3603-1-alyssa.rosenzweig@collabora.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
xinhui pan [Fri, 19 Feb 2021 04:25:47 +0000 (12:25 +0800)]
drm/ttm: Fix a memory leak
Free the memory on failure.
Also no need to re-alloc memory on retry.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210219042547.44855-1-xinhui.pan@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.11
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Geert Uytterhoeven [Thu, 27 Aug 2020 13:24:34 +0000 (15:24 +0200)]
nds32: Fix bogus reference to <asm/procinfo.h>
Andestech(nds32) never had <asm/procinfo.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
Christoph Hellwig [Mon, 20 Jul 2020 11:44:48 +0000 (13:44 +0200)]
nds32: use get_kernel_nofault in dump_mem
Use the proper get_kernel_nofault helper to access an unsafe kernel
pointer without faulting instead of playing with set_fs and get_user.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nick Hu <nickhu@andestech.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
Christoph Hellwig [Mon, 20 Jul 2020 11:44:47 +0000 (13:44 +0200)]
nds32: remove dump_instr
dump_inst has a return before actually doing anything, so just drop the
whole thing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nick Hu <nickhu@andestech.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
Krzysztof Kozlowski [Mon, 17 Feb 2020 16:49:18 +0000 (17:49 +0100)]
nds32: configs: Cleanup CONFIG_CROSS_COMPILE
CONFIG_CROSS_COMPILE is gone since commit
f1089c92da79 ("kbuild: remove
CONFIG_CROSS_COMPILE support").
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
Geert Uytterhoeven [Wed, 12 Feb 2020 10:16:51 +0000 (11:16 +0100)]
nds32: Replace <linux/clk-provider.h> by <linux/of_clk.h>
The Andes platform code is not a clock provider, and just needs to call
of_clk_init().
Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greentime Hu <green.hu@gmail.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greentime Hu <green.hu@gmail.com>
Linus Torvalds [Thu, 25 Feb 2021 00:32:23 +0000 (16:32 -0800)]
Merge tag 'x86-entry-2021-02-24' of git://git./linux/kernel/git/tip/tip
Pull x86 irq entry updates from Thomas Gleixner:
"The irq stack switching was moved out of the ASM entry code in course
of the entry code consolidation. It ended up being suboptimal in
various ways.
This reworks the X86 irq stack handling:
- Make the stack switching inline so the stackpointer manipulation is
not longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
confused about the stack pointer manipulation"
* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix stack-swizzle for FRAME_POINTER=y
um: Enforce the usage of asm-generic/softirq_stack.h
x86/softirq/64: Inline do_softirq_own_stack()
softirq: Move do_softirq_own_stack() to generic asm header
softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
x86/softirq: Remove indirection in do_softirq_own_stack()
x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
x86/entry: Convert device interrupts to inline stack switching
x86/entry: Convert system vectors to irq stack macro
x86/irq: Provide macro for inlining irq stack switching
x86/apic: Split out spurious handling code
x86/irq/64: Adjust the per CPU irq stack pointer by 8
x86/irq: Sanitize irq stack tracking
x86/entry: Fix instrumentation annotation
Linus Torvalds [Thu, 25 Feb 2021 00:20:38 +0000 (16:20 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
"A few small subsystems and some of MM.
172 patches.
Subsystems affected by this patch series: hexagon, scripts, ntfs,
ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap,
memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan,
pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
mempolicy, oom-kill, hugetlbfs, and migration)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits)
mm/migrate: remove unneeded semicolons
hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
hugetlbfs: fix some comment typos
hugetlbfs: correct some obsolete comments about inode i_mutex
hugetlbfs: make hugepage size conversion more readable
hugetlbfs: remove meaningless variable avoid_reserve
hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
hugetlbfs: remove special hugetlbfs_set_page_dirty()
mm/hugetlb: change hugetlb_reserve_pages() to type bool
mm, oom: fix a comment in dump_task()
mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
numa balancing: migrate on fault among multiple bound nodes
mm, compaction: make fast_isolate_freepages() stay within zone
mm/compaction: fix misbehaviors of fast_find_migrateblock()
mm/compaction: correct deferral logic for proactive compaction
mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
mm/compaction: remove rcu_read_lock during page compaction
z3fold: simplify the zhdr initialization code in init_z3fold_page()
...
Sukadev Bhattiprolu [Wed, 24 Feb 2021 05:02:29 +0000 (21:02 -0800)]
ibmvnic: fix a race between open and reset
__ibmvnic_reset() currently reads the adapter->state before getting the
rtnl and saves that state as the "target state" for the reset. If this
read occurs when adapter is in PROBED state, the target state would be
PROBED.
Just after the target state is saved, and before the actual reset process
is started (i.e before rtnl is acquired) if we get an ibmvnic_open() call
we would move the adapter to OPEN state.
But when the reset is processed (after ibmvnic_open()) drops the rtnl),
it will leave the adapter in PROBED state even though we already moved
it to OPEN.
To fix this, use the RTNL to improve serialization when reading/updating
the adapter state. i.e determine the target state of a reset only after
getting the RTNL. And if a reset is in progress during an open, simply
set the target state of the adapter and let the reset code finish the
open (like we currently do if failover is pending).
One twist to this serialization is if the adapter state changes when we
drop the RTNL to update the link state. Account for this by checking if
there was an intervening open and update the target state for the reset
accordingly (see new comments in the code). Note that only the reset
functions and ibmvnic_open() can set the adapter to OPEN state and this
must happen under rtnl.
Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20210224050229.1155468-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Yongjun [Tue, 23 Feb 2021 10:48:03 +0000 (10:48 +0000)]
net: stmmac: Fix missing spin_lock_init in visconti_eth_dwmac_probe()
The driver allocates the spinlock but not initialize it.
Use spin_lock_init() on it to initialize it correctly.
Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://lore.kernel.org/r/20210223104803.4047281-1-weiyongjun1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Wed, 24 Feb 2021 23:27:29 +0000 (09:27 +1000)]
Merge tag 'drm-msm-next-2021-02-07' of https://gitlab.freedesktop.org/drm/msm into drm-next
* a6xx speedbin support
* a508, a509, a512 support
* various a5xx fixes
* various dpu fixes
* qseed3lite support for sm8250
* dsi fix for msm8994
* mdp5 fix for framerate bug with cmd mode panels
* a6xx GMU OOB race fixes that were showing up in CI
* various addition and removal of semicolons
* gem submit fix for legacy userspace relocs path
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvh3tvLz_xtk=4x9xUfo2h2s4xkniOvC7HyLO2jrXnXkw@mail.gmail.com
Oleksij Rempel [Tue, 23 Feb 2021 07:01:26 +0000 (08:01 +0100)]
net: introduce CAN specific pointer in the struct net_device
Since
20dd3850bcf8 ("can: Speed up CAN frame receiption by using
ml_priv") the CAN framework uses per device specific data in the AF_CAN
protocol. For this purpose the struct net_device->ml_priv is used. Later
the ml_priv usage in CAN was extended for other users, one of them being
CAN_J1939.
Later in the kernel ml_priv was converted to an union, used by other
drivers. E.g. the tun driver started storing it's stats pointer.
Since tun devices can claim to be a CAN device, CAN specific protocols
will wrongly interpret this pointer, which will cause system crashes.
Mostly this issue is visible in the CAN_J1939 stack.
To fix this issue, we request a dedicated CAN pointer within the
net_device struct.
Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com
Fixes: 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv")
Fixes: ffd956eef69b ("can: introduce CAN midlayer private and allocate it automatically")
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Fixes: 497a5757ce4e ("tun: switch to net core provided statistics counters")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chengyang Fan [Wed, 24 Feb 2021 20:10:28 +0000 (12:10 -0800)]
mm/migrate: remove unneeded semicolons
Remove superfluous semicolons after function definitions.
Link: https://lkml.kernel.org/r/20210115110131.2359683-1-cy.fan@huawei.com
Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:25 +0000 (12:10 -0800)]
hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
The function hugetlb_vmtruncate() is guaranteed to always success since
commit
7aa91e104028 ("hugetlb: allow extending ftruncate on hugetlbfs").
So we should remove the unneeded return value which is always 0.
Link: https://lkml.kernel.org/r/20210208084637.47789-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:21 +0000 (12:10 -0800)]
hugetlbfs: fix some comment typos
Fix typos reserv to reserve, minimim to minimum. No functional change
intended.
Link: https://lkml.kernel.org/r/20210130092351.28072-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:18 +0000 (12:10 -0800)]
hugetlbfs: correct some obsolete comments about inode i_mutex
Since commit
9902af79c01a ("parallel lookups: actual switch to rwsem"),
i_mutex of inode is converted to i_rwsem. So replace i_mutex with i_rwsem
to make comments up to date.
Link: https://lkml.kernel.org/r/20210127093111.36672-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:14 +0000 (12:10 -0800)]
hugetlbfs: make hugepage size conversion more readable
The calculation 1U << (h->order + PAGE_SHIFT - 10) is actually equal to
(PAGE_SHIFT << (h->order)) >> 10. So we can make it more readable by
replace it with huge_page_size(h) >> 10.
Link: https://lkml.kernel.org/r/20210122083141.24548-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:11 +0000 (12:10 -0800)]
hugetlbfs: remove meaningless variable avoid_reserve
The variable avoid_reserve is meaningless because we never changed its
value and just passed it to alloc_huge_page(). So remove it to make code
more clear that in hugetlbfs_fallocate, we never avoid reserve when alloc
hugepage yet. Also add a comment offered by Mike Kravetz to explain this.
Link: https://lkml.kernel.org/r/20210120071508.9078-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:08 +0000 (12:10 -0800)]
hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
Since commit
36e789144267 ("kill do_generic_mapping_read"), the function
do_generic_mapping_read() is renamed to do_generic_file_read(). And then
commit
47c27bc46946 ("fs: pass iocb to do_generic_file_read") renamed it
to generic_file_buffered_read(). So replace do_generic_mapping_read() with
generic_file_buffered_read() to keep comment uptodate.
Link: https://lkml.kernel.org/r/20210118063210.47118-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:04 +0000 (12:10 -0800)]
hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
Since commit
e5ff215941d5 ("hugetlb: multiple hstates for multiple page
sizes"), we can use macro default_hstate to get the struct hstate which we
use by default. But init_hugetlbfs_fs() forgot to use it.
Link: https://lkml.kernel.org/r/20210116091827.20982-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Wed, 24 Feb 2021 20:10:01 +0000 (12:10 -0800)]
hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
When we reach here with inode = NULL, we should have crashed as inode has
already been dereferenced via hstate_inode. So this BUG_ON(!inode) does
not take effect and should be removed.
Link: https://lkml.kernel.org/r/20210118110700.52506-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Wed, 24 Feb 2021 20:09:58 +0000 (12:09 -0800)]
hugetlbfs: remove special hugetlbfs_set_page_dirty()
Matthew Wilcox noticed that hugetlbfs_set_page_dirty always returns 0.
Instead, it should return 1 or 0 depending on the previous state of the
dirty bit. In addition, the call to compound_head is redundant as it is
also performed in calling routine set_page_dirty.
Replace the hugetlbfs specific routine hugetlbfs_set_page_dirty with
__set_page_dirty_no_writeback as it addresses both of these issues.
Link: https://lkml.kernel.org/r/20201221192542.15732-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Wed, 24 Feb 2021 20:09:54 +0000 (12:09 -0800)]
mm/hugetlb: change hugetlb_reserve_pages() to type bool
While reviewing a bug in hugetlb_reserve_pages, it was noticed that all
callers ignore the return value. Any failure is considered an ENOMEM
error by the callers.
Change the function to be of type bool. The function will return true if
the reservation was successful, false otherwise. Callers currently assume
a zero return code indicates success. Change the callers to look for true
to indicate success. No functional change, only code cleanup.
Link: https://lkml.kernel.org/r/20201221192542.15732-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tang Yizhou [Wed, 24 Feb 2021 20:09:50 +0000 (12:09 -0800)]
mm, oom: fix a comment in dump_task()
If p is a kthread, it will be checked in oom_unkillable_task() so
we can delete the corresponding comment.
Link: https://lkml.kernel.org/r/20210125133006.7242-1-tangyizhou@huawei.com
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>