Johannes Berg [Mon, 29 Nov 2021 13:32:39 +0000 (15:32 +0200)]
mac80211: mark TX-during-stop for TX in in_reconfig
Mark TXQs as having seen transmit while they were stopped if
we bail out of drv_wake_tx_queue() due to reconfig, so that
the queue wake after this will make them catch up. This is
particularly necessary for when TXQs are used for management
packets since those TXQs won't see a lot of traffic that'd
make them catch up later.
Cc: stable@vger.kernel.org
Fixes: 4856bfd23098 ("mac80211: do not call driver wake_tx_queue op during reconfig")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.4573a221c0e1.I0d1d5daea3089be3fc0dccc92991b0f8c5677f0c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mordechay Goodstein [Mon, 29 Nov 2021 13:32:42 +0000 (15:32 +0200)]
mac80211: update channel context before station state
Currently channel context is updated only after station got an update about
new assoc state, this results in station using the old channel context.
Fix this by moving the update channel context before updating station,
enabling the driver to immediately use the updated channel context in
the new assoc state.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.1c80c17ffd8a.I94ae31378b363c1182cfdca46c4b7e7165cff984@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ilan Peer [Mon, 29 Nov 2021 13:32:45 +0000 (15:32 +0200)]
mac80211: Fix the size used for building probe request
Instead of using the hard-coded value of '100' use the correct
scan IEs length as calculated during HW registration to mac80211.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.0a82d6891719.I8ded1f2e0bccb9e71222c945666bcd86537f2e35@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 29 Nov 2021 13:32:46 +0000 (15:32 +0200)]
mac80211: fix lookup when adding AddBA extension element
We should be doing the HE capabilities lookup based on the full
interface type so if P2P doesn't have HE but client has it doesn't
get confused. Fix that.
Fixes: 2ab45876756f ("mac80211: add support for the ADDBA extension element")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.010fc1d61137.If3a468145f29d670cb00a693bed559d8290ba693@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Sat, 11 Dec 2021 19:10:24 +0000 (20:10 +0100)]
mac80211: validate extended element ID is present
Before attempting to parse an extended element, verify that
the extended element ID is present.
Fixes: 41cbb0f5a295 ("mac80211: add support for HE")
Reported-by: syzbot+59bdff68edce82e393b6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20211211201023.f30a1b128c07.I5cacc176da94ba316877c6e10fe3ceec8b4dbd7d@changeid
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ilan Peer [Thu, 2 Dec 2021 13:28:54 +0000 (15:28 +0200)]
cfg80211: Acquire wiphy mutex on regulatory work
The function cfg80211_reg_can_beacon_relax() expects wiphy
mutex to be held when it is being called. However, when
reg_leave_invalid_chans() is called the mutex is not held.
Fix it by acquiring the lock before calling the function.
Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211202152831.527686cda037.I40ad9372a47cbad53b4aae7b5a6ccc0dc3fddf8b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Thu, 2 Dec 2021 13:26:25 +0000 (15:26 +0200)]
mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock
When we call ieee80211_agg_start_txq(), that will in turn call
schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
this is done under sta->lock, which leads to certain circular
lock dependencies, as reported by Chris Murphy:
https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
In general, ieee80211_agg_start_txq() is usually not called
with sta->lock held, only in this one place. But it's always
called with sta->ampdu_mlme.mtx held, and that's therefore
clearly sufficient.
Change ieee80211_stop_tx_ba_cb() to also call it without the
sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
(which is only called in this one place).
This breaks the locking chain and makes it less likely that
we'll have similar locking chain problems in the future.
Fixes: ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211202152554.f519884c8784.I555fef8e67d93fff3d9a304886c4a9f8b322e591@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Finn Behrens [Wed, 1 Dec 2021 12:49:18 +0000 (13:49 +0100)]
nl80211: remove reload flag from regulatory_request
This removes the previously unused reload flag, which was introduced in
1eda919126b4.
The request is handled as NL80211_REGDOM_SET_BY_CORE, which is parsed
unconditionally.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 1eda919126b4 ("nl80211: reset regdom when reloading regdb")
Link: https://lore.kernel.org/all/YaZuKYM5bfWe2Urn@archlinux-ax161/
Signed-off-by: Finn Behrens <me@kloenk.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/YadvTolO8rQcNCd/@gimli.kloenk.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Thu, 2 Dec 2021 12:45:33 +0000 (13:45 +0100)]
mac80211: send ADDBA requests using the tid/queue of the aggregation session
Sending them out on a different queue can cause a race condition where a
number of packets in the queue may be discarded by the receiver, because
the ADDBA request is sent too early.
This affects any driver with software A-MPDU setup which does not allocate
packet seqno in hardware on tx, regardless of whether iTXQ is used or not.
The only driver I've seen that explicitly deals with this issue internally
is mwl8k.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20211202124533.80388-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Finn Behrens [Sat, 27 Nov 2021 10:28:53 +0000 (11:28 +0100)]
nl80211: reset regdom when reloading regdb
Reload the regdom when the regulatory db is reloaded.
Otherwise, the user had to change the regulatoy domain
to a different one and then reset it to the correct
one to have a new regulatory db take effect after a
reload.
Signed-off-by: Finn Behrens <fin@nyantec.com>
Link: https://lore.kernel.org/r/YaIIZfxHgqc/UTA7@gimli.kloenk.dev
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 29 Nov 2021 08:19:49 +0000 (09:19 +0100)]
mac80211: add docs for ssn in struct tid_ampdu_tx
As pointed out by Stephen, add the missing docs.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20211129091948.1327ec82beab.Iecc5975406a3028d35c65ff8d2dec31a693888d3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ahmed Zaki [Sat, 2 Oct 2021 14:53:29 +0000 (08:53 -0600)]
mac80211: fix a memory leak where sta_info is not freed
The following is from a system that went OOM due to a memory leak:
wlan0: Allocated STA 74:83:c2:64:0b:87
wlan0: Allocated STA 74:83:c2:64:0b:87
wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_add_sta)
wlan0: Adding new IBSS station 74:83:c2:64:0b:87
wlan0: moving STA 74:83:c2:64:0b:87 to state 2
wlan0: moving STA 74:83:c2:64:0b:87 to state 3
wlan0: Inserted STA 74:83:c2:64:0b:87
wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_work)
wlan0: Adding new IBSS station 74:83:c2:64:0b:87
wlan0: moving STA 74:83:c2:64:0b:87 to state 2
wlan0: moving STA 74:83:c2:64:0b:87 to state 3
.
.
wlan0: expiring inactive not authorized STA 74:83:c2:64:0b:87
wlan0: moving STA 74:83:c2:64:0b:87 to state 2
wlan0: moving STA 74:83:c2:64:0b:87 to state 1
wlan0: Removed STA 74:83:c2:64:0b:87
wlan0: Destroyed STA 74:83:c2:64:0b:87
The ieee80211_ibss_finish_sta() is called twice on the same STA from 2
different locations. On the second attempt, the allocated STA is not
destroyed creating a kernel memory leak.
This is happening because sta_info_insert_finish() does not call
sta_info_free() the second time when the STA already exists (returns
-EEXIST). Note that the caller sta_info_insert_rcu() assumes STA is
destroyed upon errors.
Same fix is applied to -ENOMEM.
Signed-off-by: Ahmed Zaki <anzaki@gmail.com>
Link: https://lore.kernel.org/r/20211002145329.3125293-1-anzaki@gmail.com
[change the error path label to use the existing code]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Xing Song [Tue, 23 Nov 2021 03:31:23 +0000 (11:31 +0800)]
mac80211: set up the fwd_skb->dev for mesh forwarding
Mesh forwarding requires that the fwd_skb->dev is set up for TX handling,
otherwise the following warning will be generated, so set it up for the
pending frames.
[ 72.835674 ] WARNING: CPU: 0 PID: 1193 at __skb_flow_dissect+0x284/0x1298
[ 72.842379 ] Modules linked in: ksmbd pppoe ppp_async l2tp_ppp ...
[ 72.962020 ] CPU: 0 PID: 1193 Comm: kworker/u5:1 Tainted: P S 5.4.137 #0
[ 72.969938 ] Hardware name: MT7622_MT7531 RFB (DT)
[ 72.974659 ] Workqueue: napi_workq napi_workfn
[ 72.979025 ] pstate:
60000005 (nZCv daif -PAN -UAO)
[ 72.983822 ] pc : __skb_flow_dissect+0x284/0x1298
[ 72.988444 ] lr : __skb_flow_dissect+0x54/0x1298
[ 72.992977 ] sp :
ffffffc010c738c0
[ 72.996293 ] x29:
ffffffc010c738c0 x28:
0000000000000000
[ 73.001615 ] x27:
000000000000ffc2 x26:
ffffff800c2eb818
[ 73.006937 ] x25:
ffffffc010a987c8 x24:
00000000000000ce
[ 73.012259 ] x23:
ffffffc010c73a28 x22:
ffffffc010a99c60
[ 73.017581 ] x21:
000000000000ffc2 x20:
ffffff80094da800
[ 73.022903 ] x19:
0000000000000000 x18:
0000000000000014
[ 73.028226 ] x17:
00000000084d16af x16:
00000000d1fc0bab
[ 73.033548 ] x15:
00000000715f6034 x14:
000000009dbdd301
[ 73.038870 ] x13:
00000000ea4dcbc3 x12:
0000000000000040
[ 73.044192 ] x11:
000000000eb00ff0 x10:
0000000000000000
[ 73.049513 ] x9 :
000000000eb00073 x8 :
0000000000000088
[ 73.054834 ] x7 :
0000000000000000 x6 :
0000000000000001
[ 73.060155 ] x5 :
0000000000000000 x4 :
0000000000000000
[ 73.065476 ] x3 :
ffffffc010a98000 x2 :
0000000000000000
[ 73.070797 ] x1 :
0000000000000000 x0 :
0000000000000000
[ 73.076120 ] Call trace:
[ 73.078572 ] __skb_flow_dissect+0x284/0x1298
[ 73.082846 ] __skb_get_hash+0x7c/0x228
[ 73.086629 ] ieee80211_txq_may_transmit+0x7fc/0x17b8 [mac80211]
[ 73.092564 ] ieee80211_tx_prepare_skb+0x20c/0x268 [mac80211]
[ 73.098238 ] ieee80211_tx_pending+0x144/0x330 [mac80211]
[ 73.103560 ] tasklet_action_common.isra.16+0xb4/0x158
[ 73.108618 ] tasklet_action+0x2c/0x38
[ 73.112286 ] __do_softirq+0x168/0x3b0
[ 73.115954 ] do_softirq.part.15+0x88/0x98
[ 73.119969 ] __local_bh_enable_ip+0xb0/0xb8
[ 73.124156 ] napi_workfn+0x58/0x90
[ 73.127565 ] process_one_work+0x20c/0x478
[ 73.131579 ] worker_thread+0x50/0x4f0
[ 73.135249 ] kthread+0x124/0x128
[ 73.138484 ] ret_from_fork+0x10/0x1c
Signed-off-by: Xing Song <xing.song@mediatek.com>
Tested-By: Frank Wunderlich <frank-w@public-files.de>
Link: https://lore.kernel.org/r/20211123033123.2684-1-xing.song@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Wed, 24 Nov 2021 09:40:24 +0000 (10:40 +0100)]
mac80211: fix regression in SSN handling of addba tx
Some drivers that do their own sequence number allocation (e.g. ath9k) rely
on being able to modify params->ssn on starting tx ampdu sessions.
This was broken by a change that modified it to use sta->tid_seq[tid] instead.
Cc: stable@vger.kernel.org
Fixes: 31d8bb4e07f8 ("mac80211: agg-tx: refactor sending addba")
Reported-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20211124094024.43222-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Mon, 22 Nov 2021 20:43:23 +0000 (21:43 +0100)]
mac80211: fix rate control for retransmitted frames
Since retransmission clears info->control, rate control needs to be called
again, otherwise the driver might crash due to invalid rates.
Cc: stable@vger.kernel.org # 5.14+
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reported-by: Robert W <rwbugreport@lost-in-the-void.net>
Fixes: 03c3911d2d67 ("mac80211: call ieee80211_tx_h_rate_ctrl() when dequeue")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Link: https://lore.kernel.org/r/20211122204323.9787-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 22 Nov 2021 11:47:40 +0000 (12:47 +0100)]
mac80211: track only QoS data frames for admission control
For admission control, obviously all of that only works for
QoS data frames, otherwise we cannot even access the QoS
field in the header.
Syzbot reported (see below) an uninitialized value here due
to a status of a non-QoS nullfunc packet, which isn't even
long enough to contain the QoS header.
Fix this to only do anything for QoS data packets.
Reported-by: syzbot+614e82b88a1a4973e534@syzkaller.appspotmail.com
Fixes: 02219b3abca5 ("mac80211: add WMM admission control support")
Link: https://lore.kernel.org/r/20211122124737.dad29e65902a.Ieb04587afacb27c14e0de93ec1bfbefb238cc2a0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Maxime Bizon [Thu, 18 Nov 2021 11:58:24 +0000 (12:58 +0100)]
mac80211: fix TCP performance on mesh interface
sta is NULL for mesh point (resolved later), so sk pacing parameters
were not applied.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Link: https://lore.kernel.org/r/66f51659416ac35d6b11a313bd3ffe8b8a43dd55.camel@freebox.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jakub Kicinski [Fri, 26 Nov 2021 03:28:20 +0000 (19:28 -0800)]
Merge branch 'tls-splice_read-fixes'
Jakub Kicinski says:
====================
tls: splice_read fixes
As I work my way to unlocked and zero-copy TLS Rx the obvious bugs
in the splice_read implementation get harder and harder to ignore.
This is to say the fixes here are discovered by code inspection,
I'm not aware of anyone actually using splice_read.
====================
Link: https://lore.kernel.org/r/20211124232557.2039757-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:57 +0000 (15:25 -0800)]
selftests: tls: test for correct proto_ops
Previous patch fixes overriding callbacks incorrectly. Triggering
the crash in sendpage_locked would be more spectacular but it's
hard to get to, so take the easier path of proving this is broken
and call getname. We're currently getting IPv4 socket info on an
IPv6 socket.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:56 +0000 (15:25 -0800)]
tls: fix replacing proto_ops
We replace proto_ops whenever TLS is configured for RX. But our
replacement also overrides sendpage_locked, which will crash
unless TX is also configured. Similarly we plug both of those
in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely
different implementation for TX.
Last but not least we always plug in something based on inet_stream_ops
even though a few of the callbacks differ for IPv6 (getname, release,
bind).
Use a callback building method similar to what we do for struct proto.
Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Fixes: d4ffb02dee2f ("net/tls: enable sk_msg redirect to tls socket egress")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:55 +0000 (15:25 -0800)]
selftests: tls: test splicing decrypted records
Add tests for half-received and peeked records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:54 +0000 (15:25 -0800)]
tls: splice_read: fix accessing pre-processed records
recvmsg() will put peek()ed and partially read records onto the rx_list.
splice_read() needs to consult that list otherwise it may miss data.
Align with recvmsg() and also put partially-read records onto rx_list.
tls_sw_advance_skb() is pretty pointless now and will be removed in
net-next.
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:53 +0000 (15:25 -0800)]
selftests: tls: test splicing cmsgs
Make sure we correctly reject splicing non-data records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:52 +0000 (15:25 -0800)]
tls: splice_read: fix record type check
We don't support splicing control records. TLS 1.3 changes moved
the record type check into the decrypt if(). The skb may already
be decrypted and still be an alert.
Note that decrypt_skb_update() is idempotent and updates ctx->decrypted
so the if() is pointless.
Reorder the check for decryption errors with the content type check
while touching them. This part is not really a bug, because if
decryption failed in TLS 1.3 content type will be DATA, and for
TLS 1.2 it will be correct. Nevertheless its strange to touch output
before checking if the function has failed.
Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:51 +0000 (15:25 -0800)]
selftests: tls: add tests for handling of bad records
Test broken records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:50 +0000 (15:25 -0800)]
selftests: tls: factor out cmsg send/receive
Add helpers for sending and receiving special record types.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:49 +0000 (15:25 -0800)]
selftests: tls: add helper for creating sock pairs
We have the same code 3 times, about to add a fourth copy.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dylan Hung [Thu, 25 Nov 2021 02:44:32 +0000 (10:44 +0800)]
mdio: aspeed: Fix "Link is Down" issue
The issue happened randomly in runtime. The message "Link is Down" is
popped but soon it recovered to "Link is Up".
The "Link is Down" results from the incorrect read data for reading the
PHY register via MDIO bus. The correct sequence for reading the data
shall be:
1. fire the command
2. wait for command done (this step was missing)
3. wait for data idle
4. read data from data register
Cc: stable@vger.kernel.org
Fixes: f160e99462c6 ("net: phy: Add mdio-aspeed")
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20211125024432.15809-1-dylan_hung@aspeedtech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Brandeburg [Tue, 23 Nov 2021 20:40:00 +0000 (12:40 -0800)]
igb: fix netpoll exit with traffic
Oleksandr brought a bug report where netpoll causes trace
messages in the log on igb.
Danielle brought this back up as still occurring, so we'll try
again.
[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
As Alex suggested, change the driver to return work_done at the
exit of napi_poll, which should be safe to do in this driver
because it is not polling multiple queues in this single napi
context (multiple queues attached to one MSI-X vector). Several
other drivers contain the same simple sequence, so I hope
this will not create new problems.
Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Danielle Ratson <danieller@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/20211123204000.1597971-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Nov 2021 03:02:23 +0000 (19:02 -0800)]
Merge branch 'net-smc-fixes-2021-11-24'
Karsten Graul says:
====================
net/smc: fixes 2021-11-24
Patch 1 from DaXing fixes a possible loop in smc_listen().
Patch 2 prevents a NULL pointer dereferencing while iterating
over the lower network devices.
====================
Link: https://lore.kernel.org/r/20211124123238.471429-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guo DaXing [Wed, 24 Nov 2021 12:32:38 +0000 (13:32 +0100)]
net/smc: Fix loop in smc_listen
The kernel_listen function in smc_listen will fail when all the available
ports are occupied. At this point smc->clcsock->sk->sk_data_ready has
been changed to smc_clcsock_data_ready. When we call smc_listen again,
now both smc->clcsock->sk->sk_data_ready and smc->clcsk_data_ready point
to the smc_clcsock_data_ready function.
The smc_clcsock_data_ready() function calls lsmc->clcsk_data_ready which
now points to itself resulting in an infinite loop.
This patch restores smc->clcsock->sk->sk_data_ready with the old value.
Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
Signed-off-by: Guo DaXing <guodaxing@huawei.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul [Wed, 24 Nov 2021 12:32:37 +0000 (13:32 +0100)]
net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
Coverity reports a possible NULL dereferencing problem:
in smc_vlan_by_tcpsk():
6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times).
7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next.
1623 ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
CID
1468509 (#1 of 1): Dereference null return value (NULL_RETURNS)
8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev.
1624 if (is_vlan_dev(ndev)) {
Remove the manual implementation and use netdev_walk_all_lower_dev() to
iterate over the lower devices. While on it remove an obsolete function
parameter comment.
Fixes: cb9d43f67754 ("net/smc: determine vlan_id of stacked net_device")
Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Nov 2021 02:40:09 +0000 (18:40 -0800)]
Merge branch 'phylink-resolve-fixes'
Marek Behún says:
====================
phylink resolve fixes
With information from me and my nagging, Russell has produced two fixes
for phylink, which add code that triggers another phylink_resolve() from
phylink_resolve(), if certain conditions are met:
interface is being changed
or
link is down and previous link was up
These are needed because sometimes the PCS callbacks may provide stale
values if link / speed / ...
====================
Link: https://lore.kernel.org/r/20211123154403.32051-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 23 Nov 2021 15:44:03 +0000 (16:44 +0100)]
net: phylink: Force retrigger in case of latched link-fail indicator
On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following
description:
This register bit indicates when link was lost since the last
read. For the current link status, read this register
back-to-back.
Thus to get current link state, we need to read the register twice.
But doing that in the link change interrupt handler would lead to
potentially ignoring link down events, which we really want to avoid.
Thus this needs to be solved in phylink's resolve, by retriggering
another resolve in the event when PCS reports link down and previous
link was up, and by re-reading PCS state if the previous link was down.
The wrong value is read when phylink requests change from sgmii to
2500base-x mode, and link won't come up. This fixes the bug.
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 23 Nov 2021 15:44:02 +0000 (16:44 +0100)]
net: phylink: Force link down and retrigger resolve on interface change
On PHY state change the phylink_resolve() function can read stale
information from the MAC and report incorrect link speed and duplex to
the kernel message log.
Example with a Marvell 88X3310 PHY connected to a SerDes port on Marvell
88E6393X switch:
- PHY driver triggers state change due to PHY interface mode being
changed from 10gbase-r to 2500base-x due to copper change in speed
from 10Gbps to 2.5Gbps, but the PHY itself either hasn't yet changed
its interface to the host, or the interrupt about loss of SerDes link
hadn't arrived yet (there can be a delay of several milliseconds for
this), so we still think that the 10gbase-r mode is up
- phylink_resolve()
- phylink_mac_pcs_get_state()
- this fills in speed=10g link=up
- interface mode is updated to 2500base-x but speed is left at 10Gbps
- phylink_major_config()
- interface is changed to 2500base-x
- phylink_link_up()
- mv88e6xxx_mac_link_up()
- .port_set_speed_duplex()
- speed is set to 10Gbps
- reports "Link is Up - 10Gbps/Full" to dmesg
Afterwards when the interrupt finally arrives for mv88e6xxx, another
resolve is forced in which we get the correct speed from
phylink_mac_pcs_get_state(), but since the interface is not being
changed anymore, we don't call phylink_major_config() but only
phylink_mac_config(), which does not set speed/duplex anymore.
To fix this, we need to force the link down and trigger another resolve
on PHY interface change event.
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 24 Nov 2021 07:16:25 +0000 (08:16 +0100)]
lan743x: fix deadlock in lan743x_phy_link_status_change()
Usage of phy_ethtool_get_link_ksettings() in the link status change
handler isn't needed, and in combination with the referenced change
it results in a deadlock. Simply remove the call and replace it with
direct access to phydev->speed. The duplex argument of
lan743x_phy_update_flowcontrol() isn't used and can be removed.
Fixes: c10a485c3de5 ("phy: phy_ethtool_ksettings_get: Lock the phy for consistency")
Reported-by: Alessandro B Maurici <abmaurici@gmail.com>
Tested-by: Alessandro B Maurici <abmaurici@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/40e27f76-0ba3-dcef-ee32-a78b9df38b0f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 23 Nov 2021 20:25:35 +0000 (12:25 -0800)]
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
While testing BIG TCP patch series, I was expecting that TCP_RR workloads
with 80KB requests/answers would send one 80KB TSO packet,
then being received as a single GRO packet.
It turns out this was not happening, and the root cause was that
cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC.
Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC
needed a budget of ~20 segments.
Ideally these TCP_RR flows should not exit slow start.
Cubic Hystart should reset itself at each round, instead of assuming
every TCP flow is a bulk one.
Note that even after this patch, Hystart can still trigger, depending
on scheduling artifacts, but at a higher CWND/SSTHRESH threshold,
keeping optimal TSO packet sizes.
Tested:
ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072
nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests"
Before:
8605
Ip6InReceives 87541 0.0
Ip6OutRequests 129496 0.0
TcpExtTCPHystartTrainDetect 1 0.0
TcpExtTCPHystartTrainCwnd 30 0.0
After:
8760
Ip6InReceives 88514 0.0
Ip6OutRequests 87975 0.0
Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3")
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Nov 2021 22:24:22 +0000 (14:24 -0800)]
MAINTAINERS: Update B53 section to cover SF2 switch driver
Update the B53 Ethernet switch section to contain
drivers/net/dsa/bcm_sf2*.
Reported-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20211123222422.3745485-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Nov 2021 01:02:43 +0000 (17:02 -0800)]
Merge tag 'ieee802154-for-net-2021-11-24' of git://git./linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2021-11-24
A fix from Alexander which has been brought up various times found by
automated checkers. Make sure values are in u32 range.
* tag 'ieee802154-for-net-2021-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
net: ieee802154: handle iftypes as u32
====================
Link: https://lore.kernel.org/r/20211124150934.3670248-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kumar Thangavel [Mon, 22 Nov 2021 16:38:18 +0000 (22:08 +0530)]
net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
Update NC-SI command handler (both standard and OEM) to take into
account of payload paddings in allocating skb (in case of payload
size is not 32-bit aligned).
The checksum field follows payload field, without taking payload
padding into account can cause checksum being truncated, leading to
dropped packets.
Fixes: fb4ee67529ff ("net/ncsi: Add NCSI OEM command support")
Signed-off-by: Kumar Thangavel <thangavel.k@hcl.com>
Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Prestwood [Mon, 22 Nov 2021 17:18:06 +0000 (09:18 -0800)]
selftests: add arp_ndisc_evict_nocarrier to Makefile
This was previously added in selftests but never added
to the Makefile
Signed-off-by: James Prestwood <prestwoj@gmail.com>
Link: https://lore.kernel.org/r/20211122171806.3529401-1-prestwoj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jamal Hadi Salim [Mon, 22 Nov 2021 14:42:52 +0000 (09:42 -0500)]
tc-testing: Add link for reviews with TC MAINTAINERS
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20211122144252.25156-1-jhs@emojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 22 Nov 2021 18:48:10 +0000 (10:48 -0800)]
tools: sync uapi/linux/if_link.h header
This file has not been updated for a while.
Sync it before BIG TCP patch series.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211122184810.769159-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Behún [Mon, 22 Nov 2021 20:08:34 +0000 (21:08 +0100)]
net: marvell: mvpp2: increase MTU limit when XDP enabled
Currently mvpp2_xdp_setup won't allow attaching XDP program if
mtu > ETH_DATA_LEN (1500).
The mvpp2_change_mtu on the other hand checks whether
MVPP2_RX_PKT_SIZE(mtu) > MVPP2_BM_LONG_PKT_SIZE.
These two checks are semantically different.
Moreover this limit can be increased to MVPP2_MAX_RX_BUF_SIZE, since in
mvpp2_rx we have
xdp.data = data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM;
xdp.frame_sz = PAGE_SIZE;
Change the checks to check whether
mtu > MVPP2_MAX_RX_BUF_SIZE
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 23 Nov 2021 01:16:40 +0000 (19:16 -0600)]
net: ipa: kill ipa_cmd_pipeline_clear()
Calling ipa_cmd_pipeline_clear() after stopping the channel
underlying the AP<-modem RX endpoint can lead to a deadlock.
This occurs in the ->runtime_suspend device power operation for the
IPA driver. While this callback is in progress, any other requests
for power will block until the callback returns.
Stopping the AP<-modem RX channel does not prevent the modem from
sending another packet to this endpoint. If a packet arrives for an
RX channel when the channel is stopped, an SUSPEND IPA interrupt
condition will be pending. Handling an IPA interrupt requires
power, so ipa_isr_thread() calls pm_runtime_get_sync() first thing.
The problem occurs because a "pipeline clear" command will not
complete while such a SUSPEND interrupt condition exists. So the
SUSPEND IPA interrupt handler won't proceed until it gets power;
that won't happen until the ->runtime_suspend callback (and its
"pipeline clear" command) completes; and that can't happen while
the SUSPEND interrupt condition exists.
It turns out that in this case there is no need to use the "pipeline
clear" command. There are scenarios in which clearing the pipeline
is required while suspending, but those are not (yet) supported
upstream. So a simple fix, avoiding the potential deadlock, is to
stop calling ipa_cmd_pipeline_clear() in ipa_endpoint_suspend().
This removes the only user of ipa_cmd_pipeline_clear(), so get rid
of that function. It can be restored again whenever it's needed.
This is basically a manual revert along with an explanation for
commit
6cb63ea6a39ea ("net: ipa: introduce ipa_cmd_tag_process()").
Fixes: 6cb63ea6a39ea ("net: ipa: introduce ipa_cmd_tag_process()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martyn Welch [Mon, 22 Nov 2021 18:44:45 +0000 (18:44 +0000)]
net: usb: Correct PHY handling of smsc95xx
The smsc95xx driver is dropping phy speed settings and causing a stack
trace at device unbind:
[ 536.379147] smsc95xx 2-1:1.0 eth1: unregister 'smsc95xx' usb-ci_hdrc.2-1, smsc95xx USB 2.0 Ethernet
[ 536.425029] ------------[ cut here ]------------
[ 536.429650] WARNING: CPU: 0 PID: 439 at fs/kernfs/dir.c:1535 kernfs_remove_by_name_ns+0xb8/0xc0
[ 536.438416] kernfs: can not remove 'attached_dev', no directory
[ 536.444363] Modules linked in: xts dm_crypt dm_mod atmel_mxt_ts smsc95xx usbnet
[ 536.451748] CPU: 0 PID: 439 Comm: sh Tainted: G W 5.15.0 #1
[ 536.458636] Hardware name: Freescale i.MX53 (Device Tree Support)
[ 536.464735] Backtrace:
[ 536.467190] [<
80b1c904>] (dump_backtrace) from [<
80b1cb48>] (show_stack+0x20/0x24)
[ 536.474787] r7:
000005ff r6:
8035b294 r5:
600f0013 r4:
80d8af78
[ 536.480449] [<
80b1cb28>] (show_stack) from [<
80b1f764>] (dump_stack_lvl+0x48/0x54)
[ 536.488035] [<
80b1f71c>] (dump_stack_lvl) from [<
80b1f788>] (dump_stack+0x18/0x1c)
[ 536.495620] r5:
00000009 r4:
80d9b820
[ 536.499198] [<
80b1f770>] (dump_stack) from [<
80124fac>] (__warn+0xfc/0x114)
[ 536.506187] [<
80124eb0>] (__warn) from [<
80b1d21c>] (warn_slowpath_fmt+0xa8/0xdc)
[ 536.513688] r7:
000005ff r6:
80d9b820 r5:
80d9b8e0 r4:
83744000
[ 536.519349] [<
80b1d178>] (warn_slowpath_fmt) from [<
8035b294>] (kernfs_remove_by_name_ns+0xb8/0xc0)
[ 536.528416] r9:
00000001 r8:
00000000 r7:
824926dc r6:
00000000 r5:
80df6c2c r4:
00000000
[ 536.536162] [<
8035b1dc>] (kernfs_remove_by_name_ns) from [<
80b1f56c>] (sysfs_remove_link+0x4c/0x50)
[ 536.545225] r6:
7f00f02c r5:
80df6c2c r4:
83306400
[ 536.549845] [<
80b1f520>] (sysfs_remove_link) from [<
806f9c8c>] (phy_detach+0xfc/0x11c)
[ 536.557780] r5:
82492000 r4:
83306400
[ 536.561359] [<
806f9b90>] (phy_detach) from [<
806f9cf8>] (phy_disconnect+0x4c/0x58)
[ 536.568943] r7:
824926dc r6:
7f00f02c r5:
82492580 r4:
83306400
[ 536.574604] [<
806f9cac>] (phy_disconnect) from [<
7f00a310>] (smsc95xx_disconnect_phy+0x30/0x38 [smsc95xx])
[ 536.584290] r5:
82492580 r4:
82492580
[ 536.587868] [<
7f00a2e0>] (smsc95xx_disconnect_phy [smsc95xx]) from [<
7f001570>] (usbnet_stop+0x70/0x1a0 [usbnet])
[ 536.598161] r5:
82492580 r4:
82492000
[ 536.601740] [<
7f001500>] (usbnet_stop [usbnet]) from [<
808baa70>] (__dev_close_many+0xb4/0x12c)
[ 536.610466] r8:
83744000 r7:
00000000 r6:
83744000 r5:
83745b74 r4:
82492000
[ 536.617170] [<
808ba9bc>] (__dev_close_many) from [<
808bab78>] (dev_close_many+0x90/0x120)
[ 536.625365] r7:
00000001 r6:
83745b74 r5:
83745b8c r4:
82492000
[ 536.631026] [<
808baae8>] (dev_close_many) from [<
808bf408>] (unregister_netdevice_many+0x15c/0x704)
[ 536.640094] r9:
00000001 r8:
81130b98 r7:
83745b74 r6:
83745bc4 r5:
83745b8c r4:
82492000
[ 536.647840] [<
808bf2ac>] (unregister_netdevice_many) from [<
808bfa50>] (unregister_netdevice_queue+0xa0/0xe8)
[ 536.657775] r10:
8112bcc0 r9:
83306c00 r8:
83306c80 r7:
8291e420 r6:
83744000 r5:
00000000
[ 536.665608] r4:
82492000
[ 536.668143] [<
808bf9b0>] (unregister_netdevice_queue) from [<
808bfac0>] (unregister_netdev+0x28/0x30)
[ 536.677381] r6:
7f01003c r5:
82492000 r4:
82492000
[ 536.682000] [<
808bfa98>] (unregister_netdev) from [<
7f000b40>] (usbnet_disconnect+0x64/0xdc [usbnet])
[ 536.691241] r5:
82492000 r4:
82492580
[ 536.694819] [<
7f000adc>] (usbnet_disconnect [usbnet]) from [<
8076b958>] (usb_unbind_interface+0x80/0x248)
[ 536.704406] r5:
7f01003c r4:
83306c80
[ 536.707984] [<
8076b8d8>] (usb_unbind_interface) from [<
8061765c>] (device_release_driver_internal+0x1c4/0x1cc)
[ 536.718005] r10:
8112bcc0 r9:
80dff1dc r8:
83306c80 r7:
83744000 r6:
7f01003c r5:
00000000
[ 536.725838] r4:
8291e420
[ 536.728373] [<
80617498>] (device_release_driver_internal) from [<
80617684>] (device_release_driver+0x20/0x24)
[ 536.738302] r7:
83744000 r6:
810d4f4c r5:
8291e420 r4:
8176ae30
[ 536.743963] [<
80617664>] (device_release_driver) from [<
806156cc>] (bus_remove_device+0xf0/0x148)
[ 536.752858] [<
806155dc>] (bus_remove_device) from [<
80610018>] (device_del+0x198/0x41c)
[ 536.760880] r7:
83744000 r6:
8116e2e4 r5:
8291e464 r4:
8291e420
[ 536.766542] [<
8060fe80>] (device_del) from [<
80768fe8>] (usb_disable_device+0xcc/0x1e0)
[ 536.774576] r10:
8112bcc0 r9:
80dff1dc r8:
00000001 r7:
8112bc48 r6:
8291e400 r5:
00000001
[ 536.782410] r4:
83306c00
[ 536.784945] [<
80768f1c>] (usb_disable_device) from [<
80769c30>] (usb_set_configuration+0x514/0x8dc)
[ 536.794011] r10:
00000000 r9:
00000000 r8:
832c3600 r7:
00000004 r6:
810d5688 r5:
00000000
[ 536.801844] r4:
83306c00
[ 536.804379] [<
8076971c>] (usb_set_configuration) from [<
80775fac>] (usb_generic_driver_disconnect+0x34/0x38)
[ 536.814236] r10:
832c3610 r9:
83745ef8 r8:
832c3600 r7:
00000004 r6:
810d5688 r5:
83306c00
[ 536.822069] r4:
83306c00
[ 536.824605] [<
80775f78>] (usb_generic_driver_disconnect) from [<
8076b850>] (usb_unbind_device+0x30/0x70)
[ 536.834100] r5:
83306c00 r4:
810d5688
[ 536.837678] [<
8076b820>] (usb_unbind_device) from [<
8061765c>] (device_release_driver_internal+0x1c4/0x1cc)
[ 536.847432] r5:
822fb480 r4:
83306c80
[ 536.851009] [<
80617498>] (device_release_driver_internal) from [<
806176a8>] (device_driver_detach+0x20/0x24)
[ 536.860853] r7:
00000004 r6:
810d4f4c r5:
810d5688 r4:
83306c80
[ 536.866515] [<
80617688>] (device_driver_detach) from [<
80614d98>] (unbind_store+0x70/0xe4)
[ 536.874793] [<
80614d28>] (unbind_store) from [<
80614118>] (drv_attr_store+0x30/0x3c)
[ 536.882554] r7:
00000000 r6:
00000000 r5:
83739200 r4:
80614d28
[ 536.888217] [<
806140e8>] (drv_attr_store) from [<
8035cb68>] (sysfs_kf_write+0x48/0x54)
[ 536.896154] r5:
83739200 r4:
806140e8
[ 536.899732] [<
8035cb20>] (sysfs_kf_write) from [<
8035be84>] (kernfs_fop_write_iter+0x11c/0x1d4)
[ 536.908446] r5:
83739200 r4:
00000004
[ 536.912024] [<
8035bd68>] (kernfs_fop_write_iter) from [<
802b87fc>] (vfs_write+0x258/0x3e4)
[ 536.920317] r10:
00000000 r9:
83745f58 r8:
83744000 r7:
00000000 r6:
00000004 r5:
00000000
[ 536.928151] r4:
82adacc0
[ 536.930687] [<
802b85a4>] (vfs_write) from [<
802b8b0c>] (ksys_write+0x74/0xf4)
[ 536.937842] r10:
00000004 r9:
007767a0 r8:
83744000 r7:
00000000 r6:
00000000 r5:
82adacc0
[ 536.945676] r4:
82adacc0
[ 536.948213] [<
802b8a98>] (ksys_write) from [<
802b8ba4>] (sys_write+0x18/0x1c)
[ 536.955367] r10:
00000004 r9:
83744000 r8:
80100244 r7:
00000004 r6:
76f47b58 r5:
76fc0350
[ 536.963200] r4:
00000004
[ 536.965735] [<
802b8b8c>] (sys_write) from [<
80100060>] (ret_fast_syscall+0x0/0x48)
[ 536.973320] Exception stack(0x83745fa8 to 0x83745ff0)
[ 536.978383] 5fa0:
00000004 76fc0350 00000001 007767a0 00000004 00000000
[ 536.986569] 5fc0:
00000004 76fc0350 76f47b58 00000004 76f47c7c 76f48114 00000000 7e87991c
[ 536.994753] 5fe0:
00000498 7e879908 76e6dce8 76eca2e8
[ 536.999922] ---[ end trace
9b835d809816b435 ]---
The driver should not be connecting and disconnecting the PHY when the
device is opened and closed, it should be stopping and starting the PHY. The
phy should be connected as part of binding and disconnected during
unbinding.
As this results in the PHY not being reset during open, link speed, etc.
settings set prior to the link coming up are now not being lost.
It is necessary for phy_stop() to only be called when the phydev still
exists (resolving the above stack trace). When unbinding, ".unbind" will be
called prior to ".stop", with phy_disconnect() already having called
phy_stop() before the phydev becomes inaccessible.
Signed-off-by: Martyn Welch <martyn.welch@collabora.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Cc: UNGLinuxDriver@microchip.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: stable@kernel.org # v5.15
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheyu Ma [Tue, 23 Nov 2021 02:21:50 +0000 (02:21 +0000)]
net: chelsio: cxgb4vf: Fix an error code in cxgb4vf_pci_probe()
During the process of driver probing, probe function should return < 0
for failure, otherwise kernel will treat value == 0 as success.
Therefore, we should set err to -EINVAL when
adapter->registered_device_map is NULL. Otherwise kernel will assume
that driver has been successfully probed and will cause unexpected
errors.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 22 Nov 2021 21:35:33 +0000 (22:35 +0100)]
r8169: fix incorrect mac address assignment
The original changes brakes MAC address assignment on older chip
versions (see bug report [0]), and it brakes random MAC assignment.
is_valid_ether_addr() requires that its argument is word-aligned.
Add the missing alignment to array mac_addr.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=215087
Fixes: 1c5d09d58748 ("ethernet: r8169: use eth_hw_addr_set()")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Tested-by: Richard Herbert <rherbert@sympatico.ca>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Nov 2021 12:10:07 +0000 (12:10 +0000)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-22
Maciej Fijalkowski says:
Here are the two fixes for issues around ethtool's set_channels()
callback for ice driver. Both are related to XDP resources. First one
corrects the size of vsi->txq_map that is used to track the usage of Tx
resources and the second one prevents the wrong refcounting of bpf_prog.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Nov 2021 12:06:40 +0000 (12:06 +0000)]
Merge branch 'ipa-fixes'
Alex Elder says:
====================
net: ipa: prevent shutdown during setup
The setup phase of the IPA driver occurs in one of two ways.
Normally, it is done directly by the main driver probe function.
But some systems (those having a "modem-init" DTS property) don't
start setup until an SMP2P interrupt (sent by the modem) arrives.
Because it isn't performed by the probe function, setup on
"modem-init" systems could be underway at the time a driver
remove (or shutdown) request arrives (or vice-versa). This
situation can lead to hardware state not being cleaned up
properly.
This series addresses this problem by having the driver remove
function disable the setup interrupt. A consequence of this is
that setup will complete if it is underway when the remove function
is called.
So now, when removing the driver, setup:
- will have already completed;
- is underway, and will complete before proceeding; or
- will not have begun (and will not occur).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 23 Nov 2021 00:15:55 +0000 (18:15 -0600)]
net: ipa: separate disabling setup from modem stop
The IPA setup_complete flag is set at the end of ipa_setup(), when
the setup phase of initialization has completed successfully. This
occurs as part of driver probe processing, or (if "modem-init" is
specified in the DTS file) it is triggered by the "ipa-setup-ready"
SMP2P interrupt generated by the modem.
In the latter case, it's possible for driver shutdown (or remove) to
begin while setup processing is underway, and this can't be allowed.
The problem is that the setup_complete flag is not adequate to signal
that setup is underway.
If setup_complete is set, it will never be un-set, so that case is
not a problem. But if setup_complete is false, there's a chance
setup is underway.
Because setup is triggered by an interrupt on a "modem-init" system,
there is a simple way to ensure the value of setup_complete is safe
to read. The threaded handler--if it is executing--will complete as
part of a request to disable the "ipa-modem-ready" interrupt. This
means that ipa_setup() (which is called from the handler) will run
to completion if it was underway, or will never be called otherwise.
The request to disable the "ipa-setup-ready" interrupt is currently
made within ipa_modem_stop(). Instead, disable the interrupt
outside that function in the two places it's called. In the case of
ipa_remove(), this ensures the setup_complete flag is safe to read
before we read it.
Rename ipa_smp2p_disable() to be ipa_smp2p_irq_disable_setup(), to be
more specific about its effect.
Fixes: 530f9216a953 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 23 Nov 2021 00:15:54 +0000 (18:15 -0600)]
net: ipa: directly disable ipa-setup-ready interrupt
We currently maintain a "disabled" Boolean flag to determine whether
the "ipa-setup-ready" SMP2P IRQ handler does anything. That flag
must be accessed under protection of a mutex.
Instead, disable the SMP2P interrupt when requested, which prevents
the interrupt handler from ever being called. More importantly, it
synchronizes a thread disabling the interrupt with the completion of
the interrupt handler in case they run concurrently.
Use the IPA setup_complete flag rather than the disabled flag in the
handler to determine whether to ignore any interrupts arriving after
the first.
Rename the "disabled" flag to be "setup_disabled", to be specific
about its purpose.
Fixes: 530f9216a953 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Nov 2021 11:46:18 +0000 (11:46 +0000)]
Merge branch 'mlxsw-fixes'
Ido Schimmel says:
====================
mlxsw: Two small fixes
Patch #1 fixes a recent regression that prevents the driver from loading
with old firmware versions.
Patch #2 protects the driver from a NULL pointer dereference when
working on top of a buggy firmware. This was never observed in an actual
system, only on top of an emulator during development.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Tue, 23 Nov 2021 07:52:56 +0000 (09:52 +0200)]
mlxsw: spectrum: Protect driver from buggy firmware
When processing port up/down events generated by the device's firmware,
the driver protects itself from events reported for non-existent local
ports, but not the CPU port (local port 0), which exists, but lacks a
netdev.
This can result in a NULL pointer dereference when calling
netif_carrier_{on,off}().
Fix this by bailing early when processing an event reported for the CPU
port. Problem was only observed when running on top of a buggy emulator.
Fixes: 28b1987ef506 ("mlxsw: spectrum: Register CPU port with devlink")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Tue, 23 Nov 2021 07:52:55 +0000 (09:52 +0200)]
mlxsw: spectrum: Allow driver to load with old firmware versions
The driver fails to load with old firmware versions that cannot report
the maximum number of RIF MAC profiles [1].
Fix this by defaulting to a maximum of a single profile in such
situations, as multiple profiles are not supported by old firmware
versions.
[1]
mlxsw_spectrum 0000:03:00.0: cannot register bus device
mlxsw_spectrum: probe of 0000:03:00.0 failed with error -5
Fixes: 1c375ffb2efab ("mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 23 Nov 2021 11:42:24 +0000 (11:42 +0000)]
Merge branch 'smc-fixes'
Tony Lu says:
====================
smc: Fixes for closing process and minor cleanup
Patch 1 is a minor cleanup for local struct sock variables.
Patch 2 ensures the active closing side enters TIME_WAIT.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lu [Tue, 23 Nov 2021 08:25:18 +0000 (16:25 +0800)]
net/smc: Ensure the active closing peer first closes clcsock
The side that actively closed socket, it's clcsock doesn't enter
TIME_WAIT state, but the passive side does it. It should show the same
behavior as TCP sockets.
Consider this, when client actively closes the socket, the clcsock in
server enters TIME_WAIT state, which means the address is occupied and
won't be reused before TIME_WAIT dismissing. If we restarted server, the
service would be unavailable for a long time.
To solve this issue, shutdown the clcsock in [A], perform the TCP active
close progress first, before the passive closed side closing it. So that
the actively closed side enters TIME_WAIT, not the passive one.
Client | Server
close() // client actively close |
smc_release() |
smc_close_active() // PEERCLOSEWAIT1 |
smc_close_final() // abort or closed = 1|
smc_cdc_get_slot_and_msg_send() |
[A] |
|smc_cdc_msg_recv_action() // ACTIVE
| queue_work(smc_close_wq, &conn->close_work)
| smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1
| smc_close_passive_abort_received() // only in abort
|
|close() // server recv zero, close
| smc_release() // PROCESSABORT or APPCLOSEWAIT1
| smc_close_active()
| smc_close_abort() or smc_close_final() // CLOSED
| smc_cdc_get_slot_and_msg_send() // abort or closed = 1
smc_cdc_msg_recv_action() | smc_clcsock_release()
queue_work(smc_close_wq, &conn->close_work) | sock_release(tcp) // actively close clc, enter TIME_WAIT
smc_close_passive_work() // PEERCLOSEWAIT1 | smc_conn_free()
smc_close_passive_abort_received() // CLOSED|
smc_conn_free() |
smc_clcsock_release() |
sock_release(tcp) // passive close clc |
Link: https://www.spinics.net/lists/netdev/msg780407.html
Fixes: b38d732477e4 ("smc: socket closing and linkgroup cleanup")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lu [Tue, 23 Nov 2021 08:25:16 +0000 (16:25 +0800)]
net/smc: Clean up local struct sock variables
There remains some variables to replace with local struct sock. So clean
them up all.
Fixes: 3163c5071f25 ("net/smc: use local struct sock variables consistently")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 23 Nov 2021 10:27:19 +0000 (12:27 +0200)]
net: nexthop: fix null pointer dereference when IPv6 is not enabled
When we try to add an IPv6 nexthop and IPv6 is not enabled
(!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path
of nh_create_ipv6() due to calling ipv6_stub->fib6_nh_release. The bug
has been present since the beginning of IPv6 nexthop gateway support.
Commit
1aefd3de7bc6 ("ipv6: Add fib6_nh_init and release to stubs") tells
us that only fib6_nh_init has a dummy stub because fib6_nh_release should
not be called if fib6_nh_init returns an error, but the commit below added
a call to ipv6_stub->fib6_nh_release in its error path. To fix it return
the dummy stub's -EAFNOSUPPORT error directly without calling
ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path.
[1]
Output is a bit truncated, but it clearly shows the error.
BUG: kernel NULL pointer dereference, address:
000000000000000000
#PF: supervisor instruction fetch in kernel modede
#PF: error_code(0x0010) - not-present pagege
PGD 0 P4D 0
Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ #446
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:
ffff888109f5b8f0 EFLAGS:
00010286^Ac
RAX:
0000000000000000 RBX:
ffff888109f5ba28 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff8881008a2860
RBP:
ffff888109f5b9d8 R08:
0000000000000000 R09:
0000000000000000
R10:
ffff888109f5b978 R11:
ffff888109f5b948 R12:
00000000ffffff9f
R13:
ffff8881008a2a80 R14:
ffff8881008a2860 R15:
ffff8881008a2840
FS:
00007f98de70f100(0000) GS:
ffff88822bf00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
0000000100efc000 CR4:
00000000000006e0
Call Trace:
<TASK>
nh_create_ipv6+0xed/0x10c
rtm_new_nexthop+0x6d7/0x13f3
? check_preemption_disabled+0x3d/0xf2
? lock_is_held_type+0xbe/0xfd
rtnetlink_rcv_msg+0x23f/0x26a
? check_preemption_disabled+0x3d/0xf2
? rtnl_calcit.isra.0+0x147/0x147
netlink_rcv_skb+0x61/0xb2
netlink_unicast+0x100/0x187
netlink_sendmsg+0x37f/0x3a0
? netlink_unicast+0x187/0x187
sock_sendmsg_nosec+0x67/0x9b
____sys_sendmsg+0x19d/0x1f9
? copy_msghdr_from_user+0x4c/0x5e
? rcu_read_lock_any_held+0x2a/0x78
___sys_sendmsg+0x6c/0x8c
? asm_sysvec_apic_timer_interrupt+0x12/0x20
? lockdep_hardirqs_on+0xd9/0x102
? sockfd_lookup_light+0x69/0x99
__sys_sendmsg+0x50/0x6e
do_syscall_64+0xcb/0xf2
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f98dea28914
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
RSP: 002b:
00007fff859f5e68 EFLAGS:
00000246 ORIG_RAX:
000000000000002e2e
RAX:
ffffffffffffffda RBX:
00000000619cb810 RCX:
00007f98dea28914
RDX:
0000000000000000 RSI:
00007fff859f5ed0 RDI:
0000000000000003
RBP:
0000000000000000 R08:
0000000000000001 R09:
0000000000000008
R10:
fffffffffffffce6 R11:
0000000000000246 R12:
0000000000000001
R13:
000055c0097ae520 R14:
000055c0097957fd R15:
00007fff859f63a0
</TASK>
Modules linked in: bridge stp llc bonding virtio_net
Cc: stable@vger.kernel.org
Fixes: 53010f991a9f ("nexthop: Add support for IPv6 gateways")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang Pei [Tue, 23 Nov 2021 11:07:49 +0000 (19:07 +0800)]
slip: fix macro redefine warning
MIPS/IA64 define END as assembly function ending, which conflict
with END definition in slip.h, just undef it at first
Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang Pei [Tue, 23 Nov 2021 11:07:48 +0000 (19:07 +0800)]
hamradio: fix macro redefine warning
MIPS/IA64 define END as assembly function ending, which conflict
with END definition in mkiss.c, just undef it at first
Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marta Plantykow [Tue, 26 Oct 2021 16:47:19 +0000 (18:47 +0200)]
ice: avoid bpf_prog refcount underflow
Ice driver has the routines for managing XDP resources that are shared
between ndo_bpf op and VSI rebuild flow. The latter takes place for
example when user changes queue count on an interface via ethtool's
set_channels().
There is an issue around the bpf_prog refcounting when VSI is being
rebuilt - since ice_prepare_xdp_rings() is called with vsi->xdp_prog as
an argument that is used later on by ice_vsi_assign_bpf_prog(), same
bpf_prog pointers are swapped with each other. Then it is also
interpreted as an 'old_prog' which in turn causes us to call
bpf_prog_put on it that will decrement its refcount.
Below splat can be interpreted in a way that due to zero refcount of a
bpf_prog it is wiped out from the system while kernel still tries to
refer to it:
[ 481.069429] BUG: unable to handle page fault for address:
ffffc9000640f038
[ 481.077390] #PF: supervisor read access in kernel mode
[ 481.083335] #PF: error_code(0x0000) - not-present page
[ 481.089276] PGD
100000067 P4D
100000067 PUD
1001cb067 PMD
106d2b067 PTE 0
[ 481.097141] Oops: 0000 [#1] PREEMPT SMP PTI
[ 481.101980] CPU: 12 PID: 3339 Comm: sudo Tainted: G OE 5.15.0-rc5+ #1
[ 481.110840] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.
1605190235 05/19/2016
[ 481.122021] RIP: 0010:dev_xdp_prog_id+0x25/0x40
[ 481.127265] Code: 80 00 00 00 00 0f 1f 44 00 00 89 f6 48 c1 e6 04 48 01 fe 48 8b 86 98 08 00 00 48 85 c0 74 13 48 8b 50 18 31 c0 48 85 d2 74 07 <48> 8b 42 38 8b 40 20 c3 48 8b 96 90 08 00 00 eb e8 66 2e 0f 1f 84
[ 481.148991] RSP: 0018:
ffffc90007b63868 EFLAGS:
00010286
[ 481.155034] RAX:
0000000000000000 RBX:
ffff889080824000 RCX:
0000000000000000
[ 481.163278] RDX:
ffffc9000640f000 RSI:
ffff889080824010 RDI:
ffff889080824000
[ 481.171527] RBP:
ffff888107af7d00 R08:
0000000000000000 R09:
ffff88810db5f6e0
[ 481.179776] R10:
0000000000000000 R11:
ffff8890885b9988 R12:
ffff88810db5f4bc
[ 481.188026] R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
[ 481.196276] FS:
00007f5466d5bec0(0000) GS:
ffff88903fb00000(0000) knlGS:
0000000000000000
[ 481.205633] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 481.212279] CR2:
ffffc9000640f038 CR3:
000000014429c006 CR4:
00000000003706e0
[ 481.220530] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 481.228771] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 481.237029] Call Trace:
[ 481.239856] rtnl_fill_ifinfo+0x768/0x12e0
[ 481.244602] rtnl_dump_ifinfo+0x525/0x650
[ 481.249246] ? __alloc_skb+0xa5/0x280
[ 481.253484] netlink_dump+0x168/0x3c0
[ 481.257725] netlink_recvmsg+0x21e/0x3e0
[ 481.262263] ____sys_recvmsg+0x87/0x170
[ 481.266707] ? __might_fault+0x20/0x30
[ 481.271046] ? _copy_from_user+0x66/0xa0
[ 481.275591] ? iovec_from_user+0xf6/0x1c0
[ 481.280226] ___sys_recvmsg+0x82/0x100
[ 481.284566] ? sock_sendmsg+0x5e/0x60
[ 481.288791] ? __sys_sendto+0xee/0x150
[ 481.293129] __sys_recvmsg+0x56/0xa0
[ 481.297267] do_syscall_64+0x3b/0xc0
[ 481.301395] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 481.307238] RIP: 0033:0x7f5466f39617
[ 481.311373] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bd 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 481.342944] RSP: 002b:
00007ffedc7f4308 EFLAGS:
00000246 ORIG_RAX:
000000000000002f
[ 481.361783] RAX:
ffffffffffffffda RBX:
00007ffedc7f5460 RCX:
00007f5466f39617
[ 481.380278] RDX:
0000000000000000 RSI:
00007ffedc7f5360 RDI:
0000000000000003
[ 481.398500] RBP:
00007ffedc7f53f0 R08:
0000000000000000 R09:
000055d556f04d50
[ 481.416463] R10:
0000000000000077 R11:
0000000000000246 R12:
00007ffedc7f5360
[ 481.434131] R13:
00007ffedc7f5350 R14:
00007ffedc7f5344 R15:
0000000000000e98
[ 481.451520] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mxm_wmi mei_me coretemp mei ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ahci crypto_simd cryptd libahci lpc_ich [last unloaded: ice]
[ 481.528558] CR2:
ffffc9000640f038
[ 481.542041] ---[ end trace
d1f24c9ecf5b61c1 ]---
Fix this by only calling ice_vsi_assign_bpf_prog() inside
ice_prepare_xdp_rings() when current vsi->xdp_prog pointer is NULL.
This way set_channels() flow will not attempt to swap the vsi->xdp_prog
pointers with itself.
Also, sprinkle around some comments that provide a reasoning about
correlation between driver and kernel in terms of bpf_prog refcount.
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Maciej Fijalkowski [Tue, 26 Oct 2021 16:47:18 +0000 (18:47 +0200)]
ice: fix vsi->txq_map sizing
The approach of having XDP queue per CPU regardless of user's setting
exposed a hidden bug that could occur in case when Rx queue count differ
from Tx queue count. Currently vsi->txq_map's size is equal to the
doubled vsi->alloc_txq, which is not correct due to the fact that XDP
rings were previously based on the Rx queue count. Below splat can be
seen when ethtool -L is used and XDP rings are configured:
[ 682.875339] BUG: kernel NULL pointer dereference, address:
000000000000000f
[ 682.883403] #PF: supervisor read access in kernel mode
[ 682.889345] #PF: error_code(0x0000) - not-present page
[ 682.895289] PGD 0 P4D 0
[ 682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
[ 682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G OE 5.15.0-rc5+ #1
[ 682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.
1605190235 05/19/2016
[ 682.923380] RIP: 0010:devres_remove+0x44/0x130
[ 682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
[ 682.950237] RSP: 0018:
ffffc90006a679f0 EFLAGS:
00010002
[ 682.956285] RAX:
0000000000000286 RBX:
ffffffffffffffff RCX:
ffff88908343a370
[ 682.964538] RDX:
0000000000000001 RSI:
ffffffff81690d60 RDI:
0000000000000000
[ 682.972789] RBP:
ffff88908343a0d0 R08:
0000000000000000 R09:
0000000000000000
[ 682.981040] R10:
0000000000000286 R11:
3fffffffffffffff R12:
ffffffff81690d60
[ 682.989282] R13:
ffffffff81690a00 R14:
ffff8890819807a8 R15:
ffff88908343a36c
[ 682.997535] FS:
00007f08c7bfa740(0000) GS:
ffff88a03fd00000(0000) knlGS:
0000000000000000
[ 683.006910] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 683.013557] CR2:
000000000000000f CR3:
0000001080a66003 CR4:
00000000003706e0
[ 683.021819] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 683.030075] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 683.038336] Call Trace:
[ 683.041167] devm_kfree+0x33/0x50
[ 683.045004] ice_vsi_free_arrays+0x5e/0xc0 [ice]
[ 683.050380] ice_vsi_rebuild+0x4c8/0x750 [ice]
[ 683.055543] ice_vsi_recfg_qs+0x9a/0x110 [ice]
[ 683.060697] ice_set_channels+0x14f/0x290 [ice]
[ 683.065962] ethnl_set_channels+0x333/0x3f0
[ 683.070807] genl_family_rcv_msg_doit+0xea/0x150
[ 683.076152] genl_rcv_msg+0xde/0x1d0
[ 683.080289] ? channels_prepare_data+0x60/0x60
[ 683.085432] ? genl_get_cmd+0xd0/0xd0
[ 683.089667] netlink_rcv_skb+0x50/0xf0
[ 683.094006] genl_rcv+0x24/0x40
[ 683.097638] netlink_unicast+0x239/0x340
[ 683.102177] netlink_sendmsg+0x22e/0x470
[ 683.106717] sock_sendmsg+0x5e/0x60
[ 683.110756] __sys_sendto+0xee/0x150
[ 683.114894] ? handle_mm_fault+0xd0/0x2a0
[ 683.119535] ? do_user_addr_fault+0x1f3/0x690
[ 683.134173] __x64_sys_sendto+0x25/0x30
[ 683.148231] do_syscall_64+0x3b/0xc0
[ 683.161992] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this by taking into account the value that num_possible_cpus()
yields in addition to vsi->alloc_txq instead of doubling the latter.
Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Mon, 22 Nov 2021 15:44:49 +0000 (15:44 +0000)]
Merge branch 'nh-group-refcnt'
Nikolay Aleksandrov says:
====================
net: nexthop: fix refcount issues when replacing groups
This set fixes a refcount bug when replacing nexthop groups and
modifying routes. It is complex because the objects look valid when
debugging memory dumps, but we end up having refcount dependency between
unlinked objects which can never be released, so in turn they cannot
free their resources and refcounts. The problem happens because we can
have stale IPv6 per-cpu dsts in nexthops which were removed from a
group. Even though the IPv6 gen is bumped, the dsts won't be released
until traffic passes through them or the nexthop is freed, that can take
arbitrarily long time, and even worse we can create a scenario[1] where it
can never be released. The fix is to release the IPv6 per-cpu dsts of
replaced nexthops after an RCU grace period so no new ones can be
created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which
is used by the nexthop code only when necessary. We can further optimize
group replacement, but that is more suited for net-next as these patches
would have to be backported to stable releases.
v2: patch 02: update commit msg
patch 03: check for mausezahn before testing and make a few comments
more verbose
[1]
This info is also present in patch 02's commit message.
Initial state:
$ ip nexthop list
id 200 via 2002:db8::2 dev bridge.10 scope link onlink
id 201 via 2002:db8::3 dev bridge scope link onlink
id 203 group 201/200
$ ip -6 route
2001:db8::10 nhid 203 metric 1024 pref medium
nexthop via 2002:db8::3 dev bridge weight 1 onlink
nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink
Create rt6_info through one of the multipath legs, e.g.:
$ taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10
(pkt_inj is just a custom packet generator, nothing special)
Then remove that leg from the group by replace (let's assume it is id
200 in this case):
$ ip nexthop replace id 203 group 201
Now remove the IPv6 route:
$ ip -6 route del 2001:db8::10/128
The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
(deleted) IPv6 route holds 1 reference over nhid 203
nh 203 holds 1 ref over id 201
nh 200 holds 1 ref over the net device and the route due to the stale
rt6_info
Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
$ ip nexthop replace id 203 group 201/200
And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.
To trigger the bug just delete the group (nhid 203):
$ ip nexthop del id 203
It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.
At this point the dependencies are:
(deleted, only unlinked) IPv6 route holds reference over group nh 203
(deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
nh 200 holds 1 ref over the net device and the route due to the stale
rt6_info
This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.
If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
$ ip nexthop del id 200
$ ip nexthop
$
Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.
Message from syslogd@dev at Nov 19 14:04:10 ...
kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
Message from syslogd@dev at Nov 19 14:04:20 ...
kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 22 Nov 2021 15:15:14 +0000 (17:15 +0200)]
selftests: net: fib_nexthops: add test for group refcount imbalance bug
The new selftest runs a sequence which causes circular refcount
dependency between deleted objects which cannot be released and results
in a netdevice refcount imbalance.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 22 Nov 2021 15:15:13 +0000 (17:15 +0200)]
net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
the removed nexthop entries after an RCU grace period because they
contain references to the nexthop's net device and to the fib6 info.
With specific series of events[1] we can reach net device refcount
imbalance which is unrecoverable. IPv4 is not affected because dsts
don't take a refcount on the route.
[1]
$ ip nexthop list
id 200 via 2002:db8::2 dev bridge.10 scope link onlink
id 201 via 2002:db8::3 dev bridge scope link onlink
id 203 group 201/200
$ ip -6 route
2001:db8::10 nhid 203 metric 1024 pref medium
nexthop via 2002:db8::3 dev bridge weight 1 onlink
nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink
Create rt6_info through one of the multipath legs, e.g.:
$ taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10
(pkt_inj is just a custom packet generator, nothing special)
Then remove that leg from the group by replace (let's assume it is id
200 in this case):
$ ip nexthop replace id 203 group 201
Now remove the IPv6 route:
$ ip -6 route del 2001:db8::10/128
The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
(deleted) IPv6 route holds 1 reference over nhid 203
nh 203 holds 1 ref over id 201
nh 200 holds 1 ref over the net device and the route due to the stale
rt6_info
Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
$ ip nexthop replace id 203 group 201/200
And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.
To trigger the bug just delete the group (nhid 203):
$ ip nexthop del id 203
It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.
At this point the dependencies are:
(deleted, only unlinked) IPv6 route holds reference over group nh 203
(deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
nh 200 holds 1 ref over the net device and the route due to the stale
rt6_info
This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.
If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
$ ip nexthop del id 200
$ ip nexthop
$
Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.
Message from syslogd@dev at Nov 19 14:04:10 ...
kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
Message from syslogd@dev at Nov 19 14:04:20 ...
kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
Fixes: 7bf4796dd099 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 22 Nov 2021 15:15:12 +0000 (17:15 +0200)]
net: ipv6: add fib6_nh_release_dsts stub
We need a way to release a fib6_nh's per-cpu dsts when replacing
nexthops otherwise we can end up with stale per-cpu dsts which hold net
device references, so add a new IPv6 stub called fib6_nh_release_dsts.
It must be used after an RCU grace period, so no new dsts can be created
through a group's nexthop entry.
Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed
so it doesn't need a dummy stub when IPv6 is not enabled.
Fixes: 7bf4796dd099 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 22 Nov 2021 15:01:51 +0000 (16:01 +0100)]
net, neigh: Fix crash in v6 module initialization error path
When IPv6 module gets initialized, but it's hitting an error in inet6_init()
where it then needs to undo all the prior initialization work, it also might
do a call to ndisc_cleanup() which then calls neigh_table_clear(). In there
is a missing timer cancellation of the table's managed_work item.
The kernel test robot explicitly triggered this error path and caused a UAF
crash similar to the below:
[...]
[ 28.833183][ C0] BUG: unable to handle page fault for address:
f7a43288
[ 28.833973][ C0] #PF: supervisor write access in kernel mode
[ 28.834660][ C0] #PF: error_code(0x0002) - not-present page
[ 28.835319][ C0] *pde =
06b2c067 *pte =
00000000
[ 28.835853][ C0] Oops: 0002 [#1] PREEMPT
[ 28.836367][ C0] CPU: 0 PID: 303 Comm: sed Not tainted
5.16.0-rc1-00233-g83ff5faa0d3b #7
[ 28.837293][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
[ 28.838338][ C0] EIP: __run_timers.constprop.0+0x82/0x440
[...]
[ 28.845607][ C0] Call Trace:
[ 28.845942][ C0] <SOFTIRQ>
[ 28.846333][ C0] ? check_preemption_disabled.isra.0+0x2a/0x80
[ 28.846975][ C0] ? __this_cpu_preempt_check+0x8/0xa
[ 28.847570][ C0] run_timer_softirq+0xd/0x40
[ 28.848050][ C0] __do_softirq+0xf5/0x576
[ 28.848547][ C0] ? __softirqentry_text_start+0x10/0x10
[ 28.849127][ C0] do_softirq_own_stack+0x2b/0x40
[ 28.849749][ C0] </SOFTIRQ>
[ 28.850087][ C0] irq_exit_rcu+0x7d/0xc0
[ 28.850587][ C0] common_interrupt+0x2a/0x40
[ 28.851068][ C0] asm_common_interrupt+0x119/0x120
[...]
Note that IPv6 module cannot be unloaded as per
8ce440610357 ("ipv6: do not
allow ipv6 module to be removed") hence this can only be seen during module
initialization error. Tested with kernel test robot's reproducer.
Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 22 Nov 2021 15:02:49 +0000 (16:02 +0100)]
nixge: fix mac address error handling again
The change to eth_hw_addr_set() caused gcc to correctly spot a
bug that was introduced in an earlier incorrect fix:
In file included from include/linux/etherdevice.h:21,
from drivers/net/ethernet/ni/nixge.c:7:
In function '__dev_addr_set',
inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:319:2,
inlined from 'nixge_probe' at drivers/net/ethernet/ni/nixge.c:1286:3:
include/linux/netdevice.h:4648:9: error: 'memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
4648 | memcpy(dev->dev_addr, addr, len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As nixge_get_nvmem_address() can return either NULL or an error
pointer, the NULL check is wrong, and we can end up reading from
ERR_PTR(-EOPNOTSUPP), which gcc knows to contain zero readable
bytes.
Make the function always return an error pointer again but fix
the check to match that.
Fixes: f3956ebb3bf0 ("ethernet: use eth_hw_addr_set() instead of ether_addr_copy()")
Fixes: abcd3d6fc640 ("net: nixge: Fix error path for obtaining mac address")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wen Gu [Mon, 22 Nov 2021 12:32:53 +0000 (20:32 +0800)]
net/smc: Avoid warning of possible recursive locking
Possible recursive locking is detected by lockdep when SMC
falls back to TCP. The corresponding warnings are as follows:
============================================
WARNING: possible recursive locking detected
5.16.0-rc1+ #18 Tainted: G E
--------------------------------------------
wrk/1391 is trying to acquire lock:
ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
but task is already holding lock:
ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ei->socket.wq.wait);
lock(&ei->socket.wq.wait);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by wrk/1391:
#0:
ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
#1:
ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
stack backtrace:
Call Trace:
<TASK>
dump_stack_lvl+0x56/0x7b
__lock_acquire+0x951/0x11f0
lock_acquire+0x27a/0x320
? smc_switch_to_fallback+0x109/0x250 [smc]
? smc_switch_to_fallback+0xfe/0x250 [smc]
_raw_spin_lock_irq+0x3b/0x80
? smc_switch_to_fallback+0x109/0x250 [smc]
smc_switch_to_fallback+0x109/0x250 [smc]
smc_connect_fallback+0xe/0x30 [smc]
__smc_connect+0xcf/0x1090 [smc]
? mark_held_locks+0x61/0x80
? __local_bh_enable_ip+0x77/0xe0
? lockdep_hardirqs_on+0xbf/0x130
? smc_connect+0x12a/0x150 [smc]
smc_connect+0x12a/0x150 [smc]
__sys_connect+0x8a/0xc0
? syscall_enter_from_user_mode+0x20/0x70
__x64_sys_connect+0x16/0x20
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The nested locking in smc_switch_to_fallback() is considered to
possibly cause a deadlock because smc_wait->lock and clc_wait->lock
are the same type of lock. But actually it is safe so far since
there is no other place trying to obtain smc_wait->lock when
clc_wait->lock is held. So the patch replaces spin_lock() with
spin_lock_nested() to avoid false report by lockdep.
Link: https://lkml.org/lkml/2021/11/19/962
Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback")
Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Mon, 22 Nov 2021 09:32:01 +0000 (04:32 -0500)]
vsock/virtio: suppress used length validation
It turns out that vhost vsock violates the virtio spec
by supplying the out buffer length in the used length
(should just be the in length).
As a result, attempts to validate the used length fail with:
vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
Since vsock driver does not use the length fox tx and
validates the length before use for rx, it is safe to
suppress the validation in virtio core for this driver.
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 939779f5152d ("virtio_ring: validate used buffer length")
Cc: "Jason Wang" <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Iooss [Sun, 21 Nov 2021 20:06:42 +0000 (21:06 +0100)]
net: ax88796c: do not receive data in pointer
Function axspi_read_status calls:
ret = spi_write_then_read(ax_spi->spi, ax_spi->cmd_buf, 1,
(u8 *)&status, 3);
status is a pointer to a struct spi_status, which is 3-byte wide:
struct spi_status {
u16 isr;
u8 status;
};
But &status is the pointer to this pointer, and spi_write_then_read does
not dereference this parameter:
int spi_write_then_read(struct spi_device *spi,
const void *txbuf, unsigned n_tx,
void *rxbuf, unsigned n_rx)
Therefore axspi_read_status currently receive a SPI response in the
pointer status, which overwrites 24 bits of the pointer.
Thankfully, on Little-Endian systems, the pointer is only used in
le16_to_cpus(&status->isr);
... which is a no-operation. So there, the overwritten pointer is not
dereferenced. Nevertheless on Big-Endian systems, this can lead to
dereferencing pointers after their 24 most significant bits were
overwritten. And in all systems this leads to possible use of
uninitialized value in functions calling spi_write_then_read which
expect status to be initialized when the function returns.
Moreover function axspi_read_status (and macro AX_READ_STATUS) do not
seem to be used anywhere. So currently this seems to be dead code. Fix
the issue anyway so that future code works properly when using function
axspi_read_status.
Fixes: a97c69ba4f30 ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Assmann [Sun, 21 Nov 2021 17:57:04 +0000 (19:57 +0200)]
net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
Currently, when user space emits SIOCSHWTSTAMP ioctl calls such as
enabling/disabling timestamping or changing filter settings, the driver
reads the current CLOCK_REALTIME value and programming this into the
NIC's hardware clock. This might be necessary during system
initialization, but at runtime, when the PTP clock has already been
synchronized to a grandmaster, a reset of the timestamp settings might
result in a clock jump. Furthermore, if the clock is also controlled by
phc2sys in automatic mode (where the UTC offset is queried from ptp4l),
that UTC-to-TAI offset (currently 37 seconds in 2021) would be
temporarily reset to 0, and it would take a long time for phc2sys to
readjust so that CLOCK_REALTIME and the PHC are apart by 37 seconds
again.
To address the issue, we introduce a new function called
stmmac_init_tstamp_counter(), which gets called during ndo_open().
It contains the code snippet moved from stmmac_hwtstamp_set() that
manages the time synchronization. Besides, the sub second increment
configuration is also moved here since the related values are hardware
dependent and runtime invariant.
Furthermore, the hardware clock must be kept running even when no time
stamping mode is selected in order to retain the synchronized time base.
That way, timestamping can be enabled again at any time only with the
need to compensate the clock's natural drifting.
As a side effect, this patch fixes the issue that ptp_clock_info::enable
can be called before SIOCSHWTSTAMP and the driver (which looks at
priv->systime_flags) was not prepared to handle that ordering.
Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver")
Reported-by: Michael Olbrich <m.olbrich@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Holger Assmann <h.assmann@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diana Wang [Fri, 19 Nov 2021 13:38:03 +0000 (14:38 +0100)]
nfp: checking parameter process for rx-usecs/tx-usecs is invalid
Use nn->tlv_caps.me_freq_mhz instead of nn->me_freq_mhz to check whether
rx-usecs/tx-usecs is valid.
This is because nn->tlv_caps.me_freq_mhz represents the clock_freq (MHz) of
the flow processing cores (FPC) on the NIC. While nn->me_freq_mhz is not
be set.
Fixes: ce991ab6662a ("nfp: read ME frequency from vNIC ctrl memory")
Signed-off-by: Diana Wang <na.wang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 19 Nov 2021 01:37:58 +0000 (17:37 -0800)]
ipv6: fix typos in __ip6_finish_output()
We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED
Found by code inspection, please double check that fixing this bug
does not surface other bugs.
Fixes: 09ee9dba9611 ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tobias Brunner <tobias@strongswan.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zhijian [Fri, 19 Nov 2021 06:24:57 +0000 (14:24 +0800)]
selftests/tc-testings: Be compatible with newer tc output
old tc(iproute2-5.9.0) output:
action order 1: bpf action.o:[action-ok] id 60 tag
bcf7977d3b93787c jited default-action pipe
newer tc(iproute2-5.14.0) output:
action order 1: bpf action.o:[action-ok] id 64 name tag
bcf7977d3b93787c jited default-action pipe
It can fix below errors:
# ok 260 f84a - Add cBPF action with invalid bytecode
# not ok 261 e939 - Add eBPF action with valid object-file
# Could not match regex pattern. Verify command output:
# total acts 0
#
# action order 1: bpf action.o:[action-ok] id 42 name tag
bcf7977d3b93787c jited default-action pipe
# index 667 ref 1 bind 0
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zhijian [Fri, 19 Nov 2021 06:24:56 +0000 (14:24 +0800)]
selftests/tc-testing: match any qdisc type
We should not always presume all kernels use pfifo_fast as the default qdisc.
For example, a fq_codel qdisk could have below output:
qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Marko [Fri, 19 Nov 2021 02:03:50 +0000 (03:03 +0100)]
net: dsa: qca8k: fix MTU calculation
qca8k has a global MTU, so its tracking the MTU per port to make sure
that the largest MTU gets applied.
Since it uses the frame size instead of MTU the driver MTU change function
will then add the size of Ethernet header and checksum on top of MTU.
The driver currently populates the per port MTU size as Ethernet frame
length + checksum which equals 1518.
The issue is that then MTU change function will go through all of the
ports, find the largest MTU and apply the Ethernet header + checksum on
top of it again, so for a desired MTU of 1500 you will end up with 1536.
This is obviously incorrect, so to correct it populate the per port struct
MTU with just the MTU and not include the Ethernet header + checksum size
as those will be added by the MTU change function.
Fixes: f58d2598cf70 ("net: dsa: qca8k: implement the port MTU callbacks")
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 19 Nov 2021 02:03:49 +0000 (03:03 +0100)]
net: dsa: qca8k: fix internal delay applied to the wrong PAD config
With SGMII phy the internal delay is always applied to the PAD0 config.
This is caused by the falling edge configuration that hardcode the reg
to PAD0 (as the falling edge bits are present only in PAD0 reg)
Move the delay configuration before the reg overwrite to correctly apply
the delay.
Fixes: cef08115846e ("net: dsa: qca8k: set internal delay also for sgmii")
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Whitchurch [Fri, 19 Nov 2021 12:05:21 +0000 (13:05 +0100)]
af_unix: fix regression in read after shutdown
On kernels before v5.15, calling read() on a unix socket after
shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data
previously written or EOF. But now, while read() after
shutdown(SHUT_RD) still behaves the same way, read() after
shutdown(SHUT_RDWR) always fails with -EINVAL.
This behaviour change was apparently inadvertently introduced as part of
a bug fix for a different regression caused by the commit adding sockmap
support to af_unix, commit
94531cfcbe79c359 ("af_unix: Add
unix_stream_proto for sockmap"). Those commits, for unclear reasons,
started setting the socket state to TCP_CLOSE on shutdown(SHUT_RDWR),
while this state change had previously only been done in
unix_release_sock().
Restore the original behaviour. The sockmap tests in
tests/selftests/bpf continue to pass after this patch.
Fixes: d0c6416bd7091647f60 ("unix: Fix an issue in unix_shutdown causing the other end read/write failures")
Link: https://lore.kernel.org/lkml/20211111140000.GA10779@axis.com/
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 20 Nov 2021 14:24:00 +0000 (14:24 +0000)]
Merge branch 'mptcp-rtx-timer'
Paolo Abeni says:
====================
mptcp: fix 3rd ack rtx timer
Eric noted that the MPTCP code do the wrong thing to schedule
the MPJ 3rd ack timer. He also provided a patch to address the
issues (patch 1/2).
To fix for good the MPJ 3rd ack retransmission timer, we additionally
need to set it after the current ack is transmitted (patch 2/2)
Note that the bug went unnotice so far because all the related
tests required some running data transfer, and that causes
MPTCP-level ack even on the opening MPJ subflow. We now have
explicit packet drill coverage for this code path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 19 Nov 2021 14:27:55 +0000 (15:27 +0100)]
mptcp: use delegate action to schedule 3rd ack retrans
Scheduling a delack in mptcp_established_options_mp() is
not a good idea: such function is called by tcp_send_ack() and
the pending delayed ack will be cleared shortly after by the
tcp_event_ack_sent() call in __tcp_transmit_skb().
Instead use the mptcp delegated action infrastructure to
schedule the delayed ack after the current bh processing completes.
Additionally moves the schedule_3rdack_retransmission() helper
into protocol.c to avoid making it visible in a different compilation
unit.
Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 19 Nov 2021 14:27:54 +0000 (15:27 +0100)]
mptcp: fix delack timer
To compute the rtx timeout schedule_3rdack_retransmission() does multiple
things in the wrong way: srtt_us is measured in usec/8 and the timeout
itself is an absolute value.
Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 20 Nov 2021 12:21:07 +0000 (12:21 +0000)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-
queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-19
This series contains updates to iavf driver only.
Nitesh prevents user from changing interrupt settings when adaptive
interrupt moderation is on.
Jedrzej resolves a hang that occurred when interface was closed while a
reset was occurring and fixes statistics to be updated when requested to
prevent stale values.
Brett adjusts driver to accommodate changes in supported VLAN features
that could occur during reset and cause various errors to be reported.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Fri, 5 Nov 2021 16:20:25 +0000 (09:20 -0700)]
iavf: Fix VLAN feature flags after VFR
When a VF goes through a reset, it's possible for the VF's feature set
to change. For example it may lose the VIRTCHNL_VF_OFFLOAD_VLAN
capability after VF reset. Unfortunately, the driver doesn't correctly
deal with this situation and errors are seen from downing/upping the
interface and/or moving the interface in/out of a network namespace.
When setting the interface down/up we see the following errors after the
VIRTCHNL_VF_OFFLOAD_VLAN capability was taken away from the VF:
ice 0000:51:00.1: VF 1 failed opcode 12, retval: -64 iavf 0000:51:09.1:
Failed to add VLAN filter, error IAVF_NOT_SUPPORTED ice 0000:51:00.1: VF
1 failed opcode 13, retval: -64 iavf 0000:51:09.1: Failed to delete VLAN
filter, error IAVF_NOT_SUPPORTED
These add/delete errors are happening because the VLAN filters are
tracked internally to the driver and regardless of the VLAN_ALLOWED()
setting the driver tries to delete/re-add them over virtchnl.
Fix the delete failure by making sure to delete any VLAN filter tracking
in the driver when a removal request is made, while preventing the
virtchnl request. This makes it so the driver's VLAN list is up to date
and the errors are
Fix the add failure by making sure the check for VLAN_ALLOWED() during
reset is done after the VF receives its capability list from the PF via
VIRTCHNL_OP_GET_VF_RESOURCES. If VLAN functionality is not allowed, then
prevent requesting re-adding the filters over virtchnl.
When moving the interface into a network namespace we see the following
errors after the VIRTCHNL_VF_OFFLOAD_VLAN capability was taken away from
the VF:
iavf 0000:51:09.1 enp81s0f1v1: NIC Link is Up Speed is 25 Gbps Full Duplex
iavf 0000:51:09.1 temp_27: renamed from enp81s0f1v1
iavf 0000:51:09.1 mgmt: renamed from temp_27
iavf 0000:51:09.1 dev27: set_features() failed (-22); wanted 0x020190001fd54833, left 0x020190001fd54bb3
These errors are happening because we aren't correctly updating the
netdev capabilities and dealing with ndo_fix_features() and
ndo_set_features() correctly.
Fix this by only reporting errors in the driver's ndo_set_features()
callback when VIRTCHNL_VF_OFFLOAD_VLAN is not allowed and any attempt to
enable the VLAN features is made. Also, make sure to disable VLAN
insertion, filtering, and stripping since the VIRTCHNL_VF_OFFLOAD_VLAN
flag applies to all of them and not just VLAN stripping.
Also, after we process the capabilities in the VF reset path, make sure
to call netdev_update_features() in case the capabilities have changed
in order to update the netdev's feature set to match the VF's actual
capabilities.
Lastly, make sure to always report success on VLAN filter delete when
VIRTCHNL_VF_OFFLOAD_VLAN is not supported. The changed flow in
iavf_del_vlans() allows the stack to delete previosly existing VLAN
filters even if VLAN filtering is not allowed. This makes it so the VLAN
filter list is up to date.
Fixes: 8774370d268f ("i40e/i40evf: support for VF VLAN tag stripping control")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jedrzej Jagielski [Wed, 15 Sep 2021 09:01:00 +0000 (09:01 +0000)]
iavf: Fix refreshing iavf adapter stats on ethtool request
Currently iavf adapter statistics are refreshed only in a
watchdog task, triggered approximately every two seconds,
which causes some ethtool requests to return outdated values.
Add explicit statistics refresh when requested by ethtool -S.
Fixes: b476b0030e61 ("iavf: Move commands processing to the separate function")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jedrzej Jagielski [Tue, 7 Sep 2021 09:25:40 +0000 (09:25 +0000)]
iavf: Fix deadlock occurrence during resetting VF interface
System hangs if close the interface is called from the kernel during
the interface is in resetting state.
During resetting operation the link is closing but kernel didn't
know it and it tried to close this interface again what sometimes
led to deadlock.
Inform kernel about current state of interface
and turn off the flag IFF_UP when interface is closing until reset
is finished.
Previously it was most likely to hang the system when kernel
(network manager) tried to close the interface in the same time
when interface was in resetting state because of deadlock.
Fixes: 3c8e0b989aa1 ("i40vf: don't stop me now")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Nitesh B Venkatesh [Fri, 4 Jun 2021 16:53:31 +0000 (09:53 -0700)]
iavf: Prevent changing static ITR values if adaptive moderation is on
Resolve being able to change static values on VF when adaptive interrupt
moderation is enabled.
This problem is fixed by checking the interrupt settings is not
a combination of change of static value while adaptive interrupt
moderation is turned on.
Without this fix, the user would be able to change static values
on VF with adaptive moderation enabled.
Fixes: 65e87c0398f5 ("i40evf: support queue-specific settings for interrupt moderation")
Signed-off-by: Nitesh B Venkatesh <nitesh.b.venkatesh@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Zekun Shen [Thu, 18 Nov 2021 21:42:47 +0000 (16:42 -0500)]
stmmac_pci: Fix underflow size in stmmac_rx
This bug report came up when we were testing the device driver
by fuzzing. It shows that buf1_len can get underflowed and be
0xfffffffc (
4294967292).
This bug is triggerable with a compromised/malfunctioning device.
We found the bug through QEMU emulation tested the patch with
emulation. We did NOT test it on real hardware.
Attached is the bug report by fuzzing.
BUG: KASAN: use-after-free in stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
Read of size
4294967292 at addr
ffff888016358000 by task ksoftirqd/0/9
CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G W 5.6.0 #1
Call Trace:
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x16/0x200
? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
__kasan_report.cold+0x37/0x7c
? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
kasan_report+0xe/0x20
check_memory_region+0x15a/0x1d0
memcpy+0x20/0x50
stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
? stmmac_suspend+0x850/0x850 [stmmac]
? __next_timer_interrupt+0xba/0xf0
net_rx_action+0x363/0xbd0
? call_timer_fn+0x240/0x240
? __switch_to_asm+0x40/0x70
? napi_busy_loop+0x520/0x520
? __schedule+0x839/0x15a0
__do_softirq+0x18c/0x634
? takeover_tasklets+0x5f0/0x5f0
run_ksoftirqd+0x15/0x20
smpboot_thread_fn+0x2f1/0x6b0
? smpboot_unregister_percpu_thread+0x160/0x160
? __kthread_parkme+0x80/0x100
? smpboot_unregister_percpu_thread+0x160/0x160
kthread+0x2b5/0x3b0
? kthread_create_on_node+0xd0/0xd0
ret_from_fork+0x22/0x40
Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zekun Shen [Thu, 18 Nov 2021 21:08:02 +0000 (16:08 -0500)]
atlantic: fix double-free in aq_ring_tx_clean
We found this bug while fuzzing the device driver. Using and freeing
the dangling pointer buff->skb would cause use-after-free and
double-free.
This bug is triggerable with compromised/malfunctioning devices. We
found the bug with QEMU emulation and tested the patch by emulation.
We did NOT test on a real device.
Attached is the bug report.
BUG: KASAN: double-free or invalid-free in consume_skb+0x6c/0x1c0
Call Trace:
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x16/0x200
? consume_skb+0x6c/0x1c0
kasan_report_invalid_free+0x61/0xa0
? consume_skb+0x6c/0x1c0
__kasan_slab_free+0x15e/0x170
? consume_skb+0x6c/0x1c0
kfree+0x8c/0x230
consume_skb+0x6c/0x1c0
aq_ring_tx_clean+0x5c2/0xa80 [atlantic]
aq_vec_poll+0x309/0x5d0 [atlantic]
? _sub_I_65535_1+0x20/0x20 [atlantic]
? __next_timer_interrupt+0xba/0xf0
net_rx_action+0x363/0xbd0
? call_timer_fn+0x240/0x240
? __switch_to_asm+0x34/0x70
? napi_busy_loop+0x520/0x520
? net_tx_action+0x379/0x720
__do_softirq+0x18c/0x634
? takeover_tasklets+0x5f0/0x5f0
run_ksoftirqd+0x15/0x20
smpboot_thread_fn+0x2f1/0x6b0
? smpboot_unregister_percpu_thread+0x160/0x160
? __kthread_parkme+0x80/0x100
? smpboot_unregister_percpu_thread+0x160/0x160
kthread+0x2b5/0x3b0
? kthread_create_on_node+0xd0/0xd0
ret_from_fork+0x22/0x40
Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Volodymyr Mytnyk [Thu, 18 Nov 2021 19:51:40 +0000 (21:51 +0200)]
net: marvell: prestera: fix double free issue on err path
fix error path handling in prestera_bridge_port_join() that
cases prestera driver to crash (see below).
Trace:
Internal error: Oops:
96000044 [#1] SMP
Modules linked in: prestera_pci prestera uio_pdrv_genirq
CPU: 1 PID: 881 Comm: ip Not tainted 5.15.0 #1
pstate:
60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : prestera_bridge_destroy+0x2c/0xb0 [prestera]
lr : prestera_bridge_port_join+0x2cc/0x350 [prestera]
sp :
ffff800011a1b0f0
...
x2 :
ffff000109ca6c80 x1 :
dead000000000100 x0 :
dead000000000122
Call trace:
prestera_bridge_destroy+0x2c/0xb0 [prestera]
prestera_bridge_port_join+0x2cc/0x350 [prestera]
prestera_netdev_port_event.constprop.0+0x3c4/0x450 [prestera]
prestera_netdev_event_handler+0xf4/0x110 [prestera]
raw_notifier_call_chain+0x54/0x80
call_netdevice_notifiers_info+0x54/0xa0
__netdev_upper_dev_link+0x19c/0x380
Fixes: e1189d9a5fbe ("net: marvell: prestera: Add Switchdev driver implementation")
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Volodymyr Mytnyk [Thu, 18 Nov 2021 19:48:03 +0000 (21:48 +0200)]
net: marvell: prestera: fix brige port operation
Return NOTIFY_DONE (dont't care) for switchdev notifications
that prestera driver don't know how to handle them.
With introduction of SWITCHDEV_BRPORT_[UN]OFFLOADED switchdev
events, the driver rejects adding swport to bridge operation
which is handled by prestera_bridge_port_join() func. The root
cause of this is that prestera driver returns error (EOPNOTSUPP)
in prestera_switchdev_blk_event() handler for unknown swdev
events. This causes switchdev_bridge_port_offload() to fail
when adding port to bridge in prestera_bridge_port_join().
Fixes: 957e2235e526 ("net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge")
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Nov 2021 11:00:01 +0000 (11:00 +0000)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Add selftest for vrf+conntrack, from Florian Westphal.
2) Extend nfqueue selftest to cover nfqueue, also from Florian.
3) Remove duplicated include in nft_payload, from Wan Jiabing.
4) Several improvements to the nat port shadowing selftest,
from Phil Sutter.
5) Fix filtering of reply tuple in ctnetlink, from Florent Fourcot.
6) Do not override error with -EINVAL in filter setup path, also
from Florent.
7) Honor sysctl_expire_nodest_conn regardless conn_reuse_mode for
reused connections, from yangxingwu.
8) Replace snprintf() by sysfs_emit() in xt_IDLETIMER as reported
by Coccinelle, from Jing Yao.
9) Incorrect IPv6 tunnel match in flowtable offload, from Will
Mortensen.
10) Switch port shadow selftest to use socat, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 18 Nov 2021 20:54:24 +0000 (12:54 -0800)]
Merge tag 'net-5.16-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, mac80211.
Current release - regressions:
- devlink: don't throw an error if flash notification sent before
devlink visible
- page_pool: Revert "page_pool: disable dma mapping support...",
turns out there are active arches who need it
Current release - new code bugs:
- amt: cancel delayed_work synchronously in amt_fini()
Previous releases - regressions:
- xsk: fix crash on double free in buffer pool
- bpf: fix inner map state pruning regression causing program
rejections
- mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
preventing mis-selecting the best effort queue
- mac80211: do not access the IV when it was stripped
- mac80211: fix radiotap header generation, off-by-one
- nl80211: fix getting radio statistics in survey dump
- e100: fix device suspend/resume
Previous releases - always broken:
- tcp: fix uninitialized access in skb frags array for Rx 0cp
- bpf: fix toctou on read-only map's constant scalar tracking
- bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
progs
- tipc: only accept encrypted MSG_CRYPTO msgs
- smc: transfer remaining wait queue entries during fallback, fix
missing wake ups
- udp: validate checksum in udp_read_sock() (when sockmap is used)
- sched: act_mirred: drop dst for the direction from egress to
ingress
- virtio_net_hdr_to_skb: count transport header in UFO, prevent
allowing bad skbs into the stack
- nfc: reorder the logic in nfc_{un,}register_device, fix unregister
- ipsec: check return value of ipv6_skip_exthdr
- usb: r8152: add MAC passthrough support for more Lenovo Docks"
* tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
ipv6: check return value of ipv6_skip_exthdr
e100: fix device suspend/resume
devlink: Don't throw an error if flash notification sent before devlink visible
page_pool: Revert "page_pool: disable dma mapping support..."
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
octeontx2-af: debugfs: don't corrupt user memory
NFC: add NCI_UNREG flag to eliminate the race
NFC: reorder the logic in nfc_{un,}register_device
NFC: reorganize the functions in nci_request
tipc: check for null after calling kmemdup
i40e: Fix display error code in dmesg
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix ping is lost after configuring ADq on VF
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix correct max_pkt_size on VF RX queue
...
Linus Torvalds [Thu, 18 Nov 2021 20:41:14 +0000 (12:41 -0800)]
Merge tag 'for-5.16-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several xes and one old ioctl deprecation. Namely there's fix for
crashes/warnings with lzo compression that was suspected to be caused
by first pull merge resolution, but it was a different bug.
Summary:
- regression fix for a crash in lzo due to missing boundary checks of
the page array
- fix crashes on ARM64 due to missing barriers when synchronizing
status bits between work queues
- silence lockdep when reading chunk tree during mount
- fix false positive warning in integrity checker on devices with
disabled write caching
- fix signedness of bitfields in scrub
- start deprecation of balance v1 ioctl"
* tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: deprecate BTRFS_IOC_BALANCE ioctl
btrfs: make 1-bit bit-fields of scrub_page unsigned int
btrfs: check-integrity: fix a warning on write caching disabled disk
btrfs: silence lockdep when reading chunk tree during mount
btrfs: fix memory ordering between normal and ordered work functions
btrfs: fix a out-of-bound access in copy_compressed_data_to_page()
Linus Torvalds [Thu, 18 Nov 2021 20:31:29 +0000 (12:31 -0800)]
Merge tag 'fs_for_v5.16-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"A fix for a long-standing UDF bug where we were not properly
validating directory position inside readdir"
* tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix crash after seekdir
Linus Torvalds [Thu, 18 Nov 2021 20:17:33 +0000 (12:17 -0800)]
Merge tag 'fs.idmapped.v5.16-rc2' of git://git./linux/kernel/git/brauner/linux
Pull setattr idmapping fix from Christian Brauner:
"This contains a simple fix for setattr. When determining the validity
of the attributes the ia_{g,u}id fields contain the value that will be
written to inode->i_{g,u}id. When the {g,u}id attribute of the file
isn't altered and the caller's fs{g,u}id matches the current {g,u}id
attribute the attribute change is allowed.
The value in ia_{g,u}id does already account for idmapped mounts and
will have taken the relevant idmapping into account. So in order to
verify that the {g,u}id attribute isn't changed we simple need to
compare the ia_{g,u}id value against the inode's i_{g,u}id value.
This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has
a meaning when circular idmappings are used, i.e. mappings where e.g.
id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
ciruclar mappings can e.g. be useful when sharing the same home
directory between multiple users at the same time.
Before this patch we could end up denying legitimate attribute changes
and allowing invalid attribute changes when circular mappings are
used. To even get into this situation the caller must've been
privileged both to create that mapping and to create that idmapped
mount.
This hasn't been seen in the wild anywhere but came up when expanding
the fstest suite during work on a series of hardening patches. All
idmapped fstests pass without any regressions and we're adding new
tests to verify the behavior of circular mappings.
The new tests can be found at [1]"
Link: https://lore.kernel.org/linux-fsdevel/20211109145713.1868404-2-brauner@kernel.org
* tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: handle circular mappings correctly
Linus Torvalds [Thu, 18 Nov 2021 20:13:24 +0000 (12:13 -0800)]
Merge tag 'for-5.16/parisc-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"parisc bug and warning fixes and wire up futex_waitv.
Fix some warnings which showed up with allmodconfig builds, a revert
of a change to the sigreturn trampoline which broke signal handling,
wire up futex_waitv and add CONFIG_PRINTK_TIME=y to 32bit defconfig"
* tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
parisc: Wrap assembler related defines inside __ASSEMBLY__
parisc: Wire up futex_waitv
parisc: Include stringify.h to avoid build error in crypto/api.c
parisc/sticon: fix reverse colors
Linus Torvalds [Thu, 18 Nov 2021 20:05:22 +0000 (12:05 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Selftest changes:
- Cleanups for the perf test infrastructure and mapping hugepages
- Avoid contention on mmap_sem when the guests start to run
- Add event channel upcall support to xen_shinfo_test
x86 changes:
- Fixes for Xen emulation
- Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
- Fixes for migration of 32-bit nested guests on 64-bit hypervisor
- Compilation fixes
- More SEV cleanups
Generic:
- Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
and num_online_cpus(). Most architectures were only using one of
the two"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
KVM: x86: Assume a 64-bit hypercall for guests with protected state
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
riscv: kvm: fix non-kernel-doc comment block
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
KVM: SEV: Drop a redundant setting of sev->asid during initialization
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
KVM: x86/xen: Use sizeof_field() instead of open-coding it
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
...
Linus Torvalds [Thu, 18 Nov 2021 19:01:06 +0000 (11:01 -0800)]
Merge tag 'docs-5.16-2' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of documentation fixes for 5.16"
* tag 'docs-5.16-2' of git://git.lwn.net/linux:
Documentation/process: fix a cross reference
Documentation: update vcpu-requests.rst reference
docs: accounting: update delay-accounting.rst reference
libbpf: update index.rst reference
docs: filesystems: Fix grammatical error "with" to "which"
doc/zh_CN: fix a translation error in management-style
docs: ftrace: fix the wrong path of tracefs
Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
Documentation: arm: marvell: Add some links to homepage / product infos
docs: Update Sphinx requirements