Michael Walle [Tue, 20 Sep 2022 14:16:19 +0000 (16:16 +0200)]
net: phy: micrel: fix shared interrupt on LAN8814
Since commit
ece19502834d ("net: phy: micrel: 1588 support for LAN8814
phy") the handler always returns IRQ_HANDLED, except in an error case.
Before that commit, the interrupt status register was checked and if
it was empty, IRQ_NONE was returned. Restore that behavior to play nice
with the interrupt line being shared with others.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Divya Koppera <Divya.Koppera@microchip.com>
Link: https://lore.kernel.org/r/20220920141619.808117-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wen Gu [Tue, 20 Sep 2022 06:43:09 +0000 (14:43 +0800)]
net/smc: Stop the CLC flow if no link to map buffers on
There might be a potential race between SMC-R buffer map and
link group termination.
smc_smcr_terminate_all() | smc_connect_rdma()
--------------------------------------------------------------
| smc_conn_create()
for links in smcibdev |
schedule links down |
| smc_buf_create()
| \- smcr_buf_map_usable_links()
| \- no usable links found,
| (rmb->mr = NULL)
|
| smc_clc_send_confirm()
| \- access conn->rmb_desc->mr[]->rkey
| (panic)
During reboot and IB device module remove, all links will be set
down and no usable links remain in link groups. In such situation
smcr_buf_map_usable_links() should return an error and stop the
CLC flow accessing to uninitialized mr.
Fixes: b9247544c1bc ("net/smc: convert static link ID instances to support multiple links")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 22 Sep 2022 01:39:23 +0000 (18:39 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-20 (ice)
Michal re-sets TC configuration when changing number of queues.
Mateusz moves the check and call for link-down-on-close to the specific
path for downing/closing the interface.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix interface being down after reset with link-down-on-close flag on
ice: config netdev tc before setting queues number
====================
Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Larysa Zaremba [Mon, 19 Sep 2022 13:43:46 +0000 (15:43 +0200)]
ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
The original patch added the static branch to handle the situation,
when assigning an XDP TX queue to every CPU is not possible,
so they have to be shared.
However, in the XDP transmit handler ice_xdp_xmit(), an error was
returned in such cases even before static condition was checked,
thus making queue sharing still impossible.
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 22 Sep 2022 00:28:35 +0000 (17:28 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-19 (iavf, i40e)
Norbert adds checking of buffer size for Rx buffer checks in iavf.
Michal corrects setting of max MTU in iavf to account for MTU data provided
by PF, fixes i40e to set VF max MTU, and resolves lack of rate limiting
when value was less than divisor for i40e.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix set max_tx_rate when it is lower than 1 Mbps
i40e: Fix VF set max MTU size
iavf: Fix set max MTU size with port VLAN and jumbo frames
iavf: Fix bad page state
====================
Link: https://lore.kernel.org/r/20220919223428.572091-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 21 Sep 2022 13:52:32 +0000 (06:52 -0700)]
Merge tag 'linux-can-fixes-for-6.0-
20220921' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-09-21
The 1st patch is by me, targets the flexcan driver and fixes a
potential system hang on single core systems under high CAN packet
rate.
The next 2 patches are also by me and target the gs_usb driver. A
potential race condition during the ndo_open callback as well as the
return value if the ethtool identify feature is not supported are
fixed.
* tag 'linux-can-fixes-for-6.0-
20220921' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
can: gs_usb: gs_can_open(): fix race dev->can.state condition
can: flexcan: flexcan_mailbox_read() fix return value for drop = true
====================
Link: https://lore.kernel.org/r/20220921083609.419768-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianglei Nie [Wed, 14 Sep 2022 01:42:38 +0000 (09:42 +0800)]
net: atlantic: fix potential memory leak in aq_ndev_close()
If aq_nic_stop() fails, aq_ndev_close() returns err without calling
aq_nic_deinit() to release the relevant memory and resource, which
will lead to a memory leak.
We can fix it by deleting the if condition judgment and goto statement to
call aq_nic_deinit() directly after aq_nic_stop() to fix the memory leak.
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Sep 2022 08:07:53 +0000 (09:07 +0100)]
Merge git://git./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: bugfixes for net
The following set contains netfilter fixes for the *net* tree.
Regressions (rc only):
recent ebtables crash fix was incomplete, it added a memory leak.
The patch to fix possible buffer overrun for BIG TCP in ftp conntrack
tried to be too clever, we cannot re-use ct->lock: NAT engine might
grab it again -> deadlock. Revert back to a global spinlock.
Both from myself.
Remove the documentation for the recently removed
'nf_conntrack_helper' sysctl as well, from Pablo Neira.
The static_branch_inc() that guards the 'chain stats enabled' path
needs to be deferred further, until the entire transaction was created.
From Tetsuo Handa.
Older bugs:
Since 5.3:
nf_tables_addchain may leak pcpu memory in error path when
offloading fails. Also from Tetsuo Handa.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde [Thu, 18 Aug 2022 12:10:08 +0000 (14:10 +0200)]
can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
Until commit
409c188c57cd ("can: tree-wide: advertise software
timestamping capabilities") the ethtool_ops was only assigned for
devices which support the GS_CAN_FEATURE_IDENTIFY feature. That commit
assigns ethtool_ops unconditionally.
This results on controllers without GS_CAN_FEATURE_IDENTIFY support
for the following ethtool error:
| $ ethtool -p can0 1
| Cannot identify NIC: Broken pipe
Restore the correct error value by checking for
GS_CAN_FEATURE_IDENTIFY in the gs_usb_set_phys_id() function.
| $ ethtool -p can0 1
| Cannot identify NIC: Operation not supported
While there use the variable "netdev" for the "struct net_device"
pointer and "dev" for the "struct gs_can" pointer as in the rest of
the driver.
Fixes: 409c188c57cd ("can: tree-wide: advertise software timestamping capabilities")
Link: http://lore.kernel.org/all/20220818143853.2671854-1-mkl@pengutronix.de
Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Tue, 20 Sep 2022 09:40:56 +0000 (11:40 +0200)]
can: gs_usb: gs_can_open(): fix race dev->can.state condition
The dev->can.state is set to CAN_STATE_ERROR_ACTIVE, after the device
has been started. On busy networks the CAN controller might receive
CAN frame between and go into an error state before the dev->can.state
is assigned.
Assign dev->can.state before starting the controller to close the race
window.
Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/all/20220920195216.232481-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Thu, 11 Aug 2022 08:25:44 +0000 (10:25 +0200)]
can: flexcan: flexcan_mailbox_read() fix return value for drop = true
The following happened on an i.MX25 using flexcan with many packets on
the bus:
The rx-offload queue reached a length more than skb_queue_len_max. In
can_rx_offload_offload_one() the drop variable was set to true which
made the call to .mailbox_read() (here: flexcan_mailbox_read()) to
_always_ return ERR_PTR(-ENOBUFS) and drop the rx'ed CAN frame. So
can_rx_offload_offload_one() returned ERR_PTR(-ENOBUFS), too.
can_rx_offload_irq_offload_fifo() looks as follows:
| while (1) {
| skb = can_rx_offload_offload_one(offload, 0);
| if (IS_ERR(skb))
| continue;
| if (!skb)
| break;
| ...
| }
The flexcan driver wrongly always returns ERR_PTR(-ENOBUFS) if drop is
requested, even if there is no CAN frame pending. As the i.MX25 is a
single core CPU, while the rx-offload processing is active, there is
no thread to process packets from the offload queue. So the queue
doesn't get any shorter and this results is a tight loop.
Instead of always returning ERR_PTR(-ENOBUFS) if drop is requested,
return NULL if no CAN frame is pending.
Changes since v1: https://lore.kernel.org/all/
20220810144536.389237-1-u.kleine-koenig@pengutronix.de
- don't break in can_rx_offload_irq_offload_fifo() in case of an error,
return NULL in flexcan_mailbox_read() in case of no pending CAN frame
instead
Fixes: 4e9c9484b085 ("can: rx-offload: Prepare for CAN FD support")
Link: https://lore.kernel.org/all/20220811094254.1864367-1-mkl@pengutronix.de
Cc: stable@vger.kernel.org # v5.5
Suggested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Thorsten Scherer <t.scherer@eckelmann.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Geert Uytterhoeven [Mon, 19 Sep 2022 14:48:28 +0000 (16:48 +0200)]
net: sh_eth: Fix PHY state warning splat during system resume
Since commit
744d23c71af39c7d ("net: phy: Warn about incorrect
mdio_bus_phy_resume() state"), a warning splat is printed during system
resume with Wake-on-LAN disabled:
WARNING: CPU: 0 PID: 626 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xe4
As the Renesas SuperH Ethernet driver already calls phy_{stop,start}()
in its suspend/resume callbacks, it is sufficient to just mark the MAC
responsible for managing the power state of the PHY.
Fixes: fba863b816049b03 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/c6e1331b9bef61225fa4c09db3ba3e2e7214ba2d.1663598886.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven [Mon, 19 Sep 2022 14:48:00 +0000 (16:48 +0200)]
net: ravb: Fix PHY state warning splat during system resume
Since commit
744d23c71af39c7d ("net: phy: Warn about incorrect
mdio_bus_phy_resume() state"), a warning splat is printed during system
resume with Wake-on-LAN disabled:
WARNING: CPU: 0 PID: 1197 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xc8
As the Renesas Ethernet AVB driver already calls phy_{stop,start}() in
its suspend/resume callbacks, it is sufficient to just mark the MAC
responsible for managing the power state of the PHY.
Fixes: fba863b816049b03 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/8ec796f47620980fdd0403e21bd8b7200b4fa1d4.1663598796.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Tue, 20 Sep 2022 16:31:30 +0000 (18:31 +0200)]
netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
We can't use ct->lock, this is already used by the seqadj internals.
When using ftp helper + nat, seqadj will attempt to acquire ct->lock
again.
Revert back to a global lock for now.
Fixes: c783a29c7e59 ("netfilter: nf_ct_ftp: prefer skb_linearize")
Reported-by: Bruno de Paula Larini <bruno.larini@riosoft.com.br>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Tue, 20 Sep 2022 12:20:17 +0000 (14:20 +0200)]
netfilter: ebtables: fix memory leak when blob is malformed
The bug fix was incomplete, it "replaced" crash with a memory leak.
The old code had an assignment to "ret" embedded into the conditional,
restore this.
Fixes: 7997eff82828 ("netfilter: ebtables: reject blobs that don't provide all entry points")
Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Tetsuo Handa [Mon, 12 Sep 2022 13:58:51 +0000 (22:58 +0900)]
netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
It seems to me that percpu memory for chain stats started leaking since
commit
3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to
hardware priority") when nft_chain_offload_priority() returned an error.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to hardware priority")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tetsuo Handa [Mon, 12 Sep 2022 12:41:00 +0000 (21:41 +0900)]
netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()
syzbot is reporting underflow of nft_counters_enabled counter at
nf_tables_addchain() [1], for commit
43eb8949cfdffa76 ("netfilter:
nf_tables: do not leave chain stats enabled on error") missed that
nf_tables_chain_destroy() after nft_basechain_init() in the error path of
nf_tables_addchain() decrements the counter because nft_basechain_init()
makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.
Increment the counter immediately after returning from
nft_basechain_init().
Link: https://syzkaller.appspot.com/bug?extid=
b5d82a651b71cd8a75ab [1]
Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Fixes: 43eb8949cfdffa76 ("netfilter: nf_tables: do not leave chain stats enabled on error")
Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Fri, 9 Sep 2022 10:42:11 +0000 (12:42 +0200)]
netfilter: conntrack: remove nf_conntrack_helper documentation
This toggle has been already remove by
b118509076b3 ("netfilter: remove
nf_conntrack_helper sysctl and modparam toggles").
Remove the documentation entry for this toggle too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Bhupesh Sharma [Thu, 15 Sep 2022 11:28:04 +0000 (16:58 +0530)]
MAINTAINERS: Add myself as a reviewer for Qualcomm ETHQOS Ethernet driver
As suggested by Vinod, adding myself as the reviewer
for the Qualcomm ETHQOS Ethernet driver.
Recently I have enabled this driver on a few Qualcomm
SoCs / boards and hence trying to keep a close eye on
it.
Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220915112804.3950680-1-bhupesh.sharma@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mateusz Palczewski [Fri, 26 Aug 2022 08:31:23 +0000 (10:31 +0200)]
ice: Fix interface being down after reset with link-down-on-close flag on
When performing a reset on ice driver with link-down-on-close flag on
interface would always stay down. Fix this by moving a check of this
flag to ice_stop() that is called only when user wants to bring
interface down.
Fixes: ab4ab73fc1ec ("ice: Add ethtool private flag to make forcing link down optional")
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Petr Oros <poros@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Mon, 8 Aug 2022 09:58:54 +0000 (11:58 +0200)]
ice: config netdev tc before setting queues number
After lowering number of tx queues the warning appears:
"Number of in use tx queues changed invalidating tc mappings. Priority
traffic classification disabled!"
Example command to reproduce:
ethtool -L enp24s0f0 tx 36 rx 36
Fix this by setting correct tc mapping before setting real number of
queues on netdev.
Fixes: 0754d65bd4be5 ("ice: Add infrastructure for mqprio support via ndo_setup_tc")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jakub Kicinski [Tue, 20 Sep 2022 18:41:17 +0000 (11:41 -0700)]
Merge branch 'fixes-for-tc-taprio-software-mode'
Vladimir Oltean says:
====================
Fixes for tc-taprio software mode
While working on some new features for tc-taprio, I found some strange
behavior which looked like bugs. I was able to eventually trigger a NULL
pointer dereference. This patch set fixes 2 issues I saw. Detailed
explanation in patches.
====================
Link: https://lore.kernel.org/r/20220915100802.2308279-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 15 Sep 2022 10:08:02 +0000 (13:08 +0300)]
net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
taprio can only operate as root qdisc, and to that end, there exists the
following check in taprio_init(), just as in mqprio:
if (sch->parent != TC_H_ROOT)
return -EOPNOTSUPP;
And indeed, when we try to attach taprio to an mqprio child, it fails as
expected:
$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
Error: sch_taprio: Can only be attached as root qdisc.
(extack message added by me)
But when we try to attach a taprio child to a taprio root qdisc,
surprisingly it doesn't fail:
$ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.
In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.
Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.
When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.
The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.
In fact, since commit
13511704f8d7 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.
To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).
Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 15 Sep 2022 10:08:01 +0000 (13:08 +0300)]
net/sched: taprio: avoid disabling offload when it was never enabled
In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).
The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).
But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.
As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.
The error that we force here is to attach taprio as a non-root qdisc,
but instead as child of an mqprio root qdisc:
$ tc qdisc add dev swp0 root handle 1: \
mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address
fffffffffffffff8
[
fffffffffffffff8] pgd=
0000000000000000, p4d=
0000000000000000
Internal error: Oops:
96000004 [#1] PREEMPT SMP
Call trace:
taprio_dump+0x27c/0x310
vsc9959_port_setup_tc+0x1f4/0x460
felix_port_setup_tc+0x24/0x3c
dsa_slave_setup_tc+0x54/0x27c
taprio_disable_offload.isra.0+0x58/0xe0
taprio_destroy+0x80/0x104
qdisc_create+0x240/0x470
tc_modify_qdisc+0x1fc/0x6b0
rtnetlink_rcv_msg+0x12c/0x390
netlink_rcv_skb+0x5c/0x130
rtnetlink_rcv+0x1c/0x2c
Fix this by keeping track of the operations we made, and undo the
offload only if we actually did it.
I've added "bool offloaded" inside a 4 byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
below:
$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
struct Qdisc * * qdiscs; /* 0 8 */
struct Qdisc * root; /* 8 8 */
u32 flags; /* 16 4 */
enum tk_offsets tk_offset; /* 20 4 */
int clockid; /* 24 4 */
bool offloaded; /* 28 1 */
/* XXX 3 bytes hole, try to pack */
atomic64_t picos_per_byte; /* 32 0 */
/* XXX 8 bytes hole, try to pack */
spinlock_t current_entry_lock; /* 40 0 */
/* XXX 8 bytes hole, try to pack */
struct sched_entry * current_entry; /* 48 8 */
struct sched_gate_list * oper_sched; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 16 Sep 2022 08:48:21 +0000 (11:48 +0300)]
ipv6: Fix crash when IPv6 is administratively disabled
The global 'raw_v6_hashinfo' variable can be accessed even when IPv6 is
administratively disabled via the 'ipv6.disable=1' kernel command line
option, leading to a crash [1].
Fix by restoring the original behavior and always initializing the
variable, regardless of IPv6 support being administratively disabled or
not.
[1]
BUG: unable to handle page fault for address:
ffffffffffffffc8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD
173e18067 P4D
173e18067 PUD
173e1a067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 271 Comm: ss Not tainted
6.0.0-rc4-custom-00136-g0727a9a5fbc1 #1396
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:raw_diag_dump+0x310/0x7f0
[...]
Call Trace:
<TASK>
__inet_diag_dump+0x10f/0x2e0
netlink_dump+0x575/0xfd0
__netlink_dump_start+0x67b/0x940
inet_diag_handler_cmd+0x273/0x2d0
sock_diag_rcv_msg+0x317/0x440
netlink_rcv_skb+0x15e/0x430
sock_diag_rcv+0x2b/0x40
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe60
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x13a/0x1e0
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU")
Reported-by: Roberto Ricci <rroberto2r@gmail.com>
Tested-by: Roberto Ricci <rroberto2r@gmail.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220916084821.229287-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 16 Sep 2022 13:32:09 +0000 (16:32 +0300)]
net: enetc: deny offload of tc-based TSN features on VF interfaces
TSN features on the ENETC (taprio, cbs, gate, police) are configured
through a mix of command BD ring messages and port registers:
enetc_port_rd(), enetc_port_wr().
Port registers are a region of the ENETC memory map which are only
accessible from the PCIe Physical Function. They are not accessible from
the Virtual Functions.
Moreover, attempting to access these registers crashes the kernel:
$ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001
fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15
fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002)
fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0
$ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2
Unable to handle kernel paging request at virtual address
ffff800009551a08
Internal error: Oops:
96000007 [#1] PREEMPT SMP
pc : enetc_setup_tc_taprio+0x170/0x47c
lr : enetc_setup_tc_taprio+0x16c/0x47c
Call trace:
enetc_setup_tc_taprio+0x170/0x47c
enetc_setup_tc+0x38/0x2dc
taprio_change+0x43c/0x970
taprio_init+0x188/0x1e0
qdisc_create+0x114/0x470
tc_modify_qdisc+0x1fc/0x6c0
rtnetlink_rcv_msg+0x12c/0x390
Split enetc_setup_tc() into separate functions for the PF and for the
VF drivers. Also remove enetc_qos.o from being included into
enetc-vf.ko, since it serves absolutely no purpose there.
Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 16 Sep 2022 13:32:08 +0000 (16:32 +0300)]
net: enetc: move enetc_set_psfp() out of the common enetc_set_features()
The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC
flag; only PFs should. Moreover, TSN-specific code should go to
enetc_qos.c, which should not be included in the VF driver.
Fixes: 79e499829f3f ("net: enetc: add hw tc hw offload features for PSPF capability")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 20 Sep 2022 18:26:18 +0000 (11:26 -0700)]
Merge branch 'wireguard-patches-for-6-0-rc6'
Jason A. Donenfeld says:
====================
wireguard patches for 6.0-rc6
1) The ratelimiter timing test doesn't help outside of development, yet
it is currently preventing the module from being inserted on some
kernels when it flakes at insertion time. So we disable it.
2) A fix for a build error on UML, caused by a recent change in a
different tree.
3) A WARN_ON() is triggered by Kees' new fortified memcpy() patch, due
to memcpy()ing over a sockaddr pointer with the size of a
sockaddr_in[6]. The type safe fix is pretty simple. Given how classic
of a thing sockaddr punning is, I suspect this may be the first in a
few patches like this throughout the net tree, once Kees' fortify
series is more widely deployed (current it's just in next).
====================
Link: https://lore.kernel.org/r/20220916143740.831881-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Fri, 16 Sep 2022 14:37:40 +0000 (15:37 +0100)]
wireguard: netlink: avoid variable-sized memcpy on sockaddr
Doing a variable-sized memcpy is slower, and the compiler isn't smart
enough to turn this into a constant-size assignment.
Further, Kees' latest fortified memcpy will actually bark, because the
destination pointer is type sockaddr, not explicitly sockaddr_in or
sockaddr_in6, so it thinks there's an overflow:
memcpy: detected field-spanning write (size 28) of single field
"&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16)
Fix this by just assigning by using explicit casts for each checked
case.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reported-by: syzbot+a448cda4dba2dac50de5@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Fri, 16 Sep 2022 14:37:39 +0000 (15:37 +0100)]
wireguard: selftests: do not install headers on UML
Since
1b620d539ccc ("kbuild: disable header exports for UML in a
straightforward way"), installing headers fails on UML, so just disable
installing them, since they're not needed anyway on the architecture.
Fixes: b438b3b8d6e6 ("wireguard: selftests: support UML")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Fri, 16 Sep 2022 14:37:38 +0000 (15:37 +0100)]
wireguard: ratelimiter: disable timings test by default
A previous commit tried to make the ratelimiter timings test more
reliable but in the process made it less reliable on other
configurations. This is an impossible problem to solve without
increasingly ridiculous heuristics. And it's not even a problem that
actually needs to be solved in any comprehensive way, since this is only
ever used during development. So just cordon this off with a DEBUG_
ifdef, just like we do for the trie's randomized tests, so it can be
enabled while hacking on the code, and otherwise disabled in CI. In the
process we also revert
151c8e499f47.
Fixes: 151c8e499f47 ("wireguard: ratelimiter: use hrtimer in selftest")
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet [Thu, 15 Sep 2022 14:19:58 +0000 (16:19 +0200)]
sfc/siena: fix null pointer dereference in efx_hard_start_xmit
Like in previous patch for sfc, prevent potential (but unlikely) NULL
pointer dereference.
Fixes: 12804793b17c ("sfc: decouple TXQ type from label")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220915141958.16458-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet [Thu, 15 Sep 2022 14:16:53 +0000 (16:16 +0200)]
sfc/siena: fix TX channel offset when using legacy interrupts
As in previous commit for sfc, fix TX channels offset when
efx_siena_separate_tx_channels is false (the default)
Fixes: 25bde571b4a8 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tetsuo Handa [Wed, 14 Sep 2022 09:51:54 +0000 (18:51 +0900)]
net: clear msg_get_inq in __get_compat_msghdr()
syzbot is still complaining uninit-value in tcp_recvmsg(), for
commit
1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and
__copy_msghdr_from_user()") missed that __get_compat_msghdr() is called
instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()")
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 20 Sep 2022 15:22:20 +0000 (08:22 -0700)]
Merge branch 'ipmr-always-call-ip-6-_mr_forward-from-rcu-read-side-critical-section'
Ido Schimmel says:
====================
ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section
Patch #1 fixes a bug in ipmr code.
Patch #2 adds corresponding test cases.
====================
Link: https://lore.kernel.org/r/20220914075339.4074096-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Wed, 14 Sep 2022 07:53:39 +0000 (10:53 +0300)]
selftests: forwarding: Add test cases for unresolved multicast routes
Add IPv4 and IPv6 test cases for unresolved multicast routes, testing
that queued packets are forwarded after installing a matching (S, G)
route.
The test cases can be used to reproduce the bugs fixed in "ipmr: Always
call ip{,6}_mr_forward() from RCU read-side critical section".
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Wed, 14 Sep 2022 07:53:38 +0000 (10:53 +0300)]
ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section
These functions expect to be called from RCU read-side critical section,
but this only happens when invoked from the data path via
ip{,6}_mr_input(). They can also be invoked from process context in
response to user space adding a multicast route which resolves a cache
entry with queued packets [1][2].
Fix by adding missing rcu_read_lock() / rcu_read_unlock() in these call
paths.
[1]
WARNING: suspicious RCU usage
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
-----------------------------
net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by smcrouted/246:
#0:
ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip_mroute_setsockopt+0x11c/0x1420
stack backtrace:
CPU: 0 PID: 246 Comm: smcrouted Not tainted
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x91/0xb9
vif_dev_read+0xbf/0xd0
ipmr_queue_xmit+0x135/0x1ab0
ip_mr_forward+0xe7b/0x13d0
ipmr_mfc_add+0x1a06/0x2ad0
ip_mroute_setsockopt+0x5c1/0x1420
do_ip_setsockopt+0x23d/0x37f0
ip_setsockopt+0x56/0x80
raw_setsockopt+0x219/0x290
__sys_setsockopt+0x236/0x4d0
__x64_sys_setsockopt+0xbe/0x160
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
WARNING: suspicious RCU usage
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
-----------------------------
net/ipv6/ip6mr.c:69 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by smcrouted/246:
#0:
ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip6_mroute_setsockopt+0x6b9/0x2630
stack backtrace:
CPU: 1 PID: 246 Comm: smcrouted Not tainted
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x91/0xb9
vif_dev_read+0xbf/0xd0
ip6mr_forward2.isra.0+0xc9/0x1160
ip6_mr_forward+0xef0/0x13f0
ip6mr_mfc_add+0x1ff2/0x31f0
ip6_mroute_setsockopt+0x1825/0x2630
do_ipv6_setsockopt+0x462/0x4440
ipv6_setsockopt+0x105/0x140
rawv6_setsockopt+0xd8/0x690
__sys_setsockopt+0x236/0x4d0
__x64_sys_setsockopt+0xbe/0x160
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: ebc3197963fc ("ipmr: add rcu protection over (struct vif_device)->dev")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 13 Sep 2022 20:46:02 +0000 (15:46 -0500)]
net: ipa: properly limit modem routing table use
IPA can route packets between IPA-connected entities. The AP and
modem are currently the only such entities supported, and no routing
is required to transfer packets between them.
The number of entries in each routing table is fixed, and defined at
initialization time. Some of these entries are designated for use
by the modem, and the rest are available for the AP to use. The AP
sends a QMI message to the modem which describes (among other
things) information about routing table memory available for the
modem to use.
Currently the QMI initialization packet gives wrong information in
its description of routing tables. What *should* be supplied is the
maximum index that the modem can use for the routing table memory
located at a given location. The current code instead supplies the
total *number* of routing table entries. Furthermore, the modem is
granted the entire table, not just the subset it's supposed to use.
This patch fixes this. First, the ipa_mem_bounds structure is
generalized so its "end" field can be interpreted either as a final
byte offset, or a final array index. Second, the IPv4 and IPv6
(non-hashed and hashed) table information fields in the QMI
ipa_init_modem_driver_req structure are changed to be ipa_mem_bounds
rather than ipa_mem_array structures. Third, we set the "end" value
for each routing table to be the last index, rather than setting the
"count" to be the number of indices. Finally, instead of allowing
the modem to use all of a routing table's memory, it is limited to
just the portion meant to be used by the modem. In all versions of
IPA currently supported, that is IPA_ROUTE_MODEM_COUNT (8) entries.
Update a few comments for clarity.
Fixes: 530f9216a9537 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220913204602.1803004-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Liang He [Tue, 13 Sep 2022 12:56:59 +0000 (20:56 +0800)]
of: mdio: Add of_node_put() when breaking out of for_each_xx
In of_mdiobus_register(), we should call of_node_put() for 'child'
escaped out of for_each_available_child_of_node().
Fixes: 66bdede495c7 ("of_mdio: Fix broken PHY IRQ in case of probe deferral")
Co-developed-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20220913125659.3331969-1-windhl@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cong Wang [Mon, 12 Sep 2022 17:35:53 +0000 (10:35 -0700)]
tcp: read multiple skbs in tcp_read_skb()
Before we switched to ->read_skb(), ->read_sock() was passed with
desc.count=1, which technically indicates we only read one skb per
->sk_data_ready() call. However, for TCP, this is not true.
TCP at least has sk_rcvlowat which intentionally holds skb's in
receive queue until this watermark is reached. This means when
->sk_data_ready() is invoked there could be multiple skb's in the
queue, therefore we have to read multiple skbs in tcp_read_skb()
instead of one.
Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()")
Reported-by: Peilin Ye <peilin.ye@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220912173553.235838-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 20 Sep 2022 10:18:08 +0000 (12:18 +0200)]
Merge branch 'revert-fec-ptp-changes'
Francesco Dolcini says:
====================
Revert fec PTP changes
Revert the last 2 FEC PTP changes from Csókás Bence, they are causing multiple
issues and we are at 6.0-rc5.
====================
Link: https://lore.kernel.org/r/20220912070143.98153-1-francesco.dolcini@toradex.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Francesco Dolcini [Mon, 12 Sep 2022 07:01:43 +0000 (09:01 +0200)]
Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
This reverts commit
b353b241f1eb9b6265358ffbe2632fdcb563354f, this is
creating multiple issues, just not ready to be merged yet.
Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/
Link: https://lore.kernel.org/all/20220907143915.5w65kainpykfobte@pengutronix.de/
Fixes: b353b241f1eb ("net: fec: Use a spinlock to guard `fep->ptp_clk_on`")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Francesco Dolcini [Mon, 12 Sep 2022 07:01:42 +0000 (09:01 +0200)]
Revert "fec: Restart PPS after link state change"
This reverts commit
f79959220fa5fbda939592bf91c7a9ea90419040, this is
creating multiple issues, just not ready to be merged yet.
Link: https://lore.kernel.org/all/20220905180542.GA3685102@roeck-us.net/
Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/
Fixes: f79959220fa5 ("fec: Restart PPS after link state change")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Rakesh Sankaranarayanan [Mon, 12 Sep 2022 05:12:28 +0000 (10:42 +0530)]
net: dsa: microchip: lan937x: fix maximum frame length check
Maximum frame length check is enabled in lan937x switch on POR, But it
is found to be disabled on driver during port setup operation. Due to
this, packets are not dropped when transmitted with greater than configured
value. For testing, setup made for lan1->lan2 transmission and configured
lan1 interface with a frame length (less than 1500 as mentioned in
documentation) and transmitted packets with greater than configured value.
Expected no packets at lan2 end, but packets observed at lan2.
Based on the documentation, packets should get discarded if the actual
packet length doesn't match the frame length configured. Frame length check
should be disabled only for cascaded ports due to tailtags.
This feature was disabled on ksz9477 series due to ptp issue, which is
not in lan937x series. But since lan937x took ksz9477 as base, frame
length check disabled here as well. Patch added to remove this portion
from port setup so that maximum frame length check will be active for
normal ports.
Fixes: 55ab6ffaf378 ("net: dsa: microchip: add DSA support for microchip LAN937x")
Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Link: https://lore.kernel.org/r/20220912051228.1306074-1-rakesh.sankaranarayanan@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shailend Chand [Tue, 13 Sep 2022 00:09:01 +0000 (17:09 -0700)]
gve: Fix GFP flags when allocing pages
Use GFP_ATOMIC when allocating pages out of the hotpath,
continue to use GFP_KERNEL when allocating pages during setup.
GFP_KERNEL will allow blocking which allows it to succeed
more often in a low memory enviornment but in the hotpath we do
not want to allow the allocation to block.
Fixes: 9b8dd5e5ea48b ("gve: DQO: Add RX path")
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20220913000901.959546-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vadim Fedorenko [Thu, 15 Sep 2022 23:49:32 +0000 (02:49 +0300)]
bnxt_en: fix flags to check for supported fw version
The warning message of unsupported FW appears every time RX timestamps
are disabled on the interface. The patch fixes the flags to correct set
for the check.
Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets")
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220915234932.25497-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 20 Sep 2022 01:17:22 +0000 (18:17 -0700)]
Merge tag 'wireless-2022-09-19' of git://git./linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.0
Late stage fixes for v6.0. Temporarily mark iwlwifi's mei code broken
as it breaks suspend for iwd users and also don't spam nss trimming
messages. mt76 has fixes for aggregation sequence numbers and a
regression related to the VHT extended NSS BW feature.
* tag 'wireless-2022-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
wifi: mt76: fix reading current per-tid starting sequence number for aggregation
wifi: iwlwifi: Mark IWLMEI as broken
wifi: iwlwifi: don't spam logs with NSS>2 messages
====================
Link: https://lore.kernel.org/r/20220919105003.1EAE7C433B5@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 20 Sep 2022 01:13:43 +0000 (18:13 -0700)]
Merge tag 'batadv-net-pullrequest-
20220916' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- Fix hang up with small MTU hard-interface, by Shigeru Yoshida
* tag 'batadv-net-pullrequest-
20220916' of git://git.open-mesh.org/linux-merge:
batman-adv: Fix hang up with small MTU hard-interface
====================
Link: https://lore.kernel.org/r/20220916160931.1412407-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet [Wed, 14 Sep 2022 11:11:35 +0000 (13:11 +0200)]
sfc: fix null pointer dereference in efx_hard_start_xmit
Trying to get the channel from the tx_queue variable here is wrong
because we can only be here if tx_queue is NULL, so we shouldn't
dereference it. As the above comment in the code says, this is very
unlikely to happen, but it's wrong anyway so let's fix it.
I hit this issue because of a different bug that caused tx_queue to be
NULL. If that happens, this is the error message that we get here:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
[...]
RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc]
Fixes: 12804793b17c ("sfc: decouple TXQ type from label")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220914111135.21038-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet [Wed, 14 Sep 2022 10:36:48 +0000 (12:36 +0200)]
sfc: fix TX channel offset when using legacy interrupts
In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but
that's not correct if efx_sepparate_tx_channels is false. In that case,
the offset is 0 because the tx queues are in the single existing channel
at index 0, together with the rx queue.
Without this fix, as soon as you try to send any traffic, it tries to
get the tx queues from an uninitialized channel getting these errors:
WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc]
[...]
RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc]
[...]
Call Trace:
<IRQ>
dev_hard_start_xmit+0xd7/0x230
sch_direct_xmit+0x9f/0x360
__dev_queue_xmit+0x890/0xa40
[...]
BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
[...]
RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc]
[...]
Call Trace:
<IRQ>
dev_hard_start_xmit+0xd7/0x230
sch_direct_xmit+0x9f/0x360
__dev_queue_xmit+0x890/0xa40
[...]
Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 20 Sep 2022 01:01:04 +0000 (18:01 -0700)]
Merge tag 'for-net-2022-09-09' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix HCIGETDEVINFO regression
* tag 'for-net-2022-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix HCIGETDEVINFO regression
====================
Link: https://lore.kernel.org/r/20220909201642.3810565-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Tue, 13 Sep 2022 13:03:05 +0000 (15:03 +0200)]
net: ethernet: mtk_eth_soc: enable XDP support just for MT7986 SoC
Disable page_pool/XDP support for MT7621 SoC in order fix a regression
introduce adding XDP for MT7986 SoC. There is no a real use case for XDP
on MT7621 since it is a low-end cpu. Moreover this patch reduces the
memory footprint.
Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Fixes: 23233e577ef9 ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/2bf31e27b888c43228b0d84dd2ef5033338269e2.1663074002.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haiyang Zhang [Sun, 11 Sep 2022 20:40:05 +0000 (13:40 -0700)]
net: mana: Add rmb after checking owner bits
Per GDMA spec, rmb is necessary after checking owner_bits, before
reading EQ or CQ entries.
Add rmb in these two places to comply with the specs.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reported-by: Sinan Kaya <Sinan.Kaya@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1662928805-15861-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeroen de Borst [Tue, 13 Sep 2022 18:53:19 +0000 (11:53 -0700)]
MAINTAINERS: gve: update developers
Updating active developers.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20220913185319.1061909-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 9 Sep 2022 15:38:30 +0000 (18:38 +0300)]
netdevsim: Fix hwstats debugfs file permissions
The hwstats debugfs files are only writeable, but they are created with
read and write permissions, causing certain selftests to fail [1].
Fix by creating the files with write permission only.
[1]
# ./test_offload.py
Test destruction of generic XDP...
Traceback (most recent call last):
File "/home/idosch/code/linux/tools/testing/selftests/bpf/./test_offload.py", line 810, in <module>
simdev = NetdevSimDev()
[...]
Exception: Command failed: cat /sys/kernel/debug/netdevsim/netdevsim0//ports/0/dev/hwstats/l3/disable_ifindex
cat: /sys/kernel/debug/netdevsim/netdevsim0//ports/0/dev/hwstats/l3/disable_ifindex: Invalid argument
Fixes: 1a6d7ae7d63c ("netdevsim: Introduce support for L3 offload xstats")
Reported-by: Jie2x Zhou <jie2x.zhou@intel.com>
Tested-by: Jie2x Zhou <jie2x.zhou@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20220909153830.3732504-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Jaron [Thu, 1 Sep 2022 07:49:33 +0000 (09:49 +0200)]
i40e: Fix set max_tx_rate when it is lower than 1 Mbps
While converting max_tx_rate from bytes to Mbps, this value was set to 0,
if the original value was lower than 125000 bytes (1 Mbps). This would
cause no transmission rate limiting to occur. This happened due to lack of
check of max_tx_rate against the 1 Mbps value for max_tx_rate and the
following division by 125000. Fix this issue by adding a helper
i40e_bw_bytes_to_mbits() which sets max_tx_rate to minimum usable value of
50 Mbps, if its value is less than 1 Mbps, otherwise do the required
conversion by dividing by 125000.
Fixes: 5ecae4120a6b ("i40e: Refactor VF BW rate limiting")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Jaron [Tue, 13 Sep 2022 13:38:36 +0000 (15:38 +0200)]
i40e: Fix VF set max MTU size
Max MTU sent to VF is set to 0 during memory allocation. It cause
that max MTU on VF is changed to IAVF_MAX_RXBUFFER and does not
depend on data from HW.
Set max_mtu field in virtchnl_vf_resource struct to inform
VF in GET_VF_RESOURCES msg what size should be max frame.
Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Jaron [Tue, 13 Sep 2022 13:38:35 +0000 (15:38 +0200)]
iavf: Fix set max MTU size with port VLAN and jumbo frames
After setting port VLAN and MTU to 9000 on VF with ice driver there
was an iavf error
"PF returned error -5 (IAVF_ERR_PARAM) to our request 6".
During queue configuration, VF's max packet size was set to
IAVF_MAX_RXBUFFER but on ice max frame size was smaller by VLAN_HLEN
due to making some space for port VLAN as VF is not aware whether it's
in a port VLAN. This mismatch in sizes caused ice to reject queue
configuration with ERR_PARAM error. Proper max_mtu is sent from ice PF
to VF with GET_VF_RESOURCES msg but VF does not look at this.
In iavf change max_frame from IAVF_MAX_RXBUFFER to max_mtu
received from pf with GET_VF_RESOURCES msg to make vf's
max_frame_size dependent from pf. Add check if received max_mtu is
not in eligible range then set it to IAVF_MAX_RXBUFFER.
Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David Thompson [Fri, 2 Sep 2022 16:42:47 +0000 (12:42 -0400)]
mlxbf_gige: clear MDIO gateway lock after read
The MDIO gateway (GW) lock in BlueField-2 GIGE logic is
set after read. This patch adds logic to make sure the
lock is always cleared at the end of each MDIO transaction.
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20220902164247.19862-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Norbert Zulinski [Wed, 14 Sep 2022 13:39:13 +0000 (15:39 +0200)]
iavf: Fix bad page state
Fix bad page state, free inappropriate page in handling dummy
descriptor. iavf_build_skb now has to check not only if rx_buffer is
NULL but also if size is zero, same thing in iavf_clean_rx_irq.
Without this patch driver would free page that will be used
by napi_build_skb.
Fixes: a9f49e006030 ("iavf: Fix handling of dummy receive descriptors")
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Peilin Ye [Thu, 8 Sep 2022 23:15:23 +0000 (16:15 -0700)]
tcp: Use WARN_ON_ONCE() in tcp_read_skb()
Prevent tcp_read_skb() from flooding the syslog.
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Sep 2022 13:34:01 +0000 (14:34 +0100)]
Merge branch 'net-unsync-addresses-from-ports'
From: Benjamin Poirier <bpoirier@nvidia.com>
To: netdev@vger.kernel.org
Cc: Jay Vosburgh <j.vosburgh@gmail.com>,
Veaceslav Falico <vfalico@gmail.com>,
Andy Gospodarek <andy@greyhouse.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Jiri Pirko <jiri@resnulli.us>, Shuah Khan <shuah@kernel.org>,
Jonathan Toppins <jtoppins@redhat.com>,
linux-kselftest@vger.kernel.org
Subject: [PATCH net v3 0/4] Unsync addresses from ports when stopping aggregated devices
Date: Wed, 7 Sep 2022 16:56:38 +0900 [thread overview]
Message-ID: <
20220907075642.475236-1-bpoirier@nvidia.com> (raw)
This series fixes similar problems in the bonding and team drivers.
Because of missing dev_{uc,mc}_unsync() calls, addresses added to
underlying devices may be leftover after the aggregated device is deleted.
Add the missing calls and a few related tests.
v2:
* fix selftest installation, see patch 3
v3:
* Split lacpdu_multicast changes to their own patch, #1
* In ndo_{add,del}_slave methods, only perform address list changes when
the aggregated device is up (patches 2 & 3)
* Add selftest function related to the above change (patch 4)
====================
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Wed, 7 Sep 2022 07:56:42 +0000 (16:56 +0900)]
net: Add tests for bonding and team address list management
Test that the bonding and team drivers clean up an underlying device's
address lists (dev->uc, dev->mc) when the aggregated device is deleted.
Test addition and removal of the LACPDU multicast address on underlying
devices by the bonding driver.
v2:
* add lag_lib.sh to TEST_FILES
v3:
* extend bond_listen_lacpdu_multicast test to init_state up and down cases
* remove some superfluous shell syntax and 'set dev ... up' commands
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Wed, 7 Sep 2022 07:56:41 +0000 (16:56 +0900)]
net: team: Unsync device addresses on ndo_stop
Netdev drivers are expected to call dev_{uc,mc}_sync() in their
ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method.
This is mentioned in the kerneldoc for those dev_* functions.
The team driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of
ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have
already been emptied in unregister_netdevice_many() before ndo_uninit is
called. This mistake can result in addresses being leftover on former team
ports after a team device has been deleted; see test_LAG_cleanup() in the
last patch in this series.
Add unsync calls at their expected location, team_close().
v3:
* When adding or deleting a port, only sync/unsync addresses if the team
device is up. In other cases, it is taken care of at the right time by
ndo_open/ndo_set_rx_mode/ndo_stop.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Wed, 7 Sep 2022 07:56:40 +0000 (16:56 +0900)]
net: bonding: Unsync device addresses on ndo_stop
Netdev drivers are expected to call dev_{uc,mc}_sync() in their
ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method.
This is mentioned in the kerneldoc for those dev_* functions.
The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of
ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have
already been emptied in unregister_netdevice_many() before ndo_uninit is
called. This mistake can result in addresses being leftover on former bond
slaves after a bond has been deleted; see test_LAG_cleanup() in the last
patch in this series.
Add unsync calls, via bond_hw_addr_flush(), at their expected location,
bond_close().
Add dev_mc_add() call to bond_open() to match the above change.
v3:
* When adding or deleting a slave, only sync/unsync, add/del addresses if
the bond is up. In other cases, it is taken care of at the right time by
ndo_open/ndo_set_rx_mode/ndo_stop.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Wed, 7 Sep 2022 07:56:39 +0000 (16:56 +0900)]
net: bonding: Share lacpdu_mcast_addr definition
There are already a few definitions of arrays containing
MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all
contain the same constant data so define one common instance for all
bonding code.
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Sep 2022 11:16:44 +0000 (12:16 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-08 (ice, iavf)
This series contains updates to ice and iavf drivers.
Dave removes extra unplug of auxiliary bus on reset which caused a
scheduling while atomic to be reported for ice.
Ding Hui defers setting of queues for TCs to ensure valid configuration
and restores old config if invalid for ice.
Sylwester fixes a check of setting MAC address to occur after result is
received from PF for iavf driver.
Brett changes check of ring tail to use software cached value as not all
devices have access to register tail for iavf driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksandr Mazur [Thu, 8 Sep 2022 13:14:46 +0000 (16:14 +0300)]
net: marvell: prestera: add support for for Aldrin2
Aldrin2 (98DX8525) is a Marvell Prestera PP, with 100G support.
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
V2:
- retarget to net tree instead of net-next;
- fix missed colon in patch subject ('net marvell' vs 'net: mavell');
Signed-off-by: David S. Miller <davem@davemloft.net>
Haimin Zhang [Thu, 8 Sep 2022 12:19:27 +0000 (20:19 +0800)]
net/ieee802154: fix uninit value bug in dgram_sendmsg
There is uninit value bug in dgram_sendmsg function in
net/ieee802154/socket.c when the length of valid data pointed by the
msg->msg_name isn't verified.
We introducing a helper function ieee802154_sockaddr_check_size to
check namelen. First we check there is addr_type in ieee802154_addr_sa.
Then, we check namelen according to addr_type.
Also fixed in raw_bind, dgram_bind, dgram_connect.
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Tue, 6 Sep 2022 18:04:02 +0000 (20:04 +0200)]
Documentation: mptcp: fix pm_type formatting
When looking at the rendered HTML version, we can see 'pm_type' is not
displayed with a bold font:
https://docs.kernel.org/5.19/networking/mptcp-sysctl.html
The empty line under 'pm_type' is then removed to have the same style as
the others.
Fixes: 6bb63ccc25d4 ("mptcp: Add a per-namespace sysctl to set the default path manager type")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220906180404.1255873-2-matthieu.baerts@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 6 Sep 2022 18:04:01 +0000 (20:04 +0200)]
mptcp: fix fwd memory accounting on coalesce
The intel bot reported a memory accounting related splat:
[ 240.473094] ------------[ cut here ]------------
[ 240.478507] page_counter underflow: -
4294828518 nr_pages=
4294967290
[ 240.485500] WARNING: CPU: 2 PID: 14986 at mm/page_counter.c:56 page_counter_cancel+0x96/0xc0
[ 240.570849] CPU: 2 PID: 14986 Comm: mptcp_connect Tainted: G S
5.19.0-rc4-00739-gd24141fe7b48 #1
[ 240.581637] Hardware name: HP HP Z240 SFF Workstation/802E, BIOS N51 Ver. 01.63 10/05/2017
[ 240.590600] RIP: 0010:page_counter_cancel+0x96/0xc0
[ 240.596179] Code: 00 00 00 45 31 c0 48 89 ef 5d 4c 89 c6 41 5c e9 40 fd ff ff 4c 89 e2 48 c7 c7 20 73 39 84 c6 05 d5 b1 52 04 01 e8 e7 95 f3
01 <0f> 0b eb a9 48 89 ef e8 1e 25 fc ff eb c3 66 66 2e 0f 1f 84 00 00
[ 240.615639] RSP: 0018:
ffffc9000496f7c8 EFLAGS:
00010082
[ 240.621569] RAX:
0000000000000000 RBX:
ffff88819c9c0120 RCX:
0000000000000000
[ 240.629404] RDX:
0000000000000027 RSI:
0000000000000004 RDI:
fffff5200092deeb
[ 240.637239] RBP:
ffff88819c9c0120 R08:
0000000000000001 R09:
ffff888366527a2b
[ 240.645069] R10:
ffffed106cca4f45 R11:
0000000000000001 R12:
00000000fffffffa
[ 240.652903] R13:
ffff888366536118 R14:
00000000fffffffa R15:
ffff88819c9c0000
[ 240.660738] FS:
00007f3786e72540(0000) GS:
ffff888366500000(0000) knlGS:
0000000000000000
[ 240.669529] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 240.675974] CR2:
00007f966b346000 CR3:
0000000168cea002 CR4:
00000000003706e0
[ 240.683807] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 240.691641] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 240.699468] Call Trace:
[ 240.702613] <TASK>
[ 240.705413] page_counter_uncharge+0x29/0x80
[ 240.710389] drain_stock+0xd0/0x180
[ 240.714585] refill_stock+0x278/0x580
[ 240.718951] __sk_mem_reduce_allocated+0x222/0x5c0
[ 240.729248] __mptcp_update_rmem+0x235/0x2c0
[ 240.734228] __mptcp_move_skbs+0x194/0x6c0
[ 240.749764] mptcp_recvmsg+0xdfa/0x1340
[ 240.763153] inet_recvmsg+0x37f/0x500
[ 240.782109] sock_read_iter+0x24a/0x380
[ 240.805353] new_sync_read+0x420/0x540
[ 240.838552] vfs_read+0x37f/0x4c0
[ 240.842582] ksys_read+0x170/0x200
[ 240.864039] do_syscall_64+0x5c/0x80
[ 240.872770] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 240.878526] RIP: 0033:0x7f3786d9ae8e
[ 240.882805] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 18 0a 00 e8 89 e8 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 240.902259] RSP: 002b:
00007fff7be81e08 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
[ 240.910533] RAX:
ffffffffffffffda RBX:
0000000000002000 RCX:
00007f3786d9ae8e
[ 240.918368] RDX:
0000000000002000 RSI:
00007fff7be87ec0 RDI:
0000000000000005
[ 240.926206] RBP:
0000000000000005 R08:
00007f3786e6a230 R09:
00007f3786e6a240
[ 240.934046] R10:
fffffffffffff288 R11:
0000000000000246 R12:
0000000000002000
[ 240.941884] R13:
00007fff7be87ec0 R14:
00007fff7be87ec0 R15:
0000000000002000
[ 240.949741] </TASK>
[ 240.952632] irq event stamp: 27367
[ 240.956735] hardirqs last enabled at (27366): [<
ffffffff81ba50ea>] mem_cgroup_uncharge_skmem+0x6a/0x80
[ 240.966848] hardirqs last disabled at (27367): [<
ffffffff81b8fd42>] refill_stock+0x282/0x580
[ 240.976017] softirqs last enabled at (27360): [<
ffffffff83a4d8ef>] mptcp_recvmsg+0xaf/0x1340
[ 240.985273] softirqs last disabled at (27364): [<
ffffffff83a4d30c>] __mptcp_move_skbs+0x18c/0x6c0
[ 240.994872] ---[ end trace
0000000000000000 ]---
After commit
d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros"),
if rmem_fwd_alloc become negative, mptcp_rmem_uncharge() can
try to reclaim a negative amount of pages, since the expression:
reclaimable >= PAGE_SIZE
will evaluate to true for any negative value of the int
'reclaimable': 'PAGE_SIZE' is an unsigned long and
the negative integer will be promoted to a (very large)
unsigned long value.
Still after the mentioned commit, kfree_skb_partial()
in mptcp_try_coalesce() will reclaim most of just released fwd
memory, so that following charging of the skb delta size will
lead to negative fwd memory values.
At that point a racing recvmsg() can trigger the splat.
Address the issue switching the order of the memory accounting
operations. The fwd memory can still transiently reach negative
values, but that will happen in an atomic scope and no code
path could touch/use such value.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220906180404.1255873-1-matthieu.baerts@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ioana Ciornei [Tue, 6 Sep 2022 13:04:51 +0000 (16:04 +0300)]
net: phy: aquantia: wait for the suspend/resume operations to finish
The Aquantia datasheet notes that after issuing a Processor-Intensive
MDIO operation, like changing the low-power state of the device, the
driver should wait for the operation to finish before issuing a new MDIO
command.
The new aqr107_wait_processor_intensive_op() function is added which can
be used after these kind of MDIO operations. At the moment, we are only
adding it at the end of the suspend/resume calls.
The issue was identified on a board featuring the AQR113C PHY, on
which commands like 'ip link (..) up / down' issued without any delays
between them would render the link on the PHY to remain down.
The issue was easy to reproduce with a one-liner:
$ ip link set dev ethX down; ip link set dev ethX up; \
ip link set dev ethX down; ip link set dev ethX up;
Fixes: ac9e81c230eb ("net: phy: aquantia: add suspend / resume callbacks for AQR107 family")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220906130451.1483448-1-ioana.ciornei@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Felix Fietkau [Wed, 7 Sep 2022 09:52:28 +0000 (11:52 +0200)]
wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
Some users have reported being unable to connect to MT76x0 APs running mt76
after a commit enabling the VHT extneded NSS BW feature.
Fix this regression by ensuring that this feature only gets enabled on drivers
that support it
Cc: stable@vger.kernel.org
Fixes: d9fcfc1424aa ("mt76: enable the VHT extended NSS BW feature")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220907095228.82072-1-nbd@nbd.name
Felix Fietkau [Fri, 26 Aug 2022 18:23:29 +0000 (20:23 +0200)]
wifi: mt76: fix reading current per-tid starting sequence number for aggregation
The code was accidentally shifting register values down by tid % 32 instead of
(tid * field_size) % 32.
Cc: stable@vger.kernel.org
Fixes: a28bef561a5c ("mt76: mt7615: re-enable offloading of sequence number assignment")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220826182329.18155-1-nbd@nbd.name
Toke Høiland-Jørgensen [Wed, 7 Sep 2022 13:44:50 +0000 (15:44 +0200)]
wifi: iwlwifi: Mark IWLMEI as broken
The iwlmei driver breaks iwlwifi when returning from suspend. The interface
ends up in the 'down' state after coming back from suspend. And iwd doesn't
touch the interface state, but wpa_supplicant does, so the bug only happens on
iwd.
The bug report[0] has been open for four months now, and no fix seems to be
forthcoming. Since just disabling the iwlmei driver works as a workaround,
let's mark the config option as broken until it can be fixed properly.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=215937
Fixes: 2da4366f9e2c ("iwlwifi: mei: add the driver to allow cooperation with CSME")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220907134450.1183045-1-toke@toke.dk
Luiz Augusto von Dentz [Thu, 8 Sep 2022 20:57:50 +0000 (13:57 -0700)]
Bluetooth: Fix HCIGETDEVINFO regression
Recent changes breaks HCIGETDEVINFO since it changes the size of
hci_dev_info.
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Ludovic Cintrat [Wed, 7 Sep 2022 10:08:13 +0000 (12:08 +0200)]
net: core: fix flow symmetric hash
__flow_hash_consistentify() wrongly swaps ipv4 addresses in few cases.
This function is indirectly used by __skb_get_hash_symmetric(), which is
used to fanout packets in AF_PACKET.
Intrusion detection systems may be impacted by this issue.
__flow_hash_consistentify() computes the addresses difference then swaps
them if the difference is negative. In few cases src - dst and dst - src
are both negative.
The following snippet mimics __flow_hash_consistentify():
```
#include <stdio.h>
#include <stdint.h>
int main(int argc, char** argv) {
int diffs_d, diffd_s;
uint32_t dst = 0xb225a8c0; /* 178.37.168.192 --> 192.168.37.178 */
uint32_t src = 0x3225a8c0; /* 50.37.168.192 --> 192.168.37.50 */
uint32_t dst2 = 0x3325a8c0; /* 51.37.168.192 --> 192.168.37.51 */
diffs_d = src - dst;
diffd_s = dst - src;
printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
src, dst, diffs_d, diffs_d, diffd_s, diffd_s);
diffs_d = src - dst2;
diffd_s = dst2 - src;
printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
src, dst2, diffs_d, diffs_d, diffd_s, diffd_s);
return 0;
}
```
Results:
src:
3225a8c0 dst:
b225a8c0, \
diff(s-d)=-
2147483648(0x80000000) \
diff(d-s)=-
2147483648(0x80000000)
src:
3225a8c0 dst:
3325a8c0, \
diff(s-d)=-
16777216(0xff000000) \
diff(d-s)=
16777216(0x1000000)
In the first case the addresses differences are always < 0, therefore
__flow_hash_consistentify() always swaps, thus dst->src and src->dst
packets have differents hashes.
Fixes: c3f8324188fa8 ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Ludovic Cintrat <ludovic.cintrat@gatewatcher.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lu Wei [Wed, 7 Sep 2022 10:12:04 +0000 (18:12 +0800)]
ipvlan: Fix out-of-bound bugs caused by unset skb->mac_header
If an AF_PACKET socket is used to send packets through ipvlan and the
default xmit function of the AF_PACKET socket is changed from
dev_queue_xmit() to packet_direct_xmit() via setsockopt() with the option
name of PACKET_QDISC_BYPASS, the skb->mac_header may not be reset and
remains as the initial value of 65535, this may trigger slab-out-of-bounds
bugs as following:
=================================================================
UG: KASAN: slab-out-of-bounds in ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan]
PU: 2 PID: 1768 Comm: raw_send Kdump: loaded Not tainted 6.0.0-rc4+ #6
ardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33
all Trace:
print_address_description.constprop.0+0x1d/0x160
print_report.cold+0x4f/0x112
kasan_report+0xa3/0x130
ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan]
ipvlan_start_xmit+0x29/0xa0 [ipvlan]
__dev_direct_xmit+0x2e2/0x380
packet_direct_xmit+0x22/0x60
packet_snd+0x7c9/0xc40
sock_sendmsg+0x9a/0xa0
__sys_sendto+0x18a/0x230
__x64_sys_sendto+0x74/0x90
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is:
1. packet_snd() only reset skb->mac_header when sock->type is SOCK_RAW
and skb->protocol is not specified as in packet_parse_headers()
2. packet_direct_xmit() doesn't reset skb->mac_header as dev_queue_xmit()
In this case, skb->mac_header is 65535 when ipvlan_xmit_mode_l2() is
called. So when ipvlan_xmit_mode_l2() gets mac header with eth_hdr() which
use "skb->head + skb->mac_header", out-of-bound access occurs.
This patch replaces eth_hdr() with skb_eth_hdr() in ipvlan_xmit_mode_l2()
and reset mac header in multicast to solve this out-of-bound bug.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Sep 2022 09:06:34 +0000 (10:06 +0100)]
Merge git://git./linux/kernel/git/netfilter/nf
Florian Westhal says:
====================
netfilter: bugfixes for net
The following set contains four netfilter patches for your *net* tree.
When there are multiple Contact headers in a SIP message its possible
the next headers won't be found because the SIP helper confuses relative
and absolute offsets in the message. From Igor Ryzhov.
Make the nft_concat_range self-test support socat, this makes the
selftest pass on my test VM, from myself.
nf_conntrack_irc helper can be tricked into opening a local port forward
that the client never requested by embedding a DCC message in a PING
request sent to the client. Fix from David Leadbeater.
Both have been broken since the kernel 2.6.x days.
The 'osf' match might indicate success while it could not find
anything, broken since 5.2 . Fix from Pablo Neira.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Thu, 1 Sep 2022 14:34:40 +0000 (16:34 +0200)]
iavf: Fix cached head and tail value for iavf_get_tx_pending
The underlying hardware may or may not allow reading of the head or tail
registers and it really makes no difference if we use the software
cached values. So, always used the software cached values.
Fixes: 9c6c12595b73 ("i40e: Detection and recovery of TX queue hung logic moved to service_task from tx_timeout")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Sylwester Dziedziuch [Thu, 1 Sep 2022 14:32:06 +0000 (16:32 +0200)]
iavf: Fix change VF's mac address
Previously changing mac address gives false negative because
ip link set <interface> address <MAC> return with
RTNLINK: Permission denied.
In iavf_set_mac was check if PF handled our mac set request,
even before filter was added to list.
Because this check returns always true and it never waits for
PF's response.
Move iavf_is_mac_handled to wait_event_interruptible_timeout
instead of false. Now it will wait for PF's response and then
check if address was added or rejected.
Fixes: 35a2443d0910 ("iavf: Add waiting for response from PF in set mac")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Ding Hui [Wed, 17 Aug 2022 10:53:18 +0000 (18:53 +0800)]
ice: Fix crash by keep old cfg when update TCs more than queues
There are problems if allocated queues less than Traffic Classes.
Commit
a632b2a4c920 ("ice: ethtool: Prohibit improper channel config
for DCB") already disallow setting less queues than TCs.
Another case is if we first set less queues, and later update more TCs
config due to LLDP, ice_vsi_cfg_tc() will failed but left dirty
num_txq/rxq and tc_cfg in vsi, that will cause invalid pointer access.
[ 95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated.
[ 95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)!
[ 95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0
[ 95.969621] general protection fault: 0000 [#1] SMP NOPTI
[ 95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G U W O --------- -t - 4.18.0 #1
[ 95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021
[ 95.969992] RIP: 0010:devm_kmalloc+0xa/0x60
[ 95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c
[ 95.970344] RSP: 0018:
ffffc9003f553888 EFLAGS:
00010206
[ 95.970425] RAX:
dead000000000200 RBX:
ffffea003c425b00 RCX:
00000000006080c0
[ 95.970536] RDX:
00000000006080c0 RSI:
0000000000000200 RDI:
dead000000000200
[ 95.970648] RBP:
dead000000000200 R08:
00000000000463c0 R09:
ffff888ffa900000
[ 95.970760] R10:
0000000000000000 R11:
0000000000000002 R12:
ffff888ff6b40100
[ 95.970870] R13:
ffff888ff6a55018 R14:
0000000000000000 R15:
ffff888ff6a55460
[ 95.970981] FS:
00007f51b7d24700(0000) GS:
ffff88903ee80000(0000) knlGS:
0000000000000000
[ 95.971108] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 95.971197] CR2:
00007fac5410d710 CR3:
0000000f2c1de002 CR4:
00000000007606e0
[ 95.971309] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 95.971419] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 95.971530] PKRU:
55555554
[ 95.971573] Call Trace:
[ 95.971622] ice_setup_rx_ring+0x39/0x110 [ice]
[ 95.971695] ice_vsi_setup_rx_rings+0x54/0x90 [ice]
[ 95.971774] ice_vsi_open+0x25/0x120 [ice]
[ 95.971843] ice_open_internal+0xb8/0x1f0 [ice]
[ 95.971919] ice_ena_vsi+0x4f/0xd0 [ice]
[ 95.971987] ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice]
[ 95.972082] ice_pf_dcb_cfg+0x29a/0x380 [ice]
[ 95.972154] ice_dcbnl_setets+0x174/0x1b0 [ice]
[ 95.972220] dcbnl_ieee_set+0x89/0x230
[ 95.972279] ? dcbnl_ieee_del+0x150/0x150
[ 95.972341] dcb_doit+0x124/0x1b0
[ 95.972392] rtnetlink_rcv_msg+0x243/0x2f0
[ 95.972457] ? dcb_doit+0x14d/0x1b0
[ 95.972510] ? __kmalloc_node_track_caller+0x1d3/0x280
[ 95.972591] ? rtnl_calcit.isra.31+0x100/0x100
[ 95.972661] netlink_rcv_skb+0xcf/0xf0
[ 95.972720] netlink_unicast+0x16d/0x220
[ 95.972781] netlink_sendmsg+0x2ba/0x3a0
[ 95.975891] sock_sendmsg+0x4c/0x50
[ 95.979032] ___sys_sendmsg+0x2e4/0x300
[ 95.982147] ? kmem_cache_alloc+0x13e/0x190
[ 95.985242] ? __wake_up_common_lock+0x79/0x90
[ 95.988338] ? __check_object_size+0xac/0x1b0
[ 95.991440] ? _copy_to_user+0x22/0x30
[ 95.994539] ? move_addr_to_user+0xbb/0xd0
[ 95.997619] ? __sys_sendmsg+0x53/0x80
[ 96.000664] __sys_sendmsg+0x53/0x80
[ 96.003747] do_syscall_64+0x5b/0x1d0
[ 96.006862] entry_SYSCALL_64_after_hwframe+0x65/0xca
Only update num_txq/rxq when passed check, and restore tc_cfg if setup
queue map failed.
Fixes: a632b2a4c920 ("ice: ethtool: Prohibit improper channel config for DCB")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Reviewed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Tue, 9 Aug 2022 17:24:23 +0000 (10:24 -0700)]
ice: Don't double unplug aux on peer initiated reset
In the IDC callback that is accessed when the aux drivers request a reset,
the function to unplug the aux devices is called. This function is also
called in the ice_prepare_for_reset function. This double call is causing
a "scheduling while atomic" BUG.
[ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003
[ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003
[ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003
[ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424
[ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset
[ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
[ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
[ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe
r ttm
[ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
[ 662.815557] Preemption disabled at:
[ 662.815558] [<
0000000000000000>] 0x0
[ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2
[ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.
2111021741 11/02/2021
[ 662.815568] Call Trace:
[ 662.815572] <IRQ>
[ 662.815574] dump_stack_lvl+0x33/0x42
[ 662.815581] __schedule_bug.cold.147+0x7d/0x8a
[ 662.815588] __schedule+0x798/0x990
[ 662.815595] schedule+0x44/0xc0
[ 662.815597] schedule_preempt_disabled+0x14/0x20
[ 662.815600] __mutex_lock.isra.11+0x46c/0x490
[ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core]
[ 662.815633] device_del+0x37/0x3d0
[ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice]
[ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice]
[ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
[ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0
[ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice]
[ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice]
[ 662.815756] __handle_irq_event_percpu+0x4c/0x180
[ 662.815762] handle_irq_event_percpu+0xf/0x40
[ 662.815764] handle_irq_event+0x34/0x60
[ 662.815766] handle_edge_irq+0x9a/0x1c0
[ 662.815770] __common_interrupt+0x62/0x100
[ 662.815774] common_interrupt+0xb4/0xd0
[ 662.815779] </IRQ>
[ 662.815780] <TASK>
[ 662.815780] asm_common_interrupt+0x1e/0x40
[ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380
[ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[ 662.815791] RSP: 0018:
ff2c2c4f18edbe80 EFLAGS:
00000202
[ 662.815793] RAX:
ff280805df140000 RBX:
0000000000000002 RCX:
000000000000001f
[ 662.815795] RDX:
0000009a52da2d08 RSI:
ffffffff93f8240b RDI:
ffffffff93f53ee7
[ 662.815796] RBP:
ff5e2bd11ff41928 R08:
0000000000000000 R09:
000000000002f8c0
[ 662.815797] R10:
0000010c3f18e2cf R11:
000000000000000f R12:
0000009a52da2d08
[ 662.815798] R13:
ffffffff94ad7e20 R14:
0000000000000002 R15:
0000000000000000
[ 662.815801] cpuidle_enter+0x29/0x40
[ 662.815803] do_idle+0x261/0x2b0
[ 662.815807] cpu_startup_entry+0x19/0x20
[ 662.815809] start_secondary+0x114/0x150
[ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb
[ 662.815818] </TASK>
[ 662.815846] bad: scheduling from the idle thread!
[ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2
[ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.
2111021741 11/02/2021
[ 662.815853] Call Trace:
[ 662.815855] <IRQ>
[ 662.815856] dump_stack_lvl+0x33/0x42
[ 662.815860] dequeue_task_idle+0x20/0x30
[ 662.815863] __schedule+0x1c3/0x990
[ 662.815868] schedule+0x44/0xc0
[ 662.815871] schedule_preempt_disabled+0x14/0x20
[ 662.815873] __mutex_lock.isra.11+0x3a8/0x490
[ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core]
[ 662.815904] device_del+0x37/0x3d0
[ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice]
[ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice]
[ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
[ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0
[ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice]
[ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice]
[ 662.816033] __handle_irq_event_percpu+0x4c/0x180
[ 662.816037] handle_irq_event_percpu+0xf/0x40
[ 662.816039] handle_irq_event+0x34/0x60
[ 662.816042] handle_edge_irq+0x9a/0x1c0
[ 662.816045] __common_interrupt+0x62/0x100
[ 662.816048] common_interrupt+0xb4/0xd0
[ 662.816052] </IRQ>
[ 662.816053] <TASK>
[ 662.816054] asm_common_interrupt+0x1e/0x40
[ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380
[ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[ 662.816063] RSP: 0018:
ff2c2c4f18edbe80 EFLAGS:
00000202
[ 662.816065] RAX:
ff280805df140000 RBX:
0000000000000002 RCX:
000000000000001f
[ 662.816067] RDX:
0000009a52da2d08 RSI:
ffffffff93f8240b RDI:
ffffffff93f53ee7
[ 662.816068] RBP:
ff5e2bd11ff41928 R08:
0000000000000000 R09:
000000000002f8c0
[ 662.816070] R10:
0000010c3f18e2cf R11:
000000000000000f R12:
0000009a52da2d08
[ 662.816071] R13:
ffffffff94ad7e20 R14:
0000000000000002 R15:
0000000000000000
[ 662.816075] cpuidle_enter+0x29/0x40
[ 662.816077] do_idle+0x261/0x2b0
[ 662.816080] cpu_startup_entry+0x19/0x20
[ 662.816083] start_secondary+0x114/0x150
[ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb
[ 662.816091] </TASK>
[ 662.816169] bad: scheduling from the idle thread!
The correct place to unplug the aux devices for a reset is in the
prepare_for_reset function, as this is a common place for all reset flows.
It also has built in protection from being called twice in a single reset
instance before the aux devices are replugged.
Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Thu, 8 Sep 2022 12:15:01 +0000 (08:15 -0400)]
Merge tag 'net-6.0-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc, netfilter, wireless and bluetooth
subtrees.
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for
of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after
enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in
rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N"
* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
sch_sfb: Also store skb len before calling child enqueue
net: phy: lan87xx: change interrupt src of link_up to comm_ready
net/smc: Fix possible access to freed memory in link clear
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
net: usb: qmi_wwan: add Quectel RM520N
net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
stmmac: intel: Simplify intel_eth_pci_remove()
net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
ipv6: sr: fix out-of-bounds read when setting HMAC data.
bonding: accept unsolicited NA message
bonding: add all node mcast address when slave up
bonding: use unspecified address if no available link local address
wifi: use struct_group to copy addresses
wifi: mac80211_hwsim: check length for virtio packets
...
Linus Torvalds [Wed, 31 Aug 2022 16:46:12 +0000 (09:46 -0700)]
fs: only do a memory barrier for the first set_buffer_uptodate()
Commit
d4252071b97d ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression
although you presumably need a fast disk and a lot of cores to actually
notice.
Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Fengwei Yin <fengwei.yin@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 8 Sep 2022 11:37:38 +0000 (07:37 -0400)]
Merge tag 'efi-urgent-for-v6.0-1' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"A couple of low-priority EFI fixes:
- prevent the randstruct plugin from re-ordering EFI protocol
definitions
- fix a use-after-free in the capsule loader
- drop unused variable"
* tag 'efi-urgent-for-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: capsule-loader: Fix use-after-free in efi_capsule_write
efi/x86: libstub: remove unused variable
efi: libstub: Disable struct randomization
Toke Høiland-Jørgensen [Mon, 5 Sep 2022 19:21:36 +0000 (21:21 +0200)]
sch_sfb: Also store skb len before calling child enqueue
Cong Wang noticed that the previous fix for sch_sfb accessing the queued
skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
function was also calling qdisc_qstats_backlog_inc() after enqueue, which
reads the pkt len from the skb cb field. Fix this by also storing the skb
len, and using the stored value to increment the backlog after enqueueing.
Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Mon, 5 Sep 2022 15:27:50 +0000 (20:57 +0530)]
net: phy: lan87xx: change interrupt src of link_up to comm_ready
Currently phy link up/down interrupt is enabled using the
LAN87xx_INTERRUPT_MASK register. In the lan87xx_read_status function,
phy link is determined using the T1_MODE_STAT_REG register comm_ready bit.
comm_ready bit is set using the loc_rcvr_status & rem_rcvr_status.
Whenever the phy link is up, LAN87xx_INTERRUPT_SOURCE link_up bit is set
first but comm_ready bit takes some time to set based on local and
remote receiver status.
As per the current implementation, interrupt is triggered using link_up
but the comm_ready bit is still cleared in the read_status function. So,
link is always down. Initially tested with the shared interrupt
mechanism with switch and internal phy which is working, but after
implementing interrupt controller it is not working.
It can fixed either by updating the read_status function to read from
LAN87XX_INTERRUPT_SOURCE register or enable the interrupt mask for
comm_ready bit. But the validation team recommends the use of comm_ready
for link detection.
This patch fixes by enabling the comm_ready bit for link_up in the
LAN87XX_INTERRUPT_MASK_2 register (MISC Bank) and link_down in
LAN87xx_INTERRUPT_MASK register.
Fixes: 8a1b415d70b7 ("net: phy: added ethtool master-slave configuration support")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220905152750.5079-1-arun.ramadoss@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hyunwoo Kim [Wed, 7 Sep 2022 16:07:14 +0000 (09:07 -0700)]
efi: capsule-loader: Fix use-after-free in efi_capsule_write
A race condition may occur if the user calls close() on another thread
during a write() operation on the device node of the efi capsule.
This is a race condition that occurs between the efi_capsule_write() and
efi_capsule_flush() functions of efi_capsule_fops, which ultimately
results in UAF.
So, the page freeing process is modified to be done in
efi_capsule_release() instead of efi_capsule_flush().
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Yacan Liu [Tue, 6 Sep 2022 13:01:39 +0000 (21:01 +0800)]
net/smc: Fix possible access to freed memory in link clear
After modifying the QP to the Error state, all RX WR would be completed
with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not
wait for it is done, but destroy the QP and free the link group directly.
So there is a risk that accessing the freed memory in tasklet context.
Here is a crash example:
BUG: unable to handle page fault for address:
ffffffff8f220860
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD
f7300e067 P4D
f7300e067 PUD
f7300f063 PMD
8c4e45063 PTE
800ffff08c9df060
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23
Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
RSP: 0018:
ffffb3b6c001ebd8 EFLAGS:
00010086
RAX:
ffffffff8f220860 RBX:
0000000000000246 RCX:
0000000000080000
RDX:
ffff91db1f86c800 RSI:
000000000000173c RDI:
ffff91db62bace00
RBP:
ffff91db62bacc00 R08:
0000000000000000 R09:
c00000010000028b
R10:
0000000000055198 R11:
ffffb3b6c001ea58 R12:
ffff91db80e05010
R13:
000000000000000a R14:
0000000000000006 R15:
0000000000000040
FS:
0000000000000000(0000) GS:
ffff91db1f840000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffff8f220860 CR3:
00000001f9580004 CR4:
00000000003706e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<IRQ>
_raw_spin_lock_irqsave+0x30/0x40
mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
tasklet_action_common.isra.21+0x66/0x100
__do_softirq+0xd5/0x29c
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x37/0x40
irq_exit_rcu+0x9d/0xa0
sysvec_call_function_single+0x34/0x80
asm_sysvec_call_function_single+0x12/0x20
Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Mon, 5 Sep 2022 12:41:28 +0000 (14:41 +0200)]
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
Even if max hash configured in hw in mtk_ppe_hash_entry is
MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in
mtk_ppe_check_skb routine
Fixes: c4f033d9e03e9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Mon, 5 Sep 2022 03:50:15 +0000 (11:50 +0800)]
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
As Eric reported, the 'reason' field is not presented when trace the
kfree_skb event by perf:
$ perf record -e skb:kfree_skb -a sleep 10
$ perf script
ip_defrag 14605 [021] 221.614303: skb:kfree_skb:
skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1
reason:
The cause seems to be passing kernel address directly to TP_printk(),
which is not right. As the enum 'skb_drop_reason' is not exported to
user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason
string from the 'reason' field, which is a number.
Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used
to define the trace enum by TRACE_DEFINE_ENUM(). With the help of
DEFINE_DROP_REASON(), now we can remove the auto-generate that we
introduced in the commit
ec43908dd556
("net: skb: use auto-generation to convert skb drop reason to string"),
and define the string array 'drop_reasons'.
Hmmmm...now we come back to the situation that have to maintain drop
reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they
are both in dropreason.h, which makes it easier.
After this commit, now the format of kfree_skb is like this:
$ cat /tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 1524
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:void * location; offset:16; size:8; signed:0;
field:unsigned short protocol; offset:24; size:2; signed:0;
field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......
Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string")
Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 6 Sep 2022 14:36:32 +0000 (16:36 +0200)]
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine.
Fixes: 33fc42de33278 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 7 Sep 2022 08:26:18 +0000 (10:26 +0200)]
netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find()
nf_osf_find() incorrectly returns true on mismatch, this leads to
copying uninitialized memory area in nft_osf which can be used to leak
stale kernel stack data to userspace.
Fixes: 22c7652cdaa8 ("netfilter: nft_osf: Add version option support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
David Leadbeater [Fri, 26 Aug 2022 04:56:57 +0000 (14:56 +1000)]
netfilter: nf_conntrack_irc: Tighten matching on DCC message
CTCP messages should only be at the start of an IRC message, not
anywhere within it.
While the helper only decodes packes in the ORIGINAL direction, its
possible to make a client send a CTCP message back by empedding one into
a PING request. As-is, thats enough to make the helper believe that it
saw a CTCP message.
Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Wed, 31 Aug 2022 13:12:45 +0000 (15:12 +0200)]
selftests: nft_concat_range: add socat support
There are different flavors of 'nc' around, this script fails on
my test vm because 'nc' is 'nmap-ncat' which isn't 100% compatible.
Add socat support and use it if available.
Signed-off-by: Florian Westphal <fw@strlen.de>
Igor Ryzhov [Wed, 5 Jun 2019 09:32:40 +0000 (12:32 +0300)]
netfilter: nf_conntrack_sip: fix ct_sip_walk_headers
ct_sip_next_header and ct_sip_get_header return an absolute
value of matchoff, not a shift from current dataoff.
So dataoff should be assigned matchoff, not incremented by it.
This issue can be seen in the scenario when there are multiple
Contact headers and the first one is using a hostname and other headers
use IP addresses. In this case, ct_sip_walk_headers will work as follows:
The first ct_sip_get_header call to will find the first Contact header
but will return -1 as the header uses a hostname. But matchoff will
be changed to the offset of this header. After that, dataoff should be
set to matchoff, so that the next ct_sip_get_header call find the next
Contact header. But instead of assigning dataoff to matchoff, it is
incremented by it, which is not correct, as matchoff is an absolute
value of the offset. So on the next call to the ct_sip_get_header,
dataoff will be incorrect, and the next Contact header may not be
found at all.
Fixes: 05e3ced297fe ("[NETFILTER]: nf_conntrack_sip: introduce SIP-URI parsing helper")
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
David S. Miller [Wed, 7 Sep 2022 12:44:04 +0000 (13:44 +0100)]
Merge branch 'dsa-felix-fixes'
Vladimir Oltean says:
====================
Fixes for Felix DSA driver calculation of tc-taprio guard bands
This series fixes some bugs which are not quite new, but date from v5.13
when static guard bands were enabled by Michael Walle to prevent
tc-taprio overruns.
The investigation started when Xiaoliang asked privately what is the
expected max SDU for a traffic class when its minimum gate interval is
10 us. The answer, as it turns out, is not an L1 size of 1250 octets,
but 1245 octets, since otherwise, the switch will not consider frames
for egress scheduling, because the static guard band is exactly as large
as the time interval. The switch needs a minimum of 33 ns outside of the
guard band to consider a frame for scheduling, and the reduction of the
max SDU by 5 provides exactly for that.
The fix for that (patch 1/3) is relatively small, but during testing, it
became apparent that cut-through forwarding prevents oversized frame
dropping from working properly. This is solved through the larger patch
2/3. Finally, patch 3/3 fixes one more tc-taprio locking problem found
through code inspection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Sep 2022 17:01:25 +0000 (20:01 +0300)]
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set()
runs unlocked with respect to the other functions that access it, which
are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and
vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock,
so move the vsc9959_sched_speed_set() access under that lock as well, to
resolve the concurrency.
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Sep 2022 17:01:24 +0000 (20:01 +0300)]
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that don't
support either cut-through nor tc-taprio, and we don't have a felix
specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>