Dan Carpenter [Sat, 19 Jun 2021 13:47:38 +0000 (16:47 +0300)]
net: hns3: fix different snprintf() limit
This patch doesn't affect runtime at all, it's just a correctness issue.
The ptp->info.name[] buffer has 16 characters but the snprintf() limit
was capped at 32 characters. Fortunately, HCLGE_DRIVER_NAME is "hclge"
which isn't close to 16 characters so we're fine.
Fixes:
0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Jun 2021 20:25:04 +0000 (13:25 -0700)]
selftests: tls: fix chacha+bidir tests
ChaCha support did not adjust the bidirectional test.
We need to set up KTLS in reverse direction correctly,
otherwise these two cases will fail:
tls.12_chacha.bidir
tls.13_chacha.bidir
Fixes:
4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Jun 2021 20:25:03 +0000 (13:25 -0700)]
selftests: tls: clean up uninitialized warnings
A bunch of tests uses uninitialized stack memory as random
data to send. This is harmless but generates compiler warnings.
Explicitly init the buffers with random data.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 19 Jun 2021 02:47:02 +0000 (19:47 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Sat, 19 Jun 2021 01:55:29 +0000 (18:55 -0700)]
Merge tag 'net-5.13-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
queue should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking (staging:
rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets are
validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing
wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel
egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming"
* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
net: ethernet: fix potential use-after-free in ec_bhf_remove
selftests/net: Add icmp.sh for testing ICMP dummy address responses
icmp: don't send out ICMP messages with a source address of 0.0.0.0
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
net: ll_temac: Fix TX BD buffer overwrite
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Make sure to free skb when it is completely used
MAINTAINERS: add Guvenc as SMC maintainer
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_en: Fix TQM fastpath ring backing store computation
bnxt_en: Rediscover PHY capabilities after firmware reset
cxgb4: fix wrong shift.
mac80211: handle various extensible elements correctly
mac80211: reset profile_periodicity/ema_ap
cfg80211: avoid double free of PMSR request
cfg80211: make certificate generation more robust
mac80211: minstrel_ht: fix sample time check
net: qed: Fix memcpy() overflow of qed_dcbx_params()
net: cdc_eem: fix tx fixup skb leak
net: hamradio: fix memory leak in mkiss_close
...
Linus Torvalds [Fri, 18 Jun 2021 23:39:03 +0000 (16:39 -0700)]
Merge tag 'for-5.13-rc6-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix, for a space accounting bug in zoned mode. It happens
when a block group is switched back rw->ro and unusable bytes (due to
zoned constraints) are subtracted twice.
It has user visible effects so I consider it important enough for late
-rc inclusion and backport to stable"
* tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix negative space_info->bytes_readonly
Linus Torvalds [Fri, 18 Jun 2021 20:54:11 +0000 (13:54 -0700)]
Merge tag 'pci-v5.13-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Clear 64-bit flag for host bridge windows below 4GB to fix a resource
allocation regression added in -rc1 (Punit Agrawal)
- Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter)
- Avoid secondary bus resets on TI KeyStone C667X devices (Antti
Järvinen)
- Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni)
- Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun)
- Avoid broken ATS on AMD Navi14 GPU (Evan Quan)
- Trust Broadcom BCM57414 NIC to isolate functions even though it
doesn't advertise ACS support (Sriharsha Basavapatna)
- Work around AMD RS690 BIOSes that don't configure DMA above 4GB
(Mikel Rychliski)
- Fix panic during PIO transfer on Aardvark controller (Pali Rohár)
* tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: aardvark: Fix kernel panic during PIO transfer
PCI: Add AMD RS690 quirk to enable 64-bit DMA
PCI: Add ACS quirk for Broadcom BCM57414 NIC
PCI: Mark AMD Navi14 GPU ATS as broken
PCI: Work around Huawei Intelligent NIC VF FLR erratum
PCI: Mark some NVIDIA GPUs to avoid bus reset
PCI: Mark TI C667X to avoid bus reset
PCI: tegra194: Fix MCFG quirk build regressions
PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
Matthew Wilcox (Oracle) [Wed, 16 Jun 2021 21:22:28 +0000 (22:22 +0100)]
afs: Re-enable freezing once a page fault is interrupted
If a task is killed during a page fault, it does not currently call
sb_end_pagefault(), which means that the filesystem cannot be frozen
at any time thereafter. This may be reported by lockdep like this:
====================================
WARNING: fsstress/10757 still has locks held!
5.13.0-rc4-build4+ #91 Not tainted
------------------------------------
1 lock held by fsstress/10757:
#0:
ffff888104eac530
(
sb_pagefaults
as filesystem freezing is modelled as a lock.
Fix this by removing all the direct returns from within the function,
and using 'ret' to indicate whether we were interrupted or successful.
Fixes:
1cf7a1518aef ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Fri, 18 Jun 2021 20:13:40 +0000 (13:13 -0700)]
Merge branch 'RPMSG-WWAN-CTRL-driver'
Stephan Gerhold says:
====================
net: wwan: Add RPMSG WWAN CTRL driver
This patch series adds a WWAN "control" driver for the remote processor
messaging (rpmsg) subsystem. This subsystem allows communicating with
an integrated modem DSP on many Qualcomm SoCs, e.g. MSM8916 or MSM8974.
The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.
For more information, see commit message in PATCH 2/3.
I already posted a RFC for this a while ago:
https://lore.kernel.org/linux-arm-msm/YLfL9Q+4860uqS8f@gerhold.net/
and now I'm looking for some feedback for the actual changes. :)
Changes in v3:
- PATCH 2/3: Clarify commit message
- PATCH 3/3: Fix build error for cdc-wdm.c, use extra tx_blocking() op instead
v2: https://lore.kernel.org/netdev/
20210618075243.42046-1-stephan@gerhold.net/
Changes in v2: Only in PATCH 3/3
- Fix EPOLLOUT being always set even if poll op is defined
- Rename poll() op -> tx_poll() since it should be only used for TX
v1: https://lore.kernel.org/netdev/
20210615133229.213064-1-stephan@gerhold.net/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:11 +0000 (19:36 +0200)]
net: wwan: Allow WWAN drivers to provide blocking tx and poll function
At the moment, the WWAN core provides wwan_port_txon/off() to implement
blocking writes. The tx() port operation should not block, instead
wwan_port_txon/off() should be called when the TX queue is full or has
free space again.
However, in some cases it is not straightforward to make use of that
functionality. For example, the RPMSG API used by rpmsg_wwan_ctrl.c
does not provide any way to be notified when the TX queue has space
again. Instead, it only provides the following operations:
- rpmsg_send(): blocking write (wait until there is space)
- rpmsg_trysend(): non-blocking write (return error if no space)
- rpmsg_poll(): set poll flags depending on TX queue state
Generally that's totally sufficient for implementing a char device,
but it does not fit well to the currently provided WWAN port ops.
Most of the time, using the non-blocking rpmsg_trysend() in the
WWAN tx() port operation works just fine. However, with high-frequent
writes to the char device it is possible to trigger a situation
where this causes issues. For example, consider the following
(somewhat unrealistic) example:
# dd if=/dev/zero bs=1000 of=/dev/wwan0qmi0
dd: error writing '/dev/wwan0qmi0': Resource temporarily unavailable
1+0 records out
This fails immediately after writing the first record. It's likely
only a matter of time until this triggers issues for some real application
(e.g. ModemManager sending a lot of large QMI packets).
The rpmsg_char device does not have this problem, because it uses
rpmsg_trysend() and rpmsg_poll() to support non-blocking operations.
Make it possible to use the same in the RPMSG WWAN driver by adding
two new optional wwan_port_ops:
- tx_blocking(): send data blocking if allowed
- tx_poll(): set additional TX poll flags
This integrates nicely with the RPMSG API and does not require
any change in existing WWAN drivers.
With these changes, the dd example above blocks instead of exiting
with an error.
Cc: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:10 +0000 (19:36 +0200)]
net: wwan: Add RPMSG WWAN CTRL driver
The remote processor messaging (rpmsg) subsystem provides an interface
to communicate with other remote processors. On many Qualcomm SoCs this
is used to communicate with an integrated modem DSP that implements most
of the modem functionality and provides high-level protocols like
QMI or AT to allow controlling the modem.
For QMI, most older Qualcomm SoCs (e.g. MSM8916/MSM8974) have
a standalone "DATA5_CNTL" channel that allows exchanging QMI messages.
Note that newer SoCs (e.g. SDM845) only allow exchanging QMI messages
via a shared QRTR channel that is available via a socket API on Linux.
For AT, the "DATA4" channel accepts at least a limited set of AT
commands, on many older and newer Qualcomm SoCs, although QMI is
typically the preferred control protocol.
Often there are additional QMI/AT channels (usually named DATA*_CNTL
for QMI and DATA* for AT), but it is not clear if those are really
functional on all devices. Also, at the moment there is no use case
for having multiple QMI/AT ports. If needed more channels could be
added later after more testing.
Note that the data path (network interface) is entirely separate
from the control path and varies between Qualcomm SoCs, e.g. "IPA"
on newer Qualcomm SoCs or "BAM-DMUX" on some older ones.
The RPMSG WWAN CTRL driver exposes the QMI/AT control ports via the
WWAN subsystem, and therefore allows userspace like ModemManager to
set up the modem. Until now, ModemManager had to use the RPMSG-specific
rpmsg-char where the channels must be explicitly exposed as a char
device first and don't show up directly in sysfs.
The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephan Gerhold [Fri, 18 Jun 2021 17:36:09 +0000 (19:36 +0200)]
rpmsg: core: Add driver_data for rpmsg_device_id
Most device_id structs provide a driver_data field that can be used
by drivers to associate data more easily for a particular device ID.
Add the same for the rpmsg_device_id.
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 20:10:36 +0000 (13:10 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Jesse Brandeburg says:
====================
100GbE Intel Wired LAN Driver Updates 2021-06-18
Update three of the Intel Ethernet drivers with similar (but not the
same) improvements to simplify the packet type table init, while removing
an unused structure entry. For the ice driver, the table is extended
to 10 bits, which is the hardware limit, and for now is initialized
to zero.
The end result is slightly reduced memory usage, removal of a bunch
of code, and more specific initialization.
====================
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
David S. Miller [Fri, 18 Jun 2021 20:02:45 +0000 (13:02 -0700)]
Revert "net: add pf_family_names[] for protocol family"
This reverts commit
1f3c98eaddec857e16a7a1c6cd83317b3dc89438.
Does not build...
Signed-off-by: David S. Miller <davem@davemloft.net>
Yejune Deng [Fri, 18 Jun 2021 14:32:47 +0000 (22:32 +0800)]
net: add pf_family_names[] for protocol family
Modify the pr_info content from int to char *, this looks more readable.
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Fri, 18 Jun 2021 13:49:02 +0000 (16:49 +0300)]
net: ethernet: fix potential use-after-free in ec_bhf_remove
static void ec_bhf_remove(struct pci_dev *dev)
{
...
struct ec_bhf_priv *priv = netdev_priv(net_dev);
unregister_netdev(net_dev);
free_netdev(net_dev);
pci_iounmap(dev, priv->dma_io);
pci_iounmap(dev, priv->io);
...
}
priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after pci_iounmap()
calls.
Fixes:
6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:59:53 +0000 (12:59 -0700)]
Merge branch 'csock-seqpoacket-small-fixes'
Stefano Garzarella says:
====================
vsock: small fixes for seqpacket support
This series contains few patches to clean up a bit the code
of seqpacket recently merged in the net-next tree.
No functionality changes.
====================
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:26 +0000 (15:35 +0200)]
vsock/virtio: remove redundant `copy_failed` variable
When memcpy_to_msg() fails in virtio_transport_seqpacket_do_dequeue(),
we already set `dequeued_len` with the negative error value returned
by memcpy_to_msg().
So we can directly check `dequeued_len` value instead of using a
dedicated flag variable to skip the copy path for the rest of
fragments.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:25 +0000 (15:35 +0200)]
vsock: rename vsock_wait_data()
vsock_wait_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_wait_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 18 Jun 2021 13:35:24 +0000 (15:35 +0200)]
vsock: rename vsock_has_data()
vsock_has_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_has_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Fri, 18 Jun 2021 08:52:26 +0000 (16:52 +0800)]
NFC: nxp-nci: remove unnecessary label
Remove unnecessary label chunk_exit and return directly.
Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 13:48:12 +0000 (16:48 +0300)]
net: dsa: sja1105: completely error out in sja1105_static_config_reload if something fails
If reloading the static config fails for whatever reason, for example if
sja1105_static_config_check_valid() fails, then we "goto out_unlock_ptp"
but we print anyway that "Reset switch and programmed static config.",
which is confusing because we didn't. We also do a bunch of other stuff
like reprogram the XPCS and reload the credit-based shapers, as if a
switch reset took place, which didn't.
So just unlock the PTP lock and goto out, skipping all of that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 13:44:00 +0000 (16:44 +0300)]
net: dsa: sja1105: allow the TTEthernet configuration in the static config for SJA1110
Currently sja1105_static_config_check_valid() is coded up to detect
whether TTEthernet is supported based on device ID, and this check was
not updated to cover SJA1110.
However, it is desirable to have as few checks for the device ID as
possible, so the driver core is more generic. So what we can do is look
at the static config table operations implemented by that specific
switch family (populated by sja1105_static_config_init) whether the
schedule table has a non-zero maximum entry count (meaning that it is
supported) or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 18 Jun 2021 12:09:45 +0000 (20:09 +0800)]
net: hns3: fix reuse conflict of the rx page
In the current rx page reuse handling process, the rx page buffer may
have conflict between driver and stack in high-pressure scenario.
To fix this problem, we need to check whether the page is only owned
by driver at the begin and at the end of a page to make sure there is
no reuse conflict between driver and stack when desc_cb->page_offset
is rollbacked to zero or increased.
Fixes:
fa7711b888f2 ("net: hns3: optimize the rx page reuse handling process")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 18 Jun 2021 11:52:54 +0000 (14:52 +0300)]
net: dsa: sja1105: properly power down the microcontroller clock for SJA1110
It turns out that powering down the BASE_TIMER_CLK does not turn off the
microcontroller, just its timers, including the one for the watchdog.
So the embedded microcontroller is still running, and potentially still
doing things.
To prevent unwanted interference, we should power down the BASE_MCSS_CLK
as well (MCSS = microcontroller subsystem).
The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110
from the .clocking_setup() method, mostly because this is a Clock
Generation Unit (CGU) setting which was traditionally configured in that
method for SJA1105. But in SJA1105, the CGU was used for bringing up the
port clocks at the proper speeds, and in SJA1110 it's not (but rather
for initial configuration), so it's best that we rebrand the
sja1110_clocking_setup() method into what it really is - an implementation
of the .disable_microcontroller() method.
Since disabling the microcontroller only needs to be done once, at probe
time, we can choose the best place to do that as being in sja1105_setup(),
before we upload the static config to the device. This guarantees that
the static config being used by the switch afterwards is really ours.
Note that the procedure to upload a static config necessarily resets the
switch. This already did not reset the microcontroller, only the switch
core, so since the .disable_microcontroller() method is guaranteed to be
called by that point, if it's disabled, it remains disabled. Add a
comment to make that clear.
With the code movement for SJA1110 from .clocking_setup() to
.disable_microcontroller(), both methods are optional and are guarded by
"if" conditions.
Tested by enabling in the device tree the rev-mii switch port 0 that
goes towards the microcontroller, and flashing a firmware that would
have networking. Without this patch, the microcontroller can be pinged,
with this patch it cannot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:22:55 +0000 (12:22 -0700)]
Merge tag 'mac80211-for-net-2021-06-18' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of straggler fixes:
* a minstrel HT sample check fix
* peer measurement could double-free on races
* certificate file generation at build time could
sometimes hang
* some parameters weren't reset between connections
in mac80211
* some extensible elements were treated as non-
extensible, possibly causuing bad connections
(or failures) if the AP adds data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:36 +0000 (13:04 +0200)]
selftests/net: Add icmp.sh for testing ICMP dummy address responses
This adds a new icmp.sh selftest for testing that the kernel will respond
correctly with an ICMP unreachable message with the dummy (192.0.0.8)
source address when there are no IPv4 addresses configured to use as source
addresses.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:35 +0000 (13:04 +0200)]
icmp: don't send out ICMP messages with a source address of 0.0.0.0
When constructing ICMP response messages, the kernel will try to pick a
suitable source address for the outgoing packet. However, if no IPv4
addresses are configured on the system at all, this will fail and we end up
producing an ICMP message with a source address of 0.0.0.0. This can happen
on a box routing IPv4 traffic via v6 nexthops, for instance.
Since 0.0.0.0 is not generally routable on the internet, there's a good
chance that such ICMP messages will never make it back to the sender of the
original packet that the ICMP message was sent in response to. This, in
turn, can create connectivity and PMTUd problems for senders. Fortunately,
RFC7600 reserves a dummy address to be used as a source for ICMP
messages (192.0.0.8/32), so let's teach the kernel to substitute that
address as a last resort if the regular source address selection procedure
fails.
Below is a quick example reproducing this issue with network namespaces:
ip netns add ns0
ip l add type veth peer netns ns0
ip l set dev veth0 up
ip a add 10.0.0.1/24 dev veth0
ip a add fc00:dead:cafe:42::1/64 dev veth0
ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
ip -n ns0 l set dev veth0 up
ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
tcpdump -tpni veth0 -c 2 icmp &
ping -w 1 10.1.0.1 > /dev/null
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
2 packets captured
2 packets received by filter
0 packets dropped by kernel
With this patch the above capture changes to:
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Juliusz Chroboczek <jch@irif.fr>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:38 +0000 (12:52 +0200)]
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:33 +0000 (12:52 +0200)]
net: ll_temac: Fix TX BD buffer overwrite
Just as the initial check, we need to ensure num_frag+1 buffers available,
as that is the number of buffers we are going to use.
This fixes a buffer overflow, which might be seen during heavy network
load. Complete lockup of TEMAC was reproducible within about 10 minutes of
a particular load.
Fixes:
84823ff80f74 ("net: ll_temac: Fix race condition causing TX hang")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:28 +0000 (12:52 +0200)]
net: ll_temac: Add memory-barriers for TX BD access
Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.
In xmit_done, we should ensure that reading the additional BD fields are
only done after STS_CTRL_APP0_CMPLT bit is set.
When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.
Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 18 Jun 2021 10:52:23 +0000 (12:52 +0200)]
net: ll_temac: Make sure to free skb when it is completely used
With the skb pointer piggy-backed on the TX BD, we have a simple and
efficient way to free the skb buffer when the frame has been transmitted.
But in order to avoid freeing the skb while there are still fragments from
the skb in use, we need to piggy-back on the TX BD of the skb, not the
first.
Without this, we are doing use-after-free on the DMA side, when the first
BD of a multi TX BD packet is seen as completed in xmit_done, and the
remaining BDs are still being processed.
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 10:19:19 +0000 (11:19 +0100)]
qlcnic: remove redundant continue statement
The continue statement at the end of a for-loop has no effect,
it is redundant and can be removed.
Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 10:01:55 +0000 (11:01 +0100)]
net: bridge: remove redundant continue statement
The continue statement at the end of a for-loop has no effect,
invert the if expression and remove the continue.
Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 18 Jun 2021 09:44:25 +0000 (10:44 +0100)]
net: stmmac: remove redundant continue statement
The continue statement in the for-loop has no effect, remove it.
Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Fri, 18 Jun 2021 09:35:26 +0000 (11:35 +0200)]
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove
Commit
0571a753cb07 cancelled delayed work too late, keeping small
race. Cancel work sooner to close it completely.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes:
0571a753cb07 ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove")
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Fri, 18 Jun 2021 09:10:16 +0000 (17:10 +0800)]
NFC: nxp-nci: remove unnecessary labels
Simplify the code by removing unnecessary labels and returning directly.
Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingsenjie [Fri, 18 Jun 2021 07:34:31 +0000 (15:34 +0800)]
ethernet: marvell/octeontx2: Simplify the return expression of npc_is_same
Simplify the return expression in the rvu_npc_fs.c
Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongliang Mu [Fri, 18 Jun 2021 07:32:04 +0000 (15:32 +0800)]
net: caif: modify the label out_err to out
Modify the label out_err to out to avoid the meanless kfree.
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 18 Jun 2021 07:00:30 +0000 (09:00 +0200)]
MAINTAINERS: add Guvenc as SMC maintainer
Add Guvenc as maintainer for Shared Memory Communications (SMC)
Sockets.
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 19:00:27 +0000 (12:00 -0700)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
This patchset includes 3 small bug fixes to reinitialize PHY capabilities
after firmware reset, setup the chip's internal TQM fastpath ring
backing memory properly for RoCE traffic, and to free ethtool related
memory if driver probe fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Fri, 18 Jun 2021 06:07:27 +0000 (02:07 -0400)]
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_ethtool_init() may have allocated some memory and we need to
call bnxt_ethtool_free() to properly unwind if bnxt_init_one()
fails.
Fixes:
7c3809181468 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rukhsana Ansari [Fri, 18 Jun 2021 06:07:26 +0000 (02:07 -0400)]
bnxt_en: Fix TQM fastpath ring backing store computation
TQM fastpath ring needs to be sized to store both the requester
and responder side of RoCE QPs in TQM for supporting bi-directional
tests. Fix bnxt_alloc_ctx_mem() to multiply the RoCE QPs by a factor of
2 when computing the number of entries for TQM fastpath ring. This
fixes an RX pipeline stall issue when running bi-directional max
RoCE QP tests.
Fixes:
c7dd7ab4b204 ("bnxt_en: Improve TQM ring context memory sizing formulas.")
Signed-off-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 18 Jun 2021 06:07:25 +0000 (02:07 -0400)]
bnxt_en: Rediscover PHY capabilities after firmware reset
There is a missing bnxt_probe_phy() call in bnxt_fw_init_one() to
rediscover the PHY capabilities after a firmware reset. This can cause
some PHY related functionalities to fail after a firmware reset. For
example, in multi-host, the ability for any host to configure the PHY
settings may be lost after a firmware reset.
Fixes:
ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Jun 2021 04:55:56 +0000 (21:55 -0700)]
net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features
Currently UDP tunnel devices on top of VLANs lose the ability
to offload UDP GSO. Widen the pass thru features from TSO
to all GSO_SOFTWARE.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Fri, 18 Jun 2021 09:29:48 +0000 (11:29 +0200)]
cxgb4: fix wrong shift.
While fixing coverity warning, commit
dd2c79677375 introduced typo in
shift value. Fix that.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes:
dd2c79677375 ("cxgb4: Fix unintentional sign extension issues")
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:37 +0000 (10:53 +0800)]
dt-bindings: dwmac: Add bindings for new Loongson SoC and bridge chip
Add the dwmac bindings for the Loongson-2K SoC and the LS7A
bridge chip.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:36 +0000 (10:53 +0800)]
MIPS: Loongson64: DTS: Add GMAC support for LS7A PCH
The GMAC module is now supported, enable it.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:35 +0000 (10:53 +0800)]
MIPS: Loongson64: Add GMAC support for Loongson-2K1000
The GMAC module is now supported, enable it.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Zhang [Fri, 18 Jun 2021 02:53:34 +0000 (10:53 +0800)]
stmmac: pci: Add dwmac support for Loongson
This GMAC module is integrated into the Loongson-2K SoC and the LS7A
bridge chip.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 18:42:40 +0000 (11:42 -0700)]
Merge branch 'hostess_sv11-cleanups'
Peng Li says:
====================
net: hostess_sv11: clean up some code style issues
This patchset clean up some code style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:24 +0000 (10:32 +0800)]
net: hostess_sv11: fix the alignment issue
Alignment should match open parenthesis.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:23 +0000 (10:32 +0800)]
net: hostess_sv11: fix the comments style issue
Networking block comments don't use an empty /* line,
use /* Comment...
Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.
This patch fixes the comments style issues.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:22 +0000 (10:32 +0800)]
net: hostess_sv11: remove dead code
This patch removes the dead code.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:21 +0000 (10:32 +0800)]
net: hostess_sv11: fix the code style issue about switch and case
According to the chackpatch.pl,
switch and case should be at the same indent.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:20 +0000 (10:32 +0800)]
net: hostess_sv11: remove trailing whitespace
This patch removes trailing whitespace.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:19 +0000 (10:32 +0800)]
net: hostess_sv11: move out assignment in if condition
Should not use assignment in if condition.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 18 Jun 2021 02:32:18 +0000 (10:32 +0800)]
net: hostess_sv11: fix the code style issue about "foo* bar"
Fix the checkpatch error as "foo* bar" should be "foo *bar".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 18:40:12 +0000 (11:40 -0700)]
Merge branch 'mptcp-dss-checksums'
Mat Martineau says:
====================
mptcp: DSS checksum support
RFC 8684 defines a DSS checksum feature that allows MPTCP to detect
middlebox interference with the MPTCP DSS header and the portion of the
data stream associated with that header. So far, the MPTCP
implementation in the Linux kernel has not supported this feature.
This patch series adds DSS checksum support. By default, the kernel will
not request checksums when sending SYN or SYN/ACK packets for MPTCP
connections. Outgoing checksum requests can be enabled with a
per-namespace net.mptcp.checksum_enabled sysctl. MPTCP connections will
now proceed with DSS checksums when the peer requests them, whether the
sysctl is enabled or not.
Patches 1-5 add checksum bits to the outgoing SYN, SYN/ACK, and data
packet headers. This includes calculating the checksum using a range of
data and the MPTCP DSS mapping for that data.
Patches 6-10 handle the checksum request in the SYN or SYN/ACK, and
receiving and verifying the DSS checksum on data packets.
Patch 11 adjusts the MPTCP-level retransmission process for checksum
compatibility.
Patches 12-14 add checksum-related MIBs, the net.mptcp.checksum_enabled
sysctl, and a checksum field to debug trace output.
Patches 15 & 16 add selftests.
The series is slightly longer than the preferred 15-patch limit that
patchwork warns about. I do try to stay below that whenever possible -
this series does implement one feature and is, I think, cohesive enough
to justify keeping it together. If it's at all problematic please let me
know!
A trivial merge conflict with net/master is introduced in patch 15: a
commit in net/master removes a couple of nearby lines of code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:22 +0000 (16:46 -0700)]
selftests: mptcp: enable checksum in mptcp_join.sh
This patch added a new argument "-C" for the mptcp_join.sh script to set
the sysctl checksum_enabled to 1 in ns1 and ns2 to enable the data
checksum.
In chk_join_nr, check the counter of the mib for the data checksum.
Also added a new argument "-S" for the mptcp_join.sh script to start the
test cases that verify the checksum handshake:
* Sender and listener both have checksums off
* Sender and listener both have checksums on
* Sender checksums off, listener checksums on
* Sender checksums on, listener checksums off
The output looks like this:
01 checksum test 0 0 sum[ ok ] - csum [ ok ]
02 checksum test 1 1 sum[ ok ] - csum [ ok ]
03 checksum test 0 1 sum[ ok ] - csum [ ok ]
04 checksum test 1 0 sum[ ok ] - csum [ ok ]
05 no JOIN syn[ ok ] - synack[ ok ] - ack[ ok ]
sum[ ok ] - csum [ ok ]
06 single subflow, limited by client syn[ ok ] - synack[ ok ] - ack[ ok ]
sum[ ok ] - csum [ ok ]
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:21 +0000 (16:46 -0700)]
selftests: mptcp: enable checksum in mptcp_connect.sh
This patch added a new argument "-C" for the mptcp_connect.sh script to
set the sysctl checksum_enabled to 1 in ns1, ns2, ns3 and ns4 to enable
the data checksum.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:20 +0000 (16:46 -0700)]
mptcp: dump csum fields in mptcp_dump_mpext
In mptcp_dump_mpext, dump the csum fields, csum and csum_reqd in struct
mptcp_dump_mpext too.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:19 +0000 (16:46 -0700)]
mptcp: add a new sysctl checksum_enabled
This patch added a new sysctl, named checksum_enabled, to control
whether DSS checksum can be enabled.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:18 +0000 (16:46 -0700)]
mptcp: add the mib for data checksum
This patch added the mib for the data checksum, MPTCP_MIB_DATACSUMERR.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 17 Jun 2021 23:46:17 +0000 (16:46 -0700)]
mptcp: tune re-injections for csum enabled mode
If the MPTCP-level checksum is enabled, on re-injections we
must spool a complete DSS, or the receive side will not be
able to compute the csum and process any data.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 17 Jun 2021 23:46:16 +0000 (16:46 -0700)]
mptcp: validate the data checksum
This patch added three new members named data_csum, csum_len and
map_csum in struct mptcp_subflow_context, implemented a new function
named mptcp_validate_data_checksum().
If the current mapping is valid and csum is enabled traverse the later
pending skbs and compute csum incrementally till the whole mapping has
been covered. If not enough data is available in the rx queue, return
MAPPING_EMPTY - that is, no data.
Next subflow_data_ready invocation will trigger again csum computation.
When the full DSS is available, validate the csum and return to the
caller an appropriate error code, to trigger subflow reset of fallback
as required by the RFC.
Additionally:
- if the csum prevence in the DSS don't match the negotiated value e.g.
csum present, but not requested, return invalid mapping to trigger
subflow reset.
- keep some csum state, to avoid re-compute the csum on the same data
when multiple rx queue traversal are required.
- clean-up the uncompleted mapping from the receive queue on close, to
allow proper subflow disposal
Co-developed-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:15 +0000 (16:46 -0700)]
mptcp: receive checksum for DSS
In mptcp_parse_option, adjust the expected_opsize, and always parse the
data checksum value from the receiving DSS regardless of csum presence.
Then save it in mp_opt->csum.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:14 +0000 (16:46 -0700)]
mptcp: receive checksum for MP_CAPABLE with data
This patch added a new member named csum in struct mptcp_options_received.
When parsing the MP_CAPABLE with data, if the checksum is enabled,
adjust the expected_opsize. If the receiving option length matches the
length with the data checksum, get the checksum value and save it in
mp_opt->csum. And in mptcp_incoming_options, pass it to mpext->csum.
We always parse any csum/nocsum combination and delay the presence check
to later code, to allow reset if missing.
Additionally, in the TX path, use the newly introduce ext field to avoid
MPTCP csum recomputation on TCP retransmission and unneeded csum update
on when setting the data fin_flag.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:13 +0000 (16:46 -0700)]
mptcp: add csum_reqd in mptcp_options_received
This patch added a new flag csum_reqd in struct mptcp_options_received, if
the flag MPTCP_CAP_CHECKSUM_REQD is set in the receiving MP_CAPABLE
suboption, set this flag.
In mptcp_sk_clone and subflow_finish_connect, if the csum_reqd flag is set,
enable the msk->csum_enabled flag.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:12 +0000 (16:46 -0700)]
mptcp: add sk parameter for mptcp_get_options
This patch added a new parameter name sk in mptcp_get_options().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:11 +0000 (16:46 -0700)]
mptcp: send out checksum for DSS
In mptcp_write_options, if the checksum is enabled, adjust the option
length and send out the data checksum with DSS suboption.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:10 +0000 (16:46 -0700)]
mptcp: send out checksum for MP_CAPABLE with data
If the checksum is enabled, send out the data checksum with the
MP_CAPABLE suboption with data.
In mptcp_established_options_mp, save the data checksum in
opts->ext_copy.csum. In mptcp_write_options, adjust the option length and
send it out with the MP_CAPABLE suboption.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:09 +0000 (16:46 -0700)]
mptcp: add csum_reqd in mptcp_out_options
This patch added a new member csum_reqd in struct mptcp_out_options and
struct mptcp_subflow_request_sock. Initialized it with the helper
function mptcp_is_checksum_enabled().
In mptcp_write_options, if this field is enabled, send out the MP_CAPABLE
suboption with the MPTCP_CAP_CHECKSUM_REQD flag.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:08 +0000 (16:46 -0700)]
mptcp: generate the data checksum
This patch added a new member named csum in struct mptcp_ext, implemented
a new function named mptcp_generate_data_checksum().
Generate the data checksum in mptcp_sendmsg_frag, save it in mpext->csum.
Note that we must generate the csum for zero window probe, too.
Do the csum update incrementally, to avoid multiple csum computation
when the data is appended to existing skb.
Note that in a later patch we will skip unneeded csum related operation.
Changes not included here to keep the delta small.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 17 Jun 2021 23:46:07 +0000 (16:46 -0700)]
mptcp: add csum_enabled in mptcp_sock
This patch added a new member named csum_enabled in struct mptcp_sock,
used a dummy mptcp_is_checksum_enabled() helper to initialize it.
Also added a new member named mptcpi_csum_enabled in struct mptcp_info
to expose the csum_enabled flag.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Jun 2021 18:35:47 +0000 (11:35 -0700)]
Merge branch 'seg6.end.dt6'
Andrea Mayer says:
====================
seg6: add support for SRv6 End.DT46 Behavior
SRv6 End.DT46 Behavior is defined in the IETF RFC 8986 [1] along with SRv6
End.DT4 and End.DT6 Behaviors.
The proposed End.DT46 implementation is meant to support the decapsulation
of both IPv4 and IPv6 traffic coming from a *single* SRv6 tunnel.
The SRv6 End.DT46 Behavior greatly simplifies the setup and operations of
SRv6 VPNs in the Linux kernel.
- patch 1/2 is the core patch that adds support for the SRv6 End.DT46
Behavior;
- patch 2/2 adds the selftest for SRv6 End.DT46 Behavior.
The patch introducing the new SRv6 End.DT46 Behavior in iproute2 will
follow shortly.
Comments, suggestions and improvements are very welcome as always!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Thu, 17 Jun 2021 17:16:45 +0000 (19:16 +0200)]
selftests: seg6: add selftest for SRv6 End.DT46 Behavior
this selftest is designed for evaluating the new SRv6 End.DT46 Behavior
used, in this example, for implementing IPv4/IPv6 L3 VPN use cases.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Thu, 17 Jun 2021 17:16:44 +0000 (19:16 +0200)]
seg6: add support for SRv6 End.DT46 Behavior
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In details, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. Such very minimal performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Thu, 17 Jun 2021 15:55:52 +0000 (18:55 +0300)]
Documentation: ACPI: DSD: fix block code comments
Use the '.. code-block:: none' to properly highlight the documented DSDT
entries. This also fixes warnings in the documentation build process.
Fixes:
e71305acd81c ("Documentation: ACPI: DSD: Document MDIO PHY")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Thu, 17 Jun 2021 15:55:51 +0000 (18:55 +0300)]
Documentation: ACPI: DSD: include phy.rst in the toctree
Include the new phy.rst into the index of the ACPI support
documentation.
Fixes:
e71305acd81c ("Documentation: ACPI: DSD: Document MDIO PHY")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 17 Jun 2021 12:14:49 +0000 (13:14 +0100)]
net: neterion: vxge: remove redundant continue statement
The continue statement at the end of a for-loop has no effect,
invert the if expression and remove the continue.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksandr Mazur [Thu, 17 Jun 2021 11:36:32 +0000 (14:36 +0300)]
drivers: net: netdevsim: fix devlink_trap selftests failing
devlink_trap tests for the netdevsim fail due to misspelled
debugfs file name. Change this name, as well as name of callback
function, to match the naming as in the devlink itself - 'trap_drop_counter'.
Test-results:
selftests: drivers/net/netdevsim: devlink_trap.sh
TEST: Initialization [ OK ]
TEST: Trap action [ OK ]
TEST: Trap metadata [ OK ]
TEST: Non-existing trap [ OK ]
TEST: Non-existing trap action [ OK ]
TEST: Trap statistics [ OK ]
TEST: Trap group action [ OK ]
TEST: Non-existing trap group [ OK ]
TEST: Trap group statistics [ OK ]
TEST: Trap policer [ OK ]
TEST: Trap policer binding [ OK ]
TEST: Port delete [ OK ]
TEST: Device delete [ OK ]
Fixes:
a7b3527a43fe ("drivers: net: netdevsim: add devlink trap_drop_counter_get implementation")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Jun 2021 18:09:23 +0000 (11:09 -0700)]
Merge tag 'arc-5.13-rc7-fixes' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- ARCv2 userspace ABI not populating a few registers
- Unbork CONFIG_HARDENED_USERCOPY for ARC
* tag 'arc-5.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: fix CONFIG_HARDENED_USERCOPY
ARCv2: save ABI registers across signal handling
Linus Torvalds [Fri, 18 Jun 2021 17:57:09 +0000 (10:57 -0700)]
Merge tag 'trace-v5.13-rc6' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have recordmcount check for valid st_shndx otherwise some archs may
have invalid references for the mcount location.
- Two fixes done for mapping pids to task names. Traces were not
showing the names of tasks when they should have.
- Fix to trace_clock_global() to prevent it from going backwards
* tag 'trace-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do no increment trace_clock_global() by one
tracing: Do not stop recording comms if the trace file is being read
tracing: Do not stop recording cmdlines when tracing is off
recordmcount: Correct st_shndx handling
Linus Torvalds [Fri, 18 Jun 2021 17:50:41 +0000 (10:50 -0700)]
Merge tag 'printk-for-5.13-fixup' of git://git./linux/kernel/git/printk/linux
Pull printk fixup from Petr Mladek:
"Fix misplaced EXPORT_SYMBOL(vsprintf)"
* tag 'printk-for-5.13-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Move EXPORT_SYMBOL() closer to vprintk definition
Linus Torvalds [Fri, 18 Jun 2021 17:42:36 +0000 (10:42 -0700)]
Merge tag 'pm-5.13-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Remove recently added frequency invariance support from the CPPC
cpufreq driver, because it has turned out to be problematic and it
cannot be fixed properly on time for 5.13 (Viresh Kumar)"
* tag 'pm-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "cpufreq: CPPC: Add support for frequency invariance"
Linus Torvalds [Fri, 18 Jun 2021 17:39:32 +0000 (10:39 -0700)]
Merge tag 'usb-5.13-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are three small USB fixes for reported problems for 5.13-rc7.
They include:
- disable autosuspend for a cypress USB hub
- fix the battery charger detection for the chipidea driver
- fix a kernel panic in the dwc3 driver due to a previous change in
5.13-rc1.
All have been in linux-next with no reported problems"
* tag 'usb-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: core: hub: Disable autosuspend for Cypress CY7C65632
usb: chipidea: imx: Fix Battery Charger 1.2 CDP detection
usb: dwc3: core: fix kernel panic when do reboot
Linus Torvalds [Fri, 18 Jun 2021 17:36:18 +0000 (10:36 -0700)]
Merge tag 'drm-fixes-2021-06-18' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Not much happening in fixes land this week only one PR for two amdgpu
powergating fixes was waiting for me, maybe something will show up
over the weekend, maybe not.
amdgpu:
- GFX9 and 10 powergating fixes"
* tag 'drm-fixes-2021-06-18' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell.
drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue.
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:07 +0000 (15:47 -0800)]
iavf: clean up packet type lookup table
Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:06 +0000 (15:47 -0800)]
i40e: clean up packet type lookup table
Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Tue, 23 Feb 2021 23:47:05 +0000 (15:47 -0800)]
ice: report hash type such as L2/L3/L4
The hardware is reporting the type of the hash used for RSS
as a PTYPE field in the receive descriptor. Use this value to set
the skb packet hash type by extending the hash type table to
cover all 10-bits of possible values (requiring some variables
to be changed from u8 to u16), and then use that table to convert
to one of the possible values in enum pkt_hash_types.
While we're here, remove the unused ptype struct value, which
makes table init easier for the zero entries, and use ranged
initializer to remove a bunch of code (works with gcc and clang).
Without this change, the kernel will recalculate the hash in software,
which can consume extra CPU cycles.
Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pali Rohár [Tue, 8 Jun 2021 20:36:55 +0000 (22:36 +0200)]
PCI: aardvark: Fix kernel panic during PIO transfer
Trying to start a new PIO transfer by writing value 0 in PIO_START register
when previous transfer has not yet completed (which is indicated by value 1
in PIO_START) causes an External Abort on CPU, which results in kernel
panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError
Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when
previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new
PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it
often happens during link retraining or after link down event) and special
hack was implemented in Trusted Firmware to catch all SError events in EL3,
to ignore errors with code 0xbf000002 and not forwarding any other errors
to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=
3c7dcdac5c50
https://lore.kernel.org/linux-pci/
20190316161243.29517-1-repk@triplefau.lt/
https://lore.kernel.org/linux-pci/
971be151d24312cc533989a64bd454b4@www.loen.fr/
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link
down event the PIO transfer may take longer time, up to the 1.44s until it
times out. This increased probability that a new PIO transfer would be
issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the
mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
Mikel Rychliski [Fri, 11 Jun 2021 21:48:23 +0000 (17:48 -0400)]
PCI: Add AMD RS690 quirk to enable 64-bit DMA
Although the AMD RS690 chipset has 64-bit DMA support, BIOS implementations
sometimes fail to configure the memory limit registers correctly.
The Acer F690GVM mainboard uses this chipset and a Marvell 88E8056 NIC. The
sky2 driver programs the NIC to use 64-bit DMA, which will not work:
sky2 0000:02:00.0: error interrupt status=0x8
sky2 0000:02:00.0 eth0: tx timeout
sky2 0000:02:00.0 eth0: transmit ring 0 .. 22 report=0 done=0
Other drivers required by this mainboard either don't support 64-bit DMA,
or have it disabled using driver specific quirks. For example, the ahci
driver has quirks to enable or disable 64-bit DMA depending on the BIOS
version (see ahci_sb600_enable_64bit() in ahci.c). This ahci quirk matches
against the SB600 SATA controller, but the real issue is almost certainly
with the RS690 PCI host that it was commonly attached to.
To avoid this issue in all drivers with 64-bit DMA support, fix the
configuration of the PCI host. If the kernel is aware of physical memory
above 4GB, but the BIOS never configured the PCI host with this
information, update the registers with our values.
[bhelgaas: drop PCI_DEVICE_ID_ATI_RS690 definition]
Link: https://lore.kernel.org/r/20210611214823.4898-1-mikel@mikelr.com
Signed-off-by: Mikel Rychliski <mikel@mikelr.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Sriharsha Basavapatna [Sat, 22 May 2021 01:13:17 +0000 (21:13 -0400)]
PCI: Add ACS quirk for Broadcom BCM57414 NIC
The Broadcom BCM57414 NIC may be a multi-function device. While it does
not advertise an ACS capability, peer-to-peer transactions are not possible
between the individual functions, so it is safe to treat them as fully
isolated.
Add an ACS quirk for this device so the functions can be in independent
IOMMU groups and attached individually to userspace applications using
VFIO.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Evan Quan [Wed, 2 Jun 2021 02:12:55 +0000 (10:12 +0800)]
PCI: Mark AMD Navi14 GPU ATS as broken
Observed unexpected GPU hang during runpm stress test on 0x7341 rev 0x00.
Further debugging shows broken ATS is related.
Disable ATS on this part. Similar issues on other devices:
a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20210602021255.939090-1-evan.quan@amd.com
Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Cc: stable@vger.kernel.org
Chiqijun [Mon, 24 May 2021 22:44:07 +0000 (17:44 -0500)]
PCI: Work around Huawei Intelligent NIC VF FLR erratum
pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum
time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the
FLR to complete. It assumes the FLR is complete when a config read returns
valid data.
When we do an FLR on several Huawei Intelligent NIC VFs at the same time,
firmware on the NIC processes them serially. The VF may respond to config
reads before the firmware has completed its reset processing. If we bind a
driver to the VF (e.g., by assigning the VF to a virtual machine) in the
interval between the successful config read and completion of the firmware
reset processing, the NIC VF driver may fail to load.
Prevent this driver failure by waiting for the NIC firmware to complete its
reset processing. Not all NIC firmware supports this feature.
[bhelgaas: commit log]
Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-occasionally-fail-to-load-the-in200-driver-when-the-vf-performs-flr
Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com
Signed-off-by: Chiqijun <chiqijun@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Shanker Donthineni [Tue, 8 Jun 2021 05:48:56 +0000 (11:18 +0530)]
PCI: Mark some NVIDIA GPUs to avoid bus reset
Some NVIDIA GPU devices do not work with SBR. Triggering SBR leaves the
device inoperable for the current system boot. It requires a system
hard-reboot to get the GPU device back to normal operating condition
post-SBR. For the affected devices, enable NO_BUS_RESET quirk to avoid the
issue.
This issue will be fixed in the next generation of hardware.
Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Cc: stable@vger.kernel.org
Antti Järvinen [Mon, 15 Mar 2021 10:26:06 +0000 (10:26 +0000)]
PCI: Mark TI C667X to avoid bus reset
Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
automatically disables LTSSM when Secondary Bus Reset is received and
device stops working. Prevent bus reset for these devices. With this
change, the device can be assigned to VMs with VFIO, but it will leak state
between VMs.
Reference: https://e2e.ti.com/support/processors/f/791/t/954382
Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com
Signed-off-by: Antti Järvinen <antti.jarvinen@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Cc: stable@vger.kernel.org
Jon Hunter [Thu, 10 Jun 2021 06:41:34 +0000 (07:41 +0100)]
PCI: tegra194: Fix MCFG quirk build regressions
7f100744749e ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata")
caused a few build regressions:
-
7f100744749e removed the Makefile rule for CONFIG_PCIE_TEGRA194, so
pcie-tegra.c can no longer be built as a module. Restore that rule.
-
7f100744749e added "#ifdef CONFIG_PCIE_TEGRA194" around the native
driver, but that's only set when the driver is built-in (for a module,
CONFIG_PCIE_TEGRA194_MODULE is defined).
The ACPI quirk is completely independent of the rest of the native
driver, so move the quirk to its own file and remove the #ifdef in the
native driver.
-
7f100744749e added symbols that are always defined but used only when
CONFIG_PCIEASPM, which causes warnings when CONFIG_PCIEASPM is not set:
drivers/pci/controller/dwc/pcie-tegra194.c:259:18: warning: ‘event_cntr_data_offset’ defined but not used [-Wunused-const-variable=]
drivers/pci/controller/dwc/pcie-tegra194.c:250:18: warning: ‘event_cntr_ctrl_offset’ defined but not used [-Wunused-const-variable=]
drivers/pci/controller/dwc/pcie-tegra194.c:243:27: warning: ‘pcie_gen_freq’ defined but not used [-Wunused-const-variable=]
Fixes:
7f100744749e ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata")
Link: https://lore.kernel.org/r/20210610064134.336781-1-jonathanh@nvidia.com
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Punit Agrawal [Mon, 14 Jun 2021 23:04:57 +0000 (08:04 +0900)]
PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
Alexandru and Qu reported this resource allocation failure on ROCKPro64 v2
and ROCK Pi 4B, both based on the RK3399:
pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
pci 0000:00:00.0: PCI bridge to [bus 01]
pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
"BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our PCI
allocation code isn't smart enough to allocate it in a host bridge window
marked as 64-bit, even though this should work fine.
A DT host bridge description includes the windows from the CPU address
space to the PCI bus space. On a few architectures (microblaze, powerpc,
sparc), the DT may also describe PCI devices themselves, including their
BARs.
Before
9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for
64-bit memory addresses"), of_bus_pci_get_flags() ignored the fact that
some DT addresses described 64-bit windows and BARs. That was a problem
because the virtio virtual NIC has a 32-bit BAR and a 64-bit BAR, and the
driver couldn't distinguish them.
9d57e61bf723 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which fixed
the virtio driver. But it also set IORESOURCE_MEM_64 for host bridge
windows, which exposed the fact that the PCI allocator isn't smart enough
to put 32-bit resources in those 64-bit windows.
Clear IORESOURCE_MEM_64 from host bridge windows since we don't need that
information.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Fixes:
9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
Link: https://lore.kernel.org/r/20210614230457.752811-1-punitagrawal@gmail.com
Reported-at: https://lore.kernel.org/lkml/
7a1e2ebc-f7d8-8431-d844-
41a9c36a8911@arm.com/
Reported-at: https://lore.kernel.org/lkml/YMyTUv7Jsd89PGci@m4/T/#u
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reported-by: Qu Wenruo <wqu@suse.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>