Igor Russkikh [Thu, 14 May 2020 09:57:18 +0000 (12:57 +0300)]
net: qede: add hw err scheduled handler
qede (ethernet level driver) registers a callback handler.
This handler maintains eth dev state flags/bits to track error processing.
It implements in place processing part for nonsleeping context (WARN_ON
trigger), and a deferred (delayed work) part which triggers recovery
process for recoverable errors.
In later patches this atomic handler will come with more meat.
We introduce err_flags on ethdevice structure, its being used to record
error handling properties.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:17 +0000 (12:57 +0300)]
net: qed: adding hw_err states and handling
Here we introduce qed device error tracking flags and error types.
qed_hw_err_notify is an entrace point to report errors.
It'll notify higher level drivers (qede/qedr/etc) to handle and recover
the error.
List of posible errors comes from hardware interfaces, but could be
extended in future.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 May 2020 20:18:10 +0000 (13:18 -0700)]
Merge branch 'net-hns3-add-some-cleanups-for-next'
Huazhong Tan says:
====================
net: hns3: add some cleanups for -next
This patchset adds some cleanups for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:26 +0000 (20:41 +0800)]
net: hns3: remove unnecessary frag list checking in hns3_nic_net_xmit()
The skb_has_frag_list() in hns3_nic_net_xmit() is redundant, since
skb_walk_frags() includes this checking implicitly.
Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:25 +0000 (20:41 +0800)]
net: hns3: remove some unused macros
There are some macros defined in hns3_enet.h, but not used in
anywhere.
Reported-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:24 +0000 (20:41 +0800)]
net: hns3: modify an incorrect error log in hclge_mbx_handler()
When handling HCLGE_MBX_GET_LINK_STATUS, PF will return the link
status to the VF, so the error log of hclge_get_link_info() is
incorrect.
Reported-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:23 +0000 (20:41 +0800)]
net: hns3: remove a duplicated printing in hclge_configure()
Since hclge_get_cfg() already has error print, so hclge_configure()
should not print error when calling hclge_get_cfg() fail.
Reported-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:22 +0000 (20:41 +0800)]
net: hns3: modify some incorrect spelling
This patch modifies some incorrect spelling.
Reported-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Thu, 14 May 2020 12:38:48 +0000 (14:38 +0200)]
r8152: Use MAC address from device tree if available
If a MAC address was passed via the device tree node for the r8152
device, use it and fall back to reading from EEPROM otherwise. This is
useful for devices where the r8152 EEPROM was not programmed with a
valid MAC address, or if users want to explicitly set a MAC address in
the bootloader and pass that to the kernel.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 14 May 2020 06:35:52 +0000 (09:35 +0300)]
selftests: fix flower parent qdisc
Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.
Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.
Fixes:
a7df4870d79b ("net_sched: fix tcm_parent in tc filter dump")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Wed, 13 May 2020 15:37:17 +0000 (23:37 +0800)]
net: dsa: mt7530: set CPU port to fallback mode
Currently, setting a bridge's self PVID to other value and deleting
the default VID 1 renders untagged ports of that VLAN unable to talk to
the CPU port:
bridge vlan add dev br0 vid 2 pvid untagged self
bridge vlan del dev br0 vid 1 self
bridge vlan add dev sw0p0 vid 2 pvid untagged
bridge vlan del dev sw0p0 vid 1
# br0 cannot send untagged frames out of sw0p0 anymore
That is because the CPU port is set to security mode and its PVID is
still 1, and untagged frames are dropped due to VLAN member violation.
Set the CPU port to fallback mode so untagged frames can pass through.
Fixes:
83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel González Cabanelas [Tue, 12 May 2020 17:59:48 +0000 (19:59 +0200)]
net: mvneta: speed down the PHY, if WoL used, to save energy
Some PHYs connected to this ethernet hardware support the WoL feature.
But when WoL is enabled and the machine is powered off, the PHY remains
waiting for a magic packet at max speed (i.e. 1Gbps), which is a waste of
energy.
Slow down the PHY speed before stopping the ethernet if WoL is enabled,
and save some energy while the machine is powered off or sleeping.
Tested using an Armada 370 based board (LS421DE) equipped with a Marvell
88E1518 PHY.
Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 12 May 2020 17:13:55 +0000 (18:13 +0100)]
sfc: fix dereference of table before it is null checked
Currently pointer table is being dereferenced on a null check of
table->must_restore_filters before it is being null checked, leading
to a potential null pointer dereference issue. Fix this by null
checking table before dereferencing it when checking for a null
table->must_restore_filters.
Addresses-Coverity: ("Dereference before null check")
Fixes:
e4fe938cff04 ("sfc: move 'must restore' flags out of ef10-specific nic_data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 20:38:07 +0000 (22:38 +0200)]
net: phy: at803x: add cable diagnostics support
The AR8031/AR8033 and the AR8035 support cable diagnostics. Adding
driver support is straightforward, so lets add it.
The PHY just do one pair at a time, so we have to start the test four
times. The cable_test_get_status() can block and therefore we can just
busy poll the test completion and continue with the next pair until we
are done.
The time delta counter seems to run at 125MHz which just gives us a
resolution of about 82.4cm per tick.
100m cable, A/B/C/D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: Open Circuit
Pair: Pair A, fault length: 107.94m
Pair: Pair B, result: Open Circuit
Pair: Pair B, fault length: 104.64m
Pair: Pair C, result: Open Circuit
Pair: Pair C, fault length: 105.47m
Pair: Pair D, result: Open Circuit
Pair: Pair D, fault length: 107.94m
1m cable, A/B connected, C shorted, D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short within Pair
Pair: Pair C, fault length: 0.82m
Pair: Pair D, result: Open Circuit
Pair: Pair D, fault length: 0.82m
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 13 May 2020 19:36:41 +0000 (21:36 +0200)]
ipv6: set msg_control_is_user in do_ipv6_getsockopt
While do_ipv6_getsockopt does not call the high-level recvmsg helper,
the msghdr eventually ends up being passed to put_cmsg anyway, and thus
needs msg_control_is_user set to the proper value.
Fixes:
1f466e1f15cf ("net: cleanly handle kernel vs user buffers for ->msg_control")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 May 2020 19:52:39 +0000 (12:52 -0700)]
Merge branch 'net-phy-broadcom-cable-tester-support'
Michael Walle says:
====================
net: phy: broadcom: cable tester support
Add cable tester support for the Broadcom PHYs. Support for it was
developed on a BCM54140 Quad PHY which RDB register access.
If there is a link partner the results are not as good as with an open
cable. I guess we could retry if the measurement until all pairs had at
least one valid result.
changes since v1:
- added Reviewed-by: tags
- removed "div by 2" for cross shorts, just mention it in the commit
message. The results are inconclusive if the tests are repeated. So
just report the length as is for now.
- fixed typo in commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:24 +0000 (18:35 +0200)]
net: phy: bcm54140: add cable diagnostics support
Use the generic cable tester functions from bcm-phy-lib to add cable
tester support.
100m cable, A/B/C/D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: Open Circuit
Pair: Pair B, result: Open Circuit
Pair: Pair C, result: Open Circuit
Pair: Pair D, result: Open Circuit
Pair: Pair A, fault length: 106.60m
Pair: Pair B, fault length: 103.32m
Pair: Pair C, fault length: 104.96m
Pair: Pair D, fault length: 106.60m
1m cable, A/B connected, pair C shorted, D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short within Pair
Pair: Pair D, result: Open Circuit
Pair: Pair C, fault length: 0.82m
Pair: Pair D, fault length: 1.64m
1m cable, A/B connected, pair C shorted with D:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short to another pair
Pair: Pair D, result: Short to another pair
Pair: Pair C, fault length: 1.64m
Pair: Pair D, fault length: 1.64m
The granularity of the length measurement seems to be 82cm.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:23 +0000 (18:35 +0200)]
net: phy: broadcom: add cable test support
Most modern broadcom PHYs support ECD (enhanced cable diagnostics). Add
support for it in the bcm-phy-lib so they can easily be used in the PHY
driver.
There are two access methods for ECD: legacy by expansion registers and
via the new RDB registers which are exclusive. Provide functions in two
variants where the PHY driver can choose from. To keep things simple for
now, we just switch the register access to expansion registers in the
RDB variant for now. On the flipside, we have to keep a bus lock to
prevent any other non-legacy access on the PHY.
The results of the intra-pair tests are inconclusive (at least for the
BCM54140). Most of the times half the length is reported but sometimes
the length is correct.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:22 +0000 (18:35 +0200)]
net: phy: broadcom: add bcm_phy_modify_exp()
Add the convenience function to do a read-modify-write. This has the
additional benefit of saving one write to the selection register.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:21 +0000 (18:35 +0200)]
net: phy: broadcom: add exp register access methods without buslock
Add helper to read and write expansion registers without taking the mdio
lock.
Please note, that this changes the semantics of the read and write.
Before there was no lock between selecting the expansion register and
the actual read/write. This may lead to access failures if there are
parallel accesses. Instead take the bus lock during the whole access
cycle.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Wed, 13 May 2020 12:34:40 +0000 (14:34 +0200)]
net: phy: tja11xx: add cable-test support
Add initial cable testing support.
This PHY needs only 100usec for this test and it is recommended to run it
before the link is up. For now, provide at least ethtool support, so it
can be tested by more developers.
This patch was tested with TJA1102 PHY with following results:
- No cable, is detected as open
- 1m cable, with no connected other end and detected as open
- a 40m cable (out of spec, max lenght should be 15m) is detected as OK.
Current patch do not provide polarity test support. This test would
indicate not proper wire connection, where "+" wire of main phy is
connected to the "-" wire of the link partner.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 13 May 2020 11:07:59 +0000 (13:07 +0200)]
net: ignore sock_from_file errors in __scm_install_fd
The code had historically been ignoring these errors, and my recent
refactoring changed that, which broke ssh in some setups.
Fixes:
2618d530dd8b ("net/scm: cleanup scm_detach_fds")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 May 2020 19:23:14 +0000 (12:23 -0700)]
Merge branch 'dwmac-meson8b-Ethernet-RX-delay-configuration'
Martin Blumenstingl says:
====================
dwmac-meson8b Ethernet RX delay configuration
The Ethernet TX performance has been historically bad on Meson8b and
Meson8m2 SoCs because high packet loss was seen. I found out that this
was related (yet again) to the RGMII TX delay configuration.
In the process of discussing the big picture (and not just a single
patch) [0] with Andrew I discovered that the IP block behind the
dwmac-meson8b driver actually seems to support the configuration of the
RGMII RX delay (at least on the Meson8b SoC generation).
Since I sent the first RFC I got additional documentation from Jianxin
(many thanks!). Also I have discovered some more interesting details:
- Meson8b Odroid-C1 requires an RX delay (by either the PHY or the MAC)
Based on the vendor u-boot code (not upstream) I assume that it will
be the same for all Meson8b and Meson8m2 boards
- Khadas VIM2 seems to have the RX delay built into the PCB trace
length. When I enable the RX delay on the PHY or MAC I can't get any
data through. I expect that we will have the same situation on all
GXBB, GXM, AXG, G12A, G12B and SM1 boards. Further clarification is
needed here though (since I can't visually see these lengthened
traces on the PCB). This will be done before sending patches for
these boards.
Dependencies for this series:
There is a soft dependency for patch #2 on commit
f22531438ff42c
"dt-bindings: net: dwmac: increase 'maxItems' for 'clocks',
'clock-names' properties" which is currently in Rob's -next tree.
That commit is needed to make the dt-bindings schema validation
pass for patch #2. That patch has been for ~4 weeks in Robs tree,
so I assume that is not going to be dropped.
Changes since RFC v2 at [2]:
- dropped $ref: /schemas/types.yaml#definitions/uint32 from the
"amlogic,rx-delay-ns" in patch #1 ("Don't need to define the
type when in standard units." says Rob - thanks, I learned
something new). Also use "default: 0" for for this property
instead of explaining it in the description text.
- added a note to the cover-letter about a hidden dependency for
dt-binding schema validation in patch #2
- Added Andrew's Reviewed-by to patches 1-7. Thank you again for
the quick and detailed reviews, I appreciate this!
- error out if the (optional) timing-adjustment clock is missing
but we're asked to enable the RGMII RX delay. The MAC won't
work in this specific case and either the RX delay has to be
provided by the PHY or the timing-adjustment clock has to be
added.
- dropped the dts patches (#9-11) which were only added to give
an overview how this is going to be used. those will be sent
separately
- dropped the RFC prefix
Changes since RFC v1 at [1]:
- add support for the timing adjustment clock input (dt-bindings and
in the driver) thanks to the input from the unnamed Ethernet engineer
at Amlogic. This is the missing link between the fclk_div2 clock and
the Ethernet controller on Meson8b (no traffic would flow if that
clock was disabled)
- add support fot the amlogic,rx-delay-ns property. The only supported
values so far are 0ns and 2ns. The registers seem to allow more
precise timing adjustments, but I could not make that work so far.
- add more register documentation (for the new RX delay bits) and
unified the placement of existing register documentation. Again,
thanks to Jianxin and the unnamed Ethernet engineer at Amlogic
- DO NOT MERGE: .dts patches to show the conversion of the Meson8b
and Meson8m2 boards to "rgmii-id". I didn't have time for all arm64
patches yet, but these will switch to phy-mode = "rgmii-txid" with
amlogic,rx-delay-ns = <0> (because the delay seems to be provided by
the PCB trace length).
[0] https://patchwork.kernel.org/patch/
11309891/
[1] https://patchwork.kernel.org/cover/
11310719/
[2] https://patchwork.kernel.org/cover/
11518257/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:11:03 +0000 (23:11 +0200)]
net: stmmac: dwmac-meson8b: add support for the RX delay configuration
Configure the PRG_ETH0_ADJ_* bits to enable or disable the RX delay
based on the various RGMII PHY modes. For now the only supported RX
delay settings are:
- disabled, use for example for phy-mode "rgmii-id"
- 0ns - this is treated identical to "disabled", used for example on
boards where the PHY provides 2ns TX delay and the PCB trace length
already adds 2ns RX delay
- 2ns - for whenever the PHY cannot add the RX delay and the traces on
the PCB don't add any RX delay
Disabling the RX delay (in case u-boot enables it, which is the case
for example on Meson8b Odroid-C1) simply means that PRG_ETH0_ADJ_ENABLE,
PRG_ETH0_ADJ_SETUP, PRG_ETH0_ADJ_DELAY and PRG_ETH0_ADJ_SKEW should be
disabled (just disabling PRG_ETH0_ADJ_ENABLE may be enough, since that
disables the whole re-timing logic - but I find it makes more sense to
clear the other bits as well since they depend on that setting).
u-boot on Odroid-C1 uses the following steps to enable a 2ns RX delay:
- enabling enabling the timing adjustment clock
- enabling the timing adjustment logic by setting PRG_ETH0_ADJ_ENABLE
- setting the PRG_ETH0_ADJ_SETUP bit
The documentation for the PRG_ETH0_ADJ_DELAY and PRG_ETH0_ADJ_SKEW
registers indicates that we can even set different RX delays. However,
I could not find out how this works exactly, so for now we only support
a 2ns RX delay using the exact same way that Odroid-C1's u-boot does.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:11:02 +0000 (23:11 +0200)]
net: stmmac: dwmac-meson8b: Make the clock enabling code re-usable
The timing adjustment clock will need similar logic as the RGMII clock:
It has to be enabled in the driver conditionally and when the driver is
unloaded it should be disabled again. Extract the existing code for the
RGMII clock into a new function so it can be re-used.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:11:01 +0000 (23:11 +0200)]
net: stmmac: dwmac-meson8b: Fetch the "timing-adjustment" clock
The PRG_ETHERNET registers have a built-in timing adjustment circuit
which can provide the RX delay in RGMII mode. This is driven by an
external (to this IP, but internal to the SoC) clock input. Fetch this
clock as optional (even though it's there on all supported SoCs) since
we just learned about it and existing .dtbs don't specify it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:11:00 +0000 (23:11 +0200)]
net: stmmac: dwmac-meson8b: Add the PRG_ETH0_ADJ_* bits
The PRG_ETH0_ADJ_* are used for applying the RGMII RX delay. The public
datasheets only have very limited description for these registers, but
Jianxin Pan provided more detailed documentation from an (unnamed)
Amlogic engineer. Add the PRG_ETH0_ADJ_* bits along with the improved
description.
Suggested-by: Jianxin Pan <jianxin.pan@amlogic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:10:59 +0000 (23:10 +0200)]
net: stmmac: dwmac-meson8b: Move the documentation for the TX delay
Move the documentation for the TX delay above the PRG_ETH0_TXDLY_MASK
definition. Future commits will add more registers also with
documentation above their register bit definitions. Move the existing
comment so it will be consistent with the upcoming changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:10:58 +0000 (23:10 +0200)]
net: stmmac: dwmac-meson8b: use FIELD_PREP instead of open-coding it
Use FIELD_PREP() to shift a value to the correct offset based on a
bitmask instead of open-coding the logic.
No functional changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:10:57 +0000 (23:10 +0200)]
dt-bindings: net: dwmac-meson: Document the "timing-adjustment" clock
The PRG_ETHERNET registers can add an RX delay in RGMII mode. This
requires an internal re-timing circuit whose input clock is called
"timing adjustment clock". Document this clock input so the clock can be
enabled as needed.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Tue, 12 May 2020 21:10:56 +0000 (23:10 +0200)]
dt-bindings: net: meson-dwmac: Add the amlogic,rx-delay-ns property
The PRG_ETHERNET registers on Meson8b and newer SoCs can add an RX
delay. Add a property with the known supported values so it can be
configured according to the board layout.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 May 2020 19:20:12 +0000 (12:20 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2020-05-13
Here's a second attempt at a bluetooth-next pull request which
supercedes the one dated 2020-05-09. This should have the issues
discovered by Jakub fixed.
- Add support for Intel Typhoon Peak device (8087:0032)
- Add device tree bindings for Realtek RTL8723BS device
- Add device tree bindings for Qualcomm QCA9377 device
- Add support for experimental features configuration through mgmt
- Add driver hook to prevent wake from suspend
- Add support for waiting for L2CAP disconnection response
- Multiple fixes & cleanups to the btbcm driver
- Add support for LE scatternet topology for selected devices
- A few other smaller fixes & cleanups
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 May 2020 18:54:47 +0000 (11:54 -0700)]
Merge branch 'net-dsa-felix-tc-taprio-and-CBS-offload-support'
Xiaoliang Yang says:
====================
net: dsa: felix: tc taprio and CBS offload support
This patch series support tc taprio and CBS hardware offload according
to IEEE 802.1Qbv and IEEE-802.1Qav on VSC9959.
v1->v2 changes:
- Move port_qos_map_init() function to be common felix codes.
- Keep const for dsa_switch_ops structs, add felix_port_setup_tc
function to call port_setup_tc of felix.info.
- fix code style for cbs_set, rename variables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:10 +0000 (10:25 +0800)]
net: dsa: felix: add support Credit Based Shaper(CBS) for hardware offload
VSC9959 hardware support the Credit Based Shaper(CBS) which part
of the IEEE-802.1Qav. This patch support sch_cbs set for VSC9959.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:09 +0000 (10:25 +0800)]
net: dsa: felix: Configure Time-Aware Scheduler via taprio offload
Ocelot VSC9959 switch supports time-based egress shaping in hardware
according to IEEE 802.1Qbv. This patch add support for TAS configuration
on egress port of VSC9959 switch.
Felix driver is an instance of Ocelot family, with a DSA front-end. The
patch uses tc taprio hardware offload to setup TAS set function on felix
driver.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:08 +0000 (10:25 +0800)]
net: dsa: felix: qos classified based on pcp
Set the default QoS Classification based on PCP and DEI of vlan tag,
after that, frames can be Classified to different Qos based on PCP tag.
If there is no vlan tag or vlan ignored, use port default Qos.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Archie Pusaka [Tue, 14 Apr 2020 08:08:40 +0000 (16:08 +0800)]
Bluetooth: L2CAP: add support for waiting disconnection resp
Whenever we disconnect a L2CAP connection, we would immediately
report a disconnection event (EPOLLHUP) to the upper layer, without
waiting for the response of the other device.
This patch offers an option to wait until we receive a disconnection
response before reporting disconnection event, by using the "how"
parameter in l2cap_sock_shutdown(). Therefore, upper layer can opt
to wait for disconnection response by shutdown(sock, SHUT_WR).
This can be used to enforce proper disconnection order in HID,
where the disconnection of the interrupt channel must be complete
before attempting to disconnect the control channel.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Sonny Sasaka [Wed, 6 May 2020 19:55:03 +0000 (12:55 -0700)]
Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
After sending Inquiry Cancel command to the controller, it is possible
that Inquiry Complete event comes before Inquiry Cancel command complete
event. In this case the Inquiry Cancel command will have status of
Command Disallowed since there is no Inquiry session to be cancelled.
This case should not be treated as error, otherwise we can reach an
inconsistent state.
Example of a btmon trace when this happened:
< HCI Command: Inquiry Cancel (0x01|0x0002) plen 0
> HCI Event: Inquiry Complete (0x01) plen 1
Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4
Inquiry Cancel (0x01|0x0002) ncmd 1
Status: Command Disallowed (0x0c)
Signed-off-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Rikard Falkeborn [Sat, 9 May 2020 13:17:19 +0000 (15:17 +0200)]
Bluetooth: serdev: Constify serdev_device_ops
serdev_device_ops is not modified and can be const. Also, remove the
unneeded declaration of it.
Output from the file command before and after:
Before:
text data bss dec hex filename
7192 2408 192 9792 2640 drivers/bluetooth/hci_serdev.o
After:
text data bss dec hex filename
7256 2344 192 9792 2640 drivers/bluetooth/hci_serdev.o
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Raghuram Hegde [Mon, 11 May 2020 11:10:40 +0000 (16:40 +0530)]
Bluetooth: btusb: Add support for Intel Bluetooth Device Typhoon Peak (8087:0032)
Device from /sys/kernel/debug/usb/devices:
T: Bus=01 Lev=01 Prnt=01 Port=13 Cnt=02 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=8087 ProdID=0032 Rev= 0.00
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms
I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms
I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms
I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
I: If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 63 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 63 Ivl=1ms
Signed-off-by: Raghuram Hegde <raghuram.hegde@intel.com>
Signed-off-by: Chethan T N <chethan.tumkur.narayan@intel.com>
Signed-off-by: Amit K Bag <amit.k.bag@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Abhishek Pandit-Subedi [Wed, 13 May 2020 02:19:27 +0000 (19:19 -0700)]
Bluetooth: btusb: Implement hdev->prevent_wake
Implement the prevent_wake hook by checking device_may_wakeup on the usb
interface. This prevents the Bluetooth core from enabling scanning when
the device isn't expected to wake from suspend.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Abhishek Pandit-Subedi [Wed, 13 May 2020 02:19:26 +0000 (19:19 -0700)]
Bluetooth: Add hook for driver to prevent wake from suspend
Let drivers have a hook to disable configuring scanning during suspend.
Drivers should use the device_may_wakeup function call to determine
whether hci should be configured for wakeup.
For example, an implementation for btusb may look like the following:
bool btusb_prevent_wake(struct hci_dev *hdev)
{
struct btusb_data *data = hci_get_drvdata(hdev);
return !device_may_wakeup(&data->udev->dev);
}
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Abhishek Pandit-Subedi [Wed, 13 May 2020 02:19:25 +0000 (19:19 -0700)]
Bluetooth: Rename BT_SUSPEND_COMPLETE
Renamed BT_SUSPEND_COMPLETE to BT_SUSPEND_CONFIGURE_WAKE since it sets
up the event filter and whitelist for wake-up.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Abhishek Pandit-Subedi [Wed, 13 May 2020 02:09:33 +0000 (19:09 -0700)]
Bluetooth: Modify LE window and interval for suspend
When a device is suspended, it doesn't need to be as responsive to
connection events. Increase the interval to 640ms (creating a duty cycle
of roughly 1.75%) so that passive scanning uses much less power (vs
previous duty cycle of 18.75%). The new window + interval combination
has been tested to work with HID devices (which are currently the only
devices capable of wake up).
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Abhishek Pandit-Subedi [Wed, 13 May 2020 02:09:32 +0000 (19:09 -0700)]
Bluetooth: Fix incorrect type for window and interval
The types for window and interval should be uint16, not uint8.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Vladimir Oltean [Wed, 13 May 2020 00:23:27 +0000 (03:23 +0300)]
net: dsa: tag_sja1105: appease sparse checks for ethertype accessors
A comparison between a value from the packet and an integer constant
value needs to be done by converting the value from the packet from
net->host, or the constant from host->net. Not the other way around.
Even though it makes no practical difference, correct that.
Fixes:
38b5beeae7a4 ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Tue, 12 May 2020 17:36:23 +0000 (10:36 -0700)]
erspan: Check IFLA_GRE_ERSPAN_VER is set.
Add a check to make sure the IFLA_GRE_ERSPAN_VER is provided by users.
Fixes:
f989d546a2d5 ("erspan: Add type I version 0 support.")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 May 2020 20:08:08 +0000 (13:08 -0700)]
Merge branch 'Traffic-support-for-dsa_8021q-in-vlan_filtering-1-mode'
Vladimir Oltean says:
====================
Traffic support for dsa_8021q in vlan_filtering=1 mode
This series is an attempt to support as much as possible in terms of
traffic I/O from the network stack with the only dsa_8021q user thus
far, sja1105.
The hardware doesn't support pushing a second VLAN tag to packets that
are already tagged, so our only option is to combine the dsa_8021q with
the user tag into a single tag and decode that on the CPU.
The assumption is that there is a type of use cases for which 7 VLANs
per port are more than sufficient, and that there's another type of use
cases where the full 4096 entries are barely enough. Those use cases are
very different from one another, so I prefer trying to give both the
best experience by creating this best_effort_vlan_filtering knob to
select the mode in which they want to operate in.
v2 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/
20200511135338.20263-1-olteanv@gmail.com/
v1 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/
20200510164255.19322-1-olteanv@gmail.com/
Changes in v3:
Patch 01/15:
- Rename again to configure_vlan_while_not_filtering, and add a helper
function for skipping VLAN configuration.
Patch 03/15:
- Remove sja1105_can_use_vlan_as_tags from driver code.
Patch 06/15:
- Adapt sja1105 driver to the second variable name change.
Patch 08/15:
- Provide an implementation of sja1105_can_use_vlan_as_tags as part of
the tagger and not as part of the switch driver. So we have to look at
the skb only, and not at the VLAN awareness state.
Changes in v2:
Patch 01/15:
- Rename variable from vlan_bridge_vtu to configure_vlans_while_disabled.
Patch 03/15:
- Be much more thorough, and make sure that things like virtual links
and FDB operations still work properly.
Patch 05/15:
- Free the vlan lists on teardown.
- Simplify sja1105_classify_vlan: only look at priv->expect_dsa_8021q.
- Keep vid 1 in the list of dsa_8021q VLANs, to make sure that untagged
packets transmitted from the stack, like PTP, continue to work in
VLAN-unaware mode.
Patch 06/15:
- Adapt to vlan_bridge_vtu variable name change.
Patch 11/15:
- In sja1105_best_effort_vlan_filtering_set, get the vlan_filtering
value of each port instead of just one time for port 0. Normally this
shouldn't matter, but it avoids issues when port 0 is disabled in
device tree.
Patch 14/14:
- Only do anything in sja1105_build_subvlans and in
sja1105_build_crosschip_subvlans when operating in
SJA1105_VLAN_BEST_EFFORT state. This avoids installing VLAN retagging
rules in unaware mode, which would cost us a penalty in terms of
usable frame memory.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:39 +0000 (20:20 +0300)]
docs: net: dsa: sja1105: document the best_effort_vlan_filtering option
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:38 +0000 (20:20 +0300)]
net: dsa: sja1105: implement VLAN retagging for dsa_8021q sub-VLANs
Expand the delta commit procedure for VLANs with additional logic for
treating bridge_vlans in the newly introduced operating mode,
SJA1105_VLAN_BEST_EFFORT.
For every bridge VLAN on every user port, a sub-VLAN index is calculated
and retagging rules are installed towards a dsa_8021q rx_vid that
encodes that sub-VLAN index. This way, the tagger can identify the
original VLANs.
Extra care is taken for VLANs to still work as intended in cross-chip
scenarios. Retagging may have unintended consequences for these because
a sub-VLAN encoding that works for the CPU does not make any sense for a
front-panel port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:37 +0000 (20:20 +0300)]
net: dsa: sja1105: implement a common frame memory partitioning function
There are 2 different features that require some reserved frame memory
space: VLAN retagging and virtual links. Create a central function that
modifies the static config and ensures frame memory is never
overcommitted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:36 +0000 (20:20 +0300)]
net: dsa: sja1105: add packing ops for the Retagging Table
The Retagging Table is an optional feature that allows the switch to
match frames against a {ingress port, egress port, vid} rule and change
their VLAN ID. The retagged frames are by default clones of the original
ones (since the hardware-foreseen use case was to mirror traffic for
debugging purposes and to tag it with a special VLAN for this purpose),
but we can force the original frames to be dropped by removing the
pre-retagging VLAN from the port membership list of the egress port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:35 +0000 (20:20 +0300)]
net: dsa: sja1105: add a new best_effort_vlan_filtering devlink parameter
This devlink parameter enables the handling of DSA tags when enslaved to
a bridge with vlan_filtering=1. There are very good reasons to want
this, but there are also very good reasons for not enabling it by
default. So a devlink param named best_effort_vlan_filtering, currently
driver-specific and exported only by sja1105, is used to configure this.
In practice, this is perhaps the way that most users are going to use
the switch in. It assumes that no more than 7 VLANs are needed per port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:34 +0000 (20:20 +0300)]
net: dsa: tag_sja1105: implement sub-VLAN decoding
Create a subvlan_map as part of each port's tagger private structure.
This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules.
Note that as of this patch, this piece of code is never engaged, due to
the fact that the driver hasn't installed any retagging rule, so we'll
always see packets with a subvlan code of 0 (untagged).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:33 +0000 (20:20 +0300)]
net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs
For switches that support VLAN retagging, such as sja1105, we extend
dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the
dsa_8021q tag.
A sub-VLAN is nothing more than a number in the range 0-7, which serves
as an index into a per-port driver lookup table. The sub-VLAN value of
zero means that traffic is untagged (this is also backwards-compatible
with dsa_8021q without retagging).
The switch should be configured to retag VLAN-tagged traffic that gets
transmitted towards the CPU port (and towards the CPU only). Example:
bridge vlan add dev sw1p0 vid 100
The switch retags frames received on port 0, going to the CPU, and
having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language:
| 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+-----------+-----+-----------------+-----------+-----------------------+
| DIR | SVL | SWITCH_ID | SUBVLAN | PORT |
+-----------+-----+-----------------+-----------+-----------------------+
0x0450 means:
- DIR = 0b01: this is an RX VLAN
- SUBVLAN = 0b001: this is subvlan #1
- SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0")
- PORT = 0b0000: this is port 0 (see the name "sw1p0")
The driver also remembers the "1 -> 100" mapping. In the hotpath, if the
sub-VLAN from the tag encodes a non-untagged frame, this mapping is used
to create a VLAN hwaccel tag, with the value of 100.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:32 +0000 (20:20 +0300)]
net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously
In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
mode, it needs to work with normal VLAN TPID values.
A complication arises when we must transmit a VLAN-tagged packet to the
switch when it's in VLAN-aware mode. We need to construct a packet with
2 VLAN tags, and the switch will use the outer header for routing and
pop it on egress. But sadly, here the 2 hardware generations don't
behave the same:
- E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
(packets will remain double-tagged).
- P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
like it tries to prevent VLAN hopping).
But looks like the reverse is also true:
- E/T switches have no problem popping the outer tag from packets with
2 ETH_P_8021Q tags.
- P/Q/R/S will have no problem popping a single tag even if that is
ETH_P_8021AD.
So it is clear that if we want the hardware to work with dsa_8021q
tagging in VLAN-aware mode, we need to send different TPIDs depending on
revision. Keep that information in priv->info->qinq_tpid.
The per-port tagger structure will hold an xmit_tpid value that depends
not only upon the qinq_tpid, but also upon the VLAN awareness state
itself (in case we must transmit using 0xdadb).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:31 +0000 (20:20 +0300)]
net: dsa: sja1105: exit sja1105_vlan_filtering when called multiple times
VLAN filtering is a global property for sja1105, and that means that we
rely on the DSA core to not call us more than once.
But we need to introduce some per-port state for the tagger, namely the
xmit_tpid, and the best place to do that is where the xmit_tpid changes,
namely in sja1105_vlan_filtering. So at the moment, exit early from the
function to avoid unnecessarily resetting the switch for each port call.
Then we'll change the xmit_tpid prior to the early exit in the next
patch.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:30 +0000 (20:20 +0300)]
net: dsa: sja1105: allow VLAN configuration from the bridge in all states
Let the DSA core call our .port_vlan_add methods every time the bridge
layer requests so. We will deal internally with saving/restoring VLANs
depending on our VLAN awareness state.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:29 +0000 (20:20 +0300)]
net: dsa: sja1105: save/restore VLANs using a delta commit method
Managing the VLAN table that is present in hardware will become very
difficult once we add a third operating state
(best_effort_vlan_filtering). That is because correct cleanup (not too
little, not too much) becomes virtually impossible, when VLANs can be
added from the bridge layer, from dsa_8021q for basic tagging, for
cross-chip bridging, as well as retagging rules for sub-VLANs and
cross-chip sub-VLANs. So we need to rethink VLAN interaction with the
switch in a more scalable way.
In preparation for that, use the priv->expect_dsa_8021q boolean to
classify any VLAN request received through .port_vlan_add or
.port_vlan_del towards either one of 2 internal lists: bridge VLANs and
dsa_8021q VLANs.
Then, implement a central sja1105_build_vlan_table method that creates a
VLAN configuration from scratch based on the 2 lists of VLANs kept by
the driver, and based on the VLAN awareness state. Currently, if we are
VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs.
Then, implement a delta commit procedure that identifies which VLANs
from this new configuration are actually different from the config
previously committed to hardware. We apply the delta through the dynamic
configuration interface (we don't reset the switch). The result is that
the hardware should see the exact sequence of operations as before this
patch.
This also helps remove the "br" argument passed to
dsa_8021q_crosschip_bridge_join, which it was only using to figure out
whether it should commit the configuration back to us or not, based on
the VLAN awareness state of the bridge. We can simplify that, by always
allowing those VLANs inside of our dsa_8021q_vlans list, and committing
those to hardware when necessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:28 +0000 (20:20 +0300)]
net: dsa: sja1105: deny alterations of dsa_8021q VLANs from the bridge
At the moment, this can never happen. The 2 modes that we operate in do
not permit that:
- SJA1105_VLAN_UNAWARE: we are guarded from bridge VLANs added by the
user by the DSA core. We will later lift this restriction by setting
ds->vlan_bridge_vtu = true, and that is where we'll need it.
- SJA1105_VLAN_FILTERING_FULL: in this mode, dsa_8021q configuration is
disabled. So the user is free to add these VLANs in the 1024-3071
range.
The reason for the patch is that we'll introduce a third VLAN awareness
state, where both dsa_8021q as well as the bridge are going to call our
.port_vlan_add and .port_vlan_del methods.
For that, we need a good way to discriminate between the 2. The easiest
(and less intrusive way for upper layers) is to recognize the fact that
dsa_8021q configurations are always driven by our driver - we _know_
when a .port_vlan_add method will be called from dsa_8021q because _we_
initiated it.
So introduce an expect_dsa_8021q boolean which is only used, at the
moment, for blacklisting VLANs in range 1024-3071 in the modes when
dsa_8021q is active.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:27 +0000 (20:20 +0300)]
net: dsa: sja1105: keep the VLAN awareness state in a driver variable
Soon we'll add a third operating mode to the driver. Introduce a
vlan_state to make things more easy to manage, and use it where
applicable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:26 +0000 (20:20 +0300)]
net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper
This function returns a boolean denoting whether the VLAN passed as
argument is part of the 1024-3071 range that the dsa_8021q tagging
scheme uses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 12 May 2020 17:20:25 +0000 (20:20 +0300)]
net: dsa: provide an option for drivers to always receive bridge VLANs
DSA assumes that a bridge which has vlan filtering disabled is not
vlan aware, and ignores all vlan configuration. However, the kernel
software bridge code allows configuration in this state.
This causes the kernel's idea of the bridge vlan state and the
hardware state to disagree, so "bridge vlan show" indicates a correct
configuration but the hardware lacks all configuration. Even worse,
enabling vlan filtering on a DSA bridge immediately blocks all traffic
which, given the output of "bridge vlan show", is very confusing.
Provide an option that drivers can set to indicate they want to receive
vlan configuration even when vlan filtering is disabled. At the very
least, this is safe for Marvell DSA bridges, which do not look up
ingress traffic in the VTU if the port is in 8021Q disabled state. It is
also safe for the Ocelot switch family. Whether this change is suitable
for all DSA bridges is not known.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 May 2020 19:47:40 +0000 (12:47 -0700)]
Merge branch 'sfc-siena_check_caps-fixups'
Edward Cree says:
====================
sfc: siena_check_caps fixups
Fix a bug and a build warning introduced in a recent refactor.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 12 May 2020 13:24:58 +0000 (14:24 +0100)]
sfc: siena_check_caps() can be static
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 12 May 2020 13:24:34 +0000 (14:24 +0100)]
sfc: actually wire up siena_check_caps()
Assign it to siena_a0_nic_type.check_caps function pointer.
Fixes:
be904b855200 ("sfc: make capability checking a nic_type function")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kunihiko Hayashi [Tue, 12 May 2020 07:26:50 +0000 (16:26 +0900)]
dt-bindings: net: Convert UniPhier AVE4 controller to json-schema
Convert the UniPhier AVE4 controller binding to DT schema format.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 May 2020 19:19:30 +0000 (12:19 -0700)]
Merge branch 'ionic-updates'
Shannon Nelson says:
====================
ionic updates
This set of patches is a bunch of code cleanup, a little
documentation, longer tx sg lists, more ethtool stats,
and a couple more transceiver types.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:36 +0000 (17:59 -0700)]
ionic: update doc files
Update the basic doc file with some configuration hints and a
little bit of stats information.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:35 +0000 (17:59 -0700)]
ionic: add more ethtool stats
Add hardware port stats and a few more driver collected
statistics to the ethtool stats output.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:34 +0000 (17:59 -0700)]
ionic: more ionic name tweaks
Fix up a few more local names that need an "ionic" prefix.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:33 +0000 (17:59 -0700)]
ionic: ionic_intr_free parameter change
Change the ionic_intr_free parameter from struct ionic_lif to
struct ionic since that's what it actually cares about.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:32 +0000 (17:59 -0700)]
ionic: reset device at probe
Once we're talking to the device, tell it to reset to
be sure we've got a fresh, clean environment.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:31 +0000 (17:59 -0700)]
ionic: shorter dev cmd wait time
Shorten our msleep time while polling for the dev command
request to finish. Yes, checkpatch.pl complains that the
msleep might actually go longer - that won't hurt, but we'll
take the shorter time if we can get it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:30 +0000 (17:59 -0700)]
ionic: add support for more xcvr types
Add a couple more SFP and QSFP transceiver types to our
ethtool get link ksettings.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:29 +0000 (17:59 -0700)]
ionic: protect vf calls from fw reset
When going into a firmware upgrade cycle, we set the device as
not present to keep some user commands from trying to change
the driver while we're only half there. Unfortunately, the
ndo_vf_* calls don't check netif_device_present() so we need
to add a check in the callbacks.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:28 +0000 (17:59 -0700)]
ionic: updates to ionic FW api description
Lots of comment cleanup for better documentation, a few new
fields added, and a few minor mistakes fixed up.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:27 +0000 (17:59 -0700)]
ionic: support longer tx sg lists
The version 1 Tx queues can use longer SG lists than the
original version 0 queues, but we need to check to see if the
firmware supports the v1 Tx queues. This implements the queue
type query for all queue types, and uses the information to
set up for using the longer Tx SG lists.
Because the Tx SG list can be longer, we need to limit the
max ring length to be sure we stay inside the boundaries of a
DMA allocation max size, so we lower the max Tx ring size.
The driver sets its highest known version in the Q_IDENTITY
command, and the FW returns the highest version that it knows,
bounded by the driver's version. The negotiated version number
is later used in the Q_INIT commands.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 11 May 2020 17:08:07 +0000 (10:08 -0700)]
checkpatch: warn about uses of ENOTSUPP
ENOTSUPP often feels like the right error code to use, but it's
in fact not a standard Unix error. E.g.:
$ python
>>> import errno
>>> errno.errorcode[errno.ENOTSUPP]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'errno' has no attribute 'ENOTSUPP'
There were numerous commits converting the uses back to EOPNOTSUPP
but in some cases we are stuck with the high error code for backward
compatibility reasons.
Let's try prevent more ENOTSUPPs from getting into the kernel.
Recent example:
https://lore.kernel.org/netdev/
20200510182252.GA411829@lunn.ch/
v3 (Joe):
- fix the "not file" condition.
v2 (Joe):
- add a link to recent discussion,
- don't match when scanning files, not patches to avoid sudden
influx of conversion patches.
https://lore.kernel.org/netdev/
20200511165319.2251678-1-kuba@kernel.org/
v1:
https://lore.kernel.org/netdev/
20200510185148.2230767-1-kuba@kernel.org/
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 May 2020 23:59:16 +0000 (16:59 -0700)]
Merge branch 'improve-msg_control-kernel-vs-user-pointer-handling'
Christoph Hellwig says:
====================
improve msg_control kernel vs user pointer handling
this series replace the msg_control in the kernel msghdr structure
with an anonymous union and separate fields for kernel vs user
pointers. In addition to helping a bit with type safety and reducing
sparse warnings, this also allows to remove the set_fs() in
kernel_recvmsg, helping with an eventual entire removal of set_fs().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Mon, 11 May 2020 11:59:13 +0000 (13:59 +0200)]
net: cleanly handle kernel vs user buffers for ->msg_control
The msg_control field in struct msghdr can either contain a user
pointer when used with the recvmsg system call, or a kernel pointer
when used with sendmsg. To complicate things further kernel_recvmsg
can stuff a kernel pointer in and then use set_fs to make the uaccess
helpers accept it.
Replace it with a union of a kernel pointer msg_control field, and
a user pointer msg_control_user one, and allow kernel_recvmsg operate
on a proper kernel pointer using a bitfield to override the normal
choice of a user pointer for recvmsg.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Mon, 11 May 2020 11:59:12 +0000 (13:59 +0200)]
net/scm: cleanup scm_detach_fds
Factor out two helpes to keep the code tidy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Mon, 11 May 2020 11:59:11 +0000 (13:59 +0200)]
net: add a CMSG_USER_DATA macro
Add a variant of CMSG_DATA that operates on user pointer to avoid
sparse warnings about casting to/from user pointers. Also fix up
CMSG_DATA to rely on the gcc extension that allows void pointer
arithmetics to cut down on the amount of casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 May 2020 23:50:45 +0000 (16:50 -0700)]
Merge branch 'net-dsa-Constify-two-tagger-ops'
Florian Fainelli says:
====================
net: dsa: Constify two tagger ops
This patch series constifies the dsa_device_ops for ocelot and sja1105
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 11 May 2020 23:47:15 +0000 (16:47 -0700)]
net: dsa: tag_sja1105: Constify dsa_device_ops
sja1105_netdev_ops should be const since that is what the DSA layer
expects.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 11 May 2020 23:47:14 +0000 (16:47 -0700)]
net: dsa: ocelot: Constify dsa_device_ops
ocelot_netdev_ops should be const since that is what the DSA layer
expects.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 May 2020 20:31:49 +0000 (13:31 -0700)]
Merge branch 'sfc-remove-nic_data-usage-in-common-code'
Edward Cree says:
====================
sfc: remove nic_data usage in common code
efx->nic_data should only be used from NIC-specific code (i.e. nic_type
functions and things they call), in files like ef10[_sriov].c and
siena.c. This series refactors several nic_data usages from common
code (mainly in mcdi_filters.c) into nic_type functions, in preparation
for the upcoming ef100 driver which will use those functions but have
its own struct layout for efx->nic_data distinct from ef10's.
After this series, one nic_data usage (in ptp.c) remains; it wasn't
clear to me how to fix it, and ef100 devices don't yet have PTP support
(so the initial ef100 driver will not call that code).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:30:00 +0000 (13:30 +0100)]
sfc: make firmware-variant printing a nic_type function
Instead of having efx_mcdi_print_fwver() look at efx_nic_rev and
conditionally poke around inside ef10-specific nic_data, add a new
efx->type->print_additional_fwver() method to do this work.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:29:45 +0000 (13:29 +0100)]
sfc: make filter table probe caller responsible for adding VLANs
By making the caller of efx_mcdi_filter_table_probe() loop over the
vlan_list calling efx_mcdi_filter_add_vlan(), instead of doing it in
efx_mcdi_filter_table_probe(), the latter avoids looking in ef10-
specific nic_data.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:29:34 +0000 (13:29 +0100)]
sfc: move rx_rss_context_exclusive into struct efx_mcdi_filter_table
It's both set and used solely by mcdi_filters.c, so there's no reason
for it to be in ef10-specific nic_data.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:29:23 +0000 (13:29 +0100)]
sfc: rework handling of (firmware) multicast chaining state
Store the mc_chaining bit in struct efx_mcdi_filter_table, so that common
code in mcdi_filters.c doesn't need to get it from ef10-specific nic_data.
Also, probe the firmware workaround just before the call to
efx_mcdi_filter_table_probe(), rather than in a random other part of the
driver bringup, to ensure that (a) it gets probed in time and (b) it gets
reprobed as necessary on resets, no matter how the surrounding code gets
reorganised and reordered.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:29:09 +0000 (13:29 +0100)]
sfc: move 'must restore' flags out of ef10-specific nic_data
Common code in mcdi_filters.c uses these flags, so by moving them to
either struct efx_nic (in the case of must_realloc_vis) or struct
efx_mcdi_filter_table (for must_restore_rss_contexts and
must_restore_filters), decouple this code from ef10's nic_data.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:28:56 +0000 (13:28 +0100)]
sfc: use efx_has_cap for capability checks outside of NIC-specific code
Removes some efx_ef10_nic_data references from common code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Zhao [Mon, 11 May 2020 12:28:40 +0000 (13:28 +0100)]
sfc: make capability checking a nic_type function
Various MCDI functions (especially in filter handling) need to check the
datapath caps, but those live in nic_data (since they don't exist on
Siena). Decouple from ef10-specific data structures by adding check_caps
to the nic_type, to allow using these functions from non-ef10 drivers.
Also add a convenience macro efx_has_cap() to reduce the amount of
boilerplate involved in calling it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 11 May 2020 12:28:20 +0000 (13:28 +0100)]
sfc: move vport_id to struct efx_nic
Remove some usage of ef10-specific nic_data structs from common MCDI
functions, in preparation for using them from a non-EF10 driver.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 May 2020 20:25:00 +0000 (13:25 -0700)]
Merge branch 'net-Optimize-the-qed-allocations-inside-kdump-kernel'
Bhupesh Sharma says:
====================
net: Optimize the qed* allocations inside kdump kernel
Changes since v1:
----------------
- v1 can be seen here: http://lists.infradead.org/pipermail/kexec/2020-May/024935.html
- Addressed review comments received on v1:
* Removed unnecessary paranthesis.
* Used a different macro for minimum RX/TX ring count value in kdump
kernel.
Since kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs, large memory allocations done by a network driver
can cause the crashkernel to panic with OOM.
The qed* drivers take up approximately 214MB memory when run in the
kdump kernel with the default configuration settings presently used in
the driver. With an usual crashkernel size of 512M, this allocation
is equal to almost half of the total crashkernel size allocated.
See some logs obtained via memstrack tool (see [1]) below:
dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)
This patchset tries to reduce the overall memory allocation profile of
the qed* driver when they run in the kdump kernel. With these
optimization we can see a saving of approx 85M in the kdump kernel:
dracut-pre-pivot[671]: ======== Report format module_summary: ========
dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
<..snip..>
dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)
And the kdump kernel can save vmcore successfully via both ssh and nfs
interfaces.
This patchset contains two patches:
[PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
[PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
normally not a supported kdump target for saving
vmcore).
[1]. Memstrack tool: https://github.com/ryncsn/memstrack
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhupesh Sharma [Mon, 11 May 2020 10:11:42 +0000 (15:41 +0530)]
net: qed: Disable SRIOV functionality inside kdump kernel
Since we have kdump kernel(s) running under severe memory constraint
it makes sense to disable the qed SRIOV functionality when running the
kdump kernel as kdump configurations on several distributions don't
support SRIOV targets for saving the vmcore (see [1] for example).
Currently the qed SRIOV functionality ends up consuming memory in
the kdump kernel, when we don't really use the same.
An example log seen in the kdump kernel with the SRIOV functionality
enabled can be seen below (obtained via memstrack tool, see [2]):
dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
This patch disables the SRIOV functionality inside kdump kernel and with
the same applied the memory consumption goes down:
dracut-pre-pivot[671]: ======== Report format module_summary: ========
dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
[1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
[2]. Memstrack tool: https://github.com/ryncsn/memstrack
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhupesh Sharma [Mon, 11 May 2020 10:11:41 +0000 (15:41 +0530)]
net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
Normally kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs.
Currently the qed* ethernet driver ends up consuming a lot of memory in
the kdump kernel, leading to kdump kernel panic when one tries to save
the vmcore via ssh/nfs (thus utilizing the services of the underlying
qed* network interfaces).
An example OOM message log seen in the kdump kernel can be seen here
[1], with crashkernel size reservation of 512M.
Using tools like memstrack (see [2]), we can track the modules taking up
the bulk of memory in the kdump kernel and organize the memory usage
output as per 'highest allocator first'. An example log for the OOM case
indicates that the qed* modules end up allocating approximately 216M
memory, which is a large part of the total crashkernel size:
dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)
This patch reduces the default RX and TX ring count from 1024 to 64
when running inside kdump kernel, which leads to a significant memory
saving.
An example log with the patch applied shows the reduced memory
allocation in the kdump kernel:
dracut-pre-pivot[674]: ======== Report format module_summary: ========
dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
<..snip..>
[dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)
Tested crashdump vmcore save via ssh/nfs protocol using underlying qed*
network interface after applying this patch.
[1] OOM log:
------------
kworker/0:6: page allocation failure: order:6,
mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
kworker/0:6 cpuset=/ mems_allowed=0
CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025
01/18/2019
Workqueue: events work_for_cpu_fn
Call trace:
dump_backtrace+0x0/0x188
show_stack+0x24/0x30
dump_stack+0x90/0xb4
warn_alloc+0xf4/0x178
__alloc_pages_nodemask+0xcac/0xd58
alloc_pages_current+0x8c/0xf8
kmalloc_order_trace+0x38/0x108
qed_iov_alloc+0x40/0x248 [qed]
qed_resc_alloc+0x224/0x518 [qed]
qed_slowpath_start+0x254/0x928 [qed]
__qede_probe+0xf8/0x5e0 [qede]
qede_probe+0x68/0xd8 [qede]
local_pci_probe+0x44/0xa8
work_for_cpu_fn+0x20/0x30
process_one_work+0x1ac/0x3e8
worker_thread+0x44/0x448
kthread+0x130/0x138
ret_from_fork+0x10/0x18
Cannot start slowpath
qede: probe of 0000:05:00.1 failed with error -12
[2]. Memstrack tool: https://github.com/ryncsn/memstrack
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Mon, 11 May 2020 05:58:57 +0000 (05:58 +0000)]
hinic: add link_ksettings ethtool_ops support
add set_link_ksettings implementation and improve the implementation
of get_link_ksettings
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 7 May 2020 19:25:07 +0000 (14:25 -0500)]
team: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>