Krzysztof Kozlowski [Thu, 29 Jul 2021 10:40:11 +0000 (12:40 +0200)]
nfc: constify passed nfc_dev
The struct nfc_dev is not modified by nfc_get_drvdata() and
nfc_device_name() so it can be made a const.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Jul 2021 11:18:51 +0000 (12:18 +0100)]
Merge branch 'skb-gro-optimize'
Paolo Abeni says:
====================
sk_buff: optimize GRO for the common case
This is a trimmed down revision of "sk_buff: optimize layout for GRO",
specifically dropping the changes to the sk_buff layout[1].
This series tries to accomplish 2 goals:
- optimize the GRO stage for the most common scenario, avoiding a bunch
of conditionals and some more code
- let owned skbs enter the GRO engine, allowing backpressure in the
veth GRO forward path.
A new sk_buff flag (!!!) is introduced and maintained for GRO's sake.
This field uses an existing hole, so there is no change to the sk_buff
size.
[1] two main reasons:
- moving the skb->inner_ fields requires some extra care, as some in-kernel
users access those fields regardless of skb->encapsulation.
- extending the secmark size clashes with the ct and nft uAPIs
Addressing all of the above is possible, I think, but for sure not in a single
series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:24:04 +0000 (18:24 +0200)]
veth: use skb_prepare_for_gro()
Leveraging the previous patch we can now avoid orphaning the
skb in the veth gro path, allowing correct backpressure.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:24:03 +0000 (18:24 +0200)]
skbuff: allow 'slow_gro' for skb carrying sock reference
This change leverages the infrastructure introduced by the previous
patches to allow soft devices to pass owned skbs to the GRO engine
without impacting the fast-path.
It's up to the GRO caller to ensure the slow_gro bit is valid before
invoking the GRO engine. The new helper skb_prepare_for_gro() is
introduced for that goal.
On slow_gro, skbs are aggregated only with equal sk.
Additionally, skb truesize on GRO recycle and free is correctly
updated so that sk wmem is not changed by the GRO processing.
RFC -> v1:
- fixed bad truesize on dev_gro_receive NAPI_FREE
- use the existing state bit
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:24:02 +0000 (18:24 +0200)]
net: optimize GRO for the common case.
After the previous patches, at GRO time, skb->slow_gro is
usually 0, unless the packets come from some H/W offload
slowpath or tunnel.
We can optimize the GRO code assuming !skb->slow_gro is likely.
This removes multiple conditionals in the most common path, at the
price of an additional one when we hit the above "slow-paths".
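The trade-off can be pictured with a toy model. This is a sketch in Python, not the kernel code: ToySkb, set_dst(), set_nfct() and gro_prepare() are hypothetical names used only for illustration, and only two state fields are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySkb:
    """Toy stand-in for struct sk_buff; only the fields this sketch needs."""
    dst: Optional[str] = None
    nfct: Optional[str] = None
    slow_gro: bool = False  # set whenever any state is attached

    def set_dst(self, dst: str) -> None:
        self.dst = dst
        self.slow_gro = True

    def set_nfct(self, ct: str) -> None:
        self.nfct = ct
        self.slow_gro = True


def gro_prepare(skb: ToySkb) -> str:
    """Release per-packet state before GRO and report which path ran."""
    if not skb.slow_gro:
        # Common case: a single test, no per-field conditionals.
        return "fast"
    # Slow path: pay the extra test, then release every state field.
    skb.dst = None
    skb.nfct = None
    return "slow"
```

An skb that never had state attached takes the "fast" branch after one test; only skbs with a dst or ct entry attached reach the per-field work.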
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:24:01 +0000 (18:24 +0200)]
sk_buff: track extension status in slow_gro
Similar to the previous one, but tracking the
active_extensions field status.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:24:00 +0000 (18:24 +0200)]
sk_buff: track dst status in slow_gro
Similar to the previous patch, but covering the dst field:
the slow_gro flag is additionally set when a dst is attached
to the skb.
RFC -> v1:
- use the existing flag instead of adding a new one
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Jul 2021 16:23:59 +0000 (18:23 +0200)]
sk_buff: introduce 'slow_gro' flags
The new flag tracks if any state field is set, so that
GRO requires 'unusual'/slow prepare steps.
Set such flag when a ct entry is attached to the skb,
and never clear it.
The new bit uses an existing hole in the sk_buff struct.
RFC -> v1:
- use a single state bit, never clear it
- avoid moving the _nfct field
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hu Haowen [Wed, 28 Jul 2021 15:59:12 +0000 (23:59 +0800)]
Documentation: networking: add ioam6-sysctl into index
Append ioam6-sysctl to toctree in order to get rid of build warnings.
Signed-off-by: Hu Haowen <src.res@email.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 18:53:15 +0000 (21:53 +0300)]
net: dsa: sja1105: be stateless when installing FDB entries
Currently there are issues when adding a bridge FDB entry as VLAN-aware
and deleting it as VLAN-unaware, or vice versa.
However this is an unneeded complication, since the bridge always
installs its default FDB entries in VLAN 0 to match on VLAN-unaware
ports, and in the default_pvid (VLAN 1) to match on VLAN-aware ports.
So instead of trying to outsmart the bridge, just install all entries it
gives us, and they will start matching packets when the vlan_filtering
mode changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Jul 2021 19:26:05 +0000 (20:26 +0100)]
Merge branch 'switchdev-notifiers'
Vladimir Oltean says:
====================
Plug the last 2 holes in the switchdev notifiers for local FDB entries
The work for trapping local FDB entries to the CPU in switchdev/DSA
started with the "RX filtering in DSA" series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210629140658.2510288-1-olteanv@gmail.com/
and was continued with further improvements such as "Fan out FDB entries
pointing towards the bridge to all switchdev member ports":
https://patchwork.kernel.org/project/netdevbpf/cover/20210719135140.278938-1-vladimir.oltean@nxp.com/
https://patchwork.kernel.org/project/netdevbpf/cover/20210720173557.999534-1-vladimir.oltean@nxp.com/
There are only 2 more issues left to be addressed (famous last words),
and these are:
- dynamically learned FDB entries towards interfaces foreign to DSA need
to be replayed too
- adding/deleting a VLAN on a port causes the local FDB entries in that
VLAN to be prematurely deleted
This patch series addresses both, and patch 2 depends on 1 to work properly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 18:27:48 +0000 (21:27 +0300)]
net: bridge: switchdev: treat local FDBs the same as entries towards the bridge
Currently the following script:
1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
2. ip link set swp2 up && ip link set swp2 master br0
3. ip link set swp3 up && ip link set swp3 master br0
4. ip link set swp4 up && ip link set swp4 master br0
5. bridge vlan del dev swp2 vid 1
6. bridge vlan del dev swp3 vid 1
7. ip link set swp4 nomaster
8. ip link set swp3 nomaster
produces the following output:
[ 641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2
[ swp2, swp3 and br0 all have the same MAC address, the one listed above ]
In short, this happens because the number of FDB entry additions
notified to switchdev is unbalanced with the number of deletions.
At step 1, the bridge has a random MAC address. At step 2, the
br_fdb_replay of swp2 receives this initial MAC address. Then the bridge
inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it
notifies switchdev (only swp2 at this point) of the deletion of the
random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB
entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1).
During step 7:
del_nbp
-> br_fdb_delete_by_port(br, p, vid=0, do_all=1);
-> fdb_delete_local(br, p, f);
br_fdb_delete_by_port() deletes all entries towards the ports,
regardless of vid, because do_all is 1.
fdb_delete_local() has logic to migrate local FDB entries deleted from
one port to another port which shares the same MAC address and is in the
same VLAN, or to the bridge device itself. This migration happens
without notifying switchdev of the deletion on the old port and the
addition on the new one, just fdb->dst is changed and the added_by_user
flag is cleared.
In the example above, the del_nbp(swp4) causes the
"addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4
that existed up until then to be migrated directly towards the bridge
(fdb->dst == NULL). This is because it cannot be migrated to any of the
other ports (swp2 and swp3 are not in VLAN 1).
After the migration to br0 takes place, swp4 requests a deletion replay
of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now
points towards the bridge, a deletion of it is replayed. There was just
a prior addition of this address, so the switchdev driver deletes this
entry.
Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and
switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1".
But it can't because it no longer has it, so it returns -ENOENT.
There are other possibilities to trigger this issue, but this is by far
the simplest to explain.
To fix this, we must avoid the situation where the addition of an FDB
entry is notified to switchdev as a local entry on a port, and the
deletion is notified on the bridge itself.
Considering that the 2 types of FDB entries are completely equivalent
and we cannot have the same MAC address as a local entry on 2 bridge
ports, or on a bridge port and pointing towards the bridge at the same
time, it makes sense to hide away from switchdev completely the fact
that a local FDB entry is associated with a given bridge port at all.
Just say that it points towards the bridge; it should make no difference
whatsoever to the switchdev driver and should even lead to a simpler
overall implementation, with fewer cases to handle.
This also avoids any modification at all to the core bridge driver, just
what is reported to switchdev changes. With the local/permanent entries
on bridge ports being already reported to user space, it is hard to
believe that the bridge behavior can change in any backwards-incompatible
way such as making all local FDB entries point towards the bridge.
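The -2 in the log above is -ENOENT, and the imbalance behind it can be reproduced with a toy model. This is a sketch in Python, not the sja1105 driver: ToyFdbDriver and its methods are hypothetical, and the only assumption is that the driver keeps one entry per (address, VLAN) pair and rejects deletion of an entry it does not hold.

```python
import errno


class ToyFdbDriver:
    """Hypothetical switchdev driver holding one entry per (addr, vid)."""

    def __init__(self):
        self.entries = set()

    def fdb_add(self, addr, vid):
        self.entries.add((addr, vid))
        return 0

    def fdb_del(self, addr, vid):
        if (addr, vid) not in self.entries:
            return -errno.ENOENT  # nothing left to delete
        self.entries.remove((addr, vid))
        return 0


def replay_unbalanced(driver, addr, vid):
    """One addition followed by two deletions, as in steps 7 and 8."""
    results = [driver.fdb_add(addr, vid)]
    results.append(driver.fdb_del(addr, vid))  # replay when swp4 leaves
    results.append(driver.fdb_del(addr, vid))  # replay when swp3 leaves
    return results
```

With one addition and two deletions notified for "addr 00:1f:7b:63:02:48 vid 1", the second deletion fails. Reporting every local entry as pointing towards the bridge keeps the two counts equal.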
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 18:27:47 +0000 (21:27 +0300)]
net: bridge: switchdev: replay the entire FDB for each port
Currently when a switchdev port joins a bridge, we replay all FDB
entries pointing towards that port or towards the bridge.
However, this is insufficient in certain situations:
(a) DSA, through its assisted_learning_on_cpu_port logic, snoops
dynamically learned FDB entries on foreign interfaces.
These are FDB entries that are pointing neither towards the newly
joined switchdev port, nor towards the bridge. So these addresses
would be missed when joining a bridge where a foreign interface has
already learned some addresses, and they would also linger on if the
DSA port leaves the bridge before the foreign interface forgets them.
None of this happens if we replay the entire FDB when the port joins.
(b) There is a desire to treat local FDB entries on a port (i.e. the
port's termination MAC address) identically to FDB entries pointing
towards the bridge itself. More details on the reason behind this in
the next patch. The point is that this cannot be done given the
current structure of br_fdb_replay() in this situation:
ip link set swp0 master br0 # br0 inherits its MAC address from swp0
ip link set swp1 master br0
What is desirable is that when swp1 joins the bridge, br_fdb_replay()
also notifies swp1 of br0's MAC address, but this won't in fact
happen because the MAC address of br0 does not have fdb->dst == NULL
(it doesn't point towards the bridge), but it has fdb->dst == swp0.
So our current logic makes it impossible for that address to be
replayed. But if we dump the entire FDB instead of just the entries
with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC
address of br0 will be replayed too, which is what we need.
A natural question arises: say there is an FDB entry to be replayed,
like a MAC address dynamically learned on a foreign interface that
belongs to a bridge where no switchdev port has joined yet. If 10
switchdev ports belonging to the same driver join this bridge, one by
one, won't every port get notified 10 times of the foreign FDB entry,
amounting to a total of 100 notifications for this FDB entry in the
switchdev driver?
Well, yes, but this is where the "void *ctx" argument for br_fdb_replay
is useful: every port of the switchdev driver is notified whenever any
other port requests an FDB replay, but because the replay was initiated
by a different port, its context is different from the initiating port's
context, so it ignores those replays.
So the foreign FDB entry will be installed only 10 times, once per port.
This is done so that the following 4 code paths are always well balanced:
(a) addition of foreign FDB entry is replayed when port joins bridge
(b) deletion of foreign FDB entry is replayed when port leaves bridge
(c) addition of foreign FDB entry is notified to all ports currently in bridge
(d) deletion of foreign FDB entry is notified to all ports currently in bridge
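The counting argument for the 10-port question can be checked with a small simulation. This is a Python sketch under the assumptions stated in the text, not bridge code: every port of the driver hears each replay, the replay carries the initiating port as its context, and a port acts only when the context is its own. simulate_replays() is a hypothetical name.

```python
def simulate_replays(num_ports):
    """Return (notifications seen, entries installed) for one FDB entry."""
    notifications = 0
    installs = 0
    for joining in range(num_ports):    # ports join the bridge one by one
        ctx = joining                   # replay is tagged with its initiator
        for port in range(num_ports):   # every port of the driver hears it
            notifications += 1
            if port == ctx:             # only the initiating port acts on it
                installs += 1
    return notifications, installs
```

For 10 ports this gives 100 notifications but only 10 installs, one per port.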
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Jul 2021 19:23:45 +0000 (20:23 +0100)]
Merge branch 'bnxt_en-ptp'
Michael Chan says:
====================
bnxt_en: PTP enhancements
This series adds two PTP enhancements. The first one is to register
the PHC during probe time and keep it registered whether it is in
ifup or ifdown state. It will get unregistered and possibly
reregistered if the firmware PTP capability changes after firmware
reset. The second one is to add the 1PPS (one pulse per second)
feature to support input/output of the 1PPS signal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Wed, 28 Jul 2021 18:11:45 +0000 (14:11 -0400)]
bnxt_en: Log if an invalid signal detected on TSIO pin
FW can report to driver via ASYNC event if it encountered an
invalid signal on any TSIO PIN. Driver will log this event
for the user to take corrective action.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Arvind Susarla <arvind.susarla@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Wed, 28 Jul 2021 18:11:44 +0000 (14:11 -0400)]
bnxt_en: Event handler for PPS events
Once the PPS pins are configured, the FW can report
PPS values using ASYNC event. This patch adds the
ASYNC event handler and subsequent reporting of the
events to kernel.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Wed, 28 Jul 2021 18:11:43 +0000 (14:11 -0400)]
bnxt_en: 1PPS functions to configure TSIO pins
Application will send ioctls to set/clear PPS pin functions
based on user input. This patch implements the driver
callbacks that will configure the TSIO pins using firmware
commands. After firmware reset, the TSIO pins will be reconfigured.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Wed, 28 Jul 2021 18:11:42 +0000 (14:11 -0400)]
bnxt_en: 1PPS support for 5750X family chips
1PPS (One Pulse Per Second) is a signal generated either
by the NIC PHC or an external timing source.
Integrating the support to configure and use 1PPS using
the TSIO pins along with PTP timestamps will add Grand
Master capability to the 5750X family chipsets.
This patch initializes the driver data structures and
registers the 1PPS with kernel, based on the TSIO pins'
capability in the hardware. This will create a /dev/ppsX
device which applications can use to receive PPS events.
Later patches will define functions to configure and use
the pins.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 28 Jul 2021 18:11:41 +0000 (14:11 -0400)]
bnxt_en: Do not read the PTP PHC during chip reset
During error recovery or hot firmware upgrade, the chip may be under
reset and the PHC register read cycles may cause completion timeouts.
Check that the chip is not under reset condition before proceeding
to read the PHC by checking the flag BNXT_STATE_IN_FW_RESET. We also
need to take the ptp_lock before we set this flag to prevent race
conditions.
We need this logic because the PHC now will stay registered after
bnxt_close().
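The locking rule can be sketched as follows. This is a Python toy, not the bnxt_en code: ToyPtp and its methods are hypothetical, the in_fw_reset attribute stands in for BNXT_STATE_IN_FW_RESET, and returning None for a refused read is an assumption of the sketch.

```python
import threading


class ToyPtp:
    """Hypothetical model of a PHC that must not be read during reset."""

    def __init__(self):
        self.ptp_lock = threading.Lock()
        self.in_fw_reset = False  # stands in for BNXT_STATE_IN_FW_RESET
        self._phc_ns = 0          # pretend hardware counter

    def begin_fw_reset(self):
        # Take the lock before setting the flag, so a reader that already
        # passed the check finishes before the reset proceeds.
        with self.ptp_lock:
            self.in_fw_reset = True

    def end_fw_reset(self):
        with self.ptp_lock:
            self.in_fw_reset = False

    def read_phc(self):
        with self.ptp_lock:
            if self.in_fw_reset:
                return None       # refuse to touch the register
            return self._phc_ns
```

Because the flag is only changed while holding ptp_lock, a reader either completes its register access before the reset starts or sees the flag and backs off.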
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 28 Jul 2021 18:11:40 +0000 (14:11 -0400)]
bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()
It was pointed out by Richard Cochran that registering the PHC during
probe is better than during ifup, so move bnxt_ptp_init() back to
bnxt_init_one(). In order to work correctly after firmware reset which
may result in PTP config. changes, we modify bnxt_ptp_init() to return
if the PHC has been registered earlier. If PTP is no longer supported
by the new firmware, we will unregister the PHC and clean up.
This partially reverts:
d7859afb6880 ("bnxt_en: Move bnxt_ptp_init() to bnxt_open()")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Jul 2021 12:39:03 +0000 (13:39 +0100)]
Merge branch 'fec-next'
Joakim Zhang says:
====================
net: fec: add support for i.MX8MQ and i.MX8QM
This patch set adds support for i.MX8MQ and i.MX8QM, both of which add new features.
ChangeLogs:
V1->V2:
* rebase on schema binding, and update dts compatible string.
* use generic ethernet controller property for MAC internal RGMII clock delay
rx-internal-delay-ps and tx-internal-delay-ps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Wed, 28 Jul 2021 11:52:03 +0000 (19:52 +0800)]
arm64: dts: imx8qxp: add "fsl,imx8qm-fec" compatible string for FEC
Add "fsl,imx8qm-fec" compatible string for FEC to support new feature
(RGMII delayed clock).
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Wed, 28 Jul 2021 11:52:02 +0000 (19:52 +0800)]
arm64: dts: imx8m: add "fsl,imx8mq-fec" compatible string for FEC
Add "fsl,imx8mq-fec" compatible string for FEC to support new feature
(IEEE 802.3az EEE standard).
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Wed, 28 Jul 2021 11:52:01 +0000 (19:52 +0800)]
net: fec: add MAC internal delayed clock feature support
The i.MX8QM ENET IP version supports a timing specification in which the
MAC integrates the clock delay in RGMII mode, offering delayed TXC/RXC as
an alternative option to work well with various PHYs.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Wed, 28 Jul 2021 11:52:00 +0000 (19:52 +0800)]
net: fec: add eee mode tx lpi support
The i.MX8MQ ENET version supports IEEE 802.3az EEE mode. Add
EEE mode tx lpi enable to support the ethtool interface.
usage:
1. set sleep and wake timer to 5ms:
ethtool --set-eee eth0 eee on tx-lpi on tx-timer 5000
2. check the eee mode:
~# ethtool --show-eee eth0
EEE Settings for eth0:
EEE status: enabled - active
Tx LPI: 5000 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Link partner advertised EEE link modes: 100baseT/Full
Note: EEE mode should be disabled for real-time and IEEE 1588 PTP
use cases.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Wed, 28 Jul 2021 11:51:59 +0000 (19:51 +0800)]
net: fec: add imx8mq and imx8qm new versions support
The ENET of imx8mq and imx8qm are basically the same as imx6sx,
but they add new features on top of imx6sx, like:
- imx8mq: supports IEEE 802.3az EEE standard.
- imx8qm: supports RGMII mode delayed clock.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Wed, 28 Jul 2021 11:51:58 +0000 (19:51 +0800)]
dt-bindings: net: fsl,fec: add RGMII internal clock delay
Add RGMII internal clock delay for FEC controller.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Wed, 28 Jul 2021 11:51:57 +0000 (19:51 +0800)]
dt-bindings: net: fsl,fec: update compatible items
Add more compatible items for i.MX8/8M platforms.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peilin Ye [Wed, 28 Jul 2021 01:33:40 +0000 (18:33 -0700)]
tc-testing: Add control-plane selftest for skbmod SKBMOD_F_ECN option
Recently we added a new option, SKBMOD_F_ECN, to tc-skbmod(8). Add a
control-plane selftest for it.
Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Introduce SKBMOD_F_ECN
option".
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peilin Ye [Wed, 28 Jul 2021 01:33:15 +0000 (18:33 -0700)]
net/sched: act_skbmod: Add SKBMOD_F_ECN option support
Currently, when doing rate limiting using the tc-police(8) action, the
easiest way is to simply drop the packets which exceed or conform to the
configured bandwidth limit. Add a new option to tc-skbmod(8), so that
users may use the ECN [1] extension to explicitly inform the receiver
about the congestion instead of dropping packets "on the floor".
The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [2]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
As an example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod ecn
Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT
affect Non-ECT or non-IP packets. In the tc-police scenario mentioned
above, users may pipe a tc-police action and a tc-skbmod "ecn" action
together to achieve ECN-based rate limiting.
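The marking rule can be written out on the ECN bits alone. This is a Python sketch of the rule as described above, not the act_skbmod code: mark_ce() is a hypothetical name, and header parsing and the IPv4 checksum update are deliberately left out.

```python
ECN_MASK = 0b11
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def mark_ce(traffic_class):
    """Return the traffic class byte with CE set if the packet is ECT."""
    if (traffic_class & ECN_MASK) in (ECT_0, ECT_1):
        return traffic_class | CE
    # Non-ECT packets are left alone; CE packets are already marked.
    return traffic_class
```

The upper six bits (the DSCP) are never touched, so mark_ce(0xB8 | ECT_0) yields 0xBB while mark_ce(0xB8) stays 0xB8.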
For TCP connections, upon receiving a CE packet, the receiver will respond
with an ECE packet, asking the sender to reduce their congestion window.
However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and
our implementation does not touch or care about L4 headers.
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IPv{4,6} packets.
It is also worth mentioning that, in theory, the same effect could be
achieved by piping a "police" action and a "bpf" action using the
bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the
user and is thus impractical.
Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".
[1] https://datatracker.ietf.org/doc/html/rfc3168
[2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Wed, 28 Jul 2021 09:16:31 +0000 (17:16 +0800)]
nfp: flower-ct: fix error return code in nfp_fl_ct_add_offload()
If nfp_tunnel_add_ipv6_off() fails, nfp_fl_ct_add_offload() should
return an error code.
Fixes: 5a2b93041646 ("nfp: flower-ct: compile match sections of flow_payload")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Jul 2021 09:23:59 +0000 (10:23 +0100)]
Merge branch 'devlink-register'
Leon Romanovsky says:
====================
Remove duplicated devlink registration check
Changelog:
v1:
* Added two new patches that remove registration field from mlx5 and ti drivers.
v0: https://lore.kernel.org/lkml/ed7bbb1e4c51dd58e6035a058e93d16f883b09ce.1627215829.git.leonro@nvidia.com
--------------------------------------------------------------------
Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Wed, 28 Jul 2021 07:33:47 +0000 (10:33 +0300)]
devlink: Remove duplicated registration check
Both registered flag and devlink pointer are set at the same time
and indicate the same thing - devlink/devlink_port are ready. Instead
of checking ->registered use devlink pointer as an indication.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Wed, 28 Jul 2021 07:33:46 +0000 (10:33 +0300)]
net/mlx5: Don't rely on always true registered field
Devlink is an integral part of mlx5 driver and all flows ensure that
devlink_*_register() will succeed. That makes the ->registered check
obsolete.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Wed, 28 Jul 2021 07:33:45 +0000 (10:33 +0300)]
net: ti: am65-cpsw-nuss: fix wrong devlink release order
The commit that introduced devlink support released devlink resources in
the wrong order, which made the unwind flow asymmetrical. In addition,
am65-cpsw-nuss used a field internal to the devlink core - registered.
In order to fix the unwind flow and remove such access to the
registered field, rewrite the code to call devlink_port_unregister only
on registered ports.
Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jul 2021 23:06:41 +0000 (00:06 +0100)]
Merge branch 'ipa-clock-refs'
Alex Elder says:
====================
net: ipa: add clock references
This series continues preparation for implementing runtime power
management for IPA. We need to ensure that the IPA core clock and
interconnects are operational whenever IPA hardware is accessed.
And in particular this means that any external entry point that can
lead to accessing IPA hardware must guarantee the hardware is "up"
when it is accessed.
The first four patches in this series take IPA clock references when
needed by such external entry points, dropping those references in
those same functions when they are no longer required.
The last patch is a bit different, though it too prepares for
enabling runtime power management. It avoids suspending/resuming
endpoints if setup is not complete.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 21:19:33 +0000 (16:19 -0500)]
net: ipa: don't suspend endpoints if setup not complete
Until we complete the setup stage of initialization, GSI is not
initialized and therefore endpoints aren't usable. So avoid
suspending endpoints during system suspend unless setup is complete.
Clear the setup_complete flag at the top of ipa_teardown() to
reflect the fact that things are no longer in setup state.
Get rid of a misplaced (and superfluous) comment.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 21:19:32 +0000 (16:19 -0500)]
net: ipa: add a clock reference for netdev operations
The IPA network device can be opened at any time, and an opened
network device can be stopped any time. Both of these callback
functions require access to the hardware, and therefore they need
the IPA clock to be operational. Take an IPA clock reference in
both the ->open and ->stop callback functions, dropping the
reference when they are done accessing hardware.
The ->start_xmit callback requires slightly different handling,
and that will be added separately.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 21:19:31 +0000 (16:19 -0500)]
net: ipa: add clock reference for remoteproc SSR
The remoteproc SSR callback function for the modem requires hardware
access when handling a modem crash or shutdown. Take and later
release an IPA clock reference in ipa_modem_crashed(), to ensure the
hardware is operational.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 21:19:30 +0000 (16:19 -0500)]
net: ipa: get another clock for ipa_setup()
Two places call ipa_setup(). The first, ipa_probe(), holds an IPA
clock reference when calling ipa_setup() (if the AP is responsible
for IPA firmware loading). But if the modem is loading IPA
firmware, ipa_smp2p_modem_setup_ready_isr() calls ipa_setup() after
the modem has signaled the hardware is ready. This can happen at
any time, and there is no guarantee the hardware is active.
Have ipa_smp2p_modem_setup() take an IPA clock reference before it
calls ipa_setup(), and release it once setup is complete.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 21:19:29 +0000 (16:19 -0500)]
net: ipa: get clock in ipa_probe()
Any entry point that leads to IPA hardware access must ensure the
hardware is operational (clocked). Currently we ensure this by
taking an extra clock reference during setup that is not released
until we receive a system suspend request. But this extra reference
will soon go away.
When the platform driver ->probe function is called, we first need
hardware access in ipa_config(). Although ipa_config() takes an IPA
clock reference, it is the special reference taken to prevent suspending
the hardware.
Have ipa_probe() take a reference before calling ipa_config(), so
that the "no-suspend" reference can eventually go away. Drop this
reference before ipa_probe() returns.
Similarly, the driver ->remove function can be called at any time.
Take an IPA clock reference at the beginning of that function, and
drop it again after the deconfig stage has completed (at which point
hardware access is no longer needed).
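The get/put pattern described above can be sketched as a reference count. This is a Python toy, not the IPA driver: ToyClock, clock_ref(), toy_config() and toy_probe() are hypothetical names, and the clock is treated as operational whenever at least one reference is held.

```python
from contextlib import contextmanager


class ToyClock:
    """Hypothetical clock that is operational while references are held."""

    def __init__(self):
        self.refs = 0

    @property
    def enabled(self):
        return self.refs > 0

    def get(self):
        self.refs += 1

    def put(self):
        if self.refs == 0:
            raise RuntimeError("unbalanced clock put")
        self.refs -= 1


@contextmanager
def clock_ref(clock):
    """Hold a reference for the duration of a hardware access."""
    clock.get()
    try:
        yield clock
    finally:
        clock.put()  # dropped even if the access fails


def toy_config(clock):
    if not clock.enabled:
        raise RuntimeError("hardware accessed while unclocked")
    return "configured"


def toy_probe(clock):
    # Entry point takes its own reference instead of relying on another.
    with clock_ref(clock):
        return toy_config(clock)
```

Each entry point brackets its own hardware access, so no long-lived "no-suspend" reference is needed to keep the access safe.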
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jul 2021 20:02:21 +0000 (21:02 +0100)]
Merge branch 'ipa-interrupts'
Alex Elder says:
====================
net: ipa: IPA interrupt cleanup
The first patch in this series makes all IPA interrupt handling be
done in a threaded context. The remaining ones refactor some code
to simplify that threaded handler function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 19:46:29 +0000 (14:46 -0500)]
net: ipa: kill ipa_interrupt_process_all()
Now that ipa_isr_thread() is a simple wrapper that gets a clock
reference around ipa_interrupt_process_all(), get rid of the
called function and just open-code it in ipa_isr_thread().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 19:46:28 +0000 (14:46 -0500)]
net: ipa: get rid of some unneeded IPA interrupt code
The pending IPA interrupts are checked by ipa_isr_thread(), and
interrupts are processed only if an enabled interrupt has a
condition pending. But ipa_interrupt_process_all() now makes the
same check, so the one in ipa_isr_thread() can just be skipped.
Also in ipa_isr_thread(), any interrupt conditions pending which are
not enabled are cleared. Here too, ipa_interrupt_process_all() now
clears such excess interrupt conditions, so ipa_isr_thread() doesn't
have to.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Tue, 27 Jul 2021 19:46:27 +0000 (14:46 -0500)]
net: ipa: clear disabled IPA interrupt conditions
We ignore any IPA interrupt that has no handler. If any interrupt
conditions without a handler exist when an IPA interrupt occurs,
clear those conditions. Add a debug message to report which ones
are being cleared.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
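As a rough sketch of the idea in the commit above, pending interrupt conditions can be split into those that have a handler and those that should simply be cleared. The names demo_irq_split and demo_irq_classify() are hypothetical, and the 32-bit masks are an assumption made for the example.

```c
#include <stdint.h>

/*
 * Split a pending-interrupt mask into the bits to handle and the bits to
 * clear without handling. All names here are hypothetical.
 */
struct demo_irq_split {
	uint32_t handle;	/* pending and enabled */
	uint32_t clear;		/* pending but no handler registered */
};

static struct demo_irq_split demo_irq_classify(uint32_t pending,
					       uint32_t enabled)
{
	struct demo_irq_split split;

	split.handle = pending & enabled;
	split.clear = pending & ~enabled;

	return split;
}
```

A caller would process the bits in handle, write the bits in clear to the hardware's interrupt-clear register, and log which ones were cleared.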
Alex Elder [Tue, 27 Jul 2021 19:46:26 +0000 (14:46 -0500)]
net: ipa: make IPA interrupt handler threaded only
When the IPA interrupt handler runs, the IPA core clock must already
be operational, and the interconnect providing access by the AP to
IPA config space must be enabled too.
Currently we ensure this by taking a top-level "stay awake" IPA
clock reference, but that will soon go away. In preparation for
that, move all handling for the IPA IRQ into the thread function.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Tue, 27 Jul 2021 16:35:30 +0000 (19:35 +0300)]
net: cipso: fix warnings in netlbl_cipsov4_add_std
Syzbot reported a warning in netlbl_cipsov4_add(). The
problem was a too-large doi_def->map.std->lvl.local_size
passed to kcalloc(). Since this value comes from userspace, there is
no need to warn if the value is not correct.
The same problem may occur with other kcalloc() calls in
this function, so, I've added __GFP_NOWARN flag to all
kcalloc() calls there.
Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com
Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration")
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jul 2021 19:15:31 +0000 (20:15 +0100)]
Merge branch 'ionic-next'
Shannon Nelson says:
====================
ionic: driver updates 27-July-2021
This is a collection of small driver updates for adding a couple of
small features and for a bit of code cleaning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:34 +0000 (10:43 -0700)]
ionic: add function tag to debug string
Prefix the log output with the function string as in other
debug messages.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:33 +0000 (10:43 -0700)]
ionic: enable rxhash only with multiple queues
If there's only one queue, there is no need to enable
the rxhashing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:32 +0000 (10:43 -0700)]
ionic: block some ethtool operations when fw in reset
There are a few things that we can't safely do when the fw is
resetting, as the driver may be in the middle of rebuilding
queue structures.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:31 +0000 (10:43 -0700)]
ionic: remove unneeded comp union fields
We don't use these fields, so remove them from
the definition.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:30 +0000 (10:43 -0700)]
ionic: increment num-vfs before configure
Add the new VF to our internal count before we start configuring it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:29 +0000 (10:43 -0700)]
ionic: use fewer inits on the buf_info struct
Based on Alex's review notes on [1], we don't need to write
to the buf_info elements as often, and can tighten up how they
are used. Also, use prefetchw() to warm up the page struct
for a later get_page().
[1] https://lore.kernel.org/netdev/CAKgT0UfyjoAN7LTnq0NMZfXRv4v7iTCPyAb9pVr3qWMhop_BVw@mail.gmail.com/
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:28 +0000 (10:43 -0700)]
ionic: init reconfig err to 0
Initialize err to 0 instead of ENOMEM, and specifically set
err to ENOMEM in the devm_kcalloc() failure cases.
Also, add an error message to the end of reconfig.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:27 +0000 (10:43 -0700)]
ionic: print firmware version on identify
Print the version of the DSC firmware seen when we do a fresh
ident check. Because the FW can be updated by the external
orchestration system, this helps us track that FW has been
updated on the DSC.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 27 Jul 2021 17:43:26 +0000 (10:43 -0700)]
ionic: monitor fw status generation
The top 4 bits of the fw_status in dev_info_regs are reserved
for the status generation. This generation number is an
arbitrary value defined when firmware starts up. If the FW
is killed/crashed/stopped and then restarted, it will create
a different generation number. With this mechanism, the host
driver can detect that the FW has crashed and restarted, and
the driver can then take steps to re-initialize its connection.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
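The generation check described above might look roughly like the following. This is a sketch only: it assumes fw_status is a single byte, and the mask name and both helper functions are hypothetical rather than taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: fw_status is one byte and its top 4 bits hold the generation. */
#define DEMO_FW_STS_GENERATION_MASK	0xf0u

static uint8_t demo_fw_generation(uint8_t fw_status)
{
	return fw_status & DEMO_FW_STS_GENERATION_MASK;
}

/* Returns true when the generation differs from the last one seen. */
static bool demo_fw_restarted(uint8_t *last_generation, uint8_t fw_status)
{
	uint8_t generation = demo_fw_generation(fw_status);

	if (generation == *last_generation)
		return false;

	*last_generation = generation;	/* remember the new generation */
	return true;
}
```

A periodic watchdog could call demo_fw_restarted() on each status read and start re-initialization when it returns true.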
Shannon Nelson [Tue, 27 Jul 2021 17:43:25 +0000 (10:43 -0700)]
ionic: minimize resources when under kdump
When running in a small kdump kernel, we can play nice and
minimize our resource use to help make sure that kdump is
successful in its mission.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jul 2021 19:12:04 +0000 (20:12 +0100)]
Merge branch 'ndo_ioctl-rework'
Arnd Bergmann says:
====================
ndo_ioctl rework
This series is a follow-up to the series for removing
compat_alloc_user_space() and copy_in_user() that has now
been merged.
I wanted to be sure I address all the ways that 'struct ifreq' is used
in device drivers through .ndo_do_ioctl, originally to prove that
my approach of changing the struct definition was correct, but then
I discarded that approach and went on anyway.
Roughly, the contents here are:
- split out all the users of SIOCDEVPRIVATE ioctls into a
separate ndo_siocdevprivate callback, to better see what
gets used where
- fix compat handling for those drivers that pass data
directly inside of 'ifreq' rather than using an indirect
ifr_data pointer
- remove unreachable code in ndo_ioctl handlers that relies
on command codes we never pass into that, in particular
for wireless drivers
- split out the ethernet specific ioctls into yet another
ndo_eth_ioctl callback, as these are by far the most
common use of ndo_do_ioctl today. I considered splitting
them further into MII and timestamp controls, but
went with the simpler change for now.
- split out bonding and wandev ioctls into separate helpers
- rework the bridge handling with a separate callback
At this point, only a few oddball things remain in ndo_do_ioctl:
appletalk and ieee802154 pass down SIOCSIFADDR/SIOCGIFADDR and
some wireless drivers have completely dead code.
I have thoroughly compile tested this on randconfig builds,
but not done any notable runtime testing, so please review.
All of it is also available as part of a larger branch at
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git \
compat-alloc-user-space-12
Changes since v2:
- rebase to net-next
- fix qeth regression
- Cc driver maintainers for each patch and in cover letter
Changes since v1:
- rebase to linux-5.14-rc2
- add conversion for ndo_siowandev, bridge and bonding drivers
- leave broken wifi drivers untouched for now
Link: https://lore.kernel.org/netdev/20201106221743.3271965-14-arnd@kernel.org/
====================
Arnd Bergmann [Tue, 27 Jul 2021 13:45:17 +0000 (15:45 +0200)]
net: bonding: move ioctl handling to private ndo operation
All other user triggered operations are gone from ndo_ioctl, so move
the SIOCBOND family into a custom operation as well.
The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now,
but there are still a few definitions in obsolete wireless drivers as well
as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR
helpers from inside the kernel.
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:16 +0000 (15:45 +0200)]
net: bridge: move bridge ioctls out of .ndo_do_ioctl
Working towards obsoleting the .ndo_do_ioctl operation entirely,
stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands
into this callback.
My first attempt was to add another ndo_siocbr() callback, but
as there is only a single driver that takes these commands and
there is already a hook mechanism to call directly into this
driver, extend this hook instead, and use it for both the
deviceless and the device specific ioctl commands.
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:15 +0000 (15:45 +0200)]
net: socket: return changed ifreq from SIOCDEVPRIVATE
Some drivers that use SIOCDEVPRIVATE ioctl commands modify
the ifreq structure and expect it to be passed back to user
space, which has never really happened for compat mode
because calling these drivers through ndo_do_ioctl
requires overwriting the ifr_data pointer.
Now that all drivers are converted to ndo_siocdevprivate,
change it to handle this correctly in both compat and
native mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:14 +0000 (15:45 +0200)]
net: split out ndo_siowandev ioctl
In order to further reduce the scope of ndo_do_ioctl(), move
out the SIOCWANDEV handling into a new network device operation
function.
Adjust the prototype to only pass the if_settings sub-structure
in place of the ifreq, and remove the redundant 'cmd' argument
in the process.
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz>
Cc: Kevin Curtis <kevin.curtis@farsite.co.uk>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Cc: Martin Schiller <ms@dev.tdt.de>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: linux-x25@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:13 +0000 (15:45 +0200)]
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
This is a purely cosmetic change intended to help readers find
their way through the implementation.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:12 +0000 (15:45 +0200)]
dev_ioctl: pass SIOCDEVPRIVATE data separately
The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that
passes data as part of struct ifreq rather than as an ifr_data pointer, or
that passes data back this way, since the compat_ifr_data_ioctl() helper
overwrites the ifr_data pointer and does not copy anything back out.
Since all drivers using devprivate commands are now converted to the
new .ndo_siocdevprivate callback, fix this by adding the missing piece
and passing the pointer separately the whole way.
This further unifies the native and compat logic for socket ioctls,
as the new code now passes the correct pointer as well as the correct
data for both native and compat ioctls.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
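To illustrate why overwriting the pointer is a problem for drivers that pass data inside the request itself, here is a small user-space model. The union demo_ifr_ifru and both helpers are hypothetical; they only mimic the fact that the data pointer and the inline payload share storage.

```c
#include <string.h>

/* Hypothetical model of the request union inside struct ifreq. */
union demo_ifr_ifru {
	void *data;		/* indirect pointer, as with ifr_data */
	char inline_data[16];	/* payload stored directly in the request */
};

/* Models a helper that unconditionally rewrites the pointer member. */
static void demo_set_data_pointer(union demo_ifr_ifru *ifru, void *ptr)
{
	ifru->data = ptr;
}

/* Returns nonzero if the inline payload still equals the expected bytes. */
static int demo_inline_intact(const union demo_ifr_ifru *ifru,
			      const char *expected)
{
	return memcmp(ifru->inline_data, expected,
		      sizeof(ifru->inline_data)) == 0;
}
```

Once the pointer member is written, the leading bytes of the inline payload are gone, which is why the pointer has to travel separately from the request.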
Arnd Bergmann [Tue, 27 Jul 2021 13:45:11 +0000 (15:45 +0200)]
wan: cosa: remove dead cosa_net_ioctl() function
The ndo_do_ioctl callback is never called with the COSAIO* commands,
so this is never used. Call the hdlc_ioctl function directly instead.
Any user space code that relied on this function working as intended
has never worked in a mainline kernel since before linux-1.0.
Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:10 +0000 (15:45 +0200)]
wan: use ndo_siocdevprivate
The wan drivers each support some custom SIOCDEVPRIVATE
ioctls, plus the common SIOCWANDEV command.
Split these so the ioctl callback only deals with SIOCWANDEV
and the rest is handled by ndo_siocdevprivate.
It might make sense to also split out SIOCWANDEV into a
separate callback in order to eventually remove ndo_do_ioctl
entirely.
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Kevin Curtis <kevin.curtis@farsite.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:09 +0000 (15:45 +0200)]
ppp: use ndo_siocdevprivate
ppp has a custom statistics interface using SIOCDEVPRIVATE
ioctl commands that works correctly in compat mode.
Convert it to use ndo_siocdevprivate as a cleanup.
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:08 +0000 (15:45 +0200)]
sb1000: use ndo_siocdevprivate
The private sb1000 ioctl commands all work correctly in
compat mode. Change them to ndo_siocdevprivate as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:07 +0000 (15:45 +0200)]
hippi: use ndo_siocdevprivate
The rr_ioctl uses private ioctl commands that correctly pass
all data through ifr_data, which works fine in compat mode.
Change it to use ndo_siocdevprivate as a cleanup.
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: linux-hippi@sunsite.dk
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:06 +0000 (15:45 +0200)]
ip_tunnel: use ndo_siocdevprivate
The various ipv4 and ipv6 tunnel drivers each implement a set
of 12 SIOCDEVPRIVATE commands for managing tunnels. These
all work correctly in compat mode.
Move them over to the new .ndo_siocdevprivate operation.
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:05 +0000 (15:45 +0200)]
airo: use ndo_siocdevprivate
The airo driver overloads SIOCDEVPRIVATE ioctls with another
set based on SIOCIWFIRSTPRIV. Only the first ones actually
work (also in compat mode) as the others do not get passed
down any more.
Change it over to ndo_siocdevprivate for clarification.
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:04 +0000 (15:45 +0200)]
hamradio: use ndo_siocdevprivate
hamradio uses a set of private ioctls that do seem to work
correctly in compat mode, as they only rely on the ifr_data
pointer.
Move them over to the ndo_siocdevprivate callback as a cleanup.
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Jean-Paul Roubelat <jpr@f6fbb.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:03 +0000 (15:45 +0200)]
cxgb3: use ndo_siocdevprivate
cxgb3 has a private multiplexor that works correctly in compat
mode. Split out the siocdevprivate callback from do_ioctl for
simplification.
Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:02 +0000 (15:45 +0200)]
qeth: use ndo_siocdevprivate
qeth has both standard MII ioctls and custom SIOCDEVPRIVATE ones,
all of which work correctly with compat user space.
Move the private ones over to the new ndo_siocdevprivate callback.
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:01 +0000 (15:45 +0200)]
slip/plip: use ndo_siocdevprivate
slip and plip both use a couple of SIOCDEVPRIVATE ioctl
commands that overload the ifreq layout in a way that is
incompatible with compat mode.
Convert to use ndo_siocdevprivate to allow passing the
data this way, but return an error in compat mode anyway
because the private structure is still incompatible.
This could be fixed as well to make compat work properly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:45:00 +0000 (15:45 +0200)]
net: usb: use ndo_siocdevprivate
The pegasus and rtl8150 drivers use SIOCDEVPRIVATE ioctls
to access their MII registers, in place of the normal
commands. This is broken for all compat ioctls today.
Change to ndo_siocdevprivate to fix it.
Cc: Petko Manolov <petkan@nucleusys.com>
Cc: linux-usb@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:59 +0000 (15:44 +0200)]
fddi: use ndo_siocdevprivate
The skfddi driver has a private ioctl and passes the data correctly
through ifr_data, but the use of a pointer in s_skfp_ioctl is
broken in compat mode.
Change the driver to use ndo_siocdevprivate and disallow calling
it in compat mode until a conversion handler is added.
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:58 +0000 (15:44 +0200)]
eql: use ndo_siocdevprivate
The private ioctls in eql pass the arguments correctly through ifr_data,
but the slaving_request_t and slave_config_t structures are incompatible
with compat mode and need special conversion code in the driver.
Convert to siocdevprivate for now, and return an error when called
in compat mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:57 +0000 (15:44 +0200)]
tehuti: use ndo_siocdevprivate
Tehuti only implements private ioctl commands, and implements
them by overriding the ifreq layout, which is broken in
compat mode.
Move it to the ndo_siocdevprivate callback in order to fix this.
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:56 +0000 (15:44 +0200)]
hamachi: use ndo_siocdevprivate
hamachi has one command that overloads the ifreq argument
and requires a conversion to ndo_siocdevprivate in order to
make compat mode work, so split it from ndo_ioctl.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:55 +0000 (15:44 +0200)]
appletalk: use ndo_siocdevprivate
appletalk has three SIOCDEVPRIVATE ioctl commands that are
broken in compat mode because the passed structure contains
a pointer.
Change it over to ndo_siocdevprivate for consistency and
make it return an error when called in compat mode. This
could be fixed if there are still users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:54 +0000 (15:44 +0200)]
bonding: use siocdevprivate
The bonding driver supports two command codes for each operation: one
in the SIOCDEVPRIVATE range and another one with the same definition
but a unique command code.
Only the second set currently works in compat mode, as the ifr_data
expansion overwrites part of the ifr_slave field.
Move the private ones into ndo_siocdevprivate and change the
implementation to call the other function. This makes both versions
work correctly.
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
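The aliasing described in the bonding commit above can be sketched as a private handler that translates each old command code and then reuses the handler for the unique codes. The numeric values are assumptions meant to mirror <linux/sockios.h> and <linux/if_bonding.h>, only two of the operations are shown, and the function names are hypothetical.

```c
#include <errno.h>

/* Assumed values, mirroring <linux/sockios.h> and <linux/if_bonding.h>. */
#define DEMO_SIOCDEVPRIVATE	0x89F0
#define DEMO_SIOCBONDENSLAVE	0x8990
#define DEMO_SIOCBONDRELEASE	0x8991
#define DEMO_BOND_ENSLAVE_OLD	(DEMO_SIOCDEVPRIVATE)
#define DEMO_BOND_RELEASE_OLD	(DEMO_SIOCDEVPRIVATE + 1)

/* Stand-in for the handler of the unique command codes. */
static int demo_bond_do_ioctl(int cmd)
{
	switch (cmd) {
	case DEMO_SIOCBONDENSLAVE:
	case DEMO_SIOCBONDRELEASE:
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/* Private handler: translate the old code, then reuse the other function. */
static int demo_bond_siocdevprivate(int cmd)
{
	switch (cmd) {
	case DEMO_BOND_ENSLAVE_OLD:
		return demo_bond_do_ioctl(DEMO_SIOCBONDENSLAVE);
	case DEMO_BOND_RELEASE_OLD:
		return demo_bond_do_ioctl(DEMO_SIOCBONDRELEASE);
	default:
		return -EOPNOTSUPP;
	}
}
```

Keeping a single implementation behind both sets of codes means the two entry points cannot drift apart.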
Arnd Bergmann [Tue, 27 Jul 2021 13:44:53 +0000 (15:44 +0200)]
tulip: use ndo_siocdevprivate
The tulip driver has a debugging method over ioctl built-in, but it
does not actually check the command type, which may end up leading
to random behavior when trying to run other ioctls on it.
Change the driver to use ndo_siocdevprivate and limit the execution
further to the first private command code. If anyone still has tools
to run these debugging commands, they might have to be patched
if they pass a different ioctl command.
The function has existed in this form since the driver was merged in
Linux-1.1.86.
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:52 +0000 (15:44 +0200)]
phonet: use siocdevprivate
phonet has a single private ioctl that is broken in compat
mode on big-endian machines today because the data returned
from it is never copied back to user space.
Move it over to the ndo_siocdevprivate callback, which also
fixes the compat issue.
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:51 +0000 (15:44 +0200)]
bridge: use ndo_siocdevprivate
The bridge driver has an old set of ioctls using the SIOCDEVPRIVATE
namespace that have never worked in compat mode and are explicitly
forbidden already.
Move them over to ndo_siocdevprivate and fix compat mode for these,
because we can.
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:50 +0000 (15:44 +0200)]
hostap: use ndo_siocdevprivate
hostap has a combination of iwpriv ioctls that do not work at
all, and two SIOCDEVPRIVATE commands that work natively but
lack a compat conversion handler.
For the moment, move them over to the new ndo_siocdevprivate
interface and return an error for compat mode.
Cc: Jouni Malinen <j@w1.fi>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:49 +0000 (15:44 +0200)]
staging: wlan-ng: use siocdevprivate
wlan-ng has two private ioctls that correctly work in compat
mode. Move these over to the new ndo_siocdevprivate mechanism.
The p80211netdev_ethtool() function is commented out and
has no use here, so this can be removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:48 +0000 (15:44 +0200)]
staging: rtlwifi: use siocdevprivate
rtl8188eu has an "android private" ioctl command multiplexer
that is not currently safe for use in compat mode because
of its triple-indirect pointer.
rtl8723bs uses a different interface on the SIOCDEVPRIVATE
command, based on the iwpriv data structure.
Both also have normal unreachable iwpriv commands, and all
of the above should probably just get removed. For the
moment, just switch over to the new interface.
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 27 Jul 2021 13:44:47 +0000 (15:44 +0200)]
net: split out SIOCDEVPRIVATE handling from dev_ioctl
SIOCDEVPRIVATE ioctl commands are mainly used in really old
drivers, and they have a number of problems:
- They hide behind the normal .ndo_do_ioctl function that
is also used for other things in modern drivers, so it's
hard to spot a driver that actually uses one of these
- Since drivers use a number of different calling conventions,
it is impossible to support compat mode for them in
a generic way.
- With all drivers using the same 16 command codes, there
is no way to introspect the data being passed through
things like strace.
Add a new net_device_ops callback pointer, to address the
first two of these. Separating them from .ndo_do_ioctl
makes it easy to grep for drivers with a .ndo_siocdevprivate
callback, and the unwieldy name hopefully makes it easier
to spot in code review.
By passing the ifreq structure and the ifr_data pointer
separately, it is no longer necessary to overload these,
and the driver can use either one for a given command.
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
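A minimal user-space sketch of the dispatch described in the commit above follows. It is not the kernel implementation: the demo_* types, the range constants and the example driver callback are hypothetical, and the private range is assumed to start at 0x89F0 and span 16 codes.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_SIOCDEVPRIVATE	0x89F0	/* assumed start of the private range */
#define DEMO_PRIVATE_COUNT	16	/* "the same 16 command codes" */

struct demo_ifreq {
	char name[16];
};

struct demo_dev;

struct demo_dev_ops {
	/* stand-in for .ndo_do_ioctl */
	int (*do_ioctl)(struct demo_dev *dev, struct demo_ifreq *ifr, int cmd);
	/* stand-in for .ndo_siocdevprivate: ifreq and data passed separately */
	int (*siocdevprivate)(struct demo_dev *dev, struct demo_ifreq *ifr,
			      void *data, int cmd);
};

struct demo_dev {
	const struct demo_dev_ops *ops;
};

static int demo_is_private(int cmd)
{
	return cmd >= DEMO_SIOCDEVPRIVATE &&
	       cmd < DEMO_SIOCDEVPRIVATE + DEMO_PRIVATE_COUNT;
}

/* Route private commands to their own callback, everything else as before. */
static int demo_dev_ioctl(struct demo_dev *dev, struct demo_ifreq *ifr,
			  void *data, int cmd)
{
	if (demo_is_private(cmd)) {
		if (!dev->ops->siocdevprivate)
			return -EOPNOTSUPP;
		return dev->ops->siocdevprivate(dev, ifr, data, cmd);
	}

	if (!dev->ops->do_ioctl)
		return -EOPNOTSUPP;
	return dev->ops->do_ioctl(dev, ifr, cmd);
}

/* Example driver callback: writes a result through the data pointer. */
static int demo_driver_private(struct demo_dev *dev, struct demo_ifreq *ifr,
			       void *data, int cmd)
{
	(void)dev;
	(void)ifr;
	*(int *)data = cmd - DEMO_SIOCDEVPRIVATE;

	return 0;
}
```

Because the callback receives both the request structure and the data pointer, a driver can use whichever one a given command needs without overloading either.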
David S. Miller [Tue, 27 Jul 2021 19:09:29 +0000 (20:09 +0100)]
Merge branch 'tcp-rack'
Neal Cardwell says:
====================
more accurate DSACK processing for RACK-TLP
This patch series includes two minor improvements to tighten up the accuracy of
the processing of incoming DSACK information, so that RACK-TLP behavior is
faster and more precise: first, to ensure we detect packet loss in some extra
corner cases; and second, to avoid growing the RACK reordering window (and
delaying fast recovery) in cases where it seems clear we don't need to.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Tue, 27 Jul 2021 14:42:58 +0000 (10:42 -0400)]
tcp: more accurately check DSACKs to grow RACK reordering window
Previously, a DSACK could expand the RACK reordering window when no
reordering has been seen, and/or when the DSACK was due to an
unnecessary TLP retransmit (rather than a spurious fast recovery due
to reordering). This could result in unnecessarily growing the RACK
reordering window and thus unnecessarily delaying RACK-based fast
recovery episodes.
To avoid these issues, this commit tightens the conditions under which
a DSACK triggers the RACK reordering window to grow, so that a
connection only expands its RACK reordering window if:
(a) reordering has been seen in the connection
(b) a DSACKed range does not match the most recent TLP retransmit
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
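Conditions (a) and (b) from the commit above can be written as a small predicate. This is a sketch under assumptions: the demo_conn fields are illustrative, and "matches the most recent TLP retransmit" is modeled here as the DSACK range ending at the recorded TLP end sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct demo_conn {
	bool reordering_seen;	/* (a) reordering observed on this connection */
	uint32_t tlp_high_seq;	/* end sequence of the latest TLP, 0 if none */
};

/* Assumed matching rule: the DSACKed range ends where the TLP ended. */
static bool demo_dsack_matches_tlp(const struct demo_conn *conn,
				   uint32_t dsack_end_seq)
{
	return conn->tlp_high_seq != 0 && conn->tlp_high_seq == dsack_end_seq;
}

/* Grow the reordering window only if (a) holds and (b) the DSACK is not
 * explained by the most recent TLP retransmit.
 */
static bool demo_dsack_grows_reo_wnd(const struct demo_conn *conn,
				     uint32_t dsack_end_seq)
{
	return conn->reordering_seen &&
	       !demo_dsack_matches_tlp(conn, dsack_end_seq);
}
```

With this check, a DSACK caused by an unnecessary TLP probe no longer delays later RACK-based fast recovery.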
Yuchung Cheng [Tue, 27 Jul 2021 14:42:57 +0000 (10:42 -0400)]
tcp: more accurately detect spurious TLP probes
Previously, a TLP was considered spurious if the sender received any
DSACK during a TLP episode. This patch further checks that the DSACK
sequences match the TLP's to improve accuracy.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Tue, 27 Jul 2021 13:14:13 +0000 (21:14 +0800)]
qdisc: add new field for qdisc_enqueue tracepoint
The qdisc_enqueue tracepoint can work with qdisc:qdisc_dequeue
to measure packet latency in qdisc queues.
Add a new txq field to it so that more info can be retrieved.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Sun, 25 Jul 2021 15:13:53 +0000 (23:13 +0800)]
net: qed: remove unneeded return variables
Some return variables are never changed before the function returns.
These variables are unneeded in their functions. Therefore, the
unneeded return variables can be removed safely by returning their
initial values.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
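The kind of cleanup described above looks roughly like this before and after. Both functions are hypothetical examples, not code from the qed driver.

```c
/* Before: rc is initialised and returned but never reassigned. */
static int demo_setup_before(int *counter)
{
	int rc = 0;

	(*counter)++;

	return rc;
}

/* After: return the initial value directly and drop the variable. */
static int demo_setup_after(int *counter)
{
	(*counter)++;

	return 0;
}
```

The behavior is identical; the second form just makes it obvious that the function cannot fail.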
Ioana Ciornei [Fri, 23 Jul 2021 08:42:44 +0000 (11:42 +0300)]
docs: networking: dpaa2: add documentation for the switch driver
Add a documentation entry for the DPAA2 switch listing its
requirements, features and some examples to go along with them.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jul 2021 10:48:52 +0000 (11:48 +0100)]
Merge branch 'ovs-upcall-issues'
Mark Gray says:
====================
openvswitch: per-cpu upcall patchwork issues
Some issues were raised by patchwork at:
https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Gray [Fri, 23 Jul 2021 14:24:14 +0000 (10:24 -0400)]
openvswitch: fix sparse warning incorrect type
fix incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17: expected void const *
../net/openvswitch/datapath.c:169:17: got struct dp_nlsk_pids [noderef] __rcu *upcall_portids
Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Gray [Fri, 23 Jul 2021 14:24:13 +0000 (10:24 -0400)]
openvswitch: fix alignment issues
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Gray [Fri, 23 Jul 2021 14:24:12 +0000 (10:24 -0400)]
openvswitch: update kdoc OVS_DP_ATTR_PER_CPU_PIDS
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>