Christoph Hellwig [Thu, 28 May 2020 05:12:20 +0000 (07:12 +0200)]
tcp: add tcp_sock_set_quickack
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space
without going through a fake uaccess. Clean up the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:19 +0000 (07:12 +0200)]
tcp: add tcp_sock_set_nodelay
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess. Clean up the callers to avoid
pointless wrappers now that this is a simple function call.
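For comparison, the userspace counterpart of this option is an ordinary setsockopt() call. The sketch below is only a userspace analogue written in Python, not the in-kernel helper; the function name set_nodelay is made up for illustration.

```python
import socket


def set_nodelay(sock: socket.socket) -> int:
    """Enable TCP_NODELAY on a TCP socket and return the value read back."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)


# No connection is needed; the option can be set on a fresh socket.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    nodelay = set_nodelay(s)
    print("TCP_NODELAY enabled:", bool(nodelay))
```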
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:18 +0000 (07:12 +0200)]
tcp: add tcp_sock_set_cork
Add a helper to directly set the TCP_CORK sockopt from kernel space
without going through a fake uaccess. Clean up the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:17 +0000 (07:12 +0200)]
net: add sock_set_reuseport
Add a helper to directly set the SO_REUSEPORT sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:16 +0000 (07:12 +0200)]
net: add sock_set_rcvbuf
Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:15 +0000 (07:12 +0200)]
net: add sock_set_keepalive
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:14 +0000 (07:12 +0200)]
net: add sock_enable_timestamps
Add a helper to directly enable timestamps instead of setting the
SO_TIMESTAMP* sockopts from kernel space and going through a fake
uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:13 +0000 (07:12 +0200)]
net: add sock_bindtoindex
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:12 +0000 (07:12 +0200)]
net: add sock_set_sndtimeo
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel
space without going through a fake uaccess. The interface is
simplified to only pass the seconds value, as that is the only
thing needed at the moment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:11 +0000 (07:12 +0200)]
net: add sock_set_priority
Add a helper to directly set the SO_PRIORITY sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:10 +0000 (07:12 +0200)]
net: add sock_no_linger
Add a helper to directly set the SO_LINGER sockopt from kernel space
with onoff set to true and a linger time of 0 without going through a
fake uaccess.
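As a userspace illustration of the same setting (onoff true, linger time 0), the sketch below packs a struct linger by hand. It assumes a Linux host where struct linger is two C ints; the helper names no_linger and read_linger are hypothetical and unrelated to the kernel function.

```python
import socket
import struct

LINGER_FMT = "ii"  # struct linger { int l_onoff; int l_linger; }


def no_linger(sock: socket.socket) -> None:
    """Set SO_LINGER with l_onoff=1 and l_linger=0."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, 1, 0))


def read_linger(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    return struct.unpack(LINGER_FMT, raw)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    no_linger(s)
    onoff, linger_secs = read_linger(s)
    print("l_onoff:", onoff, "l_linger:", linger_secs)
```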
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:09 +0000 (07:12 +0200)]
net: add sock_set_reuseaddr
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.
For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure. For actual operation it already
did depend on having ipv4 or ipv6 support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 May 2020 18:04:12 +0000 (11:04 -0700)]
Merge tag 'mlx5-updates-2020-05-26' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-05-26
Updates highlights:
1) From Vu Pham (8): Support VM traffic failover with bonded VF
representors and e-switch egress/ingress ACLs
This series introduces support for a Virtual Machine running I/O
traffic over the direct/fast VF path and failing over to the slower
paravirtualized path using the following features:
 ___________________________________
|  VM      ________________         |
|         |FAILOVER device |        |
|         |________________|        |
|                 |                 |
|             ____|_____            |
|            |          |           |
|      ______|__    ____|______     |
|     | VF PT   |  |VIRTIO-NET |    |
|     | device  |  |  device   |    |
|     |_________|  |___________|    |
|__________|_____________|__________|
           |             |
           | HYPERVISOR  |
           |        _____|_____
           |       |  macvtap  |
           |       | virtio BE |
           |       |___________|
           |             |
           |        _____|_____
           |       |  host VF  |
           |       |___________|
           |             |
      _____|_____   _____|_____
     |   PT VF   | |  host VF  |
     |representor| |representor|
     |___________| |___________|
          \               /
           \             /
            \           /
             \         /                     ________________
              \_______/                     |                |
          ________|_______                  |    V-SWITCH    |
         |VF representors |_________________|     (OVS)      |
         |      bond      |                 |________________|
         |________________|                         |
                                            ________|________
                                           |     Uplink      |
                                           |   representor   |
                                           |_________________|
Summary:
--------
Problem statement:
------------------
Currently, in the above topology, when a netfailover device is configured
using VFs and eswitch VF representors, and traffic fails over to the
stand-by VF which is exposed to the guest VM through a macvtap device,
the eswitch fails to switch the traffic to the stand-by VF representor.
This occurs because there is no knowledge at eswitch level of the
stand-by representor device.
Solution:
---------
Using the standard bonding driver, a bond netdevice is created over the
VF representor devices and is used for offloading tc rules.
Two VF representors are bonded together, one for the passthrough VF
device and another one for the stand-by VF device.
With this solution, the mlx5 driver listens to the failover events
occurring at the bond device level to fail over traffic to whichever
VF representor of the bond is active.
a. VM with netfailover device of VF pass-thru (PT) device and virtio-net
paravirtualized device with same MAC-address to handle failover
traffic at VM level.
b. Host bond is active-standby mode, with the lower devices being the VM
VF PT representor, and the representor of the 2nd VF to handle
failover traffic at Hypervisor/V-Switch OVS level.
- During the steady state (fast datapath): set the bond active
device to be the VM PT VF representor.
- During failover: apply bond failover to the second VF representor
device which connects to the VM non-accelerated path.
c. E-Switch ingress/egress ACL tables to support failover traffic at
E-Switch level
I. E-Switch egress ACL with forward-to-vport rule:
- By default, eswitch vport egress acl forwards packets to its
counterpart NIC vport.
- During port failover, the egress acl forward-to-vport rule will
be added to the e-switch vport of the passive/inactive slave VF
representor to forward packets to the other e-switch vport, i.e. the
active slave representor's e-switch vport, to handle egress
"failover" traffic.
- Using lower change netdev event to detect that a representor is a
lower dev (slave) of a bond and becomes active, adding egress acl
forward-to-vport rule of all other slave netdevs to forward to this
representor's vport.
- Using upper change netdev event to detect a representor unslaving
from bond device to delete its vport's egress acl forward-to-vport
rule.
II. E-Switch ingress ACL metadata reg_c for match
- Bonded representors' vports sharing tc block have the same
root ingress acl table and a unique metadata for match.
- Traffic from both representors' vports will be tagged with the same
unique metadata reg_c.
- Using upper change netdev event to detect a representor
enslaving/unslaving from bond device to setup shared root ingress
acl and unique metadata.
2) From Alex Vesker (2): Split RX and TX lock for parallel rule insertion in
software steering
3) From Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype: use
the HW ip_version register rather than parsing eth frames for ethertype.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 28 May 2020 00:34:58 +0000 (17:34 -0700)]
tcp: ipv6: support RFC 6069 (TCP-LD)
Make tcp_ld_RTO_revert() helper available to IPv6, and
implement RFC 6069 :
Quoting this RFC :
3. Connectivity Disruption Indication
For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
the ICMP destination unreachable message of code 0 (net unreachable)
and of code 1 (host unreachable) is the ICMPv6 destination
unreachable message of code 0 (no route to destination) [RFC4443].
As with IPv4, a router should generate an ICMPv6 destination
unreachable message of code 0 in response to a packet that cannot be
delivered to its destination address because it lacks a matching
entry in its routing table.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 28 May 2020 00:27:58 +0000 (03:27 +0300)]
net: dsa: sja1105: offload the Credit-Based Shaper qdisc
SJA1105, being AVB/TSN switches, provide hardware assist for the
Credit-Based Shaper as described in the IEEE 802.1Q-2018 document.
First generation has 10 shapers, freely assignable to any of the 4
external ports and 8 traffic classes, and second generation has 16
shapers.
The Credit-Based Shaper tables are accessed through the dynamic
reconfiguration interface, so we have to restore them manually after a
switch reset. The tables are backed up by the static config only on
P/Q/R/S, and we don't want to add custom code only for that family,
since the procedure that is in place now works for both.
Tested with the following commands:
data_rate_kbps=67000
port_transmit_rate_kbps=1000000
idleslope=$data_rate_kbps
sendslope=$(($idleslope - $port_transmit_rate_kbps))
locredit=$((-0x80000000))
hicredit=$((0x7fffffff))
tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
tc qdisc replace dev swp2 parent 1:1 cbs \
idleslope $idleslope \
sendslope $sendslope \
hicredit $hicredit \
locredit $locredit \
offload 1
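The shell arithmetic in the test commands can be checked with a few lines of Python. This is only a sketch of the same calculation: the function name cbs_params is hypothetical, and the credit limits are simply the extreme signed 32-bit values used in the commands above.

```python
def cbs_params(idleslope_kbps: int, port_rate_kbps: int) -> dict:
    """Mirror the shell variables from the test commands."""
    return {
        "idleslope": idleslope_kbps,
        # sendslope = idleslope - port transmit rate, so it is negative
        "sendslope": idleslope_kbps - port_rate_kbps,
        "hicredit": 0x7FFFFFFF,    # largest signed 32-bit value
        "locredit": -0x80000000,   # smallest signed 32-bit value
    }


params = cbs_params(idleslope_kbps=67000, port_rate_kbps=1000000)
for name, value in params.items():
    print(name, value)
```

With the values from the commit message, sendslope works out to -933000 kbit/s.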
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 28 May 2020 00:03:44 +0000 (18:03 -0600)]
selftests: Add torture tests to nexthop tests
Add Nik's torture tests as a new set to stress the replace and cleanup
paths.
The torture test was created by Nikolay Aleksandrov; I adapted it to
the selftest framework and added an IPv6 version.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Vesker [Wed, 20 May 2020 15:09:35 +0000 (18:09 +0300)]
net/mlx5: DR, Split RX and TX lock for parallel insertion
Change the locking flow to support RX and TX locks. Splitting
the single lock in two allows inserting rules in parallel
for the RX and TX parts of the FDB.
Locking the dr_domain is done by taking both the RX domain
and the TX domain locks; this is mostly used for control operations
on the dr_domain. When inserting rules for RX or TX, only the single
nic_domain RX or TX lock is used. Splitting the lock is safe since
RX and TX domains are logically separated from each other, and shared
objects such as the send ring and memory pool are protected by locks.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alex Vesker [Wed, 20 May 2020 15:09:14 +0000 (18:09 +0300)]
net/mlx5: DR, Add a spinlock to protect the send ring
Adding this lock will allow writing steering entries without
locking the dr_domain and allow parallel insertion.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Tue, 19 May 2020 05:55:59 +0000 (05:55 +0000)]
net/mlx5e: Optimize performance for IPv4/IPv6 ethertype
The HW is optimized for IPv4/IPv6. For such cases, when the capability
is supported, avoid matching on ethertype and use the ip_version field
instead.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Mon, 11 May 2020 19:20:29 +0000 (19:20 +0000)]
net/mlx5e: Helper function to set ethertype
Set ethertype match in a helper function as a pre-step towards
optimizing it.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Fri, 15 May 2020 04:42:45 +0000 (23:42 -0500)]
net/mlx5: Add missing mutex destroy
Add mutex destroy calls to balance with mutex_init() done in the init
path.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Thu, 12 Mar 2020 17:26:25 +0000 (10:26 -0700)]
net/mlx5e: Use change upper event to setup representors' bond_metadata
Use the change upper event to detect a slave representor being
enslaved to or unslaved from a lag device.
On an enslaving event, call the mlx5_enslave_rep() API to create a
shadow entry for this slave representor, add it to the slaves list of
the bond_metadata structure representing the master lag device, and use
its metadata to set up the ingress acl metadata header.
On an unslaving event, reset the vport of the unslaved representor
to use its default ingress/egress acls and rx rules with its
default_metadata.
The last slave will free the shared bond_metadata and its
unique metadata.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Mon, 2 Mar 2020 18:33:49 +0000 (10:33 -0800)]
net/mlx5e: Slave representors sharing unique metadata for match
Bonded slave representors' vports must share a unique metadata
for match.
On enslaving event of slave representor to lag device, allocate
new unique "bond_metadata" for match if this is the first slave.
The subsequent enslaved representors will share the same unique
"bond_metadata".
On unslaving event of slave representor, reset the slave
representor's vport to use its own default metadata.
Replace ingress acl and rx rules of the slave representors' vports
using new vport->bond_metadata.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Sat, 29 Feb 2020 00:10:34 +0000 (16:10 -0800)]
net/mlx5: E-Switch, Alloc and free unique metadata for match
Introduce infrastructure to create unique metadata for match
for vport without depending on vport_num. Vport uses its
default metadata for match in standalone configuration but
will share a different unique "bond_metadata" for match with
other vports in bond configuration.
Use ida to generate unique metadata for match for vports
in default and bond configurations.
Introduce APIs to generate and free metadata for match.
Introduce APIs to set a vport's bond_metadata and replace its
ingress acl rules with bond_metadata.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Fri, 28 Feb 2020 22:28:27 +0000 (14:28 -0800)]
net/mlx5e: Add bond_metadata and its slave entries
Add bond_metadata and its slave entries to represent a lag device
and its slave VF representors. The bond_metadata structure includes a
unique metadata shared by the slave VF representors, and a list of the
slave representors' entries.
On enslaving event, create a bond_metadata structure representing
the upper lag device of this slave representor if it has not been
created yet. Create and add entry for the slave representor to the
slaves list.
On unslaving event, free the slave entry of the slave representor.
On the last unslave event, free the bond_metadata structure and its
resources.
Introduce APIs to create and remove bond_metadata and its resources,
enslave and unslave VF representor slave entries.
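The lifetime rules above (create on first enslave, share among slaves, free on last unslave) can be illustrated with a toy model. This is a sketch only: the class and method names are hypothetical, a simple counter stands in for the real ID allocator, and none of it reflects the driver's actual data structures.

```python
class BondMetadataTable:
    """Toy model: one shared metadata value per bond, freed on last unslave."""

    def __init__(self):
        self._next_id = 1   # stand-in for a unique-ID allocator
        self._bonds = {}    # bond name -> {"metadata": int, "slaves": set}

    def enslave(self, bond: str, rep: str) -> int:
        entry = self._bonds.get(bond)
        if entry is None:
            # First slave: create the bond_metadata entry.
            entry = {"metadata": self._next_id, "slaves": set()}
            self._next_id += 1
            self._bonds[bond] = entry
        entry["slaves"].add(rep)
        return entry["metadata"]

    def unslave(self, bond: str, rep: str) -> None:
        entry = self._bonds[bond]
        entry["slaves"].discard(rep)
        if not entry["slaves"]:
            # Last slave gone: free the shared entry.
            del self._bonds[bond]

    def has_bond(self, bond: str) -> bool:
        return bond in self._bonds


table = BondMetadataTable()
md_pt = table.enslave("bond0", "pt_vf_rep")
md_host = table.enslave("bond0", "host_vf_rep")
print("shared metadata:", md_pt == md_host)
table.unslave("bond0", "pt_vf_rep")
table.unslave("bond0", "host_vf_rep")
print("bond0 still tracked:", table.has_bond("bond0"))
```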
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Tue, 5 Mar 2019 19:11:14 +0000 (21:11 +0200)]
net/mlx5e: Offload flow rules to active lower representor
When a bond device is created over one or more non uplink representors,
and when a flow rule is offloaded to such bond device, offload a rule
to the active lower device.
Assuming that this is active-backup lag, the rules should be offloaded
to the active lower device which is the representor of the direct
path (not the failover).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Fri, 2 Aug 2019 23:13:10 +0000 (16:13 -0700)]
net/mlx5e: Support tc block sharing for representors
Currently offloading a rule over a tc block shared by multiple
representors fails because an e-switch global hashtable to keep
the mapping from tc cookies to mlx5e flow instances is used, and
tc block sharing offloads the same rule/cookie multiple times,
each time for different representor sharing the tc block.
Change the implementation and behavior: acknowledge and return
success if the same rule/cookie is offloaded again to another slave
representor sharing the tc block, by recording, checking and comparing
the netdev that added the rule first.
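A toy model of that duplicate check is sketched below. It is not the driver code: FlowTable and its add method are hypothetical names, and returning -EEXIST when the same netdev repeats a cookie is an assumption made for the illustration.

```python
import errno


class FlowTable:
    """Toy cookie table that remembers which netdev added each rule first."""

    def __init__(self):
        self._flows = {}  # tc cookie -> netdev that first offloaded the rule

    def add(self, cookie: int, netdev: str) -> int:
        orig_dev = self._flows.get(cookie)
        if orig_dev is None:
            self._flows[cookie] = netdev
            return 0
        if orig_dev != netdev:
            # Same rule replayed for another representor sharing the block:
            # acknowledge it without offloading a second time.
            return 0
        return -errno.EEXIST  # assumption: a true duplicate is rejected


flows = FlowTable()
first = flows.add(0x1234, "pt_vf_rep")
shared = flows.add(0x1234, "host_vf_rep")
repeat = flows.add(0x1234, "pt_vf_rep")
print(first, shared, repeat)
```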
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Fri, 21 Jun 2019 20:23:44 +0000 (13:23 -0700)]
net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule
Register a notifier block to handle netdev events for bond device
of non-uplink representors to support eswitch vports bonding.
When a non-uplink representor is a lower dev (slave) of bond and
becomes active, adding egress acl forward-to-vport rule of all slave
netdevs (active + standby) to forward to this representor's vport. Use
change lower netdev event to do this.
Use change upper event to detect slave representor unslaved from lag
device to delete its vport egress acl forward rule if any.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Tue, 17 Mar 2020 00:32:50 +0000 (17:32 -0700)]
net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule
By default, e-switch vport's egress acl just forward packets to its
counterpart NIC vport using existing egress acl table.
During port failover in a bonding scenario where two VF representors
are bonded, the egress acl forward-to-vport rule will be added to
the existing egress acl table of the e-switch vport of the
passive/inactive slave representor to forward packets to the other NIC
vport, i.e. the active slave representor's NIC vport, to handle egress
"failover" traffic.
Enable egress acl and have APIs to create and destroy egress acl
forward-to-vport rule and group.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vu Pham [Sat, 28 Mar 2020 06:12:22 +0000 (23:12 -0700)]
net/mlx5: E-Switch, Refactor eswitch ingress acl codes
Restructure the eswitch ingress acl codes into eswitch directory
and different files:
. Acl ingress helper functions to acl_helper.c/h
. Acl ingress functions used in offloads mode to acl_ingress_ofld.c
. Acl ingress functions used in legacy mode to acl_ingress_lgy.c
This patch does not change any functionality.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Vu Pham [Wed, 6 Nov 2019 17:57:12 +0000 (09:57 -0800)]
net/mlx5: E-Switch, Refactor eswitch egress acl codes
Refactor the egress acl codes so that offloads and legacy modes
can configure specifically their own needs of egress acl table,
groups and rules. While at it, restructure the eswitch egress
acl codes into eswitch directory and different files:
. Acl egress helper functions to acl_helper.c/h
. Acl egress functions used in offloads mode to acl_egress_ofld.c
. Acl egress functions used in legacy mode to acl_egress_lgy.c
This patch does not change any functionality.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Wed, 27 May 2020 22:11:33 +0000 (15:11 -0700)]
Merge branch 'remove-kernel_getsockopt'
Christoph Hellwig says:
====================
remove kernel_getsockopt
this series reduces scope from the last round and just removes
kernel_getsockopt to avoid conflicting with the sctp cleanup series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 27 May 2020 18:22:29 +0000 (20:22 +0200)]
net: remove kernel_getsockopt
No users left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 27 May 2020 18:22:28 +0000 (20:22 +0200)]
dlm: use the tcp version of accept_from_sock for sctp as well
Apart from a few fixes that were never applied to the SCTP version,
the only difference is that TCP uses ->getpeername to get the remote
address, while SCTP uses kernel_getsockopt(.. SCTP_PRIMARY_ADDR). But
given that getpeername is defined to return the primary address for
sctp, there doesn't seem to be any reason for the different way of
querying the peername, or for all the code duplication.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Falkevik [Wed, 27 May 2020 09:59:43 +0000 (11:59 +0200)]
sctp: fix typo sctp_ulpevent_nofity_peer_addr_change
change typo in function name "nofity" to "notify"
sctp_ulpevent_nofity_peer_addr_change ->
sctp_ulpevent_notify_peer_addr_change
Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 27 May 2020 09:25:26 +0000 (12:25 +0300)]
net/tls: Add force_resync for driver resync
This patch adds a field to the tls rx offload context which enables
drivers to force a send_resync call.
This field can be used by drivers to request a resync at the next
possible tls record. It is beneficial for hardware that provides the
resync sequence number asynchronously. In such cases, the packet that
triggered the resync does not contain the information required for a
resync. Instead, the driver requests resync for all the following
TLS records until the asynchronous notification with the resync request
TCP sequence arrives.
A following series for mlx5e ConnectX-6DX TLS RX offload support will
use this mechanism.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 22:05:50 +0000 (15:05 -0700)]
Merge branch 'net_sched-reduce-the-number-of-qdisc-resets'
Cong Wang says:
====================
net_sched: reduce the number of qdisc resets
This patchset aims to reduce the number of qdisc resets during
qdisc tear down. Patch 1~3 are preparation for their following
patches, especially patch 2 and patch 3 add a few tracepoints
so that we can observe the whole lifetime of qdiscs. Patch 4
and patch 5 are the ones that do the actual work. Please find more
details in each patch description.
Vaclav Zindulka tested this patchset, and the teardown of his large
ruleset with over 13k qdiscs defined went from 22s to 520ms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:27 +0000 (21:35 -0700)]
net_sched: get rid of unnecessary dev_qdisc_reset()
Resetting the old qdisc on dev_queue->qdisc_sleeping in
dev_qdisc_reset() is redundant, because this qdisc,
even if it is not the same as dev_queue->qdisc, is reset via
qdisc_put() right after calling dev_graft_qdisc() when
its refcnt hits 0.
This is very easy to observe with qdisc_reset() tracepoint
and stack traces.
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:26 +0000 (21:35 -0700)]
net_sched: avoid resetting active qdisc for multiple times
Except for sch_mq and sch_mqprio, each dev queue points to the
same root qdisc, so when we reset the dev queues with
netdev_for_each_tx_queue() we end up resetting the same instance
of the root qdisc multiple times.
Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in
each iteration. For sch_mq/sch_mqprio we still reset all
of them like before; for the rest, we only reset it once.
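The effect of that check can be shown with a small model. This is a sketch under simplifying assumptions, not the kernel logic: a boolean attribute stands in for the __QDISC_STATE_DEACTIVATED bit, and the names Qdisc and deactivate_queues are hypothetical.

```python
class Qdisc:
    """Toy qdisc that counts how often it is reset."""

    def __init__(self, kind: str):
        self.kind = kind
        self.deactivated = False  # stand-in for the DEACTIVATED state bit
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1


def deactivate_queues(tx_queues: list) -> None:
    """Walk every TX queue but reset each distinct qdisc only once."""
    for qdisc in tx_queues:
        if qdisc.deactivated:
            continue  # already handled through another queue
        qdisc.deactivated = True
        qdisc.reset()


# Shared root qdisc: four queues point at the same instance.
shared_root = Qdisc("fq_codel")
deactivate_queues([shared_root] * 4)

# mq-style layout: every queue has its own child qdisc.
per_queue = [Qdisc("pfifo_fast") for _ in range(4)]
deactivate_queues(per_queue)

print("shared root resets:", shared_root.resets)
print("per-queue resets:", [q.resets for q in per_queue])
```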
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:25 +0000 (21:35 -0700)]
net_sched: add a tracepoint for qdisc creation
With this tracepoint, we can tell when qdiscs are created,
especially the default qdiscs.
Sample output:
tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0
tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff
tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100
tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200
tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100
tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200
tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:24 +0000 (21:35 -0700)]
net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()
Add two tracepoints for qdisc_reset() and qdisc_destroy() to track
qdisc resetting and destroying.
Sample output:
tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
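To make the repeated resets in the sample output easy to see, the lines can be tallied with a short script. This is a sketch that assumes the trace line layout shown above; the regular expression and the count_events helper are illustrative, not part of any tracing tool.

```python
import re
from collections import Counter

TRACE = """\
tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
"""

LINE_RE = re.compile(
    r"(?P<ts>\d+\.\d+): (?P<event>qdisc_\w+): dev=(?P<dev>\S+) "
    r"kind=(?P<kind>\S+) parent=(?P<parent>\S+) handle=(?P<handle>\S+)"
)


def count_events(text: str) -> Counter:
    """Count (kind, event) pairs found in the trace text."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match["kind"], match["event"])] += 1
    return counts


counts = count_events(TRACE)
for (kind, event), n in sorted(counts.items()):
    print(kind, event, n)
```

In this sample each qdisc is reset three times before it is destroyed.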
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:23 +0000 (21:35 -0700)]
net_sched: use qdisc_reset() in qdisc_destroy()
qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb
and qdisc->skb_bad_txq, which is nearly the same as what qdisc_reset()
does, so just call it directly and consolidate the code for the next
patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Wenhu [Wed, 27 May 2020 03:32:22 +0000 (20:32 -0700)]
drivers: ipa: remove discription of nonexistent element
No element named "client" exists within "struct ipa_endpoint".
It is probably a leftover that was never removed. Delete it now.
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Wenhu [Wed, 27 May 2020 03:19:24 +0000 (20:19 -0700)]
drivers: ipa: fix typoes for ipa
Change "transactio" -> "transaction". Also an alignment correction.
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 21:57:27 +0000 (14:57 -0700)]
Merge branch 'tcp-tcp_v4_err-cleanups'
Eric Dumazet says:
====================
tcp: tcp_v4_err() cleanups
This series is a followup of patch 239174945dac ("tcp: tcp_v4_err() icmp
skb is named icmp_skb").
Move the RFC 6069 code into a helper, and rename icmp_skb to standard
skb name so that tcp_v4_err() and tcp_v6_err() are using consistent names.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 27 May 2020 02:48:50 +0000 (19:48 -0700)]
tcp: rename tcp_v4_err() skb parameter
This essentially reverts 4d1a2d9ec1c1 ("Revert Backoff [v3]:
Rename skb to icmp_skb in tcp_v4_err()")
Now that we have the tcp_ld_RTO_revert() helper, we can use the usual
name for the sk_buff parameter, so that tcp_v4_err() and
tcp_v6_err() use similar names.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 27 May 2020 02:48:49 +0000 (19:48 -0700)]
tcp: add tcp_ld_RTO_revert() helper
RFC 6069 logic has been implemented for IPv4 only so far,
right in the middle of tcp_v4_err() and was error prone.
Move this code to one helper, to make tcp_v4_err() more
readable and to eventually expand RFC 6069 to IPv6 in
the future.
Also perform sock_owned_by_user() check a bit sooner.
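The core idea of RFC 6069, undoing one step of exponential RTO backoff, can be sketched numerically. The code below is a simplified model under stated assumptions, not the kernel helper: the function names and the 120-second cap are chosen for illustration, and the checks the RFC requires before reverting are omitted.

```python
RTO_MAX_MS = 120_000  # assumed upper bound on the timeout, for illustration


def backed_off_rto(base_rto_ms: int, backoff: int) -> int:
    """RTO after `backoff` doublings, clamped to the assumed maximum."""
    return min(base_rto_ms << backoff, RTO_MAX_MS)


def revert_one_backoff(base_rto_ms: int, backoff: int) -> tuple:
    """Undo one backoff step, if any, and return (backoff, rto_ms)."""
    if backoff > 0:
        backoff -= 1
    return backoff, backed_off_rto(base_rto_ms, backoff)


before = backed_off_rto(200, 3)
new_backoff, after = revert_one_backoff(200, 3)
print("RTO before:", before, "ms; after revert:", after, "ms")
```

Starting from a 200 ms base RTO with three backoffs, one revert halves the timeout.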
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 21:56:08 +0000 (14:56 -0700)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: misc updates for -next
This patchset includes some misc updates for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Wed, 27 May 2020 00:59:17 +0000 (08:59 +0800)]
net: hns3: add a print for initializing CMDQ when reset pending
When initializing CMDQ fails because of a pending reset,
there is no hint for debugging, so add a log for it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Wed, 27 May 2020 00:59:16 +0000 (08:59 +0800)]
net: hns3: remove unnecessary MAC enable in app loopback
Packets will not pass through MAC during app loopback.
Therefore, it is meaningless to enable MAC while doing
app loopback. This patch removes this unnecessary action.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Wed, 27 May 2020 00:59:15 +0000 (08:59 +0800)]
net: hns3: change the order of reinitializing RoCE and NIC client during reset
The HNS RDMA driver will support VF device later, whose
re-initialization should be done after PF's. This patch
changes the order of hclge_reset_prepare_up() and
hclge_notify_roce_client(), so that PF's RoCE client
will be reinitialized before VF's.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Wed, 27 May 2020 00:59:14 +0000 (08:59 +0800)]
net: hns3: add a resetting check in hclgevf_init_nic_client_instance()
To prevent initializing the VF NIC client in reset handling state,
this patch adds a resetting check in hclgevf_init_nic_client_instance().
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 21:54:32 +0000 (14:54 -0700)]
Merge branch 'net-mscc-allow-forwarding-ioctl-operations-to-attached-PHYs'
Antoine Tenart says:
====================
net: mscc: allow forwarding ioctl operations to attached PHYs
These two patches allow forwarding ioctl to the PHY MII implementation,
and support is added for offloading timestamping operations to
compatible attached PHYs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 26 May 2020 15:01:49 +0000 (17:01 +0200)]
net: mscc: allow offloading timestamping operations to the PHY
This patch adds support for offloading timestamping operations not only
to the Ocelot switch (as already supported) but to compatible PHYs.
When both the PHY and the Ocelot switch support timestamping operations,
the PHY implementation is chosen as the timestamp will happen closer to
the medium.
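
The selection rule described in this series (PHY first, switch as fallback for timestamp requests) could be modelled roughly as below. This is a hedged user-space sketch: every type and name here (sketch_port, sketch_cmd, pick_ioctl_handler) is hypothetical and does not reproduce the driver's code or the real ioctl numbers.

```c
#include <stdbool.h>

/* Hypothetical request kinds; the real driver switches on ioctl numbers. */
enum sketch_cmd {
	SKETCH_GET_MII_REG,
	SKETCH_SET_HWTSTAMP,
	SKETCH_GET_HWTSTAMP,
};

enum sketch_handler {
	HANDLED_BY_PHY,
	HANDLED_BY_SWITCH,
	NOT_SUPPORTED,
};

/* Hypothetical capability flags for one port. */
struct sketch_port {
	bool phy_attached;
	bool phy_has_hwtstamp;
	bool switch_has_hwtstamp;
};

/*
 * Timestamp requests go to the PHY when it can timestamp (it sits closer
 * to the medium) and fall back to the switch otherwise. Every other
 * request is forwarded to the attached PHY.
 */
static enum sketch_handler pick_ioctl_handler(const struct sketch_port *port,
					      enum sketch_cmd cmd)
{
	bool is_tstamp = cmd == SKETCH_SET_HWTSTAMP ||
			 cmd == SKETCH_GET_HWTSTAMP;

	if (is_tstamp && !(port->phy_attached && port->phy_has_hwtstamp))
		return port->switch_has_hwtstamp ? HANDLED_BY_SWITCH
						 : NOT_SUPPORTED;
	return port->phy_attached ? HANDLED_BY_PHY : NOT_SUPPORTED;
}
```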
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 26 May 2020 15:01:48 +0000 (17:01 +0200)]
net: mscc: use the PHY MII ioctl interface when possible
Allow ioctl to be implemented by the PHY, when a PHY is attached to the
Ocelot switch. In case the ioctl is a request to set or get the hardware
timestamp, use the Ocelot switch implementation for now.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 27 May 2020 16:45:38 +0000 (19:45 +0300)]
net: dsa: felix: accept VLAN config regardless of bridge VLAN awareness state
The ocelot core library is written with the idea in mind that the VLAN
table is populated by the bridge. Otherwise, not even a sane default
pvid is provided: in standalone mode, the default pvid is 0, and the
core expects the bridge layer to change it to 1.
So without this patch, the VLAN table is completely empty at the end of
the commands below, and traffic is broken as a result:
ip link add dev br0 type bridge vlan_filtering 0 && ip link set dev br0 up
for eth in $(ls /sys/bus/pci/devices/0000\:00\:00.5/net/); do
	ip link set dev $eth master br0
	ip link set dev $eth up
done
ip link set dev br0 type bridge vlan_filtering 1
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Worley [Wed, 27 May 2020 16:41:42 +0000 (12:41 -0400)]
net: add large ecmp group nexthop tests
Add a couple large ecmp group nexthop selftests to cover
the remnant fixed by
d69100b8eee27c2d60ee52df76e0b80a8d492d34.
The tests create 100 x32 ecmp groups of ipv4 and ipv6 and then
dump them. On kernels without the fix, they will fail due
to data remnant during the dump.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 27 May 2020 13:34:45 +0000 (15:34 +0200)]
mtk-star-emac: mark PM functions as __maybe_unused
Without CONFIG_PM, the compiler warns about two unused functions:
drivers/net/ethernet/mediatek/mtk_star_emac.c:1472:12: error: unused function 'mtk_star_suspend' [-Werror,-Wunused-function]
drivers/net/ethernet/mediatek/mtk_star_emac.c:1488:12: error: unused function 'mtk_star_resume' [-Werror,-Wunused-function]
Mark these as __maybe_unused.
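
A minimal user-space sketch of the pattern follows. The kernel supplies __maybe_unused itself; it is defined here only so the example builds outside the kernel (GCC/Clang attribute syntax assumed). DEMO_CONFIG_PM, struct demo_pm_ops and the demo_* functions are hypothetical stand-ins, not the driver's code.

```c
#include <stddef.h>

/* Provided by the kernel; defined here only for a user-space build. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Hypothetical stand-in for a power-management ops table. */
struct demo_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

/*
 * These are referenced only when DEMO_CONFIG_PM is defined. The
 * attribute keeps -Wunused-function quiet in the other configuration.
 */
static int __maybe_unused demo_suspend(void)
{
	return 0;
}

static int __maybe_unused demo_resume(void)
{
	return 0;
}

#ifdef DEMO_CONFIG_PM
static const struct demo_pm_ops demo_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};
#else
static const struct demo_pm_ops demo_ops = {
	.suspend = NULL,
	.resume = NULL,
};
#endif
```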
Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Wed, 27 May 2020 12:34:30 +0000 (12:34 +0000)]
bridge: mrp: Rework the MRP netlink interface
This patch reworks the MRP netlink interface. Before, each attribute
represented a binary structure which made it hard to be extended.
Therefore update the MRP netlink interface such that each existing
attribute becomes a nested attribute which contains the fields of the
binary structures.
In this way the MRP netlink interface can be extended without breaking
the backwards compatibility. It is also using strict checking for
attributes under the MRP top attribute.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 27 May 2020 12:01:29 +0000 (13:01 +0100)]
net: dsa: b53: remove redundant premature assignment to new_pvid
Variable new_pvid is being assigned with a value that is never read,
the following if statement updates new_pvid with a new value in both
of the if paths. The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bartosz Golaszewski [Wed, 27 May 2020 09:24:04 +0000 (11:24 +0200)]
net: ethernet: mtk-star-emac: fix error path in RX handling
The dma_addr field in desc_data must not be overwritten until after the
new skb is mapped. Currently we replace it with an uninitialized value
in the error path. This change fixes it by moving the assignment before the
label to which we jump after mapping or allocation errors.
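
The ordering problem can be sketched in user space as follows. The struct and function names are hypothetical and buffers are reduced to integer ids. The point is only that the address is updated before the shared label, so the error path recycles the old buffer together with its old, still valid address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one RX descriptor's bookkeeping. */
struct rx_slot_sketch {
	int buf_id;		/* buffer currently attached to the slot */
	uint64_t dma_addr;	/* DMA address of that buffer */
};

/*
 * Try to replace the slot's buffer. new_buf_id < 0 models an allocation
 * failure and map_ok == false models a mapping failure. Returns true if
 * the received buffer can be passed up the stack, false if the packet
 * is dropped and the old buffer is recycled.
 */
static bool rx_refill_sketch(struct rx_slot_sketch *slot, int new_buf_id,
			     bool map_ok, uint64_t mapped_addr)
{
	int next_buf = new_buf_id;
	bool delivered = true;

	if (new_buf_id < 0 || !map_ok) {
		/* Keep the current buffer and leave dma_addr untouched. */
		next_buf = slot->buf_id;
		delivered = false;
		goto push;
	}

	/* Only reached on success, so mapped_addr is known to be valid. */
	slot->dma_addr = mapped_addr;

push:
	slot->buf_id = next_buf;
	return delivered;
}
```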
Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com> # build
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 27 May 2020 08:15:55 +0000 (09:15 +0100)]
mlxsw: spectrum_router: remove redundant initialization of pointer br_dev
The pointer br_dev is being initialized with a value that is never read
and is being updated with a new value later on. The initialization
is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Wed, 27 May 2020 08:00:20 +0000 (01:00 -0700)]
nexthop: Fix type of event_type in call_nexthop_notifiers
Clang warns:
net/ipv4/nexthop.c:841:30: warning: implicit conversion from enumeration
type 'enum nexthop_event_type' to different enumeration type 'enum
fib_event_type' [-Wenum-conversion]
call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh);
~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~
1 warning generated.
Use the right type for event_type so that clang does not warn.
Fixes: 8590ceedb701 ("nexthop: add support for notifiers")
Link: https://github.com/ClangBuiltLinux/linux/issues/1038
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Wed, 27 May 2020 05:08:43 +0000 (07:08 +0200)]
net: phy: at803x: add cable diagnostics support for ATH9331 and ATH8032
Add support for Atheros 100Base-T PHYs. The only difference seems to be
the ability to test 2 pairs instead of 4 and the lack of 1000Base-T
specific register.
Only the ATH9331 was tested with this patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 06:22:28 +0000 (23:22 -0700)]
Merge branch 'Raw-PHY-TDR-data'
Andrew Lunn says:
====================
Raw PHY TDR data
Some ethernet PHYs allow access to raw TDR data in addition to summary
diagnostics information. Add support for retrieving this data via
netlink ethtool. The basic structure in the core is the same as for
normal phy diagnostics, the PHY driver simply uses different helpers
to fill the netlink message with different data.
There is a graphical tool under development, as well as an ethtool(1)
which can dump the data as text and JSON.
A patched ethtool(1) can be found in
https://github.com/lunn/ethtool.git feature/cable-test-v5
Thanks to Chris Healy for lots of testing.
v2:
See the individual patches but:
Pass distances in centimeters, not meters
Allow the PHY to round distances to what it supports and report what
it actually used along with the results.
Make the Marvell PHY use steps a multiple of 0.805 meters, its native
step size.
v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request
Unsquash the last two patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:43 +0000 (00:21 +0200)]
net: phy: marvell: Configure TDR pulse based on measurement length
When performing a TDR measurement for a short distance, the pulse
width should be low, to help differentiate between the outgoing pulse
and any reflection. For longer distances, the pulse should be wider,
to help with attenuation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:42 +0000 (00:21 +0200)]
net: phy: marvell: Speed up TDR data retrieval by only changing page once
Getting the TDR data requires a large number of MDIO bus
transactions. The number can however be reduced if the page is only
changed once. Add the needed locking to allow this, and make use of
unlocked read/write methods where needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:41 +0000 (00:21 +0200)]
net: ethtool: Allow PHY cable test TDR data to be configured
Allow the user to configure where on the cable the TDR data should be
retrieved, in terms of first and last sample, and the step between
samples. Also add the ability to ask for TDR data for just one pair.
If this configuration is not provided, it defaults to 1-150m at 1m
intervals for all pairs.
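
To make the defaults concrete, here is a small user-space sketch of such a request, with distances in centimeters as in the rest of the series. The struct, the SKETCH_PAIR_ALL marker and the validation rules are assumptions for illustration and are not the structure or checks added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAIR_ALL (-1)	/* hypothetical marker for "all pairs" */

/* Hypothetical request layout; distances are in centimeters. */
struct tdr_request_sketch {
	uint32_t first_cm;
	uint32_t last_cm;
	uint32_t step_cm;
	int pair;		/* 0..3, or SKETCH_PAIR_ALL */
};

/* Defaults from the commit message: 1-150 m at 1 m intervals, all pairs. */
static struct tdr_request_sketch tdr_defaults(void)
{
	struct tdr_request_sketch req = {
		.first_cm = 100,
		.last_cm = 15000,
		.step_cm = 100,
		.pair = SKETCH_PAIR_ALL,
	};

	return req;
}

/* Assumed sanity rules: non-zero step, ordered range, known pair. */
static bool tdr_request_valid(const struct tdr_request_sketch *req)
{
	if (req->step_cm == 0)
		return false;
	if (req->first_cm > req->last_cm)
		return false;
	return req->pair >= SKETCH_PAIR_ALL && req->pair <= 3;
}

/* Number of samples per pair; only meaningful for a valid request. */
static uint32_t tdr_sample_count(const struct tdr_request_sketch *req)
{
	return (req->last_cm - req->first_cm) / req->step_cm + 1;
}
```

Under these assumptions the default request yields 150 samples per pair.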
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:40 +0000 (00:21 +0200)]
net: phy: marvell: Add support for amplitude graph
The Marvell PHYs can measure the amplitude of the returned signal for
a given distance. Implement this option of the cable test
infrastructure. When reporting the step, convert the distance into cm.
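
The cover letter gives the PHY's native step as 0.805 meters. A hedged sketch of converting between centimeters and native steps using integer arithmetic is shown below. The function names are hypothetical and truncating (rather than rounding to nearest) is an assumption of this example.

```c
/* Native resolution from the series description: 0.805 m = 80.5 cm. */

/* Whole native steps that fit in the given distance. */
static unsigned int cm_to_steps(unsigned int cm)
{
	return cm * 10 / 805;
}

/* Distance covered by the given number of native steps, in whole cm. */
static unsigned int steps_to_cm(unsigned int steps)
{
	return steps * 805 / 10;
}

/* Distance the PHY would actually use for a requested distance. */
static unsigned int round_to_native_cm(unsigned int cm)
{
	return steps_to_cm(cm_to_steps(cm));
}
```

For example, a requested 1000 cm maps to 12 native steps, which the sketch reports back as 966 cm.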
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
Step based on the measurement resolution, and convert this to cm.
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:39 +0000 (00:21 +0200)]
net: ethtool: Add helpers for cable test TDR data
Add helpers for returning raw TDR data in netlink messages.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:38 +0000 (00:21 +0200)]
net: ethtool: Add generic parts of cable test TDR
Add the generic parts of the code used to trigger a cable test and
return raw TDR data. Any PHY driver which supports this must implement
the new driver op.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2
Update nxp-tja11xx for API change.
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 26 May 2020 22:21:37 +0000 (00:21 +0200)]
net: ethtool: Add attributes for cable test TDR data
Some Ethernet PHYs can return the raw time domain reflectometry data.
Add the attributes to allow this data to be requested and returned via
netlink ethtool.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
m -> cm
Report what the PHY actually used for start/stop/step.
Signed-off-by: David S. Miller <davem@davemloft.net>
Armin Wolf [Tue, 26 May 2020 18:03:02 +0000 (20:03 +0200)]
ne2k-pci: Fix various coding-style issues and improve printk() usage
Fix a ton of minor checkpatch errors/warnings, remove version
printing at module init/when device is found, and use MODULE_VERSION
instead. Also modify the RTL8029 PCI string to include the compatible
RTL8029AS NIC.
The only major issue remaining is the missing SPDX tag, but since the
exact version of the GPL is not stated anywhere inside the file, it is
impossible to add such a tag at the moment.
But maybe it is possible, since 8390.h states Donald Becker's 8390
drivers are licensed under GPL 2.2 only (= GPL-2.0-only ?).
The kernel module containing this patch compiles and runs without
problems on a RTL8029AS-based NE2000 clone card with kernel 5.7.0-rc6.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Sverdlin [Tue, 26 May 2020 12:27:51 +0000 (14:27 +0200)]
macvlan: Skip loopback packets in RX handler
Ignore loopback-originating packets soon enough and don't try to process L2
header where it doesn't exist. The very similar br_handle_frame() in bridge
code performs exactly the same check.
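
The check amounts to an early return on the packet type. A user-space sketch with hypothetical types is given below. The enum values mirror the packet types in <linux/if_packet.h>, where loopback is 5, which matches pkttype=5 in the dump that follows. The names are placeholders rather than the kernel's.

```c
/* Values mirror <linux/if_packet.h>; redefined to avoid kernel headers. */
enum sketch_pkt_type {
	SKETCH_PACKET_HOST = 0,
	SKETCH_PACKET_BROADCAST = 1,
	SKETCH_PACKET_MULTICAST = 2,
	SKETCH_PACKET_OTHERHOST = 3,
	SKETCH_PACKET_OUTGOING = 4,
	SKETCH_PACKET_LOOPBACK = 5,
};

enum sketch_rx_result {
	SKETCH_RX_PASS,		/* leave the packet to the rest of the stack */
	SKETCH_RX_PROCESS,	/* continue with L2 header processing */
};

/* Hypothetical reduced packet descriptor. */
struct sketch_skb {
	enum sketch_pkt_type pkt_type;
};

/* Bail out before touching an L2 header that a looped-back skb lacks. */
static enum sketch_rx_result rx_handler_sketch(const struct sketch_skb *skb)
{
	if (skb->pkt_type == SKETCH_PACKET_LOOPBACK)
		return SKETCH_RX_PASS;
	return SKETCH_RX_PROCESS;
}
```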
This is an example of such ICMPv6 packet:
skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom:
00000000: 00 7c 86 52 84 88 ff ff 00 00 00 00 00 00 08 00
skb headroom:
00000010: 45 00 00 9e 5d 5c 40 00 40 11 33 33 00 00 00 01
skb headroom:
00000020: 02 40 43 80 00 00 86 dd
skb linear:
00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear:
00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear:
00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear:
00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear:
00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear:
00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom:
00000000: ...
skb tailroom:
00000010: ...
skb tailroom:
00000020: ...
skb tailroom:
00000030: ...
Call Trace, how it happens exactly:
...
macvlan_handle_frame+0x321/0x425 [macvlan]
? macvlan_forward_source+0x110/0x110 [macvlan]
__netif_receive_skb_core+0x545/0xda0
? enqueue_task_fair+0xe5/0x8e0
? __netif_receive_skb_one_core+0x36/0x70
__netif_receive_skb_one_core+0x36/0x70
process_backlog+0x97/0x140
net_rx_action+0x1eb/0x350
? __hrtimer_run_queues+0x136/0x2e0
__do_softirq+0xe3/0x383
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.4+0x4e/0x50
netif_rx_ni+0x60/0xd0
dev_loopback_xmit+0x83/0xf0
ip6_finish_output2+0x575/0x590 [ipv6]
? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
? __ip6_make_skb+0x38d/0x680 [ipv6]
? ip6_output+0x6c/0x140 [ipv6]
ip6_output+0x6c/0x140 [ipv6]
ip6_send_skb+0x1e/0x60 [ipv6]
rawv6_sendmsg+0xc4b/0xe10 [ipv6]
? proc_put_long+0xd0/0xd0
? rw_copy_check_uvector+0x4e/0x110
? sock_sendmsg+0x36/0x40
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x2b6/0x2d0
? proc_dointvec+0x23/0x30
? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
? dev_forward_change+0x130/0x130 [ipv6]
? _raw_spin_unlock+0x12/0x30
? proc_sys_call_handler.isra.14+0x9f/0x110
? __call_rcu+0x213/0x510
? get_max_files+0x10/0x10
? trace_hardirqs_on+0x2c/0xe0
? __sys_sendmsg+0x63/0xa0
__sys_sendmsg+0x63/0xa0
do_syscall_64+0x6c/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 03:33:59 +0000 (20:33 -0700)]
Merge branch 'mlxsw-Various-trap-changes-part-2'
Ido Schimmel says:
====================
mlxsw: Various trap changes - part 2
This patch set contains another set of small changes in mlxsw trap
configuration. It is the last set before exposing control traps (e.g.,
IGMP query, ARP request) via devlink-trap.
Tested with existing devlink-trap selftests. Please see individual
patches for a detailed changelog.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:56 +0000 (02:05 +0300)]
mlxsw: spectrum_router: Allow programming link-local prefix routes
The device has a trap for IPv6 packets that need to be routed and have a
unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw
to ignore link-local routes, as the packets will be trapped to the CPU
in any case.
However, since link-local routes are not programmed, it is possible for
routed packets to hit the default route which might also be programmed
to trap packets. This means that packets with a link-local destination
IP might be trapped for the wrong reason.
To overcome this, allow programming link-local prefix routes (usually
one fe80::/64 per-table), so that the packets will be forwarded until
reaching the link-local trap.
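
For reference, the two prefixes mentioned above differ as follows. This is a small user-space sketch with hypothetical helper names. It only illustrates the address matching and says nothing about how the driver programs routes.

```c
#include <stdbool.h>
#include <stdint.h>

/* fe80::/10: the first ten bits are 1111 1110 10. */
static bool ipv6_in_fe80_10(const uint8_t addr[16])
{
	return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}

/* fe80::/64: the first 64 bits are exactly fe80:0000:0000:0000. */
static bool ipv6_in_fe80_64(const uint8_t addr[16])
{
	int i;

	if (addr[0] != 0xfe || addr[1] != 0x80)
		return false;
	for (i = 2; i < 8; i++) {
		if (addr[i] != 0)
			return false;
	}
	return true;
}
```

An address such as febf:: falls inside fe80::/10 but outside fe80::/64.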
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:55 +0000 (02:05 +0300)]
mlxsw: spectrum: Add packet traps for BFD packets
Bidirectional Forwarding Detection (BFD) provides "low-overhead,
short-duration detection of failures in the path between adjacent
forwarding engines" (RFC 5880).
This is accomplished by exchanging BFD packets between the two
forwarding engines. Up until now these packets were trapped via the
general local delivery (i.e., IP2ME) trap which also traps a lot of
other packets that are not as time-sensitive as BFD packets.
Expose dedicated traps for BFD packets so that user space could
configure a dedicated policer for them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:54 +0000 (02:05 +0300)]
mlxsw: spectrum: Treat IPv6 link-local SIP as an exception
IPv6 packets that need to be forwarded and have a link-local source IP are
dropped by the kernel and an ICMPv6 "Destination unreachable" is sent to
the sending host.
As such, change the trap group of such packets so that they do not
interfere with IPv6 management packets. In the future this trap will be
exposed as an exception via devlink-trap.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:53 +0000 (02:05 +0300)]
mlxsw: spectrum: Share one group for all locally delivered packets
Routed IP packets with the Router Alert option need to be trapped to
the CPU as they might need to be locally delivered to raw sockets with
the IP_ROUTER_ALERT / IPV6_ROUTER_ALERT socket option.
Move them to the same group with other packets that might need to be
trapped following route lookup.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:52 +0000 (02:05 +0300)]
mlxsw: reg: Move all trap groups under the same enum
After the previous patch the split is no longer necessary and all the
trap groups can be moved under the same enum.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:51 +0000 (02:05 +0300)]
mlxsw: spectrum_trap: Do not hard code "thin" policer identifier
As explained in commit
e612523041ab ("mlxsw: spectrum_trap: Introduce
dummy group with thin policer"), the purpose of the "thin" policer is to
pass as few packets as possible to the CPU.
The identifier of this policer is currently set according to the maximum
number of used trap groups, but this is fragile: On Spectrum-1 the
maximum number of policers is less than the maximum number of trap
groups, which might result in an invalid policer identifier in case the
number of used trap groups grows beyond the policer limit.
Solve this by dynamically allocating the policer identifier.
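
One common way to allocate such an identifier is to take the first free bit of a usage bitmap, bounded by the device's policer limit. The sketch below assumes that approach. It is a user-space illustration with a hypothetical function name and a 64-bit map, not the driver's implementation.

```c
#include <stdint.h>

/*
 * Claim the lowest free identifier below max_policers and mark it used.
 * Returns the identifier, or -1 when every permitted one is taken.
 * The 64-entry bitmap is an assumption of this sketch.
 */
static int policer_id_alloc(uint64_t *used, unsigned int max_policers)
{
	unsigned int id;

	for (id = 0; id < max_policers && id < 64; id++) {
		uint64_t bit = UINT64_C(1) << id;

		if (!(*used & bit)) {
			*used |= bit;
			return (int)id;
		}
	}
	return -1;
}
```

Because the search is bounded by the policer limit rather than derived from the number of trap groups, the result can never exceed what the device supports.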
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:50 +0000 (02:05 +0300)]
mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum
The number of Spectrum trap groups is not infinite, but two identifiers
are occupied by SwitchX-2 specific trap groups. Free these identifiers
by moving them out of the main enum.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:49 +0000 (02:05 +0300)]
mlxsw: spectrum: Reduce priority of locally delivered packets
To align with recent recommended values. Will be configurable by future
patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:48 +0000 (02:05 +0300)]
mlxsw: spectrum: Use same trap group for local routes and link-local destination
Packets with an IPv6 link-local destination (i.e., fe80::/10) should not
be forwarded and are therefore trapped to the CPU for local delivery.
Since these packets are trapped for the same logical reason as packets
hitting local routes, associate both traps with the same group.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:47 +0000 (02:05 +0300)]
mlxsw: spectrum: Use separate trap group for FID miss
When a packet enters the device it is classified to a filtering
identifier (FID) based on the ingress port and VLAN. The FID miss trap
is used to trap packets for which a FID could not be found.
In mlxsw this trap should only be triggered when a port is enslaved to
an OVS bridge and a matching ACL rule could not be found, so as to
trigger learning.
These packets are therefore completely unrelated to packets hitting
local routes and should be in a different group. Move them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:46 +0000 (02:05 +0300)]
mlxsw: spectrum: Use same trap group for various IPv6 packets
Group these various IPv6 packets (e.g., router solicitations, router
advertisement) together and subject them to the same policer.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:45 +0000 (02:05 +0300)]
mlxsw: spectrum: Rename IPv6 ND trap group
The IPv6 Neighbour Discovery (ND) group will be used for various IPv6
packets, not all of which fall under the definition of ND, so rename it
to "IPV6" which is more appropriate.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:44 +0000 (02:05 +0300)]
mlxsw: spectrum: Use same switch case for identical groups
Trap groups that use the same policer settings can share the same switch
case.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 25 May 2020 23:05:43 +0000 (02:05 +0300)]
mlxsw: spectrum: Use dedicated trap group for ACL trap
Packets that are trapped via tc's trap action are currently subject to
the same policer as packets hitting local routes. The latter are
critical to the correct functioning of the control plane, while the
former are mainly used for traffic inspection.
Split the ACL trap to a separate group with its own policer. Use a
higher priority for these traps than for traps using mirror action
(e.g., ARP, IGMP). Otherwise, packets matching both traps will not be
forwarded in hardware (because of trap action) and also not forwarded in
software because they will be marked with 'offload_fwd_mark'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 25 May 2020 21:41:13 +0000 (23:41 +0200)]
mptcp: attempt coalescing when moving skbs to mptcp rx queue
We can try to coalesce skbs we take from the subflows rx queue with the
tail of the mptcp rx queue.
If successful, the skb head can be discarded early.
We can also free the skb extensions, we do not access them after this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 25 May 2020 19:54:00 +0000 (21:54 +0200)]
r8169: improve rtl_remove_one
Don't call netif_napi_del() manually, free_netdev() does this for us.
In addition reorder calls to match reverse order of calls in probe().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 03:21:43 +0000 (20:21 -0700)]
Merge branch 'net-ethernet-fec-move-GPR-register-offset-and-bit-into-DT'
Fugang Duan says:
====================
net: ethernet: fec: move GPR register offset and bit into DT
The commit
da722186f654 (net: fec: set GPR bit on suspend by
DT configuration) set the GPR register offset and bit in driver
for wol feature support.
It brings trouble to enable wol feature on imx6sx/imx6ul/imx7d
platforms that have multiple ethernet instances with different
GPR bit for stop mode control. So the patch set is to move GPR
register offset and bit define into DT, and enable imx6q/imx6dl
imx6qp/imx6sx/imx6ul/imx7d stop mode support.
Currently, below NXP i.MX boards support wol:
- imx6q/imx6dl/imx6qp sabresd
- imx6sx sabreauto
- imx7d sdb
imx6q/imx6dl/imx6qp sabresd board dts files miss the property
"fsl,magic-packet;", so patch#4 is to add the property for stop
mode support.
v1 -> v2:
- driver: switch back to store the quirks bitmask in driver_data
- dt-bindings: rename 'gpr' property string to 'fsl,stop-mode'
- imx6/7 dtsi: add imx6sx/imx6ul/imx7d ethernet stop mode property
v2 -> v3:
- driver: suggested by Sascha Hauer, use a struct fec_devinfo for
abstracting differences between different hardware variants,
it can give more freedom to describe the differences.
- imx6/7 dtsi: correct one typo pointed out by Andrew.
Thanks Martin, Andrew and Sascha Hauer for the review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Mon, 25 May 2020 16:27:13 +0000 (00:27 +0800)]
ARM: dts: imx6qdl-sabresd: enable fec wake-on-lan
Enable ethernet wake-on-lan feature for imx6q/dl/qp sabresd
boards since the PHY clock is supplied by external osc.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Mon, 25 May 2020 16:27:12 +0000 (00:27 +0800)]
ARM: dts: imx: add ethernet stop mode property
- Update the imx6qdl gpr property to define gpr register
offset and bit in DT.
- Add imx6sx/imx6ul/imx7d ethernet stop mode property.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Mon, 25 May 2020 16:27:11 +0000 (00:27 +0800)]
dt-bindings: fec: update the gpr property
- rename the 'gpr' property string to 'fsl,stop-mode'.
- Update the property to define gpr register offset and
bit in DT, since different instances have different gpr bits.
v2:
* rename 'gpr' property string to 'fsl,stop-mode'.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Mon, 25 May 2020 16:27:10 +0000 (00:27 +0800)]
net: ethernet: fec: move GPR register offset and bit into DT
The commit
da722186f654 (net: fec: set GPR bit on suspend by DT
configuration) set the GPR register offset and bit in driver for
wake on lan feature.
But it introduces two issues here:
- one SOC has two instances, they have different bit
- different SOCs may have different offset and bit
So to support wake-on-lan feature on other i.MX platforms, it should
configure the GPR register offset and bit from DT.
So the patch is to improve the commit
da722186f654 (net: fec: set GPR
bit on suspend by DT configuration) to support multiple ethernet
instances on i.MX series.
v2:
* switch back to store the quirks bitmask in driver_data
v3:
* suggested by Sascha Hauer, use a struct fec_devinfo for
abstracting differences between different hardware variants,
it can give more freedom to describe the differences.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov [Mon, 25 May 2020 15:31:58 +0000 (17:31 +0200)]
net/smc: mark smc_pnet_policy as const
Netlink policies are generally declared as const.
This is safer and prevents potential bugs.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 03:17:35 +0000 (20:17 -0700)]
Merge tag 'mac80211-next-for-net-next-2020-04-25' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
One batch of changes, containing:
* hwsim improvements from Jouni and myself, to be able to
test more scenarios easily
* some more HE (802.11ax) support
* some initial S1G (sub 1 GHz) work for fractional MHz channels
* some (action) frame registration updates to help DPP support
* along with other various improvements/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 26 May 2020 22:33:57 +0000 (15:33 -0700)]
Merge branch 'net-phy-mscc-miim-reduce-waiting-time-between-MDIO-transactions'
Antoine Tenart says:
====================
net: phy: mscc-miim: reduce waiting time between MDIO transactions
This series aims at reducing the waiting time between MDIO transactions
when using the MSCC MIIM MDIO controller.
I'm not sure we need patch 4/4 and we could reasonably drop it from the
series. I'm including the patch as it could help to ensure the system
is functional with a non optimal configuration.
We needed to improve the driver's performance because, when using a PHY
requiring lots of register accesses (such as the VSC85xx family),
delays would add up and end up being quite large, which would cause
issues such as: a slow initialization of the PHY, and issues when using
timestamping operations (this feature will be sent quite soon to the
mailing lists).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 26 May 2020 16:22:56 +0000 (18:22 +0200)]
net: phy: mscc-miim: read poll when high resolution timers are disabled
The driver uses a read polling mechanism to check the status of the MDIO
bus, to know if it is ready to accept next commands. This polling
mechanism uses usleep_range() under the hood between reads which is fine
as long as high resolution timers are enabled. Otherwise the delays will
end up being much longer than expected.
This patch fixes this by using udelay() under the hood when
CONFIG_HIGH_RES_TIMERS isn't enabled. This increases CPU usage.
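
The trade-off can be pictured as a poll loop whose delay primitive is pluggable. In the kernel the two candidates would be a sleeping delay such as usleep_range() and a busy-wait such as udelay(). The user-space sketch below replaces both with a fake delay that only accumulates time, and every name in it is hypothetical.

```c
#include <stdbool.h>

typedef bool (*sketch_ready_fn)(void *ctx);
typedef void (*sketch_delay_fn)(unsigned int usec);

/*
 * Poll until ready() reports true or timeout_us has been waited.
 * Returns 0 on success and -1 on timeout. The delay callback is where a
 * driver would choose between sleeping and busy-waiting.
 */
static int poll_until_ready(sketch_ready_fn ready, void *ctx,
			    sketch_delay_fn delay, unsigned int step_us,
			    unsigned int timeout_us)
{
	unsigned int waited = 0;

	for (;;) {
		if (ready(ctx))
			return 0;
		if (waited >= timeout_us)
			return -1;
		delay(step_us);
		waited += step_us;
	}
}

/* Fake bus that reports busy for a fixed number of status reads. */
struct fake_bus {
	unsigned int busy_reads_left;
};

static bool fake_bus_ready(void *ctx)
{
	struct fake_bus *bus = ctx;

	if (bus->busy_reads_left == 0)
		return true;
	bus->busy_reads_left--;
	return false;
}

/* Fake delay: records requested time instead of waiting. */
static unsigned int total_delay_us;

static void fake_delay(unsigned int usec)
{
	total_delay_us += usec;
}
```

A sleeping delay may overshoot by a large margin when only low resolution timers are available, while a busy-wait keeps the requested step at the cost of CPU time, which is the trade-off the commit message describes.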
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>