Grygorii Strashko [Fri, 1 May 2020 20:50:10 +0000 (23:50 +0300)]
arm64: dts: ti: k3-j721e-mcu: add mcu cpsw cpts node
Add DT node for The TI J721E MCU CPSW CPTS which is part of MCU CPSW NUSS.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 1 May 2020 20:50:09 +0000 (23:50 +0300)]
arm64: dts: ti: k3-am65-main: add main navss cpts node
Add DT node for Main NAVSS CPTS module.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 1 May 2020 20:50:08 +0000 (23:50 +0300)]
arm64: dts: ti: k3-am65-mcu: add cpsw cpts node
Add DT node for the TI AM65x SoC Common Platform Time Sync (CPTS).
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 1 May 2020 20:50:07 +0000 (23:50 +0300)]
net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support
The MCU CPSW Common Platform Time Sync (CPTS) provides possibility to
timestamp TX PTP packets and all RX packets.
This enables corresponding support in TI AM65x/J721E MCU CPSW driver.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 1 May 2020 20:50:06 +0000 (23:50 +0300)]
net: ethernet: ti: introduce am654 common platform time sync driver
The CPTS module is used to facilitate host control of time sync operations.
Main features of CPTS module are:
- selection of multiple external clock sources
- control of time sync events via interrupt or polling
- 64-bit timestamp mode in ns with HW PPM and nudge adjustment.
- hardware timestamp ext. inputs (HWx_TS_PUSH)
- timestamp Generator function outputs (TS_GENFx)
Depending on integration it enables compliance with the IEEE 1588-2008
standard for a precision clock synchronization protocol, Ethernet Enhanced
Scheduled Traffic Operations (CPTS_ESTFn) and PCIe Subsystem Precision Time
Measurement (PTM).
Introduced driver provides Linux PTP hardware clock for each CPTS device
and network packets timestamping where applicable. CPTS PTP hardware clock
supports following operations:
- Set time
- Get time
- Shift the clock by a given offset atomically
- Adjust clock frequency
- Time stamp external events
- Periodic output signals
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 1 May 2020 20:50:05 +0000 (23:50 +0300)]
dt-binding: ti: am65x: document common platform time sync cpts module
Document device tree bindings for TI AM654/J721E SoC The Common Platform
Time Sync (CPTS) module. The CPTS module is used to facilitate host control
of time sync operations. Main features of CPTS module are:
- selection of multiple external clock sources
- 64-bit timestamp mode in ns with ppm and nudge adjustment.
- control of time sync events via interrupt or polling
- hardware timestamp of ext. events (HWx_TS_PUSH)
- periodic generator function outputs (TS_GENFx)
- PPS in combination with timesync router
- Depending on integration it enables compliance with the IEEE 1588-2008
standard for a precision clock synchronization protocol, Ethernet Enhanced
Scheduled Traffic Operations (CPTS_ESTFn) and PCIe Subsystem Precision Time
Measurement (PTM).
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 May 2020 18:58:31 +0000 (11:58 -0700)]
Merge branch 'devlink-kernel-region-snapshot-id-allocation'
Jakub Kicinski says:
====================
devlink: kernel region snapshot id allocation
currently users have to find a free snapshot id to pass
to the kernel when they are requesting a snapshot to be
taken.
This set extends the kernel so it can allocate the id
on its own and send it back to user space in a response.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 May 2020 16:40:42 +0000 (09:40 -0700)]
docs: devlink: clarify the scope of snapshot id
In past discussions Jiri explained snapshot ids are cross-region.
Explain this in the docs.
v3: new patch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 May 2020 16:40:41 +0000 (09:40 -0700)]
devlink: let kernel allocate region snapshot id
Currently users have to choose a free snapshot id before
calling DEVLINK_CMD_REGION_NEW. This is potentially racy
and inconvenient.
Make the DEVLINK_ATTR_REGION_SNAPSHOT_ID optional and try
to allocate id automatically. Send a message back to the
caller with the snapshot info.
Example use:
$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 1
$ id=$(devlink -j region new netdevsim/netdevsim1/dummy | \
jq '.[][][][]')
$ devlink region dump netdevsim/netdevsim1/dummy snapshot $id
[...]
$ devlink region del netdevsim/netdevsim1/dummy snapshot $id
v4:
- inline the notification code
v3:
- send the notification only once snapshot creation completed.
v2:
- don't wrap the line containing extack;
- add a few sentences to the docs.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 May 2020 16:40:40 +0000 (09:40 -0700)]
devlink: factor out building a snapshot notification
We'll need to send snapshot info back on the socket
which requested a snapshot to be created. Factor out
constructing a snapshot description from the broadcast
notification code.
v3: new patch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 May 2020 14:07:41 +0000 (07:07 -0700)]
net_sched: sch_fq: add horizon attribute
QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN,
to efficiently pace UDP packets.
As far as sch_fq is concerned, we need to add safety checks, so
that a buggy application does not fill the qdisc with packets
having delivery time far in the future.
This patch adds a configurable horizon (default: 10 seconds),
and a configurable policy when a packet is beyond the horizon
at enqueue() time:
- either drop the packet (default policy)
- or cap its delivery time to the horizon.
$ tc -s -d qd sh dev eth0
qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit
refill_delay 40.0ms timer_slack 10.000us horizon 10.000s
Sent
1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6)
backlog 0b 0p requeues 6
flows 1191 (inactive 1177 throttled 0)
gc 0 highprio 0 throttled 692 latency 11.480us
pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0
v2: fixed an overflow on 32bit kernels in fq_init(), reported
by kbuild test robot <lkp@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 30 Apr 2020 11:42:22 +0000 (13:42 +0200)]
net: sched: fallback to qdisc noqueue if default qdisc setup fail
Currently if the default qdisc setup/init fails, the device ends up with
qdisc "noop", which causes all TX packets to get dropped.
With the introduction of sysctl net/core/default_qdisc it is possible
to change the default qdisc to be more advanced, which opens for the
possibility that Qdisc_ops->init() can fail.
This patch detect these kind of failures, and choose to fallback to
qdisc "noqueue", which is so simple that its init call will not fail.
This allows the interface to continue functioning.
V2:
As this also captures memory failures, which are transient, the
device is not kept in IFF_NO_QUEUE state. This allows the net_device
to retry to default qdisc assignment.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 May 2020 18:26:55 +0000 (11:26 -0700)]
Merge branch 'net-ipa-I-O-map-SMEM-and-IMEM'
Alex Elder says:
====================
net: ipa: I/O map SMEM and IMEM
This series adds the definition of two memory regions that must be
mapped for IPA to access through an SMMU. It requires the SMMU to
be defined in the IPA node in the SoC's Device Tree file.
There is no change since version 1 to the content of the code in
these patches, *however* this time the first patch is an update to
the binding definition rather than an update to a DTS file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Mon, 4 May 2020 17:58:59 +0000 (12:58 -0500)]
net: ipa: define SMEM memory region for IPA
Arrange to use an item from SMEM memory for IPA. SMEM item number
497 is designated to be used by the IPA. Specify the item ID and
size of the region in platform configuration data. Allocate and get
a pointer to this region from ipa_mem_init(). The memory must be
mapped for access through an SMMU.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Mon, 4 May 2020 17:58:58 +0000 (12:58 -0500)]
net: ipa: define IMEM memory region for IPA
Define a region of IMEM memory available for use by IPA in the
platform configuration data. Initialize it from ipa_mem_init().
The memory must be mapped for access through an SMMU.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Mon, 4 May 2020 17:58:57 +0000 (12:58 -0500)]
net: ipa: redefine struct ipa_mem_data
The ipa_mem_data structure type was never actually used. Instead,
the IPA memory regions were defined using the ipa_mem structure.
Redefine struct ipa_mem_data so it encapsulates the array of IPA-local
memory region descriptors along with the count of entries in that
array. Pass just an ipa_mem structure pointer to ipa_mem_init().
Rename the ipa_mem_data[] array ipa_mem_local_data[] to emphasize
that the memory regions it defines are IPA-local memory.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Mon, 4 May 2020 17:58:56 +0000 (12:58 -0500)]
dt-bindings: net: add IPA iommus property
The IPA accesses "IMEM" and main system memory through an SMMU, so
its DT node requires an iommus property to define range of stream IDs
it uses.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 May 2020 18:19:58 +0000 (11:19 -0700)]
Merge branch 'net-add-helper-eth_hw_addr_crc'
Heiner Kallweit says:
====================
net: add helper eth_hw_addr_crc
Several drivers use the same code as basis for filter hashes. Therefore
let's factor it out to a helper. This way drivers don't have to access
struct netdev_hw_addr internals.
First user is r8169.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 4 May 2020 17:28:21 +0000 (19:28 +0200)]
r8169: use new helper eth_hw_addr_crc
Use new helper eth_hw_addr_crc to simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 4 May 2020 17:27:00 +0000 (19:27 +0200)]
net: add helper eth_hw_addr_crc
Several drivers use the same code as basis for filter hashes. Therefore
let's factor it out to a helper. This way drivers don't have to access
struct netdev_hw_addr internals.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Mon, 4 May 2020 16:52:28 +0000 (18:52 +0200)]
net: dsa: felix: allow the device to be disabled
If there is no specific configuration of the felix switch in the device
tree, but only the default configuration (ie. given by the SoCs dtsi
file), the probe fails because no CPU port has been set. On the other
hand you cannot set a default CPU port because that depends on the
actual board using the switch.
[ 2.701300] DSA: tree 0 has no CPU port
[ 2.705167] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22
[ 2.711844] mscc_felix: probe of 0000:00:00.5 failed with error -22
Thus let the device tree disable this device entirely, like it is also
done with the enetc driver of the same SoC.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 May 2020 17:54:40 +0000 (10:54 -0700)]
Merge branch 'net-smc-add-failover-processing'
Karsten Graul says:
====================
net/smc: add failover processing
This patch series adds the actual SMC-R link failover processing and
improved link group termination. There will be one more (very small)
series after this which will complete the SMC-R link failover support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:48 +0000 (14:18 +0200)]
net/smc: save SMC-R peer link_uid
During SMC-R link establishment the peers exchange the link_uid that
is used for debugging purposes. Save the peer link_uid in smc_link so it
can be retrieved by the smc_diag netlink interface.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:47 +0000 (14:18 +0200)]
net/smc: create improved SMC-R link_uid
The link_uid of an SMC-R link is exchanged between SMC peers and its
value can be used for debugging purposes. Create a unique link_uid
during link initialization and use it in communication with SMC-R peers.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:46 +0000 (14:18 +0200)]
net/smc: improve termination processing
Add helper smcr_lgr_link_deactivate_all() and eliminate duplicate code.
In smc_lgr_free(), clear the smc-r links before smc_lgr_free_bufs() is
called so buffers are already prepared for free. The usage of the soft
parameter in __smc_lgr_terminate() is no longer needed, smc_lgr_free()
can be called directly. smc_lgr_terminate_sched() and
smc_smcd_terminate() set lgr->freeing to indicate that the link group
will be freed soon to avoid unnecessary schedules of the free worker.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:45 +0000 (14:18 +0200)]
net/smc: add termination reason and handle LLC protocol violation
Allow to set the reason code for the link group termination, and set
meaningful values before termination processing is triggered. This
reason code is sent to the peer in the final delete link message.
When the LLC request or response layer receives a message type that was
not handled, drop a warning and terminate the link group.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:44 +0000 (14:18 +0200)]
net/smc: asymmetric link tagging
New connections must not be assigned to asymmetric links. Add asymmetric
link tagging using new link variable link_is_asym. The new helpers
smcr_lgr_set_type() and smcr_lgr_set_type_asym() are called to set the
state of the link group, and tag all links accordingly.
smcr_lgr_conn_assign_link() respects the link tagging and will not
assign new connections to links tagged as asymmetric link.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:43 +0000 (14:18 +0200)]
net/smc: assign link to a new connection
For new connections, assign a link from the link group, using some
simple load balancing.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:42 +0000 (14:18 +0200)]
net/smc: send DELETE_LINK, ALL message and wait for send to complete
Add smc_llc_send_message_wait() which uses smc_wr_tx_send_wait() to send
an LLC message and waits for the message send to complete.
smc_llc_send_link_delete_all() calls the new function to send an
DELETE_LINK,ALL LLC message. The RFC states that the sender of this type
of message needs to wait for the completion event of the message
transmission and can terminate the link afterwards.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:41 +0000 (14:18 +0200)]
net/smc: wait for departure of an IB message
Introduce smc_wr_tx_send_wait() to send an IB message and wait for the
tx completion event of the message. This makes sure that the message is
no longer in-flight when the function returns.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:40 +0000 (14:18 +0200)]
net/smc: handle incoming CDC validation message
Call smc_cdc_msg_validate() when a CDC message with the failover
validation bit enabled was received. Validate that the sequence number
sent with the message is one we already have received. If not, messages
were lost and the connection is terminated using a new abort_work.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:39 +0000 (14:18 +0200)]
net/smc: send failover validation message
When a connection is switched to a new link then a link validation
message must be sent to the peer over the new link, containing the
sequence number of the last CDC message that was sent over the old link.
The peer will validate if this sequence number is the same or lower then
the number he received, and abort the connection if messages were lost.
Add smcr_cdc_msg_send_validation() to send the message validation
message and call it when a connection was switched in
smc_switch_cursor().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:38 +0000 (14:18 +0200)]
net/smc: switch connections to alternate link
Add smc_switch_conns() to switch all connections from a link that is
going down. Find an other link to switch the connections to, and
switch each connection to the new link. smc_switch_cursor() updates the
cursors of a connection to the state of the last successfully sent CDC
message. When there is no link to switch to, terminate the link group.
Call smc_switch_conns() when a link is going down.
And with the possibility that links of connections can switch adapt CDC
and TX functions to detect and handle link switches.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 4 May 2020 12:18:37 +0000 (14:18 +0200)]
net/smc: save state of last sent CDC message
When a link goes down and all connections of this link need to be
switched to an other link then the producer cursor and the sequence of
the last successfully sent CDC message must be known. Add the two fields
to the SMC connection and update it in the tx completion handler.
And to allow matching of sequences in error cases reset the seqno to the
old value in smc_cdc_msg_send() when the actual send failed.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 May 2020 17:44:11 +0000 (10:44 -0700)]
Merge branch 'bnxt_en-Updates-for-net-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next.
This patchset includes these main changes:
1. Firmware spec. update.
2. Context memory sizing improvements for the hardware TQM block.
3. ethtool chip reset improvements and fixes for correctness.
4. Improve L2 doorbell mapping by mapping only up to the size specified
by firmware. This allows the RoCE driver to map the remaining doorbell
space for its purpose, such as write-combining.
5. Improve ethtool -S channel statistics by showing only relevant ring
counters for non-combined channels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Ravi [Mon, 4 May 2020 08:50:41 +0000 (04:50 -0400)]
bnxt_en: show only relevant ethtool stats for a TX or RX ring
Currently, ethtool -S shows all TX/RX ring counters whether the
channel is combined, RX, or TX. The unused counters will always be
zero. Improve it by showing only the relevant counters if the channel
is RX or TX. If the channel is combined, the counters will be shown
exactly the same as before.
[ MChan: Lots of cleanups and simplifications on Rajesh's original
code]
Signed-off-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:40 +0000 (04:50 -0400)]
bnxt_en: Split HW ring statistics strings into RX and TX parts.
This will allow the RX and TX ring statistics to be separated if needed.
In the next patch, we'll be able to only display RX or TX statistcis if
the channel is RX only or TX only.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:39 +0000 (04:50 -0400)]
bnxt_en: Refactor the software ring counters.
We currently have 3 software ring counters, rx_l4_csum_errors,
rx_buf_errors, and missed_irqs. The 1st two are RX counters and the
last one is a common counter. Organize them into 2 structures
bnxt_rx_sw_stats and bnxt_cmn_sw_stats.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:38 +0000 (04:50 -0400)]
bnxt_en: Add doorbell information to bnxt_en_dev struct.
The purpose of this is to inform the RDMA driver the size of the doorbell
BAR that the L2 driver has mapped and the portion that is mapped
uncacheable. The unchaeable portion is shared with the RoCE driver.
Any remaining unmapped doorbell BAR can be used by the RDMA driver for
its own purpose. Currently, the entire L2 portion is mapped uncacheable.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:37 +0000 (04:50 -0400)]
bnxt_en: Add support for L2 doorbell size.
Read the L2 doorbell size from the firmware and only map the portion
of the doorbell BAR for L2 use. This will leave the remaining doorbell
BAR available for the RoCE driver to use. The RoCE driver can map
the remaining portion as write-combining to support the push feature.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:36 +0000 (04:50 -0400)]
bnxt_en: Set the db_offset on 57500 chips for the RDMA MSIX entries.
The driver provides completion ring or NQ doorbell offset for each
MSIX entry requested by the RDMA driver. The NQ offset on 57500
chips is different than legacy chips. Set it correctly based on
chip type for correctness. The RDMA driver is ignoring this field
for the 57500 chips so it is not causing any problem.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:35 +0000 (04:50 -0400)]
bnxt_en: Define the doorbell offsets on 57500 chips.
Define the 57500 chip doorbell offsets instead of using the magic
values in the C file.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Mon, 4 May 2020 08:50:34 +0000 (04:50 -0400)]
bnxt_en: Improve kernel log messages related to ethtool reset.
Kernel log messages for failed AP reset commands should be suppressed.
These are expected to fail on devices that do not have an AP. Add
missing driver reload message after AP reset and log it in a common
way without duplication.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Mon, 4 May 2020 08:50:33 +0000 (04:50 -0400)]
bnxt_en: fix ethtool_reset_flags ABI violations
The ethtool ABI specifies that the reset operation should only clear
the flags that were actually reset. Setting the flags to zero after
a chip reset violates this because it does not include resetting the
application processor complex. Similarly, components that are not yet
defined are also not necessarily being reset.
The fact that chip reset does not cover the AP also means that it is
inappropriate to treat these two components exclusively of one another.
The ABI provides a mechanism to report a failure to reset independent
components via the returned bitmask, so it is also wrong to fail hard
if one of a set of independent resets is not possible.
It is incorrect to rely on the passed by reference flags in bnxt_reset(),
which are being updated as components are reset. The initially requested
value should be used instead so that hard errors do not propagate if any
earlier components could have been reset successfully.
Note, AP and chip resets are global in nature. Dedicated resets are
thus not currently supported.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Mon, 4 May 2020 08:50:32 +0000 (04:50 -0400)]
bnxt_en: refactor ethtool firmware reset types
The case statement in bnxt_firmware_reset() dangerously mixes types.
This patch separates the application processor and whole chip resets
from the rest such that the selection is performed on a pure type.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edwin Peer [Mon, 4 May 2020 08:50:31 +0000 (04:50 -0400)]
bnxt_en: prepare to refactor ethtool reset types
Extract bnxt_hwrm_firmware_reset() for performing firmware reset
operations. This new helper function will be used in a subsequent
patch to separate unrelated reset types out of bnxt_firmware_reset().
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 4 May 2020 08:50:30 +0000 (04:50 -0400)]
bnxt_en: Do not include ETH_FCS_LEN in the max packet length sent to fw.
The firmware does not expect the CRC to be included in the length
passed from the driver. The firmware always configures the chip
to strip out the CRC.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:29 +0000 (04:50 -0400)]
bnxt_en: Improve TQM ring context memory sizing formulas.
The current formulas to calculate the TQM slow path and fast path ring
context memory sizes are not quite correct. TQM slow path entry is
array index 0 of ctx->tqm_mem[]. The other array entries are for fast
path. Fix these sizes according to latest firmware spec. for 57500 and
newer chips.
Fixes: 3be8136ce14e ("bnxt_en: Initialize context memory to the value specified by firmware.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:28 +0000 (04:50 -0400)]
bnxt_en: Allocate TQM ring context memory according to fw specification.
Newer firmware spec. will specify the number of TQM rings to allocate
context memory for. Use the firmware specified value and fall back
to the old value derived from bp->max_q if it is not available.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 4 May 2020 08:50:27 +0000 (04:50 -0400)]
bnxt_en: Update firmware spec. to 1.10.1.33.
Changes include additional statistics, ECN support, context memory
interface change for better TQM context memory sizing, firmware
health status definitions, etc.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 May 2020 23:08:29 +0000 (16:08 -0700)]
Merge branch 'net-smc-add-and-delete-link-processing'
Karsten Graul says:
====================
net/smc: add and delete link processing
These patches add the 'add link' and 'delete link' processing as
SMC server and client. This processing allows to establish and
remove links of a link group dynamically.
v2: Fix mess up with unused static functions. Merge patch 8 into patch 4.
Postpone patch 13 to next series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:50 +0000 (14:38 +0200)]
net/smc: enqueue local LLC messages
As SMC server, when a second link was deleted, trigger the setup of an
asymmetric link. Do this by enqueueing a local ADD_LINK message which
is processed by the LLC layer as if it were received from peer. Do the
same when a new IB port became active and a new link could be created.
smc_llc_srv_add_link_local() enqueues a local ADD_LINK message.
And smc_llc_srv_delete_link_local() is used the same way to enqueue a
local DELETE_LINK message. This is used when an IB port is no longer
active.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:49 +0000 (14:38 +0200)]
net/smc: delete link processing as SMC server
Add smc_llc_process_srv_delete_link() to process a DELETE_LINK request
as SMC server. When the request is to delete ALL links then terminate
the whole link group. If not, find the link to delete by its link_id,
send the DELETE_LINK request LLC message and wait for the response.
No matter if a response was received, clear the deleted link and update
the link group state.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:48 +0000 (14:38 +0200)]
net/smc: delete link processing as SMC client
Add smc_llc_process_cli_delete_link() to process a DELETE_LINK request
as SMC client. When the request is to delete ALL links then terminate
the whole link group. If not, find the link to delete by its link_id,
send the DELETE_LINK response LLC message and then clear the deleted
link. Finally determine and update the link group state.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:47 +0000 (14:38 +0200)]
net/smc: llc_del_link_work and use the LLC flow for delete link
Introduce a work that is scheduled when a new DELETE_LINK LLC request is
received. The work will call either the SMC client or SMC server
DELETE_LINK processing.
And use the LLC flow framework to process incoming DELETE_LINK LLC
messages, scheduling the llc_del_link_work for those events.
With these changes smc_lgr_forget() is only called by one function and
can be migrated into smc_lgr_cleanup_early().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:46 +0000 (14:38 +0200)]
net/smc: delete an asymmetric link as SMC server
When a link group moved from asymmetric to symmetric state then the
dangling asymmetric link can be deleted. Add smc_llc_find_asym_link() to
find the respective link and add smc_llc_delete_asym_link() to delete
it.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:45 +0000 (14:38 +0200)]
net/smc: final part of add link processing as SMC server
This patch finalizes the ADD_LINK processing of new links. Send the
CONFIRM_LINK request to the peer, receive the response and set link
state to ACTIVE.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:44 +0000 (14:38 +0200)]
net/smc: rkey processing for a new link as SMC server
Part of SMC server new link establishment is the exchange of rkeys for
used buffers.
Loop over all used RMB buffers and send ADD_LINK_CONTINUE LLC messages
to the peer.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:43 +0000 (14:38 +0200)]
net/smc: first part of add link processing as SMC server
First set of functions to process an ADD_LINK LLC request as an SMC
server. Find an alternate IB device, determine the new link group type
and get the index for the new link. Then initialize the link and send
the ADD_LINK LLC message to the peer. Save the contents of the response,
ready the link, map all used buffers and register the buffers with the
IB device. If any error occurs, stop the processing and clear the link.
And call smc_llc_srv_add_link() in af_smc.c to start second link
establishment after the initial link of a link group was created.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:42 +0000 (14:38 +0200)]
net/smc: final part of add link processing as SMC client
This patch finalizes the ADD_LINK processing of new links. Receive the
CONFIRM_LINK request from peer, complete the link initialization,
register all used buffers with the IB device and finally send the
CONFIRM_LINK response, which completes the ADD_LINK processing.
And activate smc_llc_cli_add_link() in af_smc.c.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:41 +0000 (14:38 +0200)]
net/smc: rkey processing for a new link as SMC client
Part of the SMC client new link establishment process is the exchange of
rkeys for all used buffers.
Add new LLC message type ADD_LINK_CONTINUE which is used to exchange
rkeys of all current RMB buffers. Add functions to iterate over all
used RMB buffers of the link group, and implement the ADD_LINK_CONTINUE
processing.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Sun, 3 May 2020 12:38:40 +0000 (14:38 +0200)]
net/smc: first part of add link processing as SMC client
First set of functions to process an ADD_LINK LLC request as an SMC
client. Find an alternate IB device, determine the new link group type
and get the index for the new link. Then ready the link, map the buffers
and send an ADD_LINK LLC response. If any error occurs, send a reject
LLC message and terminate the processing.
Add smc_llc_alloc_alt_link() to find a free link index for a new link,
depending on the new link group type.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 May 2020 22:59:30 +0000 (15:59 -0700)]
Merge branch 'Enhance-current-features-in-ena-driver'
Sameeh Jubran says:
====================
Enhance current features in ena driver
Difference from v2:
* dropped patch "net: ena: move llq configuration from ena_probe to ena_device_init()"
* reworked patch ""net: ena: implement ena_com_get_admin_polling_mode() to drop the prototype
Difference from v1:
* reodered paches #01 and #02.
* dropped adding Rx/Tx drops to ethtool in patch #08
V1:
This patchset introduces the following:
* minor changes to RSS feature
* add total rx and tx drop counter
* add unmask_interrupt counter for ethtool statistics
* add missing implementation for ena_com_get_admin_polling_mode()
* some minor code clean-up and cosmetics
* use SHUTDOWN as reset reason when closing interface
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Sun, 3 May 2020 09:52:21 +0000 (09:52 +0000)]
net: ena: cosmetic: extract code to ena_indirection_table_set()
Extract code to ena_indirection_table_set() to make
the code cleaner.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:20 +0000 (09:52 +0000)]
net: ena: cosmetic: remove unnecessary spaces and tabs in ena_com.h macros
The macros in ena_com.h have inconsistent spaces between
the macro name and it's value.
This commit sets all the macros to have a single space between
the name and value.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:19 +0000 (09:52 +0000)]
net: ena: use SHUTDOWN as reset reason when closing interface
The 'ENA_REGS_RESET_SHUTDOWN' enum indicates a normal driver
shutdown / removal procedure.
Also, a comment is added to one of the reset reason assignments for
code clarity.
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Sun, 3 May 2020 09:52:18 +0000 (09:52 +0000)]
net: ena: drop superfluous prototype
Before this commit there was a function prototype named
ena_com_get_ena_admin_polling_mode() that was never implemented.
This patch simply deletes it.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:17 +0000 (09:52 +0000)]
net: ena: add support for reporting of packet drops
1. Add support for getting tx drops from the device and saving them
in the driver.
2. Report tx via netdev stats.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:16 +0000 (09:52 +0000)]
net: ena: add unmask interrupts statistics to ethtool
Add unmask interrupts statistics to ethtool.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:15 +0000 (09:52 +0000)]
net: ena: remove code that does nothing
Both key and func parameters are pointers on the stack.
Setting them to NULL does nothing.
The original intent was to leave the key and func unset in this case,
but for this to happen nothing needs to be done as the calling
function ethtool_get_rxfh() already clears key and func.
This commit removes the above described useless code.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:14 +0000 (09:52 +0000)]
net: ena: changes to RSS hash key allocation
This commit contains 2 cosmetic changes:
1. Use ena_com_check_supported_feature_id() in
ena_com_hash_key_fill_default_key() instead of rewriting
its implementation. This also saves us a superfluous admin
command by using the cached value.
2. Change if conditions in ena_com_rss_init() to be clearer.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Sun, 3 May 2020 09:52:13 +0000 (09:52 +0000)]
net: ena: change default RSS hash function to Toeplitz
Currently in the driver we are setting the hash function to be CRC32.
Starting with this commit we want to change the default behaviour so that
we set the hash function to be Toeplitz instead.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Sun, 3 May 2020 09:52:12 +0000 (09:52 +0000)]
net: ena: allow setting the hash function without changing the key
Current code does not allow setting the hash function without
changing the key. This commit enables it.
To achieve this we separate ena_com_get_hash_function() to 2 functions:
ena_com_get_hash_function() - which gets only the hash function, and
ena_com_get_hash_key() - which gets only the hash key.
Also return 0 instead of rc at the end of ena_get_rxfh() since all
previous operations succeeded.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Sun, 3 May 2020 09:52:11 +0000 (09:52 +0000)]
net: ena: fix error returning in ena_com_get_hash_function()
In case the "func" parameter is NULL we now return "-EINVAL".
This shouldn't happen in general, but when it does happen, this is the
proper way to handle it.
We also check func for NULL in the beginning of the function, as there
is no reason to do all the work and realize in the end of the function
it was useless.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Sun, 3 May 2020 09:52:10 +0000 (09:52 +0000)]
net: ena: avoid unnecessary admin command when RSS function set fails
Currently when ena_set_hash_function() fails the hash function is
restored to the previous value by calling an admin command to get
the hash function from the device.
In this commit we avoid the admin command, by saving the previous
hash function before calling ena_set_hash_function() and using this
previous value to restore the hash function in case of failure of
ena_set_hash_function().
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 May 2020 22:50:53 +0000 (15:50 -0700)]
Merge branch 'sch_fq-optimizations'
Eric Dumazet says:
====================
net_sched: sch_fq: round of optimizations
This series is focused on better layout of struct fq_flow to
reduce number of cache line misses in fq_enqueue() and dequeue operations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 02:54:22 +0000 (19:54 -0700)]
net_sched: sch_fq: perform a prefetch() earlier
The prefetch() done in fq_dequeue() can be done a bit earlier
after the refactoring of the code done in the prior patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 02:54:21 +0000 (19:54 -0700)]
net_sched: sch_fq: do not call fq_peek() twice per packet
This refactors the code to not call fq_peek() from fq_dequeue_head()
since the caller can provide the skb.
Also rename fq_dequeue_head() to fq_dequeue_skb() because 'head' is
a bit vague, given the skb could come from t_root rb-tree.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 02:54:20 +0000 (19:54 -0700)]
net_sched: sch_fq: use bulk freeing in fq_gc()
fq_gc() already builds a small array of pointers, so using
kmem_cache_free_bulk() needs very little change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 02:54:19 +0000 (19:54 -0700)]
net_sched: sch_fq: change fq_flow size/layout
sizeof(struct fq_flow) is 112 bytes on 64bit arches.
This means that half of them use two cache lines, but 50% use
three cache lines.
This patch adds cache line alignment, and makes sure that only
the first cache line is touched by fq_enqueue(), which is more
expensive that fq_dequeue() in general.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 02:54:18 +0000 (19:54 -0700)]
net_sched: sch_fq: avoid touching f->next from fq_gc()
A significant amount of cpu cycles is spent in fq_gc()
When fq_gc() does its lookup in the rb-tree, it needs the
following fields from struct fq_flow :
f->sk (lookup key in the rb-tree)
f->fq_node (anchor in the rb-tree)
f->next (used to determine if the flow is detached)
f->age (used to determine if the flow is candidate for gc)
This unfortunately spans two cache lines (assuming 64 bytes cache lines)
We can avoid using f->next, if we use the low order bit of f->{age|tail}
This low order bit is 0, if f->tail points to an sk_buff.
We set the low order bit to 1, if the union contains a jiffies value.
Combined with the following patch, this makes sure we only need
to bring into cpu caches one cache line per flow.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Yakunin [Sat, 2 May 2020 15:34:42 +0000 (18:34 +0300)]
inet_diag: bc: read cgroup id only for full sockets
Fix bug introduced by commit
b1f3e43dbfac ("inet_diag: add support for
cgroup filter").
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: syzbot+ee80f840d9bf6893223b@syzkaller.appspotmail.com
Reported-by: syzbot+13bef047dbfffa5cd1af@syzkaller.appspotmail.com
Fixes: b1f3e43dbfac ("inet_diag: add support for cgroup filter")
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 2 May 2020 15:25:04 +0000 (17:25 +0200)]
net: ethernet: fec: Replace interrupt driven MDIO with polled IO
Measurements of the MDIO bus have shown that driving the MDIO bus
using interrupts is slow. Back to back MDIO transactions take about
90us, with 25us spent performing the transaction, and the remainder of
the time the bus is idle.
Replacing the completion interrupt with polled IO results in back to
back transactions of 40us. The polling loop waiting for the hardware
to complete the transaction takes around 28us. Which suggests
interrupt handling has an overhead of 50us, and polled IO nearly
halves this overhead, and doubles the MDIO performance.
Care has to be taken when setting the MII_SPEED register, or it can
trigger an MII event> That then upsets the polling, due to an
unexpected pending event.
Suggested-by: Chris Heally <cphealy@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 2 May 2020 23:41:06 +0000 (16:41 -0700)]
smc: Remove unused function.
net/smc/smc_llc.c:544:12: warning: ‘smc_llc_alloc_alt_link’ defined but not used [-Wunused-function]
static int smc_llc_alloc_alt_link(struct smc_link_group *lgr,
^~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 2 May 2020 23:31:45 +0000 (16:31 -0700)]
Merge branch 'ptp-Add-adjust-phase-to-support-phase-offset'
Vincent Cheng says:
====================
ptp: Add adjust phase to support phase offset.
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Cheng [Sat, 2 May 2020 03:35:38 +0000 (23:35 -0400)]
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
Add idtcm_adjphase() to support PHC write phase mode.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Cheng [Sat, 2 May 2020 03:35:37 +0000 (23:35 -0400)]
ptp: Add adjust_phase to ptp_clock_caps capability.
Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Cheng [Sat, 2 May 2020 03:35:36 +0000 (23:35 -0400)]
ptp: Add adjphase function to support phase offset control.
Adds adjust phase function to take advantage of a PHC
clock's hardware filtering capability that uses phase offset
control word instead of frequency offset control word.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 2 May 2020 00:02:27 +0000 (17:02 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-01 (v2)
The following pull-request contains BPF updates for your *net-next* tree.
We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).
The main changes are:
1) pulled work.sysctl from vfs tree with sysctl bpf changes.
2) bpf_link observability, from Andrii.
3) BTF-defined map in map, from Andrii.
4) asan fixes for selftests, from Andrii.
5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.
6) production cloudflare classifier as a selftes, from Lorenz.
7) bpf_ktime_get_*_ns() helper improvements, from Maciej.
8) unprivileged bpftool feature probe, from Quentin.
9) BPF_ENABLE_STATS command, from Song.
10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.
11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 May 2020 23:20:05 +0000 (16:20 -0700)]
Merge branch 'net-smc-extent-buffer-mapping-and-port-handling'
Karsten Graul says:
====================
net/smc: extent buffer mapping and port handling
Add functionality to map/unmap and register/unregister memory buffers for
specific SMC-R links and for the whole link group. Prepare LLC layer messages
for the support of multiple links and extent the processing of adapter events.
And add further small preparations needed for the SMC-R failover support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislav Fomichev [Fri, 1 May 2020 22:43:20 +0000 (15:43 -0700)]
selftests/bpf: Use reno instead of dctcp
Andrey pointed out that we can use reno instead of dctcp for CC
tests and drop CONFIG_TCP_CONG_DCTCP=y requirement.
Fixes: beecf11bc218 ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr")
Suggested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
Karsten Graul [Fri, 1 May 2020 10:48:13 +0000 (12:48 +0200)]
net/smc: llc_add_link_work to handle ADD_LINK LLC requests
Introduce a work that is scheduled when a new ADD_LINK LLC request is
received. The work will call either the SMC client or SMC server
ADD_LINK processing.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:12 +0000 (12:48 +0200)]
net/smc: allocate index for a new link
Add smc_llc_alloc_alt_link() to find a free link index for a new link,
depending on the new link group type. And update constants for the
maximum number of links to 3 (2 symmetric and 1 dangling asymmetric link).
These maximum numbers are the same as used by other implementations of the
SMC-R protocol.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:11 +0000 (12:48 +0200)]
net/smc: introduce smc_pnet_find_alt_roce()
Introduce a new function in smc_pnet.c that searches for an alternate
IB device, using an existing link group and a primary IB device. The
alternate IB device needs to be active and must have the same PNETID
as the link group.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:10 +0000 (12:48 +0200)]
net/smc: remove DELETE LINK processing from smc_core.c
Support for multiple links makes the former DELETE LINK processing
obsolete which sent one DELETE_LINK LLC message for each single link.
Remove this processing from smc_core.c.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:09 +0000 (12:48 +0200)]
net/smc: take link down instead of terminating the link group
Use the introduced link down processing in all places where the link
group is terminated and take down the affected link only.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:08 +0000 (12:48 +0200)]
net/smc: add smcr_port_err() and smcr_link_down() processing
Call smcr_port_err() when an IB event reports an inactive IB device.
smcr_port_err() calls smcr_link_down() for all affected links.
smcr_link_down() either triggers the local DELETE_LINK processing, or
sends an DELETE_LINK LLC message to the SMC server to initiate the
processing.
The old handler function smc_port_terminate() is removed.
Add helper smcr_link_down_cond() to take a link down conditionally, and
smcr_link_down_cond_sched() to schedule the link_down processing to a
work.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:07 +0000 (12:48 +0200)]
net/smc: add smcr_port_add() and smcr_link_up() processing
Call smcr_port_add() when an IB event reports a new active IB device.
smcr_port_add() will start a work which either triggers the local
ADD_LINK processing, or send an ADD_LINK LLC message to the SMC server
to initiate the processing.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:06 +0000 (12:48 +0200)]
net/smc: remember PNETID of IB device for later device matching
The PNETID is needed to find an alternate link for a link group.
Save the PNETID of the link that is used to create the link group for
later device matching.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Fri, 1 May 2020 10:48:05 +0000 (12:48 +0200)]
net/smc: mutex to protect the lgr against parallel reconfigurations
Introduce llc_conf_mutex in the link group which is used to protect the
buffers and lgr states against parallel link reconfiguration.
This ensures that new connections do not start to register buffers with
the links of a link group when link creation or termination is running.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>