review.tizen.org Git - platform/kernel/linux-starfive.git/log

net: ethernet: ti: cpts: add irq support

Add CPTS IRQ support, but do not enable it. By default, the CPTS driver
will continue working using polling mode which is required for CPTS to
continue working on platforms other than CPSW, like Keystone 2.

The CPTS IRQ support is required to enable support for HW_TS_PUSH events.
The CPSW CPTS IRQ and HW_TS_PUSH events support will be enabled in follow
up patches.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: rework locking

Currently a spinlock is used to synchronize everything, which is not
required. Add a mutex and use it to sync access to the PTP interface and
PTP worker, and use the spinlock only to sync FIFO/events processing.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: move tx timestamp processing to ptp worker only

Now the tx timestamp processing happens from different contexts - softirq
and thread/PTP worker. Enabling IRQ will add one more hard_irq context.
This makes the overall deferred TX timestamp processing and locking
overcomplicated. Move tx timestamp processing to PTP worker always instead.

napi_rx->cpts_tx_timestamp
  if ptp_packet then
    push to txq
    ptp_schedule_worker()

do_aux_work->cpts_overflow_check
  cpts_process_events()
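
The flow above can be modelled roughly as follows. This is a Python sketch,
not the driver code: the CptsModel class, its fields and the integer match
keys are hypothetical names used only for illustration.

```python
from collections import deque


class CptsModel:
    """Toy model: the TX path only queues packets; one worker does the matching."""

    def __init__(self):
        self.txq = deque()            # PTP packets waiting for a HW timestamp
        self.events = {}              # hypothetical: match key -> HW timestamp
        self.worker_scheduled = False
        self.delivered = []           # (key, timestamp) pairs handed back

    def tx_timestamp(self, key, is_ptp):
        # TX completion path: no event matching happens in this context.
        if not is_ptp:
            return
        self.txq.append(key)
        self.worker_scheduled = True  # stands in for ptp_schedule_worker()

    def hw_event(self, key, timestamp):
        # A timestamp event arrives from the hardware FIFO.
        self.events[key] = timestamp

    def do_aux_work(self):
        # Single worker context: match every queued packet against known events.
        self.worker_scheduled = False
        pending = deque()
        while self.txq:
            key = self.txq.popleft()
            if key in self.events:
                self.delivered.append((key, self.events.pop(key)))
            else:
                pending.append(key)   # keep it for the next worker run
        self.txq = pending


# Demo: two PTP packets and one non-PTP packet, one HW event so far.
model = CptsModel()
model.tx_timestamp(1, True)
model.tx_timestamp(2, False)
model.tx_timestamp(3, True)
queued_before = list(model.txq)
model.hw_event(1, 1000)
model.do_aux_work()
```

Because matching happens in one context only, the queue and the event list
need no coordination between softirq, hard IRQ and thread contexts.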

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: optimize packet to event matching

Now the CPTS driver performs packet (skb) parsing every time when it needs
to match packet to CPTS event (including ptp_classify_raw() calls).

This patch optimizes the matching process by parsing the packet only once
upon arrival and storing PTP specific data in skb->cb using the same format
as in the CPTS HW event. As a result, all future matching reduces to
comparing two u32 values.
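
A minimal sketch of the idea in Python, for illustration only. The bit
layout below (sequence id in bits 0-15, message type in bits 16-19, event
type in bits 20-23) and the helper names are assumptions made for the
example, not taken from this log.

```python
# Assumed field layout of the 32-bit match key (illustrative only).
SEQUENCE_ID_SHIFT, SEQUENCE_ID_MASK = 0, 0xFFFF
MESSAGE_TYPE_SHIFT, MESSAGE_TYPE_MASK = 16, 0xF
EVENT_TYPE_SHIFT, EVENT_TYPE_MASK = 20, 0xF


def make_match_key(event_type, message_type, sequence_id):
    """Pack the PTP fields once, at packet arrival, into one 32-bit value."""
    return (
        ((event_type & EVENT_TYPE_MASK) << EVENT_TYPE_SHIFT)
        | ((message_type & MESSAGE_TYPE_MASK) << MESSAGE_TYPE_SHIFT)
        | ((sequence_id & SEQUENCE_ID_MASK) << SEQUENCE_ID_SHIFT)
    )


def find_event(events, skb_key):
    """Matching is now a plain integer comparison per event."""
    for key, timestamp in events:
        if key == skb_key:
            return timestamp
    return None


# Demo: the key is computed once and reused for every lookup.
skb_key = make_match_key(event_type=4, message_type=0, sequence_id=0x1234)
events = [(make_match_key(5, 1, 1), 50), (skb_key, 99)]
matched = find_event(events, skb_key)
```

The packet is never re-parsed after arrival; only the stored key is
compared against each event.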

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: switch to use new .gettimex64() interface

The CPTS HW latches and saves the CPTS counter value in the CPTS FIFO
immediately after a write to CPSW_CPTS_PUSH.TS_PUSH (bit 0), so the total
time that the driver needs to read the CPTS timestamp is the time required
for the CPSW_CPTS_PUSH write to actually reach the HW.

Hence switch CPTS driver to implement new .gettimex64() callback for more
precise measurement of the offset between a PHC and the system clock which
is measured as time between
write(CPSW_CPTS_PUSH)
read(CPSW_CPTS_PUSH)
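
The measurement window described above can be sketched as follows. This is
a Python model with a fake device; the gettimex() helper and its callback
parameters are hypothetical, and time.monotonic_ns() merely stands in for
the system clock samples.

```python
import time


def gettimex(push_write, push_readback, read_latched):
    """Return (pre, phc, post): system time around the push, plus the PHC value."""
    pre = time.monotonic_ns()    # system timestamp before the register write
    push_write()                 # models write(CPSW_CPTS_PUSH)
    push_readback()              # models read(CPSW_CPTS_PUSH)
    post = time.monotonic_ns()   # system timestamp once the write has landed
    phc = read_latched()         # latched counter, fetched outside the window
    return pre, phc, post


# Demo with a fake device that records the order of accesses.
calls = []
pre, phc, post = gettimex(
    lambda: calls.append("write"),
    lambda: calls.append("readback"),
    lambda: calls.append("fifo") or 12345,
)
```

Since the hardware latches the counter at the write, the slower FIFO read
does not widen the window between the two system timestamps.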

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: move tc mult update in cpts_fifo_read()

Now the CPTS driver .adjfreq() generates a request to read the CPTS current
time (CPTS_EV_PUSH) with the intention of processing all pending events
using the previous frequency adjustment values before switching to the new
ones. So CPTS_EV_PUSH works as a marker to switch to the new frequency
adjustment values. Current code assumes that all the work is done in
.adjfreq(), but after enabling IRQ this will not be true any more.

Hence save new frequency adjustment values (mult) and perform actual freq
adjustment in cpts_fifo_read() immediately after CPTS_EV_PUSH is received.
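
A rough Python model of the marker idea, for illustration. The
FreqAdjustModel class, the event tuples and the plain multiplication used
for conversion are simplifications assumed for the example.

```python
class FreqAdjustModel:
    """Toy FIFO: events before the PUSH marker use the old mult, later ones the new."""

    def __init__(self, mult):
        self.mult = mult        # multiplier currently in effect
        self.mult_new = None    # saved by adjfreq(), applied at the marker
        self.fifo = []          # list of (kind, cycles) events

    def adjfreq(self, new_mult):
        # Only save the new value and request the marker event.
        self.mult_new = new_mult
        self.fifo.append(("PUSH", None))

    def fifo_read(self):
        converted = []
        for kind, cycles in self.fifo:
            if kind == "PUSH":
                # Marker reached: switch to the new adjustment from here on.
                if self.mult_new is not None:
                    self.mult = self.mult_new
                    self.mult_new = None
                continue
            converted.append(cycles * self.mult)
        self.fifo = []
        return converted


# Demo: one event queued before the adjustment, one after.
adj = FreqAdjustModel(mult=2)
adj.fifo.append(("TX", 10))
adj.adjfreq(3)
adj.fifo.append(("TX", 10))
converted = adj.fifo_read()
```

The switch is tied to the position of the marker in the FIFO, so it stays
correct no matter which context ends up draining the FIFO.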

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: separate hw counter read from timecounter

Now CPTS HW time reading code is implemented in timecounter->cyclecounter
.read() callback and performs following operations:
timecounter_read() ->cc.read() -> cpts_systim_read()
- request current CPTS HW time CPTS_TS_PUSH.TS_PUSH = 1
- poll CPTS FIFO for CPTS_EV_PUSH event with current HW timestamp

This approach needs to be changed for the future switch to the PTP PHC
.gettimex64() callback, which requires separating the request for the
current CPTS HW time from the processing of the CPTS FIFO. It is also
needed for the follow-up patch, which improves the .adjfreq() implementation.

This patch moves code accessing CPTS HW out of timecounter code as
follows:
- convert HW timestamp of every CPTS event to PTP time (us) and store it as
part of struct cpts_event;
- add CPTS context field to store current CPTS HW time (counter) value and
update it on CPTS_EV_PUSH reception;
- move code accessing CPTS HW out of timecounter code and use current CPTS
HW time (counter) from CPTS context instead;
- ensure timecounter->cycle_last is updated on CPTS_EV_PUSH reception.

After this change CPTS timecounter will only perform timekeeper role
without actually accessing CPTS HW.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpts: use dev_yy() api for logs

Use dev_yy() API instead of pr_yy() for log outputs.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-napi-addition-of-napi_defer_hard_irqs'

Eric Dumazet says:

====================
net: napi: addition of napi_defer_hard_irqs

This patch series augments the gro_flush_timeout feature with napi_defer_hard_irqs

As extensively described in the first patch changelog, this can suppress
the chit-chat traffic between NIC and host to signal interrupts and re-arm
them, since this can be an issue on high speed NICs with many queues.

The last patch in this series converts mlx4 TX completion to
napi_complete_done(), to enable this new mechanism.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: use napi_complete_done() in TX completion

In order to benefit from the new napi_defer_hard_irqs feature,
we need to use napi_complete_done() variant in this driver.

RX path is already using it, this patch implements TX completion side.

mlx4_en_process_tx_cq() now returns the amount of retired packets,
instead of a boolean, so that mlx4_en_poll_tx_cq() can pass
this value to napi_complete_done().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: napi: use READ_ONCE()/WRITE_ONCE()

gro_flush_timeout and napi_defer_hard_irqs can be read
from napi_complete_done() while other cpus write the value,
without explicit synchronization.

Use READ_ONCE()/WRITE_ONCE() to annotate the races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: napi: add hard irqs deferral feature

Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, that we used
to keep not-complete packets in GRO engine a bit longer, hoping that further
frames might be added to them.

Since then, we added the napi_complete_done() interface, and commit
364b6055738b ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.

This infrastructure can be leveraged, thanks to a new device parameter,
which allows to arm the napi hrtimer, instead of re-arming the device
hard IRQ.

We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on 100Gbit NIC.

In contrast, hrtimers are using local (percpu) resources and might have lower
cost.

The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
as gro_flush_timeout (/sys/class/net/ethX/)

By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.

This patch does not change the prior behavior of gro_flush_timeout
if used alone : NIC hard irqs should be rearmed as before.

One concrete usage can be :

echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs

If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
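
The counter behaviour can be illustrated with a simplified Python model.
This is a sketch under stated assumptions, not the kernel code: the
NapiModel class is hypothetical and the GRO flush path is left out.

```python
class NapiModel:
    """Simplified model of the decision taken when a poll completes."""

    def __init__(self, gro_flush_timeout, napi_defer_hard_irqs):
        self.gro_flush_timeout = gro_flush_timeout        # nanoseconds
        self.napi_defer_hard_irqs = napi_defer_hard_irqs  # number of deferrals
        self.defer_count = 0

    def complete_done(self, work_done):
        """Return "timer" if the hrtimer is armed, "irq" if the IRQ is re-armed."""
        if work_done:
            # At least one packet was retired: reset the deferral counter.
            self.defer_count = self.napi_defer_hard_irqs
        if self.defer_count > 0 and self.gro_flush_timeout > 0:
            self.defer_count -= 1
            return "timer"
        return "irq"


# Demo: one busy poll followed by idle polls, using the values from the text.
napi = NapiModel(gro_flush_timeout=20000, napi_defer_hard_irqs=10)
decisions = [napi.complete_done(5)] + [napi.complete_done(0) for _ in range(10)]

# With both tunables at their default of zero, the IRQ is re-armed at once.
default_decision = NapiModel(0, 0).complete_done(5)
```

In this model a single busy poll is followed by ten timer-driven scans
before the hard IRQ is re-armed, and every further busy poll restarts the
count.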

On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and not calling napi_complete_done().

This feature also can be used to work around some non-optimal NIC irq
coalescing strategies.

Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.

It also keeps serving cpus not idle too long, reducing tail latencies.

Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-aer'

Sudarsana Reddy Kalluru says:

====================
qed*: Add support for pcie advanced error recovery.

The patch series adds qed/qede driver changes for PCIe Advanced Error
Recovery (AER) support.
Patch (1) adds qed changes to enable the device to send error messages
to root port when detected.
Patch (2) adds qede support for handling the detected errors (AERs).

Changes from previous version:
-------------------------------
v2: use pci_num_vf() instead of caching the value in edev.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Add support for handling the pcie errors.

The error recovery is handled by management firmware (MFW) with the help of
qed/qede drivers. Upon detecting the errors, driver informs MFW about this
event which in turn starts a recovery process. MFW sends ERROR_RECOVERY
notification to the driver which performs the required cleanup/recovery
from the driver side.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Enable device error reporting capability.

The patch enables the device to send error messages to root port when
an error is detected.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add GRO support via gro_cells

gro_cells lib is used by different encapsulating netdevices, such as
geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
CPU tag is a sort of "encapsulation", and we can use the same mechs to
greatly improve overall DSA performance.
skbs are passed to the GRO layer after removing CPU tags, so we don't
need any new packet offload types as I first proposed in
the first GRO-over-DSA variant [1].

The size of struct gro_cells is sizeof(void *), so hot struct
dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
remain in one 32-byte cacheline.
The other positive side effect is that drivers for network devices
that can be shipped as CPU ports of DSA-driven switches can now use
napi_gro_frags() to pass skbs to kernel. Packets built that way are
completely non-linear and are likely being dropped without GRO.

This was tested on to-be-mainlined-soon Ethernet driver that uses
napi_gro_frags(), and the overall performance was on par with the
variant from [1], sometimes even better due to minimal overhead.
net.core.gro_normal_batch tuning may help to push it to the limit
on particular setups and platforms.

iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
on 1.2 GHz MIPS board:

5.7-rc2 baseline:

[ID]  Interval         Transfer     Bitrate        Retr
[ 5]  0.00-120.01 sec  9.00 GBytes  644 Mbits/sec  413  sender
[ 5]  0.00-120.00 sec  8.99 GBytes  644 Mbits/sec       receiver

Iface      RX packets  TX packets
eth0       7097731     7097702
port0      426050      6671829
port1      6671681     425862
port1.218  6671677     425851

With this patch:

[ID]  Interval         Transfer     Bitrate        Retr
[ 5]  0.00-120.01 sec  12.2 GBytes  870 Mbits/sec  122  sender
[ 5]  0.00-120.00 sec  12.2 GBytes  870 Mbits/sec       receiver

Iface      RX packets  TX packets
eth0       9474792     9474777
port0      455200      353288
port1      9019592     455035
port1.218  353144      455024
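
For reference, the relative change between the two runs can be computed
from the iperf3 figures above. This is a small Python sketch; the
percent_change() helper is a hypothetical name.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the two iperf3 summaries above.
bitrate_change = percent_change(644, 870)   # Mbits/sec, sender
retr_change = percent_change(413, 122)      # retransmissions, sender
```

This works out to roughly 35% more throughput and roughly 70% fewer
retransmissions on this particular setup.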

v2:
- Add some performance examples in the commit message;
- No functional changes.

[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/

Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Honor all IPv6 PIO Valid Lifetime values

RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.

This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
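
The difference between the two behaviours can be sketched in Python. This
is an illustrative model with hypothetical function names; lifetimes are in
seconds, and the exception that RFC 4862 makes for authenticated Router
Advertisements is left out.

```python
TWO_HOURS = 2 * 60 * 60  # seconds


def rfc4862_valid_lifetime(remaining, advertised):
    """Valid Lifetime after applying the RFC 4862 5.5.3 e) rules."""
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised   # longer lifetimes are always accepted
    if remaining <= TWO_HOURS:
        return remaining    # the advertised value is ignored
    return TWO_HOURS        # never reduced below two hours


def honor_all_valid_lifetime(remaining, advertised):
    """Behaviour after this patch: every advertised value is honored."""
    return advertised


# Demo: a router withdraws a prefix by advertising a Valid Lifetime of 0
# while the address still has one day left.
old_result = rfc4862_valid_lifetime(remaining=86400, advertised=0)
new_result = honor_all_valid_lifetime(remaining=86400, advertised=0)
```

Under the old rules the address stays valid for two more hours; with this
patch it is invalidated as soon as the router says so.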

Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs. In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa2-eth-add-support-for-xdp-bulk-enqueue'

Ioana Ciornei says:

====================
dpaa2-eth: add support for xdp bulk enqueue

The first patch moves the DEV_MAP_BULK_SIZE macro into the xdp.h header
file so that drivers can take advantage of it and use it.

The following 3 patches are there to set the scene for using the bulk
enqueue feature.  First of all, the prototype of the enqueue function is
changed so that it returns the number of enqueued frames. Second, the
bulk enqueue interface is used but without any functional changes, still
one frame at a time is enqueued.  Third, the .ndo_xdp_xmit callback is
split into two stages, create all FDs for the xdp_frames received and
then enqueue them.

The last patch of the series builds on top of the others and instead of
issuing an enqueue operation for each FD it issues a bulk enqueue call
for as many frames as possible. This is repeated until all frames are
enqueued or the maximum number of retries is hit. We do not use the
XDP_XMIT_FLUSH flag since the architecture is not capable to store all
frames dequeued in a NAPI cycle, instead we send out right away all
frames received in a .ndo_xdp_xmit call.

Changes in v2:
- statically allocate an array of dpaa2_fd by frame queue
- use the DEV_MAP_BULK_SIZE as the maximum number of xdp_frames
   received in .ndo_xdp_xmit()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: use bulk enqueue in .ndo_xdp_xmit

Take advantage of the bulk enqueue feature in .ndo_xdp_xmit.
We cannot use the XDP_XMIT_FLUSH since the architecture is not capable
to store all the frames dequeued in a NAPI cycle so we instead are
enqueueing all the frames received in a ndo_xdp_xmit call right away.

After setting up all FDs for the xdp_frames received, enqueue multiple
frames at a time until all are sent or the maximum number of retries is
hit.
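
The retry loop described above can be sketched as follows. This is a Python
model, not the driver code: bulk_enqueue(), the enqueue callback and the
fake rings are hypothetical, and a return value of 0 stands in for a busy
ring.

```python
def bulk_enqueue(frames, enqueue, max_retries):
    """Enqueue frames in batches until all are sent or retries run out.

    enqueue(batch) returns how many frames from the front of batch were
    accepted. Returns the total number of frames enqueued.
    """
    total = 0
    retries = 0
    while total < len(frames) and retries < max_retries:
        accepted = enqueue(frames[total:])   # offer everything still pending
        if accepted == 0:
            retries += 1                     # ring busy, try again
            continue
        total += accepted
    return total


# Demo 1: a ring that accepts at most 3 frames per call.
ring_calls = []

def small_ring(batch):
    ring_calls.append(len(batch))
    return min(3, len(batch))

sent = bulk_enqueue(list(range(8)), small_ring, max_retries=5)

# Demo 2: a ring that is always busy.
busy_calls = []

def busy_ring(batch):
    busy_calls.append(len(batch))
    return 0

sent_busy = bulk_enqueue(list(range(8)), busy_ring, max_retries=5)
```

Each call offers all remaining frames, so a ring with free space drains the
batch in a few calls instead of one call per frame.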

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: split the .ndo_xdp_xmit callback into two stages

Instead of having a function that both creates a frame descriptor from
an xdp_frame and enqueues it, split this into two stages.
Add the dpaa2_eth_xdp_create_fd that just transforms an xdp_frame into a
FD while the actual enqueue callback is called directly from the ndo for
each frame.
This is particularly useful in conjunction with bulk enqueue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: use the bulk ring mode enqueue interface

Update the dpaa2-eth driver to use the bulk enqueue function introduced
with the change to QBMAN ring mode. At the moment, no functional changes
are made but rather the driver just transitions to the new interface
while still enqueuing just one frame at a time.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: return num_enqueued frames from enqueue callback

The enqueue dpaa2-eth callback now returns the number of successfully
enqueued frames. This is a preliminary patch necessary for adding
support for bulk ring mode enqueue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: export the DEV_MAP_BULK_SIZE macro

Export the DEV_MAP_BULK_SIZE macro to the header file so that drivers
can directly use it as the maximum number of xdp_frames received in the
.ndo_xdp_xmit() callback.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: A few improvements to fib_nexthops.sh

Add nodad when adding IPv6 addresses and remove the sleep.

A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.

Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Add-selftests-for-pedit-ex-munge-ip6-dsfield'

Petr Machata says:

====================
Add selftests for pedit ex munge ip6 dsfield

Patch #1 extends the existing generic forwarding selftests to cover pedit
ex munge ip6 traffic_class as well. Patch #2 adds TDC test coverage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: tc-testing: Add a TDC test for pedit munge ip6 dsfield

Add a self-test for the IPv6 dsfield munge that iproute2 will support.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: pedit_dsfield: Add pedit munge ip6 dsfield

Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'add-TJA1102-support'

Oleksij Rempel says:

====================
add TJA1102 support

changes v5:
- rename __of_mdiobus_register_phy() to of_mdiobus_phy_device_register()

changes v4:
- remove unused phy_id variable

changes v3:
- export part of of_mdiobus_register_phy() and reuse it in tja11xx
driver
- coding style fixes

changes v2:
- use .match_phy_device
- add irq support
- add delayed registration for PHY1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: tja11xx: add delayed registration of TJA1102 PHY1

TJA1102 is a dual PHY package with PHY0 having a proper PHYID and PHY1
having no ID. On the one hand, it is possible to force PHY detection by
compatible; on the other hand, we should be able to reset the complete chip
before PHY1 is configured, and we need to define dependencies for proper
power management.

We can solve it by defining PHY1 as child of PHY0:
    tja1102_phy0: ethernet-phy@4 {
        reg = <0x4>;

        interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;

        reset-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>;
        reset-assert-us = <20>;
        reset-deassert-us = <2000>;

        tja1102_phy1: ethernet-phy@5 {
            reg = <0x5>;

            interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;
        };
    };

The PHY1 should be a subnode of PHY0 and registered only after PHY0 was
completely reset and initialized.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: of: export part of of_mdiobus_register_phy()

This function will be needed in tja11xx driver for secondary PHY
support.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: tja11xx: add initial TJA1102 support

TJA1102 is a dual T1 PHY chip. Both PHYs are separately addressable.
Both PHYs are similar but have different amount of functionality. For
example PHY 1 has no PHY ID and no health monitor.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: phy: Add support for NXP TJA11xx

Document the NXP TJA11xx PHY bindings.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Use IS_ERR() to check and simplify code

Use IS_ERR() and PTR_ERR() instead of PTR_ERR_OR_ZERO()
to simplify code, avoid redundant parameter definitions
and judgements.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add phy-mode support for the KSZ9031 PHY

Add support for following phy-modes: rgmii, rgmii-id, rgmii-txid, rgmii-rxid.

This PHY has an internal RX delay of 1.2ns and no delay for TX.

The pad skew registers allow setting the total TX delay to a max of 1.38ns
and the total RX delay to a max of 2.58ns (configurable 1.38ns + built-in
1.2ns) and a minimal delay of 0ns.

According to the RGMII v1.3 specification the delay provided by PCB traces
should be between 1.5ns and 2.0ns. The RGMII v2.0 allows to provide this
delay by MAC or PHY. So, we configure this PHY to the best values we can
get by this HW: TX delay to 1.38ns (max supported value) and RX delay to
1.80ns (best calculated delay)
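
The delay budget above can be checked with a few lines of Python. The
sketch works in picoseconds to avoid rounding; the constant and function
names are hypothetical, and the 600 ps figure is plain arithmetic from the
numbers in this message, not a register value from the driver.

```python
RX_BUILTIN_PS = 1200   # internal RX delay of the PHY
SKEW_MAX_PS = 1380     # maximum delay configurable through pad skew
RGMII_MIN_PS, RGMII_MAX_PS = 1500, 2000   # window from RGMII v1.3


def in_rgmii_window(delay_ps):
    """True if the delay falls inside the 1.5ns to 2.0ns window."""
    return RGMII_MIN_PS <= delay_ps <= RGMII_MAX_PS


rx_total_max = RX_BUILTIN_PS + SKEW_MAX_PS   # largest possible RX delay
rx_target = 1800                             # chosen RX delay
rx_skew_needed = rx_target - RX_BUILTIN_PS   # part provided by pad skew
tx_total = SKEW_MAX_PS                       # TX has no built-in delay
```

The chosen RX delay sits inside the window, while the TX delay at its
maximum of 1.38ns still falls short of 1.5ns, which is why the message
calls it the best value this HW can provide.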

The phy-modes can still be fine tuned/overwritten by *-skew-ps
device tree properties described in:
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: caif: use true,false for bool variables

Fix the following coccicheck warning:

net/caif/caif_dev.c:410:2-13: WARNING: Assignment of 0/1 to bool variable
net/caif/caif_dev.c:445:2-13: WARNING: Assignment of 0/1 to bool variable
net/caif/caif_dev.c:145:1-12: WARNING: Assignment of 0/1 to bool variable
net/caif/caif_dev.c:223:1-12: WARNING: Assignment of 0/1 to bool variable

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Add support for VLAN promiscuous mode

For dwmac4, enable VLAN promiscuity when MAC controller is requested to
enter promiscuous mode.

Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: silence RCU list debugging warning

macvlan_hash_lookup() uses list_for_each_entry_rcu() for traversal, which
should happen either under RCU in the fast path or under the protection of
rtnl_mutex.

In the case of holding RTNL, we should add the corresponding lockdep
expression to silence the following false-positive warning:

=============================
WARNING: suspicious RCU usage
5.7.0-rc1-next-20200416-00003-ga3b8d28bc #1 Not tainted
-----------------------------
drivers/net/macvlan.c:126 RCU-list traversed in non-reader section!!

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: Add tests for vrf and xfrms

Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: allow flooding for all traffic classes

Right now it can be seen that the VSC9959 (Felix) switch will not flood
frames if they have a VLAN tag with a PCP of 1-7 (nonzero).

It turns out that Felix is quite different from its cousin, Ocelot, in
that frame flooding can be allowed/denied per traffic class. Where
Ocelot has 1 instance of the ANA_FLOODING register, Felix has 8.

The approach that this driver is going to take is "thanks, but no
thanks". We have no use case of limiting the flooding domain based on
traffic class, so we just want to allow packets to be flooded, no matter
what traffic class they have.

So we copy the line of code from ocelot.c which does the one-shot
initialization of the flooding PGIDs, and we add it to felix.c as well -
except replicated 8 times.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Add tracepoint support

Add tracepoint support for QRTR with NS as the first candidate. Later on
this can be extended to core QRTR and transport drivers.

The trace_printk() used in NS has been replaced by tracepoints.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: remove unused macro 'ILA_HASH_TABLE_SIZE'

net/ipv6/ila/ila_xlat.c:604:0: warning: macro "ILA_HASH_TABLE_SIZE" is not used [-Wunused-macros]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_ct: update nf_conn_acct for act_ct SW offload in flowtable

When act_ct SW offload is done in the flowtable, the counter of the
conntrack entry is never updated. So update the nf_conn_acct counter in the
act_ct flowtable software offload.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-add-device-managed-devm_mdiobus_register'

Heiner Kallweit says:

====================
net: phy: add device-managed devm_mdiobus_register

If there's no special ordering requirement for mdiobus_unregister(),
then driver code can be simplified by using a device-managed version
of mdiobus_register(). Prerequisite is that bus allocation has been
done device-managed too. Else mdiobus_free() may be called whilst
bus is still registered, resulting in a BUG_ON(). Therefore let
devm_mdiobus_register() return -EPERM if bus was allocated
non-managed.

First user of the new functionality is r8169 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use devm_mdiobus_register

Use new function devm_mdiobus_register() to simplify the driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add device-managed devm_mdiobus_register

If there's no special ordering requirement for mdiobus_unregister(),
then driver code can be simplified by using a device-managed version
of mdiobus_register(). Prerequisite is that bus allocation has been
done device-managed too. Else mdiobus_free() may be called whilst
bus is still registered, resulting in a BUG_ON(). Therefore let
devm_mdiobus_register() return -EPERM if bus was allocated
non-managed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: bcm54140: add hwmon support

The PHY supports monitoring its die temperature as well as two analog
voltages. Add support for it.

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add Broadcom BCM54140 support

The Broadcom BCM54140 is a Quad SGMII/QSGMII Copper/Fiber Gigabit
Ethernet transceiver.

This also adds support for tunables to set and get downshift and
energy detect auto power-down.

The PHY has four ports and each port has its own PHY address.
There are per-port registers as well as global registers.
Unfortunately, the global registers can only be accessed by reading
and writing from/to the PHY address of the first port. Further,
there is no way to find out what port you actually are by just
reading the per-port registers. We therefore have to scan the
bus on the PHY probe to determine the port and thus what address
we need to access the global registers.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: broadcom: add helper to write/read RDB registers

RDB (Register Data Base) registers are used on newer Broadcom PHYs. Add
helper to read, write and modify these registers.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dt-bindings-net-mdio.yaml-fixes'

Florian Fainelli says:

====================
dt-bindings: net: mdio.yaml fixes

This patch series documents some common MDIO devices properties such as
resets (and delays) and broken-turn-around. The second patch also
rephrases some descriptions to be more general towards MDIO devices and
not specific towards Ethernet PHYs.

Changes in v3:

- corrected wording of 'broken-turn-around' in ethernet-phy.yaml and
mdio.yaml, add Andrew's R-b tag to patch #3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mdio: Make descriptions more general

A number of descriptions assume a PHY device, but since this binding
describes a MDIO bus which can have different kinds of MDIO devices
attached to it, rephrase some descriptions to be more general in that
regard.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mdio: Document common properties

Some of the properties pertaining to the broken turn around or resets
were only documented in ethernet-phy.yaml while they are applicable
across all MDIO devices and not Ethernet PHYs specifically which are a
superset.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: Correct description of 'broken-turn-around'

The turn around bytes (2) are placed between the control phase of the
MDIO transaction and the data phase, correct the wording to be more
exact.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Ocelot-MAC_ETYPE-tc-flower-key-improvements'

Vladimir Oltean says:

====================
Ocelot MAC_ETYPE tc-flower key improvements

As discussed in the comments surrounding this patch:
https://patchwork.ozlabs.org/project/netdev/patch/20200417190308.32598-1-olteanv@gmail.com/

the restrictions imposed on non-MAC_ETYPE rules were harsher than they
needed to be. IP, IPv6, ARP rules can still be added concurrently with
src_mac and dst_mac rules, as long as those MAC address rules do not ask
for an offending EtherType.

For that to actually be supported, we need to parse the EtherType from
the flower classification rule first.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: lift protocol restriction for flow_match_eth_addrs keys

An attempt was made in commit fe3490e6107e ("net: mscc: ocelot: Hardware
ofload for tc flower filter") to avoid clashes between MAC_ETYPE rules
and IP rules. Because the protocol blacklist should have included
ETH_P_ALL too, it created some confusion, but now the situation should
be dealt with a bit better by the patch immediately previous to this one
("net: mscc: ocelot: refine the ocelot_ace_is_problematic_mac_etype
function").

So now we can remove that check. MAC_ETYPE rules with a protocol of
ETH_P_IP, ETH_P_IPV6, ETH_P_ARP and ETH_P_ALL _are_ supported, with some
restrictions regarding per-port exclusivity which are enforced now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: refine the ocelot_ace_is_problematic_mac_etype function

The commit mentioned below was a bit too harsh, and while it restricted
the invalid key combinations which are known to not work, such as:

tc filter add dev swp0 ingress proto ip \
      flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto all \
      flower src_mac 00:11:22:33:44:55 action drop

it also restricted some which still should work, such as:

tc filter add dev swp0 ingress proto ip \
      flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto 0x22f0 \
      flower src_mac 00:11:22:33:44:55 action drop

What actually does not match "sanely" is a MAC_ETYPE rule on frames
having an EtherType of ARP, IPv4, IPv6, in addition to SNAP and OAM
frames (which the ocelot tc-flower implementation does not parse yet, so
the function might need to be revisited again in the future).

So just make the function recognize the problematic MAC_ETYPE rules by
EtherType - thus the VCAP IS2 can be forced to match even on those
packets.

This patch makes it possible for IP rules to live on a port together
with MAC_ETYPE rules that are non-all, non-arp, non-ip and non-ipv6.
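
A minimal userspace sketch of the check described above, assuming the
rule's EtherType is available as a host-order value/mask pair. The
function and constant names are hypothetical and this is not the
driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* EtherType values, named locally to keep the sketch self-contained. */
enum {
	ETYPE_IPV4 = 0x0800,
	ETYPE_ARP  = 0x0806,
	ETYPE_IPV6 = 0x86DD,
};

/*
 * Return true if a MAC_ETYPE rule with this EtherType value/mask would
 * also have to match ARP, IPv4 or IPv6 frames.
 */
static bool mac_etype_rule_is_problematic(uint16_t etype, uint16_t mask)
{
	/* A zero mask means protocol "all", which includes the ones below. */
	if (mask == 0)
		return true;

	switch (etype) {
	case ETYPE_ARP:
	case ETYPE_IPV4:
	case ETYPE_IPV6:
		return true;
	default:
		return false;
	}
}
```

With this check, the proto 0x22f0 rule from the second example is not
flagged, while the proto all rule from the first example is.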

Fixes: d4d0cb741d7b ("net: mscc: ocelot: deal with problematic MAC_ETYPE VCAP IS2 rules")
Reported-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: support matching on EtherType

Currently, the filter's protocol is ignored except for a few special
cases (IPv4 and IPv6).

The EtherType can be matched inside VCAP IS2 by using a MAC_ETYPE key.
So there are 2 cases in which EtherType matches are supported:

  - As part of a larger MAC_ETYPE rule, such as:

    tc filter add dev swp0 ingress protocol ip \
            flower skip_sw src_mac 42:be:24:9b:76:20 action drop

  - Standalone (matching on protocol only):

    tc filter add dev swp0 ingress protocol arp \
            flower skip_sw action drop

As before, if the protocol is not specified, it is implicitly "all" and
the EtherType mask in the MAC_ETYPE half key is set to zero.
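
The mapping from filter protocol to EtherType key can be sketched as
follows. The struct and function names are hypothetical; the only value
taken from the kernel headers is ETH_P_ALL (0x0003), which is redefined
locally here:

```c
#include <stdint.h>

#define PROTO_ALL 0x0003  /* same value as ETH_P_ALL */

/* Hypothetical EtherType part of a MAC_ETYPE key. */
struct etype_key {
	uint16_t value;
	uint16_t mask;
};

/* Build the EtherType value/mask pair from the filter's protocol. */
static struct etype_key etype_key_from_proto(uint16_t proto)
{
	struct etype_key key = { 0, 0 };   /* "all": mask of zero, match anything */

	if (proto != PROTO_ALL) {
		key.value = proto;
		key.mask = 0xffff;         /* exact match on the EtherType */
	}
	return key;
}
```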

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Support-programmable-pins-for-Ocelot-PTP-driver'

Yangbo Lu says:

====================
Support programmable pins for Ocelot PTP driver

The Ocelot PTP clock driver has so far been embedded in the ocelot.c
driver. It supports the basic gettime64/settime64/adjtime/adjfine
functions, which are used by both the Ocelot switch and the Felix switch.

This patch-set moves the current PTP clock code out of the ocelot.c
driver into a separate ocelot_ptp.c driver, and implements 4
programmable pins with only the PTP_PF_PEROUT function for now.
The PTP_PF_EXTTS function will be supported in the future, and it should
be implemented separately for Felix and Ocelot, because their hardware
interrupt implementations differ.

Changes for v2:
- Put PTP driver under drivers/net/ethernet/mscc/.
- Dropped MAINTAINERS patch. Kept original maintaining.
- Initialized PTP separately in ocelot/felix platforms.
- Supported PPS case in programmable pin.
- Supported disabling pin function since deadlock is fixed by Richard.
- Returned -EBUSY if no pin is available.
Changes for v3:
- Re-sent.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: enable PTP programmable pin

Enable PTP programmable pin.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: enable PTP programmable pin

Enable PTP programmable pin.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: support 4 PTP programmable pins

Support 4 PTP programmable pins with only PTP_PF_PEROUT function
for now. The PTP_PF_EXTTS function will be supported in the
future, and it should be implemented separately for Felix and
Ocelot, because of different hardware interrupt implementation
in them.

Since the hardware does not support an absolute start time, a periodic
clock request is only accepted with a start time of 0 0. In the PPS
case, however, a non-zero nsec is accepted and used for phase adjustment.
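
The start-time rule can be sketched as below. This is an illustration
under stated assumptions, not the driver's code: the names are
hypothetical, and a "PPS request" is assumed to mean a period of exactly
one second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical seconds/nanoseconds pair, as in a periodic output request. */
struct perout_time {
	int64_t  sec;
	uint32_t nsec;
};

/* Return true if the requested start time can be honoured. */
static bool perout_start_is_acceptable(struct perout_time start,
				       struct perout_time period)
{
	/* Assumption: PPS is a period of exactly one second. */
	bool is_pps = period.sec == 1 && period.nsec == 0;

	if (is_pps)
		return start.sec == 0;   /* nsec acts as a phase offset */

	/* Generic periodic clock: no absolute start time in hardware. */
	return start.sec == 0 && start.nsec == 0;
}
```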

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: add wave programming registers definitions

Add wave programming registers definitions for Ocelot platforms.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: redefine PTP pins

There are 5 PTP_PINS register groups on the Ocelot switch.
Apart from the one used for TOD operations, 4 register
groups remain for programmable pins. So redefine the
4 programmable pins.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: fix timestamp info if ptp clock does not work

The timestamp info should report only software timestamping
capabilities if the PTP clock does not work.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: move ocelot ptp clock code out of ocelot.c

The Ocelot PTP clock driver has so far been embedded in the ocelot.c
driver. It supports the basic gettime64/settime64/adjtime/adjfine
functions, which are used by both the Ocelot switch and the Felix switch.

This patch moves the current PTP clock code out of the ocelot.c driver
into a separate ocelot_ptp.c file.
When further features are implemented, the common code can be put
in ocelot_ptp.c and the switch-specific code should go in the specific
switch driver. The interrupt implementation in the SoC is different
between Ocelot and Felix.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vermagic-non-global'

Leon Romanovsky says:

====================
Remove vermagic header from global include folder

Changelog:
v2:
* Changed the implementation of patch #4 to be like Masahiro wants.
I personally don't like this implementation and am changing it just to move
this patchset forward.
v1:
https://lore.kernel.org/lkml/20200415133648.1306956-1-leon@kernel.org
* Added tags
* Updated patch #4 with test results
* Changed scripts/mod/modpost.c to create inclusion of vermagic.h
from kernel folder and not from general include/linux. This is
needed to generate *.mod.c files, while building modules.
v0:
https://lore.kernel.org/lkml/20200414155732.1236944-1-leon@kernel.org

This is a follow-up to the failure reported by Borislav [1] and the fix
suggested later on [2].

The series removes all includes of linux/vermagic.h, updates hns and
nfp to use same kernel versioning scheme (exactly like we did for
other drivers in previous cycle) and removes vermagic.h from global
include folder.

[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic
[2] https://lore.kernel.org/lkml/20200413080452.GA3772@zn.tnic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

kernel/module: Hide vermagic header file from general use

VERMAGIC* definitions are not supposed to be used by the drivers
(see bug report [1]), so introduce a special define to guard inclusion
of this header file, and define it in kernel/modules.h and in the internal
script that generates *.mod.c files.

In-tree module build:
➜  kernel git:(vermagic) ✗ make clean
➜  kernel git:(vermagic) ✗ make M=drivers/infiniband/hw/mlx5
➜  kernel git:(vermagic) ✗ modinfo drivers/infiniband/hw/mlx5/mlx5_ib.ko
filename: /images/leonro/src/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko
<...>
vermagic:       5.6.0+ SMP mod_unload modversions

Out-of-tree module build:
➜  mlx5 make -C /images/leonro/src/kernel clean M=/tmp/mlx5
➜  mlx5 make -C /images/leonro/src/kernel M=/tmp/mlx5
➜  mlx5 modinfo /tmp/mlx5/mlx5_ib.ko
filename:       /tmp/mlx5/mlx5_ib.ko
<...>
vermagic:       5.6.0+ SMP mod_unload modversions

[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic
Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Jessica Yu <jeyu@kernel.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/nfp: Update driver to use global kernel version

Change nfp driver to use globally defined kernel version.

Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hns: Remove custom driver version in favour of global one

Use globally defined kernel version instead of custom driver variant.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: Remove inclusion of vermagic header

Get rid of linux/vermagic.h includes, so that MODULE_ARCH_VERMAGIC from
the arch header arch/x86/include/asm/module.h won't be redefined.

  In file included from ./include/linux/module.h:30,
                   from drivers/net/ethernet/3com/3c515.c:56:
  ./arch/x86/include/asm/module.h:73: warning: "MODULE_ARCH_VERMAGIC" redefined
     73 | # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
        |
  In file included from drivers/net/ethernet/3com/3c515.c:25:
  ./include/linux/vermagic.h:28: note: this is the location of the previous definition
     28 | #define MODULE_ARCH_VERMAGIC ""
        |

Fixes: 6bba2e89a88c ("net/3com: Delete driver and module versions from 3com drivers")
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Shannon Nelson <snelson@pensando.io> # ionic
Acked-by: Sebastian Reichel <sre@kernel.org> # power
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: remove redundant assignment to variable rc

The variable rc is being assigned with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-bcmgenet-Clean-up-after-ACPI-enablement'

Andy Shevchenko says:

====================
net: bcmgenet: Clean up after ACPI enablement

The ACPI enablement series missed some clean ups that should have been
done at the same time. Here are those bits.

In v2:
- return dev_dbg() calls to avoid spamming logs when probe is deferred (Florian)
- added Ack (Florian)
- combined two, earlier sent, series together
- added couple more patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Drop too many parentheses in bcmgenet_probe()

There is no need for parentheses around a plain pointer variable or
a negation operator. Drop them for good.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Use get_unaligned_beXX() and put_unaligned_beXX()

It's convenient to use the get_unaligned_beXX() and put_unaligned_beXX()
helpers to get or set the MAC address instead of open-coded variants.
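
As a rough userspace illustration of what such helpers replace, the
sketch below reads a 6-byte MAC address as one big-endian 32-bit value
and one big-endian 16-bit value. load_be32()/load_be16() are local
stand-ins for the kernel helpers, and the split into two register
values is an assumption made for the example:

```c
#include <stdint.h>

/* Local stand-in for a big-endian 32-bit load from an unaligned buffer. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Local stand-in for a big-endian 16-bit load from an unaligned buffer. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)(((uint32_t)p[0] << 8) | (uint32_t)p[1]);
}

/* Hypothetical pair of register values holding a MAC address. */
struct mac_regs {
	uint32_t mac0;   /* bytes 0..3 */
	uint32_t mac1;   /* bytes 4..5 */
};

/* Split a MAC address into the two values without open-coded shifts. */
static struct mac_regs mac_to_regs(const uint8_t addr[6])
{
	struct mac_regs regs = { load_be32(&addr[0]), load_be16(&addr[4]) };

	return regs;
}
```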

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Use devm_clk_get_optional() to get the clocks

Conversion to devm_clk_get_optional() makes it explicit that clocks are
optional. This change allows handling deferred probe when the clocks are
defined but not yet probed. Because of the above changes, bail out in the
error case.

While here, check for a potential error when enabling the main clock.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Drop useless OF code

Nothing here needs the set of OF headers, and the OF node ID check that
follows is redundant. Drop them for good.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Drop ACPI_PTR() to avoid compiler warning

When compiled with CONFIG_ACPI=n, ACPI_PTR() becomes a no-op, so the
genet_acpi_match table is defined but not used. The compiler is not happy
about such data. Drop ACPI_PTR() for good.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2020-04-20' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-04-20

This series includes misc updates and clean ups to mlx5 driver:

1) Improve some comments, from Hu Haowen.
2) Handle errors of netif_set_real_num_{tx,rx}_queues, from Maxim.
3) IPsec and FPGA related code cleanup to prepare for ASIC devices
IPsec offloads, from Raed.
4) Allow partial mask for tunnel options, from Roi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: change wmb to smp_wmb in rtl8169_start_xmit

A barrier is needed here to ensure that rtl_tx sees the descriptor
changes (DescOwn set) before the updated tp->cur_tx value. Else it may
wrongly assume that the transfer has been finished already. For this
purpose smp_wmb() is sufficient.

No separate barrier is needed for ordering the descriptor changes
with the MMIO doorbell write. The needed barrier is included in
the non-relaxed writel() used by rtl8169_doorbell().
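
The ordering requirement can be illustrated with a userspace C11
analogue. This is not driver code: the ring layout, the names and the
flag value are hypothetical, and a release fence is used as a rough
(somewhat stronger) stand-in for smp_wmb(), paired with an acquire
fence on the reader side.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN   0x80000000u  /* illustrative "owned by hardware" flag */
#define RING_SIZE  4u

struct tx_ring {
	_Atomic uint32_t opts1[RING_SIZE];  /* descriptor status words */
	_Atomic uint32_t cur_tx;            /* next entry to be queued */
	uint32_t dirty_tx;                  /* next entry to be reclaimed */
};

/* Producer (xmit path): publish the descriptor, then advance cur_tx. */
static void ring_queue(struct tx_ring *r)
{
	uint32_t entry = atomic_load_explicit(&r->cur_tx, memory_order_relaxed);

	atomic_store_explicit(&r->opts1[entry % RING_SIZE], DESC_OWN,
			      memory_order_relaxed);
	/* Descriptor must be visible before the new cur_tx value. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->cur_tx, entry + 1, memory_order_relaxed);
}

/* Stand-in for the NIC finishing a descriptor and clearing the flag. */
static void ring_hw_complete(struct tx_ring *r, uint32_t entry)
{
	atomic_store_explicit(&r->opts1[entry % RING_SIZE], 0,
			      memory_order_relaxed);
}

/* Consumer (completion path): count descriptors that are finished. */
static uint32_t ring_reclaim(struct tx_ring *r)
{
	uint32_t cur = atomic_load_explicit(&r->cur_tx, memory_order_relaxed);
	uint32_t done = 0;

	/* Pairs with the release fence in ring_queue(). */
	atomic_thread_fence(memory_order_acquire);
	while (r->dirty_tx != cur) {
		uint32_t status = atomic_load_explicit(
			&r->opts1[r->dirty_tx % RING_SIZE], memory_order_relaxed);

		if (status & DESC_OWN)
			break;          /* still owned by hardware */
		r->dirty_tx++;
		done++;
	}
	return done;
}
```

Without the fence pair, the consumer could observe the new cur_tx but a
stale status word without the ownership flag, and treat a descriptor
that was only just queued as already completed.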

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: improve some comments

Replaced "its" with "it's".

Signed-off-by: Hu Haowen <xianfengting221@163.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Read embedded cpu bit only once

Embedded CPU bit doesn't change with PCI resume/suspend.
Hence read it only once while probing the PCI device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues

netif_set_real_num_tx_queues and netif_set_real_num_rx_queues may fail.
Now that mlx5e supports handling errors in the preactivate hook, this
commit leverages that functionality to handle errors from those
functions and roll back all changes on failure.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Allow partial data mask for tunnel options

We use mapping to save and restore the tunnel options.
Save also the tunnel options mask.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Set of completion request bit should not clear other adjacent bits

In notify HW (ring doorbell) flow, we set the bit to request a completion
on the TX descriptor.
When doing so, we should not unset other bits in the same byte.
Currently, this does not fix a real issue, as we still don't have a flow
where both MLX5_WQE_CTRL_CQ_UPDATE and any adjacent bit are set together.
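
The difference between assigning and OR-ing the flag can be sketched as
follows. The names and bit values are illustrative stand-ins and are
not taken from the mlx5 headers:

```c
#include <stdint.h>

#define CTRL_CQ_UPDATE   0x08u  /* illustrative "request completion" bits */
#define CTRL_OTHER_FLAG  0x02u  /* illustrative adjacent bit in the same byte */

/* Old behaviour: plain assignment drops whatever was set before. */
static uint8_t request_completion_clobbering(uint8_t ctrl_byte)
{
	(void)ctrl_byte;
	return CTRL_CQ_UPDATE;
}

/* New behaviour: OR the bit in and keep the adjacent bits. */
static uint8_t request_completion(uint8_t ctrl_byte)
{
	return (uint8_t)(ctrl_byte | CTRL_CQ_UPDATE);
}
```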

Fixes: 542578c67936 ("net/mlx5e: Move helper functions to a new txrx datapath header")
Fixes: 864b2d715300 ("net/mlx5e: Generalize tx helper functions for different SQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: IPsec, Refactor SA handle creation and destruction

Currently the SA handle is created and managed as part of the code that
is common to the different IPsec-capable HW. This handle is passed to the
HW and is used on Rx to identify the SA handle that was used, in order to
return the xfrm state to the stack.

The above implementation poses a limitation on managing this handle.

Refactor by moving management of this field to the specific HW code.

Downstream patches will introduce Connect-X support for IPsec, which
will use this handle differently from the current implementation.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW

The current HW counters are supported only by Innova. Split the ipsec
stats group into two groups, one for HW and one for SW, and expose
the HW counters to ethtool only if Innova HW is used for IPsec offload.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Refactor mlx5_accel_esp_create_hw_context parameter list

Currently the FPGA IPsec is the only HW implementation of the IPsec
acceleration API, and so mlx5_accel_esp_create_hw_context was
wrongly tailored to this HW API, among other things in its parameter
list and in the endianness of some of its parameters.

This implementation might not be suitable for different HW.

Refactor by grouping all function arguments of
mlx5_accel_esp_create_hw_context and passing them in the common
mlx5_accel_esp_xfrm_attrs struct field of the mlx5_accel_esp_xfrm struct,
and correct the endianness according to the HW being called.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: en_accel, Add missing net/geneve.h include

The cited commit relies on <net/geneve.h> being included
implicitly prior to the include of "en_accel/en_accel.h".
This forces every file that needs to include en_accel.h
to redundantly include net/geneve.h as well.

Include net/geneve.h explicitly in "en_accel/en_accel.h" to avoid
the undesired constraint described above.

Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Use the correct IPsec capability function for FPGA ops

Currently the IPsec acceleration capability function is also used
in the code for IPsec FPGA capable devices.

This could cause a future bug, as the acceleration layer is agnostic
to the device implementing its API.

Fix by using the IPsec FPGA capability function instead of the
acceleration layer capability function for operations that relate only
to FPGA IPsec.

Downstream patches will add support for Connect-X IPsec, and this change
avoids a future bug there.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

qed: use true,false for bool variables

Fix the following coccicheck warning:

drivers/net/ethernet/qlogic/qed/qed_dev.c:4395:2-34: WARNING:
Assignment of 0/1 to bool variable
drivers/net/ethernet/qlogic/qed/qed_dev.c:1975:2-34: WARNING:
Assignment of 0/1 to bool variable

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: misc updates for -next

This patchset includes some misc updates for the HNS3 ethernet driver.

[patch 1&2] separate two bloated functions.
[patch 3-5] remove some redundant code.
[patch 6-7] clean up some coding style issues.
[patch 8-10] add some debugging information.

Change log:
V1->V2: removed an unnecessary initialization in [patch 1], as
suggested by David Miller.
modified some print format issues and the commit log in [patch 8].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add trace event support for PF/VF mailbox

This patch adds trace event support for PF/VF mailbox.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for dumping MAC reg in debugfs

This patch adds support for dumping MAC reg in debugfs,
which will be helpful for debugging.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add debug information for flow table when failed

Add some debug information for failures in flow table processing,
remove the redundant printing when hclge_fd_check_spec() returns an
error, and change the print level of the "FD not enabled" error.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: clean up some coding style issue

This patch removes some unnecessary blank lines and redundant
parentheses, and replaces one tab with a space in
hclge_dbg_dump_reg_common().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: modify some unsuitable type declaration

In hclge_set_fd_key_config(), the parameter 'stage' should be of type
enum HCLGE_FD_STAGE, and in hclge_config_key(), 'tuple_size'
should be of type u8. Also replace unsigned int with u32 for 'i'.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: remove two unused structures in hclge_cmd.h

struct hclge_mac_vlan_remove_cmd and hclge_mac_vlan_add_cmd are unused.
So remove them from hclge_cmd.h.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: remove useless proto_support field in struct hclge_fd_cfg

The proto_support field in struct hclge_fd_cfg records which protocols
the flow director table currently supports. It is unnecessary, since
checking which ones are unsupported is more efficient,
so this patch removes it.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: remove an unnecessary case 0 in hclge_fd_convert_tuple()

Since the default case already covers case 0, remove this
redundant case 0.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: split out hclge_get_fd_rule_info()

hclge_get_fd_rule_info() is bloated. This patch separates
it into several standalone functions for readability and
maintainability.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: split out hclge_fd_check_ether_tuple()

For readability and maintainability, this patch separates the
handling part of each flow type in hclge_fd_check_ether_tuple()
into standalone functions.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>