review.tizen.org Git - platform/kernel/linux-starfive.git/log

net/mlx5e: Return the allocated flow directly from __mlx5e_add_fdb_flow

This confusing construction confuses the compiler which can't see
that flow is initialized if !err:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function `mlx5e_configure_flower`
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2727:28: warning:
`flow` may be used uninitialized in this function
[-Wmaybe-uninitialized]

There is no reason for two function outputs, just return the
pointer directly and use ERR_PTR to encode a failure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Expand XPS cpumask to cover all online cpus

Currently we have one cpu in XPS cpumask per tx queue, this is good
enough for default configuration where there is a tx queue per cpu.
However, once configuration changes to use less tx queues, part of the
cpus are not XPS-mapped and so the select queue decision falls back to
hash calculation and balancing is not guaranteed.

Expand XPS cpumask to enable using all cpus even when number of tx
queues is smaller than number of cpus.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Take CQ decompress fields into a separate structure

Only the Receive CQ makes use of these fields. Take them
out into a separate struct and save space in the generic
CQ structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: RX, Make sure packet header does not cross page boundary

In the non-linear SKB memory scheme of Striding RQ, a packet header
could cross page boundary. This requires special care in fast path
that costs LoC, additional runtime instructions and branches.

It could happen when the header (up to 256B) does not fit in
a single stride. Avoid this by working with a stride size that fits
the maximum possible header. Stride is increased form 64B to 256B.

Performance:
Tested packet rate for UDP streams, single ring, on ConnectX-5.

Configuration:
Set Striding RQ and LRO ON (to enabled the non-linear SKB scheme).
GRO OFF, early drop by TC rule.

64B: 4x worse memory utilization, no page-crossers headers
- No degradation (5,887,305 pps).
- The reduction in memory utilization is compensated by the saving of
branches tests.

192B: 1.33x worse memory utilization, avoid page-crossers headers
- Before: 5,727,252. After: 5,777,037. ~1% gain.

256B: Same memory util, no page-crossers
- Before: 5,691,885. After: 5,748,007. ~1% gain.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: Revert devlink health changes.

This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73b389470e1dc11061bb185ef4bc9aec
880ee82f0313453ec5a6cb122866ac057263066b
c7af343b4e33578b7de91786a3f639c8cfa0d97b
ff253fedab961b22117a73ab808fcfa9e6852b50
6f9d56132eb6d2603d4273cfc65bed914ec47acb
fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7
8a66704a13d9713593342e29b4f0c19762f5746b
12bd0dcefe88782ac1c9fff632958dd1b71d27e5
aba25279c10094c5c97d09c3491ca86d00b4ad5e
ce019faa70f81555fa17ebc1d5a03651f2e7e15a
b8c45a033acc607201588f7665ba84207e5149e0

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: remove duplicated include from br_multicast.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxfw: Replace license text with SPDX identifiers and adjust copyrights

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove dead code in struct tipc_topsrv

max_rcvbuf_size is no longer used since commit "414574a0af36".

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp_bbr-Improving-TCP-BBR-performance-for-WiFi-and-cellular-networks'

Priyaranjan Jha says:

====================
tcp_bbr: Improving TCP BBR performance for WiFi and cellular networks

Ack aggregation is quite prevalent with wifi, cellular and cable modem
link tchnologies, ACK decimation in middleboxes, and common offloading
techniques such as TSO and GRO, at end hosts. Previously, BBR was often
cwnd-limited in the presence of severe ACK aggregation, which resulted in
low throughput due to insufficient data in flight.

To achieve good throughput for wifi and other paths with aggregation, this
patch series implements an ACK aggregation estimator for BBR, which
estimates the maximum recent degree of ACK aggregation and adapts cwnd
based on it. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

(1) A preparatory patch, which refactors bbr_target_cwnd for generic
inflight provisioning.

(2) Implements BBR ack aggregation estimator and adapts cwnd based
on measured degree of ACK aggregation.
====================

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: adapt cwnd based on ack aggregation estimation

Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.

Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.

To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.

A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps

Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.

This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning

Because bbr_target_cwnd() is really a general-purpose BBR helper for
computing some volume of inflight data as a function of the estimated
BDP, refactor it into following helper functions:
- bbr_bdp()
- bbr_quantization_budget()
- bbr_inflight()

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: factor out PHY init sequence adjusting 10M and ALDPS

Few chip versions use the same sequence to adjust 10M and ALDPS, so
let's factor it out. This patch also fixes a (most likely) typo in
rtl8168g_1_hw_phy_config. There bit 8 in reg 0x14 on page 0x0bcc
was set and not cleared. According to the vendor driver this bit
needs to be cleared in all cases.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: factor out disabling ALDPS

Chip versions from RTL8168g onward use the same sequence to disable
ALDPS (Advanced Link-Down Power Saving). So let's factor this out.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: count master 3ad stats separately

I made a dumb mistake when I summed up the slave stats, obviously slaves
can come and go which would make the master stats unreliable.
Count and export the master stats separately.

Fixes: a258aeacd7f0 ("bonding: add support for xstats and export 3ad stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-improve-starting-PHY'

Heiner Kallweit says:

====================
net: phy: improve starting PHY

This patch series improves few aspects of starting the PHY.

v2:
- improve a warning in patch 4
v3:
- extend commit message for patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: change phy_start_interrupts to phy_request_interrupt

Now that we enable the interrupts in phy_start() we don't have to do it
before. Therefore remove enabling interrupts from phy_start_interrupts()
and rename this function to reflect the changed functionality.

v2:
- improve warning to clearly state that we fall back to polling

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: start interrupts in phy_start

Interrupts don't have to be enabled before calling phy_start().
Therefore let's enable them in phy_start(). In a subsequent step
we'll remove enabling interrupts from phy_connect_direct().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: warn if phy_start is called from invalid state

phy_start() should be called from states PHY_READY or PHY_HALTED only.
Check for this to detect misbehaving drivers. Also the state machine
should be started only when being called from one of the valid states.

Some more background:
For all invalid states phy_start() basically was a no-op. All it did
was triggering a state machine run, but for all "running" states the
poll loop was active anyway. And if called from PHY_DOWN, the state
machine does nothing.

v3:
- extended commit message

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: start state machine in phy_start only

The state machine is a no-op before phy_start() has been called.
Therefore let's enable it in phy_start() only. In phy_start()
let's call phy_start_machine() instead of phy_trigger_machine().
phy_start_machine is an alias for phy_trigger_machine but it makes
clearer that we start the state machine here instead of just
triggering a run.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Fix return value check in qcom_ethqos_probe()

In case of error, the function devm_clk_get() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().

Fixes: a7c30e62d4b8 ("net: stmmac: Add driver for Qualcomm ethqos")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: amd8111e: clean up two minor indentation issues

Two statements are incorrecly indented, fix these by removing a space.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ENETC'

Claudiu Manoil says:

====================
Introduce ENETC ethernet drivers

ENETC is a multi-port virtualized Ethernet controller supporting GbE
designs and Time-Sensitive Networking (TSN) functionality.
ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
Endpoint (RCIE). As such, it contains multiple physical (PF) and virtual
(VF) PCIe functions, discoverable by standard PCI Express.

The patch series adds basic enablement for these otherwise standard
buffer descriptor (BD) ring based ethernet devices (PCIe PFs and VFs),
currently included in the 64-bit dual ARMv8 processors LS1028A SoC.
The driver is portable to 32-bit designs, and it's independent of CPU
endianness.

Contributors:
Alex Marginean <alexandru.marginean@nxp.com>
Catalin Horghidan <catalin.horghidan@nxp.com>

TODO list:
* IEEE 1588 PTP support;
* TSN support;
* MDIO support and VF link management;
* power management support;
* flow control support;
* TC offloading with h/w MQPRIO;
* interrupt coalescing, configurable BD ring sizes, and other usual
config options if missing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: Add RFS and RSS support

A ternary match table is used for RFS. If multiple entries in the table
match, the entry with the lowest numerical values index is chosen as the
matching entry.  Entries in the table are identified using an index
which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the
PSI (PF).
Portions of the RFS table can be assigned to each SI by the PSI (PF)
driver in PSIaRFSCFGR.  Assignments are cumulative, the entries assigned
to SIn start after those assigned to SIn-1.  The total assignments to
all SIs must be equal to or less than the number available to the port
as found in PRFSCAPR.

For RSS, the Toeplitz hash function used requires two inputs, a 40B
random secret key that is supplied through the PRSSKR0-9 registers as well
as the relevant pieces of the packet header (n-tuple).  The 6 LSB bits of
the hash function result will then be used as a pointer to obtain the tag
referenced in the 64 entry indirection table.  The result will provide a
winning group which will be used to help route the received packet.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: Add vf to pf messaging support

VSIs (VFs) may send a message to the PSI (PF) for general notification
or to gain access to hardware resources which requires host inspection.
These messages may vary in size and are handled as a partition copy
between two memory regions owned by the respective participants.
The PSI will respond with fail or success and a 16-bit message code.
The patch implements the vf to pf messaging mechanism above and, as the
first application making use of this support, it enables the VF to
configure its own primary MAC address.

Signed-off-by: Catalin Horghidan <catalin.horghidan@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: Add ethtool statistics

This adds most h/w statistics counters: non-privileged SI conters, as
well as privileged Port and MAC counters available only to the PF.
Per ring software stats are also included.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: Introduce basic PF and VF ENETC ethernet drivers

ENETC is a multi-port virtualized Ethernet controller supporting GbE
designs and Time-Sensitive Networking (TSN) functionality.
ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
Endpoint (RCIE).  As such, it contains multiple physical (PF) and
virtual (VF) PCIe functions, discoverable by standard PCI Express.

Introduce basic PF and VF ENETC ethernet drivers.  The PF has access to
the ENETC Port registers and resources and makes the required privileged
configurations for the underlying VF devices.  Common functionality is
controlled through so called System Interface (SI) register blocks, PFs
and VFs own a SI each.  Though SI register blocks are almost identical,
there are a few privileged SI level controls that are accessible only to
PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI).
As such, the bulk of the code, including datapath processing, basic h/w
offload support and generic pci related configuration, is shared between
the 2 drivers and is factored out in common source files (i.e. enetc.c).

Major functionalities included (for both drivers):
MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs
to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and
jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G
support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and
initial control ring support, VLAN extraction/ insertion, PF Rx VLAN
CTAG filtering, VF mac address config support, VF VLAN isolation support,
etc.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: A write memory barrier is sufficient in EQ ci update

Soften the memory barrier call of mb() by a sufficient wmb() in the
consumer index update of the event queues.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve firmware handling

So far member rtl_fw has three states:
- IS_ERR(rtl_fw): firmware not loaded
- !rtl_fw: no firmware available
- other: firmware loaded

This can be made simpler and clearer by adding the firmware name as
member fw_name to struct rtl8169_private. Then:

- !fw_name: no firmware available
- !rtl_fw: firmware not loaded
- rtl_fw: firmware loaded

This change also allows to easily merge rtl_request_uncached_firmware
into rtl_request_firmware.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver

Change log:
V1->V2: fixes comment from Eric Dumazet
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix PFC not setting problem for DCB module

The PFC enabling is based on user priority, currently it is
based on TC, which may cause PFC not setting correctly when pri
to TC mapping is not one to one relation.

This patch adds pfc_en in tm_info to fix it.

Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add statistics for PFC frames and MAC control frames

In the old firmware version, statistics acquisition of
PFC frames and MAC control frames is not supported.
Add command retrieves statistics for PFC frames and
MAC control frames from the firmware.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add ETS TC weight setting in SSU module

This patch sets the TC weight in SSU module according to
info in tm_info.

Also, zero weight of TC weight in SSU ETS module means enabling
strict priority, so do not allow zero weight when in ETS mode.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: do not return GE PFC setting err when initializing

GE MAC does not support PFC, when driver is initializing and MAC
is in GE Mode, ignore the fw not supported error, otherwise
initialization will fail.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED

According to firmware error code definition, the error code of 2
means NOT_SUPPORTED, this patch changes it to NOT_SUPPORTED.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: clear param in ring when free ring

Param pending_buf and skb may be not NULL when free ring.
This patch clears them when free ring.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix bug of ethtool_ops.get_channels for VF

The current code returns the number of all queues that can be used and
the number of queues that have been allocated, which is incorrect.
What should be returned is the number of queues allocated for each enabled
TC and the number of queues that can be allocated.

This patch fixes it.

Fixes: 849e46077689 ("net: hns3: add ethtool_ops.get_channels support for VF")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix for shaper not setting when TC num changes

Shaper setting does not change currently, when TC num changes,
which may cause shaper parameter not setting problem.

This patch fixes it by setting the shaper parameter when TC num
changes.

Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix rss configuration lost problem when setting channel

Currently rss configuration set by user will be lost when setting
channel.

This patch fixes it by not setting rss configuration to default
if user has configured the rss.

Fixes: 09f2af6405b8 ("net: hns3: add support to modify tqps number")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor the statistics updating for netdev

In origin codes, there are some statistics item are got from mac, which
also include the packets statistics of VF. It is unreasonable. This
patch fixes it by counting them in the rx/tx processing flow.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add rx multicast packets statistic

This patch adds rx multicast packets statistic for each ring.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add calling roce callback function when link status change

This patch adds calling roce callback function when link status
change.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Add-VXLAN-support-for-Spectrum-2'

Ido Schimmel says:

====================
mlxsw: Add VXLAN support for Spectrum-2

This patchset adds support for VXLAN tunneling on the Spectrum-2 ASIC.
Spectrum-1 and Spectrum-2 are largely backward compatible in this area,
so not too many changes are required.

Patches #1-#2 expose a function and perform small refactoring towards
the actual Spectrum-2 implementation in patches #3-#4.

Patch #3 adds the required initialization steps on Spectrum-2.

Patch #4 finally enables VXLAN on Spectrum-2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_nve: Enable VXLAN on Spectrum-2

Enable VXLAN on Spectrum-2 as previous patches added the required
functionality.

Note that for now Spectrum-1 and Spectrum-2 use the same function to
determine whether the VXLAN configuration is valid or not. In the
future, when the driver will be extended to support features not present
in Spectrum-1, two different functions will be needed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_nve: Add support for VXLAN on Spectrum-2

Spectrum-1 and Spectrum-2 are largely backward compatible with regards
to VXLAN. One difference - as explained in previous patch - is that an
underlay RIF needs to be specified instead of an underlay VR during NVE
initialization. This is accomplished by calling the relevant function
that returns the index of such a RIF based on the table ID
(RT_TABLE_MAIN) where underlay look up occurs.

The second difference is that VXLAN learning (snooping) is controlled
via a different register (TNPC).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_nve: Breakout common code to a common function

The configuration of a VXLAN tunnel in Spectrum-1 and Spectrum-2 is
largely the same. To avoid code duplication, breakout the common parts
to a common function that can be invoked from the ASIC-specific code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Expose functions to create and destroy underlay RIF

In Spectrum-2, instead of providing the ID of the virtual router (VR)
where NVE underlay lookups will occur as in Spectrum-1, the ID of a
router interface (RIF) in this VR is required.

Expose functions to create and destroy such a RIF.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/mellanox/mlx4/eq.c: In function ‘mlx4_eq_int’:
drivers/net/ethernet/mellanox/mlx4/mlx4.h:219:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
  if (mlx4_debug_level)      \
     ^
drivers/net/ethernet/mellanox/mlx4/eq.c:558:4: note: in expansion of macro ‘mlx4_dbg’
    mlx4_dbg(dev, "%s: MLX4_EVENT_TYPE_SRQ_LIMIT. srq_no=0x%x, eq 0x%x\n",
    ^~~~~~~~
drivers/net/ethernet/mellanox/mlx4/eq.c:561:3: note: here
   case MLX4_EVENT_TYPE_SRQ_CATAS_ERROR:
   ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/qlogic/qed/qed_cxt.c:2126:4: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bna: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

drivers/net/ethernet/brocade/bna/bfa_ioc.c:790:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/brocade/bna/bfa_ioc.c:860:3: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

broadcom: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6336:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:2231:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/broadcom/tg3.c:722:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/broadcom/tg3.c:783:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 3c509: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

drivers/net/ethernet/3com/3c509.c:1265:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/ethernet/3com/3c509.c:1271:8: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Bump up driver version to 1.713.36

Recently, there were bunch of fixes to bnx2x driver, the code is now
aligned to out-of-box driver version 1.713.36. This patch updates
bnx2x driver version to 1.713.36.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Use DIV_ROUND_UP_ULL in DEVLINK_HEALTH_SIZE_TO_BUFFERS

When building this code on a 32-bit platform such as ARM, there is a
link time error (lld error shown, happpens with ld.bfd too):

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by devlink.c
>>> net/core/devlink.o:(devlink_health_buffers_create) in archive built-in.a

This happens when using a regular division symbol with a u64 dividend.
Use DIV_ROUND_UP_ULL, which wraps do_div, to avoid this situation.

Fixes: cb5ccfbe73b3 ("devlink: Add health buffer support")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add SDPX tag based on COPYING file

Some of the PHY and MDIO drivers refer to the COPYING file in the main
directory of this archive. This is the main license for Linux, thus
GPLv2 plus syscall extension.

Fixup the MODULE_LICENSE() where needed and add an SDPX header for
GPLv2.

Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Fixup GPLv2 SPDX tags based on license text

A few PHY drivers have the GPLv2 license text. They then either have
a MODULE_LICENSE() of GPLv2+, or an SPDX tag of GPLv2+.

Since the license text is much easier to understand than either the
SPDX tag or the MODULE_LICENSE, use it as the definitive source of the
licence, and fixup with others when there are contradictions.

Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Cc: Laurent Pinchart <laurentp@cse-semaphore.com>
Cc: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Cc: Scott Wood <scottwood@freescale.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Andrew F. Davis <afd@ti.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'SPDX-tags-for-PHY-and-MDIO-drivers'

Andrew Lunn says:

====================
SPDX tags for PHY and MDIO drivers

This patchset adds SPDX tags to files where the license information is
clear and consistent. It also removes redundent license text when an
SPDX header is present.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Remove redundent License text when SPDX header is present

The SPDX header makes any license text redundent. Remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Convert some PHY and MDIO driver files to SPDX headers

Where the license text and the MODULE_LICENSE() value agree, convert
to using an SPDX header, removing the license text.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/cxgb4vf: Link management changes

1) Speed should be supported by Physical Port Capabilities.
2) report Forward Error Correction mode which are available.
3) Added few comments.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'selftests-forwarding-Add-tests-for-VXLAN-routing'

Ido Schimmel says:

====================
selftests: forwarding: Add tests for VXLAN routing

VXLAN routing allows hosts in different overlay networks (i.e.,
different VNIs) to communicate with one another.

Two popular routing models are asymmetric and symmetric routing.

In asymmetric routing the ingress VTEP routes the packet into the
correct VXLAN tunnel, whereas the egress VTEP only bridges the packet to
the correct host. Therefore, packets in different directions use
different VNIs - the target VNI.

In symmetric routing both the ingress and egress VTEPs perform routing
in the overlay network into / from the VXLAN tunnel. Packets in
different directions use the same VNI - the L3 VNI. Different tenants
(VRFs) use different L3 VNIs.

Patch #1 adds a test for asymmetric routing. Patches #2-#3 reuse the
topology and add test cases for ARP decapsulation and suppression.

Patch #4 adds a test for symmetric routing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test case for ARP suppression

ARP suppression allows the Linux bridge to answer ARP requests on behalf
of remote hosts. It reduces the amount of packets a VTEP needs to flood.

This test verifies that ARP suppression on / off works when a neighbour
exists and when it does not exist. It does so by sending an ARP request
from a host connected to one VTEP and checking whether it was received
by a second VTEP.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for VXLAN symmetric routing

In a similar fashion to the asymmetric test, add a test for symmetric
routing. In symmetric routing both the ingress and egress VTEPs perform
routing in the overlay network into / from the VXLAN tunnel. Packets in
different directions use the same VNI - the L3 VNI. Different tenants
(VRFs) use different L3 VNIs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test case for ARP decapsulation

Verify that ARP packets are correctly decapsulated by the ingress VTEP
by removing the neighbours configured on both VLAN interfaces and
running a ping test.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for VXLAN asymmetric routing

In asymmetric routing the ingress VTEP routes the packet into the
correct VXLAN tunnel, whereas the egress VTEP only bridges the packet to
the correct host. Therefore, packets in different directions use
different VNIs - the target VNI.

The test uses a simple topology with two VTEPs and two VNIs and verifies
that ping passes between hosts (local / remote) in the same VLAN (VNI)
and in different VLANs belonging to the same tenant (VRF).

While the test does not check VM mobility, it does configure an anycast
gateway using a macvlan device on both VTEPs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ptp_qoriq'

Yangbo Lu says:

====================
External trigger stamp fifo support for ptp_qoriq

This patch-set is to add external trigger stamp fifo support by a new
binding "fsl,extts-fifo", and to add fiper pulse loopback support which
is very useful for validating trigger without external hardware.
Also fixed issues in interrupt enabling/handling.

"fsl,extts-fifo" is required to be added into 1588 timer dts node whose
hardware supports it. The work will be done for some QorIQ platforms dts in
the near future.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add drivers/ptp/ptp_qoriq_debugfs.c into QorIQ PTP list

Added drivers/ptp/ptp_qoriq_debugfs.c into QorIQ PTP clock driver list.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: add debugfs support for ptp_qoriq

This patch is to add debugfs support for ptp_qoriq. Current debugfs
supports to control fiper1/fiper2 loopback mode. If the loopback mode
is enabled, the fiper1/fiper2 pulse is looped back into trigger1/
trigger2 input. This is very useful for validating hardware and driver
without external hardware. Below is an example to enable fiper1 loopback.

echo 1 > /sys/kernel/debug/2d10e00.ptp_clock/fiper1-loopback

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: ls1021a: add 1588 external trigger stamp fifo support

This patch is to add external trigger stamp fifo support
for 1588 timer.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-binding: ptp_qoriq: document "fsl,extts-fifo" property

Documented "fsl,extts-fifo" property.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp_qoriq: support external trigger stamp FIFO

The external trigger stamp FIFO was introduced as a new feature
for QorIQ 1588 timer IP block. This patch is to support it by
adding a new dts property "fsl,extts-fifo". Any QorIQ 1588 timer
supporting this feature is required to add this property in its
dts node.

In addition, the FIFO should be cleaned up before enabling external
trigger interrupts. Otherwise, there will be interrupts immediately
just after enabling external trigger interrupts.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp_qoriq: fix interrupt enabling and handling

The tmr_tevent register would update event bits
no matter tmr_temask bits were set or not. So we
should get interrupts by tmr_tevent & tmr_temask,
and clean up interrupts in tmr_tevent before
enabling them.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add missing check of nlmsg_put

nlmsg_put may fail, this fix add a check of its return value.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Qualcomm-ethqos'

Vinod Koul says:

====================
net: Add support for Qualcomm ethqos

Some Qualcomm SoCs sport a ethqos controller which use DW ip, so add
the glue driver which uses stmmac driver along with DT bindings for
this device.

This controller supports rgmii mode and doesn't work with existing
phy drivers as they do not remove the phy delay delay in this mode,
so fix the two phy drivers tested with this.

Changes in v3:
- Add description in DT and rename the file and compatible as suggested by
Rob
- Update changelog for QCA8K driver
- Update AT803x phy disable delay for all RGMxx modes

Changes in v2:
- Fix the example in dt-binding
- Remove DT property for disable the delay and disable delay for RGMII mode
in AT803x and QCA8K PHY drivers
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: disable delay for RGMII mode

In RGMII mode we should not have any delay in port MAC, so disable
the delay.

Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: Disable phy delay for RGMII mode

For RGMII mode, phy delay should be disabled. Add this case along
with disable delay routines.

Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINER: Add entry for Qualcomm ETHQOS ethernet driver

Add myself and Niklas as maintainers for this driver

Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Add driver for Qualcomm ethqos

Add glue driver to support Qualcomm ETHQOS using stmmac driver.

This is based on downstream driver written by Siddarth Gupta, Sunil
Kumar Paidimarri, Rahul Ankushrao Kawadgave, Nisha Menon, Jagadeesh
Babu Challagundla, Chaitanya Pratapa, Lakshit Tyagi, Suraj Jaiswal,
Sneh Shah and Ventrapragada Ravi Kanth

Co-developed-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: Add Qualcomm ethqos binding

Add support for Qualcomm ethqos found in some SoCs like QCS404.

Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: implement the SIOCGHWTSTAMP ioctl

This patch adds support for the SIOCGHWTSTAMP ioctl which enables user
processes to read the current hwtstamp_config settings
non-destructively.

Signed-off-by: Artem Panfilov <panfilov.artyom@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-mrd'

Linus Lüssing says:

====================
bridge: implement Multicast Router Discovery (RFC4286)

This patchset adds initial Multicast Router Discovery support to
the Linux bridge (RFC4286). With MRD it is possible to detect multicast
routers and mark bridge ports and forward multicast packets to such routers
accordingly.

So far, multicast routers are detected via IGMP/MLD queries and PIM
messages in the Linux bridge. As there is only one active, selected
querier at a time RFC4541 ("Considerations for Internet Group Management
Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping
Switches") section 2.1.1.a) recommends snooping Multicast Router
Advertisements as provided by MRD (RFC4286).

The first two patches are refactoring some existing code which is reused
for parsing the Multicast Router Advertisements later in the fourth
patch. The third patch lets the bridge join the all-snoopers multicast
address to be able to reliably receive the Multicast Router
Advertisements.

What is not implemented yet from RFC4286 yet:

* Sending Multicast Router Solicitations:
  -> RFC4286: "[...] may be sent when [...] an interface is
     (re-)initialized [or] MRD is enabled"
* Snooping Multicast Router Terminations:
  -> currently this only relies on our own timeouts
* Adjusting timeouts with the values provided in the announcements

Changes in v2:

* rebased to current net-next/master (no conflicts/changes)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Snoop Multicast Router Advertisements

When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.

To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.

This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: join all-snoopers multicast address

Next to snooping IGMP/MLD queries RFC4541, section 2.1.1.a) recommends
to snoop multicast router advertisements to detect multicast routers.

Multicast router advertisements are sent to an "all-snoopers"
multicast address. To be able to receive them reliably, we need to
join this group.

Otherwise other snooping switches might refrain from forwarding these
advertisements to us.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() internals

With this patch the internal use of the skb_trimmed is reduced to
the ICMPv6/IGMP checksum verification. And for the length checks
the newly introduced helper functions are used instead of calculating
and checking with skb->len directly.

These changes should hopefully make it easier to verify that length
checks are performed properly.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls

This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.

An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:

1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
beyond the IP packet but still within the skb.

The first case still uses a cloned and potentially trimmed skb to
verfiy. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.

This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verfiy, as well as easier to use.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

hinic: Add pci device ids

This patch adds PCI device IDs to support following cards:

1. Add device id 0x0205 for HINIC 100GE dual port mezz card.
2. Add device id 0x0210 for HINIC 25GE quad port mezz card.
3. Delete device id 0x0201 for HINIC 100GE dual port card, because
this is used by other product.
4. Macro of device id 0x200 is modified for HINIC 100GE dual port card.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove PCI DAC support

The old non-PCIe chip versions support PCI DAC, however this feature
seems to be fragile, see comment in the PCI error handler. Therefore
it's disabled per default. I think meanwhile it's time remove support
for this legacy feature. This helps to reduce complexity of the driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve rx buffer allocation

8 years ago, as part of 6f0333b8fde4 ("r8169: use 50% less ram for RX
ring"), the alignment requirement for rx buffers was silently changed
from 8 bytes to 16 bytes. I found nothing explaining this, also the
chip specs I have only mention an 8 byte requirement.
AFAICS kmalloc_node() guarantees allocated memory to be at least
"long long" aligned, what is 8 bytes on a 32 bit machine.
So we can take this memory as-is and avoid some overhead by changing
the alignment requirement back to 8 bytes.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: improve phy_init_hw

Currently the soft reset (if defined) is done only if the driver also
implements the config_init callback. I think this dependency is a
mistake, so let's remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix issue with loading PHY driver w/o initramfs

It was reported that on a system with nfsboot and w/o initramfs network
fails because trying to load the PHY driver returns -ENOENT. Reason was
that due to missing initramfs the modprobe binary isn't available.
So we have to ignore error code -ENOENT.

Fixes: 13d0ab6750b2 ("net: phy: check return code when requesting PHY driver module")
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-01-22

This series contains updates to i40e and xsk.

Jan exports xdp_get_umem_from_qid() for other drivers/modules to use.
Refactored the code use the netdev provided umems, instead of containing
them inside our i40e_vsi.

Aleksandr fixes an issue where RSS queues were misconfigured, so limit
the RSS queue number to the online CPU number.

Damian adds support for ethtool's setting and getting the FEC
configuration.

Grzegorz fixes a type mismatch, where the return value was not matching
the function declaration.

Sergey adds checks in the queue configuration handler to ensure the
number of queue pairs requested by the VF is less than maximum possible.

Lihong cleans up code left around from earlier silicon validation in the
i40e debugfs code.

Julia Lawall and Colin Ian King clean up white space indentation issues
found.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding-add-3ad-stats-and-export-them-via-xstats'

Nikolay Aleksandrov says:

====================
bonding: add 3ad stats and export them via xstats

This set adds support for counting some 3ad-specific packet types and
exports the new stats via the xstats API. atomic64 counters are used
since these are not fastpaths and we can avoid the per-cpu allocations.
Each 3ad counter is exported as a separate attribute to be easily
extensible since we plan to add more later. The stats are per-slave and
when the master stats are requested the slaves' stats are summed up.
Patches 01 and 02 do minor cleanups in preparation for the new stats
API. Patch 03 adds the new stats and patch 04 adds xstats support to
export them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add support for xstats and export 3ad stats

This patch adds support for extended statistics (xstats) call to the
bonding. The first user would be the 3ad code which counts the following
events:
- LACPDU Rx/Tx
- LACPDU unknown type Rx
- LACPDU illegal Rx
- Marker Rx/Tx
- Marker response Rx/Tx
- Marker unknown type Rx

All of these are exported via netlink as separate attributes to be
easily extensible as we plan to add more in the future.
Similar to how the bridge and other xstats exports, the structure
inside is:
[ IFLA_STATS_LINK_XSTATS ]
   -> [ LINK_XSTATS_TYPE_BOND ]
        -> [ BOND_XSTATS_3AD ]
             -> [ 3ad stats attributes ]

With this structure it's easy to add more stat types later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add 3ad stats

Count the following types of 3ad packets per slave:
- rx/tx lacpdu
- rx/tx marker
- rx/tx marker response
- rx illegal lacpdus (right now counted on wrong length)
- rx unknown lacpdu type
- rx unknown marker type

The counters are using atomic64 since this is not fast path.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: remove bond_3ad_rx_indication's length argument

Since the received lacpdu is accessed via skb_header_pointer() in
bond_3ad_lacpdu_recv() we no longer need to check for skb->len's length.
If the returned lacpdu pointer is not null that should be enough.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: adjust style of bond_3ad_rx_indication

No functional changes, adjust the style of bond_3ad_rx_indication to
prepare it for the stats changes:
- reduce indentation by returning early on wrong length
- remove extra new lines between switch cases
- add marker local variable and use it to reduce line length
- rearrange local variables in reverse xmas tree
- separate final return

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: TLS record offload enable

Enable Inline TLS record by default

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: free ctx in sock destruct

free tls context in sock destruct. close may not be the last
call to free sock but force releasing the ctx in close
will result in GPF when ctx referred again in tcp_done

[  515.330477] general protection fault: 0000 [#1] SMP PTI
[  515.330539] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc7+ #10
[  515.330657] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0b
11/07/2013
[  515.330844] RIP: 0010:tls_hw_unhash+0xbf/0xd0
[
[  515.332220] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.332340] CR2: 00007fab32c55000 CR3: 000000009261e000 CR4:
00000000000006e0
[  515.332519] Call Trace:
[  515.332632]  <IRQ>
[  515.332793]  tcp_set_state+0x5a/0x190
[  515.332907]  ? tcp_update_metrics+0xe3/0x350
[  515.333023]  tcp_done+0x31/0xd0
[  515.333130]  tcp_rcv_state_process+0xc27/0x111a
[  515.333242]  ? __lock_is_held+0x4f/0x90
[  515.333350]  ? tcp_v4_do_rcv+0xaf/0x1e0
[  515.333456]  tcp_v4_do_rcv+0xaf/0x1e0

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: build_protos moved to common routine

build protos is required for tls_hw_prot also hence moved to
'tls_build_proto' and called as required from tls_init
and tls_hw_proto. This is required since build_protos
for v4 is moved from tls_register to tls_init in
commit <28cb6f1eaffdc5a6a9707cac55f4a43aa3fd7895>

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce a knob to control whether to inherit devconf config

There have been many people complaining about the inconsistent
behaviors of IPv4 and IPv6 devconf when creating new network
namespaces. Currently, for IPv4, we inherit all current settings
from init_net, but for IPv6 we reset all setting to default.

This patch introduces a new /proc file
/proc/sys/net/core/devconf_inherit_init_net to control the
behavior of whether to inhert sysctl current settings from init_net.
This file itself is only available in init_net.

As demonstrated below:

Initial setup in init_net:
# cat /proc/sys/net/ipv4/conf/all/rp_filter
2
# cat /proc/sys/net/ipv6/conf/all/accept_dad
1

Default value 0 (current behavior):
# ip netns del test
# ip netns add test
# ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
2
# ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
0

Set to 1 (inherit from init_net):
# echo 1 > /proc/sys/net/core/devconf_inherit_init_net
# ip netns del test
# ip netns add test
# ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
2
# ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
1

Set to 2 (reset to default):
# echo 2 > /proc/sys/net/core/devconf_inherit_init_net
# ip netns del test
# ip netns add test
# ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
0
# ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
0

Set to a value out of range (invalid):
# echo 3 > /proc/sys/net/core/devconf_inherit_init_net
-bash: echo: write error: Invalid argument
# echo -1 > /proc/sys/net/core/devconf_inherit_init_net
-bash: echo: write error: Invalid argument

Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>