review.tizen.org Git - platform/kernel/linux-rpi.git/log

net: phy: avoid setting unsupported EEE advertisments

We currently allow userspace to set any EEE advertisments it desires,
whether or not the PHY supports them.  For example:

# ethtool --set-eee eth1 advertise 0xffffffff
# ethtool --show-eee eth1
EEE Settings for eth1:
        EEE status: disabled
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
                                   10000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
                                    1000baseKX/Full
                                    10000baseT/Full
                                    10000baseKX4/Full
                                    10000baseKR/Full

Clearly, this is not sane, we should only allow link modes that are
supported to be advertised (as we do elsewhere.)  Ensure that we mask
the MDIO_AN_EEE_ADV value with the capabilities retrieved from the
MDIO_PCS_EEE_ABLE register.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-prog-testing-framework'

Alexei Starovoitov says:

====================
bpf: program testing framework

Development and testing of networking bpf programs is quite cumbersome.
Especially tricky are XDP programs that attach to real netdevices and
program development feels like working on the car engine while
the car is in motion.
Another problem is ongoing changes to upstream llvm core
that can introduce an optimization that verifier will not
recognize. llvm bpf backend tests have no ability to run the programs.
To improve this situation introduce BPF_PROG_TEST_RUN command
to test and performance benchmark bpf programs.
It achieves several goals:
- development of xdp and skb based bpf programs can be done
in a canned environment with unit tests
- program performance optimizations can be benchmarked outside of
networking core (without driver and skb costs)
- continuous testing of upstream changes is finally practical

Patches 4,5,6 add C based test cases of various complexity
to cover some sched_cls and xdp features. More tests will
be added in the future. The tests were run on centos7 only.

For now the framework supports only skb and xdp programs. In the future
it can be extended to socket_filter and tracing program types.

More details are in individual patches.

v1->v2:
- rename bpf_program_test_run->bpf_prog_test_run
- add missing #include <linux/bpf.h> since libbpf.h shouldn't depend
on prior includes
- reordered patches 3 and 4 to keep bisect clean
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add l4 load balancer test based on sched_cls

this l4lb demo is a comprehensive test case for LLVM codegen and
kernel verifier. It's using fully inlined jhash(), complex packet
parsing and multiple map lookups of different types to stress
llvm and verifier.
The map sizes, map population and test vectors are artificial to
exercise different paths through the bpf program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add a test for basic XDP functionality

add C test for xdp_adjust_head(), packet rewrite and map lookups

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add a test for overlapping packet range checks

add simple C test case for llvm and verifier range check fix from
commit b1977682a385 ("bpf: improve verifier packet range checks")

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools/lib/bpf: expose bpf_program__set_type()

expose bpf_program__set_type() to set program type

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools/lib/bpf: add support for BPF_PROG_TEST_RUN command

add support for BPF_PROG_TEST_RUN command to libbpf.a

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: introduce BPF_PROG_TEST_RUN command

development and testing of networking bpf programs is quite cumbersome.
Despite availability of user space bpf interpreters the kernel is
the ultimate authority and execution environment.
Current test frameworks for TC include creation of netns, veth,
qdiscs and use of various packet generators just to test functionality
of a bpf program. XDP testing is even more complicated, since
qemu needs to be started with gro/gso disabled and precise queue
configuration, transferring of xdp program from host into guest,
attaching to virtio/eth0 and generating traffic from the host
while capturing the results from the guest.

Moreover analyzing performance bottlenecks in XDP program is
impossible in virtio environment, since cost of running the program
is tiny comparing to the overhead of virtio packet processing,
so performance testing can only be done on physical nic
with another server generating traffic.

Furthermore ongoing changes to user space control plane of production
applications cannot be run on the test servers leaving bpf programs
stubbed out for testing.

Last but not least, the upstream llvm changes are validated by the bpf
backend testsuite which has no ability to test the code generated.

To improve this situation introduce BPF_PROG_TEST_RUN command
to test and performance benchmark bpf programs.

Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Mock-up driver

This patch adds support for a DSA mock-up driver which essentially does
the following:

- registers/unregisters 4 fixed PHYs to the slave network devices
- uses eth0 (configurable) as the master netdev
- registers the switch as a fixed MDIO device against the fixed MDIO bus
at address 31
- includes dynamic debug prints for dsa_switch_ops functions that can be
enabled to get call traces

This is a good way to test modular builds as well as exercise the DSA
APIs without requiring access to real hardware. This does not test the
data-path, although this could be added later on.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-cross-chip-bridging'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: program cross-chip bridging

The purpose of this patch series is to bring hardware cross-chip
bridging configuration to the DSA layer and the mv88e6xxx DSA driver.

Most recent Marvell switch chips have a Cross-chip Port Based VLAN Table
(PVT) used to restrict to which internal destination port an arbitrary
external source port is allowed to egress frames to.

The current behavior of the mv88e6xxx driver is to program this table
table with all ones, allowing any external ports to egress frames on any
internal ports. This means that carefully crafted Ethernet frames can
potentially bypass the user bridging configuration.

Patches 1 to 7 prepare the setup of this table and factorize the common
bits of both in-chip and cross-chip Marvell bridging code.

Patch 8 adds new optional cross-chip bridging operations to DSA switch.

Patch 9 switches the current behavior to program the table according to
the user bridging configuration when (cross-chip) ports get (un)bridged.

On a ZII Rev B board, bridging together the 3 user ports of both 88E6352
will result in the following PVTs on respectively switch 0 and switch 1:

    External   Internal Ports
    Dev Port   0  1  2  3  4  5  6

     1    0    *  *  *  -  -  *  *
     1    1    *  *  *  -  -  *  *
     1    2    *  *  *  -  -  *  *
     1    3    -  -  -  -  -  *  *
     1    4    -  -  -  -  -  *  *
     1    5    *  *  *  *  *  *  *
     1    6    *  *  *  *  *  *  *

     0    0    *  *  *  -  -  *  *
     0    1    *  *  *  -  -  *  *
     0    2    *  *  *  -  -  *  *
     0    3    -  -  -  -  -  *  *
     0    4    -  -  -  -  -  *  *
     0    5    *  *  *  *  *  *  *
     0    6    *  *  *  *  *  *  *

Changes since v2:
  - Define MV88E6XXX_MAX_PVT_SWITCHES and MV88E6XXX_MAX_PVT_PORTS
  - use mv88e6xxx_g2_misc_4_bit_port instead of the 5-bit variant
  - add Andrew's tags and reword commit 6/9
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add cross-chip bridging

Implement the DSA cross-chip bridging operations by remapping the local
ports an external source port can egress frames to, when this cross-chip
port joins or leaves a bridge.

The PVT is no longer configured with all ones allowing any external
frame to egress any local port. Only DSA and CPU ports, as well as
bridge group members, can egress frames on local ports.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add cross-chip bridging operations

Introduce crosschip_bridge_{join,leave} operations in the dsa_switch_ops
structure, which can be used by switches supporting interconnection.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: remap existing bridge members

When a local port of a switch chip becomes a member of a bridge group,
we need to reprogram the Cross-chip Port Based VLAN Table (PVT) to allow
existing cross-chip bridge members to egress frames on the new ports.

There is no functional changes yet, since the PVT is still programmed
with all ones, allowing any external port to egress frames locally.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: factorize in-chip bridge map

Factorize the code in the DSA port_bridge_{join,leave} routines used to
program the port VLAN map of all local ports of a given bridge group.

At the same time shorten the _mv88e6xxx_port_based_vlan_map to get rid
of the old underscore prefix naming convention.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: rework in-chip bridging

All ports -- internal and external, for chips featuring a PVT -- have a
mask restricting to which internal ports a frame is allowed to egress.

Now that DSA exposes the number of ports and their bridge devices, it is
possible to extract the code generating the VLAN map and make it generic
so that it can be shared later with the cross-chip bridging code.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: allocate the number of ports

The current code allocates DSA_MAX_PORTS ports for a Marvell dsa_switch
structure. Provide the exact number of ports so the corresponding
ds->num_ports is accurate.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: program the PVT with all ones

The Cross-chip Port Based VLAN Table (PVT) is currently initialized with
all ones, allowing any external ports to egress frames on local ports.

This commit implements the PVT access functions and programs the PVT
with all ones for the local switch ports only, instead of using the Init
operation. The current behavior is unchanged for the moment.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: use 4-bit port for PVT data

The Cross-chip Port Based VLAN Table (PVT) supports two indexing modes,
one using 5-bit for device and 4-bit for port, the other using 4-bit for
device and 5-bit for port, configured via the Global 2 Misc register.

Only 4 bits for the source port are needed when interconnecting 88E6xxx
switch devices since they all support less than 16 physical ports. The
full 5 bits are needed when interconnecting a device with 98DXxxx switch
devices since they support more than 16 physical ports.

Add a mv88e6xxx_pvt_setup helper to set the 4-bit port PVT mode, which
will be extended later to also initialize the PVT content.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: move PVT description in info

Not all Marvell switch chips feature a Cross-chip Port VLAN Table (PVT).

Chips with a PVT use the same implementation, so a new mv88e6xxx_ops
member won't be necessary yet. Add a "pvt" boolean member to the
mv88e6xxx_info structure and kill the obsolete MV88E6XXX_FLAGS_PVT flag.

Add a mv88e6xxx_has_pvt helper to wrap future checks of that condition.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa_eth: use AVOIDBLOCK for Tx confirmation queues

The AVOIDBLOCK flag determines the Tx confirmation queues processing
to be redirected to any available CPU when the current one is slow
in processing them. This may result in a higher Tx confirmation
interrupt count but may reduce pressure on a certain CPU that with
the previous setting would process all Tx confirmation frames.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: take into account all RGMII modes

Accept the internal delay RGMII variants.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: vxlan dev should inherit lowerdev's gso_max_size

vxlan dev currently ignores lowerdev's gso_max_size, which adversely
affects TSO performance of liquidio if it's the lowerdev.  Egress TCP
packets' skb->len often exceed liquidio's advertised gso_max_size.  This
may happen on other NIC drivers.

Fix it by assigning lowerdev's gso_max_size to that of vxlan dev.  Might as
well do likewise for gso_max_segs.

Single flow TSO throughput of liquidio as lowerdev (using iperf3):

    Before the patch:    139 Mbps
    After the patch :   8.68 Gbps
    Percent increase:  6,144 %

Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: avoid dirtying sk_stamp, if possible

sock_recv_ts_and_drops() unconditionally set sk->sk_stamp for
every packet, even if the SOCK_TIMESTAMP flag is not set in the
related socket.
If selinux is enabled, this cause a cache miss for every packet
since sk->sk_stamp and sk->sk_security share the same cacheline.
With this change sk_stamp is set only if the SOCK_TIMESTAMP
flag is set, and is cleared for the first packet, so that the user
perceived behavior is unchanged.

This gives up to 5% speed-up under udp-flood with small packets.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ibmvnic-cleanup-resource-handling'

Nathan Fontenot says:

====================
ibmvnic: Cleanup resource handling

In order to better manage the resources of the ibmvnic driver, this set of
patches creates a set of initialization and release routines for the
drivers resources. Additionally, some patches do some re-naming of the
affected routines so that there is a common naming scheme in the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Cleanup failure path in ibmvnic_open

Now that ibmvnic_release_resources will clean up all of our resources
properly, even if they were not allocated, we can just call this
for failues in ibmvnic_open.

This patch also moves the ibmvnic_release_resources() routine up
in the file to avoid creating a forward declaration ad re-names it to
drop the ibmvnic prefix.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Create init/release routines for stats token

Create an initialization and a release routine for the stats token used by
the ibmvnic driver.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Merge the two release_sub_crq_queue routines

Keeping two routines for releasing sub crqs, one for when irqs are not
initialized and one for when they are, is a bit of overkill. Merge the
two routines to a common release routine that will check for an irq
and release it if needed.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Create init and release routines for the rx pool

Move the initialization and the release of the rx pool to their own
routines, and update them to do validation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Create init and release routines for the tx pool

Move the initialization and the release of the tx pool to their own routines,
and update them to do validation. This also adds validation to the release
of the long term buffer.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Create init and release routines for the bounce buffer

Move the handling of initialization and releasing the bounce buffer to their
own init and release routines.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Update main crq initialization and release

Update the initialization and release routines for the crq queue so that
we validate the crq queue.

Additionally this updates the naming of the init and release routines
for the crq queue to drop the ibmvnic prefix. This matches the naming
for similar routines in the driver

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tcp: Refine the __tcp_select_window

1. Move the "window = tp->rcv_wnd;" into the condition block without
tp->rx_opt.rcv_wscale.
Because it is unnecessary when enable wscale;

2. Use the macro ALIGN instead of two statements.
The two statements are used to make window align to 1<<wscale.
Use the ALIGN is more clearer.

3. Use the rounddown to make codes clearer.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: debug ATU Age Time

The ATU ageing time value programmed in the switch is rounded up to the
nearest multiple of its coefficient (variable depending on the model.)

Add a debug message to inform the user about the exact programmed value.

On 6352, "brctl setageing br0 18" gives "AgeTime set to 0x01 (15000 ms)"
while on 6390 we get "AgeTime set to 0x05 (18750 ms)".

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Remove debugfs support

The debugfs support in the ibmvnic driver is not, and never has been,
supported. Just remove it.

The work done in the debugfs code for the driver was part of the original
spec for the ibmvnic driver. The corresponding support for this from the
server side was never supported and has been dropped.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: refine bond_fold_stats() wrap detection

Some device drivers reset their stats at down/up events, possibly
fooling bonding stats, since they operate with relative deltas.

It is nearly not possible to fix drivers, since some of them compute the
tx/rx counters based on per rx/tx queue stats, and the queues can be
reconfigured (ethtool -L) between the down/up sequence.

Lets avoid accumulating 'negative' values that render bonding stats
useless.

It is better to lose small deltas, assuming the bonding stats are
fetched at a reasonable frequency.

Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: remove unnecessary ternary operator on return value

Rather than assign the positive errno values to ret and then
checking if it is positive and flip the sign, just return the
errno value.

Detected by CoverityScan, CID#986649 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: add explicit interrupt.h includes

These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed so include that
header where needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-rk: Add handling for RGMII_ID/RXID/TXID

ATM dwmac-rk will always set and enable it's internal delay lines.
Using PHY internal delays in combination with the phy-mode
rgmii-id/rxid/txid was not possible. Only rgmii was supported.

Now we can disable rockchip's gmac delay lines and also use
rgmii-id/rxid/txid.

Tested only with a RK3288 based board.

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: stmmac: enable multiple buffers"

The commit aff3d9eff843 ("net: stmmac: enable multiple buffers") breaks
numerous boards. while some patch exists for fixing some of it,
dwmac-sunxi is still broken with it.
Since this patch is very huge, it will be better to split it in smaller
part.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-03-29

This series contains updates to i40e and i40evf only.

Preethi changes the default driver mode of operation to descriptor
write-back for VF.

Alex cleans up and addresses several issues in the way that i40e handles
private flags.  Modifies the driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed.  Refactors the driver by pulling the code responsible
for fetching the receive buffer and synchronizing DMA into a single
function.  Also pulled the code responsible for handling buffer
recycling and page counting and distributed it through several functions,
so we can commonize the bits that handle either freeing or recycling the
buffers.  Cleans up the code in preparation for us adding support for
build_skb().  Changed the way we handle the maximum frame size for the
receive path so it is more consistent with other drivers.

Paul enables XL722 to use the direct read/write method since it does not
support the AQ command to read/write the control register.

Christopher fixes a case where we miss an arq element if a new one is
added before we enable interrupts and exit the loop.

Jake cleans up a pointless goto statement.  Also cleaned up a flag that
was not being used.

Carolyn does round 2 for adding a delay to the receive queue to
accommodate the hardware needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-socketpair'

Parthasarathy Bhuvaragan says:

====================
tipc: add socketpair support

We add socketpair support for connection oriented sockets in
the first patch and for connection less in the second.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: allow rdm/dgram socketpairs

for socketpairs using connectionless transport, we cache
the respective node local TIPC portid to use in subsequent
calls to send() in the socket's private data.

Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add support for stream/seqpacket socketpairs

sockets A and B are connected back-to-back, similar to what
AF_UNIX does.

Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: set rx mode during resume if interface is running

I found a bug by:

0. boot and start dhcp client
1. echo mem > /sys/power/state
2. resume back immediately
3. don't touch dhcp client to renew the lease
4. ping the gateway. No acks

Usually, after step2, the DHCP lease isn't expired, so in theory we
should resume all back. But in fact, it doesn't. It turns out
the rx mode isn't resumed correctly. This patch fixes it by adding
mvneta_set_rx_mode(dev) in the resume hook if interface is running.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: add RGMII_RXID and RGMII_TXID support

RGMII_RXID and RGMII_TX_ID share the same GMAC CTRL setting as RGMII
or RGMII_ID.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: veth: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5e-pedit' of git://git./linux/kernel/git/saeed/linux

Or Gerlitz says:

====================
mlx5e-pedit 2017-03-28

This series adds support for offloading modifications of packet headers using
ConnectX-5 HW header re-write as an action applied during packet steering.

The offloaded SW mechanism is TC's pedit action. The offloading is
supported for E-Switch steering of VF traffic in the SRIOV
switchdev mode and for NIC (non eswitch) RX.

One use-case for this offload on virtual networks, is when the hypervisor
implements flow based router such as Open-Stack's DVR, where L2 headers
of guest packets re-written with routers' MAC addresses and the IP TTL
is decremented.

Another use case (which can be applied in parallel with routing) is
stateless NAT where guest L3/L4 headers are re-written.

The series is built as follows: the 1st six patches are preperations which
don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
and commands) for header re-write, and patch nine allows offloading driver
to access pedit keys.

The 10th patch is somehow the core of the series, where we translate from
the pedit way to represent set of header modification elements to the FW
API for that same matter.

Once a set of HW modification is established, we register it with the FW
and get a modify header ID. When this ID is used with an action during
packet steering, the HW applies the header modification on the packet.

Patches 11 and 12 implement the above logic as an offload for pedit action
for the NIC and E-Switch use-cases.

I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing
and helping me testing this functionality on HW simulator, before it could
be done with FW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mpls: Update lfib_nlmsg_size to skip deleted nexthops

A recent commit skips nexthops in a route if the device has been
deleted. Update lfib_nlmsg_size accordingly.

Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Allow building mdio-boardinfo into the kernel

mdio-boardinfo contains code that is helpful for platforms to register
specific MDIO bus devices independent of how CONFIG_MDIO_DEVICE or
CONFIG_PHYLIB will be selected (modular or built-in). In order to make
that possible, let's do the following:

- descend into drivers/net/phy/ unconditionally

- make mdiobus_setup_mdiodev_from_board_info() take a callback argument
  which allows us not to expose the internal MDIO board info list and
  mutex, yet maintain the logic within the same file

- relocate the code that creates a MDIO device into
  drivers/net/phy/mdio_bus.c

- build mdio-boardinfo.o into the kernel as soon as MDIO_DEVICE is
  defined (y or m)

Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Fixes: 648ea0134069 ("net: phy: Allow pre-declaration of MDIO devices")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix for queue timing delays

This patch adds a delay to Rx queue disables to accommodate HW needs.

v2: Added missing check for disable only, additional details on the
need for the ugly delay and fixed spacing on comment.

Change-ID: I2864ca667ce5dcc2cc44f8718113b719742a46a1
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Change the way we limit the maximum frame size for Rx

This patch changes the way we handle the maximum frame size for the Rx
path.  Previously we were rounding up to 2K for a 1500 MTU and then brining
the max frame size down to MTU plus a fixed amount.  With this patch
applied what we now do is limit the maximum frame to 1.5K minus the value
for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we
allow up to the maximum frame size.  This makes the behavior more
consistent with the other drivers such as igb which had similar logic.  In
addition it reduces the test matrix for MTU since we only have two max
frame sizes that are handled for Rx now.

Change-ID: I23a9d3c857e7df04b0ef28c64df63e659c013f3f
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Add legacy-rx private flag to allow fallback to old Rx flow

This patch adds a control which will allow us to toggle into and out of the
legacy Rx mode. The legacy Rx mode is what we currently do when performing
Rx. As I make further changes what should happen is that the driver will
fall back to the behavior for Rx as of this patch should the "legacy-rx"
flag be set to on.

Change-ID: I0342998849bbb31351cce05f6e182c99174e7751
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag code

This patch is meant to clean up the code in preparation for us adding
support for build_skb. Specifically we deconstruct i40e_fetch_buffer into
several functions so that those functions can later be reused when we add a
path for build_skb.

Specifically with this change we split out the code for adding a page to an
exiting skb.

Change-ID: Iab1efbab6b8b97cb60ab9fdd0be1d37a056a154d
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Pull out code for cleaning up Rx buffers

This patch pulls out the code responsible for handling buffer recycling and
page counting and distributes it through several functions. This allows us
to commonize the bits that handle either freeing or recycling the buffers.

As far as the page count tracking one change to the logic is that
pagecnt_bias is decremented as soon as we call i40e_get_rx_buffer. It is
then the responsibility of the function that pulls the data to either
increment the pagecnt_bias if the buffer can be recycled as-is, or to
update page_offset so that we are pointing at the correct location for
placement of the next buffer.

Change-ID: Ibac576360cb7f0b1627f2a993d13c1a8a2bf60af
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Pull code for grabbing and syncing rx_buffer from fetch_buffer

This patch pulls the code responsible for fetching the Rx buffer and
synchronizing DMA into a function, specifically called i40e_get_rx_buffer.

The general idea is to allow for better code reuse by pulling this out of
i40e_fetch_rx_buffer. We dropped a couple of prefetches since the time
between the prefetch being called and the data being accessed was too small
to be useful.

Change-ID: I4885fce4b2637dbedc8e16431169d23d3d7e79b9
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Use length to determine if descriptor is done

This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.

Change-ID: Iebdf9cdb36c454ef092df27199b92ad09c374231
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove FDIR_REQUIRES_REINIT driver flag

This flag hasn't been used since commit 1e1be8f622ee ("i40e: ATR policy
change to flush the table to clean stale ATR rules").

Lets simplify things and just remove it.

Change-ID: I76279d84db8a2fd96f445b96aa413059f9256879
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove a useless goto statement

The goto found here for when in MFP mode is pointless. It jumps to the
end of a series of if blocks. However, right after this statement is
a closing '}' for this if block, which will result in the program flow
going to the exact same location as the goto statement indicates. Thus,
regardless of whether we are in MFP mode, the program flow will resume
from the same location.

This arose due to various refactoring which did not notice that this
goto became essentially a no-op.

To properly understand this diff you will need to view a larger context
than is given by default.

Change-ID: I088f73c3831aa5c4e2281380c7a3ce605594300c
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Check for new arq elements before leaving the adminq subtask loop

Fix a case where we miss an arq element if a new one is added before we
enable interrupts and exit the arq subtask loop. This occurs frequently
with RDMA running on Windows VF and causes long delays that prevent SMB
from establishing connections.

Change-ID: I3e1c8b2b960c12857d9b8275bea2c1563674392e
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use register for XL722 control register read/write

The XL722 doesn't support the AQ command to read/write the control
register so enable it to bypass the check and use the direct read/write
method.

Change-ID: Iefecc737b57207485c90845af5989d5af518bf16
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Clean up handling of private flags

This patch cleans up and addresses several issues in the way that i40e
handles private flags. Previously the code was choosing fixed bits and
trying to match them up with strings in a somewhat haphazard way. This
resulted in the possibility for adding a new bit and causing a mismatch as
the private flags are linear bits starting at 0, and the private flags in
the driver were split up over a group specific to the PF and a group that
was global.

What this change does is define an array of structs used to represent the
private flags. Contained within the structs are the bits necessary to know
which flags to set and/or clear depending on the state of the bit. By
doing this we can add new bits in the future with minimal overhead and
avoid creating possible mis-matches should we need to remove a flag based
on compile options.

Change-ID: Ia3214ab04f0ab2f70354ac0997a135f1d01b0acd
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: enforce descriptor write-back mechanism for VF

The current driver mode is to use a write-back mechanism for the head
register which indicates transmit completions. The VF driver needs to be
able to work on hardware that exclusively uses descriptor write-back, so
change the default driver mode of operation to descriptor write-back for
VF. In our analysis, performance wasn't significantly different with
either write-back method.

Change-ID: Ia92e4ec77c2df8dc4515c71d53746d57d77759af
Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'dsa-devlink'

Andrew Lunn says:

====================
break include loop and dsa devlink support

These two patches add very basic support for devlink to DSA, in
preparation for playing with dpipe.

The first patch is needed to break an include loop between
netdevice.h, dsa.h and devlink.h. We need to remove dsa.h from
netdevice.h. As a result, some files fail to compile, because they
require includes pulled in via dsa.h. So this patch adds a number of
includes in various places. The majority is within the network
subsystem, but cifs also needs a few fixes.

0-day has been chewing on this for over a day now, and not found any
breakage. But Arnd's randconfig might uncover something.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: dsa2: Add basic support of devlink

Register the switch and its ports with devlink.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: break include loop netdevice.h, dsa.h, devlink.h

There is an include loop between netdevice.h, dsa.h, devlink.h because
of NETDEV_ALIGN, making it impossible to use devlink structures in
dsa.h.

Break this loop by taking dsa.h out of netdevice.h, add a forward
declaration of dsa_switch_tree and netdev_set_default_ethtool_ops()
function, which is what netdevice.h requires.

No longer having dsa.h in netdevice.h means the includes in dsa.h no
longer get included. This breaks a few other files which depend on
these includes. Add these directly in the affected file.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netconf-delnetconf'

David Ahern says:

====================
netconf: Add support for RTM_DELNETCONF

netconf notifications are sent as devices register but not when they
are deleted leaving userspace caches out of sync. Add support for
RTM_DELNETCONF to ipv4, ipv6 and mpls.

MPLS is missing RTM_NEWNETCONF as devices are created, so add it as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mpls: Send netconf messages on device register and unregister

Send netconf notifications for MPLS when the device registers and
unregisters.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:mpls: Refactor mpls_netconf_notify_devconf to take event

Refactor mpls_netconf_notify_devconf to take the event as an input arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Add support for RTM_DELNETCONF

Send RTM_DELNETCONF notifications when a device is deleted. The message only
needs the device index, so modify inet6_netconf_fill_devconf to skip devconf
references if it is NULL.

Allows a userspace cache to remove entries as devices are deleted.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Refactor inet6_netconf_notify_devconf to take event

Refactor inet6_netconf_notify_devconf to take the event as an input arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devinet: Add support for RTM_DELNETCONF

Send RTM_DELNETCONF notifications when a device is deleted. The message only
needs the device index, so modify inet_netconf_fill_devconf to skip devconf
references if it is NULL.

Allows a userspace cache to remove entries as devices are deleted.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devinet: Refactor inet_netconf_notify_devconf to take event

Refactor inet_netconf_notify_devconf to take the event as an input arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Add RTM_DELNETCONF

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: refactor interrupt moderation code

Refactor interrupt moderation code for flexibility because parameters are
different for 10G and 25G cards. Currently parameters (for 10G only) come
from macros compiled-in to the PF and VF drivers; fix it so that parameters
suitable for the card (10G or 25G) come from the NIC firmware via response
to a command.

Also bump up driver version to 1.5.1 to match newer NIC firmware version.

Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: fix copyright holder

I do not hold the copyright of the DSA core and drivers source files,
since these changes have been written as an initiative of my day job.
Fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: unconditionally set ATU trunk

Set the trunk member of the mv88e6xxx_atu_entry structure regardless its
value, so that uninitialized structures gets the correct boolean value.

Note that no mainline code is affected by the current behavior.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add support for NETDEV_RESEND_IGMP event

This patch adds support for NETDEV_RESEND_IGMP event similar
to how it works for IPv4.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-mv88e6xxx-fix-chip-definitions'

Vivien Didelot says:

====================
net: dsa: fix chip definitions

The definitions of some of the mv88e6xxx_ops and mv88e6xxx_info
structures are misordered and erroneous for 88E6191 and 88E6391.

This patch series cleans that up.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: remove 88E6391 ops

We don't support 88E6391 anywhere in the code, so remove the unused
mv88e6391_ops structure.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix 88E6191 ops

The mv88e6xxx_info structure for the 88E6191 chip was pointing the
mv88e6391_ops definition instead of mv88e6191_ops. Fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: reorder 88E6341 definitions

The related mv88e6xxx_ops structure was misplaced. Reorder it correctly
to fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: reorder 88E6141 definitions

The related mv88e6xxx_ops and mv88e6xxx_info structure were misplaced.
Reorder them correctly to fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-load-unload-mfw'

Yuval Mintz says:

====================
qed: load/unload mfw series

This series correct the unload flow and greatly enhances its
initialization flow in regard to interactions between driver
and management firmware.

Patch #1 makes sure unloading is done under management-firmware's
'criticial section' protection.

Patches #2 - #4 move driver into using a newer scheme for loading
in regard to the MFW; This newer scheme would help cleaning the device
in case a previous instance has dirtied it [preboot, PDA, etc.].

Patches #5 - #6 let driver inform management-firmware on number of
resources which are dependent on the non-management firmware used.
Patch #7 then uses a new resource [BDQ] instead of some set value.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Use BDQ resource for storage protocols

Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.

As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a valure from that range instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Utilize resource-lock based scheme

Management firmware is used as an arbiter between the various PFs
in matters of resources, but some of the resources that need to
be divided are dependent on the non-management firmware used,
so management firmware first needs to be told how many resources
there are before trying to divide them.

As part of the initialization sequence, driver would first inform
the management firmware of the available resources under
a dedicated resource lock, and afterwards request for various
resources which might be based on the previous set values.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Support management-based resource locking

Global locking can't properly be used to synchronize between different
PFs in all scenarios, as those instances might reside in different
logical partitions [e.g., when a PF is assigned via PDA to some VM].

The management firmware provides a generic infrastructure for
device locks. For each 'resource', it's guaranteed it could be acquired
by at most a single PF at any given time [or by management firmware].

This patch adds the necessary logic in qed for utilizing said
infrastructure, implementing lock/unlock internal APIs.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Send pf-flr as part of initialization

During HW initialization, driver would set various registers to their
needed values - but it assumes all registers start at their reset-value,
so there's no need to re-configure a register's default value.

This assumption might be incorrect, e.g., in case of preboot driver
running and initializing the driver prior to our driver.

To overcome this, we now ask management firmware to initiate a PF-flr
early during the initialization sequence. That would return everything
in the PF's scope back to default and prevent previous configurations
from still being applied.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Move to new load request scheme

Management firmware is used as an arbiter between the various PFs
in regard to loading - it causes the various PFs to load/unload
sequentially and informs each of its appropriate rule in the init.

But the existing flow is too weak to handle some scenarios where
PFs aren't properly cleaned prior to loading.
The significant scenarios falling under this criteria:
a. Preboot drivers in some environment can't properly unload.
b. Unexpected driver replacement [kdump, PDA].

Modern management firmware supports a more intricate loading flow,
where the driver has the ability to overcome previous limitations.
This moves qed into using this newer scheme.

Notice new scheme is backward compatible, so new drivers would
still be able to load properly on top of older management firmwares
and vice versa.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: hw_init() to receive parameter-struct

We'll soon need additional information, so start by changing
the infrastructure to receive the initializing variables
via a parameter struct.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Correct HW stop flow

Management firmware is used as arbiter between different PFs
which are loading/unloading, but in order to use the synchronization
it offers the contending configurations need to be applied either
between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
management firmware commands.

Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
qed_hw_reset() which don't abide this requirement; Most of the closure
is doing outside the scope of the unload request.

This patch removes qed_hw_reset() and places the relevant stop
functionality underneath the management firmware protection.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-subscription-refcount-simplifications'

Parthasarathy Bhuvaragan says:

====================
tipc: subscription refcount simplifications

The first patch makes the subscription refcount cleanup lockless and
the second updates the subscription refcount policy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: adjust the policy of holding subscription kref

When a new subscription object is inserted into name_seq->subscriptions
list, it's under name_seq->lock protection; when a subscription is
deleted from the list, it's also under the same lock protection;
similarly, when accessing a subscription by going through subscriptions
list, the entire process is also protected by the name_seq->lock.

Therefore, if subscription refcount is increased before it's inserted
into subscriptions list, and its refcount is decreased after it's
deleted from the list, it will be unnecessary to hold refcount at all
before accessing subscription object which is obtained by going through
subscriptions list under name_seq->lock protection.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: advance the time of deleting subscription from subscriber->subscrp_list

After a subscription object is created, it's inserted into its
subscriber subscrp_list list under subscriber lock protection,
similarly, before it's destroyed, it should be first removed from
its subscriber->subscrp_list. Since the subscription list is
accessed with subscriber lock, all the subscriptions are valid
during the lock duration. Hence in tipc_subscrb_subscrp_delete(), we
remove subscription get/put and the extra subscriber unlock/lock.

After this change, the subscriptions refcount cleanup is very simple
and does not access any lock.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: use netif_set_real_num_{rx,tx}_queues

A driver must not access the two fields directly but should instead use
the helper functions to set the values and keep a consistent internal
state:

ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe':
ethernet/stmicro/stmmac/stmmac_main.c:4083:8: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: qcom: smd-rpm: Add msm8996 compatibility

With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM
driver ontop of GLINK for 8996, without any modifications.

Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: qcom: smd: Remove standalone driver

Remove the standalone SMD implementation as we have transitioned the
client drivers to use the RPMSG based one.

Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
keep them selectable in the absence of the removed symbol.

Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: qcom: smd: Transition client drivers from smd to rpmsg

By moving these client drivers to use RPMSG instead of the direct SMD
API we can reuse them ontop of the newly added GLINK wire-protocol
support found in the 820 and 835 Qualcomm platforms.

As the new (RPMSG-based) and old SMD implementations are mutually
exclusive we have to change all client drivers in one commit, to make
sure we have a working system before and after this transition.

Acked-by: Andy Gross <andy.gross@linaro.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: don't age NTF_EXT_LEARNED fdb entries

vxlan driver already implicitly supports installing
of external fdb entries with NTF_EXT_LEARNED. This
patch just makes sure these entries are not aged
by the vxlan driver. An external entity managing these
entries will age them out. This is consistent with
the use of NTF_EXT_LEARNED in the bridge driver.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dpipe'

Jiri Pirko says:

====================
Add support for pipeline debug (dpipe)

Arkadi says:

While doing the hardware offloading process much of the hardware
specifics cannot be presented. An example for such is the routing
LPM algorithm which differ in hardware implementation from the
kernel software implementation. The only information the user receives
is whether specific route is offloaded or not, but he cannot really
understand the underlying implementation nor get the specific statistics
related to that process.

Another example is ACL offload using TC which is commonly implemented
using TCAM memory. Currently there is no capability to gain visibility
into the TCAM structure and to debug suboptimal resource allocation.

This patchset introduces capability for exporting the ASICs pipeline
abstraction via devlink infrastructure, which should serve as an
complementary tool. This infrastructure allows the user to get visibility
into the ASIC by modeling it as a set of match/action tables.

The main objects defined:
Table - abstraction for a single pipeline stage. Contains the
available match/actions and counter availability.
Entry - entry in a specific table with specific matches/actions
values and dedicated counter.
Header/field - tuples which describes the tables behavior.

As an example one of the ASIC's L3 blocks will be modeled. The egress
rif (router interface) table is the final step in the L3 pipeline
processing which does match on the internal rif index which was
determined before by the routing logic. The erif table determines
whether to forward or drop the packet and updates the corresponding
rif L3 statistics.

To expose this internal resources a special metadata header will
be introduced that describes the internal information gathered by
the ASIC's pipeline and contains the following fields: rif_port_index,
forward and drop.

Some internal hardware resources have direct mapping to kernel
objects. For example the rif_port_index is mapped to the net-devices
ifindex. By providing this mapping the users gains visibility into
the offloading process.

Follow-up work will include exporting more L3 tables which will give
visibility into the routing process.

First stage is adding support for dpipe in devlink. Next add support
in spectrum driver. Finally implement egress router interface
(erif) table for spectrum ASIC as an example.

---
v1->v2: Please see individual patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add Support for erif table entries access

Implement dpipe's table ops for erif table which provide:
1. Getting the entries in the table with the associate values.
- match on "mlxsw_meta:erif_index"
- action on "mlxsw_meta:forwared_out"
2. Synchronize the hardware in case of enabling/disabling counters which
mean removing erif counters from all interfaces.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>