review.tizen.org Git - platform/kernel/linux-starfive.git/log

Merge branch 'mvpp2-Armada-7k-8k-PP2-ACPI-support'

Marcin Wojtas says:

====================
Armada 7k/8k PP2 ACPI support

I quickly resend the series, thanks to Antoine Tenart's remark,
who spotted !CONFIG_ACPI compilation issue after introducing
the new fwnode_irq_get() routine. Please see the details in the changelog
below and the 3/7 commit log.

mvpp2 driver can work with the ACPI representation, as exposed
on a public branch:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
It was compiled together with the most recent Tianocore EDK2 revision.
Please refer to the firmware build instruction on MacchiatoBin board:
http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II

ACPI representation of PP2 controllers (withouth PHY support) can
be viewed in the github:
* MacchiatoBin:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl#L201

* Armada 7040 DB:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada70x0/Dsdt.asl#L131

I will appreciate any comments or remarks.

Best regards,
Marcin

Changelog:
v3 -> v4:
* 3/7
    - add new macro (ACPI_HANDLE_FWNODE) and fix
      compilation with !CONFIG_ACPI
    - extend commit log and mention usability of fwnode_irq_get
      for the child nodes as well

v2 -> v3:
* 1/7, 2/7
    - Add Rafael's Acked-by's
* 3/7, 4/7
    - New patches
* 6/7, 7/7
    - Update driver with new helper routines usage
    - Improve commit log.

v1 -> v2:
* Remove MDIO patches
* Use PP2 ports only with link interrupts
* Release second region resources in mvpp2 driver (code moved from
  mvmdio), as explained in details in 5/5 commit message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: enable ACPI support in the driver

This patch introduces an alternative way of obtaining resources - via
ACPI tables provided by firmware. Enabling coexistence with the DT
support, in addition to the OF_*->device_*/fwnode_* API replacement,
required following steps to be taken:

* Add mvpp2_acpi_match table
* Omit clock configuration and obtain tclk from the property - in ACPI
  world, the firmware is responsible for clock maintenance.
* Disable comphy and syscon handling as they are not available for ACPI.
* Modify way of obtaining interrupts - use newly introduced
  fwnode_irq_get() routine
* Until proper MDIO bus and PHY handling with ACPI is established in the
  kernel, use only link interrupts feature in the driver. For the RGMII
  port it results in depending on GMAC settings done during firmware
  stage.
* When booting with ACPI MVPP2_QDIST_MULTI_MODE is picked by
  default, as there is no need to keep any kind of the backward
  compatibility.

Moreover, a memory region used by mvmdio driver is usually placed in
the middle of the address space of the PP2 network controller.
The MDIO base address is obtained without requesting memory region
(by devm_ioremap() call) in mvmdio.c, later overlapping resources are
requested by the network driver, which is responsible for avoiding
a concurrent access.

In case the MDIO memory region is declared in the ACPI, it can
already appear as 'in-use' in the OS. Because it is overlapped by second
region of the network controller, make sure it is released, before
requesting it again. The care is taken by mvpp2 driver to avoid
concurrent access to this memory region.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: use device_*/fwnode_* APIs instead of of_*

OF functions can be used only for the driver using DT.
As a preparation for introducing ACPI support in mvpp2
driver, use struct fwnode_handle in order to obtain
properties from the hardware description.

This patch replaces of_* function with device_*/fwnode_*
where possible in the mvpp2.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: simplify maintaining enabled ports' list

'port_count' field of the mvpp2 structure holds an overall amount
of available ports, based on DT nodes status. In order to be prepared
to support other HW description, obtain the value by incrementing it
upon each successful port initialization. This allowed for simplifying
port indexing in the controller's private array, whose size is now not
dynamically allocated, but fixed to MVPP2_MAX_PORTS.

This patch simplifies creating and filling list of enabled ports and
is a part of the preparation for adding ACPI support in the mvpp2 driver.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

device property: Allow iterating over available child fwnodes

Implement a new helper function fwnode_get_next_available_child_node(),
which enables obtaining next enabled child fwnode, which
works on a similar basis to OF's of_get_next_available_child().

This commit also introduces a macro, thanks to which it is
possible to iterate over the available fwnodes, using the
new function described above.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

device property: Introduce fwnode_irq_get()

Until now there were two very similar functions allowing
to get Linux IRQ number from ACPI handle (acpi_irq_get())
and OF node (of_irq_get()). The first one appeared to be used
only as a subroutine of platform_irq_get(), which (in the generic
code) limited IRQ obtaining from _CRS method only to nodes
associated to kernel's struct platform_device.

This patch introduces a new helper routine - fwnode_irq_get(),
which allows to get the IRQ number directly from the fwnode
to be used as common for OF/ACPI worlds. It is usable not
only for the parents fwnodes, but also for the child nodes
comprising their own _CRS methods with interrupts description.

In order to be able o satisfy compilation with !CONFIG_ACPI
and also simplify the new code, introduce a helper macro
(ACPI_HANDLE_FWNODE), with which it is possible to reach
an ACPI handle directly from its fwnode.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

device property: Introduce fwnode_get_phy_mode()

Until now there were two almost identical functions for
obtaining network PHY mode - of_get_phy_mode() and,
more generic, device_get_phy_mode(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the PHY mode for
children nodes in the ACPI world by introducing a new function -
fwnode_get_phy_mode(). This commit also changes
device_get_phy_mode() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

device property: Introduce fwnode_get_mac_address()

Until now there were two almost identical functions for
obtaining MAC address - of_get_mac_address() and, more generic,
device_get_mac_address(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the MAC address for
children nodes in the ACPI world by introducing a new function -
fwnode_get_mac_address(). This commit also changes
device_get_mac_address() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: caif: remove redundant re-assignment of pointer pfrm

The pointer pfrm is initialized and then later re-assigned the same
value and hence the second assignment is redundant and can be removed.

Cleans up clang warning:
drivers/net/caif/caif_hsi.c:222:6: warning: Value stored to 'pfrm'
during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add geneve offload support for T6

Add geneve segmentation offload support of T6 cards.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2018-01-22' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Less than a handful of changes:
* possible memory leak fix in hwsim
* speed up hwsim
* add hwsim userspace rate control API
* code cleanups
====================

A conflict was resolved in mac80211_hwsim.c, mostly of
the simple overlapping changes category. One adding
a rhashtable and another adding a workqueue.

Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: fix memory leak on 'resource'

Currently, if the call to devlink_resource_find returns null then
the error exit path does not free the devlink_resource 'resource'
and a memory leak occurs. Fix this by kfree'ing resource on the
error exit path.

Detected by CoverityScan, CID#1464184 ("Resource leak")

Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-spectrum_router-Optimize-LPM-trees'

Jiri Pirko says:

====================
mlxsw: spectrum_router: Optimize LPM trees

Ido says:

This set tries to optimize the structure of the LPM trees used for route
lookup by avoiding lookups that are guaranteed not to return a result.
This is done by making sure only used prefix lengths are present in the
tree.

First two patches are small preparatory steps towards the actual change
in the last patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree

In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.

However, this optimization had to be removed in commit a69518cf0b4c
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.

Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.

To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Pass FIB node to LPM tree unlink function

Next patch will try to optimize the LPM tree and make sure only used
prefix lengths are present, to avoid unnecessary look-ups.

Pass the currently removed FIB node to the unlinking function as its
associated prefix length is a potential candidate for removal from the
tree.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Use the nodes list as indication for empty FIB

Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.

Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: add missing rcu annotation

This patch fixes the following sparse warnings:

drivers/net/tun.c:2241:15: error: incompatible types in comparison expression (different address spaces)

Fixes: cd5681d7d890 ("tuntap: rename struct tun_steering_prog to struct tun_prog")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()

'hwname' is malloced in hwsim_new_radio_nl() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.

Fixes: ff4dd73dd2b4 ("mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

debugfs_sta: Remove unneeded semicolons

Trivial fix removes unneeded semicolons after switch blocks.

Signed-off-by: Christopher Díaz Riveros <chrisadr@gentoo.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge branch 'mlxsw-Add-support-for-mirror-action-with-flower'

Jiri Pirko says:

====================
mlxsw: Add support for mirror action with flower

Arkadi says:

Add support for mirror action with flower classifier. The first 3 patches
introduce a generic per-block resource infra. The last 4 patches add
support for flow based span.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Add support for mirror action

Add support for mirror action. Only one mirror action can be set per rule.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for Spectrum

Introduce extension of mlxsw_afa_ops in order to add/del mirroring and
implement the ops for Spectrum.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Extend and export SPAN API

Extend SPAN API for ACL case. In case of ACL triggering the MPAR register
shouldn't be configured. This patch also export those helpers for
ACL usage.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Add support for mirroring action

The patch extends the trap action for mirroring.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Make counter index allocated inside the action append

So far, the caller of mlxsw_afa_block_append_counter needed to allocate
counter index by hand. Benefit from the previously introduced resource
infra and counter_index_get/put callbacks, and allocate the counter
index in place where it is needed, inside the action append function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list

Since the resource list needs to be used also for other entries different
to fwd_entry_ref, make the list generic. For that purpose, introduce a
resource structure with couple of helpers that the code which need to
store a per-block resource should use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for Spectrum

Introduce extension of mlxsw_afa_ops in order to get/put counter indexes
and implement the ops for Spectrum.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Aquantia-atlantic-driver-new-devices-support'

Igor Russkikh says:

====================
Aquantia atlantic driver new devices support

This patchset introduces a support for new Aquantia hardware:
AQC11x family with updated hardware (B1) and firmware (2.x and 3.x branches).

For that, a number of improvements in overall driver model were done:
- Firmware specific ops tables. Firmware 2.x and 3.x series support
functions are now in separate fw2x module.
- PCI module cleanup and simplification done.
- Verified and tested hardware reset process.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: bump driver version to match aquantia internal numbering

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Report correct mediatype via ethtool

For devices with known capabilities of Fibre media type we now report that
to ethtool.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Introduce global AQC hardware reset sequence

The detailed reset sequence ensures all HW components are in aligned
state before NIC startup. It also supports cards with signed firmware (RBL)
and checks if their FW is valid.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Introduce support for new firmware on AQC cards

This defines fw2x operations table and corresponding methods.
Some of the functions are being shared with 1.x firmware

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Introduce firmware ops callbacks

New AQC cards will have an updated firmware with new binary interface.
This patch extracts firmware specific operations into a separate table
and prepares for the introduction of new fw 2.x and 3.x

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Change confusing no_ff_addr to more meaningful name

The address to check if HW is not dead/hang could be stored in
capabilities, since it is a constant. Change its name to better reflect
the idea.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Remove create/destroy from hw ops

These ops are not related to HW and are now implemented in pci module.
Thus, remove these ops pointers and implementation.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Cleanup pci functions module

Driver contained a dead code of maintaining multiple pci port instances.
That will never be used since for each pci function a separate NIC
instance is created.
Simplify this, making pci module only responsible for pci resource
management.
NIC initialization is also simplified accordingly.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Convert hw and caps structures to const static pointers

This removes unnecessary structure copying, and prepares the driver for
separate firmware ops table introduction.

We also remove extra copy of capabilities structure (which is const actually)
and also replace it with a const pointer in aq_nic_cfg.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Introduce new AQC devices and capabilities

A number of new AQC devices is going to be released. To support more
flexible capabilities management a number of static caps instances is now
declared. Devices now are mainly differs by supported speeds, but in future
more parameters will be customized. A set of AQC100 devices have
fibre media, not twisted pair - this is also reflected in
new capabilities definitions.

HW level also now directly exports hw_ops for each of A0/B0 hardware.

PCI configuration now uses a device configuration table where each
device ID is explicitly mapped with hardware OPs and capabilities
structures.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Introduce new device ids and constants

New set of aquantia devices has an upgraded hardware (B1).
The hardware interface is identical to B0. The difference will
be in firmware which is incompatible with old one.

Reorganized and removed duplicate speed and devid definitions
Introduced explicit flow control configuration defines

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2018-01-19' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-01-19

From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========

Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.

From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.

From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: prioritize stats updates

Previously it was possible to interrupt processing stats updates because
they were handled in a work queue. Interrupting the stats updates could
lead to a situation where we backup the control message queue. This patch
moves the stats update processing out of the work queue to be processed as
soon as hardware sends a request.

Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gemini: Depend on HAS_IOMEM

The zeroday builder notices that since Usermode Linux does not
have IO memory, the build fails for them when selecting everything
it can enable.

As the driver is clearly using memory-mapped registers to access
the network adapter, we add depends on HAS_IOMEM to solve this
problem.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch.  More specifically, they
are:

1) Seven patches to remove family abstraction layer (struct nft_af_info)
   in nf_tables, this simplifies our codebase and it saves us 64 bytes per
   net namespace.

2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
   Abdelsalam.

3) Allow to register iptable_raw table before defragmentation, some
   people do not want to waste cycles on defragmenting traffic that is
   going to be dropped, hence add a new module parameter to enable this
   behaviour in iptables and ip6tables. From Subash Abhinov
   Kasiviswanathan. This patch needed a couple of follow up patches to
   get things tidy from Arnd Bergmann.

4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
   patches for this helper to prepare this change are also part of this
   patch series.

5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha
   Sharma.

6) Remove log message that several netfilter subsystems print at
   boot/load time.

7) Restore x_tables module autoloading, that got broken in a previous
   patch to allow singleton NAT hook callback registration per hook
   spot, from Florian Westphal. Moreover, return EBUSY to report that
   the singleton NAT hook slot is already in instead.

8) Several fixes for the new nf_tables flowtable representation,
   including incorrect error check after nf_tables_flowtable_lookup(),
   missing Kconfig dependencies that lead to build breakage and missing
   initialization of priority and hooknum in flowtable object.

9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
   target. This is due to recent updates in the core to shrink the hook
   array size and compile it out if no specific family is enabled via
   .config file. Patch from Florian Westphal.

10) Remove duplicated include header files, from Wei Yongjun.

11) Sparse warning fix for the NFPROTO_INET handling from the core
    due to missing static function definition, also from Wei Yongjun.

12) Restore ICMPv6 Parameter Problem error reporting when
    defragmentation fails, from Subash Abhinov Kasiviswanathan.

13) Remove obsolete owner field initialization from struct
    file_operations, patch from Alexey Dobriyan.

14) Use boolean datatype where needed in the Netfilter codebase, from
    Gustavo A. R. Silva.

15) Remove double semicolon in dynset nf_tables expression, from
    Luis de Bethencourt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-01-19

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) bpf array map HW offload, from Jakub.

2) support for bpf_get_next_key() for LPM map, from Yonghong.

3) test_verifier now runs loaded programs, from Alexei.

4) xdp cpumap monitoring, from Jesper.

5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.

6) user space can now retrieve HW JITed program, from Jiong.

Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

The BPF verifier conflict was some minor contextual issue.

The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-misc-improvements'

Daniel Borkmann says:

====================
This series adds various misc improvements to BPF: detection
of BPF helper definition misconfiguration for mem/size argument
pairs, csum_diff helper also for XDP, various test cases,
removal of the recently added pure_initcall(), restriction
of the jit sysctls to cap_sys_admin for initns, a minor size
improvement for x86 jit in alu ops, output of complexity limit
to verifier log and last but not least having the event output
more flexible with moving to const_size_or_zero type.

Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: move event_output to const_size_or_zero for xdp/skb as well

Similar rationale as in a60dd35d2e39 ("bpf: change bpf_perf_event_output
arg5 type to ARG_CONST_SIZE_OR_ZERO"), change the type to CONST_SIZE_OR_ZERO
such that we can better deal with optimized code. No changes needed in
bpf_event_output() as it can also deal with 0 size entirely (e.g. as only
wake-up signal with empty frame in perf RB, or packet dumps w/o meta data
as another such possibility).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add upper complexity limit to verifier log

Given the limit could potentially get further adjustments in the
future, add it to the log so it becomes obvious what the current
limit is w/o having to check the source first. This may also be
helpful for debugging complexity related issues on kernels that
backport from upstream.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, x86: small optimization in alu ops with imm

For the BPF_REG_0 (BPF_REG_A in cBPF, respectively), we can use
the short form of the opcode as dst mapping is on eax/rax and
thus save a byte per such operation. Added to add/sub/and/or/xor
for 32/64 bit when K immediate is used. There may be more such
low-hanging fruit to add in future as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: restrict access to core bpf sysctls

Given BPF reaches far beyond just networking these days, it was
never intended to allow setting and in some cases reading those
knobs out of a user namespace root running without CAP_SYS_ADMIN,
thus tighten such access.

Also the bpf_jit_enable = 2 debugging mode should only be allowed
if kptr_restrict is not set since it otherwise can leak addresses
to the kernel log. Dump a note to the kernel log that this is for
debugging JITs only when enabled.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: get rid of pure_initcall dependency to enable jits

Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add couple of test cases for div/mod by zero

Add couple of missing test cases for eBPF div/mod by zero to the
new test_verifier prog runtime feature. Also one for an empty prog
and only exit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add couple of test cases for signed extended imms

Add a couple of test cases for interpreter and JIT that are
related to an issue we faced some time ago in Cilium [1],
which is fixed in LLVM with commit e53750e1e086 ("bpf: fix
bug on silently truncating 64-bit immediate").

Test cases were run-time checking kernel to behave as intended
which should also provide some guidance for current or new
JITs in case they should trip over this. Added for cBPF and
eBPF.

[1] https://github.com/cilium/cilium/pull/2162

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add csum_diff helper to xdp as well

Useful for porting cls_bpf programs w/o increasing program
complexity limits much at the same time, so add the helper
to XDP as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, verifier: detect misconfigured mem, size argument pair

I've seen two patch proposals now for helper additions that used
ARG_PTR_TO_MEM or similar in reg_X but no corresponding ARG_CONST_SIZE
in reg_X+1. Verifier won't complain in such case, but it will omit
verifying the memory passed to the helper thus ending up badly.
Detect such buggy helper function signature and bail out during
verification rather than finding them through review.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

samples/bpf: xdp_monitor include cpumap tracepoints in monitoring

The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.

Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints.  Model the output format
like xdp_redirect_cpu.

Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events.  This resembles part
of the builtin monitoring output from sample xdp_rxq_info.  Thus, also
covering part of that sample in an external monitoring tool.

Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead.  Thus, they are enabled by default.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-lpm-get-next-key'

Yonghong Song says:

====================
This patch set implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
This command is really useful for key enumeration, and for key deletion
if what keys in the trie are unknown.

Patch #1 implements the functionality in the kernel and patch #2
adds a test case in tools/testing/selftests/bpf.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: add a testcase for MAP_GET_NEXT_KEY command of LPM_TRIE map

A test case is added in tools/testing/selftests/bpf/test_lpm_map.c
for MAP_GET_NEXT_KEY command. A four node trie, which
is described in kernel/bpf/lpm_trie.c, is built and the
MAP_GET_NEXT_KEY results are checked.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map

Current LPM_TRIE map type does not implement MAP_GET_NEXT_KEY
command. This command is handy when users want to enumerate
keys. Otherwise, a different map which supports key
enumeration may be required to store the keys. If the
map data is sparse and all map data are to be deleted without
closing file descriptor, using MAP_GET_NEXT_KEY to find
all keys is much faster than enumerating all key space.

This patch implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
If user provided key pointer is NULL or the key does not have
an exact match in the trie, the first key will be returned.
Otherwise, the next key will be returned.

In this implemenation, key enumeration follows a postorder
traversal of internal trie. More specific keys
will be returned first than less specific ones, given
a sequence of MAP_GET_NEXT_KEY syscalls.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests: bpf: update .gitignore with missing generated files

Update .gitignore with missing generated files.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: recognize BPF_MAP_TYPE_CPUMAP maps

Add BPF_MAP_TYPE_CPUMAP map type to the list
of map type recognized by bpftool and define
corresponding text representation.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'dsa-mv88e6xxx-ATU-VTU-irq-fixes'

Andrew Lunn says:

====================
ATU and VTU irq fixes

Further testing and code review found two sets of bugs.

Core review found a cut/paste error in the irq setup code.

A board which does not have an interrupt line from the switch to the
SoC, and experiancing an EPROBE_DEFER throw a splat when the ATU irq
was freed but never registered.

v2: Fix typ0 chip->chip->vtu_prob_irq to chip->vtu_prob_irq
0-day compile testing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq

We only register the ATU and VTU irq when we have a chip level IRQ.
In the error path, we should only attempt to remove the ATU and VTU
irq if we also have a chip level IRQ.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Return error from irq_find_mapping()

Fix a cut/paste error. When irq_find_mapping() returns an error for
the ATU or VTU interrupt, return that error, not the value of
chip->device_irq.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-cls-add-extack-support'

Alexander Aring says:

====================
net: sched: cls: add extack support

this patch adds extack support for TC classifier subsystem. The first
patch fixes some code style issues for this patch series pointed out
by checkpatch. The other patches until the last one prepares extack
handling for the TC classifier subsystem and handle generic extack
errors.

The last patch is an example for u32 classifier to add extack support
inside the callbacks delete and change. There exists a init callback as
well, but most classifier implementation run a kalloc() once to allocate
something. Not necessary _yet_ to add extack support now.

- Alex

Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- fix accidentally move of config option mismatch message in PATCH 2/8
   correct one is 4/8, detected by kbuildbot (Thank you)
- Removed patch "net: sched: cls: add extack support for tc_setup_cb_call"
   PATCH 7/8 in version v2 as suggested by Jakub Kicinski (Thank you)
- changed NL_SET_ERR_MSG to NL_SET_ERR_MSG_MOD as suggested by Jakub Kicinski
   in u32 cls (Thank You)
- Removed text from cover letter that I was waiting for Jiri's Patches as
   detected by Jamal Hadi Salim (Thank you).

changes since v2:
- rebased on Jiri's patches (Thank you)
- several spelling fixes pointed out by Cong Wang (Thank you)
- several spelling fixes pointed out by David Ahern (Thank you)
- use David Ahern recommendation if config option is mismatch, but
   combine it with Cong Wang recommendation to put config name into it
   (Thank you)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: add extack support

This patch adds extack support for the u32 classifier as example for
delete and init callback.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for tcf_change_indev

This patch adds extack handling for the tcf_change_indev function which
is common used by TC classifier implementations.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for delete callback

This patch adds extack support for classifier delete callback api. This
prepares to handle extack support inside each specific classifier
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for tcf_exts_validate

The tcf_exts_validate function calls the act api change callback. For
preparing extack support for act api, this patch adds the extack as
parameter for this function which is common used in cls implementations.

Furthermore the tcf_exts_validate will call action init callback which
prepares the TC action subsystem for extack support.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for change callback

This patch adds extack support for classifier change callback api. This
prepares to handle extack support inside each specific classifier
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_api: handle generic cls errors

This patch adds extack support for generic cls handling. The extack
will be set deeper to each called function which is not part of netdev
core api.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: fix code style issues

This patch changes some code style issues pointed out by checkpatch
inside the TC cls subsystem.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Upper-bound supported FW version

During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-devlink-capabilities-extensions-and-updates'

Jakub Kicinski says:

====================
nfp: devlink, capabilities extensions and updates

This series starts with an improvement to the usability of the device
memory accessors (CPP transactions).  Next few patches are devoted to
fixing the devlink locking.  After recent patches for mlxsw the locking
scheme of devlink ops has to be reworked.  Following patches improve
NFP code dealing with "representors", and expands the error message
printed when driver has no support for loaded FW.

Second part of the series is focused on vNIC capabilities read from
vNIC control memory (often referred to as "BAR0" for historical reasons).
TLV capability format is established and immediately made use of.  The
next patches rework parsing of features for control vNIC which allows
apps to mask out features they don't want enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: disable all ctrl vNIC capabilities

BPF firmware currently exposes IRQ moderation capability.
The driver will make use of it by default, inserting 50 usec
delay to every control message exchange. This cuts the number
of messages per second we can exchange by almost half.

None of the other capabilities make much sense for BPF control
vNIC, either. Disable them all.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allow apps to disable ctrl vNIC capabilities

Most vNIC capabilities are netdev related.  It makes no sense
to initialize them and waste FW resources.  Some are even
counter-productive, like IRQ moderation, which will slow
down exchange of control messages.

Add to nfp_app a mask of enabled control vNIC capabilities
for apps to use.  Make flower and BPF enable all capabilities
for now.  No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: split reading capabilities out of nfp_net_init()

nfp_net_init() is a little long and we are about to add more
code to reading capabilties. Move the capability reading,
parsing and validating out. Only actual initialization
will stay in nfp_net_init().

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: read mailbox address from TLV caps

Allow specifying alternative vNIC mailbox location in TLV caps.
This way we can size the mailbox to the needs and not necessarily
waste 512B of ctrl memory space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: read ME frequency from vNIC ctrl memory

PCIe island clock frequency is used when converting coalescing
parameters from usecs to NFP timestamps. Most chips don't run
at 1200MHz, allow FW to provide us with the real frequency.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add TLV capabilities to the BAR

NFP is entirely programmable, including the PCI data interface.
Using a fixed control BAR layout certainly makes implementations
easier, but require careful considerations when space is allocated.
Once BAR area is allocated to one feature nothing else can use it.
Allocating space statically also requires it to be sized upfront,
which leads to either unnecessary limitation or wastage.

We currently have a 32bit capability word defined which tells drivers
which application FW features are supported.   Most of the bits
are exhausted.  The same bits are also reused for enabling specific
features.  Bulk of capabilities don't have a need for an enable bit,
however, leading to confusion and wastage.

TLVs seems like a better fit for expressing capabilities of applications
running on programmable hardware.

This patch leaves the front of the BAR as is, and declares a TLV
capability start at offset 0x58.  Most of the space up to 0x0d90
is already allocated, but the used space can be wrapped with RESERVED
TLVs.  E.g.:

Address    Type         Length
0x0058    RESERVED      0xe00  /* Wrap basic structures */
0x0e5c    FEATURE_A     0x004
0x0e64    FEATURE_B     0x004
0x0e6c    RESERVED      0x990  /* Wrap qeueue stats */
0x1800    FEATURE_C     0x100

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: improve app not found message

When driver app matching loaded FW is not found users are faced with:

nfp: failed to find app with ID 0x%02x

This message does not properly explain that matching driver code is
either not built into the driver or the driver is too old.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: protect each repr pointer individually with RCU

Representors are grouped in sets by type.  Currently the whole
sets are under RCU protection, but individual representor pointers
are not.  This causes some inconveniences when representors have
to be destroyed, because we have to allocate new sets to remove
any representors.  Protect the individual pointers with RCU.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add nfp_reprs_get_locked() helper

The write side of repr tables is always done under pf->lock.
Add a helper to dereference repr table pointers under protection
of that lock.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: register devlink after app is created

Devlink used to have two global locks: devlink lock and port lock,
our lock ordering looked like this:

  devlink lock -> driver's pf->lock -> devlink port lock

After recent changes port lock was replaced with per-instance
lock.  Unfortunately, new per-instance lock is taken on most
operations now.  This means we can only grab the pf->lock from
the port split/unsplit ops.  Lock ordering looks like this:

  devlink lock -> driver's pf->lock -> devlink instance lock

Since we can't take pf->lock from most devlink ops, make sure
nfp_apps are prepared to service them as soon as devlink is
registered.  Locking the pf must be pushed down after
nfp_app_init() callback.

The init order looks like this:
nfp_app_init
devlink_register
nfp_app_start
netdev/port_register

As soon as app_init is done nfp_apps must be ready to service
devlink-related callbacks.  apps can only register their own
devlink objects from nfp_app_start.

Fixes: 2406e7e546b2 ("devlink: Add per devlink instance lock")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: release global resources only on the remove path

NFP app is currently shut down as soon as all the vNICs are gone.
This means we can't depend on the app existing throughout the
lifetime of the device. Free the app only from PCI remove path.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: core: make scalar CPP helpers fail on short accesses

Currently the helpers for accessing 4 or 8 byte values over
the CPP bus return the length of IO on success.  If the IO
was short caller has to deal with error handling.  The short
IO for 4/8B values is completely impractical.  Make the
helpers return an error if full access was not possible.
Fix the few places which are actually dealing with errors
correctly, most call sites already only deal with negative
return codes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add likely to the common RX checksum flow

Most of the packets return true for is_last_ethertype_ip, surround it
with likely compiler hint.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Extend the stats group API to have update_stats()

Extend the stats group API to have an update_stats() callback which
will be used to fetch the hardware or software counters data.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Merge per priority stats groups

Merge the per priority traffic and pfc groups into one group, because
both groups share the same update_stats() callback which will be
introduced in the upcoming patch.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add per-channel counters infrastructure, use it upon TX timeout

Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt
recovery actions happened upon TX timeouts.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Poll event queue upon TX timeout before performing full channels recovery

Up until this patch, on every TX timeout we would try to do channels
recovery. However, in case of a lost interrupt for an EQ, the channel
associated to it cannot be recovered if reopened as it would never get
another interrupt on sent/received traffic, and eventually ends up with
another TX timeout (Restarting the EQ is not part of channel recovery).

This patch adds a mechanism for explicitly polling EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a channels full recovery as usual.

Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This will free some budget in the bql (via calling
netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up
the queue. One of the above actions will move the queue to be ready for
transmit again.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add Event Queue meta data info for TX timeout logs

When TX timeout occurs, EQ consumer index and irqn can help in debug for
understanding the SW state of EQ. Add them to the logger prints for the
relevant EQ only.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Print delta since last transmit per SQ upon TX timeout

When driver callback for TX timeout is being called, it handles all
stopped xmit queues (not only the ones which their timeout expired).
Add usecs since last transmit to TX timeout logs per send queue in order
to monitor if the queue timeout expired.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Set hairpin queue size

For a given hairpin packet buffer size, different queue sizes
(values of log_hairpin_num_packets) determine how the data is broken
to strides on the RQ. Currently the chosen value is set to 64B strides.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Enable setting hairpin queue size

Allow to specify the size of the hairpin queues along with the
packet buffer data size from the core setup code.

If the driver doesn't provide this, the FW applies proper value that
matches the provided data size and a FW chosen RQ stride size.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add RSS support for hairpin

Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs
and RSS TTC table per hairpin instance and steer the related flows
through that table so they are spread between the pairs.

We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there
is one pair and no RSS while for 100Gbs ports two RSSed pairs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Vectorize the low level core hairpin object

Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Enlarge the NIC TC offload steering prio to support two levels

This will allow to be able and set TC rule whose steering dest is
RSS TTC steering table.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Refactor RSS related objects and code

In order to use RSS for hairpin, we refactor the code that deals with
setup of the TTC steering tables. This is done using an interim ttc
params object that has the flow table attributes, TIR numbers, etc.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Set per priority hairpin pairs

As part of the QoS model, on xmit, the HW mandates that all packets going
through a given SQ have the same priority. To align hairpin SQs with that,
we use the priority given as part of the matching for the hairpin hash key.

This ensures that flows/packets mapped to different HW priorities will
go through different hairpin instances. If no priority is given for
matching, we treat that as an 8th priority, this is in order not to
harm cases where priority is specified.

Only the PCP priority trust model is supported.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>