review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-02-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a riscv64 JIT for BPF, from Björn.

2) Implement BTF deduplication algorithm for libbpf which takes BTF type
   information containing duplicate per-compilation unit information and
   reduces it to an equivalent set of BTF types with no duplication and
   without loss of information, from Andrii.

3) Offloaded and native BPF XDP programs can coexist today, enable also
   offloaded and generic ones as well, from Jakub.

4) Expose various BTF related helper functions in libbpf as API which
   are in particular helpful for JITed programs, from Yonghong.

5) Fix the recently added JMP32 code emission in s390x JIT, from Heiko.

6) Fix BPF kselftests' tcp_{server,client}.py to be able to run inside
   a network namespace, also add a fix for libbpf to get libbpf_print()
   working, from Stanislav.

7) Fixes for bpftool documentation, from Prashant.

8) Type cleanup in BPF kselftests' test_maps.c to silence a gcc8 warning,
   from Breno.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-blackhole-routes'

Ido Schimmel says:

====================
mlxsw: Offload blackhole routes

Blackhole routes are routes that cause matching packets to be silently
dropped. This is in contrast to unreachable routes that generate an ICMP
host unreachable packet in response.

The driver currently programs both route types with a trap action and
lets the kernel drop matching packets. This is sub-optimal as packets
routed using a blackhole route can be directly dropped by the ASIC.

Patch #1 alters mlxsw to program blackhole routes with a discard action.

Patch #2 adds a matching test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add a test for blackhole routes

Use a simple topology consisting of two hosts directly connected to a
router. Make sure IPv4/IPv6 ping works and then add blackhole routes.
Test that ping fails and that the routes are marked as offloaded. Use a
simple tc filter to test that packets were dropped by the ASIC and not
trapped to the CPU.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Offload blackhole routes

Create a new FIB entry type for blackhole routes and set it in case the
type of the notified route is 'RTN_BLACKHOLE'.

Program such routes with a discard action and mark them as offloaded
since the device is dropping the packets instead of the kernel.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Introduce-ndo_get_port_parent_id'

Florian Fainelli says:

====================
net: Introduce ndo_get_port_parent_id()

Based on discussion with Ido and feedback from Jakub there are clearly
two classes of users that implement SWITCHDEV_ATTR_ID_PORT_PARENT_ID:

- PF/VF drivers which typically only implement return the port's parent
  ID, yet have to implement switchdev_port_attr_get() just for that

- Ethernet switch drivers: mlxsw, ocelot, DSA, etc. which implement more
  attributes which we want to be able to eventually veto in the context
  of the caller, thus making them candidates for using a blocking notifier
  chain

Changes in v4:

- remove superfluous net/switchdev.h inclusions in a few files
- added Jiri's Acked-by where given
- removed err = -EOPNOTSUPP initializations
- changed according to Jiri's suggestion in net/ipv4/ipmr.c

Changes in v3:

- keep ethsw's switchdev_ops assignment
- remove inclusion of net/switchdev.h in netdevsim which is no longer
  necesary

Changes in v2:

- resolved build failures spotted by kbuild test robot
- added helpers functions into the core network device layer:
  dev_get_port_parent_id() and netdev_port_same_parent_id();
- added support for recursion to lower devices

Changes from RFC:

- introduce a ndo_get_port_parent_id() and convert all relevant drivers
  to use it

- get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID

A subsequent set of patches will convert switchdev_port_attr_set() to
use a blocking notifier call, and still get rid of
switchdev_port_attr_get() altogether.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID

Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Implement ndo_get_port_parent_id()

DSA implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: fsl-dpaa2: ethsw: Implement ndo_get_port_parent_id()

ethsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: Implement ndo_get_port_parent_id()

netdevsim only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Implement ndo_get_port_parent_id()

mlxsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: Implement ndo_get_port_parent_id()

NFP only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Since NFP uses switchdev_port_same_parent_id() convert it to use
netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mscc: ocelot: Implement ndo_get_port_parent_id()

Ocelot only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID as a valid
switchdev attribute getter, convert it to use ndo_get_port_parent_id()
and get rid of the switchdev_ops::switchdev_port_attr_get altogether.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Implement ndo_get_port_parent_id()

mlxsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Implement ndo_get_port_parent_id()

mlx5e only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Since mlx5e makes use of switchdev_port_parent_id() convert it to use
netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: Implement ndo_get_port_parent_id()

Liquidio only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it
a great candidate to be converted to use the ndo_get_port_parent_id()
NDO instead of implementing switchdev_port_attr_get().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: Implement ndo_get_port_parent_id()

BNXT only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get(). The conversion is
straight forward here since the PF and VF code use the same getter.

Since bnxt makes uses of switchdev_port_same_parent_id() convert it to
use netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce ndo_get_port_parent_id()

In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.

We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.

Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update 1.22.9.0 as the latest firmware supported.

Change t4fw_version.h to update latest firmware version
number to 1.22.9.0.

Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add new T6 PCI device ids 0x608b

Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Avoid pointer aliasing

Read MAC address 32-bit at a time and manually extract the individual
bytes. This avoids pointer aliasing and gives the compiler a better
chance of optimizing the operation.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Load MAC address from device tree if present

If the system was booted using a device tree and if the device tree
contains a MAC address, use it instead of reading one from the EEPROM.
This is useful in situations where the EEPROM isn't properly programmed
or where the firmware wants to override the existing MAC address.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-core-Trace-EMAD-errors'

Ido Schimmel says:

====================
mlxsw: core: Trace EMAD errors

Nir says:

This patchset adds a trace for EMAD errors to the existing EMAD payload
traces. This tracepoint is useful to track user or firmware errors during
tests execution.

Patch #1 defines the devlink tracepoint.
Patch #2 uses it for reporting mlxsw EMAD errors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Trace EMAD errors

Trace EMAD errors returned from HW.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add hardware errors tracing facility

Define a tracepoint and allow user to trace messages in case of an hardware
error code for hardware associated with devlink instance.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa2-eth-Driver-updates'

Ioana Ciocoi Radulescu says:

====================
dpaa2-eth: Driver updates

First patch moves the driver to a page-per-frame memory model.
The others are minor tweaks and optimizations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Update buffer pool refill threshold

Add more buffers to the Rx buffer pool as soon as 7 of them
get consumed, instead of waiting for their number to drop
below a fixed threshold.
7 is the number of buffers that can be released in the pool
via a single DPIO command.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Use FQ-based DPIO enqueue API

Starting with MC10.14.0, dpaa2_io_service_enqueue_fq() API is
functional. Since there are a number of cases where it offers
better performance compared to the currently used enqueue
function, switch to it for firmware versions that support it.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Use napi_consume_skb()

While in NAPI context, free skbs by calling napi_consume_skb()
instead of dev_kfree_skb(), to take advantage of the bulk freeing
mechanism.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Use a single page per Rx buffer

Instead of allocating page fragments via the network stack,
use the page allocator directly. For now, we consume one page
for each Rx buffer.

With the new memory model we are free to consider adding more
XDP support.

Performance decreases slightly in some IP forwarding cases.
No visible effect on termination traffic. The driver memory
footprint increases as a result of this change, but it is
still small enough to not really matter.

Another side effect is that now Rx buffer alignment requirements
are naturally satisfied without any additional actions needed.
Remove alignment related code, except in the buffer layout
information conveyed to MC, as hardware still needs to know the
alignment value we guarantee.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'add-flow_rule-infrastructure'

Pablo Neira Ayuso says:

====================
add flow_rule infrastructure

This patchset, as is, allows us to reuse the driver codebase to
configure ACL hardware offloads for the ethtool_rxnfc and the TC flower
interfaces. A few clients for this infrastructure are presented, such as
the bcm_sf2 and the qede drivers, for reference. Moreover all of the
existing drivers in the tree are converted to use this infrastructure.

This patchset is re-using the existing flow dissector infrastructure
that was introduced by Jiri Pirko et al. so the amount of abstractions
that this patchset adds are minimal. Well, just a few wrapper structures
for the selector side of the rules. And, in order to express actions,
this patchset exposes an action API that is based on the existing TC
action infrastructure and what existing drivers already support on that
front.

v7: This patchset is a rebase on top of the net-next tree, after
addressing questions and feedback from driver developers in the
last batch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: use ethtool_rx_flow_rule() to remove duplicated parser code

The qede driver supports for ethtool_rx_flow_spec and flower, both
codebases look very similar.

This patch uses the ethtool_rx_flow_rule() infrastructure to remove the
duplicated ethtool_rx_flow_spec parser and consolidate ACL offload
support around the flow_rule infrastructure.

Furthermore, more code can be consolidated by merging
qede_add_cls_rule() and qede_add_tc_flower_fltr(), these two functions
also look very similar.

This driver currently provides simple ACL support, such as 5-tuple
matching, drop policy and queue to CPU.

Drivers that support more features can benefit from this infrastructure
to save even more redundant codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: place ethtool_rx_flow_spec after code after TC flower codebase

This is a preparation patch to reuse the existing TC flower codebase
from ethtool_rx_flow_spec.

This patch is merely moving the core ethtool_rx_flow_spec parser after
tc flower offload driver code so we can skip a few forward function
declarations in the follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: bcm_sf2: use flow_rule infrastructure

Update this driver to use the flow_rule infrastructure, hence we can use
the same code to populate hardware IR from ethtool_rx_flow and the
cls_flower interfaces.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add ethtool_rx_flow_spec to flow_rule structure translator

This patch adds a function to translate the ethtool_rx_flow_spec
structure to the flow_rule representation.

This allows us to reuse code from the driver side given that both flower
and ethtool_rx_flow interfaces use the same representation.

This patch also includes support for the flow type flags FLOW_EXT,
FLOW_MAC_EXT and FLOW_RSS.

The ethtool_rx_flow_spec_input wrapper structure is used to convey the
rss_context field, that is away from the ethtool_rx_flow_spec structure,
and the ethtool_rx_flow_spec structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_offload: add wake-up-on-lan and queue to flow_action

These actions need to be added to support the ethtool_rx_flow interface.
The queue action includes a field to specify the RSS context, that is
set via FLOW_RSS flow type flag and the rss_context field in struct
ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that
rss_context is non-zero, therefore, queue.ctx == 0 means that FLOW_RSS
was not set. Also add a field to store the vf index which is stored in
the ethtool_rxnfc ring_cookie field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_flower: don't expose TC actions to drivers anymore

Now that drivers have been converted to use the flow action
infrastructure, remove this field from the tc_cls_flower_offload
structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: use flow action infrastructure

This patch updates drivers to use the new flow action infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_offload: add statistics retrieval infrastructure and use it

This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_api: add translator to flow_action representation

This patch implements a new function to translate from native TC action
to the new flow_action representation. Moreover, this patch also updates
cls_flower to use this new function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_offload: add flow action infrastructure

This new infrastructure defines the nic actions that you can perform
from existing network drivers. This infrastructure allows us to avoid a
direct dependency with the native software TC action representation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: support for two independent packet edit actions

This patch adds pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.

This patch comes in preparation for the new flow_action infrastructure,
where each packet mangling comes in an separated action, ie. not packed
as in tc pedit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_offload: add flow_rule and flow_match structures and use them

This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.

To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.

This introduces two new interfaces:

bool flow_rule_match_key(rule, dissector_id)

that returns true if a given matching key is set on, and:

flow_rule_match_XYZ(rule, &match);

To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-add-and-use-further-MMD-accessors'

Heiner Kallweit says:

====================
net: phy: add and use further MMD accessors

Add MMD accessors for modifying MMD registers and clearing / setting
bits in MMD registers. Use these accessors in PHY drivers and phylib.

v2:
- fix SoB in patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: make use of new MMD accessors

Make use of the new MMD accessors.

v2:
- fix SoB

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: provide full set of accessor functions to MMD registers

This adds full set of locked and unlocked accessor functions to read and
write PHY MMD registers and/or bitfields.

Set of functions exactly matches what is already available for PHY
legacy registers.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2019-02-06' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 5.1

First set of patches for 5.1. Lots of new features in various drivers
but nothing really special standing out.

Major changes:

brcmfmac

* DMI nvram filename quirk for PoV TAB-P1006W-232 tablet

rsi

* support for hardware scan offload

iwlwifi

* support for Target Wakeup Time (TWT) -- a feature that allows the AP
to specify when individual stations can access the medium

* support for mac80211 AMSDU handling

* some new PCI IDs

* relicense the pcie submodule to dual GPL/BSD

* reworked the TOF/CSI (channel estimation matrix) implementation

* Some product name updates in the human-readable strings

mt76

* energy detect regulatory compliance fixes

* preparation for MT7603 support

* channel switch announcement support

mwifiex

* support for sd8977 chipset

qtnfmac

* support for 4addr mode

* convert to SPDX license identifiers
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: test_maps: fix possible out of bound access warning

When compiling test_maps selftest with GCC-8, it warns that an array
might be indexed with a negative value, which could cause a negative
out of bound access, depending on parameters of the function. This
is the GCC-8 warning:

gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf -I../../../../include/generated -DHAVE_GENHDR -I../../../include    test_maps.c /home/breno/Devel/linux/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/breno/Devel/linux/tools/testing/selftests/bpf/test_maps
In file included from test_maps.c:16:
test_maps.c: In function ‘run_all_tests’:
test_maps.c:1079:10: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
   assert(waitpid(pid[i], &status, 0) == pid[i]);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
test_maps.c:1059:6: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
   pid[i] = fork();
   ~~~^~~

This patch simply guarantees that the task(s) variables are unsigned,
thus, they could never be a negative number (which they are not in
current code anyway), hence avoiding an out of bound access warning.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: doc, fix incorrect text

Documentation about cgroup, feature, prog uses wrong header
'MAP COMMANDS' while listing commands. This patch corrects the header
in respective doc files.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-xdp-hw-plus-generic'

Jakub Kicinski says:

====================
Offloaded and native/driver XDP programs can already coexist.
Allow offload and generic hook to coexist as well, there seem
to be no reason why not to do so.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: test reading the offloaded program

Test adding the offloaded program after the other program
is already installed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add test for mixing generic and offload XDP

Add simple sanity check for enabling generic and offload
XDP, simply reuse the native and offload checks.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: print traceback when test fails

Figuring out which exact check in test_offload.py takes more
time than it should. Print the traceback (to the screen and
the logs).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net: xdp: allow generic and driver XDP on one interface

Since commit a25717d2b604 ("xdp: support simultaneous driver and
hw XDP attachment") users can load an XDP program for offload and
in native driver mode simultaneously. Allow a similar mix of
offload and SKB mode/generic XDP.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: fix the expected messages

Recent changes added extack to program replacement path,
expect extack instead of generic messages.

Fixes: 01dde20ce04b ("xdp: Provide extack messages when prog attachment failed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: silence a libbpf unnecessary warning

Commit 96408c43447a ("tools/bpf: implement libbpf
btf__get_map_kv_tids() API function") refactored
function bpf_map_find_btf_info() and moved bulk of
implementation into btf.c as btf__get_map_kv_tids().
This change introduced a bug such that test_btf will
print out the following warning although the test passed:
  BTF libbpf test[2] (test_btf_nokv.o): libbpf: map:btf_map
      container_name:____btf_map_btf_map cannot be found
      in BTF. Missing BPF_ANNOTATE_KV_PAIR?

Previously, the error message is guarded with pr_debug().
Commit 96408c43447a changed it to pr_warning() and
hence caused the warning.

Restoring to pr_debug() for the message fixed the issue.

Fixes: 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-02-05

This series contains updates to igc, e1000e, ixgbe, fm10k and driver
documentation.

Kai-Heng Feng fixes an e1000e issue where the Wake-On-LAN settings where
being set incorrectly during a system suspend.

Sasha addresses community feedback on the igc driver and provides a
number of code cleanups to remove either unreachable or unused code. In
addition, added basic ethtool support for the igc driver.

Mike Rapoport fixes the formatting of the kernel driver documentation so
that the title is properly formatted and does not get lumped with the
document sections in the HTML kernel documents generated.

Jiri Kosina updates a hard coded RAR entries value with the existing
define IXGBE_82599_RAR_ENTRIES.

Jake fixes up whitespace in the fm10k driver.

Konstantin Khlebnikov fixes an issue where in some cases, the e1000e
driver will continually reset during a system boot because the watchdog
task sees items in the transmit buffer but the carrier is off (trying to
establish link) causing the device reset to flush the buffer. To
resolve, just move this check/flush into the watchdog section for when
the carrier is off.

Todd bumps the igb driver version to reflect the recent driver changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tools/bpf: add const qualifier to btf__get_map_kv_tids() map_name parameter

Commit 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function")
added the API function btf__get_map_kv_tids():
  btf__get_map_kv_tids(const struct btf *btf, char *map_name, ...)

The parameter map_name has type "char *". This is okay inside libbpf library since
the map_name is from bpf_map->name which also has type "char *".

This will be problematic if the caller for map_name already has attribute "const",
e.g., from C++ string.c_str(). It will result in either a warning or an error.

  /home/yhs/work/bcc/src/cc/btf.cc:166:51:
    error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
      return btf__get_map_kv_tids(btf_, map_name.c_str()

This patch added "const" attributes to map_name parameter.

Fixes: 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: fix a selftest test_btf failure

Commit 9c651127445c ("selftests/btf: add initial BTF dedup tests")
added dedup tests in test_btf.c.
It broke the raw test:
BTF raw test[71] (func proto (Bad arg name_off)):
    btf_raw_create:2905:FAIL Error getting string #65535, strs_cnt:1

The test itself encodes invalid func_proto parameter name
offset 0xffffFFFF as a negative test for the kernel.
The above commit changed the meaning of that offset and
resulted in a user space error.
  #define NAME_NTH(N) (0xffff0000 | N)
  #define IS_NAME_NTH(X) ((X & 0xffff0000) == 0xffff0000)
  #define GET_NAME_NTH_IDX(X) (X & 0x0000ffff)

Currently, the kernel permits maximum name offset 0xffff.
Set the test name off as 0x0fffFFFF to trigger the kernel
verification failure.

Cc: Andrii Nakryiko <andriin@fb.com>
Fixes: 9c651127445c ("selftests/btf: add initial BTF dedup tests")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

igc: Add ethtool support

This patch adds basic ethtool support to the device to allow
for configuration.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Bump version number

With recent changes, need to bump the driver version to reflect the
changes.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove the 'igc_get_phy_id_base' method

Remove the redundant 'igc_get_phy_id_base' method and use
the 'igc_get_phy_id' method directly instead.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove the 'igc_read_mac_addr_base' method

Remove the redundant 'igc_read_mac_addr_base' method and use
the 'igc_read_mac_addr' method directly instead.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix cyclic resets at link up with active tx

I'm seeing series of e1000e resets (sometimes endless) at system boot
if something generates tx traffic at this time. In my case this is
netconsole who sends message "e1000e 0000:02:00.0: Some CPU C-states
have been disabled in order to enable jumbo frames" from e1000e itself.
As result e1000_watchdog_task sees used tx buffer while carrier is off
and start this reset cycle again.

[   17.794359] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   17.794714] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   22.936455] e1000e 0000:02:00.0 eth1: changing MTU from 1500 to 9000
[   23.033336] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   26.102364] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   27.174495] 8021q: 802.1Q VLAN Support v1.8
[   27.174513] 8021q: adding VLAN 0 to HW filter on device eth1
[   30.671724] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[   30.898564] netpoll: netconsole: local port 6666
[   30.898566] netpoll: netconsole: local IPv6 address 2a02:6b8:0:80b:beae:c5ff:fe28:23f8
[   30.898567] netpoll: netconsole: interface 'eth1'
[   30.898568] netpoll: netconsole: remote port 6666
[   30.898568] netpoll: netconsole: remote IPv6 address 2a02:6b8:b000:605c:e61d:2dff:fe03:3790
[   30.898569] netpoll: netconsole: remote ethernet address b0:a8:6e:f4:ff:c0
[   30.917747] console [netcon0] enabled
[   30.917749] netconsole: network logging started
[   31.453353] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.185730] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.321840] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.465822] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.597423] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.745417] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   34.877356] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   35.005441] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   35.157376] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   35.289362] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   35.417441] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[   37.790342] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

This patch flushes tx buffers only once when carrier is off
rather than at each watchdog iteration.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove unneeded code

Remove the 'igc_get_link_up_info_base method' from igc_base.c file.
Use the 'igc_get_speed_and_duplex_copper' method directly and reduce
the code redundancy.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove unused code

Remove unused igc_adv_data_desc definition from igc_base.h file.
Descriptors definition will be added per demand.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix a missing check for return value

The change is based on the issue found by Kangjie Lu <kjlu@umn.edu> where
we not checking the return value of a register read/write which could result
in a NULL pointer dereference if the read/write fails.

Since we are only trying to disable the far-end loopback, if the read
and write of register fails, we do not want to bail out of the function.
We just want to log that it failed to disable and continue on.

CC: Sasha Neftin <sasha.neftin@intel.com>
CC: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: TRIVIAL cleanup of extra spacing in function comment

The function comment for fm10k_iov_msg_msix_pf has an extra space in
a sentence, which is unnecessary.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: remove magic constant in ixgbe_reset_hw_82599()

ixgbe_reset_hw_82599() resets the value of hw->mac.num_rar_entries to
pre-defined value of 128. Let's get rid of that hardcoded literal, and use
IXGBE_82599_RAR_ENTRIES instead, the same way the normal initialization
path does.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Fix code redundancy

Remove redundant igc_check_for_link_base code and replace it with
an igc_check_for_copper_link method.
Fix duplication of IGC_ADVTXD_PAYLEN_SHIFT mask declaration.
Remove obsolete IGC_SCVPC register definition.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

docs/networking: fix formatting of Intel drivers documentation

The documentation of Intel drivers is missing the heading adornment for
document titles.

This causes the generated html to have TOC entries from these documents to
appear as top level TOC entries:

* Linux* Base Driver for Intel(R) Ethernet Network Connection
* Contents
* Identifying Your Adapter
* Command Line Parameters
  * AutoNeg
  * Duplex
  ...

Add overline heading adornment to document titles.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igc: Remove unreachable code from igc_phy.c file

Address community comment.
Remove the unreachable code leads to the static checker warning.
PHY functionality will be added later per demand.
Reported by Dan Carpenter.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Exclude device from suspend direct complete optimization

e1000e sets different WoL settings in system suspend callback and
runtime suspend callback.

The suspend direct complete optimization leaves e1000e in runtime
suspended state with wrong WoL setting during system suspend.

To fix this, we need to disable suspend direct complete optimization to
let e1000e always use suspend callback to set correct WoL during system
suspend.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: marvell: mvpp2: fix lack of link interrupts

Sven Auhagen reports that if he changes a SFP+ module for a SFP module
on the Macchiatobin Single Shot, the link does not come back up.  For
Sven, it is as easy as:

- Insert a SFP+ module connected, and use ping6 to verify link is up.
- Remove SFP+ module
- Insert SFP 1000base-X module use ping6 to verify link is up: Link
  up event did not trigger and the link is down

but that doesn't show the problem for me.  Locally, this has been
reproduced by:

- Boot with no modules.
- Insert SFP+ module, confirm link is up.
- Replace module with 25000base-X module.  Confirm link is up.
- Set remote end down, link is reported as dropped at both ends.
- Set remote end up, link is reported up at remote end, but not local
  end due to lack of link interrupt.

Fix this by setting up both GMAC and XLG interrupts for port 0, but
only unmasking the appropriate interrupt according to the current mode
set in the mac_config() method.  However, only do the mask/unmask
dance when we are really changing the link mode to avoid missing any
link interrupts.

Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: mvpp2: use phy_interface_mode_is_8023z() helper

Use the phy_interface_mode_is_8023z() helper for detecting interface
modes that use 802.3z serial encoding. This is equivalent to testing
for both 1000base-X and 2500base-X.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nixge-Fixed-link-support'

Moritz Fischer says:

====================
nixge: Fixed-link support

This series adds fixed-link support to nixge.

The first patch corrects the binding to correctly reflect
hardware that does not come with MDIO cores instantiated.

The second patch adds fixed link support to the driver.

The third patch updates the binding document with the now
optional (formerly required) phy-handle property and references
the fixed-link docs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: Add fixed-link support

Update device-tree binding with fixed-link support.

With fixed-link support the formerly required property 'phy-handle'
is now optional if 'fixed-link' child is present.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nixge: Add support for fixed-link configurations

Add support for fixed-link configurations to nixge driver.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nixge: Make mdio child node optional

Make MDIO child optional and only instantiate the
MDIO bus if the child is actually present.

There are currently no (in-tree) users of this
binding; all (out-of-tree) users use overlays that
get shipped together with the FPGA images that contain
the IP.

This will significantly increase maintainabilty
of future revisions of this IP.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-riscv-jit'

Björn Töpel says:

====================
This v2 series adds an RV64G BPF JIT to the kernel.

At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES (Patrick Stählin sent out an RFC last year), which
means that CONFIG_BPF_EVENTS is not supported. Thus, no tests
involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT,
BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes.

The implementation does not support "far branching" (>4KiB).

Test results:
  # modprobe test_bpf
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier
  ...
  Summary: 761 PASSED, 507 SKIPPED, 2 FAILED

Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  0

No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  59

The two failing test_verifier tests are:
  "ld_abs: vlan + abs, test 1"
  "ld_abs: jump around ld_abs"

This is due to that "far branching" involved in those tests.
All tests where done on QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106). I'll test it on real hardware, when I get
access to it.

I'm routing this patch via bpf-next/netdev mailing list (after a
conversation with Palmer at FOSDEM), mainly because the other JITs
went that path.

Again, thanks for all the comments!

Cheers,
Björn

v1 -> v2:
* Added JMP32 support. (Daniel)
* Add RISC-V to Documentation/sysctl/net.txt. (Daniel)
* Fixed seen_call() asymmetry. (Daniel)
* Fixed broken bpf_flush_icache() range. (Daniel)
* Added alignment annotations to some selftests.

RFCv1 -> v1:
* Cleaned up the Kconfig and net/Makefile. (Christoph)
* Removed the entry-stub and squashed the build/config changes to be
  part of the JIT implementation. (Christoph)
* Simplified the register tracking code. (Daniel)
* Removed unused macros. (Daniel)
* Added myself as maintainer and updated documentation. (Daniel)
* Removed HAVE_EFFICIENT_UNALIGNED_ACCESS. (Christoph, Palmer)
* Added tail-calls and cleaned up the code.
====================

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add "any alignment" annotation for some tests

RISC-V does, in-general, not have "efficient unaligned access". When
testing the RISC-V BPF JIT, some selftests failed in the verification
due to misaligned access. Annotate these tests with the
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS flag.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, doc: add RISC-V JIT to BPF documentation

Update Documentation/networking/filter.txt and
Documentation/sysctl/net.txt to mention RISC-V.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

MAINTAINERS: add RISC-V BPF JIT maintainer

Add Björn Töpel as RISC-V BPF JIT maintainer.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, riscv: add BPF JIT for RV64G

This commit adds a BPF JIT for RV64G.

The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).

At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not
supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and
BPF_PROG_TYPE_RAW_TRACEPOINT passes.

The implementation does not support "far branching" (>4KiB).

Test results:
  # modprobe test_bpf
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier
  ...
  Summary: 761 PASSED, 507 SKIPPED, 2 FAILED

Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  0

No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  59

The two failing test_verifier tests are:
  "ld_abs: vlan + abs, test 1"
  "ld_abs: jump around ld_abs"

This is due to that "far branching" involved in those tests.

All tests where done on QEMU (QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106)).

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-btf-dedup'

Andrii Nakryiko says:

====================
This patch series adds BTF deduplication algorithm to libbpf. This algorithm
allows to take BTF type information containing duplicate per-compilation unit
information and reduce it to equivalent set of BTF types with no duplication without
loss of information. It also deduplicates strings and removes those strings that
are not referenced from any BTF type (and line information in .BTF.ext section,
if any).

Algorithm also resolves struct/union forward declarations into concrete BTF types
across multiple compilation units to facilitate better deduplication ratio. If
undesired, this resolution can be disabled through specifying corresponding options.

When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
only 29247 out of initial 3073497 BTF type descriptors.

Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
support is also described more verbosely at:

https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

v1->v2:
- rebase on latest bpf-next
- err_log/elog -> pr_debug
- btf__dedup, btf__get_strings, btf__get_nr_types listed under 0.0.2 version
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/btf: add initial BTF dedup tests

This patch sets up a new kind of tests (BTF dedup tests) and tests few aspects of
BTF dedup algorithm. More complete set of tests will come in follow up patches.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

btf: add BTF types deduplication algorithm

This patch implements BTF types deduplication algorithm. It allows to
greatly compress typical output of pahole's DWARF-to-BTF conversion or
LLVM's compilation output by detecting and collapsing identical types emitted in
isolation per compilation unit. Algorithm also resolves struct/union forward
declarations into concrete BTF types representing referenced struct/union. If
undesired, this resolution can be disabled through specifying corresponding options.

Algorithm itself and its application to Linux kernel's BTF types is
described in details at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

btf: extract BTF type size calculation

This pre-patch extracts calculation of amount of space taken by BTF type descriptor
for later reuse by btf_dedup functionality.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net: phy: fixed-phy: Drop GPIO from fixed_phy_add()

All users of the fixed_phy_add() pass -1 as GPIO number
to the fixed phy driver, and all users of fixed_phy_register()
pass -1 as GPIO number as well, except for the device
tree MDIO bus.

Any new users should create a proper device and pass the
GPIO as a descriptor associated with the device so delete
the GPIO argument from the calls and drop the code looking
requesting a GPIO in fixed_phy_add().

In fixed phy_register(), investigate the "fixed-link"
node and pick the GPIO descriptor from "link-gpios" if
this property exists. Move the corresponding code out
of of_mdio.c as the fixed phy code anyways requires
OF to be in use.

Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Fix code style issue in mlx driver

Add the tab before '}' and keep the code style consistent.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

libbpf: fix libbpf_print

With the recent print rework we now have the following problem:
pr_{warning,info,debug} expand to __pr which calls libbpf_print.
libbpf_print does va_start and calls __libbpf_pr with va_list argument.
In __base_pr we again do va_start. Because the next argument is a
va_list, we don't get correct pointer to the argument (and print noting
in my case, I don't know why it doesn't crash tbh).

Fix this by changing libbpf_print_fn_t signature to accept va_list and
remove unneeded calls to va_start in the existing users.

Alternatively, this can we solved by exporting __libbpf_pr and
changing __pr macro to (and killing libbpf_print):
{
if (__libbpf_pr)
__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
}

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'sh_eth-implement-simple-RX-checksum-offload'

Sergei Shtylyov says:

====================
sh_eth: implement simple RX checksum offload

Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implemeting
the simple RX checksum offload (like was done for the 'ravb' driver by Simon
Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
SoCs should just work (according to their manuals)...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: offload RX checksum on SH7763

The SH7763 SoC manual describes the Ether MAC's RX checksum offload
the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: offload RX checksum on SH7734

The SH7734 SoC manual describes the Ether MAC's RX checksum offload
the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: offload RX checksum on R8A77980

The R-Car V3H (R8A77980) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MAC...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: offload RX checksum on R8A7740

The R-Mobile A1 (R8A7740) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MAC...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: offload RX checksum on R7S72100

The RZ/A1H (R7S721000) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: RX checksum offload support

Add support for the RX checksum offload. This is enabled by default and
may be disabled and re-enabled using 'ethtool':

# ethtool -K eth0 rx off
# ethtool -K eth0 rx on

Some Ether MACs provide a simple checksumming scheme which appears to be
completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
the L2 header is appended to packet data; this may be trivially read by
the driver and used to update the skb accordingly. The same checksumming
scheme is implemented in the EtherAVB MACs and now supported by the 'ravb'
driver.

In terms of performance, throughput is close to gigabit line rate with the
RX checksum offload both enabled and disabled.  The 'perf' output, however,
appears to indicate that significantly less time is spent in do_csum() --
this is as expected.

Test results with RX checksum offload enabled:

~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    10.01     933.93
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 1.955 MB perf.data (41940 samples) ]
~/netperf-2.2pl4# perf report
Samples: 41K of event 'cycles:ppp', Event count (approx.): 9915302763
Overhead  Command          Shared Object             Symbol
   9.44%  netperf          [kernel.kallsyms]         [k] __arch_copy_to_user
   7.75%  swapper          [kernel.kallsyms]         [k] _raw_spin_unlock_irq
   6.31%  swapper          [kernel.kallsyms]         [k] default_idle_call
   5.89%  swapper          [kernel.kallsyms]         [k] arch_cpu_idle
   4.37%  swapper          [kernel.kallsyms]         [k] tick_nohz_idle_exit
   4.02%  netperf          [kernel.kallsyms]         [k] _raw_spin_unlock_irq
   2.52%  netperf          [kernel.kallsyms]         [k] preempt_count_sub
   1.81%  netperf          [kernel.kallsyms]         [k] tcp_recvmsg
   1.80%  netperf          [kernel.kallsyms]         [k] _raw_spin_unlock_irqres
   1.78%  netperf          [kernel.kallsyms]         [k] preempt_count_add
   1.36%  netperf          [kernel.kallsyms]         [k] __tcp_transmit_skb
   1.20%  netperf          [kernel.kallsyms]         [k] __local_bh_enable_ip
   1.10%  netperf          [kernel.kallsyms]         [k] sh_eth_start_xmit

Test results with RX checksum offload disabled:

~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    10.01     932.04
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 3.642 MB perf.data (78817 samples) ]
~/netperf-2.2pl4# perf report
Samples: 78K of event 'cycles:ppp', Event count (approx.): 18091442796
Overhead  Command          Shared Object       Symbol
   7.00%  swapper          [kernel.kallsyms]   [k] do_csum
   3.94%  swapper          [kernel.kallsyms]   [k] sh_eth_poll
   3.83%  ksoftirqd/0      [kernel.kallsyms]   [k] do_csum
   3.23%  swapper          [kernel.kallsyms]   [k] _raw_spin_unlock_irq
   2.87%  netperf          [kernel.kallsyms]   [k] __arch_copy_to_user
   2.86%  swapper          [kernel.kallsyms]   [k] arch_cpu_idle
   2.13%  swapper          [kernel.kallsyms]   [k] default_idle_call
   2.12%  ksoftirqd/0      [kernel.kallsyms]   [k] sh_eth_poll
   2.02%  swapper          [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
   1.84%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
   1.64%  swapper          [kernel.kallsyms]   [k] tick_nohz_idle_exit
   1.53%  netperf          [kernel.kallsyms]   [k] _raw_spin_unlock_irq
   1.32%  netperf          [kernel.kallsyms]   [k] preempt_count_sub
   1.27%  swapper          [kernel.kallsyms]   [k] __pi___inval_dcache_area
   1.22%  swapper          [kernel.kallsyms]   [k] check_preemption_disabled
   1.01%  ksoftirqd/0      [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore

The above results collected on the R-Car V3H Starter Kit board.

Based on the commit 4d86d3818627 ("ravb: RX checksum offload")...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: rename sh_eth_cpu_data::hw_checksum

Commit 62e04b7e0e3c ("sh_eth: rename 'sh_eth_cpu_data::hw_crc'") renamed
the field to 'hw_checksum' for the Ether DMAC "intelligent checksum",
however some Ether MACs implement a simpler checksumming scheme, so that
name now seems misleading. Rename that field to 'csmr' as the "intelligent
checksum" is always controlled by the CSMR register.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'libbpf-btf_ext'

Yonghong Song says:

====================
This patch set exposed a few functions in libbpf.
All these newly added API functions are helpful for
JIT based bpf compilation where .BTF and .BTF.ext
are available as in-memory data blobs.

Patch #1 exposed several btf_ext__* API functions which
are used to handle .BTF.ext ELF sections.
Patch #2 refactored the function bpf_map_find_btf_info()
and exposed API function btf__get_map_kv_tids() to
retrieve the map key/value type id's generated by
bpf program through BPF_ANNOTATE_KV_PAIR macro.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: implement libbpf btf__get_map_kv_tids() API function

Currently, to get map key/value type id's, the macro
BPF_ANNOTATE_KV_PAIR(<map_name>, <key_type>, <value_type>)
needs to be defined in the bpf program for the
corresponding map.

During program/map loading time,
the local static function bpf_map_find_btf_info()
in libbpf.c is implemented to retrieve the key/value
type ids given the map name.

The patch refactored function bpf_map_find_btf_info()
to create an API btf__get_map_kv_tids() which includes
the bulk of implementation for the original function.
The API btf__get_map_kv_tids() can be used by bcc,
a JIT based bpf compilation system, which uses the
same BPF_ANNOTATE_KV_PAIR to record map key/value types.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>