platform/kernel/linux-starfive.git
5 years agonet: dsa: microchip: add MIB counter reading support
Tristram Ha [Sat, 23 Feb 2019 00:36:48 +0000 (16:36 -0800)]
net: dsa: microchip: add MIB counter reading support

Add background MIB counter reading support.

Port MIB counters should only be read when there is link.  Otherwise it is
a waste of time as hardware never increases those counters.  There are
exceptions as some switches keep track of dropped counts no matter what.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: microchip: prepare PHY for proper advertisement
Tristram Ha [Sat, 23 Feb 2019 00:36:47 +0000 (16:36 -0800)]
net: dsa: microchip: prepare PHY for proper advertisement

Prepare PHY for proper advertisement as sometimes the PHY in the switch
has its own problems even though it may share the PHY id from regular PHY
but the fixes in the PHY driver do not apply.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-marvell10g-Add-2-5GBaseT-support'
David S. Miller [Mon, 25 Feb 2019 01:45:25 +0000 (17:45 -0800)]
Merge branch 'net-phy-marvell10g-Add-2-5GBaseT-support'

Maxime Chevallier says:

====================
net: phy: marvell10g: Add 2.5GBaseT support

This series adds the missing bits necessary to fully support 2.5GBaseT
in the Marvell Alaska PHYs.

The main points for that support are :

 - Making use of the .get_features call, recently introduced by Heiner
   and Andrew, that allows having a fully populated list of supported
   modes, including 2500BaseT.

 - Configuring the MII to 2500BaseX when establishing a link at 2.5G

 - Adding a small quirk to take into account the fact that some PHYs in
   the family won't report the correct supported abilities

The rest of the series consists of small cosmetic improvements such as
using the correct helper to set a linkmode bit and adding macros for the
PHY ids.

We also add support for the 88E2110 PHY, which doesn't require the
quirk, and support for 2500BaseT in the PPv2 driver, in order to have a
fully working setup on the MacchiatoBin board.

Changes since V1 : Fixed formatting issue in patch 01, rebased.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: add support for the 88x2110 PHY
Maxime Chevallier [Fri, 22 Feb 2019 23:37:44 +0000 (00:37 +0100)]
net: phy: marvell10g: add support for the 88x2110 PHY

This patch adds support for the 88x2110 PHY, which is similar to the
already supported 88x3310 PHY without the SFP interface.

It supports 10/100/1000BASET along with 2.5GBASET, 5GBASET and 10GBASET,
with the same interface modes that are used by the 3310.

This PHY don't have the same issue as the 88x3310 regarding 2.5/5G
abilities, and correctly follows the 802.3bz standard to list the
supported abilities.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Suggested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: Add 2.5GBaseT support
Maxime Chevallier [Fri, 22 Feb 2019 23:37:43 +0000 (00:37 +0100)]
net: mvpp2: Add 2.5GBaseT support

The PPv2 controller is able to support 2.5G speeds, allowing to use
2.5GBASET in conjunction with PHYs that use 2500BASEX as their MII
interface when using this mode.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Force reading of 2.5/5G
Maxime Chevallier [Fri, 22 Feb 2019 23:37:42 +0000 (00:37 +0100)]
net: phy: marvell10g: Force reading of 2.5/5G

As per 802.3bz, if bit 14 of (1.11) "PMA Extended Abilities" indicates
whether or not we should read register (1.21) "2.52/5G PMA Extended
Abilities", which contains information on the support of 2.5GBASET and
5GBASET.

After testing on several variants of PHYS of this family, it appears
that bit 14 in (1.11) isn't always set when it should be.

PHYs 88X3310 (on MacchiatoBin) and 88E2010 do support 2.5G and 5GBASET,
but don't have 1.11.14 set. Their register 1.21 is filled with the
correct values, indicating 2.5G and 5G support.

PHYs 88E2110 do have their 1.11.14 bit set, as it should.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Use a #define for 88X3310 family id
Maxime Chevallier [Fri, 22 Feb 2019 23:37:41 +0000 (00:37 +0100)]
net: phy: marvell10g: Use a #define for 88X3310 family id

The PHY ID corresponding to the 88X3310 is also used for other PHYs in
the same family, such as the 88E2010. Use a #define for the PHY id, that
ignores the last nibble.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Use 2500BASEX when using 2.5GBASET
Maxime Chevallier [Fri, 22 Feb 2019 23:37:40 +0000 (00:37 +0100)]
net: phy: marvell10g: Use 2500BASEX when using 2.5GBASET

The Marvell Alaska family of PHYs supports 2.5GBaseT and 5GBaseT modes,
as defined in the 802.3bz specification.

Upon establishing a 2.5GBASET link, the PHY will reconfigure it's MII
interface to 2500BASEX.

At 5G, the PHY will reconfigure it's interface to 5GBASE-R, but this
mode isn't supported by any MAC for now.

This was tested with :
 - The 88X3310, which is on the MacchiatoBin
 - The 88E2010, an Alaska PHY that has no fiber interfaces, and is
   limited to 5G maximum speed.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Use linkmode_set_bit helper instead of __set_bit
Maxime Chevallier [Fri, 22 Feb 2019 23:37:39 +0000 (00:37 +0100)]
net: phy: marvell10g: Use linkmode_set_bit helper instead of __set_bit

Cosmetic patch making use of helpers dedicated to linkmodes handling.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Use get_features to get the PHY abilities
Maxime Chevallier [Fri, 22 Feb 2019 23:37:38 +0000 (00:37 +0100)]
net: phy: marvell10g: Use get_features to get the PHY abilities

The Alaska family of 10G PHYs has more abilities than the ones listed in
PHY_10GBIT_FULL_FEATURES, the exact list depending on the model.

Make use of the newly introduced .get_features call to build this list,
using genphy_c45_pma_read_abilities to build the list of supported
linkmodes, and adding autoneg ability based on what's reported by the AN
MMD.

.config_init is still used to validate the interface_mode.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: check PMAPMD link status only in genphy_c45_read_link
Heiner Kallweit [Fri, 22 Feb 2019 21:59:38 +0000 (22:59 +0100)]
net: phy: check PMAPMD link status only in genphy_c45_read_link

The current code reports a link as up if all devices (except a few
blacklisted ones) report the link as up. This breaks Aquantia AQCS109
for lower speeds because on this PHY the PCS link status reflects a
10G link only. For Marvell there's a similar issue, therefore PHYXS
device isn't checked.

There may be more PHYs where depending on the mode the link status
of only selected devices is relevant.

For now it seems to be sufficient to check the link status of the
PMAPMD device only. Leave the loop in the code to be prepared in
case we have to add functionality to check more than one device,
depending on the mode.

Successfully tested on a board with an AQCS109.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-switchdev-h-inclusion-removal'
David S. Miller [Mon, 25 Feb 2019 01:40:47 +0000 (17:40 -0800)]
Merge branch 'net-switchdev-h-inclusion-removal'

Florian Fainelli says:

====================
net: switchdev.h inclusion removal

This targets a few drivers that no longer to have net/switchdev.h
included.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Remove switchdev.h inclusion from team/bond/vlan
Florian Fainelli [Fri, 22 Feb 2019 20:31:34 +0000 (12:31 -0800)]
net: Remove switchdev.h inclusion from team/bond/vlan

This is no longer necessary after eca59f691566 ("net: Remove support for bridge bypass ndos from stacked devices")

Suggested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: Remove switchdev.h inclusion
Florian Fainelli [Fri, 22 Feb 2019 20:31:33 +0000 (12:31 -0800)]
nfp: Remove switchdev.h inclusion

This is no longer necessary after a5084bb71fa4 ("nfp: Implement
ndo_get_port_parent_id()")

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: lantiq: Do not use eth_change_mtu()
Hauke Mehrtens [Fri, 22 Feb 2019 19:12:57 +0000 (20:12 +0100)]
net: lantiq: Do not use eth_change_mtu()

eth_change_mtu() is not needed any more, the networking subsystem will
call it automatically when this callback is not implemented.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve definition of __ETHTOOL_LINK_MODE_MASK_NBITS
Heiner Kallweit [Fri, 22 Feb 2019 18:25:59 +0000 (19:25 +0100)]
net: phy: improve definition of __ETHTOOL_LINK_MODE_MASK_NBITS

The way to define __ETHTOOL_LINK_MODE_MASK_NBITS seems to be overly
complicated, go with a standard approach instead.
Whilst we're at it, move the comment to the right place.

v2:
- rebased

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-protodown-support-for-macvlan-and-vxlan'
David S. Miller [Sun, 24 Feb 2019 21:01:05 +0000 (13:01 -0800)]
Merge branch 'net-protodown-support-for-macvlan-and-vxlan'

Andy Roulin says:

====================
net: protodown support for macvlan and vxlan

This patch series adds dev_change_proto_down_generic, a generic
implementation of ndo_change_proto_down, which sets the netdev carrier
state according to the new proto_down value.

This handler adds the ability to set protodown on macvlan and vxlan
interfaces in a generic way for use by control protocols like VRRPD.

Patch (1) introduces the handler in net/code/dev.c. Patch (2) and (3) add
support for change_proto_down in macvlan and vxlan drivers, respectively,
using the new function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: add ndo_change_proto_down support
Andy Roulin [Fri, 22 Feb 2019 18:06:38 +0000 (18:06 +0000)]
vxlan: add ndo_change_proto_down support

Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomacvlan: add ndo_change_proto_down support
Andy Roulin [Fri, 22 Feb 2019 18:06:37 +0000 (18:06 +0000)]
macvlan: add ndo_change_proto_down support

Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dev: add generic protodown handler
Andy Roulin [Fri, 22 Feb 2019 18:06:36 +0000 (18:06 +0000)]
net: dev: add generic protodown handler

Introduce dev_change_proto_down_generic, a generic ndo_change_proto_down
implementation, which sets the netdev carrier state according to proto_down.

This adds the ability to set protodown on vxlan and macvlan devices in a
generic way for use by control protocols like VRRPD.

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Add-tests-for-unlocked-flower-classifier-implementation'
David S. Miller [Sun, 24 Feb 2019 20:49:59 +0000 (12:49 -0800)]
Merge branch 'Add-tests-for-unlocked-flower-classifier-implementation'

Vlad Buslov says:

====================
Add tests for unlocked flower classifier implementation

Implement tests for tdc testsuite to verify concurrent rules update with
rtnl-unlocked flower classifier implementation. The goal of these tests
is to verify general flower classifier correctness by updating filters
on same classifier instance in parallel and to verify its atomicity by
concurrently updating filters in same handle range. All three filter
update operations (add, replace, delete) are tested.

Existing script tdc_batch.py is re-used for batch file generation. It is
extended with several optional CLI arguments that are needed for
concurrency tests. Thin wrapper tdc_multibatch.py is implemented on top
of tdc_batch.py to simplify its usage when generating multiple batch
files for several test configurations.

Parallelism in tests is implemented by running multiple instances of tc
in batch mode with xargs tool. Xargs is chosen for its ease of use and
because it is available by default on most modern Linux distributions.
====================

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify parallel replace/delete
Vlad Buslov [Fri, 22 Feb 2019 14:00:47 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel replace/delete

Implement test that runs 5 instances of tc replace filter in parallel with
5 instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify parallel add/delete
Vlad Buslov [Fri, 22 Feb 2019 14:00:46 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel add/delete

Implement test that runs 5 instances of tc add filter in parallel with 5
instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.

Extend tdc_multibatch.py with additional options required to implement the
test: common prefix for all generated batch files, first value of filter
handle range, MAC address prefix modifier. These are necessary to allow
creating batch files with unique keys and handle ranges with multiple
invocation of tdc_multibatch.py helper script.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify concurrent delete
Vlad Buslov [Fri, 22 Feb 2019 14:00:45 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify concurrent delete

Implement test that verifies concurrent deletion of rules by executing 10
tc instances that delete flower filters in same handle range. In this case
only one tc instance succeeds in deleting a filter with particular handle.
To mitigate expected failures of all other instances, run tc with 'force'
option to continue processing batch file in case of errors and expect xargs
to return code '123' that indicates that invocation of command(s) exited
with error in range 1-125.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify concurrent replace
Vlad Buslov [Fri, 22 Feb 2019 14:00:44 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify concurrent replace

Implement test that verifies concurrent replacement of rules by executing
10 tc instances that replace flower filters in same handle range.

Extend tdc_multibatch.py script with new optional CLI argument that is used
to generate all batch files with same filter handle range.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify parallel rules replace
Vlad Buslov [Fri, 22 Feb 2019 14:00:43 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules replace

Implement test that verifies parallel rules replacement by adding 1 million
flower filters and then replacing them with 10 concurrent tc instances.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify parallel rules deletion
Vlad Buslov [Fri, 22 Feb 2019 14:00:42 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules deletion

Implement test that verifies parallel rules deletion by adding 1 million
flower filters and then deleting them with 10 concurrent tc instances.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: concurrency: add test to verify parallel rules insertion
Vlad Buslov [Fri, 22 Feb 2019 14:00:41 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules insertion

Implement test that verifies parallel rules insertion by adding 1 million
flower filters with 10 concurrent tc instances. Put it to standalone
'concurrency' category.

Implement tdc_multibatch.py helper script that is used to generate multiple
batch files for concurrent tc execution. Extend config with new 'BATCH_DIR'
variable to specify temporary output directory that is used to store batch
files generated by tdc_multibatch.py.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: tdc_batch.py: add options needed for concurrency tests
Vlad Buslov [Fri, 22 Feb 2019 14:00:40 +0000 (16:00 +0200)]
selftests: tdc_batch.py: add options needed for concurrency tests

Extend tdc_batch.py with several optional CLI arguments that are used for
implementation of concurrency tests in following patches in this set:

- Add optional argument to specify range of filter handles used in batch
  file [fitler_handle, filter_handle+number). This is needed for testing
  filter deletion where it is necessary to know exact handles of configured
  filters.

- Add optional argument to specify filter operation type (possible values
  are ['add', 'del', 'replace']) instead of hardcoded "add" value. This
  allows generation of batches for filter addition, deletion and
  replacement.

- Add optional argument to allow user to change mac address prefix that is
  used for all filters in batch. This is necessary to allow generating
  multiple batches with unique flower classifier keys.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Skip GSO length estimation if transport header is not set
Maxim Mikityanskiy [Fri, 22 Feb 2019 12:55:22 +0000 (12:55 +0000)]
net: Skip GSO length estimation if transport header is not set

qdisc_pkt_len_init expects transport_header to be set for GSO packets.
Patch [1] skips transport_header validation for GSO packets that don't
have network_header set at the moment of calling virtio_net_hdr_to_skb,
and allows them to pass into the stack. After patch [2] no placeholder
value is assigned to transport_header if dissection fails, so this patch
adds a check to the place where the value of transport_header is used.

[1] https://patchwork.ozlabs.org/patch/1044429/
[2] https://patchwork.ozlabs.org/patch/1046122/

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodoc: add phylink documentation to the networking book
Russell King [Fri, 22 Feb 2019 11:31:46 +0000 (11:31 +0000)]
doc: add phylink documentation to the networking book

Add some phylink documentation to the networking book detailing how
to convert network drivers from phylib to phylink.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: update mac_config() documentation
Russell King [Fri, 22 Feb 2019 11:31:41 +0000 (11:31 +0000)]
net: phylink: update mac_config() documentation

A detail for mac_config() had been missed in the documentation for the
method - it is expected that the method will update the MAC to the
settings, rather than completely reprogram the MAC on each call.
Update the documentation for this method for this detail.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Use RCU_INIT_POINTER() to set sk_wq
Li RongQing [Fri, 22 Feb 2019 09:08:22 +0000 (17:08 +0800)]
net: Use RCU_INIT_POINTER() to set sk_wq

This pointer is RCU protected, so proper primitives should be used.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: let genphy_c45_read_abilities also check aneg capability
Heiner Kallweit [Fri, 22 Feb 2019 07:23:04 +0000 (08:23 +0100)]
net: phy: let genphy_c45_read_abilities also check aneg capability

When using genphy_c45_read_abilities() as get_features callback we
also have to set the autoneg capability in phydev->supported.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 24 Feb 2019 19:48:04 +0000 (11:48 -0800)]
Merge git://git./linux/kernel/git/davem/net

Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.

The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.

However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 24 Feb 2019 17:47:07 +0000 (09:47 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bug fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
  kvm: x86: Return LA57 feature based on hardware capability
  x86/kvm/mmu: fix switch between root and guest MMUs
  s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sun, 24 Feb 2019 17:28:26 +0000 (09:28 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Hopefully the last pull request for this release. Fingers crossed:

   1) Only refcount ESP stats on full sockets, from Martin Willi.

   2) Missing barriers in AF_UNIX, from Al Viro.

   3) RCU protection fixes in ipv6 route code, from Paolo Abeni.

   4) Avoid false positives in untrusted GSO validation, from Willem de
      Bruijn.

   5) Forwarded mesh packets in mac80211 need more tailroom allocated,
      from Felix Fietkau.

   6) Use operstate consistently for linkup in team driver, from George
      Wilkie.

   7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
      and PF code paths.

   8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.

   9) nfp eBPF code gen fixes from Jiong Wang.

  10) bnxt_en firmware timeout fix from Michael Chan.

  11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.

  12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  net: phy: realtek: Dummy IRQ calls for RTL8366RB
  tcp: repaired skbs must init their tso_segs
  net/x25: fix a race in x25_bind()
  net: dsa: Remove documentation for port_fdb_prepare
  Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
  selftests: fib_tests: sleep after changing carrier. again.
  net: set static variable an initial value in atl2_probe()
  net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
  bpf, doc: add bpf list as secondary entry to maintainers file
  udp: fix possible user after free in error handler
  udpv6: fix possible user after free in error handler
  fou6: fix proto error handler argument type
  udpv6: add the required annotation to mib type
  mdio_bus: Fix use-after-free on device_register fails
  net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
  bnxt_en: Wait longer for the firmware message response to complete.
  bnxt_en: Fix typo in firmware message timeout logic.
  nfp: bpf: fix ALU32 high bits clearance bug
  nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
  Documentation: networking: switchdev: Update port parent ID section
  ...

5 years agotrace: events: neigh_update: print new state in string format
Roopa Prabhu [Sun, 24 Feb 2019 06:25:12 +0000 (22:25 -0800)]
trace: events: neigh_update: print new state in string format

Also, extend neigh_state_str to include neigh dummy states
noarp and permanent

Fixes: 9c03b282badb ("trace: events: add a few neigh tracepoints")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: realtek: Dummy IRQ calls for RTL8366RB
Linus Walleij [Sun, 24 Feb 2019 00:11:15 +0000 (01:11 +0100)]
net: phy: realtek: Dummy IRQ calls for RTL8366RB

This fixes a regression introduced by
commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7
"net: phy: replace PHY_HAS_INTERRUPT with a check for
config_intr and ack_interrupt".

This assumes that a PHY cannot trigger interrupt unless
it has .config_intr() or .ack_interrupt() implemented.
A later patch makes the code assume both need to be
implemented for interrupts to be present.

But this PHY (which is inside a DSA) will happily
fire interrupts without either callback.

Implement dummy callbacks for .config_intr() and
.ack_interrupt() in the phy header to fix this.

Tested on the RTL8366RB on D-Link DIR-685.

Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: repaired skbs must init their tso_segs
Eric Dumazet [Sat, 23 Feb 2019 23:51:51 +0000 (15:51 -0800)]
tcp: repaired skbs must init their tso_segs

syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
in tcp_send_loss_probe() [1]

This was caused by TCP_REPAIR sent skbs that inadvertenly
were missing a call to tcp_init_tso_segs()

[1]
WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x45 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb <0f> 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89
RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206
RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb
RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005
RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1
R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040
R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860
 tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583
 tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
 </IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc
RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000
 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555
 default_idle_call+0x36/0x90 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x386/0x570 kernel/sched/idle.c:262
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353
 start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/x25: fix a race in x25_bind()
Eric Dumazet [Sat, 23 Feb 2019 21:24:59 +0000 (13:24 -0800)]
net/x25: fix a race in x25_bind()

syzbot was able to trigger another soft lockup [1]

I first thought it was the O(N^2) issue I mentioned in my
prior fix (f657d22ee1f "net/x25: do not hold the cpu
too long in x25_new_lci()"), but I eventually found
that x25_bind() was not checking SOCK_ZAPPED state under
socket lock protection.

This means that multiple threads can end up calling
x25_insert_socket() for the same socket, and corrupt x25_list

[1]
watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.2:10492]
Modules linked in:
irq event stamp: 27515
hardirqs last  enabled at (27514): [<ffffffff81006673>] trace_hardirqs_on_thunk+0x1a/0x1c
hardirqs last disabled at (27515): [<ffffffff8100668f>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last  enabled at (32): [<ffffffff8632ee73>] x25_get_neigh+0xa3/0xd0 net/x25/x25_link.c:336
softirqs last disabled at (34): [<ffffffff86324bc3>] x25_find_socket+0x23/0x140 net/x25/af_x25.c:341
CPU: 0 PID: 10492 Comm: syz-executor.2 Not tainted 5.0.0-rc7+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x50 kernel/kcov.c:97
Code: f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00 00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 <48> 8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c 92 7e 81 e2
RSP: 0018:ffff88806e94fc48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffff1100d84dac5 RBX: 0000000000000001 RCX: ffffc90006197000
RDX: 0000000000040000 RSI: ffffffff86324bf3 RDI: ffff88806c26d628
RBP: ffff88806e94fc48 R08: ffff88806c1c6500 R09: fffffbfff1282561
R10: fffffbfff1282560 R11: ffffffff89412b03 R12: ffff88806c26d628
R13: ffff888090455200 R14: dffffc0000000000 R15: 0000000000000000
FS:  00007f3a107e4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3a107e3db8 CR3: 00000000a5544000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __x25_find_socket net/x25/af_x25.c:327 [inline]
 x25_find_socket+0x7d/0x140 net/x25/af_x25.c:342
 x25_new_lci net/x25/af_x25.c:355 [inline]
 x25_connect+0x380/0xde0 net/x25/af_x25.c:784
 __sys_connect+0x266/0x330 net/socket.c:1662
 __do_sys_connect net/socket.c:1673 [inline]
 __se_sys_connect net/socket.c:1670 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1670
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a107e3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29
RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000005
RBP: 000000000073c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a107e46d4
R13: 00000000004be362 R14: 00000000004ceb98 R15: 00000000ffffffff
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10493 Comm: syz-executor.3 Not tainted 5.0.0-rc7+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
RIP: 0010:queued_write_lock_slowpath+0x143/0x290 kernel/locking/qrwlock.c:86
Code: 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 41 0f b6 55 00 <41> 38 d7 7c eb 84 d2 74 e7 48 89 df e8 cc aa 4e 00 eb dd be 04 00
RSP: 0018:ffff888085c47bd8 EFLAGS: 00000206
RAX: 0000000000000300 RBX: ffffffff89412b00 RCX: 1ffffffff1282560
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89412b00
RBP: ffff888085c47c70 R08: 1ffffffff1282560 R09: fffffbfff1282561
R10: fffffbfff1282560 R11: ffffffff89412b03 R12: 00000000000000ff
R13: fffffbfff1282560 R14: 1ffff11010b88f7d R15: 0000000000000003
FS:  00007fdd04086700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdd04064db8 CR3: 0000000090be0000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
 do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
 __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
 _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
 x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
 x25_bind+0x273/0x340 net/x25/af_x25.c:703
 __sys_bind+0x23f/0x290 net/socket.c:1481
 __do_sys_bind net/socket.c:1492 [inline]
 __se_sys_bind net/socket.c:1490 [inline]
 __x64_sys_bind+0x73/0xb0 net/socket.c:1490
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e29

Fixes: 90c27297a9bf ("X.25 remove bkl in bind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: andrew hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Remove documentation for port_fdb_prepare
Hauke Mehrtens [Fri, 22 Feb 2019 19:07:45 +0000 (20:07 +0100)]
net: dsa: Remove documentation for port_fdb_prepare

This callback was removed some time ago, also remove the documentation.

Fixes: 1b6dd556c304 ("net: dsa: Remove prepare phase for FDB")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "bridge: do not add port to router list when receives query with source 0...
Hangbin Liu [Fri, 22 Feb 2019 13:22:32 +0000 (21:22 +0800)]
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"

This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
bridge: remove ipv6 zero address check in mcast queries")

The reason is RFC 4541 is not a standard but suggestive. Currently we
will elect 0.0.0.0 as Querier if there is no ip address configured on
bridge. If we do not add the port which recives query with source
0.0.0.0 to router list, the IGMP reports will not be about to forward
to Querier, IGMP data will also not be able to forward to dest.

As Nikolay suggested, revert this change first and add a boolopt api
to disable none-zero election in future if needed.

Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de>
Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: fib_tests: sleep after changing carrier. again.
Thadeu Lima de Souza Cascardo [Fri, 22 Feb 2019 10:27:41 +0000 (07:27 -0300)]
selftests: fib_tests: sleep after changing carrier. again.

Just like commit e2ba732a1681 ("selftests: fib_tests: sleep after
changing carrier"), wait one second to allow linkwatch to propagate the
carrier change to the stack.

There are two sets of carrier tests. The first slept after the carrier
was set to off, and when the second set ran, it was likely that the
linkwatch would be able to run again without much delay, reducing the
likelihood of a race. However, if you run 'fib_tests.sh -t carrier' on a
loop, you will quickly notice the failures.

Sleeping on the second set of tests make the failures go away.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-aquantia-improve-and-extend-driver'
David S. Miller [Sat, 23 Feb 2019 22:12:10 +0000 (14:12 -0800)]
Merge branch 'net-phy-aquantia-improve-and-extend-driver'

Heiner Kallweit says:

====================
net: phy: aquantia: improve and extend driver

This series improves and extends the Aquantia PHY driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: use genphy_c45_read_status
Heiner Kallweit [Fri, 22 Feb 2019 22:52:32 +0000 (23:52 +0100)]
net: phy: aquantia: use genphy_c45_read_status

Use new function genphy_c45_read_status(). 1000BaseT link partner
advertisement needs to be read from vendor registers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: add genphy_c45_read_status
Heiner Kallweit [Fri, 22 Feb 2019 22:51:44 +0000 (23:51 +0100)]
net: phy: add genphy_c45_read_status

Similar to genphy_read_status() for Clause 22 add a generic read_status
function for Clause 45.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: don't change modes we don't care about in genphy_c45_read_lpa
Heiner Kallweit [Fri, 22 Feb 2019 22:50:49 +0000 (23:50 +0100)]
net: phy: don't change modes we don't care about in genphy_c45_read_lpa

Because 1000BaseT isn't covered by Clause 45, the 1000BaseT flags in
phydev->lp_advertising may have been set based on vendor registers
already. genphy_c45_read_lpa() would clear these flags as of today.
Therefore switch to mii_lpa_mod_linkmode_lpa_t.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: add support for auto-negotiation configuration
Andrew Lunn [Fri, 22 Feb 2019 22:49:54 +0000 (23:49 +0100)]
net: phy: aquantia: add support for auto-negotiation configuration

Make use of the generic c45 code, plus code specific to the Aquantia
phy for 1000BaseT negotiation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: remove false 5G and 10G speed ability for AQCS109
Heiner Kallweit [Fri, 22 Feb 2019 22:48:14 +0000 (23:48 +0100)]
net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109

AQCS109 belongs to a family of PHY's where certain members don't
support 5G or 10G. However for all members of the family the chip
reports 10G and 5G capability. Therefore remove the not supported
modes for AQCS109.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-updates-2019-02-21' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 23 Feb 2019 21:56:25 +0000 (13:56 -0800)]
Merge tag 'mlx5-updates-2019-02-21' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-02-21

This series adds some misc updates to mlx5 driver,

1) Eli Britstein, Introduces tunnel entropy control from PCMR register
and fixes GRE key by controlling port tunnel entropy calculation.

2) Eran Ben Elisha, provides some mlx5 fixes to the latest tx devlink health
reporting mechanism.

3) Huy Nguyen, Added the support for ndo bridge_setlink to allow
   VEPA/VEB E-Switch legacy mode configurations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Add-support-for-new-port-types-and-speeds-for-Spectrum-2'
David S. Miller [Sat, 23 Feb 2019 21:54:36 +0000 (13:54 -0800)]
Merge branch 'mlxsw-Add-support-for-new-port-types-and-speeds-for-Spectrum-2'

Ido Schimmel says:

====================
mlxsw: Add support for new port types and speeds for Spectrum-2

Shalom says:

This patchset adds support for new port types and speeds for Spectrum-2.

Patch #1 + #2 removes an unsupported PTYS field and a duplicate link
mode entry.

Patch #3 queries port's connector type from firmware instead of deriving
it from port admin state.

Patch #4 renames functions which relate to port type-speed to be
Spectrum-1 specific.

Patch #5 defines port type-speed operations and applies it for
Spectrum-1.

Patch #6 + #7 are small renaming and cosmetic changes.

Patch #8 adds new port type-speed fields for PTYS register. These new
fields extend the existing ones in order to support more types and
speeds.

Patch #9 adds Spectrum-2 support for port type-speed operations.

Patch #10 adds Spectrum-2 new port types and speeds.

For Spectrum-2, the user must configure all the types per speed if he /
she wants a specific speed to be advertised. For example, if the user
wants to advertise 100Gbps 4-lanes speed, the following ethtool bits
should be advertised:

  Supported ethtool bits for 100Gbps 4-lanes:
      0x1000000000      100000baseKR4 Full
      0x2000000000      100000baseSR4 Full
      0x4000000000      100000baseCR4 Full
      0x8000000000      100000baseLR4_ER4 Full

  Command for advertising 100Gbps 4-lanes:
      ethtool -s enp3s0np1 advertise 0xF000000000
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Add Spectrum-2 ASIC support for new port types and speeds
Shalom Toledo [Fri, 22 Feb 2019 13:56:46 +0000 (13:56 +0000)]
mlxsw: spectrum: Add Spectrum-2 ASIC support for new port types and speeds

Add Spectrum-2 ASIC support for the following new port types and speeds:
  * 50Gbps 1-lane
  * 100Gbps 2-lanes
  * 200Gbps 4-lanes

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Add Spectrum-2 ASIC port type-speed operations
Shalom Toledo [Fri, 22 Feb 2019 13:56:45 +0000 (13:56 +0000)]
mlxsw: spectrum: Add Spectrum-2 ASIC port type-speed operations

Add Spectrum-2 ASIC port type-speed operations.

Since multiple ethtool link modes are represented using a single bit in the
ASIC, the driver forces the user to configure all types per a specific
speed. For example, if the user wants to advertise 100Gbps 4-lanes speed,
he should advertise all the types of 100Gbps 4-lanes speed that are
supported by the ASIC as shown below:

  Supported ethtool bits for 100Gbps 4-lanes:
      0x1000000000      100000baseKR4 Full
      0x2000000000      100000baseSR4 Full
      0x4000000000      100000baseCR4 Full
      0x8000000000      100000baseLR4_ER4 Full

  Command for advertising 100Gbps 4-lanes:
      ethtool -s enp3s0np1 advertise 0xF000000000

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Add new port type-speed fields for PTYS register
Shalom Toledo [Fri, 22 Feb 2019 13:56:44 +0000 (13:56 +0000)]
mlxsw: reg: Add new port type-speed fields for PTYS register

PTYS register introduces a new layout for port type-speed fields. These
fields extend the existing ones in order to handle more types and speeds.
For example, the new 200Gbps speed.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: 80 columns wrapping change
Shalom Toledo [Fri, 22 Feb 2019 13:56:42 +0000 (13:56 +0000)]
mlxsw: reg: 80 columns wrapping change

80 columns wrapping change in mlxsw_reg_ptys_eth_unpack function.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Rename p_eth_proto_adm to full name p_eth_proto_admin
Shalom Toledo [Fri, 22 Feb 2019 13:56:41 +0000 (13:56 +0000)]
mlxsw: reg: Rename p_eth_proto_adm to full name p_eth_proto_admin

Rename p_eth_proto_adm to p_eth_proto_admin in mlxsw_reg_ptys_eth_unpack
function.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Add port type-speed operations
Shalom Toledo [Fri, 22 Feb 2019 13:56:40 +0000 (13:56 +0000)]
mlxsw: spectrum: Add port type-speed operations

Add port type-speed operations in order to have different operations for
different ASICs. For now, both ASICs use the same pointer.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Rename port type-speed functions to ASIC specific
Shalom Toledo [Fri, 22 Feb 2019 13:56:39 +0000 (13:56 +0000)]
mlxsw: spectrum: Rename port type-speed functions to ASIC specific

Rename port speed-type functions to be Spectrum-1 ASIC specific.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Query port connector type from firmware
Shalom Toledo [Fri, 22 Feb 2019 13:56:38 +0000 (13:56 +0000)]
mlxsw: spectrum: Query port connector type from firmware

Instead of deriving the port connector type from port admin state, query it
from firmware.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Remove unsupported eth_proto_lp_advertise field in PTYS
Shalom Toledo [Fri, 22 Feb 2019 13:56:37 +0000 (13:56 +0000)]
mlxsw: spectrum: Remove unsupported eth_proto_lp_advertise field in PTYS

Remove eth_proto_lp_advertise field in PTYS register since it is not
supported by the firmware.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Remove duplicate port link mode entry
Shalom Toledo [Fri, 22 Feb 2019 13:56:36 +0000 (13:56 +0000)]
mlxsw: spectrum: Remove duplicate port link mode entry

Remove duplicate port link mode entry from mlxsw_sp_port_link_mode.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: set static variable an initial value in atl2_probe()
Mao Wenan [Fri, 22 Feb 2019 06:57:23 +0000 (14:57 +0800)]
net: set static variable an initial value in atl2_probe()

cards_found is a static variable, but when it enters atl2_probe(),
cards_found is set to zero, the value is not consistent with last probe,
so next behavior is not our expect.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agokcm: Remove unnecessary SLAB_PANIC for kmem_cache_create() in kcm_init
YueHaibing [Fri, 22 Feb 2019 06:15:30 +0000 (14:15 +0800)]
kcm: Remove unnecessary SLAB_PANIC for kmem_cache_create() in kcm_init

There has check NULL on kmem_cache_create on failure in kcm_init,
no need use SLAB_PANIC to panic the system.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-Wformat-fixes'
David S. Miller [Sat, 23 Feb 2019 21:44:58 +0000 (13:44 -0800)]
Merge branch 'net-Wformat-fixes'

Florian Fainelli says:

====================
-Wformat fixes

This is a collection of some -Wformat fixes found during build, nothing
critical, but nice to have for people turning on more warnings with
their builds.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoveth: Fix -Wformat-truncation
Florian Fainelli [Fri, 22 Feb 2019 04:09:29 +0000 (20:09 -0800)]
veth: Fix -Wformat-truncation

Provide a precision hint to snprintf() in order to eliminate a
-Wformat-truncation warning provided below. A maximum of 11 characters
is allowed to reach a maximum of 32 - 1 characters given a possible
maximum value of queues using up to UINT_MAX which occupies 10
characters. Incidentally 11 is the number of characters for
"xdp_packets" which is the largest string we append.

drivers/net/veth.c: In function 'veth_get_strings':
drivers/net/veth.c:118:47: warning: '%s' directive output may be
truncated writing up to 31 bytes into a region of size between 12 and 21
[-Wformat-truncation=]
     snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
                                               ^~
drivers/net/veth.c:118:5: note: 'snprintf' output between 12 and 52
bytes into a destination of size 32
     snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       i, veth_rq_stats_desc[j].desc);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoe1000e: Fix -Wformat-truncation warnings
Florian Fainelli [Fri, 22 Feb 2019 04:09:28 +0000 (20:09 -0800)]
e1000e: Fix -Wformat-truncation warnings

Provide precision hints to snprintf() since we know the destination
buffer size of the RX/TX ring names are IFNAMSIZ + 5 - 1. This fixes the
following warnings:

drivers/net/ethernet/intel/e1000e/netdev.c: In function
'e1000_request_msix':
drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
     "%s-rx-0", netdev->name);
             ^
drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
   snprintf(adapter->rx_ring->name,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     sizeof(adapter->rx_ring->name) - 1,
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     "%s-rx-0", netdev->name);
     ~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
     "%s-tx-0", netdev->name);
             ^
drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
   snprintf(adapter->tx_ring->name,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     sizeof(adapter->tx_ring->name) - 1,
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     "%s-tx-0", netdev->name);
     ~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Fix -Wformat-security warnings
Florian Fainelli [Fri, 22 Feb 2019 04:09:27 +0000 (20:09 -0800)]
net: dsa: mv88e6xxx: Fix -Wformat-security warnings

We are not specifying an explicit format argument but instead passing a
string litteral which causes these two warnings to show up:

drivers/net/dsa/mv88e6xxx/chip.c: In function
'mv88e6xxx_irq_poll_setup':
drivers/net/dsa/mv88e6xxx/chip.c:483:2: warning: format not a string
literal and no format arguments [-Wformat-security]
  chip->kworker = kthread_create_worker(0, dev_name(chip->dev));
  ^~~~
drivers/net/dsa/mv88e6xxx/ptp.c: In function 'mv88e6xxx_ptp_setup':
drivers/net/dsa/mv88e6xxx/ptp.c:403:4: warning: format not a string
literal and no format arguments [-Wformat-security]
    dev_name(chip->dev));
    ^~~~~~~~
  LD [M]  drivers/net/dsa/mv88e6xxx/mv88e6xxx.o

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Avoid -Wformat-truncation warnings
Florian Fainelli [Fri, 22 Feb 2019 04:09:26 +0000 (20:09 -0800)]
mlxsw: spectrum: Avoid -Wformat-truncation warnings

Give precision identifiers to the two snprintf() formatting the priority
and TC strings to avoid producing these two warnings:

drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_prio_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
directive output may be truncated writing between 1 and 3 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
   snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                     ^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
output between 3 and 36 bytes into a destination of size 32
   snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     mlxsw_sp_port_hw_prio_stats[i].str, prio);
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_tc_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
directive output may be truncated writing between 1 and 11 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
   snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                     ^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
output between 3 and 44 bytes into a destination of size 32
   snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     mlxsw_sp_port_hw_tc_stats[i].str, tc);
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobpfilter: re-add header search paths to tools include to fix build error
Masahiro Yamada [Fri, 22 Feb 2019 03:23:19 +0000 (12:23 +0900)]
bpfilter: re-add header search paths to tools include to fix build error

I thought header search paths to tools/include(/uapi) were unneeded,
but it looks like a build error occurs depending on the compiler.

Commit 303a339f30a9 ("bpfilter: remove extra header search paths for
bpfilter_umh") reintroduced the build error fixed by commit ae40832e53c3
("bpfilter: fix a build err").

Apology for the breakage, and thanks to Guenter for reporting this.

Fixes: 303a339f30a9 ("bpfilter: remove extra header search paths for bpfilter_umh")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
Maxime Chevallier [Thu, 21 Feb 2019 16:54:11 +0000 (17:54 +0100)]
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G

Some Marvell Alaska PHYs support 2.5G, 5G and 10G BaseT links. Their
default behaviour is to advertise all of these modes, but at the moment,
only 10GBaseT is supported. To prevent link partners from establishing
link at that speed, clear these modes upon configuring aneg parameters.

Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 23 Feb 2019 19:13:50 +0000 (11:13 -0800)]
Merge tag 'powerpc-5.0-6' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One fix for an oops when using SRIOV, introduced by the recent changes
  to support compound IOMMU groups.

  Thanks to Alexey Kardashevskiy"

* tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv/sriov: Register IOMMU groups for VFs

5 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 23 Feb 2019 17:48:01 +0000 (09:48 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes: three in drivers and one in the core.

  The core fix is also minor in scope since the bug it fixes is only
  known to affect systems using SCSI reservations. Of the driver bugs,
  the libsas one is the most major because it can lead to multiple disks
  on the same expander not being exposed"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: core: reset host byte in DID_NEXUS_FAILURE case
  scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
  scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation
  scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sat, 23 Feb 2019 04:45:38 +0000 (20:45 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-02-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a bug in BPF's LPM deletion logic to match correct prefix
   length, from Alban.

2) Fix AF_XDP teardown by not destroying umem prematurely as it
   is still needed till all outstanding skbs are freed, from Björn.

3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking
   signal_pending() outside need_resched() condition which is never
   triggered there, from Stanislav.

4) Fix two nfp JIT bugs, one in code emission for K-based xor, and
   another one to explicitly clear upper bits in alu32, from Jiong.

5) Add bpf list address to maintainers file, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'fixes-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorri...
Linus Torvalds [Sat, 23 Feb 2019 01:48:50 +0000 (17:48 -0800)]
Merge branch 'fixes-v5.0-rc7' of git://git./linux/kernel/git/jmorris/linux-security

Pull keys fixes from James Morris:
 "Two fixes from Eric Biggers"

* 'fixes-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  KEYS: always initialize keyring_index_key::desc_len
  KEYS: user: Align the payload buffer

5 years agoMerge tag 'pm-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 23 Feb 2019 01:46:30 +0000 (17:46 -0800)]
Merge tag 'pm-5.0' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a regression in the PM-runtime framework introduced by the
  recent switch-over of it to using hrtimers and a use-after-free
  introduced by one of the recent changes in the scmi-cpufreq driver.

  Specifics:

   - Use hrtimer_try_to_cancel() instead of hrtimer_cancel() in the
     PM-runtime framework to avoid a possible timer-related deadlock
     introduced recently (Vincent Guittot).

   - Reorder the scmi-cpufreq driver code to avoid accessing memory that
     has just been freed (Yangtao Li)"

* tag 'pm-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM-runtime: Fix deadlock when canceling hrtimer
  cpufreq: scmi: Fix use-after-free in scmi_cpufreq_exit()

5 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sat, 23 Feb 2019 00:48:37 +0000 (16:48 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Only a handful of device tree fixes, all simple enough:

  NVIDIA Tegra:
   - Fix a regression for booting on chromebooks

  TI OMAP:
   - Two fixes PHY mode on am335x reference boards

  Marvell mvebu:
   - A regression fix for Armada XP NAND flash controllers
   - An incorrect reset signal on the clearfog board"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
  ARM: dts: am335x-evm: Fix PHY mode for ethernet
  ARM: dts: am335x-evmsk: Fix PHY mode for ethernet
  arm64: dts: clearfog-gt-8k: fix SGMII PHY reset signal
  ARM: dts: armada-xp: fix Armada XP boards NAND description

5 years agoMerge tag 'arc-5.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Sat, 23 Feb 2019 00:31:26 +0000 (16:31 -0800)]
Merge tag 'arc-5.0-final' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "Fixes for ARC for 5.0, bunch of those are stable fodder anyways so
  sooner the better.

   - Fix memcpy to prevent prefetchw beyond end of buffer [Eugeniy]

   - Enable unaligned access early to prevent exceptions given newer gcc
     code gen [Eugeniy]

   - Tighten up uboot arg checking to prevent false negatives and also
     allow both jtag and bootloading to coexist w/o config option as
     needed by kernelCi folks [Eugeniy]

   - Set slab alignment to 8 for ARC to avoid the atomic64_t unalign
     [Alexey]

   - Disable regfile auto save on interrupts on HSDK platform due to a
     silicon issue [Vineet]

   - Avoid HS38x boot printing crash by not reading HS48x only reg
     [Vineet]"

* tag 'arc-5.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: don't assume core 0x54 has dual issue
  ARC: define ARCH_SLAB_MINALIGN = 8
  ARC: enable uboot support unconditionally
  ARC: U-boot: check arguments paranoidly
  ARCv2: support manual regfile save on interrupts
  ARC: uacces: remove lp_start, lp_end from clobber list
  ARC: fix actionpoints configuration detection
  ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
  ARCv2: Enable unaligned access in early ASM code

5 years agobpf, doc: add bpf list as secondary entry to maintainers file
Daniel Borkmann [Fri, 22 Feb 2019 23:03:44 +0000 (00:03 +0100)]
bpf, doc: add bpf list as secondary entry to maintainers file

We recently created a bpf@vger.kernel.org list (https://lore.kernel.org/bpf/)
for BPF related discussions, originally in context of BPF track at LSF/MM
for topic discussions. It's *optional* but *desirable* to keep it in Cc for
BPF related kernel/loader/llvm/tooling threads, meaning also infrastructure
like llvm that sits on top of kernel but is crucial to BPF. In any case,
netdev with it's bpf delegate is *as-is* today primary list for patches, so
nothing changes in the workflow. Main purpose is to have some more awareness
for the bpf@vger.kernel.org list that folks can Cc for BPF specific topics.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'parisc-5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sat, 23 Feb 2019 00:12:01 +0000 (16:12 -0800)]
Merge branch 'parisc-5.0-1' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Fix ptrace syscall number modification which has been broken since
  kernel v4.5 and provide alternative email addresses for the remaining
  users of the retired parisc-linux.org email domain"

* 'parisc-5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  CREDITS/MAINTAINERS: Retire parisc-linux.org email domain
  parisc: Fix ptrace syscall number modification

5 years agoMerge tag 'kbuild-fixes-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 23 Feb 2019 00:09:55 +0000 (16:09 -0800)]
Merge tag 'kbuild-fixes-v5.0-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild fixes from Masahiro Yamada:

 - fix scripts/kallsyms.c to correctly check too long symbol names

 - fix sh build error for the combination of CONFIG_OF_EARLY_FLATTREE=y
   and CONFIG_USE_BUILTIN_DTB=n

* tag 'kbuild-fixes-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  sh: fix build error for invisible CONFIG_BUILTIN_DTB_SOURCE
  kallsyms: Handle too long symbols in kallsyms.c

5 years agoMerge branch 'udp-a-few-fixes'
David S. Miller [Sat, 23 Feb 2019 00:05:12 +0000 (16:05 -0800)]
Merge branch 'udp-a-few-fixes'

Paolo Abeni says:

====================
udp: a few fixes

This series includes some UDP-related fixlet. All this stuff has been
pointed out by the sparse tool. The first two patches are just annotation
related, while the last 2 cover some very unlikely races.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: fix possible user after free in error handler
Paolo Abeni [Thu, 21 Feb 2019 16:44:00 +0000 (17:44 +0100)]
udp: fix possible user after free in error handler

Similar to the previous commit, this addresses the same issue for
ipv4: use a single fetch operation and use the correct rcu
annotation.

Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudpv6: fix possible user after free in error handler
Paolo Abeni [Thu, 21 Feb 2019 16:43:59 +0000 (17:43 +0100)]
udpv6: fix possible user after free in error handler

Before derefencing the encap pointer, commit e7cc082455cb ("udp: Support
for error handlers of tunnels with arbitrary destination port") checks
for a NULL value, but the two fetch operation can race with removal.
Fix the above using a single access.
Also fix a couple of type annotations, to make sparse happy.

Fixes: e7cc082455cb ("udp: Support for error handlers of tunnels with arbitrary destination port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofou6: fix proto error handler argument type
Paolo Abeni [Thu, 21 Feb 2019 16:43:58 +0000 (17:43 +0100)]
fou6: fix proto error handler argument type

Last argument of gue6_err_proto_handler() has a wrong type annotation,
fix it and make sparse happy again.

Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudpv6: add the required annotation to mib type
Paolo Abeni [Thu, 21 Feb 2019 16:43:57 +0000 (17:43 +0100)]
udpv6: add the required annotation to mib type

In commit 029a37434880 ("udp6: cleanup stats accounting in recvmsg()")
I forgot to add the percpu annotation for the mib pointer. Add it, and
make sparse happy.

Fixes: 029a37434880 ("udp6: cleanup stats accounting in recvmsg()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomdio_bus: Fix use-after-free on device_register fails
YueHaibing [Thu, 21 Feb 2019 14:42:01 +0000 (22:42 +0800)]
mdio_bus: Fix use-after-free on device_register fails

KASAN has found use-after-free in fixed_mdio_bus_init,
commit 0c692d07842a ("drivers/net/phy/mdio_bus.c: call
put_device on device_register() failure") call put_device()
while device_register() fails,give up the last reference
to the device and allow mdiobus_release to be executed
,kfreeing the bus. However in most drives, mdiobus_free
be called to free the bus while mdiobus_register fails.
use-after-free occurs when access bus again, this patch
revert it to let mdiobus_free free the bus.

KASAN report details as below:

BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524

CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xfa/0x1ce lib/dump_stack.c:113
 print_address_description+0x65/0x270 mm/kasan/report.c:187
 kasan_report+0x149/0x18d mm/kasan/report.c:317
 mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
 fixed_mdio_bus_init+0x283/0x1000 [fixed_phy]
 ? 0xffffffffc0e40000
 ? 0xffffffffc0e40000
 ? 0xffffffffc0e40000
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc
R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004

Allocated by task 3524:
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
 kmalloc include/linux/slab.h:545 [inline]
 kzalloc include/linux/slab.h:740 [inline]
 mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143
 fixed_mdio_bus_init+0x163/0x1000 [fixed_phy]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3524:
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
 slab_free_hook mm/slub.c:1409 [inline]
 slab_free_freelist_hook mm/slub.c:1436 [inline]
 slab_free mm/slub.c:2986 [inline]
 kfree+0xe1/0x270 mm/slub.c:3938
 device_release+0x78/0x200 drivers/base/core.c:919
 kobject_cleanup lib/kobject.c:662 [inline]
 kobject_release lib/kobject.c:691 [inline]
 kref_put include/linux/kref.h:67 [inline]
 kobject_put+0x146/0x240 lib/kobject.c:708
 put_device+0x1c/0x30 drivers/base/core.c:2060
 __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382
 fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8881dc824c80
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 248 bytes inside of
 2048-byte region [ffff8881dc824c80ffff8881dc825480)
The buggy address belongs to the page:
page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800
raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                ^
 ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0c692d07842a ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-at803x-Update-delays-for-RGMII-modes'
David S. Miller [Fri, 22 Feb 2019 23:30:03 +0000 (15:30 -0800)]
Merge branch 'net-phy-at803x-Update-delays-for-RGMII-modes'

Vinod Koul says:

====================
net: phy: at803x: Update delays for RGMII modes

Peter[1] reported that patch cd28d1d6e52e: ("net: phy: at803x: Disable
phy delay for RGMII mode") caused regression on am335x-evmsk board.
This board expects the Phy delay to be enabled but specified RGMII mode
which refers to delays being disabled. So fix this by disabling delay only
for RGMII mode and enable for RGMII_ID and RGMII_TXID/RXID modes.

While at it, as pointed by Dave, don't inline the helpers.

[1]: https://www.spinics.net/lists/netdev/msg550749.html

Changes in v4:
 - fix log & comments nbased on Marc's feedback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: at803x: disable delay only for RGMII mode
Vinod Koul [Thu, 21 Feb 2019 10:23:15 +0000 (15:53 +0530)]
net: phy: at803x: disable delay only for RGMII mode

Per "Documentation/devicetree/bindings/net/ethernet.txt" RGMII mode
should not have delay in PHY whereas RGMII_ID and RGMII_RXID/RGMII_TXID
can have delay in PHY.

So disable the delay only for RGMII mode and enable for other modes.
Also treat the default case as disabled delays.

Fixes: cd28d1d6e52e: ("net: phy: at803x: Disable phy delay for RGMII mode")
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Peter Ujfalusi <peter.ujflausi@ti.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: at803x: don't inline helpers
Vinod Koul [Thu, 21 Feb 2019 10:23:14 +0000 (15:53 +0530)]
net: phy: at803x: don't inline helpers

Some helpers were declared with the "inline" function specifier.
It is preferable to let the compiler pick the right optimizations,
so drop the specifier for at803x_disable_rx_delay() and
at803x_disable_tx_delay()

Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Peter Ujfalusi <peter.ujflausi@ti.com>
Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: initialize net pointer inside tcf_exts_init()
Cong Wang [Thu, 21 Feb 2019 05:37:42 +0000 (21:37 -0800)]
net_sched: initialize net pointer inside tcf_exts_init()

For tcindex filter, it is too late to initialize the
net pointer in tcf_exts_validate(), as tcf_exts_get_net()
requires a non-NULL net pointer. We can just move its
initialization into tcf_exts_init(), which just requires
an additional parameter.

This makes the code in tcindex_alloc_perfect_hash()
prettier.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
Kalash Nainwal [Thu, 21 Feb 2019 00:23:04 +0000 (16:23 -0800)]
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255

Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
keep legacy software happy. This is similar to what was done for
ipv4 in commit 709772e6e065 ("net: Fix routing tables with
id > 255 for legacy software").

Signed-off-by: Kalash Nainwal <kalash@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bnxt_en-firmware-message-delay-fixes'
David S. Miller [Fri, 22 Feb 2019 23:16:56 +0000 (15:16 -0800)]
Merge branch 'bnxt_en-firmware-message-delay-fixes'

Michael Chan says:

====================
bnxt_en: firmware message delay fixes.

We were seeing some intermittent firmware message timeouts in our lab and
these 2 small patches fix them.  Please apply to stable as well.  Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Wait longer for the firmware message response to complete.
Michael Chan [Thu, 21 Feb 2019 00:07:32 +0000 (19:07 -0500)]
bnxt_en: Wait longer for the firmware message response to complete.

The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer.  It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Fix typo in firmware message timeout logic.
Michael Chan [Thu, 21 Feb 2019 00:07:31 +0000 (19:07 -0500)]
bnxt_en: Fix typo in firmware message timeout logic.

The logic that polls for the firmware message response uses a shorter
sleep interval for the first few passes.  But there was a typo so it
was using the wrong counter (larger counter) for these short sleep
passes.  The result is a slightly shorter timeout period for these
firmware messages than intended.  Fix it by using the proper counter.

Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bpf-nfp-codegen-fixes'
Daniel Borkmann [Fri, 22 Feb 2019 23:07:48 +0000 (00:07 +0100)]
Merge branch 'bpf-nfp-codegen-fixes'

Jiong Wang says:

====================
Code-gen for BPF_ALU | BPF_XOR | BPF_K is wrong when imm is -1,
also high 32-bit of 64-bit register should always be cleared.

This set fixed both bugs.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agonfp: bpf: fix ALU32 high bits clearance bug
Jiong Wang [Fri, 22 Feb 2019 22:36:04 +0000 (22:36 +0000)]
nfp: bpf: fix ALU32 high bits clearance bug

NFP BPF JIT compiler is doing a couple of small optimizations when jitting
ALU imm instructions, some of these optimizations could save code-gen, for
example:

  A & -1 =  A
  A |  0 =  A
  A ^  0 =  A

However, for ALU32, high 32-bit of the 64-bit register should still be
cleared according to ISA semantics.

Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agonfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Jiong Wang [Fri, 22 Feb 2019 22:36:03 +0000 (22:36 +0000)]
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K

The intended optimization should be A ^ 0 = A, not A ^ -1 = A.

Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agonet/mlx5: Support ndo bridge_setlink and getlink
Huy Nguyen [Wed, 20 Feb 2019 03:40:34 +0000 (21:40 -0600)]
net/mlx5: Support ndo bridge_setlink and getlink

Allow enabling VEPA mode on the HCA's port in legacy devlink mode.

Example:
bridge link set dev ens1f0 hwmode vepa
will turn on VEPA mode on the netdev ens1f0.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Add support for VEPA in legacy mode.
Huy Nguyen [Mon, 21 Jan 2019 22:22:05 +0000 (16:22 -0600)]
net/mlx5: E-Switch, Add support for VEPA in legacy mode.

In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips
the system internal virtual switch and forwards to external network
switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch.

To support this, an new FDB flow table are created with level 0 and
linked to the existing FDB flow table in legacy mode. By default,
VEPA is turned off and this FDB flow table is empty. When VEPA is
turned on, two rules are created. One rule to forward on uplink vport
traffic to the legacy FDB. The other rule forward all other traffic
to uplink vport.

Other design alternatives were not chosen as explained below:
1. Create a forward rule in ACL flow table (most efficient design).
This approach is the not chosen because firmware does not support
forward rule to uplink vport (0xffff) for ACL flow table.
2. Add additional source port criteria in all the FDB rules to make the
FDB rules to be received rules only. This approach is not chosen because
it is not efficient as there can many rules in the FDB and VEPA mode
cannot be controlled per vport.
3. Add a highest prioirty flow group in the existing legacy FDB Flow
Table instead of a new flow table. This approoach does not work because the
new flow group has the same match criteria as the promiscuous flow group
and mlx5_add_flow_rules does not allow specifying flow group.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>