platform/kernel/linux-starfive.git
6 years ago Merge branch 'sh_eth-complain-on-access-to-unimplemented-TSU-registers'
David S. Miller [Fri, 4 May 2018 13:11:50 +0000 (09:11 -0400)]
Merge branch 'sh_eth-complain-on-access-to-unimplemented-TSU-registers'

Sergei Shtylyov says:

====================
sh_eth: complain on access to unimplemented TSU registers

Here's a set of 2 patches against DaveM's 'net-next.git' repo. The 1st patch
routes TSU_POST<n> register accesses thru sh_eth_tsu_{read|write}() and the 2nd
adds WARN_ON() for unimplemented registers to those functions. I'm going to deal
with TSU_ADR{H|L}<n> registers in a later series...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago sh_eth: WARN_ON() access to unimplemented TSU register
Sergei Shtylyov [Wed, 2 May 2018 19:55:52 +0000 (22:55 +0300)]
sh_eth: WARN_ON() access to unimplemented TSU register

Commit 3365711df024 ("sh_eth: WARN on access to a register not implemented
in a particular chip") added WARN_ON() to sh_eth_{read|write}() but not
to sh_eth_tsu_{read|write}(). Now that we've routed almost all TSU register
accesses (except TSU_ADR{H|L}<n> -- which are special) thru the latter
pair of accessors, it makes sense to check for the unimplemented TSU
registers as well...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
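The idea of checking for unimplemented registers inside the accessor can be sketched in plain C. This is a user-space illustration under stated assumptions, not the driver's code: the register enum, the offsets, the `REG_UNIMPLEMENTED` sentinel and `tsu_read()` are hypothetical names, and a counter stands in for the kernel's WARN_ON().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register indices; the real driver has its own enum. */
enum { TSU_CTRST, TSU_POST1, TSU_POST2, TSU_REG_COUNT };

/* Sentinel marking a register that a given chip does not implement. */
#define REG_UNIMPLEMENTED 0xffffu

/* Hypothetical per-chip offset table: this chip lacks TSU_POST2. */
static const uint16_t tsu_offset[TSU_REG_COUNT] = {
	[TSU_CTRST] = 0x0004,
	[TSU_POST1] = 0x0070,
	[TSU_POST2] = REG_UNIMPLEMENTED,
};

static uint32_t fake_mmio[0x100]; /* stands in for the mapped TSU block */
static unsigned int warn_count;   /* stands in for WARN_ON() firing */

/* Read a TSU register, complaining if the chip does not implement it. */
static uint32_t tsu_read(int reg)
{
	uint16_t off = tsu_offset[reg];

	if (off == REG_UNIMPLEMENTED) {
		warn_count++;
		fprintf(stderr, "access to unimplemented TSU register %d\n", reg);
		return ~0u;
	}
	return fake_mmio[off / 4];
}
```

Because every access goes through one accessor, a single check covers all callers, which is why routing TSU_POST<n> through the accessors first matters.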
6 years ago sh_eth: use TSU register accessors for TSU_POST<n>
Sergei Shtylyov [Wed, 2 May 2018 19:54:48 +0000 (22:54 +0300)]
sh_eth: use TSU register accessors for TSU_POST<n>

There's no particularly good reason TSU_POST<n> registers get accessed
circumventing sh_eth_tsu_{read|write}() -- start using those, removing
(badly named) sh_eth_tsu_get_post_reg_offset(), while at it...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'bridge-FDB-Notify-about-removal-of-non-user-added-entries'
David S. Miller [Thu, 3 May 2018 17:46:48 +0000 (13:46 -0400)]
Merge branch 'bridge-FDB-Notify-about-removal-of-non-user-added-entries'

Petr Machata says:

====================
bridge: FDB: Notify about removal of non-user-added entries

Device drivers may generally need to keep in sync with bridge's FDB. In
particular, for its offload of tc mirror action where the mirrored-to
device is a gretap device, mlxsw needs to listen to a number of events,
FDB events among the others. SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE would be
a natural notification in that case.

However, for removal of FDB entries added due to device activity (as
opposed to explicit addition through "bridge fdb add" or similar), there
are no notifications.

Thus in patch #1, add the "added_by_user" field to switchdev
notifications sent for FDB activity. Adapt drivers to ignore activity on
non-user-added entries, to maintain the current behavior. Specifically
in case of mlxsw, allow mlxsw_sp_span_respin() call for any and all FDB
updates.

In patch #2, change the bridge driver to actually emit notifications for
these FDB entries. Take care not to send notifications for bridge
updates that themselves originate in SWITCHDEV_FDB_*_TO_BRIDGE events.

Changes from v1 to v2:
- Instead of introducing a new variant of fdb_delete(), add a new
  parameter to the existing function.
- Name the parameter swdev_notify, not notify.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: bridge: Notify about !added_by_user FDB entries
Petr Machata [Thu, 3 May 2018 12:43:53 +0000 (14:43 +0200)]
net: bridge: Notify about !added_by_user FDB entries

Do not automatically bail out on sending notifications about activity on
non-user-added FDB entries. Instead, notify about this activity except
for cases where the activity itself originates in a notification, to
avoid sending duplicate notifications.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago switchdev: Add fdb.added_by_user to switchdev notifications
Petr Machata [Thu, 3 May 2018 12:43:46 +0000 (14:43 +0200)]
switchdev: Add fdb.added_by_user to switchdev notifications

The following patch enables sending notifications also for events on FDB
entries that weren't added by the user. Give the drivers the information
necessary to distinguish between the two origins of FDB entries.

To maintain the current behavior, have switchdev-implementing drivers
bail out on notifications about non-user-added FDB entries. In case of
mlxsw driver, allow a call to mlxsw_sp_span_respin() so that SPAN over
bridge catches up with the changed FDB.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
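A minimal sketch of how a driver might use the new field to keep its old behavior. The struct below is a simplified, hypothetical stand-in for the notification payload, and `driver_fdb_event()` and the two counters are illustrative names, not switchdev API.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the FDB notification payload. */
struct fdb_notifier_info {
	unsigned char addr[6];
	unsigned short vid;
	bool added_by_user; /* the field this patch introduces */
};

static unsigned int offloaded; /* entries the sketch "driver" acted on */
static unsigned int respins;   /* stand-in for a SPAN respin hook */

/* Driver-side handler: keeps the old behavior for non-user-added entries. */
static int driver_fdb_event(const struct fdb_notifier_info *info)
{
	respins++; /* mlxsw-style: react to any and all FDB updates */
	if (!info->added_by_user)
		return 0; /* bail out: not a user-added entry */
	offloaded++;
	return 1;
}
```

Drivers that do not need the extra events only pay for one early return; a driver such as mlxsw can still hook work in before that return.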
6 years ago Merge branch 'mlxsw-Introduce-support-for-CQEv1-2'
David S. Miller [Thu, 3 May 2018 17:44:43 +0000 (13:44 -0400)]
Merge branch 'mlxsw-Introduce-support-for-CQEv1-2'

Ido Schimmel says:

====================
mlxsw: Introduce support for CQEv1/2

Jiri says:

Current SwitchX2 and Spectrum FWs support CQEv0 and that is what we
implement in mlxsw. Spectrum FW also supports CQE v1 and v2.
However, Spectrum-2 won't support CQEv0. Prepare for it and set up the
CQE versions to use according to what is queried from FW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: pci: Check number of CQEs for CQE version 2
Jiri Pirko [Thu, 3 May 2018 11:59:42 +0000 (14:59 +0300)]
mlxsw: pci: Check number of CQEs for CQE version 2

Check number of CQEs for CQE version 2 reported by QUERY_AQ_CAP command.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: pci: Allow to use CQEs of version 1 and version 2
Jiri Pirko [Thu, 3 May 2018 11:59:41 +0000 (14:59 +0300)]
mlxsw: pci: Allow to use CQEs of version 1 and version 2

Use previously added resources to query FW support for multiple versions
of CQEs. Use the biggest version supported. For SDQs, it makes no sense
to use version 2, as it does not introduce any new features but is
twice the size of CQE version 1.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
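The selection rule described in this commit can be sketched as a small pure function. The enums and `pick_cqe_version()` are hypothetical names for illustration; the real driver derives the supported versions from firmware resources.

```c
#include <stdbool.h>

enum cqe_version { CQE_V0, CQE_V1, CQE_V2 };
enum queue_type { QUEUE_SDQ, QUEUE_RDQ };

/* Pick the CQE version for a queue from what firmware reports as supported. */
static enum cqe_version pick_cqe_version(enum queue_type q, bool has_v1, bool has_v2)
{
	/* v2 is twice the size of v1 and adds nothing for send queues. */
	if (q == QUEUE_SDQ && has_v1)
		return CQE_V1;
	if (has_v2)
		return CQE_V2; /* otherwise use the biggest version supported */
	if (has_v1)
		return CQE_V1;
	return CQE_V0;
}
```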
6 years ago mlxsw: pci: Introduce helpers to work with multiple CQE versions
Jiri Pirko [Thu, 3 May 2018 11:59:40 +0000 (14:59 +0300)]
mlxsw: pci: Introduce helpers to work with multiple CQE versions

Introduce definitions of fields in CQE version 1 and 2. Also, introduce
common helpers that would call appropriate version-specific helpers
according to the version enum passed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: resources: Add CQE versions resources
Jiri Pirko [Thu, 3 May 2018 11:59:39 +0000 (14:59 +0300)]
mlxsw: resources: Add CQE versions resources

Add resources that FW uses to report supported CQE versions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: bridge: avoid duplicate notification on up/down/change netdev events
Nikolay Aleksandrov [Thu, 3 May 2018 10:47:24 +0000 (13:47 +0300)]
net: bridge: avoid duplicate notification on up/down/change netdev events

While handling netdevice events, br_device_event() sometimes uses
br_stp_(disable|enable)_port which unconditionally send a notification,
but then a second notification for the same event is sent at the end of
the br_device_event() function. To avoid sending duplicate notifications
in such cases, check if one has already been sent (i.e.
br_stp_enable/disable_port have been called).
The patch is based on a change by Satish Ashok.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
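The "check if one has already been sent" logic can be illustrated with a flag local to the event handler. Everything here is a hypothetical user-space model: `stp_enable_port()` merely imitates a helper that always notifies on its own, and `device_event()` is not the bridge's real handler.

```c
#include <stdbool.h>

static unsigned int notifications; /* counts notifications sent */

static void send_notification(void)
{
	notifications++;
}

/* Stand-in for a helper that unconditionally notifies on its own. */
static void stp_enable_port(void)
{
	send_notification();
}

enum event { EV_UP, EV_CHANGE };

/* Event handler that notifies exactly once per event. */
static void device_event(enum event ev, bool port_should_enable)
{
	bool notified = false;

	if (ev == EV_UP && port_should_enable) {
		stp_enable_port();
		notified = true; /* the helper already sent one */
	}
	if (!notified)
		send_notification();
}
```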
6 years ago Merge branch 'selftests-forwarding-sysctl'
David S. Miller [Thu, 3 May 2018 17:37:03 +0000 (13:37 -0400)]
Merge branch 'selftests-forwarding-sysctl'

Petr Machata says:

====================
selftests: forwarding: Updates to sysctl handling

Some selftests need to adjust sysctl settings. In order to be neutral to
the system that the test is run on, it is a good practice to change back
to the original setting after the test ends. That involves some
boilerplate that can be abstracted away.

In patch #1, introduce two functions, sysctl_set() and sysctl_restore().
The former stores the current value of a given setting, and sets a new
value. The latter restores the setting to the previously-stored value.

In patch #2, use these wrappers in a number of tests.

Additionally in patch #3, fix a problem in mirror_gre_nh.sh, which
neglected to set a sysctl that's crucial for the test to work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago selftests: forwarding: mirror_gre_nh: Unset RP filter
Petr Machata [Thu, 3 May 2018 10:37:21 +0000 (12:37 +0200)]
selftests: forwarding: mirror_gre_nh: Unset RP filter

The test fails to work if reverse-path filtering is in effect on the
mirrored-to host interface, or for all interfaces.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago selftests: forwarding: Use sysctl_set(), sysctl_restore()
Petr Machata [Thu, 3 May 2018 10:37:13 +0000 (12:37 +0200)]
selftests: forwarding: Use sysctl_set(), sysctl_restore()

Instead of hand-managing the sysctl set and restore, use the wrappers
sysctl_set() and sysctl_restore() to do the bookkeeping automatically.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()
Petr Machata [Thu, 3 May 2018 10:36:59 +0000 (12:36 +0200)]
selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()

Add two helper functions: sysctl_set() to change the value of a given
sysctl setting, and sysctl_restore() to change it back to what it was.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'selftests-forwarding-Two-enhancements'
David S. Miller [Thu, 3 May 2018 16:54:32 +0000 (12:54 -0400)]
Merge branch 'selftests-forwarding-Two-enhancements'

Ido Schimmel says:

====================
selftests: forwarding: Two enhancements

First patch increases the maximum deviation in the multipath tests which
proved to be too low in some cases.

Second patch allows user to run only specific tests from each file using
the TESTS environment variable. This granularity is needed in setups
where not all the tests can pass.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago selftests: forwarding: Allow running specific tests
Ido Schimmel [Thu, 3 May 2018 07:51:33 +0000 (10:51 +0300)]
selftests: forwarding: Allow running specific tests

Similar to commit a511858c7536 ("selftests: fib_tests: Allow user to run
a specific test"), allow user to run only a subset of the tests using
the TESTS environment variable.

This is useful when not all the tests can pass on a given system.

Example:
# export TESTS="ping_ipv4 ping_ipv6"
# ./bridge_vlan_aware.sh
TEST: ping [PASS]
TEST: ping6 [PASS]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago selftests: forwarding: Increase maximum deviation in multipath test
Ido Schimmel [Thu, 3 May 2018 07:51:32 +0000 (10:51 +0300)]
selftests: forwarding: Increase maximum deviation in multipath test

We sometimes observe failures in the test due to too large a discrepancy
between the measured and expected ratios. For example:

TEST: ECMP                                                          [FAIL]
        Too large discrepancy between expected and measured ratios
        INFO: Expected ratio 1.00 Measured ratio 1.11

Fix this by allowing a deviation of up to 15% between the two ratios.

Another possibility is to increase the number of generated flows, but
this will prolong the execution time of the test, which is already quite
high.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
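As a worked example of the tolerance check: for the failure quoted above, the measured ratio 1.11 deviates from the expected 1.00 by about 11%, which fits inside a 15% limit. The helper below is a hypothetical C rendering of that arithmetic; the selftest itself is a shell script and may compute the deviation differently.

```c
#include <stdbool.h>

/* True if measured is within max_pct percent of expected. */
static bool ratio_within(double expected, double measured, double max_pct)
{
	double diff = measured - expected;

	if (diff < 0)
		diff = -diff;
	return diff / expected * 100.0 <= max_pct;
}
```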
6 years ago cxgb4: update latest firmware version supported
Ganesh Goudar [Thu, 3 May 2018 06:24:23 +0000 (11:54 +0530)]
cxgb4: update latest firmware version supported

Change t4fw_version.h to update latest firmware version
number to 1.19.1.0.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() comment
Sun Lianwen [Thu, 3 May 2018 01:34:29 +0000 (09:34 +0800)]
ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() comment

The function name is wrong in the ip6gre_tnl_addr_conflict() comment, which
uses ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'act_csum-get_fill_size'
David S. Miller [Thu, 3 May 2018 15:15:59 +0000 (11:15 -0400)]
Merge branch 'act_csum-get_fill_size'

Craig Dillabaugh says:

====================
Update csum tc action for batch operation.

This patchset includes two patches: the first updates act_csum.c
to include the get_fill_size routine required for batch operation, and
the second adds updated TDC tests for the feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago tc-testing: Updated csum action tests batch create w/wo cookies.
Craig Dillabaugh [Tue, 1 May 2018 14:17:44 +0000 (10:17 -0400)]
tc-testing: Updated csum action tests batch create w/wo cookies.

Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net sched: Implemented get_fill_size routine for act_csum.
Craig Dillabaugh [Tue, 1 May 2018 14:17:43 +0000 (10:17 -0400)]
net sched: Implemented get_fill_size routine for act_csum.

Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'r8169-series-with-further-improvements'
David S. Miller [Wed, 2 May 2018 20:23:50 +0000 (16:23 -0400)]
Merge branch 'r8169-series-with-further-improvements'

Heiner Kallweit says:

====================
r8169: series with further improvements

I thought I was more or less done with the basic refactoring. But again
I stumbled across things that can be improved / simplified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: replace get_protocol with vlan_get_protocol
Heiner Kallweit [Wed, 2 May 2018 19:40:02 +0000 (21:40 +0200)]
r8169: replace get_protocol with vlan_get_protocol

This patch is basically the same as 6e74d1749a33 ("r8152: replace
get_protocol with vlan_get_protocol"). Use vlan_get_protocol
instead of duplicating the functionality.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: avoid potentially misaligned access when getting mac address
Heiner Kallweit [Wed, 2 May 2018 19:39:59 +0000 (21:39 +0200)]
r8169: avoid potentially misaligned access when getting mac address

Interpreting a member of a u16 array as u32 may result in a misaligned
access. Also it's not really intuitive to define a mac address variable
as an array of three u16 words. Therefore use an array of six bytes that
is properly aligned for 32 bit access.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
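A minimal user-space sketch of the "six bytes, aligned for 32-bit access" layout. `struct mac_buf` and `mac_from_regs()` are hypothetical names, and the sketch uses memcpy() rather than pointer casts, which is an assumption of this example and not necessarily what the driver does.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Six bytes, aligned so the first four can be accessed as one 32-bit word. */
struct mac_buf {
	_Alignas(4) uint8_t addr[ETH_ALEN];
};

/* Fill the buffer from a 32-bit and a 16-bit register value. */
static void mac_from_regs(struct mac_buf *mac, uint32_t lo, uint16_t hi)
{
	memcpy(&mac->addr[0], &lo, sizeof(lo)); /* bytes 0..3 */
	memcpy(&mac->addr[4], &hi, sizeof(hi)); /* bytes 4..5 */
}
```

With a `uint16_t[3]` array, only 2-byte alignment is guaranteed, so a 32-bit load from its start may fault or be slow on strict-alignment architectures.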
6 years ago r8169: improve PCI config space access
Heiner Kallweit [Wed, 2 May 2018 19:39:56 +0000 (21:39 +0200)]
r8169: improve PCI config space access

Some chips have a non-zero function ID; however, instead of hardcoding
the IDs (CSIAR_FUNC_NIC and CSIAR_FUNC_NIC2) we can get them
dynamically via PCI_FUNC(pci_dev->devfn). This way we can get rid
of the csi_ops.

In general csi is just a fallback mechanism for PCI config space
access in case no native access is supported. Therefore let's
try native access first.

I checked with Realtek regarding the functionality of config space
byte 0x070f and according to them it controls the L0s/L1
entrance latency.
Currently ASPM is disabled in general and therefore this value
isn't used. However we may introduce a whitelist for chips
where ASPM is known to work, therefore let's keep this code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
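For reference, the function number is simply the low three bits of devfn. The macros below mirror the bit layout of the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC under different names; `csi_addr()` and the position of the function field in its result are assumptions made only for illustration.

```c
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEVFN_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical helper: build an address word that carries the function number. */
static uint32_t csi_addr(uint8_t devfn, uint16_t reg)
{
	return ((uint32_t)DEVFN_FUNC(devfn) << 16) | (reg & 0x0fffu);
}
```

Deriving the function number from devfn removes the need for per-chip constants and the ops table that selected between them.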
6 years ago r8169: drop rtl_generic_op
Heiner Kallweit [Wed, 2 May 2018 19:39:54 +0000 (21:39 +0200)]
r8169: drop rtl_generic_op

Only two places are left where rtl_generic_op() is used, so we can
inline it and simplify the code a little.
This change also avoids the overhead of unlocking/locking in case
the respective operation isn't set.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: replace longer if statements with switch statements
Heiner Kallweit [Wed, 2 May 2018 19:39:52 +0000 (21:39 +0200)]
r8169: replace longer if statements with switch statements

Some longer if statements can be simplified by using switch
statements instead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: simplify code by using ranges in switch clauses
Heiner Kallweit [Wed, 2 May 2018 19:39:49 +0000 (21:39 +0200)]
r8169: simplify code by using ranges in switch clauses

Several switch statements can be significantly simplified by using
case ranges.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
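Case ranges are a GNU C extension (accepted by GCC and Clang) that the kernel uses freely. A small sketch with made-up version numbers and family names, none of which correspond to real r8169 chip versions:

```c
/* Hypothetical families and version numbers, for illustration only. */
enum family { FAMILY_UNKNOWN = -1, FAMILY_A, FAMILY_B, FAMILY_C };

/* Classify a version number using GNU C case ranges. */
static enum family chip_family(int ver)
{
	switch (ver) {
	case 1 ... 6:
		return FAMILY_A;
	case 7 ... 17:
		return FAMILY_B;
	case 18 ... 51:
		return FAMILY_C;
	default:
		return FAMILY_UNKNOWN;
	}
}
```

Note the spaces around `...`; they are needed with integer literals so the token is not parsed as part of a number.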
6 years ago r8169: drop member pll_power_ops from struct rtl8169_private
Heiner Kallweit [Wed, 2 May 2018 19:39:47 +0000 (21:39 +0200)]
r8169: drop member pll_power_ops from struct rtl8169_private

After merging r810x_pll_power_down/up and r8168_pll_power_down/up we
don't need member pll_power_ops any longer and can drop it, thus
simplifying the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: merge r810x_pll_power_down/up into r8168_pll_power_down/up
Heiner Kallweit [Wed, 2 May 2018 19:39:45 +0000 (21:39 +0200)]
r8169: merge r810x_pll_power_down/up into r8168_pll_power_down/up

r810x_pll_power_down/up and r8168_pll_power_down/up have a lot in common,
so we can simplify the code by merging the former into the latter.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: remove 810x_phy_power_up/down
Heiner Kallweit [Wed, 2 May 2018 19:39:40 +0000 (21:39 +0200)]
r8169: remove 810x_phy_power_up/down

The functionality of 810x_phy_power_up/down is covered by the default
clause in 8168_phy_power_up/down. Therefore we don't need these
functions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago r8169: remove unneeded check in r8168_pll_power_down
Heiner Kallweit [Wed, 2 May 2018 19:39:35 +0000 (21:39 +0200)]
r8169: remove unneeded check in r8168_pll_power_down

RTL_GIGA_MAC_VER_23/24 are configured by rtl_hw_start_8168cp_2()
and rtl_hw_start_8168cp_3() respectively which both apply
CPCMD_QUIRK_MASK, thus clearing bit ASF.

Bit ASF isn't set at any other place in the driver, therefore this
check can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'net-smc-small-features'
David S. Miller [Wed, 2 May 2018 17:29:13 +0000 (13:29 -0400)]
Merge branch 'net-smc-small-features'

Ursula Braun says:

====================
net/smc: small features 2018/04/30

here are 4 smc patches for net-next covering small new features
in different areas:
   * link health check
   * diagnostics for IPv6 smc sockets
   * ioctl
   * improvement for vlan determination

v2 changes:
   * better title
   * patch 2 - remove compile problem for disabled CONFIG_IPV6
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: determine vlan_id of stacked net_device
Ursula Braun [Wed, 2 May 2018 14:56:47 +0000 (16:56 +0200)]
net/smc: determine vlan_id of stacked net_device

An SMC link group is bound to a specific vlan_id. Its link uses
the RoCE-GIDs established for the specific vlan_id. This patch makes
sure the appropriate vlan_id is determined for stacked scenarios like
for instance a master bonding device with vlan devices enslaved.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD
Ursula Braun [Wed, 2 May 2018 14:56:46 +0000 (16:56 +0200)]
net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD

SIOCINQ returns the amount of unread data in the RMB.
SIOCOUTQ returns the amount of unsent or unacked sent data in the send
buffer.
SIOCOUTQNSD returns the amount of data prepared for sending, but
not yet sent.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: ipv6 support for smc_diag.c
Karsten Graul [Wed, 2 May 2018 14:56:45 +0000 (16:56 +0200)]
net/smc: ipv6 support for smc_diag.c

Update smc_diag.c to support ipv6 addresses on the diagnosis interface.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/smc: periodic testlink support
Karsten Graul [Wed, 2 May 2018 14:56:44 +0000 (16:56 +0200)]
net/smc: periodic testlink support

Add periodic LLC testlink support to ensure the link is still active.
The interval time is initialized using the value of
sysctl_tcp_keepalive_time.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'mlxsw-Reject-unsupported-FIB-configurations'
David S. Miller [Wed, 2 May 2018 17:15:18 +0000 (13:15 -0400)]
Merge branch 'mlxsw-Reject-unsupported-FIB-configurations'

Ido Schimmel says:

====================
mlxsw: Reject unsupported FIB configurations

Recently it became possible for listeners of the FIB notification chain
to veto operations such as addition of routes and rules.

Adjust the mlxsw driver to take advantage of it and return an error for
unsupported FIB rules and for routes configured after the abort
mechanism was triggered (due to exceeded resources for example).

v2:
* Change error code in first patch to -EOPNOTSUPP (David Ahern).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum_router: Return an error for routes added after abort
Ido Schimmel [Wed, 2 May 2018 07:17:35 +0000 (10:17 +0300)]
mlxsw: spectrum_router: Return an error for routes added after abort

We currently do not perform accounting in the driver and thus can't
reject routes before resources are exceeded.

However, in order to make users aware of the fact that routes are no
longer offloaded we can return an error for routes configured after the
abort mechanism was triggered.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum_router: Return an error for non-default FIB rules
Ido Schimmel [Wed, 2 May 2018 07:17:34 +0000 (10:17 +0300)]
mlxsw: spectrum_router: Return an error for non-default FIB rules

Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
fib_nl_newrule") it is possible to forbid the installation of
unsupported FIB rules.

Have mlxsw return an error for non-default FIB rules in addition to the
existing extack message.

Example:
# ip rule add from 198.51.100.1 table 10
Error: mlxsw_spectrum: FIB rules not supported.

Note that offload is only aborted when non-default FIB rules are already
installed and merely replayed during module initialization.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago cxgb4: add new T5 device id's
Ganesh Goudar [Wed, 2 May 2018 06:17:15 +0000 (11:47 +0530)]
cxgb4: add new T5 device id's

Add device id's 0x5019, 0x501a and 0x501b for T5
cards.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: stmmac: Avoid VLA usage
Kees Cook [Tue, 1 May 2018 21:01:30 +0000 (14:01 -0700)]
net: stmmac: Avoid VLA usage

In the quest to remove all stack VLAs from the kernel[1], this switches
the "status" stack buffer to use the existing small (8) upper bound on
how many queues can be checked for DMA, and adds a sanity-check just to
make sure it doesn't operate under pathological conditions.

[1] http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
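The pattern of replacing a stack VLA with a fixed-size buffer plus a clamp can be sketched as follows. `MAX_QUEUES`, `poll_queues()` and its parameters are hypothetical; only the bound of 8 comes from the commit text.

```c
#include <stddef.h>

#define MAX_QUEUES 8 /* small fixed upper bound, as in the commit text */

/* Hypothetical per-queue status check; returns how many queues were visited. */
static size_t poll_queues(size_t queues_requested, const int *hw_status, int *any_set)
{
	int status[MAX_QUEUES]; /* fixed size instead of status[queues_requested] */
	size_t n = queues_requested;
	size_t i;

	if (n > MAX_QUEUES) /* sanity check against pathological input */
		n = MAX_QUEUES;

	*any_set = 0;
	for (i = 0; i < n; i++) {
		status[i] = hw_status[i];
		if (status[i])
			*any_set = 1;
	}
	return n;
}
```

The stack usage is now known at compile time, and an out-of-range count can no longer grow the frame.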
6 years ago liquidio VF: indicate that disabling rx vlan offload is not allowed
Raghu Vatsavayi [Tue, 1 May 2018 17:32:10 +0000 (10:32 -0700)]
liquidio VF: indicate that disabling rx vlan offload is not allowed

NIC firmware does not support disabling rx vlan offload, but the VF driver
incorrectly indicates that it is supported.  The PF driver already does the
correct indication by clearing the NETIF_F_HW_VLAN_CTAG_RX bit in its
netdev->hw_features.  So just do the same thing in the VF.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Acked-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago udp: Complement partial checksum for GSO packet
Sean Tranchetti [Tue, 1 May 2018 00:01:02 +0000 (18:01 -0600)]
udp: Complement partial checksum for GSO packet

Using the udp_v4_check() function to calculate the pseudo header
for the newly segmented UDP packets results in assigning the complement
of the value to the UDP header checksum field.

Always undo the complement of the partial checksum value in order to
match the case where GSO is not used on the UDP transmit path.

Fixes: ee80d1ebe5ba ("udp: add udp gso")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
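The arithmetic behind "undo the complement" can be shown with 16-bit ones' complement folding. The three helpers are hypothetical user-space stand-ins: `pseudo_check()` imitates a routine that returns the complemented folded sum, and `partial_check()` inverts it again so that the stored value is the plain folded pseudo-header sum.

```c
#include <stdint.h>

/* Fold a 32-bit ones' complement sum to 16 bits (no final inversion). */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Stand-in for a helper that returns the complemented folded sum. */
static uint16_t pseudo_check(uint32_t pseudo_sum)
{
	return (uint16_t)~fold16(pseudo_sum);
}

/* Value to store for a partial checksum: undo the complement. */
static uint16_t partial_check(uint32_t pseudo_sum)
{
	return (uint16_t)~pseudo_check(pseudo_sum);
}
```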
6 years ago selftest: add test for TCP_INQ
Soheil Hassas Yeganeh [Tue, 1 May 2018 19:39:16 +0000 (15:39 -0400)]
selftest: add test for TCP_INQ

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago tcp: send in-queue bytes in cmsg upon read
Soheil Hassas Yeganeh [Tue, 1 May 2018 19:39:15 +0000 (15:39 -0400)]
tcp: send in-queue bytes in cmsg upon read

Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
As suggested by David Miller, added loads with barrier
to check whether we have multiple threads calling recvmsg
in parallel. When that happens we lock the socket to
calculate inq.
V4 change-log:
Removed inline from a static function.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
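A hedged sketch of how an application might consume the hint. `enable_tcp_inq()` and `recv_with_inq()` are hypothetical wrapper names; the fallback definitions assume the value the kernel's uapi header assigns to TCP_INQ, for systems whose libc headers predate the option.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_INQ
#define TCP_INQ 36 /* assumed value from the kernel's uapi tcp.h */
#endif
#ifndef TCP_CM_INQ
#define TCP_CM_INQ TCP_INQ
#endif

/* Ask the kernel to attach the in-queue byte count to each recvmsg(). */
static int enable_tcp_inq(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_INQ, &one, sizeof(one));
}

/* recvmsg() wrapper: *inq gets the hint, or -1 if no TCP_CM_INQ cmsg came. */
static ssize_t recv_with_inq(int fd, void *buf, size_t len, int *inq)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces suitable alignment */
	} control;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control.buf,
		.msg_controllen = sizeof(control.buf),
	};
	struct cmsghdr *cm;
	ssize_t n;

	*inq = -1;
	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return n;
	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == IPPROTO_TCP && cm->cmsg_type == TCP_CM_INQ)
			memcpy(inq, CMSG_DATA(cm), sizeof(*inq));
	}
	return n;
}
```

A caller would typically read with a small buffer first and size the next buffer from `*inq`, treating the value as a hint only, as the commit message explains.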
6 years ago Merge branch 'hns3-fixes'
David S. Miller [Tue, 1 May 2018 19:08:38 +0000 (15:08 -0400)]
Merge branch 'hns3-fixes'

Salil Mehta says:

====================
Misc bug fixes for HNS3 Ethernet driver

This patch-set presents some miscellaneous bug fixes and cleanups for
the HNS3 Ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: hns3: Remove packet statistics in the range of 8192~12287
Xi Wang [Tue, 1 May 2018 18:56:05 +0000 (19:56 +0100)]
net: hns3: Remove packet statistics in the range of 8192~12287

The current statistics for the 8192~12287 size range are valid only for GE,
while the 8192~9216 and 9217~12287 ranges are valid only for LGE/CGE and are
always 0 for GE interfaces. This easily causes confusion when viewing the
packet statistics with the command ethtool -S.

This patch removes the 8192~12287 range of packet statistics and uses the
8192~9216 and 9217~12287 ranges for statistics. This change depends on the
firmware upgrade.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: hns3: Fix for packet loss due wrong filter config in VLAN tbls
Yunsheng Lin [Tue, 1 May 2018 18:56:04 +0000 (19:56 +0100)]
net: hns3: Fix for packet loss due wrong filter config in VLAN tbls

There are two levels of VLAN tables in hardware: the port VLAN table,
which is shared by all functions, and the function VLAN table, of which
each function has its own. Currently, the PF sets the port VLAN table
and the VF sets the function VLAN table, which causes packet loss.

This patch fixes the problem by setting both VLAN tables and using
hdev->vlan_table to manage the port VLAN table.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix a dead loop in hclge_cmd_csq_clean
Huazhong Tan [Tue, 1 May 2018 18:56:03 +0000 (19:56 +0100)]
net: hns3: fix a dead loop in hclge_cmd_csq_clean

If head has an invalid value then a dead loop can be triggered in
hclge_cmd_csq_clean. This patch adds sanity check for this case.
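
A minimal userspace sketch of such a sanity check follows. The ring layout and all names are assumptions made for illustration; the point is only that a head index outside the ring, or outside the window between next_to_clean and next_to_use, is rejected before the clean loop runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a command ring. */
struct cmd_ring {
	int desc_num;		/* number of descriptors in the ring */
	int next_to_use;	/* producer index */
	int next_to_clean;	/* consumer index */
};

/*
 * Return true if the head index reported by hardware is inside the ring
 * and lies in (next_to_clean, next_to_use], allowing for wrap-around.
 */
bool ring_head_is_valid(const struct cmd_ring *ring, int head)
{
	int u = ring->next_to_use;
	int c = ring->next_to_clean;

	if (head < 0 || head >= ring->desc_num)
		return false;

	if (u > c)
		return head > c && head <= u;
	return head > c || head <= u;
}

/* Return the number of reclaimed descriptors, or -1 for a bad head. */
int ring_clean(struct cmd_ring *ring, int head)
{
	int cleaned = 0;

	if (!ring_head_is_valid(ring, head))
		return -1;	/* bail out instead of looping forever */

	while (ring->next_to_clean != head) {
		ring->next_to_clean = (ring->next_to_clean + 1) % ring->desc_num;
		cleaned++;
	}
	return cleaned;
}
```

Without the range check, a head value that next_to_clean can never reach (for example one beyond desc_num) would keep the loop spinning.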

Fixes: 68c0a5c70614 ("net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd
Interface Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix to support autoneg only for port attached with phy
Fuyun Liang [Tue, 1 May 2018 18:56:02 +0000 (19:56 +0100)]
net: hns3: Fix to support autoneg only for port attached with phy

This patch adds a check to support autoneg(ethtool -A) only when a PHY
is attached to the port.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility
Layer) Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix for phy_addr error in hclge_mac_mdio_config
Huazhong Tan [Tue, 1 May 2018 18:56:01 +0000 (19:56 +0100)]
net: hns3: fix for phy_addr error in hclge_mac_mdio_config

When a PHY exists, phy_addr must be less than PHY_MAX_ADDR.
If not, hclge_mac_mdio_config should return an error.
And for fiber(phy_addr=0xff), it does not need hclge_mac_mdio_config.
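
The three cases above can be sketched as a small standalone check. The constant PHY_MAX_ADDR_SKETCH (set to 32 here) and the function name are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <errno.h>

#define PHY_MAX_ADDR_SKETCH	32	/* assumed bound on valid PHY addresses */
#define PHY_ADDR_FIBER		0xff	/* value the commit message gives for fiber */

/*
 * Return 0 when the address is acceptable and -EINVAL otherwise.
 * *need_mdio tells the caller whether MDIO configuration is required.
 */
int check_phy_addr(unsigned int phy_addr, int *need_mdio)
{
	*need_mdio = 0;

	if (phy_addr == PHY_ADDR_FIBER)
		return 0;		/* fiber port: no MDIO setup needed */

	if (phy_addr >= PHY_MAX_ADDR_SKETCH)
		return -EINVAL;		/* PHY present but address out of range */

	*need_mdio = 1;
	return 0;
}
```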

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fixes the error legs in hclge_init_ae_dev function
Huazhong Tan [Tue, 1 May 2018 18:56:00 +0000 (19:56 +0100)]
net: hns3: Fixes the error legs in hclge_init_ae_dev function

This patch fixes some of the missed error legs in the initialization
function of the ae device. This might cause leaks in case of failure.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer
Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fixes the out of bounds access in hclge_map_tqp
Huazhong Tan [Tue, 1 May 2018 18:55:59 +0000 (19:55 +0100)]
net: hns3: Fixes the out of bounds access in hclge_map_tqp

This patch fixes the handling of the check for the case where the number
of vports is detected to be greater than the number of available TQPs.
The current handling causes an out of bounds access in hclge_map_tqp().

Fixes: 7df7dad633e2 ("net: hns3: Refactor the mapping of tqp to vport")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix to correctly fetch l4 protocol outer header
Huazhong Tan [Tue, 1 May 2018 18:55:58 +0000 (19:55 +0100)]
net: hns3: fix to correctly fetch l4 protocol outer header

This patch fixes the function being used to fetch the L4
protocol outer header. The skb_inner_transport_header
API was mistakenly being used earlier.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Remove error log when getting pfc stats fails
Yunsheng Lin [Tue, 1 May 2018 18:55:57 +0000 (19:55 +0100)]
net: hns3: Remove error log when getting pfc stats fails

When the MAC supports DCB but is in GE mode, it does not support
querying pfc stats, and firmware returns an error when trying to
query them. This creates a lot of noise in the kernel log when the
error is printed.

This patch fixes it by removing the error log: the error is already
returned to user space, so the user should be aware of it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoconnector: add parent pid and tgid to coredump and exit events
Stefan Strogin [Mon, 30 Apr 2018 22:04:29 +0000 (01:04 +0300)]
connector: add parent pid and tgid to coredump and exit events

The intention is to get notified of process failures as soon
as possible, before a possible core dump (which could take very long),
e.g. in some process-manager. Coredump and exit process events
are perfect for such use cases (see 2b5faa4c553f "connector: Added
coredumping event to the process connector").

The problem is that for now the process-manager cannot know the parent
of a dying process using connectors. This could be useful if the
process-manager should monitor only the children of certain parents
for failures, so we could filter the coredump and exit events by parent
process and/or thread ID.

Add parent pid and tgid to coredump and exit process connectors event
data.
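
A process-manager could then filter events roughly as sketched below. The struct is a hypothetical stand-in for the event payload, with field names chosen for illustration, and the helper is not part of any existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical mirror of the fields an exit or coredump event would carry. */
struct dying_task_info {
	pid_t process_pid;
	pid_t process_tgid;
	pid_t parent_pid;
	pid_t parent_tgid;
};

/* Keep only events whose dying task is a child of the watched process. */
bool event_matches_parent(const struct dying_task_info *ev, pid_t watched_tgid)
{
	return ev->parent_tgid == watched_tgid;
}
```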

Signed-off-by: Stefan Strogin <sstrogin@cisco.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Inline netdev_features_size_check()
Florian Fainelli [Mon, 30 Apr 2018 21:20:05 +0000 (14:20 -0700)]
net: core: Inline netdev_features_size_check()

We do not require this inline function to be used in multiple different
locations, so just inline it where it gets used in register_netdevice().

Suggested-by: David Miller <davem@davemloft.net>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoudp: disable gso with no_check_tx
Willem de Bruijn [Mon, 30 Apr 2018 19:58:36 +0000 (15:58 -0400)]
udp: disable gso with no_check_tx

Syzbot managed to send a udp gso packet without checksum offload into
the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This
triggered the skb_warn_bad_offload.

  RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658
   skb_gso_segment include/linux/netdevice.h:4038 [inline]
   validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120
   __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577
   dev_queue_xmit+0x17/0x20 net/core/dev.c:3618

UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the
udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to
catch and fail in this case.

After the integrity checks jump directly to the CHECKSUM_PARTIAL case
to avoid reading the no_check_tx flags again (a TOCTTOU race).
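
The control flow described above can be sketched in plain C as follows. The types and names are invented for illustration and do not match the kernel's; the sketch only shows that the GSO branch rejects the no-checksum case and then commits to partial checksumming without consulting the flag a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum csum_mode { CSUM_NONE, CSUM_PARTIAL, CSUM_SOFTWARE };

/* Hypothetical snapshot of the state consulted when sending a datagram. */
struct tx_state {
	unsigned int gso_size;	/* non-zero when the datagram is a GSO packet */
	bool hw_csum;		/* checksum offload is available for this skb */
	bool no_check_tx;	/* socket asked for no transmit checksum */
};

/* Pick the checksum mode, or fail if GSO cannot be used safely. */
int pick_csum_mode(const struct tx_state *st, enum csum_mode *mode)
{
	if (st->gso_size) {
		/* GSO integrity checks: checksum offload is mandatory. */
		if (st->no_check_tx || !st->hw_csum)
			return -EINVAL;
		/* Decided here; the flag is not read again below. */
		*mode = CSUM_PARTIAL;
		return 0;
	}

	if (st->no_check_tx)
		*mode = CSUM_NONE;
	else if (st->hw_csum)
		*mode = CSUM_PARTIAL;
	else
		*mode = CSUM_SOFTWARE;
	return 0;
}
```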

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocls_flower: Support multiple masks per priority
Paul Blakey [Mon, 30 Apr 2018 11:28:30 +0000 (14:28 +0300)]
cls_flower: Support multiple masks per priority

Currently flower doesn't support inserting filters with different masks
on a single priority, even if the actual flows (key + mask) inserted
aren't overlapping, as with the use case of offloading openvswitch
datapath flows. Instead one must go up one level, and assign different
priorities for each mask, which will create different flower
instances.

This patch opens flower to support more than one mask per priority,
and a single flower instance. It does so by adding another hash table
on top of the existing one which will store the different masks,
and the filters that share it.

The user is left with the responsibility of ensuring non overlapping
flows, otherwise precedence is not guaranteed.
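
The two-level lookup can be pictured with the simplified sketch below, where plain arrays and linear scans stand in for the hash tables and a 32-bit integer stands in for the flow key. All names are hypothetical. Note that the first matching mask wins, which is why overlapping flows give no guaranteed precedence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A filter stores its key already ANDed with the mask it belongs to. */
struct flow_filter {
	uint32_t masked_key;
	int id;
};

/* One entry per distinct mask, holding the filters that share it. */
struct flow_mask {
	uint32_t mask;
	const struct flow_filter *filters;
	size_t n_filters;
};

/* Return the id of the first matching filter, or -1 if none matches. */
int classify(const struct flow_mask *masks, size_t n_masks, uint32_t key)
{
	for (size_t m = 0; m < n_masks; m++) {
		uint32_t masked = key & masks[m].mask;

		for (size_t f = 0; f < masks[m].n_filters; f++)
			if (masks[m].filters[f].masked_key == masked)
				return masks[m].filters[f].id;
	}
	return -1;
}
```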

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sctp-unify-sctp_make_op_error_fixed-and-sctp_make_op_error_space'
David S. Miller [Tue, 1 May 2018 16:09:36 +0000 (12:09 -0400)]
Merge branch 'sctp-unify-sctp_make_op_error_fixed-and-sctp_make_op_error_space'

Marcelo Ricardo Leitner says:

====================
sctp: unify sctp_make_op_error_fixed and sctp_make_op_error_space

These two variants are very close to each other and can be merged
to avoid code duplication. That's what this patchset does.

First, we allow sctp_init_cause to return errors, which then allow us to
add sctp_make_op_error_limited that handles both situations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: add sctp_make_op_error_limited and reuse inner functions
Marcelo Ricardo Leitner [Sun, 29 Apr 2018 15:56:32 +0000 (12:56 -0300)]
sctp: add sctp_make_op_error_limited and reuse inner functions

The idea is quite similar to the old functions, but note that the _fixed
function wasn't "fixed" as in that it would generate a packet with a fixed
size, but rather limited/bounded to PMTU.

Also, now with sctp_mtu_payload(), we have a more accurate limit.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: allow sctp_init_cause to return errors
Marcelo Ricardo Leitner [Sun, 29 Apr 2018 15:56:31 +0000 (12:56 -0300)]
sctp: allow sctp_init_cause to return errors

And do so if the skb doesn't have enough space for the payload.
This is a preparation for the next patch.
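
In spirit, the check looks like the sketch below, which uses a plain buffer descriptor instead of an skb. The names, the 4-byte header size and the choice of -ENOSPC are assumptions for illustration, and padding is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CAUSE_HDR_LEN	4	/* assumed: 2-byte cause code + 2-byte length */

/* Hypothetical stand-in for the chunk's buffer. */
struct chunk_buf {
	size_t used;		/* bytes already written */
	size_t capacity;	/* total bytes available */
};

/* Reserve room for a cause header plus payload, or report lack of space. */
int init_cause_sketch(struct chunk_buf *buf, size_t paylen)
{
	size_t need = CAUSE_HDR_LEN + paylen;

	assert(buf->used <= buf->capacity);
	if (buf->capacity - buf->used < need)
		return -ENOSPC;	/* caller can now react instead of overflowing */

	buf->used += need;
	return 0;
}
```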

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-stmmac-dwmac-meson-100M-phy-mode-support-for-AXG-SoC'
David S. Miller [Tue, 1 May 2018 15:30:00 +0000 (11:30 -0400)]
Merge branch 'net-stmmac-dwmac-meson-100M-phy-mode-support-for-AXG-SoC'

Yixun Lan says:

====================
net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC

Because the dwmac glue layer register has changed, we need to
introduce a new compatible name for the Meson-AXG SoC
to support the RMII 100M ethernet PHY.

Change since v1 at [1]:
  - implement set_phy_mode() for each SoC

[1] https://lkml.kernel.org/r/20180426160508.29380-1-yixun.lan@amlogic.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: dwmac-meson: extend phy mode setting
Yixun Lan [Sat, 28 Apr 2018 10:21:11 +0000 (10:21 +0000)]
net: stmmac: dwmac-meson: extend phy mode setting

In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer
is extended from bit[0] to bit[2:0].
  There is no problem if we configure it to the RGMII 1000M PHY mode,
since the register setting is coincidentally compatible with previous one,
but for the RMII 100M PHY mode, the configuration needs to be changed to
the value b100.
  This patch was verified with a RTL8201F 100M ethernet PHY.
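
A small sketch of the wider field follows. The macro and function names are made up for this example, and the RGMII encoding of 1 is an assumption derived from the statement that it stays compatible with the old bit[0] setting; only the b100 value for RMII comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PRG_ETH0_MODE_MASK_AXG	0x7u	/* bit[2:0] on Meson-AXG */
#define PRG_ETH0_AXG_RGMII	0x1u	/* assumed: same as the old bit[0] value */
#define PRG_ETH0_AXG_RMII	0x4u	/* b100, as stated in the commit message */

/* Return the register value with the PHY mode field rewritten. */
uint32_t axg_set_phy_mode(uint32_t reg, int rmii)
{
	reg &= ~PRG_ETH0_MODE_MASK_AXG;
	reg |= rmii ? PRG_ETH0_AXG_RMII : PRG_ETH0_AXG_RGMII;
	return reg;
}
```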

Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodt-bindings: net: meson-dwmac: new compatible name for AXG SoC
Yixun Lan [Sat, 28 Apr 2018 10:21:10 +0000 (10:21 +0000)]
dt-bindings: net: meson-dwmac: new compatible name for AXG SoC

We need to introduce a new compatible name for the Meson-AXG SoC
in order to support the RMII 100M ethernet PHY, since the PRG_ETH0
register of the dwmac glue layer has changed from the previous SoCs.

Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'netns-uevent-filtering'
David S. Miller [Tue, 1 May 2018 14:22:41 +0000 (10:22 -0400)]
Merge branch 'netns-uevent-filtering'

Christian Brauner says:

====================
netns: uevent filtering

This is the new approach to uevent filtering as discussed (see the
threads in [1], [2], and [3]). It only contains *non-functional
changes*.

This series deals with fixing up uevent filtering logic:
- uevent filtering logic is simplified
- locking time on uevent_sock_list is minimized
- tagged and untagged kobjects are handled in separate codepaths
- permissions for userspace are fixed for network device uevents in
  network namespaces owned by non-initial user namespaces
  Udev is now able to see those events correctly which it wasn't before.
  For example, moving a physical device into a network namespace not
  owned by the initial user namespace previously gave:

  root@xen1:~# udevadm --debug monitor -k
  calling: monitor
  monitor will print the received events for:
  KERNEL - the kernel uevent

  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored

  and now after the discussion and solution in [3] correctly gives:

  root@xen1:~# udevadm --debug monitor -k
  calling: monitor
  monitor will print the received events for:
  KERNEL - the kernel uevent

  KERNEL[625.301042] add      /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
  KERNEL[625.301109] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
  KERNEL[625.301138] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
  KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)

Thanks!
Christian

[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738
====================

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetns: restrict uevents
Christian Brauner [Sun, 29 Apr 2018 10:44:12 +0000 (12:44 +0200)]
netns: restrict uevents

Commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk. We have now reached the point where hotplug events for all devices
that carry a namespace tag are filtered according to that namespace.
Specifically, they are filtered whenever the namespace tag of the kobject
does not match the namespace tag of the netlink socket.
Currently, only network devices carry namespace tags (i.e. network
namespace tags). Hence, uevents for network devices only show up in the
network namespace such devices are created in or moved to.

However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will broadcast it into all
network namespaces. This behavior stopped making sense when user namespaces
were introduced.

This patch simplifies and fixes a couple of things:
- Split codepath for sending uevents by kobject namespace tags:
  1. Untagged kobjects - uevent_net_broadcast_untagged():
     Untagged kobjects will be broadcast into all uevent sockets recorded
     in uevent_sock_list, i.e. into all network namespaces owned by the
     initial user namespace.
  2. Tagged kobjects - uevent_net_broadcast_tagged():
     Tagged kobjects will only be broadcast into the network namespace they
     were tagged with.
  Handling of tagged kobjects in 2. does not cause any semantic changes.
  This is just splitting out the filtering logic that was handled by
  kobj_bcast_filter() before.
  Handling of untagged kobjects in 1. will cause a semantic change. The
  reasons why this is needed and ok have been discussed in [1]. Here is a
  short summary:
  - Userspace ignores uevents from network namespaces that are not owned by
    the initial user namespace:
    Uevents are filtered by userspace in a user namespace because the
    received uid != 0. Instead the uid associated with the event will be
    65534 == "nobody" because the global root uid is not mapped.
    This means we can safely and without introducing regressions modify the
    kernel to not send uevents into all network namespaces whose owning
    user namespace is not the initial user namespace because we know that
    userspace will ignore the message because of the uid anyway.
    I have a) verified that this is true for every udev implementation out
    there and b) verified that this behavior has been present in all udev
    implementations from the very beginning.
  - Thundering herd:
    Broadcasting uevents into all network namespaces introduces significant
    overhead.
    All processes that listen to uevents running in non-initial user
    namespaces will end up responding to uevents that will be meaningless
    to them. Mainly, because non-initial user namespaces cannot easily
    manage devices unless they have a privileged host-process helping them
    out. This means that there will be a thundering herd of activity when
    there shouldn't be any.
  - Removing needless overhead/Increasing performance:
    Currently, the uevent socket for each network namespace is added to the
    global variable uevent_sock_list. The list itself needs to be protected
    by a mutex. So every time a uevent is generated the mutex is taken on
    the list. The mutex is held *from the creation of the uevent (memory
    allocation, string creation etc.) until all uevent sockets have been
    handled*. This is aggravated by the fact that for each uevent socket
    that has listeners the mc_list must be walked as well which means we're
    talking O(n^2) here. Given that a standard Linux workload usually has
    quite a lot of network namespaces and - in the face of containers - a
    lot of user namespaces this quickly becomes a performance problem (see
    "Thundering herd" above). By just recording uevent sockets of network
    namespaces that are owned by the initial user namespace we
    significantly increase performance in this codepath.
  - Injecting uevents:
    There's a valid argument that containers might be interested in
    receiving device events especially if they are delegated to them by a
    privileged userspace process. One prime example is SR-IOV enabled
    devices that are explicitly designed to be handed off to other users
    such as VMs or containers.
    This use-case can now be correctly handled since
    commit 692ec06d7c92 ("netns: send uevent messages"). This commit
    introduced the ability to send uevents from userspace. As such we can
    let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
    namespace of the network namespace of the netlink socket) userspace
    process make a decision what uevents should be sent. This removes the
    need to blindly broadcast uevents into all user namespaces and provides
    a performant and safe solution to this problem.
  - Filtering logic:
    This patch filters by *owning user namespace of the network namespace a
    given task resides in* and not by user namespace of the task per se.
    This means if the user namespace of a given task is unshared but the
    network namespace is kept and is owned by the initial user namespace a
    listener that is opening the uevent socket in that network namespace
    can still listen to uevents.
- Fix permission for tagged kobjects:
  Network devices that are created in or moved into a network namespace that
  is owned by a non-initial user namespace currently are sent with
  INVALID_{G,U}ID in their credentials. This means that all current udev
  implementations in userspace will ignore the uevent they receive for
  them. This has led to weird bugs whereby new devices showing up in such
  network namespaces were not recognized and did not get IPs assigned etc.
  This patch adjusts the permission to the appropriate {g,u}id in the
  respective user namespace. This way udevd is able to correctly handle
  such devices.
- Simplify filtering logic:
  do_one_broadcast() already ensures that only listeners in mc_list receive
  uevents that have the same network namespace as the uevent socket itself.
  So the filtering logic in kobj_bcast_filter is not needed (see [3]). This
  patch therefore removes kobj_bcast_filter() and replaces
  netlink_broadcast_filtered() with the simpler netlink_broadcast()
  everywhere.
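
The resulting delivery rule can be summarized by the userspace sketch below. The structs are deliberately minimal stand-ins and every name is hypothetical; the sketch models only which network namespaces see an event, not the netlink plumbing or the credential fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user_ns { int id; };
struct net_ns { const struct user_ns *owner; };	/* owning user namespace */
struct kobj { const struct net_ns *tag; };	/* NULL when untagged */

/* Stand-in for the initial user namespace. */
const struct user_ns init_user_ns_sketch = { 0 };

/* Only sockets of netns owned by the initial user namespace are recorded. */
bool should_record_uevent_sock(const struct net_ns *net)
{
	return net->owner == &init_user_ns_sketch;
}

/* Would a uevent for this kobject be delivered into this netns? */
bool uevent_reaches(const struct kobj *k, const struct net_ns *net)
{
	if (k->tag)
		return k->tag == net;		/* tagged: its own netns only */
	return should_record_uevent_sock(net);	/* untagged: recorded sockets */
}
```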

[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agouevent: add alloc_uevent_skb() helper
Christian Brauner [Sun, 29 Apr 2018 10:44:11 +0000 (12:44 +0200)]
uevent: add alloc_uevent_skb() helper

This patch adds alloc_uevent_skb() in preparation for follow up patches.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tls-offload-netdev-and-mlx5-support'
David S. Miller [Tue, 1 May 2018 13:42:48 +0000 (09:42 -0400)]
Merge branch 'tls-offload-netdev-and-mlx5-support'

Boris Pismenny says:

====================
TLS offload, netdev & MLX5 support

The following series provides TLS TX inline crypto offload.

v1->v2:
   - Added IS_ENABLED(CONFIG_TLS_DEVICE) and a STATIC_KEY for icsk_clean_acked
   - File license fix
   - Fix spelling, comment by DaveW
   - Move memory allocations out of tls_set_device_offload and other misc fixes,
comments by Kiril.

v2->v3:
   - Reversed xmas tree where needed and style fixes
   - Removed the need for skb_page_frag_refill, per Eric's comment
   - IPv6 dependency fixes

v3->v4:
   - Remove "inline" from functions in C files
   - Make clean_acked_data_enabled a static variable and add enable/disable functions to control it.
   - Remove unnecessary variable initialization mentioned by ShannonN
   - Rebase over TLS RX
   - Refactor the tls_software_fallback to reduce the number of variables mentioned by KirilT

v4->v5:
   - Add missing CONFIG_TLS_DEVICE

v5->v6:
   - Move changes to the software implementation into a separate patch
   - Fix some checkpatch warnings
   - GPL export the enable/disable clean_acked_data functions

v6->v7:
   - Use the dst_entry to obtain the netdev in dev_get_by_index
   - Remove the IPv6 patch since it is redundant now

v7->v8:
   - Fix a merge conflict in mlx5 header

v8->v9:
   - Fix false -Wmaybe-uninitialized warning
   - Fix empty space in the end of new files

v9->v10:
   - Remove default "n" in net/Kconfig

This series adds a generic infrastructure to offload TLS crypto to a
network device. It enables the kernel TLS socket to skip encryption and
authentication operations on the transmit side of the data path, leaving
those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to the
TCP layer just like the SW KTLS implementation and using the same API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS records"; those
records contain plaintext instead of ciphertext and place holder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant place holders.

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the
driver calls the TLS layer to obtain the TLS record that includes the
TCP SN of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.
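
The driver-side bookkeeping described above might be sketched as follows. The struct and function are hypothetical, and the resync counter merely stands in for posting the context reconstruction work entry on the transmit queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-stream driver state. */
struct offload_ctx {
	uint32_t expected_seq;	/* next TCP SN the NIC expects */
	unsigned int resyncs;	/* reconstruction requests posted so far */
};

/*
 * Account for one packet about to be transmitted. Returns true when the
 * packet was out of order and a context reconstruction had to be queued.
 */
bool tx_packet(struct offload_ctx *ctx, uint32_t seq, uint32_t payload_len)
{
	bool resync = (seq != ctx->expected_seq);

	if (resync)
		ctx->resyncs++;	/* placeholder for the queued work entry */

	/* Unsigned arithmetic wraps modulo 2^32, like TCP sequence numbers. */
	ctx->expected_seq = seq + payload_len;
	return resync;
}
```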

Expected TCP SN is accessed without a lock, under the assumption that
TCP doesn't transmit SKBs from different TX queues concurrently.

If packets are rerouted to a different netdevice, then a software
fallback routine handles encryption.

Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: Update TLS maintainers
Boris Pismenny [Mon, 30 Apr 2018 07:16:23 +0000 (10:16 +0300)]
MAINTAINERS: Update TLS maintainers

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: Update mlx5 innova driver maintainers
Boris Pismenny [Mon, 30 Apr 2018 07:16:22 +0000 (10:16 +0300)]
MAINTAINERS: Update mlx5 innova driver maintainers

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: TLS, Add error statistics
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:21 +0000 (10:16 +0300)]
net/mlx5e: TLS, Add error statistics

Add statistics for rare TLS related errors.
Since the errors are rare we have a counter per netdev
rather than per SQ.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: TLS, Add Innova TLS TX offload data path
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:20 +0000 (10:16 +0300)]
net/mlx5e: TLS, Add Innova TLS TX offload data path

Implement the TLS tx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

Special metadata ethertype is used to pass information to
the hardware.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: TLS, Add Innova TLS TX support
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:19 +0000 (10:16 +0300)]
net/mlx5e: TLS, Add Innova TLS TX support

Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the
TLS generic NIC offload infrastructure.
The NETIF_F_HW_TLS_TX capability will be added in the next patch.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5: Accel, Add TLS tx offload interface
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:18 +0000 (10:16 +0300)]
net/mlx5: Accel, Add TLS tx offload interface

Add routines for manipulating TLS TX offload contexts.

In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Add implementation for Innova TLS (FPGA-based) hardware.

These routines will be used by the TLS offload support in a later patch

mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.

In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: Move defines out of ipsec code
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:17 +0000 (10:16 +0300)]
net/mlx5e: Move defines out of ipsec code

The defines are not IPSEC specific.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tls: Add generic NIC offload infrastructure
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:16 +0000 (10:16 +0300)]
net/tls: Add generic NIC offload infrastructure

This patch adds a generic infrastructure to offload TLS crypto to a
network device. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path,
leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just like the SW KTLS implementation and using the same
API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS records"; those
records contain plaintext instead of ciphertext and place holder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.
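
The sequence-number-to-record mapping in item 2 can be illustrated with the sketch below. The record layout and both helpers are assumptions for this example, and a linear scan replaces whatever list handling the real code uses. Comparisons are done with wrapping 32-bit arithmetic so they stay correct across sequence number wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record descriptor: the TCP byte range one TLS record covers. */
struct tls_record_sketch {
	uint32_t start_seq;	/* TCP SN of the record's first byte */
	uint32_t len;		/* record length in bytes */
};

/* Find the record that contains the given TCP SN, or NULL if none does. */
const struct tls_record_sketch *
find_record(const struct tls_record_sketch *recs, size_t n, uint32_t seq)
{
	for (size_t i = 0; i < n; i++) {
		/* The wrapping difference is < len only inside the record. */
		if ((uint32_t)(seq - recs[i].start_seq) < recs[i].len)
			return &recs[i];
	}
	return NULL;
}

/* A record may be released once its last byte has been acknowledged. */
bool record_fully_acked(const struct tls_record_sketch *rec, uint32_t acked_seq)
{
	uint32_t end_seq = rec->start_seq + rec->len;

	return (int32_t)(acked_seq - end_seq) >= 0;
}
```

Here acked_seq is taken to be the next byte the peer expects, so a record ending at end_seq is fully acknowledged once acked_seq reaches end_seq.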

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant place holders.

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the
driver calls the TLS layer to obtain the TLS record that includes the
TCP SN of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net/tls: Split conf to rx + tx
Boris Pismenny [Mon, 30 Apr 2018 07:16:15 +0000 (10:16 +0300)]
net/tls: Split conf to rx + tx

In TLS inline crypto, we can have one direction in software
and another in hardware. Thus, we split the TLS configuration to separate
structures for receive and transmit.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: Add TLS TX offload features
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:14 +0000 (10:16 +0300)]
net: Add TLS TX offload features

This patch adds a netdev feature to configure TLS TX offloads.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: Add TLS offload netdev ops
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:13 +0000 (10:16 +0300)]
net: Add TLS offload netdev ops

Add new netdev ops to add and delete a TLS context.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: Add Software fallback infrastructure for socket dependent offloads
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:12 +0000 (10:16 +0300)]
net: Add Software fallback infrastructure for socket dependent offloads

With socket dependent offloads we rely on the netdev to transform
the transmitted packets before sending them to the wire.
When a packet from an offloaded socket is rerouted to a different
device we need to detect it and do the transformation in software.
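
The detection step could look roughly like the sketch below. All names here
(struct netdev, struct offload_sock, pick_tx_path()) are hypothetical
stand-ins chosen for illustration; the patch's actual hook and data
structures are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a network device and an offloaded socket. */
struct netdev {
	int ifindex;
};

struct offload_sock {
	const struct netdev *offload_dev;	/* device holding the offload state */
};

enum tx_path {
	TX_PLAIN,	/* socket has no offload, nothing to transform */
	TX_HW_OFFLOAD,	/* egress device is the one that owns the state */
	TX_SW_FALLBACK	/* rerouted: transform in software before sending */
};

/* Decide how a packet from sk must be handled on the egress device. */
static enum tx_path pick_tx_path(const struct offload_sock *sk,
				 const struct netdev *egress)
{
	assert(egress != NULL);
	if (sk == NULL || sk->offload_dev == NULL)
		return TX_PLAIN;
	return sk->offload_dev == egress ? TX_HW_OFFLOAD : TX_SW_FALLBACK;
}
```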

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: Rename and export copy_skb_header
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:11 +0000 (10:16 +0300)]
net: Rename and export copy_skb_header

copy_skb_header is renamed to skb_copy_header and
exported. Exposing this function gives more flexibility
in copying SKBs.
skb_copy and skb_copy_expand do not give enough control
over which parts are copied.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago tcp: Add clean acked data hook
Ilya Lesokhin [Mon, 30 Apr 2018 07:16:10 +0000 (10:16 +0300)]
tcp: Add clean acked data hook

The hook is called when a TCP segment is acknowledged.
It can be used by application protocols that hold additional
metadata associated with the stream data.

This is required by TLS device offload to release
metadata associated with acknowledged TLS records.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 1 May 2018 13:37:44 +0000 (09:37 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-04-30

This series contains updates to i40e and i40evf only.

Jia-Ju Bai replaces an instance of GFP_ATOMIC with GFP_KERNEL, since
i40evf is not in atomic context when i40evf_add_vlan() is called.

Jake cleans up function header comments to ensure that the function
parameter comments actually match the function parameters.  He also
fixes a possible overflow error in the PTP clock code and warnings
regarding restricted __be32 type usage.

Mariusz fixes the reading of the LLDP configuration, which moves from
using relative values to calculating the absolute address.

Jakub adds a check for 10G LR mode for i40e.

Paweł fixes an issue where changing the MTU would turn on TSO, GSO and
GRO.

Alex fixes a couple of issues with the UDP tunnel filter configuration.
The first is that the tunnels did not have mutual exclusion in place to
prevent a race condition between a user request to add/remove a port and
an update.  The second is that we were deleting filters that were not
associated with the actual filter we wanted to delete.

Harshitha ensures that the queue map sent by the VF is taken into
account when enabling/disabling queues in the VF VSI.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago Merge branch 'mlxsw-SPAN-Support-routes-pointing-at-bridges'
David S. Miller [Mon, 30 Apr 2018 16:42:41 +0000 (12:42 -0400)]
Merge branch 'mlxsw-SPAN-Support-routes-pointing-at-bridges'

Ido Schimmel says:

====================
mlxsw: SPAN: Support routes pointing at bridges

Petr says:

When mirroring to a gretap or ip6gretap netdevice, the route that
directs the encapsulated packets can reference a bridge. In that case,
in the software model, the packet is switched.

Thus when offloading mirroring like that, take into consideration FDB,
STP, PVID configured at the bridge, and whether that VLAN ID should be
tagged on egress.

Patch #1 introduces functions to get bridge PVID, VLAN flags and to look
up an FDB entry.

Patches #2 and #3 refactor some existing code and introduce a new
accessor function.

With patches #4 and #5 mlxsw calls mlxsw_sp_span_respin() on switchdev
events as well. There is no impact yet, because bridge as an underlay
device is still not allowed.

That is implemented in patch #6, which uses the new interfaces to figure
out on which one port the mirroring should be configured, and whether
the mirrored packets should be VLAN-tagged and how.

Changes from v2 to v3:

- Rename the suite of bridge accessor functions to br_vlan_get_pvid(),
  br_vlan_get_info() and br_fdb_find_port(). The _get bit is to avoid
  clashing with an existing static function.

Changes from v1 to v2:

- Change the suite of bridge accessor functions to br_vlan_pvid_rtnl(),
  br_vlan_info_rtnl(), br_fdb_find_port_rtnl().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum_span: Allow bridge for gretap mirror
Petr Machata [Sun, 29 Apr 2018 07:56:13 +0000 (10:56 +0300)]
mlxsw: spectrum_span: Allow bridge for gretap mirror

When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
underlay address (i.e. the remote address of the tunnel) may be routed
to a bridge.

In that case, look up the resolved neighbor Ethernet address in that
bridge's FDB. Then configure the offload to direct the mirrored traffic
to that port, possibly with tagging.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: Respin SPAN on switchdev events
Petr Machata [Sun, 29 Apr 2018 07:56:12 +0000 (10:56 +0300)]
mlxsw: Respin SPAN on switchdev events

Changes to switchdev artifacts can make a SPAN entry offloadable or
unoffloadable. To that end:

- Listen to SWITCHDEV_FDB_*_TO_BRIDGE notifications in addition to
  the *_TO_DEVICE ones, to catch whatever activity is sent to the
  bridge (likely by mlxsw itself).

  On each FDB notification, respin SPAN to reconcile it with the FDB
  changes.

- Also respin on switchdev port attribute changes (which currently
  covers changes to STP state of ports) and port object additions and
  removals.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum: Register SPAN before switchdev
Petr Machata [Sun, 29 Apr 2018 07:56:11 +0000 (10:56 +0300)]
mlxsw: spectrum: Register SPAN before switchdev

Since switchdev events can trigger SPAN respin, it is necessary that the
data structures are available. Register SPAN first, with a commentary on
what the dependencies are.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum_switchdev: Publish two functions
Petr Machata [Sun, 29 Apr 2018 07:56:10 +0000 (10:56 +0300)]
mlxsw: spectrum_switchdev: Publish two functions

Publish the existing function mlxsw_sp_bridge_port_find(), and add
another service accessor mlxsw_sp_bridge_port_stp_state(). Publish both
in a new file spectrum_switchdev.h.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago mlxsw: spectrum: Extract mlxsw_sp_stp_spms_state()
Petr Machata [Sun, 29 Apr 2018 07:56:09 +0000 (10:56 +0300)]
mlxsw: spectrum: Extract mlxsw_sp_stp_spms_state()

Instead of duplicating the decision regarding port forwarding state made
by mlxsw_sp_port_vid_stp_set(), extract the decision-making into a new
function and reuse.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago net: bridge: Publish bridge accessor functions
Petr Machata [Sun, 29 Apr 2018 07:56:08 +0000 (10:56 +0300)]
net: bridge: Publish bridge accessor functions

Add a couple of new functions to allow querying the FDB and VLAN settings
of a bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago i40e: use %pI4b instead of byte swapping before dev_err
Jacob Keller [Fri, 20 Apr 2018 08:41:40 +0000 (01:41 -0700)]
i40e: use %pI4b instead of byte swapping before dev_err

Fix warnings regarding restricted __be32 type usage by strictly
specifying the type of the ipv4 address being printed in the dev_err
statement.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years ago i40e/i40evf: take into account queue map from vf when handling queues
Harshitha Ramamurthy [Fri, 20 Apr 2018 08:41:39 +0000 (01:41 -0700)]
i40e/i40evf: take into account queue map from vf when handling queues

The expectation of the ops VIRTCHNL_OP_ENABLE_QUEUES and
VIRTCHNL_OP_DISABLE_QUEUES is that the queue map sent by
the VF is taken into account when enabling/disabling
queues in the VF VSI. This patch makes sure that happens.

By breaking out the individual queue set up functions so
that they can be called directly from the i40e_virtchnl_pf.c
file, only the queues as specified by the queue bit map that
accompanies the enable/disable queues ops will be handled.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years ago i40e: avoid overflow in i40e_ptp_adjfreq()
Jacob Keller [Fri, 20 Apr 2018 08:41:38 +0000 (01:41 -0700)]
i40e: avoid overflow in i40e_ptp_adjfreq()

When operating at 1GbE, the base incval for the PTP clock is so large
that multiplying it by numbers close to the max_adj can overflow the
u64.

Rather than attempting to limit the max_adj to a value small enough to
avoid overflow, instead calculate the incvalue adjustment based on the
40GbE incvalue, and then multiply that by the scaling factor for the
link speed.

This sacrifices a small amount of precision in the adjustment but we
avoid erratic behavior of the clock due to the overflow caused if ppb is
very near the maximum adjustment.
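
The arithmetic can be checked with a small userspace sketch. The constants
below are illustrative assumptions rather than values quoted from this
patch: a 40GbE base increment of 0x0199999999, a 1GbE base increment of
0x2000000000, and a 1GbE scaling factor of 20. scaled_incval() and
mul_would_overflow() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumed, not taken from the patch text). */
#define BASE_INCVAL_40G	UINT64_C(0x0199999999)
#define BASE_INCVAL_1G	UINT64_C(0x2000000000)
#define MULT_1G		UINT64_C(20)
#define PPB_DIV		UINT64_C(1000000000)

/* Return 1 if a * b does not fit in a u64. */
static int mul_would_overflow(uint64_t a, uint64_t b)
{
	return b != 0 && a > UINT64_MAX / b;
}

/*
 * Compute the adjusted increment from the 40GbE base value first,
 * then scale it by the link speed factor.
 */
static uint64_t scaled_incval(uint32_t ppb, int neg_adj, uint64_t mult)
{
	uint64_t diff;
	uint64_t adj;

	/* The 40GbE base is small enough that this product fits in a u64. */
	assert(!mul_would_overflow(BASE_INCVAL_40G, ppb));
	diff = BASE_INCVAL_40G * ppb / PPB_DIV;
	adj = neg_adj ? BASE_INCVAL_40G - diff : BASE_INCVAL_40G + diff;
	return adj * mult;
}
```

With these assumed values, multiplying the 1GbE base directly by a ppb near
999999999 exceeds 64 bits, while the 40GbE base times the same ppb does not.
The division truncates before the scaling multiply, which is where the
small loss of precision comes from.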

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years ago i40e: Fix multiple issues with UDP tunnel offload filter configuration
Alexander Duyck [Fri, 20 Apr 2018 08:41:37 +0000 (01:41 -0700)]
i40e: Fix multiple issues with UDP tunnel offload filter configuration

This fixes at least 2 issues I have found with the UDP tunnel filter
configuration.

The first issue is the fact that the tunnels didn't have any sort of mutual
exclusion in place to prevent an update from racing with a user request to
add/remove a port. As such you could request to add and remove a port
before the port update code had a chance to respond which would result in a
very confusing result. To address it I have added 2 changes. First I added
the RTNL mutex wrapper around our updating of the pending, port, and
filter_index bits. Second I added logic so that we cannot use a port that
has a pending deletion since we need to free the space in hardware before
we can allow software to reuse it.

The second issue addressed is the fact that we were not recording the
actual filter index provided to us by the admin queue. As a result we were
deleting filters that were not associated with the actual filter we wanted
to delete. To fix that I added a filter_index member to the UDP port
tracking structure.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years ago i40evf: Fix turning TSO, GSO and GRO on after
Paweł Jabłoński [Fri, 20 Apr 2018 08:41:36 +0000 (01:41 -0700)]
i40evf: Fix turning TSO, GSO and GRO on after

This patch fixes the problem where each MTU change turns TSO,
GSO and GRO back on after they have been turned off.

Now when TSO, GSO or GRO is turned off, an MTU change does not
turn it on.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>