platform/kernel/linux-starfive.git
6 years agomac80211: Use proper smps_mode enum in sta opmode event
tamizhr@codeaurora.org [Tue, 27 Mar 2018 13:46:16 +0000 (19:16 +0530)]
mac80211: Use proper smps_mode enum in sta opmode event

SMPS_MODE change value notified via nl80211 contains mac80211
specific value(ieee80211_smps_mode) and user space application
will not know those values. This patch add support to map
the mac80211 enum value to nl80211_smps_mode which will be
understood by the userspace application.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: fix data type of sta_opmode_info parameter
tamizhr@codeaurora.org [Tue, 27 Mar 2018 13:46:15 +0000 (19:16 +0530)]
cfg80211: fix data type of sta_opmode_info parameter

Currently bw and smps_mode are u8 type value in sta_opmode_info
structure. This values filled in mac80211 from ieee80211_sta_rx_bandwidth
and ieee80211_smps_mode. These enum values are specific to mac80211 and
userspace/cfg80211 doesn't know about that. This will lead to incorrect
result/assumption by the user space application.
Change bw and smps_mode parameters to their respective enums in nl80211.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: notify driver for change in multicast rates
Pradeep Kumar Chitrapu [Thu, 22 Mar 2018 19:18:03 +0000 (12:18 -0700)]
mac80211: notify driver for change in multicast rates

With drivers implementing rate control in driver or firmware
rate_control_send_low() may not get called, and thus the
driver needs to know about changes in the multicast rate.

Add and use a new BSS change flag for this.

Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211/nl80211: add DFS offload flag
Dmitry Lebed [Thu, 1 Mar 2018 09:39:15 +0000 (12:39 +0300)]
cfg80211/nl80211: add DFS offload flag

Add wiphy EXT_FEATURE flag to indicate that HW or driver does
all DFS actions by itself.
User-space functionality already implemented in hostapd using
vendor-specific (QCA) OUI to advertise DFS offload support.
Need to introduce generic flag to inform about DFS offload support.
For devices with DFS_OFFLOAD flag set user-space will no longer
need to issue CAC or do any actions in response to
"radar detected" events. HW will do everything by itself and send
events to user-space to indicate that CAC was started/finished, etc.

Signed-off-by: Dmitrii Lebed <dlebed@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211/nl80211: add CAC_STARTED event
Dmitry Lebed [Thu, 1 Mar 2018 09:39:16 +0000 (12:39 +0300)]
cfg80211/nl80211: add CAC_STARTED event

CAC_STARTED event is needed for DFS offload feature and
should be generated by driver/HW if DFS_OFFLOAD is enabled.

Signed-off-by: Dmitry Lebed <dlebed@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: inform wireless layer when frame RSSI is invalid
Tosoni [Wed, 14 Mar 2018 16:58:34 +0000 (16:58 +0000)]
mac80211: inform wireless layer when frame RSSI is invalid

When the low-level driver returns an invalid RSSI indication,
set the signal value to 0 as an indication to the upper layer.

Also, skip average level computation if signal is invalid.

Signed-off-by: Jean Pierre TOSONI <jp.tosoni@acksys.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211_hwsim: fix use-after-free bug in hwsim_exit_net
Benjamin Beichler [Wed, 7 Mar 2018 17:11:07 +0000 (18:11 +0100)]
mac80211_hwsim: fix use-after-free bug in hwsim_exit_net

When destroying a net namespace, all hwsim interfaces, which are not
created in default namespace are deleted. But the async deletion of the
interfaces could last longer than the actual destruction of the
namespace, which results to an use after free bug. Therefore use
synchronous deletion in this case.

Fixes: 100cb9ff40e0 ("mac80211_hwsim: Allow managing radios from non-initial namespaces")
Reported-by: syzbot+70ce058e01259de7bb1d@syzkaller.appspotmail.com
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211_hwsim: fix secondary MAC address assignment
Johannes Berg [Wed, 21 Mar 2018 08:07:02 +0000 (09:07 +0100)]
mac80211_hwsim: fix secondary MAC address assignment

OR'ing in 0x40 before a memcpy() to overwrite the value doesn't
do much good - flip the order of operations are reported and
tested by Jouni.

Fixes: cb1a5bae5684 ("mac80211_hwsim: add permanent mac address option for new radios")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agoMerge branch 'dsa-mv88e6xxx-some-fixes'
David S. Miller [Tue, 20 Mar 2018 16:29:58 +0000 (12:29 -0400)]
Merge branch 'dsa-mv88e6xxx-some-fixes'

Uwe Kleine-König says:

====================
net: dsa: mv88e6xxx: some fixes

these patches target net-next and got approved by Andrew Lunn.

Compared to (implicit) v1, I dropped the patch that I didn't know if it
was right because of missing documentation on my side. But Andrew
already cared for that in a patch that is now adfccf118211 in net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix interrupt name for g2 irq
Uwe Kleine-König [Tue, 20 Mar 2018 09:44:42 +0000 (10:44 +0100)]
net: dsa: mv88e6xxx: Fix interrupt name for g2 irq

This changes the respective line in /proc/interrupts from

 49:          x          x  mv88e6xxx-g1   7 Edge      mv88e6xxx-g1

to

 49:          x          x  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2

which makes more sense.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix typo in a comment
Uwe Kleine-König [Tue, 20 Mar 2018 09:44:41 +0000 (10:44 +0100)]
net: dsa: mv88e6xxx: Fix typo in a comment

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix name of switch 88E6141
Uwe Kleine-König [Tue, 20 Mar 2018 09:44:40 +0000 (10:44 +0100)]
net: dsa: mv88e6xxx: Fix name of switch 88E6141

The switch name is emitted in the kernel log, so having the right name
there is nice.

Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-Adapt-driver-to-upcoming-firmware-versions'
David S. Miller [Tue, 20 Mar 2018 16:11:03 +0000 (12:11 -0400)]
Merge branch 'mlxsw-Adapt-driver-to-upcoming-firmware-versions'

Ido Schimmel says:

====================
mlxsw: Adapt driver to upcoming firmware versions

The first two patches make sure that reserved fields are set to zero, as
required by the device's programmer's reference manual (PRM).

Last two patches prevent the driver from performing an invalid operation
that is going to be denied by upcoming firmware versions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_acl: Do not invalidate already invalid ACL groups
Ido Schimmel [Mon, 19 Mar 2018 07:51:03 +0000 (09:51 +0200)]
mlxsw: spectrum_acl: Do not invalidate already invalid ACL groups

When a new ACL group is created its region (ACL) list is initially
empty. Thus, the call to mlxsw_sp_acl_tcam_group_update() would
basically invalidate an already invalid (non-existent) group.

Remove the unnecessary call and make the function symmetric to its del()
counterpart.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_acl: Adapt ACL configuration to new firmware versions
Ido Schimmel [Mon, 19 Mar 2018 07:51:02 +0000 (09:51 +0200)]
mlxsw: spectrum_acl: Adapt ACL configuration to new firmware versions

The driver currently creates empty ACL groups, binds them to the
requested port and then fills them with actual ACLs that point to TCAM
regions.

However, empty ACL groups are considered invalid and upcoming firmware
versions are going to forbid their binding.

Work around this limitation by only performing the binding after the
first ACL was added to the group.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Reserved field in mbox profile shouldn't be set
Tal Bar [Mon, 19 Mar 2018 07:51:01 +0000 (09:51 +0200)]
mlxsw: spectrum: Reserved field in mbox profile shouldn't be set

There is no need to set some of the fields within 'mbox_config_profile',
since they are reserved and capability mask should be set to zero.

Signed-off-by: Tal Bar <talb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: pci: Set mbox dma addresses to zero when not used
Shalom Toledo [Mon, 19 Mar 2018 07:51:00 +0000 (09:51 +0200)]
mlxsw: pci: Set mbox dma addresses to zero when not used

Some of the opcodes don't use in, out or both mboxes. In such cases, the
mbox address is a reserved field and FW expects it to be zero.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlx5: Remove call to ida_pre_get
Matthew Wilcox [Thu, 15 Mar 2018 02:57:24 +0000 (19:57 -0700)]
mlx5: Remove call to ida_pre_get

The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
reason.  The driver uses ida_simple_get() which will call ida_pre_get()
by itself and there's no need to use ida_pre_get() unless using
ida_get_new().

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Mon, 19 Mar 2018 19:12:03 +0000 (15:12 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-03-19

This series contains updates to i40e and i40evf only.

Alex fixes a potential deadlock in the configure_clsflower function in
i40evf, where we exit with the "IN_CRITICAL_TASK" bit set while
notifying the PF of flower filters.

Jan fixed an issue where it was possible to set a mode that is not
allowed which resulted in link being down, so fixed the parity between
i40e_set_link_ksettings() and i40e_get_link_ksettings().

Patryk fixes a bug where a backplane device was allowing the setting of
link settings, which is not allowed.

Shiraz fixes a crash when entering S3 because the client interface was
freeing the MSIx vectors while they are still in use.

Jake fixes up a function header comment to document a newly added
parameter.  Also cleaned up flags that were never used.

Doug fixes the incorrect return type for i40e_aq_add_cloud_filters().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE
Paweł Jabłoński [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE

This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the
PF Reset path when Global Reset is in progress. While the driver
is polling for the end of the PF Reset and the Global Reset is
triggered, abandon the PF Reset path and prepare for the
upcoming Global Reset.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: remove flags that are never used
Jacob Keller [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40evf: remove flags that are never used

These flags were defined, but there is no use within the driver code, so
we don't need to keep them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Prevent setting link speed on I40E_DEV_ID_25G_B
Patryk Małek [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: Prevent setting link speed on I40E_DEV_ID_25G_B

Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list which we check
that against.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Fix incorrect return types
Doug Dziggel [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: Fix incorrect return types

Fix return types from i40e_status to enum i40e_status_code.

Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: add doxygen comment for new mode parameter
Jacob Keller [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: add doxygen comment for new mode parameter

A recent patch updated the signature for i40e_aq_set_switch_config() to
add a new parameter 'mode'. It forgot to document the parameter in the
doxygen function header comment. Add the parameter to the function
description now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Close client on suspend and restore client MSIx on resume
Shiraz Saleem [Mon, 19 Mar 2018 16:28:03 +0000 (09:28 -0700)]
i40e: Close client on suspend and restore client MSIx on resume

During suspend client MSIx vectors are freed while they are still
in use causing a crash on entering S3.

Fix this calling client close before freeing up its MSIx vectors.
Also update the client MSIx vectors on resume before client
open is called.

Fixes commit b980c0634fe5 ("i40e: shutdown all IRQs and disable MSI-X
when suspended")

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Prevent setting link speed on KX_X722
Patryk Małek [Mon, 19 Mar 2018 16:28:03 +0000 (09:28 -0700)]
i40e: Prevent setting link speed on KX_X722

Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list which we check
that against.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Properly check allowed advertisement capabilities
Jan Sokolowski [Mon, 19 Mar 2018 16:28:03 +0000 (09:28 -0700)]
i40e: Properly check allowed advertisement capabilities

The i40e_set_link_ksettings and i40e_get_link_ksettings use different
codepaths to check available and supported advertisement modes. This
creates scenarios where it's possible to set a mode that's not allowed,
resulting in a link down.

Fix setting advertisement in i40e_set_link_ksettings by calling
i40e_get_link_ksettings to check what modes are allowed.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Reorder configure_clsflower to avoid deadlock on error
Alexander Duyck [Mon, 19 Mar 2018 16:28:03 +0000 (09:28 -0700)]
i40evf: Reorder configure_clsflower to avoid deadlock on error

While doing some code review I noticed that we can get into a state where
we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of
flower filters. This patch is meant to address that plus tweak the ordering
of the while loop waiting on it slightly so that we don't wait an extra
period after we have failed for the last time.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoselftests: pmtu: Drop prints to kernel log from pmtu_vti6_link_change_mtu
Stefano Brivio [Sun, 18 Mar 2018 20:58:12 +0000 (21:58 +0100)]
selftests: pmtu: Drop prints to kernel log from pmtu_vti6_link_change_mtu

Reported-by: David Ahern <dsahern@gmail.com>
Fixes: 1fad59ea1c34 ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mv88e6xxx-auto-phy-intr'
David S. Miller [Sun, 18 Mar 2018 20:52:59 +0000 (16:52 -0400)]
Merge branch 'mv88e6xxx-auto-phy-intr'

Andrew Lunn says:

====================
Automatic PHY interrupts

Now that the mv88e6xxx driver either installs in interrupt handler, or
polls for interrupts, it is possible to always handle PHY interrupts,
rather than have phylib perform the polling. This speeds up detection
of link changes and reduces the load on the MDIO bus, which is
beneficial for PTP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add MDIO interrupts for internal PHYs
Andrew Lunn [Sat, 17 Mar 2018 19:32:05 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add MDIO interrupts for internal PHYs

When registering an MDIO bus, it is possible to pass an array of
interrupts, one per address on the bus. phylib will then associate the
interrupt to the PHY device, if no other interrupt is provided.

Some of the global2 interrupts are PHY interrupts. Place them into the
MDIO bus structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add number of internal PHYs
Andrew Lunn [Sat, 17 Mar 2018 19:32:04 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add number of internal PHYs

Add to the info structure the number of internal PHYs, if they generate
interrupts. Some of the older generations of switches have internal
PHYs, but no interrupt registers. In this case, set the count to zero.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add missing g1 IRQ numbers
Andrew Lunn [Sat, 17 Mar 2018 19:32:03 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add missing g1 IRQ numbers

With the recent change to polling for interrupts, it is important that
the number of global 1 interrupts is listed. Without it, the driver
requests an interrupt domain for zero interrupts, which returns
EINVAL, and the probe fails.

Add two missing entries.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats
Florian Fainelli [Sun, 18 Mar 2018 18:23:05 +0000 (11:23 -0700)]
net: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats

We can hit the register lock not held assertion with the following path:

[   34.170631] mv88e6085 0.1:00: Switch registers lock not held!
[   34.176510] CPU: 0 PID: 950 Comm: ethtool Not tainted 4.16.0-rc4 #143
[   34.182985] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[   34.189519] Backtrace:
[   34.192033] [<8010c4b4>] (dump_backtrace) from [<8010c788>] (show_stack+0x20/0x24)
[   34.199680]  r6:9f5dc010 r5:00000011 r4:9f5dc010 r3:00000000
[   34.205434] [<8010c768>] (show_stack) from [<80679d38>] (dump_stack+0x24/0x28)
[   34.212719] [<80679d14>] (dump_stack) from [<804844a8>] (mv88e6xxx_read+0x70/0x7c)
[   34.220376] [<80484438>] (mv88e6xxx_read) from [<804870dc>] (mv88e6xxx_port_get_cmode+0x34/0x4c)
[   34.229257]  r5:a09cd128 r4:9ee31d07
[   34.232880] [<804870a8>] (mv88e6xxx_port_get_cmode) from [<80487e6c>] (mv88e6352_port_has_serdes+0x24/0x64)
[   34.242690]  r4:9f5dc010
[   34.245309] [<80487e48>] (mv88e6352_port_has_serdes) from [<804880b8>] (mv88e6352_serdes_get_stats+0x28/0x12c)
[   34.255389]  r4:00000001
[   34.257973] [<80488090>] (mv88e6352_serdes_get_stats) from [<804811e8>] (mv88e6xxx_get_ethtool_stats+0xb0/0xc0)
[   34.268156]  r10:00000000 r9:00000000 r8:00000000 r7:a09cd020 r6:00000001 r5:9f5dc01c
[   34.276052]  r4:9f5dc010
[   34.278631] [<80481138>] (mv88e6xxx_get_ethtool_stats) from [<8064f740>] (dsa_slave_get_ethtool_stats+0xbc/0xc4)

mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats() which calls both
chip->info->ops->stats_get_stats(), which holds the register lock, and
chip->info->ops->serdes_get_stats() which does not. Have
chip->info->ops->serdes_get_stats() be running with the register lock held to
avoid such assertions.

Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix IRQ when loading module
Andrew Lunn [Sat, 17 Mar 2018 19:21:09 +0000 (20:21 +0100)]
net: dsa: mv88e6xxx: Fix IRQ when loading module

Handle polled interrupts correctly when loading the module.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 294d711ee8c0 ("net: dsa: mv88e6xxx: Poll when no interrupt defined")
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'selftests-pmtu-Add-further-vti-vti6-MTU-and-PMTU-tests'
David S. Miller [Sun, 18 Mar 2018 00:15:14 +0000 (20:15 -0400)]
Merge branch 'selftests-pmtu-Add-further-vti-vti6-MTU-and-PMTU-tests'

Stefano Brivio says:

====================
selftests: pmtu: Add further vti/vti6 MTU and PMTU tests

Patches 5/10 to 10/10 add tests to verify default MTU assignment
for vti4 and vti6 interfaces, to check that MTU values set on new
link and link changes are properly taken and validated, and to
verify PMTU exceptions on vti4 interfaces.

Patch 1/10 reverses function return codes as suggested by David
Ahern.

Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
passed namespace.

Patches 3/10 and 4/10 are preparation work to make it easier to
introduce those tests.

v2: Reverse return codes, and make output prettier in 4/9 by
    using padded printf, test descriptions and buffered error
    strings. Remove accidental output to /dev/kmsg from 10/10
    (was 9/9).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add pmtu_vti6_link_change_mtu test
Stefano Brivio [Sat, 17 Mar 2018 01:31:47 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti6_link_change_mtu test

This test checks that MTU configured from userspace is used on
link creation and changes, and that when it's not passed from
userspace, it's calculated properly from the MTU of the lower
layer.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add pmtu_vti6_link_add_mtu test
Stefano Brivio [Sat, 17 Mar 2018 01:31:46 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti6_link_add_mtu test

Same as pmtu_vti4_link_add_mtu test, but for IPv6.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add pmtu_vti4_link_add_mtu test
Stefano Brivio [Sat, 17 Mar 2018 01:31:45 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti4_link_add_mtu test

This test checks that MTU given on vti link creation is actually
configured, and that tunnel is not created with an invalid MTU
value.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add test_pmtu_vti4_exception test
Stefano Brivio [Sat, 17 Mar 2018 01:31:44 +0000 (02:31 +0100)]
selftests: pmtu: Add test_pmtu_vti4_exception test

This test checks that PMTU exceptions are created only when
needed on IPv4 routes with vti and xfrm, and their PMTU value is
checked as well.

We can't adopt the same approach as test_pmtu_vti6_exception()
here, because on IPv4 administrative MTU changes won't be
reflected directly on PMTU.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add pmtu_vti6_default_mtu test
Stefano Brivio [Sat, 17 Mar 2018 01:31:43 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti6_default_mtu test

Same as pmtu_vti4_default_mtu, but on IPv6 with vti6.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Add pmtu_vti4_default_mtu test
Stefano Brivio [Sat, 17 Mar 2018 01:31:42 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti4_default_mtu test

This test checks that the MTU assigned by default to a vti (IPv4)
interface created on top of veth is simply veth's MTU minus the
length of the encapsulated IPv4 header.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Introduce support for multiple tests
Stefano Brivio [Sat, 17 Mar 2018 01:31:41 +0000 (02:31 +0100)]
selftests: pmtu: Introduce support for multiple tests

Introduce list of tests and their descriptions, and loop on it
in main body.

Tests will now just take care of calling setup with a list of
"units" they need, and return 0 on success, 1 on failure, 2 if
the test had to be skipped.

Main script body will take care of displaying results and
cleaning up after every test. Introduce guard variable so that
we don't clean up twice in case of interrupts or unexpected
failures.

The pmtu_vti6_exception test can now run its third step even if
the previous one failed, as we can return values from it.

Also introduce support to display test descriptions, and display
aligned OK/FAIL/SKIP test outcomes. Buffer error strings so that
in case of failure we can display them right under the outcome
for each test.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Factor out MTU parsing helper
Stefano Brivio [Sat, 17 Mar 2018 01:31:40 +0000 (02:31 +0100)]
selftests: pmtu: Factor out MTU parsing helper

...so that it can be used for any iproute command output.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Use namespace command prefix to fetch route mtu
Stefano Brivio [Sat, 17 Mar 2018 01:31:39 +0000 (02:31 +0100)]
selftests: pmtu: Use namespace command prefix to fetch route mtu

In 7af137b72131 ("selftests: net: Introduce first PMTU test") I
accidentally assumed route_get_* helpers would run from a single
namespace. Make them a bit more generic, by passing the
namespace command prefix as a parameter instead.

Fixes: 7af137b72131 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Reverse return codes of functions
Stefano Brivio [Sat, 17 Mar 2018 01:31:38 +0000 (02:31 +0100)]
selftests: pmtu: Reverse return codes of functions

David suggests it's more intuitive to return non-zero on
failures, and zero on success.

No need to introduce tail 'return 0' in functions, they will
return the exit code of the last command anyway.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ibmvnic-Update-TX-pool-and-TX-routines'
David S. Miller [Sun, 18 Mar 2018 00:12:40 +0000 (20:12 -0400)]
Merge branch 'ibmvnic-Update-TX-pool-and-TX-routines'

Thomas Falcon says:

====================
ibmvnic: Update TX pool and TX routines

This patch restructures the TX pool data structure and provides a
separate TX pool array for TSO transmissions. This is already used
in some way due to our unique DMA situation, namely that we cannot
use single DMA mappings for packet data. Previously, both buffer
arrays used the same pool entry. This restructuring allows for
some additional cleanup in the driver code, especially in some
places in the device transmit routine.

In addition, it allows us to more easily track the consumer
and producer indexes of a particular pool. This has been
further improved by better tracking of in-use buffers to
prevent possible data corruption in case an invalid buffer
entry is used.

v5: Fix bisectability mistake in the first patch. Removed
    TSO-specific data in a later patch when it is no longer used.

v4: Fix error in 7th patch that causes an oops by using
    the older fixed value for number of buffers instead
    of the respective field in the tx pool data structure

v3: Forgot to update TX pool cleaning function to handle new data
    structures. Included 7th patch for that.

v2: Fix typo in 3/6 commit subject line
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Remove unused TSO resources in TX pool structure
Thomas Falcon [Sat, 17 Mar 2018 01:00:31 +0000 (20:00 -0500)]
ibmvnic: Remove unused TSO resources in TX pool structure

Finally, remove the TSO-specific fields in the TX pool
strcutures. These are no longer needed with the introduction
of separate buffer pools for TSO transmissions.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Update TX pool cleaning routine
Thomas Falcon [Sat, 17 Mar 2018 01:00:30 +0000 (20:00 -0500)]
ibmvnic: Update TX pool cleaning routine

Update routine that cleans up any outstanding transmits that
have not received completions when the device needs to close.
Introduces a helper function that cleans one TX pool to make
code more readable.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Improve TX buffer accounting
Thomas Falcon [Sat, 17 Mar 2018 01:00:29 +0000 (20:00 -0500)]
ibmvnic: Improve TX buffer accounting

Improve TX pool buffer accounting to prevent the producer
index from overruning the consumer. First, set the next free
index to an invalid value if it is in use. If next buffer
to be consumed is in use, drop the packet.

Finally, if the transmit fails for some other reason, roll
back the consumer index and set the free map entry to its original
value. This should also be done if the DMA map fails.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Update TX and TX completion routines
Thomas Falcon [Sat, 17 Mar 2018 01:00:28 +0000 (20:00 -0500)]
ibmvnic: Update TX and TX completion routines

Update TX and TX completion routines to account for TX pool
restructuring. TX routine first chooses the pool depending
on whether a packet is GSO or not, then uses it accordingly.

For the completion routine to know which pool it needs to use,
set the most significant bit of the correlator index to one
if the packet uses the TSO pool. On completion, unset the bit
and use the correlator index to release the buffer pool entry.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Update TX pool initialization routine
Thomas Falcon [Sat, 17 Mar 2018 01:00:27 +0000 (20:00 -0500)]
ibmvnic: Update TX pool initialization routine

Introduce function that initializes one TX pool. Use that to
create each pool entry in both the standard TX pool and TSO
pool arrays.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Update release TX pool routine
Thomas Falcon [Sat, 17 Mar 2018 01:00:26 +0000 (20:00 -0500)]
ibmvnic: Update release TX pool routine

Introduce function that frees one TX pool.  Use that to release
each pool in both the standard TX pool and TSO pool arrays.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Update and clean up reset TX pool routine
Thomas Falcon [Sat, 17 Mar 2018 01:00:25 +0000 (20:00 -0500)]
ibmvnic: Update and clean up reset TX pool routine

Update TX pool reset routine to accommodate new TSO pool array. Introduce
a function that resets one TX pool, and use that function to initialize
each pool in both pool arrays.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Generalize TX pool structure
Thomas Falcon [Sat, 17 Mar 2018 01:00:24 +0000 (20:00 -0500)]
ibmvnic: Generalize TX pool structure

Remove some unused fields in the structure and include values
describing the individual buffer size and number of buffers in
a TX pool. This allows us to use these fields for TX pool buffer
accounting as opposed to using hard coded values. Include a new
pool array for TSO transmissions.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: use proc_remove_subtree()
Al Viro [Fri, 16 Mar 2018 23:32:51 +0000 (23:32 +0000)]
sctp: use proc_remove_subtree()

use proc_remove_subtree() for subtree removal, both on setup failure
halfway through and on teardown.  No need to make simple things
complex...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'hv_netvsc-minor-enhancements'
David S. Miller [Sun, 18 Mar 2018 00:10:27 +0000 (20:10 -0400)]
Merge branch 'hv_netvsc-minor-enhancements'

Stephen Hemminger says:

====================
hv_netvsc: minor enhancements

A couple of small things for net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agohv_netvsc: add trace points
Stephen Hemminger [Fri, 16 Mar 2018 22:44:28 +0000 (15:44 -0700)]
hv_netvsc: add trace points

This adds tracepoints to the driver which has proved useful in
debugging startup and shutdown race conditions.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agohv_netvsc: pass netvsc_device to rndis halt
Stephen Hemminger [Fri, 16 Mar 2018 22:44:27 +0000 (15:44 -0700)]
hv_netvsc: pass netvsc_device to rndis halt

The caller has a valid pointer, pass it to rndis_filter_halt_device
and avoid any possible RCU races here.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: ti: cpsw: enable vlan rx vlan offload
Grygorii Strashko [Thu, 15 Mar 2018 20:15:50 +0000 (15:15 -0500)]
net: ethernet: ti: cpsw: enable vlan rx vlan offload

In VLAN_AWARE mode CPSW can insert VLAN header encapsulation word on Host
port 0 egress (RX) before the packet data if RX_VLAN_ENCAP bit is set in
CPSW_CONTROL register. VLAN header encapsulation word has following format:

 HDR_PKT_Priority bits 29-31 - Header Packet VLAN prio (Highest prio: 7)
 HDR_PKT_CFI    bits 28 - Header Packet VLAN CFI bit.
 HDR_PKT_Vid    bits 27-16 - Header Packet VLAN ID
 PKT_Type         bits 8-9 - Packet Type. Indicates whether the packet is
                  VLAN-tagged, priority-tagged, or non-tagged.
00: VLAN-tagged packet
01: Reserved
10: Priority-tagged packet
11: Non-tagged packet

This feature can be used to implement TX VLAN offload in case of
VLAN-tagged packets and to insert VLAN tag in case Non-tagged packet was
received on port with PVID set. As per documentation, CPSW never modifies
packet data on Host egress (RX) and as result, without this feature
enabled, Host port will not be able to receive properly packets which
entered switch non-tagged through external Port with PVID set (when
non-tagged packet forwarded from external Port with PVID set to another
external Port - packet will be VLAN tagged properly).

Implementation details:
- on RX driver will check CPDMA status bit RX_VLAN_ENCAP BIT(19) in CPPI
descriptor to identify when VLAN header encapsulation word is present.
- PKT_Type = 0x01 or 0x02 then ignore VLAN header encapsulation word and
pass packet as is;
- if HDR_PKT_Vid = 0 then ignore VLAN header encapsulation word and pass
packet as is;
- In dual mac mode traffic is separated between ports using default port
vlans, which are not be visible to Host and so should not be reported.
Hence, check for default port vlans in dual mac mode and ignore VLAN header
encapsulation word;
- otherwise fill SKB with VLAN info using __vlan_hwaccel_put_tag();
- PKT_Type = 0x00 (VLAN-tagged) then strip out VLAN header from SKB.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: Fix queue free path of ULD drivers
Arjun Vynipadath [Thu, 15 Mar 2018 12:04:14 +0000 (17:34 +0530)]
cxgb4: Fix queue free path of ULD drivers

Setting sge_uld_rxq_info to NULL in free_queues_uld().
We are referencing sge_uld_rxq_info in cxgb_up(). This
will fix a panic when interface is brought up after a
ULDq creation failure.

Fixes: 94cdb8bb993a (cxgb4: Add support for dynamic allocation
       of resources for ULD)
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudhar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock
Sowmini Varadhan [Thu, 15 Mar 2018 10:54:26 +0000 (03:54 -0700)]
rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock

rds_tcp_connection allocation/free management has the potential to be
called from __rds_conn_create after IRQs have been disabled, so
spin_[un]lock_bh cannot be used with rds_tcp_conn_lock.

Bottom-halves that need to synchronize for critical sections protected
by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly.

Reported-by: syzbot+c68e51bb5e699d3f8d91@syzkaller.appspotmail.com
Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
       netns/module teardown and rds connection/workq management")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tipc-obsolete-zone-concept'
David S. Miller [Sat, 17 Mar 2018 21:11:47 +0000 (17:11 -0400)]
Merge branch 'tipc-obsolete-zone-concept'

Jon Maloy says:

====================
tipc: obsolete zone concept

Functionality related to the 'zone' concept was never implemented in
TIPC. In this series we eliminate the remaining traces of it in the
code, and can hence take a first step in reducing the footprint and
complexity of the binding table.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: some name changes
Jon Maloy [Thu, 15 Mar 2018 15:48:55 +0000 (16:48 +0100)]
tipc: some name changes

We rename some lists and fields in struct publication both to make
the naming more consistent and to better reflect their roles. We
also update the descriptions of those lists.

node_list -> local_publ
cluster_list -> all_publ
pport_list -> binding_sock
ref -> port

There are no functional changes in this commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: merge two lists in struct publication
Jon Maloy [Thu, 15 Mar 2018 15:48:54 +0000 (16:48 +0100)]
tipc: merge two lists in struct publication

The size of struct publication can be reduced further. Membership in
lists 'nodesub_list' and 'local_list' is mutually exlusive, in that
remote publications use the former and local publications the latter.
We replace the two lists with one single, named 'binding_node' which
reflects what it really is.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: remove zone_list member in struct publication
Jon Maloy [Thu, 15 Mar 2018 15:48:53 +0000 (16:48 +0100)]
tipc: remove zone_list member in struct publication

As a further consequence of the previous commits, we can also remove
the member 'zone_list 'in struct name_info and struct publication.
Instead, we now let the member cluster_list take over the role a
container of all publications of a given <type,lower, upper>.
We also remove the counters for the size of those lists, since
they don't serve any purpose.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: remove zone publication list in name table
Jon Maloy [Thu, 15 Mar 2018 15:48:52 +0000 (16:48 +0100)]
tipc: remove zone publication list in name table

As a consequence of the previous commit we nan now eliminate zone scope
related lists in the name table. We start with name_table::publ_list[3],
which can now be replaced with two lists, one for node scope publications
and one for cluster scope publications.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: obsolete TIPC_ZONE_SCOPE
Jon Maloy [Thu, 15 Mar 2018 15:48:51 +0000 (16:48 +0100)]
tipc: obsolete TIPC_ZONE_SCOPE

Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
aspects handled the same way, both on the publishing node and on the
receiving nodes.

Despite previous ambitions to the contrary, this is never going to change,
so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related
macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt
using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
remain compatible with users and remote nodes still using ZONE_SCOPE.

Furthermore, the non-formalized scope value 0 has always been permitted
for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
We now permit it even as binding scope, but for compatibility reasons we
choose to not change the value of TIPC_CLUSTER_SCOPE.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'pernet-convert-part8'
David S. Miller [Sat, 17 Mar 2018 21:07:40 +0000 (17:07 -0400)]
Merge branch 'pernet-convert-part8'

Kirill Tkhai says:

====================
Converting pernet_operations (part #8)

this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces at the same time. There are different operations
over the tree, mostly are ipvs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ip_vs_ftp_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:11:44 +0000 (12:11 +0300)]
net: Convert ip_vs_ftp_ops

These pernet_operations register and unregister ipvs app.
register_ip_vs_app(), unregister_ip_vs_app() and
register_ip_vs_app_inc() modify per-net structures,
and there are no global structures touched. So,
this looks safe to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ipvs_core_dev_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:11:35 +0000 (12:11 +0300)]
net: Convert ipvs_core_dev_ops

Exit method stops two per-net threads and cancels
delayed work. Everything looks nicely per-net divided.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ipvs_core_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:11:25 +0000 (12:11 +0300)]
net: Convert ipvs_core_ops

These pernet_operations register and unregister nf hooks,
/proc entries, sysctl, percpu statistics. There are several
global lists, and the only list modified without exclusive
locks is ip_vs_conn_tab in ip_vs_conn_flush(). We iterate
the list and force the timers expire at the moment. Since
there were possible several timer expirations before this
patch, and since they are safe, the patch does not invent
new parallelism of their destruction. These pernet_operations
look safe to be converted.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ovs_net_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:11:16 +0000 (12:11 +0300)]
net: Convert ovs_net_ops

These pernet_operations initialize and destroy net_generic()
data pointed by ovs_net_id. Exit method destroys vports from
alive net to exiting net. Since they are only pernet_operations
interested in this data, and exit method is executed under
exclusive global lock (ovs_mutex), they are safe to be executed
in parallel.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert mpls_net_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:11:06 +0000 (12:11 +0300)]
net: Convert mpls_net_ops

These pernet_operations register and unregister sysctl table.
Exit methods frees platform_labels from net::mpls::platform_label.
Everything is per-net, and they looks safe to be marked async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert l2tp_net_ops
Kirill Tkhai [Thu, 15 Mar 2018 09:10:57 +0000 (12:10 +0300)]
net: Convert l2tp_net_ops

Init method is rather simple. Exit method queues del_work
for every tunnel from per-net list. This seems to be safe
to be marked async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit
Yousuk Seung [Fri, 16 Mar 2018 17:51:49 +0000 (10:51 -0700)]
net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit

Set tp->snd_ssthresh to BDP upon STARTUP exit. This allows us
to check if a BBR flow exited STARTUP and the BDP at the
time of STARTUP exit with SCM_TIMESTAMPING_OPT_STATS. Since BBR does not
use snd_ssthresh this fix has no impact on BBR's behavior.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS
Yousuk Seung [Fri, 16 Mar 2018 17:51:07 +0000 (10:51 -0700)]
tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS

This patch adds TCP_NLA_SND_SSTHRESH stat into SCM_TIMESTAMPING_OPT_STATS
that reports tcp_sock.snd_ssthresh.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/txtimestamp: Add more configurable parameters
Vinicius Costa Gomes [Fri, 16 Mar 2018 17:41:14 +0000 (10:41 -0700)]
selftests/txtimestamp: Add more configurable parameters

Add a way to configure if poll() should wait forever for an event, the
number of packets that should be sent for each and if there should be
any delay between packets.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Simplified napi poll
Intiyaz Basha [Fri, 16 Mar 2018 17:21:31 +0000 (10:21 -0700)]
liquidio: Simplified napi poll

1) Moved interrupt enable related code from octeon_process_droq_poll_cmd()
   to separate function octeon_enable_irq().
2) Removed wrapper function octeon_process_droq_poll_cmd(), and directlyi
   using octeon_droq_process_poll_pkts().
3) Removed unused macros POLL_EVENT_XXX.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-smc-IPv6-support'
David S. Miller [Fri, 16 Mar 2018 18:57:26 +0000 (14:57 -0400)]
Merge branch 'net-smc-IPv6-support'

Ursula Braun says:

====================
net/smc: IPv6 support

these smc patches for the net-next tree add IPv6 support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: enable ipv6 support for smc
Karsten Graul [Fri, 16 Mar 2018 14:06:41 +0000 (15:06 +0100)]
net/smc: enable ipv6 support for smc

Add ipv6 support to the smc socket layer functions. Make use of the
updated clc layer functions to retrieve and match ipv6 information.
The indicator for ipv4 or ipv6 is the protocol constant that is provided
in the socket() call with address family AF_SMC.

Based-on-patch-by: Takanori Ueda <tkueda@jp.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add ipv6 support to CLC layer
Karsten Graul [Fri, 16 Mar 2018 14:06:40 +0000 (15:06 +0100)]
net/smc: add ipv6 support to CLC layer

The CLC layer is updated to support ipv6 proposal messages from peers and
to match incoming proposal messages against the ipv6 addresses of the net
device. struct smc_clc_ipv6_prefix is updated to provide the space for an
ipv6 address (struct was not used before). SMC_CLC_MAX_LEN is updated to
include the size of the proposal prefix. Existing code in net is not
affected, the previous SMC_CLC_MAX_LEN value is large enough to hold ipv4
proposal messages.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: restructure netinfo for CLC proposal msgs
Karsten Graul [Fri, 16 Mar 2018 14:06:39 +0000 (15:06 +0100)]
net/smc: restructure netinfo for CLC proposal msgs

Introduce functions smc_clc_prfx_set to retrieve IP information for the
CLC proposal msg and smc_clc_prfx_match to match the contents of a
proposal message against the IP addresses of the net device. The new
functions replace the functionality provided by smc_clc_netinfo_by_tcpsk,
which is removed by this patch. The match functionality is extended to
scan all ipv4 addresses of the net device for a match against the
ipv4 subnet from the proposal msg.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: notify fatal error to uld drivers
Ganesh Goudar [Fri, 16 Mar 2018 08:52:57 +0000 (14:22 +0530)]
cxgb4: notify fatal error to uld drivers

notify uld drivers if the adapter encounters fatal
error.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'rtnl_lock_killable'
David S. Miller [Fri, 16 Mar 2018 16:31:19 +0000 (12:31 -0400)]
Merge branch 'rtnl_lock_killable'

Kirill Tkhai says:

====================
Introduce rtnl_lock_killable()

rtnl_lock() is widely used mutex in kernel. Some of kernel code
does memory allocations under it. In case of memory deficit this
may invoke OOM killer, but the problem is a killed task can't
exit if it's waiting for the mutex. This may be a reason of deadlock
and panic.

This patchset adds a new primitive, which responds on SIGKILL,
and it allows to use it in the places, where we don't want
to sleep forever. Also, the first place is made to use it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Use rtnl_lock_killable() in register_netdev()
Kirill Tkhai [Wed, 14 Mar 2018 19:17:28 +0000 (22:17 +0300)]
net: Use rtnl_lock_killable() in register_netdev()

This patch adds rtnl_lock_killable() to one of hot path
using rtnl_lock().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Add rtnl_lock_killable()
Kirill Tkhai [Wed, 14 Mar 2018 19:17:20 +0000 (22:17 +0300)]
net: Add rtnl_lock_killable()

rtnl_lock() is widely used mutex in kernel. Some of kernel code
does memory allocations under it. In case of memory deficit this
may invoke OOM killer, but the problem is a killed task can't
exit if it's waiting for the mutex. This may be a reason of deadlock
and panic.

This patch adds a new primitive, which responds on SIGKILL, and
it allows to use it in the places, where we don't want to sleep
forever.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodoc: Change the udp/sctp rmem/wmem default value.
Tonghao Zhang [Wed, 14 Mar 2018 04:57:17 +0000 (21:57 -0700)]
doc: Change the udp/sctp rmem/wmem default value.

The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoudp: Move the udp sysctl to namespace.
Tonghao Zhang [Wed, 14 Mar 2018 04:57:16 +0000 (21:57 -0700)]
udp: Move the udp sysctl to namespace.

This patch moves the udp_rmem_min, udp_wmem_min
to namespace and init the udp_l3mdev_accept explicitly.

The udp_rmem_min/udp_wmem_min affect udp rx/tx queue,
with this patch namespaces can set them differently.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-ipv6-Address-checks-need-to-consider-the-L3-domain'
David S. Miller [Fri, 16 Mar 2018 15:28:40 +0000 (11:28 -0400)]
Merge branch 'net-ipv6-Address-checks-need-to-consider-the-L3-domain'

David Ahern says:

====================
net/ipv6: Address checks need to consider the L3 domain

IPv6 prohibits a local address from being used as a gateway for a route.
However, it is ok for the gateway to be a local address in a different L3
domain (e.g., VRF). This allows, for example, veth pairs to connect VRFs.

ip6_route_info_create calls ipv6_chk_addr_and_flags for gateway addresses
to determine if the address is a local one, but ipv6_chk_addr_and_flags
does not currently consider L3 domains. As a result routes can not be
added in one VRF with a nexthop that points to a local address in a
second VRF.

Resolve by comparing the l3mdev for the passed in device and requiring an
l3mdev match with the device containing an address. The intent of checking
for an address on the specified device versus any device in the domain is
mantained by a new argument to skip the check between the passed in device
and the device with the address.

Patch 1 moves the gateway validation from ip6_route_info_create into a
helper; the function is long enough and refactoring drops the indent
level.

Patch 2 adds a skip_dev_check argument to ipv6_chk_addr_and_flags to
allow a device to always be passed yet skip the device check when
looking at addresses and fixes up a few ipv6_chk_addr callers that
pass a NULL device.

Patch 3 adds l3mdev checks to ipv6_chk_addr_and_flags.

Patches 4 and 5 do some refactoring to the fib_tests script and then
patch 6 adds nexthop validation tests.

v4
- separated l3mdev check into a separate patch (patch 3 of this set)
  as suggested by Kirill
- consolidated dev and ipv6_chk_addr_and_flags call into 1 if (Kirill)
- added a temp variable for gw type (Kirill)

v3
- set skip_dev_check in ipv6_chk_addr based on dev == NULL (per
  comment from Ido)

v2
- handle 2 variations of route spec with sane error path
- add test cases
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add IPv6 nexthop spec tests
David Ahern [Tue, 13 Mar 2018 15:29:41 +0000 (08:29 -0700)]
selftests: fib_tests: Add IPv6 nexthop spec tests

Add series of tests for valid and invalid nexthop specs for IPv6.

$ TEST=fib_nexthop_test ./fib_tests.sh
...
IPv6 nexthop tests
    TEST: Directly connected nexthop, unicast address              [ OK ]
    TEST: Directly connected nexthop, unicast address with device  [ OK ]
    TEST: Gateway is linklocal address                             [ OK ]
    TEST: Gateway is linklocal address, no device                  [ OK ]
    TEST: Gateway can not be local unicast address                 [ OK ]
    TEST: Gateway can not be local unicast address, with device    [ OK ]
    TEST: Gateway can not be a local linklocal address             [ OK ]
    TEST: Gateway can be local address in a VRF                    [ OK ]
    TEST: Gateway can be local address in a VRF, with device       [ OK ]
    TEST: Gateway can be local linklocal address in a VRF          [ OK ]
    TEST: Redirect to VRF lookup                                   [ OK ]
    TEST: VRF route, gateway can be local address in default VRF   [ OK ]
    TEST: VRF route, gateway can not be a local address            [ OK ]
    TEST: VRF route, gateway can not be a local addr with device   [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Allow user to run a specific test
David Ahern [Tue, 13 Mar 2018 15:29:40 +0000 (08:29 -0700)]
selftests: fib_tests: Allow user to run a specific test

Allow a user to run just a specific fib test by setting the TEST
environment variable.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Use an alias for ip command
David Ahern [Tue, 13 Mar 2018 15:29:39 +0000 (08:29 -0700)]
selftests: fib_tests: Use an alias for ip command

Replace 'ip -netns testns' with the alias IP. Shortens the line lengths
and makes running the commands manually a bit easier.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags
David Ahern [Tue, 13 Mar 2018 15:29:38 +0000 (08:29 -0700)]
net/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags

Lookup the L3 master device for the passed in device. Only consider
addresses on netdev's with the same master device. If the device is
not enslaved or is NULL, then the l3mdev is NULL which means only
devices not enslaved (ie, in the default domain) are considered.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Change address check to always take a device argument
David Ahern [Tue, 13 Mar 2018 15:29:37 +0000 (08:29 -0700)]
net/ipv6: Change address check to always take a device argument

ipv6_chk_addr_and_flags determines if an address is a local address and
optionally if it is an address on a specific device. For example, it is
called by ip6_route_info_create to determine if a given gateway address
is a local address. The address check currently does not consider L3
domains and as a result does not allow a route to be added in one VRF
if the nexthop points to an address in a second VRF. e.g.,

    $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
    Error: Invalid gateway address.

where 2001:db8:102::23 is an address on an interface in vrf r1.

ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
with a separate argument to not limit the address to the specific device.
The device is used used to determine the L3 domain of interest.

To that end add an argument to skip the device check and update callers
to always pass a device where possible and use the new argument to mean
any address in the domain.

Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
patch handles the change to these callers without adding the domain check.

ip6_validate_gw needs to handle 2 cases - one where the device is given
as part of the nexthop spec and the other where the device is resolved.
There is at least 1 VRF case where deferring the check to only after
the route lookup has resolved the device fails with an unintuitive error
"RTNETLINK answers: No route to host" as opposed to the preferred
"Error: Gateway can not be a local address." The 'no route to host'
error is because of the fallback to a full lookup. The check is done
twice to avoid this error.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Refactor gateway validation on route add
David Ahern [Tue, 13 Mar 2018 15:29:36 +0000 (08:29 -0700)]
net/ipv6: Refactor gateway validation on route add

Move gateway validation code from ip6_route_info_create into
ip6_validate_gw. Code move plus adjustments to handle the potential
reset of dev and idev and to make checkpatch happy.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'macb-Introduce-phy-handle-DT-functionality'
David S. Miller [Fri, 16 Mar 2018 15:14:34 +0000 (11:14 -0400)]
Merge branch 'macb-Introduce-phy-handle-DT-functionality'

Brad Mouring says:

====================
net: macb: Introduce phy-handle DT functionality

Consider the situation where a macb netdev is connected through
a phydev that sits on a mii bus other than the one provided to
this particular netdev. This situation is what this patchset aims
to accomplish through the existing phy-handle optional binding.

This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is precisely the
situation this patchset aims to solve, so it makes sense to introduce
the functionality to this driver (where the physical layout discussed
was encountered).

The devicetree snippet would look something like this:

...
   ethernet@feedf00d {
           ...
           phy-handle = <&phy0> // the first netdev is physically wired to phy0
           ...
           phy0: phy@0 {
                   ...
                   reg = <0x0> // MDIO address 0
                   ...
           }
           phy1: phy@1 {
                   ...
                   reg = <0x1> // MDIO address 1
                   ...
           }
           ...
   }

   ethernet@deadbeef {
           ...
           phy-handle = <&phy1> // tells the driver to use phy1 on the
                                // first mac's mdio bus (it's wired thusly)
           ...
   }
...

The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).

v2: Reorganization of mii probe/init functions, suggested by Andrew Lunn
v3: Moved some of the bus init code back into init (erroneously moved to probe)
    some style issues, and an unintialized variable warning addressed.
v4: Add Reviewed-by: tags
    Skip fallback code if phy-handle phandle is found
v5: Cleanup formatting issues
    Fix compile failure introduced in 1/4 "net: macb: Reorganize macb_mii
        bringup"
    Fix typo in "Documentation: macb: Document phy-handle binding"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoDocumentation: macb: Document phy-handle binding
Brad Mouring [Tue, 13 Mar 2018 21:32:16 +0000 (16:32 -0500)]
Documentation: macb: Document phy-handle binding

Document the existence of the optional binding, directing to the
general ethernet document that describes this binding.

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: macb: Add phy-handle DT support
Brad Mouring [Tue, 13 Mar 2018 21:32:15 +0000 (16:32 -0500)]
net: macb: Add phy-handle DT support

This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is useful for a phy
chip that has >1 phy in it, and two netdevs are using the same phy
chip (i.e. the second mac's phy lives on the first mac's MDIO bus)

The devicetree snippet would look something like this:

ethernet@feedf00d {
...
phy-handle = <&phy0> // the first netdev is physically wired to phy0
...
phy0: phy@0 {
...
reg = <0x0> // MDIO address 0
...
}
phy1: phy@1 {
...
reg = <0x1> // MDIO address 1
...
}
...
}

ethernet@deadbeef {
...
phy-handle = <&phy1> // tells the driver to use phy1 on the
 // first mac's mdio bus (it's wired thusly)
...
}

The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: macb: Remove redundant poll irq assignment
Brad Mouring [Tue, 13 Mar 2018 21:32:14 +0000 (16:32 -0500)]
net: macb: Remove redundant poll irq assignment

In phy_device's general probe, this device will already be set for
phy register polling, rendering this code redundant.

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>