platform/kernel/linux-exynos.git
7 years agoMerge branch 'net-phy-Support-managed-Cortina-phys'
David S. Miller [Tue, 30 May 2017 16:42:28 +0000 (12:42 -0400)]
Merge branch 'net-phy-Support-managed-Cortina-phys'

Bogdan Purcareata says:

====================
net: phy: Support managed Cortina phys

So far, the Cortina family phys (CS4340 in this particular case) are only
supported in fixed link mode (via fixed_phy_register). The generic 10G
phy driver does not work well with the phylib state machine, when the phy
is registered via of_phy_connect. This prohibits the user from describing the
phy nodes in the device tree.

In order to support this scenario, and to properly describe the board
device tree, add a minimal Cortina driver that reads the status from the
right register. With the generic 10G C45 driver, the kernel will print
messages like:
[    0.226521] mdio_bus 8b96000: Error while reading PHY16 reg at 1.6
[    0.232780] mdio_bus 8b96000: Error while reading PHY16 reg at 1.5

v3 -> v4:
- Add trademark info.
- Minor documentation entry consistency nit.

v2 -> v3:
- Add documentation entry.

v1 -> v2:
- Change approach for getting the phy_id from hacking get_phy_c45_ids to
  describing the device in the device tree via ethernet-phy-id.

More patch version changes per individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: Add Cortina device tree bindings
Bogdan Purcareata [Mon, 29 May 2017 09:11:31 +0000 (09:11 +0000)]
dt-bindings: net: Add Cortina device tree bindings

Add device tree description info for Cortina 10G phy devices.

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: Add Cortina CS4340 driver
Bogdan Purcareata [Mon, 29 May 2017 09:11:30 +0000 (09:11 +0000)]
net: phy: Add Cortina CS4340 driver

Add basic support for Cortina PHY drivers. Support only CS4340 for now.
The phys are not compatible with IEEE 802.3 clause 22/45 registers.

Implement proper read_status support. The generic 10G phy driver causes
bus register access errors.

The driver should be described using the "ethernet-phy-id" device tree
compatible.

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'qed-DCBx-and-Attentions-series'
David S. Miller [Tue, 30 May 2017 16:07:04 +0000 (12:07 -0400)]
Merge branch 'qed-DCBx-and-Attentions-series'

Yuval Mintz says:

====================
qed: DCBx and Attentions series

The series contains 2 major components [& some odd bits]:
 - The first 3 patches are DCBx-related, containg missing bits in the
   implementation, correcting existing API and removing code no longer
   necessary.
 - Most of the remaining patches are interrupt/hw-attention related,
   adding some differeneces relating to QL41xxx and QL45xxx differences.
   While at it, they also remove a large chunk of unnecessary structure
   definitions.

The series also contain a patch [#10] that was accidently missing
from a previous series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Cache alignemnt padding to match host
Mintz, Yuval [Mon, 29 May 2017 06:53:14 +0000 (09:53 +0300)]
qed: Cache alignemnt padding to match host

Improve PCI performance by adjusting padding sizes to match those of the
host machine's cacheline.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Mask parities after occurance
Mintz, Yuval [Mon, 29 May 2017 06:53:13 +0000 (09:53 +0300)]
qed: Mask parities after occurance

Parities might exhibit a flood behavior since we re-enable the
attention line without preventing the parity from re-triggering the
assertion.
Mask the source in AEU until the parity would be handled.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Print multi-bit attentions properly
Mintz, Yuval [Mon, 29 May 2017 06:53:12 +0000 (09:53 +0300)]
qed: Print multi-bit attentions properly

In strucuture reflecting the AEU hw block some entries
represent multiple HW bits, and the associated name is in fact
a pattern.
Today, whenever such an attention would be asserted the resulted
prints would show the pattern string instead of indicating which
of the possible bits was set.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Diffrentiate adapter-specific attentions
Mintz, Yuval [Mon, 29 May 2017 06:53:11 +0000 (09:53 +0300)]
qed: Diffrentiate adapter-specific attentions

There are 4 attention bits in AEU that have different meaning
for QL45xxx and QL41xxx adapters.

Instead of doing a massive infrastructure change in favor of these
bits, we implement a point fix where only those four would change
meaning dependent on the adapter involved.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Get rid of the attention-arrays
Mintz, Yuval [Mon, 29 May 2017 06:53:10 +0000 (09:53 +0300)]
qed: Get rid of the attention-arrays

We have almost all the necessary information regarding attentions
in the logic employed for taking register dumps.
Add some more and get rid of the seperate implementation we have today
for identifying & printing various attention sources.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Support dynamic s-tag change
Mintz, Yuval [Mon, 29 May 2017 06:53:09 +0000 (09:53 +0300)]
qed: Support dynamic s-tag change

In case management firmware indicates a change in the used S-tag,
propagate the configuration to HW and FW.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: QL41xxx VF MSI-x table
Mintz, Yuval [Mon, 29 May 2017 06:53:08 +0000 (09:53 +0300)]
qed: QL41xxx VF MSI-x table

The QL41xxx adapters' PCI allows a single configuration for the
MSI-x table size of all child VFs of a given PF.
The existing code wouldn't cause the management firmware to set
that value, meaning the VFs would retain the default MSI-x table
size.

Introduce a new scheme so that whenever a VF is enabled, driver
would set the number of MSI-x to be the maximum over the various
VFs' needs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Don't inherit RoCE DCBx for V2
Sudarsana Reddy Kalluru [Mon, 29 May 2017 06:53:07 +0000 (09:53 +0300)]
qed: Don't inherit RoCE DCBx for V2

Older firmware used by device didn't distinguish between RoCE and RoCE
V2 from DCBx configuration perspective, and as a result we've used to
take a the RoCE-related configuration and apply to it for both.

Since we now support configuring each its own values, there's no reason
to reflect [& configure] that both are using the same.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Correct DCBx update scheme
Sudarsana Reddy Kalluru [Mon, 29 May 2017 06:53:06 +0000 (09:53 +0300)]
qed: Correct DCBx update scheme

Instead of using a boolean value that propagates to FW configuration,
use the proper firmware HSI values.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Add missing static/local dcbx info
Sudarsana Reddy Kalluru [Mon, 29 May 2017 06:53:05 +0000 (09:53 +0300)]
qed: Add missing static/local dcbx info

Some getters are not getting filled with the correct information
regarding local DCBx.

Fixes: 49632b5822ea ("qed: Add support for static dcbx.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-more-extack'
David S. Miller [Tue, 30 May 2017 15:55:34 +0000 (11:55 -0400)]
Merge branch 'net-more-extack'

David Ahern says:

====================
net: another round of extack handling for routing

This set focuses on passing extack through lwtunnel and MPLS with
additional catches for IPv4 route add and minor cleanups in MPLS
encountered passing the extack arg around.

v2
- mindful of bloat adding duplicate messages
  + refactored prefix and prefix length checks in ipv4's fib_table_insert
    and fib_table_del
  + refactored label check in mpls

- split mpls cleanups into 2 patches
  + move nla_get_via up in af_mpls to avoid forward declaration
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mpls: remove unnecessary initialization of err
David Ahern [Sat, 27 May 2017 22:19:33 +0000 (16:19 -0600)]
net: mpls: remove unnecessary initialization of err

err is initialized to EINVAL and not used before it is set again.
Remove the unnecessary initialization.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mpls: Make nla_get_via in af_mpls.c
David Ahern [Sat, 27 May 2017 22:19:32 +0000 (16:19 -0600)]
net: mpls: Make nla_get_via in af_mpls.c

nla_get_via is only used in af_mpls.c. Remove declaration from internal.h
and move up in af_mpls.c before first use. Code move only; no
functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mpls: Add extack messages for route add and delete failures
David Ahern [Sat, 27 May 2017 22:19:31 +0000 (16:19 -0600)]
net: mpls: Add extack messages for route add and delete failures

Add error messages for failures in adding and deleting mpls routes.
This covers most of the annoying EINVAL errors.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mpls: Pull common label check into helper
David Ahern [Sat, 27 May 2017 22:19:30 +0000 (16:19 -0600)]
net: mpls: Pull common label check into helper

mpls_route_add and mpls_route_del have the same checks on the label.
Move to a helper. Avoid duplicate extack messages in the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Fill in extack for mpls lwt encap
David Ahern [Sat, 27 May 2017 22:19:29 +0000 (16:19 -0600)]
net: Fill in extack for mpls lwt encap

Fill in extack for errors in build_state for mpls lwt encap including
passing extack to nla_get_labels and adding error messages for failures
in it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: add extack arg to lwtunnel build state
David Ahern [Sat, 27 May 2017 22:19:28 +0000 (16:19 -0600)]
net: add extack arg to lwtunnel build state

Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extarg to
nla_parse where possible in the build_state callbacks.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: lwtunnel: Add extack to encap attr validation
David Ahern [Sat, 27 May 2017 22:19:27 +0000 (16:19 -0600)]
net: lwtunnel: Add extack to encap attr validation

Pass extack down to lwtunnel_valid_encap_type and
lwtunnel_valid_encap_type_attr. Add messages for unknown
or unsupported encap types.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv4: Add extack message for invalid prefix or length
David Ahern [Sat, 27 May 2017 22:19:26 +0000 (16:19 -0600)]
net: ipv4: Add extack message for invalid prefix or length

Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv4: refactor key and length checks
David Ahern [Sat, 27 May 2017 22:19:25 +0000 (16:19 -0600)]
net: ipv4: refactor key and length checks

fib_table_insert and fib_table_delete have the same checks on the prefix
and length. Refactor into a helper. Avoids duplicate extack messages in
the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'nfp-pci-core-hwmon-live-mac-addr-change'
David S. Miller [Tue, 30 May 2017 15:27:10 +0000 (11:27 -0400)]
Merge branch 'nfp-pci-core-hwmon-live-mac-addr-change'

Jakub Kicinski says:

====================
nfp: pci core, hwmon, live mac addr change

This series brings updates to core PCI code, SR-IOV, exposes
firmware's capability to change MAC address at runtime and HWMON
interfaces.

The PCI code updates include resiliency improvement in conditions
which are quite unusual, but still shouldn't make the driver oops.
We also handle very large device memory operation more gracefully.
A timeout is added to acquiring mutexes in device memory.

Pablo provides a patch to expose to the stack the ability to change
MAC addresses under traffic while David adds HWMON interface for
reading device temperature and power consumption.

Last three patches are minor improvements to the netdev code.

v2:
 - add patch 1 - fix for devlink build;
 - fix build issue with the hwmon patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't keep count for free buffers delayed kick
Jakub Kicinski [Mon, 29 May 2017 00:53:04 +0000 (17:53 -0700)]
nfp: don't keep count for free buffers delayed kick

We only kick RX free buffer queue controller every NFP_NET_FL_BATCH
(currently 16) entries.  This means that we will always kick the QC
when write ring index is divisable by NFP_NET_FL_BATCH.  There is
no need to keep counts.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't add ring size to index calculations
Jakub Kicinski [Mon, 29 May 2017 00:53:03 +0000 (17:53 -0700)]
nfp: don't add ring size to index calculations

Adding ring size to index calculation is pointless, since index
will be masked with ring size - 1.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: fix print format for ring pointers in ring dumps
Jakub Kicinski [Mon, 29 May 2017 00:53:02 +0000 (17:53 -0700)]
nfp: fix print format for ring pointers in ring dumps

Ring pointers are unsigned.  Fix the print formats to avoid
showing users negative values.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't wait for resources indefinitely
Jakub Kicinski [Mon, 29 May 2017 00:53:01 +0000 (17:53 -0700)]
nfp: don't wait for resources indefinitely

There is currently no timeout to the resource and lock acquiring
loops.  We printed warnings and depended on user sending a signal
to the waiting process to stop the waiting.  This doesn't work
very well when wait happens out of a work queue.  The simplest
example of that is PCI probe.  When user loads the module and card
is in a broken state modprobe will wait forever and signals sent
to it will not actually reach the probing thread.

Make sure all wait loops have a time out.  Set the upper wait time
to 60 seconds to stay on the safe side.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: add hwmon support
David Brunecz [Mon, 29 May 2017 00:53:00 +0000 (17:53 -0700)]
nfp: add hwmon support

Add support for retrieving temperature and power sensor and limits via NSP.

Signed-off-by: David Brunecz <david.brunecz@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: support variable NSP response lengths
Jakub Kicinski [Mon, 29 May 2017 00:52:59 +0000 (17:52 -0700)]
nfp: support variable NSP response lengths

We want to support extendable commands, where newer versions
of the management FW may provide more information.  Zero out
the communication buffer before passing control to NSP.  This
way if management FW is old and only fills in first N bytes,
the remaining ones will be zeros which extended ABI fields
should reserve as not supported/not available.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: shorten CPP core probe logs
Jakub Kicinski [Mon, 29 May 2017 00:52:58 +0000 (17:52 -0700)]
nfp: shorten CPP core probe logs

We currently print reserved BAR mappings info as we create them.
This makes the probe logs longer than necessary.  Print into a
buffer instead and log all the info as a single line.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: support long reads and writes with the cpp helpers
Jakub Kicinski [Mon, 29 May 2017 00:52:57 +0000 (17:52 -0700)]
nfp: support long reads and writes with the cpp helpers

nfp_cpp_{read,write}() helpers perform device memory mapping (setting
the PCIe -> NOC translation BARs) and accessing it.  They, however,
currently implicitly expect that the length of entire operation will
fit in one BAR translation window.  There is a number of 16MB windows
available, and we don't really need to access such large areas today.

If the user, however, manages to trick the driver into making a big
mapping (e.g. by providing a huge fake FW file), the driver will
print a warning saying "No suitable BAR found for request" and a
stack trace - which most users find concerning.

To be future-proof and not scare users with warnings, make the
nfp_cpp_{read,write}() helpers do accesses chunk by chunk if the area
size is large.  Set the notion of "large" to 2MB, which is the size
of the smallest BAR window.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: only try to get to PCIe ctrl memory if BARs are wide enough
Jakub Kicinski [Mon, 29 May 2017 00:52:56 +0000 (17:52 -0700)]
nfp: only try to get to PCIe ctrl memory if BARs are wide enough

For accessing PCIe ctrl memory we depend on the BAR aperture being
large enough to reach all registers.  Since the BAR aperture can
be set in the flash make sure the driver won't oops the kernel
when the PCIe configuration is unusual.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: don't set aux pointers if ioremap failed
Jakub Kicinski [Mon, 29 May 2017 00:52:55 +0000 (17:52 -0700)]
nfp: don't set aux pointers if ioremap failed

If ioremap of PCIe ctrl memory failed we can still get to it through
PCI config space, therefore we allow ioremap() to fail.  When if fails,
however, we must leave all the IOMEM pointers as NULL.  Currently we
would calculate csr and em pointers, adding offsets to the potential
NULL value and therefore making the NULL-checks throughout the code
ineffective.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: set driver VF limit
Jakub Kicinski [Mon, 29 May 2017 00:52:54 +0000 (17:52 -0700)]
nfp: set driver VF limit

PCI subsystem has support for drivers limiting the number of VFs
available below what the IOV capability claims.  Make use of it.

While at it remove the #ifdef/#endif on CONFIG_PCI_IOV, it was
there to avoid unnecessary warnings in case device read failed
but kernel doesn't have SR-IOV support anyway.  Device reads
should not fail.

Note that we still need the driver-internal check for the case
where max VFs is 0 since PCI subsystem treats 0 as limit not set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: add set_mac_address support while the interface is up
Pablo Cascón [Mon, 29 May 2017 00:52:53 +0000 (17:52 -0700)]
nfp: add set_mac_address support while the interface is up

Expose FW app ability to change MAC address at runtime.  Make sure
we only depend on it if FW app advertised the right capability.

Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonfp: add MAY_USE_DEVLINK dependency
Jakub Kicinski [Mon, 29 May 2017 00:52:52 +0000 (17:52 -0700)]
nfp: add MAY_USE_DEVLINK dependency

Fix build with DEVLINK=m and NFP=y.

Fixes: 1851f93fd2ee ("nfp: add devlink support")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: Relax error checking on sysfs_create_link()
Florian Fainelli [Sat, 27 May 2017 17:42:25 +0000 (10:42 -0700)]
net: phy: Relax error checking on sysfs_create_link()

Some Ethernet drivers will attach/connect to a PHY device before calling
register_netdevice() which is responsible for calling netdev_register_kobject()
which would do the network device's kobject initialization. In such a case,
sysfs_create_link() would return -ENOENT because the network device's kobject
is not ready yet, and we would fail to connect to the PHY device.

In order to keep things simple and symetrical, we just take the success path as
indicative of the ability to access the network device's kobject, and create
the second link if that's the case.

Fixes: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for attached_dev/phydev")
Reported-by: Woojung Hung <Woojung.Huh@microchip.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: handle SERDES error appropriately
Vivien Didelot [Fri, 26 May 2017 22:02:42 +0000 (18:02 -0400)]
net: dsa: mv88e6xxx: handle SERDES error appropriately

mv88e6xxx_serdes_power returns an error, so no need to print an error
message inside of it. Rather print it in its caller when the error is
ignored, which is in the mv88e6xxx_port_disable void function.

Catch and return its error in the counterpart mv88e6xxx_port_enable.

Fixes: 04aca9938255 ("dsa: mv88e6xxx: Enable/Disable SERDES on port enable/disable")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'rtnetlink-Updates-to-rtnetlink_event'
David S. Miller [Sat, 27 May 2017 22:51:42 +0000 (18:51 -0400)]
Merge branch 'rtnetlink-Updates-to-rtnetlink_event'

Vladislav Yasevich says:

====================
rtnetlink: Updates to rtnetlink_event()

First is the patch to add IFLA_EVENT attribute to the netlink message.  It
supports only currently white-listed events.
Like before, this is just an attribute that gets added to the rtnetlink
message only when the messaged was generated as a result of a netdev event.
In my case, this is necessary since I want to trap NETDEV_NOTIFY_PEERS
event (also possibly NETDEV_RESEND_IGMP event) and perform certain actions
in user space.  This is not possible since the messages generated as
a result of netdev events do not usually contain any changed data.  They
are just notifications.  This patch exposes this notification type to
userspace.

Second, I remove duplicate messages that a result of a change to bonding
options.  If netlink is used to configure bonding options, 2 messages
are generated, one as a result NETDEV_CHANGEINFODATA event triggered by
bonding code and one a result of device state changes triggered by
netdev_state_change (called from do_setlink).

V6: Updated names and refactored to make it less tied to netdev events.
    (From David Ahern)
V5: Rebased.  Added iproute2 patch to the series.
V4:
  * Removed the patch the removed NETDEV_CHANGENAME from event whitelist.
    It doesn't trigger duplicate messages since name changes can only be
    done while device is down and netdev_state_change() doesn't report
    changes while device is down.
  * Added a patch to clean-up duplicate messages on bonding option changes.

V3: Rebased.  Cleaned-up duplicate event.

V2: Added missed events (from David Ahern)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: Prevent duplicate userspace notification
Vlad Yasevich [Sat, 27 May 2017 14:14:35 +0000 (10:14 -0400)]
bonding: Prevent duplicate userspace notification

Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
notificatin is generated which results in a rtnelink message to
be sent.  While runnig 'ip monitor', we can actually see 2 messages,
one a result of the event, and the other a result of state change
that is generated bo netdev_state_change().  However, this is not
always the case. If bonding changes were done via sysfs or ifenslave
(old ioctl interface), then only 1 message is seen.

This patch removes duplicate messages in the case of using netlink
to configure bonding.  It introduceds a separte function that
triggers a netdev event and uses that function in the syfs and ioctl
cases.

This was discovered while auditing all the different envents and
continues the effort of cleaning up duplicated netlink messages.

CC: David Ahern <dsa@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnl: Add support for netdev event to link messages
Vlad Yasevich [Sat, 27 May 2017 14:14:34 +0000 (10:14 -0400)]
rtnl: Add support for netdev event to link messages

When netdev events happen, a rtnetlink_event() handler will send
messages for every event in it's white list.  These messages contain
current information about a particular device, but they do not include
the iformation about which event just happened.  So, it is impossible
to tell what just happend for these events.

This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of event that triggered this
message.  This would allow the the message consumer to easily determine
if it needs to perform certain actions.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sat, 27 May 2017 00:46:35 +0000 (20:46 -0400)]
Merge git://git./linux/kernel/git/davem/net

Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net'
restricting a HW workaround alongside cleanups in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'led_fixes_for_4-12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 26 May 2017 21:02:30 +0000 (14:02 -0700)]
Merge tag 'led_fixes_for_4-12-rc3' of git://git./linux/kernel/git/j.anaszewski/linux-leds

Pull LED fix from Jacek Anaszewski:
 "A single LED fix for 4.12-rc3.

  leds-pca955x driver uses only i2c_smbus API and thus it should pass
  I2C_FUNC_SMBUS_BYTE_DATA flag to i2c_check_functionality"

* tag 'led_fixes_for_4-12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: pca955x: Correct I2C Functionality

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 26 May 2017 20:51:01 +0000 (13:51 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
    Borkmann.

 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
    Caratti.

 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
    driver, from Jesper Brouer.

 4) Defer slave->link updates until bonding is ready to do a full commit
    to the new settings, from Nithin Sujir.

 5) Properly reference count ipv4 FIB metrics to avoid use after free
    situations, from Eric Dumazet and several others including Cong Wang
    and Julian Anastasov.

 6) Fix races in llc_ui_bind(), from Lin Zhang.

 7) Fix regression of ESP UDP encapsulation for TCP packets, from
    Steffen Klassert.

 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.

 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
    Dawson.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add reference counting to metrics
  net: ethernet: ax88796: don't call free_irq without request_irq first
  ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
  sctp: fix ICMP processing if skb is non-linear
  net: llc: add lock_sock in llc_ui_bind to avoid a race condition
  bonding: Don't update slave->link until ready to commit
  test_bpf: Add a couple of tests for BPF_JSGE.
  bpf: add various verifier test cases
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  bpf: properly reset caller saved regs after helper call and ld_abs/ind
  bpf: fix incorrect pruning decision when alignment must be tracked
  arp: fixed -Wuninitialized compiler warning
  tcp: avoid fastopen API to be used on AF_UNSPEC
  net: move somaxconn init from sysctl code
  net: fix potential null pointer dereference
  geneve: fix fill_info when using collect_metadata
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  ...

7 years agoMerge branch 'ibmvnic-Driver-updates'
David S. Miller [Fri, 26 May 2017 19:32:47 +0000 (15:32 -0400)]
Merge branch 'ibmvnic-Driver-updates'

Nathan Fontenot says:

====================
ibmvnic: Driver updates

This set of patches implements several updates to the ibmvnic driver
to fix issues that have been found in testing. Most of the updates
invovle updating queue handling during driver close and reset
operations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Reset sub-crqs during driver reset
Nathan Fontenot [Fri, 26 May 2017 14:31:12 +0000 (10:31 -0400)]
ibmvnic: Reset sub-crqs during driver reset

When the ibmvnic driver is resetting, we can just reset the sub crqs
instead of releasing all of their resources and re-allocting them.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Reset tx/rx pools on driver reset
Nathan Fontenot [Fri, 26 May 2017 14:31:06 +0000 (10:31 -0400)]
ibmvnic: Reset tx/rx pools on driver reset

When resetting the ibmvnic driver there is not a need to release
and re-allocate the resources for the tx and rx pools. These
resources can just be reset to avoid the re-allocations.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Reset the CRQ queue during driver reset
Nathan Fontenot [Fri, 26 May 2017 14:31:00 +0000 (10:31 -0400)]
ibmvnic: Reset the CRQ queue during driver reset

When a driver reset operation occurs there is not a need to release
the CRQ resources and re-allocate them. Instead a reset of the CRQ
will suffice.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Check adapter state during ibmvnic_poll
Nathan Fontenot [Fri, 26 May 2017 14:30:54 +0000 (10:30 -0400)]
ibmvnic: Check adapter state during ibmvnic_poll

We do not want to process any receive frames if the ibmvnic_poll
routine is invoked while a reset is in process. Also, before
replenishing the rx pools in the ibmvnic_poll, we want to
make sure the adapter is not in the process of closing.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Deactivate RX pool buffer replenishment on H_CLOSED
Thomas Falcon [Fri, 26 May 2017 14:30:48 +0000 (10:30 -0400)]
ibmvnic: Deactivate RX pool buffer replenishment on H_CLOSED

If H_CLOSED is returned, halt RX buffer replenishment activity
until firmware sends a notification that the driver can reset.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Halt TX and report carrier off on H_CLOSED return code
Thomas Falcon [Fri, 26 May 2017 14:30:42 +0000 (10:30 -0400)]
ibmvnic: Halt TX and report carrier off on H_CLOSED return code

This patch disables transmissions and reports carrier off if xmit
function returns that the hardware TX queue is closed. The driver can
then await a signal from firmware to determine the correct reset method.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Non-fatal error handling
John Allen [Fri, 26 May 2017 14:30:37 +0000 (10:30 -0400)]
ibmvnic: Non-fatal error handling

Handle non-fatal error conditions. The process to do this when
resetting the driver is to just do __ibmvnic_close followed by
__ibmvnic_open.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Fix cleanup of SKB's on driver close
Thomas Falcon [Fri, 26 May 2017 14:30:31 +0000 (10:30 -0400)]
ibmvnic: Fix cleanup of SKB's on driver close

A race condition occurs when closing the driver. Free'ing of skb's
can race between the close routine and ibmvnic_tx_interrupt. To fix
this we move the claenup of tx pools during close to after the
sub-CRQ interrupts are disabled.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Send gratuitous arp on reset
John Allen [Fri, 26 May 2017 14:30:25 +0000 (10:30 -0400)]
ibmvnic: Send gratuitous arp on reset

Send gratuitous arp after any reset.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Handle failover after failed init crq
John Allen [Fri, 26 May 2017 14:30:19 +0000 (10:30 -0400)]
ibmvnic: Handle failover after failed init crq

Handle case where phyp sends a failover after failing to send the
init crq.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Track state of adapter napis
John Allen [Fri, 26 May 2017 14:30:13 +0000 (10:30 -0400)]
ibmvnic: Track state of adapter napis

Track the state of ibmvnic napis. The driver can get into states where it
can be reset when napis are already disabled and attempting to disable them
again will cause the driver to hang.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-Improve-extensibility'
David S. Miller [Fri, 26 May 2017 19:18:50 +0000 (15:18 -0400)]
Merge branch 'mlxsw-Improve-extensibility'

Jiri Pirko says:

====================
mlxsw: Improve extensibility

Ido says:

Since the initial introduction of the bridge offload in commit
56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
the per-port struct was used to store both physical properties of the
port as well as logical bridge properties such as learning and active
VLANs in the VLAN-aware bridge.

The above resulted in a bloated struct and code that is getting
increasingly difficult to extend when stacked devices are taken into
account as well as more advanced use cases such as IGMP snooping.

Due to the incremental development nature of this driver as well as the
complexity of the underlying hardware, subsequent design decisions failed
to generalize the FID and RIF resources, which could've benefited from
a more generic design, resulting in consolidated code paths and better
extensibility with regards to future ASICs and use cases.

This patchset tries to solve both of these design problems, as they're
tightly coupled. To ease the code review, the changes are done in a
bottom-up manner, in which the port struct is the first to be patched,
then the FIDs the ports are mapped to and finally the RIFs configured on
top.

The first half of the patchset gradually moves away from the previous
design to a design that is more in sync with the underlying hardware and
which clearly separates between hardware-specific structs and logical
ones such as a bridge port.

All the bridge-specific information is removed from the port struct, as
well as the list of VLAN devices ("vPorts") configured on top of it.
Instead, a linked list of VLANs is introduced, which allows each VLAN
to hold a state, such as mapping to a particular FID and membership in
a bridge. The data structures are depicted in the following figure:

                                  mlxsw_sp_bridge_device
                                       +----------+
                                       |          |
                                  +----+          |
                                  |    |          |
                                  |    +----------+
                                  |
             mlxsw_sp_bridge_port |
                 +----------+     |
                 |          |     |
              +-->          +-----+--> ..
              |  |          |
              |  +----+-----+
              |       |
              |       v
              | mlxsw_sp_bridge_vlan
              |  +----------+
              |  | vid X    |
              |  |          +--> ..
              |  |          |
              |  +----+-----+
              |       |
              +--+----v-----+
                 | vid X    |
              +--+          +--> ..
              |  |          |
mlxsw_sp_port |  +----------+
+----------+  | mlxsw_sp_port_vlan
|          |  |
|          +--+
|          |
+----------+

This model allows us to consolidate many of the code paths relating to
VLAN-aware and VLAN-unaware bridges, as the latter is simply represented
using a bridge port with a VLAN list size of one. Another advantage of
the model is that it's easy to extend it with future per-VLAN
attributes - such as mrouter indication - by merely pushing these down
from the bridge port struct to the bridge VLAN one.

The second half of the patchset builds on top of previous work and
prepares the driver for the common FID and RIF cores, which are finally
implemented in the last two patches. These exploit the fact that despite
the different kinds of FIDs and RIFs, they do share a common object on
which the core operations can operate on.

By hiding both objects from the rest of the driver and modeling their
operations using a VFT, it'll be easier to extend the driver for future
use cases such as VXLAN.

Tested using following LNST recipes:
https://github.com/jpirko/lnst/tree/master/recipes/switchdev
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Implement common RIF core
Ido Schimmel [Fri, 26 May 2017 06:37:40 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Implement common RIF core

The mlxsw driver currently implements three types of RIFs. VLAN and FID
RIFs for L3 interfaces on top of VLAN-aware and VLAN-unaware bridges
(respectively) and Subport RIFs for all other L3 interfaces.

All the RIF types follow a common configuration procedure, which only
differs in the type-specific bits. The patch exploits this fact and
consolidates the common code paths, thereby simplifying the code and
making it more extensible.

This work also prepares the driver for use with future ASICs, where the
range of the Subport RIFs will be extended and their configuration
modified accordingly. By merely implementing a new RIF operations and
selecting it during initialization, the same driver could be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Implement common FID core
Ido Schimmel [Fri, 26 May 2017 06:37:39 +0000 (08:37 +0200)]
mlxsw: spectrum: Implement common FID core

The device supports three types of FIDs. 802.1Q and 802.1D FIDs for
VLAN-aware and VLAN-unaware bridges (respectively) and rFIDs to
transport packets to the router block.

The different users (e.g., bridge, router, ACLs) of the FIDs
infrastructure need not know about the internal FIDs implementation and
can therefore interact with it using a restricted set of exported
functions.

By encapsulating the entire FID logic and hiding it from the rest of the
driver we get a code base that it much simpler and easier to work with
and extend.

For example, in the current Spectrum ASIC only 802.1D FIDs can be
assigned a VNI, but future ASICs will also support 802.1Q FIDs. With
this patch in place, support for future ASICs can be easily added by
implementing a new FID operations according to their capabilities.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Determine VR first when creating RIF
Ido Schimmel [Fri, 26 May 2017 06:37:38 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Determine VR first when creating RIF

All RIF types are associated with a virtual router (VR), so determine VR
first when creating a RIF.

That way, we can more easily integrate the common RIF core in the
following patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Flood packets to router after RIF creation
Ido Schimmel [Fri, 26 May 2017 06:37:37 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Flood packets to router after RIF creation

If a packet ingress the router but can't be assigned an ingress RIF,
it's dropped.

Therefore, in the case of RIF configured on top of a bridge, it makes
sense to start flooding broadcast packets to the router only after the
RIF was created.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Destroy RIF only based on its struct
Ido Schimmel [Fri, 26 May 2017 06:37:36 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Destroy RIF only based on its struct

Now that all the information to create a RIF is contained within the RIF
struct itself, we can also simplify the destruction logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Configure RIFs based on RIF struct
Ido Schimmel [Fri, 26 May 2017 06:37:35 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Configure RIFs based on RIF struct

All the information necessary for the configuration of RIFs can now be
found in the RIF struct itself, so reduce the arguments list.

This gets us one step closer to the common RIF core.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Extend the RIF struct
Ido Schimmel [Fri, 26 May 2017 06:37:34 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Extend the RIF struct

Currently, when a Subport RIF is configured, the LAG status and VLAN of
the underlying port are read from the port itself. This is problematic,
as we would like to have common code to configure all types of RIFs,
which aren't necessarily bound to a port.

Instead, embed the RIF in a struct specific to the Subport type, which
contains all the necessary information.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Allocate RIF prior to its configuration
Ido Schimmel [Fri, 26 May 2017 06:37:33 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Allocate RIF prior to its configuration

In the following patches the RIF's configuration function is going to
expect a RIF struct with all the necessary information.

Therefore, allocate the RIF just before it's configured to the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Allocate FID prior to RIF configuration
Ido Schimmel [Fri, 26 May 2017 06:37:32 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Allocate FID prior to RIF configuration

The following patches are going to re-arrange the FID and RIF code, so
that when the RIF is configured to the device based on the information
present in the RIF struct (which points to a FID).

For this reason, move the FID allocation to just before the RIF
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Replace vPorts with Port-VLAN
Ido Schimmel [Fri, 26 May 2017 06:37:31 +0000 (08:37 +0200)]
mlxsw: spectrum: Replace vPorts with Port-VLAN

As explained in the cover letter, since the introduction of the bridge
offload in the mlxsw driver, information related to the offloaded bridge
and bridge ports was stored in the individual port struct,
mlxsw_sp_port.

This lead to a bloated struct storing both physical properties of the
port (e.g., autoneg status) as well as logical properties of an upper
bridge port (e.g., learning, mrouter indication). While this might work
well for simple devices, it proved to be hard to extend when stacked
devices were taken into account and more advanced use-cases (e.g., IGMP
snooping) considered.

This patch removes the excess information from the above struct and
instead stores it in more appropriate structs that represent the bridge
port, the bridge itself and a VLAN configured on the bridge port.

The membership of a port in a bridge is denoted using the Port-VLAN
struct, which points to the bridge port and also member in the bridge
VLAN group of the VLAN it represents. This allows us to completely
remove the vPort abstraction and consolidate many of the code paths
relating to VLAN-aware and unaware bridges.

Note that the FID / vFID code is currently duplicated, but this will
soon go away when the common FID core will be introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Don't create FIDs upon creation of VLAN uppers
Ido Schimmel [Fri, 26 May 2017 06:37:30 +0000 (08:37 +0200)]
mlxsw: spectrum: Don't create FIDs upon creation of VLAN uppers

Up until now we used to create FIDs upon the creation of VLAN uppers on
top of the VLAN-aware bridge. This was done so that in case a router
interface (RIF) was configured on top of the bridge, the FID would
already be there.

Instead, simplify the code and only create the FID upon RIF creation.

This is an intermediary step towards the introduction of the common FID
core, in which this code would be completely removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Don't lose bridge port device during enslavement
Ido Schimmel [Fri, 26 May 2017 06:37:29 +0000 (08:37 +0200)]
mlxsw: spectrum: Don't lose bridge port device during enslavement

Currently, when port netdevs (or their uppers) are enslaved to a bridge,
we simply propagate the CHANGEUPPER event all the way down and lose the
context of the actual netdevice used as the bridge port.

This leads to a lot of information hanging off the ports (and vPorts),
which doesn't logically belong there, such as mrouter indication and
unknown unicast flood state.

Following patches are going to put the mlxsw_sp_port struct on diet and
instead introduce a bridge port struct, where the above mentioned
information belongs. But in order to do that, we need to be able to
determine the bridge port netdevice, so propagate it down.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Replace vPorts with Port-VLAN
Ido Schimmel [Fri, 26 May 2017 06:37:28 +0000 (08:37 +0200)]
mlxsw: spectrum_router: Replace vPorts with Port-VLAN

We're going to get rid of vPorts completely later in the patchset, but
the router code is self-contained, so it's a good candidate to start the
transition with.

Convert all the functions that expects to operate on a vPort to operate
on a Port-VLAN instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Change signature of FID leave function
Ido Schimmel [Fri, 26 May 2017 06:37:27 +0000 (08:37 +0200)]
mlxsw: spectrum: Change signature of FID leave function

When a vPort is destroyed, it leaves the FID it's currently mapped to
(if any) and drops the reference. The FID's leave function expects to
get the vPort as its argument, but this will have to change when the
vPort model is retired.

Change the function signature to expect a Port-VLAN struct instead and
patch the call sites accordingly.

The code introduced in this patch will be removed later in the patchset,
but this intermediary step is required in order to ease the code review.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Introduce Port-VLAN structure
Ido Schimmel [Fri, 26 May 2017 06:37:26 +0000 (08:37 +0200)]
mlxsw: spectrum: Introduce Port-VLAN structure

This is the first step in the transition from the vPort model to a
unified Port-VLAN structure. The new structure is defined and created /
destroyed upon invocation of the 8021q ndos, but it's not actually used
throughout the code.

Subsequent patches will initialize it correctly and also create /
destroy it upon switchdev's VLAN object.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Set port's mode according to FID mappings
Ido Schimmel [Fri, 26 May 2017 06:37:25 +0000 (08:37 +0200)]
mlxsw: spectrum: Set port's mode according to FID mappings

We currently transition the port to "Virtual mode" upon the creation of
its first VLAN upper, as we need to classify incoming packets to a FID
using {Port, VID} and not only the VID.

However, it's more appropriate to transition the port to this mode when
the {Port, VID} are actually mapped to a FID. Either during the
enslavement of the VLAN upper to a VLAN-unaware bridge or the
configuration of a router port.

Do this change now in preparation for the introduction of the FID core,
where this operation will be encapsulated.

To prevent regressions, this patch also explicitly configures an OVS
slave to "Virtual mode". Otherwise, a packet that didn't hit an ACL rule
could be classified to an existing FID based on a global VID-to-FID
mapping, thus not incurring a FID mis-classification, which would
otherwise trap the packet to the CPU to be processed by the OVS daemon.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobridge: Export multicast enabled state
Ido Schimmel [Fri, 26 May 2017 06:37:24 +0000 (08:37 +0200)]
bridge: Export multicast enabled state

During enslavement to a bridge, after the CHANGEUPPER is sent, the
multicast enabled state of the bridge isn't propagated down to the
offloading driver unless it's changed.

This patch allows such drivers to query the multicast enabled state from
the bridge, so that they'll be able to correctly configure their flood
tables during port enslavement.

In case multicast is disabled, unregistered multicast packets can be
treated as broadcast and be flooded through all the bridge ports.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobridge: Export VLAN filtering state
Ido Schimmel [Fri, 26 May 2017 06:37:23 +0000 (08:37 +0200)]
bridge: Export VLAN filtering state

It's useful for drivers supporting bridge offload to be able to query
the bridge's VLAN filtering state.

Currently, upon enslavement to a bridge master, the offloading driver
will only learn about the bridge's VLAN filtering state after the bridge
device was already linked with its slave.

Being able to query the bridge's VLAN filtering state allows such
drivers to forbid enslavement in case resource couldn't be allocated for
a VLAN-aware bridge and also choose the correct initialization routine
for the enslaved port, which is dependent on the bridge type.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 26 May 2017 19:13:08 +0000 (12:13 -0700)]
Merge tag 'xfs-4.12-fixes-2' of git://git./fs/xfs/xfs-linux

Pull XFS fixes from Darrick Wong:
 "A few miscellaneous bug fixes & cleanups:

   - Fix indlen block reservation accounting bug when splitting delalloc
     extent

   - Fix warnings about unused variables that appeared in -rc1.

   - Don't spew errors when bmapping a local format directory

   - Fix an off-by-one error in a delalloc eof assertion

   - Make fsmap only return inode information for CAP_SYS_ADMIN

   - Fix a potential mount time deadlock recovering cow extents

   - Fix unaligned memory access in _btree_visit_blocks

   - Fix various SEEK_HOLE/SEEK_DATA bugs"

* tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
  xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
  xfs: Fix missed holes in SEEK_HOLE implementation
  xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
  xfs: fix unaligned access in xfs_btree_visit_blocks
  xfs: avoid mount-time deadlock in CoW extent recovery
  xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN
  xfs: bad assertion for delalloc an extent that start at i_size
  xfs: fix warnings about unused stack variables
  xfs: BMAPX shouldn't barf on inline-format directories
  xfs: fix indlen accounting error on partial delalloc conversion

7 years agoMerge branch 'mv88e6xxx-SERDES'
David S. Miller [Fri, 26 May 2017 19:00:46 +0000 (15:00 -0400)]
Merge branch 'mv88e6xxx-SERDES'

Andrew Lunn says:

====================
net: dsa: mv88e6xxx: Add basic SERDES support

Some of the Marvell switches are SERDES interface, which must be
powered up before packets can be passed. This is particularly true on
the 6390, where the SERDES defaults to down, probably to save power.

This series refactors the existing SERDES support for the 6352, and
adds 6390 support.

v2:

Split phy functions out into phy.[ch]
Don't add MV88E6XXX_FLAG_G1_ATU_FID back again
Move the serdes op up in mv88e6xxx_ops
Move some #defines into serdes.h
Add a mv88e6xxx_serdes_power()
Don't keep moving calls to this helper around in the code

v3:

Move more phy functions into phy.[ch]
Make mv88e6xxx_phy_page_get() and mv88e6xxx_phy_page_put static
Use the mv88e6xxx_serdes_power() helper everywhere
dev_err(...) when mv88e6xxx_serdes_power() fails
Add reviewed-by's
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodsa: mv88e6xxx: Enable/Disable SERDES on port enable/disable
Andrew Lunn [Thu, 25 May 2017 23:03:24 +0000 (01:03 +0200)]
dsa: mv88e6xxx: Enable/Disable SERDES on port enable/disable

Implement the port enable/disable callbacks, which enable/disable the
SERDES interfaces, if applicable. This should save a bit of
power/heat.

We also need to enable SERDES on CPU and DSA ports, so keep the
existing call to the op, but make it conditional.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: mv88e6390X SERDES support
Andrew Lunn [Thu, 25 May 2017 23:03:23 +0000 (01:03 +0200)]
net: dsa: mv88e6xxx: mv88e6390X SERDES support

The mv88e6390X family has 8 SERDES lanes. These can be used for 2
10Gbps ports, ports 9 or 10. If these ports are used at slower speeds,
the SERDES lanes become available for other ports for 1000Base-X.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: Remove SERDES flag
Andrew Lunn [Thu, 25 May 2017 23:03:22 +0000 (01:03 +0200)]
net: dsa: mv88e6xxx: Remove SERDES flag

Now that we use an op for SERDES operations, we don't need a flag for
it. Remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: Refactor mv88e6352 SERDES code into an op
Andrew Lunn [Thu, 25 May 2017 23:03:21 +0000 (01:03 +0200)]
net: dsa: mv88e6xxx: Refactor mv88e6352 SERDES code into an op

The mv88e6390 family has a different SERDES implementation. Refactor
the mv88e6352 code into an ops function, so we can later add the
mv88e6390 code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: Move phy functions into phy.[ch]
Andrew Lunn [Thu, 25 May 2017 23:03:20 +0000 (01:03 +0200)]
net: dsa: mv88e6xxx: Move phy functions into phy.[ch]

The upcoming SERDES support will need to make use of PHY functions. Move
them out into a file of there own. No code changes.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv4: add reference counting to metrics
Eric Dumazet [Thu, 25 May 2017 21:27:35 +0000 (14:27 -0700)]
ipv4: add reference counting to metrics

Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ax88796: support generating a random mac address
Uwe Kleine-König [Thu, 25 May 2017 20:55:11 +0000 (22:55 +0200)]
net: ethernet: ax88796: support generating a random mac address

Instead of falling back to 00:00:00:00:00:00 generate a random address
if none is provided via platform data or from the device's register
space.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ax88796: don't call free_irq without request_irq first
Uwe Kleine-König [Thu, 25 May 2017 20:54:53 +0000 (22:54 +0200)]
net: ethernet: ax88796: don't call free_irq without request_irq first

The function ax_init_dev (which is called only from the driver's .probe
function) calls free_irq in the error path without having requested the
irq in the first place. So drop the free_irq call in the error path.

Fixes: 825a2ff1896e ("AX88796 network driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
Peter Dawson [Thu, 25 May 2017 20:35:18 +0000 (06:35 +1000)]
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets

This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
 is non-0") caused the regression by masking out the flowlabel
 which exposed the incorrect handling of the DSCP portion of the
 flowlabel in ip6_tunnel and ip6_gre.

Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'marvell-phy-cleanups'
David S. Miller [Fri, 26 May 2017 18:44:51 +0000 (14:44 -0400)]
Merge branch 'marvell-phy-cleanups'

Andrew Lunn says:

====================
More marvell phy cleanups

This patchset continues the cleanup of the Marvell PHY driver.  These
phys use pages to allow more than the 32 registers that fit into the
MDIO address space. Cleanup the code used for changing pages.

v2
Reverse christmas tree
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: Uniform page names
Andrew Lunn [Thu, 25 May 2017 19:42:08 +0000 (21:42 +0200)]
net: phy: marvell: Uniform page names

Bring all the page names together, remove the repeats, and make them
uniform.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: helper to get and set page
Andrew Lunn [Thu, 25 May 2017 19:42:07 +0000 (21:42 +0200)]
net: phy: marvell: helper to get and set page

There is a common pattern of first reading the currently selected page
and then changing to another page. Add a helper to do this.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: More hidden page changes refactored
Andrew Lunn [Thu, 25 May 2017 19:42:06 +0000 (21:42 +0200)]
net: phy: marvell: More hidden page changes refactored

EXT_ADDR_PAGE is the same meaning as MII_MARVELL_PHY_PAGE, i.e. change
page. Replace it will calls to the helpers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: #defines for copper and fibre pages
Andrew Lunn [Thu, 25 May 2017 19:42:05 +0000 (21:42 +0200)]
net: phy: marvell: #defines for copper and fibre pages

Replace magic numbers for PHY pages with symbolic names.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: fix inaccurate count of napi-processed rx packets reported to Octeon
Prasad Kanneganti [Thu, 25 May 2017 17:54:29 +0000 (10:54 -0700)]
liquidio: fix inaccurate count of napi-processed rx packets reported to Octeon

lio_enable_irq (called by napi poll) is reporting to Octeon an inaccurate
count of processed rx packets causing Octeon to eventually stop forwarding
packets to the host.  Fix it by using this formula for an accurate count:

    processed rx packets = droq->pkt_count - droq->pkts_pending

Also increase SOFT_COMMAND_BUFFER_SIZE to match what the firmware expects.

Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: fix rare pci_driver.probe failure of VF driver
Prasad Kanneganti [Thu, 25 May 2017 17:42:14 +0000 (10:42 -0700)]
liquidio: fix rare pci_driver.probe failure of VF driver

There's a rare pci_driver.probe failure of the VF driver that's caused by
PF/VF handshake going out of sync.  The culprit is octeon_mbox_write() who
ignores an ack timeout condition; it just keeps unconditionally writing all
elements of mbox_cmd->data[] even when the other side is not ready for
them.  Fix it by making each write of mbox_cmd->data[i] conditional to
having previously received an ack.

Also fix the octeon_mbox_state enum such that each state gets a unique
value.  Also add ULL suffix to numeric literals in macro definitions.

Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: fix ICMP processing if skb is non-linear
Davide Caratti [Thu, 25 May 2017 17:14:56 +0000 (19:14 +0200)]
sctp: fix ICMP processing if skb is non-linear

sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'phy-sysfs-reciprocal-links'
David S. Miller [Fri, 26 May 2017 18:37:42 +0000 (14:37 -0400)]
Merge branch 'phy-sysfs-reciprocal-links'

Florian Fainelli says:

====================
net: phy: Create sysfs reciprocal links for attached_dev/phydev

This patch series addresses a device topology shortcoming where a program
scanning /sys would not be able to establish a mapping between the network
device and the PHY device.

In the process it turned out that no PHY device documentation existed for
sysfs attributes.

Changes in v2:

- document possible phy_interface values in sysfs-class-net-phydev
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sysfs: Document PHY device sysfs attributes
Florian Fainelli [Thu, 25 May 2017 16:21:43 +0000 (09:21 -0700)]
net: sysfs: Document PHY device sysfs attributes

Document the different sysfs attributes that exist for PHY devices:
attached_dev, phy_has_fixups, phy_id and phy_interface.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sysfs: Document "phydev" symbolic link
Florian Fainelli [Thu, 25 May 2017 16:21:42 +0000 (09:21 -0700)]
net: sysfs: Document "phydev" symbolic link

Now that we link the network device to its PHY device, document this
sysfs symbolic link.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: Create sysfs reciprocal links for attached_dev/phydev
Florian Fainelli [Thu, 25 May 2017 16:21:41 +0000 (09:21 -0700)]
net: phy: Create sysfs reciprocal links for attached_dev/phydev

There is currently no way for a program scanning /sys to know whether a
network device is attached to a particular PHY device, just like the PHY
device is not pointed back to its attached network device.

Create a symbolic link in the network device's namespace named "phydev"
which points to the PHY device and create a symbolic link in the PHY
device's namespace named "attached_dev" that points back to the network
device. These links are set up during phy_attach_direct() and removed
during phy_detach() for symetry.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>