platform/kernel/linux-rpi.git
3 years agoMerge branch 'net-ipa-ipa-register-cleanup'
Jakub Kicinski [Wed, 18 Nov 2020 23:53:51 +0000 (15:53 -0800)]
Merge branch 'net-ipa-ipa-register-cleanup'

Alex Elder says:

====================
net: ipa: IPA register cleanup

This series consists of cleanup patches, almost entirely related to
the definitions for IPA registers.  Some comments are updated or
added to provide better information about defined IPA registers.
Other cleanups ensure symbol names and their assigned values are
defined consistently.  Some essentially duplicate definitions get
consolidated for simplicity.  In a few cases some minor bugs
(missing definitions) are fixed.  With these changes, all IPA
register offsets and associated field masks should be correct for
IPA versions 3.5.1, 4.0, 4.1, and 4.2.
====================

Link: https://lore.kernel.org/r/20201116233805.13775-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: a few last IPA register cleanups
Alex Elder [Mon, 16 Nov 2020 23:38:05 +0000 (17:38 -0600)]
net: ipa: a few last IPA register cleanups

Some last cleanups for the existing IPA register definitions:
  - Remove the definition of IPA_REG_ENABLED_PIPES_OFFSET, because
    it is not used.
  - Use "IPA_" instead of "BAM_" as the prefix on fields associated
    with the FLAVOR_0 register.  We use GSI (not BAM), but the
    fields apply to both GSI and BAM.
  - Get rid of the definition of IPA_CS_RSVD; it is never used.
  - Add two missing field mask definitions for the INIT_DEAGGR
    endpoint register.
  - Eliminate a few of the defined sequencer types, because they
    are unused.  We can add them back when needed.
  - Add a field mask to indicate which bit causes an interrupt on
    the microcontroller.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: move definition of enum ipa_irq_id
Alex Elder [Mon, 16 Nov 2020 23:38:04 +0000 (17:38 -0600)]
net: ipa: move definition of enum ipa_irq_id

Move the definition of the ipa_irq_id enumerated type out of
"ipa_interrupt.h" and into "ipa_reg.h", and flesh out its set of
defined values.  Each interrupt id indicates a particular type of
IPA interrupt that can be signaled.  Their numeric values define bit
positions in the IPA_IRQ_* registers, so should their definitions
should accompany the definition of those register offsets.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: rearrange a few IPA register definitions
Alex Elder [Mon, 16 Nov 2020 23:38:03 +0000 (17:38 -0600)]
net: ipa: rearrange a few IPA register definitions

Move a few things around in "ipa_reg.h":
  - Move the definition of ipa_reg_state_aggr_active_offset() down
    a bit in the file so definitions are ordered by offset (for
    the lowest supported IPA version) like all other definitions.
  - Move the definition TIMER_FREQUENCY to be immediately above
    the definition of ipa_aggr_granularity_val() where it's used.
  - Move each register field value enumerated type definition to
    immediately follow the definitions of the register and field
    it is associated with.
No code functionality is modified by this patch.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: fix up IPA register comments
Alex Elder [Mon, 16 Nov 2020 23:38:02 +0000 (17:38 -0600)]
net: ipa: fix up IPA register comments

Revise or add comments in "ipa_reg.h" for to provide more
information, and to improve clarity and consistency.
  - Always provide a comment to define when a register or field is
    supported (or not) for certain versions of IPA hardware.
  - Try to be specific about *which* or *how many* definitions
    a comment refers to.
  - Move comments stating that ipa->available defines the valid
    bits in various registers *above* the register offset
    definition, to avoid some checkpatch.pl warnings.
No code is changed by this patch.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: define enumerated types consistently
Alex Elder [Mon, 16 Nov 2020 23:38:01 +0000 (17:38 -0600)]
net: ipa: define enumerated types consistently

Consistently define numeric values for enumerated type members using
hexidecimal (rather than decimal) format values.  Align the values
assigned in the same column in each file.

Only assign values where they really matter, for example don't
assign IPA_ENDPOINT_AP_MODEM_TX the value 0.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: fix BCR register field definitions
Alex Elder [Mon, 16 Nov 2020 23:38:00 +0000 (17:38 -0600)]
net: ipa: fix BCR register field definitions

The backward compatibility register field masks are defined using
single-bit masks defined with BIT(x) rather than GENMASK(x, x).
Change this one set of definitions to follow the GENMASK() pattern
used everywhere else.  Add a few missing field definitions for this
register as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: use _FMASK consistently
Alex Elder [Mon, 16 Nov 2020 23:37:59 +0000 (17:37 -0600)]
net: ipa: use _FMASK consistently

Several IPA register field masks are defined without the "_FMASK"
suffix naming convention.  Rename these, so all field masks are
consistently named.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: fix two inconsistent IPA register names
Alex Elder [Mon, 16 Nov 2020 23:37:58 +0000 (17:37 -0600)]
net: ipa: fix two inconsistent IPA register names

Rename two suspend IRQ registers so they follow the IPA_REG_IRQ_xxx
naming convention used elsewhere.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: support more versions for HOLB timer
Alex Elder [Mon, 16 Nov 2020 23:37:57 +0000 (17:37 -0600)]
net: ipa: support more versions for HOLB timer

IPA version 3.5.1 represents the timer used in avoiding head-of-line
blocking with a simple tick count.  IPA v4.2 changes that, instead
splitting the timer field into two parts (base and scale) to
represent the ticks in the timer period.

IPA v4.0 and IPA v4.1 use the same method as IPA v3.5.1.  Change the
test in ipa_reg_init_hol_block_timer_val() so the result is correct
for those versions as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: make filter/routing hash enable register variable
Alex Elder [Mon, 16 Nov 2020 23:37:56 +0000 (17:37 -0600)]
net: ipa: make filter/routing hash enable register variable

For IPA v3.5.1, the IPA filter/routing hash enable register actually
does exist, but it is at offset 0x8c into the IPA register space.
For newer versions of IPA it is at offset 0x148.

Define a new inline function ipa_reg_filt_rout_hash_en_offset() to
return the appropriate value for a given version of IPA hardware.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ipa: share field mask values for IPA hash registers
Alex Elder [Mon, 16 Nov 2020 23:37:55 +0000 (17:37 -0600)]
net: ipa: share field mask values for IPA hash registers

The IPA filter/routing hash enable register and filter/routing hash
flush register each have four single-bit fields representing the
four hashed tables to be enabled or flushed.  The field positions
are identical, so just use a single set of field masks to represent
the fields for both registers.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoDocumentation: Remove the deleted "framerelay" document from the index
Xie He [Wed, 18 Nov 2020 12:42:26 +0000 (04:42 -0800)]
Documentation: Remove the deleted "framerelay" document from the index

commit f73659192b0b ("net: wan: Delete the DLCI / SDLA drivers")
deleted "Documentation/networking/framerelay.rst". However, it is still
referenced in "Documentation/networking/index.rst". We need to remove the
reference, too.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201118124226.15588-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mlxsw-preparations-for-nexthop-objects-support-part-2-2'
Jakub Kicinski [Wed, 18 Nov 2020 19:51:21 +0000 (11:51 -0800)]
Merge branch 'mlxsw-preparations-for-nexthop-objects-support-part-2-2'

Ido Schimmel says:

====================
mlxsw: Preparations for nexthop objects support - part 2/2

This patch set contains the second round of preparations towards nexthop
objects support in mlxsw. Follow up patches can be found here [1].

The patches are mostly small and trivial and contain non-functional
changes aimed at making it easier to integrate nexthop objects with
mlxsw.

Patch #1 is a fix for an issue introduced in previous submission. Found
by Coverity.

[1] https://github.com/idosch/linux/tree/submit/nexthop_objects
====================

Link: https://lore.kernel.org/r/20201117174704.291990-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh()
Ido Schimmel [Tue, 17 Nov 2020 17:47:04 +0000 (19:47 +0200)]
mlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh()

The function is responsible for allocating the adjacency entries used by
the nexthop group and populating them with the adjacency information
such as egress RIF and MAC address.

Allow the function to return an error when it encounters a problem and
have the relevant call sites check it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Add an indication if a nexthop group can be destroyed
Ido Schimmel [Tue, 17 Nov 2020 17:47:03 +0000 (19:47 +0200)]
mlxsw: spectrum_router: Add an indication if a nexthop group can be destroyed

Currently, a nexthop group is destroyed when the last FIB entry is
detached from it.

When nexthop objects are supported, this can no longer be the case, as
the group is a separate object whose lifetime is managed by user space.

Add an indication if a nexthop group can be destroyed and always set it
to true for the existing IPv4 and IPv6 nexthop groups.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Only clear offload indication from valid IPv6 FIB info
Ido Schimmel [Tue, 17 Nov 2020 17:47:02 +0000 (19:47 +0200)]
mlxsw: spectrum_router: Only clear offload indication from valid IPv6 FIB info

When the IPv6 FIB info has a nexthop object, the nexthop offload
indication is set on the nexthop object and not on the FIB info itself.

Therefore, do not try to clear the offload indication from the FIB info
when it has a nexthop object.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Re-order mlxsw_sp_nexthop6_group_get()
Ido Schimmel [Tue, 17 Nov 2020 17:47:01 +0000 (19:47 +0200)]
mlxsw: spectrum_router: Re-order mlxsw_sp_nexthop6_group_get()

Attach the FIB entry to the nexthop group after setting the offload flag
on the IPv6 FIB info (i.e., 'struct fib6_info'). The second operation is
not needed when the nexthop group is a nexthop object. This will allow
us to have a common exit path from the function, regardless of the
nexthop group's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Set FIB entry's type based on nexthop group
Ido Schimmel [Tue, 17 Nov 2020 17:47:00 +0000 (19:47 +0200)]
mlxsw: spectrum_router: Set FIB entry's type based on nexthop group

The previous patch associated a nexthop group with the FIB entry before
the entry's type is determined.

Make use of the nexthop group when determining the entry's type instead
of relying on helpers that assume that the nexthop info is not a nexthop
object (i.e., 'struct nexthop').

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Set FIB entry's type after creating nexthop group
Ido Schimmel [Tue, 17 Nov 2020 17:46:59 +0000 (19:46 +0200)]
mlxsw: spectrum_router: Set FIB entry's type after creating nexthop group

Each FIB entry has a type (e.g., remote, local) that determines how the
entry is programmed to the device. In order to determine if the entry is
local (directly connected) or remote (has a gateway) the relevant FIB
info structures (e.g., 'struct fib_info') are checked.

When entries that use nexthop objects are supported, these checks will
need to be changed to take into account 'struct nexthop'.

Instead, first associate the entry with a nexthop group so that the next
patch could determine the entry's type based on the associated nexthop
group's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Pass ifindex to mlxsw_sp_ipip_entry_find_by_decap()
Ido Schimmel [Tue, 17 Nov 2020 17:46:58 +0000 (19:46 +0200)]
mlxsw: spectrum_router: Pass ifindex to mlxsw_sp_ipip_entry_find_by_decap()

The sole caller of the function will soon only have the ifindex
available, instead of the pointer itself.

Therefore, change the function to take the ifindex as input and have it
get the pointer.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Set ifindex for IPv4 nexthops
Ido Schimmel [Tue, 17 Nov 2020 17:46:57 +0000 (19:46 +0200)]
mlxsw: spectrum_router: Set ifindex for IPv4 nexthops

The ifindex of the nexthop device was never set for IPv4 nexthops,
unlike IPv6 nexthops. This went unnoticed since only IPv6 nexthops use
it.

Set the ifindex for IPv4 nexthops in order to be consistent with IPv6
and also because it will be used by a later patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Fix wrong kfree() in error path
Ido Schimmel [Tue, 17 Nov 2020 17:46:56 +0000 (19:46 +0200)]
mlxsw: spectrum_router: Fix wrong kfree() in error path

The function allocates 'nhgi', not 'nh_grp', so it needs to free the
former in its error path.

Fixes: 7f7a417e6a11 ("mlxsw: spectrum_router: Split nexthop group configuration to a different struct")
Addresses-Coverity: ("Memory - corruptions  (USE_AFTER_FREE)")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoptp: document struct ptp_clock_request members
Ahmad Fatoum [Tue, 17 Nov 2020 21:38:26 +0000 (22:38 +0100)]
ptp: document struct ptp_clock_request members

It's arguable most people interested in configuring a PPS signal
want it as external output, not as kernel input. PTP_CLK_REQ_PPS
is for input though. Add documentation to nudge readers into
the correct direction.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201117213826.18235-1-a.fatoum@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonfp: tls: Fix unreachable code issue
Gustavo A. R. Silva [Tue, 17 Nov 2020 17:13:47 +0000 (11:13 -0600)]
nfp: tls: Fix unreachable code issue

Fix the following unreachable code issue:

   drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_add':
   include/linux/compiler_attributes.h:208:41: warning: statement will never be executed [-Wswitch-unreachable]
     208 | # define fallthrough                    __attribute__((__fallthrough__))
         |                                         ^~~~~~~~~~~~~
   drivers/net/ethernet/netronome/nfp/crypto/tls.c:299:3: note: in expansion of macro 'fallthrough'
     299 |   fallthrough;
         |   ^~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Link: https://lore.kernel.org/r/20201117171347.GA27231@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'fix-several-bad-kernel-doc-markups'
Jakub Kicinski [Tue, 17 Nov 2020 22:15:05 +0000 (14:15 -0800)]
Merge branch 'fix-several-bad-kernel-doc-markups'

Mauro Carvalho Chehab says:

====================
Fix several bad kernel-doc markups

Kernel-doc has always be limited to a probably bad documented
rule:

The kernel-doc markups should appear *imediatelly before* the
function or data structure that it documents.

On other words, if a C file would contain something like this:

/**
 * foo - function foo
 * @args: foo args
 */
static inline void bar(int args);

/**
 * bar - function bar
 * @args: foo args
 */
static inline void foo(void *args);

The output (in ReST format) will be:

.. c:function:: void bar (int args)

   function foo

**Parameters**

``int args``
  foo args

.. c:function:: void foo (void *args)

   function bar

**Parameters**

``void *args``
  foo args

Which is clearly a wrong result.  Before this changeset,
not even a warning is produced on such cases.

As placing such markups just before the documented
data is a common practice, on most cases this is fine.

However, as patches touch things, identifiers may be
renamed, and people may forget to update the kernel-doc
markups to follow such changes.

This has been happening for quite a while, as there are
lots of files with kernel-doc problems.

This series address those issues and add a file at the
end that will enforce that the identifier will match the
kernel-doc markup, avoiding this problem from
keep happening as time goes by.
====================

Link: https://lore.kernel.org/r/cover.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: core: fix some kernel-doc markups
Mauro Carvalho Chehab [Mon, 16 Nov 2020 10:17:59 +0000 (11:17 +0100)]
net: core: fix some kernel-doc markups

Some identifiers have different names between their prototypes
and the kernel-doc markup.

In the specific case of netif_subqueue_stopped(), keep the
current markup for __netif_subqueue_stopped(), adding a
new one for netif_subqueue_stopped().

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: datagram: fix some kernel-doc markups
Mauro Carvalho Chehab [Mon, 16 Nov 2020 10:17:58 +0000 (11:17 +0100)]
net: datagram: fix some kernel-doc markups

Some identifiers have different names between their prototypes
and the kernel-doc markup.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: fix kernel-doc markups
Mauro Carvalho Chehab [Mon, 16 Nov 2020 10:17:57 +0000 (11:17 +0100)]
net: phy: fix kernel-doc markups

Some functions have different names between their prototypes
and the kernel-doc markup.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'add-ethtool-ntuple-filters-support'
Jakub Kicinski [Tue, 17 Nov 2020 21:48:27 +0000 (13:48 -0800)]
Merge branch 'add-ethtool-ntuple-filters-support'

Naveen Mamindlapalli says:

====================
Add ethtool ntuple filters support

This patch series adds support for ethtool ntuple filters, unicast
address filtering, VLAN offload and SR-IOV ndo handlers. All of the
above features are based on the Admin Function(AF) driver support to
install and delete the low level MCAM entries. Each MCAM entry is
programmed with the packet fields to match and what actions to take
if the match succeeds. The PF driver requests AF driver to allocate
set of MCAM entries to be used to install the flows by that PF. The
entries will be freed when the PF driver is unloaded.

* The patches 1 to 4 adds AF driver infrastructure to install and
  delete the low level MCAM flow entries.
* Patch 5 adds ethtool ntuple filter support.
* Patch 6 adds unicast MAC address filtering.
* Patch 7 adds support for dumping the MCAM entries via debugfs.
* Patches 8 to 10 adds support for VLAN offload.
* Patch 10 to 11 adds support for SR-IOV ndo handlers.
* Patch 12 adds support to read the MCAM entries.

Misc:
* Removed redundant mailbox NIX_RXVLAN_ALLOC.
====================

Link: https://lore.kernel.org/r/20201114195303.25967-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Delete NIX_RXVLAN_ALLOC mailbox message
Subbaraya Sundeep [Sat, 14 Nov 2020 19:53:03 +0000 (01:23 +0530)]
octeontx2-af: Delete NIX_RXVLAN_ALLOC mailbox message

Since mailbox message for installing flows is in place,
remove the RXVLAN_ALLOC mbox message which is redundant.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Add new mbox messages to retrieve MCAM entries
Naveen Mamindlapalli [Sat, 14 Nov 2020 19:53:02 +0000 (01:23 +0530)]
octeontx2-af: Add new mbox messages to retrieve MCAM entries

This patch introduces new mailbox mesages to retrieve a given
MCAM entry or base flow steering rule of a VF installed by its
parent PF. This helps while updating the existing MCAM rules
with out re-framing the whole mailbox request again. The INSTALL
FLOW mailbox consumer can read-modify-write the existing entry.
Similarly while installing new flow rules for a VF, the base
flow steering rule match creteria is copied to the new flow rule
and the deltas are appended to the new rule.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Handle PF-VF mac address changes
Hariprasad Kelam [Sat, 14 Nov 2020 19:53:01 +0000 (01:23 +0530)]
octeontx2-af: Handle PF-VF mac address changes

This patch handles the VF mac address changes as given below.
    1. mac addr configrued by VF will be retained until VF module unload.
    2. mac addr configred by PF for VF will be retained until power cycle.
    3. mac addr confgired by PF for its VF can't be overwritten by VF.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-pf: Add support for SR-IOV management functions
Naveen Mamindlapalli [Sat, 14 Nov 2020 19:53:00 +0000 (01:23 +0530)]
octeontx2-pf: Add support for SR-IOV management functions

This patch adds support for ndo_set_vf_mac, ndo_set_vf_vlan
and ndo_get_vf_config handlers. The traffic redirection
based on the VF mac address or vlan id is done by installing
MCAM rules. Reserved RX_VTAG_TYPE7 in each NIXLF for VF VLAN
which strips the VLAN tag from ingress VLAN traffic. The NIX PF
allocates two MCAM entries for VF VLAN feature, one used for
ingress VTAG strip and another entry for egress VTAG insertion.

This patch also updates the MAC address in PF installed VF VLAN
rule upon receiving nix_lf_start_rx mbox request for VF since
Administrative Function driver will assign a valid MAC addr
in nix_lf_start_rx function.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-pf: Implement ingress/egress VLAN offload
Hariprasad Kelam [Sat, 14 Nov 2020 19:52:59 +0000 (01:22 +0530)]
octeontx2-pf: Implement ingress/egress VLAN offload

This patch implements egress VLAN offload by appending NIX_SEND_EXT_S
header to NIX_SEND_HDR_S. The VLAN TCI information is specified
in the NIX_SEND_EXT_S. The VLAN offload in the ingress path is
implemented by configuring the NIX_RX_VTAG_ACTION_S to strip and
capture the outer vlan fields. The NIX PF allocates one MCAM entry
for Rx VLAN offload.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries
Vamsi Attunuru [Sat, 14 Nov 2020 19:52:58 +0000 (01:22 +0530)]
octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries

This patch modifies the existing nix_vtag_config mailbox message
to allocate and free TX VTAG entries as requested by a NIX PF.
The TX VTAG entries are global resource that shared by all PFs
and each entry specifies the size of VTAG to insert and the VTAG
header data to insert. The mailbox response contains the entry
index which is used by mailbox requester in configuring the
NPC_TX_VTAG_ACTION for any MCAM entry.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Add debugfs entry to dump the MCAM rules
Subbaraya Sundeep [Sat, 14 Nov 2020 19:52:57 +0000 (01:22 +0530)]
octeontx2-af: Add debugfs entry to dump the MCAM rules

Add debugfs support to dump the MCAM rules installed using
NPC_INSTALL_FLOW mbox message. Debugfs file can display mcam
entry, counter if any, flow type and counter hits.

Ethtool will dump the ntuple flows related to the PF only.
The debugfs file gives systemwide view of the MCAM rules
installed by all the PF's.

Below is the example output when the debugfs file is read:
~ # mount -t debugfs none /sys/kernel/debug
~ # cat /sys/kernel/debug/octeontx2/npc/mcam_rules

Installed by: PF1
direction: RX
        mcam entry: 227
udp source port 23 mask 0xffff
Forward to: PF1 VF0
        action: Direct to queue 0
enabled: yes
        counter: 1
        hits: 0

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-pf: Add support for unicast MAC address filtering
Hariprasad Kelam [Sat, 14 Nov 2020 19:52:56 +0000 (01:22 +0530)]
octeontx2-pf: Add support for unicast MAC address filtering

Add unicast MAC address filtering support using install flow
message. Total of 8 MCAM entries are allocated for adding
unicast mac filtering rules. If the MCAM allocation fails,
the unicast filtering support will not be advertised.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-pf: Add support for ethtool ntuple filters
Subbaraya Sundeep [Sat, 14 Nov 2020 19:52:55 +0000 (01:22 +0530)]
octeontx2-pf: Add support for ethtool ntuple filters

This patch adds support for adding and deleting ethtool ntuple
filters. The filters for ether, ipv4, ipv6, tcp, udp and sctp
are supported. The mask is also supported. The supported actions
are drop and direct to a queue. Additionally we support FLOW_EXT
field vlan_tci and FLOW_MAC_EXT.

The NIX PF will allocate total 32 MCAM entries for the use of
ethtool ntuple filters. The Administrative Function(AF) will
install/delete the MCAM rules when NIX PF sends mailbox message
to install/delete the ntuple filters.

Ethtool ntuple filters support is restricted to PFs as of now
and PF can install ntuple filters to direct the traffic to its
VFs. Hence added a separate callback for VFs to get/set RSS
configuration.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Add mbox messages to install and delete MCAM rules
Subbaraya Sundeep [Sat, 14 Nov 2020 19:52:54 +0000 (01:22 +0530)]
octeontx2-af: Add mbox messages to install and delete MCAM rules

Added new mailbox messages to install and delete MCAM rules.
These mailbox messages will be used for adding/deleting ethtool
n-tuple filters by NIX PF. The installed MCAM rules are stored
in a list that will be traversed later to delete the MCAM entries
when the interface is brought down or when PCIe FLR is received.
The delete mailbox supports deleting a single MCAM entry or range
of entries or all the MCAM entries owned by the pcifunc. Each MCAM
entry can be associated with a HW match stat entry if the mailbox
requester wants to check the hit count for debugging.

Modified adding default unicast DMAC match rule using install
flow API. The default unicast DMAC match entry installed by
Administrative Function is saved and can be changed later by the
mailbox user to fit additional fields, or the default MCAM entry
rule action can be used for other flow rules installed later.

Modified rvu_mbox_handler_nix_lf_free mailbox to add a flag to
disable or delete the MCAM entries. The MCAM entries are disabled
when the interface is brought down and deleted in FLR handler.
The disabled MCAM entries will be re-enabled when the interface
is brought up again.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Generate key field bit mask from KEX profile
Subbaraya Sundeep [Sat, 14 Nov 2020 19:52:53 +0000 (01:22 +0530)]
octeontx2-af: Generate key field bit mask from KEX profile

Key Extraction(KEX) profile decides how the packet metadata such as
layer information and selected packet data bytes at each layer are
placed in MCAM search key. This patch reads the configured KEX profile
parameters to find out the bit position and bit mask for each field.
The information is used when programming the MCAM match data by SW
to match a packet flow and take appropriate action on the flow. This
patch also verifies the mandatory fields such as channel and DMAC
are not overwritten by the KEX configuration of other fields.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Verify MCAM entry channel and PF_FUNC
Subbaraya Sundeep [Sat, 14 Nov 2020 19:52:52 +0000 (01:22 +0530)]
octeontx2-af: Verify MCAM entry channel and PF_FUNC

This patch adds support to verify the channel number sent by
mailbox requester before writing MCAM entry for Ingress packets.
Similarly for Egress packets, verifying the PF_FUNC sent by the
mailbox user.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoocteontx2-af: Modify default KEX profile to extract TX packet fields
Stanislaw Kardach [Sat, 14 Nov 2020 19:52:51 +0000 (01:22 +0530)]
octeontx2-af: Modify default KEX profile to extract TX packet fields

The current default Key Extraction(KEX) profile can only use RX
packet fields while generating the MCAM search key. The profile
can't be used for matching TX packet fields. This patch modifies
the default KEX profile to add support for extracting TX packet
fields into MCAM search key. Enabled Tx KPU packet parsing by
configuring TX PKIND in tx_parse_cfg.

Modified the KEX profile to extract 2 bytes of VLAN TCI from an
offset of 2 bytes from LB_PTR. The LB_PTR points to the byte offset
where the VLAN header starts. The NPC KPU parser profile has been
modified to point LB_PTR to the starting byte offset of VLAN header
which points to the tpid field.

Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: wan: Delete the DLCI / SDLA drivers
Xie He [Sat, 14 Nov 2020 15:09:21 +0000 (07:09 -0800)]
net: wan: Delete the DLCI / SDLA drivers

The DLCI driver (dlci.c) implements the Frame Relay protocol. However,
we already have another newer and better implementation of Frame Relay
provided by the HDLC_FR driver (hdlc_fr.c).

The DLCI driver's implementation of Frame Relay is used by only one
hardware driver in the kernel - the SDLA driver (sdla.c).

The SDLA driver provides Frame Relay support for the Sangoma S50x devices.
However, the vendor provides their own driver (along with their own
multi-WAN-protocol implementations including Frame Relay), called WANPIPE.
I believe most users of the hardware would use the vendor-provided WANPIPE
driver instead.

(The WANPIPE driver was even once in the kernel, but was deleted in
commit 8db60bcf3021 ("[WAN]: Remove broken and unmaintained Sangoma
drivers.") because the vendor no longer updated the in-kernel WANPIPE
driver.)

Cc: Mike McLagan <mike.mclagan@linux.org>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201114150921.685594-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-hns3-updates-for-next'
Jakub Kicinski [Tue, 17 Nov 2020 19:39:22 +0000 (11:39 -0800)]
Merge branch 'net-hns3-updates-for-next'

Huazhong Tan says:

====================
net: hns3: updates for -next

There are several updates relating to the interrupt coalesce for
the HNS3 ethernet driver.

   based on the frame quantity).
   a fixed value in code.
   based on the gap time).
   its new usage.

change log:
V4 - remove #5~#10 from this series, which needs more discussion.
V3 - fix a typo error in #1 reported by Jakub Kicinski.
     rewrite #9 commit log.
     remove #11 from this series.
V2 - reorder #2 & #3 to fix compiler error.
     fix some checkpatch warnings in #10 & #11.

previous version:
V3: https://patchwork.ozlabs.org/project/netdev/cover/1605151998-12633-1-git-send-email-tanhuazhong@huawei.com/
V2: https://patchwork.ozlabs.org/project/netdev/cover/1604892159-19990-1-git-send-email-tanhuazhong@huawei.com/
V1: https://patchwork.ozlabs.org/project/netdev/cover/1604730681-32559-1-git-send-email-tanhuazhong@huawei.com/
====================

Link: https://lore.kernel.org/r/1605514854-11205-1-git-send-email-tanhuazhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: hns3: rename gl_adapt_enable in struct hns3_enet_coalesce
Huazhong Tan [Mon, 16 Nov 2020 08:20:54 +0000 (16:20 +0800)]
net: hns3: rename gl_adapt_enable in struct hns3_enet_coalesce

Besides GL(Gap Limiting), QL(Quantity Limiting) can be modified
dynamically when DIM is supported. So rename gl_adapt_enable as
adapt_enable in struct hns3_enet_coalesce.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: hns3: add support for 1us unit GL configuration
Huazhong Tan [Mon, 16 Nov 2020 08:20:53 +0000 (16:20 +0800)]
net: hns3: add support for 1us unit GL configuration

For device whose version is above V3(include V3), the GL
configuration can set as 1us unit, so adds support for
configuring this field.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: hns3: add support for querying maximum value of GL
Huazhong Tan [Mon, 16 Nov 2020 08:20:52 +0000 (16:20 +0800)]
net: hns3: add support for querying maximum value of GL

For maintainability and compatibility, add support for querying
the maximum value of GL.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: hns3: add support for configuring interrupt quantity limiting
Huazhong Tan [Mon, 16 Nov 2020 08:20:51 +0000 (16:20 +0800)]
net: hns3: add support for configuring interrupt quantity limiting

QL(quantity limiting) means that hardware supports the interrupt
coalesce based on the frame quantity.  QL can be configured when
int_ql_max in device's specification is non-zero, so add support
to configure it. Also, rename two coalesce init function to fit
their purpose.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-phy-add-support-for-shared-interrupts-part-2'
Jakub Kicinski [Tue, 17 Nov 2020 19:37:13 +0000 (11:37 -0800)]
Merge branch 'net-phy-add-support-for-shared-interrupts-part-2'

Ioana Ciornei says:

====================
net: phy: add support for shared interrupts (part 2)

This patch set aims to actually add support for shared interrupts in
phylib and not only for multi-PHY devices. While we are at it,
streamline the interrupt handling in phylib.

For a bit of context, at the moment, there are multiple phy_driver ops
that deal with this subject:

- .config_intr() - Enable/disable the interrupt line.

- .ack_interrupt() - Should quiesce any interrupts that may have been
  fired.  It's also used by phylib in conjunction with .config_intr() to
  clear any pending interrupts after the line was disabled, and before
  it is going to be enabled.

- .did_interrupt() - Intended for multi-PHY devices with a shared IRQ
  line and used by phylib to discern which PHY from the package was the
  one that actually fired the interrupt.

- .handle_interrupt() - Completely overrides the default interrupt
  handling logic from phylib. The PHY driver is responsible for checking
  if any interrupt was fired by the respective PHY and choose
  accordingly if it's the one that should trigger the link state machine.

From my point of view, the interrupt handling in phylib has become
somewhat confusing with all these callbacks that actually read the same
PHY register - the interrupt status.  A more streamlined approach would
be to just move the responsibility to write an interrupt handler to the
driver (as any other device driver does) and make .handle_interrupt()
the only way to deal with interrupts.

Another advantage with this approach would be that phylib would gain
support for shared IRQs between different PHY (not just multi-PHY
devices), something which at the moment would require extending every
PHY driver anyway in order to implement their .did_interrupt() callback
and duplicate the same logic as in .ack_interrupt(). The disadvantage
of making .did_interrupt() mandatory would be that we are slightly
changing the semantics of the phylib API and that would increase
confusion instead of reducing it.

What I am proposing is the following:

- As a first step, make the .ack_interrupt() callback optional so that
  we do not break any PHY driver amid the transition.

- Every PHY driver gains a .handle_interrupt() implementation that, for
  the most part, would look like below:

irq_status = phy_read(phydev, INTR_STATUS);
if (irq_status < 0) {
phy_error(phydev);
return IRQ_NONE;
}

if (!(irq_status & irq_mask))
return IRQ_NONE;

phy_trigger_machine(phydev);

return IRQ_HANDLED;

- Remove each PHY driver's implementation of the .ack_interrupt() by
  actually taking care of quiescing any pending interrupts before
  enabling/after disabling the interrupt line.

- Finally, after all drivers have been ported, remove the
  .ack_interrupt() and .did_interrupt() callbacks from phy_driver.

This patch set is part 2 of the entire change set and it addresses the
changes needed in 9 PHY drivers. The rest can be found on my Github
branch here:
https://github.com/IoanaCiornei/linux/commits/phylib-shared-irq
====================

Link: https://lore.kernel.org/r/20201113165226.561153-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: adin: remove the use of the .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:26 +0000 (18:52 +0200)]
net: phy: adin: remove the use of the .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: adin: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:25 +0000 (18:52 +0200)]
net: phy: adin: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: ste10Xp: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:24 +0000 (18:52 +0200)]
net: phy: ste10Xp: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: ste10Xp: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:23 +0000 (18:52 +0200)]
net: phy: ste10Xp: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: smsc: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:22 +0000 (18:52 +0200)]
net: phy: smsc: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Andre Edich <andre.edich@microchip.com>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: smsc: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:21 +0000 (18:52 +0200)]
net: phy: smsc: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Andre Edich <andre.edich@microchip.com>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: amd: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:20 +0000 (18:52 +0200)]
net: phy: amd: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: amd: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:19 +0000 (18:52 +0200)]
net: phy: amd: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: nxp-tja11xx: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:18 +0000 (18:52 +0200)]
net: phy: nxp-tja11xx: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Marek Vasut <marex@denx.de>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: nxp-tja11xx: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:17 +0000 (18:52 +0200)]
net: phy: nxp-tja11xx: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Marek Vasut <marex@denx.de>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: lxt: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:16 +0000 (18:52 +0200)]
net: phy: lxt: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: lxt: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:15 +0000 (18:52 +0200)]
net: phy: lxt: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: marvell: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:14 +0000 (18:52 +0200)]
net: phy: marvell: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: marvell: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:13 +0000 (18:52 +0200)]
net: phy: marvell: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: microchip: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:12 +0000 (18:52 +0200)]
net: phy: microchip: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: microchip: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:11 +0000 (18:52 +0200)]
net: phy: microchip: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: vitesse: remove the use of .ack_interrupt()
Ioana Ciornei [Fri, 13 Nov 2020 16:52:10 +0000 (18:52 +0200)]
net: phy: vitesse: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: vitesse: implement generic .handle_interrupt() callback
Ioana Ciornei [Fri, 13 Nov 2020 16:52:09 +0000 (18:52 +0200)]
net: phy: vitesse: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling
Randy Dunlap [Mon, 16 Nov 2020 21:21:08 +0000 (13:21 -0800)]
net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling

The previous Kconfig patch led to some other build errors as
reported by the 0day bot and my own overnight build testing.

These are all in <linux/skbuff.h> when KCOV is enabled but
SKB_EXTENSIONS is not enabled, so fix those by combining those conditions
in the header file.

Fixes: 6370cc3bbd8a ("net: add kcov handle to skb extensions")
Fixes: 85ce50d337d1 ("net: kcov: don't select SKB_EXTENSIONS when there is no NET")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116212108.32465-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agolan743x: replace devicetree phy parse code with library function
Sven Van Asbroeck [Mon, 16 Nov 2020 17:01:55 +0000 (12:01 -0500)]
lan743x: replace devicetree phy parse code with library function

The code in this driver which parses the devicetree to determine
the phy/fixed link setup, can be replaced by a single library
function: of_phy_get_and_connect().

Behaviour is identical, except that the library function will
complain when 'phy-connection-type' is omitted, instead of
blindly using PHY_INTERFACE_MODE_NA, which would result in an
invalid phy configuration.

The library function no longer brings out the exact phy_mode,
but the driver doesn't need this, because phy_interface_is_rgmii()
queries the phydev directly. Remove 'phy_mode' from the private
adapter struct.

While we're here, log info about the attached phy on connect,
this is useful because the phy type and connection method is now
fully configurable via the devicetree.

Tested on a lan7430 chip with built-in phy. Verified that adding
fixed-link/phy-connection-type in the devicetree results in a
fixed-link setup. Used ethtool to verify that the devicetree
settings are used.

Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201116170155.26967-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: don't duplicate driver name in phy_attached_print
Heiner Kallweit [Sun, 15 Nov 2020 15:03:10 +0000 (16:03 +0100)]
net: phy: don't duplicate driver name in phy_attached_print

Currently we print the driver name twice in phy_attached_print():
- phy_dev_info() prints it as part of the device info
- and we print it as part of the info string

This is a little bit ugly, it makes the info harder to read,
especially if the driver name is a little bit longer.
Therefore omit the driver name (if set) in the info string.

Example from r8169 that uses phylib:

old: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver \
   [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
new: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver \
   (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/8ab72586-f079-41d8-84ee-9f6a5bd97b2a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: remove nr_frags argument from rtl_tx_slots_avail
Heiner Kallweit [Mon, 16 Nov 2020 16:03:14 +0000 (17:03 +0100)]
r8169: remove nr_frags argument from rtl_tx_slots_avail

The only time when nr_frags isn't SKB_MAX_FRAGS is when entering
rtl8169_start_xmit(). However we can use SKB_MAX_FRAGS also here
because when queue isn't stopped there should always be room for
MAX_SKB_FRAGS + 1 descriptors.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/3d1f2ad7-31d5-2cac-4f4a-394f8a3cab63@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: mscc: fix excluded_middle.cocci warnings
kernel test robot [Mon, 16 Nov 2020 15:34:44 +0000 (16:34 +0100)]
net: phy: mscc: fix excluded_middle.cocci warnings

Condition !A || A && B is equivalent to !A || B.

Generated by: scripts/coccinelle/misc/excluded_middle.cocci

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2011161633240.2682@hadrien
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-dsa-tag_dsa-unify-regular-and-ethertype-dsa-taggers'
Jakub Kicinski [Tue, 17 Nov 2020 17:16:13 +0000 (09:16 -0800)]
Merge branch 'net-dsa-tag_dsa-unify-regular-and-ethertype-dsa-taggers'

Tobias Waldekranz says:

====================
net: dsa: tag_dsa: Unify regular and ethertype DSA taggers

The first patch ports tag_edsa.c's handling of IGMP/MLD traps to
tag_dsa.c. That way, we start from two logically equivalent taggers
that are then merged. The second commit does the heavy lifting of
actually fusing tag_dsa.c and tag_edsa.c. The final one just follows
up with some clean up of existing comments.

v2 -> v3:
  - Add the first patch described above as suggested by Andrew.
  - Better documentation of TO_SNIFFER and FORWARD tags.
  - Spelling.

v1 -> v2:
  - Fixed some grammar and whitespace errors.
  - Removed unnecessary default value in Kconfig.
  - Removed unnecessary #ifdef.
  - Split out comment fixes from functional changes.
  - Fully document enum dsa_code.
====================

Link: https://lore.kernel.org/r/20201114234558.31203-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: tag_dsa: Use a consistent comment style
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:58 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Use a consistent comment style

Use a consistent style of one-line/multi-line comments throughout the
file.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: tag_dsa: Unify regular and ethertype DSA taggers
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:57 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Unify regular and ethertype DSA taggers

Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.

This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.

While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:56 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic

When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.

This is a port of:

1ed9ec9b08ad ("dsa: Allow forwarding of redirected IGMP traffic")

Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mptcp-improve-multiple-xmit-streams-support'
Jakub Kicinski [Mon, 16 Nov 2020 18:46:10 +0000 (10:46 -0800)]
Merge branch 'mptcp-improve-multiple-xmit-streams-support'

Paolo Abeni says:

====================
mptcp: improve multiple xmit streams support

This series improves MPTCP handling of multiple concurrent
xmit streams.

The to-be-transmitted data is enqueued to a subflow only when
the send window is open, keeping the subflows xmit queue shorter
and allowing for faster switch-over.

The above requires a more accurate msk socket state tracking
and some additional infrastructure to allow pushing the data
pending in the msk xmit queue as soon as the MPTCP's send window
opens (patches 6-10).

As a side effect, the MPTCP socket could enqueue data to subflows
after close() time - to completely spooling the data sitting in the
msk xmit queue. Dealing with the requires some infrastructure and
core TCP changes (patches 1-5)

Finally, patches 11-12 introduce a more accurate tracking of the other
end's receive window.

Overall this refactor the MPTCP xmit path, without introducing
new features - the new code is covered by the existing self-tests.

v2 -> v3:
 - rebased,
 - fixed checkpatch issue in patch 1/13
 - fixed some state tracking issues in patch 8/13

v1 -> v2:
 - this is just a repost, to cope with patchwork issues, no changes
   at all
====================

Link: https://lore.kernel.org/r/cover.1605458224.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: send explicit ack on delayed ack_seq incr
Paolo Abeni [Mon, 16 Nov 2020 09:48:14 +0000 (10:48 +0100)]
mptcp: send explicit ack on delayed ack_seq incr

When the worker moves some bytes from the OoO queue into
the receive queue, the msk->ask_seq is updated, the MPTCP-level
ack carrying that value needs to wait the next ingress packet,
possibly slowing down or hanging the peer

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: keep track of advertised windows right edge
Florian Westphal [Mon, 16 Nov 2020 09:48:13 +0000 (10:48 +0100)]
mptcp: keep track of advertised windows right edge

Before sending 'x' new bytes also check that the new snd_una would
be within the permitted receive window.

For every ACK that also contains a DSS ack, check whether its tcp-level
receive window would advance the current mptcp window right edge and
update it if so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: rework poll+nospace handling
Florian Westphal [Mon, 16 Nov 2020 09:48:12 +0000 (10:48 +0100)]
mptcp: rework poll+nospace handling

MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
least one subflow and the mptcp socket itself are writeable.

mptcp_poll returns EPOLLOUT if the bit is set.

mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write
has used up all subflows or the mptcp socket wmem.

This reworks nospace handling as follows:

MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning.
This bit is set when the mptcp socket is not writeable.
The mptcp-level ack path schedule will then schedule the mptcp worker
to allow it to free already-acked data (and reduce wmem usage).

This will then wake userspace processes that wait for a POLLOUT event.

sendmsg will set MPTCP_NOSPACE only when it has to wait for more
wmem (blocking I/O case).

poll path will set MPTCP_NOSPACE in case the mptcp socket is
not writeable.

Normal tcp-level notification (SOCK_NOSPACE) is only enabled
in case the subflow socket has no available wmem.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: try to push pending data on snd una updates
Paolo Abeni [Mon, 16 Nov 2020 09:48:11 +0000 (10:48 +0100)]
mptcp: try to push pending data on snd una updates

After the previous patch we may end-up with unsent data
in the write buffer. If such buffer is full, the writer
will block for unlimited time.

We need to trigger the MPTCP xmit path even for the
subflow rx path, on MPTCP snd_una updates.

Keep things simple and just schedule the work queue if
needed.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: move page frag allocation in mptcp_sendmsg()
Paolo Abeni [Mon, 16 Nov 2020 09:48:10 +0000 (10:48 +0100)]
mptcp: move page frag allocation in mptcp_sendmsg()

mptcp_sendmsg() is refactored so that first it copies
the data provided from user space into the send queue,
and then tries to spool the send queue via sendmsg_frag.

There a subtle change in the mptcp level collapsing on
consecutive data fragment: we now allow that only on unsent
data.

The latter don't need to deal with msghdr data anymore
and can be simplified in a relevant way.

snd_nxt and write_seq are now tracked independently.

Overall this allows some relevant cleanup and will
allow sending pending mptcp data on msk una update in
later patch.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: refactor shutdown and close
Paolo Abeni [Mon, 16 Nov 2020 09:48:09 +0000 (10:48 +0100)]
mptcp: refactor shutdown and close

We must not close the subflows before all the MPTCP level
data, comprising the DATA_FIN has been acked at the MPTCP
level, otherwise we could be unable to retransmit as needed.

__mptcp_wr_shutdown() shutdown is responsible to check for the
correct status and close all subflows. Is called by the output
path after spooling any data and at shutdown/close time.

In a similar way, __mptcp_destroy_sock() is responsible to clean-up
the MPTCP level status, and is called when the msk transition
to TCP_CLOSE.

The protocol level close() does not force anymore the TCP_CLOSE
status, but orphan the msk socket and all the subflows.
Orphaned msk sockets are forciby closed after a timeout or
when all MPTCP-level data is acked.

There is a caveat about keeping the orphaned subflows around:
the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
tcp_close(). To prevent accessing freed memory on later MPTCP
level operations, the msk acquires a reference to each subflow
socket and prevent subflow_ulp_release() from releasing the
subflow context before __mptcp_destroy_sock().

The additional subflow references are released by __mptcp_done()
and the async ULP release is detected checking ULP ops. If such
field has been already cleared by the ULP release path, the
dangling context is freed directly by __mptcp_done().

Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: introduce MPTCP snd_nxt
Paolo Abeni [Mon, 16 Nov 2020 09:48:08 +0000 (10:48 +0100)]
mptcp: introduce MPTCP snd_nxt

Track the next MPTCP sequence number used on xmit,
currently always equal to write_next.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: add accounting for pending data
Paolo Abeni [Mon, 16 Nov 2020 09:48:07 +0000 (10:48 +0100)]
mptcp: add accounting for pending data

Preparation patch to track the data pending in the msk
write queue. No functional change introduced here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: reduce the arguments of mptcp_sendmsg_frag
Paolo Abeni [Mon, 16 Nov 2020 09:48:06 +0000 (10:48 +0100)]
mptcp: reduce the arguments of mptcp_sendmsg_frag

The current argument list is pretty long and quite unreadable,
move many of them into a specific struct. Later patches
will add more stuff to such struct.

Additionally drop the 'timeo' argument, now unused.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: introduce mptcp_schedule_work
Paolo Abeni [Mon, 16 Nov 2020 09:48:05 +0000 (10:48 +0100)]
mptcp: introduce mptcp_schedule_work

remove some of code duplications an allow preventing
rescheduling on close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: factor out __tcp_close() helper
Paolo Abeni [Mon, 16 Nov 2020 09:48:04 +0000 (10:48 +0100)]
tcp: factor out __tcp_close() helper

unlocked version of protocol level close, will be used by
MPTCP to allow decouple orphaning and subflow level close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: use tcp_build_frag()
Paolo Abeni [Mon, 16 Nov 2020 09:48:03 +0000 (10:48 +0100)]
mptcp: use tcp_build_frag()

mptcp_push_pending() is called even on orphaned
msk (and orphaned subflows), if there is outstanding
data at close() time.

To cope with the above MPTCP needs to handle explicitly
the allocation failure on xmit. The newly introduced
do_tcp_sendfrag() allows that, just plug it.

We can additionally drop a couple of sanity checks,
duplicate in the TCP code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: factor out tcp_build_frag()
Paolo Abeni [Mon, 16 Nov 2020 09:48:02 +0000 (10:48 +0100)]
tcp: factor out tcp_build_frag()

Will be needed by the next patch, as MPTCP needs to handle
directly the error/memory-allocation-needed path.

No functional changes intended.

Additionally let MPTCP code access the tcp_remove_empty_skb()
helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'fix-inefficiences-and-rename-nla_strlcpy'
Jakub Kicinski [Mon, 16 Nov 2020 16:09:50 +0000 (08:09 -0800)]
Merge branch 'fix-inefficiences-and-rename-nla_strlcpy'

Francis Laniel says:

====================
Fix inefficiences and rename nla_strlcpy

This patch set answers to first three issues listed in:
https://github.com/KSPP/linux/issues/110

To sum up, the patch contributions are the following:
1. the first patch fixes an inefficiency where some bytes in dst were written
twice, one with 0 the other with src content.
2. The second one modifies nla_strlcpy to return the same value as strscpy,
i.e. number of bytes written or -E2BIG if src was truncated.
It also modifies code that calls nla_strlcpy and checks for its return value.
3. The third renames nla_strlcpy to nla_strscpy.

Unfortunately, I did not find how to create struct nlattr objects so I tested
my modifications on simple char* and with GDB using tc to get to
tcf_proto_check_kind.
====================

Link: https://lore.kernel.org/r/20201115170806.3578-1-laniel_francis@privacyrequired.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotreewide: rename nla_strlcpy to nla_strscpy.
Francis Laniel [Sun, 15 Nov 2020 17:08:06 +0000 (18:08 +0100)]
treewide: rename nla_strlcpy to nla_strscpy.

Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoModify return value of nla_strlcpy to match that of strscpy.
Francis Laniel [Sun, 15 Nov 2020 17:08:05 +0000 (18:08 +0100)]
Modify return value of nla_strlcpy to match that of strscpy.

nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.

For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.

The callers of nla_strlcpy were modified to take into account this modification.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoFix unefficient call to memset before memcpu in nla_strlcpy.
Francis Laniel [Sun, 15 Nov 2020 17:08:04 +0000 (18:08 +0100)]
Fix unefficient call to memset before memcpu in nla_strlcpy.

Before this commit, nla_strlcpy first memseted dst to 0 then wrote src into it.
This is inefficient because bytes whom number is less than src length are written
twice.

This patch solves this issue by first writing src into dst then fill dst with
0's.
Note that, in the case where src length is higher than dst, only 0 is written.
Otherwise there are as many 0's written to fill dst.

For example, if src is "foo\0" and dst is 5 bytes long, the result will be:
1. "fooGG" after memcpy (G means garbage).
2. "foo\0\0" after memset.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: improve rtl8169_start_xmit
Heiner Kallweit [Sat, 14 Nov 2020 20:49:53 +0000 (21:49 +0100)]
r8169: improve rtl8169_start_xmit

Improve the following in rtl8169_start_xmit:
- tp->cur_tx can be accessed in parallel by rtl_tx(), therefore
  annotate the race by using WRITE_ONCE
- avoid checking stop_queue a second time by moving the doorbell check
- netif_stop_queue() uses atomic operation set_bit() that includes a
  full memory barrier on some platforms, therefore use
  smp_mb__after_atomic to avoid overhead

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/80085451-3eaf-507a-c7c0-08d607c46fbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: xfrm: use core API for updating/providing stats
Lev Stipakov [Fri, 13 Nov 2020 21:59:40 +0000 (23:59 +0200)]
net: xfrm: use core API for updating/providing stats

Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.

Use this function instead of own code.

While on it, remove xfrmi_get_stats64() and replace it with
dev_get_tstats64().

Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215939.147007-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: openvswitch: use core API to update/provide stats
Lev Stipakov [Fri, 13 Nov 2020 21:53:36 +0000 (23:53 +0200)]
net: openvswitch: use core API to update/provide stats

Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.

Use this function instead of own code.

While on it, remove internal_get_stats() and replace it
with dev_get_tstats64().

Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mlxsw-preparations-for-nexthop-objects-support-part-1-2'
Jakub Kicinski [Sun, 15 Nov 2020 00:55:07 +0000 (16:55 -0800)]
Merge branch 'mlxsw-preparations-for-nexthop-objects-support-part-1-2'

Ido Schimmel says:

====================
mlxsw: Preparations for nexthop objects support - part 1/2

This patch set contains small and non-functional changes aimed at making
it easier to support nexthop objects in mlxsw. Follow up patches can be
found here [1].

Patches #1-#4 add a type field to the nexthop group struct instead of
the existing protocol field. This will be used later on to add a nexthop
object type, which can contain both IPv4 and IPv6 nexthops.

Patches #5-#7 move the IPv4 FIB info pointer (i.e., 'struct fib_info')
from the nexthop group struct to the route. The pointer will not be
available when the nexthop group is a nexthop object, but it needs to be
accessible to routes regardless.

Patch #8 is the biggest change, but it is an entirely cosmetic change
and should therefore be easy to review. The motivation and the change
itself are explained in detail in the commit message.

Patches #9-#12 perform small changes so that two functions that are
currently split between IPv4 and IPv6 could be consolidated in patches

Patch #15 removes an outdated comment.

[1] https://github.com/idosch/linux/tree/submit/nexthop_objects
====================

Link: https://lore.kernel.org/r/20201113160559.22148-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum_router: Remove outdated comment
Ido Schimmel [Fri, 13 Nov 2020 16:05:59 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Remove outdated comment

Since commit 21151f64a458 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>