review.tizen.org Git - platform/kernel/linux-rpi.git/log

staging: dpaa2-switch: move the notifier register to module_init()

Move the notifier blocks register into the module_init() step, instead of
object probe, so that all DPSW devices probed by the dpaa2-switch driver
can use the same notifiers.

This will enable us to have a more straightforward approach in
determining if an event is intended for an object managed by this driver
or not. Previously, the dpaa2_switch_port_dev_check() function was
forced to also check the notifier block beside the net_device_ops
structure to determine if the event is for us or not.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: properly setup switching domains

Until now, the DPAA2 switch was not capable to properly setup its
switching domains depending on the existence, or lack thereof, of a
upper bridge device. This meant that all switch ports of a DPSW object
were switching by default even though they were not under the same
bridge device.

Another issue was the inability to actually add the CPU in the flooding
domains (broadcast, unknown unicast etc) of a particular switch port.
This meant that a simple ping on a switch interface was not possible
since no broadcast ARP frame would actually reach the CPU queues.

This patch tries to fix exactly these problems by:

* Creating and managing a FDB table for each flooding domain. This means
  that when a switch interface is not bridged it will use its own FDB
  table. While in bridged mode all DPAA2 switch interfaces under the
  same upper will use the same FDB table, thus leverage the same FDB
  entries.

* Adding a new MC firmware command - dpsw_set_egress_flood() - through
  which the driver can setup the flooding domains as needed. For
  example, when the switch interface is standalone, thus not in a
  bridge with any other DPAA2 switch port, it will setup its broadcast
  and unknown unicast flooding domains to only include the control
  interface (the queues that reach the CPU and the driver can dequeue
  from). This flooding domain changes when the interface joins a bridge
  and is configured to include, beside the control interface, all other
  DPAA2 switch interfaces.

We impose a minimum limit of FDB tables available equal to the number of
switch interfaces so that we guarantee that, in the maximal
configuration - all interfaces are standalone, each switch port will
have a private FDB table. At the same time, we only probe DPSW objects
that have the flooding and broadcast replicators configured to be per
FDB (DPSW_*_PER_FDB). Without this, the dpaa2-switch driver would not
be able to configure multiple switching domains.

At probe time, a FDB table will be allocated for each port. At a bridge
join event, the switch port will either continue to use the current FDB
table (if it's the first dpaa2-switch port to join that bridge) or will
switch to use the FDB table associated with the port that it's already
under the bridge. If a FDB switch is necessary, the private FDB table
which was previously used will be returned to the pool of unused FDBs.

Upon a bridge leave, the switch port needs a private FDB table thus it
will search and get the first unused FDB table. This way, all the other
ports remaining under the bridge will continue to use the same FDB
table.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: enable the control interface

Enable the CTRL_IF of the switch object, now that all the pieces are in
place (buffer and queue management, interrupts, NAPI instances etc).

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: add .ndo_start_xmit() callback

Implement the .ndo_start_xmit() callback for the switch port interfaces.
For each of the switch ports, gather the corresponding queue
destination ID (QDID) necessary for Tx enqueueing.

We'll reserve 64 bytes for software annotations, where we keep a skb
backpointer used on the Tx confirmation side for releasing the allocated
memory. At the moment, we only support linear skbs.

Also, add support for the Tx confirmation path which for the most part
shares the code path with the normal Rx queue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: handle Rx path on control interface

The dpaa2-ethsw supports only one Rx queue that is shared by all switch
ports. This means that information about which port was the ingress port
for a specific frame needs to be passed in metadata. In our case, the
Flow Context (FLC) field from the frame descriptor holds this
information. Besides the interface ID of the ingress port we also
receive the virtual QDID of the port. Below is a visual description of
the 64 bits of FLC.

63           47           31           15           0
+---------------------------------------------------+
|            |            |            |            |
|  RESERVED  |    IF_ID   |  RESERVED  |  IF QDID   |
|            |            |            |            |
+---------------------------------------------------+

Because all switch ports share the same Rx and Tx conf queues, NAPI
management takes into consideration when there is at least one switch
interface open to enable the NAPI instance.

The Rx path is common, for the most part, for both Rx and Tx conf with
the mention that each of them has its own consume function of a frame
descriptor. Dequeueing from a FQ, consuming dequeued store and also the
NAPI poll function is common between both queues.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: setup dpio

Setup interrupts on the control interface queues. We do not force an
exact affinity between the interrupts received from a specific queue and
a cpu.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: setup buffer pool and RX path rings

Allocate and setup a buffer pool, needed on the Rx path of the control
interface. Also, define the Rx buffer size seen by the WRIOP from the
PAGE_SIZE buffers seeded.

Also, create the needed Rx rings for both frame queues used on the
control interface. On the Rx path, when a pull-dequeue operation is
performed on a software portal, available frame descriptors are put in a
ring - a DMA memory storage - for further usage.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: get control interface attributes

Introduce a new structure to hold all necessary info related to an RX
queue for the control interface and populate the FQ IDs.
We only have one Rx queue and one Tx confirmation queue on the control
interface, both shared by all the switch ports.

Also, increase the minimum version of the object supported by the driver
since for a basic switch driver support we'll be in need for some ABIs
added in the latest version of firmware.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: remove obsolete .ndo_fdb_{add|del} callbacks

Since the dpaa2-switch already listens for SWITCHDEV_FDB_ADD_TO_DEVICE /
SWITCHDEV_FDB_DEL_TO_DEVICE events emitted by the bridge, we don't need
the bridge bypass operations, and now is a good time to delete them. All
'bridge fdb' commands need the 'master' flag specified now.

In fact, having the obsolete .ndo_fdb_{add|del} callbacks would even
complicate the bridge leave/join procedures without any real benefit.
Every FDB entry is installed in an FDB ID as far as the hardware is
concerned, and the dpaa2-switch ports change their FDB ID when they join
or leave a bridge. So we would need to manually delete these FDB entries
when the FDB ID changes. That's because, unlike FDB entries added
through switchdev, where the bridge automatically deletes those on
leave, there isn't anybody who will remove the static FDB entries
installed via the bridge bypass operations upon a change in the upper
device.

Note that we still need .ndo_fdb_dump though. The dpaa2-switch does not
emit any interrupts when a new address is learnt, so we cannot keep the
bridge FDB in sync with the hardware FDB. Therefore, we need this
callback to get a chance to print the FDB entries that were dynamically
learnt by our hardware.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: fix up initial forwarding configuration done by firmware

By default, the DPSW object is configured with VLAN ID 1 in the VLAN
table, which all ports are member of. This entry in the VLAN table
selects the same FDB ID for all ports, meaning that forwarding between
ports is permitted. This is unlike the switchdev model, where each port
should operate as standalone by default.

To make the switch operate in standalone ports mode, we need the VLAN
table to select a unique FDB ID for each port. In order to do that, we
need to simply delete the VLAN 1 created automatically by firmware, and
let dpaa2_switch_port_init take over, by readding VLAN ID 1, but
pointing towards a unique FDB ID.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: dpaa2-switch: remove broken learning and flooding support

This patch is removing the current configuration of learning and
flooding states per switch port because they are essentially broken in
terms of integration with the switchdev APIs and the bridge
understanding of these states.

First of all, the learning state is a per switch port configuration
while the dpaa2-switch driver was using it to configure the entire
bridging domain. This is broken since the software learning state could
be out of sync with the hardware state when ports from the same bridging
domain are configured by the user with different learning parameters.

The BR_FLOOD flag has been misinterpreted as well. Instead of denoting
whether unicast traffic for which there is no FDB entry will be flooded
towards a given port, the dpaa2-switch used the flag to configure
whether or not a frame with an unknown destination received on a given
port should be flooded or not. In summary, it was used as ingress
setting instead of a egress one.

Also, remove the unnecessary call to dpsw_if_set_broadcast() and the API
definition. The HW default is to let all switch ports to be able to
flood broadcast traffic thus there is no need to call the API again.

Instead of trying to patch things up, just remove the support for the
moment so that we'll add it back cleanly once the driver is out of
staging.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enetc-cleanups'

Vladimir Oltean says:

====================
Refactoring/cleanup for NXP ENETC

This series performs the following:
- makes the API for Control Buffer Descriptor Rings in enetc_cbdr.c a
bit more tightly knit.
- moves more logic into enetc_rxbd_next to make the callers simpler
- moves more logic into enetc_refill_rx_ring to make the callers simpler
- removes forward declarations
- simplifies the probe path to unify probing for used and unused PFs.

Nothing radical.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: make enetc_refill_rx_ring update the consumer index

Since commit fd5736bf9f23 ("enetc: Workaround for MDIO register access
issue"), enetc_refill_rx_ring no longer updates the RX BD ring's
consumer index, that is left to be done by the caller. This has led to
bugs such as the ones found in 96a5223b918c ("net: enetc: remove bogus
write to SIRXIDR from enetc_setup_rxbdr") and 3a5d12c9be6f ("net: enetc:
keep RX ring consumer index in sync with hardware"), so it is desirable
that we move back the update of the consumer index into enetc_refill_rx_ring.

The trouble with that is the different MDIO locking context for the two
callers of enetc_refill_rx_ring:

- enetc_clean_rx_ring runs under enetc_lock_mdio()
- enetc_setup_rxbdr runs outside enetc_lock_mdio()

Simplify the callers of enetc_refill_rx_ring by making enetc_setup_rxbdr
explicitly take enetc_lock_mdio() around the call. It will be the only
place in need of ensuring the hot accessors can be used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: remove forward declaration for enetc_map_tx_buffs

There is no other reason why this forward declaration exists rather than
poor ordering of the functions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: remove forward-declarations of enetc_clean_{rx,tx}_ring

This patch moves the NAPI enetc_poll after enetc_clean_rx_ring such that
we can delete the forward declarations.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: use enum enetc_active_offloads

The active_offloads variable of enetc_ndev_priv has an enum type, use it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: simplify callers of enetc_rxbd_next

When we iterate through the BDs in the RX ring, the software producer
index (which is already passed by value to enetc_rxbd_next) lags behind,
and we end up with this funny looking "++i == rx_ring->bd_count" check
so that we drag it after us.

Let's pass the software producer index "i" by reference, so that
enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count),
especially since enetc_rxbd_next has to increment the index anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: don't initialize unused ports from a separate code path

Since commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for
unused ports too") there is a requirement to initialize the memories of
unused PFs too, which has left the probe path in a bit of a rough shape,
because we basically have a minimal initialization path for unused PFs
which is separate from the main initialization path.

Now that initializing a control BD ring is as simple as calling
enetc_setup_cbdr, let's move that outside of enetc_alloc_si_resources
(unused PFs don't need classification rules, so no point in allocating
them just to free them later).

But enetc_alloc_si_resources is called both for PFs and for VFs, so now
that enetc_setup_cbdr is no longer called from this common function, it
means that the VF probe path needs to explicitly call enetc_setup_cbdr
too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: pass bd_count as an argument to enetc_setup_cbdr

It makes no sense from an API perspective to first initialize some
portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that
function to initialize the rest. enetc_setup_cbdr should be able to
perform all initialization given a zero-initialized struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: squash clear_cbdr and free_cbdr into teardown_cbdr

All call sites call enetc_clear_cbdr and enetc_free_cbdr one after
another, so let's combine the two functions into a single method named
enetc_teardown_cbdr which does both, and in the same order.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: save the mode register address inside struct enetc_cbdr

enetc_clear_cbdr depends on struct enetc_hw because it must disable the
ring through a register write. We'd like to remove that dependency, so
let's do what's already done with the producer and consumer indices,
which is to save the iomem address in a variable kept in struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: squash enetc_alloc_cbdr and enetc_setup_cbdr

enetc_alloc_cbdr and enetc_setup_cbdr are always called one after
another, so we can simplify the callers and make enetc_setup_cbdr do
everything that's needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: save the DMA device for enetc_free_cbdr

We shouldn't need to pass the struct device *dev to enetc CBDR APIs over
and over again, so save this inside struct enetc_cbdr::dma_dev and avoid
calling it from the enetc_free_cbdr functions.

This breaks the dependency of the cbdr API from struct enetc_si (the
station interface).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: move the CBDR API to enetc_cbdr.c

Since there is a dedicated file in this driver for interacting with
control BD rings, it makes sense to move these functions there.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'defxx-updates'

Maciej W. Rozycki says:

====================
FDDI: defxx: CSR access fixes and improvements

As a lab upgrade I have recently replaced a dated 32-bit x86 server with
a new POWER9 system.  One of the purposes of the system has been providing
network based resources to clients over my FDDI network.  As such the new
server has also received a new DEFPA FDDI network adapter.

As it turned out the interface did not work with the driver as shipped by
the most recent stable Debian release (Linux version 5.9.15) for ppc64el.
Symptoms were inconclusive, and the DEFPA adapter turned out to have a
manufacturing defect as well, however eventually I have figured out the
PCIe host bridge used with the system, Power Systems Host Bridge 4 (PHB4),
does not (anymore) implement PCI I/O transactions, while the binary defxx
driver as shipped by Debian comes configured for port I/O, and then a bug
in resource handling causes the driver to try and use an unassigned port
I/O range for adapter's PDQ main ASIC's CSR access.

Fortunately the PFI PCI interface ASIC used with the DEFPA adapter has
been designed such as to provide for both PCI I/O and PCI memory accesses
to be used for PDQ CSR access, via a pair of BARs to be alternatively
used.

Originally the defxx driver only supported port I/O access, but in the
course of interfacing it to the TURBOchannel bus I had to implement MMIO
access too, and while at it I have added a kernel configuration option to
globally switch between port I/O and MMIO at compilation time, however
conservatively defaulting to port I/O for EISA bus support where the use
of MMIO currently requires the adapter to have been suitably configured
via ECU (EISA Configuration Utility), supplied externally.

With the kernel configuration option set to MMIO the DEFPA interface
works correctly with my POWER9 system.  Therefore I have prepared this
small patch series consisting of a pair of conservative bug fixes, to be
backported to stable branches, and then a pair of improvements for the
robustness of the driver.

So changes 1/4 and 2/4 apply both to net and net-next, and then changes
3/4 and 4/4 apply on top of them to net-next only.  In particular there
are diff context dependencies going like this: 1/4 -> 3/4 -> 4/4.  Let me
know if this submission needs to be sorted differently.

See individual change descriptions for further details as to the actual
changes made.

NB the ESIC interface chip used for slave address decoding with the DEFEA
EISA adapter has decoding implemented for address bits 31:10 and therefore
supports full 32-bit range for the allocation of the CSR decoding window.
For DOS compatibility reasons ECU however only allows allocations between
0x000c0000 and 0x000effff.

Given that for other compatibility reasons EISA is subtractively decoded
on mixed PCI/EISA systems we could allocate an MMIO region from arbitrary
unoccupied memory space and program the ESIC suitably without regard for
that compatibility limitation.  In fact I have a proof-of-concept change
and it seems to work reliably.

However with these patches applied the driver continues supporting port
I/O as fallback and the EISA product ID register is located in the EISA
slot-specific port I/O address space, so any EISA system however modern
(sounds like a joke, eh?) also has to support port I/O access somehow.

So while I think such a dynamic MMIO allocation would be an example of
good engineering, but it would require changes to our EISA core and
therefore it may have had sense 25 years ago when EISA was still
mainstream, but not nowadays when EISA systems are I suppose more of a
curiosity rather than the usual equipment.

This patch series has been thoroughly verified with Linux 5.11.0 as
released and then a Raptor Talos II POWER9 system and a Malta 5Kc MIPS64
system for PCI DEFPA adapter support, an Advanced Integrated Research
486EI x86 system for EISA DEFEA adapter support, and a Digital Equipment
DECstation 5000 model 260 MIPS III system for TURBOchannel DEFTA adapter
support, covering both port I/O and MMIO operation where applicable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defxx: Use driver's name with resource requests

Replace repeated "defxx" strings with a reference to the DRV_NAME macro
and then use the driver's name rather that the bus address with resource
requests so as to have contents of /proc/iomem and /proc/ioports more
meaningful to the user, in line with what drivers usually do.

So rather than say:

5000-50ff : DEC FDDIcontroller/EISA Adapter
  5000-503f : 00:05
  5040-5043 : 00:05
5400-54ff : DEC FDDIcontroller/EISA Adapter
5800-58ff : DEC FDDIcontroller/EISA Adapter
5c00-5cff : DEC FDDIcontroller/EISA Adapter
  5c80-5cbf : 00:05

or:

620c080020000-620c08002007f : 0031:02:04.0
  620c080020000-620c08002007f : 0031:02:04.0
620c080030000-620c08003ffff : 0031:02:04.0

or:

1f100000-1f10003f : tc2

we report:

5000-50ff : DEC FDDIcontroller/EISA Adapter
  5000-503f : defxx
  5040-5043 : defxx
5400-54ff : DEC FDDIcontroller/EISA Adapter
5800-58ff : DEC FDDIcontroller/EISA Adapter
5c00-5cff : DEC FDDIcontroller/EISA Adapter
  5c80-5cbf : defxx

and:

620c080020000-620c08002007f : 0031:02:04.0
  620c080020000-620c08002007f : defxx
620c080030000-620c08003ffff : 0031:02:04.0

and:

1f100000-1f10003f : defxx

respectively for the DEFEA (EISA), DEFPA (PCI), and DEFTA (TURBOchannel)
adapters.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defxx: Implement dynamic CSR I/O address space selection

Recent versions of the PCI Express specification have deprecated support
for I/O transactions and actually some PCIe host bridges, such as Power
Systems Host Bridge 4 (PHB4), do not implement them.  Conversely a DEFEA
adapter can have its MMIO decoding disabled with ECU (EISA Configuration
Utility) and therefore not available for us with the resource allocation
infrastructure we implement.

However either I/O address space will always be available for use with
the DEFEA (EISA) and DEFPA (PCI) adapters and both have double address
decoding implemented in hardware for Control and Status Register access.
The two kinds of adapters can be present both at once in a single mixed
PCI/EISA system.  For the DEFTA (TURBOchannel) variant there is no issue
as there has been no port I/O address space defined for that bus.

To make people's life easier and the driver more robust remove the
DEFXX_MMIO configuration option so as to rather than making the choice
for the I/O address space to use at build time for all the adapters
installed in the system let the driver choose the most suitable address
space dynamically on a case-by-case basis at run time.  Make MMIO the
default and resort to port I/O should the default fail for some reason.

This way multiple adapters installed in one system can use different I/O
address spaces each, in particular in the presence of DEFEA adapters in
a pure-EISA or a mixed EISA/PCI system (it is expected that DEFPA boards
will use MMIO in normal circumstances).

The choice of the I/O address space to use continues being reported by
the driver on startup, e.g.:

eisa 00:05: EISA: slot 5: DEC3002 detected
defxx: v1.12 2021/03/10  Lawrence V. Stefani and others
00:05: DEFEA at I/O addr = 0x5000, IRQ = 10, Hardware addr = 00-00-f8-c8-b3-b6
00:05: registered as fddi0

and:

defxx: v1.12 2021/03/10  Lawrence V. Stefani and others
0031:02:04.0: DEFPA at MMIO addr = 0x620c080020000, IRQ = 57, Hardware addr = 00-60-6d-93-91-98
0031:02:04.0: registered as fddi0

and:

defxx: v1.12 2021/03/10  Lawrence V. Stefani and others
tc2: DEFTA at MMIO addr = 0x1f100000, IRQ = 21, Hardware addr = 08-00-2b-b0-8b-1e
tc2: registered as fddi0

so there is no need to add further information.

The change is supposed to cause a negligible performance hit as I/O
accessors will now have code executed conditionally at run time.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defxx: Make MMIO the configuration default except for EISA

Recent versions of the PCI Express specification have deprecated support
for I/O transactions and actually some PCIe host bridges, such as Power
Systems Host Bridge 4 (PHB4), do not implement them.

The default kernel configuration choice for the defxx driver is the use
of I/O ports rather than MMIO for PCI and EISA systems.  It may have
made sense as a conservative backwards compatible choice back when MMIO
operation support was added to the driver as a part of TURBOchannel bus
support.  However nowadays this configuration choice makes the driver
unusable with systems that do not implement I/O transactions for PCIe.

Make DEFXX_MMIO the configuration default then, except where configured
for EISA.  This exception is because an EISA adapter can have its MMIO
decoding disabled with ECU (EISA Configuration Utility) and therefore
not available with the resource allocation infrastructure we implement,
while port I/O is always readily available as it uses slot-specific
addressing, directly mapped to the slot an option card has been placed
in and handled with our EISA bus support core.  Conversely a kernel that
supports modern systems which may not have I/O transactions implemented
for PCIe will usually not be expected to handle legacy EISA systems.

The change of the default will make it easier for people, including but
not limited to distribution packagers, to make a working choice for the
driver.

Update the option description accordingly and while at it replace the
potentially ambiguous PIO acronym with IOP for "port I/O" vs "I/O ports"
according to our nomenclature used elsewhere.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: e89a2cfb7d7b ("[TC] defxx: TURBOchannel support")
Cc: stable@vger.kernel.org # v2.6.21+
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defxx: Bail out gracefully with unassigned PCI resource for CSR

Recent versions of the PCI Express specification have deprecated support
for I/O transactions and actually some PCIe host bridges, such as Power
Systems Host Bridge 4 (PHB4), do not implement them.

For those systems the PCI BARs that request a mapping in the I/O space
have the length recorded in the corresponding PCI resource set to zero,
which makes it unassigned:

# lspci -s 0031:02:04.0 -v
0031:02:04.0 FDDI network controller: Digital Equipment Corporation PCI-to-PDQ Interface Chip [PFI] FDDI (DEFPA) (rev 02)
Subsystem: Digital Equipment Corporation FDDIcontroller/PCI (DEFPA)
Flags: bus master, medium devsel, latency 136, IRQ 57, NUMA node 8
Memory at 620c080020000 (32-bit, non-prefetchable) [size=128]
I/O ports at <unassigned> [disabled]
Memory at 620c080030000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Kernel driver in use: defxx
Kernel modules: defxx

#

Regardless the driver goes ahead and requests it (here observed with a
Raptor Talos II POWER9 system), resulting in an odd /proc/ioport entry:

# cat /proc/ioports
00000000-ffffffffffffffff : 0031:02:04.0
#

Furthermore, the system gets confused as the driver actually continues
and pokes at those locations, causing a flood of messages being output
to the system console by the underlying system firmware, like:

defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
defxx 0031:02:04.0: enabling device (0140 -> 0142)
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010000
IPMI: dropping non severe PEL event
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010014
IPMI: dropping non severe PEL event
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010014
IPMI: dropping non severe PEL event

and so on and so on (possibly intermixed actually, as there's no locking
between the kernel and the firmware in console port access with this
particular system, but cleaned up above for clarity), and once some 10k
of such pairs of the latter two messages have been produced an interace
eventually shows up in a useless state:

0031:02:04.0: DEFPA at I/O addr = 0x0, IRQ = 57, Hardware addr = 00-00-00-00-00-00

This was not expected to happen as resource handling was added to the
driver a while ago, because it was not known at that time that a PCI
system would be possible that cannot assign port I/O resources, and
oddly enough `request_region' does not fail, which would have caught it.

Correct the problem then by checking for the length of zero for the CSR
resource and bail out gracefully refusing to register an interface if
that turns out to be the case, producing messages like:

defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
0031:02:04.0: Cannot use I/O, no address set, aborting
0031:02:04.0: Recompile driver with "CONFIG_DEFXX_MMIO=y"

Keep the original check for the EISA MMIO resource as implemented,
because in that case the length is hardwired to 0x400 as a consequence
of how the compare/mask address decoding works in the ESIC chip and it
is only the base address that is set to zero if MMIO has been disabled
for the adapter in EISA configuration, which in turn could be a valid
bus address in a legacy-free system implementing PCI, especially for
port I/O.

Where the EISA MMIO resource has been disabled for the adapter in EISA
configuration this arrangement keeps producing messages like:

eisa 00:05: EISA: slot 5: DEC3002 detected
defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
00:05: Cannot use MMIO, no address set, aborting
00:05: Recompile driver with "CONFIG_DEFXX_MMIO=n"
00:05: Or run ECU and set adapter's MMIO location

with the last two lines now swapped for easier handling in the driver.

There is no need to check for and catch the case of a port I/O resource
not having been assigned for EISA as the adapter uses the slot-specific
I/O space, which gets assigned by how EISA has been specified and maps
directly to the particular slot an option card has been placed in.  And
the EISA variant of the adapter has additional registers that are only
accessible via the port I/O space anyway.

While at it factor out the error message calls into helpers and fix an
argument order bug with the `pr_err' call now in `dfx_register_res_err'.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 4d0438e56a8f ("defxx: Clean up DEFEA resource management")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-misc-updates'

Ido Schimmel says:

====================
mlxsw: Misc updates

This patch set contains miscellaneous updates for mlxsw.

Patches #1-#2 reword an extack message to make it clearer and fix a
comment.

Patch #3 bumps the minimum firmware version enforced by mlxsw. This is
needed for two upcoming features: Resilient hashing and per-flow
sampling.

Patches #4-#6 improve the information reported via devlink-health for
'fw_fatal' events.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Adjust some MFDE fields shift and size to fw implementation

MFDE.irisc_id and MFDE.event_id were adjusted according to what is
actually implemented in firmware.

Adjust the shift and size of these fields in mlxsw as well.

Note that the displacement of the first field is not a regression.
It was always incorrect and therefore reported "0".

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Expose MFDE.log_ip to devlink health

Add the MFDE.log_ip field to devlink health reporter in order to ease
firmware debug. This field encodes the instruction pointer that triggered
the CR space timeout.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Extend MFDE register with new log_ip field

Extend MFDE (Monitoring FW Debug) register with new field specifying the
instruction pointer that triggered the CR space timeout.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Bump minimum FW version to xx.2008.2406

The indicated version fixes the following two issues:

- MIRROR_SAMPLER_ACTION.mirror_probability_rate inverted. This has
  implication for per-flow sampling.

- When adjacency is replaced-if-inactive (RATR.opcode=3), bad parameter
  was reported when replacing an active entry. This breaks offload of
  resilient next-hop groups.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Fix comment about slot_index field in PMAOS register

The comment did not include the register name.
Add `pmaos` to align the comment with other comments.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Reword an error message for Q-in-Q veto

'Uppers' is not clear enough for all users when referring to upper
devices.

Reword the error message so it will be clearer.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: route.c:fix indentation

The series of space has been replaced by tab space
wherever required.

Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a helper to avoid issues with HW TX timestamping and SO_TXTIME

As explained in commit 29d98f54a4fe ("net: enetc: allow hardware
timestamping on TX queues with tc-etf enabled"), hardware TX
timestamping requires an skb with skb->tstamp = 0. When a packet is sent
with SO_TXTIME, the skb->skb_mstamp_ns corrupts the value of skb->tstamp,
so the drivers need to explicitly reset skb->tstamp to zero after
consuming the TX time.

Create a helper named skb_txtime_consumed() which does just that. All
drivers which offload TC_SETUP_QDISC_ETF should implement it, and it
would make it easier to assess during review whether they do the right
thing in order to be compatible with hardware timestamping or not.

Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defza: Update my e-mail address

Following the recent update to MAINTAINERS update my e-mail address.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defxx: Update my e-mail address

Following the recent update to MAINTAINERS update my e-mail address.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: if_fddi.h: Update my e-mail address

Following the recent update to MAINTAINERS update my e-mail address.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: act_sample: Implement stats_update callback

Implement this callback in order to get the offloaded stats added to the
kernel stats.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: mISDN: remove unneeded variable 'ret'

Fix the following coccicheck warning:
./drivers/isdn/mISDN/dsp_core.c:956:6-9: Unneeded variable: "err".
Return "0" on line 1001

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: broadcom: bcm4908_enet: read MAC from OF

BCM4908 devices have MAC address accessible using NVMEM so it's needed
to use OF helper for reading it.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: remove some unnecessary operation in skb_segment_list()

gro list uses skb_shinfo(skb)->frag_list to link two skb together,
and NAPI_GRO_CB(p)->last->next is used when there are more skb,
see skb_gro_receive_list(). gso expects that each segmented skb is
linked together using skb->next, so only the first skb->next need
to set to skb_shinfo(skb)-> frag_list when doing gso list segment.

It is the same reason that nskb->next does not need to be set to
list_skb before goto the error handling, because nskb->next already
pointers to list_skb.

And nskb is also the last skb at the end of loop, so remove tail
variable and use nskb instead.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding a couple of break statements instead of
just letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: plip: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rose: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ax25: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cassini: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 3c509: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fddi: skfp: smt: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Refactor the code according to the use of flexible-array members in
smt_sif_operation structure, instead of one-element arrays. Also, make
use of the struct_size() helper instead of the open-coded version
to calculate the size of the struct-with-flex-array. Additionally, make
use of the typeof operator to properly determine the object type to be
passed to macro smtod().

Also, this helps the ongoing efforts to enable -Warray-bounds by fixing
the following warnings:

  CC [M]  drivers/net/fddi/skfp/smt.o
drivers/net/fddi/skfp/smt.c: In function ‘smt_send_sif_operation’:
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */
      |                   ^~~
drivers/net/fddi/skfp/smt.c:1084:30: warning: array subscript 1 is above array bounds of ‘struct smt_p_lem[1]’ [-Warray-bounds]
1084 |    smt_fill_lem(smc,&sif->lem[i],i) ;
      |                      ~~~~~~~~^~~
In file included from drivers/net/fddi/skfp/h/smc.h:42,
                 from drivers/net/fddi/skfp/smt.c:15:
drivers/net/fddi/skfp/h/smt.h:767:19: note: while referencing ‘lem’
  767 |  struct smt_p_lem lem[1] ; /* phy lem status */

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Drop duplicate sin and sin6 assignments

There is no need to assign the msg->msg_name to sin or sin6,
because there is DECLARE_SOCKADDR statement.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: chelsiofix: spelling typo of 'rewriteing'

rewriteing -> rewriting

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: isdn: mISDN: fix spelling typo of 'wheter'

wheter -> whether

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: support XDP when not more queues

The number of queues implemented by many virtio backends is limited,
especially some machines have a large number of CPUs. In this case, it
is often impossible to allocate a separate queue for
XDP_TX/XDP_REDIRECT, then xdp cannot be loaded to work, even xdp does
not use the XDP_TX/XDP_REDIRECT.

This patch allows XDP_TX/XDP_REDIRECT to run by reuse the existing SQ
with __netif_tx_lock() hold when there are not enough queues.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socket: use BIT() for MSG_*

The bit mask for MSG_* seems a little confused here. Replace it
with BIT() to make it clear to understand.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-09

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 17 day(s) which contain
a total of 114 files changed, 5158 insertions(+), 1288 deletions(-).

The main changes are:

1) Faster bpf_redirect_map(), from Björn.

2) skmsg cleanup, from Cong.

3) Support for floating point types in BTF, from Ilya.

4) Documentation for sys_bpf commands, from Joe.

5) Support for sk_lookup in bpf_prog_test_run, form Lorenz.

6) Enable task local storage for tracing programs, from Song.

7) bpf_for_each_map_elem() helper, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

2) TX skb error handling fix in mt76 driver, also from Felix.

3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"Fix opcode filtering for exceptions, and clean up defconfig"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
sparc: sparc64_defconfig: remove duplicate CONFIGs
sparc64: Fix opcode filtering in handling of no fault loads

sparc: sparc64_defconfig: remove duplicate CONFIGs

After my patch there is CONFIG_ATA defined twice.
Remove the duplicate one.
Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
test with NFS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a57cdeb369ef ("sparc: sparc64_defconfig: add necessary configs for qemu")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: Fix opcode filtering in handling of no fault loads

is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.

The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.

The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.

The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
if ((asi & 0xf2) == ASI_PNF)

An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.

In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2021-03-09

please apply the following patch series to netdev's net tree.

This brings one fix for a memleak in an error path of the setup code.
Also several fixes for dealing with pending TX buffers - two for old
bugs in their completion handling, and one recent regression in a
teardown path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fix notification for pending buffers during teardown

The cited commit reworked the state machine for pending TX buffers.
In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
uses NEED_QAOB for buffers that get parked while waiting for their QAOB
completion.

But it missed to adjust the check in qeth_tx_complete_buf(). So if
qeth_tx_complete_pending_bufs() is called during teardown to drain
the parked TX buffers, we no longer raise a notification for af_iucv.

Instead of updating the checked state, just move this code into
qeth_tx_complete_pending_bufs() itself. This also gets rid of the
special-case in the common TX completion path.

Fixes: 8908f36d20d8 ("s390/qeth: fix af_iucv notification race")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: schedule TX NAPI on QAOB completion

When a QAOB notifies us that a pending TX buffer has been delivered, the
actual TX completion processing by qeth_tx_complete_pending_bufs()
is done within the context of a TX NAPI instance. We shouldn't rely on
this instance being scheduled by some other TX event, but just do it
ourselves.

qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
instance. To avoid touching the TX queue's NAPI instance
before/after it is (un-)registered, reorder the code in qeth_open()
and qeth_stop() accordingly.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: improve completion of pending TX buffers

The current design attaches a pending TX buffer to a custom
single-linked list, which is anchored at the buffer's slot on the
TX ring. The buffer is then checked for final completion whenever
this slot is processed during a subsequent TX NAPI poll cycle.

But if there's insufficient traffic on the ring, we might never make
enough progress to get back to this ring slot and discover the pending
buffer's final TX completion. In particular if this missing TX
completion blocks the application from sending further traffic.

So convert the custom single-linked list code to a per-queue list_head,
and scan this list on every TX NAPI cycle.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fix memory leak after failed TX Buffer allocation

When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
back an Output Queue, the 'out_freeoutqbufs' path will free all
previously allocated buffers for this queue. But it misses to free the
half-finished queue struct itself.

Move the buffer allocation into qeth_alloc_output_queue(), and deal with
such errors internally.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'virtio_net-infinite-loop'

Balazs Nemeth says:

====================
net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto

These patches prevent an infinite loop for gso packets with a protocol
from virtio net hdr that doesn't match the protocol in the packet.
Note that packets coming from a device without
header_ops->parse_protocol being implemented will not be caught by
the check in virtio_net_hdr_to_skb, but the infinite loop will still
be prevented by the check in the gso layer.

Changes from v2 to v3:
  - Remove unused *eth.
  - Use MPLS_HLEN to also check if the MPLS header length is a multiple
    of four.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0

A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: check if protocol extracted by virtio_net_hdr_set_proto is correct

For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.

An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.

Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.

Fixes: 9274124f023b ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-xdp-redirect'

Björn Töpel says:

====================
This two patch series contain two optimizations for the
bpf_redirect_map() helper and the xdp_do_redirect() function.

The bpf_redirect_map() optimization is about avoiding the map lookup
dispatching. Instead of having a switch-statement and selecting the
correct lookup function, we let bpf_redirect_map() be a map operation,
where each map has its own bpf_redirect_map() implementation. This way
the run-time lookup is avoided.

The xdp_do_redirect() patch restructures the code, so that the map
pointer indirection can be avoided.

Performance-wise I got 4% improvement for XSKMAP
(sample:xdpsock/rx-drop), and 8% (sample:xdp_redirect_map) on my
machine.

v5->v6:  Removed REDIR enum, and instead use map_id and map_type. (Daniel)
         Applied Daniel's fixups on patch 1. (Daniel)
v4->v5:  Renamed map operation to map_redirect. (Daniel)
v3->v4:  Made bpf_redirect_map() a map operation. (Daniel)
v2->v3:  Fix build when CONFIG_NET is not set. (lkp)
v1->v2:  Removed warning when CONFIG_BPF_SYSCALL was not set. (lkp)
         Cleaned up case-clause in xdp_do_generic_redirect_map(). (Toke)
         Re-added comment. (Toke)
rfc->v1: Use map_id, and remove bpf_clear_redirect_map(). (Toke)
         Get rid of the macro and use __always_inline. (Jesper)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, xdp: Restructure redirect actions

The XDP_REDIRECT implementations for maps and non-maps are fairly
similar, but obviously need to take different code paths depending on
if the target is using a map or not. Today, the redirect targets for
XDP either uses a map, or is based on ifindex.

Here, the map type and id are added to bpf_redirect_info, instead of
the actual map. Map type, map item/ifindex, and the map_id (if any) is
passed to xdp_do_redirect().

For ifindex-based redirect, used by the bpf_redirect() XDP BFP helper,
a special map type/id are used. Map type of UNSPEC together with map id
equal to INT_MAX has the special meaning of an ifindex based
redirect. Note that valid map ids are 1 inclusive, INT_MAX exclusive
([1,INT_MAX[).

In addition to making the code easier to follow, using explicit type
and id in bpf_redirect_info has a slight positive performance impact
by avoiding a pointer indirection for the map type lookup, and instead
use the cacheline for bpf_redirect_info.

Since the actual map is not passed via bpf_redirect_info anymore, the
map lookup is only done in the BPF helper. This means that the
bpf_clear_redirect_map() function can be removed. The actual map item
is RCU protected.

The bpf_redirect_info flags member is not used by XDP, and not
read/written any more. The map member is only written to when
required/used, and not unconditionally.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210308112907.559576-3-bjorn.topel@gmail.com

bpf, xdp: Make bpf_redirect_map() a map operation

Currently the bpf_redirect_map() implementation dispatches to the
correct map-lookup function via a switch-statement. To avoid the
dispatching, this change adds bpf_redirect_map() as a map
operation. Each map provides its bpf_redirect_map() version, and
correct function is automatically selected by the BPF verifier.

A nice side-effect of the code movement is that the map lookup
functions are now local to the map implementation files, which removes
one additional function call.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210308112907.559576-2-bjorn.topel@gmail.com

net: dsa: xrs700x: check if partner is same as port in hsr join

Don't assign dp to partner if it's the same port that xrs700x_hsr_join
was called with. The partner port is supposed to be the other port in
the HSR/PRP redundant pair not the same port. This fixes an issue
observed in testing where forwarding between redundant HSR ports on this
switch didn't work depending on the order the ports were added to the
hsr device.

Fixes: bd62e6f5e6a9 ("net: dsa: xrs700x: add HSR offloading support")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: Fix compiler warning in BPF_KPROBE definition in loop6.c

Add missing return type to BPF_KPROBE definition. Without it, compiler
generates the following warning:

progs/loop6.c:68:12: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
BPF_KPROBE(trace_virtqueue_add_sgs, void *unused, struct scatterlist **sgs,
^
1 warning generated.

Fixes: 86a35af628e5 ("selftests/bpf: Add a verifier scale test with unknown bounded loop")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210309044322.3487636-1-andrii@kernel.org

Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"A bunch of fixes for the GPIO subsystem. We have two regressions in
  the core code spotted right after the merge window, a series of fixes
  for ACPI GPIO and a subsequent fix for a related regression in
  gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
  the gpio-line-names to remedy a regression in stm32mp151.

  Summary:

   - fix two regressions in core GPIO subsystem code: one NULL-pointer
     dereference and one list corruption

   - read GPIO line names from fwnode instead of using the generic
     device properties to fix a regression on stm32mp151

   - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
     handling on Intel Galileo

   - update .gitignore in GPIO selftests"

* tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Read "gpio-line-names" from a firmware node
  gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
  gpiolib: acpi: Allow to find GpioInt() resource by name and index
  gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
  gpiolib: acpi: Add missing IRQF_ONESHOT
  gpio: fix gpio-device list corruption
  gpio: fix NULL-deref-on-deregistration regression
  selftests: gpio: update .gitignore

Merge tag 'mips-fixes_5.12_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- fixes for boot breakage because of misaligned FDTs

- fix for overwritten exception handlers

- enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
   performance

* tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Reserve exception base early to prevent corruption
  MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
  crypto: mips/poly1305 - enable for all MIPS processors
  MIPS: boot/compressed: Copy DTB to aligned address

net: lapbether: Remove netif_start_queue / netif_stop_queue

For the devices in this driver, the default qdisc is "noqueue",
because their "tx_queue_len" is 0.

In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
"noqueue" qdisc are specially handled. Packets are transmitted without
being queued after a "dev->flags & IFF_UP" check. However, it's possible
that even if this check succeeds, "ops->ndo_stop" may still have already
been called. This is because in "__dev_close_many", "ops->ndo_stop" is
called before clearing the "IFF_UP" flag.

If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
"__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
checks "netif_xmit_stopped" and finds that the queue is already stopped.
In this case, it will complain that:
"Virtual device ... asks to queue packet!"

To prevent "__dev_queue_xmit" from generating this complaint, we should
not call "netif_stop_queue" in "ops->ndo_stop".

We also don't need to call "netif_start_queue" in "ops->ndo_open",
because after a netdev is allocated and registered, the
"__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
to call "netif_start_queue" to clear it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Add clang-based BTF_KIND_FLOAT tests'

Ilya Leoshkevich says:

====================

The two tests here did not make it into the main BTF_KIND_FLOAT series
because of dependency on LLVM.

v0: https://lore.kernel.org/bpf/20210226202256.116518-10-iii@linux.ibm.com/
v1 -> v0: Per Alexei's suggestion, document the required LLVM commit.

v1: https://lore.kernel.org/bpf/20210305170844.151594-1-iii@linux.ibm.com/
v1 -> v2: Per Andrii's suggestions, use double in
core_reloc_size___diff_sz and non-pointer member in
btf_dump_test_case_syntax.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add BTF_KIND_FLOAT to btf_dump_test_case_syntax

Check that dumping various floating-point types produces a valid C
code.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210309005649.162480-3-iii@linux.ibm.com

selftests/bpf: Add BTF_KIND_FLOAT to test_core_reloc_size

Verify that bpf_core_field_size() is working correctly with floats.
Also document the required clang version.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210309005649.162480-2-iii@linux.ibm.com

MIPS: kernel: Reserve exception base early to prevent corruption

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc updates from David Miller:
"Just some more random bits from Al, including a conversion over to
  generic extables"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc32: take ->thread.flags out
  sparc32: get rid of fake_swapper_regs
  sparc64: get rid of fake_swapper_regs
  sparc32: switch to generic extables
  sparc32: switch copy_user.S away from range exception table entries
  sparc32: get rid of range exception table entries in checksum_32.S
  sparc32: switch __bzero() away from range exception table entries
  sparc32: kill lookup_fault()
  sparc32: don't bother with lookup_fault() in __bzero()

atm: idt77252: fix null-ptr-dereference

this one is similar to the phy_data allocation fix in uPD98402, the
driver allocate the idt77105_priv and store to dev_data but later
dereference using dev->dev_data, which will cause null-ptr-dereference.

fix this issue by changing dev_data to phy_data so that PRIV(dev) can
work correctly.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: uPD98402: fix incorrect allocation

dev->dev_data is set in zatm.c, calling zatm_start() will overwrite this
dev->dev_data in uPD98402_start() and a subsequent PRIV(dev)->lock
(i.e dev->phy_data->lock) will result in a null-ptr-dereference.

I believe this is a typo and what it actually want to do is to allocate
phy_data instead of dev_data.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: fix a typo in the struct description

phy_data means private PHY data not date

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: fix error return code of qrtr_sendmsg()

When sock_alloc_send_skb() returns NULL to skb, no error return code of
qrtr_sendmsg() is assigned.
To fix this bug, rc is assigned with -ENOMEM in this case.

Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: fix length of ADD_ADDR with port sub-option

in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1.

this can be verified running tcpdump on the kselftests artifacts:

unpatched kernel:
[root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr
reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0
IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]]

patched kernel:
[root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0

Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: fix error return code of bond_neigh_init()

When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.

Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: allow hardware timestamping on TX queues with tc-etf enabled

The txtime is passed to the driver in skb->skb_mstamp_ns, which is
actually in a union with skb->tstamp (the place where software
timestamps are kept).

Since commit b50a5c70ffa4 ("net: allow simultaneous SW and HW transmit
timestamping"), __sock_recv_timestamp has some logic for making sure
that the two calls to skb_tstamp_tx:

skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)

and

skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver

will both do the right thing and in a race-free manner, meaning that
skb_tx_timestamp will deliver a cmsg with the software timestamp only,
and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
with the hardware timestamp only.

Why are races even possible? Well, because although the software timestamp
skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
lives in skb_shinfo(skb), an area which is shared between skbs and their
clones. And skb_tstamp_tx works by cloning the packets when timestamping
them, therefore attempting to perform hardware timestamping on an skb's
clone will also change the hardware timestamp of the original skb. And
the original skb might have been yet again cloned for software
timestamping, at an earlier stage.

So the logic in __sock_recv_timestamp can't be as simple as saying
"does this skb have a hardware timestamp? if yes I'll send the hardware
timestamp to the socket, otherwise I'll send the software timestamp",
precisely because the hardware timestamp is shared.
Instead, it's quite the other way around: __sock_recv_timestamp says
"does this skb have a software timestamp? if yes, I'll send the software
timestamp, otherwise the hardware one". This works because the software
timestamp is not shared with clones.

But that means we have a problem when we attempt hardware timestamping
with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
will say "oh, yeah, this must be some sort of odd clone" and will not
deliver the hardware timestamp to the socket. And this is exactly what
is happening when we have txtime enabled on the socket: as mentioned,
that is put in a union with skb->tstamp, so it is quite easy to mistake
it.

Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
before taking the hardware timestamp. It's of no use to us now (we're
already on the TX confirmation path).

Fixes: 0d08c9ec7d6e ("enetc: add support time specific departure base on the qos etf")
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: set MAC RX FIFO to recommended value

On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high
and may lead to RX lock-up under traffic at a rate higher than 6 Gbps.
Set it to 1 instead, as recommended by the hardware design team and by
later versions of the ENETC block guide.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Jason Liu <jason.hui.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davicom: Use platform_get_irq_optional()

The second IRQ line really is optional, so use
platform_get_irq_optional() to obtain it.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davicom: Fix regulator not turned off on driver removal

We must disable the regulator that was enabled in the probe function.

Fixes: 7994fe55a4a2 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davicom: Fix regulator not turned off on failed probe

When the probe fails or requests to be defered, we must disable the
regulator that was previously enabled.

Fixes: 7994fe55a4a2 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports

Tobias reports that after the blamed patch, VLAN objects being added to
a bridge device are being added to all slave ports instead (swp2, swp3).

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br0
bridge vlan add dev br0 vid 100 self

This is because the fix was too broad: we made dsa_port_offloads_netdev
say "yes, I offload the br0 bridge" for all slave ports, but we didn't
add the checks whether the switchdev object was in fact meant for the
physical port or for the bridge itself. So we are reacting on events in
a way in which we shouldn't.

The reason why the fix was too broad is because the question itself,
"does this DSA port offload this netdev", was too broad in the first
place. The solution is to disambiguate the question and separate it into
two different functions, one to be called for each switchdev attribute /
object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).

In the case of VLAN objects on the bridge interface, this solves the
problem because we know that VLAN objects are per bridge port and not
per bridge. And when orig_dev is equal to the net_bridge, we offload it
as a bridge, but not as a bridge port; that's how we are able to skip
reacting on those events. Note that this is compatible with future plans
to have explicit offloading of VLAN objects on the bridge interface as a
bridge port (in DSA, this signifies that we should add that VLAN towards
the CPU port).

Fixes: 99b8202b179f ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: fix error return code of uhdlc_init()

When priv->rx_skbuff or priv->tx_skbuff is NULL, no error return code of
uhdlc_init() is assigned.
To fix this bug, ret is assigned with -ENOMEM in these cases.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>