platform/kernel/linux-starfive.git
2 years agoMerge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 9 Nov 2022 14:03:49 +0000 (14:03 +0000)]
Merge tag 'rxrpc-next-20221108' of git://git./linux/kernel/git/dhowells/linux-fs

rxrpc changes

David Howells says:

====================
rxrpc: Increasing SACK size and moving away from softirq, part 1

AF_RXRPC has some issues that need addressing:

 (1) The SACK table has a maximum capacity of 255, but for modern networks
     that isn't sufficient.  This is hard to increase in the upstream code
     because of the way the application thread is coupled to the softirq
     and retransmission side through a ring buffer.  Adjustments to the rx
     protocol allows a capacity of up to 8192, and having a ring
     sufficiently large to accommodate that would use an excessive amount
     of memory as this is per-call.

 (2) Processing ACKs in softirq mode causes the ACKs get conflated, with
     only the most recent being considered.  Whilst this has the upside
     that the retransmission algorithm only needs to deal with the most
     recent ACK, it causes DATA transmission for a call to be very bursty
     because DATA packets cannot be transmitted in softirq mode.  Rather
     transmission must be delegated to either the application thread or a
     workqueue, so there tend to be sudden bursts of traffic for any
     particular call due to scheduling delays.

 (3) All crypto in a single call is done in series; however, each DATA
     packet is individually encrypted so encryption and decryption of large
     calls could be parallelised if spare CPU resources are available.

This is the first of a number of sets of patches that try and address them.
The overall aims of these changes include:

 (1) To get rid of the TxRx ring and instead pass the packets round in
     queues (eg. sk_buff_head).  On the Tx side, each ACK packet comes with
     a SACK table that can be parsed as-is, so there's no particular need
     to maintain our own; we just have to refer to the ACK.

     On the Rx side, we do need to maintain a SACK table with one bit per
     entry - but only if packets go missing - and we don't want to have to
     perform a complex transformation to get the information into an ACK
     packet.

 (2) To try and move almost all processing of received packets out of the
     softirq handler and into a high-priority kernel I/O thread.  Only the
     transferral of packets would be left there.  I would still use the
     encap_rcv hook to receive packets as there's a noticeable performance
     drop from letting the UDP socket put the packets into its own queue
     and then getting them out of there.

 (3) To make the I/O thread also do all the transmission.  The app thread
     would be responsible for packaging the data into packets and then
     buffering them for the I/O thread to transmit.  This would make it
     easier for the app thread to run ahead of the I/O thread, and would
     mean the I/O thread is less likely to have to wait around for a new
     packet to come available for transmission.

 (4) To logically partition the socket/UAPI/KAPI side of things from the
     I/O side of things.  The local endpoint, connection, peer and call
     objects would belong to the I/O side.  The socket side would not then
     touch the private internals of calls and suchlike and would not change
     their states.  It would only look at the send queue, receive queue and
     a way to pass a message to cause an abort.

 (5) To remove as much locking, synchronisation, barriering and atomic ops
     as possible from the I/O side.  Exclusion would be achieved by
     limiting modification of state to the I/O thread only.  Locks would
     still need to be used in communication with the UDP socket and the
     AF_RXRPC socket API.

 (6) To provide crypto offload kernel threads that, when there's slack in
     the system, can see packets that need crypting and provide
     parallelisation in dealing with them.

 (7) To remove the use of system timers.  Since each timer would then send
     a poke to the I/O thread, which would then deal with it when it had
     the opportunity, there seems no point in using system timers if,
     instead, a list of timeouts can be sensibly consulted.  An I/O thread
     only then needs to schedule with a timeout when it is idle.

 (8) To use zero-copy sendmsg to send packets.  This would make use of the
     I/O thread being the sole transmitter on the socket to manage the
     dead-reckoning sequencing of the completion notifications.  There is a
     problem with zero-copy, though: the UDP socket doesn't handle running
     out of option memory very gracefully.

With regard to this first patchset, the changes made include:

 (1) Some fixes, including a fallback for proc_create_net_single_write(),
     setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc
     congestion management, which shouldn't be saving the cwnd value
     between calls.

 (2) Improvements in rxrpc tracepoints, including splitting the timer
     tracepoint into a set-timer and a timer-expired trace.

 (3) Addition of a new proc file to display some stats.

 (4) Some code cleanups, including removing some unused bits and
     unnecessary header inclusions.

 (5) A change to the recently added UDP encap_err_rcv hook so that it has
     the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc
     point its UDP socket's hook directly at those.

 (6) Definition of a new struct, rxrpc_txbuf, that is used to hold
     transmissible packets of DATA and ACK type in a single 2KiB block
     rather than using an sk_buff.  This allows the buffer to be on a
     number of queues simultaneously more easily, and also guarantees that
     the entire block is in a single unit for zerocopy purposes and that
     the data payload is aligned for in-place crypto purposes.

 (7) ACK txbufs are allocated at proposal and queued for later transmission
     rather than being stored in a single place in the rxrpc_call struct,
     which means only a single ACK can be pending transmission at a time.
     The queue is then drained at various points.  This allows the ACK
     generation code to be simplified.

 (8) The Rx ring buffer is removed.  When a jumbo packet is received (which
     comprises a number of ordinary DATA packets glued together), it used
     to be pointed to by the ring multiple times, with an annotation in a
     side ring indicating which subpacket was in that slot - but this is no
     longer possible.  Instead, the packet is cloned once for each
     subpacket, barring the last, and the range of data is set in the skb
     private area.  This makes it easier for the subpackets in a jumbo
     packet to be decrypted in parallel.

 (9) The Tx ring buffer is removed.  The side annotation ring that held the
     SACK information is also removed.  Instead, in the event of packet
     loss, the SACK data attached an ACK packet is parsed.

(10) Allocate an skcipher request when needed in the rxkad security class
     rather than caching one in the rxrpc_call struct.  This deals with a
     race between externally-driven call disconnection getting rid of the
     skcipher request and sendmsg/recvmsg trying to use it because they
     haven't seen the completion yet.  This is also needed to support
     parallelisation as the skcipher request cannot be used by two or more
     threads simultaneously.

(11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going
     through kernel_sendmsg() so that we can provide our own iterator
     (zerocopy explicitly doesn't work with a KVEC iterator).  This also
     lets us avoid the overhead of the security hook.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/core: Allow live renaming when an interface is up
Andy Ren [Mon, 7 Nov 2022 17:42:42 +0000 (09:42 -0800)]
net/core: Allow live renaming when an interface is up

Allow a network interface to be renamed when the interface
is up.

As described in the netconsole documentation [1], when netconsole is
used as a built-in, it will bring up the specified interface as soon as
possible. As a result, user space will not be able to rename the
interface since the kernel disallows renaming of interfaces that are
administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set
by the kernel.

The original solution [2] to this problem was to add a new parameter to
the netconsole configuration parameters that allows renaming of
the interface used by netconsole while it is administratively up.
However, during the discussion that followed, it became apparent that we
have no reason to keep the current restriction and instead we should
allow user space to rename interfaces regardless of their administrative
state:

1. The restriction was put in place over 20 years ago when renaming was
only possible via IOCTL and before rtnetlink started notifying user
space about such changes like it does today.

2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago in version
5.2 and no regressions were reported.

3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about
the administrative state of interface.

Therefore, allow user space to rename running interfaces by removing the
restriction and the associated 'IFF_LIVE_RENAME_OK' flag. Help in
possible triage by emitting a message to the kernel log that an
interface was renamed while UP.

[1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst
[2] https://lore.kernel.org/netdev/20221102002420.2613004-1-andy.ren@getcruise.com/

Signed-off-by: Andy Ren <andy.ren@getcruise.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'dsa-microchip-checking'
David S. Miller [Wed, 9 Nov 2022 13:06:01 +0000 (13:06 +0000)]
Merge branch 'dsa-microchip-checking'

Rakesh Sankaranarayanan says:

====================
net: dsa: microchip: ksz_pwrite status check for lan937x and irq and error checking updates for ksz series

This patch series include following changes,
- Add KSZ9563 inside ksz_switch_chips. As per current structure,
KSZ9893 is reused inside ksz_switch_chips structure, but since
there is a mismatch in number of irq's, new member added for KSZ9563
and sku detected based on Global Chip ID 4 Register. Compatible
string from device tree mapped to KSZ9563 for spi and i2c mode
probes.
- Assign device interrupt during i2c probe operation.
- Add error checking for ksz_pwrite inside lan937x_change_mtu. After v6.0,
ksz_pwrite updated to have return type int instead of void, and
lan937x_change_mtu still uses ksz_pwrite without status verification.
- Add port_nirq as 3 for KSZ8563 switch family.
- Use dev_err_probe() instead of dev_err() to have more standardized error
formatting and logging.

v1 -> v2:
- Removed regmap validation patch from the series, planning to take
  up in future after checking for any better approach and studying
  the actual need for this change.
- Resolved error reported in ksz8863_smi.c file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: microchip: add dev_err_probe in probe functions
Rakesh Sankaranarayanan [Mon, 7 Nov 2022 09:29:22 +0000 (14:59 +0530)]
net: dsa: microchip: add dev_err_probe in probe functions

Probe functions uses normal dev_err() to check error conditions
and print messages. Replace dev_err() with dev_err_probe() to
have more standardized format and error logging.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: microchip: ksz8563: Add number of port irq
Rakesh Sankaranarayanan [Mon, 7 Nov 2022 09:29:21 +0000 (14:59 +0530)]
net: dsa: microchip: ksz8563: Add number of port irq

KSZ8563 have three port interrupts: PTP, PHY and ACL. Add
port_nirq as 3 for KSZ8563 inside ksz_chip_data.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: microchip: add error checking for ksz_pwrite
Rakesh Sankaranarayanan [Mon, 7 Nov 2022 09:29:20 +0000 (14:59 +0530)]
net: dsa: microchip: add error checking for ksz_pwrite

Add status validation for port register write inside
lan937x_change_mtu. ksz_pwrite and ksz_pread api's are
updated with return type int (Reference patch mentioned
below). Update lan937x_change_mtu with status validation
for ksz_pwrite16().

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220826105634.3855578-6-o.rempel@pengutronix.de/
Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: microchip: add irq in i2c probe
Rakesh Sankaranarayanan [Mon, 7 Nov 2022 09:29:19 +0000 (14:59 +0530)]
net: dsa: microchip: add irq in i2c probe

add device irq in i2c probe function.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: microchip: add ksz9563 in ksz_switch_ops and select based on compatible...
Rakesh Sankaranarayanan [Mon, 7 Nov 2022 09:29:18 +0000 (14:59 +0530)]
net: dsa: microchip: add ksz9563 in ksz_switch_ops and select based on compatible string

Add KSZ9563 inside ksz_switch_chips structure with
port_nirq as 3. KSZ9563 use KSZ9893 switch parameters
but port_nirq count is 3 for KSZ9563 whereas 2 for
KSZ9893. Add KSZ9563 inside ksz_switch_chips as a separate
member and from device tree map compatible string into
KSZ9563 inside ksz_spi.c and ksz9477_i2c.c.
Global Chip ID 1 and 2 registers read value 9893, select
sku based on  Global Chip ID 4 Register which read 0x1c
for KSZ9563.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: renesas: rswitch: Fix endless loop in error paths
Yoshihiro Shimoda [Mon, 7 Nov 2022 08:10:21 +0000 (17:10 +0900)]
net: ethernet: renesas: rswitch: Fix endless loop in error paths

Coverity reported that the error path in rswitch_gwca_queue_alloc_skb()
has an issue to cause endless loop. So, fix the issue by changing
variables' types from u32 to int. After changed the types,
rswitch_tx_free() should use rswitch_get_num_cur_queues() to
calculate number of current queues.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527147 ("Control flow issues")
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20221107081021.2955122-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agolib: Fix some kernel-doc comments
Yang Li [Mon, 7 Nov 2022 06:26:23 +0000 (14:26 +0800)]
lib: Fix some kernel-doc comments

Make the description of @policy to @p in nla_policy_len()
to clear the below warnings:

lib/nlattr.c:660: warning: Function parameter or member 'p' not described in 'nla_policy_len'
lib/nlattr.c:660: warning: Excess function parameter 'policy' description in 'nla_policy_len'

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2736
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221107062623.6709-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agorxrpc: Allocate an skcipher each time needed rather than reusing
David Howells [Fri, 4 Nov 2022 23:13:40 +0000 (23:13 +0000)]
rxrpc: Allocate an skcipher each time needed rather than reusing

In the rxkad security class, allocate the skcipher used to do packet
encryption and decription rather than allocating one up front and reusing
it for each packet.  Reusing the skcipher precludes doing crypto in
parallel.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Fix congestion management
David Howells [Mon, 3 Oct 2022 17:49:11 +0000 (18:49 +0100)]
rxrpc: Fix congestion management

rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

 (1) Don't save cwnd between calls, but rather reset back down to the
     initial cwnd and re-enter slow-start if data transmission is idle for
     more than an RTT.

 (2) Preserve ssthresh instead, as that is a handy estimate of pipe
     capacity.  Knowing roughly when to stop slow start and enter
     congestion avoidance can reduce the tendency to overshoot and drop
     larger amounts of packets when probing.

In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.

Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Remove the rxtx ring
David Howells [Wed, 15 Jun 2022 13:49:26 +0000 (14:49 +0100)]
rxrpc: Remove the rxtx ring

The Rx/Tx ring is no longer used, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Save last ACK's SACK table rather than marking txbufs
David Howells [Sat, 7 May 2022 09:06:13 +0000 (10:06 +0100)]
rxrpc: Save last ACK's SACK table rather than marking txbufs

Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets.  Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen.  If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Remove call->lock
David Howells [Fri, 6 May 2022 15:13:13 +0000 (16:13 +0100)]
rxrpc: Remove call->lock

call->lock is no longer necessary, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Don't use a ring buffer for call Tx queue
David Howells [Thu, 31 Mar 2022 22:55:08 +0000 (23:55 +0100)]
rxrpc: Don't use a ring buffer for call Tx queue

Change the way the Tx queueing works to make the following ends easier to
achieve:

 (1) The filling of packets, the encryption of packets and the transmission
     of packets can be handled in parallel by separate threads, rather than
     rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
     packet before moving onto the next one.

 (2) Get rid of the fixed-size ring which sets a hard limit on the number
     of packets that can be retained in the ring.  This allows the number
     of packets to increase without having to allocate a very large ring or
     having variable-sized rings.

     [Note: the downside of this is that it's then less efficient to locate
     a packet for retransmission as we then have to step through a list and
     examine each buffer in the list.]

 (3) Allow the filler/encrypter to run ahead of the transmission window.

 (4) Make it easier to do zero copy UDP from the packet buffers.

 (5) Make it easier to do zero copy from userspace to the packet buffers -
     and thence to UDP (only if for unauthenticated connections).

To that end, the following changes are made:

 (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
     to be transmitted in.  This allows them to be placed on multiple
     queues simultaneously.  An sk_buff isn't really necessary as it's
     never passed on to lower-level networking code.

 (2) Keep the transmissable packets in a linked list on the call struct
     rather than in a ring.  As a consequence, the annotation buffer isn't
     used either; rather a flag is set on the packet to indicate ackedness.

 (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
     transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
     that all packets up to and including the last got hard acked.

 (4) Wire headers are now stored in the txbuf rather than being concocted
     on the stack and they're stored immediately before the data, thereby
     allowing zerocopy of a single span.

 (5) Don't bother with instant-resend on transmission failure; rather,
     leave it for a timer or an ACK packet to trigger.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Get rid of the Rx ring
David Howells [Sat, 27 Aug 2022 13:27:56 +0000 (14:27 +0100)]
rxrpc: Get rid of the Rx ring

Get rid of the Rx ring and replace it with a pair of queues instead.  One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.

The annotation ring is removed and replaced with a SACK table.  The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked.  The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.

Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Clone received jumbo subpackets and queue separately
David Howells [Fri, 7 Oct 2022 16:44:39 +0000 (17:44 +0100)]
rxrpc: Clone received jumbo subpackets and queue separately

Split up received jumbo packets into separate skbuffs by cloning the
original skbuff for each subpacket and setting the offset and length of the
data in that subpacket in the skbuff's private data.  The subpackets are
then placed on the recvmsg queue separately.  The security class then gets
to revise the offset and length to remove its metadata.

If we fail to clone a packet, we just drop it and let the peer resend it.
The original packet gets used for the final subpacket.

This should make it easier to handle parallel decryption of the subpackets.
It also simplifies the handling of lost or misordered packets in the
queuing/buffering loop as the possibility of overlapping jumbo packets no
longer needs to be considered.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Split the rxrpc_recvmsg tracepoint
David Howells [Fri, 7 Oct 2022 16:22:40 +0000 (17:22 +0100)]
rxrpc: Split the rxrpc_recvmsg tracepoint

Split the rxrpc_recvmsg tracepoint so that the tracepoints that are about
data packet processing (and which have extra pieces of information) are
separate from the tracepoint that shows the general flow of recvmsg().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Clean up ACK handling
David Howells [Thu, 30 Jan 2020 21:48:14 +0000 (21:48 +0000)]
rxrpc: Clean up ACK handling

Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs.  All other
ACKs, bar terminal IDLE ACK are sent immediately.  The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval.  Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Allocate ACK records at proposal and queue for transmission
David Howells [Thu, 30 Jan 2020 21:48:13 +0000 (21:48 +0000)]
rxrpc: Allocate ACK records at proposal and queue for transmission

Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Define rxrpc_txbuf struct to carry data to be transmitted
David Howells [Tue, 5 Apr 2022 20:16:32 +0000 (21:16 +0100)]
rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted

Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
socket buffer so that it can be placed onto multiple queues at once.  This
also allows the data buffer to be in the same allocation as the internal
data.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Remove call->tx_phase
David Howells [Fri, 7 Oct 2022 13:46:08 +0000 (14:46 +0100)]
rxrpc: Remove call->tx_phase

Remove call->tx_phase as it's only ever set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Remove the flags from the rxrpc_skb tracepoint
David Howells [Fri, 7 Oct 2022 12:52:06 +0000 (13:52 +0100)]
rxrpc: Remove the flags from the rxrpc_skb tracepoint

Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.

Note that this also remove the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Remove unnecessary header inclusions
David Howells [Thu, 23 Jan 2020 21:36:21 +0000 (21:36 +0000)]
rxrpc: Remove unnecessary header inclusions

Remove a bunch of unnecessary header inclusions.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Call udp_sendmsg() directly
David Howells [Tue, 22 Mar 2022 11:07:20 +0000 (11:07 +0000)]
rxrpc: Call udp_sendmsg() directly

Call udp_sendmsg() and udpv6_sendmsg() directly rather than calling
kernel_sendmsg() as the latter assumes we want a kvec-class iterator.
However, zerocopy explicitly doesn't work with such an iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Use the core ICMP/ICMP6 parsers
David Howells [Wed, 12 Oct 2022 08:51:12 +0000 (09:51 +0100)]
rxrpc: Use the core ICMP/ICMP6 parsers

Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
it in rxrpc.

This pushes an error report onto the UDP socket's error queue and calls
->sk_error_report() from which point rxrpc can pick it up.

It would be preferable to steal the packet directly from ip*_icmp_error()
rather than letting it get queued, but this is probably good enough.

Also note that __udp4_lib_err() calls sk_error_report() twice in some
cases.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agonet: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()
David Howells [Wed, 12 Oct 2022 07:49:29 +0000 (08:49 +0100)]
net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()

Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org

2 years agorxrpc: Fix ack.bufferSize to be 0 when generating an ack
David Howells [Wed, 7 Sep 2022 18:17:29 +0000 (19:17 +0100)]
rxrpc: Fix ack.bufferSize to be 0 when generating an ack

ack.bufferSize should be set to 0 when generating an ack.

Fixes: 8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Record stats for why the REQUEST-ACK flag is being set
David Howells [Thu, 18 Aug 2022 10:52:36 +0000 (11:52 +0100)]
rxrpc: Record stats for why the REQUEST-ACK flag is being set

Record stats for why the REQUEST-ACK flag is being set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Record statistics about ACK types
David Howells [Wed, 11 May 2022 13:01:25 +0000 (14:01 +0100)]
rxrpc: Record statistics about ACK types

Record statistics about the different types of ACKs that have been
transmitted and received and the number of ACKs that have been filled out
and transmitted or that have been skipped.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Add stats procfile and DATA packet stats
David Howells [Wed, 11 May 2022 13:01:25 +0000 (14:01 +0100)]
rxrpc: Add stats procfile and DATA packet stats

Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing.  Writing a blank line to the stats file will
clear the increment-only counters.  Allocated resource counters don't get
cleared.

Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-requests markings and the number of jumbo packets received.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Track highest acked serial
David Howells [Thu, 28 Apr 2022 07:30:47 +0000 (08:30 +0100)]
rxrpc: Track highest acked serial

Keep track of the highest DATA serial number that has been acked by the
peer for future purposes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Split call timer-expiration from call timer-set tracepoint
David Howells [Thu, 21 Apr 2022 23:20:49 +0000 (00:20 +0100)]
rxrpc: Split call timer-expiration from call timer-set tracepoint

Split the tracepoint for call timer-set to separate out the call
timer-expiration event

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agorxrpc: Trace setting of the request-ack flag
David Howells [Tue, 5 Apr 2022 20:48:48 +0000 (21:48 +0100)]
rxrpc: Trace setting of the request-ack flag

Add a tracepoint to log why the request-ack flag is set on an outgoing DATA
packet, allowing debugging as to why.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2 years agonet, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()
David Howells [Mon, 3 Oct 2022 06:34:21 +0000 (07:34 +0100)]
net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()

Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write().

Also provide a fallback for proc_create_net_data_write().

Fixes: 564def71765c ("proc: Add a way to make network proc files writable")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org

2 years agoMerge branch 'bnxt_en-updates'
Paolo Abeni [Tue, 8 Nov 2022 11:39:04 +0000 (12:39 +0100)]
Merge branch 'bnxt_en-updates'

Michael Chan says:

====================
bnxt_en: Updates

This small patchset adds an improvement to the configuration of ethtool
RSS tuple hash and a PTP improvement when running in a multi-host
environment.
====================

Link: https://lore.kernel.org/r/1667780192-3700-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: Add a non-real time mode to access NIC clock
Pavan Chebbi [Mon, 7 Nov 2022 00:16:32 +0000 (19:16 -0500)]
bnxt_en: Add a non-real time mode to access NIC clock

When using a PHC that is shared between multiple hosts,
in order to achieve consistent timestamps across all hosts,
we need to isolate the PHC from any host making frequency
adjustments.

This patch adds a non-real time mode for this purpose.
The implementation is based on a free running NIC hardware timer
which is used as the timestamper time-base. Each host implements
individual adjustments to a local timecounter based on the NIC free
running timer.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: update RSS config using difference algorithm
Edwin Peer [Mon, 7 Nov 2022 00:16:31 +0000 (19:16 -0500)]
bnxt_en: update RSS config using difference algorithm

Hardware is unable to realize all legal firmware interface state values
for hash_type.  For example, if 4-tuple TCP_IPV4 hash is enabled,
4-tuple UDP_IPV4 hash must also be enabled.  By providing the bits the
user intended to change instead of the possible illegal intermediate
states, the firmware is able to make better compromises when deciding
which bits to ignore.

With this new mechansim, we can now report the actual configured hash
back to the user.  Add bnxt_hwrm_update_rss_hash_cfg() to report the
actual hash after user configuration.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agobnxt_en: refactor VNIC RSS update functions
Edwin Peer [Mon, 7 Nov 2022 00:16:30 +0000 (19:16 -0500)]
bnxt_en: refactor VNIC RSS update functions

Extract common code into a new function. This will avoid duplication
in the next patch, which changes the update algorithm for both the P5
and legacy code paths.

No functional changes.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'net-add-helper-support-in-tc-act_ct-for-ovs-offloading'
Paolo Abeni [Tue, 8 Nov 2022 11:21:51 +0000 (12:21 +0100)]
Merge branch 'net-add-helper-support-in-tc-act_ct-for-ovs-offloading'

Xin Long says:

====================
net: add helper support in tc act_ct for ovs offloading

Ilya reported an issue that FTP traffic would be broken when the OVS flow
with ct(commit,alg=ftp) installed in the OVS kernel module, and it was
caused by that TC didn't support the ftp helper offloaded from OVS.

This patchset is to add the helper support in act_ct for OVS offloading
in kernel net/sched.

The 1st and 2nd patches move some common code into nf_conntrack_helper from
openvswitch so that they could be used by net/sched in the 4th patch (Note
there are still some other common code used in both OVS and TC, and I will
extract it in other patches). The 3rd patch extracts another function in
net/sched to make the 4th patch easier to write. The 4th patch adds this
feature in net/sched.

The user space part will be added in another patch, and with it these OVS
flows (FTP over SNAT) can be used to test this feature:

  table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk \
    actions=ct(table=1, nat), normal
  table=0, in_port=veth2,tcp,ct_state=-trk actions=ct(table=1, nat)
  table=0, in_port=veth1,tcp,ct_state=-trk actions=ct(table=0, nat)
  table=0, in_port=veth1,tcp,ct_state=+trk+rel actions=ct(commit, nat),normal
  table=0, in_port=veth1,tcp,ct_state=+trk+est actions=veth2"

  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new \
    actions=ct(commit, nat(src=7.7.16.1), alg=ftp),normal"
  table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+est actions=veth2"
  table=1, in_port=veth2,tcp,ct_state=+trk+est actions=veth1"

====================

Link: https://lore.kernel.org/r/cover.1667766782.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: sched: add helper support in act_ct
Xin Long [Sun, 6 Nov 2022 20:34:17 +0000 (15:34 -0500)]
net: sched: add helper support in act_ct

This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx)
offloading, which is corresponding to Commit cae3a2627520 ("openvswitch:
Allow attaching helpers to ct action") in OVS kernel part.

The difference is when adding TC actions family and proto cannot be got
from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME],
we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the
proto in tb[TCA_CT_HELPER_PROTO] to kernel.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: sched: call tcf_ct_params_free to free params in tcf_ct_init
Xin Long [Sun, 6 Nov 2022 20:34:16 +0000 (15:34 -0500)]
net: sched: call tcf_ct_params_free to free params in tcf_ct_init

This patch is to make the err path simple by calling tcf_ct_params_free(),
so that it won't cause problems when more members are added into param and
need freeing on the err path.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: move add ct helper function to nf_conntrack_helper for ovs and tc
Xin Long [Sun, 6 Nov 2022 20:34:15 +0000 (15:34 -0500)]
net: move add ct helper function to nf_conntrack_helper for ovs and tc

Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and
rename as nf_ct_add_helper, so that it can be used in TC act_ct in
the next patch.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: move the ct helper function to nf_conntrack_helper for ovs and tc
Xin Long [Sun, 6 Nov 2022 20:34:14 +0000 (15:34 -0500)]
net: move the ct helper function to nf_conntrack_helper for ovs and tc

Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoethtool: Fail number of channels change when it conflicts with rxnfc
Gal Pressman [Sun, 6 Nov 2022 12:31:27 +0000 (14:31 +0200)]
ethtool: Fail number of channels change when it conflicts with rxnfc

Similar to what we do with the hash indirection table [1], when network
flow classification rules are forwarding traffic to channels greater
than the requested number of channels, fail the operation.
Without this, traffic could be directed to channels which no longer
exist (dropped) after changing number of channels.

[1] commit d4ab4286276f ("ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}")

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20221106123127.522985-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoethtool: linkstate: add a statistic for PHY down events
Jakub Kicinski [Fri, 4 Nov 2022 19:01:25 +0000 (12:01 -0700)]
ethtool: linkstate: add a statistic for PHY down events

The previous attempt to augment carrier_down (see Link)
was not met with much enthusiasm so let's do the simple
thing of exposing what some devices already maintain.
Add a common ethtool statistic for link going down.
Currently users have to maintain per-driver mapping
to extract the right stat from the vendor-specific ethtool -S
stats. carrier_down does not fit the bill because it counts
a lot of software related false positives.

Add the statistic to the extended link state API to steer
vendors towards implementing all of it.

Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly)
enic also have a counter for this but I leave the implementation
to their maintainers.

Link: https://lore.kernel.org/r/20220520004500.2250674-1-kuba@kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20221104190125.684910-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'net-txgbe-fix-two-bugs-in-txgbe_calc_eeprom_checksum'
Jakub Kicinski [Tue, 8 Nov 2022 04:00:13 +0000 (20:00 -0800)]
Merge branch 'net-txgbe-fix-two-bugs-in-txgbe_calc_eeprom_checksum'

YueHaibing says:

====================
net: txgbe: Fix two bugs in txgbe_calc_eeprom_checksum

Fix memleak and unsigned comparison bugs in txgbe_calc_eeprom_checksum
====================

Link: https://lore.kernel.org/r/20221105080722.20292-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()
YueHaibing [Sat, 5 Nov 2022 08:07:22 +0000 (16:07 +0800)]
net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()

The error checks on checksum for a negative error return always fails because
it is unsigned and can never be negative.

Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: txgbe: Fix memleak in txgbe_calc_eeprom_checksum()
YueHaibing [Sat, 5 Nov 2022 08:07:21 +0000 (16:07 +0800)]
net: txgbe: Fix memleak in txgbe_calc_eeprom_checksum()

eeprom_ptrs should be freed before returned.

Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Tue, 8 Nov 2022 03:55:52 +0000 (19:55 -0800)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-04 (ixgbe, ixgbevf, igb)

This series contains updates to ixgbe, ixgbevf, and igb drivers.

Daniel Willenson adjusts descriptor buffer limits to be based on what
hardware supports instead of using a generic, least common value for
ixgbe.

Ani removes local variable for ixgbe, instead returning conditional result
directly.

Yang Li removes unneeded semicolon for ixgbe.

Jan adds error messaging when VLAN errors are encountered for ixgbevf.

Kees Cook prevents a potential use after free condition and explicitly
rounds up q_vector allocations so that allocations can be correctly
compared to ksize() for igb.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: Proactively round up to kmalloc bucket size
  igb: Do not free q_vector unless new one was allocated
  ixgbevf: Add error messages on vlan error
  ixgbe: Remove unneeded semicolon
  ixgbe: Remove local variable
  ixgbe: change MAX_RXD/MAX_TXD based on adapter type
====================

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221104205414.2354973-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetlink: Fix potential skb memleak in netlink_ack
Tao Chen [Sat, 5 Nov 2022 09:05:04 +0000 (17:05 +0800)]
netlink: Fix potential skb memleak in netlink_ack

Fix coverity issue 'Resource leak'.

We should clean the skb resource if nlmsg_put/append failed.

Fixes: 738136a0e375 ("netlink: split up copies in the ack construction")
Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Link: https://lore.kernel.org/r/bff442d62c87de6299817fe1897cc5a5694ba9cc.1667638204.git.chentao.kernel@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: macb: implement live mac addr change
Roman Gushchin [Fri, 4 Nov 2022 20:48:37 +0000 (13:48 -0700)]
net: macb: implement live mac addr change

Implement live mac addr change for the macb ethernet driver.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221104204837.614459-1-roman.gushchin@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: remove explicit phylink_generic_validate() references
Russell King (Oracle) [Fri, 4 Nov 2022 17:13:01 +0000 (17:13 +0000)]
net: remove explicit phylink_generic_validate() references

Virtually all conventional network drivers are now converted to use
phylink_generic_validate() - only DSA drivers and fman_memac remain,
so lets remove the necessity for network drivers to explicitly set
this member, and default to phylink_generic_validate() when unset.
This is possible as .validate must currently be set.

Any remaining instances that have not been addressed by this patch can
be fixed up later.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/E1or0FZ-001tRa-DI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: lan966x: move unnecessary linux/sfp.h include
Russell King (Oracle) [Fri, 4 Nov 2022 16:53:59 +0000 (16:53 +0000)]
net: lan966x: move unnecessary linux/sfp.h include

lan966x_phylink.c doesn't make use of anything from linux/sfp.h, so
remove this unnecessary include.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1oqzx9-001r9g-HV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'genetlink-per-op-type-policies'
David S. Miller [Mon, 7 Nov 2022 12:30:17 +0000 (12:30 +0000)]
Merge branch 'genetlink-per-op-type-policies'

Jakub Kicinski says:

====================
genetlink: support per op type policies

While writing new genetlink families I was increasingly annoyed by the fact
that we don't support different policies for do and dump callbacks.
This makes it hard to do proper input validation for dumps which usually
have a lot more narrow range of accepted attributes.

There is also a minor inconvenience of not supporting different per_doit
and post_doit callbacks per op.

This series addresses those problems by introducing another op format.

v3:
 - minor fixes to patch 12 after I took it for a spin with a real family
 - adjust commit msg in patch 8
v2: https://lore.kernel.org/all/20221102213338.194672-1-kuba@kernel.org/
 - wait for net changes to propagate
 - restore the missing comment in patch 1
 - drop extra space in patch 3
 - improve commit message in patch 4
v1: https://lore.kernel.org/all/20221018230728.1039524-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: convert control family to split ops
Jakub Kicinski [Fri, 4 Nov 2022 19:13:43 +0000 (12:13 -0700)]
genetlink: convert control family to split ops

Prove that the split ops work.
Sadly we need to keep bug-wards compatibility and specify
the same policy for dump as do, even tho we don't parse
inputs for the dump.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: allow families to use split ops directly
Jakub Kicinski [Fri, 4 Nov 2022 19:13:42 +0000 (12:13 -0700)]
genetlink: allow families to use split ops directly

Let families to hook in the new split ops.

They are more flexible and should not be much larger than
full ops. Each split op is 40B while full op is 48B.
Devlink for example has 54 dos and 19 dumps, 2 of the dumps
do not have a do -> 56 full commands = 2688B.
Split ops would have taken 2920B, so 9% more space while
allowing individual per/post doit and per-type policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: inline old iteration helpers
Jakub Kicinski [Fri, 4 Nov 2022 19:13:41 +0000 (12:13 -0700)]
genetlink: inline old iteration helpers

All dumpers use the iterators now, inline the cmd by index
stuff into iterator code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: use iterator in the op to policy map dumping
Jakub Kicinski [Fri, 4 Nov 2022 19:13:40 +0000 (12:13 -0700)]
genetlink: use iterator in the op to policy map dumping

We can't put the full iterator in the struct ctrl_dump_policy_ctx
because dump context is statically sized by netlink core.
Allocate it dynamically.

Rename policy to dump_map to make the logic a little easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: add iterator for walking family ops
Jakub Kicinski [Fri, 4 Nov 2022 19:13:39 +0000 (12:13 -0700)]
genetlink: add iterator for walking family ops

Subsequent changes will expose split op structures to users,
so walking the family ops with just an index will get harder.
Add a structured iterator, convert the simple cases.
Policy dumping needs more careful conversion.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: inline genl_get_cmd()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:38 +0000 (12:13 -0700)]
genetlink: inline genl_get_cmd()

All callers go via genl_get_cmd_split() now, so rename it
to genl_get_cmd() remove the original.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: support split policies in ctrl_dumppolicy_put_op()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:37 +0000 (12:13 -0700)]
genetlink: support split policies in ctrl_dumppolicy_put_op()

Pass do and dump versions of the op to ctrl_dumppolicy_put_op()
so that it can provide a different policy index for the two.

Since we now look at policies, and those are set appropriately
there's no need to look at the GENL_DONT_VALIDATE_DUMP flag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:36 +0000 (12:13 -0700)]
genetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()

Separate adding doit and dumpit policies for CTRL_CMD_GETPOLICY.
This has no effect until we actually allow do and dump to come
from different sources as netlink_policy_dump_add_policy()
does deduplication.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: check for callback type at op load time
Jakub Kicinski [Fri, 4 Nov 2022 19:13:35 +0000 (12:13 -0700)]
genetlink: check for callback type at op load time

Now that genl_get_cmd_split() is informed what type of callback
user is trying to access (do or dump) we can check that this
callback is indeed available and return an error early.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: load policy based on validation flags
Jakub Kicinski [Fri, 4 Nov 2022 19:13:34 +0000 (12:13 -0700)]
genetlink: load policy based on validation flags

Set the policy and maxattr pointers based on validation flags.
genl_family_rcv_msg_attrs_parse() will do nothing and return NULL
if maxattrs is zero, so no behavior change is expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: introduce split op representation
Jakub Kicinski [Fri, 4 Nov 2022 19:13:33 +0000 (12:13 -0700)]
genetlink: introduce split op representation

We currently have two forms of operations - small ops and "full" ops
(or just ops). The former does not have pointers for some of the less
commonly used features (namely dump start/done and policy).

The "full" ops, however, still don't contain all the necessary
information. In particular the policy is per command ID, while
do and dump often accept different attributes. It's also not
possible to define different pre_doit and post_doit callbacks
for different commands within the family.

At the same time a lot of commands do not support dumping and
therefore all the dump-related information is wasted space.

Create a new command representation which can hold info about
a do implementation or a dump implementation, but not both at
the same time.

Use this new representation on the command execution path
(genl_family_rcv_msg) as we either run a do or a dump and
don't have to create a "full" op there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: move the private fields in struct genl_family
Jakub Kicinski [Fri, 4 Nov 2022 19:13:32 +0000 (12:13 -0700)]
genetlink: move the private fields in struct genl_family

Move the private fields down to form a "private section".
Use the kdoc "private:" label comment thing to hide them
from the main kdoc comment.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: refactor the cmd <> policy mapping dump
Jakub Kicinski [Fri, 4 Nov 2022 19:13:31 +0000 (12:13 -0700)]
genetlink: refactor the cmd <> policy mapping dump

The code at the top of ctrl_dumppolicy() dumps mappings between
ops and policies. It supports dumping both the entire family and
single op if dump is filtered. But both of those cases are handled
inside a loop, which makes the logic harder to follow and change.
Refactor to split the two cases more clearly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'am65-cpsw-suspend-resume'
David S. Miller [Mon, 7 Nov 2022 12:20:03 +0000 (12:20 +0000)]
Merge branch 'am65-cpsw-suspend-resume'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: Add suspend/resume support

This series enables PM_SLEEP(suspend/resume) support to
the am65-cpsw network driver.

Dual-emac and Switch mode are tested to work with suspend/resume
on AM62-SK platform.

It can be verified on the following branch
https://github.com/rogerq/linux/commits/for-v6.2/am62-cpsw-lpm-1.0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume
Roger Quadros [Fri, 4 Nov 2022 13:23:10 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume

On low power during system suspend the ALE table context is lost.
Save the ALE contect before suspend and restore it after resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume
Roger Quadros [Fri, 4 Nov 2022 13:23:09 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume

During suspend resume the context of PORT_VLAN_REG is lost so
save it during suspend and restore it during resume for
host port and slave ports.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_ale: Add cpsw_ale_restore() helper
Roger Quadros [Fri, 4 Nov 2022 13:23:08 +0000 (15:23 +0200)]
net: ethernet: ti: cpsw_ale: Add cpsw_ale_restore() helper

This can be used by device driver to restore ALE context.
The data produced by cpsw_ale_dump() can be passed to
cpsw_ale_restore().

This is required as on certain platforms the ALE context
is lost on low power suspend/resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Add suspend/resume support
Roger Quadros [Fri, 4 Nov 2022 13:23:07 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: Add suspend/resume support

Add PM handlers for System suspend/resume.

As DMA driver doesn't yet support suspend/resume we free up
the DMA channels at suspend and acquire and initialize them
at resume.

Move the init/free dma calls to ndo_open/close() hooks so
it is symmetric and easier to invoke from suspend/resume handler.

As CPTS looses contect during suspend/resume, invoke the
necessary CPTS suspend/resume helpers.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw/cpts: Add suspend/resume helpers
Roger Quadros [Fri, 4 Nov 2022 13:23:06 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw/cpts: Add suspend/resume helpers

CPTS looses context on suspend (e.g. on AM62).
Provide suspend/resume hooks in CPTS driver. These will be
invoked by CPSW driver if CPTS was instantiated by CPSW.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: fix yt8521 duplicated argument to & or |
Frank [Fri, 4 Nov 2022 08:44:41 +0000 (16:44 +0800)]
net: phy: fix yt8521 duplicated argument to & or |

cocci warnings: (new ones prefixed by >>)
>> drivers/net/phy/motorcomm.c:1122:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1126:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1130:8-34: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1134:8-34: duplicated argument to & or |

 The second YT8521_RC1R_GE_TX_DELAY_xx should be YT8521_RC1R_FE_TX_DELAY_xx.

Fixes: 70479a40954c ("net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phy")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Frank <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogve: Fix error return code in gve_prefill_rx_pages()
Yang Yingliang [Fri, 4 Nov 2022 06:17:36 +0000 (14:17 +0800)]
gve: Fix error return code in gve_prefill_rx_pages()

If alloc_page() fails in gve_prefill_rx_pages(), it should return
an error code in the error path.

Fixes: 82fd151d38d9 ("gve: Reduce alloc and copy costs in the GQ rx path")
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: fec: simplify the code logic of quirks
Shenwei Wang [Fri, 4 Nov 2022 02:47:54 +0000 (21:47 -0500)]
net: fec: simplify the code logic of quirks

Simplify the code logic of handling the quirk of FEC_QUIRK_HAS_RACC.
If a SoC has the RACC quirk, the driver will enable the 16bit shift
by default in the probe function for a better performance.

This patch handles the logic in one place to make the logic simple
and clean. The patch optimizes the fec_enet_xdp_get_tx_queue function
according to Paolo Abeni's comments, and it also exludes the SoCs that
require to do frame swap from XDP support.

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/lcs: Fix return type of lcs_start_xmit()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:30 +0000 (10:01 -0700)]
s390/lcs: Fix return type of lcs_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/lcs.c:2090:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~
  drivers/s390/net/lcs.c:2097:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of lcs_start_xmit() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/netiucv: Fix return type of netiucv_tx()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:29 +0000 (10:01 -0700)]
s390/netiucv: Fix return type of netiucv_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/netiucv.c:1854:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = netiucv_tx,
                                    ^~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of netiucv_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/ctcm: Fix return type of ctc{mp,}m_tx()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:28 +0000 (10:01 -0700)]
s390/ctcm: Fix return type of ctc{mp,}m_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/ctcm_main.c:1064:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcm_tx,
                                    ^~~~~~~
  drivers/s390/net/ctcm_main.c:1072:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcmpc_tx,
                                    ^~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of ctc{mp,}m_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add NAPI support
Haijun Liu [Thu, 3 Nov 2022 09:18:29 +0000 (14:48 +0530)]
net: wwan: t7xx: Add NAPI support

Replace the work queue based RX flow with a NAPI implementation
Remove rx_thread and dpmaif_rxq_work.
Enable GRO on RX path.
Introduce dummy network device. its responsibility is
    - Binds one NAPI object for each DL HW queue and acts as
      the agent of all those network devices.
    - Use NAPI object to poll DL packets.
    - Helps to dispatch each packet to the network interface.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Acked-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Acked-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Use needed_headroom instead of hard_header_len
Ilpo Järvinen [Thu, 3 Nov 2022 09:18:28 +0000 (14:48 +0530)]
net: wwan: t7xx: Use needed_headroom instead of hard_header_len

hard_header_len is used by gro_list_prepare() but on Rx, there
is no header so use needed_headroom instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Add support for configuration of rx-vlan-filter by ethtool
Cai Huoqing [Thu, 3 Nov 2022 08:05:11 +0000 (16:05 +0800)]
net: hinic: Add support for configuration of rx-vlan-filter by ethtool

When ethtool config rx-vlan-filter, the driver will send
control command to firmware, then set to hardware in this patch.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Add control command support for VF PMD driver in DPDK
Cai Huoqing [Thu, 3 Nov 2022 08:05:10 +0000 (16:05 +0800)]
net: hinic: Add control command support for VF PMD driver in DPDK

HINIC has a mailbox for PF-VF communication and the VF driver
could send port control command to PF driver via mailbox.

The control command only can be set to register in PF,
so add support in PF driver for VF PMD driver control
command when VF PMD driver work with linux PF driver.

Then, no need to add handlers to nic_vf_cmd_msg_handler[],
because the host driver just forwards it to the firmware.
Actually the firmware works on a coprocessor MGMT_CPU(inside the NIC)
which will recv and deal with these commands.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Convert the cmd code from decimal to hex to be more readable
Cai Huoqing [Thu, 3 Nov 2022 08:05:09 +0000 (16:05 +0800)]
net: hinic: Convert the cmd code from decimal to hex to be more readable

The print cmd code is in hex, so using hex cmd code intead of
decimal is easy to check the value with print info.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: dsa-port: constrain number of 'reg' in ports
Krzysztof Kozlowski [Wed, 2 Nov 2022 16:15:12 +0000 (12:15 -0400)]
dt-bindings: net: dsa-port: constrain number of 'reg' in ports

'reg' without any constraints allows multiple items which is not the
intention in DSA port schema (as physical port is expected to have only
one address).

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: constrain number of 'reg' in ethernet ports
Krzysztof Kozlowski [Wed, 2 Nov 2022 16:15:11 +0000 (12:15 -0400)]
dt-bindings: net: constrain number of 'reg' in ethernet ports

'reg' without any constraints allows multiple items which is not the
intention for Ethernet controller's port number.

Constrain the 'reg' on AX88178 and LAN95xx USB Ethernet Controllers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: axiemac: add PM callbacks to support suspend/resume
Andy Chiu [Tue, 1 Nov 2022 01:11:39 +0000 (09:11 +0800)]
net: axiemac: add PM callbacks to support suspend/resume

Support basic system-wide suspend and resume functions

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
David Yang [Fri, 28 Oct 2022 02:01:01 +0000 (10:01 +0800)]
net: mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood

Support mode switch properly, which is not available before.

If SoC has two Ethernet controllers, by setting both of them into MII
mode, the first controller enters GMII mode, while the second
controller is effectively disabled. This requires configuring (and
maybe enabling) the second controller in the device tree, even though
it cannot be used.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: support Octeon device CNF95N
Veerasenareddy Burru [Thu, 3 Nov 2022 06:05:57 +0000 (23:05 -0700)]
octeon_ep: support Octeon device CNF95N

Add support for Octeon device CNF95N.
CNF95N is a Octeon Fusion family product with same PCI NIC
characteristics as CN93 which is currently supported by the driver.

update supported device list in Documentation.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Link: https://lore.kernel.org/r/20221103060600.1858-1-vburru@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'sfc-add-basic-flower-matches-to-offload'
Jakub Kicinski [Sat, 5 Nov 2022 02:54:29 +0000 (19:54 -0700)]
Merge branch 'sfc-add-basic-flower-matches-to-offload'

Edward Cree says:

====================
sfc: add basic flower matches to offload

Support offloading TC flower rules with matches on L2-L4 fields.
====================

Link: https://lore.kernel.org/r/cover.1667412458.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 4 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:31 +0000 (15:27 +0000)]
sfc: add Layer 4 matches to ef100 TC offload

Support matching on UDP/TCP source and destination ports and TCP flags,
 with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 3 flag matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:30 +0000 (15:27 +0000)]
sfc: add Layer 3 flag matches to ef100 TC offload

Support matching on ip_frag and ip_firstfrag.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 3 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:29 +0000 (15:27 +0000)]
sfc: add Layer 3 matches to ef100 TC offload

Support matching on IP protocol, Type of Service, Time To Live, source
 and destination addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 2 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:28 +0000 (15:27 +0000)]
sfc: add Layer 2 matches to ef100 TC offload

Support matching on EtherType, VLANs and ethernet source/destination
 addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: check recirc_id match caps before MAE offload
Edward Cree [Thu, 3 Nov 2022 15:27:27 +0000 (15:27 +0000)]
sfc: check recirc_id match caps before MAE offload

Offloaded TC rules always match on recirc_id in the MAE, so we should
 check that the MAE reported support for this match before attempting
 to insert the rule.
These checks allow us to fail early, avoiding the PCIe round-trip to
 firmware for an MC_CMD_MAE_ACTION_RULE_INSERT that will only fail,
 and more importantly providing a more informative error message that
 identifies which match field is unsupported.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: renesas: Fix return type of rswitch_start_xmit()
Nathan Chancellor [Thu, 3 Nov 2022 22:00:32 +0000 (15:00 -0700)]
net: ethernet: renesas: Fix return type of rswitch_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/ethernet/renesas/rswitch.c:1533:20: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit = rswitch_start_xmit,
                          ^~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of rswitch_start_xmit()
to match the prototype's to resolve the warning and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221103220032.2142122-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: remove redundant check in ip_metrics_convert()
Zhengchao Shao [Fri, 4 Nov 2022 02:25:13 +0000 (10:25 +0800)]
net: remove redundant check in ip_metrics_convert()

Now ip_metrics_convert() is only called by ip_fib_metrics_init(). Before
ip_fib_metrics_init() invokes ip_metrics_convert(), it checks whether
input parameter fc_mx is NULL. Therefore, ip_metrics_convert() doesn't
need to check fc_mx.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221104022513.168868-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoigb: Proactively round up to kmalloc bucket size
Kees Cook [Tue, 18 Oct 2022 09:25:25 +0000 (02:25 -0700)]
igb: Proactively round up to kmalloc bucket size

In preparation for removing the "silently change allocation size"
users of ksize(), explicitly round up all q_vector allocations so that
allocations can be correctly compared to ksize().

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>