review.tizen.org Git - platform/kernel/linux-starfive.git/log

Merge branch 'net-ipa-small-transaction-updates'

Alex Elder says:

====================
net: ipa: small transaction updates

Version 2 of this series corrects a misspelling of "outstanding"
pointed out by the netdev test bots.  (For some reason I don't see
that when I run "checkpatch".)  I found and fixed a second instance
of that word being misspelled as well.

This series includes three changes to the transaction code.  The
first adds a new transaction list that represents a distinct state
that has not been maintained.  The second moves a field in the
transaction information structure, and reorders its initialization
a bit.  The third skips a function call when it is known not to be
necessary.

The last two are very small "leftover" patches.
====================

Link: https://lore.kernel.org/r/20220719181020.372697-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: fix an outdated comment

Since commit 8797972afff3d ("net: ipa: remove command info pool"),
we don't allocate "command info" entries for command channel
transactions. Fix a comment that seems to suggest we still do.
(Even before that commit, the comment was out of place.)

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: report when the driver has been removed

When the IPA driver has completed its initialization and setup
stages, it emits a brief message to the log. Add a small message
that reports when it has been removed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: skip some cleanup for unused transactions

In gsi_trans_free(), there's no point in ipa_gsi_trans_release() if
a transaction is unused. No used TREs means no IPA layer resources
to clean up. So only call ipa_gsi_trans_release() if at least one
TRE was used.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: rearrange transaction initialization

The transaction map is really associated with the transaction pool;
move its definition earlier in the gsi_trans_info structure.

Rearrange initialization in gsi_channel_trans_init() so it
sets the tre_avail value first, then initializes the transaction
pool, and finally allocating the transaction map.

Update comments.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: add a transaction committed list

We currently put a transaction on the pending list when it has
been committed. But until the channel's doorbell rings, these
transactions aren't actually "owned" by the hardware yet.

Add a new "committed" state (and list), to represent transactions
that have been committed but not yet sent to hardware. Define
"pending" to mean committed transactions that have been sent
to hardware but have not yet completed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: add an endpoint device attribute group

Create a new attribute group meant to provide a single place that
defines endpoint IDs that might be needed by user space. Not all
defined endpoints are presented, and only those that are defined
will be made visible.

The new attributes use "extended" device attributes to hold endpoint
IDs, which is a little more compact and efficient. Reimplement the
existing modem endpoint ID attribute files using common code.

Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220719191639.373249-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: af_unix: Fix a build error of unix_connect.c.

This patch fixes a build error reported in the link. [0]

  unix_connect.c: In function ‘unix_connect_test’:
  unix_connect.c:115:55: error: expected identifier before ‘(’ token
   #define offsetof(type, member) ((size_t)&((type *)0)->(member))
                                                       ^
  unix_connect.c:128:12: note: in expansion of macro ‘offsetof’
    addrlen = offsetof(struct sockaddr_un, sun_path) + variant->len;
              ^~~~~~~~

We can fix this by removing () around member, but checkpatch will complain
about it, and the root cause of the build failure is that I followed the
warning and fixed this in the v2 -> v3 change of the blamed commit. [1]

  CHECK: Macro argument 'member' may be better as '(member)' to avoid precedence issues
  #33: FILE: tools/testing/selftests/net/af_unix/unix_connect.c:115:
  +#define offsetof(type, member) ((size_t)&((type *)0)->member)

To avoid this warning, let's use offsetof() defined in stddef.h instead.

[0]: https://lore.kernel.org/linux-mm/202207182205.FrkMeDZT-lkp@intel.com/
[1]: https://lore.kernel.org/netdev/20220702154818.66761-1-kuniyu@amazon.com/

Fixes: e95ab1d85289 ("selftests: net: af_unix: Test connect() with different netns.")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220720005750.16600-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: amd8111e: remove repeated dev->features assignement

It's repeated with line 1793-1795, and there isn't any other
handling for it. So remove it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Link: https://lore.kernel.org/r/20220719142424.4528-1-shenjian15@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Simplify nf_ct_get_tuple(), from Jackie Liu.

2) Add format to request_module() call, from Bill Wendling.

3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
   hardware offload objects to be processed, from Vlad Buslov.

4) Missing rcu annotation and accessors in the netfilter tree,
   from Florian Westphal.

5) Merge h323 conntrack helper nat hooks into single object,
   also from Florian.

6) A batch of update to fix sparse warnings treewide,
   from Florian Westphal.

7) Move nft_cmp_fast_mask() where it used, from Florian.

8) Missing const in nf_nat_initialized(), from James Yonan.

9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.

10) Use refcount_inc instead of _inc_not_zero in flowtable,
    from Florian Westphal.

11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: xt_TPROXY: remove pr_debug invocations
  netfilter: flowtable: prefer refcount_inc
  netfilter: ipvs: Use the bitmap API to allocate bitmaps
  netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
  netfilter: nf_tables: move nft_cmp_fast_mask to where its used
  netfilter: nf_tables: use correct integer types
  netfilter: nf_tables: add and use BE register load-store helpers
  netfilter: nf_tables: use the correct get/put helpers
  netfilter: x_tables: use correct integer types
  netfilter: nfnetlink: add missing __be16 cast
  netfilter: nft_set_bitmap: Fix spelling mistake
  netfilter: h323: merge nat hook pointers into one
  netfilter: nf_conntrack: use rcu accessors where needed
  netfilter: nf_conntrack: add missing __rcu annotations
  netfilter: nf_flow_table: count pending offload workqueue tasks
  net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
  netfilter: conntrack: use correct format characters
  netfilter: conntrack: use fallthrough to cleanup
====================

Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2022-07-17' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-07-17

1) Add resiliency for lost completions for PTP TX port timestamp

2) Report Header-data split state via ethtool

3) Decouple HTB code from main regular TX code

* tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: CT: Remove warning of ignore_flow_level support for non PF
  net/mlx5e: Add resiliency for PTP TX port timestamp
  net/mlx5: Expose ts_cqe_metadata_size2wqe_counter
  net/mlx5e: HTB, move htb functions to a new file
  net/mlx5e: HTB, change functions name to follow convention
  net/mlx5e: HTB, remove priv from htb function calls
  net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure
  net/mlx5e: HTB, move stats and max_sqs to priv
  net/mlx5e: HTB, move section comment to the right place
  net/mlx5e: HTB, move ids to selq_params struct
  net/mlx5e: HTB, reduce visibility of htb functions
  net/mlx5e: Fix mqprio_rl handling on devlink reload
  net/mlx5e: Report header-data split state through ethtool
====================

Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: xt_TPROXY: remove pr_debug invocations

pr_debug calls are no longer needed in this file.

Pablo suggested "a patch to remove these pr_debug calls". This patch has
some other beneficial collateral as it also silences multiple Clang
-Wformat warnings that were present in the pr_debug calls.

diff from v1 -> v2:
* converted if statement one-liner style
* x == NULL is now !x

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: prefer refcount_inc

With refcount_inc_not_zero, we'd also need a smp_rmb or similar,
followed by a test of the CONFIRMED bit.

However, the ct pointer is taken from skb->_nfct, its refcount must
not be 0 (else, we'd already have a use-after-free bug).

Use refcount_inc() instead to clarify the ct refcount is expected to
be at least 1.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipvs: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: ipa: initialize ring indexes to 0

When a GSI channel is initially allocated, and after it has been
reset, the hardware assumes its ring index is 0.  And although we
do initialize channels this way, the comments in the IPA code don't
really explain this.  For event rings, it doesn't matter what value
we use initially, so using 0 is just fine.

Add some information about the assumptions made by hardware above
the definition of the gsi_ring structure in "gsi.h".

Zero the index field for all rings (channel and event) when the ring
is allocated.  As a result, that function initializes all fields in
the structure.

Stop zeroing the index the top of gsi_channel_program().  Initially
we'll use the index value set when the channel ring was allocated.
And we'll explicitly zero the index value in gsi_channel_reset()
before programming the hardware, adding a comment explaining why
it's required.

For event rings, use the index initialized by gsi_ring_alloc()
rather than 0 when ringing the doorbell in gsi_evt_ring_program().
(It'll still be zero, but we won't assume that to be the case.)

Use a local variable in gsi_evt_ring_program() that represents the
address of the event ring's ring structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: prestera: add phylink support

For SFP port prestera driver will use kernel
phylink infrastucture to configure port mode based on
the module that has beed inserted

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: Implement ethtool's get_channels command

Some tools (e.g. libxdp) use that information.

Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-5.20-20220720' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of 29 patches for net-next/master.

The first 6 patches target the slcan driver. Dan Carpenter contributes
a hardening patch, followed by 5 cleanup patches.

Biju Das contributes 5 patches to prepare the sja1000 driver to
support the Renesas RZ/N1 SJA1000 CAN controller.

Dario Binacchi's patch for the slcan driver fixes a sleep with held
spin lock.

Another patch by Dario Binacchi fixes a wrong comment in the c_can
driver.

Pavel Pisa updates the CTU CAN FD IP core registers.

Stephane Grosjean contributes 3 patches to the peak_usb driver for
cleanups and support of a new MCU.

The last 12 patches are by Vincent Mailhol, they fix and improve the
txerr and rxerr reporting in all CAN drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'can-error-set-of-fixes-and-improvement-on-txerr-and-rxerr-reporting'

Vincent Mailhol says:

====================
can: error: set of fixes and improvement on txerr and rxerr reporting

This series is a collection of patches targeting the CAN error
counter. The series is split in three blocks (with small relation to
each other).

Several drivers uses the data[6] and data[7] fields (both of type u8)
of the CAN error frame to report those values. However, the maximum
size an u8 can hold is 255 and the error counter can exceed this value
if bus-off status occurs. As such, the first nine patches of this
series make sure that no drivers try to report txerr or rxerr through
the CAN error frame when bus-off status is reached.

can_frame::data[5..7] are defined as being "controller
specific". Controller specific behaviors are not something desirable
(portability issue...) The tenth patch of this series specifies how
can_frame::data[5..7] should be use and remove any "controller
specific" freedom. The eleventh patch adds a flag to notify though
can_frame::can_id that data[6..7] were populated (in order to be
consistent with other fields).

Finally, the twelfth and last patch add three macro values to specify
the different error counter threshold with so far was hard-coded as
magic numbers in the drivers.

N.B.:
  * patches 1 to 10 are for net (stable).
  * patches 11 and 12 are for net-next (but depends on patches 1 to 10).

** Changelog **

v1 -> v2: https://lore.kernel.org/all/20220712153157.83847-1-mailhol.vincent@wanadoo.fr
  * Fix typo in patch #10: data[7] of CAN error frames is for the RX
    error counter, not the TX one (this is litteraly a one byte
    change).
====================

As discussed take the whole series via can-next -> net-next.

Link: https://lore.kernel.org/all/20220719143550.3681-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: error: add definitions for the different CAN error thresholds

Currently, drivers are using magic numbers to derive the CAN error
states from the error counter. Add three macro declarations to
remediate this.

For reference, the error-active, error-passive and bus-off are defined
in ISO 11898, section 12.1.4.2 "Error counting". Although ISO 11898
does not define error-warning state, this extra value is also commonly
used and is thus also added.

Link: https://lore.kernel.org/all/20220719143550.3681-13-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: add CAN_ERR_CNT flag to notify availability of error counter

Add a dedicated flag in uapi/linux/can/error.h to notify the userland
that fields data[6] and data[7] of the CAN error frame were
respectively populated with the tx and rx error counters.

For all driver tree-wide, set up this flags whenever needed.

Link: https://lore.kernel.org/all/20220719143550.3681-12-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: error: specify the values of data[5..7] of CAN error frames

Currently, data[5..7] of struct can_frame, when used as a CAN error
frame, are defined as being "controller specific". Device specific
behaviours are problematic because it prevents someone from writing
code which is portable between devices.

As a matter of fact, data[5] is never used, data[6] is always used to
report TX error counter and data[7] is always used to report RX error
counter. can-utils also relies on this.

This patch updates the comment in the uapi header to specify that
data[5] is reserved (and thus should not be used) and that data[6..7]
are used for error counters.

Fixes: 0d66548a10cb ("[CAN]: Add PF_CAN core module")
Link: https://lore.kernel.org/all/20220719143550.3681-11-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: usb_8dev: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 0024d8ad1639 ("can: usb_8dev: Add support for USB2CAN interface from 8 devices")
Link: https://lore.kernel.org/all/20220719143550.3681-10-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb_leaf: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 7259124eac7d1 ("can: kvaser_usb: Split driver into kvaser_usb_core.c and kvaser_usb_leaf.c")
Link: https://lore.kernel.org/all/20220719143550.3681-9-mailhol.vincent@wanadoo.fr
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb_hydra: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Link: https://lore.kernel.org/all/20220719143550.3681-8-mailhol.vincent@wanadoo.fr
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sun4i_can: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 0738eff14d81 ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Link: https://lore.kernel.org/all/20220719143550.3681-7-mailhol.vincent@wanadoo.fr
CC: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: hi311x: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 57e83fb9b746 ("can: hi311x: Add Holt HI-311x CAN driver")
Link: https://lore.kernel.org/all/20220719143550.3681-6-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

alloc_can_err_skb() already sets cf to NULL if the allocation fails [1],
so the redundant cf = NULL assignment gets removed.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/skb.c#L187

Fixes: 0a9cdcf098a4 ("can: slcan: extend the protocol with CAN state info")
Link: https://lore.kernel.org/all/20220719143550.3681-5-mailhol.vincent@wanadoo.fr
CC: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 215db1856e83 ("can: sja1000: Consolidate and unify state change handling")
Link: https://lore.kernel.org/all/20220719143550.3681-4-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Link: https://lore.kernel.org/all/20220719143550.3681-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: pch_can: do not report txerr and rxerr during bus-off

During bus off, the error count is greater than 255 and can not fit in
a u8.

Fixes: 0c78ab76a05c ("pch_can: Add setting TEC/REC statistics processing")
Link: https://lore.kernel.org/all/20220719143550.3681-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2022-07-18

This series contains updates to igc driver only.

Kurt Kanzenbach adds support for Qbv schedules where one queue stays open
in consecutive entries.

Sasha removes an unused define and field.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Remove forced_speed_duplex value
  igc: Remove MSI-X PBA Clear register
  igc: Lift TAPRIO schedule restriction
====================

Link: https://lore.kernel.org/r/20220718180109.4114540-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: remove qdisc_root_lock() helper

the last caller has been removed with commit 96f5e66e8a79 ("mac80211: fix
aggregation for hardware with ampdu queues"), so it's safe to remove this
function.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'io_uring-zerocopy-send' of git://git./linux/kernel/git/kuba/linux

Pavel Begunkov says:

====================
io_uring zerocopy send

The patchset implements io_uring zerocopy send. It works with both registered
and normal buffers, mixing is allowed but not recommended. Apart from usual
request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
the userspace when buffers are freed and can be reused (see API design below),
which is delivered into io_uring's Completion Queue. Those "buffer-free"
notifications are not necessarily per request, but the userspace has control
over it and should explicitly attaching a number of requests to a single
notification. The series also adds some internal optimisations when used with
registered buffers like removing page referencing.

From the kernel networking perspective there are two main changes. The first
one is passing ubuf_info into the network layer from io_uring (inside of an
in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
caching on the io_uring side, but also helps to avoid cross-referencing
and synchronisation problems. The second part is an optional optimisation
removing page referencing for requests with registered buffers.

Benchmarking UDP with an optimised version of the selftest (see [1]), which
sends a bunch of requests, waits for completions and repeats. "+ flush" column
posts one additional "buffer-free" notification per request, and just "zc"
doesn't post buffer notifications at all.

NIC (requests / second):
IO size | non-zc    | zc             | zc + flush
4000    | 495134    | 606420 (+22%)  | 558971 (+12%)
1500    | 551808    | 577116 (+4.5%) | 565803 (+2.5%)
1000    | 584677    | 592088 (+1.2%) | 560885 (-4%)
600     | 596292    | 598550 (+0.4%) | 555366 (-6.7%)

dummy (requests / second):
IO size | non-zc    | zc             | zc + flush
8000    | 1299916   | 2396600 (+84%) | 2224219 (+71%)
4000    | 1869230   | 2344146 (+25%) | 2170069 (+16%)
1200    | 2071617   | 2361960 (+14%) | 2203052 (+6%)
600     | 2106794   | 2381527 (+13%) | 2195295 (+4%)

Previously it also brought a massive performance speedup compared to the
msg_zerocopy tool (see [3]), which is probably not super interesting. There
is also an additional bunch of refcounting optimisations that was omitted from
the series for simplicity and as they don't change the picture drastically,
they will be sent as follow up, as well as flushing optimisations closing the
performance gap b/w two last columns.

For TCP on localhost (with hacks enabling localhost zerocopy) and including
additional overhead for receive:

IO size | non-zc    | zc
1200    | 4174      | 4148
4096    | 7597      | 11228

Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the
omitted optimisations will somewhat help, should look better for 4000,
but couldn't test properly because of setup problems.

Links:

  liburing (benchmark + tests):
  [1] https://github.com/isilence/liburing/tree/zc_v4

  kernel repo:
  [2] https://github.com/isilence/linux/tree/zc_v4

  RFC v1:
  [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/

  RFC v2:
  https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/

  Net patches based:
  git@github.com:isilence/linux.git zc_v4-net-base
  or
  https://github.com/isilence/linux/tree/zc_v4-net-base

API design overview:

  The series introduces an io_uring concept of notifactors. From the userspace
  perspective it's an entity to which it can bind one or more requests and then
  requesting to flush it. Flushing a notifier makes it impossible to attach new
  requests to it, and instructs the notifier to post a completion once all
  requests attached to it are completed and the kernel doesn't need the buffers
  anymore.

  Notifications are stored in notification slots, which should be registered as
  an array in io_uring. Each slot stores only one notifier at any particular
  moment. Flushing removes it from the slot and the slot automatically replaces
  it with a new notifier. All operations with notifiers are done by specifying
  an index of a slot it's currently in.

  When registering a notification the userspace specifies a u64 tag for each
  slot, which will be copied in notification completion entries as
  cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32
  sequence number counting notifiers of a slot.

====================

Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support externally provided ubufs

Teach tcp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6/udp: support externally provided ubufs

Teach ipv6/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4/udp: support externally provided ubufs

Teach ipv4/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce __skb_fill_page_desc_noacc

Managed pages contain pinned userspace pages and controlled by upper
layers, there is no need in tracking skb->pfmemalloc for them. Introduce
a helper for filling frags but ignoring page tracking, it'll be needed
later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce managed frags infrastructure

Some users like io_uring can do page pinning more efficiently, so we
want a way to delegate referencing to other subsystems. For that add
a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold
page references and upper layers are responsivle to managing page
lifetime.

It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed
page references and clear the flag. It's needed, for instance,
to avoid mixing managed modes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Allow custom iter handler in msghdr

Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.

Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: carry external ubuf_info in msghdr

Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: update MCDI protocol headers

Link: https://lore.kernel.org/r/cover.1657878101.git.ecree.xilinx@gmail.com
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: CT: Remove warning of ignore_flow_level support for non PF

ignore_flow_level isn't supported for SFs, and so it causes
post_act and ct to warn about it per SF.
Apply the warning only for PF.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add resiliency for PTP TX port timestamp

PTP TX port timestamp relies on receiving 2 CQEs for each outgoing
packet (WQE). The regular CQE has a less accurate timestamp than the
wire CQE. On link change, the wire CQE may get lost. Let the driver
detect and restore the relation between the CQEs, and re-sync after
timeout.

Add resiliency for this as follows: add id (producer counter)
into the WQE's metadata. This id will be received in the wire
CQE (in wqe_counter field). On handling the wire CQE, if there is no
match, replay the PTP application with the time-stamp from the regular
CQE and restore the sync between the CQEs and their SKBs. This patch
adds 2 ptp counters:
1) ptp_cq0_resync_event: number of times a mismatch was detected between
the regular CQE and the wire CQE.
2) ptp_cq0_resync_cqe: total amount of missing wire CQEs.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Expose ts_cqe_metadata_size2wqe_counter

Add capability field which indicates the mask for wqe_counter which
connects between loopback CQE and the original WQE. With this connection
the driver can identify lost of the loopback CQE and reply PTP
synchronization with timestamp given in the original CQE.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, move htb functions to a new file

Move htb related functions and data to a separated file for better
encapsulation.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, change functions name to follow convention

Following the change of the functions to be object like, change also
the names.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, remove priv from htb function calls

As a step to make htb self-contained replace the passing of priv as a
parameter to htb function calls with members in the htb struct.

Full decoupling the htb from priv will require more work, so for now
leave the priv as one of the members in the htb struct, to be replaced
by channels in a future commit.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure

Move structure mlx5e_htb from the main driver include file "en.h" to be
hidden in qos.c where the qos functionality is implemented, forward
declare it for the rest of the driver and allocate it dynamically upon
user demand only.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: HTB, move stats and max_sqs to priv

Preparation for dynamic allocation of the HTB struct.
The statistics should be preserved even when the struct is de-allocated.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, move section comment to the right place

mlx5e_get_qos_sq is a part of the SQ lifecycle, so need be under the
title.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: HTB, move ids to selq_params struct

HTB id fields are needed for selecting queue. Moving them to the
selq_params struct will simplify synchronization between control flow
and mlx5e_select_queues and will keep the IDs in the hot cacheline of
mlx5e_selq_params.

Replace mlx5e_selq_prepare() with separate functions that change subsets
of parameters, while keeping the rest.

This also will be useful to hide mlx5e_htb structure from the rest of the
driver in a later patch in this series.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: HTB, reduce visibility of htb functions

No need to expose all htb tc functions to the main driver file,
expose only the master htb tc function mlx5e_htb_setup_tc()
which selects the internal "now static" function to call.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Fix mqprio_rl handling on devlink reload

Keep mqprio_rl data to params and restore the configuration in case of
devlink reload.
Change the location of mqprio_rl resources cleanup so it will be done
also in reload flow.

Also, remove the rl pointer from the params, since this is dynamic object
and saved to priv.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Report header-data split state through ethtool

HW-GRO (SHAMPO) packet merger scheme implies header-data split in the
driver, report it through the ethtool interface.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge branch 'can-peak_usb-cleanups-and-updates'

Stephane Grosjean contributes a peak-usb cleanup and update series.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: include support for a new MCU

The CANFD-USB PCAN-USB FD interface undergoes an internal component
change that requires a slight modification of its drivers, which leads
them to dynamically use endpoint numbers provided by the interface
itself. In addition to a change in the calls to the USB functions
exported by the kernel, the detection of the USB interface dedicated
to CAN must also be modified, as some PEAK-System devices support
other interfaces than CAN.

Link: https://lore.kernel.org/all/20220719120632.26774-3-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
[mkl: add missing cpu_to_le16() conversion]
[mkl: fix networking block comment style]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: correction of an initially misnamed field name

The data structure returned from the USB device contains a number
flashed by the user and not the serial number of the device.

Link: https://lore.kernel.org/all/20220719120632.26774-2-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: pcan_dump_mem(): mark input prompt and data pointer as const

Mark the input prompt and data pointer as const.

Link: https://lore.kernel.org/all/20220719120632.26774-1-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
[mkl: mark data pointer as const, too; update commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ctucanfd: Update CTU CAN FD IP core registers to match version 3.x.

The update is compatible/pure extension of 2.x IP core version

- new option for 2, 4, or 8 Tx buffers option during synthesis.
   The 2.x version has fixed 4 Tx buffers. 3.x version default
   is 4 as well
- new REG_TX_COMMAND_TXT_BUFFER_COUNT provides synthesis
   choice. When read as 0 assume 2.x core with fixed 4 Tx buffers.
- new REG_ERR_CAPT_TS_BITS field to provide most significant
   active/implemented timestamp bit. For 2.x read as zero,
   assume value 63 is such case for 64 bit counter.
- new REG_MODE_RXBAM bit which controls automatic advance
   to next word after Rx FIFO register read. Bit is set
   to 1 by default after the core reset (REG_MODE_RST)
   and value 1 has to be preserved for the normal ctucanfd
   Linux driver operation. Even preceding driver version
   resets core and then modifies only known/required MODE
   register bits so backward and forward compatibility is
   ensured.

See complete datasheet for time-triggered and other
updated capabilities

  http://canbus.pages.fel.cvut.cz/ctucanfd_ip_core/doc/Datasheet.pdf

The fields related to ongoing Ondrej Ille's work
on fault tolerant version with parity protected buffers
and FIFOs are not included for now. Their inclusion will
be considered when design is settled and tested.

Link: https://lore.kernel.org/all/14a98ed1829121f0f3bde784f1aa533bc3cc7fe0.1658139843.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: remove wrong comment

The comment referred to a status (warning) other than the one that was
being managed (active error).

Link: https://lore.kernel.org/all/20220716170112.2020291-1-dario.binacchi@amarulasolutions.com
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: do not sleep with a spin lock held

We can't call close_candev() with a spin lock held, so release the lock
before calling it. After calling close_candev(), we can update the
fields of the private `struct can_priv' without having to acquire the
lock.

Fixes: c4e54b063f42f ("can: slcan: use CAN network device driver API")
Link: https://lore.kernel.org/linux-kernel/Ysrf1Yc5DaRGN1WE@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/all/20220715072951.859586-1-dario.binacchi@amarulasolutions.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'can-add-support-for-rz-n1-sja1000-can-controller'

Biju Das says:

====================
Add support for RZ/N1 SJA1000 CAN controller

This patch series aims to add support for RZ/N1 SJA1000 CAN controller.

The SJA1000 CAN controller on RZ/N1 SoC has some differences compared
to others like it has no clock divider register (CDR) support and it has
no HW loopback (HW doesn't see tx messages on rx), so introduced a new
compatible 'renesas,rzn1-sja1000' to handle these differences.

v3->v4:
* Updated bindings as per coding style used in example-schema.
* Entire entry in properties compatible declared as enum. Also Descriptions
   do not bring any information,so removed it from compatible description.
* Used decimal values in nxp,tx-output-mode enums.
* Fixed indentaions in binding examples.
* Removed clock-names from bindings, as it is single clock.
* Optimized the code as per Vincent's suggestion.
* Updated clock handling as per bindings.
v2->v3:
* Added reg-io-width is a required property for technologic,sja1000 & renesas,rzn1-sja1000
* Removed enum type from nxp,tx-output-config and updated the description
   for combination of TX0 and TX1.
* Updated the example for technologic,sja1000
v1->v2:
* Moved $ref: can-controller.yaml# to top along with if conditional to
  avoid multiple mapping issues with the if conditional in the subsequent
   patch.
* Added an example for RZ/N1D SJA1000 usage.
* Updated commit description for patch#2,#3 and #6
* Removed the quirk macro SJA1000_NO_HW_LOOPBACK_QUIRK
* Added prefix SJA1000_QUIRK_* for quirk macro.
* Replaced of_device_get_match_data->device_get_match_data.
* Added error handling on clk error path
* Started using "devm_clk_get_optional_enabled" for clk get,prepare and enable.

Ref:
[1] https://lore.kernel.org/linux-renesas-soc/20220701162320.102165-1-biju.das.jz@bp.renesas.com/T/#t
====================

Link: https://lore.kernel.org/all/20220710115248.190280-1-biju.das.jz@bp.renesas.com
[mkl: applying patches 1...5 only, as 6 depends
      devm_clk_get_optional_enabled(), which is not in
      net-next/master, yet]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: Change the return type as void for SoC specific init

Change the return type as void for SoC specific init function as it
always return 0.

Link: https://lore.kernel.org/all/20220710115248.190280-6-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: Use device_get_match_data to get device data

This patch replaces of_match_device->device_get_match_data
to get pointer to device data.

Link: https://lore.kernel.org/all/20220710115248.190280-5-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: Add Quirk for RZ/N1 SJA1000 CAN controller

As per Chapter 6.5.16 of the RZ/N1 Peripheral Manual, The SJA1000
CAN controller does not support Clock Divider Register compared to
the reference Philips SJA1000 device.

This patch adds a device quirk to handle this difference.

Link: https://lore.kernel.org/all/20220710115248.190280-4-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-bindings: can: nxp,sja1000: Document RZ/N1{D,S} support

Add CAN binding documentation for Renesas RZ/N1 SoC.

The SJA1000 CAN controller on RZ/N1 SoC has some differences compared
to others like it has no clock divider register (CDR) support and it has
no HW loopback (HW doesn't see tx messages on rx), so introduced a new
compatible 'renesas,rzn1-sja1000' to handle these differences.

Link: https://lore.kernel.org/all/20220710115248.190280-3-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-bindings: can: sja1000: Convert to json-schema

Convert the NXP SJA1000 CAN Controller Device Tree binding
documentation to json-schema.

Update the example to match reality.

Link: https://lore.kernel.org/all/20220710115248.190280-2-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'can-slcan-checkpatch-cleanups'

Marc Kleine-Budde says:

====================
can: slcan: checkpatch cleanups

This is a patch series consisting of various checkpatch cleanups for
the slcan driver.
====================

Link: https://lore.kernel.org/all/20220704125954.1587880-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: clean up if/else

Remove braces after if() for single statement blocks, also remove else
after return() in if() block.

Link: https://lore.kernel.org/all/20220704125954.1587880-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: convert comparison to NULL into !val

All comparison to NULL could be written "!val", convert them to make
checkpatch happy.

Link: https://lore.kernel.org/all/20220704125954.1587880-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: fix whitespace issues

Add and remove whitespace to make checkpatch happy.

Link: https://lore.kernel.org/all/20220704125954.1587880-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: slcan_init() convert printk(LEVEL ...) to pr_level()

Convert the last printk(LEVEL ...) to pr_level().

Link: https://lore.kernel.org/all/20220704125954.1587880-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: convert comments to network style comments

Convert all comments to network subsystem style comments.

Link: https://lore.kernel.org/all/20220704125954.1587880-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: use scnprintf() as a hardening measure

The snprintf() function returns the number of bytes which *would* have
been copied if there were no space. So, since this code does not check
the return value, there if the buffer was not large enough then there
would be a buffer overflow two lines later when it does:

actual = sl->tty->ops->write(sl->tty, sl->xbuff, n);

Use scnprintf() instead because that returns the number of bytes which
were actually copied.

Fixes: 52f9ac85b876 ("can: slcan: allow to send commands to the adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/all/YsVA9KoY/ZSvNGYk@kili
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: dsa: microchip: fix the missing ksz8_r_mib_cnt

During the refactoring for the ksz8_dev_ops from ksz8795.c to
ksz_common.c, the ksz8_r_mib_cnt has been missed. So this patch adds the
missing one.

Fixes: 6ec23aaaac43 ("net: dsa: microchip: move ksz_dev_ops to ksz_common.c")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220718061803.4939-1-arun.ramadoss@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: prestera: acl: add support for 'police' action on egress

Propagate ingress/egress direction for 'police' action down to hardware.

Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Link: https://lore.kernel.org/r/20220714083541.1973919-1-maksym.glubokiy@plvision.eu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-07-15

This series contains updates to ice driver only.

Ani updates feature restriction for devices that don't support external
time stamping.

Zhuo Chen removes unnecessary call to pci_aer_clear_nonfatal_status().

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Remove pci_aer_clear_nonfatal_status() call
ice: Add EXTTS feature to the feature bitmap
====================

Link: https://lore.kernel.org/r/20220715214642.2968799-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: fixup sparse warnings on __be16 ports

The port fields in the ethool flow structures are defined
to be __be16 types, so sparse is showing issues where these
are being passed to htons(). Fix these warnings by passing
them to be16_to_cpu() instead.

These are being used in netdev_dbg() so should only effect
anyone doing debug.

Fixes the following sparse warnings:

drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16

Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Link: https://lore.kernel.org/r/20220715173009.526126-1-ben.dooks@sifive.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: prestera: acl: fix code formatting

Make the code look better.

Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Link: https://lore.kernel.org/r/20220715103806.7108-1-maksym.glubokiy@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vmxnet3: Record queue number to incoming packets

Make generic XDP processing attribute packets to their actual
queues instead of queue #0. This improves AF_XDP performance
considerably since softirq threads no longer fight over single
AF_XDP socket spinlock.

Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com>
Link: https://lore.kernel.org/r/20220717022050.822766-2-andrey.turkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-prepare-mlxsw-and-netdevsim-for-locked-reload'

Jiri Pirko says:

====================
devlink: prepare mlxsw and netdevsim for locked reload

This is preparation patchset to be able to eventually make a switch and
make reload cmd to take devlink->lock as the other commands do.

This patchset is preparing 2 major users of devlink API - mlxsw and
netdevsim. The sets of functions are similar, therefore taking care of
both here.
====================

Link: https://lore.kernel.org/r/20220716110241.3390528-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove unused locked functions

Remove locked versions of functions that are no longer used by anyone.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_region_create/destroy() functions

Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the mlxsw driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_dpipe*() functions

Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_sb*() functions

Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_resource*() functions

Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devling_trap*() functions

Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: avoid false DEADLOCK warning reported by lockdep

Add a lock_class_key per devlink instance to avoid DEADLOCK warning by
lockdep, while locking more than one devlink instance in driver code,
for example in opening VFs flow.

Kernel log:
[  101.433802] ============================================
[  101.433803] WARNING: possible recursive locking detected
[  101.433810] 5.19.0-rc1+ #35 Not tainted
[  101.433812] --------------------------------------------
[  101.433813] bash/892 is trying to acquire lock:
[  101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core]
[  101.433909]
               but task is already holding lock:
[  101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.433989]
               other info that might help us debug this:
[  101.433990]  Possible unsafe locking scenario:

[  101.433991]        CPU0
[  101.433991]        ----
[  101.433992]   lock(&devlink->lock);
[  101.433993]   lock(&devlink->lock);
[  101.433995]
                *** DEADLOCK ***

[  101.433996]  May be due to missing lock nesting notation

[  101.433996] 6 locks held by bash/892:
[  101.433998]  #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0
[  101.434009]  #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520
[  101.434017]  #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520
[  101.434023]  #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310
[  101.434031]  #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.434108]  #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430
[  101.434116]
               stack backtrace:
[  101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35
[  101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  101.434130] Call Trace:
[  101.434133]  <TASK>
[  101.434135]  dump_stack_lvl+0x57/0x7d
[  101.434145]  __lock_acquire.cold+0x1df/0x3e7
[  101.434151]  ? register_lock_class+0x1880/0x1880
[  101.434157]  lock_acquire+0x1c1/0x550
[  101.434160]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434229]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434232]  ? __xa_alloc+0x1ed/0x2d0
[  101.434236]  ? ksys_write+0xf3/0x1d0
[  101.434239]  __mutex_lock+0x12c/0x14b0
[  101.434243]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434312]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434380]  ? devlink_alloc_ns+0x11b/0x910
[  101.434385]  ? mutex_lock_io_nested+0x1320/0x1320
[  101.434388]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434391]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434393]  ? __init_swait_queue_head+0x70/0xd0
[  101.434397]  probe_one+0x3c/0x690 [mlx5_core]
[  101.434467]  pci_device_probe+0x1b4/0x480
[  101.434471]  really_probe+0x1e0/0xaa0
[  101.434474]  __driver_probe_device+0x219/0x480
[  101.434478]  driver_probe_device+0x49/0x130
[  101.434481]  __device_attach_driver+0x1b8/0x280
[  101.434484]  ? driver_allows_async_probing+0x140/0x140
[  101.434487]  bus_for_each_drv+0x123/0x1a0
[  101.434489]  ? bus_for_each_dev+0x1a0/0x1a0
[  101.434491]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434494]  ? trace_hardirqs_on+0x2d/0x100
[  101.434498]  __device_attach+0x1a3/0x430
[  101.434501]  ? device_driver_attach+0x1e0/0x1e0
[  101.434503]  ? pci_bridge_d3_possible+0x1e0/0x1e0
[  101.434506]  ? pci_create_resource_files+0xeb/0x190
[  101.434511]  pci_bus_add_device+0x6c/0xa0
[  101.434514]  pci_iov_add_virtfn+0x9e4/0xe00
[  101.434517]  ? trace_hardirqs_on+0x2d/0x100
[  101.434521]  sriov_enable+0x64a/0xca0
[  101.434524]  ? pcibios_sriov_disable+0x10/0x10
[  101.434528]  mlx5_core_sriov_configure+0xab/0x280 [mlx5_core]
[  101.434602]  sriov_numvfs_store+0x20a/0x310
[  101.434605]  ? sriov_totalvfs_show+0xc0/0xc0
[  101.434608]  ? sysfs_file_ops+0x170/0x170
[  101.434611]  ? sysfs_file_ops+0x117/0x170
[  101.434614]  ? sysfs_file_ops+0x170/0x170
[  101.434616]  kernfs_fop_write_iter+0x348/0x520
[  101.434619]  new_sync_write+0x2e5/0x520
[  101.434621]  ? new_sync_read+0x520/0x520
[  101.434624]  ? lock_acquire+0x1c1/0x550
[  101.434626]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434630]  vfs_write+0x5cb/0x8d0
[  101.434633]  ksys_write+0xf3/0x1d0
[  101.434635]  ? __x64_sys_read+0xb0/0xb0
[  101.434638]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434640]  ? syscall_enter_from_user_mode+0x1d/0x50
[  101.434643]  do_syscall_64+0x3d/0x90
[  101.434647]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  101.434650] RIP: 0033:0x7f5ff536b2f7
[  101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f
1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7
[  101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001
[  101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001
[  101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002
[  101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700
[  101.434673]  </TASK>

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: add SKBFL_DONT_ORPHAN flag

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: don't mix ubuf_info from different sources

We should not append MSG_ZEROCOPY requests to skbuff with non
MSG_ZEROCOPY ubuf_info, they might be not compatible.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: avoid partial copy for zc

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
on copy and iter manipulations and also misalignes potentially aligned
data. Avoid such copies. And as a bonus we can allocate smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: avoid partial copy for zc

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
on copy and iter manipulations and also misalignes potentially aligned
data. Avoid such copies. And as a bonus we can allocate smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: Remove forced_speed_duplex value

u8 forced_speed_duplex from value from igc_mac_info struct is not used.
This patch comes to tidy up the driver code.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove MSI-X PBA Clear register

MSI-X PBA Clear register is not used. This patch comes to tidy up the
driver code.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Lift TAPRIO schedule restriction

Add support for Qbv schedules where one queue stays open
in consecutive entries. Currently that's not supported.

Example schedule:

|tc qdisc replace dev ${INTERFACE} handle 100 parent root taprio num_tc 3 \
|   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
|   queues 1@0 1@1 2@2 \
|   base-time ${BASETIME} \
|   sched-entry S 0x01 300000 \ # Stream High/Low
|   sched-entry S 0x06 500000 \ # Management and Best Effort
|   sched-entry S 0x04 200000 \ # Best Effort
|   flags 0x02

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

atl1c: use netif_napi_add_tx() for Tx NAPI

Use netif_napi_add_tx() for NAPI in Tx direction instead of the regular
netif_napi_add() function.

Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: fix Clang -Wunused-const-variable warning on 'ksz_dt_ids'

This patch removes the of_match_ptr() pointer when dereferencing the
ksz_dt_ids which produce the unused variable warning.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>