review.tizen.org Git - platform/kernel/linux-rpi.git/log

selftests/xsk: replace asm acquire/release implementations

Replace our own homegrown assembly store/release and load/acquire
implementations with the HW agnositic atomic APIs C11 offers. This to
make the code more portable, easier to read, and reduce the
maintenance burden.

The original code used load-acquire and store-release barriers
hand-coded in assembly. Since C11, these kind of operations are
offered as built-ins in gcc and llvm. The load-acquire operation
prevents hoisting of non-atomic memory operations to before this
operation and it corresponds to the __ATOMIC_ACQUIRE operation in the
built-in atomics. The store-release operation prevents hoisting of
non-atomic memory operations to after this operation and it
corresponds to the __ATOMIC_RELEASE operation in the built-in atomics.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: add debug option for creating netdevs

Add a new option to the test_xsk.sh script that only creates the two
veth netdevs and the extra namespace, then exits without running any
tests. The failed test can then be executed in the debugger without
having to create the netdevs and namespace manually. For ease-of-use,
the veth netdevs to use are printed so they can be copied into the
debugger.

Here is an example how to use it:

> sudo ./test_xsk.sh -d

veth10 veth11

> gdb xskxceiver

In gdb:

run -i veth10 -i veth11

And now the test cases can be debugged with gdb.

If you want to debug the test suite on a real NIC in loopback mode,
there is no need to use this feature as you already know the netdev of
your NIC.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: remove unused variable outstanding_tx

Remove the unused variable outstanding_tx.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-6-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: print correct error codes when exiting

Print the correct error codes when exiting the test suite due to some
terminal error. Some of these had a switched sign and some of them
printed zero instead of errno.

Fixes: facb7cb2e909 ("selftests/bpf: Xsk selftests - SKB POLL, NOPOLL")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-5-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: submit correct number of frames in populate_fill_ring

Submit the correct number of frames in the function
xsk_populate_fill_ring(). For the tests that set the flag
use_addr_for_fill, uninitialized buffers were sent to the fill ring
following the correct ones. This has no impact on the tests, since
they only use the ones that were initialized. But for correctness,
this should be fixed.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: do not close unused file descriptors

Do not close descriptors that have never been used. File descriptor
fields that are not in use are erroneously marked with the number 0,
which is a valid fd. Mark unused fds with -1 instead and do not close
these when deleting the socket.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/xsk: print correct payload for packet dump

Print the correct payload when the packet dump option is selected. The
network to host conversion was forgotten and the payload was
erronously declared to be an int instead of an unsigned int.

Fixes: facb7cb2e909 ("selftests/bpf: Xsk selftests - SKB POLL, NOPOLL")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf_doc: Fix build error with older python versions

The ability to subscript match result as an array is only available
since python 3.6. Existing code in bpf_doc uses the older group()
interface but commit 8a76145a2ec2 adds code using the new interface.

Use the old interface consistently to avoid build error on older
distributions like the below:

+ make -j48 -s -C /dev/shm/kbuild/linux.33946/current ARCH=powerpc HOSTCC=gcc CROSS_COMPILE=powerpc64-suse-linux- clean
TypeError: '_sre.SRE_Match' object is not subscriptable

Fixes: 8a76145a2ec2 ("bpf: explicitly define BPF_FUNC_xxx integer values")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230109113442.20946-1-msuchanek@suse.de

libbpf: Fix map creation flags sanitization

As BPF_F_MMAPABLE flag is now conditionnaly set (by map_is_mmapable),
it should not be toggled but disabled if not supported by kernel.

Fixes: 4fcac46c7e10 ("libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars")
Signed-off-by: Ludovic L'Hours <ludovic.lhours@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230108182018.24433-1-ludovic.lhours@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpftool: fix output for skipping kernel config check

When bpftool feature does not find kernel config
files under default path or wrong format,
do not output CONFIG_XYZ is not set.
Skip kernel config check and continue.

Signed-off-by: Chethan Suresh <chethan.suresh@sony.com>
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20230109023742.29657-1-chethan.suresh@sony.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf: btf: limit logging of ignored BTF mismatches

Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
mismatches are expected and module loading should proceed
anyway. Logging with pr_warn() on every one of these "benign"
mismatches creates unnecessary noise when many such modules are
loaded. Instead, handle this case with a single log warning that BTF
info may be unavailable.

Mismatches also result in calls to __btf_verifier_log() via
__btf_verifier_log_type() or btf_verifier_log_member(), adding several
additional lines of logging per mismatched module. Add checks to these
paths to skip logging for module BTF mismatches in the "allow
mismatch" case.

All existing logging behavior is preserved in the default
CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case.

Signed-off-by: Connor O'Brien <connoro@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf, x86: Simplify the parsing logic of structure parameters

Extra_nregs of structure parameters and nr_args can be
added directly at the beginning, and using a flip flag
to identifiy structure parameters. Meantime, renaming
some variables to make them more sense.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230105035026.3091988-1-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf: Replace 0-length arrays with flexible arrays

Zero-length arrays are deprecated [1]. Replace struct bpf_array's union
of 0-length arrays with flexible arrays. Detected with GCC 13, by using
-fstrict-flex-arrays=3:

  arch/x86/net/bpf_jit_comp.c: In function 'bpf_tail_call_direct_fixup':
  arch/x86/net/bpf_jit_comp.c:606:37: warning: array subscript <unknown> is outside array bounds of 'void *[0]' [-Warray-bounds=]
    606 |                 target = array->ptrs[poke->tail_call.key];
        |                          ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
  In file included from include/linux/filter.h:9,
                   from arch/x86/net/bpf_jit_comp.c:9:
  include/linux/bpf.h:1527:23: note: while referencing 'ptrs'
   1527 |                 void *ptrs[0] __aligned(8);
        |                       ^~~~

  [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230105192646.never.154-kees@kernel.org

bpftool: Add missing quotes to libbpf bootstrap submake vars

When passing compiler variables like CC=$(HOSTCC) to a submake
we must ensure the variable is quoted in order to handle cases
where $(HOSTCC) may be multiple binaries.

For example when using ccache $HOSTCC may be:
"/usr/bin/ccache /usr/bin/gcc"

If we pass CC without quotes like CC=$(HOSTCC) only the first
"/usr/bin/ccache" part will be assigned to the CC variable which
will cause an error due to dropping the "/usr/bin/gcc" part of
the variable in the submake invocation.

This fixes errors such as:
/usr/bin/ccache: invalid option -- 'd'

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230110014504.3120711-1-james.hilliard1@gmail.com

bpf: Remove the unnecessary insn buffer comparison

The variable 'insn' is initialized to 'insn_buf' without being changed, only
some helper macros are defined, so the insn buffer comparison is unnecessary.
Just remove it. This missed removal back in 2377b81de527 ("bpf: split shared
bpf_tcp_sock and bpf_sock_ops implementation").

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230108151258.96570-1-haiyue.wang@intel.com

libbpf: Poison strlcpy()

Since commit 9fc205b413b3("libbpf: Add sane strncpy alternative and use
it internally") introduce libbpf_strlcpy(), thus add strlcpy() to a poison
list to prevent accidental use of it.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/tencent_5695A257C4D16B4413036BA1DAACDECB0B07@qq.com

Merge branch 'devlink-unregister'

Jakub Kicinski says:

====================
devlink: remove the wait-for-references on unregister

Move the registration and unregistration of the devlink instances
under their instance locks. Don't perform the netdev-style wait
for all references when unregistering the instance.

Instead the devlink instance refcount will only ensure that
the memory of the instance is not freed. All places which acquire
access to devlink instances via a reference must check that the
instance is still registered under the instance lock.

This fixes the problem of the netdev code accessing devlink
instances before they are registered.

RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/
- rewrite the cover letter
- rewrite the commit message for patch 1
- un-export and rename devl_is_alive
- squash the netdevsim patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: move devlink registration under the instance lock

To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: rename a label

err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: allow registering parameters after the instance

It's most natural to register the instance first and then its
subobjects. Now that we can use the instance lock to protect
the atomicity of all init - it should also be safe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't require setting features before registration

Requiring devlink_set_features() to be run before devlink is
registered is overzealous. devlink_set_features() itself is
a leftover from old workarounds which were trying to prevent
initiating reload before probe was complete.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: remove the registration guarantee of references

The objective of exposing the devlink instance locks to
drivers was to let them use these locks to prevent user space
from accessing the device before it's fully initialized.
This is difficult because devlink_unregister() waits for all
references to be released, meaning that devlink_unregister()
can't itself be called under the instance lock.

To avoid this issue devlink_register() was moved after subobject
registration a while ago. Unfortunately the netdev paths get
a hold of the devlink instances _before_ they are registered.
Ideally netdev should wait for devlink init to finish (synchronizing
on the instance lock). This can't work because we don't know if the
instance will _ever_ be registered (in case of failures it may not).
The other option of returning an error until devlink_register()
is called is unappealing (user space would get a notification
netdev exist but would have to wait arbitrary amount of time
before accessing some of its attributes).

Weaken the guarantees of the devlink references.

Holding a reference will now only guarantee that the memory
of the object is around. Another way of looking at it is that
the reference now protects the object not its "registered" status.
Use devlink instance lock to synchronize unregistration.

This implies that releasing of the "main" reference of the devlink
instance moves from devlink_unregister() to devlink_free().

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: always check if the devlink instance is registered

Always check under the instance lock whether the devlink instance
is still / already registered.

This is a no-op for the most part, as the unregistration path currently
waits for all references. On the init path, however, we may temporarily
open up a race with netdev code, if netdevs are registered before the
devlink instance. This is temporary, the next change fixes it, and this
commit has been split out for the ease of review.

Note that in case of iterating over sub-objects which have their
own lock (regions and line cards) we assume an implicit dependency
between those objects existing and devlink unregistration.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: protect devlink->dev by the instance lock

devlink->dev is assumed to be always valid as long as any
outstanding reference to the devlink instance exists.

In prep for weakening of the references take the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: update the code in netns move to latest helpers

devlink_pernet_pre_exit() is the only obvious place which takes
the instance lock without using the devl_ helpers. Update the code
and move the error print after releasing the reference
(having unlock and put together feels slightly idiomatic).

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: bump the instance index directly when iterating

xa_find_after() is designed to handle multi-index entries correctly.
If a xarray has two entries one which spans indexes 0-3 and one at
index 4 xa_find_after(0) will return the entry at index 4.

Having to juggle the two callbacks, however, is unnecessary in case
of the devlink xarray, as there is 1:1 relationship with indexes.

Always use xa_find() and increment the index manually.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysctl: expose all net/core sysctls inside netns

All were not visible to the non-priv users inside netns. However,
with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with
CAP_NET_ADMIN"), these vars are protected from getting modified.
A proc with capable(CAP_NET_ADMIN) can change the values so
not having them visible inside netns is just causing nuisance to
process that check certain values (e.g. net.core.somaxconn) and
see different behavior in root-netns vs. other-netns

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-code-split-and-structured-instance-walk'

Jakub Kicinski says:

====================
devlink: code split and structured instance walk

Split devlink.c into a handful of files, trying to keep the "core"
code away from all the command-specific implementations.
The core code has been quite scattered until now. Going forward we can
consider using a source file per-subobject, I think that it's quite
beneficial to newcomers (based on relative ease with which folks
contribute to ethtool vs devlink). But this series doesn't split
everything out, yet - partially due to backporting concerns,
but mostly due to lack of time. Bulk of the netlink command
handling is left in a leftover.c file.

Introduce a context structure for dumps, and use it to store
the devlink instance ID of the last dumped devlink instance.
This means we don't have to restart the walk from 0 each time.

Finally - introduce a "structured walk". A centralized dump handler
in devlink/netlink.c which walks the devlink instances, deals with
refcounting/locking, simplifying the per-object implementations quite
a bit. Inspired by the ethtool code.

v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: convert remaining dumps to the by-instance scheme

Soon we'll have to check if a devlink instance is alive after
locking it. Convert to the by-instance dumping scheme to make
refactoring easier.

Most of the subobject code no longer has to worry about any devlink
locking / lifetime rules (the only ones that still do are the two subject
types which stubbornly use their own locking). Both dump and do callbacks
are given a devlink instance which is already locked and good-to-access
(do from the .pre_doit handler, dump from the new dump indirection).

Note that we'll now check presence of an op (e.g. for sb_pool_get)
under the devlink instance lock, that will soon be necessary anyway,
because we don't hold refs on the driver modules so the memory
in which ops live may be gone for a dead instance, after upcoming
locking changes.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: add by-instance dump infra

Most dumpit implementations walk the devlink instances.
This requires careful lock taking and reference dropping.
Factor the loop out and provide just a callback to handle
a single instance dump.

Convert one user as an example, other users converted
in the next change.

Slightly inspired by ethtool netlink code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: uniformly take the devlink instance lock in the dump loop

Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit().
This way all dumps will take the instance lock in the main iteration
loop directly, making refactoring and reading the code easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (function)

Use xarray id for cases of sub-objects which are iterated in
a function.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (nested)

Use xarray id for cases of simple sub-object iteration.
We'll now use the state->instance for the devlink instances
and state->idx for subobject index.

Moving the definition of idx into the inner loop makes sense,
so while at it also move other sub-object local variables into
the loop.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (simple)

xarray gives each devlink instance an id and allows us to restart
walk based on that id quite neatly. This is nice both from the
perspective of code brevity and from the stability of the dump
(devlink instances disappearing from before the resumption point
will not cause inconsistent dumps).

This patch takes care of simple cases where state->idx counts
devlink instances only.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: health: combine loops in dump

Walk devlink instances only once. Dump the instance reporters
and port reporters before moving to the next instance.
User space should not depend on ordering of messages.

This will make improving stability of the walk easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: drop the filter argument from devlinks_xa_find_get

Looks like devlinks_xa_find_get() was intended to get the mark
from the @filter argument. It doesn't actually use @filter, passing
DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other
than registered is unlikely so drop @filter argument completely.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: remove start variables from dumps

The start variables made the code clearer when we had to access
cb->args[0] directly, as the name args doesn't explain much.
Now that we use a structure to hold state this seems no longer
needed.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: use an explicit structure for dump context

Create a dump context structure instead of using cb->args
as an unsigned long array. This is a pure conversion which
is intended to be as much of a noop as possible.
Subsequent changes will use this to simplify the code.

The two non-trivial parts are:
- devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0]
   to see if devlink_fmsg_dumpit() has already been called (whether
   this is the first msg), but doesn't use the exact value, so we
   can drop the local variable there already
- devlink_nl_cmd_region_read_dumpit() uses args[0] for address
   but we'll use args[1] now, shouldn't matter

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: add macro for checking dump ctx size

We encourage casting struct netlink_callback::ctx to a local
struct (in a comment above the field). Provide a convenience
macro for checking if the local struct fits into the ctx.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: split out netlink code

Move out the netlink glue into a separate file.
Leave the ops in the old file because we'd have to export a ton
of functions. Going forward we should switch to split ops which
will let us to put the new ops in the netlink.c file.

Pure code move, no functional changes.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: split out core code

Move core code into a separate file. It's spread around the main
file which makes refactoring and figuring out how devlink works
harder.

Move the xarray, all the most core devlink instance code out like
locking, ref counting, alloc, register, etc. Leave port stuff in
leftover.c, if we want to move port code it'd probably be to its
own file.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: rename devlink_netdevice_event -> devlink_port_netdevice_event

To make the upcoming change a pure(er?) code move rename
devlink_netdevice_event -> devlink_port_netdevice_event.
This makes it clear that it only touches ports and doesn't
belong cleanly in the core.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: move code to a dedicated directory

The devlink code is hard to navigate with 13kLoC in one file.
I really like the way Michal split the ethtool into per-command
files and core. It'd probably be too much to split it all up,
but we can at least separate the core parts out of the per-cmd
implementations and put it in a directory so that new commands
can be separate files.

Move the code, subsequent commit will do a partial split.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipa-simplify-ipa-interrupt-handling'

Alex Elder says:

====================
net: ipa: simplify IPA interrupt handling

One of the IPA's two IRQs fires when data on a suspended channel is
available (to request that the channel--or system--be resumed to
recieve the pending data).  This interrupt also handles a few
conditions signaled by the embedded microcontroller.

For this "IPA interrupt", the current code requires a handler to be
dynamically registered for each interrupt condition.  Any condition
that has no registered handler is quietly ignored.  This design is
derived from the downstream IPA driver implementation.

There isn't any need for this complexity.  Even in the downstream
code, only four of the available 30 or so IPA interrupt conditions
are ever handled.  So these handlers can pretty easily just be
called directly in the main IRQ handler function.

This series simplifies the interrupt handling code by having the
small number of IPA interrupt handlers be called directly, rather
than having them be registered dynamically.

Version 2 just adds a missing forward-reference, as suggested by
Caleb.
====================

Link: https://lore.kernel.org/r/20230104175233.2862874-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: don't maintain IPA interrupt handler array

We can call the two IPA interrupt handler functions directly;
there's no need to maintain the array of handler function pointers
any more.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: kill ipa_interrupt_add()

The dynamic assignment of IPA interrupt handlers isn't needed; we
only handle three IPA interrupt types, and their handler functions
are now assigned directly. We can get rid of ipa_interrupt_add()
and ipa_interrupt_remove() now, because they serve no purpose.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: register IPA interrupt handlers directly

Declare the microcontroller IPA interrupt handler publicly, and
assign it directly in ipa_interrupt_config(). Make the SUSPEND IPA
interrupt handler public, and rename it ipa_power_suspend_handler().
Assign it directly in ipa_interrupt_config() as well.

This makes it unnecessary to do this in ipa_interrupt_add(). Make
similar changes for removing IPA interrupt handlers.

The next two patches will finish the cleanup, removing the
add/remove functions and the handler array entirely.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: enable IPA interrupt handlers separate from registration

Expose ipa_interrupt_enable() and have functions that register
IPA interrupt handlers enable them directly, rather than having the
registration process do that. Do the same for disabling IPA
interrupt handlers.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: introduce ipa_interrupt_enable()

Create new function ipa_interrupt_enable() to encapsulate enabling
one of the IPA interrupt types. Introduce ipa_interrupt_disable()
to reverse that operation. Add a helper function to factor out the
common register update used by both.

Use these in ipa_interrupt_add() and ipa_interrupt_remove().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: introduce a common microcontroller interrupt handler

The prototype for an IPA interrupt handler supplies the IPA
interrupt ID, so it's possible to use a single function to handle
any type of microcontroller interrupt.

Introduce ipa_uc_interrupt_handler(), which calls the event or the
response handler depending on the IRQ ID provided. Register the new
function as the handler for both microcontroller IPA interrupt types.

The called functions don't use their "irq_id" arguments, so remove
them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'enetc-unlock-xdp_redirect-for-xdp-non-linear-buffers'

Lorenzo Bianconi says:

====================
enetc: unlock XDP_REDIRECT for XDP non-linear buffers

Unlock XDP_REDIRECT for S/G XDP buffer and rely on XDP stack to properly
take care of the frames.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access
non-linear part of the xdp_buff/xdp_frame.
====================

Link: https://lore.kernel.org/r/cover.1672840490.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: enetc: do not always access skb_shared_info in the XDP path

Move XDP skb_shared_info structure initialization in from
enetc_map_rx_buff_to_xdp() to enetc_add_rx_buff_to_xdp() and do not always
access skb_shared_info in the xdp_buff/xdp_frame since it is located in a
different cacheline with respect to hard_start and data xdp pointers.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access
non-linear part of the xdp_buff/xdp_frame.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: enetc: get rid of xdp_redirect_sg counter

Remove xdp_redirect_sg counter and the related ethtool entry since it is
no longer used.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: enetc: unlock XDP_REDIRECT for XDP non-linear buffers

Even if full XDP_REDIRECT is not supported yet for non-linear XDP buffers
since we allow redirecting just into CPUMAPs, unlock XDP_REDIRECT for
S/G XDP buffer and rely on XDP stack to properly take care of the
frames.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.2-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.

  Current release - regressions:

   - bpf: fix nullness propagation for reg to reg comparisons, avoid
     null-deref

   - inet: control sockets should not use current thread task_frag

   - bpf: always use maximal size for copy_array()

   - eth: bnxt_en: don't link netdev to a devlink port for VFs

  Current release - new code bugs:

   - rxrpc: fix a couple of potential use-after-frees

   - netfilter: conntrack: fix IPv6 exthdr error check

   - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes

   - eth: dsa: qca8k: various fixes for the in-band register access

   - eth: nfp: fix schedule in atomic context when sync mc address

   - eth: renesas: rswitch: fix getting mac address from device tree

   - mobile: ipa: use proper endpoint mask for suspend

  Previous releases - regressions:

   - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
     Jiri / python tests

   - net: tc: don't intepret cls results when asked to drop, fix
     oob-access

   - vrf: determine the dst using the original ifindex for multicast

   - eth: bnxt_en:
      - fix XDP RX path if BPF adjusted packet length
      - fix HDS (header placement) and jumbo thresholds for RX packets

   - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
     avoid memory corruptions

  Previous releases - always broken:

   - ulp: prevent ULP without clone op from entering the LISTEN status

   - veth: fix race with AF_XDP exposing old or uninitialized
     descriptors

   - bpf:
      - pull before calling skb_postpull_rcsum() (fix checksum support
        and avoid a WARN())
      - fix panic due to wrong pageattr of im->image (when livepatch and
        kretfunc coexist)
      - keep a reference to the mm, in case the task is dead

   - mptcp: fix deadlock in fastopen error path

   - netfilter:
      - nf_tables: perform type checking for existing sets
      - nf_tables: honor set timeout and garbage collection updates
      - ipset: fix hash:net,port,net hang with /0 subnet
      - ipset: avoid hung task warning when adding/deleting entries

   - selftests: net:
      - fix cmsg_so_mark.sh test hang on non-x86 systems
      - fix the arp_ndisc_evict_nocarrier test for IPv6

   - usb: rndis_host: secure rndis_query check against int overflow

   - eth: r8169: fix dmar pte write access during suspend/resume with
     WOL

   - eth: lan966x: fix configuration of the PCS

   - eth: sparx5: fix reading of the MAC address

   - eth: qed: allow sleep in qed_mcp_trace_dump()

   - eth: hns3:
      - fix interrupts re-initialization after VF FLR
      - fix handling of promisc when MAC addr table gets full
      - refine the handling for VF heartbeat

   - eth: mlx5:
      - properly handle ingress QinQ-tagged packets on VST
      - fix io_eq_size and event_eq_size params validation on big endian
      - fix RoCE setting at HCA level if not supported at all
      - don't turn CQE compression on by default for IPoIB

   - eth: ena:
      - fix toeplitz initial hash key value
      - account for the number of XDP-processed bytes in interface stats
      - fix rx_copybreak value update

  Misc:

   - ethtool: harden phy stat handling against buggy drivers

   - docs: netdev: convert maintainer's doc from FAQ to a normal
     document"

* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  caif: fix memory leak in cfctrl_linkup_request()
  inet: control sockets should not use current thread task_frag
  net/ulp: prevent ULP without clone op from entering the LISTEN status
  qed: allow sleep in qed_mcp_trace_dump()
  MAINTAINERS: Update maintainers for ptp_vmw driver
  usb: rndis_host: Secure rndis_query check against int overflow
  net: dpaa: Fix dtsec check for PCS availability
  octeontx2-pf: Fix lmtst ID used in aura free
  drivers/net/bonding/bond_3ad: return when there's no aggregator
  netfilter: ipset: Rework long task execution when adding/deleting entries
  netfilter: ipset: fix hash:net,port,net hang with /0 subnet
  net: sparx5: Fix reading of the MAC address
  vxlan: Fix memory leaks in error path
  net: sched: htb: fix htb_classify() kernel-doc
  net: sched: cbq: dont intepret cls results when asked to drop
  net: sched: atm: dont intepret cls results when asked to drop
  dt-bindings: net: marvell,orion-mdio: Fix examples
  dt-bindings: net: sun8i-emac: Add phy-supply property
  net: ipa: use proper endpoint mask for suspend
  selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
  ...

Merge tag 'gpio-fixes-for-v6.2-rc3' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"A reference leak fix, two fixes for using uninitialized variables and
  more drivers converted to using immutable irqchips:

   - fix a reference leak in gpio-sifive

   - fix a potential use of an uninitialized variable in core gpiolib

   - fix a potential use of an uninitialized variable in gpio-pca953x

   - make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd
     and gpio-sprd"

* tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sifive: Fix refcount leak in sifive_gpio_probe
  gpio: sprd: Make the irqchip immutable
  gpio: pmic-eic-sprd: Make the irqchip immutable
  gpio: eic-sprd: Make the irqchip immutable
  gpio: pca953x: avoid to use uninitialized value pinctrl
  gpiolib: Fix using uninitialized lookup-flags on ACPI platforms

Merge tag 'fbdev-for-6.2-rc3' of git://git./linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

- Fix Matrox G200eW initialization failure

- Fix build failure of offb driver when built as module

- Optimize stack usage in omapfb

* tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: omapfb: avoid stack overflow warning
  fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
  fbdev: atyfb: use strscpy() to instead of strncpy()
  fbdev: omapfb: use strscpy() to instead of strncpy()
  fbdev: make offb driver tristate

Merge branch 'add-support-for-qsgmii-mode-for-j721e-cpsw9g-to-am65-cpsw-driver'

Siddharth Vadapalli says:

====================
Add support for QSGMII mode for J721e CPSW9G to am65-cpsw driver

Add compatible to am65-cpsw driver for J721e CPSW9G, which contains 8
external ports and 1 internal host port.

Add support to power on and power off the SERDES PHY which is used by the
CPSW MAC.

=========
Changelog
=========
v5:
https://lore.kernel.org/r/20221109042203.375042-1-s-vadapalli@ti.com/
v4:
https://lore.kernel.org/r/20221108080606.124596-1-s-vadapalli@ti.com/
v3:
https://lore.kernel.org/r/20221026090957.180592-1-s-vadapalli@ti.com/
v2:
https://lore.kernel.org/r/20221018085810.151327-1-s-vadapalli@ti.com/
v1:
https://lore.kernel.org/r/20220914095053.189851-1-s-vadapalli@ti.com/
====================

Link: https://lore.kernel.org/r/20230104103432.1126403-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: ti: am65-cpsw: Add support for SERDES configuration

Use PHY framework APIs to initialize the SERDES PHY connected to CPSW MAC.

Define the functions am65_cpsw_disable_phy(), am65_cpsw_enable_phy(),
am65_cpsw_disable_serdes_phy() and am65_cpsw_enable_serdes_phy().

Add new member "serdes_phy" to struct "am65_cpsw_slave_data" to store the
SERDES PHY for each port, if it exists. Use it later while disabling the
SERDES PHY for each port.

Power on and initialize the SerDes PHY in am65_cpsw_nuss_init_slave_ports()
by invoking am65_cpsw_enable_serdes_phy().

Power off the SerDes PHY in am65_cpsw_nuss_remove() by invoking
am65_cpsw_disable_serdes_phy().

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: ti: am65-cpsw: Enable QSGMII mode for J721e CPSW9G

CPSW9G in J721e supports additional modes like QSGMII.
Add new compatible for J721e in am65-cpsw driver.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J721e CPSW9G support

Update bindings for TI K3 J721e SoC which contains 9 ports (8 external
ports) CPSW9G module and add compatible for it.

Changes made:
    - Add new compatible ti,j721e-cpswxg-nuss for CPSW9G.
    - Extend pattern properties for new compatible.
    - Change maximum number of CPSW ports to 8 for new compatible.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fbdev: omapfb: avoid stack overflow warning

The dsi_irq_stats structure is a little too big to fit on the
stack of a 32-bit task, depending on the specific gcc options:

fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs':
fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Since this is only a debugfs file, performance is not critical,
so just dynamically allocate it, and print an error message
in there in place of a failure code when the allocation fails.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>

caif: fix memory leak in cfctrl_linkup_request()

When linktype is unknown or kzalloc failed in cfctrl_linkup_request(),
pkt is not released. Add release process to error path.

Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack")
Fixes: 8d545c8f958f ("caif: Disconnect without waiting for response")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

inet: control sockets should not use current thread task_frag

Because ICMP handlers run from softirq contexts,
they must not use current thread task_frag.

Previously, all sockets allocated by inet_ctl_sock_create()
would use the per-socket page fragment, with no chance of
recursion.

Fixes: 98123866fcf3 ("Treewide: Stop corrupting socket's task_frag")
Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/ulp: prevent ULP without clone op from entering the LISTEN status

When an ULP-enabled socket enters the LISTEN status, the listener ULP data
pointer is copied inside the child/accepted sockets by sk_clone_lock().

The relevant ULP can take care of de-duplicating the context pointer via
the clone() operation, but only MPTCP and SMC implement such op.

Other ULPs may end-up with a double-free at socket disposal time.

We can't simply clear the ULP data at clone time, as TLS replaces the
socket ops with custom ones assuming a valid TLS ULP context is
available.

Instead completely prevent clone-less ULP sockets from entering the
LISTEN status.

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Reported-by: slipper <slipper.alive@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: allow sleep in qed_mcp_trace_dump()

By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop
that can run 500K times, so calls to qed_mcp_nvm_rd_cmd()
may block the current thread for over 5s.
We observed thread scheduling delays over 700ms in production,
with stacktraces pointing to this code as the culprit.

qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted.
It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd().
Add a "can sleep" parameter to qed_find_nvram_image() and
qed_nvram_read() so they can sleep during qed_mcp_trace_dump().
qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(),
called only by qed_mcp_trace_dump(), allow these functions to sleep.
I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep,
so keep b_can_sleep set to false when it calls these functions.

An example stacktrace from a custom warning we added to the kernel
showing a thread that has not scheduled despite long needing resched:
[ 2745.362925,17] ------------[ cut here ]------------
[ 2745.362941,17] WARNING: CPU: 23 PID: 5640 at arch/x86/kernel/irq.c:233 do_IRQ+0x15e/0x1a0()
[ 2745.362946,17] Thread not rescheduled for 744 ms after irq 99
[ 2745.362956,17] Modules linked in: ...
[ 2745.363339,17] CPU: 23 PID: 5640 Comm: lldpd Tainted: P           O    4.4.182+ #202104120910+6d1da174272d.61x
[ 2745.363343,17] Hardware name: FOXCONN MercuryB/Quicksilver Controller, BIOS H11P1N09 07/08/2020
[ 2745.363346,17]  0000000000000000 ffff885ec07c3ed8 ffffffff8131eb2f ffff885ec07c3f20
[ 2745.363358,17]  ffffffff81d14f64 ffff885ec07c3f10 ffffffff81072ac2 ffff88be98ed0000
[ 2745.363369,17]  0000000000000063 0000000000000174 0000000000000074 0000000000000000
[ 2745.363379,17] Call Trace:
[ 2745.363382,17]  <IRQ>  [<ffffffff8131eb2f>] dump_stack+0x8e/0xcf
[ 2745.363393,17]  [<ffffffff81072ac2>] warn_slowpath_common+0x82/0xc0
[ 2745.363398,17]  [<ffffffff81072b4c>] warn_slowpath_fmt+0x4c/0x50
[ 2745.363404,17]  [<ffffffff810d5a8e>] ? rcu_irq_exit+0xae/0xc0
[ 2745.363408,17]  [<ffffffff817c99fe>] do_IRQ+0x15e/0x1a0
[ 2745.363413,17]  [<ffffffff817c7ac9>] common_interrupt+0x89/0x89
[ 2745.363416,17]  <EOI>  [<ffffffff8132aa74>] ? delay_tsc+0x24/0x50
[ 2745.363425,17]  [<ffffffff8132aa04>] __udelay+0x34/0x40
[ 2745.363457,17]  [<ffffffffa04d45ff>] qed_mcp_cmd_and_union+0x36f/0x7d0 [qed]
[ 2745.363473,17]  [<ffffffffa04d5ced>] qed_mcp_nvm_rd_cmd+0x4d/0x90 [qed]
[ 2745.363490,17]  [<ffffffffa04e1dc7>] qed_mcp_trace_dump+0x4a7/0x630 [qed]
[ 2745.363504,17]  [<ffffffffa04e2556>] ? qed_fw_asserts_dump+0x1d6/0x1f0 [qed]
[ 2745.363520,17]  [<ffffffffa04e4ea7>] qed_dbg_mcp_trace_get_dump_buf_size+0x37/0x80 [qed]
[ 2745.363536,17]  [<ffffffffa04ea881>] qed_dbg_feature_size+0x61/0xa0 [qed]
[ 2745.363551,17]  [<ffffffffa04eb427>] qed_dbg_all_data_size+0x247/0x260 [qed]
[ 2745.363560,17]  [<ffffffffa0482c10>] qede_get_regs_len+0x30/0x40 [qede]
[ 2745.363566,17]  [<ffffffff816c9783>] ethtool_get_drvinfo+0xe3/0x190
[ 2745.363570,17]  [<ffffffff816cc152>] dev_ethtool+0x1362/0x2140
[ 2745.363575,17]  [<ffffffff8109bcc6>] ? finish_task_switch+0x76/0x260
[ 2745.363580,17]  [<ffffffff817c2116>] ? __schedule+0x3c6/0x9d0
[ 2745.363585,17]  [<ffffffff810dbd50>] ? hrtimer_start_range_ns+0x1d0/0x370
[ 2745.363589,17]  [<ffffffff816c1e5b>] ? dev_get_by_name_rcu+0x6b/0x90
[ 2745.363594,17]  [<ffffffff816de6a8>] dev_ioctl+0xe8/0x710
[ 2745.363599,17]  [<ffffffff816a58a8>] sock_do_ioctl+0x48/0x60
[ 2745.363603,17]  [<ffffffff816a5d87>] sock_ioctl+0x1c7/0x280
[ 2745.363608,17]  [<ffffffff8111f393>] ? seccomp_phase1+0x83/0x220
[ 2745.363612,17]  [<ffffffff811e3503>] do_vfs_ioctl+0x2b3/0x4e0
[ 2745.363616,17]  [<ffffffff811e3771>] SyS_ioctl+0x41/0x70
[ 2745.363619,17]  [<ffffffff817c6ffe>] entry_SYSCALL_64_fastpath+0x1e/0x79
[ 2745.363622,17] ---[ end trace f6954aa440266421 ]---

Fixes: c965db4446291 ("qed: Add support for debug data collection")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Acked-by: Alok Prasad <palok@marvell.com>
Link: https://lore.kernel.org/r/20230103233021.1457646-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-04

We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).

The main changes are:

1) Fixes, improvements and refactoring of parts of BPF verifier's
   state equivalence checks, from Andrii Nakryiko.

2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
   around padding handling and enums, also from Andrii Nakryiko.

3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
  support decap on GRE tunnel devices not operating in collect metadata,
  from Christian Ehrig.

4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
   from Dave Marchevsky.

5) Remove the need for trace_printk_lock for bpf_trace_printk
   and bpf_trace_vprintk helpers, from Jiri Olsa.

6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
   from Maryam Tahhan.

7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.

8) Bigger batch of improvements to BPF tracing code samples,
   from Daniel T. Lee.

9) Add LoongArch support to libbpf's bpf_tracing helper header,
   from Hengqi Chen.

10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
    from Khem Raj.

11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
    from Martin KaFai Lau.

12) Use pkg-config to locate libelf for resolve_btfids build,
    from Shen Jiamin.

13) Various libbpf improvements around API documentation and errno
    handling, from Xin Liu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  libbpf: Return -ENODATA for missing btf section
  libbpf: Add LoongArch support to bpf_tracing.h
  libbpf: Restore errno after pr_warn.
  libbpf: Added the description of some API functions
  libbpf: Fix invalid return address register in s390
  samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
  samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
  samples/bpf: Change _kern suffix to .bpf with syscall tracing program
  samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
  samples/bpf: Use kyscall instead of kprobe in syscall tracing program
  bpf: rename list_head -> graph_root in field info types
  libbpf: fix errno is overwritten after being closed.
  bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
  bpf: perform byte-by-byte comparison only when necessary in regsafe()
  bpf: reject non-exact register type matches in regsafe()
  bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
  bpf: reorganize struct bpf_reg_state fields
  bpf: teach refsafe() to take into account ID remapping
  bpf: Remove unused field initialization in bpf's ctl_table
  selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
  ...
====================

Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
bpf 2023-01-04

We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 5 files changed, 112 insertions(+), 18 deletions(-).

The main changes are:

1) Always use maximal size for copy_array in the verifier to fix
   KASAN tracking, from Kees.

2) Fix bpf task iterator walking through dead tasks, from Kui-Feng.

3) Make sure livepatch and bpf fexit can coexist, from Chuang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Always use maximal size for copy_array()
  selftests/bpf: add a test for iter/task_vma for short-lived processes
  bpf: keep a reference to the mm, in case the task is dead.
  selftests/bpf: Temporarily disable part of btf_dump:var_data test.
  bpf: Fix panic due to wrong pageattr of im->image
====================

Link: https://lore.kernel.org/r/20230104215500.79435-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
"Mostly fixes all over the place, a couple of cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits)
  virtio_blk: Fix signedness bug in virtblk_prep_rq()
  vdpa_sim_net: should not drop the multicast/broadcast packet
  vdpasim: fix memory leak when freeing IOTLBs
  vdpa: conditionally fill max max queue pair for stats
  vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
  vduse: Validate vq_num in vduse_validate_config()
  tools/virtio: remove smp_read_barrier_depends()
  tools/virtio: remove stray characters
  vhost_vdpa: fix the crash in unmap a large memory
  virtio: Implementing attribute show with sysfs_emit
  virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
  tools/virtio: Variable type completion
  vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
  virtio_blk: use UINT_MAX instead of -1U
  vhost-vdpa: fix an iotlb memory leak
  vhost: fix range used in translate_desc()
  vringh: fix range used in iotlb_translate()
  vhost/vsock: Fix error handling in vhost_vsock_init()
  vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
  tools: Delete the unneeded semicolon after curly braces
  ...

Merge tag 'x86-urgent-2023-01-04' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:
"Fix a double-free bug, a binutils warning, a header namespace clash
  and a bug in ib_prctl_set()"

* tag 'x86-urgent-2023-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Flush IBP in ib_prctl_set()
  x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type
  x86/asm: Fix an assembler warning with current binutils
  x86/kexec: Fix double-free of elf header buffer

Merge tag 'f2fs-fix-6.2-rc3' of git://git./linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:

- fix a null pointer dereference in f2fs_issue_flush, which occurs by
   the combination of mount/remount options.

- fix a bug in per-block age-based extent_cache newly introduced in
   6.2-rc1, which reported a wrong age information in extent_cache.

- fix a kernel panic if extent_tree was not created, which was caught
   by a wrong BUG_ON

* tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: let's avoid panic if extent_tree is not created
  f2fs: should use a temp extent_info for lookup
  f2fs: don't mix to use union values in extent_info
  f2fs: initialize extent_cache parameter
  f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()

Merge tag 'nfsd-6.2-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a filecache UAF during NFSD shutdown

- Avoid exposing automounted mounts on NFS re-exports

* tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix handling of readdir in v4root vs. mount upcall timeout
nfsd: shut down the NFSv4 state objects before the filecache

x86/bugs: Flush IBP in ib_prctl_set()

We missed the window between the TIF flag update and the next reschedule.

Signed-off-by: Rodrigo Branco <bsdaemon@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-03 (igc)

Muhammad Husaini Zulkifli says:

Improvements to the Time-Sensitive Networking (TSN) Qbv Scheduling
capabilities were included in this patch series for I226 SKU.

An overview of each patch series is given below:

Patch 1: To enable basetime scheduling in the future, remove the existing
restriction for i226 stepping while maintain the restriction for i225.
Patch 2: Remove the restriction which require a controller reset when
setting the basetime register for new i226 steps and enable the second
GCL configuration.
Patch 3: Remove the power reset adapter during disabling the tsn config.
---
Patches remaining from initial PR:
https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/

after sending net patches:
https://lore.kernel.org/netdev/20221215230758.3595578-1-anthony.l.nguyen@intel.com/

Note: patch 3 is an additional patch from the initial PR.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Update maintainers for ptp_vmw driver

Vivek has decided to transfer the maintainership of the VMware virtual
PTP clock driver (ptp_vmw) to Srivatsa and Deep. Update the
MAINTAINERS file to reflect this change, and also add Alexey as a
reviewer for the driver.

Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Acked-by: Vivek Thampi <vivek@vivekthampi.com>
Acked-by: Deep Shah <sdeep@vmware.com>
Acked-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

libbpf: Return -ENODATA for missing btf section

As discussed before, return -ENODATA (No data available) would be more
meaningful than ENOENT (No such file or directory).

Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221231151436.6541-1-changbin.du@gmail.com

x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type

Both <linux/mmiotrace.h> and <asm/insn-eval.h> define various MMIO_ enum constants,
whose namespace overlaps.

Rename the <asm/insn-eval.h> ones to have a INSN_ prefix, so that the headers can be
used from the same source file.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230101162910.710293-2-Jason@zx2c4.com

igc: Remove reset adapter task for i226 during disable tsn config

I225 have limitation when programming the BaseTime register which required
a power cycle of the controller. This limitation already lifted in I226.
This patch removes the restriction so that when user configure/remove any
TSN mode, it would not go into power cycle reset adapter.

How to test:

Schedule any gate control list configuration or delete it.

Example:

1)

BASE_TIME=$(date +%s%N)
tc qdisc replace dev $interface_name parent root handle 100 taprio \
    num_tc 4 \
    map 3 1 0 2 3 3 3 3 3 3 3 3 3 3 3 3 \
    queues 1@0 1@1 1@2 1@3 \
    base-time $BASE_TIME \
    sched-entry S 0F 1000000 \
    flags 0x2

2) tc qdisc del dev $intername_name root

Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: enable Qbv configuration for 2nd GCL

Make reset task only executes for i225 and Qbv disabling to allow
i226 configure for 2nd GCL without resetting the adapter.

In i226, Tx won't hang if there is a GCL is already running, so in
this case we don't need to set FutScdDis bit.

Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: remove I226 Qbv BaseTime restriction

Remove the Qbv BaseTime restriction for I226 so that the BaseTime can be
scheduled to the future time. A new register bit of Tx Qav Control
(Bit-7: FutScdDis) was introduced to allow I226 scheduling future time as
Qbv BaseTime and not having the Tx hang timeout issue.

Besides, according to datasheet section 7.5.2.9.3.3, FutScdDis bit has to
be configured first before the cycle time and base time.

Indeed the FutScdDis bit is only active on re-configuration, thus we have
to set the BASET_L to zero and then only set it to the desired value.

Please also note that the Qbv configuration flow is moved around based on
the Qbv programming guideline that is documented in the latest datasheet.

Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f2fs: let's avoid panic if extent_tree is not created

This patch avoids the below panic.

pc : __lookup_extent_tree+0xd8/0x760
lr : f2fs_do_write_data_page+0x104/0x87c
sp : ffffffc010cbb3c0
x29: ffffffc010cbb3e0 x28: 0000000000000000
x27: ffffff8803e7f020 x26: ffffff8803e7ed40
x25: ffffff8803e7f020 x24: ffffffc010cbb460
x23: ffffffc010cbb480 x22: 0000000000000000
x21: 0000000000000000 x20: ffffffff22e90900
x19: 0000000000000000 x18: ffffffc010c5d080
x17: 0000000000000000 x16: 0000000000000020
x15: ffffffdb1acdbb88 x14: ffffff888759e2b0
x13: 0000000000000000 x12: ffffff802da49000
x11: 000000000a001200 x10: ffffff8803e7ed40
x9 : ffffff8023195800 x8 : ffffff802da49078
x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0000000000000006 x4 : ffffffc010cbba28
x3 : 0000000000000000 x2 : ffffffc010cbb480
x1 : 0000000000000000 x0 : ffffff8803e7ed40
Call trace:
__lookup_extent_tree+0xd8/0x760
f2fs_do_write_data_page+0x104/0x87c
f2fs_write_single_data_page+0x420/0xb60
f2fs_write_cache_pages+0x418/0xb1c
__f2fs_write_data_pages+0x428/0x58c
f2fs_write_data_pages+0x30/0x40
do_writepages+0x88/0x190
__writeback_single_inode+0x48/0x448
writeback_sb_inodes+0x468/0x9e8
__writeback_inodes_wb+0xb8/0x2a4
wb_writeback+0x33c/0x740
wb_do_writeback+0x2b4/0x400
wb_workfn+0xe4/0x34c
process_one_work+0x24c/0x5bc
worker_thread+0x3e8/0xa50
kthread+0x150/0x1b4

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should use a temp extent_info for lookup

Otherwise, __lookup_extent_tree() will override the given extent_info which will
be used by caller.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't mix to use union values in extent_info

Let's explicitly use the defined values in block_age case only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: initialize extent_cache parameter

This can avoid confusing tracepoint values.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()

With below two cases, it will cause NULL pointer dereference when
accessing SM_I(sbi)->fcc_info in f2fs_issue_flush().

a) If kthread_run() fails in f2fs_create_flush_cmd_control(), it will
release SM_I(sbi)->fcc_info,

- mount -o noflush_merge /dev/vda /mnt/f2fs
- mount -o remount,flush_merge /dev/vda /mnt/f2fs -- kthread_run() fails
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync

b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below
testcase,

- mount -o ro /dev/vda /mnt/f2fs
- mount -o rw,remount /dev/vda /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync

In order to fix this issue, let change as below:
- fix error path handling in f2fs_create_flush_cmd_control().
- allocate SM_I(sbi)->fcc_info even if readonly is on.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

x86/asm: Fix an assembler warning with current binutils

Fix a warning: "found `movsd'; assuming `movsl' was meant"

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org

libbpf: Add LoongArch support to bpf_tracing.h

Add PT_REGS macros for LoongArch ([0]).

[0]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com

usb: rndis_host: Secure rndis_query check against int overflow

Variables off and len typed as uint32 in rndis_query function
are controlled by incoming RNDIS response message thus their
value may be manipulated. Setting off to a unexpectetly large
value will cause the sum with len and 8 to overflow and pass
the implemented validation step. Consequently the response
pointer will be referring to a location past the expected
buffer boundaries allowing information leakage e.g. via
RNDIS_OID_802_3_PERMANENT_ADDRESS OID.

Fixes: ddda08624013 ("USB: rndis_host, various cleanups")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa: Fix dtsec check for PCS availability

We want to fail if the PCS is not available, not if it is available. Fix
this condition.

Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink")
Reported-by: Christian Zigotzky <info@xenosoft.de>
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix lmtst ID used in aura free

Current code uses per_cpu pointer to get the lmtst_id mapped to
the core on which aura_free() is executed. Using per_cpu pointer
without preemption disable causing mismatch between lmtst_id and
core on which pointer gets freed. This patch fixes the issue by
disabling preemption around aura_free.

Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/bonding/bond_3ad: return when there's no aggregator

Otherwise we would dereference a NULL aggregator pointer when calling
__set_agg_ports_ready on the line below.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
   Reported by static analysis tooling, patch from Florian Westphal.

2) Missing set type checks in nf_tables: Validate that set declaration
   matches the an existing set type, otherwise bail out with EEXIST.
   Currently, nf_tables silently accepts the re-declaration with a
   different type but it bails out later with EINVAL when the user adds
   entries to the set. This fix is relatively large because it requires
   two preparation patches that are included in this batch.

3) Do not ignore updates of timeout and gc_interval parameters in
   existing sets.

4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
   ipset. Except hash:net,port,net and hash:net,iface, the set types don't
   support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
   a special handling in hash:net,port,net which was missing (hash:net,iface
   was not affected by this bug), from Jozsef Kadlecsik.

5) When adding/deleting large number of elements in one step in ipset,
   it can take a reasonable amount of time and can result in soft lockup
   errors. This patch is a complete rework of the previous version in order
   to use a smaller internal batch limit and at the same time removing
   the external hard limit to add arbitrary number of elements in one step.
   Also from Jozsef Kadlecsik.

Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'for-6.2-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"First batch of regression and regular fixes:

   - regressions:
       - fix error handling after conversion to qstr for paths
       - fix raid56/scrub recovery caused by uninitialized variable
         after conversion to error bitmaps
       - restore qgroup backref lookup behaviour after recent
         refactoring
       - fix leak of device lists at module exit time

   - fix resolving backrefs for inline extent followed by prealloc

   - reset defrag ioctl buffer on memory allocation error"

* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix fscrypt name leak after failure to join log transaction
  btrfs: scrub: fix uninitialized return value in recover_scrub_rbio
  btrfs: fix resolving backrefs for inline extent followed by prealloc
  btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
  btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup
  btrfs: fix leak of fs devices after removing btrfs module
  btrfs: fix an error handling path in btrfs_defrag_leaves()
  btrfs: fix an error handling path in btrfs_rename()

fs/ntfs3: don't hold ni_lock when calling truncate_setsize()

syzbot is reporting hung task at do_user_addr_fault() [1], for there is
a silent deadlock between PG_locked bit and ni_lock lock.

Since filemap_update_page() calls filemap_read_folio() after calling
folio_trylock() which will set PG_locked bit, ntfs_truncate() must not
call truncate_setsize() which will wait for PG_locked bit to be cleared
when holding ni_lock lock.

Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/
Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae
Reported-by: syzbot <syzbot+bed15dbf10294aa4f2ae@syzkaller.appspotmail.com>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86/kexec: Fix double-free of elf header buffer

After

b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"),

freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.

Drop the superfluous vfree() call at the error path of
crash_load_segments().

Fixes: b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de

nfsd: fix handling of readdir in v4root vs. mount upcall timeout

If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>

fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB

Commit 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to
the same as vbG200 to avoid black screen") accidently decreases the
maximum memory size for the Matrox G200eW (102b:0532) from 8 MB to 1 MB
by missing one zero. This caused the driver initialization to fail with
the messages below, as the minimum required VRAM size is 2 MB:

     [    9.436420] matroxfb: Matrox MGA-G200eW (PCI) detected
     [    9.444502] matroxfb: cannot determine memory size
     [    9.449316] matroxfb: probe of 0000:0a:03.0 failed with error -1

So, add the missing 0 to make it the intended 16 MB. Successfully tested on
the Dell PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013, that the warning is
gone.

While at it, add a leading 0 to the maxdisplayable entry, so it’s aligned
properly. The value could probably also be increased from 8 MB to 16 MB, as
the G200 uses the same values, but I have not checked any datasheet.

Note, matroxfb is obsolete and superseded by the maintained DRM driver
mga200, which is used by default on most systems where both drivers are
available. Therefore, on most systems it was only a cosmetic issue.

Fixes: 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen")
Link: https://lore.kernel.org/linux-fbdev/972999d3-b75d-5680-fcef-6e6905c52ac5@suse.de/T/#mb6953a9995ebd18acc8552f99d6db39787aec775
Cc: it+linux-fbdev@molgen.mpg.de
Cc: Z. Liu <liuzx@knownsec.com>
Cc: Rich Felker <dalias@libc.org>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Helge Deller <deller@gmx.de>

netfilter: ipset: Rework long task execution when adding/deleting entries

When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.

The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: fix hash:net,port,net hang with /0 subnet

The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch
5f7b51bf09baca8e brings it out earlier.

Handle /0 subnets properly in hash:net,port,net set types.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>