David S. Miller [Wed, 17 May 2023 08:27:32 +0000 (09:27 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: support dynamic interrupt allocation
Piotr Raczynski says:
This patchset reimplements MSIX interrupt allocation logic to allow dynamic
interrupt allocation after MSIX has been initially enabled. This allows
current and future features to allocate and free interrupts as needed and
will help to drastically decrease number of initially preallocated
interrupts (even down to the API hard limit of 1). Although this patchset
does not change behavior in terms of actual number of allocated interrupts
during probe, it will be subject to change.
First few patches prepares to introduce dynamic allocation by moving
interrupt allocation code to separate file and update allocation API used
in the driver to the currently preferred one.
Due to the current contract between ice and irdma driver which is directly
accessing msix entries allocated by ice driver, even after moving away from
older pci_enable_msix_range function, still keep msix_entries array for
irdma use.
Next patches refactors and removes redundant code from SRIOV related logic
as it also make it easier to move away from static allocation scheme.
Last patches actually enables dynamic allocation of MSIX interrupts. First,
introduce functions to allocate and free interrupts individually. This sets
ground for the rest of the changes even if that patch still allocates the
interrupts from the preallocated pool. Since this patch starts to keep
interrupt details in ice_q_vector structure we can get rid of functions
that calculates base vector number and register offset for the interrupt
as it is equal to the interrupt index. Only keep separate register offset
functions for the VF VSIs.
Next, replace homegrown interrupt tracker with much simpler xarray based
approach. As new API always allocate interrupts one by one, also track
interrupts in the same manner.
Lastly, extend the interrupt tracker to deal both with preallocated and
dynamically allocated vectors and use pci_msix_alloc_irq_at and
pci_msix_free_irq functions. Since not all architecture supports dynamic
allocation, check it before trying to allocate a new interrupt.
As previously mentioned, this patchset does not change number of initially
allocated interrupts during init phase but now it can and will likely be
changed.
Patch 1-3 -> move code around and use newer API
Patch 4-5 -> refactor and remove redundant SRIOV code
Patch 6 -> allocate every interrupt individually
Patch 7 -> replace homegrown interrupt tracker with xarray
Patch 8 -> allow dynamic interrupt allocation
---
v2:
Patch 4
- simplify ice_vsi_setup_vector_base and account for num_avail_sw_msix
Patch 8
- prevent q_vector leak in case vf ctrl VSI error
v1: https://lore.kernel.org/netdev/
20230509170048.2235678-1-anthony.l.nguyen@intel.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Subbaraya Sundeep [Tue, 16 May 2023 11:40:31 +0000 (17:10 +0530)]
octeontx2-pf: mcs: Support VLAN in clear text
Detect whether macsec secy is running on top of VLAN
which implies transmitting VLAN tag in clear text before
macsec SecTag. In this case configure hardware to insert
SecTag after VLAN tag.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuya Tajima [Mon, 15 May 2023 15:34:27 +0000 (15:34 +0000)]
seg6: Cleanup duplicates of skb_dst_drop calls
In processing IPv6 segment routing header (SRH), several functions call
skb_dst_drop before ip6_route_input. However, ip6_route_input calls
skb_dst_drop within it, so there is no need to call skb_dst_drop in advance.
Signed-off-by: Yuya Tajima <yuya.tajimaa@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 May 2023 07:38:42 +0000 (08:38 +0100)]
Merge branch 'tcp-io_uring-zc-opts'
Merge branch 'tcp-io_uring-zc-opts'
Pavel Begunkov says:
====================
minor tcp io_uring zc optimisations
Patch 1 is a simple cleanup, patch 2 gives removes 2 atomics from the
io_uring zc TCP submission path, which yielded extra 0.5% for my
throughput CPU bound tests based on liburing/examples/send-zerocopy.c
====================
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Mon, 15 May 2023 16:06:37 +0000 (17:06 +0100)]
net/tcp: optimise io_uring zc ubuf refcounting
io_uring keeps a reference to ubuf_info during submission, so if
tcp_sendmsg_locked() sees msghdr::msg_ubuf in can be sure the buffer
will be kept alive and doesn't need to additionally pin it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Mon, 15 May 2023 16:06:36 +0000 (17:06 +0100)]
net/tcp: don't peek at tail for io_uring zc
Move tcp_write_queue_tail() to SOCK_ZEROCOPY specific flag as zerocopy
setup for msghdr->ubuf_info doesn't need to peek into the last request.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 May 2023 03:58:58 +0000 (20:58 -0700)]
Merge tag 'linux-can-next-for-6.5-
20230515' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2023-05-15
The 1st patch is by Ji-Ze Hong and adds support for the Fintek F81604
USB-CAN adapter.
Jiapeng Chong's patch removes unnecessary dev_err() functions from the
bxcan driver.
The next patch is by me an makes a CAN internal header file self
contained.
The remaining 19 patches are by Uwe Kleine-König, they all convert the
platform driver remove callback to return void.
* tag 'linux-can-next-for-6.5-
20230515' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
can: xilinx: Convert to platform remove callback returning void
can: ti_hecc: Convert to platform remove callback returning void
can: sun4i_can: Convert to platform remove callback returning void
can: softing: Convert to platform remove callback returning void
can: sja1000_platform: Convert to platform remove callback returning void
can: sja1000_isa: Convert to platform remove callback returning void
can: rcar: Convert to platform remove callback returning void
can: mscan: mpc5xxx_can: Convert to platform remove callback returning void
can: m_can: Convert to platform remove callback returning void
can: janz-ican3: Convert to platform remove callback returning void
can: ifi_canfd: Convert to platform remove callback returning void
can: grcan: Convert to platform remove callback returning void
can: flexcan: Convert to platform remove callback returning void
can: ctucanfd: Convert to platform remove callback returning void
can: length: make header self contained
can: cc770_platform: Convert to platform remove callback returning void
can: bxcan: Remove unnecessary print function dev_err()
can: cc770_isa: Convert to platform remove callback returning void
can: usb: f81604: add Fintek F81604 support
can: c_can: Convert to platform remove callback returning void
...
====================
Link: https://lore.kernel.org/r/20230515205759.1003118-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 17 May 2023 03:41:12 +0000 (20:41 -0700)]
Revert "net: Remove low_thresh in ip defrag"
This reverts commit
b2cbac9b9b28730e9e53be20b6cdf979d3b9f27e.
We have multiple reports of obvious breakage from this patch.
Reported-by: Ido Schimmel <idosch@idosch.org>
Link: https://lore.kernel.org/all/ZGIRWjNcfqI8yY8W@shredder/
Link: https://lore.kernel.org/all/CADJHv_sDK=0RrMA2FTZQV5fw7UQ+qY=HG21Wu5qb0V9vvx5w6A@mail.gmail.com/
Reported-by: syzbot+a5e719ac7c268e414c95@syzkaller.appspotmail.com
Reported-by: syzbot+a03fd670838d927d9cd8@syzkaller.appspotmail.com
Fixes:
b2cbac9b9b28 ("net: Remove low_thresh in ip defrag")
Link: https://lore.kernel.org/r/20230517034112.1261835-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 17 May 2023 02:50:05 +0000 (19:50 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-05-16
We've added 57 non-merge commits during the last 19 day(s) which contain
a total of 63 files changed, 3293 insertions(+), 690 deletions(-).
The main changes are:
1) Add precision propagation to verifier for subprogs and callbacks,
from Andrii Nakryiko.
2) Improve BPF's {g,s}setsockopt() handling with wrong option lengths,
from Stanislav Fomichev.
3) Utilize pahole v1.25 for the kernel's BTF generation to filter out
inconsistent function prototypes, from Alan Maguire.
4) Various dyn-pointer verifier improvements to relax restrictions,
from Daniel Rosenberg.
5) Add a new bpf_task_under_cgroup() kfunc for designated task,
from Feng Zhou.
6) Unblock tests for arm64 BPF CI after ftrace supporting direct call,
from Florent Revest.
7) Add XDP hint kfunc metadata for RX hash/timestamp for igc,
from Jesper Dangaard Brouer.
8) Add several new dyn-pointer kfuncs to ease their usability,
from Joanne Koong.
9) Add in-depth LRU internals description and dot function graph,
from Joe Stringer.
10) Fix KCSAN report on bpf_lru_list when accessing node->ref,
from Martin KaFai Lau.
11) Only dump unprivileged_bpf_disabled log warning upon write,
from Kui-Feng Lee.
12) Extend test_progs to directly passing allow/denylist file,
from Stephen Veiss.
13) Fix BPF trampoline memleak upon failure attaching to fentry,
from Yafang Shao.
14) Fix emitting struct bpf_tcp_sock type in vmlinux BTF,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (57 commits)
bpf: Fix memleak due to fentry attach failure
bpf: Remove bpf trampoline selector
bpf, arm64: Support struct arguments in the BPF trampoline
bpftool: JIT limited misreported as negative value on aarch64
bpf: fix calculation of subseq_idx during precision backtracking
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
bpf: Document EFAULT changes for sockopt
selftests/bpf: Correctly handle optlen > 4096
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
libbpf: fix offsetof() and container_of() to work with CO-RE
bpf: Address KCSAN report on bpf_lru_list
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
selftests/bpf: Accept mem from dynptr in helper funcs
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Check overflow in optional buffer
selftests/bpf: Test allowing NULL buffer in dynptr slice
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Add testcase for bpf_task_under_cgroup
bpf: Add bpf_task_under_cgroup() kfunc
...
====================
Link: https://lore.kernel.org/r/20230515225603.27027-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Piotr Raczynski [Mon, 15 May 2023 19:03:19 +0000 (21:03 +0200)]
ice: add dynamic interrupt allocation
Currently driver can only allocate interrupt vectors during init phase by
calling pci_alloc_irq_vectors. Change that and make use of new
pci_msix_alloc_irq_at/pci_msix_free_irq API and enable to allocate and free
more interrupts after MSIX has been enabled. Since not all platforms
supports dynamic allocation, check it with pci_msix_can_alloc_dyn.
Extend the tracker to keep track how many interrupts are allocated
initially so when all such vectors are already used, additional interrupts
are automatically allocated dynamically. Remember each interrupt allocation
method to then free appropriately. Since some features may require
interrupts allocated dynamically add appropriate VSI flag and take it into
account when allocating new interrupt.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:18 +0000 (21:03 +0200)]
ice: track interrupt vectors with xarray
Replace custom interrupt tracker with generic xarray data structure.
Remove all code responsible for searching for a new entry with xa_alloc,
which always tries to allocate at the lowes possible index. As a result
driver is always using a contiguous region of the MSIX vector table.
New tracker keeps ice_irq_entry entries in xarray as opaque for the rest
of the driver hiding the entry details from the caller.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:17 +0000 (21:03 +0200)]
ice: add individual interrupt allocation
Currently interrupt allocations, depending on a feature are distributed
in batches. Also, after allocation there is a series of operations that
distributes per irq settings through that batch of interrupts.
Although driver does not yet support dynamic interrupt allocation, keep
allocated interrupts in a pool and add allocation abstraction logic to
make code more flexible. Keep per interrupt information in the
ice_q_vector structure, which yields ice_vsi::base_vector redundant.
Also, as a result there are a few functions that can be removed.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:16 +0000 (21:03 +0200)]
ice: remove redundant SRIOV code
Remove redundant code from ice_get_max_valid_res_idx that has no effect.
ice_pf::irq_tracker is initialized during driver probe, there is no reason
to check it again. Also it is not possible for pf::sriov_base_vector to be
lower than the tracker length, remove WARN_ON that will never happen.
Get rid of ice_get_max_valid_res_idx helper function completely since it
can never return negative value.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:15 +0000 (21:03 +0200)]
ice: refactor VF control VSI interrupt handling
All VF control VSIs share the same interrupt vector. Currently, a helper
function dedicated for that directly sets ice_vsi::base_vector.
Use helper that returns pointer to first found VF control VSI instead.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:14 +0000 (21:03 +0200)]
ice: use preferred MSIX allocation api
Move away from using pci_enable_msix_range/pci_disable_msix and use
pci_alloc_irq_vectors/pci_free_irq_vectors instead.
As a result stop tracking msix_entries since with newer API entries are
handled by MSIX core. However, due to current design of communication
with RDMA driver which accesses ice_pf::msix_entries directly, keep
using the array just for RDMA driver use.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:13 +0000 (21:03 +0200)]
ice: use pci_irq_vector helper function
Currently, driver gets interrupt number directly from ice_pf::msix_entries
array. Use helper function dedicated to do just that.
While at it use a variable to store interrupt number in
ice_free_irq_msix_misc instead of calling the helper function twice.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Piotr Raczynski [Mon, 15 May 2023 19:03:12 +0000 (21:03 +0200)]
ice: move interrupt related code to separate file
Keep interrupt handling code in a dedicated file. This helps keep driver
structured better and prepares for more functionality added to this file.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paolo Abeni [Tue, 16 May 2023 13:39:09 +0000 (15:39 +0200)]
Merge branch 'spdx-conversion-for-bonding-8390-and-i825xx-drivers'
Bagas Sanjaya says:
====================
SPDX conversion for bonding, 8390, and i825xx drivers
This series is SPDX conversion for bonding, 8390, and i825xx driver
subsystems. It is splitted from v2 of my SPDX conversion series in
response to Didi's GPL full name fixes [1] to make it easily
digestible.
The conversion in this series is divided by each subsystem and by
license type.
[1]: https://lore.kernel.org/linux-spdx/
20230512100620.36807-1-bagasdotme@gmail.com/
====================
Link: https://lore.kernel.org/r/20230515060714.621952-1-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bagas Sanjaya [Mon, 15 May 2023 06:07:15 +0000 (13:07 +0700)]
net: ethernet: i825xx: sun3_8256: Add SPDX license identifier
The boilerplate reads that sun3_8256 driver is an extension to Linux
kernel core, hence add SPDX license identifier for GPL 2.0.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michael Hipp <hippm@informatik.uni-tuebingen.de>
Cc: Sam Creasey <sammy@sammy.net>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bagas Sanjaya [Mon, 15 May 2023 06:07:14 +0000 (13:07 +0700)]
net: ethernet: i825xx: Replace unversioned GPL (GPL 1.0) notice with SPDX identifier
Replace unversioned GPL boilerplate notice with corresponding SPDX
license identifier, which is GPL 1.0+.
Cc: Donald Becker <becker@scyld.com>
Cc: Richard Hirst <richard@sleepie.demon.co.uk>
Cc: Sam Creasey <sammy@sammy.net>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bagas Sanjaya [Mon, 15 May 2023 06:07:13 +0000 (13:07 +0700)]
net: ethernet: 8390: Replace GPL 2.0 boilerplate with SPDX identifier
The boilerplate refers to COPYING in the top-level directory of kernel
tree. Replace it with corresponding SPDX license identifier.
Cc: Donald Becker <becker@scyld.com>
Cc: Peter De Schrijver <p2@mind.be>
Cc: Topi Kanerva <topi@susanna.oulu.fi>
Cc: Alain Malek <Alain.Malek@cryogen.com>
Cc: Bruce Abbott <bhabbott@inhb.co.nz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Fontana <rfontana@redhat.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bagas Sanjaya [Mon, 15 May 2023 06:07:12 +0000 (13:07 +0700)]
net: ethernet: 8390: Convert unversioned GPL notice to SPDX license identifier
Replace boilerplate notice for unversioned GPL to SPDX tag for GPL 1.0+.
For ne2k-pci.c, only add SPDX tag and keep the boilerplate instead,
since the boilerplate notes that it must be preserved.
Cc: David A. Hinds <dahinds@users.sourceforge.net>
Cc: Donald Becker <becker@scyld.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Fontana <rfontana@redhat.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bagas Sanjaya [Mon, 15 May 2023 06:07:11 +0000 (13:07 +0700)]
net: bonding: Add SPDX identifier to remaining files
Previous batches of SPDX conversion missed bond_main.c and bonding_priv.h
because these files doesn't mention intended GPL version. Add SPDX identifier
to these files, assuming GPL 1.0+.
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yunsheng Lin [Mon, 15 May 2023 05:01:07 +0000 (13:01 +0800)]
net: skbuff: update comment about pfmemalloc propagating
__skb_fill_page_desc_noacc() is not doing any pfmemalloc
propagating, and yet it has a comment about that, commit
84ce071e38a6 ("net: introduce __skb_fill_page_desc_noacc")
may have accidentally moved it to __skb_fill_page_desc_noacc(),
so move it back to __skb_fill_page_desc() which is supposed
to be doing pfmemalloc propagating.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230515050107.46397-1-linyunsheng@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yafang Shao [Mon, 15 May 2023 13:08:47 +0000 (13:08 +0000)]
bpf: Fix memleak due to fentry attach failure
If it fails to attach fentry, the allocated bpf trampoline image will be
left in the system. That can be verified by checking /proc/kallsyms.
This meamleak can be verified by a simple bpf program as follows:
SEC("fentry/trap_init")
int fentry_run()
{
return 0;
}
It will fail to attach trap_init because this function is freed after
kernel init, and then we can find the trampoline image is left in the
system by checking /proc/kallsyms.
$ tail /proc/kallsyms
ffffffffc0613000 t bpf_trampoline_6442453466_1 [bpf]
ffffffffc06c3000 t bpf_trampoline_6442453466_1 [bpf]
$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
[2522] FUNC 'trap_init' type_id=119 linkage=static
$ echo $((
6442453466 & 0x7fffffff))
2522
Note that there are two left bpf trampoline images, that is because the
libbpf will fallback to raw tracepoint if -EINVAL is returned.
Fixes:
e21aa341785c ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/20230515130849.57502-2-laoar.shao@gmail.com
Marc Kleine-Budde [Mon, 15 May 2023 20:54:22 +0000 (22:54 +0200)]
Merge patch series "can: Convert to platform remove callback returning void"
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> says:
this series converts the drivers below drivers/net/can to the
.remove_new() callback of struct platform_driver(). The motivation is to
make the remove callback less prone for errors and wrong assumptions.
See commit
5c5a7680e67b ("platform: Provide a remove callback that
returns no value") for a more detailed rationale.
All drivers already returned zero unconditionally in their
.remove() callback, so converting them to .remove_new() is trivial.
Link: https://lore.kernel.org/r/20230512212725.143824-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:25 +0000 (23:27 +0200)]
can: xilinx: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-20-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:24 +0000 (23:27 +0200)]
can: ti_hecc: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-19-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:23 +0000 (23:27 +0200)]
can: sun4i_can: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20230512212725.143824-18-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:22 +0000 (23:27 +0200)]
can: softing: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-17-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:21 +0000 (23:27 +0200)]
can: sja1000_platform: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-16-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:20 +0000 (23:27 +0200)]
can: sja1000_isa: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-15-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:19 +0000 (23:27 +0200)]
can: rcar: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert these drivers from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230512212725.143824-14-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:18 +0000 (23:27 +0200)]
can: mscan: mpc5xxx_can: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-13-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:17 +0000 (23:27 +0200)]
can: m_can: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-12-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:16 +0000 (23:27 +0200)]
can: janz-ican3: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:15 +0000 (23:27 +0200)]
can: ifi_canfd: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-10-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:14 +0000 (23:27 +0200)]
can: grcan: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-9-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:13 +0000 (23:27 +0200)]
can: flexcan: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:12 +0000 (23:27 +0200)]
can: ctucanfd: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Link: https://lore.kernel.org/r/20230512212725.143824-7-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sun, 23 Apr 2023 11:06:39 +0000 (13:06 +0200)]
can: length: make header self contained
Include the headers that "can/length.h" depends on.
Fixes:
bdd2e413192d ("can: dev: move length related code into seperate file")
Link: https://lore.kernel.org/all/20230509122854.350426-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:11 +0000 (23:27 +0200)]
can: cc770_platform: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jiapeng Chong [Sat, 6 May 2023 08:07:25 +0000 (16:07 +0800)]
can: bxcan: Remove unnecessary print function dev_err()
The print function dev_err() is redundant because
platform_get_irq_byname() already prints an error.
./drivers/net/can/bxcan.c:970:2-9: line 970 is redundant because platform_get_irq() already prints an error.
./drivers/net/can/bxcan.c:964:2-9: line 964 is redundant because platform_get_irq() already prints an error.
./drivers/net/can/bxcan.c:958:2-9: line 958 is redundant because platform_get_irq() already prints an error.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4878
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230506080725.68401-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:10 +0000 (23:27 +0200)]
can: cc770_isa: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-5-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Ji-Ze Hong [Tue, 9 May 2023 07:38:21 +0000 (15:38 +0800)]
can: usb: f81604: add Fintek F81604 support
This patch adds support for Fintek USB to 2CAN controller.
Changelog:
v7: https://lore.kernel.org/all/
20230509073821.25289-1-peter_hong@fintek.com.tw
1. Fix consistency of coding style for "break" in f81604_register_urbs().
2. Remove goto statement in f81604_open().
v6: https://lore.kernel.org/all/
20230505022317.22417-1-peter_hong@fintek.com.tw
1. Remove non-used define and change constant mask to GENMASK().
2. Move some variables declaration from function start to block start.
3. Move some variables initization into declaration.
4. Change variable "id" in f81604_start_xmit() only for CAN ID usage.
v5: https://lore.kernel.org/all/
20230420024403.13830-1-peter_hong@fintek.com.tw
1. Change all u8 *buff to struct f81604_int_data/f81604_can_frame.
2. Change all netdev->dev_id to netdev->dev_port.
3. Remove over design for f81604_process_rx_packet(). This device only
report a frame at once, so the f81604_process_rx_packet() are reduced
to process 1 frame.
v4: https://lore.kernel.org/all/
20230413084253.1524-1-peter_hong@fintek.com.tw
1. Remove f81604_prepare_urbs/f81604_remove_urbs() and alloc URB/buffer
dynamically in f81604_register_urbs(), using "urbs_anchor" for manage
all rx/int URBs.
2. Add F81604 to MAINTAINERS list.
3. Change handle_clear_reg_work/handle_clear_overrun_work to single
clear_reg_work and using bitwise "clear_flags" to record it.
4. Move __f81604_set_termination in front of f81604_probe() to avoid
rarely racing condition.
5. Add __aligned to struct f81604_int_data / f81604_sff / f81604_eff.
6. Add aligned operations in f81604_start_xmit/f81604_process_rx_packet().
7. Change lots of CANBUS functions first parameter from struct usb_device*
to struct f81604_port_priv *priv. But remain f81604_write / f81604_read
/ f81604_update_bits() as struct usb_device* for
__f81604_set_termination() in probe() stage.
8. Simplify f81604_read_int_callback() and separate into
f81604_handle_tx / f81604_handle_can_bus_errors() functions.
v3: https://lore.kernel.org/all/
20230327051048.11589-1-peter_hong@fintek.com.tw
1. Change CAN clock to using MEGA units.
2. Remove USB set/get retry, only remain SJA1000 reset/operation retry.
3. Fix all numberic constant to define.
4. Add terminator control. (only 0 & 120 ohm)
5. Using struct data to represent INT/TX/RX endpoints data instead byte
arrays.
6. Error message reports changed from %d to %pe for mnemotechnic values.
7. Some bit operations are changed to FIELD_PREP().
8. Separate TX functions from f81604_read_int_callback().
9. cf->can_id |= CAN_ERR_CNT in f81604_read_int_callback to report valid
TX/RX error counts.
10. Move f81604_prepare_urbs/f81604_remove_urbs() from CAN open/close() to
USB probe/disconnect().
11. coding style refactoring.
v2: https://lore.kernel.org/all/
20230321081152.26510-1-peter_hong@fintek.com.tw
1. coding style refactoring.
2. some const number are defined to describe itself.
3. fix wrong usage for can_get_echo_skb() in f81604_write_bulk_callback().
v1: https://lore.kernel.org/all/
20230317093352.3979-1-peter_hong@fintek.com.tw
Signed-off-by: Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230509073821.25289-1-peter_hong@fintek.com.tw
[mkl: add changelog, fix printf format]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:09 +0000 (23:27 +0200)]
can: c_can: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:08 +0000 (23:27 +0200)]
can: bxcan: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230512212725.143824-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Uwe Kleine-König [Fri, 12 May 2023 21:27:07 +0000 (23:27 +0200)]
can: at91_can: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart from
emitting a warning) and this typically results in resource leaks. To improve
here there is a quest to make the remove callback return void. In the first
step of this quest all drivers are converted to .remove_new() which already
returns void. Eventually after all drivers are converted, .remove_new() is
renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230512212725.143824-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yafang Shao [Mon, 15 May 2023 13:08:48 +0000 (13:08 +0000)]
bpf: Remove bpf trampoline selector
After commit
e21aa341785c ("bpf: Fix fexit trampoline."), the selector is only
used to indicate how many times the bpf trampoline image are updated and been
displayed in the trampoline ksym name. After the trampoline is freed, the
selector will start from 0 again. So the selector is a useless value to the
user. We can remove it.
If the user want to check whether the bpf trampoline image has been updated
or not, the user can compare the address. Each time the trampoline image is
updated, the address will change consequently. Jiri also pointed out another
issue that perf is still using the old name "bpf_trampoline_%lu", so this
change can fix the issue in perf.
Fixes:
e21aa341785c ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com
Florent Revest [Thu, 11 May 2023 14:05:07 +0000 (16:05 +0200)]
bpf, arm64: Support struct arguments in the BPF trampoline
This extends the BPF trampoline JIT to support attachment to functions
that take small structures (up to 128bit) as argument. This is trivially
achieved by saving/restoring a number of "argument registers" rather
than a number of arguments.
The AAPCS64 section 6.8.2 describes the parameter passing ABI.
"Composite types" (like C structs) below 16 bytes (as enforced by the
BPF verifier) are provided as part of the 8 argument registers as
explained in the section C.12.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20230511140507.514888-1-revest@chromium.org
Alan Maguire [Fri, 12 May 2023 11:31:34 +0000 (12:31 +0100)]
bpftool: JIT limited misreported as negative value on aarch64
On aarch64, "bpftool feature" reports an incorrect BPF JIT limit:
$ sudo /sbin/bpftool feature
Scanning system configuration...
bpf() syscall restricted to privileged users
JIT compiler is enabled
JIT compiler hardening is disabled
JIT compiler kallsyms exports are enabled for root
skipping kernel config, can't open file: No such file or directory
Global memory limit for JIT compiler for unprivileged users is -
201326592 bytes
This is because /proc/sys/net/core/bpf_jit_limit reports
$ sudo cat /proc/sys/net/core/bpf_jit_limit
68169519595520
...and an int is assumed in read_procfs(). Change read_procfs()
to return a long to avoid negative value reporting.
Fixes:
7a4522bbef0c ("tools: bpftool: add probes for /proc/ eBPF parameters")
Reported-by: Nicky Veitch <nicky.veitch@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230512113134.58996-1-alan.maguire@oracle.com
Andrii Nakryiko [Mon, 15 May 2023 18:07:10 +0000 (11:07 -0700)]
bpf: fix calculation of subseq_idx during precision backtracking
Subsequent instruction index (subseq_idx) is an index of an instruction
that was verified/executed by verifier after the currently processed
instruction. It is maintained during precision backtracking processing
and is used to detect various subprog calling conditions.
This patch fixes the bug with incorrectly resetting subseq_idx to -1
when going from child state to parent state during backtracking. If we
don't maintain correct subseq_idx we can misidentify subprog calls
leading to precision tracking bugs.
One such case was triggered by test_global_funcs/global_func9 test where
global subprog call happened to be the very last instruction in parent
state, leading to subseq_idx==-1, triggering WARN_ONCE:
[ 36.045754] verifier backtracking bug
[ 36.045764] WARNING: CPU: 13 PID: 2073 at kernel/bpf/verifier.c:3503 __mark_chain_precision+0xcc6/0xde0
[ 36.046819] Modules linked in: aesni_intel(E) crypto_simd(E) cryptd(E) kvm_intel(E) kvm(E) irqbypass(E) i2c_piix4(E) serio_raw(E) i2c_core(E) crc32c_intel)
[ 36.048040] CPU: 13 PID: 2073 Comm: test_progs Tainted: G W OE 6.3.0-07976-g4d585f48ee6b-dirty #972
[ 36.048783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 36.049648] RIP: 0010:__mark_chain_precision+0xcc6/0xde0
[ 36.050038] Code: 3d 82 c6 05 bb 35 32 02 01 e8 66 21 ec ff 0f 0b b8 f2 ff ff ff e9 30 f5 ff ff 48 c7 c7 f3 61 3d 82 4c 89 0c 24 e8 4a 21 ec ff <0f> 0b 4c0
With the fix precision tracking across multiple states works correctly now:
mark_precise: frame0: last_idx 45 first_idx 38 subseq_idx -1
mark_precise: frame0: regs=r8 stack= before 44: (61) r7 = *(u32 *)(r10 -4)
mark_precise: frame0: regs=r8 stack= before 43: (85) call pc+41
mark_precise: frame0: regs=r8 stack= before 42: (07) r1 += -48
mark_precise: frame0: regs=r8 stack= before 41: (bf) r1 = r10
mark_precise: frame0: regs=r8 stack= before 40: (63) *(u32 *)(r10 -48) = r1
mark_precise: frame0: regs=r8 stack= before 39: (b4) w1 = 0
mark_precise: frame0: regs=r8 stack= before 38: (85) call pc+38
mark_precise: frame0: parent state regs=r8 stack=: R0_w=scalar() R1_w=map_value(off=4,ks=4,vs=8,imm=0) R6=1 R7_w=scalar() R8_r=P0 R10=fpm
mark_precise: frame0: last_idx 36 first_idx 28 subseq_idx 38
mark_precise: frame0: regs=r8 stack= before 36: (18) r1 = 0xffff888104f2ed14
mark_precise: frame0: regs=r8 stack= before 35: (85) call pc+33
mark_precise: frame0: regs=r8 stack= before 33: (18) r1 = 0xffff888104f2ed10
mark_precise: frame0: regs=r8 stack= before 32: (85) call pc+36
mark_precise: frame0: regs=r8 stack= before 31: (07) r1 += -4
mark_precise: frame0: regs=r8 stack= before 30: (bf) r1 = r10
mark_precise: frame0: regs=r8 stack= before 29: (63) *(u32 *)(r10 -4) = r7
mark_precise: frame0: regs=r8 stack= before 28: (4c) w7 |= w0
mark_precise: frame0: parent state regs=r8 stack=: R0_rw=scalar() R6=1 R7_rw=scalar() R8_rw=P0 R10=fp0 fp-48_r=mmmmmmmm
mark_precise: frame0: last_idx 27 first_idx 16 subseq_idx 28
mark_precise: frame0: regs=r8 stack= before 27: (85) call pc+31
mark_precise: frame0: regs=r8 stack= before 26: (b7) r1 = 0
mark_precise: frame0: regs=r8 stack= before 25: (b7) r8 = 0
Note how subseq_idx starts out as -1, then is preserved as 38 and then 28 as we
go up the parent state chain.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Fixes:
fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230515180710.1535018-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Dave Marchevsky [Wed, 10 May 2023 21:30:47 +0000 (14:30 -0700)]
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
For kfuncs like bpf_obj_drop and bpf_refcount_acquire - which take
user-defined types as input - the verifier needs to track the specific
type passed in when checking a particular kfunc call. This requires
tracking (btf, btf_id) tuple. In commit
7c50b1cb76ac
("bpf: Add bpf_refcount_acquire kfunc") I added an anonymous union with
inner structs named after the specific kfuncs tracking this information,
with the goal of making it more obvious which kfunc this data was being
tracked / expected to be tracked on behalf of.
In a recent series adding a new user of this tuple, Alexei mentioned
that he didn't like this union usage as it doesn't really help with
readability or bug-proofing ([0]). In an offline convo we agreed to
have the tuple be fields (arg_btf, arg_btf_id), with comments in
bpf_kfunc_call_arg_meta definition enumerating the uses of the fields by
kfunc-specific handling logic. Such a pattern is used by struct
bpf_reg_state without trouble.
Accordingly, this patch removes the anonymous union in favor of arg_btf
and arg_btf_id fields and comment enumerating their current uses. The
patch also removes struct btf_and_id, which was only being used by the
removed union's inner structs.
This is a mechanical change, existing linked_list and rbtree tests will
validate that correct (btf, btf_id) are being passed.
[0]: https://lore.kernel.org/bpf/
20230505021707.vlyiwy57vwxglbka@dhcp-172-26-102-232.dhcp.thefacebook.com
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230510213047.1633612-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Krzysztof Kozlowski [Sat, 13 May 2023 11:52:04 +0000 (13:52 +0200)]
nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()
If sock->service_name is NULL, the local variable
service_name_tlv_length will not be assigned by nfc_llcp_build_tlv(),
later leading to using value frmo the stack. Smatch warning:
net/nfc/llcp_commands.c:442 nfc_llcp_send_connect() error: uninitialized symbol 'service_name_tlv_length'.
Fixes:
de9e5aeb4f40 ("NFC: llcp: Fix usage of llcp_add_tlv()")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Mon, 15 May 2023 08:56:45 +0000 (16:56 +0800)]
octeontx2-pf: mcs: Remove unneeded semicolon
./drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c:242:2-3: Unneeded semicolon
./drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c:476:2-3: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4947
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anup Sharma [Sat, 13 May 2023 11:06:54 +0000 (16:36 +0530)]
net: ethernet: microchip: vcap: Remove extra semicolon
Remove the extra semicolon at end. Issue identified using
semicolon.cocci Coccinelle semantic patch.
drivers/net/ethernet/microchip/vcap/vcap_api.c:1124:3-4: Unneeded semicolon
drivers/net/ethernet/microchip/vcap/vcap_api.c:1165:3-4: Unneeded semicolon
drivers/net/ethernet/microchip/vcap/vcap_api.c:1239:3-4: Unneeded semicolon
drivers/net/ethernet/microchip/vcap/vcap_api.c:1287:3-4: Unneeded semicolon
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Changes:
V1 -> V2: Target tree included in the subject line.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 May 2023 08:31:08 +0000 (09:31 +0100)]
Merge branch 'octeontx2-pf-HTB'
Hariprasad Kelam says:
====================
octeontx2-pf: HTB offload support
octeontx2 silicon and CN10K transmit interface consists of five
transmit levels starting from MDQ, TL4 to TL1. Once packets are
submitted to MDQ, hardware picks all active MDQs using strict
priority, and MDQs having the same priority level are chosen using
round robin. Each packet will traverse MDQ, TL4 to TL1 levels.
Each level contains an array of queues to support scheduling and
shaping.
As HTB supports classful queuing mechanism by supporting rate and
ceil and allow the user to control the absolute bandwidth to
particular classes of traffic the same can be achieved by
configuring shapers and schedulers on different transmit levels.
This series of patches adds support for HTB offload,
Patch1: Allow strict priority parameter in HTB offload mode.
Patch2: Rename existing total tx queues for better readability
Patch3: defines APIs such that the driver can dynamically initialize/
deinitialize the send queues.
Patch4: Refactors transmit alloc/free calls as preparation for QOS
offload code.
Patch5: moves rate limiting logic to common header which will be used
by qos offload code.
Patch6: Adds actual HTB offload support.
Patch7: exposes qos send queue stats over ethtool.
Patch8: Add documentation about htb offload flow in driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sat, 13 May 2023 08:51:43 +0000 (14:21 +0530)]
docs: octeontx2: Add Documentation for QOS
Add QOS example configuration along with tc-htb commands
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sat, 13 May 2023 08:51:42 +0000 (14:21 +0530)]
octeontx2-pf: ethtool expose qos stats
This patch extends ethtool stats support for QoS send queues as well.
upon the number of transmit channels change request, Ensures the real
number of transmit queues are equal to active QoS send queues plus
configured transmit queues.
ethtool -S eth0
txq_qos0: bytes:
3021391800
txq_qos0: frames: 1998275
txq_qos1: bytes:
4619766312
txq_qos1: frames: 3055401
...
...
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Naveen Mamindlapalli [Sat, 13 May 2023 08:51:41 +0000 (14:21 +0530)]
octeontx2-pf: Add support for HTB offload
This patch registers callbacks to support HTB offload.
Below are features supported,
- supports traffic shaping on the given class by honoring rate and ceil
configuration.
- supports traffic scheduling, which prioritizes different types of
traffic based on strict priority values.
- supports the creation of leaf to inner classes such that parent node
rate limits apply to all child nodes.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sat, 13 May 2023 08:51:40 +0000 (14:21 +0530)]
octeontx2-pf: Prepare for QOS offload
This patch moves rate limiting definitions to a common header file and
adds csr definitions required for QOS code.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sat, 13 May 2023 08:51:39 +0000 (14:21 +0530)]
octeontx2-pf: Refactor schedular queue alloc/free calls
1. Upon txschq free request, the transmit schedular config in hardware
is not getting reset. This patch adds necessary changes to do the same.
2. Current implementation calls txschq alloc during interface
initialization and in response handler updates the default txschq array.
This creates a problem for htb offload where txsch alloc will be called
for every tc class. This patch addresses the issue by reading txschq
response in mbox caller function instead in the response handler.
3. Current otx2_txschq_stop routine tries to free all txschq nodes
allocated to the interface. This creates a problem for htb offload.
This patch introduces the otx2_txschq_free_one to free txschq in a
given level.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subbaraya Sundeep [Sat, 13 May 2023 08:51:38 +0000 (14:21 +0530)]
octeontx2-pf: qos send queues management
Current implementation is such that the number of Send queues (SQs)
are decided on the device probe which is equal to the number of online
cpus. These SQs are allocated and deallocated in interface open and c
lose calls respectively.
This patch defines new APIs for initializing and deinitializing Send
queues dynamically and allocates more number of transmit queues for
QOS feature.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sat, 13 May 2023 08:51:37 +0000 (14:21 +0530)]
octeontx2-pf: Rename tot_tx_queues to non_qos_queues
current implementation is such that tot_tx_queues contains both
xdp queues and normal tx queues. which will be allocated in interface
open calls and deallocated on interface down calls respectively.
With addition of QOS, where send quees are allocated/deallacated upon
user request Qos send queues won't be part of tot_tx_queues. So this
patch renames tot_tx_queues to non_qos_queues.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Naveen Mamindlapalli [Sat, 13 May 2023 08:51:36 +0000 (14:21 +0530)]
sch_htb: Allow HTB priority parameter in offload mode
The current implementation of HTB offload returns the EINVAL error
for unsupported parameters like prio and quantum. This patch removes
the error returning checks for 'prio' parameter and populates its
value to tc_htb_qopt_offload structure such that driver can use the
same.
Add prio parameter check in mlx5 driver, as mlx5 devices are not capable
of supporting the prio parameter when htb offload is used. Report error
if prio parameter is set to a non-default value.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Angus Chen [Fri, 12 May 2023 01:01:52 +0000 (09:01 +0800)]
net: Remove low_thresh in ip defrag
As low_thresh has no work in fragment reassembles,del it.
And Mark it deprecated in sysctl Document.
Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 May 2023 07:37:17 +0000 (08:37 +0100)]
Merge tag 'wireless-next-2023-05-12' of git://git./linux/kernel/git/wireless/wireless-next
Kalle valo says:
====================
wireless-next patches for v6.5
The first pull request for v6.5 and only driver changes this time.
rtl8xxxu has been making lots of progress lately and now has AP mode
support.
Major changes:
rtl8xxxu
* AP mode support, initially only for rtl8188f
rtw89
* provide RSSI, EVN and SNR statistics via debugfs
* support U-NII-4 channels on 5 GHz band
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 13 May 2023 23:20:16 +0000 (16:20 -0700)]
Merge branch 'bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen'
Stanislav Fomichev says:
====================
optval larger than PAGE_SIZE leads to EFAULT if the BPF program
isn't careful enough. This is often overlooked and might break
completely unrelated socket options. Instead of EFAULT,
let's ignore BPF program buffer changes. See the first patch for
more info.
In addition, clearly document this corner case and reset optlen
in our selftests (in case somebody copy-pastes from them).
v6:
- no changes; resending due to screwing up v5 series with the unrelated
patch
v5:
- goto in the selftest (Martin)
- set IP_TOS to zero to avoid endianness complications (Martin)
v4:
- ignore retval as well when optlen > PAGE_SIZE (Martin)
v3:
- don't hard-code PAGE_SIZE (Martin)
- reset orig_optlen in getsockopt when kernel part succeeds (Martin)
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stanislav Fomichev [Thu, 11 May 2023 17:04:56 +0000 (10:04 -0700)]
bpf: Document EFAULT changes for sockopt
And add examples for how to correctly handle large optlens.
This is less relevant now when we don't EFAULT anymore, but
that's still the correct thing to do.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-5-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stanislav Fomichev [Thu, 11 May 2023 17:04:55 +0000 (10:04 -0700)]
selftests/bpf: Correctly handle optlen > 4096
Even though it's not relevant in selftests, the people
might still copy-paste from them. So let's take care
of optlen > 4096 cases explicitly.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stanislav Fomichev [Thu, 11 May 2023 17:04:54 +0000 (10:04 -0700)]
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
Instead of assuming EFAULT, let's assume the BPF program's
output is ignored.
Remove "getsockopt: deny arbitrary ctx->retval" because it
was actually testing optlen. We have separate set of tests
for retval.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Stanislav Fomichev [Thu, 11 May 2023 17:04:53 +0000 (10:04 -0700)]
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
With the way the hooks implemented right now, we have a special
condition: optval larger than PAGE_SIZE will expose only first 4k into
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
the userspace will get EFAULT.
The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).
Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_once those cases to
the dmesg to help with figuring out what's going wrong.
Fixes:
0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Edward Cree [Fri, 12 May 2023 15:35:58 +0000 (16:35 +0100)]
sfc: fix use-after-free in efx_tc_flower_record_encap_match()
When writing error messages to extack for pseudo collisions, we can't
use encap->type as encap has already been freed. Fortunately the
same value is stored in local variable em_type, so use that instead.
Fixes:
3c9561c0a5b9 ("sfc: support TC decap rules matching on enc_ip_tos")
Reported-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Fri, 12 May 2023 16:58:37 +0000 (17:58 +0100)]
net: phylink: constify fwnode arguments
Both phylink_create() and phylink_fwnode_phy_connect() do not modify
the fwnode argument that they are passed, so lets constify these.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shenwei Wang [Fri, 12 May 2023 13:20:10 +0000 (08:20 -0500)]
net: fec: using the standard return codes when xdp xmit errors
This patch standardizes the inconsistent return values for unsuccessful
XDP transmits by using standardized error codes (-EBUSY or -ENOMEM).
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daire McNamara [Fri, 12 May 2023 12:20:32 +0000 (13:20 +0100)]
net: macb: Shorten max_tx_len to 4KiB - 56 on mpfs
On mpfs, with SRAM configured for 4 queues, setting max_tx_len
to GEM_TX_MAX_LEN=0x3f0 results multiple AMBA errors.
Setting max_tx_len to (4KiB - 56) removes those errors.
The details are described in erratum 1686 by Cadence
The max jumbo frame size is also reduced for mpfs to (4KiB - 56).
Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Wed, 10 May 2023 21:54:43 +0000 (14:54 -0700)]
ping: Convert hlist_nulls to plain hlist.
Since introduced in commit
c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP
socket kind"), ping socket does not use SLAB_TYPESAFE_BY_RCU nor check
nulls marker in loops.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 May 2023 18:47:56 +0000 (19:47 +0100)]
Merge branch 'skb_frag_fill_page_desc'
Yunsheng Lin says:
====================
net: introduce skb_frag_fill_page_desc()
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
It does not make much sense to calling __skb_frag_set_page()
without calling skb_frag_off_set(), as the offset may depend
on whether the page is head page or tail page, so add
skb_frag_fill_page_desc() to fill the page desc for a skb
frag.
In the future, we can make sure the page in the frag is
head page of compound page or a base page, if not, we
may warn about that and convert the tail page to head
page and update the offset accordingly, if we see a warning
about that, we also fix the caller to fill the head page
in the frag. when the fixing is done, we may remove the
warning and converting.
In this way, we can remove the compound_head() or use
page_ref_*() like the below case:
https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L881
https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3383
It may also convert net stack to use the folio easier.
V1: repost with all the ack/review tags included.
RFC: remove a local variable as pointed out by Simon.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 11 May 2023 01:12:13 +0000 (09:12 +0800)]
net: remove __skb_frag_set_page()
The remaining users calling __skb_frag_set_page() with
page being NULL seems to be doing defensive programming,
as shinfo->nr_frags is already decremented, so remove
them.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 11 May 2023 01:12:12 +0000 (09:12 +0800)]
net: introduce and use skb_frag_fill_page_desc()
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.
Also, skb_frag_set_page() is not used anymore, so remove
it.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Nikishkin [Fri, 12 May 2023 03:40:34 +0000 (11:40 +0800)]
selftests: net: vxlan: Add tests for vxlan nolocalbypass option.
Add test to make sure that the localbypass option is on by default.
Add test to change vxlan localbypass to nolocalbypass and check
that packets are delivered to userspace.
Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Nikishkin [Fri, 12 May 2023 03:40:33 +0000 (11:40 +0800)]
net: vxlan: Add nolocalbypass option to vxlan.
If a packet needs to be encapsulated towards a local destination IP, the
packet will undergo a "local bypass" and be injected into the Rx path as
if it was received by the target VXLAN device without undergoing
encapsulation. If such a device does not exist, the packet will be
dropped.
There are scenarios where we do not want to perform such a bypass, but
instead want the packet to be encapsulated and locally received by a
user space program for post-processing.
To that end, add a new VXLAN device attribute that controls whether a
"local bypass" is performed or not. Default to performing a bypass to
maintain existing behavior.
Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 May 2023 15:56:29 +0000 (16:56 +0100)]
Merge branch 'broadcom-phy-wol'
Florian Fainelli says:
====================
Support for Wake-on-LAN for Broadcom PHYs
This patch series adds support for Wake-on-LAN to the Broadcom PHY
driver. Specifically the BCM54210E/B50212E are capable of supporting
Wake-on-LAN using an external pin typically wired up to a system's GPIO.
These PHY operate a programmable Ethernet MAC destination address
comparator which will fire up an interrupt whenever a match is received.
Because of that, it was necessary to introduce patch #1 which allows the
PHY driver's ->suspend() routine to be called unconditionally. This is
necessary in our case because we need a hook point into the device
suspend/resume flow to enable the wake-up interrupt as late as possible.
Patch #2 adds support for the Broadcom PHY library and driver for
Wake-on-LAN proper with the WAKE_UCAST, WAKE_MCAST, WAKE_BCAST,
WAKE_MAGIC and WAKE_MAGICSECURE. Note that WAKE_FILTER is supportable,
however this will require further discussions and be submitted as a RFC
series later on.
Patch #3 updates the GENET driver to defer to the PHY for Wake-on-LAN if
the PHY supports it, thus allowing the MAC to be powered down to
conserve power.
Changes in v3:
- collected Reviewed-by tags
- explicitly use return 0 in bcm54xx_phy_probe() (Paolo)
Changes in v2:
- introduce PHY_ALWAYS_CALL_SUSPEND and only have the Broadcom PHY
driver set this flag to minimize changes to the suspend flow to only
drivers that need it
- corrected possibly uninitialized variable in bcm54xx_set_wakeup_irq
(Simon)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 11 May 2023 17:21:10 +0000 (10:21 -0700)]
net: bcmgenet: Add support for PHY-based Wake-on-LAN
If available, interrogate the PHY to find out whether we can use it for
Wake-on-LAN. This can be a more power efficient way of implementing
that feature, especially when the MAC is powered off in low power
states.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 11 May 2023 17:21:09 +0000 (10:21 -0700)]
net: phy: broadcom: Add support for Wake-on-LAN
Add support for WAKE_UCAST, WAKE_MCAST, WAKE_BCAST, WAKE_MAGIC and
WAKE_MAGICSECURE. This is only supported with the BCM54210E and
compatible Ethernet PHYs. Using the in-band interrupt or an out of band
GPIO interrupts are supported.
Broadcom PHYs will generate a Wake-on-LAN level low interrupt on LED4 as
soon as one of the supported patterns is being matched. That includes
generating such an interrupt even if the PHY is operated during normal
modes. If WAKE_UCAST is selected, this could lead to the LED4 interrupt
firing up for every packet being received which is absolutely
undesirable from a performance point of view.
Because the Wake-on-LAN configuration can be set long before the system
is actually put to sleep, we cannot have an interrupt service routine to
clear on read the interrupt status register and ensure that new packet
matches will be detected.
It is desirable to enable the Wake-on-LAN interrupt as late as possible
during the system suspend process such that we limit the number of
interrupts to be handled by the system, but also conversely feed into
the Linux's system suspend way of dealing with interrupts in and around
the points of no return.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 11 May 2023 17:21:08 +0000 (10:21 -0700)]
net: phy: Allow drivers to always call into ->suspend()
A few PHY drivers are currently attempting to not suspend the PHY when
Wake-on-LAN is enabled, however that code is not currently executing at
all due to an early check in phy_suspend().
This prevents PHY drivers from making an appropriate decisions and put
the hardware into a low power state if desired.
In order to allow the PHY drivers to opt into getting their ->suspend
routine to be called, add a PHY_ALWAYS_CALL_SUSPEND bit which can be
set. A boolean that tracks whether the PHY or the attached MAC has
Wake-on-LAN enabled is also provided for convenience.
If phydev::wol_enabled then the PHY shall not prevent its own
Wake-on-LAN detection logic from working and shall not prevent the
Ethernet MAC from receiving packets for matching.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Tue, 9 May 2023 06:55:02 +0000 (23:55 -0700)]
libbpf: fix offsetof() and container_of() to work with CO-RE
It seems like __builtin_offset() doesn't preserve CO-RE field
relocations properly. So if offsetof() macro is defined through
__builtin_offset(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.
To avoid this problem, redefine offsetof() and container_of() in the
form that works with CO-RE relocations more reliably.
Fixes:
5fbc220862fc ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Thu, 11 May 2023 04:37:48 +0000 (21:37 -0700)]
bpf: Address KCSAN report on bpf_lru_list
KCSAN reported a data-race when accessing node->ref.
Although node->ref does not have to be accurate,
take this chance to use a more common READ_ONCE() and WRITE_ONCE()
pattern instead of data_race().
There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref().
This patch also adds bpf_lru_node_clear_ref() to do the
WRITE_ONCE(node->ref, 0) also.
==================================================================
BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem
write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
__bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
__bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
__bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
__htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
__htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x01 -> 0x00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
==================================================================
Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alan Maguire [Wed, 10 May 2023 13:02:41 +0000 (14:02 +0100)]
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
v1.25 of pahole supports filtering out functions with multiple inconsistent
function prototypes or optimized-out parameters from the BTF representation.
These present problems because there is no additional info in BTF saying which
inconsistent prototype matches which function instance to help guide attachment,
and functions with optimized-out parameters can lead to incorrect assumptions
about register contents.
So for now, filter out such functions while adding BTF representations for
functions that have "."-suffixes (foo.isra.0) but not optimized-out parameters.
This patch assumes that below linked changes land in pahole for v1.25.
Issues with pahole filtering being too aggressive in removing functions
appear to be resolved now, but CI and further testing will confirm.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230510130241.1696561-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Fri, 12 May 2023 09:37:02 +0000 (10:37 +0100)]
Merge branch 'sfc-decap'
Edward Cree says:
====================
sfc: more flexible encap matches on TC decap rules
This series extends the TC offload support on EF100 to support optionally
matching on the IP ToS and UDP source port of the outer header in rules
performing tunnel decapsulation. Both of these fields allow masked
matches if the underlying hardware supports it (current EF100 hardware
supports masking on ToS, but only exact-match on source port).
Given that the source port is typically populated from a hash of inner
header entropy, it's not clear whether filtering on it is useful, but
since we can support it we may as well expose the capability.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 11 May 2023 19:47:31 +0000 (20:47 +0100)]
sfc: support TC decap rules matching on enc_src_port
Allow efx_tc_encap_match entries to include a udp_sport and a
udp_sport_mask. As with enc_ip_tos, use pseudos to enforce that all
encap matches within a given <src_ip,dst_ip,udp_dport> tuple have
the same udp_sport_mask.
Note that since we use a single layer of pseudos for both fields, two
matches that differ in (say) udp_sport value aren't permitted to have
different ip_tos_mask, even though this would technically be safe.
Current userland TC does not support setting enc_src_port; this patch
was tested with an iproute2 patched to support it.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 11 May 2023 19:47:30 +0000 (20:47 +0100)]
sfc: support TC decap rules matching on enc_ip_tos
Allow efx_tc_encap_match entries to include an ip_tos and ip_tos_mask.
To avoid partially-overlapping Outer Rules (which can lead to undefined
behaviour in the hardware), store extra "pseudo" entries in our
encap_match hashtable, which are used to enforce that all Outer Rule
entries within a given <src_ip,dst_ip,udp_dport> tuple (or IPv6
equivalent) have the same ip_tos_mask.
The "direct" encap_match entry takes a reference on the "pseudo",
allowing it to be destroyed when all "direct" entries using it are
removed.
efx_tc_em_pseudo_type is an enum rather than just a bool because in
future an additional pseudo-type will be added to support Conntrack
offload.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 11 May 2023 19:47:29 +0000 (20:47 +0100)]
sfc: populate enc_ip_tos matches in MAE outer rules
Currently tc.c will block them before they get here, but following
patch will change that.
Use the extack message from efx_mae_check_encap_match_caps() instead
of writing a new one, since there's now more being fed in than just
an IP version.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 11 May 2023 19:47:28 +0000 (20:47 +0100)]
sfc: release encap match in efx_tc_flow_free()
When force-freeing leftover entries from our match_action_ht, call
efx_tc_delete_rule(), which releases all the rule's resources, rather
than open-coding it. The open-coded version was missing a call to
release the rule's encap match (if any).
It probably doesn't matter as everything's being torn down anyway, but
it's cleaner this way and prevents further error messages potentially
being logged by efx_tc_encap_match_free() later on.
Move efx_tc_flow_free() further down the file to avoid introducing a
forward declaration of efx_tc_delete_rule().
Fixes:
17654d84b47c ("sfc: add offloading of 'foreign' TC (decap) rules")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wuych [Fri, 12 May 2023 02:44:03 +0000 (10:44 +0800)]
net: liquidio: lio_main: Remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast.
Signed-off-by: wuych <yunchuan@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Mikhalitsyn [Thu, 11 May 2023 13:25:06 +0000 (15:25 +0200)]
sctp: add bpf_bypass_getsockopt proto callback
Implement ->bpf_bypass_getsockopt proto callback and filter out
SCTP_SOCKOPT_PEELOFF, SCTP_SOCKOPT_PEELOFF_FLAGS and SCTP_SOCKOPT_CONNECTX3
socket options from running eBPF hook on them.
SCTP_SOCKOPT_PEELOFF and SCTP_SOCKOPT_PEELOFF_FLAGS options do fd_install(),
and if BPF_CGROUP_RUN_PROG_GETSOCKOPT hook returns an error after success of
the original handler sctp_getsockopt(...), userspace will receive an error
from getsockopt syscall and will be not aware that fd was successfully
installed into a fdtable.
As pointed by Marcelo Ricardo Leitner it seems reasonable to skip
bpf getsockopt hook for SCTP_SOCKOPT_CONNECTX3 sockopt too.
Because internaly, it triggers connect() and if error is masked
then userspace will be confused.
This patch was born as a result of discussion around a new SCM_PIDFD interface:
https://lore.kernel.org/all/
20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com/
Fixes:
0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: linux-sctp@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Suggested-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 May 2023 08:43:56 +0000 (09:43 +0100)]
Merge branch 'selftests-fcnal'
Guillaume Nault says:
====================
selftests: fcnal: Test SO_DONTROUTE socket option.
The objective is to cover kernel paths that use the RTO_ONLINK flag
in .flowi4_tos. This way we'll be able to safely remove this flag in
the future by properly setting .flowi4_scope instead. With these
selftests in place, we can make sure this won't introduce regressions.
For more context, the final objective is to convert .flowi4_tos to
dscp_t, to ensure that ECN bits don't influence route and fib-rule
lookups (see commit
a410a0cf9885 ("ipv6: Define dscp_t and stop taking
ECN bits into account in fib6-rules")).
These selftests only cover IPv4, as SO_DONTROUTE has no effect on IPv6
sockets.
v2:
- Use two different nettest options for setting SO_DONTROUTE either
on the server or on the client socket.
- Use the above feature to run a single 'nettest -B' instance per
test (instead of having two nettest processes for server and
client).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 11 May 2023 14:39:46 +0000 (16:39 +0200)]
selftests: fcnal: Test SO_DONTROUTE on raw and ping sockets.
Use ping -r to test the kernel behaviour with raw and ping sockets
having the SO_DONTROUTE option.
Since ipv4_ping_novrf() is called with different values of
net.ipv4.ping_group_range, then it tests both raw and ping sockets
(ping uses ping sockets if its user ID belongs to ping_group_range
and raw sockets otherwise).
With both socket types, sending packets to a neighbour (on link) host,
should work. When the host is behind a router, sending should fail.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 11 May 2023 14:39:39 +0000 (16:39 +0200)]
selftests: fcnal: Test SO_DONTROUTE on UDP sockets.
Use nettest --client-dontroute to test the kernel behaviour with UDP
sockets having the SO_DONTROUTE option. Sending packets to a neighbour
(on link) host, should work. When the host is behind a router, sending
should fail.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 11 May 2023 14:39:32 +0000 (16:39 +0200)]
selftests: fcnal: Test SO_DONTROUTE on TCP sockets.
Use nettest --{client,server}-dontroute to test the kernel behaviour
with TCP sockets having the SO_DONTROUTE option. Sending packets to a
neighbour (on link) host, should work. When the host is behind a
router, sending should fail.
Client and server sockets are tested independently, so that we can
cover different TCP kernel paths.
SO_DONTROUTE also affects the syncookies path. So ipv4_tcp_dontroute()
is made to work with or without syncookies, to cover both paths.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>