Nick Alcock [Wed, 8 Mar 2023 12:12:29 +0000 (12:12 +0000)]
lib: packing: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it from the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230308121230.5354-1-nick.alcock@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nick Alcock [Wed, 8 Mar 2023 12:12:30 +0000 (12:12 +0000)]
mctp: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it from the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20230308121230.5354-2-nick.alcock@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 10 Mar 2023 06:18:59 +0000 (22:18 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 9 Mar 2023 18:56:58 +0000 (10:56 -0800)]
Merge tag 'net-6.3-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- core: avoid skb end_offset change in __skb_unclone_keeptruesize()
- sched:
  - act_connmark: handle errno on tcf_idr_check_alloc
  - flower: fix fl_change() error recovery path
- ieee802154: prevent user from crashing the host
Current release - new code bugs:
- eth: bnxt_en: fix the double free during device removal
- tools: ynl:
  - fix enum-as-flags in the generic CLI
  - fully inherit attrs in subsets
  - re-license uniformly under GPL-2.0 or BSD-3-clause
Previous releases - regressions:
- core: use indirect calls helpers for sk_exit_memory_pressure()
- tls:
  - fix return value for async crypto
  - avoid hanging tasks on the tx_lock
- eth: ice: copy last block omitted in ice_get_module_eeprom()
Previous releases - always broken:
- core: avoid double iput when sock_alloc_file fails
- af_unix: fix struct pid leaks in OOB support
- tls:
  - fix possible race condition
  - fix device-offloaded sendpage straddling records
- bpf:
  - sockmap: fix an infinite loop error
  - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
  - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
- netfilter: tproxy: fix deadlock due to missing BH disable
- phylib: get rid of unnecessary locking
- eth: bgmac: fix *initial* chip reset to support BCM5358
- eth: nfp: fix csum for ipsec offload
- eth: mtk_eth_soc: fix RX data corruption issue
Misc:
- usb: qmi_wwan: add telit 0x1080 composition"
* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
tools: ynl: fix enum-as-flags in the generic CLI
tools: ynl: move the enum classes to shared code
net: avoid double iput when sock_alloc_file fails
af_unix: fix struct pid leaks in OOB support
eth: fealnx: bring back this old driver
net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
net: microchip: sparx5: fix deletion of existing DSCP mappings
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
net/smc: fix fallback failed while sendmsg with fastopen
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
mailmap: update entries for Stephen Hemminger
mailmap: add entry for Maxim Mikityanskiy
nfc: change order inside nfc_se_io error path
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
net: usb: qmi_wwan: add Telit 0x1080 composition
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
netfilter: conntrack: adopt safer max chain length
net: tls: fix device-offloaded sendpage straddling records
...
Linus Torvalds [Thu, 9 Mar 2023 18:17:23 +0000 (10:17 -0800)]
Merge tag 'for-linus-2023030901' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix potential out-of-bounds write of zeroes in HID core with a
  specially crafted uhid device (Lee Jones)
- fix potential use-after-free in work function in intel-ish-hid (Reka
  Norman)
- selftests config fixes (Benjamin Tissoires)
- a few small device fixes and new device support
* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
selftest: hid: fix hid_bpf not set in config
HID: uhid: Over-ride the default maximum data buffer value with our own
HID: core: Provide new max_buffer_size attribute to over-ride the default
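The general shape of the uhid fix can be sketched as follows: when a short report is zero-padded up to its expected size, the padding must stop at the size of the buffer that really holds the report, and that limit comes from a per-device value that may override a global default. All names and sizes below (DEFAULT_MAX_BUFFER_SIZE, struct fake_hid_device, pad_report()) are hypothetical; this is a sketch of the idea, not the HID core code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical global default; a device may override it. */
#define DEFAULT_MAX_BUFFER_SIZE 16384u

/* Hypothetical device with an optional per-device buffer limit. */
struct fake_hid_device {
	size_t max_buffer_size;	/* 0 means "use the default" */
};

static size_t dev_max_buffer(const struct fake_hid_device *d)
{
	return d->max_buffer_size ? d->max_buffer_size : DEFAULT_MAX_BUFFER_SIZE;
}

/* Zero-pad a report of `size` bytes up to the expected report size `rsize`,
 * but never past the device's real buffer size. Returns the number of bytes
 * that are now valid in buf. */
static size_t pad_report(const struct fake_hid_device *d, unsigned char *buf,
			 size_t size, size_t rsize)
{
	size_t limit = dev_max_buffer(d);

	if (rsize > limit)
		rsize = limit;		/* clamp to what buf can hold */
	if (size < rsize)
		memset(buf + size, 0, rsize - size);
	return size < rsize ? rsize : size;
}
```

With a device whose buffer is 64 bytes, a request to pad an 8-byte report to 4096 bytes only clears bytes 8 to 63 instead of writing past the end of the buffer.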
Linus Torvalds [Thu, 9 Mar 2023 18:08:46 +0000 (10:08 -0800)]
Merge tag 'm68k-for-v6.3-tag2' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k fixes from Geert Uytterhoeven:
- Fix systems with memory at end of 32-bit address space
- Fix initrd on systems where memory does not start at address zero
- Fix 68030 handling of bus errors for addresses in exception tables
* tag 'm68k-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Only force 030 bus error if PC not in exception table
m68k: mm: Move initrd phys_to_virt handling after paging_init()
m68k: mm: Fix systems with memory at end of 32-bit address space
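Memory that ends exactly at the top of the 32-bit address space is a classic overflow trap: `start + size` wraps to 0, so an end-exclusive bound compares below the start and every address looks out of range. The sketch below illustrates that bug class and one common remedy (working with the inclusive last address); it is not the m68k patch itself, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy: the end-exclusive bound wraps to 0 when the region reaches 4 GiB. */
static bool in_region_exclusive(uint32_t start, uint32_t size, uint32_t addr)
{
	uint32_t end = start + size;	/* wraps to 0 for the top region */

	return addr >= start && addr < end;
}

/* Safer: the inclusive last address never wraps as long as size > 0. */
static bool in_region_inclusive(uint32_t start, uint32_t size, uint32_t addr)
{
	uint32_t last = start + (size - 1);

	return size != 0 && addr >= start && addr <= last;
}
```

For a 256 MiB region at 0xF0000000, the exclusive check rejects 0xF8000000 because `end` has wrapped to 0, while the inclusive check accepts it.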
Al Viro [Mon, 6 Mar 2023 01:20:30 +0000 (01:20 +0000)]
sh: sanitize the flags on sigreturn
We fetch the %SR value from the sigframe; it might have been modified by
the signal handler, so we can't trust any of its bits that are not
modifiable in user mode.
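The sanitizing step amounts to a masked merge: only the bits that user mode is allowed to change are taken from the untrusted signal frame, and every other bit keeps its current, kernel-controlled value. In this sketch the mask `USER_MODIFIABLE_SR_BITS` and its value are illustrative assumptions, not the real SuperH definition.

```c
#include <stdint.h>

/* Hypothetical mask of %SR bits that user mode may legitimately change.
 * The value is illustrative only; the real one lives in the arch code. */
#define USER_MODIFIABLE_SR_BITS 0x00000303u

/* Merge the %SR value read from the sigframe into the current one:
 * privileged bits come from cur_sr, user-modifiable bits from frame_sr. */
static uint32_t sanitize_sr(uint32_t cur_sr, uint32_t frame_sr)
{
	return (cur_sr & ~USER_MODIFIABLE_SR_BITS) |
	       (frame_sr & USER_MODIFIABLE_SR_BITS);
}
```

A handler that sets privileged bits in its saved frame therefore has no effect on them: those bits are always restored from the current register value.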
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Abeni [Thu, 9 Mar 2023 10:45:08 +0000 (11:45 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-07 (ice)
This series contains updates to ice driver only.
Dave removes masking from the pfcena field, as it was incorrectly preventing
valid traffic classes from being enabled.
Michal resolves various smatch issues, such as not propagating error
codes and returning 0 explicitly.
Arnd Bergmann resolves a gcc-9 integer overflow warning.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
====================
Link: https://lore.kernel.org/r/20230307220714.3997294-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 9 Mar 2023 10:31:46 +0000 (11:31 +0100)]
Merge branch 'sctp-add-another-two-stream-schedulers'
Xin Long says:
====================
sctp: add another two stream schedulers
All SCTP stream schedulers are defined in rfc8260#section-3. The
First-Come First-Served, Round-Robin and Priority-Based schedulers
have already been added to the kernel.
This patchset adds another two schedulers: Fair Capacity
Scheduler and Weighted Fair Queueing Scheduler.
Note that the remaining one, the "Round-Robin Scheduler per Packet",
is not implemented by this patchset, as it would still be too
intrusive to add to the current SCTP kernel code.
====================
Link: https://lore.kernel.org/r/cover.1678224012.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long [Tue, 7 Mar 2023 21:23:27 +0000 (16:23 -0500)]
sctp: add weighted fair queueing stream scheduler
As it says in rfc8260#section-3.6 about the weighted fair queueing
scheduler:
A Weighted Fair Queueing scheduler between the streams is used. The
weight is configurable per outgoing SCTP stream. This scheduler
considers the lengths of the messages of each stream and schedules
them in a specific way to use the capacity according to the given
weights. If the weight of stream S1 is n times the weight of stream
S2, the scheduler should assign to stream S1 n times the capacity it
assigns to stream S2. The details are implementation dependent.
Interleaving user messages allows for a better realization of the
capacity usage according to the given weights.
This patch adds the Weighted Fair Queueing Scheduler, built on the code
of the Fair Capacity Scheduler: it adds fc_weight to struct
sctp_stream_out_ext and takes it into account when sorting
stream->fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done().
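One way to express the weighting is to always serve the stream with the smallest ratio fc_length / fc_weight, comparing ratios by cross-multiplication in 64-bit arithmetic so that no division or floating point is needed. This is a sketch under assumptions: the struct and function names are illustrative, and a linear scan stands in for the kernel's sorted stream->fc_list.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-stream state, loosely modelled on the fc_length and
 * fc_weight fields described above (not the kernel's struct). */
struct wfq_stream {
	uint32_t fc_length;	/* bytes already sent on this stream */
	uint32_t fc_weight;	/* configured weight, must be > 0 */
};

/* True if a should be served before b, i.e.
 * a->fc_length / a->fc_weight < b->fc_length / b->fc_weight.
 * Cross-multiplying in 64 bits avoids both division and overflow. */
static int wfq_before(const struct wfq_stream *a, const struct wfq_stream *b)
{
	return (uint64_t)a->fc_length * b->fc_weight <
	       (uint64_t)b->fc_length * a->fc_weight;
}

/* Index of the next stream to serve (ties go to the lowest index). */
static size_t wfq_pick(const struct wfq_stream *s, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (wfq_before(&s[i], &s[best]))
			best = i;
	return best;
}

/* Account for a chunk of len bytes sent on stream i. */
static void wfq_sent(struct wfq_stream *s, size_t i, uint32_t len)
{
	s[i].fc_length += len;
}
```

With weights 2 and 1 and equal-sized chunks, 300 rounds of pick-then-send give the first stream 200 chunks and the second 100, matching the "n times the capacity" rule quoted from RFC 8260.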
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long [Tue, 7 Mar 2023 21:23:26 +0000 (16:23 -0500)]
sctp: add fair capacity stream scheduler
As it says in rfc8260#section-3.5 about the fair capacity scheduler:
A fair capacity distribution between the streams is used. This
scheduler considers the lengths of the messages of each stream and
schedules them in a specific way to maintain an equal capacity for
all streams. The details are implementation dependent. Interleaving
user messages allows for a better realization of the fair capacity
usage.
This patch adds the Fair Capacity Scheduler based on the foundations added
by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"):
An fc_list and an fc_length are added to struct sctp_stream_out_ext, and
an fc_list is added to struct sctp_stream. In .enqueue, when chunks are
enqueued into a stream, the stream is linked into stream->fc_list via its
fc_list, ordered by its fc_length. In .dequeue, it always picks the first
skb from stream->fc_list. In .dequeue_done, fc_length is increased by the
chunk's length and the stream's position in stream->fc_list is updated
according to its new fc_length.
Note that when the new fc_length overflows in .dequeue_done, instead of
resetting all fc_lengths to 0, we only reduce them by U32_MAX / 4 to
avoid a moment of imbalance in the scheduling, as Marcelo suggested.
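The selection rule and the overflow handling can be sketched together: serve the stream with the smallest byte counter, and if adding a chunk would wrap a counter, shift every counter down by U32_MAX / 4 so that relative order (and thus fairness) is preserved. Assumptions of this sketch, not taken from the patch: a linear scan replaces the sorted fc_list, counters smaller than the shift are clamped at 0, and chunk lengths are much smaller than U32_MAX / 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative fair-capacity state: one byte counter per stream. */
struct fc_stream {
	uint32_t fc_length;
};

/* Serve the stream that has sent the fewest bytes (ties: lowest index). */
static size_t fc_pick(const struct fc_stream *s, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (s[i].fc_length < s[best].fc_length)
			best = i;
	return best;
}

/* Account for len bytes sent on stream i. If the counter would wrap, shift
 * every counter down by U32_MAX / 4 instead of zeroing them all, so streams
 * keep their relative order. Clamping at 0 is this sketch's choice; it
 * assumes len is far smaller than U32_MAX / 4. */
static void fc_sent(struct fc_stream *s, size_t n, size_t i, uint32_t len)
{
	const uint32_t shift = UINT32_MAX / 4;

	if (len > UINT32_MAX - s[i].fc_length) {
		for (size_t k = 0; k < n; k++)
			s[k].fc_length = s[k].fc_length > shift ?
					 s[k].fc_length - shift : 0;
	}
	s[i].fc_length += len;
}
```

Resetting everything to 0 would briefly make all streams look equal; shifting by a constant keeps the stream that was behind still behind.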
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 9 Mar 2023 08:51:34 +0000 (09:51 +0100)]
Merge branch 'various-mtk_eth_soc-cleanups'
Russell King says:
====================
Various mtk_eth_soc cleanups
Here are a number of patches that do a bit of cleanup to mtk_eth_soc.
The first patch cleans up mtk_gmac0_rgmii_adjust(), which is the
troublesome function preventing the driver from becoming a post-March 2020
phylink driver. It doesn't solve that problem; it merely makes the code
easier to follow by getting rid of repeated ternary operators.
The second patch moves the check for DDR2 memory to the initialisation
of phylink's supported_interfaces - if TRGMII is not possible for some
reason, we should not be erroring out in phylink MAC operations when
that can be determined prior to phylink creation.
The third patch removes checks from mtk_mac_config() that are done
when initialising supported_interfaces - phylink will not call
mtk_mac_config() with an interface that was not marked as supported,
so these checks are redundant.
The last patch removes the remaining vestiges of REVMII and RMII
support, which appears to be entirely unused.
These shouldn't conflict with Daniel's patch set, but if they do I
will rework as appropriate.
====================
Link: https://lore.kernel.org/r/ZAdj9qUXcHUsK7Gt@shell.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Tue, 7 Mar 2023 16:19:41 +0000 (16:19 +0000)]
net: mtk_eth_soc: remove support for RMII and REVMII modes
Since the conversion of mtk_eth_soc to phylink's supported_interfaces
bitmap, these two modes have not been selectable. No one has raised
this as an issue. Checking the in-kernel DT files, none of them use
either of these modes with this hardware.
Daniel Golle concurs:
A quick grep through the device trees of the more than 650 ramips and
mediatek boards we support in OpenWrt has revealed that *none* of them
uses either reduced-MII or reverse-MII PHY modes. I could imagine that
some more specialized ramips boards may use the RMII 100M PHY mode to
connect with exotic PHYs for industrial or automotive applications
(think: a 100BASE-T1 PHY connected via RMII). I have never seen or
touched such boards, but there are hints that they do exist.
For reverse-MII there are cases in which the Ralink SoC (Rt305x, for
example) is used in iNIC mode, i.e. connected as a PHY to another SoC,
and running only a minimal firmware rather than running Linux. Due to
the lack of external DRAM for the Ralink SoC on these kinds of boards,
the Ralink SoC there will never be able to boot Linux anyway.
I've seen this e.g. in multimedia devices like early WiFi-connected
not-yet-so-smart TVs.
Consequently, the conclusion is that no one uses these modes with this
hardware, so we might as well drop support for them.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Tue, 7 Mar 2023 16:19:36 +0000 (16:19 +0000)]
net: mtk_eth_soc: remove unnecessary checks in mtk_mac_config()
mtk_mac_config() checks that the interface mode is permitted for the
capabilities, but we already do these checks in mtk_add_mac() when
initialising phylink's supported_interfaces bitmap. Remove the
unnecessary tests.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Tue, 7 Mar 2023 16:19:31 +0000 (16:19 +0000)]
net: mtk_eth_soc: move trgmii ddr2 check to probe function
If TRGMII mode is not permitted when using DDR2 mode, we should handle
that when setting up phylink's ->supported_interfaces so phylink knows
that this is not supported by the hardware. Move this check to
mtk_add_mac().
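Handling such restrictions when the supported-interfaces set is built means the later configuration hook can trust its input, which is also why the checks in mtk_mac_config() become redundant. A sketch with hypothetical names: the enum, struct mac_caps, build_supported() and framework_config() stand in for phy_interface_t, the driver's capability data, mtk_add_mac() and phylink respectively, and are not the real APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface modes; the real code uses phy_interface_t. */
enum iface { IFACE_RGMII, IFACE_TRGMII, IFACE_SGMII, IFACE_COUNT };

/* Hypothetical hardware capabilities discovered at probe time. */
struct mac_caps {
	bool has_trgmii;
	bool has_sgmii;
	bool ddr2;		/* board uses DDR2 memory */
};

/* Build the supported-interfaces mask once, at probe time. TRGMII is
 * left out when DDR2 is in use, so it is never offered later on. */
static uint32_t build_supported(const struct mac_caps *c)
{
	uint32_t mask = 1u << IFACE_RGMII;

	if (c->has_trgmii && !c->ddr2)
		mask |= 1u << IFACE_TRGMII;
	if (c->has_sgmii)
		mask |= 1u << IFACE_SGMII;
	return mask;
}

/* Stand-in for the framework: it rejects unsupported modes itself, so the
 * driver's config hook never sees them and needs no checks of its own. */
static int framework_config(uint32_t supported, enum iface mode)
{
	if (!(supported & (1u << mode)))
		return -1;	/* rejected before reaching the driver */
	return 0;		/* would call the driver's mac_config() here */
}
```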
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Tue, 7 Mar 2023 16:19:26 +0000 (16:19 +0000)]
net: mtk_eth_soc: tidy mtk_gmac0_rgmii_adjust()
Get rid of the multiple ternary operators in mtk_gmac0_rgmii_adjust(),
replacing them with a single if(), thus making the code easier to read.
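The refactoring pattern is shown here on hypothetical values: struct clk_cfg, the speed condition and the constants are illustrative and not the driver's actual registers. Several ternaries that test the same condition become one if/else that sets every field for each case together, without changing behaviour.

```c
#include <stdint.h>

/* Hypothetical clock settings; values are illustrative only. */
struct clk_cfg {
	uint32_t intf;
	uint32_t rx_clk;
	uint32_t tx_clk;
};

/* Before: every field repeats the same condition in a ternary. */
static struct clk_cfg adjust_ternary(int speed)
{
	struct clk_cfg c;

	c.intf   = speed == 1000 ? 0x1 : 0x2;
	c.rx_clk = speed == 1000 ? 0x10 : 0x20;
	c.tx_clk = speed == 1000 ? 0x100 : 0x200;
	return c;
}

/* After: a single if() makes the two cases explicit. */
static struct clk_cfg adjust_if(int speed)
{
	struct clk_cfg c;

	if (speed == 1000) {
		c.intf = 0x1;
		c.rx_clk = 0x10;
		c.tx_clk = 0x100;
	} else {
		c.intf = 0x2;
		c.rx_clk = 0x20;
		c.tx_clk = 0x200;
	}
	return c;
}
```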
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 9 Mar 2023 07:34:42 +0000 (23:34 -0800)]
Merge branch 'pci-aer-remove-redundant-device-control-error-reporting-enable'
Bjorn Helgaas says:
====================
PCI/AER: Remove redundant Device Control Error Reporting Enable
From: Bjorn Helgaas <bhelgaas@google.com>
Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
which appeared in v6.0, the PCI core has enabled PCIe error reporting for
all devices during enumeration.
Remove driver code to do this and remove unnecessary includes of
<linux/aer.h> from several other drivers.
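Enabling error reporting boils down to setting the four "Reporting Enable" bits (bits 0 to 3) of the PCIe Device Control register. Because setting bits is idempotent, a driver call made after the PCI core has already set them during enumeration changes nothing, which is why these calls are redundant. The register is simulated as a plain variable here; the bit names follow the kernel's pci_regs.h, while AER_REPORTING_FLAGS and enable_error_reporting() are illustrative names.

```c
#include <stdint.h>

/* PCIe Device Control register bits 0-3 (names as in pci_regs.h). */
#define PCI_EXP_DEVCTL_CERE  0x0001	/* Correctable Error Reporting Enable */
#define PCI_EXP_DEVCTL_NFERE 0x0002	/* Non-Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_FERE  0x0004	/* Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_URRE  0x0008	/* Unsupported Request Reporting Enable */

/* Illustrative name for the combined mask. */
#define AER_REPORTING_FLAGS (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
			     PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

/* Simulated read-modify-write: set the reporting-enable bits and leave
 * every other Device Control bit untouched. */
static uint16_t enable_error_reporting(uint16_t devctl)
{
	return devctl | AER_REPORTING_FLAGS;
}
```

Applying the function a second time, as a driver would after the core, returns the same register value.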
Intel folks, sorry that I missed removing the <linux/aer.h> includes in the
first series.
====================
Link: https://lore.kernel.org/r/20230307181940.868828-1-helgaas@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:39 +0000 (12:19 -0600)]
ixgbe: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:38 +0000 (12:19 -0600)]
igc: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:37 +0000 (12:19 -0600)]
igb: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:36 +0000 (12:19 -0600)]
ice: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:35 +0000 (12:19 -0600)]
iavf: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:34 +0000 (12:19 -0600)]
i40e: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:33 +0000 (12:19 -0600)]
fm10k: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:32 +0000 (12:19 -0600)]
e1000e: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:31 +0000 (12:19 -0600)]
net: txgbe: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
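The two independent enables can be modelled as a simple AND: the device only emits an ERR_* Message if its Device Control bit for that error class is set, and the Root Port only raises an interrupt for a received Message if the matching AER Root Error Command bit is set. This is a deliberately simplified sketch: error masking, severity registers and interrupt (MSI/INTx) setup are ignored, and the macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device Control (PCIe capability) reporting-enable bits. */
#define DEVCTL_CERE  0x0001	/* correctable */
#define DEVCTL_NFERE 0x0002	/* non-fatal */
#define DEVCTL_FERE  0x0004	/* fatal */

/* AER Root Error Command register bits at the Root Port. */
#define ROOT_CMD_COR_EN      0x1
#define ROOT_CMD_NONFATAL_EN 0x2
#define ROOT_CMD_FATAL_EN    0x4

enum err_class { ERR_COR, ERR_NONFATAL, ERR_FATAL };

/* Simplified model: message from the device AND interrupt enable at the
 * Root Port are both required for an interrupt. */
static bool root_port_interrupts(uint16_t devctl, uint32_t root_cmd,
				 enum err_class cls)
{
	static const uint16_t dev_bit[] = {
		DEVCTL_CERE, DEVCTL_NFERE, DEVCTL_FERE
	};
	static const uint32_t root_bit[] = {
		ROOT_CMD_COR_EN, ROOT_CMD_NONFATAL_EN, ROOT_CMD_FATAL_EN
	};
	bool message_sent = (devctl & dev_bit[cls]) != 0;

	return message_sent && (root_cmd & root_bit[cls]) != 0;
}
```

Dropping the driver's pci_enable_pcie_error_reporting() call only touches the first term, which the PCI core already sets; the second term stays under the control of the AER service driver.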
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jiawen Wu <jiawenwu@trustnetic.com>
Cc: Mengyuan Lou <mengyuanlou@net-swift.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:30 +0000 (12:19 -0600)]
net: ngbe: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jiawen Wu <jiawenwu@trustnetic.com>
Cc: Mengyuan Lou <mengyuanlou@net-swift.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:29 +0000 (12:19 -0600)]
sfc_ef100: Drop redundant pci_disable_pcie_error_reporting()
51b35a454efd ("sfc: skeleton EF100 PF driver") added a call to
pci_disable_pcie_error_reporting() in ef100_pci_remove().
Remove this call since there's no apparent reason to disable error
reporting when it was not previously enabled.
Note that since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core enables PCIe error reporting for all devices during
enumeration, so the driver doesn't need to do it itself.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:28 +0000 (12:19 -0600)]
sfc/siena: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:27 +0000 (12:19 -0600)]
sfc: falcon: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:26 +0000 (12:19 -0600)]
sfc: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:25 +0000 (12:19 -0600)]
qlcnic: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Shahed Shaikh <shshaikh@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: GR-Linux-NIC-Dev@marvell.com
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:24 +0000 (12:19 -0600)]
qlcnic: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Shahed Shaikh <shshaikh@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: GR-Linux-NIC-Dev@marvell.com
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:23 +0000 (12:19 -0600)]
net: qede: Remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:22 +0000 (12:19 -0600)]
qed: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:21 +0000 (12:19 -0600)]
octeon_ep: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Veerasenareddy Burru <vburru@marvell.com>
Cc: Abhijit Ayarekar <aayarekar@marvell.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:20 +0000 (12:19 -0600)]
netxen_nic: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Also note that the driver only called these for NX_IS_REVISION_P3 devices,
so since f26e58bf6f54, error reporting has been enabled for devices other
than NX_IS_REVISION_P3.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Rahul Verma <rahulv@marvell.com>
Cc: GR-Linux-NIC-Dev@marvell.com
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:19 +0000 (12:19 -0600)]
net: hns3: remove unnecessary aer.h include
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:18 +0000 (12:19 -0600)]
net/fungible: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:17 +0000 (12:19 -0600)]
cxgb4: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Raju Rangoju <rajur@chelsio.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:16 +0000 (12:19 -0600)]
bnxt: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:15 +0000 (12:19 -0600)]
bnx2x: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:14 +0000 (12:19 -0600)]
bnx2: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
cd709aa90648 ("bnx2: Add PCI Advanced Error Reporting support.") added
pci_enable_pcie_error_reporting() for all devices, and
c239f279e571 ("bnx2: Enable AER on PCIE devices only") restricted it to
BNX2_CHIP_5709 devices
to avoid an error message when it failed on non-PCIe devices. The PCI core
only enables PCIe error reporting on PCIe devices, which I assume means
BNX2_CHIP_5709.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rasesh Mody <rmody@marvell.com>
Cc: GR-Linux-NIC-Dev@marvell.com
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:13 +0000 (12:19 -0600)]
be2net: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bjorn Helgaas [Tue, 7 Mar 2023 18:19:12 +0000 (12:19 -0600)]
alx: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Chris Snook <chris.snook@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 9 Mar 2023 07:28:23 +0000 (23:28 -0800)]
Merge branch 'tools-ynl-fix-enum-as-flags-in-the-generic-cli'
Jakub Kicinski says:
====================
tools: ynl: fix enum-as-flags in the generic CLI
The CLI needs to use proper classes when looking at Enum definitions
rather than interpreting the YAML spec ad-hoc, because we have more
than one format of the definition supported.
====================
Link: https://lore.kernel.org/r/20230308003923.445268-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 8 Mar 2023 00:39:23 +0000 (16:39 -0800)]
tools: ynl: fix enum-as-flags in the generic CLI
Lorenzo points out that the generic CLI is broken for the netdev
family. When I added the support for documentation of enums
(and sparse enums) the client script was not updated.
It expects the values in enum to be a list of names,
now it can also be a dict (YAML object).
Reported-by: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: e4b48ed460d3 ("tools: ynl: add a completely generic client")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 8 Mar 2023 00:39:22 +0000 (16:39 -0800)]
tools: ynl: move the enum classes to shared code
Move bulk of the EnumSet and EnumEntry code to shared
code for reuse by cli.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thadeu Lima de Souza Cascardo [Tue, 7 Mar 2023 17:37:07 +0000 (14:37 -0300)]
net: avoid double iput when sock_alloc_file fails
When sock_alloc_file fails to allocate a file, it will call sock_release.
__sys_socket_file should then not call sock_release again, otherwise there
will be a double free.
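The ownership rule behind this fix can be sketched in userspace C. This is a minimal model, not kernel code: a plain reference count stands in for the inode reference, and `struct fake_sock`, `fake_alloc_file()` and the two `fake_socket_file_*()` callers are hypothetical names. The helper releases the object on its own failure path, so a caller that releases it again performs one release too many:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket whose last reference frees the inode. */
struct fake_sock {
	int refs;         /* references held; 0 means already released */
	int over_release; /* releases attempted after refs reached 0 */
};

static void fake_sock_release(struct fake_sock *s)
{
	if (s->refs == 0) {
		s->over_release++; /* the "double iput" this commit removes */
		return;
	}
	s->refs--;
}

/* Models sock_alloc_file(): on failure it releases the socket itself. */
static bool fake_alloc_file(struct fake_sock *s, bool fail)
{
	if (fail) {
		fake_sock_release(s);
		return false;
	}
	return true;
}

/* Buggy caller: releases again after the helper already did. */
static int fake_socket_file_buggy(struct fake_sock *s, bool fail)
{
	if (!fake_alloc_file(s, fail)) {
		fake_sock_release(s);
		return -1;
	}
	return 0;
}

/* Fixed caller: the helper owns the socket on failure, so just return. */
static int fake_socket_file_fixed(struct fake_sock *s, bool fail)
{
	if (!fake_alloc_file(s, fail))
		return -1;
	return 0;
}

/* Forces the failure path and reports how many extra releases happened. */
int demo_over_release(bool use_fixed)
{
	struct fake_sock s = { .refs = 1, .over_release = 0 };

	if (use_fixed)
		fake_socket_file_fixed(&s, true);
	else
		fake_socket_file_buggy(&s, true);
	return s.over_release;
}
```

With the buggy caller, `demo_over_release(false)` reports one extra release; the fixed caller reports none.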
[ 89.319884] ------------[ cut here ]------------
[ 89.320286] kernel BUG at fs/inode.c:1764!
[ 89.320656] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 89.321051] CPU: 7 PID: 125 Comm: iou-sqp-124 Not tainted 6.2.0+ #361
[ 89.321535] RIP: 0010:iput+0x1ff/0x240
[ 89.321808] Code: d1 83 e1 03 48 83 f9 02 75 09 48 81 fa 00 10 00 00 77 05 83 e2 01 75 1f 4c 89 ef e8 fb d2 ba 00 e9 80 fe ff ff c3 cc cc cc cc <0f> 0b 0f 0b e9 d0 fe ff ff 0f 0b eb 8d 49 8d b4 24 08 01 00 00 48
[ 89.322760] RSP: 0018:ffffbdd60068bd50 EFLAGS: 00010202
[ 89.323036] RAX: 0000000000000000 RBX: ffff9d7ad3cacac0 RCX: 0000000000001107
[ 89.323412] RDX: 000000000003af00 RSI: 0000000000000000 RDI: ffff9d7ad3cacb40
[ 89.323785] RBP: ffffbdd60068bd68 R08: ffffffffffffffff R09: ffffffffab606438
[ 89.324157] R10: ffffffffacb3dfa0 R11: 6465686361657256 R12: ffff9d7ad3cacb40
[ 89.324529] R13: 0000000080000001 R14: 0000000080000001 R15: 0000000000000002
[ 89.324904] FS: 00007f7b28516740(0000) GS:ffff9d7aeb1c0000(0000) knlGS:0000000000000000
[ 89.325328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.325629] CR2: 00007f0af52e96c0 CR3: 0000000002a02006 CR4: 0000000000770ee0
[ 89.326004] PKRU: 55555554
[ 89.326161] Call Trace:
[ 89.326298] <TASK>
[ 89.326419] __sock_release+0xb5/0xc0
[ 89.326632] __sys_socket_file+0xb2/0xd0
[ 89.326844] io_socket+0x88/0x100
[ 89.327039] ? io_issue_sqe+0x6a/0x430
[ 89.327258] io_issue_sqe+0x67/0x430
[ 89.327450] io_submit_sqes+0x1fe/0x670
[ 89.327661] io_sq_thread+0x2e6/0x530
[ 89.327859] ? __pfx_autoremove_wake_function+0x10/0x10
[ 89.328145] ? __pfx_io_sq_thread+0x10/0x10
[ 89.328367] ret_from_fork+0x29/0x50
[ 89.328576] RIP: 0033:0x0
[ 89.328732] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 89.329073] RSP: 002b:0000000000000000 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
[ 89.329477] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b28637a3d
[ 89.329845] RDX: 00007fff4e4318a8 RSI: 00007fff4e4318b0 RDI: 0000000000000400
[ 89.330216] RBP: 00007fff4e431830 R08: 00007fff4e431711 R09: 00007fff4e4318b0
[ 89.330584] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff4e441b38
[ 89.330950] R13: 0000563835e3e725 R14: 0000563835e40d10 R15: 00007f7b28784040
[ 89.331318] </TASK>
[ 89.331441] Modules linked in:
[ 89.331617] ---[ end trace 0000000000000000 ]---
Fixes: da214a475f8b ("net: add __sys_socket_file()")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230307173707.468744-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 7 Mar 2023 16:45:30 +0000 (16:45 +0000)]
af_unix: fix struct pid leaks in OOB support
syzbot reported a struct pid leak [1].
The issue is that queue_oob() calls maybe_add_creds(), which potentially
holds a reference on a pid.
But skb->destructor is not set (either directly or by calling
unix_scm_to_skb()).
This means that a subsequent kfree_skb() or consume_skb() would leak
this reference.
In this fix, I chose to fully support scm even for the OOB message.
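The leak follows a general pattern: a reference taken while attaching credentials is only dropped if the buffer carries a destructor that drops it. A minimal userspace model of that pattern follows; `struct fake_pid`, `struct fake_buf` and the `fake_*()` helpers are hypothetical stand-ins for struct pid, the skb, maybe_add_creds() and the scm destructor, not the af_unix code itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted pid, standing in for struct pid. */
struct fake_pid {
	int refs;
};

/* Hypothetical buffer, standing in for an skb carrying SCM credentials. */
struct fake_buf {
	struct fake_pid *pid;                  /* reference taken for creds */
	void (*destructor)(struct fake_buf *); /* run when the buffer is freed */
};

/* Drops the credential reference, as an scm destructor would. */
static void fake_destruct_scm(struct fake_buf *b)
{
	if (b->pid) {
		b->pid->refs--;
		b->pid = NULL;
	}
}

/* Models maybe_add_creds(): takes a reference on the pid. */
static void fake_add_creds(struct fake_buf *b, struct fake_pid *p)
{
	p->refs++;
	b->pid = p;
}

/* Freeing only runs the destructor if one was installed. */
static void fake_free_buf(struct fake_buf *b)
{
	if (b->destructor)
		b->destructor(b);
}

/* Queues and frees an OOB buffer; returns the pid refcount left over. */
int demo_oob(int install_destructor)
{
	struct fake_pid pid = { .refs = 1 }; /* the owner's own reference */
	struct fake_buf buf = { NULL, NULL };

	fake_add_creds(&buf, &pid);
	if (install_destructor)
		buf.destructor = fake_destruct_scm;
	fake_free_buf(&buf);
	return pid.refs;
}
```

Without a destructor the pid keeps the extra reference (count 2); installing one, which is what handling scm for the OOB message provides, brings it back to 1.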
[1]
BUG: memory leak
unreferenced object 0xffff8881053e7f80 (size 128):
comm "syz-executor242", pid 5066, jiffies 4294946079 (age 13.220s)
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff812ae26a>] alloc_pid+0x6a/0x560 kernel/pid.c:180
[<ffffffff812718df>] copy_process+0x169f/0x26c0 kernel/fork.c:2285
[<ffffffff81272b37>] kernel_clone+0xf7/0x610 kernel/fork.c:2684
[<ffffffff812730cc>] __do_sys_clone+0x7c/0xb0 kernel/fork.c:2825
[<ffffffff849ad699>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff849ad699>] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
[<ffffffff84a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: syzbot+7699d9e5635c10253a27@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230307164530.771896-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Mar 2023 17:19:30 +0000 (09:19 -0800)]
eth: fealnx: bring back this old driver
This reverts commit d5e2d038dbece821f1af57acbeded3aa9a1832c1.
We have a report of this chip being used on a
SURECOM EP-320X-S 100/10M Ethernet PCI Adapter
which could still have been purchased in some parts
of the world 3 years ago.
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217151
Fixes: d5e2d038dbec ("eth: fealnx: delete the driver for Myson MTD-800")
Link: https://lore.kernel.org/r/20230307171930.4008454-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram Sang [Tue, 7 Mar 2023 16:30:35 +0000 (17:30 +0100)]
ravb: remove R-Car H3 ES1.* handling
R-Car H3 ES1.* was only available to an internal development group and
needed a lot of quirks and workarounds. These become a maintenance
burden now, so our development group decided to remove upstream support
and disable booting for this SoC. Public users only have ES2 onwards.
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/all/20230307163041.3815-8-wsa+renesas@sang-engineering.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 7 Mar 2023 15:54:11 +0000 (17:54 +0200)]
net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
The MT7530 switch from the MT7621 SoC has 2 ports which can be set up as
internal: port 5 and 6. Arınç reports that the GMAC1 attached to port 5
receives corrupted frames, unless port 6 (attached to GMAC0) has been
brought up by the driver. This is true regardless of whether port 5 is
used as a user port or as a CPU port (carrying DSA tags).
Offline debugging (blind for me) which began in the linked thread showed
experimentally that the configuration done by the driver for port 6
contains a step which is needed by port 5 as well - the write to
CORE_GSWPLL_GRP2 (note that I've no idea as to what it does, apart from
the comment "Set core clock into 500Mhz"). Prints put by Arınç show that
the reset value of CORE_GSWPLL_GRP2 is RG_GSWPLL_POSDIV_500M(1) |
RG_GSWPLL_FBKDIV_500M(40) (0x128), both on the MCM MT7530 from the
MT7621 SoC, as well as on the standalone MT7530 from MT7623NI Bananapi
BPI-R2. Apparently, port 5 on the standalone MT7530 can work under both
values of the register, while on the MT7621 SoC it cannot.
The call path that triggers the register write is:
mt753x_phylink_mac_config() for port 6
-> mt753x_pad_setup()
-> mt7530_pad_clk_setup()
so this fully explains the behavior noticed by Arınç, that bringing port
6 up is necessary.
The simplest fix for the problem is to extract the register writes which
are needed for both port 5 and 6 into a common mt7530_pll_setup()
function, which is called at mt7530_setup() time, immediately after
switch reset. We can argue that this mirrors the code layout introduced
in mt7531_setup() by commit 42bc4fafe359 ("net: mt7531: only do PLL once
after the reset"), in that the PLL setup has the exact same positioning,
and further work to consolidate the separate setup() functions is not
hindered.
Testing confirms that:
- the slight reordering of writes to MT7530_P6ECR and to
CORE_GSWPLL_GRP1 / CORE_GSWPLL_GRP2 introduced by this change does not
appear to cause problems for the operation of port 6 on MT7621 and on
MT7623 (where port 5 also always worked)
- packets sent through port 5 are not corrupted anymore, regardless of
whether port 6 is enabled by phylink or not (or even present in the
device tree)
My algorithm for determining the Fixes: tag is as follows. Testing shows
that some logic from mt7530_pad_clk_setup() is needed even for port 5.
Prior to commit ca366d6c889b ("net: dsa: mt7530: Convert to PHYLINK
API"), a call did exist for all phy_is_pseudo_fixed_link() ports - so
port 5 included. That commit replaced it with a temporary "Port 5 is not
supported!" comment, and the following commit 38f790a80560 ("net: dsa:
mt7530: Add support for port 5") replaced that comment with a
configuration procedure in mt7530_setup_port5() which was insufficient
for port 5 to work. I'm laying the blame on the patch that claimed
support for port 5, although one would have also needed the change from
commit c3b8e07909db ("net: dsa: mt7530: setup core clock even in TRGMII
mode") for the write to be performed completely independently from port
6's configuration.
Thanks go to Arınç for describing the problem, for debugging and for
testing.
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/netdev/f297c2c4-6e7c-57ac-2394-f6025d309b9d@arinc9.com/
Fixes: 38f790a80560 ("net: dsa: mt7530: Add support for port 5")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230307155411.868573-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 8 Mar 2023 22:34:22 +0000 (14:34 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
====================
pull-request: bpf-next 2023-03-08
We've added 23 non-merge commits during the last 2 day(s) which contain
a total of 28 files changed, 414 insertions(+), 104 deletions(-).
The main changes are:
1) Add more precise memory usage reporting for all BPF map types,
from Yafang Shao.
2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.
3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF,
from Nathan Chancellor.
4) IMA selftests fix, from Roberto Sassu.
5) libbpf fix in APK support code, from Daniel Müller.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Fix IMA test
libbpf: USDT arm arg parsing support
libbpf: Refactor parse_usdt_arg() to re-use code
libbpf: Fix theoretical u32 underflow in find_cd() function
bpf: enforce all maps having memory usage callback
bpf: offload map memory usage
bpf, net: xskmap memory usage
bpf, net: sock_map memory usage
bpf, net: bpf_local_storage memory usage
bpf: local_storage memory usage
bpf: bpf_struct_ops memory usage
bpf: queue_stack_maps memory usage
bpf: devmap memory usage
bpf: cpumap memory usage
bpf: bloom_filter memory usage
bpf: ringbuf memory usage
bpf: reuseport_array memory usage
bpf: stackmap memory usage
bpf: arraymap memory usage
bpf: hashtab memory usage
...
====================
Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 8 Mar 2023 20:02:09 +0000 (12:02 -0800)]
Merge tag 'fs_for_v6.3-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull udf fixes from Jan Kara:
"Fix bugs in UDF caused by the big pile of changes that went in during
the merge window"
* tag 'fs_for_v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Warn if block mapping is done for in-ICB files
udf: Fix reading of in-ICB files
udf: Fix lost writes in udf_adinicb_writepage()
Linus Torvalds [Wed, 8 Mar 2023 19:56:45 +0000 (11:56 -0800)]
Merge tag 'platform-drivers-x86-v6.3-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A small set of assorted bug and build/warning fixes"
* tag 'platform-drivers-x86-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform: mellanox: mlx-platform: Initialize shift variable to 0
platform/x86: int3472: Add GPIOs to Surface Go 3 Board data
platform/x86: ISST: Fix kernel documentation warnings
platform: x86: MLX_PLATFORM: select REGMAP instead of depending on it
platform: mellanox: select REGMAP instead of depending on it
platform/x86/intel/tpmi: Fix double free reported by Smatch
platform/x86: ISST: Increase range of valid mail box commands
platform/x86: dell-ddv: Fix temperature scaling
platform/x86: dell-ddv: Fix cache invalidation on resume
platform/x86/amd: pmc: remove CONFIG_SUSPEND checks
Linus Torvalds [Tue, 7 Mar 2023 21:06:29 +0000 (13:06 -0800)]
x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.
And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned. Even when a thread
is preempted, or moved to another CPU, or even across an explicit call to
'schedule()', that thread will still have the same value for 'current'.
It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.
So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.
However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.
So the scheduler code paths are special, and do not have a 'current'
thread at all. Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.
So this is all actually quite straightforward and simple, and not all
that complicated.
Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing. Did you mean 'prev'? Did you mean 'next'?
In fact, even if you then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable. So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.
Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state). And clang would end up just using the old thread pointer
value at least in some configurations.
This could have happened with gcc too, and purely depends on random
compiler details. Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.
The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.
That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course. There is no ambiguity in that
case.
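The shape of the fix can be sketched in C: the scheduler-side hook takes the incoming task as a parameter instead of reading 'current'. This sketch only illustrates the calling convention; it cannot reproduce the compiler reordering described above, and `struct fake_task`, `closid` and `fake_resctrl_sched_in()` are hypothetical names rather than the x86 resctrl code:

```c
#include <assert.h>

/* Hypothetical task carrying a resource-control id. */
struct fake_task {
	unsigned int closid;
};

/* Hypothetical per-CPU state that the hardware would consult. */
static unsigned int fake_cpu_closid;

/*
 * The task being scheduled in is an explicit argument, so nothing here
 * depends on when the per-CPU 'current' pointer is updated.
 */
static void fake_resctrl_sched_in(const struct fake_task *tsk)
{
	fake_cpu_closid = tsk->closid;
}

/* Models a context switch from 'prev' to 'next'. */
unsigned int demo_switch(unsigned int prev_closid, unsigned int next_closid)
{
	struct fake_task prev = { .closid = prev_closid };
	struct fake_task next = { .closid = next_closid };

	fake_cpu_closid = prev.closid; /* state while 'prev' was running */
	fake_resctrl_sched_in(&next);  /* pass 'next' explicitly */
	return fake_cpu_closid;
}
```

Outside the scheduler, a caller changing its own state would simply pass its own task pointer, which is the unambiguous case the commit mentions.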
The fix may be trivial, but noticing and figuring out what went wrong
was not. The credit for that goes to Stephane Eranian.
Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roberto Sassu [Wed, 8 Mar 2023 10:37:13 +0000 (11:37 +0100)]
selftests/bpf: Fix IMA test
Commit 62622dab0a28 ("ima: return IMA digest value only when IMA_COLLECTED
flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh
digests. IMA test #3 assumed the old behavior, in which bpf_ima_inode_hash()
also returned non-fresh digests.
Correct the test by accepting both cases. If one sample is returned,
assume that the commit above is applied and that the returned digest is
fresh. If two samples are returned, assume that the commit above is not
applied, and check both the non-fresh and the fresh digest.
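The acceptance logic described above might look like this in C. This is a hypothetical sketch, not the selftest: the digest size, the `struct digest` type, the order of the samples (non-fresh first) and the helper names are all assumptions made for illustration:

```c
#include <assert.h>
#include <string.h>

#define DIGEST_LEN 32 /* assumed digest size (SHA-256) for this sketch */

struct digest {
	unsigned char b[DIGEST_LEN];
};

static int same(const struct digest *a, const struct digest *b)
{
	return memcmp(a->b, b->b, DIGEST_LEN) == 0;
}

/*
 * Accepts either kernel behavior. Assumes samples[] lists the digests in
 * the order they were reported, with the non-fresh one first.
 */
static int ima_samples_ok(int n, const struct digest *samples,
			  const struct digest *stale,
			  const struct digest *fresh)
{
	if (n == 1) /* commit applied: only the fresh digest */
		return same(&samples[0], fresh);
	if (n == 2) /* commit not applied: non-fresh, then fresh */
		return same(&samples[0], stale) && same(&samples[1], fresh);
	return 0;   /* any other count is unexpected */
}

/* Builds synthetic samples for the given count and runs the check. */
int demo_ima(int n)
{
	struct digest stale, fresh, samples[2];

	memset(stale.b, 0xaa, DIGEST_LEN);
	memset(fresh.b, 0xbb, DIGEST_LEN);
	if (n == 1) {
		samples[0] = fresh;
	} else {
		samples[0] = stale;
		samples[1] = fresh;
	}
	return ima_samples_ok(n, samples, &stale, &fresh);
}
```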
Fixes: 62622dab0a28 ("ima: return IMA digest value only when IMA_COLLECTED flag is set")
Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
Eric Dumazet [Tue, 7 Mar 2023 14:59:59 +0000 (14:59 +0000)]
net: reclaim skb->scm_io_uring bit
Commit 0091bfc81741 ("io_uring/af_unix: defer registered
files gc to io_uring release") added one bit to struct sk_buff.
This structure is critical for networking, and we try very hard
to not add bloat on it, unless absolutely required.
For instance, we can use a specific destructor as a wrapper
around unix_destruct_scm(), to identify skbs that unix_gc()
has to special case.
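The destructor trick can be illustrated in userspace C: instead of spending a flag bit, special buffers are tagged by installing a wrapper destructor, and the tag is later tested by comparing the function pointer. Everything here (`struct fake_skb`, `fake_io_uring_destruct_scm()` and the other helpers) is a hypothetical model of the idea, not the actual sk_buff or unix_gc() code:

```c
#include <assert.h>
#include <stdbool.h>

struct fake_skb;
typedef void (*destructor_fn)(struct fake_skb *);

/* Hypothetical buffer: no dedicated flag bit, only a destructor pointer. */
struct fake_skb {
	destructor_fn destructor;
	int scm_released; /* how many times scm state was torn down */
};

/* Stand-in for unix_destruct_scm(). */
static void fake_destruct_scm(struct fake_skb *skb)
{
	skb->scm_released++;
}

/* Wrapper with the same behavior; its address doubles as a type tag. */
static void fake_io_uring_destruct_scm(struct fake_skb *skb)
{
	fake_destruct_scm(skb);
}

/* Replaces a flag-bit test with a function-pointer comparison. */
static bool fake_is_io_uring_skb(const struct fake_skb *skb)
{
	return skb->destructor == fake_io_uring_destruct_scm;
}

/* Tags an skb, reads the tag back, and confirms cleanup still happens. */
bool demo_is_tagged(bool io_uring)
{
	struct fake_skb skb = {
		.destructor = io_uring ? fake_io_uring_destruct_scm
				       : fake_destruct_scm,
		.scm_released = 0,
	};
	bool tagged = fake_is_io_uring_skb(&skb);

	skb.destructor(&skb);
	assert(skb.scm_released == 1); /* both destructors do the same work */
	return tagged;
}
```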
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Mar 2023 13:19:44 +0000 (13:19 +0000)]
Merge branch 'sparx5-tc-flower-templates'
Steen Hegelund says:
====================
Add support for TC flower templates in Sparx5
This adds support for the TC template mechanism in the Sparx5 flower filter
implementation.
Templates are as such handled by the TC framework, but when a template is
created (using a chain id) there are by definition no filters on this
chain (an error will be returned if there are any).
If the template's chain id is one that is represented by a VCAP lookup, then
when the template is created, we know that it is safe to use the keys
provided in the template to change the keyset configuration for the (port,
lookup) combination, if this is needed to improve the match on the
template.
The original port keyset configuration is captured in the template state
information which is kept per port, so that when the template is deleted
the port keyset configuration can be restored to its previous setting.
The template also provides the protocol parameter which is the basic
information that is used to find out which port keyset configuration needs
to be changed.
The VCAPs and lookups are slightly different when it comes to which keys,
keysets and protocol are supported and used for selection, so in some
cases a bit of tweaking is needed to find a useful match. This is done by
e.g. removing a key that prevents the best matching keyset from being
selected.
The debugfs output that is provided for a port allows inspection of the
currently used keyset in each of the VCAPs lookups. So when a template has
been created the debugfs output allows you to verify if the keyset
configuration has been changed successfully.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Tue, 7 Mar 2023 13:41:03 +0000 (14:41 +0100)]
net: microchip: sparx5: Add TC template support
This adds support for using the "template add" and "template destroy"
functionality to change the port keyset configuration.
If the VCAP lookup already contains rules, the port keyset is left
unchanged, as a change would make these rules unusable.
When the template is destroyed the port keyset configuration is restored.
The filters using the template chain will automatically be deleted by the
TC framework.
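The add/destroy behavior can be modeled in a few lines of C. This is a hypothetical sketch of the idea only: `struct fake_port_lookup`, `struct fake_template`, the keyset names and the single-keyset-per-lookup simplification are assumptions, not the sparx5 VCAP data structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical keysets for one (port, lookup) combination. */
enum fake_keyset { KS_DEFAULT, KS_IPV4, KS_IPV6 };

struct fake_port_lookup {
	enum fake_keyset keyset; /* keyset currently configured */
	int rule_count;          /* rules already installed in this lookup */
};

/* Per-template state kept so the original keyset can be restored. */
struct fake_template {
	bool changed;
	enum fake_keyset orig;
};

static void fake_template_add(struct fake_port_lookup *pl,
			      struct fake_template *t, enum fake_keyset want)
{
	t->changed = false;
	if (pl->rule_count > 0) /* existing rules would become unusable */
		return;
	t->orig = pl->keyset;   /* remember the previous configuration */
	pl->keyset = want;
	t->changed = true;
}

static void fake_template_destroy(struct fake_port_lookup *pl,
				  const struct fake_template *t)
{
	if (t->changed)
		pl->keyset = t->orig;
}

/* Adds an IPv4 template, optionally destroys it, returns the keyset. */
enum fake_keyset demo_keyset(int existing_rules, bool destroy)
{
	struct fake_port_lookup pl = { KS_DEFAULT, existing_rules };
	struct fake_template t;

	fake_template_add(&pl, &t, KS_IPV4);
	if (destroy)
		fake_template_destroy(&pl, &t);
	return pl.keyset;
}
```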
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Tue, 7 Mar 2023 13:41:02 +0000 (14:41 +0100)]
net: microchip: sparx5: Add port keyset changing functionality
With this it is now possible for clients (like TC) to change the port
keyset configuration in the Sparx5 VCAPs.
This is typically done per traffic class, which is guided by the L3
protocol information.
Before the change the current keyset configuration is collected in a list
that is handed back to the client.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Tue, 7 Mar 2023 13:41:01 +0000 (14:41 +0100)]
net: microchip: sparx5: Add TC template list to a port
This adds a list that is used to collect the templates that are active on a
port.
This allows the template creation to change the port configuration
and the template destruction to change it back.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Tue, 7 Mar 2023 13:41:00 +0000 (14:41 +0100)]
net: microchip: sparx5: Provide rule count, key removal and keyset select
This provides these 3 functions in the VCAP API:
- Count the number of rules in a VCAP lookup (chain)
- Remove a key from a VCAP rule
- Find the keyset that gives the smallest rule list from a list of keysets
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Tue, 7 Mar 2023 13:40:59 +0000 (14:40 +0100)]
net: microchip: sparx5: Correct the spelling of the keysets in debugfs
Correct the name used in the debugfs output.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Tue, 7 Mar 2023 11:21:03 +0000 (12:21 +0100)]
net: microchip: sparx5: fix deletion of existing DSCP mappings
Fix deletion of existing DSCP mappings in the APP table.
Adding and deleting DSCP entries are replicated per-port, since the
mapping table is global for all ports in the chip. Whenever a mapping
for a DSCP value already exists, the old mapping is deleted first.
However, it is only deleted for the specified port. Fix this by calling
sparx5_dcb_ieee_delapp() instead of dcb_ieee_delapp() as it ought to be.
Reproduce:
// Map and remap DSCP value 63
$ dcb app add dev eth0 dscp-prio 63:1
$ dcb app add dev eth0 dscp-prio 63:2
$ dcb app show dev eth0 dscp-prio
dscp-prio 63:2
$ dcb app show dev eth1 dscp-prio
dscp-prio 63:1 63:2 <-- 63:1 should not be there
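The replication problem can be modeled in a few lines of C. In this hypothetical sketch each port keeps its own copy of the APP table as a bitmask of priorities per DSCP value; the table layout and function names are illustrative only, not the sparx5 driver's data structures. Replaying the reproducer shows port 1 keeping the stale 63:1 entry unless the delete is replicated too:

```c
#include <assert.h>
#include <string.h>

#define NPORTS 2
#define NDSCP 64

/* Hypothetical per-port software APP tables: bit p set means "d:p". */
static unsigned int app[NPORTS][NDSCP];

static void del_on_port(int port, int dscp)
{
	app[port][dscp] = 0;
}

/* Deletion replicated to every port, like the fixed delete path. */
static void del_on_all_ports(int dscp)
{
	for (int p = 0; p < NPORTS; p++)
		del_on_port(p, dscp);
}

/* Adding is replicated to every port because the HW table is global. */
static void add_mapping(int port, int dscp, int prio, int fixed)
{
	if (app[port][dscp]) { /* an old mapping exists: delete it first */
		if (fixed)
			del_on_all_ports(dscp);
		else
			del_on_port(port, dscp); /* bug: other ports keep it */
	}
	for (int p = 0; p < NPORTS; p++)
		app[p][dscp] |= 1u << prio;
}

/* Replays the reproducer on port 0 and returns port 1's entry for 63. */
unsigned int demo_remap(int fixed)
{
	memset(app, 0, sizeof(app));
	add_mapping(0, 63, 1, fixed);
	add_mapping(0, 63, 2, fixed);
	return app[1][63];
}
```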
Fixes: 8dcf69a64118 ("net: microchip: sparx5: add support for offloading dscp table")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suman Ghosh [Tue, 7 Mar 2023 10:49:08 +0000 (16:19 +0530)]
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
NDC caches the contexts of frequently used queues (Rx and Tx queues).
Due to a HW erratum, when NDC detects a fault/poison while accessing
contexts, it could go into an illegal state where a cache line could get
locked forever. To make sure all cache lines in NDC are available for
optimum performance upon fault/lockerror/poison errors, scan through all
cache lines in NDC and clear the lock bit.
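The recovery step amounts to a read-modify-write over every cache line. The sketch below is a userspace model: the number of lines, the bit position of the lock flag and the `ndc_line[]` array are assumptions, since the commit does not describe the NDC register layout:

```c
#include <assert.h>
#include <stdint.h>

#define NDC_LINES 8                 /* assumed number of cache lines */
#define LINE_LOCK_BIT (1ull << 0)   /* assumed position of the lock bit */

/* Hypothetical per-line state words, standing in for NDC registers. */
static uint64_t ndc_line[NDC_LINES];

/* Clears the lock bit on every line, leaving other bits untouched. */
static int ndc_unlock_all(void)
{
	int cleared = 0;

	for (int i = 0; i < NDC_LINES; i++) {
		if (ndc_line[i] & LINE_LOCK_BIT) {
			ndc_line[i] &= ~LINE_LOCK_BIT;
			cleared++;
		}
	}
	return cleared;
}

/* Locks two lines, runs the scan, and returns how many were unlocked. */
int demo_unlock(void)
{
	for (int i = 0; i < NDC_LINES; i++)
		ndc_line[i] = 0;
	ndc_line[2] = LINE_LOCK_BIT | 0x10; /* locked, with other state */
	ndc_line[5] = LINE_LOCK_BIT;
	ndc_line[6] = 0x20;                 /* not locked */

	int n = ndc_unlock_all();

	assert(ndc_line[2] == 0x10 && ndc_line[6] == 0x20);
	return n;
}
```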
Fixes: 4a3581cd5995 ("octeontx2-af: NPA AQ instruction enqueue support")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arınç ÜNAL [Tue, 7 Mar 2023 09:56:19 +0000 (12:56 +0300)]
dt-bindings: net: dsa: mediatek,mt7530: change some descriptions to literal
The line endings must be preserved on the gpio-controller, io-supply,
and reset-gpios properties for their descriptions to look proper when
the YAML file is parsed. Currently each of these descriptions is
interpreted as a single line when parsed. Change the style of the
description of these properties to literal style to preserve the line
endings.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiapeng Chong [Tue, 7 Mar 2023 05:41:38 +0000 (13:41 +0800)]
emulex/benet: clean up some inconsistent indenting
No functional modification involved.
drivers/net/ethernet/emulex/benet/be_cmds.c:1120 be_cmd_pmac_add() warn: inconsistent indenting.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4396
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Tue, 7 Mar 2023 03:23:46 +0000 (11:23 +0800)]
net/smc: fix fallback failed while sendmsg with fastopen
Before sendmsg() determines whether the msg has unsupported options, it
is prematurely terminated by the wrong state check.
For an application, the typical usage of MSG_FASTOPEN looks like:
fd = socket(...);
/* rather than connect() */
sendto(fd, data, len, MSG_FASTOPEN, addr, addrlen);
Hence, we need to check the flag before the state check, because the
sock state here is always SMC_INIT when an application tries
MSG_FASTOPEN. Once unsupported options are found, fall back to TCP.
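A minimal sketch of why the order of the checks matters. The state enum, the action enum and `SKETCH_MSG_FASTOPEN` are hypothetical stand-ins (the constant reuses Linux's MSG_FASTOPEN value), not the SMC code itself. With MSG_FASTOPEN the socket is still in its initial state, so the flag has to be inspected before the state.

```c
/* Hypothetical stand-ins; nothing here is the real net/smc code. */
#define SKETCH_MSG_FASTOPEN 0x20000000

enum sketch_smc_state { SKETCH_SMC_INIT, SKETCH_SMC_ACTIVE };
enum sketch_action { SKETCH_SEND_VIA_SMC, SKETCH_FALLBACK_TO_TCP, SKETCH_REJECT };

static enum sketch_action sketch_smc_sendmsg(enum sketch_smc_state state, int flags)
{
	/* 1. Unsupported options first: a fastopen sendto() arrives while the
	 *    socket is still in its initial state, so it must fall back here. */
	if (flags & SKETCH_MSG_FASTOPEN)
		return SKETCH_FALLBACK_TO_TCP;

	/* 2. Only then the state check (a real stack would return an error). */
	if (state != SKETCH_SMC_ACTIVE)
		return SKETCH_REJECT;

	return SKETCH_SEND_VIA_SMC;
}
```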
Fixes: ee9dfbef02d1 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
v1 -> v2: Optimize code style
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 6 Mar 2023 23:51:52 +0000 (17:51 -0600)]
net/mlx4_en: Replace fake flex-array with flexible-array member
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Transform the zero-length array into a flexible-array member in struct
mlx4_en_rx_desc.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/net/ethernet/mellanox/mlx4/en_rx.c:88:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:149:30: warning: array subscript 0 is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:127:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:128:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:129:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:117:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
drivers/net/ethernet/mellanox/mlx4/en_rx.c:119:30: warning: array subscript i is outside array bounds of ‘struct mlx4_wqe_data_seg[0]’ [-Warray-bounds=]
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and helps us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
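A small userspace sketch of the pattern, using hypothetical struct names. A struct whose only member is a flexible array is not valid C99, so this sketch puts a count field in front of the array; the allocation reserves room for the trailing entries.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware scatter/gather entry. */
struct sketch_data_seg {
	uint32_t byte_count;
	uint32_t lkey;
	uint64_t addr;
};

struct sketch_rx_desc {
	uint32_t nr_segs;
	/* Before: struct sketch_data_seg data[0];  (fake flexible array,
	 *         bounds treated as zero under -fstrict-flex-arrays=3)
	 * After:  a C99 flexible-array member with unknown bounds. */
	struct sketch_data_seg data[];
};

/* Allocate one descriptor with room for nr_segs trailing entries. */
static struct sketch_rx_desc *sketch_rx_desc_alloc(size_t nr_segs)
{
	return calloc(1, sizeof(struct sketch_rx_desc) +
			 nr_segs * sizeof(struct sketch_data_seg));
}
```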
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/264
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Mar 2023 09:31:31 +0000 (09:31 +0000)]
Merge branch 'r8169-disable-ASPM-during-NAPI-poll'
Heiner Kallweit says:
====================
r8169: disable ASPM during NAPI poll
This is a rework of ideas from Kai-Heng on how to avoid the known
ASPM issues whilst still allowing for a maximum of ASPM-related power
savings. As a prerequisite some locking is added first.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:28:06 +0000 (22:28 +0100)]
r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll
Now that ASPM is disabled during NAPI poll, we can remove all ASPM
restrictions. This allows for higher power savings if the network
isn't fully loaded.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:26:47 +0000 (22:26 +0100)]
r8169: disable ASPM during NAPI poll
Several chip versions have problems with ASPM, which may result in
rx_missed errors or tx timeouts. The root cause isn't known, but
experience shows that disabling ASPM during NAPI poll can avoid
these problems.
Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:25:49 +0000 (22:25 +0100)]
r8169: prepare rtl_hw_aspm_clkreq_enable for usage in atomic context
Bail out if the function is used with chip versions that don't support
ASPM configuration. In addition, remove the delay: it turned out that
it's not needed, and the vendor driver r8125 doesn't have it either.
Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:24:49 +0000 (22:24 +0100)]
r8169: enable cfg9346 config register access in atomic context
For disabling ASPM during NAPI poll we'll have to unlock access
to the config registers in atomic context. Some other code paths
that run with config register access unlocked take longer and can
sleep. Add a usage counter to enable parallel execution of code
paths requiring unlocked config registers.
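A userspace sketch of the usage-counter idea, assuming a simulated Cfg9346 register and a C11 `atomic_flag` spinlock standing in for the kernel spinlock. The register values and all `sketch_*` names are illustrative only: the first user unlocks the registers, nested users only bump the counter, and the last user locks them again.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative register values for the simulated Cfg9346 register. */
#define SKETCH_CFG9346_LOCK   0x00
#define SKETCH_CFG9346_UNLOCK 0xc0

struct sketch_nic {
	atomic_flag lock;          /* stands in for a kernel spinlock */
	unsigned int usage_count;  /* active "registers unlocked" sections */
	uint8_t cfg9346;           /* simulated Cfg9346 register */
};

static void sketch_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;  /* busy-wait, as a spinlock would */
}

static void sketch_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* The first user unlocks the config registers; nested users just count. */
static void sketch_unlock_config_regs(struct sketch_nic *nic)
{
	sketch_spin_lock(&nic->lock);
	if (nic->usage_count++ == 0)
		nic->cfg9346 = SKETCH_CFG9346_UNLOCK;
	sketch_spin_unlock(&nic->lock);
}

/* The last user locks them again. */
static void sketch_lock_config_regs(struct sketch_nic *nic)
{
	sketch_spin_lock(&nic->lock);
	if (--nic->usage_count == 0)
		nic->cfg9346 = SKETCH_CFG9346_LOCK;
	sketch_spin_unlock(&nic->lock);
}
```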
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:24:00 +0000 (22:24 +0100)]
r8169: use spinlock to protect access to registers Config2 and Config5
For disabling ASPM during NAPI poll we'll have to access both registers
in atomic context. Use a spinlock to protect access.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 6 Mar 2023 21:23:15 +0000 (22:23 +0100)]
r8169: use spinlock to protect mac ocp register access
For disabling ASPM during NAPI poll we'll have to access mac ocp
registers in atomic context. This could result in races because
a mac ocp read consists of a write to register OCPDR, followed
by a read from the same register. Therefore add a spinlock to
protect access to mac ocp registers.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Fedorenko [Mon, 6 Mar 2023 16:07:38 +0000 (08:07 -0800)]
net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
When the feature was added, it was enabled for SW timestamps only, but
with current hardware the same out-of-order timestamps can be seen.
Extend the feature to all types of timestamps.
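For reference, a userspace sketch of requesting hardware TX timestamps together with `SOF_TIMESTAMPING_OPT_ID` through `setsockopt(SO_TIMESTAMPING)`. The helper names are hypothetical; the flags come from `<linux/net_tstamp.h>`. Actual hardware timestamps additionally require the device to be configured (for example via the `SIOCSHWTSTAMP` ioctl), and the timestamps are read back from the socket error queue with `MSG_ERRQUEUE`.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hardware TX timestamps, reported as raw hardware time, each tagged with
 * an ID from a per-socket counter so out-of-order completions can be
 * matched back to the send that produced them. */
static int sketch_hw_tx_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_HARDWARE |
	       SOF_TIMESTAMPING_RAW_HARDWARE |
	       SOF_TIMESTAMPING_OPT_ID;
}

/* Returns 0 on success, or -1 with errno set (e.g. EBADF for a bad fd). */
static int sketch_enable_hw_tx_tstamp(int fd)
{
	int flags = sketch_hw_tx_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```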
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 6 Mar 2023 23:40:28 +0000 (17:40 -0600)]
netxen_nic: Replace fake flex-array with flexible-array member
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Transform the zero-length array into a flexible-array member in struct
nx_cardrsp_rx_ctx_t.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:361:26: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:372:25: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and helps us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/265
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZAZ57I6WdQEwWh7v@work
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Mon, 6 Mar 2023 22:10:57 +0000 (23:10 +0100)]
net: phy: smsc: simplify lan95xx_config_aneg_ext
lan95xx_config_aneg_ext() can be simplified by using phy_set_bits().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/3da785c7-3ef8-b5d3-89a0-340f550be3c2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 6 Mar 2023 20:43:13 +0000 (20:43 +0000)]
net: remove enum skb_free_reason
enum skb_drop_reason is more generic, so adopt it instead.
Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason().
This means drivers can use more precise drop reasons if they want to.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Mon, 6 Mar 2023 21:51:35 +0000 (22:51 +0100)]
net: phy: improve phy_read_poll_timeout
cond is sometimes (val & MASK), which may result in a false positive
if val is a negative errno. We shouldn't evaluate cond if val < 0.
This has no functional impact here, but it's not nice.
Therefore, switch the order of the checks.
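A minimal sketch of the reordered checks, with a simulated sequence of register reads standing in for `phy_read()` (the function and its array-based input are hypothetical). A negative errno has its upper bits set, so `-EIO & 0x8000` is non-zero and would be taken as success if the condition were tested first.

```c
#include <errno.h>

/*
 * Poll until (val & mask) is non-zero. 'reads' simulates successive
 * register reads: each entry is a register value (>= 0) or a negative
 * errno. Returns 0 on success, the read error, or -ETIMEDOUT.
 */
static int sketch_poll_bits(const int *reads, int nr_reads, int mask)
{
	for (int i = 0; i < nr_reads; i++) {
		int val = reads[i];    /* stands in for phy_read() */

		/* Error check first: evaluating (val & mask) on a negative
		 * errno could report a false success. */
		if (val < 0)
			return val;
		if (val & mask)
			return 0;
	}
	return -ETIMEDOUT;
}
```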
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko [Tue, 7 Mar 2023 23:31:13 +0000 (15:31 -0800)]
Merge branch 'libbpf: usdt arm arg parsing support'
Puranjay Mohan says:
====================
This series adds support for the ARM architecture to libbpf USDT. This
involves implementing the parse_usdt_arg() function for ARM.
The last part of parse_usdt_arg() was found to be repeated for all
architectures, so the first patch in this series refactors these
functions and moves the post-processing to parse_usdt_spec().
Changes in V2[1] to V3:
- Use a tabular approach to find register offsets.
- Add the patch for refactoring parse_usdt_arg()
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Puranjay Mohan [Tue, 7 Mar 2023 12:04:40 +0000 (12:04 +0000)]
libbpf: USDT arm arg parsing support
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since the registers used are r[0-10], fp, ip, sp, lr,
pc. The format is slightly different from aarch64's; the forms are:
- "size @ [ reg, #offset ]" for dereferences, for example
"-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
"-4@r0"
- "size @ #value" for raw values; for example
"-8@#1"
Add support for parsing USDT arguments for the ARM architecture.
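A self-contained sketch of how the three ARM argument forms might be parsed with `sscanf()`. The result struct, enum and function names are hypothetical and do not match libbpf's internal types. Register names are limited to four lowercase alphanumeric characters, which covers r0-r10, fp, ip, sp, lr and pc, and `%n` records how much input was consumed so that trailing garbage is rejected.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result types for this sketch (libbpf's real ones differ). */
enum sketch_arg_kind {
	SKETCH_ARG_INVALID,
	SKETCH_ARG_CONST,      /* "size @ #value"        */
	SKETCH_ARG_REG,        /* "size @ reg"           */
	SKETCH_ARG_REG_DEREF,  /* "size @ [ reg, #off ]" */
};

struct sketch_usdt_arg {
	enum sketch_arg_kind kind;
	int size;        /* negative means signed, e.g. -4 */
	char reg[5];     /* up to 4 chars + NUL */
	long val_off;    /* constant value, or dereference offset */
};

/* True if sscanf reached %n and consumed the whole string. */
static int sketch_full_match(const char *s, int len)
{
	return len >= 0 && s[len] == '\0';
}

static struct sketch_usdt_arg sketch_parse_arm_arg(const char *s)
{
	struct sketch_usdt_arg a;
	int len;

	memset(&a, 0, sizeof(a));

	/* "-8 @ [ sp, #76 ]": register plus offset dereference */
	len = -1;
	if (sscanf(s, " %d @ [ %4[a-z0-9] , #%ld ] %n",
		   &a.size, a.reg, &a.val_off, &len) == 3 && sketch_full_match(s, len)) {
		a.kind = SKETCH_ARG_REG_DEREF;
		return a;
	}

	/* " -4 @ [ sp ]": dereference with an implicit offset of 0 */
	len = -1;
	a.val_off = 0;
	if (sscanf(s, " %d @ [ %4[a-z0-9] ] %n", &a.size, a.reg, &len) == 2 &&
	    sketch_full_match(s, len)) {
		a.kind = SKETCH_ARG_REG_DEREF;
		return a;
	}

	/* "-8@#1": constant value */
	len = -1;
	if (sscanf(s, " %d @ #%ld %n", &a.size, &a.val_off, &len) == 2 &&
	    sketch_full_match(s, len)) {
		a.reg[0] = '\0';
		a.kind = SKETCH_ARG_CONST;
		return a;
	}

	/* "-4@r0": plain register value */
	len = -1;
	a.val_off = 0;
	if (sscanf(s, " %d @ %4[a-z0-9] %n", &a.size, a.reg, &len) == 2 &&
	    sketch_full_match(s, len)) {
		a.kind = SKETCH_ARG_REG;
		return a;
	}

	memset(&a, 0, sizeof(a));   /* kind == SKETCH_ARG_INVALID */
	return a;
}
```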
To test the above changes, QEMU's virt [1] board with a cortex-a15
CPU was used. libbpf-bootstrap's usdt example [2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.
[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
Puranjay Mohan [Tue, 7 Mar 2023 12:04:39 +0000 (12:04 +0000)]
libbpf: Refactor parse_usdt_arg() to re-use code
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.
Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
Daniel Müller [Tue, 7 Mar 2023 21:55:04 +0000 (21:55 +0000)]
libbpf: Fix theoretical u32 underflow in find_cd() function
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to a signed 64-bit integer for the
representation of offset to make sure it can never underflow.
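A sketch of a backwards scan for the ZIP End of Central Directory signature (`PK\x05\x06`) using a signed 64-bit offset, assuming the 22-byte fixed EOCD size and the up-to-0xFFFF-byte trailing comment from the ZIP format. It is not libbpf's `find_cd()`: it only matches the signature and does not validate the record. With a signed offset, `offset >= 0` can actually become false and `offset - 0xFFFF` cannot wrap around.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_EOCD_SIZE   22      /* fixed part of the EOCD record */
#define SKETCH_MAX_COMMENT 0xFFFF  /* max length of the trailing comment */

/*
 * Return the offset of the last EOCD signature in the window where the
 * record can start, or -1 if none is found.
 */
static int64_t sketch_find_eocd(const unsigned char *buf, uint32_t size)
{
	static const unsigned char sig[4] = { 'P', 'K', 0x05, 0x06 };
	int64_t offset, limit;

	if (size < SKETCH_EOCD_SIZE)
		return -1;

	offset = (int64_t)size - SKETCH_EOCD_SIZE;
	limit = offset - SKETCH_MAX_COMMENT;  /* may be negative: that's fine */

	for (; offset >= 0 && offset >= limit; offset--) {
		if (memcmp(buf + offset, sig, sizeof(sig)) == 0)
			return offset;
	}
	return -1;
}
```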
Fixes: 1eebcb60633f ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
Jakub Kicinski [Mon, 6 Mar 2023 20:04:57 +0000 (12:04 -0800)]
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
I was intending to make all the Netlink Spec code BSD-3-Clause
to ease the adoption but it appears that:
- I fumbled the uAPI and used "GPL WITH uAPI note" there
- it gives people pause as they expect GPL in the kernel
As suggested by Chuck, re-license under a dual license. This gives us
the benefit of full BSD freedom while fulfilling the broad "kernel is
under GPL" expectations.
Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stephen Hemminger [Mon, 6 Mar 2023 19:44:05 +0000 (11:44 -0800)]
mailmap: update entries for Stephen Hemminger
Map all my old email addresses to current address.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230306194405.108236-1-stephen@networkplumber.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 6 Mar 2023 19:20:18 +0000 (11:20 -0800)]
mailmap: add entry for Maxim Mikityanskiy
Map Maxim's old corporate addresses to his personal one.
Link: https://lore.kernel.org/r/20230306192018.3894988-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fedor Pchelkin [Mon, 6 Mar 2023 21:26:50 +0000 (00:26 +0300)]
nfc: change order inside nfc_se_io error path
cb_context should be freed on the error path in nfc_se_io as stated by
commit 25ff6f8a5a3b ("nfc: fix memory leak of se_io context in
nfc_genl_se_io").
Make the error path in nfc_se_io unwind everything in reverse order, i.e.
free the cb_context after unlocking the device.
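A minimal sketch of the reverse-order unwind, with hypothetical names: `locked` stands in for `device_lock()`/`device_unlock()`, `registered` for the registration check, and `events` records the cleanup order. As in the description, the context is freed only after the device is unlocked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device model used only for this sketch. */
struct sketch_dev {
	int locked;
	int registered;
	char events[4];   /* cleanup steps, in the order they ran */
	int nr_events;
};

static void sketch_log(struct sketch_dev *dev, char ev)
{
	if (dev->nr_events < (int)sizeof(dev->events))
		dev->events[dev->nr_events++] = ev;
}

/* Takes ownership of cb_context, which is freed on the error path. */
static int sketch_se_io(struct sketch_dev *dev, void *cb_context)
{
	int rc;

	dev->locked = 1;                        /* device_lock() */

	if (!dev->registered) {
		rc = -ENODEV;
		goto error;
	}

	/* ... submit the I/O; on success cb_context goes to the callback ... */
	dev->locked = 0;                        /* device_unlock() */
	return 0;

error:
	/* Unwind in reverse order of acquisition: the context existed before
	 * the lock was taken, so unlock first, then free the context. */
	dev->locked = 0;
	sketch_log(dev, 'U');
	free(cb_context);
	sketch_log(dev, 'F');
	return rc;
}
```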
Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230306212650.230322-1-pchelkin@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Tue, 14 Feb 2023 15:25:36 +0000 (16:25 +0100)]
ethernet: ice: avoid gcc-9 integer overflow warning
With older compilers like gcc-9, the calculation of the vlan
priority field causes a false-positive warning from the byteswap:
In file included from drivers/net/ethernet/intel/ice/ice_tc_lib.c:4:
drivers/net/ethernet/intel/ice/ice_tc_lib.c: In function 'ice_parse_cls_flower':
include/uapi/linux/swab.h:15:15: error: integer overflow in expression '(int)(short unsigned int)((int)match.key-><U67c8>.<U6698>.vlan_priority << 13) & 57344 & 255' of type 'int' results in '0' [-Werror=overflow]
15 | (((__u16)(x) & (__u16)0x00ffU) << 8) | \
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:106:2: note: in expansion of macro '___constant_swab16'
106 | ___constant_swab16(x) : \
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/little_endian.h:42:43: note: in expansion of macro '__swab16'
42 | #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
| ^~~~~~~~
include/linux/byteorder/generic.h:96:21: note: in expansion of macro '__cpu_to_be16'
96 | #define cpu_to_be16 __cpu_to_be16
| ^~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_tc_lib.c:1458:5: note: in expansion of macro 'cpu_to_be16'
1458 | cpu_to_be16((match.key->vlan_priority <<
| ^~~~~~~~~~~
After switching to be16_encode_bits(), the code becomes more
readable to both people and compilers, which avoids the warning.
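A userspace sketch of the value being computed, assuming the 802.1Q TCI layout (PCP in bits 15..13). It masks the shifted priority into an explicit 16-bit value before `htons()`; it only loosely mirrors the kernel's `be16_encode_bits()` and is not that helper.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* 802.1Q TCI: PCP (priority) in bits 15..13, DEI in bit 12, VID in 11..0. */
#define SKETCH_VLAN_PRIO_MASK  0xE000u
#define SKETCH_VLAN_PRIO_SHIFT 13

/* Place a priority into the PCP field and return it in network byte order. */
static uint16_t sketch_vlan_prio_be16(uint8_t prio)
{
	/* Shift and mask in plain unsigned arithmetic, then narrow to 16 bits
	 * explicitly before the byte swap. */
	uint16_t tci = (uint16_t)(((unsigned int)prio << SKETCH_VLAN_PRIO_SHIFT) &
				  SKETCH_VLAN_PRIO_MASK);

	return htons(tci);
}
```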
Fixes: 34800178b302 ("ice: Add support for VLAN priority filters in switchdev")
Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Mon, 13 Feb 2023 11:27:33 +0000 (12:27 +0100)]
ice: don't ignore return codes in VSI related code
There were a few smatch warnings reported by Dan:
- ice_vsi_cfg_xdp_txqs can return 0 instead of ret, which is cleaner
- return values in ice_vsi_cfg_def were ignored
- in ice_vsi_rebuild, the return value was ignored in case the rebuild
failed; this code was never reached, but rewrite it for clarity
- ice_vsi_cfg_tc can return 0 instead of ret
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Fri, 27 Jan 2023 13:24:10 +0000 (14:24 +0100)]
ice: Fix DSCP PFC TLV creation
When creating the TLV to send to the FW for configuring DSCP mode PFC, the
PFCENABLE field was being masked with a 4-bit mask (0xF), but this is an
8-bit bitmask of enabled classes for PFC. This means that traffic classes
4-7 could not be enabled for PFC.
Remove the mask completely; it is not necessary, since we are assigning
8 bits to an 8-bit field.
Fixes: 2a87bd73e50d ("ice: Add DSCP support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 7 Mar 2023 20:16:18 +0000 (12:16 -0800)]
cpumask: be more careful with 'cpumask_setall()'
Commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations") changed cpumask_setall() to use "bitmap_set()" instead
of "bitmap_fill()", because bitmap_fill() would explicitly set all the
bits of a constant sized small bitmap, and that's exactly what we don't
want: we want to only set bits up to 'nr_cpu_ids', which is what
"bitmap_set()" does.
However, Yury correctly points out that while "bitmap_set()" does indeed
only set bits up to the required bitmap size, it doesn't _clear_ bits
above that size, so the upper bits would still not have well-defined
values.
Now, none of this should really matter, since any bits set past
'nr_cpu_ids' should always be ignored in the first place. Yes, the bit
scanning functions might return them as a result, but since users should
always consider the ">= nr_cpu_ids" condition to mean "no more bits",
that shouldn't have any actual effect (see previous commit 8ca09d5fa354
"cpumask: fix incorrect cpumask scanning result checks").
But let's just do it right, the way the code was _intended_ to work. We
have had enough lazy code that works but bites us in the *rse later
(again, see previous commit) that there's no reason to not just do this
properly.
It turns out that "bitmap_fill()" gets this all right for the complex
case, and really only fails for the inlined optimized case that just
fills the whole word. And while we could just fix bitmap_fill() to use
the proper last word mask, there are two issues with that:
- the cpumask case wants to do the _optimization_ based on "NR_CPUS is
a small constant", but then wants to do the actual bit _fill_ based
on "nr_cpu_ids" that isn't necessarily that same constant
- we have lots of non-cpumask users of bitmap_fill(), and while they
hopefully don't care, and probably would want the proper semantics
anyway ("only set bits up to the limit"), I do not want the cpumask
changes to impact other parts
So this ends up just doing the single-word optimization by hand in the
cpumask code. If our cpumask is fundamentally limited to a single word,
just do the proper "fill in that word" exactly. And if it's the more
complex multi-word case, then the generic bitmap_fill() will DTRT.
This is all an example of how our bitmap function optimizations really
are somewhat broken. They conflate the "this is size of the bitmap"
optimizations with the actual bit(s) we want to set.
In many cases we really want to have the two be separate things:
sometimes we base our optimizations on the size of the whole bitmap ("I
know this whole bitmap fits in a single word, so I'll just use
single-word accesses"), and sometimes we base them on the bit we are
looking at ("this is just acting on bits that are in the first word, so
I'll use single-word accesses").
Notice how the end result of the two optimizations is the same, but the
way we get to it is quite different.
And all our cpumask optimization games are really about that fundamental
distinction, and we'd often really want to pass in both the "this is the
bit I'm working on" (which _can_ be a small constant but might be
variable), and "I know it's in this range even if it's variable" (based
on CONFIG_NR_CPUS).
So this cpumask_setall() implementation just makes that explicit. It
checks the "I statically know the size is small" using the known static
size of the cpumask (which is what that 'small_cpumask_bits' is all
about), but then sets the actual bits using the exact number of cpus we
have (i.e. 'nr_cpumask_bits').
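A userspace sketch of that split, with hypothetical names: `SKETCH_NR_CPUS` plays the role of the static cpumask size and the `nr_cpu_ids` argument the runtime CPU count. `SKETCH_LAST_WORD_MASK()` follows the same formula as the kernel's `BITMAP_LAST_WORD_MASK()`. When the mask fits in one word, that word is written exactly, which also clears any bits at or above `nr_cpu_ids`.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Low (nbits % BITS_PER_LONG) bits set, or all bits if that is 0. */
#define SKETCH_LAST_WORD_MASK(nbits) \
	(~0UL >> (-(nbits) & (SKETCH_BITS_PER_LONG - 1)))

/* Hypothetical static cpumask size, standing in for CONFIG_NR_CPUS. */
#define SKETCH_NR_CPUS 8u
#define SKETCH_MASK_WORDS \
	((SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_cpumask {
	unsigned long bits[SKETCH_MASK_WORDS];
};

/* Generic multi-word fill: whole words first, then an exact last word. */
static void sketch_bitmap_fill(unsigned long *dst, unsigned int nbits)
{
	unsigned int full = nbits / SKETCH_BITS_PER_LONG;

	for (unsigned int i = 0; i < full; i++)
		dst[i] = ~0UL;
	if (nbits % SKETCH_BITS_PER_LONG)
		dst[full] = SKETCH_LAST_WORD_MASK(nbits);
}

/* The optimization is chosen from the static size; the bits actually set
 * come from the runtime nr_cpu_ids. */
static void sketch_cpumask_setall(struct sketch_cpumask *m, unsigned int nr_cpu_ids)
{
	if (SKETCH_NR_CPUS <= SKETCH_BITS_PER_LONG) {
		/* One word: write it exactly, clearing bits >= nr_cpu_ids. */
		m->bits[0] = SKETCH_LAST_WORD_MASK(nr_cpu_ids);
		return;
	}
	sketch_bitmap_fill(m->bits, nr_cpu_ids);
}
```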
Of course, in a perfect world, the compiler would have done all the
range analysis (possibly with help from us just telling it that
"this value is always in this range"), and would do all of this for us.
But that is not the world we live in.
While we dream of that perfect world, this does that manual logic to
make it all work out. And this was a very long explanation for a small
code change that shouldn't even matter.
Reported-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexei Starovoitov [Tue, 7 Mar 2023 17:33:43 +0000 (09:33 -0800)]
Merge branch 'bpf: bpf memory usage'
Yafang Shao says:
====================
Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.
In memcg, there's no 'bpf' item in memory.stat, only 'kernel',
'sock', 'vmalloc' and 'percpu', which may be related to bpf memory. With
these items we still can't get the bpf memory usage, because bpf memory
usage may be far less than the kmem in a memcg; for example, dentries may
consume lots of kmem.
bpftool now shows the bpf memory footprint, which is different from bpf
memory usage. The difference can be quite large in some cases, for example:
- non-preallocated bpf map
The memory usage of a non-preallocated bpf map changes dynamically. The
number of allocated elements can range from 0 to max_entries, but the
memory footprint in bpftool only shows a fixed number.
- bpf metadata consumes more memory than bpf element
In some corner cases, the bpf metadata can consume a lot more memory
than the bpf elements do. For example, this can happen when the element
size is quite small.
- some maps don't have key, value or max_entries
For example, the key_size and value_size of ringbuf are 0, so its
memlock is always 0.
We need a way to show bpf memory usage, especially as more and more
bpf programs run in production environments, which makes bpf memory
usage non-trivial.
This patchset introduces a new map op ->map_mem_usage to calculate the
memory usage. Note that we don't intend to make the memory usage 100%
accurate; our goal is to make sure there is only a small difference
between what bpftool reports and the real memory. That small difference
can be ignored compared to the total usage, which is enough to monitor
bpf memory usage. For example, the user can rely on this value to
monitor the trend of bpf memory usage, compare bpf memory usage
between different bpf program versions, figure out which maps consume
large amounts of memory, etc.
This patchset implements the bpf memory usage for all maps, and yet there's
still work to do. We don't want to introduce runtime overhead in the
element update and delete path, but we have to do it for some
non-preallocated maps:
- devmap, xskmap
When we update or delete an element, it will allocate or free memory.
In order to track this dynamic memory, we have to track the count in
element update and delete path.
- cpumap
The size of each cpumap element is not fixed. If we
want to track the usage, we have to count the size of all elements in
the element update and delete path. So I've put it aside for now.
- local_storage, bpf_local_storage
When we attach or detach a cgroup, it will allocate or free memory. If
we want to track the dynamic memory, we also need to do something in
the update and delete path. So I've put it aside for now.
- offload map
Element update and delete for offload maps go through the netdev
dev_ops, which may dynamically allocate or free memory, but this dynamic
memory isn't currently counted in the offload map memory usage.
The result of each map can be found in the individual patch.
We may also need to track per-container bpf memory usage, that will be
addressed by a different patchset.
Changes:
v3->v4: code improvement on ringbuf (Andrii)
use READ_ONCE() to read lpm_trie (Tao)
explain why we can't get bpf memory usage from memcg.
v2->v3: check callback at map creation time and avoid warning (Alexei)
fix build error under CONFIG_BPF=n (lkp@intel.com)
v1->v2: calculate the memory usage within bpf (Alexei)
- [v1] bpf, mm: bpf memory usage
https://lwn.net/Articles/921991/
- [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/919848/
- [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/917647/
- [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat
https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yafang Shao [Sun, 5 Mar 2023 12:46:15 +0000 (12:46 +0000)]
bpf: enforce all maps having memory usage callback
We have implemented the memory usage callback for all maps, and we
enforce that any newly added map has a callback as well. We check this
callback at map creation time; if a map doesn't have the callback, we
return EINVAL.
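A minimal sketch of enforcing the callback at creation time, using a hypothetical, heavily trimmed ops table rather than the kernel's `struct bpf_map_ops`. The example callback estimates a preallocated array-like map as the map struct plus `max_entries * value_size` bytes; that formula is an illustration, not any real map's accounting.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_map;

/* Hypothetical, heavily trimmed ops table (not struct bpf_map_ops). */
struct sketch_map_ops {
	uint64_t (*map_mem_usage)(const struct sketch_map *map);
};

struct sketch_map {
	const struct sketch_map_ops *ops;
	uint32_t max_entries;
	uint32_t value_size;
};

/* Refuse to create a map whose type can't report its memory usage. */
static int sketch_map_create(struct sketch_map *map, const struct sketch_map_ops *ops,
			     uint32_t max_entries, uint32_t value_size)
{
	if (!ops->map_mem_usage)
		return -EINVAL;

	map->ops = ops;
	map->max_entries = max_entries;
	map->value_size = value_size;
	return 0;
}

/* Illustrative callback for a preallocated array-like map. */
static uint64_t sketch_array_mem_usage(const struct sketch_map *map)
{
	return sizeof(*map) + (uint64_t)map->max_entries * map->value_size;
}
```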
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-19-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yafang Shao [Sun, 5 Mar 2023 12:46:14 +0000 (12:46 +0000)]
bpf: offload map memory usage
A new helper is introduced to calculate offload map memory usage. But
currently the memory dynamically allocated in netdev dev_ops, like
nsim_map_update_elem, is not counted. Let's just put it aside for now.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yafang Shao [Sun, 5 Mar 2023 12:46:13 +0000 (12:46 +0000)]
bpf, net: xskmap memory usage
A new helper is introduced to calculate xskmap memory usage.
The xskmap memory usage can change dynamically when we add or remove
an xsk_map_node. Hence we need to track the count of xsk_map_node to get
its memory usage.
The result is as follows:
- before
10: xskmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
- after
10: xskmap name count_map flags 0x0 <<< no elements case
key 4B value 4B max_entries 65536 memlock 524608B
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yafang Shao [Sun, 5 Mar 2023 12:46:12 +0000 (12:46 +0000)]
bpf, net: sock_map memory usage
sockmap and sockhash don't have much in common in their allocation, so
introduce separate helpers to calculate their memory usage.
The result is as follows:
- before
28: sockmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
29: sockhash name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524288B
- after
28: sockmap name count_map flags 0x0
key 4B value 4B max_entries 65536 memlock 524608B
29: sockhash name count_map flags 0x0 <<<< no updated elements
key 4B value 4B max_entries 65536 memlock 1048896B
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-16-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>