Steen Hegelund [Thu, 17 Nov 2022 21:31:13 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP locking to protect rules
This ensures that the VCAP cache and the lists maintained in the VCAP
instance is protected when accessed by different clients.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:12 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP debugFS key/action support for the VCAP API
This add support for displaying the keys and actions in a rule.
The keys and action display format will be determined by the size and the
type of the key or action. The longer keys will typically be displayed as a
hexadecimal byte array.
The actionset is not decoded in full as the Sparx5 IS2 only has one
supported action, so this will be added later with other VCAP types.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:11 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API
This add support to show all rules in a VCAP instance. The information
shown is:
- rule id
- address range
- size
- chain id
- keyset name, subword size, register span
- actionset name, subword size, register span
- counter value
- sticky bit (one bit width counter)
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:10 +0000 (22:31 +0100)]
net: microchip: sparx5: Add raw VCAP debugFS support for the VCAP API
This adds support for decoding VCAP rules with a minimum number of
attributes: address, rule size and keyset.
This allows for a quick inspection of a VCAP instance to determine if the
rule are present and in the correct order.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:09 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP debugFS support
Add a debugFS root folder for Sparx5 and add a vcap folder underneath with
the VCAP instances and the ports
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:08 +0000 (22:31 +0100)]
net: microchip: sparx5: Ensure VCAP last_used_addr is set back to default
This ensures that the last_used_addr in a VCAP instance is returned to the
default value when all rules have been deleted.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 17 Nov 2022 21:31:07 +0000 (22:31 +0100)]
net: microchip: sparx5: Ensure L3 protocol has a default value
This ensures that the l3_proto always have a valid value and that any
dissector parsing errors causes the flower rule to be discarded.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2022 10:52:14 +0000 (10:52 +0000)]
Merge branch 'gve-alternate-missed-completions'
Jeroen de Borst says:
====================
gve: Handle alternate miss-completions
Some versions of the virtual NIC present miss-completions in
an alternative way. Let the diver handle these alternate completions
and announce this capability to the device.
The capability is announced uing a new AdminQ command that sends
driver information to the device. The device can refuse a driver
if it is lacking support for a capability, or it can adopt it's
behavior to work around OS specific issues.
Changed in v5:
- Removed comments in fucntion calls
- Switched ENOTSUPP back to EOPNOTSUPP and made sure it gets passed
Changed in v4:
- Clarified new AdminQ command in cover letter
- Changed EOPNOTSUPP to ENOTSUPP to match device's response
Changed in v3:
- Rewording cover letter
- Added 'Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>'
Changes in v2:
- Changed the subject to include 'gve:'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeroen de Borst [Thu, 17 Nov 2022 16:27:01 +0000 (08:27 -0800)]
gve: Handle alternate miss completions
The virtual NIC has 2 ways of indicating a miss-path
completion. This handles the alternate.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeroen de Borst [Thu, 17 Nov 2022 16:27:00 +0000 (08:27 -0800)]
gve: Adding a new AdminQ command to verify driver
Check whether the driver is compatible with the device
presented.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov [Thu, 17 Nov 2022 16:21:01 +0000 (17:21 +0100)]
NFC: nci: Extend virtual NCI deinit test
Extend the test to check the scenario when NCI core tries to send data
to already closed device to ensure that nothing bad happens.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2022 10:36:04 +0000 (10:36 +0000)]
Merge branch 'axiennet-mdio-bus-freq'
Andy Chiu says:
====================
net: axienet: Use a DT property to configure frequency of the MDIO bus
Some FPGA platforms have to set frequency of the MDIO bus lower than 2.5
MHz. Thus, we use a DT property, which is "clock-frequency", to work
with it at boot time. The default 2.5 MHz would be set if the property
is not pressent. Also, factor out mdio enable/disable functions due to
the api change since
253761a0e61b7.
Changelog:
--- v5 ---
1. Make dt-binding patch prior to the implementation patch.
2. Disable mdio bus in error path.
3. Update description of some functions.
--- v4 ---
1. change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ as suggested by Andrew.
--- v3 RESEND ---
1. Repost the exact same patch again
--- v3 ---
1. Fix coding style, and make probing of the driver fail if MDC overflow
--- v2 ---
1. Use clock-frequency, as defined in mdio.yaml, to configure MDIO
clock.
2. Only print out frequency if it is set to a non-standard value.
3. Reduce the scope of axienet_mdio_enable and remove
axienet_mdio_disable because no one really uses it anymore.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Thu, 17 Nov 2022 15:40:14 +0000 (23:40 +0800)]
net: axienet: set mdio clock according to bus-frequency
Some FPGA platforms have 80KHz MDIO bus frequency constraint when
connecting Ethernet to its on-board external Marvell PHY. Thus, we may
have to set MDIO clock according to the DT. Otherwise, use the default
2.5 MHz, as specified by 802.3, if the entry is not present.
Also, change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ because we may actually
set MDIO bus frequency higher than 2.5MHz if undelying devices support
it. And properly disable the mdio bus clock in error path.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Thu, 17 Nov 2022 15:40:13 +0000 (23:40 +0800)]
dt-bindings: describe the support of "clock-frequency" in mdio
mdio bus frequency is going to be configurable at boottime by a property
in DT now, so add a description to it.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chiu [Thu, 17 Nov 2022 15:40:12 +0000 (23:40 +0800)]
net: axienet: Unexport and remove unused mdio functions
Both axienet_mdio_{enable/disable} functions are no longer used in
xilinx_axienet_main.c due to
253761a0e61b7. And axienet_mdio_disable is
not even used in the mdio.c. So unexport and remove them.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 17 Nov 2022 15:29:05 +0000 (18:29 +0300)]
net: microchip: sparx5: prevent uninitialized variable
Smatch complains that:
drivers/net/ethernet/microchip/sparx5/sparx5_dcb.c:112
sparx5_dcb_apptrust_validate() error: uninitialized symbol 'match'.
This would only happen if the:
if (sparx5_dcb_apptrust_policies[i].nselectors != nselectors)
condition is always true (they are not equal). The "nselectors"
variable comes from dcbnl_ieee_set() and it is a number between 0-256.
This seems like a probably a real bug.
Fixes:
23f8382cd95d ("net: microchip: sparx5: add support for apptrust")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Thu, 17 Nov 2022 14:29:53 +0000 (15:29 +0100)]
net: ethernet: mtk_eth_soc: fix RSTCTRL_PPE{0,1} definitions
Fix RSTCTRL_PPE0 and RSTCTRL_PPE1 register mask definitions for
MTK_NETSYS_V2.
Remove duplicated definitions.
Fixes:
160d3a9b1929 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Thu, 17 Nov 2022 13:28:12 +0000 (14:28 +0100)]
net: microchip: sparx5: kunit test: Fix compile warnings.
When VCAP_KUNIT_TEST is enabled the following warnings are generated:
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:257:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:258:41: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:342:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:359:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1327:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1328:41: warning: Using plain integer as NULL pointer
Therefore fix this.
Fixes:
dccc30cc4906 ("net: microchip: sparx5: Add KUNIT test of counters and sorted rules")
Fixes:
c956b9b318d9 ("net: microchip: sparx5: Adding KUNIT tests of key/action values in VCAP API")
Fixes:
67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2022 08:51:36 +0000 (08:51 +0000)]
Merge branch 'nfp-ipsec-offload'
Simon Horman says:
====================
nfp: IPsec offload support
Huanhuan Wang says:
this series adds support for IPsec offload to the NFP driver.
It covers three enhancements:
1. Patches 1/3:
- Extend the capability word and control word to to support
new features.
2. Patch 2/3:
- Add framework to support IPsec offloading for NFP driver,
but IPsec offload control plane interface xfrm callbacks which
interact with upper layer are not implemented in this patch.
3. Patch 3/3:
- IPsec control plane interface xfrm callbacks are implemented
in this patch.
Changes since v3
* Remove structure fields that describe firmware but
are not used for Kernel offload
* Add WARN_ON(!xa_empty()) before call to xa_destroy()
* Added helpers for hash methods
Changes since v2
* OFFLOAD_HANDLE_ERROR macro and the associated code removed
* Unnecessary logging removed
* Hook function xdo_dev_state_free in struct xfrmdev_ops removed
* Use Xarray to maintain SA entries
Changes since v1
* Explicitly return failure when XFRM_STATE_ESN is set
* Fix the issue that AEAD algorithm is not correctly offloaded
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huanhuan Wang [Thu, 17 Nov 2022 13:21:02 +0000 (14:21 +0100)]
nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer
Xfrm callbacks are implemented to offload SA info into firmware
by mailbox. It supports 16K SA info in total.
Expose ipsec offload feature to upper layer, this feature will
signal the availability of the offload.
Based on initial work of Norm Bagley <norman.bagley@netronome.com>.
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huanhuan Wang [Thu, 17 Nov 2022 13:21:01 +0000 (14:21 +0100)]
nfp: add framework to support ipsec offloading
A new metadata type and config structure are introduced to
interact with firmware to support ipsec offloading. This
feature relies on specific firmware that supports ipsec
encrypt/decrypt by advertising related capability bit.
The xfrm callbacks which interact with upper layer are
implemented in the following patch.
Based on initial work of Norm Bagley <norman.bagley@netronome.com>.
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yinjun Zhang [Thu, 17 Nov 2022 13:21:00 +0000 (14:21 +0100)]
nfp: extend capability and control words
Currently the 32-bit capability word is almost exhausted, now
allocate some more words to support new features, and control
word is also extended accordingly. Packet-type offloading is
implemented in NIC application firmware, but it's not used in
kernel driver, so reserve this bit here in case it's redefined
for other use.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Wed, 16 Nov 2022 16:59:44 +0000 (10:59 -0600)]
bna: Avoid clashing function prototypes
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].
Fix a total of 227 warnings like these:
drivers/net/ethernet/brocade/bna/bna_enet.c:519:3: warning: cast from 'void (*)(struct bna_ethport *, enum bna_ethport_event)' to 'bfa_fsm_t' (aka 'void (*)(void *, int)') converts to incompatible function type [-Wcast-function-type-strict]
bfa_fsm_set_state(ethport, bna_ethport_sm_down);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bna state machine code heavily overloads its state machine functions,
so these have been separated into their own sets of structs, enums,
typedefs, and helper functions. There are almost zero binary code changes,
all seem to be related to header file line numbers changing, or the
addition of the new stats helper.
Important to mention is that while I was manually implementing this changes
I was staring at this[2] patch from Kees Cook. Thanks, Kees. :)
Link: https://github.com/KSPP/linux/issues/240
[1] https://reviews.llvm.org/D134831
[2] https://lore.kernel.org/linux-hardening/
20220929230334.2109344-1-keescook@chromium.org/
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Wed, 16 Nov 2022 08:07:34 +0000 (09:07 +0100)]
net: ethernet: mediatek: ppe: assign per-port queues for offloaded traffic
Keeps traffic sent to the switch within link speed limits
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-7-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Wed, 16 Nov 2022 08:07:33 +0000 (09:07 +0100)]
net: dsa: tag_mtk: assign per-port queues
Keeps traffic sent to the switch within link speed limits
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221116080734.44013-6-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Wed, 16 Nov 2022 08:07:32 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues
When sending traffic to multiple ports with different link speeds, queued
packets to one port can drown out tx to other ports.
In order to better handle transmission to multiple ports, use the hardware
shaper feature to implement weighted fair queueing between ports.
Weight and maximum rate are automatically adjusted based on the link speed
of the port.
The first 3 queues are unrestricted and reserved for non-DSA direct tx on
GMAC ports. The following queues are automatically assigned by the MTK DSA
tag driver based on the target port number.
The PPE offload code configures the queues for offloaded traffic in the same
way.
This feature is only supported on devices supporting QDMA. All queues still
share the same DMA ring and descriptor pool.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-5-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Wed, 16 Nov 2022 08:07:31 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: avoid port_mg assignment on MT7622 and newer
On newer chips, this field is unused and contains some bits related to queue
assignment. Initialize it to 0 in those cases.
Fix offload_version on MT7621 and MT7623, which still need the previous value.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-4-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Wed, 16 Nov 2022 08:07:30 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: drop packets to WDMA if the ring is full
Improves handling of DMA ring overflow.
Clarify other WDMA drop related comment.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Wed, 16 Nov 2022 08:07:29 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: increase tx ring size for QDMA devices
In order to use the hardware traffic shaper feature, a larger tx ring is
needed, especially for the scratch ring, which the hardware shaper uses to
reorder packets.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Wed, 16 Nov 2022 10:24:50 +0000 (11:24 +0100)]
net: fman: remove reference to non-existing config PCS
Commit
a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") makes the
Freescale Data-Path Acceleration Architecture Frame Manager use lynx pcs
driver by selecting PCS_LYNX.
It also selects the non-existing config PCS as well, which has no effect.
Remove this select to a non-existing config.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20221116102450.13928-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 18 Nov 2022 03:39:03 +0000 (19:39 -0800)]
netlink: remove the flex array from struct nlmsghdr
I've added a flex array to struct nlmsghdr in
commit
738136a0e375 ("netlink: split up copies in the ack construction")
to allow accessing the data easily. It leads to warnings with clang,
if user space wraps this structure into another struct and the flex
array is not at the end of the container.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/all/20221114023927.GA685@u2004-local/
Link: https://lore.kernel.org/r/20221118033903.1651026-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Schspa Shi [Wed, 16 Nov 2022 11:45:11 +0000 (19:45 +0800)]
mrp: introduce active flags to prevent UAF when applicant uninit
The caller of del_timer_sync must prevent restarting of the timer, If
we have no this synchronization, there is a small probability that the
cancellation will not be successful.
And syzbot report the fellowing crash:
==================================================================
BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:929 [inline]
BUG: KASAN: use-after-free in enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
Write at addr
f9ff000024df6058 by task syz-fuzzer/2256
Pointer tag: [f9], memory tag: [fe]
CPU: 1 PID: 2256 Comm: syz-fuzzer Not tainted 6.1.0-rc5-syzkaller-00008-
ge01d50cbd6ee #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0+0xe0/0xf0 arch/arm64/kernel/stacktrace.c:156
dump_backtrace arch/arm64/kernel/stacktrace.c:162 [inline]
show_stack+0x18/0x40 arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x1a8/0x4a0 mm/kasan/report.c:395
kasan_report+0x94/0xb4 mm/kasan/report.c:495
__do_kernel_fault+0x164/0x1e0 arch/arm64/mm/fault.c:320
do_bad_area arch/arm64/mm/fault.c:473 [inline]
do_tag_check_fault+0x78/0x8c arch/arm64/mm/fault.c:749
do_mem_abort+0x44/0x94 arch/arm64/mm/fault.c:825
el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0xd8/0xe4 arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576
hlist_add_head include/linux/list.h:929 [inline]
enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
mod_timer+0x14/0x20 kernel/time/timer.c:1161
mrp_periodic_timer_arm net/802/mrp.c:614 [inline]
mrp_periodic_timer+0xa0/0xc0 net/802/mrp.c:627
call_timer_fn.constprop.0+0x24/0x80 kernel/time/timer.c:1474
expire_timers+0x98/0xc4 kernel/time/timer.c:1519
To fix it, we can introduce a new active flags to make sure the timer will
not restart.
Reported-by: syzbot+6fd64001c20aa99e34a4@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2022 12:09:20 +0000 (12:09 +0000)]
Merge tag 'rxrpc-next-
20221116' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix oops and missing config conditionals
The patches that were pulled into net-next previously[1] had some issues
that this patchset fixes:
(1) Fix missing IPV6 config conditionals.
(2) Fix an oops caused by calling udpv6_sendmsg() directly on an AF_INET
socket.
(3) Fix the validation of network addresses on entry to socket functions
so that we don't allow an AF_INET6 address if we've selected an
AF_INET transport socket.
Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Nov 2022 09:26:41 +0000 (09:26 +0000)]
net: fix napi_disable() logic error
Dan reported a new warning after my recent patch:
New smatch warnings:
net/core/dev.c:6409 napi_disable() error: uninitialized symbol 'new'.
Indeed, we must first wait for STATE_SCHED and STATE_NPSVC to be cleared,
to make sure @new variable has been initialized properly.
Fixes:
4ffa1d1c6842 ("net: adopt try_cmpxchg() in napi_{enable|disable}()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 17 Nov 2022 07:44:02 +0000 (10:44 +0300)]
rxrpc: uninitialized variable in rxrpc_send_ack_packet()
The "pkt" was supposed to have been deleted in a previous patch. It
leads to an uninitialized variable bug.
Fixes:
72f0c6fb0579 ("rxrpc: Allocate ACK records at proposal and queue for transmission")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 17 Nov 2022 07:43:38 +0000 (10:43 +0300)]
rxrpc: fix rxkad_verify_response()
The error handling for if skb_copy_bits() fails was accidentally deleted
so the rxkad_decrypt_ticket() function is not called.
Fixes:
5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Wed, 16 Nov 2022 23:58:46 +0000 (00:58 +0100)]
net: ethernet: mtk_eth_soc: remove cpu_relax in mtk_pending_work
Get rid of cpu_relax in mtk_pending_work routine since MTK_RESETTING is
set only in mtk_pending_work() and it runs holding rtnl lock
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Wed, 16 Nov 2022 23:35:04 +0000 (00:35 +0100)]
net: ethernet: mtk_eth_soc: do not overwrite mtu configuration running reset routine
Restore user configured MTU running mtk_hw_init() during tx timeout routine
since it will be overwritten after a hw reset.
Reported-by: Felix Fietkau <nbd@nbd.name>
Fixes:
9ea4d311509f ("net: ethernet: mediatek: add the whole ethernet reset into the reset process")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 16 Nov 2022 22:37:18 +0000 (16:37 -0600)]
net: ipa: avoid a null pointer dereference
Dan Carpenter reported that Smatch found an instance where a pointer
which had previously been assumed could be null (as indicated by a
null check) was later dereferenced without a similar check.
In practice this doesn't lead to a problem because currently the
pointers used are all non-null. Nevertheless this patch addresses
the reported problem.
In addition, I spotted another bug that arose in the same commit.
When the command to initialize a routing table memory region was
added, the number of entries computed for the non-hashed table
was wrong (it ended up being a Boolean rather than the count
intended). This bug is fixed here as well.
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/kernel-janitors/Y3OOP9dXK6oEydkf@kili
Tested-by: Caleb Connolly <caleb.connolly@linaro.com>
Fixes:
5cb76899fb47 ("net: ipa: reduce arguments to ipa_table_init_add()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2022 11:44:36 +0000 (11:44 +0000)]
Merge tag 'wireless-next-2022-11-18' of git://git./linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.2
Second set of patches for v6.2. Only driver patches this time, nothing
really special. Unused platform data support was removed from wl1251
and rtw89 got WoWLAN support.
Major changes:
ath11k
* support configuring channel dwell time during scan
rtw89
* new dynamic header firmware format support
* Wake-over-WLAN support
rtl8xxxu
* enable IEEE80211_HW_SUPPORT_FAST_XMIT
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2022 11:42:54 +0000 (11:42 +0000)]
Merge branch 'sctp-vrf'
Xin Long says:
====================
sctp: support vrf processing
This patchset adds the VRF processing in SCTP. Simliar to TCP/UDP,
it includes socket bind and socket/association lookup changes.
For socket bind change, it allows sockets to bind to a VRF device
and allows multiple sockets with the same IP and PORT to bind to
different interfaces in patch 1-3.
For socket/association lookup change, it adds dif and sdif check
in both asoc and ep lookup in patch 4 and 5, and when binding to
nodev, users can decide if accept the packets received from one
l3mdev by setup a sysctl option in patch 6.
Note with VRF support, in a netns, an association will be decided
by src ip + src port + dst ip + dst port + bound_dev_if, and it's
possible for ss to have:
State Local Address:Port Peer Address:Port
ESTAB 192.168.1.2%vrf-s1:1234
`- ESTAB 192.168.1.2%veth1:1234 192.168.1.1:1234
ESTAB 192.168.1.2%vrf-s2:1234
`- ESTAB 192.168.1.2%veth2:1234 192.168.1.1:1234
See the selftest in patch 7 for more usage.
Also, thanks Carlo for testing this patch series on their use.
v1->v2:
- In Patch 5, move sctp_sk_bound_dev_eq() definition to net/sctp/
input.c to avoid a build error when IP_SCTP is disabled, as Paolo
suggested.
- In Patch 7, avoid one sleep by disabling the IPv6 dad, and remove
another sleep by using ss to check if the server's ready, and also
delete two unncessary sleeps in sctp_hello.c, as Paolo suggested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:22 +0000 (15:01 -0500)]
selftests: add a selftest for sctp vrf
This patch adds 12 small test cases: 01-04 test for the sysctl
net.sctp.l3mdev_accept. 05-10 test for only binding to a right
l3mdev device, the connection can be created. 11-12 test for
two socks binding to different l3mdev devices at the same time,
each of them can process the packets from the corresponding
peer. The tests run for both IPv4 and IPv6 SCTP.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:21 +0000 (15:01 -0500)]
sctp: add sysctl net.sctp.l3mdev_accept
This patch is to add sysctl net.sctp.l3mdev_accept to allow
users to change the pernet global l3mdev_accept.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:20 +0000 (15:01 -0500)]
sctp: add dif and sdif check in asoc and ep lookup
This patch at first adds a pernet global l3mdev_accept to decide if it
accepts the packets from a l3mdev when a SCTP socket doesn't bind to
any interface. It's set to 1 to avoid any possible incompatible issue,
and in next patch, a sysctl will be introduced to allow to change it.
Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is
added to check either dif or sdif is equal to sk_bound_dev_if, and to
check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set.
This function is used to match a association or a endpoint, namely
called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match().
All functions that needs updating are:
sctp_rcv():
asoc:
__sctp_rcv_lookup()
__sctp_lookup_association() -> sctp_addrs_lookup_transport()
__sctp_rcv_lookup_harder()
__sctp_rcv_init_lookup()
__sctp_lookup_association() -> sctp_addrs_lookup_transport()
__sctp_rcv_walk_lookup()
__sctp_rcv_asconf_lookup()
__sctp_lookup_association() -> sctp_addrs_lookup_transport()
ep:
__sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match()
sctp_connect():
sctp_endpoint_is_peeled_off()
__sctp_lookup_association()
sctp_has_association()
sctp_lookup_association()
__sctp_lookup_association() -> sctp_addrs_lookup_transport()
sctp_diag_dump_one():
sctp_transport_lookup_process() -> sctp_addrs_lookup_transport()
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:19 +0000 (15:01 -0500)]
sctp: add skb_sdif in struct sctp_af
Add skb_sdif function in struct sctp_af to get the enslaved device
for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in
the next patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:18 +0000 (15:01 -0500)]
sctp: check sk_bound_dev_if when matching ep in get_port
In sctp_get_port_local(), when binding to IP and PORT, it should
also check sk_bound_dev_if to match listening sk if it's set by
SO_BINDTOIFINDEX, so that multiple sockets with the same IP and
PORT, but different sk_bound_dev_if can be listened at the same
time.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:17 +0000 (15:01 -0500)]
sctp: check ipv6 addr with sk_bound_dev if set
When binding to an ipv6 address, it calls ipv6_chk_addr() to check if
this address is on any dev. If a socket binds to a l3mdev but no dev
is passed to do this check, all l3mdev and slaves will be skipped and
the check will fail.
This patch is to pass the bound_dev to make sure the devices under the
same l3mdev can be returned in ipv6_chk_addr(). When the bound_dev is
not a l3mdev or l3slave, l3mdev_master_dev_rcu() will return NULL in
__ipv6_chk_addr_and_flags(), it will keep compitable with before when
NULL dev was passed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 16 Nov 2022 20:01:16 +0000 (15:01 -0500)]
sctp: verify the bind address with the tb_id from l3mdev
After binding to a l3mdev, it should use the route table from the
corresponding VRF to verify the addr when binding to an address.
Note ipv6 doesn't need it, as binding to ipv6 address does not
verify the addr with route lookup.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiawen Wu [Wed, 16 Nov 2022 01:58:35 +0000 (09:58 +0800)]
net: libwx: Fix dead code for duplicate check
Fix duplicate check on polling timeout.
Fixes:
1efa9bfe58c5 ("net: libwx: Implement interaction with firmware")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 15 Nov 2022 15:44:51 +0000 (16:44 +0100)]
net: phy: mscc: macsec: do not copy encryption keys
Following
1b16b3fdf675 ("net: phy: mscc: macsec: clear encryption keys when freeing a flow"),
go one step further and instead of calling memzero_explicit on the key
when freeing a flow, simply not copy the key in the first place as it's
only used when a new flow is set up.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Nov 2022 05:46:58 +0000 (21:46 -0800)]
Merge branch 'net-ipa-change-gsi-firmware-load-specification'
Alex Elder says:
====================
net: ipa: change GSI firmware load specification
Currently, GSI firmware must be loaded for IPA before it can be
used--either by the modem, or by the AP. New hardware supports a
third option, with the bootloader taking responsibility for loading
GSI firmware. In that case, neither the AP nor the modem needs to
do that.
The first patch in this series deprecates the "modem-init" Device
Tree property in the IPA binding, using a new "qcom,gsi-loader"
property instead. The second and third implement logic in the code
to support either the "old" or the "new" way of specifying how GSI
firmware is loaded.
The last two patches implement a new value for the "qcom,gsi-loader"
property. If the value is "skip", neither the AP nor modem needs to
load the GSI firmware. The first of these patches implements the
change in the IPA binding; the second implements it in the code.
====================
Link: https://lore.kernel.org/r/20221116073257.34010-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 16 Nov 2022 07:32:56 +0000 (01:32 -0600)]
net: ipa: permit GSI firmware loading to be skipped
Define a new value "skip" for the "qcom,gsi-loader" Device Tree
property. If used, it indicates that neither the AP nor the modem
need to load GSI firmware (because it has already been loaded--for
example by the boot loader).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 16 Nov 2022 07:32:55 +0000 (01:32 -0600)]
dt-bindings: net: qcom,ipa: support skipping GSI firmware load
Add a new enumerated value to those defined for the qcom,gsi-loader
property. If the qcom,gsi-loader is "skip", the GSI firmware will
already be loaded, so neither the AP nor modem is required to load
GSI firmware.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 16 Nov 2022 07:32:54 +0000 (01:32 -0600)]
net: ipa: introduce "qcom,gsi-loader" property
Introduce a new way of specifying how the GSI firmware gets loaded
for IPA. Currently, this is indicated by the presence or absence of
the Boolean "modem-init" Device Tree property. The new property
must have a value--either "self" or "modem"--which indicates whether
the AP or modem is the GSI firmware loader, respectively.
For legacy systems, the new property will not exist, and the
"modem-init" property will be used. For newer systems, the
"qcom,gsi-loader" property *must* exist, and must have one of the
two prescribed values. It is an error to have both properties
defined, and it is an error for the new property to have an
unrecognized value.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 16 Nov 2022 07:32:53 +0000 (01:32 -0600)]
net: ipa: encapsulate decision about firmware load
The GSI layer used for IPA requires firmware to be loaded.
Currently either the AP or the modem loads the firmware,
distinguished by whether the "modem-init" Device Tree
property is defined.
Some newer systems implement a third option. In preparation for
that, encapsulate the code that determines how the GSI firmware
gets loaded in a new function, ipa_firmware_loader().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 16 Nov 2022 07:32:52 +0000 (01:32 -0600)]
dt-bindings: net: qcom,ipa: deprecate modem-init
GSI firmware for IPA must be loaded during initialization, either by
the AP or by the modem. The loader is currently specified based on
whether the Boolean modem-init property is present.
Instead, use a new property with an enumerated value to indicate
explicitly how GSI firmware gets loaded. With this in place, a
third approach can be added in an upcoming patch.
The new qcom,gsi-loader property has two defined values:
- self: The AP loads GSI firmware
- modem: The modem loads GSI firmware
The modem-init property must still be supported, but is now marked
deprecated.
Update the example so it represents the SC7180 SoC, and provide
examples for the qcom,gsi-loader, memory-region, and firmware-name
properties.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Tue, 15 Nov 2022 15:40:21 +0000 (10:40 -0500)]
sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.h
Move these two macros from net/sctp/sctp.h to linux/sctp.h, so that
it will be enough to include only linux/sctp.h in nft_exthdr.c and
xt_sctp.c. It should not include "net/sctp/sctp.h" if a module does
not have a dependence on SCTP module.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Link: https://lore.kernel.org/r/ef6468a687f36da06f575c2131cd4612f6b7be88.1668526821.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Tue, 15 Nov 2022 15:39:53 +0000 (10:39 -0500)]
sctp: change to include linux/sctp.h in net/sctp/checksum.h
Currently "net/sctp/checksum.h" including "net/sctp/sctp.h" is
included in quite some places in netfilter and openswitch and
net/sched. It's not necessary to include "net/sctp/sctp.h" if
a module does not have dependence on SCTP, "linux/sctp.h" is
the right one to include.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Link: https://lore.kernel.org/r/ca7ea96d62a26732f0491153c3979dc1c0d8d34a.1668526793.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 18 Nov 2022 05:41:41 +0000 (21:41 -0800)]
Merge branch 'implement-devlink-rate-api-and-extend-it'
Michal Wilczynski says:
====================
Implement devlink-rate API and extend it
This patch series implements devlink-rate for ice driver. Unfortunately
current API isn't flexible enough for our use case, so there is a need to
extend it. Some functions have been introduced to enable the driver to
export current Tx scheduling configuration.
Pasting justification for this series from commit implementing devlink-rate
in ice driver(that is a part of this series):
There is a need to support modification of Tx scheduler tree, in the
ice driver. This will allow user to control Tx settings of each node in
the internal hierarchy of nodes. As a result user will be able to use
Hierarchy QoS implemented entirely in the hardware.
This patch implemenents devlink-rate API. It also exports initial
default hierarchy. It's mostly dictated by the fact that the tree
can't be removed entirely, all we can do is enable the user to modify
it. For example root node shouldn't ever be removed, also nodes that
have children are off-limits.
Example initial tree with 2 VF's:
[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27
Let me visualize part of the tree:
+---------+
| node_0 |
+---------+
|
+----v----+
| node_26 |
+----+----+
|
+----v----+
| node_27 |
+----+----+
|
|-----------------|
+----v----+ +----v----+
| VF 1 | | VF 2 |
+----+----+ +----+----+
So at this point there is a couple things that can be done.
For example we could only assign parameters to VF's.
[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
tx_max 5Gbps
This would cap the VF 1 BW to 5Gbps.
But let's say you would like to create a completely new branch.
This can be done like this:
[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
pci/0000:4b:00.0/1 parent node_custom_1
This creates a completely new branch and reassigns VF 1 to it.
A number of parameters is supported per each node: tx_max, tx_share,
tx_priority and tx_weight.
====================
Link: https://lore.kernel.org/r/20221115104825.172668-1-michal.wilczynski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:25 +0000 (11:48 +0100)]
Documentation: Add documentation for new devlink-rate attributes
Provide documentation for newly introduced netlink attributes for
devlink-rate: tx_priority and tx_weight.
Mention the possibility to export tree from the driver.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:24 +0000 (11:48 +0100)]
ice: Add documentation for devlink-rate implementation
Add documentation to a newly added devlink-rate feature. Provide some
examples on how to use the commands, which netlink attributes are
supported and descriptions of the attributes.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:23 +0000 (11:48 +0100)]
ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler
ADQ, DCB might interfere with Custom Tx Scheduler changes that user
might introduce using devlink-rate API.
Check if ADQ, DCB is active, when user tries to change any setting
in exported Tx scheduler tree. If any of those are active block the user
from doing so, and log an appropriate message.
Remove the exported hierarchy if user enable ADQ or DCB.
Prevent ADQ or DCB from getting configured if user already made some
changes using devlink-rate API.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:22 +0000 (11:48 +0100)]
ice: Implement devlink-rate API
There is a need to support modification of Tx scheduler tree, in the
ice driver. This will allow user to control Tx settings of each node in
the internal hierarchy of nodes. As a result user will be able to use
Hierarchy QoS implemented entirely in the hardware.
This patch implemenents devlink-rate API. It also exports initial
default hierarchy. It's mostly dictated by the fact that the tree
can't be removed entirely, all we can do is enable the user to modify
it. For example root node shouldn't ever be removed, also nodes that
have children are off-limits.
Example initial tree with 2 VF's:
[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27
Let me visualize part of the tree:
+---------+
| node_0 |
+---------+
|
+----v----+
| node_26 |
+----+----+
|
+----v----+
| node_27 |
+----+----+
|
|-----------------|
+----v----+ +----v----+
| VF 1 | | VF 2 |
+----+----+ +----+----+
So at this point there is a couple things that can be done.
For example we could only assign parameters to VF's.
[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
tx_max 5Gbps
This would cap the VF 1 BW to 5Gbps.
But let's say you would like to create a completely new branch.
This can be done like this:
[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
pci/0000:4b:00.0/1 parent node_custom_1
This creates a completely new branch and reassigns VF 1 to it.
A number of parameters is supported per each node: tx_max, tx_share,
tx_priority and tx_weight.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:21 +0000 (11:48 +0100)]
ice: Add an option to pre-allocate memory for ice_sched_node
devlink-rate API requires a priv object to be allocated when node still
doesn't have a parent. This is problematic, because ice_sched_node can't
be currently created without a parent.
Add an option to pre-allocate memory for ice_sched_node struct. Add
new arguments to ice_sched_add() and ice_sched_add_elems() that allow
for pre-allocation of memory for ice_sched_node struct.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:20 +0000 (11:48 +0100)]
ice: Introduce new parameters in ice_sched_node
To support new devlink-rate API ice_sched_node struct needs to store
a number of additional parameters. This includes tx_max, tx_share,
tx_weight, and tx_priority.
Add new fields to ice_sched_node struct. Add new functions to configure
the hardware with new parameters. Introduce new xarray to identify
nodes uniquely.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:19 +0000 (11:48 +0100)]
devlink: Allow to set up parent in devl_rate_leaf_create()
Currently the driver is able to create leaf nodes for the devlink-rate,
but is unable to set parent for them. This wasn't as issue before the
possibility to export hierarchy from the driver. After adding the export
feature, in order for the driver to supply correct hierarchy, it's
necessary for it to be able to supply a parent name to
devl_rate_leaf_create().
Introduce a new parameter 'parent_name' in devl_rate_leaf_create().
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:18 +0000 (11:48 +0100)]
devlink: Allow for devlink-rate nodes parent reassignment
Currently it's not possible to reassign the parent of the node using one
command. As the previous commit introduced a way to export entire
hierarchy from the driver, being able to modify and reassign parents
become important. This way user might easily change QoS settings without
interrupting traffic.
Example command:
devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1
This reassigns leaf node parent to node_custom_1.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:17 +0000 (11:48 +0100)]
devlink: Enable creation of the devlink-rate nodes from the driver
Intel 100G card internal firmware hierarchy for Hierarchicial QoS is very
rigid and can't be easily removed. This requires an ability to export
default hierarchy to allow user to modify it. Currently the driver is
only able to create the 'leaf' nodes, which usually represent the vport.
This is not enough for HQoS implemented in Intel hardware.
Introduce new function devl_rate_node_create() that allows for creation
of the devlink-rate nodes from the driver.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:16 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_weight' to devlink-rate
To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_weight' needs to be introduced. This attribute allows
for usage of Weighted Fair Queuing arbitration scheme among siblings.
This arbitration scheme can be used simultaneously with the strict
priority.
Introduce new attribute in devlink-rate that will allow for configuration
of Weighted Fair Queueing. New attribute is optional.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Wilczynski [Tue, 15 Nov 2022 10:48:15 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_priority' to devlink-rate
To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_priority' needs to be introduced. This attribute allows
for usage of strict priority arbiter among siblings. This arbitration
scheme attempts to schedule nodes based on their priority as long as the
nodes remain within their bandwidth limit.
Introduce new attribute in devlink-rate that will allow for configuration
of strict priority. New attribute is optional.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 18 Nov 2022 05:16:46 +0000 (21:16 -0800)]
Merge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'
Vladimir Oltean says:
====================
Autoload DSA tagging driver when dynamically changing protocol
This patch set solves the issue reported by Michael and Heiko here:
https://lore.kernel.org/lkml/
20221027113248.420216-1-michael@walle.cc/
making full use of Michael's suggestion of having two modaliases: one
gets used for loading the tagging protocol when it's the default one
reported by the switch driver, the other gets loaded at user's request,
by name.
# modinfo tag_ocelot
filename: /lib/modules/6.1.0-rc4+/kernel/net/dsa/tag_ocelot.ko
license: GPL v2
alias: dsa_tag:seville
alias: dsa_tag:id-21
alias: dsa_tag:ocelot
alias: dsa_tag:id-15
depends: dsa_core
intree: Y
name: tag_ocelot
vermagic: 6.1.0-rc4+ SMP preempt mod_unload modversions aarch64
Tested on NXP LS1028A-RDB with the following device tree addition:
&mscc_felix_port4 {
dsa-tag-protocol = "ocelot-8021q";
};
&mscc_felix_port5 {
dsa-tag-protocol = "ocelot-8021q";
};
CONFIG_NET_DSA and everything that depends on it is built as module.
Everything auto-loads, and "cat /sys/class/net/eno2/dsa/tagging" shows
"ocelot-8021q". Traffic works as well. Furthermore, "echo ocelot-8021q"
into the aforementioned sysfs file now auto-loads the driver for it.
====================
Link: https://lore.kernel.org/r/20221115011847.2843127-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:47 +0000 (03:18 +0200)]
net: dsa: autoload tag driver module on tagging protocol change
Issue a request_module() call when an attempt to change the tagging
protocol is made, either by sysfs or by device tree. In the case of
ocelot (the only driver for which the default and the alternative
tagging protocol are compiled as different modules), the user is now no
longer required to insert tag_ocelot_8021q.ko manually.
In the particular case of ocelot, this solves a problem where
tag_ocelot_8021q.ko is built as module, and this is present in the
device tree:
&mscc_felix_port4 {
dsa-tag-protocol = "ocelot-8021q";
};
&mscc_felix_port5 {
dsa-tag-protocol = "ocelot-8021q";
};
Because no one attempts to load the module into the kernel at boot time,
the switch driver will fail to probe (actually forever defer) until
someone manually inserts tag_ocelot_8021q.ko. This is now no longer
necessary and happens automatically.
Rename dsa_find_tagger_by_name() to denote the change in functionality:
there is now feature parity with dsa_tag_driver_get_by_id(), i.o.w. we
also load the module if it's missing.
Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:46 +0000 (03:18 +0200)]
net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()
A future patch will introduce one more way of getting a reference on a
tagging protocl driver (by name). Rename the current method to "by_id".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:45 +0000 (03:18 +0200)]
net: dsa: strip sysfs "tagging" string of trailing newline
Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both
with strings that contain \n at the end (echo ocelot > .../dsa/tagging)
and with strings that don't (printf ocelot > .../dsa/tagging).
There will be a problem once we'll want to construct the modalias string
based on which we auto-load the protocol kernel module. If the sysfs
buffer ends in a newline, we need to strip it first. This is a
preparatory patch specifically for that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:44 +0000 (03:18 +0200)]
net: dsa: provide a second modalias to tag proto drivers based on their name
Currently, tagging protocol drivers have a modalias of
"dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE.
This modalias makes it possible for the request_module() call in
dsa_tag_driver_get() to work, given the input it has - an integer
returned by ds->ops->get_tag_protocol().
It is also possible to change tagging protocols at (pseudo-)runtime, via
sysfs or via device tree, and this works via the name string of the
tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE).
In the latter case, there is no request_module() call, because there is
no association that the DSA core has between the string name and the ID,
to construct the modalias. The module is simply assumed to have been
inserted. This is actually slightly problematic when the tagging
protocol change should take place at probe time, since it's expected
that the dependency module should get autoloaded.
For this purpose, let's introduce a second modalias, so that the DSA
core can call request_module() by name. There is no reason to make the
modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER()
macro to take both the ID and the name as arguments, and generate two
modaliases behind the scenes.
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:43 +0000 (03:18 +0200)]
net: dsa: rename tagging protocol driver modalias
It's autumn cleanup time, and today's target are modaliases.
Michael says that for users of modinfo, "dsa_tag-20" is not the most
suggestive name, and recommends a change to "dsa_tag-id-20".
Andrew points out that other modaliases have a prefix delimited by
colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20".
To satisfy both proposals, Florian recommends "dsa_tag:id-20".
The modaliases are not stable ABI, and the essential information
(protocol ID) is still conveyed in the new string, which
request_module() must be adapted to form.
Link:
20221027210830.3577793-1-vladimir.oltean@nxp.com
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Michael Walle <michael@walle.cc>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 15 Nov 2022 01:18:42 +0000 (03:18 +0200)]
net: dsa: stop exposing tag proto module helpers to the world
The DSA tagging protocol driver macros are in the public include/net/dsa.h
probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are
(MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions).
But there is no reason to expose these helpers to <net/dsa.h>. That
header is shared between switch drivers (drivers/net/dsa/), tagging
protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c),
and the rest of the world (DSA master drivers, network stack, etc).
Too much exposure.
On the other hand, net/dsa/dsa_priv.h is included only by the DSA core
and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a
bit too much exposure - I've contemplated creating a new header which is
only included by tagging protocol drivers, but completely separating a
new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for
example dsa_slave_to_port() is used both from the fast path and from the
control path.
So for now, move these definitions to dsa_priv.h which at least hides
them from the world.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Marko [Mon, 14 Nov 2022 19:47:33 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: document required clock-names
IPQ5018, IPQ6018 and IPQ8074 require clock-names to be set as driver is
requesting the clock based on it and not index, so document that and make
it required for the listed SoC-s.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-4-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Marko [Mon, 14 Nov 2022 19:47:32 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: require and validate clocks
Now that we can match the platforms requiring clocks by compatible start
using those to allow clocks per compatible and make them required.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-3-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Marko [Mon, 14 Nov 2022 19:47:31 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: add IPQ8074 compatible
Allow using IPQ8074 specific compatible along with the fallback IPQ4019
one in order to be able to specify which compatibles require clocks to
be able to validate them via schema.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-2-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Marko [Mon, 14 Nov 2022 19:47:30 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: document IPQ6018 compatible
Document IPQ6018 compatible that is already being used in the DTS along
with the fallback IPQ4019 compatible as driver itself only gets probed
on IPQ4019 and IPQ5018 compatibles.
This is also required in order to specify which platform require clock to
be defined and validate it in schema.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-1-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 18 Nov 2022 03:40:13 +0000 (19:40 -0800)]
Merge branch 'net-dsa-use-more-appropriate-net_name_-constants-for-user-ports'
Rasmus Villemoes says:
====================
net: dsa: use more appropriate NET_NAME_* constants for user ports
The intention of commit
685343fc3ba6 ("net: add name_assign_type
netdev attribute") was clearly that drivers be switched over one by
one to select appropriate NET_NAME_* constants instead of
NET_NAME_UNKNOWN. This small series attempts to do that for DSA user
ports.
This is obviously and intentionally user-visible changes, so there's a
small chance that it could lead to a regression. To make it easy to
revert either of the "label in DT" and "fallback to eth%d" changes,
this is done as a refactoring which shouldn't introduce any functional
change (but by itself adds code which looks a little odd, with the two
identical assignments in the two branches), followed by changing the
constant used in each case in two different patches.
====================
Link: https://lore.kernel.org/r/20221116105205.1127843-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:04 +0000 (11:52 +0100)]
net: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user ports
When a user port does not have a label in device tree, and we thus
fall back to the eth%d scheme, the proper constant to use is
NET_NAME_ENUM. See also commit
e9f656b7a214 ("net: ethernet: set
default assignment identifier to NET_NAME_ENUM"), which in turn quoted
commit
685343fc3ba6 ("net: add name_assign_type netdev attribute"):
... when the kernel has given the interface a name using global
device enumeration based on order of discovery (ethX, wlanY, etc)
... are labelled NET_NAME_ENUM.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:03 +0000 (11:52 +0100)]
net: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DT
When a user port has a label in device tree, the corresponding
netdevice is, to quote include/uapi/linux/netdevice.h, "predictably
named by the kernel". This is also explicitly one of the intended use
cases for NET_NAME_PREDICTABLE, quoting
685343fc3ba6 ("net: add
name_assign_type netdev attribute"):
NET_NAME_PREDICTABLE:
The ifname has been assigned by the kernel in a predictable way
[...] Examples include [...] and names deduced from hardware
properties (including being given explicitly by the firmware).
Expose that information properly for the benefit of userspace tools
that make decisions based on the name_assign_type attribute,
e.g. a systemd-udev rule with "kernel" in NamePolicy.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:02 +0000 (11:52 +0100)]
net: dsa: refactor name assignment for user ports
The following two patches each have a (small) chance of causing
regressions for userspace and will in that case of course need to be
reverted.
In order to prepare for that and make those two patches independent
and individually revertable, refactor the code which sets the names
for user ports by moving the "fall back to eth%d if no label is given
in device tree" to dsa_slave_create().
No functional change (at least none intended).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vincent Mailhol [Wed, 16 Nov 2022 17:18:28 +0000 (02:18 +0900)]
ethtool: doc: clarify what drivers can implement in their get_drvinfo()
Many of the drivers which implement ethtool_ops::get_drvinfo() will
prints the .driver, .version or .bus_info of struct ethtool_drvinfo.
To have a glance of current state, do:
$ git grep -W "get_drvinfo(struct"
Printing in those three fields is useless because:
- since [1], the driver version should be the kernel version (at
least for upstream drivers). Arguably, out of tree drivers might
still want to set a custom version, but out of tree is not our
focus.
- since [2], the core is able to provide default values for .driver
and .bus_info.
In summary, drivers may provide .fw_version and .erom_version, the
rest is expected to be done by the core.
In struct ethtool_ops doc from linux/ethtool: rephrase field
get_drvinfo() doc to discourage developers from implementing this
callback.
In struct ethtool_drvinfo doc from uapi/linux/ethtool.h: remove the
paragraph mentioning what drivers should do. Rationale: no need to
repeat what is already written in struct ethtool_ops doc. But add a
note that .fw_version and .erom_version are driver defined.
Also update the dummy driver and simply remove the callback in order
not to confuse the newcomers: most of the drivers will not need this
callback function any more.
[1] commit
6a7e25c7fb48 ("net/core: Replace driver version to be
kernel version")
Link: https://git.kernel.org/torvalds/linux/c/6a7e25c7fb48
[2] commit
edaf5df22cb8 ("ethtool: ethtool_get_drvinfo: populate
drvinfo fields even if callback exits")
Link: https://git.kernel.org/netdev/net-next/c/edaf5df22cb8
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221116171828.4093-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 18 Nov 2022 00:19:14 +0000 (16:19 -0800)]
Merge git://git./linux/kernel/git/netdev/net
include/linux/bpf.h
1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value")
aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
f71b2f64177a ("bpf: Refactor map->off_arr handling")
https://lore.kernel.org/all/
20221114095000.
67a73239@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 17 Nov 2022 16:58:36 +0000 (08:58 -0800)]
Merge tag 'net-6.1-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Current release - regressions:
- tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
Previous releases - regressions:
- bridge: fix memory leaks when changing VLAN protocol
- dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
- dsa: don't leak tagger-owned storage on switch driver unbind
- eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6
is removed
- eth: stmmac: ensure tx function is not running in
stmmac_xdp_release()
- eth: hns3: fix return value check bug of rx copybreak
Previous releases - always broken:
- kcm: close race conditions on sk_receive_queue
- bpf: fix alignment problem in bpf_prog_test_run_skb()
- bpf: fix writing offset in case of fault in
strncpy_from_kernel_nofault
- eth: macvlan: use built-in RCU list checking
- eth: marvell: add sleep time after enabling the loopback bit
- eth: octeon_ep: fix potential memory leak in octep_device_setup()
Misc:
- tcp: configurable source port perturb table size
- bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"
* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
net: use struct_group to copy ip/ipv6 header addresses
net: usb: smsc95xx: fix external PHY reset
net: usb: qmi_wwan: add Telit 0x103a composition
netdevsim: Fix memory leak of nsim_dev->fa_cookie
tcp: configurable source port perturb table size
l2tp: Serialize access to sk_user_data with sk_callback_lock
net: thunderbolt: Fix error handling in tbnet_init()
net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
net: dsa: don't leak tagger-owned storage on switch driver unbind
net/x25: Fix skb leak in x25_lapb_receive_frame()
net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
bridge: switchdev: Fix memory leaks when changing VLAN protocol
net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
net: hns3: fix return value check bug of rx copybreak
net: hns3: fix incorrect hw rss hash type of rx packet
net: phy: marvell: add sleep time after enabling the loopback bit
net: ena: Fix error handling in ena_init()
kcm: close race conditions on sk_receive_queue
net: ionic: Fix error handling in ionic_init_module()
...
Dan Carpenter [Tue, 15 Nov 2022 13:09:55 +0000 (16:09 +0300)]
net: ethernet: renesas: Fix return type in rswitch_etha_wait_link_verification()
The rswitch_etha_wait_link_verification() is supposed to return zero
on success or negative error codes. Unfortunately it is declared as a
bool so the caller treats everything as success.
Fixes:
3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/Y3OPo6AOL6PTvXFU@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kalle Valo [Thu, 17 Nov 2022 12:53:45 +0000 (14:53 +0200)]
Merge tag 'iwlwifi-next-for-kalle-2022-11-06-v2' of git./linux/kernel/git/iwlwifi/iwlwifi-next
iwlwifi patches intended for v6.2
* iwlmei fixes
* Debug mechanism update for new devices (BZ)
* Checksum offload fix for the new devices (BZ)
* A few rate scale fixes and cleanups
* A fix for iwlwifi debug mechanism
* Start of MLO preparations - supporting new key API
Dmitry Vyukov [Tue, 15 Nov 2022 10:00:17 +0000 (11:00 +0100)]
NFC: nci: Allow to create multiple virtual nci devices
The current virtual nci driver is great for testing and fuzzing.
But it allows to create at most one "global" device which does not allow
to run parallel tests and harms fuzzing isolation and reproducibility.
Restructure the driver to allow creation of multiple independent devices.
This should be backwards compatible for existing tests.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20221115100017.787929-1-dvyukov@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Colin Ian King [Tue, 15 Nov 2022 09:31:37 +0000 (09:31 +0000)]
sundance: remove unused variable cnt
Variable cnt is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221115093137.144002-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Li zeming [Tue, 15 Nov 2022 02:07:05 +0000 (10:07 +0800)]
sctp: sm_statefuns: Remove pointer casts of the same type
The subh.addip_hdr pointer is also of type (struct sctp_addiphdr *), so
it does not require a cast.
Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20221115020705.3220-1-zeming@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Tue, 15 Nov 2022 14:24:00 +0000 (22:24 +0800)]
net: use struct_group to copy ip/ipv6 header addresses
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
c3f8324188fa ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Alexandru Tachici [Tue, 15 Nov 2022 11:44:34 +0000 (13:44 +0200)]
net: usb: smsc95xx: fix external PHY reset
An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.
If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.
Fixes:
05b35e7eb9a1 ("smsc95xx: add phylib support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Enrico Sau [Tue, 15 Nov 2022 10:58:59 +0000 (11:58 +0100)]
net: usb: qmi_wwan: add Telit 0x103a composition
Add the following Telit LE910C4-WWX composition:
0x103a: rmnet
Signed-off-by: Enrico Sau <enrico.sau@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20221115105859.14324-1-enrico.sau@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wang Yufen [Tue, 15 Nov 2022 09:30:25 +0000 (17:30 +0800)]
netdevsim: Fix memory leak of nsim_dev->fa_cookie
kmemleak reports this issue:
unreferenced object 0xffff8881bac872d0 (size 8):
comm "sh", pid 58603, jiffies
4481524462 (age 68.065s)
hex dump (first 8 bytes):
04 00 00 00 de ad be ef ........
backtrace:
[<
00000000c80b8577>] __kmalloc+0x49/0x150
[<
000000005292b8c6>] nsim_dev_trap_fa_cookie_write+0xc1/0x210 [netdevsim]
[<
0000000093d78e77>] full_proxy_write+0xf3/0x180
[<
000000005a662c16>] vfs_write+0x1c5/0xaf0
[<
000000007aabf84a>] ksys_write+0xed/0x1c0
[<
000000005f1d2e47>] do_syscall_64+0x3b/0x90
[<
000000006001c6ec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue occurs in the following scenarios:
nsim_dev_trap_fa_cookie_write()
kmalloc() fa_cookie
nsim_dev->fa_cookie = fa_cookie
..
nsim_drv_remove()
The fa_cookie allocked in nsim_dev_trap_fa_cookie_write() is not freed. To
fix, add kfree(nsim_dev->fa_cookie) to nsim_drv_remove().
Fixes:
d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Link: https://lore.kernel.org/r/1668504625-14698-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 14 Nov 2022 21:37:01 +0000 (13:37 -0800)]
mlxsw: update adjfine to use adjust_by_scaled_ppm
The mlxsw adjfine implementation in the spectrum_ptp.c file converts
scaled_ppm into ppb before updating a cyclecounter multiplier using the
standard "base * ppb / 1billion" calculation.
This can be re-written to use adjust_by_scaled_ppm, directly using the
scaled parts per million and reducing the amount of code required to
express this calculation.
We still calculate the parts per billion for passing into
mlxsw_sp_ptp_phc_adjfreq because this function requires the input to be in
parts per billion.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221114213701.815132-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 16 Nov 2022 18:49:06 +0000 (10:49 -0800)]
Merge tag 'for-linus-6.1-rc6-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two trivial cleanups, and three simple fixes"
* tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/platform-pci: use define instead of literal number
xen/platform-pci: add missing free_irq() in error path
xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
xen/pcpu: fix possible memory leak in register_pcpu()
x86/xen: Use kstrtobool() instead of strtobool()
Linus Torvalds [Wed, 16 Nov 2022 18:40:00 +0000 (10:40 -0800)]
Merge tag 'pinctrl-v6.1-4' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Aere is a hopefully final round of pin control fixes. Nothing special,
driver fixes and we caught a potential NULL pointer exception.
- Fix a potential NULL dereference in the core!
- Fix all pin mux routes in the Rockchop PX30 driver
- Fix the UFS pins in the Qualcomm SC8280XP driver
- Fix bias disabling in the Mediatek driver
- Fix debounce time settings in the Mediatek driver"
* tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: Export debounce time tables
pinctrl: mediatek: Fix EINT pins input debounce time configuration
pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
pinctrl: mediatek: common-v2: Fix bias-disable for PULL_PU_PD_RSEL_TYPE
pinctrl: qcom: sc8280xp: Rectify UFS reset pins
pinctrl: rockchip: list all pins in a possible mux route for PX30