Paolo Abeni [Wed, 20 Jan 2021 14:39:12 +0000 (15:39 +0100)]
mptcp: do not queue excessive data on subflows
The current packet scheduler can enqueue up to sndbuf
data on each subflow. If the send buffer is large and
the subflows are not symmetric, this could lead to
suboptimal aggregate bandwidth utilization.
Limit the amount of queued data to the maximum send
window.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 20 Jan 2021 14:39:11 +0000 (15:39 +0100)]
mptcp: re-enable sndbuf autotune
After commit
6e628cd3a8f7 ("mptcp: use mptcp release_cb for
delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE
on its subflow. As a side effect, autotune never takes place,
as it happens inside tcp_new_space(), which in turn is called
only when the mentioned bit is set.
Let's sendmsg() set the subflows NOSPACE bit when looking for
more memory and use the subflow write_space callback to propagate
the snd buf update and wake-up the user-space.
Additionally, this allows dropping a bunch of duplicate code and
makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows.
Fixes:
6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 20 Jan 2021 14:39:10 +0000 (15:39 +0100)]
mptcp: always graft subflow socket to parent
Currently, incoming subflows link to the parent socket,
while outgoing ones link to a per subflow socket. The latter
is not really needed, except at the initial connect() time and
for the first subflow.
Always graft the outgoing subflow to the parent socket and
free the unneeded ones early.
This allows some code cleanup, reduces the amount of memory
used and will simplify the next patch
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Babrou [Wed, 20 Jan 2021 21:27:59 +0000 (13:27 -0800)]
sfc: reduce the number of requested xdp ev queues
Without this change the driver tries to allocate too many queues,
breaching the number of available msi-x interrupts on machines
with many logical cpus and default adapter settings:
Insufficient resources for 12 XDP event queues (24 other channels, max 32)
Which in turn triggers EINVAL on XDP processing:
sfc 0000:86:00.0 ext0: XDP TX failed (-22)
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20210120212759.81548-1-ivan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yousuk Seung [Wed, 20 Jan 2021 20:41:55 +0000 (12:41 -0800)]
tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
the time-to-live or hop limit of the latest incoming packet with
SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
the sequence when incoming packets are aggregated. Exporting the
time-to-live or hop limit value of incoming packets helps to estimate
the hop count of the path of the flow that may change over time.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pengcheng Yang [Thu, 21 Jan 2021 14:31:13 +0000 (22:31 +0800)]
tcp: remove unused ICSK_TIME_EARLY_RETRANS
Since the early retransmit has been removed by
commit
bec41a11dd3d ("tcp: remove early retransmit"),
we also remove the unused ICSK_TIME_EARLY_RETRANS macro.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1611239473-27304-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuusuke Ashizuka [Thu, 21 Jan 2021 08:02:54 +0000 (17:02 +0900)]
net: phy: realtek: Add support for RTL9000AA/AN
RTL9000AA/AN as 100BASE-T1 is following:
- 100 Mbps
- Full duplex
- Link Status Change Interrupt
- Master/Slave configuration
Signed-off-by: Yuusuke Ashizuka <ashiduka@fujitsu.com>
Signed-off-by: Torii Kenichi <torii.ken1@fujitsu.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210121080254.21286-1-ashiduka@fujitsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram Sang [Thu, 21 Jan 2021 10:06:15 +0000 (11:06 +0100)]
dt-bindings: net: renesas,etheravb: Add r8a779a0 support
Document the compatible value for the RAVB block in the Renesas R-Car
V3U (R8A779A0) SoC. This variant has no stream buffer, so we only need
to add the new compatible and add it to the TX delay block.
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20210121100619.5653-2-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 22 Jan 2021 04:42:47 +0000 (20:42 -0800)]
Merge branch 'net-ipa-remove-a-build-dependency'
Alex Elder says:
====================
net: ipa: remove a build dependency
Unlike the original (temporary) IPA notification mechanism, the
generic remoteproc SSR notification code does not require the IPA
driver to maintain a pointer to the modem subsystem remoteproc
structure.
The IPA driver was converted to use the newer SSR notifiers, but the
specification and use of a phandle for the modem subsystem was never
removed.
This series removes the lookup of the remoteproc pointer, and that
removes the need for the modem DT property. It also removes the
reference to the "modem-remoteproc" property from the DT binding,
and from the DT files that specified them.
====================
Link: https://lore.kernel.org/r/20210120212606.12556-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 20 Jan 2021 21:26:06 +0000 (15:26 -0600)]
arm64: dts: qcom: sdm845: kill IPA modem-remoteproc property
The "modem-remoteproc" property is no longer required for the IPA
driver, so get rid of it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 20 Jan 2021 21:26:05 +0000 (15:26 -0600)]
arm64: dts: qcom: sc7180: kill IPA modem-remoteproc property
The "modem-remoteproc" property is no longer required for the IPA
driver, so get rid of it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 20 Jan 2021 21:26:04 +0000 (15:26 -0600)]
dt-bindings: net: remove modem-remoteproc property
The IPA driver uses the remoteproc SSR notifier now, rather than the
temporary IPA notification system used initially. As a result it no
longer needs a property identifying the modem subsystem DT node.
Use GIC_SPI rather than 0 in the example interrupt definition.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 20 Jan 2021 21:26:03 +0000 (15:26 -0600)]
net: ipa: remove a remoteproc dependency
The IPA driver currently requires a DT property to be defined whose
value is the phandle for the modem subsystem. This was needed to
look up a remoteproc structure pointer used when registering for
notifications in the original IPA notification mechanism.
Remoteproc provides a more generic SSR notifier system, and the IPA
driver switched over to it last summer, but this remoteproc phandle
dependency was not removed at that time.
Get rid of the IPA remoteproc pointer and stop requiring the phandle
be specified.
This avoids a link error (rproc_put() not defined) for certain
configurations.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Walle [Wed, 20 Jan 2021 19:43:03 +0000 (20:43 +0100)]
net: macb: ignore tx_clk if MII is used
If the MII interface is used, the PHY is the clock master, thus don't
set the clock rate. On Zynq-7000, this will prevent the following
warning:
macb
e000b000.ethernet eth0: unable to generate target frequency:
25000000 Hz
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210120194303.28268-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Wed, 20 Jan 2021 15:06:31 +0000 (16:06 +0100)]
net: remove aurora nb8800 driver
The tango4 platform is getting removed, and this driver has no
other known users, so it can be removed.
Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mans Rullgard <mans@mansr.com>
Link: https://lore.kernel.org/r/20210120150703.1629527-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 20 Jan 2021 07:27:14 +0000 (08:27 +0100)]
cxgb4: remove bogus CHELSIO_VPD_UNIQUE_ID constant
The comment is quite weird, there is no such thing as a vendor-specific
VPD id. 0x82 is the value of PCI_VPD_LRDT_ID_STRING. So what we are
doing here is simply checking whether the byte at VPD address VPD_BASE
is a valid string LRDT, same as what is done a few lines later in
the code.
LRDT = Large Resource Data Tag, see PCI 2.2 spec, VPD chapter
v2:
- don't set VPD_BASE / VPD_BASE_OLD separately
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/644ef22f-e86a-5cc1-0f27-f873ab165696@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiapeng Zhong [Wed, 20 Jan 2021 07:01:51 +0000 (15:01 +0800)]
cxgb4: Assign boolean values to a bool variable
Fix the following coccicheck warnings:
./drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:5142:2-33:
WARNING: Assignment of 0/1 to bool variable.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Link: https://lore.kernel.org/r/1611126111-22079-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
George McCollister [Wed, 20 Jan 2021 13:53:23 +0000 (07:53 -0600)]
MAINTAINERS: add entry for Arrow SpeedChips XRS7000 driver
Add myself as maintainer of the Arrow SpeedChips XRS7000 series Ethernet
switch driver.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Link: https://lore.kernel.org/r/20210120135323.73856-1-george.mccollister@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 21 Jan 2021 20:19:58 +0000 (12:19 -0800)]
Merge branch 'ucc_geth-improvements'
Rasmus Villemoes says:
====================
ucc_geth improvements
This is a resend of some improvements to the ucc_geth driver that was
previously sent together with bug fixes, which have by now been
applied.
v2: rebase to net/master; address minor style issues; don't introduce
a use-after-free in patch "don't statically allocate eight
ucc_geth_info".
====================
Link: https://lore.kernel.org/r/20210119150802.19997-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:08:02 +0000 (16:08 +0100)]
ethernet: ucc_geth: simplify rx/tx allocations
Since kmalloc() is nowadays [1] guaranteed to return naturally
aligned (i.e., aligned to the size itself) memory for power-of-2
sizes, we don't need to over-allocate the align amount, compute an
aligned address within the allocation, and (for later freeing) also
storing the original pointer [2].
Instead, just round up the length we want to allocate to the alignment
requirements, then round that up to the next power of 2. In theory,
this could allocate up to about twice as much memory as we needed. In
practice, (a) kmalloc() would in most cases anyway return a
power-of-2-sized allocation and (b) with the default values of the
bdRingLen[RT]x fields, the length is already itself a power of 2
greater than the alignment.
So we actually end up saving memory compared to the current
situtation (e.g. for tx, we currently allocate 128+32 bytes, which
kmalloc() likely rounds up to 192 or 256; with this patch, we just
allocate 128 bytes.) Also struct ucc_geth_private becomes a little
smaller.
[1]
59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for
kmalloc(power-of-two)")
[2] That storing was anyway done in a u32, which works on 32 bit
machines, but is not very elegant and certainly makes a reader of the
code pause for a while.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:08:01 +0000 (16:08 +0100)]
ethernet: ucc_geth: inform the compiler that numQueues is always 1
The numQueuesTx and numQueuesRx members of struct ucc_geth_info are
never set to anything but 1, and never have been. It's unclear how
well the code supporting multiple queues would work. Until somebody
wants to play with enabling that, help the compiler eliminate a lot of
dead code and loops that are not really loops by creating static
inline helpers. If and when the numQueuesTx/numQueuesRx fields are
re-introduced, it suffices to update those helper to return the
appropriate field.
This cuts the .text segment of ucc_geth.o by 8%.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:08:00 +0000 (16:08 +0100)]
ethernet: ucc_geth: add helper to replace repeated switch statements
The translation from the ucc_geth_num_of_threads enum value to the
actual count can be written somewhat more compactly with a small
lookup table, allowing us to replace the four switch statements.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:59 +0000 (16:07 +0100)]
ethernet: ucc_geth: replace kmalloc_array()+for loop by kcalloc()
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:58 +0000 (16:07 +0100)]
ethernet: ucc_geth: remove bd_mem_part and all associated code
The bd_mem_part member of ucc_geth_info always has the value
MEM_PART_SYSTEM, and AFAICT, there has never been any code setting it
to any other value. Moreover, muram is a somewhat precious resource,
so there's no point using that when normal memory serves just as well.
Apart from removing a lot of dead code, this is also motivated by
wanting to clean up the "store result from kmalloc() in a u32" mess.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:57 +0000 (16:07 +0100)]
ethernet: ucc_geth: use UCC_GETH_{RX,TX}_BD_RING_ALIGNMENT macros directly
These macros both have the value 32, there's no point first
initializing align to a lower value.
If anything, one could throw in a
BUILD_BUG_ON(UCC_GETH_TX_BD_RING_ALIGNMENT < 4), but it's not worth it
- lots of code depends on named constants having sensible values.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:56 +0000 (16:07 +0100)]
ethernet: ucc_geth: don't statically allocate eight ucc_geth_info
struct ucc_geth_info is somewhat large, and on systems with only one
or two UCC instances, that just wastes a few KB of memory. So
allocate and populate a chunk of memory at probe time instead of
initializing them all during driver init.
Note that the existing "ug_info == NULL" check was dead code, as the
address of some static array element can obviously never be NULL.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:55 +0000 (16:07 +0100)]
ethernet: ucc_geth: constify ugeth_primary_info
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:54 +0000 (16:07 +0100)]
ethernet: ucc_geth: factor out parsing of {rx,tx}-clock{,-name} properties
Reduce the code duplication a bit by moving the parsing of
rx-clock-name and the fallback handling to a helper function.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:53 +0000 (16:07 +0100)]
ethernet: ucc_geth: remove {rx,tx}_glbl_pram_offset from struct ucc_geth_private
These fields are only used within ucc_geth_startup(), so they might as
well be local variables in that function rather than being stashed in
struct ucc_geth_private.
Aside from making that struct a tiny bit smaller, it also shortens
some lines (getting rid of pointless casts while here), and fixes the
problems with using IS_ERR_VALUE() on a u32 as explained in commit
800cd6fb76f0 ("soc: fsl: qe: change return type of cpm_muram_alloc()
to s32").
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:52 +0000 (16:07 +0100)]
ethernet: ucc_geth: replace kmalloc+memset by kzalloc
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:51 +0000 (16:07 +0100)]
ethernet: ucc_geth: remove unnecessary memset_io() calls
These buffers have all just been handed out from qe_muram_alloc(), aka
cpm_muram_alloc(), and the helper cpm_muram_alloc_common() already
does
memset_io(cpm_muram_addr(start), 0, size);
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:50 +0000 (16:07 +0100)]
ethernet: ucc_geth: use qe_muram_free_addr()
This removes the explicit NULL checks, and allows us to stop storing
at least some of the _offset values separately.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:49 +0000 (16:07 +0100)]
soc: fsl: qe: add cpm_muram_free_addr() helper
Add a helper that takes a virtual address rather than the muram
offset. This will be used in a couple of places to avoid having to
store both the offset and the virtual address, as well as removing
NULL checks from the callers.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:48 +0000 (16:07 +0100)]
soc: fsl: qe: store muram_vbase as a void pointer instead of u8
The two functions cpm_muram_offset() and cpm_muram_dma() both need a
cast currently, one casts muram_vbase to do the pointer arithmetic on
void pointers, the other casts the passed-in address u8*.
It's simpler and more consistent to just always use void* and drop all
the casting.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:47 +0000 (16:07 +0100)]
soc: fsl: qe: make cpm_muram_offset take a const void* argument
Allow passing const-qualified pointers without requiring a cast in the
caller.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes [Tue, 19 Jan 2021 15:07:46 +0000 (16:07 +0100)]
ethernet: ucc_geth: remove unused read of temoder field
In theory, such a read-after-write might be required by the hardware,
but nothing in the data sheet suggests that to be the case. The name
test also suggests that it's some debug leftover.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 21 Jan 2021 19:57:53 +0000 (11:57 -0800)]
Merge branch 'add-devlink-health-reporters-for-nix-block'
George Cherian says:
====================
Add devlink health reporters for NIX block
Devlink health reporters are added for the NIX block.
Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html
This series is in continuation to
https://www.spinics.net/lists/netdev/msg707798.html
Added Documentation for the same.
====================
Link: https://lore.kernel.org/r/20210119100120.2614730-1-george.cherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
George Cherian [Tue, 19 Jan 2021 10:01:20 +0000 (15:31 +0530)]
docs: octeontx2: Add Documentation for NIX health reporters
Add devlink health reporter documentation for NIX block.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
George Cherian [Tue, 19 Jan 2021 10:01:19 +0000 (15:31 +0530)]
octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
NIX Health reporters handle following HW event groups
- GENERAL events
- ERROR events
- RAS events
- RVU event
Output:
# devlink health
pci/0002:01:00.0:
reporter hw_npa_intr
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_gen
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_err
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_ras
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_nix_intr
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_nix_gen
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_nix_err
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
reporter hw_nix_ras
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
NIX_AF_RVU:
NIX RVU Interrupt Reg : 1
Unmap Slot Error
# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
NIX_AF_GENERAL:
NIX General Interrupt Reg : 1
Rx multicast pkt drop
Each reporter dump shows the Register value and the description of the cause.
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Tue, 19 Jan 2021 20:24:24 +0000 (21:24 +0100)]
net: stmmac: dwmac-meson8b: fix the RX delay validation
When has_prg_eth1_rgmii_rx_delay is true then we support RX delays
between 0ps and 3000ps in 200ps steps. Swap the validation of the RX
delay based on the has_prg_eth1_rgmii_rx_delay flag so the 200ps check
is now applied correctly on G12A SoCs (instead of only allow 0ps or
2000ps on G12A, but 0..3000ps in 200ps steps on older SoCs which don't
support that).
Fixes:
de94fc104d58ea ("net: stmmac: dwmac-meson8b: add support for the RGMII RX delay on G12A")
Reported-by: Martijn van Deventer <martijn@martijnvandeventer.nl>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20210119202424.591349-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 04:44:11 +0000 (12:44 +0800)]
ip_gre: remove CRC flag from dev features in gre_gso_segment
This patch is to let it always do CRC checksum in sctp_gso_segment()
by removing CRC flag from the dev features in gre_gso_segment() for
SCTP over GRE, just as it does in Commit
527beb8ef9c0 ("udp: support
sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.
It could set csum/csum_start in GSO CB properly in sctp_gso_segment()
after that commit, so it would do checksum with gso_make_checksum()
in gre_gso_segment(), and Commit
622e32b7d4a6 ("net: gre: recompute
gre csum for sctp over gre tunnels") can be reverted now.
Note that when need_csum is false, we can still leave CRC checksum
of SCTP to HW by not clearing this CRC flag if it's supported, as
Jakub and Alex noticed.
v1->v2:
- improve the changelog.
- fix "rev xmas tree" in varibles declaration.
v2->v3:
- remove CRC flag from dev features only when need_csum is true.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/00439f24d5f69e2c6fa2beadc681d056c15c258f.1610772251.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 05:59:17 +0000 (13:59 +0800)]
udp: not remove the CRC flag from dev features when need_csum is false
In __skb_udp_tunnel_segment(), when it's a SCTP over VxLAN/GENEVE
packet and need_csum is false, which means the outer udp checksum
doesn't need to be computed, csum_start and csum_offset could be
used by the inner SCTP CRC CSUM for SCTP HW CRC offload.
So this patch is to not remove the CRC flag from dev features when
need_csum is false.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/1e81b700642498546eaa3f298e023fd7ad394f85.1610776757.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wenxu [Tue, 19 Jan 2021 08:31:50 +0000 (16:31 +0800)]
net/sched: cls_flower add CT_FLAGS_INVALID flag support
This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to
match the ct_state with invalid for conntrack.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/1611045110-682-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 21 Jan 2021 05:04:21 +0000 (21:04 -0800)]
Merge branch 'net-inline-rollback_registered-functions'
After recent changes to the error path of register_netdevice()
we no longer need a version of unregister_netdevice_many() which
does not set net_todo. We can inline the rollback_registered()
functions into respective unregister_netdevice() calls.
Link: https://lore.kernel.org/r/20210119202521.3108236-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 20:25:21 +0000 (12:25 -0800)]
net: inline rollback_registered_many()
Similar to the change for rollback_registered() -
rollback_registered_many() was a part of unregister_netdevice_many()
minus the net_set_todo(), which is no longer needed.
Functionally this patch moves the list_empty() check back after:
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
but I can't find any reason why that would be an issue.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 20:25:20 +0000 (12:25 -0800)]
net: move rollback_registered_many()
Move rollback_registered_many() and add a temporary
forward declaration to make merging the code into
unregister_netdevice_many() easier to review.
No functional changes.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 20:25:19 +0000 (12:25 -0800)]
net: inline rollback_registered()
rollback_registered() is a local helper, it's common for driver
code to call unregister_netdevice_queue(dev, NULL) when they
want to unregister netdevices under rtnl_lock. Inline
rollback_registered() and adjust the only remaining caller.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 20:25:18 +0000 (12:25 -0800)]
net: move net_set_todo inside rollback_registered()
Commit
93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev
failure.") moved net_set_todo() outside of rollback_registered()
so that rollback_registered() can be used in the failure path of
register_netdevice() but without risking a double free.
Since commit
cf124db566e6 ("net: Fix inconsistent teardown and
release of private netdev state."), however, we have a better
way of handling that condition, since destructors don't call
free_netdev() directly.
After the change in commit
c269a24ce057 ("net: make free_netdev()
more lenient with unregistering devices") we can now move
net_set_todo() back.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 21 Jan 2021 05:00:28 +0000 (21:00 -0800)]
Merge branch 'nexthop-more-fine-grained-policies-for-netlink-message-validation'
Petr Machata says:
====================
nexthop: More fine-grained policies for netlink message validation
There is currently one policy that covers all attributes for next hop
object management. Actual validation is then done in code, which makes it
unobvious which attributes are acceptable when, and indeed that everything
is rejected as necessary.
In this series, split rtm_nh_policy to several policies that cover various
aspects of the next hop object configuration, and instead of open-coding
the validation, defer to nlmsg_parse(). This should make extending the next
hop code simpler as well, which will be relevant in near future for
resilient hashing implementation.
This was tested by running tools/testing/selftests/net/fib_nexthops.sh.
Additionally iproute2 was tweaked to issue "nexthop list id" as an
RTM_GETNEXTHOP dump request, instead of a straight get to test that
unexpected attributes are indeed rejected.
====================
Link: https://lore.kernel.org/r/cover.1611156111.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Wed, 20 Jan 2021 15:44:12 +0000 (16:44 +0100)]
nexthop: Specialize rtm_nh_policy
This policy is currently only used for creation of new next hops and new
next hop groups. Rename it accordingly and remove the two attributes that
are not valid in that context: NHA_GROUPS and NHA_MASTER.
For consistency with other policies, do not mention policy array size in
the declarator, and replace NHA_MAX for ARRAY_SIZE as appropriate.
Note that with this commit, NHA_MAX and __NHA_MAX are not used anymore.
Leave them in purely as a user API.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Wed, 20 Jan 2021 15:44:11 +0000 (16:44 +0100)]
nexthop: Use a dedicated policy for nh_valid_dump_req()
This function uses the global nexthop policy, but only accepts four
particular attributes. Create a new policy that only includes the four
supported attributes, and use it. Convert the loop to a series of ifs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Wed, 20 Jan 2021 15:44:10 +0000 (16:44 +0100)]
nexthop: Use a dedicated policy for nh_valid_get_del_req()
This function uses the global nexthop policy only to then bounce all
arguments except for NHA_ID. Instead, just create a new policy that
only includes the one allowed attribute.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Tue, 19 Jan 2021 14:53:35 +0000 (17:53 +0300)]
net: dsa: Fix off by one in dsa_loop_port_vlan_add()
The > comparison is intended to be >= to prevent reading beyond the
end of the ps->vlans[] array. It doesn't affect run time though because
the ps->vlans[] array has VLAN_N_VID (4096) elements and the vlan->vid
cannot be > 4094 because it is checked earlier.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/YAbyb5kBJQlpYCs2@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 20 Jan 2021 20:16:11 +0000 (12:16 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Conflicts:
drivers/net/can/dev.c
commit
03f16c5075b2 ("can: dev: can_restart: fix use after free bug")
commit
3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
Code move.
drivers/net/dsa/b53/b53_common.c
commit
8e4052c32d6b ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
commit
b7a9e0da2d1c ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")
Field rename.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 20 Jan 2021 19:52:21 +0000 (11:52 -0800)]
Merge tag 'net-5.11-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
can trees.
Current release - regressions:
- nfc: nci: fix the wrong NCI_CORE_INIT parameters
Current release - new code bugs:
- bpf: allow empty module BTFs
Previous releases - regressions:
- bpf: fix signed_{sub,add32}_overflows type handling
- tcp: do not mess with cloned skbs in tcp_add_backlog()
- bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach
- bpf: don't leak memory in bpf getsockopt when optlen == 0
- tcp: fix potential use-after-free due to double kfree()
- mac80211: fix encryption issues with WEP
- devlink: use right genl user_ptr when handling port param get/set
- ipv6: set multicast flag on the multicast route
- tcp: fix TCP_USER_TIMEOUT with zero window
Previous releases - always broken:
- bpf: local storage helpers should check nullness of owner ptr passed
- mac80211: fix incorrect strlen of .write in debugfs
- cls_flower: call nla_ok() before nla_next()
- skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"
* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: systemport: free dev before on error path
net: usb: cdc_ncm: don't spew notifications
net: mscc: ocelot: Fix multicast to the CPU port
tcp: Fix potential use-after-free due to double kfree()
bpf: Fix signed_{sub,add32}_overflows type handling
can: peak_usb: fix use after free bugs
can: vxcan: vxcan_xmit: fix use after free bug
can: dev: can_restart: fix use after free bug
tcp: fix TCP socket rehash stats mis-accounting
net: dsa: b53: fix an off by one in checking "vlan->vid"
tcp: do not mess with cloned skbs in tcp_add_backlog()
selftests: net: fib_tests: remove duplicate log test
net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
sh_eth: Fix power down vs. is_opened flag ordering
net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
netfilter: rpfilter: mask ecn bits before fib lookup
udp: mask TOS bits in udp_v4_early_demux()
xsk: Clear pool even for inactive queues
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
...
Linus Torvalds [Wed, 20 Jan 2021 19:46:38 +0000 (11:46 -0800)]
Merge tag 'for-linus-5.11-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A fix for build failure showing up in some configurations"
* tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: fix 'nopvspin' build error
Tianjia Zhang [Tue, 19 Jan 2021 00:13:19 +0000 (00:13 +0000)]
X.509: Fix crash caused by NULL pointer
On the following call path, `sig->pkey_algo` is not assigned
in asymmetric_key_verify_signature(), which causes runtime
crash in public_key_verify_signature().
keyctl_pkey_verify
asymmetric_key_verify_signature
verify_signature
public_key_verify_signature
This patch simply check this situation and fixes the crash
caused by NULL pointer.
Fixes:
215525639631 ("X.509: support OSCCA SM2-with-SM3 certificate verification")
Reported-by: Tobias Markus <tobias@markus-regensburg.de>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: João Fonseca <jpedrofonseca@ua.pt>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Wed, 20 Jan 2021 16:11:12 +0000 (16:11 +0000)]
cachefiles: Drop superfluous readpages aops NULL check
After the recent actions to convert readpages aops to readahead, the
NULL checks of readpages aops in cachefiles_read_or_alloc_page() may
hit falsely. More badly, it's an ASSERT() call, and this panics.
Drop the superfluous NULL checks for fixing this regression.
[DH: Note that cachefiles never actually used readpages, so this check was
never actually necessary]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883
BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245
Fixes:
9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jakub Kicinski [Wed, 20 Jan 2021 17:16:01 +0000 (09:16 -0800)]
Merge tag 'linux-can-fixes-for-5.11-
20210120' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
linux-can-fixes-for-5.11-
20210120
All three patches are by Vincent Mailhol and fix a potential use after free bug
in the CAN device infrastructure, the vxcan driver, and the peak_usk driver. In
the TX-path the skb is used to read from after it was passed to the networking
stack with netif_rx_ni().
* tag 'linux-can-fixes-for-5.11-
20210120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: peak_usb: fix use after free bugs
can: vxcan: vxcan_xmit: fix use after free bug
can: dev: can_restart: fix use after free bug
====================
Link: https://lore.kernel.org/r/20210120125202.2187358-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pan Bian [Wed, 20 Jan 2021 04:44:23 +0000 (20:44 -0800)]
net: systemport: free dev before on error path
On the error path, it should goto the error handling label to free
allocated memory rather than directly return.
Fixes:
31bc72d97656 ("net: systemport: fetch and use clock resources")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210120044423.1704-1-bianpan2016@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Grant Grundler [Wed, 20 Jan 2021 01:12:08 +0000 (17:12 -0800)]
net: usb: cdc_ncm: don't spew notifications
RTL8156 sends notifications about every 32ms.
Only display/log notifications when something changes.
This issue has been reported by others:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
https://lkml.org/lkml/2020/8/27/1083
...
[785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
[785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
[785962.929954] usb 1-1: Manufacturer: Realtek
[785962.929956] usb 1-1: SerialNumber:
000000001
[785962.991755] usbcore: registered new interface driver cdc_ether
[785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
[785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
[785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
[785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
[785963.019211] usbcore: registered new interface driver cdc_ncm
[785963.023856] usbcore: registered new interface driver cdc_wdm
[785963.025461] usbcore: registered new interface driver cdc_mbim
[785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
[785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
...
This is about 2KB per second and will overwrite all contents of a 1MB
dmesg buffer in under 10 minutes rendering them useless for debugging
many kernel problems.
This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering
the majority of those logs useless too.
When the link is up (expected state), spew amount is >2x higher:
...
[786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
...
Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alban Bedel [Tue, 19 Jan 2021 14:06:38 +0000 (15:06 +0100)]
net: mscc: ocelot: Fix multicast to the CPU port
Multicast entries in the MAC table use the high bits of the MAC
address to encode the ports that should get the packets. But this port
mask does not work for the CPU port, to receive these packets on the
CPU port the MAC_CPU_COPY flag must be set.
Because of this IPv6 was effectively not working because neighbor
solicitations were never received. This was not apparent before commit
9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb
entries) as the IPv6 entries were broken so all incoming IPv6
multicast was then treated as unknown and flooded on all ports.
To fix this problem rework the ocelot_mact_learn() to set the
MAC_CPU_COPY flag when a multicast entry that target the CPU port is
added. For this we have to read back the ports endcoded in the pseudo
MAC address by the caller. It is not a very nice design but that avoid
changing the callers and should make backporting easier.
Signed-off-by: Alban Bedel <alban.bedel@aerq.com>
Fixes:
9403c158b872 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries")
Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 18 Jan 2021 05:59:20 +0000 (14:59 +0900)]
tcp: Fix potential use-after-free due to double kfree()
Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct
request_sock and then can allocate inet_rsk(req)->ireq_opt. After that,
tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to
inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full
socket into ehash and sets NULL to ireq_opt. Otherwise,
tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full
socket.
The commit
01770a1661657 ("tcp: fix race condition when creating child
sockets from syncookies") added a new path, in which more than one cores
create full sockets for the same SYN cookie. Currently, the core which
loses the race frees the full socket without resetting inet_opt, resulting
in that both sock_put() and reqsk_put() call kfree() for the same memory:
sock_put
sk_free
__sk_free
sk_destruct
__sk_destruct
sk->sk_destruct/inet_sock_destruct
kfree(rcu_dereference_protected(inet->inet_opt, 1));
reqsk_put
reqsk_free
__reqsk_free
req->rsk_ops->destructor/tcp_v4_reqsk_destructor
kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1));
Calling kmalloc() between the double kfree() can lead to use-after-free, so
this patch fixes it by setting NULL to inet_opt before sock_put().
As a side note, this kind of issue does not happen for IPv6. This is
because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which
correspond to ireq_opt in IPv4.
Fixes:
01770a166165 ("tcp: fix race condition when creating child sockets from syncookies")
CC: Ricardo Dias <rdias@singlestore.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 20 Jan 2021 16:47:34 +0000 (08:47 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-20
1) Fix wrong bpf_map_peek_elem_proto helper callback, from Mircea Cirjaliu.
2) Fix signed_{sub,add32}_overflows type truncation, from Daniel Borkmann.
3) Fix AF_XDP to also clear pools for inactive queues, from Maxim Mikityanskiy.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix signed_{sub,add32}_overflows type handling
xsk: Clear pool even for inactive queues
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
====================
Link: https://lore.kernel.org/r/20210120163439.8160-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Tue, 19 Jan 2021 23:24:24 +0000 (00:24 +0100)]
bpf: Fix signed_{sub,add32}_overflows type handling
Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to
3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.
Fixes:
3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Vincent Mailhol [Wed, 20 Jan 2021 11:41:37 +0000 (20:41 +0900)]
can: peak_usb: fix use after free bugs
After calling peak_usb_netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is accessed
after the peak_usb_netif_rx_ni().
Reordering the lines solves the issue.
Fixes:
0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Link: https://lore.kernel.org/r/20210120114137.200019-4-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Wed, 20 Jan 2021 11:41:36 +0000 (20:41 +0900)]
can: vxcan: vxcan_xmit: fix use after free bug
After calling netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the canfd_frame cfd which aliases skb memory is accessed
after the netif_rx_ni().
Fixes:
a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)")
Link: https://lore.kernel.org/r/20210120114137.200019-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Wed, 20 Jan 2021 11:41:35 +0000 (20:41 +0900)]
can: dev: can_restart: fix use after free bug
After calling netif_rx_ni(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is accessed
after the netif_rx_ni() in:
stats->rx_bytes += cf->len;
Reordering the lines solves the issue.
Fixes:
39549eef3587 ("can: CAN Network device driver and Netlink interface")
Link: https://lore.kernel.org/r/20210120114137.200019-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yuchung Cheng [Tue, 19 Jan 2021 19:26:19 +0000 (11:26 -0800)]
tcp: fix TCP socket rehash stats mis-accounting
The previous commit
32efcc06d2a1 ("tcp: export count for rehash attempts")
would mis-account rehashing SNMP and socket stats:
a. During handshake of an active open, only counts the first
SYN timeout
b. After handshake of passive and active open, stop updating
after (roughly) TCP_RETRIES1 recurring RTOs
c. After the socket aborts, over count timeout_rehash by 1
This patch fixes this by checking the rehash result from sk_rethink_txhash.
Fixes:
32efcc06d2a1 ("tcp: export count for rehash attempts")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Tue, 19 Jan 2021 14:48:03 +0000 (17:48 +0300)]
net: dsa: b53: fix an off by one in checking "vlan->vid"
The > comparison should be >= to prevent accessing one element beyond
the end of the dev->vlans[] array in the caller function, b53_vlan_add().
The "dev->vlans" array is allocated in the b53_switch_init() function
and it has "dev->num_vlans" elements.
Fixes:
a2482d2ce349 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/YAbxI97Dl/pmBy5V@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jarod Wilson [Tue, 19 Jan 2021 01:09:27 +0000 (20:09 -0500)]
bonding: add a vlan+srcmac tx hashing option
This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interest switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.
Basically, each VM has it's own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.
Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.
This has been rudimetarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Thomas Davis <tadavis@lbl.gov>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 19 Jan 2021 16:56:56 +0000 (17:56 +0100)]
net: fix GSO for SG-enabled devices
The commit
dbd50f238dec ("net: move the hsize check to the else
block in skb_segment") introduced a data corruption for devices
supporting scatter-gather.
The problem boils down to signed/unsigned comparison given
unexpected results: if signed 'hsize' is negative, it will be
considered greater than a positive 'len', which is unsigned.
This commit addresses resorting to the old checks order, so that
'hsize' never has a negative value when compared with 'len'.
v1 -> v2:
- reorder hsize checks instead of explicit cast (Alex)
Bisected-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Fixes:
dbd50f238dec ("net: move the hsize check to the else block in skb_segment")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/861947c2d2d087db82af93c21920ce8147d15490.1611074818.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 19 Jan 2021 16:49:00 +0000 (08:49 -0800)]
tcp: do not mess with cloned skbs in tcp_add_backlog()
Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0
This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2.
Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workload mixing TCP incoming traffic
and forwarded packets.
The problem is that tcp_add_backlog() is writing
over gso_segs and gso_size even if the incoming packet will not
be coalesced to the backlog tail packet.
While skb_try_coalesce() would bail out if tail packet is cloned,
this overwriting would lead to corruptions of other packets
cooked by lan78xx, sharing a common super-packet.
The strategy used by lan78xx is to use a big skb, and split
it into all received packets using skb_clone() to avoid copies.
The drawback of this strategy is that all the small skb share a common
struct skb_shared_info.
This patch rewrites TCP gso_size/gso_segs handling to only
happen on the tail skb, since skb_try_coalesce() made sure
it was not cloned.
Fixes:
4f693b55c3d2 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Juerg Haefliger <juergh@canonical.com>
Tested-by: Juerg Haefliger <juergh@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423
Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xu Wang [Tue, 19 Jan 2021 07:50:59 +0000 (07:50 +0000)]
octeontx2-af: Remove unneeded semicolons
fix semicolon.cocci warnings:
./drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c:272:2-3: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1788:3-4: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1809:3-4: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu.c:1326:2-3: Unneeded semicolon
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Link: https://lore.kernel.org/r/20210119075059.17493-1-vulab@iscas.ac.cn
Link: https://lore.kernel.org/r/20210119075507.17699-1-vulab@iscas.ac.cn
Link: https://lore.kernel.org/r/20210119080037.17931-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven [Mon, 18 Jan 2021 15:08:57 +0000 (16:08 +0100)]
net: smsc911x: Make Runtime PM handling more fine-grained
Currently the smsc911x driver has mininal power management: during
driver probe, the device is powered up, and during driver remove, it is
powered down.
Improve power management by making it more fine-grained:
1. Power the device down when driver probe is finished,
2. Power the device (down) when it is opened (closed),
3. Make sure the device is powered during PHY access.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210118150857.796943-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 18 Jan 2021 11:19:02 +0000 (11:19 +0000)]
selftests: forwarding: Fix spelling mistake "succeded" -> "succeeded"
There are two spelling mistakes in check_fail messages. Fix them.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210118111902.71096-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Menglong Dong [Mon, 18 Jan 2021 11:15:39 +0000 (03:15 -0800)]
net: tun: fix misspellings using codespell tool
Some typos are found out by codespell tool:
$ codespell -w -i 3 ./drivers/net/tun.c
aovid ==> avoid
Fix typos found by codespell.
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20210118111539.35886-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiapeng Zhong [Mon, 18 Jan 2021 08:31:02 +0000 (16:31 +0800)]
taprio: boolean values to a bool variable
Fix the following coccicheck warnings:
./net/sched/sch_taprio.c:393:3-16: WARNING: Assignment of 0/1 to bool
variable.
./net/sched/sch_taprio.c:375:2-15: WARNING: Assignment of 0/1 to bool
variable.
./net/sched/sch_taprio.c:244:4-19: WARNING: Assignment of 0/1 to bool
variable.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Link: https://lore.kernel.org/r/1610958662-71166-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 20 Jan 2021 01:32:36 +0000 (17:32 -0800)]
Merge branch 'net-ethernet-ti-am65-cpsw-nuss-introduce-support-for-am64x-cpsw3g'
Grygorii Strashko says:
====================
net: ethernet: ti: am65-cpsw-nuss: introduce support for am64x cpsw3g
This series introduces basic support for recently introduced TI K3 AM642x SoC [1]
which contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated
in MAIN domain and can be configured in multi port or switch modes.
In this series only multi port mode is enabled. The initial version of switchdev
support was introduced by Vignesh Raghavendra [2] and work is in progress.
The overall functionality and DT bindings are similar to other K3 CPSWxg
versions, so DT binding changes are minimal and driver is mostly re-used for
TI K3 AM642x CPSW3g.
The main difference is that TI K3 AM642x SoC is not fully DMA coherent and
instead DMA coherency is supported per DMA channel.
Patches 1-2 - DT bindings update
Patches 3-4 - Update driver to support changed DMA coherency model
Patches 5-6 - adds TI K3 AM642x SoC platform data and so enable CPSW3g
[1] https://www.ti.com/lit/pdf/spruim2
[2] https://patchwork.ozlabs.org/project/netdev/cover/
20201130082046.16292-1-vigneshr@ti.com/
====================
Link: https://lore.kernel.org/r/20210115192853.5469-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vignesh Raghavendra [Fri, 15 Jan 2021 19:28:53 +0000 (21:28 +0200)]
net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g
The TI AM64x SoCs Gigabit Ethernet Switch subsystem (CPSW3g NUSS) has three
ports (2 ext. ports) and provides Ethernet packet communication for the
device and can be configured in multi port mode or as an Ethernet switch.
This patch adds support for the corresponding CPSW3g version.
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vignesh Raghavendra [Fri, 15 Jan 2021 19:28:52 +0000 (21:28 +0200)]
net: ti: cpsw_ale: add driver data for AM64 CPSW3g
The AM642x CPSW3g is similar to j721e-cpswxg except its ALE table size is
512 entries. Add entry for the same.
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peter Ujfalusi [Fri, 15 Jan 2021 19:28:51 +0000 (21:28 +0200)]
net: ethernet: ti: am65-cpsw-nuss: Support for transparent ASEL handling
Use the glue layer's functions to convert the dma_addr_t to and from CPPI5
address (with the ASEL bits), which should be used within the descriptors
and data buffers.
- Per channel coherency support
The DMAs use the 'ASEL' bits to select data and configuration fetch path.
The ASEL bits are placed at the unused parts of any address field used by
the DMAs (pointers to descriptors, addresses in descriptors, ring base
addresses). The ASEL is not part of the address (the DMAs can address
48bits). Individual channels can be configured to be coherent (via ACP
port) or non coherent individually by configuring the ASEL to appropriate
value.
[1] https://lore.kernel.org/patchwork/cover/1350756/
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peter Ujfalusi [Fri, 15 Jan 2021 19:28:50 +0000 (21:28 +0200)]
net: ethernet: ti: am65-cpsw-nuss: Use DMA device for DMA API
For DMA API the DMA device should be used as cpsw does not accesses to
descriptors or data buffers in any ways. The DMA does.
Also, drop dma_coerce_mask_and_coherent() setting on CPSW device, as it
should be done by DMA driver which does data movement.
This is required for adding AM64x CPSW3g support where DMA coherency
supported per DMA channel.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Grygorii Strashko [Fri, 15 Jan 2021 19:28:49 +0000 (21:28 +0200)]
dt-binding: net: ti: k3-am654-cpsw-nuss: update bindings for am64x cpsw3g
Update DT binding for recently introduced TI K3 AM642x SoC [1] which
contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated
in MAIN domain and can be configured in multi port or switch modes.
The overall functionality and DT bindings are similar to other K3 CPSWxg
versions, so DT binding changes are minimal:
- reword description
- add new compatible 'ti,am642-cpsw-nuss'
- allow 2 external ports child nodes
- add missed 'assigned-clock' props
[1] https://www.ti.com/lit/pdf/spruim2
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Grygorii Strashko [Fri, 15 Jan 2021 19:28:48 +0000 (21:28 +0200)]
dt-binding: ti: am65x-cpts: add assigned-clock and power-domains props
The CPTS clock is usually a clk-mux which allows to select CPTS reference
clock by using 'assigned-clock-parents', 'assigned-clocks' DT properties.
Also depending on integration the power-domains has to be specified to
enable CPTS IP.
Hence add 'assigned-clock-parents', 'assigned-clocks' and 'power-domains'
properties to the CPTS DT bindings to avoid dtbs_check warnings:
cpts@
310d0000: 'assigned-clock-parents', 'assigned-clocks' do not match any of the regexes: 'pinctrl-[0-9]+'
cpts@
310d0000: 'power-domains' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 19 Jan 2021 02:59:30 +0000 (10:59 +0800)]
selftests: net: fib_tests: remove duplicate log test
The previous test added an address with a specified metric and check if
correspond route was created. I somehow added two logs for the same
test. Remove the duplicated one.
Reported-by: Antoine Tenart <atenart@redhat.com>
Fixes:
0d29169a708b ("selftests/net/fib_tests: update addr_metric_test for peer route testing")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210119025930.2810532-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bongsu Jeon [Mon, 18 Jan 2021 20:55:22 +0000 (05:55 +0900)]
net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
Fix the code because NCI_CORE_INIT_CMD includes two parameters in NCI2.0
but there is no parameters in NCI1.x.
Fixes:
bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence")
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Link: https://lore.kernel.org/r/20210118205522.317087-1-bongsu.jeon@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven [Mon, 18 Jan 2021 15:08:12 +0000 (16:08 +0100)]
sh_eth: Fix power down vs. is_opened flag ordering
sh_eth_close() does a synchronous power down of the device before
marking it closed. Revert the order, to make sure the device is never
marked opened while suspended.
While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as
there is no reason to do a synchronous power down.
Fixes:
7fa2955ff70ce453 ("sh_eth: Fix sleeping function called from invalid context")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20210118150812.796791-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan [Sun, 17 Jan 2021 15:15:38 +0000 (17:15 +0200)]
net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.
Fixes:
14136564c8ee ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 22:31:28 +0000 (14:31 -0800)]
Merge branch 'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers'
Xin Long says:
====================
net: support SCTP CRC csum offload for tunneling packets in some drivers
This patchset introduces inline function skb_csum_is_sctp(), and uses it
to validate it's a sctp CRC csum offload packet, to make SCTP CRC csum
offload for tunneling packets supported in some HW drivers.
====================
Link: https://lore.kernel.org/r/cover.1610777159.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:42 +0000 (14:13 +0800)]
net: ixgbevf: use skb_csum_is_sctp instead of protocol check
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbevf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:41 +0000 (14:13 +0800)]
net: ixgbe: use skb_csum_is_sctp instead of protocol check
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbe support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:40 +0000 (14:13 +0800)]
net: igc: use skb_csum_is_sctp instead of protocol check
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igc support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:39 +0000 (14:13 +0800)]
net: igbvf: use skb_csum_is_sctp instead of protocol check
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igbvf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:38 +0000 (14:13 +0800)]
net: igb: use skb_csum_is_sctp instead of protocol check
Using skb_csum_is_sctp is a easier way to validate it's a SCTP
CRC checksum offload packet, and there is no need to parse the
packet to check its proto field, especially when it's a UDP or
GRE encapped packet.
So this patch also makes igb support SCTP CRC checksum offload
for UDP and GRE encapped packets.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sat, 16 Jan 2021 06:13:37 +0000 (14:13 +0800)]
net: add inline function skb_csum_is_sctp
This patch is to define a inline function skb_csum_is_sctp(), and
also replace all places where it checks if it's a SCTP CSUM skb.
This function would be used later in many networking drivers in
the following patches.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Jan 2021 21:54:32 +0000 (13:54 -0800)]
Merge branch 'ipv4-ensure-ecn-bits-don-t-influence-source-address-validation'
Guillaume Nault says:
====================
ipv4: Ensure ECN bits don't influence source address validation
Functions that end up calling fib_table_lookup() should clear the ECN
bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
differently.
Most functions already clear the ECN bits, but there are a few cases
where this is not done. This series only fixes the ones related to
source address validation.
====================
Link: https://lore.kernel.org/r/cover.1610790904.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault [Sat, 16 Jan 2021 10:44:26 +0000 (11:44 +0100)]
netfilter: rpfilter: mask ecn bits before fib lookup
RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.
Reproducer:
Create two netns, connected with a veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11/32 dev veth10
Add a route to ns1 in ns0:
$ ip -netns ns0 route add 192.0.2.11/32 dev veth01
In ns1, only packets with TOS 4 can be routed to ns0:
$ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
is 4:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 0% packet loss ...
Now use iptable's rpfilter module in ns1:
$ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
Not-ECT and ECT(1) packets still pass:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
But ECT(0) and ECN packets are dropped:
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 100% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 100% packet loss ...
After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
Fixes:
8f97339d3feb ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault [Sat, 16 Jan 2021 10:44:22 +0000 (11:44 +0100)]
udp: mask TOS bits in udp_v4_early_demux()
udp_v4_early_demux() is the only function that calls
ip_mc_validate_source() with a TOS that hasn't been masked with
IPTOS_RT_MASK.
This results in different behaviours for incoming multicast UDPv4
packets, depending on if ip_mc_validate_source() is called from the
early-demux path (udp_v4_early_demux) or from the regular input path
(ip_route_input_noref).
ECN would normally not be used with UDP multicast packets, so the
practical consequences should be limited on that side. However,
IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align
with the non-early-demux path behaviour.
Reproducer:
Setup two netns, connected with veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip -netns ns0 link set dev lo up
$ ip -netns ns1 link set dev lo up
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
In ns0, add route to multicast address 224.0.2.0/24 using source
address 198.51.100.10:
$ ip -netns ns0 address add 198.51.100.10/32 dev lo
$ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
In ns1, define route to 198.51.100.10, only for packets with TOS 4:
$ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
Also activate rp_filter in ns1, so that incoming packets not matching
the above route get dropped:
$ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
Now try to receive packets on 224.0.2.11:
$ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
tos 6 for socat):
$ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test0" message is properly received by socat in ns1, because
early-demux has no cached dst to use, so source address validation
is done by ip_route_input_mc(), which receives a TOS that has the
ECN bits masked.
Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
$ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test1" message isn't received by socat in ns1, because, now,
early-demux has a cached dst to use and calls ip_mc_validate_source()
immediately, without masking the ECN bits.
Fixes:
bc044e8db796 ("udp: perform source validation for mcast early demux")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Mon, 18 Jan 2021 16:03:33 +0000 (18:03 +0200)]
xsk: Clear pool even for inactive queues
The number of queues can change by other means, rather than ethtool. For
example, attaching an mqprio qdisc with num_tc > 1 leads to creating
multiple sets of TX queues, which may be then destroyed when mqprio is
deleted. If an AF_XDP socket is created while mqprio is active,
dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may
decrease with deletion of mqprio, which will mean that the pool won't be
NULLed, and a further increase of the number of TX queues may expose a
dangling pointer.
To avoid any potential misbehavior, this commit clears pool for RX and
TX queues, regardless of real_num_*_queues, still taking into
consideration num_*_queues to avoid overflows.
Fixes:
1c1efc2af158 ("xsk: Create and free buffer pool independently from umem")
Fixes:
a41b4f3c58dd ("xsk: simplify xdp_clear_umem_at_qid implementation")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com