Jakub Kicinski [Wed, 4 Nov 2020 18:36:37 +0000 (10:36 -0800)]
Merge tag 'linux-can-fixes-for-5.10-20201103' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-03
The first two patches are by Oleksij Rempel and they add a generic
can-controller Device Tree yaml binding and convert the text based binding
of the flexcan driver to a yaml based binding.
Zhang Changzhong's patch fixes a remove_proc_entry warning in the AF_CAN
core.
A patch by me fixes a kfree_skb() call from IRQ context in the rx-offload
helper.
Vincent Mailhol contributes a patch to prevent a call to kfree_skb() in
hard IRQ context in can_get_echo_skb().
Oliver Hartkopp's patch fixes the length calculation for RTR CAN frames
in the __can_get_echo_skb() helper.
Oleksij Rempel's patch fixes a use-after-free that shows up with j1939 in
can_create_echo_skb().
Yegor Yefremov contributes 4 patches to enhance the j1939 documentation.
Zhang Changzhong's patch fixes a hanging task problem in j1939_sk_bind()
if the netdev is down.
Then there are three patches for the newly added CAN_ISOTP protocol. Geert
Uytterhoeven enhances the kconfig help text. Oliver Hartkopp's patch adds
missing RX timeout handling in listen-only mode and Colin Ian King's patch
decreases the generated object code by 926 bytes.
Zhang Changzhong contributes a patch for the ti_hecc driver that fixes the
error path in the probe function.
Navid Emamdoost's patch for the xilinx_can driver fixes the error handling
in case of failing pm_runtime_get_sync().
There are two patches for the peak_usb driver. Dan Carpenter adds range
checking in decode operations and Stephane Grosjean's patch fixes
a timestamp wrapping problem.
Stephane Grosjean's patch for the peak_canfd driver fixes echo management if
loopback is on.
The next three patches all target the mcp251xfd driver. The first one is
by me and it increases the severity of CRC read error messages. The kernel
test robot removes an unneeded semicolon and Tom Rix removes unneeded
breaks in several switch-cases.
The last 4 patches are by Joakim Zhang and target the flexcan driver.
The first three fix ECC related device specific quirks for the LS1021A,
LX2160A and the VF610 SoC. The last patch disables wakeup completely upon
driver remove.
* tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: (27 commits)
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
can: j1939: j1939_sk_bind(): return failure if netdev is down
can: j1939: use backquotes for code samples
can: j1939: swap addr and pgn in the send example
can: j1939: fix syntax and spelling
can: j1939: rename jacd tool
...
====================
Link: https://lore.kernel.org/r/<20201103220636.972106-1-mkl@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Nov 2020 16:12:52 +0000 (08:12 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
1) Fix packet receiving of standard IP tunnels when the xfrm_interface
module is installed. From Xin Long.
2) Fix a race condition between spi allocating and hash list
resizing. From zhuoliang zhang.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eelco Chaudron [Tue, 3 Nov 2020 08:25:49 +0000 (09:25 +0100)]
net: openvswitch: silence suspicious RCU usage warning
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().
In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.
Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vinay Kumar Yadav [Mon, 2 Nov 2020 17:39:10 +0000 (23:09 +0530)]
chelsio/chtls: fix always leaking ctrl_skb
Correct the skb refcount in alloc_ctrl_skb(). The wrong refcount caused an skb
memleak when chtls_send_abort() was called with a NULL skb.
The skb was always leaked; correct this by incrementing the skb
refs by one.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102173909.24826-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vinay Kumar Yadav [Mon, 2 Nov 2020 17:36:51 +0000 (23:06 +0530)]
chelsio/chtls: fix memory leaks caused by a race
A race between user context and softirq causes a memleak.
Consider the following call sequence:
chtls_setkey() //user context
chtls_peer_close()
chtls_abort_req_rss()
chtls_setkey() //user context
The work request skb queued in chtls_setkey() won't be freed
because the resources are already cleaned up for this connection.
Fix it by not queuing the work request while the socket is closing.
v1->v2:
- fix W=1 warning.
v2->v3:
- separate it out from another memleak fix.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102173650.24754-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Tue, 20 Oct 2020 18:45:27 +0000 (02:45 +0800)]
can: flexcan: flexcan_remove(): disable wakeup completely
With the sequence below, we can see that wakeup is enabled by default after
reloading the module if it was enabled before, so we need to disable wakeup in
flexcan_remove().
| # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup
| disabled
| # echo enabled > /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup
| # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup
| enabled
| # rmmod flexcan
| # modprobe flexcan
| # cat /sys/bus/platform/drivers/flexcan/5a8e0000.can/power/wakeup
| enabled
Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
Fixes: 915f9666421c ("can: flexcan: add support for DT property 'wakeup-source'")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20201020184527.8190-1-qiangqing.zhang@nxp.com
[mkl: streamlined commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Joakim Zhang [Tue, 20 Oct 2020 15:53:57 +0000 (23:53 +0800)]
can: flexcan: add ECC initialization for VF610
For SoCs that support ECC, even if the FLEXCAN_QUIRK_DISABLE_MECR quirk is used
to disable the non-correctable errors interrupt and freeze mode, it is better
to also use the FLEXCAN_QUIRK_SUPPORT_ECC quirk to initialize all memory.
Fixes: cdce844865bea ("can: flexcan: add vf610 support for FlexCAN")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20201020155402.30318-6-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Joakim Zhang [Tue, 20 Oct 2020 15:53:56 +0000 (23:53 +0800)]
can: flexcan: add ECC initialization for LX2160A
After double-checking with the Layerscape CAN owner (Pankaj Bansal), it is
confirmed that the LX2160A indeed supports the ECC feature, so correct the
feature table.
For SoCs that support ECC, even if the FLEXCAN_QUIRK_DISABLE_MECR quirk is used
to disable the non-correctable errors interrupt and freeze mode, it is better
to also use the FLEXCAN_QUIRK_SUPPORT_ECC quirk to initialize all memory.
Fixes: 2c19bb43e5572 ("can: flexcan: add lx2160ar1 support")
Cc: Pankaj Bansal <pankaj.bansal@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20201020155402.30318-5-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Joakim Zhang [Tue, 20 Oct 2020 15:53:55 +0000 (23:53 +0800)]
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
After double-checking with the Layerscape CAN owner (Pankaj Bansal), it is
confirmed that the LS1021A doesn't support the ECC feature, so remove the
FLEXCAN_QUIRK_DISABLE_MECR quirk.
Fixes: 99b7668c04b27 ("can: flexcan: adding platform specific details for LS1021A")
Cc: Pankaj Bansal <pankaj.bansal@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20201020155402.30318-4-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tom Rix [Mon, 19 Oct 2020 17:24:12 +0000 (10:24 -0700)]
can: mcp251xfd: remove unneeded break
A break is not needed if it is preceded by a return.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201019172412.31143-1-trix@redhat.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
kernel test robot [Mon, 19 Oct 2020 12:08:05 +0000 (20:08 +0800)]
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c:176:2-3: Unneeded semicolon
Remove unneeded semicolon.
Generated by: scripts/coccinelle/misc/semicolon.cocci
Fixes: 875347fe5756 ("can: mcp25xxfd: add regmap infrastructure")
Signed-off-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201019120805.GA63693@ae4257e0ab22
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Thu, 15 Oct 2020 19:16:37 +0000 (21:16 +0200)]
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
During debugging it turned out that some people have setups where the SPI
communication is more prone to CRC errors.
Increase the severity of both the transfer retry and transfer failure message
to give users feedback without the need to recompile the driver with debug
enabled.
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Link: http://lore.kernel.org/r/20201019190524.1285319-15-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Stephane Grosjean [Tue, 13 Oct 2020 15:39:47 +0000 (17:39 +0200)]
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
Echo management is driven by PUCAN_MSG_LOOPED_BACK bit, while loopback
frames are identified with PUCAN_MSG_SELF_RECEIVE bit. Those bits are set
for each outgoing frame written to the IP core so that a copy of each one
will be placed into the rx path. Thus,
- when PUCAN_MSG_LOOPED_BACK is set then the rx frame is an echo of a
previously sent frame,
- when PUCAN_MSG_LOOPED_BACK+PUCAN_MSG_SELF_RECEIVE are set, then the rx
frame is an echo AND a loopback frame. Therefore, this frame must be
put into the socket rx path too.
This patch fixes how CAN frames are handled when these are sent while the
can interface is configured in "loopback on" mode.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201013153947.28012-1-s.grosjean@peak-system.com
Fixes: 8ac8321e4a79 ("can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
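The two cases listed in the peak_canfd commit above can be sketched as a small classification function. This is only an illustration, not the driver code: the bit values, the names MSG_LOOPED_BACK, MSG_SELF_RECEIVE, struct rx_action and classify_rx() are hypothetical, and the handling of a frame with neither bit set (treated here as an ordinary bus frame) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define MSG_LOOPED_BACK		0x01u
#define MSG_SELF_RECEIVE	0x02u

struct rx_action {
	bool complete_echo;	/* frame is the echo of a frame we sent */
	bool deliver_to_rx;	/* frame must also reach the socket rx path */
};

static struct rx_action classify_rx(uint8_t flags)
{
	struct rx_action act = { false, false };

	if (flags & MSG_LOOPED_BACK) {
		act.complete_echo = true;
		/* Echo AND loopback frame: also hand it to the rx path. */
		if (flags & MSG_SELF_RECEIVE)
			act.deliver_to_rx = true;
	} else {
		/* Assumption: anything else is an ordinary bus frame. */
		act.deliver_to_rx = true;
	}

	return act;
}

static void classify_rx_demo(void)
{
	struct rx_action echo_only = classify_rx(MSG_LOOPED_BACK);
	struct rx_action echo_and_loop =
		classify_rx(MSG_LOOPED_BACK | MSG_SELF_RECEIVE);

	assert(echo_only.complete_echo && !echo_only.deliver_to_rx);
	assert(echo_and_loop.complete_echo && echo_and_loop.deliver_to_rx);
}
```

Calling classify_rx_demo() from a main() exercises both cases from the commit message.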
Stephane Grosjean [Wed, 14 Oct 2020 08:56:31 +0000 (10:56 +0200)]
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Fabian Inostroza <fabianinostrozap@gmail.com> has discovered a potential
problem in the hardware timestamp reporting from the PCAN-USB USB CAN interface
(only), related to the fact that a timestamp of an event may precede the
timestamp used for synchronization when both records are part of the same USB
packet. However, this case was used to detect the wrapping of the time counter.
This patch details and fixes the two identified cases where this problem can
occur.
Reported-by: Fabian Inostroza <fabianinostrozap@gmail.com>
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201014085631.15128-1-s.grosjean@peak-system.com
Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Dan Carpenter [Thu, 13 Aug 2020 14:06:04 +0000 (17:06 +0300)]
can: peak_usb: add range checking in decode operations
These values come from skb->data so Smatch considers them untrusted. I
believe Smatch is correct but I don't have a way to test this.
The usb_if->dev[] array has 2 elements but the index is in the 0-15
range without checks. The cfd->len can be up to 255 but the maximum
valid size is CANFD_MAX_DLEN (64) so that could lead to memory
corruption.
Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200813140604.GA456946@mwanda
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
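The kind of range checking described in the peak_usb commit above might look roughly like the following sketch. It only models the two limits named in the message (a 2-element device array indexed by a 0-15 value, and a length of up to 255 against a 64-byte maximum). The struct layout, the function name validate_rx_record() and the chosen error codes are assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CHANNELS	2	/* usb_if->dev[] has 2 elements */
#define CANFD_MAX_DLEN	64	/* maximum valid CAN FD payload size */

/* Hypothetical, simplified view of a record taken from untrusted data. */
struct rx_record {
	uint8_t channel;	/* encoded in the 0-15 range */
	uint8_t len;		/* encoded in the 0-255 range */
};

/* Return 0 if the record may be used, a negative errno value otherwise. */
static int validate_rx_record(const struct rx_record *rec)
{
	if (rec->channel >= NUM_CHANNELS)
		return -ERANGE;	/* would index past the device array */
	if (rec->len > CANFD_MAX_DLEN)
		return -EINVAL;	/* would copy past the frame's data buffer */
	return 0;
}

static void validate_rx_record_demo(void)
{
	struct rx_record ok = { .channel = 1, .len = 64 };
	struct rx_record bad_channel = { .channel = 15, .len = 8 };
	struct rx_record bad_len = { .channel = 0, .len = 255 };

	assert(validate_rx_record(&ok) == 0);
	assert(validate_rx_record(&bad_channel) == -ERANGE);
	assert(validate_rx_record(&bad_len) == -EINVAL);
}
```

Both checks run before any array access or copy, so an out-of-range value from the wire is rejected instead of being used as an index or a length.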
Navid Emamdoost [Fri, 5 Jun 2020 03:32:39 +0000 (22:32 -0500)]
can: xilinx_can: handle failure cases of pm_runtime_get_sync
Calling pm_runtime_get_sync() increments the usage counter even in case of
failure, causing an incorrect ref count. Call pm_runtime_put() if
pm_runtime_get_sync() fails.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Link: https://lore.kernel.org/r/20200605033239.60664-1-navid.emamdoost@gmail.com
Fixes: 4716620d1b62 ("can: xilinx: Convert to runtime_pm")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
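The reference counting problem described in the xilinx_can commit above can be modelled in plain C. In this sketch struct fake_dev, fake_get_sync(), fake_put() and do_work() are hypothetical stand-ins. The only property taken from the commit message is that the "get" call bumps the counter even when it fails, so the failure path has to drop the reference again.

```c
#include <assert.h>

/* Hypothetical stand-in for a device with a runtime PM usage counter. */
struct fake_dev {
	int usage_count;
	int fail_resume;	/* non-zero: simulate a failing resume */
};

/* Like pm_runtime_get_sync(): the counter is bumped even on failure. */
static int fake_get_sync(struct fake_dev *dev)
{
	dev->usage_count++;
	return dev->fail_resume ? -1 : 0;
}

static void fake_put(struct fake_dev *dev)
{
	dev->usage_count--;
}

static int do_work(struct fake_dev *dev)
{
	int ret = fake_get_sync(dev);

	if (ret < 0) {
		/* Drop the reference taken by the failed get. */
		fake_put(dev);
		return ret;
	}

	/* ... the hardware would be accessed here ... */

	fake_put(dev);
	return 0;
}

static void do_work_demo(void)
{
	struct fake_dev good = { 0, 0 };
	struct fake_dev bad = { 0, 1 };

	assert(do_work(&good) == 0 && good.usage_count == 0);
	assert(do_work(&bad) < 0 && bad.usage_count == 0);
}
```

Without the fake_put() call in the error branch, the counter of the failing device would stay at 1 after do_work() returns.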
Zhang Changzhong [Fri, 17 Jul 2020 08:04:39 +0000 (16:04 +0800)]
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
The driver forgets to call clk_disable_unprepare() in the error path after
a successful call to clk_prepare_enable().
Fix it by adding a clk_disable_unprepare() call in the error path.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1594973079-27743-1-git-send-email-zhangchangzhong@huawei.com
Fixes: befa60113ce7 ("can: ti_hecc: add missing prepare and unprepare of the clock")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
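The error path pattern from the ti_hecc commit above is sketched below with goto-based unwinding. This is a model under stated assumptions: struct fake_clk, fake_clk_prepare_enable(), fake_clk_disable_unprepare() and fake_probe() are hypothetical, and the "later step" that fails stands for whatever the real probe function does after enabling the clock.

```c
#include <assert.h>

/* Hypothetical clock model: counts balanced enable/disable calls. */
struct fake_clk {
	int enable_count;
};

static int fake_clk_prepare_enable(struct fake_clk *clk)
{
	clk->enable_count++;
	return 0;
}

static void fake_clk_disable_unprepare(struct fake_clk *clk)
{
	clk->enable_count--;
}

/* later_step_fails simulates a failure after the clock was enabled. */
static int fake_probe(struct fake_clk *clk, int later_step_fails)
{
	int err;

	err = fake_clk_prepare_enable(clk);
	if (err)
		return err;

	if (later_step_fails) {
		err = -1;
		goto probe_exit_clk_disable;
	}

	return 0;

probe_exit_clk_disable:
	/* Undo the successful enable before reporting the failure. */
	fake_clk_disable_unprepare(clk);
	return err;
}

static void fake_probe_demo(void)
{
	struct fake_clk clk = { 0 };

	assert(fake_probe(&clk, 1) == -1);
	assert(clk.enable_count == 0);	/* nothing leaked on failure */
	assert(fake_probe(&clk, 0) == 0);
	assert(clk.enable_count == 1);	/* clock stays on after success */
}
```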
Colin Ian King [Tue, 20 Oct 2020 15:42:03 +0000 (16:42 +0100)]
can: isotp: padlen(): make const array static, makes object smaller
Don't populate the const array plen on the stack but instead make it static.
This makes the object code smaller by 926 bytes.
Before:
   text    data     bss     dec     hex filename
  26531    1943      64   28538    6f7a net/can/isotp.o
After:
   text    data     bss     dec     hex filename
  25509    2039      64   27612    6bdc net/can/isotp.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201020154203.54711-1-colin.king@canonical.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
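The change described in the isotp commit above comes down to the storage class of a lookup table. The sketch below shows a padlen()-style helper with a static const table, so the table lives in read-only data instead of being rebuilt on the stack on every call. The table contents are an assumption based on the standard CAN FD frame lengths (8, 12, 16, 20, 24, 32, 48, 64); it is not a copy of net/can/isotp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Round a data length up to the next valid CAN FD frame length. */
static uint8_t padlen(uint8_t datalen)
{
	/* static: placed in read-only data once, not filled in per call */
	static const uint8_t plen[] = {
		8, 8, 8, 8, 8, 8, 8, 8, 8,	/* 0 - 8 */
		12, 12, 12, 12,			/* 9 - 12 */
		16, 16, 16, 16,			/* 13 - 16 */
		20, 20, 20, 20,			/* 17 - 20 */
		24, 24, 24, 24,			/* 21 - 24 */
		32, 32, 32, 32, 32, 32, 32, 32,	/* 25 - 32 */
		48, 48, 48, 48, 48, 48, 48, 48,	/* 33 - 40 */
		48, 48, 48, 48, 48, 48, 48, 48	/* 41 - 48 */
	};

	if (datalen > 48)
		return 64;

	return plen[datalen];
}

static void padlen_demo(void)
{
	assert(padlen(0) == 8);
	assert(padlen(9) == 12);
	assert(padlen(33) == 48);
	assert(padlen(49) == 64);
}
```

With a plain (non-static) const array, a compiler may emit code that initializes the array on each call, which is the extra object code the commit removes.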
Oliver Hartkopp [Mon, 19 Oct 2020 12:02:29 +0000 (14:02 +0200)]
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
As reported by Thomas Wagner:
https://github.com/hartkopp/can-isotp/issues/34
the timeout handling for data frames is not enabled when the isotp socket is
used in listen-only mode (sockopt CAN_ISOTP_LISTEN_MODE). This mode is enabled
by the isotpsniffer application, which therefore became inconsistent with the
strict rx timeout rules that apply when running the isotp protocol in the
operational mode.
This patch fixes this inconsistency by moving the return condition for the
listen-only mode behind the timeout handling code.
Reported-by: Thomas Wagner <thwa1@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/hartkopp/can-isotp/issues/34
Link: https://lore.kernel.org/r/20201019120229.89326-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Geert Uytterhoeven [Tue, 13 Oct 2020 14:13:41 +0000 (16:13 +0200)]
can: isotp: Explain PDU in CAN_ISOTP help text
The help text for the CAN_ISOTP config symbol uses the acronym "PDU". However,
this acronym is not explained here, nor in Documentation/networking/can.rst.
Expand the acronym to make it easier for users to decide if they need to enable
the CAN_ISOTP option or not.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201013141341.28487-1-geert+renesas@glider.be
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Zhang Changzhong [Mon, 7 Sep 2020 06:31:48 +0000 (14:31 +0800)]
can: j1939: j1939_sk_bind(): return failure if netdev is down
When a netdev down event occurs after a successful call to
j1939_sk_bind(), j1939_netdev_notify() can handle it correctly.
But if the netdev is already in the down state before j1939_sk_bind() is
called, j1939_sk_release() will stay blocked in wait_event_interruptible()
forever. In this case, j1939_netdev_notify() won't be called and
j1939_tp_txtimer() won't call j1939_session_cancel() or another function
to clear the session for the ENETDOWN error. This leads to a mismatch of
j1939_session_get/put(), and jsk->skb_pending will never decrease to
zero.
To reproduce it, use the following commands:
1. ip link add dev vcan0 type vcan
2. j1939acd -r 100,80-120 1122334455667788 vcan0
3. press ctrl-c and the thread will be blocked forever
This patch adds a check for ndev->flags in j1939_sk_bind() to avoid this
kind of situation and return -ENETDOWN.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1599460308-18770-1-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
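The check added by the j1939_sk_bind() commit above can be illustrated with a minimal model: refuse the bind early when the interface flags say the device is down. struct fake_netdev, FAKE_IFF_UP and fake_bind() are hypothetical names for this sketch; only the idea of testing the flags and returning -ENETDOWN comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_IFF_UP	0x1u	/* hypothetical "interface is up" flag bit */

struct fake_netdev {
	unsigned int flags;
};

/* Return 0 on success, -ENETDOWN if the device is not up. */
static int fake_bind(const struct fake_netdev *ndev)
{
	if (!(ndev->flags & FAKE_IFF_UP))
		return -ENETDOWN;

	/* ... the actual bind work would happen here ... */
	return 0;
}

static void fake_bind_demo(void)
{
	struct fake_netdev up = { FAKE_IFF_UP };
	struct fake_netdev down = { 0 };

	assert(fake_bind(&up) == 0);
	assert(fake_bind(&down) == -ENETDOWN);
}
```

Failing early means no session state is created for a device that will never deliver the events needed to tear that state down again.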
Yegor Yefremov [Mon, 26 Oct 2020 09:44:42 +0000 (10:44 +0100)]
can: j1939: use backquotes for code samples
This patch adds backquotes for code samples.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20201026094442.16587-1-yegorslists@googlemail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yegor Yefremov [Thu, 22 Oct 2020 08:37:08 +0000 (10:37 +0200)]
can: j1939: swap addr and pgn in the send example
The address was wrongly assigned to the PGN field and vice versa.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20201022083708.8755-1-yegorslists@googlemail.com
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yegor Yefremov [Tue, 20 Oct 2020 10:10:43 +0000 (12:10 +0200)]
can: j1939: fix syntax and spelling
This patch fixes the syntax and spelling of the j1939 documentation.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20201020101043.6369-1-yegorslists@googlemail.com
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yegor Yefremov [Tue, 20 Oct 2020 08:11:34 +0000 (10:11 +0200)]
can: j1939: rename jacd tool
Due to naming conflicts, jacd was renamed to j1939acd in:
https://github.com/linux-can/can-utils/pull/199
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20201020081134.3597-1-yegorslists@googlemail.com
Link: https://github.com/linux-can/can-utils/pull/199
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oleksij Rempel [Wed, 18 Dec 2019 08:39:02 +0000 (09:39 +0100)]
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
All user space generated SKBs are owned by a socket (unless injected into the
key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned
up.
This leads to a problem when a CAN driver calls can_put_echo_skb() on an
unshared SKB. If the socket is closed prior to the TX complete handler,
can_get_echo_skb() and the subsequent delivery of the echo SKB to all
registered callbacks, an SKB with a refcount of 0 is delivered.
To avoid the problem, in can_get_echo_skb() the original SKB is now always
cloned, regardless of shared SKB or not. If the process exits it can now
safely discard its SKBs, without disturbing the delivery of the echo SKB.
The problem shows up in the j1939 stack, when it clones the incoming skb, which
detects the already 0 refcount.
We can easily reproduce this with following example:
testj1939 -B -r can0: &
cansend can0 1823ff40#0123
WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
refcount_t: addition on 0; use-after-free.
Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
[<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
[<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
[<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
[<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
[<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
[<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
[<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
[<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
[<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
[<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
[<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
[<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
[<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)
Fixes: 0ae89beb283a ("can: add destructor for self generated skbs")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: http://lore.kernel.org/r/20200124132656.22156-1-o.rempel@pengutronix.de
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Tue, 20 Oct 2020 06:44:43 +0000 (08:44 +0200)]
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
The can_get_echo_skb() function returns the number of received bytes to
be used for netdev statistics. In the case of RTR frames we get a valid
(potentially non-zero) data length value which has to be passed on for further
operations. But on the wire RTR frames carry no payload. Therefore
the value to be used in the statistics has to be zero for RTR frames.
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201020064443.80164-1-socketcan@hartkopp.net
Fixes: cf5046b309b3 ("can: dev: let can_get_echo_skb() return dlc of CAN frame")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
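The rule stated in the RTR commit above (the length field may be non-zero, but the byte count used for statistics must be zero for RTR frames) can be sketched as follows. struct fake_can_frame and stats_len() are hypothetical names. FAKE_CAN_RTR_FLAG uses the same bit as the CAN_RTR_FLAG definition in the Linux CAN headers, but is redefined here so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_CAN_RTR_FLAG	0x40000000u	/* remote transmission request */

/* Hypothetical, reduced CAN frame: identifier with flags, and length. */
struct fake_can_frame {
	uint32_t can_id;
	uint8_t len;
};

/* Number of payload bytes to account for in the device statistics. */
static unsigned int stats_len(const struct fake_can_frame *cf)
{
	/* RTR frames request data; they carry no payload on the wire. */
	if (cf->can_id & FAKE_CAN_RTR_FLAG)
		return 0;

	return cf->len;
}

static void stats_len_demo(void)
{
	struct fake_can_frame data = { 0x123, 8 };
	struct fake_can_frame rtr = { 0x123 | FAKE_CAN_RTR_FLAG, 8 };

	assert(stats_len(&data) == 8);
	assert(stats_len(&rtr) == 0);
}
```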
Vincent Mailhol [Fri, 2 Oct 2020 15:41:45 +0000 (00:41 +0900)]
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
not always, the case), the 'WARN_ON(in_irq)' in
net/core/skbuff.c#skb_release_head_state() might be triggered, under network
congestion circumstances, together with the potential risk of a NULL pointer
dereference.
The root cause of this issue is the call to kfree_skb() instead of
dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().
This patch prevents the skb from being freed within the call to netif_rx() by
incrementing its reference count with skb_get(). The skb is finally freed by
one of the in-irq-context safe functions: dev_consume_skb_any() or
dev_kfree_skb_any(). The "any" version is used because some drivers might call
can_get_echo_skb() in a normal context.
The reason for this issue to occur is that initially, in the core network
stack, loopback skbs were not supposed to be received in hardware IRQ context.
The CAN stack is an exception.
This bug was previously reported back in 2017 in [1] but the proposed patch
never got accepted.
While [1] directly modifies net/core/dev.c, we try to propose here a
smoother modification local to CAN network stack (the assumption
behind is that only CAN devices are affected by this issue).
[1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Thu, 18 Jun 2020 10:47:06 +0000 (12:47 +0200)]
can: rx-offload: don't call kfree_skb() from IRQ context
A CAN driver using the rx-offload infrastructure reads CAN frames
(usually in IRQ context) from the hardware and places them into the rx-offload
queue to be delivered to the networking stack via NAPI.
In case the rx-offload queue is full, trying to add more skbs results in the
skbs being dropped using kfree_skb(). If done from hard-IRQ context this
results in the following warning:
[ 682.552693] ------------[ cut here ]------------
[ 682.557360] WARNING: CPU: 0 PID: 3057 at net/core/skbuff.c:650 skb_release_head_state+0x74/0x84
[ 682.566075] Modules linked in: can_raw can coda_vpu flexcan dw_hdmi_ahb_audio v4l2_jpeg imx_vdoa can_dev
[ 682.575597] CPU: 0 PID: 3057 Comm: cansend Tainted: G W 5.7.0+ #18
[ 682.583098] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 682.589657] [<c0112628>] (unwind_backtrace) from [<c010c1c4>] (show_stack+0x10/0x14)
[ 682.597423] [<c010c1c4>] (show_stack) from [<c06c481c>] (dump_stack+0xe0/0x114)
[ 682.604759] [<c06c481c>] (dump_stack) from [<c0128f10>] (__warn+0xc0/0x10c)
[ 682.611742] [<c0128f10>] (__warn) from [<c0129314>] (warn_slowpath_fmt+0x5c/0xc0)
[ 682.619248] [<c0129314>] (warn_slowpath_fmt) from [<c0b95dec>] (skb_release_head_state+0x74/0x84)
[ 682.628143] [<c0b95dec>] (skb_release_head_state) from [<c0b95e08>] (skb_release_all+0xc/0x24)
[ 682.636774] [<c0b95e08>] (skb_release_all) from [<c0b95eac>] (kfree_skb+0x74/0x1c8)
[ 682.644479] [<c0b95eac>] (kfree_skb) from [<bf001d1c>] (can_rx_offload_queue_sorted+0xe0/0xe8 [can_dev])
[ 682.654051] [<bf001d1c>] (can_rx_offload_queue_sorted [can_dev]) from [<bf001d6c>] (can_rx_offload_get_echo_skb+0x48/0x94 [can_dev])
[ 682.666007] [<bf001d6c>] (can_rx_offload_get_echo_skb [can_dev]) from [<bf01efe4>] (flexcan_irq+0x194/0x5dc [flexcan])
[ 682.676734] [<bf01efe4>] (flexcan_irq [flexcan]) from [<c019c1ec>] (__handle_irq_event_percpu+0x4c/0x3ec)
[ 682.686322] [<c019c1ec>] (__handle_irq_event_percpu) from [<c019c5b8>] (handle_irq_event_percpu+0x2c/0x88)
[ 682.695993] [<c019c5b8>] (handle_irq_event_percpu) from [<c019c64c>] (handle_irq_event+0x38/0x5c)
[ 682.704887] [<c019c64c>] (handle_irq_event) from [<c01a1058>] (handle_fasteoi_irq+0xc8/0x180)
[ 682.713432] [<c01a1058>] (handle_fasteoi_irq) from [<c019b2c0>] (generic_handle_irq+0x30/0x44)
[ 682.722063] [<c019b2c0>] (generic_handle_irq) from [<c019b8f8>] (__handle_domain_irq+0x64/0xdc)
[ 682.730783] [<c019b8f8>] (__handle_domain_irq) from [<c06df4a4>] (gic_handle_irq+0x48/0x9c)
[ 682.739158] [<c06df4a4>] (gic_handle_irq) from [<c0100b30>] (__irq_svc+0x70/0x98)
[ 682.746656] Exception stack(0xe80e9dd8 to 0xe80e9e20)
[ 682.751725] 9dc0: 00000001 e80e8000
[ 682.759922] 9de0: e820cf80 00000000 ffffe000 00000000 eaf08fe4 00000000 600d0013 00000000
[ 682.768117] 9e00: c1732e3c c16093a8 e820d4c0 e80e9e28 c018a57c c018b870 600d0013 ffffffff
[ 682.776315] [<c0100b30>] (__irq_svc) from [<c018b870>] (lock_acquire+0x108/0x4e8)
[ 682.783821] [<c018b870>] (lock_acquire) from [<c0e938e4>] (down_write+0x48/0xa8)
[ 682.791242] [<c0e938e4>] (down_write) from [<c02818dc>] (unlink_file_vma+0x24/0x40)
[ 682.798922] [<c02818dc>] (unlink_file_vma) from [<c027a258>] (free_pgtables+0x34/0xb8)
[ 682.806858] [<c027a258>] (free_pgtables) from [<c02835a4>] (exit_mmap+0xe4/0x170)
[ 682.814361] [<c02835a4>] (exit_mmap) from [<c01248e0>] (mmput+0x5c/0x110)
[ 682.821171] [<c01248e0>] (mmput) from [<c012e910>] (do_exit+0x374/0xbe4)
[ 682.827892] [<c012e910>] (do_exit) from [<c0130888>] (do_group_exit+0x38/0xb4)
[ 682.835132] [<c0130888>] (do_group_exit) from [<c0130914>] (__wake_up_parent+0x0/0x14)
[ 682.843063] irq event stamp: 1936
[ 682.846399] hardirqs last enabled at (1935): [<c02938b0>] rmqueue+0xf4/0xc64
[ 682.853553] hardirqs last disabled at (1936): [<c0100b20>] __irq_svc+0x60/0x98
[ 682.860799] softirqs last enabled at (1878): [<bf04cdcc>] raw_release+0x108/0x1f0 [can_raw]
[ 682.869256] softirqs last disabled at (1876): [<c0b8f478>] release_sock+0x18/0x98
[ 682.876753] ---[ end trace 7bca4751ce44c444 ]---
This patch fixes the problem by replacing the kfree_skb() by
dev_kfree_skb_any(), as rx-offload might be called from threaded IRQ handlers
as well.
Fixes: ca913f1ac024 ("can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak")
Fixes: 6caf8a6d6586 ("can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak")
Link: http://lore.kernel.org/r/20201019190524.1285319-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
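The reason an "any context" free helper is safe where a plain kfree_skb() is not can be shown with a simplified model. The sketch assumes that such a helper defers the release when it runs in hard IRQ context or with interrupts disabled, and frees immediately otherwise. struct exec_context, enum free_path and pick_free_path() are hypothetical names, and this is not the kernel's implementation of dev_kfree_skb_any().

```c
#include <assert.h>
#include <stdbool.h>

enum free_path {
	FREE_IMMEDIATELY,	/* safe to release the buffer right away */
	FREE_DEFERRED,		/* queue the buffer for a later release */
};

/* Hypothetical description of where the caller is running. */
struct exec_context {
	bool in_hard_irq;
	bool irqs_disabled;
};

/* Assumed policy of an "any context" free helper. */
static enum free_path pick_free_path(const struct exec_context *ctx)
{
	if (ctx->in_hard_irq || ctx->irqs_disabled)
		return FREE_DEFERRED;

	return FREE_IMMEDIATELY;
}

static void pick_free_path_demo(void)
{
	struct exec_context hard_irq = { true, true };
	struct exec_context threaded_irq = { false, false };

	assert(pick_free_path(&hard_irq) == FREE_DEFERRED);
	assert(pick_free_path(&threaded_irq) == FREE_IMMEDIATELY);
}
```

Because the decision is made at call time, the same drop path works for drivers that use hard IRQ handlers and for those that use threaded IRQ handlers.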
Zhang Changzhong [Tue, 14 Jul 2020 06:44:50 +0000 (14:44 +0800)]
can: proc: can_remove_proc(): silence remove_proc_entry warning
If can_init_proc() fails to create the /proc/net/can directory,
can_remove_proc() will trigger a warning:
WARNING: CPU: 6 PID: 7133 at fs/proc/generic.c:672 remove_proc_entry+0x17b0
Kernel panic - not syncing: panic_on_warn set ...
Fix this by returning early from can_remove_proc() if the can proc_dir does
not exist.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1594709090-3203-1-git-send-email-zhangchangzhong@huawei.com
Fixes: 8e8cda6d737d ("can: initial support for network namespaces")
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oleksij Rempel [Thu, 22 Oct 2020 07:52:18 +0000 (09:52 +0200)]
dt-bindings: can: flexcan: convert fsl,*flexcan bindings to yaml
In order to automate the verification of DT nodes, convert
fsl-flexcan.txt to fsl,flexcan.yaml.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201022075218.11880-3-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oleksij Rempel [Thu, 22 Oct 2020 07:52:17 +0000 (09:52 +0200)]
dt-bindings: can: add can-controller.yaml
For now we have only the node name as a common rule for all CAN controllers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201022075218.11880-2-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Daniele Palmas [Mon, 2 Nov 2020 11:01:08 +0000 (12:01 +0100)]
net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition
Add support for Telit LE910Cx 0x1230 composition:
0x1230: tty, adb, rmnet, audio, tty, tty, tty, tty
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20201102110108.17244-1-dnlplm@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Mon, 2 Nov 2020 09:09:49 +0000 (10:09 +0100)]
mptcp: token: fix unititialized variable
gcc complains about use of uninitialized 'num'. Fix it by doing the first
assignment of 'num' when the variable is declared.
Fixes: 96d890daad05 ("mptcp: add msk interations helper")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/49e20da5d467a73414d4294a8bd35e2cb1befd49.1604308087.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
YueHaibing [Sat, 31 Oct 2020 03:10:53 +0000 (11:10 +0800)]
sfp: Fix error handing in sfp_probe()
gpiod_to_irq() never returns 0, but returns a negative value in
case of error. Check for that and set gpio_irq to 0.
Fixes: 73970055450e ("sfp: add SFP module support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201031031053.25264-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Fri, 30 Oct 2020 17:07:11 +0000 (10:07 -0700)]
powerpc/vnic: Extend "failover pending" window
Commit 5a18e1e0c193b introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e. the window is between the following two events:
a. we get a transport event due to a FAILOVER
b. later, we get CRQ_INITIALIZED indicating the partner is
ready at which point we schedule a FAILOVER reset.
and ->failover_pending is true during this window.
If during this window, we attempt to open (or close) a device, we pretend
that the operation succeeded and let the FAILOVER reset path complete the
operation.
This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down, requiring the administrator to manually bring up the device.
This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e. until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.
We could try and block the open until failover is completed but a) that
could still timeout the application and b) Existing code "pretends" that
failover occurred "just after" open succeeded, so marks the open successful
and lets the failover complete the open. So, mark the open successful even
if the transport event occurs before we actually start the open.
Fixes: 5a18e1e0c193 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Acked-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20201030170711.1562994-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan McDowell [Fri, 30 Oct 2020 18:33:15 +0000 (18:33 +0000)]
net: dsa: qca8k: Fix port MTU setting
The qca8k only supports a switch-wide MTU setting, and the code to take
the max of all ports was only looking at the port currently being set.
Fix to examine all ports.
Reported-by: DENG Qingfang <dqfext@gmail.com>
Fixes: f58d2598cf70 ("net: dsa: qca8k: implement the port MTU callbacks")
Signed-off-by: Jonathan McDowell <noodles@earth.li>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201030183315.GA6736@earth.li
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
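The bug pattern described in the qca8k commit above can be sketched in a few lines of Python. This is only an illustration, not the driver code: the `SwitchMtu` class, its method names and the 1500-byte default are hypothetical. It models a switch with a single shared MTU register and contrasts looking only at the port being changed with taking the maximum over all ports.

```python
class SwitchMtu:
    """Hypothetical model of a switch with one shared, switch-wide MTU."""

    def __init__(self, num_ports, default_mtu=1500):
        # Remember what each port asked for; the hardware stores one value.
        # The default of 1500 is an assumption made for this sketch.
        self.port_mtu = [default_mtu] * num_ports
        self.hw_mtu = default_mtu

    def change_mtu_buggy(self, port, new_mtu):
        # Bug pattern: only the port currently being set is considered.
        self.port_mtu[port] = new_mtu
        self.hw_mtu = new_mtu

    def change_mtu_fixed(self, port, new_mtu):
        # Fix pattern: record the request, then examine all ports.
        self.port_mtu[port] = new_mtu
        self.hw_mtu = max(self.port_mtu)
```

With the buggy variant, lowering the MTU on one port also lowers it for a port that still needs jumbo frames; the fixed variant keeps the largest requested value.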
Petr Malat [Fri, 30 Oct 2020 13:26:33 +0000 (14:26 +0100)]
sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms
Commit 978aa0474115 ("sctp: fix some type cast warnings introduced since
very beginning") broke err reading from sctp_arg, because it reads the
value as a 32-bit integer, although the value is stored as a 16-bit integer.
Later this value is passed to userspace in a 16-bit variable, thus the
user always gets 0 on big-endian platforms. Fix it by reading the __u16
field of sctp_arg union, as reading err field would produce a sparse
warning.
Fixes: 978aa0474115 ("sctp: fix some type cast warnings introduced since very beginning")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20201030132633.7045-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
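The endianness effect described in the sctp commit above can be reproduced outside the kernel. The following Python sketch is an illustration only: the function names and the 4-byte buffer standing in for the sctp_arg union are assumptions. It stores a 16-bit value at the start of the buffer, reads it back as a 32-bit integer, and truncates the result to 16 bits, as an assignment to a 16-bit variable would.

```python
import struct


def read_as_u32_then_truncate(err, byte_order):
    """Store err as 16 bits, read it as 32 bits, keep the low 16 bits.

    byte_order is ">" for big-endian or "<" for little-endian.
    """
    union = bytearray(4)                                  # stand-in for the union
    struct.pack_into(byte_order + "H", union, 0, err)     # written as 16-bit
    value32 = struct.unpack_from(byte_order + "I", union, 0)[0]  # read as 32-bit
    return value32 & 0xFFFF                               # truncated to 16-bit


def read_as_u16(err, byte_order):
    """Store err as 16 bits and read it back through a 16-bit field."""
    union = bytearray(4)
    struct.pack_into(byte_order + "H", union, 0, err)
    return struct.unpack_from(byte_order + "H", union, 0)[0]
```

On a big-endian layout the 16-bit value lands in the upper half of the 32-bit word, so the truncated result is 0; on a little-endian layout it lands in the lower half and the mismatch goes unnoticed. Reading through the 16-bit field returns the stored value in both cases.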
Grygorii Strashko [Thu, 29 Oct 2020 19:09:10 +0000 (21:09 +0200)]
net: ethernet: ti: cpsw: disable PTPv1 hw timestamping advertisement
The TI CPTS does not natively support PTPv1, only PTPv2. But, as it
happens, the CPTS can provide a HW timestamp for PTPv1 Sync messages, because
the CPTS HW parser looks for the PTP messageType id in PTP message octet 0, whose
value is 0 for PTPv1. As a result, the CPTS HW can detect Sync messages for PTPv1
and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1
messages (Delay_req/resp) and will return PTP messageType id 0 for them.
Commit e9523a5a32a1 ("net: ethernet: ti: cpsw: enable
HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping
advertisement by mistake, only to make the Linux kernel "timestamping" utility
work, and this causes issues with PTPv1-only compatible HW/SW: Sync is HW
timestamped, but Delay_req/resp are not.
Hence, fix it by disabling PTPv1 hw timestamping advertisement, so PTPv1-only
compatible HW/SW can properly fall back to SW timestamping.
Fixes: e9523a5a32a1 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201029190910.30789-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 2 Nov 2020 19:21:33 +0000 (11:21 -0800)]
Merge branch 'dpaa_eth-buffer-layout-fixes'
Camelia Groza says:
====================
dpaa_eth: buffer layout fixes
The patches are related to the software workaround for the A050385 erratum.
The first patch ensures optimal buffer usage for non-erratum scenarios. The
second patch fixes a currently inconsequential discrepancy between the
FMan and Ethernet drivers.
Changes in v3:
- refactor defines for clarity in 1/2
- add more details on the user impact in 1/2
- remove unnecessary inline identifier in 2/2
Changes in v2:
- make the returned value for TX ports explicit in 2/2
- simplify the buf_layout reference in 2/2
====================
Link: https://lore.kernel.org/r/cover.1604339942.git.camelia.groza@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Camelia Groza [Mon, 2 Nov 2020 18:34:36 +0000 (20:34 +0200)]
dpaa_eth: fix the RX headroom size alignment
The headroom reserved for received frames needs to be aligned to an
RX specific value. There is currently a discrepancy between the values
used in the Ethernet driver and the values passed to the FMan.
Coincidentally, the resulting aligned values are identical.
Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround")
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Camelia Groza [Mon, 2 Nov 2020 18:34:35 +0000 (20:34 +0200)]
dpaa_eth: update the buffer layout for non-A050385 erratum scenarios
Impose a larger RX private data area only when the A050385 erratum is
present on the hardware. A smaller buffer size is sufficient in all
other scenarios. This enables a wider range of linear Jumbo frame
sizes in non-erratum scenarios, instead of turning to multi
buffer Scatter/Gather frames. The maximum linear frame size is
increased by 128 bytes for non-erratum arm64 platforms.
Clean up the hardware annotations header defines in the process.
Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround")
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 2 Nov 2020 17:43:54 +0000 (09:43 -0800)]
Merge tag 'mac80211-for-net-2020-10-30' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of fixes, for
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 1 Nov 2020 00:28:17 +0000 (17:28 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Incorrect netlink report logic in flowtable and genID.
2) Add a selftest to check that wireguard passes the right sk
to ip_route_me_harder, from Jason A. Donenfeld.
3) Pass the actual sk to ip_route_me_harder(), also from Jason.
4) Missing expression validation of updates via nft --check.
5) Update byte and packet counters regardless of whether they
match, from Stefano Brivio.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wenxu [Fri, 30 Oct 2020 03:32:08 +0000 (11:32 +0800)]
ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
Tunnel devices such as vxlan, bareudp and geneve in lwt mode set
the outer DF based only on TUNNEL_DONT_FRAGMENT.
This was also the behavior of the gre device before it switched to
ip_md_tunnel_xmit in commit 962924fa2b7a ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit").
Switching ip_gre in lwt mode to transmit with ip_md_tunnel_xmit changed the rule
and created a discrepancy between how different tunnels handle DF. So
ip_md_tunnel_xmit should follow the same rule as the other tunnels.
Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mark Deneen [Fri, 30 Oct 2020 15:58:14 +0000 (15:58 +0000)]
cadence: force nonlinear buffers to be cloned
In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
second device with usb ethernet (r8152) sending ICMP packets.  If the packet
was larger than about 220 bytes, the SAMA5 device would "oops" with the
following trace:
kernel BUG at net/core/skbuff.c:1863!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii option usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel ohci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
CPU: 0 PID: 0 Comm: swapper Tainted: G           O      5.9.1-prox2+ #1
Hardware name: Atmel SAMA5
PC is at skb_put+0x3c/0x50
LR is at macb_start_xmit+0x134/0xad0 [macb]
pc : [<c05258cc>]    lr : [<bf0ea5b8>]    psr: 20070113
sp : c0d01a60  ip : c07232c0  fp : c4250000
r10: c0d03cc8  r9 : 00000000  r8 : c0d038c0
r7 : 00000000  r6 : 00000008  r5 : c59b66c0  r4 : 0000002a
r3 : 8f659eff  r2 : c59e9eea  r1 : 00000001  r0 : c59b66c0
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 2640c059  DAC: 00000051
Process swapper (pid: 0, stack limit = 0x75002d81)
<snipped stack>
[<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
[<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
[<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
[<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
[<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
[<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
[<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
[<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
[<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
[<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
[<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
[<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
[<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
[<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
[<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
[<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
[<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
[<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
[<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
[<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc0d01ef0 to 0xc0d01f38)
1ee0:                                     00000000 0000003d 0c31f383 c0d0fa00
1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
[<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
[<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
[<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
[<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
---[ end trace 146c8a334115490c ]---
The solution was to force nonlinear buffers to be cloned.  This was previously
reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
but never formally submitted as a patch.
This is the third revision, hopefully the formatting is correct this time!
Suggested-by: Klaus Doth <krnl@doth.eu>
Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")
Signed-off-by: Mark Deneen <mdeneen@saucontech.com>
Link: https://lore.kernel.org/r/20201030155814.622831-1-mdeneen@saucontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 31 Oct 2020 20:16:07 +0000 (13:16 -0700)]
Merge branch 'ipv6-reply-icmp-error-if-fragment-doesn-t-contain-all-headers'
Hangbin Liu says:
====================
IPv6: reply ICMP error if fragment doesn't contain all headers
When our engineer ran the latest IPv6 Core Conformance test, test v6LC.1.3.6:
First Fragment Doesn't Contain All Headers[1] failed. The purpose of the test is to
verify that the node (Linux for example) properly processes IPv6 packets
that don't include all the headers through the Upper-Layer header.
Based on RFC 8200, Section 4.5 Fragment Header
- If the first fragment does not include all headers through an
Upper-Layer header, then that fragment should be discarded and
an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero.
The first patch adds a definition for ICMPv6 Parameter Problem, code 3.
The second patch adds a check for the first fragment packet to make sure
the Upper-Layer header exists.
[1] Page 68, v6LC.1.3.6: First Fragment Doesn't Contain All Headers part A, B,
C and D at https://ipv6ready.org/docs/Core_Conformance_5_0_0.pdf
[2] My reproducer:
import sys, os
from scapy.all import *
def send_frag_dst_opt(src_ip6, dst_ip6):
    ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
    frag_1 = IPv6ExtHdrFragment(nh = 60, m = 1)
    dst_opt = IPv6ExtHdrDestOpt(nh = 58)
    frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
    icmp_echo = ICMPv6EchoRequest(seq = 1)
    pkt_1 = ip6/frag_1/dst_opt
    pkt_2 = ip6/frag_2/icmp_echo
    send(pkt_1)
    send(pkt_2)
def send_frag_route_opt(src_ip6, dst_ip6):
    ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
    frag_1 = IPv6ExtHdrFragment(nh = 43, m = 1)
    route_opt = IPv6ExtHdrRouting(nh = 58)
    frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
    icmp_echo = ICMPv6EchoRequest(seq = 2)
    pkt_1 = ip6/frag_1/route_opt
    pkt_2 = ip6/frag_2/icmp_echo
    send(pkt_1)
    send(pkt_2)
if __name__ == '__main__':
    src = sys.argv[1]
    dst = sys.argv[2]
    conf.iface = sys.argv[3]
    send_frag_dst_opt(src, dst)
    send_frag_route_opt(src, dst)
====================
Link: https://lore.kernel.org/r/20201027123313.3717941-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 27 Oct 2020 12:33:13 +0000 (20:33 +0800)]
IPv6: reply ICMP error if the first fragment don't include all headers
Based on RFC 8200, Section 4.5 Fragment Header:
- If the first fragment does not include all headers through an
Upper-Layer header, then that fragment should be discarded and
an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero.
Checking each packet header in the IPv6 fast path would have a performance impact,
so I put the check in ipv6_frag_rcv().
As the packet may carry any kind of L4 protocol, I only check the header length of
some common protocols and handle the others with (offset + 1) > skb->len.
Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments
(fragmented packet with only one fragment).
When sending the ICMP error message, if the first truncated fragment is an ICMP
message, icmp6_send() will bail out because is_ineligible() returns true. So I added
a check in is_ineligible() to let a fragment packet with nexthdr ICMP but no ICMP
header return false.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
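The offset test quoted in the commit above, !(frag_off & htons(IP6_OFFSET)), can be illustrated with a short Python sketch. It assumes the layout of the 16-bit Fragment header field from RFC 8200 (13-bit fragment offset in the upper bits, M flag in the lowest bit) and the corresponding kernel mask values IP6_OFFSET = 0xFFF8 and IP6_MF = 0x0001. The helper names are hypothetical, and the field is taken in host byte order, so no htons() is needed here.

```python
IP6_OFFSET = 0xFFF8  # upper 13 bits: fragment offset, in 8-octet units
IP6_MF = 0x0001      # lowest bit: "more fragments" flag


def make_frag_off(offset_units, more_fragments):
    """Build the 16-bit field from an offset (8-octet units) and the M flag."""
    return (offset_units << 3) | (IP6_MF if more_fragments else 0)


def is_first_fragment(frag_off):
    """True when the fragment offset is zero, whatever the M flag says."""
    return (frag_off & IP6_OFFSET) == 0


def is_atomic_fragment(frag_off):
    """True when offset is zero and no more fragments follow."""
    return (frag_off & (IP6_OFFSET | IP6_MF)) == 0
```

Because the offset test ignores the M flag, it is true both for the first fragment of a multi-fragment packet and for an atomic fragment. The second fragment in the reproducer of the cover letter (offset = 4, m = 1) does not pass it.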
Hangbin Liu [Tue, 27 Oct 2020 12:33:12 +0000 (20:33 +0800)]
ICMPv6: Add ICMPv6 Parameter Problem, code 3 definition
Based on RFC7112, Section 6:
IANA has added the following "Type 4 - Parameter Problem" message to
the "Internet Control Message Protocol version 6 (ICMPv6) Parameters"
registry:
CODE NAME/DESCRIPTION
3 IPv6 First Fragment has incomplete IPv6 Header Chain
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Tue, 27 Oct 2020 11:49:25 +0000 (11:49 +0000)]
net: atm: fix update of position index in lec_seq_next
The position index in lec_seq_next is not updated when the next
entry is fetched and no more entries are available. This causes
seq_file to report the following error:
"seq_file: buggy .next function lec_seq_next [lec] did not update
position index"
Fix this by always updating the position index.
[ Note: this is an ancient 2002 bug, the sha is from the
tglx/history repo ]
Fixes: 4aea2cbff417 ("[ATM]: Move lan seq_file ops to lec.c [1/3]")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
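The rule behind the lec_seq_next fix above, that a .next callback has to advance the position index even when it has nothing more to return, can be modelled in plain Python. This is a simplified sketch, not the seq_file API: both functions and the list-based iteration are hypothetical stand-ins.

```python
def seq_next_buggy(entries, pos):
    """Return (entry, new_pos); leaves pos unchanged when nothing is left."""
    if pos + 1 < len(entries):
        return entries[pos + 1], pos + 1
    return None, pos  # position index not updated at the end


def seq_next_fixed(entries, pos):
    """Return (entry, new_pos); always advances pos, even at the end."""
    new_pos = pos + 1
    entry = entries[new_pos] if new_pos < len(entries) else None
    return entry, new_pos
```

In the buggy variant a caller that resumes from the returned position would ask for the same last element again, which is the situation the seq_file warning quoted above points at.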
Stefano Brivio [Thu, 29 Oct 2020 15:39:46 +0000 (16:39 +0100)]
netfilter: ipset: Update byte and packet counters regardless of whether they match
In ip_set_match_extensions(), for sets with counters, we take care of
updating counters themselves by calling ip_set_update_counter(), and of
checking if the given comparison and values match, by calling
ip_set_match_counter() if needed.
However, if a given comparison on counters doesn't match the configured
values, that doesn't mean the set entry itself isn't matching.
This fix restores the behaviour we had before commit 4750005a85f7
("netfilter: ipset: Fix "don't update counters" mode when counters used
at the matching"), without reintroducing the issue fixed there: back
then, mtype_data_match() first updated counters in any case, and then
took care of matching on counters.
Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
ip_set_update_counter() will anyway skip counter updates if desired.
The issue observed is illustrated by this reproducer:
ipset create c hash:ip counters
ipset add c 192.0.2.1
iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
If we now send packets from 192.0.2.1, bytes and packets counters
for the entry as shown by 'ipset list' are always zero, and, no
matter how many bytes we send, the rule will never match, because
counters themselves are not updated.
Reported-by: Mithil Mhatre <mmhatre@redhat.com>
Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
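The ordering problem described in the ipset commit above can be sketched with a toy model of the reproducer's --bytes-gt rule. This is an illustration under assumptions, not the kernel code: the `Entry` class and both functions are hypothetical, and only the byte and packet counters are modelled.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical set entry carrying byte and packet counters."""
    nbytes: int = 0
    packets: int = 0


def match_buggy(entry, pkt_len, bytes_gt):
    # Bug pattern: compare first and bail out on mismatch, so the
    # counters are never updated and the comparison can never succeed.
    if not entry.nbytes > bytes_gt:
        return False
    entry.nbytes += pkt_len
    entry.packets += 1
    return True


def match_fixed(entry, pkt_len, bytes_gt, skip_counter_update=False):
    # Fix pattern: update the counters first (unless the caller asked to
    # skip updates), then evaluate the comparison on the counters.
    if not skip_counter_update:
        entry.nbytes += pkt_len
        entry.packets += 1
    return entry.nbytes > bytes_gt
```

Feeding three 500-byte packets through the fixed variant with a threshold of 800 makes the rule match from the second packet on, while the buggy variant leaves both counters at zero and never matches.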
Marek Szyprowski [Thu, 29 Oct 2020 18:50:11 +0000 (19:50 +0100)]
net: stmmac: Fix channel lock initialization
Commit 0366f7e06a6b ("net: stmmac: add ethtool support for get/set
channels") refactored channel initialization, but during that operation,
the spinlock initialization got lost. Fix this. This fixes the following
lockdep warning:
meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 331 Comm: kworker/1:2H Not tainted 5.9.0-rc3+ #1858
Hardware name: Hardkernel ODROID-N2 (DT)
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
dump_backtrace+0x0/0x1d0
show_stack+0x14/0x20
dump_stack+0xe8/0x154
register_lock_class+0x58c/0x590
__lock_acquire+0x7c/0x1790
lock_acquire+0xf4/0x440
_raw_spin_lock_irqsave+0x80/0xb0
stmmac_tx_timer+0x4c/0xb0 [stmmac]
call_timer_fn+0xc4/0x3e8
run_timer_softirq+0x2b8/0x6c0
efi_header_end+0x114/0x5f8
irq_exit+0x104/0x110
__handle_domain_irq+0x60/0xb8
gic_handle_irq+0x58/0xb0
el1_irq+0xbc/0x180
_raw_spin_unlock_irqrestore+0x48/0x90
mmc_blk_rw_wait+0x70/0x160
mmc_blk_mq_issue_rq+0x510/0x830
mmc_mq_queue_rq+0x13c/0x278
blk_mq_dispatch_rq_list+0x2a0/0x698
__blk_mq_do_dispatch_sched+0x254/0x288
__blk_mq_sched_dispatch_requests+0x190/0x1d8
blk_mq_sched_dispatch_requests+0x34/0x70
__blk_mq_run_hw_queue+0xcc/0x148
blk_mq_run_work_fn+0x20/0x28
process_one_work+0x2a8/0x718
worker_thread+0x48/0x460
kthread+0x134/0x160
ret_from_fork+0x10/0x1c
Fixes: 0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20201029185011.4749-1-m.szyprowski@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wong Vee Khee [Thu, 29 Oct 2020 09:32:28 +0000 (17:32 +0800)]
stmmac: intel: Fix kernel panic on pci probe
The commit "stmmac: intel: Adding ref clock 1us tic for LPI cntr"
introduced a regression which leads to a kernel panic during loading
of the dwmac_intel module.
Move the code block after the pci resources are obtained.
Fixes: b4c5f83ae3f3 ("stmmac: intel: Adding ref clock 1us tic for LPI cntr")
Cc: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Link: https://lore.kernel.org/r/20201029093228.1741-1-vee.khee.wong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Claudiu Manoil [Tue, 20 Oct 2020 17:36:05 +0000 (20:36 +0300)]
gianfar: Account for Tx PTP timestamp in the skb headroom
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on the second option,
using skb_realloc_headroom() to create a new skb to accommodate
PTP frames. Turns out that this method is not reliable, as
reallocation of skbs for PTP frames along with the required
overhead (skb_set_owner_w, consume_skb) is causing random
crashes in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time on the same device
(as seen in James' report).
Note that these crashes don't occur with a single TCP stream,
nor with multiple concurrent UDP streams, but only when multiple
TCP streams are run concurrently with the PTP packet flow
(doing skb reallocation).
This patch enforces the first method, by requesting enough
headroom from the stack to accommodate PTP frames, and so avoiding
skb_realloc_headroom() & co, and the crashes no longer occur.
There's no reason not to set needed_headroom to a large enough
value to accommodate PTP frames, so in this regard this patch
is a fix.
Reported-by: James Jurack <james.jurack@ametek.com>
Fixes: bee9e58c9e98 ("gianfar:don't add FCB length to hard_header_len")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201020173605.1173-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Claudiu Manoil [Thu, 29 Oct 2020 08:10:56 +0000 (10:10 +0200)]
gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on skb_realloc_headroom()
to create new skbs to accommodate PTP frames. Turns out that
this method is not reliable in this context at least, as
skb_realloc_headroom() for PTP frames can cause random crashes,
mostly in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time with the PTP flow
on the same device (as seen in James' report). I also noticed
that when the system is loaded by sending multiple TCP streams,
the driver receives cloned skbs in large numbers.
skb_cow_head() instead proves to be stable in this scenario,
and not only handles cloned skbs too but it's also more efficient
and widely used in other drivers.
The commit introducing skb_realloc_headroom in the driver
goes back to 2009, commit 93c1285c5d92
("gianfar: reallocate skb when headroom is not enough for fcb").
For practical purposes I'm referencing a newer commit (from 2012)
that brings the code to its current structure (and fixes the PTP
case).
Fixes: 9c4886e5e63b ("gianfar: Fix invalid TX frames returned on error queue when time stamping")
Reported-by: James Jurack <james.jurack@ametek.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201029081057.8506-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Greg Ungerer [Wed, 28 Oct 2020 05:22:32 +0000 (15:22 +1000)]
net: fec: fix MDIO probing for some FEC hardware blocks
Some (apparently older) versions of the FEC hardware block do not like
the MMFR register being cleared to avoid generation of MII events at
initialization time. The action of clearing this register results in no
future MII events being generated at all on the problem block. This means
the probing of the MDIO bus will find no PHYs.
Create a quirk that can be checked at the FEC's MII init time so that
the right thing is done. The quirk is set as appropriate for the FEC
hardware blocks that are known to need this.
Fixes: f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Fugang Duan <fugand.duan@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Link: https://lore.kernel.org/r/20201028052232.1315167-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Ovechkin [Thu, 29 Oct 2020 17:10:12 +0000 (20:10 +0300)]
ip6_tunnel: set inner ipproto before ip6_tnl_encap
ip6_tnl_encap assigns to proto the transport protocol that
encapsulates the inner packet, but we must pass the protocol of
that inner packet to set_inner_ipproto.
Calling set_inner_ipproto after ip6_tnl_encap might break gso.
For example, in case of encapsulating ipv6 packet in fou6 packet, inner_ipproto
would be set to IPPROTO_UDP instead of IPPROTO_IPV6. This would lead to
incorrect calling sequence of gso functions:
ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> udp6_ufo_fragment
instead of:
ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> ip6ip6_gso_segment
Fixes: 6c11fbf97e69 ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Link: https://lore.kernel.org/r/20201029171012.20904-1-ovov@yandex-team.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso [Thu, 29 Oct 2020 12:50:03 +0000 (13:50 +0100)]
netfilter: nf_tables: missing validation from the abort path
If userspace does not include the trailing end of batch message, then
nfnetlink aborts the transaction. This allows checking that ruleset
updates trigger no errors.
After this patch, invoking this command from the prerouting chain:
# nft -c add rule x y fib saddr . oif type local
fails since oif is not supported there.
This patch fixes the lack of rule validation from the abort/check path
to catch configuration errors such as the one above.
Fixes: a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jason A. Donenfeld [Thu, 29 Oct 2020 02:56:06 +0000 (03:56 +0100)]
netfilter: use actual socket sk rather than skb sk when routing harder
If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():
/* Note: skb->sk can be different from sk, in case of tunnels */
int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.
One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truly available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.
So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jason A. Donenfeld [Thu, 29 Oct 2020 02:56:05 +0000 (03:56 +0100)]
wireguard: selftests: check that route_me_harder packets use the right sk
If netfilter changes the packet mark, the packet is rerouted. The
ip_route_me_harder family of functions fails to use the right sk, opting
to instead use skb->sk, resulting in a routing loop when used with
tunnels. With the next change fixing this issue in netfilter, test for
the relevant condition inside our test suite, since wireguard was where
the bug was discovered.
Reported-by: Chen Minqiang <ptpt52@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 22 Oct 2020 20:17:49 +0000 (22:17 +0200)]
netfilter: nftables: fix netlink report logic in flowtable and genid
The netlink report should be sent regardless of the available listeners.
Fixes: 84d7fce69388 ("netfilter: nf_tables: export rule-set generation ID")
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Johannes Berg [Tue, 13 Oct 2020 12:01:57 +0000 (14:01 +0200)]
mac80211: don't require VHT elements for HE on 2.4 GHz
After the previous similar bugfix there was another bug here:
if no VHT elements were found we also disabled HE. Fix this to
disable HE only on the 5 GHz band; on 6 GHz it was already not
disabled, and on 2.4 GHz there need (should) not be any VHT.
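The resulting rule can be sketched as a small predicate. This is an illustration only; toy_band and he_allowed() are hypothetical names and do not correspond to mac80211 code:

```c
#include <assert.h>
#include <stdbool.h>

enum toy_band { TOY_BAND_2GHZ, TOY_BAND_5GHZ, TOY_BAND_6GHZ };

/* Only 5 GHz treats missing VHT elements as a reason to disable HE. */
static bool he_allowed(enum toy_band band, bool have_vht_elems)
{
	return !(band == TOY_BAND_5GHZ && !have_vht_elems);
}

static void self_check(void)
{
	assert(he_allowed(TOY_BAND_2GHZ, false));	/* no VHT expected on 2.4 GHz */
	assert(!he_allowed(TOY_BAND_5GHZ, false));	/* VHT is required here */
	assert(he_allowed(TOY_BAND_5GHZ, true));
	assert(he_allowed(TOY_BAND_6GHZ, false));	/* was already not disabled */
}
```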
Fixes: 57fa5e85d53c ("mac80211: determine chandef from HE 6 GHz operation")
Link: https://lore.kernel.org/r/20201013140156.535a2fc6192f.Id6e5e525a60ac18d245d86f4015f1b271fce6ee6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ye Bin [Fri, 9 Oct 2020 07:02:15 +0000 (15:02 +0800)]
cfg80211: regulatory: Fix inconsistent format argument
Fix the following warning:
[net/wireless/reg.c:3619]: (warning) %d in format string (no. 2)
requires 'int' but the argument type is 'unsigned int'.
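As a generic illustration of this class of warning (not the code in net/wireless/reg.c), an unsigned int argument should be paired with %u. format_count() is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* %u matches 'unsigned int'; %d would expect a signed 'int'. */
static int format_count(char *buf, size_t len, unsigned int count)
{
	return snprintf(buf, len, "count=%u", count);
}

static void self_check(void)
{
	char buf[32];

	/* A value above INT_MAX is where %d and %u would visibly differ. */
	format_count(buf, sizeof(buf), 3000000000u);
	assert(strcmp(buf, "count=3000000000") == 0);
}
```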
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20201009070215.63695-1-yebin10@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mauro Carvalho Chehab [Fri, 23 Oct 2020 16:33:08 +0000 (18:33 +0200)]
mac80211: fix kernel-doc markups
Some identifiers have different names between their prototypes
and the kernel-doc markup.
Others need to be fixed, as kernel-doc markups should use this format:
identifier - description
In the specific case of __sta_info_flush(), add a documentation
for sta_info_flush(), as this one is the one used outside
sta_info.c.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/978d35eef2dc76e21c81931804e4eaefbd6d635e.1603469755.git.mchehab+huawei@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 9 Oct 2020 12:17:11 +0000 (14:17 +0200)]
mac80211: always wind down STA state
When (for example) an IBSS station is pre-moved to AUTHORIZED
before it's inserted, and then the insertion fails, we don't
clean up the fast RX/TX states that might already have been
created, since we don't go through all the state transitions
again on the way down.
Do that, if it hasn't been done already, when the station is
freed. I considered only freeing the fast TX/RX state there,
but we might add more state so it's more robust to wind down
the state properly.
Note that we warn if the station was ever inserted; it should
have been properly cleaned up in that case, and the driver
will probably not like things happening out of order.
Reported-by: syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 9 Oct 2020 11:58:22 +0000 (13:58 +0200)]
cfg80211: initialize wdev data earlier
There's a race condition in the netdev registration in that
NETDEV_REGISTER actually happens after the netdev is available,
and so if we initialize things only there, we might get called
with an uninitialized wdev through nl80211 - not using a wdev
but using a netdev interface index.
I found this while looking into a syzbot report, but it doesn't
really seem to be related, and unfortunately there's no repro
for it (yet). I can't (yet) explain how it managed to get into
cfg80211_release_pmsr() from nl80211_netlink_notify() without
the wdev having been initialized, as the latter only iterates
the wdevs that are linked into the rdev, which even without the
change here happened after init.
However, looking at this, it seems fairly clear that the init
needs to be done earlier, otherwise we might even re-init on a
netns move, when data might still be pending.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009135821.fdcbba3aad65.Ie9201d91dbcb7da32318812effdc1561aeaf4cdc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 9 Oct 2020 11:25:41 +0000 (13:25 +0200)]
mac80211: fix use of skb payload instead of header
When ieee80211_skb_resize() is called from ieee80211_build_hdr()
the skb has no 802.11 header yet, in fact it consists only of the
payload as the ethernet frame is removed. As such, we're using
the payload data for ieee80211_is_mgmt(), which is of course
completely wrong. This didn't really hurt us because these are
always data frames, so we could only have added more tailroom
than we needed if we determined it was a management frame and
sdata->crypto_tx_tailroom_needed_cnt was false.
However, syzbot found that of course there need not be any payload,
so we're using at best uninitialized memory for the check.
Fix this to pass explicitly the kind of frame that we have instead
of checking there, by replacing the "bool may_encrypt" argument
with an argument that can carry the three possible states - it's
not going to be encrypted, it's a management frame, or it's a data
frame (and then we check sdata->crypto_tx_tailroom_needed_cnt).
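The idea of replacing a boolean with an explicit three-state argument can be sketched as follows. The enum, the function and its parameters are hypothetical names for this illustration, not the mac80211 definitions:

```c
#include <assert.h>

/* The caller states what kind of frame it has; the callee never peeks at data. */
enum toy_encrypt { TOY_ENCRYPT_NO, TOY_ENCRYPT_MGMT, TOY_ENCRYPT_DATA };

static int tailroom_needed(enum toy_encrypt enc, int data_tailroom_users,
			   int encrypt_tailroom)
{
	switch (enc) {
	case TOY_ENCRYPT_NO:
		return 0;
	case TOY_ENCRYPT_MGMT:
		return encrypt_tailroom;
	case TOY_ENCRYPT_DATA:
		/* Data frames need tailroom only while some key requires it. */
		return data_tailroom_users ? encrypt_tailroom : 0;
	}
	return 0;
}

static void self_check(void)
{
	assert(tailroom_needed(TOY_ENCRYPT_NO, 1, 16) == 0);
	assert(tailroom_needed(TOY_ENCRYPT_MGMT, 0, 16) == 16);
	assert(tailroom_needed(TOY_ENCRYPT_DATA, 0, 16) == 0);
	assert(tailroom_needed(TOY_ENCRYPT_DATA, 2, 16) == 16);
}
```

Because the frame kind is passed in, an empty or not-yet-built header can no longer influence the decision.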
Reported-by: syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mathy Vanhoef [Mon, 19 Oct 2020 16:01:13 +0000 (20:01 +0400)]
mac80211: fix regression where EAPOL frames were sent in plaintext
When sending EAPOL frames via NL80211 they are treated as injected
frames in mac80211. Due to commit 1df2bdba528b ("mac80211: never drop
injected frames even if normally not allowed") these injected frames
were not assigned a sta context in the function ieee80211_tx_dequeue,
causing certain wireless network cards to always send EAPOL frames in
plaintext. This may cause compatibility issues with some clients or
APs, which for instance can cause the group key handshake to fail and
in turn would cause the station to get disconnected.
This commit fixes this regression by assigning a sta context in
ieee80211_tx_dequeue to injected frames as well.
Note that sending EAPOL frames in plaintext is not a security issue
since they contain their own encryption and authentication protection.
Cc: stable@vger.kernel.org
Fixes: 1df2bdba528b ("mac80211: never drop injected frames even if normally not allowed")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Tested-by: Christian Hesse <list@eworm.de>
Tested-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201019160113.350912-1-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Thu, 29 Oct 2020 20:02:52 +0000 (13:02 -0700)]
Merge tag 'fallthrough-fixes-clang-5.10-rc2' of git://git./linux/kernel/git/gustavoars/linux
Pull fallthrough fix from Gustavo A. R. Silva:
"This fixes a ton of fall-through warnings when building with Clang
12.0.0 and -Wimplicit-fallthrough"
* tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
include: jhash/signal: Fix fall-through warnings for Clang
Linus Torvalds [Thu, 29 Oct 2020 19:55:02 +0000 (12:55 -0700)]
Merge tag 'net-5.10-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff from an
IRQ handler on RT by forcing hard irqs, but this driver shares
legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix to
tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even if there
are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver and
device, after removal of 56G support from the driver 56G was not
cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since it
was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress false
positive audit messages"
* tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
r8169: fix issue with forced threading in combination with shared interrupts
netem: fix zero division in tabledist
ibmvnic: fix ibmvnic_set_mac
mptcp: add missing memory scheduling in the rx path
tipc: fix memory leak caused by tipc_buf_append()
gtp: fix an use-before-init in gtp_newlink()
net: protect tcf_block_unbind with block lock
ibmveth: Fix use of ibmveth in a bridge.
net/sched: act_mpls: Add softdep on mpls_gso.ko
ravb: Fix bit fields checking in ravb_hwtstamp_get()
devlink: Unlock on error in dumpit()
devlink: Fix some error codes
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix deadlock issue
net: hns3: Clear the CMDQ registers before unmapping BAR region
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_en: Re-write PCI BARs after PCI fatal error.
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
...
Linus Torvalds [Thu, 29 Oct 2020 18:50:59 +0000 (11:50 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"The good news is people are testing rc1 in the RDMA world - the bad
news is testing of the for-next area is not as good as I had hoped, as
we really should have caught at least the rdma_connect_locked() issue
before now.
Notable merge window regressions that didn't get caught/fixed in time
for rc1:
- Fix in kernel users of rxe, they were broken by the rapid fix to
undo the uABI breakage in rxe from another patch
- EFA userspace needs to read the GID table but was broken with the
new GID table logic
- Fix user triggerable deadlock in mlx5 using devlink reload
- Fix deadlock in several ULPs using rdma_connect from the CM handler
callbacks
- Memory leak in qedr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/qedr: Fix memory leak in iWARP CM
RDMA: Add rdma_connect_locked()
RDMA/uverbs: Fix false error in query gid IOCTL
RDMA/mlx5: Fix devlink deadlock on net namespace deletion
RDMA/rxe: Fix small problem in network_type patch
Heiner Kallweit [Thu, 29 Oct 2020 09:18:53 +0000 (10:18 +0100)]
r8169: fix issue with forced threading in combination with shared interrupts
As reported by Serge, the flag IRQF_NO_THREAD causes an error if the
interrupt is actually shared and the other driver(s) don't have this
flag set. This situation can occur if a PCI(e) legacy interrupt is
used in combination with forced threading.
There's no good way to deal with this properly, therefore we have to
remove flag IRQF_NO_THREAD. For fixing the original forced threading
issue switch to napi_schedule().
Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading")
Link: https://www.spinics.net/lists/netdev/msg694960.html
Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aleksandr Nogikh [Wed, 28 Oct 2020 17:07:31 +0000 (17:07 +0000)]
netem: fix zero division in tabledist
Currently it is possible to craft a special netlink RTM_NEWQDISC
command that can result in jitter being equal to 0x80000000. It is
enough to set the 32 bit jitter to 0x02000000 (it will later be
multiplied by 2^6) or just set the 64 bit jitter via
TCA_NETEM_JITTER64. This causes an overflow during the generation of
uniformly distributed numbers in tabledist(), which in turn leads to
division by zero (sigma != 0, but sigma * 2 is 0).
The related fragment of code needs 32-bit division - see commit
9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to
64 bit is not an option.
Fix the issue by keeping the value of jitter within the range that can
be adequately handled by tabledist() - [0;INT_MAX]. As negative std
deviation makes no sense, take the absolute value of the passed value
and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit
arithmetic in order to prevent overflows.
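The arithmetic behind the overflow, and the clamping described above, can be checked with a small userspace sketch. The helper names are hypothetical and this is not the sch_netem code; it only reproduces the 32-bit wraparound and the [0;INT_MAX] clamp:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* 2 * sigma in 32-bit arithmetic: wraps modulo 2^32. */
static uint32_t wrapped_divisor(uint32_t sigma)
{
	return 2 * sigma;
}

/* Fold a signed 64-bit jitter into [0, INT_MAX]. */
static uint32_t clamp_jitter(int64_t jitter)
{
	/* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
	uint64_t mag = jitter < 0 ? 0 - (uint64_t)jitter : (uint64_t)jitter;

	return mag > (uint64_t)INT_MAX ? (uint32_t)INT_MAX : (uint32_t)mag;
}

/* With a clamped sigma, the doubled value fits in 32 bits. */
static uint32_t safe_divisor(uint32_t sigma)
{
	assert(sigma <= (uint32_t)INT_MAX);
	return 2 * sigma;
}

static void self_check(void)
{
	assert(((uint32_t)0x02000000u << 6) == 0x80000000u);	/* 2^25 * 2^6 */
	assert(wrapped_divisor(0x80000000u) == 0);		/* the zero divisor */
	assert(clamp_jitter(INT64_C(0x80000000)) == (uint32_t)INT_MAX);
	assert(safe_divisor(clamp_jitter(INT64_C(0x80000000))) == 0xFFFFFFFEu);
	assert(clamp_jitter(-5) == 5);
}
```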
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Tue, 27 Oct 2020 22:04:56 +0000 (17:04 -0500)]
ibmvnic: fix ibmvnic_set_mac
Jakub Kicinski brought up a concern in ibmvnic_set_mac().
ibmvnic_set_mac() does this:
	ether_addr_copy(adapter->mac_addr, addr->sa_data);
	if (adapter->state != VNIC_PROBED)
		rc = __ibmvnic_set_mac(netdev, addr->sa_data);
So if state == VNIC_PROBED, the user can assign an invalid address to
adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
The fix is to validate ethernet address at the beginning of
ibmvnic_set_mac(), and move the ether_addr_copy to
the case of "adapter->state != VNIC_PROBED".
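The reordering can be sketched in userspace as below. This is a simplified model, not the driver code: toy_adapter, valid_ether_addr() and set_mac() are hypothetical, the validity check approximates the kernel's is_valid_ether_addr(), and the -EADDRNOTAVAIL return value is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

enum toy_state { TOY_PROBED, TOY_OPEN };

struct toy_adapter {
	enum toy_state state;
	uint8_t mac_addr[TOY_ETH_ALEN];
};

/* Approximation: reject multicast and all-zero addresses. */
static bool valid_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[TOY_ETH_ALEN];
	bool multicast = a[0] & 0x01;

	return !multicast && memcmp(a, zero, TOY_ETH_ALEN) != 0;
}

static int set_mac(struct toy_adapter *ad, const uint8_t *addr)
{
	assert(ad && addr);

	/* Validate first, so an invalid address never returns 0. */
	if (!valid_ether_addr(addr))
		return -EADDRNOTAVAIL;

	/* Copy only in the state where the address is actually applied. */
	if (ad->state != TOY_PROBED)
		memcpy(ad->mac_addr, addr, TOY_ETH_ALEN);
	return 0;
}

static void self_check(void)
{
	struct toy_adapter ad = { .state = TOY_PROBED };
	const uint8_t bad[TOY_ETH_ALEN] = { 0 };
	const uint8_t good[TOY_ETH_ALEN] = { 0x02, 0, 0, 0, 0, 1 };

	assert(set_mac(&ad, bad) == -EADDRNOTAVAIL);	/* rejected even when probed */
	assert(set_mac(&ad, good) == 0);
	assert(memcmp(ad.mac_addr, bad, TOY_ETH_ALEN) == 0);	/* untouched while probed */
}
```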
Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 27 Oct 2020 14:59:14 +0000 (15:59 +0100)]
mptcp: add missing memory scheduling in the rx path
When moving the skbs from the subflow into the msk receive
queue, we must schedule there the required amount of memory.
Try to borrow the required memory from the subflow, if needed,
so that we leverage the existing TCP heuristic.
Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/f6143a6193a083574f11b00dbf7b5ad151bc4ff4.1603810630.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gustavo A. R. Silva [Thu, 3 Sep 2020 04:25:55 +0000 (23:25 -0500)]
include: jhash/signal: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, explicitly
add break statements instead of letting the code fall through to the
next case.
This patch adds four break statements that, together, fix almost 40,000
warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change
reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang,
such change[1] is meant to be reverted at some point. So, this patch helps
to move in that direction.
Something important to mention is that there is currently a discrepancy
between GCC and Clang when dealing with switch fall-through to empty case
statements or to cases that only contain a break/continue/return
statement[2][3][4].
Now that the -Wimplicit-fallthrough option has been globally enabled[5],
any compiler should really warn on missing either a fallthrough annotation
or any of the other case-terminating statements (break/continue/return/
goto) when falling through to the next case statement. Making exceptions
to this introduces variation in case handling which may continue to lead
to bugs, misunderstandings, and a general lack of robustness. The point
of enabling options like -Wimplicit-fallthrough is to prevent human error
and aid developers in spotting bugs before their code is even built/
submitted/committed, therefore eliminating classes of bugs. So, in order
to really accomplish this, we should, and can, move in the direction of
addressing any error-prone scenarios and get rid of the unintentional
fallthrough bug-class in the kernel, entirely, even if there is some minor
redundancy. Better to have explicit case-ending statements than continue to
have exceptions where one must guess as to the right result. The compiler
will eliminate any actual redundancy.
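A small generic illustration of the pattern, assuming a made-up enum and function (this is not the jhash or signal code touched by the patch): a case that would otherwise fall through to a case containing only a break gets its own explicit break.

```c
#include <assert.h>

enum toy_kind { TOY_KIND_A, TOY_KIND_B, TOY_KIND_C };

static int weight(enum toy_kind k)
{
	int w = 0;

	switch (k) {
	case TOY_KIND_A:
		w = 1;
		break;	/* explicit, instead of falling through to TOY_KIND_B */
	case TOY_KIND_B:
		break;
	case TOY_KIND_C:
		w = 3;
		break;
	}
	return w;
}

static void self_check(void)
{
	assert(weight(TOY_KIND_A) == 1);
	assert(weight(TOY_KIND_B) == 0);
	assert(weight(TOY_KIND_C) == 3);
}
```

The behaviour is the same with or without the added break; the point is that the intent is now stated in the code.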
[1] commit e2079e93f562c ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit a035d552a93b ("Makefile: Globally enable fall-through warning")
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Linus Torvalds [Thu, 29 Oct 2020 17:13:09 +0000 (10:13 -0700)]
Merge tag 'afs-fixes-20201029' of git://git./linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Fix copy_file_range() to an afs file now returning EINVAL if the
splice_write file op isn't supplied.
- Fix a deref-before-check in afs_unuse_cell().
- Fix a use-after-free in afs_xattr_get_acl().
- Fix afs to not try to clear PG_writeback when laundering a page.
- Fix afs to take a ref on a page that it sets PG_private on and to
drop that ref when clearing PG_private. This is done through recently
added helpers.
- Fix a page leak if write_begin() fails.
- Fix afs_write_begin() to not alter the dirty region info stored in
page->private, but rather do this in afs_write_end() instead when we
know what we actually changed.
- Fix afs_invalidatepage() to alter the dirty region info on a page
when partial page invalidation occurs so that we don't inadvertently
include a span of zeros that will get written back if a page gets
laundered due to a remote 3rd-party induced invalidation.
We mustn't, however, reduce the dirty region if the page has been
seen to be mapped (ie. we got called through the page_mkwrite vector)
as the page might still be mapped and we might lose data if the file
is extended again.
- Fix the dirty region info to have a lower resolution if the size of
the page is too large for this to be encoded (e.g. powerpc32 with 64K
pages).
Note that this might not be the ideal way to handle this, since it
may allow some leakage of undirtied zero bytes to the server's copy
in the case of a 3rd-party conflict.
To aid the last two fixes, two additional changes:
- Wrap the manipulations of the dirty region info stored in
page->private into helper functions.
- Alter the encoding of the dirty region so that the region bounds can
be stored with one fewer bit, making a bit available for the
indication of mappedness.
* tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix dirty-region encoding on ppc32 with 64K pages
afs: Fix afs_invalidatepage to adjust the dirty region
afs: Alter dirty range encoding in page->private
afs: Wrap page->private manipulations in inline functions
afs: Fix where page->private is set during write
afs: Fix page leak on afs_write_begin() failure
afs: Fix to take ref on page when PG_private is set
afs: Fix afs_launder_page to not clear PG_writeback
afs: Fix a use after free in afs_xattr_get_acl()
afs: Fix tracing deref-before-check
afs: Fix copy_file_range()
Tung Nguyen [Tue, 27 Oct 2020 03:24:03 +0000 (10:24 +0700)]
tipc: fix memory leak caused by tipc_buf_append()
Commit ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()")
replaced skb_unshare() with skb_copy() to not reduce the data reference
counter of the original skb intentionally. This is not the correct
way to handle the cloned skb because it causes a memory leak in the
following two cases:
1/ Sending multicast messages via broadcast link
The original skb list is cloned to the local skb list for local
destination. After that, the data reference counter of each skb
in the original list has the value of 2. This causes each skb not
to be freed after receiving ACK:
tipc_link_advance_transmq()
{
	...
	/* release skb */
	__skb_unlink(skb, &l->transmq);
	kfree_skb(skb); <-- memory exists after being freed
}
2/ Sending multicast messages via replicast link
Similar to the above case, each skb cannot be freed after purging
the skb list:
tipc_mcast_xmit()
{
	...
	__skb_queue_purge(pkts); <-- memory exists after being freed
}
This commit fixes this issue by using skb_unshare() instead. Besides,
to avoid use-after-free error reported by KASAN, the pointer to the
fragment is set to NULL before calling skb_unshare() to make sure that
the original skb is not freed after freeing the fragment 2 times in
case skb_unshare() returns NULL.
Fixes: ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Reported-by: Thang Hoang Ngo <thang.h.ngo@dektech.com.au>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Masahiro Fujiwara [Tue, 27 Oct 2020 11:48:46 +0000 (20:48 +0900)]
gtp: fix an use-before-init in gtp_newlink()
*_pdp_find() from gtp_encap_recv() would trigger a crash when a peer
sends GTP packets while a new GTP device is being created.
RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp]
<SNIP>
Call Trace:
<IRQ>
gtp_encap_recv+0xc2/0x2e0 [gtp]
? gtp1_pdp_find.isra.0+0x90/0x90 [gtp]
udp_queue_rcv_one_skb+0x1fe/0x530
udp_queue_rcv_skb+0x40/0x1b0
udp_unicast_rcv_skb.isra.0+0x78/0x90
__udp4_lib_rcv+0x5af/0xc70
udp_rcv+0x1a/0x20
ip_protocol_deliver_rcu+0xc5/0x1b0
ip_local_deliver_finish+0x48/0x50
ip_local_deliver+0xe5/0xf0
? ip_protocol_deliver_rcu+0x1b0/0x1b0
gtp_encap_enable() should be called after gtp_hashtable_new(), otherwise
*_pdp_find() will access the uninitialized hash table.
Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Masahiro Fujiwara <fujiwara.masahiro@gmail.com>
Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 29 Oct 2020 16:36:11 +0000 (09:36 -0700)]
Merge tag 'ext4_for_linus_fixes' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Bug fixes for the new ext4 fast commit feature, plus a fix for the
'data=journal' bug fix.
Also use the generic casefolding support which has now landed in
fs/libfs.c for 5.10"
* tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
ext4: use generic casefolding support
ext4: do not use extent after put_bh
ext4: use IS_ERR() for error checking of path
ext4: fix mmap write protection for data=journal mode
jbd2: fix a kernel-doc markup
ext4: use s_mount_flags instead of s_mount_state for fast commit state
ext4: make num of fast commit blocks configurable
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4: fix double locking in ext4_fc_commit_dentry_updates()
David Howells [Wed, 28 Oct 2020 12:08:39 +0000 (12:08 +0000)]
afs: Fix dirty-region encoding on ppc32 with 64K pages
The dirty region bounds stored in page->private on an afs page are 15 bits
on a 32-bit box and can, at most, represent a range of up to 32K within a
32K page with a resolution of 1 byte. This is a problem for powerpc32 with
64K pages enabled.
Further, transparent huge pages may get up to 2M, which will be a problem
for the afs filesystem on all 32-bit arches in the future.
Fix this by decreasing the resolution. For the moment, a 64K page will
have a resolution determined from PAGE_SIZE. In the future, the page will
need to be passed in to the helper functions so that the page size can be
assessed and the resolution determined dynamically.
Note that this might not be the ideal way to handle this, since it may
allow some leakage of undirtied zero bytes to the server's copy in the case
of a 3rd-party conflict. Fixing that would require a separately allocated
record and is a more complicated fix.
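A rough sketch of what a reduced-resolution encoding could look like, assuming 15 bits per bound packed into a 32-bit word and a resolution derived from the page shift. The names and constants are hypothetical and the sketch is not the fs/afs implementation; it is meant to show how the decoded region can widen, which is the leakage mentioned above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRIV_MASK	0x7fffu	/* 15 bits per bound */
#define TOY_PRIV_SHIFT	16

/* Number of low bits dropped so that a bound fits into 15 bits. */
static unsigned int dirty_resolution(unsigned int page_shift)
{
	int shift = (int)page_shift - (TOY_PRIV_SHIFT - 1);

	return shift > 0 ? (unsigned int)shift : 0;
}

/* Encode [from, to); the upper bound is stored as (to - 1). */
static uint32_t dirty_encode(size_t from, size_t to, unsigned int page_shift)
{
	unsigned int res = dirty_resolution(page_shift);

	assert(from < to && to <= ((size_t)1 << page_shift));
	return (uint32_t)(((to - 1) >> res) << TOY_PRIV_SHIFT) |
	       (uint32_t)(from >> res);
}

static size_t dirty_from(uint32_t priv, unsigned int page_shift)
{
	return (size_t)(priv & TOY_PRIV_MASK) << dirty_resolution(page_shift);
}

static size_t dirty_to(uint32_t priv, unsigned int page_shift)
{
	size_t x = (priv >> TOY_PRIV_SHIFT) & TOY_PRIV_MASK;

	return (x + 1) << dirty_resolution(page_shift);
}

static void self_check(void)
{
	/* 4K pages: byte resolution, the range round-trips exactly. */
	uint32_t p4k = dirty_encode(3, 5, 12);
	assert(dirty_from(p4k, 12) == 3 && dirty_to(p4k, 12) == 5);

	/* 64K pages: 2-byte resolution, [3, 5) widens to [2, 6). */
	uint32_t p64k = dirty_encode(3, 5, 16);
	assert(dirty_from(p64k, 16) == 2 && dirty_to(p64k, 16) == 6);
}
```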
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
David Howells [Thu, 22 Oct 2020 13:08:23 +0000 (14:08 +0100)]
afs: Fix afs_invalidatepage to adjust the dirty region
Fix afs_invalidatepage() to adjust the dirty region recorded in
page->private when truncating a page. If the dirty region is entirely
removed, then the private data is cleared and the page dirty state is
cleared.
Without this, if the page is truncated and then expanded again by truncate,
zeros from the expanded, but no-longer dirty region may get written back to
the server if the page gets laundered due to a conflicting 3rd-party write.
It mustn't, however, shorten the dirty region of the page if that page is
still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
stored in page->private to record this.
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 13:57:44 +0000 (13:57 +0000)]
afs: Alter dirty range encoding in page->private
Currently, page->private on an afs page is used to store the range of
dirtied data within the page, where the range includes the lower bound, but
excludes the upper bound (e.g. 0-1 is a range covering a single byte).
This, however, requires a superfluous bit for the last-byte bound so that
on a 4KiB page, it can say 0-4096 to indicate the whole page, the idea
being that having both numbers the same would indicate an empty range.
This is unnecessary as the PG_private bit is clear if it's an empty range
(as is PG_dirty).
Alter the way the dirty range is encoded in page->private such that the
upper bound is reduced by 1 (e.g. 0-0 then specifies the same single-byte
range mentioned above).
Applying this to both bounds frees up two bits, one of which can be used in
a future commit.
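The saving per bound can be checked with simple arithmetic. On a 4KiB page an exclusive upper bound can be 4096, which needs 13 bits, while the largest inclusive bound is 4095, which needs 12. bits_needed() is a hypothetical helper for this check:

```c
#include <assert.h>

/* Number of bits required to represent max_value. */
static unsigned int bits_needed(unsigned long max_value)
{
	unsigned int bits = 0;

	while (max_value) {
		bits++;
		max_value >>= 1;
	}
	return bits;
}

static void self_check(void)
{
	assert(bits_needed(4096) == 13);	/* exclusive bound on a 4KiB page */
	assert(bits_needed(4095) == 12);	/* inclusive bound on a 4KiB page */
}
```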
This allows the afs filesystem to be compiled on ppc32 with 64K pages;
without this, the following warnings are seen:
../fs/afs/internal.h: In function 'afs_page_dirty_to':
../fs/afs/internal.h:881:15: warning: right shift count >= width of type [-Wshift-count-overflow]
881 | return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
| ^~
../fs/afs/internal.h: In function 'afs_page_dirty':
../fs/afs/internal.h:886:28: warning: left shift count >= width of type [-Wshift-count-overflow]
886 | return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
| ^~
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 13:22:47 +0000 (13:22 +0000)]
afs: Wrap page->private manipulations in inline functions
The afs filesystem uses page->private to store the dirty range within a
page such that in the event of a conflicting 3rd-party write to the server,
we write back just the bits that got changed locally.
However, there are a couple of problems with this:
(1) I need a bit to note if the page might be mapped so that partial
invalidation doesn't shrink the range.
(2) There aren't necessarily sufficient bits to store the entire range of
data altered (say it's a 32-bit system with 64KiB pages or transparent
huge pages are in use).
So wrap the accesses in inline functions so that future commits can change
how this works.
Also move them out of the tracing header into the in-directory header.
There's not really any need for them to be in the tracing header.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 26 Oct 2020 14:05:33 +0000 (14:05 +0000)]
afs: Fix where page->private is set during write
In afs, page->private is set to indicate the dirty region of a page. This
is done in afs_write_begin(), but that can't take account of whether the
copy into the page actually worked.
Fix this by moving the change of page->private into afs_write_end().
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 22 Oct 2020 13:03:03 +0000 (14:03 +0100)]
afs: Fix page leak on afs_write_begin() failure
Fix the leak of the target page in afs_write_begin() when it fails.
Fixes: 15b4650e55e0 ("afs: convert to new aops")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Nick Piggin <npiggin@gmail.com>
David Howells [Wed, 21 Oct 2020 12:22:19 +0000 (13:22 +0100)]
afs: Fix to take ref on page when PG_private is set
Fix afs to take a ref on a page when it sets PG_private on it and to drop
the ref when removing the flag.
Note that in afs_write_begin(), a lot of the time, PG_private is already
set on a page to which we're going to add some data. In such a case, we
leave the bit set and mustn't increment the page count.
As suggested by Matthew Wilcox, use attach/detach_page_private() where
possible.
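The reference-counting rule can be modelled with a toy page structure. This is a userspace sketch with hypothetical names (toy_page and the toy_* helpers); it imitates the pairing of "set the flag, take a ref" and "clear the flag, drop the ref", and is not the kernel helpers themselves:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	int refcount;
	bool has_private;
	unsigned long private_data;
};

/* Setting the private flag takes a reference. */
static void toy_attach_private(struct toy_page *p, unsigned long data)
{
	assert(!p->has_private);
	p->refcount++;
	p->private_data = data;
	p->has_private = true;
}

/* Clearing the private flag drops that reference again. */
static unsigned long toy_detach_private(struct toy_page *p)
{
	unsigned long data = p->private_data;

	if (!p->has_private)
		return 0;
	p->has_private = false;
	p->private_data = 0;
	p->refcount--;
	return data;
}

/* If the flag is already set, update the data without another ref. */
static void toy_set_dirty_region(struct toy_page *p, unsigned long data)
{
	if (p->has_private)
		p->private_data = data;
	else
		toy_attach_private(p, data);
}

static void self_check(void)
{
	struct toy_page page = { .refcount = 1 };

	toy_set_dirty_region(&page, 0x10);
	assert(page.refcount == 2);
	toy_set_dirty_region(&page, 0x20);	/* flag already set */
	assert(page.refcount == 2);
	assert(toy_detach_private(&page) == 0x20);
	assert(page.refcount == 1);
}
```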
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Linus Torvalds [Wed, 28 Oct 2020 19:05:14 +0000 (12:05 -0700)]
Merge tag 'trace-v5.10-rc1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix synthetic event "strcat" overrun
New synthetic event code used strcat() and miscalculated the ending,
causing the concatenation to write beyond the allocated memory.
Instead of using strncat(), the code is switched over to seq_buf which
has all the mechanisms in place to protect against writing more than
what is allocated, and cleans up the code a bit"
* tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing, synthetic events: Replace buggy strcat() with seq_buf operations
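To illustrate why a length-tracking buffer avoids this kind of overrun, here is a minimal userspace sketch. It is not the kernel's seq_buf API; bounded_buf, bb_init() and bb_puts() are hypothetical, and truncating on overflow is a design choice of this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bounded_buf {
	char *data;
	size_t size;	/* capacity, including the terminating NUL */
	size_t len;	/* bytes used, excluding the NUL */
	int overflowed;
};

static void bb_init(struct bounded_buf *b, char *storage, size_t size)
{
	assert(size > 0);
	b->data = storage;
	b->size = size;
	b->len = 0;
	b->overflowed = 0;
	storage[0] = '\0';
}

/* Append s, never writing past the storage; record truncation. */
static void bb_puts(struct bounded_buf *b, const char *s)
{
	size_t n = strlen(s);
	size_t room = b->size - 1 - b->len;

	if (n > room) {
		n = room;
		b->overflowed = 1;
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}

static void self_check(void)
{
	char storage[8];
	struct bounded_buf b;

	bb_init(&b, storage, sizeof(storage));
	bb_puts(&b, "u64 ");
	bb_puts(&b, "field;");	/* only 3 of these 6 bytes fit */
	assert(strcmp(storage, "u64 fie") == 0);
	assert(b.len == 7 && b.overflowed);
}
```

Unlike a bare strcat(), every append is checked against the remaining room, so a miscalculated total length shows up as a flagged truncation rather than a write beyond the allocation.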
Theodore Ts'o [Wed, 28 Oct 2020 17:39:13 +0000 (13:39 -0400)]
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Daniel Rosenberg [Wed, 28 Oct 2020 05:08:20 +0000 (05:08 +0000)]
ext4: use generic casefolding support
This switches ext4 over to the generic support provided in libfs.
Since casefolded dentries behave the same in ext4 and f2fs, we decrease
the maintenance burden by unifying them, and any optimizations will
immediately apply to both.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20201028050820.1636571-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
yangerkun [Wed, 28 Oct 2020 05:56:17 +0000 (13:56 +0800)]
ext4: do not use extent after put_bh
ext4_ext_search_right() will read more extent blocks and call put_bh
after we get the information we need. However, ret_ex still points into
the released buffer, which may cause a use-after-free once the page cache
has been freed. Fix it by copying the extent structure if needed.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20201028055617.2569255-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
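The copy-before-release pattern behind this fix can be shown with a small userspace model. All names here (struct model_extent, struct model_buffer, model_search_right()) are hypothetical and only imitate the shape of the problem; this is not the ext4 code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified extent record. */
struct model_extent {
	unsigned int first_block;
	unsigned short len;
};

/* Hypothetical reference-counted buffer holding extent records. */
struct model_buffer {
	struct model_extent *extents;
	int refcount;
};

/* Drop a reference; the backing memory goes away with the last one. */
static void model_put_buffer(struct model_buffer *bh)
{
	assert(bh->refcount > 0);
	if (--bh->refcount == 0) {
		free(bh->extents);
		bh->extents = NULL;
	}
}

/*
 * Return the first extent by value. The copy is made while the buffer
 * is still referenced, so the caller never holds a pointer into memory
 * that the put below may free.
 */
static int model_search_right(struct model_buffer *bh,
			      struct model_extent *ret_ex)
{
	if (!bh->extents)
		return -1;
	*ret_ex = bh->extents[0];
	model_put_buffer(bh);
	return 0;
}
```

Returning a pointer to bh->extents[0] instead of a copy would reproduce the bug in the model: the pointer would dangle as soon as the last reference was dropped.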
Harshad Shirwadkar [Tue, 27 Oct 2020 20:43:42 +0000 (13:43 -0700)]
ext4: use IS_ERR() for error checking of path
With this fix, fast commit recovery code uses IS_ERR() for path
returned by ext4_find_extent.
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027204342.2794949-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
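For readers unfamiliar with the error-pointer convention, the following userspace sketch shows why a NULL check is not enough when a function encodes errors in the returned pointer. The model_* helpers are hypothetical re-creations written for this example, and model_find_extent() is an invented stand-in, not ext4_find_extent() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top MODEL_MAX_ERRNO addresses. */
static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_path {
	int depth;
};

/* Hypothetical lookup: returns an error pointer, never NULL, on failure. */
static struct model_path *model_find_extent(struct model_path *storage,
					    int fail)
{
	assert(storage != NULL);
	if (fail)
		return model_err_ptr(-ENOMEM);
	storage->depth = 1;
	return storage;
}

/* A plain "if (!path)" check here would miss the error pointer. */
static int model_replay(struct model_path *storage, int fail)
{
	struct model_path *path = model_find_extent(storage, fail);

	if (model_is_err(path))
		return (int)model_ptr_err(path);
	return path->depth;
}
```

In the failing case model_replay() returns -ENOMEM instead of dereferencing an invalid pointer, which is the behavior the IS_ERR() check is meant to guarantee.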
Jan Kara [Tue, 27 Oct 2020 13:27:51 +0000 (14:27 +0100)]
ext4: fix mmap write protection for data=journal mode
Commit afb585a97f81 ("ext4: data=journal: write-protect pages on
j_submit_inode_data_buffers()") added calls to ext4_jbd2_inode_add_write()
to track inode ranges whose mappings need to get write-protected during
transaction commits. However, the added calls use the wrong start of the
range (0 instead of the page offset), so write protection is not
necessarily effective. Use the correct range start to fix the problem.
Fixes: afb585a97f81 ("ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201027132751.29858-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Mauro Carvalho Chehab [Tue, 27 Oct 2020 09:51:27 +0000 (10:51 +0100)]
jbd2: fix a kernel-doc markup
The kernel-doc markup that documents j_fc_replay_callback is
missing an asterisk, causing this warning when building the docs:
../include/linux/jbd2.h:1271: warning: Function parameter or member 'j_fc_replay_callback' not described in 'journal_s'
Fixes: 609f928af48f ("jbd2: fast commit recovery path")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6055927ada2015b55b413cdd2670533bdc9a8da2.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Harshad Shirwadkar [Tue, 27 Oct 2020 04:49:15 +0000 (21:49 -0700)]
ext4: use s_mount_flags instead of s_mount_state for fast commit state
Ext4's fast commit related transient states should use
sb->s_mount_flags instead of persistent sb->s_mount_state.
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Harshad Shirwadkar [Tue, 27 Oct 2020 04:49:14 +0000 (21:49 -0700)]
ext4: make num of fast commit blocks configurable
This patch reserves a field in the jbd2 superblock for number of fast
commit blocks. When this value is non-zero, Ext4 uses this field to
set the number of fast commit blocks.
Fixes: 6866d7b3f2bb ("ext4/jbd2: add fast commit initialization")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-2-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andrea Righi [Tue, 27 Oct 2020 04:49:13 +0000 (21:49 -0700)]
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4_inode_datasync_dirty() needs to return 'true' if the inode is
dirty, 'false' otherwise, but the logic seems to be incorrectly changed
by commit aa75f4d3daae ("ext4: main fast-commit commit path").
This introduces a problem with swap files that are always failing to be
activated, showing this error in dmesg:
[ 34.406479] swapon: file is not committed
Simple test case to reproduce the problem:
# fallocate -l 8G swapfile
# chmod 0600 swapfile
# mkswap swapfile
# swapon swapfile
Fix the logic to return the proper state of the inode.
Link: https://lore.kernel.org/lkml/20201024131333.GA32124@xps-13-7390
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Harshad Shirwadkar [Fri, 23 Oct 2020 16:13:39 +0000 (09:13 -0700)]
ext4: fix double locking in ext4_fc_commit_dentry_updates()
Fix double locking of sbi->s_fc_lock in ext4_fc_commit_dentry_updates(),
as reported by the kernel test robot.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201023161339.1449437-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Alok Prasad [Wed, 21 Oct 2020 11:50:08 +0000 (11:50 +0000)]
RDMA/qedr: Fix memory leak in iWARP CM
Fix a memory leak in iWARP CM.
Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions")
Link: https://lore.kernel.org/r/20201021115008.28138-1-palok@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe [Mon, 26 Oct 2020 14:25:49 +0000 (11:25 -0300)]
RDMA: Add rdma_connect_locked()
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handlers are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
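The locked/unlocked split described in this commit follows a common pattern, sketched below as a userspace model. Everything here is hypothetical: struct model_id, the boolean standing in for handler_mutex, and the model_* functions are invented for illustration, and a real mutex would block rather than return -EDEADLK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection object; the flag models handler_mutex. */
struct model_id {
	bool handler_mutex_held;
	int connected;
};

/* Model lock: report the self-deadlock instead of hanging. */
static int model_mutex_lock(struct model_id *id)
{
	if (id->handler_mutex_held)
		return -EDEADLK;
	id->handler_mutex_held = true;
	return 0;
}

static void model_mutex_unlock(struct model_id *id)
{
	assert(id->handler_mutex_held);
	id->handler_mutex_held = false;
}

/* Variant for callers that already hold the mutex (event handlers). */
static int model_connect_locked(struct model_id *id)
{
	assert(id->handler_mutex_held);
	id->connected = 1;
	return 0;
}

/* Variant for callers outside a handler: takes the mutex itself. */
static int model_connect(struct model_id *id)
{
	int ret = model_mutex_lock(id);

	if (ret)
		return ret;
	ret = model_connect_locked(id);
	model_mutex_unlock(id);
	return ret;
}

/* Handler for the "route resolved" event; runs with the mutex held. */
static int model_route_resolved_handler(struct model_id *id)
{
	return model_connect_locked(id);
}

/* Core code: invokes the handler while holding the mutex. */
static int model_dispatch_event(struct model_id *id)
{
	int ret = model_mutex_lock(id);

	if (ret)
		return ret;
	ret = model_route_resolved_handler(id);
	model_mutex_unlock(id);
	return ret;
}
```

If the handler called model_connect() instead of model_connect_locked(), the inner lock attempt would fail with -EDEADLK in this model, which corresponds to the deadlock the commit describes for ULPs that connect directly from their handlers.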