Colin Ian King [Wed, 16 Dec 2020 12:00:20 +0000 (12:00 +0000)]
net: nixge: fix spelling mistake in Kconfig: "Instuments" -> "Instruments"
There is a spelling mistake in the Kconfig. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201216120020.13149-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Wed, 16 Dec 2020 08:38:04 +0000 (11:38 +0300)]
qlcnic: Fix error code in probe
Return -EINVAL if we can't find the correct device. Currently it
returns success.
Fixes:
13159183ec7a ("qlcnic: 83xx base driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X9nHbMqEyI/xPfGd@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 17 Dec 2020 18:25:10 +0000 (10:25 -0800)]
Merge branch 'mptcp-a-bunch-of-assorted-fixes'
Paolo Abeni says:
====================
mptcp: a bunch of assorted fixes
This series pulls a few fixes for the MPTCP datapath.
Most issues addressed here has been recently introduced
with the recent reworks, with the notable exception of
the first patch, which addresses an issue present since
the early days
====================
Link: https://lore.kernel.org/r/cover.1608114076.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 16 Dec 2020 11:48:35 +0000 (12:48 +0100)]
mptcp: fix pending data accounting
When sendmsg() needs to wait for memory, the pending data
is not updated. That causes a drift in forward memory allocation,
leading to stall and/or warnings at socket close time.
This change addresses the above issue moving the pending data
counter update inside the sendmsg() main loop.
Fixes:
6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 16 Dec 2020 11:48:34 +0000 (12:48 +0100)]
mptcp: push pending frames when subflow has free space
When multiple subflows are active, we can receive a
window update on subflow with no write space available.
MPTCP will try to push frames on such subflow and will
fail. Pending frames will be pushed only after receiving
a window update on a subflow with some wspace available.
Overall the above could lead to suboptimal aggregate
bandwidth usage.
Instead, we should try to push pending frames as soon as
the subflow reaches both conditions mentioned above.
We can finally enable self-tests with asymmetric links,
as the above makes them finally pass.
Fixes:
6f8a612a33e4 ("mptcp: keep track of advertised windows right edge")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 16 Dec 2020 11:48:33 +0000 (12:48 +0100)]
mptcp: properly annotate nested lock
MPTCP closes the subflows while holding the msk-level lock.
While acquiring the subflow socket lock we need to use the
correct nested annotation, or we can hit a lockdep splat
at runtime.
Reported-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Fixes:
e16163b6e2b7 ("mptcp: refactor shutdown and close")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 16 Dec 2020 11:48:32 +0000 (12:48 +0100)]
mptcp: fix security context on server socket
Currently MPTCP is not propagating the security context
from the ingress request socket to newly created msk
at clone time.
Address the issue invoking the missing security helper.
Fixes:
cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parav Pandit [Sun, 13 Dec 2020 12:06:41 +0000 (14:06 +0200)]
net/mlx5: Fix compilation warning for 32-bit platform
MLX5_GENERAL_OBJECT_TYPES types bitfield is 64-bit field.
Defining an enum for such bit fields on 32-bit platform results in below
warning.
./include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
^
./include/linux/mlx5/mlx5_ifc.h:10716:46: note: in expansion of macro ‘BIT’
MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
^~~
Use 32-bit friendly BIT_ULL macro.
Fixes:
2a2970891647 ("net/mlx5: Add sample offload hardware bits and structures")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20201213120641.216032-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Tue, 15 Dec 2020 09:56:51 +0000 (17:56 +0800)]
mptcp: clear use_ack and use_map when dropping other suboptions
This patch cleared use_ack and use_map when dropping other suboptions to
fix the following syzkaller BUG:
[ 15.223006] BUG: unable to handle page fault for address:
0000000000223b10
[ 15.223700] #PF: supervisor read access in kernel mode
[ 15.224209] #PF: error_code(0x0000) - not-present page
[ 15.224724] PGD b8d5067 P4D b8d5067 PUD c0a5067 PMD 0
[ 15.225237] Oops: 0000 [#1] SMP
[ 15.225556] CPU: 0 PID: 7747 Comm: syz-executor Not tainted 5.10.0-rc6+ #24
[ 15.226281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 15.227292] RIP: 0010:skb_release_data+0x89/0x1e0
[ 15.227816] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[ 15.229669] RSP: 0018:
ffffc900019c7c08 EFLAGS:
00010293
[ 15.230188] RAX:
ffff88800daad900 RBX:
0000000000223b08 RCX:
0000000000000006
[ 15.230895] RDX:
0000000000000000 RSI:
ffffffff818e06c5 RDI:
ffff88807f6dc700
[ 15.231593] RBP:
ffff88807f71a4c0 R08:
0000000000000001 R09:
0000000000000001
[ 15.232299] R10:
ffffc900019c7c18 R11:
0000000000000000 R12:
ffff88807f71a4f0
[ 15.233007] R13:
0000000000000000 R14:
ffff88807f6dc700 R15:
0000000000000002
[ 15.233714] FS:
00007f65d9b5f700(0000) GS:
ffff88807c400000(0000) knlGS:
0000000000000000
[ 15.234509] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 15.235081] CR2:
0000000000223b10 CR3:
000000000b883000 CR4:
00000000000006f0
[ 15.235788] Call Trace:
[ 15.236042] skb_release_all+0x28/0x30
[ 15.236419] __kfree_skb+0x11/0x20
[ 15.236768] tcp_data_queue+0x270/0x1240
[ 15.237161] ? tcp_urg+0x50/0x2a0
[ 15.237496] tcp_rcv_established+0x39a/0x890
[ 15.237997] ? mark_held_locks+0x49/0x70
[ 15.238467] tcp_v4_do_rcv+0xb9/0x270
[ 15.238915] __release_sock+0x8a/0x160
[ 15.239365] release_sock+0x32/0xd0
[ 15.239793] __inet_stream_connect+0x1d2/0x400
[ 15.240313] ? do_wait_intr_irq+0x80/0x80
[ 15.240791] inet_stream_connect+0x36/0x50
[ 15.241275] mptcp_stream_connect+0x69/0x1b0
[ 15.241787] __sys_connect+0x122/0x140
[ 15.242236] ? syscall_enter_from_user_mode+0x17/0x50
[ 15.242836] ? lockdep_hardirqs_on_prepare+0xd4/0x170
[ 15.243436] __x64_sys_connect+0x1a/0x20
[ 15.243924] do_syscall_64+0x33/0x40
[ 15.244313] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 15.244821] RIP: 0033:0x7f65d946e469
[ 15.245183] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[ 15.247019] RSP: 002b:
00007f65d9b5eda8 EFLAGS:
00000246 ORIG_RAX:
000000000000002a
[ 15.247770] RAX:
ffffffffffffffda RBX:
000000000049bf00 RCX:
00007f65d946e469
[ 15.248471] RDX:
0000000000000010 RSI:
00000000200000c0 RDI:
0000000000000005
[ 15.249205] RBP:
000000000049bf00 R08:
0000000000000000 R09:
0000000000000000
[ 15.249908] R10:
0000000000000000 R11:
0000000000000246 R12:
000000000049bf0c
[ 15.250603] R13:
00007fffe8a25cef R14:
00007f65d9b3f000 R15:
0000000000000003
[ 15.251312] Modules linked in:
[ 15.251626] CR2:
0000000000223b10
[ 15.251965] BUG: kernel NULL pointer dereference, address:
0000000000000048
[ 15.252005] ---[ end trace
f5c51fe19123c773 ]---
[ 15.252822] #PF: supervisor read access in kernel mode
[ 15.252823] #PF: error_code(0x0000) - not-present page
[ 15.252825] PGD c6c6067 P4D c6c6067 PUD c0d8067
[ 15.253294] RIP: 0010:skb_release_data+0x89/0x1e0
[ 15.253910] PMD 0
[ 15.253914] Oops: 0000 [#2] SMP
[ 15.253917] CPU: 1 PID: 7746 Comm: syz-executor Tainted: G D 5.10.0-rc6+ #24
[ 15.253920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 15.254435] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[ 15.254899] RIP: 0010:skb_release_data+0x89/0x1e0
[ 15.254902] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
[ 15.254905] RSP: 0018:
ffffc900019bfc08 EFLAGS:
00010293
[ 15.255376] RSP: 0018:
ffffc900019c7c08 EFLAGS:
00010293
[ 15.255580]
[ 15.255583] RAX:
ffff888004a7ac80 RBX:
0000000000000040 RCX:
0000000000000000
[ 15.255912]
[ 15.256724] RDX:
0000000000000000 RSI:
ffffffff818e06c5 RDI:
ffff88807f6ddd00
[ 15.257620] RAX:
ffff88800daad900 RBX:
0000000000223b08 RCX:
0000000000000006
[ 15.259817] RBP:
ffff88800e9006c0 R08:
0000000000000000 R09:
0000000000000000
[ 15.259818] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff88800e9006f0
[ 15.259820] R13:
0000000000000000 R14:
ffff88807f6ddd00 R15:
0000000000000002
[ 15.259822] FS:
00007fae4a60a700(0000) GS:
ffff88807c500000(0000) knlGS:
0000000000000000
[ 15.259826] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 15.260296] RDX:
0000000000000000 RSI:
ffffffff818e06c5 RDI:
ffff88807f6dc700
[ 15.262514] CR2:
0000000000000048 CR3:
000000000b89c000 CR4:
00000000000006e0
[ 15.262515] Call Trace:
[ 15.262519] skb_release_all+0x28/0x30
[ 15.262523] __kfree_skb+0x11/0x20
[ 15.263054] RBP:
ffff88807f71a4c0 R08:
0000000000000001 R09:
0000000000000001
[ 15.263680] tcp_data_queue+0x270/0x1240
[ 15.263843] R10:
ffffc900019c7c18 R11:
0000000000000000 R12:
ffff88807f71a4f0
[ 15.264693] ? tcp_urg+0x50/0x2a0
[ 15.264856] R13:
0000000000000000 R14:
ffff88807f6dc700 R15:
0000000000000002
[ 15.265720] tcp_rcv_established+0x39a/0x890
[ 15.266438] FS:
00007f65d9b5f700(0000) GS:
ffff88807c400000(0000) knlGS:
0000000000000000
[ 15.267283] ? __schedule+0x3fa/0x880
[ 15.267287] tcp_v4_do_rcv+0xb9/0x270
[ 15.267290] __release_sock+0x8a/0x160
[ 15.268049] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 15.268788] release_sock+0x32/0xd0
[ 15.268791] __inet_stream_connect+0x1d2/0x400
[ 15.268795] ? do_wait_intr_irq+0x80/0x80
[ 15.269593] CR2:
0000000000223b10 CR3:
000000000b883000 CR4:
00000000000006f0
[ 15.270246] inet_stream_connect+0x36/0x50
[ 15.270250] mptcp_stream_connect+0x69/0x1b0
[ 15.270253] __sys_connect+0x122/0x140
[ 15.271097] Kernel panic - not syncing: Fatal exception
[ 15.271820] ? syscall_enter_from_user_mode+0x17/0x50
[ 15.283542] ? lockdep_hardirqs_on_prepare+0xd4/0x170
[ 15.284275] __x64_sys_connect+0x1a/0x20
[ 15.284853] do_syscall_64+0x33/0x40
[ 15.285369] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 15.286105] RIP: 0033:0x7fae49f19469
[ 15.286638] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[ 15.289295] RSP: 002b:
00007fae4a609da8 EFLAGS:
00000246 ORIG_RAX:
000000000000002a
[ 15.290375] RAX:
ffffffffffffffda RBX:
000000000049bf00 RCX:
00007fae49f19469
[ 15.291403] RDX:
0000000000000010 RSI:
00000000200000c0 RDI:
0000000000000005
[ 15.292437] RBP:
000000000049bf00 R08:
0000000000000000 R09:
0000000000000000
[ 15.293456] R10:
0000000000000000 R11:
0000000000000246 R12:
000000000049bf0c
[ 15.294473] R13:
00007fff0004b6bf R14:
00007fae4a5ea000 R15:
0000000000000003
[ 15.295492] Modules linked in:
[ 15.295944] CR2:
0000000000000048
[ 15.296567] Kernel Offset: disabled
[ 15.296941] ---[ end Kernel panic - not syncing: Fatal exception ]---
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes:
84dfe3677a6f (mptcp: send out dedicated ADD_ADDR packet)
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/ccca4e8f01457a1b495c5d612ed16c5f7a585706.1608010058.git.geliangtang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tobias Klauser [Tue, 15 Dec 2020 10:25:31 +0000 (11:25 +0100)]
devlink: use _BITUL() macro instead of BIT() in the UAPI header
The BIT() macro is not available for the UAPI headers. Moreover, it can
be defined differently in user space headers. Thus, replace its usage
with the _BITUL() macro which is already used in other macro definitions
in <linux/devlink.h>.
Fixes:
dc64cc7c6310 ("devlink: Add devlink reload limit option")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20201215102531.16958-1-tklauser@distanz.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vincent Stehlé [Mon, 14 Dec 2020 22:09:52 +0000 (23:09 +0100)]
net: korina: fix return value
The ndo_start_xmit() method must not attempt to free the skb to transmit
when returning NETDEV_TX_BUSY. Therefore, make sure the
korina_send_packet() function returns NETDEV_TX_OK when it frees a packet.
Fixes:
ef11291bcd5f ("Add support the Korina (IDT RC32434) Ethernet MAC")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201214220952.19935-1-vincent.stehle@laposte.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul [Tue, 15 Dec 2020 09:10:58 +0000 (10:10 +0100)]
net/smc: fix access to parent of an ib device
The parent of an ib device is used to retrieve the PCI device
attributes. It turns out that there are possible cases when an ib device
has no parent set in the device structure, which may lead to page
faults when trying to access this memory.
Fix that by checking the parent pointer and consolidate the pci device
specific processing in a new function.
Fixes:
a3db10efcc4c ("net/smc: Add support for obtaining SMCR device list")
Reported-by: syzbot+600fef7c414ee7e2d71b@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20201215091058.49354-2-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Tue, 15 Dec 2020 09:08:10 +0000 (10:08 +0100)]
ethtool: fix error paths in ethnl_set_channels()
Fix two error paths in ethnl_set_channels() to avoid lock-up caused
but unreleased RTNL.
Fixes:
e19c591eafad ("ethtool: set device channel counts with CHANNELS_SET request")
Reported-by: LiLiang <liali@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20201215090810.801777-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 16 Dec 2020 21:09:37 +0000 (13:09 -0800)]
Merge branch 'nfc-s3fwrn5-refactor-the-s3fwrn5-module'
Bongsu Jeon says:
====================
nfc: s3fwrn5: Refactor the s3fwrn5 module
Refactor the s3fwrn5 module.
1/2 is to remove the unneeded delay for NFC sleep.
2/2 is to remove the unused NCI prop commands.
====================
Link: https://lore.kernel.org/r/20201215065401.3220-1-bongsu.jeon@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bongsu Jeon [Tue, 15 Dec 2020 06:54:01 +0000 (15:54 +0900)]
nfc: s3fwrn5: Remove unused NCI prop commands
Remove the unused NCI prop commands that s3fwrn5 driver doesn't use.
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bongsu Jeon [Tue, 15 Dec 2020 06:54:00 +0000 (15:54 +0900)]
nfc: s3fwrn5: Remove the delay for NFC sleep
Remove the delay for NFC sleep because the delay is only needed to
guarantee that the NFC is awake.
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 15 Dec 2020 06:37:50 +0000 (22:37 -0800)]
phy: fix kdoc warning
Kdoc does not like it when multiline comment follows the networking
style of starting right on the first line:
include/linux/phy.h:869: warning: Function parameter or member 'config_intr' not described in 'phy_driver'
Link: https://lore.kernel.org/r/20201215063750.3120976-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hoang Le [Tue, 15 Dec 2020 03:31:51 +0000 (10:31 +0700)]
tipc: do sanity check payload of a netlink message
When we initialize nlmsghdr with no payload inside tipc_nl_compat_dumpit()
the parsing function returns -EINVAL. We fix it by making the parsing call
conditional.
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20201215033151.76139-1-hoang.h.le@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 16 Dec 2020 19:43:52 +0000 (11:43 -0800)]
Merge branch 'locked-version-of-netdev_notify_peers'
Lijun Pan says:
====================
add a locked version of netdev_notify_peers
This series introduce the lockless version of netdev_notify_peers
and then apply it to the relevant drivers.
In v1, a more appropriate name __netdev_notify_peers is used;
netdev_notify_peers is converted to call the new helper.
In v2, patch 3 calls the new helper where notify variable used
to be set true.
====================
Link: https://lore.kernel.org/r/20201214211930.80778-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Mon, 14 Dec 2020 21:19:30 +0000 (15:19 -0600)]
use __netdev_notify_peers in hyperv
Start to use the lockless version of netdev_notify_peers.
Call the helper where notify variable used to be set true.
Remove the notify bool variable and sort the variables
in reverse Christmas tree order.
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Mon, 14 Dec 2020 21:19:29 +0000 (15:19 -0600)]
use __netdev_notify_peers in ibmvnic
Start to use the lockless version of netdev_notify_peers.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Mon, 14 Dec 2020 21:19:28 +0000 (15:19 -0600)]
net: core: introduce __netdev_notify_peers
There are some use cases for netdev_notify_peers in the context
when rtnl lock is already held. Introduce lockless version
of netdev_notify_peers call to save the extra code to call
call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, dev);
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev);
After that, convert netdev_notify_peers to call the new helper.
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Mon, 14 Dec 2020 20:21:17 +0000 (21:21 +0100)]
net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function
'irq_of_parse_and_map()' should be balanced by a corresponding
'irq_dispose_mapping()' call. Otherwise, there is some resources leaks.
Add such a call in the error handling path of the probe function and in the
remove function.
Fixes:
492205050d77 ("net: Add EMAC ethernet driver found on Allwinner A10 SoC's")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201214202117.146293-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Kubecek [Mon, 14 Dec 2020 13:25:01 +0000 (14:25 +0100)]
ethtool: fix string set id check
Syzbot reported a shift of a u32 by more than 31 in strset_parse_request()
which is undefined behavior. This is caused by range check of string set id
using variable ret (which is always 0 at this point) instead of id (string
set id from request).
Fixes:
71921690f974 ("ethtool: provide string sets with STRSET_GET request")
Reported-by: syzbot+96523fb438937cd01220@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/b54ed5c5fd972a59afea3e1badfb36d86df68799.1607952208.git.mkubecek@suse.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Sun, 13 Dec 2020 11:48:38 +0000 (12:48 +0100)]
net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function
In case of error after calling 'ocelot_init()', it must be undone by a
corresponding 'ocelot_deinit()' call, as already done in the remove
function.
Fixes:
a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201213114838.126922-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geoff Levand [Tue, 15 Dec 2020 05:15:47 +0000 (21:15 -0800)]
net/connector: Add const qualifier to cb_id
The connector driver never modifies any cb_id passed to it, so add a const
qualifier to those arguments so callers can declare their struct cb_id as a
constant object.
Fixes build warnings like these when passing a constant struct cb_id:
warning: passing argument 1 of ‘cn_add_callback’ discards ‘const’ qualifier from pointer target
Signed-off-by: Geoff Levand <geoff@infradead.org>
Link: https://lore.kernel.org/r/a9e49c9e-67fa-16e7-0a6b-72f6bd30c58a@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Sat, 12 Dec 2020 18:20:05 +0000 (19:20 +0100)]
net: bcmgenet: Fix a resource leak in an error handling path in the probe functin
If the 'register_netdev()' call fails, we must undo a previous
'bcmgenet_mii_init()' call.
Fixes:
1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201212182005.120437-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 11 Dec 2020 17:16:07 +0000 (19:16 +0200)]
dpaa2-eth: fix the size of the mapped SGT buffer
This patch fixes an error condition triggered when the code path which
transmits a S/G frame descriptor when the skb's headroom is not enough
for DPAA2's needs.
We are greated with a splat like the one below when a SGT structure is
recycled and that is because even though a dma_unmap is performed on the
Tx confirmation path, the unmap is not done with the proper size.
[ 714.464927] WARNING: CPU: 13 PID: 0 at drivers/iommu/io-pgtable-arm.c:281 __arm_lpae_map+0x2d4/0x30c
(...)
[ 714.465343] Call trace:
[ 714.465348] __arm_lpae_map+0x2d4/0x30c
[ 714.465353] __arm_lpae_map+0x114/0x30c
[ 714.465357] __arm_lpae_map+0x114/0x30c
[ 714.465362] __arm_lpae_map+0x114/0x30c
[ 714.465366] arm_lpae_map+0xf4/0x180
[ 714.465373] arm_smmu_map+0x4c/0xc0
[ 714.465379] __iommu_map+0x100/0x2bc
[ 714.465385] iommu_map_atomic+0x20/0x30
[ 714.465391] __iommu_dma_map+0xb0/0x110
[ 714.465397] iommu_dma_map_page+0xb8/0x120
[ 714.465404] dma_map_page_attrs+0x1a8/0x210
[ 714.465413] __dpaa2_eth_tx+0x384/0xbd0 [fsl_dpaa2_eth]
[ 714.465421] dpaa2_eth_tx+0x84/0x134 [fsl_dpaa2_eth]
[ 714.465427] dev_hard_start_xmit+0x10c/0x2b0
[ 714.465433] sch_direct_xmit+0x1a0/0x550
(...)
The dpaa2-eth driver uses an area of software annotations to transmit
necessary information from the Tx path to the Tx confirmation one. This
SWA structure has a different layout for each kind of frame that we are
dealing with: linear, S/G or XDP.
The commit referenced was incorrectly setting up the 'sgt_size' field
for the S/G type of SWA even though we are dealing with a linear skb
here.
Fixes:
d70446ee1f40 ("dpaa2-eth: send a scatter-gather FD instead of realloc-ing")
Reported-by: Daniel Thompson <daniel.thompson@linaro.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201211171607.108034-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Fri, 11 Dec 2020 11:03:17 +0000 (12:03 +0100)]
net: dsa: qca: ar9331: fix sleeping function called from invalid context bug
With lockdep enabled, we will get following warning:
ar9331_switch ethernet.1:10 lan0 (uninitialized): PHY [!ahb!ethernet@
1a000000!mdio!switch@10:00] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13)
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 18, name: kworker/0:1
INFO: lockdep is turned off.
irq event stamp: 602
hardirqs last enabled at (601): [<
8073fde0>] _raw_spin_unlock_irq+0x3c/0x80
hardirqs last disabled at (602): [<
8073a4f4>] __schedule+0x184/0x800
softirqs last enabled at (0): [<
80080f60>] copy_process+0x578/0x14c8
softirqs last disabled at (0): [<
00000000>] 0x0
CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.10.0-rc3-ar9331-00734-g7d644991df0c #31
Workqueue: events deferred_probe_work_func
Stack :
80980000 80980000 8089ef70 80890000 804b5414 80980000 00000002 80b53728
00000000 800d1268 804b5414 ffffffde 00000017 800afe08 81943860 0f5bfc32
00000000 00000000 8089ef70 819436c0 ffffffea 00000000 00000000 00000000
8194390c 808e353c 0000000f 66657272 80980000 00000000 00000000 80890000
804b5414 80980000 00000002 80b53728 00000000 00000000 00000000 80d40000
...
Call Trace:
[<
80069ce0>] show_stack+0x9c/0x140
[<
800afe08>] ___might_sleep+0x220/0x244
[<
8073bfb0>] __mutex_lock+0x70/0x374
[<
8073c2e0>] mutex_lock_nested+0x2c/0x38
[<
804b5414>] regmap_update_bits_base+0x38/0x8c
[<
804ee584>] regmap_update_bits+0x1c/0x28
[<
804ee714>] ar9331_sw_unmask_irq+0x34/0x60
[<
800d91f0>] unmask_irq+0x48/0x70
[<
800d93d4>] irq_startup+0x114/0x11c
[<
800d65b4>] __setup_irq+0x4f4/0x6d0
[<
800d68a0>] request_threaded_irq+0x110/0x190
[<
804e3ef0>] phy_request_interrupt+0x4c/0xe4
[<
804df508>] phylink_bringup_phy+0x2c0/0x37c
[<
804df7bc>] phylink_of_phy_connect+0x118/0x130
[<
806c1a64>] dsa_slave_create+0x3d0/0x578
[<
806bc4ec>] dsa_register_switch+0x934/0xa20
[<
804eef98>] ar9331_sw_probe+0x34c/0x364
[<
804eb48c>] mdio_probe+0x44/0x70
[<
8049e3b4>] really_probe+0x30c/0x4f4
[<
8049ea10>] driver_probe_device+0x264/0x26c
[<
8049bc10>] bus_for_each_drv+0xb4/0xd8
[<
8049e684>] __device_attach+0xe8/0x18c
[<
8049ce58>] bus_probe_device+0x48/0xc4
[<
8049db70>] deferred_probe_work_func+0xdc/0xf8
[<
8009ff64>] process_one_work+0x2e4/0x4a0
[<
800a0770>] worker_thread+0x2a8/0x354
[<
800a774c>] kthread+0x16c/0x174
[<
8006306c>] ret_from_kernel_thread+0x14/0x1c
ar9331_switch ethernet.1:10 lan1 (uninitialized): PHY [!ahb!ethernet@
1a000000!mdio!switch@10:02] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13)
DSA: tree 0 setup
To fix it, it is better to move access to MDIO register to the .irq_bus_sync_unlock
call back.
Fixes:
ec6698c272de ("net: dsa: add support for Atheros AR9331 built-in switch")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20201211110317.17061-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 16 Dec 2020 18:51:09 +0000 (10:51 -0800)]
Merge branch 'i40e-ice-af_xdp-zc-fixes'
Björn Töpel says:
====================
i40e/ice AF_XDP ZC fixes
This series address two crashes in the AF_XDP zero-copy mode for ice
and i40e. More details in each individual the commit message.
====================
Link: https://lore.kernel.org/r/20201211145712.72957-1-bjorn.topel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Björn Töpel [Fri, 11 Dec 2020 14:57:12 +0000 (15:57 +0100)]
i40e, xsk: clear the status bits for the next_to_use descriptor
On the Rx side, the next_to_use index points to the next item in the
HW ring to be refilled/allocated, and next_to_clean points to the next
item to potentially be processed.
When the HW Rx ring is fully refilled, i.e. no packets has been
processed, the next_to_use will be next_to_clean - 1. When the ring is
fully processed next_to_clean will be equal to next_to_use. The latter
case is where a bug is triggered.
If the next_to_use bits are not cleared, and the "fully processed"
state is entered, a stale descriptor can be processed.
The skb-path correctly clear the status bit for the next_to_use
descriptor, but the AF_XDP zero-copy path did not do that.
This change adds the status bits clearing of the next_to_use
descriptor.
Fixes:
3b4f0b66c2b3 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Björn Töpel [Fri, 11 Dec 2020 14:57:11 +0000 (15:57 +0100)]
ice, xsk: clear the status bits for the next_to_use descriptor
On the Rx side, the next_to_use index points to the next item in the
HW ring to be refilled/allocated, and next_to_clean points to the next
item to potentially be processed.
When the HW Rx ring is fully refilled, i.e. no packets has been
processed, the next_to_use will be next_to_clean - 1. When the ring is
fully processed next_to_clean will be equal to next_to_use. The latter
case is where a bug is triggered.
If the next_to_use bits are not cleared, and the "fully processed"
state is entered, a stale descriptor can be processed.
The skb-path correctly clear the status bit for the next_to_use
descriptor, but the AF_XDP zero-copy path did not do that.
This change adds the status bits clearing of the next_to_use
descriptor.
Fixes:
2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Van Asbroeck [Tue, 15 Dec 2020 16:19:54 +0000 (11:19 -0500)]
lan743x: fix rx_napi_poll/interrupt ping-pong
Even if there is more rx data waiting on the chip, the rx napi poll fn
will never run more than once - it will always read a few buffers, then
bail out and re-arm interrupts. Which results in ping-pong between napi
and interrupt.
This defeats the purpose of napi, and is bad for performance.
Fix by making the rx napi poll behave identically to other ethernet
drivers:
1. initialize rx napi polling with an arbitrary budget (64).
2. in the polling fn, return full weight if rx queue is not depleted,
this tells the napi core to "keep polling".
3. update the rx tail ("ring the doorbell") once for every 8 processed
rx ring buffers.
Thanks to Jakub Kicinski, Eric Dumazet and Andrew Lunn for their expert
opinions and suggestions.
Tested with 20 seconds of full bandwidth receive (iperf3):
rx irqs softirqs(NET_RX)
-----------------------------
before 23827 33620
after 129 4081
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Fixes:
23f0703c125be ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201215161954.5950-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 15 Dec 2020 22:18:40 +0000 (14:18 -0800)]
Merge tag 'staging-5.11-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging / IIO driver updates from Greg KH:
"Here is the big staging and IIO driver pull request for 5.11-rc1
Lots of different things in here:
- loads of driver updates
- so many coding style cleanups
- new IIO drivers
- Android ION code is finally removed from the tree
- wimax drivers are moved to staging on their way out of the kernel
Nothing really exciting, just the constant grind of kernel development :)
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (341 commits)
staging: olpc_dcon: Do not call platform_device_unregister() in dcon_probe()
staging: most: Fix spelling mistake "tranceiver" -> "transceiver"
staging: qlge: remove duplicate word in comment
staging: comedi: mf6x4: Fix AI end-of-conversion detection
staging: greybus: Add TODO item about modernizing the pwm code
pinctrl: ralink: add a pinctrl driver for the rt2880 family
dt-bindings: pinctrl: rt2880: add binding document
staging: rtl8723bs: remove ELEMENT_ID enum
staging: rtl8723bs: remove unused macros
staging: rtl8723bs: replace EID_EXTCapability
staging: rtl8723bs: replace EID_BSSIntolerantChlReport
staging: rtl8723bs: replace EID_BSSCoexistence
staging: rtl8723bs: replace _MME_IE_
staging: rtl8723bs: replace _WAPI_IE_
staging: rtl8723bs: replace _EXT_SUPPORTEDRATES_IE_
staging: rtl8723bs: replace _ERPINFO_IE_
staging: rtl8723bs: replace _CHLGETXT_IE_
staging: rtl8723bs: replace _COUNTRY_IE_
staging: rtl8723bs: replace _IBSS_PARA_IE_
staging: rtl8723bs: replace _TIM_IE_
...
Linus Torvalds [Tue, 15 Dec 2020 22:10:09 +0000 (14:10 -0800)]
Merge tag 'char-misc-5.11-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big char/misc driver update for 5.11-rc1.
Continuing the tradition of previous -rc1 pulls, there seems to be
more and more tiny driver subsystems flowing through this tree.
Lots of different things, all of which have been in linux-next for a
while with no reported issues:
- extcon driver updates
- habannalab driver updates
- mei driver updates
- uio driver updates
- binder fixes and features added
- soundwire driver updates
- mhi bus driver updates
- phy driver updates
- coresight driver updates
- fpga driver updates
- speakup driver updates
- slimbus driver updates
- various small char and misc driver updates"
* tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (305 commits)
extcon: max77693: Fix modalias string
extcon: fsa9480: Support TI TSU6111 variant
extcon: fsa9480: Rewrite bindings in YAML and extend
dt-bindings: extcon: add binding for TUSB320
extcon: Add driver for TI TUSB320
slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
siox: Make remove callback return void
siox: Use bus_type functions for probe, remove and shutdown
spmi: Add driver shutdown support
spmi: fix some coding style issues at the spmi core
spmi: get rid of a warning when built with W=1
uio: uio_hv_generic: use devm_kzalloc() for private data alloc
uio: uio_fsl_elbc_gpcm: use device-managed allocators
uio: uio_aec: use devm_kzalloc() for uio_info object
uio: uio_cif: use devm_kzalloc() for uio_info object
uio: uio_netx: use devm_kzalloc() for or uio_info object
uio: uio_mf624: use devm_kzalloc() for uio_info object
uio: uio_sercos3: use device-managed functions for simple allocs
uio: uio_dmem_genirq: finalize conversion of probe to devm_ handlers
uio: uio_dmem_genirq: convert simple allocations to device-managed
...
Linus Torvalds [Tue, 15 Dec 2020 22:02:26 +0000 (14:02 -0800)]
Merge tag 'driver-core-5.11-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core updates for 5.11-rc1
This time there was a lot of different work happening here for some
reason:
- redo of the fwnode link logic, speeding it up greatly
- auxiliary bus added (this was a tag that will be pulled in from
other trees/maintainers this merge window as well, as driver
subsystems started to rely on it)
- platform driver core cleanups on the way to fixing some long-time
api updates in future releases
- minor fixes and tweaks.
All have been in linux-next with no (finally) reported issues. Testing
there did helped in shaking issues out a lot :)"
* tag 'driver-core-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits)
driver core: platform: don't oops in platform_shutdown() on unbound devices
ACPI: Use fwnode_init() to set up fwnode
misc: pvpanic: Replace OF headers by mod_devicetable.h
misc: pvpanic: Combine ACPI and platform drivers
usb: host: sl811: Switch to use platform_get_mem_or_io()
vfio: platform: Switch to use platform_get_mem_or_io()
driver core: platform: Introduce platform_get_mem_or_io()
dyndbg: fix use before null check
soc: fix comment for freeing soc_dev_attr
driver core: platform: use bus_type functions
driver core: platform: change logic implementing platform_driver_probe
driver core: platform: reorder functions
driver core: make driver_probe_device() static
driver core: Fix a couple of typos
driver core: Reorder devices on successful probe
driver core: Delete pointless parameter in fwnode_operations.add_links
driver core: Refactor fw_devlink feature
efi: Update implementation of add_links() to create fwnode links
of: property: Update implementation of add_links() to create fwnode links
driver core: Use device's fwnode to check if it is waiting for suppliers
...
Linus Torvalds [Tue, 15 Dec 2020 21:57:14 +0000 (13:57 -0800)]
Merge tag 'tty-5.11-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the "large" set of tty and serial patches for 5.11-rc1.
Nothing major at all, some cleanups and some driver removals, always a
nice sign:
- build warning cleanups
- vt locking and logic unwinding and cleanups
- tiny serial driver fixes and updates
- removal of the synclink serial driver as it's no longer needed
- removal of dead termiox code
All of this has been in linux-next for a while with no reported issues"
* tag 'tty-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (89 commits)
serial: 8250_pci: Drop bogus __refdata annotation
tty: serial: meson: enable console as module
serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access
serial: imx: Move imx_uart_probe_dt() content into probe()
serial: imx: Remove unneeded of_device_get_match_data() NULL check
tty: Fix whitespace inconsistencies in vt_io_ioctl
serial_core: Check for port state when tty is in error state
dt-bindings: serial: Update DT binding docs to support SiFive FU740 SoC
tty: use const parameters in port-flag accessors
tty: use assign_bit() in port-flag accessors
earlycon: drop semicolon from earlycon macro
tty: Remove dead termiox code
tty/serial/imx: Enable TXEN bit in imx_poll_init().
tty : serial: jsm: Fixed file by adding spacing
tty: serial: uartlite: Support probe deferral
earlycon: simplify earlycon-table implementation
tty: serial: bcm63xx: lower driver dependencies
serial: mxs-auart: Remove unneeded platform_device_id
serial: 8250-mtk: Fix reference leak in mtk8250_probe
serial: imx: Remove unused .id_table support
...
Linus Torvalds [Tue, 15 Dec 2020 21:54:56 +0000 (13:54 -0800)]
Merge tag 'usb-5.11-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big USB and thunderbolt pull request for 5.11-rc1.
Nothing major in here, just the grind of constant development to
support new hardware and fix old issues:
- thunderbolt updates for new USB4 hardware
- cdns3 major driver updates
- lots of typec updates and additions as more hardware is available
- usb serial driver updates and fixes
- other tiny USB driver updates
All have been in linux-next with no reported issues"
* tag 'usb-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (172 commits)
usb: phy: convert comma to semicolon
usb: ucsi: convert comma to semicolon
usb: typec: tcpm: convert comma to semicolon
usb: typec: tcpm: Update vbus_vsafe0v on init
usb: typec: tcpci: Enable bleed discharge when auto discharge is enabled
usb: typec: Add class for plug alt mode device
USB: typec: tcpci: Add Bleed discharge to POWER_CONTROL definition
USB: typec: tcpm: Add a 30ms room for tPSSourceOn in PR_SWAP
USB: typec: tcpm: Fix PR_SWAP error handling
USB: typec: tcpm: Hard Reset after not receiving a Request
USB: gadget: f_fs: remove likely/unlikely
usb: gadget: f_fs: Re-use SS descriptors for SuperSpeedPlus
USB: gadget: f_midi: setup SuperSpeed Plus descriptors
USB: gadget: f_acm: add support for SuperSpeed Plus
USB: gadget: f_rndis: fix bitrate for SuperSpeed and above
usb: typec: intel_pmc_mux: Configure cable generation value for USB4
MAINTAINERS: Add myself as a reviewer for CADENCE USB3 DRD IP DRIVER
usb: chipidea: ci_hdrc_imx: Use of_device_get_match_data()
usb: chipidea: usbmisc_imx: Use of_device_get_match_data()
usb: cdns3: fix NULL pointer dereference on no platform data
...
Linus Torvalds [Tue, 15 Dec 2020 21:43:47 +0000 (13:43 -0800)]
Merge tag 'sound-5.11-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"Lots of changes (slightly more code increase than usual) at this time,
while most of code changes are ASoC driver-specific.
Here are some highlights:
Core:
- The new auxiliary bus implementation for Intel DSP, which will be
used by other drivers as well
- Lots of ASoC core cleanups and refactoring
- UBSAN and KCSAN fixes in rawmidi, sequencer and a few others
- Compress-offload API enhancement for the pause during draining
HD- and USB-audio:
- Enhancements of the USB-audio implicit feedback support, including
better full-duplex operations
- Continued CA0132 improvements and fixes
- A few new quirk entries, HDMI audio fixes
ASoC:
- Support for boot time selection of Intel DSP firmware, which should
help distros/users testing new stuff more easily; the kconfig was
moved to boot time option, too
- Some basic DPCM support in audio graph card
- Removal of old pre-DT Freescale drivers
- Support for Allwinner H6 I2S, Analog Devices ADAU1372, Intel
Alderlake-S, GMediatek MT8192, NXP i.MX HDMI and XCVR, Realtek
RT715, Qualcomm SM8250 and simple GPIO based muxes"
* tag 'sound-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (445 commits)
ALSA: pcm: oss: Fix potential out-of-bounds shift
ALSA: usb-audio: Fix potential out-of-bounds shift
ALSA: hda/ca0132 - Add ZxR surround DAC setup.
ALSA: hda/ca0132 - Add 8051 PLL write helper functions.
ALSA: hda/hdmi: packet buffer index must be set before reading value
ASoC: SOF: imx: update kernel-doc description
ASoC: mediatek: mt8183: delete some unreachable code
ASoC: mediatek: mt8183: add PM ops to machine drivers
ASoC: topology: Fix wrong size check
ASoC: topology: Add missing size check
ASoC: SOF: Intel: hda: fix the condition passed to sof_dev_dbg_or_err
ASoC: SOF: modify the SOF_DBG flags
ASoC: SOF: Intel: hda: remove duplicated status dump
ASoC: rt1015p: delay 300ms after SDB pulling high for calibration
ASoC: rt1015p: move SDB control from trigger to DAPM
ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
ALSA: usb-audio: Fix control 'access overflow' errors from chmap
ALSA: hda/hdmi: always print pin NIDs as hexadecimal
ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
ALSA: hda/ca0132 - Remove now unnecessary DSP setup functions.
...
Linus Torvalds [Tue, 15 Dec 2020 21:22:29 +0000 (13:22 -0800)]
Merge tag 'net-next-5.11' of git://git./linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- support "prefer busy polling" NAPI operation mode, where we defer
softirq for some time expecting applications to periodically busy
poll
- AF_XDP: improve efficiency by more batching and hindering the
adjacency cache prefetcher
- af_packet: make packet_fanout.arr size configurable up to 64K
- tcp: optimize TCP zero copy receive in presence of partial or
unaligned reads making zero copy a performance win for much smaller
messages
- XDP: add bulk APIs for returning / freeing frames
- sched: support fragmenting IP packets as they come out of conntrack
- net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs
BPF:
- BPF switch from crude rlimit-based to memcg-based memory accounting
- BPF type format information for kernel modules and related tracing
enhancements
- BPF implement task local storage for BPF LSM
- allow the FENTRY/FEXIT/RAW_TP tracing programs to use
bpf_sk_storage
Protocols:
- mptcp: improve multiple xmit streams support, memory accounting and
many smaller improvements
- TLS: support CHACHA20-POLY1305 cipher
- seg6: add support for SRv6 End.DT4/DT6 behavior
- sctp: Implement RFC 6951: UDP Encapsulation of SCTP
- ppp_generic: add ability to bridge channels directly
- bridge: Connectivity Fault Management (CFM) support as is defined
in IEEE 802.1Q section 12.14.
Drivers:
- mlx5: make use of the new auxiliary bus to organize the driver
internals
- mlx5: more accurate port TX timestamping support
- mlxsw:
- improve the efficiency of offloaded next hop updates by using
the new nexthop object API
- support blackhole nexthops
- support IEEE 802.1ad (Q-in-Q) bridging
- rtw88: major bluetooth co-existance improvements
- iwlwifi: support new 6 GHz frequency band
- ath11k: Fast Initial Link Setup (FILS)
- mt7915: dual band concurrent (DBDC) support
- net: ipa: add basic support for IPA v4.5
Refactor:
- a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
Siewior
- phy: add support for shared interrupts; get rid of multiple driver
APIs and have the drivers write a full IRQ handler, slight growth
of driver code should be compensated by the simpler API which also
allows shared IRQs
- add common code for handling netdev per-cpu counters
- move TX packet re-allocation from Ethernet switch tag drivers to a
central place
- improve efficiency and rename nla_strlcpy
- number of W=1 warning cleanups as we now catch those in a patchwork
build bot
Old code removal:
- wan: delete the DLCI / SDLA drivers
- wimax: move to staging
- wifi: remove old WDS wifi bridging support"
* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
net: hns3: fix expression that is currently always true
net: fix proc_fs init handling in af_packet and tls
nfc: pn533: convert comma to semicolon
af_vsock: Assign the vsock transport considering the vsock address flags
af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
vsock_addr: Check for supported flag values
vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
vm_sockets: Add flags field in the vsock address data structure
net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
nfc: s3fwrn5: Release the nfc firmware
net: vxget: clean up sparse warnings
mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
mlxsw: reg: Add Router LPM Cache Enable Register
mlxsw: reg: Add Router LPM Cache ML Delete Register
mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
mlxsw: reg: Add XM Router M Table Register
...
Linus Torvalds [Tue, 15 Dec 2020 20:53:37 +0000 (12:53 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
Alexey Dobriyan [Tue, 15 Dec 2020 03:15:03 +0000 (19:15 -0800)]
mm: cleanup kstrto*() usage
Range checks can folded into proper conversion function. kstrto*() exist
for all arithmetic types.
Link: https://lkml.kernel.org/r/20201122123759.GC92364@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo A. R. Silva [Tue, 15 Dec 2020 03:15:00 +0000 (19:15 -0800)]
mm: fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of
warnings by explicitly adding a break statement instead of just letting
the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places where the code is intended to fall through.
Link: https://github.com/KSPP/linux/issues/115
Link: https://lkml.kernel.org/r/f5756988b8842a3f10008fbc5b0a654f828920a9.1605896059.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:57 +0000 (19:14 -0800)]
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
Convert the unbounded uses of sprintf to sysfs_emit.
A few conversions may now not end in a newline if the output buffer is
overflowed.
Link: https://lkml.kernel.org/r/0c90a90f466167f8c37de4b737553cf49c4a277f.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:53 +0000 (19:14 -0800)]
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
Update the function to use sysfs_emit_at while neatening the uses of
sprintf and overwriting the last space char with a newline to avoid
possible output buffer overflow.
Miscellanea:
- in shmem_enabled_show, the removal of the indirected use of fmt
allows __printf verification
Link: https://lkml.kernel.org/r/b612a93825e5ea330cb68d2e8b516e9687a06cc6.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:50 +0000 (19:14 -0800)]
mm:backing-dev: use sysfs_emit in macro defining functions
The cocci script used in commit
bdacbb8d04f ("mm: Use sysfs_emit for
struct kobject * uses") does not convert the name##_show macro because the
macro uses concatenation via ##.
Convert it by hand.
Link: https://lkml.kernel.org/r/45ec6cfc177d743f9c0ebaf35e43969dce43af42.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:46 +0000 (19:14 -0800)]
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
Convert the only use of sprintf with struct kobject * that the cocci
script could not convert.
Miscellanea:
- Neaten the uses of a constant string with sysfs_emit to use a const
char * to reduce overall object size
Link: https://lkml.kernel.org/r/7df6be66bbd68e1a0bca9d35aca1341dbf94d2a7.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:42 +0000 (19:14 -0800)]
mm: use sysfs_emit for struct kobject * uses
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2.
Use the new sysfs_emit family and not the sprintf family.
This patch (of 5):
Use the sysfs_emit function instead of the sprintf family.
Done with cocci script as in commit
3c6bff3cf988 ("RDMA: Convert sysfs
kobject * show functions to use sysfs_emit()")
Link: https://lkml.kernel.org/r/cover.1605376435.git.joe@perches.com
Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mauro Carvalho Chehab [Tue, 15 Dec 2020 03:14:39 +0000 (19:14 -0800)]
mm: fix kernel-doc markups
Kernel-doc markups should use this format:
identifier - description
Fix some issues on mm files:
1) The definition for get_user_pages_locked() doesn't follow it. Also,
it expects a short descrpition at the header, followed by a long one,
after the parameters. Fix it.
2) Kernel-doc requires that a kernel-doc markup to be immediately below
the function prototype, as otherwise it will rename it. So, move
get_pfnblock_flags_mask() description to the right place.
3) Make invalidate_mapping_pagevec() to also follow the expected
kernel-doc format.
While here, fix a few minor English syntax issues, as suggested
by Matthew:
will used -> will be used
similar with -> similar to
Link: https://lkml.kernel.org/r/80e85dddc92d333bc2159ee8a2294921612e8745.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Suggested-by: Mattew Wilcox <willy@infradead.org> [English fixes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rui Salvaterra [Tue, 15 Dec 2020 03:14:35 +0000 (19:14 -0800)]
zram: break the strict dependency from lzo
From the beginning, the zram block device always enabled CRYPTO_LZO,
since lzo-rle is hardcoded as the fallback compression algorithm. As a
consequence, on systems where another compression algorithm is chosen
(e.g. CRYPTO_ZSTD), the lzo kernel module becomes unused, while still
having to be built/loaded.
This patch removes the hardcoded lzo-rle dependency and allows the user
to select the default compression algorithm for zram at build time. The
previous behaviour is kept, as the default algorithm is still lzo-rle.
Link: https://lkml.kernel.org/r/20201207121245.50529-1-rsalvaterra@gmail.com
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Tue, 15 Dec 2020 03:14:32 +0000 (19:14 -0800)]
zram: add stat to gather incompressible pages since zram set up
Currently, zram supports the stat via /sys/block/zram/mm_stat to represent
how many of incompressible pages are stored at the moment but it couldn't
show how many times incompressible pages were wrote down since zram set
up. It's also good indication to see how zram is effective in the system.
Link: https://lkml.kernel.org/r/20201130201907.1284910-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Tue, 15 Dec 2020 03:14:28 +0000 (19:14 -0800)]
zram: support page writeback
There is demand to writeback specific process pages to backing store
instead of all idles pages in the system due to storage wear out concerns
and to launching latency of apps which are most of the time idle but are
critical for resume latency.
This patch extends the writeback knob to support a specific page
writeback.
Link: https://lkml.kernel.org/r/20201020190506.3758660-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Tue, 15 Dec 2020 03:14:25 +0000 (19:14 -0800)]
mm/process_vm_access: remove redundant initialization of iov_r
The pointer iov_r is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.
Link: https://lkml.kernel.org/r/20201102120614.694917-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Tue, 15 Dec 2020 03:14:22 +0000 (19:14 -0800)]
mm/zsmalloc.c: rework the list_add code in insert_zspage()
Rework the list_add code to make it more readable and simple.
Link: https://lkml.kernel.org/r/20201015130107.65195-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Barry Song [Tue, 15 Dec 2020 03:14:18 +0000 (19:14 -0800)]
mm/zswap: move to use crypto_acomp API for hardware acceleration
Right now, all new ZIP drivers are adapted to crypto_acomp APIs rather
than legacy crypto_comp APIs. Tradiontal ZIP drivers like lz4,lzo etc
have been also wrapped into acomp via scomp backend. But zswap.c is still
using the old APIs. That means zswap won't be able to work on any new ZIP
drivers in kernel.
This patch moves to use cryto_acomp APIs to fix the disconnected bridge
between new ZIP drivers and zswap. It is probably the first real user to
use acomp but perhaps not a good example to demonstrate how multiple acomp
requests can be executed in parallel in one acomp instance. frontswap is
doing page load and store page by page synchronously. swap_writepage()
depends on the completion of frontswap_store() to decide if it should call
__swap_writepage() to swap to disk.
However this patch creates multiple acomp instances, so multiple threads
running on multiple different cpus can actually do (de)compression
parallelly, leveraging the power of multiple ZIP hardware queues. This is
also consistent with frontswap's page management model.
The old zswap code uses atomic context and avoids the race conditions
while shared resources like zswap_dstmem are accessed. Here since acomp
can sleep, per-cpu mutex is used to replace preemption-disable.
While it is possible to make mm/page_io.c and mm/frontswap.c support async
(de)compression in some way, the entire design requires careful thinking
and performance evaluation. For the first step, the base with fixed
connection between ZIP drivers and zswap should be built.
Link: https://lkml.kernel.org/r/20201107065332.26992-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Tue, 15 Dec 2020 03:14:15 +0000 (19:14 -0800)]
mm/zswap: fix passing zero to 'PTR_ERR' warning
Fix smatch warning:
mm/zswap.c:425 zswap_cpu_comp_prepare() warn: passing zero to 'PTR_ERR'
crypto_alloc_comp() never return NULL, use IS_ERR instead of
IS_ERR_OR_NULL to fix this.
Link: https://lkml.kernel.org/r/20201031055615.28080-1-yuehaibing@huawei.com
Fixes:
f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 15 Dec 2020 03:14:11 +0000 (19:14 -0800)]
mm/zswap: make struct kernel_param_ops definitions const
These should be const, so make it so.
Link: https://lkml.kernel.org/r/1791535ee0b00f4a5c68cc4a8adada06593ad8f1.1601770305.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Xu [Tue, 15 Dec 2020 03:14:08 +0000 (19:14 -0800)]
userfaultfd/selftests: hint the test runner on required privilege
Now userfaultfd test program requires either root or ptrace privilege due
to the signal/event tests. When UFFDIO_API failed, hint the test runner
about this fact verbosely.
Link: https://lkml.kernel.org/r/20201208024709.7701-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Xu [Tue, 15 Dec 2020 03:14:05 +0000 (19:14 -0800)]
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd_open() returns 1 for errors rather than negatives. Fix it on
all the callers so when UFFDIO_API failed the test will bail out.
Link: https://lkml.kernel.org/r/20201208024709.7701-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Xu [Tue, 15 Dec 2020 03:14:02 +0000 (19:14 -0800)]
userfaultfd/selftests: always dump something in modes
Patch series "userfaultfd: selftests: Small fixes".
Some very trivial fixes that I kept locally to userfaultfd selftest
program.
This patch (of 3):
BOUNCE_POLL is a special bit that if cleared it means "READ" instead.
Dump that too otherwise we'll see tests with empty modes.
Link: https://lkml.kernel.org/r/20201208024709.7701-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20201208024709.7701-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Rasmussen [Tue, 15 Dec 2020 03:13:58 +0000 (19:13 -0800)]
userfaultfd: selftests: make __{s,u}64 format specifiers portable
On certain platforms (powerpcle is the one on which I ran into this),
"%Ld" and "%Lu" are unsuitable for printing __s64 and __u64, respectively,
resulting in build warnings. Cast to {u,}int64_t, and use the PRI{d,u}64
macros defined in inttypes.h to print them. This ought to be portable to
all platforms.
Splitting this off into a separate macro lets us remove some lines, and
get rid of some (I would argue) stylistically odd cases where we joined
printf() and exit() into a single statement with a ,.
Finally, this also fixes a "missing braces around initializer" warning
when we initialize prms in wp_range().
[axelrasmussen@google.com: v2]
Link: https://lkml.kernel.org/r/20201203180244.1811601-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20201202211542.1121189-1-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lokesh Gidra [Tue, 15 Dec 2020 03:13:54 +0000 (19:13 -0800)]
userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled. In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.
This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.
The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests. To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.
The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero. So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland. For more details,
refer to Andrea's reply [1].
[1] https://lore.kernel.org/lkml/
20200904033438.GI9411@redhat.com/
Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lokesh Gidra [Tue, 15 Dec 2020 03:13:49 +0000 (19:13 -0800)]
userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6.
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.
It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel. For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.
This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.
The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.
[1] https://lore.kernel.org/lkml/
20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
This patch (of 2):
userfaultfd handles page faults from both user and kernel code. Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.
A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.
Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 15 Dec 2020 03:13:45 +0000 (19:13 -0800)]
mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO
CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA. It was
introduced by commit
1414c7f4f7d7 ("mm/page_poisoning.c: allow for zero
poisoning"), noting that using zeroes retains the benefit of sanitizing
content of freed pages, with the benefit of not having to zero them again
on alloc, and the downside of making some forms of corruption (stray
writes of NULLs) harder to detect than with the 0xAA pattern. Together
with CONFIG_PAGE_POISONING_NO_SANITY it made possible to sanitize the
contents on free without checking it back on alloc.
These days we have the init_on_free() option to achieve sanitization with
zeroes and to save clearing on alloc (and without checking on alloc).
Arguably if someone does choose to check the poison for corruption on
alloc, the savings of not clearing the page are secondary, and it makes
sense to always use the 0xAA poison pattern. Thus, remove the
CONFIG_PAGE_POISONING_ZERO option for being redundant.
Link: https://lkml.kernel.org/r/20201113104033.22907-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 15 Dec 2020 03:13:41 +0000 (19:13 -0800)]
mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the
poison pattern was corrupted, suggesting a use-after-free. The motivation
to introduce it in commit
8823b1dbc05f ("mm/page_poison.c: enable
PAGE_POISONING as a separate option") was to simply sanitize freed pages,
optimally together with CONFIG_PAGE_POISONING_ZERO.
These days we have an init_on_free=1 boot option, which makes this use
case of page poisoning redundant. For sanitizing, writing zeroes is
sufficient, there is pretty much no benefit from writing the 0xAA poison
pattern to freed pages, without checking it back on alloc. Thus, remove
this option and suggest init_on_free instead in the main config's help.
Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 15 Dec 2020 03:13:38 +0000 (19:13 -0800)]
kernel/power: allow hibernation with page_poison sanity checking
Page poisoning used to be incompatible with hibernation, as the state of
poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
forces CONFIG_PAGE_POISONING_NO_SANITY. For the same reason, the
poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
hibernation. The latter restriction was removed by commit
1ad1410f632d
("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
similarly for init_on_free by commit
18451f9f9e58 ("PM: hibernate: fix
crashes with init_on_free=1") by making sure free pages are cleared after
resume.
We can use the same mechanism to instead poison free pages with
PAGE_POISON after resume. This covers both zero and 0xAA patterns. Thus
we can remove the Kconfig restriction that disables page poison sanity
checking when hibernation is enabled.
Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernation]
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 15 Dec 2020 03:13:34 +0000 (19:13 -0800)]
mm, page_poison: use static key more efficiently
Commit
11c9c7edae06 ("mm/page_poison.c: replace bool variable with static
key") changed page_poisoning_enabled() to a static key check. However,
the function is not inlined, so each check still involves a function call
with overhead not eliminated when page poisoning is disabled.
Analogically to how debug_pagealloc is handled, this patch converts
page_poisoning_enabled() back to boolean check, and introduces
page_poisoning_enabled_static() for fast paths. Both functions are
inlined.
The function kernel_poison_pages() is also called unconditionally and does
the static key check inside. Remove it from there and put it to callers.
Also split it to two functions kernel_poison_pages() and
kernel_unpoison_pages() instead of the confusing bool parameter.
Also optimize the check that enables page poisoning instead of
debug_pagealloc for architectures without proper debug_pagealloc support.
Move the check to init_mem_debugging_and_hardening() to enable a single
static key instead of having two static branches in
page_poisoning_enabled_static().
Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 15 Dec 2020 03:13:30 +0000 (19:13 -0800)]
mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
Patch series "cleanup page poisoning", v3.
I have identified a number of issues and opportunities for cleanup with
CONFIG_PAGE_POISON and friends:
- interaction with init_on_alloc and init_on_free parameters depends on
the order of parameters (Patch 1)
- the boot time enabling uses static key, but inefficienty (Patch 2)
- sanity checking is incompatible with hibernation (Patch 3)
- CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
init_on_free (Patch 4)
- CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
have init_on_free (Patch 5)
This patch (of 5):
Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
produces a warning in dmesg that page_poison takes precedence. However,
as these warnings are printed in early_param handlers for
init_on_alloc/free, they are not printed if page_poison is enabled later
on the command line (handlers are called in the order of their
parameters), or when init_on_alloc/free is always enabled by the
respective config option - before the page_poison early param handler is
called, it is not considered to be enabled. This is inconsistent.
We can remove the dependency on order by making the init_on_* parameters
only set a boolean variable, and postponing the evaluation after all early
params have been processed. Introduce a new
init_mem_debugging_and_hardening() function for that, and move the related
debug_pagealloc processing there as well.
As a result init_mem_debugging_and_hardening() knows always accurately if
init_on_* and/or page_poison options were enabled. Thus we can also
optimize want_init_on_alloc() and want_init_on_free(). We don't need to
check page_poisoning_enabled() there, we can instead not enable the
init_on_* static keys at all, if page poisoning is enabled. This results
in a simpler and more effective code.
Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Laura Abbott <labbott@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Charan Teja Reddy [Tue, 15 Dec 2020 03:13:26 +0000 (19:13 -0800)]
mm: cma: improve pr_debug log in cma_release()
It is required to print 'count' of pages, along with the pages, passed to
cma_release to debug the cases of mismatched count value passed between
cma_alloc() and cma_release() from a code path.
As an example, consider the below scenario:
1) CMA pool size is 4MB and
2) User doing the erroneous step of allocating 2 pages but freeing 1
page in a loop from this CMA pool. The step 2 causes cma_alloc() to
return NULL at one point of time because of -ENOMEM condition.
And the current pr_debug logs is not giving the info about these types of
allocation patterns because of count value not being printed in
cma_release().
We are printing the count value in the trace logs, just extend the same to
pr_debug logs too.
[akpm@linux-foundation.org: fix printk warning]
Link: https://lkml.kernel.org/r/1606318341-29521-1-git-send-email-charante@codeaurora.org
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lecopzer Chen [Tue, 15 Dec 2020 03:13:23 +0000 (19:13 -0800)]
mm/cma.c: remove redundant cma_mutex lock
The cma_mutex which protects alloc_contig_range() was first appeared in
commit
7ee793a62fa8c ("cma: Remove potential deadlock situation"), at that
time, there is no guarantee the behavior of concurrency inside
alloc_contig_range().
After commit
2c7452a075d4db2dc ("mm/page_isolation.c: make
start_isolate_page_range() fail if already isolated")
> However, two subsystems (CMA and gigantic
> huge pages for example) could attempt operations on the same range. If
> this happens, one thread may 'undo' the work another thread is doing.
> This can result in pageblocks being incorrectly left marked as
> MIGRATE_ISOLATE and therefore not available for page allocation.
The concurrency inside alloc_contig_range() was clarified.
Now we can find that hugepage and virtio call alloc_contig_range() without
any lock, thus cma_mutex is "redundant" in cma_alloc() now.
Link: https://lkml.kernel.org/r/20201020102241.3729-1-lecopzer.chen@mediatek.com
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Zhang [Tue, 15 Dec 2020 03:13:20 +0000 (19:13 -0800)]
mm: migrate: remove unused parameter in migrate_vma_insert_page()
"dst" parameter to migrate_vma_insert_page() is not used anymore.
Link: https://lkml.kernel.org/r/CANubcdUwCAMuUyamG2dkWP=cqSR9MAS=tHLDc95kQkqU-rEnAg@mail.gmail.com
Signed-off-by: Stephen Zhang <starzhangzsd@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 15 Dec 2020 03:13:16 +0000 (19:13 -0800)]
mm: migrate: return -ENOSYS if THP migration is unsupported
In the current implementation unmap_and_move() would return -ENOMEM if THP
migration is unsupported, then the THP will be split. If split is failed
just exit without trying to migrate other pages. It doesn't make too much
sense since there may be enough free memory to migrate other pages and
there may be a lot base pages on the list.
Return -ENOSYS to make consistent with hugetlb. And if THP split is
failed just skip and try other pages on the list.
Just skip the whole list and exit when free memory is really low.
Link: https://lkml.kernel.org/r/20201113205359.556831-6-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 15 Dec 2020 03:13:13 +0000 (19:13 -0800)]
mm: migrate: clean up migrate_prep{_local}
The migrate_prep{_local} never fails, so it is pointless to have return
value and check the return value.
Link: https://lkml.kernel.org/r/20201113205359.556831-5-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 15 Dec 2020 03:13:09 +0000 (19:13 -0800)]
mm: migrate: skip shared exec THP for NUMA balancing
The NUMA balancing skip shared exec base page. Since
CONFIG_READ_ONLY_THP_FOR_FS was introduced, there are probably shared exec
THP, so skip such THPs for NUMA balancing as well.
And Willy's regular filesystem THP support patches could create shared
exec THP wven without that config.
In addition, the page_is_file_lru() is used to tell if the page is file
cache or not, but it filters out shmem page. It sounds like a typical
usecase by putting executables in shmem to achieve performance gain via
using shmem-THP, so it sounds worth skipping migration for such case too.
Link: https://lkml.kernel.org/r/20201113205359.556831-4-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 15 Dec 2020 03:13:06 +0000 (19:13 -0800)]
mm: migrate: simplify the logic for handling permanent failure
When unmap_and_move{_huge_page}() returns !-EAGAIN and
!MIGRATEPAGE_SUCCESS, the page would be put back to LRU or proper list if
it is non-LRU movable page. But, the callers always call
putback_movable_pages() to put the failed pages back later on, so it seems
not very efficient to put every single page back immediately, and the code
looks convoluted.
Put the failed page on a separate list, then splice the list to migrate
list when all pages are tried. It is the caller's responsibility to call
putback_movable_pages() to handle failures. This also makes the code
simpler and more readable.
After the change the rules are:
* Success: non hugetlb page will be freed, hugetlb page will be put
back
* -EAGAIN: stay on the from list
* -ENOMEM: stay on the from list
* Other errno: put on ret_pages list then splice to from list
The from list would be empty iff all pages are migrated successfully, it
was not so before. This has no impact to current existing callsites.
Link: https://lkml.kernel.org/r/20201113205359.556831-3-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 15 Dec 2020 03:13:02 +0000 (19:13 -0800)]
mm: truncate_complete_page() does not exist any more
Patch series "mm: misc migrate cleanup and improvement", v3.
This patch (of 5):
The commit
9f4e41f4717832e ("mm: refactor truncate_complete_page()")
refactored truncate_complete_page(), and it is not existed anymore,
correct the comment in vmscan and migrate to avoid confusion.
Link: https://lkml.kernel.org/r/20201113205359.556831-1-shy828301@gmail.com
Link: https://lkml.kernel.org/r/20201113205359.556831-2-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox (Oracle) [Tue, 15 Dec 2020 03:12:59 +0000 (19:12 -0800)]
mm: support THPs in zero_user_segments
We can only kmap() one subpage of a THP at a time, so loop over all
relevant subpages, skipping ones which don't need to be zeroed. This is
too large to inline when THPs are enabled and we actually need highmem, so
put it in highmem.c.
[willy@infradead.org: start1 was allowed to be less than start2]
Link: https://lkml.kernel.org/r/20201124041507.28996-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralph Campbell [Tue, 15 Dec 2020 03:12:55 +0000 (19:12 -0800)]
mm/migrate.c: optimize migrate_vma_pages() mmu notifier
When migrating a zero page or pte_none() anonymous page to device private
memory, migrate_vma_setup() will initialize the src[] array with a NULL
PFN. This lets the device driver allocate device private memory and clear
it instead of DMAing a page of zeros over the device bus.
Since the source page didn't exist at the time, no struct page was locked
nor a migration PTE inserted into the CPU page tables. The actual PTE
insertion happens in migrate_vma_pages() when it tries to insert the
device private struct page PTE into the CPU page tables.
migrate_vma_pages() has to call the mmu notifiers again since another
device could fault on the same page before the page table locks are
acquired.
Allow device drivers to optimize the invalidation similar to
migrate_vma_setup() by calling mmu_notifier_range_init() which sets struct
mmu_notifier_range event type to MMU_NOTIFY_MIGRATE and the
migrate_pgmap_owner field.
Link: https://lkml.kernel.org/r/20201021191335.10916-1-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Long Li [Tue, 15 Dec 2020 03:12:52 +0000 (19:12 -0800)]
mm/migrate.c: fix comment spelling
The word in the comment is misspelled, it should be "include".
Link: https://lkml.kernel.org/r/20201024114144.GA20552@lilong
Signed-off-by: Long Li <lonuxli.64@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Su [Tue, 15 Dec 2020 03:12:49 +0000 (19:12 -0800)]
mm/oom_kill: change comment and rename is_dump_unreclaim_slabs()
Change the comment of is_dump_unreclaim_slabs(), it just check whether
nr_unreclaimable slabs amount is greater than user memory, and explain why
we dump unreclaim slabs.
Rename it to should_dump_unreclaim_slab() maybe better.
Link: https://lkml.kernel.org/r/20201030182704.GA53949@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Su [Tue, 15 Dec 2020 03:12:46 +0000 (19:12 -0800)]
mm/compaction: make defer_compaction and compaction_deferred static
defer_compaction() and compaction_deferred() and compaction_restarting()
in mm/compaction.c won't be used in other files, so make them static, and
remove the declaration in the header file.
Take the chance to fix a typo.
Link: https://lkml.kernel.org/r/20201123170801.GA9625@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Su [Tue, 15 Dec 2020 03:12:42 +0000 (19:12 -0800)]
mm/compaction: move compaction_suitable's comment to right place
Since commit
837d026d560c ("mm/compaction: more trace to understand
when/why compaction start/finish"), the comment place is not suitable.
So move compaction_suitable's comment to right place.
Link: https://lkml.kernel.org/r/20201116144121.GA385717@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yanfei Xu [Tue, 15 Dec 2020 03:12:39 +0000 (19:12 -0800)]
mm/compaction: rename 'start_pfn' to 'iteration_start_pfn' in compact_zone()
There are two 'start_pfn' declared in compact_zone() which have different
meanings. Rename the second one to 'iteration_start_pfn' to prevent
confusion.
Also, remove an useless semicolon.
Link: https://lkml.kernel.org/r/20201019115044.1571-1-yanfei.xu@windriver.com
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 15 Dec 2020 03:12:36 +0000 (19:12 -0800)]
z3fold: remove preempt disabled sections for RT
Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take
spinlocks that become sleeping locks.
Signed-off-by Mike Galbraith <efault@gmx.de>
Link: https://lkml.kernel.org/r/20201209145151.18994-3-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 15 Dec 2020 03:12:33 +0000 (19:12 -0800)]
z3fold: stricter locking and more careful reclaim
Use temporary slots in reclaim function to avoid possible race when
freeing those.
While at it, make sure we check CLAIMED flag under page lock in the
reclaim function to make sure we are not racing with z3fold_alloc().
Link: https://lkml.kernel.org/r/20201209145151.18994-4-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 15 Dec 2020 03:12:30 +0000 (19:12 -0800)]
z3fold: simplify freeing slots
Patch series "z3fold: stability / rt fixes".
Address z3fold stability issues under stress load, primarily in the
reclaim and free aspects. Besides, it fixes the locking problems that
were only seen in real-time kernel configuration.
This patch (of 3):
There used to be two places in the code where slots could be freed, namely
when freeing the last allocated handle from the slots and when releasing
the z3fold header these slots aree linked to. The logic to decide on
whether to free certain slots was complicated and error prone in both
functions and it led to failures in RT case.
To fix that, make free_handle() the single point of freeing slots.
Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com
Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Tue, 15 Dec 2020 03:12:27 +0000 (19:12 -0800)]
mm/page_isolation: do not isolate the max order page
A max order page has no buddy page and never merges to another order. So
isolating and then freeing it is pointless.
Link: https://lkml.kernel.org/r/20201202122114.75316-1-songmuchun@bytedance.com
Fixes:
3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
logic.yu [Tue, 15 Dec 2020 03:12:21 +0000 (19:12 -0800)]
mm/vmscan.c: remove the filename in the top of file comment
No point in having the filename inside the file.
Link: https://lkml.kernel.org/r/20201115141541.3878-1-hymmsx.yu@gmail.com
Signed-off-by: logic.yu <hymmsx.yu@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lukas Bulwahn [Tue, 15 Dec 2020 03:12:18 +0000 (19:12 -0800)]
mm/vmscan: drop unneeded assignment in kswapd()
The refactoring to kswapd() in commit
e716f2eb24de ("mm, vmscan: prevent
kswapd sleeping prematurely due to mismatched classzone_idx") turned an
assignment to reclaim_order into a dead store, as in all further paths,
reclaim_order will be assigned again before it is used.
make clang-analyzer on x86_64 tinyconfig caught my attention with:
mm/vmscan.c: warning: Although the value stored to 'reclaim_order' is used in the enclosing expression, the value is never actually read from 'reclaim_order' [clang-analyzer-deadcode.DeadStores]
Compilers will detect this unneeded assignment and optimize this anyway.
So, the resulting binary is identical before and after this change.
Simplify the code and remove unneeded assignment to make clang-analyzer
happy.
No functional change. No change in binary code.
Link: https://lkml.kernel.org/r/20201004125827.17679-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 15 Dec 2020 03:12:15 +0000 (19:12 -0800)]
mm: don't wake kswapd prematurely when watermark boosting is disabled
On 2-node NUMA hosts we see bursts of kswapd reclaim and subsequent
pressure spikes and stalls from cache refaults while there is plenty of
free memory in the system.
Usually, kswapd is woken up when all eligible nodes in an allocation are
full. But the code related to watermark boosting can wake kswapd on one
full node while the other one is mostly empty. This may be justified to
fight fragmentation, but is currently unconditionally done whether
watermark boosting is occurring or not.
In our case, many of our workloads' throughput scales with available
memory, and pure utilization is a more tangible concern than trends
around longer-term fragmentation. As a result we generally disable
watermark boosting.
Wake kswapd only woken when watermark boosting is requested.
Link: https://lkml.kernel.org/r/20201020175833.397286-1-hannes@cmpxchg.org
Fixes:
1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 15 Dec 2020 03:12:11 +0000 (19:12 -0800)]
hugetlb: fix an error code in hugetlb_reserve_pages()
Preserve the error code from region_add() instead of returning success.
Link: https://lkml.kernel.org/r/X9NGZWnZl5/Mt99R@mwanda
Fixes:
0db9d74ed884 ("hugetlb: disable region_add file_region coalescing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 15 Dec 2020 03:12:08 +0000 (19:12 -0800)]
mm,hugetlb: remove unneeded initialization
hugetlb_add_hstate initializes nr_huge_pages and free_huge_pages to 0, but
since hstates[] is a global variable, all its fields are defined to 0
already.
Link: https://lkml.kernel.org/r/20201119112141.6452-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu Xiang [Tue, 15 Dec 2020 03:12:05 +0000 (19:12 -0800)]
mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()
On 64-bit machine, delta variable in hugetlb_acct_memory() may be larger
than 0xffffffff, but gather_surplus_pages() can only use the low 32-bit
value now. So we need to fix type of delta parameter and related local
variables in gather_surplus_pages().
Link: https://lkml.kernel.org/r/1605793733-3573-1-git-send-email-liu.xiang@zlingsmart.com
Reported-by: Ma Chenggong <ma.chenggong@zlingsmart.com>
Signed-off-by: Liu Xiang <liu.xiang@zlingsmart.com>
Signed-off-by: Pan Jiagen <pan.jiagen@zlingsmart.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Liu Xiang <liuxiang_1999@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Shi [Tue, 15 Dec 2020 03:12:01 +0000 (19:12 -0800)]
khugepaged: add parameter explanations for kernel-doc markup
Add missed parameter explanation for some kernel-doc warnings:
mm/khugepaged.c:102: warning: Function parameter or member 'nr_pte_mapped_thp' not described in 'mm_slot'
mm/khugepaged.c:102: warning: Function parameter or member 'pte_mapped_thp' not described in 'mm_slot'
mm/khugepaged.c:1424: warning: Function parameter or member 'mm' not described in 'collapse_pte_mapped_thp'
mm/khugepaged.c:1424: warning: Function parameter or member 'addr' not described in 'collapse_pte_mapped_thp'
mm/khugepaged.c:1626: warning: Function parameter or member 'mm' not described in 'collapse_file'
mm/khugepaged.c:1626: warning: Function parameter or member 'file' not described in 'collapse_file'
mm/khugepaged.c:1626: warning: Function parameter or member 'start' not described in 'collapse_file'
mm/khugepaged.c:1626: warning: Function parameter or member 'hpage' not described in 'collapse_file'
mm/khugepaged.c:1626: warning: Function parameter or member 'node' not described in 'collapse_file'
Link: https://lkml.kernel.org/r/1605597325-25284-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralph Campbell [Tue, 15 Dec 2020 03:11:58 +0000 (19:11 -0800)]
include/linux/huge_mm.h: remove extern keyword
The external function definitions don't need the "extern" keyword. Remove
them so future changes don't copy the function definition style.
Link: https://lkml.kernel.org/r/20201106235135.32109-1-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Su [Tue, 15 Dec 2020 03:11:55 +0000 (19:11 -0800)]
mm/hugetlb.c: just use put_page_testzero() instead of page_count()
We test the page reference count is zero or not here, it can be a bug here
if page refercence count is not zero. So we can just use
put_page_testzero() instead of page_count().
Link: https://lkml.kernel.org/r/20201007170949.GA6416@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 15 Dec 2020 03:11:51 +0000 (19:11 -0800)]
mm,hwpoison: return -EBUSY when migration fails
Currently, we return -EIO when we fail to migrate the page.
Migrations' failures are rather transient as they can happen due to
several reasons, e.g: high page refcount bump, mapping->migrate_page
failing etc. All meaning that at that time the page could not be
migrated, but that has nothing to do with an EIO error.
Let us return -EBUSY instead, as we do in case we failed to isolate the
page.
While are it, let us remove the "ret" print as its value does not change.
Link: https://lkml.kernel.org/r/20201209092818.30417-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 15 Dec 2020 03:11:48 +0000 (19:11 -0800)]
mm,memory_failure: always pin the page in madvise_inject_error
madvise_inject_error() uses get_user_pages_fast to translate the address
we specified to a page. After [1], we drop the extra reference count for
memory_failure() path. That commit says that memory_failure wanted to
keep the pin in order to take the page out of circulation.
The truth is that we need to keep the page pinned, otherwise the page
might be re-used after the put_page() and we can end up messing with
someone else's memory.
E.g:
CPU0
process X CPU1
madvise_inject_error
get_user_pages
put_page
page gets reclaimed
process Y allocates the page
memory_failure
// We mess with process Y memory
madvise() is meant to operate on a self address space, so messing with
pages that do not belong to us seems the wrong thing to do.
To avoid that, let us keep the page pinned for memory_failure as well.
Pages for DAX mappings will release this extra refcount in
memory_failure_dev_pagemap.
[1] ("
23e7b5c2e271: mm, madvise_inject_error:
Let memory_failure() optionally take a page reference")
Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
Fixes:
23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 15 Dec 2020 03:11:45 +0000 (19:11 -0800)]
mm,hwpoison: remove drain_all_pages from shake_page
get_hwpoison_page already drains pcplists, previously disabling them when
trying to grab a refcount. We do not need shake_page to take care of it
anymore.
Link: https://lkml.kernel.org/r/20201204102558.31607-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Qian Cai <qcai@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 15 Dec 2020 03:11:41 +0000 (19:11 -0800)]
mm,hwpoison: disable pcplists before grabbing a refcount
Currently, we have a sort of retry mechanism to make sure pages in
pcp-lists are spilled to the buddy system, so we can handle those.
We can save us this extra checks with the new disable-pcplist mechanism
that is available with [1].
zone_pcplist_disable makes sure to 1) disable pcplists, so any page that
is freed up from that point onwards will end up in the buddy system and 2)
drain pcplists, so those pages that already in pcplists are spilled to
buddy.
With that, we can make a common entry point for grabbing a refcount from
both soft_offline and memory_failure paths that is guarded by
zone_pcplist_disable/zone_pcplist_enable.
[1] https://patchwork.kernel.org/project/linux-mm/cover/
20201111092812.11329-1-vbabka@suse.cz/
Link: https://lkml.kernel.org/r/20201204102558.31607-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Qian Cai <qcai@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>