Linus Torvalds [Thu, 26 Jan 2023 18:05:39 +0000 (10:05 -0800)]
treewide: fix up files incorrectly marked executable
[ Upstream commit
262b42e02d1e0b5ad1b33e9b9842e178c16231de ]
I'm not exactly clear on what strange workflow causes people to do it,
but clearly occasionally some files end up being committed as executable
even though they clearly aren't.
This is a reprise of commit
90fda63fa115 ("treewide: fix up files
incorrectly marked executable"), just with a different set of files (but
with the same trivial shell scripting).
So apparently we need to re-do this every five years or so, and Joe
needs to just keep reminding me to do so ;)
Reported-by: Joe Perches <joe@perches.com>
Fixes:
523375c943e5 ("drm/vmwgfx: Port vmwgfx to arm64")
Fixes:
5c439937775d ("ASoC: codecs: add support for ES8326")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ming Lei [Thu, 26 Jan 2023 11:53:46 +0000 (19:53 +0800)]
block: ublk: move ublk_chr_class destroying after devices are removed
[ Upstream commit
8e4ff684762b6503db45e8906e258faee080c336 ]
The 'ublk_chr_class' is needed when deleting ublk char devices in
ublk_exit(), so move it after devices(idle) are removed.
Fixes the following warning reported by Harris, James R:
[ 859.178950] sysfs group 'power' not found for kobject 'ublkc0'
[ 859.178962] WARNING: CPU: 3 PID: 1109 at fs/sysfs/group.c:278 sysfs_remove_group+0x9c/0xb0
Reported-by: "Harris, James R" <james.r.harris@intel.com>
Fixes:
71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Link: https://lore.kernel.org/linux-block/Y9JlFmSgDl3+zy3N@T590/T/#t
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
Link: https://lore.kernel.org/r/20230126115346.263344-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Robin Murphy [Mon, 23 Jan 2023 18:30:38 +0000 (18:30 +0000)]
Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
[ Upstream commit
a428eb4b99ab80454f06ad256b25e930fe8a4954 ]
It turns out the optimisation implemented by commit
4f2c3872dde5 is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.
If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.
Fixes:
4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b41bb4ed7283c3d8400ce5cf5e6ec94915e6750f.1674498637.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Tue, 24 Jan 2023 10:11:57 +0000 (11:11 +0100)]
net: mdio-mux-meson-g12a: force internal PHY off on mux switch
[ Upstream commit
7083df59abbc2b7500db312cac706493be0273ff ]
Force the internal PHY off then on when switching to the internal path.
This fixes problems where the PHY ID is not properly set.
Fixes:
7090425104db ("net: phy: add amlogic g12a mdio mux support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Gerhard Engleder [Tue, 24 Jan 2023 19:14:40 +0000 (20:14 +0100)]
tsnep: Fix TX queue stop/wake for multiple queues
[ Upstream commit
3d53aaef4332245044b2f3688ac0ea10436c719c ]
netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok
as long as only a single TX queue is supported. But support for multiple
TX queues was introduced with
762031375d5c and I missed to adapt stop
and wake of TX queues.
Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific
TX queue.
Fixes:
762031375d5c ("tsnep: Support multiple TX/RX queue pairs")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
David Christensen [Tue, 24 Jan 2023 18:53:39 +0000 (13:53 -0500)]
net/tg3: resolve deadlock in tg3_reset_task() during EEH
[ Upstream commit
6c4ca03bd890566d873e3593b32d034bf2f5a087 ]
During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:
crash> foreach UN bt
...
PID: 159 TASK:
c0000000067c6000 CPU: 8 COMMAND: "eehd"
...
#5 [
c00000000681f990] __cancel_work_timer at
c00000000019fd18
#6 [
c00000000681fa30] tg3_io_error_detected at
c00800000295f098 [tg3]
#7 [
c00000000681faf0] eeh_report_error at
c00000000004e25c
...
PID: 290 TASK:
c000000036e5f800 CPU: 6 COMMAND: "kworker/6:1"
...
#4 [
c00000003721fbc0] rtnl_lock at
c000000000c940d8
#5 [
c00000003721fbe0] tg3_reset_task at
c008000002969358 [tg3]
#6 [
c00000003721fc60] process_one_work at
c00000000019e5c4
...
PID: 296 TASK:
c000000037a65800 CPU: 21 COMMAND: "kworker/21:1"
...
#4 [
c000000037247bc0] rtnl_lock at
c000000000c940d8
#5 [
c000000037247be0] tg3_reset_task at
c008000002969358 [tg3]
#6 [
c000000037247c60] process_one_work at
c00000000019e5c4
...
PID: 655 TASK:
c000000036f49000 CPU: 16 COMMAND: "kworker/16:2"
...:1
#4 [
c0000000373ebbc0] rtnl_lock at
c000000000c940d8
#5 [
c0000000373ebbe0] tg3_reset_task at
c008000002969358 [tg3]
#6 [
c0000000373ebc60] process_one_work at
c00000000019e5c4
...
Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks. If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.
Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:
crash> foreach UN bt
PID: 159 TASK:
c0000000067d2000 CPU: 9 COMMAND: "eehd"
...
#4 [
c000000006867a60] rtnl_lock at
c000000000c940d8
#5 [
c000000006867a80] tg3_io_slot_reset at
c0080000026c2ea8 [tg3]
#6 [
c000000006867b00] eeh_report_reset at
c00000000004de88
...
PID: 363 TASK:
c000000037564000 CPU: 6 COMMAND: "kworker/6:1"
...
#3 [
c000000036c1bb70] msleep at
c000000000259e6c
#4 [
c000000036c1bba0] napi_disable at
c000000000c6b848
#5 [
c000000036c1bbe0] tg3_reset_task at
c0080000026d942c [tg3]
#6 [
c000000036c1bc60] process_one_work at
c00000000019e5c4
...
This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.
Fixes:
db84bf43ef23 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ley Foon Tan [Thu, 5 Jan 2023 03:37:05 +0000 (11:37 +0800)]
riscv: Move call to init_cpu_topology() to later initialization stage
[ Upstream commit
c1d6105869464635d8a2bcf87a43c05f4c0cfca4 ]
If "capacity-dmips-mhz" is present in a CPU DT node,
topology_parse_cpu_capacity() will fail to allocate memory. arm64, with
which this code path is shared, does not call
topology_parse_cpu_capacity() until later in boot where memory
allocation is available. While "capacity-dmips-mhz" is not yet a valid
property on RISC-V, invalid properties should be ignored rather than
cause issues. Move init_cpu_topology(), which calls
topology_parse_cpu_capacity(), to a later initialization stage, to match
arm64.
As a side effect of this change, RISC-V is "protected" from changes to
core topology code that would work on arm64 where memory allocation is
safe but on RISC-V isn't.
Fixes:
03f11f03dbfe ("RISC-V: Parse cpu topology during boot.")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Link: https://lore.kernel.org/r/20230105033705.3946130-1-leyfoon.tan@starfivetech.com
[Palmer: use Conor's commit text]
Link: https://lore.kernel.org/linux-riscv/20230104183033.755668-1-pierre.gondois@arm.com/T/#me592d4c8b9508642954839f0077288a353b0b9b2
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Rafael J. Wysocki [Wed, 25 Jan 2023 12:17:42 +0000 (13:17 +0100)]
thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()
[ Upstream commit
acd7e9ee57c880b99671dd99680cb707b7b5b0ee ]
In order to prevent int340x_thermal_get_trip_type() from possibly
racing with int340x_thermal_read_trips() invoked by int3403_notify()
add locking to it in analogy with int340x_thermal_get_trip_temp().
Fixes:
6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jeremy Kerr [Tue, 24 Jan 2023 02:01:06 +0000 (10:01 +0800)]
net: mctp: mark socks as dead on unhash, prevent re-add
[ Upstream commit
b98e1a04e27fddfdc808bf46fe78eca30db89ab3 ]
Once a socket has been unhashed, we want to prevent it from being
re-used in a sk_key entry as part of a routing operation.
This change marks the sk as SOCK_DEAD on unhash, which prevents addition
into the net's key list.
We need to do this during the key add path, rather than key lookup, as
we release the net keys_lock between those operations.
Fixes:
4a992bbd3650 ("mctp: Implement message fragmentation & reassembly")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni [Tue, 24 Jan 2023 02:01:05 +0000 (10:01 +0800)]
net: mctp: hold key reference when looking up a general key
[ Upstream commit
6e54ea37e344f145665c2dc3cc534b92529e8de5 ]
Currently, we have a race where we look up a sock through a "general"
(ie, not directly associated with the (src,dest,tag) tuple) key, then
drop the key reference while still holding the key's sock.
This change expands the key reference until we've finished using the
sock, and hence the sock reference too.
Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Fixes:
73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jeremy Kerr [Tue, 24 Jan 2023 02:01:04 +0000 (10:01 +0800)]
net: mctp: move expiry timer delete to unhash
[ Upstream commit
5f41ae6fca9d40ab3cb9b0507931ef7a9b3ea50b ]
Currently, we delete the key expiry timer (in sk->close) before
unhashing the sk. This means that another thread may find the sk through
its presence on the key list, and re-queue the timer.
This change moves the timer deletion to the unhash, after we have made
the key no longer observable, so the timer cannot be re-queued.
Fixes:
7b14e15ae6f4 ("mctp: Implement a timeout for tags")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jeremy Kerr [Tue, 24 Jan 2023 02:01:03 +0000 (10:01 +0800)]
net: mctp: add an explicit reference from a mctp_sk_key to sock
[ Upstream commit
de8a6b15d9654c3e4f672d76da9d9df8ee06331d ]
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
through the sock hash/unhash operations, but this is pretty tenuous, and
there are cases where we may have a temporary reference to an unhashed
sk.
This change makes the reference more explicit, by adding a hold on the
sock when it's associated with a mctp_sk_key, released on final key
unref.
Fixes:
73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yoshihiro Shimoda [Tue, 24 Jan 2023 00:02:11 +0000 (09:02 +0900)]
net: ravb: Fix possible hang if RIS2_QFF1 happen
[ Upstream commit
f3c07758c9007a6bfff5290d9e19d3c41930c897 ]
Since this driver enables the interrupt by RIC2_QFE1, this driver
should clear the interrupt flag if it happens. Otherwise, the interrupt
causes to hang the system.
Note that this also fix a minor coding style (a comment indentation)
around the fixed code.
Fixes:
c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yoshihiro Shimoda [Tue, 24 Jan 2023 00:02:10 +0000 (09:02 +0900)]
net: ravb: Fix lack of register setting after system resumed for Gen3
[ Upstream commit
c2b6cdee1d13ffbb24baca3c9b8a572d6b541e4e ]
After system entered Suspend to RAM, registers setting of this
hardware is reset because the SoC will be turned off. On R-Car Gen3
(info->ccc_gac), ravb_ptp_init() is called in ravb_probe() only. So,
after system resumed, it lacks of the initial settings for ptp. So,
add ravb_ptp_{init,stop}() into ravb_{resume,suspend}().
Fixes:
f5d7837f96e5 ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nikita Shubin [Wed, 25 Jan 2023 08:30:24 +0000 (11:30 +0300)]
gpio: ep93xx: Fix port F hwirq numbers in handler
[ Upstream commit
0f04cdbdb210000a97c773b28b598fa8ac3aafa4 ]
Fix wrong translation of irq numbers in port F handler, as ep93xx hwirqs
increased by 1, we should simply decrease them by 1 in translation.
Fixes:
482c27273f52 ("ARM: ep93xx: renumber interrupts")
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Dan Carpenter [Tue, 24 Jan 2023 15:20:26 +0000 (18:20 +0300)]
gpio: mxc: Unlock on error path in mxc_flip_edge()
[ Upstream commit
37870358616ca7fdb1e90ad1cdd791655ec54414 ]
We recently added locking to this function but one error path was
over looked. Drop the lock before returning.
Fixes:
e5464277625c ("gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Keith Busch [Tue, 24 Jan 2023 21:29:14 +0000 (13:29 -0800)]
nvme: fix passthrough csi check
[ Upstream commit
85eee6341abb81ac6a35062ffd5c3029eb53be6b ]
The namespace head saves the Command Set Indicator enum, so use that
instead of the Command Set Selected. The two values are not the same.
Fixes:
831ed60c2aca2d ("nvme: also return I/O command effects from nvme_command_effects")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Liao Chang [Mon, 16 Jan 2023 06:43:42 +0000 (14:43 +0800)]
riscv/kprobe: Fix instruction simulation of JALR
[ Upstream commit
ca0254998be4d74cf6add70ccfab0d2dbd362a10 ]
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following
crash:
[ 32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address
00aaaaaad77b1170
[ 32.093115] Oops [#1]
[ 32.093251] Modules linked in:
[ 32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16
[ 32.093985] Hardware name: riscv-virtio,qemu (DT)
[ 32.094280] epc : ksys_read+0x88/0xd6
[ 32.094855] ra : ksys_read+0xc0/0xd6
[ 32.095016] epc :
ffffffff801cda80 ra :
ffffffff801cdab8 sp :
ff20000000d7bdc0
[ 32.095227] gp :
ffffffff80f14000 tp :
ff60000080f9cb40 t0 :
ffffffff80f13e80
[ 32.095500] t1 :
ffffffff8000c29c t2 :
ffffffff800dbc54 s0 :
ff20000000d7be60
[ 32.095716] s1 :
0000000000000000 a0 :
ffffffff805a64ae a1 :
ffffffff80a83708
[ 32.095921] a2 :
ffffffff80f160a0 a3 :
0000000000000000 a4 :
f229b0afdb165300
[ 32.096171] a5 :
f229b0afdb165300 a6 :
ffffffff80eeebd0 a7 :
00000000000003ff
[ 32.096411] s2 :
ff6000007ff76800 s3 :
fffffffffffffff7 s4 :
00aaaaaad77b1170
[ 32.096638] s5 :
ffffffff80f160a0 s6 :
ff6000007ff76800 s7 :
0000000000000030
[ 32.096865] s8 :
00ffffffc3d97be0 s9 :
0000000000000007 s10:
00aaaaaad77c9410
[ 32.097092] s11:
0000000000000000 t3 :
ffffffff80f13e48 t4 :
ffffffff8000c29c
[ 32.097317] t5 :
ffffffff8000c29c t6 :
ffffffff800dbc54
[ 32.097505] status:
0000000200000120 badaddr:
00aaaaaad77b1170 cause:
000000000000000d
[ 32.098011] [<
ffffffff801cdb72>] ksys_write+0x6c/0xd6
[ 32.098222] [<
ffffffff801cdc06>] sys_write+0x2a/0x38
[ 32.098405] [<
ffffffff80003c76>] ret_from_syscall+0x0/0x2
Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)',
hence it requires obtaining the target address from rs1 followed by
updating rd.
Fixes:
c22b0bcb1dd0 ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com
[Palmer: Pick Guo's cleanup]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Marcelo Ricardo Leitner [Mon, 23 Jan 2023 17:59:33 +0000 (14:59 -0300)]
sctp: fail if no bound addresses can be used for a given scope
[ Upstream commit
458e279f861d3f61796894cd158b780765a1569f ]
Currently, if you bind the socket to something like:
servaddr.sin6_family = AF_INET6;
servaddr.sin6_port = htons(0);
servaddr.sin6_scope_id = 0;
inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);
And then request a connect to:
connaddr.sin6_family = AF_INET6;
connaddr.sin6_port = htons(20000);
connaddr.sin6_scope_id = if_nametoindex("lo");
inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);
What the stack does is:
- bind the socket
- create a new asoc
- to handle the connect
- copy the addresses that can be used for the given scope
- try to connect
But the copy returns 0 addresses, and the effect is that it ends up
trying to connect as if the socket wasn't bound, which is not the
desired behavior. This unexpected behavior also allows KASLR leaks
through SCTP diag interface.
The fix here then is, if when trying to copy the addresses that can
be used for the scope used in connect() it returns 0 addresses, bail
out. This is what TCP does with a similar reproducer.
Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Mon, 23 Jan 2023 08:45:52 +0000 (08:45 +0000)]
net/sched: sch_taprio: do not schedule in taprio_reset()
[ Upstream commit
ea4fdbaa2f7798cb25adbe4fd52ffc6356f097bb ]
As reported by syzbot and hinted by Vinicius, I should not have added
a qdisc_synchronize() call in taprio_reset()
taprio_reset() can be called with qdisc spinlock held (and BH disabled)
as shown in included syzbot report [1].
Only taprio_destroy() needed this synchronization, as explained
in the blamed commit changelog.
[1]
BUG: scheduling while atomic: syz-executor150/5091/0x00000202
2 locks held by syz-executor150/5091:
Modules linked in:
Preemption disabled at:
[<
0000000000000000>] 0x0
Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ...
CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f52203 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
panic+0x2cc/0x626 kernel/panic.c:318
check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238
__schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836
schedule_debug kernel/sched/core.c:5865 [inline]
__schedule+0x34e4/0x5450 kernel/sched/core.c:6500
schedule+0xde/0x1b0 kernel/sched/core.c:6682
schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167
schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline]
msleep+0xb6/0x100 kernel/time/timer.c:2322
qdisc_synchronize include/net/sch_generic.h:1295 [inline]
taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703
qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022
dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285
netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline]
dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351
dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374
qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080
tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689
rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
Fixes:
3a415d59c1db ("net/sched: sch_taprio: fix possible use-after-free")
Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Chuang Wang [Tue, 27 Dec 2022 02:30:36 +0000 (10:30 +0800)]
tracing/osnoise: Use built-in RCU list checking
[ Upstream commit
685b64e4d6da4be8b4595654a57db663b3d1dfc2 ]
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence false lockdep
warning when CONFIG_PROVE_RCU_LIST is enabled.
Execute as follow:
[tracing]# echo osnoise > current_tracer
[tracing]# echo 1 > tracing_on
[tracing]# echo 0 > tracing_on
The trace_types_lock is held when osnoise_tracer_stop() or
timerlat_tracer_stop() are called in the non-RCU read side section.
So, pass lockdep_is_held(&trace_types_lock) to silence false lockdep
warning.
Link: https://lkml.kernel.org/r/20221227023036.784337-1-nashuiliang@gmail.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Tue, 24 Jan 2023 10:57:54 +0000 (11:57 +0100)]
ACPI: video: Fix apple gmux detection
[ Upstream commit
b0935f110cff5d70da05c5cb1670bee0b07b631c ]
Some apple laptop models have an ACPI device with a HID of APP000B
and that device has an IO resource (so it does not describe the new
unsupported MMIO based gmux type), but there actually is no gmux
in the laptop at all.
The gmux_probe() function of the actual apple-gmux driver has code
to detect this, this code has been factored out into a new
apple_gmux_detect() helper in apple-gmux.h.
Use this new function to fix acpi_video_get_backlight_type() wrongly
returning apple_gmux as type on the following laptops:
MacBookPro5,4
https://pastebin.com/8Xjq7RhS
MacBookPro8,1
https://linux-hardware.org/?probe=
e513cfbadb&log=dmesg
MacBookPro9,2
https://bugzilla.kernel.org/attachment.cgi?id=278961
MacBookPro10,2
https://lkml.org/lkml/2014/9/22/657
MacBookPro11,2
https://forums.fedora-fr.org/viewtopic.php?id=70142
MacBookPro11,4
https://raw.githubusercontent.com/im-0/investigate-card-reader-suspend-problem-on-mbp11.4/mast
Fixes:
21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection")
Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/
Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230124105754.62167-4-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Tue, 24 Jan 2023 10:57:53 +0000 (11:57 +0100)]
platform/x86: apple-gmux: Add apple_gmux_detect() helper
[ Upstream commit
d143908f80f3e5d164ac3342f73d6b9f536e8b4d ]
Add a new (static inline) apple_gmux_detect() helper to apple-gmux.h
which can be used for gmux detection instead of apple_gmux_present().
The latter is not really reliable since an ACPI device with a HID
of APP000B is present on some devices without a gmux at all, as well
as on devices with a newer (unsupported) MMIO based gmux model.
This causes apple_gmux_present() to return false-positives on
a number of different Apple laptop models.
This new helper uses the same probing as the actual apple-gmux
driver, so that it does not return false positives.
To avoid code duplication the gmux_probe() function of the actual
driver is also moved over to using the new apple_gmux_detect() helper.
This avoids false positives (vs _HID + IO region detection) on:
MacBookPro5,4
https://pastebin.com/8Xjq7RhS
MacBookPro8,1
https://linux-hardware.org/?probe=
e513cfbadb&log=dmesg
MacBookPro9,2
https://bugzilla.kernel.org/attachment.cgi?id=278961
MacBookPro10,2
https://lkml.org/lkml/2014/9/22/657
MacBookPro11,2
https://forums.fedora-fr.org/viewtopic.php?id=70142
MacBookPro11,4
https://raw.githubusercontent.com/im-0/investigate-card-reader-suspend-problem-on-mbp11.4/master/test-16/dmesg
Fixes:
21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection")
Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/
Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230124105754.62167-3-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Tue, 24 Jan 2023 10:57:52 +0000 (11:57 +0100)]
platform/x86: apple-gmux: Move port defines to apple-gmux.h
[ Upstream commit
39f5a81f7ad80eb3fbcbfd817c6552db9de5504d ]
This is a preparation patch for adding a new static inline
apple_gmux_detect() helper which actually checks a supported
gmux is present, rather then only checking an ACPI device with
the HID is there as apple_gmux_present() does.
Fixes:
21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection")
Link: https://lore.kernel.org/platform-driver-x86/20230123113750.462144-1-hdegoede@redhat.com/
Reported-by: Emmanouil Kouroupakis <kartebi@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230124105754.62167-2-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Fri, 20 Jan 2023 14:34:41 +0000 (15:34 +0100)]
platform/x86: asus-wmi: Fix kbd_dock_devid tablet-switch reporting
[ Upstream commit
fdcc0602d64f22185f61c70747214b630049cc33 ]
Commit
1ea0d3b46798 ("platform/x86: asus-wmi: Simplify tablet-mode-switch
handling") unified the asus-wmi tablet-switch handling, but it did not take
into account that the value returned for the kbd_dock_devid WMI method is
inverted where as the other ones are not inverted.
This causes asus-wmi to report an inverted tablet-switch state for devices
which use the kbd_dock_devid, which causes libinput to ignore touchpad
events while the affected T10x model 2-in-1s are docked.
Add inverting of the return value in the kbd_dock_devid case to fix this.
Fixes:
1ea0d3b46798 ("platform/x86: asus-wmi: Simplify tablet-mode-switch handling")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230120143441.527334-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kuniyuki Iwashima [Fri, 20 Jan 2023 23:19:27 +0000 (15:19 -0800)]
netrom: Fix use-after-free of a listening socket.
[ Upstream commit
409db27e3a2eb5e8ef7226ca33be33361b3ed1c9 ]
syzbot reported a use-after-free in do_accept(), precisely nr_accept()
as sk_prot_alloc() allocated the memory and sock_put() frees it. [0]
The issue could happen if the heartbeat timer is fired and
nr_heartbeat_expiry() calls nr_destroy_socket(), where a socket
has SOCK_DESTROY or a listening socket has SOCK_DEAD.
In this case, the first condition cannot be true. SOCK_DESTROY is
flagged in nr_release() only when the file descriptor is close()d,
but accept() is being called for the listening socket, so the second
condition must be true.
Usually, the AF_NETROM listener neither starts timers nor sets
SOCK_DEAD. However, the condition is met if connect() fails before
listen(). connect() starts the t1 timer and heartbeat timer, and
t1timer calls nr_disconnect() when timeout happens. Then, SOCK_DEAD
is set, and if we call listen(), the heartbeat timer calls
nr_destroy_socket().
nr_connect
nr_establish_data_link(sk)
nr_start_t1timer(sk)
nr_start_heartbeat(sk)
nr_t1timer_expiry
nr_disconnect(sk, ETIMEDOUT)
nr_sk(sk)->state = NR_STATE_0
sk->sk_state = TCP_CLOSE
sock_set_flag(sk, SOCK_DEAD)
nr_listen
if (sk->sk_state != TCP_LISTEN)
sk->sk_state = TCP_LISTEN
nr_heartbeat_expiry
switch (nr->state)
case NR_STATE_0
if (sk->sk_state == TCP_LISTEN &&
sock_flag(sk, SOCK_DEAD))
nr_destroy_socket(sk)
This path seems expected, and nr_destroy_socket() is called to clean
up resources. Initially, there was sock_hold() before nr_destroy_socket()
so that the socket would not be freed, but the commit
517a16b1a88b
("netrom: Decrease sock refcount when sock timers expire") accidentally
removed it.
To fix use-after-free, let's add sock_hold().
[0]:
BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848
Read of size 8 at addr
ffff88807978d398 by task syz-executor.3/5315
CPU: 0 PID: 5315 Comm: syz-executor.3 Not tainted 6.2.0-rc3-syzkaller-00165-gd9fc1511728c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x15e/0x461 mm/kasan/report.c:417
kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
do_accept+0x483/0x510 net/socket.c:1848
__sys_accept4_file net/socket.c:1897 [inline]
__sys_accept4+0x9a/0x120 net/socket.c:1927
__do_sys_accept net/socket.c:1944 [inline]
__se_sys_accept net/socket.c:1941 [inline]
__x64_sys_accept+0x75/0xb0 net/socket.c:1941
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fa436a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007fa437784168 EFLAGS:
00000246 ORIG_RAX:
000000000000002b
RAX:
ffffffffffffffda RBX:
00007fa436bac050 RCX:
00007fa436a8c0c9
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000005
RBP:
00007fa436ae7ae9 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
00007ffebc6700df R14:
00007fa437784300 R15:
0000000000022000
</TASK>
Allocated by task 5294:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:371 [inline]
____kasan_kmalloc mm/kasan/common.c:330 [inline]
__kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slab_common.c:968 [inline]
__kmalloc+0x5a/0xd0 mm/slab_common.c:981
kmalloc include/linux/slab.h:584 [inline]
sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
__sock_create+0x359/0x790 net/socket.c:1515
sock_create net/socket.c:1566 [inline]
__sys_socket_create net/socket.c:1603 [inline]
__sys_socket_create net/socket.c:1588 [inline]
__sys_socket+0x133/0x250 net/socket.c:1636
__do_sys_socket net/socket.c:1649 [inline]
__se_sys_socket net/socket.c:1647 [inline]
__x64_sys_socket+0x73/0xb0 net/socket.c:1647
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 14:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:177 [inline]
__cache_free mm/slab.c:3394 [inline]
__do_kmem_cache_free mm/slab.c:3580 [inline]
__kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
sk_prot_free net/core/sock.c:2074 [inline]
__sk_destruct+0x5df/0x750 net/core/sock.c:2166
sk_destruct net/core/sock.c:2181 [inline]
__sk_free+0x175/0x460 net/core/sock.c:2192
sk_free+0x7c/0xa0 net/core/sock.c:2203
sock_put include/net/sock.h:1991 [inline]
nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148
call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700
expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751
__run_timers kernel/time/timer.c:2022 [inline]
__run_timers kernel/time/timer.c:1995 [inline]
run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
__do_softirq+0x1fb/0xadc kernel/softirq.c:571
Fixes:
517a16b1a88b ("netrom: Decrease sock refcount when sock timers expire")
Reported-by: syzbot+5fafd5cfe1fc91f6b352@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230120231927.51711-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sriram Yagnaraman [Tue, 24 Jan 2023 01:47:18 +0000 (02:47 +0100)]
netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
[ Upstream commit
a9993591fa94246b16b444eea55d84c54608282a ]
RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
MUST be accepted if the vtag of the packet matches its own tag and the
T bit is not set OR if it is set to its peer's vtag and the T bit is set
in chunk flags. Otherwise the packet MUST be silently dropped.
Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
description.
Fixes:
9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Alexandru Tachici [Fri, 20 Jan 2023 09:08:46 +0000 (11:08 +0200)]
net: ethernet: adi: adin1110: Fix multicast offloading
[ Upstream commit
8a4f6d023221c4b052ddfa1db48b27871bad6e96 ]
Driver marked broadcast/multicast frames as offloaded incorrectly.
Mark them as offloaded only when HW offloading has been enabled.
This should happen only for ADIN2111 when both ports are bridged
by the software.
Fixes:
bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230120090846.18172-1-alexandru.tachici@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ahmad Fatoum [Fri, 20 Jan 2023 11:09:32 +0000 (12:09 +0100)]
net: dsa: microchip: fix probe of I2C-connected KSZ8563
[ Upstream commit
360fdc999d92db4a4adbba0db8641396dc9f1b13 ]
Starting with commit
eee16b147121 ("net: dsa: microchip: perform the
compatibility check for dev probed"), the KSZ switch driver now bails
out if it thinks the DT compatible doesn't match the actual chip ID
read back from the hardware:
ksz9477-switch 1-005f: Device tree specifies chip KSZ9893 but found
KSZ8563, please fix it!
For the KSZ8563, which used ksz_switch_chips[KSZ9893], this was fine
at first, because it indeed shares the same chip id as the KSZ9893.
Commit
b44908095612 ("net: dsa: microchip: add separate struct
ksz_chip_data for KSZ8563 chip") started differentiating KSZ9893
compatible chips by consulting the 0x1F register. The resulting breakage
was fixed for the SPI driver in the same commit by introducing the
appropriate ksz_switch_chips[KSZ8563], but not for the I2C driver.
Fix this for I2C-connected KSZ8563 now to get it probing again.
Fixes:
b44908095612 ("net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip").
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230120110933.1151054-1-a.fatoum@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Fri, 20 Jan 2023 13:31:40 +0000 (13:31 +0000)]
ipv4: prevent potential spectre v1 gadget in fib_metrics_match()
[ Upstream commit
5e9398a26a92fc402d82ce1f97cc67d832527da0 ]
if (!type)
continue;
if (type > RTAX_MAX)
return false;
...
fi_val = fi->fib_metrics->metrics[type - 1];
@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.
Fixes:
5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Fri, 20 Jan 2023 13:30:40 +0000 (13:30 +0000)]
ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()
[ Upstream commit
1d1d63b612801b3f0a39b7d4467cad0abd60e5c8 ]
if (!type)
continue;
if (type > RTAX_MAX)
return -EINVAL;
...
metrics[type - 1] = val;
@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.
Fixes:
6cf9dfd3bd62 ("net: fib: move metrics parsing to a helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Fri, 20 Jan 2023 12:59:55 +0000 (12:59 +0000)]
netlink: annotate data races around sk_state
[ Upstream commit
9b663b5cbb15b494ef132a3c937641c90646eb73 ]
netlink_getsockbyportid() reads sk_state while a concurrent
netlink_connect() can change its value.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Fri, 20 Jan 2023 12:59:54 +0000 (12:59 +0000)]
netlink: annotate data races around dst_portid and dst_group
[ Upstream commit
004db64d185a5f23dfb891d7701e23713b2420ee ]
netlink_getname(), netlink_sendmsg() and netlink_getsockbyportid()
can read nlk->dst_portid and nlk->dst_group while another
thread is changing them.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Fri, 20 Jan 2023 12:59:53 +0000 (12:59 +0000)]
netlink: annotate data races around nlk->portid
[ Upstream commit
c1bb9484e3b05166880da8574504156ccbd0549e ]
syzbot reminds us netlink_getname() runs locklessly [1]
This first patch annotates the race against nlk->portid.
Following patches take care of the remaining races.
[1]
BUG: KCSAN: data-race in netlink_getname / netlink_insert
write to 0xffff88814176d310 of 4 bytes by task 2315 on cpu 1:
netlink_insert+0xf1/0x9a0 net/netlink/af_netlink.c:583
netlink_autobind+0xae/0x180 net/netlink/af_netlink.c:856
netlink_sendmsg+0x444/0x760 net/netlink/af_netlink.c:1895
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0x38f/0x500 net/socket.c:2476
___sys_sendmsg net/socket.c:2530 [inline]
__sys_sendmsg+0x19a/0x230 net/socket.c:2559
__do_sys_sendmsg net/socket.c:2568 [inline]
__se_sys_sendmsg net/socket.c:2566 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2566
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814176d310 of 4 bytes by task 2316 on cpu 0:
netlink_getname+0xcd/0x1a0 net/netlink/af_netlink.c:1144
__sys_getsockname+0x11d/0x1b0 net/socket.c:2026
__do_sys_getsockname net/socket.c:2041 [inline]
__se_sys_getsockname net/socket.c:2038 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:2038
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0xc9a49780
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 2316 Comm: syz-executor.2 Not tainted 6.2.0-rc3-syzkaller-00030-ge8f60cd7db24-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pablo Neira Ayuso [Sat, 14 Jan 2023 22:49:46 +0000 (23:49 +0100)]
netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
[ Upstream commit
5d235d6ce75c12a7fdee375eb211e4116f7ab01b ]
Skip interference with an ongoing transaction, do not perform garbage
collection on inactive elements. Reset annotated previous end interval
if the expired element is marked as busy (control plane removed the
element right before expiration).
Fixes:
8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pablo Neira Ayuso [Sat, 14 Jan 2023 22:38:32 +0000 (23:38 +0100)]
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
[ Upstream commit
c9e6978e2725a7d4b6cd23b2facd3f11422c0643 ]
...instead of a tree descent, which became overly complicated in an
attempt to cover cases where expired or inactive elements would affect
comparisons with the new element being inserted.
Further, it turned out that it's probably impossible to cover all those
cases, as inactive nodes might entirely hide subtrees consisting of a
complete interval plus a node that makes the current insertion not
overlap.
To speed up the overlap check, descent the tree to find a greater
element that is closer to the key value to insert. Then walk down the
node list for overlap detection. Starting the overlap check from
rb_first() unconditionally is slow, it takes 10 times longer due to the
full linear traversal of the list.
Moreover, perform garbage collection of expired elements when walking
down the node list to avoid bogus overlap reports.
For the insertion operation itself, this essentially reverts back to the
implementation before commit
7c84d41416d8 ("netfilter: nft_set_rbtree:
Detect partial overlaps on insertion"), except that cases of complete
overlap are already handled in the overlap detection phase itself, which
slightly simplifies the loop to find the insertion point.
Based on initial patch from Stefano Brivio, including text from the
original patch description too.
Fixes:
7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Thu, 19 Jan 2023 17:24:41 +0000 (18:24 +0100)]
ACPI: video: Add backlight=native DMI quirk for Asus U46E
[ Upstream commit
e6b3086fddc0065a5ffb947d4d29dd0e6efc327b ]
The Asus U46E backlight tables have a set of interesting problems:
1. Its ACPI tables do make _OSI ("Windows 2012") checks, so
acpi_osi_is_win8() should return true.
But the tables have 2 sets of _OSI calls, one from the usual global
_INI method setting a global OSYS variable and a second set of _OSI
calls from a MSOS method and the MSOS method is the only one calling
_OSI ("Windows 2012").
The MSOS method only gets called in the following cases:
1. From some Asus specific WMI methods
2. From _DOD, which only runs after acpi_video_get_backlight_type()
has already been called by the i915 driver
3. From other ACPI video bus methods which never run (see below)
4. From some EC query callbacks
So when i915 calls acpi_video_get_backlight_type() MSOS has never run
and acpi_osi_is_win8() returns false, so acpi_video_get_backlight_type()
returns acpi_video as the desired backlight type, which causes
the intel_backlight device to not register.
2. _DOD effectively does this:
Return (Package (0x01)
{
0x0400
})
causing acpi_video_device_in_dod() to return false, which causes
the acpi_video backlight device to not register.
Leaving the user with no backlight device at all. Note that before 6.1.y
the i915 driver would register the intel_backlight device unconditionally
and since that then was the only backlight device userspace would use that.
Add a backlight=native DMI quirk for this special laptop to restore
the old (and working) behavior of the intel_backlight device registering.
Fixes:
fb1836c91317 ("ACPI: video: Prefer native over vendor")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Thu, 19 Jan 2023 16:37:44 +0000 (17:37 +0100)]
ACPI: video: Add backlight=native DMI quirk for HP EliteBook 8460p
[ Upstream commit
9dcb34234b8235144c96103266317da33321077e ]
The HP EliteBook 8460p predates Windows 8, so it defaults to using
acpi_video# for backlight control.
Starting with the 6.1.y kernels the native radeon_bl0 backlight is hidden
in this case instead of relying on userspace preferring acpi_video# over
native backlight devices.
It turns out that for the acpi_video# interface to work on
the HP EliteBook 8460p, the brightness needs to be set at least once
through the native interface, which now no longer is done breaking
backlight control.
The native interface however always works without problems, so add
a quirk to use native backlight on the EliteBook 8460p to fix this.
Fixes:
fb1836c91317 ("ACPI: video: Prefer native over vendor")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2161428
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Thu, 19 Jan 2023 16:37:43 +0000 (17:37 +0100)]
ACPI: video: Add backlight=native DMI quirk for HP Pavilion g6-1d80nr
[ Upstream commit
d77596d432cc4142520af32b5388d512e52e0edb ]
The HP Pavilion g6-1d80nr predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.
Add a DMI quirk to use the native backlight interface which does
work properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Stable-dep-of:
9dcb34234b82 ("ACPI: video: Add backlight=native DMI quirk for HP EliteBook 8460p")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnd Bergmann [Tue, 17 Jan 2023 16:37:29 +0000 (17:37 +0100)]
drm/i915/selftest: fix intel_selftest_modify_policy argument types
[ Upstream commit
2255bbcdc39d5b0311968f86614ae4f25fdd465d ]
The definition of intel_selftest_modify_policy() does not match the
declaration, as gcc-13 points out:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:29:5: error: conflicting types for 'intel_selftest_modify_policy' due to enum/integer mismatch; have 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, u32)' {aka 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, unsigned int)'} [-Werror=enum-int-mismatch]
29 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:11:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h:28:5: note: previous declaration of 'intel_selftest_modify_policy' with type 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, enum selftest_scheduler_modify)'
28 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change the type in the definition to match.
Fixes:
617e87c05c72 ("drm/i915/selftest: Fix hangcheck self test for GuC submission")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117163743.1003219-1-arnd@kernel.org
(cherry picked from commit
8d7eb8ed3f83f248e01a4f548d9c500a950a2c2d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ross Lagerwall [Fri, 20 Jan 2023 17:43:54 +0000 (17:43 +0000)]
nvme-fc: fix initialization order
[ Upstream commit
98e3528012cd571c48bbae7c7c0f868823254b6c ]
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by
nvme_init_ctrl() so reorder the calls to avoid a NULL pointer
dereference.
Fixes:
6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christoph Hellwig [Wed, 30 Nov 2022 16:19:50 +0000 (17:19 +0100)]
nvme: consolidate setting the tagset flags
[ Upstream commit
db45e1a5ddccc034eb60d62fc5352022d7963ae2 ]
All nvme transports should be using the same flags for their tagsets,
with the exception for the blocking flag that should only be set for
transports that can block in ->queue_rq.
Add a NVME_F_BLOCKING flag to nvme_ctrl_ops to control the blocking
behavior and lift setting the flags into nvme_alloc_{admin,io}_tag_set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Stable-dep-of:
98e3528012cd ("nvme-fc: fix initialization order")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christoph Hellwig [Thu, 27 Oct 2022 09:34:13 +0000 (02:34 -0700)]
nvme: simplify transport specific device attribute handling
[ Upstream commit
86adbf0cdb9ec6533234696c3e243184d4d0d040 ]
Allow the transport driver to override the attribute groups for the
control device, so that the PCIe driver doesn't manually have to add a
group after device creation and keep track of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
Stable-dep-of:
98e3528012cd ("nvme-fc: fix initialization order")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Wei Fang [Thu, 19 Jan 2023 04:37:47 +0000 (12:37 +0800)]
net: fec: Use page_pool_put_full_page when freeing rx buffers
[ Upstream commit
e38553bdc377e3e7a6caa9dd9770d8b644d8dac3 ]
The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).
Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
echo "TINFO:ENET $curface up and down test $i times"
org_macaddr=$(cat /sys/class/net/eth0/address)
ifconfig eth0 down
ifconfig eth0 hw ether $org_macaddr up
i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
echo "TEST PASS"
else
echo "TEST FAIL"
fi
Bug detail logs:
TINFO:ENET up and down test 391 times
[ 850.471205] Qualcomm Atheros AR8031/AR8033
30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=
30be0000.ethernet-1:00, irq=POLL)
[ 853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 853.541694] fec
30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
[ 931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
TINFO:ENET up and down test 392 times
[ 991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
[ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
[ 1093.751217] Qualcomm Atheros AR8031/AR8033
30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=
30be0000.ethernet-1:00, irq=POLL)
[ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
[ 1096.831245] fec
30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
[ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
[ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
[ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
TINFO:ENET up and down test 393 times
[ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
[ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
[ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
[ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
[ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
[ 1361.179205] Qualcomm Atheros AR8031/AR8033
30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=
30be0000.ethernet-1:00, irq=POLL)
[ 1364.255298] fec
30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
[ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
[ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
[ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
[ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
[ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
TINFO:ENET up and down test 394 times
[ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
[ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
[ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
[ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G C 6.1.1+g56321e101aca #1
[ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 1537.115561] Call trace:
[ 1537.118005] dump_backtrace.part.0+0xe0/0xf0
[ 1537.122289] show_stack+0x18/0x40
[ 1537.125608] dump_stack_lvl+0x64/0x80
[ 1537.129276] dump_stack+0x18/0x34
[ 1537.132592] dump_header+0x44/0x208
[ 1537.136083] oom_kill_process+0x2b4/0x2c0
[ 1537.140097] out_of_memory+0xe4/0x594
[ 1537.143766] __alloc_pages+0xb68/0xd00
[ 1537.147521] alloc_pages+0xac/0x160
[ 1537.151013] __get_free_pages+0x14/0x40
[ 1537.154851] pgd_alloc+0x1c/0x30
[ 1537.158082] mm_init+0xf8/0x1d0
[ 1537.161228] mm_alloc+0x48/0x60
[ 1537.164368] alloc_bprm+0x7c/0x240
[ 1537.167777] do_execveat_common.isra.0+0x70/0x240
[ 1537.172486] __arm64_sys_execve+0x40/0x54
[ 1537.176502] invoke_syscall+0x48/0x114
[ 1537.180255] el0_svc_common.constprop.0+0xcc/0xec
[ 1537.184964] do_el0_svc+0x2c/0xd0
[ 1537.188280] el0_svc+0x2c/0x84
[ 1537.191340] el0t_64_sync_handler+0xf4/0x120
[ 1537.195613] el0t_64_sync+0x18c/0x190
[ 1537.199334] Mem-Info:
[ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
[ 1537.201620] active_file:54 inactive_file:112 isolated_file:0
[ 1537.201620] unevictable:0 dirty:0 writeback:0
[ 1537.201620] slab_reclaimable:2620 slab_unreclaimable:7076
[ 1537.201620] mapped:1489 shmem:2473 pagetables:466
[ 1537.201620] sec_pagetables:0 bounce:0
[ 1537.201620] kernel_misc_reclaimable:0
[ 1537.201620] free:136672 free_pcp:96 free_cma:129241
[ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
[ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
[ 1537.300219] lowmem_reserve[]: 0 0 0 0
[ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
[ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1537.358087] 2939 total pagecache pages
[ 1537.361876] 0 pages in swap cache
[ 1537.365229] Free swap = 0kB
[ 1537.368147] Total swap = 0kB
[ 1537.371065] 516096 pages RAM
[ 1537.373959] 0 pages HighMem/MovableOnly
[ 1537.377834] 17302 pages reserved
[ 1537.381103] 163840 pages cma reserved
[ 1537.384809] 0 pages hwpoisoned
[ 1537.387902] Tasks state (memory values in pages):
[ 1537.392652] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 1537.401356] [ 201] 993 201 1130 72 45056 0 0 rpcbind
[ 1537.409772] [ 202] 0 202 4529 1640 77824 0 -250 systemd-journal
[ 1537.418861] [ 222] 0 222 4691 801 69632 0 -1000 systemd-udevd
[ 1537.427787] [ 248] 994 248 20914 130 65536 0 0 systemd-timesyn
[ 1537.436884] [ 497] 0 497 620 31 49152 0 0 atd
[ 1537.444938] [ 500] 0 500 854 77 53248 0 0 crond
[ 1537.453165] [ 503] 997 503 1470 160 49152 0 -900 dbus-daemon
[ 1537.461908] [ 505] 0 505 633 24 40960 0 0 firmwared
[ 1537.470491] [ 513] 0 513 2507 180 61440 0 0 ofonod
[ 1537.478800] [ 514] 990 514 69640 137 81920 0 0 parsec
[ 1537.487120] [ 533] 0 533 599 39 40960 0 0 syslogd
[ 1537.495518] [ 534] 0 534 4546 148 65536 0 0 systemd-logind
[ 1537.504560] [ 535] 0 535 690 24 45056 0 0 tee-supplicant
[ 1537.513564] [ 540] 996 540 2769 168 61440 0 0 systemd-network
[ 1537.522680] [ 566] 0 566 3878 228 77824 0 0 connmand
[ 1537.531168] [ 645] 998 645 1538 133 57344 0 0 avahi-daemon
[ 1537.540004] [ 646] 998 646 1461 64 57344 0 0 avahi-daemon
[ 1537.548846] [ 648] 992 648 781 41 45056 0 0 rpc.statd
[ 1537.557415] [ 650] 64371 650 590 23 45056 0 0 ninfod
[ 1537.565754] [ 653] 61563 653 555 24 45056 0 0 rdisc
[ 1537.573971] [ 655] 0 655 374569 2999 290816 0 -999 containerd
[ 1537.582621] [ 658] 0 658 1311 20 49152 0 0 agetty
[ 1537.590922] [ 663] 0 663 1529 97 49152 0 0 login
[ 1537.599138] [ 666] 0 666 3430 202 69632 0 0 wpa_supplicant
[ 1537.608147] [ 667] 0 667 2344 96 61440 0 0 systemd-userdbd
[ 1537.617240] [ 677] 0 677 2964 314 65536 0 100 systemd
[ 1537.625651] [ 679] 0 679 3720 646 73728 0 100 (sd-pam)
[ 1537.634138] [ 687] 0 687 1289 403 45056 0 0 sh
[ 1537.642108] [ 789] 0 789 970 93 45056 0 0 eth_test2.sh
[ 1537.650955] [ 2355] 0 2355 2346 94 61440 0 0 systemd-userwor
[ 1537.660046] [ 2356] 0 2356 2346 94 61440 0 0 systemd-userwor
[ 1537.669137] [ 2358] 0 2358 2346 95 57344 0 0 systemd-userwor
[ 1537.678258] [ 2379] 0 2379 970 93 45056 0 0 eth_test2.sh
[ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
[ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
[ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec
Fixes:
95698ff6177b ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni [Thu, 19 Jan 2023 18:55:45 +0000 (19:55 +0100)]
net: fix UaF in netns ops registration error path
[ Upstream commit
71ab9c3e2253619136c31c89dbb2c69305cc89b1 ]
If net_assign_generic() fails, the current error path in ops_init() tries
to clear the gen pointer slot. Anyway, in such error path, the gen pointer
itself has not been modified yet, and the existing and accessed one is
smaller than the accessed index, causing an out-of-bounds error:
BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320
Write of size 8 at addr
ffff888109124978 by task modprobe/1018
CPU: 2 PID: 1018 Comm: modprobe Not tainted 6.2.0-rc2.mptcp_ae5ac65fbed5+ #1641
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9f
print_address_description.constprop.0+0x86/0x2b5
print_report+0x11b/0x1fb
kasan_report+0x87/0xc0
ops_init+0x2de/0x320
register_pernet_operations+0x2e4/0x750
register_pernet_subsys+0x24/0x40
tcf_register_action+0x9f/0x560
do_one_initcall+0xf9/0x570
do_init_module+0x190/0x650
load_module+0x1fa5/0x23c0
__do_sys_finit_module+0x10d/0x1b0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f42518f778d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48
RSP: 002b:
00007fff96869688 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
RAX:
ffffffffffffffda RBX:
00005568ef7f7c90 RCX:
00007f42518f778d
RDX:
0000000000000000 RSI:
00005568ef41d796 RDI:
0000000000000003
RBP:
00005568ef41d796 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000003 R11:
0000000000000246 R12:
0000000000000000
R13:
00005568ef7f7d30 R14:
0000000000040000 R15:
0000000000000000
</TASK>
This change addresses the issue by skipping the gen pointer
de-reference in the mentioned error-path.
Found by code inspection and verified with explicit error injection
on a kasan-enabled kernel.
Fixes:
d266935ac43d ("net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/cec4e0f3bb2c77ac03a6154a8508d3930beb5f0f.1674154348.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Dumazet [Thu, 19 Jan 2023 11:01:50 +0000 (11:01 +0000)]
netlink: prevent potential spectre v1 gadgets
[ Upstream commit
f0950402e8c76e7dcb08563f1b4e8000fbc62455 ]
Most netlink attributes are parsed and validated from
__nla_validate_parse() or validate_nla()
u16 type = nla_type(nla);
if (type == 0 || type > maxtype) {
/* error or continue */
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
This should take care of vast majority of netlink uses,
but an audit is needed to take care of others where
validation is not yet centralized in core netlink functions.
Fixes:
bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Stefan Assmann [Tue, 10 Jan 2023 08:00:18 +0000 (09:00 +0100)]
iavf: schedule watchdog immediately when changing primary MAC
[ Upstream commit
e2b53ea5a7c1fb484277ad12cd075f502cf03b04 ]
iavf_replace_primary_mac() utilizes queue_work() to schedule the
watchdog task but that only ensures that the watchdog task is queued
to run. To make sure the watchdog is executed asap use
mod_delayed_work().
Without this patch it may take up to 2s until the watchdog task gets
executed, which may cause long delays when setting the MAC address.
Fixes:
a3e839d539e0 ("iavf: Add usage of new virtchnl format to set default MAC")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Michal Schmidt [Thu, 15 Dec 2022 22:50:48 +0000 (23:50 +0100)]
iavf: fix temporary deadlock and failure to set MAC address
[ Upstream commit
4411a608f7c8df000cb1a9f7881982dd8e10839a ]
We are seeing an issue where setting the MAC address on iavf fails with
EAGAIN after the 2.5s timeout expires in iavf_set_mac().
There is the following deadlock scenario:
iavf_set_mac(), holding rtnl_lock, waits on:
iavf_watchdog_task (within iavf_wq) to send a message to the PF,
and
iavf_adminq_task (within iavf_wq) to receive a response from the PF.
In this adapter state (>=__IAVF_DOWN), these tasks do not need to take
rtnl_lock, but iavf_wq is a global single-threaded workqueue, so they
may get stuck waiting for another adapter's iavf_watchdog_task to run
iavf_init_config_adapter(), which does take rtnl_lock.
The deadlock resolves itself by the timeout in iavf_set_mac(),
which results in EAGAIN returned to userspace.
Let's break the deadlock loop by changing iavf_wq into a per-adapter
workqueue, so that one adapter's tasks are not blocked by another's.
Fixes:
35a2443d0910 ("iavf: Add waiting for response from PF in set mac")
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nirmoy Das [Tue, 17 Jan 2023 17:52:36 +0000 (18:52 +0100)]
drm/i915: Fix a memory leak with reused mmap_offset
[ Upstream commit
0220e4fe178c3390eb0291cdb34912d66972db8a ]
drm_vma_node_allow() and drm_vma_node_revoke() should be called in
balanced pairs. We call drm_vma_node_allow() once per-file everytime a
user calls mmap_offset, but only call drm_vma_node_revoke once per-file
on each mmap_offset. As the mmap_offset is reused by the client, the
per-file vm_count may remain non-zero and the rbtree leaked.
Call drm_vma_node_allow_once() instead to prevent that memory leak.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Fixes:
786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list")
Reported-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230117175236.22317-2-nirmoy.das@intel.com
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nirmoy Das [Tue, 17 Jan 2023 17:52:35 +0000 (18:52 +0100)]
drm/drm_vma_manager: Add drm_vma_node_allow_once()
[ Upstream commit
899d3a3c19ac0e5da013ce34833dccb97d19b5e4 ]
Currently there is no easy way for a drm driver to safely check and allow
drm_vma_offset_node for a drm file just once. Allow drm drivers to call
non-refcounted version of drm_vma_node_allow() so that a driver doesn't
need to keep track of each drm_vma_node_allow() to call subsequent
drm_vma_node_revoke() to prevent memory leak.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230117175236.22317-1-nirmoy.das@intel.com
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Richard Fitzgerald [Mon, 19 Dec 2022 13:01:45 +0000 (13:01 +0000)]
i2c: designware: Fix unbalanced suspended flag
[ Upstream commit
75507a319876aba88932e2c7dab58b6c22d89f6b ]
Ensure that i2c_mark_adapter_suspended() is always balanced by a call to
i2c_mark_adapter_resumed().
dw_i2c_plat_resume() must always be called, so that
i2c_mark_adapter_resumed() is called. This is not compatible with
DPM_FLAG_MAY_SKIP_RESUME, so remove the flag.
Since the controller is always resumed on system resume the
dw_i2c_plat_complete() callback is redundant and has been removed.
The unbalanced suspended flag was introduced by commit
c57813b8b288
("i2c: designware: Lock the adapter while setting the suspended flag")
Before that commit, the system and runtime PM used the same functions. The
DPM_FLAG_MAY_SKIP_RESUME was used to skip the system resume if the driver
had been in runtime-suspend. If system resume was skipped, the suspended
flag would be cleared by the next runtime resume. The check of the
suspended flag was _after_ the call to pm_runtime_get_sync() in
i2c_dw_xfer(). So either a system resume or a runtime resume would clear
the flag before it was checked.
Having introduced the unbalanced suspended flag with that commit, a further
commit
80704a84a9f8
("i2c: designware: Use the i2c_mark_adapter_suspended/resumed() helpers")
changed from using a local suspended flag to using the
i2c_mark_adapter_suspended/resumed() functions. These use a flag that is
checked by I2C core code before issuing the transfer to the bus driver, so
there was no opportunity for the bus driver to runtime resume itself before
the flag check.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes:
c57813b8b288 ("i2c: designware: Lock the adapter while setting the suspended flag")
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Lareine Khawaly [Wed, 21 Dec 2022 19:59:00 +0000 (19:59 +0000)]
i2c: designware: use casting of u64 in clock multiplication to avoid overflow
[ Upstream commit
c8c37bc514514999e62a17e95160ed9ebf75ca8d ]
In functions i2c_dw_scl_lcnt() and i2c_dw_scl_hcnt() may have overflow
by depending on the values of the given parameters including the ic_clk.
For example in our use case where ic_clk is larger than one million,
multiplication of ic_clk * 4700 will result in 32 bit overflow.
Add cast of u64 to the calculation to avoid multiplication overflow, and
use the corresponding define for divide.
Fixes:
2373f6b9744d ("i2c-designware: split of i2c-designware.c into core and bus specific parts")
Signed-off-by: Lareine Khawaly <lareine@amazon.com>
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Dylan Yudaken [Fri, 27 Jan 2023 10:59:11 +0000 (02:59 -0800)]
io_uring: always prep_async for drain requests
[ Upstream commit
ef5c600adb1d985513d2b612cc90403a148ff287 ]
Drain requests all go through io_drain_req, which has a quick exit in case
there is nothing pending (ie the drain is not useful). In that case it can
run the issue the request immediately.
However for safety it queues it through task work.
The problem is that in this case the request is run asynchronously, but
the async work has not been prepared through io_req_prep_async.
This has not been a problem up to now, as the task work always would run
before returning to userspace, and so the user would not have a chance to
race with it.
However - with IORING_SETUP_DEFER_TASKRUN - this is no longer the case and
the work might be defered, giving userspace a chance to change data being
referred to in the request.
Instead _always_ prep_async for drain requests, which is simpler anyway
and removes this issue.
Cc: stable@vger.kernel.org
Fixes:
c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127105911.2420061-1-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Haiyang Zhang [Thu, 19 Jan 2023 20:59:10 +0000 (12:59 -0800)]
net: mana: Fix IRQ name - add PCI and queue number
[ Upstream commit
20e3028c39a5bf882e91e717da96d14f1acec40e ]
The PCI and queue number info is missing in IRQ names.
Add PCI and queue number to IRQ names, to allow CPU affinity
tuning scripts to work.
Cc: stable@vger.kernel.org
Fixes:
ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/1674161950-19708-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Wed, 23 Nov 2022 11:33:40 +0000 (11:33 +0000)]
io_uring: inline __io_req_complete_put()
[ Upstream commit
fa18fa2272c7469e470dcb7bf838ea50a25494ca ]
Inline __io_req_complete_put() into io_req_complete_post(), there are no
other users.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1923a4dfe80fa877f859a22ed3df2d5fc8ecf02b.1669203009.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Wed, 23 Nov 2022 11:33:39 +0000 (11:33 +0000)]
io_uring: remove io_req_tw_post_queue
[ Upstream commit
833b5dfffc26c81835ce38e2a5df9ac5fa142735 ]
Remove io_req_tw_post() and io_req_tw_post_queue(), we can use
io_req_task_complete() instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9b73c08022c7f1457023ac841f35c0100e70345.1669203009.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Wed, 23 Nov 2022 11:33:38 +0000 (11:33 +0000)]
io_uring: use io_req_task_complete() in timeout
[ Upstream commit
624fd779fd869bdcb2c0ccca0f09456eed71ed52 ]
Use a more generic io_req_task_complete() in timeout completion
task_work instead of io_req_complete_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bda1710b58c07bf06107421c2a65c529ea9cdcac.1669203009.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Wed, 23 Nov 2022 11:33:37 +0000 (11:33 +0000)]
io_uring: hold locks for io_req_complete_failed
[ Upstream commit
e276ae344a770f91912a81c6a338d92efd319be2 ]
A preparation patch, make sure we always hold uring_lock around
io_req_complete_failed(). The only place deviating from the rule
is io_cancel_defer_files(), queue a tw instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70760344eadaecf2939287084b9d4ba5c05a6984.1669203009.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Thu, 17 Nov 2022 18:41:06 +0000 (18:41 +0000)]
io_uring: inline __io_req_complete_post()
[ Upstream commit
f9d567c75ec216447f36da6e855500023504fa04 ]
There is only one user of __io_req_complete_post(), inline it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef4c9059950a3da5cf68df00f977f1fd13bd9306.1668597569.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pavel Begunkov [Fri, 11 Nov 2022 16:54:08 +0000 (16:54 +0000)]
io_uring: inline io_req_task_work_add()
[ Upstream commit
e52d2e583e4ad1d5d0b804d79c2b8752eb0e5ceb ]
__io_req_task_work_add() is huge but marked inline, that makes compilers
to generate lots of garbage. Inline the wrapper caller
io_req_task_work_add() instead.
before and after:
text data bss dec hex filename
47347 16248 8 63603 f873 io_uring/io_uring.o
text data bss dec hex filename
45303 16248 8 61559 f077 io_uring/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of:
ef5c600adb1d ("io_uring: always prep_async for drain requests")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Wayne Lin [Mon, 12 Dec 2022 07:41:18 +0000 (15:41 +0800)]
drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
commit
f85c5e25fd28fe0bf6d6d0563cf83758a4e05c8f upstream.
[Why & How]
Now the vc_start_slot is controlled at drm side. When we
service a long HPD, we still need to run
dm_helpers_dp_mst_write_payload_allocation_table() to update
drm mst_mgr's relevant variable. Otherwise, on the next plug-in,
payload will get assigned with a wrong start slot.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes:
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wayne Lin [Fri, 9 Dec 2022 11:05:33 +0000 (19:05 +0800)]
drm/amdgpu/display/mst: limit payload to be updated one by one
commit
cb1e0b015f56b8f3c7f5ce33ff4b782ee5674512 upstream.
[Why]
amdgpu expects to update payload table for one stream one time
by calling dm_helpers_dp_mst_write_payload_allocation_table().
Currently, it get modified to try to update HW payload table
at once by referring mst_state.
[How]
This is just a quick workaround. Should find way to remove the
temporary struct dc_dp_mst_stream_allocation_table later if set
struct link_mst_stream_allocatio directly is possible.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes:
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lyude Paul [Wed, 23 Nov 2022 19:50:16 +0000 (14:50 -0500)]
drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignments
commit
1119e1f9636b76aef14068c7fd0b4d55132b86b8 upstream.
Looks like I made a pretty big mistake here without noticing: it seems when
I moved the assignments of mst_state->pbn_div I completely missed the fact
that the reason for us calling drm_dp_mst_update_slots() earlier was to
account for the fact that we need to call this function using info from the
root MST connector, instead of just trying to do this from each MST
encoder's atomic check function. Otherwise, we end up filling out all of
DC's link information with zeroes.
So, let's restore that and hopefully fix this DSC regression.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes:
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan Kim [Thu, 19 Jan 2023 23:42:03 +0000 (18:42 -0500)]
drm/amdgpu: remove unconditional trap enable on add gfx11 queues
commit
2de3769830346e68b3de0f4abc0d8e2625ad9dac upstream.
Rebase of driver has incorrect unconditional trap enablement
for GFX11 when adding mes queues.
Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Evan Quan [Fri, 20 Jan 2023 03:21:53 +0000 (11:21 +0800)]
drm/amd/pm: add missing AllowIHInterrupt message mapping for SMU13.0.0
commit
15b207d0abdcbb2271774aa99d9a290789159e75 upstream.
Add SMU13.0.0 AllowIHInterrupt message mapping.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wayne Lin [Wed, 28 Dec 2022 06:50:43 +0000 (14:50 +0800)]
drm/display/dp_mst: Correct the kref of port.
commit
d8bf2df715bb8ac964f91fe8bf67c37c5d916463 upstream.
[why & how]
We still need to refer to port while removing payload at commit_tail.
we should keep the kref till then to release.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes:
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
Cc: stable@vger.kernel.org # 6.1
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Didier Raboud <odyx@debian.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Pearson [Tue, 24 Jan 2023 15:36:23 +0000 (10:36 -0500)]
platform/x86: thinkpad_acpi: Fix profile modes on Intel platforms
commit
1bc5d819f0b9784043ea08570e1b21107aa35739 upstream.
My last commit to fix profile mode displays on AMD platforms caused
an issue on Intel platforms - sorry!
In it I was reading the current functional mode (MMC, PSC, AMT) from
the BIOS but didn't account for the fact that on some of our Intel
platforms I use a different API which returns just the profile and not
the functional mode.
This commit fixes it so that on Intel platforms it knows the functional
mode is always MMC.
I also fixed a potential problem that a platform may try to set the mode
for both MMC and PSC - which was incorrect.
Tested on X1 Carbon 9 (Intel) and Z13 (AMD).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216963
Fixes:
fde5f74ccfc7 ("platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode")
Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230124153623.145188-1-mpearson-lenovo@squebb.ca
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manivannan Sadhasivam [Wed, 18 Jan 2023 15:08:50 +0000 (20:38 +0530)]
EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info
commit
977c6ba624f24ae20cf0faee871257a39348d4a9 upstream.
The memory for llcc_driv_data is allocated by the LLCC driver. But when
it is passed as the private driver info to the EDAC core, it will get freed
during the qcom_edac driver release. So when the qcom_edac driver gets probed
again, it will try to use the freed data leading to the use-after-free bug.
Hence, do not pass llcc_driv_data as pvt_info but rather reference it
using the platform_data pointer in the qcom_edac driver.
Fixes:
27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Cc: <stable@vger.kernel.org> # 4.20
Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manivannan Sadhasivam [Wed, 18 Jan 2023 15:08:48 +0000 (20:38 +0530)]
EDAC/device: Respect any driver-supplied workqueue polling value
commit
cec669ff716cc83505c77b242aecf6f7baad869d upstream.
The EDAC drivers may optionally pass the poll_msec value. Use that value
if available, else fall back to 1000ms.
[ bp: Touchups. ]
Fixes:
e27e3dac6517 ("drivers/edac: add edac_device class")
Reported-by: Luca Weiss <luca.weiss@fairphone.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Cc: <stable@vger.kernel.org> # 4.9
Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Giulio Benetti [Tue, 13 Dec 2022 19:24:03 +0000 (20:24 +0100)]
ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment
commit
a4e03921c1bb118e6718e0a3b0322a2c13ed172b upstream.
zero_page is a void* pointer but memblock_alloc() returns phys_addr_t type
so this generates a warning while using clang and with -Wint-error enabled
that becomes and error. So let's cast the return of memblock_alloc() to
(void *).
Cc: <stable@vger.kernel.org> # 4.14.x +
Fixes:
340a982825f7 ("ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation")
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gergely Risko [Thu, 19 Jan 2023 13:40:41 +0000 (14:40 +0100)]
ipv6: fix reachability confirmation with proxy_ndp
commit
9f535c870e493841ac7be390610ff2edec755762 upstream.
When proxying IPv6 NDP requests, the adverts to the initial multicast
solicits are correct and working. On the other hand, when later a
reachability confirmation is requested (on unicast), no reply is sent.
This causes the neighbor entry expiring on the sending node, which is
mostly a non-issue, as a new multicast request is sent. There are
routers, where the multicast requests are intentionally delayed, and in
these environments the current implementation causes periodic packet
loss for the proxied endpoints.
The root cause is the erroneous decrease of the hop limit, as this
is checked in ndisc.c and no answer is generated when it's 254 instead
of the correct 255.
Cc: stable@vger.kernel.org
Fixes:
46c7655f0b56 ("ipv6: decrease hop limit counter in ip6_forward()")
Signed-off-by: Gergely Risko <gergely.risko@gmail.com>
Tested-by: Gergely Risko <gergely.risko@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Krzysztof Kozlowski [Fri, 20 Jan 2023 13:14:47 +0000 (14:14 +0100)]
regulator: dt-bindings: samsung,s2mps14: add lost samsung,ext-control-gpios
commit
4bb3d82a1820c1b609ede8eb2332f3cb038c5840 upstream.
The samsung,ext-control-gpios property was lost during conversion to DT
schema:
exynos3250-artik5-eval.dtb: pmic@66: regulators:LDO11: Unevaluated properties are not allowed ('samsung,ext-control-gpios' was unexpected)
Fixes:
ea98b9eba05c ("regulator: dt-bindings: samsung,s2m: convert to dtschema")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230120131447.289702-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Srinivas Pandruvada [Mon, 23 Jan 2023 17:21:10 +0000 (09:21 -0800)]
thermal: intel: int340x: Protect trip temperature from concurrent updates
commit
6757a7abe47bcb12cb2d45661067e182424b0ee3 upstream.
Trip temperatures are read using ACPI methods and stored in the memory
during zone initializtion and when the firmware sends a notification for
change. This trip temperature is returned when the thermal core calls via
callback get_trip_temp().
But it is possible that while updating the memory copy of the trips when
the firmware sends a notification for change, thermal core is reading the
trip temperature via the callback get_trip_temp(). This may return invalid
trip temperature.
To address this add a mutex to protect the invalid temperature reads in
the callback get_trip_temp() and int340x_thermal_read_trips().
Fixes:
5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Masahiro Yamada [Fri, 6 Jan 2023 16:12:13 +0000 (01:12 +0900)]
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
commit
5b89c6f9b2df2b7cf6da8e0b2b87c8995b378cad upstream.
Since commit
80b6093b55e3 ("kbuild: add -Wundef to KBUILD_CPPFLAGS
for W=1 builds"), building with W=1 detects misuse of #if.
$ make W=1 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- arch/riscv/kernel/
[snip]
AS arch/riscv/kernel/head.o
arch/riscv/kernel/head.S:329:5: warning: "CONFIG_RISCV_BOOT_SPINWAIT" is not defined, evaluates to 0 [-Wundef]
329 | #if CONFIG_RISCV_BOOT_SPINWAIT
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
CONFIG_RISCV_BOOT_SPINWAIT is a bool option. #ifdef should be used.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Fixes:
2ffc48fc7071 ("RISC-V: Move spinwait booting method to its own config")
Link: https://lore.kernel.org/r/20230106161213.2374093-1-masahiroy@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Mon, 16 Jan 2023 16:12:01 +0000 (17:12 +0100)]
scsi: ufs: core: Fix devfreq deadlocks
commit
ba81043753fffbc2ad6e0c5ff2659f12ac2f46b4 upstream.
There is a lock inversion and rwsem read-lock recursion in the devfreq
target callback which can lead to deadlocks.
Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
read lock when toggling the write booster, which involves taking the
dev_cmd mutex before taking another clk_scaling_lock read lock.
This can lead to a deadlock if another thread:
1) tries to acquire the dev_cmd and clk_scaling locks in the correct
order, or
2) takes a clk_scaling write lock before the attempt to take the
clk_scaling read lock a second time.
Fix this by dropping the clk_scaling_lock before toggling the write booster
as was done before commit
0e9d4ca43ba8 ("scsi: ufs: Protect some contexts
from unexpected clock scaling").
While the devfreq callbacks are already serialised, add a second
serialising mutex to handle the unlikely case where a callback triggered
through the devfreq sysfs interface is racing with a request to disable
clock scaling through the UFS controller 'clkscale_enable' sysfs
attribute. This could otherwise lead to the write booster being left
disabled after having disabled clock scaling.
Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any
pending write booster update has completed on return.
Note that this currently only affects Qualcomm platforms since commit
87bd05016a64 ("scsi: ufs: core: Allow host driver to disable wb toggling
during clock scaling").
The lock inversion (i.e. 1 above) was reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-
20221216 #211 Not tainted
------------------------------------------------------
kworker/u16:2/71 is trying to acquire lock:
ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0
but task is already holding lock:
ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380
which lock already depends on the new lock.
[ +0.011606]
the existing dependency chain (in reverse order) is:
-> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
lock_acquire+0x68/0x90
down_read+0x58/0x80
ufshcd_exec_dev_cmd+0x70/0x2c0
ufshcd_verify_dev_init+0x68/0x170
ufshcd_probe_hba+0x398/0x1180
ufshcd_async_scan+0x30/0x320
async_run_entry_fn+0x34/0x150
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
-> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
__lock_acquire+0x12a0/0x2240
lock_acquire.part.0+0xcc/0x220
lock_acquire+0x68/0x90
__mutex_lock+0x98/0x430
mutex_lock_nested+0x2c/0x40
ufshcd_query_flag+0x50/0x1c0
ufshcd_query_flag_retry+0x64/0x100
ufshcd_wb_toggle+0x5c/0x120
ufshcd_devfreq_scale+0x2c4/0x380
ufshcd_devfreq_target+0xf4/0x230
devfreq_set_target+0x84/0x2f0
devfreq_update_target+0xc4/0xf0
devfreq_monitor+0x38/0x1f0
process_one_work+0x288/0x6c0
worker_thread+0x74/0x450
kthread+0x118/0x120
ret_from_fork+0x10/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
lock(&hba->clk_scaling_lock);
lock(&hba->dev_cmd.lock);
*** DEADLOCK ***
Fixes:
0e9d4ca43ba8 ("scsi: ufs: Protect some contexts from unexpected clock scaling")
Cc: stable@vger.kernel.org # 5.12
Cc: Can Guo <quic_cang@quicinc.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230116161201.16923-1-johan+linaro@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Thu, 19 Jan 2023 11:07:59 +0000 (11:07 +0000)]
KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
commit
ef3691683d7bfd0a2acf48812e4ffe894f10bfa8 upstream.
To save the vgic LPI pending state with GICv4.1, the VPEs must all be
unmapped from the ITSs so that the sGIC caches can be flushed.
The opposite is done once the state is saved.
This is all done by using the activate/deactivate irqdomain callbacks
directly from the vgic code. Crutially, this is done without holding
the irqdesc lock for the interrupts that represent the VPE. And these
callbacks are changing the state of the irqdesc. What could possibly
go wrong?
If a doorbell fires while we are messing with the irqdesc state,
it will acquire the lock and change the interrupt state concurrently.
Since we don't hole the lock, curruption occurs in on the interrupt
state. Oh well.
While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt,
do what we have to do, and re-request it.
It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.
Fixes:
f66b7b151e00 ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables")
Reported-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hendrik Borghorst [Mon, 14 Nov 2022 16:48:23 +0000 (16:48 +0000)]
KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
commit
a44b331614e6f7e63902ed7dff7adc8c85edd8bc upstream.
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.
This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.
Unusable segments are - contrary to their name - usable in 64bit mode and
are used by guests to for example create a linear map through the
NULL selector.
VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].
We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.
This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2] SS.DPL is always saved on VM exits,
regardless of the unusable bit so user space applications should have saved
the information on serialization correctly.
[3] specifies that besides SS.DPL the rest of the attributes of the
descriptors are undefined after VM entry if unusable bit is set. So, there
should be no harm in setting them all to the previous state.
[1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers
[2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table
Registers
[3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and
Descriptor-Table Registers
Cc: Alexander Graf <graf@amazon.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hendrik Borghorst <hborghor@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <
20221114164823.69555-1-hborghor@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens Axboe [Sun, 22 Jan 2023 17:02:55 +0000 (10:02 -0700)]
io_uring/net: cache provided buffer group value for multishot receives
commit
b00c51ef8f72ced0965d021a291b98ff822c5337 upstream.
If we're using ring provided buffers with multishot receive, and we end
up doing an io-wq based issue at some points that also needs to select
a buffer, we'll lose the initially assigned buffer group as
io_ring_buffer_select() correctly clears the buffer group list as the
issue isn't serialized by the ctx uring_lock. This is fine for normal
receives as the request puts the buffer and finishes, but for multishot,
we will re-arm and do further receives. On the next trigger for this
multishot receive, the receive will try and pick from a buffer group
whose value is the same as the buffer ID of the las receive. That is
obviously incorrect, and will result in a premature -ENOUFS error for
the receive even if we had available buffers in the correct group.
Cache the buffer group value at prep time, so we can restore it for
future receives. This only needs doing for the above mentioned case, but
just do it by default to keep it easier to read.
Cc: stable@vger.kernel.org
Fixes:
b3fdea6ecb55 ("io_uring: multishot recv")
Fixes:
9bb66906f23e ("io_uring: support multishot in recvmsg")
Cc: Dylan Yudaken <dylany@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Tue, 24 Jan 2023 15:41:18 +0000 (16:41 +0100)]
ovl: fail on invalid uid/gid mapping at copy up
commit
4f11ada10d0ad3fd53e2bd67806351de63a4f9c3 upstream.
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then
copy-up should fail, just like it would fail if the mounter task was doing
the copy using "cp -a".
There's a corner case where the "cp -a" would succeed but copy up fail: if
there's a mapping of the invalid uid/gid (65534 by default) in the user
namespace. This is because stat(2) will return this value if the mapping
doesn't exist in the current user_ns and "cp -a" will in turn be able to
create a file with this uid/gid.
This behavior would be inconsistent with POSIX ACL's, which return -1 for
invalid uid/gid which result in a failed copy.
For consistency and simplicity fail the copy of the st_uid/st_gid are
invalid.
Fixes:
459c7c565ac3 ("ovl: unprivieged mounts")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Seth Forshee <sforshee@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Tue, 24 Jan 2023 15:41:18 +0000 (16:41 +0100)]
ovl: fix tmpfile leak
commit
baabaa505563362b71f2637aedd7b807d270656c upstream.
Missed an error cleanup.
Reported-by: syzbot+fd749a7ea127a84e0ffd@syzkaller.appspotmail.com
Fixes:
2b1a77461f16 ("ovl: use vfs_tmpfile_open() helper")
Cc: <stable@vger.kernel.org> # v6.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Namjae Jeon [Tue, 24 Jan 2023 15:13:20 +0000 (00:13 +0900)]
ksmbd: limit pdu length size according to connection status
commit
62c487b53a7ff31e322cf2874d3796b8202c54a5 upstream.
Stream protocol length will never be larger than 16KB until session setup.
After session setup, the size of requests will not be larger than
16KB + SMB2 MAX WRITE size. This patch limits these invalidly oversized
requests and closes the connection immediately.
Fixes:
0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Namjae Jeon [Tue, 24 Jan 2023 15:09:02 +0000 (00:09 +0900)]
ksmbd: downgrade ndr version error message to debug
commit
a34dc4a9b9e2fb3a45c179a60bb0b26539c96189 upstream.
When user switch samba to ksmbd, The following message flood is coming
when accessing files. Samba seems to changs dos attribute version to v5.
This patch downgrade ndr version error message to debug.
$ dmesg
...
[68971.766914] ksmbd: v5 version is not supported
[68971.779808] ksmbd: v5 version is not supported
[68971.871544] ksmbd: v5 version is not supported
[68971.910135] ksmbd: v5 version is not supported
...
Cc: stable@vger.kernel.org
Fixes:
e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marios Makassikis [Wed, 11 Jan 2023 16:39:02 +0000 (17:39 +0100)]
ksmbd: do not sign response to session request for guest login
commit
5fde3c21cf33830eda7bfd006dc7f4bf07ec9fe6 upstream.
If ksmbd.mountd is configured to assign unknown users to the guest account
("map to guest = bad user" in the config), ksmbd signs the response.
This is wrong according to MS-SMB2 3.3.5.5.3:
12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
field, and Session.IsAnonymous is FALSE, the server MUST sign the
final session setup response before sending it to the client, as
follows:
[...]
This fixes libsmb2 based applications failing to establish a session
("Wrong signature in received").
Fixes:
e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Namjae Jeon [Thu, 29 Dec 2022 09:33:25 +0000 (18:33 +0900)]
ksmbd: add max connections parameter
commit
0d0d4680db22eda1eea785c47bbf66a9b33a8b16 upstream.
Add max connections parameter to limit number of maximum simultaneous
connections.
Fixes:
0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Wed, 25 Jan 2023 14:02:13 +0000 (14:02 +0000)]
cifs: Fix oops due to uncleared server->smbd_conn in reconnect
commit
b7ab9161cf5ddc42a288edf9d1a61f3bdffe17c7 upstream.
In smbd_destroy(), clear the server->smbd_conn pointer after freeing the
smbd_connection struct that it points to so that reconnection doesn't get
confused.
Fixes:
8ef130f9ec27 ("CIFS: SMBD: Implement function to destroy a SMB Direct connection")
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Long Li <longli@microsoft.com>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (Google) [Mon, 23 Jan 2023 16:22:52 +0000 (11:22 -0500)]
ftrace/scripts: Update the instructions for ftrace-bisect.sh
commit
7ae4ba7195b1bac04a4210a499da9d8c63b0ba9c upstream.
The instructions for the ftrace-bisect.sh script, which is used to find
what function is being traced that is causing a kernel crash, and possibly
a triple fault reboot, uses the old method. In 5.1, a new feature was
added that let the user write in the index into available_filter_functions
that maps to the function a user wants to set in set_ftrace_filter (or
set_ftrace_notrace). This takes O(1) to set, as suppose to writing a
function name, which takes O(n) (where n is the number of functions in
available_filter_functions).
The ftrace-bisect.sh requires setting half of the functions in
available_filter_functions, which is O(n^2) using the name method to enable
and can take several minutes to complete. The number method is O(n) which
takes less than a second to complete. Using the number method for any
kernel 5.1 and after is the proper way to do the bisect.
Update the usage to reflect the new change, as well as using the
/sys/kernel/tracing path instead of the obsolete debugfs path.
Link: https://lkml.kernel.org/r/20230123112252.022003dd@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Fixes:
f79b3f338564e ("ftrace: Allow enabling of filters via index of available_filter_functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Natalia Petrova [Wed, 11 Jan 2023 12:04:09 +0000 (15:04 +0300)]
trace_events_hist: add check for return value of 'create_hist_field'
commit
8b152e9150d07a885f95e1fd401fc81af202d9a4 upstream.
Function 'create_hist_field' is called recursively at
trace_events_hist.c:1954 and can return NULL-value that's why we have
to check it to avoid null pointer dereference.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lkml.kernel.org/r/20230111120409.4111-1-n.petrova@fintech.ru
Cc: stable@vger.kernel.org
Fixes:
30350d65ac56 ("tracing: Add variable support to hist triggers")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (Google) [Wed, 4 Jan 2023 21:14:12 +0000 (16:14 -0500)]
tracing: Make sure trace_printk() can output as soon as it can be used
commit
3bb06eb6e9acf7c4a3e1b5bc87aed398ff8e2253 upstream.
Currently trace_printk() can be used as soon as early_trace_init() is
called from start_kernel(). But if a crash happens, and
"ftrace_dump_on_oops" is set on the kernel command line, all you get will
be:
[ 0.456075] <idle>-0 0dN.2. 347519us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 353141us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 358684us : Unknown type 6
This is because the trace_printk() event (type 6) hasn't been registered
yet. That gets done via an early_initcall(), which may be early, but not
early enough.
Instead of registering the trace_printk() event (and other ftrace events,
which are not trace events) via an early_initcall(), have them registered at
the same time that trace_printk() can be used. This way, if there is a
crash before early_initcall(), then the trace_printk()s will actually be
useful.
Link: https://lkml.kernel.org/r/20230104161412.019f6c55@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
e725c731e3bb1 ("tracing: Split tracing initialization into two for early initialization")
Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Rutland [Tue, 3 Jan 2023 12:49:11 +0000 (12:49 +0000)]
ftrace: Export ftrace_free_filter() to modules
commit
8be9fbd5345da52f4a74f7f81d55ff9fa0a2958e upstream.
Setting filters on an ftrace ops results in some memory being allocated
for the filter hashes, which must be freed before the ops can be freed.
This can be done by removing every individual element of the hash by
calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove`
set, but this is somewhat error prone as it's easy to forget to remove
an element.
Make it easier to clean this up by exporting ftrace_free_filter(), which
can be used to clean up all of the filter hashes after an ftrace_ops has
been unregistered.
Using this, fix the ftrace-direct* samples to free hashes prior to being
unloaded. All other code either removes individual filters explicitly or
is built-in and already calls ftrace_free_filter().
Link: https://lkml.kernel.org/r/20230103124912.2948963-3-mark.rutland@arm.com
Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface")
Fixes:
5fae941b9a6f ("ftrace/samples: Add multi direct interface test module")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Petr Pavlu [Mon, 5 Dec 2022 10:35:57 +0000 (11:35 +0100)]
module: Don't wait for GOING modules
commit
0254127ab977e70798707a7a2b757c9f3c971210 upstream.
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.
Since commit
6e6de3dee51a ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.
This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.
This patch attempts a different solution for the problem
6e6de3dee51a
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee51a (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.
This has been tested on linux-next since December 2022 and passes
all kmod selftests except test 0009 with module compression enabled
but it has been confirmed that this issue has existed and has gone
unnoticed since prior to this commit and can also be reproduced without
module compression with a simple usleep(5000000) on tools/modprobe.c [0].
These failures are caused by hitting the kernel mod_concurrent_max and can
happen either due to a self inflicted kernel module auto-loead DoS somehow
or on a system with large CPU count and each CPU count incorrectly triggering
many module auto-loads. Both of those issues need to be fixed in-kernel.
[0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/
Fixes:
6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Petr Mladek <pmladek@suse.com>
[mcgrof: enhance commit log with testing and kmod test result interpretation ]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Layton [Fri, 20 Jan 2023 19:52:14 +0000 (14:52 -0500)]
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
[ Upstream commit
4bdbba54e9b1c769da8ded9abd209d765715e1d6 ]
nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.
Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.
Fixes:
ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yi Liu [Fri, 20 Jan 2023 15:05:28 +0000 (07:05 -0800)]
kvm/vfio: Fix potential deadlock on vfio group_lock
[ Upstream commit
51cdc8bc120ef6e42f6fb758341f5d91bc955952 ]
Currently it is possible that the final put of a KVM reference comes from
vfio during its device close operation. This occurs while the vfio group
lock is held; however, if the vfio device is still in the kvm device list,
then the following call chain could result in a deadlock:
VFIO holds group->group_lock/group_rwsem
-> kvm_put_kvm
-> kvm_destroy_vm
-> kvm_destroy_devices
-> kvm_vfio_destroy
-> kvm_vfio_file_set_kvm
-> vfio_file_set_kvm
-> try to hold group->group_lock/group_rwsem
The key function is the kvm_destroy_devices() which triggers destroy cb
of kvm_device_ops. It calls back to vfio and try to hold group_lock. So
if this path doesn't call back to vfio, this dead lock would be fixed.
Actually, there is a way for it. KVM provides another point to free the
kvm-vfio device which is the point when the device file descriptor is
closed. This can be achieved by providing the release cb instead of the
destroy cb. Also rename kvm_vfio_destroy() to be kvm_vfio_release().
/*
* Destroy is responsible for freeing dev.
*
* Destroy may be called before or after destructors are called
* on emulated I/O regions, depending on whether a reference is
* held by a vcpu or other kvm component that gets destroyed
* after the emulated I/O.
*/
void (*destroy)(struct kvm_device *dev);
/*
* Release is an alternative method to free the device. It is
* called when the device file descriptor is closed. Once
* release is called, the destroy method will not be called
* anymore as the device is removed from the device list of
* the VM. kvm->lock is held.
*/
void (*release)(struct kvm_device *dev);
Fixes:
421cfe6596f6 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230114000351.115444-1-mjrosato@linux.ibm.com
Link: https://lore.kernel.org/r/20230120150528.471752-1-yi.l.liu@intel.com
[aw: update comment as well, s/destroy/release/]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Alexey V. Vissarionov [Wed, 18 Jan 2023 03:12:55 +0000 (06:12 +0300)]
scsi: hpsa: Fix allocation size for scsi_host_alloc()
[ Upstream commit
bbbd25499100c810ceaf5193c3cfcab9f7402a33 ]
The 'h' is a pointer to struct ctlr_info, so it's just 4 or 8 bytes, while
the structure itself is much bigger.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
edd163687ea5 ("hpsa: add driver for HP Smart Array controllers.")
Link: https://lore.kernel.org/r/20230118031255.GE15213@altlinux.org
Signed-off-by: Alexey V. Vissarionov <gremlin@altlinux.org>
Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Niklas Schnelle [Tue, 10 Jan 2023 16:44:27 +0000 (17:44 +0100)]
vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()
[ Upstream commit
895c0747f726bb50c9b7a805613a61d1b6f9fa06 ]
Since commit
cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
aperture shrinking") the s390 IOMMU driver uses reserved regions for the
system provided DMA ranges of PCI devices. Previously it reduced the
size of the IOMMU aperture and checked it on each mapping operation.
On current machines the system denies use of DMA addresses below 2^32 for
all PCI devices.
Usually mapping IOVAs in a reserved regions is harmless until a DMA
actually tries to utilize the mapping. However on s390 there is
a virtual PCI device called ISM which is implemented in firmware and
used for cross LPAR communication. Unlike real PCI devices this device
does not use the hardware IOMMU but inspects IOMMU translation tables
directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
mappings outside the allowed ranges it goes into an error state. This
error state then causes the device to be unavailable to the KVM guest.
Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
address 0 irrespective of the IOMMUs reserved regions. Even if usually
harmless this seems wrong in the general case so instead go through the
freshly updated IOVA list and try to find a range that isn't reserved,
and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
testing for fine grained super pages.
Fixes:
af029169b8fd ("vfio/type1: Check reserved region conflict and update iova list")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230110164427.4051938-2-schnelle@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qais Yousef [Thu, 12 Jan 2023 12:27:07 +0000 (12:27 +0000)]
sched/uclamp: Fix a uninitialized variable warnings
[ Upstream commit
e26fd28db82899be71b4b949527373d0a6be1e65 ]
Addresses the following warnings:
> config: riscv-randconfig-m031-
20221111
> compiler: riscv64-linux-gcc (GCC) 12.1.0
>
> smatch warnings:
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'.
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'.
Fixes:
244226035a1f ("sched/uclamp: Fix fits_capacity() check in feec()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-2-qyousef@layalina.io
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pierre Gondois [Thu, 6 Oct 2022 08:10:52 +0000 (10:10 +0200)]
sched/fair: Check if prev_cpu has highest spare cap in feec()
[ Upstream commit
ad841e569f5c88e3332b32a000f251f33ff32187 ]
When evaluating the CPU candidates in the perf domain (pd) containing
the previously used CPU (prev_cpu), find_energy_efficient_cpu()
evaluates the energy of the pd:
- without the task (base_energy)
- with the task placed on prev_cpu (if the task fits)
- with the task placed on the CPU with the highest spare capacity,
prev_cpu being excluded from this set
If prev_cpu is already the CPU with the highest spare capacity,
max_spare_cap_cpu will be the CPU with the second highest spare
capacity.
On an Arm64 Juno-r2, with a workload of 10 tasks at a 10% duty cycle,
when prev_cpu and max_spare_cap_cpu are both valid candidates,
prev_spare_cap > max_spare_cap at ~82%.
Thus the energy of the pd when placing the task on max_spare_cap_cpu
is computed with no possible positive outcome 82% most of the time.
Do not consider max_spare_cap_cpu as a valid candidate if
prev_spare_cap > max_spare_cap.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20221006081052.3862167-2-pierre.gondois@arm.com
Stable-dep-of:
e26fd28db828 ("sched/uclamp: Fix a uninitialized variable warnings")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Alexander Wetzel [Fri, 6 Jan 2023 22:31:41 +0000 (23:31 +0100)]
wifi: mac80211: Fix iTXQ AMPDU fragmentation handling
commit
592234e941f1addaa598601c9227e3b72d608625 upstream.
mac80211 must not enable aggregation wile transmitting a fragmented
MPDU. Enforce that for mac80211 internal TX queues (iTXQs).
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202301021738.7cd3e6ae-oliver.sang@intel.com
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20230106223141.98696-1-alexander@wetzel-home.de
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Wetzel [Fri, 30 Dec 2022 12:18:49 +0000 (13:18 +0100)]
wifi: mac80211: Proper mark iTXQs for resumption
commit
4444bc2116aecdcde87dce80373540adc8bd478b upstream.
When a running wake_tx_queue() call is aborted due to a hw queue stop
the corresponding iTXQ is not always correctly marked for resumption:
wake_tx_push_queue() can stops the queue run without setting
@IEEE80211_TXQ_STOP_NETIF_TX.
Without the @IEEE80211_TXQ_STOP_NETIF_TX flag __ieee80211_wake_txqs()
will not schedule a new queue run and remaining frames in the queue get
stuck till another frame is queued to it.
Fix the issue for all drivers - also the ones with custom wake_tx_queue
callbacks - by moving the logic into ieee80211_tx_dequeue() and drop the
redundant @txqs_stopped.
@IEEE80211_TXQ_STOP_NETIF_TX is also renamed to @IEEE80211_TXQ_DIRTY to
better describe the flag.
Fixes:
c850e31f79f0 ("wifi: mac80211: add internal handler for wake_tx_queue")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20221230121850.218810-1-alexander@wetzel-home.de
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pavel Begunkov [Fri, 20 Jan 2023 16:38:06 +0000 (16:38 +0000)]
io_uring/msg_ring: fix remote queue to disabled ring
commit
8579538c89e33ce78be2feb41e07489c8cbf8f31 upstream.
IORING_SETUP_R_DISABLED rings don't have the submitter task set, so
it's not always safe to use ->submitter_task. Disallow posting msg_ring
messaged to disabled rings. Also add task NULL check for loosy sync
around testing for IORING_SETUP_R_DISABLED.
Cc: stable@vger.kernel.org
Fixes:
6d043ee1164ca ("io_uring: do msg_ring in target task via tw")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Harsh Jain [Wed, 2 Nov 2022 09:53:08 +0000 (15:23 +0530)]
drm/amdgpu: complete gfxoff allow signal during suspend without delay
commit
4b31b92b143f7d209f3d494c56d4c4673e9fc53d upstream.
change guarantees that gfxoff is allowed before moving further in
s2idle sequence to add more reliablity about gfxoff in amdgpu IP's
suspend flow
Signed-off-by: Harsh Jain <harsh.jain@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>