Linus Torvalds [Fri, 12 Nov 2021 19:32:22 +0000 (11:32 -0800)]
Merge tag 'libata-5.16-rc1-p2' of git://git./linux/kernel/git/dlemoal/libata
Pull more libata updates from Damien Le Moal:
"Second round of updates for libata for 5.16:
- Fix READ LOG EXT and READ LOG DMA EXT command timeouts during disk
revalidation after a resume or a modprobe of the LLDD (me)
- Remove unnecessary error message in sata_highbank driver (Xu)
- Better handling of accesses to the IDENTIFY DEVICE data log for
drives that do not support this log page (me)
- Fix ahci_shost_attr_group declaration in ahci driver (me)"
* tag 'libata-5.16-rc1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
libata: libahci: declare ahci_shost_attr_group as static
libata: add horkage for missing Identify Device log
ata: sata_highbank: Remove unnecessary print function dev_err()
libata: fix read log timeout value
Steve French [Wed, 10 Nov 2021 09:15:29 +0000 (03:15 -0600)]
smb3: do not setup the fscache_super_cookie until fsinfo initialized
We were calling cifs_fscache_get_super_cookie after tcon but before
we queried the info (QFS_Info) we need to initialize the cookie
properly. Also includes an additional check suggested by Paulo
to make sure we don't initialize super cookie twice.
Suggested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Sasha Levin [Fri, 12 Nov 2021 15:16:02 +0000 (10:16 -0500)]
tools/lib/lockdep: drop liblockdep
TL;DR: While a tool like liblockdep is useful, it probably doesn't
belong within the kernel tree.
liblockdep attempts to reuse kernel code both directly (by directly
building the kernel's lockdep code) as well as indirectly (by using
sanitized headers). This makes liblockdep an integral part of the
kernel.
It also makes liblockdep quite unique: while other userspace code might
use sanitized headers, it generally doesn't attempt to use kernel code
directly which means that changes on the kernel side of things don't
affect (and break) it directly.
All our workflows and tooling around liblockdep don't support this
uniqueness. Changes that go into the kernel code aren't validated to not
break in-tree userspace code.
liblockdep ended up being very fragile, breaking over and over, to the
point that living in the same tree as the lockdep code lost most of it's
value.
liblockdep should continue living in an external tree, syncing with
the kernel often, in a controllable way.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paulo Alcantara [Fri, 12 Nov 2021 18:16:08 +0000 (15:16 -0300)]
cifs: fix potential use-after-free bugs
Ensure that share and prefix variables are set to NULL after kfree()
when looping through DFS targets in __tree_connect_dfs_target().
Also, get rid of @ref in __tree_connect_dfs_target() and just pass a
boolean to indicate whether we're handling link targets or not.
Fixes:
c88f7dcd6d64 ("cifs: support nested dfs links over reconnect")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Fri, 12 Nov 2021 17:53:36 +0000 (14:53 -0300)]
cifs: fix memory leak of smb3_fs_context_dup::server_hostname
Fix memory leak of smb3_fs_context_dup::server_hostname when parsing
and duplicating fs contexts during mount(2) as reported by kmemleak:
unreferenced object 0xffff888125715c90 (size 16):
comm "mount.cifs", pid 3832, jiffies
4304535868 (age 190.094s)
hex dump (first 16 bytes):
7a 65 6c 64 61 2e 74 65 73 74 00 6b 6b 6b 6b a5 zelda.test.kkkk.
backtrace:
[<
ffffffff8168106e>] kstrdup+0x2e/0x60
[<
ffffffffa027a362>] smb3_fs_context_dup+0x392/0x8d0 [cifs]
[<
ffffffffa0136353>] cifs_smb3_do_mount+0x143/0x1700 [cifs]
[<
ffffffffa02795e8>] smb3_get_tree+0x2e8/0x520 [cifs]
[<
ffffffff817a19aa>] vfs_get_tree+0x8a/0x2d0
[<
ffffffff8181e3e3>] path_mount+0x423/0x1a10
[<
ffffffff8181fbca>] __x64_sys_mount+0x1fa/0x270
[<
ffffffff83ae364b>] do_syscall_64+0x3b/0x90
[<
ffffffff83c0007c>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff888111deed20 (size 32):
comm "mount.cifs", pid 3832, jiffies
4304536044 (age 189.918s)
hex dump (first 32 bytes):
44 46 53 52 4f 4f 54 31 2e 5a 45 4c 44 41 2e 54 DFSROOT1.ZELDA.T
45 53 54 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 EST.kkkkkkkkkkk.
backtrace:
[<
ffffffff8168118d>] kstrndup+0x2d/0x90
[<
ffffffffa027ab2e>] smb3_parse_devname+0x9e/0x360 [cifs]
[<
ffffffffa01870c8>] cifs_setup_volume_info+0xa8/0x470 [cifs]
[<
ffffffffa018c469>] connect_dfs_target+0x309/0xc80 [cifs]
[<
ffffffffa018d6cb>] cifs_mount+0x8eb/0x17f0 [cifs]
[<
ffffffffa0136475>] cifs_smb3_do_mount+0x265/0x1700 [cifs]
[<
ffffffffa02795e8>] smb3_get_tree+0x2e8/0x520 [cifs]
[<
ffffffff817a19aa>] vfs_get_tree+0x8a/0x2d0
[<
ffffffff8181e3e3>] path_mount+0x423/0x1a10
[<
ffffffff8181fbca>] __x64_sys_mount+0x1fa/0x270
[<
ffffffff83ae364b>] do_syscall_64+0x3b/0x90
[<
ffffffff83c0007c>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 11 Nov 2021 21:35:34 +0000 (15:35 -0600)]
smb3: add additional null check in SMB311_posix_mkdir
Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1437501 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 12 Nov 2021 15:55:03 +0000 (09:55 -0600)]
cifs: release lock earlier in dequeue_mid error case
In dequeue_mid we can log an error while holding a spinlock,
GlobalMid_Lock. Coverity notes that the error logging
also grabs a lock so it is cleaner (and a bit safer) to
release the GlobalMid_Lock before logging the warning.
Addresses-Coverity: 1507573 ("Thread deadlock")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Fri, 12 Nov 2021 18:56:25 +0000 (10:56 -0800)]
thermal: int340x: fix build on 32-bit targets
Commit
aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.
That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access. Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.
It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.
The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.
So just add the include for the 'io-64-nonatomic-lo-hi.h' version.
Fixes:
aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Moore [Fri, 12 Nov 2021 17:07:02 +0000 (12:07 -0500)]
net,lsm,selinux: revert the security_sctp_assoc_established() hook
This patch reverts two prior patches,
e7310c94024c
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e6a ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation. Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.
Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel. In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Ming Lei [Fri, 12 Nov 2021 12:47:15 +0000 (20:47 +0800)]
blk-mq: fix filesystem I/O request allocation
submit_bio_checks() may update bio->bi_opf, so we have to initialize
blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks()
returns when allocating new request.
In case of using cached request, fallback to allocate new request if
cached rq isn't compatible with the incoming bio, otherwise change
rq->cmd_flags with incoming bio->bi_opf.
Fixes:
900e080752025f00 ("block: move queue enter logic into blk_mq_submit_bio()")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Steve French [Thu, 11 Nov 2021 22:18:14 +0000 (16:18 -0600)]
smb3: add additional null check in SMB2_tcon
Although unlikely to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1420428 ("Explicit null dereferenced")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 11 Nov 2021 22:10:00 +0000 (16:10 -0600)]
smb3: add additional null check in SMB2_open
Although unlikely to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1418458 ("Explicit null dereferenced")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Marc Zyngier [Fri, 12 Nov 2021 14:10:39 +0000 (14:10 +0000)]
of/irq: Don't ignore interrupt-controller when interrupt-map failed
Since
041284181226 ("of/irq: Allow matching of an interrupt-map local
to an interrupt controller"), the irq code favors using an interrupt-map
over a interrupt-controller property if both are available, while the
earlier behaviour was to ignore the interrupt-map altogether.
However, we now end-up with the opposite behaviour, which is to
ignore the interrupt-controller property even if the interrupt-map
fails to match its input. This new behaviour breaks the AmigaOne
X1000 machine, which ships with an extremely "creative" (read:
broken) device tree.
Fix this by allowing the interrupt-controller property to be selected
when interrupt-map fails to match anything.
Fixes:
041284181226 ("of/irq: Allow matching of an interrupt-map local to an interrupt controller")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/78308692-02e6-9544-4035-3171a8e1e6d4@xenosoft.de
Link: https://lore.kernel.org/r/20211112143644.434995-1-maz@kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Guo Ren [Fri, 5 Nov 2021 09:47:48 +0000 (17:47 +0800)]
irqchip/sifive-plic: Fixup EOI failed when masked
When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver,
only the first interrupt is handled, and following interrupts are never
delivered (initially reported in [1]).
That's because the RISC-V PLIC cannot EOI masked interrupts, as explained
in the description of Interrupt Completion in the PLIC spec [2]:
<quote>
The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored.
</quote>
Re-enable the interrupt before completion if it has been masked during
the handling, and remask it afterwards.
[1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html
[2] https://github.com/riscv/riscv-plic-spec/blob/
8bc15a35d07c9edf7b5d23fec9728302595ffc4d/riscv-plic.adoc
Fixes:
bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow")
Reported-by: Vincent Pelletier <plr.vincent@gmail.com>
Tested-by: Nikita Shubin <nikita.shubin@maquefel.me>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
[maz: amended commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211105094748.3894453-1-guoren@kernel.org
Guo Ren [Mon, 1 Nov 2021 13:45:34 +0000 (21:45 +0800)]
irqchip/csky-mpintc: Fixup mask/unmask implementation
The mask/unmask must be implemented, and enable/disable supplement
them if the HW requires something different at startup time. When
irq source is disabled by mask, mpintc could complete irq normally.
So drop enable/disable if favour of mask/unmask.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211101134534.3804542-1-guoren@kernel.org
Daniel Bristot de Oliveira [Thu, 11 Nov 2021 22:07:42 +0000 (23:07 +0100)]
tracing/osnoise: Make osnoise_instances static
Make the struct list_head osnoise_instances definition static.
Link: https://lore.kernel.org/all/202111120052.ZuikQSJi-lkp@intel.com/
Link: https://lkml.kernel.org/r/d001f0eeac66e2b2eeec7d2a15e9e7abede0453a.1636667971.git.bristot@kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Fixes:
dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ian Rogers [Thu, 4 Nov 2021 06:41:48 +0000 (23:41 -0700)]
perf test: Use macro for "suite" definitions
Add a macro to simplify later refactoring. No functional change.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 4 Nov 2021 06:41:47 +0000 (23:41 -0700)]
perf test: Use macro for "suite" declarations
Currently tests are setup in builtin-test with function pointers. Kunit
exposes tests as a kunit_suite with a null terminated array of test
cases. Use a macro to aid transition from one to the other in later
changes.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 9 Nov 2021 12:23:07 +0000 (09:23 -0300)]
perf beauty: Add socket level scnprintf that handles ARCH specific SOL_SOCKET
SOL_SOCKET has a different value according to the architecture, some
have it as 0xffff while all the others have it as 1, so a simple string
array isn't usable, add a scnprintf routine that treats it as a special
case, using the array for other values.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 18:29:10 +0000 (15:29 -0300)]
perf trace: Beautify the 'level' argument of setsockopt
# perf trace -e setsockopt
0.000 ( 0.019 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 50, optval: 0x7ffee2c0c134, optlen: 4) = 0
0.022 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 11, optval: 0x7ffee2c0c114, optlen: 4) = 0
0.027 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 8, optval: 0x7ffee2c0c134, optlen: 4) = 0
0.032 ( 0.002 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 10, optval: 0x7ffee2c0c134, optlen: 4) = 0
0.036 ( 0.002 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 25, optval: 0x7ffee2c0c114, optlen: 4) = 0
0.043 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: 1, optname: 62, optval: 0x7ffee2c0c0fc, optlen: 4) = 0
0.055 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: 1, optname: 25)
^C#
So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 18:29:10 +0000 (15:29 -0300)]
perf trace: Beautify the 'level' argument of getsockopt
# perf trace -e getsockopt
0.000 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
0.301 ( 0.003 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
2.215 ( 0.005 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
2.422 ( 0.005 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
1001.308 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
1001.586 ( 0.003 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
1001.647 ( 0.002 ms): systemd-resolv/1121 getsockopt(fd: 23, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
1003.868 ( 0.010 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
1004.036 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
1004.087 ( 0.002 ms): systemd-resolv/1121 getsockopt(fd: 23, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
^C#
So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:48:13 +0000 (10:48 -0300)]
perf beauty socket: Add generator for socket level (SOL_*) string table
$ tools/perf/trace/beauty/socket.sh
static const char *socket_ipproto[] = {
[0] = "IP",
[1] = "ICMP",
<SNIP>
[255] = "RAW",
[262] = "MPTCP",
};
static const char *socket_level[] = {
[0] = "IP",
[6] = "TCP",
[17] = "UDP",
[41] = "IPV6",
[58] = "ICMPV6",
[132] = "SCTP",
[136] = "UDPLITE",
[255] = "RAW",
[256] = "IPX",
[257] = "AX25",
[258] = "ATALK",
[259] = "NETROM",
[260] = "ROSE",
[261] = "DECNET",
[262] = "X25",
[263] = "PACKET",
[264] = "ATM",
[265] = "AAL",
[266] = "IRDA",
[267] = "NETBEUI",
[268] = "LLC",
[269] = "DCCP",
[270] = "NETLINK",
[271] = "TIPC",
[272] = "RXRPC",
[273] = "PPPOL2TP",
[274] = "BLUETOOTH",
[275] = "PNPIPE",
[276] = "RDS",
[277] = "IUCV",
[278] = "CAIF",
[279] = "ALG",
[280] = "NFC",
[281] = "KCM",
[282] = "TLS",
[283] = "XDP",
[284] = "MPTCP",
[285] = "MCTP",
};
$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:39:32 +0000 (10:39 -0300)]
perf beauty socket: Sort the ipproto array entries
Just tidying up the output for human consumption.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:38:27 +0000 (10:38 -0300)]
perf beauty socket: Rename 'regex' to 'ipproto_regex'
Paving the way for more regexps to be used here.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:34:06 +0000 (10:34 -0300)]
perf beauty socket: Prep to receive more input header files
Move from ternary like expression to an if block, this way we'll
have just the extra lines for new files in the following patches.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:31:33 +0000 (10:31 -0300)]
perf beauty socket: Rename header_dir to uapi_header_dir
Paving the way to pass more headers to be consumed, like
tools/perf/trace/beauty/include/linux/socket.h in addition to the
current tools/include/uapi/linux/in.h, to get the SOL_* defines.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 13:05:53 +0000 (10:05 -0300)]
perf beauty: Rename socket_ipproto.sh to socket.sh to hold more socket table generators
To avoid having to add new entries to tools/perf/Makefile.perf prep
socket.sh so that it can generate other socket table generators, such as
the upcoming SOL_ socket level one.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Nov 2021 12:42:01 +0000 (09:42 -0300)]
perf beauty: Make all sockaddr files use a common naming scheme
The script that generates the tables was named 'socket.sh', which is
confusing, rename it to sockaddr.sh and make sure the related
Makefile.perf targets also use the 'sockaddr' namespace.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Laibin Qiu [Fri, 12 Nov 2021 09:33:54 +0000 (17:33 +0800)]
blkcg: Remove extra blkcg_bio_issue_init
KASAN reports a use-after-free report when doing block test:
==================================================================
[10050.967049] BUG: KASAN: use-after-free in
submit_bio_checks+0x1539/0x1550
[10050.977638] Call Trace:
[10050.978190] dump_stack+0x9b/0xce
[10050.979674] print_address_description.constprop.6+0x3e/0x60
[10050.983510] kasan_report.cold.9+0x22/0x3a
[10050.986089] submit_bio_checks+0x1539/0x1550
[10050.989576] submit_bio_noacct+0x83/0xc80
[10050.993714] submit_bio+0xa7/0x330
[10050.994435] mpage_readahead+0x380/0x500
[10050.998009] read_pages+0x1c1/0xbf0
[10051.002057] page_cache_ra_unbounded+0x4c2/0x6f0
[10051.007413] do_page_cache_ra+0xda/0x110
[10051.008207] force_page_cache_ra+0x23d/0x3d0
[10051.009087] page_cache_sync_ra+0xca/0x300
[10051.009970] generic_file_buffered_read+0xbea/0x2130
[10051.012685] generic_file_read_iter+0x315/0x490
[10051.014472] blkdev_read_iter+0x113/0x1b0
[10051.015300] aio_read+0x2ad/0x450
[10051.023786] io_submit_one+0xc8e/0x1d60
[10051.029855] __se_sys_io_submit+0x125/0x350
[10051.033442] do_syscall_64+0x2d/0x40
[10051.034156] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10051.048733] Allocated by task 18598:
[10051.049482] kasan_save_stack+0x19/0x40
[10051.050263] __kasan_kmalloc.constprop.1+0xc1/0xd0
[10051.051230] kmem_cache_alloc+0x146/0x440
[10051.052060] mempool_alloc+0x125/0x2f0
[10051.052818] bio_alloc_bioset+0x353/0x590
[10051.053658] mpage_alloc+0x3b/0x240
[10051.054382] do_mpage_readpage+0xddf/0x1ef0
[10051.055250] mpage_readahead+0x264/0x500
[10051.056060] read_pages+0x1c1/0xbf0
[10051.056758] page_cache_ra_unbounded+0x4c2/0x6f0
[10051.057702] do_page_cache_ra+0xda/0x110
[10051.058511] force_page_cache_ra+0x23d/0x3d0
[10051.059373] page_cache_sync_ra+0xca/0x300
[10051.060198] generic_file_buffered_read+0xbea/0x2130
[10051.061195] generic_file_read_iter+0x315/0x490
[10051.062189] blkdev_read_iter+0x113/0x1b0
[10051.063015] aio_read+0x2ad/0x450
[10051.063686] io_submit_one+0xc8e/0x1d60
[10051.064467] __se_sys_io_submit+0x125/0x350
[10051.065318] do_syscall_64+0x2d/0x40
[10051.066082] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10051.067455] Freed by task 13307:
[10051.068136] kasan_save_stack+0x19/0x40
[10051.068931] kasan_set_track+0x1c/0x30
[10051.069726] kasan_set_free_info+0x1b/0x30
[10051.070621] __kasan_slab_free+0x111/0x160
[10051.071480] kmem_cache_free+0x94/0x460
[10051.072256] mempool_free+0xd6/0x320
[10051.072985] bio_free+0xe0/0x130
[10051.073630] bio_put+0xab/0xe0
[10051.074252] bio_endio+0x3a6/0x5d0
[10051.074984] blk_update_request+0x590/0x1370
[10051.075870] scsi_end_request+0x7d/0x400
[10051.076667] scsi_io_completion+0x1aa/0xe50
[10051.077503] scsi_softirq_done+0x11b/0x240
[10051.078344] blk_mq_complete_request+0xd4/0x120
[10051.079275] scsi_mq_done+0xf0/0x200
[10051.080036] virtscsi_vq_done+0xbc/0x150
[10051.080850] vring_interrupt+0x179/0x390
[10051.081650] __handle_irq_event_percpu+0xf7/0x490
[10051.082626] handle_irq_event_percpu+0x7b/0x160
[10051.083527] handle_irq_event+0xcc/0x170
[10051.084297] handle_edge_irq+0x215/0xb20
[10051.085122] asm_call_irq_on_stack+0xf/0x20
[10051.085986] common_interrupt+0xae/0x120
[10051.086830] asm_common_interrupt+0x1e/0x40
==================================================================
Bio will be checked at beginning of submit_bio_noacct(). If bio needs
to be throttled, it will start the timer and stop submit bio directly.
Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.
But in the current process, if bio is throttled, it will still set bio
issue->value by blkcg_bio_issue_init(). This is redundant and may cause
the above use-after-free.
CPU0 CPU1
submit_bio
submit_bio_noacct
submit_bio_checks
blk_throtl_bio()
<=mod_timer(&sq->pending_timer
blk_throtl_dispatch_work_fn
submit_bio_noacct() <= bio have
throttle tag, will throw directly
and bio issue->value will be set
here
bio_endio()
bio_put()
bio_free() <= free this bio
blkcg_bio_issue_init(bio)
<= bio has been freed and
will lead to UAF
return BLK_QC_T_NONE
Fix this by remove extra blkcg_bio_issue_init.
Fixes:
e439bedf6b24 (blkcg: consolidate bio_issue_init() to be a part of core)
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Bonzini [Fri, 12 Nov 2021 09:02:24 +0000 (04:02 -0500)]
KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
Use the same cleanup code independent of whether the cgroup to be
uncharged and unref'd is the source or the destination cgroup. Use a
bool to track whether the destination cgroup has been charged, which also
fixes a bug in the error case: the destination cgroup must be uncharged
only if it does not match the source.
Fixes:
b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 12 Nov 2021 07:53:41 +0000 (02:53 -0500)]
KVM: x86: move guest_pv_has out of user_access section
When UBSAN is enabled, the code emitted for the call to guest_pv_has
includes a call to __ubsan_handle_load_invalid_value. objtool
complains that this call happens with UACCESS enabled; to avoid
the warning, pull the calls to user_access_begin into both arms
of the "if" statement, after the check for guest_pv_has.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dmitry Torokhov [Fri, 12 Nov 2021 05:58:54 +0000 (21:58 -0800)]
Merge branch 'next' into for-linus
Prepare input updates for 5.16 merge window.
Dave Airlie [Fri, 12 Nov 2021 03:06:37 +0000 (13:06 +1000)]
Merge tag 'drm-misc-fixes-2021-11-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
* dma-buf: name_lock fixes
* prime: Keep object ref during mmap
* nouveau: Fix a refcount issue; Fix device removal; Protect client
list with dedicated mutex; Fix address CE0 address calculation
* ttm: Fix race condition during BO eviction
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YYzY6jeox9EeI15i@linux-uq9g.fritz.box
Ronnie Sahlberg [Tue, 2 Nov 2021 23:45:52 +0000 (08:45 +0900)]
ksmbd: Use the SMB3_Create definitions from the shared
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Thu, 9 Sep 2021 03:26:12 +0000 (12:26 +0900)]
ksmbd: Move more definitions into the shared area
Move SMB2_SessionSetup, SMB2_Close, SMB2_Read, SMB2_Write and
SMB2_ChangeNotify commands into smbfs_common/smb2pdu.h
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Tue, 2 Nov 2021 23:44:38 +0000 (08:44 +0900)]
ksmbd: use the common definitions for NEGOTIATE_PROTOCOL
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Tue, 2 Nov 2021 23:43:42 +0000 (08:43 +0900)]
ksmbd: switch to use shared definitions where available
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Thu, 9 Sep 2021 03:28:18 +0000 (12:28 +0900)]
ksmbd: change LeaseKey data type to u8 array
cifs define LeaseKey as u8 array in structure. To move lease structure
to smbfs_common, ksmbd change LeaseKey data type to u8 array.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Tue, 2 Nov 2021 23:25:54 +0000 (08:25 +0900)]
ksmbd: remove smb2_buf_length in smb2_transform_hdr
To move smb2_transform_hdr to smbfs_common, This patch remove
smb2_buf_length variable in smb2_transform_hdr.
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Tue, 2 Nov 2021 23:08:44 +0000 (08:08 +0900)]
ksmbd: remove smb2_buf_length in smb2_hdr
To move smb2_hdr to smbfs_common, This patch remove smb2_buf_length
variable in smb2_hdr. Also, declare smb2_get_msg function to get smb2
request/response from ->request/response_buf.
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Sun, 31 Oct 2021 00:56:53 +0000 (09:56 +0900)]
ksmbd: remove md4 leftovers
As NTLM authentication is removed, md4 is no longer used.
ksmbd remove md4 leftovers, i.e. select CRYPTO_MD4, MODULE_SOFTDEP md4.
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Sun, 31 Oct 2021 00:53:50 +0000 (09:53 +0900)]
ksmbd: set unique value to volume serial field in FS_VOLUME_INFORMATION
Steve French reported ksmbd set fixed value to volume serial field in
FS_VOLUME_INFORMATION. Volume serial value needs to be set to a unique
value for client fscache. This patch set crc value that is generated
with share name, path name and netbios name to volume serial.
Fixes:
e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org # v5.15
Reported-by: Steve French <smfrench@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jens Axboe [Fri, 12 Nov 2021 00:32:53 +0000 (17:32 -0700)]
io-wq: serialize hash clear with wakeup
We need to ensure that we serialize the stalled and hash bits with the
wait_queue wait handler, or we could be racing with someone modifying
the hashed state after we find it busy, but before we then give up and
wait for it to be cleared. This can cause random delays or stalls when
handling buffered writes for many files, where some of these files cause
hash collisions between the worker threads.
Cc: stable@vger.kernel.org
Reported-by: Daniel Black <daniel@mariadb.org>
Fixes:
e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Airlie [Thu, 11 Nov 2021 23:22:28 +0000 (09:22 +1000)]
BackMerge tag 'v5.15' into drm-next
I got a drm-fixes which had some 5.15 stuff in it, so to avoid
the mess just backmerge here.
Linux 5.15
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Thu, 11 Nov 2021 23:10:18 +0000 (15:10 -0800)]
Merge tag 'pci-v5.16-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Revert conversion to struct device.driver instead of struct
pci_dev.driver.
The device.driver is set earlier, and using it caused the PCI core to
call driver PM entry points before .probe() and after .remove(), when
the driver isn't prepared.
This caused NULL pointer dereferences in i2c_designware_pci and
probably other driver issues"
* tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"
Revert "PCI: Remove struct pci_dev->driver"
Damien Le Moal [Thu, 11 Nov 2021 03:03:27 +0000 (12:03 +0900)]
libata: libahci: declare ahci_shost_attr_group as static
ahci_shost_attr_group is referenced only in drivers/ata/libahci.c.
Declare it as static.
Fixes:
c3f69c7f629f ("scsi: ata: Switch to attribute groups")
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Damien Le Moal [Mon, 8 Nov 2021 23:45:25 +0000 (08:45 +0900)]
libata: add horkage for missing Identify Device log
ACS-3 introduced the ATA Identify Device Data log as mandatory. A
warning message currently signals to the user if a device does not
report supporting this log page in the log directory page, regardless
of the ATA version of the device. Furthermore, this warning will appear
for all attempts at accessing this missing log page during device
revalidation.
Since it is useless to constantly access the log directory and warn
about this lack of support once we have discovered that the device
does not support this log page, introduce the horkage flag
ATA_HORKAGE_NO_ID_DEV_LOG to mark a device as lacking support for
the Identify Device Data log page. Set this flag when
ata_log_supported() returns false in ata_identify_page_supported().
The warning is printed only if the device ATA level is 10 or above
(ACS-3 or above), and only once on device scan. With this flag set, the
log directory page is not accessed again to test for Identify Device
Data log page support.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Thu, 11 Nov 2021 23:00:04 +0000 (15:00 -0800)]
Merge tag 'kcsan.2021.11.11a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
"This contains initialization fixups, testing improvements, addition of
instruction pointer to data-race reports, and scoped data-race checks"
* tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: selftest: Cleanup and add missing __init
kcsan: Move ctx to start of argument list
kcsan: Support reporting scoped read-write access type
kcsan: Start stack trace with explicit location if provided
kcsan: Save instruction pointer for scoped accesses
kcsan: Add ability to pass instruction pointer of access to reporting
kcsan: test: Fix flaky test case
kcsan: test: Use kunit_skip() to skip tests
kcsan: test: Defer kcsan_test_init() after kunit initialization
Linus Torvalds [Thu, 11 Nov 2021 22:47:32 +0000 (14:47 -0800)]
Merge tag 'apparmor-pr-2021-11-10' of git://git./linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"Features
- use per file locks for transactional queries
- update policy management capability checks to work with LSM stacking
Bug Fixes:
- check/put label on apparmor_sk_clone_security()
- fix error check on update of label hname
- fix introspection of of task mode for unconfined tasks
Cleanups:
- avoid -Wempty-body warning
- remove duplicated 'Returns:' comments
- fix doc warning
- remove unneeded one-line hook wrappers
- use struct_size() helper in kzalloc()
- fix zero-length compiler warning in AA_BUG()
- file.h: delete duplicated word
- delete repeated words in comments
- remove repeated declaration"
* tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: remove duplicated 'Returns:' comments
apparmor: remove unneeded one-line hook wrappers
apparmor: Use struct_size() helper in kzalloc()
apparmor: fix zero-length compiler warning in AA_BUG()
apparmor: use per file locks for transactional queries
apparmor: fix doc warning
apparmor: Remove the repeated declaration
apparmor: avoid -Wempty-body warning
apparmor: Fix internal policy capable check for policy management
apparmor: fix error check
security: apparmor: delete repeated words in comments
security: apparmor: file.h: delete duplicated word
apparmor: switch to apparmor to internal capable check for policy management
apparmor: update policy capable checks to use a label
apparmor: fix introspection of of task mode for unconfined tasks
apparmor: check/put label on apparmor_sk_clone_security()
Linus Torvalds [Thu, 11 Nov 2021 22:31:47 +0000 (14:31 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"The post-linux-next material.
7 patches.
Subsystems affected by this patch series (all mm): debug,
slab-generic, migration, memcg, and kasan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: add kasan mode messages when kasan init
mm: unexport {,un}lock_page_memcg
mm: unexport folio_memcg_{,un}lock
mm/migrate.c: remove MIGRATE_PFN_LOCKED
mm: migrate: simplify the file-backed pages validation when migrating its mapping
mm: allow only SLUB on PREEMPT_RT
mm/page_owner.c: modify the type of argument "order" in some functions
Linus Torvalds [Thu, 11 Nov 2021 22:22:05 +0000 (14:22 -0800)]
Merge tag 'm68knommu-for-v5.16' of git://git./linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"Only two changes.
One removes the now unused CONFIG_MCPU32 symbol. The other sets a
default for the CONFIG_MEMORY_RESERVE config symbol (this aids
scripting and other automation) so you don't interactively get asked
for a value at configure time.
Summary:
- remove unused CONFIG_MCPU32 symbol
- default CONFIG_MEMORY_RESERVE value (for scripting)"
* tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: Remove MCPU32 config symbol
m68k: set a default value for MEMORY_RESERVE
Steve French [Thu, 11 Nov 2021 20:39:23 +0000 (14:39 -0600)]
smb3: add additional null check in SMB2_ioctl
Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1443909 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Bjorn Helgaas [Wed, 10 Nov 2021 18:03:34 +0000 (12:03 -0600)]
Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"
This reverts commit
2a4d9408c9e8b6f6fc150c66f3fef755c9e20d4a.
Robert reported a NULL pointer dereference caused by the PCI core
(local_pci_probe()) calling the i2c_designware_pci driver's
.runtime_resume() method before the .probe() method. i2c_dw_pci_resume()
depends on initialization done by i2c_dw_pci_probe().
Prior to
2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver"), pci_pm_runtime_resume() avoided calling the
.runtime_resume() method because pci_dev->driver had not been set yet.
2a4d9408c9e8 and
b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"),
removed pci_dev->driver, replacing it by device->driver, which *has* been
set by this time, so pci_pm_runtime_resume() called the .runtime_resume()
method when it previously had not.
Fixes:
2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Wed, 10 Nov 2021 18:01:14 +0000 (12:01 -0600)]
Revert "PCI: Remove struct pci_dev->driver"
This reverts commit
b5f9c644eb1baafcd349ad134e2110773f8d0a38.
Revert
b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"), which is needed
to revert
2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver").
2a4d9408c9e8 caused a NULL pointer dereference reported by Robert Święcki.
Details in the revert of that commit.
Fixes:
2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Shin'ichiro Kawasaki [Thu, 11 Nov 2021 08:52:38 +0000 (17:52 +0900)]
block: Hold invalidate_lock in BLKRESETZONE ioctl
When BLKRESETZONE ioctl and data read race, the data read leaves stale
page cache. The commit
e5113505904e ("block: Discard page cache of zone
reset target range") added page cache truncation to avoid stale page
cache after the ioctl. However, the stale page cache still can be read
during the reset zone operation for the ioctl. To avoid the stale page
cache completely, hold invalidate_lock of the block device file mapping.
Fixes:
e5113505904e ("block: Discard page cache of zone reset target range")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 11 Nov 2021 08:51:34 +0000 (16:51 +0800)]
blk-mq: rename blk_attempt_bio_merge
It is very annoying to have two block layer functions which share same
name, so rename blk_attempt_bio_merge in blk-mq.c as
blk_mq_attempt_bio_merge.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 11 Nov 2021 08:51:33 +0000 (16:51 +0800)]
blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge
blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(),
which is called when queue usage counter is grabbed already:
1) blk_mq_get_new_requests()
2) blk_mq_get_request()
- cached request in current plug owns one queue usage counter
So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(), and more
importantly this nest way causes hang in blk_mq_freeze_queue_wait().
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 11 Nov 2021 00:32:32 +0000 (17:32 -0700)]
block: fix kerneldoc for disk_register_independent_access__ranges()
The naming got changed as part of a revision of the patchset, but the
kerneldoc apparently never got updated. Fix it.
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
a2247f19ee1c ("block: Add independent access ranges support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 11 Nov 2021 18:16:33 +0000 (10:16 -0800)]
Merge tag 'trace-v5.16-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two locking fixes:
- Add mutex protection to ring_buffer_reset()
- Fix deadlock in modify_ftrace_direct_multi()"
* tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/direct: Fix lockup in modify_ftrace_direct_multi
ring-buffer: Protect ring_buffer_reset() from reentrancy
Linus Torvalds [Thu, 11 Nov 2021 17:49:36 +0000 (09:49 -0800)]
Merge tag 'net-5.16-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- bpf: do not reject when the stack read size is different from the
tracked scalar size
- net: fix premature exit from NAPI state polling in napi_disable()
- riscv, bpf: fix RV32 broken build, and silence RV64 warning
Current release - new code bugs:
- net: fix possible NULL deref in sock_reserve_memory
- amt: fix error return code in amt_init(); fix stopping the
workqueue
- ax88796c: use the correct ioctl callback
Previous releases - always broken:
- bpf: stop caching subprog index in the bpf_pseudo_func insn
- security: fixups for the security hooks in sctp
- nfc: add necessary privilege flags in netlink layer, limit
operations to admin only
- vsock: prevent unnecessary refcnt inc for non-blocking connect
- net/smc: fix sk_refcnt underflow on link down and fallback
- nfnetlink_queue: fix OOB when mac header was cleared
- can: j1939: ignore invalid messages per standard
- bpf, sockmap:
- fix race in ingress receive verdict with redirect to self
- fix incorrect sk_skb data_end access when src_reg = dst_reg
- strparser, and tls are reusing qdisc_skb_cb and colliding
- ethtool: fix ethtool msg len calculation for pause stats
- vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to
access an unregistering real_dev
- udp6: make encap_rcv() bump the v6 not v4 stats
- drv: prestera: add explicit padding to fix m68k build
- drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge
- drv: mvpp2: fix wrong SerDes reconfiguration order
Misc & small latecomers:
- ipvs: auto-load ipvs on genl access
- mctp: sanity check the struct sockaddr_mctp padding fields
- libfs: support RENAME_EXCHANGE in simple_rename()
- avoid double accounting for pure zerocopy skbs"
* tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits)
selftests/net: udpgso_bench_rx: fix port argument
net: wwan: iosm: fix compilation warning
cxgb4: fix eeprom len when diagnostics not implemented
net: fix premature exit from NAPI state polling in napi_disable()
net/smc: fix sk_refcnt underflow on linkdown and fallback
net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer()
gve: fix unmatched u64_stats_update_end()
net: ethernet: lantiq_etop: Fix compilation error
selftests: forwarding: Fix packet matching in mirroring selftests
vsock: prevent unnecessary refcnt inc for nonblocking connect
net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
net: stmmac: allow a tc-taprio base-time of zero
selftests: net: test_vxlan_under_vrf: fix HV connectivity test
net: hns3: allow configure ETS bandwidth of all TCs
net: hns3: remove check VF uc mac exist when set by PF
net: hns3: fix some mac statistics is always 0 in device version V2
net: hns3: fix kernel crash when unload VF while it is being reset
net: hns3: sync rx ring head in echo common pull
net: hns3: fix pfc packet number incorrect after querying pfc parameters
...
Linus Torvalds [Thu, 11 Nov 2021 17:44:29 +0000 (09:44 -0800)]
Merge tag 'char-misc-5.16-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is a single fix for 5.16-rc1 to resolve a build problem that came
in through the coresight tree (and as such came in through the
char/misc tree merge in the 5.16-rc1 merge window).
It resolves a build problem with 'allmodconfig' on arm64 and is acked
by the proper subsystem maintainers. It has been in linux-next all
week with no reported problems"
* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
arm64: cpufeature: Export this_cpu_has_cap helper
Linus Torvalds [Thu, 11 Nov 2021 17:40:15 +0000 (09:40 -0800)]
Merge tag 'usb-5.16-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small reverts and fixes for USB drivers for issues that
came up during the 5.16-rc1 merge window.
These include:
- two reverts of xhci and USB core patches that are causing problems
in many systems.
- xhci 3.1 enumeration delay fix for systems that were having
problems.
All three of these have been in linux-next all week with no reported
issues"
* tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
Revert "usb: core: hcd: Add support for deferring roothub registration"
Revert "xhci: Set HCD flag to defer primary roothub registration"
Kuan-Ying Lee [Thu, 11 Nov 2021 04:32:49 +0000 (20:32 -0800)]
kasan: add kasan mode messages when kasan init
There are multiple kasan modes. It makes sense that we add some
messages to know which kasan mode is active when booting up [1].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212195
Link: https://lkml.kernel.org/r/20211020094850.4113-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 11 Nov 2021 04:32:46 +0000 (20:32 -0800)]
mm: unexport {,un}lock_page_memcg
These are only used in built-in core mm code.
Link: https://lkml.kernel.org/r/20210820095815.445392-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 11 Nov 2021 04:32:43 +0000 (20:32 -0800)]
mm: unexport folio_memcg_{,un}lock
Patch series "unexport memcg locking helpers".
Neither the old page-based nor the new folio-based memcg locking helpers
are used in modular code at all, so drop the exports.
This patch (of 2):
folio_memcg_{,un}lock are only used in built-in core mm code.
Link: https://lkml.kernel.org/r/20210820095815.445392-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210820095815.445392-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alistair Popple [Thu, 11 Nov 2021 04:32:40 +0000 (20:32 -0800)]
mm/migrate.c: remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't then the a second attempt is made to lock the page. However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.
Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.
Link: https://lkml.kernel.org/r/20211025041608.289017-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Baolin Wang [Thu, 11 Nov 2021 04:32:37 +0000 (20:32 -0800)]
mm: migrate: simplify the file-backed pages validation when migrating its mapping
There is no need to validate the file-backed page's refcount before
trying to freeze the page's expected refcount, instead we can rely on
the folio_ref_freeze() to validate if the page has the expected refcount
before migrating its mapping.
Moreover we are always under the page lock when migrating the page
mapping, which means nowhere else can remove it from the page cache, so
we can remove the xas_load() validation under the i_pages lock.
Link: https://lkml.kernel.org/r/cover.1629447552.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/df4c129fd8e86a95dbc55f4663d77441cc0d3bd1.1629447552.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Thu, 11 Nov 2021 04:32:33 +0000 (20:32 -0800)]
mm: allow only SLUB on PREEMPT_RT
Memory allocators may disable interrupts or preemption as part of the
allocation and freeing process. For PREEMPT_RT it is important that
these sections remain deterministic and short and therefore don't depend
on the size of the memory to allocate/ free or the inner state of the
algorithm.
Until v3.12-RT the SLAB allocator was an option but involved several
changes to meet all the requirements. The SLUB design fits better with
PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT
patchset. Comparing the two allocator, SLUB outperformed SLAB in both
throughput (time needed to allocate and free memory) and the maximal
latency of the system measured with cyclictest during hackbench.
SLOB was never evaluated since it was unlikely that it preforms better
than SLAB. During a quick test, the kernel crashed with SLOB enabled
during boot.
Disable SLAB and SLOB on PREEMPT_RT.
[bigeasy@linutronix.de: commit description]
Link: https://lkml.kernel.org/r/20211015210336.gen3tib33ig5q2md@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yixuan Cao [Thu, 11 Nov 2021 04:32:30 +0000 (20:32 -0800)]
mm/page_owner.c: modify the type of argument "order" in some functions
The type of "order" in struct page_owner is unsigned short.
However, it is unsigned int in the following 3 functions:
__reset_page_owner
__set_page_owner_handle
__set_page_owner_handle
The type of "order" in argument list is unsigned int, which is
inconsistent.
[akpm@linux-foundation.org: update include/linux/page_owner.h]
Link: https://lkml.kernel.org/r/20211020125945.47792-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Thu, 11 Nov 2021 16:03:05 +0000 (11:03 -0500)]
Merge branch 'kvm-5.16-fixes' into kvm-master
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Paolo Bonzini [Thu, 11 Nov 2021 15:52:26 +0000 (10:52 -0500)]
Merge branch 'kvm-sev-move-context' into kvm-master
Add support for AMD SEV and SEV-ES intra-host migration support. Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.
In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 11 Nov 2021 13:47:33 +0000 (14:47 +0100)]
KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.
Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.
For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211111134733.86601-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vipin Sharma [Tue, 9 Nov 2021 17:44:26 +0000 (17:44 +0000)]
KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
Handle #GP on INVPCID due to an invalid type in the common switch
statement instead of relying on the callers (VMX and SVM) to manually
validate the type.
Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
the type before reading the operand from memory, so deferring the
type validity check until after that point is architecturally allowed.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109174426.2350547-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vipin Sharma [Tue, 9 Nov 2021 17:44:25 +0000 (17:44 +0000)]
KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
holds the invalidation type. Add a helper to retrieve reg2 from VMX
instruction info to consolidate and document the shift+mask magic.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109174426.2350547-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 01:30:47 +0000 (01:30 +0000)]
KVM: nVMX: Clean up x2APIC MSR handling for L2
Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
holdout of open coded bitmap manipulations. Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109013047.2041518-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 01:30:46 +0000 (01:30 +0000)]
KVM: VMX: Macrofy the MSR bitmap getters and setters
Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109013047.2041518-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 01:30:45 +0000 (01:30 +0000)]
KVM: nVMX: Handle dynamic MSR intercept toggling
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02. This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.
The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.
The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.
Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.
Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.
Fixes:
1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Fixes:
d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109013047.2041518-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 01:30:44 +0000 (01:30 +0000)]
KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes:
d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Mon, 8 Nov 2021 15:28:19 +0000 (16:28 +0100)]
KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
expect that the value we keep internally in KVM wasn't updated.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211108152819.12485-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Mon, 8 Nov 2021 15:28:18 +0000 (16:28 +0100)]
KVM: x86: Rename kvm_lapic_enable_pv_eoi()
kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211108152819.12485-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul Durrant [Fri, 5 Nov 2021 09:51:01 +0000 (09:51 +0000)]
KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
Currently when kvm_update_cpuid_runtime() runs, it assumes that the
KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
however, if Hyper-V support is enabled. In this case the KVM leaves will
be offset.
This patch introdues as new 'kvm_cpuid_base' field into struct
kvm_vcpu_arch to track the location of the KVM leaves and function
kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
target the correct leaf.
NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
into processor.h to avoid having duplicate code for the iteration
over possible hypervisor base leaves.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <
20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Fri, 5 Nov 2021 09:51:00 +0000 (09:51 +0000)]
KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct. A future fix will add yet more code to the core logic.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211105095101.5384-2-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Junaid Shahid [Thu, 4 Nov 2021 00:33:59 +0000 (17:33 -0700)]
kvm: mmu: Use fast PF path for access tracking of huge pages when possible
The fast page fault path bails out on write faults to huge pages in
order to accommodate dirty logging. This change adds a check to do that
only when dirty logging is actually enabled, so that access tracking for
huge pages can still use the fast path for write faults in the common
case.
Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211104003359.2201967-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 3 Nov 2021 16:18:33 +0000 (09:18 -0700)]
KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
rcu_dereference(). Shadow pages in the TDP MMU, and thus their SPTEs,
are protected by rcu.
This fixes a Sparse warning at tdp_mmu.c:900:51:
warning: incorrect type in argument 1 (different address spaces)
expected unsigned long long [usertype] *sptep
got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep
Fixes:
7158bee4b475 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Mon, 8 Nov 2021 09:02:45 +0000 (11:02 +0200)]
KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.
Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
Fixes:
61e5f69ef0837 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Fri, 5 Nov 2021 20:20:58 +0000 (13:20 -0700)]
kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.
Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Tue, 2 Nov 2021 17:36:39 +0000 (17:36 +0000)]
KVM: x86: Fix recording of guest steal time / preempted status
In commit
b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.
Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.
Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.
In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.
Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.
Cc: stable@vger.kernel.org
Fixes:
b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Gonda [Thu, 21 Oct 2021 17:43:03 +0000 (10:43 -0700)]
selftest: KVM: Add intra host migration tests
Adds testcases for intra host migration for SEV and SEV-ES. Also adds
locking test to confirm no deadlock exists.
Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <
20211021174303.385706-6-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Gonda [Thu, 21 Oct 2021 17:43:02 +0000 (10:43 -0700)]
selftest: KVM: Add open sev dev helper
Refactors out open path support from open_kvm_dev_path_or_exit() and
adds new helper for SEV device path.
Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <
20211021174303.385706-5-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Gonda [Thu, 21 Oct 2021 17:43:01 +0000 (10:43 -0700)]
KVM: SEV: Add support for SEV-ES intra host migration
For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
and other SEV-ES info needs to be preserved along with the guest's
memory.
Signed-off-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <
20211021174303.385706-4-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Gonda [Thu, 21 Oct 2021 17:43:00 +0000 (10:43 -0700)]
KVM: SEV: Add support for SEV intra host migration
For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.
Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <
20211021174303.385706-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 11 Nov 2021 15:02:26 +0000 (10:02 -0500)]
KVM: SEV: provide helpers to charge/uncharge misc_cg
Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge. The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 11 Nov 2021 15:13:38 +0000 (10:13 -0500)]
KVM: generalize "bugged" VM to "dead" VM
Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon. In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Gonda [Thu, 21 Oct 2021 17:42:59 +0000 (10:42 -0700)]
KVM: SEV: Refactor out sev_es_state struct
Move SEV-ES vCPU metadata into new sev_es_state struct from vcpu_svm.
Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <
20211021174303.385706-2-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Takashi Sakamoto [Thu, 11 Nov 2021 10:30:15 +0000 (19:30 +0900)]
ALSA: fireworks: add support for Loud Onyx 1200f quirk
Loud Technologies shipped Onyx 1200f 2008 in its Mackie brand and
already discontinued. The model uses component of Fireworks board
module as its communication and DSP function.
The latest firmware (v4.6.0) has a quirk that tx packet includes wrong
value (0x1f) in its dbs field at middle and higher sampling transfer
frequency. It brings ALSA fireworks driver discontinuity of data block
counter.
This commit fixes it by assuming it as a quirk of firmware version
4.6.0.
$ cd linux-firewire-tools/src
$ python crpp < /sys/bus/firewire/devices/fw1/config_rom
ROM header and bus information block
-----------------------------------------------------------------
400
0404b9ef bus_info_length 4, crc_length 4, crc 47599
404
31333934 bus_name "1394"
408
e064a212 irmc 1, cmc 1, isc 1, bmc 0, pmc 0, cyc_clk_acc 100,
max_rec 10 (2048), max_rom 2, gen 1, spd 2 (S400)
40c
000ff209 company_id 000ff2 |
410
62550ce0 device_id
0962550ce0 | EUI-64
000ff20962550ce0
root directory
-----------------------------------------------------------------
414
0008088e directory_length 8, crc 2190
418
03000ff2 vendor
41c
8100000f --> descriptor leaf at 458
420
1701200f model
424
81000018 --> descriptor leaf at 484
428
0c008380 node capabilities
42c
8d000003 --> eui-64 leaf at 438
430
d1000005 --> unit directory at 444
434
08000ff2 (immediate value)
eui-64 leaf at 438
-----------------------------------------------------------------
438
000281ae leaf_length 2, crc 33198
43c
000ff209 company_id 000ff2 |
440
62550ce0 device_id
0962550ce0 | EUI-64
000ff20962550ce0
unit directory at 444
-----------------------------------------------------------------
444
00045d94 directory_length 4, crc 23956
448
1200a02d specifier id: 1394 TA
44c
13010000 version
450
1701200f model
454
8100000c --> descriptor leaf at 484
descriptor leaf at 458
-----------------------------------------------------------------
458
000a199d leaf_length 10, crc 6557
45c
00000000 textual descriptor
460
00000000 minimal ASCII
464
4d61636b "Mack"
468
69650000 "ie"
46c
00000000
470
00000000
474
00000000
478
00000000
47c
00000000
480
00000000
descriptor leaf at 484
-----------------------------------------------------------------
484
000a0964 leaf_length 10, crc 2404
488
00000000 textual descriptor
48c
00000000 minimal ASCII
490
4f6e7978 "Onyx"
494
20313230 " 120"
498
30460000 "0F"
49c
00000000
4a0
00000000
4a4
00000000
4a8
00000000
4ac
00000000
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20211111103015.7498-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Paolo Bonzini [Thu, 11 Nov 2021 12:40:26 +0000 (07:40 -0500)]
Merge branch 'kvm-guest-sev-migration' into kvm-master
Add guest api and guest kernel support for SEV live migration.
Introduces a new hypercall to notify the host of changes to the page
encryption status. If the page is encrypted then it must be migrated
through the SEV firmware or a helper VM sharing the key. If page is
not encrypted then it can be migrated normally by userspace. This new
hypercall is invoked using paravirt_ops.
Conflicts: sev_active() replaced by cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).
Ashish Kalra [Tue, 24 Aug 2021 11:07:45 +0000 (11:07 +0000)]
x86/kvm: Add kexec support for SEV Live Migration.
Reset the host's shared pages list related to kernel
specific page encryption status settings before we load a
new kernel by kexec. We cannot reset the complete
shared pages list here as we need to retain the
UEFI/OVMF firmware specific settings.
The host's shared pages list is maintained for the
guest to keep track of all unencrypted guest memory regions,
therefore we need to explicitly mark all shared pages as
encrypted again before rebooting into the new guest kernel.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Message-Id: <
3e051424ab839ea470f88333273d7a185006754f.
1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ashish Kalra [Tue, 24 Aug 2021 11:07:07 +0000 (11:07 +0000)]
x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.
The guest support for detecting and enabling SEV Live migration
feature uses the following logic :
- kvm_init_plaform() checks if its booted under the EFI
- If not EFI,
i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl()
to enable the SEV live migration support
- If EFI,
i) If kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read
the UEFI variable which indicates OVMF support for live migration
ii) the variable indicates live migration is supported, issue a wrmsrl() to
enable the SEV live migration support
The EFI live migration check is done using a late_initcall() callback.
Also, ensure that _bss_decrypted section is marked as decrypted in the
hypervisor's guest page encryption status tracking.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Message-Id: <
b4453e4c87103ebef12217d2505ea99a1c3e0f0f.
1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ashish Kalra [Tue, 24 Aug 2021 11:06:40 +0000 (11:06 +0000)]
EFI: Introduce the new AMD Memory Encryption GUID.
Introduce a new AMD Memory Encryption GUID which is currently
used for defining a new UEFI environment variable which indicates
UEFI/OVMF support for the SEV live migration feature. This variable
is setup when UEFI/OVMF detects host/hypervisor support for SEV
live migration and later this variable is read by the kernel using
EFI runtime services to verify if OVMF supports the live migration
feature.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Message-Id: <
1cea22976d2208f34d47e0c1ce0ecac816c13111.
1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Brijesh Singh [Tue, 24 Aug 2021 11:05:00 +0000 (11:05 +0000)]
mm: x86: Invoke hypercall when page encryption status is changed
Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. Hypervisor needs to know the page encryption
status during the guest migration.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Message-Id: <
0a237d5bb08793916c7790a3e653a2cbe7485761.
1629726117.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>