x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
authorZhiquan Li <zhiquan1.li@intel.com>
Thu, 26 Oct 2023 00:39:03 +0000 (08:39 +0800)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Mon, 5 Feb 2024 20:14:14 +0000 (20:14 +0000)
commit04c6948db0ff5d6502b83d6ec703b2c4c9813b75
tree9fe9d2b24679040b72429dc6320f23a2ac907d80
parent28b8ba8eebf26f66d9f2df4ba550b6b3b136082c
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]

Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

  [ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
arch/x86/kernel/cpu/mce/core.c