RAS: Add a Corrected Errors Collector
author Borislav Petkov <bp@suse.de>
Mon, 27 Mar 2017 09:33:02 +0000 (11:33 +0200)
committer Ingo Molnar <mingo@kernel.org>
Tue, 28 Mar 2017 06:54:48 +0000 (08:54 +0200)
commit 011d8261117249eab97bc86a8e1ac7731e03e319
tree 5e4a07f4ac44d81b62344ee3c8dadadf1f77cf66
parent e64edfcce9c738300b4102d0739577d6ecc96d4a

Introduce a simple data structure for collecting correctable errors,
along with accessors. A more detailed description is in the code itself.
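A minimal sketch of the kind of structure described above, assuming a
fixed array of 512 u64 slots (one 4 KiB page) kept sorted by page frame
number, with the PFN in the upper bits and a saturating error count in the
low 12 bits. The names ce_array, ce_find() and ce_add() and all sizes are
hypothetical illustrations; the actual layout is documented in
drivers/ras/cec.c and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: a fixed-size array of u64 slots kept sorted by
 * PFN. The PFN lives in the upper bits, a saturating per-page error
 * count in the low bits. Sizes and bit widths are assumptions.
 */
#define SKETCH_MAX_ELEMS  512u            /* 512 * 8 bytes = one 4 KiB page */
#define SKETCH_COUNT_BITS 12u
#define SKETCH_COUNT_MASK ((1ull << SKETCH_COUNT_BITS) - 1)

struct ce_array {
	uint64_t slots[SKETCH_MAX_ELEMS];
	size_t   n;                        /* number of used slots */
};

static inline uint64_t slot_pfn(uint64_t s)   { return s >> SKETCH_COUNT_BITS; }
static inline uint64_t slot_count(uint64_t s) { return s & SKETCH_COUNT_MASK; }

/* Binary search: return the index of pfn, or the index where it would go. */
static size_t ce_find(const struct ce_array *ca, uint64_t pfn, bool *found)
{
	size_t lo = 0, hi = ca->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uint64_t p = slot_pfn(ca->slots[mid]);

		if (p == pfn) {
			*found = true;
			return mid;
		}
		if (p < pfn)
			lo = mid + 1;
		else
			hi = mid;
	}
	*found = false;
	return lo;
}

/*
 * Record one corrected error for pfn and return the new count,
 * or 0 if the array is full (eviction is left out of this sketch).
 */
static uint64_t ce_add(struct ce_array *ca, uint64_t pfn)
{
	bool found;
	size_t i = ce_find(ca, pfn, &found);

	if (found) {
		if (slot_count(ca->slots[i]) < SKETCH_COUNT_MASK)
			ca->slots[i]++;    /* count lives in the low bits */
		return slot_count(ca->slots[i]);
	}
	if (ca->n == SKETCH_MAX_ELEMS)
		return 0;

	/* Shift the tail right by one slot to keep the array sorted. */
	memmove(&ca->slots[i + 1], &ca->slots[i],
		(ca->n - i) * sizeof(ca->slots[0]));
	ca->slots[i] = (pfn << SKETCH_COUNT_BITS) | 1;
	ca->n++;
	assert(ca->n <= SKETCH_MAX_ELEMS);
	return 1;
}
```

Keeping the array sorted turns lookups into a binary search and insertion
into a single memmove() of at most one page, which avoids the pointer
chasing of a list or tree.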

Error decoding now goes through the decoding chain: mce_first_notifier()
sees the error first, and the CEC decides either to log it, in which case
the rest of the chain doesn't hear about it (basically the main reason for
the CE collector), or to continue running the notifiers.
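The early-stop behaviour of the decoding chain can be sketched in user
space as below. The sk_* names, the two-value return enum (modeled loosely
on the kernel's NOTIFY_DONE/NOTIFY_STOP) and the collector hook are
illustrative assumptions, not the kernel's notifier API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes modeled loosely on the kernel's NOTIFY_DONE / NOTIFY_STOP. */
enum notify_ret { SK_NOTIFY_DONE, SK_NOTIFY_STOP };

struct sk_error { unsigned long pfn; };

typedef enum notify_ret (*sk_notifier_fn)(const struct sk_error *err);

/* Hypothetical collector hook: true means "collected, don't propagate". */
static bool (*sk_collector)(const struct sk_error *err);

static int sk_decoders_seen;   /* how often later decoders ran */

static bool sk_collect_always(const struct sk_error *err)
{
	(void)err;
	return true;
}

/* First in the chain: let the collector decide whether to stop here. */
static enum notify_ret sk_first_notifier(const struct sk_error *err)
{
	if (sk_collector && sk_collector(err))
		return SK_NOTIFY_STOP;   /* swallowed: rest of chain stays quiet */
	return SK_NOTIFY_DONE;
}

/* Stand-in for a later decoder/logger in the chain. */
static enum notify_ret sk_decoder(const struct sk_error *err)
{
	(void)err;
	sk_decoders_seen++;
	return SK_NOTIFY_DONE;
}

/* Call each notifier in order, stopping as soon as one returns STOP. */
static void sk_call_chain(const sk_notifier_fn *chain, size_t n,
			  const struct sk_error *err)
{
	for (size_t i = 0; i < n; i++)
		if (chain[i](err) == SK_NOTIFY_STOP)
			break;
}

static const sk_notifier_fn sk_chain[] = {
	sk_first_notifier, sk_decoder, sk_decoder,
};
```

With a collector installed that swallows the error, neither decoder runs;
with no collector, both do.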

When the CEC hits the action threshold, it tries to soft-offline the
page containing the error, and then the whole decoding chain gets to see
the error.
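A sketch of the threshold step, assuming a small table of per-page counters
indexed directly by PFN and a placeholder sk_soft_offline(). The value
SK_ACTION_THRESHOLD = 3 and all sk_* names are arbitrary illustrations, not
the collector's real threshold or offlining call. Here true means the error
was swallowed by the collector and false means it should be passed on to
the decoding chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values chosen only for illustration. */
#define SK_ACTION_THRESHOLD 3u
#define SK_MAX_PAGES        16u

static unsigned int sk_counts[SK_MAX_PAGES];   /* toy per-PFN counters */
static uint64_t sk_last_offlined = UINT64_MAX; /* UINT64_MAX = none yet */

/* Placeholder: a kernel would migrate the data and retire the page here. */
static void sk_soft_offline(uint64_t pfn)
{
	sk_last_offlined = pfn;
}

/*
 * Count one corrected error for pfn. Returns true while the error is
 * only being collected, false once the threshold is hit and the error
 * should propagate to the rest of the decoding chain.
 */
static bool sk_cec_collect(uint64_t pfn)
{
	assert(pfn < SK_MAX_PAGES);
	if (++sk_counts[pfn] < SK_ACTION_THRESHOLD)
		return true;             /* below threshold: swallow quietly */

	sk_soft_offline(pfn);
	sk_counts[pfn] = 0;          /* in this sketch, start afresh */
	return false;                /* let the whole chain see this error */
}
```

The first two errors on a page are absorbed; the third triggers the
offlining attempt and is handed on.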

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Documentation/admin-guide/kernel-parameters.txt
arch/x86/include/asm/mce.h
arch/x86/kernel/cpu/mcheck/mce.c
arch/x86/ras/Kconfig
drivers/ras/Makefile
drivers/ras/cec.c [new file with mode: 0644]
drivers/ras/debugfs.c
drivers/ras/debugfs.h [new file with mode: 0644]
drivers/ras/ras.c
include/linux/ras.h