proc, coredump: add CoreDumping flag to /proc/pid/status
refs: 75/234475/2, accepted/tizen_6.0_unified, accepted/tizen_6.0_unified_hotfix, tizen_6.0, tizen_6.0_hotfix, accepted/tizen/6.0/unified/20201030.103032, accepted/tizen/6.0/unified/hotfix/20201103.044956, accepted/tizen/unified/20200609.153518, submit/tizen/20200609.022747, submit/tizen_6.0/20201029.205505, submit/tizen_6.0_hotfix/20201102.192906, submit/tizen_6.0_hotfix/20201103.115106, tizen_6.0.m2_release
author Roman Gushchin <guro@fb.com>
Fri, 17 Nov 2017 23:26:45 +0000 (15:26 -0800)
committer Seung-Woo Kim <sw0312.kim@samsung.com>
Wed, 3 Jun 2020 02:36:17 +0000 (02:36 +0000)
commit cdb2b5ce28c78720ac2afe033f1d264b8b8c13f5
tree 96c2f59379b15d903d4608dd75b0cad04bf01bf2
parent 4bf16d3d958547e6cf36d8740cd42ad1eb73f5c7
proc, coredump: add CoreDumping flag to /proc/pid/status

There is currently no convenient way to check whether a process is
being coredumped.

Recognizing this state can be necessary to avoid killing the process and
getting a broken coredump.  Writing a large core can take significant
time, and the process is unresponsive while the core is written, so it
may be killed on timeout if another process is monitoring and
killing/restarting hanging tasks.

We're getting a significant number of corrupted coredump files on
machines in our fleet, just because processes are being killed by
timeout in the middle of the core writing process.

We do have a process health check, and an agent is responsible for
restarting processes which are not responding to health check requests.
Writing a large coredump to the disk can easily exceed the reasonable
timeout (especially on an overloaded machine).

This flag will allow the agent to distinguish processes which are being
coredumped, extend the timeout for them, and let them produce a full
coredump file.
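
A minimal sketch of how such an agent might use the flag, assuming it
already tracks the time since the last successful health check.  The
function name watchdog_action and the values BASE_TIMEOUT and
DUMP_TIMEOUT are hypothetical and only illustrate the policy; they are
not part of this patch:

```shell
#!/bin/sh
# Hypothetical policy values, in seconds (not taken from the patch).
BASE_TIMEOUT=30     # limit for a process that is simply not responding
DUMP_TIMEOUT=600    # extended limit while a core is being written

# Decide what to do with an unresponsive process.
#   $1: seconds since the last successful health check
#   $2: CoreDumping value from /proc/<pid>/status (0, 1, or empty if unknown)
# Prints "kill" or "wait".
watchdog_action() {
    elapsed=$1
    flag=$2
    if [ "$flag" = "1" ]; then
        limit=$DUMP_TIMEOUT
    else
        # Unknown or 0: fall back to the normal timeout.
        limit=$BASE_TIMEOUT
    fi
    if [ "$elapsed" -gt "$limit" ]; then
        echo kill
    else
        echo wait
    fi
}
```

With these values, a process that has been silent for 45 seconds is
killed when the flag is 0 but left alone when the flag is 1.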

To make it possible to detect that a process is being coredumped,
expose a boolean CoreDumping flag in /proc/pid/status.
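
A userspace consumer could read the flag roughly as follows.  This is a
sketch, not part of the patch: the helper name coredumping_flag is
hypothetical, and it takes the status file path as an argument so it can
also be tried on a saved copy of the file.  It assumes the field is
printed as "CoreDumping:" followed by whitespace and 0 or 1, as in the
example output in this message:

```shell
#!/bin/sh
# Print the CoreDumping value found in a /proc/<pid>/status file.
# Returns 1 and prints nothing when the file cannot be read (the process
# may already be gone) or the field is absent (kernel without this patch).
coredumping_flag() {
    status_file=$1
    [ -r "$status_file" ] || return 1
    # Take the second whitespace-separated field of the CoreDumping line.
    value=$(awk '$1 == "CoreDumping:" { print $2; exit }' "$status_file")
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For a live process this would be called as
coredumping_flag /proc/$PID/status; a caller should treat a non-zero
return as "unknown" rather than as "not dumping".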

Example:
$ cat core.sh
  #!/bin/sh

  echo "|/usr/bin/sleep 10" > /proc/sys/kernel/core_pattern
  sleep 1000 &
  PID=$!

  cat /proc/$PID/status | grep CoreDumping
  kill -ABRT $PID
  sleep 1
  cat /proc/$PID/status | grep CoreDumping

$ ./core.sh
  CoreDumping: 0
  CoreDumping: 1

[guro@fb.com: document CoreDumping flag in /proc/<pid>/status]
Link: http://lkml.kernel.org/r/20170928135357.GA8470@castle.DHCP.thefacebook.com
Link: http://lkml.kernel.org/r/20170920230634.31572-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[k.lewandowsk: backport mainline commit c643401218 so that userspace process
 managers (e.g. resourced) can avoid killing dead processes and breaking crash reports]
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Change-Id: I5ba2fcbf4f388be752db542e80ebca367dba618c
Documentation/filesystems/proc.txt
fs/proc/array.c