The corewatcher package provides a daemon for monitoring a system for
crashes. Crashes are analyzed and summary crash report information is
sent to a crashdb server.

The daemon is managed by a systemd unit file, corewatcher.service.

Configuration is stored in /etc/corewatcher.

Corefiles are assumed to be written to /var/lib/corewatcher by the
kernel with /proc/sys/kernel settings of:
    core_pattern=/var/lib/corewatcher/core_%e_%t
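
As a rough illustration of the assumption above, the following C sketch
compares a core_pattern value against the pattern quoted in this README.
Both helper names are hypothetical and are not taken from the corewatcher
sources; only the path /proc/sys/kernel/core_pattern and the pattern
string come from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pattern quoted in this README. */
#define EXPECTED_CORE_PATTERN "/var/lib/corewatcher/core_%e_%t"

/* Hypothetical helper: returns 1 if `value` (as read from the kernel,
 * possibly ending in a newline) equals the expected pattern, else 0. */
int core_pattern_is_expected(const char *value)
{
    size_t len;

    assert(value != NULL);
    len = strcspn(value, "\n");     /* length without trailing newline */
    return len == strlen(EXPECTED_CORE_PATTERN) &&
           strncmp(value, EXPECTED_CORE_PATTERN, len) == 0;
}

/* Hypothetical helper: reads the live kernel setting and checks it.
 * Returns 1 on match, 0 on mismatch, -1 if the file cannot be read. */
int check_core_pattern(void)
{
    char buf[256];
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return core_pattern_is_expected(buf);
}
```

A daemon could call something like check_core_pattern() at start and log
a warning on a mismatch; whether corewatcher itself does so is not stated
here.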

To build and run, use the standard autotools workflow like:

===========================================================================

The corewatcher daemon can be considered to be a state machine with the
following 5 possible states and the listed major functions called for
each transition:

  S1: core_folder has no core_*
       |
       |  a crash happens, leading to an inotify event
       |
  S2: core_folder has core_* present
       |
       |  move_core(fullpath, "to-process")
       |
  S3: processed_folder has some core_*.to-process
      processed_folder has some core_*.processed, but no associated *.txt
       |
       |  scan_processed_folder()
       |  (calls gdb, creates report summary *.txt)
       |
  S4: processed_folder has some core_*.processed, and associated *.txt
       |
       |  submit_loop(): a sleepy thread whose work condition is set in
       |  queue_backtrace() and in the periodic timer
       |  "cleanup" thread; submits *.txt and, where
       |  successful, moves the associated core_*.processed
       |
  S5: processed_folder has only core_*.submitted and *.txt
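
The states above are encoded in file names, so a file's position in the
state machine can be read off its suffix. The following C sketch shows
one way to do that. The enum and function names are hypothetical, and the
sketch assumes report files can be recognized by a ".txt" suffix alone;
neither is confirmed by the corewatcher sources.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for where a file sits in the state machine. */
enum core_state {
    CORE_NOT_A_CORE,   /* not named core_*, or a *.txt report */
    CORE_NEW,          /* S2: core_* waiting in core_folder */
    CORE_TO_PROCESS,   /* S3: core_*.to-process */
    CORE_PROCESSED,    /* S3/S4: core_*.processed */
    CORE_SUBMITTED     /* S5: core_*.submitted */
};

/* Returns 1 if `s` ends with `suffix`, else 0. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);

    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Classify a bare file name (no directory part) by prefix and suffix. */
enum core_state classify_core_name(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "core_", 5) != 0 || has_suffix(name, ".txt"))
        return CORE_NOT_A_CORE;
    if (has_suffix(name, ".to-process"))
        return CORE_TO_PROCESS;
    if (has_suffix(name, ".processed"))
        return CORE_PROCESSED;
    if (has_suffix(name, ".submitted"))
        return CORE_SUBMITTED;
    return CORE_NEW;
}
```

Because the state lives in the filesystem rather than in memory, a
classification like this gives the same answer before and after a daemon
restart.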

 o At daemon start any of the states in the filesystem could exist, so we
   need to do all of scan_core_folder(), scan_processed_folder() and

 o During submission, crash reports are removed from the in-memory pending
   submission work list. If the curl POST then fails, the associated cores
   stay in the filesystem as "processed" files, and are placed back on the
   in-memory submission work list.
     - if client network is down and comes back up, an event notifier
       could trigger resubmit by toggling the submit_loop() condition
     - if server or intermediate connectivity was the problem, only a
       periodic timer can trigger resubmission by setting the work condition
     - failed submissions should hang out at the end of the work queue in
       case there is something truly wrong with them so new reports have a
       better chance of getting through
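
The requeue-at-the-end policy described in the notes above can be
sketched with a small singly linked work queue in C. All names here
(struct report, the wq_* functions, fake_submit) are hypothetical;
fake_submit only stands in for the real curl POST, and the actual daemon
keeps its list in bt_list rather than in this structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct report {
    const char *name;
    struct report *next;
};

struct work_queue {
    struct report *head;
    struct report *tail;
};

/* Append a report to the end of the queue. */
void wq_push_tail(struct work_queue *q, struct report *r)
{
    assert(q != NULL && r != NULL);
    r->next = NULL;
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* Remove and return the first report, or NULL if the queue is empty. */
struct report *wq_pop_head(struct work_queue *q)
{
    struct report *r;

    assert(q != NULL);
    r = q->head;
    if (r != NULL) {
        q->head = r->next;
        if (q->head == NULL)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Try the head report once. Returns 1 on success, 0 on failure (the
 * report is put back at the tail), -1 if the queue was empty. */
int wq_submit_one(struct work_queue *q, int (*submit)(const struct report *))
{
    struct report *r = wq_pop_head(q);

    if (r == NULL)
        return -1;
    if (submit(r))
        return 1;
    wq_push_tail(q, r);     /* failed: retry later, behind the others */
    return 0;
}

/* Attempt every report queued at entry exactly once; returns the number
 * submitted. Failures remain on the queue for the next pass. */
int wq_submit_pass(struct work_queue *q, int (*submit)(const struct report *))
{
    size_t n = 0, i;
    struct report *r;
    int ok = 0;

    for (r = q->head; r != NULL; r = r->next)
        n++;
    for (i = 0; i < n; i++)
        if (wq_submit_one(q, submit) == 1)
            ok++;
    return ok;
}

/* Stand-in for the network POST: fails for names starting with "bad". */
int fake_submit(const struct report *r)
{
    return strncmp(r->name, "bad", 3) != 0;
}
```

With a queue of one failing report followed by two good ones, the first
attempt moves the failing report behind the other two, so they are tried
before it is retried.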

===========================================================================

Internals: locking & global state

 o bt_work GCond condition variable
 o bt_list struct oops linked list
 o bt_hash GHashTable of core file names
 o A struct oops may exist off of bt_list and still be referenced by
   name in bt_hash. Such a struct oops must exist if the core
   name is in the bt_hash.
 o pq_mtx: (coredump.c)
    o pq "processing queue" boolean: the actual queue is represented
      by the presence of files in the filesystem, but this allows threads
      to signal there are new ones to process
    o pq_work GCond condition variable
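
The pq flag and its condition variable follow the usual "flag guarded by
a mutex" pattern. The C sketch below shows that pattern using POSIX
threads instead of GLib's GMutex/GCond, purely to keep the example free
of dependencies. The struct and function names are hypothetical, and the
real code in coredump.c may be organized differently.

```c
#include <assert.h>
#include <pthread.h>

struct proc_queue {
    pthread_mutex_t mtx;    /* plays the role of pq_mtx */
    pthread_cond_t  work;   /* plays the role of pq_work */
    int             pq;     /* nonzero: there are new files to process */
};

#define PROC_QUEUE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Producer side: note that new core files exist and wake a waiter. */
void pq_signal_new_work(struct proc_queue *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->mtx);
    q->pq = 1;
    pthread_cond_signal(&q->work);
    pthread_mutex_unlock(&q->mtx);
}

/* Consumer side: sleep until the flag is set, then clear it. The caller
 * then scans the filesystem, which holds the actual queue. */
void pq_wait_for_work(struct proc_queue *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->mtx);
    while (!q->pq)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->work, &q->mtx);
    q->pq = 0;
    pthread_mutex_unlock(&q->mtx);
}
```

Since the flag only says "something changed", several signals that arrive
before the consumer wakes collapse into one scan, which fits a design
where the files themselves are the queue.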