From: Krzysztof Opasiak
Date: Wed, 20 Dec 2017 21:06:47 +0000 (+0100)
Subject: Add documentation about faultd internals
X-Git-Tag: submit/tizen/20180222.151354~17
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=2d365bc9d2fb01db987bff79b8fc051ad966fd12;p=platform%2Fcore%2Fsystem%2Ffaultd.git

Add documentation about faultd internals

Change-Id: I82a04ee89a467bb20bf1268e41b2d25d3669b6ea
Signed-off-by: Krzysztof Opasiak
---

diff --git a/doc/HOWTO.rst b/doc/HOWTO.rst
index 27534a0..f636c42 100644
--- a/doc/HOWTO.rst
+++ b/doc/HOWTO.rst
@@ -78,14 +78,344 @@ Only modules listed in that directory will be loaded by faultd during start up.
 Under the hood
 ==============
 
-Architecture
-------------
+faultd is implemented in C, using many kernel-style idioms.
+To learn more about them, please check the kernel documentation.
+Because it runs as root, it has to be reliable.
+That's why:
+
+.. WARNING::
+   **faultd is meant to be single threaded.
+   The internal faultd API does not support concurrent access at all.
+   So if for any reason (including D-Bus communication using glib) you run an additional thread, you have to ensure that faultd functions are called only in the context of the main thread.**
+
+Modules
+-------
+
+faultd has been designed to be extensible and modular.
+That's why the very first and most basic abstraction defined inside faultd is *struct faultd_module*.
+Generally every piece of faultd code, apart from utilities like logs and helpers, is encapsulated in the module abstraction.
+
+For every module, name, type and init() should be defined.
+Name is just a string which identifies the module.
+Type is an enum value that determines the order in which modules are initialized.
+Generally, the bigger the number is, the later the module is going to be initialized.
+Instead of a plain integer, *enum faultd_module_type* should be used.
+Module life cycle starts with init() and ends with cleanup().
+It's mandatory for every module to set init(), while cleanup() is optional.
+To register the module, the *FAULTD_MODULE_REGISTER()* macro should be used.
+
+As use of the bare module API is not always enough, it's possible to build additional abstraction layers on top of the module API.
+To do so, *struct faultd_module* should be embedded in a bigger structure that provides this abstraction layer.
+To access the container structure in init() or cleanup(), the *container_of()* macro should be used.
+An example of a simple module is presented below.
+
+.. code:: C
+
+    struct my_module {
+        struct faultd_module mod;
+        char *my_data;
+    };
+
+    #define to_my_module(MOD) \
+        container_of(MOD, struct my_module, mod)
+
+    static int module_init(struct faultd_module *module,
+                           struct faultd_config *config,
+                           sd_event *event)
+    {
+        struct my_module *mmod = to_my_module(module);
+
+        /* Alloc your resources like memory, open fds etc. */
+        mmod->my_data = calloc(10, sizeof(char));
+        if (!mmod->my_data)
+            return -ENOMEM;
+
+        return 0;
+    }
+
+    static void module_cleanup(struct faultd_module *module)
+    {
+        struct my_module *mmod = to_my_module(module);
+
+        free(mmod->my_data);
+    }
+
+    struct my_module my_mod = {
+        .mod = {
+            .name = "my_module",
+            .type = FAULTD_MODULE_TYPE_LISTENER,
+            .init = module_init,
+            .cleanup = module_cleanup,
+            .node = LIST_HEAD_INIT(my_mod.mod.node),
+        },
+        .my_data = NULL,
+    };
+
+    FAULTD_MODULE_REGISTER(&my_mod.mod);
+
+Architecture Overview
+---------------------
+
+On a very general level faultd is just a collection of modules that are all initialized at the beginning and cleaned up just before returning from main().
+To ensure the right order of module initialization, every module has to have a type.
+This type can be understood as a run level.
+The greater the type value is, the later the module is initialized.
+In init() a module should allocate all required resources and release them in cleanup().
+A module should not block inside init() for a long period of time (in read(), poll() etc.).
+For this purpose there is a main loop running inside the daemon.
+It can be accessed via the event parameter of init().
+By default all modules should use the sd_event mainloop, but if necessary it is also possible to use the glib mainloop.
+The mainloop object can be accessed using g_main_loop_new() with NULL as a context or just by passing NULL as the context argument to other glib functions.
 
 Core
 ----
 
-Modules
+faultd core is a set of modules that provide basic abstraction layers (including modules).
+Generally you should not modify core unless you have a very good reason to do so.
+In addition, all modules in the core directory should always be built-in.
+
+The most important abstraction which is commonly used in faultd is struct faultd_module.
+Every event that goes through the event processor (event_processor_report_event()) is stored in the database so it can be retrieved later.
+To keep the correct order of events, monotonic time and boot-id are used.
+
+The typical flow of events starts with a listener.
+Then control goes to a decision maker and ends in an action.
+In the very beginning an event of some particular type (like service_failed_event) is detected and reported by one of the listeners.
+After it is reported to the core, a suitable decision maker is called to decide what to do with this event.
+When the decision is made, the decision maker creates a decision_made_event and fills it with data about the action which should be executed.
+This event is once again reported to the core, and now it's handled by the action executor, which tries to find a suitable action and execute it.
+After this, the action_executed_event is reported to the core and usually this is the last step of event processing in faultd.
+
+Event Types
+-----------
+
+The base abstraction on which most faultd modules operate is struct faultd_event.
+As there may be many different types of events in the system, struct faultd_event can be easily extended.
+To do so, a structure which contains struct faultd_event as one of its fields should be defined.
+To allow this event type to be used in the common faultd core, a couple of methods have to be defined:
+
+- char *to_string(struct faultd_event *)
+  This function should print all the event's data to a newly allocated string.
+
+- void release(struct faultd_event *)
+  This function is called to release all the memory owned by the event when all references have been dropped.
+
+- void serialize(struct faultd_event*, struct faultd_object*)
+  This function is called to serialize the given event to a generic faultd_object.
+  It should put all the event data into the faultd_object to allow placing this event into the database.
+
+- int allocate_event(struct faultd_event_type *type, void *data, struct faultd_event **ev)
+  This function should allocate a new event based on its data.
+
+- int deserialize_event(struct faultd_event_type *type, struct faultd_object *data, struct faultd_event **ev)
+  This function should parse the given faultd_object (usually retrieved from the DB) and allocate an event based on it.
+
+Those functions should be passed to the core using struct faultd_event_type.
+Apart from methods, a unique name for each event type should also be defined.
+When struct faultd_event_type is initialized, it should be registered in the core using the FAULTD_EVENT_TYPE_REGISTER() macro.
+An example of an event type definition is shown below:
+
+.. code:: C
+
+    /* Header file */
+    #define EXAMPLE_EVENT_ID "example"
+    #define EXAMPLE_FIELD_ID "example"
+
+    struct example_event {
+        struct faultd_event event;
+        int example;
+    };
+
+    struct e_event_data {
+        int example;
+    };
+
+    #define to_example_event(EVENT) \
+        container_of(EVENT, struct example_event, event)
+
+    /* .c file */
+
+    static int allocate_e_event(struct faultd_event_type *type,
+                                void *data, struct faultd_event **ev)
+    {
+        struct example_event *e_ev;
+        struct e_event_data *e_ev_data = data;
+        int ret;
+
+        e_ev = calloc(1, sizeof(*e_ev));
+        if (!e_ev)
+            return -ENOMEM;
+
+        ret = faultd_event_init_internal(type, &e_ev->event);
+        if (ret)
+            goto free_e_ev;
+
+        e_ev->example = e_ev_data->example;
+
+        *ev = &e_ev->event;
+        return 0;
+
+    free_e_ev:
+        free(e_ev);
+
+        return ret;
+    }
+
+    static int deserialize_e_event(struct faultd_event_type *type,
+                                   struct faultd_object *data, struct faultd_event **ev)
+    {
+        int ret = -EINVAL;
+        struct e_event_data e_ev_data;
+        struct faultd_object *obj;
+
+        memset(&e_ev_data, 0, sizeof(e_ev_data));
+
+        /* Look for the integer field stored by e_event_serialize() */
+        list_for_each_entry(obj, &data->val.children, node) {
+            if ((obj->type == TYPE_INT) &&
+                (strcmp(EXAMPLE_FIELD_ID, obj->key) == 0)) {
+
+                e_ev_data.example = obj->val.i;
+            }
+        }
+
+        if (!e_ev_data.example) {
+            ret = -EINVAL;
+            goto finish;
+        }
+
+        ret = allocate_e_event(type, &e_ev_data, ev);
+        if (ret < 0)
+            goto finish;
+
+        ret = faultd_event_deserialize_internal(data, type, *ev);
+        if (ret < 0)
+            goto finish;
+
+        ret = 0;
+    finish:
+        return ret;
+    }
+
+    static void e_event_release(struct faultd_event *ev)
+    {
+        struct example_event *e_ev =
+            to_example_event(ev);
+
+        free(e_ev);
+    }
+
+    static char *e_event_to_string(struct faultd_event *ev)
+    {
+        struct example_event *e_ev =
+            to_example_event(ev);
+        char *str;
+        int ret;
+
+        /* One conversion specifier per argument */
+        ret = asprintf(&str, "Example Event:"
+                       " Example: %d",
+                       e_ev->example);
+
+        return ret > 0 ? str : NULL;
+    }
+
+    static void e_event_serialize(struct faultd_event *ev, struct faultd_object *out)
+    {
+        struct example_event *e_ev =
+            to_example_event(ev);
+
+        /* Serialize the common part first, then the type-specific field */
+        faultd_event_serialize_internal(ev, out);
+
+        faultd_object_append_int(out, EXAMPLE_FIELD_ID, e_ev->example);
+    }
+
+    static struct faultd_event_type example_event_type = {
+        .name = EXAMPLE_EVENT_ID,
+        .default_ops = {
+            .release = e_event_release,
+            .serialize = e_event_serialize,
+            .to_string = e_event_to_string,
+        },
+        .allocate_event = allocate_e_event,
+        .deserialize_event = deserialize_e_event,
+        .node = LIST_HEAD_INIT(example_event_type.node),
+    };
+
+    FAULTD_EVENT_TYPE_REGISTER(example_event_type, example_event_et)
+
+Listeners
+---------
+
+Listeners are just modules that watch the system and generate a suitable event when something happens in their area of interest.
+There is no special API for defining them; they just use the base module abstraction.
+The typical listener module consists of three basic functions:
+
+- module init()
+  Apart from memory allocation, the listener module should use this function to install its watchers, for example to start polling some file descriptor.
+  Because a module should not block in init(), the mainloop infrastructure should be used instead of a direct call to poll().
+  It's advised to use sd_event for this purpose; however, it's also possible to use glib if faultd has been compiled with glib support.
+
+- module cleanup()
+  Here all resources claimed by the module should be released.
+  The module should also take care of unregistering all event sources from the mainloop and closing related fds.
+
+- file descriptor/D-Bus callback
+  This is usually a function that you pass to the mainloop while registering a file descriptor.
+  In this callback the listener should collect all the required data about the event, allocate it using faultd_event_create() and pass it for further processing using event_processor_report_event().
+
+Decision Makers
+---------------
+
+Decision makers are modules that decide what action should be taken when an event arrives.
+A decision maker gets an event reported by some listener and checks what happened.
+It may also check previous events by querying the database.
+Then it chooses an action to be executed or simply decides to ignore the event.
+
+Decision makers have their own abstraction called struct faultd_event_handler.
+To create a new decision maker, the functions listed below should be implemented:
+
+- init()
+  This function should claim required resources and load configuration (if needed) from the config parameter.
+
+- cleanup()
+  Here all resources claimed by the module should be released.
+  It's also the last opportunity for the decision maker to do something with pending events (if needed), otherwise they will just be dropped.
+
+- event_match()
+  This function should check if the decision maker is interested in handling the event passed as a parameter.
+  It's called before placing the event in the decision maker's queue.
+  Usually it calls faultd_event_is_of_type() to check if the event is of some particular type.
+  It should return non-zero if the decision maker is interested in this event.
+
+- handle_event()
+  This is the main routine of every decision maker.
+  It is called when the event queue for this decision maker (field event_queue in struct faultd_event_handler) is not empty.
+  Firstly, the decision maker should pop an event from the queue.
+  Then it should perform its logic to determine which action should be executed.
+  This may include querying the database, contacting some other daemon or maybe even showing a pop-up.
+  Next, a decision_made_event should be allocated using faultd_event_create() and filled with details about the action.
+  Lastly, the event should be reported to the core with event_processor_report_event().
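
The event_match()/handle_event() contract described above can be modelled in a few lines of plain C. This is only a self-contained sketch, not faultd code: `struct sketch_event`, `struct sketch_handler`, `sketch_dispatch()`, the fixed-size queue and the `"resource_low"` type name used in the usage note are hypothetical stand-ins for struct faultd_event, struct faultd_event_handler and the core's dispatch logic, whose real definitions are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct faultd_event. */
struct sketch_event {
	const char *type_name;
};

#define SKETCH_QUEUE_CAP 8

/* Hypothetical stand-in for struct faultd_event_handler. */
struct sketch_handler {
	int (*event_match)(struct sketch_handler *h, struct sketch_event *ev);
	int (*handle_event)(struct sketch_handler *h);
	struct sketch_event *queue[SKETCH_QUEUE_CAP]; /* models event_queue */
	size_t head, len;
	int decisions_made;
};

static int queue_push(struct sketch_handler *h, struct sketch_event *ev)
{
	if (h->len == SKETCH_QUEUE_CAP)
		return -1;
	h->queue[(h->head + h->len) % SKETCH_QUEUE_CAP] = ev;
	h->len++;
	return 0;
}

static struct sketch_event *queue_pop(struct sketch_handler *h)
{
	struct sketch_event *ev;

	if (h->len == 0)
		return NULL;
	ev = h->queue[h->head];
	h->head = (h->head + 1) % SKETCH_QUEUE_CAP;
	h->len--;
	return ev;
}

/* Returns non-zero when the handler is interested in the event. */
static int service_failed_match(struct sketch_handler *h,
				struct sketch_event *ev)
{
	(void)h;
	return strcmp(ev->type_name, "service_failed") == 0;
}

/* Called only when the queue is not empty: pop, decide, report. */
static int service_failed_handle(struct sketch_handler *h)
{
	struct sketch_event *ev = queue_pop(h);

	assert(ev != NULL);
	(void)ev;
	/* A real decision maker would choose an action here and report
	 * a decision made event to the core; the sketch only counts it. */
	h->decisions_made++;
	return 0;
}

/* Models the core: match first, then enqueue, then run the handler.
 * Returns 1 if handled, 0 if ignored, negative on error. */
static int sketch_dispatch(struct sketch_handler *h, struct sketch_event *ev)
{
	int ret;

	if (!h->event_match(h, ev))
		return 0;
	if (queue_push(h, ev) < 0)
		return -1;
	ret = h->handle_event(h);
	return ret < 0 ? ret : 1;
}
```

With a handler initialized to `service_failed_match` and `service_failed_handle`, dispatching an event whose type name is `"service_failed"` is handled and leaves the queue empty, while an event of any other type (for example the made-up `"resource_low"`) never reaches the queue.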
+
+Actions
 -------
 
+Action modules are used to implement some operation that should be executed in response to some event.
+To implement this kind of module, struct faultd_action should be used.
+It is required to implement only one function - execute().
+First, this function should pop an event from the queue (it's guaranteed to be a decision_made_event).
+Then it should check if the action data provided in that event is correct and perform the action.
+All messages related to action execution (including errors) should be logged as strings appended to the action_log object inside the action_executed_event passed as a parameter.
+Lastly, this function should set the result field of action_executed_event to indicate whether action execution failed or not.
+The value returned from this function should always be 0 unless the error which occurred is fatal and this action won't be able to handle any more requests.
+
+If an action requires a relatively long time to be executed, it should be processed asynchronously.
+It means that execute() should prepare the whole operation, store required data (including the exec_info param), set the result field to -EPROBE_DEFER and return.
+When the action is really executed, it should set the new value of the result field and report action_executed_event on its own.
+
 Plugins
 -------
+
+Generally all modules apart from core can be compiled as shared libraries (plugins).
+To do so you just have to add your own library target in Makefile.am instead of adding it to the faultd main binary sources.
+Plugins are loaded by faultd at the very beginning, before initializing any module.
+It's not possible to load any new plugins when faultd is already started.
+After loading plugins, all modules are initialized using the standard initialization process as described earlier in this document.
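
The deferred-result protocol from the Actions section (set the result field to -EPROBE_DEFER, return 0, report the event later) can be modelled as follows. This is an illustrative sketch under stated assumptions, not faultd's implementation: `struct sketch_exec_info`, `sketch_execute()`, `sketch_complete()`, `sketch_report()` and the `SKETCH_EPROBE_DEFER` constant are hypothetical names, and the real struct faultd_action, action_executed_event and faultd's own EPROBE_DEFER definition are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: placeholder value used only by this sketch. */
#define SKETCH_EPROBE_DEFER 517

/* Hypothetical stand-in for the action_executed_event state. */
struct sketch_exec_info {
	int result;   /* models the result field */
	int reported; /* set once the event was reported to the core */
};

/* Stored by execute() so the completion path can find it later. */
static struct sketch_exec_info *pending;

/* Stand-in for reporting action_executed_event to the core. */
static void sketch_report(struct sketch_exec_info *info)
{
	info->reported = 1;
}

/* Models an asynchronous execute(): prepare the operation, keep the
 * exec info, mark the result as deferred and return 0 (not fatal). */
static int sketch_execute(struct sketch_exec_info *info)
{
	assert(info != NULL);
	pending = info;
	info->result = -SKETCH_EPROBE_DEFER;
	return 0;
}

/* Models the completion path, for example a mainloop callback:
 * set the real result and report the event on the action's own. */
static void sketch_complete(int result)
{
	if (!pending)
		return;
	pending->result = result;
	sketch_report(pending);
	pending = NULL;
}
```

In this model the event is not reported when `sketch_execute()` returns; it is reported exactly once, from `sketch_complete()`, after the real result has replaced the deferred marker.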