doc/arch.doc

/******************************************************************************
 *
 *
 *
 * Copyright (C) 1997-2012 by Dimitri van Heesch.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby
 * granted. No representations are made about the suitability of this software
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */
/*! \page arch Doxygen's Internals

<h3>Doxygen's internals</h3>

<B>Note that this section is still under construction!</B>

The following picture shows how source files are processed by doxygen.

\image html archoverview.gif "Data flow overview"
\image latex archoverview.eps "Data flow overview" width=14cm

The following sections explain the steps above in more detail.

<h3>Config parser</h3>

The configuration file that controls the settings of a project is parsed
and the settings are stored in the singleton class \c Config
in <code>src/config.h</code>. The parser itself is written using \c flex
and can be found in <code>src/config.l</code>. This parser is also used
directly by \c doxywizard, so it is put in a separate library.

Each configuration option has one of 5 possible types: \c String,
\c List, \c Enum, \c Int, or \c Bool. The values of these options are
available through the global functions \c Config_getXXX(), where \c XXX is the
type of the option. The argument of these functions is a string naming
the option as it appears in the configuration file. For instance:
\c Config_getBool("GENERATE_TESTLIST") returns a reference to a boolean
value that is \c TRUE if the test list was enabled in the config file.

The function \c readConfiguration() in \c src/doxygen.cpp
reads the command line options and then calls the configuration parser.

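As a rough illustration of the typed, name-based lookup described in this
section, the following C++ sketch stores options in a singleton and hands out
references through free functions. It is not doxygen's actual \c Config class:
the names \c MiniConfig, \c Mini_getBool(), \c Mini_getInt() and
\c Mini_getString() are made up for this example, and only three of the five
option types are shown.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an option store; NOT doxygen's real Config class.
class MiniConfig {
public:
    // One shared instance, mirroring the singleton idea.
    static MiniConfig& instance() {
        static MiniConfig cfg;
        return cfg;
    }

    // Typed accessors. Each returns a reference to the stored value and
    // creates a default (false, 0 or "") the first time a name is used.
    bool& getBool(const std::string& name) {
        assert(!name.empty());  // option names are never empty
        return bools_[name];
    }
    int& getInt(const std::string& name) {
        assert(!name.empty());
        return ints_[name];
    }
    std::string& getString(const std::string& name) {
        assert(!name.empty());
        return strings_[name];
    }

private:
    MiniConfig() = default;
    std::map<std::string, bool> bools_;
    std::map<std::string, int> ints_;
    std::map<std::string, std::string> strings_;
};

// Free-function wrappers in the spirit of Config_getXXX() (names are made up).
inline bool& Mini_getBool(const char* name) {
    return MiniConfig::instance().getBool(name);
}
inline int& Mini_getInt(const char* name) {
    return MiniConfig::instance().getInt(name);
}
inline std::string& Mini_getString(const char* name) {
    return MiniConfig::instance().getString(name);
}
```

Because the accessors return references, the same call can be used by a parser
to store a value and by the rest of the program to read it back.
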
<h3>C Preprocessor</h3>

The input files mentioned in the config file are (by default) fed to the
C Preprocessor (after being piped through a user-defined filter if available).

The way the preprocessor works differs somewhat from a standard C Preprocessor.
By default it does not do macro expansion, although it can be configured to
expand all macros. Typical usage is to only expand a user-specified set
of macros. This is to allow macro names to appear in the type of
function parameters for instance.

Another difference is that the preprocessor parses, but does not actually
include, code when it encounters a \#include (with the exception of \#include
found inside { ... } blocks). The reason behind this deviation from
the standard is to prevent feeding multiple definitions of the
same functions/classes to doxygen's parser. If all source files
included a common header file for instance, the class and type
definitions (and their documentation) would be present in each
translation unit.

The preprocessor is written using \c flex and can be found in
\c src/pre.l. For condition blocks (\#if) evaluation of constant expressions
is needed. For this a \c yacc based parser is used, which can be found
in \c src/constexp.y and \c src/constexp.l.

The preprocessor is invoked for each file using the \c preprocessFile()
function declared in \c src/pre.h, and will append the preprocessed result
to a character buffer. The format of the character buffer is

\verbatim
0x06 file name 1
0x06 preprocessed contents of file 1
...
0x06 file name n
0x06 preprocessed contents of file n
\endverbatim

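To make the buffer layout above more concrete, here is a minimal C++ sketch
that splits such a buffer back into (file name, contents) pairs. It assumes
that every field is introduced by a single 0x06 byte and runs up to the next
0x06 byte or the end of the buffer; how line breaks are handled is not
specified above, so the sketch leaves them untouched. The function name
\c splitPreprocessedBuffer() is hypothetical and does not exist in doxygen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits "0x06 name 0x06 contents 0x06 name 0x06 contents ..." into pairs.
// Hypothetical helper for illustration only.
std::vector<std::pair<std::string, std::string>>
splitPreprocessedBuffer(const std::string& buf) {
    const char kMarker = '\x06';

    // Collect every field that follows a marker byte.
    std::vector<std::string> fields;
    std::size_t pos = buf.find(kMarker);
    while (pos != std::string::npos) {
        const std::size_t next = buf.find(kMarker, pos + 1);
        const std::size_t len =
            (next == std::string::npos) ? std::string::npos : next - pos - 1;
        fields.push_back(buf.substr(pos + 1, len));
        pos = next;
    }

    // Assumption of this sketch: fields always come as name/contents pairs.
    assert(fields.size() % 2 == 0);

    std::vector<std::pair<std::string, std::string>> files;
    for (std::size_t i = 0; i + 1 < fields.size(); i += 2) {
        files.emplace_back(fields[i], fields[i + 1]);
    }
    return files;
}
```

Anything in front of the first marker is ignored, which keeps the helper
tolerant of an empty buffer.
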
<h3>Language parser</h3>

The preprocessed input buffer is fed to the language parser, which is
implemented as a big state machine using \c flex. It can be found
in the file \c src/scanner.l. There is one parser for all
languages (C/C++/Java/IDL). The state variables \c insideIDL
and \c insideJava are used in some places for language-specific choices.

The task of the parser is to convert the input buffer into a tree of entries
(basically an abstract syntax tree). An entry is defined in \c src/entry.h
and is a blob of loosely structured information. The most important field
is \c section, which specifies the kind of information contained in the entry.

Possible improvements for future versions:
 - Use one scanner/parser per language instead of one big scanner.
 - Move the first pass parsing of documentation blocks to a separate module.
 - Parse defines (these are currently gathered by the preprocessor, and
   ignored by the language parser).

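The tree of entries described in this section can be pictured with a small
C++ sketch. The real \c Entry class in \c src/entry.h carries far more
information; the type \c MiniEntry, the \c Section values and the helper
\c collectBySection() below are hypothetical and only show how a \c section
field lets later steps pick out the entries they care about. The sketch
assumes a C++14 compiler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical section kinds; the real values are defined in src/entry.h.
enum class Section { File, Class, Function, Variable };

// Hypothetical, much-reduced entry: a kind, a name and child entries.
struct MiniEntry {
    Section section;
    std::string name;
    std::vector<std::unique_ptr<MiniEntry>> children;

    MiniEntry(Section s, std::string n) : section(s), name(std::move(n)) {}

    // Appends a child and returns a non-owning pointer to it.
    MiniEntry* addChild(Section s, const std::string& n) {
        children.push_back(std::make_unique<MiniEntry>(s, n));
        return children.back().get();
    }
};

// Walks the tree in pre-order and collects the names of all entries
// whose section matches the requested kind.
void collectBySection(const MiniEntry& root, Section wanted,
                      std::vector<std::string>& out) {
    if (root.section == wanted) {
        out.push_back(root.name);
    }
    for (const auto& child : root.children) {
        assert(child != nullptr);  // children are always owned entries
        collectBySection(*child, wanted, out);
    }
}
```

A later processing step could call such a helper once per kind, for example
to gather all function entries regardless of where they sit in the tree.
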
<h3>Data organizer</h3>

This step consists of many smaller steps that build
dictionaries of the extracted classes, files, namespaces,
variables, functions, packages, pages, and groups. Besides building
dictionaries, this step also computes the relations (such as inheritance
relations) between the extracted entities.

Each step has a function defined in \c src/doxygen.cpp, which operates
on the tree of entries, built during language parsing. Look at the
"Gathering information" part of \c parseInput() for details.

The result of this step is a number of dictionaries, which can be
found in the Doxygen "namespace" defined in \c src/doxygen.h. Most
elements of these dictionaries are derived from the class \c Definition.
The class \c MemberDef, for instance, holds all information for a member.
An instance of such a class can be part of a file (class \c FileDef),
a class (class \c ClassDef), a namespace (class \c NamespaceDef),
a group (class \c GroupDef), or a Java package (class \c PackageDef).

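The two activities named in this section, building a dictionary and then
computing relations between its elements, might look roughly like the
following C++ sketch for the case of classes and inheritance. This is an
assumption-laden illustration, not the code in \c src/doxygen.cpp: the types
\c RawClass and \c MiniClassDef, the alias \c ClassDict and the function
\c buildClassDict() are all invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical parser output: a class name plus the names of its bases.
struct RawClass {
    std::string name;
    std::vector<std::string> baseNames;
};

// Hypothetical definition object with resolved inheritance links.
struct MiniClassDef {
    std::string name;
    std::vector<MiniClassDef*> bases;    // direct base classes
    std::vector<MiniClassDef*> derived;  // direct derived classes
};

using ClassDict = std::map<std::string, MiniClassDef>;

// Fills 'dict' in two passes: first one element per class, then the
// inheritance relations. std::map keeps element addresses stable, so the
// stored pointers stay valid while 'dict' is alive and unmodified.
void buildClassDict(const std::vector<RawClass>& raw, ClassDict& dict) {
    // Pass 1: create a dictionary element for every extracted class.
    for (const auto& rc : raw) {
        dict[rc.name].name = rc.name;
    }
    // Pass 2: resolve base class names against the dictionary.
    for (const auto& rc : raw) {
        auto self = dict.find(rc.name);
        assert(self != dict.end());  // created in pass 1
        for (const auto& baseName : rc.baseNames) {
            auto base = dict.find(baseName);
            if (base == dict.end()) {
                continue;  // base class not part of the input; leave unresolved
            }
            self->second.bases.push_back(&base->second);
            base->second.derived.push_back(&self->second);
        }
    }
}
```

Doing the resolution in a second pass means a class can refer to a base that
appears later in the input.
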
<h3>Tag file parser</h3>

If tag files are specified in the configuration file, these are parsed
by a SAX based XML parser, which can be found in \c src/tagreader.cpp.
The result of parsing a tag file is the insertion of \c Entry objects in the
entry tree. The field \c Entry::tagInfo is used to mark the entry as
external, and holds information about the tag file.

<h3>Documentation parser</h3>

Special comment blocks are stored as strings in the entities that they
document. There is a string for the brief description and a string
for the detailed description. The documentation parser reads these
strings and executes the commands it finds in them (this is the second pass
in parsing the documentation). It writes the result directly to the output
generators.

The parser is written in C++ and can be found in \c src/docparser.cpp. The
tokens that are consumed by the parser come from \c src/doctokenizer.l.
Code fragments found in the comment blocks are passed on to the source parser.

The main entry point for the documentation parser is \c validatingParseDoc()
declared in \c src/docparser.h. For simple texts with special
commands \c validatingParseText() is used.

<h3>Source parser</h3>

If source browsing is enabled or if code fragments are encountered in the
documentation, the source parser is invoked.

The code parser tries to cross-reference the source code it parses with
documented entities. It also does syntax highlighting of the sources. The
output is directly written to the output generators.

The main entry point for the code parser is \c parseCode()
declared in \c src/code.h.

<h3>Output generators</h3>

After data is gathered and cross-referenced, doxygen generates
output in various formats. For this it uses the methods provided by
the abstract class \c OutputGenerator. In order to generate output
for multiple formats at once, the methods of \c OutputList are called
instead. This class maintains a list of concrete output generators,
where each method called is delegated to all generators in the list.

To allow small deviations in what is written to the output for each
concrete output generator, it is possible to temporarily disable certain
generators. The \c OutputList class contains various \c disable() and \c enable()
methods for this. The methods \c OutputList::pushGeneratorState() and
\c OutputList::popGeneratorState() are used to temporarily save the
set of enabled/disabled output generators on a stack.

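The delegation and the temporary disabling described above can be sketched in
a few lines of C++. The classes \c MiniGenerator, \c HtmlGen, \c LatexGen and
\c MiniOutputList below are hypothetical and far simpler than doxygen's
\c OutputGenerator and \c OutputList; only the method names \c disable(),
\c enable(), \c pushGeneratorState() and \c popGeneratorState() are borrowed
from the text, and their signatures here are assumptions. In the sketch, text
written between a push and the matching pop reaches only the generators that
are still enabled, and the pop restores the earlier set. A C++14 compiler is
assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stack>
#include <string>
#include <utility>
#include <vector>

enum class Format { Html, Latex };

// Hypothetical, much-reduced abstract generator.
class MiniGenerator {
public:
    explicit MiniGenerator(Format f) : format_(f) {}
    virtual ~MiniGenerator() = default;
    virtual void writeString(const std::string& s) = 0;
    Format format() const { return format_; }
    bool enabled = true;
    std::string out;  // output is collected in memory for this sketch
private:
    Format format_;
};

class HtmlGen : public MiniGenerator {
public:
    HtmlGen() : MiniGenerator(Format::Html) {}
    void writeString(const std::string& s) override { out += "<p>" + s + "</p>"; }
};

class LatexGen : public MiniGenerator {
public:
    LatexGen() : MiniGenerator(Format::Latex) {}
    void writeString(const std::string& s) override { out += s + "\\par "; }
};

// Hypothetical list that forwards each call to all enabled generators.
class MiniOutputList {
public:
    MiniGenerator* add(std::unique_ptr<MiniGenerator> g) {
        gens_.push_back(std::move(g));
        return gens_.back().get();
    }
    void writeString(const std::string& s) {
        for (auto& g : gens_) {
            if (g->enabled) g->writeString(s);
        }
    }
    void disable(Format f) { setEnabled(f, false); }
    void enable(Format f) { setEnabled(f, true); }

    // Saves the current enabled/disabled flags on a stack.
    void pushGeneratorState() {
        std::vector<bool> state;
        for (auto& g : gens_) state.push_back(g->enabled);
        saved_.push(state);
    }
    // Restores the most recently saved flags.
    void popGeneratorState() {
        assert(!saved_.empty());  // every pop needs a matching push
        const std::vector<bool>& state = saved_.top();
        for (std::size_t i = 0; i < gens_.size() && i < state.size(); ++i) {
            gens_[i]->enabled = state[i];
        }
        saved_.pop();
    }

private:
    void setEnabled(Format f, bool on) {
        for (auto& g : gens_) {
            if (g->format() == f) g->enabled = on;
        }
    }
    std::vector<std::unique_ptr<MiniGenerator>> gens_;
    std::stack<std::vector<bool>> saved_;
};
```
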
The XML is generated directly from the gathered data structures. In the
future XML will be used as an intermediate language (IL). The output
generators will then use this IL as a starting point to generate the
specific output formats. The advantage of having an IL is that various
independently developed tools, written in various languages,
could extract information from the XML output. Possible tools could be:
- an interactive source browser
- a class diagram generator
- a tool for computing code metrics.

<h3>Debugging</h3>

Since doxygen uses a lot of \c flex code it is important to understand
how \c flex works (for this one should read the man page)
and to understand what it is doing when \c flex is parsing some input.
Fortunately, when \c flex is used with the \c -d option it outputs what rules
matched. This makes it quite easy to follow what is going on for a
particular input fragment.

To make it easier to toggle debug information for a given flex file I
wrote the following perl script, which automatically adds or removes \c -d
from the correct line in the Makefile:

\verbatim
#!/usr/bin/perl

$file = shift @ARGV;
unless (defined $file) {
  print STDERR "Usage: $0 <flex file>\n";
  exit 1;
}
print "Toggle debugging mode for $file\n";

# add or remove the -d flex flag in the makefile
unless (rename "Makefile.libdoxygen","Makefile.libdoxygen.old") {
  print STDERR "Error: cannot rename Makefile.libdoxygen!\n";
  exit 1;
}
if (open(F,"<Makefile.libdoxygen.old")) {
  unless (open(G,">Makefile.libdoxygen")) {
    print STDERR "Error: opening file Makefile.libdoxygen for writing\n";
    exit 1;
  }
  print "Processing Makefile.libdoxygen...\n";
  while (<F>) {
    # \Q...\E quotes regex metacharacters (such as '.') in the file name
    if ( s/\(LEX\) (-i )?-P([a-zA-Z]+)YY -t \Q$file\E/(LEX) -d ${1}-P${2}YY -t $file/g ) {
      print "Enabling debug info for $file\n";
    }
    elsif ( s/\(LEX\) -d (-i )?-P([a-zA-Z]+)YY -t \Q$file\E/(LEX) ${1}-P${2}YY -t $file/g ) {
      print "Disabling debug info for $file\n";
    }
    print G "$_";
  }
  close F;
  close G;
  unlink "Makefile.libdoxygen.old";
}
else {
  print STDERR "Warning: file Makefile.libdoxygen.old does not exist!\n";
}

# touch the file
$now = time;
utime $now, $now, $file;

\endverbatim

*/

