doc/extsearch.doc

   1 /******************************************************************************
   2  *
   3  * Copyright (C) 1997-2015 by Dimitri van Heesch.
   4  *
   5  * Permission to use, copy, modify, and distribute this software and its
   6  * documentation under the terms of the GNU General Public License is hereby
   7  * granted. No representations are made about the suitability of this software
   8  * for any purpose. It is provided "as is" without express or implied warranty.
   9  * See the GNU General Public License for more details.
  10  *
  11  * Documents produced by Doxygen are derivative works derived from the
  12  * input used in their production; they are not affected by this license.
  13  *
  14  */
  15 /*! \page extsearch External Indexing and Searching
  16
  17 [TOC]
  18
  19 \section extsearch_intro Introduction
  20
  21 With release 1.8.3, doxygen provides the ability to search through HTML using
  22 an external indexing tool and search engine.
  23 This has several advantages:
  24 - For large projects it can have significant performance advantages over
  25   doxygen's built-in search engine, as doxygen uses a rather simple indexing
  26   algorithm.
  27 - It allows combining the search data of multiple projects into one index,
  28   allowing a global search across multiple doxygen projects.
  29 - It allows adding additional data to the search index, e.g. other web pages
  30   not produced by doxygen.
  31 - The search engine needs to run on a web server, but clients can still browse
  32   the web pages locally.
  33
  34 So that not everyone has to write their own indexer and search
  35 engine, doxygen provides an example tool for each action: `doxyindexer`
  36 for indexing the data and `doxysearch.cgi` for searching through the index.
  37
  38 The data flow is shown in the following diagram:
  39
  40 \image html extsearch_flow.png "External Search Data Flow"
  41 \image latex extsearch_flow.eps "External Search Data Flow" height=10cm
  42
  43 - `doxygen` produces the raw search data.
  44 - `doxyindexer` indexes the data into a search database `doxysearch.db`.
  45 - When a user performs a search from a doxygen-generated HTML page,
  46   the CGI binary `doxysearch.cgi` is invoked.
  47 - The `doxysearch.cgi` tool queries the database and returns
  48   the results.
  49 - The browser shows the search results.
  50
  51 \section extsearch_config Configuring
  52
  53 The first step is to make the search engine available via a web server.
  54 If you use `doxysearch.cgi` this means making the
  55 <a href="https://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI</a> binary
  56 available from the web server (i.e. being able to run it from a
  57 browser via a URL starting with http:).
  58
  59 How to set up a web server is outside the scope of this document,
  60 but if you have Apache installed, for instance, you could simply copy the
  61 `doxysearch.cgi` file from doxygen's `bin` directory to the `cgi-bin` directory of the
  62 Apache web server. Read the <a href="https://httpd.apache.org/docs/2.2/howto/cgi.html">Apache documentation</a> for details.
  63
  64 To test whether `doxysearch.cgi` is accessible, start your web browser,
  65 point it to the URL of the binary, and add `?test` at the end:
  66
  67     http://yoursite.com/path/to/cgi/doxysearch.cgi?test
  68
  69 You should get the following message:
  70
  71     Test failed: cannot find search index doxysearch.db
  72
  73 If you use Internet Explorer you may be prompted to download a file,
  74 which will then contain this message.
  75
  76 Since we didn't create or install a `doxysearch.db` yet, it is OK for the test to
  77 fail for this reason. How to correct this is discussed in the next section.
  78
  79 Before continuing with the next section add the above
  80 URL (without the `?test` part) to the \ref cfg_searchengine_url "SEARCHENGINE_URL" tag in
  81 doxygen's configuration file:
  82
  83     SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi
  84
  85 \subsection extsearch_single Single project index
  86
  87 To use the external search option, make sure the following options are enabled
  88 in doxygen's configuration file:
  89
  90     SEARCHENGINE           = YES
  91     SERVER_BASED_SEARCH    = YES
  92     EXTERNAL_SEARCH        = YES
  93
  94 This will make doxygen generate a file called `searchdata.xml` in the output
  95 directory (configured with \ref cfg_output_directory "OUTPUT_DIRECTORY").
  96 You can change the file name (and location) with the
  97 \ref cfg_searchdata_file "SEARCHDATA_FILE" option.
  98
  99 The next step is to put the raw search data into an index for efficient
 100 searching. You can use `doxyindexer` for this. Simply run it from the command
 101 line:
 102
 103     doxyindexer searchdata.xml
 104
 105 This will create a directory called `doxysearch.db` with some files in it.
 106 By default the directory will be created at the location from which doxyindexer
 107 was started, but you can change the directory using the `-o` option.
 108
 109 Copy the `doxysearch.db` directory to the directory in which
 110 `doxysearch.cgi` is located and rerun the browser test by pointing
 111 the browser to
 112
 113     http://yoursite.com/path/to/cgi/doxysearch.cgi?test
 114
 115 You should now get the following message:
 116
 117     Test successful.
 118
 119 Now you should be able to search for words and symbols from the HTML output.
 120
 121 \subsection extsearch_multi Multi project index
 122
 123 In case you have more than one doxygen project and these projects are related,
 124 it may be desirable to allow searching for words in all projects from within
 125 the documentation of any of the projects.
 126
 127 To make this possible, all that is needed is to combine the search data
 128 for all projects into a single index. For example, for two projects A and B whose
 129 `searchdata.xml` files are generated in directories `project_A` and `project_B`, run:
 130
 131     doxyindexer project_A/searchdata.xml project_B/searchdata.xml
 132
 133 and then copy the resulting `doxysearch.db` to the directory where
 134 `doxysearch.cgi` is also located.
 135
 136 The `searchdata.xml` file doesn't contain any absolute paths or links,
 137 so how can the search results from multiple projects be linked back to the right documentation set?
 138 This is where the \ref cfg_external_search_id "EXTERNAL_SEARCH_ID" and
 139 \ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" options come into play.
 140
 141 To be able to identify the different projects, one needs to
 142 set a unique ID using \ref cfg_external_search_id "EXTERNAL_SEARCH_ID"
 143 for each project.
 144
 145 To link the search results to the right project, you need to define a
 146 mapping per project using the \ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" tag.
 147 With this option you can define the mapping from IDs of other projects to the
 148 (relative) location of documentation of those projects.
 149
 150 So for projects A and B the relevant part of the configuration file
 151 could look as follows:
 152
 153     project_A/Doxyfile
 154     ------------------
 155     EXTERNAL_SEARCH_ID    = A
 156     EXTRA_SEARCH_MAPPINGS = B=../../project_B/html
 157
 158 for project A and for project B
 159
 160     project_B/Doxyfile
 161     ------------------
 162     EXTERNAL_SEARCH_ID    = B
 163     EXTRA_SEARCH_MAPPINGS = A=../../project_A/html
 164
 165 With these settings, projects A and B can share the same search database,
 166 and the search results will link to the right documentation set.
 167
 168 \section extsearch_update Updating the index
 169
 170 When you modify the source code, you should re-run doxygen to get up-to-date
 171 documentation again. When using external searching you also need to update the
 172 search index by re-running `doxyindexer`. You could wrap the calls to `doxygen`
 173 and `doxyindexer` together in a script to make this process easier.
 174
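As a rough illustration of such a wrapper, here is a minimal Python sketch. It assumes
that `doxygen` and `doxyindexer` are on the `PATH`; the function names and all paths
passed to them are hypothetical, and only the `-o` option mentioned earlier is used.

```python
import subprocess


def build_commands(doxyfile, searchdata, index_dir):
    """Return the two commands that refresh the docs and the search index."""
    return [
        ["doxygen", str(doxyfile)],
        # -o selects the directory in which doxysearch.db is created
        ["doxyindexer", "-o", str(index_dir), str(searchdata)],
    ]


def update(doxyfile, searchdata, index_dir):
    """Run doxygen, then doxyindexer; raise if either command fails."""
    for cmd in build_commands(doxyfile, searchdata, index_dir):
        subprocess.run(cmd, check=True)
```

For example, `update("Doxyfile", "searchdata.xml", "/path/to/cgi")` would regenerate
the documentation and then rebuild `doxysearch.db` in the given directory; all three
arguments are placeholders for your own layout.
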
 175 \section extsearch_api Programming interface
 176
 177 Previous sections have assumed you use the tools `doxyindexer`
 178 and `doxysearch.cgi` to do the indexing and searching, but you could also
 179 write your own index and search tools if you like.
 180
 181 For this, three interfaces are important:
 182 - The format of the input for the index tool.
 183 - The format of the input for the search engine.
 184 - The format of the output of the search engine.
 185
 186 The next subsections describe these interfaces in more detail.
 187
 188 \subsection extsearch_api_index Indexer input format
 189
 190 The search data produced by doxygen follows the
 191 <a href="https://cwiki.apache.org/confluence/display/solr/UpdateXmlMessages">Solr XML index message</a>
 192 format.
 193
 194 The input for the indexer is an XML file, which consists of one `<add>` tag containing
 195 multiple `<doc>` tags, which in turn contain multiple `<field>` tags.
 196
 197 Here is an example of one doc node, which contains the search data and meta data for
 198 one method:
 199
 200     <add>
 201       ...
 202       <doc>
 203         <field name="type">function</field>
 204         <field name="name">QXmlReader::setDTDHandler</field>
 205         <field name="args">(QXmlDTDHandler *handler)=0</field>
 206         <field name="tag">qtools.tag</field>
 207         <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
 208         <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
 209         <field name="text">Sets the DTD handler to handler DTDHandler()</field>
 210       </doc>
 211       ...
 212     </add>
 213
 214 Each field has a name. The following field names are supported:
 215 - *type*: the type of the search entry; can be one of: source, function, slot,
 216           signal, variable, typedef, enum, enumvalue, property, event, related,
 217           friend, define, file, namespace, group, package, page, dir
 218 - *name*: the name of the search entry; for a method this is the qualified name of the method,
 219           for a class it is the name of the class, etc.
 220 - *args*: the parameter list (in case of functions or methods)
 221 - *tag*:  the name of the tag file used for this project.
 222 - *url*:  the (relative) URL to the HTML documentation for this entry.
 223 - *keywords*: important words that are representative for the entry. When searching for such a
 224           keyword, this entry should get a higher rank in the search results.
 225 - *text*: the documentation associated with the item. Note that only words are present, no markup.
 226
 227 @note Due to the potentially large size of the XML file, it is recommended to use a
 228 <a href="https://en.wikipedia.org/wiki/Simple_API_for_XML">SAX based parser</a> to process it.
 229
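A minimal sketch of such a SAX-based reader, using Python's standard `xml.sax` module,
could look as follows. The class and function names are made up for illustration, and
the sample data is a shortened version of the example above; a real indexer would feed
each document into its index instead of collecting it in a list.

```python
import io
import xml.sax


class SearchDataHandler(xml.sax.ContentHandler):
    """Collects each <doc> element as a dict of field name -> text."""

    def __init__(self, on_doc):
        super().__init__()
        self.on_doc = on_doc   # callback invoked once per <doc>
        self.doc = None        # fields of the <doc> being read
        self.field = None      # name of the <field> being read
        self.chunks = []       # text pieces of the current field

    def startElement(self, name, attrs):
        if name == "doc":
            self.doc = {}
        elif name == "field" and self.doc is not None:
            self.field = attrs.get("name")
            self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self.field is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "field" and self.field is not None:
            self.doc[self.field] = "".join(self.chunks)
            self.field = None
        elif name == "doc" and self.doc is not None:
            self.on_doc(self.doc)
            self.doc = None


def parse_search_data(source, on_doc):
    """Stream-parse search data given as a file name or file object."""
    xml.sax.parse(source, SearchDataHandler(on_doc))


sample = b"""<add>
  <doc>
    <field name="type">function</field>
    <field name="name">QXmlReader::setDTDHandler</field>
    <field name="keywords">setDTDHandler QXmlReader</field>
  </doc>
</add>"""
docs = []
parse_search_data(io.BytesIO(sample), docs.append)
print(docs[0]["name"])  # QXmlReader::setDTDHandler
```
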
 230 \subsection extsearch_api_search_in Search URL format
 231
 232 When the search engine is invoked from a doxygen-generated HTML page, a number of parameters are
 233 passed to it via the <a href="https://en.wikipedia.org/wiki/Query_string">query string</a>.
 234
 235 The following fields are passed:
 236 - *q*:  the query text as entered by the user
 237 - *n*:  the number of search results requested.
 238 - *p*:  the (zero-based) number of the search page for which to return the results. Each page has *n* results.
 239 - *cb*: the name of the callback function, used for JSON with padding, see the next section.
 240
 241 From the complete list of search results, the results with (zero-based) index `n*p` up to and including `n*(p+1)-1` should be returned.
 242
 243 Here is an example of what a query looks like:
 244
 245     http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy
 246
 247 It represents a query for the word 'list' (`q=list`) requesting 20 search results (`n=20`),
 248 starting with result number 20 (`p=1`) and using callback 'dummy' (`cb=dummy`).
 249
 250
 251 @note The values are <a href="https://en.wikipedia.org/wiki/Percent-encoding">URL encoded</a> so they
 252 have to be decoded before they can be used.
 253
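The decoding step and the range computation can be sketched in Python with the standard
`urllib.parse` module. The function name and the default values used when a parameter
is missing are assumptions made for this example only.

```python
from urllib.parse import urlsplit, parse_qs


def parse_search_request(url, default_n=20):
    """Decode q, n, p and cb from a search URL; defaults are assumptions."""
    params = parse_qs(urlsplit(url).query)  # parse_qs also URL-decodes values

    def first(key, default):
        return params.get(key, [default])[0]

    n = int(first("n", default_n))
    p = int(first("p", 0))
    return {
        "q": first("q", ""),
        "n": n,
        "p": p,
        "cb": first("cb", ""),
        # zero-based, inclusive index range of the requested results
        "range": (n * p, n * (p + 1) - 1),
    }


req = parse_search_request(
    "http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy")
print(req["range"])  # (20, 39)
```
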
 254 \subsection extsearch_api_search_out Search results format
 255
 256 When invoking the search engine as shown in the previous subsection, it should reply with
 257 the results. The format of the reply is
 258 <a href="https://en.wikipedia.org/wiki/JSONP">JSON with padding</a>, which is basically
 259 a JavaScript object wrapped in a function call. The name of the function should be the name of
 260 the callback (as passed with the *cb* field in the query).
 261
 262 With the example query shown in the previous subsection, the main structure of the reply should
 263 look as follows:
 264
 265     dummy({
 266       "hits":179,
 267       "first":20,
 268       "count":20,
 269       "page":1,
 270       "pages":9,
 271       "query": "list",
 272       "items":[
 273       ...
 274      ]})
 275
 276 The fields have the following meaning:
 277 - *hits*:  the total number of search results (could be more than was requested).
 278 - *first*: the index of first result returned: \f$\min(n*p,\mbox{\em hits})\f$.
 279 - *count*: the actual number of results returned: \f$\min(n,\mbox{\em hits}-\mbox{\em first})\f$
 280 - *page*:  the page number of the result: \f$p\f$
 281 - *pages*: the total number of pages: \f$\left\lceil\frac{\mbox{\em hits}}{n}\right\rceil\f$.
 282 - *items*: an array containing the search data per result.
 283
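The paging fields can be computed directly from *hits*, *n* and *p*. The following
Python sketch applies the formulas listed above and wraps the result in the callback;
the function name is hypothetical, `items` is assumed to hold the results of the
requested page only, and *n* is assumed to be greater than zero.

```python
import json
import math


def build_reply(cb, query, hits, n, p, items):
    """Wrap the paging fields and the page's items in a JSONP call to cb."""
    first = min(n * p, hits)
    count = min(n, hits - first)
    reply = {
        "hits": hits,
        "first": first,
        "count": count,
        "page": p,
        "pages": math.ceil(hits / n),
        "query": query,
        "items": items[:count],
    }
    return "{}({})".format(cb, json.dumps(reply))


# 179 hits, 20 per page, page 1: first=20, count=20, pages=9
print(build_reply("dummy", "list", 179, 20, 1, []))
```
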
 284 Here is an example of what an element of the *items* array should look like:
 285
 286     {"type": "function",
 287      "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
 288      "tag": "qtools.tag",
 289      "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
 290      "fragments":[
 291        "Returns a <span class=\"hl\">list</span> of QFileInfo objects for all files and directories...",
 292        "... pointer to a QFileInfoList The <span class=\"hl\">list</span> is owned by the QDir object...",
 293        "... to keep the entries of the <span class=\"hl\">list</span> after a subsequent call to this..."
 294      ]
 295     },
 296
 297 The fields for such an item have the following meaning:
 298 - *type*: the type of the item, as found in the field with name "type" in the raw search data.
 299 - *name*: the name of the item, including the parameter list, as found in the fields with
 300           name "name" and "args" in the raw search data.
 301 - *tag*:  the name of the tag file, as found in the field with name "tag" in the raw search data.
 302 - *url*:  the name of the (relative) URL to the documentation, as found in the field with name "url"
 303           in the raw search data.
 304 - *fragments*: an array with 0 or more fragments of text containing words that have been searched for.
 305           These words should be wrapped in `<span class="hl">` and `</span>` tags to highlight them
 306           in the output.
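
The highlighting of a fragment can be sketched as follows in Python. This is a
simplified illustration and not how `doxysearch.cgi` is implemented: the function name
is hypothetical, matching is case-insensitive on substrings rather than whole words,
and the text is HTML-escaped before the `<span>` tags are added.

```python
import html
import re


def highlight(text, words):
    """Escape text, then wrap each occurrence of a search word in a span."""
    escaped = html.escape(text, quote=False)
    if not words:
        return escaped
    # Substring match, ignoring case; escape the words like the text itself.
    pattern = re.compile(
        "|".join(re.escape(html.escape(w, quote=False)) for w in words),
        re.IGNORECASE)
    return pattern.sub(
        lambda m: '<span class="hl">' + m.group(0) + '</span>', escaped)


print(highlight("Returns a list of QFileInfo objects", ["list"]))
```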
 307 */