<!-- ##### SECTION Title ##### -->
<!-- ##### SECTION Short_Description ##### -->
a general purpose lexical scanner
<!-- ##### SECTION Long_Description ##### -->
The #GScanner and its associated functions provide a general purpose
lexical scanner.
FIXME: this section still needs an example and a more detailed description.
Look at gtkrc.c for some code using the scanner.
<!-- ##### SECTION See_Also ##### -->
<!-- ##### SECTION Stability_Level ##### -->
<!-- ##### STRUCT GScanner ##### -->
The data structure representing a lexical scanner.
You should set <structfield>input_name</structfield> after creating
the scanner, since it is used by the default message handler when
displaying warnings and errors. If you are scanning a file, the file
name would be a good choice.
The <structfield>user_data</structfield> and
<structfield>max_parse_errors</structfield> fields are not used by the
scanner itself. If you need to associate extra data with the scanner,
you can place it here.
If you want to use your own message handler you can set the
<structfield>msg_handler</structfield> field. The type of the message
handler function is declared by #GScannerMsgFunc.
@token: token parsed by the last g_scanner_get_next_token()
@value: value of the last token from g_scanner_get_next_token()
@line: line number of the last token from g_scanner_get_next_token()
@position: character position of the last token from g_scanner_get_next_token()
@next_token: token parsed by the last g_scanner_peek_next_token()
@next_value: value of the last token from g_scanner_peek_next_token()
@next_line: line number of the last token from g_scanner_peek_next_token()
@next_position: character position of the last token from g_scanner_peek_next_token()
@msg_handler: function to handle GScanner message output
<!-- ##### STRUCT GScannerConfig ##### -->
Specifies the #GScanner parser configuration. Most settings can be changed during
the parsing phase and will affect the lexical parsing of the next unpeeked token.
<structfield>cset_skip_characters</structfield> specifies which characters
should be skipped by the scanner (the default is the whitespace characters:
space, tab, carriage-return and line-feed).
<structfield>cset_identifier_first</structfield> specifies the characters
which can start identifiers (the default is #G_CSET_a_2_z, "_", and
#G_CSET_A_2_Z).
<structfield>cset_identifier_nth</structfield> specifies the characters
which can be used in identifiers, after the first character (the default
is #G_CSET_a_2_z, "_0123456789", #G_CSET_A_2_Z, #G_CSET_LATINS,
#G_CSET_LATINC).
<structfield>cpair_comment_single</structfield> specifies the characters
at the start and end of single-line comments. The default is "#\n" which
means that single-line comments start with a '#' and continue until a '\n'
(end of line).
<structfield>case_sensitive</structfield> specifies if symbols are
case sensitive (the default is %FALSE).
<structfield>skip_comment_multi</structfield> specifies if multi-line
comments are skipped and not returned as tokens (the default is %TRUE).
<structfield>skip_comment_single</structfield> specifies if single-line
comments are skipped and not returned as tokens (the default is %TRUE).
<structfield>scan_comment_multi</structfield> specifies if multi-line
comments are recognized (the default is %TRUE).
<structfield>scan_identifier</structfield> specifies if identifiers
are recognized (the default is %TRUE).
<structfield>scan_identifier_1char</structfield> specifies if single-character
identifiers are recognized (the default is %FALSE).
<structfield>scan_identifier_NULL</structfield> specifies if
<literal>NULL</literal> is reported as #G_TOKEN_IDENTIFIER_NULL
(the default is %FALSE).
<structfield>scan_symbols</structfield> specifies if symbols are
recognized (the default is %TRUE).
<structfield>scan_binary</structfield> specifies if binary numbers
are recognized (the default is %FALSE).
<structfield>scan_octal</structfield> specifies if octal numbers
are recognized (the default is %TRUE).
<structfield>scan_float</structfield> specifies if floating point numbers
are recognized (the default is %TRUE).
<structfield>scan_hex</structfield> specifies if hexadecimal numbers
are recognized (the default is %TRUE).
<structfield>scan_hex_dollar</structfield> specifies if '$' is recognized
as a prefix for hexadecimal numbers (the default is %FALSE).
<structfield>scan_string_sq</structfield> specifies if strings can be
enclosed in single quotes (the default is %TRUE).
<structfield>scan_string_dq</structfield> specifies if strings can be
enclosed in double quotes (the default is %TRUE).
<structfield>numbers_2_int</structfield> specifies if binary, octal and
hexadecimal numbers are reported as #G_TOKEN_INT (the default is %TRUE).
<structfield>int_2_float</structfield> specifies if all numbers are
reported as #G_TOKEN_FLOAT (the default is %FALSE).
<structfield>identifier_2_string</structfield> specifies if identifiers
are reported as strings (the default is %FALSE).
<structfield>char_2_token</structfield> specifies if characters
are reported by setting <literal>token = ch</literal> or as #G_TOKEN_CHAR
(the default is %TRUE).
<structfield>symbol_2_token</structfield> specifies if symbols
are reported by setting <literal>token = v_symbol</literal> or as
#G_TOKEN_SYMBOL (the default is %FALSE).
<structfield>scope_0_fallback</structfield> specifies if a symbol
is searched for in the default scope in addition to the current scope
(the default is %FALSE).
@cset_skip_characters:
@cset_identifier_first:
@cset_identifier_nth:
@cpair_comment_single:
@skip_comment_single:
@scan_identifier_1char:
@scan_identifier_NULL:
@identifier_2_string:
<!-- ##### FUNCTION g_scanner_new ##### -->
Creates a new #GScanner.
The @config_templ structure specifies the initial settings of the scanner,
which are copied into the #GScanner <structfield>config</structfield> field.
If you pass %NULL then the default settings are used.
@config_templ: the initial scanner settings.
@Returns: the new #GScanner.
<!-- ##### FUNCTION g_scanner_destroy ##### -->
Frees all memory used by the #GScanner.
@scanner: a #GScanner.
<!-- ##### FUNCTION g_scanner_input_file ##### -->
Prepares to scan a file.
@scanner: a #GScanner.
@input_fd: a file descriptor.
<!-- ##### FUNCTION g_scanner_sync_file_offset ##### -->
Rewinds the file descriptor to the current buffer position and discards
the file read-ahead buffer. This is useful for third-party uses of
the scanner's file descriptor, which hook onto the current scanning
position.
@scanner: a #GScanner.
<!-- ##### FUNCTION g_scanner_input_text ##### -->
Prepares to scan a text buffer.
@scanner: a #GScanner.
@text: the text buffer to scan.
@text_len: the length of the text buffer.
<!-- ##### FUNCTION g_scanner_peek_next_token ##### -->
Parses the next token, without removing it from the input stream.
The token data is placed in the
<structfield>next_token</structfield>,
<structfield>next_value</structfield>,
<structfield>next_line</structfield>, and
<structfield>next_position</structfield> fields of the #GScanner structure.
Note that, while the token is not removed from the input stream (i.e.
the next call to g_scanner_get_next_token() will return the same token),
it will not be reevaluated. This can lead to surprising results when
changing scope or the scanner configuration after peeking the next token.
Getting the next token after switching the scope or configuration will
return whatever was peeked before, regardless of any symbols that may
have been added or removed in the new scope.
@scanner: a #GScanner.
@Returns: the type of the token.
<!-- ##### FUNCTION g_scanner_get_next_token ##### -->
Parses the next token just like g_scanner_peek_next_token() and also
removes it from the input stream.
The token data is placed in the
<structfield>token</structfield>,
<structfield>value</structfield>,
<structfield>line</structfield>, and
<structfield>position</structfield> fields of the #GScanner structure.
@scanner: a #GScanner.
@Returns: the type of the token.
<!-- ##### FUNCTION g_scanner_eof ##### -->
Returns %TRUE if the scanner has reached the end of the file or text buffer.
@scanner: a #GScanner.
@Returns: %TRUE if the scanner has reached the end of the file or text buffer.
<!-- ##### FUNCTION g_scanner_cur_line ##### -->
Returns the current line in the input stream (counting from 1).
This is the line of the last token parsed via g_scanner_get_next_token().
@scanner: a #GScanner.
@Returns: the current line.
<!-- ##### FUNCTION g_scanner_cur_position ##### -->
Returns the current position in the current line (counting from 0).
This is the position of the last token parsed via g_scanner_get_next_token().
@scanner: a #GScanner.
@Returns: the current position on the line.
<!-- ##### FUNCTION g_scanner_cur_token ##### -->
Gets the current token type.
This is simply the <structfield>token</structfield> field in the #GScanner
structure.
@scanner: a #GScanner.
@Returns: the current token type.
<!-- ##### FUNCTION g_scanner_cur_value ##### -->
Gets the current token value.
This is simply the <structfield>value</structfield> field in the #GScanner
structure.
@scanner: a #GScanner.
@Returns: the current token value.
<!-- ##### FUNCTION g_scanner_set_scope ##### -->
Sets the current scope.
@scanner: a #GScanner.
@scope_id: the new scope id.
@Returns: the old scope id.
<!-- ##### FUNCTION g_scanner_scope_add_symbol ##### -->
Adds a symbol to the given scope.
@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to add.
@value: the value of the symbol.
<!-- ##### FUNCTION g_scanner_scope_foreach_symbol ##### -->
Calls the given function for each of the symbol/value pairs in the
given scope of the #GScanner. The function is passed the symbol and
value of each pair, and the given @user_data parameter.
@scanner: a #GScanner.
@scope_id: the scope id.
@func: the function to call for each symbol/value pair.
@user_data: user data to pass to the function.
<!-- ##### FUNCTION g_scanner_scope_lookup_symbol ##### -->
Looks up a symbol in a scope and returns its value. If the
symbol is not bound in the scope, %NULL is returned.
@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to look up.
@Returns: the value of @symbol in the given scope, or %NULL
if @symbol is not bound in the given scope.
<!-- ##### FUNCTION g_scanner_scope_remove_symbol ##### -->
Removes a symbol from a scope.
@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to remove.
<!-- ##### MACRO g_scanner_add_symbol ##### -->
Adds a symbol to the default scope.
@scanner: a #GScanner.
@symbol: the symbol to add.
@value: the value of the symbol.
@Deprecated: 2.2: Use g_scanner_scope_add_symbol() instead.
<!-- ##### MACRO g_scanner_remove_symbol ##### -->
Removes a symbol from the default scope.
@scanner: a #GScanner.
@symbol: the symbol to remove.
@Deprecated: 2.2: Use g_scanner_scope_remove_symbol() instead.
<!-- ##### MACRO g_scanner_foreach_symbol ##### -->
Calls a function for each symbol in the default scope.
@scanner: a #GScanner.
@func: the function to call with each symbol.
@data: data to pass to the function.
@Deprecated: 2.2: Use g_scanner_scope_foreach_symbol() instead.
<!-- ##### MACRO g_scanner_freeze_symbol_table ##### -->
There is no reason to use this macro, since it does nothing.
@scanner: a #GScanner.
@Deprecated: 2.2: This macro does nothing.
<!-- ##### MACRO g_scanner_thaw_symbol_table ##### -->
There is no reason to use this macro, since it does nothing.
@scanner: a #GScanner.
@Deprecated: 2.2: This macro does nothing.
<!-- ##### FUNCTION g_scanner_lookup_symbol ##### -->
Looks up a symbol in the current scope and returns its value. If the
symbol is not bound in the current scope, %NULL is returned.
@scanner: a #GScanner.
@symbol: the symbol to look up.
@Returns: the value of @symbol in the current scope, or %NULL
if @symbol is not bound in the current scope.
<!-- ##### FUNCTION g_scanner_warn ##### -->
Outputs a warning message, via the #GScanner message handler.
@scanner: a #GScanner.
@format: the message format. See the <function>printf()</function>
documentation.
@Varargs: the parameters to insert into the format string.
<!-- ##### FUNCTION g_scanner_error ##### -->
Outputs an error message, via the #GScanner message handler.
@scanner: a #GScanner.
@format: the message format. See the <function>printf()</function>
documentation.
@Varargs: the parameters to insert into the format string.
<!-- ##### FUNCTION g_scanner_unexp_token ##### -->
Outputs a message through the scanner's msg_handler, resulting from an
unexpected token in the input stream.
Note that you should not call g_scanner_peek_next_token() followed by
g_scanner_unexp_token() without an intermediate call to
g_scanner_get_next_token(), as g_scanner_unexp_token() evaluates the
scanner's current token (not the peeked token) to construct part
of the message.
@scanner: a #GScanner.
@expected_token: the expected token.
@identifier_spec: a string describing how the scanner's user refers to
identifiers (%NULL defaults to "identifier").
This is used if @expected_token is #G_TOKEN_IDENTIFIER
or #G_TOKEN_IDENTIFIER_NULL.
@symbol_spec: a string describing how the scanner's user refers to
symbols (%NULL defaults to "symbol").
This is used if @expected_token is #G_TOKEN_SYMBOL or
any token value greater than #G_TOKEN_LAST.
@symbol_name: the name of the symbol, if the scanner's current token
is a symbol.
@message: a message string to output at the end of the warning/error, or %NULL.
@is_error: if %TRUE it is output as an error. If %FALSE it is output as a
warning.
<!-- ##### USER_FUNCTION GScannerMsgFunc ##### -->
Specifies the type of the message handler function.
@scanner: a #GScanner.
@message: the message.
@error: %TRUE if the message signals an error, %FALSE if it
signals a warning.
<!-- ##### MACRO G_CSET_a_2_z ##### -->
The set of lowercase ASCII alphabet characters.
Used for specifying valid identifier characters in #GScannerConfig.
<!-- ##### MACRO G_CSET_A_2_Z ##### -->
The set of uppercase ASCII alphabet characters.
Used for specifying valid identifier characters in #GScannerConfig.
<!-- ##### MACRO G_CSET_DIGITS ##### -->
The set of ASCII digits.
Used for specifying valid identifier characters in #GScannerConfig.
<!-- ##### MACRO G_CSET_LATINC ##### -->
The set of uppercase ISO 8859-1 alphabet characters which are
not ASCII characters.
Used for specifying valid identifier characters in #GScannerConfig.
<!-- ##### MACRO G_CSET_LATINS ##### -->
The set of lowercase ISO 8859-1 alphabet characters which are
not ASCII characters.
Used for specifying valid identifier characters in #GScannerConfig.
<!-- ##### ENUM GTokenType ##### -->
The possible types of token returned from each g_scanner_get_next_token() call.
@G_TOKEN_EOF: the end of the file.
@G_TOKEN_LEFT_PAREN: a '(' character.
@G_TOKEN_LEFT_CURLY: a '{' character.
@G_TOKEN_RIGHT_CURLY: a '}' character.
<!-- ##### UNION GTokenValue ##### -->
A union holding the value of the token.
<!-- ##### ENUM GErrorType ##### -->
The possible errors, used in the <structfield>v_error</structfield> field
of #GTokenValue, when the token is a #G_TOKEN_ERROR.
@G_ERR_UNKNOWN: unknown error.
@G_ERR_UNEXP_EOF: unexpected end of file.
@G_ERR_UNEXP_EOF_IN_STRING: unterminated string constant.
@G_ERR_UNEXP_EOF_IN_COMMENT: unterminated comment.
@G_ERR_NON_DIGIT_IN_CONST: non-digit character in a number.
@G_ERR_DIGIT_RADIX: digit beyond radix in a number.
@G_ERR_FLOAT_RADIX: non-decimal floating point number.
@G_ERR_FLOAT_MALFORMED: malformed floating point number.