1 This is flex.info, produced by makeinfo version 6.1 from flex.texi.
3 The flex manual is placed under the same licensing conditions as the
6 Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
9 Copyright (C) 1990, 1997 The Regents of the University of California.
12 This code is derived from software contributed to Berkeley by Vern
15 The United States Government has rights in this work pursuant to
16 contract no. DE-AC03-76SF00098 between the United States Department of
17 Energy and the University of California.
19 Redistribution and use in source and binary forms, with or without
20 modification, are permitted provided that the following conditions are
23 1. Redistributions of source code must retain the above copyright
24 notice, this list of conditions and the following disclaimer.
26 2. Redistributions in binary form must reproduce the above copyright
27 notice, this list of conditions and the following disclaimer in the
28 documentation and/or other materials provided with the
31 Neither the name of the University nor the names of its contributors
32 may be used to endorse or promote products derived from this software
33 without specific prior written permission.
35 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
36 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
37 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
38 INFO-DIR-SECTION Programming
40 * flex: (flex). Fast lexical analyzer generator (lex replacement).
44 File: flex.info, Node: Top, Next: Copyright, Prev: (dir), Up: (dir)
This manual describes 'flex', a tool for generating programs that
perform pattern-matching on text. The manual includes both tutorial and
reference sections.
53 This edition of 'The flex Manual' documents 'flex' version 2.6.4. It
54 was last updated on 6 May 2017.
56 This manual was written by Vern Paxson, Will Estes and John Millaway.
70 * Multiple Input Buffers::
89 -- The Detailed Node Listing --
91 Format of the Input File
93 * Definitions Section::
96 * Comments in the Input::
100 * Options for Specifying Filenames::
101 * Options Affecting Scanner Behavior::
102 * Code-Level And API Options::
103 * Options for Scanner Speed and Size::
104 * Debugging Options::
105 * Miscellaneous Options::
110 * Reentrant Overview::
111 * Reentrant Example::
113 * Reentrant Functions::
115 The Reentrant API in Detail
117 * Specify Reentrant::
118 * Extra Reentrant Argument::
119 * Global Replacement::
120 * Init and Destroy Functions::
127 * The Default Memory Management::
128 * Overriding The Default Memory Management::
129 * A Note About yytext And Memory::
133 * Creating Serialized Tables::
134 * Loading and Unloading Serialized Tables::
135 * Tables File Format::
139 * When was flex born?::
140 * How do I expand backslash-escape sequences in C-style quoted strings?::
141 * Why do flex scanners call fileno if it is not ANSI compatible?::
142 * Does flex support recursive pattern definitions?::
143 * How do I skip huge chunks of input (tens of megabytes) while using flex?::
144 * Flex is not matching my patterns in the same order that I defined them.::
145 * My actions are executing out of order or sometimes not at all.::
146 * How can I have multiple input sources feed into the same scanner at the same time?::
147 * Can I build nested parsers that work with the same input file?::
148 * How can I match text only at the end of a file?::
149 * How can I make REJECT cascade across start condition boundaries?::
150 * Why cant I use fast or full tables with interactive mode?::
151 * How much faster is -F or -f than -C?::
152 * If I have a simple grammar cant I just parse it with flex?::
153 * Why doesn't yyrestart() set the start state back to INITIAL?::
154 * How can I match C-style comments?::
155 * The period isn't working the way I expected.::
156 * Can I get the flex manual in another format?::
157 * Does there exist a "faster" NDFA->DFA algorithm?::
158 * How does flex compile the DFA so quickly?::
159 * How can I use more than 8192 rules?::
160 * How do I abandon a file in the middle of a scan and switch to a new file?::
161 * How do I execute code only during initialization (only before the first scan)?::
162 * How do I execute code at termination?::
163 * Where else can I find help?::
164 * Can I include comments in the "rules" section of the file?::
165 * I get an error about undefined yywrap().::
166 * How can I change the matching pattern at run time?::
167 * How can I expand macros in the input?::
168 * How can I build a two-pass scanner?::
169 * How do I match any string not matched in the preceding rules?::
170 * I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
171 * Is there a way to make flex treat NULL like a regular character?::
172 * Whenever flex can not match the input it says "flex scanner jammed".::
173 * Why doesn't flex have non-greedy operators like perl does?::
174 * Memory leak - 16386 bytes allocated by malloc.::
175 * How do I track the byte offset for lseek()?::
176 * How do I use my own I/O classes in a C++ scanner?::
177 * How do I skip as many chars as possible?::
179 * Are certain equivalent patterns faster than others?::
180 * Is backing up a big deal?::
181 * Can I fake multi-byte character support?::
183 * Can you discuss some flex internals?::
184 * unput() messes up yy_at_bol::
185 * The | operator is not doing what I want::
186 * Why can't flex understand this variable trailing context pattern?::
187 * The ^ operator isn't working::
188 * Trailing context is getting confused with trailing optional patterns::
189 * Is flex GNU or not?::
191 * I need to scan if-then-else blocks and while loops::
195 * Is there a repository for flex scanners?::
196 * How can I conditionally compile or preprocess my flex input file?::
197 * Where can I find grammars for lex and yacc?::
198 * I get an end-of-buffer message for each character scanned.::
238 * What is the difference between YYLEX_PARAM and YY_DECL?::
239 * Why do I get "conflicting types for yylex" error?::
240 * How do I access the values set in a Flex action from within a Bison action?::
244 * Makefiles and Flex::
252 * Index of Functions and Macros::
253 * Index of Variables::
254 * Index of Data Types::
256 * Index of Scanner Options::
260 File: flex.info, Node: Copyright, Next: Reporting Bugs, Prev: Top, Up: Top
265 The flex manual is placed under the same licensing conditions as the
268 Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
271 Copyright (C) 1990, 1997 The Regents of the University of California.
274 This code is derived from software contributed to Berkeley by Vern
277 The United States Government has rights in this work pursuant to
278 contract no. DE-AC03-76SF00098 between the United States Department of
279 Energy and the University of California.
281 Redistribution and use in source and binary forms, with or without
282 modification, are permitted provided that the following conditions are
285 1. Redistributions of source code must retain the above copyright
286 notice, this list of conditions and the following disclaimer.
288 2. Redistributions in binary form must reproduce the above copyright
289 notice, this list of conditions and the following disclaimer in the
290 documentation and/or other materials provided with the
293 Neither the name of the University nor the names of its contributors
294 may be used to endorse or promote products derived from this software
295 without specific prior written permission.
297 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
298 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
299 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
302 File: flex.info, Node: Reporting Bugs, Next: Introduction, Prev: Copyright, Up: Top
307 If you find a bug in 'flex', please report it using GitHub's issue
308 tracking facility at <https://github.com/westes/flex/issues/>
311 File: flex.info, Node: Introduction, Next: Simple Examples, Prev: Reporting Bugs, Up: Top
'flex' is a tool for generating "scanners". A scanner is a program
which recognizes lexical patterns in text. The 'flex' program reads the
given input files, or its standard input if no file names are given, for
a description of a scanner to generate. The description is in the form
of pairs of regular expressions and C code, called "rules". 'flex'
generates as output a C source file, 'lex.yy.c' by default, which
defines a routine 'yylex()'. This file can be compiled and linked with
the flex runtime library to produce an executable. When the executable
is run, it analyzes its input for occurrences of the regular
expressions. Whenever it finds one, it executes the corresponding C
code.
329 File: flex.info, Node: Simple Examples, Next: Format, Prev: Introduction, Up: Top
331 4 Some Simple Examples
332 **********************
334 First some simple examples to get the flavor of how one uses 'flex'.
The following 'flex' input specifies a scanner which, when it
encounters the string 'username', will replace it with the user's login
name:

     %%
     username    printf( "%s", getlogin() );

343 By default, any text not matched by a 'flex' scanner is copied to the
344 output, so the net effect of this scanner is to copy its input file to
345 its output with each occurrence of 'username' expanded. In this input,
346 there is just one rule. 'username' is the "pattern" and the 'printf' is
347 the "action". The '%%' symbol marks the beginning of the rules.
349 Here's another simple example:

             int num_lines = 0, num_chars = 0;
     %%
     \n      ++num_lines; ++num_chars;
     .       ++num_chars;
     %%
     int main() {
             yylex();
             printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars );
     }

366 This scanner counts the number of characters and the number of lines
367 in its input. It produces no output other than the final report on the
368 character and line counts. The first line declares two globals,
369 'num_lines' and 'num_chars', which are accessible both inside 'yylex()'
370 and in the 'main()' routine declared after the second '%%'. There are
371 two rules, one which matches a newline ('\n') and increments both the
372 line count and the character count, and one which matches any character
373 other than a newline (indicated by the '.' regular expression).
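
For readers who find plain C easier to follow, the counting performed by
the two rules can be sketched without flex at all. This is only an
illustration, not code that flex generates: the names 'struct counts'
and 'count_text' are invented for the sketch, and it reads a
NUL-terminated string instead of the scanner's input stream.

```c
/* Hypothetical stand-in for the two counters in the example. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Tally lines and characters the way the two rules do. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };

    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++c.num_lines;   /* the \n rule bumps the line count ...      */
        ++c.num_chars;       /* ... and both rules bump the character count */
    }
    return c;
}
```

Calling 'count_text("ab\ncd\n")' yields two lines and six characters,
which is what the scanner's final report would show for the same input.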
A somewhat more complicated example:

     /* scanner for a toy Pascal-like language */
     %{
     /* need this for the calls to atoi() and atof() below */
     #include <stdlib.h>
     %}
     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*
     %%
     {DIGIT}+    {
                 printf( "An integer: %s (%d)\n", yytext, atoi( yytext ) );
                 }
     {DIGIT}+"."{DIGIT}*        {
                 printf( "A float: %s (%g)\n", yytext, atof( yytext ) );
                 }
     if|then|begin|end|procedure|function        {
                 printf( "A keyword: %s\n", yytext );
                 }
     {ID}        printf( "An identifier: %s\n", yytext );
     "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
     "{"[^{}\n]*"}"     /* eat up one-line comments */
     [ \t\n]+          /* eat up whitespace */
     .           printf( "Unrecognized character: %s\n", yytext );
     %%
     int main( int argc, char **argv )
         {
         ++argv, --argc;  /* skip over program name */
         yyin = ( argc > 0 ) ? fopen( argv[0], "r" ) : stdin;
         yylex();
         return 0;
         }

This is the beginnings of a simple scanner for a language like
Pascal. It identifies different types of "tokens" and reports on what
it has seen.

The details of this example will be explained in the following
sections.
434 File: flex.info, Node: Format, Next: Patterns, Prev: Simple Examples, Up: Top
436 5 Format of the Input File
437 **************************
439 The 'flex' input file consists of three sections, separated by a line
440 containing only '%%'.
450 * Definitions Section::
452 * User Code Section::
453 * Comments in the Input::
456 File: flex.info, Node: Definitions Section, Next: Rules Section, Prev: Format, Up: Format
458 5.1 Format of the Definitions Section
459 =====================================
461 The "definitions section" contains declarations of simple "name"
462 definitions to simplify the scanner specification, and declarations of
463 "start conditions", which are explained in a later section.
Name definitions have the form:

     name definition

The 'name' is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash). The
definition is taken to begin at the first non-whitespace character
following the name and to continue to the end of the line. The
definition can subsequently be referred to using '{name}', which will
expand to '(definition)'. For example,

     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*

defines 'DIGIT' to be a regular expression which matches a single
digit, and 'ID' to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits. A subsequent reference to

     {DIGIT}+"."{DIGIT}*

is identical to

     ([0-9])+"."([0-9])*

and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.
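
The effect of a '{name}' reference can be pictured as a textual
substitution that wraps the definition in parentheses. The C sketch
below mimics that for a single name; it is not how flex is implemented,
the function name 'expand_name' is invented here, and it makes the
simplifying assumption that every '{name}' in the pattern is a
reference (it does not skip quoted strings or character classes).

```c
#include <string.h>

/* Copy pattern to out, replacing each "{name}" with "(definition)".
   Returns 0 on success, -1 if out (of out_size bytes) is too small. */
int expand_name(const char *pattern, const char *name,
                const char *definition, char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    size_t def_len = strlen(definition);
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*pattern != '\0') {
        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, name_len) == 0 &&
            pattern[1 + name_len] == '}') {
            /* room for '(' + definition + ')' + terminating NUL? */
            if (used + def_len + 2 >= out_size)
                return -1;
            out[used++] = '(';
            memcpy(out + used, definition, def_len);
            used += def_len;
            out[used++] = ')';
            pattern += name_len + 2;   /* skip "{name}" */
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *pattern++;  /* ordinary character */
        }
    }
    out[used] = '\0';
    return 0;
}
```

The parentheses matter when the definition contains a low-precedence
operator: with a hypothetical definition 'a|b' named 'X', the pattern
'{X}c' becomes '(a|b)c' rather than 'a|bc'.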
492 An unindented comment (i.e., a line beginning with '/*') is copied
493 verbatim to the output up to the next '*/'.
495 Any _indented_ text or text enclosed in '%{' and '%}' is also copied
496 verbatim to the output (with the %{ and %} symbols removed). The %{ and
497 %} symbols must appear unindented on lines by themselves.
A '%top' block is similar to a '%{' ... '%}' block, except that the
code in a '%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1). The '%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code. The single characters '{' and '}'
are used to delimit the '%top' block, as shown in the example below:

     %top{
         /* This code goes at the "top" of the generated file. */
         #include <inttypes.h>
     }

512 Multiple '%top' blocks are allowed, and their order is preserved.
514 ---------- Footnotes ----------
516 (1) Actually, 'yyIN_HEADER' is defined before the '%top' block.
519 File: flex.info, Node: Rules Section, Next: User Code Section, Prev: Definitions Section, Up: Format
521 5.2 Format of the Rules Section
522 ===============================
The "rules" section of the 'flex' input contains a series of rules of
the form:

     pattern action

where the pattern must be unindented and the action must begin on the
same line. *Note Patterns::, for a further description of patterns and
actions.
533 In the rules section, any indented or %{ %} enclosed text appearing
534 before the first rule may be used to declare variables which are local
535 to the scanning routine and (after the declarations) code which is to be
536 executed whenever the scanning routine is entered. Other indented or %{
537 %} text in the rule section is still copied to the output, but its
538 meaning is not well-defined and it may well cause compile-time errors
539 (this feature is present for POSIX compliance. *Note Lex and Posix::,
540 for other such features).
542 Any _indented_ text or text enclosed in '%{' and '%}' is copied
543 verbatim to the output (with the %{ and %} symbols removed). The %{ and
544 %} symbols must appear unindented on lines by themselves.
547 File: flex.info, Node: User Code Section, Next: Comments in the Input, Prev: Rules Section, Up: Format
549 5.3 Format of the User Code Section
550 ===================================
552 The user code section is simply copied to 'lex.yy.c' verbatim. It is
553 used for companion routines which call or are called by the scanner.
554 The presence of this section is optional; if it is missing, the second
555 '%%' in the input file may be skipped, too.
558 File: flex.info, Node: Comments in the Input, Prev: User Code Section, Up: Format
560 5.4 Comments in the Input
561 =========================
563 Flex supports C-style comments, that is, anything between '/*' and '*/'
564 is considered a comment. Whenever flex encounters a comment, it copies
565 the entire comment verbatim to the generated source code. Comments may
566 appear just about anywhere, but with the following exceptions:
* Comments may not appear in the Rules Section wherever flex is
  expecting a regular expression. This means comments may not appear
  at the beginning of a line, or immediately following a list of
  scanner states.
* Comments may not appear on an '%option' line in the Definitions
  Section.

If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
'/*'. This rule will work anywhere in the input file.

All the comments in the following example are valid:

     /* Definitions Section */
     %%
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     %%
     /* User Code Section */
604 File: flex.info, Node: Patterns, Next: Matching, Prev: Format, Up: Top
609 The patterns in the input (see *note Rules Section::) are written using
610 an extended set of regular expressions. These are:
'x'
     match the character 'x'
'.'
     any character (byte) except newline
'[xyz]'
     a "character class"; in this case, the pattern matches either an
     'x', a 'y', or a 'z'
'[abj-oZ]'
     a "character class" with a range in it; matches an 'a', a 'b', any
     letter from 'j' through 'o', or a 'Z'
'[^A-Z]'
     a "negated character class", i.e., any character but those in the
     class. In this case, any character EXCEPT an uppercase letter.
'[^A-Z\n]'
     any character EXCEPT an uppercase letter or a newline
'[a-z]{-}[aeiou]'
     the lowercase consonants
'r*'
     zero or more r's, where r is any regular expression
'r+'
     one or more r's
'r?'
     zero or one r's (that is, "an optional r")
'r{2,5}'
     anywhere from two to five r's
'r{2,}'
     two or more r's
'r{4}'
     exactly 4 r's
'{name}'
     the expansion of the 'name' definition (*note Format::).
'"[xyz]\"foo"'
     the literal string: '[xyz]"foo'
'\X'
     if X is 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C
     interpretation of '\x'. Otherwise, a literal 'X' (used to escape
     operators such as '*')
'\0'
     a NUL character (ASCII code 0)
'\123'
     the character with octal value 123
'\x2a'
     the character with hexadecimal value 2a
'(r)'
     match an 'r'; parentheses are used to override precedence (see
     below)
'(?r-s:pattern)'
     apply option 'r' and omit option 's' while interpreting pattern.
     Options may be zero or more of the characters 'i', 's', or 'x'.

     'i' means case-insensitive. '-i' means case-sensitive.

     's' alters the meaning of the '.' syntax to match any single byte
     whatsoever. '-s' alters the meaning of '.' to match any byte
     except '\n'.

     'x' ignores comments and whitespace in patterns. Whitespace is
     ignored unless it is backslash-escaped, contained within '""'s, or
     appears inside a character class.

     The following are all valid:

          (?:foo)         same as  (foo)
          (?i:ab7)        same as  ([aA][bB]7)
          (?-i:ab)        same as  (ab)
          (?s:.)          same as  [\x00-\xFF]
          (?-s:.)         same as  [^\n]
          (?ix-s: a . b)  same as  ([Aa][^\n][bB])
          (?x:a  b)       same as  ("ab")
          (?x:a\ b)       same as  ("a b")
          (?x:a" "b)      same as  ("a b")
          (?x:a[ ]b)      same as  ("a b")

'(?# comment )'
     omit everything within '()'. The first ')' character encountered
     ends the pattern. It is not possible for the comment to contain
     a ')' character. The comment may span lines.
'rs'
     the regular expression 'r' followed by the regular expression 's';
     called "concatenation"
'r|s'
     either an 'r' or an 's'
'r/s'
     an 'r' but only if it is followed by an 's'. The text matched by
     's' is included when determining whether this rule is the longest
     match, but is then returned to the input before the action is
     executed. So the action only sees the text matched by 'r'. This
     type of pattern is called "trailing context". (There are some
     combinations of 'r/s' that flex cannot match correctly. *Note
     Limitations::, regarding dangerous trailing context.)
'^r'
     an 'r', but only at the beginning of a line (i.e., when just
     starting to scan, or right after a newline has been scanned).
'r$'
     an 'r', but only at the end of a line (i.e., just before a
     newline). Equivalent to 'r/\n'.

     Note that 'flex''s notion of "newline" is exactly whatever the C
     compiler used to compile 'flex' interprets '\n' as; in particular,
     on some DOS systems you must either filter out '\r's in the input
     yourself, or explicitly use 'r/\r\n' for 'r$'.
'<s>r'
     an 'r', but only in start condition 's' (see *note Start
     Conditions:: for discussion of start conditions).
'<s1,s2,s3>r'
     same, but in any of start conditions 's1', 's2', or 's3'.
'<*>r'
     an 'r' in any start condition, even an exclusive one.
'<<EOF>>'
     an end-of-file
'<s1,s2><<EOF>>'
     an end-of-file when in start condition 's1' or 's2'

Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and the
character class operators, '-', ']', and, at the beginning of the
class, '^'.

The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence (see special note on the
precedence of the repeat operator, '{}', under the documentation for the
'--posix' POSIX compliance option). For example,

     foo|bar*

is the same as

     (foo)|(ba(r*))

since the '*' operator has higher precedence than concatenation, and
concatenation higher than alternation ('|'). This pattern therefore
matches _either_ the string 'foo' _or_ the string 'ba' followed by
zero-or-more 'r's. To match 'foo' or zero-or-more repetitions of the
string 'bar', use:

     foo|(bar)*

And to match a sequence of zero or more repetitions of 'foo' and
'bar':

     (foo|bar)*

789 In addition to characters and ranges of characters, character classes
790 can also contain "character class expressions". These are expressions
791 enclosed inside '[:' and ':]' delimiters (which themselves must appear
792 between the '[' and ']' of the character class. Other elements may
793 occur inside the character class, too). The valid expressions are:
795 [:alnum:] [:alpha:] [:blank:]
796 [:cntrl:] [:digit:] [:graph:]
797 [:lower:] [:print:] [:punct:]
798 [:space:] [:upper:] [:xdigit:]
800 These expressions all designate a set of characters equivalent to the
801 corresponding standard C 'isXXX' function. For example, '[:alnum:]'
802 designates those characters for which 'isalnum()' returns true - i.e.,
803 any alphabetic or numeric character. Some systems don't provide
804 'isblank()', so flex defines '[:blank:]' as a blank or a tab.
For example, the following character classes are all equivalent:

     [[:alnum:]]
     [[:alpha:][:digit:]]
     [[:alpha:][0-9]]
     [a-zA-Z0-9]

A word of caution. Character classes are expanded immediately when
seen in the 'flex' input. This means the character classes are
sensitive to the locale in which 'flex' is executed, and the resulting
scanner will not be sensitive to the runtime locale. This may or may
not be desirable.
819 * If your scanner is case-insensitive (the '-i' flag), then
820 '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.
* Character classes with ranges, such as '[a-Z]', should be used with
  caution in a case-insensitive scanner if the range spans upper or
  lowercase characters. Flex does not know if you want to fold all
  upper and lowercase characters together, or if you want the literal
  numeric range specified (with no case folding). When in doubt,
  flex will assume that you meant the literal numeric range, and will
  issue a warning. The exception to this rule is a character range
  such as '[a-z]' or '[S-W]' where it is obvious that you want
  case-folding to occur. Here are some examples with the '-i' flag
  enabled:

  Range     Result      Literal Range         Alternate Range
  '[a-t]'   ok          '[a-tA-T]'
  '[A-T]'   ok          '[a-tA-T]'
  '[A-t]'   ambiguous   '[A-Z\[\\\]_`a-t]'    '[a-tA-T]'
  '[_-{]'   ambiguous   '[_`a-z{]'            '[_`a-zA-Z{]'
  '[@-C]'   ambiguous   '[@ABC]'              '[@A-Z\[\\\]_`abc]'

840 * A negated character class such as the example '[^A-Z]' above _will_
841 match a newline unless '\n' (or an equivalent escape sequence) is
842 one of the characters explicitly present in the negated character
843 class (e.g., '[^A-Z\n]'). This is unlike how many other regular
844 expression tools treat negated character classes, but unfortunately
845 the inconsistency is historically entrenched. Matching newlines
846 means that a pattern like '[^"]*' can match the entire input unless
847 there's another quote in the input.
* Flex allows negation of character class expressions by prepending
  '^' to the POSIX character class name.

       [:^alnum:] [:^alpha:] [:^blank:]
       [:^cntrl:] [:^digit:] [:^graph:]
       [:^lower:] [:^print:] [:^punct:]
       [:^space:] [:^upper:] [:^xdigit:]

  Flex will issue a warning if the expressions '[:^upper:]' and
  '[:^lower:]' appear in a case-insensitive scanner, since their
  meaning is unclear. The current behavior is to skip them entirely,
  but this may change without notice in future revisions of flex.
* The '{-}' operator computes the difference of two character
  classes. For example, '[a-c]{-}[b-z]' represents all the
  characters in the class '[a-c]' that are not in the class '[b-z]'
  (which in this case is just the single character 'a'). The '{-}'
  operator is left associative, so '[abc]{-}[b]{-}[c]' is the same as
  '[a]'. Be careful not to accidentally create an empty set, which
  will never match.
* The '{+}' operator computes the union of two character classes.
  For example, '[a-z]{+}[0-9]' is the same as '[a-z0-9]'. This
  operator is useful when preceded by the result of a difference
  operation, as in '[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
  equivalent to '[A-Zq]' in the "C" locale.
* A rule can have at most one instance of trailing context (the '/'
  operator or the '$' operator). The start condition, '^', and
  '<<EOF>>' patterns can only occur at the beginning of a pattern,
  and, like '/' and '$', cannot be grouped inside parentheses. A '^'
  which does not occur at the beginning of a rule or a '$' which does
  not occur at the end of a rule loses its special properties and is
  treated as a normal character.
* The following are invalid:

       foo/bar$
       <sc1>foo<sc2>bar

  Note that the first of these can be written 'foo/bar\n'.
* The following will result in '$' or '^' being treated as a normal
  character:

       foo|(bar$)
       foo|^bar

  If the desired meaning is a 'foo' or a 'bar'-followed-by-a-newline,
  the following could be used (the special '|' action is explained
  below, *note Actions::):

       foo      |
       bar$     /* action goes here */

  A similar trick will work for matching a 'foo' or a
  'bar'-at-the-beginning-of-a-line.
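
The '{-}' and '{+}' operators described in the notes above are ordinary
set difference and union over byte values. The following C sketch
models a character class as a 256-entry membership table to make that
concrete. It is an illustration only, not flex's internal
representation; 'charset', 'add_range', 'set_diff', 'set_union',
'set_size' and 'alpha_minus_lower_plus_q' are names invented for the
sketch, and the last function assumes the program runs in the "C"
locale.

```c
#include <ctype.h>
#include <stdbool.h>

#define SET_SIZE 256

typedef struct { bool member[SET_SIZE]; } charset;

/* Add the inclusive range lo..hi, as in [lo-hi]. */
void add_range(charset *set, unsigned char lo, unsigned char hi)
{
    for (int c = lo; c <= hi; ++c)
        set->member[c] = true;
}

/* a{-}b: everything in a that is not in b. */
charset set_diff(const charset *a, const charset *b)
{
    charset r;
    for (int c = 0; c < SET_SIZE; ++c)
        r.member[c] = a->member[c] && !b->member[c];
    return r;
}

/* a{+}b: everything in a or in b. */
charset set_union(const charset *a, const charset *b)
{
    charset r;
    for (int c = 0; c < SET_SIZE; ++c)
        r.member[c] = a->member[c] || b->member[c];
    return r;
}

/* Number of characters in the class; 0 means it can never match. */
int set_size(const charset *s)
{
    int n = 0;
    for (int c = 0; c < SET_SIZE; ++c)
        n += s->member[c] ? 1 : 0;
    return n;
}

/* [[:alpha:]]{-}[[:lower:]]{+}[q], evaluated left to right. */
charset alpha_minus_lower_plus_q(void)
{
    charset alpha = { { false } }, lower = { { false } }, q = { { false } };

    for (int c = 0; c < SET_SIZE; ++c) {
        alpha.member[c] = isalpha(c) != 0;
        lower.member[c] = islower(c) != 0;
    }
    q.member['q'] = true;

    charset diff = set_diff(&alpha, &lower);
    return set_union(&diff, &q);
}
```

Building '[a-c]' and '[b-z]' with 'add_range' and passing them to
'set_diff' leaves a class of size one containing only 'a'. A result of
size zero corresponds to the empty set that the text warns about.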
910 File: flex.info, Node: Matching, Next: Actions, Prev: Patterns, Up: Top
912 7 How the Input Is Matched
913 **************************
When the generated scanner is run, it analyzes its input looking for
strings which match any of its patterns. If it finds more than one
match, it takes the one matching the most text (for trailing context
rules, this includes the length of the trailing part, even though it
will then be returned to the input). If it finds two or more matches of
the same length, the rule listed first in the 'flex' input file is
chosen.
923 Once the match is determined, the text corresponding to the match
924 (called the "token") is made available in the global character pointer
925 'yytext', and its length in the global integer 'yyleng'. The "action"
926 corresponding to the matched pattern is then executed (*note Actions::),
927 and then the remaining input is scanned for another match.
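
The selection policy described above (longest match wins, and the
earliest rule wins a tie) can be written down in a few lines of C. This
is a sketch of the policy only, not of the table-driven matching that
flex actually generates. The function name 'pick_rule' is invented, and
the sketch assumes the caller already knows how many characters each
rule could match at the current position, with 0 standing for "no
match".

```c
#include <stddef.h>

/* match_len[i] is the length rule i (in input-file order) can match at
   the current position, including any trailing context; 0 means the
   rule does not match.  Returns the index of the winning rule, or -1
   when nothing matches and the default rule applies. */
int pick_rule(const size_t match_len[], int num_rules)
{
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < num_rules; ++i) {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (match_len[i] > best_len) {
            best = i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

Take the keyword rule and the '{ID}' rule from the toy Pascal scanner,
in that order. For the input 'begin' both match five characters, so the
lengths are {5, 5} and rule 0 (the keyword) is chosen. For 'beginning'
the lengths are {5, 9} and rule 1 (the identifier) is chosen.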
If no match is found, then the "default rule" is executed: the next
character in the input is considered matched and copied to the standard
output. Thus, the simplest valid 'flex' input is:

     %%

which generates a scanner that simply copies its input (one character
at a time) to its output.
Note that 'yytext' can be defined in two different ways: either as a
character _pointer_ or as a character _array_. You can control which
definition 'flex' uses by including one of the special directives
'%pointer' or '%array' in the first (definitions) section of your flex
input. The default is '%pointer', unless you use the '-l' lex
compatibility option, in which case 'yytext' will be an array. The
advantage of using '%pointer' is substantially faster scanning and no
buffer overflow when matching very large tokens (unless you run out of
dynamic memory). The disadvantage is that you are restricted in how
your actions can modify 'yytext' (*note Actions::), and calls to the
'unput()' function destroy the present contents of 'yytext', which can
be a considerable porting headache when moving between different 'lex'
versions.

The advantage of '%array' is that you can then modify 'yytext' to
your heart's content, and calls to 'unput()' do not destroy 'yytext'
(*note Actions::). Furthermore, existing 'lex' programs sometimes
access 'yytext' externally using declarations of the form:

     extern char yytext[];

This definition is erroneous when used with '%pointer', but correct
for '%array'.
962 The '%array' declaration defines 'yytext' to be an array of 'YYLMAX'
963 characters, which defaults to a fairly large value. You can change the
964 size by simply #define'ing 'YYLMAX' to a different value in the first
965 section of your 'flex' input. As mentioned above, with '%pointer'
966 yytext grows dynamically to accommodate large tokens. While this means
967 your '%pointer' scanner can accommodate very large tokens (such as
968 matching entire blocks of comments), bear in mind that each time the
969 scanner must resize 'yytext' it also must rescan the entire token from
970 the beginning, so matching such tokens can prove slow. 'yytext'
971 presently does _not_ dynamically grow if a call to 'unput()' results in
972 too much text being pushed back; instead, a run-time error results.
Also note that you cannot use '%array' with C++ scanner classes
(*note Cxx::).
978 File: flex.info, Node: Actions, Next: Generated Scanner, Prev: Matching, Up: Top
Each pattern in a rule has a corresponding "action", which can be any
arbitrary C statement. The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action. If the
action is empty, then when the pattern is matched the input token is
simply discarded. For example, here is the specification for a program
which deletes all occurrences of 'zap me' from its input:

     %%
     "zap me"

This example will copy all other characters in the input to the
output since they will be matched by the default rule.
Here is a program which compresses multiple blanks and tabs down to a
single blank, and throws away whitespace found at the end of a line:

     %%
     [ \t]+        putchar( ' ' );
     [ \t]+$       /* ignore this token */
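
To see why both rules are needed, it can help to spell the same
transformation out by hand. The C function below is a sketch of what
the scanner above does to a NUL-terminated string. The name
'squeeze_blanks' is invented for the illustration, and the output buffer
is assumed to be at least as large as the input, which is enough
because the transformation never lengthens the text.

```c
/* Write to out a copy of in with each run of blanks and tabs reduced
   to one blank, except that a run directly before a newline is
   dropped.  out must hold at least strlen(in) + 1 bytes. */
void squeeze_blanks(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            /* Consume the whole run, as [ \t]+ would. */
            while (*in == ' ' || *in == '\t')
                ++in;
            /* A run followed by a newline corresponds to [ \t]+$,
               whose action is empty; any other run becomes ' '. */
            if (*in != '\n')
                *out++ = ' ';
        } else {
            /* Default rule: copy the unmatched character. */
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

In the flex version the choice between the two rules falls out of the
longest-match rule: at the end of a line '[ \t]+$' also counts the
newline as trailing context, so it is the longer match and wins.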
1003 If the action contains a '{', then the action spans till the
1004 balancing '}' is found, and the action may cross multiple lines. 'flex'
1005 knows about C strings and comments and won't be fooled by braces found
1006 within them, but also allows actions to begin with '%{' and will
1007 consider the action to be all the text up to the next '%}' (regardless
1008 of ordinary braces inside the action).
1010 An action consisting solely of a vertical bar ('|') means "same as
1011 the action for the next rule". See below for an illustration.
Actions can include arbitrary C code, including 'return' statements
to return a value to whatever routine called 'yylex()'. Each time
'yylex()' is called it continues processing tokens from where it last
left off until it either reaches the end of the file or executes a
'return'.

Actions are free to modify 'yytext' except for lengthening it (adding
characters to its end would overwrite later characters in the input
stream). This however does not apply when using '%array' (*note
Matching::). In that case, 'yytext' may be freely modified in any way.
1024 Actions are free to modify 'yyleng' except they should not do so if
1025 the action also includes use of 'yymore()' (see below).
1027 There are a number of special directives which can be included within
1031 copies yytext to the scanner's output.
1034 followed by the name of a start condition places the scanner in the
1035 corresponding start condition (see below).

'REJECT'
     directs the scanner to proceed on to the "second best" rule which
     matched the input (or a prefix of the input). The rule is chosen
     as described above in *note Matching::, and 'yytext' and 'yyleng'
     are set up appropriately. It may either be one which matched as
     much text as the originally chosen rule but came later in the
     'flex' input file, or one which matched less text. For example,
     the following will both count the words in the input and call the
     routine 'special()' whenever 'frob' is seen:

                  int word_count = 0;
          %%

          frob        special(); REJECT;
          [^ \t\n]+   ++word_count;

     Without the 'REJECT', any occurrences of 'frob' in the input would
     not be counted as words, since the scanner normally executes only
     one action per token. Multiple uses of 'REJECT' are allowed, each
     one finding the next best choice to the currently active rule. For
     example, when the following scanner scans the token 'abcd', it will
     write 'abcdabcaba' to the output:

          %%
          a        |
          ab       |
          abc      |
          abcd     ECHO; REJECT;
          .|\n     /* eat up any unmatched character */

     The first three rules share the fourth's action since they use the
     special '|' action.

     'REJECT' is a particularly expensive feature in terms of scanner
     performance; if it is used in _any_ of the scanner's actions it
     will slow down _all_ of the scanner's matching. Furthermore,
     'REJECT' cannot be used with the '-Cf' or '-CF' options (*note
     Scanner Options::).

     Note also that unlike the other special actions, 'REJECT' is a
     _branch_. Code immediately following it in the action will _not_
     be executed.

'yymore()'
     tells the scanner that the next time it matches a rule, the
     corresponding token should be _appended_ onto the current value of
     'yytext' rather than replacing it. For example, given the input
     'mega-kludge' the following will write 'mega-mega-kludge' to the
     output:

          %%
          mega-    ECHO; yymore();
          kludge   ECHO;

     First 'mega-' is matched and echoed to the output. Then 'kludge'
     is matched, but the previous 'mega-' is still hanging around at the
     beginning of 'yytext' so the 'ECHO' for the 'kludge' rule will
     actually write 'mega-kludge'.

1096 Two notes regarding use of 'yymore()'. First, 'yymore()' depends on
1097 the value of 'yyleng' correctly reflecting the size of the current
1098 token, so you must not modify 'yyleng' if you are using 'yymore()'.
1099 Second, the presence of 'yymore()' in the scanner's action entails a
1100 minor performance penalty in the scanner's matching speed.
1102 'yyless(n)' returns all but the first 'n' characters of the current
1103 token back to the input stream, where they will be rescanned when the
1104 scanner looks for the next match. 'yytext' and 'yyleng' are adjusted
1105 appropriately (e.g., 'yyleng' will now be equal to 'n'). For example,
1106 on the input 'foobar' the following will write out 'foobarbar':
1109 foobar ECHO; yyless(3);
1112 An argument of 0 to 'yyless()' will cause the entire current input
1113 string to be scanned again. Unless you've changed how the scanner will
1114 subsequently process its input (using 'BEGIN', for example), this will
1115 result in an endless loop.
1117 Note that 'yyless()' is a macro and can only be used in the flex
1118 input file, not from other source files.
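
   A common use of 'yyless()' is to give back trailing characters that
were only needed as lookahead. The following is a minimal sketch under
that assumption; the rules, the messages and the 'main()' routine are
illustrative, not taken from any particular scanner:

     %option noyywrap
     %%
     [0-9]+"."   {
                 /* Keep the digits; hand the dot back to the input. */
                 yyless( yyleng - 1 );
                 printf( "integer: %s\n", yytext );
                 }
     "."         printf( "dot\n" );
     .|\n        /* ignore anything else */
     %%
     int main( void )
         {
         return yylex();
         }

   On the input '42.' the first rule matches all three characters,
'yyless()' trims the token to '42', and the returned dot is then matched
by the second rule.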
   'unput(c)' puts the character 'c' back onto the input stream. It
will be the next character scanned. The following action will take the
current token and cause it to be rescanned enclosed in parentheses.

     {
     int i;
     /* Copy yytext because unput() trashes yytext */
     char *yycopy = strdup( yytext );
     unput( ')' );
     for ( i = yyleng - 1; i >= 0; --i )
         unput( yycopy[i] );
     unput( '(' );
     free( yycopy );
     }

   Note that since each 'unput()' puts the given character back at the
_beginning_ of the input stream, pushing back strings must be done
back-to-front.

1139 An important potential problem when using 'unput()' is that if you
1140 are using '%pointer' (the default), a call to 'unput()' _destroys_ the
1141 contents of 'yytext', starting with its rightmost character and
1142 devouring one character to the left with each call. If you need the
1143 value of 'yytext' preserved after a call to 'unput()' (as in the above
1144 example), you must either first copy it elsewhere, or build your scanner
1145 using '%array' instead (*note Matching::).
1147 Finally, note that you cannot put back 'EOF' to attempt to mark the
1148 input stream with an end-of-file.
   'input()' reads the next character from the input stream. For
example, the following is one way to eat up C comments:

     %%
     "/*"        {
                 int c;

                 for ( ; ; )
                     {
                     while ( (c = input()) != '*' &&
                             c != EOF )
                         ;    /* eat up text of comment */

                     if ( c == '*' )
                         {
                         while ( (c = input()) == '*' )
                             ;
                         if ( c == '/' )
                             break;    /* found the end */
                         }

                     if ( c == EOF )
                         {
                         error( "EOF in comment" );
                         break;
                         }
                     }
                 }

1179 (Note that if the scanner is compiled using 'C++', then 'input()' is
1180 instead referred to as yyinput(), in order to avoid a name clash with
1181 the 'C++' stream by the name of 'input'.)
   'YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the
next time the scanner attempts to match a token, it will first refill
the buffer using 'YY_INPUT()' (*note Generated Scanner::). This action
is a special case of the more general 'yy_flush_buffer()' function,
described below (*note Multiple Input Buffers::).
   'yyterminate()' can be used in lieu of a return statement in an
action. It terminates the scanner and returns a 0 to the scanner's
caller, indicating "all done". By default, 'yyterminate()' is also
called when an end-of-file is encountered. It is a macro and may be
redefined.

1196 File: flex.info, Node: Generated Scanner, Next: Start Conditions, Prev: Actions, Up: Top
1198 9 The Generated Scanner
1199 ***********************

The output of 'flex' is the file 'lex.yy.c', which contains the scanning
routine 'yylex()', a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros. By default, 'yylex()' is
declared as follows:

     int yylex()
         {
         ... various definitions and the actions in here ...
         }

1211 (If your environment supports function prototypes, then it will be
1212 'int yylex( void )'.) This definition may be changed by defining the
1213 'YY_DECL' macro. For example, you could use:
1215 #define YY_DECL float lexscan( a, b ) float a, b;
1217 to give the scanning routine the name 'lexscan', returning a float,
1218 and taking two floats as arguments. Note that if you give arguments to
1219 the scanning routine using a K&R-style/non-prototyped function
1220 declaration, you must terminate the definition with a semi-colon (;).
1222 'flex' generates 'C99' function definitions by default. Flex used to
1223 have the ability to generate obsolete, er, 'traditional', function
1224 definitions. This was to support bootstrapping gcc on old systems.
1225 Unfortunately, traditional definitions prevent us from using any
1226 standard data types smaller than int (such as short, char, or bool) as
1227 function arguments. Furthermore, traditional definitions support added
1228 extra complexity in the skeleton file. For this reason, current
1229 versions of 'flex' generate standard C99 code only, leaving K&R-style
1230 functions to the historians.
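
   As a sketch of a prototyped 'YY_DECL', the scanner below renames the
scanning routine and passes it a pointer argument. The names 'lexscan',
'struct scan_ctx' and 'ctx' are assumptions made for this illustration:

     %{
     #include <stdio.h>

     /* Hypothetical caller-supplied state; not part of flex. */
     struct scan_ctx { int newlines; };

     /* Rename the scanning routine and give it one argument. */
     #define YY_DECL int lexscan( struct scan_ctx *ctx )
     %}
     %option noyywrap
     %%
     \n      ctx->newlines++;
     .       /* ignore everything else */
     %%
     int main( void )
         {
         struct scan_ctx ctx = { 0 };

         lexscan( &ctx );
         printf( "%d newlines\n", ctx.newlines );
         return 0;
         }

   Because the definition is prototyped, no semi-colon follows the
parameter list, and the argument is visible inside every action.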
1232 Whenever 'yylex()' is called, it scans tokens from the global input
1233 file 'yyin' (which defaults to stdin). It continues until it either
1234 reaches an end-of-file (at which point it returns the value 0) or one of
1235 its actions executes a 'return' statement.
1237 If the scanner reaches an end-of-file, subsequent calls are undefined
1238 unless either 'yyin' is pointed at a new input file (in which case
1239 scanning continues from that file), or 'yyrestart()' is called.
1240 'yyrestart()' takes one argument, a 'FILE *' pointer (which can be NULL,
1241 if you've set up 'YY_INPUT' to scan from a source other than 'yyin'),
1242 and initializes 'yyin' for scanning from that file. Essentially there
1243 is no difference between just assigning 'yyin' to a new input file or
1244 using 'yyrestart()' to do so; the latter is available for compatibility
1245 with previous versions of 'flex', and because it can be used to switch
1246 input files in the middle of scanning. It can also be used to throw
1247 away the current input buffer, by calling it with an argument of 'yyin';
1248 but it would be better to use 'YY_FLUSH_BUFFER' (*note Actions::). Note
1249 that 'yyrestart()' does _not_ reset the start condition to 'INITIAL'
1250 (*note Start Conditions::).
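
   The following is a minimal sketch of scanning several files in turn
with 'yyrestart()'. The single rule, the 'main()' routine and its error
handling are assumptions made for illustration:

     %option noyywrap
     %%
     .|\n    ECHO;
     %%
     int main( int argc, char **argv )
         {
         int i;

         for ( i = 1; i < argc; ++i )
             {
             FILE *fp = fopen( argv[i], "r" );

             if ( ! fp )
                 {
                 perror( argv[i] );
                 continue;
                 }

             yyrestart( fp );    /* drop buffered input, read from fp */
             BEGIN(INITIAL);     /* yyrestart() keeps the start condition */
             yylex();            /* runs until end-of-file */
             fclose( fp );
             }

         return 0;
         }

   The explicit 'BEGIN(INITIAL)' only matters for scanners that use
start conditions; it is shown because 'yyrestart()' does not reset them.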
1252 If 'yylex()' stops scanning due to executing a 'return' statement in
1253 one of the actions, the scanner may then be called again and it will
1254 resume scanning where it left off.
1256 By default (and for purposes of efficiency), the scanner uses
1257 block-reads rather than simple 'getc()' calls to read characters from
1258 'yyin'. The nature of how it gets its input can be controlled by
1259 defining the 'YY_INPUT' macro. The calling sequence for 'YY_INPUT()' is
1260 'YY_INPUT(buf,result,max_size)'. Its action is to place up to
1261 'max_size' characters in the character array 'buf' and return in the
1262 integer variable 'result' either the number of characters read or the
1263 constant 'YY_NULL' (0 on Unix systems) to indicate 'EOF'. The default
1264 'YY_INPUT' reads from the global file-pointer 'yyin'.
   Here is a sample definition of 'YY_INPUT' (in the definitions section
of the input file):

     %{
     #define YY_INPUT(buf,result,max_size) \
         { \
         int c = getchar(); \
         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
         }
     %}

1277 This definition will change the input processing to occur one
1278 character at a time.
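
   'YY_INPUT' can also hand over a block of characters per call. The
sketch below reads from a string held in memory; the variable 'pending',
its contents and the 'main()' routine are assumptions made for this
illustration:

     %{
     #include <string.h>

     /* Hypothetical input source: text still waiting to be scanned. */
     static const char *pending = "some text to scan\n";

     /* Hand the scanner up to max_size bytes of 'pending' per call. */
     #define YY_INPUT(buf,result,max_size) \
         { \
         size_t n = strlen( pending ); \
         if ( n > (size_t) (max_size) ) \
             n = (size_t) (max_size); \
         memcpy( (buf), pending, n ); \
         pending += n; \
         (result) = n ? (int) n : YY_NULL; \
         }
     %}
     %option noyywrap
     %%
     .|\n    ECHO;
     %%
     int main( void )
         {
         return yylex();
         }

   Once 'pending' is exhausted the macro reports 'YY_NULL', which the
scanner treats as end-of-file.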
1280 When the scanner receives an end-of-file indication from YY_INPUT, it
1281 then checks the 'yywrap()' function. If 'yywrap()' returns false
1282 (zero), then it is assumed that the function has gone ahead and set up
1283 'yyin' to point to another input file, and scanning continues. If it
1284 returns true (non-zero), then the scanner terminates, returning 0 to its
1285 caller. Note that in either case, the start condition remains
1286 unchanged; it does _not_ revert to 'INITIAL'.
1288 If you do not supply your own version of 'yywrap()', then you must
1289 either use '%option noyywrap' (in which case the scanner behaves as
1290 though 'yywrap()' returned 1), or you must link with '-lfl' to obtain
1291 the default version of the routine, which always returns 1.
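
   As a sketch of a user-supplied 'yywrap()', the version below moves on
to the next file named on the command line. The variable 'next_file'
and the 'main()' routine are assumptions made for this illustration, and
for brevity the previous file is not closed:

     %{
     /* Hypothetical NULL-terminated list of the files still to scan. */
     static char **next_file;
     %}
     %%
     .|\n    ECHO;
     %%
     int yywrap( void )
         {
         while ( next_file && *next_file )
             {
             FILE *fp = fopen( *next_file++, "r" );

             if ( fp )
                 {
                 yyin = fp;      /* scanning continues from fp */
                 return 0;
                 }
             }

         return 1;               /* no more input: terminate */
         }

     int main( int argc, char **argv )
         {
         next_file = argv + 1;
         if ( yywrap() )         /* open the first file, if any */
             return 1;
         yylex();
         return 0;
         }

   Files that cannot be opened are skipped silently here; a real
scanner would probably report them.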
1293 For scanning from in-memory buffers (e.g., scanning strings), see
1294 *note Scanning Strings::. *Note Multiple Input Buffers::.
1296 The scanner writes its 'ECHO' output to the 'yyout' global (default,
1297 'stdout'), which may be redefined by the user simply by assigning it to
1298 some other 'FILE' pointer.
1301 File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top

10 Start Conditions
*******************

'flex' provides a mechanism for conditionally activating rules. Any
rule whose pattern is prefixed with '<sc>' will only be active when the
scanner is in the "start condition" named 'sc'. For example,

     <STRING>[^"]*        { /* eat up the string body ... */
                 ...
                 }

will be active only when the scanner is in the 'STRING' start
condition, and

     <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
                 ...
                 }

will be active only when the current start condition is either
'INITIAL', 'STRING', or 'QUOTE'.

   Start conditions are declared in the definitions (first) section of
the input using unindented lines beginning with either '%s' or '%x'
followed by a list of names. The former declares "inclusive" start
conditions, the latter "exclusive" start conditions. A start condition
is activated using the 'BEGIN' action. Until the next 'BEGIN' action is
executed, rules with the given start condition will be active and rules
with other start conditions will be inactive. If the start condition is
inclusive, then rules with no start conditions at all will also be
active. If it is exclusive, then _only_ rules qualified with the start
condition will be active. A set of rules contingent on the same
exclusive start condition describes a scanner which is independent of
any of the other rules in the 'flex' input. Because of this, exclusive
start conditions make it easy to specify "mini-scanners" which scan
portions of the input that are syntactically different from the rest
(e.g., comments).

   If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two. The set of rules:

     %s example
     %%

     <example>foo   do_something();

     bar            something_else();

is equivalent to

     %x example
     %%

     <example>foo   do_something();

     <INITIAL,example>bar    something_else();

   Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the
second example wouldn't be active (i.e., couldn't match) when in start
condition 'example'. If we just used '<example>' to qualify 'bar',
though, then it would only be active in 'example' and not in 'INITIAL',
while in the first example it's active in both, because in the first
example the 'example' start condition is an inclusive ('%s') start
condition.

   Also note that the special start-condition specifier '<*>' matches
every start condition. Thus, the above example could also have been
written:

     %x example
     %%

     <example>foo   do_something();

     <*>bar    something_else();

   The default rule (to 'ECHO' any unmatched character) remains active
in start conditions. It is equivalent to:

     <*>.|\n     ECHO;

1384 'BEGIN(0)' returns to the original state where only the rules with no
1385 start conditions are active. This state can also be referred to as the
1386 start-condition 'INITIAL', so 'BEGIN(INITIAL)' is equivalent to
1387 'BEGIN(0)'. (The parentheses around the start condition name are not
1388 required but are considered good style.)
   'BEGIN' actions can also be given as indented code at the beginning
of the rules section. For example, the following will cause the scanner
to enter the 'SPECIAL' start condition whenever 'yylex()' is called and
the global variable 'enter_special' is true:

             int enter_special;

     %x SPECIAL
     %%
             if ( enter_special )
                 BEGIN(SPECIAL);

     <SPECIAL>blahblahblah
     ...more rules follow...

   To illustrate the uses of start conditions, here is a scanner which
provides two different interpretations of a string like '123.456'. By
default it will treat it as three tokens, the integer '123', a dot
('.'), and the integer '456'. But if the string is preceded earlier in
the line by the string 'expect-floats' it will treat it as a single
token, the floating-point number '123.456':

     %{
     #include <stdlib.h>    /* atof(), atoi() */
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * numbers
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );

   Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This scanner goes to a bit of trouble to match as much text as
possible with each rule. In general, when attempting to write a
high-speed scanner try to match as much as possible in each rule, as
it's a win.

   Note that start condition names are really integer values and can be
stored as such. Thus, the above could be extended in the following
fashion:

     %x comment foo
     %%
             int line_num = 1;
             int comment_caller;

     "/*"         {
                  comment_caller = INITIAL;
                  BEGIN(comment);
                  }

     ...

     <foo>"/*"    {
                  comment_caller = foo;
                  BEGIN(comment);
                  }

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(comment_caller);

1485 Furthermore, you can access the current start condition using the
1486 integer-valued 'YY_START' macro. For example, the above assignments to
1487 'comment_caller' could instead be written
1489 comment_caller = YY_START;
1491 Flex provides 'YYSTATE' as an alias for 'YY_START' (since that is
1492 what's used by AT&T 'lex').
1494 For historical reasons, start conditions do not have their own
1495 name-space within the generated scanner. The start condition names are
1496 unmodified in the generated scanner and generated header. *Note
1497 option-header::. *Note option-prefix::.
   Finally, here's an example of how to match C-style quoted strings
using exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):

     %x str

     %%
             /* MAX_STR_CONST is assumed to be defined elsewhere */
             char string_buf[MAX_STR_CONST];
             char *string_buf_ptr;


     \"      string_buf_ptr = string_buf; BEGIN(str);

     <str>\"        { /* saw closing quote - all done */
             BEGIN(INITIAL);
             *string_buf_ptr = '\0';
             /* return string constant token type and
              * value to parser
              */
             }

     <str>\n        {
             /* error - unterminated string constant */
             /* generate error message */
             }

     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                     {
                     /* error, constant is out-of-bounds */
                     }
             else
                     *string_buf_ptr++ = (char) result;
             }

     <str>\\[0-9]+ {
             /* generate error - bad escape sequence; something
              * like '\48' or '\0777777'
              */
             }

     <str>\\n  *string_buf_ptr++ = '\n';
     <str>\\t  *string_buf_ptr++ = '\t';
     <str>\\r  *string_buf_ptr++ = '\r';
     <str>\\b  *string_buf_ptr++ = '\b';
     <str>\\f  *string_buf_ptr++ = '\f';

     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

     <str>[^\\\n\"]+        {
             char *yptr = yytext;

             while ( *yptr )
                     *string_buf_ptr++ = *yptr++;
             }

   Often, such as in some of the examples above, you wind up writing a
whole bunch of rules all preceded by the same start condition(s). Flex
makes this a little easier and cleaner by introducing a notion of start
condition "scope". A start condition scope is begun with:

     <SCs>{

where '<SCs>' is a list of one or more start conditions. Inside the
start condition scope, every rule automatically has the prefix '<SCs>'
applied to it, until a '}' which matches the initial '{'. So, for
example,

     <ESC>{
         "\\n"   return '\n';
         "\\r"   return '\r';
         "\\f"   return '\f';
         "\\0"   return '\0';
     }

is equivalent to:

     <ESC>"\\n"  return '\n';
     <ESC>"\\r"  return '\r';
     <ESC>"\\f"  return '\f';
     <ESC>"\\0"  return '\0';

1584 Start condition scopes may be nested.
   The following routines are available for manipulating stacks of start
conditions:

 -- Function: void yy_push_state ( int 'new_state' )
     pushes the current start condition onto the top of the start
     condition stack and switches to 'new_state' as though you had used
     'BEGIN new_state' (recall that start condition names are also
     integers).

1595 -- Function: void yy_pop_state ()
1596 pops the top of the stack and switches to it via 'BEGIN'.
1598 -- Function: int yy_top_state ()
1599 returns the top of the stack without altering the stack's contents.
1601 The start condition stack grows dynamically and so has no built-in
1602 size limitation. If memory is exhausted, program execution aborts.
1604 To use start condition stacks, your scanner must include a '%option
1605 stack' directive (*note Scanner Options::).
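
   The sketch below shows why a stack can be more convenient than a
saved 'comment_caller' variable: a comment may start either at top
level or inside a brace-delimited block, and 'yy_pop_state()' returns to
whichever of the two was active. The start condition names 'block' and
'comment', the rules and the 'main()' routine are assumptions made for
this illustration:

     %option stack noyywrap
     %x block comment
     %%
     "{"                     yy_push_state( block );
     <block>"}"              yy_pop_state();
     <INITIAL,block>"/*"     yy_push_state( comment );
     <comment>"*/"           yy_pop_state();
     <comment>.|\n           /* discard comment text */
     <block>.|\n             ECHO;
     .|\n                    /* discard text outside braces */
     %%
     int main( void )
         {
         return yylex();
         }

   Every 'yy_pop_state()' here is reached only from a state that was
entered with 'yy_push_state()', so the stack is never popped while
empty. Since 'yy_top_state()' is not called, some compilers may warn
about an unused function.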
1608 File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top
1610 11 Multiple Input Buffers
1611 *************************

Some scanners (such as those which support "include" files) require
reading from several input streams. As 'flex' scanners do a large
amount of buffering, one cannot control where the next input will be
read from by simply writing a 'YY_INPUT()' which is sensitive to the
scanning context. 'YY_INPUT()' is only called when the scanner reaches
the end of its buffer, which may be a long time after scanning a
statement such as an 'include' statement which requires switching the
input source.

1622 To negotiate these sorts of problems, 'flex' provides a mechanism for
1623 creating and switching between multiple input buffers. An input buffer
1624 is created by using:
1626 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
1628 which takes a 'FILE' pointer and a size and creates a buffer
1629 associated with the given file and large enough to hold 'size'
1630 characters (when in doubt, use 'YY_BUF_SIZE' for the size). It returns
1631 a 'YY_BUFFER_STATE' handle, which may then be passed to other routines
1632 (see below). The 'YY_BUFFER_STATE' type is a pointer to an opaque
1633 'struct yy_buffer_state' structure, so you may safely initialize
1634 'YY_BUFFER_STATE' variables to '((YY_BUFFER_STATE) 0)' if you wish, and
1635 also refer to the opaque structure in order to correctly declare input
1636 buffers in source files other than that of your scanner. Note that the
1637 'FILE' pointer in the call to 'yy_create_buffer' is only used as the
1638 value of 'yyin' seen by 'YY_INPUT'. If you redefine 'YY_INPUT()' so it
1639 no longer uses 'yyin', then you can safely pass a NULL 'FILE' pointer to
1640 'yy_create_buffer'. You select a particular buffer to scan from using:
1642 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
1644 The above function switches the scanner's input buffer so subsequent
1645 tokens will come from 'new_buffer'. Note that 'yy_switch_to_buffer()'
1646 may be used by 'yywrap()' to set things up for continued scanning,
1647 instead of opening a new file and pointing 'yyin' at it. If you are
1648 looking for a stack of input buffers, then you want to use
1649 'yypush_buffer_state()' instead of this function. Note also that
1650 switching input sources via either 'yy_switch_to_buffer()' or 'yywrap()'
1651 does _not_ change the start condition.
1653 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )

   is used to reclaim the storage associated with a buffer. ('buffer'
can be NULL, in which case the routine does nothing.) To clear the
contents of a buffer without deleting it, use 'yy_flush_buffer()',
described below.

1659 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
1661 This function pushes the new buffer state onto an internal stack.
1662 The pushed state becomes the new current state. The stack is maintained
1663 by flex and will grow as required. This function is intended to be used
1664 instead of 'yy_switch_to_buffer', when you want to change states, but
1665 preserve the current state for later use.
1667 -- Function: void yypop_buffer_state ( )
1669 This function removes the current state from the top of the stack,
1670 and deletes it by calling 'yy_delete_buffer'. The next state on the
1671 stack, if any, becomes the new current state.
1673 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
1675 This function discards the buffer's contents, so the next time the
1676 scanner attempts to match a token from the buffer, it will first fill
1677 the buffer anew using 'YY_INPUT()'.

 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )

   is an alias for 'yy_create_buffer()', provided for compatibility with
the C++ use of 'new' and 'delete' for creating and destroying dynamic
objects.

   The 'YY_CURRENT_BUFFER' macro returns a 'YY_BUFFER_STATE' handle to
the current buffer. It should not be used as an lvalue.

1688 Here are two examples of using these features for writing a scanner
1689 which expands include files (the '<<EOF>>' feature is discussed below).
1691 This first example uses yypush_buffer_state and yypop_buffer_state.
1692 Flex maintains the stack internally.

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl
     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

             BEGIN(INITIAL);
             }

     <<EOF>> {
             yypop_buffer_state();

             if ( !YY_CURRENT_BUFFER )
                 {
                 yyterminate();
                 }
             }

1725 The second example, below, does the same thing as the previous
1726 example did, but manages its own input buffer stack manually (instead of
1727 letting flex do it).

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl

     %{
     #define MAX_INCLUDE_DEPTH 10
     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
     int include_stack_ptr = 0;
     %}

     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                 {
                 fprintf( stderr, "Includes nested too deeply" );
                 exit( 1 );
                 }

             include_stack[include_stack_ptr++] =
                 YY_CURRENT_BUFFER;

             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yy_switch_to_buffer(
                 yy_create_buffer( yyin, YY_BUF_SIZE ) );

             BEGIN(INITIAL);
             }

     <<EOF>> {
             /* a negative index means the outermost file has ended */
             if ( --include_stack_ptr < 0 )
                 {
                 yyterminate();
                 }

             else
                 {
                 yy_delete_buffer( YY_CURRENT_BUFFER );
                 yy_switch_to_buffer(
                      include_stack[include_stack_ptr] );
                 }
             }

1782 The following routines are available for setting up input buffers for
1783 scanning in-memory strings instead of files. All of them create a new
1784 input buffer for scanning the string, and return a corresponding
1785 'YY_BUFFER_STATE' handle (which you should delete with
1786 'yy_delete_buffer()' when done with it). They also switch to the new
1787 buffer using 'yy_switch_to_buffer()', so the next call to 'yylex()' will
1788 start scanning the string.

 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
     scans a NUL-terminated string.

 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len )
     scans 'len' bytes (including possibly 'NUL's) starting at location
     'bytes'.

   Note that both of these functions create and scan a _copy_ of the
string or bytes. (This may be desirable, since 'yylex()' modifies the
contents of the buffer it is scanning.) You can avoid the copy by
using:

 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t size)
     which scans in place the buffer starting at 'base', consisting of
     'size' bytes, the last two bytes of which _must_ be
     'YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
     scanned; thus, scanning consists of 'base[0]' through
     'base[size-3]', inclusive.

1811 If you fail to set up 'base' in this manner (i.e., forget the final
1812 two 'YY_END_OF_BUFFER_CHAR' bytes), then 'yy_scan_buffer()' returns a
1813 NULL pointer instead of creating a new input buffer.
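
   The sketch below prepares a buffer in the form 'yy_scan_buffer()'
expects. The helper 'scan_in_place()', the single rule and the 'main()'
routine are assumptions made for this illustration:

     %{
     #include <stdlib.h>
     #include <string.h>
     %}
     %option noyywrap
     %%
     .|\n    ECHO;
     %%
     /* Hypothetical helper: scan 'len' bytes of 'text' from a buffer
      * owned by the caller.  Returns 0 on success, -1 on failure.
      */
     int scan_in_place( const char *text, size_t len )
         {
         YY_BUFFER_STATE bs;
         char *base = malloc( len + 2 );

         if ( ! base )
             return -1;

         memcpy( base, text, len );
         base[len] = base[len + 1] = YY_END_OF_BUFFER_CHAR;

         bs = yy_scan_buffer( base, len + 2 );
         if ( ! bs )
             {
             free( base );
             return -1;
             }

         yylex();
         yy_delete_buffer( bs );   /* releases the handle, not 'base' */
         free( base );
         return 0;
         }

     int main( void )
         {
         return scan_in_place( "hello\n", 6 ) ? 1 : 0;
         }

   The size passed to 'yy_scan_buffer()' counts the two terminating
bytes, and the memory at 'base' remains the caller's to free.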
1815 -- Data type: yy_size_t
1816 is an integral type to which you can cast an integer expression
1817 reflecting the size of the buffer.
1820 File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top
1822 12 End-of-File Rules
1823 ********************
1825 The special rule '<<EOF>>' indicates actions which are to be taken when
1826 an end-of-file is encountered and 'yywrap()' returns non-zero (i.e.,
1827 indicates no further files to process). The action must finish by doing
1828 one of the following things:
1830 * assigning 'yyin' to a new input file (in previous versions of
1831 'flex', after doing the assignment you had to call the special
1832 action 'YY_NEW_FILE'. This is no longer necessary.)
1834 * executing a 'return' statement;
1836 * executing the special 'yyterminate()' action.
1838 * or, switching to a new buffer using 'yy_switch_to_buffer()' as
1839 shown in the example above.
   <<EOF>> rules may not be used with other patterns; they may only be
qualified with a list of start conditions. If an unqualified <<EOF>>
rule is given, it applies to _all_ start conditions which do not already
have <<EOF>> actions. To specify an <<EOF>> rule for only the initial
start condition, use:

     <INITIAL><<EOF>>

   These rules are useful for catching things like unclosed comments.
An example:

     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              error( "unterminated quote" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                 yyterminate();
              }

1869 File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top
1871 13 Miscellaneous Macros
1872 ***********************

The macro 'YY_USER_ACTION' can be defined to provide an action which is
always executed prior to the matched rule's action. For example, it
could be #define'd to call a routine to convert yytext to lower-case.
When 'YY_USER_ACTION' is invoked, the variable 'yy_act' gives the number
of the matched rule (rules are numbered starting with 1). Suppose you
want to profile how often each of your rules is matched. The following
would do the trick:

     #define YY_USER_ACTION ++ctr[yy_act];

where 'ctr' is an array to hold the counts for the different rules. The
trailing semi-colon is needed because the macro's expansion is placed
directly in front of the rule's action. Note that the macro
'YY_NUM_RULES' gives the total number of rules (including the default
rule), even if you use '-s'. Because rule numbers start at 1, 'ctr'
needs one element more than that:

     int ctr[YY_NUM_RULES + 1];

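
   Another use of 'YY_USER_ACTION' is keeping track of the position of
each token. The sketch below is one way to do so, assuming that the
scanner uses none of 'yymore()', 'yyless()' or 'REJECT'; the variable
'offset', the rules and the 'main()' routine are assumptions made for
this illustration:

     %{
     /* Hypothetical counter: number of characters matched so far. */
     static long offset = 0;

     /* Runs before each rule's action, once yyleng has been set. */
     #define YY_USER_ACTION offset += yyleng;
     %}
     %option noyywrap
     %%
     [a-z]+   printf( "word at offset %ld\n", offset - (long) yyleng );
     .|\n     /* ignore anything else */
     %%
     int main( void )
         {
         return yylex();
         }

   Because the macro runs before the action, 'offset' already includes
the current token, which is why the action subtracts 'yyleng' to obtain
the token's starting position.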
1891 The macro 'YY_USER_INIT' may be defined to provide an action which is
1892 always executed before the first scan (and before the scanner's internal
1893 initializations are done). For example, it could be used to call a
1894 routine to read in a data table or open a logging file.
1896 The macro 'yy_set_interactive(is_interactive)' can be used to control
1897 whether the current buffer is considered "interactive". An interactive
1898 buffer is processed more slowly, but must be used when the scanner's
1899 input source is indeed interactive to avoid problems due to waiting to
1900 fill buffers (see the discussion of the '-I' flag in *note Scanner
1901 Options::). A non-zero value in the macro invocation marks the buffer
1902 as interactive, a zero value as non-interactive. Note that use of this
1903 macro overrides '%option always-interactive' or '%option
1904 never-interactive' (*note Scanner Options::). 'yy_set_interactive()'
1905 must be invoked prior to beginning to scan the buffer that is (or is
1906 not) to be considered interactive.
   The macro 'yy_set_bol(at_bol)' can be used to control whether the
current buffer's scanning context for the next token match is done as
though at the beginning of a line. A non-zero macro argument makes
rules anchored with '^' active, while a zero argument makes '^' rules
inactive.

1914 The macro 'YY_AT_BOL()' returns true if the next token scanned from
1915 the current buffer will have '^' rules active, false otherwise.
   In the generated scanner, the actions are all gathered in one large
switch statement and separated using 'YY_BREAK', which may be redefined.
By default, it is simply a 'break', to separate each rule's action from
the following rule's. Redefining 'YY_BREAK' allows, for example, C++
users to #define YY_BREAK to do nothing (while being very careful that
every rule ends with a 'break' or a 'return'!). This avoids the
unreachable-statement warnings that arise when a rule's action ends
with 'return', leaving the 'YY_BREAK' that follows it inaccessible.

1927 File: flex.info, Node: User Values, Next: Yacc, Prev: Misc Macros, Up: Top
1929 14 Values Available To the User
1930 *******************************
1932 This chapter summarizes the various values available to the user in the
1936 holds the text of the current token. It may be modified but not
1937 lengthened (you cannot append characters to the end).
1939 If the special directive '%array' appears in the first section of
1940 the scanner description, then 'yytext' is instead declared 'char
1941 yytext[YYLMAX]', where 'YYLMAX' is a macro definition that you can
1942 redefine in the first section if you don't like the default value
1943 (generally 8KB). Using '%array' results in somewhat slower
1944 scanners, but the value of 'yytext' becomes immune to calls to
1945 'unput()', which potentially destroy its value when 'yytext' is a
1946 character pointer. The opposite of '%array' is '%pointer', which
1949 You cannot use '%array' when generating C++ scanner classes (the
1953 holds the length of the current token.
1956 is the file which by default 'flex' reads from. It may be
1957 redefined but doing so only makes sense before scanning begins or
1958 after an EOF has been encountered. Changing it in the midst of
1959 scanning will have unexpected results since 'flex' buffers its
1960 input; use 'yyrestart()' instead. Once scanning terminates because
1961 an end-of-file has been seen, you can assign 'yyin' at the new
1962 input file and then call the scanner again to continue scanning.
1964 'void yyrestart( FILE *new_file )'
1965 may be called to point 'yyin' at the new input file. The
1966 switch-over to the new file is immediate (any previously
1967 buffered-up input is lost). Note that calling 'yyrestart()' with
1968 'yyin' as an argument thus throws away the current input buffer and
1969 continues scanning the same input file.
1972 is the file to which 'ECHO' actions are done. It can be reassigned
1976 returns a 'YY_BUFFER_STATE' handle to the current buffer.
1979 returns an integer value corresponding to the current start
1980 condition. You can subsequently use this value with 'BEGIN' to
1981 return to that start condition.
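As a minimal sketch of how 'yyin', 'yyrestart()', 'yytext' and 'yyleng' fit together, the following scanner reads each file named on the command line in turn. The word-printing rule and the 'main()' driver are illustrative assumptions, not part of 'flex' itself. '%option noyywrap' is used so that 'yylex()' simply returns at end-of-file.

```lex
%option noyywrap
%%
[a-zA-Z]+   { printf("word: %s (%d chars)\n", yytext, (int) yyleng); }
.|\n        { /* ignore everything else */ }
%%
int main(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i) {
        FILE *fp = fopen(argv[i], "r");
        if (fp == NULL) {
            perror(argv[i]);
            continue;
        }
        yyrestart(fp);   /* point yyin at the new file, discard old buffer */
        yylex();         /* returns 0 once end-of-file is reached */
        fclose(fp);
    }
    return 0;
}
```

Calling 'yyrestart()' rather than assigning 'yyin' directly keeps the example safe even if a previous file was abandoned before its end-of-file.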
1984 File: flex.info, Node: Yacc, Next: Scanner Options, Prev: User Values, Up: Top
1986 15 Interfacing with Yacc
1987 ************************
1989 One of the main uses of 'flex' is as a companion to the 'yacc'
1990 parser-generator. 'yacc' parsers expect to call a routine named
1991 'yylex()' to find the next input token. The routine is supposed to
1992 return the type of the next token as well as putting any associated
1993 value in the global 'yylval'. To use 'flex' with 'yacc', one specifies
1994 the '-d' option to 'yacc' to instruct it to generate the file 'y.tab.h'
1995 containing definitions of all the '%tokens' appearing in the 'yacc'
1996 input. This file is then included in the 'flex' scanner. For example,
1997 if one of the tokens is 'TOK_NUMBER', part of the scanner might look
2006 [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
2009 File: flex.info, Node: Scanner Options, Next: Performance, Prev: Yacc, Up: Top
2014 The various 'flex' options are categorized by function in the following
2015 menu. If you want to look up a particular option by name, *Note Index of
2020 * Options for Specifying Filenames::
2021 * Options Affecting Scanner Behavior::
2022 * Code-Level And API Options::
2023 * Options for Scanner Speed and Size::
2024 * Debugging Options::
2025 * Miscellaneous Options::
2027 Even though there are many scanner options, a typical scanner might
2028 only specify the following options:
2030 %option 8bit reentrant bison-bridge
2031 %option warn nodefault
2033 %option outfile="scanner.c" header-file="scanner.h"
2035 The first line specifies the general type of scanner we want. The
2036 second line specifies that we are being careful. The third line asks
2037 flex to track line numbers. The last line tells flex what to name the
2038 files. (The options can be specified in any order. We just divided
2041 'flex' also provides a mechanism for controlling options within the
2042 scanner specification itself, rather than from the flex command-line.
2043 This is done by including '%option' directives in the first section of
2044 the scanner specification. You can specify multiple options with a
2045 single '%option' directive, and multiple directives in the first section
2046 of your flex input file.
2048 Most options are given simply as names, optionally preceded by the
2049 word 'no' (with no intervening whitespace) to negate their meaning. The
2050 names are the same as their long-option equivalents (but without the
2053 'flex' scans your rule actions to determine whether you use the
2054 'REJECT' or 'yymore()' features. The 'REJECT' and 'yymore' options are
2055 available to override its decision as to whether you use these features,
2056 either by setting them (e.g., '%option reject') to indicate the feature
2057 is indeed used, or by unsetting them to indicate it is actually not used
2058 (e.g., '%option noyymore').
2060 A number of options are available for lint purists who want to
2061 suppress the appearance of unneeded routines in the generated scanner.
2062 Each of the following, if unset (e.g., '%option nounput'), results in
2063 the corresponding routine not appearing in the generated scanner:
2066 yy_push_state, yy_pop_state, yy_top_state
2067 yy_scan_buffer, yy_scan_bytes, yy_scan_string
2069 yyget_extra, yyset_extra, yyget_leng, yyget_text,
2070 yyget_lineno, yyset_lineno, yyget_in, yyset_in,
2071 yyget_out, yyset_out, yyget_lval, yyset_lval,
2072 yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
2074 (though 'yy_push_state()' and friends won't appear anyway unless you
2075 use '%option stack').
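For instance, a scanner whose actions never push text back or read characters directly might suppress the corresponding routines, and state explicitly that 'REJECT' and 'yymore()' are unused. This is only a sketch: the assumptions in the comments are about a hypothetical scanner and must hold for your own actions before you copy the directives.

```lex
/* Assumed: no action calls unput() or input(), so omit both routines. */
%option nounput noinput
/* Assumed: no action uses REJECT or yymore(); tell flex so explicitly. */
%option noreject noyymore
```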
2078 File: flex.info, Node: Options for Specifying Filenames, Next: Options Affecting Scanner Behavior, Prev: Scanner Options, Up: Scanner Options
2080 16.1 Options for Specifying Filenames
2081 =====================================
2083 '--header-file=FILE, '%option header-file="FILE"''
2084 instructs flex to write a C header to 'FILE'. This file contains
2085 function prototypes, extern variables, and types used by the
2086 scanner. Only the external API is exported by the header file.
2087 Many macros that are usable from within scanner actions are not
2088 exported to the header file. This is due to namespace problems and
2089 the goal of a clean external API.
2091 While in the header, the macro 'yyIN_HEADER' is defined, where 'yy'
2092 is substituted with the appropriate prefix.
2094 The '--header-file' option is not compatible with the '--c++'
2095 option, since the C++ scanner provides its own header in
2098 '-oFILE, --outfile=FILE, '%option outfile="FILE"''
2099 directs flex to write the scanner to the file 'FILE' instead of
2100 'lex.yy.c'. If you combine '--outfile' with the '--stdout' option,
2101 then the scanner is written to 'stdout' but its '#line' directives
2102 (see the '-L' option) refer to the file 'FILE'.
2104 '-t, --stdout, '%option stdout''
2105 instructs 'flex' to write the scanner it generates to standard
2106 output instead of 'lex.yy.c'.
2108 '-SFILE, --skel=FILE'
2109 overrides the default skeleton file from which 'flex' constructs
2110 its scanners. You'll never need this option unless you are doing
2111 'flex' maintenance or development.
2113 '--tables-file=FILE'
2114 writes serialized scanner DFA tables to 'FILE'. The generated scanner
2115 will not contain the tables, and requires them to be loaded at
2116 runtime. *Note serialization::.
2119 This option is for flex development. We document it here in case
2120 you stumble upon it by accident or in case you suspect some
2121 inconsistency in the serialized tables. Flex will serialize the
2122 scanner DFA tables but will also generate the in-code tables as it
2123 normally does. At runtime, the scanner will verify that the
2124 serialized tables match the in-code tables, instead of loading
2128 File: flex.info, Node: Options Affecting Scanner Behavior, Next: Code-Level And API Options, Prev: Options for Specifying Filenames, Up: Scanner Options
2130 16.2 Options Affecting Scanner Behavior
2131 =======================================
2133 '-i, --case-insensitive, '%option case-insensitive''
2134 instructs 'flex' to generate a "case-insensitive" scanner. The
2135 case of letters given in the 'flex' input patterns will be ignored,
2136 and tokens in the input will be matched regardless of case. The
2137 matched text given in 'yytext' will keep its original case (i.e.,
2138 it will not be folded). For tricky behavior, see *note case and
2141 '-l, --lex-compat, '%option lex-compat''
2142 turns on maximum compatibility with the original AT&T 'lex'
2143 implementation. Note that this does not mean _full_ compatibility.
2144 Use of this option costs a considerable amount of performance, and
2145 it cannot be used with the '--c++', '--full', '--fast', '-Cf', or
2146 '-CF' options. For details on the compatibilities it provides, see
2147 *note Lex and Posix::. This option also results in the name
2148 'YY_FLEX_LEX_COMPAT' being '#define''d in the generated scanner.
2150 '-B, --batch, '%option batch''
2151 instructs 'flex' to generate a "batch" scanner, the opposite of
2152 _interactive_ scanners generated by '--interactive' (see below).
2153 In general, you use '-B' when you are _certain_ that your scanner
2154 will never be used interactively, and you want to squeeze a
2155 _little_ more performance out of it. If your goal is instead to
2156 squeeze out a _lot_ more performance, you should be using the '-Cf'
2157 or '-CF' options, which turn on '--batch' automatically anyway.
2159 '-I, --interactive, '%option interactive''
2160 instructs 'flex' to generate an interactive scanner. An
2161 interactive scanner is one that only looks ahead to decide what
2162 token has been matched if it absolutely must. It turns out that
2163 always looking one extra character ahead, even if the scanner has
2164 already seen enough text to disambiguate the current token, is a
2165 bit faster than only looking ahead when necessary. But scanners
2166 that always look ahead give dreadful interactive performance; for
2167 example, when a user types a newline, it is not recognized as a
2168 newline token until they enter _another_ token, which often means
2169 typing in another whole line.
2171 'flex' scanners default to 'interactive' unless you use the '-Cf'
2172 or '-CF' table-compression options (*note Performance::). That's
2173 because if you're looking for high-performance you should be using
2174 one of these options, so if you didn't, 'flex' assumes you'd rather
2175 trade off a bit of run-time performance for intuitive interactive
2176 behavior. Note also that you _cannot_ use '--interactive' in
2177 conjunction with '-Cf' or '-CF'. Thus, this option is not really
2178 needed; it is on by default for all those cases in which it is
2181 You can force a scanner to _not_ be interactive by using '--batch'
2183 '-7, --7bit, '%option 7bit''
2184 instructs 'flex' to generate a 7-bit scanner, i.e., one which can
2185 only recognize 7-bit characters in its input. The advantage of
2186 using '--7bit' is that the scanner's tables can be up to half the
2187 size of those generated using '--8bit'. The disadvantage is
2188 that such scanners often hang or crash if their input contains an
2191 Note, however, that unless you generate your scanner using the
2192 '-Cf' or '-CF' table compression options, use of '--7bit' will save
2193 only a small amount of table space, and make your scanner
2194 considerably less portable. 'Flex''s default behavior is to
2195 generate an 8-bit scanner unless you use '-Cf' or '-CF', in
2196 which case 'flex' defaults to generating 7-bit scanners unless your
2197 site was always configured to generate 8-bit scanners (as will
2198 often be the case with non-USA sites). You can tell whether flex
2199 generated a 7-bit or an 8-bit scanner by inspecting the flag
2200 summary in the '--verbose' output as described below.
2202 Note that if you use '-Cfe' or '-CFe' 'flex' still defaults to
2203 generating an 8-bit scanner, since usually with these compression
2204 options full 8-bit tables are not much more expensive than 7-bit
2207 '-8, --8bit, '%option 8bit''
2208 instructs 'flex' to generate an 8-bit scanner, i.e., one which can
2209 recognize 8-bit characters. This flag is only needed for scanners
2210 generated using '-Cf' or '-CF', as otherwise flex defaults to
2211 generating an 8-bit scanner anyway.
2213 See the discussion of '--7bit' above for 'flex''s default behavior
2214 and the tradeoffs between 7-bit and 8-bit scanners.
2216 '--default, '%option default''
2217 generates the default rule.
2219 '--always-interactive, '%option always-interactive''
2220 instructs flex to generate a scanner which always considers its
2221 input _interactive_. Normally, on each new input file the scanner
2222 calls 'isatty()' in an attempt to determine whether the scanner's
2223 input source is interactive and thus should be read a character at
2224 a time. When this option is used, however, then no such call is
2227 '--never-interactive, '%option never-interactive''
2228 instructs flex to generate a scanner which never considers its
2229 input interactive. This is the opposite of 'always-interactive'.
2231 '-X, --posix, '%option posix''
2232 turns on maximum compatibility with the POSIX 1003.2-1992
2233 definition of 'lex'. Since 'flex' was originally designed to
2234 implement the POSIX definition of 'lex' this generally involves
2235 very few changes in behavior. At the current writing the known
2236 differences between 'flex' and the POSIX standard are:
2238 * In POSIX and AT&T 'lex', the repeat operator, '{}', has lower
2239 precedence than concatenation (thus 'ab{3}' yields 'ababab').
2240 Most POSIX utilities use an Extended Regular Expression (ERE)
2241 precedence that has the precedence of the repeat operator
2242 higher than concatenation (which causes 'ab{3}' to yield
2243 'abbb'). By default, 'flex' places the precedence of the
2244 repeat operator higher than concatenation which matches the
2245 ERE processing of other POSIX utilities. When either
2246 '--posix' or '-l' is specified, 'flex' will use the
2247 traditional AT&T and POSIX-compliant precedence for the repeat
2248 operator where concatenation has higher precedence than the
2251 '--stack, '%option stack''
2252 enables the use of start condition stacks (*note Start
2255 '--stdinit, '%option stdinit''
2256 if set (i.e., '%option stdinit'), initializes 'yyin' and 'yyout' to
2257 'stdin' and 'stdout', instead of the default of 'NULL'. Some
2258 existing 'lex' programs depend on this behavior, even though it is
2259 not compliant with ANSI C, which does not require 'stdin' and
2260 'stdout' to be compile-time constant. In a reentrant scanner,
2261 however, this is not a problem since initialization is performed in
2262 'yylex_init' at runtime.
2264 '--yylineno, '%option yylineno''
2265 directs 'flex' to generate a scanner that maintains the number of
2266 the current line read from its input in the global variable
2267 'yylineno'. This option is implied by '%option lex-compat'. In a
2268 reentrant C scanner, the macro 'yylineno' is accessible regardless
2269 of the value of '%option yylineno'; however, its value is not
2270 modified by 'flex' unless '%option yylineno' is enabled.
2272 '--yywrap, '%option yywrap''
2273 if unset (i.e., '--noyywrap'), makes the scanner not call
2274 'yywrap()' upon an end-of-file, but simply assume that there are no
2275 more files to scan (until the user points 'yyin' at a new file and
2276 calls 'yylex()' again).
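The repeat-operator precedence described under '--posix' above can be observed outside of 'flex' with the POSIX 'regcomp()' interface, which follows the ERE rules that 'flex' uses by default. The helper below is a sketch for experimentation only; the name 'ere_matches_whole' and the fixed buffer size are assumptions of this example.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if TEXT as a whole matches the POSIX extended regular
   expression PATTERN, 0 if it does not, and -1 if PATTERN is too long
   or fails to compile. */
int ere_matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that a partial match does not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t) n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With ERE precedence, 'ere_matches_whole("ab{3}", "abbb")' returns 1 and 'ere_matches_whole("ab{3}", "ababab")' returns 0. To repeat the whole string under these rules, the group must be written out, as in '(ab){3}'.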
2279 File: flex.info, Node: Code-Level And API Options, Next: Options for Scanner Speed and Size, Prev: Options Affecting Scanner Behavior, Up: Scanner Options
2281 16.3 Code-Level And API Options
2282 ===============================
2284 '--ansi-definitions, '%option ansi-definitions''
2287 '--ansi-prototypes, '%option ansi-prototypes''
2290 '--bison-bridge, '%option bison-bridge''
2291 instructs flex to generate a C scanner that is meant to be called
2292 by a 'GNU bison' parser. The scanner has minor API changes for
2293 'bison' compatibility. In particular, the declaration of 'yylex'
2294 is modified to take an additional parameter, 'yylval'. *Note Bison
2297 '--bison-locations, '%option bison-locations''
2298 instructs flex that 'GNU bison' '%locations' are being used. This
2299 means 'yylex' will be passed an additional parameter, 'yylloc'.
2300 This option implies '%option bison-bridge'. *Note Bison Bridge::.
2302 '-L, --noline, '%option noline''
2303 instructs 'flex' not to generate '#line' directives. Without this
2304 option, 'flex' peppers the generated scanner with '#line'
2305 directives so error messages in the actions will be correctly
2306 located with respect to either the original 'flex' input file (if
2307 the errors are due to code in the input file), or 'lex.yy.c' (if
2308 the errors are 'flex''s fault - you should report these sorts of
2309 errors to the email address given in *note Reporting Bugs::).
2311 '-R, --reentrant, '%option reentrant''
2312 instructs flex to generate a reentrant C scanner. The generated
2313 scanner may safely be used in a multi-threaded environment. The
2314 API for a reentrant scanner is different from that of a non-reentrant
2315 scanner (*note Reentrant::). Because of the API difference between
2316 reentrant and non-reentrant 'flex' scanners, non-reentrant flex
2317 code must be modified before it is suitable for use with this
2318 option. This option is not compatible with the '--c++' option.
2320 The option '--reentrant' does not affect the performance of the
2323 '-+, --c++, '%option c++''
2324 specifies that you want flex to generate a C++ scanner class.
2325 *Note Cxx::, for details.
2327 '--array, '%option array''
2328 specifies that you want yytext to be an array instead of a char*
2330 '--pointer, '%option pointer''
2331 specifies that 'yytext' should be a 'char *', not an array. This
2332 is the default.
2334 '-PPREFIX, --prefix=PREFIX, '%option prefix="PREFIX"''
2335 changes the default 'yy' prefix used by 'flex' for all
2336 globally-visible variable and function names to instead be
2337 'PREFIX'. For example, '--prefix=foo' changes the name of 'yytext'
2338 to 'footext'. It also changes the name of the default output file
2339 from 'lex.yy.c' to 'lex.foo.c'. Here is a partial list of the
2347 yy_load_buffer_state
2361 (If you are using a C++ scanner, then only 'yywrap' and
2362 'yyFlexLexer' are affected.) Within your scanner itself, you can
2363 still refer to the global variables and functions using either
2364 version of their name; but externally, they have the modified name.
2366 This option lets you easily link together multiple 'flex' programs
2367 into the same executable. Note, though, that using this option
2368 also renames 'yywrap()', so you now _must_ either provide your own
2369 (appropriately-named) version of the routine for your scanner, or
2370 use '%option noyywrap', as linking with '-lfl' no longer provides
2371 one for you by default.
2373 '--main, '%option main''
2374 directs flex to provide a default 'main()' program for the scanner,
2375 which simply calls 'yylex()'. This option implies 'noyywrap' (see
2378 '--nounistd, '%option nounistd''
2379 suppresses inclusion of the non-ANSI header file 'unistd.h'. This
2380 option is meant to target environments in which 'unistd.h' does not
2381 exist. Be aware that certain options may cause flex to generate
2382 code that relies on functions normally found in 'unistd.h' (e.g.,
2383 'isatty()', 'read()'). If you wish to use these functions, you
2384 will have to inform your compiler where to find them. *Note
2385 option-always-interactive::. *Note option-read::.
2387 '--yyclass=NAME, '%option yyclass="NAME"''
2388 only applies when generating a C++ scanner (the '--c++' option).
2389 It informs 'flex' that you have derived 'NAME' as a subclass of
2390 'yyFlexLexer', so 'flex' will place your actions in the member
2391 function 'NAME::yylex()' instead of 'yyFlexLexer::yylex()'. It also
2392 generates a 'yyFlexLexer::yylex()' member function that emits a
2393 run-time error (by invoking 'yyFlexLexer::LexerError()') if called.
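The effect of '%option prefix' on a C scanner can be sketched as follows. The prefix 'calc' and the rules are hypothetical; the point is only that the externally visible names become 'calclex' and 'calcin', and that 'noyywrap' avoids having to supply a 'calcwrap()' routine.

```lex
/* calc.l: a scanner whose external names start with "calc" */
%option prefix="calc" noyywrap
%%
[0-9]+    { printf("number: %s\n", yytext); }
.|\n      { /* ignore everything else */ }
%%
int main(void)
{
    /* Other source files must use the renamed symbols; inside the
       scanner file either spelling is accepted. */
    calcin = stdin;
    return calclex();
}
```

Unless '--outfile' is given, the generated scanner for this input is written to 'lex.calc.c' rather than 'lex.yy.c'.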
2397 File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options
2399 16.4 Options for Scanner Speed and Size
2400 =======================================
2403 controls the degree of table compression and, more generally,
2404 trade-offs between small scanners and fast scanners.
2407 A lone '-C' specifies that the scanner tables should be
2408 compressed but neither equivalence classes nor
2409 meta-equivalence classes should be used.
2411 '-Ca, --align, '%option align''
2412 ("align") instructs flex to trade off larger tables in the
2413 generated scanner for faster performance because the elements
2414 of the tables are better aligned for memory access and
2415 computation. On some RISC architectures, fetching and
2416 manipulating longwords is more efficient than with
2417 smaller-sized units such as shortwords. This option can
2418 quadruple the size of the tables used by your scanner.
2420 '-Ce, --ecs, '%option ecs''
2421 directs 'flex' to construct "equivalence classes", i.e., sets
2422 of characters which have identical lexical properties (for
2423 example, if the only appearance of digits in the 'flex' input
2424 is in the character class "[0-9]" then the digits '0', '1',
2425 ..., '9' will all be put in the same equivalence class).
2426 Equivalence classes usually give dramatic reductions in the
2427 final table/object file sizes (typically a factor of 2-5) and
2428 are pretty cheap performance-wise (one array look-up per
2432 specifies that the "full" scanner tables should be generated -
2433 'flex' should not compress the tables by taking advantage of
2434 similar transition functions for different states.
2437 specifies that the alternate fast scanner representation
2438 (described below under the '--fast' flag) should be used.
2439 This option cannot be used with '--c++'.
2441 '-Cm, --meta-ecs, '%option meta-ecs''
2442 directs 'flex' to construct "meta-equivalence classes", which
2443 are sets of equivalence classes (or characters, if equivalence
2444 classes are not being used) that are commonly used together.
2445 Meta-equivalence classes are often a big win when using
2446 compressed tables, but they have a moderate performance impact
2447 (one or two 'if' tests and one array look-up per character
2450 '-Cr, --read, '%option read''
2451 causes the generated scanner to _bypass_ use of the standard
2452 I/O library ('stdio') for input. Instead of calling 'fread()'
2453 or 'getc()', the scanner will use the 'read()' system call,
2454 resulting in a performance gain which varies from system to
2455 system, but in general is probably negligible unless you are
2456 also using '-Cf' or '-CF'. Using '-Cr' can cause strange
2457 behavior if, for example, you read from 'yyin' using 'stdio'
2458 prior to calling the scanner (because the scanner will miss
2459 whatever text your previous reads left in the 'stdio' input
2460 buffer). '-Cr' has no effect if you define 'YY_INPUT()'
2461 (*note Generated Scanner::).
2463 The options '-Cf' or '-CF' and '-Cm' do not make sense together -
2464 there is no opportunity for meta-equivalence classes if the table
2465 is not being compressed. Otherwise the options may be freely
2466 mixed, and are cumulative.
2468 The default setting is '-Cem', which specifies that 'flex' should
2469 generate equivalence classes and meta-equivalence classes. This
2470 setting provides the highest degree of table compression. You can
2471 obtain faster-executing scanners at the cost of larger tables
2472 with the following generally being true:
2484 Note that scanners with the smallest tables are usually generated
2485 and compiled the quickest, so during development you will usually
2486 want to use the default, maximal compression.
2488 '-Cfe' is often a good compromise between speed and size for
2489 production scanners.
2491 '-f, --full, '%option full''
2492 specifies "fast scanner". No table compression is done and 'stdio'
2493 is bypassed. The result is large but fast. This option is
2494 equivalent to '-Cfr'.
2496 '-F, --fast, '%option fast''
2497 specifies that the _fast_ scanner table representation should be
2498 used (and 'stdio' bypassed). This representation is about as fast
2499 as the full table representation ('--full'), and for some sets of
2500 patterns will be considerably smaller (and for others, larger). In
2501 general, if the pattern set contains both _keywords_ and a
2502 catch-all, _identifier_ rule, such as in the set:
2504 "case" return TOK_CASE;
2505 "switch" return TOK_SWITCH;
2507 "default" return TOK_DEFAULT;
2508 [a-z]+ return TOK_ID;
2510 then you're better off using the full table representation. If
2511 only the _identifier_ rule is present and you then use a hash table
2512 or some such to detect the keywords, you're better off using
2515 This option is equivalent to '-CFr'. It cannot be used with '--c++'.
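The "identifier rule plus keyword lookup" arrangement mentioned under '--fast' could look like the sketch below. It uses a sorted table with 'bsearch()' in place of a hash table. The token codes and the name 'classify_identifier' are assumptions of this example; real codes would normally come from the parser's header.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; normally supplied by the parser's header. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() requires it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) elem)->name);
}

/* Returns the keyword's token code, or TOK_ID if TEXT is not a keyword. */
int classify_identifier(const char *text)
{
    const struct keyword *kw =
        bsearch(text, keywords, sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);
    return kw != NULL ? kw->token : TOK_ID;
}
```

The scanner then needs only the catch-all rule, for example '[a-z]+ return classify_identifier( yytext );', and the keyword patterns disappear from the rule set.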
2519 File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options
2521 16.5 Debugging Options
2522 ======================
2524 '-b, --backup, '%option backup''
2525 generates backing-up information to 'lex.backup'. This is a list of
2526 scanner states which require backing up and the input characters on
2527 which they do so. By adding rules one can remove backing-up
2528 states. If _all_ backing-up states are eliminated and '-Cf' or
2529 '-CF' is used, the generated scanner will run faster (see the
2530 '--perf-report' flag). Only users who wish to squeeze every last
2531 cycle out of their scanners need worry about this option. (*note
2534 '-d, --debug, '%option debug''
2535 makes the generated scanner run in "debug" mode. Whenever a
2536 pattern is recognized and the global variable 'yy_flex_debug' is
2537 non-zero (which is the default), the scanner will write to 'stderr'
2540 -accepting rule at line 53 ("the matched text")
2542 The line number refers to the location of the rule in the file
2543 defining the scanner (i.e., the file that was fed to flex).
2544 Messages are also generated when the scanner backs up, accepts the
2545 default rule, reaches the end of its input buffer (or encounters a
2546 NUL; at this point, the two look the same as far as the scanner's
2547 concerned), or reaches an end-of-file.
2549 '-p, --perf-report, '%option perf-report''
2550 generates a performance report to 'stderr'. The report consists of
2551 comments regarding features of the 'flex' input file which will
2552 cause a serious loss of performance in the resulting scanner. If
2553 you give the flag twice, you will also get comments regarding
2554 features that lead to minor performance losses.
2556 Note that the use of 'REJECT' and variable trailing context (*note
2557 Limitations::) entails a substantial performance penalty; use of
2558 'yymore()', the '^' operator, and the '--interactive' flag entails
2559 minor performance penalties.
2561 '-s, --nodefault, '%option nodefault''
2562 causes the _default rule_ (that unmatched scanner input is echoed
2563 to 'stdout') to be suppressed. If the scanner encounters input
2564 that does not match any of its rules, it aborts with an error.
2565 This option is useful for finding holes in a scanner's rule set.
2567 '-T, --trace, '%option trace''
2568 makes 'flex' run in "trace" mode. It will generate a lot of
2569 messages to 'stderr' concerning the form of the input and the
2570 resultant non-deterministic and deterministic finite automata.
2571 This option is mostly for use in maintaining 'flex'.
2573 '-w, --nowarn, '%option nowarn''
2574 suppresses warning messages.
2576 '-v, --verbose, '%option verbose''
2577 specifies that 'flex' should write to 'stderr' a summary of
2578 statistics regarding the scanner it generates. Most of the
2579 statistics are meaningless to the casual 'flex' user, but the first
2580 line identifies the version of 'flex' (same as reported by
2581 '--version'), and the next line the flags used when generating the
2582 scanner, including those that are on by default.
2584 '--warn, '%option warn''
2585 warns about certain things. In particular, if the default rule can
2586 be matched but no default rule has been given, flex will warn
2587 you. We recommend always using this option.
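A small scanner combining several of the debugging options might look like the sketch below. The '-d' command-line convention in 'main()' is a hypothetical choice made for this example. Because the rules deliberately leave a hole (nothing matches letters, for instance), 'flex' should warn at generation time that the default rule can still be matched, and the generated scanner aborts if it meets such input.

```lex
%option debug warn nodefault noyywrap
%{
#include <string.h>
%}
%%
[0-9]+    { /* number */ }
[ \t\n]+  { /* whitespace */ }
%%
int main(int argc, char **argv)
{
    /* Hypothetical convention: trace only when "-d" is the first argument. */
    yy_flex_debug = (argc > 1 && strcmp(argv[1], "-d") == 0);
    return yylex();
}
```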
2590 File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options
2592 16.6 Miscellaneous Options
2593 ==========================
2596 A do-nothing option included for POSIX compliance.
2599 generates a "help" summary of 'flex''s options to 'stdout' and then
2603 Another do-nothing option included for POSIX compliance.
2606 prints the version number to 'stdout' and exits.
2609 File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top
2611 17 Performance Considerations
2612 *****************************
2614 The main design goal of 'flex' is that it generate high-performance
2615 scanners. It has been optimized for dealing well with large sets of
2616 rules. Aside from the effects on scanner speed of the table compression
2617 '-C' options outlined above, there are a number of options/actions which
2618 degrade performance. These are, from most expensive to least:
2621 arbitrary trailing context
2623 pattern sets that require backing up
2628 %option always-interactive
2630 ^ beginning-of-line operator
2633 with the first two being quite expensive and the last two being
2634 quite cheap. Note also that 'unput()' is implemented as a routine call
2635 that potentially does quite a bit of work, while 'yyless()' is a
2636 quite-cheap macro. So if you are just putting back some excess text you
2637 scanned, use 'yyless()'.
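For example, a rule that has matched one character too many can hand it back with 'yyless()'. The rules below are a sketch written for this illustration: the first rule matches digits followed by a period, keeps the digits, and returns the period to the input so that the second rule sees it.

```lex
%option noyywrap
%%
[0-9]+"."   {
              /* Keep the digits and return the "." to the input
                 with the cheap macro rather than unput(). */
              yyless(yyleng - 1);
              printf("integer: %s\n", yytext);
            }
"."         { printf("dot\n"); }
.|\n        { /* skip everything else */ }
%%
int main(void)
{
    return yylex();
}
```

After the call, 'yytext' and 'yyleng' describe only the part of the token that was kept.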
2639 'REJECT' should be avoided at all costs when performance is
2640 important. It is a particularly expensive option.
2642 There is one case when '%option yylineno' can be expensive. That is
2643 when your patterns match long tokens that could _possibly_ contain a
2644 newline character. There is no performance penalty for rules that
2645 cannot possibly match newlines, since flex does not need to check them for
2646 newlines. In general, you should avoid rules such as '[^f]+', which
2647 match very long tokens, including newlines, and may possibly match your
2648 entire file! A better approach is to separate '[^f]+' into two rules:
2655 The above scanner does not incur a performance penalty.
2657 Getting rid of backing up is messy and may often be an enormous
2658 amount of work for a complicated scanner. In principle, one begins by
2659 using the '-b' flag to generate a 'lex.backup' file. For example, on
2663 foo return TOK_KEYWORD;
2664 foobar return TOK_KEYWORD;
2666 the file looks like:
2668 State #6 is non-accepting -
2669 associated rule line numbers:
2671 out-transitions: [ o ]
2672 jam-transitions: EOF [ \001-n p-\177 ]
2674 State #8 is non-accepting -
2675 associated rule line numbers:
2677 out-transitions: [ a ]
2678 jam-transitions: EOF [ \001-` b-\177 ]
2680 State #9 is non-accepting -
2681 associated rule line numbers:
2683 out-transitions: [ r ]
2684 jam-transitions: EOF [ \001-q s-\177 ]
2686 Compressed tables always back up.
2688 The first few lines tell us that there's a scanner state in which it
2689 can make a transition on an 'o' but not on any other character, and that
2690 in that state the currently scanned text does not match any rule. The
2691 state occurs when trying to match the rules found at lines 2 and 3 in
2692 the input file. If the scanner is in that state and then reads
2693 something other than an 'o', it will have to back up to find a rule
2694 which is matched. With a bit of headscratching one can see that this
2695 must be the state it's in when it has seen 'fo'. When this has
2696 happened, if anything other than another 'o' is seen, the scanner will
2697 have to back up to simply match the 'f' (by the default rule).
2699 The comment regarding State #8 indicates there's a problem when
2700 'foob' has been scanned. Indeed, on any character other than an 'a',
2701 the scanner will have to back up to accept "foo". Similarly, the
2702 comment for State #9 concerns when 'fooba' has been scanned and an 'r'
2705 The final comment reminds us that there's no point going to all the
2706 trouble of removing backing up from the rules unless we're using '-Cf'
2707 or '-CF', since there's no performance gain doing so with compressed
2710 The way to remove the backing up is to add "error" rules:
2713 foo return TOK_KEYWORD;
2714 foobar return TOK_KEYWORD;
2719 /* false alarm, not really a keyword */
2723 Eliminating backing up among a list of keywords can also be done
2724 using a "catch-all" rule:
2727 foo return TOK_KEYWORD;
2728 foobar return TOK_KEYWORD;
2730 [a-z]+ return TOK_ID;
2732 This is usually the best solution when appropriate.
2734 Backing up messages tend to cascade. With a complicated set of rules
2735 it's not uncommon to get hundreds of messages. If one can decipher
2736 them, though, it often only takes a dozen or so rules to eliminate the
2737 backing up (though it's easy to make a mistake and have an error rule
2738 accidentally match a valid token). A possible future 'flex' feature will
2739 be to automatically add rules to eliminate backing up.
2741 It's important to keep in mind that you gain the benefits of
2742 eliminating backing up only if you eliminate _every_ instance of backing
2743 up. Leaving just one means you gain nothing.
2745 _Variable_ trailing context (where neither the leading nor the trailing
2746 part has a fixed length) entails almost the same performance
2747 loss as 'REJECT' (i.e., substantial). So when possible a rule like:
mouse|rat/(cat|dog)   run();
is better written:
mouse/cat|dog         run();
rat/cat|dog           run();
or as
mouse|rat/cat         run();
mouse|rat/dog         run();
2764 Note that here the special '|' action does _not_ provide any savings,
2765 and can even make things worse (*note Limitations::).
2767 Another area where the user can increase a scanner's performance (and
2768 one that's easier to implement) arises from the fact that the longer the
2769 tokens matched, the faster the scanner will run. This is because with
2770 long tokens the processing of most input characters takes place in the
2771 (short) inner scanning loop, and does not often have to go through the
2772 additional work of setting up the scanning environment (e.g., 'yytext')
2773 for the action. Recall the scanner for C comments:
2779 "/*" BEGIN(comment);
2782 <comment>"*"+[^*/\n]*
2783 <comment>\n ++line_num;
2784 <comment>"*"+"/" BEGIN(INITIAL);
2786 This could be sped up by writing it as:
2792 "/*" BEGIN(comment);
2795 <comment>[^*\n]*\n ++line_num;
2796 <comment>"*"+[^*/\n]*
2797 <comment>"*"+[^*/\n]*\n ++line_num;
2798 <comment>"*"+"/" BEGIN(INITIAL);
2800 Now instead of each newline requiring the processing of another
2801 action, recognizing the newlines is distributed over the other rules to
2802 keep the matched text as long as possible. Note that _adding_ rules
2803 does _not_ slow down the scanner! The speed of the scanner is
2804 independent of the number of rules or (modulo the considerations given
2805 at the beginning of this section) how complicated the rules are with
2806 regard to operators such as '*' and '|'.
2808 A final example in speeding up a scanner: suppose you want to scan
2809 through a file containing identifiers and keywords, one per line and
2810 with no other extraneous characters, and recognize all the keywords. A
2811 natural first approach is:
2819 while /* it's a keyword */
2821 .|\n /* it's not a keyword */
To eliminate the backing up, introduce a catch-all rule:
2831 while /* it's a keyword */
2834 .|\n /* it's not a keyword */
2836 Now, if it's guaranteed that there's exactly one word per line, then
2837 we can reduce the total number of matches by a half by merging in the
2838 recognition of newlines with that of the other tokens:
2846 while\n /* it's a keyword */
2849 .|\n /* it's not a keyword */
One has to be careful here, as we have now reintroduced backing up
into the scanner. In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
'flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like 'auto' and then the next
character is something other than a newline or a letter. Previously it
would then just match the 'auto' rule and be done, but now it has no
'auto' rule, only an 'auto\n' rule. To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:
2870 while\n /* it's a keyword */
2874 .|\n /* it's not a keyword */
2876 Compiled with '-Cf', this is about as fast as one can get a 'flex'
2877 scanner to go for this particular problem.
A final note: 'flex' is slow when matching 'NUL's, particularly when
a token contains multiple 'NUL's. It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include 'NUL's.
2884 Another final note regarding performance: as mentioned in *note
2885 Matching::, dynamically resizing 'yytext' to accommodate huge tokens is
2886 a slow process because it presently requires that the (huge) token be
2887 rescanned from the beginning. Thus if performance is vital, you should
2888 attempt to match "large" quantities of text but not "huge" quantities,
2889 where the cutoff between the two is at about 8K characters per token.
2892 File: flex.info, Node: Cxx, Next: Reentrant, Prev: Performance, Up: Top
2894 18 Generating C++ Scanners
2895 **************************
2897 *IMPORTANT*: the present form of the scanning class is _experimental_
2898 and may change considerably between major releases.
'flex' provides two different ways to generate scanners for use with
C++. The first way is to simply compile a scanner generated by 'flex'
using a C++ compiler instead of a C compiler. You should not encounter
any compilation errors (*note Reporting Bugs::). You can then use C++
code in your rule actions instead of C code. Note that the default
input source for your scanner remains 'yyin', and default echoing is
still done to 'yyout'. Both of these remain 'FILE *' variables and not
C++ _streams_.
You can also use 'flex' to generate a C++ scanner class, using the
'-+' option (or, equivalently, '%option c++'), which is automatically
specified if the name of the 'flex' executable ends in a '+', such as
'flex++'. When using this option, 'flex' defaults to generating the
scanner to the file 'lex.yy.cc' instead of 'lex.yy.c'. The generated
scanner includes the header file 'FlexLexer.h', which defines the
interface to two C++ classes.
2917 The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
2918 base class defining the general scanner class interface. It provides
2919 the following member functions:
'const char* YYText()'
returns the text of the most recently matched token, the equivalent
of 'yytext'.
'int YYLeng()'
returns the length of the most recently matched token, the
equivalent of 'yyleng'.
'int lineno() const'
returns the current input line number (see '%option yylineno'), or
'1' if '%option yylineno' was not used.
'void set_debug( int flag )'
sets the debugging flag for the scanner, equivalent to assigning to
'yy_flex_debug' (*note Scanner Options::). Note that you must
build the scanner using '%option debug' to include debugging
information in it.
'int debug() const'
returns the current setting of the debugging flag.
Also provided are member functions equivalent to
'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
is an 'istream&' object reference and not a 'FILE*'),
'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
first argument is an 'istream&' object reference).
The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
derived from 'FlexLexer'. It defines the following additional member
functions:
2952 'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
2953 'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
constructs a 'yyFlexLexer' object using the given streams for input
and output. If not specified, the streams default to 'cin' and
'cout', respectively. 'yyFlexLexer' does not take ownership of its
stream arguments. It's up to the user to ensure the streams
pointed to remain alive at least as long as the 'yyFlexLexer'
instance.
2961 'virtual int yylex()'
performs the same role as 'yylex()' does for ordinary 'flex'
scanners: it scans the input stream, consuming tokens, until a
rule's action returns a value. If you derive a subclass 'S' from
'yyFlexLexer' and want to access the member functions and variables
of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
to inform 'flex' that you will be using that subclass instead of
'yyFlexLexer'. In this case, rather than generating
'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
generates a dummy 'yyFlexLexer::yylex()' that calls
'yyFlexLexer::LexerError()' if called).
2973 'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
2974 'virtual void switch_streams(istream& new_in, ostream& new_out)'
reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
(if non-null), deleting the previous input buffer if 'yyin' is
reassigned.
2979 'int yylex( istream* new_in, ostream* new_out = 0 )'
2980 'int yylex( istream& new_in, ostream& new_out )'
2981 first switches the input streams via 'switch_streams( new_in,
2982 new_out )' and then returns the value of 'yylex()'.
In addition, 'yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:
2988 'virtual int LexerInput( char* buf, int max_size )'
2989 reads up to 'max_size' characters into 'buf' and returns the number
2990 of characters read. To indicate end-of-input, return 0 characters.
2991 Note that 'interactive' scanners (see the '-B' and '-I' flags in
2992 *note Scanner Options::) define the macro 'YY_INTERACTIVE'. If you
2993 redefine 'LexerInput()' and need to take different actions
2994 depending on whether or not the scanner might be scanning an
2995 interactive input source, you can test for the presence of this
2996 name via '#ifdef' statements.
2998 'virtual void LexerOutput( const char* buf, int size )'
2999 writes out 'size' characters from the buffer 'buf', which, while
3000 'NUL'-terminated, may also contain internal 'NUL's if the scanner's
3001 rules can match text with 'NUL's in them.
3003 'virtual void LexerError( const char* msg )'
3004 reports a fatal error message. The default version of this
3005 function writes the message to the stream 'cerr' and exits.
Note that a 'yyFlexLexer' object contains its _entire_ scanning
state. Thus you can use such objects to create reentrant scanners, but
see also *note Reentrant::. You can instantiate multiple instances of
the same 'yyFlexLexer' class, and you can also combine multiple C++
scanner classes together in the same program using the '-P' option.
3014 Finally, note that the '%array' feature is not available to C++
3015 scanner classes; you must use '%pointer' (the default).
3017 Here is an example of a simple C++ scanner:
3019 // An example of using the flex C++ scanner class.
3023 using namespace std;
3027 %option noyywrap c++
3035 name ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
3036 num1 [-+]?{dig}+\.?([eE][-+]?{dig}+)?
3037 num2 [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
3038 number {num1}|{num2}
3042 {ws} /* skip blanks and tabs */
3047 while((c = yyinput()) != 0)
3054 if((c = yyinput()) == '/')
3062 {number} cout << "number " << YYText() << '\n';
3066 {name} cout << "name " << YYText() << '\n';
3068 {string} cout << "string " << YYText() << '\n';
// This include is required if main() is in another source file.
3073 //#include <FlexLexer.h>
3075 int main( int /* argc */, char** /* argv */ )
3077 FlexLexer* lexer = new yyFlexLexer;
3078 while(lexer->yylex() != 0)
3083 If you want to create multiple (different) lexer classes, you use the
3084 '-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
3085 other 'xxFlexLexer'. You then can include '<FlexLexer.h>' in your other
3086 sources once per lexer class, first renaming 'yyFlexLexer' as follows:
3089 #define yyFlexLexer xxFlexLexer
3090 #include <FlexLexer.h>
3093 #define yyFlexLexer zzFlexLexer
3094 #include <FlexLexer.h>
3096 if, for example, you used '%option prefix="xx"' for one of your
3097 scanners and '%option prefix="zz"' for the other.
3100 File: flex.info, Node: Reentrant, Next: Lex and Posix, Prev: Cxx, Up: Top
3102 19 Reentrant C Scanners
3103 ***********************
'flex' has the ability to generate a reentrant C scanner. This is
accomplished by specifying '%option reentrant' ('-R'). The generated
scanner is both portable and safe to use in one or more separate
threads of control. The most common use for reentrant scanners is from
within multi-threaded applications. Any thread may create and execute a
reentrant 'flex' scanner without the need for synchronization with other
threads.
3116 * Reentrant Overview::
3117 * Reentrant Example::
3118 * Reentrant Detail::
3119 * Reentrant Functions::
3122 File: flex.info, Node: Reentrant Uses, Next: Reentrant Overview, Prev: Reentrant, Up: Reentrant
3124 19.1 Uses for Reentrant Scanners
3125 ================================
3127 However, there are other uses for a reentrant scanner. For example, you
3128 could scan two or more files simultaneously to implement a 'diff' at the
3129 token level (i.e., instead of at the character level):
3131 /* Example of maintaining more than one active scanner. */
3136 tok1 = yylex( scanner_1 );
3137 tok2 = yylex( scanner_2 );
3140 printf("Files are different.");
3142 } while ( tok1 && tok2 );
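The essential property used here is that each scanner keeps its state
in its own object, so two of them can be advanced in lockstep. The
following self-contained C sketch shows that idea with a tiny
hand-written tokenizer standing in for the 'flex'-generated 'yylex'; the
names 'word_scanner', 'next_token' and 'same_token_stream' are
hypothetical and not part of the 'flex' API.

```c
#include <ctype.h>

enum { TOK_EOF = 0, TOK_WORD = 1, TOK_NUMBER = 2 };

/* Hypothetical stand-in for a scanner handle: all state lives here,
   none of it in global variables. */
typedef struct {
    const char *cursor;   /* next unread character */
} word_scanner;

/* Returns the kind of the next token and advances past it. */
int next_token(word_scanner *sc)
{
    while (isspace((unsigned char)*sc->cursor))
        sc->cursor++;
    if (*sc->cursor == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*sc->cursor)) {
        while (isdigit((unsigned char)*sc->cursor))
            sc->cursor++;
        return TOK_NUMBER;
    }
    while (*sc->cursor != '\0' &&
           !isspace((unsigned char)*sc->cursor) &&
           !isdigit((unsigned char)*sc->cursor))
        sc->cursor++;
    return TOK_WORD;
}

/* Returns 1 if both texts yield the same sequence of token kinds. */
int same_token_stream(const char *text_1, const char *text_2)
{
    word_scanner scanner_1 = { text_1 };
    word_scanner scanner_2 = { text_2 };
    int tok1, tok2;

    do {
        tok1 = next_token(&scanner_1);
        tok2 = next_token(&scanner_2);
        if (tok1 != tok2)
            return 0;              /* the inputs differ */
    } while (tok1 != TOK_EOF);
    return 1;
}
```

Because the comparison is made on token codes, differences in the
amount of whitespace do not count as differences, which is the point of
a token-level 'diff'. Like the loop above, the sketch compares only the
token codes and not the matched text.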
3144 Another use for a reentrant scanner is recursion. (Note that a
3145 recursive scanner can also be created using a non-reentrant scanner and
3146 buffer states. *Note Multiple Input Buffers::.)
3148 The following crude scanner supports the 'eval' command by invoking
3149 another instance of itself.
3151 /* Example of recursive invocation. */
3158 YY_BUFFER_STATE buf;
3160 yylex_init( &scanner );
3161 yytext[yyleng-1] = ' ';
3163 buf = yy_scan_string( yytext + 5, scanner );
3166 yy_delete_buffer(buf,scanner);
3167 yylex_destroy( scanner );
3173 File: flex.info, Node: Reentrant Overview, Next: Reentrant Example, Prev: Reentrant Uses, Up: Reentrant
3175 19.2 An Overview of the Reentrant API
3176 =====================================
3178 The API for reentrant scanners is different than for non-reentrant
3179 scanners. Here is a quick overview of the API:
* '%option reentrant' must be specified.
* All functions take one additional argument: 'yyscanner'.
* All global variables are replaced by their macro equivalents. (We
tell you this because it may be important to you during debugging.)
* 'yylex_init' and 'yylex_destroy' must be called before and after
'yylex', respectively.
* Accessor methods (get/set functions) provide access to common
'flex' variables.
* User-specific data can be stored in 'yyextra'.
3197 File: flex.info, Node: Reentrant Example, Next: Reentrant Detail, Prev: Reentrant Overview, Up: Reentrant
3199 19.3 Reentrant Example
3200 ======================
3202 First, an example of a reentrant scanner:
3203 /* This scanner prints "//" comments. */
3205 %option reentrant stack noyywrap
3210 "//" yy_push_state( COMMENT, yyscanner);
3213 <COMMENT>\n yy_pop_state( yyscanner );
3214 <COMMENT>[^\n]+ fprintf( yyout, "%s\n", yytext);
3218 int main ( int argc, char * argv[] )
3222 yylex_init ( &scanner );
3224 yylex_destroy ( scanner );
3229 File: flex.info, Node: Reentrant Detail, Next: Reentrant Functions, Prev: Reentrant Example, Up: Reentrant
3231 19.4 The Reentrant API in Detail
3232 ================================
Here are the things you need to do or know to use the reentrant C API of
'flex'.
3239 * Specify Reentrant::
3240 * Extra Reentrant Argument::
3241 * Global Replacement::
3242 * Init and Destroy Functions::
3243 * Accessor Methods::
3248 File: flex.info, Node: Specify Reentrant, Next: Extra Reentrant Argument, Prev: Reentrant Detail, Up: Reentrant Detail
3250 19.4.1 Declaring a Scanner As Reentrant
3251 ---------------------------------------
'%option reentrant' ('--reentrant') must be specified.
Notice that '%option reentrant' is specified in the above example
(*note Reentrant Example::). Had this option not been specified, 'flex'
would have happily generated a non-reentrant scanner without
complaining. You may explicitly specify '%option noreentrant', if you
do _not_ want a reentrant scanner, although it is not necessary. The
default is to generate a non-reentrant scanner.
3263 File: flex.info, Node: Extra Reentrant Argument, Next: Global Replacement, Prev: Specify Reentrant, Up: Reentrant Detail
3265 19.4.2 The Extra Argument
3266 -------------------------
3268 All functions take one additional argument: 'yyscanner'.
Notice that the calls to 'yy_push_state' and 'yy_pop_state' both have
an argument, 'yyscanner', that is not present in a non-reentrant
scanner. Here are the declarations of 'yy_push_state' and
'yy_pop_state' in the reentrant scanner:
3275 static void yy_push_state ( int new_state , yyscan_t yyscanner ) ;
3276 static void yy_pop_state ( yyscan_t yyscanner ) ;
3278 Notice that the argument 'yyscanner' appears in the declaration of
3279 both functions. In fact, all 'flex' functions in a reentrant scanner
3280 have this additional argument. It is always the last argument in the
3281 argument list, it is always of type 'yyscan_t' (which is typedef'd to
3282 'void *') and it is always named 'yyscanner'. As you may have guessed,
3283 'yyscanner' is a pointer to an opaque data structure encapsulating the
3284 current state of the scanner. For a list of function declarations, see
3285 *note Reentrant Functions::. Note that preprocessor macros, such as
3286 'BEGIN', 'ECHO', and 'REJECT', do not take this additional argument.
3289 File: flex.info, Node: Global Replacement, Next: Init and Destroy Functions, Prev: Extra Reentrant Argument, Up: Reentrant Detail
3291 19.4.3 Global Variables Replaced By Macros
3292 ------------------------------------------
All global variables in traditional flex have been replaced by macro
equivalents.
Note that in the above example, 'yyout' and 'yytext' are not plain
variables. These are macros that will expand to their equivalent
lvalue. All of the familiar 'flex' globals have been replaced by their
macro equivalents. In particular, 'yytext', 'yyleng', 'yylineno',
'yyin', 'yyout', 'yyextra', 'yylval', and 'yylloc' are macros. You may
safely use these macros in actions as if they were plain variables. We
only tell you this so you don't expect to link to these variables
externally. Currently, each macro expands to a member of an internal
struct, e.g.,
3307 #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)
One important thing to remember about 'yytext' and friends is that,
because 'yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions. You must use an accessor method, e.g., 'yyget_text', to
accomplish this.
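The mechanism can be imitated in a few lines of plain C. The sketch
below is not 'flex' code: 'demo_scan_t', 'struct demo_guts',
'demo_text', 'demo_leng' and the functions are hypothetical names that
mirror 'yyscan_t', the internal struct, the 'yytext' and 'yyleng'
macros, and accessors such as 'yyget_text'. It shows why the macros
work only where a handle of the expected name is in scope, and why other
code has to go through an accessor.

```c
typedef void *demo_scan_t;      /* opaque handle, in the spirit of yyscan_t */

struct demo_guts {              /* hypothetical internal state */
    char *text_r;               /* current token text */
    int   leng_r;               /* its length */
};

/* These expand to lvalues, but only where a variable named
   `scanner` holding the handle is in scope. */
#define demo_text (((struct demo_guts *)scanner)->text_r)
#define demo_leng (((struct demo_guts *)scanner)->leng_r)

/* Plays the role of an action: uses the macros like plain variables
   to drop the last character of the token. */
void demo_chop(demo_scan_t scanner)
{
    if (demo_leng > 0) {
        demo_leng = demo_leng - 1;
        demo_text[demo_leng] = '\0';
    }
}

/* Accessors for code that has a handle but cannot use the macros. */
char *demo_get_text(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->text_r;
}

int demo_get_leng(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->leng_r;
}
```

Code in another source file sees only the opaque handle and the
accessor prototypes, so the layout of the struct can change without
affecting it.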
3316 File: flex.info, Node: Init and Destroy Functions, Next: Accessor Methods, Prev: Global Replacement, Up: Reentrant Detail
3318 19.4.4 Init and Destroy Functions
3319 ---------------------------------
3321 'yylex_init' and 'yylex_destroy' must be called before and after
3322 'yylex', respectively.
3324 int yylex_init ( yyscan_t * ptr_yy_globals ) ;
3325 int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
3326 int yylex ( yyscan_t yyscanner ) ;
3327 int yylex_destroy ( yyscan_t yyscanner ) ;
3329 The function 'yylex_init' must be called before calling any other
3330 function. The argument to 'yylex_init' is the address of an
3331 uninitialized pointer to be filled in by 'yylex_init', overwriting any
3332 previous contents. The function 'yylex_init_extra' may be used instead,
3333 taking as its first argument a variable of type 'YY_EXTRA_TYPE'. See
3334 the section on yyextra, below, for more details.
3336 The value stored in 'ptr_yy_globals' should thereafter be passed to
3337 'yylex' and 'yylex_destroy'. Flex does not save the argument passed to
3338 'yylex_init', so it is safe to pass the address of a local pointer to
3339 'yylex_init' so long as it remains in scope for the duration of all
3340 calls to the scanner, up to and including the call to 'yylex_destroy'.
3342 The function 'yylex' should be familiar to you by now. The reentrant
3343 version takes one argument, which is the value returned (via an
3344 argument) by 'yylex_init'. Otherwise, it behaves the same as the
3345 non-reentrant version of 'yylex'.
Both 'yylex_init' and 'yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case 'errno' is set to one of the
following values:
3351 * ENOMEM Memory allocation error. *Note memory-management::.
3352 * EINVAL Invalid argument.
3354 The function 'yylex_destroy' should be called to free resources used
3355 by the scanner. After 'yylex_destroy' is called, the contents of
3356 'yyscanner' should not be used. Of course, there is no need to destroy
3357 a scanner if you plan to reuse it. A 'flex' scanner (both reentrant and
3358 non-reentrant) may be restarted by calling 'yyrestart'.
3360 Below is an example of a program that creates a scanner, uses it,
3361 then destroys it when done:
3368 yylex_init(&scanner);
3370 while ((tok=yylex(scanner)) > 0)
3371 printf("tok=%d yytext=%s\n", tok, yyget_text(scanner));
3373 yylex_destroy(scanner);
3378 File: flex.info, Node: Accessor Methods, Next: Extra Data, Prev: Init and Destroy Functions, Up: Reentrant Detail
3380 19.4.5 Accessing Variables with Reentrant Scanners
3381 --------------------------------------------------
Accessor methods (get/set functions) provide access to common 'flex'
variables.
3386 Many scanners that you build will be part of a larger project.
3387 Portions of your project will need access to 'flex' values, such as
3388 'yytext'. In a non-reentrant scanner, these values are global, so there
3389 is no problem accessing them. However, in a reentrant scanner, there
3390 are no global 'flex' values. You can not access them directly.
3391 Instead, you must access 'flex' values using accessor methods (get/set
3392 functions). Each accessor method is named 'yyget_NAME' or 'yyset_NAME',
3393 where 'NAME' is the name of the 'flex' variable you want. For example:
/* Set the last character of yytext to NUL. */
3396 void chop ( yyscan_t scanner )
3398 int len = yyget_leng( scanner );
3399 yyget_text( scanner )[len - 1] = '\0';
3402 The above code may be called from within an action like this:
3405 .+\n { chop( yyscanner );}
3407 You may find that '%option header-file' is particularly useful for
3408 generating prototypes of all the accessor functions. *Note
3412 File: flex.info, Node: Extra Data, Next: About yyscan_t, Prev: Accessor Methods, Up: Reentrant Detail
3417 User-specific data can be stored in 'yyextra'.
3419 In a reentrant scanner, it is unwise to use global variables to
3420 communicate with or maintain state between different pieces of your
3421 program. However, you may need access to external data or invoke
3422 external functions from within the scanner actions. Likewise, you may
3423 need to pass information to your scanner (e.g., open file descriptors,
3424 or database connections). In a non-reentrant scanner, the only way to
3425 do this would be through the use of global variables. 'Flex' allows you
3426 to store arbitrary, "extra" data in a scanner. This data is accessible
3427 through the accessor methods 'yyget_extra' and 'yyset_extra' from
3428 outside the scanner, and through the shortcut macro 'yyextra' from
3429 within the scanner itself. They are defined as follows:
3431 #define YY_EXTRA_TYPE void*
3432 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
3433 void yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
In addition, an extra form of 'yylex_init' is provided,
'yylex_init_extra'. This function is provided so that the 'yyextra'
value can be accessed from within the very first 'yyalloc', used to
allocate the scanner itself.
By default, 'YY_EXTRA_TYPE' is defined as type 'void *'. You may
redefine this type using '%option extra-type="your_type"' in the
scanner:
3444 /* An example of overriding YY_EXTRA_TYPE. */
3446 #include <sys/stat.h>
3450 %option extra-type="struct stat *"
__filesize__     printf( "%ld", (long) yyextra->st_size  );
__lastmod__      printf( "%ld", (long) yyextra->st_mtime );
3456 void scan_file( char* filename )
3462 in = fopen( filename, "r" );
3463 stat( filename, &buf );
yylex_init_extra( &buf, &scanner );  /* extra-type is "struct stat *" */
3466 yyset_in( in, scanner );
3468 yylex_destroy( scanner );
3474 File: flex.info, Node: About yyscan_t, Prev: Extra Data, Up: Reentrant Detail
3476 19.4.7 About yyscan_t
3477 ---------------------
3479 'yyscan_t' is defined as:
3481 typedef void* yyscan_t;
It is initialized by 'yylex_init()' to point to an internal
structure. You should never access this value directly. In particular,
you should never attempt to free it (use 'yylex_destroy()' instead).
3488 File: flex.info, Node: Reentrant Functions, Prev: Reentrant Detail, Up: Reentrant
3490 19.5 Functions and Macros Available in Reentrant C Scanners
3491 ===========================================================
The following functions are available in a reentrant scanner:
3495 char *yyget_text ( yyscan_t scanner );
3496 int yyget_leng ( yyscan_t scanner );
3497 FILE *yyget_in ( yyscan_t scanner );
3498 FILE *yyget_out ( yyscan_t scanner );
3499 int yyget_lineno ( yyscan_t scanner );
3500 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
3501 int yyget_debug ( yyscan_t scanner );
3503 void yyset_debug ( int flag, yyscan_t scanner );
3504 void yyset_in ( FILE * in_str , yyscan_t scanner );
3505 void yyset_out ( FILE * out_str , yyscan_t scanner );
3506 void yyset_lineno ( int line_number , yyscan_t scanner );
3507 void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
There are no "set" functions for yytext and yyleng. This is
intentional.
3512 The following Macro shortcuts are available in actions in a reentrant
3523 In a reentrant C scanner, support for yylineno is always present
3524 (i.e., you may access yylineno), but the value is never modified by
3525 'flex' unless '%option yylineno' is enabled. This is to allow the user
3526 to maintain the line count independently of 'flex'.
3528 The following functions and macros are made available when '%option
3529 bison-bridge' ('--bison-bridge') is specified:
3531 YYSTYPE * yyget_lval ( yyscan_t scanner );
3532 void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
3535 The following functions and macros are made available when '%option
3536 bison-locations' ('--bison-locations') is specified:
3538 YYLTYPE *yyget_lloc ( yyscan_t scanner );
3539 void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
Support for yylval assumes that 'YYSTYPE' is a valid type. Support
for yylloc assumes that 'YYLTYPE' is a valid type. Typically, these
types are generated by 'bison', and are included in section 1 of the
'flex' input.
3548 File: flex.info, Node: Lex and Posix, Next: Memory Management, Prev: Reentrant, Up: Top
3550 20 Incompatibilities with Lex and Posix
3551 ***************************************
3553 'flex' is a rewrite of the AT&T Unix _lex_ tool (the two implementations
3554 do not share any code, though), with some extensions and
3555 incompatibilities, both of which are of concern to those who wish to
3556 write scanners acceptable to both implementations. 'flex' is fully
3557 compliant with the POSIX 'lex' specification, except that when using
3558 '%pointer' (the default), a call to 'unput()' destroys the contents of
3559 'yytext', which is counter to the POSIX specification. In this section
3560 we discuss all of the known areas of incompatibility between 'flex',
3561 AT&T 'lex', and the POSIX specification. 'flex''s '-l' option turns on
3562 maximum compatibility with the original AT&T 'lex' implementation, at
3563 the cost of a major loss in the generated scanner's performance. We
note below which incompatibilities can be overcome using the '-l'
option. 'flex' is fully compatible with 'lex' with the following
exceptions:
3568 * The undocumented 'lex' scanner internal variable 'yylineno' is not
3569 supported unless '-l' or '%option yylineno' is used.
3571 * 'yylineno' should be maintained on a per-buffer basis, rather than
3572 a per-scanner (single global variable) basis.
3574 * 'yylineno' is not part of the POSIX specification.
* The 'input()' routine is not redefinable, though it may be called
to read characters following whatever has been matched by a rule.
If 'input()' encounters an end-of-file the normal 'yywrap()'
processing is done. A "real" end-of-file is returned by 'input()'
as 'EOF'.
3582 * Input is instead controlled by defining the 'YY_INPUT()' macro.
3584 * The 'flex' restriction that 'input()' cannot be redefined is in
3585 accordance with the POSIX specification, which simply does not
3586 specify any way of controlling the scanner's input other than by
3587 making an initial assignment to 'yyin'.
3589 * The 'unput()' routine is not redefinable. This restriction is in
3590 accordance with POSIX.
3592 * 'flex' scanners are not as reentrant as 'lex' scanners. In
3593 particular, if you have an interactive scanner and an interrupt
3594 handler which long-jumps out of the scanner, and the scanner is
3595 subsequently called again, you may get the following message:
3597 fatal flex scanner internal error--end of buffer missed
3599 To reenter the scanner, first use:
3603 Note that this call will throw away any buffered input; usually
3604 this isn't a problem with an interactive scanner. *Note
3605 Reentrant::, for 'flex''s reentrant API.
3607 * Also note that 'flex' C++ scanner classes _are_ reentrant, so if
3608 using C++ is an option for you, you should use them instead. *Note
3609 Cxx::, and *note Reentrant:: for details.
* 'output()' is not supported. Output from the ECHO macro is done to
the file-pointer 'yyout' (default 'stdout').
3614 * 'output()' is not part of the POSIX specification.
3616 * 'lex' does not support exclusive start conditions (%x), though they
3617 are in the POSIX specification.
3619 * When definitions are expanded, 'flex' encloses them in parentheses.
3620 With 'lex', the following:
3624 foo{NAME}? printf( "Found it\n" );
3627 will not match the string 'foo' because when the macro is expanded
3628 the rule is equivalent to 'foo[A-Z][A-Z0-9]*?' and the precedence
3629 is such that the '?' is associated with '[A-Z0-9]*'. With 'flex',
3630 the rule will be expanded to 'foo([A-Z][A-Z0-9]*)?' and so the
3631 string 'foo' will match.
* Note that if the definition begins with '^' or ends with '$' then
it is _not_ expanded with parentheses, to allow these operators to
appear in definitions without losing their special meanings. But
the '<s>', '/', and '<<EOF>>' operators cannot be used in a 'flex'
definition.
* Using '-l' results in the 'lex' behavior of no parentheses around
the definition.
* The POSIX specification is that the definition be enclosed in
parentheses.
3645 * Some implementations of 'lex' allow a rule's action to begin on a
3646 separate line, if the rule's pattern has trailing whitespace:
3652 'flex' does not support this feature.
3654 * The 'lex' '%r' (generate a Ratfor scanner) option is not supported.
3655 It is not part of the POSIX specification.
3657 * After a call to 'unput()', _yytext_ is undefined until the next
3658 token is matched, unless the scanner was built using '%array'.
3659 This is not the case with 'lex' or the POSIX specification. The
3660 '-l' option does away with this incompatibility.
* The precedence of the '{,}' (numeric range) operator is different.
The AT&T and POSIX specifications of 'lex' interpret 'abc{1,3}' as
"match one, two, or three occurrences of 'abc'", whereas 'flex'
interprets it as "match 'ab' followed by one, two, or three
occurrences of 'c'". The '-l' and '--posix' options do away with
this incompatibility.
3669 * The precedence of the '^' operator is different. 'lex' interprets
3670 '^foo|bar' as "match either 'foo' at the beginning of a line, or
3671 'bar' anywhere", whereas 'flex' interprets it as "match either
3672 'foo' or 'bar' if they come at the beginning of a line". The
3673 latter is in agreement with the POSIX specification.
* The special table-size declarations such as '%a' supported by 'lex'
are not required by 'flex' scanners. 'flex' ignores them.
* The name 'FLEX_SCANNER' is '#define''d so scanners may be written
for use with either 'flex' or 'lex'. Scanners also include
'YY_FLEX_MAJOR_VERSION', 'YY_FLEX_MINOR_VERSION' and
'YY_FLEX_SUBMINOR_VERSION' indicating which version of 'flex'
generated the scanner. For example, for the 2.5.22 release, these
defines would be 2, 5 and 22 respectively. If the version of
'flex' being used is a beta version, then the symbol 'FLEX_BETA' is
defined.
3686 * The symbols '[[' and ']]' in the code sections of the input may
3687 conflict with the m4 delimiters. *Note M4 Dependency::.
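The 'FLEX_SCANNER' and version symbols mentioned in the list above can be tested with the preprocessor. The following is a minimal sketch, assuming it is placed in the user code section of the scanner input. The function name 'describe_generator' is hypothetical; only 'FLEX_SCANNER' and the three 'YY_FLEX_*_VERSION' symbols come from the manual.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a short description of the generator
 * into buf and returns 1 when compiled inside a flex scanner, else 0.
 * FLEX_SCANNER and the YY_FLEX_*_VERSION symbols are only defined in
 * scanners generated by flex, so elsewhere the #else branch is used. */
int describe_generator(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef FLEX_SCANNER
    snprintf(buf, size, "flex %d.%d.%d",
             YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION,
             YY_FLEX_SUBMINOR_VERSION);
    return 1;
#else
    snprintf(buf, size, "not flex");
    return 0;
#endif
}
```

Compiled outside a flex-generated scanner, the function takes the second branch, which is what makes a single source file usable with either tool.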
3689 The following 'flex' features are not included in 'lex' or the POSIX
3694 * start condition scopes
3695 * start condition stacks
3696 * interactive/non-interactive scanners
3697 * yy_scan_string() and friends
3699 * yy_set_interactive()
3701 * YY_AT_BOL()
* <<EOF>>
3708 * %{}'s around actions
3710 * multiple actions on a line
3711 * almost all of the 'flex' command-line options
3713 The feature "multiple actions on a line" refers to the fact that with
3714 'flex' you can put multiple actions on the same line, separated with
3715 semi-colons, while with 'lex', the following:
3717 foo handle_foo(); ++num_foos_seen;
3719 is (rather surprisingly) truncated to
3723 'flex' does not truncate the action. Actions that are not enclosed
3724 in braces are simply terminated at the end of the line.
3727 File: flex.info, Node: Memory Management, Next: Serialized Tables, Prev: Lex and Posix, Up: Top
3729 21 Memory Management
3730 ********************
3732 This chapter describes how flex handles dynamic memory, and how you can
3733 override the default behavior.
3737 * The Default Memory Management::
3738 * Overriding The Default Memory Management::
3739 * A Note About yytext And Memory::
3742 File: flex.info, Node: The Default Memory Management, Next: Overriding The Default Memory Management, Prev: Memory Management, Up: Memory Management
3744 21.1 The Default Memory Management
3745 ==================================
3747 Flex allocates dynamic memory during initialization, and once in a while
3748 from within a call to yylex(). Initialization takes place during the
3749 first call to yylex(). Thereafter, flex may reallocate more memory if
3750 it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up
3751 all memory when you call 'yylex_destroy'. *Note faq-memory-leak::.
3753 Flex allocates dynamic memory for five purposes, listed below: (1)
3755 16kB for the input buffer.
3756 Flex allocates memory for the character buffer used to perform
3757 pattern matching. Flex must read ahead from the input stream and
3758 store it in a large character buffer. This buffer is typically the
3759 largest chunk of dynamic memory flex consumes. This buffer will
3760 grow if necessary, doubling the size each time. Flex frees this
3761 memory when you call yylex_destroy(). The default size of this
3762 buffer (16384 bytes) is almost always too large. The ideal size
3763 for this buffer is the length of the longest token expected, in
3764 bytes, plus a little more. Flex will allocate a few extra bytes
3765 for housekeeping. Currently, to override the size of the input
3766 buffer you must '#define YY_BUF_SIZE' to whatever number of bytes
3767 you want. We don't plan to change this in the near future, but we
3768 reserve the right to do so if we ever add a more robust memory
3771 64kB for the REJECT state. This will only be allocated if you use REJECT.
3772 The size is large enough to hold the same number of states as
3773 characters in the input buffer. If you override the size of the
3774 input buffer (via 'YY_BUF_SIZE'), then you automatically override
3775 the size of this buffer as well.
3777 100 bytes for the start condition stack.
3778 Flex allocates memory for the start condition stack. This is the
3779 stack used for pushing start states, i.e., with yy_push_state().
3780 It will grow if necessary. Since the states are simply integers,
3781 this stack doesn't consume much memory. This stack is not present
3782 if '%option stack' is not specified. You will rarely need to tune
3783 this buffer. The ideal size for this stack is the maximum depth
3784 expected. The memory for this stack is automatically destroyed
3785 when you call yylex_destroy(). *Note option-stack::.
3787 40 bytes for each YY_BUFFER_STATE.
3788 Flex allocates memory for each YY_BUFFER_STATE. The buffer state
3789 itself is about 40 bytes, plus an additional large character buffer
3790 (described above). A buffer state is created during
3791 initialization, and another with each call to yy_create_buffer(). You
3792 can't tune the size of this, but you can tune the character buffer
3793 as described above. Any buffer state that you explicitly create by
3794 calling yy_create_buffer() is _NOT_ destroyed automatically. You
3795 must call yy_delete_buffer() to free the memory. The exception to
3796 this rule is that flex will delete the current buffer automatically
3797 when you call yylex_destroy(). If you delete the current buffer,
3798 be sure to set it to NULL. That way, flex will not try to delete
3799 the buffer a second time (possibly crashing your program!) At the
3800 time of this writing, flex does not provide a growable stack for
3801 the buffer states. You have to manage that yourself. *Note
3802 Multiple Input Buffers::.
3804 84 bytes for the reentrant scanner guts
3805 Flex allocates about 84 bytes for the reentrant scanner structure
3806 when you call yylex_init(). It is destroyed when the user calls
3809 ---------- Footnotes ----------
3811 (1) The quantities given here are approximate, and may vary due to
3812 host architecture, compiler configuration, or due to future enhancements
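To make the growth policy of the input buffer concrete, here is a small standalone sketch of a character buffer that doubles its allocation whenever it runs out of room. It does not reproduce flex's internal code: the 'growbuf' type and its functions are hypothetical, and only the doubling policy is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable character buffer (not flex's own structure). */
struct growbuf {
    char  *data;
    size_t size;   /* bytes allocated */
    size_t used;   /* bytes filled */
};

/* Allocate the initial buffer; returns 0 on success, -1 on failure. */
int growbuf_init(struct growbuf *b, size_t initial)
{
    assert(b != NULL && initial > 0);
    b->data = malloc(initial);
    b->size = b->data ? initial : 0;
    b->used = 0;
    return b->data ? 0 : -1;
}

/* Append n bytes, doubling the allocation until they fit. */
int growbuf_append(struct growbuf *b, const char *src, size_t n)
{
    size_t want;
    assert(b != NULL && b->size > 0 && src != NULL);
    want = b->size;
    while (b->used + n > want)
        want *= 2;
    if (want != b->size) {
        char *p = realloc(b->data, want);
        if (p == NULL)
            return -1;          /* the old buffer is still valid */
        b->data = p;
        b->size = want;
    }
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}

/* Release the buffer. */
void growbuf_free(struct growbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = b->used = 0;
}
```

Starting small and doubling keeps the number of reallocations logarithmic in the final size, which is why a modest initial size is usually acceptable.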
3816 File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management
3818 21.2 Overriding The Default Memory Management
3819 =============================================
3821 Flex calls the functions 'yyalloc', 'yyrealloc', and 'yyfree' when it
3822 needs to allocate or free memory. By default, these functions are
3823 wrappers around the standard C functions, 'malloc', 'realloc', and
3824 'free', respectively. You can override the default implementations by
3825 telling flex that you will provide your own implementations.
3827 To override the default implementations, you must do two things:
3829 1. Suppress the default implementations by specifying one or more of
3830 the following options:
3832 * '%option noyyalloc'
3833 * '%option noyyrealloc'
3834 * '%option noyyfree'.
3836 2. Provide your own implementation of the following functions: (1)
3838 // For a non-reentrant scanner
3839 void * yyalloc (size_t bytes);
3840 void * yyrealloc (void * ptr, size_t bytes);
3841 void yyfree (void * ptr);
3843 // For a reentrant scanner
3844 void * yyalloc (size_t bytes, void * yyscanner);
3845 void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
3846 void yyfree (void * ptr, void * yyscanner);
3848 In the following example, we will override all three memory routines.
3849 We assume that there is a custom allocator with garbage collection. In
3850 order to make this example interesting, we will use a reentrant scanner,
3851 passing a pointer to the custom allocator through 'yyextra'.
3854 #include "some_allocator.h"
3857 /* Suppress the default implementations. */
3858 %option noyyalloc noyyrealloc noyyfree
3861 /* Initialize the allocator. */
3863 #define YY_EXTRA_TYPE struct allocator*
3864 #define YY_USER_INIT yyextra = allocator_create();
3871 /* Provide our own implementations. */
3872 void * yyalloc (size_t bytes, void* yyscanner) {
3873 return allocator_alloc (yyextra, bytes);
3876 void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
3877 return allocator_realloc (yyextra, ptr, bytes);
3880 void yyfree (void * ptr, void * yyscanner) {
3881 /* Do nothing -- we leave it to the garbage collector. */
3885 ---------- Footnotes ----------
3887 (1) It is not necessary to override all (or any) of the memory
3888 management routines. You may, for example, override 'yyrealloc', but
3889 not 'yyfree' or 'yyalloc'.
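As a simpler variation, the following sketch overrides the three routines for a non-reentrant scanner. It wraps the standard functions and counts live allocations, which can help when looking for leaks. The counter 'live_blocks' is a hypothetical name. The signatures are the non-reentrant ones listed in this section, and the code assumes the scanner was built with '%option noyyalloc noyyrealloc noyyfree'.

```c
#include <stdlib.h>

/* Hypothetical leak counter: number of blocks currently allocated. */
long live_blocks = 0;

void *yyalloc(size_t bytes)
{
    void *p = malloc(bytes);
    if (p != NULL)
        live_blocks++;
    return p;
}

void *yyrealloc(void *ptr, size_t bytes)
{
    void *p = realloc(ptr, bytes);
    /* realloc(NULL, n) behaves like malloc(n), so it creates a block. */
    if (ptr == NULL && p != NULL)
        live_blocks++;
    return p;
}

void yyfree(void *ptr)
{
    if (ptr != NULL)
        live_blocks--;
    free(ptr);
}
```

If the count is still non-zero after the scanner has been torn down, some buffer was probably not released.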
3892 File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management
3894 21.3 A Note About yytext And Memory
3895 ===================================
3897 When flex finds a match, 'yytext' points to the first character of the
3898 match in the input buffer. The string itself is part of the input
3899 buffer, and is _NOT_ allocated separately. The value of yytext will be
3900 overwritten the next time yylex() is called. In short, the value of
3901 yytext is only valid from within the matched rule's action.
3903 Often, you want the value of yytext to persist for later processing,
3904 e.g., by a parser with non-zero lookahead. In order to preserve yytext,
3905 you will have to copy it with strdup() or a similar function. But this
3906 introduces some headache because your parser is now responsible for
3907 freeing the copy of yytext. If you use a yacc or bison parser
3908 (commonly used with flex), you will discover that the error recovery
3909 mechanisms can cause memory to be leaked.
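One way to keep such copies under control is a simple pool: every copy of yytext is recorded in a list, and the whole list is released in one step once parsing is finished, whether or not the parser ran into an error. The sketch below is a hypothetical illustration. The names 'str_pool', 'pool_strdup', and 'pool_release' are not part of flex or bison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pool_node {
    struct pool_node *next;
    char text[];            /* copy of the token text (C99 flexible array) */
};

struct str_pool {
    struct pool_node *head;
    size_t count;
};

/* Copy len bytes of s into the pool; returns the copy, or NULL on failure. */
char *pool_strdup(struct str_pool *pool, const char *s, size_t len)
{
    struct pool_node *n;
    assert(pool != NULL && s != NULL);
    n = malloc(sizeof *n + len + 1);
    if (n == NULL)
        return NULL;
    memcpy(n->text, s, len);
    n->text[len] = '\0';
    n->next = pool->head;
    pool->head = n;
    pool->count++;
    return n->text;
}

/* Free every string in the pool at once. */
void pool_release(struct str_pool *pool)
{
    assert(pool != NULL);
    while (pool->head != NULL) {
        struct pool_node *n = pool->head;
        pool->head = n->next;
        free(n);
    }
    pool->count = 0;
}
```

In an action, the call would take the form 'pool_strdup (&pool, yytext, yyleng)', with the result handed to the parser. The parser then never frees individual strings, so error recovery cannot leak them.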
3911 To prevent memory leaks from strdup'd yytext, you will have to track
3912 the memory somehow. Our experience has shown that a garbage collection
3913 mechanism or a pooled memory mechanism will save you a lot of grief when
3917 File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top
3919 22 Serialized Tables
3920 ********************
3922 A 'flex' scanner has the ability to save the DFA tables to a file, and
3923 load them at runtime when needed. The motivation for this feature is to
3924 reduce the runtime memory footprint. Traditionally, these tables have
3925 been compiled into the scanner as C arrays, and are sometimes quite
3926 large. Since the tables are compiled into the scanner, the memory used
3927 by the tables can never be freed. This is a waste of memory, especially
3928 if an application uses several scanners, but none of them at the same
3931 The serialization feature allows the tables to be loaded at runtime,
3932 before scanning begins. The tables may be discarded when scanning is
3937 * Creating Serialized Tables::
3938 * Loading and Unloading Serialized Tables::
3939 * Tables File Format::
3942 File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables
3944 22.1 Creating Serialized Tables
3945 ===============================
3947 You may create a scanner with serialized tables by specifying:
3949 %option tables-file=FILE
3953 These options instruct flex to save the DFA tables to the file FILE.
3954 The tables will _not_ be embedded in the generated scanner. The scanner
3955 will not function on its own. The scanner will be dependent upon the
3956 serialized tables. You must load the tables from this file at runtime
3957 before you can scan anything.
3959 If you do not specify a filename to '--tables-file', the tables will
3960 be saved to 'lex.yy.tables', where 'yy' is the appropriate prefix.
3962 If your project uses several different scanners, you can concatenate
3963 the serialized tables into one file, and flex will find the correct set
3964 of tables, using the scanner prefix as part of the lookup key. An
3967 $ flex --tables-file --prefix=cpp cpp.l
3968 $ flex --tables-file --prefix=c c.l
3969 $ cat lex.cpp.tables lex.c.tables > all.tables
3971 The above example created two scanners, 'cpp', and 'c'. Since we did
3972 not specify a filename, the tables were serialized to 'lex.cpp.tables' and
3973 'lex.c.tables', respectively. Then, we concatenated the two files
3974 together into 'all.tables', which we will distribute with our project.
3975 At runtime, we will open the file and tell flex to load the tables from
3976 it. Flex will find the correct tables automatically. (See next
3980 File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables
3982 22.2 Loading and Unloading Serialized Tables
3983 ============================================
3985 If you've built your scanner with '%option tables-file', then you must
3986 load the scanner tables at runtime. This can be accomplished with the
3989 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
3990 Locates scanner tables in the stream pointed to by FP and loads
3991 them. Memory for the tables is allocated via 'yyalloc'. You must
3992 call this function before the first call to 'yylex'. The argument
3993 SCANNER only appears in the reentrant scanner. This function
3994 returns '0' (zero) on success, or non-zero on error.
3996 The loaded tables are *not* automatically destroyed (unloaded) when
3997 you call 'yylex_destroy'. The reason is that you may create several
3998 scanners of the same type (in a reentrant scanner), each of which needs
3999 access to these tables. To avoid a nasty memory leak, you must call the
4002 -- Function: int yytables_destroy ([yyscan_t SCANNER])
4003 Unloads the scanner tables. The tables must be loaded again before
4004 you can scan any more data. The argument SCANNER only appears in
4005 the reentrant scanner. This function returns '0' (zero) on
4006 success, or non-zero on error.
4008 *The functions 'yytables_fload' and 'yytables_destroy' are not
4009 thread-safe.* You must ensure that these functions are called exactly
4010 once (for each scanner type) in a threaded program, before any thread
4011 calls 'yylex'. After the tables are loaded, they are never written to,
4012 and no thread protection is required thereafter - until you destroy
4016 File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables
4018 22.3 Tables File Format
4019 =======================
4021 This section defines the file format of serialized 'flex' tables.
4023 The tables format allows for one or more sets of tables to be
4024 specified, where each set corresponds to a given scanner. Scanners are
4025 indexed by name, as described below. The file format is as follows:
4028 +-------------------------------+
4029 Header | uint32 th_magic; |
4030 | uint32 th_hsize; |
4031 | uint32 th_ssize; |
4032 | uint16 th_flags; |
4033 | char th_version[]; |
4035 | uint8 th_pad64[]; |
4036 +-------------------------------+
4037 Table 1 | uint16 td_id; |
4038 | uint16 td_flags; |
4039 | uint32 td_hilen; |
4040 | uint32 td_lolen; |
4042 | uint8 td_pad64[]; |
4043 +-------------------------------+
4050 +-------------------------------+
4057 The above diagram shows that a complete set of tables consists of a
4058 header followed by multiple individual tables. Furthermore, multiple
4059 complete sets may be present in the same file, each set with its own
4060 header and tables. The sets are contiguous in the file. The only way
4061 to know if another set follows is to check the next four bytes for the
4062 magic number (or check for EOF). The header and tables sections are
4063 padded to 64-bit boundaries. Below we describe each field in detail.
4064 This format does not specify how the scanner will expand the given data,
4065 i.e., data may be serialized as int8, but expanded to an int32 array at
4066 runtime. This is to reduce the size of the serialized data where
4067 possible. Remember, _all integer values are in network byte order_.
4069 Fields of a table header:
4072 Magic number, always 0xF13C57B1.
4075 Size of this entire header, in bytes, including all fields plus any
4079 Size of this entire set, in bytes, including the header, all
4080 tables, plus any padding.
4083 Bit flags for this table set. Currently unused.
4086 Flex version in NUL-terminated string format, e.g., '2.5.13a'.
4087 This is the version of flex that was used to create the serialized
4091 Contains the name of this table set. The default is 'yytables',
4092 and is prefixed accordingly, e.g., 'footables'. Must be
4096 Zero or more NUL bytes, padding the entire header to the next
4097 64-bit boundary as calculated from the beginning of the header.
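The fixed-size fields at the start of the header can be decoded with a few lines of C. The following is a minimal sketch, assuming the fields are stored back to back in the order shown in the diagram. The struct and function names are hypothetical and are not part of the scanner that flex generates. Integers are read byte by byte, so the code does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_MAGIC 0xF13C57B1u   /* magic number given for th_magic */

/* Fixed-size leading fields of a table-set header (hypothetical struct). */
struct tbl_header {
    uint32_t th_magic;
    uint32_t th_hsize;
    uint32_t th_ssize;
    uint16_t th_flags;
};

/* Read a 32-bit integer stored in network (big-endian) byte order. */
uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 16-bit integer stored in network (big-endian) byte order. */
uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the fixed fields; returns 0 on success, -1 if the buffer is
 * too short or the magic number does not match. */
int tbl_read_header(const unsigned char *buf, size_t len,
                    struct tbl_header *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 14)               /* 4 + 4 + 4 + 2 bytes of fixed fields */
        return -1;
    h->th_magic = get_be32(buf);
    h->th_hsize = get_be32(buf + 4);
    h->th_ssize = get_be32(buf + 8);
    h->th_flags = get_be16(buf + 12);
    return h->th_magic == TBL_MAGIC ? 0 : -1;
}
```

Because 'th_ssize' covers the whole set, a reader that wants a different set can skip 'th_ssize' bytes from the start of the header and test for the magic number again.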
4102 Specifies the table identifier. Possible values are:
4103 'YYTD_ID_ACCEPT (0x01)'
4105 'YYTD_ID_BASE (0x02)'
4107 'YYTD_ID_CHK (0x03)'
4109 'YYTD_ID_DEF (0x04)'
4113 'YYTD_ID_META (0x06)'
4115 'YYTD_ID_NUL_TRANS (0x07)'
4117 'YYTD_ID_NXT (0x08)'
4118 'yy_nxt'. This array may be two dimensional. See the
4119 'td_hilen' field below.
4120 'YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
4121 'yy_rule_can_match_eol'
4122 'YYTD_ID_START_STATE_LIST (0x0A)'
4123 'yy_start_state_list'. This array is handled specially
4124 because it is an array of pointers to structs. See the
4125 'td_flags' field below.
4126 'YYTD_ID_TRANSITION (0x0B)'
4127 'yy_transition'. This array is handled specially because it
4128 is an array of structs. See the 'td_lolen' field below.
4129 'YYTD_ID_ACCLIST (0x0C)'
4133 Bit flags describing how to interpret the data in 'td_data'. The
4134 data arrays are one-dimensional by default, but may be two
4135 dimensional as specified in the 'td_hilen' field.
4138 The data is serialized as an array of type int8.
4139 'YYTD_DATA16 (0x02)'
4140 The data is serialized as an array of type int16.
4141 'YYTD_DATA32 (0x04)'
4142 The data is serialized as an array of type int32.
4143 'YYTD_PTRANS (0x08)'
4144 The data is a list of indexes of entries in the expanded
4145 'yy_transition' array. Each index should be expanded to a
4146 pointer to the corresponding entry in the 'yy_transition'
4147 array. We count on the fact that the 'yy_transition' array
4148 has already been seen.
4149 'YYTD_STRUCT (0x10)'
4150 The data is a list of yy_trans_info structs, each of which
4151 consists of two integers. There is no padding between struct
4152 elements or between structs. The type of each member is
4153 determined by the 'YYTD_DATA*' bits.
4156 If 'td_hilen' is non-zero, then the data is a two-dimensional
4157 array. Otherwise, the data is a one-dimensional array. 'td_hilen'
4158 contains the number of elements in the higher dimensional array,
4159 and 'td_lolen' contains the number of elements in the lowest
4162 Conceptually, 'td_data' is either 'sometype td_data[td_lolen]', or
4163 'sometype td_data[td_hilen][td_lolen]', where 'sometype' is
4164 specified by the 'td_flags' field. It is possible for both
4165 'td_lolen' and 'td_hilen' to be zero, in which case 'td_data' is a
4166 zero length array, and no data is loaded, i.e., this table is
4167 simply skipped. Flex does not currently generate tables of zero
4171 Specifies the number of elements in the lowest dimension array. If
4172 this is a one-dimensional array, then it is simply the number of
4173 elements in this array. The element size is determined by the
4177 The table data. This array may be a one- or two-dimensional array,
4178 of type 'int8', 'int16', 'int32', 'struct yy_trans_info', or
4179 'struct yy_trans_info*', depending upon the values in the
4180 'td_flags', 'td_hilen', and 'td_lolen' fields.
4183 Zero or more NUL bytes, padding the entire table to the next
4184 64-bit boundary as calculated from the beginning of this table.
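The relationship between 'td_flags', 'td_hilen', 'td_lolen', and the padding can be expressed as a short calculation. The sketch below follows the field descriptions in this section and is not flex's own loader. The function names are hypothetical. The value 0x01 for 'YYTD_DATA8' is taken from the flex sources and is an assumption as far as this section is concerned.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values for td_flags. YYTD_DATA8 is assumed from the flex
 * sources; the others are listed in the td_flags description. */
#define YYTD_DATA8   0x01
#define YYTD_DATA16  0x02
#define YYTD_DATA32  0x04
#define YYTD_STRUCT  0x10

/* Size in bytes of one serialized integer, or 0 if no size bit is set. */
size_t tbl_elem_size(uint16_t td_flags)
{
    if (td_flags & YYTD_DATA8)  return 1;
    if (td_flags & YYTD_DATA16) return 2;
    if (td_flags & YYTD_DATA32) return 4;
    return 0;
}

/* Number of serialized integers held in td_data. */
uint64_t tbl_elem_count(uint16_t td_flags, uint32_t td_hilen,
                        uint32_t td_lolen)
{
    uint64_t n = td_hilen ? (uint64_t)td_hilen * td_lolen : td_lolen;
    if (td_flags & YYTD_STRUCT)
        n *= 2;     /* each yy_trans_info struct holds two integers */
    return n;
}

/* Padding bytes needed to bring nbytes up to the next 64-bit boundary. */
size_t tbl_pad64(uint64_t nbytes)
{
    return (size_t)((8 - nbytes % 8) % 8);
}
```

The fixed fields of a table occupy 12 bytes (two 'uint16' and two 'uint32' values). A one-dimensional table of ten 'int16' entries therefore takes 12 + 20 = 32 bytes and needs no padding.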
4187 File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top
4192 The following is a list of 'flex' diagnostic messages:
4194 * 'warning, rule cannot be matched' indicates that the given rule
4195 cannot be matched because it follows other rules that will always
4196 match the same text as it. For example, in the following 'foo'
4197 cannot be matched because it comes after an identifier "catch-all"
4200 [a-z]+ got_identifier();
4203 Using 'REJECT' in a scanner suppresses this warning.
4205 * 'warning, -s option given but default rule can be matched' means
4206 that it is possible (perhaps only in a particular start condition)
4207 that the default rule (match any single character) is the only one
4208 that will match a particular input. Since '-s' was given,
4209 presumably this is not intended.
4211 * 'reject_used_but_not_detected undefined' or
4212 'yymore_used_but_not_detected undefined'. These errors can occur
4213 at compile time. They indicate that the scanner uses 'REJECT' or
4214 'yymore()' but that 'flex' failed to notice the fact, meaning that
4215 'flex' scanned the first two sections looking for occurrences of
4216 these actions and failed to find any, but somehow you snuck some in
4217 (via a #include file, for example). Use '%option reject' or
4218 '%option yymore' to indicate to 'flex' that you really do use these
4221 * 'flex scanner jammed'. A scanner compiled with '-s' has
4222 encountered an input string which wasn't matched by any of its
4223 rules. This error can also occur due to internal problems.
4225 * 'token too large, exceeds YYLMAX'. Your scanner uses '%array' and
4226 one of its rules matched a string longer than the 'YYLMAX' constant
4227 (8K bytes by default). You can increase the value by #define'ing
4228 'YYLMAX' in the definitions section of your 'flex' input.
4230 * 'scanner requires -8 flag to use the character 'x''. Your scanner
4231 specification includes recognizing the 8-bit character ''x'' and
4232 you did not specify the -8 flag, and your scanner defaulted to
4233 7-bit because you used the '-Cf' or '-CF' table compression
4234 options. See the discussion of the '-7' flag, *note Scanner
4235 Options::, for details.
4237 * 'flex scanner push-back overflow'. You used 'unput()' to push back
4238 so much text that the scanner's buffer could not hold both the
4239 pushed-back text and the current token in 'yytext'. Ideally the
4240 scanner should dynamically resize the buffer in this case, but at
4241 present it does not.
4243 * 'input buffer overflow, can't enlarge buffer because scanner uses
4244 REJECT'. The scanner was working on matching an extremely large
4245 token and needed to expand the input buffer. This doesn't work
4246 with scanners that use 'REJECT'.
4248 * 'fatal flex scanner internal error--end of buffer missed'. This
4249 can occur in a scanner which is reentered after a long-jump has
4250 jumped out (or over) the scanner's activation frame. Before
4251 reentering the scanner, use:
4253 or, as noted above, switch to using the C++ scanner class.
4255 * 'too many start conditions in <> construct!' You listed more start
4256 conditions in a <> construct than exist (so you must have listed at
4257 least one of them twice).
4260 File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top
4265 Some trailing context patterns cannot be properly matched and generate
4266 warning messages ('dangerous trailing context'). These are patterns
4267 where the ending of the first part of the rule matches the beginning of
4268 the second part, such as 'zx*/xy*', where the 'x*' matches the 'x' at
4269 the beginning of the trailing context. (Note that the POSIX draft
4270 states that the text matched by such patterns is undefined.) For some
4271 trailing context rules, parts which are actually fixed-length are not
4272 recognized as such, leading to the abovementioned performance loss. In
4273 particular, parts using '|' or '{n}' (such as 'foo{3}') are always
4274 considered variable-length. Combining trailing context with the special
4275 '|' action can result in _fixed_ trailing context being turned into the
4276 more expensive _variable_ trailing context. For example, in the
4283 Use of 'unput()' invalidates yytext and yyleng, unless the '%array'
4284 directive or the '-l' option has been used. Pattern-matching of 'NUL's
4285 is substantially slower than matching other characters. Dynamic
4286 resizing of the input buffer is slow, as it entails rescanning all the
4287 text matched so far by the current (generally huge) token. Due to both
4288 buffering of input and read-ahead, you cannot intermix calls to
4289 '<stdio.h>' routines, such as getchar(), with 'flex' rules and expect
4290 it to work. Call 'input()' instead. The table-entry total listed by
4291 the '-v' flag excludes the number of table entries needed to determine
4292 what rule has been matched. The number of entries is equal to the
4293 number of DFA states if the scanner does not use 'REJECT', and somewhat
4294 greater than the number of states if it does. 'REJECT' cannot be used
4295 with the '-f' or '-F' options.
4297 The 'flex' internal algorithms need documentation.
4300 File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top
4302 25 Additional Reading
4303 *********************
4305 You may wish to read more about the following programs:
4311 The following books may contain material of interest:
4313 John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
4314 Associates. Be sure to get the 2nd edition.
4316 M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
4318 Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
4319 Techniques and Tools_, Addison-Wesley (1986). Describes the
4320 pattern-matching techniques used by 'flex' (deterministic finite
4324 File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top
4329 From time to time, the 'flex' maintainer receives certain questions.
4330 Rather than repeat answers to well-understood problems, we publish them
4335 * When was flex born?::
4336 * How do I expand backslash-escape sequences in C-style quoted strings?::
4337 * Why do flex scanners call fileno if it is not ANSI compatible?::
4338 * Does flex support recursive pattern definitions?::
4339 * How do I skip huge chunks of input (tens of megabytes) while using flex?::
4340 * Flex is not matching my patterns in the same order that I defined them.::
4341 * My actions are executing out of order or sometimes not at all.::
4342 * How can I have multiple input sources feed into the same scanner at the same time?::
4343 * Can I build nested parsers that work with the same input file?::
4344 * How can I match text only at the end of a file?::
4345 * How can I make REJECT cascade across start condition boundaries?::
4346 * Why cant I use fast or full tables with interactive mode?::
4347 * How much faster is -F or -f than -C?::
4348 * If I have a simple grammar cant I just parse it with flex?::
4349 * Why doesn't yyrestart() set the start state back to INITIAL?::
4350 * How can I match C-style comments?::
4351 * The period isn't working the way I expected.::
4352 * Can I get the flex manual in another format?::
4353 * Does there exist a "faster" NDFA->DFA algorithm?::
4354 * How does flex compile the DFA so quickly?::
4355 * How can I use more than 8192 rules?::
4356 * How do I abandon a file in the middle of a scan and switch to a new file?::
4357 * How do I execute code only during initialization (only before the first scan)?::
4358 * How do I execute code at termination?::
4359 * Where else can I find help?::
4360 * Can I include comments in the "rules" section of the file?::
4361 * I get an error about undefined yywrap().::
4362 * How can I change the matching pattern at run time?::
4363 * How can I expand macros in the input?::
4364 * How can I build a two-pass scanner?::
4365 * How do I match any string not matched in the preceding rules?::
4366 * I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
4367 * Is there a way to make flex treat NULL like a regular character?::
4368 * Whenever flex can not match the input it says "flex scanner jammed".::
4369 * Why doesn't flex have non-greedy operators like perl does?::
4370 * Memory leak - 16386 bytes allocated by malloc.::
4371 * How do I track the byte offset for lseek()?::
4372 * How do I use my own I/O classes in a C++ scanner?::
4373 * How do I skip as many chars as possible?::
4375 * Are certain equivalent patterns faster than others?::
4376 * Is backing up a big deal?::
4377 * Can I fake multi-byte character support?::
4379 * Can you discuss some flex internals?::
4380 * unput() messes up yy_at_bol::
4381 * The | operator is not doing what I want::
4382 * Why can't flex understand this variable trailing context pattern?::
4383 * The ^ operator isn't working::
4384 * Trailing context is getting confused with trailing optional patterns::
4385 * Is flex GNU or not?::
4387 * I need to scan if-then-else blocks and while loops::
4391 * Is there a repository for flex scanners?::
4392 * How can I conditionally compile or preprocess my flex input file?::
4393 * Where can I find grammars for lex and yacc?::
4394 * I get an end-of-buffer message for each character scanned.::
4434 * What is the difference between YYLEX_PARAM and YY_DECL?::
4435 * Why do I get "conflicting types for yylex" error?::
4436 * How do I access the values set in a Flex action from within a Bison action?::
4439 File: flex.info, Node: When was flex born?, Next: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ
4444 Vern Paxson took over the 'Software Tools' lex project from Jef
4445 Poskanzer in 1982. At that point it was written in Ratfor. Around 1987
4446 or so, Paxson translated it into C, and a legend was born :-).
4449 File: flex.info, Node: How do I expand backslash-escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ
4451 How do I expand backslash-escape sequences in C-style quoted strings?
4452 =====================================================================
4454 A key point when scanning quoted strings is that you cannot (easily)
4455 write a single rule that will precisely match the string if you allow
4456 things like embedded escape sequences and newlines. If you try to match
4457 strings with a single rule then you'll wind up having to rescan the
4458 string anyway to find any escape sequences.
4460 Instead you can use exclusive start conditions and a set of rules,
4461 one for matching non-escaped text, one for matching a single escape, one
4462 for matching an embedded newline, and one for recognizing the end of the
4463 string. Each of these rules is then faced with the question of where to
4464 put its intermediary results. The best solution is for the rules to
4465 append their local value of 'yytext' to the end of a "string literal"
4466 buffer. A rule like the escape-matcher will append to the buffer the
4467 meaning of the escape sequence rather than the literal text in 'yytext'.
4468 In this way, 'yytext' does not need to be modified at all.
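The buffer that the rules append to can be very small. The following sketch shows one possible shape for it, together with the translation step that an escape-matching rule would perform. All names here ('strbuf', 'strbuf_append', 'escape_value', 'strbuf_append_escape') are hypothetical, and the fixed capacity is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "string literal" buffer that the rules append to. */
struct strbuf {
    char   data[256];
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 if the text does not fit. */
int strbuf_append(struct strbuf *sb, const char *text, size_t n)
{
    size_t i;
    assert(sb != NULL && text != NULL);
    if (n >= sizeof sb->data - sb->len)   /* keep room for the final NUL */
        return -1;
    for (i = 0; i < n; i++)
        sb->data[sb->len++] = text[i];
    sb->data[sb->len] = '\0';
    return 0;
}

/* Meaning of the escape "\c". Unknown escapes map to the character
 * itself, which is one possible policy and covers \\ and \". */
char escape_value(char c)
{
    switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    default:  return c;
    }
}

/* What an escape-matching rule would do; text is the matched "\c". */
int strbuf_append_escape(struct strbuf *sb, const char *text)
{
    char v;
    assert(text != NULL && text[0] == '\\' && text[1] != '\0');
    v = escape_value(text[1]);
    return strbuf_append(sb, &v, 1);
}
```

The rule for ordinary text would call 'strbuf_append' with 'yytext' and 'yyleng'. The rule for a single escape would call 'strbuf_append_escape' with 'yytext', and the rule for the closing quote would hand the finished buffer to the caller.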
4471 File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ
4473 Why do flex scanners call fileno if it is not ANSI compatible?
4474 ==============================================================
4476 Flex scanners call 'fileno()' in order to get the file descriptor
4477 corresponding to 'yyin'. The file descriptor may be passed to
4478 'isatty()' or 'read()', depending upon which '%options' you specified.
4479 If your system does not support 'fileno()', you can get rid of the
4480 'read()' call by not specifying '%option read'. To get rid of the
4481 'isatty()' call, you must specify one of '%option always-interactive' or
4482 '%option never-interactive'.
4485 File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ
4487 Does flex support recursive pattern definitions?
4488 ================================================
4493 block "{"({block}|{statement})*"}"
4495 No. You cannot have recursive definitions. The pattern-matching
4496 power of regular expressions in general (and therefore flex scanners,
4497 too) is limited. In particular, regular expressions cannot "balance"
4498 parentheses to an arbitrary degree. For example, it's impossible to
4499 write a regular expression that matches all strings containing the same
4500 number of '{'s as '}'s. For more powerful pattern matching, you need a
4501 parser, such as 'GNU bison'.
4504 File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ
4506 How do I skip huge chunks of input (tens of megabytes) while using flex?
4507 ========================================================================
4509 Use 'fseek()' (or 'lseek()') to position 'yyin', then call 'yyrestart(yyin)' so that the scanner discards its buffered input.
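A minimal sketch of the repositioning step in plain C, assuming the
input is a seekable file (not a pipe or a terminal). The helper name
'seek_input' is hypothetical. It takes an absolute offset because the
scanner reads ahead, so the file position of 'yyin' is usually beyond
the text matched so far.

```c
#include <assert.h>
#include <stdio.h>

/* Moves `in` to the absolute byte position `offset`.
 * Returns 0 on success, -1 if the stream cannot seek. */
int seek_input(FILE *in, long offset)
{
    assert(in != NULL && offset >= 0);
    if (fseek(in, offset, SEEK_SET) != 0)
        return -1;
    /* In a flex scanner, this is the point at which to call
     * yyrestart(in) so that stale buffered text is dropped. */
    return 0;
}
```

In a scanner action this might be used as 'if (seek_input(yyin, pos)
== 0) yyrestart(yyin);', where 'pos' is whatever offset your
application has computed.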
4512 File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ
4514 Flex is not matching my patterns in the same order that I defined them.
4515 =======================================================================
4517 'flex' picks the rule that matches the most text (i.e., the longest
4518 possible input string). Rather than trying the rules one at a time
4519 in the order listed, 'flex' uses a matching technique ("deterministic finite automata") that
4520 does all of the matching simultaneously, in parallel. (This may seem
4521 impossible, but it's actually a fairly simple technique once you
4522 understand the principles.)
4524 A side-effect of this parallel matching is that when the input
4525 matches more than one rule, 'flex' scanners pick the rule that matched
4526 the _most_ text. This is explained further in the manual, in the
4527 section *Note Matching::.
4529 If you want 'flex' to choose a shorter match, then you can work
4530 around this behavior by expanding your short rule to match more text,
4531 then put back the extra:
4533 data_.* yyless( 5 ); BEGIN BLOCKIDSTATE;
4535 Another fix would be to make the second rule active only during the
4536 '<BLOCKIDSTATE>' start condition, and make that start condition
4537 exclusive by declaring it with '%x' instead of '%s'.
4539 A final fix is to change the input language so that the ambiguity for
4540 'data_' is removed, by adding characters to it that don't match the
4541 identifier rule, or by removing characters (such as '_') from the
4542 identifier rule so it no longer matches 'data_'. (Of course, you might
4543 also not have the option of changing the input language.)
4546 File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ
4548 My actions are executing out of order or sometimes not at all.
4549 ==============================================================
4551 Most likely, you have (in error) placed the opening '{' of the action
4552 block on a different line than the rule, e.g.,
4559 'flex' requires that the opening '{' of an action associated with a
4560 rule begin on the same line as does the rule. You need instead to write
4561 your rules as follows:
4563 ^(foo|bar) { // CORRECT!
4568 File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ
4570 How can I have multiple input sources feed into the same scanner at the same time?
4571 ==================================================================================
4573 If
4574 * your scanner is free of backtracking (verified using 'flex''s '-b'
4575 flag),
4576 * AND you run your scanner interactively ('-I' option; default unless
4577 using special table compression options),
4578 * AND you feed it one character at a time by redefining 'YY_INPUT' to
4579 do so,
4581 then every time it matches a token, it will have exhausted its input
4582 buffer (because the scanner is free of backtracking). This means you
4583 can safely use 'select()' at that point and only call 'yylex()' for
4584 another token if 'select()' indicates there's data available.
4586 That is, move the 'select()' out from the input function to a point
4587 where it determines whether 'yylex()' gets called for the next token.
4589 With this approach, you will still have problems if your input can
4590 arrive piecemeal; 'select()' could inform you that the beginning of a
4591 token is available, you call 'yylex()' to get it, but it winds up
4592 blocking waiting for the later characters in the token.
4594 Here's another way: Move your input multiplexing inside of
4595 'YY_INPUT'. That is, whenever 'YY_INPUT' is called, it 'select()''s to
4596 see where input is available. If input is available for the scanner, it
4597 reads and returns the next byte. If input is available from another
4598 source, it calls whatever function is responsible for reading from that
4599 source. (If no input is available, it blocks until some input is
4600 available.) I've used this technique in an interpreter I wrote that
4601 both reads keyboard input using a 'flex' scanner and IPC traffic from
4602 sockets, and it works fine.
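The multiplexing step itself can be sketched with POSIX 'select()'.
This is only an illustration under stated assumptions: the helper name
'wait_for_input' is hypothetical, it watches exactly two descriptors,
and both must be smaller than 'FD_SETSIZE'.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Blocks until fd_a or fd_b is readable. Returns the readable
 * descriptor (fd_a if both are ready), or -1 if select() fails,
 * which includes being interrupted by a signal. */
int wait_for_input(int fd_a, int fd_b)
{
    fd_set readable;
    int max_fd = fd_a > fd_b ? fd_a : fd_b;

    assert(fd_a >= 0 && fd_b >= 0);
    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);
    FD_SET(fd_b, &readable);

    if (select(max_fd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(fd_a, &readable) ? fd_a : fd_b;
}
```

A redefined 'YY_INPUT' could call such a helper in a loop: when the
scanner's descriptor is reported, read one byte from it and return;
when the other descriptor is reported, hand it to its own reader and
wait again.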
4605 File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ
4607 Can I build nested parsers that work with the same input file?
4608 ==============================================================
4610 This is not going to work without some additional effort. The reason is
4611 that 'flex' block-buffers the input it reads from 'yyin'. This means
4612 that the "outermost" 'yylex()', when called, will automatically slurp up
4613 the first 8K of input available on 'yyin', and subsequent calls to other
4614 'yylex()''s won't see that input. You might be tempted to work around
4615 this problem by redefining 'YY_INPUT' to only return a small amount of
4616 text, but it turns out that that approach is quite difficult. Instead,
4617 the best solution is to combine all of your scanners into one large
4618 scanner, using a different exclusive start condition for each.
4621 File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ
4623 How can I match text only at the end of a file?
4624 ===============================================
4626 There is no way to write a rule which is "match this text, but only if
4627 it comes at the end of the file". You can fake it, though, if you
4628 happen to have a character lying around that you don't allow in your
4629 input. Then you redefine 'YY_INPUT' to call your own routine which, if
4630 it sees an 'EOF', returns the magic character first (and remembers to
4631 return a real 'EOF' next time it's called). Then you could write:
4633 <COMMENT>(.|\n)*{EOF_CHAR} /* saw comment at EOF */
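The input routine described above might look like the following sketch
in plain C. The function name 'read_with_sentinel', the flag
'sent_marker', and the choice of '\001' for 'EOF_CHAR' are all
assumptions made for illustration; pick any character that cannot occur
in your input.

```c
#include <assert.h>
#include <stdio.h>

#define EOF_CHAR '\001'   /* assumption: never appears in real input */

/* Reads one byte from `in` into *out. Returns 1 if a byte was
 * produced and 0 at the real end of input. The first time the stream
 * runs dry, EOF_CHAR is produced instead of reporting end of input;
 * *sent_marker remembers that this has already happened. */
int read_with_sentinel(FILE *in, int *sent_marker, char *out)
{
    int c;

    assert(in != NULL && sent_marker != NULL && out != NULL);
    if (*sent_marker)
        return 0;              /* marker already delivered: real EOF */
    c = getc(in);
    if (c == EOF) {
        *sent_marker = 1;
        *out = EOF_CHAR;       /* hand out the magic character once */
        return 1;
    }
    *out = (char)c;
    return 1;
}
```

Inside a scanner, 'YY_INPUT(buf,result,max_size)' could then set
'result' to the return value of such a routine called with 'yyin' and
'buf', since a result of 0 is how 'YY_INPUT' reports end of file.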
4636 File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ
4638 How can I make REJECT cascade across start condition boundaries?
4639 ================================================================
4641 You can do this as follows. Suppose you have a start condition 'A', and
4642 after exhausting all of the possible matches in '<A>', you want to try
4643 matches in '<INITIAL>'. Then you could use the following:
4647 <A>rule_that_is_long ...; REJECT;
4648 <A>rule ...; REJECT; /* shorter rule */
4652 /* Shortest and last rule in <A>, so
4653 * cascaded REJECTs will eventually
4654 * wind up matching this rule. We want
4655 * to now switch to the initial state
4656 * and try matching from there instead.
4658 yyless(0); /* put back matched text */
4663 File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ
4665 Why can't I use fast or full tables with interactive mode?
4666 ==========================================================
4668 One of the assumptions flex makes is that interactive applications are
4669 inherently slow (they're waiting on a human after all). It has to do
4670 with how the scanner detects that it must be finished scanning a token.
4671 For interactive scanners, after scanning each character the current
4672 state is looked up in a table (essentially) to see whether there's a
4673 chance of another input character possibly extending the length of the
4674 match. If not, the scanner halts. For non-interactive scanners, the
4675 end-of-token test is much simpler, basically a compare with 0, so no
4676 memory bus cycles. Since the test occurs in the innermost scanning
4677 loop, one would like to make it go as fast as possible.
4679 Still, it seems reasonable to allow the user to choose to trade off a
4680 bit of performance in this area to gain the corresponding flexibility.
4681 There might be another reason, though, why fast scanners don't support
4682 the interactive option.
4685 File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ
4687 How much faster is -F or -f than -C?
4688 ====================================
4690 Much faster (factor of 2-3).
4693 File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesn't yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ
4695 If I have a simple grammar can't I just parse it with flex?
4696 ===========================================================
4698 Is your grammar recursive? That's almost always a sign that you're
4699 better off using a parser/scanner rather than just trying to use a
4700 scanner alone.
4703 File: flex.info, Node: Why doesn't yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ
4705 Why doesn't yyrestart() set the start state back to INITIAL?
4706 ============================================================
4708 There are two reasons. The first is that there might be programs that
4709 rely on the start state not changing across file changes. The second is
4710 that beginning with 'flex' version 2.4, use of 'yyrestart()' is no
4711 longer required, so fixing the problem there doesn't solve the more
4712 general problem.
4715 File: flex.info, Node: How can I match C-style comments?, Next: The period isn't working the way I expected., Prev: Why doesn't yyrestart() set the start state back to INITIAL?, Up: FAQ
4717 How can I match C-style comments?
4718 =================================
4720 You might be tempted to try something like this:
4722 "/*".*"*/" // WRONG!
4726 "/*"(.|\n)"*/" // WRONG!
4728 The above rules will eat too much input, and blow up on things like:
4730 /* a comment */ do_my_thing( "oops */" );
4732 Here is one way which allows you to track line information:
4735 "/*" BEGIN(IN_COMMENT);
4738 "*/" BEGIN(INITIAL);
4739 [^*\n]+ // eat comment in chunks
4740 "*" // eat the lone star
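For comparison, the same two-state idea (outside a comment, or inside
one) can be written by hand in plain C. This is a sketch, not code
generated by 'flex'; the function name 'skip_comment' is hypothetical.
It is meant to be called once the opening delimiter has been consumed,
which corresponds to entering the 'IN_COMMENT' start condition.

```c
#include <assert.h>
#include <stddef.h>

/* `p` points just past the opening delimiter of a comment.
 * Returns a pointer just past the closing delimiter, or NULL if the
 * input ends first. Each newline inside the comment increments
 * *lines, so line numbers stay accurate. */
const char *skip_comment(const char *p, int *lines)
{
    assert(p != NULL && lines != NULL);
    while (*p != '\0') {
        if (p[0] == '*' && p[1] == '/')
            return p + 2;      /* closing delimiter: leave the comment */
        if (*p == '\n')
            (*lines)++;
        p++;                   /* any other byte, including a lone star */
    }
    return NULL;               /* unterminated comment */
}
```

Because the scan stops at the first closing delimiter, later text such
as a string literal that happens to contain one is left for the normal
rules to handle.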
4745 File: flex.info, Node: The period isn't working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ
4747 The '.' isn't working the way I expected.
4748 =========================================
4750 Here are some tips for using '.':
4752 * A common mistake is to place the grouping parenthesis AFTER an
4753 operator, when you really meant to place the parenthesis BEFORE the
4754 operator, e.g., you probably want this '(foo|bar)+' and NOT this
4755 '(foo|bar+)'.
4757 The first pattern matches the words 'foo' or 'bar' any number of
4758 times, e.g., it matches the text 'barfoofoobarfoo'. The second
4759 pattern matches a single instance of 'foo' or a single instance of
4760 'ba' followed by one or more 'r's, e.g., it matches the text
4761 'barrrr'.
4762 * A '.' inside '[]''s just means a literal '.' (period), and NOT "any
4763 character except newline".
4764 * Remember that '.' matches any character EXCEPT '\n' (and 'EOF').
4765 If you really want to match ANY character, including newlines, then
4766 use '(.|\n)'. Beware that the regex '(.|\n)+' will match your entire
4767 input!
4768 * Finally, if you want to match a literal '.' (a period), then use
4769 '[.]' or '"."'.
4772 File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isn't working the way I expected., Up: FAQ
4774 Can I get the flex manual in another format?
4775 ============================================
4777 The 'flex' source distribution includes a texinfo manual. You are free
4778 to convert that texinfo into whatever format you desire. The 'texinfo'
4779 package includes tools for conversion to a number of formats.
4782 File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ
4784 Does there exist a "faster" NDFA->DFA algorithm?
4785 ================================================
4787 There's no way around the potential exponential running time - it can
4788 take you exponential time just to enumerate all of the DFA states. In
4789 practice, though, the running time is closer to linear, or sometimes
4790 quadratic.
4793 File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ
4795 How does flex compile the DFA so quickly?
4796 =========================================
4798 There are two big speed wins that 'flex' uses:
4800 1. It analyzes the input rules to construct equivalence classes for
4801 those characters that always make the same transitions. It then
4802 rewrites the NFA using equivalence classes for transitions instead
4803 of characters. This cuts down the NFA->DFA computation time
4804 dramatically, to the point where, for uncompressed DFA tables, the
4805 DFA generation is often I/O bound in writing out the tables.
4806 2. It maintains hash values for previously computed DFA states, so
4807 testing whether a newly constructed DFA state is equivalent to a
4808 previously constructed state can be done very quickly, by first
4809 comparing hash values.
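The first idea in the list above can be illustrated with a small
sketch. It is not 'flex''s actual implementation: the function name
'build_equiv_classes', the table layout, and the limit of 32 character
sets are assumptions made to keep the example short. Two byte values
land in the same class exactly when they belong to the same character
sets, so they can share one column of the transition table.

```c
#include <assert.h>

#define NCHARS 256

/* sets[i][c] is nonzero when character set i contains byte c.
 * Fills class_of[c] with an equivalence-class id for every byte and
 * returns the number of distinct classes. */
int build_equiv_classes(unsigned char sets[][NCHARS], int nsets,
                        int class_of[NCHARS])
{
    unsigned long signature[NCHARS];
    int nclasses = 0;

    assert(nsets >= 0 && nsets <= 32);

    /* One bit per set: which sets does byte c belong to? */
    for (int c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (int i = 0; i < nsets; i++)
            if (sets[i][c])
                signature[c] |= 1UL << i;
    }

    /* Bytes with identical signatures get the same class id. */
    for (int c = 0; c < NCHARS; c++) {
        int found = -1;
        for (int prev = 0; prev < c; prev++) {
            if (signature[prev] == signature[c]) {
                found = class_of[prev];
                break;
            }
        }
        class_of[c] = (found >= 0) ? found : nclasses++;
    }
    return nclasses;
}
```

With one set for lowercase letters and one for digits, 256 byte values
collapse into three classes (letters, digits, everything else), which
is why working on classes instead of characters shrinks the NFA to DFA
computation so much.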
4812 File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ
4814 How can I use more than 8192 rules?
4815 ===================================
4817 'Flex' is compiled with an upper limit of 8192 rules per scanner. If
4818 you need more than 8192 rules in your scanner, you'll have to recompile
4819 'flex' with the following changes in 'flexdef.h':
4821 < #define YY_TRAILING_MASK 0x2000
4822 < #define YY_TRAILING_HEAD_MASK 0x4000
4824 > #define YY_TRAILING_MASK 0x20000000
4825 > #define YY_TRAILING_HEAD_MASK 0x40000000
4827 This should work okay as long as your C compiler uses 32 bit
4828 integers. But you might want to think about whether using such a huge
4829 number of rules is the best way to solve your problem.
4831 The following may also be relevant:
4833 With luck, you should be able to increase the definitions in
4834 'flexdef.h' for:
4836 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
4837 #define MAXIMUM_MNS 31999
4838 #define BAD_SUBSCRIPT -32767
4840 recompile everything, and it'll all work. Flex only has these
4841 16-bit-like values built into it because a long time ago it was
4842 developed on a machine with 16-bit ints. I've given this advice to
4843 others in the past but haven't heard back from them whether it worked
4844 or not.
4847 File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ
4849 How do I abandon a file in the middle of a scan and switch to a new file?
4850 =========================================================================
4852 Just call 'yyrestart(newfile)'. Be sure to reset the start state if you
4853 want a "fresh" start, since 'yyrestart()' does NOT reset the start state
4854 back to 'INITIAL'.
4857 File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ
4859 How do I execute code only during initialization (only before the first scan)?
4860 ==============================================================================
4862 You can specify an initial action by defining the macro 'YY_USER_INIT'
4863 (though note that 'yyout' may not be available at the time this macro is
4864 executed). Or you can add to the beginning of your rules section:
4867 /* Must be indented! */
4868 static int did_init = 0;
4876 File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ
4878 How do I execute code at termination?
4879 =====================================
4881 You can specify an action for the '<<EOF>>' rule.
4884 File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ
4886 Where else can I find help?
4887 ===========================
4889 You can find the flex homepage on the web at
4890 <http://flex.sourceforge.net/>. See that page for details about flex
4891 mailing lists as well.
4894 File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ
4896 Can I include comments in the "rules" section of the file?
4897 ==========================================================
4899 Yes, just about anywhere you want to. See the manual for the specific
4900 syntax.
4903 File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ
4905 I get an error about undefined yywrap().
4906 ========================================
4908 You must supply a 'yywrap()' function of your own, or link to 'libfl.a'
4909 (which provides one), or use
4911 %option noyywrap
4913 in your source to say you don't want a 'yywrap()' function.
4916 File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ
4918 How can I change the matching pattern at run time?
4919 ==================================================
4921 You can't; the patterns are compiled into a static table when flex builds the
4922 scanner.
4925 File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ
4927 How can I expand macros in the input?
4928 =====================================
4930 The best way to approach this problem is at a higher level, e.g., in the
4931 parser.
4933 However, you can do this using multiple input buffers.
4937 /* Saw the macro "macro" followed by extra stuff. */
4938 main_buffer = YY_CURRENT_BUFFER;
4939 expansion_buffer = yy_scan_string(expand(yytext));
4940 yy_switch_to_buffer(expansion_buffer);
4944 if ( expansion_buffer )
4946 // We were doing an expansion, return to where
4948 yy_switch_to_buffer(main_buffer);
4949 yy_delete_buffer(expansion_buffer);
4950 expansion_buffer = 0;
4956 You probably will want a stack of expansion buffers to allow nested
4957 macros. Hopefully the idea is clear from the above, though.
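The stack idea can be sketched independently of 'flex' with plain C
strings standing in for buffers. Everything here is hypothetical
('struct input_stack', 'push_source', 'next_char', and the depth limit
of 16); in a real scanner the entries would be 'YY_BUFFER_STATE'
handles, and 'yypush_buffer_state()' and 'yypop_buffer_state()' are
available for the same purpose.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16

struct input_stack {
    const char *cursor[MAX_DEPTH];   /* read position in each source */
    int depth;                       /* number of active sources */
};

/* Starts reading from `text` (for example, a macro expansion).
 * Returns 0 on success, -1 if expansions are nested too deeply. */
int push_source(struct input_stack *st, const char *text)
{
    assert(st != NULL && text != NULL);
    if (st->depth == MAX_DEPTH)
        return -1;
    st->cursor[st->depth++] = text;
    return 0;
}

/* Returns the next character, popping sources as they run out,
 * or -1 once every source is exhausted. */
int next_char(struct input_stack *st)
{
    assert(st != NULL);
    while (st->depth > 0) {
        const char **top = &st->cursor[st->depth - 1];
        if (**top != '\0')
            return (unsigned char)*(*top)++;
        st->depth--;   /* expansion finished: resume the outer source */
    }
    return -1;
}
```

Pushing an expansion in the middle of the main text makes its
characters appear next; when it runs out, reading continues in the
main text where it left off.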
4960 File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ
4962 How can I build a two-pass scanner?
4963 ===================================
4965 One way to do it is to filter the first pass to a temporary file, then
4966 process the temporary file on the second pass. You will probably see a
4967 performance hit, due to all the disk I/O.
4969 When you need to look ahead far forward like this, it almost always
4970 means that the right solution is to build a parse tree of the entire
4971 input, then walk it after the parse in order to generate the output. In
4972 a sense, this is a two-pass approach, once through the text and once
4973 through the parse tree, but the performance hit for the latter is
4974 usually an order of magnitude smaller, since everything is already
4975 classified, in binary format, and residing in memory.
4978 File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
4980 How do I match any string not matched in the preceding rules?
4981 =============================================================
4983 One way to assign precedence is to place the more specific rules first.
4984 If two rules would match the same input (same sequence of characters)
4985 then the first rule listed in the 'flex' input wins, e.g.,
4988 foo[a-zA-Z_]+ return FOO_ID;
4989 bar[a-zA-Z_]+ return BAR_ID;
4990 [a-zA-Z_]+ return GENERIC_ID;
4992 Note that the rule '[a-zA-Z_]+' must come *after* the others. It
4993 will match the same amount of text as the more specific rules, and in
4994 that case the 'flex' scanner will pick the first rule listed in your
4995 scanner as the one to match.
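The selection rule (longest match wins; on a tie, the rule listed first
wins) can be imitated by hand for the three patterns above. This is a
sketch only: 'classify', 'ident_len' and 'prefixed_len' are
hypothetical helpers, and a generated scanner does all of this with a
single table-driven pass rather than by testing each rule in turn.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { NO_MATCH, FOO_ID, BAR_ID, GENERIC_ID };

/* Length of the leading run of [a-zA-Z_] characters in s
 * (isalpha() is assumed to run in the default "C" locale). */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Match length of prefix[a-zA-Z_]+ at the start of s, or 0. */
static size_t prefixed_len(const char *s, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t rest;

    if (strncmp(s, prefix, plen) != 0)
        return 0;
    rest = ident_len(s + plen);
    return rest > 0 ? plen + rest : 0;
}

/* Picks the rule the way the text describes: the longest match wins,
 * and on a tie the earliest listed rule wins. */
enum token classify(const char *s, size_t *matched)
{
    const enum token tok[3] = { FOO_ID, BAR_ID, GENERIC_ID };
    size_t len[3];
    size_t best = 0;
    enum token result = NO_MATCH;

    assert(s != NULL && matched != NULL);
    len[0] = prefixed_len(s, "foo");
    len[1] = prefixed_len(s, "bar");
    len[2] = ident_len(s);
    for (int i = 0; i < 3; i++) {
        if (len[i] > best) {   /* strict '>' keeps the earlier rule */
            best = len[i];
            result = tok[i];
        }
    }
    *matched = best;
    return result;
}
```

Note that the input 'foo' on its own is classified as 'GENERIC_ID',
because 'foo[a-zA-Z_]+' requires at least one more character after the
prefix.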
4998 File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
5000 I am trying to port code from AT&T lex that uses yysptr and yysbuf.
5001 ===================================================================
5003 Those are internal variables pointing into the AT&T scanner's input
5004 buffer. I imagine they're being manipulated in user versions of the
5005 'input()' and 'unput()' functions. If so, what you need to do is
5006 analyze those functions to figure out what they're doing, and then
5007 replace 'input()' with an appropriate definition of 'YY_INPUT'. You
5008 shouldn't need to (and must not) replace 'flex''s 'unput()' function.
5011 File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
5013 Is there a way to make flex treat NULL like a regular character?
5014 ================================================================
5016 Yes, '\0' and '\x00' should both do the trick. Perhaps you have an
5017 ancient version of 'flex'. The latest release is version 2.6.4.
5020 File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesn't flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
5022 Whenever flex can not match the input it says "flex scanner jammed".
5023 ====================================================================
5025 You need to add a rule that matches the otherwise-unmatched text, e.g.,
5029 [[a bunch of rules here]]
5031 . printf("bad input character '%s' at line %d\n", yytext, yylineno);
5033 See '%option default' for more information.
5036 File: flex.info, Node: Why doesn't flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
5038 Why doesn't flex have non-greedy operators like perl does?
5039 ==========================================================
5041 A DFA can do a non-greedy match by stopping the first time it enters an
5042 accepting state, instead of consuming input until it determines that no
5043 further matching is possible (a "jam" state). This is actually easier
5044 to implement than longest leftmost match (which flex does).
5046 But it's also much less useful than longest leftmost match. In
5047 general, when you find yourself wishing for non-greedy matching, that's
5048 usually a sign that you're trying to make the scanner do some parsing.
5049 That's generally the wrong approach, since it lacks the power to do a
5050 decent job. Better is to either introduce a separate parser, or to
5051 split the scanner into multiple scanners using (exclusive) start
5052 conditions.
5054 You might have a separate start state once you've seen the 'BEGIN'.
5055 In that state, you might then have a regex that will match 'END' (to
5056 kick you out of the state), and perhaps '(.|\n)' to get a single
5057 character within the chunk ...
5059 This approach also has much better error-reporting properties.
5062 File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesn't flex have non-greedy operators like perl does?, Up: FAQ
5064 Memory leak - 16386 bytes allocated by malloc.
5065 ==============================================
5067 UPDATED 2002-07-10: As of 'flex' version 2.5.9, this leak means that you
5068 did not call 'yylex_destroy()'. If you are using an earlier version of
5069 'flex', then read on.
5071 The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
5072 read-buffer, and about 40 for 'struct yy_buffer_state' (depending upon
5073 alignment). The leak is in the non-reentrant C scanner only (NOT in the
5074 reentrant scanner, NOT in the C++ scanner). Since 'flex' doesn't know
5075 when you are done, the buffer is never freed.
5077 However, the leak won't multiply since the buffer is reused no matter
5078 how many times you call 'yylex()'.
5080 If you want to reclaim the memory when you are completely done
5081 scanning, then you might try this:
5083 /* For non-reentrant C scanner only. */
5084 yy_delete_buffer(YY_CURRENT_BUFFER);
5085 yy_init = 1;
5087 Note: 'yy_init' is an "internal variable", and hasn't been tested in
5088 this situation. It is possible that some other globals may need
5089 resetting as well.
5092 File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
5094 How do I track the byte offset for lseek()?
5095 ===========================================
5097 > We thought that it would be possible to have this number through the
5098 > evaluation of the following expression:
5100 > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
5102 While this is the right idea, it has two problems. The first is that
5103 it's possible that 'flex' will request less than 'YY_READ_BUF_SIZE'
5104 during an invocation of 'YY_INPUT' (or that your input source will
5105 return less even though 'YY_READ_BUF_SIZE' bytes were requested). The
5106 second problem is that when refilling its internal buffer, 'flex' keeps
5107 some characters from the previous buffer (because usually it's in the
5108 middle of a match, and needs those characters to construct 'yytext' for
5109 the match once it's done). Because of this, 'yy_c_buf_p -
5110 YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
5111 already read from the current buffer.
5113 An alternative solution is to count the number of characters you've
5114 matched since starting to scan. This can be done by using
5115 'YY_USER_ACTION'. For example,
5117 #define YY_USER_ACTION num_chars += yyleng;
5119 (You need to be careful to update your bookkeeping if you use
5120 'yymore()', 'yyless()', 'unput()', or 'input()'.)
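The bookkeeping mentioned in the parenthesis can be sketched as a set
of small adjustments. The names below ('struct offset_tracker' and the
'on_...' helpers) are hypothetical; the idea is to call each helper
from the place in your scanner where the corresponding 'flex' facility
is used.

```c
#include <assert.h>
#include <stddef.h>

struct offset_tracker {
    long num_chars;   /* bytes consumed from the input so far */
};

/* What YY_USER_ACTION does: count the text of the current match. */
void on_match(struct offset_tracker *t, int len)
{
    assert(t != NULL && len >= 0);
    t->num_chars += len;
}

/* After yyless(n): all but the first n matched bytes are returned. */
void on_yyless(struct offset_tracker *t, int len_before, int n)
{
    assert(t != NULL && n >= 0 && n <= len_before);
    t->num_chars -= len_before - n;
}

/* After yymore(): the next match will include the current text
 * again, so take it out now to avoid counting it twice. */
void on_yymore(struct offset_tracker *t, int len)
{
    assert(t != NULL && len >= 0);
    t->num_chars -= len;
}

/* After input(): one byte was consumed outside of any match. */
void on_input(struct offset_tracker *t)
{
    assert(t != NULL);
    t->num_chars++;
}

/* After unput(): one byte was pushed back. */
void on_unput(struct offset_tracker *t)
{
    assert(t != NULL);
    t->num_chars--;
}
```

The resulting count is the offset of the scanner's logical position,
which is the value to use with 'lseek()', not the position that the
operating system reports for the underlying file.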
5123 File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
5125 How do I use my own I/O classes in a C++ scanner?
5126 =================================================
5128 When the flex C++ scanning class rewrite finally happens, then this sort
5129 of thing should become much easier.
5131 You can do this by passing the various functions (such as
5132 'LexerInput()' and 'LexerOutput()') NULL 'iostream*''s, and then dealing
5133 with your own I/O classes surreptitiously (i.e., stashing them in
5134 special member variables). This works because the only assumption
5135 the lexer makes about the iostreams is that they're
5136 ultimately passed to 'LexerInput()' and 'LexerOutput()', which then do
5137 whatever is necessary with them.
5140 File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
5142 How do I skip as many chars as possible?
5143 ========================================
5145 How do I skip as many chars as possible - without interfering with the
5146 other patterns?
5148 In the example below, we want to skip over characters until we see
5149 the phrase "endskip". The following will _NOT_ work correctly (do you
5150 see why not?)
5152 /* INCORRECT SCANNER */
5155 <INITIAL>startskip BEGIN(SKIP);
5157 <SKIP>"endskip" BEGIN(INITIAL);
5160 The problem is that the pattern '.*' will eat up the word "endskip".
5161 The simplest (but slow) fix is:
5163 <SKIP>"endskip" BEGIN(INITIAL);
5166 The fix involves making the second rule match more, without making it
5167 match "endskip" plus something else. So for example:
5169 <SKIP>"endskip" BEGIN(INITIAL);
5171 <SKIP>. ;/* so you eat up e's, too */
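The chunked approach can be mirrored in plain C to show why it is both
correct and fast. This is a sketch with a hypothetical function name,
'skip_to_endskip'; it assumes the chunk rule is one that consumes a run
of characters other than 'e', so that a chunk can never swallow the
start of "endskip".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer just past the first "endskip" in s,
 * or NULL if the phrase never appears. */
const char *skip_to_endskip(const char *s)
{
    static const char marker[] = "endskip";
    const size_t marker_len = sizeof marker - 1;

    assert(s != NULL);
    while (*s != '\0') {
        if (strncmp(s, marker, marker_len) == 0)
            return s + marker_len;   /* the "endskip" rule */
        if (*s != 'e')
            s += strcspn(s, "e");    /* chunk rule: a run without 'e' */
        else
            s++;                     /* single-character rule: lone 'e' */
    }
    return NULL;
}
```

Most of the input is consumed a chunk at a time; only positions that
hold an 'e' are examined individually.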
5174 File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
5182 Vern Paxson took over
5183 the Software Tools lex project from Jef Poskanzer in 1982. At that point it
5184 was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
5185 a legend was born :-).
5188 File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
5190 Are certain equivalent patterns faster than others?
5191 ===================================================
5193 To: Adoram Rogel <adoram@orna.hybridge.com>
5194 Subject: Re: Flex 2.5.2 performance questions
5195 In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
5196 Date: Wed, 18 Sep 96 10:51:02 PDT
5197 From: Vern Paxson <vern>
5199 [Note, the most recent flex release is 2.5.4, which you can get from
5200 ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
5202 > 1. Using the pattern
5203 > ([Ff](oot)?)?[Nn](ote)?(\.)?
5205 > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
5206 > (in a very complicated flex program) caused the program to slow from
5207 > 300K+/min to 100K/min (no other changes were done).
5209 These two are not equivalent. For example, the first can match "footnote."
5210 but the second can only match "footnote". This is almost certainly the
5211 cause of the discrepancy - the slower scanner run is matching more tokens,
5212 and/or having to do more backing up.
5214 > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
5216 From a performance point of view, they're equivalent (modulo presumably
5217 minor effects such as memory cache hit rates; and the presence of trailing
5218 context, see below). From a space point of view, the first is slightly
5219 preferable.
5221 > 3. I have a pattern that look like this:
5222 > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
5224 > running yet another complicated program that includes the following rule:
5225 > <snext>{and}/{no4}{bb}{pats}
5227 > gets me to "too complicated - over 32,000 states"...
5229 I can't tell from this example whether the trailing context is variable-length
5230 or fixed-length (it could be the latter if {and} is fixed-length). If it's
5231 variable length, which flex -p will tell you, then this reflects a basic
5232 performance problem, and if you can eliminate it by restructuring your
5233 scanner, you will see significant improvement.
5235 > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
5236 > 10 patterns and changed the rule to be 5 rules.
5237 > This did compile, but what is the rule of thumb here ?
5239 The rule is to avoid trailing context other than fixed-length, in which for
5240 a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
5241 of the '|' operator automatically makes the pattern variable length, so in
5242 this case '[Ff]oot' is preferred to '(F|f)oot'.
5244 > 4. I changed a rule that looked like this:
5245 > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
5247 > to the next 2 rules:
5248 > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
5249 > <snext8>{and}{bb}/{ROMAN} { BEGIN...
5251 > Again, I understand the using [^...] will cause a great performance loss
5253 Actually, it doesn't cause any sort of performance loss. It's a surprising
5254 fact about regular expressions that they always match in linear time
5255 regardless of how complex they are.
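A minimal sketch, not part of the original reply, of why a negated class such as [^A-Za-z] costs nothing extra at match time: a table-driven DFA performs one table lookup per input byte, whatever the pattern looked like. The pattern [A-Za-z]+[^A-Za-z], the state names, and the function matches() are illustrative assumptions, and this is not the table layout flex actually generates.

```c
#include <stddef.h>

enum { DEAD = -1, START = 0, IN_WORD = 1, ACCEPT = 2, NSTATES = 3 };

static int trans[NSTATES][256];
static int table_ready = 0;

/* Fill the transition table for [A-Za-z]+[^A-Za-z].  The negated class
 * only changes which cells are filled in, not how the table is used. */
static void build_table(void)
{
    for (int s = 0; s < NSTATES; s++)
        for (int c = 0; c < 256; c++)
            trans[s][c] = DEAD;

    for (int c = 0; c < 256; c++) {
        int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (is_letter) {
            trans[START][c] = IN_WORD;
            trans[IN_WORD][c] = IN_WORD;
        } else {
            trans[IN_WORD][c] = ACCEPT;
        }
    }
    table_ready = 1;
}

/* Returns 1 if the whole buffer matches the pattern.  *steps receives
 * the number of table lookups, which never exceeds len. */
int matches(const char *s, size_t len, size_t *steps)
{
    if (!table_ready)
        build_table();

    int state = START;
    *steps = 0;
    for (size_t i = 0; i < len && state != DEAD; i++) {
        state = trans[state][(unsigned char)s[i]];
        (*steps)++;
    }
    return state == ACCEPT;
}
```

Calling matches("foo;", 4, &n) performs four lookups, one per byte; a more elaborate pattern would change the size of the table but not the shape of the loop.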
5257 > but are there any specific rules about it ?
5259 See the "Performance Considerations" section of the man page, and also
5260 the example in MISC/fastwc/.
5265 File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
5267 Is backing up a big deal?
5268 =========================
5270 To: Adoram Rogel <adoram@hybridge.com>
5271 Subject: Re: Flex 2.5.2 performance questions
5272 In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
5273 Date: Thu, 19 Sep 96 09:58:00 PDT
5274 From: Vern Paxson <vern>
5276 > a lot about the backing up problem.
5277 > I believe that there lies my biggest problem, and I'll try to improve
5280 Since you have variable trailing context, this is a bigger performance
5281 problem. Fixing it is usually easier than fixing backing up, which in a
5282 complicated scanner (yours seems to fit the bill) can be extremely
5283 difficult to do correctly.
5285 You also don't mention what flags you are using for your scanner.
5286 -f makes a large speed difference, and -Cfe buys you nearly as much
5287 speed but the resulting scanner is considerably smaller.
5289 > I have an | operator in {and} and in {pats} so both of them are variable
5292 -p should have reported this.
5294 > Is changing one of them to fixed-length is enough ?
5298 > Is it possible to change the 32,000 states limit ?
5300 Yes. I've appended instructions on how. Before you make this change,
5301 though, you should think about whether there are ways to fundamentally
5302 simplify your scanner - those are certainly preferable!
5306 To increase the 32K limit (on a machine with 32 bit integers), you increase
5307 the magnitude of the following in flexdef.h:
5309 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5310 #define MAXIMUM_MNS 31999
5311 #define BAD_SUBSCRIPT -32767
5312 #define MAX_SHORT 32700
5314 Adding a 0 or two after each should do the trick.
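Applied literally, with two zeros appended to each value, the edit would look something like the sketch below. The resulting numbers only illustrate the advice above and are not tested settings; the definitions in a given copy of flexdef.h may differ.

```c
/* flexdef.h after appending two zeros to each limit (illustrative values) */
#define JAMSTATE -3276600	/* marks a reference to the state that always jams */
#define MAXIMUM_MNS 3199900
#define BAD_SUBSCRIPT -3276700
#define MAX_SHORT 3270000
```

All four values stay well inside the range of a 32-bit int, which is why the advice is limited to machines with 32 bit integers.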
5317 File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
5319 Can I fake multi-byte character support?
5320 ========================================
5322 To: Heeman_Lee@hp.com
5323 Subject: Re: flex - multi-byte support?
5324 In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
5325 Date: Fri, 04 Oct 1996 11:42:18 PDT
5326 From: Vern Paxson <vern>
5328 > I assume as long as my *.l file defines the
5329 > range of expected character code values (in octal format), flex will
5330 > scan the file and read multi-byte characters correctly. But I have no
5331 > confidence in this assumption.
5333 Your lack of confidence is justified - this won't work.
5335 Flex has in it a widespread assumption that the input is processed
5336 one byte at a time. Fixing this is on the to-do list, but is involved,
5337 so it won't happen any time soon. In the interim, the best I can suggest
5338 (unless you want to try fixing it yourself) is to write your rules in
5339 terms of pairs of bytes, using definitions in the first section:
5344 foo{X}bar found_foo_fe_c2_bar();
5346 etc. Definitely a pain - sorry about that.
5348 By the way, the email address you used for me is ancient, indicating you
5349 have a very old version of flex. You can get the most recent, 2.5.4, from
5355 File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
5360 To: moleary@primus.com
5361 Subject: Re: Flex / Unicode compatibility question
5362 In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
5363 Date: Tue, 22 Oct 1996 11:06:13 PDT
5364 From: Vern Paxson <vern>
5366 Unfortunately flex at the moment has a widespread assumption within it
5367 that characters are processed 8 bits at a time. I don't see any easy
5368 fix for this (other than writing your rules in terms of double characters -
5369 a pain). I also don't know of a wider lex, though you might try surfing
5370 the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
5371 toolkit (try searching say Alta Vista for "Purdue Compiler Construction
5374 Fixing flex to handle wider characters is on the long-term to-do list.
5375 But since flex is a strictly spare-time project these days, this probably
5376 won't happen for quite a while, unless someone else does it first.
5381 File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
5383 Can you discuss some flex internals?
5384 ====================================
5386 To: Johan Linde <jl@theophys.kth.se>
5387 Subject: Re: translation of flex
5388 In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
5389 Date: Mon, 11 Nov 1996 10:33:50 PST
5390 From: Vern Paxson <vern>
5392 > I'm working for the Swedish team translating GNU program, and I'm currently
5393 > working with flex. I have a few questions about some of the messages which
5394 > I hope you can answer.
5396 All of the things you're wondering about, by the way, concerning flex
5397 internals - probably the only person who understands what they mean in
5398 English is me! So I wouldn't worry too much about getting them right.
5402 > msgid " %d protos created\n"
5404 > Does proto mean prototype?
5406 Yes - prototypes of state compression tables.
5409 > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
5411 > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
5412 > However, 'template next-check entries' doesn't make much sense to me. To be
5413 > able to find a good translation I need to know a little bit more about it.
5415 There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
5416 scanner tables. It involves creating two pairs of tables. The first has
5417 "base" and "default" entries, the second has "next" and "check" entries.
5418 The "base" entry is indexed by the current state and yields an index into
5419 the next/check table. The "default" entry gives what to do if the state
5420 transition isn't found in next/check. The "next" entry gives the next
5421 state to enter, but only if the "check" entry verifies that this entry is
5422 correct for the current state. Flex creates templates of series of
5423 next/check entries and then encodes differences from these templates as a
5424 way to compress the tables.
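The lookup described above can be sketched in a few lines of C. This is not part of the original reply and follows only the textbook base/default/next/check scheme; it does not model flex's templates, and the three-state automaton, the array contents, and the name next_state() are made up for illustration.

```c
/* Symbols: 0 = 'a', 1 = 'b', 2 = 'c'.  States: 0, 1, 2.
 *
 * Uncompressed transitions:
 *   state 0: a->1  b->0  c->0
 *   state 1: a->1  b->2  c->0   (differs from state 0 only on 'b')
 *   state 2: a->1  b->0  c->0   (identical to state 0)
 */
#define NO_STATE (-1)

static const int base_tbl[3]  = { 0, 2, 2 };          /* per state: offset into next/check */
static const int def_tbl[3]   = { NO_STATE, 0, 0 };   /* per state: fallback state */
static const int next_tbl[5]  = { 1, 0, 0, 2, 0 };    /* candidate next states */
static const int check_tbl[5] = { 0, 0, 0, 1, NO_STATE }; /* owner of each slot */

/* Return the state reached from 'state' on 'symbol', or NO_STATE. */
int next_state(int state, int symbol)
{
    while (state != NO_STATE) {
        int slot = base_tbl[state] + symbol;
        if (check_tbl[slot] == state)   /* slot really belongs to this state */
            return next_tbl[slot];
        state = def_tbl[state];         /* otherwise retry in the default state */
    }
    return NO_STATE;
}
```

Here nine transitions are stored in four occupied next/check slots: state 1 keeps only its 'b' transition, and states 1 and 2 borrow everything else from state 0 through their "default" entries.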
5427 > msgid " %d/%d base-def entries created\n"
5429 > The same problem here for 'base-def'.
5436 File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
5438 unput() messes up yy_at_bol
5439 ===========================
5441 To: Xinying Li <xli@npac.syr.edu>
5443 In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
5444 Date: Wed, 13 Nov 1996 19:51:54 PST
5445 From: Vern Paxson <vern>
5447 > "unput()" them to input flow, question occurs. If I do this after I scan
5448 > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
5449 > means the carriage flag has gone.
5451 You can control this by calling yy_set_bol(). It's described in the manual.
5453 > And if in pre-reading it goes to the end of file, is anything done
5454 > to control the end of curren buffer and end of file?
5456 No, there's no way to put back an end-of-file.
5458 > By the way I am using flex 2.5.2 and using the "-l".
5460 The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
5461 2.5.3. You can get it from ftp.ee.lbl.gov.
5466 File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
5468 The | operator is not doing what I want
5469 =======================================
5471 To: Alain.ISSARD@st.com
5472 Subject: Re: Start condition with FLEX
5473 In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
5474 Date: Mon, 18 Nov 1996 10:41:34 PST
5475 From: Vern Paxson <vern>
5477 > I am not able to use the start condition scope and to use the | (OR) with
5478 > rules having start conditions.
5480 The problem is that if you use '|' as a regular expression operator, for
5481 example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
5482 any blanks around it. If you instead want the special '|' *action* (which
5483 from your scanner appears to be the case), which is a way of giving two
5484 different rules the same action:
5486 foo |
5487 bar matched_foo_or_bar();
5489 then '|' *must* be separated from the first rule by whitespace and *must*
5490 be followed by a new line. You *cannot* write it as:
5492 foo | bar matched_foo_or_bar();
5494 even though you might think you could because yacc supports this syntax.
5495 The reason for this unfortunate incompatibility is historical, but it's
5496 unlikely to be changed.
5498 Your problems with start condition scope are simply due to syntax errors
5499 from your use of '|' later confusing flex.
5501 Let me know if you still have problems.
5506 File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ
5508 Why can't flex understand this variable trailing context pattern?
5509 =================================================================
5511 To: Gregory Margo <gmargo@newton.vip.best.com>
5512 Subject: Re: flex-2.5.3 bug report
5513 In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
5514 Date: Sat, 23 Nov 1996 17:07:32 PST
5515 From: Vern Paxson <vern>
5517 > Enclosed is a lex file that "real" lex will process, but I cannot get
5518 > flex to process it. Could you try it and maybe point me in the right direction?
5520 Your problem is that some of the definitions in the scanner use the '/'
5521 trailing context operator, and have it enclosed in ()'s. Flex does not
5522 allow this operator to be enclosed in ()'s because doing so allows undefined
5523 regular expressions such as "(a/b)+". So the solution is to remove the
5524 parentheses. Note that you must also be building the scanner with the -l
5525 option for AT&T lex compatibility. Without this option, flex automatically
5526 encloses the definitions in parentheses.
5531 File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
5533 The ^ operator isn't working
5534 ============================
5536 To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
5537 Subject: Re: Flex Bug ?
5538 In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
5539 Date: Tue, 26 Nov 1996 11:15:05 PST
5540 From: Vern Paxson <vern>
5542 > In my lexer code, i have the line :
5545 > Thus all lines starting with an astrix (*) are comment lines.
5546 > This does not work !
5548 I can't get this problem to reproduce - it works fine for me. Note
5549 though that if what you have is slightly different:
5555 then it won't work, because flex pushes back macro definitions enclosed
5556 in ()'s, so the rule becomes
5560 and now that the '^' operator is not at the immediate beginning of the
5561 line, it's interpreted as just a regular character. You can avoid this
5562 behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
5567 File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
5569 Trailing context is getting confused with trailing optional patterns
5570 ====================================================================
5572 To: Adoram Rogel <adoram@hybridge.com>
5573 Subject: Re: Flex 2.5.4 BOF ???
5574 In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
5575 Date: Wed, 27 Nov 1996 10:56:25 PST
5576 From: Vern Paxson <vern>
5578 > Organization(s)?/[a-z]
5580 > This matched "Organizations" (looking in debug mode, the trailing s
5581 > was matched with trailing context instead of the optional (s) in the
5584 That should only happen with lex. Flex can properly match this pattern.
5585 (That might be what you're saying, I'm just not sure.)
5587 > Is there a way to avoid this dangerous trailing context problem ?
5589 Unfortunately, there's no easy way. On the other hand, I don't see why
5590 it should be a problem. Lex's matching is clearly wrong, and I'd hope
5591 that usually the intent remains the same as expressed with the pattern,
5592 so flex's matching will be correct.
5597 File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
5602 To: Cameron MacKinnon <mackin@interlog.com>
5603 Subject: Re: Flex documentation bug
5604 In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
5605 Date: Sun, 01 Dec 1996 22:29:39 PST
5606 From: Vern Paxson <vern>
5608 > I'm not sure how or where to submit bug reports (documentation or
5609 > otherwise) for the GNU project stuff ...
5611 Well, strictly speaking flex isn't part of the GNU project. They just
5612 distribute it because no one's written a decent GPL'd lex replacement.
5613 So you should send bugs directly to me. Those sent to the GNU folks
5614 sometimes find their way to me, but some may drop between the cracks.
5616 > In GNU Info, under the section 'Start Conditions', and also in the man
5617 > page (mine's dated April '95) is a nice little snippet showing how to
5618 > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
5619 > size. Unfortunately, no overflow checking is ever done ...
5621 This is already mentioned in the manual:
5623 Finally, here's an example of how to match C-style quoted
5624 strings using exclusive start conditions, including expanded
5625 escape sequences (but not including checking for a string
5628 The reason for not doing the overflow checking is that it will needlessly
5629 clutter up an example whose main purpose is just to demonstrate how to
5632 The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
5637 File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
5642 To: tsv@cs.UManitoba.CA
5643 Subject: Re: Flex (reg)..
5644 In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
5645 Date: Thu, 06 Mar 1997 15:54:19 PST
5646 From: Vern Paxson <vern>
5648 > [:alpha:] ([:alnum:] | \\_)*
5650 If your rule really has embedded blanks as shown above, then it won't
5651 work, as the first blank delimits the rule from the action. (It wouldn't
5652 even compile ...) You need instead:
5654 [:alpha:]([:alnum:]|\\_)*
5656 and that should work fine - there's no restriction on what can go inside
5657 of ()'s except for the trailing context operator, '/'.
5662 File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
5664 I need to scan if-then-else blocks and while loops
5665 ==================================================
5667 To: "Mike Stolnicki" <mstolnic@ford.com>
5668 Subject: Re: FLEX help
5669 In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
5670 Date: Fri, 30 May 1997 10:46:35 PDT
5671 From: Vern Paxson <vern>
5673 > We'd like to add "if-then-else", "while", and "for" statements to our
5675 > We've investigated many possible solutions. The one solution that seems
5676 > the most reasonable involves knowing the position of a TOKEN in yyin.
5678 I strongly advise you to instead build a parse tree (abstract syntax tree)
5679 and loop over that instead. You'll find this has major benefits in keeping
5680 your interpreter simple and extensible.
5682 That said, the functionality you mention for get_position and set_position
5683 has been on the to-do list for a while. As flex is a purely spare-time
5684 project for me, no guarantees when this will be added (in particular, it
5685 for sure won't be for many months to come).
5690 File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
5695 To: Colin Paul Adams <colin@colina.demon.co.uk>
5696 Subject: Re: Flex C++ classes and Bison
5697 In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
5698 Date: Fri, 15 Aug 1997 10:48:19 PDT
5699 From: Vern Paxson <vern>
5701 > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
5704 > I have been trying to get this to work as a C++ scanner, but it does
5705 > not appear to be possible (warning that it matches no declarations in
5706 > yyFlexLexer, or something like that).
5708 > Is this supposed to be possible, or is it being worked on (I DID
5709 > notice the comment that scanner classes are still experimental, so I'm
5712 What you need to do is derive a subclass from yyFlexLexer that provides
5713 the above yylex() method, squirrels away lvalp and parm into member
5714 variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
5719 File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
5724 To: Mikael.Latvala@lmf.ericsson.se
5725 Subject: Re: Possible mistake in Flex v2.5 document
5726 In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
5727 Date: Fri, 05 Sep 1997 10:01:54 PDT
5728 From: Vern Paxson <vern>
5730 > In that example you show how to count comment lines when using
5731 > C style /* ... */ comments. My question is, shouldn't you take into
5732 > account a scenario where end of a comment marker occurs inside
5733 > character or string literals?
5735 The scanner certainly needs to also scan character and string literals.
5736 However it does that (there's an example in the man page for strings), the
5737 lexer will recognize the beginning of the literal before it runs across the
5738 embedded "/*". Consequently, it will finish scanning the literal before it
5739 even considers the possibility of matching "/*".
5743 '([^']*|{ESCAPE_SEQUENCE})'
5745 will match all the text between the ''s (inclusive). So the lexer
5746 considers this as a token beginning at the first ', and doesn't even
5747 attempt to match other tokens inside it.
5749 I think this subtlety is not worth putting in the manual, as I suspect
5750 it would confuse more people than it would enlighten.
5755 File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
5760 To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
5761 Subject: Re: flex limitations
5762 In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
5763 Date: Mon, 08 Sep 1997 11:38:08 PDT
5764 From: Vern Paxson <vern>
5767 > [a-zA-Z]+ /* skip a line */
5768 > { printf("got %s\n", yytext); }
5771 What version of flex are you using? If I feed this to 2.5.4, it complains:
5773 "bug.l", line 5: EOF encountered inside an action
5774 "bug.l", line 5: unrecognized rule
5775 "bug.l", line 5: fatal parse error
5777 Not the world's greatest error message, but it manages to flag the problem.
5779 (With the introduction of start condition scopes, flex can't accommodate
5780 an action on a separate line, since it's ambiguous with an indented rule.)
5782 You can get 2.5.4 from ftp.ee.lbl.gov.
5787 File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
5789 Is there a repository for flex scanners?
5790 ========================================
5792 Not that we know of. You might try asking on comp.compilers.
5795 File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
5797 How can I conditionally compile or preprocess my flex input file?
5798 =================================================================
5800 Flex doesn't have a preprocessor like C does. You might try using m4,
5801 or the C preprocessor plus a sed script to clean up the result.
5804 File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
5806 Where can I find grammars for lex and yacc?
5807 ===========================================
5809 In the sources for flex and bison.
5812 File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
5814 I get an end-of-buffer message for each character scanned.
5815 ==========================================================
5817 This will happen if your LexerInput() function returns only one
5818 character at a time, which can happen either if your scanner is
5819 "interactive", or if the streams library on your platform always returns
5820 1 for yyin->gcount().
5822 Solution: override LexerInput() with a version that returns whole
5826 File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
5831 To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
5832 Subject: Re: Flex maximums
5833 In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
5834 Date: Mon, 17 Nov 1997 17:16:15 PST
5835 From: Vern Paxson <vern>
5837 > I took a quick look into the flex-sources and altered some #defines in
5840 > #define INITIAL_MNS 64000
5841 > #define MNS_INCREMENT 1024000
5842 > #define MAXIMUM_MNS 64000
5844 The things to fix are to add a couple of zeroes to:
5846 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5847 #define MAXIMUM_MNS 31999
5848 #define BAD_SUBSCRIPT -32767
5849 #define MAX_SHORT 32700
5851 and, if you get complaints about too many rules, make the following change too:
5853 #define YY_TRAILING_MASK 0x200000
5854 #define YY_TRAILING_HEAD_MASK 0x400000
5859 File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
5864 To: jimmey@lexis-nexis.com (Jimmey Todd)
5865 Subject: Re: FLEX question regarding istream vs ifstream
5866 In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
5867 Date: Mon, 15 Dec 1997 13:21:35 PST
5868 From: Vern Paxson <vern>
5870 > stdin_handle = YY_CURRENT_BUFFER;
5871 > ifstream fin( "aFile" );
5872 > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
5874 > What I'm wanting to do, is pass the contents of a file thru one set
5875 > of rules and then pass stdin thru another set... It works great if, I
5876 > don't use the C++ classes. But since everything else that I'm doing is
5877 > in C++, I thought I'd be consistent.
5879 > The problem is that 'yy_create_buffer' is expecting an istream* as it's
5880 > first argument (as stated in the man page). However, fin is a ifstream
5881 > object. Any ideas on what I might be doing wrong? Any help would be
5882 > appreciated. Thanks!!
5884 You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
5885 Then its type will be compatible with the expected istream*, because ifstream
5886 is derived from istream.
5891 File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
5896 To: Enda Fadian <fadiane@piercom.ie>
5897 Subject: Re: Question related to Flex man page?
5898 In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
5899 Date: Tue, 16 Dec 1997 14:17:09 PST
5900 From: Vern Paxson <vern>
5902 > Can you explain to me what is ment by a long-jump in relation to flex?
5904 Using the longjmp() function while inside yylex() or a routine called by it.
5906 > what is the flex activation frame.
5908 Just yylex()'s stack frame.
5910 > As far as I can see yyrestart will bring me back to the sart of the input
5911 > file and using flex++ isnot really an option!
5913 No, yyrestart() doesn't imply a rewind, even though its name might sound
5914 like it does. It tells the scanner to flush its internal buffers and
5915 start reading from the given file at its present location.
5920 File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
5925 To: hassan@larc.info.uqam.ca (Hassan Alaoui)
5926 Subject: Re: Need urgent Help
5927 In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
5928 Date: Sun, 21 Dec 1997 21:30:46 PST
5929 From: Vern Paxson <vern>
5931 > /usr/lib/yaccpar: In function `int yyparse()':
5932 > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
5934 > ld: Undefined symbol
5939 This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
5940 the fix is to explicitly insert some 'extern "C"' statements for the
5941 corresponding routines/symbols.
5946 File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
5951 To: mc0307@mclink.it
5952 Cc: gnu@prep.ai.mit.edu
5953 Subject: Re: [mc0307@mclink.it: Help request]
5954 In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
5955 Date: Sun, 21 Dec 1997 22:33:37 PST
5956 From: Vern Paxson <vern>
5958 > This is my definition for float and integer types:
5962 > I've tested my program on other lex version (on UNIX Sun Solaris an HP
5963 > UNIX) and it work well, so I think that my definitions are correct.
5964 > There are any differences between Lex and Flex?
5966 There are indeed differences, as discussed in the man page. The one
5967 you are probably running into is that when flex expands a name definition,
5968 it puts parentheses around the expansion, while lex does not. There's
5969 an example in the man page of how this can lead to different matching.
5970 Flex's behavior complies with the POSIX standard (or at least with the
5971 last POSIX draft I saw).
5976 File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
5981 To: hassan@larc.info.uqam.ca (Hassan Alaoui)
5983 In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
5984 Date: Mon, 22 Dec 1997 14:35:05 PST
5985 From: Vern Paxson <vern>
5987 > Thank you very much for your help. I compile and link well with C++ while
5988 > declaring 'yylex ...' extern, But a little problem remains. I get a
5989 > segmentation default when executing ( I linked with lfl library) while it
5990 > works well when using LEX instead of flex. Do you have some ideas about the
5993 The one possible reason for this that comes to mind is if you've defined
5994 yytext as "extern char yytext[]" (which is what lex uses) instead of
5995 "extern char *yytext" (which is what flex uses). If it's not that, then
5996 I'm afraid I don't know what the problem might be.
6001 File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
6006 To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
6007 Subject: Re: flex 2.5: c++ scanners & start conditions
6008 In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
6009 Date: Tue, 06 Jan 1998 19:19:30 PST
6010 From: Vern Paxson <vern>
6012 > The problem is that when I do this (using %option c++) start
6013 > conditions seem to not apply.
6015 The BEGIN macro modifies the yy_start variable. For C scanners, this
6016 is a static with scope visible through the whole file. For C++ scanners,
6017 it's a member variable, so it only has visible scope within a member
6018 function. Your lexbegin() routine is not a member function when you
6019 build a C++ scanner, so it's not modifying the correct yy_start. The
6020 diagnostic that indicates this is that you found you needed to add
6021 a declaration of yy_start in order to get your scanner to compile when
6022 using C++; instead, the correct fix is to make lexbegin() a member
6023 function (by deriving from yyFlexLexer).
6028 File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
6033 To: "Boris Zinin" <boris@ippe.rssi.ru>
6034 Subject: Re: current position in flex buffer
6035 In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
6036 Date: Mon, 12 Jan 1998 12:03:15 PST
6037 From: Vern Paxson <vern>
6039 > The problem is how to determine the current position in flex active
6040 > buffer when a rule is matched....
6042 You will need to keep track of this explicitly, such as by redefining
6043 YY_USER_ACTION to count the number of characters matched.
6045 The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
6050 File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
6055 To: Bik.Dhaliwal@bis.org
6056 Subject: Re: Flex question
6057 In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
6058 Date: Tue, 27 Jan 1998 22:41:52 PST
6059 From: Vern Paxson <vern>
6061 > That requirement involves knowing
6062 > the character position at which a particular token was matched
6065 The way you have to do this is by explicitly keeping track of where
6066 you are in the file, by counting the number of characters scanned
6067 for each token (available in yyleng). It may prove convenient to
6068 do this by redefining YY_USER_ACTION, as described in the manual.
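A minimal sketch of that bookkeeping, not part of the original reply. So that it compiles without flex, yyleng is declared here as a stand-in for the variable a generated scanner provides, and simulate_match() is a hypothetical helper that plays the role of the scanner matching one token; the counter names are assumptions as well.

```c
/* Stand-in for the yyleng that a flex-generated scanner provides. */
static int yyleng;

/* Offset of the current token, and of the first character after it. */
static long token_offset = 0;
static long next_offset  = 0;

/* In a real scanner this definition would go in the definitions section
 * of the .l file; the scanner runs YY_USER_ACTION before each rule's
 * action, so the counters are current when the action starts. */
#define YY_USER_ACTION  token_offset = next_offset; next_offset += yyleng;

/* Hypothetical helper: pretend the scanner just matched 'length'
 * characters, and report where that token started. */
long simulate_match(int length)
{
    yyleng = length;
    YY_USER_ACTION
    return token_offset;
}
```

Matching tokens of length 3, 1, and 5 in turn yields starting offsets 0, 3, and 4. Actions that change how much text is consumed, such as yyless(), yymore(), unput(), or REJECT, would likely need matching adjustments to the counters.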
6073 File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
6078 To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
6079 Subject: Re: flex: how to control start condition from parser?
6080 In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
6081 Date: Tue, 27 Jan 1998 22:45:37 PST
6082 From: Vern Paxson <vern>
6084 > It seems useful for the parser to be able to tell the lexer about such
6085 > context dependencies, because then they don't have to be limited to
6086 > local or sequential context.
6088 One way to do this is to have the parser call a stub routine that's
6089 included in the scanner's .l file, and consequently that has access to
6090 BEGIN. The only ugliness is that the parser can't pass in the state
6091 it wants, because those aren't visible - but if you don't have many
6092 such states, then using a different set of names doesn't seem like
6093 too much of a burden.
6095 While generating a .h file like you suggest is certainly cleaner,
6096 flex development has come to a virtual stand-still :-(, so a workaround
6097 like the above is much more pragmatic than waiting for a new feature.
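A sketch of such a stub, not part of the original reply. So that it compiles on its own, INITIAL, BEGIN, and the start condition SC_TYPENAME are defined here as stand-ins; in a real .l file the first two come from flex and the third would be declared with %x. The names lexer_mode and lexer_set_mode() are hypothetical.

```c
/* Stand-ins for what the generated scanner would supply. */
#define INITIAL      0
#define SC_TYPENAME  1
static int current_start_condition = INITIAL;
#define BEGIN(sc) (current_start_condition = (sc))

/* The "different set of names" that the parser is allowed to see; these
 * would live in a small hand-written header shared with the parser. */
enum lexer_mode { LEXER_MODE_NORMAL, LEXER_MODE_TYPENAME };

/* Stub routine placed in the user-code section of the .l file, where
 * BEGIN and the start condition names are visible. */
void lexer_set_mode(enum lexer_mode mode)
{
    switch (mode) {
    case LEXER_MODE_TYPENAME:
        BEGIN(SC_TYPENAME);
        break;
    case LEXER_MODE_NORMAL:
    default:
        BEGIN(INITIAL);
        break;
    }
}
```

The parser then calls lexer_set_mode(LEXER_MODE_TYPENAME) from a grammar action and never needs to know the scanner's own start condition names.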
6102 File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
6107 To: Barbara Denny <denny@3com.com>
6108 Subject: Re: freebsd flex bug?
6109 In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
6110 Date: Fri, 30 Jan 1998 12:42:32 PST
6111 From: Vern Paxson <vern>
6113 > lex.yy.c:1996: parse error before `='
6115 This is the key, identifying this error. (It may help to pinpoint
6116 it by using flex -L, so it doesn't generate #line directives in its
6117 output.) I will bet you heavy money that you have a start condition
6118 name that is also a variable name, or something like that; flex spits
6119 out #define's for each start condition name, mapping them to a number,
6120 so you can wind up with:
6131 and the penultimate will turn into "int 1 = 3" after C preprocessing,
6132 since flex will put "#define foo 1" in the generated scanner.
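A small C sketch of the collision and one way around it. The names SC_FOO and bar() are hypothetical; the define only mimics what the reply says flex emits for a start condition.

```c
/* For a start condition named "foo", the generated scanner would contain
 * (per the reply above):
 *
 *     #define foo 1
 *
 * after which "int foo = 3;" preprocesses to "int 1 = 3;" and the compiler
 * reports a parse error before '='.
 *
 * Giving start conditions a distinctive prefix (here a hypothetical SC_,
 * i.e. declaring SC_FOO instead of foo in the .l file) keeps the macro
 * out of the way of ordinary C identifiers.
 */
#define SC_FOO 1

int bar(void)
{
    int foo = 3;   /* no longer rewritten by the preprocessor */

    return foo;
}
```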
6137 File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
6142 To: Maurice Petrie <mpetrie@infoscigroup.com>
6143 Subject: Re: Lost flex .l file
6144 In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
6145 Date: Mon, 02 Feb 1998 11:15:12 PST
6146 From: Vern Paxson <vern>
6148 > I am curious as to
6149 > whether there is a simple way to backtrack from the generated source to
6150 > reproduce the lost list of tokens we are searching on.
6152 In theory, it's straight-forward to go from the DFA representation
6153 back to a regular-expression representation - the two are isomorphic.
6154 In practice, a huge headache, because you have to unpack all the tables
6155 back into a single DFA representation, and then write a program to munch
6156 on that and translate it into an RE.
6158 Sorry for the less-than-happy news ...
6163 File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
6168 To: jimmey@lexis-nexis.com (Jimmey Todd)
6169 Subject: Re: Flex performance question
6170 In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6171 Date: Thu, 19 Feb 1998 08:48:51 PST
6172 From: Vern Paxson <vern>
6174 > What I have found, is that the smaller the data chunk, the faster the
6175 > program executes. This is the opposite of what I expected. Should this be
6176 > happening this way?
6178 This is exactly what will happen if your input file has embedded NULs.
6181 A final note: flex is slow when matching NUL's, particularly
6182 when a token contains multiple NUL's. It's best to write
6183 rules which match short amounts of text if it's anticipated
6184 that the text will often include NUL's.
6186 So that's the first thing to look for.
6191 File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
6196 To: jimmey@lexis-nexis.com (Jimmey Todd)
6197 Subject: Re: Flex performance question
6198 In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6199 Date: Thu, 19 Feb 1998 15:42:25 PST
6200 From: Vern Paxson <vern>
6202 So there are several problems.
6204 First, to go fast, you want to match as much text as possible, which
6205 your scanners don't in the case that what they're scanning is *not*
6206 a <RN> tag. So you want a rule like:
6210 Second, C++ scanners are particularly slow if they're interactive,
6211 which they are by default. Using -B speeds it up by a factor of 3-4
6214 Third, C++ scanners that use the istream interface are slow, because
6215 of how poorly implemented istream's are. I built two versions of
6216 the following scanner:
6223 and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
6224 The C++ istream version, using -B, takes 3.8 seconds.
6229 File: flex.info, Node: unnamed-faq-76, Next: unnamed-faq-77, Prev: unnamed-faq-75, Up: FAQ
6234 To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
6235 Subject: Re: FLEX 2.5 & THE YEAR 2000
6236 In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
6237 Date: Wed, 03 Jun 1998 10:22:26 PDT
6238 From: Vern Paxson <vern>
6240 > I am researching the Y2K problem with General Electric R&D
6241 > and need to know if there are any known issues concerning
6242 > the above mentioned software and Y2K regardless of version.
6244 There shouldn't be, all it ever does with the date is ask the system
6245 for it and then print it out.
6250 File: flex.info, Node: unnamed-faq-77, Next: unnamed-faq-78, Prev: unnamed-faq-76, Up: FAQ
6255 To: "Hans Dermot Doran" <htd@ibhdoran.com>
6256 Subject: Re: flex problem
6257 In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
6258 Date: Tue, 21 Jul 1998 14:23:34 PDT
6259 From: Vern Paxson <vern>
6261 > To overcome this, I gets() the stdin into a string and lex the string. The
6262 > string is lexed OK except that the end of string isn't lexed properly
6263 > (yy_scan_string()), that is the lexer doesn't recognise the end of string.
6265 Flex doesn't contain mechanisms for recognizing buffer endpoints. But if
6266 you use fgets instead (which you should anyway, to protect against buffer
6267 overflows), then the final \n will be preserved in the string, and you can
6268 scan that in order to find the end of the string.
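A minimal C sketch of the fgets() approach. The helper name read_line() and its return convention are assumptions made for this illustration; only fgets() and strlen() are standard library calls.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf with fgets().
 * Returns  1 if a complete line (ending in '\n') was read,
 *          0 if the line was truncated or the input ended without '\n',
 *         -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}

/* The filled buffer would then be handed to the scanner, for example with
 * yy_scan_string(buf), and a rule matching \n can mark the end of the
 * string. */
```

Unlike gets(), fgets() never writes more than size bytes, and the return value tells the caller whether the newline the scanner relies on is actually present.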
6273 File: flex.info, Node: unnamed-faq-78, Next: unnamed-faq-79, Prev: unnamed-faq-77, Up: FAQ
6278 To: soumen@almaden.ibm.com
6279 Subject: Re: Flex++ 2.5.3 instance member vs. static member
6280 In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
6281 Date: Tue, 28 Jul 1998 01:10:34 PDT
6282 From: Vern Paxson <vern>
6292 > Now you'd expect mylineno to be a member of each instance of class
6293 > yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to
6294 > indicate otherwise; unless I am missing something the declaration of
6295 > mylineno seems to be outside any class scope.
6297 > How will this work if I want to run a multi-threaded application with each
6298 > thread creating a FlexLexer instance?
6300 Derive your own subclass and make mylineno a member variable of it.
6305 File: flex.info, Node: unnamed-faq-79, Next: unnamed-faq-80, Prev: unnamed-faq-78, Up: FAQ
6310 To: Adoram Rogel <adoram@hybridge.com>
6311 Subject: Re: More than 32K states change hangs
6312 In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
6313 Date: Tue, 04 Aug 1998 22:28:45 PDT
6314 From: Vern Paxson <vern>
6318 > I followed your advice, posted on Usenet by you, and emailed to me
6319 > personally by you, on how to overcome the 32K states limit. I'm running
6320 > on Linux machines.
6321 > I took the full source of version 2.5.4 and did the following changes in
6323 > #define JAMSTATE -327660
6324 > #define MAXIMUM_MNS 319990
6325 > #define BAD_SUBSCRIPT -327670
6326 > #define MAX_SHORT 327000
6329 > All looked fine, including check and bigcheck, so I installed.
6331 Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
6332 archives I see that I did indeed recommend doing so. Try setting it back
6333 to 32700; that should be enough that you no longer need -Ca. If it still
6334 hangs, then the interesting question is - where?
6336 > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
6337 > distribution of Linux)
6338 > flex 2.5.4 binary works.
6340 Since Linux comes with source code, you should diff it against what
6341 you have to see what problems they missed.
6343 > Should I always compile with the -Ca option now ? even short and simple
6346 No, definitely not. It's meant to be for those situations where you
6347 absolutely must squeeze every last cycle out of your scanner.
6352 File: flex.info, Node: unnamed-faq-80, Next: unnamed-faq-81, Prev: unnamed-faq-79, Up: FAQ
6357 To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
6358 Subject: Re: flex output for static code portion
6359 In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
6360 Date: Mon, 17 Aug 1998 23:57:42 PDT
6361 From: Vern Paxson <vern>
6363 > I would like to use flex under the hood to generate a binary file
6364 > containing the data structures that control the parse.
6366 This has been on the wish-list for a long time. In principle it's
6367 straight-forward - you redirect mkdata() et al's I/O to another file,
6368 and modify the skeleton to have a start-up function that slurps these
6369 into dynamic arrays. The concerns are (1) the scanner generation code
6370 is hairy and full of corner cases, so it's easy to get surprised when
6371 going down this path :-( ; and (2) being careful about buffering so
6372 that when the tables change you make sure the scanner starts in the
6373 correct state and reading at the right point in the input file.
6375 > I was wondering if you know of anyone who has used flex in this way.
6377 I don't - but it seems like a reasonable project to undertake (unlike
6378 numerous other flex tweaks :-).
6383 File: flex.info, Node: unnamed-faq-81, Next: unnamed-faq-82, Prev: unnamed-faq-80, Up: FAQ
6395 From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
6397 Subject: "flex scanner push-back overflow"
6399 Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
6410 Yesterday, I encountered a strange problem: I use the macro processor m4
6411 to include some lengthy lists into a .l file. Following is a flex macro
6412 definition that causes some serious pain in my neck:
6414 AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])
6416 The complete list contains about 10kB. When I try to "flex" this file
6417 (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
6418 some of the predefined values in flexdefs.h) I get the error:
6420 myflex/flex -8 sentag.tmp.l
6421 flex scanner push-back overflow
6423 When I remove the slashes in the macro definition everything works fine.
6424 As I understand it, the double quotes escape the slash-character so it
6425 really means "/" and not "trailing context". Furthermore, I tried to
6426 escape the slashes with backslashes, but with no use, the same error message
6427 appeared when flexing the code.
6429 Do you have an idea what's going on here?
6431 Greetings from Germany,
6434 Georg Rehm georg@cl-ki.uni-osnabrueck.de
6435 Institute for Semantic Information Processing, University of Osnabrueck, FRG
6438 File: flex.info, Node: unnamed-faq-82, Next: unnamed-faq-83, Prev: unnamed-faq-81, Up: FAQ
6443 To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6444 Subject: Re: "flex scanner push-back overflow"
6445 In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
6446 Date: Thu, 20 Aug 1998 07:05:35 PDT
6447 From: Vern Paxson <vern>
6449 > myflex/flex -8 sentag.tmp.l
6450 > flex scanner push-back overflow
6452 Flex itself uses a flex scanner. That scanner is running out of buffer
6453 space when it tries to unput() the humongous macro you've defined. When
6454 you remove the '/'s, you make it small enough so that it fits in the buffer;
6455 removing spaces would do the same thing.
6457 The fix is either to rethink why you're using such a big macro (perhaps
6458 there's another, better way to do it), or to rebuild flex's own
6459 scan.c with a larger value for
6461 #define YY_BUF_SIZE 16384
6466 File: flex.info, Node: unnamed-faq-83, Next: unnamed-faq-84, Prev: unnamed-faq-82, Up: FAQ
6471 To: Jan Kort <jan@research.techforce.nl>
6473 In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
6474 Date: Sat, 05 Sep 1998 00:59:49 PDT
6475 From: Vern Paxson <vern>
6479 > "TEST1\n" { fprintf(stderr, "TEST1\n"); yyless(5); }
6480 > ^\n { fprintf(stderr, "empty line\n"); }
6482 > \n { fprintf(stderr, "new line\n"); }
6485 > -- input ---------------------------------------
6487 > -- output --------------------------------------
6490 > ------------------------------------------------
6492 IMHO, it's not clear whether or not this is in fact a bug. It depends
6493 on whether you view yyless() as backing up in the input stream, or as
6494 pushing new characters onto the beginning of the input stream. Flex
6495 interprets it as the latter (for implementation convenience, I'll admit),
6496 and so considers the newline as in fact matching at the beginning of a
6497 line, as after all the last token scanned an entire line and so the
6498 scanner is now at the beginning of a new line.
6500 I agree that this is counter-intuitive for yyless(), given its
6501 functional description (it's less so for unput(), depending on whether
6502 you're unput()'ing new text or scanned text). But I don't plan to
6503 change it any time soon, as it's a pain to do so. Consequently,
6504 you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
6505 your scanner into the behavior you desire.
6507 Sorry for the less-than-completely-satisfactory answer.
6512 File: flex.info, Node: unnamed-faq-84, Next: unnamed-faq-85, Prev: unnamed-faq-83, Up: FAQ
6517 To: Patrick Krusenotto <krusenot@mac-info-link.de>
6518 Subject: Re: Problems with restarting flex-2.5.2-generated scanner
6519 In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
6520 Date: Thu, 24 Sep 1998 23:28:43 PDT
6521 From: Vern Paxson <vern>
6523 > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
6524 > trying to make my scanner restart with a new file after my parser stops
6525 > with a parse error. When my compiler restarts, the parser always
6526 > receives the token after the token (in the old file!) that caused the
6529 I suspect the problem is that your parser has read ahead in order
6530 to attempt to resolve an ambiguity, and when it's restarted it picks
6531 up with that token rather than reading a fresh one. If you're using
6532 yacc, then the special "error" production can sometimes be used to
6533 consume tokens in an attempt to get the parser into a consistent state.
6538 File: flex.info, Node: unnamed-faq-85, Next: unnamed-faq-86, Prev: unnamed-faq-84, Up: FAQ
6543 To: Henric Jungheim <junghelh@pe-nelson.com>
6544 Subject: Re: flex 2.5.4a
6545 In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
6546 Date: Tue, 27 Oct 1998 16:50:14 PST
6547 From: Vern Paxson <vern>
6549 > This brings up a feature request: How about a command line
6550 > option to specify the filename when reading from stdin? That way one
6551 > doesn't need to create a temporary file in order to get the "#line"
6552 > directives to make sense.
6554 Use -o combined with -t (per the man page description of -o).
6556 > P.S., Is there any simple way to use non-blocking IO to parse multiple
6561 One approach might be to return a magic character on EWOULDBLOCK and
6564 .*<magic-character> // put back .*, eat magic character
6566 This is off the top of my head, not sure it'll work.
6571 File: flex.info, Node: unnamed-faq-86, Next: unnamed-faq-87, Prev: unnamed-faq-85, Up: FAQ
6576 To: "Repko, Billy D" <billy.d.repko@intel.com>
6577 Subject: Re: Compiling scanners
6578 In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
6579 Date: Thu, 14 Jan 1999 00:25:30 PST
6580 From: Vern Paxson <vern>
6582 > It appears that maybe it cannot find the lfl library.
6584 The Makefile in the distribution builds it, so you should have it.
6585 It's exceedingly trivial, just a main() that calls yylex() and
6586 a yywrap() that always returns 1.
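As a rough C sketch of what that library amounts to, assuming a stand-in yylex() so that the fragment compiles on its own. The driver is named scan_all() here (a hypothetical name) and counts tokens purely for illustration; the library version is simply a main() containing the same loop.

```c
/* Stand-in for a flex-generated yylex(): hands out three tokens, then 0. */
static int tokens_left = 3;

int yylex(void)
{
    return tokens_left > 0 ? tokens_left-- : 0;
}

/* Tell the scanner there is no further input once end of file is reached. */
int yywrap(void)
{
    return 1;
}

/* The loop that the library's main() runs: call yylex() until it returns 0.
 * The token count is an addition for this sketch only. */
int scan_all(void)
{
    int count = 0;

    while (yylex() != 0)
        count++;
    return count;
}
```

If the library cannot be found at link time, supplying these two functions yourself (or using "%option noyywrap" and your own main()) removes the dependency.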
6589 > \n ++num_lines; ++num_chars;
6592 You can't indent your rules like this - that's where the errors are coming
6593 from. Flex copies indented text to the output file, it's how you do things
6596 int num_lines_seen = 0;
6598 to declare local variables.
6603 File: flex.info, Node: unnamed-faq-87, Next: unnamed-faq-88, Prev: unnamed-faq-86, Up: FAQ
6608 To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
6609 Subject: Re: flex input buffer
6610 In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
6611 Date: Tue, 09 Feb 1999 21:03:37 PST
6612 From: Vern Paxson <vern>
6614 > In the flex.skl file the size of the default input buffers is set. Can you
6615 > explain why this size is set and why it is such a high number.
6617 It's large to optimize performance when scanning large files. You can
6618 safely make it a lot lower if needed.
6623 File: flex.info, Node: unnamed-faq-88, Next: unnamed-faq-90, Prev: unnamed-faq-87, Up: FAQ
6628 To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
6629 Subject: Re: Flex error message
6630 In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
6631 Date: Thu, 25 Feb 1999 00:11:31 PST
6632 From: Vern Paxson <vern>
6634 > I'm extending a larger scanner written in Flex and I keep running into
6635 > problems. More specifically, I get the error message:
6636 > "flex: input rules are too complicated (>= 32000 NFA states)"
6638 Increase the definitions in flexdef.h for:
6640 #define JAMSTATE -32766 /* marks a reference to the state that always jams */
6642 #define MAXIMUM_MNS 31999
6643 #define BAD_SUBSCRIPT -32767
6645 recompile everything, and it should all work.
6650 File: flex.info, Node: unnamed-faq-90, Next: unnamed-faq-91, Prev: unnamed-faq-88, Up: FAQ
6655 To: "Dmitriy Goldobin" <gold@ems.chel.su>
6656 Subject: Re: FLEX trouble
6657 In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
6658 Date: Tue, 01 Jun 1999 00:15:07 PDT
6659 From: Vern Paxson <vern>
6661 > I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
6662 > but rule "/*"(.|\n)*"*/" don't work ?
6664 The second of these will have to scan the entire input stream (because
6665 "(.|\n)*" matches an arbitrary amount of any text) in order to see if
6666 it ends with "*/", terminating the comment. That potentially will overflow
6669 > More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
6670 > 'unrecognized rule'.
6672 You can't use the '/' operator inside parentheses. It's not clear
6673 what "(a/b)*" actually means.
6675 > I now use workaround with state <comment>, but single-rule is
6678 Single-rule is nice but will always have the problem of either setting
6679 restrictions on comments (like not allowing multi-line comments) and/or
6680 running the risk of consuming the entire input stream, as noted above.
6685 File: flex.info, Node: unnamed-faq-91, Next: unnamed-faq-92, Prev: unnamed-faq-90, Up: FAQ
6695 Date: Tue, 15 Jun 1999 08:55:43 -0700
6696 From: "Aki Niimura" <neko@my-deja.com>
6703 Subject: A question on flex C++ scanner
6711 I have been using flex for years.
6712 It works very well on many projects.
6713 Most case, I used it to generate a scanner on C language.
6714 However, one project I needed to generate a scanner
6715 on C++ language. Thanks to your enhancement, flex did
6718 Currently, I'm working on enhancing my previous project.
6719 I need to deal with multiple input streams (recursive
6720 inclusion) in this scanner (C++).
6721 I did similar thing for another scanner (C) as you
6722 explained in your documentation.
6724 The generated scanner (C++) has necessary methods:
6725 - switch_to_buffer(struct yy_buffer_state *b)
6726 - yy_create_buffer(istream *is, int sz)
6727 - yy_delete_buffer(struct yy_buffer_state *b)
6729 However, I couldn't figure out how to access current
6730 buffer (yy_current_buffer).
6732 yy_current_buffer is a protected member of yyFlexLexer.
6733 I can't access it directly.
6734 Then, I thought yy_create_buffer() with is = 0 might
6735 return current stream buffer. But it seems not as far
6736 as I checked the source. (flex 2.5.4)
6738 I went through the Web in addition to Flex documentation.
6739 However, it hasn't been successful, so far.
6741 It is not my intention to bother you, but, can you
6742 comment about how to obtain the current stream buffer?
6744 Your response would be highly appreciated.
6753 File: flex.info, Node: unnamed-faq-92, Next: unnamed-faq-93, Prev: unnamed-faq-91, Up: FAQ
6758 To: neko@my-deja.com
6759 Subject: Re: A question on flex C++ scanner
6760 In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
6761 Date: Tue, 15 Jun 1999 09:04:24 PDT
6762 From: Vern Paxson <vern>
6764 > However, I couldn't figure out how to access current
6765 > buffer (yy_current_buffer).
6767 Derive your own subclass from yyFlexLexer.
6772 File: flex.info, Node: unnamed-faq-93, Next: unnamed-faq-94, Prev: unnamed-faq-92, Up: FAQ
6777 To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
6778 Subject: Re: You're the man to see?
6779 In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
6780 Date: Wed, 23 Jun 1999 09:01:40 PDT
6781 From: Vern Paxson <vern>
6783 > I hope you can help me. I am using Flex and Bison to produce an interpreted
6784 > language. However all goes well until I try to implement an IF statement or
6785 > a WHILE. I cannot get this to work as the parser parses all the conditions
6786 > eg. the TRUE and FALSE conditions to check for a rule match. So I cannot
6789 You need to use the parser to build a parse tree (= abstract syntax tree),
6790 and when that's all done you recursively evaluate the tree, binding variables
6791 to values at that time.
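A minimal C sketch of that idea, under stated assumptions: the node layout, the node kinds, the vars[] store and eval() are all hypothetical names invented for this illustration. In a real interpreter the parser actions would allocate these nodes instead of executing anything while parsing.

```c
#include <stddef.h>

enum node_kind { N_NUM, N_VAR, N_ADD, N_LESS, N_ASSIGN, N_SEQ, N_IF, N_WHILE };

struct node {
    enum node_kind kind;
    int value;                /* N_NUM: literal; N_VAR, N_ASSIGN: variable slot */
    struct node *a, *b, *c;   /* children; their meaning depends on kind */
};

int vars[26];                 /* hypothetical variable store, slots 0..25 */

/* Recursively evaluate a tree after parsing has finished. */
int eval(const struct node *n)
{
    int result = 0;

    if (n == NULL)
        return 0;
    switch (n->kind) {
    case N_NUM:    return n->value;
    case N_VAR:    return vars[n->value];
    case N_ADD:    return eval(n->a) + eval(n->b);
    case N_LESS:   return eval(n->a) < eval(n->b);
    case N_ASSIGN: return vars[n->value] = eval(n->a);
    case N_SEQ:    eval(n->a); return eval(n->b);
    case N_IF:     /* only the selected branch is evaluated */
        return eval(n->a) ? eval(n->b) : eval(n->c);
    case N_WHILE:  /* condition and body are re-evaluated on every pass */
        while (eval(n->a))
            result = eval(n->b);
        return result;
    }
    return 0;
}
```

Because the whole statement is available as a tree, the evaluator decides which branch to run and how often to repeat a loop body, which the parser cannot do while it is still reducing rules.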
6796 File: flex.info, Node: unnamed-faq-94, Next: unnamed-faq-95, Prev: unnamed-faq-93, Up: FAQ
6801 To: Petr Danecek <petr@ics.cas.cz>
6802 Subject: Re: flex - question
6803 In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
6804 Date: Fri, 02 Jul 1999 16:52:13 PDT
6805 From: Vern Paxson <vern>
6807 > file, it takes an enormous amount of time. It is funny, because the
6808 > source code has only 12 rules!!! I think it looks like an exponential
6811 Right, that's the problem - some patterns (those with a lot of
6812 ambiguity, which yours has because at any given time the scanner can
6813 be in the middle of all sorts of combinations of the different
6814 rules) blow up exponentially.
6816 For your rules, there is an easy fix. Change the ".*" that comes after
6817 the directory name to "[^ ]*". With that in place, the rules are no
6818 longer nearly so ambiguous, because then once one of the directories
6819 has been matched, no other can be matched (since they all require a
6822 If that's not an acceptable solution, then you can enter a start state
6823 to pick up the .*\n after each directory is matched.
6825 Also note that for speed, you'll want to add a ".*" rule at the end,
6826 otherwise lines that don't match any of the patterns will be matched
6827 very slowly, a character at a time.
6832 File: flex.info, Node: unnamed-faq-95, Next: unnamed-faq-96, Prev: unnamed-faq-94, Up: FAQ
6837 To: Tielman Koekemoer <tielman@spi.co.za>
6838 Subject: Re: Please help.
6839 In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
6840 Date: Thu, 08 Jul 1999 08:20:39 PDT
6841 From: Vern Paxson <vern>
6843 > I was hoping you could help me with my problem.
6845 > I tried compiling (gnu)flex on a Solaris 2.4 machine
6846 > but when I ran make (after configure) I got an error.
6848 > --------------------------------------------------------------
6849 > gcc -c -I. -I. -g -O parse.c
6850 > ./flex -t -p ./scan.l >scan.c
6851 > sh: ./flex: not found
6853 > make: Fatal error: Command failed for target `scan.c'
6854 > -------------------------------------------------------------
6856 > What's strange to me is that I'm only
6857 > trying to install flex now. I then edited the Makefile to
6858 > and changed where it says "FLEX = flex" to "FLEX = lex"
6859 > ( lex: the native Solaris one ) but then it complains about
6860 > the "-p" option. Is there any way I can compile flex without
6861 > using flex or lex?
6863 > Thanks so much for your time.
6865 You managed to step on the bootstrap sequence, which first copies
6866 initscan.c to scan.c in order to build flex. Try fetching a fresh
6867 distribution from ftp.ee.lbl.gov. (Or you can first try removing
6868 ".bootstrap" and doing a make again.)
6873 File: flex.info, Node: unnamed-faq-96, Next: unnamed-faq-97, Prev: unnamed-faq-95, Up: FAQ
6878 To: Tielman Koekemoer <tielman@spi.co.za>
6879 Subject: Re: Please help.
6880 In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
6881 Date: Fri, 09 Jul 1999 00:27:20 PDT
6882 From: Vern Paxson <vern>
6884 > First I removed .bootstrap (and ran make) - no luck. I downloaded the
6885 > software but I still have the same problem. Is there anything else I
6890 cp initscan.c scan.c
6894 If this last tries to first build scan.c from scan.l using ./flex, then
6895 your "make" is broken, in which case compile scan.c to scan.o by hand.
6900 File: flex.info, Node: unnamed-faq-97, Next: unnamed-faq-98, Prev: unnamed-faq-96, Up: FAQ
6905 To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
6907 In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
6908 Date: Tue, 20 Jul 1999 00:18:26 PDT
6909 From: Vern Paxson <vern>
6911 > I am getting a compilation error. The error is given as "unknown symbol- yylex".
6913 The parser relies on calling yylex(), but you're instead using the C++ scanning
6914 class, so you need to supply a yylex() "glue" function that calls yylex() on an
6915 instance of the scanner (e.g., "scanner->yylex()").
6920 File: flex.info, Node: unnamed-faq-98, Next: unnamed-faq-99, Prev: unnamed-faq-97, Up: FAQ
6925 To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
6927 In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
6928 Date: Tue, 23 Nov 1999 15:54:30 PST
6929 From: Vern Paxson <vern>
6931 Well, your problem is the
6933 switch (yybgin-yysvec-1) { /* witchcraft */
6935 at the beginning of lex rules. "witchcraft" == "non-portable". It's
6936 assuming knowledge of the AT&T lex's internal variables.
6938 For flex, you can probably do the equivalent using a switch on YYSTATE.
6943 File: flex.info, Node: unnamed-faq-99, Next: unnamed-faq-100, Prev: unnamed-faq-98, Up: FAQ
6948 To: archow@hss.hns.com
6949 Subject: Re: Regarding distribution of flex and yacc based grammars
6950 In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
6951 Date: Wed, 22 Dec 1999 01:56:24 PST
6952 From: Vern Paxson <vern>
6954 > When we provide the customer with an object code distribution, is it
6955 > necessary for us to provide source
6956 > for the generated C files from flex and bison since they are generated by
6959 For flex, no. I don't know what the current state of this is for bison.
6961 > Also, is there any requirement for us to necessarily provide source for
6962 > the grammar files which are fed into flex and bison ?
6964 Again, for flex, no.
6966 See the file "COPYING" in the flex distribution for the legalese.
6971 File: flex.info, Node: unnamed-faq-100, Next: unnamed-faq-101, Prev: unnamed-faq-99, Up: FAQ
6976 To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
6977 Subject: Re: Flex, and self referencing rules
6978 In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
6979 Date: Sat, 19 Feb 2000 18:33:16 PST
6980 From: Vern Paxson <vern>
6982 > However, I do not use unput anywhere. I do use self-referencing
6985 > UnaryExpr ({UnionExpr})|("-"{UnaryExpr})
6987 You can't do this - flex is *not* a parser like yacc (which does indeed
6988 allow recursion), it is a scanner that's confined to regular expressions.
6993 File: flex.info, Node: unnamed-faq-101, Next: What is the difference between YYLEX_PARAM and YY_DECL?, Prev: unnamed-faq-100, Up: FAQ
6998 To: slg3@lehigh.edu (SAMUEL L. GULDEN)
6999 Subject: Re: Flex problem
7000 In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
7001 Date: Thu, 02 Mar 2000 23:00:46 PST
7002 From: Vern Paxson <vern>
7004 If this is exactly your program:
7008 > whitespace [ \t\n]+
7011 > "[" { printf("open_brac\n");}
7012 > "]" { printf("close_brac\n");}
7013 > "+" { printf("addop\n");}
7014 > "*" { printf("multop\n");}
7015 > {digits} { printf("NUMBER = %s\n", yytext);}
7018 then the problem is that the last rule needs to be "{whitespace}" !
7023 File: flex.info, Node: What is the difference between YYLEX_PARAM and YY_DECL?, Next: Why do I get "conflicting types for yylex" error?, Prev: unnamed-faq-101, Up: FAQ
7025 What is the difference between YYLEX_PARAM and YY_DECL?
7026 =======================================================
7028 YYLEX_PARAM is not a flex symbol. It is for Bison. It tells Bison to
7029 pass extra params when it calls yylex() from the parser.
7031 YY_DECL is the Flex declaration of yylex. The default is similar to
7034 #define YY_DECL int yylex ()
7037 File: flex.info, Node: Why do I get "conflicting types for yylex" error?, Next: How do I access the values set in a Flex action from within a Bison action?, Prev: What is the difference between YYLEX_PARAM and YY_DECL?, Up: FAQ
7039 Why do I get "conflicting types for yylex" error?
7040 =================================================
7042 This is a compiler error regarding a generated Bison parser, not a Flex
7043 scanner. It means you need a prototype of yylex() at the top of the
7044 Bison file. Be sure the prototype matches YY_DECL.
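One way to keep the prototype and the definition in step is to write both in terms of YY_DECL. The following C sketch assumes a hypothetical extra parameter lvalp and a stand-in function body; a real scanner body is generated by flex.

```c
/* Hypothetical signature: the scanner stores a token's value through lvalp. */
#define YY_DECL int yylex(int *lvalp)

/* Prototype.  This is the declaration the parser file needs to see. */
YY_DECL;

/* Definition.  The generated scanner is written as "YY_DECL { ... }",
 * so whatever YY_DECL expands to becomes the function header. */
YY_DECL
{
    *lvalp = 42;   /* stand-in for an action setting a semantic value */
    return 1;      /* stand-in token code */
}
```

If the parser declares yylex() differently from this expansion, the compiler reports the "conflicting types" error. On the parser side the matching extra argument has to be passed as well, which is the job the previous answer assigns to YYLEX_PARAM.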
7047 File: flex.info, Node: How do I access the values set in a Flex action from within a Bison action?, Prev: Why do I get "conflicting types for yylex" error?, Up: FAQ
7049 How do I access the values set in a Flex action from within a Bison action?
7050 ===========================================================================
7052 With $1, $2, $3, etc. These are called "Semantic Values" in the Bison
7053 manual. See *note (bison)Top::.
7056 File: flex.info, Node: Appendices, Next: Indices, Prev: FAQ, Up: Top
7058 Appendix A Appendices
7059 *********************
7063 * Makefiles and Flex::
7069 File: flex.info, Node: Makefiles and Flex, Next: Bison Bridge, Prev: Appendices, Up: Appendices
7071 A.1 Makefiles and Flex
7072 ======================
7074 In this appendix, we provide tips for writing Makefiles to build your
7077 In a traditional build environment, we say that the '.c' files are
7078 the sources, and the '.o' files are the intermediate files. When using
7079 'flex', however, the '.l' files are the sources, and the generated '.c'
7080 files (along with the '.o' files) are the intermediate files. This
7081 requires you to carefully plan your Makefile.
7083 Modern 'make' programs understand that 'foo.l' is intended to
7084 generate 'lex.yy.c' or 'foo.c', and will behave accordingly(1)(2). The
7085 following Makefile does not explicitly instruct 'make' how to build
7086 'scan.c' from 'scan.l'. Instead, it relies on the implicit rules of the
7087 'make' program to build the intermediate file, 'scan.c':
7089 # Basic Makefile -- relies on implicit rules
7090 # Creates "myprogram" from "scan.l" and "myprogram.c"
7093 myprogram: scan.o myprogram.o
7097 For simple cases, the above may be sufficient. For other cases, you
7098 may have to explicitly instruct 'make' how to build your scanner. The
7099 following is an example of a Makefile containing explicit rules:
7101 # Basic Makefile -- provides explicit rules
7102 # Creates "myprogram" from "scan.l" and "myprogram.c"
7105 myprogram: scan.o myprogram.o
7106 $(CC) -o $@ $(LDFLAGS) $^
7108 myprogram.o: myprogram.c
7109 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7111 scan.o: scan.c
7112 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7114 scan.c: scan.l
7115 $(LEX) $(LFLAGS) -o $@ $^
7117 clean:
7118 $(RM) *.o scan.c
7121 Notice in the above example that 'scan.c' is in the 'clean' target.
7122 This is because we consider the file 'scan.c' to be an intermediate
7125 Finally, we provide a realistic example of a 'flex' scanner used with
7126 a 'bison' parser(3). There is a tricky problem we have to deal with.
7127 Since a 'flex' scanner will typically include a header file (e.g.,
7128 'y.tab.h') generated by the parser, we need to be sure that the header
7129 file is generated BEFORE the scanner is compiled. We handle this case
7130 in the following example:
7132 # Makefile example -- scanner and parser.
7133 # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
7138 objects = scan.o parse.o myprogram.o
7140 myprogram: $(objects)
7141 scan.o: scan.l parse.c
7143 myprogram.o: myprogram.c
7146 In the above example, notice the line,
7148 scan.o: scan.l parse.c
7150 which lists the file 'parse.c' (the generated parser) as a
7151 dependency of 'scan.o'. Because 'make' builds a target's dependencies
7152 first, the parser is generated before the scanner is compiled.
7153 Feel free to experiment with your specific implementation of
7156 For more details on writing Makefiles, see *note (make)Top::.
7158 ---------- Footnotes ----------
7160 (1) GNU 'make' and GNU 'automake' are two such programs that provide
7161 implicit rules for flex-generated scanners.
7163 (2) GNU 'automake' may generate code to execute flex in
7164 lex-compatible mode, or to stdout. If this is not what you want, then
7165 you should provide an explicit rule in your Makefile.am
7167 (3) This example also applies to yacc parsers.
7170 File: flex.info, Node: Bison Bridge, Next: M4 Dependency, Prev: Makefiles and Flex, Up: Appendices
7172 A.2 C Scanners with Bison Parsers
7173 =================================
7175 This section describes the 'flex' features useful when integrating
7176 'flex' with 'GNU bison'(1). Skip this section if you are not using
7177 'bison' with your scanner. Here we discuss only the 'flex' half of the
7178 'flex' and 'bison' pair. We do not discuss 'bison' in any detail. For
7179 more information about generating 'bison' parsers, see *note
7182 A 'bison'-compatible scanner is generated by declaring '%option
7183 bison-bridge' or by supplying '--bison-bridge' when invoking 'flex' from
7184 the command line. This instructs 'flex' that the macro 'yylval' may be
7185 used. The data type for 'yylval', 'YYSTYPE', is typically defined in a
7186 header file, included in section 1 of the 'flex' input file. For a list
7187 of functions and macros available, *Note bison-functions::.
7189 The declaration of yylex becomes,
7191 int yylex ( YYSTYPE * lvalp, yyscan_t scanner );
7193 If '%option bison-locations' is specified, then the declaration
7196 int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
7198 Note that the macros 'yylval' and 'yylloc' evaluate to pointers.
7199 Support for 'yylloc' is optional in 'bison', so it is optional in 'flex'
7200 as well. The following is an example of a 'flex' scanner that is
7201 compatible with 'bison'.
7203 /* Scanner for "C" assignment statements... sort of. */
7205 #include "y.tab.h" /* Generated by bison. */
7208 %option bison-bridge bison-locations
7211 [[:digit:]]+ { yylval->num = atoi(yytext); return NUMBER;}
7212 [[:alnum:]]+ { yylval->str = strdup(yytext); return STRING;}
7213 "="|";" { return yytext[0];}
7217 As you can see, there really is no magic here. We just use 'yylval'
7218 as we would any other variable. The data type of 'yylval' is generated
7219 by 'bison', and included in the file 'y.tab.h'. Here is the
7220 corresponding 'bison' parser:
7222 /* Parser to convert "C" assignments to lisp. */
7224 /* Pass the argument to yyparse through to yylex. */
7225 #define YYPARSE_PARAM scanner
7226 #define YYLEX_PARAM scanner
7238 STRING '=' NUMBER ';' {
7239 printf( "(setf %s %d)", $1, $3 );
7243 ---------- Footnotes ----------
7245 (1) The features described here are purely optional, and are by no
7246 means the only way to use flex with bison. We merely provide some glue
7247 to ease development of your parser-scanner pair.
7250 File: flex.info, Node: M4 Dependency, Next: Common Patterns, Prev: Bison Bridge, Up: Appendices
7255 The macro processor 'm4'(1) must be installed wherever flex is
7256 installed. 'flex' invokes 'm4', found by searching the directories in
7257 the 'PATH' environment variable. Any code you place in section 1 or in
7258 the actions will be sent through m4. Please follow these rules to
7259 protect your code from unwanted 'm4' processing.
7261 * Do not use symbols that begin with 'm4_', such as 'm4_define' or
7262 'm4_include', since those are reserved for 'm4' macro names. If
7263 for some reason you need 'm4_' as a prefix, use a preprocessor
7264 '#define' to get your symbol past 'm4' unmangled.
7266 * Do not use the strings '[[' or ']]' anywhere in your code. The
7267 former is not valid in C, except within comments and strings, but
7268 the latter is valid in code such as 'x[y[z]]'. The solution is
7269 simple. To get the literal string '"]]"', use '"]""]"'. To get
7270 the array notation 'x[y[z]]', use 'x[y[z] ]'. Flex will attempt to
7271 detect these sequences in user code, and escape them. However,
7272 it's best to avoid this complexity where possible, by removing such
7273 sequences from your code.
7275 'm4' is only required at the time you run 'flex'. The generated
7276 scanner is ordinary C or C++, and does _not_ require 'm4'.
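
The two workarounds for ']]' described above can be checked in plain C. The following is a minimal sketch, not taken from the flex sources; the function names 'closing_brackets' and 'nested_lookup' are hypothetical and exist only for illustration.

```c
/* Returns the two-character string of closing brackets without ever
 * writing the brackets side by side in the source: adjacent string
 * literals are concatenated by the C compiler. */
const char *closing_brackets(void)
{
    return "]" "]";
}

/* Nested subscript written with a space before the final bracket,
 * so the sequence that m4 treats as a quote never appears. */
int nested_lookup(const int *x, const int *y, int z)
{
    return x[y[z] ];
}
```

Both functions behave exactly as their unspaced equivalents would, so the rewrite changes only the source text that 'm4' sees, not the compiled scanner.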
7278 ---------- Footnotes ----------
7280 (1) The use of m4 is subject to change in future revisions of flex.
7281 It is not part of the public API of flex. Do not depend on it.
7284 File: flex.info, Node: Common Patterns, Prev: M4 Dependency, Up: Appendices
7289 This appendix provides examples of common regular expressions you might
7290 use in your scanner.
7296 * Quoted Constructs::
7300 File: flex.info, Node: Numbers, Next: Identifiers, Up: Common Patterns
7305 C99 decimal constant
7306 '([[:digit:]]{-}[0])[[:digit:]]*'
7308 C99 hexadecimal constant
7309 '0[xX][[:xdigit:]]+'
7314 C99 floating point constant
7315 {dseq} ([[:digit:]]+)
7316 {dseq_opt} ([[:digit:]]*)
7317 {frac} (({dseq_opt}"."{dseq})|{dseq}".")
7318 {exp} ([eE][+-]?{dseq})
7321 {fsuff_opt} ({fsuff}?)
7323 {hdseq} ([[:xdigit:]]+)
7324 {hdseq_opt} ([[:xdigit:]]*)
7325 {hfrac} (({hdseq_opt}"."{hdseq})|({hdseq}"."))
7326 {bexp} ([pP][+-]?{dseq})
7327 {dfc} (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
7328 {hfc} (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))
7330 {c99_floating_point_constant} ({dfc}|{hfc})
7332 See C99 section 6.4.4.2 for the gory details.
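
The decimal and hexadecimal integer patterns at the top of this list can be tried outside of 'flex' with the POSIX 'regcomp' interface. This is only a sketch under stated assumptions: the helper names are hypothetical, the flex-specific set difference '[[:digit:]]{-}[0]' is spelled '[1-9]' because POSIX has no such operator, and the patterns are anchored with '^' and '$' so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression
 * in pattern, 0 otherwise (including when compilation fails). */
int matches_pattern(const char *pattern, const char *text)
{
    regex_t re;
    int matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}

/* C99 decimal constant: a nonzero digit followed by any digits. */
int is_c99_decimal_constant(const char *text)
{
    return matches_pattern("^[1-9][0-9]*$", text);
}

/* C99 hexadecimal constant: 0x or 0X followed by hex digits. */
int is_c99_hex_constant(const char *text)
{
    return matches_pattern("^0[xX][0-9A-Fa-f]+$", text);
}
```

Note that a lone '0' is not a decimal constant under this pattern, since a leading zero makes the constant octal in C99.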
7335 File: flex.info, Node: Identifiers, Next: Quoted Constructs, Prev: Numbers, Up: Common Patterns
7341 ucn ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
7342 nondigit [_[:alpha:]]
7343 c99_id ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*
7345 Technically, the above pattern does not encompass all possible C99
7346 identifiers, since C99 allows for "implementation-defined"
7347 characters. In practice, C compilers follow the above pattern,
7348 with the addition of the '$' character.
7350 UTF-8 Encoded Unicode Code Point
7351 [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
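
The UTF-8 pattern above is easier to follow when its alternatives are written out as byte-range checks. The following C function is a sketch that mirrors those alternatives one for one; it is not part of 'flex', and the name 'utf8_code_point_length' is hypothetical. It returns the number of bytes the pattern would match at the start of a buffer, or 0 if no alternative applies.

```c
#include <stddef.h>

/* True if byte b lies in the inclusive range lo..hi. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* True if the count bytes starting at p are all in 0x80..0xBF. */
static int all_continuation(const unsigned char *p, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (!in_range(p[i], 0x80, 0xBF))
            return 0;
    }
    return 1;
}

/* Length in bytes of the code point at s (at most n bytes are read),
 * following the alternatives of the pattern; 0 if none matches. */
size_t utf8_code_point_length(const unsigned char *s, size_t n)
{
    unsigned char b0, lo, hi;

    if (n == 0)
        return 0;
    b0 = s[0];

    /* Tab, line feed, carriage return, and printable ASCII. */
    if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D || in_range(b0, 0x20, 0x7E))
        return 1;

    /* Two-byte sequences. */
    if (in_range(b0, 0xC2, 0xDF))
        return (n >= 2 && all_continuation(s + 1, 1)) ? 2 : 0;

    /* Three-byte sequences; 0xE0 and 0xED restrict the second byte. */
    if (in_range(b0, 0xE0, 0xEF)) {
        lo = (b0 == 0xE0) ? 0xA0 : 0x80;
        hi = (b0 == 0xED) ? 0x9F : 0xBF;
        return (n >= 3 && in_range(s[1], lo, hi)
                && all_continuation(s + 2, 1)) ? 3 : 0;
    }

    /* Four-byte sequences; 0xF0 and 0xF4 restrict the second byte. */
    if (in_range(b0, 0xF0, 0xF4)) {
        lo = (b0 == 0xF0) ? 0x90 : 0x80;
        hi = (b0 == 0xF4) ? 0x8F : 0xBF;
        return (n >= 4 && in_range(s[1], lo, hi)
                && all_continuation(s + 2, 2)) ? 4 : 0;
    }
    return 0;
}
```

The restricted second-byte ranges are what exclude overlong encodings, surrogates, and values above U+10FFFF. Like the pattern, the function rejects most ASCII control characters.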
7354 File: flex.info, Node: Quoted Constructs, Next: Addresses, Prev: Identifiers, Up: Common Patterns
7356 A.4.3 Quoted Constructs
7357 -----------------------
7360 'L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([0-7]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'
7363 '("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'
7365 Note that in C99, a '//'-style comment may be split across lines,
7366 and, contrary to popular belief, does not include the trailing '\n'
7369 A better way to scan '/* */' comments is by line, rather than
7370 matching possibly huge comments all at once. This will allow you
7371 to scan comments of unlimited length, as long as line breaks appear
7372 at sane intervals. This is also more efficient when used with
7373 automatic line number processing. *Note option-yylineno::.
7376 "/*" BEGIN(COMMENT);
7386 File: flex.info, Node: Addresses, Prev: Quoted Constructs, Up: Common Patterns
7392 dec-octet [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
7393 IPv4address {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}
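
The 'dec-octet' and 'IPv4address' definitions above can be exercised outside of 'flex' with the POSIX 'regcomp' interface. This is a sketch only: the function name 'is_ipv4_address' is hypothetical, and the pattern is anchored with '^' and '$' so that the entire string must be an address.

```c
#include <regex.h>
#include <stddef.h>

/* dec-octet, copied from the definition above and parenthesized so
 * that it can be concatenated safely. */
#define DEC_OCTET "([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

/* Returns 1 if text is a dotted-quad IPv4 address, 0 otherwise
 * (including when the pattern fails to compile). */
int is_ipv4_address(const char *text)
{
    static const char pattern[] =
        "^" DEC_OCTET "\\." DEC_OCTET "\\." DEC_OCTET "\\." DEC_OCTET "$";
    regex_t re;
    int matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Because 'dec-octet' has no alternative that starts with a leading zero followed by more digits, a string such as '01.2.3.4' is rejected.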
7396 h16 [0-9A-Fa-f]{1,4}
7397 ls32 {h16}:{h16}|{IPv4address}
7398 IPv6address ({h16}:){6}{ls32}|
7399 ::({h16}:){5}{ls32}|
7400 ({h16})?::({h16}:){4}{ls32}|
7401 (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
7402 (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
7403 (({h16}:){0,3}{h16})?::{h16}:{ls32}|
7404 (({h16}:){0,4}{h16})?::{ls32}|
7405 (({h16}:){0,5}{h16})?::{h16}|
7406 (({h16}:){0,6}{h16})?::
7408 See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
7409 Note that you have to fold the definition of 'IPv6address' into one
7410 line and that it also matches the "unspecified address" "::".
7413 '(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
7415 This pattern is nearly useless, since it allows just about any
7416 character to appear in a URI, including spaces and control
7417 characters. See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) for
7421 File: flex.info, Node: Indices, Prev: Appendices, Up: Top
7429 * Index of Functions and Macros::
7430 * Index of Variables::
7431 * Index of Data Types::
7433 * Index of Scanner Options::
7436 File: flex.info, Node: Concept Index, Next: Index of Functions and Macros, Prev: Indices, Up: Indices
7444 * $ as normal character in patterns: Patterns. (line 275)
7445 * %array, advantages of: Matching. (line 43)
7446 * %array, use of: Matching. (line 29)
7447 * %array, with C++: Matching. (line 65)
7448 * %option noyywrap: Generated Scanner. (line 93)
7449 * %pointer, and unput(): Actions. (line 162)
7450 * %pointer, use of: Matching. (line 29)
7451 * %top: Definitions Section. (line 44)
7452 * %{ and %}, in Definitions Section: Definitions Section. (line 40)
7453 * %{ and %}, in Rules Section: Actions. (line 26)
7454 * <<EOF>>, use of: EOF. (line 33)
7455 * [] in patterns: Patterns. (line 15)
7456 * ^ as non-special character in patterns: Patterns. (line 275)
7457 * |, in actions: Actions. (line 33)
7458 * |, use of: Actions. (line 83)
7459 * accessor functions, use of: Accessor Methods. (line 18)
7460 * actions: Actions. (line 6)
7461 * actions, embedded C strings: Actions. (line 26)
7462 * actions, redefining YY_BREAK: Misc Macros. (line 49)
7463 * actions, use of { and }: Actions. (line 26)
7464 * aliases, how to define: Definitions Section. (line 10)
7465 * arguments, command-line: Scanner Options. (line 6)
7466 * array, default size for yytext: User Values. (line 13)
7467 * backing up, eliminating: Performance. (line 54)
7468 * backing up, eliminating by adding error rules: Performance. (line 104)
7469 * backing up, eliminating with catch-all rule: Performance. (line 118)
7470 * backing up, example of eliminating: Performance. (line 49)
7471 * BEGIN: Actions. (line 57)
7472 * BEGIN, explanation: Start Conditions. (line 84)
7473 * beginning of line, in patterns: Patterns. (line 127)
7474 * bison, bridging with flex: Bison Bridge. (line 6)
7475 * bison, parser: Bison Bridge. (line 53)
7476 * bison, scanner to be called from bison: Bison Bridge. (line 34)
7477 * BOL, checking the BOL flag: Misc Macros. (line 46)
7478 * BOL, in patterns: Patterns. (line 127)
7479 * BOL, setting it: Misc Macros. (line 40)
7480 * braces in patterns: Patterns. (line 42)
7481 * bugs, reporting: Reporting Bugs. (line 6)
7482 * C code in flex input: Definitions Section. (line 40)
7483 * C++: Cxx. (line 9)
7484 * C++ and %array: User Values. (line 23)
7485 * C++ I/O, customizing: How do I use my own I/O classes in a C++ scanner?.
7487 * C++ scanners, including multiple scanners: Cxx. (line 197)
7488 * C++ scanners, use of: Cxx. (line 128)
7489 * c++, experimental form of scanner class: Cxx. (line 6)
7490 * C++, multiple different scanners: Cxx. (line 192)
7491 * C-strings, in actions: Actions. (line 26)
7492 * case-insensitive, effect on character classes: Patterns. (line 216)
7493 * character classes in patterns: Patterns. (line 186)
7494 * character classes in patterns, syntax of: Patterns. (line 15)
7495 * character classes, equivalence of: Patterns. (line 205)
7496 * clearing an input buffer: Multiple Input Buffers.
7498 * command-line options: Scanner Options. (line 6)
7499 * comments in flex input: Definitions Section. (line 37)
7500 * comments in the input: Comments in the Input.
7502 * comments, discarding: Actions. (line 176)
7503 * comments, example of scanning C comments: Start Conditions. (line 140)
7504 * comments, in actions: Actions. (line 26)
7505 * comments, in rules section: Comments in the Input.
7507 * comments, syntax of: Comments in the Input.
7509 * comments, valid uses of: Comments in the Input.
7511 * compressing whitespace: Actions. (line 22)
7512 * concatenation, in patterns: Patterns. (line 111)
7513 * copyright of flex: Copyright. (line 6)
7514 * counting characters and lines: Simple Examples. (line 23)
7515 * customizing I/O in C++ scanners: How do I use my own I/O classes in a C++ scanner?.
7517 * default rule: Simple Examples. (line 15)
7518 * default rule <1>: Matching. (line 20)
7519 * defining pattern aliases: Definitions Section. (line 21)
7520 * Definitions, in flex input: Definitions Section. (line 6)
7521 * deleting lines from input: Actions. (line 13)
7522 * discarding C comments: Actions. (line 176)
7523 * distributing flex: Copyright. (line 6)
7524 * ECHO: Actions. (line 54)
7525 * ECHO, and yyout: Generated Scanner. (line 101)
7526 * embedding C code in flex input: Definitions Section. (line 40)
7527 * end of file, in patterns: Patterns. (line 150)
7528 * end of line, in negated character classes: Patterns. (line 237)
7529 * end of line, in patterns: Patterns. (line 131)
7530 * end-of-file, and yyrestart(): Generated Scanner. (line 42)
7531 * EOF and yyrestart(): Generated Scanner. (line 42)
7532 * EOF in patterns, syntax of: Patterns. (line 150)
7533 * EOF, example using multiple input buffers: Multiple Input Buffers.
7535 * EOF, explanation: EOF. (line 6)
7536 * EOF, pushing back: Actions. (line 170)
7537 * EOL, in negated character classes: Patterns. (line 237)
7538 * EOL, in patterns: Patterns. (line 131)
7539 * error messages, end of buffer missed: Lex and Posix. (line 50)
7540 * error reporting, diagnostic messages: Diagnostics. (line 6)
7541 * error reporting, in C++: Cxx. (line 112)
7542 * error rules, to eliminate backing up: Performance. (line 102)
7543 * escape sequences in patterns, syntax of: Patterns. (line 57)
7544 * exiting with yyterminate(): Actions. (line 212)
7545 * experimental form of c++ scanner class: Cxx. (line 6)
7546 * extended scope of start conditions: Start Conditions. (line 270)
7547 * file format: Format. (line 6)
7548 * file format, serialized tables: Tables File Format. (line 6)
7549 * flushing an input buffer: Multiple Input Buffers.
7551 * flushing the internal buffer: Actions. (line 206)
7552 * format of flex input: Format. (line 6)
7553 * format of input file: Format. (line 9)
7554 * freeing tables: Loading and Unloading Serialized Tables.
7556 * getting current start state with YY_START: Start Conditions.
7558 * halting with yyterminate(): Actions. (line 212)
7559 * handling include files with multiple input buffers: Multiple Input Buffers.
7561 * handling include files with multiple input buffers <1>: Multiple Input Buffers.
7563 * header files, with C++: Cxx. (line 197)
7564 * include files, with C++: Cxx. (line 197)
7565 * input file, Definitions section: Definitions Section. (line 6)
7566 * input file, Rules Section: Rules Section. (line 6)
7567 * input file, user code Section: User Code Section. (line 6)
7568 * input(): Actions. (line 173)
7569 * input(), and C++: Actions. (line 202)
7570 * input, format of: Format. (line 6)
7571 * input, matching: Matching. (line 6)
7572 * keywords, for performance: Performance. (line 200)
7573 * lex (traditional) and POSIX: Lex and Posix. (line 6)
7574 * LexerInput, overriding: How do I use my own I/O classes in a C++ scanner?.
7576 * LexerOutput, overriding: How do I use my own I/O classes in a C++ scanner?.
7578 * limitations of flex: Limitations. (line 6)
7579 * literal text in patterns, syntax of: Patterns. (line 54)
7580 * loading tables at runtime: Loading and Unloading Serialized Tables.
7582 * m4: M4 Dependency. (line 6)
7583 * Makefile, example of implicit rules: Makefiles and Flex. (line 21)
7584 * Makefile, explicit example: Makefiles and Flex. (line 33)
7585 * Makefile, syntax: Makefiles and Flex. (line 6)
7586 * matching C-style double-quoted strings: Start Conditions. (line 203)
7587 * matching, and trailing context: Matching. (line 6)
7588 * matching, length of: Matching. (line 6)
7589 * matching, multiple matches: Matching. (line 6)
7590 * member functions, C++: Cxx. (line 9)
7591 * memory management: Memory Management. (line 6)
7592 * memory, allocating input buffers: Multiple Input Buffers.
7594 * memory, considerations for reentrant scanners: Init and Destroy Functions.
7596 * memory, deleting input buffers: Multiple Input Buffers.
7598 * memory, for start condition stacks: Start Conditions. (line 301)
7599 * memory, serialized tables: Serialized Tables. (line 6)
7600 * memory, serialized tables <1>: Loading and Unloading Serialized Tables.
7602 * methods, c++: Cxx. (line 9)
7603 * minimal scanner: Matching. (line 24)
7604 * multiple input streams: Multiple Input Buffers.
7606 * name definitions, not POSIX: Lex and Posix. (line 75)
7607 * negating ranges in patterns: Patterns. (line 23)
7608 * newline, matching in patterns: Patterns. (line 135)
7609 * non-POSIX features of flex: Lex and Posix. (line 142)
7610 * noyywrap, %option: Generated Scanner. (line 93)
7611 * NULL character in patterns, syntax of: Patterns. (line 62)
7612 * octal characters in patterns: Patterns. (line 65)
7613 * options, command-line: Scanner Options. (line 6)
7614 * overriding LexerInput: How do I use my own I/O classes in a C++ scanner?.
7616 * overriding LexerOutput: How do I use my own I/O classes in a C++ scanner?.
7618 * overriding the memory routines: Overriding The Default Memory Management.
7620 * Pascal-like language: Simple Examples. (line 49)
7621 * pattern aliases, defining: Definitions Section. (line 21)
7622 * pattern aliases, expansion of: Patterns. (line 51)
7623 * pattern aliases, how to define: Definitions Section. (line 10)
7624 * pattern aliases, use of: Definitions Section. (line 28)
7625 * patterns and actions on different lines: Lex and Posix. (line 101)
7626 * patterns, character class equivalence: Patterns. (line 205)
7627 * patterns, common: Common Patterns. (line 6)
7628 * patterns, end of line: Patterns. (line 300)
7629 * patterns, grouping and precedence: Patterns. (line 167)
7630 * patterns, in rules section: Patterns. (line 6)
7631 * patterns, invalid trailing context: Patterns. (line 285)
7632 * patterns, matching: Matching. (line 6)
7633 * patterns, precedence of operators: Patterns. (line 161)
7634 * patterns, repetitions with grouping: Patterns. (line 184)
7635 * patterns, special characters treated as non-special: Patterns.
7637 * patterns, syntax: Patterns. (line 9)
7638 * patterns, syntax <1>: Patterns. (line 9)
7639 * patterns, tuning for performance: Performance. (line 49)
7640 * patterns, valid character classes: Patterns. (line 192)
7641 * performance optimization, matching longer tokens: Performance.
7643 * performance optimization, recognizing keywords: Performance.
7645 * performance, backing up: Performance. (line 49)
7646 * performance, considerations: Performance. (line 6)
7647 * performance, using keywords: Performance. (line 200)
7648 * popping an input buffer: Multiple Input Buffers.
7650 * POSIX and lex: Lex and Posix. (line 6)
7651 * POSIX compliance: Lex and Posix. (line 142)
7652 * POSIX, character classes in patterns, syntax of: Patterns. (line 15)
7653 * preprocessor macros, for use in actions: Actions. (line 50)
7654 * pushing an input buffer: Multiple Input Buffers.
7656 * pushing back characters with unput: Actions. (line 143)
7657 * pushing back characters with unput(): Actions. (line 147)
7658 * pushing back characters with yyless: Actions. (line 131)
7659 * pushing back EOF: Actions. (line 170)
7660 * ranges in patterns: Patterns. (line 19)
7661 * ranges in patterns, negating: Patterns. (line 23)
7662 * recognizing C comments: Start Conditions. (line 143)
7663 * reentrant scanners, multiple interleaved scanners: Reentrant Uses.
7665 * reentrant scanners, recursive invocation: Reentrant Uses. (line 30)
7666 * reentrant, accessing flex variables: Global Replacement. (line 6)
7667 * reentrant, accessor functions: Accessor Methods. (line 6)
7668 * reentrant, API explanation: Reentrant Overview. (line 6)
7669 * reentrant, calling functions: Extra Reentrant Argument.
7671 * reentrant, example of: Reentrant Example. (line 6)
7672 * reentrant, explanation: Reentrant. (line 6)
7673 * reentrant, extra data: Extra Data. (line 6)
7674 * reentrant, initialization: Init and Destroy Functions.
7676 * regular expressions, in patterns: Patterns. (line 6)
7677 * REJECT: Actions. (line 61)
7678 * REJECT, calling multiple times: Actions. (line 83)
7679 * REJECT, performance costs: Performance. (line 12)
7680 * reporting bugs: Reporting Bugs. (line 6)
7681 * restarting the scanner: Lex and Posix. (line 54)
7682 * RETURN, within actions: Generated Scanner. (line 57)
7683 * rules, default: Simple Examples. (line 15)
7684 * rules, in flex input: Rules Section. (line 6)
7685 * scanner, definition of: Introduction. (line 6)
7686 * sections of flex input: Format. (line 6)
7687 * serialization: Serialized Tables. (line 6)
7688 * serialization of tables: Creating Serialized Tables.
7690 * serialized tables, multiple scanners: Creating Serialized Tables.
7692 * stack, input buffer pop: Multiple Input Buffers.
7694 * stack, input buffer push: Multiple Input Buffers.
7696 * stacks, routines for manipulating: Start Conditions. (line 286)
7697 * start condition, applying to multiple patterns: Start Conditions.
7699 * start conditions: Start Conditions. (line 6)
7700 * start conditions, behavior of default rule: Start Conditions.
7702 * start conditions, exclusive: Start Conditions. (line 53)
7703 * start conditions, for different interpretations of same input: Start Conditions.
7705 * start conditions, in patterns: Patterns. (line 140)
7706 * start conditions, inclusive: Start Conditions. (line 44)
7707 * start conditions, inclusive vs. exclusive: Start Conditions.
7709 * start conditions, integer values: Start Conditions. (line 163)
7710 * start conditions, multiple: Start Conditions. (line 17)
7711 * start conditions, special wildcard condition: Start Conditions.
7713 * start conditions, use of a stack: Start Conditions. (line 286)
7714 * start conditions, use of wildcard condition (<*>): Start Conditions.
7716 * start conditions, using BEGIN: Start Conditions. (line 95)
7717 * stdin, default for yyin: Generated Scanner. (line 37)
7718 * stdout, as default for yyout: Generated Scanner. (line 101)
7719 * strings, scanning strings instead of files: Multiple Input Buffers.
7721 * tables, creating serialized: Creating Serialized Tables.
7723 * tables, file format: Tables File Format. (line 6)
7724 * tables, freeing: Loading and Unloading Serialized Tables.
7726 * tables, loading and unloading: Loading and Unloading Serialized Tables.
7728 * terminating with yyterminate(): Actions. (line 212)
7729 * token: Matching. (line 14)
7730 * trailing context, in patterns: Patterns. (line 118)
7731 * trailing context, limits of: Patterns. (line 275)
7732 * trailing context, matching: Matching. (line 6)
7733 * trailing context, performance costs: Performance. (line 12)
7734 * trailing context, variable length: Performance. (line 141)
7735 * unput(): Actions. (line 143)
7736 * unput(), and %pointer: Actions. (line 162)
7737 * unput(), pushing back characters: Actions. (line 147)
7738 * user code, in flex input: User Code Section. (line 6)
7739 * username expansion: Simple Examples. (line 8)
7740 * using integer values of start condition names: Start Conditions.
7742 * verbatim text in patterns, syntax of: Patterns. (line 54)
7743 * warning, dangerous trailing context: Limitations. (line 20)
7744 * warning, rule cannot be matched: Diagnostics. (line 14)
7745 * warnings, diagnostic messages: Diagnostics. (line 6)
7746 * whitespace, compressing: Actions. (line 22)
7747 * yacc interface: Yacc. (line 17)
7748 * yacc, interface: Yacc. (line 6)
7749 * yyalloc, overriding: Overriding The Default Memory Management.
7751 * yyfree, overriding: Overriding The Default Memory Management.
7753 * yyin: Generated Scanner. (line 37)
7754 * yyinput(): Actions. (line 202)
7755 * yyleng: Matching. (line 14)
7756 * yyleng, modification of: Actions. (line 47)
7757 * yyless(): Actions. (line 125)
7758 * yyless(), pushing back characters: Actions. (line 131)
7759 * yylex(), in generated scanner: Generated Scanner. (line 6)
7760 * yylex(), overriding: Generated Scanner. (line 16)
7761 * yylex, overriding the prototype of: Generated Scanner. (line 20)
7762 * yylineno, in a reentrant scanner: Reentrant Functions. (line 36)
7763 * yylineno, performance costs: Performance. (line 12)
7764 * yymore(): Actions. (line 104)
7765 * yymore() to append token to previous token: Actions. (line 110)
7766 * yymore(), mega-kludge: Actions. (line 110)
7767 * yymore, and yyleng: Actions. (line 47)
7768 * yymore, performance penalty of: Actions. (line 119)
7769 * yyout: Generated Scanner. (line 101)
7770 * yyrealloc, overriding: Overriding The Default Memory Management.
7772 * yyrestart(): Generated Scanner. (line 42)
7773 * yyterminate(): Actions. (line 212)
7774 * yytext: Matching. (line 14)
7775 * yytext, default array size: User Values. (line 13)
7776 * yytext, memory considerations: A Note About yytext And Memory.
7778 * yytext, modification of: Actions. (line 42)
7779 * yytext, two types of: Matching. (line 29)
7780 * yywrap(): Generated Scanner. (line 85)
7781 * yywrap, default for: Generated Scanner. (line 93)
7782 * YY_CURRENT_BUFFER, and multiple buffers: Multiple Input Buffers.
7784 * YY_EXTRA_TYPE, defining your own type: Extra Data. (line 33)
7785 * YY_FLUSH_BUFFER: Actions. (line 206)
7786 * YY_INPUT: Generated Scanner. (line 61)
7787 * YY_INPUT, overriding: Generated Scanner. (line 71)
7788 * YY_START, example: Start Conditions. (line 185)
7789 * YY_USER_ACTION to track each time a rule is matched: Misc Macros.