2 .\" Copyright 2001-2007 Adrian Thurston <thurston@complang.org>
5 .\" This file is part of Ragel.
7 .\" Ragel is free software; you can redistribute it and/or modify
8 .\" it under the terms of the GNU General Public License as published by
9 .\" the Free Software Foundation; either version 2 of the License, or
10 .\" (at your option) any later version.
12 .\" Ragel is distributed in the hope that it will be useful,
13 .\" but WITHOUT ANY WARRANTY; without even the implied warranty of
14 .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 .\" GNU General Public License for more details.
17 .\" You should have received a copy of the GNU General Public License
18 .\" along with Ragel; if not, write to the Free Software
19 .\" Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
21 .\" Process this file with
22 .\" groff -man -Tascii ragel.1
24 .TH RAGEL 1 "@PUBDATE@" "Ragel @VERSION@" "Ragel State Machine Compiler"
26 ragel \- compile regular languages into executable state machines
32 Ragel compiles executable finite state machines from regular languages.
33 Ragel can generate C, C++, Objective-C, D, or Java code. Ragel state
34 machines can not only recognize byte
35 sequences as regular expression machines do, but can also execute code at
36 arbitrary points in the recognition of a regular language. User code is
37 embedded using inline operators that do not disrupt the regular language
40 The core language consists of standard regular expression operators, such as
41 union, concatenation and Kleene star, accompanied by action embedding
42 operators. Ragel also provides operators that let you control any
43 non-determinism that you create, construct scanners using the longest match
44 paradigm, and build state machines using the statechart model. It is also
45 possible to influence the execution of a state machine from inside an embedded
46 action by jumping or calling to other parts of the machine and reprocessing
49 Ragel provides a very flexible interface to the host language that attempts to
50 place minimal restrictions on how the generated code is used and integrated
51 into the application. The generated code has no dependencies.
55 .BR \-h ", " \-H ", " \-? ", " \-\-help
56 Display help and exit.
59 Print version information and exit.
62 Write output to file. If -o is not given, a default file name is chosen by
63 replacing the file extension of the input file. For source files ending in .rh
64 the suffix .h is used. For all other source files a suffix based on the output
65 language is used (.c, .cpp, .m, .dot, etc.).
68 Print some statistics on standard error.
70 .B \--error-format=gnu
71 Print error messages using the format "file:line:column:" (default)
73 .B \--error-format=msvc
74 Print error messages using the format "file(line,column):"
77 Do not remove duplicate actions from action lists.
80 Add dir to the list of directories to search for included and imported files.
83 Do not perform state minimization.
86 Perform minimization once, at the end of the state machine compilation.
89 Minimize after nearly every operation. Lists of like operations such as unions
90 are minimized once at the end. This is the default minimization option.
93 Minimize after every operation.
96 Compile the state machines and emit an XML representation of the host data and
100 Generate a dot file for Graphviz.
103 Display printable characters on labels.
106 FSM specification to output.
109 Machine definition/instantiation to output.
112 The host language is C, C++, Obj-C or Obj-C++. This is the default host language option.
115 The host language is D.
118 The host language is Java.
121 The host language is Ruby.
124 Inhibit writing of #line directives.
127 Generate a table driven FSM. This is the default code style. The table driven
128 FSM represents the state machine as static data. There are tables of states,
129 transitions, indices and actions. The current state is stored in a variable.
130 The execution is a loop that, given the current state and the current
131 character to process, looks up the transition to take using a binary search,
132 executes any actions and moves to the target state. In general, the table
133 driven FSM produces a smaller binary and requires a less expensive host language
134 compile but results in slower running code. The table driven FSM is suitable
138 Generate a faster table driven FSM by expanding action lists in the action
142 Generate a flat table driven FSM. Transitions are represented as an array
143 indexed by the current alphabet character. This eliminates the need for a
144 binary search to locate transitions and produces faster code; however, it is
145 only suitable for small alphabets.
148 Generate a faster flat table driven FSM by expanding action lists in the action
152 Generate a goto driven FSM. The goto driven FSM represents the state machine
153 as a series of goto statements. While in the machine, the current state is
154 stored by the processor's instruction pointer. The execution is a flat function
155 where control is passed from state to state using gotos. In general, the goto
156 FSM produces faster code but results in a larger binary and a more expensive
157 host language compile.
160 Generate a faster goto driven FSM by expanding action lists in the action
164 Generate a really fast goto driven FSM by embedding action lists in the state
165 machine control code.
168 N-Way Split really fast goto-driven FSM.
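As a rough illustration of the table driven code style described above, the following Python sketch stores a machine as static tables, keeps the current state in a variable, and finds each transition with a binary search over sorted keys. The example machine (it accepts one or more decimal digits), the table layout and every name in the code are assumptions made for this sketch. It is not the code Ragel generates.

```python
from bisect import bisect_left

# Hypothetical tables for a machine that accepts one or more digits.
# Each state owns a sorted list of single-character keys and a parallel
# list of (target state, action name) pairs.
KEYS = {
    0: list("0123456789"),
    1: list("0123456789"),
}
TRANS = {
    0: [(1, "digit")] * 10,
    1: [(1, "digit")] * 10,
}
START, ERROR = 0, -1
FINAL = {1}


def find_transition(state, ch):
    """Binary-search the sorted keys of `state` for `ch`."""
    keys = KEYS.get(state, [])
    i = bisect_left(keys, ch)
    if i < len(keys) and keys[i] == ch:
        return TRANS[state][i]
    return (ERROR, None)


def run(data):
    """Drive the machine over `data`; return (accepted, actions fired)."""
    cs = START                      # the current state lives in a variable
    fired = []
    for ch in data:
        cs, action = find_transition(cs, ch)
        if cs == ERROR:
            return False, fired
        if action is not None:
            fired.append(action)    # stand-in for executing user code
    return cs in FINAL, fired
```

A flat table variant would replace find_transition with a direct array index on the character code, which removes the search at the cost of one table slot per alphabet character in every state.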
171 NOTE: This is a very brief description of Ragel input. Ragel is described in
172 more detail in the user guide available from the homepage (see below).
174 Ragel normally passes input files straight to the output. When it sees an FSM
175 specification that contains machine instantiations it stops to generate the
176 state machine. If there are write statements (such as "write exec") then Ragel emits the
177 corresponding code. There can be any number of FSM specifications in an input
178 file. A multi-line FSM specification starts with '%%{' and ends with '}%%'. A
179 single line FSM specification starts with %% and ends at the first newline.
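The delimiter rules above can be sketched as a small scanner that separates FSM specifications from pass-through host code. This is a simplified Python illustration, not Ragel's own scanner: the function name and the returned tuple shape are hypothetical, and host-language strings and comments are not recognised, so a "%%" inside them would be misread.

```python
def find_fsm_specs(text):
    """Return (kind, body) pairs for each FSM specification in `text`.

    kind is "multi" for a %%{ ... }%% block and "single" for a
    specification that starts with %% and runs to the end of the line.
    """
    specs = []
    pos = 0
    while True:
        start = text.find("%%", pos)
        if start == -1:
            return specs
        if text.startswith("%%{", start):
            end = text.find("}%%", start + 3)
            if end == -1:           # unterminated block: stop scanning
                return specs
            specs.append(("multi", text[start + 3:end]))
            pos = end + 3
        else:
            end = text.find("\n", start)
            if end == -1:
                end = len(text)
            specs.append(("single", text[start + 2:end]))
            pos = end
```

Everything outside the returned spans corresponds to the text that is passed straight to the output.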
183 Set the name of the machine. If given, it must be the first statement.
186 Set the data type of the alphabet.
189 Specify how to retrieve the alphabet character from the element type.
192 Include a machine of the same name as the current one, or of a different name, in either
193 the current file or some other file.
195 .I Action Definition:
196 Define an action that can be invoked by the FSM.
198 .I Fsm Definition, Instantiation and Longest Match Instantiation:
199 Used to build FSMs. The syntax is described in the next few sections.
202 Specify how to access the persistent state machine variables.
205 Write some component of the machine.
208 Override the default variable names (p, pe, cs, act, etc).
210 The basic machines are the base operands of the regular language expressions.
213 Concat literal. Produces a concatenation of the characters in the string.
214 Supports escape sequences with '\\'. The result will have a start state and a
215 transition to a new state for each character in the string. The last state in
216 the sequence will be made final. To make the string case-insensitive, append
217 an 'i' to the string, as in 'cmd'i.
220 Identical to single quote version.
223 Or literal. Produces a union of characters. Supports character ranges
224 with '\-', negating the sense of the union with an initial '^' and escape
225 sequences with '\\'. The result will have two states with a transition between
226 them for each character or range.
228 NOTE: '', "", and [] produce null FSMs. Null machines have one state that is
229 both a start state and a final state and match the zero length string. A null machine
230 may be created with the null builtin machine.
233 Makes a two state machine with one transition on the given integer number.
236 Makes a two state machine with one transition on the given hexadecimal number.
239 A simple regular expression. Supports the notation '.', '*' and '[]', character
240 ranges with '\-', negating the sense of an OR expression with an initial '^'
241 and escape sequences with '\\'. Also supports one trailing flag: i. Use it to
242 produce a case-insensitive regular expression, as in /GET/i.
245 Specifies a range. The allowable upper and lower bounds are concat literals of
246 length one and number machines.
247 For example, 0x10..0x20, 0..63, and 'a'..'z' are valid ranges.
250 References the machine definition assigned to the variable name given.
253 There are several builtin machines available. They are all two state machines
254 for the purpose of matching common classes of characters. They are:
258 Any character in the alphabet.
261 ASCII characters 0..127.
264 Extended ASCII characters. This is the range -128..127 for signed alphabets
265 and the range 0..255 for unsigned alphabets.
268 Alphabetic characters /[A-Za-z]/.
274 Alphanumerics /[0-9A-Za-z]/.
277 Lowercase characters /[a-z]/.
280 Uppercase characters /[A-Z]/.
283 Hexadecimal digits /[0-9A-Fa-f]/.
286 Control characters 0..31.
289 Graphical characters /[!-~]/.
292 Printable characters /[ -~]/.
295 Punctuation. Graphical characters that are not alpha-numerics
299 Whitespace /[\\t\\v\\f\\n\\r ]/.
302 Zero length string. Equivalent to '', "" and [].
305 Empty set. Matches nothing.
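The character classes listed above can be modelled as sets of character codes, which makes relations such as "graphical characters that are not alphanumerics" easy to check. The Python sketch below is only an illustration for an ASCII alphabet. The variable names follow Ragel's builtin machine names, with a trailing underscore where the name would clash with a Python builtin, and the rng helper is hypothetical.

```python
def rng(lo, hi):
    """Inclusive range of character codes, like lo..hi in the text above."""
    return frozenset(range(lo, hi + 1))


ascii_ = rng(0, 127)
alpha = rng(ord("A"), ord("Z")) | rng(ord("a"), ord("z"))
digit = rng(ord("0"), ord("9"))
alnum = digit | alpha
lower = rng(ord("a"), ord("z"))
upper = rng(ord("A"), ord("Z"))
xdigit = digit | rng(ord("A"), ord("F")) | rng(ord("a"), ord("f"))
cntrl = rng(0, 31)
graph = rng(ord("!"), ord("~"))
print_ = rng(ord(" "), ord("~"))
punct = graph - alnum               # graphical but not alphanumeric
space = frozenset(ord(c) for c in "\t\v\f\n\r ")
```

For instance, ord("!") in punct holds while ord("a") in punct does not, and print_ differs from graph only by the space character.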
307 .SH BRIEF OPERATOR REFERENCE
308 Operators are grouped by precedence, group 1 being the lowest and group 6 the
314 Join machines together without drawing any transitions, setting up a start
315 state or any final states. The start state must be explicitly specified with the
316 "start" label. Final states may be specified with epsilon transitions to
317 the implicitly created "final" state.
322 Produces a machine that matches any string in machine one or machine two.
325 Produces a machine that matches any string that is in both machine one and
329 Produces a machine that matches any string that is in machine one but not in
333 Strong Subtraction. Matches any string in machine one that does not have any string
334 in machine two as a substring.
339 Produces a machine that matches all the strings in machine one followed
340 by all the strings in machine two.
343 Entry-Guarded Concatenation: terminates machine one upon entry to machine two.
346 Finish-Guarded Concatenation: terminates machine one when machine two finishes.
349 Left-Guarded Concatenation: gives a higher priority to machine one.
351 NOTE: Concatenation is the default operator. Two machines next to each other
352 with no operator between them results in the concatenation operation.
357 Attaches a label to an expression. Labels can be used by epsilon transitions
358 and fgoto and fcall statements in actions. Also note that the referencing of a
359 machine definition causes the implicit creation of a label by the same name.
364 Draws an epsilon transition to the state defined by label. Label must
365 be a name in the current scope. Epsilon transitions are resolved when
366 comma operators are evaluated and at the root of the expression tree of
367 machine assignment/instantiation.
371 An action may be a name predefined with an action statement or may
372 be specified directly with '{' and '}' in the expression.
375 Embeds action into starting transitions.
378 Embeds action into transitions that go into a final state.
381 Embeds action into all transitions. Does not include pending out transitions.
384 Embeds action into pending out transitions from final states.
386 .B GROUP 6: EOF Actions
388 When a machine's finish routine is called the current state's EOF actions are
392 Embed an EOF action into the start state.
395 Embed an EOF action into all states except the start state.
398 Embed an EOF action into all states.
401 Embed an EOF action into final states.
404 Embed an EOF action into all states that are not final.
407 Embed an EOF action into all states that are not the start
408 state and that are not final (middle states).
410 .B GROUP 6: Global Error Actions
412 Global error actions are stored in states until the final state machine has
413 been fully constructed. They are then transferred to error transitions, giving
414 the effect of a default action.
417 Embed a global error action into the start state.
420 Embed a global error action into all states except the start state.
423 Embed a global error action into all states.
426 Embed a global error action into the final states.
429 Embed a global error action into all states which are not final.
432 Embed a global error action into all states which are not the start state and
433 are not final (middle states).
435 .B GROUP 6: Local Error Actions
437 Local error actions are stored in states until the named machine is fully
438 constructed. They are then transferred to error transitions, giving the effect
439 of a default action for a section of the total machine. Note that the name may
440 be omitted, in which case the action will be transferred to error actions upon
441 construction of the current machine.
444 Embed a local error action into the start state.
447 Embed a local error action into all states except the start state.
450 Embed a local error action into all states.
453 Embed a local error action into the final states.
456 Embed a local error action into all states which are not final.
459 Embed a local error action into all states which are not the start state and
460 are not final (middle states).
462 .B GROUP 6: To-State Actions
464 To-state actions are stored in states and executed any time the machine moves
465 into a state. This includes regular transitions, and transfers of control such
466 as fgoto. Note that setting the current state from outside the machine (for
467 example during initialization) does not count as a transition into a state.
470 Embed a to-state action into the start state.
473 Embed a to-state action into all states except the start state.
476 Embed a to-state action into all states.
479 Embed a to-state action into the final states.
482 Embed a to-state action into all states which are not final.
485 Embed a to-state action into all states which are not the start state and
486 are not final (middle states).
488 .B GROUP 6: From-State Actions
490 From-state actions are executed whenever a state takes a transition on a character.
491 This includes the error transition and a transition to self.
494 Embed a from-state action into the start state.
497 Embed a from-state action into every state except the start state.
500 Embed a from-state action into all states.
503 Embed a from-state action into the final states.
506 Embed a from-state action into all states which are not final.
509 Embed a from-state action into all states which are not the start state and
510 are not final (middle states).
512 .B GROUP 6: Priority Assignment
514 Priorities are assigned to names within transitions. Only priorities on the
515 same name are allowed to interact. In the first form of priorities the name
516 defaults to the name of the machine definition the priority is assigned in.
517 Transitions do not have default priorities.
520 Assigns the priority int in all transitions leaving the start state.
523 Assigns the priority int in all transitions that go into a final state.
526 Assigns the priority int in all existing transitions.
529 Assigns the priority int in all pending out transitions.
531 A second form of priority assignment allows the programmer to specify the name
532 to which the priority is assigned, allowing interactions to cross machine
533 definition boundaries.
536 Assigns the priority int to name in all transitions leaving the start state.
538 .I expr @ (name, int)
539 Assigns the priority int to name in all transitions that go into a final state.
541 .I expr $ (name, int)
542 Assigns the priority int to name in all existing transitions.
544 .I expr % (name, int)
545 Assigns the priority int to name in all pending out transitions.
550 Produces the Kleene star of a machine. Matches zero or more repetitions of the
554 Longest-Match Kleene Star. This version of Kleene star puts a higher
555 priority on staying in the machine over wrapping around and starting over. This
556 operator is equivalent to ( ( expr ) $0 %1 )*.
559 Produces a machine that accepts the machine given or the null string. This operator
560 is equivalent to ( expr | '' ).
563 Produces the machine concatenated with the Kleene star of itself. Matches one or
564 more repetitions of the machine. This operator is equivalent to ( expr . expr* ).
567 Produces a machine that matches exactly n repetitions of expr.
570 Produces a machine that matches anywhere from zero to n repetitions of expr.
573 Produces a machine that matches n or more repetitions of expr.
576 Produces a machine that matches n to m repetitions of expr.
581 Produces a machine that matches any string not matched by the given machine.
582 This operator is equivalent to ( any* - expr ).
585 Character-Level Negation. Matches any single character not matched by the
586 single character machine expr.
591 Forces precedence on operators.
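The set-like operators described in this reference can be tried out on small finite languages. The Python sketch below models a machine as the set of strings it matches, which works only for finite languages and ignores actions, priorities and guarded concatenation. All function names are hypothetical and chosen for this illustration.

```python
def union(a, b):
    """Strings matched by machine one or machine two."""
    return a | b


def intersection(a, b):
    """Strings matched by both machines."""
    return a & b


def difference(a, b):
    """Strings matched by machine one but not by machine two."""
    return a - b


def strong_difference(a, b):
    """Strings of `a` that contain no string of `b` as a substring."""
    return {s for s in a if not any(t in s for t in b)}


def concat(a, b):
    """Every string of `a` followed by every string of `b`."""
    return {s + t for s in a for t in b}


def repeat(a, n, m):
    """Strings made of n to m (inclusive) repetitions of strings in `a`."""
    result = set()
    current = {""}                  # zero repetitions: the empty string
    for count in range(m + 1):
        if count >= n:
            result |= current
        current = concat(current, a)
    return result
```

With these definitions, repeat(a, 0, 1) equals union(a, {""}), which mirrors the stated equivalence of the optional operator with ( expr | '' ), and repeat(a, 1, 3) equals concat(a, repeat(a, 0, 2)), a bounded analogue of ( expr . expr* ).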
592 .SH VALUES AVAILABLE IN CODE BLOCKS
595 The current character. Equivalent to *p.
598 A pointer to the current character. Equivalent to p.
601 An integer value representing the current state.
604 An integer value representing the target state.
607 An integer value representing the entry point <label>.
608 .SH STATEMENTS AVAILABLE IN CODE BLOCKS
611 Do not advance over the current character. Equivalent to --p;.
614 Sets the current character to something else. Equivalent to p = (<expr>)-1;
617 Jump to the machine defined by <label>.
620 Jump to the entry point given by <expr>. The expression must
621 evaluate to an integer value representing a state.
624 Set the next state to be the entry point defined by <label>. The fnext
625 statement does not immediately jump to the specified state. Any action code
626 following the statement is executed.
629 Set the next state to be the entry point given by <expr>. The expression must
630 evaluate to an integer value representing a state.
633 Call the machine defined by <label>. The next fret will jump to the
634 target of the transition on which the action is invoked.
637 Call the entry point given by <expr>. The next fret will jump to the target of
638 the transition on which the action is invoked.
641 Return to the target state of the transition on which the last fcall was made.
644 Save the current state and immediately break out of the machine.
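The two pointer-adjusting statements above (fhold and fexec in Ragel) are described in terms of p, which suggests a driver loop that advances p after each step. The Python sketch below simulates only that pointer arithmetic, under the assumption that the advance happens after the action runs. The context dictionary, the helper names and the example actions are hypothetical, and no state machine is involved.

```python
def run(data, action):
    """Call `action(ctx)` once per step, then advance p, until p reaches pe."""
    ctx = {"data": data, "p": 0, "pe": len(data), "seen": []}
    while ctx["p"] < ctx["pe"]:
        ctx["seen"].append(data[ctx["p"]])
        action(ctx)
        ctx["p"] += 1               # the advance that hold cancels out
    return ctx["seen"]


def hold(ctx):
    """Like the hold statement: do not advance (equivalent to --p)."""
    ctx["p"] -= 1


def exec_at(ctx, target):
    """Like the exec statement: continue at `target` (p = target - 1)."""
    ctx["p"] = target - 1


def hold_first(ch):
    """Build an action that holds once, the first time `ch` is seen."""
    held = []

    def action(ctx):
        if ctx["data"][ctx["p"]] == ch and not held:
            held.append(True)
            hold(ctx)
    return action


def jump_from_start(target):
    """Build an action that, at index 0, continues at index `target`."""
    def action(ctx):
        if ctx["p"] == 0:
            exec_at(ctx, target)
    return action
```

Running run("abc", hold_first("b")) visits "b" twice because the hold cancels one advance, while run("abc", jump_from_start(2)) skips "b" entirely.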
646 Ragel was written by Adrian Thurston <thurston@complang.org>. Objective-C
647 output contributed by Erich Ocean. D output contributed by Alan West. Ruby
648 output contributed by Victor Hugo Borja. C Sharp code generation contributed by
654 Homepage: http://www.complang.org/ragel/