From: thurston
Date: Sun, 21 Jan 2007 22:58:22 +0000 (+0000)
Subject: Import from my private repository. Snapshot after version 5.16, immediately
X-Git-Tag: 2.0_release~1
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=12056158053532946b53b6249cb0e6cfd4580051;p=external%2Fragel.git

Import from my private repository. Snapshot after version 5.16, immediately
following the rewrite of the parsers. Repository revision number 3961.

git-svn-id: http://svn.complang.org/ragel/trunk@2 052ea7fc-9027-0410-9066-f65837a77df0
---
12056158053532946b53b6249cb0e6cfd4580051
diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..ec0507b
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,340 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+ + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. 
+ +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. 
diff --git a/CREDITS b/CREDITS
new file mode 100644
index 0000000..d0b5355
--- /dev/null
+++ b/CREDITS
@@ -0,0 +1,25 @@
+
+ Ragel State Machine Compiler -- CREDITS
+ =======================================
+
+* Written by Adrian Thurston .
+
+* Objective-C output contributed by Eric Ocean.
+
+* D output and many great ideas contributed by Alan West.
+
+* Conditionals inspired by David Helder.
+
+* Java code generation contributions, bug reports, fixes, test cases
+ and suggestions from Colin Fleming
+
+* Useful discussion and bug from Carlos Antunes.
+
+* Feedback, Packaging, and Fixes provided by:
+
+ Bob Tennent, Robert Lemmen, Tobias Jahn, Cris Bailiff, Buddy Betts, Scott
+ Dixon, Steven Handerson, Michael Somos, Bob Paddock, Istvan Buki, David
+ Drai, Matthias Rahlf, Zinx Verituse, Markus W. Weissmann, Marc Liyanage,
+ Eric Ocean, Alan West, Steven Kibbler, Laurent Boulard, Jon Oberheide,
+ David Helder, Lexington Luthor, Jason Jobe, Colin Fleming, Carlos Antunes,
+ Steve Horne
diff --git a/ChangeLog b/ChangeLog
new file mode 100644
index 0000000..83795a6
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,1213 @@
+For Next Release
+================
+ -The '%when condition' syntax was functioning like '$when condition'. This
+ was fixed.
+
+Ragel 5.16 - Nov 20, 2006
+=========================
+ -Bug fix: the fhold and fexec directives did not function correctly in
+ scanner pattern actions. In this context, manipulations of p may be lost or
+ made invalid. To fix this, fexec and fhold now manipulate tokend, which is
+ now always used to update p when the action terminates.
+
+Ragel 5.15 - Oct 31, 2006
+=========================
+ -A language-independent test harness was introduced. Test cases can be
+ written using a custom mini-language in the embedded actions. This
+ mini-language is then translated to C, D and Java when generating the
+ language-specific test cases.
+ -Several existing tests have been ported to the language-independent format
+ and a number of new language-independent test cases have been added.
+ -The state-based embedding operators which access states that are not the
+ start state and are not final (the 'middle' states) have changed. They
+ were:
+ <@/ eof action into middle states
+ <@! error action into middle states
+ <@^ local error action into middle states
+ <@~ to-state action into middle states
+ <@* from-state action into middle states
+ They are now:
+ <>/ eof action into middle states
+ <>! error action into middle states
+ <>^ local error action into middle states
+ <>~ to-state action into middle states
+ <>* from-state action into middle states
+ -The verbose form of embeddings using the <- operator has been removed.
+ This syntax was difficult to remember.
+ -A new verbose form of state-based embedding operators has been added.
+ These are like the symbol versions, except they replace the symbols:
+ / ! ^ ~ *
+ with literal keywords:
+ eof err lerr to from
+ -The following words have been promoted to keywords:
+ when eof err lerr to from
+ -The write statement now gets its own lexical scope in the scanner to ensure
+ that commands are passed through as-is (not affected by keywords).
+ -Bug fix: in the code generation of fret in scanner actions, the adjustment to
+ p that is needed in some cases (dependent on content of patterns) was not
+ happening.
+ -The fhold directive, which decrements p, cannot be permitted in the pattern
+ action of a scanner item because it will not behave consistently. At the end
+ of a pattern action p could be decremented, set to a new value or left
+ alone. This depends on the contents of the scanner's patterns. The user
+ cannot be expected to predict what will happen to p.
+ -Conditions in D code require a cast to the widec type when computing widec.
+ -Like Java, D code also needs if (true) branches for control flow in actions
+ in order to fool the unreachable code detector. This is now abstracted in
+ all code generators using the CTRL_FLOW() function.
+ -The NULL_ITEM value in Java code should be -1. This is needed for
+ maintaining tokstart.
+
+Ragel 5.14 - Oct 1, 2006
+========================
+ -Fixed the check for use of fcall in actions embedded within longest match
+ items. It was emitting an error if an item's longest-match action had an
+ fcall, which is allowed. This bug was introduced while fixing a segfault in
+ version 5.8.
+ -A new minimization option was added: MinimizeMostOps (-l). This option
+ minimizes at every operation except on chains of expressions and chains of
+ terms (e.g., union and concat). On these chains it minimizes only at the last
+ operation. This makes test cases with many states compile faster, without
+ killing the performance on grammars like strings2.rl.
+ -The -l minimization option was made the default.
+ -Fixes to Java code: Use of the fc value did not work, now fixed. Static data
+ is now declared with the final keyword. Patch from Colin Fleming. Conditions
+ now work when generating Java code.
+ -The option -p was added to rlcodegen; it causes printable characters to be
+ printed in GraphViz output. Patch from Colin Fleming.
+ -The "element" keyword no longer exists and was removed from the Vim syntax
+ file. Updated keyword highlighting.
+ -The host language selection is now made in the frontend.
+ -Native host language types are now used when specifying the alphtype.
+ Previously all languages used the set defined by C, and these were mapped to
+ the appropriate type in the backend.
+
+Ragel 5.13 - Sep 7, 2006
+========================
+ -Fixed a careless error which broke Java code generation.
+
+Ragel 5.12 - Sep 7, 2006
+========================
+ -The -o flag did not work in combination with -V. This was fixed.
+ -The split code generation format uses only the required number of digits
+ when writing out the number in the file name of each part.
+ -The -T0, -F0 and -G0 codegens should write out the action list iteration
+ variables only when there are regular, to-state or from-state actions. The
+ code gens should not use anyActions().
+ -If two states have the same EOF actions, they are written out in the finish
+ routine as one case.
+ -The split and in-place goto formats would sometimes generate _out when it is
+ not needed. This was fixed.
+ -Improved the basic partitioning in the split code gen. The last partition
+ would sometimes be empty. This was fixed.
+ -Use of 'fcall *' was not causing top to be initialized. Fixed.
+ -Implemented a Java backend, specified with -J. Only the table-based format
+ is supported.
+ -Implemented range compression in the frontend. This has no effect on the
+ generated code; however, it reduces the work of the backend and any programs
+ that read the intermediate format.
+
+Ragel 5.11 - Aug 10, 2006
+=========================
+ -Added a variable to the configure.in script which allows the building of
+ the parsers to be turned off (BUILD_PARSERS). Parser building is off by
+ default for released versions.
+ -Removed configure tests for bison defines header file. Use --defines=file
+ instead.
+ -Configure script doesn't test for bison, flex and gperf when building of the
+ parsers is turned off.
+ -Removed check for YYLTYPE structure from configure script. Since shipped
+ code will not build parsers by default, we don't need to be as accommodating
+ of other versions of bison.
+ -Added a missing include that showed up with g++ 2.95.3.
+ -Failed configure test for Objective-C compiler is now silent.
+
+Ragel 5.10 - Jul 31, 2006
+=========================
+ -Moved the check for error state higher in the table-based processing loop.
+ -Replaced naive implementations of condition searching with proper ones.
In + the table-based formats the searching is also table-based. In the directly + executed formats the searching is also directly executable. + -The minimization process was made aware of conditions. + -A problem with the condition implementation was fixed. Previously we were + taking pointers to transitions and then using them after a call to + outTransCopy, which was a bad idea because they may be changed by the call. + -Added test mailbox3.rl which is based on mailbox2.rl but includes conditions + for restricting header and message body lengths. + -Eliminated the initial one-character backup of p just before resuming + execution. + -Added the -s option to the frontend for printing statistics. This currently + includes just the number of states. + -Sped up the generation of the in-place goto-driven (-G2) code style. + -Implemented a split version of in-place goto-driven code style. This code + generation style is suitable for producing fast implementations of very + large machines. Partitioning is currently naive. In the future a + high-quality partitioning program will be employed. The flag for accessing + this feature is -Pn, where n is the number of partitions. + -Converted mailbox1.rl, strings2.rl and cppscan1.rl tests to support the + split code generation. + -Fixes and updates were made to the runtests script: added -c for compiling + only, changed the -me option to -e, and added support for testing the split + code style. + +Ragel 5.9 - Jul 19, 2006 +======================== + -Fixed a bug in the include system which caused malformed output from the + frontend when the include was made from a multi-line machine spec and the + included file ended in a single line spec (or vice versa). + -Static data is now const. + -Actions which referenced states but were not embedded caused the frontend to + segfault, now fixed. + -Manual now built with pdflatex. + -The manual was reorganized and expanded. 
Chapter sequence is now:
+ Introduction, Constructing Machines, Embedding Actions, Controlling
+ Nondeterminism and Interfacing to the Host program.
+
+Ragel 5.8 - Jun 17, 2006
+========================
+ -The internal representation of the alphabet type has been encapsulated
+ into a class and all operations on it have been defined as C++ operators.
+ -The condition implementation now supports range transitions. This allows
+ conditions to be embedded into arbitrary machines. Conditions are still
+ experimental.
+ -More condition embedding operators were added:
+ 1. Isolate the start state and embed a condition into all transitions
+ leaving it:
+ >when cond OR >?cond
+ 2. Embed a condition into all transitions:
+ when cond OR $when cond OR $?cond
+ 3. Embed a condition into pending out transitions:
+ %when cond OR %?cond
+ -Improvements were made to the determinization process to support pending out
+ conditions.
+ -The Vim syntax file was fixed so that :> doesn't cause the match of a label.
+ -The test suite was converted to a single-file format which uses less disk
+ space than the old directory-per-test format.
+
+Ragel 5.7 - May 14, 2006
+========================
+ -Conditions will not be embedded like actions because they involve a
+ manipulation of the state machine they are specified in. They have therefore
+ been taken out of the verbose action embedding form (using the <- compound
+ symbol). A new syntax for specifying conditions has been created:
+ m = '\n' when {i==4};
+ -Fixed a bug which prevented state machine commands like fcurs, fcall, fret,
+ etc., from being accounted for in from-state actions and to-state actions.
+ This prevented some necessary support code from being generated.
+ -Implemented condition testing in remaining code generators.
+ -Configure script now checks for gperf, which is required for building.
+ -Added support for case-insensitive literal strings (in addition to regexes).
+ A case-insensitive string is made by appending an 'i' to the literal, as in
+ 'cmd'i or "cmd"i.
+ -Fixed a bug which caused all or expressions inside all regular
+ expressions to be case-insensitive. For example, /[fo]o bar/ would make the
+ [fo] part case-insensitive even though no 'i' was given following the
+ regular expression.
+
+Ragel 5.6 - Apr 1, 2006
+=======================
+ -Added a left-guarded concatenation operator. This operator, <:, is equivalent
+ to ( expr1 $1 . expr2 >0 ). It is useful if you want to prefix a sequence
+ with a sequence of a subset of the characters it matches. For example, one
+ can consume leading whitespace before tokenizing a sequence of whitespace-
+ separated words: ( ' '* <: ( ' '+ | [a-z]+ )** )
+ -Removed context embedding code, which has been dead since 5.0.
+
+Ragel 5.5 - Mar 28, 2006
+========================
+ -Implemented a case-insensitive option for regular expressions: /get/i.
+ -If no input file is given to the ragel program, it reads from standard input.
+ -The label of the start state has been changed from START to IN to save on
+ required screen space.
+ -Bug fix: \0 was not working in literal strings, due to a change that reduced
+ memory usage by concatenating components of literal strings. Token data
+ length is now passed from the scanner to the parser so that we do not need to
+ rely on null termination.
+
+Ragel 5.4 - Mar 12, 2006
+========================
+ -Eliminated the default transition from the frontend implementation. This
+ default transition was a space-saving optimization that at best could reduce
+ the number of allocated transitions by one half. Unfortunately, it
+ complicated the implementation and this stood in the way of introducing
+ conditionals. The default transition may be reintroduced in the future.
+ -Added entry-guarded concatenation. This operator, :>, is syntactic sugar
+ for expr1 $0 . expr2 >1.
This operator terminates the matching of the first
+ machine when the first character of the second machine is matched. For
+ example, in any* . ';' we never leave the any* machine. If we use any* :> ';'
+ then the any* machine is terminated upon matching the semi-colon.
+ -Added finish-guarded concatenation. This operator, :>>, is syntactic sugar
+ for expr1 $0 . expr2 @1. This operator is like entry-guarded concatenation
+ except the first machine is terminated when the second machine enters a
+ final state. This is useful for delaying the guard until a full pattern is
+ matched, as in '/*' any* :>> '*/'.
+ -Added strong subtraction. Where regular subtraction removes from the first
+ machine any strings that are matched by the second machine, strong
+ subtraction removes any strings from the first that contain any strings of
+ the second as a substring. Strong subtraction is syntactic sugar for
+ expr1 - ( any* expr2 any* ).
+ -Eliminated the use of priorities from the examples. Replaced with
+ subtraction, guarded concatenation and longest-match kleene star.
+ -Did some initial work on supporting conditional transitions. Far from
+ complete and very buggy. This code will only be active when conditionals are
+ used.
+
+Ragel 5.3 - Jan 27, 2006
+========================
+ -Added missing semi-colons that caused the build to fail when using older
+ versions of Bison.
+ -Fix for D code: if the contents of an fexec is a single word, the generated
+ code will get interpreted as a C-style cast. Adding two brackets prevents
+ this. We can now eliminate the "access this.;" in cppscan5 that was used to
+ get around this problem.
+ -Improved some of the tag names in the intermediate format.
+ -Added unsigned long to the list of supported alphabet types.
+ -Added ids of actions and action lists to the XML intermediate format. This
+ makes it more human-readable.
+ -Updated to latest Aapl package.
+
+Ragel 5.2 - Jan 6, 2006
+========================
+ -Ragel emits an error if the target of fentry, fcall, fgoto or fnext is inside
+ a longest match operator, or if an action embedding in a longest match
+ machine uses fcall. The fcall command can still be used in pattern actions.
+ -Made improvements to the clang, rlscan, awkemu and cppscan examples.
+ -Some fixes to generated label names: they should all be prefixed with _.
+ -A fix to the Vim syntax highlighting script was made.
+ -Many fixes and updates to the documentation. All important features and
+ concepts are now documented. A second chapter describing Ragel's use
+ was added.
+
+Ragel 5.1 - Dec 22, 2005
+========================
+ -Fixes to the matching of section delimiters in the Vim syntax file.
+ -If there is a longest match machine, the tokend var is now initialized by
+ write init. This is not necessary for correct functionality, but it
+ prevents compiler warnings.
+ -The rlscan example was ported to the longest match operator and changed to
+ emit XML data.
+ -Fix to the error handling in the frontend: if there are errors in the lookup
+ of names at machine generation time then do not emit anything.
+ -If not compiling the full machine in the frontend (by using -M), avoid
+ errors and segfaults caused by names that are not part of the compiled
+ machine.
+ -Longest match bug fix: need to init tokstart when returning from fsm calls
+ that are inside longest match actions.
+ -In Graphviz drawing, the arrow into the start state is not a real
+ transition, so do not draw to-state actions on its label.
+ -A bug fix to the handling of non-tag data within an XML tag was made.
+ -Backend exit value fixed: since the parser now accepts nothing so as to
+ avoid a redundant parse error when the frontend dies, we must force an
+ error. The backend should now be properly reporting errors.
+ -The longest match machine now has its start state set final.
An LM machine
+ is in a final state when it has not matched anything, when it has matched
+ and accepted a token and is ready for another, and when it has matched a
+ token but is waiting for some lookahead before determining what to do about
+ it (similar to kleene star).
+ -Element statement removed from some tests.
+ -Entry point names are propagated to the backend and used to label the entry
+ point arrows in Graphviz output.
+
+Ragel 5.0 - Dec 17, 2005
+========================
+ (additional details in V5 release notes)
+ -Ragel has been split into two executables: a frontend which compiles
+ machines and emits them in an XML format, and a backend which generates code
+ or a Graphviz dot file from the XML input. The purpose of this split is to
+ allow Ragel to interface with other tools by means of the XML intermediate
+ format and to reduce complexity by strictly separating the previously
+ entangled phases. The intermediate format will provide a better platform
+ for inspecting compiled machines and for extending Ragel to support other
+ host languages.
+ -The host language interface has been reduced significantly. Ragel no longer
+ expects the machine to be implemented as a structure or class and does not
+ generate functions corresponding to initialization, execution and EOF.
+ Instead, Ragel just generates the code of these components, allowing all of
+ them to be placed in a single function if desired. The user specifies a
+ machine in the usual manner, then indicates at which place in the program
+ text the state machine code is to be generated. This is done using the write
+ statement. It is possible to specify to Ragel how it should access the
+ variables it needs (such as the current state) using the access statement.
+ -The host language embedding delimiters have been changed. Single-line
+ machines start with '%%' and end at the newline. Multiline machines start with
+ '%%{' and end with '}%%'.
The machine name is given with the machine
+ statement at the very beginning of the specification. The purpose of this
+ change is to make it easier to separate Ragel code from the host language.
+ This will ease the addition of supported host languages.
+ -The structure and class parsing which was previously able to extract a
+ machine's name has been removed since this feature is dependent on the host
+ language and inhibits the move towards a more language-independent frontend.
+ -The init, element and interface statements have been made obsolete by the
+ new host language interface and have been removed.
+ -The fexec action statement has been changed to take only the new position to
+ move to. This statement is more useful for moving backwards and reparsing
+ input than for specifying a whole new buffer entirely and has been shifted
+ to this new use. Giving it only one argument also simplifies the parsing of
+ host code embedded in a Ragel specification. This will ease the addition of
+ supported host languages.
+ -Introduced the fbreak statement, which allows one to stop processing data
+ immediately. The machine ends up in the state that the current transition
+ was to go to. The current character is not changed.
+ -Introduced the noend option for writing the execute code. This inhibits
+ checking if we have reached pe. The machine will run until it goes into the
+ error state or fbreak is hit. This allows one to parse null-terminated
+ strings without first computing the length.
+ -The execute code now breaks out of the processing loop when it moves into
+ the error state. Previously it would run until pe was hit. Breaking out
+ makes the noend option useful when an error is encountered and allows
+ user code to determine where in the input the error occurred. It also
+ eliminates needlessly iterating over the rest of the input buffer.
+ -Introduced the noerror, nofinal and noprefix options for writing the machine
+ data.
The first two inhibit the writing of the error state and the
+ first-final state should they not be needed. The noprefix option eliminates
+ the prefixing of the data items with the machine name.
+ -Support for the D language has been added. This is specified in the backend
+ with the -D switch.
+ -Since the new host language interface has been reduced considerably, Ragel
+ no longer needs to distinguish between C-based languages. Support for C, C++
+ and Objective-C has been folded into one option in the backend: -C
+ -The code generator has been made independent of the languages that it
+ supports by pushing the language-dependent aspects down into the lower
+ levels of the code generator.
+ -Many improvements to the longest match construction were made. It is no
+ longer considered experimental. A longest match machine must appear at the
+ top level of a machine instantiation. Since it does not generate a pure
+ state machine (it may need to backtrack), it cannot be used as an operand to
+ other operators.
+ -References to the current character and current state are now completely
+ banned in EOF actions.
+
+Ragel 4.2 - Sep 16, 2005
+========================
+ (additional details in V4 release notes)
+ -Fixed a bug in the longest match operator. In some states it's possible that
+ we either match a token or match nothing at all. In these states we need to
+ consult the LmSwitch on error, so it must be prepared to execute an error
+ handler. We therefore need to init act to this error value (which is zero).
+ We can compute if we need to do this and the code generator emits the
+ initialization only if necessary.
+ -Changed the definition of the token end of longest match actions. It now
+ points one past the last character of the token. This makes computing the
+ token length easier because you don't have to add one. The longest match
+ variables token start, action identifier and token end are now properly
+ initialized in generated code.
They don't need to be initialized in the user's code.
+ -Implemented to-state and from-state actions. These actions are executed on
+ transitions into the state (after the in transition's actions) and on
+ transitions out of the state (before the out transition's actions). See the
+ V4 release notes for more information.
+ -Since there are no longer any action embedding operators that embed both on
+ transitions and on EOF, any actions that exist in both places will be there
+ because the user has explicitly done so. Presuming this case is rare, and
+ with code duplication in the hands of the user, we therefore give the EOF
+ actions their own action switch in the finish() function. This is further
+ motivated by the fact that the best solution is to do the same for to-state
+ and from-state actions in the main loop.
+ -Longest match actions can now be specified using a named action. Since a
+ word following a longest match item conflicts with the concatenation of a
+ named machine, the => symbol must come immediately before a named action.
+ -The longest match operator permits action and machine definitions in the
+ middle of a longest match construction. These are parsed as if they came
+ before the machine definition they are contained in. Permitting action and
+ machine definitions in a longest match construction allows objects to be
+ defined closer to their use.
+ -The longest match operator can now handle longest match items with no
+ action, where previously Ragel segfaulted.
+ -Updated to Aapl post 2.12.
+ -Fixed a bug in epsilon transition name lookups. After doing a name lookup
+ the result was stored in the parse tree. This is wrong because if a machine
+ is used more than once, each use may resolve to different targets, but
+ the results were stored in the same place. We now store name resolutions
+ in a separate data structure so that each walk of a parse tree uses the
+ name resolved during the corresponding walk in the name lookup pass.
+
+ -The operators used to embed context and actions into states have been
+ modified. The V4 release notes contain the full details.
+ -Added the zlen builtin machine to represent the zero-length machine.
+ Eventually the name "null" will be phased out in favour of zlen because it is
+ unclear whether null matches the zero-length string or if it does not match
+ any string at all (as does the empty builtin).
+ -Added verbose versions of action, context and priority embedding. See the V4
+ release notes for the full details. A small example:
+ machine <- all exec { foo(); } <- final eof act1
+ -Bugfix for machines with epsilon ops, but no join operations. I had
+ wrongly assumed that because epsilon ops can only increase connectivity,
+ no states are ever merged and therefore a call to fillInStates() is not
+ necessary. In reality, epsilon transitions within one machine can induce the
+ merging of states. In the following, state 2 follows two paths on 'i':
+ main := 'h' -> i 'i h' i: 'i';
+ -Changed the license of the guide from a custom "do not propagate modified
+ versions of this document" license to the GPL.
+
+Ragel 4.1 - Jun 26, 2005
+========================
+ (additional details in V4 release notes)
+ -A bug in include processing was fixed. Surrounding code in an include file
+ was being passed through to the output when it should be ignored. Includes
+ are only for including portions of another machine into the current one.
+ This went unnoticed because all tested includes were wrapped in #ifndef ...
+ #endif directives and so did not affect the compilation of the file making
+ the include.
+ -Fixes were made to the Vim syntax highlighting file.
+ -Duplicate actions are now removed from action lists.
+ -The character-level negation operator ^ was added. This operator produces a
+ machine that matches single characters that are not matched by the machine
+ it is applied to. This unary prefix operator has the same precedence level
+ as !.
+
+ -The use of + to specify a positive literal number was discontinued.
+ -The parser now assigns the subtraction operator a higher precedence than
+ the negation of a literal number.
+
+Ragel 4.0 - May 26, 2005
+========================
+ (additional details in V4 release notes)
+ -Operators now strictly embed into a machine either on a specific class of
+ characters or on EOF, but never both. This gives a cleaner association
+ between the operators and the physical state machine entities they operate
+ on. This change is made up of several parts:
+ 1. The '%' operator embeds only into leaving characters.
+ 2. All global and local error operators only embed on error character
+ transitions; their actions will not be triggered on EOF in non-final
+ states.
+ 3. EOF action embedding operators have been added for all classes of states
+ to make up for functionality removed from other operators. These are
+ >/ $/ @/ %/.
+ 4. The start transition operator '>' no longer implicitly embeds into leaving
+ transitions when the start state is final.
+ -Ragel now emits warnings about the improper use of statements and values in
+ action code that is embedded as an EOF action. Warnings are emitted for fpc,
+ fc, fexec, fbuf and fblen.
+ -Added a longest match construction operator |* machine opt-action; ... *|.
+ This is for repetition where an ability to revert to a shorter, previously
+ matched item is required. This is the same behaviour as flex and re2c. The
+ longest match operator is not a pure FSM construction; it introduces
+ transitions that implicitly hold the current character or reset execution to
+ a previous location in the input. Use of this operator requires the caller
+ of the machine to occasionally hold onto data after a call to the execute
+ routine. Use of machines generated with this operator as the input to other
+ operators may have undefined results. See examples/cppscan for an example.
+ This is very experimental code.
+
+ -Action ids are only assigned to actions that are referenced in the final
+ constructed machine, preventing gaps in the action id sequence. Previously
+ an action id was assigned if the action was referenced during parsing.
+ -Machine specifications now begin with %% and are followed by an optional
+ name and either a single Ragel statement or a sequence of statements
+ enclosed in {}.
+ -Ragel no longer generates the FSM's structure or class. It is up to the user
+ to declare the structure and to give it a variable named curs of type
+ integer. If the machine uses the call stack, the user must also declare an
+ array of integers named stack and an integer variable named top.
+ -In the case of Objective-C, Ragel no longer generates the interface or
+ implementation directives, allowing the user to declare additional methods.
+ -If a machine specification does not have a name then Ragel tries to find a
+ name for it by first checking if the specification is inside a struct, class
+ or interface. If it is not, then it uses the name of the previous machine
+ specification. If still no name is found, this is an error.
+ -Fsm specifications now persist in memory and statements accumulate.
+ -Ragel now has an include statement for including the statements of a machine
+ spec in another file (perhaps because it is the corresponding header file).
+ The include statement can also be used to draw in the statements of another
+ fsm spec in the current file.
+ -The fstack statement is now obsolete and has been removed.
+ -A new statement, simply 'interface;', indicates that ragel should generate
+ the machine's interface. If Ragel sees the main machine it generates the
+ code sections of the machine. Previously, the header portion was generated
+ if the (now removed) struct statement was found and code was generated if
+ any machine definition was found.
+ -Fixed a bug in the resolution of fsm name references in actions.
The name
+ resolution code did not recurse into inline code items with children
+ (fgoto*, fcall*, fnext*, and fexec), causing a segfault at code generation
+ time.
+ -Cleaned up the code generators. FsmCodeGen was made into a virtual base
+ class allowing for the language/output-style specific classes to inherit
+ both a language-specific and a style-specific base class while retaining only
+ one copy of FsmCodeGen. Language-specific output can now be moved into the
+ language-specific code generators, requiring less duplication of code in the
+ language/output-style specific leaf classes.
+ -Fixed bugs in the fcall* implementation of IpgGoto code generation.
+ -If the element type has not been defined, Ragel now uses a constant version
+ of the alphtype, not the exact alphtype. In most cases the data pointer of
+ the execute routine should be const. A non-const element type can still be
+ defined with the element statement.
+ -The fc special value now uses getkey for retrieving the current char rather
+ than *_p, which is wrong if the element type is a structure.
+ -The user guide was converted to TeX and updated for the new 4.0 syntax and
+ semantics.
+
+Ragel 3.7 - Oct 31, 2004
+========================
+ -Bug fix: unreferenced machine instantiations caused a segfault due to the
+ name tree and parse tree walks becoming out of synchronization.
+ -Rewrote representation of inline code blocks using a tree data structure.
+ This allows special keywords such as fbuf to be used as the operands of
+ other fsm commands.
+ -Documentation updates.
+ -When deciding whether or not to generate machine instantiations, search the
+ entire name tree beneath the instantiation for references, not just the
+ root.
+ -Removed stray ';' in keller2.rl.
+ -Added fexec for restarting the machine with new buffer data (state stays the
+ same), fbuf for retrieving the start of the buffer, and fblen for
+ retrieving the original buffer length.
+ -Implemented test/cppscan2 using fexec.
This allows token emitting and restart
+ to stay inside the execute routine, instead of leaving and re-entering on
+ every token.
+ -Changed examples/cppscan to use fexec and thereby go much faster.
+ -Implemented flex and re2c versions of examples/cppscan. The Ragel version
+ is faster than the flex version but not as fast as the re2c version.
+ -Merged in Objective-C patch from Eric Ocean.
+ -Turned off syncing with stdio in C++ tests to make them go faster.
+ -Renamed C++ code generation classes with the Cpp prefix instead of CC to make
+ them easier to read.
+ -In the finish function, emit fbuf as 0 cast to a pointer to the element type
+ so its type is not interpreted as an integer.
+ -The number -128 underflows char alphabets on some architectures. Removed
+ uses of it in tests.
+ -Disabled the keller2 test because it causes problems on many architectures
+ due to its large size and compilation requirements.
+
+Ragel 3.6 - Jul 10, 2004
+========================
+ -Many documentation updates.
+ -When resolving names, return a set of values so that a reference in an
+ action block that is embedded more than once won't report distinct entry
+ points that are actually the same.
+ -Implemented flat tables. A flat table stores a linear array of indices into
+ the transition array and only a low and high key value. Faster than binary
+ searching for keys but not usable for large alphabets.
+ -Fixed a bug in the deletion of transitions left over from the conversion from
+ the BST to the list implementation of transitions. Other code cleanup.
+ -In table-based output, calculate the cost of using an index and don't use
+ one if it is cheaper not to.
+ -Changed the fstate() value available in init and action code to fentry() to
+ reflect the fact that the values returned are intended to be used as targets
+ in fgoto, fnext and fcall statements. The returned state is not a unique
+ state representing the label. There can be any number of states representing
+ a label.
+
+ -Added the keller2 test, C++ scanning tests and a C++ scanning example.
+ -In table-based output, split up transitions into targets and actions. This
+ allows actions to be omitted.
+ -Broke the components of the state array into separate arrays. This requires
+ adding some fields where they could previously be omitted, but allows
+ finer-grained control over the sizes of items and an overall size reduction.
+ It also means that state numbers are not an offset into the state array but
+ instead a sequence of numbers, meaning the context array does not have any
+ wasted bits.
+ -Action lists and transitions also have their types chosen to be the smallest
+ possible for accommodating the contained values.
+ -Changed the current state stored in the fsm struct from _cs to curs. Keep
+ fsm->curs == -1 while in the machine. Added tests curs1 and curs2.
+ -Implemented the notion of context. Context can be embedded in states using
+ >:, $:, @: and %: operators. These embed a named context into start states,
+ all states, non-start/non-final and final states. If the context is declared
+ using a context statement
+ context name;
+ then the context can be queried for any state using fsm_name_ctx_name(state)
+ in C code and fsm_name::ctx_name(state) in C++ code. This feature makes it
+ possible to determine what "part" of the machine is currently active.
+ -Fixed a crash on machine generation of graphs with no final state. If there
+ is no reference to a final state in a join operation, don't generate one.
+ -Updated Vim syntax: added labels to inline code, added various C++ keywords.
+ Don't highlight name separations as labels. Added switch labels, improved
+ alphtype, element and getkey.
+ -Fixed line info in error reporting of bad epsilon trans.
+ -Fixed fstate() for tab code gen.
+ -Removed references to malloc.h.
+
+Ragel 3.5 - May 29, 2004
+========================
+ -When parse errors occur, the partially generated output file is deleted and
+ a non-zero exit status is returned.
+
+ -Updated Vim syntax file.
+ -Implemented the setting of the element type that is passed to the execute
+ routine as well as a method for specifying how ragel should retrieve the key
+ from the element type. This lets ragel process arbitrary structures inside
+ of which is the key that is parsed.
+ element struct Element;
+ getkey fpc->character;
+ -The current state is now implemented with an int across all machines. This
+ simplifies working with current state variables. For example, this allows a
+ call stack to be implemented in user code.
+ -Implemented a method for retrieving the current state, the target state, and
+ any named states.
+ fcurs -retrieve the current state
+ ftargs -retrieve the target state
+ fstate(name) -retrieve a named state.
+ -Implemented a mechanism for jumping to and calling to a state stored in a
+ variable.
+ fgoto *; -goto the state returned by the C/C++ expression.
+ fcall *; -call the state returned by the C/C++ expression.
+ -Implemented a mechanism for specifying the next state without immediately
+ transferring control there (any code following the statement is executed).
+ fnext label; -set the state pointed to by label as the next state.
+ fnext *; -set the state returned by the C/C++ expression as the
+ next.
+ -Action references are determined from the final machine instead of during
+ the parse tree walk. Some actions can be referenced in the parse tree but not
+ show up in the final machine. Machine analysis is now done based on this new
+ computation.
+ -Named state lookup now employs a breadth-first search in the lookup and
+ allows the user to fully qualify names, making it possible to specify
+ jumps/calls into parts of the machine deep in the name hierarchy. Each part
+ of a name (separated by ::) employs a breadth-first search from its starting
+ point.
+ -Name references now must always refer to a single state. Since references to
+ multiple states are not normally intended, they no longer happen
+ automatically.
This frees the programmer from thinking about whether or not
+ a state reference is unique. It also avoids the added complexity of
+ determining when to merge the targets of multiple references. The effect of
+ references to multiple states can be explicitly created using the join
+ operator and epsilon transitions.
+ -The -M option was split into -S and -M. -S specifies the machine spec to
+ generate for Graphviz output and dumping. -M specifies the machine definition
+ or instantiation.
+ -Machine function parameters are now prefixed with an underscore to
+ avoid the hiding of class members.
+
+Ragel 3.4 - May 8, 2004
+=======================
+ -Added the longest match kleene star operator **, which is synonymous
+ with ( ( ) $0 %1 ) *.
+ -Epsilon operators distinguish between leaving transitions (going to
+ another expression in a comma-separated list) and non-leaving transitions.
+ Leaving actions and priorities are appropriately transferred.
+ -The relative priority of the following ops was changed to:
+ 1. Action/Priority
+ 2. Epsilon
+ 3. Label
+ If the label is done first, then the isolation of the start state in >
+ operators will cause the label to point to the old start state that doesn't
+ have the new action/priority.
+ -Merged >! and >~, @! and @~, %! and %~, and $! and $~ operators to have one
+ set of global error action operators (>!, @!, %! and $!) that are invoked on
+ error by unexpected characters as well as by unexpected EOF.
+ -Added the fpc keyword for use in action code. This is a pointer to the
+ current character. *fpc == fc. If an action is invoked on EOF then fpc == 0.
+ -Added >^, @^, %^, and $^ local error operators. Global error operators (>!,
+ @!, $!, and %!) cause actions to be invoked if the final machine fails.
+ Local error actions cause actions to be invoked if the current machine
+ fails.
+ -Changed the error operators to embed global/local error actions in:
+ >! and >^ -the start state.
+ @!
and @^ -states that are not the start state and are not final.
+ %! and %^ -final states.
+ $! and $^ -all states.
+ -Added >@! which is synonymous with >! then @!.
+ -Added >@^ which is synonymous with >^ then @^.
+ -Added @%! which is synonymous with @! then %!.
+ -Added @%^ which is synonymous with @^ then %^.
+ -The FsmGraph representation of transition lists was changed from a mapping of
+ alphabet key -> transition objects using a BST to simply a list of
+ transition objects. Since the transitions are no longer divided by
+ single/range, the fast finding of transition objects by key is no longer
+ required functionality and can be eliminated. This new implementation uses
+ the same amount of memory but causes fewer allocations. It also makes more
+ sense for supporting error transitions with actions. Previously, an error
+ transition was represented by a null value in the BST.
+ -Regular expression ranges are checked to ensure that lower <= upper.
+ -Added a printf-like example.
+ -Added atoi2, erract2, and gotcallret to the test suite.
+ -Improved the build test to support make -jN and simplified the compiling and
+ running of tests.
+
+Ragel 3.3 - Mar 7, 2004
+=========================
+ -Portability bug fixes were made. Minimum and maximum integer values are
+ now taken from the system. An alignment problem on 64-bit systems
+ was fixed.
+
+Ragel 3.2 - Feb 28, 2004
+========================
+ -Added a Vim syntax file.
+ -Eliminated the length var from generated execute code in favour of an end
+ pointer. Using a length requires two variables to be read and written. Using
+ an end pointer requires one variable to be read and written and one to be
+ read. This results in more optimizable code.
+ -Minimization is now on by default.
+ -States are ordered in output by depth-first search.
+ -Bug in minimization fixed. States were not being distinguished based on
+ error actions.
+ -Added null and empty builtin machines.
+ -Added EOF error action operators. These are >~, @~, $~, and %~.
EOF error
+ operators embed actions to take if the EOF is seen and interpreted as an
+ error. The operators correspond to the following states:
+ -the start state
+ -any state with a transition to a final state
+ -any state with a transition out
+ -a final state
+ -Fixed a bug in the generation of unreferenced machine vars using -M.
+ Unreferenced vars don't have a name tree built underneath when starting from
+ instantiations. The name tree must instead be built starting at the var.
+ -Calls, returns, holds and references to fc in out action code are now
+ handled for ipgoto output.
+ -Only actions referenced by an instantiated machine expression are put into
+ the action index and written out.
+ -Added rlscan, an example that lexes Ragel input.
+
+Ragel 3.1 - Feb 18, 2004
+========================
+ -Duplicates in OR literals are removed and no longer cause an assertion
+ failure.
+ -Duplicate entry points used in goto and call statements are made into
+ deterministic entry points.
+ -Base FsmGraph code moved from aapl into ragel, as an increasing amount
+ of specialization is required. Too much time was spent attempting to
+ keep it as a general-purpose template.
+ -FsmGraph code de-templatized and hierarchy squashed to a single class.
+ -Single transitions taken out of FsmGraph code. In the machine construction
+ stage, transitions are now implemented only with ranges and default
+ transitions. This reduces memory consumption, simplifies code and prevents
+ covered transitions. However, it requires the automated selection of single
+ transitions to keep goto-driven code lean.
+ -Machine reduction completely rewritten to be in-place. As duplicate
+ transitions and actions are found and the machine is converted to a format
+ suitable for writing as C code or as Graphviz input, the memory allocated
+ for states and transitions is reused, instead of newly allocated.
+
+ -New reduction code consolidates ranges, selects a default transition, and
+ selects single transitions with the goal of joining ranges that are split by
+ any number of single characters.
+ -The line directive was changed from "# " to the more common format
+ "#line ".
+ -Operator :! changed to @!. This should have happened in the last release.
+ -Added the params example.
+
+Ragel 3.0 - Jan 22, 2004
+========================
+ -Ragel now parses the contents of struct statements and action code.
+ -The keyword fc replaces the use of *p to reference the current character in
+ action code.
+ -Machine instantiations other than main are allowed.
+ -Call, jump and return statements are now available in action code. This
+ facility makes it possible to jump to an error handling machine, call a
+ sub-machine for parsing a field, or follow paths through a machine as
+ determined by arbitrary C code.
+ -Added labels to the language. Labels can be used anywhere in a machine
+ expression to define an entry point. Also, references to machine definitions
+ cause the implicit creation of a label.
+ -Added epsilon transitions to the language. Epsilon operators may reference
+ labels in the current name scope resolved when join operators are evaluated
+ and at the root of the expression tree of machine assignment/instantiation.
+ -Added the comma operator, which joins machines together without drawing any
+ transitions between them. This operator is useful in combination with
+ labels, the epsilon operator and user code transitions for defining machines
+ using the named state and transition list paradigm. It is also useful for
+ invoking transitions based on some analysis of the input or on the
+ environment.
+ -Added >!, :!, $!, %! operators for specifying actions to take should the
+ machine fail.
These operators embed actions to execute if the machine + fails in + -the start state + -any state with a transition to a final state + -any state with a transition out + -a final state + The general rule is that if an action embedding operator embeds an action + into a set of transitions T, then the error-counterpart with a ! embeds an + action into the error transition taken when any transition in T is a candidate, + but does not match the input. + -The finishing augmentation operator ':' has been changed to '@'. This + frees the ':' symbol for machine labels and avoids hacks to the parser to + allow the use of ':' for both labels and finishing augmentations. The best + hack required that label names be distinct from machine definition names as + in main := word : word; This restriction is not good because labels are + local to the machine that they are used in whereas machine names are global + entities. Label name choices should not be restricted by the set of names + that are in use for machines. + -Named priority syntax now requires parentheses surrounding the name and + value pair. This avoids grammar ambiguities now that the ',' operator has + been introduced and makes it clearer that the name and value are an + associated pair. + -Backslashes are escaped in line directive paths. + +Ragel 2.2 - Oct 6, 2003 +======================= + -Added {n}, {,n}, {n,}, {n,m} repetition operators. + {n} -- exactly n repetitions + {,n} -- zero to n repetitions + {n,} -- n or more repetitions + {n,m} -- n to m repetitions + -Bug in binary search table in Aapl fixed. Fixes crashing on machines that + add to action tables that are implicitly shared among transitions. + -Tests using obsolete minimization algorithms are no longer built and run by + default. + -Added atoi and concurrent from examples to the test suite. + +Ragel 2.1 - Sep 22, 2003 +======================== + -Bug in priority comparison code fixed. Segfaulted on some input with many + embedded priorities.
+ -Added two new examples. + +Ragel 2.0 - Sep 7, 2003 +======================= + -Optional (?), One or More (+) and Kleene Star (*) operators changed from + prefix to postfix. Rationale is that the postfix version is far more common in + regular expression implementations and will be more readily understood. + -All priority values attached to transitions are now accompanied by a name. + Transitions no longer have default priority values of zero assigned + to them. Only transitions that have different priority values assigned + to the same name influence the NFA-DFA conversion. This scheme reduces + side-effects of priorities. + -Removed the %! statement for unsetting pending out priorities. With + named priorities, it is not necessary to clear the priorities of a + machine with $0 %! because non-colliding names can be used to avoid + side-effects. + -Removed the clear keyword, which was for removing actions from a machine. + Not required functionality and it is non-intuitive to have a language + feature that undoes previous definitions. + -Removed the ^ modifier to repetition and concatenation operators. This + undocumented feature prevented out transitions and out priorities from being + transferred from final states to transitions leaving machines. Not required + functionality and complicates the language unnecessarily. + -Keyword 'func' changed to 'action' as a part of the phasing out of the term + 'function' in favour of 'action'. Rationale is that the term 'function' + implies that the code is called like a C function, which is not necessarily + the case. The term 'action' is far more common in state machine compiler + implementations. + -Added the instantiation statement, which looks like a standard variable + assignment except := is used instead of =. Instantiations go into the + same graph dictionary as definitions. In the future, instantiations + will be used as the target for gotos and calls in action code.
+ -The main graph should now be explicitly instantiated. If it is not, + a warning is issued. + -Or literal basic machines ([] outside of regular expressions) now support + negation and ranges. + -C and C++ interfaces lowercased. In the C interface an underscore now + separates the fsm machine name and the function name. Rationale is that lowercased + library and generated routines are more common. + C output: + int fsm_init( struct clang *fsm ); + int fsm_execute( struct clang *fsm, char *data, int dlen ); + int fsm_finish( struct clang *fsm ); + C++ output: + int fsm::init( ); + int fsm::execute( char *data, int dlen ); + int fsm::finish( ); + -Init, execute and finish all return -1 if the machine is in the error state + and can never accept, 0 if the machine is in a non-accepting state that has a + path to a final state and 1 if the machine is in an accepting state. + -Accept routine eliminated. Determining whether or not the machine accepts is + done by examining the return value of the finish routine. + -In C output, fsm structure is no longer a typedef, so referencing requires + the struct keyword. This is to stay in line with C language conventions. + -In C++ output, constructor is no longer written by ragel. As a consequence, + init routine is not called automatically. Allows constructor to be supplied + by user as well as the return value of init to be examined without calling it + twice. + -Static start state and private structures are taken out of C++ classes. + +Ragel 1.5.4 - Jul 14, 2003 +========================== + -Workaround for building with bison 1.875, which produces an + optimization that doesn't build with newer versions of gcc. + +Ragel 1.5.3 - Jul 10, 2003 +========================== + -Fixed building with versions of flex that recognize YY_NO_UNPUT. + -Fixed version numbers in ragel.spec file. + +Ragel 1.5.2 - Jul 7, 2003 +========================= + -Transition actions and out actions displayed in the graphviz output.
+ -Transitions on negative numbers handled in graphviz output. + -Warning generated when using bison 1.875 now squashed. + +Ragel 1.5.1 - Jun 21, 2003 +========================== + -Bugs fixed: Don't delete the output objects when writing to standard out. + Copy mem into parser buffer with memcpy, not strcpy. Fixes buffer mem error. + -Fixes for compiling with Sun WorkShop 6 compilers. + +Ragel 1.5.0 - Jun 10, 2003 +========================== + -Line directives written to the output so that errors in the action code + are properly reported in the ragel input file. + -Simple graphviz dot file output format is supported. Shows states and + transitions. Does not yet show actions. + -Options -p and -f dropped in favour of -d output format. + -Added option -M for specifying the machine to dump with -d or the graph to + generate with -V. + -Error recovery implemented. + -Proper line and column number tracking implemented in the scanner. + -All action/function code is now embedded in the main Execute routine. Avoids + duplication of action code in the Finish routine and the need to call + ExecFuncs which resulted in huge code bloat. Will also allow actions to + modify cs when fsm goto, call and return are supported in action code. + -Fsm spec can have no statements; nothing will be generated. + -Bug fix: Don't accept ] as the opening of a .-. range in a reg exp. + -Regular expression or set ranges (i.e. /[0-9]/) are now handled by the parser + and consequently must be well-formed. The following now generates a parser + error: /[+-]/ and must be rewritten as /[+\-]/. Also fixes a bug whereby ] + might be accepted as the opening of a .-. range causing /[0-9]-[0-9]/ to + parse incorrectly. + -\v, \f, and \r are now treated as whitespace in an fsm spec. + +Ragel 1.4.1 - Nov 19, 2002 +========================== + -Compile fixes. The last release (integer alphabets) was so exciting + that the usual portability checks got bypassed.
+ +Ragel 1.4.0 - Nov 19, 2002 +========================== + -Arbitrary integer alphabets are now fully supported! A new language + construct: + 'alphtype ' added for specifying the type of the alphabet. Default + is 'char'. Possible alphabet types are: + char, unsigned char, short, unsigned short, int, unsigned int + -Literal machines specified in decimal format can now be negative when the + alphabet is a signed type. + -Literal machines (strings, decimal and hex) have their values checked for + overflow/underflow against the size of the alphabet type. + -Table driven and goto driven output redesigned to support ranges. Table + driven uses a binary search for locating single characters and ranges. Goto + driven uses a switch statement for single characters and nested if blocks for + ranges. + -Switch driven output removed due to a lack of consistent advantages. Most of + the time the switch driven FSM is of no use because the goto FSM makes + smaller and faster code. Under certain circumstances it can produce smaller + code than a goto driven fsm and be almost as fast, but a few sporadic cases + do not warrant maintaining it. + -Many warnings changed to errors. + -Added option -p for printing the final fsm before minimization. This lets + priorities be seen. Priorities are all reset to 0 before minimization. The + existing option -f prints the final fsm after minimization. + -Fixed a bug in the clang test and example that resulted in redundant actions + being executed. + +Ragel 1.3.4 - Nov 6, 2002 +========================= + -Fixes to Chapter 1 of the guide. + -Brought back the examples and made them current. + -MSVC is no longer supported for compiling windows binaries because its + support for the C++ standard is frustratingly inadequate, it will cost money + to upgrade if it ever gets better, and MinGW is a much better alternative.
+ -The build system now supports the --host= option for building ragel + for another system (used for cross compiling a windows binary with MinGW). + -Various design changes and fixes towards the goal of arbitrary integer + alphabets and the handling of larger state machines were made. + -The new shared vector class is now used for action lists in transitions and + states to reduce memory allocations. + -An avl tree is now used for the reduction of transitions and functions of an + fsm graph before making the final machine. The tree allows better scalability + and performance by not requiring consecutively larger heap allocations. + -Final stages in the separation of fsm graph code from action embedding and + priority assignment are complete. Makes the base graph leaner and easier to reuse + in other projects (like Keller). + +Ragel 1.3.3 - Oct 22, 2002 +========================== + -More diagrams were added to section 1.7.1 of the user guide. + -FSM Graph code was reworked to separate the regex/nfa/minimization graph + algorithms from the manipulation of state and transition properties. + -An rpm spec file from Cris Bailiff was added. This allows an rpm for ragel + to be built with the command 'rpm -ta ragel-x.x.x.tar.gz' + -Fixes to the build system and corresponding doc updates in the README. + -Removed autil and included the one needed source file directly in the top + level ragel directory. + -Fixed a bug that nullified the 20 times speedup in large compilations + claimed by the last version. + -Removed awk from the doc build (it was added with the last release -- though + not mentioned in the changelog). + -Install of man page was moved to the doc dir. The install also installs the + user guide to $(PREFIX)/share/doc/ragel/ + +Ragel 1.3.2 - Oct 16, 2002 +========================== + -Added option -v (or --version) to show version information. + -The subtract operator no longer removes transition data from the machine + being subtracted.
This is left up to the user for the purpose of making it + possible to transfer transitions using subtract and also for speeding up the + subtract routine. Note that it is possible to explicitly clear transition + data before doing a subtract. + -Rather severe typo bug fixed. Bug was related to transitions with higher + priorities taking precedence. A wrong ptr was being returned. It appears to + have worked most of the time because the old ptr was deleted and the new one + allocated immediately after so the old ptr often pointed to the same space. + Just luck though. + -Bug in the removing of dead end paths was fixed. If the start state + has in transitions then those paths were not followed when finding states to + keep. Would result in non-dead end states being removed from the graph. + -In lists and in ranges are no longer maintained as a bst with the key as the + alphabet character and the value as a list of transitions coming in on that + char. There is one list for each of inList, inRange and inDefault. Now that + the required functionality of the graph is well known it is safe to remove + these lists to gain in speed and footprint. They shouldn't be needed. + -IsolateStartState() runs on modification of start data only if the start + state is not already isolated, which is now possible with the new in list + representation. + -Concat, Or and Star operators now use an approximation to + removeUnreachableStates that does not require a traversal of the entire + graph. This combined with an 'on-the-fly' management of final bits and final + state status results in a dramatic speed increase when compiling machines + that use those operators heavily. The strings2 test goes 20 times faster. + -Before the final minimization, after all fsm operations are complete, + priority data is reset which enables better minimization in cases where + priorities would otherwise separate similar states.
+ +Ragel 1.3.1 - Oct 2, 2002 +========================= + -Range transitions are now used to implement machines made with /[a-z]/ and + the .. operator as well as most of the builtin machines. The ranges are not + yet reflected in the output code, they are expanded as if they came from the + regular single transitions. This is one step closer to arbitrary integer + output. + -The builtin machine 'any' was added. It is equiv to the builtin extend, + matching any characters. + -The builtin machine 'cntrl' now includes newline. + -The builtin machine 'space' now includes newline. + -The builtin machine 'ascii' is now the range 0-127, not all characters. + -A man page was written. + -A proper user guide was started. Chapter 1: Specifying Ragel Programs + was written. It even has some diagrams :) + +Ragel 1.3.0 - Sept 4, 2002 +========================== + -NULL keyword no longer used in table output. + -Though not yet in use, underlying graph structure changed to support range + transitions. As a result, most of the code that walks transition lists is now + implemented with an iterator that hides the complexity of the transition + lists and ranges. Range transitions will be used to implement /[a-z]/ style + machines and machines made with the .. operator. Previously a single + transition would be used for each char in the range, which is very costly. + Ranges eliminate much of the space complexity and allow for the .. operator + to be used with very large (integer) alphabets. + -New minimization similar to Hopcroft's alg. It does not require n^2 space and + runs close to O(n*log(n)) (an exact analysis of the alg is very hard). It is + much better than the stable and approx minimization and obsoletes them both. + An exact implementation of Hopcroft's alg is desirable but not possible + because the ragel implementation does not assume a finite alphabet, which + Hopcroft's requires. 
Ragel will support arbitrary integer alphabets which + must be treated as an infinite set for implementation considerations. + -New option -m using above described minimization to replace all previous + minimization options. Old options still work but are obsolete and not + advertised with -h. + -Bug fixed in goto style output. The error exit set the current state to 0, + which is actually a valid state. If the machine was entered again it would go + into the first state, very wrong. If the first state happened to be final then + an immediate finish would accept when in fact it should fail. + -Slightly better fsm minimization now possible due to clearing of the + transition ordering numbers just prior to minimization. + +Ragel 1.2.2 - May 25, 2002 +========================== + -Configuration option --prefix now works when installing. + -cc file extension changed to cpp for better portability. + -Unlink of output file upon error no longer happens, removes dependency on + unlink system command. + -All multiline strings removed: not standard c++. + -Awk build dependency removed. + -MSVC 6.0 added to the list of supported compilers (with some tweaking of + bison and flex output). + +Ragel 1.2.1 - May 13, 2002 +========================== + -Automatic dependencies were fixed; they were not working correctly. + -Updated AUTHORS file to reflect contributors. + -Code is more C++ standards compliant: compiles with g++ 3.0 + -Fixed bugs that only showed up in g++ 3.0 + -Latest (unreleased) Aapl. + -Configuration script bails out if bison++ is installed. Ragel will not + compile with bison++ because it is coded in c++ and bison++ automatically + generates a c++ parser. Ragel uses a c-style bison parser. + +Ragel 1.2.0 - May 3, 2002 +========================= + -Underlying graph structure now supports default transitions. The result is + that a transition does not need to be made for each char of the alphabet + when making 'extend' or '/./' machines.
Ragel compiles machines that + use the aforementioned primitives WAY faster. + -The ugly hacks needed to pick default transitions now go away due to + the graph supporting default transitions directly. + -If -e is given, but minimization is not turned on, print a warning. + -Makefiles use automatic dependencies. + +Ragel 1.1.0 - April 15, 2002 +============================ + -Added goto fsm: much faster than any other fsm style. + -Default operator (if two machines are side by side with no operator + between them) is concatenation. First showed up in 1.0.4. + -The fsm machine no longer automatically builds the flat table for + transition indices. Instead it keeps the key,ptr pair. In tabcodegen + the flat table is produced. This way very large alphabets with sparse + transitions will not consume large amounts of mem. This is also in prep + for fsm graph getting a default transition. + -Generated code contains a statement explicitly stating that ragel fsms + are NOT covered by the GPL. Technically, Ragel copies part of itself + to the output to make the generic fsm execution routine (for table driven + fsms only) and so the output could be considered under the GPL. But this + code is very trivial and could easily be rewritten. The actual fsm data + is subject to the copyright of the source. To promote the use of Ragel, + a special exception is made for the part of the output copied from Ragel: + it may be used without restriction. + -Much more elegant code generation scheme is employed. Code generation + class members need only put the 'codegen' keyword after their 'void' type + in order to be automatically registered to handle macros of the same name. + An awk script recognises this keyword and generates an appropriate driver. + -Ragel gets a test suite. + -Postfunc and prefunc go away because they are not supported by non + loop-driven fsms (goto, switch) and present duplicate functionality. + Universal funcs can be implemented by using $ operator.
+ -Automatic dependencies used in build system, no more make depend target. + -Code generation section in docs. + -Uses the latest aapl. + +Ragel 1.0.5 - March 3, 2002 +=========================== + -Bugfix in SetErrorState that caused an assertion failure when compiling + simple machines that did not have full transition tables (and thus did + not show up on any example machines). Assertion failure did not occur + when using the switch statement code as ragel does not call SetErrorState + in that case. + -Fixed some missing includes, now compiles on redhat. + -Moved the FsmMachTrans Compare class out of FsmMachTrans. Some compilers + don't deal with nested classes in templates too well. + -Removed old unused BASEREF in fsmgraph and ragel now compiles using + egcs-2.91.66 and presumably SUNWspro. The baseref is no longer needed + because states do not support being elements in multiple lists. I would + rather be able to support more compilers than have this feature. + -Started a README with compilation notes. Started an AUTHORS file. + -Started the user documentation. Describes basic machines and operators. + +Ragel 1.0.4 - March 1, 2002 +=========================== + -Ported to the version of Aapl just after 2.2.0 release. See + http://www.ragel.ca/aapl/ for details on aapl. + -Fixed a bug in the clang example: the newline machine was not starred. + -Added explanations to the clang and mailbox examples. This should + help people that want to learn the language as the manual is far from + complete. + +Ragel 1.0.3 - Feb 2, 2002 +========================= + -Added aapl to the ragel tree. No longer requires you to download + and build aapl separately. Should avoid discouraging impatient users + from compiling ragel. + -Added the examples to the ragel tree. + -Added configure script checks for bison and flex. + -Fixed makefile so as not to die with newer versions of bison that + write the header of the parser to a .hh file. + -Started ChangeLog file.
+ +Ragel 1.0.2 - Jan 30, 2002 +========================== + -Bug fix in calculating highIndex for table based code. Was using + the length of the out transition table rather than the value at the + end. + -If high/low index are at the limits, output a define in their place, + not the high/low values themselves so as not to cause compiler warnings. + -If the resulting machines don't have any indices or functions, then + omit the empty unreferenced static arrays so as not to cause compiler + warnings about unused static vars. + -Fixed variable sized indices support. The header cannot have any + reference to INDEX_TYPE as that info is not known at the time the header + data is written. Forces us to use a void * for pointers to indices. In + the c++ versions we are forced to make much of the data non-member + static data in the code portion for the same reason. + +Ragel 1.0.1 - Jan 28, 2002 +========================== + -Exe name change from reglang to ragel. + -Added ftabcodegen output code style which uses a table for states and + transitions but uses a switch statement for the function execution. + -Reformatted options in usage dump to look better. + -Support escape sequences in [] sections of regular expressions. + +Ragel 1.0 - Jan 25, 2002 +======================== + -Initial release. diff --git a/Makefile.in b/Makefile.in new file mode 100644 index 0000000..9b16e8e --- /dev/null +++ b/Makefile.in @@ -0,0 +1,56 @@ +# +# Copyright 2001-2006 Adrian Thurston +# + +# This file is part of Ragel. +# +# Ragel is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# Ragel is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Ragel; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + +SUBDIRS = common ragel rlcodegen test examples doc + +#************************************* + +# Programs +CXX = @CXX@ + +# Get the version info. +include version.mk + +# Rules. +all: + @cd common && $(MAKE) && cd ../ragel && $(MAKE) && cd ../rlcodegen && $(MAKE) + +new-version: + sed 's/^\(Version:[[:space:]]*\)[0-9.]*$$/\1$(VERSION)/' ragel.spec > spec-new + cat spec-new > ragel.spec && rm spec-new + +distclean: distclean-rec distclean-local + +distclean-rec: + for dir in $(SUBDIRS); do cd $$dir; $(MAKE) distclean; cd ..; done + +distclean-local: clean-local + rm -f Makefile config.cache config.status config.log + +clean: clean-rec clean-local + +clean-rec: + for dir in $(SUBDIRS); do cd $$dir; $(MAKE) clean; cd ..; done + +clean-local: + rm -f tags + +install: + @cd ragel && $(MAKE) install && cd ../rlcodegen && $(MAKE) install diff --git a/README b/README new file mode 100644 index 0000000..f4a5817 --- /dev/null +++ b/README @@ -0,0 +1,54 @@ + + Ragel State Machine Compiler -- README + ====================================== + +1. Build Requirements +--------------------- + + * GNU Make + * g++ + +If you would like to modify Ragel and need to build Ragel's scanners and +parsers from the specifications then set BUILD_PARSERS=true in the configure +script and then run it. To build the parsers you will need the following +programs: + + * flex + * bison (recent version and not bison++, see below) + * gperf + +To build the user guide the following extra programs are needed: + + * fig2dev + * pdflatex + + +2. Compilation +-------------- + +To configure type './configure'. The makefiles honour the --prefix option to +specify where the program is to be installed to. + +To build the ragel program type 'make'. 
+ +To build all the documentation cd to 'doc' and type 'make'. If you don't have +all of the programs to build the user guide and just want the man page use +'make ragel.1 rlcodegen.1'. + + +3. Installing +------------- + +The command 'make install' will build the programs and install them to $PREFIX/bin/. +A 'make install' in the doc directory will make and install all the +documentation. The man pages install to $PREFIX/man/man1/ and the user guide +and ChangeLog install to $PREFIX/share/doc/ragel/. To install just the man page +use 'make man-install'. + + +4. Why Ragel cannot be built with Bison++ +----------------------------------------- +Ragel is written in C++ using a C-style parser. Bison++ sees that we are using +C++ and generates classes, which breaks the build. As of last investigation, +this can't be stopped. Bison++ is therefore only compatible with Bison if you +are implementing a C-style parser in C. diff --git a/TODO b/TODO new file mode 100644 index 0000000..baf5c05 --- /dev/null +++ b/TODO @@ -0,0 +1,48 @@ +fbreak should advance the current char. Deprecate fbreak and add + fctl_break; + fctl_return ; + fctl_goto