From 7f8825434d396f8748f2fa27219d80a6359a715a Mon Sep 17 00:00:00 2001 From: thurston Date: Wed, 9 May 2007 22:44:14 +0000 Subject: [PATCH] Added new sections (import/export/variable) and expanded on running the executables. git-svn-id: http://svn.complang.org/ragel/trunk@222 052ea7fc-9027-0410-9066-f65837a77df0 --- doc/ragel-guide.tex | 145 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 109 insertions(+), 36 deletions(-) diff --git a/doc/ragel-guide.tex b/doc/ragel-guide.tex index ba39d4b..5afda97 100644 --- a/doc/ragel-guide.tex +++ b/doc/ragel-guide.tex @@ -537,6 +537,27 @@ of the same name as the current specification. Without an input file the current file is searched for a machine of the given name. If both are present, the given input file is searched for a machine of the given name. +\subsection{Importing Definitions} + +\begin{verbatim} +import "inputfile.h"; +\end{verbatim} +\verbspace + +The \verb|import| statement takes a literal string as an argument, interprets +it as a file name, then scrapes the file for sequences of tokens that match the +following forms. If the input file is a Ragel program then tokens inside the +Ragel sections are ignored. See Section \ref{export} for a description of +exporting machine definitions. + +\begin{itemize} + \setlength{\itemsep}{-2mm} + \item \verb|name = number| + \item \verb|name = lit_string| + \item \verb|"define" name number| + \item \verb|"define" name lit_string| +\end{itemize} + \subsection{Machine Definition} \label{definition} @@ -2447,6 +2468,34 @@ sense use the Longest-Match machine construction described Section \chapter{Interface to Host Program} +The Ragel code generator is very flexible. The generated code has no +dependencies and can be inserted in any function, perhaps inside a loop if so +desired. The user is responsible for declaring and initializing a number of +required variables, including the current state and the pointer to the input +stream. These can live in any scope. 
Control of the input processing loop is +also possible: the user may break out of the processing loop and return to it +at any time. + +In the case of C and D host languages, Ragel is able to generate very +fast-running code that implements state machines as directly executable code. +Since very large files strain the host language compiler, table-based code +generation is also supported. In the future we hope to provide a partitioned, +directly executable format which is able to reduce the burden on the host +compiler by splitting large machines across multiple functions. + +In the case of Java and Ruby, table-based code generation is the only code +style supported. In the future this may be expanded to include other code +styles. + +Ragel can be used to parse input in one block, or it can be used to parse input +in a sequence of blocks as it arrives from a file or socket. Parsing the input +in a sequence of blocks brings with it a few responsibilities. If the parser +utilizes a scanner, care must be taken to not break the input stream anywhere +but token boundaries. If pointers to the input stream are taken during +parsing, care must be taken to not use a pointer which has been invalidated by +movement to a subsequent block. If the current input data pointer is moved +backwards it must not be moved past the beginning of the current block. + \section{Alphtype Statement} \begin{verbatim} @@ -2492,6 +2541,18 @@ This is useful if a machine is to be encapsulated inside a structure in C code. The access statement can be used to give the name of a pointer to the structure. +\section{Variable Statement} + +\begin{verbatim} +variable p fsm->p; +\end{verbatim} +\verbspace + +The variable statement allows one to tell ragel how to access a specific +variable. All of the variables which are declared by the user and +used by Ragel can be changed. This includes \verb|p|, \verb|pe|, \verb|cs|, +\verb|top|, \verb|stack|, \verb|tokstart|, \verb|tokend| and \verb|act|. 
+ \section{Write Statement} \label{write-statement} @@ -2544,11 +2605,12 @@ state. Data generation has several options: \begin{itemize} -\item \verb|noerror| - Do not generate the integer variable that gives the +\setlength{\itemsep}{-2mm} +\item \verb|noerror | - Do not generate the integer variable that gives the id of the error state. -\item \verb|nofinal| - Do not generate the integer variable that gives the +\item \verb|nofinal | - Do not generate the integer variable that gives the id of the first final state. -\item \verb|noprefix| - Do not prefix the variable names with the name of the +\item \verb|noprefix | - Do not prefix the variable names with the name of the machine. \end{itemize} @@ -2635,6 +2697,27 @@ This write statement is only relevant if EOF actions have been embedded, otherwise it does not generate anything. The EOF action code requires access to the current state. +\subsection{Write Exports} +\label{export} + +\begin{verbatim} +write exports; +\end{verbatim} +\verbspace + +The export feature can be used to export simple machine definitions. Machine definitions +are marked for export using the \verb|export| keyword. + +\verbspace +\begin{verbatim} +export machine_to_export = 0x44; +\end{verbatim} +\verbspace + +When the write exports statement is used these machines are +written out in the generated code. Defines are used for C and constant integers +are used for D, Java and Ruby. + \section{Maintaining Pointers to Input Data} In the creation of any parser it is not uncommon to require the collection of @@ -2715,48 +2798,38 @@ An example of line-oriented processing is given in Figure \ref{line-oriented}. \section{Running the Executables} -Ragel is broken down into two executables: a frontend which compiles machines +Ragel is broken down into two parts: a frontend which compiles machines and emits them in an XML format, and a backend which generates code or a Graphviz Dot file from the XML data. 
The purpose of the XML-based intermediate format is to allow users to inspect their compiled state machines and to interface Ragel to other tools such as custom visualizers, code generators or -analysis tools. The intermediate format will provide a better platform for -extending Ragel to support new host languages. The split also serves to reduce -complexity of the Ragel program by strictly separating the data structures and -algorithms that are used to compile machines from those that are used to -generate code. +analysis tools. The split also serves to reduce the complexity of the Ragel +program by strictly separating the data structures and algorithms that are used +to compile machines from those that are used to generate code. -\verbspace -\begin{verbatim} -[user@host] myproj: ragel file.rl | rlgen-cd -G2 -o file.c -\end{verbatim} +\vspace{10pt} -\section{Choosing a Generated Code Style} -\label{genout} +\noindent The frontend program is called \verb|ragel|. It takes as an argument the host +language. This can be: -The Ragel code generator is very flexible. Following the lead of Re2C, the -generated code has no dependencies and can be inserted in any function, perhaps -inside a loop if so desired. The user is responsible for declaring and -initializing a number of required variables, including the current state and -the pointer to the input stream. The user may break out of the processing loop -and return to it at any time. +\begin{itemize} +\item \verb|-C | for C/C++/Objective-C code (default) +\item \verb|-D | for D code. +\item \verb|-J | for Java code. +\item \verb|-R | for Ruby code. +\end{itemize} -Ragel is able to generate very fast-running code that implements state machines -as directly executable code. Since very large files strain the host language -compiler, table-based code generation is also supported. 
In the future we hope -to provide a partitioned, directly executable format which is able to reduce the -burden on the host compiler by splitting large machines across multiple functions. +\noindent There are four code backend programs. These are: -Ragel can be used to parse input in one block, or it can be used to parse input -in a sequence of blocks as it arrives from a file or socket. Parsing the -input in a sequence of blocks brings with it a few responsibilities. If the parser -utilizes a scanner, care must be taken to not break the input stream anywhere -but token boundaries. If pointers to the input stream are taken during parsing, -care must be taken to not use a pointer which has been invalidated by movement -to a subsequent block. -If the current input data pointer is moved backwards it must not be moved -past the beginning of the current block. -Strategies for handling these scenarios are given in Ragel's manual. +\begin{itemize} +\item \verb|rlgen-cd | generate code for the C-based and D languages. +\item \verb|rlgen-java | generate code for the Java language. +\item \verb|rlgen-ruby | generate code for the Ruby language. +\item \verb|rlgen-dot | generate a Graphviz Dot file. +\end{itemize} + +\section{Choosing a Generated Code Style (C/D only)} +\label{genout} There are three styles of code output to choose from. Code style affects the size and speed of the compiled binary. Changing code style does not require any -- 2.7.4
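A minimal sketch of the two-stage invocation that the revised ``Running the Executables'' section describes, written in the guide's own verbatim style. The file names \verb|file.rl| and \verb|file.c| and the \verb|-G2| and \verb|-o| options are taken from the example this patch removes. Spelling out \verb|-C| (which the patch states is the default) is an assumption about how the pieces combine, not something the patch itself shows.

```latex
% Frontend (ragel) compiles the machine and emits XML on stdout;
% the C/D backend (rlgen-cd) reads it and writes the generated code.
\verbspace
\begin{verbatim}
[user@host] myproj: ragel -C file.rl | rlgen-cd -G2 -o file.c
\end{verbatim}
\verbspace
```

For another host language the frontend flag and the backend program would presumably be swapped together, for example \verb|-J| with \verb|rlgen-java|. Since the patch says Java and Ruby support only table-based code, a code style option such as \verb|-G2| would likely not apply there.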