From 62c9af7ce9567d11450f5146d7148d5e88f9dec8 Mon Sep 17 00:00:00 2001
From: thurston
Date: Wed, 20 Jun 2007 23:14:08 +0000
Subject: [PATCH] Updates to "machine instantiation", "write init" and "write
 exports" sections. Added the "variables used by ragel" section.

git-svn-id: http://svn.complang.org/ragel/trunk@253 052ea7fc-9027-0410-9066-f65837a77df0
---
 doc/ragel-guide.tex | 127 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 91 insertions(+), 36 deletions(-)

diff --git a/doc/ragel-guide.tex b/doc/ragel-guide.tex
index e63ae9c..7dc0e4a 100644
--- a/doc/ragel-guide.tex
+++ b/doc/ragel-guide.tex
@@ -523,6 +523,42 @@ multiple machine specifications. This allows one to break up a machine across
 several files or draw in statements that are common to multiple machines using
 the \verb|include| statement.
 
+\subsection{Machine Definition}
+\label{definition}
+
+\begin{verbatim}
+<name> = <expression>;
+\end{verbatim}
+\verbspace
+
+The machine definition statement associates an FSM expression with a name. Machine
+expressions assigned to names can later be referenced by other expressions. A
+definition statement on its own does not cause any states to be generated. It is simply a
+description of a machine to be used later. States are generated only when a definition is
+instantiated, which happens when a definition is referenced in an instantiated
+expression.
+
+\subsection{Machine Instantiation}
+\label{instantiation}
+
+\begin{verbatim}
+<name> := <expression>;
+\end{verbatim}
+\verbspace
+
+The machine instantiation statement generates a set of states representing an
+expression. Each instantiation generates a distinct set of states. The entry
+point is written in the generated code using the instantiation name. If the
+\verb|main| machine is instantiated, its start state is used as the
+specification's start state and is assigned to the \verb|cs| variable by the
+\verb|write init| command. If no \verb|main| machine is given, the start state
+of the last machine instantiation is used as the specification's start state.
+
+From outside the execution loop, control may be passed to any machine by
+assigning the entry point to the \verb|cs| variable. From inside the execution
+loop, control may be passed to any machine instantiation using \verb|fcall|,
+\verb|fgoto| or \verb|fnext| statements.
+
 \subsection{Including Ragel Code}
 
 \begin{verbatim}
@@ -538,6 +574,7 @@ current file is searched for a machine of the given name. If both are present,
 the given input file is searched for a machine of the given name.
 
 \subsection{Importing Definitions}
+\label{import}
 
 \begin{verbatim}
 import "inputfile.h";
@@ -558,38 +595,6 @@ exporting machine definitions.
 \item \verb|"define" name lit_string|
 \end{itemize}
 
-\subsection{Machine Definition}
-\label{definition}
-
-\begin{verbatim}
-<name> = <expression>;
-\end{verbatim}
-\verbspace
-
-The machine definition statement associates an FSM expression with a name. Machine
-expressions assigned to names can later be referenced by other expressions. A
-definition statement on its own does not cause any states to be generated. It is simply a
-description of a machine to be used later. States are generated only when a definition is
-instantiated, which happens when a definition is referenced in an instantiated
-expression.
-
-\subsection{Machine Instantiation}
-\label{instantiation}
-
-\begin{verbatim}
-<name> := <expression>;
-\end{verbatim}
-\verbspace
-
-The machine instantiation statement generates a set of states representing an
-expression. Each instantiation generates a distinct set of states. The entry
-point is written in the generated code using the instantiation name. If the
-\verb|main| machine is instantiated, then a start state is also generated and
-assigned to the \verb|cs| variable by the \verb|write init| command. From
-outside the execution loop, control may be passed to any machine by assigning
-the entry point to the \verb|cs| variable. From inside the execution loop,
-control may be passed to any machine instantiation using \verb|fcall|,
-\verb|fgoto| or \verb|fnext| statements.
 
 \section{Lexical Analysis of a Ragel Block}
 \label{lexing}
@@ -2491,6 +2496,52 @@ parsing, care must be taken to not use a pointer which has been invalidated by
 movement to a subsequent block. If the current input data pointer is moved
 backwards it must not be moved past the beginning of the current block.
 
+\section{Variables Used by Ragel}
+
+There are a number of variables which Ragel expects the user to declare. At a
+very minimum the \verb|cs|, \verb|p| and \verb|pe| variables must be declared.
+In Java and Ruby code the \verb|data| variable must also be declared. If
+stack-based state machine control flow statements are used then the
+\verb|stack| and \verb|top| variables are required. If a scanner is declared
+then the \verb|act|, \verb|tokstart| and \verb|tokend| variables must be
+declared.
+
+\begin{itemize}
+
+\item \verb|cs| - Current state. This must be an integer and it should persist
+across invocations of the machine when the data is broken into blocks that are
+processed independently.
+
+\item \verb|p| - Data pointer. In C/D code this variable is expected to be a
+pointer to the character data to process. It should be initialized to the
+beginning of the data block on every run of the machine. In Java and Ruby it is
+used as an offset to \verb|data| and must be an integer. In this case it should
+be initialized to zero on every run of the machine.
+
+\item \verb|pe| - Data end pointer. This should be initialized to \verb|p| plus
+the data length on every run of the machine. In Java and Ruby code this should
+be initialized to the data length.
+
+\item \verb|data| - This variable is only required in Java and Ruby code. It
+must be an array containing the data to process.
+
+\item \verb|stack| - This must be an array of integers. It is used to store
+integer values representing states.
+
+\item \verb|top| - This must be an integer value and will be used as an offset
+to \verb|stack|, giving the next available spot on the top of the stack.
+
+\item \verb|act| - This must be an integer value. It is a variable sometimes
+used by scanner code to keep track of the most recent successful pattern match.
+
+\item \verb|tokstart| - This must be a pointer to character data. In Java and
+Ruby code this must be an integer. See Section \ref{generating-scanners} for
+more information.
+
+\item \verb|tokend| - Also a pointer to character data.
+
+\end{itemize}
+
 \section{Alphtype Statement}
 
 \begin{verbatim}
@@ -2619,9 +2670,12 @@ write init;
 The write init statement causes Ragel to emit initialization code. This
 should be executed once before the machine is started. At a very minimum this
 sets the current state to the start state. If other variables are needed by the
-generated code, such as call
-stack variables or longest-match management variables, they are also
-initialized here.
+generated code, such as call stack variables or scanner management
+variables, they are also initialized here.
+
+The \verb|nocs| option to the write init statement will cause Ragel to skip
+initialization of the \verb|cs| variable. This is useful if the user wishes to use
+custom logic to decide which state the specification should start in.
 
 \subsection{Write Exec}
 \begin{verbatim}
@@ -2712,7 +2766,8 @@ export machine_to_export = 0x44;
 
 When the write exports statement is used these machines are written out in the
 generated code. Defines are used for C and constant integers
-are used for D, Java and Ruby.
+are used for D, Java and Ruby. See Section \ref{import} for a description of the
+import statement.
 
 \section{Maintaining Pointers to Input Data}
 
-- 
2.7.4