demos/expr/README

   1 Copyright 2001, 2004 Free Software Foundation, Inc.
   2
   3 This file is part of the GNU MP Library.
   4
   5 The GNU MP Library is free software; you can redistribute it and/or modify
   6 it under the terms of the GNU Lesser General Public License as published by
   7 the Free Software Foundation; either version 3 of the License, or (at your
   8 option) any later version.
   9
  10 The GNU MP Library is distributed in the hope that it will be useful, but
  11 WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
  12 or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
  13 License for more details.
  14
  15 You should have received a copy of the GNU Lesser General Public License
  16 along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.
  17
  18
  19
  20
  21
  22
  23                     GMP EXPRESSION EVALUATION
  24                     -------------------------
  25
  26
  27
  28 THIS CODE IS PRELIMINARY AND MAY BE SUBJECT TO INCOMPATIBLE CHANGES IN
  29 FUTURE VERSIONS OF GMP.
  30
  31
  32
  33 The files in this directory implement a simple scheme of string based
  34 expression parsing and evaluation, supporting mpz, mpq and mpf.
  35
  36 This will be slower than direct GMP library calls, but may be convenient in
  37 various circumstances, such as while prototyping, or for letting a user
  38 enter values in symbolic form.  "2**5723-7" for example is a lot easier to
  39 enter or maintain than the equivalent written out in decimal.
  40
  41
  42
  43 BUILDING
  44
  45 Nothing in this directory is a normal part of libgmp, and nothing is built
  46 or installed, but various Makefile rules are available to compile
  47 everything.
  48
  49 All the functions are available through a little library (there's no shared
  50 library since upward binary compatibility is not guaranteed).
  51
  52         make libexpr.a
  53
  54 In a program, prototypes are available using
  55
  56         #include "expr.h"
  57
  58 run-expr.c is a sample program doing evaluations from the command line.
  59
  60         make run-expr
  61         ./run-expr '1+2*3'
  62
  63 t-expr.c is self-test program, it prints nothing if successful.
  64
  65         make t-expr
  66         ./t-expr
  67
  68 The expr*.c sources don't depend on gmp-impl.h and can be compiled with just
  69 a standard installed GMP.  This isn't true of t-expr though, since it uses
  70 some of the internal tests/libtests.la.
  71
  72
  73
  74 SIMPLE USAGE
  75
  76 int mpz_expr (mpz_t res, int base, const char *e, ...);
  77 int mpq_expr (mpq_t res, int base, const char *e, ...);
  78 int mpf_expr (mpf_t res, int base, const char *e, ...);
  79
  80 These functions evaluate simple arithmetic expressions.  For example,
  81
  82         mpz_expr (result, 0, "123+456", NULL);
  83
  84 Numbers are parsed by mpz_expr and mpq_expr the same as mpz_set_str with the
  85 given base.  mpf_expr follows mpf_set_str, but supporting an "0x" prefix for
  86 hex when base==0.
  87
  88         mpz_expr (result, 0, "0xAAAA * 0x5555", NULL);
  89
  90 White space, as indicated by <ctype.h> isspace(), is ignored except for the
  91 purpose of separating tokens.
  92
  93 Variables can be included in expressions by putting them in the varargs list
  94 after the string.  "a", "b", "c" etc in the expression string designate
  95 those values.  For example,
  96
  97         mpq_t  foo, bar;
  98         ...
  99         mpq_expr (q, 10, "2/3 + 1/a + b/2", foo, bar, NULL);
 100
 101 Here "a" will be the value from foo and "b" from bar.  Up to 26 variables
 102 can be included this way.  The NULL must be present to indicate the end of
 103 the list.
 104
 105 Variables can also be written "$a", "$b" etc.  This is necessary when using
 106 bases greater than 10 since plain "a", "b" etc will otherwise be interpreted
 107 as numbers.  For example,
 108
 109         mpf_t  quux;
 110         mpf_expr (f, 16, "F00F@-6 * $a", quux, NULL);
 111
 112 All the standard C operators are available, with the usual precedences, plus
 113 "**" for exponentiation at the highest precedence (and right associative).
 114
 115         Operators      Precedence
 116          **              220
 117          ~ ! - (unary)   210
 118          * / %           200
 119          + -             190
 120          << >>           180
 121          <= < >= >       170
 122          == !=           160
 123          &               150
 124          ^               140
 125          |               130
 126          &&              120
 127          ||              110
 128          ? :             100/101
 129
 130 Currently only mpz_expr has the bitwise ~ % & ^ and | operators.  The
 131 precedence numbers are of interest in the advanced usage described below.
 132
 133 Various functions are available too.  For example,
 134
 135         mpz_expr (res, 10, "gcd(123,456,789) * abs(a)", var, NULL);
 136
 137 The following is the full set of functions,
 138
 139         mpz_expr
 140             abs bin clrbit cmp cmpabs congruent_p divisible_p even_p fib fac
 141             gcd hamdist invert jacobi kronecker lcm lucnum max min nextprime
 142             odd_p perfect_power_p perfect_square_p popcount powm
 143             probab_prime_p root scan0 scan1 setbit sgn sqrt
 144
 145         mpq_expr
 146             abs, cmp, den, max, min, num, sgn
 147
 148         mpf_expr
 149             abs, ceil, cmp, eq, floor, integer_p, max, min, reldiff, sgn,
 150             sqrt, trunc
 151
 152 All these are the same as the GMP library functions, except that min and max
 153 don't exist in the library.  Note also that min, max, gcd and lcm take any
 154 number of arguments, not just two.
 155
 156 mpf_expr does all calculations to the precision of the destination variable.
 157
 158
 159 Expression parsing can succeed or fail.  The return value indicates this,
 160 and will be one of the following
 161
 162         MPEXPR_RESULT_OK
 163         MPEXPR_RESULT_BAD_VARIABLE
 164         MPEXPR_RESULT_BAD_TABLE
 165         MPEXPR_RESULT_PARSE_ERROR
 166         MPEXPR_RESULT_NOT_UI
 167
 168 BAD_VARIABLE is when a variable is referenced that hasn't been provided.
 169 For example if "c" is used when only two parameters have been passed.
 170 BAD_TABLE is applicable to the advanced usage described below.
 171
 172 PARSE_ERROR is a general syntax error, returned for any mal-formed input
 173 string.
 174
 175 NOT_UI is returned when an attempt is made to use an operand that's bigger
 176 than an "unsigned long" with a function that's restricted to that range.
 177 For example "fib" is mpz_fib_ui and only accepts an "unsigned long".
 178
 179
 180
 181
 182 ADVANCED USAGE
 183
 184 int mpz_expr_a (const struct mpexpr_operator_t *table,
 185                 mpz_ptr res, int base, const char *e, size_t elen,
 186                 mpz_srcptr var[26])
 187 int mpq_expr_a (const struct mpexpr_operator_t *table,
 188                 mpq_ptr res, int base, const char *e, size_t elen,
 189                 mpq_srcptr var[26])
 190 int mpf_expr_a (const struct mpexpr_operator_t *table,
 191                 mpf_ptr res, int base, unsigned long prec,
 192                 const char *e, size_t elen,
 193                 mpf_srcptr var[26])
 194
 195 These functions are an advanced interface to expression parsing.
 196
 197 The string is taken as pointer and length.  This makes it possible to parse
 198 an expression in the middle of somewhere without copying and null
 199 terminating it.
 200
 201 Variables are an array of 26 pointers to the appropriate operands, or NULL
 202 for variables that are not available.  Any combination of variables can be
 203 given, for example just "x" and "y" (var[23] and var[24]) could be set.
 204
 205 Operators and functions are specified with a table.  This makes it possible
 206 to provide additional operators or functions, or to completely change the
 207 syntax.  The standard tables used by the simple functions above are
 208 available as
 209
 210         const struct mpexpr_operator_t * const mpz_expr_standard_table;
 211         const struct mpexpr_operator_t * const mpq_expr_standard_table;
 212         const struct mpexpr_operator_t * const mpf_expr_standard_table;
 213
 214 struct mpexpr_operator_t is the following
 215
 216         struct mpexpr_operator_t {
 217           const char    *name;
 218           mpexpr_fun_t  fun;
 219           int           type;
 220           int           precedence;
 221         };
 222
 223         typedef void (*mpexpr_fun_t) (void);
 224
 225 As an example, the standard mpz_expr table entry for multiplication is as
 226 follows.  See the source code for the full set of standard entries.
 227
 228         { "*", (mpexpr_fun_t) mpz_mul, MPEXPR_TYPE_BINARY, 200 },
 229
 230 "name" is the string to parse, "fun" is the function to call for it, "type"
 231 indicates what parameters the function takes (among other things), and
 232 "precedence" sets its operator precedence.
 233
 234 A NULL for "name" indicates the end of the table, so for example an mpf
 235 table with nothing but addition could be
 236
 237         struct mpexpr_operator_t  table[] = {
 238           { "+", (mpexpr_fun_t) mpf_add, MPEXPR_TYPE_BINARY, 190 },
 239           { NULL }
 240         };
 241
 242 A special type MPEXPR_TYPE_NEW_TABLE makes it possible to chain from one
 243 table to another.  For example the following would add a "mod" operator to
 244 the standard mpz table,
 245
 246         struct mpexpr_operator_t  table[] = {
 247         { "mod", (mpexpr_fun_t) mpz_fdiv_r, MPEXPR_TYPE_BINARY, 125 },
 248         { (const char *) mpz_expr_standard_table, NULL, MPEXPR_TYPE_NEW_TABLE }
 249         };
 250
 251 Notice the low precedence on "mod", so that for instance "45+26 mod 7"
 252 parses as "(45+26)mod7".
 253
 254
 255 Functions are designated by a precedence of 0.  They always occur as
 256 "foo(expr)" and so have no need for a precedence level.  mpq_abs in the
 257 standard mpq table is
 258
 259         { "abs", (mpexpr_fun_t) mpq_abs, MPEXPR_TYPE_UNARY },
 260
 261 Functions expecting no arguments as in "foo()" can be given with
 262 MPEXPR_TYPE_0ARY, or actual constants to be parsed as just "foo" are
 263 MPEXPR_TYPE_CONSTANT.  For example if a "void mpf_const_pi(mpf_t f)"
 264 function existed (which it doesn't) it could be,
 265
 266         { "pi", (mpexpr_fun_t) mpf_const_pi, MPEXPR_TYPE_CONSTANT },
 267
 268
 269 Parsing of operator names is done by seeking the table entry with the
 270 longest matching name.  So for instance operators "<" and "<=" exist, and
 271 when presented with "x <= y" the parser matches "<=" because it's longer.
 272
 273 Parsing of function names, on the other hand, is done by requiring a whole
 274 alphanumeric word to match.  For example presented with "fib2zz(5)" the
 275 parser will attempt to find a function called "fib2zz".  A function "fib"
 276 wouldn't be used because it doesn't match the whole word.
 277
 278 The flag MPEXPR_TYPE_WHOLEWORD can be ORed into an operator type to override
 279 the default parsing style.  Similarly MPEXPR_TYPE_OPERATOR into a function.
 280
 281
 282 Binary operators are left associative by default, meaning they're evaluated
 283 from left to right, so for example "1+2+3" is treated as "(1+2)+3".
 284 MPEXPR_TYPE_RIGHTASSOC can be ORed into the operator type to work from right
 285 to left as in "1+(2+3)".  This is generally what's wanted for
 286 exponentiation, and for example the standard mpz table has
 287
 288         { "**", (mpexpr_fun_t) mpz_pow_ui,
 289           MPEXPR_TYPE_BINARY_UI | MPEXPR_TYPE_RIGHTASSOC, 220 }
 290
 291 Unary operators are postfix by default.  For example a factorial to be used
 292 as "123!" might be
 293
 294         { "!", (mpexpr_fun_t) mpz_fac_ui, MPEXPR_TYPE_UNARY_UI, 215 }
 295
 296 MPEXPR_TYPE_PREFIX can be ORed into the type to get a prefix operator.  For
 297 instance negation (unary minus) in the standard mpf table is
 298
 299         { "-", (mpexpr_fun_t) mpf_neg,
 300           MPEXPR_TYPE_UNARY | MPEXPR_TYPE_PREFIX, 210 },
 301
 302
 303 The same operator can exist as a prefix unary and a binary, or as a prefix
 304 and postfix unary, simply by putting two entries in the table.  While
 305 parsing the context determines which style is sought.  But note that the
 306 same operator can't be both a postfix unary and a binary, since the parser
 307 doesn't try to look ahead to decide which ought to be used.
 308
 309 When there's two entries for an operator, both prefix or both postfix (or
 310 binary), then the first in the table will be used.  This makes it possible
 311 to override an entry in a standard table, for example to change the function
 312 it calls, or perhaps its precedence level.  The following would change mpz
 313 division from tdiv to cdiv,
 314
 315         struct mpexpr_operator_t  table[] = {
 316           { "/", (mpexpr_fun_t) mpz_cdiv_q, MPEXPR_TYPE_BINARY, 200 },
 317           { "%", (mpexpr_fun_t) mpz_cdiv_r, MPEXPR_TYPE_BINARY, 200 },
 318           { (char *) mpz_expr_standard_table, NULL, MPEXPR_TYPE_NEW_TABLE }
 319         };
 320
 321
 322 The type field indicates what parameters the given function expects.  The
 323 following styles of functions are supported.  mpz_t is shown, but of course
 324 this is mpq_t for mpq_expr_a, mpf_t for mpf_expr_a, etc.
 325
 326     MPEXPR_TYPE_CONSTANT     void func (mpz_t result);
 327
 328     MPEXPR_TYPE_0ARY         void func (mpz_t result);
 329     MPEXPR_TYPE_I_0ARY       int func (void);
 330
 331     MPEXPR_TYPE_UNARY        void func (mpz_t result, mpz_t op);
 332     MPEXPR_TYPE_UNARY_UI     void func (mpz_t result, unsigned long op);
 333     MPEXPR_TYPE_I_UNARY      int func (mpz_t op);
 334     MPEXPR_TYPE_I_UNARY_UI   int func (unsigned long op);
 335
 336     MPEXPR_TYPE_BINARY       void func (mpz_t result, mpz_t op1, mpz_t op2);
 337     MPEXPR_TYPE_BINARY_UI    void func (mpz_t result,
 338                                         mpz_t op1, unsigned long op2);
 339     MPEXPR_TYPE_I_BINARY     int func (mpz_t op1, mpz_t op2);
 340     MPEXPR_TYPE_I_BINARY_UI  int func (mpz_t op1, unsigned long op2);
 341
 342     MPEXPR_TYPE_TERNARY      void func (mpz_t result,
 343                                         mpz_t op1, mpz_t op2, mpz_t op3);
 344     MPEXPR_TYPE_TERNARY_UI   void func (mpz_t result, mpz_t op1, mpz_t op2,
 345                                         unsigned long op3);
 346     MPEXPR_TYPE_I_TERNARY    int func (mpz_t op1, mpz_t op2, mpz_t op3);
 347     MPEXPR_TYPE_I_TERNARY_UI int func (mpz_t op1, mpz_t op2,
 348                                        unsigned long op3);
 349
 350 Notice the pattern of "UI" for the last parameter as an unsigned long, or
 351 "I" for the result as an "int" return value.
 352
 353 It's important that the declared type for an operator or function matches
 354 the function pointer given.  Any mismatch will have unpredictable results.
 355
 356 For binary functions, a further type attribute is MPEXPR_TYPE_PAIRWISE which
 357 indicates that any number of arguments should be accepted, and evaluated by
 358 applying the given binary function to them pairwise.  This is used by gcd,
 359 lcm, min and max.  For example the standard mpz gcd is
 360
 361         { "gcd", (mpexpr_fun_t) mpz_gcd,
 362           MPEXPR_TYPE_BINARY | MPEXPR_TYPE_PAIRWISE },
 363
 364 Some special types exist for comparison operators (or functions).
 365 MPEXPR_TYPE_CMP_LT through MPEXPR_TYPE_CMP_GE expect an MPEXPR_TYPE_I_BINARY
 366 function, returning positive, negative or zero like mpz_cmp and similar.
 367 For example the standard mpf "!=" operator is
 368
 369         { "!=", (mpexpr_fun_t) mpf_cmp, MPEXPR_TYPE_CMP_NE, 160 },
 370
 371 But there's no obligation to use these types, for instance the standard mpq
 372 table just uses a plain MPEXPR_TYPE_I_BINARY and mpq_equal for "==".
 373
 374 Further special types MPEXPR_TYPE_MIN and MPEXPR_TYPE_MAX exist to implement
 375 the min and max functions, and they take a function like mpf_cmp similarly.
 376 The standard mpf max function is
 377
 378         { "max",  (mpexpr_fun_t) mpf_cmp,
 379           MPEXPR_TYPE_MAX | MPEXPR_TYPE_PAIRWISE },
 380
 381 These can be used as operators too, for instance the following would be the
 382 >? operator which is a feature of GNU C++,
 383
 384         { ">?", (mpexpr_fun_t) mpf_cmp, MPEXPR_TYPE_MAX, 175 },
 385
 386 Other special types are used to define "(" ")" parentheses, "," function
 387 argument separator, "!" through "||" logical booleans, ternary "?"  ":", and
 388 the "$" which introduces variables.  See the sources for how they should be
 389 used.
 390
 391
 392 User definable operator tables will have various uses.  For example,
 393
 394   - a subset of the C operators, to be rid of infrequently used things
 395   - a more mathematical syntax like "." for multiply, "^" for powering,
 396     and "!" for factorial
 397   - a boolean evaluator with "^" for AND, "v" for OR
 398   - variables introduced with "%" instead of "$"
 399   - brackets as "[" and "]" instead of "(" and ")"
 400
 401 The only fixed parts of the parsing are the treatment of numbers, whitespace
 402 and the two styles of operator/function name recognition.
 403
 404 As a final example, the following would be a complete mpz table implementing
 405 some operators with a more mathematical syntax.  Notice there's no need to
 406 preserve the standard precedence values, anything can be used so long as
 407 they're in the desired relation to each other.  There's also no need to have
 408 entries in precedence order, but it's convenient to do so to show what comes
 409 where.
 410
 411         static const struct mpexpr_operator_t  table[] = {
 412           { "^",   (mpexpr_fun_t) mpz_pow_ui,
 413             MPEXPR_TYPE_BINARY_UI | MPEXPR_TYPE_RIGHTASSOC,           9 },
 414
 415           { "!",   (mpexpr_fun_t) mpz_fac_ui, MPEXPR_TYPE_UNARY_UI,   8 },
 416           { "-",   (mpexpr_fun_t) mpz_neg,
 417             MPEXPR_TYPE_UNARY | MPEXPR_TYPE_PREFIX,                   7 },
 418
 419           { "*",   (mpexpr_fun_t) mpz_mul,    MPEXPR_TYPE_BINARY,     6 },
 420           { "/",   (mpexpr_fun_t) mpz_fdiv_q, MPEXPR_TYPE_BINARY,     6 },
 421
 422           { "+",   (mpexpr_fun_t) mpz_add,    MPEXPR_TYPE_BINARY,     5 },
 423           { "-",   (mpexpr_fun_t) mpz_sub,    MPEXPR_TYPE_BINARY,     5 },
 424
 425           { "mod", (mpexpr_fun_t) mpz_mod,    MPEXPR_TYPE_BINARY,     6 },
 426
 427           { ")",   NULL,                      MPEXPR_TYPE_CLOSEPAREN, 4 },
 428           { "(",   NULL,                      MPEXPR_TYPE_OPENPAREN,  3 },
 429           { ",",   NULL,                      MPEXPR_TYPE_ARGSEP,     2 },
 430
 431           { "$",   NULL,                      MPEXPR_TYPE_VARIABLE,   1 },
 432           { NULL }
 433         };
 434
 435
 436
 437
 438 INTERNALS
 439
 440 Operator precedence is implemented using a control and data stack, there's
 441 no C recursion.  When an expression like 1+2*3 is read the "+" is held on
 442 the control stack and 1 on the data stack until "*" has been parsed and
 443 applied to 2 and 3.  This happens any time a higher precedence operator
 444 follows a lower one, or when a right-associative operator like "**" is
 445 repeated.
 446
 447 Parentheses are handled by making "(" a special prefix unary with a low
 448 precedence so a whole following expression is read.  The special operator
 449 ")" knows to discard the pending "(".  Function arguments are handled
 450 similarly, with the function pretending to be a low precedence prefix unary
 451 operator, and with "," allowed within functions.  The same special ")"
 452 operator recognises a pending function and will invoke it appropriately.
 453
 454 The ternary "? :" operator is also handled using precedences.  ":" is one
 455 level higher than "?", so when a valid a?b:c is parsed the ":" finds a "?"
 456 on the control stack.  It's a parse error for ":" to find anything else.
 457
 458
 459
 460 FUTURE
 461
 462 The ternary "?:" operator evaluates the "false" side of its pair, which is
 463 wasteful, though it ought to be harmless.  It'd be better if it could
 464 evaluate only the "true" side.  Similarly for the logical booleans "&&" and
 465 "||" if they know their result already.
 466
 467 Functions like MPEXPR_TYPE_BINARY could return a status indicating operand
 468 out of range or whatever, to get an error back through mpz_expr etc.  That
 469 would want to be just an option, since plain mpz_add etc have no such
 470 return.
 471
 472 Could have assignments like "a = b*c" modifying the input variables.
 473 Assignment could be an operator attribute, making it expect an lvalue.
 474 There would want to be a standard table without assignments available
 475 though, so user input could be safely parsed.
 476
 477 The closing parenthesis table entry could specify the type of open paren it
 478 expects, so that "(" and ")" could match and "[" and "]" match but not a
 479 mixture of the two.  Currently "[" and "]" can be added, but there's no
 480 error on writing a mixed expression like "2*(3+4]".  Maybe also there could
 481 be a way to say that functions can only be written with one or the other
 482 style of parens.
 483
 484
 485
 486 ----------------
 487 Local variables:
 488 mode: text
 489 fill-column: 76
 490 End: