Change Log
==========
+Version 3.0.2 -
+---------------
+- Reverted change in behavior with `LineStart` and `StringStart`, which changed the
+ interpretation of when and how `LineStart` and `StringStart` should match when
+ a line starts with spaces. In 3.0.0, the `xxxStart` expressions were not
+ really treated like expressions in their own right, but as modifiers to the
+ following expression when used like `LineStart() + expr`, so that if there
+ were whitespace on the line before `expr` (which would match in versions prior
+ to 3.0.0), the match would fail.
+
+ 3.0.0 implemented this by automatically promoting `LineStart() + expr` to
+ `AtLineStart(expr)`, which broke existing parsers that did not expect `expr` to
+ necessarily be right at the start of the line, but only be the first token
+ found on the line. This was reported as a regression in Issue #317.
+
+ In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new
+ `AtLineStart` and `AtStringStart` expression classes, so that parsers can choose
+ whichever behavior applies in their specific instance. Specifically:
+
+ # matches expr if it is the first token on the line
+ # (allows for leading whitespace)
+ LineStart() + expr
+
+ # matches only if expr is found in column 1
+ AtLineStart(expr)
+
+- Performance enhancement to `one_of` to always generate an internal `Regex`,
+ even if `caseless` or `as_keyword` args are given as `True` (unless explicitly
+ disabled by passing `use_regex=False`).
+
+- `IndentedBlock` class now works with `recursive` flag. By default, the
+ results parsed by an `IndentedBlock` are grouped. This can be disabled by constructing
+ the `IndentedBlock` with `grouped=False`.
+
+
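The `one_of` change noted above (always building a single regex, even for caseless or keyword matching) can be pictured with the standard `re` module. This is a minimal sketch, not pyparsing's implementation: `build_one_of_pattern` and `match_symbol` are hypothetical helpers, and sorting by length is a simplification of the reordering that `one_of` performs to keep short symbols from masking longer ones.

```python
import re


def build_one_of_pattern(symbols, caseless=False, as_keyword=False):
    """Hypothetical helper: build one regex that matches any of the symbols."""
    # Simplification: try longer symbols first so "<=" is not masked by "<".
    ordered = sorted(symbols, key=len, reverse=True)
    if all(len(sym) == 1 for sym in ordered):
        # Only single characters: a character class is enough.
        body = "[{}]".format("".join(re.escape(sym) for sym in ordered))
    else:
        body = "|".join(re.escape(sym) for sym in ordered)
    if as_keyword:
        # Word-break markers around a non-capturing group.
        body = r"\b(?:{})\b".format(body)
    return re.compile(body, re.IGNORECASE if caseless else 0)


def match_symbol(symbols, text, caseless=False, as_keyword=False):
    """Hypothetical helper: return the matched symbol as originally spelled."""
    found = build_one_of_pattern(symbols, caseless, as_keyword).match(text)
    if found is None:
        return None
    if caseless:
        # Map whatever casing appeared in the input back to the given symbol.
        canonical = {sym.lower(): sym for sym in symbols}
        return canonical[found.group().lower()]
    return found.group()


operator = match_symbol(["<", "<=", "=", ">=", ">"], "<= 10")
keyword = match_symbol(["SELECT", "FROM"], "select * from t", caseless=True, as_keyword=True)
not_keyword = match_symbol(["SELECT", "FROM"], "selection", caseless=True, as_keyword=True)
print(operator, keyword, not_keyword)  # <= SELECT None
```

The last call shows why the word-break markers matter: without them, `selection` would match the `SELECT` prefix.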
Version 3.0.1 -
---------------
-- Fixed bug where Word(max=n) did not match word groups less than length 'n'.
+- Fixed bug where `Word(max=n)` did not match word groups less than length 'n'.
Thanks to Joachim Metz for catching this!
-- Fixed bug where ParseResults accidentally created recursive contents.
+- Fixed bug where `ParseResults` accidentally created recursive contents.
Joachim Metz on this one also!
-- Fixed bug where warn_on_multiple_string_args_to_oneof warning is raised
+- Fixed bug where `warn_on_multiple_string_args_to_oneof` warning is raised
even when not enabled.
Version 3.0.0 -
---------------
- A consolidated list of all the changes in the 3.0.0 release can be found in
- docs/whats_new_in_3_0_0.rst.
+ `docs/whats_new_in_3_0_0.rst`.
(https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)
Version 3.0.0.final -
---------------------
-- Added support for python -W warning option to call enable_all_warnings() at startup.
- Also detects setting of PYPARSINGENABLEALLWARNINGS environment variable to any non-blank
- value.
+- Added support for python `-W` warning option to call `enable_all_warnings()` at startup.
+ Also detects setting of `PYPARSINGENABLEALLWARNINGS` environment variable to any non-blank
+ value. (If using `-Wd` for testing, but wishing to disable pyparsing warnings, add
+ `-Wi:::pyparsing`.)
- Fixed named results returned by `url` to match fields as they would be parsed
- using urllib.parse.urlparse.
+ using `urllib.parse.urlparse`.
- Early response to `with_line_numbers` was positive, with some requested enhancements:
. added a trailing "|" at the end of each line (to show presence of trailing spaces);
can be customized using `eol_mark` argument
. added expand_tabs argument, to control calling str.expandtabs (defaults to True
- to match parseString)
+ to match `parseString`)
. added mark_spaces argument to support display of a printing character in place of
spaces, or Unicode symbols for space and tab characters
. added mark_control argument to support highlighting of control characters using
'.' or Unicode symbols, such as "␍" and "␊".
-- Modified helpers common_html_entity and replace_html_entity() to use the HTML
- entity definitions from html.entities.html5.
+- Modified helpers `common_html_entity` and `replace_html_entity()` to use the HTML
+ entity definitions from `html.entities.html5`.
- Updated the class diagram in the pyparsing docs directory, along with the supporting
.puml file (PlantUML markup) used to create the diagram.
- Added new example `cuneiform_python.py` to demonstrate creating a new Unicode
range, and writing a Cuneiform->Python transformer (inspired by zhpy).
-- Fixed issue #272, reported by PhasecoreX, when LineStart() expressions would match
+- Fixed issue #272, reported by PhasecoreX, when `LineStart()` expressions would match
input text that was not necessarily at the beginning of a line.
As part of this fix, two new classes have been added: AtLineStart and AtStringStart.
LineStart() + expr and AtLineStart(expr)
StringStart() + expr and AtStringStart(expr)
-- Fixed ParseFatalExceptions failing to override normal exceptions or expression
- matches in MatchFirst expressions. Addresses issue #251, reported by zyp-rgb.
+ [`LineStart` and `StringStart` changes reverted in 3.0.2.]
+
+- Fixed `ParseFatalExceptions` failing to override normal exceptions or expression
+ matches in `MatchFirst` expressions. Addresses issue #251, reported by zyp-rgb.
-- Fixed bug in which ParseResults replaces a collection type value with an invalid
+- Fixed bug in which `ParseResults` replaces a collection type value with an invalid
type annotation (as a result of changed behavior in Python 3.9). Addresses issue #276, reported by
Rob Shuler, thanks.
-- Fixed bug in ParseResults when calling `__getattr__` for special double-underscored
- methods. Now raises AttributeError for non-existent results when accessing a
+- Fixed bug in `ParseResults` when calling `__getattr__` for special double-underscored
+ methods. Now raises `AttributeError` for non-existent results when accessing a
name starting with '__'. Addresses issue #208, reported by Joachim Metz.
- Modified debug fail messages to include the expression name to make it easier to sync
to be shown vertically; default=3
. optional 'show_results_names' argument, to specify whether results name
annotations should be shown; default=False
- . every expression that gets a name using setName() gets separated out as
+ . every expression that gets a name using `setName()` gets separated out as
a separate subdiagram
. results names can be shown as annotations to diagram items
- . Each, FollowedBy, and PrecededBy elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND]
+ . `Each`, `FollowedBy`, and `PrecededBy` elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND]
annotations
. removed annotations for Suppress elements
. some diagram cleanup when a grammar contains Forward elements
- Fixed bug in Located class when used with a results name. (Issue #294)
-- Fixed bug in QuotedString class when the escaped quote string is not a
+- Fixed bug in `QuotedString` class when the escaped quote string is not a
repeated character. (Issue #263)
-- parseFile() and create_diagram() methods now will accept pathlib.Path
+- `parseFile()` and `create_diagram()` methods now will accept `pathlib.Path`
arguments.
Contributed by Kazantcev Andrey, thanks!
- Removed internal comparison of results values against b"", which
- raised a BytesWarning when run with `python -bb`. Fixes issue #271 reported
+ raised a `BytesWarning` when run with `python -bb`. Fixes issue #271 reported
by Florian Bruhin, thank you!
- Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by
distinctions in working with the different types.
In addition parse actions that must return a value of list type (which would
- normally be converted internally to a ParseResults) can override this default
+ normally be converted internally to a `ParseResults`) can override this default
behavior by returning their list wrapped in the new `ParseResults.List` class:
# this parse action tries to return a list, but pyparsing
(['abc', 'def'], {'qty': 100})
-- Fixed bugs in Each when passed OneOrMore or ZeroOrMore expressions:
+- Fixed bugs in Each when passed `OneOrMore` or `ZeroOrMore` expressions:
. first expression match could be enclosed in an extra nesting level
. out-of-order expressions now handled correctly if mixed with required
expressions
documentation.
- API CHANGE
- Changed result returned when parsing using countedArray,
+ Changed result returned when parsing using `countedArray`,
the array items are no longer returned in a doubly-nested
list.
string ranges if possible. `Word(alphas)` would formerly
print as `W:(ABCD...)`, now prints as `W:(A-Za-z)`.
-- Added ignoreWhitespace(recurse:bool = True) and added a
- recurse argument to leaveWhitespace, both added to provide finer
+- Added `ignoreWhitespace(recurse:bool = True)` and added a
+ recurse argument to `leaveWhitespace`, both added to provide finer
control over pyparsing's whitespace skipping. Also contributed
by Michael Milton.
Also, pyparsing_unicode.Korean was renamed to Hangul (Korean
is also defined as a synonym for compatibility).
-- Enhanced ParseResults dump() to show both results names and list
+- Enhanced `ParseResults` dump() to show both results names and list
subitems. Fixes bug where adding a results name would hide
- lower-level structures in the ParseResults.
+ lower-level structures in the `ParseResults`.
- Added new __diag__ warnings:
mistake when using Forwards)
(**currently not working on PyPy**)
-- Added ParserElement.recurse() method to make it simpler for
+- Added `ParserElement.recurse()` method to make it simpler for
grammar utilities to navigate through the tree of expressions in
a pyparsing grammar.
-- Fixed bug in ParseResults repr() which showed all matching
- entries for a results name, even if listAllMatches was set
- to False when creating the ParseResults originally. Reported
+- Fixed bug in `ParseResults` repr() which showed all matching
+ entries for a results name, even if `listAllMatches` was set
+ to False when creating the `ParseResults` originally. Reported
by Nicholas42 on GitHub, good catch! (Issue #205)
- Modified refactored modules to use relative imports, as
version of Python, you must use a Pyparsing 2.4.x version
Deprecated features removed:
- . ParseResults.asXML() - if used for debugging, switch
- to using ParseResults.dump(); if used for data transfer,
- use ParseResults.asDict() to convert to a nested Python
+ . `ParseResults.asXML()` - if used for debugging, switch
+ to using `ParseResults.dump()`; if used for data transfer,
+ use `ParseResults.asDict()` to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format
- . operatorPrecedence synonym for infixNotation -
- convert to calling infixNotation
+ . `operatorPrecedence` synonym for `infixNotation` -
+ convert to calling `infixNotation`
- . commaSeparatedList - convert to using
+ . `commaSeparatedList` - convert to using
pyparsing_common.comma_separated_list
- . upcaseTokens and downcaseTokens - convert to using
- pyparsing_common.upcaseTokens and downcaseTokens
+ . `upcaseTokens` and `downcaseTokens` - convert to using
+ `pyparsing_common.upcaseTokens` and `downcaseTokens`
. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
- review use of names for MatchFirst and Or expressions
+ review use of names for `MatchFirst` and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use `__diag__.warn_multiple_tokens_in_named_alternation`
- API CHANGE:
The staticmethod `ParseException.explain` has been moved to
`ParseBaseException.explain_exception`, and a new `explain` instance
- method added to ParseBaseException. This will make calls to `explain`
+ method added to `ParseBaseException`. This will make calls to `explain`
much more natural:
try:
print(pe.explain())
- POTENTIAL API CHANGE:
- ZeroOrMore expressions that have results names will now
+ `ZeroOrMore` expressions that have results names will now
include empty lists for their name if no matches are found.
Previously, no named result would be present. Code that tested
for the presence of any expressions using "if name in results:"
will now always return True. This code will need to change to
"if name in results and results[name]:" or just
"if results[name]:". Also, any parser unit tests that check the
- asDict() contents will now see additional entries for parsers
- having named ZeroOrMore expressions, whose values will be `[]`.
+ `asDict()` contents will now see additional entries for parsers
+ having named `ZeroOrMore` expressions, whose values will be `[]`.
- POTENTIAL API CHANGE:
- Fixed a bug in which calls to ParserElement.setDefaultWhitespaceChars
+ Fixed a bug in which calls to `ParserElement.setDefaultWhitespaceChars`
did not change whitespace definitions on any pyparsing built-in
- expressions defined at import time (such as quotedString, or those
+ expressions defined at import time (such as `quotedString`, or those
defined in pyparsing_common). This would lead to confusion when
built-in expressions would not use updated default whitespace
- characters. Now a call to ParserElement.setDefaultWhitespaceChars
+ characters. Now a call to `ParserElement.setDefaultWhitespaceChars`
will also go and update all pyparsing built-ins to use the new
default whitespace characters. (Note that this will only modify
expressions defined within the pyparsing module.) Prompted by
pp.__diag__.enable_all_warnings()
- added new warning, "warn_on_match_first_with_lshift_operator" to
- warn when using '<<' with a '|' MatchFirst operator, which will
+ warn when using '<<' with a '|' `MatchFirst` operator, which will
create an unintended expression due to precedence of operations.
Example: This statement will erroneously define the `fwd` expression
or
fwd << (expr_a | expr_b)
-- Cleaned up default tracebacks when getting a ParseException when calling
- parseString. Exception traces should now stop at the call in parseString,
+- Cleaned up default tracebacks when getting a `ParseException` when calling
+ `parseString`. Exception traces should now stop at the call in `parseString`,
and not include the internal traceback frames. (If the full traceback
- is desired, then set ParserElement.verbose_traceback to True.)
+ is desired, then set `ParserElement.verbose_traceback` to True.)
-- Fixed FutureWarnings that sometimes are raised when '[' passed as a
+- Fixed `FutureWarnings` that sometimes are raised when '[' passed as a
character to Word.
- New namespace, assert methods and classes added to support writing
unit tests.
- - assertParseResultsEquals
- - assertParseAndCheckList
- - assertParseAndCheckDict
- - assertRunTestResults
- - assertRaisesParseException
- - reset_pyparsing_context context manager, to restore pyparsing
+ - `assertParseResultsEquals`
+ - `assertParseAndCheckList`
+ - `assertParseAndCheckDict`
+ - `assertRunTestResults`
+ - `assertRaisesParseException`
+ - `reset_pyparsing_context` context manager, to restore pyparsing
config settings
- Enhanced error messages and error locations when parsing fails on
- the Keyword or CaselessKeyword classes due to the presence of a
+ the Keyword or `CaselessKeyword` classes due to the presence of a
preceding or trailing keyword character. Surfaced while
working with metaperl on issue #201.
Inspired by PR submitted by bjrnfrdnnd on GitHub, very nice!
-- Fixed handling of ParseSyntaxExceptions raised as part of Each
+- Fixed handling of `ParseSyntaxExceptions` raised as part of Each
expressions, when sub-expressions contain '-' backtrack
suppression. As part of resolution to a question posted by John
Greene on StackOverflow.
- Improvements in select_parser.py, to include new SQL syntax
from SQLite. PR submitted by Robert Coup, nice work!
-- Fixed bug in PrecededBy which caused infinite recursion, issue #127
+- Fixed bug in `PrecededBy` which caused infinite recursion, issue #127
submitted by EdwardJB.
-- Fixed bug in CloseMatch where end location was incorrectly
+- Fixed bug in `CloseMatch` where end location was incorrectly
computed; and updated partial_gene_match.py example.
-- Fixed bug in indentedBlock with a parser using two different
+- Fixed bug in `indentedBlock` with a parser using two different
types of nested indented blocks with different indent values,
but sharing the same indent stack, submitted by renzbagaporo.
- Fixed bug in Each when using Regex, when Regex expression would
get parsed twice; issue #183 submitted by scauligi, thanks!
-- BigQueryViewParser.py added to examples directory, PR submitted
+- `BigQueryViewParser.py` added to examples directory, PR submitted
by Michael Smedberg, nice work!
- booleansearchparser.py added to examples directory, PR submitted
- Fixed bug in regex definitions for real and sci_real expressions in
pyparsing_common. Issue #194, reported by Michael Wayne Goodman, thanks!
-- Fixed FutureWarning raised beginning in Python 3.7 for Regex expressions
+- Fixed `FutureWarning` raised beginning in Python 3.7 for Regex expressions
containing '[' within a regex set.
-- Minor reformatting of output from runTests to make embedded
+- Minor reformatting of output from `runTests` to make embedded
comments more visible.
- And finally, many thanks to those who helped in the restructuring
Metadata-Version: 2.1
Name: pyparsing
-Version: 3.0.1
+Version: 3.0.2
Summary: Python parsing module
Home-page: https://github.com/pyparsing/pyparsing/
Author: Paul McGuire
on Forward expression that has no contained expression
Warnings can also be enabled using the Python ``-W`` switch, or setting a non-empty
- value to the environment variable ``PYPARSINGENABLEALLWARNINGS``
+ value to the environment variable ``PYPARSINGENABLEALLWARNINGS``. (If using ``-Wd`` for
+ testing, but wishing to disable pyparsing warnings, add ``-Wi:::pyparsing``.)
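The effect of combining ``-Wd`` with ``-Wi:::pyparsing`` can be approximated in code with the standard ``warnings`` module. This is only a sketch of how the two filters interact. The warnings are synthesized with ``warn_explicit``, the module name ``app`` is made up for illustration, and the assumption is that the warnings to be silenced are attributed to a module named exactly ``pyparsing``.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    # Rough equivalent of -Wd: show warnings once per location.
    warnings.simplefilter("default")
    # Rough equivalent of -Wi:::pyparsing: ignore warnings whose module
    # name is exactly "pyparsing" (this filter is checked first).
    warnings.filterwarnings("ignore", module=r"pyparsing\Z")

    # Synthetic warnings, attributed to two different modules.
    warnings.warn_explicit(
        "from pyparsing", UserWarning, "pyparsing.py", 1, module="pyparsing"
    )
    warnings.warn_explicit("from app", UserWarning, "app.py", 1, module="app")

messages = [str(w.message) for w in caught]
print(messages)  # ['from app']
```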
Miscellaneous attributes and methods
then pass ``None`` for this argument.
-- ``IndentedBlock(statement_expr, recursive=True)`` -
+- ``IndentedBlock(statement_expr, recursive=False, grouped=True)`` -
function to define an indented block of statements, similar to
indentation-based blocking in Python source code:
will be found in the indented block; a valid ``IndentedBlock``
must contain at least 1 matching ``statement_expr``
+ - ``recursive`` - flag indicating whether the IndentedBlock can
+ itself contain nested sub-blocks of the same type of expression
+ (default=False)
+
+ - ``grouped`` - flag indicating whether the tokens returned from
+ parsing the IndentedBlock should be grouped (default=True)
+
.. _originalTextFor:
- ``original_text_for(expr)`` - helper function to preserve the originally parsed text, regardless of any
:abstract: This document summarizes the changes made
in the 3.0.0 release of pyparsing.
+ (Updated to reflect changes up to 3.0.2)
.. sectnum:: :depth: 4
- added support for calling ``enable_all_warnings()`` if warnings are enabled
using the Python ``-W`` switch, or setting a non-empty value to the environment
- variable ``PYPARSINGENABLEALLWARNINGS``.
+ variable ``PYPARSINGENABLEALLWARNINGS``. (If using ``-Wd`` for testing, but
+ wishing to disable pyparsing warnings, add ``-Wi:::pyparsing``.)
- added new warning, ``warn_on_match_first_with_lshift_operator`` to
warn when using ``'<<'`` with a ``'|'`` ``MatchFirst`` operator,
This is the mechanism used internally by the ``Group`` class when defined
using ``aslist=True``.
-New Located class to replace locatedExpr helper method
+New Located class to replace ``locatedExpr`` helper method
------------------------------------------------------
The new ``Located`` class will replace the current ``locatedExpr`` method for
marking parsed results with the start and end locations of the parsed data in
The existing ``locatedExpr`` is retained for backward-compatibility, but will be
deprecated in a future release.
-New AtLineStart and AtStringStart classes
------------------------------------------
-As part fixing some matching behavior in LineStart and StringStart, two new
-classes have been added: AtLineStart and AtStringStart.
+New ``AtLineStart`` and ``AtStringStart`` classes
+-------------------------------------------------
+As part of fixing some matching behavior in ``LineStart`` and ``StringStart``, two new
+classes have been added: ``AtLineStart`` and ``AtStringStart``.
-The following expressions are equivalent::
+``LineStart`` and ``StringStart`` can be treated as separate elements, including whitespace skipping.
+``AtLineStart`` and ``AtStringStart`` enforce that an expression starts exactly at column 1, with no
+leading whitespace.
- LineStart() + expr and AtLineStart(expr)
- StringStart() + expr and AtStringStart(expr)
+ (LineStart() + Word(alphas)).parseString("ABC") # passes
+ (LineStart() + Word(alphas)).parseString(" ABC") # passes
+ AtLineStart(Word(alphas)).parseString(" ABC") # fails
-LineStart and StringStart now will only match if their related expression is
-actually at the start of the string or current line, without skipping whitespace.::
+[This is a fix to behavior that was added in 3.0.0, but was actually a regression from 2.4.x.]
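As a rough analogy only, the same distinction can be expressed with the standard ``re`` module: "first token on the line" tolerates leading spaces and tabs, while "column 1" does not. The two patterns below are illustrative and are not how pyparsing implements ``LineStart`` or ``AtLineStart``.

```python
import re

text = "ABC\n  DEF\nGHI"

# Analogue of LineStart() + expr: expr is the first token on the line,
# so leading spaces or tabs are skipped before it.
first_token = re.compile(r"^[ \t]*([A-Za-z]+)", re.MULTILINE)

# Analogue of AtLineStart(expr): expr must begin in column 1.
column_one = re.compile(r"^([A-Za-z]+)", re.MULTILINE)

first_tokens = first_token.findall(text)
column_one_tokens = column_one.findall(text)
print(first_tokens)       # ['ABC', 'DEF', 'GHI']
print(column_one_tokens)  # ['ABC', 'GHI']
```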
- (LineStart() + Word(alphas)).parseString("ABC") # passes
- (LineStart() + Word(alphas)).parseString(" ABC") # fails
-
-LineStart is also smarter about matching at the beginning of the string.
-
-This was the intended behavior previously, but could be bypassed if wrapped
-in other ParserElements.
-
-New IndentedBlock class to replace indentedBlock helper method
+New ``IndentedBlock`` class to replace ``indentedBlock`` helper method
--------------------------------------------------------------
The new ``IndentedBlock`` class will replace the current ``indentedBlock`` method
for defining indented blocks of text, similar to Python source code. Using
by an indented list of integers::
integer = pp.Word(pp.nums)
- group = pp.Group(pp.Char(pp.alphas) + pp.Group(pp.IndentedBlock(integer)))
+ group = pp.Group(pp.Char(pp.alphas) + pp.IndentedBlock(integer))
parses::
[['A', [100, 101]], ['B', [200, 201]]]
+By default, the results returned from the ``IndentedBlock`` are grouped.
+
``IndentedBlock`` may also be used to define a recursive indented block (containing nested
indented blocks).
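To picture the kind of nested result a recursive indented block yields, here is a plain-Python sketch that groups lines into nested lists by indentation. ``nest_by_indent`` is a hypothetical helper written for illustration. It does not use pyparsing and does not reproduce ``IndentedBlock``'s matching rules or exact output.

```python
def nest_by_indent(lines):
    """Hypothetical helper: group lines into nested lists by leading spaces."""
    root = []
    # Stack of (indent, items) pairs; the last entry is the block being filled.
    stack = [(0, root)]
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        # A shallower line closes every block that is deeper than it.
        while indent < stack[-1][0]:
            stack.pop()
        if indent > stack[-1][0]:
            # A deeper line opens a new nested block inside the current one.
            block = []
            stack[-1][1].append(block)
            stack.append((indent, block))
        stack[-1][1].append(raw.strip())
    return root


sample = [
    "A",
    "  100",
    "  101",
    "    1010",
    "  102",
    "B",
    "  200",
]
nested = nest_by_indent(sample)
print(nested)  # ['A', ['100', '101', ['1010'], '102'], 'B', ['200']]
```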
Fixed Bugs
==========
-- Fixed issue when LineStart() expressions would match input text that was not
+- [Reverted in 3.0.2] Fixed issue when ``LineStart()`` expressions would match input text that was not
necessarily at the beginning of a line.
+ [The previous behavior was the correct behavior, since it represents the ``LineStart`` as its own
+ matching expression. ``ParserElements`` that must start in column 1 can be wrapped in the new
+ ``AtLineStart`` class.]
- Fixed bug in regex definitions for ``real`` and ``sci_real`` expressions in
``pyparsing_common``.
self.assertEqual(obj.parseString("{}").asList(), [])
self.assertEqual(obj.parseString('{a "string}')[0], 'a "string')
self.assertEqual(
- ["a ", ["nested"], "string"],
+ ["a ", ["nested"], " string"],
obj.parseString("{a {nested} string}").asList(),
)
self.assertEqual(
- ["a ", ["double ", ["nested"]], "string"],
+ ["a ", ["double ", ["nested"]], " string"],
obj.parseString("{a {double {nested}} string}").asList(),
)
for obj in (bp.quoted_string, bp.string, bp.field_value):
self.assertEqual([], obj.parseString('""').asList())
self.assertEqual("a string", obj.parseString('"a string"')[0])
self.assertEqual(
- ["a ", ["nested"], "string"],
+ ["a ", ["nested"], " string"],
obj.parseString('"a {nested} string"').asList(),
)
self.assertEqual(
- ["a ", ["double ", ["nested"]], "string"],
+ ["a ", ["double ", ["nested"]], " string"],
obj.parseString('"a {double {nested}} string"').asList(),
)
Metadata-Version: 2.1
Name: pyparsing
-Version: 3.0.1
+Version: 3.0.2
Summary: Python parsing module
Home-page: https://github.com/pyparsing/pyparsing/
Author: Paul McGuire
from collections import namedtuple
version_info = namedtuple("version_info", "major minor micro release_level serial")
-__version_info__ = version_info(3, 0, 1, "final", 0)
+__version_info__ = version_info(3, 0, 2, "final", 0)
__version__ = "{}.{}.{}".format(*__version_info__[:3]) + (
"{}{}{}".format(
"r" if __version_info__.release_level[0] == "c" else "",
),
"",
)[__version_info__.release_level == "final"]
-__version_time__ = "24 October 2021 17:43 UTC"
+__version_time__ = "27 October 2021 11:18 UTC"
__versionTime__ = __version_time__
__author__ = "Paul McGuire <ptmcg.gm+pyparsing@gmail.com>"
(Note that this is a raw string literal, you must include the leading ``'r'``.)
"""
+ from .testing import pyparsing_test
+
parseAll = parseAll and parse_all
fullDump = fullDump and full_dump
printResults = printResults and print_results
BOM = "\ufeff"
for t in tests:
if comment is not None and comment.matches(t, False) or comments and not t:
- comments.append(t)
+ comments.append(pyparsing_test.with_line_numbers(t))
continue
if not t:
continue
- out = ["\n" + "\n".join(comments) if comments else "", t]
+ out = [
+ "\n" + "\n".join(comments) if comments else "",
+ pyparsing_test.with_line_numbers(t),
+ ]
comments = []
try:
# convert newline marks to actual newlines, and strip leading BOM if present
result = self.parse_string(t, parse_all=parseAll)
except ParseBaseException as pe:
fatal = "(FATAL)" if isinstance(pe, ParseFatalException) else ""
- if "\n" in t:
- out.append(line(pe.loc, t))
- out.append(" " * (col(pe.loc, t) - 1) + "^" + fatal)
- else:
- out.append(" " * pe.loc + "^" + fatal)
+ out.append(pe.explain())
out.append("FAIL: " + str(pe))
success = success and failureTests
result = pe
repeat = ""
else:
repeat = "{{{},{}}}".format(
- self.minLen,
- "" if self.maxLen == _MAX_INT else self.maxLen
+ self.minLen, "" if self.maxLen == _MAX_INT else self.maxLen
)
self.reString = "[{}]{}".format(
_collapseStringToRanges(self.initChars),
def __init__(self):
super().__init__()
+ self.leave_whitespace()
+ self.orig_whiteChars = set() | self.whiteChars
+ self.whiteChars.discard("\n")
+ self.skipper = Empty().set_whitespace_chars(self.whiteChars)
self.errmsg = "Expected start of line"
- def __add__(self, other):
- return AtLineStart(other)
-
- def __sub__(self, other):
- return AtLineStart(other) - Empty()
-
def preParse(self, instring, loc):
if loc == 0:
return loc
else:
- if instring[loc : loc + 1] == "\n" and "\n" in self.whiteChars:
- ret = loc + 1
- else:
- ret = super().preParse(instring, loc)
+ ret = self.skipper.preParse(instring, loc)
+ if "\n" in self.orig_whiteChars:
+ while instring[ret : ret + 1] == "\n":
+ ret = self.skipper.preParse(instring, ret + 1)
return ret
def parseImpl(self, instring, loc, doActions=True):
super().__init__()
self.errmsg = "Expected start of text"
- def __add__(self, other):
- return AtStringStart(other)
-
- def __sub__(self, other):
- return AtStringStart(other) - Empty()
-
def parseImpl(self, instring, loc, doActions=True):
if loc != 0:
# see if entire string up to here is just whitespace and ignoreables
self.exprs = [e for e in self.exprs if e is not None]
super().streamline()
+
+ # link any IndentedBlocks to the prior expression
+ for prev, cur in zip(self.exprs, self.exprs[1:]):
+ # traverse cur or any first embedded expr of cur looking for an IndentedBlock
+ # (but watch out for recursive grammar)
+ seen = set()
+ while cur:
+ if id(cur) in seen:
+ break
+ seen.add(id(cur))
+ if isinstance(cur, IndentedBlock):
+ prev.add_parse_action(
+ # bind cur as a default argument, so later rebinding of cur cannot change the target
+ lambda s, l, t, cur_=cur: setattr(cur_, "parent_anchor", col(l, s))
+ )
+ break
+ subs = cur.recurse()
+ cur = next(iter(subs), None)
+
self.mayReturnEmpty = all(e.mayReturnEmpty for e in self.exprs)
return self
super().__init__(exprs, savelist)
if self.exprs:
self.mayReturnEmpty = any(e.mayReturnEmpty for e in self.exprs)
+ self.skipWhitespace = all(e.skipWhitespace for e in self.exprs)
else:
self.mayReturnEmpty = True
if self.exprs:
self.mayReturnEmpty = any(e.mayReturnEmpty for e in self.exprs)
self.callPreparse = all(e.callPreparse for e in self.exprs)
+ self.skipWhitespace = all(e.skipWhitespace for e in self.exprs)
else:
self.mayReturnEmpty = True
leaveWhitespace = leave_whitespace
+class IndentedBlock(ParseElementEnhance):
+ """
+ Expression to match one or more expressions at a given indentation level.
+ Useful for parsing text where structure is implied by indentation (like Python source code).
+ """
+
+ class _Indent(Empty):
+ def __init__(self, ref_col: int):
+ super().__init__()
+ self.errmsg = "expected indent at column {}".format(ref_col)
+ self.add_condition(lambda s, l, t: col(l, s) == ref_col)
+
+ class _IndentGreater(Empty):
+ def __init__(self, ref_col: int):
+ super().__init__()
+ self.errmsg = "expected indent at column greater than {}".format(ref_col)
+ self.add_condition(lambda s, l, t: col(l, s) > ref_col)
+
+ def __init__(self, expr: ParserElement, *, recursive: bool = False, grouped: bool = True):
+ super().__init__(expr, savelist=True)
+ # if recursive:
+ # raise NotImplementedError("IndentedBlock with recursive is not implemented")
+ self._recursive = recursive
+ self._grouped = grouped
+ self.parent_anchor = 1
+
+ def parseImpl(self, instring, loc, doActions=True):
+ # advance parse position to non-whitespace by using an Empty()
+ # this should be the column to be used for all subsequent indented lines
+ anchor_loc = Empty().preParse(instring, loc)
+
+ # see if self.expr matches at the current location - if not it will raise an exception
+ # and no further work is necessary
+ self.expr.try_parse(instring, anchor_loc, doActions)
+
+ indent_col = col(anchor_loc, instring)
+ peer_detect_expr = self._Indent(indent_col)
+
+ inner_expr = Empty() + peer_detect_expr + self.expr
+ if self._recursive:
+ sub_indent = self._IndentGreater(indent_col)
+ nested_block = IndentedBlock(
+ self.expr, recursive=self._recursive, grouped=self._grouped
+ )
+ nested_block.set_debug(self.debug)
+ nested_block.parent_anchor = indent_col
+ inner_expr += Opt(sub_indent + nested_block)
+
+ inner_expr.set_name(f"inner {hex(id(inner_expr))[-4:].upper()}@{indent_col}")
+ block = OneOrMore(inner_expr)
+
+ trailing_undent = self._Indent(self.parent_anchor) | StringEnd()
+
+ if self._grouped:
+ wrapper = Group
+ else:
+ wrapper = lambda expr: expr
+ return (wrapper(block) + Optional(trailing_undent)).parseImpl(
+ instring, anchor_loc, doActions
+ )
+
+
class AtStringStart(ParseElementEnhance):
"""Matches if expression matches at the beginning of the parse
string::
# helpers.py
import html.entities
+import re
from . import __diag__
from .core import *
if not symbols:
return NoMatch()
- if not asKeyword:
- # if not producing keywords, need to reorder to take care to avoid masking
- # longer choices with shorter ones
+ # reorder given symbols to take care to avoid masking longer choices with shorter ones
+ # (but only if the given symbols are not just single characters)
+ if any(len(sym) > 1 for sym in symbols):
i = 0
while i < len(symbols) - 1:
cur = symbols[i]
else:
i += 1
- if not (caseless or asKeyword) and useRegex:
- # ~ print(strs, "->", "|".join([_escapeRegexChars(sym) for sym in symbols]))
+ if useRegex:
+ re_flags: int = re.IGNORECASE if caseless else 0
+
try:
- if len(symbols) == len("".join(symbols)):
- return Regex(
- "[%s]" % "".join(_escapeRegexRangeChars(sym) for sym in symbols)
- ).set_name(" | ".join(symbols))
- else:
- return Regex("|".join(re.escape(sym) for sym in symbols)).set_name(
- " | ".join(symbols)
+ if all(len(sym) == 1 for sym in symbols):
+ # symbols are just single characters, create range regex pattern
+ patt = "[{}]".format(
+ "".join(_escapeRegexRangeChars(sym) for sym in symbols)
)
+ else:
+ patt = "|".join(re.escape(sym) for sym in symbols)
+
+ # wrap with \b word break markers if defining as keywords
+ if asKeyword:
+ patt = r"\b(?:{})\b".format(patt)
+
+ ret = Regex(patt, flags=re_flags).set_name(" | ".join(symbols))
+
+ if caseless:
+ # add parse action to return symbols as specified, not in random
+ # casing as found in input string
+ symbol_map = {sym.lower(): sym for sym in symbols}
+ ret.add_parse_action(lambda s, l, t: symbol_map[t[0].lower()])
+
+ return ret
+
except sre_constants.error:
warnings.warn(
"Exception creating Regex for one_of, building MatchFirst", stacklevel=2
return smExpr.set_name("indented block")
-class IndentedBlock(ParseElementEnhance):
- """
- Expression to match one or more expressions at a given indentation level.
- Useful for parsing text where structure is implied by indentation (like Python source code).
- """
-
- def __init__(self, expr: ParserElement, recursive: bool = True):
- super().__init__(expr, savelist=True)
- self._recursive = recursive
-
- def parseImpl(self, instring, loc, doActions=True):
- # advance parse position to non-whitespace by using an Empty()
- # this should be the column to be used for all subsequent indented lines
- anchor_loc = Empty().preParse(instring, loc)
-
- # see if self.expr matches at the current location - if not it will raise an exception
- # and no further work is necessary
- self.expr.try_parse(instring, anchor_loc, doActions)
-
- indent_col = col(anchor_loc, instring)
- peer_parse_action = match_only_at_col(indent_col)
- peer_detect_expr = Empty().add_parse_action(peer_parse_action)
- inner_expr = Empty() + peer_detect_expr + self.expr
- inner_expr.set_name(f"inner {hex(id(inner_expr))[-4:].upper()}@{indent_col}")
-
- if self._recursive:
- indent_parse_action = condition_as_parse_action(
- lambda s, l, t, relative_to_col=indent_col: col(l, s) > relative_to_col
- )
- indent_expr = FollowedBy(self.expr).add_parse_action(indent_parse_action)
- inner_expr += Opt(Group(indent_expr + self.copy()))
-
- return OneOrMore(inner_expr).parseImpl(instring, loc, doActions)
-
-
# it's easy to get these comment structures wrong - they're very common, so may as well make them available
c_style_comment = Combine(Regex(r"/\*(?:[^*]|\*(?!/))*") + "*/").set_name(
"C style comment"
r"[-A]",
r"[\x21]",
r"[а-яА-ЯёЁA-Z$_\041α-ω]",
+ r"[\0xc0-\0xd6\0xd8-\0xf6\0xf8-\0xff]",
+ r"[\0xa1-\0xbf\0xd7\0xf7]",
+ r"[\0xc0-\0xd6\0xd8-\0xf6\0xf8-\0xff]",
+ r"[\0xa1-\0xbf\0xd7\0xf7]",
)
expectedResults = (
"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"-A",
"!",
"абвгдежзийклмнопрстуфхцчшщъыьэюяАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯёЁABCDEFGHIJKLMNOPQRSTUVWXYZ$_!αβγδεζηθικλμνξοπρςστυφχψω",
+ "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ",
+ "¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿×÷",
+ pp.alphas8bit,
+ pp.punc8bit,
)
for test in zip(testCases, expectedResults):
t, exp = test
"""
test = dedent(test)
- print(test)
+ print(pp.testing.with_line_numbers(test))
print("normal parsing")
for t, s, e in (pp.LineStart() + "AAA").scanString(test):
- print(s, e, pp.lineno(s, test), pp.line(s, test), repr(test[s]))
+ print(s, e, pp.lineno(s, test), pp.line(s, test), repr(t))
print()
self.assertEqual(
- "A", test[s], "failed LineStart with insignificant newlines"
+ "A", t[0][0], "failed LineStart with insignificant newlines"
)
print(r"parsing without \n in whitespace chars")
print(s, e, pp.lineno(s, test), pp.line(s, test), repr(test[s]))
print()
self.assertEqual(
- "A", test[s], "failed LineStart with insignificant newlines"
+ "A", t[0][0], "failed LineStart with insignificant newlines"
)
- def testLineStart3(self):
+ def testLineStartWithLeadingSpaces(self):
# testing issue #272
instring = dedent(
"""
alpha_line | pp.Word("_"),
alpha_line | alpha_line,
pp.MatchFirst([alpha_line, alpha_line]),
+ alpha_line ^ pp.Word("_"),
+ alpha_line ^ alpha_line,
+ pp.Or([alpha_line, pp.Word("_")]),
pp.LineStart() + pp.Word(pp.alphas) + pp.LineEnd().suppress(),
pp.And([pp.LineStart(), pp.Word(pp.alphas), pp.LineEnd().suppress()]),
]
+ fails = []
for test in tests:
print(test.searchString(instring))
- self.assertEqual(
- ["a", "d", "e"], flatten(sum(test.search_string(instring)).as_list())
+ if ["a", "b", "c", "d", "e", "f", "g"] != flatten(
+ sum(test.search_string(instring)).as_list()
+ ):
+ fails.append(test)
+ if fails:
+ self.fail(
+ "failed LineStart tests:\n{}".format(
+ "\n".join(str(expr) for expr in fails)
+ )
)
- def testLineStart4(self):
+ def testAtLineStart(self):
test = dedent(
"""\
AAA this line
)
def testStringStart(self):
+ self.assertParseAndCheckList(
+ pp.StringStart() + pp.Word(pp.nums), "123", ["123"]
+ )
+ self.assertParseAndCheckList(
+ pp.StringStart() + pp.Word(pp.nums), " 123", ["123"]
+ )
+ self.assertParseAndCheckList(pp.StringStart() + "123", "123", ["123"])
+ self.assertParseAndCheckList(pp.StringStart() + "123", " 123", ["123"])
self.assertParseAndCheckList(pp.AtStringStart(pp.Word(pp.nums)), "123", ["123"])
self.assertParseAndCheckList(pp.AtStringStart("123"), "123", ["123"])
with self.assertRaisesParseException():
pp.AtStringStart("123").parse_string(" 123")
+ def testStringStartAndLineStartInsideAnd(self):
+ # fmt: off
+ P_MTARG = (
+ pp.StringStart()
+ + pp.Word("abcde")
+ + pp.StringEnd()
+ )
+
+ P_MTARG2 = (
+ pp.LineStart()
+ + pp.Word("abcde")
+ + pp.StringEnd()
+ )
+
+ P_MTARG3 = (
+ pp.AtLineStart(pp.Word("abcde"))
+ + pp.StringEnd()
+ )
+ # fmt: on
+
+ def test(expr, string):
+ expr.streamline()
+ print(expr, repr(string), end=" ")
+ print(expr.parse_string(string))
+
+ test(P_MTARG, "aaa")
+ test(P_MTARG2, "aaa")
+ test(P_MTARG2, "\naaa")
+ test(P_MTARG2, " aaa")
+ test(P_MTARG2, "\n aaa")
+
+ with self.assertRaisesParseException():
+ test(P_MTARG3, " aaa")
+ with self.assertRaisesParseException():
+ test(P_MTARG3, "\n aaa")
+
def testLineAndStringEnd(self):
NLs = pp.OneOrMore(pp.lineEnd)
U = pp.Literal("U").setParseAction(parseActionHolder.pa0)
V = pp.Literal("V")
+ # fmt: off
gg = pp.OneOrMore(
- A
- | B
- | C
- | D
- | E
- | F
- | G
- | H
- | I
- | J
- | K
- | L
- | M
- | N
- | O
- | P
- | Q
- | R
- | S
- | U
- | V
- | B
- | T
+ A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | U | V | B | T
)
+ # fmt: on
testString = "VUTSRQPONMLKJIHGFEDCBA"
res = gg.parseString(testString)
print(res)
D = pp.Literal("D").setParseAction(ClassAsPA3)
E = pp.Literal("E").setParseAction(ClassAsPAStarNew)
+ # fmt: off
gg = pp.OneOrMore(
- A
- | B
- | C
- | D
- | E
- | F
- | G
- | H
- | I
- | J
- | K
- | L
- | M
- | N
- | O
- | P
- | Q
- | R
- | S
- | T
- | U
- | V
+ A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V
)
+ # fmt: on
testString = "VUTSRQPONMLKJIHGFEDCBA"
res = gg.parseString(testString)
print(list(map(str, res)))
"""
integer = ppc.integer
- group = pp.Group(pp.Char(pp.alphas) + pp.Group(pp.IndentedBlock(integer)))
+ group = pp.Group(pp.Char(pp.alphas) + pp.IndentedBlock(integer))
group[...].parseString(data).pprint()
]
integer = ppc.integer
group = pp.Group(
- pp.Char(pp.alphas) + pp.Group(pp.IndentedBlock(integer, recursive=False))
+ pp.Char(pp.alphas) + pp.IndentedBlock(integer, recursive=False)
)
for data in datas:
integer = ppc.integer
group = pp.Forward()
- group <<= pp.Group(
- pp.Char(pp.alphas) + pp.Group(pp.IndentedBlock(integer | group))
- )
+ group <<= pp.Group(pp.Char(pp.alphas) + pp.IndentedBlock(integer | group))
print("using searchString")
print(group.searchString(data))
print("using parseString")
print(group[...].parseString(data).dump())
- print("test bad indentation")
dotted_int = pp.delimited_list(
pp.Word(pp.nums), ".", allow_trailing_delim=True, combine=True
)
- indented_expr = pp.IndentedBlock(dotted_int, recursive=True)
+ indented_expr = pp.IndentedBlock(dotted_int, recursive=True, grouped=True)
+ # indented_expr = pp.Forward()
+ # indented_expr <<= pp.IndentedBlock(dotted_int + indented_expr)
good_data = """\
1.
1.1
1.1.1
+ 1.1.2
2."""
- bad_data = """\
+ bad_data1 = """\
1.
1.1
1.1.1
1.2
2."""
- indented_expr.parseString(good_data, parseAll=True)
+ bad_data2 = """\
+ 1.
+ 1.1
+ 1.1.1
+ 1.2
+ 2."""
+ print("test good indentation")
+ print(pp.pyparsing_test.with_line_numbers(good_data))
+ print(indented_expr.parseString(good_data, parseAll=True).as_list())
+ print()
+
+ print("test bad indentation")
+ print(pp.pyparsing_test.with_line_numbers(bad_data1))
+ with self.assertRaisesParseException(
+ msg="Failed to raise exception with bad indentation 1"
+ ):
+ indented_expr.parseString(bad_data1, parseAll=True)
+
+ print(pp.pyparsing_test.with_line_numbers(bad_data2))
with self.assertRaisesParseException(
- msg="Failed to raise exception with bad indentation"
+ msg="Failed to raise exception with bad indentation 2"
):
- indented_expr.parseString(bad_data, parseAll=True)
+ indented_expr.parseString(bad_data2, parseAll=True)
def testInvalidDiagSetting(self):
with self.assertRaises(