From 424ec7b35b2fb40603b791b447d646dc9bf7f237 Mon Sep 17 00:00:00 2001 From: peter klausler Date: Tue, 20 Mar 2018 10:59:07 -0700 Subject: [PATCH] [flang] Handle empty files gracefully. Create interval.h. Use std::size_t instead of bare size_t. Redefine parser::Name to not be just a bare string. Break out and rename CharBlock from token-sequence.h for use in the parse tree. Incremental replacement of name strings with pointers to cooked characters. Fix case sensitivity problem. Use new CharBlock encoding to replace strings for real literal constants. Normalized cooked character stream to lower case. Simplify parsing now that cooked stream is lower case. Replace Keyword in parse tree. Add static_asserts to || and recovery parsers to enforce same result types. Remove needless TODO comment inserted earlier. Fix case conversion on prefixed character literals (f90_correct/dc04.f90). Use CharBlock in user-state.h. Complete transition from nextChar to nextCh (i.e., always use pointers). Document extensions. Begin work on compiler directive lines. More documentation work. Reformat prescan.cc. More work on compiler directive scanning. Original-commit: flang-compiler/f18@38d0404e16ee7c797c69bc38fa64bcb55413884a Reviewed-on: https://github.com/flang-compiler/f18/pull/29 Tree-same-pre-rewrite: false --- flang/documentation/ParserCombinators.md | 13 +- flang/documentation/extensions.md | 57 ++++++++ flang/documentation/parsing.md | 3 + flang/lib/parser/basic-parsers.h | 61 ++++++--- flang/lib/parser/char-block.h | 80 +++++++++++ flang/lib/parser/char-buffer.cc | 11 +- flang/lib/parser/char-buffer.h | 15 ++- flang/lib/parser/characters.cc | 8 +- flang/lib/parser/characters.h | 5 +- flang/lib/parser/debug-parser.h | 8 +- flang/lib/parser/grammar.h | 147 +++++++++------------ flang/lib/parser/interval.h | 74 +++++++++++ flang/lib/parser/message.cc | 5 +- flang/lib/parser/message.h | 15 ++- flang/lib/parser/parse-state.h | 3 +- flang/lib/parser/parse-tree-visitor.h | 20 ++- flang/lib/parser/parse-tree.cc | 31 +---- flang/lib/parser/parse-tree.h | 60 +++++---- flang/lib/parser/parsing.cc | 9 +- flang/lib/parser/preprocessor.cc | 115 ++++++++-------- flang/lib/parser/preprocessor.h | 22 ++-- flang/lib/parser/prescan.cc | 134 ++++++++++++++++--- flang/lib/parser/prescan.h | 19 ++- flang/lib/parser/provenance.cc | 30 ++--- flang/lib/parser/provenance.h | 95 +++----------- flang/lib/parser/source.cc | 18 +-- flang/lib/parser/source.h | 13 +- flang/lib/parser/token-parsers.h | 219 +++++++++++++++++-------------- flang/lib/parser/token-sequence.cc | 93 ++++++++----- flang/lib/parser/token-sequence.h | 83 +++--------- flang/lib/parser/unparse.cc | 28 ++-- flang/lib/parser/user-state.h | 8 +- flang/lib/semantics/make-types.cc | 8 +- 33 files changed, 910 insertions(+), 600 deletions(-) create mode 100644 flang/documentation/extensions.md create mode 100644 flang/lib/parser/char-block.h create mode 100644 flang/lib/parser/interval.h diff --git a/flang/documentation/ParserCombinators.md b/flang/documentation/ParserCombinators.md index 1c79742..20a47e2 100644 --- a/flang/documentation/ParserCombinators.md +++ b/flang/documentation/ParserCombinators.md @@ -45,7 +45,10 @@ These objects and functions are (or return) the fundamental parsers: * `cut` is a trivial parser that always fails silently. * `guard(pred)` returns a parser that succeeds if and only if the predicate expression evaluates to true. -* `nextChar` returns the next character, and fails at EOF. +* `nextCh` consumes the next character and returns its location, + and fails at EOF. +* `"xyz"_ch` succeeds if the next character consumed matches any of those + in the string (ignoring case), and returns its location. ### Combinators These functions and operators combine existing parsers to generate new parsers. @@ -96,6 +99,8 @@ collect the values that they return. * `construct{}(p1, p2, ...)` matches zero or more parsers in succession, collecting their results and then passing them with move semantics to a constructor for the type T if they all succeed. +* `sourced(p)` matches p, and fills in its `source` data member with the + locations of the cooked character stream that it consumed * `applyFunction(f, p1, p2, ...)` matches one or more parsers in succession, collecting their results and passing them as rvalue reference arguments to some function, returning its result. @@ -109,19 +114,15 @@ These are non-advancing state inquiry and update parsers: * `getColumn` returns the 1-based column position. * `inFixedForm` succeeds in fixed form Fortran source. -* `setInFixedForm` sets the fixed form flag, returning its prior value. -* `columns` returns the 1-based column number after which source is clipped. -* `setColumns(c)` sets the column limit and returns its prior value. ### Token Parsers Last, we have these basic parsers on which the actual grammar of the Fortran is built. All of the following parsers consume characters acquired from -`nextChar`. +`nextCh`. * `spaces` always succeeds after consuming any spaces or tabs * `digit` matches one cooked decimal digit (0-9) * `letter` matches one cooked letter (A-Z) -* `CharMatch<'c'>{}` matches one specific cooked character. * `"..."_tok` match the content of the string, skipping spaces before and after, and with multiple spaces accepted for any internal space. (Note that the `_tok` suffix is optional when the parser appears before diff --git a/flang/documentation/extensions.md b/flang/documentation/extensions.md new file mode 100644 index 0000000..5807a10 --- /dev/null +++ b/flang/documentation/extensions.md @@ -0,0 +1,57 @@ +Extensions, deletions, and legacy features supported +==================================================== + +* Tabs in source +* `<>` as synonym for `.NE.` and `/=` +* `$` and `@` as legal characters in names +* `.T.` and `.F.` +* Initialization in type declaration statements using `/values/` +* Kind specification with `*`, e.g. `REAL*4` +* `DOUBLE COMPLEX` +* Signed complex literal constants +* `.XOR.` as predefined operator (can be overridden) +* `.N.`, `.A.`, `.O.`, `.X.` predefined operator synonyms +* `STRUCTURE`, `RECORD`, `UNION`, and `MAP` +* Structure field access with `.field` +* `NCHARACTER` type and `NC` Kanji character literals +* `BYTE` as synonym for `INTEGER(KIND=1)` +* Quad precision REAL literals with `Q` +* `X` prefix/suffix as synonym for `Z` on hexadecimal literals +* `B`, `O`, `Z`, and `X` accepted as suffixes as well as prefixes +* Triplets allowed in array constructors +* Old-style `PARAMETER pi=3.14` statement (no parentheses) +* `%LOC`, `%VAL`, and `%REF` +* Leading comma allowed before I/O item list +* Empty parentheses allowed in `PROGRAM P()` +* Missing parentheses allowed in `FUNCTION F` +* Cray based `POINTER(p,x)` +* Arithmetic `IF`. (Which branch with NaN take?) +* `ASSIGN` statement, assigned `GO TO`, and assigned format +* `PAUSE` statement +* Hollerith literals and edit descriptors +* `NAMELIST` allowed in the execution part +* Omitted colons on type declaration statements with attributes +* COMPLEX constructor expression, e.g. `(x+y,z)` +* `+` and `-` before all primary expressions, e.g. `x*-y` +* `.NOT. .NOT.` accepted +* `NAME=` as synonym for `FILE=` +* `DISPOSE=` +* Data edit descriptors without width or other details +* Backslash escape character sequences in quoted character literals +* `D` lines in fixed form as comments or debug code + +Extensions and legacy features deliberately not supported +--------------------------------------------------------- +* `.LG.` as synonym for `.NE.` +* `REDIMENSION` +* Allocatable `COMMON` +* Expressions in formats +* `ACCEPT` as synonym for `READ *` +* `ARRAY` as synonym for `DIMENSION` +* `VIRTUAL` as synonym for `DIMENSION` +* `ENCODE` and `DECODE` as synonyms for internal I/O +* `IMPLICIT AUTOMATIC`, `IMPLICIT STATIC` +* Default exponent of zero, e.g. `3.14159E` +* Characters in defined operators that are neither letters nor digits +* `B` suffix on unquoted octal constants +* `Z` prefix on unquoted hexadecimal constants diff --git a/flang/documentation/parsing.md b/flang/documentation/parsing.md index a40a0a8..ef1e756 100644 --- a/flang/documentation/parsing.md +++ b/flang/documentation/parsing.md @@ -51,6 +51,9 @@ by a CookedSource class instance, in which: * preprocessing directives have been implemented * preprocessing macro invocations have been expanded * legacy `D` lines in fixed form source have been omitted or included +* except for the payload in character literals, Hollerith constants, + and character and Hollerith edit descriptors, all letters have been + normalized to lower case Lines in the cooked character stream can be of arbitrary length. diff --git a/flang/lib/parser/basic-parsers.h b/flang/lib/parser/basic-parsers.h index 2804196..72f41b0 100644 --- a/flang/lib/parser/basic-parsers.h +++ b/flang/lib/parser/basic-parsers.h @@ -14,6 +14,7 @@ // This header defines the fundamental parser template classes and helper // template functions. See parser-combinators.txt for documentation. +#include "char-block.h" #include "idioms.h" #include "message.h" #include "parse-state.h" @@ -81,7 +82,7 @@ public: MessageContext context{state->context()}; ParseState backtrack{*state}; std::optional result{parser_.Parse(state)}; - if (result) { + if (result.has_value()) { // preserve any new messages messages.Annex(state->messages()); state->messages()->swap(messages); @@ -233,6 +234,7 @@ inline constexpr auto operator/(const PA &pa, const PB &pb) { template class AlternativeParser { public: using resultType = typename PA::resultType; + static_assert(std::is_same_v); constexpr AlternativeParser(const AlternativeParser &) = default; constexpr AlternativeParser(const PA &pa, const PB &pb) : pa_{pa}, pb_{pb} {} std::optional Parse(ParseState *state) const { @@ -306,6 +308,7 @@ inline constexpr auto operator||(const AlternativeParser &papb, template class RecoveryParser { public: using resultType = typename PA::resultType; + static_assert(std::is_same_v); constexpr RecoveryParser(const RecoveryParser &) = default; constexpr RecoveryParser(const PA &pa, const PB &pb) : pa_{pa}, pb_{pb} {} std::optional Parse(ParseState *state) const { @@ -1197,20 +1200,22 @@ private: inline constexpr auto guard(bool truth) { return GuardParser(truth); } -// nextChar is a parser that succeeds if the parsing state is not -// at the end of its input, returning the next character and +// nextCh is a parser that succeeds if the parsing state is not +// at the end of its input, returning the next character location and // advancing the parse when it does so. -constexpr struct NextCharParser { - using resultType = char; - constexpr NextCharParser() {} - std::optional Parse(ParseState *state) const { - std::optional ch{state->GetNextChar()}; - if (!ch) { +constexpr struct NextCh { + using resultType = const char *; + constexpr NextCh() {} + std::optional Parse(ParseState *state) const { + if (state->IsAtEnd()) { state->PutMessage("end of file"_en_US); + return {}; } - return ch; + const char *at{state->GetLocation()}; + state->UncheckedAdvance(); + return {at}; } -} nextChar; +} nextCh; // If a is a parser for nonstandard usage, extension(a) is a parser that // is disabled in strict conformance mode and otherwise sets a violation flag @@ -1226,7 +1231,7 @@ public: } auto at = state->GetLocation(); auto result = parser_.Parse(state); - if (result) { + if (result.has_value()) { state->set_anyConformanceViolation(); if (state->warnOnNonstandardUsage()) { state->PutMessage(at, "nonstandard usage"_en_US); @@ -1274,6 +1279,29 @@ template inline constexpr auto deprecated(const PA &parser) { return DeprecatedParser(parser); } +// Parsing objects with "source" members. +template class SourcedParser { +public: + using resultType = typename PA::resultType; + constexpr SourcedParser(const SourcedParser &) = default; + constexpr SourcedParser(const PA &parser) : parser_{parser} {} + std::optional Parse(ParseState *state) const { + const char *start{state->GetLocation()}; + auto result = parser_.Parse(state); + if (result.has_value()) { + result->source = CharBlock{start, state->GetLocation()}; + } + return result; + } + +private: + const PA parser_; +}; + +template inline constexpr auto sourced(const PA &parser) { + return SourcedParser{parser}; +} + constexpr struct GetUserState { using resultType = UserState *; constexpr GetUserState() {} @@ -1300,15 +1328,6 @@ constexpr struct GetColumn { return {state->column()}; } } getColumn; - -constexpr struct GetProvenance { - using resultType = Provenance; - constexpr GetProvenance() {} - static std::optional Parse(ParseState *state) { - return {state->GetProvenance()}; - } -} getProvenance; - } // namespace parser } // namespace Fortran #endif // FORTRAN_PARSER_BASIC_PARSERS_H_ diff --git a/flang/lib/parser/char-block.h b/flang/lib/parser/char-block.h new file mode 100644 index 0000000..85db0f9 --- /dev/null +++ b/flang/lib/parser/char-block.h @@ -0,0 +1,80 @@ +#ifndef FORTRAN_PARSER_CHAR_BLOCK_H_ +#define FORTRAN_PARSER_CHAR_BLOCK_H_ + +// Describes a contiguous block of characters; does not own their storage. + +#include "interval.h" +#include +#include +#include +#include +#include + +namespace Fortran { +namespace parser { + +class CharBlock { +public: + CharBlock() {} + CharBlock(const char *x, std::size_t n = 1) : interval_{x, n} {} + CharBlock(const char *b, const char *e) + : interval_{b, static_cast(e - b)} {} + CharBlock(const std::string &s) : interval_{s.data(), s.size()} {} + CharBlock(const CharBlock &) = default; + CharBlock(CharBlock &&) = default; + CharBlock &operator=(const CharBlock &) = default; + CharBlock &operator=(CharBlock &&) = default; + + bool empty() const { return interval_.empty(); } + std::size_t size() const { return interval_.size(); } + const char *begin() const { return interval_.start(); } + const char *end() const { return interval_.start() + interval_.size(); } + const char &operator[](std::size_t j) const { return interval_.start()[j]; } + + bool IsBlank() const { + for (char ch : *this) { + if (ch != ' ' && ch != '\t') { + return false; + } + } + return true; + } + + std::string ToString() const { + return std::string{interval_.start(), interval_.size()}; + } + + bool operator<(const CharBlock &that) const { return Compare(that) < 0; } + bool operator<=(const CharBlock &that) const { return Compare(that) <= 0; } + bool operator==(const CharBlock &that) const { return Compare(that) == 0; } + bool operator!=(const CharBlock &that) const { return Compare(that) != 0; } + bool operator>=(const CharBlock &that) const { return Compare(that) >= 0; } + bool operator>(const CharBlock &that) const { return Compare(that) > 0; } + +private: + int Compare(const CharBlock &that) const { + std::size_t bytes{std::min(size(), that.size())}; + int cmp{std::memcmp(static_cast(begin()), + static_cast(that.begin()), bytes)}; + if (cmp != 0) { + return cmp; + } + return size() < that.size() ? -1 : size() > that.size(); + } + + Interval interval_{nullptr, 0}; +}; +} // namespace parser +} // namespace Fortran + +// Specializations to enable std::unordered_map &c. +template<> struct std::hash { + std::size_t operator()(const Fortran::parser::CharBlock &x) const { + std::size_t hash{0}, bytes{x.size()}; + for (std::size_t j{0}; j < bytes; ++j) { + hash = (hash * 31) ^ x[j]; + } + return hash; + } +}; +#endif // FORTRAN_PARSER_CHAR_BLOCK_H_ diff --git a/flang/lib/parser/char-buffer.cc b/flang/lib/parser/char-buffer.cc index 185bcd9..2a3c017 100644 --- a/flang/lib/parser/char-buffer.cc +++ b/flang/lib/parser/char-buffer.cc @@ -1,12 +1,13 @@ #include "char-buffer.h" #include "idioms.h" #include +#include #include namespace Fortran { namespace parser { -char *CharBuffer::FreeSpace(size_t *n) { +char *CharBuffer::FreeSpace(std::size_t *n) { int offset{LastBlockOffset()}; if (blocks_.empty()) { blocks_.emplace_front(); @@ -20,16 +21,16 @@ char *CharBuffer::FreeSpace(size_t *n) { return last_->data + offset; } -void CharBuffer::Claim(size_t n) { +void CharBuffer::Claim(std::size_t n) { if (n > 0) { bytes_ += n; lastBlockEmpty_ = false; } } -void CharBuffer::Put(const char *data, size_t n) { - size_t chunk; - for (size_t at{0}; at < n; at += chunk) { +void CharBuffer::Put(const char *data, std::size_t n) { + std::size_t chunk; + for (std::size_t at{0}; at < n; at += chunk) { char *to{FreeSpace(&chunk)}; chunk = std::min(n - at, chunk); Claim(chunk); diff --git a/flang/lib/parser/char-buffer.h b/flang/lib/parser/char-buffer.h index 7ea6419c..1347f82 100644 --- a/flang/lib/parser/char-buffer.h +++ b/flang/lib/parser/char-buffer.h @@ -4,6 +4,7 @@ // Defines a simple expandable buffer suitable for efficiently accumulating // a stream of bytes. +#include #include #include #include @@ -29,7 +30,7 @@ public: return *this; } - size_t size() const { return bytes_; } + std::size_t size() const { return bytes_; } void clear() { blocks_.clear(); @@ -38,15 +39,15 @@ public: lastBlockEmpty_ = false; } - char *FreeSpace(size_t *); - void Claim(size_t); - void Put(const char *data, size_t n); + char *FreeSpace(std::size_t *); + void Claim(std::size_t); + void Put(const char *data, std::size_t n); void Put(const std::string &); void Put(char x) { Put(&x, 1); } private: struct Block { - static constexpr size_t capacity{1 << 20}; + static constexpr std::size_t capacity{1 << 20}; char data[capacity]; }; @@ -76,7 +77,7 @@ public: ++*this; return result; } - iterator &operator+=(size_t n) { + iterator &operator+=(std::size_t n) { while (n >= Block::capacity - offset_) { n -= Block::capacity - offset_; offset_ = 0; @@ -110,7 +111,7 @@ private: int LastBlockOffset() const { return bytes_ % Block::capacity; } std::forward_list blocks_; std::forward_list::iterator last_{blocks_.end()}; - size_t bytes_{0}; + std::size_t bytes_{0}; bool lastBlockEmpty_{false}; }; } // namespace parser diff --git a/flang/lib/parser/characters.cc b/flang/lib/parser/characters.cc index 8ead0b7..7953fb9 100644 --- a/flang/lib/parser/characters.cc +++ b/flang/lib/parser/characters.cc @@ -1,4 +1,6 @@ #include "characters.h" +#include +#include namespace Fortran { namespace parser { @@ -52,9 +54,9 @@ std::optional EUC_JPCharacterBytes(const char *p) { return {}; } -std::optional CountCharacters( - const char *p, size_t bytes, std::optional (*cbf)(const char *)) { - size_t chars{0}; +std::optional CountCharacters( + const char *p, std::size_t bytes, std::optional (*cbf)(const char *)) { + std::size_t chars{0}; const char *limit{p + bytes}; while (p < limit) { ++chars; diff --git a/flang/lib/parser/characters.h b/flang/lib/parser/characters.h index 4e1e1d0..d0a837e 100644 --- a/flang/lib/parser/characters.h +++ b/flang/lib/parser/characters.h @@ -5,6 +5,7 @@ // conversions here to avoid dependences upon and // also to accomodate Fortran tokenization. +#include #include #include @@ -160,8 +161,8 @@ void EmitQuotedChar(char ch, const NORMAL &emit, const INSERTED &insert, std::optional UTF8CharacterBytes(const char *); std::optional EUC_JPCharacterBytes(const char *); -std::optional CountCharacters( - const char *, size_t bytes, std::optional (*)(const char *)); +std::optional CountCharacters( + const char *, std::size_t bytes, std::optional (*)(const char *)); } // namespace parser } // namespace Fortran #endif // FORTRAN_PARSER_CHARACTERS_H_ diff --git a/flang/lib/parser/debug-parser.h b/flang/lib/parser/debug-parser.h index 664d029..f474a8c 100644 --- a/flang/lib/parser/debug-parser.h +++ b/flang/lib/parser/debug-parser.h @@ -7,6 +7,7 @@ #include "basic-parsers.h" #include "parse-state.h" +#include #include #include #include @@ -18,7 +19,8 @@ class DebugParser { public: using resultType = Success; constexpr DebugParser(const DebugParser &) = default; - constexpr DebugParser(const char *str, size_t n) : str_{str}, length_{n} {} + constexpr DebugParser(const char *str, std::size_t n) + : str_{str}, length_{n} {} std::optional Parse(ParseState *state) const { if (auto context = state->context()) { context->Emit(std::cout, *state->cooked().allSources()); @@ -31,10 +33,10 @@ public: private: const char *const str_; - size_t length_; + std::size_t length_; }; -constexpr DebugParser operator""_debug(const char str[], size_t n) { +constexpr DebugParser operator""_debug(const char str[], std::size_t n) { return DebugParser{str, n}; } } // namespace parser diff --git a/flang/lib/parser/grammar.h b/flang/lib/parser/grammar.h index ffcaa84..f32155d 100644 --- a/flang/lib/parser/grammar.h +++ b/flang/lib/parser/grammar.h @@ -78,7 +78,6 @@ constexpr Parser signedIntLiteralConstant; // R707 constexpr Parser intLiteralConstant; // R708 constexpr Parser kindParam; // R709 constexpr Parser realLiteralConstant; // R714 -constexpr Parser exponentPart; // R717 constexpr Parser charLength; // R723 constexpr Parser charLiteralConstant; // R724 constexpr Parser initialization; // R743 & R805 @@ -179,15 +178,15 @@ template using statementConstructor = construct>; template inline constexpr auto unterminatedStatement(const PA &p) { - return skipMany("\n"_tok) >> statementConstructor{}(getProvenance, - maybe(label), isLabelOk, spaces >> p); + return skipMany("\n"_tok) >> + sourced(statementConstructor{}(maybe(label), isLabelOk, spaces >> p)); } -constexpr auto endOfLine = CharMatch<'\n'>{} / skipMany("\n"_tok) || - fail("expected end of line"_en_US); +constexpr auto endOfLine = "\n"_ch / skipMany("\n"_tok) || + fail("expected end of line"_en_US); constexpr auto endOfStmt = spaces >> - (CharMatch<';'>{} / skipMany(";"_tok) / maybe(endOfLine) || endOfLine); + (";"_ch / skipMany(";"_tok) / maybe(endOfLine) || endOfLine); template inline constexpr auto statement(const PA &p) { return unterminatedStatement(p) / endOfStmt; @@ -273,9 +272,6 @@ TYPE_PARSER( construct{}(indirect(Parser{})) || construct{}(indirect(Parser{}))) -// R516 keyword -> name -constexpr auto keyword = name; - // R604 constant -> literal-constant | named-constant // Used only via R607 int-constant and R845 data-stmt-constant. TYPE_PARSER(construct{}(literalConstant) || @@ -298,8 +294,11 @@ constexpr auto namedIntrinsicOperator = ".LT." >> ".OR." >> pure(DefinedOperator::IntrinsicOperator::OR) || ".EQV." >> pure(DefinedOperator::IntrinsicOperator::EQV) || ".NEQV." >> pure(DefinedOperator::IntrinsicOperator::NEQV) || - extension(".XOR." >> pure(DefinedOperator::IntrinsicOperator::XOR)); -// TODO: .N./.A./.O./.X. abbreviations? + extension(".XOR." >> pure(DefinedOperator::IntrinsicOperator::XOR) || + ".N." >> pure(DefinedOperator::IntrinsicOperator::NOT) || + ".A." >> pure(DefinedOperator::IntrinsicOperator::AND) || + ".O." >> pure(DefinedOperator::IntrinsicOperator::OR) || + ".X." >> pure(DefinedOperator::IntrinsicOperator::XOR)); constexpr auto intrinsicOperator = "**" >> pure(DefinedOperator::IntrinsicOperator::Power) || @@ -588,33 +587,21 @@ constexpr auto executionPart = inContext("execution part"_en_US, many(executionPartConstruct)); // R602 underscore -> _ -constexpr CharMatch<'_'> underscore; +constexpr auto underscore = "_"_ch; +// R516 keyword -> name // R601 alphanumeric-character -> letter | digit | underscore +// R603 name -> letter [alphanumeric-character]... // N.B. Don't accept an underscore if it is immediately followed by a // quotation mark, so that kindParameter_"character literal" is parsed properly. -constexpr auto otherIdCharacter = - underscore / !(CharMatch<'\''>{} || CharMatch<'"'>{}) || - extension( - CharMatch<'$'>{} || // PGI/ifort (and Cray/gfortran, but not first) - CharMatch<'@'>{}); // Cray - -constexpr auto nonDigitIdCharacter = letter || otherIdCharacter; - -// R603 name -> letter [alphanumeric-character]... -static inline Name listToString(std::list &&chlist) { - Name result; - for (auto ch : chlist) { - result += ToLowerCaseLetter(ch); - } - return result; -} - -constexpr auto rawName = applyFunction(listToString, - applyFunction(prepend, nonDigitIdCharacter, - many(nonDigitIdCharacter || digit))); - -TYPE_PARSER(spaces >> rawName) +// PGI and ifort accept '$' in identifiers, even as the initial character. +// Cray and gfortran accept '$', but not as the first character. +// Cray accepts '@' as well. +constexpr auto otherIdChar = underscore / !"'\""_ch || extension("$@"_ch); +constexpr auto nonDigitIdChar = letter || otherIdChar; +constexpr auto rawName = nonDigitIdChar >> many(nonDigitIdChar || digit); +TYPE_PARSER(spaces >> sourced(attempt(rawName) >> construct{})) +constexpr auto keyword = construct{}(name); // R605 literal-constant -> // int-literal-constant | real-literal-constant | @@ -716,7 +703,8 @@ TYPE_PARSER(construct{}( extension(construct{}( construct{}("*" >> digitString)))) -// R707 signed-int-literal-constant -> [sign] int-literal-constant +// R710 signed-digit-string -> [sign] digit-string +// N.B. Not a complete token -- no spaces are skipped. static inline std::int64_t negate(std::uint64_t &&n) { return -n; // TODO: check for overflow } @@ -725,11 +713,13 @@ static inline std::int64_t castToSigned(std::uint64_t &&n) { return n; // TODO: check for overflow } -TYPE_PARSER(spaces >> - construct{}( - CharMatch<'-'>{} >> applyFunction(negate, digitString) || - maybe(CharMatch<'+'>{}) >> applyFunction(castToSigned, digitString), - maybe(underscore >> kindParam))) +constexpr auto signedDigitString = "-"_ch >> + applyFunction(negate, digitString) || + maybe("+"_ch) >> applyFunction(castToSigned, digitString); + +// R707 signed-int-literal-constant -> [sign] int-literal-constant +TYPE_PARSER(spaces >> sourced(construct{}( + signedDigitString, maybe(underscore >> kindParam)))) // R708 int-literal-constant -> digit-string [_ kind-param] TYPE_PARSER(construct{}( @@ -739,16 +729,10 @@ TYPE_PARSER(construct{}( TYPE_PARSER(construct{}(digitString) || construct{}(scalar(integer(constant(name))))) -// R710 signed-digit-string -> [sign] digit-string -// N.B. Not a complete token -- no spaces are skipped. -constexpr auto signedDigitString = CharMatch<'-'>{} >> - applyFunction(negate, digitString) || - maybe(CharMatch<'+'>{}) >> digitString; - // R712 sign -> + | - // Not a complete token. -constexpr auto sign = CharMatch<'+'>{} >> pure(Sign::Positive) || - CharMatch<'-'>{} >> pure(Sign::Negative); +constexpr auto sign = "+"_ch >> pure(Sign::Positive) || + "-"_ch >> pure(Sign::Negative); // R713 signed-real-literal-constant -> [sign] real-literal-constant constexpr auto signedRealLiteralConstant = spaces >> @@ -758,37 +742,23 @@ constexpr auto signedRealLiteralConstant = spaces >> // significand [exponent-letter exponent] [_ kind-param] | // digit-string exponent-letter exponent [_ kind-param] // R715 significand -> digit-string . [digit-string] | . digit-string -// N.B. Preceding spaces are not skipped. -TYPE_CONTEXT_PARSER("REAL literal constant"_en_US, - construct{}(some(digit), - CharMatch<'.'>{} >> - !(some(letter) >> CharMatch<'.'>{}) >> // don't misinterpret 1.AND. - many(digit), - maybe(exponentPart), maybe(underscore >> kindParam)) || - construct{}(CharMatch<'.'>{} >> some(digit), - maybe(exponentPart), maybe(underscore >> kindParam)) || - construct{}( - some(digit), exponentPart, maybe(underscore >> kindParam))) - // R716 exponent-letter -> E | D // Extension: Q -// Not a complete token. -inline constexpr bool isEorD(char ch) { - ch = ToLowerCaseLetter(ch); - return ch == 'e' || ch == 'd'; -} - -inline constexpr bool isQ(char ch) { return ToLowerCaseLetter(ch) == 'q'; } - -constexpr CharPredicateGuardParser exponentEorD{ - isEorD, "expected exponent letter"_en_US}; -constexpr CharPredicateGuardParser exponentQ{ - isQ, "expected exponent letter"_en_US}; - // R717 exponent -> signed-digit-string -// Not a complete token. -TYPE_PARSER(construct{}( - extension(exponentQ) || exponentEorD, signedDigitString)) +// N.B. Preceding spaces are not skipped. +constexpr auto exponentPart = + ("ed"_ch || extension("q"_ch)) >> signedDigitString; + +TYPE_CONTEXT_PARSER("REAL literal constant"_en_US, + construct{}( + sourced( + (digitString >> "."_ch >> + !(some(letter) >> "."_ch /* don't misinterpret 1.AND. */) >> + maybe(digitString) >> maybe(exponentPart) >> ok || + "."_ch >> digitString >> maybe(exponentPart) >> ok || + digitString >> exponentPart >> ok) >> + construct{}), + maybe(underscore >> kindParam))) // R718 complex-literal-constant -> ( real-part , imag-part ) TYPE_CONTEXT_PARSER("COMPLEX literal constant"_en_US, @@ -844,8 +814,7 @@ TYPE_PARSER(construct{}(parenthesized(typeParamValue)) || // N.B. charLiteralConstantWithoutKind does not skip preceding spaces. // N.B. the parsing of "name" takes care to not consume the '_'. constexpr auto charLiteralConstantWithoutKind = - CharMatch<'\''>{} >> CharLiteral<'\''>{} || - CharMatch<'"'>{} >> CharLiteral<'"'>{}; + "'"_ch >> CharLiteral<'\''>{} || "\""_ch >> CharLiteral<'"'>{}; TYPE_CONTEXT_PARSER("CHARACTER literal constant"_en_US, construct{}( @@ -1161,7 +1130,7 @@ TYPE_PARSER(construct{}(declarationTypeSpec, optionalListBeforeColons(Parser{}), nonemptyList(entityDecl)) || // PGI-only extension: don't require the colons - // TODO: The standard requires the colons if the entity + // N.B.: The standard requires the colons if the entity // declarations contain initializers. extension(construct{}(declarationTypeSpec, defaulted("," >> nonemptyList(Parser{})), @@ -1205,7 +1174,7 @@ TYPE_PARSER(construct{}(objectName, maybe(arraySpec), maybe(coarraySpec), maybe("*" >> charLength), maybe(initialization))) // R806 null-init -> function-reference -// TODO: confirm that NULL still intrinsic +// TODO: confirm in semantics that NULL still intrinsic in this scope TYPE_PARSER("NULL ( )" >> construct{}) // R807 access-spec -> PUBLIC | PRIVATE @@ -1503,8 +1472,8 @@ TYPE_PARSER(construct{}(declarationTypeSpec, // R865 letter-spec -> letter [- letter] TYPE_PARSER(spaces >> (construct{}(letter, maybe("-" >> letter)) || - construct{}(extension(otherIdCharacter), - construct>{}))) + construct{}(otherIdChar, + construct>{}))) // R867 import-stmt -> // IMPORT [[::] import-name-list] | @@ -1661,8 +1630,10 @@ TYPE_PARSER(construct{}( // R1023 defined-binary-op -> . letter [letter]... . // R1414 local-defined-operator -> defined-unary-op | defined-binary-op // R1415 use-defined-operator -> defined-unary-op | defined-binary-op -TYPE_PARSER(construct{}(applyFunction(listToString, - spaces >> CharMatch<'.'>{} >> some(letter) / CharMatch<'.'>{}))) +// N.B. The name of the operator is captured without the periods around it. +TYPE_PARSER(spaces >> "."_ch >> + construct{}(sourced(some(letter) >> construct{})) / + "."_ch) // R911 data-ref -> part-ref [% part-ref]... // R914 coindexed-named-object -> data-ref @@ -1672,7 +1643,7 @@ constexpr struct DefinedOperatorName { static std::optional Parse(ParseState *state) { if (std::optional n{definedOpName.Parse(state)}) { if (const auto *user = state->userState()) { - if (user->IsDefinedOperator(n->v)) { + if (user->IsDefinedOperator(n->v.source)) { return {Success{}}; } } @@ -2084,7 +2055,7 @@ static constexpr struct EquivOperand { // R1017 level-5-expr -> [level-5-expr equiv-op] equiv-operand // R1021 equiv-op -> .EQV. | .NEQV. // Logical equivalence is left-associative. -// Extension: .XOR. as synonym for .NEQV. (TODO: is this the right precedence?) +// Extension: .XOR. as synonym for .NEQV. constexpr struct Level5Expr { using resultType = Expr; constexpr Level5Expr() {} @@ -3423,7 +3394,7 @@ constexpr struct NoteOperatorDefinition { if (op.has_value()) { if (auto ustate = state->userState()) { if (const auto *name = std::get_if(&op->u)) { - ustate->NoteDefinedOperator(name->v); + ustate->NoteDefinedOperator(name->v.source); } } } @@ -3726,7 +3697,7 @@ TYPE_CONTEXT_PARSER("PAUSE statement"_en_US, // R1221 dtv-type-spec -> TYPE ( derived-type-spec ) | // CLASS ( derived-type-spec ) // -// There requirement productions are defined and used, but need not be +// These requirement productions are defined and used, but need not be // defined independently here in this file: // R771 lbracket -> [ // R772 rbracket -> ] diff --git a/flang/lib/parser/interval.h b/flang/lib/parser/interval.h new file mode 100644 index 0000000..fc238b2 --- /dev/null +++ b/flang/lib/parser/interval.h @@ -0,0 +1,74 @@ +#ifndef FORTRAN_PARSER_INTERVAL_H_ +#define FORTRAN_PARSER_INTERVAL_H_ + +// Defines a generalized template class Interval to represent +// the half-open interval [x .. x+n). + +#include "idioms.h" +#include +#include + +namespace Fortran { +namespace parser { + +template class Interval { +public: + using type = A; + Interval() {} + Interval(const A &s, std::size_t n = 1) : start_{s}, size_{n} {} + Interval(A &&s, std::size_t n = 1) : start_{std::move(s)}, size_{n} {} + Interval(const Interval &) = default; + Interval(Interval &&) = default; + Interval &operator=(const Interval &) = default; + Interval &operator=(Interval &&) = default; + + bool operator==(const Interval &that) const { + return start_ == that.start_ && size_ == that.size_; + } + + const A &start() const { return start_; } + std::size_t size() const { return size_; } + bool empty() const { return size_ == 0; } + + bool Contains(const A &x) const { return start_ <= x && x < start_ + size_; } + bool Contains(const Interval &that) const { + return Contains(that.start_) && Contains(that.start_ + (that.size_ - 1)); + } + bool ImmediatelyPrecedes(const Interval &that) const { + return NextAfter() == that.start_; + } + void Annex(const Interval &that) { + size_ = (that.start_ + that.size_) - start_; + } + bool AnnexIfPredecessor(const Interval &that) { + if (ImmediatelyPrecedes(that)) { + size_ += that.size_; + return true; + } + return false; + } + + std::size_t MemberOffset(const A &x) const { + CHECK(Contains(x)); + return x - start_; + } + A OffsetMember(std::size_t n) const { + CHECK(n < size_); + return start_ + n; + } + + A Last() const { return start_ + (size_ - 1); } + A NextAfter() const { return start_ + size_; } + Interval Prefix(std::size_t n) const { return {start_, std::min(size_, n)}; } + Interval Suffix(std::size_t n) const { + CHECK(n <= size_); + return {start_ + n, size_ - n}; + } + +private: + A start_; + std::size_t size_{0}; +}; +} // namespace parser +} // namespace Fortran +#endif // FORTRAN_PARSER_INTERVAL_H_ diff --git a/flang/lib/parser/message.cc b/flang/lib/parser/message.cc index 85c629f..0552753 100644 --- a/flang/lib/parser/message.cc +++ b/flang/lib/parser/message.cc @@ -1,5 +1,6 @@ #include "message.h" #include +#include #include #include @@ -7,7 +8,7 @@ namespace Fortran { namespace parser { std::ostream &operator<<(std::ostream &o, const MessageFixedText &t) { - for (size_t j{0}; j < t.size(); ++j) { + for (std::size_t j{0}; j < t.size(); ++j) { o << t.str()[j]; } return o; @@ -41,7 +42,7 @@ MessageFixedText MessageExpectedText::AsMessageFixedText() const { if (chars[1] == '\0') { // one-time initialization of array used for permanant single-byte string // pointers - for (size_t j{0}; j < sizeof chars; ++j) { + for (std::size_t j{0}; j < sizeof chars; ++j) { chars[j] = j; } } diff --git a/flang/lib/parser/message.h b/flang/lib/parser/message.h index b586645..4a6fbd0 100644 --- a/flang/lib/parser/message.h +++ b/flang/lib/parser/message.h @@ -6,6 +6,7 @@ #include "idioms.h" #include "provenance.h" +#include #include #include #include @@ -19,7 +20,7 @@ namespace parser { class MessageFixedText { public: MessageFixedText() {} - constexpr MessageFixedText(const char str[], size_t n) + constexpr MessageFixedText(const char str[], std::size_t n) : str_{str}, bytes_{n} {} constexpr MessageFixedText(const MessageFixedText &) = default; MessageFixedText(MessageFixedText &&) = default; @@ -27,17 +28,17 @@ public: MessageFixedText &operator=(MessageFixedText &&) = default; const char *str() const { return str_; } - size_t size() const { return bytes_; } + std::size_t size() const { return bytes_; } bool empty() const { return bytes_ == 0; } std::string ToString() const; private: const char *str_{nullptr}; - size_t bytes_{0}; + std::size_t bytes_{0}; }; -constexpr MessageFixedText operator""_en_US(const char str[], size_t n) { +constexpr MessageFixedText operator""_en_US(const char str[], std::size_t n) { return MessageFixedText{str, n}; } @@ -55,14 +56,14 @@ private: // Represents a formatted rendition of "expected '%s'"_en_US on a constant text. class MessageExpectedText { public: - MessageExpectedText(const char *s, size_t n) : str_{s}, bytes_{n} {} + MessageExpectedText(const char *s, std::size_t n) : str_{s}, bytes_{n} {} explicit MessageExpectedText(char ch) : singleton_{ch} {} MessageFixedText AsMessageFixedText() const; private: const char *str_{nullptr}; char singleton_; - size_t bytes_{1}; + std::size_t bytes_{1}; }; class Message; @@ -132,7 +133,7 @@ public: const AllSources &allSources() const { return allSources_; } Message &Put(Message &&m) { - CHECK(m.provenance() < allSources_.size()); + CHECK(allSources_.IsValid(m.provenance())); if (messages_.empty()) { messages_.emplace_front(std::move(m)); last_ = messages_.begin(); diff --git a/flang/lib/parser/parse-state.h b/flang/lib/parser/parse-state.h index abed103..a6e6915 100644 --- a/flang/lib/parser/parse-state.h +++ b/flang/lib/parser/parse-state.h @@ -11,6 +11,7 @@ #include "idioms.h" #include "message.h" #include "provenance.h" +#include #include #include #include @@ -53,7 +54,7 @@ public: } void swap(ParseState &that) { - constexpr size_t bytes{sizeof *this}; + constexpr std::size_t bytes{sizeof *this}; char buffer[bytes]; std::memcpy(buffer, this, bytes); std::memcpy(this, &that, bytes); diff --git a/flang/lib/parser/parse-tree-visitor.h b/flang/lib/parser/parse-tree-visitor.h index d89dc96..2760809 100644 --- a/flang/lib/parser/parse-tree-visitor.h +++ b/flang/lib/parser/parse-tree-visitor.h @@ -2,6 +2,7 @@ #define FORTRAN_PARSER_PARSE_TREE_VISITOR_H_ #include "parse-tree.h" +#include #include #include #include @@ -17,7 +18,7 @@ namespace Fortran { namespace parser { -// Default case for visitation of non-class data members (and strings) +// Default case for visitation of non-class data members and strings template typename std::enable_if || std::is_same_v>::type @@ -45,7 +46,7 @@ template void Walk(const std::list &x, V &visitor) { Walk(elem, visitor); } } -template +template void ForEachInTuple(const T &tuple, Func func) { if constexpr (I < std::tuple_size_v) { func(std::get(tuple)); @@ -130,6 +131,12 @@ template void Walk(const Statement &x, V &visitor) { } } +template void Walk(const Name &x, V &visitor) { + if (visitor.Pre(x)) { + visitor.Post(x); + } +} + template void Walk(const AcSpec &x, V &visitor) { if (visitor.Pre(x)) { Walk(x.type, visitor); @@ -240,13 +247,16 @@ template void Walk(const ReadStmt &x, V &visitor) { } template void Walk(const RealLiteralConstant &x, V &visitor) { if (visitor.Pre(x)) { - Walk(x.intPart, visitor); - Walk(x.fraction, visitor); - Walk(x.exponent, visitor); + Walk(x.real, visitor); Walk(x.kind, visitor); visitor.Post(x); } } +template void Walk(const RealLiteralConstant::Real &x, V &visitor) { + if (visitor.Pre(x)) { + visitor.Post(x); + } +} template void Walk(const StructureComponent &x, V &visitor) { if (visitor.Pre(x)) { Walk(x.base, visitor); diff --git a/flang/lib/parser/parse-tree.cc b/flang/lib/parser/parse-tree.cc index cf98176..315edef 100644 --- a/flang/lib/parser/parse-tree.cc +++ b/flang/lib/parser/parse-tree.cc @@ -6,33 +6,6 @@ namespace Fortran { namespace parser { -// R714 real-literal-constant -// R715 significand -static std::string charListToString(std::list &&cs) { - std::string result; - for (auto ch : cs) { - result += ch; - } - return result; -} - -RealLiteralConstant::RealLiteralConstant(std::list &&i, - std::list &&f, std::optional &&expo, - std::optional &&k) - : intPart{charListToString(std::move(i))}, fraction{charListToString( - std::move(f))}, - exponent(std::move(expo)), kind(std::move(k)) {} - -RealLiteralConstant::RealLiteralConstant(std::list &&f, - std::optional &&expo, std::optional &&k) - : fraction{charListToString(std::move(f))}, exponent(std::move(expo)), - kind(std::move(k)) {} - -RealLiteralConstant::RealLiteralConstant( - std::list &&i, ExponentPart &&expo, std::optional &&k) - : intPart{charListToString(std::move(i))}, exponent(std::move(expo)), - kind(std::move(k)) {} - // R867 ImportStmt::ImportStmt(Kind &&k, std::list &&n) : kind{k}, names(std::move(n)) { @@ -67,7 +40,7 @@ ProcedureDesignator Designator::ConvertToProcedureDesignator() { }, [](Substring &) -> ProcedureDesignator { CHECK(!"can't get here"); - return {Name{""}}; + return {Name{}}; }}, u); } @@ -158,7 +131,7 @@ ActualArg SectionSubscript::ConvertToActualArgument() { }, [](SubscriptTriplet &) -> ActualArg { CHECK(!"can't happen"); - return {Name{"bad"}}; + return {Name{}}; }}, u); } diff --git a/flang/lib/parser/parse-tree.h b/flang/lib/parser/parse-tree.h index e5bab7f..0caf8d6 100644 --- a/flang/lib/parser/parse-tree.h +++ b/flang/lib/parser/parse-tree.h @@ -9,6 +9,8 @@ // run-time I/O support library have been isolated into a distinct header file // (viz., format-specification.h). +#include "char-block.h" +#include "characters.h" #include "format-specification.h" #include "idioms.h" #include "indirection.h" @@ -39,14 +41,19 @@ CLASS_TRAIT(UnionTrait); CLASS_TRAIT(TupleTrait); // Most non-template classes in this file use these default definitions -// for their move constructor and move assignment operator=. -#define BOILERPLATE(classname) \ +// for their move constructor and move assignment operator=, and disable +// their copy constructor and copy assignment operator=. +#define COPY_AND_ASSIGN_BOILERPLATE(classname) \ classname(classname &&) = default; \ classname &operator=(classname &&) = default; \ - classname() = delete; \ classname(const classname &) = delete; \ classname &operator=(const classname &) = delete +// Almost all classes in this file have no default constructor. +#define BOILERPLATE(classname) \ + COPY_AND_ASSIGN_BOILERPLATE(classname); \ + classname() = delete + // Empty classes are often used below as alternatives in std::variant<> // discriminated unions. #define EMPTY_CLASS(classname) \ @@ -229,8 +236,10 @@ struct AssignStmt; struct AssignedGotoStmt; struct PauseStmt; +// Cooked character stream locations +using Location = const char *; + // Implicit definitions of the Standard -using Keyword = std::string; // R403 scalar-xyz -> xyz // These template class wrappers correspond to the Standard's modifiers @@ -290,10 +299,10 @@ using Label = std::uint64_t; // validated later, must be in [1..99999] // A wrapper for xzy-stmt productions that are statements, so that // source provenances and labels have a uniform representation. template struct Statement { - Statement(Provenance &&at, std::optional &&lab, bool &&accept, A &&s) - : provenance(at), label(std::move(lab)), isLabelInAcceptableField{accept}, + Statement(std::optional &&lab, bool &&accept, A &&s) + : label(std::move(lab)), isLabelInAcceptableField{accept}, statement(std::move(s)) {} - Provenance provenance; + CharBlock source; std::optional