From: peter klausler
Date: Mon, 30 Mar 2020 23:37:30 +0000 (-0700)
Subject: [flang] Define CHARACTER runtime API, establish placeholder implementations
X-Git-Tag: llvmorg-12-init~9537^2~10
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=4d54bb7af8ab64575d10331f3cdd902660908d27;p=platform%2Fupstream%2Fllvm.git

[flang] Define CHARACTER runtime API, establish placeholder implementations

formatting

Original-commit: flang-compiler/f18@1d287d9d59eaee357db604dd9986c74c298a7b4d
Reviewed-on: https://github.com/flang-compiler/f18/pull/1096
---

diff --git a/flang/documentation/Calls.md b/flang/documentation/Calls.md
index bd57dee..79f0d97 100644
--- a/flang/documentation/Calls.md
+++ b/flang/documentation/Calls.md
@@ -1,9 +1,9 @@
-
 ## Procedure reference implementation protocol
@@ -182,6 +182,7 @@ some design alternatives that are explored further below.
    discretionary `VALUE` arguments) into registers.
 1. Marshal `CHARACTER` argument lengths in additional value arguments for
    `CHARACTER` effective arguments not passed via descriptors.
+   These lengths must be 64-bit integers.
 1. Marshal an extra argument for the length of a `CHARACTER` function
    result if the function is F77ish.
 1. Marshal an extra argument for the function result's descriptor,
diff --git a/flang/documentation/Character.md b/flang/documentation/Character.md
new file mode 100644
index 0000000..d1c7ca4
--- /dev/null
+++ b/flang/documentation/Character.md
@@ -0,0 +1,147 @@
+
+
+## Implementation of `CHARACTER` types in f18
+
+### Kinds and Character Sets
+
+The f18 compiler and runtime support three kinds of the intrinsic
+`CHARACTER` type of Fortran 2018.
+The default (`CHARACTER(KIND=1)`) holds 8-bit character codes;
+`CHARACTER(KIND=2)` holds 16-bit character codes;
+and `CHARACTER(KIND=4)` holds 32-bit character codes.
+
+We assume that code values 0 through 127 correspond to
+the 7-bit ASCII character set (ISO-646) in every kind of `CHARACTER`.
+This is a valid assumption for Unicode (UCS == ISO/IEC-10646),
+ISO-8859, and many legacy character sets and interchange formats.
+
+`CHARACTER` data in memory and unformatted files are not in an
+interchange representation (like UTF-8, Shift-JIS, EUC-JP, or a JIS X).
+Each character's code in memory occupies a 1-, 2-, or 4-byte
+word and substrings can be indexed with simple arithmetic.
+In formatted I/O, however, `CHARACTER` data may be assumed to use
+the UTF-8 variable-length encoding when it is selected with
+`OPEN(ENCODING='UTF-8')`.
+
+`CHARACTER(KIND=1)` literal constants in Fortran source files,
+Hollerith constants, and formatted I/O with `ENCODING='DEFAULT'`
+are not translated.
+
+For the purposes of non-default-kind `CHARACTER` constants in Fortran
+source files, formatted I/O with `ENCODING='UTF-8'` or non-default-kind
+`CHARACTER` values, and conversions between kinds of `CHARACTER`,
+by default:
+* `CHARACTER(KIND=1)` is assumed to be ISO-8859-1 (Latin-1),
+* `CHARACTER(KIND=2)` is assumed to be UCS-2 (16-bit Unicode), and
+* `CHARACTER(KIND=4)` is assumed to be UCS-4 (full Unicode in a 32-bit word).
+
+In particular, conversions between kinds are assumed to be
+simple zero-extensions or truncation, not table look-ups.
+
+We might want to support one or more environment variables to change these
+assumptions, especially for `KIND=1` users of ISO-8859 character sets
+besides Latin-1.
+
+### Lengths
+
+Allocatable `CHARACTER` objects in Fortran may defer the specification
+of their lengths until the time of their allocation or whole (non-substring)
+assignment.
+Non-allocatable objects (and non-deferred-length allocatables) have
+lengths that are fixed or assumed from an actual argument, or,
+in the case of assumed-length `CHARACTER` functions, their local
+declaration in the calling scope.
+
+The elements of `CHARACTER` arrays have the same length.
+
+Assignments to targets that are not deferred-length allocatables will
+truncate or pad the assigned value to the length of the left-hand side
+of the assignment.
+
+Lengths and offsets that are used by or exposed to Fortran programs via
+declarations, substring bounds, and the `LEN()` intrinsic function are always
+represented in units of characters, not bytes.
+In generated code, assumed-length arguments, the runtime support library,
+and in the `elem_len` field of the interoperable descriptor `cdesc_t`,
+lengths are always in units of bytes.
+The distinction matters only for kinds other than the default.
+
+Fortran substrings are rather like subscript triplets into a hidden
+"zero" dimension of a scalar `CHARACTER` value, but they cannot have
+strides.
+
+### Concatenation
+
+Fortran has one `CHARACTER`-valued intrinsic operator, `//`, which
+concatenates its operands (10.1.5.3).
+The operands must have the same kind type parameter.
+One or both of the operands may be arrays; if both are arrays, their
+shapes must be identical.
+The effective length of the result is the sum of the lengths of the
+operands.
+Parentheses may be ignored, so any `CHARACTER`-valued expression
+may be "flattened" into a single sequence of concatenations.
+
+The result of `//` may be used
+* as an operand to another concatenation,
+* as an operand of a `CHARACTER` relation,
+* as an actual argument,
+* as the right-hand side of an assignment,
+* as the `SOURCE=` or `MOLD=` of an `ALLOCATE` statement,
+* as the selector or case-expr of an `ASSOCIATE` or `SELECT` construct,
+* as a component of a structure or array constructor,
+* as the value of a named constant or initializer,
+* as the `NAME=` of a `BIND(C)` attribute,
+* as the stop-code of a `STOP` statement,
+* as the value of a specifier of an I/O statement,
+* or as the value of a statement function.
+
+The f18 compiler has a general (but slow) means of implementing concatenation
+and a specialized (fast) option to optimize the most common case.
+
+#### General concatenation
+
+In the most general case, the f18 compiler's generated code and
+runtime support library represent the result as a deferred-length allocatable
+`CHARACTER` temporary scalar or array variable that is initialized
+as a zero-length array by `AllocatableInitCharacter()`
+and then progressively augmented in place by the values of each of the
+operands of the concatenation sequence in turn with calls to
+`CharacterConcatenate()`.
+Conformability errors are fatal -- Fortran has no means by which a program
+may recover from them.
+The result is then used as any other deferred-length allocatable
+array or scalar would be, and finally deallocated like any other
+allocatable.
+
+The runtime routine `CharacterAssign()` takes care of
+truncating, padding, or replicating the value(s) assigned to the left-hand
+side, as well as reallocating a nonconforming or deferred-length allocatable
+left-hand side. It takes the descriptors of the left- and right-hand sides of
+a `CHARACTER` assignment as its arguments.
+
+When the left-hand side of a `CHARACTER` assignment is a deferred-length
+allocatable and the right-hand side is a temporary, use of the runtime's
+`MoveAlloc()` subroutine instead can save an allocation and a copy.
+
+#### Optimized concatenation
+
+Scalar `CHARACTER(KIND=1)` expressions evaluated as the right-hand sides of
+assignments to independent substrings or whole variables that are not
+deferred-length allocatables can be optimized into a sequence of
+calls to the runtime support library that do not allocate temporary
+memory.
+
+The routine `CharacterAppend()` copies data from the right-hand side value
+to the remaining space, if any, in the left-hand side object, and returns
+the new offset of the reduced remaining space.
+It is essentially `memcpy(lhs + offset, rhs, min(lhsLength - offset, rhsLength))`.
+It does nothing when `offset > lhsLength`.
+
+`CharacterPad()` adds any necessary trailing blank characters.
diff --git a/flang/runtime/CMakeLists.txt b/flang/runtime/CMakeLists.txt
index 066cb7d..9232d80 100644
--- a/flang/runtime/CMakeLists.txt
+++ b/flang/runtime/CMakeLists.txt
@@ -28,7 +28,9 @@ configure_file(config.h.cmake config.h)
 
 add_library(FortranRuntime
   ISO_Fortran_binding.cpp
+  allocatable.cpp
   buffer.cpp
+  character.cpp
   connection.cpp
   derived-type.cpp
   descriptor.cpp
diff --git a/flang/runtime/allocatable.cpp b/flang/runtime/allocatable.cpp
new file mode 100644
index 0000000..b47a40e
--- /dev/null
+++ b/flang/runtime/allocatable.cpp
@@ -0,0 +1,45 @@
+//===-- runtime/allocatable.cpp ---------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "allocatable.h"
+#include "terminator.h"
+
+namespace Fortran::runtime {
+extern "C" {
+
+void RTNAME(AllocatableInitIntrinsic)(
+    Descriptor &, TypeCategory, int /*kind*/, int /*rank*/, int /*corank*/) {
+  // TODO
+}
+
+void RTNAME(AllocatableInitCharacter)(Descriptor &, SubscriptValue /*length*/,
+    int /*kind*/, int /*rank*/, int /*corank*/) {
+  // TODO
+}
+
+void RTNAME(AllocatableInitDerived)(
+    Descriptor &, const DerivedType &, int /*rank*/, int /*corank*/) {
+  // TODO
+}
+
+void RTNAME(AllocatableAssign)(Descriptor &to, const Descriptor & /*from*/) {}
+
+int RTNAME(MoveAlloc)(Descriptor &to, const Descriptor & /*from*/,
+    bool /*hasStat*/, Descriptor * /*errMsg*/, const char * /*sourceFile*/,
+    int /*sourceLine*/) {
+  // TODO
+  return 0;
+}
+
+int RTNAME(AllocatableDeallocate)(Descriptor &, bool /*hasStat*/,
+    Descriptor * /*errMsg*/, const char * /*sourceFile*/, int /*sourceLine*/) {
+  // TODO
+  return 0;
+}
+}
+} // namespace Fortran::runtime
diff --git a/flang/runtime/allocatable.h b/flang/runtime/allocatable.h
index d63bd2c..c65ede2 100644
--- a/flang/runtime/allocatable.h
+++ b/flang/runtime/allocatable.h
@@ -26,7 +26,7 @@ extern "C" {
 // a change of type, rank, or corank.
 void RTNAME(AllocatableInitIntrinsic)(
     Descriptor &, TypeCategory, int kind, int rank = 0, int corank = 0);
-void RTNAME(AllocatableInitCharacter)(Descriptor &, SubscriptValue length,
+void RTNAME(AllocatableInitCharacter)(Descriptor &, SubscriptValue length = 0,
     int kind = 1, int rank = 0, int corank = 0);
 void RTNAME(AllocatableInitDerived)(
     Descriptor &, const DerivedType &, int rank = 0, int corank = 0);
@@ -94,7 +94,7 @@ int RTNAME(AllocatableAllocateSource)(Descriptor &, const Descriptor &source,
 // TODO: Consider renaming to a more general name that will work for
 // assignments to pointers, dummy arguments, and anything else with a
 // descriptor.
-void RTNAME(AllocatableAssignment)(Descriptor &to, const Descriptor &from);
+void RTNAME(AllocatableAssign)(Descriptor &to, const Descriptor &from);
 
 // Implements the intrinsic subroutine MOVE_ALLOC (16.9.137 in F'2018,
 // but note the order of first two arguments is reversed for consistency
diff --git a/flang/runtime/character.cpp b/flang/runtime/character.cpp
new file mode 100644
index 0000000..b6a804d
--- /dev/null
+++ b/flang/runtime/character.cpp
@@ -0,0 +1,48 @@
+//===-- runtime/character.cpp -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "character.h"
+#include "terminator.h"
+#include <algorithm>
+#include <cstring>
+
+namespace Fortran::runtime {
+extern "C" {
+
+void RTNAME(CharacterConcatenate)(Descriptor & /*temp*/,
+    const Descriptor & /*operand*/, const char * /*sourceFile*/,
+    int /*sourceLine*/) {
+  // TODO
+}
+
+void RTNAME(CharacterConcatenateScalar)(
+    Descriptor & /*temp*/, const char * /*from*/, std::size_t /*byteLength*/) {
+  // TODO
+}
+
+void RTNAME(CharacterAssign)(Descriptor & /*lhs*/, const Descriptor & /*rhs*/,
+    const char * /*sourceFile*/, int /*sourceLine*/) {
+  // TODO
+}
+
+std::size_t RTNAME(CharacterAppend)(char *lhs, std::size_t lhsLength,
+    std::size_t offset, const char *rhs, std::size_t rhsLength) {
+  if (auto n{std::min(lhsLength - std::min(offset, lhsLength), rhsLength)}) {
+    std::memcpy(lhs + offset, rhs, n);
+    offset += n;
+  }
+  return offset;
+}
+
+void RTNAME(CharacterPad)(char *lhs, std::size_t length, std::size_t offset) {
+  if (length > offset) {
+    std::memset(lhs + offset, ' ', length - offset);
+  }
+}
+}
+} // namespace Fortran::runtime
diff --git a/flang/runtime/character.h b/flang/runtime/character.h
new file mode 100644
index 0000000..ff182de
--- /dev/null
+++ b/flang/runtime/character.h
@@ -0,0 +1,53 @@
+//===-- runtime/character.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// Defines API between compiled code and the CHARACTER
+// support functions in the runtime library.
+
+#ifndef FORTRAN_RUNTIME_CHARACTER_H_
+#define FORTRAN_RUNTIME_CHARACTER_H_
+#include "descriptor.h"
+#include "entry-names.h"
+#include <cstddef>
+
+namespace Fortran::runtime {
+extern "C" {
+
+// Appends the corresponding (or expanded) characters of 'operand'
+// to (the elements of) a (re)allocation of 'temp', which must be an
+// initialized CHARACTER allocatable scalar or array descriptor -- use
+// AllocatableInitCharacter() to set one up. Crashes when not
+// conforming. Assumes independence of data.
+void RTNAME(CharacterConcatenate)(Descriptor &temp, const Descriptor &operand,
+    const char *sourceFile = nullptr, int sourceLine = 0);
+
+// Convenience specialization for character scalars.
+void RTNAME(CharacterConcatenateScalar)(
+    Descriptor &temp, const char *, std::size_t byteLength);
+
+// Assigns the value(s) of 'rhs' to 'lhs'. Handles reallocation,
+// truncation, or padding as necessary. Crashes when not conforming.
+// Assumes independence of data.
+// Call MoveAlloc() instead as an optimization when a temporary value is
+// being assigned to a deferred-length allocatable.
+void RTNAME(CharacterAssign)(Descriptor &lhs, const Descriptor &rhs,
+    const char *sourceFile = nullptr, int sourceLine = 0);
+
+// Special-case support for optimized scalar CHARACTER concatenation
+// expressions.
+
+// Copies data from 'rhs' to the remaining space (lhsLength - offset)
+// in 'lhs', if any. Returns the new offset. Assumes independence.
+std::size_t RTNAME(CharacterAppend)(char *lhs, std::size_t lhsLength,
+    std::size_t offset, const char *rhs, std::size_t rhsLength);
+
+// Appends any necessary spaces to a CHARACTER(KIND=1) scalar.
+void RTNAME(CharacterPad)(char *lhs, std::size_t length, std::size_t offset);
+}
+} // namespace Fortran::runtime
+#endif // FORTRAN_RUNTIME_CHARACTER_H_
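
Illustrative note on the optimized concatenation path: the "Optimized concatenation" section of Character.md describes lowering a scalar `CHARACTER(KIND=1)` assignment such as `lhs = a // b` into a run of append calls followed by one pad call, with no temporary allocation. The sketch below tries to show that call sequence outside the runtime. It is an assumption-laden illustration, not the runtime's code: `AppendSketch`, `PadSketch`, and `AssignConcatSketch` are hypothetical stand-ins (the real entry points are declared through `RTNAME(...)` in `character.h`), and the offset is clamped explicitly so that the "does nothing when `offset > lhsLength`" behavior stated in the documentation holds under unsigned arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for CharacterAppend(): copies as much of rhs as fits
// into lhs starting at offset, and returns the new offset.
std::size_t AppendSketch(char *lhs, std::size_t lhsLength, std::size_t offset,
    const char *rhs, std::size_t rhsLength) {
  // Clamp offset first so the unsigned subtraction cannot wrap around.
  std::size_t room{lhsLength - std::min(offset, lhsLength)};
  if (std::size_t n{std::min(room, rhsLength)}) {
    std::memcpy(lhs + offset, rhs, n);
    offset += n;
  }
  return offset;
}

// Hypothetical stand-in for CharacterPad(): blank-fills lhs[offset..length).
void PadSketch(char *lhs, std::size_t length, std::size_t offset) {
  if (length > offset) {
    std::memset(lhs + offset, ' ', length - offset);
  }
}

// Models "lhs = a // b" for a fixed-length KIND=1 scalar of lhsLength bytes:
// one append per operand, then a single pad; no temporary is allocated for
// the concatenation result itself.
std::string AssignConcatSketch(
    std::size_t lhsLength, const std::string &a, const std::string &b) {
  std::string lhs(lhsLength, '?'); // stands in for the target's storage
  std::size_t offset{0};
  offset = AppendSketch(&lhs[0], lhsLength, offset, a.data(), a.size());
  offset = AppendSketch(&lhs[0], lhsLength, offset, b.data(), b.size());
  PadSketch(&lhs[0], lhsLength, offset);
  return lhs;
}
```

With an 8-byte target, `AssignConcatSketch(8, "abc", "de")` yields `abcde` followed by three blanks; with a 4-byte target the second operand is truncated to its first character; with a 2-byte target the second append copies nothing. Threading the returned offset through each call is what lets truncation and padding fall out of the same sequence without any length bookkeeping in the generated code.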