abseil-redundant-strcat-calls
=============================
Suggests removal of unnecessary calls to ``absl::StrCat`` when the result is
being passed to another call to ``absl::StrCat`` or ``absl::StrAppend``.
The extra calls cause unnecessary temporary strings to be constructed. Removing
absl::StrAppend(&s, absl::StrCat("E", "F", "G"));
//before

absl::StrAppend(&s, "E", "F", "G");
//after
abseil-str-cat-append
=====================
Flags uses of ``absl::StrCat()`` to append to a ``std::string`` and suggests
using ``absl::StrAppend()`` instead.
The extra calls cause unnecessary temporary strings to be constructed. Removing
bugprone-bad-signal-to-kill-thread
==================================
Finds ``pthread_kill`` function calls when a thread is terminated by
raising the ``SIGTERM`` signal, which kills the entire process, not
just the individual thread. Use any signal except ``SIGTERM``.
.. code-block:: c++
.. option:: WantToUseSafeFunctions
The value `true` specifies that the target environment is considered to
implement ``_s``-suffixed memory and string handler functions which are safer
than older versions (e.g. ``memcpy_s()``). The default value is `true`.
`cert-dcl37-c` and `cert-dcl51-cpp` redirect here as an alias for this check.
Checks for usages of identifiers reserved for use by the implementation.
The C and C++ standards both reserve the following names for such use:
- identifiers in the global namespace that begin with an underscore.
The C standard additionally reserves names beginning with a double underscore,
while the C++ standard strengthens this to reserve names with a double
underscore occurring anywhere.
Violating the naming rules above results in undefined behavior.
.. code-block:: c++
namespace NS {
void __f(); // name is not allowed in user code
using _Int = int; // same with this
#define cool__macro // also this
}
int _g(); // disallowed in global namespace only
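The rules above can be summarized as a small classifier. The sketch below is a simplified illustration, not the check's implementation: ``isReservedCxxName`` is a hypothetical helper, it ignores macros, keywords and the inverted mode, and it also applies the C++ standard's rule that a name beginning with an underscore followed by an uppercase letter is reserved in every scope.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper: returns true if `Name` is reserved for the
// implementation under the C++ rules, given whether it is declared in the
// global namespace.
bool isReservedCxxName(std::string_view Name, bool InGlobalNamespace) {
  // C++: a double underscore anywhere in the name is reserved.
  if (Name.find("__") != std::string_view::npos)
    return true;
  if (!Name.empty() && Name[0] == '_') {
    // An underscore followed by an uppercase letter is reserved everywhere.
    if (Name.size() > 1 && std::isupper(static_cast<unsigned char>(Name[1])))
      return true;
    // Any other leading underscore is reserved in the global namespace only.
    return InGlobalNamespace;
  }
  return false;
}
```

With this helper, ``__f``, ``_Int`` and ``cool__macro`` are reserved in any scope, while ``_g`` is reserved only when declared at global scope.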
The check can also be inverted, i.e. it can be configured to flag any
identifier that is *not* a reserved identifier. This mode is for use by, e.g.,
standard library implementors, to ensure they don't infringe on the user
namespace.
This check does not (yet) check for other reserved names, e.g. macro names
identical to language keywords, and names specifically reserved by language
standards, e.g. C++ 'zombie names' and C future library directions.
This check corresponds to CERT C Coding Standard rule `DCL37-C. Do not declare
or define a reserved identifier
<https://wiki.sei.cmu.edu/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier>`_
as well as its C++ counterpart, `DCL51-CPP. Do not declare or define a reserved
identifier
<https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL51-CPP.+Do+not+declare+or+define+a+reserved+identifier>`_.
Options
.. option:: Invert
If `true`, inverts the check, i.e. flags names that are not reserved.
Default is `false`.
.. option:: AllowedIdentifiers
Semicolon-separated list of names that the check ignores. Default is an
empty list.
assumable that the reason is that the list was not updated for C11.
The checker includes ``quick_exit`` in the set of safe functions.
Functions registered as exit handlers are not checked.
Default is ``POSIX``.
bugprone-spuriously-wake-up-functions
=====================================
Finds ``cnd_wait``, ``cnd_timedwait``, ``wait``, ``wait_for``, or
``wait_until`` function calls when the function is not invoked from a loop
that checks whether a condition predicate holds or the function has a
condition parameter.
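A minimal sketch of the two forms described above, using ``std::condition_variable``: a ``wait`` inside a loop that re-checks the predicate, and the overload that takes the predicate as a parameter. The globals and function names are illustrative only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

std::mutex Mutex;
std::condition_variable Cond;
bool Ready = false;

// Accepted: the wait is inside a loop that re-checks the predicate, so a
// spurious wake-up simply goes back to waiting.
void waitWithLoop() {
  std::unique_lock<std::mutex> Lock(Mutex);
  while (!Ready)
    Cond.wait(Lock);
}

// Also accepted: the predicate overload performs the same loop internally.
void waitWithPredicate() {
  std::unique_lock<std::mutex> Lock(Mutex);
  Cond.wait(Lock, [] { return Ready; });
}

// Flagged (shown for contrast, not compiled): a bare wait can return even
// though no thread has set `Ready`.
//   std::unique_lock<std::mutex> Lock(Mutex);
//   Cond.wait(Lock);

// Sets the predicate under the lock, then wakes all waiters.
void signalReady() {
  {
    std::lock_guard<std::mutex> Guard(Mutex);
    Ready = true;
  }
  Cond.notify_all();
}
```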
.. code-block:: c++
The checker detects various cases when an enum is probably misused (as a bitmask).

1. When "ADD" or "bitwise OR" is used between two enums which come from different
types and these types' value ranges are not disjoint.
The following cases will be investigated only using :option:`StrictMode`. We
regard the enum as a (suspicious)
bitmask if the three conditions below are true at the same time:
enum { A, B, C };
enum { D, E, F = 5 };
enum { G = 10, H = 11, I = 12 };

unsigned flag;
flag = A | H; // OK, disjoint value intervals in the enum types -> probably good use.
flag = B | F; // Warning, have common values so they are probably misused.

// Case 2:
enum Bitmask {
A = 0,
F = 16,
G = 31 // OK, real bitmask.
};

enum Almostbitmask {
AA = 0,
BB = 1,
FF = 16,
GG // Problem, forgot to initialize.
};

unsigned flag = 0;
flag |= E; // OK.
flag |=
Objects with the same value may not have the same object representation.
This may be caused by padding or floating-point types.
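For example, on platforms with IEEE 754 ``double``, ``0.0`` and ``-0.0`` compare equal but differ in the sign bit, so a byte-wise comparison disagrees with ``operator==``. The helper ``sameBytes`` is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Returns true if the two doubles have identical object representations.
bool sameBytes(double A, double B) {
  return std::memcmp(&A, &B, sizeof(double)) == 0;
}
```

Conversely, a NaN compares unequal to itself even when its bytes are identical, so neither comparison can stand in for the other.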
See also:
`EXP42-C. Do not compare padding data
<https://wiki.sei.cmu.edu/confluence/display/c/EXP42-C.+Do+not+compare+padding+data>`_
and
`FLP37-C. Do not use object representations to compare floating-point values
<https://wiki.sei.cmu.edu/confluence/display/c/FLP37-C.+Do+not+use+object+representations+to+compare+floating-point+values>`_
This check is also related to and partially overlaps the CERT C++ Coding Standard rules
`OOP57-CPP. Prefer special member functions and overloaded operators to C Standard Library functions
<https://wiki.sei.cmu.edu/confluence/display/cplusplus/OOP57-CPP.+Prefer+special+member+functions+and+overloaded+operators+to+C+Standard+Library+functions>`_
and
============
The cert-dcl37-c check is an alias, please see
`bugprone-reserved-identifier <bugprone-reserved-identifier.html>`_ for more
information.
==============
The cert-dcl51-cpp check is an alias, please see
`bugprone-reserved-identifier <bugprone-reserved-identifier.html>`_ for more
information.
std::mt19937 engine2(1); // Diagnose
engine1.seed(); // Diagnose
engine2.seed(1); // Diagnose

std::time_t t;
engine1.seed(std::time(&t)); // Diagnose, system time might be controlled by user
`std::memcmp`, `memcmp`, `std::strcmp`, `strcmp`, `strncmp`.
This check corresponds to the CERT C++ Coding Standard rule
`OOP57-CPP. Prefer special member functions and overloaded operators to C
Standard Library functions
<https://wiki.sei.cmu.edu/confluence/display/cplusplus/OOP57-CPP.+Prefer+special+member+functions+and+overloaded+operators+to+C+Standard+Library+functions>`_.
The usage of ``goto`` for control flow is error-prone and should be replaced
with looping constructs. Only forward jumps in nested loops are accepted.
This check implements `ES.76 <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es76-avoid-goto>`_
from the CppCoreGuidelines and
`6.3.1 from High Integrity C++ <http://www.codingstandard.com/rule/6-3-1-ensure-that-the-labels-for-a-jump-statement-or-a-switch-condition-appear-later-in-the-same-or-an-enclosing-block/>`_.
For more information on why to avoid programming
with ``goto``, you can read the famous paper `A Case against the GO TO Statement. <https://www.cs.utexas.edu/users/EWD/ewd02xx/EWD215.PDF>`_.
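As a sketch of the one accepted pattern, the hypothetical ``findWithGoto`` uses a forward ``goto`` to leave two nested loops; ``findWithReturn`` shows a common alternative that avoids the jump by returning early from a function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Accepted per the description above: a forward jump out of nested loops.
std::pair<int, int> findWithGoto(const Grid &G, int Value) {
  std::pair<int, int> Result{-1, -1};
  for (int R = 0; R < static_cast<int>(G.size()); ++R)
    for (int C = 0; C < static_cast<int>(G[R].size()); ++C)
      if (G[R][C] == Value) {
        Result = {R, C};
        goto Done;
      }
Done:
  return Result;
}

// Usually clearer: an early return replaces the jump entirely.
std::pair<int, int> findWithReturn(const Grid &G, int Value) {
  for (int R = 0; R < static_cast<int>(G.size()); ++R)
    for (int C = 0; C < static_cast<int>(G[R].size()); ++C)
      if (G[R][C] == Value)
        return {R, C};
  return {-1, -1};
}
```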
The check diagnoses ``goto`` for backward jumps in every language mode. These
Finds macro usage that is considered problematic because better language
constructs exist for the task.
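A minimal sketch of the replacements these guidelines point to: a ``constexpr`` variable instead of an object-like macro, a function template instead of a function-like macro, and an enumeration instead of a group of related constants. All names are hypothetical.

```cpp
#include <cassert>

// Flagged style (kept as comments):
//   #define BUFFER_SIZE 256
//   #define SQUARE(x) ((x) * (x))
//   #define COLOR_RED 0
//   #define COLOR_GREEN 1

// Constant: a typed, scoped constexpr variable instead of an object-like macro.
constexpr int BufferSize = 256;

// "Function": a constexpr function template instead of a function-like macro;
// the argument is evaluated exactly once.
template <typename T> constexpr T square(T X) { return X * X; }

// Related constants: an enumeration instead of a group of macros.
enum class Color { Red, Green };
```

Unlike ``SQUARE(I++)``, which expands to two unsequenced increments of ``I``, ``square(I++)`` evaluates its argument exactly once.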
The relevant sections in the C++ Core Guidelines are
`Enum.1 <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#enum1-prefer-enumerations-over-macros>`_,
`ES.30 <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es30-dont-use-macros-for-program-text-manipulation>`_,
`ES.31 <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es31-dont-use-macros-for-constants-or-functions>`_ and
.. option:: AllowedRegexp
A regular expression to filter allowed macros. For example
`DEBUG*|LIBTORRENT*|TORRENT*|UNI*` could be applied to filter `libtorrent`.
Default value is `^DEBUG_*`.
This check handles C-Style memory management using ``malloc()``, ``realloc()``,
``calloc()`` and ``free()``. It warns about its use and tries to suggest the use
of an appropriate RAII object.
Furthermore, it can be configured to check against a user-specified list of functions
that are used for memory management (e.g. ``posix_memalign()``).
See `C++ Core Guidelines <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rr-mallocfree>`_.
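As an illustration of the suggested direction, the hypothetical ``sumManual`` below manages a buffer with ``malloc``/``free`` and would be reported, while ``sumRaii`` uses ``std::vector`` and ``makeAnswer`` uses ``std::make_unique`` (C++14) for a single object.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <numeric>
#include <vector>

// Flagged: manual allocation and release; every exit path must call free().
int sumManual(int N) {
  int *Buf = static_cast<int *>(std::malloc(N * sizeof(int)));
  if (!Buf)
    return -1;
  for (int I = 0; I < N; ++I)
    Buf[I] = I;
  int Sum = std::accumulate(Buf, Buf + N, 0);
  std::free(Buf);
  return Sum;
}

// RAII alternative: the vector releases its storage automatically.
int sumRaii(int N) {
  std::vector<int> Buf(N);
  std::iota(Buf.begin(), Buf.end(), 0);
  return std::accumulate(Buf.begin(), Buf.end(), 0);
}

// For a single object, std::make_unique replaces malloc + free.
std::unique_ptr<int> makeAnswer() { return std::make_unique<int>(42); }
```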
.. option:: Allocations
Semicolon-separated list of fully qualified names of memory allocation functions.
Defaults to ``::malloc;::calloc``.
.. option:: Deallocations
Semicolon-separated list of fully qualified names of memory deallocation functions.
Defaults to ``::free``.
.. option:: Reallocations
Semicolon-separated list of fully qualified names of memory reallocation functions.
Defaults to ``::realloc``.
cppcoreguidelines-owning-memory
===============================
This check implements the type-based semantics of ``gsl::owner<T*>``, which allows
static analysis on code that uses raw pointers to handle resources like
dynamic memory but won't introduce RAII concepts.
The relevant sections in the `C++ Core Guidelines <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md>`_ are I.11, C.33, R.3 and GSL.Views.
All checks are purely type based and not (yet) flow sensitive.
The following examples will demonstrate the correct and incorrect initializations
of owners; assignment is handled the same way. Note that both ``new`` and
``malloc()``-like resource functions are considered to produce resources.
.. code-block:: c++
// Example Good, Ownership correctly stated
gsl::owner<int*> Owner = new int(42); // Good
delete Owner; // Good as well; it is statically enforced that only owners get deleted

The check will furthermore ensure that functions that expect a ``gsl::owner<T*>`` as
an argument get called with either a ``gsl::owner<T*>`` or a newly created resource.
Limitations
-----------
Using ``gsl::owner<T*>`` in a typedef or alias is not handled correctly.
.. code-block:: c++
The ``gsl::owner<T*>`` is declared as a templated type alias.
In template functions and classes, like in the example below, the information
of the type aliases gets lost. Therefore, using ``gsl::owner<T*>`` in a heavily templated
code base might lead to false positives.
Known code constructs that do not get diagnosed correctly are:
gsl::owner<int*> function_that_returns_owner() { return gsl::owner<int*>(new int(42)); }
// Type deduction does not work for auto variables.
// This is caught by the check and will be noted accordingly.
auto OwnedObject = function_that_returns_owner(); // Type of OwnedObject will be int*
};
// Code that yields a false positive.
OwnedValue<gsl::owner<int*>> Owner(new int(42)); // Type deduction yields T -> int *
// False positive, getValue returns int* and not gsl::owner<int*>
gsl::owner<int*> OwnedInt = Owner.getValue();
Another limitation of the current implementation is that it only performs type-based checking.
Suppose you have code like the following:
.. code-block:: c++
// Two owners with assigned resources
gsl::owner<int*> Owner1 = new int(42);
gsl::owner<int*> Owner2 = new int(42);
Owner2 = Owner1; // Conceptual Leak of initial resource of Owner2!
Owner1 = nullptr;
The semantics of a ``gsl::owner<T*>`` are mostly like those of a ``std::unique_ptr<T>``; therefore,
assignment of two ``gsl::owner<T*>`` is considered a move, which requires that the
resource held by ``Owner2`` has been released before the assignment.
This kind of condition could be caught in later improvements of this check with
flow-sensitive analysis. Currently, the `Clang Static Analyzer` catches this bug
for dynamic memory, but not for general types of resources.
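Until such analysis exists, the leak has to be avoided by hand, by releasing the old resource before the move-like assignment. In this sketch, ``owner`` is a local stand-in for ``gsl::owner`` (which the GSL defines as a plain alias of the pointer type), used so the example compiles without the GSL; ``transfer`` is a hypothetical helper.

```cpp
#include <cassert>

// Stand-in for gsl::owner<T*>: a pure annotation, it does not change the type.
template <typename T> using owner = T;

// Hypothetical helper: moves ownership from `From` to `To` without leaking
// the resource `To` held before. `To` and `From` must not be the same resource.
void transfer(owner<int *> &To, owner<int *> &From) {
  delete To;      // release the old resource first
  To = From;      // then take over the new one
  From = nullptr; // the source no longer owns anything
}
```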
When set to `true` (default is `false`), this check doesn't flag classes with a sole, explicitly
defaulted destructor. An example of such a class is:

.. code-block:: c++

struct A {
virtual ~A() = default;
};

.. option:: AllowMissingMoveFunctions
When set to `true` (default is `false`), this check doesn't flag classes which define no move
operations at all. It still flags classes which define only one of the
move constructor or the move assignment operator. With this option enabled, the following class won't be flagged:

.. code-block:: c++

struct A {
A(const A&);
A& operator=(const A&);
When set to `true` (default is `false`), this check doesn't flag classes which define deleted copy
operations but don't define move operations. This flag is related to Google C++ Style Guide
https://google.github.io/styleguide/cppguide.html#Copyable_Movable_Types. With this option enabled, the
following class won't be flagged:

.. code-block:: c++

struct A {
A(const A&) = delete;
A& operator=(const A&) = delete;
=====================
Finds usages of ``OSSpinlock``, which is deprecated due to potential livelock
problems.
This check will detect the following function invocations:
fuchsia-overloaded-operator
===========================
Warns if an operator is overloaded, except for the assignment (copy and move)
operators.
For example:
fuchsia-statically-constructed-objects
======================================
Warns if global, non-trivial objects with static storage are constructed, unless
the object is statically initialized with a ``constexpr`` constructor or has no
explicit constructor.
For example:
static B b(0); // Warning, as constructor is not constexpr
static C c2(0, 1); // Warning, as constructor is not constexpr

static int i; // No warning, as it is trivial

extern int get_i();
static C c3(get_i()); // Warning, as the constructor is dynamically initialized
fuchsia-trailing-return
=======================
Functions that have trailing returns are disallowed, except for those using
``decltype`` specifiers and lambda with otherwise unutterable return types.
For example:
Exceptions are made for lambdas and ``decltype`` specifiers:
.. code-block:: c++

// No warning
auto lambda = [](double x, double y) -> double { return x + y; };

// No warning
template <typename T1, typename T2>
auto fn(const T1 &lhs, const T2 &rhs) -> decltype(lhs + rhs) {
Finds code that throws exceptions in Objective-C files.
For the same reason as the Google C++ style guide, we prefer not throwing
exceptions from Objective-C code.
The corresponding C++ style guide rule:
hicpp-avoid-goto
================
The `hicpp-avoid-goto` check is an alias for
`cppcoreguidelines-avoid-goto <cppcoreguidelines-avoid-goto.html>`_.
Rule `6.3.1 High Integrity C++ <http://www.codingstandard.com/rule/6-3-1-ensure-that-the-labels-for-a-jump-statement-or-a-switch-condition-appear-later-in-the-same-or-an-enclosing-block/>`_
requires that ``goto`` only skips parts of a block and is not used for other
reasons.
Both coding guidelines implement the same exception to the usage of ``goto``.
hicpp-exception-baseclass
=========================
Ensures that every value in a ``throw`` expression is an instance of
``std::exception``.
This enforces `rule 15.1 <http://www.codingstandard.com/section/15-1-throwing-an-exception/>`_
throw std::runtime_error("runtime error");
throw std::exception();
}
- `cppcoreguidelines-pro-type-static-cast-downcast <cppcoreguidelines-pro-type-static-cast-downcast.html>`_
- `cppcoreguidelines-pro-type-reinterpret-cast <cppcoreguidelines-pro-type-reinterpret-cast.html>`_
- `cppcoreguidelines-pro-type-const-cast <cppcoreguidelines-pro-type-const-cast.html>`_
- `cppcoreguidelines-pro-type-cstyle-cast <cppcoreguidelines-pro-type-cstyle-cast.html>`_
=================
This check is an alias for `cppcoreguidelines-pro-type-member-init <cppcoreguidelines-pro-type-member-init.html>`_.
Implements the check for
`rule 12.4.2 <http://www.codingstandard.com/rule/12-4-2-ensure-that-a-constructor-initializes-explicitly-all-base-classes-and-non-static-data-members/>`_
to initialize class members in the right order.
and `rule 6.1.4 <http://www.codingstandard.com/rule/6-1-4-ensure-that-a-switch-statement-has-at-least-two-case-labels-distinct-from-the-default-label/>`_
of the High Integrity C++ Coding Standard are enforced.
``if-else if`` chains that miss a final ``else`` branch might lead to unexpected
program execution and be the result of a logical error.
If the missing ``else`` branch is intended, you can leave it empty with a clarifying
comment.
void f1() {
int i = determineTheNumber();
if (i > 0) {
// Some Calculation
} else if (i < 0) {
// Precondition violated or something else.
}
// ...
}
}
// Should rather be the following:
if (i == 1) {
// do something here
}
else {
// do something here
}
.. code-block:: c++

// A completely degenerate switch will be diagnosed.
int i = 42;
switch(i) {}
Check for assembler statements. No fix is offered.
Inline assembler is forbidden by the `High Integrity C++ Coding Standard
<http://www.codingstandard.com/section/7-5-the-asm-declaration/>`_
as it restricts the portability of code.
hicpp-signed-bitwise
====================
Finds uses of bitwise operations on signed integer types, which may lead to
undefined or implementation-defined behavior.
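As an illustration: with a negative signed operand, ``<<`` is undefined and ``>>`` is implementation-defined before C++20, while the same operations on unsigned types are fully specified. The sketch keeps bit flags in ``std::uint32_t``; the flag and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Flagged: bitwise operators applied to signed operands.
//   int Flags = -1;
//   int Shifted = Flags << 3; // undefined before C++20
//   int High = Flags >> 1;    // implementation-defined before C++20

// Preferred: keep bit masks in an unsigned type.
constexpr std::uint32_t FlagRead = 1u << 0;
constexpr std::uint32_t FlagWrite = 1u << 1;

// Returns `Flags` with the bits of `Flag` set.
constexpr std::uint32_t setFlag(std::uint32_t Flags, std::uint32_t Flag) {
  return Flags | Flag;
}

// Returns true if any bit of `Flag` is set in `Flags`.
constexpr bool hasFlag(std::uint32_t Flags, std::uint32_t Flag) {
  return (Flags & Flag) != 0;
}
```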
The corresponding rule is defined in the `High Integrity C++ Standard, Section 5.6.1 <http://www.codingstandard.com/section/5-6-shift-operators/>`_.
=============================
This check is an alias for `bugprone-undelegated-constructor <bugprone-undelegated-constructor.html>`_.
Partially implements `rule 12.4.5 <http://www.codingstandard.com/rule/12-4-5-use-delegating-constructors-to-reduce-code-duplication/>`_
to find misplaced constructor calls inside a constructor.
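The call is usually meant to be a delegating constructor, which has to appear in the member initializer list; a ``Ctor(...)`` call in the body only creates and discards a temporary. A sketch with a hypothetical ``Point`` class:

```cpp
#include <cassert>

struct Point {
  int X, Y;
  Point(int X, int Y) : X(X), Y(Y) {}

  // Delegating constructor: initializes *this through Point(int, int).
  Point() : Point(0, 0) {}

  // What the check reports (not compiled): this would construct a temporary
  // Point and leave X and Y of *this uninitialized.
  //   Point(bool) { Point(0, 0); }
};
```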
.. code-block:: c++
Ctor(int, int);
Ctor(Ctor *i) {
// All Ctor() calls result in a temporary object
Ctor(); // did you intend to call a delegated constructor?
Ctor(0); // did you intend to call a delegated constructor?
Ctor(1, 2); // did you intend to call a delegated constructor?
foo();
=======================
This check is an alias for `modernize-use-equals-delete <modernize-use-equals-delete.html>`_.
Implements `rule 12.5.1 <http://www.codingstandard.com/rule/12-5-1-define-explicitly-default-or-delete-implicit-special-member-functions-of-concrete-classes/>`_
to explicitly default or delete special member functions.
==================
This check is an alias for `modernize-use-override <modernize-use-override.html>`_.
Implements `rule 10.2.1 <http://www.codingstandard.com/section/10-2-virtual-functions/>`_ to
declare a virtual function `override` when overriding.
`hicpp-vararg <hicpp-vararg.html>`_, `cppcoreguidelines-pro-type-vararg <cppcoreguidelines-pro-type-vararg.html>`_,
`llvm-else-after-return <llvm-else-after-return.html>`_, `readability-else-after-return <readability-else-after-return.html>`_, "Yes"
`llvm-qualified-auto <llvm-qualified-auto.html>`_, `readability-qualified-auto <readability-qualified-auto.html>`_, "Yes"
Checks that all calls resolve to functions within the ``__llvm_libc`` namespace.
.. code-block:: c++

namespace __llvm_libc {
// Allow calls with the fully qualified name.
.. option:: MaxSize
Determines the maximum size of an object allowed to be caught without
warning. Only applicable if :option:`WarnOnLargeObject` is set to `true`. If
the option is set by the user to `std::numeric_limits<uint64_t>::max()` then
it reverts to the default value.
Default is the size of `size_t`.
to perform an incorrect transformation in the case where the result of the ``bind``
is used in the context of a type erased functor such as ``std::function`` which
allows mismatched arguments. For example:

.. code-block:: c++
}
which is correct.

This check requires using C++14 or higher to run.
``std::ios_base::io_state`` ``std::ios_base::iostate``
``std::ios_base::open_mode`` ``std::ios_base::openmode``
``std::ios_base::seek_dir`` ``std::ios_base::seekdir``
``std::ios_base::streamoff``
``std::ios_base::streampos``
=================================== ===========================
Reverse Iterator Support
------------------------
The converter is also capable of transforming iterator loops which use
``rbegin`` and ``rend`` for looping backwards over a container. Out of the box,
this will automatically happen in C++20 mode using the ``ranges`` library;
however, the check can be configured to work without C++20 by specifying a
function to reverse a range and optionally the header file where that function
lives.
.. option:: UseCxx20ReverseRanges

When set to `true`, convert loops in C++20 or later mode using
``std::ranges::reverse_view``.
Default value is ``true``.
.. option:: MakeReverseRangeFunction
Specifies the function used to reverse an iterator pair; the function should
accept a class with ``rbegin`` and ``rend`` methods and return a
class with ``begin`` and ``end`` methods that call the ``rbegin`` and
``rend`` methods respectively. Common examples are ``ranges::reverse_view``
and ``llvm::reverse``.
.. option:: MakeReverseRangeHeader
Specifies the header file where :option:`MakeReverseRangeFunction` is
declared. For the previous examples this option would be set to
``range/v3/view/reverse.hpp`` and ``llvm/ADT/STLExtras.h`` respectively.
If this is an empty string and :option:`MakeReverseRangeFunction` is set,
the check will proceed on the assumption that the function is already
available in the translation unit.
This can be wrapped in angle brackets to add the include as a
system include.
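To make the contract of :option:`MakeReverseRangeFunction` concrete, here is a minimal sketch of such a function for code that cannot use C++20. ``reverseRange`` and ``ReverseRange`` are hypothetical names (the adapter relies on C++14 return type deduction); in practice the option would more likely name an existing utility such as ``llvm::reverse``.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adapter: its begin() and end() forward to the wrapped
// container's rbegin() and rend().
template <typename Container> class ReverseRange {
  Container &Cont;

public:
  explicit ReverseRange(Container &C) : Cont(C) {}
  auto begin() const { return Cont.rbegin(); }
  auto end() const { return Cont.rend(); }
};

// Hypothetical reverse function: accepts anything with rbegin()/rend().
template <typename Container>
ReverseRange<Container> reverseRange(Container &C) {
  return ReverseRange<Container>(C);
}
```

With a helper like this, :option:`MakeReverseRangeFunction` would be set to ``reverseRange`` and :option:`MakeReverseRangeHeader` to whichever header declares it.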
The second example will also receive a warning that ``randomFunc`` is no longer supported in the same way as before, so if the user wants the same functionality, they will need to change the implementation of ``randomFunc``.
One thing to be aware of here is that ``std::random_device`` is quite expensive to initialize. So if you are using the code in a performance-critical place, you probably want to initialize it elsewhere.
Another thing is that the seeding quality of the suggested fix is quite poor: ``std::mt19937`` has an internal state of 624 32-bit integers, but is only seeded with a single integer. So if you require
higher-quality randomness, you should consider seeding better, for example:
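One possible approach, sketched below, fills the whole state of ``std::mt19937`` from ``std::random_device`` through ``std::seed_seq``. ``makeSeededEngine`` and ``shuffleInPlace`` are hypothetical helpers, and the result is only as good as ``std::random_device`` on the target platform. Note that this calls ``std::random_device`` 624 times, so the cost concern above applies even more strongly.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Seeds std::mt19937 with as many words as its internal state holds
// (std::mt19937::state_size, i.e. 624) instead of a single integer.
std::mt19937 makeSeededEngine() {
  std::random_device Device;
  std::array<std::mt19937::result_type, std::mt19937::state_size> Seeds;
  std::generate(Seeds.begin(), Seeds.end(), std::ref(Device));
  std::seed_seq Seq(Seeds.begin(), Seeds.end());
  return std::mt19937(Seq);
}

// Example use: shuffle a vector with a fully seeded engine.
void shuffleInPlace(std::vector<int> &V) {
  std::mt19937 Engine = makeSeededEngine();
  std::shuffle(V.begin(), V.end(), Engine);
}
```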
.. note::
If the :option:`ReplacementString` is not a C++ attribute, but instead a
macro, then that macro must be defined in scope or the fix-it will not be
applied.
.. note::
Finds improper initialization of ``NSError`` objects.
According to Apple's developer documentation, we should always use the factory method
``errorWithDomain:code:userInfo:`` to create new NSError objects instead
of ``[[NSError alloc] init]``. Otherwise, it will lead to a warning message
at runtime.
.. option:: StringLikeClasses
Semicolon-separated list of names of string-like classes. By default only
``::std::basic_string`` and ``::std::basic_string_view`` are considered.
The check will only consider member functions named ``find``, ``rfind``,
``find_first_of``, ``find_first_not_of``, ``find_last_of``, or
``find_last_not_of`` within these classes.
std::vector<int> obj = ...;
return obj; // calls StatusOr::StatusOr(std::vector<int>&&)
}

StatusOr<std::vector<int>> NotCool() {
const std::vector<int> obj = ...;
return obj; // calls `StatusOr::StatusOr(const std::vector<int>&)`
.. option:: WarnOnUnfixable
When `true`, emit a warning for cases where the check can't output a
Fix-It. These can occur with declarations inside the ``else`` branch that
would have an extended lifetime if the ``else`` branch was removed.
Default value is `true`.
When `true`, the check will attempt to refactor a variable defined inside
the condition of the ``if`` statement that is used in the ``else`` branch by
defining it just before the ``if`` statement. This can only be done if
the ``if`` statement is the last statement in its parent's scope.
Default value is `true`.
----------
There is an alias of this check called llvm-else-after-return.
In that version, the options :option:`WarnOnUnfixable` and
:option:`WarnOnConditionVariables` are both set to `false` by default.
This check helps to enforce this `LLVM Coding Standards recommendation
.. option:: GetConfigPerFile
When `true`, the check will look for the configuration that applies to the file
where an identifier is declared. This is useful when included header files use a
different style.
Default value is `true`.
.. option:: GlobalConstantCase
.. option:: IgnoreMainLikeFunctions
When set to `true`, functions that have a similar signature to ``main`` or
``wmain`` won't enforce checks on the names of their parameters.
Default value is `false`.
.. option:: ScopedEnumConstantCase
When defined, the check will ensure scoped enum constant names conform to
the selected casing.
.. option:: ScopedEnumConstantPrefix
Correct indentation helps to understand code. Mismatch of the syntactical
structure and the indentation of the code may hide serious problems.
Missing braces can also make it significantly harder to read the code;
therefore, it is important to use braces.
The way to avoid dangling else is to always check that an ``else`` belongs
to the ``if`` that begins in the same column.
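For example, in the unbraced function below the ``else`` is indented as if it belonged to the outer ``if``, but the language attaches it to the nearest one; the braced version matches its indentation. Both functions are hypothetical.

```cpp
#include <cassert>

// Misleading: despite the indentation, the else binds to `if (B)`.
int classifyUnbraced(bool A, bool B) {
  int Result = 0;
  if (A)
    if (B)
      Result = 1;
  else
    Result = 2;
  return Result;
}

// Braced: the else now really belongs to `if (A)`.
int classifyBraced(bool A, bool B) {
  int Result = 0;
  if (A) {
    if (B)
      Result = 1;
  } else {
    Result = 2;
  }
  return Result;
}
```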
readability-qualified-auto
==========================
Adds pointer qualifications to ``auto``-typed variables that are deduced to
pointers.
`LLVM Coding Standards <https://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto>`_
-advises to make it obvious if a ``auto`` typed variable is a pointer. This
+advises making it obvious when an ``auto``-typed variable is a pointer. This
check will transform ``auto`` to ``auto *`` when the type is deduced to be a
pointer.
observe(*Data);
}
-Note ``const`` ``volatile`` qualified types will retain their ``const`` and
+Note that ``const`` and ``volatile`` qualified types will retain their ``const`` and
``volatile`` qualifiers. Pointers to pointers will not be fully qualified.
.. code-block:: c++
-------
.. option:: AddConstToQualified
-
+
When set to `true`, the check will add ``const`` qualifiers to variables defined as
``auto *`` or ``auto &`` when applicable.
Default value is `true`.
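The sketch below uses the standard library only; the helper names are hypothetical. It shows the deductions this check is concerned with. Plain ``auto`` and ``auto *`` deduce the same pointer type. For a ``const`` source the pointee is ``const``, which spelling it ``const auto *`` makes visible. It illustrates the deductions only and is not output produced by the check.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Plain `auto`: the deduced type is `int *`, but the declaration does not
// say so. Assumes Values is non-empty.
inline int firstViaAuto(std::vector<int> &Values) {
  auto Data = Values.data();
  static_assert(std::is_same<decltype(Data), int *>::value,
                "auto deduces int *");
  return *Data;
}

// The spelling the check suggests: same type, but the `*` is visible.
inline int firstViaAutoPtr(std::vector<int> &Values) {
  auto *Data = Values.data();
  static_assert(std::is_same<decltype(Data), int *>::value,
                "auto * deduces int *");
  return *Data;
}

// With a const source, `auto *` already deduces `const int *`; writing
// `const auto *` (what AddConstToQualified adds) makes the const visible.
inline int firstViaConstAutoPtr(const std::vector<int> &Values) {
  const auto *Data = Values.data();
  static_assert(std::is_same<decltype(Data), const int *>::value,
                "const auto * deduces const int *");
  return *Data;
}
```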
Finds string comparisons using the compare method.
-A common mistake is to use the string's ``compare`` method instead of using the
+A common mistake is to use the string's ``compare`` method instead of using the
equality or inequality operators. The compare method is intended for sorting
-functions and thus returns a negative number, a positive number or
-zero depending on the lexicographical relationship between the strings compared.
-If an equality or inequality check can suffice, that is recommended. This is
+functions and thus returns a negative number, a positive number, or
+zero depending on the lexicographical relationship between the strings compared.
+When an equality or inequality check suffices, prefer it. This is
recommended to avoid the risk of incorrect interpretation of the return value
and to simplify the code. The string equality and inequality operators can
also be faster than the ``compare`` method due to early termination.
}
The above code examples show the list of if-statements that this check will
-give a warning for. All of them uses ``compare`` to check if equality or
+give a warning for. All of them use ``compare`` to check for equality or
inequality of two strings instead of using the correct operators.
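The sketch below shows why the flagged pattern is easy to misread; the function names are hypothetical. ``compare`` returns zero when the strings are equal. A condition such as ``if (A.compare(B))`` is therefore true when the strings *differ*. The operator forms state that intent directly.

```cpp
#include <cassert>
#include <string>

// Pattern the check flags: compare() returns a negative value, zero, or a
// positive value, so `if (A.compare(B))` is true when the strings DIFFER.
inline bool differsViaCompare(const std::string &A, const std::string &B) {
  if (A.compare(B))
    return true;
  return false;
}

// Equivalent form using the inequality operator; the intent is explicit.
inline bool differsViaOperator(const std::string &A, const std::string &B) {
  return A != B;
}

// Equality check without interpreting the sign of compare().
inline bool sameViaOperator(const std::string &A, const std::string &B) {
  return A == B;
}
```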
std::unique_ptr<int> P;
P = nullptr;
-
+
Options
-------
// Output/NoProblemsAssistant.txt
// Generated by: modularize -module-map-path=Output/NoProblemsAssistant.txt \
-root-module=Root NoProblemsAssistant.modularize
-
+
module SomeTypes {
header "SomeTypes.h"
export *
// Output/NoProblemsAssistant.txt
// Generated by: modularize -module-map-path=Output/NoProblemsAssistant.txt \
-root-module=Root NoProblemsAssistant.modularize
-
+
module Root {
module SomeTypes {
header "SomeTypes.h"
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Reason (EnterFile|ExitFile|SystemHeaderPragma|RenameFile) PPCallbacks::FileChangeReason Reason for change.
Argument descriptions:
============== ================================================== ============================== ========================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ========================================================
ParentFile ("(file)" or (null)) const FileEntry The file that #included the skipped file.
FilenameTok (token) const Token The token in ParentFile that indicates the skipped file.
Argument descriptions:
============== ================================================== ============================== =====================================================================================================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== =====================================================================================================================================
FileName "(file)" StringRef The name of the file being included, as written in the source code.
RecoveryPath (path) SmallVectorImpl<char> If this client indicates that it can recover from this missing file, the client should set this as an additional header search path.
Argument descriptions:
============== ================================================== ============================== ============================================================================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ============================================================================================================
HashLoc "(file):(line):(col)" SourceLocation The location of the '#' that starts the inclusion directive.
IncludeTok (token) const Token The token that indicates the kind of inclusion directive, e.g., 'include' or 'import'.
Argument descriptions:
============== ================================================== ============================== ===========================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ===========================================================
ImportLoc "(file):(line):(col)" SourceLocation The location of import directive token.
Path "(path)" ModuleIdPath The identifiers (and their locations) of the module "path".
Argument descriptions:
============== ================================================== ============================== ======================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ======================
(no arguments)
============== ================================================== ============================== ======================
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
str (name) const std::string The text of the directive.
Argument descriptions:
============== ================================================== ============================== =================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== =================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Introducer (PIK_HashPragma|PIK__Pragma|PIK___pragma) PragmaIntroducerKind The type of the pragma directive.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Kind ((name)|(null)) const IdentifierInfo The comment kind symbol.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Name "(name)" const std::string The name.
Argument descriptions:
============== ================================================== ============================== ================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
DebugType (string) StringRef Indicates type of debug message.
Argument descriptions:
============== ================================================== ============================== =======================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== =======================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Namespace (name) StringRef The namespace of the message directive.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Namespace (name) StringRef Namespace name.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Namespace (name) StringRef Namespace name.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Namespace (name) StringRef Namespace name.
Argument descriptions:
============== ================================================== ============================== ==========================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==========================
NameLoc "(file):(line):(col)" SourceLocation The location of the name.
Name (name) const IdentifierInfo Name symbol.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
WarningSpec (string) StringRef The warning specifier.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
Level (number) int Warning level.
Argument descriptions:
============== ================================================== ============================== ==============================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
============== ================================================== ============================== ==============================
Argument descriptions:
============== ================================================== ============================== ======================================================================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ======================================================================================================
MacroNameTok (token) const Token The macro name token.
MacroDirective (MD_Define|MD_Undefine|MD_Visibility) const MacroDirective The kind of macro directive from the MacroDirective structure.
Argument descriptions:
============== ================================================== ============================== ==============================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================================================
MacroNameTok (token) const Token The macro name token.
MacroDirective (MD_Define|MD_Undefine|MD_Visibility) const MacroDirective The kind of macro directive from the MacroDirective structure.
Argument descriptions:
============== ================================================== ============================== ==============================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================================================
MacroNameTok (token) const Token The macro name token.
MacroDirective (MD_Define|MD_Undefine|MD_Visibility) const MacroDirective The kind of macro directive from the MacroDirective structure.
Argument descriptions:
============== ================================================== ============================== ==============================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================================================
MacroNameTok (token) const Token The macro name token.
MacroDirective (MD_Define|MD_Undefine|MD_Visibility) const MacroDirective The kind of macro directive from the MacroDirective structure.
Argument descriptions:
============== ================================================== ============================== =========================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== =========================
Range ["(file):(line):(col)", "(file):(line):(col)"] SourceRange The source range skipped.
============== ================================================== ============================== =========================
Argument descriptions:
============== ================================================== ============================== ===================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ===================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
ConditionRange ["(file):(line):(col)", "(file):(line):(col)"] SourceRange The source range for the condition.
Argument descriptions:
============== ================================================== ============================== ===================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ===================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
ConditionRange ["(file):(line):(col)", "(file):(line):(col)"] SourceRange The source range for the condition.
Argument descriptions:
============== ================================================== ============================== ==============================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
MacroNameTok (token) const Token The macro name token.
Argument descriptions:
============== ================================================== ============================== ==============================================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ==============================================================
Loc "(file):(line):(col)" SourceLocation The location of the directive.
MacroNameTok (token) const Token The macro name token.
Argument descriptions:
============== ================================================== ============================== ===================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ===================================
Loc "(file):(line):(col)" SourceLocation The location of the else directive.
IfLoc "(file):(line):(col)" SourceLocation The location of the if directive.
Argument descriptions:
============== ================================================== ============================== ====================================
-Argument Name Argument Value Syntax Clang C++ Type Description
+Argument Name Argument Value Syntax Clang C++ Type Description
============== ================================================== ============================== ====================================
Loc "(file):(line):(col)" SourceLocation The location of the endif directive.
IfLoc "(file):(line):(col)" SourceLocation The location of the if directive.
.. _Getting Started with the LLVM System: https://llvm.org/docs/GettingStarted.html
.. _Building LLVM with CMake: https://llvm.org/docs/CMake.html
.. _Clang Tools Documentation: https://clang.llvm.org/docs/ClangTools.html
-
struct Block_literal_1 {
void *isa; // initialized to &_NSConcreteStackBlock or &_NSConcreteGlobalBlock
int flags;
- int reserved;
+ int reserved;
R (*invoke)(struct Block_literal_1 *, P...);
struct Block_descriptor_1 {
unsigned long int reserved; // NULL
BLOCK_HAS_CTOR = (1 << 26), // helpers have C++ code
BLOCK_IS_GLOBAL = (1 << 28),
BLOCK_HAS_STRET = (1 << 29), // IFF BLOCK_HAS_SIGNATURE
- BLOCK_HAS_SIGNATURE = (1 << 30),
+ BLOCK_HAS_SIGNATURE = (1 << 30),
};
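As a sketch of how these bits are read, the hypothetical helpers below test the flag bits listed above. ``BLOCK_HAS_COPY_DISPOSE`` is the ABI's name for the ``(1 << 25)`` bit used throughout this document. Treating the stret bit as meaningful only when ``BLOCK_HAS_SIGNATURE`` is also set follows the ``IFF BLOCK_HAS_SIGNATURE`` comment in the enum.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits of the Block literal `flags` field, as listed in the ABI enum.
enum : std::uint32_t {
  BLOCK_HAS_COPY_DISPOSE = (1u << 25), // copy/dispose helpers present
  BLOCK_HAS_CTOR = (1u << 26),         // helpers have C++ code
  BLOCK_IS_GLOBAL = (1u << 28),
  BLOCK_HAS_STRET = (1u << 29),        // IFF BLOCK_HAS_SIGNATURE
  BLOCK_HAS_SIGNATURE = (1u << 30),
};

// Hypothetical helper: true if `Bit` is set in the (int) flags field.
inline bool hasBlockFlag(int Flags, std::uint32_t Bit) {
  return (static_cast<std::uint32_t>(Flags) & Bit) != 0;
}

// Hypothetical helper: only trust BLOCK_HAS_STRET when a signature is
// present, since literals built for the 10.6 ABI usually set bit 29 and
// the runtime ignored it.
inline bool blockReturnsStruct(int Flags) {
  return hasBlockFlag(Flags, BLOCK_HAS_SIGNATURE) &&
         hasBlockFlag(Flags, BLOCK_HAS_STRET);
}
```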
In 10.6.ABI the (1<<29) was usually set and was always ignored by the runtime -
initialized as follows:
1. A ``static`` descriptor structure is declared and initialized as follows:
-
+
a. The ``invoke`` function pointer is set to a function that takes the
``Block`` structure as its first argument and the rest of the arguments (if
any) to the ``Block`` and executes the ``Block`` compound statement.
-
+
b. The ``size`` field is set to the size of the following ``Block`` literal
structure.
-
+
c. The ``copy_helper`` and ``dispose_helper`` function pointers are set to
respective helper functions if they are required by the ``Block`` literal.
2. A stack (or global) ``Block`` literal data structure is created and
initialized as follows:
-
+
a. The ``isa`` field is set to the address of the external
``_NSConcreteStackBlock``, which is a block of uninitialized memory supplied
in ``libSystem``, or ``_NSConcreteGlobalBlock`` if this is a static or file
level ``Block`` literal.
-
+
b. The ``flags`` field is set to zero unless there are variables imported
into the ``Block`` that need helper functions for program level
``Block_copy()`` and ``Block_release()`` operations, in which case the
struct __block_literal_1 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_1 *);
struct __block_descriptor_1 *descriptor;
};
-
+
void __block_invoke_1(struct __block_literal_1 *_block) {
printf("hello world\n");
}
-
+
static struct __block_descriptor_1 {
unsigned long int reserved;
unsigned long int Block_size;
which would be compiled to:
.. code-block:: c
-
+
struct __block_literal_2 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_2 *);
struct __block_descriptor_2 *descriptor;
const int x;
};
-
+
void __block_invoke_2(struct __block_literal_2 *_block) {
printf("x is %d\n", _block->x);
}
-
+
static struct __block_descriptor_2 {
unsigned long int reserved;
unsigned long int Block_size;
void (^existingBlock)(void) = ...;
void (^vv)(void) = ^{ existingBlock(); }
vv();
-
+
struct __block_literal_3 {
...; // existing block
};
-
+
struct __block_literal_4 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_4 *);
struct __block_literal_3 *const existingBlock;
};
-
+
void __block_invoke_4(struct __block_literal_4 *_block) {
_block->existingBlock->invoke(_block->existingBlock);
}
-
+
void __block_copy_4(struct __block_literal_4 *dst, struct __block_literal_4 *src) {
//_Block_copy_assign(&dst->existingBlock, src->existingBlock, 0);
_Block_object_assign(&dst->existingBlock, src->existingBlock, BLOCK_FIELD_IS_BLOCK);
}
-
+
void __block_dispose_4(struct __block_literal_4 *src) {
// was _Block_destroy
_Block_object_dispose(src->existingBlock, BLOCK_FIELD_IS_BLOCK);
}
-
+
static struct __block_descriptor_4 {
unsigned long int reserved;
unsigned long int Block_size;
void __block_copy_foo(struct __block_literal_5 *dst, struct __block_literal_5 *src) {
_Block_object_assign(&dst->objectPointer, src->objectPointer, BLOCK_FIELD_IS_OBJECT);
}
-
+
void __block_dispose_foo(struct __block_literal_5 *src) {
_Block_object_dispose(src->objectPointer, BLOCK_FIELD_IS_OBJECT);
}
a. The ``forwarding`` pointer is set to the beginning of its enclosing
structure.
-
+
b. The ``size`` field is initialized to the total size of the enclosing
- structure.
-
+ structure.
+
c. The ``flags`` field is set to either 0 if no helper functions are needed
- or (1<<25) if they are.
-
- d. The helper functions are initialized (if present).
-
- e. The variable itself is set to its initial value.
-
+ or (1<<25) if they are.
+
+ d. The helper functions are initialized (if present).
+
+ e. The variable itself is set to its initial value.
+
f. The ``isa`` field is set to ``NULL``.
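The steps above can be sketched in C++ for a ``__block int``. ``ByrefInt`` is a hypothetical stand-in that follows the ``_block_byref_i`` field order shown in this document. The heap move is a simulation based on our reading of the runtime's behaviour, not libclosure code: the heap copy points at itself, and the stack structure's ``forwarding`` pointer is redirected to it. Accesses written as ``x.forwarding->captured_i`` therefore keep reaching the same storage.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for `struct _block_byref_i` (`__block int i = 10;`).
// An int needs no helper functions, so byref_keep/byref_dispose are omitted.
struct ByrefInt {
  void *isa;
  ByrefInt *forwarding;
  int flags;
  int size;
  int captured_i;
};

// Steps a-f above, minus d (no helpers are needed for an int).
inline void initByrefInt(ByrefInt *B, int Initial) {
  B->forwarding = B;                            // a. point at itself
  B->size = static_cast<int>(sizeof(ByrefInt)); // b. total size
  B->flags = 0;                                 // c. no helpers needed
  B->captured_i = Initial;                      // e. initial value
  B->isa = nullptr;                             // f. isa is NULL
}

// Simulated (assumed) runtime move to the heap: copy the structure, make
// the copy forward to itself, and redirect the stack copy to the heap.
inline ByrefInt *moveByrefToHeap(ByrefInt *Stack) {
  auto *Heap = static_cast<ByrefInt *>(std::malloc(sizeof(ByrefInt)));
  assert(Heap != nullptr);
  std::memcpy(Heap, Stack, sizeof(ByrefInt));
  Heap->forwarding = Heap;
  Stack->forwarding = Heap;
  return Heap;
}
```

After the move, a write through the stack structure's ``forwarding`` pointer lands in the heap copy. The stale ``captured_i`` field on the stack is no longer used.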
Access to ``__block`` variables from within its lexical scope
int size;
int captured_i;
} i = { NULL, &i, 0, sizeof(struct _block_byref_i), 10 };
-
+
i.forwarding->captured_i = 11;
In the case of a ``Block`` reference variable being marked ``__block`` the
void (*byref_dispose)(struct _block_byref_voidBlock *);
void (^captured_voidBlock)(void);
};
-
+
void _block_byref_keep_helper(struct _block_byref_voidBlock *dst, struct _block_byref_voidBlock *src) {
//_Block_copy_assign(&dst->captured_voidBlock, src->captured_voidBlock, 0);
_Block_object_assign(&dst->captured_voidBlock, src->captured_voidBlock, BLOCK_FIELD_IS_BLOCK | BLOCK_BYREF_CALLER);
}
-
+
void _block_byref_dispose_helper(struct _block_byref_voidBlock *param) {
//_Block_destroy(param->captured_voidBlock, 0);
_Block_object_dispose(param->captured_voidBlock, BLOCK_FIELD_IS_BLOCK | BLOCK_BYREF_CALLER);
}
struct _block_byref_voidBlock voidBlock = { .forwarding=&voidBlock, .flags=(1<<25), .size=sizeof(struct _block_byref_voidBlock),
.byref_keep=_block_byref_keep_helper, .byref_dispose=_block_byref_dispose_helper,
.captured_voidBlock=blockA };
-
+
voidBlock.forwarding->captured_voidBlock = blockB;
Importing ``__block`` variables into ``Blocks``
void (*byref_dispose)(struct _block_byref_i *);
int captured_i;
};
-
-
+
+
struct __block_literal_5 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_5 *);
struct __block_descriptor_5 *descriptor;
struct _block_byref_i *i_holder;
};
-
+
void __block_invoke_5(struct __block_literal_5 *_block) {
_block->i_holder->forwarding->captured_i = 10;
}
-
+
void __block_copy_5(struct __block_literal_5 *dst, struct __block_literal_5 *src) {
//_Block_byref_assign_copy(&dst->i_holder, src->i_holder);
_Block_object_assign(&dst->i_holder, src->i_holder, BLOCK_FIELD_IS_BYREF);
}
-
+
void __block_dispose_5(struct __block_literal_5 *src) {
//_Block_byref_release(src->i_holder);
_Block_object_dispose(src->i_holder, BLOCK_FIELD_IS_BYREF);
}
-
+
static struct __block_descriptor_5 {
unsigned long int reserved;
unsigned long int Block_size;
void (*byref_dispose)(struct _block_byref_i *);
id captured_obj;
};
-
+
void _block_byref_obj_keep(struct _block_byref_obj *dst, struct _block_byref_obj *src) {
//_Block_copy_assign(&dst->captured_obj, src->captured_obj, 0);
_Block_object_assign(&dst->captured_obj, src->captured_obj, BLOCK_FIELD_IS_OBJECT | BLOCK_FIELD_IS_WEAK | BLOCK_BYREF_CALLER);
}
-
+
void _block_byref_obj_dispose(struct _block_byref_obj *param) {
//_Block_destroy(param->captured_obj, 0);
_Block_object_dispose(param->captured_obj, BLOCK_FIELD_IS_OBJECT | BLOCK_FIELD_IS_WEAK | BLOCK_BYREF_CALLER);
struct __block_literal_5 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_5 *);
struct __block_descriptor_5 *descriptor;
struct _block_byref_obj *byref_obj;
};
-
+
void __block_invoke_5(struct __block_literal_5 *_block) {
[objc_read_weak(&_block->byref_obj->forwarding->captured_obj) somemessage];
}
-
+
void __block_copy_5(struct __block_literal_5 *dst, struct __block_literal_5 *src) {
//_Block_byref_assign_copy(&dst->byref_obj, src->byref_obj);
_Block_object_assign(&dst->byref_obj, src->byref_obj, BLOCK_FIELD_IS_BYREF | BLOCK_FIELD_IS_WEAK);
}
-
+
void __block_dispose_5(struct __block_literal_5 *src) {
//_Block_byref_release(src->byref_obj);
_Block_object_dispose(src->byref_obj, BLOCK_FIELD_IS_BYREF | BLOCK_FIELD_IS_WEAK);
}
-
+
static struct __block_descriptor_5 {
unsigned long int reserved;
unsigned long int Block_size;
struct _block_byref_obj obj = { .forwarding=&obj, .flags=(1<<25), .size=sizeof(struct _block_byref_obj),
.byref_keep=_block_byref_obj_keep, .byref_dispose=_block_byref_obj_dispose,
.captured_obj = <initialization expression> };
-
+
struct __block_literal_5 _block_literal = {
&_NSConcreteStackBlock,
(1<<25)|(1<<29), <uninitialized>,
&__block_descriptor_5,
&obj, // a reference to the on-stack structure containing "captured_obj"
};
-
-
+
+
functioncall(_block_literal.invoke(&_block_literal));
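The invocation above passes the literal itself as the first argument of ``invoke``. The C++ sketch below shows that calling convention for the simpler by-value case, a ``const int x`` captured as in ``__block_literal_2``. All type and function names are hypothetical. ``isa`` is left null because ``_NSConcreteStackBlock`` is not available outside the Blocks runtime. The body returns the captured value instead of printing it.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the compiler-generated structures
// of a Block that captures `const int x` by value.
struct BlockDescriptor2 {
  unsigned long int reserved;
  unsigned long int Block_size;
};

struct BlockLiteral2 {
  void *isa;       // would be &_NSConcreteStackBlock in real code
  int flags;
  int reserved;
  int (*invoke)(BlockLiteral2 *);
  BlockDescriptor2 *descriptor;
  const int x;     // copied when the literal is created
};

// The Block body: captured variables are read through the literal that is
// passed as the first argument.
inline int blockBody2(BlockLiteral2 *Block) { return Block->x; }

inline BlockLiteral2 makeBlock2(BlockDescriptor2 *Descriptor, int X) {
  return BlockLiteral2{nullptr, 0, 0, &blockBody2, Descriptor, X};
}

// Calling a Block: load `invoke` from the literal and pass the literal itself.
inline int callBlock2(BlockLiteral2 &Block) {
  assert(Block.invoke != nullptr);
  return Block.invoke(&Block);
}
```

Because ``x`` is copied into the literal at creation time, later assignments to the original variable do not change what the Block sees.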
C++ Support
struct __block_literal_10 {
void *isa;
int flags;
- int reserved;
+ int reserved;
void (*invoke)(struct __block_literal_10 *);
struct __block_descriptor_10 *descriptor;
const FOO foo;
};
-
+
void __block_invoke_10(struct __block_literal_10 *_block) {
printf("%d\n", _block->foo.value());
}
-
+
void __block_copy_10(struct __block_literal_10 *dst, struct __block_literal_10 *src) {
FOO_ctor(&dst->foo, &src->foo);
}
-
+
void __block_dispose_10(struct __block_literal_10 *src) {
FOO_dtor(&src->foo);
}
-
+
static struct __block_descriptor_10 {
unsigned long int reserved;
unsigned long int Block_size;
void _block_byref_obj_keep(struct _block_byref_blockStorageFoo *dst, struct _block_byref_blockStorageFoo *src) {
FOO_ctor(&dst->blockStorageFoo, &src->blockStorageFoo);
}
-
+
void _block_byref_obj_dispose(struct _block_byref_blockStorageFoo *src) {
FOO_dtor(&src->blockStorageFoo);
}
BLOCK_FIELD_IS_OBJECT = 3, // id, NSObject, __attribute__((NSObject)), block, ...
BLOCK_FIELD_IS_BLOCK = 7, // a block variable
BLOCK_FIELD_IS_BYREF = 8, // the on stack structure holding the __block variable
-
+
BLOCK_FIELD_IS_WEAK = 16, // declared __weak
-
+
BLOCK_BYREF_CALLER = 128, // called from byref copy/dispose helpers
};
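The field kinds above are small integers rather than independent bits. ``BLOCK_FIELD_IS_BLOCK`` (7) contains every bit of ``BLOCK_FIELD_IS_OBJECT`` (3), so a bit test cannot tell them apart. A hypothetical decoder therefore masks off the modifier bits (``BLOCK_FIELD_IS_WEAK`` and ``BLOCK_BYREF_CALLER``) and compares the remainder exactly. This is an illustrative sketch, not the runtime's implementation.

```cpp
#include <cassert>

enum {
  BLOCK_FIELD_IS_OBJECT = 3, // id, NSObject, __attribute__((NSObject)), block, ...
  BLOCK_FIELD_IS_BLOCK = 7,  // a block variable
  BLOCK_FIELD_IS_BYREF = 8,  // the on stack structure holding the __block variable
  BLOCK_FIELD_IS_WEAK = 16,  // declared __weak
  BLOCK_BYREF_CALLER = 128,  // called from byref copy/dispose helpers
};

enum class FieldKind { Object, Block, Byref, Unknown };

// Strip the modifier bits, then compare the remaining kind exactly.
// `Flags & BLOCK_FIELD_IS_OBJECT` would also be non-zero for a block (7).
inline FieldKind fieldKind(int Flags) {
  int Kind = Flags & ~(BLOCK_FIELD_IS_WEAK | BLOCK_BYREF_CALLER);
  switch (Kind) {
  case BLOCK_FIELD_IS_OBJECT:
    return FieldKind::Object;
  case BLOCK_FIELD_IS_BLOCK:
    return FieldKind::Block;
  case BLOCK_FIELD_IS_BYREF:
    return FieldKind::Byref;
  default:
    return FieldKind::Unknown;
  }
}

inline bool isWeakField(int Flags) { return (Flags & BLOCK_FIELD_IS_WEAK) != 0; }
inline bool fromByrefHelper(int Flags) { return (Flags & BLOCK_BYREF_CALLER) != 0; }
```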
The prototypes, and summary, of the helper functions are:
.. code-block:: c
-
+
/* Certain field types require runtime assistance when being copied to the
heap. The following function is used to copy fields of types: blocks,
pointers to byref structures, and objects (including
helper will one see BLOCK_FIELD_IS_BYREF.
*/
void _Block_object_assign(void *destAddr, const void *object, const int flags);
-
+
/* Similarly a compiler generated dispose helper needs to call back for each
field of the byref data structure. (Currently the implementation only
packs one field into the byref structure but in principle there could be
**QualifierAlignment** (``QualifierAlignmentStyle``) :versionbadge:`clang-format 14`
Different ways to arrange specifiers and qualifiers (e.g. const/volatile).
- .. warning::
+ .. warning::
Setting ``QualifierAlignment`` to something other than `Leave` COULD
lead to incorrect code formatting due to incorrect decisions made due to
1. Extract (libTest-nvptx-sm_50.a) => /tmp/a.cubin /tmp/b.cubin
2. nvlink -o a.out-openmp-nvptx64 main.cubin /tmp/a.cubin /tmp/b.cubin
-
+
**Output**
Output file generated by ``nvlink`` which links all cubin files.
bad-cast.cpp:109:7: runtime error: control flow integrity check for type 'B' failed during base-to-derived cast (vtable address 0x000000425a50)
0x000000425a50: note: vtable is of type 'A'
00 00 00 00 f0 f1 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 5a 42 00
- ^
+ ^
If diagnostics are enabled, you can also configure CFI to continue program
execution instead of aborting by using the :ref:`-fsanitize-recover=
.. csv-table:: Bit Vectors for A, B, C
:header: Class, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
- A, , , 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, ,
- B, , , , , , , , 1, , , , , , ,
- C, , , , , , , , , , , , , 1, ,
+ A, , , 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, ,
+ B, , , , , , , , 1, , , , , , ,
+ C, , , , , , , , , , , , , 1, ,
Short Inline Bit Vectors
~~~~~~~~~~~~~~~~~~~~~~~~
de6: 48 89 df mov %rbx,%rdi
de9: ff 10 callq *(%rax)
[...]
- e0b: 0f 0b ud2
+ e0b: 0f 0b ud2
Or if the bit vector fits in 64 bits:
11ba: 48 83 f9 2a cmp $0x2a,%rcx
11be: 77 35 ja 11f5 <main+0xb5>
11c0: 48 ba 09 00 00 00 00 movabs $0x40000000009,%rdx
- 11c7: 04 00 00
+ 11c7: 04 00 00
11ca: 48 0f a3 ca bt %rcx,%rdx
11ce: 73 25 jae 11f5 <main+0xb5>
11d0: 48 89 df mov %rbx,%rdi
11d3: ff 10 callq *(%rax)
[...]
- 11f5: 0f 0b ud2
+ 11f5: 0f 0b ud2
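As a rough model of the 64-bit case above, the range check and the bit test can be written in C++. This is a sketch under stated assumptions. ``isValidVtableIndex``, ``kMaxIndex`` and ``kBitVector`` are hypothetical names. ``index`` stands for the value that the elided instructions leave in ``%rcx`` before the ``cmp``. The two constants are copied from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the inline 64-bit bit-vector check shown above.
// `index` is assumed to be already computed from the vtable address
// (the instructions that compute it are elided in the listing).
constexpr std::uint64_t kMaxIndex = 0x2a;            // cmp $0x2a,%rcx
constexpr std::uint64_t kBitVector = 0x40000000009;  // movabs $0x40000000009,%rdx

// Returns true when the call may proceed. The generated code jumps to a
// `ud2` trap instead of returning false.
bool isValidVtableIndex(std::uint64_t index) {
  if (index > kMaxIndex)  // ja: unsigned "above", so out-of-range indices trap
    return false;
  return ((kBitVector >> index) & 1) != 0;  // bt %rcx,%rdx ; jae traps if the bit is clear
}
```

In ``0x40000000009`` only bits 0, 3 and 42 are set, so only those three indices pass. Every other index, including any value above 42, reaches the ``ud2`` trap in the generated code.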
If the bit vector consists of a single bit, there is only one possible
virtual table, and the check can consist of a single equality comparison:
Forward-Edge CFI for Virtual Calls by Interleaving Virtual Tables
-----------------------------------------------------------------
-Dimitar et. al. proposed a novel approach that interleaves virtual tables in [1]_.
-This approach is more efficient in terms of space because padding and bit vectors are no longer needed.
-At the same time, it is also more efficient in terms of performance because in the interleaved layout
-address points of the virtual tables are consecutive, thus the validity check of a virtual
-vtable pointer is always a range check.
+Dimitar et al. proposed a novel approach that interleaves virtual tables in [1]_.
+This approach is more efficient in terms of space because padding and bit vectors are no longer needed.
+At the same time, it is also more efficient in terms of performance because in the interleaved layout
+address points of the virtual tables are consecutive, so the validity check of a
+vtable pointer is always a range check.
-At a high level, the interleaving scheme consists of three steps: 1) split virtual table groups into
-separate virtual tables, 2) order virtual tables by a pre-order traversal of the class hierarchy
+At a high level, the interleaving scheme consists of three steps: 1) split virtual table groups into
+separate virtual tables, 2) order virtual tables by a pre-order traversal of the class hierarchy
and 3) interleave virtual tables.
The interleaving scheme implemented in LLVM is inspired by [1]_ but has its own
Split virtual table groups into separate virtual tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The Itanium C++ ABI glues multiple individual virtual tables for a class into a combined virtual table (virtual table group).
+The Itanium C++ ABI glues multiple individual virtual tables for a class into a combined virtual table (virtual table group).
The interleaving scheme, however, can only work with individual virtual tables so it must split the combined virtual tables first.
In comparison, the old scheme does not require the splitting but it is more efficient when the combined virtual tables have been split.
-The `GlobalSplit`_ pass is responsible for splitting combined virtual tables into individual ones.
+The `GlobalSplit`_ pass is responsible for splitting combined virtual tables into individual ones.
.. _GlobalSplit: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/GlobalSplit.cpp
-Order virtual tables by a pre-order traversal of the class hierarchy
+Order virtual tables by a pre-order traversal of the class hierarchy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This step is common to both the old scheme described above and the interleaving scheme.
-For the interleaving scheme, since the combined virtual tables have been split in the previous step,
-this step ensures that for any class all the compatible virtual tables will appear consecutively.
-For the old scheme, the same property may not hold since it may work on combined virtual tables.
+This step is common to both the old scheme described above and the interleaving scheme.
+For the interleaving scheme, since the combined virtual tables have been split in the previous step,
+this step ensures that for any class all the compatible virtual tables will appear consecutively.
+For the old scheme, the same property may not hold since it may work on combined virtual tables.
For example, consider the following four C++ classes:
Interleave virtual tables
~~~~~~~~~~~~~~~~~~~~~~~~~
-This step is where the interleaving scheme deviates from the old scheme. Instead of laying out
-whole virtual tables in the previously computed order, the interleaving scheme lays out table
-entries of the virtual tables strategically to ensure the following properties:
+This step is where the interleaving scheme deviates from the old scheme. Instead of laying out
+whole virtual tables in the previously computed order, the interleaving scheme lays out table
+entries of the virtual tables strategically to ensure the following properties:
(1) offset-to-top and RTTI fields layout property
-The Itanium C++ ABI specifies that offset-to-top and RTTI fields appear at the offsets behind the
-address point. Note that libraries like libcxxabi do assume this property.
+The Itanium C++ ABI specifies that offset-to-top and RTTI fields appear at the offsets behind the
+address point. Note that libraries like libcxxabi do assume this property.
(2) virtual function entry layout property
-For each virtual function the distance between an virtual table entry for this function and the corresponding
+For each virtual function the distance between a virtual table entry for this function and the corresponding
address point is always the same. This property ensures that dynamic dispatch still works with the interleaving layout.
-Note that the interleaving scheme in the CFI implementation guarantees both properties above whereas the original scheme proposed
-in [1]_ only guarantees the second property.
+Note that the interleaving scheme in the CFI implementation guarantees both properties above whereas the original scheme proposed
+in [1]_ only guarantees the second property.
To illustrate how the interleaving algorithm works, let us continue with the running example.
-The algorithm first separates all the virtual table entries into two work lists. To do so,
-it starts by allocating two work lists, one initialized with all the offset-to-top entries of virtual tables in the order
-computed in the last step, one initialized with all the RTTI entries in the same order.
+The algorithm first separates all the virtual table entries into two work lists. To do so,
+it starts by allocating two work lists, one initialized with all the offset-to-top entries of virtual tables in the order
+computed in the last step, one initialized with all the RTTI entries in the same order.
-.. csv-table:: Work list 1 Layout
+.. csv-table:: Work list 1 layout
:header: 0, 1, 2, 3
-
+
A::offset-to-top, B::offset-to-top, D::offset-to-top, C::offset-to-top
.. csv-table:: Work list 2 layout
- :header: 0, 1, 2, 3,
+ :header: 0, 1, 2, 3
-
- &A::rtti, &B::rtti, &D::rtti, &C::rtti
+
+ &A::rtti, &B::rtti, &D::rtti, &C::rtti
Then for each virtual function the algorithm goes through all the virtual tables in the previously computed order
-to collect all the related entries into a virtual function list.
+to collect all the related entries into a virtual function list.
After this step, there are the following virtual function lists:
-.. csv-table:: f1 list
+.. csv-table:: f1 list
:header: 0, 1, 2, 3
&A::f1, &B::f1, &D::f1, &C::f1
-.. csv-table:: f2 list
+.. csv-table:: f2 list
:header: 0, 1
&B::f2, &D::f2
-.. csv-table:: f3 list
+.. csv-table:: f3 list
:header: 0
&C::f3
Next, the algorithm picks the longest remaining virtual function list and appends the whole list to the shortest work list
-until no function lists are left, and pads the shorter work list so that they are of the same length.
-In the example, f1 list will be first added to work list 1, then f2 list will be added
-to work list 2, and finally f3 list will be added to the work list 2. Since work list 1 now has one more entry than
-work list 2, a padding entry is added to the latter. After this step, the two work lists look like:
+until no function lists are left, and then pads the shorter work list so that both work lists have the same length.
+In the example, the f1 list will first be added to work list 1, then the f2 list will be added
+to work list 2, and finally the f3 list will also be added to work list 2. Since work list 1 now has one more entry than
+work list 2, a padding entry is added to the latter. After this step, the two work lists look like:
-.. csv-table:: Work list 1 Layout
+.. csv-table:: Work list 1 layout
:header: 0, 1, 2, 3, 4, 5, 6, 7
A::offset-to-top, B::offset-to-top, D::offset-to-top, C::offset-to-top, &A::f1, &B::f1, &D::f1, &C::f1
.. csv-table:: Work list 2 layout
:header: 0, 1, 2, 3, 4, 5, 6, 7
- &A::rtti, &B::rtti, &D::rtti, &C::rtti, &B::f2, &D::f2, &C::f3, padding
+ &A::rtti, &B::rtti, &D::rtti, &C::rtti, &B::f2, &D::f2, &C::f3, padding
-Finally, the algorithm merges the two work lists into the interleaved layout by alternatingly
+Finally, the algorithm merges the two work lists into the interleaved layout by alternately
moving the head of each list to the final layout. After this step, the final interleaved layout looks like:
.. csv-table:: Interleaved layout
- :header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
+ :header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
A::offset-to-top, &A::rtti, B::offset-to-top, &B::rtti, D::offset-to-top, &D::rtti, C::offset-to-top, &C::rtti, &A::f1, &B::f2, &B::f1, &D::f2, &D::f1, &C::f3, &C::f1, padding
In the above interleaved layout, each virtual table's offset-to-top and RTTI are always adjacent, which shows that the layout has the first property.
For the second property, let us look at f2 as an example. In the interleaved layout,
-there are two entries for f2: B::f2 and D::f2. The distance between &B::f2
+there are two entries for f2: B::f2 and D::f2. The distance between &B::f2
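-and its address point D::offset-to-top (the entry immediately after &B::rtti) is 5 entry-length, so is the distance between &D::f2 and C::offset-to-top (the entry immediately after &D::rtti).
+and its address point D::offset-to-top (the entry immediately after &B::rtti) is 5 entries, as is the distance between &D::f2 and its address point C::offset-to-top (the entry immediately after &D::rtti).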
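The interleaving step walked through above can be sketched in C++ as a small function over strings. This is an illustrative model, not LLVM's implementation. ``interleave`` and ``Entries`` are hypothetical names, and entries are plain strings rather than IR values. The sketch also assumes that ties go to work list 1, as in the example, where the f1 list is added to work list 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative model only: each vtable entry is represented by its name.
using Entries = std::vector<std::string>;

// workList1: offset-to-top entries in pre-order; workList2: RTTI entries in
// the same order; functionLists: one list of entries per virtual function.
Entries interleave(Entries workList1, Entries workList2,
                   std::vector<Entries> functionLists) {
  assert(workList1.size() == workList2.size());

  // Process the longest function list first (stable_sort keeps input order on ties).
  std::stable_sort(functionLists.begin(), functionLists.end(),
                   [](const Entries &a, const Entries &b) {
                     return a.size() > b.size();
                   });

  // Append each whole function list to the currently shorter work list;
  // work list 1 is chosen when both have the same length (assumption).
  for (const Entries &list : functionLists) {
    Entries &target =
        workList2.size() < workList1.size() ? workList2 : workList1;
    target.insert(target.end(), list.begin(), list.end());
  }

  // Pad the shorter work list so both have the same length.
  while (workList1.size() < workList2.size()) workList1.push_back("padding");
  while (workList2.size() < workList1.size()) workList2.push_back("padding");

  // Merge by alternately moving the head of each work list to the layout.
  Entries layout;
  for (std::size_t i = 0; i < workList1.size(); ++i) {
    layout.push_back(workList1[i]);
    layout.push_back(workList2[i]);
  }
  return layout;
}
```

Calling it with the two initial work lists and the f1, f2 and f3 lists from the running example yields the 16-entry interleaved layout shown in the table above.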
Forward-Edge CFI for Indirect Function Calls
void Clang::ConstructJob(const ArgList &Args /*...*/) const {
ArgStringList CmdArgs;
- // ...
+ // ...
+ for (const Arg *A : Args.filtered(OPT_fpass_plugin_EQ)) {
+ CmdArgs.push_back(Args.MakeArgString(Twine("-fpass-plugin=") + A->getValue()));
Subjects
~~~~~~~~
-Attributes appertain to one or more subjects. If the attribute attempts to
+Attributes appertain to one or more subjects. If the attribute attempts to
attach to a subject that is not in the subject list, a diagnostic is issued
automatically. Whether the diagnostic is a warning or an error depends on how
the attribute's ``SubjectList`` is defined, but the default behavior is to warn.
All attributes must have some form of documentation associated with them.
Documentation is table generated on the public web server by a server-side
process that runs daily. Generally, the documentation for an attribute is a
-stand-alone definition in `include/clang/Basic/AttrDocs.td
+stand-alone definition in `include/clang/Basic/AttrDocs.td
<https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/AttrDocs.td>`_
that is named after the attribute being documented.
attributes that appertain to function-like subjects, ``DocCatVariable`` for
attributes that appertain to variable-like subjects, ``DocCatType`` for type
attributes, and ``DocCatStmt`` for statement attributes. A custom documentation
-category should be used for groups of attributes with similar functionality.
+category should be used for groups of attributes with similar functionality.
Custom categories are good for providing overview information for the attributes
grouped under it. For instance, the consumed annotation attributes define a
custom category, ``DocCatConsumed``, that explains what consumed annotations are
proper visitation for your expression, enabling various IDE features such
as syntax highlighting, cross-referencing, and so on. The
``c-index-test`` helper program can be used to test these features.
-
#undef FINAL_MACRO // warning: FINAL_MACRO is marked final and should not be undefined
This is useful for enforcing system-provided macros that should not be altered
-in user headers or code. This is controlled by ``-Wpedantic-macros``. Final
+in user headers or code. This is controlled by ``-Wpedantic-macros``. Final
macros will always warn on redefinition, including situations with identical
bodies and in system headers.
.. code-block:: c
- # 57 // Advance (or return) to line 57 of the current source file
+ # 57 // Advance (or return) to line 57 of the current source file
# 57 "frob" // Set to line 57 of "frob"
# 1 "foo.h" 1 // Enter "foo.h" at line 1
# 59 "main.c" 2 // Leave current include and return to "main.c"
code into headers.
* **Fragility**: ``#include`` directives are treated as textual
- inclusion by the preprocessor, and are therefore subject to any
- active macro definitions at the time of inclusion. If any of the
- active macro definitions happens to collide with a name in the
- library, it can break the library API or cause compilation failures
- in the library header itself. For an extreme example,
- ``#define std "The C++ Standard"`` and then include a standard
+ inclusion by the preprocessor, and are therefore subject to any
+ active macro definitions at the time of inclusion. If any of the
+ active macro definitions happens to collide with a name in the
+ library, it can break the library API or cause compilation failures
+ in the library header itself. For an extreme example,
+ ``#define std "The C++ Standard"`` and then include a standard
library header: the result is a horrific cascade of failures in the
C++ Standard Library's implementation. More subtle real-world
problems occur when the headers for two different libraries interact
.. note::
To actually see any benefits from modules, one first has to introduce module maps for the underlying C standard library and the libraries and headers on which it depends. The section `Modularizing a Platform`_ describes the steps one must take to write these module maps.
-
+
One can use module maps without modules to check the integrity of the use of header files. To do this, use the ``-fimplicit-module-maps`` option instead of the ``-fmodules`` option, or use ``-fmodule-map-file=`` option to explicitly specify the module map files to load.
Compilation model
* ``<stdio.h>`` defines a macro ``getc`` (and exports its ``#define``)
* ``<cstdio>`` imports the ``<stdio.h>`` module and undefines the macro (and exports its ``#undef``)
-
+
The ``#undef`` overrides the ``#define``, and a source file that imports both modules *in any order* will not see ``getc`` defined as a macro.
Module Map Language
// ...more headers follow...
}
-Here, the top-level module ``std`` encompasses the whole C standard library. It has a number of submodules containing different parts of the standard library: ``complex`` for complex numbers, ``ctype`` for character types, etc. Each submodule lists one of more headers that provide the contents for that submodule. Finally, the ``export *`` command specifies that anything included by that submodule will be automatically re-exported.
+Here, the top-level module ``std`` encompasses the whole C standard library. It has a number of submodules containing different parts of the standard library: ``complex`` for complex numbers, ``ctype`` for character types, etc. Each submodule lists one or more headers that provide the contents for that submodule. Finally, the ``export *`` command specifies that anything included by that submodule will be automatically re-exported.
Lexical structure
-----------------
.. note::
Any headers not included by the umbrella header should have
- explicit ``header`` declarations. Use the
+ explicit ``header`` declarations. Use the
``-Wincomplete-umbrella`` warning option to ask Clang to complain
about headers not covered by the umbrella header or the module map.
*umbrella-dir-declaration*:
``umbrella`` *string-literal*
-
+
The *string-literal* refers to a directory. When the module is built, all of the header files in that directory (and its subdirectories) are included in the module.
An *umbrella-dir-declaration* shall not refer to the same directory as the location of an umbrella *header-declaration*. In other words, only a single kind of umbrella can be specified for a given directory.
*inferred-submodule-declaration*:
``explicit``:sub:`opt` ``framework``:sub:`opt` ``module`` '*' *attributes*:sub:`opt` '{' *inferred-submodule-member** '}'
-
+
*inferred-submodule-member*:
``export`` '*'
* Have the same name as the header (without the file extension)
* Have the ``explicit`` specifier, if the *inferred-submodule-declaration* has the ``explicit`` specifier
-* Have the ``framework`` specifier, if the
+* Have the ``framework`` specifier, if the
*inferred-submodule-declaration* has the ``framework`` specifier
-* Have the attributes specified by the \ *inferred-submodule-declaration*
+* Have the attributes specified by the \ *inferred-submodule-declaration*
* Contain a single *header-declaration* naming that header
* Contain a single *export-declaration* ``export *``, if the \ *inferred-submodule-declaration* contains the \ *inferred-submodule-member* ``export *``
A *config-macros-declaration* shall only be present on a top-level module, i.e., a module that is not nested within an enclosing module.
-The ``exhaustive`` attribute specifies that the list of macros in the *config-macros-declaration* is exhaustive, meaning that no other macro definition is intended to have an effect on the API of that module.
+The ``exhaustive`` attribute specifies that the list of macros in the *config-macros-declaration* is exhaustive, meaning that no other macro definition is intended to have an effect on the API of that module.
.. note::
- The ``exhaustive`` attribute implies that any macro definitions
+ The ``exhaustive`` attribute implies that any macro definitions
for macros not listed as configuration macros should be ignored
completely when building the module. As an optimization, the
compiler could reduce the number of unique module variants by not
Modularizing a Platform
=======================
-To get any benefit out of modules, one needs to introduce module maps for software libraries starting at the bottom of the stack. This typically means introducing a module map covering the operating system's headers and the C standard library headers (in ``/usr/include``, for a Unix system).
+To get any benefit out of modules, one needs to introduce module maps for software libraries starting at the bottom of the stack. This typically means introducing a module map covering the operating system's headers and the C standard library headers (in ``/usr/include``, for a Unix system).
The module maps will be written using the `module map language`_, which provides the tools necessary to describe the mapping between headers and modules. Because the set of headers differs from one system to the next, the module map will likely have to be somewhat customized for, e.g., a particular distribution and version of the operating system. Moreover, the system headers themselves may require some modification, if they exhibit any anti-patterns that break modules. Such common patterns are described below.
**Example of Use**:
.. code-block:: console
-
+
$ clang -Xclang -fdeclare-opencl-builtins test.cl
.. _opencl_fake_address_space_map:
add_clang_executable(find-class-decls FindClassDecls.cpp)
- target_link_libraries(find-class-decls
+ target_link_libraries(find-class-decls
PRIVATE
clangAST
clangBasic
$ ./bin/find-class-decls "namespace n { namespace m { class C {}; } }"
Found declaration at 1:29
-
The functions `__sanitizer_cov_trace_pc_*` should be defined by the user.
-Example:
+Example:
.. code-block:: c++
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
if (!*guard) return; // Duplicate the guard check.
// If you set *guard to 0 this code will not be called again for this edge.
- // Now you can get the PC and do whatever you want:
+ // Now you can get the PC and do whatever you want:
// store it somewhere or symbolize it and print right away.
// The values of `*guard` are as you set them in
// __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
}
.. code-block:: console
-
+
clang++ -g -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);
// Called before a comparison instruction if exactly one of the arguments is constant.
- // Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
+ // Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
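One possible use of the constant-comparison callbacks is for a fuzzer-style tool to record the constant operand of each comparison. The following is a minimal sketch that defines only the 8-byte counterpart of the callbacks above, ``__sanitizer_cov_trace_const_cmp8``. ``observedConstants`` is a hypothetical helper. A real tool would define the callbacks for every operand width.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical store for comparison constants observed at run time.
std::set<std::uint64_t> &observedConstants() {
  static std::set<std::uint64_t> constants;
  return constants;
}

// User-defined callback: with -fsanitize-coverage=trace-cmp the compiler calls
// it before an 8-byte comparison in which exactly one operand is a constant.
extern "C" void __sanitizer_cov_trace_const_cmp8(std::uint64_t Arg1,
                                                 std::uint64_t Arg2) {
  observedConstants().insert(Arg1);  // Arg1 is the compile-time constant
  (void)Arg2;                        // the run-time operand is ignored here
}
```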
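-An simple ``sancov`` tool is provided to process coverage files.
-The tool is part of LLVM project and is currently supported only on Linux.
+A simple ``sancov`` tool is provided to process coverage files.
+The tool is part of the LLVM project and is currently supported only on Linux.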
It can handle symbolization tasks autonomously without any extra support
-from the environment. You need to pass .sancov files (named
-``<module_name>.<pid>.sancov`` and paths to all corresponding binary elf files.
+from the environment. You need to pass .sancov files (named
+``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
-Sancov matches these files using module names and binaries file names.
+Sancov matches these files using module names and binary file names.
.. code-block:: console
-// Assert that is mutex is currently held for read operations.
+// Assert that this mutex is currently held for read operations.
void AssertReaderHeld() ASSERT_SHARED_CAPABILITY(this);
-
+
// For negative capabilities.
const Mutex& operator!() const { return *this; }
};
#endif // USE_LOCK_STYLE_THREAD_SAFETY_ATTRIBUTES
#endif // THREAD_SAFETY_ANALYSIS_MUTEX_H
-
ld,"a.out",900,8000,53568
The data on each row represent:
-
+
* file name of the tool executable,
* output file name in quotes,
* total execution time in microseconds,
* execution time in user mode in microseconds,
* peak memory usage in Kb.
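A row in this format can be split back into its five fields with a few lines of C++. This is a sketch. ``ProcStat`` and ``parseProcStatRow`` are hypothetical names. The parser also assumes that the quoted output file name contains no commas or quotes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record for one row of the report.
struct ProcStat {
  std::string tool;      // file name of the tool executable
  std::string output;    // output file name, without the quotes
  long totalMicros = 0;  // total execution time in microseconds
  long userMicros = 0;   // execution time in user mode in microseconds
  long peakMemKb = 0;    // peak memory usage in Kb
};

// Parses a row such as: ld,"a.out",900,8000,53568
// Returns false if the row does not have the expected shape.
bool parseProcStatRow(const std::string &row, ProcStat &out) {
  std::istringstream in(row);
  std::string quoted;
  if (!std::getline(in, out.tool, ',') || !std::getline(in, quoted, ','))
    return false;
  if (quoted.size() < 2 || quoted.front() != '"' || quoted.back() != '"')
    return false;
  out.output = quoted.substr(1, quoted.size() - 2);
  char sep1 = 0, sep2 = 0;
  if (!(in >> out.totalMicros >> sep1 >> out.userMicros >> sep2 >> out.peakMemKb))
    return false;
  return sep1 == ',' && sep2 == ',';
}
```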
-
+
It is possible to specify this option without any value. In this case statistics
are printed on standard output in human readable format:
-
+
.. code-block:: console
$ clang -fproc-stat-report foo.c
clang-11: output=/tmp/foo-855a8e.o, total=68.000 ms, user=60.000 ms, mem=86920 Kb
ld: output=a.out, total=8.000 ms, user=4.000 ms, mem=52320 Kb
-
+
The report file specified in the option is locked for write, so this option
can be used to collect statistics in parallel builds. The report file is not
-cleared, new data is appended to it, thus making posible to accumulate build
+cleared; new data is appended to it, making it possible to accumulate build
Select which denormal numbers the code is permitted to require.
- Valid values are:
+ Valid values are:
* ``ieee`` - IEEE 754 denormal numbers
* ``preserve-sign`` - the sign of a flushed-to-zero number is preserved in the sign of 0
**-f[no-]strict-float-cast-overflow**
- When a floating-point value is not representable in a destination integer
+ When a floating-point value is not representable in a destination integer
type, the code has undefined behavior according to the language standard.
By default, Clang will not guarantee any particular result in that case.
With the 'no-strict' option, Clang attempts to match the overflowing behavior
the optimizer may ignore parentheses when computing arithmetic expressions
in circumstances where the parenthesized and unparenthesized expression
express the same mathematical value. For example (a+b)+c is the same
- mathematical value as a+(b+c), but the optimizer is free to evaluate the
+ mathematical value as a+(b+c), but the optimizer is free to evaluate the
additions in any order regardless of the parentheses. When enabled, this
option forces the optimizer to honor the order of operations with respect
to parentheses in all circumstances.
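The order matters because floating-point addition is not associative. Two expressions with the same mathematical value can round to different results. The following is a small illustration. ``sumLeft`` and ``sumRight`` are hypothetical helpers. The results assume IEEE 754 double precision and a build without reassociation flags such as ``-ffast-math``.

```cpp
#include <cassert>

// (a + b) + c: the parenthesized pair is evaluated first.
double sumLeft(double a, double b, double c) { return (a + b) + c; }

// a + (b + c): same mathematical value, different rounding order.
double sumRight(double a, double b, double c) { return a + (b + c); }
```

``sumLeft(1e20, -1e20, 1.0)`` yields ``1.0`` because the large terms cancel first. ``sumRight(1e20, -1e20, 1.0)`` yields ``0.0`` because ``-1e20 + 1.0`` rounds back to ``-1e20``.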
2. Run the instrumented executable with inputs that reflect the typical usage.
By default, the profile data will be written to a ``default.profraw`` file
in the current directory. You can override that default by using option
- ``-fprofile-instr-generate=`` or by setting the ``LLVM_PROFILE_FILE``
+ ``-fprofile-instr-generate=`` or by setting the ``LLVM_PROFILE_FILE``
environment variable to specify an alternate file. If non-default file name
is specified by both the environment variable and the command line option,
the environment variable takes precedence. The file name pattern specified
When ``code`` is executed, the profile will be written to the file
``yyy/zzz/default_xxxx.profraw``.
- To generate the profile data file with the compiler readable format, the
+ To generate the profile data file with the compiler readable format, the
``llvm-profdata`` tool can be used with the profile directory as the input:
.. code-block:: console
$ clang --coverage -fprofile-exclude-files="^/usr/include/.*$" \
-fprofile-filter-files="^/usr/.*$"
-
+
In that case ``/usr/foo/oof.h`` is instrumented since it matches the filter regex and
doesn't match the exclude regex, but ``/usr/include/foo.h`` doesn't since it matches
the exclude regex.
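The selection rule in this example can be modelled with ``std::regex``. This is a sketch. ``isInstrumented`` is a hypothetical helper, and the two patterns are copied from the command line above. The sketch also assumes that ECMAScript regex syntax treats these simple patterns the same way the compiler does.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Instrument a file only if it matches the filter regex and does not match
// the exclude regex (patterns copied from the example command line).
bool isInstrumented(const std::string &path) {
  static const std::regex filter("^/usr/.*$");
  static const std::regex exclude("^/usr/include/.*$");
  return std::regex_search(path, filter) && !std::regex_search(path, exclude);
}
```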
Clang currently supports OpenCL C language standards up to v2.0. Clang mainly
supports full profile. There is only very limited support of the embedded
-profile.
+profile.
Starting from clang 9 a C++ mode is available for OpenCL (see
:ref:`C++ for OpenCL <cxx_for_opencl>`).
To make sure no invalid optimizations occur for single program multiple data
(SPMD) / single instruction multiple thread (SIMT) Clang provides attributes that
can be used for special functions that have cross work item semantics.
-An example is the subgroup operations such as `intel_sub_group_shuffle
+An example is the subgroup operations such as `intel_sub_group_shuffle
<https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt>`_
.. code-block:: c
// Define custom my_sub_group_shuffle(data, c)
// that makes use of intel_sub_group_shuffle
- r1 = ...
+ r1 = ...
if (r0) r1 = computeA();
// Shuffle data from r1 into r3
// of threads id r2.
Using ``convergent`` guarantees correct execution by keeping CFG equivalence
wrt operations marked as ``convergent``. CFG ``G´`` is equivalent to ``G`` wrt
node ``Ni`` : ``iff ∀ Nj (i≠j)`` domination and post-domination relations with
-respect to ``Ni`` remain the same in both ``G`` and ``G´``.
+respect to ``Ni`` remain the same in both ``G`` and ``G´``.
noduplicate
^^^^^^^^^^^
clang test.clcpp
-C++ for OpenCL kernel sources can also be compiled online in drivers supporting
+C++ for OpenCL kernel sources can also be compiled online in drivers supporting
`cl_ext_cxx_for_opencl
<https://www.khronos.org/registry/OpenCL/extensions/ext/cl_ext_cxx_for_opencl.html>`_
extension.
constructor initialization kernel that has the following name scheme
``_GLOBAL__sub_I_<compiled file name>``.
This kernel is only present if there are global objects with non-trivial
-constructors present in the compiled binary. One way to check this is by
+constructors present in the compiled binary. One way to check this is by
passing ``CL_PROGRAM_KERNEL_NAMES`` to ``clGetProgramInfo`` (OpenCL v2.0
s5.8.7) and then checking whether any kernel name matches the naming scheme of
global constructor initialization kernel above.
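The name check described above can be sketched in C++. The sketch assumes that the string has already been obtained from ``clGetProgramInfo`` with ``CL_PROGRAM_KERNEL_NAMES``, which returns the kernel names as a semicolon-separated list. The OpenCL call itself is omitted, and ``hasGlobalCtorKernel`` is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if any name in the semicolon-separated kernel name list follows
// the global constructor initialization scheme _GLOBAL__sub_I_<file name>.
bool hasGlobalCtorKernel(const std::string &kernelNames) {
  static const std::string prefix = "_GLOBAL__sub_I_";
  std::istringstream in(kernelNames);
  std::string name;
  while (std::getline(in, name, ';'))
    if (name.compare(0, prefix.size(), prefix) == 0)  // name starts with prefix
      return true;
  return false;
}
```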
.. toctree::
:maxdepth: 2
-
+
developer-docs/DebugChecks
developer-docs/IPA
developer-docs/InitializerLists
developer-docs/nullability
developer-docs/RegionStore
-
clang_analyzer_checkInlined(true); // expected-warning{{TRUE}}
return 42;
}
-
+
void topLevel() {
clang_analyzer_checkInlined(false); // no-warning (not inlined)
int value = inlined();
There are several options that control which calls the analyzer will consider for
inlining. The major one is ``-analyzer-config ipa``:
-* ``analyzer-config ipa=none`` - All inlining is disabled. This is the only mode
+* ``analyzer-config ipa=none`` - All inlining is disabled. This is the only mode
available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
-* ``analyzer-config ipa=basic-inlining`` - Turns on inlining for C functions, C++
- static member functions, and blocks -- essentially, the calls that behave
- like simple C function calls. This is essentially the mode used in
+* ``analyzer-config ipa=basic-inlining`` - Turns on inlining for C functions, C++
+ static member functions, and blocks -- essentially, the calls that behave
+ like simple C function calls. This is essentially the mode used in
Xcode 4.4.
* ``analyzer-config ipa=inlining`` - Turns on inlining when we can confidently find
correct. For virtual calls, inline the most plausible definition.
* ``analyzer-config ipa=dynamic-bifurcate`` - Same as -analyzer-config ipa=dynamic,
- but the path is split. We inline on one branch and do not inline on the
- other. This mode does not drop the coverage in cases when the parent class
+ but the path is split. We inline on one branch and do not inline on the
+ other. This mode does not drop the coverage in cases when the parent class
has code that is only exercised when some of its methods are overridden.
Currently, ``-analyzer-config ipa=dynamic-bifurcate`` is the default mode.
-While ``-analyzer-config ipa`` determines in general how aggressively the analyzer
-will try to inline functions, several additional options control which types of
-functions can inlined, in an all-or-nothing way. These options use the
+While ``-analyzer-config ipa`` determines in general how aggressively the analyzer
+will try to inline functions, several additional options control which types of
+functions can be inlined, in an all-or-nothing way. These options use the
analyzer's configuration table, so they are all specified as follows:
``-analyzer-config OPTION=VALUE``
cases the analyzer may still choose not to inline the function.
Note that under 'constructors', constructors for types with non-trivial
-destructors will not be inlined. Additionally, no C++ member functions will be
+destructors will not be inlined. Additionally, no C++ member functions will be
inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
regardless of the setting of the c++-inlining mode.
``-analyzer-config c++-stdlib-inlining=[true | false]``
-Currently, C++ standard library functions are considered for inlining by
+Currently, C++ standard library functions are considered for inlining by
default.
The standard library functions and the STL in particular are used ubiquitously
.. code-block:: cpp
-
+
std::distance(c.begin(), c.end()) == 0
c.begin() == c.end()
c.empty()
"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
method calls and Objective-C message sends. Due to the path-sensitive nature of
the analysis, the analyzer may be able to reason about the dynamic type of the
-object whose method is being called and thus "devirtualize" the call.
+object whose method is being called and thus "devirtualize" the call.
This path-sensitive devirtualization occurs when the analyzer can determine what
method would actually be called at runtime. This is possible when the type
inlined.
Inlining Dynamic Calls
-^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^
The -analyzer-config ipa option has five different modes: none, basic-inlining,
inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
all dynamic calls are inlined, whether we are certain or not that this will
actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
-are evaluated conservatively (as if no definition were available).
+are evaluated conservatively (as if no definition were available).
-* Currently, no Objective-C messages are not inlined under
+* Currently, Objective-C messages are not inlined under
-analyzer-config ipa=inlining, even if we are reasonably confident of the type
"dynamic", but performs a conservative invalidation in the general virtual case
in *addition* to inlining. The details of this are discussed below.
As stated above, -analyzer-config ipa=basic-inlining does not inline any C++
member functions or Objective-C method calls, even if they are non-virtual or
can be safely devirtualized.
ExprEngine::BifurcateCall implements the ``-analyzer-config ipa=dynamic-bifurcate``
mode.
When a call is made on an object with imprecise dynamic type information
(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
bifurcates the path and marks the object's region (retrieved from the
RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.
Currently, there are 2 modes:

* ``DynamicDispatchModeInlined`` - Models the case where the dynamic type information
  of the receiver (MemoryRegion) is assumed to be perfectly constrained so
  that a given definition of a method is expected to be the code actually
  called. When this mode is set, ExprEngine uses the Decl from
  RuntimeDefinition to inline any dynamically dispatched call sent to this
  receiver because the function definition is considered to be fully resolved.

* ``DynamicDispatchModeConservative`` - Models the case where the dynamic type
  information is assumed to be incorrect, for example, because the method
  definition is overridden in a subclass. In such cases, ExprEngine does not
  inline the methods sent to the receiver (MemoryRegion), even if a candidate
  definition is available. This mode is conservative about simulating the
  effects of a call.
Going forward along the symbolic execution path, ExprEngine consults the mode
of the receiver's MemRegion to make decisions on whether the calls should be
inlined or not, which ensures that there is at most one split per region.
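The bookkeeping described above can be modeled in a few lines. This is a toy sketch, not the analyzer's data structures: ``Mode``, ``State``, and ``evalCall`` are hypothetical names, regions are plain strings, and a "state" is just a map from region to recorded mode. It shows why there is at most one split per region: once a region carries a mode, later imprecise calls on it follow that mode instead of splitting again.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Toy stand-ins for DynamicDispatchModeInlined / DynamicDispatchModeConservative.
  enum class Mode { Inlined, Conservative };

  // A toy "program state": the dispatch mode recorded for each receiver region.
  using State = std::map<std::string, Mode>;

  // Evaluates one call with imprecise dynamic type info on 'region' in every
  // state. A state with no recorded mode for the region is split into two
  // successors; a state that already has a mode keeps a single path.
  std::vector<State> evalCall(const std::vector<State> &states,
                              const std::string &region) {
    std::vector<State> next;
    for (const State &s : states) {
      if (s.count(region)) {
        next.push_back(s); // Mode already chosen: follow it, no new split.
        continue;
      }
      State inlined = s, conservative = s;
      inlined[region] = Mode::Inlined;
      conservative[region] = Mode::Conservative;
      next.push_back(inlined);
      next.push_back(conservative);
    }
    return next;
  }

Starting from a single empty state, two calls on the same region yield two paths, and a call on a second region doubles that to four.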
At a high level, "bifurcation mode" allows for increased semantic coverage in
Objective-C Message Heuristics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ExprEngine relies on a set of heuristics to partition the set of Objective-C
method calls into those that require bifurcation and those that do not. Below
are the cases when the DynamicTypeInfo of the object is considered precise
(cannot be a subclass):
At this point, I am wondering about two questions.
* When should something belong to a checker and when should something belong to the engine?
  Sometimes we model library aspects in the engine and model language constructs in checkers.
* What is the checker programming model that we are aiming for? Maximum freedom or easier checker development?
I'd like to consider another funny example. Suppose we're trying to model
``std::unique_ptr``. Consider:

.. code-block:: cpp

  #include <memory>

  void clang_analyzer_eval(bool);
  void bar(const std::unique_ptr<int> &x);

  void foo(std::unique_ptr<int> &x) {
    int *a = x.get(); // (a, 0, direct): &AbstractStorageRegion
    *a = 1;           // (AbstractStorageRegion, 0, direct): 1 S32b
    // ...
    clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
    clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
  }
The checker doesn't totally need to ensure that ``*a == 1`` passes - even though the
pointer was unique, it could theoretically have ``.get()``-ed above and the code
could of course break the uniqueness invariant (though we'd probably want it).
anotherTakesNonNull(bar); // would be great to warn here, but not necessary(*)
Because ``bar`` corresponds to the same symbol all the time, it is not easy to implement the checker in a way that the cast only suppresses the first call but not the second. For this reason, in the first implementation, after a contradictory cast happens I will treat ``bar`` as nullable unspecified; this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has the advantage that, in case the ``takesNonNull`` function body is being inlined, there will be no warning when the symbol is dereferenced. In case I have time after the initial version, I might spend additional time trying to find a more sophisticated solution, in which we would produce the second warning (*).

**2) nonnull**

* Dereferencing a nonnull, or sending a message to it, is ok.
id obj = getNonnull();
takesNullable(obj);
  takesNonnull(obj);

  void takesNullable(nullable id obj) {
    obj->ivar; // we should assume obj is nullable and warn here
  }

With no special treatment, when ``takesNullable`` is inlined, the analyzer will not warn when the ``obj`` symbol is dereferenced. One solution for this is to reanalyze ``takesNullable`` as a top-level function to get possible violations. The alternative method, deducing nullability information from the arguments after inlining, is not robust enough (for example, there might be more parameters with different nullability, but in the given path the two parameters might end up being the same symbol, or there can be nested functions that take different views of the nullability of the same symbol). So the symbol will remain nonnull to avoid false positives, but the functions that take nullable parameters will also be analyzed separately, without inlining.
Annotations on multi level pointers
The `invocation list`:
.. code-block:: yaml

  "/path/to/your/project/foo.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/foo.cpp"
    - "-o"
    - "/path/to/your/project/foo.o"
  "/path/to/your/project/main.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/main.cpp"
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.
Currently, on-demand analysis is not supported with `scan-build-py`.
---------------------
Fuzzing tests are used to ensure the quality and security of LLVM-libc
implementations.

Each fuzzing test lives under the fuzzing directory in a subdirectory
corresponding to the src layout.

Currently, we use the system libc for functions that have yet to be
implemented; as they are implemented, the fuzzers will be changed to use our
implementation to increase test coverage.

Fuzzers will be run on `oss-fuzz <https://github.com/google/oss-fuzz>`_ and the
check-libc target will ensure that they build correctly.
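For readers unfamiliar with the shape of such a test, here is a minimal sketch of a libFuzzer-style target (``LLVMFuzzerTestOneInput`` is libFuzzer's standard entry point) that differentially compares a candidate implementation against the system libc. ``candidate_strlen`` is a hypothetical placeholder, not an LLVM-libc symbol, and the fuzzers in the tree may be organized differently.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstdlib>
  #include <cstring>
  #include <string>

  // Hypothetical implementation under test; a stand-in for a libc function.
  static std::size_t candidate_strlen(const char *s) {
    std::size_t n = 0;
    while (s[n] != '\0')
      ++n;
    return n;
  }

  // libFuzzer calls this entry point once per generated input.
  extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data,
                                        std::size_t size) {
    // Copy the bytes so the buffer is NUL-terminated; with embedded NULs,
    // both functions stop at the first one.
    std::string input(reinterpret_cast<const char *>(data), size);
    // Differential check against the system libc: abort on any mismatch.
    if (candidate_strlen(input.c_str()) != std::strlen(input.c_str()))
      std::abort();
    return 0;
  }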
There are differences in how LLD and LD64 handle ObjC symbols loaded from archives.

- LD64:

  * Duplicate ObjC symbols from the same archive will not raise an error. LD64 will pick the first one.
  * Duplicate ObjC symbols from different archives will raise a "duplicate symbol" error.

- LLD:

  * Duplicate symbols, regardless of which archives they are from, will raise errors.
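The two policies can be summarized as a small decision function. This is a toy model of the rules listed above, not code from either linker; ``Definition``, ``Linker``, and ``resolveObjCSymbol`` are hypothetical names, and real symbol resolution involves many more cases.

.. code-block:: cpp

  #include <cassert>
  #include <string>
  #include <vector>

  enum class Linker { LD64, LLD };

  // One definition of a given ObjC symbol and the archive it was loaded from.
  struct Definition {
    std::string archive;
    std::string member;
  };

  // Returns the definition the toy linker keeps, or nullptr when it would
  // report an error (the "no definition at all" case is lumped in here too).
  const Definition *resolveObjCSymbol(Linker linker,
                                      const std::vector<Definition> &defs) {
    if (defs.empty())
      return nullptr;
    if (defs.size() == 1)
      return &defs.front();
    if (linker == Linker::LLD)
      return nullptr; // LLD: every duplicate is an error.
    // LD64: duplicates from one archive are accepted and the first one wins;
    // duplicates spread over different archives are an error.
    for (const Definition &d : defs)
      if (d.archive != defs.front().archive)
        return nullptr;
    return &defs.front();
  }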
Usage
-----
The WebAssembly version of lld is installed as **wasm-ld**. It shares many
common linker flags with **ld.lld** but also includes several
WebAssembly-specific options:
**Do not use hard-coded line numbers in your test case.**
Instead, try to tag the line with some distinguishing pattern, and use the function ``line_number()`` defined in ``lldbtest.py``, which takes
``filename`` and ``string_to_match`` as arguments and returns the line number.
As an example, take a look at test/API/functionalities/breakpoint/breakpoint_conditions/main.c which has these
The default cleanup action performed by the packages/Python/lldbsuite/test/lldbtest.py module invokes the "make clean" os command.
If this default cleanup is not enough, an individual class can provide an extra cleanup hook with a class method named ``classCleanup``,
for example, in test/API/terminal/TestSTTYBeforeAndAfter.py:
.. code-block:: python

    @classmethod
    def classCleanup(cls):
        # Remove the file generated during the test run.
        cls.RemoveTempFile("child_send1.txt")
The 'child_send1.txt' file gets generated during the test run, so it makes sense to explicitly spell out the action in the same
TestSTTYBeforeAndAfter.py file to do the cleanup, instead of artificially adding it to the default cleanup action, which serves to
clean up the intermediate and a.out files.
``options`` Python summary formatters can optionally define this
third argument, which is an object of type ``lldb.SBTypeSummaryOptions``,
allowing for a few customizations of the result. Whether to adopt this
third argument, and what its options mean, is up to the individual formatter's writer.
Other than interactively typing a Python script there are two other ways for
**Layer:** The representation of trace data between passes. For Intel PT there are two types of layers:

  **Instruction Layer:** Composed of the load addresses of the instructions in the trace. In an effort to save space,
  metadata is only stored for instructions that are of interest, not every instruction in the trace. HTR contains a
  single instruction layer.

  **Block Layer:** Composed of blocks: a block in *layer n* refers to a sequence of blocks in *layer n - 1*. A block in
  *layer 1* refers to a sequence of instructions in *layer 0* (the instruction layer). Metadata is stored for each block in
  a block layer. HTR contains one or more block layers.
**Pass:** A transformation applied to a *layer* that generates a new *layer* that is a more summarized, consolidated representation of the trace data.
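One way to picture layers and passes is sketched below. It is an illustration under simplifying assumptions, not the HTR implementation: the names are hypothetical, metadata is omitted, and the example pass simply groups consecutive entries of the previous layer into blocks that end at entries flagged as boundaries (for instance, calls or returns).

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Layer 0: the load addresses of the traced instructions.
  using InstructionLayer = std::vector<std::uint64_t>;

  // A block covers 'size' consecutive entries of the previous layer,
  // starting at index 'offset'.
  struct Block {
    std::size_t offset;
    std::size_t size;
  };

  // Layer n >= 1: a sequence of blocks over layer n - 1.
  using BlockLayer = std::vector<Block>;

  // Example pass: builds a more summarized layer by closing a block after
  // every entry flagged in 'isBoundary' (and at the end of the layer).
  BlockLayer groupAtBoundaries(std::size_t prevLayerSize,
                               const std::vector<bool> &isBoundary) {
    assert(isBoundary.size() == prevLayerSize);
    BlockLayer out;
    std::size_t start = 0;
    for (std::size_t i = 0; i < prevLayerSize; ++i) {
      if (isBoundary[i] || i + 1 == prevLayerSize) {
        out.push_back({start, i - start + 1});
        start = i + 1;
      }
    }
    return out;
  }

Applying the pass to its own output produces the next layer, which mirrors how a block in *layer n* refers to blocks in *layer n - 1*.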
This document contains information necessary to successfully implement this
interface, use it, and test both sides. It also explains some of the finer
points about what exactly results mean.
``AliasAnalysis`` Class Overview
================================
int i;
char C[2];
  char A[10];
/* ... */
for (i = 0; i != 10; ++i) {
C[0] = A[i]; /* One byte store */
int i;
char C[2];
  char A[10];
/* ... */
for (i = 0; i != 10; ++i) {
((short*)C)[0] = A[i]; /* Two byte store! */
The ``alias`` method
--------------------

The ``alias`` method is the primary interface used to determine whether or not
two memory objects alias each other. It takes two memory objects as input and
returns MustAlias, PartialAlias, MayAlias, or NoAlias as appropriate.
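To make the four results concrete, here is a toy byte-range model of the query, applied to the one-byte and two-byte stores to ``C`` in the examples above. It is not LLVM's ``AliasAnalysis`` API: ``Access``, ``AliasKind``, and ``toyAlias`` are hypothetical, it assumes constant offsets into known named objects (real analyses often cannot establish this and answer MayAlias), and it reports MustAlias only for identical ranges, which is simpler than LLVM's exact definitions.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <string>

  enum class AliasKind { NoAlias, MayAlias, PartialAlias, MustAlias };

  // 'size' bytes starting 'offset' bytes into the object named 'base';
  // an empty base means the underlying object is unknown.
  struct Access {
    std::string base;
    std::int64_t offset;
    std::uint64_t size;
  };

  AliasKind toyAlias(const Access &a, const Access &b) {
    if (a.base.empty() || b.base.empty())
      return AliasKind::MayAlias; // Unknown object: assume the worst.
    if (a.base != b.base)
      return AliasKind::NoAlias; // Distinct named objects do not overlap.
    const std::int64_t aEnd = a.offset + static_cast<std::int64_t>(a.size);
    const std::int64_t bEnd = b.offset + static_cast<std::int64_t>(b.size);
    if (aEnd <= b.offset || bEnd <= a.offset)
      return AliasKind::NoAlias; // Disjoint byte ranges.
    if (a.offset == b.offset && a.size == b.size)
      return AliasKind::MustAlias; // Exactly the same bytes.
    return AliasKind::PartialAlias; // Overlapping, but not identical.
  }

With this model, one-byte stores to ``C[0]`` and ``C[1]`` give NoAlias, while the two-byte store through ``(short*)C`` partially overlaps a one-byte access to ``C[1]``.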
.. figure:: ARM-BE-ldr.png
    :align: right

    Big endian vector load using ``LDR``.
.. container:: clearer
Note that throughout this section we only mention loads. Stores have exactly the same problems as their associated loads, so have been skipped for brevity.

Considerations
==============
There are 3 parts to the implementation:
    1. Predicate ``LDR`` and ``STR`` instructions so that they are never allowed to be selected to generate vector loads and stores. The exception is one-lane vectors [1]_, which by definition cannot have lane-ordering problems and so can safely use ``LDR``/``STR``.
2. Create code generation patterns for bitconverts that create ``REV`` instructions.
LD1 v0.4s, [x]
    REV64 v0.4s, v0.4s                  // There is no REV128 instruction, so it must be synthesized
EXT v0.16b, v0.16b, v0.16b, #8 // with a REV64 then an EXT to swap the two 64-bit elements.
REV64 v0.2d, v0.2d
It turns out that these ``REV`` pairs can, in almost all cases, be squashed together into a single ``REV``. For the example above, a ``REV128 4s`` + ``REV128 2d`` is actually a ``REV64 4s``, as shown in the figure on the right.
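The claim can be checked by treating each ``REV`` as a permutation of lanes. The sketch below is a model, not code generation: it represents a 128-bit register as four 32-bit lanes, ignores byte order within a lane, and uses hypothetical helper names. Since there is no real ``REV128`` instruction, ``rev128_4s`` and ``rev128_2d`` only model the intended permutations.

.. code-block:: cpp

  #include <array>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // A 128-bit vector modeled as four 32-bit lanes, lane 0 first.
  using Vec = std::array<std::uint32_t, 4>;

  // Reverses the order of elements that are 'elemLanes' lanes wide within
  // each container of 'containerLanes' lanes; lanes inside an element keep
  // their order.
  Vec rev(const Vec &v, std::size_t elemLanes, std::size_t containerLanes) {
    assert(containerLanes % elemLanes == 0 && v.size() % containerLanes == 0);
    Vec out{};
    const std::size_t perContainer = containerLanes / elemLanes;
    for (std::size_t c = 0; c < v.size(); c += containerLanes)
      for (std::size_t e = 0; e < perContainer; ++e)
        for (std::size_t l = 0; l < elemLanes; ++l)
          out[c + e * elemLanes + l] =
              v[c + (perContainer - 1 - e) * elemLanes + l];
    return out;
  }

  // REV64 on .4s: swap the two 32-bit elements inside each 64-bit half.
  Vec rev64_4s(const Vec &v) { return rev(v, 1, 2); }
  // Modeled permutations only; REV128 does not exist as an instruction.
  Vec rev128_4s(const Vec &v) { return rev(v, 1, 4); }
  Vec rev128_2d(const Vec &v) { return rev(v, 2, 4); }

For lanes ``{A, B, C, D}``, ``rev128_2d(rev128_4s(v))`` and ``rev64_4s(v)`` both produce ``{B, A, D, C}``.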
.. [1] One-lane vectors may seem useless as a concept, but they serve to distinguish between values held in general purpose registers and values held in NEON/VFP registers. For example, an ``i64`` would live in an ``x`` register, but ``<1 x i64>`` would live in a ``d`` register.
plus 1.
* *preemptionspecifier*: If present, an encoding of the :ref:`runtime preemption specifier<bcpreemptionspecifier>` of this function.

MODULE_CODE_ALIAS Record
^^^^^^^^^^^^^^^^^^^^^^^^
components. LLVM library components are either library names with the LLVM
prefix removed (e.g. Support, Demangle...), LLVM target names, or special
purpose component names. The special purpose component names are:

#. ``all`` - All available LLVM component libraries
#. ``Native`` - The LLVM target for the Native system
#. ``AllTargetsAsmParsers`` - All the included target ASM parsers libraries
Defaults to ON.
**LLVM_EXPERIMENTAL_TARGETS_TO_BUILD**:STRING
  Semicolon-separated list of experimental targets to build and link into
  LLVM. This will build the experimental target without needing to add it to the
  list of all the targets available in LLVM's main CMakeLists.txt.
**LLVM_EXTERNAL_{CLANG,LLD,POLLY}_SOURCE_DIR**:PATH
$ D:\git> git clone https://github.com/mjansson/rpmalloc
$ D:\llvm-project> cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:\git\rpmalloc

  This flag needs to be used along with the static CRT, i.e. if building the
  Release target, add -DLLVM_USE_CRT_RELEASE=MT.
""""""""""""
The header file's guard should be the all-caps path that a user of this header
would ``#include``, using '_' instead of the path separator and extension marker.
For example, the header file
``llvm/include/llvm/Analysis/Utils/Local.h`` would be ``#include``-ed as
``#include "llvm/Analysis/Utils/Local.h"``, so its guard is
``LLVM_ANALYSIS_UTILS_LOCAL_H``.
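The rule amounts to a small string transformation. The sketch below is illustrative only (``headerGuardFor`` is a hypothetical helper, not an LLVM utility): it takes the path as written in the ``#include`` directive, not the path under ``llvm/include/``, and treats only ``/`` and ``.`` as separators.

.. code-block:: cpp

  #include <cassert>
  #include <cctype>
  #include <string>

  // Hypothetical helper: upper-cases the #include path and maps the path
  // separator '/' and the extension marker '.' to '_'.
  std::string headerGuardFor(const std::string &includePath) {
    std::string guard;
    guard.reserve(includePath.size());
    for (char ch : includePath) {
      if (ch == '/' || ch == '.')
        guard += '_';
      else
        guard += static_cast<char>(
            std::toupper(static_cast<unsigned char>(ch)));
    }
    return guard;
  }

For example, ``headerGuardFor("llvm/Analysis/Utils/Local.h")`` yields ``LLVM_ANALYSIS_UTILS_LOCAL_H``.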
Class overviews
(quick update) operations, the archive will be reconstructed in the format
defined by :option:`--format`.
Here's where :program:`llvm-ar` departs from previous :program:`ar`
implementations:
*The following option is not supported*

 [f] - truncate inserted filenames

*The following options are ignored for compatibility*

 --plugin=<string> - load a plugin which adds support for other file formats

 [l] - ignored in :program:`ar`
*Symbol Table*

 Since :program:`llvm-ar` supports bitcode files, the symbol table it creates
 includes both native and bitcode symbols.

*Deterministic Archives*

 By default, :program:`llvm-ar` always uses zero for timestamps and UIDs/GIDs
 to write archives in a deterministic mode. This is equivalent to the
 :option:`D` modifier being enabled by default. If you wish to maintain
 compatibility with other :program:`ar` implementations, you can pass the
 :option:`U` modifier to write actual timestamps and UIDs/GIDs.
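For orientation, the zeroed fields live in each member's fixed 60-byte header in the common ``ar`` format: name (16 bytes), date (12), uid (6), gid (6), octal mode (8), size (10), then a two-byte terminator (a backquote followed by a newline). The sketch below formats such a header; ``formatMemberHeader`` is a hypothetical helper, the GNU-style trailing ``/`` on short names is an assumption, long names are not modeled, and the mode is a parameter rather than a claim about what :program:`llvm-ar` writes.

.. code-block:: cpp

  #include <cassert>
  #include <cstdio>
  #include <string>

  // Hypothetical helper: builds one 60-byte member header. When
  // 'deterministic' is set, the date, uid and gid fields are written as 0.
  std::string formatMemberHeader(const std::string &name, unsigned long size,
                                 unsigned mode, bool deterministic,
                                 long mtime = 0, unsigned uid = 0,
                                 unsigned gid = 0) {
    assert(name.size() < 16 && "long names need a name table; not modeled");
    if (deterministic) {
      mtime = 0;
      uid = 0;
      gid = 0;
    }
    char buf[61]; // 60 header bytes plus the NUL written by snprintf.
    // name/16, date/12, uid/6, gid/6, mode (octal)/8, size/10, terminator/2.
    // Every field is left-justified and space-padded to its fixed width.
    std::snprintf(buf, sizeof(buf), "%-16s%-12ld%-6u%-6u%-8o%-10lu`\n",
                  (name + "/").c_str(), mtime, uid, gid, mode, size);
    return std::string(buf, 60);
  }

Passing ``deterministic = false`` keeps the caller's timestamp and IDs, which corresponds to the behavior requested with :option:`U`.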

*Windows Paths*
When on Windows :program:`llvm-ar` treats the names of archived *files* in the same
:program:`llvm-ar` operations are compatible with other :program:`ar`
implementations. However, there are a few modifiers (:option:`L`) that are not
found in other :program:`ar` implementations. The options for
:program:`llvm-ar` specify a single basic Operation to perform on the archive,
a variety of Modifiers for that Operation, the name of the archive file, and an
optional list of file names. If the *files* option is not specified, it
they do not exist. The :option:`a`, :option:`b`, :option:`T` and :option:`u`
modifiers apply to this operation. If no *files* are specified, the archive
is not modified.

t[v]
.. option:: t [vO]
size, and the date. With the :option:`O` modifier, display member offsets. If
any *files* are specified, the listing is only for those files. If no *files*
are specified, the table of contents for the whole archive is printed.

.. option:: V

 A synonym for the :option:`--version` option.
.. option:: x [oP]
.. option:: i

 A synonym for the :option:`b` option.
.. option:: L
selects the instance of the given name, with "1" indicating the first
instance. If :option:`N` is not specified the first member of that name will
 be selected. If *count* is not supplied, the operation fails. *count* cannot be

.. option:: o
When extracting files, use the modification times of any *files* as they
appear in the ``archive``. By default *files* extracted from the archive
use the time of extraction.

.. option:: O
Display member offsets inside the archive.
This modifier is the opposite of the :option:`s` modifier. It instructs
:program:`llvm-ar` to not build the symbol table. If both :option:`s` and
:option:`S` are used, the last modifier to occur in the options will prevail.

.. option:: u
Only update ``archive`` members with *files* that have more recent
timestamps.

.. option:: U
Use actual timestamps and UIDs/GIDs.
stream. No other options are compatible with this option.
.. option:: --rsp-quoting=<type>
 This option selects the quoting style ``<type>`` for response files, either
``posix`` or ``windows``. The default when on Windows is ``windows``, otherwise the
default is ``posix``.
supported by archivers following in the ar tradition. An MRI script contains a
sequence of commands to be executed by the archiver. The :option:`-M` option
allows for an MRI script to be passed to :program:`llvm-ar` through the
standard input stream.

Note that :program:`llvm-ar` has known limitations regarding the use of MRI
scripts:

* Each script can only create one archive.
* Existing archives cannot be modified.
  # LLVM-MCA-BEGIN A simple example
    add %eax, %eax
  # LLVM-MCA-END
The code from the example above defines a region named "A simple example" with a
single instruction in it. Note how the region name doesn't have to be repeated
  Cycles with backend pressure increase [ 48.07% ]
  Throughput Bottlenecks:
    Resource Pressure       [ 47.77% ]
    - JFPA  [ 47.77% ]
    - JFPU0  [ 47.77% ]
    Data Dependencies:      [ 0.30% ]
    - Register Dependencies [ 0.30% ]
    - Memory Dependencies   [ 0.00% ]

  Critical sequence based on the simulation:

  Instruction                                     Dependency Information
   +----< 2.    vhaddps %xmm3, %xmm3, %xmm4
   |
   |    < loop carried >
   |
   |      0.    vmulps  %xmm0, %xmm1, %xmm2
   +----> 1.    vhaddps %xmm2, %xmm2, %xmm3       ## RESOURCE interference: JFPA [ probability: 74% ]
   +----> 2.    vhaddps %xmm3, %xmm3, %xmm4       ## REGISTER dependency: %xmm3
   |
   |    < loop carried >
   |
   +----> 1.    vhaddps %xmm2, %xmm2, %xmm3       ## RESOURCE interference: JFPA [ probability: 74% ]
represents a single symbol, with leading and trailing whitespace ignored, as is
anything following a '#'. Can be specified multiple times to read names from
multiple files.

.. option:: --new-symbol-visibility <visibility>
Specify the visibility of the symbols automatically created when using binary
.. option:: -D, --disassemble-all
Disassemble all sections found in the input files.

.. option:: --disassemble-symbols=<symbol1[,symbol2,...]>
Disassemble only the specified symbols. Takes demangled symbol names when
.. option:: -u, --unwind-info
Display the unwind info of the input(s).

  This operation is currently only supported for COFF and Mach-O object files.
.. option:: -v, --version
.. option:: -sample
Specify that the input profile is a sample-based profile.

 The generated file can be written in one of three formats:
.. option:: -binary (default)
 Emit the profile using a binary encoding. For instrumentation-based profiles,
 the output format is the indexed binary format.
.. option:: -extbinary
.. option:: --demangle, -C
Display demangled symbol names in the output.

.. option:: --dependent-libraries
Display the dependent libraries section.
.. option:: --needed-libs
Display the needed libraries.

.. option:: --no-demangle
Do not display demangled symbol names in the output. On by default.
.. option:: --version-info, -V
Display version sections.

.. option:: --wide, -W
Ignored for GNU readelf compatibility. The output is already similar to when using -W with GNU readelf.

.. option:: @<FILE>
Read command-line options from response file `<FILE>`.
section index or section name.
.. option:: --string-table

 Display contents of the string table.
.. option:: --symbols, --syms, -s
Print just the file's name without any directories, instead of the
absolute path.

.. _llvm-symbolizer-opt-C:
.. option:: --demangle, -C
Specify the preferred output style. Defaults to ``LLVM``. When the output
style is set to ``GNU``, the tool follows the style of GNU's **addr2line**.
The differences from the ``LLVM`` style are:

  * Does not print the column of a source code location.
  * Does not add an empty line after the report for an address.
:depth: 3
.. warning::
  This is a work in progress. Compatibility across LLVM releases is not
  guaranteed.
Introduction
.. _coroutine handle:
LLVM coroutines are functions that have one or more `suspend points`_.
When a suspend point is reached, the execution of a coroutine is suspended and
control is returned back to its caller. A suspended coroutine can be resumed
to continue execution from the last suspend point or it can be destroyed.
In the following example, we call function `f` (which may or may not be a
coroutine itself) that returns a handle to a suspended coroutine
(**coroutine handle**) that is used by `main` to resume the coroutine twice and
then destroy it:
.. _coroutine frame:
In addition to the function stack frame which exists when a coroutine is
executing, there is an additional region of storage that contains objects that
keep the coroutine state when a coroutine is suspended. This region of storage
is called the **coroutine frame**. It is created when a coroutine is called
and destroyed when a coroutine either runs to completion or is destroyed
for(;;) {
print(n++);
<suspend> // returns a coroutine handle on first suspend
     }
  }
This coroutine calls some function `print` with value `n` as an argument and
suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
the `main` shown in the previous section, it will call `print` with values 4, 5
and 6, after which the coroutine will be destroyed.
The LLVM IR for this coroutine looks like this:
}
The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
lowered to a constant representing the size required for the coroutine frame.
The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
coroutine handle. The second parameter of `coro.begin` is given a block of memory
to be used if the coroutine frame needs to be allocated dynamically.
The `coro.id`_ intrinsic serves as the coroutine identity, which is useful in cases when the
`coro.begin`_ intrinsic gets duplicated by optimization passes such as
jump-threading.
The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
given the coroutine handle, returns a pointer to the memory block to be freed or
`null` if the coroutine frame was not allocated dynamically. The `cleanup`
block is entered when the coroutine runs to completion by itself or is destroyed via
a call to the `coro.destroy`_ intrinsic.
The `suspend` block contains code to be executed when the coroutine runs to
completion or is suspended. The `coro.end`_ intrinsic marks the point where
a coroutine needs to return control back to the caller if it is not an initial
invocation of the coroutine.
The `loop` block represents the body of the coroutine. The `coro.suspend`_
intrinsic in combination with the following switch indicates what happens to
control flow when a coroutine is suspended (default case), resumed (case 0) or
destroyed (case 1).
Coroutine Transformation
One of the steps of coroutine lowering is building the coroutine frame. The
def-use chains are analyzed to determine which objects need to be kept alive across
suspend points. In the coroutine shown in the previous section, the use of virtual register
`%inc` is separated from its definition by a suspend point; therefore, it
cannot reside on the stack frame, since the latter goes away once the coroutine
is suspended and control is returned back to the caller. An i32 slot is
allocated in the coroutine frame and `%inc` is spilled to and reloaded from that
slot as needed.
We also store addresses of the resume and destroy functions so that the
`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
when its identity cannot be determined statically at compile time. For our
example, the coroutine frame will be:
.. code-block:: llvm
%f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
After resume and destroy parts are outlined, function `f` will contain only the
code responsible for creation and initialization of the coroutine frame and
execution of the coroutine until a suspend point is reached:
.. code-block:: llvm
store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
%2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
    store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2

    %inc = add nsw i32 %n, 1
    %inc.spill.addr = getelementptr inbounds %f.frame, %f.frame* %frame, i32 0, i32 2
    store i32 %inc, i32* %inc.spill.addr
    call void @print(i32 %n)

    ret i8* %frame
}
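The effect of this transformation can be imitated by hand in C++. The sketch below is only an analogy for the lowered IR above, not compiler output: ``Frame`` mirrors ``%f.frame`` (resume pointer, destroy pointer, spilled ``i32``), the functions ``f``, ``f_resume``, and ``f_destroy`` stand in for ``@f``, ``@f.resume``, and ``@f.destroy``, and ``print`` is a hypothetical sink that records values so they can be checked.

.. code-block:: cpp

  #include <cassert>
  #include <vector>

  std::vector<int> printed;                   // Hypothetical sink for print().
  void print(int n) { printed.push_back(n); }

  // Mirrors %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }.
  struct Frame {
    void (*resume)(Frame *);
    void (*destroy)(Frame *);
    int inc; // The value spilled across the suspend point.
  };

  // Stands in for @f.resume: reload the spilled value, run one loop
  // iteration up to the next suspend point, and spill again.
  void f_resume(Frame *frame) {
    int n = frame->inc;
    frame->inc = n + 1;
    print(n);
  }

  // Stands in for @f.destroy: release the dynamically allocated frame.
  void f_destroy(Frame *frame) { delete frame; }

  // Stands in for the ramp function @f: create and initialize the frame,
  // execute until the first suspend point, and return the frame as the handle.
  void *f(int n) {
    Frame *frame = new Frame{&f_resume, &f_destroy, n + 1};
    print(n);
    return frame;
  }

Calling ``f(4)`` and then resuming twice through the stored ``resume`` pointer records 4, 5 and 6, matching the behavior described for `main` earlier; the handle is then released through the stored ``destroy`` pointer.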
Avoiding Heap Allocations
-------------------------

A particular coroutine usage pattern, which is illustrated by the `main`
function in the overview section, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
the RAII idiom and is suitable for allocation elision optimization, which avoids
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
In the entry block, we will call the `coro.alloc`_ intrinsic, which will return `true`
when dynamic allocation is required, and `false` if dynamic allocation is
elided.
.. code-block:: llvm
switch i8 %3, label %suspend [i8 0, label %loop
i8 1, label %cleanup]
-In this case, the coroutine frame would include a suspend index that will
-indicate at which suspend point the coroutine needs to resume. The resume
-function will use an index to jump to an appropriate basic block and will look
+In this case, the coroutine frame would include a suspend index that will
+indicate at which suspend point the coroutine needs to resume. The resume
+function uses the index to jump to the appropriate basic block and looks
as follows:
.. code-block:: llvm
ret void
}
-If different cleanup code needs to get executed for different suspend points,
+If different cleanup code needs to be executed for different suspend points,
a similar switch will be in the `f.destroy` function.
.. note ::
Using a suspend index in the coroutine state and having a switch in `f.resume` and
- `f.destroy` is one of the possible implementation strategies. We explored
+ `f.destroy` is one of the possible implementation strategies. We explored
another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
- every suspend point, and instead of storing an index, the resume and destroy
+ every suspend point, and instead of storing an index, the resume and destroy
function pointers are updated at every suspend. Early testing showed that the
- current approach is easier on the optimizer than the latter so it is a
+ current approach is easier on the optimizer than the latter, so it is the
lowering strategy implemented at the moment.
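To make the switch-based strategy more concrete, the following C++ sketch hand-lowers a hypothetical two-step coroutine along the lines described above: the frame holds resume and destroy function pointers plus a suspend index, the resume function switches on that index, and the ramp runs the code before the first suspend point. All names (``Frame``, ``g_ramp``, ``g_resume``, ``g_destroy``) are made up for illustration; this is not LLVM's actual frame layout or generated code.

.. code-block:: c++

   #include <cassert>
   #include <vector>

   // Hypothetical coroutine g(n): emit(n); <suspend>; emit(n + 1); <final suspend>
   struct Frame;
   using FrameFn = void (*)(Frame*);

   struct Frame {
     FrameFn resume;        // outlined resume part (nullptr once at the final suspend)
     FrameFn destroy;       // outlined destroy part
     int suspend_index;     // which suspend point the coroutine stopped at
     int n;                 // value spilled to the frame because it lives across a suspend
     std::vector<int>* out; // stands in for the @print side effect
   };

   void g_destroy(Frame* frame) {
     // Every suspend point needs the same cleanup here; if they differed, this
     // function would switch on frame->suspend_index just like g_resume does.
     delete frame;
   }

   void g_resume(Frame* frame) {
     switch (frame->suspend_index) {
       case 0:                        // resumed at the first suspend point
         frame->out->push_back(frame->n + 1);
         frame->suspend_index = 1;    // now stopped at the final suspend point
         frame->resume = nullptr;     // resuming from here would be undefined behavior
         return;
       default:
         assert(false && "no resumable suspend point with this index");
         return;
     }
   }

   // Ramp function: allocates the frame, runs until the first suspend point,
   // and returns the frame pointer as the coroutine handle.
   Frame* g_ramp(int n, std::vector<int>* out) {
     Frame* frame = new Frame{&g_resume, &g_destroy, 0, n, out};
     out->push_back(n);
     return frame;
   }

   // Indirect calls through the frame, mirroring the fallback lowering of
   // coro.resume and coro.destroy when the callee is not known statically.
   void resume(Frame* h) { h->resume(h); }
   void destroy(Frame* h) { h->destroy(h); }
   bool done(const Frame* h) { return h->resume == nullptr; }

A caller would invoke ``g_ramp``, call ``resume`` until ``done`` reports the final suspend point, and then call ``destroy``. Clearing the resume pointer is simply this sketch's way of recording that the final suspend point was reached.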
Distinct Save and Suspend
-------------------------
-In the previous example, setting a resume index (or some other state change that
+In the previous example, setting a resume index (or some other state change that
needs to happen to prepare a coroutine for resumption) happens at the same time as
-a suspension of a coroutine. However, in certain cases, it is necessary to control
+the suspension of the coroutine. However, in certain cases, it is necessary to control
when the coroutine is prepared for resumption and when it is suspended.
In the following example, a coroutine represents some activity that is driven
}
}
-In this case, coroutine should be ready for resumption prior to a call to
+In this case, the coroutine should be ready for resumption prior to the calls to
`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate the
point when the coroutine should be ready for resumption (namely, when a resume index
-should be stored in the coroutine frame, so that it can be resumed at the
+should be stored in the coroutine frame, so that it can be resumed at the
correct resume point):
.. code-block:: llvm
A coroutine author or a frontend may designate a distinguished `alloca` that can
be used to communicate with the coroutine. This distinguished alloca is called
-**coroutine promise** and is provided as the second parameter to the
+the **coroutine promise** and is provided as the second parameter to the
`coro.id`_ intrinsic.
The following coroutine designates a 32-bit integer `promise` and uses it to
* it is possible to check whether a suspended coroutine is at the final suspend
point via the `coro.done`_ intrinsic;
-* a resumption of a coroutine stopped at the final suspend point leads to
+* a resumption of a coroutine stopped at the final suspend point leads to
undefined behavior. The only possible action for a coroutine at a final
suspend point is destroying it via the `coro.destroy`_ intrinsic.
-From the user perspective, the final suspend point represents an idea of a
+From the user's perspective, the final suspend point represents the idea of a
coroutine reaching the end. From the compiler's perspective, it is an optimization
opportunity for reducing the number of resume points (and therefore switch cases) in
the resume function.
The following is an example of a function that keeps resuming the coroutine
-until the final suspend point is reached after which point the coroutine is
+until the final suspend point is reached, after which the coroutine is
destroyed:
.. code-block:: llvm
.. code-block:: c
void* coroutine(int n) {
- int current_value;
+ int current_value;
<designate current_value to be coroutine promise>
<SUSPEND> // injected suspend point, so that the coroutine starts suspended
for (int i = 0; i < n; ++i) {
Semantics:
""""""""""
-When possible, the `coro.destroy` intrinsic is replaced with a direct call to
-the coroutine destroy function. Otherwise it is replaced with an indirect call
+When possible, the `coro.destroy` intrinsic is replaced with a direct call to
+the coroutine destroy function. Otherwise it is replaced with an indirect call
based on the function pointer for the destroy function stored in the coroutine
frame. Destroying a coroutine that is not suspended leads to undefined behavior.
""""""""""
When possible, the `coro.resume` intrinsic is replaced with a direct call to the
-coroutine resume function. Otherwise it is replaced with an indirect call based
-on the function pointer for the resume function stored in the coroutine frame.
+coroutine resume function. Otherwise it is replaced with an indirect call based
+on the function pointer for the resume function stored in the coroutine frame.
Resuming a coroutine that is not suspended leads to undefined behavior.
.. _coro.done:
Semantics:
""""""""""
-Using this intrinsic on a coroutine that does not have a `final suspend`_ point
+Using this intrinsic on a coroutine that does not have a `final suspend`_ point
or on a coroutine that is not suspended leads to undefined behavior.
.. _coro.promise:
Overview:
"""""""""
-The '``llvm.coro.promise``' intrinsic obtains a pointer to a
+The '``llvm.coro.promise``' intrinsic obtains a pointer to a
`coroutine promise`_ given a switched-resume coroutine handle and vice versa.
Arguments:
""""""""""
-The first argument is a handle to a coroutine if `from` is false. Otherwise,
+The first argument is a handle to a coroutine if `from` is false. Otherwise,
it is a pointer to a coroutine promise.
-The second argument is an alignment requirements of the promise.
-If a frontend designated `%promise = alloca i32` as a promise, the alignment
-argument to `coro.promise` should be the alignment of `i32` on the target
-platform. If a frontend designated `%promise = alloca i32, align 16` as a
+The second argument is the alignment requirement of the promise.
+If a frontend designated `%promise = alloca i32` as a promise, the alignment
+argument to `coro.promise` should be the alignment of `i32` on the target
+platform. If a frontend designated `%promise = alloca i32, align 16` as a
promise, the alignment argument should be 16.
This argument only accepts constants.
The third argument is a boolean indicating the direction of the transformation.
-If `from` is true, the intrinsic returns a coroutine handle given a pointer
-to a promise. If `from` is false, the intrinsics return a pointer to a promise
+If `from` is true, the intrinsic returns a coroutine handle given a pointer
+to a promise. If `from` is false, the intrinsic returns a pointer to a promise
from a coroutine handle. This argument only accepts constants.
Semantics:
entry:
%hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
%promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
- %promise.addr = bitcast i8* %promise.addr.raw to i32*
+ %promise.addr = bitcast i8* %promise.addr.raw to i32*
%val = load i32, i32* %promise.addr ; load a value from the promise
call void @print(i32 %val)
call void @llvm.coro.destroy(i8* %hdl)
""""""""""
The `coro.size` intrinsic is lowered to a constant representing the size of
-the coroutine frame.
+the coroutine frame.
.. _coro.begin:
Arguments:
""""""""""
-The first argument is a token returned by a call to '``llvm.coro.id``'
+The first argument is a token returned by a call to '``llvm.coro.id``'
identifying the coroutine.
The second argument is a pointer to a block of memory where the coroutine frame
""""""""""
Depending on the alignment requirements of the objects in the coroutine frame
-and/or on the codegen compactness reasons the pointer returned from `coro.begin`
-may be at offset to the `%mem` argument. (This could be beneficial if
-instructions that express relative access to data can be more compactly encoded
+and/or for codegen compactness reasons, the pointer returned from `coro.begin`
+may be at an offset from the `%mem` argument. (This could be beneficial if
+instructions that express relative access to data can be more compactly encoded
with small positive and negative offsets.)
A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
Overview:
"""""""""
-The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
+The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
the coroutine frame is stored, or `null` if this instance of a coroutine did not use
dynamically allocated memory for its coroutine frame. This intrinsic is not
supported for returned-continuation coroutines.
Arguments:
""""""""""
-The first argument is a token returned by a call to '``llvm.coro.id``'
+The first argument is a token returned by a call to '``llvm.coro.id``'
identifying the coroutine.
The second argument is a pointer to the coroutine frame. This should be the same
Arguments:
""""""""""
-The first argument is a token returned by a call to '``llvm.coro.id``'
+The first argument is a token returned by a call to '``llvm.coro.id``'
identifying the coroutine.
Semantics:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
- declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>,
+ declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>,
i8* <fnaddrs>)
Overview:
Arguments:
""""""""""
-The first argument provides information on the alignment of the memory returned
-by the allocation function and given to `coro.begin` by the first argument. If
+The first argument provides information on the alignment of the memory returned
+by the allocation function and passed to `coro.begin` as its second argument. If
this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
This argument only accepts constants.
to be a `coroutine promise`_.
The third argument is `null` coming out of the frontend. The CoroEarly pass sets
-this argument to point to the function this coro.id belongs to.
+this argument to point to the function this coro.id belongs to.
-The fourth argument is `null` before coroutine is split, and later is replaced
-to point to a private global constant array containing function pointers to
+The fourth argument is `null` before the coroutine is split, and is later replaced
+to point to a private global constant array containing function pointers to
the outlined resume and destroy parts of the coroutine.
Overview:
"""""""""
-The '``llvm.coro.end``' marks the point where execution of the resume part of
+The '``llvm.coro.end``' marks the point where execution of the resume part of
the coroutine should end and control should return to the caller.
The first argument should refer to the coroutine handle of the enclosing
coroutine. A frontend is allowed to supply null as the first parameter, in which
-case `coro-early` pass will replace the null with an appropriate coroutine
+case the `coro-early` pass replaces the null with an appropriate coroutine
handle value.
-The second argument should be `true` if this coro.end is in the block that is
-part of the unwind sequence leaving the coroutine body due to an exception and
+The second argument should be `true` if this coro.end is in the block that is
+part of the unwind sequence leaving the coroutine body due to an exception and
`false` otherwise.
Semantics:
""""""""""
The purpose of this intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
-and should not be present in resume and destroy parts.
+and should not be present in resume and destroy parts.
In returned-continuation lowering, ``llvm.coro.end`` fully destroys the
coroutine frame. If the second argument is `false`, it also returns from
the start, resume and destroy parts. In the start part, it is a no-op;
in the resume and destroy parts, it is replaced with a `ret void` instruction and
the rest of the block containing `coro.end` instruction is discarded.
-In landing pads it is replaced with an appropriate instruction to unwind to
-caller. The handling of coro.end differs depending on whether the target is
+In landing pads, it is replaced with an appropriate instruction to unwind to
+the caller. The handling of coro.end differs depending on whether the target is
using the landingpad or WinEH exception model.
-For landingpad based exception model, it is expected that frontend uses the
+For the landingpad-based exception model, the frontend is expected to use the
`coro.end`_ intrinsic as follows:
.. code-block:: llvm
.. code-block:: llvm
- ehcleanup:
+ ehcleanup:
%tok = cleanuppad within none []
%unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
cleanupret from %tok unwind label %RestOfTheCleanup
-The `CoroSplit` pass, if the funclet bundle is present, will insert
+The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.
Arguments:
""""""""""
-The first argument refers to a token of `coro.save` intrinsic that marks the
+The first argument refers to a token of the `coro.save` intrinsic that marks the
point when the coroutine state is prepared for suspension. If the `none` token is passed,
the intrinsic behaves as if there were a `coro.save` immediately preceding
the `coro.suspend` intrinsic.
%s.final = call i8 @llvm.coro.suspend(token none, i1 true)
switch i8 %s.final, label %suspend [i8 0, label %trap
i8 1, label %cleanup]
- trap:
+ trap:
call void @llvm.trap()
unreachable
If a coroutine that was suspended at the suspend point marked by this intrinsic
is resumed via `coro.resume`_, control will transfer to the basic block
of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
-basic block indicated by the 1-case. To suspend, coroutine proceed to the
+basic block indicated by the 1-case. To suspend, the coroutine proceeds to the
default label.
If suspend intrinsic is marked as final, it can consider the `true` branch
Overview:
"""""""""
-The '``llvm.coro.save``' marks the point where a coroutine need to update its
-state to prepare for resumption to be considered suspended (and thus eligible
-for resumption).
+The '``llvm.coro.save``' intrinsic marks the point where a coroutine needs to
+update its state so that it is considered suspended (and thus eligible
+for resumption).
Arguments:
""""""""""
""""""""""
Whatever coroutine state changes are required to enable resumption of
-the coroutine from the corresponding suspend point should be done at the point
+the coroutine from the corresponding suspend point should be done at the point
of the `coro.save` intrinsic.
Example:
""""""""
-Separate save and suspend points are necessary when a coroutine is used to
+Separate save and suspend points are necessary when a coroutine is used to
represent an asynchronous control flow driven by callbacks representing
completions of asynchronous operations.
-In such a case, a coroutine should be ready for resumption prior to a call to
+In such a case, a coroutine should be ready for resumption prior to a call to
the `async_op` function, which may trigger resumption of the coroutine from the same or
a different thread, possibly before the `async_op` call returns control back
to the coroutine:
Arguments:
""""""""""
-The first argument points to an `alloca` storing the value of a parameter to a
-coroutine.
+The first argument points to an `alloca` storing the value of a parameter to a
+coroutine.
The second argument points to an `alloca` storing the value of the copy of that
parameter.
The optimizer is free to always replace this intrinsic with `i1 true`.
-The optimizer is also allowed to replace it with `i1 false` provided that the
+The optimizer is also allowed to replace it with `i1 false` provided that the
parameter copy is only used prior to control flow reaching any of the suspend
-points. The code that would be DCE'd if the `coro.param` is replaced with
+points. The code that would be DCE'd if the `coro.param` is replaced with
`i1 false` is not considered to be a use of the parameter copy.
-The frontend can emit this intrinsic if its language rules allow for this
+The frontend can emit this intrinsic if its language rules allow for this
optimization.
Example:
}
Note that `b` is used after a suspend point and thus must be copied
-into a coroutine frame, whereas `a` does not have to, since it never used
+into the coroutine frame, whereas `a` does not have to be, since it is never used
after a suspend point.
A frontend can create parameter copies for `a` and `b` as follows:
---------
The pass CoroEarly lowers coroutine intrinsics that hide the details of the
structure of the coroutine frame but otherwise do not need to be preserved to
-help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
+help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
and `coro.promise`_ intrinsics.
.. _CoroSplit:
CoroSplit
---------
-The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
+The pass CoroSplit builds the coroutine frame and outlines the resume and destroy parts
into separate functions.
CoroElide
---------
-The pass CoroElide examines if the inlined coroutine is eligible for heap
-allocation elision optimization. If so, it replaces
+The pass CoroElide examines whether the inlined coroutine is eligible for heap
+allocation elision optimization. If so, it replaces
the `coro.begin` intrinsic with the address of a coroutine frame placed in its caller
and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
-respectively to remove the deallocation code.
-This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
+respectively to remove the deallocation code.
+This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
calls to resume and destroy functions for a particular coroutine where possible.
CoroCleanup
allocas.
#. The CoroElide optimization pass relies on the coroutine ramp function being
- inlined. It would be beneficial to split the ramp function further to
+ inlined. It would be beneficial to split the ramp function further to
increase the chance that it will get inlined into its caller.
#. Design a convention that would make it possible to apply coroutine heap
7 f *= n;
8 return f;
-> 9 }
- 10
+ 10
11 int main(int argc, char** argv)
12 {
(lldb) p f
14 return -1;
15 char firstletter = argv[1][0];
-> 16 int result = compute_factorial(firstletter - '0');
- 17
+ 17
18 // Returned result is clipped at 255...
19 return result;
(lldb) p result
* thread #1, name = 'lli', stop reason = step over
frame #0: 0x00007ffff7fd0098 JIT(0x45c2cb0)`main(argc=2, argv=0x00000000046122f0) at showdebug.c:19:12
16 int result = compute_factorial(firstletter - '0');
- 17
+ 17
18 // Returned result is clipped at 255...
-> 19 return result;
20 }
instructions.
As described in [1]_, the DDG uses graph abstraction to group nodes
-that are part of a strongly connected component of the graph
+that are part of a strongly connected component of the graph
into special nodes called pi-blocks. pi-blocks represent cycles of data
dependency that prevent reordering transformations. Since any strongly
connected component of the graph is a maximal subgraph of all the nodes
that form a cycle, pi-blocks are at most one level deep. In other words,
-no pi-blocks are nested inside another pi-block, resulting in a
+no pi-block is nested inside another pi-block, resulting in a
hierarchical representation that is at most one level deep.
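The grouping of strongly connected components into pi-blocks can be sketched with Tarjan's SCC algorithm. The C++ below is an illustrative stand-alone sketch, not LLVM's DDG builder: nodes are integer ids, ``succ`` is an assumed adjacency list of dependence edges, and ``assignPiBlocks`` is a hypothetical helper that returns a pi-block id for every node on a cycle and ``-1`` for nodes that stay simple nodes. Because the SCCs partition the nodes, no node can belong to two pi-blocks, which is why the result is at most one level deep.

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <functional>
   #include <vector>

   // Returns, for each node, the id of the pi-block (cyclic SCC) it belongs to,
   // or -1 if the node is not part of any cycle and stays a simple node.
   std::vector<int> assignPiBlocks(const std::vector<std::vector<int>>& succ) {
     const int n = static_cast<int>(succ.size());
     std::vector<int> index(n, -1), low(n, 0), piBlock(n, -1);
     std::vector<bool> onStack(n, false);
     std::vector<int> stack;
     int nextIndex = 0, nextPiBlock = 0;

     std::function<void(int)> strongConnect = [&](int v) {
       index[v] = low[v] = nextIndex++;
       stack.push_back(v);
       onStack[v] = true;
       for (int w : succ[v]) {
         if (index[w] == -1) {          // unvisited successor: recurse
           strongConnect(w);
           low[v] = std::min(low[v], low[w]);
         } else if (onStack[w]) {       // back edge into the current SCC
           low[v] = std::min(low[v], index[w]);
         }
       }
       if (low[v] != index[v]) return;  // v is not the root of an SCC
       std::vector<int> members;
       int w;
       do {                             // pop the whole SCC rooted at v
         w = stack.back();
         stack.pop_back();
         onStack[w] = false;
         members.push_back(w);
       } while (w != v);
       // Only a cycle (several nodes, or a self-dependence) becomes a pi-block.
       bool selfLoop = std::find(succ[v].begin(), succ[v].end(), v) != succ[v].end();
       if (members.size() > 1 || selfLoop) {
         for (int m : members) piBlock[m] = nextPiBlock;
         ++nextPiBlock;
       }
     };

     for (int v = 0; v < n; ++v)
       if (index[v] == -1) strongConnect(v);
     return piBlock;
   }

For example, in a graph with edges 0→1, 1→2, 2→1, 2→3 and 3→3, nodes 1 and 2 share one pi-block, node 3 gets its own pi-block because of its self-dependence, and node 0 remains a simple node.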
graph described in [1]_ in the following ways:
1. The graph nodes in the paper represent three main program components, namely *assignment statements*, *for loop headers* and *while loop headers*. In this implementation, DDG nodes naturally represent LLVM IR instructions. An assignment statement in this implementation typically involves a node representing the ``store`` instruction along with a number of individual nodes computing the right-hand-side of the assignment that connect to the ``store`` node via a def-use edge. The loop header instructions are not represented as special nodes in this implementation because they have limited uses and can be easily identified, for example, through ``LoopAnalysis``.
- 2. The paper describes five types of dependency edges between nodes namely *loop dependency*, *flow-*, *anti-*, *output-*, and *input-* dependencies. In this implementation *memory* edges represent the *flow-*, *anti-*, *output-*, and *input-* dependencies. However, *loop dependencies* are not made explicit, because they mainly represent association between a loop structure and the program elements inside the loop and this association is fairly obvious in LLVM IR itself.
+ 2. The paper describes five types of dependency edges between nodes, namely *loop dependency*, *flow-*, *anti-*, *output-*, and *input-* dependencies. In this implementation, *memory* edges represent the *flow-*, *anti-*, *output-*, and *input-* dependencies. However, *loop dependencies* are not made explicit, because they mainly represent the association between a loop structure and the program elements inside the loop, and this association is fairly obvious in LLVM IR itself.
3. The paper describes two types of pi-blocks: *recurrences*, whose bodies are SCCs, and *IN* nodes, whose bodies are not part of any SCC. In this implementation, pi-blocks are only created for *recurrences*. *IN* nodes remain as simple DDG nodes in the graph.
* It is customary to respond to the original commit email mentioning the
revert. This serves as both a notice to the original author that their
patch was reverted, and helps others following llvm-commits track context.
-* Ideally, you should have a publicly reproducible test case ready to share.
+* Ideally, you should have a publicly reproducible test case ready to share.
Where possible, we encourage sharing of test cases in commit threads, or
in PRs. We encourage the reverter to minimize the test case and to prune
dependencies where practical. This even applies when reverting your own
Working with the CI system
--------------------------
-The main continuous integration (CI) tool for the LLVM project is the
-`LLVM Buildbot <https://lab.llvm.org/buildbot/>`_. It uses different *builders*
-to cover a wide variety of sub-projects and configurations. The builds are
-executed on different *workers*. Builders and workers are configured and
+The main continuous integration (CI) tool for the LLVM project is the
+`LLVM Buildbot <https://lab.llvm.org/buildbot/>`_. It uses different *builders*
+to cover a wide variety of sub-projects and configurations. The builds are
+executed on different *workers*. Builders and workers are configured and
provided by community members.
-The Buildbot tracks the commits on the main branch and the release branches.
+The Buildbot tracks the commits on the main branch and the release branches.
This means that patches are built and tested after they are merged to these
branches (aka post-merge testing). This also means it's okay to break the build
occasionally, as it's unreasonable to expect contributors to build and test
-their patch with every possible configuration.
+their patch with every possible configuration.
*If your commit broke the build:*
*If someone else broke the build and this blocks your work*
-* Comment on the code review in `Phabricator <https://reviews.llvm.org/>`_
+* Comment on the code review in `Phabricator <https://reviews.llvm.org/>`_
(if available) or email the author, explain the problem and how this impacts
you. Add a link to the broken build and the error message so folks can
understand the problem.
*If a build/worker is permanently broken*
* 1st step: contact the owner of the worker. You can find the name and contact
- information for the *Admin* of worker on the page of the build in the
+ information for the *Admin* of the worker on the page of the build in the
*Worker* tab:
.. image:: buildbot_worker_contact.png
-* 2nd step: If the owner does not respond or fix the worker, please escalate
+* 2nd step: If the owner does not respond or fix the worker, please escalate
to Galina Kistanova, the maintainer of the BuildBot master.
-* 3rd step: If Galina could not help you, please escalate to the
+* 3rd step: If Galina could not help you, please escalate to the
`Infrastructure Working Group <mailto:iwg@llvm.org>`_.
.. _new-llvm-components:
%ptr = call i32* @get_ptr()
%ptr_is_null = icmp eq i32* %ptr, null
br i1 %ptr_is_null, label %is_null, label %not_null, !make.implicit !0
-
+
not_null:
%t = load i32, i32* %ptr
br label %do_something_with_t
-
+
is_null:
call void @HFC()
unreachable
-
+
!0 = !{}
to control flow implicit in the instruction loading or storing through
%ptr = call i32* @get_ptr()
%t = load i32, i32* %ptr ;; handler-pc = label %is_null
br label %do_something_with_t
-
+
is_null:
call void @HFC()
unreachable
========
This document covers how to integrate LLVM into a compiler for a language which
-supports garbage collection. **Note that LLVM itself does not provide a
-garbage collector.** You must provide your own.
+supports garbage collection. **Note that LLVM itself does not provide a
+garbage collector.** You must provide your own.
Quick Start
============
-First, you should pick a collector strategy. LLVM includes a number of built
+First, you should pick a collector strategy. LLVM includes a number of built-in
ones, but you can also implement a loadable plugin with a custom definition.
-Note that the collector strategy is a description of how LLVM should generate
+Note that the collector strategy is a description of how LLVM should generate
code such that it interacts with your collector and runtime, not a description
of the collector itself.
-Next, mark your generated functions as using your chosen collector strategy.
-From c++, you can call:
+Next, mark your generated functions as using your chosen collector strategy.
+From C++, you can call:
.. code-block:: c++
When generating LLVM IR for your functions, you will need to:
-* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and
- store instructions. These intrinsics are used to represent load and store
- barriers. If you collector does not require such barriers, you can skip
- this step.
+* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and
+ store instructions. These intrinsics are used to represent load and store
+ barriers. If your collector does not require such barriers, you can skip
+ this step.
-* Use the memory allocation routines provided by your garbage collector's
+* Use the memory allocation routines provided by your garbage collector's
runtime library.
-* If your collector requires them, generate type maps according to your
- runtime's binary interface. LLVM is not involved in the process. In
- particular, the LLVM type system is not suitable for conveying such
+* If your collector requires them, generate type maps according to your
+ runtime's binary interface. LLVM is not involved in the process. In
+ particular, the LLVM type system is not suitable for conveying such
information through the compiler.
-* Insert any coordination code required for interacting with your collector.
+* Insert any coordination code required for interacting with your collector.
Many collectors require running application code to periodically check a
- flag and conditionally call a runtime function. This is often referred to
- as a safepoint poll.
+ flag and conditionally call a runtime function. This is often referred to
+ as a safepoint poll.
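As an illustration of the safepoint poll mentioned above, the following C++ sketch shows the general shape of the check a compiler might insert on a loop back-edge. The flag ``gSafepointRequested``, the counter ``gSafepointsTaken`` and ``runtimeSafepointSlowPath`` are hypothetical stand-ins for whatever your runtime provides; a real slow path would publish the thread's roots and wait for the collection to finish.

.. code-block:: c++

   #include <atomic>
   #include <cassert>

   // Hypothetical runtime state: the collector sets this flag when it wants
   // running threads to stop at their next safepoint.
   std::atomic<bool> gSafepointRequested{false};
   int gSafepointsTaken = 0; // counts slow-path entries, for illustration only

   // Hypothetical slow path supplied by the GC runtime. A real one would
   // cooperate with the collector; this one only acknowledges the request.
   void runtimeSafepointSlowPath() {
     ++gSafepointsTaken;
     gSafepointRequested.store(false, std::memory_order_release);
   }

   // The fast-path check: one load and a rarely taken branch.
   inline void safepointPoll() {
     if (gSafepointRequested.load(std::memory_order_acquire))
       runtimeSafepointSlowPath();
   }

   // Application code with a poll on the loop back-edge, so a long-running
   // loop cannot delay a collection indefinitely.
   long sumTo(int n) {
     long total = 0;
     for (int i = 0; i < n; ++i) {
       total += i;
       safepointPoll();
     }
     return total;
   }

Keeping the fast path to a single load and branch is the usual design goal, since the poll executes far more often than the slow path.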
-You will need to identify roots (i.e. references to heap objects your collector
-needs to know about) in your generated IR, so that LLVM can encode them into
-your final stack maps. Depending on the collector strategy chosen, this is
-accomplished by using either the ``@llvm.gcroot`` intrinsics or an
-``gc.statepoint`` relocation sequence.
+You will need to identify roots (i.e. references to heap objects your collector
+needs to know about) in your generated IR, so that LLVM can encode them into
+your final stack maps. Depending on the collector strategy chosen, this is
+accomplished by using either the ``@llvm.gcroot`` intrinsics or an
+``gc.statepoint`` relocation sequence.
Don't forget to create a root for each intermediate value that is generated when
-evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could
+evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could
easily be collected if evaluating ``g()`` triggers a collection.
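The hazard can be reproduced with a toy collector. In the C++ sketch below, ``allocate``, ``collect``, ``Root`` and the global root list are hypothetical stand-ins for a runtime (loosely in the spirit of a shadow stack); they are not LLVM APIs. Each intermediate result is held in a registered root before the next call is evaluated, so the collection triggered inside ``g()`` cannot free the result of ``f()``. Evaluating into named roots also sidesteps the fact that C++ leaves the evaluation order of the arguments in ``h(f(), g())`` unspecified.

.. code-block:: c++

   #include <cassert>
   #include <unordered_set>
   #include <vector>

   // Toy, hypothetical runtime: objects live in gHeap, and collect() frees
   // every object that is not referenced from a registered root slot.
   struct Obj { int value; };

   std::unordered_set<Obj*> gHeap;
   std::vector<Obj**> gRoots; // addresses of stack slots holding references

   Obj* allocate(int v) {
     Obj* o = new Obj{v};
     gHeap.insert(o);
     return o;
   }

   void collect() {
     std::unordered_set<Obj*> live;
     for (Obj** slot : gRoots)
       if (*slot) live.insert(*slot);
     for (auto it = gHeap.begin(); it != gHeap.end();) {
       if (!live.count(*it)) { delete *it; it = gHeap.erase(it); }
       else ++it;
     }
   }

   // RAII helper that registers a stack slot as a root for its lifetime.
   struct Root {
     Obj* ptr;
     explicit Root(Obj* p) : ptr(p) { gRoots.push_back(&ptr); }
     ~Root() { gRoots.pop_back(); }
     Root(const Root&) = delete;
     Root& operator=(const Root&) = delete;
   };

   Obj* f() { return allocate(1); }
   Obj* g() { collect(); return allocate(2); } // g() triggers a collection
   int h(Obj* a, Obj* b) { return a->value + b->value; }

   int callRooted() {
     Root a(f()); // f()'s result is rooted before g() runs
     Root b(g());
     return h(a.ptr, b.ptr);
   }

Without the ``Root`` for ``a``, the ``collect()`` inside ``g()`` would free ``f()``'s result and ``h`` would read freed memory.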
-Finally, you need to link your runtime library with the generated program
-executable (for a static compiler) or ensure the appropriate symbols are
-available for the runtime linker (for a JIT compiler).
+Finally, you need to link your runtime library with the generated program
+executable (for a static compiler) or ensure the appropriate symbols are
+available for the runtime linker (for a JIT compiler).
Introduction
* reference counting
-We hope that the support built into the LLVM IR is sufficient to support a
-broad class of garbage collected languages including Scheme, ML, Java, C#,
+We hope that the support built into the LLVM IR is sufficient to support a
+broad class of garbage-collected languages including Scheme, ML, Java, C#,
Perl, Python, Lua, Ruby, other scripting languages, and more.
Note that LLVM **does not itself provide a garbage collector** --- this should
be part of your language's runtime library. LLVM provides a framework for
describing the garbage collector's requirements to the compiler. In particular,
-LLVM provides support for generating stack maps at call sites, polling for a
-safepoint, and emitting load and store barriers. You can also extend LLVM -
+LLVM provides support for generating stack maps at call sites, polling for a
+safepoint, and emitting load and store barriers. You can also extend LLVM,
possibly through a loadable :ref:`code generation plugin <plugin>`, to
generate code and data structures which conform to the *binary interface*
specified by the *runtime library*. This is similar to the relationship between
In general, LLVM's support for GC does not include features which can be
adequately addressed with other features of the IR and does not specify a
particular binary interface. On the plus side, this means that you should be
-able to integrate LLVM with an existing runtime. On the other hand, it can
-have the effect of leaving a lot of work for the developer of a novel
-language. We try to mitigate this by providing built in collector strategy
-descriptions that can work with many common collector designs and easy
-extension points. If you don't already have a specific binary interface
-you need to support, we recommend trying to use one of these built in collector
+able to integrate LLVM with an existing runtime. On the other hand, it can
+have the effect of leaving a lot of work for the developer of a novel
+language. We try to mitigate this by providing built-in collector strategy
+descriptions that can work with many common collector designs, along with easy
+extension points. If you don't already have a specific binary interface
+you need to support, we recommend trying to use one of these built-in collector
strategies.
.. _gc_intrinsics:
This section describes the garbage collection facilities provided by the
:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these
-IR features is specified by the selected :ref:`GC strategy description
-<plugin>`.
+IR features is specified by the selected :ref:`GC strategy description
+<plugin>`.
Specifying GC code generation: ``gc "..."``
-------------------------------------------
compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``.
Setting ``gc "name"`` on a function triggers a search for a matching subclass
-of GCStrategy. Some collector strategies are built in. You can add others
+of GCStrategy. Some collector strategies are built in. You can add others
using either the loadable plugin mechanism, or by patching your copy of LLVM.
-It is the selected GC strategy which defines the exact nature of the code
+It is the selected GC strategy which defines the exact nature of the code
generated to support GC. If none is found, the compiler will raise an error.
Specifying the GC style on a per-function basis allows LLVM to link together
----------------------------------
LLVM currently supports two different mechanisms for describing references in
-compiled code at safepoints. ``llvm.gcroot`` is the older mechanism;
-``gc.statepoint`` has been added more recently. At the moment, you can choose
-either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer
-term, we will probably either migrate away from ``llvm.gcroot`` entirely, or
-substantially merge their implementations. Note that most new development
-work is focused on ``gc.statepoint``.
+compiled code at safepoints. ``llvm.gcroot`` is the older mechanism;
+``gc.statepoint`` has been added more recently. At the moment, you can choose
+either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer
+term, we will probably either migrate away from ``llvm.gcroot`` entirely, or
+substantially merge their implementations. Note that most new development
+work is focused on ``gc.statepoint``.
Using ``gc.statepoint``
^^^^^^^^^^^^^^^^^^^^^^^^
-:doc:`This page <Statepoints>` contains detailed documentation for
-``gc.statepoint``.
+:doc:`This page <Statepoints>` contains detailed documentation for
+``gc.statepoint``.
-Using ``llvm.gcwrite``
+Using ``llvm.gcroot``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable
references an object on the heap and is to be tracked for garbage collection.
-The exact impact on generated code is specified by the Function's selected
-:ref:`GC strategy <plugin>`. All calls to ``llvm.gcroot`` **must** reside
+The exact impact on generated code is specified by the Function's selected
+:ref:`GC strategy <plugin>`. All calls to ``llvm.gcroot`` **must** reside
inside the first basic block.
The first argument **must** be a value referring to an alloca instruction or a
associated with the pointer, and **must** be a constant or global value
address. If your target collector uses tags, use a null pointer for metadata.
-A compiler which performs manual SSA construction **must** ensure that SSA
+A compiler which performs manual SSA construction **must** ensure that SSA
-values representing GC references are stored in to the alloca passed to the
+values representing GC references are stored into the alloca passed to the
-respective ``gcroot`` before every call site and reloaded after every call.
-A compiler which uses mem2reg to raise imperative code using ``alloca`` into
-SSA form need only add a call to ``@llvm.gcroot`` for those variables which
-are pointers into the GC heap.
+respective ``gcroot`` before every call site and reloaded after every call.
+A compiler which uses mem2reg to raise imperative code using ``alloca`` into
+SSA form need only add a call to ``@llvm.gcroot`` for those variables which
+are pointers into the GC heap.
It is also important to mark intermediate values with ``llvm.gcroot``. For
example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the
(although a particular :ref:`collector strategy <plugin>` might). However, it
would be an unusual collector that violated it.
-The use of these intrinsics is naturally optional if the target GC does not
-require the corresponding barrier. The GC strategy used with such a collector
-should replace the intrinsic calls with the corresponding ``load`` or
+The use of these intrinsics is naturally optional if the target GC does not
+require the corresponding barrier. The GC strategy used with such a collector
+should replace the intrinsic calls with the corresponding ``load`` or
``store`` instruction if they are used.
-One known deficiency with the current design is that the barrier intrinsics do
-not include the size or alignment of the underlying operation performed. It is
+One known deficiency with the current design is that the barrier intrinsics do
+not include the size or alignment of the underlying operation performed. It is
currently assumed that the operation is of pointer size and the alignment is
assumed to be the target machine's default alignment.
Built In GC Strategies
======================
-LLVM includes built in support for several varieties of garbage collectors.
+LLVM includes built in support for several varieties of garbage collectors.
The Shadow Stack GC
----------------------
The 'Erlang' and 'Ocaml' GCs
-----------------------------
-LLVM ships with two example collectors which leverage the ``gcroot``
-mechanisms. To our knowledge, these are not actually used by any language
-runtime, but they do provide a reasonable starting point for someone interested
-in writing an ``gcroot`` compatible GC plugin. In particular, these are the
-only in tree examples of how to produce a custom binary stack map format using
+LLVM ships with two example collectors which leverage the ``gcroot``
+mechanisms. To our knowledge, these are not actually used by any language
+runtime, but they do provide a reasonable starting point for someone interested
+in writing a ``gcroot``-compatible GC plugin. In particular, these are the
+only in-tree examples of how to produce a custom binary stack map format using
a ``gcroot`` strategy.
-As there names imply, the binary format produced is intended to model that
-used by the Erlang and OCaml compilers respectively.
+As their names imply, the binary format produced is intended to model that
+used by the Erlang and OCaml compilers respectively.
.. _statepoint_example_gc:
F.setGC("statepoint-example");
-This GC provides an example of how one might use the infrastructure provided
-by ``gc.statepoint``. This example GC is compatible with the
-:ref:`PlaceSafepoints` and :ref:`RewriteStatepointsForGC` utility passes
-which simplify ``gc.statepoint`` sequence insertion. If you need to build a
+This GC provides an example of how one might use the infrastructure provided
+by ``gc.statepoint``. This example GC is compatible with the
+:ref:`PlaceSafepoints` and :ref:`RewriteStatepointsForGC` utility passes
+which simplify ``gc.statepoint`` sequence insertion. If you need to build a
custom GC strategy around the ``gc.statepoints`` mechanisms, it is recommended
that you use this one as a starting point.
-This GC strategy does not support read or write barriers. As a result, these
+This GC strategy does not support read or write barriers. As a result, these
intrinsics are lowered to normal loads and stores.
-The stack map format generated by this GC strategy can be found in the
-:ref:`stackmap-section` using a format documented :ref:`here
-<statepoint-stackmap-format>`. This format is intended to be the standard
+The stack map format generated by this GC strategy can be found in the
+:ref:`stackmap-section` using a format documented :ref:`here
+<statepoint-stackmap-format>`. This format is intended to be the standard
format supported by LLVM going forward.
The CoreCLR GC
F.setGC("coreclr");
-This GC leverages the ``gc.statepoint`` mechanism to support the
+This GC leverages the ``gc.statepoint`` mechanism to support the
`CoreCLR <https://github.com/dotnet/coreclr>`__ runtime.
-Support for this GC strategy is a work in progress. This strategy will
-differ from
-:ref:`statepoint-example GC<statepoint_example_gc>` strategy in
+Support for this GC strategy is a work in progress. This strategy will
+differ from
+:ref:`statepoint-example GC<statepoint_example_gc>` strategy in
certain aspects like:
-* Base-pointers of interior pointers are not explicitly
+* Base-pointers of interior pointers are not explicitly
tracked and reported.
* A different format is used for encoding stack maps.
====================
If none of the built in GC strategy descriptions met your needs above, you will
-need to define a custom GCStrategy and possibly, a custom LLVM pass to perform
-lowering. Your best example of where to start defining a custom GCStrategy
+need to define a custom GCStrategy and, possibly, a custom LLVM pass to perform
+lowering. Your best example of where to start defining a custom GCStrategy
would be to look at one of the built in strategies.
You may be able to structure this additional code as a loadable plugin library.
-Loadable plugins are sufficient if all you need is to enable a different
-combination of built in functionality, but if you need to provide a custom
-lowering pass, you will need to build a patched version of LLVM. If you think
-you need a patched build, please ask for advice on llvm-dev. There may be an
-easy way we can extend the support to make it work for your use case without
-requiring a custom build.
+Loadable plugins are sufficient if all you need is to enable a different
+combination of built in functionality, but if you need to provide a custom
+lowering pass, you will need to build a patched version of LLVM. If you think
+you need a patched build, please ask for advice on llvm-dev. There may be an
+easy way we can extend the support to make it work for your use case without
+requiring a custom build.
Collector Requirements
----------------------
You should be able to leverage any existing collector library that includes the following elements:
-#. A memory allocator which exposes an allocation function your compiled
+#. A memory allocator which exposes an allocation function your compiled
code can call.
#. A binary format for the stack map. A stack map describes the location
which conservatively scan the stack don't require such a structure.
#. A stack crawler to discover functions on the call stack, and enumerate the
- references listed in the stack map for each call site.
+ references listed in the stack map for each call site.
-#. A mechanism for identifying references in global locations (e.g. global
+#. A mechanism for identifying references in global locations (e.g. global
variables).
-#. If you collector requires them, an LLVM IR implementation of your collectors
+#. If your collector requires them, an LLVM IR implementation of your collector's
- load and store barriers. Note that since many collectors don't require
- barriers at all, LLVM defaults to lowering such barriers to normal loads
+ load and store barriers. Note that since many collectors don't require
+ barriers at all, LLVM defaults to lowering such barriers to normal loads
and stores unless you arrange otherwise.
For GCs which use barriers or unusual treatment of stack roots, the
-implementor is responsibly for providing a custom pass to lower the
+implementor is responsible for providing a custom pass to lower the
intrinsics with the desired semantics. If you have opted in to custom
-lowering of a particular intrinsic your pass **must** eliminate all
+lowering of a particular intrinsic your pass **must** eliminate all
instances of the corresponding intrinsic in functions which opt in to
-your GC. The best example of such a pass is the ShadowStackGC and it's
-ShadowStackGCLowering pass.
+your GC. The best example of such a pass is the ShadowStackGC and its
+ShadowStackGCLowering pass.
-There is currently no way to register such a custom lowering pass
+There is currently no way to register such a custom lowering pass
without building a custom copy of LLVM.
.. _safe-points:
-------------\r
\r
If you can't find what you need in these docs, try consulting the mailing\r
-lists. In addition to the traditional mailing lists there is also a \r
-`Discourse server <https://llvm.discourse.group>`_ available. \r
+lists. In addition to the traditional mailing lists there is also a\r
+`Discourse server <https://llvm.discourse.group>`_ available.\r
\r
`Developer's List (llvm-dev)`__\r
This list is for people who want to be included in technical discussions of\r
- Every 2 weeks on Thursday\r
- `ics <https://calendar.google.com/calendar/ical/lowrisc.org_0n5pkesfjcnp0bh5hps1p0bd80%40group.calendar.google.com/public/basic.ics>`__\r
`gcal <https://calendar.google.com/calendar/b/1?cid=bG93cmlzYy5vcmdfMG41cGtlc2ZqY25wMGJoNWhwczFwMGJkODBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ>`__\r
- - \r
+ -\r
* - Scalable Vectors and Arm SVE\r
- Monthly, every 3rd Tuesday\r
- `ics <https://calendar.google.com/calendar/ical/bjms39pe6k6bo5egtsp7don414%40group.calendar.google.com/public/basic.ics>`__\r
- `Minutes/docs <https://docs.google.com/document/d/1GLCE8cl7goCaLSiM9j1eIq5IqeXt6_YTY2UEcC4jmsg/edit?usp=sharing>`__\r
* - `CIRCT <https://github.com/llvm/circt>`__\r
- Weekly, on Wednesday\r
- - \r
+ -\r
- `Minutes/docs <https://docs.google.com/document/d/1fOSRdyZR2w75D87yU2Ma9h2-_lEPL4NxvhJGJd-s5pk/edit#heading=h.mulvhjtr8dk9>`__\r
* - `MLIR <https://mlir.llvm.org>`__ design meetings\r
- Weekly, on Thursdays\r
- - \r
+ -\r
- `Minutes/docs <https://docs.google.com/document/d/1y_9f1AbfgcoVdJh4_aM6-BaSHvrHl8zuA5G4jv_94K8/edit#heading=h.cite1kolful9>`__\r
* - flang\r
- Multiple meeting series, `documented here <https://github.com/llvm/llvm-project/blob/main/flang/docs/GettingInvolved.md#calls>`__\r
- - \r
- - \r
+ -\r
+ -\r
* - OpenMP\r
- Multiple meeting series, `documented here <https://openmp.llvm.org/docs/SupportAndFAQ.html>`__\r
- - \r
- - \r
+ -\r
+ -\r
* - LLVM Alias Analysis\r
- Every 4 weeks on Tuesdays\r
- `ics <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201103/a3499a67/attachment-0001.ics>`__\r
- `Minutes/docs <https://docs.google.com/document/d/17U-WvX8qyKc3S36YUKr3xfF-GHunWyYowXbxEdpHscw>`__\r
* - Windows/COFF related developments\r
- Every 2 months on Thursday\r
- - \r
+ -\r
- `Minutes/docs <https://docs.google.com/document/d/1A-W0Sas_oHWTEl_x_djZYoRtzAdTONMW_6l1BH9G6Bo/edit?usp=sharing>`__\r
* - Vector Predication\r
- Every 2 weeks on Tuesdays, 3pm UTC\r
* clang-bot - A `geordi <http://www.eelis.net/geordi/>`_ instance running\r
near-trunk clang instead of gcc.\r
\r
-In addition to the traditional IRC there is a \r
-`Discord <https://discord.com/channels/636084430946959380/636725486533345280>`_ \r
-chat server available. To sign up, please use this \r
+In addition to the traditional IRC there is a\r
+`Discord <https://discord.com/channels/636084430946959380/636725486533345280>`_\r
+chat server available. To sign up, please use this\r
`invitation link <https://discord.com/invite/xS7Z362>`_.\r
- \r
+\r
\r
.. _meetups-social-events:\r
\r
pip install psutil
git clone https://github.com/llvm/llvm-project.git llvm
-
+
Instead of ``git clone`` you may download a compressed source distribution
from the `releases page <https://github.com/llvm/llvm-project/releases>`_.
Select the last link: ``Source code (zip)`` and unpack the downloaded file using
You can run LLVM tests by merely building the project "check-all". The test
results will be shown in the VS output window. Once the build succeeds, you
have verified a working LLVM development environment!
-
+
You should not see any unexpected failures, but will see many unsupported
tests and expected failures:
choco install -y git cmake python3
pip3 install psutil
-There is also a Windows
-`Dockerfile <https://github.com/llvm/llvm-zorg/blob/main/buildbot/google/docker/windows-base-vscode2019/Dockerfile>`_
+There is also a Windows
+`Dockerfile <https://github.com/llvm/llvm-zorg/blob/main/buildbot/google/docker/windows-base-vscode2019/Dockerfile>`_
with the entire build tool chain. This can be used to test the build with a
-tool chain different from your host installation or to create build servers.
+tool chain different from your host installation or to create build servers.
Next steps
==========
To make sure your run script works, it's a good idea to run ``./run.sh`` by
hand and tweak the script until it works, then run ``git bisect good`` or
-``git bisect bad`` manually once based on the result of the script
+``git bisect bad`` manually once based on the result of the script
(check ``echo $?`` after your script ran), and only then run ``git bisect run
./run.sh``. Don't forget to mark your run script as executable -- ``git bisect
run`` doesn't check for that, it just assumes the run script failed each time.
A-o-o-......-o-D-o-o-HEAD
/
B-o-...-o-C-
-
+
``A`` is the first commit in LLVM ever, ``97724f18c79c``.
``B`` is the first commit in MLIR, ``aed0d21a62db``.
G_JUMP_TABLE
^^^^^^^^^^^^
-Generates a pointer to the address of the jump table specified by the source
+Generates a pointer to the address of the jump table specified by the source
operand. The source operand is a jump table index.
-G_JUMP_TABLE can be used in conjunction with G_BRJT to support jump table
+G_JUMP_TABLE can be used in conjunction with G_BRJT to support jump table
codegen with GlobalISel.
.. code-block:: none
* ``widenScalarToNextPow2()`` is like ``widenScalarIf()`` but is satisfied iff the type
size in bits is not a power of 2 and selects a target type that is the next
- largest power of 2.
+ largest power of 2.
.. _clampscalar:
* ``minScalar()`` is like ``widenScalarIf()`` but is satisfied iff the type
size in bits is smaller than the given minimum and selects the minimum as the
target type. Similarly, there is also a ``maxScalar()`` for the maximum and a
- ``clampScalar()`` to do both at once.
+ ``clampScalar()`` to do both at once.
* ``minScalarSameAs()`` is like ``minScalar()`` but the minimum is taken from another
type index.
to see if it works.
#. Send a patch which adds your build worker and your builder to
- `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM
+ `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM
`workflow <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.
* workers are added to ``buildbot/osuosl/master/config/workers.py``
In the Itanium C++ ABI the first member of an object is a pointer to the vtable
for its class. The vtable is often emitted into the object file with the key function
and must be imported for classes marked dllimport. The pointers must be globally
-unique. Unfortunately, the COFF/PE file format does not provide a mechanism to
+unique. Unfortunately, the COFF/PE file format does not provide a mechanism to
store a runtime address from another DLL into this pointer (although runtime
addresses are patched into the IAT). Therefore, the compiler must emit some code,
that runs after IAT patching but before anything that might use the vtable pointers,
programs to link we currently rely on the -auto-import switch in LLD to auto-import
references to __cxxabiv1::__class_type_info pointers (see: https://reviews.llvm.org/D43184
for a related discussion). This allows for linking; but, code that actually uses
-such fields will not work as they these will not be fixed up at runtime. See
+such fields will not work as these will not be fixed up at runtime. See
_pei386_runtime_relocator which handles the runtime component of the autoimporting
scheme used for mingw and comments in https://reviews.llvm.org/D43184 and
https://reviews.llvm.org/D89518 for more.
That process will perform both Release+Asserts and Release builds but only
pack the Release build for upload. You should use the Release+Asserts sysroot,
normally under ``final/Phase3/Release+Asserts/llvmCore-3.8.1-RCn.install/``,
-for test-suite and run-time benchmarks, to make sure nothing serious has
+for test-suite and run-time benchmarks, to make sure nothing serious has
passed through the net. For compile-time benchmarks, use the Release version.
The minimum required version of the tools you'll need are :doc:`here <GettingStarted>`
Send an email to the list announcing the release, pointing people to all the
relevant documentation, download pages and bugs fixed.
-
- On iOS platforms, we use AAPCS-VFP calling convention.
"``swifttailcc``"
This calling convention is like ``swiftcc`` in most respects, but also the
- callee pops the argument area of the stack so that mandatory tail calls are
+ callee pops the argument area of the stack so that mandatory tail calls are
possible as in ``tailcc``.
"``cfguard_checkcc``" - Windows Control Flow Guard (Check mechanism)
This calling convention is used for the Control Flow Guard check function,
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
-``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
+``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
.. _globalvars:
declare token
@llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
- func_type <target>,
+ func_type <target>,
i64 <#call args>, i64 <flags>,
... (call parameters),
i64 0, i64 0)
The first and only argument is the ``gc.statepoint`` which starts
the safepoint sequence of which this ``gc.result`` is a part.
-Despite the typing of this as a generic token, *only* the value defined
+Despite the typing of this as a generic token, *only* the value defined
by a ``gc.statepoint`` is legal here.
Semantics:
::
declare <pointer type>
- @llvm.experimental.gc.relocate(token %statepoint_token,
- i32 %base_offset,
+ @llvm.experimental.gc.relocate(token %statepoint_token,
+ i32 %base_offset,
i32 %pointer_offset)
Overview:
The first argument is the ``gc.statepoint`` which starts the
-safepoint sequence of which this ``gc.relocation`` is a part.
+safepoint sequence of which this ``gc.relocate`` is a part.
-Despite the typing of this as a generic token, *only* the value defined
+Despite the typing of this as a generic token, *only* the value defined
by a ``gc.statepoint`` is legal here.
The second and third arguments are both indices into operands of the
the Module that was used to create the EngineBuilder.
.. image:: MCJIT-engine-builder.png
-
+
EngineBuilder::create will call the static MCJIT::createJIT function,
passing in its pointers to the module, memory manager and target machine
objects, all of which will subsequently be owned by the MCJIT object.
gets created when an object is loaded.
.. image:: MCJIT-creation.png
-
+
Upon creation, MCJIT holds a pointer to the Module object that it received
from EngineBuilder but it does not immediately generate code for this
module. Code generation is deferred until either the
on the Module with which it was created.
.. image:: MCJIT-load.png
-
+
The PassManager::run call causes the MC code generation mechanisms to emit
-a complete relocatable binary object image (either in either ELF or MachO
+a complete relocatable binary object image (in either ELF or MachO
format, depending on the target) into the ObjectBufferStream object, which
actual loading.
.. image:: MCJIT-dyld-load.png
-
+
RuntimeDyldImpl::loadObject begins by creating an ObjectImage instance
from the ObjectBuffer it received. ObjectImage, which wraps the
ObjectFile class, is a helper class which parses the binary object image
an external symbol relocation map.
.. image:: MCJIT-load-object.png
-
+
When RuntimeDyldImpl::loadObject returns, all of the code and data
sections for the object will have been loaded into memory allocated by the
memory manager and relocation information will have been prepared, but the
likely located in a different section.
.. image:: MCJIT-resolve-relocations.png
-
+
Once relocations have been applied as described above, MCJIT calls
RuntimeDyld::getEHFrameSection, and if a non-zero result is returned
passes the section data to the memory manager's registerEHFrames method.
method, the memory manager will invalidate the target code cache, if
necessary, and apply final permissions to the memory pages it has
allocated for code and data memory.
-
end, including a description of the conventions used and the set of accepted
LLVM IR.
-.. note::
-
+.. note::
+
This document assumes a basic familiarity with CUDA and the PTX
assembly language. Information about the CUDA Driver API and the PTX assembly
language can be found in the `CUDA documentation
Dissecting the Kernel
---------------------
-Now let us dissect the LLVM IR that makes up this kernel.
+Now let us dissect the LLVM IR that makes up this kernel.
Data Layout
^^^^^^^^^^^
st.global.f32 [%rl1], %f110;
ret;
}
-
PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();
return PA;
-
+
The pass manager will call the analysis manager's ``invalidate()`` method
-with the pass's returned ``PreservedAnalyses``. This can be also done
+with the pass's returned ``PreservedAnalyses``. This can also be done
manually within the pass:
Padding is implemented by inserting a decreasing sequence of `<_padding_records>`
that terminates with ``LF_PAD0``.
-The final category of record is a ``member record``. One particular leaf type --
+The final category of record is a ``member record``. One particular leaf type --
``LF_FIELDLIST`` -- contains a series of embedded records. While the outer
``LF_FIELDLIST`` describes its length (like any other leaf record), the embedded
-records -- called ``member records`` do not.
+records -- called ``member records`` -- do not.
uint16_t Machine;
uint32_t Padding;
};
-
+
- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.
- **VersionHeader** - A value from the following enum.
- **Age** - The number of times the PDB has been written. Equal to the same
field from the :ref:`PDB Stream header <pdb_stream_header>`.
-
+
- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
which contains CodeView symbol records for all global symbols. Actual records
are stored in the symbol record stream, and are referenced from this stream.
-
+
- **BuildNumber** - A bitfield containing values representing the major and minor
version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
program, with the following layout:
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
-
+
- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
which contains CodeView symbol records for all public symbols. Actual records
are stored in the symbol record stream, and are referenced from this stream.
-
+
- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
PDB. Note this obviously does not apply for LLVM as LLVM does not use ``mspdb.dll``.
-
+
- **SymRecordStream** - The stream containing all CodeView symbol records used
by the program. This is used for deduplication, so that many different
compilands can refer to the same symbols without having to include the full record
content inside of each module stream.
-
+
- **PdbDllRbld** - Unknown
- **MFCTypeServerIndex** - The index of the MFC type server in the
- **Flags** - A bitfield with the following layout, containing various
information about how the program was built:
-
+
.. code-block:: c++
uint16_t WasIncrementallyLinked : 1;
of each of the following ``7`` fields.
- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.
-
+
- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.
- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program. Each
record in the array has the format:
-
+
.. code-block:: c++
struct ModInfo {
char ModuleName[];
char ObjFileName[];
};
-
+
- **SectionContr** - Describes the properties of the section in the final binary
which contain the code and data from this module.
``SectionContr.Characteristics`` corresponds to the ``Characteristics`` field
of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.
-
+
- **Flags** - A bitfield with the following format:
-
+
.. code-block:: c++
// ``true`` if this ModInfo has been written since reading the PDB. This is
// but as LLVM treats /Zi as /Z7, this field will always be invalid for LLVM
// generated PDBs.
uint16_t TSM : 8;
-
+
- **ModuleSymStream** - The index of the stream that contains symbol information
for this module. This includes CodeView symbol information as well as source
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:
-
+
.. code-block:: c++
enum class SectionContrSubstreamVersion : uint32_t {
Ver60 = 0xeffe0000 + 19970605,
V2 = 0xeffe0000 + 20140516
};
-
+
``Ver60`` is the only value which has been observed in a PDB so far. Following
this is an array of fixed-length structures. If the version is ``Ver60``,
it is an array of ``SectionContribEntry`` structures (this is the nested structure
-from the ``ModInfo`` type. If the version is ``V2``, it is an array of
+from the ``ModInfo`` type). If the version is ``V2``, it is an array of
``SectionContribEntry2`` structures, defined as follows:
-
+
.. code-block:: c++
struct SectionContribEntry2 {
SectionContribEntry SC;
uint32_t ISectCoff;
};
-
+
The purpose of the second field is not well understood. The name implies that
-is the index of the COFF section, but this also describes the existing field
+it is the index of the COFF section, but this also describes the existing field
``SectionContribEntry::Section``.
-
+
.. _dbi_section_map_substream:
-and consumes ``Header->SectionMapSize`` bytes. This substream begins with an ``4``
+and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``
byte header followed by an array of fixed-length records. The header and records
have the following layout:
-
+
.. code-block:: c++
struct SectionMapHeader {
uint16_t Count; // Number of segment descriptors
uint16_t LogCount; // Number of logical segment descriptors
};
-
+
struct SectionMapEntry {
uint16_t Flags; // See the SectionMapEntryFlags enum below.
uint16_t Ovl; // Logical overlay number
uint32_t Offset; // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
uint32_t SectionLength; // Byte count of the segment or group.
};
-
+
enum class SectionMapEntryFlags : uint16_t {
Read = 1 << 0, // Segment is readable.
Write = 1 << 1, // Segment is writable.
IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
IsGroup = 1 << 10 // If set, descriptor represents a group.
};
-
+
Many of these fields are not well understood, so will not be discussed further.
.. _dbi_file_info_substream:
uses a string table to store each unique file name only once, and then have each
module use offsets into the string table rather than embedding the string's value
directly. The format of this substream is as follows:
-
+
.. code-block:: c++
struct FileInfoSubstream {
uint16_t NumModules;
uint16_t NumSourceFiles;
-
+
uint16_t ModIndices[NumModules];
uint16_t ModFileCounts[NumModules];
uint32_t FileNameOffsets[NumSourceFiles];
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.
**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
-is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
+is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.
**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
-is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
+is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.
**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.
**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
-understood, but it is assumed to be a mapping from ``CLR Token`` to
+understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.
Thus, it is possible for both to appear in the same PDB if both MASM object files
and cl object files are linked into the same program.
-**Original Section Header Data** - ``DbgStreamArray[10]``. Similar to
+**Original Section Header Data** - ``DbgStreamArray[10]``. Similar to
``DbgStreamArray[5]``, but contains the section headers before any binary translation
-has been performed. This can be used in conjunction with ``DebugStreamArray[3]``
+has been performed. This can be used in conjunction with ``DbgStreamArray[3]``
and ``DbgStreamArray[4]`` to map instrumented and uninstrumented addresses.
problems of using a timestamp with 1-second granularity, this field does not
really serve its intended purpose, and as such is typically ignored in favor
of the ``Guid`` field, described below.
- **Age** - The number of times the PDB file has been written. This can be used
along with ``Guid`` to match the PDB to its corresponding executable.
- **Guid** - A 128-bit identifier guaranteed to be unique across space and time.
In general, this can be thought of as the result of calling the Win32 API
`UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__,
although LLVM cannot rely on that, as it must work on non-Windows platforms.
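
The two fields are meant to be checked together. The following is a minimal C++ sketch, assuming both sets of values have already been read out of the PDB and the executable; the ``PdbIdentity`` struct and the ``matchesExecutable`` function are hypothetical names used for illustration, not an LLVM API.

.. code-block:: c++

  #include <array>
  #include <cstdint>

  // Hypothetical container for the identifying fields. Reading them from the
  // PDB Info Stream and from the executable is not shown.
  struct PdbIdentity {
    std::array<uint8_t, 16> Guid; // 128-bit identifier
    uint32_t Age;                 // number of times the PDB was written
  };

  // Assumption: a PDB is treated as matching only if both Guid and Age agree.
  bool matchesExecutable(const PdbIdentity &Pdb, const PdbIdentity &Exe) {
    return Pdb.Guid == Exe.Guid && Pdb.Age == Exe.Age;
  }
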
.. _pdb_named_stream_map:
Named Stream Map
Following the header is a serialized hash table whose key type is a string, and
whose value type is an integer. The existence of a mapping ``X -> Y`` means
that the stream with the name ``X`` has stream index ``Y`` in the underlying MSF
file. Note that not all streams are named (for example, the
:doc:`TPI Stream <TpiStream>` has a fixed index and as such there is no need to
look up its index by name). In practice, there are usually only a small number
of named streams and these are enumerated in the table of streams in :doc:`index`.
a buffer of string data prefixed by a 32-bit length. The second is a serialized
hash table whose key and value types are both ``uint32_t``. The key is the offset
of a null-terminated string in the string data buffer specifying the name of the
stream, and the value is the MSF stream index of the stream with said name.
Note that although the key is an integer, the hash function used to find the right
bucket hashes the string at the corresponding offset in the string data buffer.
Note that the entire Named Stream Map is not length-prefixed, so the only way to
get to the data following it is to de-serialize it in its entirety.
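
To make the "integer key, string hash" arrangement concrete, here is a simplified in-memory model in C++. It is a sketch under several assumptions: ``std::hash`` stands in for the hash function the PDB format actually uses, the on-disk bucket layout and serialization are not modeled, and ``NamedStreamMapModel`` is a hypothetical name.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <optional>
  #include <string_view>
  #include <utility>
  #include <vector>

  class NamedStreamMapModel {
  public:
    explicit NamedStreamMapModel(std::size_t NumBuckets = 8)
        : Buckets(NumBuckets) {}

    // Appends Name as a null-terminated string; its offset in the buffer
    // becomes the key. Assumes each name is added only once.
    void set(std::string_view Name, uint32_t StreamIndex) {
      uint32_t Offset = static_cast<uint32_t>(Strings.size());
      Strings.insert(Strings.end(), Name.begin(), Name.end());
      Strings.push_back('\0');
      Buckets[bucketFor(Name)].emplace_back(Offset, StreamIndex);
    }

    // The bucket is chosen by hashing the name, and each candidate key (an
    // offset) is confirmed by comparing the string stored at that offset.
    std::optional<uint32_t> lookup(std::string_view Name) const {
      for (const auto &[Offset, StreamIndex] : Buckets[bucketFor(Name)])
        if (stringAt(Offset) == Name)
          return StreamIndex;
      return std::nullopt;
    }

  private:
    std::string_view stringAt(uint32_t Offset) const {
      return std::string_view(Strings.data() + Offset); // stops at '\0'
    }
    std::size_t bucketFor(std::string_view Name) const {
      return std::hash<std::string_view>{}(Name) % Buckets.size();
    }

    std::vector<char> Strings; // the string data buffer
    std::vector<std::vector<std::pair<uint32_t, uint32_t>>> Buckets;
  };
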
.. _pdb_stream_features:
PDB Feature Codes
NoTypeMerge = 0x4D544F4E,
MinimalDebugInfo = 0x494E494D,
};
The meaning of these values is summarized by the following table:
+------------------+-------------------------------------------------+
| | - There is no TPI / IPI stream, all type info |
| | is contained in the original object files. |
+------------------+-------------------------------------------------+
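
The feature code constants appear to be four ASCII characters packed into a little-endian 32-bit value: ``0x4D544F4E`` spells ``NOTM`` and ``0x494E494D`` spells ``MINI``. The hypothetical helper below decodes a code that way; it is an illustration only, not part of LLVM.

.. code-block:: c++

  #include <cstdint>
  #include <string>

  // Hypothetical helper: unpack a 32-bit feature code into its four bytes,
  // least significant byte first (little-endian order).
  std::string featureCodeToString(uint32_t Code) {
    std::string S;
    for (int I = 0; I < 4; ++I)
      S.push_back(static_cast<char>((Code >> (8 * I)) & 0xFF));
    return S;
  }
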
Matching a PDB to its executable
================================
The linker is responsible for writing both the PDB and the final executable, and
Pre-merge testing
-----------------
The pre-merge tests are a continuous integration (CI) workflow. The workflow
checks the patches uploaded to Phabricator before a user merges them to the main
branch, hence the term *pre-merge testing*.
When a user uploads a patch to Phabricator, Phabricator triggers the checks and
then displays the results. This way, bugs in a patch are caught during the
code review stage and do not pollute the main branch.
Our goal with pre-merge testing is to report most true problems while strongly
reported are always actionable. If you notice a false positive, please report
it so that we can identify the cause.
If you notice issues or have an idea on how to improve pre-merge checks, please
`create a new issue <https://github.com/google/llvm-premerge-checks/issues/new>`_
or give a ❤️ to an existing one.
Requirements
patch to the checked out git repository. Please make sure that either:
* You set a git hash as ``sourceControlBaseRevision`` in Phabricator which is
available on the GitHub repository,
* **or** you define the dependencies of your patch in Phabricator,
* **or** your patch can be applied to the main branch.
Only then can the build server apply the patch locally and run the builds and
Accessing build results
^^^^^^^^^^^^^^^^^^^^^^^
Phabricator will automatically trigger a build for every new patch you upload or
modify. Phabricator shows the build results at the top of the entry. Clicking on
the links (in the red box) will show more details:
.. image:: Phabricator_premerge_results.png
strings, especially for platform-specific types like ``size_t`` or pointer types.
Unlike both ``printf`` and Python, it additionally fails to compile if LLVM does
not know how to format the type. These two properties ensure that the function
is both safer and simpler to use than traditional formatting methods such as
the ``printf`` family of functions.
Simple formatting
the value into, and the alignment of the value within the field. It is specified as
an optional **alignment style** followed by a positive integral **field width**. The
alignment style can be one of the characters ``-`` (left align), ``=`` (center align),
or ``+`` (right align). The default is right aligned.
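
A standalone sketch of these alignment rules may help. ``alignField`` is a hypothetical helper written for illustration; it is not how LLVM implements ``formatv``, and it assumes the field width acts as a minimum, so values longer than the width are left untouched.

.. code-block:: c++

  #include <cstddef>
  #include <string>

  // Mimics the documented rules: '-' = left, '=' = center, '+' = right,
  // and right alignment when no style is given.
  std::string alignField(const std::string &Value, char Style,
                         std::size_t Width) {
    if (Value.size() >= Width)
      return Value; // width is a minimum; never truncate
    std::size_t Pad = Width - Value.size();
    switch (Style) {
    case '-':
      return Value + std::string(Pad, ' ');
    case '=': {
      std::size_t Left = Pad / 2; // any odd space goes on the right
      return std::string(Left, ' ') + Value + std::string(Pad - Left, ' ');
    }
    default: // '+' or no style
      return std::string(Pad, ' ') + Value;
    }
  }
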
``style`` is an optional type-specific string that controls the
formatting of the value. For example, to format a floating point value as a percentage,
type ``T`` with the appropriate static format method.
.. code-block:: c++
namespace llvm {
template<>
struct format_provider<MyFooBar> {
std::string S = formatv("{0}", X);
}
}
This is a useful extensibility mechanism for adding support for formatting your own
custom types with your own custom Style options. But it does not help when you want
to extend the mechanism for formatting a type that the library already knows how to
format. For that, we need something else.
2. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``.
.. code-block:: c++
namespace anything {
struct format_int_custom : public llvm::FormatAdapter<int> {
explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
std::string S = formatv("{0}", anything::format_int_custom(42));
}
}
If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv``
will call the ``format`` method on the argument, passing in the specified
style. This allows
.. code-block:: c++
std::string S;
// Simple formatting of basic types and implicit string conversion.
S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)"
// Out-of-order referencing and multi-referencing
outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"
// Left, right, and center alignment
S = formatv("{0,7}", 'a');  // S == "      a"
S = formatv("{0,-7}", 'a'); // S == "a      "
S = formatv("{0,=7}", 'a'); // S == "   a   "
S = formatv("{0,+7}", 'a'); // S == "      a"
// Custom styles
S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"
// Adapters
S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == "  42   "
S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == "  hi      "
// Ranges
std::vector<int> V = {8, 9, 10};
S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
This subclass of Value defines the interface for incoming formal arguments to a
function. A Function maintains a list of its formal arguments. An argument has
a pointer to the parent Function.
benchmarks and programs that are known to compile with the Clang front
end. You can use these programs to test your code, gather statistical
information, and compare it to the current LLVM performance statistics.
Currently, there is no way to hook your tests directly into the ``llvm/test``
testing harness. You will simply need to find a way to use the source
provided within that directory on your own.
``submodule-map.txt`` is a list of pairs, one per line. The first
pair item describes the path to a submodule in the umbrella
repository. The second pair item describes the path where trees for
that submodule should be written in the zipped history.
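
As a sketch of how a tool might read this file, the C++ below assumes the two items on each line are separated by whitespace and that blank lines can be skipped; ``parseSubmoduleMap`` is a hypothetical name, and any paths used with it here are made up.

.. code-block:: c++

  #include <sstream>
  #include <string>
  #include <utility>
  #include <vector>

  // Returns (submodule path in umbrella repo, path in zipped history) pairs,
  // assuming whitespace-separated items, one pair per line.
  std::vector<std::pair<std::string, std::string>>
  parseSubmoduleMap(const std::string &Text) {
    std::vector<std::pair<std::string, std::string>> Pairs;
    std::istringstream Lines(Text);
    std::string Line;
    while (std::getline(Lines, Line)) {
      std::istringstream Fields(Line);
      std::string From, To;
      if (Fields >> From >> To) // skip blank or incomplete lines
        Pairs.emplace_back(From, To);
    }
    return Pairs;
  }
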
Let's say your umbrella repository is actually the llvm repository and
it has submodules in the "nested sources" layout (clang in
these potentially stale variable values from the developer diminishes the
amount of available debug information, but increases the reliability of the
remaining information.
To illustrate some potential issues, consider the following example:
.. code-block:: llvm
entry:
br i1 %cond, label %truebr, label %falsebr
bb1:
%value = phi i32 [ %value1, %truebr ], [ %value2, %falsebr ]
br label %exit, !dbg !26
%value = add i32 %input, 2
br label %bb1
exit:
ret i32 %value, !dbg !30
}
.. code-block:: text
DW_TAG_subprogram [3]
DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010 ".text")
DW_AT_high_pc [DW_FORM_data4] (0x00000001)
...
============================
You can generate the HTML documentation from the sources locally if you want to
see what they would look like. In addition to the normal
`build tools <docs/GettingStarted.html>`_
you need to install `Sphinx`_ and the
`recommonmark <https://recommonmark.readthedocs.io/en/latest/>`_ extension.
On Debian you can install these with:
cmake -DLLVM_ENABLE_SPHINX=On ../llvm
cmake --build . --target docs-llvm-html
In case you already have the CMake build set up and want to reuse that,
just set the CMake variable ``LLVM_ENABLE_SPHINX=On``.
After that you find the generated documentation in ``build/docs/html``
Supported Architectures
=======================
Support for StackMap generation and the related intrinsics requires
some code for each backend. Today, only a subset of LLVM's backends
are supported. The currently supported architectures are X86_64,
PowerPC, AArch64, and SystemZ.
=======
This document describes a set of extensions to LLVM to support garbage
collection. By now, these mechanisms are well proven: a commercial Java
implementation with a fully relocating collector has shipped using them.
There are a couple places where bugs might still linger; these are called out
below.
They are still listed as "experimental" to indicate that no forward or backward
compatibility guarantees are offered across versions. If your use case is such
that you need some form of forward compatibility guarantee, please raise the
issue on the llvm-dev mailing list.
LLVM still supports an alternate mechanism for conservative garbage collection
support using the ``gcroot`` intrinsic. The ``gcroot`` mechanism is mostly of
historical interest at this point, with one exception: its implementation of
shadow stacks has been used successfully by a number of language frontends and
is still supported.
Overview & Core Concepts
========================
Abstract Machine Model
^^^^^^^^^^^^^^^^^^^^^^^
At a high level, LLVM has been extended to support compiling to an abstract
machine which extends the actual target with a non-integral pointer type
suitable for representing a garbage collected reference to an object. In
particular, such non-integral pointer types have no defined mapping to an
integer representation. This semantic quirk allows the runtime to pick an
integer mapping for each point in the program, allowing relocations of objects
without visible effects.
This high level abstract machine model is used for most of the optimizer. As
Note that most of the value of the abstract machine model is for collectors
which need to model potentially relocatable objects. For a compiler which
supports only a non-relocating collector, you may wish to consider starting
with the fully explicit form.
Warning: There is one currently known semantic hole in the definition of
non-integral pointers which has not been addressed upstream. To work around
this, you need to disable speculation of loads unless the memory type
(non-integral pointer vs anything else) is known to be unchanged. That is, it is
not safe to speculate a load if doing so causes a non-integral pointer value to
be loaded as any other type or vice versa. In practice, this restriction is
well isolated to isSafeToSpeculate in ValueTracking.cpp.
Explicit Representation
^^^^^^^^^^^^^^^^^^^^^^^
A frontend could directly generate this low level explicit form, but
doing so may inhibit optimization. Instead, it is recommended that
compilers with relocating collectors target the abstract machine model just
described.
The heart of the explicit approach is to construct (or rewrite) the IR in a
manner where the possible updates performed by the garbage collector are
explicitly visible in the IR. Doing so requires that we:
collected values, transforming the IR to expose a pointer giving the
base object for every such live pointer, and inserting all the
intrinsics correctly is explicitly out of scope for this document.
The recommended approach is to use the :ref:`utility passes
<statepoint-utilities>` described below.
This abstract function call is concretely represented by a sequence of
intrinsic calls known collectively as a "statepoint relocation sequence".
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
call void ()* @foo()
ret i8 addrspace(1)* %obj
}
Depending on our language we may need to allow a safepoint during the execution
of ``foo``. If so, we need to let the collector update local values in the
current frame. If we don't, we'll be accessing a potentially invalid reference
once we eventually return from the call.
In this example, we need to relocate the SSA value ``%obj``. Since we can't
actually change the value in the SSA value ``%obj``, we need to introduce a new
SSA value ``%obj.relocated`` which represents the potentially changed value of
``%obj`` after the safepoint and update any following uses appropriately. The
resulting relocation sequence is:
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
%0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
%obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7)
of the call, we use the ``gc.result`` intrinsic. To get the relocation
of each pointer in turn, we use the ``gc.relocate`` intrinsic with the
appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are
tied to the statepoint. The combination forms a "statepoint relocation
sequence" and represents the entirety of a parseable call or 'statepoint'.
When lowered, this example would generate the following x86 assembly:
.. code-block:: gas
.globl test1
.align 16, 0x90
pushq %rax
The relevant parts of the StackMap section for our example are:
.. code-block:: gas
# This describes the call site
# Stack Maps: callsite 2882400000
.quad 2882400000
.short 0
# .. 8 entries skipped ..
# This entry describes the spill slot which is directly addressable
# off RSP with offset 0. Given the value was spilled with a pushq,
# that makes sense.
# Stack Maps: Loc 8: Direct RSP [encoding: .byte 2, .byte 8, .short 7, .int 0]
.byte 2
information about which locations contain live references, it doesn't need to
represent explicit relocations. As such, the previously described explicit
lowering can be simplified to remove all of the ``gc.relocate`` intrinsic
calls and leave uses in terms of the original reference value.
Here's the explicit lowering for the previous example for a non-relocating
collector:
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
ret i8 addrspace(1)* %obj
recommended to use this with caution and expect to have to fix a few bugs.
In particular, the RewriteStatepointsForGC utility pass does not do
anything for allocas today.
Base & Derived Pointers
^^^^^^^^^^^^^^^^^^^^^^^
A "base pointer" is one which points to the starting address of an allocation
(object). A "derived pointer" is one which is offset from a base pointer by
some amount. When relocating objects, a garbage collector needs to be able
to relocate each derived pointer associated with an allocation to the same
offset from the new address.
"Interior derived pointers" remain within the bounds of the allocation
they're associated with. As a result, the base object can be found at
runtime provided the bounds of allocations are known to the runtime system.
"Exterior derived pointers" are outside the bounds of the associated object;
they may even fall within *another* allocation's address range. As a result,
there is no way for a garbage collector to determine which allocation they
are associated with at runtime, and compiler support is needed.
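
The relocation rule for derived pointers (same offset, new base) comes down to simple arithmetic. The sketch below models addresses as plain integers rather than real heap pointers, and ``relocateDerived`` is a hypothetical helper, not part of any collector's API. Because it only uses the offset from the base, it treats interior and exterior derived pointers identically, which is why the collector must be told the base.

.. code-block:: c++

  #include <cstdint>

  // Moves a derived pointer along with its object: the offset from the base
  // is preserved. Unsigned wraparound keeps this correct even when Derived
  // lies below OldBase (an exterior derived pointer).
  uintptr_t relocateDerived(uintptr_t OldBase, uintptr_t NewBase,
                            uintptr_t Derived) {
    return NewBase + (Derived - OldBase);
  }
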
The ``gc.relocate`` intrinsic supports an explicit operand for describing the
allocation associated with a derived pointer. This operand is frequently
referred to as the base operand, but strictly speaking it does not have to be
a base pointer; it does, however, need to lie within the bounds of the associated
allocation. Some collectors may require that the operand be an actual base
pointer rather than merely an internal derived pointer. Note that during
lowering both the base and derived pointer operands are required to be live
over the associated call safepoint even if the base is otherwise unused
afterwards.
If we extend our previous example to include a pointless derived pointer,
we get:
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
%gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000
%token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj, i8 addrspace(1)* %gep)
"deopt" operand bundle. At the moment, only deopt parameters with a bitwidth
of 64 bits or less are supported. Values of a type larger than 64 bits can be
specified and reported only if a) the value is constant at the call site, and
b) the constant can be represented with less than 64 bits (assuming zero
extension to the original bitwidth).
* Variable number of relocation records, each of which consists of
exactly two Locations. Relocation records are described in detail
below.
Each relocation record provides sufficient information for a collector to
relocate one or more derived pointers. Each record consists of a pair of
Locations. The second element in the record represents the pointer (or
pointers) which need to be updated. The first element in the record provides a
pointer to the base of the object with which the pointer(s) being relocated are
associated. This information is required for handling generalized derived
pointers since a pointer may be outside the bounds of the original allocation,
but still needs to be relocated with the allocation. Additionally:
* It is guaranteed that the base pointer will also appear explicitly as a
relocation pair if used after the statepoint.
* There may be fewer relocation records than gc parameters in the IR
statepoint. Each *unique* pair will occur at least once; duplicates
are possible.
* The Locations within each record may either be of pointer size or a
multiple of pointer size. In the latter case, the record must be
interpreted as describing a sequence of pointers and their corresponding
base pointers. If the Location is of size N x sizeof(pointer), then
there will be N records of one pointer each contained within the Location.
Both Locations in a pair can be assumed to be of the same size.
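
The rule for multi-pointer Locations can be sketched as follows. This is a simplified C++ model under stated assumptions: each Location is reduced to a starting address and a byte size (real StackMap Locations also encode a kind, such as register or indirect, which is omitted), and ``Location``, ``RelocationRecord``, and ``expandRecord`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <stdexcept>
  #include <utility>
  #include <vector>

  // Hypothetical simplification of a StackMap Location.
  struct Location {
    uintptr_t Address;
    std::size_t Size; // bytes
  };

  struct RelocationRecord {
    Location Base;    // first element: base pointer(s)
    Location Derived; // second element: pointer(s) to update
  };

  // Splits a record whose Locations span N pointer-sized slots into N
  // (base slot, derived slot) address pairs, one pointer each.
  std::vector<std::pair<uintptr_t, uintptr_t>>
  expandRecord(const RelocationRecord &R) {
    constexpr std::size_t PtrSize = sizeof(void *);
    if (R.Base.Size != R.Derived.Size || R.Base.Size % PtrSize != 0)
      throw std::invalid_argument("unexpected Location sizes");
    std::vector<std::pair<uintptr_t, uintptr_t>> Slots;
    for (std::size_t I = 0; I < R.Derived.Size / PtrSize; ++I)
      Slots.emplace_back(R.Base.Address + I * PtrSize,
                         R.Derived.Address + I * PtrSize);
    return Slots;
  }
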
^^^^^^^^^^^^^^^^^^^^^^^^
The pass RewriteStatepointsForGC transforms a function's IR to lower from the
abstract machine model described above to the explicit statepoint model of
relocations. To do this, it replaces all calls or invokes of functions which
might contain a safepoint poll with a ``gc.statepoint`` and associated full
relocation sequence, including all required ``gc.relocates``.
Note that by default, this pass only runs for the "statepoint-example" or
"core-clr" gc strategies. You will need to add your custom strategy to this
list or use one of the predefined ones.
As an example, given this code:
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
call void @foo()
ret i8 addrspace(1)* %obj
.. code-block:: llvm
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
gc "statepoint-example" {
%0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
%obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 12, i32 12)
non references. The pass assumes that all addrspace(1) pointers are non-integral
pointer types. Address space 1 is not globally reserved for this purpose.
This pass can be used as a utility function by a language frontend that doesn't
want to manually reason about liveness, base pointers, or relocation when
constructing IR. As currently implemented, RewriteStatepointsForGC must be
run after SSA construction (i.e. mem2reg).
RewriteStatepointsForGC will ensure that appropriate base pointers are listed
for every relocation created. It will do so by duplicating code as needed to
propagate the base pointer associated with each pointer being relocated to
the appropriate safepoints. The implementation assumes that the following
IR constructs produce base pointers: loads from the heap, addresses of global
variables, function arguments, function return values. Constant pointers (such
as null) are also assumed to be base pointers. In practice, this constraint
can be relaxed to producing interior derived pointers provided the target
collector can find the associated allocation from an arbitrary interior
derived pointer.
By default RewriteStatepointsForGC passes in ``0xABCDEF00`` as the statepoint
are not propagated to the ``gc.statepoint`` call or invoke if they
could be successfully parsed.
In practice, RewriteStatepointsForGC should be run much later in the pass
pipeline, after most optimization is already done. This helps to improve
the quality of the generated code when compiled with garbage collection support.
.. _RewriteStatepointsForGC_intrinsic_lowering:
PlaceSafepoints
^^^^^^^^^^^^^^^^
The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running
code checks for a safepoint request in a timely manner. This pass is expected
to be run before RewriteStatepointsForGC and thus does not produce full
relocation sequences.
As an example, given input IR of the following:
ret void
}
In this case, we've added an (unconditional) entry safepoint poll. Note that
despite appearances, the entry poll is not necessarily redundant. We'd have to
know that ``foo`` and ``test`` were not mutually recursive for the poll to be
redundant. In practice, you'd probably want your poll definition to contain
a conditional branch of some form.
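
One common shape for such a poll is a cheap flag check on the fast path, with a call into the runtime only when the collector has requested a safepoint. The C++ below sketches that runtime-side logic under assumed names (``SafepointRequested``, ``enterSafepointSlowPath``, and ``safepointPoll`` are all hypothetical); in LLVM the poll body would instead be written as IR in the ``gc.safepoint_poll`` function.

.. code-block:: c++

  #include <atomic>

  // Set by the (hypothetical) collector when it wants threads to stop.
  std::atomic<bool> SafepointRequested{false};

  // Hypothetical slow path. A real runtime would park the thread until the
  // collector finishes; this sketch only counts calls and clears the request.
  int SlowPathCalls = 0;
  void enterSafepointSlowPath() {
    ++SlowPathCalls;
    SafepointRequested.store(false, std::memory_order_release);
  }

  // The poll: one load and a conditional branch on the fast path.
  inline void safepointPoll() {
    if (SafepointRequested.load(std::memory_order_acquire))
      enterSafepointSlowPath();
  }
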
At the moment, PlaceSafepoints can insert safepoint polls at method entry and
loop backedge locations. Extending this to work with return polls would be
straightforward if desired.
PlaceSafepoints includes a number of optimizations to avoid placing safepoint
polls at particular sites unless needed to ensure timely execution of a poll
under normal conditions. PlaceSafepoints does not attempt to ensure timely
execution of a poll under worst-case conditions such as heavy system paging.
The implementation of a safepoint poll action is specified by looking up a
function of the name ``gc.safepoint_poll`` in the containing Module. The body
of this function is inserted at each poll site desired. While calls or invokes
inside this method are transformed to ``gc.statepoint`` calls, recursive poll
insertion is not performed.
This pass is useful for any language frontend which only has to support
you can insert safepoint polls in the frontend. If you have the latter case,
please ask on llvm-dev for suggestions. There's been a good amount of work
done on making such a scheme work well in practice which is not yet documented
here.
Supported Architectures
The missing pieces are a) integration with rewriting (RS4GC) from the
abstract machine model and b) support for optionally decomposing on stack
objects so as not to require heap maps for them. The latter is required
for ease of integration with some collectors.
Lowering Quality and Representation Overhead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
post processing of each individual object file. While not implemented
today for statepoints, there is precedent for a GCStrategy to be able to
select a custom GCMetadataPrinter for this purpose. Patches to enable
-this functionality upstream are welcome.
+this functionality upstream are welcome.
Bugs and Enhancements
=====================
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_, and patches
should be sent to `llvm-commits
<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.
-
---------------------------
The Support Library must shield LLVM from **all** system headers. To obtain
-system level functionality, LLVM source must
+system level functionality, LLVM source must
``#include "llvm/Support/Thing.h"`` and nothing else. This means that
``Thing.h`` cannot expose any system header files. This protects LLVM from
accidentally using system specific functionality and only allows it via
#endif
The implementation in ``lib/Support/Unix/Path.inc`` should handle all Unix
-variants. The implementation in ``lib/Support/Windows/Path.inc`` should handle
+variants. The implementation in ``lib/Support/Windows/Path.inc`` should handle
all Windows variants. This quickly selects the basic class
of operating system that will provide the implementation. The specific details
for a given platform must still be determined through the use of ``#ifdef``.
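As an illustration of that two-level split, the standalone sketch below shows the second step: once the coarse family is known, finer distinctions are still made with ``#ifdef``. The function name ``osFamily`` is hypothetical, and it tests compiler-predefined macros (``_WIN32``, ``__unix__``, ``__APPLE__``, ``__linux__``) rather than any LLVM configuration macro; it is a sketch of the pattern, not code from ``lib/Support``.

```cpp
#include <string>

// Hypothetical helper: LLVM itself picks the coarse OS family by compiling
// lib/Support/Unix/*.inc or lib/Support/Windows/*.inc, not by calling this.
std::string osFamily() {
#if defined(_WIN32)
  return "Windows";
#elif defined(__unix__) || defined(__APPLE__)
  // Within the Unix family, finer details still need their own #ifdefs.
#if defined(__linux__)
  return "Unix (Linux)";
#elif defined(__APPLE__)
  return "Unix (Darwin)";
#else
  return "Unix (other)";
#endif
#else
  return "unknown";
#endif
}
```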
ClangAttrVisitor
-------------------
-**Purpose**: Creates AttrVisitor.inc, which is used when implementing
+**Purpose**: Creates AttrVisitor.inc, which is used when implementing
recursive AST visitors.
ClangAttrTemplateInstantiate
return false;
return false;
});
-
+
if (Idx == Table.end() ||
Key.Val1 != Idx->Val1 ||
Key.Val2 != Idx->Val2)
return nullptr;
return &CTable[Idx->_index];
}
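The lookup above is a binary search over an index sorted on the key fields, followed by an exact-match check. A self-contained sketch of the same shape follows, assuming invented ``Entry`` and ``IndexEntry`` types and made-up table contents; only the names ``Table``, ``CTable``, ``Val1``, ``Val2`` and ``_index`` come from the snippet. The comparator is written with ``std::tie`` as an assumed equivalent of a field-by-field lexicographic comparison.

```cpp
#include <algorithm>
#include <iterator>
#include <tuple>

// Hypothetical record type and data; only the lookup shape mirrors the text.
struct Entry {
  unsigned Val1, Val2;
  const char *Name;
};

// Index entry: the key fields plus the position of the record in CTable.
struct IndexEntry {
  unsigned Val1, Val2;
  unsigned _index;
};

const Entry CTable[] = {{2, 1, "c"}, {1, 5, "b"}, {1, 2, "a"}};

// Sorted by (Val1, Val2) so that it can be binary-searched.
const IndexEntry Table[] = {{1, 2, 2}, {1, 5, 1}, {2, 1, 0}};

const Entry *lookupEntry(unsigned Val1, unsigned Val2) {
  IndexEntry Key = {Val1, Val2, 0};
  auto Idx = std::lower_bound(
      std::begin(Table), std::end(Table), Key,
      [](const IndexEntry &LHS, const IndexEntry &RHS) {
        // Lexicographic comparison on the key fields only.
        return std::tie(LHS.Val1, LHS.Val2) < std::tie(RHS.Val1, RHS.Val2);
      });
  // lower_bound yields the first entry not less than Key; it is a hit
  // only if every key field matches exactly.
  if (Idx == std::end(Table) || Key.Val1 != Idx->Val1 || Key.Val2 != Idx->Val2)
    return nullptr;
  return &CTable[Idx->_index];
}
```

Keeping the index separate from ``CTable`` lets the primary table stay in its original order while the index alone is sorted for searching.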
-
are described in the following subsections.
*All* of the classes derived from ``RecTy`` provide the ``get()`` function.
-It returns an instance of ``Recty`` corresponding to the derived class.
+It returns an instance of ``RecTy`` corresponding to the derived class.
Some of the ``get()`` functions require an argument to
specify which particular variant of the type is desired. These arguments are
described in the following subsections.
~~~~~~~~~~~
The ``DagInit`` class is a subclass of ``TypedInit``. Its instances
-represent the possible direct acyclic graphs (``dag``).
+represent the possible directed acyclic graphs (``dag``).
The class includes a pointer to an ``Init`` for the DAG operator and a
pointer to a ``StringInit`` for the operator name. It includes the count of
.. code-block:: text
using const_iterator = Init *const *;
-
+
``StringInit``
~~~~~~~~~~~~~~
function. It should invoke the "main function" of your backend, which
in this case, according to convention, is named ``EmitAddressModes``.
-5. Add a declaration of your "main function" to the corresponding
+#. Add a declaration of your "main function" to the corresponding
``TableGenBackends.h`` header file.
#. Add your backend C++ file to the appropriate ``CMakeLists.txt`` file so
The field is assumed to have another record as its value. That record is returned
as a pointer to a ``Record``. If the field does not exist or is unset, the
-functions returns null.
+function returns null.
Getting Record Superclasses
===========================
* ``PrintFatalNote`` prints a note and then terminates.
-Each of these five functions is overloaded four times.
+Each of these five functions is overloaded four times.
* ``PrintError(const Twine &Msg)``:
Prints the message with no source file location.
-* ``PrintError(ArrayRef<SMLoc> ErrorLoc, const Twine &Msg)``:
+* ``PrintError(ArrayRef<SMLoc> ErrorLoc, const Twine &Msg)``:
Prints the message followed by the specified source line,
along with a pointer to the item in error. The array of
source file locations is typically taken from a ``Record`` instance.
.. code-block:: text
DETAILED RECORDS for file llvm-project\llvm\lib\target\arc\arc.td
-
+
-------------------- Global Variables (5) --------------------
-
+
AMDGPUBufferIntrinsics = [int_amdgcn_buffer_load_format, ...
AMDGPUImageDimAtomicIntrinsics = [int_amdgcn_image_atomic_swap_1d, ...
...
-------------------- Classes (758) --------------------
-
+
AMDGPUBufferLoad |IntrinsicsAMDGPU.td:879|
Template args:
LLVMType AMDGPUBufferLoad:data_ty = llvm_any_ty |IntrinsicsAMDGPU.td:879|
string LLVMName = "" |Intrinsics.td:343|
...
-------------------- Records (12303) --------------------
-
+
AMDGPUSample_lz_o |IntrinsicsAMDGPU.td:560|
Defm sequence: |IntrinsicsAMDGPU.td:584| |IntrinsicsAMDGPU.td:566|
Superclasses: AMDGPUSampleVariant
their values.
* The classes are shown with their source location, template arguments,
- superclasses, and fields.
+ superclasses, and fields.
* The records are shown with their source location, ``defm`` sequence,
superclasses, and fields.
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 101.0106 seconds (102.4819 wall clock)
-
+
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
85.5197 ( 84.9%) 0.1560 ( 50.0%) 85.6757 ( 84.8%) 85.7009 ( 83.6%) Backend overall
15.1789 ( 15.1%) 0.0000 ( 0.0%) 15.1789 ( 15.0%) 15.1829 ( 14.8%) Parse, build records
TableGen Phase Timing
===-------------------------------------------------------------------------===
Total Execution Time: 746.3868 seconds (747.1447 wall clock)
-
+
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
657.7938 ( 88.1%) 0.1404 ( 90.0%) 657.9342 ( 88.1%) 658.6497 ( 88.2%) Emit matcher table
70.2317 ( 9.4%) 0.0000 ( 0.0%) 70.2317 ( 9.4%) 70.2700 ( 9.4%) Convert to matchers
.. productionlist::
BangOperator: one of
- : !add !and !cast !con !dag
+ : !add !and !cast !con !dag
: !empty !eq !filter !find !foldl
: !foreach !ge !getdagop !gt !head
: !if !interleave !isa !le !listconcat
Statement: `Assert` | `Class` | `Def` | `Defm` | `Defset` | `Defvar`
:| `Foreach` | `If` | `Let` | `MultiClass`
-The following sections describe each of these top-level statements.
+The following sections describe each of these top-level statements.
``class`` --- define an abstract record class
---------------------------------------------
A ``class`` statement defines an abstract record class from which other
-classes and records can inherit.
+classes and records can inherit.
.. productionlist::
Class: "class" `ClassID` [`TemplateArgList`] `RecordBody`
Once multiclasses have been defined, you use the ``defm`` statement to
"invoke" them and process the multiple record definitions in those
-multiclasses. Those record definitions are specified by ``def``
+multiclasses. Those record definitions are specified by ``def``
statements in the multiclasses, and indirectly by ``defm`` statements.
.. productionlist::
``dag`` datatype. A DAG node consists of an operator and zero or more
arguments (or operands). Each argument can be of any desired type. By using
another DAG node as an argument, an arbitrary graph of DAG nodes can be
-built.
+built.
The syntax of a ``dag`` instance is:
The operator must be present and must be a record. There can be zero or more
arguments, separated by commas. The operator and arguments can have three
-formats.
+formats.
====================== =============================================
Format Meaning
``!eq(`` *a*\ `,` *b*\ ``)``
This operator produces 1 if *a* is equal to *b*; 0 otherwise.
- The arguments must be ``bit``, ``bits``, ``int``, ``string``, or
+ The arguments must be ``bit``, ``bits``, ``int``, ``string``, or
record values. Use ``!cast<string>`` to compare other types of objects.
``!filter(``\ *var*\ ``,`` *list*\ ``,`` *predicate*\ ``)``
XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
XMM6, XMM7, XMM8, XMM9,
- $ llvm-tblgen X86.td -print-enums -class=Instruction
+ $ llvm-tblgen X86.td -print-enums -class=Instruction
ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
TableGen files have no real meaning without a backend. The default operation
when running ``*-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves. The power
-in TableGen is, however, to interpret the source files into an internal
+of TableGen, however, lies in interpreting the source files into an internal
representation from which anything you want can be generated.
Current usage of TableGen is to create huge include files with tables that you
Scatter / Gather
^^^^^^^^^^^^^^^^
-The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
+The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
that scatter or gather memory.
.. code-block:: c++
| | | fmuladd |
+-----+-----+---------+
-Note that the optimizer may not be able to vectorize math library functions
-that correspond to these intrinsics if the library calls access external state
-such as "errno". To allow better optimization of C/C++ math library functions,
+Note that the optimizer may not be able to vectorize math library functions
+that correspond to these intrinsics if the library calls access external state
+such as "errno". To allow better optimization of C/C++ math library functions,
use "-fno-math-errno".
The loop vectorizer knows about special instructions on the target and will
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Modern processors feature multiple execution units, and only programs that contain a
-high degree of parallelism can fully utilize the entire width of the machine.
-The Loop Vectorizer increases the instruction level parallelism (ILP) by
+high degree of parallelism can fully utilize the entire width of the machine.
+The Loop Vectorizer increases the instruction-level parallelism (ILP) by
performing partial unrolling of loops.
In the example below the entire array is accumulated into the variable 'sum'.
}
The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops.
-The decision to unroll the loop depends on the register pressure and the generated code size.
+The decision to unroll the loop depends on the register pressure and the generated code size.
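To make the effect of partial unrolling concrete, the sketch below does by hand roughly what interleaving does for a ``sum`` reduction: it keeps two independent accumulators so consecutive additions no longer depend on each other. It is illustrative only; the vectorizer chooses the interleave count from its cost model, and the factor of 2 here is an arbitrary assumption. Unsigned integers are used because their wrap-around addition is associative, so reordering cannot change the result; for floating point, this reassociation is only legal when the compiler is permitted to reassociate (for example under fast-math style flags).

```cpp
#include <cstddef>
#include <vector>

// Plain reduction: each addition depends on the previous one.
unsigned sumSerial(const std::vector<unsigned> &A) {
  unsigned Sum = 0;
  for (unsigned V : A)
    Sum += V;
  return Sum;
}

// Hand-interleaved by a factor of 2: two independent accumulators let the
// CPU overlap the additions; a scalar epilogue handles an odd-length tail.
unsigned sumInterleaved(const std::vector<unsigned> &A) {
  unsigned Sum0 = 0, Sum1 = 0;
  std::size_t I = 0;
  for (; I + 1 < A.size(); I += 2) {
    Sum0 += A[I];
    Sum1 += A[I + 1];
  }
  for (; I < A.size(); ++I)
    Sum0 += A[I];
  // Combine the partial sums once, after the loop.
  return Sum0 + Sum1;
}
```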
Epilogue Vectorization
^^^^^^^^^^^^^^^^^^^^^^
XXXInstrInfo.cpp:
-.. code-block:: c++
+.. code-block:: c++
#define GET_INSTRINFO_NAMED_OPS // For getNamedOperandIdx() function
#include "XXXGenInstrInfo.inc"
.. code-block:: shell
- $ VERBOSE=1 make ...
+ $ VERBOSE=1 make ...
and search for ``llvm-tblgen`` commands in the output.
values, incoming arguments, and frame and return address. The callback
function needs low-level access to the registers or stack, so it is typically
implemented with assembler.
-
add_llvm_library( LLVMHello MODULE
Hello.cpp
-
+
PLUGIN_TOOL
opt
)
struct Hello : public FunctionPass {
static char ID;
Hello() : FunctionPass(ID) {}
-
+
bool runOnFunction(Function &F) override {
errs() << "Hello: ";
errs().write_escaped(F.getName()) << '\n';
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0007 seconds (0.0005 wall clock)
-
+
---User Time--- --User+System-- ---Wall Time--- --- Name ---
0.0004 ( 55.3%) 0.0004 ( 55.3%) 0.0004 ( 75.7%) Bitcode Writer
0.0003 ( 44.7%) 0.0003 ( 44.7%) 0.0001 ( 13.6%) Hello World Pass
places (for global resources). Although this is a simple extension, we simply
haven't had time (or multiprocessor machines, thus a reason) to implement this.
Despite that, we have kept the LLVM passes SMP ready, and you should too.
-
$ objdump -h -j xray_instr_map ./bin/llc
./bin/llc: file format elf64-x86-64
-
+
Sections:
Idx Name Size VMA LMA File off Algn
14 xray_instr_map 00002fc0 00000000041516c6 00000000041516c6 03d516c6 2**0
$ llvm-xray convert -f yaml --symbolize --instr_map=./bin/llc xray-log.llc.m35qPB
---
- header:
+ header:
version: 1
type: 0
constant-tsc: true
nonstop-tsc: true
cycle-frequency: 2601000000
- records:
+ records:
- { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
- { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
- { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
$ llvm-xray account xray-log.llc.5rqxkU --top=10 --sort=sum --sortorder=dsc --instr_map=./bin/llc
Functions with latencies: 36652
- funcid count [ min, med, 90p, 99p, max] sum function
+ funcid count [ min, med, 90p, 99p, max] sum function
75 1 [ 0.672368, 0.672368, 0.672368, 0.672368, 0.672368] 0.672368 llc.cpp:271:0: main
78 1 [ 0.626455, 0.626455, 0.626455, 0.626455, 0.626455] 0.626455 llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
139617 1 [ 0.472618, 0.472618, 0.472618, 0.472618, 0.472618] 0.472618 LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
XRay traces.
- Collecting function call stacks and how often they're encountered in the
XRay trace.
-
-
Introduction to YAML
====================
-YAML is a human readable data serialization language. The full YAML language
-spec can be read at `yaml.org
+YAML is a human-readable data serialization language. The full YAML language
+spec can be read at `yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_. The simplest form of
YAML is just "scalars", "mappings", and "sequences". A scalar is any number
or string. The pound/hash symbol (#) begins a comment line. A mapping is
# a mapping
name: Tom
hat-size: 7
-
-A sequence is a list of items where each item starts with a leading dash ('-').
+
+A sequence is a list of items where each item starts with a leading dash ('-').
For example:
.. code-block:: yaml
Sometimes sequences are known to be short and one entry per line is too
verbose, so YAML offers an alternate syntax for sequences called a "Flow
-Sequence" in which you put comma separated sequence elements into square
+Sequence" in which you put comma separated sequence elements into square
brackets. The above example could then be simplified to:
The use of indenting makes the YAML easy for a human to read and understand,
but having a program read and write YAML involves a lot of tedious details.
-The YAML I/O library structures and simplifies reading and writing YAML
+The YAML I/O library structures and simplifies reading and writing YAML
documents.
YAML I/O assumes you have some "native" data structures which you want to be
-able to dump as YAML and recreate from YAML. The first step is to try
-writing example YAML for your data structures. You may find after looking at
+able to dump as YAML and recreate from YAML. The first step is to try
+writing example YAML for your data structures. You may find after looking at
possible YAML representations that a direct mapping of your data structures
to YAML is not very readable. Often the fields are not in the order that
a human would find readable. Or the same information is replicated in multiple
-locations, making it hard for a human to write such YAML correctly.
+locations, making it hard for a human to write such YAML correctly.
-In relational database theory there is a design step called normalization in
-which you reorganize fields and tables. The same considerations need to
+In relational database theory there is a design step called normalization in
+which you reorganize fields and tables. The same considerations need to
go into the design of your YAML encoding. But you may not want to change
your existing native data structures. Therefore, when writing out YAML
there may be a normalization step, and when reading YAML there would be a
-corresponding denormalization step.
+corresponding denormalization step.
-YAML I/O uses a non-invasive, traits based design. YAML I/O defines some
+YAML I/O uses a non-invasive, traits-based design. YAML I/O defines some
abstract base templates. You specialize those templates on your data types.
-For instance, if you have an enumerated type FooBar you could specialize
+For instance, if you have an enumerated type FooBar you could specialize
ScalarEnumerationTraits on that type and define the enumeration() method:
.. code-block:: c++
};
-As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for
+As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for
both reading and writing YAML. That is, the mapping between in-memory enum
values and the YAML string representation is only in one place.
This assures that the code for writing and parsing of YAML stays in sync.
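To see why keeping the mapping in one place matters, here is a plain C++ sketch of the same idea without YAML I/O: one table serves both the writing direction (enum to string) and the reading direction (string to enum). ``toYAML``, ``fromYAML``, the table, and the strings ``"foo"``/``"bar"`` are hypothetical; this is not the ``ScalarEnumerationTraits`` interface. It assumes C++17 for ``std::optional`` and structured bindings.

```cpp
#include <optional>
#include <string>
#include <utility>

enum class FooBar { Foo, Bar };

// One table drives both directions, so they cannot drift apart.
const std::pair<FooBar, const char *> FooBarNames[] = {
    {FooBar::Foo, "foo"}, {FooBar::Bar, "bar"}};

// Writing: find the string for an in-memory value.
std::optional<std::string> toYAML(FooBar V) {
  for (const auto &[Value, Name] : FooBarNames)
    if (Value == V)
      return std::string(Name);
  return std::nullopt; // unmatched value: a programming error when writing
}

// Reading: find the in-memory value for a string from the document.
std::optional<FooBar> fromYAML(const std::string &S) {
  for (const auto &[Value, Name] : FooBarNames)
    if (S == Name)
      return Value;
  return std::nullopt; // unmatched string: an input error when reading
}
```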
-To specify a YAML mappings, you define a specialization on
+To specify a YAML mapping, you define a specialization on
llvm::yaml::MappingTraits.
If your native data structure happens to be a struct that is already normalized,
then the specialization is simple. For example:
.. code-block:: c++
-
+
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
template <>
struct MappingTraits<Person> {
static void mapping(IO &io, Person &info) {
iterators and a push_back() method. Therefore any of the STL containers
(such as std::vector<>) will automatically translate to YAML sequences.
-Once you have defined specializations for your data types, you can
+Once you have defined specializations for your data types, you can
programmatically use YAML I/O to write a YAML document:
.. code-block:: c++
-
+
using llvm::yaml::Output;
Person tom;
std::vector<Person> persons;
persons.push_back(tom);
persons.push_back(dan);
-
+
Output yout(llvm::outs());
yout << persons;
-
+
This would write the following:
.. code-block:: yaml
typedef std::vector<Person> PersonList;
std::vector<PersonList> docs;
-
+
Input yin(document.getBuffer());
yin >> docs;
-
+
if ( yin.error() )
return;
-
+
// Process read document
for ( PersonList &pl : docs ) {
for ( Person &person : pl ) {
cout << "name=" << person.name;
}
}
-
-One other feature of YAML is the ability to define multiple documents in a
+
+One other feature of YAML is the ability to define multiple documents in a
single file. That is why reading YAML produces a vector of your document type.
Error Handling
==============
-When parsing a YAML document, if the input does not match your schema (as
-expressed in your XxxTraits<> specializations). YAML I/O
-will print out an error message and your Input object's error() method will
+When parsing a YAML document, if the input does not match your schema (as
+expressed in your XxxTraits<> specializations), YAML I/O
+will print out an error message and your Input object's error() method will
return true. For instance the following document:
.. code-block:: yaml
- name: Dan
hat-size: 7
-Has a key (shoe-size) that is not defined in the schema. YAML I/O will
+Has a key (shoe-size) that is not defined in the schema. YAML I/O will
automatically generate this error:
.. code-block:: yaml
LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyBarFlags)
This generates two classes MyFooFlags and MyBarFlags which you can use in your
-native data structures instead of uint32_t. They are implicitly
+native data structures instead of uint32_t. They are implicitly
converted to and from uint32_t. The point of creating these unique types
is that you can now specify traits on them to get different YAML conversions.
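The macro's actual expansion is not shown in this document. As a hedged approximation, the structs below hand-write the two properties described above (a distinct type per name, and implicit conversion to and from ``uint32_t``) and show that distinct types let overload resolution, standing in here for traits specialization, pick different conversions. ``describe`` and the ``value`` member are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Hand-written approximation of a strong typedef: a distinct type that
// converts implicitly to and from the underlying uint32_t.
struct MyFooFlags {
  uint32_t value;
  MyFooFlags(uint32_t v = 0) : value(v) {}
  operator uint32_t() const { return value; }
};

struct MyBarFlags {
  uint32_t value;
  MyBarFlags(uint32_t v = 0) : value(v) {}
  operator uint32_t() const { return value; }
};

// Because the types are distinct, each can get its own conversion,
// much as each can get its own YAML traits specialization.
std::string describe(MyFooFlags f) { return "foo:" + std::to_string(uint32_t(f)); }
std::string describe(MyBarFlags f) { return "bar:" + std::to_string(uint32_t(f)); }
```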
YAML I/O supports translating between in-memory enumerations and a set of string
values in YAML documents. This is done by specializing ScalarEnumerationTraits<>
on your enumeration type and defining an enumeration() method.
-For instance, suppose you had an enumeration of CPUs and a struct with it as
+For instance, suppose you had an enumeration of CPUs and a struct with it as
a field:
.. code-block:: c++
cpu_x86 = 7,
cpu_PowerPC = 8
};
-
+
struct Info {
CPUs cpu;
uint32_t flags;
};
-
-To support reading and writing of this enumeration, you can define a
-ScalarEnumerationTraits specialization on CPUs, which can then be used
-as a field type:
+
+To support reading and writing of this enumeration, you can define a
+ScalarEnumerationTraits specialization on CPUs, which can then be used
+as a field type:
.. code-block:: c++
io.enumCase(value, "PowerPC", cpu_PowerPC);
}
};
-
+
template <>
struct MappingTraits<Info> {
static void mapping(IO &io, Info &info) {
specified by enumCase() methods, an error is automatically generated.
When writing YAML, if the value being written does not match any of the values
specified by the enumCase() methods, a runtime assertion is triggered.
-
+
BitValue
--------
Another common data structure in C++ is a field where each bit has a unique
meaning. This is often used in a "flags" field. YAML I/O has support for
-converting such fields to a flow sequence. For instance suppose you
+converting such fields to a flow sequence. For instance suppose you
had the following bit flags defined:
.. code-block:: c++
};
LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyFlags)
-
+
To support reading and writing of MyFlags, you specialize ScalarBitSetTraits<>
-on MyFlags and provide the bit values and their names.
+on MyFlags and provide the bit values and their names.
.. code-block:: c++
io.bitSetCase(value, "pointy", flagPointy);
}
};
-
+
struct Info {
StringRef name;
MyFlags flags;
};
-
+
template <>
struct MappingTraits<Info> {
static void mapping(IO &io, Info& info) {
}
};
-With the above, YAML I/O (when writing) will test mask each value in the
+With the above, YAML I/O (when writing) will test mask each value in the
bitset trait against the flags field, and each that matches will
cause the corresponding string to be added to the flow sequence. The opposite
is done when reading and any unknown string values will result in an error. With
-------------
Sometimes for readability a scalar needs to be formatted in a custom way. For
instance your internal data structure may use an integer for time (seconds since
-some epoch), but in YAML it would be much nicer to express that integer in
-some time format (e.g. 4-May-2012 10:30pm). YAML I/O has a way to support
+some epoch), but in YAML it would be much nicer to express that integer in
+some time format (e.g. 4-May-2012 10:30pm). YAML I/O has a way to support
custom formatting and parsing of scalar types by specializing ScalarTraits<> on
your data type. When writing, YAML I/O will provide the native type and
your specialization must create a temporary llvm::StringRef. When reading,
}
};
-
+
Mappings
========
-To be translated to or from a YAML mapping for your type T you must specialize
-llvm::yaml::MappingTraits on T and implement the "void mapping(IO &io, T&)"
+To be translated to or from a YAML mapping for your type T you must specialize
+llvm::yaml::MappingTraits on T and implement the "void mapping(IO &io, T&)"
method. If your native data structures use pointers to a class everywhere,
you can specialize on the class pointer. Examples:
.. code-block:: c++
-
+
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
// Example of struct Foo which is used by value
template <>
struct MappingTraits<Foo> {
No Normalization
----------------
-The ``mapping()`` method is responsible, if needed, for normalizing and
-denormalizing. In a simple case where the native data structure requires no
-normalization, the mapping method just uses mapOptional() or mapRequired() to
+The ``mapping()`` method is responsible, if needed, for normalizing and
+denormalizing. In a simple case where the native data structure requires no
+normalization, the mapping method just uses mapOptional() or mapRequired() to
bind the struct's fields to YAML key names. For example:
.. code-block:: c++
-
+
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
template <>
struct MappingTraits<Person> {
static void mapping(IO &io, Person &info) {
do the normalization and denormalization. The template is used to create
a local variable in your mapping() method which contains the normalized keys.
-Suppose you have native data type
+Suppose you have a native data type
Polar which specifies a position in polar coordinates (distance, angle):
.. code-block:: c++
-
+
struct Polar {
float distance;
float angle;
};
-but you've decided the normalized YAML for should be in x,y coordinates. That
+but you've decided the normalized YAML form should be in x,y coordinates. That
is, you want the YAML to look like:
.. code-block:: yaml
y: -4.7
You can support this by defining a MappingTraits that normalizes the polar
-coordinates to x,y coordinates when writing YAML and denormalizes x,y
-coordinates into polar when reading YAML.
+coordinates to x,y coordinates when writing YAML and denormalizes x,y
+coordinates into polar when reading YAML.
.. code-block:: c++
-
+
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
template <>
struct MappingTraits<Polar> {
-
+
class NormalizedPolar {
public:
NormalizedPolar(IO &io)
: x(0.0), y(0.0) {
}
NormalizedPolar(IO &, Polar &polar)
- : x(polar.distance * cos(polar.angle)),
+ : x(polar.distance * cos(polar.angle)),
y(polar.distance * sin(polar.angle)) {
}
Polar denormalize(IO &) {
return Polar{sqrt(x*x + y*y), atan2(y, x)};
}
-
+
float x;
float y;
};
static void mapping(IO &io, Polar &polar) {
MappingNormalization<NormalizedPolar, Polar> keys(io, polar);
-
+
io.mapRequired("x", keys->x);
io.mapRequired("y", keys->y);
}
};
-When writing YAML, the local variable "keys" will be a stack allocated
+When writing YAML, the local variable "keys" will be a stack allocated
instance of NormalizedPolar, constructed from the supplied polar object which
initializes its x and y fields. The mapRequired() methods then write out the x
-and y values as key/value pairs.
+and y values as key/value pairs.
When reading YAML, the local variable "keys" will be a stack allocated instance
-of NormalizedPolar, constructed by the empty constructor. The mapRequired
-methods will find the matching key in the YAML document and fill in the x and y
+of NormalizedPolar, constructed by the empty constructor. The mapRequired
+methods will find the matching key in the YAML document and fill in the x and y
fields of the NormalizedPolar object keys. At the end of the mapping() method
when the local keys variable goes out of scope, the denormalize() method will
automatically be called to convert the read values back to polar coordinates,
In some cases, the normalized class may be a subclass of the native type and
could be returned by the denormalize() method, except that the temporary
normalized instance is stack allocated. In these cases, the utility template
-MappingNormalizationHeap<> can be used instead. It just like
+MappingNormalizationHeap<> can be used instead. It works just like
MappingNormalization<> except that it heap allocates the normalized object
when reading YAML. It never destroys the normalized object. The denormalize()
method can then return "this".
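The coordinate conversion at the heart of the ``Polar`` example can be exercised on its own. The sketch below is a plain C++ approximation: the ``XY`` struct and the free functions ``normalize``, ``denormalize``, ``makePolar`` and ``makeXY`` are hypothetical stand-ins for the ``NormalizedPolar`` class, not part of YAML I/O. It uses ``std::atan2`` so that the angle is recovered in the correct quadrant.

```cpp
#include <cmath>

struct Polar { float distance; float angle; }; // native form
struct XY { float x; float y; };                // normalized (YAML) form

// Writing direction: native -> normalized.
XY normalize(const Polar &p) {
  return {p.distance * std::cos(p.angle), p.distance * std::sin(p.angle)};
}

// Reading direction: normalized -> native. atan2(y, x) recovers the angle
// in all four quadrants, which atan(y / x) alone would not.
Polar denormalize(const XY &n) {
  return {std::sqrt(n.x * n.x + n.y * n.y), std::atan2(n.y, n.x)};
}

// Small helpers for constructing values.
Polar makePolar(float distance, float angle) { return Polar{distance, angle}; }
XY makeXY(float x, float y) { return XY{x, y}; }
```

A round trip ``denormalize(normalize(p))`` should reproduce ``p`` up to floating-point rounding for angles in the range that ``atan2`` returns.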
Default values
--------------
-Within a mapping() method, calls to io.mapRequired() mean that that key is
-required to exist when parsing YAML documents, otherwise YAML I/O will issue an
+Within a mapping() method, calls to io.mapRequired() mean that that key is
+required to exist when parsing YAML documents, otherwise YAML I/O will issue an
error.
-On the other hand, keys registered with io.mapOptional() are allowed to not
-exist in the YAML document being read. So what value is put in the field
-for those optional keys?
-There are two steps to how those optional fields are filled in. First, the
+On the other hand, keys registered with io.mapOptional() are allowed to not
+exist in the YAML document being read. So what value is put in the field
+for those optional keys?
+There are two steps to how those optional fields are filled in. First, the
second parameter to the mapping() method is a reference to a native class. That
native class must have a default constructor. Whatever value the default
constructor initially sets for an optional field will be that field's value.
Second, the mapOptional() method has an optional third parameter. If provided
-it is the value that mapOptional() should set that field to if the YAML document
-does not have that key.
+it is the value that mapOptional() should set that field to if the YAML document
+does not have that key.
There is one important difference between those two ways (default constructor
-and third parameter to mapOptional). When YAML I/O generates a YAML document,
+and third parameter to mapOptional). When YAML I/O generates a YAML document,
if the mapOptional() third parameter is used and the actual value being written
is the same as (using ==) the default value, then that key/value is not written.
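The write-side rule can be illustrated with a tiny hypothetical writer; this is not YAML I/O's API, and ``writeOptional``, ``writeHatSize`` and the default of 7 for ``hat-size`` are assumptions made for the sketch:

```cpp
#include <sstream>
#include <string>

// Hypothetical writer: a key bound with an explicit default is skipped
// when the value compares equal (==) to that default.
template <typename T>
void writeOptional(std::ostringstream &out, const std::string &key,
                   const T &value, const T &defaultValue) {
  if (value == defaultValue)
    return; // a reader falls back to defaultValue, so nothing is lost
  out << key << ": " << value << "\n";
}

// Example: "hat-size" with an assumed default of 7.
std::string writeHatSize(int hatSize) {
  std::ostringstream out;
  writeOptional(out, "hat-size", hatSize, 7);
  return out.str();
}
```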
the YAML document would find natural. This may be different than the order
of the fields in the native class.
-When reading in a YAML document, the keys in the document can be in any order,
-but they are processed in the order that the calls to mapRequired()/mapOptional()
-are made in the mapping() method. That enables some interesting
+When reading in a YAML document, the keys in the document can be in any order,
+but they are processed in the order that the calls to mapRequired()/mapOptional()
+are made in the mapping() method. That enables some interesting
functionality. For instance, if the first field bound is the cpu and the second
field bound is flags, and the flags are cpu specific, you can programmatically
-switch how the flags are converted to and from YAML based on the cpu.
+switch how the flags are converted to and from YAML based on the cpu.
This works for both reading and writing. For example:
.. code-block:: c++
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
struct Info {
CPUs cpu;
uint32_t flags;
The YAML syntax supports tags as a way to specify the type of a node before
it is parsed. This allows dynamic types of nodes. But the YAML I/O model uses
static typing, so there are limits to how you can use tags with the YAML I/O
-model. Recently, we added support to YAML I/O for checking/setting the optional
-tag on a map. Using this functionality it is even possible to support different
-mappings, as long as they are convertible.
+model. Recently, we added support to YAML I/O for checking/setting the optional
+tag on a map. Using this functionality it is even possible to support different
+mappings, as long as they are convertible.
To check a tag, inside your mapping() method you can use io.mapTag() to specify
what the tag should be. This will also add that tag when writing YAML.
Sometimes in a YAML map, each key/value pair is valid, but the combination is
not. This is similar to something having no syntax errors, but still having
semantic errors. To support semantic level checking, YAML I/O allows
-an optional ``validate()`` method in a MappingTraits template specialization.
+an optional ``validate()`` method in a MappingTraits template specialization.
-When parsing YAML, the ``validate()`` method is call *after* all key/values in
-the map have been processed. Any error message returned by the ``validate()``
+When parsing YAML, the ``validate()`` method is called *after* all key/values in
+the map have been processed. Any error message returned by the ``validate()``
method during input will be printed just like a syntax error would be printed.
-When writing YAML, the ``validate()`` method is called *before* the YAML
-key/values are written. Any error during output will trigger an ``assert()``
+When writing YAML, the ``validate()`` method is called *before* the YAML
+key/values are written. Any error during output will trigger an ``assert()``
because it is a programming error to have invalid struct values.
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
-
+
struct Stuff {
...
};
};
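A standalone sketch of the kind of cross-field check a ``validate()`` method might perform follows. The ``Range`` struct and the message text are hypothetical, and returning ``std::string`` (empty meaning valid) is an assumption modeled on the description of returned error messages; check the ``MappingTraits`` declaration in your LLVM version for the exact signature.

```cpp
#include <string>

// Hypothetical struct: each field is fine on its own, but the combination
// is only meaningful when lo <= hi.
struct Range {
  int lo;
  int hi;
  Range(int l = 0, int h = 0) : lo(l), hi(h) {}
};

// An empty result means the struct is valid; anything else is the error
// message to report.
std::string validate(const Range &r) {
  if (r.lo > r.hi)
    return "lo must not be greater than hi";
  return std::string();
}
```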
The size() method returns how many elements are currently in your sequence.
-The element() method returns a reference to the i'th element in the sequence.
+The element() method returns a reference to the i'th element in the sequence.
When parsing YAML, the element() method may be called with an index one bigger
than the current size. Your element() method should allocate space for one
more element (using the default constructor if the element is a C++ object) and return
-a reference to that new allocated space.
+a reference to that new allocated space.
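The growth rule for ``element()`` can be sketched with a ``std::vector``. This is a standalone illustration, not the SequenceTraits API: ``MyListEl``, ``MyList`` and the free function ``elementAt()`` are hypothetical stand-ins, and the ``IO &`` parameter is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element and list types standing in for MyListEl / MyList.
struct MyListEl {
  int value = 0;
};
using MyList = std::vector<MyListEl>;

// Follows the contract described above: while parsing, index may be
// exactly one past the current size, in which case a default-constructed
// element is appended and a reference to it is returned.
MyListEl &elementAt(MyList &list, std::size_t index) {
  assert(index <= list.size() && "index may be at most one past the end");
  if (index == list.size())
    list.emplace_back();  // default-construct the new slot
  return list[index];
}
```

Asking for an existing index simply returns that element, so the same function serves both reading (output) and appending (input).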
Flow Sequence
-------------
-A YAML "flow sequence" is a sequence that when written to YAML it uses the
+A YAML "flow sequence" is a sequence that, when written to YAML, uses the
inline notation (e.g. [ foo, bar ]). To specify that a sequence type should
be written in YAML as a flow sequence, your SequenceTraits specialization should
add "static const bool flow = true;". For instance:
struct SequenceTraits<MyList> {
static size_t size(IO &io, MyList &list) { ... }
static MyListEl &element(IO &io, MyList &list, size_t index) { ... }
-
+
// The existence of this member causes YAML I/O to use a flow sequence
static const bool flow = true;
};
-With the above, if you used MyList as the data type in your native data
-structures, then when converted to YAML, a flow sequence of integers
+With the above, if you used MyList as the data type in your native data
+structures, then when converted to YAML, a flow sequence of integers
will be used (e.g. [ 10, -3, 4 ]).
Flow sequences are subject to line wrapping according to the Output object
--------------
Since a common source of sequences is std::vector<>, YAML I/O provides macros:
LLVM_YAML_IS_SEQUENCE_VECTOR() and LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR() which
-can be used to easily specify SequenceTraits<> on a std::vector type. YAML
+can be used to easily specify SequenceTraits<> on a std::vector type. YAML
I/O does not partially specialize SequenceTraits on std::vector<> because that
would force all vectors to be sequences. An example use of the macros:
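The reason for an opt-in macro rather than a partial specialization can be illustrated in plain C++. In the following sketch, ``SeqTraitsSketch``, ``SKETCH_IS_SEQUENCE_VECTOR`` and ``HasSeqTraits`` are hypothetical names, not the YAML I/O definitions: the primary template is declared but never defined, and the macro generates a full specialization only for the element types a program explicitly names.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Primary template declared but intentionally never defined:
// no type is treated as a sequence unless it explicitly opts in.
template <typename T>
struct SeqTraitsSketch;

// Hypothetical opt-in macro: defines a full specialization for
// std::vector<Type> only, leaving every other vector type untouched.
#define SKETCH_IS_SEQUENCE_VECTOR(Type)                                \
  template <>                                                          \
  struct SeqTraitsSketch<std::vector<Type>> {                          \
    static std::size_t size(std::vector<Type> &v) { return v.size(); } \
  };

SKETCH_IS_SEQUENCE_VECTOR(int)

// True when SeqTraitsSketch<T> is a complete type, i.e. T has opted in.
template <typename T, typename = void>
struct HasSeqTraits : std::false_type {};

template <typename T>
struct HasSeqTraits<T, std::void_t<decltype(sizeof(SeqTraitsSketch<T>))>>
    : std::true_type {};
```

Here ``std::vector<double>`` has no sequence traits, so it remains free to be given some other trait, which a blanket partial specialization over ``std::vector<T>`` would rule out.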
Document List
=============
-YAML allows you to define multiple "documents" in a single YAML file. Each
+YAML allows you to define multiple "documents" in a single YAML file. Each
new document starts with a left aligned "---" token. The end of all documents
is denoted with a left aligned "..." token. Many users of YAML will never
need multiple documents. The top level node in their YAML schema
will be a mapping or sequence. For those cases, the following is not needed.
But for cases where you do want multiple documents, you can specify a
-trait for you document list type. The trait has the same methods as
+trait for your document list type. The trait has the same methods as
SequenceTraits but is named DocumentListTraits. For example:
.. code-block:: c++
User Context Data
=================
-When an llvm::yaml::Input or llvm::yaml::Output object is created their
-constructors take an optional "context" parameter. This is a pointer to
-whatever state information you might need.
+When an llvm::yaml::Input or llvm::yaml::Output object is created, its
+constructor takes an optional "context" parameter. This is a pointer to
+whatever state information you might need.
-For instance, in a previous example we showed how the conversion type for a
-flags field could be determined at runtime based on the value of another field
+For instance, in a previous example we showed how the conversion type for a
+flags field could be determined at runtime based on the value of another field
in the mapping. But what if an inner mapping needs to know some field value
of an outer mapping? That is where the "context" parameter comes in. You
can set values in the context in the outer map's mapping() method and
retrieve those values in the inner map's mapping() method.
-The context value is just a void*. All your traits which use the context
+The context value is just a void*. All your traits which use the context
and operate on your native data types need to agree on what the context value
actually is. It could be a pointer to an object or struct which your various
traits use to share context-sensitive information.
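A minimal sketch of the pattern, independent of the YAML I/O API: ``SharedContext``, ``mapOuter()`` and ``mapInner()`` are hypothetical stand-ins for an outer and an inner ``mapping()`` method. Both sides agree that the ``void *`` actually points at a ``SharedContext``, which is exactly the agreement the paragraph above calls for.

```cpp
#include <string>

// The type that every trait using the context agrees on (hypothetical).
struct SharedContext {
  std::string flagsKind;  // value recorded by the outer mapping
};

// Stand-in for an outer mapping() method: records a field value
// that an inner mapping will need later.
void mapOuter(void *context, const std::string &kindField) {
  auto *ctx = static_cast<SharedContext *>(context);
  ctx->flagsKind = kindField;
}

// Stand-in for an inner mapping() method: reads the value back and
// picks a conversion based on it.
std::string mapInner(void *context) {
  const auto *ctx = static_cast<const SharedContext *>(context);
  return ctx->flagsKind == "hex" ? "parse flags as hex"
                                 : "parse flags as names";
}
```

In real YAML I/O code the pointer is the one passed to the Input or Output constructor, and a mapping() method can retrieve it with ``io.getContext()``.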
Output
======
-The llvm::yaml::Output class is used to generate a YAML document from your
-in-memory data structures, using traits defined on your data types.
-To instantiate an Output object you need an llvm::raw_ostream, an optional
+The llvm::yaml::Output class is used to generate a YAML document from your
+in-memory data structures, using traits defined on your data types.
+To instantiate an Output object you need an llvm::raw_ostream, an optional
context pointer and an optional wrapping column:
.. code-block:: c++
class Output : public IO {
public:
Output(llvm::raw_ostream &, void *context = NULL, int WrapColumn = 70);
-
+
Once you have an Output object, you can use the C++ stream operator on it
to write your native data as YAML. One thing to recall is that a YAML file
can contain multiple "documents". If the top level data structure you are
streaming as YAML is a mapping, scalar, or sequence, then Output assumes you
-are generating one document and wraps the mapping output
-with "``---``" and trailing "``...``".
+are generating one document and wraps the output
+with "``---``" and trailing "``...``".
The WrapColumn parameter will cause the flow mappings and sequences to
line-wrap when they go over the supplied column. Pass 0 to completely
suppress the wrapping.
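To make the effect of a wrap column concrete, here is a standalone sketch that formats a flow sequence and starts a new line once the next item would pass a given column, with 0 meaning "never wrap". ``formatFlowSequence()`` is hypothetical and only approximates the idea; it is not the algorithm ``llvm::yaml::Output`` uses, and the exact break points and indentation in real output may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats items as a flow sequence such as
// "[ 10, -3, 4 ]". When wrapColumn > 0, a line break followed by two
// spaces of indentation is emitted before any item (other than the
// first) that would make the current line longer than wrapColumn.
// A wrapColumn of 0 disables wrapping entirely.
std::string formatFlowSequence(const std::vector<std::string> &items,
                               std::size_t wrapColumn) {
  std::string out = "[";
  std::size_t lineStart = 0;  // offset in 'out' where the current line starts
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0)
      out += ",";
    std::size_t column = out.size() - lineStart;
    // Wrap if appending " item" would push the line past wrapColumn.
    bool wrap = wrapColumn > 0 && i > 0 &&
                column + 1 + items[i].size() > wrapColumn;
    if (wrap) {
      out += "\n";
      lineStart = out.size();
      out += "  ";
    } else {
      out += " ";
    }
    out += items[i];
  }
  out += " ]";
  return out;
}
```

With a column of 0 or a generous one such as 70, ``{"10", "-3", "4"}`` stays on one line as ``[ 10, -3, 4 ]``; with a column of 10 the last item moves to a new line.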
.. code-block:: c++
-
+
using llvm::yaml::Output;
void dumpMyMapDoc(const MyMapType &info) {
and ends with a "...".
.. code-block:: c++
-
+
using llvm::yaml::Output;
void dumpMyMapDoc(const MyDocListType &docList) {
The llvm::yaml::Input class is used to parse YAML document(s) into your native
data structures. To instantiate an Input
-object you need a StringRef to the entire YAML file, and optionally a context
+object you need a StringRef to the entire YAML file, and optionally a context
pointer:
.. code-block:: c++
class Input : public IO {
public:
Input(StringRef inputContent, void *context=NULL);
-
+
Once you have an Input object, you can use the C++ stream operator to read
the document(s). If you expect there might be multiple YAML documents in
one file, you'll need to specialize DocumentListTraits on a list of your
document type and stream in that document list type. Otherwise you can
-just stream in the document type. Also, you can check if there was
+just stream in the document type. Also, you can check whether there were
any syntax errors in the YAML by calling the error() method on the Input
object. For example:
.. code-block:: c++
-
+
// Reading a single document
using llvm::yaml::Input;
Input yin(mb.getBuffer());
-
+
// Parse the YAML file
MyDocType theDoc;
yin >> theDoc;
// Check for error
if ( yin.error() )
return;
-
-
+
+
.. code-block:: c++
-
+
// Reading multiple documents in one file
using llvm::yaml::Input;
LLVM_YAML_IS_DOCUMENT_LIST_VECTOR(MyDocType)
-
+
Input yin(mb.getBuffer());
-
+
// Parse the YAML file
std::vector<MyDocType> theDocList;
yin >> theDocList;
// Check for error
if ( yin.error() )
return;
-
-
.. code-block:: c++
TheModule->setDataLayout(TargetMachine->createDataLayout());
- TheModule->setTargetTriple(TargetTriple);
-
+ TheModule->setTargetTriple(TargetTriple);
+
Emit Object Code
================
when you're done.
::
-
+
$ ./toy
ready> def average(x y) (x + y) * 0.5;
^D
- `Chapter #8: Compiling to Object Files <LangImpl08.html>`_ - This
chapter explains how to take LLVM IR and compile it down to object
files, like a static compiler does.
-- `Chapter #9: Debug Information <LangImpl09.html>`_ - A real language
+- `Chapter #9: Debug Information <LangImpl09.html>`_ - A real language
needs to support debuggers, so we
add debug information that allows setting breakpoints in Kaleidoscope
functions, printing out argument variables, and calling functions!
SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
BUILD_DIR=somewhere
INSTALL_PREFIX=same-as-llvm-install
-
+
cd $SOURCE_DIR
git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
--single-branch
git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
--single-branch
-
+
cd $BUILD_DIR && mkdir roct && cd roct
cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
-DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
make && make install
``IMAGE_SUPPORT`` requires building rocr with clang and is not used by openmp.
-
+
Provided cmake's find_package can find the ROCR-Runtime package, LLVM will
build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-An experimental way to debug these errors is to use :ref:`remote process
+An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`.
By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
possible to explicitly perform memory transfers between processes on the host
support for your compiler. The flags necessary for OpenMP target offloading will
be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
-devices are ``AMDGPU`` and ``NVPTX``.
+devices are ``AMDGPU`` and ``NVPTX``.
To use this module, simply add the path to CMake's current module path and call
``find_package``. The module will be installed with your OpenMP installation by
cmake_minimum_required(VERSION 3.13.4)
project(offloadTest VERSION 1.0 LANGUAGES CXX)
-
+
list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")
-
+
find_package(OpenMPTarget REQUIRED NVPTX)
-
+
add_executable(offload)
target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)
LLVM/OpenMP Runtimes
====================
-There are four distinct types of LLVM/OpenMP runtimes
+There are four distinct types of LLVM/OpenMP runtimes.
LLVM/OpenMP Host Runtime (``libomp``)
-------------------------------------
is enabled at any level of debugging so a full debug runtime is not required.
For minimal debugging information compile with ``-gline-tables-only``, or compile
with ``-g`` for full debug information. A full list of flags supported by
-``LIBOMPTARGET_INFO`` is given below.
+``LIBOMPTARGET_INFO`` is given below.
* Print all data arguments upon entering an OpenMP device kernel: ``0x01``
* Indicate when a mapped address already exists in the device mapping table:
#pragma omp target teams distribute parallel for reduction(+:sum)
for (int i = 0; i < N; ++i)
sum += A[i];
-
+
return sum;
}
-
+
int main() {
const int N = 1024;
double A[N];
.. code-block:: text
- CUDA error: an illegal memory access was encountered
+ CUDA error: an illegal memory access was encountered
Libomptarget error: Copying data from device failed.
Libomptarget error: Call to targetDataEnd failed, abort target.
Libomptarget error: Failed to process data after launching the kernel.
#pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N])
for (int i = 0; i < N; ++i)
sum += A[i];
-
+
return sum;
}
LIBOMPTARGET_SHARED_MEMORY_SIZE
"""""""""""""""""""""""""""""""
-This environment variable sets the amount of dynamic shared memory in bytes used
-by the kernel once it is launched. A pointer to the dynamic memory buffer can
-currently only be accessed using the ``__kmpc_get_dynamic_shared`` device
+This environment variable sets the amount of dynamic shared memory in bytes used
+by the kernel once it is launched. A pointer to the dynamic memory buffer can
+currently only be accessed using the ``__kmpc_get_dynamic_shared`` device
runtime call.
.. toctree::
LLVM/OpenMP Target Device Runtime (``libomptarget-ARCH-SUBARCH.bc``)
--------------------------------------------------------------------
-The target device runtime is an LLVM bitcode library that implements OpenMP
-runtime functions on the target device. It is linked with the device code's LLVM
+The target device runtime is an LLVM bitcode library that implements OpenMP
+runtime functions on the target device. It is linked with the device code's LLVM
IR during compilation.
Debugging
.. code-block:: c++
void use(void *) { }
-
+
void foo() {
int x;
use(&x);
}
-
+
int main() {
#pragma omp target parallel
foo();
- 2021 OpenMP Webinar: "A Compiler's View of OpenMP" https://youtu.be/eIMpgez61r4
- 2020 LLVM Developers’ Meeting: "(OpenMP) Parallelism-Aware Optimizations" https://youtu.be/gtxWkeLCxmU
- 2019 EuroLLVM Developers’ Meeting: "Compiler Optimizations for (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30
-
.. code-block:: c++
void use(int *x) { }
-
+
void foo() {
int x;
use(&x);
}
-
+
int main() {
#pragma omp target parallel
foo();
.. code-block:: c++
#include <complex>
-
+
using complex = std::complex<double>;
-
+
void zaxpy(complex *X, complex *Y, const complex D, int N) {
#pragma omp target teams distribute parallel for firstprivate(D)
for (int i = 0; i < N; ++i)
for (int i0 = 0; i0 < M; i0 += MC) {
for (int j0 = 0; j0 < N; j0 += NC) {
double sX[MC][NC];
-
+
#pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
for (int i1 = 0; i1 < MC; ++i1)
for (int j1 = 0; j1 < NC; ++j1)
sX[i1][j1] = X[(i0 + i1) * N + (j0 + j1)];
-
+
#pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
for (int i1 = 1; i1 < MC - 1; ++i1)
for (int j1 = 1; j1 < NC - 1; ++j1)
Y[(i0 + i1) * N + (j0 + j1)] = (sX[i1 + 1][j1] + sX[i1 - 1][j1] +
sX[i1][j1 + 1] + sX[i1][j1 - 1] +
-4.0 * sX[i1][j1]) / (dX * dX);
- }
+ }
}
}
.. code-block:: console
- $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass=openmp-opt -fopenmp-version=51 omp111.cpp
+ $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass=openmp-opt -fopenmp-version=51 omp111.cpp
omp111.cpp:10:14: remark: Replaced globalized variable with 8192 bytes of shared memory. [OMP111]
double sX[MC][NC];
^
device that was not either replaced with stack memory by :ref:`OMP110 <omp110>`
or shared memory by :ref:`OMP111 <omp111>`. Globalization that has not been
removed will need to be handled by the runtime and will significantly impact
-performance.
+performance.
The OpenMP standard requires that threads are able to share their data between
each other. However, this is not true by default when offloading to a target
#include <omp.h>
#include <cstdio>
-
+
#pragma omp declare target
static int *p;
#pragma omp end declare target
-
+
void foo() {
int x = omp_get_thread_num();
if (omp_get_thread_num() == 1)
p = &x;
-
+
#pragma omp barrier
-
+
printf ("Thread %d: %d\n", omp_get_thread_num(), *p);
}
-
+
int main() {
#pragma omp target parallel
foo();
.. code-block:: console
- $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
+ $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
int x = omp_get_thread_num();
.. code-block:: c++
extern void use(int *x);
-
+
void foo() {
int x;
use(&x);
}
-
+
int main() {
#pragma omp target parallel
foo();
.. code-block:: console
- $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
+ $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
int x;
.. code-block:: c++
extern void use(int *x);
-
+
void foo() {
int x;
use(&x);
}
-
+
int main() {
#pragma omp target parallel
foo();
.. code-block:: console
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O2 -Rpass-missed=openmp-opt omp113.cpp
- missed.cpp:4:7: remark: Could not move globalized variable to the stack. Variable is
- potentially captured in call. Mark parameter as `__attribute__((noescape))` to
+ omp113.cpp:4:7: remark: Could not move globalized variable to the stack. Variable is
+ potentially captured in call. Mark parameter as `__attribute__((noescape))` to
override. [OMP113]
int x;
^
int x;
use(&x);
}
-
+
int main() {
#pragma omp target parallel
foo();
guarded prevents the target region from executing in SPMD-mode. SPMD-mode
requires that each thread is active inside the region. Any instruction that
cannot be either recomputed by each thread independently or guarded and executed
-by a single thread prevents the region from executing in SPMD-mode.
+by a single thread prevents the region from executing in SPMD-mode.
This remark will attempt to print out the instructions preventing the region
from being executed in SPMD-mode. Calls to functions outside the current
extern int work();
void use(int x);
-
+
void foo() {
#pragma omp target teams
{
int x = work();
#pragma omp parallel
use(x);
-
+
}
}
.. code-block:: console
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O2 -Rpass-analysis=openmp-opt omp121.cpp
- omp121.cpp:8:13: remark: Value has potential side effects preventing SPMD-mode
- execution. Add `__attribute__((assume("ompx_spmd_amenable")))` to the called function
+ omp121.cpp:8:13: remark: Value has potential side effects preventing SPMD-mode
+ execution. Add `__attribute__((assume("ompx_spmd_amenable")))` to the called function
to override. [OMP121]
int x = work();
^
__attribute__((assume("ompx_spmd_amenable"))) extern int work();
void use(int x);
-
+
void foo() {
#pragma omp target teams
{
int x = work();
#pragma omp parallel
use(x);
-
+
}
}
a target region. This occurs when there are no parallel regions inside of a
target construct. Normally, a state machine is required to schedule the threads
inside of a parallel region. If there are no parallel regions, the state machine
-is unnecessary because there is only a single thread active at any time.
+is unnecessary because there is only a single thread active at any time.
Examples
--------
.. code-block:: console
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O2 -Rpass-analysis=openmp-opt omp132.cpp
+ omp132.cpp:4:1: remark: Generic-mode kernel is executed with a customized state machine
+ omp133.cpp:4:1: remark: Generic-mode kernel is executed with a customized state machine
that requires a fallback. [OMP132]
#pragma omp target
^
.. code-block:: console
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O2 -Rpass-analysis=openmp-opt omp133.cpp
- omp133.cpp:6:5: remark: Call may contain unknown parallel regions. Use
+ omp133.cpp:6:5: remark: Call may contain unknown parallel regions. Use
`__attribute__((assume("omp_no_parallelism")))` to override. [OMP133]
setup();
^
.. code-block:: console
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-analysis=openmp-opt omp140.cpp
- omp140.cpp:1:1: remark: Could not internalize function. Some optimizations may not
+ omp140.cpp:1:1: remark: Could not internalize function. Some optimizations may not
be possible. [OMP140]
__attribute__((weak)) void setup() {
^
void foo() {
#pragma omp parallel
parallel_work();
-
+
sequential_work();
-
+
#pragma omp parallel
parallel_work();
}
void foo(int N) {
double *A = malloc(N * omp_get_thread_limit());
double *B = malloc(N * omp_get_thread_limit());
-
+
#pragma omp parallel
work(&A[omp_get_thread_num() * N]);
#pragma omp parallel
.. code-block:: console
- $ clang -fopenmp -O2 -Rpass=openmp-opt omp170.c
+ $ clang -fopenmp -O2 -Rpass=openmp-opt omp170.c
omp170.c:2:26: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170]
double *A = malloc(N * omp_get_thread_limit());
^
Replacing OpenMP runtime call <call> with <value>.
====================================================================
-This optimization remark indicates that analysis determined an OpenMP runtime
-calls can be replaced with a constant value. This can occur when an OpenMP
-runtime call that queried some internal state was found to always return a
+This optimization remark indicates that analysis determined that an OpenMP runtime
+call can be replaced with a constant value. This can occur when an OpenMP
+runtime call that queried some internal state was found to always return a
single value after analysis.
Example
-------
-This optimization will trigger for most target regions to simplify the runtime
-once certain constants are known. This will trigger for internal runtime
-functions so it requires enabling verbose remarks with
+This optimization will trigger for most target regions to simplify the runtime
+once certain constants are known. This will trigger for internal runtime
+functions, so it requires enabling verbose remarks with
``-openmp-opt-verbose-remarks``.
.. code-block:: c++
cmake_minimum_required(VERSION 3.13.4)
project(offloadTest VERSION 1.0 LANGUAGES CXX)
-
+
list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")
-
+
find_package(OpenMPTarget REQUIRED NVPTX)
-
+
add_executable(offload)
target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)