==============================
LLVM Language Reference Manual
==============================

Abstract
========

This document is a reference manual for the LLVM assembly language. LLVM
is a Static Single Assignment (SSA) based representation that provides
type safety, low-level operations, flexibility, and the capability of
representing 'all' high-level languages cleanly. It is the common code
representation used throughout all phases of the LLVM compilation
strategy.

Introduction
============

The LLVM code representation is designed to be used in three different
forms: as an in-memory compiler IR, as an on-disk bitcode representation
(suitable for fast loading by a Just-In-Time compiler), and as a
human-readable assembly language representation. This allows LLVM to provide a
powerful intermediate representation for efficient compiler
transformations and analysis, while providing a natural means to debug
and visualize the transformations. The three different forms of LLVM are
all equivalent. This document describes the human-readable
representation and notation.

The LLVM representation aims to be light-weight and low-level while
being expressive, typed, and extensible at the same time. It aims to be
a "universal IR" of sorts, by being at a low enough level that
high-level ideas may be cleanly mapped to it (similar to how
microprocessors are "universal IR's", allowing many source languages to
be mapped to them). By providing type information, LLVM can be used as
the target of optimizations: for example, through pointer analysis, it
can be proven that a C automatic variable is never accessed outside of
the current function, allowing it to be promoted to a simple SSA value
instead of a memory location.

Well-Formedness
---------------

It is important to note that this document describes 'well formed' LLVM
assembly language. There is a difference between what the parser accepts
and what is considered 'well formed'. For example, the instruction
``%x = add i32 1, %x`` is syntactically okay, but not well formed,
because the definition of ``%x`` does not dominate all of its uses. The
LLVM infrastructure provides a verification pass that may be used to
verify that an LLVM module is well formed. This pass is automatically
run by the parser after parsing input assembly and by the optimizer
before it outputs bitcode. The violations pointed out by the verifier
pass indicate bugs in transformation passes or input to the parser.

Identifiers
===========

LLVM identifiers come in two basic types: global and local. Global
identifiers (functions, global variables) begin with the ``'@'``
character. Local identifiers (register names, types) begin with the
``'%'`` character. Additionally, there are three different formats for
identifiers, for different purposes:

#. Named values are represented as a string of characters with their
   prefix. For example, ``%foo``, ``@DivisionByZero``,
   ``%a.really.long.identifier``. The actual regular expression used is
   '``[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*``'. Identifiers that require other
   characters in their names can be surrounded with quotes. Special
   characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII
   code for the character in hexadecimal. In this way, any character can
   be used in a name value, even quotes themselves. The ``"\01"`` prefix
   can be used on global values to suppress mangling.
#. Unnamed values are represented as an unsigned numeric value with
   their prefix. For example, ``%12``, ``@2``, ``%44``.
#. Constants, which are described in the section Constants_ below.

LLVM requires that values start with a prefix for two reasons: Compilers
don't need to worry about name clashes with reserved words, and the set
of reserved words may be expanded in the future without penalty.
Additionally, unnamed identifiers allow a compiler to quickly come up
with a temporary variable without having to avoid symbol table
conflicts.

Reserved words in LLVM are very similar to reserved words in other
languages. There are keywords for different opcodes ('``add``',
'``bitcast``', '``ret``', etc.), for primitive type names ('``void``',
'``i32``', etc.), and others. These reserved words cannot conflict
with variable names, because none of them start with a prefix character
(``'%'`` or ``'@'``).
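
As a rough illustration of the identifier formats described above, the
following sketch uses made-up names (``@counter``, ``@"quoted name"``,
``@get``) that do not come from any real module. It assumes the typed-pointer
syntax used elsewhere in this document.

.. code-block:: llvm

    @counter = global i32 0            ; named global value
    @"quoted name" = global i32 1      ; quotes allow characters such as spaces
    @"say\22hi\22" = global i32 2      ; \22 is the hexadecimal ASCII code for '"'

    define i32 @get(i32 %n) {
      ; The only parameter is named, so the unlabeled entry block is %0
      ; and the first unnamed value in the body is %1.
      %1 = load i32, i32* @counter
      %sum = add i32 %1, 7             ; 7 is a constant and takes no prefix
      %r = add i32 %sum, %n
      ret i32 %r
    }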

Here is an example of LLVM code to multiply the integer variable
'``%X``' by 8:

The easy way:

.. code-block:: llvm

    %result = mul i32 %X, 8

After strength reduction:

.. code-block:: llvm

    %result = shl i32 %X, 3

And the hard way:

.. code-block:: llvm

    %0 = add i32 %X, %X           ; yields i32:%0
    %1 = add i32 %0, %0           ; yields i32:%1
    %result = add i32 %1, %1

This last way of multiplying ``%X`` by 8 illustrates several important
lexical features of LLVM:

#. Comments are delimited with a '``;``' and go until the end of line.
#. Unnamed temporaries are created when the result of a computation is
   not assigned to a named value.
#. Unnamed temporaries are numbered sequentially (using a per-function
   incrementing counter, starting with 0). Note that basic blocks and unnamed
   function parameters are included in this numbering. For example, if the
   entry basic block is not given a label name and all function parameters are
   named, then it will get number 0.

It also shows a convention that we follow in this document. When
demonstrating instructions, we will follow an instruction with a comment
that defines the type and name of the value produced.
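
The numbering rule can be surprising when a function has unnamed parameters.
A minimal sketch, assuming a hypothetical function ``@mul8`` whose single
parameter is left unnamed:

.. code-block:: llvm

    define i32 @mul8(i32) {          ; the unnamed parameter is %0
                                     ; the unlabeled entry block takes %1
      %2 = add i32 %0, %0            ; yields i32:%2
      %3 = add i32 %2, %2            ; yields i32:%3
      %result = add i32 %3, %3       ; yields i32:%result
      ret i32 %result
    }

Numbering the first instruction ``%1`` instead of ``%2`` would not be
accepted here, because ``%1`` is already taken by the entry block.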

High Level Structure
====================

Module Structure
----------------

LLVM programs are composed of ``Module``'s, each of which is a
translation unit of the input programs. Each module consists of
functions, global variables, and symbol table entries. Modules may be
combined together with the LLVM linker, which merges function (and
global variable) definitions, resolves forward declarations, and merges
symbol table entries. Here is an example of the "hello world" module:

.. code-block:: llvm

    ; Declare the string constant as a global constant.
    @.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

    ; External declaration of the puts function
    declare i32 @puts(i8* nocapture) nounwind

    ; Definition of main function
    define i32 @main() {   ; i32()*
      ; Convert [13 x i8]* to i8*...
      %cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0

      ; Call puts function to write out the string to stdout.
      call i32 @puts(i8* %cast210)
      ret i32 0
    }

    ; Named metadata
    !0 = !{i32 42, null, !"string"}
    !foo = !{!0}

This example is made up of a :ref:`global variable <globalvars>` named
"``.str``", an external declaration of the "``puts``" function, a
:ref:`function definition <functionstructure>` for "``main``" and
:ref:`named metadata <namedmetadatastructure>` "``foo``".

In general, a module is made up of a list of global values (where both
functions and global variables are global values). Global values are
represented by a pointer to a memory location (in this case, a pointer
to an array of char, and a pointer to a function), and have one of the
following :ref:`linkage types <linkage>`.

Linkage Types
-------------

All Global Variables and Functions have one of the following types of
linkage:

``private``
    Global values with "``private``" linkage are only directly
    accessible by objects in the current module. In particular, linking
    code into a module with a private global value may cause the
    private to be renamed as necessary to avoid collisions. Because the
    symbol is private to the module, all references can be updated. This
    doesn't show up in any symbol table in the object file.
``internal``
    Similar to private, but the value shows as a local symbol
    (``STB_LOCAL`` in the case of ELF) in the object file. This
    corresponds to the notion of the '``static``' keyword in C.
``available_externally``
    Globals with "``available_externally``" linkage are never emitted into
    the object file corresponding to the LLVM module. From the linker's
    perspective, an ``available_externally`` global is equivalent to
    an external declaration. They exist to allow inlining and other
    optimizations to take place given knowledge of the definition of the
    global, which is known to be somewhere outside the module. Globals
    with ``available_externally`` linkage are allowed to be discarded at
    will, and allow inlining and other optimizations. This linkage type is
    only allowed on definitions, not declarations.
``linkonce``
    Globals with "``linkonce``" linkage are merged with other globals of
    the same name when linkage occurs. This can be used to implement
    some forms of inline functions, templates, or other code which must
    be generated in each translation unit that uses it, but where the
    body may be overridden with a more definitive definition later.
    Unreferenced ``linkonce`` globals are allowed to be discarded. Note
    that ``linkonce`` linkage does not actually allow the optimizer to
    inline the body of this function into callers because it doesn't
    know if this definition of the function is the definitive definition
    within the program or whether it will be overridden by a stronger
    definition. To enable inlining and other optimizations, use
    "``linkonce_odr``" linkage.
``weak``
    "``weak``" linkage has the same merging semantics as ``linkonce``
    linkage, except that unreferenced globals with ``weak`` linkage may
    not be discarded. This is used for globals that are declared "weak"
    in C source code.
``common``
    "``common``" linkage is most similar to "``weak``" linkage, but it
    is used for tentative definitions in C, such as "``int X;``" at
    global scope. Symbols with "``common``" linkage are merged in the
    same way as ``weak`` symbols, and they may not be deleted if
    unreferenced. ``common`` symbols may not have an explicit section,
    must have a zero initializer, and may not be marked
    ':ref:`constant <globalvars>`'. Functions and aliases may not have
    common linkage.

.. _linkage_appending:

``appending``
    "``appending``" linkage may only be applied to global variables of
    pointer to array type. When two global variables with appending
    linkage are linked together, the two global arrays are appended
    together. This is the LLVM, typesafe, equivalent of having the
    system linker append together "sections" with identical names when
    .o files are linked.

    Unfortunately this doesn't correspond to any feature in .o files, so it
    can only be used for variables like ``llvm.global_ctors`` which LLVM
    interprets specially.

``extern_weak``
    The semantics of this linkage follow the ELF object file model: the
    symbol is weak until linked. If it is not linked, the symbol becomes null
    instead of being an undefined reference.
``linkonce_odr``, ``weak_odr``
    Some languages allow differing globals to be merged, such as two
    functions with different semantics. Other languages, such as
    ``C++``, ensure that only equivalent globals are ever merged (the
    "one definition rule", or "ODR"). Such languages can use the
    ``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the
    global will only be merged with equivalent globals. These linkage
    types are otherwise the same as their non-``odr`` versions.
``external``
    If none of the above identifiers are used, the global is externally
    visible, meaning that it participates in linkage and can be used to
    resolve external symbol references.

It is illegal for a global variable or function *declaration* to have any
linkage type other than ``external`` or ``extern_weak``.
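
The sketch below shows where the linkage keyword goes in a few declarations
and definitions. All names are hypothetical and chosen only for illustration.

.. code-block:: llvm

    @lookup_table      = private unnamed_addr constant [2 x i32] [i32 1, i32 2]
    @file_counter      = internal global i32 0     ; like a C 'static' variable
    @tentative         = common global i32 0       ; like 'int tentative;' in C
    @defined_here      = global i32 7              ; no keyword: externally visible
    @defined_elsewhere = external global i32       ; declaration only
    @optional_value    = extern_weak global i32    ; declaration, null if never defined

    ; An inline-style function that may only be merged with equivalent copies.
    define linkonce_odr i32 @twice(i32 %x) {
      %r = add i32 %x, %x
      ret i32 %r
    }

    ; A weak definition that a stronger definition may override at link time.
    define weak i32 @default_policy() {
      ret i32 0
    }

    declare extern_weak void @optional_hook()

Note that the two declarations use only ``external`` or ``extern_weak``, in
line with the restriction stated above.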

Calling Conventions
-------------------

LLVM :ref:`functions <functionstructure>`, :ref:`calls <i_call>` and
:ref:`invokes <i_invoke>` can all have an optional calling convention
specified for the call. The calling convention of any pair of dynamic
caller/callee must match, or the behavior of the program is undefined.
The following calling conventions are supported by LLVM, and more may be
added in the future:

"``ccc``" - The C calling convention
    This calling convention (the default if no other calling convention
    is specified) matches the target C calling conventions. This calling
    convention supports varargs function calls and tolerates some
    mismatch in the declared prototype and implemented declaration of
    the function (as does normal C).
"``fastcc``" - The fast calling convention
    This calling convention attempts to make calls as fast as possible
    (e.g. by passing things in registers). This calling convention
    allows the target to use whatever tricks it wants to produce fast
    code for the target, without having to conform to an externally
    specified ABI (Application Binary Interface). `Tail calls can only
    be optimized when this, the tailcc, the GHC or the HiPE convention is
    used. <CodeGenerator.html#id80>`_ This calling convention does not
    support varargs and requires the prototype of all callees to exactly
    match the prototype of the function definition.
"``coldcc``" - The cold calling convention
    This calling convention attempts to make code in the caller as
    efficient as possible under the assumption that the call is not
    commonly executed. As such, these calls often preserve all registers
    so that the call does not break any live ranges in the caller side.
    This calling convention does not support varargs and requires the
    prototype of all callees to exactly match the prototype of the
    function definition. Furthermore the inliner doesn't consider such function
    calls for inlining.
"``cc 10``" - GHC convention
    This calling convention has been implemented specifically for use by
    the `Glasgow Haskell Compiler (GHC) <http://www.haskell.org/ghc>`_.
    It passes everything in registers, going to extremes to achieve this
    by disabling callee save registers. This calling convention should
    not be used lightly but only for specific situations such as an
    alternative to the *register pinning* performance technique often
    used when implementing functional programming languages. At the
    moment only X86 supports this convention and it has the following
    limitations:

    -  On *X86-32*, only up to 4 bit type parameters are supported. No
       floating-point types are supported.
    -  On *X86-64*, only up to 10 bit type parameters and 6
       floating-point parameters are supported.

    This calling convention supports `tail call
    optimization <CodeGenerator.html#id80>`_ but requires that both the
    caller and the callee use it.
"``cc 11``" - The HiPE calling convention
    This calling convention has been implemented specifically for use by
    the `High-Performance Erlang
    (HiPE) <http://www.it.uu.se/research/group/hipe/>`_ compiler, *the*
    native code compiler of `Ericsson's Open Source Erlang/OTP
    system <http://www.erlang.org/download.shtml>`_. It uses more
    registers for argument passing than the ordinary C calling
    convention and defines no callee-saved registers. The calling
    convention properly supports `tail call
    optimization <CodeGenerator.html#id80>`_ but requires that both the
    caller and the callee use it. It uses a *register pinning*
    mechanism, similar to GHC's convention, for keeping frequently
    accessed runtime components pinned to specific hardware registers.
    At the moment only X86 supports this convention (both 32 and 64
    bit).
"``webkit_jscc``" - WebKit's JavaScript calling convention
    This calling convention has been implemented for `WebKit FTL JIT
    <https://trac.webkit.org/wiki/FTLJIT>`_. It passes arguments on the
    stack right to left (as cdecl does), and returns a value in the
    platform's customary return register.
"``anyregcc``" - Dynamic calling convention for code patching
    This is a special convention that supports patching an arbitrary code
    sequence in place of a call site. This convention forces the call
    arguments into registers but allows them to be dynamically
    allocated. This can currently only be used with calls to
    ``llvm.experimental.patchpoint`` because only this intrinsic records
    the location of its arguments in a side table. See :doc:`StackMaps`.
"``preserve_mostcc``" - The `PreserveMost` calling convention
    This calling convention attempts to make the code in the caller as
    unintrusive as possible. This convention behaves identically to the `C`
    calling convention on how arguments and return values are passed, but it
    uses a different set of caller/callee-saved registers. This alleviates the
    burden of saving and recovering a large register set before and after the
    call in the caller. If the arguments are passed in callee-saved registers,
    then they will be preserved by the callee across the call. This doesn't
    apply for values returned in callee-saved registers.

    - On X86-64 the callee preserves all general purpose registers, except for
      R11. R11 can be used as a scratch register. Floating-point registers
      (XMMs/YMMs) are not preserved and need to be saved by the caller.

    The idea behind this convention is to support calls to runtime functions
    that have a hot path and a cold path. The hot path is usually a small piece
    of code that doesn't use many registers. The cold path might need to call out to
    another function and therefore only needs to preserve the caller-saved
    registers, which haven't already been saved by the caller. The
    `PreserveMost` calling convention is very similar to the `cold` calling
    convention in terms of caller/callee-saved registers, but they are used for
    different types of function calls. `coldcc` is for function calls that are
    rarely executed, whereas `preserve_mostcc` function calls are intended to be
    on the hot path and definitely executed a lot. Furthermore `preserve_mostcc`
    doesn't prevent the inliner from inlining the function call.

    This calling convention will be used by a future version of the Objective-C
    runtime and should therefore still be considered experimental at this time.
    Although this convention was created to optimize certain runtime calls to
    the Objective-C runtime, it is not limited to this runtime and might be used
    by other runtimes in the future too. The current implementation only
    supports X86-64, but the intention is to support more architectures in the
    future.
"``preserve_allcc``" - The `PreserveAll` calling convention
    This calling convention attempts to make the code in the caller even less
    intrusive than the `PreserveMost` calling convention. This calling
    convention also behaves identically to the `C` calling convention on how
    arguments and return values are passed, but it uses a different set of
    caller/callee-saved registers. This removes the burden of saving and
    recovering a large register set before and after the call in the caller. If
    the arguments are passed in callee-saved registers, then they will be
    preserved by the callee across the call. This doesn't apply for values
    returned in callee-saved registers.

    - On X86-64 the callee preserves all general purpose registers, except for
      R11. R11 can be used as a scratch register. Furthermore it also preserves
      all floating-point registers (XMMs/YMMs).

    The idea behind this convention is to support calls to runtime functions
    that don't need to call out to any other functions.

    This calling convention, like the `PreserveMost` calling convention, will be
    used by a future version of the Objective-C runtime and should be considered
    experimental at this time.
"``cxx_fast_tlscc``" - The `CXX_FAST_TLS` calling convention for access functions
    Clang generates an access function to access C++-style TLS. The access
    function generally has an entry block, an exit block and an initialization
    block that is run the first time the function is called. The entry and exit
    blocks can access a few TLS IR variables, and each access will be lowered
    to a platform-specific sequence.

    This calling convention aims to minimize overhead in the caller by
    preserving as many registers as possible (all the registers that are
    preserved on the fast path, composed of the entry and exit blocks).

    This calling convention behaves identically to the `C` calling convention on
    how arguments and return values are passed, but it uses a different set of
    caller/callee-saved registers.

    Given that each platform has its own lowering sequence, hence its own set
    of preserved registers, we can't use the existing `PreserveMost`.

    - On X86-64 the callee preserves all general purpose registers, except for
"``tailcc``" - Tail callable calling convention
    This calling convention ensures that calls in tail position will always be
    tail call optimized. This calling convention is equivalent to fastcc,
    except for an additional guarantee that tail calls will be produced
    whenever possible. `Tail calls can only be optimized when this, the fastcc,
    the GHC or the HiPE convention is used. <CodeGenerator.html#id80>`_ This
    calling convention does not support varargs and requires the prototype of
    all callees to exactly match the prototype of the function definition.
"``swiftcc``" - This calling convention is used for the Swift language.
    - On X86-64 RCX and R8 are available for additional integer returns, and
      XMM2 and XMM3 are available for additional FP/vector returns.
    - On iOS platforms, we use the AAPCS-VFP calling convention.
"``swifttailcc``"
    This calling convention is like ``swiftcc`` in most respects, but also the
    callee pops the argument area of the stack so that mandatory tail calls are
    possible as in ``tailcc``.
"``cfguard_checkcc``" - Windows Control Flow Guard (Check mechanism)
    This calling convention is used for the Control Flow Guard check function,
    calls to which can be inserted before indirect calls to check that the call
    target is a valid function address. The check function has no return value,
    but it will trigger an OS-level error if the address is not a valid target.
    The set of registers preserved by the check function, and the register
    containing the target address are architecture-specific.

    - On X86 the target address is passed in ECX.
    - On ARM the target address is passed in R0.
    - On AArch64 the target address is passed in X15.
"``cc <n>``" - Numbered convention
    Any calling convention may be specified by number, allowing
    target-specific calling conventions to be used. Target specific
    calling conventions start at 64.

More calling conventions can be added/defined on an as-needed basis, to
support Pascal conventions or any other well-known target-independent
convention.
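
A minimal sketch of how a convention is spelled on declarations, definitions
and call sites. The function names are hypothetical. The point to notice is
that every call site names the same convention as its callee, since a mismatch
makes the behavior undefined.

.. code-block:: llvm

    declare coldcc void @report_failure(i32)

    ; Adds n + (n-1) + ... + 1 to %acc, recursing through a tail call.
    define fastcc i32 @sum_to(i32 %n, i32 %acc) {
    entry:
      %done = icmp eq i32 %n, 0
      br i1 %done, label %exit, label %recurse

    recurse:
      %acc.next = add i32 %acc, %n
      %n.next = sub i32 %n, 1
      %r = tail call fastcc i32 @sum_to(i32 %n.next, i32 %acc.next)
      ret i32 %r

    exit:
      ret i32 %acc
    }

    ; No keyword on the definition selects the default C convention (ccc).
    define i32 @driver(i32 %n) {
    entry:
      %total = call fastcc i32 @sum_to(i32 %n, i32 0)
      %neg = icmp slt i32 %total, 0
      br i1 %neg, label %bad, label %ok

    bad:
      call coldcc void @report_failure(i32 %total)
      ret i32 -1

    ok:
      ret i32 %total
    }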

.. _visibilitystyles:

Visibility Styles
-----------------

All Global Variables and Functions have one of the following visibility
styles:

"``default``" - Default style
    On targets that use the ELF object file format, default visibility
    means that the declaration is visible to other modules and, in
    shared libraries, means that the declared entity may be overridden.
    On Darwin, default visibility means that the declaration is visible
    to other modules. Default visibility corresponds to "external
    linkage" in the language.
"``hidden``" - Hidden style
    Two declarations of an object with hidden visibility refer to the
    same object if they are in the same shared object. Usually, hidden
    visibility indicates that the symbol will not be placed into the
    dynamic symbol table, so no other module (executable or shared
    library) can reference it directly.
"``protected``" - Protected style
    On ELF, protected visibility indicates that the symbol will be
    placed in the dynamic symbol table, but that references within the
    defining module will bind to the local symbol. That is, the symbol
    cannot be overridden by another module.

A symbol with ``internal`` or ``private`` linkage must have ``default``
visibility.
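
As a small sketch with hypothetical names, the visibility keyword follows the
optional linkage keyword:

.. code-block:: llvm

    @public_state = global i32 0           ; default visibility (no keyword needed)
    @helper_state = hidden global i32 0    ; usually kept out of the dynamic symbol table
    @file_local   = internal global i32 0  ; internal linkage: visibility stays default

    ; Exported, but calls from inside the defining module bind locally.
    define protected i32 @api_entry() {
      %v = load i32, i32* @helper_state
      ret i32 %v
    }

    define hidden void @impl_detail() {
      store i32 1, i32* @helper_state
      ret void
    }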

DLL Storage Classes
-------------------

All Global Variables, Functions and Aliases can have one of the following
DLL storage classes:

``dllimport``
    "``dllimport``" causes the compiler to reference a function or variable via
    a global pointer to a pointer that is set up by the DLL exporting the
    symbol. On Microsoft Windows targets, the pointer name is formed by
    combining ``__imp_`` and the function or variable name.
``dllexport``
    "``dllexport``" causes the compiler to provide a global pointer to a pointer
    in a DLL, so that it can be referenced with the ``dllimport`` attribute. On
    Microsoft Windows targets, the pointer name is formed by combining
    ``__imp_`` and the function or variable name. Since this storage class
    exists for defining a DLL interface, the compiler, assembler and linker know
    it is externally referenced and must refrain from deleting the symbol.
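
A brief sketch of both storage classes, using hypothetical symbol names:

.. code-block:: llvm

    declare dllimport i32 @imported_fn(i32)         ; provided by another DLL
    @imported_var = external dllimport global i32   ; also provided by another DLL
    @exported_var = dllexport global i32 0          ; made available to importers

    define dllexport i32 @exported_fn(i32 %x) {
      %r = call i32 @imported_fn(i32 %x)
      ret i32 %r
    }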

Thread Local Storage Models
---------------------------

A variable may be defined as ``thread_local``, which means that it will
not be shared by threads (each thread will have a separate copy of the
variable). Not all targets support thread-local variables. Optionally, a
TLS model may be specified:

``localdynamic``
    For variables that are only used within the current shared library.
``initialexec``
    For variables in modules that will not be loaded dynamically.
``localexec``
    For variables defined in the executable and only used within it.

If no explicit model is given, the "general dynamic" model is used.

The models correspond to the ELF TLS models; see `ELF Handling For
Thread-Local Storage <http://people.redhat.com/drepper/tls.pdf>`_ for
more information on the circumstances under which the different models may
be used. The target may choose a different TLS model if the specified
model is not supported, or if a better choice of model can be made.

A model can also be specified in an alias, but then it only governs how
the alias is accessed. It will not have any effect on the aliasee.

For platforms without linker support for the ELF TLS model, the -femulated-tls
flag can be used to generate GCC compatible emulated TLS code.
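
A short sketch of the syntax, with hypothetical variable names. The model
keywords ``localdynamic``, ``initialexec`` and ``localexec`` are the spellings
LLVM uses for the corresponding ELF models; omitting the parenthesized model
selects general dynamic.

.. code-block:: llvm

    @tls_general = thread_local global i32 0                 ; general dynamic
    @tls_local   = thread_local(localdynamic) global i32 0
    @tls_initial = thread_local(initialexec) global i32 0
    @tls_exec    = thread_local(localexec) global i32 0

    ; Each thread increments its own copy of @tls_exec.
    define i32 @bump() {
      %old = load i32, i32* @tls_exec
      %new = add i32 %old, 1
      store i32 %new, i32* @tls_exec
      ret i32 %new
    }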

.. _runtime_preemption_model:

Runtime Preemption Specifiers
-----------------------------

Global variables, functions and aliases may have an optional runtime preemption
specifier. If a preemption specifier isn't given explicitly, then a
symbol is assumed to be ``dso_preemptable``.

``dso_preemptable``
    Indicates that the function or variable may be replaced by a symbol from
    outside the linkage unit at runtime.

``dso_local``
    The compiler may assume that a function or variable marked as ``dso_local``
    will resolve to a symbol within the same linkage unit. Direct access will
    be generated even if the definition is not within this compilation unit.
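
A minimal sketch with hypothetical names. The specifier comes after the
optional linkage keyword, and ``dso_local`` may appear on a declaration whose
definition lives in another compilation unit of the same linkage unit:

.. code-block:: llvm

    @local_config = dso_local global i32 0
    @hookable     = dso_preemptable global i32 0   ; same as writing no specifier

    declare dso_local i32 @helper(i32)

    define dso_local i32 @use_helper() {
      %v = load i32, i32* @local_config
      %r = call i32 @helper(i32 %v)
      ret i32 %r
    }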

Structure Types
---------------

LLVM IR allows you to specify both "identified" and "literal" :ref:`structure
types <t_struct>`. Literal types are uniqued structurally, but identified types
are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used
to forward declare a type that is not yet available.

An example of an identified structure specification is:

.. code-block:: llvm

    %mytype = type { %mytype*, i32 }

Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only
literal types are uniqued in recent versions of LLVM.
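
To make the distinction concrete, here is a small sketch with hypothetical
names that uses an identified type, an opaque type and a literal type side by
side:

.. code-block:: llvm

    %node   = type { %node*, i32 }     ; identified and recursive
    %handle = type opaque              ; forward declaration, body not yet known

    @head = global %node { %node* null, i32 0 }
    @pair = global { i32, i32 } { i32 1, i32 2 }   ; literal structure type

    ; Pointers to an opaque type can be passed around without knowing its body.
    declare void @consume(%handle*)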

Non-Integral Pointer Type
-------------------------

Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
Non-integral pointer types represent pointers that have an *unspecified* bitwise
representation; that is, the integral representation may be target dependent or
unstable (not backed by a fixed integer).

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e. normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
non-integral type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
generated IR must fence with the underlying environment in an implementation
defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
non-integral types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
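
As a sketch only, the module below marks address space 1 as non-integral
through the ``ni`` component of the datalayout string. The choice of address
space 1 and the function names are assumptions made for this example.

.. code-block:: llvm

    target datalayout = "ni:1"

    ; Both casts are legal IR, but their results depend on environmental
    ; state, so two executions of the same cast need not agree.
    define i64 @address_of(i8 addrspace(1)* %p) {
      %a = ptrtoint i8 addrspace(1)* %p to i64
      ret i64 %a
    }

    define i8 addrspace(1)* @pointer_from(i64 %bits) {
      %p = inttoptr i64 %bits to i8 addrspace(1)*
      ret i8 addrspace(1)* %p
    }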

Global Variables
----------------

Global variables define regions of memory allocated at compilation time
instead of run-time.

Global variable definitions must be initialized.

Global variables in other translation units can also be declared, in which
case they don't have an initializer.

Global variables can optionally specify a :ref:`linkage type <linkage>`.

Either global variable definitions or declarations may have an explicit section
to be placed in and may have an optional explicit alignment specified. If there
is a mismatch between the explicit or inferred section information for the
variable declaration and its definition the resulting behavior is undefined.

A variable may be defined as a global ``constant``, which indicates that
the contents of the variable will **never** be modified (enabling better
optimization, allowing the global data to be placed in the read-only
section of an executable, etc). Note that variables that need runtime
initialization cannot be marked ``constant`` as there is a store to the
variable.

LLVM explicitly allows *declarations* of global variables to be marked
constant, even if the final definition of the global is not. This
capability can be used to enable slightly better optimization of the
program, but requires the language definition to guarantee that
optimizations based on the 'constantness' are valid for the translation
units that do not include the definition.

As SSA values, global variables define pointer values that are in scope
(i.e. they dominate) all basic blocks in the program. Global variables
always define a pointer to their "content" type because they describe a
region of memory, and all memory objects in LLVM are accessed through
pointers.

Global variables can be marked with ``unnamed_addr`` which indicates
that the address is not significant, only the content. Constants marked
like this can be merged with other constants if they have the same
initializer. Note that a constant with significant address *can* be
merged with an ``unnamed_addr`` constant, the result being a constant
whose address is significant.

If the ``local_unnamed_addr`` attribute is given, the address is known to
not be significant within the module.

A global variable may be declared to reside in a target-specific
numbered address space. For targets that support them, address spaces
may affect how optimizations are performed and/or what target
instructions are used to access the variable. The default address space
is zero. The address space qualifier must precede any other attributes.

LLVM allows an explicit section to be specified for globals. If the
target supports it, it will emit globals to the section specified.
Additionally, the global can be placed in a comdat if the target has the
necessary support.

External declarations may have an explicit section specified. Section
information is retained in LLVM IR for targets that make use of this
information. Attaching section information to an external declaration is an
assertion that its definition is located in the specified section. If the
definition is located in a different section, the behavior is undefined.
695 By default, global initializers are optimized by assuming that global
696 variables defined within the module are not modified from their
697 initial values before the start of the global initializer. This is
698 true even for variables potentially accessible from outside the
699 module, including those with external linkage or appearing in
700 ``@llvm.used`` or dllexported variables. This assumption may be suppressed
701 by marking the variable with ``externally_initialized``.
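
For illustration, a global whose value may be changed by some mechanism outside
the module before any of the module's code runs could be written as follows
(``@counter`` is a hypothetical name):

.. code-block:: llvm

    ; The optimizer may not assume that @counter still holds 0 when the
    ; module's code first observes it.
    @counter = externally_initialized global i32 0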
An explicit alignment may be specified for a global, which must be a
power of 2. If not present, or if the alignment is set to zero, the
alignment of the global is set by the target to whatever it finds
convenient. If an explicit alignment is specified, the global is forced
to have exactly that alignment. Targets and optimizers are not allowed
to over-align the global if the global has an assigned section. In this
case, the extra alignment could be observable: for example, code could
assume that the globals are densely packed in their section and try to
iterate over them as an array, and alignment padding would break this
iteration. The maximum alignment is ``1 << 32``.
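
The section case can be sketched as follows, assuming a hypothetical section
name ``my_table`` and hypothetical globals:

.. code-block:: llvm

    ; Both entries carry an explicit section and an explicit alignment.
    ; Because a section is assigned, targets and optimizers must not raise
    ; the alignment above 8, so code that walks "my_table" as an array of
    ; i64 does not encounter unexpected padding introduced that way.
    @entry0 = global i64 1, section "my_table", align 8
    @entry1 = global i64 2, section "my_table", align 8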
For global variable declarations, as well as definitions that may be
replaced at link time (``linkonce``, ``weak``, ``extern_weak`` and ``common``
linkage types), LLVM makes no assumptions about the allocation size of the
variables, except that they may not overlap. The alignment of a global variable
declaration or replaceable definition must not be greater than the alignment of
the definition it resolves to.
Globals can also have a :ref:`DLL storage class <dllstorageclass>`,
an optional :ref:`runtime preemption specifier <runtime_preemption_model>`,
optional :ref:`global attributes <glattrs>` and
an optional list of attached :ref:`metadata <metadata>`.
726 Variables and aliases can have a
727 :ref:`Thread Local Storage Model <tls_model>`.
729 :ref:`Scalable vectors <t_vector>` cannot be global variables or members of
730 arrays because their size is unknown at compile time. They are allowed in
731 structs to facilitate intrinsics returning multiple values. Structs containing
732 scalable vectors cannot be used in loads, stores, allocas, or GEPs.
736 @<GlobalVarName> = [Linkage] [PreemptionSpecifier] [Visibility]
737 [DLLStorageClass] [ThreadLocal]
738 [(unnamed_addr|local_unnamed_addr)] [AddrSpace]
739 [ExternallyInitialized]
740 <global | constant> <Type> [<InitializerConstant>]
741 [, section "name"] [, comdat [($name)]]
742 [, align <Alignment>] (, !name !N)*
744 For example, the following defines a global in a numbered address space
745 with an initializer, section, and alignment:
749 @G = addrspace(5) constant float 1.0, section "foo", align 4
The following example just declares a global variable:
755 @G = external global i32
757 The following example defines a thread-local global with the
758 ``initialexec`` TLS model:
762 @G = thread_local(initialexec) global i32 0, align 4
764 .. _functionstructure:
769 LLVM function definitions consist of the "``define``" keyword, an
770 optional :ref:`linkage type <linkage>`, an optional :ref:`runtime preemption
771 specifier <runtime_preemption_model>`, an optional :ref:`visibility
772 style <visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`,
773 an optional :ref:`calling convention <callingconv>`,
774 an optional ``unnamed_addr`` attribute, a return type, an optional
775 :ref:`parameter attribute <paramattrs>` for the return type, a function
776 name, a (possibly empty) argument list (each with optional :ref:`parameter
777 attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
778 an optional address space, an optional section, an optional alignment,
779 an optional :ref:`comdat <langref_comdats>`,
780 an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`,
781 an optional :ref:`prologue <prologuedata>`,
782 an optional :ref:`personality <personalityfn>`,
783 an optional list of attached :ref:`metadata <metadata>`,
784 an opening curly brace, a list of basic blocks, and a closing curly brace.
786 LLVM function declarations consist of the "``declare``" keyword, an
787 optional :ref:`linkage type <linkage>`, an optional :ref:`visibility style
788 <visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`, an
789 optional :ref:`calling convention <callingconv>`, an optional ``unnamed_addr``
790 or ``local_unnamed_addr`` attribute, an optional address space, a return type,
791 an optional :ref:`parameter attribute <paramattrs>` for the return type, a function name, a possibly
792 empty list of arguments, an optional alignment, an optional :ref:`garbage
793 collector name <gc>`, an optional :ref:`prefix <prefixdata>`, and an optional
794 :ref:`prologue <prologuedata>`.
796 A function definition contains a list of basic blocks, forming the CFG (Control
797 Flow Graph) for the function. Each basic block may optionally start with a label
798 (giving the basic block a symbol table entry), contains a list of instructions,
799 and ends with a :ref:`terminator <terminators>` instruction (such as a branch or
800 function return). If an explicit label name is not provided, a block is assigned
801 an implicit numbered label, using the next value from the same counter as used
802 for unnamed temporaries (:ref:`see above<identifiers>`). For example, if a
803 function entry block does not have an explicit label, it will be assigned label
804 "%0", then the first unnamed temporary in that block will be "%1", etc. If a
805 numeric label is explicitly specified, it must match the numeric label that
806 would be used implicitly.
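
The numbering rule can be sketched with a small function (``@clamp_to_zero``
is a hypothetical name). The entry block is implicitly ``%0``, the comparison
is ``%1``, and the two remaining blocks take the next values of the same
counter:

.. code-block:: llvm

    define i32 @clamp_to_zero(i32 %x) {
      ; Entry block: no explicit label, so it is implicitly %0.
      %1 = icmp slt i32 %x, 0
      br i1 %1, label %2, label %3

    2:                        ; explicit numeric label, matches the implicit one
      ret i32 0

    3:                        ; next value of the same counter
      ret i32 %x
    }

The ``br`` instruction produces no value, so it does not consume a number.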
The first basic block in a function is special in two ways: it is
immediately executed on entrance to the function, and it is not allowed
to have predecessor basic blocks (i.e. there cannot be any branches to
the entry block of a function). Because the block can have no
predecessors, it also cannot have any :ref:`PHI nodes <i_phi>`.
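
One consequence is that a loop cannot use the entry block as its header. A
minimal sketch with hypothetical names, in which the entry block only branches
to a separate header block that holds the PHI node:

.. code-block:: llvm

    define void @count_to(i32 %n) {
    entry:
      ; The entry block has no predecessors and no PHI nodes.
      br label %loop

    loop:
      ; The loop header is a distinct block, so it may have a back edge.
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      %i.next = add i32 %i, 1
      %done = icmp eq i32 %i.next, %n
      br i1 %done, label %exit, label %loop

    exit:
      ret void
    }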
814 LLVM allows an explicit section to be specified for functions. If the
815 target supports it, it will emit functions to the section specified.
816 Additionally, the function can be placed in a COMDAT.
An explicit alignment may be specified for a function. If not present,
or if the alignment is set to zero, the alignment of the function is set
by the target to whatever it finds convenient. If an explicit alignment
is specified, the function is forced to have at least that much
alignment. All alignments must be a power of 2.
824 If the ``unnamed_addr`` attribute is given, the address is known to not
825 be significant and two identical functions can be merged.
827 If the ``local_unnamed_addr`` attribute is given, the address is known to
828 not be significant within the module.
830 If an explicit address space is not given, it will default to the program
831 address space from the :ref:`datalayout string<langref_datalayout>`.
835 define [linkage] [PreemptionSpecifier] [visibility] [DLLStorageClass]
837 <ResultType> @<FunctionName> ([argument list])
838 [(unnamed_addr|local_unnamed_addr)] [AddrSpace] [fn Attrs]
839 [section "name"] [comdat [($name)]] [align N] [gc] [prefix Constant]
840 [prologue Constant] [personality Constant] (!name !N)* { ... }
842 The argument list is a comma separated sequence of arguments where each
843 argument is of the following form:
847 <type> [parameter Attrs] [name]
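
As an illustration of how several of the optional pieces combine, here is a
sketch of a definition. The function name and the section name are
hypothetical:

.. code-block:: llvm

    ; linkage, calling convention, return type, name, argument list,
    ; then unnamed_addr, a function attribute, a section and an alignment.
    define internal fastcc i32 @add_bytes(i8 zeroext %a, i8 zeroext %b)
        unnamed_addr nounwind section ".text.hot" align 16 {
      %a.wide = zext i8 %a to i32
      %b.wide = zext i8 %b to i32
      %sum = add i32 %a.wide, %b.wide
      ret i32 %sum
    }
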
Aliases, unlike functions or variables, don't create any new data. They
are just a new symbol and metadata for an existing position.
858 Aliases have a name and an aliasee that is either a global value or a
861 Aliases may have an optional :ref:`linkage type <linkage>`, an optional
862 :ref:`runtime preemption specifier <runtime_preemption_model>`, an optional
863 :ref:`visibility style <visibility>`, an optional :ref:`DLL storage class
864 <dllstorageclass>` and an optional :ref:`tls model <tls_model>`.
868 @<Name> = [Linkage] [PreemptionSpecifier] [Visibility] [DLLStorageClass] [ThreadLocal] [(unnamed_addr|local_unnamed_addr)] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee>
870 The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``,
871 ``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers
872 might not correctly handle dropping a weak symbol that is aliased.
874 Aliases that are not ``unnamed_addr`` are guaranteed to have the same address as
875 the aliasee expression. ``unnamed_addr`` ones are only guaranteed to point
878 If the ``local_unnamed_addr`` attribute is given, the address is known to
879 not be significant within the module.
881 Since aliases are only a second name, some restrictions apply, of which
882 some can only be checked when producing an object file:
884 * The expression defining the aliasee must be computable at assembly
885 time. Since it is just a name, no relocations can be used.
887 * No alias in the expression can be weak as the possibility of the
888 intermediate alias being overridden cannot be represented in an
891 * No global value in the expression can be a declaration, since that
892 would require a relocation, which is not possible.
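
A minimal sketch of aliases that satisfy these restrictions (all names are
hypothetical). Both aliasees are definitions in the same module:

.. code-block:: llvm

    @storage = global i32 7
    ; A second name for the same variable.
    @storage_alias = alias i32, i32* @storage

    define void @impl() {
      ret void
    }
    ; A weak second name for the same function.
    @impl_v2 = weak alias void (), void ()* @impl
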
IFuncs, like aliases, don't create any new data or functions. They are just a new
symbol that the dynamic linker resolves at runtime by calling a resolver function.

IFuncs have a name and a resolver, which is a function called by the dynamic linker
that returns the address of another function associated with the name.

IFuncs may have an optional :ref:`linkage type <linkage>` and an optional
:ref:`visibility style <visibility>`.
910 @<Name> = [Linkage] [PreemptionSpecifier] [Visibility] ifunc <IFuncTy>, <ResolverTy>* @<Resolver>
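
The following is a sketch only, assuming a target whose dynamic linker
supports ifuncs. The names ``@impl``, ``@impl_generic`` and ``@resolve_impl``
are hypothetical:

.. code-block:: llvm

    define i32 @impl_generic(i32 %x) {
      ret i32 %x
    }

    ; The resolver takes no arguments and returns the address of the
    ; function that calls to @impl should reach.
    define internal i32 (i32)* @resolve_impl() {
      ret i32 (i32)* @impl_generic
    }

    @impl = ifunc i32 (i32), i32 (i32)* ()* @resolve_impl

A real resolver would typically inspect the hardware capabilities before
choosing among several implementations.
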
918 Comdat IR provides access to object file COMDAT/section group functionality
919 which represents interrelated sections.
921 Comdats have a name which represents the COMDAT key and a selection kind to
922 provide input on how the linker deduplicates comdats with the same key in two
923 different object files. A comdat must be included or omitted as a unit.
924 Discarding the whole comdat is allowed but discarding a subset is not.
926 A global object may be a member of at most one comdat. Aliases are placed in the
927 same COMDAT that their aliasee computes to, if any.
931 $<Name> = comdat SelectionKind
933 For selection kinds other than ``nodeduplicate``, only one of the duplicate
934 comdats may be retained by the linker and the members of the remaining comdats
935 must be discarded. The following selection kinds are supported:
The linker may choose any COMDAT key; the choice is arbitrary.
940 The linker may choose any COMDAT key but the sections must contain the
943 The linker will choose the section containing the largest COMDAT key.
945 No deduplication is performed.
947 The linker may choose any COMDAT key but the sections must contain the
950 - XCOFF and Mach-O don't support COMDATs.
951 - COFF supports all selection kinds. Non-``nodeduplicate`` selection kinds need
952 a non-local linkage COMDAT symbol.
953 - ELF supports ``any`` and ``nodeduplicate``.
954 - WebAssembly only supports ``any``.
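
For instance, a sketch of the ``any`` selection kind, which every object
format listed above with COMDAT support accepts (``get_answer`` is a
hypothetical name):

.. code-block:: llvm

    $get_answer = comdat any

    ; If several object files define this comdat, the linker keeps one
    ; copy and discards the members of the others.
    define linkonce_odr i32 @get_answer() comdat($get_answer) {
      ret i32 42
    }
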
956 Here is an example of a COFF COMDAT where a function will only be selected if
957 the COMDAT key's section is the largest:
961 $foo = comdat largest
962 @foo = global i32 2, comdat($foo)
964 define void @bar() comdat($foo) {
968 In a COFF object file, this will create a COMDAT section with selection kind
969 ``IMAGE_COMDAT_SELECT_LARGEST`` containing the contents of the ``@foo`` symbol
970 and another COMDAT section with selection kind
971 ``IMAGE_COMDAT_SELECT_ASSOCIATIVE`` which is associated with the first COMDAT
972 section and contains the contents of the ``@bar`` symbol.
As syntactic sugar, the ``$name`` can be omitted if the name is the same as
980 @foo = global i32 2, comdat
981 @bar = global i32 3, comdat($foo)
983 There are some restrictions on the properties of the global object.
984 It, or an alias to it, must have the same name as the COMDAT group when
986 The contents and size of this object may be used during link-time to determine
987 which COMDAT groups get selected depending on the selection kind.
988 Because the name of the object must match the name of the COMDAT group, the
989 linkage of the global object must not be local; local symbols can get renamed
990 if a collision occurs in the symbol table.
The combined use of COMDATs and section attributes may yield surprising results.
999 @g1 = global i32 42, section "sec", comdat($foo)
1000 @g2 = global i32 42, section "sec", comdat($bar)
1002 From the object file perspective, this requires the creation of two sections
1003 with the same name. This is necessary because both globals belong to different
1004 COMDAT groups and COMDATs, at the object file level, are represented by
Note that certain IR constructs like global variables and functions may
create COMDATs in the object file in addition to any which are specified using
COMDAT IR. This arises when the code generator is configured to emit globals
in individual sections (e.g. when ``-data-sections`` or ``-function-sections``
is supplied to ``llc``).
1013 .. _namedmetadatastructure:
1018 Named metadata is a collection of metadata. :ref:`Metadata
1019 nodes <metadata>` (but not metadata strings) are the only valid
1020 operands for a named metadata.
1022 #. Named metadata are represented as a string of characters with the
1023 metadata prefix. The rules for metadata names are the same as for
1024 identifiers, but quoted names are not allowed. ``"\xx"`` type escapes
1025 are still valid, which allows any character to be part of a name.
1029 ; Some unnamed metadata nodes, which are referenced by the named metadata.
1034 !name = !{!0, !1, !2}
1038 Parameter Attributes
1039 --------------------
1041 The return type and each parameter of a function type may have a set of
1042 *parameter attributes* associated with them. Parameter attributes are
1043 used to communicate additional information about the result or
1044 parameters of a function. Parameter attributes are considered to be part
1045 of the function, not of the function type, so functions with different
1046 parameter attributes can have the same function type.
1048 Parameter attributes are simple keywords that follow the type specified.
1049 If multiple parameter attributes are needed, they are space separated.
1052 .. code-block:: llvm
1054 declare i32 @printf(i8* noalias nocapture, ...)
1055 declare i32 @atoi(i8 zeroext)
1056 declare signext i8 @returns_signed_char()
Note that any attributes for the function itself (``nounwind``,
``readonly``) come immediately after the argument list.
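
To show the placement, here is a sketch of a declaration (``@my_strlen`` is a
hypothetical function) with one parameter attribute and two attributes that
follow the argument list:

.. code-block:: llvm

    ; "nocapture" follows the parameter type; "nounwind readonly" follow
    ; the closing parenthesis of the argument list.
    declare i64 @my_strlen(i8* nocapture) nounwind readonly
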
1061 Currently, only the following parameter attributes are defined:
1064 This indicates to the code generator that the parameter or return
1065 value should be zero-extended to the extent required by the target's
1066 ABI by the caller (for a parameter) or the callee (for a return value).
This indicates to the code generator that the parameter or return
value should be sign-extended to the extent required by the target's
ABI (which is usually 32 bits) by the caller (for a parameter) or
the callee (for a return value).
1073 This indicates that this parameter or return value should be treated
1074 in a special target-dependent fashion while emitting code for
1075 a function call or return (usually, by putting it in a register as
1076 opposed to memory, though some targets use it to distinguish between
1077 two different kinds of registers). Use of this attribute is
1080 This indicates that the pointer parameter should really be passed by
1081 value to the function. The attribute implies that a hidden copy of
1082 the pointee is made between the caller and the callee, so the callee
1083 is unable to modify the value in the caller. This attribute is only
1084 valid on LLVM pointer arguments. It is generally used to pass
1085 structs and arrays by value, but is also valid on pointers to
1086 scalars. The copy is considered to belong to the caller not the
1087 callee (for example, ``readonly`` functions should not write to
1088 ``byval`` parameters). This is not a valid attribute for return
1091 The byval type argument indicates the in-memory value type, and
1092 must be the same as the pointee type of the argument.
1094 The byval attribute also supports specifying an alignment with the
1095 align attribute. It indicates the alignment of the stack slot to
1096 form and the known alignment of the pointer specified to the call
1097 site. If the alignment is not specified, then the code generator
1098 makes a target-specific assumption.
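
A minimal sketch of passing a struct by value, assuming a hypothetical struct
type ``%struct.Pair`` and a hypothetical callee ``@consume``:

.. code-block:: llvm

    %struct.Pair = type { i32, i32 }

    declare void @consume(%struct.Pair* byval(%struct.Pair) align 4)

    define void @caller(%struct.Pair* %p) {
      ; The callee receives a hidden copy of *%p, so it cannot modify
      ; the caller's object through this argument.
      call void @consume(%struct.Pair* byval(%struct.Pair) align 4 %p)
      ret void
    }
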
1104 The ``byref`` argument attribute allows specifying the pointee
1105 memory type of an argument. This is similar to ``byval``, but does
1106 not imply a copy is made anywhere, or that the argument is passed
1107 on the stack. This implies the pointer is dereferenceable up to
1108 the storage size of the type.
It is not generally permissible to introduce a write to a
``byref`` pointer. The pointer may have any address space and may
1114 This is not a valid attribute for return values.
The alignment for a ``byref`` parameter can be explicitly
specified by combining it with the ``align`` attribute, similar to
``byval``. If the alignment is not specified, then the code generator
makes a target-specific assumption.
1121 This is intended for representing ABI constraints, and is not
1122 intended to be inferred for optimization use.
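
For comparison with ``byval``, a sketch using a hypothetical struct type and
function. No copy is implied, and the function only reads through the
pointer:

.. code-block:: llvm

    %struct.Args = type { i32, float }

    define i32 @read_first(%struct.Args* byref(%struct.Args) align 4 %args) {
      ; The pointer is dereferenceable up to the storage size of
      ; %struct.Args, so this load is in bounds.
      %p = getelementptr inbounds %struct.Args, %struct.Args* %args, i32 0, i32 0
      %v = load i32, i32* %p, align 4
      ret i32 %v
    }
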
1124 .. _attr_preallocated:
1126 ``preallocated(<ty>)``
1127 This indicates that the pointer parameter should really be passed by
1128 value to the function, and that the pointer parameter's pointee has
1129 already been initialized before the call instruction. This attribute
1130 is only valid on LLVM pointer arguments. The argument must be the value
1131 returned by the appropriate
1132 :ref:`llvm.call.preallocated.arg<int_call_preallocated_arg>` on non
1133 ``musttail`` calls, or the corresponding caller parameter in ``musttail``
1134 calls, although it is ignored during codegen.
1136 A non ``musttail`` function call with a ``preallocated`` attribute in
1137 any parameter must have a ``"preallocated"`` operand bundle. A ``musttail``
1138 function call cannot have a ``"preallocated"`` operand bundle.
1140 The preallocated attribute requires a type argument, which must be
1141 the same as the pointee type of the argument.
1143 The preallocated attribute also supports specifying an alignment with the
1144 align attribute. It indicates the alignment of the stack slot to
1145 form and the known alignment of the pointer specified to the call
1146 site. If the alignment is not specified, then the code generator
1147 makes a target-specific assumption.
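
The call sequence for a non-``musttail`` call can be sketched as follows. The
struct type and the callee ``@take`` are hypothetical, and the intrinsic
declarations are written from memory of the preallocated intrinsics, so they
should be checked against the intrinsic documentation:

.. code-block:: llvm

    %struct.S = type { i32 }

    declare token @llvm.call.preallocated.setup(i32)
    declare i8* @llvm.call.preallocated.arg(token, i32)
    declare void @take(%struct.S* preallocated(%struct.S))

    define void @caller() {
      ; Set up storage for one preallocated argument.
      %t = call token @llvm.call.preallocated.setup(i32 1)
      %raw = call i8* @llvm.call.preallocated.arg(token %t, i32 0) preallocated(%struct.S)
      %arg = bitcast i8* %raw to %struct.S*
      ; Initialize the pointee before the call.
      %field = getelementptr inbounds %struct.S, %struct.S* %arg, i32 0, i32 0
      store i32 42, i32* %field
      ; The call carries the "preallocated" operand bundle.
      call void @take(%struct.S* preallocated(%struct.S) %arg) ["preallocated"(token %t)]
      ret void
    }
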
1153 The ``inalloca`` argument attribute allows the caller to take the
1154 address of outgoing stack arguments. An ``inalloca`` argument must
1155 be a pointer to stack memory produced by an ``alloca`` instruction.
1156 The alloca, or argument allocation, must also be tagged with the
1157 inalloca keyword. Only the last argument may have the ``inalloca``
1158 attribute, and that argument is guaranteed to be passed in memory.
1160 An argument allocation may be used by a call at most once because
1161 the call may deallocate it. The ``inalloca`` attribute cannot be
1162 used in conjunction with other attributes that affect argument
1163 storage, like ``inreg``, ``nest``, ``sret``, or ``byval``. The
1164 ``inalloca`` attribute also disables LLVM's implicit lowering of
1165 large aggregate return values, which means that frontend authors
1166 must lower them with ``sret`` pointers.
1168 When the call site is reached, the argument allocation must have
1169 been the most recent stack allocation that is still live, or the
1170 behavior is undefined. It is possible to allocate additional stack
1171 space after an argument allocation and before its call site, but it
1172 must be cleared off with :ref:`llvm.stackrestore
1173 <int_stackrestore>`.
1175 The inalloca attribute requires a type argument, which must be the
1176 same as the pointee type of the argument.
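
A minimal sketch of an ``inalloca`` call, using a hypothetical argument struct
and callee. The argument allocation is the most recent live stack allocation
when the call is reached and is used by exactly one call:

.. code-block:: llvm

    %struct.Args = type { i32, i32 }

    declare void @callee(%struct.Args* inalloca(%struct.Args))

    define void @caller() {
      ; The allocation itself is tagged with the inalloca keyword.
      %args = alloca inalloca %struct.Args
      %a = getelementptr inbounds %struct.Args, %struct.Args* %args, i32 0, i32 0
      store i32 1, i32* %a
      %b = getelementptr inbounds %struct.Args, %struct.Args* %args, i32 0, i32 1
      store i32 2, i32* %b
      call void @callee(%struct.Args* inalloca(%struct.Args) %args)
      ret void
    }
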
1178 See :doc:`InAlloca` for more information on how to use this
1182 This indicates that the pointer parameter specifies the address of a
1183 structure that is the return value of the function in the source
1184 program. This pointer must be guaranteed by the caller to be valid:
1185 loads and stores to the structure may be assumed by the callee not
1186 to trap and to be properly aligned. This is not a valid attribute
1189 The sret type argument specifies the in memory type, which must be
1190 the same as the pointee type of the argument.
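
As a sketch, a frontend might lower a function returning a large struct like
this (``%struct.Big`` and ``@make_big`` are hypothetical):

.. code-block:: llvm

    %struct.Big = type { [4 x i64] }

    declare void @make_big(%struct.Big* sret(%struct.Big) align 8)

    define i64 @use() {
      ; The caller provides valid, suitably aligned storage for the result.
      %tmp = alloca %struct.Big, align 8
      call void @make_big(%struct.Big* sret(%struct.Big) align 8 %tmp)
      %p = getelementptr inbounds %struct.Big, %struct.Big* %tmp, i32 0, i32 0, i64 0
      %v = load i64, i64* %p, align 8
      ret i64 %v
    }
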
1192 .. _attr_elementtype:
1194 ``elementtype(<ty>)``
The ``elementtype`` argument attribute can be used to specify a pointer
element type in a way that is compatible with `opaque pointers
<OpaquePointers.html>`__.
1200 The ``elementtype`` attribute by itself does not carry any specific
1201 semantics. However, certain intrinsics may require this attribute to be
1202 present and assign it particular semantics. This will be documented on
1203 individual intrinsics.
1205 The attribute may only be applied to pointer typed arguments of intrinsic
1206 calls. It cannot be applied to non-intrinsic calls, and cannot be applied
1207 to parameters on function declarations. For non-opaque pointers, the type
1208 passed to ``elementtype`` must match the pointer element type.
1212 ``align <n>`` or ``align(<n>)``
This indicates that the pointer value has the specified alignment.
If the pointer value does not have the specified alignment, a
:ref:`poison value <poisonvalues>` is returned or passed instead. To make
a misaligned pointer undefined behavior rather than poison, combine the
``align`` attribute with the ``noundef`` attribute. Note
that ``align 1`` has no effect on non-byval, non-preallocated arguments.
1220 Note that this attribute has additional semantics when combined with the
1221 ``byval`` or ``preallocated`` attribute, which are documented there.
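
For example, a sketch of a function (with a hypothetical name) that relies on
its argument being 16-byte aligned and combines ``align`` with ``noundef``:

.. code-block:: llvm

    define i32 @load_aligned(i32* noundef align 16 %p) {
      ; With noundef, passing a misaligned pointer is undefined behavior
      ; at the call, not merely a poison argument.
      %v = load i32, i32* %p, align 16
      ret i32 %v
    }
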
1226 This indicates that memory locations accessed via pointer values
1227 :ref:`based <pointeraliasing>` on the argument or return value are not also
1228 accessed, during the execution of the function, via pointer values not
1229 *based* on the argument or return value. This guarantee only holds for
1230 memory locations that are *modified*, by any means, during the execution of
1231 the function. The attribute on a return value also has additional semantics
1232 described below. The caller shares the responsibility with the callee for
1233 ensuring that these requirements are met. For further details, please see
1234 the discussion of the NoAlias response in :ref:`alias analysis <Must, May,
1237 Note that this definition of ``noalias`` is intentionally similar
1238 to the definition of ``restrict`` in C99 for function arguments.
1240 For function return values, C99's ``restrict`` is not meaningful,
1241 while LLVM's ``noalias`` is. Furthermore, the semantics of the ``noalias``
1242 attribute on return values are stronger than the semantics of the attribute
1243 when used on function arguments. On function return values, the ``noalias``
1244 attribute indicates that the function acts like a system memory allocation
1245 function, returning a pointer to allocated storage disjoint from the
1246 storage for any other object accessible to the caller.
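
Both uses can be sketched with hypothetical functions. ``@my_alloc`` behaves
like an allocator, and ``@add_into`` promises that the memory it modifies
through ``%dst`` is not also accessed through ``%src`` during the call:

.. code-block:: llvm

    ; Return value: the result points to storage disjoint from any other
    ; object accessible to the caller.
    declare noalias i8* @my_alloc(i64)

    ; Arguments: similar in spirit to C99 "restrict" parameters.
    define void @add_into(i32* noalias %dst, i32* noalias %src) {
      %s = load i32, i32* %src
      %d = load i32, i32* %dst
      %r = add i32 %d, %s
      store i32 %r, i32* %dst
      ret void
    }
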
This indicates that the callee does not :ref:`capture <pointercapture>` the
pointer. This is not a valid attribute for return values.
This attribute applies only to the particular copy of the pointer passed in
this argument. A caller could pass two copies of the same pointer with one
being annotated ``nocapture`` and the other not, and the callee could validly
capture through the non-annotated parameter.
1258 .. code-block:: llvm
1260 define void @f(i8* nocapture %a, i8* %b) {
1264 call void @f(i8* @glb, i8* @glb) ; well-defined
This indicates that the callee does not free the pointer argument. This is not
a valid attribute for return values.
1273 This indicates that the pointer parameter can be excised using the
1274 :ref:`trampoline intrinsics <int_trampoline>`. This is not a valid
1275 attribute for return values and can only be applied to one parameter.
1278 This indicates that the function always returns the argument as its return
1279 value. This is a hint to the optimizer and code generator used when
1280 generating the caller, allowing value propagation, tail call optimization,
1281 and omission of register saves and restores in some cases; it is not
1282 checked or enforced when generating the callee. The parameter and the
1283 function return type must be valid operands for the
1284 :ref:`bitcast instruction <i_bitcast>`. This is not a valid attribute for
1285 return values and can only be applied to one parameter.
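
A short sketch with a hypothetical ``memset``-like function whose first
argument is marked ``returned``:

.. code-block:: llvm

    declare i8* @my_fill(i8* returned, i8, i64)

    define i8* @caller(i8* %buf) {
      %r = call i8* @my_fill(i8* %buf, i8 0, i64 16)
      ; When compiling this caller, %r may be treated as equal to %buf.
      ret i8* %r
    }
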
This indicates that the parameter or return pointer is not null. This
attribute may only be applied to pointer typed parameters. This is not
checked or enforced by LLVM; if the parameter or return pointer is null, a
:ref:`poison value <poisonvalues>` is returned or passed instead.
To make a null pointer undefined behavior rather than poison, combine the
``nonnull`` attribute with the ``noundef`` attribute.
1295 ``dereferenceable(<n>)``
1296 This indicates that the parameter or return pointer is dereferenceable. This
1297 attribute may only be applied to pointer typed parameters. A pointer that
1298 is dereferenceable can be loaded from speculatively without a risk of
1299 trapping. The number of bytes known to be dereferenceable must be provided
1300 in parentheses. It is legal for the number of bytes to be less than the
1301 size of the pointee type. The ``nonnull`` attribute does not imply
1302 dereferenceability (consider a pointer to one element past the end of an
1303 array), however ``dereferenceable(<n>)`` does imply ``nonnull`` in
1304 ``addrspace(0)`` (which is the default address space), except if the
1305 ``null_pointer_is_valid`` function attribute is present.
1306 ``n`` should be a positive number. The pointer should be well defined,
1307 otherwise it is undefined behavior. This means ``dereferenceable(<n>)``
1308 implies ``noundef``.
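
As an illustration, a sketch of a function (with a hypothetical name) whose
pointer argument is known to be dereferenceable for four bytes:

.. code-block:: llvm

    define i32 @first_or_zero(i1 %use, i32* dereferenceable(4) %p) {
      ; The load executes even when %use is false. This cannot trap,
      ; because four bytes at %p are known to be dereferenceable.
      %v = load i32, i32* %p
      %r = select i1 %use, i32 %v, i32 0
      ret i32 %r
    }
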
1310 ``dereferenceable_or_null(<n>)``
1311 This indicates that the parameter or return value isn't both
1312 non-null and non-dereferenceable (up to ``<n>`` bytes) at the same
1313 time. All non-null pointers tagged with
1314 ``dereferenceable_or_null(<n>)`` are ``dereferenceable(<n>)``.
1315 For address space 0 ``dereferenceable_or_null(<n>)`` implies that
1316 a pointer is exactly one of ``dereferenceable(<n>)`` or ``null``,
1317 and in other address spaces ``dereferenceable_or_null(<n>)``
1318 implies that a pointer is at least one of ``dereferenceable(<n>)``
1319 or ``null`` (i.e. it may be both ``null`` and
1320 ``dereferenceable(<n>)``). This attribute may only be applied to
1321 pointer typed parameters.
1324 This indicates that the parameter is the self/context parameter. This is not
1325 a valid attribute for return values and can only be applied to one
1329 This indicates that the parameter is the asynchronous context parameter and
1330 triggers the creation of a target-specific extended frame record to store
1331 this pointer. This is not a valid attribute for return values and can only
1332 be applied to one parameter.
This attribute is motivated by the need to model and optimize Swift error
handling. It can be applied to a parameter with pointer to pointer type or a
pointer-sized alloca. At the call site, the actual argument that corresponds
to a ``swifterror`` parameter has to come from a ``swifterror`` alloca or
the ``swifterror`` parameter of the caller. A ``swifterror`` value (either
the parameter or the alloca) can only be loaded from, stored to, or used as
a ``swifterror`` argument. This is not a valid attribute for return values
and can only be applied to one parameter.
1344 These constraints allow the calling convention to optimize access to
1345 ``swifterror`` variables by associating them with a specific register at
1346 call boundaries rather than placing them in memory. Since this does change
1347 the calling convention, a function which uses the ``swifterror`` attribute
1348 on a parameter is not ABI-compatible with one which does not.
1350 These constraints also allow LLVM to assume that a ``swifterror`` argument
1351 does not alias any other memory visible within a function and that a
1352 ``swifterror`` alloca passed as an argument does not escape.
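
A sketch of the permitted uses, with a hypothetical opaque error type and
callee. The ``swifterror`` alloca is only stored to, loaded from, and passed
as a ``swifterror`` argument:

.. code-block:: llvm

    %swift.error = type opaque

    declare swiftcc void @may_fail(%swift.error** swifterror)

    define swiftcc i1 @caller() {
      %err = alloca swifterror %swift.error*
      store %swift.error* null, %swift.error** %err
      call swiftcc void @may_fail(%swift.error** swifterror %err)
      ; Check whether the callee reported an error.
      %e = load %swift.error*, %swift.error** %err
      %failed = icmp ne %swift.error* %e, null
      ret i1 %failed
    }
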
1355 This indicates the parameter is required to be an immediate
1356 value. This must be a trivial immediate integer or floating-point
1357 constant. Undef or constant expressions are not valid. This is
1358 only valid on intrinsic declarations and cannot be applied to a
1359 call site or arbitrary function.
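
For example, the final ``i1`` (volatile) parameter of the ``llvm.memcpy``
intrinsic is declared ``immarg``. The sketch below assumes the typed-pointer
spelling of that intrinsic and uses a hypothetical caller:

.. code-block:: llvm

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly,
                                            i8* noalias nocapture readonly,
                                            i64, i1 immarg)

    define void @copy16(i8* %dst, i8* %src) {
      ; The last operand must be a literal constant such as "false".
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i1 false)
      ret void
    }
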
1362 This attribute applies to parameters and return values. If the value
1363 representation contains any undefined or poison bits, the behavior is
1364 undefined. Note that this does not refer to padding introduced by the
1365 type's storage representation.
1368 This indicates the alignment that should be considered by the backend when
1369 assigning this parameter to a stack slot during calling convention
1370 lowering. The enforcement of the specified alignment is target-dependent,
1371 as target-specific calling convention rules may override this value. This
1372 attribute serves the purpose of carrying language specific alignment
1373 information that is not mapped to base types in the backend (for example,
1374 over-alignment specification through language attributes).
1378 Garbage Collector Strategy Names
1379 --------------------------------
1381 Each function may specify a garbage collector strategy name, which is simply a
1384 .. code-block:: llvm
1386 define void @f() gc "name" { ... }
The supported values of *name* include those :ref:`built in to LLVM
<builtin-gc-strategies>` and any provided by loaded plugins. Specifying a GC
strategy will cause the compiler to alter its output in order to support the
named garbage collection algorithm. Note that LLVM itself does not contain a
garbage collector; this functionality is restricted to generating machine code
which can interoperate with a collector provided externally.
1400 Prefix data is data associated with a function which the code
1401 generator will emit immediately before the function's entrypoint.
1402 The purpose of this feature is to allow frontends to associate
1403 language-specific runtime metadata with specific functions and make it
1404 available through the function pointer while still allowing the
1405 function pointer to be called.
1407 To access the data for a given function, a program may bitcast the
1408 function pointer to a pointer to the constant's type and dereference
1409 index -1. This implies that the IR symbol points just past the end of
1410 the prefix data. For instance, take the example of a function annotated
1411 with a single ``i32``,
1413 .. code-block:: llvm
1415 define void @f() prefix i32 123 { ... }
1417 The prefix data can be referenced as,
1419 .. code-block:: llvm
    %0 = bitcast void ()* @f to i32*
    %a = getelementptr inbounds i32, i32* %0, i32 -1
    %b = load i32, i32* %a
1425 Prefix data is laid out as if it were an initializer for a global variable
1426 of the prefix data's type. The function will be placed such that the
1427 beginning of the prefix data is aligned. This means that if the size
1428 of the prefix data is not a multiple of the alignment size, the
1429 function's entrypoint will not be aligned. If alignment of the
function's entrypoint is desired, padding must be added to the prefix
data.
1433 A function may have prefix data but no body. This has similar semantics
1434 to the ``available_externally`` linkage in that the data may be used by the
1435 optimizers but will not be emitted in the object file.
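The padding rule for prefix data can be sketched as follows. This is an
illustrative example only: the function name ``@g``, the 8-byte alignment, and
the choice to place the padding in front of the ``i32`` (so that the value is
still found at index -1) are assumptions made for the sketch.

.. code-block:: llvm

   ; 4 bytes of padding + 4 bytes of data = 8 bytes of prefix data, so an
   ; 8-byte-aligned prefix leaves the entry point 8-byte aligned as well.
   define void @g() align 8 prefix <{ [4 x i8], i32 }> <{ [4 x i8] zeroinitializer, i32 123 }> {
     ret void
   }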
Prologue Data
-------------
The ``prologue`` attribute allows arbitrary code (encoded as bytes) to
1443 be inserted prior to the function body. This can be used for enabling
1444 function hot-patching and instrumentation.
1446 To maintain the semantics of ordinary function calls, the prologue data must
1447 have a particular format. Specifically, it must begin with a sequence of
1448 bytes which decode to a sequence of machine instructions, valid for the
1449 module's target, which transfer control to the point immediately succeeding
1450 the prologue data, without performing any other visible action. This allows
1451 the inliner and other passes to reason about the semantics of the function
1452 definition without needing to reason about the prologue data. Obviously this
1453 makes the format of the prologue data highly target dependent.
1455 A trivial example of valid prologue data for the x86 architecture is ``i8 144``,
1456 which encodes the ``nop`` instruction:
1458 .. code-block:: text
1460 define void @f() prologue i8 144 { ... }
1462 Generally prologue data can be formed by encoding a relative branch instruction
1463 which skips the metadata, as in this example of valid prologue data for the
1464 x86_64 architecture, where the first two bytes encode ``jmp .+10``:
1466 .. code-block:: text
1468 %0 = type <{ i8, i8, i8* }>
1470 define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... }
1472 A function may have prologue data but no body. This has similar semantics
1473 to the ``available_externally`` linkage in that the data may be used by the
1474 optimizers but will not be emitted in the object file.
1478 Personality Function
1479 --------------------
1481 The ``personality`` attribute permits functions to specify what function
1482 to use for exception handling.
Attribute Groups
----------------
Attribute groups are groups of attributes that are referenced by objects within
the IR. They are important for keeping ``.ll`` files readable, because a lot of
functions will use the same set of attributes. In the degenerate case of a
1492 ``.ll`` file that corresponds to a single ``.c`` file, the single attribute
1493 group will capture the important command line flags used to build that file.
1495 An attribute group is a module-level object. To use an attribute group, an
1496 object references the attribute group's ID (e.g. ``#37``). An object may refer
1497 to more than one attribute group. In that situation, the attributes from the
1498 different groups are merged.
1500 Here is an example of attribute groups for a function that should always be
1501 inlined, has a stack alignment of 4, and which shouldn't use SSE instructions:
1503 .. code-block:: llvm
1505 ; Target-independent attributes:
1506 attributes #0 = { alwaysinline alignstack=4 }
1508 ; Target-dependent attributes:
1509 attributes #1 = { "no-sse" }
1511 ; Function @f has attributes: alwaysinline, alignstack=4, and "no-sse".
1512 define void @f() #0 #1 { ... }
Function Attributes
-------------------
Function attributes are set to communicate additional information about
1520 a function. Function attributes are considered to be part of the
1521 function, not of the function type, so functions with different function
1522 attributes can have the same function type.
1524 Function attributes are simple keywords that follow the type specified.
If multiple attributes are needed, they are space separated. For
example:
1528 .. code-block:: llvm
1530 define void @f() noinline { ... }
1531 define void @f() alwaysinline { ... }
1532 define void @f() alwaysinline optsize { ... }
1533 define void @f() optsize { ... }
``alignstack(<n>)``
This attribute indicates that, when emitting the prologue and
epilogue, the backend should forcibly align the stack pointer.
Specify the desired alignment, which must be a power of two, in
parentheses.
1540 ``allocsize(<EltSizeParam>[, <NumEltsParam>])``
1541 This attribute indicates that the annotated function will always return at
1542 least a given number of bytes (or null). Its arguments are zero-indexed
1543 parameter numbers; if one argument is provided, then it's assumed that at
1544 least ``CallSite.Args[EltSizeParam]`` bytes will be available at the
1545 returned pointer. If two are provided, then it's assumed that
1546 ``CallSite.Args[EltSizeParam] * CallSite.Args[NumEltsParam]`` bytes are
1547 available. The referenced parameters must be integer types. No assumptions
1548 are made about the contents of the returned block of memory.
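For illustration, the declarations below show both forms of ``allocsize``. The
allocator names are hypothetical stand-ins for ``malloc``-style and
``calloc``-style functions.

.. code-block:: llvm

   ; One argument: the returned pointer refers to at least as many bytes as
   ; the value passed for parameter 0 (or it is null).
   declare i8* @my_alloc(i64) allocsize(0)

   ; Two arguments: at least (parameter 0 * parameter 1) bytes are available.
   declare i8* @my_array_alloc(i64, i64) allocsize(0, 1)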
``alwaysinline``
This attribute indicates that the inliner should attempt to inline
1551 this function into callers whenever possible, ignoring any active
1552 inlining size threshold for this caller.
``builtin``
This indicates that the callee function at a call site should be
recognized as a built-in function, even though the function's declaration
uses the ``nobuiltin`` attribute. This is only valid at call sites for
direct calls to functions that are declared with the ``nobuiltin``
attribute.
``cold``
This attribute indicates that this function is rarely called. When
computing edge weights, basic blocks post-dominated by a cold
function call are also considered to be cold and are thus given low
weight.
``convergent``
In some parallel execution models, there exist operations that cannot be
1566 made control-dependent on any additional values. We call such operations
1567 ``convergent``, and mark them with this attribute.
1569 The ``convergent`` attribute may appear on functions or call/invoke
1570 instructions. When it appears on a function, it indicates that calls to
1571 this function should not be made control-dependent on additional values.
1572 For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so
calls to this intrinsic cannot be made control-dependent on additional
values.
1576 When it appears on a call/invoke, the ``convergent`` attribute indicates
1577 that we should treat the call as though we're calling a convergent
1578 function. This is particularly useful on indirect calls; without this we
1579 may treat such calls as though the target is non-convergent.
1581 The optimizer may remove the ``convergent`` attribute on functions when it
1582 can prove that the function does not execute any convergent operations.
1583 Similarly, the optimizer may remove ``convergent`` on calls/invokes when it
1584 can prove that the call/invoke cannot call a convergent function.
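A minimal sketch of both placements of ``convergent`` follows. The names
``@my_barrier`` and ``@kernel`` are hypothetical and stand in for a
target-specific barrier and a function that uses it.

.. code-block:: llvm

   ; Hypothetical barrier-like function: calls to it must not be made
   ; control-dependent on additional values.
   declare void @my_barrier() convergent

   define void @kernel(void ()* %fp) {
     call void @my_barrier()
     ; The callee is unknown here, so the call site itself is marked and is
     ; treated as a call to a convergent function.
     call void %fp() convergent
     ret void
   }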
1585 ``disable_sanitizer_instrumentation``
1586 When instrumenting code with sanitizers, it can be important to skip certain
1587 functions to ensure no instrumentation is applied to them.
This attribute is not always equivalent to omitting the ``sanitize_<name>``
attributes: depending on the specific sanitizer, code can be inserted into
functions regardless of the ``sanitize_<name>`` attribute to prevent false
positive reports.

``disable_sanitizer_instrumentation`` disables all kinds of instrumentation,
taking precedence over the ``sanitize_<name>`` attributes and other compiler
flags.
1597 ``"dontcall-error"``
1598 This attribute denotes that an error diagnostic should be emitted when a
1599 call of a function with this attribute is not eliminated via optimization.
1600 Front ends can provide optional ``srcloc`` metadata nodes on call sites of
1601 such callees to attach information about where in the source language such a
1602 call came from. A string value can be provided as a note.
``"dontcall-warn"``
This attribute denotes that a warning diagnostic should be emitted when a
1605 call of a function with this attribute is not eliminated via optimization.
1606 Front ends can provide optional ``srcloc`` metadata nodes on call sites of
1607 such callees to attach information about where in the source language such a
1608 call came from. A string value can be provided as a note.
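As a sketch, the two diagnostic attributes might be attached to declarations
as shown below. The function names and note strings are invented for
illustration.

.. code-block:: llvm

   ; Any call to this function that survives optimization produces an error.
   declare void @must_not_be_called() "dontcall-error"="this call must be optimized away"

   ; Any call to this function that survives optimization produces a warning.
   declare void @should_not_be_called() "dontcall-warn"="prefer the replacement API"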
``"frame-pointer"``
This attribute tells the code generator whether the function
1611 should keep the frame pointer. The code generator may emit the frame pointer
1612 even if this attribute says the frame pointer can be eliminated.
1613 The allowed string values are:
1615 * ``"none"`` (default) - the frame pointer can be eliminated.
* ``"non-leaf"`` - the frame pointer should be kept if the function calls
  other functions.
1618 * ``"all"`` - the frame pointer should be kept.
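For example, an attribute group can carry one of the string values listed
above. The group number and the function name ``@traced`` are arbitrary
choices for this sketch.

.. code-block:: llvm

   ; Keep the frame pointer in every function that references this group.
   attributes #0 = { "frame-pointer"="all" }

   define void @traced() #0 {
     ret void
   }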
``hot``
This attribute indicates that this function is a hot spot of the program
execution. The function will be optimized more aggressively and will be
placed into a special subsection of the text section to improve locality.

When profile feedback is enabled, this attribute takes precedence over
the profile information. By marking a function ``hot``, users can work
around the cases where the training input does not have good coverage
of all the hot functions.
1628 ``inaccessiblememonly``
1629 This attribute indicates that the function may only access memory that
1630 is not accessible by the module being compiled. This is a weaker form
1631 of ``readnone``. If the function reads or writes other memory, the
1632 behavior is undefined.
1633 ``inaccessiblemem_or_argmemonly``
1634 This attribute indicates that the function may only access memory that is
1635 either not accessible by the module being compiled, or is pointed to
1636 by its pointer arguments. This is a weaker form of ``argmemonly``. If the
1637 function reads or writes other memory, the behavior is undefined.
``inlinehint``
This attribute indicates that the source code contained a hint that
inlining this function is desirable (such as the "inline" keyword in
C/C++). It is just a hint; it imposes no requirements on the
inliner.
``jumptable``
This attribute indicates that the function should be added to a
1645 jump-instruction table at code-generation time, and that all address-taken
1646 references to this function should be replaced with a reference to the
1647 appropriate jump-instruction-table function pointer. Note that this creates
1648 a new pointer for the original function, which means that code that depends
1649 on function-pointer identity can break. So, any function annotated with
1650 ``jumptable`` must also be ``unnamed_addr``.
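The pairing requirement can be illustrated with a short sketch; the function
name ``@handler`` is hypothetical.

.. code-block:: llvm

   ; jumptable is only valid together with unnamed_addr, because the
   ; function's address is replaced by a jump-table entry.
   define void @handler() unnamed_addr jumptable {
     ret void
   }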
``minsize``
This attribute suggests that optimization passes and code generator
1653 passes make choices that keep the code size of this function as small
1654 as possible and perform optimizations that may sacrifice runtime
1655 performance in order to minimize the size of the generated code.
``naked``
This attribute disables prologue / epilogue emission for the
1658 function. This can have very system-specific consequences.
1659 ``"no-inline-line-tables"``
1660 When this attribute is set to true, the inliner discards source locations
1661 when inlining code and instead uses the source location of the call site.
1662 Breakpoints set on code that was inlined into the current function will
1663 not fire during the execution of the inlined call sites. If the debugger
1664 stops inside an inlined call site, it will appear to be stopped at the
1665 outermost inlined call site.
``"no-jump-tables"``
When this attribute is set to true, the jump tables and lookup tables that
1668 can be generated from a switch case lowering are disabled.
``nobuiltin``
This indicates that the callee function at a call site is not recognized as
1671 a built-in function. LLVM will retain the original call and not replace it
1672 with equivalent code based on the semantics of the built-in function, unless
1673 the call site uses the ``builtin`` attribute. This is valid at call sites
1674 and on function declarations and definitions.
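The interaction between ``nobuiltin`` and ``builtin`` might look like the
following sketch. Declaring ``@malloc`` as ``nobuiltin`` and the wrapper name
``@f`` are assumptions made for illustration.

.. code-block:: llvm

   declare i8* @malloc(i64) nobuiltin

   define i8* @f(i64 %n) {
     ; Not recognized as the built-in malloc; the call is kept as written.
     %a = call i8* @malloc(i64 %n)
     ; The call-site attribute re-enables built-in treatment for this call only.
     %b = call i8* @malloc(i64 %n) builtin
     ret i8* %b
   }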
``noduplicate``
This attribute indicates that calls to the function cannot be
1677 duplicated. A call to a ``noduplicate`` function may be moved
1678 within its parent function, but may not be duplicated within
1679 its parent function.
1681 A function containing a ``noduplicate`` call may still
1682 be an inlining candidate, provided that the call is not
1683 duplicated by inlining. That implies that the function has
1684 internal linkage and only has one call site, so the original
1685 call is dead after inlining.
``nofree``
This function attribute indicates that the function does not, directly or
1688 transitively, call a memory-deallocation function (``free``, for example)
1689 on a memory allocation which existed before the call.
1691 As a result, uncaptured pointers that are known to be dereferenceable
1692 prior to a call to a function with the ``nofree`` attribute are still
1693 known to be dereferenceable after the call. The capturing condition is
1694 necessary in environments where the function might communicate the
1695 pointer to another thread which then deallocates the memory. Alternatively,
1696 ``nosync`` would ensure such communication cannot happen and even captured
1697 pointers cannot be freed by the function.
1699 A ``nofree`` function is explicitly allowed to free memory which it
1700 allocated or (if not ``nosync``) arrange for another thread to free
memory on its behalf. As a result, perhaps surprisingly, a ``nofree``
1702 function can return a pointer to a previously deallocated memory object.
``noimplicitfloat``
Disallows implicit floating-point code. This inhibits optimizations that
1705 use floating-point code and floating-point/SIMD/vector registers for
1706 operations that are not nominally floating-point. LLVM instructions that
1707 perform floating-point operations or require access to floating-point
1708 registers may still cause floating-point code to be generated.
``noinline``
This attribute indicates that the inliner should never inline this
1711 function in any situation. This attribute may not be used together
1712 with the ``alwaysinline`` attribute.
``nomerge``
This attribute indicates that calls to this function should never be merged
during optimization. For example, it will prevent tail merging otherwise
identical code sequences that raise an exception or terminate the program.
Tail merging normally reduces the precision of source location information,
making stack traces less useful for debugging. This attribute gives the
user control over the tradeoff between code size and debug information
precision.
``nonlazybind``
This attribute suppresses lazy symbol binding for the function. This
1723 may make calls to the function faster, at the cost of extra program
1724 startup time if the function is not called during program startup.
``noprofile``
This function attribute prevents instrumentation based profiling, used for
coverage or profile based optimization, from being added to a function.
``noredzone``
This attribute indicates that the code generator should not use a
1731 red zone, even if the target-specific ABI normally permits it.
1732 ``indirect-tls-seg-refs``
1733 This attribute indicates that the code generator should not use
1734 direct TLS access through segment registers, even if the
1735 target-specific ABI normally permits it.
``noreturn``
This function attribute indicates that the function never returns
normally, that is, through a return instruction. This produces undefined
behavior at runtime if the function ever does dynamically return. Annotated
functions may still raise an exception, i.e., ``nounwind`` is not implied.
``norecurse``
This function attribute indicates that the function does not call itself
1743 either directly or indirectly down any possible call path. This produces
1744 undefined behavior at runtime if the function ever does recurse.
``willreturn``
This function attribute indicates that a call of this function will
either exhibit undefined behavior or come back and continue execution
at a point in the existing call stack that includes the current invocation.
Annotated functions may still raise an exception, i.e., ``nounwind`` is not implied.
1750 If an invocation of an annotated function does not return control back
1751 to a point in the call stack, the behavior is undefined.
``nosync``
This function attribute indicates that the function does not communicate
(synchronize) with another thread through memory or other well-defined means.
Synchronization is considered possible in the presence of ``atomic`` accesses
that enforce an order (that is, accesses other than "unordered" and
"monotonic"), ``volatile`` accesses, and ``convergent`` function calls. Note
that ``convergent`` function calls make non-memory communication possible
(e.g., cross-lane operations), which is also considered synchronization.
However, ``convergent`` does not contradict ``nosync``.
1760 If an annotated function does ever synchronize with another thread,
1761 the behavior is undefined.
``nounwind``
This function attribute indicates that the function never raises an
1764 exception. If the function does raise an exception, its runtime
1765 behavior is undefined. However, functions marked nounwind may still
1766 trap or generate asynchronous exceptions. Exception handling schemes
1767 that are recognized by LLVM to handle asynchronous exceptions, such
1768 as SEH, will still provide their implementation defined semantics.
1769 ``nosanitize_coverage``
This attribute indicates that SanitizerCoverage instrumentation is disabled
for this function.
1772 ``null_pointer_is_valid``
1773 If ``null_pointer_is_valid`` is set, then the ``null`` address
1774 in address-space 0 is considered to be a valid address for memory loads and
1775 stores. Any analysis or optimization should not treat dereferencing a
1776 pointer to ``null`` as undefined behavior in this function.
1777 Note: Comparing address of a global variable to ``null`` may still
1778 evaluate to false because of a limitation in querying this attribute inside
1779 constant expressions.
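A small sketch of the effect of ``null_pointer_is_valid``, using a
hypothetical function name:

.. code-block:: llvm

   define i32 @read_address_zero() null_pointer_is_valid {
     ; With the attribute set, this load from address 0 in address space 0
     ; is not treated as undefined behavior within this function.
     %v = load i32, i32* null
     ret i32 %v
   }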
``optforfuzzing``
This attribute indicates that this function should be optimized
1782 for maximum fuzzing signal.
``optnone``
This function attribute indicates that most optimization passes will skip
1785 this function, with the exception of interprocedural optimization passes.
1786 Code generation defaults to the "fast" instruction selector.
1787 This attribute cannot be used together with the ``alwaysinline``
1788 attribute; this attribute is also incompatible
1789 with the ``minsize`` attribute and the ``optsize`` attribute.
1791 This attribute requires the ``noinline`` attribute to be specified on
1792 the function as well, so the function is never inlined into any caller.
1793 Only functions with the ``alwaysinline`` attribute are valid
1794 candidates for inlining into the body of this function.
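The ``noinline`` requirement can be shown with a one-function sketch; the name
``@debug_me`` is hypothetical.

.. code-block:: llvm

   ; optnone must be accompanied by noinline, and must not be combined with
   ; alwaysinline, minsize, or optsize.
   define void @debug_me() noinline optnone {
     ret void
   }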
``optsize``
This attribute suggests that optimization passes and code generator
1797 passes make choices that keep the code size of this function low,
1798 and otherwise do optimizations specifically to reduce code size as
1799 long as they do not significantly impact runtime performance.
1800 ``"patchable-function"``
1801 This attribute tells the code generator that the code
1802 generated for this function needs to follow certain conventions that
1803 make it possible for a runtime function to patch over it later.
1804 The exact effect of this attribute depends on its string value,
1805 for which there currently is one legal possibility:
1807 * ``"prologue-short-redirect"`` - This style of patchable
1808 function is intended to support patching a function prologue to
1809 redirect control away from the function in a thread safe
1810 manner. It guarantees that the first instruction of the
1811 function will be large enough to accommodate a short jump
1812 instruction, and will be sufficiently aligned to allow being
1813 fully changed via an atomic compare-and-swap instruction.
While the first requirement can be satisfied by inserting a large
enough NOP, LLVM can and will try to re-purpose an existing
1816 instruction (i.e. one that would have to be emitted anyway) as
1817 the patchable instruction larger than a short jump.
1819 ``"prologue-short-redirect"`` is currently only supported on
1822 This attribute by itself does not imply restrictions on
1823 inter-procedural optimizations. All of the semantic effects the
patching may have must be conveyed separately via the linkage type.
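For illustration, the single legal string value is attached like any other
string attribute. The function name ``@hot_patch_target`` is hypothetical.

.. code-block:: llvm

   define void @hot_patch_target() "patchable-function"="prologue-short-redirect" {
     ret void
   }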
``"probe-stack"``
This attribute indicates that the function will trigger a guard region
at the end of the stack. It ensures that accesses to the stack must be
1828 no further apart than the size of the guard region to a previous
1829 access of the stack. It takes one required string value, the name of
1830 the stack probing function that will be called.
1832 If a function that has a ``"probe-stack"`` attribute is inlined into
1833 a function with another ``"probe-stack"`` attribute, the resulting
1834 function has the ``"probe-stack"`` attribute of the caller. If a
1835 function that has a ``"probe-stack"`` attribute is inlined into a
1836 function that has no ``"probe-stack"`` attribute at all, the resulting
1837 function has the ``"probe-stack"`` attribute of the callee.
``readnone``
On a function, this attribute indicates that the function computes its
1840 result (or decides to unwind an exception) based strictly on its arguments,
1841 without dereferencing any pointer arguments or otherwise accessing
1842 any mutable state (e.g. memory, control registers, etc) visible to
1843 caller functions. It does not write through any pointer arguments
1844 (including ``byval`` arguments) and never changes any state visible
1845 to callers. This means while it cannot unwind exceptions by calling
1846 the ``C++`` exception throwing methods (since they write to memory), there may
be non-``C++`` mechanisms that throw exceptions without writing to LLVM
visible memory.
1850 On an argument, this attribute indicates that the function does not
1851 dereference that pointer argument, even though it may read or write the
1852 memory that the pointer points to if accessed through other pointers.
1854 If a readnone function reads or writes memory visible to the program, or
1855 has other side-effects, the behavior is undefined. If a function reads from
1856 or writes to a readnone pointer argument, the behavior is undefined.
``readonly``
On a function, this attribute indicates that the function does not write
1859 through any pointer arguments (including ``byval`` arguments) or otherwise
1860 modify any state (e.g. memory, control registers, etc) visible to
1861 caller functions. It may dereference pointer arguments and read
1862 state that may be set in the caller. A readonly function always
1863 returns the same value (or unwinds an exception identically) when
1864 called with the same set of arguments and global state. This means while it
1865 cannot unwind exceptions by calling the ``C++`` exception throwing methods
1866 (since they write to memory), there may be non-``C++`` mechanisms that throw
1867 exceptions without writing to LLVM visible memory.
1869 On an argument, this attribute indicates that the function does not write
1870 through this pointer argument, even though it may write to the memory that
1871 the pointer points to.
1873 If a readonly function writes memory visible to the program, or
1874 has other side-effects, the behavior is undefined. If a function writes to
1875 a readonly pointer argument, the behavior is undefined.
1876 ``"stack-probe-size"``
1877 This attribute controls the behavior of stack probes: either
1878 the ``"probe-stack"`` attribute, or ABI-required stack probes, if any.
1879 It defines the size of the guard region. It ensures that if the function
may use more stack space than the size of the guard region, a stack probing
1881 sequence will be emitted. It takes one required integer value, which
1884 If a function that has a ``"stack-probe-size"`` attribute is inlined into
1885 a function with another ``"stack-probe-size"`` attribute, the resulting
1886 function has the ``"stack-probe-size"`` attribute that has the lower
1887 numeric value. If a function that has a ``"stack-probe-size"`` attribute is
1888 inlined into a function that has no ``"stack-probe-size"`` attribute
at all, the resulting function has the ``"stack-probe-size"`` attribute
of the callee.
1891 ``"no-stack-arg-probe"``
1892 This attribute disables ABI-required stack probes, if any.
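The stack-probing attributes can be combined on one function, as in the sketch
below. The probe function name ``__my_probe``, the guard size of 8192, and the
function name ``@big_frame`` are all assumptions chosen for illustration.

.. code-block:: llvm

   ; The frame is larger than the declared guard region, so a probing
   ; sequence that calls the named function is expected to be emitted.
   define void @big_frame() "probe-stack"="__my_probe" "stack-probe-size"="8192" {
     %buf = alloca [16384 x i8]
     ret void
   }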
``writeonly``
On a function, this attribute indicates that the function may write to but
1895 does not read from memory.
1897 On an argument, this attribute indicates that the function may write to but
1898 does not read through this pointer argument (even though it may read from
1899 the memory that the pointer points to).
1901 If a writeonly function reads memory visible to the program, or
1902 has other side-effects, the behavior is undefined. If a function reads
1903 from a writeonly pointer argument, the behavior is undefined.
``argmemonly``
This attribute indicates that the only memory accesses inside the function are
loads and stores from objects pointed to by its pointer-typed arguments,
with arbitrary offsets. In other words, all memory operations in the
function can refer to memory only using pointers based on its function
arguments.
Note that ``argmemonly`` can be used together with the ``readonly`` attribute
in order to specify that the function reads only from its arguments.
1914 If an argmemonly function reads or writes memory other than the pointer
1915 arguments, or has other side-effects, the behavior is undefined.
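The memory-related attributes described above compose as in the following
sketch. All three declarations are hypothetical and only show where the
attributes are written.

.. code-block:: llvm

   ; Computes its result from its arguments alone.
   declare i32 @add(i32, i32) readnone

   ; Only reads memory, and only through its pointer argument.
   declare i32 @sum(i32* readonly, i64) argmemonly readonly

   ; Only writes memory, and only through its pointer argument.
   declare void @fill(i8* writeonly, i8, i64) argmemonly writeonly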
``returns_twice``
This attribute indicates that this function can return twice. The C
``setjmp`` is an example of such a function. The compiler disables
some optimizations (like tail calls) in the caller of these
functions.
``safestack``
This attribute indicates that
1923 `SafeStack <https://clang.llvm.org/docs/SafeStack.html>`_
1924 protection is enabled for this function.
1926 If a function that has a ``safestack`` attribute is inlined into a
1927 function that doesn't have a ``safestack`` attribute or which has an
1928 ``ssp``, ``sspstrong`` or ``sspreq`` attribute, then the resulting
1929 function will have a ``safestack`` attribute.
1930 ``sanitize_address``
1931 This attribute indicates that AddressSanitizer checks
1932 (dynamic address safety analysis) are enabled for this function.
``sanitize_memory``
This attribute indicates that MemorySanitizer checks (dynamic detection
1935 of accesses to uninitialized memory) are enabled for this function.
``sanitize_thread``
This attribute indicates that ThreadSanitizer checks
1938 (dynamic thread safety analysis) are enabled for this function.
1939 ``sanitize_hwaddress``
1940 This attribute indicates that HWAddressSanitizer checks
(dynamic address safety analysis based on tagged pointers) are enabled for
this function.
``sanitize_memtag``
This attribute indicates that MemTagSanitizer checks
(dynamic address safety analysis based on Armv8 MTE) are enabled for
this function.
1947 ``speculative_load_hardening``
1948 This attribute indicates that
1949 `Speculative Load Hardening <https://llvm.org/docs/SpeculativeLoadHardening.html>`_
1950 should be enabled for the function body.
1952 Speculative Load Hardening is a best-effort mitigation against
1953 information leak attacks that make use of control flow
misspeculation - specifically misspeculation of whether a branch
is taken or not. Typically vulnerabilities enabling such attacks are
classified as "Spectre variant #1". Notably, this does not attempt to
mitigate against misspeculation of branch target, classified as
1958 "Spectre variant #2" vulnerabilities.
1960 When inlining, the attribute is sticky. Inlining a function that carries
1961 this attribute will cause the caller to gain the attribute. This is intended
1962 to provide a maximally conservative model where the code in a function
annotated with this attribute will always (even after inlining) end up
hardened.
``speculatable``
This function attribute indicates that the function does not have any
1967 effects besides calculating its result and does not have undefined behavior.
1968 Note that ``speculatable`` is not enough to conclude that along any
1969 particular execution path the number of calls to this function will not be
1970 externally observable. This attribute is only valid on functions
1971 and declarations, not on individual call sites. If a function is
1972 incorrectly marked as speculatable and really does exhibit
1973 undefined behavior, the undefined behavior may be observed even
1974 if the call site is dead code.
``ssp``
This attribute indicates that the function should emit a stack
1978 smashing protector. It is in the form of a "canary" --- a random value
1979 placed on the stack before the local variables that's checked upon
1980 return from the function to see if it has been overwritten. A
1981 heuristic is used to determine if a function needs stack protectors
1982 or not. The heuristic used will enable protectors for functions with:
1984 - Character arrays larger than ``ssp-buffer-size`` (default 8).
1985 - Aggregates containing character arrays larger than ``ssp-buffer-size``.
1986 - Calls to alloca() with variable sizes or constant sizes greater than
1987 ``ssp-buffer-size``.
1989 Variables that are identified as requiring a protector will be arranged
1990 on the stack such that they are adjacent to the stack protector guard.
1992 A function with the ``ssp`` attribute but without the ``alwaysinline``
1993 attribute cannot be inlined into a function without a
1994 ``ssp/sspreq/sspstrong`` attribute. If inlined, the caller will get the
1995 ``ssp`` attribute. ``call``, ``invoke``, and ``callbr`` instructions with
1996 the ``alwaysinline`` attribute force inlining.
``sspstrong``
This attribute indicates that the function should emit a stack smashing
1999 protector. This attribute causes a strong heuristic to be used when
2000 determining if a function needs stack protectors. The strong heuristic
2001 will enable protectors for functions with:
2003 - Arrays of any size and type
2004 - Aggregates containing an array of any size and type.
2005 - Calls to alloca().
2006 - Local variables that have had their address taken.
2008 Variables that are identified as requiring a protector will be arranged
2009 on the stack such that they are adjacent to the stack protector guard.
2010 The specific layout rules are:
2012 #. Large arrays and structures containing large arrays
2013 (``>= ssp-buffer-size``) are closest to the stack protector.
2014 #. Small arrays and structures containing small arrays
2015 (``< ssp-buffer-size``) are 2nd closest to the protector.
#. Variables that have had their address taken are 3rd closest to the
   protector.
2019 This overrides the ``ssp`` function attribute.
2021 A function with the ``sspstrong`` attribute but without the
2022 ``alwaysinline`` attribute cannot be inlined into a function without a
2023 ``ssp/sspstrong/sspreq`` attribute. If inlined, the caller will get the
2024 ``sspstrong`` attribute unless the ``sspreq`` attribute exists. ``call``,
``invoke``, and ``callbr`` instructions with the ``alwaysinline`` attribute
force inlining.
``sspreq``
This attribute indicates that the function should *always* emit a stack
smashing protector. This overrides the ``ssp`` and ``sspstrong`` function
attributes.
2032 Variables that are identified as requiring a protector will be arranged
2033 on the stack such that they are adjacent to the stack protector guard.
2034 The specific layout rules are:
2036 #. Large arrays and structures containing large arrays
2037 (``>= ssp-buffer-size``) are closest to the stack protector.
2038 #. Small arrays and structures containing small arrays
2039 (``< ssp-buffer-size``) are 2nd closest to the protector.
#. Variables that have had their address taken are 3rd closest to the
   protector.
2043 A function with the ``sspreq`` attribute but without the ``alwaysinline``
2044 attribute cannot be inlined into a function without a
2045 ``ssp/sspstrong/sspreq`` attribute. If inlined, the caller will get the
2046 ``sspreq`` attribute. ``call``, ``invoke``, and ``callbr`` instructions
2047 with the ``alwaysinline`` attribute force inlining.
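As a minimal sketch of the heuristic and layout rules above, the following function would be given a protector under ``sspstrong``. The names ``@example`` and ``@use`` are hypothetical, and the comments assume an ``ssp-buffer-size`` of at most 16.

.. code-block:: llvm

    ; Hypothetical external function that receives both addresses.
    declare void @use(i8*, i32*)

    define void @example() sspstrong {
    entry:
      ; An array: treated as a large array if ssp-buffer-size <= 16,
      ; so it would be placed closest to the protector.
      %buf = alloca [16 x i8]
      ; A local whose address is taken: placed further from the protector.
      %x = alloca i32
      %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i64 0, i64 0
      call void @use(i8* %p, i32* %x)
      ret void
    }

Replacing ``sspstrong`` with ``sspreq`` would request a protector unconditionally, with the same layout rules.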
2050 This attribute indicates that the function was called from a scope that
2051 requires strict floating-point semantics. LLVM will not attempt any
2052 optimizations that require assumptions about the floating-point rounding
2053 mode or that might alter the state of floating-point status flags that
2054 might otherwise be set or cleared by calling this function. LLVM will
2055 not introduce any new floating-point instructions that may trap.
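For illustration, a ``strictfp`` function typically performs its floating-point arithmetic through the constrained floating-point intrinsics rather than plain instructions such as ``fadd``. The sketch below assumes the ``llvm.experimental.constrained.fadd`` intrinsic; the function name ``@add_strict`` is hypothetical.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

    define double @add_strict(double %a, double %b) strictfp {
    entry:
      ; The rounding mode is read dynamically and exceptions are strict,
      ; so this addition must not be folded, reordered or speculated.
      %r = call double @llvm.experimental.constrained.fadd.f64(
               double %a, double %b,
               metadata !"round.dynamic",
               metadata !"fpexcept.strict") strictfp
      ret double %r
    }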
2057 ``"denormal-fp-math"``
2058 This indicates the denormal (subnormal) handling that may be
2059 assumed for the default floating-point environment. This is a
2060 comma separated pair. The elements may be one of ``"ieee"``,
2061 ``"preserve-sign"``, or ``"positive-zero"``. The first entry
2062 indicates the flushing mode for the result of floating point
2063 operations. The second indicates the handling of denormal inputs
2064 to floating point instructions. For compatibility with older
2065 bitcode, if the second value is omitted, both input and output
2066 modes will assume the same mode.
2068 If this attribute is not specified, the default is
2071 If the output mode is ``"preserve-sign"``, or ``"positive-zero"``,
2072 denormal outputs may be flushed to zero by standard floating-point
2073 operations. It is not mandated that flushing to zero occurs, but if
2074 a denormal output is flushed to zero, it must respect the sign
2075 mode. Not all targets support all modes. While this indicates the
2076 expected floating point mode the function will be executed with,
2077 this does not make any attempt to ensure the mode is
2078 consistent. User or platform code is expected to set the floating
2079 point mode appropriately before function entry.
2081 If the input mode is ``"preserve-sign"``, or ``"positive-zero"``, a
2082 floating-point operation must treat any input denormal value as
2083 zero. In some situations, if an instruction does not respect this
2084 mode, the input may need to be converted to 0 as if by
2085 ``@llvm.canonicalize`` during lowering for correctness.
2087 ``"denormal-fp-math-f32"``
2088 Same as ``"denormal-fp-math"``, but only controls the behavior of
2089 the 32-bit float type (or vectors of 32-bit floats). If both are
2090 present, this overrides ``"denormal-fp-math"``. Not all targets
2091 support separately setting the denormal mode per type, and no
2092 attempt is made to diagnose unsupported uses. Currently this
2093 attribute is respected by the AMDGPU and NVPTX backends.
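The two attributes can be combined through an attribute group. In the sketch below (the function name ``@scale`` is hypothetical), the general mode keeps IEEE denormal handling, while operations on 32-bit floats may flush denormal outputs and must treat denormal inputs as sign-preserving zeros on targets that support a per-type mode.

.. code-block:: llvm

    define float @scale(float %x, float %y) #0 {
    entry:
      ; Governed by "denormal-fp-math-f32" where the target honors it.
      %r = fmul float %x, %y
      ret float %r
    }

    ; First element: output mode. Second element: input mode.
    attributes #0 = { "denormal-fp-math"="ieee,ieee"
                      "denormal-fp-math-f32"="preserve-sign,preserve-sign" }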
2096 This attribute indicates that the function will delegate to some other
2097 function with a tail call. The prototype of a thunk should not be used for
2098 optimization purposes. The caller is expected to cast the thunk prototype to
2099 match the thunk target prototype.
2101 This attribute indicates that the ABI being targeted requires that
2102 an unwind table entry be produced for this function even if we can
2103 show that no exceptions pass by it. This is normally the case for
2104 the ELF x86-64 ABI, but it can be disabled for some compilation
2107 This attribute indicates that no control-flow check will be performed on
2108 the attributed entity. It disables -fcf-protection=<> for a specific
2109 entity to fine grain the HW control flow protection mechanism. The flag
2110 is target independent and currently appertains to a function or function
2113 This attribute indicates that the ShadowCallStack checks are enabled for
2114 the function. The instrumentation checks that the return address for the
2115 function has not changed between the function prolog and epilog. It is
2116 currently x86_64-specific.
2118 This attribute indicates that the function is required to return, unwind,
2119 or interact with the environment in an observable way, e.g., via a volatile
2120 memory access, I/O, or other synchronization. The ``mustprogress``
2121 attribute is intended to model the requirements of the first section of
2122 [intro.progress] of the C++ Standard. As a consequence, a loop in a
2123 function with the ``mustprogress`` attribute can be assumed to terminate if
2124 it does not interact with the environment in an observable way, and
2125 terminating loops without side-effects can be removed. If a ``mustprogress``
2126 function does not satisfy this contract, the behavior is undefined. This
2127 attribute does not apply transitively to callees, but does apply to call
2128 sites within the function. Note that ``willreturn`` implies ``mustprogress``.
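As a sketch of the consequence described above, consider the hypothetical function ``@count``. Its loop has no observable effects and would never exit for an odd ``%n``; because the function is ``mustprogress``, an optimizer may assume the loop terminates and delete it.

.. code-block:: llvm

    define void @count(i32 %n) mustprogress {
    entry:
      br label %loop

    loop:
      ; Steps by two, so an odd %n is never reached.
      %i = phi i32 [ 0, %entry ], [ %next, %loop ]
      %next = add i32 %i, 2
      %done = icmp eq i32 %next, %n
      br i1 %done, label %exit, label %loop

    exit:
      ret void
    }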
2129 ``"warn-stack-size"="<threshold>"``
2130 This attribute sets a threshold to emit diagnostics once the frame size is
2131 known should the frame size exceed the specified value. It takes one
2132 required integer value, which should be a non-negative integer, and less
2133 than ``UINT_MAX``. It's unspecified which threshold will be used when
2134 duplicate definitions are linked together with differing values.
2135 ``vscale_range(<min>[, <max>])``
2136 This attribute indicates the minimum and maximum vscale value for the given
2137 function. A value of 0 means unbounded. If the optional max value is omitted
2138 then max is set to the value of min. If the attribute is not present, no
2139 assumptions are made about the range of vscale.
2141 Call Site Attributes
2142 ----------------------
2144 In addition to function attributes, the following call-site-only
2145 attributes are supported:
2147 ``vector-function-abi-variant``
2148 This attribute can be attached to a :ref:`call <i_call>` to list
2149 the vector functions associated with the function. Notice that the
2150 attribute cannot be attached to an :ref:`invoke <i_invoke>` or a
2151 :ref:`callbr <i_callbr>` instruction. The attribute consists of a
2152 comma separated list of mangled names. The order of the list does
2153 not imply preference (it is logically a set). The compiler is free
2154 to pick any listed vector function of its choosing.
2156 The syntax for the mangled names is as follows::
2158 _ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]
2160 When present, the attribute informs the compiler that the function
2161 ``<scalar_name>`` has a corresponding vector variant that can be
2162 used to perform the concurrent invocation of ``<scalar_name>`` on
2163 vectors. The shape of the vector function is described by the
2164 tokens between the prefix ``_ZGV`` and the ``<scalar_name>``
2165 token. The standard name of the vector function is
2166 ``_ZGV<isa><mask><vlen><parameters>_<scalar_name>``. When present,
2167 the optional token ``(<vector_redirection>)`` informs the compiler
2168 that a custom name is provided in addition to the standard one
2169 (custom names can be provided for example via the use of ``declare
2170 variant`` in OpenMP 5.0). The declaration of the variant must be
2171 present in the IR Module. The signature of the vector variant is
2172 determined by the rules of the Vector Function ABI (VFABI)
2173 specifications of the target. For Arm and X86, the VFABI can be
2174 found at https://github.com/ARM-software/abi-aa and
2175 https://software.intel.com/content/www/us/en/develop/download/vector-simd-function-abi.html,
2178 For X86 and Arm targets, the values of the tokens in the standard
2179 name are those that are defined in the VFABI. LLVM has an internal
2180 ``<isa>`` token that can be used to create scalar-to-vector
2181 mappings for functions that are not directly associated with any of
2182 the target ISAs (for example, some of the mappings stored in the
2183 TargetLibraryInfo). Valid values for the ``<isa>`` token are::
2185 <isa>:= b | c | d | e -> X86 SSE, AVX, AVX2, AVX512
2186 | n | s -> Armv8 Advanced SIMD, SVE
2187 | __LLVM__ -> Internal LLVM Vector ISA
2189 For all targets currently supported (x86, Arm and Internal LLVM),
2190 the remaining tokens can have the following values::
2192 <mask>:= M | N -> mask | no mask
2194 <vlen>:= number -> number of lanes
2195 | x -> VLA (Vector Length Agnostic)
2197 <parameters>:= v -> vector
2198 | l | l <number> -> linear
2199 | R | R <number> -> linear with ref modifier
2200 | L | L <number> -> linear with val modifier
2201 | U | U <number> -> linear with uval modifier
2202 | ls <pos> -> runtime linear
2203 | Rs <pos> -> runtime linear with ref modifier
2204 | Ls <pos> -> runtime linear with val modifier
2205 | Us <pos> -> runtime linear with uval modifier
2208 <scalar_name>:= name of the scalar function
2210 <vector_redirection>:= optional, custom name of the vector function
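To make the mangling concrete, the sketch below attaches the attribute to a scalar call to ``@sin``. The name ``_ZGVnN2v_sin`` decodes as ``n`` (Armv8 Advanced SIMD), ``N`` (no mask), ``2`` (two lanes) and ``v`` (one vector parameter). The wrapper ``@call_sin`` is hypothetical, and the vector signature shown is an assumption derived from those tokens.

.. code-block:: llvm

    declare double @sin(double)

    ; The declaration of the variant must be present in the module.
    declare <2 x double> @_ZGVnN2v_sin(<2 x double>)

    define double @call_sin(double %x) {
    entry:
      %r = call double @sin(double %x) #0
      ret double %r
    }

    attributes #0 = { "vector-function-abi-variant"="_ZGVnN2v_sin" }

A vectorizer that widens this call to two lanes is then free to call ``@_ZGVnN2v_sin`` instead of calling ``@sin`` twice.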
2212 ``preallocated(<ty>)``
2213 This attribute is required on calls to ``llvm.call.preallocated.arg``
2214 and cannot be used on any other call. See
2215 :ref:`llvm.call.preallocated.arg<int_call_preallocated_arg>` for more
2223 Attributes may be set to communicate additional information about a global variable.
2224 Unlike :ref:`function attributes <fnattrs>`, attributes on a global variable
2225 are grouped into a single :ref:`attribute group <attrgrp>`.
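A minimal sketch of a global variable that refers to an attribute group; the global ``@counter`` and the string attribute ``"hypothetical-key"`` are made up for illustration and carry no meaning to LLVM.

.. code-block:: llvm

    @counter = global i32 0 #0

    attributes #0 = { "hypothetical-key"="value" }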
2232 Operand bundles are tagged sets of SSA values that can be associated
2233 with certain LLVM instructions (currently only ``call`` s and
2234 ``invoke`` s). In a way they are like metadata, but dropping them is
2235 incorrect and will change program semantics.
2239 operand bundle set ::= '[' operand bundle (, operand bundle )* ']'
2240 operand bundle ::= tag '(' [ bundle operand ] (, bundle operand )* ')'
2241 bundle operand ::= SSA value
2242 tag ::= string constant
2244 Operand bundles are **not** part of a function's signature, and a
2245 given function may be called from multiple places with different kinds
2246 of operand bundles. This reflects the fact that the operand bundles
2247 are conceptually a part of the ``call`` (or ``invoke``), not the
2248 callee being dispatched to.
2250 Operand bundles are a generic mechanism intended to support
2251 runtime-introspection-like functionality for managed languages. While
2252 the exact semantics of an operand bundle depend on the bundle tag,
2253 there are certain limitations to how much the presence of an operand
2254 bundle can influence the semantics of a program. These restrictions
2255 are described as the semantics of an "unknown" operand bundle. As
2256 long as the behavior of an operand bundle is describable within these
2257 restrictions, LLVM does not need to have special knowledge of the
2258 operand bundle to not miscompile programs containing it.
2260 - The bundle operands for an unknown operand bundle escape in unknown
2261 ways before control is transferred to the callee or invokee.
2262 - Calls and invokes with operand bundles have unknown read / write
2263 effect on the heap on entry and exit (even if the call target is
2264 ``readnone`` or ``readonly``), unless they're overridden with
2265 callsite specific attributes.
2266 - An operand bundle at a call site cannot change the implementation
2267 of the called function. Inter-procedural optimizations work as
2268 usual as long as they take into account the first two properties.
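The sketch below shows one callee reached from call sites that carry different bundle sets. The function names and the ``"other-tag"`` tag are hypothetical, and ``"unknown"`` stands for any tag LLVM has no special knowledge of.

.. code-block:: llvm

    declare void @callee() readnone

    define void @caller(i8* %p, i32 %state) {
    entry:
      ; No bundle: the call can be treated as readnone.
      call void @callee()
      ; Unknown bundle: %p escapes in unknown ways and the call has
      ; unknown read / write effects, despite the readnone callee.
      call void @callee() [ "unknown"(i8* %p) ]
      ; Several bundles, each with its own operand list.
      call void @callee() [ "unknown"(i8* %p), "other-tag"(i32 %state, i32 7) ]
      ret void
    }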
2270 More specific types of operand bundles are described below.
2272 .. _deopt_opbundles:
2274 Deoptimization Operand Bundles
2275 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2277 Deoptimization operand bundles are characterized by the ``"deopt"``
2278 operand bundle tag. These operand bundles represent an alternate
2279 "safe" continuation for the call site they're attached to, and can be
2280 used by a suitable runtime to deoptimize the compiled frame at the
2281 specified call site. There can be at most one ``"deopt"`` operand
2282 bundle attached to a call site. Exact details of deoptimization are
2283 out of scope for the language reference, but it usually involves
2284 rewriting a compiled frame into a set of interpreted frames.
2286 From the compiler's perspective, deoptimization operand bundles make
2287 the call sites they're attached to at least ``readonly``. They read
2288 through all of their pointer typed operands (even if they're not
2289 otherwise escaped) and the entire visible heap. Deoptimization
2290 operand bundles do not capture their operands except during
2291 deoptimization, in which case control will not be returned to the
2294 The inliner knows how to inline through calls that have deoptimization
2295 operand bundles. Just like inlining through a normal call site
2296 involves composing the normal and exceptional continuations, inlining
2297 through a call site with a deoptimization operand bundle needs to
2298 appropriately compose the "safe" deoptimization continuation. The
2299 inliner does this by prepending the parent's deoptimization
2300 continuation to every deoptimization continuation in the inlined body.
2301 E.g. inlining ``@f`` into ``@g`` in the following example
2303 .. code-block:: llvm
2306 call void @x() ;; no deopt state
2307 call void @y() [ "deopt"(i32 10) ]
2308 call void @y() [ "deopt"(i32 10), "unknown"(i8* null) ]
2313 call void @f() [ "deopt"(i32 20) ]
2319 .. code-block:: llvm
2322 call void @x() ;; still no deopt state
2323 call void @y() [ "deopt"(i32 20, i32 10) ]
2324 call void @y() [ "deopt"(i32 20, i32 10), "unknown"(i8* null) ]
2328 It is the frontend's responsibility to structure or encode the
2329 deoptimization state in a way that syntactically prepending the
2330 caller's deoptimization state to the callee's deoptimization state is
2331 semantically equivalent to composing the caller's deoptimization
2332 continuation after the callee's deoptimization continuation.
2336 Funclet Operand Bundles
2337 ^^^^^^^^^^^^^^^^^^^^^^^
2339 Funclet operand bundles are characterized by the ``"funclet"``
2340 operand bundle tag. These operand bundles indicate that a call site
2341 is within a particular funclet. There can be at most one
2342 ``"funclet"`` operand bundle attached to a call site and it must have
2343 exactly one bundle operand.
2345 If any funclet EH pads have been "entered" but not "exited" (per the
2346 `description in the EH doc\ <ExceptionHandling.html#wineh-constraints>`_),
2347 it is undefined behavior to execute a ``call`` or ``invoke`` which:
2349 * does not have a ``"funclet"`` bundle and is not a ``call`` to a nounwind
2351 * has a ``"funclet"`` bundle whose operand is not the most-recently-entered
2352 not-yet-exited funclet EH pad.
2354 Similarly, if no funclet EH pads have been entered-but-not-yet-exited,
2355 executing a ``call`` or ``invoke`` with a ``"funclet"`` bundle is undefined behavior.
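As an illustration, the call inside the cleanup funclet below names the pad it executes in. This is a sketch that assumes the MSVC C++ personality ``@__CxxFrameHandler3``; ``@may_throw`` and ``@cleanup_fn`` are hypothetical.

.. code-block:: llvm

    declare i32 @__CxxFrameHandler3(...)
    declare void @may_throw()
    declare void @cleanup_fn()

    define void @f() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      ; No funclet has been entered here, so no "funclet" bundle is used.
      invoke void @may_throw()
              to label %done unwind label %cleanup

    cleanup:
      %cp = cleanuppad within none []
      ; %cp is the most recently entered, not yet exited funclet pad.
      call void @cleanup_fn() [ "funclet"(token %cp) ]
      cleanupret from %cp unwind to caller

    done:
      ret void
    }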
2357 GC Transition Operand Bundles
2358 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2360 GC transition operand bundles are characterized by the
2361 ``"gc-transition"`` operand bundle tag. These operand bundles mark a
2362 call as a transition between a function with one GC strategy to a
2363 function with a different GC strategy. If coordinating the transition
2364 between GC strategies requires additional code generation at the call
2365 site, these bundles may contain any values that are needed by the
2366 generated code. For more details, see :ref:`GC Transitions
2367 <gc_transition_args>`.
2369 The bundle contains an arbitrary list of Values which need to be passed
2370 to GC transition code. They will be lowered and passed as operands to
2371 the appropriate GC_TRANSITION nodes in the selection DAG. It is assumed
2372 that these arguments must be available before and after (but not
2373 necessarily during) the execution of the callee.
2375 .. _assume_opbundles:
2377 Assume Operand Bundles
2378 ^^^^^^^^^^^^^^^^^^^^^^
2380 Operand bundles on an :ref:`llvm.assume <int_assume>` allow representing
2381 assumptions that a :ref:`parameter attribute <paramattrs>` or a
2382 :ref:`function attribute <fnattrs>` holds for a certain value at a certain
2383 location. Operand bundles enable assumptions that are either hard or impossible
2384 to represent as a boolean argument of an :ref:`llvm.assume <int_assume>`.
2386 An assume operand bundle has the form:
2390 "<tag>"([ <holds for value> [, <attribute argument>] ])
2392 * The tag of the operand bundle is usually the name of the attribute that can be
2393 assumed to hold. It can also be ``ignore``; this tag doesn't contain any
2394 information and should be ignored.
2395 * The first argument, if present, is the value for which the attribute holds.
2396 * The second argument, if present, is an argument of the attribute.
2398 If there are no arguments the attribute is a property of the call location.
2400 If the represented attribute expects a constant argument, the argument provided
2401 to the operand bundle should be a constant as well.
2405 .. code-block:: llvm
2407 call void @llvm.assume(i1 true) ["align"(i32* %val, i32 8)]
2409 allows the optimizer to assume that at the location of the call to
2410 :ref:`llvm.assume <int_assume>` ``%val`` has an alignment of at least 8.
2412 .. code-block:: llvm
2414 call void @llvm.assume(i1 %cond) ["cold"(), "nonnull"(i64* %val)]
2416 allows the optimizer to assume that the :ref:`llvm.assume <int_assume>`
2417 call location is cold and that ``%val`` is not null.
2419 Just like for the argument of :ref:`llvm.assume <int_assume>`, if any of the
2420 provided guarantees are violated at runtime the behavior is undefined.
2422 Even if the assumed property can be encoded as a boolean value, like
2423 ``nonnull``, using operand bundles to express the property can still have
2426 * Attributes that can be expressed via operand bundles are directly the
2427 property that the optimizer uses and cares about. Encoding attributes as
2428 operand bundles removes the need for an instruction sequence that represents
2429 the property (e.g., ``icmp ne i32* %p, null`` for ``nonnull``) and for the
2430 optimizer to deduce the property from that instruction sequence.
2431 * Expressing the property using operand bundles makes it easy to identify the
2432 use of the value as a use in an :ref:`llvm.assume <int_assume>`. This then
2433 simplifies and improves heuristics, e.g., for use "use-sensitive"
2436 .. _ob_preallocated:
2438 Preallocated Operand Bundles
2439 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2441 Preallocated operand bundles are characterized by the ``"preallocated"``
2442 operand bundle tag. These operand bundles allow separation of the allocation
2443 of the call argument memory from the call site. This is necessary to pass
2444 non-trivially copyable objects by value in a way that is compatible with MSVC
2445 on some targets. There can be at most one ``"preallocated"`` operand bundle
2446 attached to a call site and it must have exactly one bundle operand, which is
2447 a token generated by ``@llvm.call.preallocated.setup``. A call with this
2448 operand bundle should not adjust the stack before entering the function, as
2449 that will have been done by one of the ``@llvm.call.preallocated.*`` intrinsics.
2451 .. code-block:: llvm
2453 %foo = type { i64, i32 }
2457 %t = call token @llvm.call.preallocated.setup(i32 1)
2458 %a = call i8* @llvm.call.preallocated.arg(token %t, i32 0) preallocated(%foo)
2459 %b = bitcast i8* %a to %foo*
2461 call void @bar(i32 42, %foo* preallocated(%foo) %b) ["preallocated"(token %t)]
2465 GC Live Operand Bundles
2466 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2468 A "gc-live" operand bundle is only valid on a :ref:`gc.statepoint <gc_statepoint>`
2469 intrinsic. The operand bundle must contain every pointer to a garbage collected
2470 object which potentially needs to be updated by the garbage collector.
2472 When lowered, any relocated value will be recorded in the corresponding
2473 :ref:`stackmap entry <statepoint-stackmap-format>`. See the intrinsic description
2474 for further details.
2476 ObjC ARC Attached Call Operand Bundles
2477 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2479 A ``"clang.arc.attachedcall"`` operand bundle on a call indicates the call is
2480 implicitly followed by a marker instruction and a call to an ObjC runtime
2481 function that uses the result of the call. The operand bundle takes either the
2482 pointer to the runtime function (``@objc_retainAutoreleasedReturnValue`` or
2483 ``@objc_unsafeClaimAutoreleasedReturnValue``) or no arguments. If the bundle
2484 doesn't take any arguments, only the marker instruction has to be emitted after
2485 the call; the runtime function calls don't have to be emitted since they already
2486 have been emitted. The return value of a call with this bundle is used by a call
2487 to ``@llvm.objc.clang.arc.noop.use`` unless the called function's return type is
2488 void, in which case the operand bundle is ignored.
2490 .. code-block:: llvm
2492 ; The marker instruction and a runtime function call are inserted after the call
2494 call i8* @foo() [ "clang.arc.attachedcall"(i8* (i8*)* @objc_retainAutoreleasedReturnValue) ]
2495 call i8* @foo() [ "clang.arc.attachedcall"(i8* (i8*)* @objc_unsafeClaimAutoreleasedReturnValue) ]
2497 ; Only the marker instruction is inserted after the call to @foo.
2498 call i8* @foo() [ "clang.arc.attachedcall"() ]
2500 The operand bundle is needed to ensure the call is immediately followed by the
2501 marker instruction or the ObjC runtime call in the final output.
2505 Module-Level Inline Assembly
2506 ----------------------------
2508 Modules may contain "module-level inline asm" blocks, which correspond
2509 to the GCC "file scope inline asm" blocks. These blocks are internally
2510 concatenated by LLVM and treated as a single unit, but may be separated
2511 in the ``.ll`` file if desired. The syntax is very simple:
2513 .. code-block:: llvm
2515 module asm "inline asm code goes here"
2516 module asm "more can go here"
2518 The strings can contain any character by escaping non-printable
2519 characters. The escape sequence used is simply "\\xx" where "xx" is the
2520 two digit hex code for the number.
2522 Note that the assembly string *must* be parseable by LLVM's integrated assembler
2523 (unless it is disabled), even when emitting a ``.s`` file.
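For instance, a line break or tab inside a block can be written with the hex escapes ``\0A`` and ``\09``. The symbol name ``example_marker`` below is hypothetical, and the directives are assumed to be accepted by the target's integrated assembler.

.. code-block:: llvm

    module asm ".globl example_marker"
    ; \0A is a newline and \09 is a tab inside the assembly string.
    module asm "example_marker:\0A\09.byte 0"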
2525 .. _langref_datalayout:
2530 A module may specify a target specific data layout string that specifies
2531 how data is to be laid out in memory. The syntax for the data layout is
2534 .. code-block:: llvm
2536 target datalayout = "layout specification"
2538 The *layout specification* consists of a list of specifications
2539 separated by the minus sign character ('-'). Each specification starts
2540 with a letter and may include other information after the letter to
2541 define some aspect of the data layout. The specifications accepted are
2545 Specifies that the target lays out data in big-endian form. That is,
2546 the bits with the most significance have the lowest address
2549 Specifies that the target lays out data in little-endian form. That
2550 is, the bits with the least significance have the lowest address
2553 Specifies the natural alignment of the stack in bits. Alignment
2554 promotion of stack variables is limited to the natural stack
2555 alignment to avoid dynamic stack realignment. The stack alignment
2556 must be a multiple of 8-bits. If omitted, the natural stack
2557 alignment defaults to "unspecified", which does not prevent any
2558 alignment promotions.
2559 ``P<address space>``
2560 Specifies the address space that corresponds to program memory.
2561 Harvard architectures can use this to specify what space LLVM
2562 should place things such as functions into. If omitted, the
2563 program memory space defaults to the default address space of 0,
2564 which corresponds to a Von Neumann architecture that has code
2565 and data in the same space.
2566 ``G<address space>``
2567 Specifies the address space to be used by default when creating global
2568 variables. If omitted, the globals address space defaults to the default
2570 Note: variable declarations without an address space are always created in
2571 address space 0; this property only affects the default value to be used
2572 when creating globals without additional contextual information (e.g. in
2574 ``A<address space>``
2575 Specifies the address space of objects created by '``alloca``'.
2576 Defaults to the default address space of 0.
2577 ``p[n]:<size>:<abi>[:<pref>][:<idx>]``
2578 This specifies the *size* of a pointer and its ``<abi>`` and
2579 ``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
2580 and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
2581 index that is used for address calculation. If not
2582 specified, the default index size is equal to the pointer size. All sizes
2583 are in bits. The address space, ``n``, is optional, and if not specified,
2584 denotes the default address space 0. The value of ``n`` must be
2585 in the range [1,2^23).
2586 ``i<size>:<abi>[:<pref>]``
2587 This specifies the alignment for an integer type of a given bit
2588 ``<size>``. The value of ``<size>`` must be in the range [1,2^23).
2589 ``<pref>`` is optional and defaults to ``<abi>``.
2590 ``v<size>:<abi>[:<pref>]``
2591 This specifies the alignment for a vector type of a given bit
2592 ``<size>``. The value of ``<size>`` must be in the range [1,2^23).
2593 ``<pref>`` is optional and defaults to ``<abi>``.
2594 ``f<size>:<abi>[:<pref>]``
2595 This specifies the alignment for a floating-point type of a given bit
2596 ``<size>``. Only values of ``<size>`` that are supported by the target
2597 will work. 32 (float) and 64 (double) are supported on all targets; 80
2598 or 128 (different flavors of long double) are also supported on some
2599 targets. The value of ``<size>`` must be in the range [1,2^23).
2600 ``<pref>`` is optional and defaults to ``<abi>``.
2601 ``a:<abi>[:<pref>]``
2602 This specifies the alignment for an object of aggregate type.
2603 ``<pref>`` is optional and defaults to ``<abi>``.
2605 This specifies the alignment for function pointers.
2606 The options for ``<type>`` are:
2608 * ``i``: The alignment of function pointers is independent of the alignment
2609 of functions, and is a multiple of ``<abi>``.
2610 * ``n``: The alignment of function pointers is a multiple of the explicit
2611 alignment specified on the function, and is a multiple of ``<abi>``.
2613 If present, specifies that LLVM names are mangled in the output. Symbols
2614 prefixed with the mangling escape character ``\01`` are passed through
2615 directly to the assembler without the escape character. The mangling style
2618 * ``e``: ELF mangling: Private symbols get a ``.L`` prefix.
2619 * ``l``: GOFF mangling: Private symbols get a ``@`` prefix.
2620 * ``m``: Mips mangling: Private symbols get a ``$`` prefix.
2621 * ``o``: Mach-O mangling: Private symbols get an ``L`` prefix. Other
2622 symbols get a ``_`` prefix.
2623 * ``x``: Windows x86 COFF mangling: Private symbols get the usual prefix.
2624 Regular C symbols get a ``_`` prefix. Functions with ``__stdcall``,
2625 ``__fastcall``, and ``__vectorcall`` have custom mangling that appends
2626 ``@N`` where N is the number of bytes used to pass parameters. C++ symbols
2627 starting with ``?`` are not mangled in any way.
2628 * ``w``: Windows COFF mangling: Similar to ``x``, except that normal C
2629 symbols do not receive a ``_`` prefix.
2630 * ``a``: XCOFF mangling: Private symbols get an ``L..`` prefix.
2631 ``n<size1>:<size2>:<size3>...``
2632 This specifies a set of native integer widths for the target CPU in
2633 bits. For example, it might contain ``n32`` for 32-bit PowerPC,
2634 ``n32:64`` for PowerPC 64, or ``n8:16:32:64`` for X86-64. Elements of
2635 this set are considered to support most general arithmetic operations
2637 ``ni:<address space0>:<address space1>:<address space2>...``
2638 This specifies pointer types with the specified address spaces
2639 as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
2640 address space cannot be specified as non-integral.
2642 On every specification that takes a ``<abi>:<pref>``, specifying the
2643 ``<pref>`` alignment is optional. If omitted, the preceding ``:``
2644 should be omitted too and ``<pref>`` will be equal to ``<abi>``.
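As a worked example, the string below is the kind of layout typically seen for x86-64 Linux modules. It is shown only to illustrate how the specifications combine and must not be copied to other targets.

.. code-block:: llvm

    ; e              little-endian
    ; m:e            ELF mangling
    ; p270:32:32     pointers in address space 270 are 32 bits, 32-bit aligned
    ; p271:32:32     same for address space 271
    ; p272:64:64     pointers in address space 272 are 64 bits, 64-bit aligned
    ; i64:64         i64 has 64-bit ABI (and preferred) alignment
    ; f80:128        80-bit floating point is 128-bit aligned
    ; n8:16:32:64    native integer widths
    ; S128           natural stack alignment is 128 bits
    target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

Specifications that are not listed, such as the default pointer size, keep their default values.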
2646 When constructing the data layout for a given target, LLVM starts with a
2647 default set of specifications which are then (possibly) overridden by
2648 the specifications in the ``datalayout`` keyword. The default
2649 specifications are given in this list:
2651 - ``e`` - little endian
2652 - ``p:64:64:64`` - 64-bit pointers with 64-bit alignment.
2653 - ``p[n]:64:64:64`` - Other address spaces are assumed to be the
2654 same as the default address space.
2655 - ``S0`` - natural stack alignment is unspecified
2656 - ``i1:8:8`` - i1 is 8-bit (byte) aligned
2657 - ``i8:8:8`` - i8 is 8-bit (byte) aligned
2658 - ``i16:16:16`` - i16 is 16-bit aligned
2659 - ``i32:32:32`` - i32 is 32-bit aligned
2660 - ``i64:32:64`` - i64 has ABI alignment of 32-bits but preferred
2661 alignment of 64-bits
2662 - ``f16:16:16`` - half is 16-bit aligned
2663 - ``f32:32:32`` - float is 32-bit aligned
2664 - ``f64:64:64`` - double is 64-bit aligned
2665 - ``f128:128:128`` - quad is 128-bit aligned
2666 - ``v64:64:64`` - 64-bit vector is 64-bit aligned
2667 - ``v128:128:128`` - 128-bit vector is 128-bit aligned
2668 - ``a:0:64`` - aggregates are 64-bit aligned
2670 When LLVM is determining the alignment for a given type, it uses the
2673 #. If the type sought is an exact match for one of the specifications,
2674 that specification is used.
2675 #. If no match is found, and the type sought is an integer type, then
2676 the smallest integer type that is larger than the bitwidth of the
2677 sought type is used. If none of the specifications are larger than
2678 the bitwidth then the largest integer type is used. For example,
2679 given the default specifications above, the i7 type will use the
2680 alignment of i8 (next largest) while both i65 and i256 will use the
2681 alignment of i64 (largest specified).
2682 #. If no match is found, and the type sought is a vector type, then the
2683 largest vector type that is smaller than the sought vector type will
2684 be used as a fall back. This happens because <128 x double> can be
2685 implemented in terms of 64 <2 x double>, for example.
2687 The function of the data layout string may not be what you expect.
2688 Notably, this is not a specification from the frontend of what alignment
2689 the code generator should use.
2691 Instead, if specified, the target data layout is required to match what
2692 the ultimate *code generator* expects. This string is used by the
2693 mid-level optimizers to improve code, and this only works if it matches
2694 what the ultimate code generator uses. There is no way to generate IR
2695 that does not embed this target-specific detail into the IR. If you
2696 don't specify the string, the default specifications will be used to
2697 generate a Data Layout and the optimization phases will operate
2698 accordingly and introduce target specificity into the IR with respect to
2699 these default specifications.
2706 A module may specify a target triple string that describes the target
2707 host. The syntax for the target triple is simply:
2709 .. code-block:: llvm
2711 target triple = "x86_64-apple-macosx10.7.0"
2713 The *target triple* string consists of a series of identifiers delimited
2714 by the minus sign character ('-'). The canonical forms are:
2718 ARCHITECTURE-VENDOR-OPERATING_SYSTEM
2719 ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
2721 This information is passed along to the backend so that it generates
2722 code for the proper architecture. It's possible to override this on the
2723 command line with the ``-mtriple`` command line option.
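For illustration, a four-component triple decomposes as follows. The specific triple is only an example.

.. code-block:: llvm

    ; ARCHITECTURE     = x86_64
    ; VENDOR           = unknown
    ; OPERATING_SYSTEM = linux
    ; ENVIRONMENT      = gnu
    target triple = "x86_64-unknown-linux-gnu"

    ; The triple can be overridden when invoking a tool, for example:
    ;   llc -mtriple=aarch64-unknown-linux-gnu input.ll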
2728 ----------------------
2730 A memory object, or simply object, is a region of a memory space that is
2731 reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
2732 allocation calls, and global variable definitions.
2733 Once it is allocated, the bytes stored in the region can only be read or written
2734 through a pointer that is :ref:`based on <pointeraliasing>` the allocation
2736 If a pointer that is not based on the object tries to read or write to the
2737 object, it is undefined behavior.

The lifetime of a memory object is a property that determines its accessibility.
Unless stated otherwise, a memory object is alive from its allocation until its
deallocation, and dead afterwards.
It is undefined behavior to access a memory object that isn't alive, but
operations that don't dereference it such as
:ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
:ref:`icmp <i_icmp>` return a valid result.
This is what permits code motion of these instructions across operations that
impact the object's lifetime.
A stack object's lifetime can be explicitly specified using
:ref:`llvm.lifetime.start <int_lifestart>` and
:ref:`llvm.lifetime.end <int_lifeend>` intrinsic function calls.
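
The following is a minimal sketch of an explicitly delimited stack object
lifetime, written in the typed-pointer syntax used elsewhere in this document.
The function name ``@lifetime_demo`` and the 16-byte buffer are assumptions
made for the illustration.

.. code-block:: llvm

    declare void @llvm.lifetime.start.p0i8(i64, i8*)
    declare void @llvm.lifetime.end.p0i8(i64, i8*)

    define i8 @lifetime_demo() {
    entry:
      %buf = alloca [16 x i8]
      ; Computing an address does not dereference the object, so this is
      ; fine even before the lifetime starts.
      %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i64 0, i64 0

      call void @llvm.lifetime.start.p0i8(i64 16, i8* %p)
      store i8 42, i8* %p              ; the object is alive here
      %v = load i8, i8* %p
      call void @llvm.lifetime.end.p0i8(i64 16, i8* %p)

      ; A load or store through %p at this point would access a dead
      ; object and therefore be undefined behavior.
      ret i8 %v
    }
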
2752 .. _pointeraliasing:
2754 Pointer Aliasing Rules
2755 ----------------------
2757 Any memory access must be done through a pointer value associated with
2758 an address range of the memory access, otherwise the behavior is
2759 undefined. Pointer values are associated with address ranges according
2760 to the following rules:
2762 - A pointer value is associated with the addresses associated with any
2763 value it is *based* on.
2764 - An address of a global variable is associated with the address range
2765 of the variable's storage.
2766 - The result value of an allocation instruction is associated with the
2767 address range of the allocated storage.
- A null pointer in the default address-space is associated with no
  address.
2770 - An :ref:`undef value <undefvalues>` in *any* address-space is
2771 associated with no address.
2772 - An integer constant other than zero or a pointer value returned from
2773 a function not defined within LLVM may be associated with address
2774 ranges allocated through mechanisms other than those provided by
2775 LLVM. Such ranges shall not overlap with any ranges of addresses
2776 allocated by mechanisms provided by LLVM.

A pointer value is *based* on another pointer value according to the
following rules:

2781 - A pointer value formed from a scalar ``getelementptr`` operation is *based* on
2782 the pointer-typed operand of the ``getelementptr``.
2783 - The pointer in lane *l* of the result of a vector ``getelementptr`` operation
2784 is *based* on the pointer in lane *l* of the vector-of-pointers-typed operand
2785 of the ``getelementptr``.
- The result value of a ``bitcast`` is *based* on the operand of the
  ``bitcast``.
2788 - A pointer value formed by an ``inttoptr`` is *based* on all pointer
2789 values that contribute (directly or indirectly) to the computation of
2790 the pointer's value.
2791 - The "*based* on" relationship is transitive.
2793 Note that this definition of *"based"* is intentionally similar to the
2794 definition of *"based"* in C99, though it is slightly weaker.
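
The rules above can be sketched with a short chain of derived pointers. The
global ``@g`` and the function name are hypothetical and exist only for this
illustration.

.. code-block:: llvm

    @g = global [4 x i32] zeroinitializer

    define i32 @based_on_demo() {
    entry:
      ; %elt is based on @g (scalar getelementptr rule).
      %elt = getelementptr inbounds [4 x i32], [4 x i32]* @g, i64 0, i64 2
      ; %raw is based on %elt (bitcast rule) and, by transitivity, on @g.
      %raw = bitcast i32* %elt to i8*
      ; %back is based on %raw, so it is still associated with the
      ; address range of @g's storage and may be used to access it.
      %back = bitcast i8* %raw to i32*
      %v = load i32, i32* %back
      ret i32 %v
    }
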
2796 LLVM IR does not associate types with memory. The result type of a
2797 ``load`` merely indicates the size and alignment of the memory from
2798 which to load, as well as the interpretation of the value. The first
2799 operand type of a ``store`` similarly only indicates the size and
2800 alignment of the store.

Consequently, type-based alias analysis, aka TBAA, aka
``-fstrict-aliasing``, is not applicable to general unadorned LLVM IR.
:ref:`Metadata <metadata>` may be used to encode additional information
which specialized optimization passes may use to implement type-based
alias analysis.
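
Because memory itself is untyped, the same bytes may be written as one type
and read back as another. The sketch below, using a hypothetical function
name, stores a ``float`` and reloads the same four bytes as an ``i32``. Unlike
the corresponding C code under ``-fstrict-aliasing``, this is well defined in
unadorned LLVM IR.

.. code-block:: llvm

    define i32 @float_bits(float %f) {
    entry:
      %slot = alloca float
      store float %f, float* %slot
      ; Reinterpret the pointer; the result is still based on %slot.
      %islot = bitcast float* %slot to i32*
      ; The load type only selects size, alignment and interpretation.
      %bits = load i32, i32* %islot
      ret i32 %bits
    }
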
Given a function call and a pointer that is passed as an argument or stored in
memory before the call, a pointer is *captured* by the call if it makes a
copy of any part of the pointer that outlives the call.
To be precise, a pointer is captured if one or more of the following conditions
hold:

1. The call stores any bit of the pointer carrying information into a place,
   and the stored bits can be read from the place by the caller after this call
   exits.

.. code-block:: llvm

    @glb  = global i8* null
    @glb2 = global i8* null
    @glb3 = global i8* null
    @glbi = global i32 0

    declare void @g()

    define i8* @f(i8* %a, i8* %b, i8* %c, i8* %d, i8* %e) {
      store i8* %a, i8** @glb ; %a is captured by this call

      store i8* %b, i8** @glb2 ; %b isn't captured because the stored value is overwritten by the store below
      store i8* null, i8** @glb2

      store i8* %c, i8** @glb3
      call void @g() ; If @g makes a copy of %c that outlives this call (@f), %c is captured
      store i8* null, i8** @glb3

      %i = ptrtoint i8* %d to i64
      %j = trunc i64 %i to i32
      store i32 %j, i32* @glbi ; %d is captured

      ret i8* %e ; %e is captured
    }

2. The call stores any bit of the pointer carrying information into a place,
   and the stored bits can be safely read from the place by another thread via
   synchronization.

.. code-block:: llvm

    ; @glb is the global declared in the previous example.
    @lock = global i1 true

    define void @f(i8* %a) {
      store i8* %a, i8** @glb
      store atomic i1 false, i1* @lock release, align 1 ; %a is captured because another thread can safely read @glb
      store i8* null, i8** @glb
      ret void
    }

2862 3. The call's behavior depends on any bit of the pointer carrying information.
2864 .. code-block:: llvm
2868 define void @f(i8* %a) {
2869 %c = icmp eq i8* %a, @glb
2870 br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
2878 4. The pointer is used in a volatile access as its address.
2883 Volatile Memory Accesses
2884 ------------------------

Certain memory accesses, such as :ref:`loads <i_load>`,
:ref:`stores <i_store>`, and :ref:`llvm.memcpy <int_memcpy>` calls, may be
marked ``volatile``. The optimizers must not change the number of
volatile operations or change their order of execution relative to other
volatile operations. The optimizers *may* change the order of volatile
operations relative to non-volatile operations. This is not Java's
"volatile" and has no cross-thread synchronization behavior.
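
As a small sketch of the rule on the number and order of volatile operations,
consider a function that reads the same location twice. The function name and
the idea that ``%reg`` points at a device register are assumptions made for
the example.

.. code-block:: llvm

    define i32 @poll_twice(i32* %reg) {
    entry:
      ; Both loads must be kept, in this order. An optimizer may not
      ; replace %b with %a, even though both read the same address.
      %a = load volatile i32, i32* %reg
      %b = load volatile i32, i32* %reg
      %sum = add i32 %a, %b
      ret i32 %sum
    }

Had the two loads not been marked ``volatile``, folding them into a single
load would be an ordinary redundancy elimination.
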
2894 A volatile load or store may have additional target-specific semantics.
2895 Any volatile operation can have side effects, and any volatile operation
2896 can read and/or modify state which is not accessible via a regular load
2897 or store in this module. Volatile operations may use addresses which do
2898 not point to memory (like MMIO registers). This means the compiler may
2899 not use a volatile operation to prove a non-volatile access to that
2900 address has defined behavior.
2902 The allowed side-effects for volatile accesses are limited. If a
2903 non-volatile store to a given address would be legal, a volatile
2904 operation may modify the memory at that address. A volatile operation
2905 may not modify any other memory accessible by the module being compiled.
2906 A volatile operation may not call any code in the current module.
2908 The compiler may assume execution will continue after a volatile operation,
2909 so operations which modify memory or may have undefined behavior can be
2910 hoisted past a volatile operation.
2912 As an exception to the preceding rule, the compiler may not assume execution
2913 will continue after a volatile store operation. This restriction is necessary
2914 to support the somewhat common pattern in C of intentionally storing to an
2915 invalid pointer to crash the program. In the future, it might make sense to
2916 allow frontends to control this behavior.

IR-level volatile loads and stores cannot safely be optimized into
``llvm.memcpy`` or ``llvm.memmove`` intrinsics, even when those intrinsics are
flagged volatile. Likewise, the backend should never split or merge
target-legal volatile load/store instructions. Similarly, IR-level volatile
loads and stores cannot change from integer to floating-point or vice versa.

.. admonition:: Rationale

 Platforms may rely on volatile loads and stores of natively supported
 data width to be executed as a single instruction. For example, in C
 this holds for an l-value of volatile primitive type with native
 hardware support, but not necessarily for aggregate types. The
 frontend upholds these expectations, which are intentionally
 unspecified in the IR. The rules above ensure that IR transformations
 do not violate the frontend's contract with the language.

2936 Memory Model for Concurrent Operations
2937 --------------------------------------
2939 The LLVM IR does not define any way to start parallel threads of
2940 execution or to register signal handlers. Nonetheless, there are
2941 platform-specific ways to create them, and we define LLVM IR's behavior
2942 in their presence. This model is inspired by the C++0x memory model.
2944 For a more informal introduction to this model, see the :doc:`Atomics`.

We define a *happens-before* partial order as the least partial order
that:

- Is a superset of single-thread program order, and
- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
  ``b``. *Synchronizes-with* pairs are introduced by platform-specific
  techniques, like pthread locks, thread creation, thread joining,
  etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
  Constraints <ordering>`.)

2956 Note that program order does not introduce *happens-before* edges
2957 between a thread and signals executing inside that thread.
2959 Every (defined) read operation (load instructions, memcpy, atomic
2960 loads/read-modify-writes, etc.) R reads a series of bytes written by
2961 (defined) write operations (store instructions, atomic
2962 stores/read-modify-writes, memcpy, etc.). For the purposes of this
2963 section, initialized globals are considered to have a write of the
2964 initializer which is atomic and happens before any other read or write
2965 of the memory in question. For each byte of a read R, R\ :sub:`byte`
2966 may see any write to the same byte, except:
2968 - If write\ :sub:`1` happens before write\ :sub:`2`, and
2969 write\ :sub:`2` happens before R\ :sub:`byte`, then
2970 R\ :sub:`byte` does not see write\ :sub:`1`.
2971 - If R\ :sub:`byte` happens before write\ :sub:`3`, then
2972 R\ :sub:`byte` does not see write\ :sub:`3`.
2974 Given that definition, R\ :sub:`byte` is defined as follows:
- If R is volatile, the result is target-dependent. (Volatile is
  supposed to give guarantees which can support ``sig_atomic_t`` in
  C/C++, and may be used for accesses to addresses that do not behave
  like normal memory. It does not generally provide cross-thread
  synchronization.)
2981 - Otherwise, if there is no write to the same byte that happens before
2982 R\ :sub:`byte`, R\ :sub:`byte` returns ``undef`` for that byte.
2983 - Otherwise, if R\ :sub:`byte` may see exactly one write,
2984 R\ :sub:`byte` returns the value written by that write.
2985 - Otherwise, if R is atomic, and all the writes R\ :sub:`byte` may
2986 see are atomic, it chooses one of the values written. See the :ref:`Atomic
2987 Memory Ordering Constraints <ordering>` section for additional
2988 constraints on how the choice is made.
2989 - Otherwise R\ :sub:`byte` returns ``undef``.
2991 R returns the value composed of the series of bytes it read. This
2992 implies that some bytes within the value may be ``undef`` **without**
2993 the entire value being ``undef``. Note that this only defines the
2994 semantics of the operation; it doesn't mean that targets will emit more
2995 than one instruction to read the series of bytes.
2997 Note that in cases where none of the atomic intrinsics are used, this
2998 model places only one restriction on IR transformations on top of what
2999 is required for single-threaded execution: introducing a store to a byte
3000 which might not otherwise be stored is not allowed in general.
3001 (Specifically, in the case where another thread might write to and read
3002 from an address, introducing a store can change a load that may see
3003 exactly one write into a load that may see multiple writes.)
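
The restriction on introducing stores can be sketched with two functions.
The global ``@flag`` and both function names are hypothetical. The second
function is what a naive if-conversion of the first might produce; it is
equivalent in a single-threaded setting but is not a valid replacement in
general under this model.

.. code-block:: llvm

    @flag = global i32 0

    ; Original: @flag is stored to only when %c is true.
    define void @cond_store(i1 %c) {
    entry:
      br i1 %c, label %then, label %done
    then:
      store i32 1, i32* @flag
      br label %done
    done:
      ret void
    }

    ; Speculated form: @flag is now stored to on every path. When %c is
    ; false this introduces a store that did not exist before, so a load
    ; in another thread that could see exactly one write may now see
    ; several and yield undef.
    define void @cond_store_speculated(i1 %c) {
    entry:
      %old = load i32, i32* @flag
      %new = select i1 %c, i32 1, i32 %old
      store i32 %new, i32* @flag
      ret void
    }
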
3007 Atomic Memory Ordering Constraints
3008 ----------------------------------
3010 Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
3011 :ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
3012 :ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
3013 ordering parameters that determine which other atomic instructions on
3014 the same address they *synchronize with*. These semantics are borrowed
3015 from Java and C++0x, but are somewhat more colloquial. If these
3016 descriptions aren't precise enough, check those specs (see spec
3017 references in the :doc:`atomics guide <Atomics>`).
3018 :ref:`fence <i_fence>` instructions treat these orderings somewhat
3019 differently since they don't take an address. See that instruction's
3020 documentation for details.

For a simpler introduction to the ordering constraints, see the
:doc:`atomics guide <Atomics>`.

``unordered``
    The set of values that can be read is governed by the happens-before
    partial order. A value cannot be read unless some operation wrote
    it. This is intended to provide a guarantee strong enough to model
    Java's non-volatile shared variables. This ordering cannot be
    specified for read-modify-write operations; it is not strong enough
    to make them atomic in any interesting way.
``monotonic``
    In addition to the guarantees of ``unordered``, there is a single
    total order for modifications by ``monotonic`` operations on each
    address. All modification orders must be compatible with the
    happens-before order. There is no guarantee that the modification
    orders can be combined to a global total order for the whole program
    (and this often will not be possible). The read in an atomic
    read-modify-write operation (:ref:`cmpxchg <i_cmpxchg>` and
    :ref:`atomicrmw <i_atomicrmw>`) reads the value in the modification
    order immediately before the value it writes. If one atomic read
    happens before another atomic read of the same address, the later
    read must see the same value or a later value in the address's
    modification order. This disallows reordering of ``monotonic`` (or
    stronger) operations on the same address. If an address is written
    ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
    read that address repeatedly, the other threads must eventually see
    the write. This corresponds to the C++0x/C1x
    ``memory_order_relaxed``.
``acquire``
    In addition to the guarantees of ``monotonic``, a
    *synchronizes-with* edge may be formed with a ``release`` operation.
    This is intended to model C++'s ``memory_order_acquire``.
``release``
    In addition to the guarantees of ``monotonic``, if this operation
    writes a value which is subsequently read by an ``acquire``
    operation, it *synchronizes-with* that operation. (This isn't a
    complete description; see the C++0x definition of a release
    sequence.) This corresponds to the C++0x/C1x
    ``memory_order_release``.
``acq_rel`` (acquire+release)
    Acts as both an ``acquire`` and ``release`` operation on its
    address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
``seq_cst`` (sequentially consistent)
    In addition to the guarantees of ``acq_rel`` (``acquire`` for an
    operation that only reads, ``release`` for an operation that only
    writes), there is a global total order on all
    sequentially-consistent operations on all addresses, which is
    consistent with the *happens-before* partial order and with the
    modification orders of all the affected addresses. Each
    sequentially-consistent read sees the last preceding write to the
    same address in this global order. This corresponds to the C++0x/C1x
    ``memory_order_seq_cst`` and Java volatile.

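
A minimal message-passing sketch may help to see how a ``release`` store and
an ``acquire`` load form a *synchronizes-with* edge. The globals ``@data`` and
``@ready`` and the two functions are hypothetical; the functions are assumed
to run in different threads created by some platform-specific mechanism.

.. code-block:: llvm

    @data  = global i32 0
    @ready = global i32 0

    define void @producer() {
    entry:
      store i32 42, i32* @data                         ; plain store
      store atomic i32 1, i32* @ready release, align 4 ; publishes @data
      ret void
    }

    define i32 @consumer() {
    entry:
      br label %spin
    spin:
      %r = load atomic i32, i32* @ready acquire, align 4
      %set = icmp eq i32 %r, 1
      br i1 %set, label %read, label %spin
    read:
      ; The acquire load read the value written by the release store, so
      ; the store to @data happens before this load, which sees 42.
      %v = load i32, i32* @data
      ret i32 %v
    }

If both atomic operations were weakened to ``monotonic``, no
*synchronizes-with* edge would be formed and the final load of ``@data`` could
see more than one write.
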
3077 If an atomic operation is marked ``syncscope("singlethread")``, it only
3078 *synchronizes with* and only participates in the seq\_cst total orderings of
3079 other operations running in the same thread (for example, in signal handlers).

If an atomic operation is marked ``syncscope("<target-scope>")``, where
``<target-scope>`` is a target specific synchronization scope, then whether it
*synchronizes with* and participates in the seq\_cst total orderings of other
operations is target dependent.
3086 Otherwise, an atomic operation that is not marked ``syncscope("singlethread")``
3087 or ``syncscope("<target-scope>")`` *synchronizes with* and participates in the
3088 seq\_cst total orderings of other operations that are not marked
3089 ``syncscope("singlethread")`` or ``syncscope("<target-scope>")``.
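
The difference between the scopes can be sketched as follows. The global
``@pending`` and the function names are hypothetical. The first function only
orders its accesses with respect to code running in the same thread, such as a
signal handler, while the second uses the default scope.

.. code-block:: llvm

    @pending = global i32 0

    define void @publish_to_handler() {
    entry:
      store atomic i32 1, i32* @pending syncscope("singlethread") release, align 4
      fence syncscope("singlethread") seq_cst
      ret void
    }

    define void @publish_to_threads() {
    entry:
      ; No syncscope: synchronizes with other operations that also carry
      ; no syncscope marking.
      store atomic i32 1, i32* @pending release, align 4
      ret void
    }
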
3093 Floating-Point Environment
3094 --------------------------
3096 The default LLVM floating-point environment assumes that floating-point
3097 instructions do not have side effects. Results assume the round-to-nearest
3098 rounding mode. No floating-point exception state is maintained in this
3099 environment. Therefore, there is no attempt to create or preserve invalid
3100 operation (SNaN) or division-by-zero exceptions.
3102 The benefit of this exception-free assumption is that floating-point
3103 operations may be speculated freely without any other fast-math relaxations
3104 to the floating-point model.
3106 Code that requires different behavior than this should use the
3107 :ref:`Constrained Floating-Point Intrinsics <constrainedfp>`.
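
As a hedged sketch of what such code might look like, the function below adds
two doubles using the constrained ``fadd`` intrinsic with a dynamic rounding
mode and strict exception semantics. The function name is an assumption made
for the example.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

    define double @add_dynamic_rounding(double %a, double %b) strictfp {
    entry:
      ; The rounding mode is read from the environment at run time, and
      ; floating-point exceptions must be preserved, so this call may not
      ; be speculated or folded the way a plain fadd could be.
      %r = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
      ret double %r
    }
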
3114 LLVM IR floating-point operations (:ref:`fneg <i_fneg>`, :ref:`fadd <i_fadd>`,
3115 :ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`,
3116 :ref:`frem <i_frem>`, :ref:`fcmp <i_fcmp>`), :ref:`phi <i_phi>`,
3117 :ref:`select <i_select>` and :ref:`call <i_call>`
3118 may use the following flags to enable otherwise unsafe
3119 floating-point transformations.
No NaNs - Allow optimizations to assume the arguments and result are not
NaN. If an argument is a NaN, or the result would be a NaN, it produces
a :ref:`poison value <poisonvalues>` instead.
3127 No Infs - Allow optimizations to assume the arguments and result are not
3128 +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it
3129 produces a :ref:`poison value <poisonvalues>` instead.
3132 No Signed Zeros - Allow optimizations to treat the sign of a zero
3133 argument or result as insignificant. This does not imply that -0.0
3134 is poison and/or guaranteed to not exist in the operation.
3137 Allow Reciprocal - Allow optimizations to use the reciprocal of an
3138 argument rather than perform division.
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add). This does not enable reassociating
to form arbitrary contractions. For example, ``(a*b) + (c*d) + e`` cannot
be transformed into ``(a*b) + ((c*d) + e)`` to create two fma operations.
3147 Approximate functions - Allow substitution of approximate calculations for
3148 functions (sin, log, sqrt, etc). See floating-point intrinsic definitions
3149 for places where this can apply to LLVM's intrinsic math functions.
3152 Allow reassociation transformations for floating-point instructions.
3153 This may dramatically change results in floating-point.
3156 This flag implies all of the others.
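
The sketch below shows how the flags are written on individual instructions,
using the keywords ``nnan``, ``ninf``, ``arcp``, ``contract`` and ``fast``.
The function and value names are hypothetical, and the comments describe what
an optimizer is permitted to do, not what any particular pass will do.

.. code-block:: llvm

    define float @fmf_demo(float %a, float %b, float %c) {
    entry:
      ; Both operations carry 'contract', so they may be fused into a
      ; single fused multiply-and-add.
      %prod = fmul contract float %a, %b
      %sum = fadd contract float %prod, %c
      ; 'arcp' permits rewriting the division as %sum * (1.0 / %c).
      %quot = fdiv arcp float %sum, %c
      ; With 'nnan ninf', a NaN or infinite operand or result yields poison.
      %clean = fadd nnan ninf float %quot, %a
      ; 'fast' implies all of the other flags.
      %loose = fadd fast float %clean, %b
      ret float %loose
    }
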
3160 Use-list Order Directives
3161 -------------------------
3163 Use-list directives encode the in-memory order of each use-list, allowing the
3164 order to be recreated. ``<order-indexes>`` is a comma-separated list of
3165 indexes that are assigned to the referenced value's uses. The referenced
3166 value's use-list is immediately sorted by these indexes.
3168 Use-list directives may appear at function scope or global scope. They are not
3169 instructions, and have no effect on the semantics of the IR. When they're at
3170 function scope, they must appear after the terminator of the final basic block.

If basic blocks have their address taken via ``blockaddress()`` expressions,
``uselistorder_bb`` can be used to reorder their use-lists from outside their
function's scope.

::

    uselistorder <ty> <value>, { <order-indexes> }
    uselistorder_bb @function, %block, { <order-indexes> }


::

    define void @foo(i32 %arg1, i32 %arg2) {
      ; ... instructions ...
    bb:
      ; ... instructions ...

      ; At function scope.
      uselistorder i32 %arg1, { 1, 0, 2 }
      uselistorder label %bb, { 1, 0 }
    }

    ; At global scope.
    uselistorder i32* @global, { 1, 2, 0 }
    uselistorder i32 7, { 1, 0 }
    uselistorder i32 (i32) @bar, { 1, 0 }
    uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 }

3204 .. _source_filename:
3209 The *source filename* string is set to the original module identifier,
3210 which will be the name of the compiled source file when compiling from
3211 source through the clang front end, for example. It is then preserved through
3214 This is currently necessary to generate a consistent unique global
3215 identifier for local functions used in profile data, which prepends the
3216 source file name to the local function name.
3218 The syntax for the source file name is simply:
3220 .. code-block:: text
3222 source_filename = "/path/to/source.c"
3229 The LLVM type system is one of the most important features of the
3230 intermediate representation. Being typed enables a number of
3231 optimizations to be performed on the intermediate representation
3232 directly, without having to do extra analyses on the side before the
3233 transformation. A strong type system makes it easier to read the
3234 generated code and enables novel analyses and transformations that are
3235 not feasible to perform on normal three address code representations.
3245 The void type does not represent any value and has no size.
3263 The function type can be thought of as a function signature. It consists of a
3264 return type and a list of formal parameter types. The return type of a function
3265 type is a void type or first class type --- except for :ref:`label <t_label>`
3266 and :ref:`metadata <t_metadata>` types.
3272 <returntype> (<parameter list>)
3274 ...where '``<parameter list>``' is a comma-separated list of type
3275 specifiers. Optionally, the parameter list may include a type ``...``, which
3276 indicates that the function takes a variable number of arguments. Variable
3277 argument functions can access their arguments with the :ref:`variable argument
3278 handling intrinsic <int_varargs>` functions. '``<returntype>``' is any type
3279 except :ref:`label <t_label>` and :ref:`metadata <t_metadata>`.
3283 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3284 | ``i32 (i32)`` | function taking an ``i32``, returning an ``i32`` |
3285 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3286 | ``float (i16, i32 *) *`` | :ref:`Pointer <t_pointer>` to a function that takes an ``i16`` and a :ref:`pointer <t_pointer>` to ``i32``, returning ``float``. |
3287 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3288 | ``i32 (i8*, ...)`` | A vararg function that takes at least one :ref:`pointer <t_pointer>` to ``i8`` (char in C), which returns an integer. This is the signature for ``printf`` in LLVM. |
3289 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3290 | ``{i32, i32} (i32)`` | A function taking an ``i32``, returning a :ref:`structure <t_struct>` containing two ``i32`` values |
3291 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
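
To connect the vararg function type in the table above with its use, here is
a minimal sketch that declares ``printf`` and calls it. The format string
global ``@.fmt`` and the wrapper ``@print_int`` are hypothetical names. Note
that the call spells out the full function type because the callee is
variadic.

.. code-block:: llvm

    @.fmt = private unnamed_addr constant [4 x i8] c"%d\0A\00"

    declare i32 @printf(i8*, ...)

    define i32 @print_int(i32 %x) {
    entry:
      %fmt = getelementptr inbounds [4 x i8], [4 x i8]* @.fmt, i64 0, i64 0
      %n = call i32 (i8*, ...) @printf(i8* %fmt, i32 %x)
      ret i32 %n
    }
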
The :ref:`first class <t_firstclass>` types are perhaps the most important.
Values of these types are the only ones which can be produced by
instructions.
3307 These are the types that are valid in registers from CodeGen's perspective.
The integer type simply specifies an arbitrary bit width for the desired
integer. Any bit width from 1 bit to 2\ :sup:`23`\ (about 8 million) can be
specified.
The number of bits the integer will occupy is specified by the ``N``
value.
3332 +----------------+------------------------------------------------+
3333 | ``i1`` | a single-bit integer. |
3334 +----------------+------------------------------------------------+
3335 | ``i32`` | a 32-bit integer. |
3336 +----------------+------------------------------------------------+
3337 | ``i1942652`` | a really big integer of over 1 million bits. |
3338 +----------------+------------------------------------------------+
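
Bit widths need not match any machine register size. The hypothetical
function below performs arithmetic on a 7-bit integer and then widens the
result.

.. code-block:: llvm

    define i64 @bump_and_widen(i7 %x) {
    entry:
      %r = add i7 %x, 1        ; two's complement arithmetic modulo 2^7
      %w = zext i7 %r to i64   ; zero-extend the 7-bit result to 64 bits
      ret i64 %w
    }
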
3342 Floating-Point Types
3343 """"""""""""""""""""
3352 - 16-bit floating-point value
3355 - 16-bit "brain" floating-point value (7-bit significand). Provides the
3356 same number of exponent bits as ``float``, so that it matches its dynamic
3357 range, but with greatly reduced precision. Used in Intel's AVX-512 BF16
3358 extensions and Arm's ARMv8.6-A extensions, among others.
3361 - 32-bit floating-point value
3364 - 64-bit floating-point value
3367 - 128-bit floating-point value (113-bit significand)
3370 - 80-bit floating-point value (X87)
3373 - 128-bit floating-point value (two 64-bits)
The binary formats of half, float, double, and fp128 correspond to the
IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128,
respectively.
The x86_amx type represents a value held in an AMX tile register on an x86
machine. The operations allowed on it are quite limited. Only a few intrinsics
are allowed: stride load and store, zero and dot product. No instruction is
allowed for this type. There are no arguments, arrays, pointers, vectors
or constants of this type.
The x86_mmx type represents a value held in an MMX register on an x86
machine. The operations allowed on it are quite limited: parameters and
return values, load and store, and bitcast. User-specified MMX
instructions are represented as intrinsic or asm calls with arguments
and/or results of this type. There are no arrays, vectors or constants
of this type.
3423 The pointer type is used to specify memory locations. Pointers are
3424 commonly used to reference objects in memory.
3426 Pointer types may have an optional address space attribute defining the
3427 numbered address space where the pointed-to object resides. The default
3428 address space is number zero. The semantics of non-zero address spaces
3429 are target-specific.
3431 Note that LLVM does not permit pointers to void (``void*``) nor does it
3432 permit pointers to labels (``label*``). Use ``i8*`` instead.
3434 LLVM is in the process of transitioning to
3435 `opaque pointers <OpaquePointers.html#opaque-pointers>`_.
3436 Opaque pointers do not have a pointee type. Rather, instructions
3437 interacting through pointers specify the type of the underlying memory
3438 they are interacting with. Opaque pointers are still in the process of
3439 being worked on and are not complete.
3450 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3451 | ``[4 x i32]*`` | A :ref:`pointer <t_pointer>` to :ref:`array <t_array>` of four ``i32`` values. |
3452 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3453 | ``i32 (i32*) *`` | A :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32*``, returning an ``i32``. |
3454 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3455 | ``i32 addrspace(5)*`` | A :ref:`pointer <t_pointer>` to an ``i32`` value that resides in address space 5. |
3456 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3457 | ``ptr`` | An opaque pointer type to a value that resides in address space 0. |
3458 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3459 | ``ptr addrspace(5)`` | An opaque pointer type to a value that resides in address space 5. |
3460 +-------------------------+--------------------------------------------------------------------------------------------------------------+
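
As a brief sketch of the address space row in the table above, the
hypothetical function below loads through a typed pointer into address
space 5. What address space 5 means is target-specific.

.. code-block:: llvm

    define i32 @load_from_as5(i32 addrspace(5)* %p) {
    entry:
      ; The pointer type carries both the pointee type and the address
      ; space. With opaque pointers the same load would be spelled
      ;   load i32, ptr addrspace(5) %p
      ; where the loaded type comes only from the instruction.
      %v = load i32, i32 addrspace(5)* %p
      ret i32 %v
    }
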
A vector type is a simple derived type that represents a vector of
elements. Vector types are used when multiple primitive data are
operated on in parallel using a single instruction (SIMD). A vector type
requires a size (number of elements), an underlying primitive data type,
and a scalable property to represent vectors where the exact hardware
vector length is unknown at compile time. Vector types are considered
:ref:`first class <t_firstclass>`.

In general vector elements are laid out in memory in the same way as
:ref:`array types <t_array>`. Such an analogy works fine as long as the vector
elements are byte sized. However, when the elements of the vector aren't byte
sized it gets a bit more complicated. One way to describe the layout is by
describing what happens when a vector such as <N x iM> is bitcast to an
integer type with N*M bits, and then following the rules for storing such an
integer to memory.

A bitcast from a vector type to a scalar integer type will see the elements
being packed together (without padding). The order in which elements are
inserted in the integer depends on endianness. For little endian element zero
is put in the least significant bits of the integer, and for big endian
element zero is put in the most significant bits.

3493 Using a vector such as ``<i4 1, i4 2, i4 3, i4 5>`` as an example, together
3494 with the analogy that we can replace a vector store by a bitcast followed by
3495 an integer store, we get this for big endian:
3497 .. code-block:: llvm
3499 %val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16
3501 ; Bitcasting from a vector to an integral type can be seen as
3502 ; concatenating the values:
3503 ; %val now has the hexadecimal value 0x1235.
3505 store i16 %val, i16* %ptr
3507 ; In memory the content will be (8-bit addressing):
3509 ; [%ptr + 0]: 00010010 (0x12)
3510 ; [%ptr + 1]: 00110101 (0x35)

The same example for little endian:

.. code-block:: llvm

      %val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

      ; Bitcasting from a vector to an integral type can be seen as
      ; concatenating the values:
      ;   %val now has the hexadecimal value 0x5321.

      store i16 %val, i16* %ptr

      ; In memory the content will be (8-bit addressing):
      ;
      ;    [%ptr + 0]: 00100001  (0x21)
      ;    [%ptr + 1]: 01010011  (0x53)

3529 When ``<N*M>`` isn't evenly divisible by the byte size the exact memory layout
3530 is unspecified (just like it is for an integral type of the same size). This
3531 is because different targets could put the padding at different positions when
3532 the type size is smaller than the type's store size.
3538 < <# elements> x <elementtype> > ; Fixed-length vector
3539 < vscale x <# elements> x <elementtype> > ; Scalable vector
3541 The number of elements is a constant integer value larger than 0;
3542 elementtype may be any integer, floating-point or pointer type. Vectors
3543 of size zero are not allowed. For scalable vectors, the total number of
3544 elements is a constant multiple (called vscale) of the specified number
3545 of elements; vscale is a positive integer that is unknown at compile time
3546 and the same hardware-dependent constant for all scalable vectors at run
3547 time. The size of a specific scalable vector type is thus constant within
3548 IR, even if the exact size in bytes cannot be determined until run time.
3552 +------------------------+----------------------------------------------------+
3553 | ``<4 x i32>`` | Vector of 4 32-bit integer values. |
3554 +------------------------+----------------------------------------------------+
3555 | ``<8 x float>`` | Vector of 8 32-bit floating-point values. |
3556 +------------------------+----------------------------------------------------+
3557 | ``<2 x i64>`` | Vector of 2 64-bit integer values. |
3558 +------------------------+----------------------------------------------------+
3559 | ``<4 x i64*>`` | Vector of 4 pointers to 64-bit integer values. |
3560 +------------------------+----------------------------------------------------+
3561 | ``<vscale x 4 x i32>`` | Vector with a multiple of 4 32-bit integer values. |
3562 +------------------------+----------------------------------------------------+
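
As a minimal sketch of how fixed-length and scalable vector types appear in
practice, the two functions below perform the same element-wise addition. The
function names ``@add_fixed`` and ``@add_scalable`` are hypothetical and chosen
only for illustration.

.. code-block:: llvm

    ; Element-wise add on a fixed-length vector of four i32 values.
    define <4 x i32> @add_fixed(<4 x i32> %a, <4 x i32> %b) {
      %sum = add <4 x i32> %a, %b
      ret <4 x i32> %sum
    }

    ; The same operation on a scalable vector. The element count is
    ; vscale * 4, where vscale is only known at run time.
    define <vscale x 4 x i32> @add_scalable(<vscale x 4 x i32> %a,
                                            <vscale x 4 x i32> %b) {
      %sum = add <vscale x 4 x i32> %a, %b
      ret <vscale x 4 x i32> %sum
    }
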
3571 The label type represents code labels.
3586 The token type is used when a value is associated with an instruction
3587 but all uses of the value must not attempt to introspect or obscure it.
3588 As such, it is not appropriate to have a :ref:`phi <i_phi>` or
3589 :ref:`select <i_select>` of type token.
3606 The metadata type represents embedded metadata. No derived types may be
3607 created from metadata except for :ref:`function <t_function>` arguments.
3620 Aggregate Types are a subset of derived types that can contain multiple
3621 member types. :ref:`Arrays <t_array>` and :ref:`structs <t_struct>` are
3622 aggregate types. :ref:`Vectors <t_vector>` are not considered to be
3632 The array type is a very simple derived type that arranges elements
3633 sequentially in memory. The array type requires a size (number of
3634 elements) and an underlying data type.
3640 [<# elements> x <elementtype>]
3642 The number of elements is a constant integer value; ``elementtype`` may
3643 be any type with a size.
3647 +------------------+--------------------------------------+
3648 | ``[40 x i32]`` | Array of 40 32-bit integer values. |
3649 +------------------+--------------------------------------+
3650 | ``[41 x i32]`` | Array of 41 32-bit integer values. |
3651 +------------------+--------------------------------------+
3652 | ``[4 x i8]`` | Array of 4 8-bit integer values. |
3653 +------------------+--------------------------------------+
3655 Here are some examples of multidimensional arrays:
3657 +-----------------------------+----------------------------------------------------------+
3658 | ``[3 x [4 x i32]]`` | 3x4 array of 32-bit integer values. |
3659 +-----------------------------+----------------------------------------------------------+
3660 | ``[12 x [10 x float]]`` | 12x10 array of single precision floating-point values. |
3661 +-----------------------------+----------------------------------------------------------+
3662 | ``[2 x [3 x [4 x i16]]]`` | 2x3x4 array of 16-bit integer values. |
3663 +-----------------------------+----------------------------------------------------------+
3665 There is no restriction on indexing beyond the end of the array implied
3666 by a static type (though there are restrictions on indexing beyond the
3667 bounds of an allocated object in some cases). This means that
3668 single-dimension 'variable sized array' addressing can be implemented in
3669 LLVM with a zero length array type. An implementation of 'pascal style
3670 arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
3680 The structure type is used to represent a collection of data members
3681 together in memory. The elements of a structure may be any type that has
3684 Structures in memory are accessed using '``load``' and '``store``' by
3685 getting a pointer to a field with the '``getelementptr``' instruction.
3686 Structures in registers are accessed using the '``extractvalue``' and
3687 '``insertvalue``' instructions.
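
The two access styles can be sketched as follows. The type ``%struct.Pair``
and both function names are hypothetical; the example assumes the
typed-pointer syntax used elsewhere in this document.

.. code-block:: llvm

    %struct.Pair = type { i32, float }

    ; Structure in memory: compute the field address, then load from it.
    define float @second_from_memory(%struct.Pair* %p) {
      %field = getelementptr inbounds %struct.Pair, %struct.Pair* %p, i32 0, i32 1
      %v = load float, float* %field
      ret float %v
    }

    ; Structure in a register: extract the field directly.
    define float @second_from_register(%struct.Pair %p) {
      %v = extractvalue %struct.Pair %p, 1
      ret float %v
    }
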
3689 Structures may optionally be "packed" structures, which indicate that
3690 the alignment of the struct is one byte, and that there is no padding
3691 between the elements. In non-packed structs, padding between field types
3692 is inserted as defined by the DataLayout string in the module, which is
3693 required to match what the underlying code generator expects.
3695 Structures can either be "literal" or "identified". A literal structure
3696 is defined inline with other types (e.g. ``{i32, i32}*``) whereas
3697 identified types are always defined at the top level with a name.
3698 Literal types are uniqued by their contents and can never be recursive
or opaque since there is no way to write one. Identified types can be
recursive, can be opaque, and are never uniqued.
3706 %T1 = type { <type list> } ; Identified normal struct type
3707 %T2 = type <{ <type list> }> ; Identified packed struct type
3711 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3712 | ``{ i32, i32, i32 }`` | A triple of three ``i32`` values |
3713 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3714 | ``{ float, i32 (i32) * }`` | A pair, where the first element is a ``float`` and the second element is a :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32``, returning an ``i32``. |
3715 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3716 | ``<{ i8, i32 }>`` | A packed struct known to be 5 bytes in size. |
3717 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
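
A short sketch contrasting the forms described above. All names (``%Node``,
``@head``, ``@pair``, ``@packed``) are hypothetical.

.. code-block:: llvm

    ; Identified struct type: may refer to itself through a pointer.
    %Node = type { i32, %Node* }
    @head = global %Node { i32 0, %Node* null }

    ; Literal struct type, written inline where it is used.
    @pair = global { i32, i32 } { i32 1, i32 2 }

    ; Literal packed struct: no padding between the i8 and the i32.
    @packed = global <{ i8, i32 }> <{ i8 1, i32 2 }>
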
3721 Opaque Structure Types
3722 """"""""""""""""""""""
3726 Opaque structure types are used to represent structure types that
3727 do not have a body specified. This corresponds (for example) to the C
3728 notion of a forward declared structure. They can be named (``%X``) or
3740 +--------------+-------------------+
3741 | ``opaque`` | An opaque type. |
3742 +--------------+-------------------+
3749 LLVM has several different basic types of constants. This section
3750 describes them all and their syntax.
3755 **Boolean constants**
3756 The two strings '``true``' and '``false``' are both valid constants
3758 **Integer constants**
3759 Standard integers (such as '4') are constants of the
3760 :ref:`integer <t_integer>` type. Negative numbers may be used with
3762 **Floating-point constants**
3763 Floating-point constants use standard decimal notation (e.g.
3764 123.421), exponential notation (e.g. 1.23421e+2), or a more precise
3765 hexadecimal notation (see below). The assembler requires the exact
3766 decimal value of a floating-point constant. For example, the
3767 assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
3768 decimal in binary. Floating-point constants must have a
3769 :ref:`floating-point <t_floating>` type.
3770 **Null pointer constants**
3771 The identifier '``null``' is recognized as a null pointer constant
3772 and must be of :ref:`pointer type <t_pointer>`.
3774 The identifier '``none``' is recognized as an empty token constant
3775 and must be of :ref:`token type <t_token>`.
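
For illustration, a few simple constants used as global initializers. The
global names are hypothetical.

.. code-block:: llvm

    @flag    = global i1 true        ; boolean constant
    @count   = global i32 -4         ; negative integer constant
    @ratio   = global float 1.25     ; exactly representable decimal
    @nothing = global i32* null      ; null pointer constant
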
3777 The one non-intuitive notation for constants is the hexadecimal form of
3778 floating-point constants. For example, the form
3779 '``double 0x432ff973cafa8000``' is equivalent to (but harder to read
3780 than) '``double 4.5e+15``'. The only time hexadecimal floating-point
3781 constants are required (and the only time that they are generated by the
3782 disassembler) is when a floating-point constant must be emitted but it
3783 cannot be represented as a decimal floating-point number in a reasonable
number of digits. For example, NaNs, infinities, and other special
3785 values are represented in their IEEE hexadecimal format so that assembly
3786 and disassembly do not cause any bits to change in the constants.
3788 When using the hexadecimal form, constants of types bfloat, half, float, and
3789 double are represented using the 16-digit form shown above (which matches the
3790 IEEE754 representation for double); bfloat, half and float values must, however,
3791 be exactly representable as bfloat, IEEE 754 half, and IEEE 754 single
3792 precision respectively. Hexadecimal format is always used for long double, and
3793 there are three forms of long double. The 80-bit format used by x86 is
3794 represented as ``0xK`` followed by 20 hexadecimal digits. The 128-bit format
3795 used by PowerPC (two adjacent doubles) is represented by ``0xM`` followed by 32
3796 hexadecimal digits. The IEEE 128-bit format is represented by ``0xL`` followed
3797 by 32 hexadecimal digits. Long doubles will only work if they match the long
3798 double format on your target. The IEEE 16-bit format (half precision) is
3799 represented by ``0xH`` followed by 4 hexadecimal digits. The bfloat 16-bit
3800 format is represented by ``0xR`` followed by 4 hexadecimal digits. All
3801 hexadecimal formats are big-endian (sign bit at the left).
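
The following sketch shows several of the hexadecimal forms side by side. The
global names are hypothetical, and the bit patterns are the standard encodings
of the values noted in the comments.

.. code-block:: llvm

    @d    = global double 0x432FF973CAFA8000        ; 4.5e+15
    @f    = global float 0x3FF4000000000000         ; 1.25, in the 16-digit double form
    @inf  = global double 0x7FF0000000000000        ; +infinity
    @nan  = global double 0x7FF8000000000000        ; a quiet NaN
    @h    = global half 0xH3C00                     ; 1.0 as IEEE half
    @bf   = global bfloat 0xR3F80                   ; 1.0 as bfloat
    @x87  = global x86_fp80 0xK3FFF8000000000000000 ; 1.0 in the x86 80-bit format
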
There are no constants of type x86_mmx or x86_amx.
3805 .. _complexconstants:
3810 Complex constants are a (potentially recursive) combination of simple
3811 constants and smaller complex constants.
3813 **Structure constants**
3814 Structure constants are represented with notation similar to
3815 structure type definitions (a comma separated list of elements,
3816 surrounded by braces (``{}``)). For example:
3817 "``{ i32 4, float 17.0, i32* @G }``", where "``@G``" is declared as
3818 "``@G = external global i32``". Structure constants must have
3819 :ref:`structure type <t_struct>`, and the number and types of elements
3820 must match those specified by the type.
3822 Array constants are represented with notation similar to array type
3823 definitions (a comma separated list of elements, surrounded by
3824 square brackets (``[]``)). For example:
3825 "``[ i32 42, i32 11, i32 74 ]``". Array constants must have
3826 :ref:`array type <t_array>`, and the number and types of elements must
3827 match those specified by the type. As a special case, character array
3828 constants may also be represented as a double-quoted string using the ``c``
3829 prefix. For example: "``c"Hello World\0A\00"``".
3830 **Vector constants**
3831 Vector constants are represented with notation similar to vector
3832 type definitions (a comma separated list of elements, surrounded by
angle brackets (``<>``)). For example:
3834 "``< i32 42, i32 11, i32 74, i32 100 >``". Vector constants
3835 must have :ref:`vector type <t_vector>`, and the number and types of
3836 elements must match those specified by the type.
3837 **Zero initialization**
The string '``zeroinitializer``' can be used to initialize a value of
*any* type to zero, including scalar and
3840 :ref:`aggregate <t_aggregate>` types. This is often used to avoid
3841 having to print large zero initializers (e.g. for large arrays) and
3842 is always exactly equivalent to using explicit zero initializers.
3844 A metadata node is a constant tuple without types. For example:
3845 "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values,
3846 for example: "``!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}``".
3847 Unlike other typed constants that are meant to be interpreted as part of
3848 the instruction stream, metadata is a place to attach additional
3849 information such as debug info.
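
The complex constant forms above can be combined into a small module like the
following sketch. Apart from ``@G``, which follows the declaration quoted
above, all global names are hypothetical.

.. code-block:: llvm

    @G = external global i32

    ; Structure constant: element count and types match the struct type.
    @s = global { i32, float, i32* } { i32 4, float 17.0, i32* @G }

    ; Array constant, and the equivalent string shorthand for i8 arrays.
    @a   = global [3 x i32] [ i32 42, i32 11, i32 74 ]
    @str = constant [13 x i8] c"Hello World\0A\00"

    ; Vector constant.
    @v = global <4 x i32> < i32 42, i32 11, i32 74, i32 100 >

    ; Zero initialization of a large aggregate.
    @z = global [100 x i32] zeroinitializer
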
3851 Global Variable and Function Addresses
3852 --------------------------------------
3854 The addresses of :ref:`global variables <globalvars>` and
3855 :ref:`functions <functionstructure>` are always implicitly valid
3856 (link-time) constants. These constants are explicitly referenced when
3857 the :ref:`identifier for the global <identifiers>` is used and always have
3858 :ref:`pointer <t_pointer>` type. For example, the following is a legal LLVM
3861 .. code-block:: llvm
3865 @Z = global [2 x i32*] [ i32* @X, i32* @Y ]
3872 The string '``undef``' can be used anywhere a constant is expected, and
3873 indicates that the user of the value may receive an unspecified
3874 bit-pattern. Undefined values may be of any type (other than '``label``'
3875 or '``void``') and be used anywhere a constant is permitted.
3877 Undefined values are useful because they indicate to the compiler that
3878 the program is well defined no matter what value is used. This gives the
3879 compiler more freedom to optimize. Here are some examples of
3880 (potentially surprising) transformations that are valid (in pseudo IR):
3882 .. code-block:: llvm
3892 This is safe because all of the output bits are affected by the undef
3893 bits. Any output bit can have a zero or one depending on the input bits.
3895 .. code-block:: llvm
3903 %A = %X ;; By choosing undef as 0
3904 %B = %X ;; By choosing undef as -1
3909 These logical operations have bits that are not always affected by the
3910 input. For example, if ``%X`` has a zero bit, then the output of the
3911 '``and``' operation will always be a zero for that bit, no matter what
3912 the corresponding bit from the '``undef``' is. As such, it is unsafe to
3913 optimize or assume that the result of the '``and``' is '``undef``'.
3914 However, it is safe to assume that all bits of the '``undef``' could be
3915 0, and optimize the '``and``' to 0. Likewise, it is safe to assume that
3916 all the bits of the '``undef``' operand to the '``or``' could be set,
3917 allowing the '``or``' to be folded to -1.
3919 .. code-block:: llvm
3921 %A = select undef, %X, %Y
3922 %B = select undef, 42, %Y
3923 %C = select %X, %Y, undef
3933 This set of examples shows that undefined '``select``' (and conditional
3934 branch) conditions can go *either way*, but they have to come from one
3935 of the two operands. In the ``%A`` example, if ``%X`` and ``%Y`` were
3936 both known to have a clear low bit, then ``%A`` would have to have a
3937 cleared low bit. However, in the ``%C`` example, the optimizer is
3938 allowed to assume that the '``undef``' operand could be the same as
3939 ``%Y``, allowing the whole '``select``' to be eliminated.
3941 .. code-block:: llvm
3943 %A = xor undef, undef
This example points out that two '``undef``' operands are not
necessarily the same. This can be surprising to people who assume that
"``X^X``" is always zero, even if ``X`` is undefined (the behavior also
matches C semantics). This isn't true for a number of reasons, but the
3964 short answer is that an '``undef``' "variable" can arbitrarily change
3965 its value over its "live range". This is true because the variable
3966 doesn't actually *have a live range*. Instead, the value is logically
3967 read from arbitrary registers that happen to be around when needed, so
3968 the value is not necessarily consistent over time. In fact, ``%A`` and
3969 ``%C`` need to have the same semantics or the core LLVM "replace all
3970 uses with" concept would not hold.
3972 To ensure all uses of a given register observe the same value (even if
3973 '``undef``'), the :ref:`freeze instruction <i_freeze>` can be used.
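
A minimal sketch of this use of ``freeze``; the function name ``@xor_frozen``
is hypothetical.

.. code-block:: llvm

    define i32 @xor_frozen() {
      ; %f is an arbitrary but fixed i32 value.
      %f = freeze i32 undef
      ; Both operands observe the same value, so the result is always 0.
      %r = xor i32 %f, %f
      ret i32 %r
    }
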
3975 .. code-block:: llvm
3983 These examples show the crucial difference between an *undefined value*
3984 and *undefined behavior*. An undefined value (like '``undef``') is
3985 allowed to have an arbitrary bit-pattern. This means that the ``%A``
3986 operation can be constant folded to '``0``', because the '``undef``'
3987 could be zero, and zero divided by any value is zero.
3988 However, in the second example, we can make a more aggressive
3989 assumption: because the ``undef`` is allowed to be an arbitrary value,
3990 we are allowed to assume that it could be zero. Since a divide by zero
3991 has *undefined behavior*, we are allowed to assume that the operation
3992 does not execute at all. This allows us to delete the divide and all
3993 code after it. Because the undefined operation "can't happen", the
3994 optimizer can assume that it occurs in dead code.
3996 .. code-block:: text
3998 a: store undef -> %X
3999 b: store %X -> undef
4004 A store *of* an undefined value can be assumed to not have any effect;
4005 we can assume that the value is overwritten with bits that happen to
match what was already there. However, a store *to* an undefined
location could clobber arbitrary memory; therefore, it has undefined
4010 Branching on an undefined value is undefined behavior.
4011 This explains optimizations that depend on branch conditions to construct
4012 predicates, such as Correlated Value Propagation and Global Value Numbering.
In the case of a switch instruction, the branch condition should be frozen;
otherwise it is undefined behavior.
4016 .. code-block:: llvm
4019 br undef, BB1, BB2 ; UB
4021 %X = and i32 undef, 255
4022 switch %X, label %ret [ .. ] ; UB
4024 store undef, i8* %ptr
4025 %X = load i8* %ptr ; %X is undef
4026 switch i8 %X, label %ret [ .. ] ; UB
4029 %X = or i8 undef, 255 ; always 255
4030 switch i8 %X, label %ret [ .. ] ; Well-defined
4032 %X = freeze i1 undef
4033 br %X, BB1, BB2 ; Well-defined (non-deterministic jump)
4036 This is also consistent with the behavior of MemorySanitizer.
MemorySanitizer, a detector of uses of uninitialized memory,
defines a branch with a condition that depends on an undef value (or
certain other values, such as the result of a load from heap-allocated
memory that has never been stored to) to have an externally visible
side effect. For this reason, functions with the *sanitize_memory*
attribute are not allowed to produce such branches "out of thin
4043 air". More strictly, an optimization that inserts a conditional branch
4044 is only valid if in all executions where the branch condition has at
4045 least one undefined bit, the same branch condition is evaluated in the
4053 A poison value is a result of an erroneous operation.
4054 In order to facilitate speculative execution, many instructions do not
4055 invoke immediate undefined behavior when provided with illegal operands,
4056 and return a poison value instead.
4057 The string '``poison``' can be used anywhere a constant is expected, and
4058 operations such as :ref:`add <i_add>` with the ``nsw`` flag can produce
4061 Poison value behavior is defined in terms of value *dependence*:
4063 - Values other than :ref:`phi <i_phi>` nodes, :ref:`select <i_select>`, and
4064 :ref:`freeze <i_freeze>` instructions depend on their operands.
4065 - :ref:`Phi <i_phi>` nodes depend on the operand corresponding to
4066 their dynamic predecessor basic block.
4067 - :ref:`Select <i_select>` instructions depend on their condition operand and
4068 their selected operand.
4069 - Function arguments depend on the corresponding actual argument values
4070 in the dynamic callers of their functions.
4071 - :ref:`Call <i_call>` instructions depend on the :ref:`ret <i_ret>`
4072 instructions that dynamically transfer control back to them.
4073 - :ref:`Invoke <i_invoke>` instructions depend on the
4074 :ref:`ret <i_ret>`, :ref:`resume <i_resume>`, or exception-throwing
4075 call instructions that dynamically transfer control back to them.
4076 - Non-volatile loads and stores depend on the most recent stores to all
4077 of the referenced memory addresses, following the order in the IR
4078 (including loads and stores implied by intrinsics such as
:ref:`@llvm.memcpy <int_memcpy>`).
4080 - An instruction with externally visible side effects depends on the
4081 most recent preceding instruction with externally visible side
4082 effects, following the order in the IR. (This includes :ref:`volatile
4083 operations <volatile>`.)
4084 - An instruction *control-depends* on a :ref:`terminator
4085 instruction <terminators>` if the terminator instruction has
4086 multiple successors and the instruction is always executed when
4087 control transfers to one of the successors, and may not be executed
4088 when control is transferred to another.
4089 - Additionally, an instruction also *control-depends* on a terminator
4090 instruction if the set of instructions it otherwise depends on would
4091 be different if the terminator had transferred control to a different
4093 - Dependence is transitive.
4094 - Vector elements may be independently poisoned. Therefore, transforms
4095 on instructions such as shufflevector must be careful to propagate
4096 poison across values or elements only as allowed by the original code.
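
As an illustrative sketch of the ``select`` rule above, the result below
depends only on the condition and on the operand that is actually selected.
The function name ``@select_dep`` is hypothetical.

.. code-block:: llvm

    define i32 @select_dep(i1 %c, i32 %x) {
      ; %p is poison when %x is the largest signed i32 value.
      %p = add nsw i32 %x, 1
      ; When %c is true, %s depends on %c and on the constant 0 only, so a
      ; poison %p does not reach %s. When %c is false, %s depends on %p.
      %s = select i1 %c, i32 0, i32 %p
      ret i32 %s
    }
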
An instruction that *depends* on a poison value produces a poison value
4099 itself. A poison value may be relaxed into an
4100 :ref:`undef value <undefvalues>`, which takes an arbitrary bit-pattern.
4101 Propagation of poison can be stopped with the
4102 :ref:`freeze instruction <i_freeze>`.
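
A minimal sketch of stopping poison propagation with ``freeze``; the function
name ``@stop_poison`` is hypothetical.

.. code-block:: llvm

    define i32 @stop_poison(i32 %x) {
      ; Poison if the signed addition overflows.
      %p = add nsw i32 %x, 1
      ; If %p is poison, %f is an arbitrary but fixed i32 value instead.
      %f = freeze i32 %p
      ; %r depends on %f, which is never poison, so %r is never poison.
      %r = and i32 %f, 1
      ret i32 %r
    }
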
4104 This means that immediate undefined behavior occurs if a poison value is
4105 used as an instruction operand that has any values that trigger undefined
4106 behavior. Notably this includes (but is not limited to):
4108 - The pointer operand of a :ref:`load <i_load>`, :ref:`store <i_store>` or
4109 any other pointer dereferencing instruction (independent of address
4111 - The divisor operand of a ``udiv``, ``sdiv``, ``urem`` or ``srem``
4113 - The condition operand of a :ref:`br <i_br>` instruction.
4114 - The callee operand of a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
4116 - The parameter operand of a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
4117 instruction, when the function or invoking call site has a ``noundef``
4118 attribute in the corresponding position.
4119 - The operand of a :ref:`ret <i_ret>` instruction if the function or invoking
call site has a ``noundef`` attribute in the return value position.
4122 Here are some examples:
4124 .. code-block:: llvm
4127 %poison = sub nuw i32 0, 1 ; Results in a poison value.
4128 %poison2 = sub i32 poison, 1 ; Also results in a poison value.
4129 %still_poison = and i32 %poison, 0 ; 0, but also poison.
4130 %poison_yet_again = getelementptr i32, i32* @h, i32 %still_poison
4131 store i32 0, i32* %poison_yet_again ; Undefined behavior due to
4134 store i32 %poison, i32* @g ; Poison value stored to memory.
4135 %poison3 = load i32, i32* @g ; Poison value loaded back from memory.
4137 %narrowaddr = bitcast i32* @g to i16*
4138 %wideaddr = bitcast i32* @g to i64*
4139 %poison4 = load i16, i16* %narrowaddr ; Returns a poison value.
4140 %poison5 = load i64, i64* %wideaddr ; Returns a poison value.
4142 %cmp = icmp slt i32 %poison, 0 ; Returns a poison value.
4143 br i1 %cmp, label %end, label %end ; undefined behavior
4147 .. _welldefinedvalues:
4152 Given a program execution, a value is *well defined* if the value does not
4153 have an undef bit and is not poison in the execution.
4154 An aggregate value or vector is well defined if its elements are well defined.
4155 The padding of an aggregate isn't considered, since it isn't visible
4156 without storing it into memory and loading it with a different type.
A constant of a :ref:`single value <t_single_value>`, non-vector type is well
defined if it is neither an '``undef``' constant nor a '``poison``' constant.
4160 The result of :ref:`freeze instruction <i_freeze>` is well defined regardless
4165 Addresses of Basic Blocks
4166 -------------------------
4168 ``blockaddress(@function, %block)``
4170 The '``blockaddress``' constant computes the address of the specified
4171 basic block in the specified function.
4173 It always has an ``i8 addrspace(P)*`` type, where ``P`` is the address space
4174 of the function containing ``%block`` (usually ``addrspace(0)``).
4176 Taking the address of the entry block is illegal.
4178 This value only has defined behavior when used as an operand to the
':ref:`indirectbr <i_indirectbr>`' or ':ref:`callbr <i_callbr>`' instruction, or
for comparisons against null. Pointer equality tests between label addresses
result in undefined behavior --- though, again, comparison against null is ok,
4182 and no label is equal to the null pointer. This may be passed around as an
4183 opaque pointer sized value as long as the bits are not inspected. This
4184 allows ``ptrtoint`` and arithmetic to be performed on these values so
4185 long as the original value is reconstituted before the ``indirectbr`` or
4186 ``callbr`` instruction.
4188 Finally, some targets may provide defined semantics when using the value
4189 as the operand to an inline assembly, but that is target specific.
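
A minimal sketch of the intended use with ``indirectbr``. The names
``@target`` and ``@dispatch`` and the block labels are hypothetical; note that
the address taken is not that of the entry block.

.. code-block:: llvm

    ; Store the address of a non-entry block in a global.
    @target = global i8* blockaddress(@dispatch, %bb1)

    define i32 @dispatch() {
    entry:
      %addr = load i8*, i8** @target
      ; Every possible destination must be listed.
      indirectbr i8* %addr, [label %bb1, label %bb2]
    bb1:
      ret i32 1
    bb2:
      ret i32 2
    }
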
4191 .. _dso_local_equivalent:
4193 DSO Local Equivalent
4194 --------------------
4196 ``dso_local_equivalent @func``
4198 A '``dso_local_equivalent``' constant represents a function which is
4199 functionally equivalent to a given function, but is always defined in the
4200 current linkage unit. The resulting pointer has the same type as the underlying
4201 function. The resulting pointer is permitted, but not required, to be different
4202 from a pointer to the function, and it may have different values in different
4205 The target function may not have ``extern_weak`` linkage.
4207 ``dso_local_equivalent`` can be implemented as such:
4209 - If the function has local linkage, hidden visibility, or is
4210 ``dso_local``, ``dso_local_equivalent`` can be implemented as simply a pointer
4212 - ``dso_local_equivalent`` can be implemented with a stub that tail-calls the
4213 function. Many targets support relocations that resolve at link time to either
a function or a stub for it, depending on whether the function is defined within the
4215 linkage unit; LLVM will use this when available. (This is commonly called a
4216 "PLT stub".) On other targets, the stub may need to be emitted explicitly.
4218 This can be used wherever a ``dso_local`` instance of a function is needed without
4219 needing to explicitly make the original function ``dso_local``. An instance where
4220 this can be used is for static offset calculations between a function and some other
4221 ``dso_local`` symbol. This is especially useful for the Relative VTables C++ ABI,
4222 where dynamic relocations for function pointers in VTables can be replaced with
4223 static relocations for offsets between the VTable and virtual functions which
4224 may not be ``dso_local``.
4226 This is currently only supported for ELF binary formats.
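
The static-offset use case can be sketched roughly as follows, assuming the
typed-pointer syntax used elsewhere in this document. The names ``@vfunc`` and
``@rel_entry`` are hypothetical, and a real relative VTable would contain
several such entries.

.. code-block:: llvm

    ; A function that is not necessarily dso_local.
    declare void @vfunc()

    ; A 32-bit offset from @rel_entry to the dso_local equivalent of @vfunc.
    ; Both operands are local to the linkage unit, so the difference can be
    ; resolved statically.
    @rel_entry = dso_local constant i32 trunc (i64 sub (
        i64 ptrtoint (void ()* dso_local_equivalent @vfunc to i64),
        i64 ptrtoint (i32* @rel_entry to i64)) to i32)
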
4230 Constant Expressions
4231 --------------------
4233 Constant expressions are used to allow expressions involving other
4234 constants to be used as constants. Constant expressions may be of any
4235 :ref:`first class <t_firstclass>` type and may involve any LLVM operation
4236 that does not have side effects (e.g. load and call are not supported).
4237 The following is the syntax for constant expressions:
4239 ``trunc (CST to TYPE)``
4240 Perform the :ref:`trunc operation <i_trunc>` on constants.
4241 ``zext (CST to TYPE)``
4242 Perform the :ref:`zext operation <i_zext>` on constants.
4243 ``sext (CST to TYPE)``
4244 Perform the :ref:`sext operation <i_sext>` on constants.
4245 ``fptrunc (CST to TYPE)``
4246 Truncate a floating-point constant to another floating-point type.
4247 The size of CST must be larger than the size of TYPE. Both types
4248 must be floating-point.
4249 ``fpext (CST to TYPE)``
Floating-point extend a constant to another type. The size of CST
must be smaller than or equal to the size of TYPE. Both types must be
4253 ``fptoui (CST to TYPE)``
4254 Convert a floating-point constant to the corresponding unsigned
4255 integer constant. TYPE must be a scalar or vector integer type. CST
4256 must be of scalar or vector floating-point type. Both CST and TYPE
4257 must be scalars, or vectors of the same number of elements. If the
4258 value won't fit in the integer type, the result is a
4259 :ref:`poison value <poisonvalues>`.
4260 ``fptosi (CST to TYPE)``
4261 Convert a floating-point constant to the corresponding signed
4262 integer constant. TYPE must be a scalar or vector integer type. CST
4263 must be of scalar or vector floating-point type. Both CST and TYPE
4264 must be scalars, or vectors of the same number of elements. If the
4265 value won't fit in the integer type, the result is a
4266 :ref:`poison value <poisonvalues>`.
4267 ``uitofp (CST to TYPE)``
4268 Convert an unsigned integer constant to the corresponding
4269 floating-point constant. TYPE must be a scalar or vector floating-point
4270 type. CST must be of scalar or vector integer type. Both CST and TYPE must
4271 be scalars, or vectors of the same number of elements.
4272 ``sitofp (CST to TYPE)``
4273 Convert a signed integer constant to the corresponding floating-point
4274 constant. TYPE must be a scalar or vector floating-point type.
4275 CST must be of scalar or vector integer type. Both CST and TYPE must
4276 be scalars, or vectors of the same number of elements.
4277 ``ptrtoint (CST to TYPE)``
4278 Perform the :ref:`ptrtoint operation <i_ptrtoint>` on constants.
4279 ``inttoptr (CST to TYPE)``
4280 Perform the :ref:`inttoptr operation <i_inttoptr>` on constants.
4281 This one is *really* dangerous!
4282 ``bitcast (CST to TYPE)``
4283 Convert a constant, CST, to another TYPE.
4284 The constraints of the operands are the same as those for the
4285 :ref:`bitcast instruction <i_bitcast>`.
4286 ``addrspacecast (CST to TYPE)``
Convert a constant pointer or constant vector of pointers, CST, to another
4288 TYPE in a different address space. The constraints of the operands are the
4289 same as those for the :ref:`addrspacecast instruction <i_addrspacecast>`.
4290 ``getelementptr (TY, CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)``
4291 Perform the :ref:`getelementptr operation <i_getelementptr>` on
4292 constants. As with the :ref:`getelementptr <i_getelementptr>`
4293 instruction, the index list may have one or more indexes, which are
4294 required to make sense for the type of "pointer to TY".
4295 ``select (COND, VAL1, VAL2)``
4296 Perform the :ref:`select operation <i_select>` on constants.
4297 ``icmp COND (VAL1, VAL2)``
4298 Perform the :ref:`icmp operation <i_icmp>` on constants.
4299 ``fcmp COND (VAL1, VAL2)``
4300 Perform the :ref:`fcmp operation <i_fcmp>` on constants.
4301 ``extractelement (VAL, IDX)``
4302 Perform the :ref:`extractelement operation <i_extractelement>` on
4304 ``insertelement (VAL, ELT, IDX)``
4305 Perform the :ref:`insertelement operation <i_insertelement>` on
4307 ``shufflevector (VEC1, VEC2, IDXMASK)``
4308 Perform the :ref:`shufflevector operation <i_shufflevector>` on
4310 ``extractvalue (VAL, IDX0, IDX1, ...)``
4311 Perform the :ref:`extractvalue operation <i_extractvalue>` on
4312 constants. The index list is interpreted in a similar manner as
4313 indices in a ':ref:`getelementptr <i_getelementptr>`' operation. At
4314 least one index value must be specified.
4315 ``insertvalue (VAL, ELT, IDX0, IDX1, ...)``
4316 Perform the :ref:`insertvalue operation <i_insertvalue>` on constants.
4317 The index list is interpreted in a similar manner as indices in a
4318 ':ref:`getelementptr <i_getelementptr>`' operation. At least one index
4319 value must be specified.
4320 ``OPCODE (LHS, RHS)``
4321 Perform the specified operation of the LHS and RHS constants. OPCODE
4322 may be any of the :ref:`binary <binaryops>` or :ref:`bitwise
4323 binary <bitwiseops>` operations. The constraints on operands are
4324 the same as those for the corresponding instruction (e.g. no bitwise
4325 operations on floating-point values are allowed).
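
As a brief sketch, the globals below use several of these forms as
initializers, written in the typed-pointer syntax used in this document. All
global names are hypothetical.

.. code-block:: llvm

    @arr = global [4 x i32] [ i32 1, i32 2, i32 3, i32 4 ]

    ; Address of the third element of @arr.
    @third = global i32* getelementptr inbounds ([4 x i32], [4 x i32]* @arr, i64 0, i64 2)

    ; Pointer reinterpreted as a pointer to bytes.
    @bytes = global i8* bitcast ([4 x i32]* @arr to i8*)

    ; Address of @arr as an integer, and the same address plus 8 bytes.
    @addr = global i64 ptrtoint ([4 x i32]* @arr to i64)
    @sum  = global i64 add (i64 ptrtoint ([4 x i32]* @arr to i64), i64 8)
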
4332 Inline Assembler Expressions
4333 ----------------------------
4335 LLVM supports inline assembler expressions (as opposed to :ref:`Module-Level
4336 Inline Assembly <moduleasm>`) through the use of a special value. This value
4337 represents the inline assembler as a template string (containing the
4338 instructions to emit), a list of operand constraints (stored as a string), a
4339 flag that indicates whether or not the inline asm expression has side effects,
4340 and a flag indicating whether the function containing the asm needs to align its
4341 stack conservatively.
4343 The template string supports argument substitution of the operands using "``$``"
4344 followed by a number, to indicate substitution of the given register/memory
4345 location, as specified by the constraint string. "``${NUM:MODIFIER}``" may also
4346 be used, where ``MODIFIER`` is a target-specific annotation for how to print the
4347 operand (See :ref:`inline-asm-modifiers`).
4349 A literal "``$``" may be included by using "``$$``" in the template. To include
4350 other special characters in the output, the usual "``\XX``" escapes may be
4351 used, just as in other strings. Note that after template substitution, the
4352 resulting assembly string is parsed by LLVM's integrated assembler unless it is
4353 disabled -- even when emitting a ``.s`` file -- and thus must contain assembly
4354 syntax known to LLVM.
4356 LLVM also supports a few more substitutions useful for writing inline assembly:
4358 - ``${:uid}``: Expands to a decimal integer unique to this inline assembly blob.
4359 This substitution is useful when declaring a local label. Many standard
4360 compiler optimizations, such as inlining, may duplicate an inline asm blob.
4361 Adding a blob-unique identifier ensures that the two labels will not conflict
4362 during assembly. This is used to implement `GCC's %= special format
4363 string <https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html>`_.
4364 - ``${:comment}``: Expands to the comment character of the current target's
4365 assembly dialect. This is usually ``#``, but many targets use other strings,
4366 such as ``;``, ``//``, or ``!``.
4367 - ``${:private}``: Expands to the assembler private label prefix. Labels with
4368 this prefix will not appear in the symbol table of the assembled object.
4369 Typically the prefix is ``L``, but targets may use other strings. ``.L`` is
relatively popular.
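A minimal sketch of these substitutions, assuming an x86 target with AT&T
syntax. The function name ``@count_down`` and the label stem ``loop`` are
hypothetical, the ``\0A`` and ``\09`` escapes stand for a newline and a tab, and
the caller is assumed to pass a non-zero ``%n``.

.. code-block:: llvm

    define i32 @count_down(i32 %n) {
      ; ${:private} and ${:uid} together form a label that stays out of the
      ; symbol table and remains unique if this blob is duplicated.
      %r = call i32 asm sideeffect "${:private}loop${:uid}:\0A\09decl $0\0A\09jnz ${:private}loop${:uid}", "=r,0,~{flags}"(i32 %n)
      ret i32 %r
    }

The ``0`` constraint ties the input to the output register, so ``$0`` names the
counter both before and after the loop.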
4372 LLVM's support for inline asm is modeled closely on the requirements of Clang's
4373 GCC-compatible inline-asm support. Thus, the feature-set and the constraint and
4374 modifier codes listed here are similar or identical to those in GCC's inline asm
4375 support. However, to be clear, the syntax of the template and constraint strings
4376 described here is *not* the same as the syntax accepted by GCC and Clang, and,
4377 while most constraint letters are passed through as-is by Clang, some get
4378 translated to other codes when converting from the C source to the LLVM
assembly.
4381 An example inline assembler expression is:
4383 .. code-block:: llvm
4385 i32 (i32) asm "bswap $0", "=r,r"
4387 Inline assembler expressions may **only** be used as the callee operand
4388 of a :ref:`call <i_call>` or an :ref:`invoke <i_invoke>` instruction.
4389 Thus, typically we have:
4391 .. code-block:: llvm
4393 %X = call i32 asm "bswap $0", "=r,r"(i32 %Y)
4395 Inline asms with side effects not visible in the constraint list must be
4396 marked as having side effects. This is done through the use of the
4397 '``sideeffect``' keyword, like so:
4399 .. code-block:: llvm
4401 call void asm sideeffect "eieio", ""()
4403 In some cases inline asms will contain code that will not work unless
4404 the stack is aligned in some way, such as calls or SSE instructions on
4405 x86, yet will not contain code that does that alignment within the asm.
4406 The compiler should make conservative assumptions about what the asm
4407 might contain and should generate its usual stack alignment code in the
4408 prologue if the '``alignstack``' keyword is present:
4410 .. code-block:: llvm
4412 call void asm alignstack "eieio", ""()
4414 Inline asms also support using non-standard assembly dialects. The
4415 assumed dialect is ATT. When the '``inteldialect``' keyword is present,
4416 the inline asm is using the Intel dialect. Currently, ATT and Intel are
4417 the only supported dialects. An example is:
4419 .. code-block:: llvm
4421 call void asm inteldialect "eieio", ""()
4423 In the case that the inline asm might unwind the stack,
4424 the '``unwind``' keyword must be used, so that the compiler emits
4425 unwinding information:
4427 .. code-block:: llvm
4429 call void asm unwind "call func", ""()
4431 If the inline asm unwinds the stack and isn't marked with
4432 the '``unwind``' keyword, the behavior is undefined.
4434 If multiple keywords appear, the '``sideeffect``' keyword must come
4435 first, the '``alignstack``' keyword second, the '``inteldialect``' keyword
4436 third and the '``unwind``' keyword last.
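As an illustration of the keyword ordering, the sketch below combines three of
the keywords on one call. It assumes an x86 target; the function name
``@copy_intel`` is hypothetical.

.. code-block:: llvm

    define i32 @copy_intel(i32 %x) {
      ; Keywords appear in the required order: sideeffect, alignstack,
      ; inteldialect. In Intel syntax the destination operand ($0) comes first.
      %r = call i32 asm sideeffect alignstack inteldialect "mov $0, $1", "=r,r"(i32 %x)
      ret i32 %r
    }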
4438 Inline Asm Constraint String
4439 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4441 The constraint list is a comma-separated string, each element containing one or
4442 more constraint codes.
4444 For each element in the constraint list an appropriate register or memory
4445 operand will be chosen, and it will be made available to assembly template
4446 string expansion as ``$0`` for the first constraint in the list, ``$1`` for the
second, and so on.
4449 There are three different types of constraints, which are distinguished by a
4450 prefix symbol in front of the constraint code: Output, Input, and Clobber. The
4451 constraints must always be given in that order: outputs first, then inputs, then
4452 clobbers. They cannot be intermingled.
4454 There are also three different categories of constraint codes:
4456 - Register constraint. This is either a register class, or a fixed physical
4457 register. This kind of constraint will allocate a register, and if necessary,
4458 bitcast the argument or result to the appropriate type.
4459 - Memory constraint. This kind of constraint is for use with an instruction
4460 taking a memory operand. Different constraints allow for different addressing
4461 modes used by the target.
4462 - Immediate value constraint. This kind of constraint is for an integer or other
4463 immediate value which can be rendered directly into an instruction. The
4464 various target-specific constraints allow the selection of a value in the
4465 proper range for the instruction you wish to use it with.
4470 Output constraints are specified by an "``=``" prefix (e.g. "``=r``"). This
4471 indicates that the assembly will write to this operand, and the operand will
4472 then be made available as a return value of the ``asm`` expression. Output
4473 constraints do not consume an argument from the call instruction. (Except, see
4474 below about indirect outputs).
4476 Normally, it is expected that no output locations are written to by the assembly
4477 expression until *all* of the inputs have been read. As such, LLVM may assign
4478 the same register to an output and an input. If this is not safe (e.g. if the
4479 assembly contains two instructions, where the first writes to one output, and
4480 the second reads an input and writes to a second output), then the "``&``"
4481 modifier must be used (e.g. "``=&r``") to specify that the output is an
4482 "early-clobber" output. Marking an output as "early-clobber" ensures that LLVM
4483 will not use the same register for any inputs (other than an input tied to this
output).
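A small sketch of an early-clobber output, assuming an x86 target with AT&T
syntax; the function name ``@add_early_clobber`` is hypothetical.

.. code-block:: llvm

    define i32 @add_early_clobber(i32 %a, i32 %b) {
      ; $0 is written by the first instruction, before $2 has been read, so the
      ; output is marked early-clobber to keep it out of the input registers.
      %sum = call i32 asm "movl $1, $0\0A\09addl $2, $0", "=&r,r,r,~{flags}"(i32 %a, i32 %b)
      ret i32 %sum
    }

Without the "``&``", LLVM would be free to place ``$0`` and ``$2`` in the same
register, and the first instruction would overwrite ``%b`` before it is added.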
4489 Input constraints do not have a prefix -- just the constraint codes. Each input
4490 constraint will consume one argument from the call instruction. It is not
4491 permitted for the asm to write to any input register or memory location (unless
4492 that input is tied to an output). Note also that multiple inputs may all be
4493 assigned to the same register, if LLVM can determine that they necessarily all
4494 contain the same value.
4496 Instead of providing a Constraint Code, input constraints may also "tie"
4497 themselves to an output constraint, by providing an integer as the constraint
4498 string. Tied inputs still consume an argument from the call instruction, and
4499 take up a position in the asm template numbering as is usual -- they will simply
4500 be constrained to always use the same register as the output they've been tied
4501 to. For example, a constraint string of "``=r,0``" says to assign a register for
4502 output, and use that register as an input as well (it being the 0'th
constraint).
4505 It is permitted to tie an input to an "early-clobber" output. In that case, no
4506 *other* input may share the same register as the input tied to the early-clobber
4507 (even when the other input has the same value).
4509 You may only tie an input to an output which has a register constraint, not a
4510 memory constraint. Only a single input may be tied to an output.
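A tied input suits instructions that update a register in place. The sketch
below assumes an x86 target; the function name ``@byte_swap`` is hypothetical.

.. code-block:: llvm

    define i32 @byte_swap(i32 %y) {
      ; Constraint "0" ties the input to output 0, so bswap reads %y from and
      ; writes the result to the same register.
      %x = call i32 asm "bswap $0", "=r,0"(i32 %y)
      ret i32 %x
    }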
4512 There is also an "interesting" feature which deserves a bit of explanation: if a
4513 register class constraint allocates a register which is too small for the value
4514 type operand provided as input, the input value will be split into multiple
4515 registers, and all of them passed to the inline asm.
4517 However, this feature is often not as useful as you might think.
4519 Firstly, the registers are *not* guaranteed to be consecutive. So, on those
4520 architectures that have instructions which operate on multiple consecutive
4521 registers, this is not an appropriate way to support them. (e.g. the 32-bit
4522 SparcV8 has a 64-bit load instruction which takes a single 32-bit register. The
4523 hardware then loads into both the named register and the next register. This
4524 feature of inline asm would not be useful to support that.)
4526 A few of the targets provide a template string modifier allowing explicit access
4527 to the second register of a two-register operand (e.g. MIPS ``L``, ``M``, and
4528 ``D``). On such an architecture, you can actually access the second allocated
4529 register (yet, still, not any subsequent ones). But, in that case, you're still
4530 probably better off simply splitting the value into two separate operands, for
4531 clarity. (e.g. see the description of the ``A`` constraint on X86, which,
4532 despite existing only for use with this feature, is not really a good idea to
use).
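Splitting a wide value by hand could look roughly like the following sketch. The
function name ``@pass_halves`` and the value names are hypothetical, and the
template only emits an assembly comment so that it does not depend on any
particular instruction set.

.. code-block:: llvm

    define void @pass_halves(i64 %v) {
      ; Split the 64-bit value explicitly instead of relying on LLVM to spread
      ; one operand across several registers.
      %lo = trunc i64 %v to i32
      %hi.wide = lshr i64 %v, 32
      %hi = trunc i64 %hi.wide to i32
      ; Each half is now its own operand, addressable as $0 and $1.
      call void asm sideeffect "${:comment} lo=$0 hi=$1", "r,r"(i32 %lo, i32 %hi)
      ret void
    }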
4535 Indirect inputs and outputs
4536 """""""""""""""""""""""""""
4538 Indirect output or input constraints can be specified by the "``*``" modifier
4539 (which goes after the "``=``" in case of an output). This indicates that the asm
4540 will write to or read from the contents of an *address* provided as an input
4541 argument. (Note that in this way, indirect outputs act more like an *input* than
4542 an output: just like an input, they consume an argument of the call expression,
4543 rather than producing a return value. An indirect output constraint is an
4544 "output" only in that the asm is expected to write to the contents of the input
4545 memory location, instead of just read from it).
4547 This is most typically used for a memory constraint, e.g. "``=*m``", to pass the
4548 address of a variable as a value.
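A sketch of an indirect memory output, assuming an x86 target with AT&T syntax
and the typed-pointer syntax used elsewhere in this document. The function name
``@store_via_asm`` is hypothetical. More recent LLVM releases that use opaque
pointers are understood to require an additional ``elementtype`` attribute on
the indirect argument, so treat the exact spelling as version-dependent.

.. code-block:: llvm

    define void @store_via_asm(i32* %p, i32 %v) {
      ; "=*m" consumes %p as a call argument; the asm writes through that
      ; address, so the call itself produces no return value.
      call void asm "movl $1, $0", "=*m,r"(i32* %p, i32 %v)
      ret void
    }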
4550 It is also possible to use an indirect *register* constraint, but only on output
4551 (e.g. "``=*r``"). This will cause LLVM to allocate a register for an output
4552 value normally, and then, separately emit a store to the address provided as
4553 input, after the provided inline asm. (It's not clear what value this
4554 functionality provides, compared to writing the store explicitly after the asm
4555 statement, and it can only produce worse code, since it bypasses many
4556 optimization passes. Using it is not recommended.)
4562 A clobber constraint is indicated by a "``~``" prefix. A clobber does not
4563 consume an input operand, nor generate an output. Clobbers cannot use any of the
4564 general constraint code letters -- they may use only explicit register
4565 constraints, e.g. "``~{eax}``". The one exception is that a clobber string of
4566 "``~{memory}``" indicates that the assembly writes to arbitrary undeclared
4567 memory locations -- not only the memory pointed to by a declared indirect
output.
4570 Note that clobbering named registers that are also present in output
4571 constraints is not legal.
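Two common clobber patterns are sketched below, assuming an x86 target with
AT&T syntax; the function name ``@clobber_examples`` is hypothetical.

.. code-block:: llvm

    define void @clobber_examples() {
      ; Empty template with only a memory clobber: emits no instructions, but
      ; tells the compiler that arbitrary memory may have been written.
      call void asm sideeffect "", "~{memory}"()
      ; The template names eax directly, so eax and the flags are declared as
      ; clobbered rather than passed as operands.
      call void asm sideeffect "xorl %eax, %eax", "~{eax},~{flags}"()
      ret void
    }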
4576 After an optional prefix comes the constraint code, or codes.
4578 A Constraint Code is either a single letter (e.g. "``r``"), a "``^``" character
4579 followed by two letters (e.g. "``^wc``"), or "``{``" register-name "``}``"
(e.g. "``{eax}``").
4582 The one and two letter constraint codes are typically chosen to be the same as
4583 GCC's constraint codes.
4585 A single constraint may include one or more constraint codes in it, leaving
4586 it up to LLVM to choose which one to use. This is included mainly for
4587 compatibility with the translation of GCC inline asm coming from clang.
4589 There are two ways to specify alternatives, and either or both may be used in an
4590 inline asm constraint list:
4592 1) Append the codes to each other, making a constraint code set. E.g. "``im``"
4593 or "``{eax}m``". This means "choose any of the options in the set". The
4594 choice of constraint is made independently for each constraint in the
constraint list.
4597 2) Use "``|``" between constraint code sets, creating alternatives. Every
4598 constraint in the constraint list must have the same number of alternative
4599 sets. With this syntax, the same alternative in *all* of the items in the
4600 constraint list will be chosen together.
4602 Putting those together, you might have a two operand constraint string like
4603 ``"rm|r,ri|rm"``. This indicates that if operand 0 is ``r`` or ``m``, then
4604 operand 1 may be one of ``r`` or ``i``. If operand 0 is ``r``, then operand 1
4605 may be one of ``r`` or ``m``. But, operand 0 and 1 cannot both be of type ``m``.
4607 However, the use of either of the alternative features is *NOT* recommended, as
4608 LLVM is not able to make an intelligent choice about which one to use. (At the
4609 point it currently needs to choose, not enough information is available to do so
4610 in a smart way.) Thus, it simply tries to make a choice that's most likely to
4611 compile, not one that will give optimal performance. (e.g., given "``rm``", it'll
4612 always choose to use memory, not registers). And, if given multiple registers,
4613 or multiple register classes, it will simply choose the first one. (In fact, it
4614 doesn't currently even ensure explicitly specified physical registers are
4615 unique, so specifying multiple physical registers as alternatives, like
4616 ``{r11}{r12},{r11}{r12}``, will assign r11 to both operands, not at all what was
intended.)
4619 Supported Constraint Code List
4620 """"""""""""""""""""""""""""""
4622 The constraint codes are, in general, expected to behave the same way they do in
4623 GCC. LLVM's support is often implemented on an 'as-needed' basis, to support C
4624 inline asm code which was supported by GCC. A mismatch in behavior between LLVM
4625 and GCC likely indicates a bug in LLVM.
4627 Some constraint codes are typically supported by all targets:
4629 - ``r``: A register in the target's general purpose register class.
4630 - ``m``: A memory address operand. It is target-specific what addressing modes
4631 are supported; typical examples are register, or register + register offset,
4632 or register + immediate offset (of some target-specific size).
4633 - ``i``: An integer constant (of target-specific width). Allows either a simple
4634 immediate, or a relocatable value.
4635 - ``n``: An integer constant -- *not* including relocatable values.
4636 - ``s``: An integer constant, but allowing *only* relocatable values.
4637 - ``X``: Allows an operand of any kind, no constraint whatsoever. Typically
4638 useful to pass a label for an asm branch or call.
4640 .. FIXME: but that surely isn't actually okay to jump out of an asm
4641 block without telling llvm about the control transfer???
4643 - ``{register-name}``: Requires exactly the named physical register.
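The sketch below exercises a few of these generic codes, assuming an x86 target
with AT&T syntax; the function names ``@add_const`` and ``@fixed_regs`` are
hypothetical.

.. code-block:: llvm

    define i32 @add_const(i32 %x) {
      ; "i" accepts the constant 42 as an immediate operand ($2); "0" ties the
      ; register input to the output.
      %r = call i32 asm "addl $2, $0", "=r,0,i,~{flags}"(i32 %x, i32 42)
      ret i32 %r
    }

    define i32 @fixed_regs(i32 %x) {
      ; "{ecx}" and "={eax}" pin the input and the output to named registers.
      %r = call i32 asm "movl $1, $0", "={eax},{ecx}"(i32 %x)
      ret i32 %r
    }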
4645 Other constraints are target-specific:
4649 - ``z``: An immediate integer 0. Outputs ``WZR`` or ``XZR``, as appropriate.
4650 - ``I``: An immediate integer valid for an ``ADD`` or ``SUB`` instruction,
4651 i.e. 0 to 4095 with optional shift by 12.
4652 - ``J``: An immediate integer that, when negated, is valid for an ``ADD`` or
4653 ``SUB`` instruction, i.e. -1 to -4095 with optional left shift by 12.
4654 - ``K``: An immediate integer that is valid for the 'bitmask immediate 32' of a
4655 logical instruction like ``AND``, ``EOR``, or ``ORR`` with a 32-bit register.
4656 - ``L``: An immediate integer that is valid for the 'bitmask immediate 64' of a
4657 logical instruction like ``AND``, ``EOR``, or ``ORR`` with a 64-bit register.
4658 - ``M``: An immediate integer for use with the ``MOV`` assembly alias on a
4659 32-bit register. This is a superset of ``K``: in addition to the bitmask
4660 immediate, also allows immediate integers which can be loaded with a single
4661 ``MOVZ`` or ``MOVN`` instruction.
4662 - ``N``: An immediate integer for use with the ``MOV`` assembly alias on a
4663 64-bit register. This is a superset of ``L``.
4664 - ``Q``: Memory address operand must be in a single register (no
4665 offsets). (However, LLVM currently does this for the ``m`` constraint as
well.)
4667 - ``r``: A 32 or 64-bit integer register (W* or X*).
4668 - ``w``: A 32, 64, or 128-bit floating-point, SIMD or SVE vector register.
4669 - ``x``: Like ``w``, but restricted to registers 0 to 15 inclusive.
4670 - ``y``: Like ``w``, but restricted to SVE vector registers Z0 to Z7 inclusive.
4671 - ``Upl``: One of the low eight SVE predicate registers (P0 to P7).
4672 - ``Upa``: Any of the SVE predicate registers (P0 to P15).
4676 - ``r``: A 32 or 64-bit integer register.
4677 - ``[0-9]v``: The 32-bit VGPR register, number 0-9.
4678 - ``[0-9]s``: The 32-bit SGPR register, number 0-9.
4679 - ``[0-9]a``: The 32-bit AGPR register, number 0-9.
4680 - ``I``: An integer inline constant in the range from -16 to 64.
4681 - ``J``: A 16-bit signed integer constant.
4682 - ``A``: An integer or a floating-point inline constant.
4683 - ``B``: A 32-bit signed integer constant.
4684 - ``C``: A 32-bit unsigned integer constant or an integer inline constant in the range from -16 to 64.
4685 - ``DA``: A 64-bit constant that can be split into two "A" constants.
4686 - ``DB``: A 64-bit constant that can be split into two "B" constants.
4690 - ``Q``, ``Um``, ``Un``, ``Uq``, ``Us``, ``Ut``, ``Uv``, ``Uy``: Memory address
4691 operand. Treated the same as operand ``m``, at the moment.
4692 - ``Te``: An even general-purpose 32-bit integer register: ``r0,r2,...,r12,r14``
4693 - ``To``: An odd general-purpose 32-bit integer register: ``r1,r3,...,r11``
4695 ARM and ARM's Thumb2 mode:
4697 - ``j``: An immediate integer between 0 and 65535 (valid for ``MOVW``)
4698 - ``I``: An immediate integer valid for a data-processing instruction.
4699 - ``J``: An immediate integer between -4095 and 4095.
4700 - ``K``: An immediate integer whose bitwise inverse is valid for a
4701 data-processing instruction. (Can be used with template modifier "``B``" to
4702 print the inverted value).
4703 - ``L``: An immediate integer whose negation is valid for a data-processing
4704 instruction. (Can be used with template modifier "``n``" to print the negated
value).
4706 - ``M``: A power of two or an integer between 0 and 32.
4707 - ``N``: Invalid immediate constraint.
4708 - ``O``: Invalid immediate constraint.
4709 - ``r``: A general-purpose 32-bit integer register (``r0-r15``).
4710 - ``l``: In Thumb2 mode, low 32-bit GPR registers (``r0-r7``). In ARM mode, same
4712 - ``h``: In Thumb2 mode, a high 32-bit GPR register (``r8-r15``). In ARM mode,
4714 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4715 ``s0-s31``, ``d0-d31``, or ``q0-q15``, respectively.
4716 - ``t``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4717 ``s0-s31``, ``d0-d15``, or ``q0-q7``, respectively.
4718 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4719 ``s0-s15``, ``d0-d7``, or ``q0-q3``, respectively.
4723 - ``I``: An immediate integer between 0 and 255.
4724 - ``J``: An immediate integer between -255 and -1.
4725 - ``K``: An immediate integer between 0 and 255, with optional left-shift by
4727 - ``L``: An immediate integer between -7 and 7.
4728 - ``M``: An immediate integer which is a multiple of 4 between 0 and 1020.
4729 - ``N``: An immediate integer between 0 and 31.
4730 - ``O``: An immediate integer which is a multiple of 4 between -508 and 508.
4731 - ``r``: A low 32-bit GPR register (``r0-r7``).
4732 - ``l``: A low 32-bit GPR register (``r0-r7``).
4733 - ``h``: A high GPR register (``r8-r15``).
4734 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4735 ``s0-s31``, ``d0-d31``, or ``q0-q15``, respectively.
4736 - ``t``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4737 ``s0-s31``, ``d0-d15``, or ``q0-q7``, respectively.
4738 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4739 ``s0-s15``, ``d0-d7``, or ``q0-q3``, respectively.
4744 - ``o``, ``v``: A memory address operand, treated the same as constraint ``m``,
4746 - ``r``: A 32 or 64-bit register.
4750 - ``r``: An 8 or 16-bit register.
4754 - ``I``: An immediate signed 16-bit integer.
4755 - ``J``: An immediate integer zero.
4756 - ``K``: An immediate unsigned 16-bit integer.
4757 - ``L``: An immediate 32-bit integer, where the lower 16 bits are 0.
4758 - ``N``: An immediate integer between -65535 and -1.
4759 - ``O``: An immediate signed 15-bit integer.
4760 - ``P``: An immediate integer between 1 and 65535.
4761 - ``m``: A memory address operand. In MIPS-SE mode, allows a base address
4762 register plus 16-bit immediate offset. In MIPS mode, just a base register.
4763 - ``R``: A memory address operand. In MIPS-SE mode, allows a base address
4764 register plus a 9-bit signed offset. In MIPS mode, the same as constraint
``m``.
4766 - ``ZC``: A memory address operand, suitable for use in a ``pref``, ``ll``, or
4767 ``sc`` instruction on the given subtarget (details vary).
4768 - ``r``, ``d``, ``y``: A 32 or 64-bit GPR register.
4769 - ``f``: A 32 or 64-bit FPU register (``F0-F31``), or a 128-bit MSA register
4770 (``W0-W31``). In the case of MSA registers, it is recommended to use the ``w``
4771 argument modifier for compatibility with GCC.
4772 - ``c``: A 32-bit or 64-bit GPR register suitable for indirect jump (always
4774 - ``l``: The ``lo`` register, 32 or 64-bit.
4779 - ``b``: A 1-bit integer register.
4780 - ``c`` or ``h``: A 16-bit integer register.
4781 - ``r``: A 32-bit integer register.
4782 - ``l`` or ``N``: A 64-bit integer register.
4783 - ``f``: A 32-bit float register.
4784 - ``d``: A 64-bit float register.
4789 - ``I``: An immediate signed 16-bit integer.
4790 - ``J``: An immediate unsigned 16-bit integer, shifted left 16 bits.
4791 - ``K``: An immediate unsigned 16-bit integer.
4792 - ``L``: An immediate signed 16-bit integer, shifted left 16 bits.
4793 - ``M``: An immediate integer greater than 31.
4794 - ``N``: An immediate integer that is an exact power of 2.
4795 - ``O``: The immediate integer constant 0.
4796 - ``P``: An immediate integer constant whose negation is a signed 16-bit
4798 - ``es``, ``o``, ``Q``, ``Z``, ``Zy``: A memory address operand, currently
4799 treated the same as ``m``.
4800 - ``r``: A 32 or 64-bit integer register.
4801 - ``b``: A 32 or 64-bit integer register, excluding ``R0`` (that is:
4803 - ``f``: A 32 or 64-bit float register (``F0-F31``).
4804 - ``v``: For ``4 x f32`` or ``4 x f64`` types, a 128-bit altivec vector
4805 register (``V0-V31``).
4807 - ``y``: Condition register (``CR0-CR7``).
4808 - ``wc``: An individual CR bit in a CR register.
4809 - ``wa``, ``wd``, ``wf``: Any 128-bit VSX vector register, from the full VSX
4810 register set (overlapping both the floating-point and vector register files).
4811 - ``ws``: A 32 or 64-bit floating-point register, from the full VSX register
4816 - ``A``: An address operand (using a general-purpose register, without an
4818 - ``I``: A 12-bit signed integer immediate operand.
4819 - ``J``: A zero integer immediate operand.
4820 - ``K``: A 5-bit unsigned integer immediate operand.
4821 - ``f``: A 32- or 64-bit floating-point register (requires F or D extension).
4822 - ``r``: A 32- or 64-bit general-purpose register (depending on the platform
4824 - ``vr``: A vector register. (requires V extension).
4825 - ``vm``: A vector mask register. (requires V extension).
4829 - ``I``: An immediate 13-bit signed integer.
4830 - ``r``: A 32-bit integer register.
4831 - ``f``: Any floating-point register on SparcV8, or a floating-point
4832 register in the "low" half of the registers on SparcV9.
4833 - ``e``: Any floating-point register. (Same as ``f`` on SparcV8.)
4837 - ``I``: An immediate unsigned 8-bit integer.
4838 - ``J``: An immediate unsigned 12-bit integer.
4839 - ``K``: An immediate signed 16-bit integer.
4840 - ``L``: An immediate signed 20-bit integer.
4841 - ``M``: An immediate integer 0x7fffffff.
4842 - ``Q``: A memory address operand with a base address and a 12-bit immediate
4843 unsigned displacement.
4844 - ``R``: A memory address operand with a base address, a 12-bit immediate
4845 unsigned displacement, and an index register.
4846 - ``S``: A memory address operand with a base address and a 20-bit immediate
4847 signed displacement.
4848 - ``T``: A memory address operand with a base address, a 20-bit immediate
4849 signed displacement, and an index register.
4850 - ``r`` or ``d``: A 32, 64, or 128-bit integer register.
4851 - ``a``: A 32, 64, or 128-bit integer address register (excludes R0, which in an
4852 address context evaluates as zero).
4853 - ``h``: A 32-bit value in the high part of a 64-bit data register
4855 - ``f``: A 32, 64, or 128-bit floating-point register.
4859 - ``I``: An immediate integer between 0 and 31.
4860 - ``J``: An immediate integer between 0 and 64.
4861 - ``K``: An immediate signed 8-bit integer.
4862 - ``L``: An immediate integer, 0xff or 0xffff or (in 64-bit mode only)
4864 - ``M``: An immediate integer between 0 and 3.
4865 - ``N``: An immediate unsigned 8-bit integer.
4866 - ``O``: An immediate integer between 0 and 127.
4867 - ``e``: An immediate 32-bit signed integer.
4868 - ``Z``: An immediate 32-bit unsigned integer.
4869 - ``o``, ``v``: Treated the same as ``m``, at the moment.
4870 - ``q``: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit
4871 ``l`` integer register. On X86-32, this is the ``a``, ``b``, ``c``, and ``d``
4872 registers, and on X86-64, it is all of the integer registers.
4873 - ``Q``: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit
4874 ``h`` integer register. This is the ``a``, ``b``, ``c``, and ``d`` registers.
4875 - ``r`` or ``l``: An 8, 16, 32, or 64-bit integer register.
4876 - ``R``: An 8, 16, 32, or 64-bit "legacy" integer register -- one which has
4877 existed since i386, and can be accessed without the REX prefix.
4878 - ``f``: A 32, 64, or 80-bit '387 FPU stack pseudo-register.
4879 - ``y``: A 64-bit MMX register, if MMX is enabled.
4880 - ``x``: If SSE is enabled: a 32 or 64-bit scalar operand, or 128-bit vector
4881 operand in a SSE register. If AVX is also enabled, can also be a 256-bit
4882 vector operand in an AVX register. If AVX-512 is also enabled, can also be a
4883 512-bit vector operand in an AVX512 register. Otherwise, an error.
4884 - ``Y``: The same as ``x``, if *SSE2* is enabled, otherwise an error.
4885 - ``A``: Special case: allocates EAX first, then EDX, for a single operand (in
4886 32-bit mode, a 64-bit integer operand will get split into two registers). It
4887 is not recommended to use this constraint, as in 64-bit mode, the 64-bit
4888 operand will get allocated only to RAX -- if two 32-bit operands are needed,
4889 you're better off splitting it yourself, before passing it to the asm
statement.
4894 - ``r``: A 32-bit integer register.
4897 .. _inline-asm-modifiers:
4899 Asm template argument modifiers
4900 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4902 In the asm template string, modifiers can be used on the operand reference, like
4905 The modifiers are, in general, expected to behave the same way they do in
4906 GCC. LLVM's support is often implemented on an 'as-needed' basis, to support C
4907 inline asm code which was supported by GCC. A mismatch in behavior between LLVM
4908 and GCC likely indicates a bug in LLVM.
4912 - ``c``: Print an immediate integer constant unadorned, without
4913 the target-specific immediate punctuation (e.g. no ``$`` prefix).
4914 - ``n``: Negate and print immediate integer constant unadorned, without the
4915 target-specific immediate punctuation (e.g. no ``$`` prefix).
4916 - ``l``: Print as an unadorned label, without the target-specific label
4917 punctuation (e.g. no ``$`` prefix).
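To see the effect of the ``c`` and ``n`` modifiers, a sketch like the following
prints the same immediate three ways inside an assembly comment. It assumes a
target that supports both modifiers (x86 is assumed here); the function name
``@show_modifiers`` is hypothetical.

.. code-block:: llvm

    define void @show_modifiers() {
      ; $0 prints the immediate with the target's usual punctuation, ${0:c}
      ; prints it unadorned, and ${0:n} prints its negation unadorned.
      call void asm sideeffect "${:comment} plain=$0 bare=${0:c} negated=${0:n}", "i"(i32 42)
      ret void
    }

Emitting the operands in a comment keeps the example free of real instructions
while still making the three spellings visible in the generated ``.s`` file.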
4921 - ``w``: Print a GPR register with a ``w*`` name instead of ``x*`` name. E.g.,
4922 instead of ``x30``, print ``w30``.
4923 - ``x``: Print a GPR register with a ``x*`` name. (this is the default, anyhow).
4924 - ``b``, ``h``, ``s``, ``d``, ``q``: Print a floating-point/SIMD register with a
4925 ``b*``, ``h*``, ``s*``, ``d*``, or ``q*`` name, rather than the default of
4934 - ``a``: Print an operand as an address (with ``[`` and ``]`` surrounding a
4938 - ``y``: Print a VFP single-precision register as an indexed double (e.g. print
4939 as ``d4[1]`` instead of ``s9``)
4940 - ``B``: Bitwise invert and print an immediate integer constant without ``#``
4942 - ``L``: Print the low 16-bits of an immediate integer constant.
4943 - ``M``: Print as a register set suitable for ldm/stm. Also prints *all*
4944 register operands subsequent to the specified one (!), so use carefully.
4945 - ``Q``: Print the low-order register of a register-pair, or the low-order
4946 register of a two-register operand.
4947 - ``R``: Print the high-order register of a register-pair, or the high-order
4948 register of a two-register operand.
4949 - ``H``: Print the second register of a register-pair. (On a big-endian system,
4950 ``H`` is equivalent to ``Q``, and on a little-endian system, ``H`` is equivalent
to ``R``.)
4953 .. FIXME: H doesn't currently support printing the second register
4954 of a two-register operand.
4956 - ``e``: Print the low doubleword register of a NEON quad register.
4957 - ``f``: Print the high doubleword register of a NEON quad register.
4958 - ``m``: Print the base register of a memory operand without the ``[`` and ``]``
4963 - ``L``: Print the second register of a two-register operand. Requires that it
4964 has been allocated consecutively to the first.
4966 .. FIXME: why is it restricted to consecutive ones? And there's
4967 nothing that ensures that happens, is there?
4969 - ``I``: Print the letter 'i' if the operand is an integer constant, otherwise
4970 nothing. Used to print 'addi' vs 'add' instructions.
4974 No additional modifiers.
4978 - ``X``: Print an immediate integer as hexadecimal.
4979 - ``x``: Print the low 16 bits of an immediate integer as hexadecimal.
4980 - ``d``: Print an immediate integer as decimal.
4981 - ``m``: Subtract one and print an immediate integer as decimal.
4982 - ``z``: Print ``$0`` if an immediate zero, otherwise print normally.
4983 - ``L``: Print the low-order register of a two-register operand, or prints the
4984 address of the low-order word of a double-word memory operand.
4986 .. FIXME: L seems to be missing memory operand support.
4988 - ``M``: Print the high-order register of a two-register operand, or prints the
4989 address of the high-order word of a double-word memory operand.
4991 .. FIXME: M seems to be missing memory operand support.
4993 - ``D``: Print the second register of a two-register operand, or prints the
4994 second word of a double-word memory operand. (On a big-endian system, ``D`` is
4995 equivalent to ``L``, and on a little-endian system, ``D`` is equivalent to
``M``.)
4997 - ``w``: No effect. Provided for compatibility with GCC which requires this
4998 modifier in order to print MSA registers (``W0-W31``) with the ``f``
5007 - ``L``: Print the second register of a two-register operand. Requires that it
5008 has been allocated consecutively to the first.
5010 .. FIXME: why is it restricted to consecutive ones? And there's
5011 nothing that ensures that happens, is there?
5013 - ``I``: Print the letter 'i' if the operand is an integer constant, otherwise
5014 nothing. Used to print 'addi' vs 'add' instructions.
- ``y``: For a memory operand, print the operand formatted for a two-register
  X-form instruction. (Currently always prints ``r0,OPERAND``.)
- ``U``: Print 'u' if the memory operand is an update form, and nothing
  otherwise. (NOTE: LLVM does not support update form, so this currently
  always prints nothing.)
- ``X``: Print 'x' if the memory operand is an indexed form. (NOTE: LLVM does
  not support indexed form, so this currently always prints nothing.)
5025 - ``i``: Print the letter 'i' if the operand is not a register, otherwise print
5026 nothing. Used to print 'addi' vs 'add' instructions, etc.
5027 - ``z``: Print the register ``zero`` if an immediate zero, otherwise print
5036 SystemZ implements only ``n``, and does *not* support any of the other
5037 target-independent modifiers.
5041 - ``c``: Print an unadorned integer or symbol name. (The latter is
5042 target-specific behavior for this typically target-independent modifier).
5043 - ``A``: Print a register name with a '``*``' before it.
5044 - ``b``: Print an 8-bit register name (e.g. ``al``); do nothing on a memory
5046 - ``h``: Print the upper 8-bit register name (e.g. ``ah``); do nothing on a
5048 - ``w``: Print the 16-bit register name (e.g. ``ax``); do nothing on a memory
5050 - ``k``: Print the 32-bit register name (e.g. ``eax``); do nothing on a memory
5052 - ``q``: Print the 64-bit register name (e.g. ``rax``), if 64-bit registers are
5053 available, otherwise the 32-bit register name; do nothing on a memory operand.
5054 - ``n``: Negate and print an unadorned integer, or, for operands other than an
5055 immediate integer (e.g. a relocatable symbol expression), print a '-' before
5056 the operand. (The behavior for relocatable symbol expressions is a
5057 target-specific behavior for this typically target-independent modifier)
5058 - ``H``: Print a memory reference with additional offset +8.
5059 - ``P``: Print a memory reference or operand for use as the argument of a call
5060 instruction. (E.g. omit ``(rip)``, even though it's PC-relative.)
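
As a brief illustration of the register-size modifiers listed above, the
following sketch zero-extends the low byte of an operand. The function name
``@low_byte`` and the choice of the ``q`` constraint for the input are
assumptions made for this example only, and the template assumes the default
AT&T dialect.

.. code-block:: llvm

    define i32 @low_byte(i32 %x) {
      ; ${1:b} prints the 8-bit name of the input register (e.g. al), and
      ; ${0:k} prints the 32-bit name of the output register (e.g. eax).
      %r = call i32 asm "movzbl ${1:b}, ${0:k}", "=r,q"(i32 %x)
      ret i32 %r
    }
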
5064 No additional modifiers.
Call instructions that wrap inline asm nodes may have a
"``!srcloc``" MDNode attached that contains a list of constant
integers. If present, the code generator will use the integer as the
location cookie value when reporting errors through the ``LLVMContext``
5074 error reporting mechanisms. This allows a front-end to correlate backend
5075 errors that occur with inline asm back to the source code that produced
5078 .. code-block:: llvm
5080 call void asm sideeffect "something bad", ""(), !srcloc !42
5082 !42 = !{ i32 1234567 }
5084 It is up to the front-end to make sense of the magic numbers it places
5085 in the IR. If the MDNode contains multiple constants, the code generator
5086 will use the one that corresponds to the line of the asm that the error
5094 LLVM IR allows metadata to be attached to instructions and global objects in the
5095 program that can convey extra information about the code to the optimizers and
5096 code generator. One example application of metadata is source-level
5097 debug information. There are two metadata primitives: strings and nodes.
5099 Metadata does not have a type, and is not a value. If referenced from a
5100 ``call`` instruction, it uses the ``metadata`` type.
5102 All metadata are identified in syntax by an exclamation point ('``!``').
5104 .. _metadata-string:
5106 Metadata Nodes and Metadata Strings
5107 -----------------------------------
5109 A metadata string is a string surrounded by double quotes. It can
5110 contain any character by escaping non-printable characters with
5111 "``\xx``" where "``xx``" is the two digit hex code. For example:
5114 Metadata nodes are represented with notation similar to structure
5115 constants (a comma separated list of elements, surrounded by braces and
preceded by an exclamation point). Metadata nodes can have any values as
their operands. For example:
5119 .. code-block:: llvm
5121 !{ !"test\00", i32 10}
5123 Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example:
5125 .. code-block:: text
5127 !0 = distinct !{!"test\00", i32 10}
5129 ``distinct`` nodes are useful when nodes shouldn't be merged based on their
5130 content. They can also occur when transformations cause uniquing collisions
5131 when metadata operands change.
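
The difference between uniqued and ``distinct`` nodes can be sketched as
follows. The node numbers are arbitrary and chosen only for this illustration.

.. code-block:: text

    !0 = !{!"test\00", i32 10}           ; uniqued
    !1 = !{!"test\00", i32 10}           ; same content, so it may be merged with !0
    !2 = distinct !{!"test\00", i32 10}  ; never merged with !0, !1 or !3
    !3 = distinct !{!"test\00", i32 10}  ; never merged with !0, !1 or !2
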
5133 A :ref:`named metadata <namedmetadatastructure>` is a collection of
5134 metadata nodes, which can be looked up in the module symbol table. For
5137 .. code-block:: llvm
5141 Metadata can be used as function arguments. Here the ``llvm.dbg.value``
5142 intrinsic is using three metadata arguments:
5144 .. code-block:: llvm
5146 call void @llvm.dbg.value(metadata !24, metadata !25, metadata !26)
5148 Metadata can be attached to an instruction. Here metadata ``!21`` is attached
5149 to the ``add`` instruction using the ``!dbg`` identifier:
5151 .. code-block:: llvm
5153 %indvar.next = add i64 %indvar, 1, !dbg !21
5155 Instructions may not have multiple metadata attachments with the same
5158 Metadata can also be attached to a function or a global variable. Here metadata
5159 ``!22`` is attached to the ``f1`` and ``f2`` functions, and the globals ``g1``
5160 and ``g2`` using the ``!dbg`` identifier:
5162 .. code-block:: llvm
5164 declare !dbg !22 void @f1()
5165 define void @f2() !dbg !22 {
5169 @g1 = global i32 0, !dbg !22
5170 @g2 = external global i32, !dbg !22
5172 Unlike instructions, global objects (functions and global variables) may have
5173 multiple metadata attachments with the same identifier.
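
For instance, a global variable can carry several ``!type`` attachments at
once. The sketch below is illustrative only: the global ``@table`` and the
type identifier strings are made up for this example.

.. code-block:: llvm

    ; Two attachments with the same identifier on one global object.
    @table = constant [2 x i32] zeroinitializer, !type !0, !type !1

    !0 = !{i64 0, !"typeid_a"}
    !1 = !{i64 4, !"typeid_b"}
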
A transformation is required to drop any metadata attachment that it does not
know or knows it cannot preserve. Currently there is an exception for metadata
5177 attachment to globals for ``!type`` and ``!absolute_symbol`` which can't be
5178 unconditionally dropped unless the global is itself deleted.
5180 Metadata attached to a module using named metadata may not be dropped, with
5181 the exception of debug metadata (named metadata with the name ``!llvm.dbg.*``).
5183 More information about specific metadata nodes recognized by the
5184 optimizers and code generator is found below.
5186 .. _specialized-metadata:
5188 Specialized Metadata Nodes
5189 ^^^^^^^^^^^^^^^^^^^^^^^^^^
5191 Specialized metadata nodes are custom data structures in metadata (as opposed
5192 to generic tuples). Their fields are labelled, and can be specified in any
5195 These aren't inherently debug info centric, but currently all the specialized
5196 metadata nodes are related to debug info.
5203 ``DICompileUnit`` nodes represent a compile unit. The ``enums:``,
5204 ``retainedTypes:``, ``globals:``, ``imports:`` and ``macros:`` fields are tuples
5205 containing the debug info to be emitted along with the compile unit, regardless
5206 of code optimizations (some nodes are only emitted if there are references to
5207 them from instructions). The ``debugInfoForProfiling:`` field is a boolean
5208 indicating whether or not line-table discriminators are updated to provide
5209 more-accurate debug info for profiling results.
5211 .. code-block:: text
5213 !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
5214 isOptimized: true, flags: "-O2", runtimeVersion: 2,
5215 splitDebugFilename: "abc.debug", emissionKind: FullDebug,
5216 enums: !2, retainedTypes: !3, globals: !4, imports: !5,
5217 macros: !6, dwoId: 0x0abcd)
5219 Compile unit descriptors provide the root scope for objects declared in a
5220 specific compilation unit. File descriptors are defined using this scope. These
5221 descriptors are collected by a named metadata node ``!llvm.dbg.cu``. They keep
5222 track of global variables, type information, and imported entities (declarations
5230 ``DIFile`` nodes represent files. The ``filename:`` can include slashes.
5232 .. code-block:: none
5234 !0 = !DIFile(filename: "path/to/file", directory: "/path/to/dir",
5235 checksumkind: CSK_MD5,
5236 checksum: "000102030405060708090a0b0c0d0e0f")
5238 Files are sometimes used in ``scope:`` fields, and are the only valid target
5239 for ``file:`` fields.
Valid values for the ``checksumkind:`` field are: {CSK_None, CSK_MD5, CSK_SHA1, CSK_SHA256}.
5247 ``DIBasicType`` nodes represent primitive types, such as ``int``, ``bool`` and
5248 ``float``. ``tag:`` defaults to ``DW_TAG_base_type``.
5250 .. code-block:: text
5252 !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
5253 encoding: DW_ATE_unsigned_char)
5254 !1 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")
5256 The ``encoding:`` describes the details of the type. Usually it's one of the
5259 .. code-block:: text
5265 DW_ATE_signed_char = 6
5267 DW_ATE_unsigned_char = 8
5269 .. _DISubroutineType:
5274 ``DISubroutineType`` nodes represent subroutine types. Their ``types:`` field
5275 refers to a tuple; the first operand is the return type, while the rest are the
5276 types of the formal arguments in order. If the first operand is ``null``, that
5277 represents a function with no return value (such as ``void foo() {}`` in C++).
5279 .. code-block:: text
    !0 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
    !1 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
5283 !2 = !DISubroutineType(types: !{null, !0, !1}) ; void (int, char)
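
For comparison, a subroutine type with a non-void return type places the
return type in the first operand. The following is a minimal sketch; the node
numbers are arbitrary.

.. code-block:: text

    !0 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !1 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
    !2 = !DISubroutineType(types: !{!0, !1, !1})   ; int (char, char)
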
5290 ``DIDerivedType`` nodes represent types derived from other types, such as
5293 .. code-block:: text
5295 !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
5296 encoding: DW_ATE_unsigned_char)
5297 !1 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !0, size: 32,
5300 The following ``tag:`` values are valid:
5302 .. code-block:: text
5305 DW_TAG_pointer_type = 15
5306 DW_TAG_reference_type = 16
5308 DW_TAG_inheritance = 28
5309 DW_TAG_ptr_to_member_type = 31
5310 DW_TAG_const_type = 38
5312 DW_TAG_volatile_type = 53
5313 DW_TAG_restrict_type = 55
5314 DW_TAG_atomic_type = 71
5316 .. _DIDerivedTypeMember:
5318 ``DW_TAG_member`` is used to define a member of a :ref:`composite type
5319 <DICompositeType>`. The type of the member is the ``baseType:``. The
5320 ``offset:`` is the member's bit offset. If the composite type has an ODR
``identifier:`` and does not set ``flags: DIFlagFwdDecl``, then the member is
5322 uniqued based only on its ``name:`` and ``scope:``.
5324 ``DW_TAG_inheritance`` and ``DW_TAG_friend`` are used in the ``elements:``
5325 field of :ref:`composite types <DICompositeType>` to describe parents and
5328 ``DW_TAG_typedef`` is used to provide a name for the ``baseType:``.
5330 ``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
5331 ``DW_TAG_volatile_type``, ``DW_TAG_restrict_type`` and ``DW_TAG_atomic_type``
5332 are used to qualify the ``baseType:``.
5334 Note that the ``void *`` type is expressed as a type derived from NULL.
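
The qualifier and typedef tags described above can be chained. The sketch
below assumes a target with 64-bit pointers, and the typedef name ``cstr`` is
hypothetical.

.. code-block:: text

    !0 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64) ; void *
    !1 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
    !2 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !1)               ; const char
    !3 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !2, size: 64)   ; const char *
    !4 = !DIDerivedType(tag: DW_TAG_typedef, name: "cstr", baseType: !3)    ; typedef of !3
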
5336 .. _DICompositeType:
5341 ``DICompositeType`` nodes represent types composed of other types, like
5342 structures and unions. ``elements:`` points to a tuple of the composed types.
5344 If the source language supports ODR, the ``identifier:`` field gives the unique
5345 identifier used for type merging between modules. When specified,
5346 :ref:`subprogram declarations <DISubprogramDeclaration>` and :ref:`member
5347 derived types <DIDerivedTypeMember>` that reference the ODR-type in their
5348 ``scope:`` change uniquing rules.
5350 For a given ``identifier:``, there should only be a single composite type that
5351 does not have ``flags: DIFlagFwdDecl`` set. LLVM tools that link modules
5352 together will unique such definitions at parse time via the ``identifier:``
5353 field, even if the nodes are ``distinct``.
5355 .. code-block:: text
    !0 = !DIEnumerator(name: "SixKind", value: 6)
5358 !1 = !DIEnumerator(name: "SevenKind", value: 7)
5359 !2 = !DIEnumerator(name: "NegEightKind", value: -8)
5360 !3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum", file: !12,
5361 line: 2, size: 32, align: 32, identifier: "_M4Enum",
5362 elements: !{!0, !1, !2})
5364 The following ``tag:`` values are valid:
5366 .. code-block:: text
5368 DW_TAG_array_type = 1
5369 DW_TAG_class_type = 2
5370 DW_TAG_enumeration_type = 4
5371 DW_TAG_structure_type = 19
5372 DW_TAG_union_type = 23
5374 For ``DW_TAG_array_type``, the ``elements:`` should be :ref:`subrange
5375 descriptors <DISubrange>`, each representing the range of subscripts at that
5376 level of indexing. The ``DIFlagVector`` flag to ``flags:`` indicates that an
5377 array type is a native packed vector. The optional ``dataLocation`` is a
5378 DIExpression that describes how to get from an object's address to the actual
5379 raw data, if they aren't equivalent. This is only supported for array types,
particularly to describe Fortran arrays, which have an array descriptor in
addition to the array data. Alternatively, it can be a DIVariable that
holds the address of the actual raw data. Fortran supports pointer
arrays that can be attached to actual arrays; this attachment between pointer
and pointee is called association. The optional ``associated`` is a
DIExpression that describes whether the pointer array is currently associated.
The optional ``allocated`` is a DIExpression that describes whether the
allocatable array is currently allocated. The optional ``rank`` is a
DIExpression that describes the rank (number of dimensions) of a Fortran
assumed-rank array (whose rank is known only at run time).
5391 For ``DW_TAG_enumeration_type``, the ``elements:`` should be :ref:`enumerator
5392 descriptors <DIEnumerator>`, each representing the definition of an enumeration
5393 value for the set. All enumeration type descriptors are collected in the
5394 ``enums:`` field of the :ref:`compile unit <DICompileUnit>`.
5396 For ``DW_TAG_structure_type``, ``DW_TAG_class_type``, and
5397 ``DW_TAG_union_type``, the ``elements:`` should be :ref:`derived types
5398 <DIDerivedType>` with ``tag: DW_TAG_member``, ``tag: DW_TAG_inheritance``, or
5399 ``tag: DW_TAG_friend``; or :ref:`subprograms <DISubprogram>` with
5400 ``isDefinition: false``.
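
As a sketch of how a structure type and its members fit together, consider a
hypothetical C struct ``Point`` with two ``int`` fields. The file name, line
numbers and node numbers are assumptions made for this example.

.. code-block:: text

    !0 = !DIFile(filename: "point.c", directory: "/path/to/dir")
    !1 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    ; struct Point { int x; int y; };
    !2 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Point",
                                   file: !0, line: 1, size: 64, elements: !3)
    !3 = !{!4, !5}
    ; Each member points back at the struct through scope: and records its
    ; bit offset within the struct.
    !4 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !2, file: !0,
                        line: 2, baseType: !1, size: 32)
    !5 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !2, file: !0,
                        line: 3, baseType: !1, size: 32, offset: 32)
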
5407 ``DISubrange`` nodes are the elements for ``DW_TAG_array_type`` variants of
5408 :ref:`DICompositeType`.
5410 - ``count: -1`` indicates an empty array.
5411 - ``count: !10`` describes the count with a :ref:`DILocalVariable`.
5412 - ``count: !12`` describes the count with a :ref:`DIGlobalVariable`.
5414 .. code-block:: text
5416 !0 = !DISubrange(count: 5, lowerBound: 0) ; array counting from 0
5417 !1 = !DISubrange(count: 5, lowerBound: 1) ; array counting from 1
5418 !2 = !DISubrange(count: -1) ; empty array.
5420 ; Scopes used in rest of example
5421 !6 = !DIFile(filename: "vla.c", directory: "/path/to/file")
5422 !7 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6)
5423 !8 = distinct !DISubprogram(name: "foo", scope: !7, file: !6, line: 5)
5425 ; Use of local variable as count value
5426 !9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
5427 !10 = !DILocalVariable(name: "count", scope: !8, file: !6, line: 42, type: !9)
5428 !11 = !DISubrange(count: !10, lowerBound: 0)
5430 ; Use of global variable as count value
5431 !12 = !DIGlobalVariable(name: "count", scope: !8, file: !6, line: 22, type: !9)
5432 !13 = !DISubrange(count: !12, lowerBound: 0)
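
Subranges are consumed by an array type through its ``elements:`` field, with
one subrange per dimension. The following sketch describes a hypothetical C
array ``int a[2][3]``; the node numbers are arbitrary.

.. code-block:: text

    !0 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    ; int a[2][3]: 2 * 3 * 32 = 192 bits in total.
    !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !0, size: 192,
                          elements: !{!2, !3})
    !2 = !DISubrange(count: 2)   ; outer dimension
    !3 = !DISubrange(count: 3)   ; inner dimension
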
5439 ``DIEnumerator`` nodes are the elements for ``DW_TAG_enumeration_type``
5440 variants of :ref:`DICompositeType`.
5442 .. code-block:: text
    !0 = !DIEnumerator(name: "SixKind", value: 6)
5445 !1 = !DIEnumerator(name: "SevenKind", value: 7)
5446 !2 = !DIEnumerator(name: "NegEightKind", value: -8)
5448 DITemplateTypeParameter
5449 """""""""""""""""""""""
5451 ``DITemplateTypeParameter`` nodes represent type parameters to generic source
5452 language constructs. They are used (optionally) in :ref:`DICompositeType` and
5453 :ref:`DISubprogram` ``templateParams:`` fields.
5455 .. code-block:: text
5457 !0 = !DITemplateTypeParameter(name: "Ty", type: !1)
5459 DITemplateValueParameter
5460 """"""""""""""""""""""""
5462 ``DITemplateValueParameter`` nodes represent value parameters to generic source
5463 language constructs. ``tag:`` defaults to ``DW_TAG_template_value_parameter``,
5464 but if specified can also be set to ``DW_TAG_GNU_template_template_param`` or
5465 ``DW_TAG_GNU_template_param_pack``. They are used (optionally) in
5466 :ref:`DICompositeType` and :ref:`DISubprogram` ``templateParams:`` fields.
5468 .. code-block:: text
5470 !0 = !DITemplateValueParameter(name: "Ty", type: !1, value: i32 7)
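
Both kinds of template parameter can appear together in a ``templateParams:``
tuple. The sketch below describes a hypothetical C++ instantiation
``Array<int, 4>``; the names, the size and the node numbers are assumptions
made for illustration.

.. code-block:: text

    !0 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !1 = !DITemplateTypeParameter(name: "T", type: !0)                 ; T = int
    !2 = !DITemplateValueParameter(name: "N", type: !0, value: i32 4)  ; N = 4
    !3 = distinct !DICompositeType(tag: DW_TAG_structure_type,
                                   name: "Array<int, 4>", size: 128,
                                   templateParams: !{!1, !2})
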
5475 ``DINamespace`` nodes represent namespaces in the source language.
5477 .. code-block:: text
5479 !0 = !DINamespace(name: "myawesomeproject", scope: !1, file: !2, line: 7)
5481 .. _DIGlobalVariable:
5486 ``DIGlobalVariable`` nodes represent global variables in the source language.
5488 .. code-block:: text
5490 @foo = global i32, !dbg !0
5491 !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
5492 !1 = !DIGlobalVariable(name: "foo", linkageName: "foo", scope: !2,
5493 file: !3, line: 7, type: !4, isLocal: true,
5494 isDefinition: false, declaration: !5)
5497 DIGlobalVariableExpression
5498 """"""""""""""""""""""""""
5500 ``DIGlobalVariableExpression`` nodes tie a :ref:`DIGlobalVariable` together
5501 with a :ref:`DIExpression`.
5503 .. code-block:: text
5505 @lower = global i32, !dbg !0
5506 @upper = global i32, !dbg !1
5507 !0 = !DIGlobalVariableExpression(
5509 expr: !DIExpression(DW_OP_LLVM_fragment, 0, 32)
5511 !1 = !DIGlobalVariableExpression(
5513 expr: !DIExpression(DW_OP_LLVM_fragment, 32, 32)
5515 !2 = !DIGlobalVariable(name: "split64", linkageName: "split64", scope: !3,
5516 file: !4, line: 8, type: !5, declaration: !6)
All global variable expressions should be referenced by the ``globals:`` field of
5519 a :ref:`compile unit <DICompileUnit>`.
5526 ``DISubprogram`` nodes represent functions from the source language. A distinct
5527 ``DISubprogram`` may be attached to a function definition using ``!dbg``
5528 metadata. A unique ``DISubprogram`` may be attached to a function declaration
5529 used for call site debug info. The ``retainedNodes:`` field is a list of
5530 :ref:`variables <DILocalVariable>` and :ref:`labels <DILabel>` that must be
5531 retained, even if their IR counterparts are optimized out of the IR. The
``type:`` field must point at a :ref:`DISubroutineType`.
5534 .. _DISubprogramDeclaration:
5536 When ``isDefinition: false``, subprograms describe a declaration in the type
5537 tree as opposed to a definition of a function. If the scope is a composite
type with an ODR ``identifier:`` that does not set ``flags: DIFlagFwdDecl``,
5539 then the subprogram declaration is uniqued based only on its ``linkageName:``
5542 .. code-block:: text
5544 define void @_Z3foov() !dbg !0 {
    !0 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1,
5549 file: !2, line: 7, type: !3, isLocal: true,
5550 isDefinition: true, scopeLine: 8,
5552 virtuality: DW_VIRTUALITY_pure_virtual,
5553 virtualIndex: 10, flags: DIFlagPrototyped,
5554 isOptimized: true, unit: !5, templateParams: !6,
5555 declaration: !7, retainedNodes: !8,
5563 ``DILexicalBlock`` nodes describe nested blocks within a :ref:`subprogram
<DISubprogram>`. The line and column numbers are used to distinguish
two lexical blocks at the same depth. They are valid targets for ``scope:``
5568 .. code-block:: text
5570 !0 = distinct !DILexicalBlock(scope: !1, file: !2, line: 7, column: 35)
5572 Usually lexical blocks are ``distinct`` to prevent node merging based on
5575 .. _DILexicalBlockFile:
5580 ``DILexicalBlockFile`` nodes are used to discriminate between sections of a
5581 :ref:`lexical block <DILexicalBlock>`. The ``file:`` field can be changed to
5582 indicate textual inclusion, or the ``discriminator:`` field can be used to
5583 discriminate between control flow within a single block in the source language.
5585 .. code-block:: text
5587 !0 = !DILexicalBlock(scope: !3, file: !4, line: 7, column: 35)
5588 !1 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 0)
5589 !2 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 1)
5596 ``DILocation`` nodes represent source debug locations. The ``scope:`` field is
mandatory, and points at a :ref:`DILexicalBlockFile`, a
:ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
5600 .. code-block:: text
5602 !0 = !DILocation(line: 2900, column: 42, scope: !1, inlinedAt: !2)
5604 .. _DILocalVariable:
5609 ``DILocalVariable`` nodes represent local variables in the source language. If
5610 the ``arg:`` field is set to non-zero, then this variable is a subprogram
5611 parameter, and it will be included in the ``retainedNodes:`` field of its
5612 :ref:`DISubprogram`.
5614 .. code-block:: text
5616 !0 = !DILocalVariable(name: "this", arg: 1, scope: !3, file: !2, line: 7,
5617 type: !3, flags: DIFlagArtificial)
5618 !1 = !DILocalVariable(name: "x", arg: 2, scope: !4, file: !2, line: 7,
5620 !2 = !DILocalVariable(name: "y", scope: !5, file: !2, line: 7, type: !3)
5627 ``DIExpression`` nodes represent expressions that are inspired by the DWARF
5628 expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
5629 (such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
5630 referenced LLVM variable relates to the source language variable. Debug
5631 intrinsics are interpreted left-to-right: start by pushing the value/address
5632 operand of the intrinsic onto a stack, then repeatedly push and evaluate
5633 opcodes from the DIExpression until the final variable description is produced.
The currently supported opcode vocabulary is limited:
5637 - ``DW_OP_deref`` dereferences the top of the expression stack.
5638 - ``DW_OP_plus`` pops the last two entries from the expression stack, adds
5639 them together and appends the result to the expression stack.
5640 - ``DW_OP_minus`` pops the last two entries from the expression stack, subtracts
5641 the last entry from the second last entry and appends the result to the
5643 - ``DW_OP_plus_uconst, 93`` adds ``93`` to the working expression.
5644 - ``DW_OP_LLVM_fragment, 16, 8`` specifies the offset and size (``16`` and ``8``
5645 here, respectively) of the variable fragment from the working expression. Note
5646 that contrary to DW_OP_bit_piece, the offset is describing the location
5647 within the described source variable.
5648 - ``DW_OP_LLVM_convert, 16, DW_ATE_signed`` specifies a bit size and encoding
5649 (``16`` and ``DW_ATE_signed`` here, respectively) to which the top of the
5650 expression stack is to be converted. Maps into a ``DW_OP_convert`` operation
5651 that references a base type constructed from the supplied values.
5652 - ``DW_OP_LLVM_tag_offset, tag_offset`` specifies that a memory tag should be
5653 optionally applied to the pointer. The memory tag is derived from the
5654 given tag offset in an implementation-defined manner.
- ``DW_OP_swap`` swaps the top two stack entries.
- ``DW_OP_xderef`` provides an extended dereference mechanism. The entry at the top
5657 of the stack is treated as an address. The second stack entry is treated as an
5658 address space identifier.
5659 - ``DW_OP_stack_value`` marks a constant value.
5660 - ``DW_OP_LLVM_entry_value, N`` may only appear in MIR and at the
5661 beginning of a ``DIExpression``. In DWARF a ``DBG_VALUE``
5662 instruction binding a ``DIExpression(DW_OP_LLVM_entry_value`` to a
5663 register is lowered to a ``DW_OP_entry_value [reg]``, pushing the
5664 value the register had upon function entry onto the stack. The next
5665 ``(N - 1)`` operations will be part of the ``DW_OP_entry_value``
5666 block argument. For example, ``!DIExpression(DW_OP_LLVM_entry_value,
5667 1, DW_OP_plus_uconst, 123, DW_OP_stack_value)`` specifies an
5668 expression where the entry value of the debug value instruction's
5669 value/address operand is pushed to the stack, and is added
5670 with 123. Due to framework limitations ``N`` can currently only
5673 The operation is introduced by the ``LiveDebugValues`` pass, which
5674 applies it only to function parameters that are unmodified
5675 throughout the function. Support is limited to simple register
5676 location descriptions, or as indirect locations (e.g., when a struct
5677 is passed-by-value to a callee via a pointer to a temporary copy
5678 made in the caller). The entry value op is also introduced by the
5679 ``AsmPrinter`` pass when a call site parameter value
5680 (``DW_AT_call_site_parameter_value``) is represented as entry value
5682 - ``DW_OP_LLVM_arg, N`` is used in debug intrinsics that refer to more than one
5683 value, such as one that calculates the sum of two registers. This is always
5684 used in combination with an ordered list of values, such that
  ``DW_OP_LLVM_arg, N`` refers to the ``N``\ th element in that list. For
  example, ``!DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_minus,
  DW_OP_stack_value)`` used with the list ``(%reg1, %reg2)`` would evaluate to
  ``%reg1 - %reg2``. This list of values should be provided by the containing
5689 intrinsic/instruction.
- ``DW_OP_breg`` (or ``DW_OP_bregx``) represents the content at the provided
  signed offset from the specified register. The opcode is only generated by the
  ``AsmPrinter`` pass to describe a call site parameter value that requires an
  expression over two registers.
- ``DW_OP_push_object_address`` pushes the address of the object, which can then
  serve as a descriptor in subsequent calculations. This opcode can be used to
  calculate the bounds of a Fortran allocatable array that has an array descriptor.
- ``DW_OP_over`` duplicates the entry currently second in the stack at the top
  of the stack. This opcode can be used to calculate the bounds of a Fortran
  assumed-rank array, whose rank is known at run time and whose current
  dimension number is implicitly the first element of the stack.
- ``DW_OP_LLVM_implicit_pointer`` specifies the dereferenced value. It can
  be used to represent pointer variables that are optimized out but whose
  pointee value is known. This operator is required because it differs from the
  DWARF operator DW_OP_implicit_pointer in representation and specification (number
  and types of operands), and the latter cannot be used at multiple levels.
5707 .. code-block:: text
5711 call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !20)
5712 !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
5714 !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
5715 !19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !20 = !DIExpression(DW_OP_LLVM_implicit_pointer)
5720 call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !21)
5721 !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
5723 !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
5724 !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
5725 !20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !21 = !DIExpression(DW_OP_LLVM_implicit_pointer,
                        DW_OP_LLVM_implicit_pointer)
5729 DWARF specifies three kinds of simple location descriptions: Register, memory,
5730 and implicit location descriptions. Note that a location description is
defined over certain ranges of a program, i.e., the location of a variable may
5732 change over the course of the program. Register and memory location
5733 descriptions describe the *concrete location* of a source variable (in the
5734 sense that a debugger might modify its value), whereas *implicit locations*
5735 describe merely the actual *value* of a source variable which might not exist
5736 in registers or in memory (see ``DW_OP_stack_value``).
An ``llvm.dbg.addr`` or ``llvm.dbg.declare`` intrinsic describes an indirect
5739 value (the address) of a source variable. The first operand of the intrinsic
5740 must be an address of some kind. A DIExpression attached to the intrinsic
5741 refines this address to produce a concrete location for the source variable.
An ``llvm.dbg.value`` intrinsic describes the direct value of a source variable.
5744 The first operand of the intrinsic may be a direct or indirect value. A
5745 DIExpression attached to the intrinsic refines the first operand to produce a
5746 direct value. For example, if the first operand is an indirect value, it may be
5747 necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
5748 valid debug intrinsic.
5752 A DIExpression is interpreted in the same way regardless of which kind of
5753 debug intrinsic it's attached to.
5755 .. code-block:: text
5757 !0 = !DIExpression(DW_OP_deref)
    !1 = !DIExpression(DW_OP_plus_uconst, 3)
    !2 = !DIExpression(DW_OP_constu, 3, DW_OP_plus)
    !3 = !DIExpression(DW_OP_bit_piece, 3, 7)
    !4 = !DIExpression(DW_OP_deref, DW_OP_constu, 3, DW_OP_plus, DW_OP_LLVM_fragment, 3, 7)
    !5 = !DIExpression(DW_OP_constu, 2, DW_OP_swap, DW_OP_xderef)
    !6 = !DIExpression(DW_OP_constu, 42, DW_OP_stack_value)
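
To make the left-to-right evaluation concrete, here is an informal trace of
one expression. ``V`` stands for the value/address operand of the intrinsic.
The trace notation is only a sketch for illustration and is not output
produced by any tool.

.. code-block:: text

    ; !DIExpression(DW_OP_constu, 3, DW_OP_minus, DW_OP_deref)
    ;
    ;   operation              stack afterwards (top on the right)
    ;   (intrinsic operand)    V
    ;   DW_OP_constu, 3        V, 3
    ;   DW_OP_minus            V - 3       ; second-last entry minus last entry
    ;   DW_OP_deref            *(V - 3)    ; dereference the top of the stack
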
5768 ``DIArgList`` nodes hold a list of constant or SSA value references. These are
5769 used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
5770 ``llvm.dbg.value``) in combination with a ``DIExpression`` that uses the
5771 ``DW_OP_LLVM_arg`` operator. Because a DIArgList may refer to local values
5772 within a function, it must only be used as a function argument, must always be
5773 inlined, and cannot appear in named metadata.
5775 .. code-block:: text
5777 llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %b),
5779 metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus))
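
The expression above can be traced informally as follows, assuming that each
``DW_OP_LLVM_arg, N`` pushes the ``N``\ th element of the ``DIArgList`` onto
the expression stack. The trace notation is a sketch for illustration only.

.. code-block:: text

    ; !DIArgList(i32 %a, i32 %b)
    ; !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)
    ;
    ;   operation              stack afterwards (top on the right)
    ;   DW_OP_LLVM_arg, 0      %a
    ;   DW_OP_LLVM_arg, 1      %a, %b
    ;   DW_OP_plus             %a + %b
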
5784 These flags encode various properties of DINodes.
The ``ExportSymbols`` flag marks a class, struct or union whose members
may be referenced as if they were defined in the containing class or
union. This flag is used to decide whether ``DW_AT_export_symbols`` can
be used for the structure type.
5794 ``DIObjCProperty`` nodes represent Objective-C property nodes.
5796 .. code-block:: text
5798 !3 = !DIObjCProperty(name: "foo", file: !1, line: 7, setter: "setFoo",
5799 getter: "getFoo", attributes: 7, type: !2)
5804 ``DIImportedEntity`` nodes represent entities (such as modules) imported into a
5805 compile unit. The ``elements`` field is a list of renamed entities (such as
variables and subprograms) in the imported entity (such as a module).
5808 .. code-block:: text
5810 !2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0,
5811 entity: !1, line: 7, elements: !3)
5813 !4 = !DIImportedEntity(tag: DW_TAG_imported_declaration, name: "bar", scope: !0,
5814 entity: !5, line: 7)
``DIMacro`` nodes represent the definition or undefinition of a macro identifier.
The ``name:`` field is the macro identifier, followed by macro parameters when
defining a function-like macro, and the ``value:`` field is the token-string
5822 used to expand the macro identifier.
5824 .. code-block:: text
5826 !2 = !DIMacro(macinfo: DW_MACINFO_define, line: 7, name: "foo(x)",
5828 !3 = !DIMacro(macinfo: DW_MACINFO_undef, line: 30, name: "foo")
5833 ``DIMacroFile`` nodes represent inclusion of source files.
5834 The ``nodes:`` field is a list of ``DIMacro`` and ``DIMacroFile`` nodes that
5835 appear in the included source file.
5837 .. code-block:: text
5839 !2 = !DIMacroFile(macinfo: DW_MACINFO_start_file, line: 7, file: !2,
5847 ``DILabel`` nodes represent labels within a :ref:`DISubprogram`. All fields of
5848 a ``DILabel`` are mandatory. The ``scope:`` field must be one of either a
5849 :ref:`DILexicalBlockFile`, a :ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
5850 The ``name:`` field is the label identifier. The ``file:`` field is the
5851 :ref:`DIFile` the label is present in. The ``line:`` field is the source line
5852 within the file where the label is declared.
5854 .. code-block:: text
5856 !2 = !DILabel(scope: !0, name: "foo", file: !1, line: 7)
5861 In LLVM IR, memory does not have types, so LLVM's own type system is not
5862 suitable for doing type based alias analysis (TBAA). Instead, metadata is
5863 added to the IR to describe a type system of a higher level language. This
5864 can be used to implement C/C++ strict type aliasing rules, but it can also
5865 be used to implement custom alias analysis behavior for other languages.
5867 This description of LLVM's TBAA system is broken into two parts:
5868 :ref:`Semantics<tbaa_node_semantics>` talks about high level issues, and
5869 :ref:`Representation<tbaa_node_representation>` talks about the metadata
5870 encoding of various entities.
5872 It is always possible to trace any TBAA node to a "root" TBAA node (details
5873 in the :ref:`Representation<tbaa_node_representation>` section). TBAA
5874 nodes with different roots have an unknown aliasing relationship, and LLVM
5875 conservatively infers ``MayAlias`` between them. The rules mentioned in
5876 this section only pertain to TBAA nodes living under the same root.
5878 .. _tbaa_node_semantics:
5883 The TBAA metadata system, referred to as "struct path TBAA" (not to be
5884 confused with ``tbaa.struct``), consists of the following high level
5885 concepts: *Type Descriptors*, further subdivided into scalar type
5886 descriptors and struct type descriptors; and *Access Tags*.
5888 **Type descriptors** describe the type system of the higher level language
5889 being compiled. **Scalar type descriptors** describe types that do not
5890 contain other types. Each scalar type has a parent type, which must also
5891 be a scalar type or the TBAA root. Via this parent relation, scalar types
5892 within a TBAA root form a tree. **Struct type descriptors** denote types
5893 that contain a sequence of other type descriptors, at known offsets. These
5894 contained type descriptors can either be struct type descriptors themselves
5895 or scalar type descriptors.
5897 **Access tags** are metadata nodes attached to load and store instructions.
5898 Access tags use type descriptors to describe the *location* being accessed
5899 in terms of the type system of the higher level language. Access tags are
5900 tuples consisting of a base type, an access type and an offset. The base
5901 type is a scalar type descriptor or a struct type descriptor, the access
5902 type is a scalar type descriptor, and the offset is a constant integer.
5904 The access tag ``(BaseTy, AccessTy, Offset)`` can describe one of two
5907 * If ``BaseTy`` is a struct type, the tag describes a memory access (load
5908 or store) of a value of type ``AccessTy`` contained in the struct type
5909 ``BaseTy`` at offset ``Offset``.
5911 * If ``BaseTy`` is a scalar type, ``Offset`` must be 0 and ``BaseTy`` and
5912 ``AccessTy`` must be the same; and the access tag describes a scalar
5913 access with scalar type ``AccessTy``.
5915 We first define an ``ImmediateParent`` relation on ``(BaseTy, Offset)``
5918 * If ``BaseTy`` is a scalar type then ``ImmediateParent(BaseTy, 0)`` is
5919 ``(ParentTy, 0)`` where ``ParentTy`` is the parent of the scalar type as
5920 described in the TBAA metadata. ``ImmediateParent(BaseTy, Offset)`` is
5921 undefined if ``Offset`` is non-zero.
5923 * If ``BaseTy`` is a struct type then ``ImmediateParent(BaseTy, Offset)``
5924 is ``(NewTy, NewOffset)`` where ``NewTy`` is the type contained in
5925 ``BaseTy`` at offset ``Offset`` and ``NewOffset`` is ``Offset`` adjusted
5926 to be relative within that inner type.
A memory access with an access tag ``(BaseTy1, AccessTy1, Offset1)``
aliases a memory access with an access tag ``(BaseTy2, AccessTy2,
Offset2)`` if either ``(BaseTy1, Offset1)`` is reachable from ``(BaseTy2,
Offset2)`` via the ``ImmediateParent`` relation or vice versa.
5933 As a concrete example, the type descriptor graph for the following program
5939 float f; // offset 4
5943 float f; // offset 0
5944 double d; // offset 4
5945 struct Inner inner_a; // offset 12
5948 void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
5949 outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0)
5950 outer->inner_a.i = 0; // tag1: (OuterStructTy, IntScalarTy, 12)
5951 outer->inner_a.f = 0.0; // tag2: (OuterStructTy, FloatScalarTy, 16)
5952 *f = 0.0; // tag3: (FloatScalarTy, FloatScalarTy, 0)
5955 is (note that in C and C++, ``char`` can be used to access any arbitrary
5958 .. code-block:: text
5961 CharScalarTy = ("char", Root, 0)
5962 FloatScalarTy = ("float", CharScalarTy, 0)
5963 DoubleScalarTy = ("double", CharScalarTy, 0)
5964 IntScalarTy = ("int", CharScalarTy, 0)
    InnerStructTy = {"Inner", (IntScalarTy, 0), (FloatScalarTy, 4)}
5966 OuterStructTy = {"Outer", (FloatScalarTy, 0), (DoubleScalarTy, 4),
5967 (InnerStructTy, 12)}
5970 with (e.g.) ``ImmediateParent(OuterStructTy, 12)`` = ``(InnerStructTy,
5971 0)``, ``ImmediateParent(InnerStructTy, 0)`` = ``(IntScalarTy, 0)``, and
5972 ``ImmediateParent(IntScalarTy, 0)`` = ``(CharScalarTy, 0)``.
5974 .. _tbaa_node_representation:
5979 The root node of a TBAA type hierarchy is an ``MDNode`` with 0 operands or
5980 with exactly one ``MDString`` operand.
Scalar type descriptors are represented as ``MDNode`` s with two
operands. The first operand is an ``MDString`` denoting the name of the
scalar type. LLVM does not assign meaning to the value of this operand; it
5985 only cares about it being an ``MDString``. The second operand is an
5986 ``MDNode`` which points to the parent for said scalar type descriptor,
5987 which is either another scalar type descriptor or the TBAA root. Scalar
5988 type descriptors can have an optional third argument, but that must be the
5989 constant integer zero.
5991 Struct type descriptors are represented as ``MDNode`` s with an odd number
5992 of operands greater than 1. The first operand is an ``MDString`` denoting
5993 the name of the struct type. Like in scalar type descriptors the actual
5994 value of this name operand is irrelevant to LLVM. After the name operand,
5995 the struct type descriptors have a sequence of alternating ``MDNode`` and
5996 ``ConstantInt`` operands. With N starting from 1, the 2N - 1 th operand,
5997 an ``MDNode``, denotes a contained field, and the 2N th operand, a
5998 ``ConstantInt``, is the offset of the said contained field. The offsets
5999 must be in non-decreasing order.
6001 Access tags are represented as ``MDNode`` s with either 3 or 4 operands.
6002 The first operand is an ``MDNode`` pointing to the node representing the
6003 base type. The second operand is an ``MDNode`` pointing to the node
6004 representing the access type. The third operand is a ``ConstantInt`` that
6005 states the offset of the access. If a fourth field is present, it must be
6006 a ``ConstantInt`` valued at 0 or 1. If it is 1 then the access tag states
6007 that the location being accessed is "constant" (meaning
6008 ``pointsToConstantMemory`` should return true; see `other useful
6009 AliasAnalysis methods <AliasAnalysis.html#OtherItfs>`_). The TBAA root of
6010 the access type and the base type of an access tag must be the same, and
6011 that is the TBAA root of the access tag.
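
As an illustrative sketch (not the authoritative output of any frontend), the
``Inner``/``Outer`` example from the Semantics section could be encoded as
shown below. The root and ``char`` node names are assumptions, chosen to
resemble strings a C frontend might emit; LLVM assigns no meaning to them. The
metadata numbering and the pointer value names are likewise hypothetical.

.. code-block:: llvm

    ; Fragment: %outer_f, %inner_i, %inner_f and %f are assumed to be
    ; pointers computed earlier in the enclosing function.
    store float 0.0, float* %outer_f, align 4, !tbaa !7   ; outer->f
    store i32 0, i32* %inner_i, align 4, !tbaa !8         ; outer->inner_a.i
    store float 0.0, float* %inner_f, align 4, !tbaa !9   ; outer->inner_a.f
    store float 0.0, float* %f, align 4, !tbaa !10        ; *f

    !0 = !{!"Simple C/C++ TBAA"}                ; root: one MDString operand
    !1 = !{!"omnipotent char", !0, i64 0}       ; CharScalarTy
    !2 = !{!"float", !1, i64 0}                 ; FloatScalarTy
    !3 = !{!"double", !1, i64 0}                ; DoubleScalarTy
    !4 = !{!"int", !1, i64 0}                   ; IntScalarTy
    !5 = !{!"Inner", !4, i64 0, !2, i64 4}      ; InnerStructTy
    !6 = !{!"Outer", !2, i64 0, !3, i64 4, !5, i64 12} ; OuterStructTy

    !7 = !{!6, !2, i64 0}                       ; tag0
    !8 = !{!6, !4, i64 12}                      ; tag1
    !9 = !{!6, !2, i64 16}                      ; tag2
    !10 = !{!2, !2, i64 0}                      ; tag3

Under the aliasing rule from the Semantics section, tag0 and tag2 both reach
``(FloatScalarTy, 0)`` and therefore may alias tag3, whereas tag1 only reaches
``(IntScalarTy, 0)`` and ``(CharScalarTy, 0)``, so TBAA can conclude that the
``int`` store does not alias ``*f``.
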
6013 '``tbaa.struct``' Metadata
6014 ^^^^^^^^^^^^^^^^^^^^^^^^^^
The :ref:`llvm.memcpy <int_memcpy>` intrinsic is often used to implement
aggregate assignment operations in C and similar languages. However, it
6018 is defined to copy a contiguous region of memory, which is more than
6019 strictly necessary for aggregate types which contain holes due to
6020 padding. Also, it doesn't contain any TBAA information about the fields
6023 ``!tbaa.struct`` metadata can describe which memory subregions in a
6024 memcpy are padding and what the TBAA tags of the struct are.
6026 The current metadata format is very simple. ``!tbaa.struct`` metadata
6027 nodes are a list of operands which are in conceptual groups of three.
For each group of three, the first operand gives the offset of a
6029 field in bytes, the second gives its size in bytes, and the third gives
6032 .. code-block:: llvm
6034 !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }
6036 This describes a struct with two fields. The first is at offset 0 bytes
6037 with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes
6038 and has size 4 bytes and has tbaa tag !2.
6040 Note that the fields need not be contiguous. In this example, there is a
6041 4 byte gap between the two fields. This gap represents padding which
6042 does not carry useful data and need not be preserved.
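
A minimal sketch of how such a node might be attached to a copy, assuming a
hypothetical 12-byte aggregate whose bytes 4 through 7 are padding. The TBAA
nodes, their names, and the ``%dst`` and ``%src`` values are illustrative
assumptions, not taken from a real compilation.

.. code-block:: llvm

    ; Copy 12 bytes; !tbaa.struct says only bytes [0,4) and [8,12) carry data.
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src,
                                         i64 12, i1 false), !tbaa.struct !4

    !0 = !{!"Simple C/C++ TBAA"}               ; hypothetical TBAA root
    !5 = !{!"omnipotent char", !0, i64 0}
    !6 = !{!"int", !5, i64 0}
    !7 = !{!"float", !5, i64 0}
    !1 = !{!6, !6, i64 0}                      ; access tag of the first field
    !2 = !{!7, !7, i64 0}                      ; access tag of the second field
    !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }
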
6044 '``noalias``' and '``alias.scope``' Metadata
6045 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6047 ``noalias`` and ``alias.scope`` metadata provide the ability to specify generic
6048 noalias memory-access sets. This means that some collection of memory access
6049 instructions (loads, stores, memory-accessing calls, etc.) that carry
``noalias`` metadata can be explicitly declared not to alias with some other
6051 collection of memory access instructions that carry ``alias.scope`` metadata.
6052 Each type of metadata specifies a list of scopes where each scope has an id and
6055 When evaluating an aliasing query, if for some domain, the set
6056 of scopes with that domain in one instruction's ``alias.scope`` list is a
6057 subset of (or equal to) the set of scopes for that domain in another
6058 instruction's ``noalias`` list, then the two memory accesses are assumed not to
6061 Because scopes in one domain don't affect scopes in other domains, separate
6062 domains can be used to compose multiple independent noalias sets. This is
6063 used for example during inlining. As the noalias function parameters are
6064 turned into noalias scope metadata, a new domain is used every time the
6065 function is inlined.
6067 The metadata identifying each domain is itself a list containing one or two
6068 entries. The first entry is the name of the domain. Note that if the name is a
6069 string then it can be combined across functions and translation units. A
6070 self-reference can be used to create globally unique domain names. A
6071 descriptive string may optionally be provided as a second list entry.
6073 The metadata identifying each scope is also itself a list containing two or
6074 three entries. The first entry is the name of the scope. Note that if the name
6075 is a string then it can be combined across functions and translation units. A
6076 self-reference can be used to create globally unique scope names. A metadata
6077 reference to the scope's domain is the second entry. A descriptive string may
6078 optionally be provided as a third list entry.
6082 .. code-block:: llvm
6084 ; Two scope domains:
6088 ; Some scopes in these domains:
6094 !5 = !{!4} ; A list containing only scope !4
6098 ; These two instructions don't alias:
6099 %0 = load float, float* %c, align 4, !alias.scope !5
6100 store float %0, float* %arrayidx.i, align 4, !noalias !5
6102 ; These two instructions also don't alias (for domain !1, the set of scopes
6103 ; in the !alias.scope equals that in the !noalias list):
6104 %2 = load float, float* %c, align 4, !alias.scope !5
6105 store float %2, float* %arrayidx.i2, align 4, !noalias !6
6107 ; These two instructions may alias (for domain !0, the set of scopes in
6108 ; the !noalias list is not a superset of, or equal to, the scopes in the
6109 ; !alias.scope list):
6110 %2 = load float, float* %c, align 4, !alias.scope !6
6111 store float %0, float* %arrayidx.i, align 4, !noalias !7
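
For a self-contained illustration, the sketch below shows roughly what one
might expect for a function whose two pointers are known not to overlap. The
function name, the descriptive strings, and the metadata numbering are
hypothetical.

.. code-block:: llvm

    define void @copy_one(float* %dst, float* %src) {
    entry:
      ; The load belongs to scope !1 and is declared not to alias scope !2.
      %v = load float, float* %src, align 4, !alias.scope !3, !noalias !4
      ; The store belongs to scope !2 and is declared not to alias scope !1.
      store float %v, float* %dst, align 4, !alias.scope !4, !noalias !3
      ret void
    }

    !0 = distinct !{!0, !"copy_one domain"}     ; self-referencing domain
    !1 = distinct !{!1, !0, !"copy_one: src"}   ; scope for accesses via %src
    !2 = distinct !{!2, !0, !"copy_one: dst"}   ; scope for accesses via %dst
    !3 = !{!1}                                  ; list containing only scope !1
    !4 = !{!2}                                  ; list containing only scope !2

The load's ``alias.scope`` list (``!3``) is a subset of the store's
``noalias`` list (also ``!3``) within domain ``!0``, so the two accesses are
assumed not to alias.
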
6113 '``fpmath``' Metadata
6114 ^^^^^^^^^^^^^^^^^^^^^
6116 ``fpmath`` metadata may be attached to any instruction of floating-point
6117 type. It can be used to express the maximum acceptable error in the
6118 result of that instruction, in ULPs, thus potentially allowing the
6119 compiler to use a more efficient but less accurate method of computing
6120 it. ULP is defined as follows:
6122 If ``x`` is a real number that lies between two finite consecutive
6123 floating-point numbers ``a`` and ``b``, without being equal to one
6124 of them, then ``ulp(x) = |b - a|``, otherwise ``ulp(x)`` is the
6125 distance between the two non-equal finite floating-point numbers
6126 nearest ``x``. Moreover, ``ulp(NaN)`` is ``NaN``.
6128 The metadata node shall consist of a single positive float type number
6129 representing the maximum relative error, for example:
6131 .. code-block:: llvm
6133 !0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
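
The node is attached to an instruction like any other metadata. A minimal
sketch, with hypothetical value names:

.. code-block:: llvm

    ; The division may be computed with up to 2.5 ULPs of error.
    %q = fdiv float %x, %y, !fpmath !0

    !0 = !{ float 2.5 }
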
6137 '``range``' Metadata
6138 ^^^^^^^^^^^^^^^^^^^^
6140 ``range`` metadata may be attached only to ``load``, ``call`` and ``invoke`` of
6141 integer types. It expresses the possible ranges the loaded value or the value
6142 returned by the called function at this call site is in. If the loaded or
6143 returned value is not in the specified range, the behavior is undefined. The
6144 ranges are represented with a flattened list of integers. The loaded value or
6145 the value returned is known to be in the union of the ranges defined by each
6146 consecutive pair. Each pair has the following properties:
6148 - The type must match the type loaded by the instruction.
6149 - The pair ``a,b`` represents the range ``[a,b)``.
6150 - Both ``a`` and ``b`` are constants.
6151 - The range is allowed to wrap.
6152 - The range should not represent the full or empty set. That is,
6155 In addition, the pairs must be in signed order of the lower bound and
6156 they must be non-contiguous.
6160 .. code-block:: llvm
6162 %a = load i8, i8* %x, align 1, !range !0 ; Can only be 0 or 1
6163 %b = load i8, i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1
6164 %c = call i8 @foo(), !range !2 ; Can only be 0, 1, 3, 4 or 5
6165 %d = invoke i8 @bar() to label %cont
6166 unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
6168 !0 = !{ i8 0, i8 2 }
6169 !1 = !{ i8 255, i8 2 }
6170 !2 = !{ i8 0, i8 2, i8 3, i8 6 }
6171 !3 = !{ i8 -2, i8 0, i8 3, i8 6 }
6173 '``absolute_symbol``' Metadata
6174 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6176 ``absolute_symbol`` metadata may be attached to a global variable
6177 declaration. It marks the declaration as a reference to an absolute symbol,
6178 which causes the backend to use absolute relocations for the symbol even
6179 in position independent code, and expresses the possible ranges that the
6180 global variable's *address* (not its value) is in, in the same format as
6181 ``range`` metadata, with the extension that the pair ``all-ones,all-ones``
6182 may be used to represent the full set.
6184 Example (assuming 64-bit pointers):
6186 .. code-block:: llvm
6188 @a = external global i8, !absolute_symbol !0 ; Absolute symbol in range [0,256)
6189 @b = external global i8, !absolute_symbol !1 ; Absolute symbol in range [0,2^64)
6192 !0 = !{ i64 0, i64 256 }
6193 !1 = !{ i64 -1, i64 -1 }
6195 '``callees``' Metadata
6196 ^^^^^^^^^^^^^^^^^^^^^^
6198 ``callees`` metadata may be attached to indirect call sites. If ``callees``
6199 metadata is attached to a call site, and any callee is not among the set of
6200 functions provided by the metadata, the behavior is undefined. The intent of
6201 this metadata is to facilitate optimizations such as indirect-call promotion.
6202 For example, in the code below, the call instruction may only target the
6203 ``add`` or ``sub`` functions:
6205 .. code-block:: llvm
6207 %result = call i64 %binop(i64 %x, i64 %y), !callees !0
6210 !0 = !{i64 (i64, i64)* @add, i64 (i64, i64)* @sub}
6212 '``callback``' Metadata
6213 ^^^^^^^^^^^^^^^^^^^^^^^
``callback`` metadata may be attached to a function declaration or definition.
(Call sites are excluded only due to the lack of a use case.) For ease of
exposition, we'll refer to the function annotated with this metadata as a broker
function. The metadata describes how the arguments of a call to the broker are
in turn passed to the callback function specified by the metadata. Thus, the
``callback`` metadata provides a partial description of a call site inside the
broker function with regard to the arguments of a call to the broker. The only
6222 semantic restriction on the broker function itself is that it is not allowed to
6223 inspect or modify arguments referenced in the ``callback`` metadata as
6224 pass-through to the callback function.
6226 The broker is not required to actually invoke the callback function at runtime.
6227 However, the assumptions about not inspecting or modifying arguments that would
6228 be passed to the specified callback function still hold, even if the callback
6229 function is not dynamically invoked. The broker is allowed to invoke the
6230 callback function more than once per invocation of the broker. The broker is
6231 also allowed to invoke (directly or indirectly) the function passed as a
6232 callback through another use. Finally, the broker is also allowed to relay the
6233 callback callee invocation to a different thread.
6235 The metadata is structured as follows: At the outer level, ``callback``
6236 metadata is a list of ``callback`` encodings. Each encoding starts with a
6237 constant ``i64`` which describes the argument position of the callback function
6238 in the call to the broker. The following elements, except the last, describe
6239 what arguments are passed to the callback function. Each element is again an
6240 ``i64`` constant identifying the argument of the broker that is passed through,
6241 or ``i64 -1`` to indicate an unknown or inspected argument. The order in which
6242 they are listed has to be the same in which they are passed to the callback
6243 callee. The last element of the encoding is a boolean which specifies how
6244 variadic arguments of the broker are handled. If it is true, all variadic
6245 arguments of the broker are passed through to the callback function *after* the
6246 arguments encoded explicitly before.
6248 In the code below, the ``pthread_create`` function is marked as a broker
6249 through the ``!callback !1`` metadata. In the example, there is only one
callback encoding, namely ``!2``, associated with the broker. This encoding
identifies the callback function as the broker argument at index 2 (``i64
2``, counting from zero) and the sole argument of the callback function as the
broker argument at index 3 (``i64 3``).
6255 .. FIXME why does the llvm-sphinx-docs builder give a highlighting
6256 error if the below is set to highlight as 'llvm', despite that we
6257 have misc.highlighting_failure set?
6259 .. code-block:: text
6261 declare !callback !1 dso_local i32 @pthread_create(i64*, %union.pthread_attr_t*, i8* (i8*)*, i8*)
6264 !2 = !{i64 2, i64 3, i1 false}
Another example is shown below. The callback callee is the argument at index 2
of the ``__kmpc_fork_call`` function (``i64 2``, counting from zero). The callee is given two unknown
6269 values (each identified by a ``i64 -1``) and afterwards all
6270 variadic arguments that are passed to the ``__kmpc_fork_call`` call (due to the
6273 .. FIXME why does the llvm-sphinx-docs builder give a highlighting
6274 error if the below is set to highlight as 'llvm', despite that we
6275 have misc.highlighting_failure set?
6277 .. code-block:: text
6279 declare !callback !0 dso_local void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...)
6282 !1 = !{i64 2, i64 -1, i64 -1, i1 true}
6286 '``unpredictable``' Metadata
6287 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6289 ``unpredictable`` metadata may be attached to any branch or switch
6290 instruction. It can be used to express the unpredictability of control
6291 flow. Similar to the llvm.expect intrinsic, it may be used to alter
6292 optimizations related to compare and branch instructions. The metadata
6293 is treated as a boolean value; if it exists, it signals that the branch
6294 or switch that it is attached to is completely unpredictable.
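
A minimal sketch of the attachment, with hypothetical value and label names.
Since only the presence of the metadata matters, an empty node is assumed to
suffice here:

.. code-block:: llvm

    ; The direction of this branch is declared to be unpredictable.
    br i1 %cond, label %then, label %else, !unpredictable !0

    !0 = !{}
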
6296 .. _md_dereferenceable:
6298 '``dereferenceable``' Metadata
6299 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6301 The existence of the ``!dereferenceable`` metadata on the instruction
6302 tells the optimizer that the value loaded is known to be dereferenceable.
6303 The number of bytes known to be dereferenceable is specified by the integer
value in the metadata node. This is analogous to the ``dereferenceable``
6305 attribute on parameters and return values.
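
For instance, a load of a pointer might be annotated as follows. This is a
sketch with hypothetical value names; the byte count is assumed to be given as
an ``i64`` constant.

.. code-block:: llvm

    ; The loaded pointer %p is known to be dereferenceable for 4 bytes.
    %p = load i32*, i32** %slot, align 8, !dereferenceable !0

    !0 = !{i64 4}
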
6307 .. _md_dereferenceable_or_null:
6309 '``dereferenceable_or_null``' Metadata
6310 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6312 The existence of the ``!dereferenceable_or_null`` metadata on the
6313 instruction tells the optimizer that the value loaded is known to be either
6314 dereferenceable or null.
6315 The number of bytes known to be dereferenceable is specified by the integer
value in the metadata node. This is analogous to the ``dereferenceable_or_null``
6317 attribute on parameters and return values.
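
A corresponding sketch for the nullable case, again with hypothetical value
names and the byte count assumed to be an ``i64`` constant:

.. code-block:: llvm

    ; %q is either null or dereferenceable for 8 bytes.
    %q = load i64*, i64** %slot, align 8, !dereferenceable_or_null !0

    !0 = !{i64 8}
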
6324 It is sometimes useful to attach information to loop constructs. Currently,
6325 loop metadata is implemented as metadata attached to the branch instruction
6326 in the loop latch block. The loop metadata node is a list of
6327 other metadata nodes, each representing a property of the loop. Usually,
6328 the first item of the property node is a string. For example, the
6329 ``llvm.loop.unroll.count`` suggests an unroll factor to the loop
6332 .. code-block:: llvm
6334 br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
6337 !1 = !{!"llvm.loop.unroll.enable"}
6338 !2 = !{!"llvm.loop.unroll.count", i32 4}
6340 For legacy reasons, the first item of a loop metadata node must be a
6341 reference to itself. Before the advent of the 'distinct' keyword, this
6342 forced the preservation of otherwise identical metadata nodes. Since
6343 the loop-metadata node can be attached to multiple nodes, the 'distinct'
6344 keyword has become unnecessary.
6346 Prior to the property nodes, one or two ``DILocation`` (debug location)
6347 nodes can be present in the list. The first, if present, identifies the
6348 source-code location where the loop begins. The second, if present,
6349 identifies the source-code location where the loop ends.
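
A sketch of a loop metadata node that carries both source locations and one
property node. The line and column numbers are invented, and ``!4`` is assumed
to be the enclosing ``DISubprogram`` defined elsewhere in the module.

.. code-block:: llvm

    br i1 %exitcond, label %exit, label %loop, !llvm.loop !0

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !DILocation(line: 10, column: 3, scope: !4)  ; where the loop begins
    !2 = !DILocation(line: 14, column: 3, scope: !4)  ; where the loop ends
    !3 = !{!"llvm.loop.unroll.count", i32 4}          ; a property node
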
6351 Loop metadata nodes cannot be used as unique identifiers. They are
6352 neither persistent for the same loop through transformations nor
6353 necessarily unique to just one loop.
6355 '``llvm.loop.disable_nonforced``'
6356 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6358 This metadata disables all optional loop transformations unless
6359 explicitly instructed using other transformation metadata such as
6360 ``llvm.loop.unroll.enable``. That is, no heuristic will try to determine
6361 whether a transformation is profitable. The purpose is to avoid that the
6362 loop is transformed to a different loop before an explicitly requested
6363 (forced) transformation is applied. For instance, loop fusion can make
6364 other transformations impossible. Mandatory loop canonicalizations such
6365 as loop rotation are still applied.
6367 It is recommended to use this metadata in addition to any llvm.loop.*
6368 transformation directive. Also, any loop should have at most one
6369 directive applied to it (and a sequence of transformations built using
6370 followup-attributes). Otherwise, which transformation will be applied
6371 depends on implementation details such as the pass pipeline order.
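
A minimal sketch that combines this metadata with a single forced
transformation (the metadata numbering is arbitrary):

.. code-block:: llvm

    ; Only the explicitly requested unrolling is applied; heuristic-driven
    ; transformations are disabled for this loop.
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.disable_nonforced"}
    !2 = !{!"llvm.loop.unroll.count", i32 4}
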
6373 See :ref:`transformation-metadata` for details.
6375 '``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
6376 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6378 Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
6379 used to control per-loop vectorization and interleaving parameters such as
6380 vectorization width and interleave count. These metadata should be used in
6381 conjunction with ``llvm.loop`` loop identification metadata. The
6382 ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
6383 optimization hints and the optimizer will only interleave and vectorize loops if
6384 it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
6385 which contains information about loop-carried memory dependencies can be helpful
6386 in determining the safety of these transformations.
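
As a sketch, several of the hints described in the following subsections can
be combined in one loop metadata node. The label and value names are
hypothetical.

.. code-block:: llvm

    br i1 %exitcond, label %exit, label %loop, !llvm.loop !0

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 1}   ; request vectorization
    !2 = !{!"llvm.loop.vectorize.width", i32 4}   ; with 4 lanes
    !3 = !{!"llvm.loop.interleave.count", i32 2}  ; interleaving 2 iterations
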
6388 '``llvm.loop.interleave.count``' Metadata
6389 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6391 This metadata suggests an interleave count to the loop interleaver.
6392 The first operand is the string ``llvm.loop.interleave.count`` and the
6393 second operand is an integer specifying the interleave count. For
6396 .. code-block:: llvm
6398 !0 = !{!"llvm.loop.interleave.count", i32 4}
6400 Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
6401 multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
6402 then the interleave count will be determined automatically.
6404 '``llvm.loop.vectorize.enable``' Metadata
6405 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6407 This metadata selectively enables or disables vectorization for the loop. The
6408 first operand is the string ``llvm.loop.vectorize.enable`` and the second operand
6409 is a bit. If the bit operand value is 1 vectorization is enabled. A value of
6410 0 disables vectorization:
6412 .. code-block:: llvm
6414 !0 = !{!"llvm.loop.vectorize.enable", i1 0}
6415 !1 = !{!"llvm.loop.vectorize.enable", i1 1}
6417 '``llvm.loop.vectorize.predicate.enable``' Metadata
6418 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6420 This metadata selectively enables or disables creating predicated instructions
6421 for the loop, which can enable folding of the scalar epilogue loop into the
6422 main loop. The first operand is the string
``llvm.loop.vectorize.predicate.enable`` and the second operand is a bit. If
the bit operand value is 1 predication is enabled. A value of 0 disables
it:
6427 .. code-block:: llvm
6429 !0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0}
6430 !1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1}
6432 '``llvm.loop.vectorize.scalable.enable``' Metadata
6433 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6435 This metadata selectively enables or disables scalable vectorization for the
6436 loop, and only has any effect if vectorization for the loop is already enabled.
6437 The first operand is the string ``llvm.loop.vectorize.scalable.enable``
6438 and the second operand is a bit. If the bit operand value is 1 scalable
6439 vectorization is enabled, whereas a value of 0 reverts to the default fixed
6440 width vectorization:
6442 .. code-block:: llvm
6444 !0 = !{!"llvm.loop.vectorize.scalable.enable", i1 0}
6445 !1 = !{!"llvm.loop.vectorize.scalable.enable", i1 1}
6447 '``llvm.loop.vectorize.width``' Metadata
6448 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6450 This metadata sets the target width of the vectorizer. The first
6451 operand is the string ``llvm.loop.vectorize.width`` and the second
6452 operand is an integer specifying the width. For example:
6454 .. code-block:: llvm
6456 !0 = !{!"llvm.loop.vectorize.width", i32 4}
6458 Note that setting ``llvm.loop.vectorize.width`` to 1 disables
6459 vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
6460 0 or if the loop does not have this metadata the width will be
6461 determined automatically.
6463 '``llvm.loop.vectorize.followup_vectorized``' Metadata
6464 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6466 This metadata defines which loop attributes the vectorized loop will
6467 have. See :ref:`transformation-metadata` for details.
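
A minimal sketch of a follow-up attribute, assuming the operand layout
described in the transformation metadata documentation (the attribute name
followed by the property nodes to attach):

.. code-block:: llvm

    ; After vectorization, the vectorized loop receives the
    ; llvm.loop.unroll.enable property.
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 1}
    !2 = !{!"llvm.loop.vectorize.followup_vectorized", !3}
    !3 = !{!"llvm.loop.unroll.enable"}
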
6469 '``llvm.loop.vectorize.followup_epilogue``' Metadata
6470 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6472 This metadata defines which loop attributes the epilogue will have. The
6473 epilogue is not vectorized and is executed when either the vectorized
6474 loop is not known to preserve semantics (because e.g., it processes two
6475 arrays that are found to alias by a runtime check) or for the last
6476 iterations that do not fill a complete set of vector lanes. See
6477 :ref:`Transformation Metadata <transformation-metadata>` for details.
6479 '``llvm.loop.vectorize.followup_all``' Metadata
6480 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6482 Attributes in the metadata will be added to both the vectorized and
6484 See :ref:`Transformation Metadata <transformation-metadata>` for details.
6486 '``llvm.loop.unroll``'
6487 ^^^^^^^^^^^^^^^^^^^^^^
6489 Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling
6490 optimization hints such as the unroll factor. ``llvm.loop.unroll``
6491 metadata should be used in conjunction with ``llvm.loop`` loop
6492 identification metadata. The ``llvm.loop.unroll`` metadata are only
6493 optimization hints and the unrolling will only be performed if the
6494 optimizer believes it is safe to do so.
6496 '``llvm.loop.unroll.count``' Metadata
6497 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6499 This metadata suggests an unroll factor to the loop unroller. The
6500 first operand is the string ``llvm.loop.unroll.count`` and the second
6501 operand is a positive integer specifying the unroll factor. For
6504 .. code-block:: llvm
6506 !0 = !{!"llvm.loop.unroll.count", i32 4}
6508 If the trip count of the loop is less than the unroll count the loop
6509 will be partially unrolled.
6511 '``llvm.loop.unroll.disable``' Metadata
6512 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6514 This metadata disables loop unrolling. The metadata has a single operand
6515 which is the string ``llvm.loop.unroll.disable``. For example:
6517 .. code-block:: llvm
6519 !0 = !{!"llvm.loop.unroll.disable"}
6521 '``llvm.loop.unroll.runtime.disable``' Metadata
6522 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6524 This metadata disables runtime loop unrolling. The metadata has a single
6525 operand which is the string ``llvm.loop.unroll.runtime.disable``. For example:
6527 .. code-block:: llvm
6529 !0 = !{!"llvm.loop.unroll.runtime.disable"}
6531 '``llvm.loop.unroll.enable``' Metadata
6532 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6534 This metadata suggests that the loop should be fully unrolled if the trip count
6535 is known at compile time and partially unrolled if the trip count is not known
6536 at compile time. The metadata has a single operand which is the string
6537 ``llvm.loop.unroll.enable``. For example:
6539 .. code-block:: llvm
6541 !0 = !{!"llvm.loop.unroll.enable"}
6543 '``llvm.loop.unroll.full``' Metadata
6544 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6546 This metadata suggests that the loop should be unrolled fully. The
6547 metadata has a single operand which is the string ``llvm.loop.unroll.full``.
6550 .. code-block:: llvm
6552 !0 = !{!"llvm.loop.unroll.full"}
6554 '``llvm.loop.unroll.followup``' Metadata
6555 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6557 This metadata defines which loop attributes the unrolled loop will have.
6558 See :ref:`Transformation Metadata <transformation-metadata>` for details.
6560 '``llvm.loop.unroll.followup_remainder``' Metadata
6561 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6563 This metadata defines which loop attributes the remainder loop after
6564 partial/runtime unrolling will have. See
6565 :ref:`Transformation Metadata <transformation-metadata>` for details.
6567 '``llvm.loop.unroll_and_jam``'
6568 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6570 This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata
above, but affects the unroll and jam pass. In addition, any loop with
6572 ``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will
6573 disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the
6574 unroller, plus ``llvm.loop.unroll.disable`` metadata will disable unroll and jam
6577 The metadata for unroll and jam otherwise is the same as for ``unroll``.
6578 ``llvm.loop.unroll_and_jam.enable``, ``llvm.loop.unroll_and_jam.disable`` and
6579 ``llvm.loop.unroll_and_jam.count`` do the same as for unroll.
6580 ``llvm.loop.unroll_and_jam.full`` is not supported. Again these are only hints
6581 and the normal safety checks will still be performed.
6583 '``llvm.loop.unroll_and_jam.count``' Metadata
6584 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6586 This metadata suggests an unroll and jam factor to use, similarly to
6587 ``llvm.loop.unroll.count``. The first operand is the string
6588 ``llvm.loop.unroll_and_jam.count`` and the second operand is a positive integer
6589 specifying the unroll factor. For example:
6591 .. code-block:: llvm
6593 !0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}
If the trip count of the loop is less than the unroll count, the loop
will be partially unrolled and jammed.
6598 '``llvm.loop.unroll_and_jam.disable``' Metadata
6599 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata disables loop unroll and jam. The metadata has a single
6602 operand which is the string ``llvm.loop.unroll_and_jam.disable``. For example:
6604 .. code-block:: llvm
6606 !0 = !{!"llvm.loop.unroll_and_jam.disable"}
6608 '``llvm.loop.unroll_and_jam.enable``' Metadata
6609 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata suggests that the loop should be fully unrolled and jammed if the
trip count is known at compile time and partially unrolled if the trip count is
6613 not known at compile time. The metadata has a single operand which is the
6614 string ``llvm.loop.unroll_and_jam.enable``. For example:
6616 .. code-block:: llvm
6618 !0 = !{!"llvm.loop.unroll_and_jam.enable"}
6620 '``llvm.loop.unroll_and_jam.followup_outer``' Metadata
6621 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata defines which loop attributes the outer unrolled loop will
have. See :ref:`Transformation Metadata <transformation-metadata>` for
details.
6627 '``llvm.loop.unroll_and_jam.followup_inner``' Metadata
6628 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata defines which loop attributes the inner jammed loop will
have. See :ref:`Transformation Metadata <transformation-metadata>` for
details.
6634 '``llvm.loop.unroll_and_jam.followup_remainder_outer``' Metadata
6635 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6637 This metadata defines which attributes the epilogue of the outer loop
6638 will have. This loop is usually unrolled, meaning there is no such
6639 loop. This attribute will be ignored in this case. See
6640 :ref:`Transformation Metadata <transformation-metadata>` for details.
6642 '``llvm.loop.unroll_and_jam.followup_remainder_inner``' Metadata
6643 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6645 This metadata defines which attributes the inner loop of the epilogue
6646 will have. The outer epilogue will usually be unrolled, meaning there
6647 can be multiple inner remainder loops. See
6648 :ref:`Transformation Metadata <transformation-metadata>` for details.
6650 '``llvm.loop.unroll_and_jam.followup_all``' Metadata
6651 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Attributes specified in the metadata are added to all
6654 ``llvm.loop.unroll_and_jam.*`` loops. See
6655 :ref:`Transformation Metadata <transformation-metadata>` for details.
6657 '``llvm.loop.licm_versioning.disable``' Metadata
6658 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6660 This metadata indicates that the loop should not be versioned for the purpose
6661 of enabling loop-invariant code motion (LICM). The metadata has a single operand
6662 which is the string ``llvm.loop.licm_versioning.disable``. For example:
6664 .. code-block:: llvm
6666 !0 = !{!"llvm.loop.licm_versioning.disable"}
6668 '``llvm.loop.distribute.enable``' Metadata
6669 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6671 Loop distribution allows splitting a loop into multiple loops. Currently,
6672 this is only performed if the entire loop cannot be vectorized due to unsafe
6673 memory dependencies. The transformation will attempt to isolate the unsafe
6674 dependencies into their own loop.
6676 This metadata can be used to selectively enable or disable distribution of the
6677 loop. The first operand is the string ``llvm.loop.distribute.enable`` and the
6678 second operand is a bit. If the bit operand value is 1 distribution is
6679 enabled. A value of 0 disables distribution:
6681 .. code-block:: llvm
6683 !0 = !{!"llvm.loop.distribute.enable", i1 0}
6684 !1 = !{!"llvm.loop.distribute.enable", i1 1}
6686 This metadata should be used in conjunction with ``llvm.loop`` loop
6687 identification metadata.
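
A minimal sketch of that combination, with hypothetical block and value names:
the hint is listed in the loop's ``llvm.loop`` identifier node, which is
attached to the branch that closes the loop.

.. code-block:: llvm

   for.body:
     ; ... loop body ...
     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

   !0 = distinct !{!0, !1}                        ; loop identifier
   !1 = !{!"llvm.loop.distribute.enable", i1 1}   ; request distribution for this loop
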
6689 '``llvm.loop.distribute.followup_coincident``' Metadata
6690 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6692 This metadata defines which attributes extracted loops with no cyclic
6693 dependencies will have (i.e. can be vectorized). See
6694 :ref:`Transformation Metadata <transformation-metadata>` for details.
6696 '``llvm.loop.distribute.followup_sequential``' Metadata
6697 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6699 This metadata defines which attributes the isolated loops with unsafe
6700 memory dependencies will have. See
6701 :ref:`Transformation Metadata <transformation-metadata>` for details.
6703 '``llvm.loop.distribute.followup_fallback``' Metadata
6704 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If loop versioning is necessary, this metadata defines the attributes
6707 the non-distributed fallback version will have. See
6708 :ref:`Transformation Metadata <transformation-metadata>` for details.
6710 '``llvm.loop.distribute.followup_all``' Metadata
6711 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The attributes in this metadata are added to all followup loops of the
6714 loop distribution pass. See
6715 :ref:`Transformation Metadata <transformation-metadata>` for details.
6717 '``llvm.licm.disable``' Metadata
6718 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6720 This metadata indicates that loop-invariant code motion (LICM) should not be
6721 performed on this loop. The metadata has a single operand which is the string
6722 ``llvm.licm.disable``. For example:
6724 .. code-block:: llvm
6726 !0 = !{!"llvm.licm.disable"}
Note that although it operates per loop, it isn't given the ``llvm.loop`` prefix
as it is not affected by the ``llvm.loop.disable_nonforced`` metadata.
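
Despite the missing prefix, the node is still listed in the loop's
``llvm.loop`` identifier like other loop hints. A minimal sketch, with
hypothetical block and value names:

.. code-block:: llvm

   for.body:
     ; ... loop body ...
     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

   !0 = distinct !{!0, !1}
   !1 = !{!"llvm.licm.disable"}
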
6731 '``llvm.access.group``' Metadata
6732 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``llvm.access.group`` metadata can be attached to any instruction that
potentially accesses memory. It can point to a single distinct metadata
node, which we call an access group. This node represents all memory access
instructions referring to it via ``llvm.access.group``. When an
instruction belongs to multiple access groups, it can also point to a
list of access groups, as illustrated by the following example.
6741 .. code-block:: llvm
6743 %val = load i32, i32* %arrayidx, !llvm.access.group !0
6749 It is illegal for the list node to be empty since it might be confused
6750 with an access group.
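
The distinction between a list node and an access group can be sketched as
follows. The numbering of the metadata nodes is illustrative only.

.. code-block:: llvm

   %val = load i32, i32* %arrayidx, !llvm.access.group !0

   !0 = !{!1, !2}      ; non-empty list of access groups, not itself an access group
   !1 = distinct !{}   ; access group: distinct and empty
   !2 = distinct !{}   ; a second access group
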
The access group metadata node must be 'distinct' to avoid collapsing
multiple access groups by content. An access group metadata node must
always be empty, which can be used to distinguish an access group
metadata node from a list of access groups. Being empty avoids the
situation that the content must be updated which, because metadata is
immutable by design, would require finding and updating all references
to the access group node.
6760 The access group can be used to refer to a memory access instruction
6761 without pointing to it directly (which is not possible in global
6762 metadata). Currently, the only metadata making use of it is
6763 ``llvm.loop.parallel_accesses``.
6765 '``llvm.loop.parallel_accesses``' Metadata
6766 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``llvm.loop.parallel_accesses`` metadata refers to one or more
access group metadata nodes (see ``llvm.access.group``). It denotes that
no loop-carried memory dependence exists between instructions belonging to
these access groups in the loop with this metadata.
6773 Let ``m1`` and ``m2`` be two instructions that both have the
6774 ``llvm.access.group`` metadata to the access group ``g1``, respectively
6775 ``g2`` (which might be identical). If a loop contains both access groups
6776 in its ``llvm.loop.parallel_accesses`` metadata, then the compiler can
6777 assume that there is no dependency between ``m1`` and ``m2`` carried by
6778 this loop. Instructions that belong to multiple access groups are
6779 considered having this property if at least one of the access groups
6780 matches the ``llvm.loop.parallel_accesses`` list.
If all memory-accessing instructions in a loop have
``llvm.access.group`` metadata that each refer to one of the access
groups of a loop's ``llvm.loop.parallel_accesses`` metadata, then the
loop has no loop-carried memory dependences and is considered to be a
parallel loop.
6788 Note that if not all memory access instructions belong to an access
6789 group referred to by ``llvm.loop.parallel_accesses``, then the loop must
6790 not be considered trivially parallel. Additional
6791 memory dependence analysis is required to make that determination. As a fail
6792 safe mechanism, this causes loops that were originally parallel to be considered
6793 sequential (if optimization passes that are unaware of the parallel semantics
6794 insert new memory instructions into the loop body).
Example of a loop that is considered parallel due to its correct use of
both ``llvm.access.group`` and ``llvm.loop.parallel_accesses`` metadata:

.. code-block:: llvm

   for.body:
     %val0 = load i32, i32* %arrayidx, !llvm.access.group !1
     store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

   for.end:
   ; ...
   !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
   !1 = distinct !{}   ; access group shared by the load and the store
6815 It is also possible to have nested parallel loops:
6817 .. code-block:: llvm
6821 %val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
6823 br label %inner.for.body
6827 %val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
6829 store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
6831 br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
6835 store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
6837 br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
6839 outer.for.end: ; preds = %for.body
6841 !1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !3}} ; metadata for the inner loop
6842 !2 = distinct !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; metadata for the outer loop
6843 !3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
6844 !4 = distinct !{} ; access group for instructions in the outer, but not the inner loop
6846 '``llvm.loop.mustprogress``' Metadata
6847 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6849 The ``llvm.loop.mustprogress`` metadata indicates that this loop is required to
terminate, unwind, or interact with the environment in an observable way, e.g.
6851 via a volatile memory access, I/O, or other synchronization. If such a loop is
6852 not found to interact with the environment in an observable way, the loop may
6853 be removed. This corresponds to the ``mustprogress`` function attribute.
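
As a minimal sketch (block names are hypothetical), the loop below neither
terminates nor interacts with the environment, so with this metadata attached
an optimizer may remove it:

.. code-block:: llvm

   while.body:
     br label %while.body, !llvm.loop !0

   !0 = distinct !{!0, !1}
   !1 = !{!"llvm.loop.mustprogress"}
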
6855 '``irr_loop``' Metadata
6856 ^^^^^^^^^^^^^^^^^^^^^^^
``irr_loop`` metadata may be attached to the terminator instruction of a basic
block that's an irreducible loop header (note that an irreducible loop has more
than one header basic block). If ``irr_loop`` metadata is attached to the
6861 terminator instruction of a basic block that is not really an irreducible loop
6862 header, the behavior is undefined. The intent of this metadata is to improve the
6863 accuracy of the block frequency propagation. For example, in the code below, the
6864 block ``header0`` may have a loop header weight (relative to the other headers of
6865 the irreducible loop) of 100:
.. code-block:: llvm

   header0:
     br i1 %cmp, label %t1, label %t2, !irr_loop !0

   !0 = !{!"loop_header_weight", i64 100}
6876 Irreducible loop header weights are typically based on profile data.
6878 .. _md_invariant.group:
6880 '``invariant.group``' Metadata
6881 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6883 The experimental ``invariant.group`` metadata may be attached to
6884 ``load``/``store`` instructions referencing a single metadata with no entries.
6885 The existence of the ``invariant.group`` metadata on the instruction tells
6886 the optimizer that every ``load`` and ``store`` to the same pointer operand
6887 can be assumed to load or store the same
6888 value (but see the ``llvm.launder.invariant.group`` intrinsic which affects
6889 when two pointers are considered the same). Pointers returned by bitcast or
6890 getelementptr with only zero indices are considered the same.
6894 .. code-block:: llvm
6896 @unknownPtr = external global i8
6899 store i8 42, i8* %ptr, !invariant.group !0
6900 call void @foo(i8* %ptr)
6902 %a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
6903 call void @foo(i8* %ptr)
6905 %newPtr = call i8* @getPointer(i8* %ptr)
6906 %c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr
6908 %unknownValue = load i8, i8* @unknownPtr
6909 store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42
6911 call void @foo(i8* %ptr)
6912 %newPtr2 = call i8* @llvm.launder.invariant.group(i8* %ptr)
6913 %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through launder.invariant.group to get value of %ptr
6916 declare void @foo(i8*)
6917 declare i8* @getPointer(i8*)
6918 declare i8* @llvm.launder.invariant.group(i8*)
The ``invariant.group`` metadata must be dropped when replacing one pointer by
another based on aliasing information. This is because ``invariant.group`` is tied
to the SSA value of the pointer operand.
6926 .. code-block:: llvm
6928 %v = load i8, i8* %x, !invariant.group !0
6929 ; if %x mustalias %y then we can replace the above instruction with
6930 %v = load i8, i8* %y
6932 Note that this is an experimental feature, which means that its semantics might
6933 change in the future.
6938 See :doc:`TypeMetadata`.
6940 '``associated``' Metadata
6941 ^^^^^^^^^^^^^^^^^^^^^^^^^
6943 The ``associated`` metadata may be attached to a global variable definition with
6944 a single argument that references a global object (optionally through an alias).
6946 This metadata lowers to the ELF section flag ``SHF_LINK_ORDER`` which prevents
6947 discarding of the global variable in linker GC unless the referenced object is
6948 also discarded. The linker support for this feature is spotty. For best
6949 compatibility, globals carrying this metadata should:
6951 - Be in ``@llvm.compiler.used``.
6952 - If the referenced global variable is in a comdat, be in the same comdat.
``!associated`` cannot express a many-to-one relationship. A global variable with
6955 the metadata should generally not be referenced by a function: the function may
6956 be inlined into other functions, leading to more references to the metadata.
6957 Ideally we would want to keep metadata alive as long as any inline location is
6958 alive, but this many-to-one relationship is not representable. Moreover, if the
6959 metadata is retained while the function is discarded, the linker will report an
6960 error of a relocation referencing a discarded section.
6962 The metadata is often used with an explicit section consisting of valid C
6963 identifiers so that the runtime can find the metadata section with
6964 linker-defined encapsulation symbols ``__start_<section_name>`` and
6965 ``__stop_<section_name>``.
6967 It does not have any effect on non-ELF targets.
.. code-block:: text

    $a = comdat any
    @a = global i32 1, comdat $a
    @b = internal global i32 2, comdat $a, section "abc", !associated !0
    !0 = !{i32* @a}
6982 The ``prof`` metadata is used to record profile data in the IR.
6983 The first operand of the metadata node indicates the profile metadata
6984 type. There are currently 3 types:
6985 :ref:`branch_weights<prof_node_branch_weights>`,
6986 :ref:`function_entry_count<prof_node_function_entry_count>`, and
6987 :ref:`VP<prof_node_VP>`.
6989 .. _prof_node_branch_weights:
Branch weight metadata attached to a branch, select, switch or call instruction
represents the likelihood of the associated branch being taken.
6996 For more information, see :doc:`BranchWeightMetadata`.
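
A minimal sketch for a conditional branch, with hypothetical block names and
weights. The weights are listed in the same order as the branch successors.

.. code-block:: llvm

   br i1 %cmp, label %likely, label %unlikely, !prof !0

   !0 = !{!"branch_weights", i32 99, i32 1}
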
6998 .. _prof_node_function_entry_count:
7000 function_entry_count
7001 """"""""""""""""""""
7003 Function entry count metadata can be attached to function definitions
7004 to record the number of times the function is called. Used with BFI
7005 information, it is also used to derive the basic block profile count.
7006 For more information, see :doc:`BranchWeightMetadata`.
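
A minimal sketch of a function definition carrying an entry count. The
function name and the count are hypothetical.

.. code-block:: llvm

   define void @f() !prof !0 {
     ret void
   }

   !0 = !{!"function_entry_count", i64 1000}
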
7013 VP (value profile) metadata can be attached to instructions that have
value profile information. Currently these are indirect calls (where it
7015 records the hottest callees) and calls to memory intrinsics such as memcpy,
7016 memmove, and memset (where it records the hottest byte lengths).
Each VP metadata node contains the string "VP", then a uint32_t value for the value
profiling kind, a uint64_t value for the total number of times the instruction
is executed, followed by pairs of a uint64_t value and its execution count.
The value profiling kind is 0 for indirect call targets and 1 for memory
operations. For indirect call targets, each profile value is a hash
of the callee function name, and for memory operations each value is the
byte length.
7026 Note that the value counts do not need to add up to the total count
7027 listed in the third operand (in practice only the top hottest values
7028 are tracked and reported).
7030 Indirect call example:
7032 .. code-block:: llvm
7034 call void %f(), !prof !1
7035 !1 = !{!"VP", i32 0, i64 1600, i64 7651369219802541373, i64 1030, i64 -4377547752858689819, i64 410}
Note that the VP type is 0 (the second operand), which indicates that this is
indirect call value profile data. The third operand indicates that the
indirect call executed 1600 times. The 4th and 6th operands give the
hashes of the 2 hottest target functions' names (this is the same hash used
to represent function names in the profile database), and the 5th and 7th
operands give the number of times each of the respective preceding target
functions was called.
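
A memory operation profile follows the same layout with kind 1. The sketch
below is illustrative only: the operands, the total count and the byte lengths
are made up.

.. code-block:: llvm

   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false), !prof !1

   ; kind 1, 1000 executions in total; length 8 seen 600 times, length 16 seen 300 times
   !1 = !{!"VP", i32 1, i64 1000, i64 8, i64 600, i64 16, i64 300}

Here the recorded counts (900) do not add up to the total (1000), which is
permitted because only the hottest values are tracked.
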
7047 '``annotation``' Metadata
7048 ^^^^^^^^^^^^^^^^^^^^^^^^^
7050 The ``annotation`` metadata can be used to attach a tuple of annotation strings
7051 to any instruction. This metadata does not impact the semantics of the program
7052 and may only be used to provide additional insight about the program and
7053 transformations to users.
7057 .. code-block:: text
7059 %a.addr = alloca float*, align 8, !annotation !0
7060 !0 = !{!"auto-init"}
7062 Module Flags Metadata
7063 =====================
7065 Information about the module as a whole is difficult to convey to LLVM's
7066 subsystems. The LLVM IR isn't sufficient to transmit this information.
7067 The ``llvm.module.flags`` named metadata exists in order to facilitate
this. These flags are in the form of key / value pairs --- much like a
dictionary --- making it easy for any subsystem that cares about a flag to
look it up.
7072 The ``llvm.module.flags`` metadata contains a list of metadata triplets.
7073 Each triplet has the following form:
- The first element is a *behavior* flag, which specifies the behavior
  when two (or more) modules are merged together, and it encounters two
  (or more) metadata with the same ID. The supported behaviors are
  described below.
7079 - The second element is a metadata string that is a unique ID for the
7080 metadata. Each module may only have one flag entry for each unique ID (not
7081 including entries with the **Require** behavior).
7082 - The third element is the value of the flag.
7084 When two (or more) modules are merged together, the resulting
7085 ``llvm.module.flags`` metadata is the union of the modules' flags. That is, for
7086 each unique metadata ID string, there will be exactly one entry in the merged
module's ``llvm.module.flags`` metadata table, and the value for that entry will
7088 be determined by the merge behavior flag, as described below. The only exception
7089 is that entries with the *Require* behavior are always preserved.
The following behaviors are supported:

.. list-table::
   :header-rows: 1
   :widths: 10 90

   * - Value
     - Behavior

   * - 1
     - **Error**
           Emits an error if two values disagree, otherwise the resulting value
           is that of the operands.

   * - 2
     - **Warning**
           Emits a warning if two values disagree. The result value will be the
           operand for the flag from the first module being linked, or the max
           if the other module uses **Max** (in which case the resulting flag
           will be **Max**).

   * - 3
     - **Require**
           Adds a requirement that another module flag be present and have a
           specified value after linking is performed. The value must be a
           metadata pair, where the first element of the pair is the ID of the
           module flag to be restricted, and the second element of the pair is
           the value the module flag should be restricted to. This behavior can
           be used to restrict the allowable results (via triggering of an
           error) of linking IDs with the **Override** behavior.

   * - 4
     - **Override**
           Uses the specified value, regardless of the behavior or value of the
           other module. If both modules specify **Override**, but the values
           differ, an error will be emitted.

   * - 5
     - **Append**
           Appends the two values, which are required to be metadata nodes.

   * - 6
     - **AppendUnique**
           Appends the two values, which are required to be metadata
           nodes. However, duplicate entries in the second list are dropped
           during the append operation.

   * - 7
     - **Max**
           Takes the max of the two values, which are required to be integers.
7142 It is an error for a particular unique flag ID to have multiple behaviors,
7143 except in the case of **Require** (which adds restrictions on another metadata
7144 value) or **Override**.
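
To illustrate the merge rules, the sketch below links two modules that set the
same flag with the **Max** behavior. The flag ID ``"example-level"`` is
hypothetical, and the numeric behavior value 7 is LLVM's encoding of **Max**.

.. code-block:: llvm

   ; Module A
   !llvm.module.flags = !{!0}
   !0 = !{i32 7, !"example-level", i32 3}

   ; Module B
   !llvm.module.flags = !{!0}
   !0 = !{i32 7, !"example-level", i32 5}

   ; Merged module: a single entry for the ID, holding the larger value
   !llvm.module.flags = !{!0}
   !0 = !{i32 7, !"example-level", i32 5}
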
7146 An example of module flags:
.. code-block:: llvm

   !0 = !{ i32 1, !"foo", i32 1 }
   !1 = !{ i32 4, !"bar", i32 37 }
   !2 = !{ i32 2, !"qux", i32 42 }
   !3 = !{ i32 3, !"qux",
     !{
       !"foo", i32 1
     }
   }
   !llvm.module.flags = !{ !0, !1, !2, !3 }
-  Metadata ``!0`` has the ID ``!"foo"`` and the value '1'. The behavior
   if two or more ``!"foo"`` flags are seen is to emit an error if their
   values are not equal.

-  Metadata ``!1`` has the ID ``!"bar"`` and the value '37'. The
   behavior if two or more ``!"bar"`` flags are seen is to use the value
   '37'.

-  Metadata ``!2`` has the ID ``!"qux"`` and the value '42'. The
   behavior if two or more ``!"qux"`` flags are seen is to emit a
   warning if their values are not equal.

-  Metadata ``!3`` has the ID ``!"qux"`` and the value
   ``!{ !"foo", i32 1 }``.
   The behavior is to emit an error if the ``llvm.module.flags`` does not
   contain a flag with the ID ``!"foo"`` that has the value '1' after linking is
   performed.
7182 Synthesized Functions Module Flags Metadata
7183 -------------------------------------------
7185 These metadata specify the default attributes synthesized functions should have.
7186 These metadata are currently respected by a few instrumentation passes, such as
7189 These metadata correspond to a few function attributes with significant code
7190 generation behaviors. Function attributes with just optimization purposes
7191 should not be listed because the performance impact of these synthesized
7194 - "frame-pointer": **Max**. The value can be 0, 1, or 2. A synthesized function
7195 will get the "frame-pointer" function attribute, with value being "none",
7196 "non-leaf", or "all", respectively.
7197 - "uwtable": **Max**. The value can be 0 or 1. If the value is 1, a synthesized
7198 function will get the ``uwtable`` function attribute.
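
A module requesting both attributes for synthesized functions might carry flags
like the following sketch. It assumes the numeric behavior value 7 for
**Max**, which is LLVM's encoding of that behavior.

.. code-block:: llvm

   !llvm.module.flags = !{!0, !1}
   !0 = !{i32 7, !"frame-pointer", i32 2}   ; synthesized functions get "frame-pointer"="all"
   !1 = !{i32 7, !"uwtable", i32 1}         ; synthesized functions get uwtable
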
7200 Objective-C Garbage Collection Module Flags Metadata
7201 ----------------------------------------------------
7203 On the Mach-O platform, Objective-C stores metadata about garbage
7204 collection in a special section called "image info". The metadata
7205 consists of a version number and a bitmask specifying what types of
7206 garbage collection are supported (if any) by the file. If two or more
7207 modules are linked together their garbage collection metadata needs to
7208 be merged rather than appended together.
7210 The Objective-C garbage collection module flags metadata consists of the
7211 following key-value pairs:
7220 * - ``Objective-C Version``
7221 - **[Required]** --- The Objective-C ABI version. Valid values are 1 and 2.
7223 * - ``Objective-C Image Info Version``
7224 - **[Required]** --- The version of the image info section. Currently
7227 * - ``Objective-C Image Info Section``
7228 - **[Required]** --- The section to place the metadata. Valid values are
7229 ``"__OBJC, __image_info, regular"`` for Objective-C ABI version 1, and
7230 ``"__DATA,__objc_imageinfo, regular, no_dead_strip"`` for
7231 Objective-C ABI version 2.
7233 * - ``Objective-C Garbage Collection``
7234 - **[Required]** --- Specifies whether garbage collection is supported or
7235 not. Valid values are 0, for no garbage collection, and 2, for garbage
7236 collection supported.
7238 * - ``Objective-C GC Only``
7239 - **[Optional]** --- Specifies that only garbage collection is supported.
7240 If present, its value must be 6. This flag requires that the
7241 ``Objective-C Garbage Collection`` flag have the value 2.
7243 Some important flag interactions:
7245 - If a module with ``Objective-C Garbage Collection`` set to 0 is
7246 merged with a module with ``Objective-C Garbage Collection`` set to
7247 2, then the resulting module has the
7248 ``Objective-C Garbage Collection`` flag set to 0.
7249 - A module with ``Objective-C Garbage Collection`` set to 0 cannot be
7250 merged with a module with ``Objective-C GC Only`` set to 6.
7252 C type width Module Flags Metadata
7253 ----------------------------------
7255 The ARM backend emits a section into each generated object file describing the
7256 options that it was compiled with (in a compiler-independent way) to prevent
linking incompatible objects, and to allow automatic library selection. Some
of these options are not visible at the IR level, namely ``wchar_t`` width and enum
width.
7261 To pass this information to the backend, these options are encoded in module
7262 flags metadata, using the following key-value pairs:
.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Key
     - Value

   * - short_wchar
     - * 0 --- sizeof(wchar_t) == 4
       * 1 --- sizeof(wchar_t) == 2

   * - short_enum
     - * 0 --- Enums are at least as large as an ``int``.
       * 1 --- Enums are stored in the smallest integer type which can
         represent all of its values.
For example, the following metadata section specifies that the module was
compiled with a ``wchar_t`` width of 4 bytes, and the underlying type of an
enum is the smallest type which can represent all of its values::

    !llvm.module.flags = !{!0, !1}
    !0 = !{i32 1, !"short_wchar", i32 0}
    !1 = !{i32 1, !"short_enum", i32 1}
7288 LTO Post-Link Module Flags Metadata
7289 -----------------------------------
Some optimisations are only possible when the entire LTO unit is present in the current
7292 module. This is represented by the ``LTOPostLink`` module flags metadata, which
7293 will be created with a value of ``1`` when LTO linking occurs.
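
After LTO linking, the flag could appear as in the sketch below. The behavior
value 1 (**Error**) is an assumption about how the flag is emitted and is not
stated in the text above.

.. code-block:: llvm

   !llvm.module.flags = !{!0}
   !0 = !{i32 1, !"LTOPostLink", i32 1}
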
7295 Automatic Linker Flags Named Metadata
7296 =====================================
7298 Some targets support embedding of flags to the linker inside individual object
7299 files. Typically this is used in conjunction with language extensions which
7300 allow source files to contain linker command line options, and have these
7301 automatically be transmitted to the linker via object files.
7303 These flags are encoded in the IR using named metadata with the name
7304 ``!llvm.linker.options``. Each operand is expected to be a metadata node
7305 which should be a list of other metadata nodes, each of which should be a
7306 list of metadata strings defining linker options.
For example, the following metadata section specifies two separate sets of
linker options, presumably to link against ``libz`` and the ``Cocoa``
framework::

    !0 = !{ !"-lz" }
    !1 = !{ !"-framework", !"Cocoa" }
    !llvm.linker.options = !{ !0, !1 }
7316 The metadata encoding as lists of lists of options, as opposed to a collapsed
7317 list of options, is chosen so that the IR encoding can use multiple option
7318 strings to specify e.g., a single library, while still having that specifier be
7319 preserved as an atomic element that can be recognized by a target specific
7320 assembly writer or object file emitter.
7322 Each individual option is required to be either a valid option for the target's
7323 linker, or an option that is reserved by the target specific assembly writer or
7324 object file emitter. No other aspect of these options is defined by the IR.
7326 Dependent Libs Named Metadata
7327 =============================
7329 Some targets support embedding of strings into object files to indicate
7330 a set of libraries to add to the link. Typically this is used in conjunction
7331 with language extensions which allow source files to explicitly declare the
7332 libraries they depend on, and have these automatically be transmitted to the
7333 linker via object files.
7335 The list is encoded in the IR using named metadata with the name
7336 ``!llvm.dependent-libraries``. Each operand is expected to be a metadata node
7337 which should contain a single string operand.
7339 For example, the following metadata section contains two library specifiers::
7341 !0 = !{!"a library specifier"}
7342 !1 = !{!"another library specifier"}
7343 !llvm.dependent-libraries = !{ !0, !1 }
7345 Each library specifier will be handled independently by the consuming linker.
The effect of the library specifiers is defined by the consuming linker.
7353 Compiling with `ThinLTO <https://clang.llvm.org/docs/ThinLTO.html>`_
7354 causes the building of a compact summary of the module that is emitted into
7355 the bitcode. The summary is emitted into the LLVM assembly and identified
7356 in syntax by a caret ('``^``').
7358 The summary is parsed into a bitcode output, along with the Module
7359 IR, via the "``llvm-as``" tool. Tools that parse the Module IR for the purposes
7360 of optimization (e.g. "``clang -x ir``" and "``opt``"), will ignore the
7361 summary entries (just as they currently ignore summary entries in a bitcode
Eventually, the summary will be parsed into a ModuleSummaryIndex object under
the same conditions where summary index is currently built from bitcode.
Specifically, this applies to tools that test the Thin Link portion of a ThinLTO
compile (i.e. llvm-lto and llvm-lto2), and to parsing a combined index
for a distributed ThinLTO backend via clang's "``-fthinlto-index=<>``" flag
(this part is not yet implemented; use llvm-as to create a bitcode object
before feeding into thin link tools for now).
7372 There are currently 3 types of summary entries in the LLVM assembly:
7373 :ref:`module paths<module_path_summary>`,
7374 :ref:`global values<gv_summary>`, and
7375 :ref:`type identifiers<typeid_summary>`.
7377 .. _module_path_summary:
7379 Module Path Summary Entry
7380 -------------------------
7382 Each module path summary entry lists a module containing global values included
7383 in the summary. For a single IR module there will be one such entry, but
7384 in a combined summary index produced during the thin link, there will be
7385 one module path entry per linked module with summary.
7389 .. code-block:: text
7391 ^0 = module: (path: "/path/to/file.o", hash: (2468601609, 1329373163, 1565878005, 638838075, 3148790418))
7393 The ``path`` field is a string path to the bitcode file, and the ``hash``
7394 field is the 160-bit SHA-1 hash of the IR bitcode contents, used for
7395 incremental builds and caching.
7399 Global Value Summary Entry
7400 --------------------------
7402 Each global value summary entry corresponds to a global value defined or
7403 referenced by a summarized module.
7407 .. code-block:: text
7409 ^4 = gv: (name: "f"[, summaries: (Summary)[, (Summary)]*]?) ; guid = 14740650423002898831
7411 For declarations, there will not be a summary list. For definitions, a
7412 global value will contain a list of summaries, one per module containing
7413 a definition. There can be multiple entries in a combined summary index
7414 for symbols with weak linkage.
7416 Each ``Summary`` format will depend on whether the global value is a
7417 :ref:`function<function_summary>`, :ref:`variable<variable_summary>`, or
7418 :ref:`alias<alias_summary>`.
7420 .. _function_summary:
7425 If the global value is a function, the ``Summary`` entry will look like:
7427 .. code-block:: text
    function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2[, FuncFlags]?[, Calls]?[, TypeIdInfo]?[, Params]?[, Refs]?)
7431 The ``module`` field includes the summary entry id for the module containing
7432 this definition, and the ``flags`` field contains information such as
7433 the linkage type, a flag indicating whether it is legal to import the
7434 definition, whether it is globally live and whether the linker resolved it
7435 to a local definition (the latter two are populated during the thin link).
7436 The ``insts`` field contains the number of IR instructions in the function.
7437 Finally, there are several optional fields: :ref:`FuncFlags<funcflags_summary>`,
7438 :ref:`Calls<calls_summary>`, :ref:`TypeIdInfo<typeidinfo_summary>`,
7439 :ref:`Params<params_summary>`, :ref:`Refs<refs_summary>`.
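As a minimal sketch of how these pieces fit together, the following entries describe a module that defines ``f``, which calls a declared function ``g``. The path, the all-zero hash, and the names ``f`` and ``g`` are hypothetical placeholders, and the ``; guid`` comments are left out:

.. code-block:: text

    ^0 = module: (path: "a.o", hash: (0, 0, 0, 0, 0))
    ^1 = gv: (name: "g")
    ^2 = gv: (name: "f", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2, calls: ((callee: ^1)))))

The declaration ``^1`` has no summary list, while the definition ``^2`` carries one function summary that refers back to its module entry ``^0``.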
7441 .. _variable_summary:
7443 Global Variable Summary
7444 ^^^^^^^^^^^^^^^^^^^^^^^
7446 If the global value is a variable, the ``Summary`` entry will look like:
7448 .. code-block:: text
    variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0)[, Refs]?)
7452 The variable entry contains a subset of the fields in a
7453 :ref:`function summary <function_summary>`, see the descriptions there.
7460 If the global value is an alias, the ``Summary`` entry will look like:
7462 .. code-block:: text
7464 alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), aliasee: ^2)
7466 The ``module`` and ``flags`` fields are as described for a
7467 :ref:`function summary <function_summary>`. The ``aliasee`` field
7468 contains a reference to the global value summary entry of the aliasee.
7470 .. _funcflags_summary:
7475 The optional ``FuncFlags`` field looks like:
7477 .. code-block:: text
7479 funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 0, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0)
7481 If unspecified, flags are assumed to hold the conservative ``false`` value of
7489 The optional ``Calls`` field looks like:
7491 .. code-block:: text
7493 calls: ((Callee)[, (Callee)]*)
7495 where each ``Callee`` looks like:
7497 .. code-block:: text
7499 callee: ^1[, hotness: None]?[, relbf: 0]?
7501 The ``callee`` refers to the summary entry id of the callee. At most one
7502 of ``hotness`` (which can take the values ``Unknown``, ``Cold``, ``None``,
7503 ``Hot``, and ``Critical``), and ``relbf`` (which holds the integer
7504 branch frequency relative to the entry frequency, scaled down by 2^8)
7505 may be specified. The defaults are ``Unknown`` and ``0``, respectively.
7512 The optional ``Params`` is used by ``StackSafety`` and looks like:
7514 .. code-block:: text
    params: ((Param)[, (Param)]*)
7518 where each ``Param`` describes pointer parameter access inside of the
7519 function and looks like:
7521 .. code-block:: text
7523 param: 4, offset: [0, 5][, calls: ((Callee)[, (Callee)]*)]?
where ``param`` is the number of the parameter it describes, and
``offset`` is the inclusive range of byte offsets from the pointer parameter
that the function itself can access. This range does not include accesses made
by the function calls in the ``calls`` list.

Each ``Callee`` in that list describes how the parameter is forwarded into
another function and looks like:
7533 .. code-block:: text
7535 callee: ^3, param: 5, offset: [-3, 3]
The ``callee`` refers to the summary entry id of the callee, and ``param`` is
the number of the callee parameter that points into the caller's parameter
at an offset known to lie within the ``offset`` range. The thin link stage
consumes and removes ``calls``, updating ``Param::offset`` so that it
covers all accesses possible through ``calls``.

A pointer parameter without a corresponding ``Param`` is considered unsafe, and
access at any offset is assumed to be possible.
7548 If we have the following function:
7550 .. code-block:: text
7552 define i64 @foo(i64* %0, i32* %1, i8* %2, i8 %3) {
7553 store i32* %1, i32** @x
7554 %5 = getelementptr inbounds i8, i8* %2, i64 5
7555 %6 = load i8, i8* %5
7556 %7 = getelementptr inbounds i8, i8* %2, i8 %3
7557 tail call void @bar(i8 %3, i8* %7)
7558 %8 = load i64, i64* %0
7562 We can expect the record like this:
7564 .. code-block:: text
7566 params: ((param: 0, offset: [0, 7]),(param: 2, offset: [5, 5], calls: ((callee: ^3, param: 1, offset: [-128, 127]))))
The function may access just 8 bytes of the parameter %0. ``calls`` is empty,
so the parameter is either not used for function calls or ``offset`` already
covers all accesses from nested function calls.
Parameter %1 escapes, so access is unknown.
The function itself can access just a single byte of the parameter %2. Additional
access is possible inside ``@bar``, i.e. ``^3``. The function adds a signed
offset to the pointer and passes the result as argument %1 into ``^3``.
This record itself does not tell us how ``^3`` will access the parameter.
Parameter %3 is not a pointer.
7583 The optional ``Refs`` field looks like:
7585 .. code-block:: text
7587 refs: ((Ref)[, (Ref)]*)
7589 where each ``Ref`` contains a reference to the summary id of the referenced
7590 value (e.g. ``^1``).
7592 .. _typeidinfo_summary:
7597 The optional ``TypeIdInfo`` field, used for
7598 `Control Flow Integrity <https://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
7601 .. code-block:: text
7603 typeIdInfo: [(TypeTests)]?[, (TypeTestAssumeVCalls)]?[, (TypeCheckedLoadVCalls)]?[, (TypeTestAssumeConstVCalls)]?[, (TypeCheckedLoadConstVCalls)]?
7605 These optional fields have the following forms:
7610 .. code-block:: text
7612 typeTests: (TypeIdRef[, TypeIdRef]*)
Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
by summary id or ``GUID`` preceded by a ``guid:`` tag.
7617 TypeTestAssumeVCalls
7618 """"""""""""""""""""
7620 .. code-block:: text
7622 typeTestAssumeVCalls: (VFuncId[, VFuncId]*)
7624 Where each VFuncId has the format:
7626 .. code-block:: text
7628 vFuncId: (TypeIdRef, offset: 16)
7630 Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
7631 by summary id or ``GUID`` preceded by a ``guid:`` tag.
7633 TypeCheckedLoadVCalls
7634 """""""""""""""""""""
7636 .. code-block:: text
7638 typeCheckedLoadVCalls: (VFuncId[, VFuncId]*)
7640 Where each VFuncId has the format described for ``TypeTestAssumeVCalls``.
7642 TypeTestAssumeConstVCalls
7643 """""""""""""""""""""""""
7645 .. code-block:: text
7647 typeTestAssumeConstVCalls: (ConstVCall[, ConstVCall]*)
7649 Where each ConstVCall has the format:
7651 .. code-block:: text
7653 (VFuncId, args: (Arg[, Arg]*))
7655 and where each VFuncId has the format described for ``TypeTestAssumeVCalls``,
7656 and each Arg is an integer argument number.
7658 TypeCheckedLoadConstVCalls
7659 """"""""""""""""""""""""""
7661 .. code-block:: text
7663 typeCheckedLoadConstVCalls: (ConstVCall[, ConstVCall]*)
7665 Where each ConstVCall has the format described for
7666 ``TypeTestAssumeConstVCalls``.
7670 Type ID Summary Entry
7671 ---------------------
7673 Each type id summary entry corresponds to a type identifier resolution
7674 which is generated during the LTO link portion of the compile when building
7675 with `Control Flow Integrity <https://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
7676 so these are only present in a combined summary index.
7680 .. code-block:: text
7682 ^4 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7[, alignLog2: 0]?[, sizeM1: 0]?[, bitMask: 0]?[, inlineBits: 0]?)[, WpdResolutions]?)) ; guid = 7004155349499253778
7684 The ``typeTestRes`` gives the type test resolution ``kind`` (which may
7685 be ``unsat``, ``byteArray``, ``inline``, ``single``, or ``allOnes``), and
7686 the ``size-1`` bit width. It is followed by optional flags, which default to 0,
7687 and an optional WpdResolutions (whole program devirtualization resolution)
7688 field that looks like:
7690 .. code-block:: text
    wpdResolutions: ((offset: 0, WpdRes)[, (offset: 1, WpdRes)]*)
7694 where each entry is a mapping from the given byte offset to the whole-program
7695 devirtualization resolution WpdRes, that has one of the following formats:
7697 .. code-block:: text
7699 wpdRes: (kind: branchFunnel)
7700 wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")
7701 wpdRes: (kind: indir)
7703 Additionally, each wpdRes has an optional ``resByArg`` field, which
7704 describes the resolutions for calls with all constant integer arguments:
7706 .. code-block:: text
7708 resByArg: (ResByArg[, ResByArg]*)
7712 .. code-block:: text
7714 args: (Arg[, Arg]*), byArg: (kind: UniformRetVal[, info: 0][, byte: 0][, bit: 0])
7716 Where the ``kind`` can be ``Indir``, ``UniformRetVal``, ``UniqueRetVal``
7717 or ``VirtualConstProp``. The ``info`` field is only used if the kind
7718 is ``UniformRetVal`` (indicates the uniform return value), or
7719 ``UniqueRetVal`` (holds the return value associated with the unique vtable
7720 (0 or 1)). The ``byte`` and ``bit`` fields are only used if the target does
7721 not support the use of absolute symbols to store constants.
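Putting the formats above together, a type id entry with a single whole-program devirtualization resolution might look like the following sketch. The summary id, type name, and implementation name are reused from the examples above purely for illustration, and the ``; guid`` comment is left out:

.. code-block:: text

    ^4 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7), wpdResolutions: ((offset: 0, wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")))))

Here the optional ``typeTestRes`` flags are omitted and therefore default to 0, and the virtual call at byte offset 0 is resolved to a single implementation.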
7723 .. _intrinsicglobalvariables:
7725 Intrinsic Global Variables
7726 ==========================
7728 LLVM has a number of "magic" global variables that contain data that
7729 affect code generation or other IR semantics. These are documented here.
7730 All globals of this sort should have a section specified as
7731 "``llvm.metadata``". This section and all globals that start with
7732 "``llvm.``" are reserved for use by LLVM.
7736 The '``llvm.used``' Global Variable
7737 -----------------------------------
7739 The ``@llvm.used`` global is an array which has
7740 :ref:`appending linkage <linkage_appending>`. This array contains a list of
7741 pointers to named global variables, functions and aliases which may optionally
7742 have a pointer cast formed of bitcast or getelementptr. For example, a legal
7745 .. code-block:: llvm
7750 @llvm.used = appending global [2 x i8*] [
7752 i8* bitcast (i32* @Y to i8*)
7753 ], section "llvm.metadata"
7755 If a symbol appears in the ``@llvm.used`` list, then the compiler, assembler,
7756 and linker are required to treat the symbol as if there is a reference to the
7757 symbol that it cannot see (which is why they have to be named). For example, if
7758 a variable has internal linkage and no references other than that from the
7759 ``@llvm.used`` list, it cannot be deleted. This is commonly used to represent
7760 references from inline asms and other things the compiler cannot "see", and
corresponds to "``__attribute__((used))``" in GNU C.
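The internal-linkage case described above can be sketched as follows. The global ``@Y`` is a hypothetical name, and the typed-pointer syntax follows the other examples in this document:

.. code-block:: llvm

    ; @Y has internal linkage and no other references in the module.
    @Y = internal global i32 0

    ; Listing @Y here means it must be kept even though nothing else uses it.
    @llvm.used = appending global [1 x i8*] [
       i8* bitcast (i32* @Y to i8*)
    ], section "llvm.metadata"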
7763 On some targets, the code generator must emit a directive to the
7764 assembler or object file to prevent the assembler and linker from
7765 removing the symbol.
7767 .. _gv_llvmcompilerused:
7769 The '``llvm.compiler.used``' Global Variable
7770 --------------------------------------------
7772 The ``@llvm.compiler.used`` directive is the same as the ``@llvm.used``
7773 directive, except that it only prevents the compiler from touching the
7774 symbol. On targets that support it, this allows an intelligent linker to
7775 optimize references to the symbol without being impeded as it would be
This construct should only be used in rare circumstances,
and should not be exposed to source languages.
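A minimal sketch of such a list, assuming it takes the same form as ``@llvm.used`` (the global ``@Z`` is a hypothetical name):

.. code-block:: llvm

    @Z = internal global i32 0

    ; The compiler must not delete @Z; this list places no constraint on the linker.
    @llvm.compiler.used = appending global [1 x i8*] [
       i8* bitcast (i32* @Z to i8*)
    ], section "llvm.metadata"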
7781 .. _gv_llvmglobalctors:
7783 The '``llvm.global_ctors``' Global Variable
7784 -------------------------------------------
7786 .. code-block:: llvm
7788 %0 = type { i32, void ()*, i8* }
7789 @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor, i8* @data }]
7791 The ``@llvm.global_ctors`` array contains a list of constructor
7792 functions, priorities, and an associated global or function.
7793 The functions referenced by this array will be called in ascending order
7794 of priority (i.e. lowest first) when the module is loaded. The order of
7795 functions with the same priority is not defined.
7797 If the third field is non-null, and points to a global variable
7798 or function, the initializer function will only run if the associated
7799 data from the current module is not discarded.
7800 On ELF the referenced global variable or function must be in a comdat.
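To illustrate the ordering rule, the following sketch registers two constructors with different priorities and a null third field. The function names and the priority value 101 are hypothetical:

.. code-block:: llvm

    define void @init_early() {
      ret void
    }

    define void @init_late() {
      ret void
    }

    ; Priority 101 is lower than 65535, so @init_early is called first.
    @llvm.global_ctors = appending global [2 x { i32, void ()*, i8* }] [
      { i32, void ()*, i8* } { i32 101, void ()* @init_early, i8* null },
      { i32, void ()*, i8* } { i32 65535, void ()* @init_late, i8* null }
    ]

Because the third field is null in both entries, neither constructor is tied to associated data.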
7802 .. _llvmglobaldtors:
7804 The '``llvm.global_dtors``' Global Variable
7805 -------------------------------------------
7807 .. code-block:: llvm
7809 %0 = type { i32, void ()*, i8* }
7810 @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor, i8* @data }]
7812 The ``@llvm.global_dtors`` array contains a list of destructor
7813 functions, priorities, and an associated global or function.
7814 The functions referenced by this array will be called in descending
7815 order of priority (i.e. highest first) when the module is unloaded. The
7816 order of functions with the same priority is not defined.
7818 If the third field is non-null, and points to a global variable
7819 or function, the destructor function will only run if the associated
7820 data from the current module is not discarded.
7821 On ELF the referenced global variable or function must be in a comdat.
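The associated-data case can be sketched as below, assuming an ELF-style target where the data lives in a comdat. The names ``@data`` and ``@dtor`` match the placeholder names used in the snippet above and are otherwise hypothetical:

.. code-block:: llvm

    $data = comdat any

    ; The associated data is placed in a comdat, as required on ELF.
    @data = linkonce_odr global i8 0, comdat

    define void @dtor() {
      ret void
    }

    ; @dtor only runs if this module's copy of @data is not discarded.
    @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* }] [
      { i32, void ()*, i8* } { i32 65535, void ()* @dtor, i8* @data }
    ]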
7823 Instruction Reference
7824 =====================
7826 The LLVM instruction set consists of several different classifications
7827 of instructions: :ref:`terminator instructions <terminators>`, :ref:`binary
7828 instructions <binaryops>`, :ref:`bitwise binary
7829 instructions <bitwiseops>`, :ref:`memory instructions <memoryops>`, and
7830 :ref:`other instructions <otherops>`.
7834 Terminator Instructions
7835 -----------------------
7837 As mentioned :ref:`previously <functionstructure>`, every basic block in a
7838 program ends with a "Terminator" instruction, which indicates which
7839 block should be executed after the current block is finished. These
terminator instructions typically yield a '``void``' value: they produce
control flow, not values (the exceptions being the
':ref:`invoke <i_invoke>`', ':ref:`callbr <i_callbr>`', and
':ref:`catchswitch <i_catchswitch>`' instructions, which also produce a result).
7844 The terminator instructions are: ':ref:`ret <i_ret>`',
7845 ':ref:`br <i_br>`', ':ref:`switch <i_switch>`',
7846 ':ref:`indirectbr <i_indirectbr>`', ':ref:`invoke <i_invoke>`',
':ref:`callbr <i_callbr>`',
7848 ':ref:`resume <i_resume>`', ':ref:`catchswitch <i_catchswitch>`',
7849 ':ref:`catchret <i_catchret>`',
7850 ':ref:`cleanupret <i_cleanupret>`',
7851 and ':ref:`unreachable <i_unreachable>`'.
7855 '``ret``' Instruction
7856 ^^^^^^^^^^^^^^^^^^^^^
7863 ret <type> <value> ; Return a value from a non-void function
7864 ret void ; Return from void function
7869 The '``ret``' instruction is used to return control flow (and optionally
7870 a value) from a function back to the caller.
7872 There are two forms of the '``ret``' instruction: one that returns a
7873 value and then causes control flow, and one that just causes control
7879 The '``ret``' instruction optionally accepts a single argument, the
7880 return value. The type of the return value must be a ':ref:`first
7881 class <t_firstclass>`' type.
7883 A function is not :ref:`well formed <wellformed>` if it has a non-void
7884 return type and contains a '``ret``' instruction with no return value or
7885 a return value with a type that does not match its type, or if it has a
7886 void return type and contains a '``ret``' instruction with a return
7892 When the '``ret``' instruction is executed, control flow returns back to
7893 the calling function's context. If the caller is a
7894 ":ref:`call <i_call>`" instruction, execution continues at the
7895 instruction after the call. If the caller was an
7896 ":ref:`invoke <i_invoke>`" instruction, execution continues at the
7897 beginning of the "normal" destination block. If the instruction returns
7898 a value, that value shall set the call or invoke instruction's return
7904 .. code-block:: llvm
7906 ret i32 5 ; Return an integer value of 5
7907 ret void ; Return from a void function
7908 ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
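For context, here is a sketch of the two forms inside complete functions. The function names are hypothetical:

.. code-block:: llvm

    ; Non-void function: the operand type must match the return type i32.
    define i32 @answer() {
      ret i32 42
    }

    ; Void function: 'ret' takes no value.
    define void @nothing() {
      ret void
    }

Swapping the two ``ret`` instructions would make both functions ill-formed, since the returned type would no longer match the declared return type.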
7912 '``br``' Instruction
7913 ^^^^^^^^^^^^^^^^^^^^
7920 br i1 <cond>, label <iftrue>, label <iffalse>
7921 br label <dest> ; Unconditional branch
7926 The '``br``' instruction is used to cause control flow to transfer to a
7927 different basic block in the current function. There are two forms of
7928 this instruction, corresponding to a conditional branch and an
7929 unconditional branch.
7934 The conditional branch form of the '``br``' instruction takes a single
7935 '``i1``' value and two '``label``' values. The unconditional form of the
7936 '``br``' instruction takes a single '``label``' value as a target.
Upon execution of a conditional '``br``' instruction, the '``i1``'
argument is evaluated. If the value is ``true``, control flows to the
'``iftrue``' ``label`` argument. If the value is ``false``, control flows
to the '``iffalse``' ``label`` argument.
7945 If '``cond``' is ``poison`` or ``undef``, this instruction has undefined
7951 .. code-block:: llvm
7954 %cond = icmp eq i32 %a, %b
7955 br i1 %cond, label %IfEqual, label %IfUnequal
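As a self-contained sketch, the function below uses both forms of '``br``' to select the larger of two integers. The function and label names are hypothetical:

.. code-block:: llvm

    define i32 @max(i32 %a, i32 %b) {
    entry:
      %cond = icmp sgt i32 %a, %b
      br i1 %cond, label %take_a, label %take_b   ; conditional form

    take_a:
      br label %done                              ; unconditional form

    take_b:
      br label %done

    done:
      %r = phi i32 [ %a, %take_a ], [ %b, %take_b ]
      ret i32 %r
    }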
7963 '``switch``' Instruction
7964 ^^^^^^^^^^^^^^^^^^^^^^^^
7971 switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]
7976 The '``switch``' instruction is used to transfer control flow to one of
7977 several different places. It is a generalization of the '``br``'
7978 instruction, allowing a branch to occur to one of many possible
7984 The '``switch``' instruction uses three parameters: an integer
7985 comparison value '``value``', a default '``label``' destination, and an
7986 array of pairs of comparison value constants and '``label``'s. The table
7987 is not allowed to contain duplicate constant entries.
7992 The ``switch`` instruction specifies a table of values and destinations.
7993 When the '``switch``' instruction is executed, this table is searched
7994 for the given value. If the value is found, control flow is transferred
7995 to the corresponding destination; otherwise, control flow is transferred
7996 to the default destination.
7997 If '``value``' is ``poison`` or ``undef``, this instruction has undefined
8003 Depending on properties of the target machine and the particular
8004 ``switch`` instruction, this instruction may be code generated in
8005 different ways. For example, it could be generated as a series of
8006 chained conditional branches or with a lookup table.
8011 .. code-block:: llvm
8013 ; Emulate a conditional br instruction
8014 %Val = zext i1 %value to i32
8015 switch i32 %Val, label %truedest [ i32 0, label %falsedest ]
8017 ; Emulate an unconditional br instruction
8018 switch i32 0, label %dest [ ]
8020 ; Implement a jump table:
8021 switch i32 %val, label %otherwise [ i32 0, label %onzero
8023 i32 2, label %ontwo ]
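A complete function built around a '``switch``' might look like the following sketch. The function name, labels, and return values are hypothetical:

.. code-block:: llvm

    define i32 @classify(i32 %val) {
    entry:
      switch i32 %val, label %otherwise [ i32 0, label %onzero
                                          i32 1, label %onone ]

    onzero:
      ret i32 10

    onone:
      ret i32 11

    otherwise:                ; reached for any value not listed in the table
      ret i32 -1
    }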
8027 '``indirectbr``' Instruction
8028 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8035 indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ]
8040 The '``indirectbr``' instruction implements an indirect branch to a
8041 label within the current function, whose address is specified by
8042 "``address``". Address must be derived from a
8043 :ref:`blockaddress <blockaddress>` constant.
8048 The '``address``' argument is the address of the label to jump to. The
8049 rest of the arguments indicate the full set of possible destinations
8050 that the address may point to. Blocks are allowed to occur multiple
8051 times in the destination list, though this isn't particularly useful.
8053 This destination list is required so that dataflow analysis has an
8054 accurate understanding of the CFG.
8059 Control transfers to the block specified in the address argument. All
8060 possible destination blocks must be listed in the label list, otherwise
8061 this instruction has undefined behavior. This implies that jumps to
8062 labels defined in other functions have undefined behavior as well.
8063 If '``address``' is ``poison`` or ``undef``, this instruction has undefined
8069 This is typically implemented with a jump through a register.
8074 .. code-block:: llvm
8076 indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ]
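The following sketch shows where the address operand can come from: it is selected from two ``blockaddress`` constants that refer to blocks of the same function, and both blocks appear in the destination list. The function and label names are hypothetical:

.. code-block:: llvm

    define i32 @dispatch(i1 %flag) {
    entry:
      %addr = select i1 %flag, i8* blockaddress(@dispatch, %bb1),
                               i8* blockaddress(@dispatch, %bb2)
      indirectbr i8* %addr, [ label %bb1, label %bb2 ]

    bb1:
      ret i32 1

    bb2:
      ret i32 2
    }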
8080 '``invoke``' Instruction
8081 ^^^^^^^^^^^^^^^^^^^^^^^^
8088 <result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
8089 [operand bundles] to label <normal label> unwind label <exception label>
8094 The '``invoke``' instruction causes control to transfer to a specified
8095 function, with the possibility of control flow transfer to either the
8096 '``normal``' label or the '``exception``' label. If the callee function
8097 returns with the "``ret``" instruction, control flow will return to the
8098 "normal" label. If the callee (or any indirect callees) returns via the
8099 ":ref:`resume <i_resume>`" instruction or other exception handling
8100 mechanism, control is interrupted and continued at the dynamically
8101 nearest "exception" label.
The '``exception``' label is a `landing
pad <ExceptionHandling.html#overview>`_ for the exception. As such, the
'``exception``' label is required to have the
":ref:`landingpad <i_landingpad>`" instruction, which contains the
information about the behavior of the program after unwinding happens,
as its first non-PHI instruction. The restrictions on the
"``landingpad``" instruction tightly couple it to the "``invoke``"
instruction, so that the important information contained within the
"``landingpad``" instruction can't be lost through normal code motion.
8116 This instruction requires several arguments:
8118 #. The optional "cconv" marker indicates which :ref:`calling
8119 convention <callingconv>` the call should use. If none is
8120 specified, the call defaults to using C calling conventions.
8121 #. The optional :ref:`Parameter Attributes <paramattrs>` list for return
8122 values. Only '``zeroext``', '``signext``', and '``inreg``' attributes
8124 #. The optional addrspace attribute can be used to indicate the address space
8125 of the called function. If it is not specified, the program address space
8126 from the :ref:`datalayout string<langref_datalayout>` will be used.
8127 #. '``ty``': the type of the call instruction itself which is also the
8128 type of the return value. Functions that return no value are marked
8130 #. '``fnty``': shall be the signature of the function being invoked. The
8131 argument types must match the types implied by this signature. This
8132 type can be omitted if the function is not varargs.
8133 #. '``fnptrval``': An LLVM value containing a pointer to a function to
8134 be invoked. In most cases, this is a direct function invocation, but
8135 indirect ``invoke``'s are just as possible, calling an arbitrary pointer
8137 #. '``function args``': argument list whose types match the function
8138 signature argument types and parameter attributes. All arguments must
8139 be of :ref:`first class <t_firstclass>` type. If the function signature
8140 indicates the function accepts a variable number of arguments, the
8141 extra arguments can be specified.
8142 #. '``normal label``': the label reached when the called function
8143 executes a '``ret``' instruction.
8144 #. '``exception label``': the label reached when a callee returns via
8145 the :ref:`resume <i_resume>` instruction or other exception handling
8147 #. The optional :ref:`function attributes <fnattrs>` list.
8148 #. The optional :ref:`operand bundles <opbundles>` list.
8153 This instruction is designed to operate as a standard '``call``'
8154 instruction in most regards. The primary difference is that it
8155 establishes an association with a label, which is used by the runtime
8156 library to unwind the stack.
8158 This instruction is used in languages with destructors to ensure that
8159 proper cleanup is performed in the case of either a ``longjmp`` or a
8160 thrown exception. Additionally, this is important for implementation of
8161 '``catch``' clauses in high-level languages that support them.
8163 For the purposes of the SSA form, the definition of the value returned
8164 by the '``invoke``' instruction is deemed to occur on the edge from the
8165 current block to the "normal" label. If the callee unwinds then no
8166 return value is available.
8171 .. code-block:: llvm
8173 %retval = invoke i32 @Test(i32 15) to label %Continue
8174 unwind label %TestCleanup ; i32:retval set
8175 %retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
8176 unwind label %TestCleanup ; i32:retval set
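For a fuller picture, the sketch below places an '``invoke``' in a complete function whose unwind destination begins with a '``landingpad``' and then resumes unwinding. It assumes the Itanium C++ personality routine ``@__gxx_personality_v0``; the names ``@may_throw`` and ``@caller`` are hypothetical:

.. code-block:: llvm

    declare i32 @may_throw(i32)
    declare i32 @__gxx_personality_v0(...)

    define i32 @caller() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      %retval = invoke i32 @may_throw(i32 15)
                  to label %continue unwind label %cleanup

    continue:                 ; %retval is available on the normal edge only
      ret i32 %retval

    cleanup:                  ; landingpad is the first non-PHI instruction
      %exn = landingpad { i8*, i32 }
               cleanup
      resume { i8*, i32 } %exn
    }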
8180 '``callbr``' Instruction
8181 ^^^^^^^^^^^^^^^^^^^^^^^^
8188 <result> = callbr [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
8189 [operand bundles] to label <fallthrough label> [indirect labels]
8194 The '``callbr``' instruction causes control to transfer to a specified
8195 function, with the possibility of control flow transfer to either the
8196 '``fallthrough``' label or one of the '``indirect``' labels.
8198 This instruction should only be used to implement the "goto" feature of gcc
8199 style inline assembly. Any other usage is an error in the IR verifier.
8204 This instruction requires several arguments:
8206 #. The optional "cconv" marker indicates which :ref:`calling
8207 convention <callingconv>` the call should use. If none is
8208 specified, the call defaults to using C calling conventions.
8209 #. The optional :ref:`Parameter Attributes <paramattrs>` list for return
8210 values. Only '``zeroext``', '``signext``', and '``inreg``' attributes
8212 #. The optional addrspace attribute can be used to indicate the address space
8213 of the called function. If it is not specified, the program address space
8214 from the :ref:`datalayout string<langref_datalayout>` will be used.
8215 #. '``ty``': the type of the call instruction itself which is also the
8216 type of the return value. Functions that return no value are marked
8218 #. '``fnty``': shall be the signature of the function being called. The
8219 argument types must match the types implied by this signature. This
8220 type can be omitted if the function is not varargs.
8221 #. '``fnptrval``': An LLVM value containing a pointer to a function to
8222 be called. In most cases, this is a direct function call, but
8223 other ``callbr``'s are just as possible, calling an arbitrary pointer
8225 #. '``function args``': argument list whose types match the function
8226 signature argument types and parameter attributes. All arguments must
8227 be of :ref:`first class <t_firstclass>` type. If the function signature
8228 indicates the function accepts a variable number of arguments, the
8229 extra arguments can be specified.
8230 #. '``fallthrough label``': the label reached when the inline assembly's
8231 execution exits the bottom.
8232 #. '``indirect labels``': the labels reached when a callee transfers control
8233 to a location other than the '``fallthrough label``'. The blockaddress
8234 constant for these should also be in the list of '``function args``'.
8235 #. The optional :ref:`function attributes <fnattrs>` list.
8236 #. The optional :ref:`operand bundles <opbundles>` list.
8241 This instruction is designed to operate as a standard '``call``'
8242 instruction in most regards. The primary difference is that it
8243 establishes an association with additional labels to define where control
8244 flow goes after the call.
8246 The output values of a '``callbr``' instruction are available only to
the '``fallthrough``' block, not to any '``indirect``' block(s).
8249 The only use of this today is to implement the "goto" feature of gcc inline
8250 assembly where additional labels can be provided as locations for the inline
8251 assembly to jump to.
8256 .. code-block:: llvm
8258 ; "asm goto" without output constraints.
8259 callbr void asm "", "r,X"(i32 %x, i8 *blockaddress(@foo, %indirect))
8260 to label %fallthrough [label %indirect]
8262 ; "asm goto" with output constraints.
8263 <result> = callbr i32 asm "", "=r,r,X"(i32 %x, i8 *blockaddress(@foo, %indirect))
8264 to label %fallthrough [label %indirect]
8268 '``resume``' Instruction
8269 ^^^^^^^^^^^^^^^^^^^^^^^^
8276 resume <type> <value>
8281 The '``resume``' instruction is a terminator instruction that has no
8287 The '``resume``' instruction requires one argument, which must have the
8288 same type as the result of any '``landingpad``' instruction in the same
8294 The '``resume``' instruction resumes propagation of an existing
8295 (in-flight) exception whose unwinding was interrupted with a
8296 :ref:`landingpad <i_landingpad>` instruction.
8301 .. code-block:: llvm
8303 resume { i8*, i32 } %exn
8307 '``catchswitch``' Instruction
8308 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8315 <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind to caller
8316 <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind label <default>
8321 The '``catchswitch``' instruction is used by `LLVM's exception handling system
8322 <ExceptionHandling.html#overview>`_ to describe the set of possible catch handlers
8323 that may be executed by the :ref:`EH personality routine <personalityfn>`.
8328 The ``parent`` argument is the token of the funclet that contains the
8329 ``catchswitch`` instruction. If the ``catchswitch`` is not inside a funclet,
8330 this operand may be the token ``none``.
8332 The ``default`` argument is the label of another basic block beginning with
8333 either a ``cleanuppad`` or ``catchswitch`` instruction. This unwind destination
8334 must be a legal target with respect to the ``parent`` links, as described in
8335 the `exception handling documentation\ <ExceptionHandling.html#wineh-constraints>`_.
8337 The ``handlers`` are a nonempty list of successor blocks that each begin with a
8338 :ref:`catchpad <i_catchpad>` instruction.
8343 Executing this instruction transfers control to one of the successors in
8344 ``handlers``, if appropriate, or continues to unwind via the unwind label if
8347 The ``catchswitch`` is both a terminator and a "pad" instruction, meaning that
8348 it must be both the first non-phi instruction and last instruction in the basic
8349 block. Therefore, it must be the only non-phi instruction in the block.
8354 .. code-block:: text
8357 %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller
8359 %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup
8363 '``catchret``' Instruction
8364 ^^^^^^^^^^^^^^^^^^^^^^^^^^
8371 catchret from <token> to label <normal>
8376 The '``catchret``' instruction is a terminator instruction that has a
8383 The first argument to a '``catchret``' indicates which ``catchpad`` it
8384 exits. It must be a :ref:`catchpad <i_catchpad>`.
The second argument to a '``catchret``' specifies where control will
transfer to next.
8391 The '``catchret``' instruction ends an existing (in-flight) exception whose
8392 unwinding was interrupted with a :ref:`catchpad <i_catchpad>` instruction. The
8393 :ref:`personality function <personalityfn>` gets a chance to execute arbitrary
code to, for example, destroy the active exception. Control then transfers to
``normal``.
8397 The ``token`` argument must be a token produced by a ``catchpad`` instruction.
8398 If the specified ``catchpad`` is not the most-recently-entered not-yet-exited
8399 funclet pad (as described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
8400 the ``catchret``'s behavior is undefined.
8405 .. code-block:: text
      catchret from %catch to label %continue
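
As a rough illustration of how ``catchswitch``, ``catchpad``, and ``catchret``
fit together, the sketch below catches any exception thrown by a callee. It
assumes the MSVC C++ personality ``__CxxFrameHandler3``. The names
``@may_throw`` and ``@try_catch_all`` are hypothetical, and the ``catchpad``
arguments are meaningful only to that personality. Pointers use the typed
spelling (``i8*``); releases that accept only opaque pointers would spell them
``ptr``.

.. code-block:: llvm

      declare void @may_throw()
      declare i32 @__CxxFrameHandler3(...)

      define void @try_catch_all() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
      entry:
        invoke void @may_throw()
                to label %done unwind label %dispatch

      dispatch:
        ; The catchswitch is the only non-phi instruction in its block.
        %cs = catchswitch within none [label %handler] unwind to caller

      handler:
        ; Each handler block begins with a catchpad that names its catchswitch.
        %cp = catchpad within %cs [i8* null, i32 64, i8* null]
        ; Leave the funclet and resume normal control flow.
        catchret from %cp to label %done

      done:
        ret void
      }
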
8411 '``cleanupret``' Instruction
8412 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8419 cleanupret from <value> unwind label <continue>
8420 cleanupret from <value> unwind to caller
8425 The '``cleanupret``' instruction is a terminator instruction that has
8426 an optional successor.
8432 The '``cleanupret``' instruction requires one argument, which indicates
8433 which ``cleanuppad`` it exits, and must be a :ref:`cleanuppad <i_cleanuppad>`.
8434 If the specified ``cleanuppad`` is not the most-recently-entered not-yet-exited
8435 funclet pad (as described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
8436 the ``cleanupret``'s behavior is undefined.
8438 The '``cleanupret``' instruction also has an optional successor, ``continue``,
8439 which must be the label of another basic block beginning with either a
8440 ``cleanuppad`` or ``catchswitch`` instruction. This unwind destination must
8441 be a legal target with respect to the ``parent`` links, as described in the
8442 `exception handling documentation\ <ExceptionHandling.html#wineh-constraints>`_.
8447 The '``cleanupret``' instruction indicates to the
8448 :ref:`personality function <personalityfn>` that one
8449 :ref:`cleanuppad <i_cleanuppad>` it transferred control to has ended.
8450 It transfers control to ``continue`` or unwinds out of the function.
8455 .. code-block:: text
8457 cleanupret from %cleanup unwind to caller
8458 cleanupret from %cleanup unwind label %continue
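
A minimal sketch of a cleanup funclet that ends in ``cleanupret`` might look
as follows. The functions ``@may_throw`` and ``@release`` are hypothetical, the
personality is assumed to be ``__CxxFrameHandler3``, and pointers use the typed
spelling (``i8*``).

.. code-block:: llvm

      declare void @may_throw()
      declare void @release()
      declare i32 @__CxxFrameHandler3(...)

      define void @with_cleanup() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
      entry:
        invoke void @may_throw()
                to label %done unwind label %cleanup

      cleanup:
        %cp = cleanuppad within none []
        ; Calls made inside the funclet carry a "funclet" operand bundle.
        call void @release() [ "funclet"(token %cp) ]
        ; No further handler in this function: keep unwinding into the caller.
        cleanupret from %cp unwind to caller

      done:
        call void @release()
        ret void
      }
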
8462 '``unreachable``' Instruction
8463 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8475 The '``unreachable``' instruction has no defined semantics. This
8476 instruction is used to inform the optimizer that a particular portion of
8477 the code is not reachable. This can be used to indicate that the code
8478 after a no-return function cannot be reached, and other facts.
8483 The '``unreachable``' instruction has no defined semantics.
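
For illustration, a typical use places ``unreachable`` after a call to a
function that never returns. The sketch below assumes an external ``@abort``
declared ``noreturn``; the function name ``@checked_udiv`` is hypothetical.

.. code-block:: llvm

      declare void @abort() noreturn

      define i32 @checked_udiv(i32 %a, i32 %b) {
      entry:
        %is.zero = icmp eq i32 %b, 0
        br i1 %is.zero, label %trap, label %ok

      trap:
        call void @abort()
        ; Every block needs a terminator; control never gets here.
        unreachable

      ok:
        %q = udiv i32 %a, %b
        ret i32 %q
      }
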
8490 Unary operators require a single operand, execute an operation on
8491 it, and produce a single value. The operand might represent multiple
8492 data, as is the case with the :ref:`vector <t_vector>` data type. The
8493 result value has the same type as its operand.
8497 '``fneg``' Instruction
8498 ^^^^^^^^^^^^^^^^^^^^^^
8505 <result> = fneg [fast-math flags]* <ty> <op1> ; yields ty:result
8510 The '``fneg``' instruction returns the negation of its operand.
8515 The argument to the '``fneg``' instruction must be a
8516 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8517 floating-point values.
8522 The value produced is a copy of the operand with its sign bit flipped.
8523 This instruction can also take any number of :ref:`fast-math
8524 flags <fastmath>`, which are optimization hints to enable otherwise
8525 unsafe floating-point optimizations:
8530 .. code-block:: text
      <result> = fneg float %val          ; yields float:result = -%val
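
As a small self-contained sketch, the two hypothetical functions below apply
``fneg`` to a scalar and to a vector, the latter with fast-math flags attached.

.. code-block:: llvm

      define float @negate(float %x) {
        %r = fneg float %x
        ret float %r
      }

      define <4 x float> @negate_vec(<4 x float> %v) {
        ; Each lane is negated independently; the flags are optimization hints.
        %r = fneg nnan ninf <4 x float> %v
        ret <4 x float> %r
      }
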
8539 Binary operators are used to do most of the computation in a program.
8540 They require two operands of the same type, execute an operation on
8541 them, and produce a single value. The operands might represent multiple
8542 data, as is the case with the :ref:`vector <t_vector>` data type. The
8543 result value has the same type as its operands.
8545 There are several different binary operators:
8549 '``add``' Instruction
8550 ^^^^^^^^^^^^^^^^^^^^^
8557 <result> = add <ty> <op1>, <op2> ; yields ty:result
8558 <result> = add nuw <ty> <op1>, <op2> ; yields ty:result
8559 <result> = add nsw <ty> <op1>, <op2> ; yields ty:result
8560 <result> = add nuw nsw <ty> <op1>, <op2> ; yields ty:result
8565 The '``add``' instruction returns the sum of its two operands.
8570 The two arguments to the '``add``' instruction must be
8571 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8572 arguments must have identical types.
8577 The value produced is the integer sum of the two operands.
If the sum has unsigned overflow, the result returned is the
mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of
the result.
8583 Because LLVM integers use a two's complement representation, this
8584 instruction is appropriate for both signed and unsigned integers.
8586 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8587 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8588 result value of the ``add`` is a :ref:`poison value <poisonvalues>` if
8589 unsigned and/or signed overflow, respectively, occurs.
8594 .. code-block:: text
8596 <result> = add i32 4, %var ; yields i32:result = 4 + %var
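
The effect of the wrap flags can be sketched with 8-bit operands. The function
name ``@add_flags`` is hypothetical, and the operand values in the comments are
example inputs, not part of the IR.

.. code-block:: llvm

      define i8 @add_flags(i8 %a, i8 %b) {
        ; Always defined: the low 8 bits of the mathematical sum.
        %wrap = add i8 %a, %b
        ; Poison on unsigned overflow, e.g. %a = -1 (255 unsigned), %b = 1.
        %no.uwrap = add nuw i8 %a, %b
        ; Poison on signed overflow, e.g. %a = 127, %b = 1.
        %no.swrap = add nsw i8 %a, %b
        ret i8 %wrap
      }

With ``%a = 127`` and ``%b = 1`` the plain ``add`` yields ``-128``, the
``nuw`` form yields the same bits (128 as an unsigned value), and the ``nsw``
form yields poison.
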
8600 '``fadd``' Instruction
8601 ^^^^^^^^^^^^^^^^^^^^^^
8608 <result> = fadd [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8613 The '``fadd``' instruction returns the sum of its two operands.
8618 The two arguments to the '``fadd``' instruction must be
8619 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8620 floating-point values. Both arguments must have identical types.
8625 The value produced is the floating-point sum of the two operands.
8626 This instruction is assumed to execute in the default :ref:`floating-point
8627 environment <floatenv>`.
8628 This instruction can also take any number of :ref:`fast-math
8629 flags <fastmath>`, which are optimization hints to enable otherwise
8630 unsafe floating-point optimizations:
8635 .. code-block:: text
8637 <result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var
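
As an illustration of how fast-math flags are spelled on ``fadd``, the
hypothetical function below sums three values and lets the optimizer regroup
the additions.

.. code-block:: llvm

      define float @sum3(float %a, float %b, float %c) {
        ; 'reassoc' permits reassociation of the two additions;
        ; 'nsz' lets the optimizer ignore the sign of a zero result.
        %t = fadd reassoc nsz float %a, %b
        %s = fadd reassoc nsz float %t, %c
        ret float %s
      }
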
8641 '``sub``' Instruction
8642 ^^^^^^^^^^^^^^^^^^^^^
8649 <result> = sub <ty> <op1>, <op2> ; yields ty:result
8650 <result> = sub nuw <ty> <op1>, <op2> ; yields ty:result
8651 <result> = sub nsw <ty> <op1>, <op2> ; yields ty:result
8652 <result> = sub nuw nsw <ty> <op1>, <op2> ; yields ty:result
8657 The '``sub``' instruction returns the difference of its two operands.
8659 Note that the '``sub``' instruction is used to represent the '``neg``'
8660 instruction present in most other intermediate representations.
8665 The two arguments to the '``sub``' instruction must be
8666 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8667 arguments must have identical types.
8672 The value produced is the integer difference of the two operands.
If the difference has unsigned overflow, the result returned is the
mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of
the result.
8678 Because LLVM integers use a two's complement representation, this
8679 instruction is appropriate for both signed and unsigned integers.
8681 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8682 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8683 result value of the ``sub`` is a :ref:`poison value <poisonvalues>` if
8684 unsigned and/or signed overflow, respectively, occurs.
8689 .. code-block:: text
8691 <result> = sub i32 4, %var ; yields i32:result = 4 - %var
      <result> = sub i32 0, %val          ; yields i32:result = -%val
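
The negation idiom can be sketched as two hypothetical functions, one wrapping
and one carrying ``nsw``.

.. code-block:: llvm

      define i32 @negate(i32 %x) {
        ; Wrapping negation: negating -2147483648 yields -2147483648 again.
        %neg = sub i32 0, %x
        ret i32 %neg
      }

      define i32 @negate_nsw(i32 %x) {
        ; With nsw the result is poison when %x is -2147483648.
        %neg = sub nsw i32 0, %x
        ret i32 %neg
      }
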
8696 '``fsub``' Instruction
8697 ^^^^^^^^^^^^^^^^^^^^^^
8704 <result> = fsub [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8709 The '``fsub``' instruction returns the difference of its two operands.
8714 The two arguments to the '``fsub``' instruction must be
8715 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8716 floating-point values. Both arguments must have identical types.
8721 The value produced is the floating-point difference of the two operands.
8722 This instruction is assumed to execute in the default :ref:`floating-point
8723 environment <floatenv>`.
8724 This instruction can also take any number of :ref:`fast-math
8725 flags <fastmath>`, which are optimization hints to enable otherwise
8726 unsafe floating-point optimizations:
8731 .. code-block:: text
8733 <result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var
      <result> = fsub float -0.0, %val    ; yields float:result = -%val
8738 '``mul``' Instruction
8739 ^^^^^^^^^^^^^^^^^^^^^
8746 <result> = mul <ty> <op1>, <op2> ; yields ty:result
8747 <result> = mul nuw <ty> <op1>, <op2> ; yields ty:result
8748 <result> = mul nsw <ty> <op1>, <op2> ; yields ty:result
8749 <result> = mul nuw nsw <ty> <op1>, <op2> ; yields ty:result
8754 The '``mul``' instruction returns the product of its two operands.
8759 The two arguments to the '``mul``' instruction must be
8760 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8761 arguments must have identical types.
8766 The value produced is the integer product of the two operands.
8768 If the result of the multiplication has unsigned overflow, the result
8769 returned is the mathematical result modulo 2\ :sup:`n`\ , where n is the
8770 bit width of the result.
8772 Because LLVM integers use a two's complement representation, and the
8773 result is the same width as the operands, this instruction returns the
8774 correct result for both signed and unsigned integers. If a full product
(e.g. ``i32`` * ``i32`` -> ``i64``) is needed, the operands should be
sign-extended or zero-extended as appropriate to the width of the full
product.
8779 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8780 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8781 result value of the ``mul`` is a :ref:`poison value <poisonvalues>` if
8782 unsigned and/or signed overflow, respectively, occurs.
8787 .. code-block:: text
8789 <result> = mul i32 4, %var ; yields i32:result = 4 * %var
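
The full-product case described above can be sketched as follows. The function
names are hypothetical. Because both operands were extended from ``i32``, the
64-bit product cannot overflow, so attaching ``nsw`` or ``nuw`` is safe here.

.. code-block:: llvm

      define i64 @full_smul(i32 %a, i32 %b) {
        %a.ext = sext i32 %a to i64
        %b.ext = sext i32 %b to i64
        ; Signed 32 x 32 -> 64 bit product.
        %p = mul nsw i64 %a.ext, %b.ext
        ret i64 %p
      }

      define i64 @full_umul(i32 %a, i32 %b) {
        %a.ext = zext i32 %a to i64
        %b.ext = zext i32 %b to i64
        ; Unsigned 32 x 32 -> 64 bit product.
        %p = mul nuw i64 %a.ext, %b.ext
        ret i64 %p
      }
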
8793 '``fmul``' Instruction
8794 ^^^^^^^^^^^^^^^^^^^^^^
8801 <result> = fmul [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8806 The '``fmul``' instruction returns the product of its two operands.
8811 The two arguments to the '``fmul``' instruction must be
8812 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8813 floating-point values. Both arguments must have identical types.
8818 The value produced is the floating-point product of the two operands.
8819 This instruction is assumed to execute in the default :ref:`floating-point
8820 environment <floatenv>`.
8821 This instruction can also take any number of :ref:`fast-math
8822 flags <fastmath>`, which are optimization hints to enable otherwise
8823 unsafe floating-point optimizations:
8828 .. code-block:: text
8830 <result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var
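
One common use of fast-math flags on ``fmul`` is to allow contraction with a
following ``fadd``. In the hypothetical function below, the ``contract`` flag
on both instructions permits, but does not require, the code generator to emit
a fused multiply-add.

.. code-block:: llvm

      define float @mul_add(float %a, float %b, float %c) {
        %p = fmul contract float %a, %b
        %r = fadd contract float %p, %c
        ret float %r
      }
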
8834 '``udiv``' Instruction
8835 ^^^^^^^^^^^^^^^^^^^^^^
8842 <result> = udiv <ty> <op1>, <op2> ; yields ty:result
8843 <result> = udiv exact <ty> <op1>, <op2> ; yields ty:result
8848 The '``udiv``' instruction returns the quotient of its two operands.
8853 The two arguments to the '``udiv``' instruction must be
8854 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8855 arguments must have identical types.
8860 The value produced is the unsigned integer quotient of the two operands.
8862 Note that unsigned integer division and signed integer division are
8863 distinct operations; for signed integer division, use '``sdiv``'.
8865 Division by zero is undefined behavior. For vectors, if any element
8866 of the divisor is zero, the operation has undefined behavior.
8869 If the ``exact`` keyword is present, the result value of the ``udiv`` is
8870 a :ref:`poison value <poisonvalues>` if %op1 is not a multiple of %op2 (as
8871 such, "((a udiv exact b) mul b) == a").
8876 .. code-block:: text
8878 <result> = udiv i32 4, %var ; yields i32:result = 4 / %var
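
A sketch of where ``exact`` might be used: converting a byte count to a word
count when the caller is assumed to guarantee that the count is a multiple of
four. The function name ``@bytes_to_words`` and that guarantee are assumptions
made for the example.

.. code-block:: llvm

      define i32 @bytes_to_words(i32 %nbytes) {
        ; Poison if %nbytes is not a multiple of 4.
        %words = udiv exact i32 %nbytes, 4
        ret i32 %words
      }
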
8882 '``sdiv``' Instruction
8883 ^^^^^^^^^^^^^^^^^^^^^^
8890 <result> = sdiv <ty> <op1>, <op2> ; yields ty:result
8891 <result> = sdiv exact <ty> <op1>, <op2> ; yields ty:result
8896 The '``sdiv``' instruction returns the quotient of its two operands.
8901 The two arguments to the '``sdiv``' instruction must be
8902 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8903 arguments must have identical types.
8908 The value produced is the signed integer quotient of the two operands
8909 rounded towards zero.
8911 Note that signed integer division and unsigned integer division are
8912 distinct operations; for unsigned integer division, use '``udiv``'.
8914 Division by zero is undefined behavior. For vectors, if any element
8915 of the divisor is zero, the operation has undefined behavior.
8916 Overflow also leads to undefined behavior; this is a rare case, but can
8917 occur, for example, by doing a 32-bit division of -2147483648 by -1.
8919 If the ``exact`` keyword is present, the result value of the ``sdiv`` is
8920 a :ref:`poison value <poisonvalues>` if the result would be rounded.
8925 .. code-block:: text
8927 <result> = sdiv i32 4, %var ; yields i32:result = 4 / %var
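
Because both division by zero and the overflowing case are undefined behavior,
a frontend for a language with defined division semantics has to guard the
instruction. The sketch below is one possible guard. The function name
``@guarded_sdiv`` and the choice of ``0`` as the fallback result are
assumptions.

.. code-block:: llvm

      define i32 @guarded_sdiv(i32 %a, i32 %b) {
      entry:
        %b.zero = icmp eq i32 %b, 0
        %a.min = icmp eq i32 %a, -2147483648
        %b.m1 = icmp eq i32 %b, -1
        ; -2147483648 / -1 does not fit in i32.
        %ovf = and i1 %a.min, %b.m1
        %bad = or i1 %b.zero, %ovf
        br i1 %bad, label %fallback, label %divide

      divide:
        %q = sdiv i32 %a, %b
        ret i32 %q

      fallback:
        ret i32 0
      }
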
8931 '``fdiv``' Instruction
8932 ^^^^^^^^^^^^^^^^^^^^^^
8939 <result> = fdiv [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8944 The '``fdiv``' instruction returns the quotient of its two operands.
8949 The two arguments to the '``fdiv``' instruction must be
8950 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8951 floating-point values. Both arguments must have identical types.
8956 The value produced is the floating-point quotient of the two operands.
8957 This instruction is assumed to execute in the default :ref:`floating-point
8958 environment <floatenv>`.
8959 This instruction can also take any number of :ref:`fast-math
8960 flags <fastmath>`, which are optimization hints to enable otherwise
8961 unsafe floating-point optimizations:
8966 .. code-block:: text
8968 <result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var
8972 '``urem``' Instruction
8973 ^^^^^^^^^^^^^^^^^^^^^^
8980 <result> = urem <ty> <op1>, <op2> ; yields ty:result
8985 The '``urem``' instruction returns the remainder from the unsigned
8986 division of its two arguments.
8991 The two arguments to the '``urem``' instruction must be
8992 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8993 arguments must have identical types.
8998 This instruction returns the unsigned integer *remainder* of a division.
This instruction always performs an unsigned division to get the
remainder.
9002 Note that unsigned integer remainder and signed integer remainder are
9003 distinct operations; for signed integer remainder, use '``srem``'.
Taking the remainder of a division by zero is undefined behavior.
For vectors, if any element of the divisor is zero, the operation has
undefined behavior.
9012 .. code-block:: text
9014 <result> = urem i32 4, %var ; yields i32:result = 4 % %var
9018 '``srem``' Instruction
9019 ^^^^^^^^^^^^^^^^^^^^^^
9026 <result> = srem <ty> <op1>, <op2> ; yields ty:result
The '``srem``' instruction returns the remainder from the signed
division of its two operands. This instruction can also take
:ref:`vector <t_vector>` versions of the values, in which case the elements
must be integers.
9039 The two arguments to the '``srem``' instruction must be
9040 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9041 arguments must have identical types.
9046 This instruction returns the *remainder* of a division (where the result
9047 is either zero or has the same sign as the dividend, ``op1``), not the
9048 *modulo* operator (where the result is either zero or has the same sign
9049 as the divisor, ``op2``) of a value. For more information about the
difference, see `The Math
Forum <http://mathforum.org/dr.math/problems/anne.4.28.99.html>`_. For a
table of how this is implemented in various languages, please see
`Wikipedia: modulo
operation <http://en.wikipedia.org/wiki/Modulo_operation>`_.
9056 Note that signed integer remainder and unsigned integer remainder are
9057 distinct operations; for unsigned integer remainder, use '``urem``'.
9059 Taking the remainder of a division by zero is undefined behavior.
For vectors, if any element of the divisor is zero, the operation has
undefined behavior.
9062 Overflow also leads to undefined behavior; this is a rare case, but can
9063 occur, for example, by taking the remainder of a 32-bit division of
9064 -2147483648 by -1. (The remainder doesn't actually overflow, but this
9065 rule lets srem be implemented using instructions that return both the
9066 result of the division and the remainder.)
9071 .. code-block:: text
9073 <result> = srem i32 4, %var ; yields i32:result = 4 % %var
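
To make the remainder/modulo distinction concrete, the sketch below derives a
floored modulo (result takes the sign of the divisor) from ``srem``. The
function name ``@floor_mod`` is hypothetical, and the caller is assumed to rule
out a zero divisor and the overflowing case.

.. code-block:: llvm

      define i32 @floor_mod(i32 %a, i32 %b) {
        ; Remainder: zero or the same sign as the dividend %a.
        %r = srem i32 %a, %b
        %r.nz = icmp ne i32 %r, 0
        ; The sign bit of (%r xor %b) is set when the two signs differ.
        %x = xor i32 %r, %b
        %signs.differ = icmp slt i32 %x, 0
        %fix = and i1 %r.nz, %signs.differ
        %r.adj = add i32 %r, %b
        %m = select i1 %fix, i32 %r.adj, i32 %r
        ret i32 %m
      }

For ``%a = -7`` and ``%b = 3`` the ``srem`` yields ``-1`` and the function
returns ``2``; for ``%a = 7`` and ``%b = -3`` the ``srem`` yields ``1`` and the
function returns ``-2``.
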
9077 '``frem``' Instruction
9078 ^^^^^^^^^^^^^^^^^^^^^^
9085 <result> = frem [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
The '``frem``' instruction returns the remainder from the division of
its two operands.
9096 The two arguments to the '``frem``' instruction must be
9097 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
9098 floating-point values. Both arguments must have identical types.
9103 The value produced is the floating-point remainder of the two operands.
9104 This is the same output as a libm '``fmod``' function, but without any
possibility of setting ``errno``. The remainder has the same sign as the
dividend.
9107 This instruction is assumed to execute in the default :ref:`floating-point
9108 environment <floatenv>`.
9109 This instruction can also take any number of :ref:`fast-math
9110 flags <fastmath>`, which are optimization hints to enable otherwise
9111 unsafe floating-point optimizations:
9116 .. code-block:: text
9118 <result> = frem float 4.0, %var ; yields float:result = 4.0 % %var
9122 Bitwise Binary Operations
9123 -------------------------
9125 Bitwise binary operators are used to do various forms of bit-twiddling
9126 in a program. They are generally very efficient instructions and can
9127 commonly be strength reduced from other instructions. They require two
9128 operands of the same type, execute an operation on them, and produce a
9129 single value. The resulting value is the same type as its operands.
9133 '``shl``' Instruction
9134 ^^^^^^^^^^^^^^^^^^^^^
9141 <result> = shl <ty> <op1>, <op2> ; yields ty:result
9142 <result> = shl nuw <ty> <op1>, <op2> ; yields ty:result
9143 <result> = shl nsw <ty> <op1>, <op2> ; yields ty:result
9144 <result> = shl nuw nsw <ty> <op1>, <op2> ; yields ty:result
9149 The '``shl``' instruction returns the first operand shifted to the left
9150 a specified number of bits.
9155 Both arguments to the '``shl``' instruction must be the same
9156 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9157 '``op2``' is treated as an unsigned value.
9162 The value produced is ``op1`` \* 2\ :sup:`op2` mod 2\ :sup:`n`,
9163 where ``n`` is the width of the result. If ``op2`` is (statically or
9164 dynamically) equal to or larger than the number of bits in
9165 ``op1``, this instruction returns a :ref:`poison value <poisonvalues>`.
9166 If the arguments are vectors, each vector element of ``op1`` is shifted
9167 by the corresponding shift amount in ``op2``.
9169 If the ``nuw`` keyword is present, then the shift produces a poison
9170 value if it shifts out any non-zero bits.
9171 If the ``nsw`` keyword is present, then the shift produces a poison
9172 value if it shifts out any bits that disagree with the resultant sign bit.
9177 .. code-block:: text
9179 <result> = shl i32 4, %var ; yields i32: 4 << %var
9180 <result> = shl i32 4, 2 ; yields i32: 16
9181 <result> = shl i32 1, 10 ; yields i32: 1024
      <result> = shl i32 1, 32   ; poison: the shift amount equals the bit width
9183 <result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4>
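
Since an out-of-range shift amount produces poison, code that cannot prove the
amount is in range often masks it first. A minimal sketch, with the
hypothetical name ``@shl_masked``:

.. code-block:: llvm

      define i32 @shl_masked(i32 %x, i32 %n) {
        ; Keep only the low 5 bits so the amount is always in [0, 31].
        %n.masked = and i32 %n, 31
        %r = shl i32 %x, %n.masked
        ret i32 %r
      }

Whether masking matches the intended source-language semantics is a decision
for the frontend; the mask only guarantees that the ``shl`` is well defined.
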
9188 '``lshr``' Instruction
9189 ^^^^^^^^^^^^^^^^^^^^^^
9196 <result> = lshr <ty> <op1>, <op2> ; yields ty:result
9197 <result> = lshr exact <ty> <op1>, <op2> ; yields ty:result
9202 The '``lshr``' instruction (logical shift right) returns the first
9203 operand shifted to the right a specified number of bits with zero fill.
9208 Both arguments to the '``lshr``' instruction must be the same
9209 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9210 '``op2``' is treated as an unsigned value.
9215 This instruction always performs a logical shift right operation. The
9216 most significant bits of the result will be filled with zero bits after
9217 the shift. If ``op2`` is (statically or dynamically) equal to or larger
9218 than the number of bits in ``op1``, this instruction returns a :ref:`poison
9219 value <poisonvalues>`. If the arguments are vectors, each vector element
9220 of ``op1`` is shifted by the corresponding shift amount in ``op2``.
9222 If the ``exact`` keyword is present, the result value of the ``lshr`` is
9223 a poison value if any of the bits shifted out are non-zero.
9228 .. code-block:: text
9230 <result> = lshr i32 4, 1 ; yields i32:result = 2
9231 <result> = lshr i32 4, 2 ; yields i32:result = 1
9232 <result> = lshr i8 4, 3 ; yields i8:result = 0
9233 <result> = lshr i8 -2, 1 ; yields i8:result = 0x7F
      <result> = lshr i32 1, 32  ; poison: the shift amount equals the bit width
9235 <result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
9239 '``ashr``' Instruction
9240 ^^^^^^^^^^^^^^^^^^^^^^
9247 <result> = ashr <ty> <op1>, <op2> ; yields ty:result
9248 <result> = ashr exact <ty> <op1>, <op2> ; yields ty:result
9253 The '``ashr``' instruction (arithmetic shift right) returns the first
operand shifted to the right a specified number of bits with sign
extension.
9260 Both arguments to the '``ashr``' instruction must be the same
9261 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9262 '``op2``' is treated as an unsigned value.
This instruction always performs an arithmetic shift right operation.
The most significant bits of the result will be filled with the sign bit
9269 of ``op1``. If ``op2`` is (statically or dynamically) equal to or larger
9270 than the number of bits in ``op1``, this instruction returns a :ref:`poison
9271 value <poisonvalues>`. If the arguments are vectors, each vector element
9272 of ``op1`` is shifted by the corresponding shift amount in ``op2``.
9274 If the ``exact`` keyword is present, the result value of the ``ashr`` is
9275 a poison value if any of the bits shifted out are non-zero.
9280 .. code-block:: text
9282 <result> = ashr i32 4, 1 ; yields i32:result = 2
9283 <result> = ashr i32 4, 2 ; yields i32:result = 1
9284 <result> = ashr i8 4, 3 ; yields i8:result = 0
9285 <result> = ashr i8 -2, 1 ; yields i8:result = -1
      <result> = ashr i32 1, 32  ; poison: the shift amount equals the bit width
9287 <result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>
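
The difference between the two right shifts is easiest to see on the sign bit.
The two hypothetical functions below shift the same input by 31 positions.

.. code-block:: llvm

      define i32 @sign_mask(i32 %x) {
        ; Sign fill: 0 when %x is non-negative, -1 (all ones) when negative.
        %m = ashr i32 %x, 31
        ret i32 %m
      }

      define i32 @sign_bit(i32 %x) {
        ; Zero fill: 0 when %x is non-negative, 1 when negative.
        %b = lshr i32 %x, 31
        ret i32 %b
      }
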
9291 '``and``' Instruction
9292 ^^^^^^^^^^^^^^^^^^^^^
9299 <result> = and <ty> <op1>, <op2> ; yields ty:result
The '``and``' instruction returns the bitwise logical and of its two
operands.
9310 The two arguments to the '``and``' instruction must be
9311 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9312 arguments must have identical types.
9317 The truth table used for the '``and``' instruction is:
9334 .. code-block:: text
9336 <result> = and i32 4, %var ; yields i32:result = 4 & %var
9337 <result> = and i32 15, 40 ; yields i32:result = 8
9338 <result> = and i32 4, 8 ; yields i32:result = 0
9342 '``or``' Instruction
9343 ^^^^^^^^^^^^^^^^^^^^
9350 <result> = or <ty> <op1>, <op2> ; yields ty:result
The '``or``' instruction returns the bitwise logical inclusive or of its
two operands.
9361 The two arguments to the '``or``' instruction must be
9362 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9363 arguments must have identical types.
9368 The truth table used for the '``or``' instruction is:
9387 <result> = or i32 4, %var ; yields i32:result = 4 | %var
9388 <result> = or i32 15, 40 ; yields i32:result = 47
9389 <result> = or i32 4, 8 ; yields i32:result = 12
9393 '``xor``' Instruction
9394 ^^^^^^^^^^^^^^^^^^^^^
9401 <result> = xor <ty> <op1>, <op2> ; yields ty:result
9406 The '``xor``' instruction returns the bitwise logical exclusive or of
9407 its two operands. The ``xor`` is used to implement the "one's
9408 complement" operation, which is the "~" operator in C.
9413 The two arguments to the '``xor``' instruction must be
9414 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9415 arguments must have identical types.
9420 The truth table used for the '``xor``' instruction is:
9437 .. code-block:: text
9439 <result> = xor i32 4, %var ; yields i32:result = 4 ^ %var
9440 <result> = xor i32 15, 40 ; yields i32:result = 39
9441 <result> = xor i32 4, 8 ; yields i32:result = 12
9442 <result> = xor i32 %V, -1 ; yields i32:result = ~%V
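
The three bitwise operators are commonly combined to clear, set, and toggle
bits. The following sketch uses a hypothetical function ``@update_flags``; the
particular masks are arbitrary.

.. code-block:: llvm

      define i32 @update_flags(i32 %flags) {
        ; -16 is 0xFFFFFFF0: clear the low four bits.
        %cleared = and i32 %flags, -16
        ; Set bits 0 and 2.
        %set = or i32 %cleared, 5
        ; Flip bit 0.
        %toggled = xor i32 %set, 1
        ret i32 %toggled
      }

For any input, the upper 28 bits pass through unchanged and the low four bits
of the result are ``0100``.
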
9447 LLVM supports several instructions to represent vector operations in a
9448 target-independent manner. These instructions cover the element-access
9449 and vector-specific operations needed to process vectors effectively.
9450 While LLVM does directly support these vector operations, many
9451 sophisticated algorithms will want to use target-specific intrinsics to
9452 take full advantage of a specific target.
9454 .. _i_extractelement:
9456 '``extractelement``' Instruction
9457 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9464 <result> = extractelement <n x <ty>> <val>, <ty2> <idx> ; yields <ty>
9465 <result> = extractelement <vscale x n x <ty>> <val>, <ty2> <idx> ; yields <ty>
9470 The '``extractelement``' instruction extracts a single scalar element
9471 from a vector at a specified index.
9476 The first operand of an '``extractelement``' instruction is a value of
9477 :ref:`vector <t_vector>` type. The second operand is an index indicating
9478 the position from which to extract the element. The index may be a
9479 variable of any integer type.
9484 The result is a scalar of the same type as the element type of ``val``.
9485 Its value is the value at position ``idx`` of ``val``. If ``idx``
9486 exceeds the length of ``val`` for a fixed-length vector, the result is a
9487 :ref:`poison value <poisonvalues>`. For a scalable vector, if the value
9488 of ``idx`` exceeds the runtime length of the vector, the result is a
9489 :ref:`poison value <poisonvalues>`.
9494 .. code-block:: text
9496 <result> = extractelement <4 x i32> %vec, i32 0 ; yields i32
9498 .. _i_insertelement:
9500 '``insertelement``' Instruction
9501 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9508 <result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <n x <ty>>
9509 <result> = insertelement <vscale x n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <vscale x n x <ty>>
9514 The '``insertelement``' instruction inserts a scalar element into a
9515 vector at a specified index.
9520 The first operand of an '``insertelement``' instruction is a value of
9521 :ref:`vector <t_vector>` type. The second operand is a scalar value whose
9522 type must equal the element type of the first operand. The third operand
9523 is an index indicating the position at which to insert the value. The
9524 index may be a variable of any integer type.
9529 The result is a vector of the same type as ``val``. Its element values
9530 are those of ``val`` except at position ``idx``, where it gets the value
9531 ``elt``. If ``idx`` exceeds the length of ``val`` for a fixed-length vector,
9532 the result is a :ref:`poison value <poisonvalues>`. For a scalable vector,
9533 if the value of ``idx`` exceeds the runtime length of the vector, the result
9534 is a :ref:`poison value <poisonvalues>`.
9539 .. code-block:: text
9541 <result> = insertelement <4 x i32> %vec, i32 1, i32 0 ; yields <4 x i32>
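
As a short sketch of the two element-access instructions working together,
the hypothetical function below swaps the first two lanes of a vector.

.. code-block:: llvm

      define <4 x i32> @swap_first_two(<4 x i32> %v) {
        %e0 = extractelement <4 x i32> %v, i32 0
        %e1 = extractelement <4 x i32> %v, i32 1
        ; Each insertelement yields a new vector; %v itself is unchanged.
        %t = insertelement <4 x i32> %v, i32 %e1, i32 0
        %r = insertelement <4 x i32> %t, i32 %e0, i32 1
        ret <4 x i32> %r
      }
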
9543 .. _i_shufflevector:
9545 '``shufflevector``' Instruction
9546 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9553 <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>>
      <result> = shufflevector <vscale x n x <ty>> <v1>, <vscale x n x <ty>> <v2>, <vscale x m x i32> <mask>  ; yields <vscale x m x <ty>>
9559 The '``shufflevector``' instruction constructs a permutation of elements
9560 from two input vectors, returning a vector with the same element type as
9561 the input and length that is the same as the shuffle mask.
9566 The first two operands of a '``shufflevector``' instruction are vectors
9567 with the same type. The third argument is a shuffle mask vector constant
9568 whose element type is ``i32``. The mask vector elements must be constant
9569 integers or ``undef`` values. The result of the instruction is a vector
9570 whose length is the same as the shuffle mask and whose element type is the
9571 same as the element type of the first two operands.
9576 The elements of the two input vectors are numbered from left to right
9577 across both of the vectors. For each element of the result vector, the
9578 shuffle mask selects an element from one of the input vectors to copy
9579 to the result. Non-negative elements in the mask represent an index
9580 into the concatenated pair of input vectors.
9582 If the shuffle mask is undefined, the result vector is undefined. If
9583 the shuffle mask selects an undefined element from one of the input
9584 vectors, the resulting element is undefined. An undefined element
9585 in the mask vector specifies that the resulting element is undefined.
9586 An undefined element in the mask vector prevents a poisoned vector
9587 element from propagating.
9589 For scalable vectors, the only valid mask values at present are
9590 ``zeroinitializer`` and ``undef``, since we cannot write all indices as
9591 literals for a vector with a length unknown at compile time.
9596 .. code-block:: text
9598 <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
9599 <4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32>
9600 <result> = shufflevector <4 x i32> %v1, <4 x i32> undef,
9601 <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> - Identity shuffle.
9602 <result> = shufflevector <8 x i32> %v1, <8 x i32> undef,
9603 <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32>
9604 <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
9605 <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > ; yields <8 x i32>
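
Two idioms built on ``shufflevector`` are sketched below: broadcasting a scalar
to every lane (a "splat") and reversing a vector. The function names are
hypothetical, and the third function assumes a toolchain that accepts scalable
vector types, where ``zeroinitializer`` is one of the permitted masks.

.. code-block:: llvm

      define <4 x i32> @splat(i32 %x) {
        %ins = insertelement <4 x i32> undef, i32 %x, i32 0
        ; An all-zero mask selects element 0 of the first operand for every lane.
        %r = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
        ret <4 x i32> %r
      }

      define <4 x i32> @reverse(<4 x i32> %v) {
        %r = shufflevector <4 x i32> %v, <4 x i32> undef,
                           <4 x i32> <i32 3, i32 2, i32 1, i32 0>
        ret <4 x i32> %r
      }

      define <vscale x 4 x i32> @splat_scalable(i32 %x) {
        %ins = insertelement <vscale x 4 x i32> undef, i32 %x, i32 0
        %r = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> undef,
                           <vscale x 4 x i32> zeroinitializer
        ret <vscale x 4 x i32> %r
      }
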
9607 Aggregate Operations
9608 --------------------
9610 LLVM supports several instructions for working with
9611 :ref:`aggregate <t_aggregate>` values.
9615 '``extractvalue``' Instruction
9616 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9623 <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
9628 The '``extractvalue``' instruction extracts the value of a member field
9629 from an :ref:`aggregate <t_aggregate>` value.
9634 The first operand of an '``extractvalue``' instruction is a value of
9635 :ref:`struct <t_struct>` or :ref:`array <t_array>` type. The other operands are
9636 constant indices to specify which value to extract in a similar manner
9637 as indices in a '``getelementptr``' instruction.
9639 The major differences to ``getelementptr`` indexing are:
9641 - Since the value being indexed is not a pointer, the first index is
9642 omitted and assumed to be zero.
9643 - At least one index must be specified.
9644 - Not only struct indices but also array indices must be in bounds.
The result is the value at the position in the aggregate specified by
the index operands.
9655 .. code-block:: text
9657 <result> = extractvalue {i32, float} %agg, 0 ; yields i32
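
Multiple indices walk into nested aggregates. The sketch below uses a
hypothetical nested struct type to show a two-level and a three-level
extraction.

.. code-block:: llvm

      define float @get_inner_float({i32, {float, [2 x i8]}} %agg) {
        ; Index 1 selects the inner struct, index 0 its float member.
        %f = extractvalue {i32, {float, [2 x i8]}} %agg, 1, 0
        ret float %f
      }

      define i8 @get_first_byte({i32, {float, [2 x i8]}} %agg) {
        ; Inner struct, then its array member, then array element 0.
        %b = extractvalue {i32, {float, [2 x i8]}} %agg, 1, 1, 0
        ret i8 %b
      }
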
9661 '``insertvalue``' Instruction
9662 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9669 <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* ; yields <aggregate type>
9674 The '``insertvalue``' instruction inserts a value into a member field in
9675 an :ref:`aggregate <t_aggregate>` value.
The first operand of an '``insertvalue``' instruction is a value of
:ref:`struct <t_struct>` or :ref:`array <t_array>` type. The second operand is
a first-class value to insert. The following operands are constant
indices indicating the position at which to insert the value in a
similar manner as indices in an '``extractvalue``' instruction. The value
to insert must have the same type as the value identified by the
indices.
9691 The result is an aggregate of the same type as ``val``. Its value is
9692 that of ``val`` except that the value at the position specified by the
9693 indices is that of ``elt``.
9698 .. code-block:: llvm
9700 %agg1 = insertvalue {i32, float} undef, i32 1, 0 ; yields {i32 1, float undef}
9701 %agg2 = insertvalue {i32, float} %agg1, float %val, 1 ; yields {i32 1, float %val}
9702 %agg3 = insertvalue {i32, {float}} undef, float %val, 1, 0 ; yields {i32 undef, {float %val}}
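Because ``insertvalue`` produces a new aggregate rather than modifying one in
place, an aggregate is usually built up by chaining instructions. A minimal
sketch, assuming a hypothetical function ``@make_pair``:

.. code-block:: llvm

    ; Hypothetical helper: build a { i32, [2 x float] } value from scalars.
    define { i32, [2 x float] } @make_pair(i32 %tag, float %x, float %y) {
    entry:
      %a0 = insertvalue { i32, [2 x float] } undef, i32 %tag, 0    ; { %tag, [undef, undef] }
      %a1 = insertvalue { i32, [2 x float] } %a0, float %x, 1, 0   ; { %tag, [%x, undef] }
      %a2 = insertvalue { i32, [2 x float] } %a1, float %y, 1, 1   ; { %tag, [%x, %y] }
      ret { i32, [2 x float] } %a2
    }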
9706 Memory Access and Addressing Operations
9707 ---------------------------------------
A key design point of an SSA-based representation is how it represents
memory. In LLVM, no memory locations are in SSA form, which makes things
very simple. This section describes how to read, write, and allocate
memory in LLVM.
9716 '``alloca``' Instruction
9717 ^^^^^^^^^^^^^^^^^^^^^^^^
9724 <result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)] ; yields type addrspace(num)*:result
9729 The '``alloca``' instruction allocates memory on the stack frame of the
9730 currently executing function, to be automatically released when this
9731 function returns to its caller. If the address space is not explicitly
9732 specified, the object is allocated in the alloca address space from the
9733 :ref:`datalayout string<langref_datalayout>`.
The '``alloca``' instruction allocates ``sizeof(<type>)*NumElements``
bytes of memory on the runtime stack, returning a pointer of the
appropriate type to the program. If ``NumElements`` is specified, it is
the number of elements allocated; otherwise ``NumElements`` defaults
to one. If a constant alignment is specified, the result of the
allocation is guaranteed to be aligned to at least that boundary. The
alignment may not be greater than ``1 << 32``. If not specified, or if
zero, the target can choose to align the allocation on any convenient
boundary compatible with the type.
9748 '``type``' may be any sized type.
Memory is allocated; a pointer is returned. The allocated memory is
uninitialized, and loading from uninitialized memory produces an undefined
value. The operation itself is undefined if there is insufficient stack
space for the allocation. The '``alloca``' instruction is commonly used
to represent automatic variables that must have an address available. When
the function returns (either with the ``ret`` or ``resume`` instructions),
the '``alloca``'d memory is automatically reclaimed. Allocating zero bytes
is legal, but the returned pointer may not be unique. The order in which
memory is allocated (i.e., which way the stack grows) is not specified.
9764 Note that '``alloca``' outside of the alloca address space from the
9765 :ref:`datalayout string<langref_datalayout>` is meaningful only if the
9766 target has assigned it a semantics.
9768 If the returned pointer is used by :ref:`llvm.lifetime.start <int_lifestart>`,
9769 the returned object is initially dead.
9770 See :ref:`llvm.lifetime.start <int_lifestart>` and
9771 :ref:`llvm.lifetime.end <int_lifeend>` for the precise semantics of
9772 lifetime-manipulating intrinsics.
9777 .. code-block:: llvm
9779 %ptr = alloca i32 ; yields i32*:ptr
9780 %ptr = alloca i32, i32 4 ; yields i32*:ptr
9781 %ptr = alloca i32, i32 4, align 1024 ; yields i32*:ptr
9782 %ptr = alloca i32, align 1024 ; yields i32*:ptr
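The following sketch shows how an automatic variable might be represented
with ``alloca``. The function name ``@local_counter`` and its parameter are
hypothetical, and the typed-pointer syntax matches the rest of this document:

.. code-block:: llvm

    ; Hypothetical function: a local counter kept in stack memory.
    define i32 @local_counter(i32 %n) {
    entry:
      %x = alloca i32, align 4                ; one i32 slot, uninitialized
      %buf = alloca i32, i32 %n, align 4      ; %n contiguous i32 elements (unused here)
      store i32 41, i32* %x, align 4          ; initialize before the first load
      %old = load i32, i32* %x, align 4
      %new = add nsw i32 %old, 1
      store i32 %new, i32* %x, align 4
      %result = load i32, i32* %x, align 4    ; 42
      ret i32 %result                         ; both allocations are reclaimed here
    }

Because ``%x`` never escapes the function, an optimizer is free to promote it
to an SSA value and remove the memory traffic entirely.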
9786 '``load``' Instruction
9787 ^^^^^^^^^^^^^^^^^^^^^^
9794 <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.load !<empty_node>][, !invariant.group !<empty_node>][, !nonnull !<empty_node>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>][, !noundef !<empty_node>]
9795 <result> = load atomic [volatile] <ty>, <ty>* <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>]
9796 !<nontemp_node> = !{ i32 1 }
9798 !<deref_bytes_node> = !{ i64 <dereferenceable_bytes> }
9799 !<align_node> = !{ i64 <value_alignment> }
9804 The '``load``' instruction is used to read from memory.
9809 The argument to the ``load`` instruction specifies the memory address from which
9810 to load. The type specified must be a :ref:`first class <t_firstclass>` type of
9811 known size (i.e. not containing an :ref:`opaque structural type <t_opaque>`). If
9812 the ``load`` is marked as ``volatile``, then the optimizer is not allowed to
9813 modify the number or order of execution of this ``load`` with other
9814 :ref:`volatile operations <volatile>`.
9816 If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
9817 <ordering>` and optional ``syncscope("<target-scope>")`` argument. The
9818 ``release`` and ``acq_rel`` orderings are not valid on ``load`` instructions.
9819 Atomic loads produce :ref:`defined <memmodel>` results when they may see
9820 multiple atomic stores. The type of the pointee must be an integer, pointer, or
9821 floating-point type whose bit width is a power of two greater than or equal to
9822 eight and less than or equal to a target-specific size limit. ``align`` must be
9823 explicitly specified on atomic loads, and the load has undefined behavior if the
9824 alignment is not set to a value which is at least the size in bytes of the
9825 pointee. ``!nontemporal`` does not have any defined semantics for atomic loads.
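A few atomic loads that satisfy the constraints above are sketched below. The
function name ``@read_flag`` is made up for illustration:

.. code-block:: llvm

    ; Hypothetical function demonstrating valid atomic load forms.
    define i32 @read_flag(i32* %flag) {
    entry:
      ; ``align`` is mandatory and must be at least the size of i32 (4 bytes).
      %a = load atomic i32, i32* %flag monotonic, align 4
      %b = load atomic i32, i32* %flag acquire, align 4
      ; An optional syncscope precedes the ordering; ``volatile`` follows ``atomic``.
      %c = load atomic volatile i32, i32* %flag syncscope("singlethread") seq_cst, align 4
      ; ``release`` and ``acq_rel`` are not accepted on a load.
      ret i32 %b
    }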
The optional constant ``align`` argument specifies the alignment of the
operation (that is, the alignment of the memory address). A value of 0
or an omitted ``align`` argument means that the operation has the ABI
alignment for the target. It is the responsibility of the code emitter
to ensure that the alignment information is correct. Overestimating the
alignment results in undefined behavior. Underestimating the alignment
may produce less efficient code. An alignment of 1 is always safe. The
maximum possible alignment is ``1 << 32``. An alignment value higher
than the size of the loaded type implies that memory up to the alignment
value in bytes can be safely loaded without trapping in the default
address space. Accessing the high bytes can interfere with debugging
tools, so they should not be accessed if the function has the
``sanitize_thread`` or ``sanitize_address`` attributes.
9841 The optional ``!nontemporal`` metadata must reference a single
9842 metadata name ``<nontemp_node>`` corresponding to a metadata node with one
9843 ``i32`` entry of value 1. The existence of the ``!nontemporal``
9844 metadata on the instruction tells the optimizer and code generator
9845 that this load is not expected to be reused in the cache. The code
9846 generator may select special instructions to save cache bandwidth, such
9847 as the ``MOVNT`` instruction on x86.
The optional ``!invariant.load`` metadata must reference a single
metadata name ``<empty_node>`` corresponding to a metadata node with no
entries. If a load instruction tagged with the ``!invariant.load``
metadata is executed, the memory location referenced by the load must
contain the same value at all points in the program where the
memory location is dereferenceable; otherwise, the behavior is
undefined.
The optional ``!invariant.group`` metadata must reference a single metadata name
``<empty_node>`` corresponding to a metadata node with no entries.
See :ref:`invariant.group <md_invariant.group>` metadata for details.
9861 The optional ``!nonnull`` metadata must reference a single
9862 metadata name ``<empty_node>`` corresponding to a metadata node with no
9863 entries. The existence of the ``!nonnull`` metadata on the
9864 instruction tells the optimizer that the value loaded is known to
9865 never be null. If the value is null at runtime, the behavior is undefined.
9866 This is analogous to the ``nonnull`` attribute on parameters and return
9867 values. This metadata can only be applied to loads of a pointer type.
The optional ``!dereferenceable`` metadata must reference a single metadata
name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
entry.
See :ref:`dereferenceable <md_dereferenceable>` metadata for details.

The optional ``!dereferenceable_or_null`` metadata must reference a single
metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
``i64`` entry.
See :ref:`dereferenceable_or_null <md_dereferenceable_or_null>` metadata
for details.
The optional ``!align`` metadata must reference a single metadata name
``<align_node>`` corresponding to a metadata node with one ``i64`` entry.
The existence of the ``!align`` metadata on the instruction tells the
optimizer that the value loaded is known to be aligned to a boundary specified
by the integer value in the metadata node. The alignment must be a power of 2.
This is analogous to the ``align`` attribute on parameters and return values.
This metadata can only be applied to loads of a pointer type. If the returned
value is not appropriately aligned at runtime, the behavior is undefined.
9889 The optional ``!noundef`` metadata must reference a single metadata name
9890 ``<empty_node>`` corresponding to a node with no entries. The existence of
9891 ``!noundef`` metadata on the instruction tells the optimizer that the value
9892 loaded is known to be :ref:`well defined <welldefinedvalues>`.
9893 If the value isn't well defined, the behavior is undefined.
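The metadata kinds described above attach to a ``load`` as a comma-separated
list after the alignment. The sketch below uses a hypothetical function
``@load_with_metadata``; each annotation is a promise made by the code emitter,
so the example is only well defined if those promises actually hold for the
memory passed in:

.. code-block:: llvm

    ; Hypothetical function combining several load metadata kinds.
    define i32 @load_with_metadata(i32* %p, i32** %pp) {
    entry:
      ; Hint that the loaded data is not expected to be reused in the cache.
      %streamed = load i32, i32* %p, align 4, !nontemporal !0
      ; The loaded pointer is promised to be non-null, 4-byte aligned,
      ; and dereferenceable for at least 4 bytes.
      %q = load i32*, i32** %pp, align 8, !nonnull !1, !align !2, !dereferenceable !3
      ; The loaded value is promised to be well defined.
      %v = load i32, i32* %q, align 4, !noundef !1
      %sum = add i32 %streamed, %v
      ret i32 %sum
    }

    !0 = !{ i32 1 }   ; <nontemp_node>
    !1 = !{}          ; <empty_node>
    !2 = !{ i64 4 }   ; <align_node>
    !3 = !{ i64 4 }   ; <deref_bytes_node>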
9898 The location of memory pointed to is loaded. If the value being loaded
9899 is of scalar type then the number of bytes read does not exceed the
9900 minimum number of bytes needed to hold all bits of the type. For
9901 example, loading an ``i24`` reads at most three bytes. When loading a
9902 value of a type like ``i20`` with a size that is not an integral number
9903 of bytes, the result is undefined if the value was not originally
9904 written using a store of the same type.
9905 If the value being loaded is of aggregate type, the bytes that correspond to
9906 padding may be accessed but are ignored, because it is impossible to observe
9907 padding from the loaded aggregate value.
9908 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
9913 .. code-block:: llvm
9915 %ptr = alloca i32 ; yields i32*:ptr
9916 store i32 3, i32* %ptr ; yields void
9917 %val = load i32, i32* %ptr ; yields i32:val = i32 3
9921 '``store``' Instruction
9922 ^^^^^^^^^^^^^^^^^^^^^^^
9929 store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.group !<empty_node>] ; yields void
9930 store atomic [volatile] <ty> <value>, <ty>* <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>] ; yields void
9931 !<nontemp_node> = !{ i32 1 }
9937 The '``store``' instruction is used to write to memory.
9942 There are two arguments to the ``store`` instruction: a value to store and an
9943 address at which to store it. The type of the ``<pointer>`` operand must be a
9944 pointer to the :ref:`first class <t_firstclass>` type of the ``<value>``
9945 operand. If the ``store`` is marked as ``volatile``, then the optimizer is not
9946 allowed to modify the number or order of execution of this ``store`` with other
9947 :ref:`volatile operations <volatile>`. Only values of :ref:`first class
9948 <t_firstclass>` types of known size (i.e. not containing an :ref:`opaque
9949 structural type <t_opaque>`) can be stored.
9951 If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
9952 <ordering>` and optional ``syncscope("<target-scope>")`` argument. The
9953 ``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` instructions.
9954 Atomic loads produce :ref:`defined <memmodel>` results when they may see
9955 multiple atomic stores. The type of the pointee must be an integer, pointer, or
9956 floating-point type whose bit width is a power of two greater than or equal to
9957 eight and less than or equal to a target-specific size limit. ``align`` must be
9958 explicitly specified on atomic stores, and the store has undefined behavior if
9959 the alignment is not set to a value which is at least the size in bytes of the
9960 pointee. ``!nontemporal`` does not have any defined semantics for atomic stores.
The optional constant ``align`` argument specifies the alignment of the
operation (that is, the alignment of the memory address). A value of 0
or an omitted ``align`` argument means that the operation has the ABI
alignment for the target. It is the responsibility of the code emitter
to ensure that the alignment information is correct. Overestimating the
alignment results in undefined behavior. Underestimating the
alignment may produce less efficient code. An alignment of 1 is always
safe. The maximum possible alignment is ``1 << 32``. An alignment
value higher than the size of the stored type implies that memory up to the
alignment value in bytes can be stored to without trapping in the default
address space. Storing to the higher bytes, however, may result in data
races if another thread can access the same address. Introducing a
data race is not allowed. Storing to the extra bytes is not allowed
even in situations where a data race is known to not exist if the
function has the ``sanitize_address`` attribute.
The optional ``!nontemporal`` metadata must reference a single metadata
name ``<nontemp_node>`` corresponding to a metadata node with one ``i32`` entry
of value 1. The existence of the ``!nontemporal`` metadata on the instruction
tells the optimizer and code generator that this store is not expected to
be reused in the cache. The code generator may select special
instructions to save cache bandwidth, such as the ``MOVNT`` instruction on
x86.
9986 The optional ``!invariant.group`` metadata must reference a
9987 single metadata name ``<empty_node>``. See ``invariant.group`` metadata.
9992 The contents of memory are updated to contain ``<value>`` at the
9993 location specified by the ``<pointer>`` operand. If ``<value>`` is
9994 of scalar type then the number of bytes written does not exceed the
9995 minimum number of bytes needed to hold all bits of the type. For
9996 example, storing an ``i24`` writes at most three bytes. When writing a
9997 value of a type like ``i20`` with a size that is not an integral number
9998 of bytes, it is unspecified what happens to the extra bits that do not
9999 belong to the type, but they will typically be overwritten.
10000 If ``<value>`` is of aggregate type, padding is filled with
10001 :ref:`undef <undefvalues>`.
10002 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
10007 .. code-block:: llvm
10009 %ptr = alloca i32 ; yields i32*:ptr
10010 store i32 3, i32* %ptr ; yields void
10011 %val = load i32, i32* %ptr ; yields i32:val = i32 3
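The other forms of ``store`` follow the same pattern. In this sketch the
globals ``@payload`` and ``@ready`` and both function names are hypothetical;
the first function publishes a value with a ``release`` store, and the second
shows the ``volatile`` and ``!nontemporal`` variants:

.. code-block:: llvm

    @payload = global i32 0
    @ready = global i32 0

    ; Hypothetical producer: write the data, then publish a flag.
    define void @publish(i32 %v) {
    entry:
      store i32 %v, i32* @payload, align 4
      ; ``align`` is mandatory on atomic stores; ``acquire`` and ``acq_rel``
      ; would be rejected here.
      store atomic i32 1, i32* @ready release, align 4
      ret void
    }

    ; Hypothetical function showing non-atomic variants.
    define void @stream_out(i32* %dst, i32 %v) {
    entry:
      store i32 %v, i32* %dst, align 4, !nontemporal !0
      store volatile i32 %v, i32* %dst, align 4
      ret void
    }

    !0 = !{ i32 1 }   ; <nontemp_node>

A reader that observes ``1`` in ``@ready`` through a ``load atomic`` with
``acquire`` ordering is then guaranteed to see the value written to
``@payload``.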
10015 '``fence``' Instruction
10016 ^^^^^^^^^^^^^^^^^^^^^^^
10023 fence [syncscope("<target-scope>")] <ordering> ; yields void
10028 The '``fence``' instruction is used to introduce happens-before edges
10029 between operations.
10034 '``fence``' instructions take an :ref:`ordering <ordering>` argument which
10035 defines what *synchronizes-with* edges they add. They can only be given
10036 ``acquire``, ``release``, ``acq_rel``, and ``seq_cst`` orderings.
10041 A fence A which has (at least) ``release`` ordering semantics
10042 *synchronizes with* a fence B with (at least) ``acquire`` ordering
10043 semantics if and only if there exist atomic operations X and Y, both
10044 operating on some atomic object M, such that A is sequenced before X, X
10045 modifies M (either directly or through some side effect of a sequence
10046 headed by X), Y is sequenced before B, and Y observes M. This provides a
10047 *happens-before* dependency between A and B. Rather than an explicit
10048 ``fence``, one (but not both) of the atomic operations X or Y might
10049 provide a ``release`` or ``acquire`` (resp.) ordering constraint and
10050 still *synchronize-with* the explicit ``fence`` and establish the
10051 *happens-before* edge.
10053 A ``fence`` which has ``seq_cst`` ordering, in addition to having both
10054 ``acquire`` and ``release`` semantics specified above, participates in
10055 the global program order of other ``seq_cst`` operations and/or fences.
10057 A ``fence`` instruction can also take an optional
10058 ":ref:`syncscope <syncscope>`" argument.
10063 .. code-block:: text
10065 fence acquire ; yields void
10066 fence syncscope("singlethread") seq_cst ; yields void
10067 fence syncscope("agent") seq_cst ; yields void
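The fence-to-fence synchronization rule can be sketched as follows, with the
release fence playing the role of A, the ``monotonic`` store and load on
``@flag`` playing the roles of X and Y, and the acquire fence playing the role
of B. The global and function names are invented for this example:

.. code-block:: llvm

    @data = global i32 0
    @flag = global i32 0

    ; Hypothetical producer thread.
    define void @producer() {
    entry:
      store i32 42, i32* @data, align 4
      fence release                                        ; fence A
      store atomic i32 1, i32* @flag monotonic, align 4    ; X modifies M = @flag
      ret void
    }

    ; Hypothetical consumer thread.
    define i32 @consumer() {
    entry:
      br label %spin

    spin:
      %f = load atomic i32, i32* @flag monotonic, align 4  ; Y observes M = @flag
      %ready = icmp eq i32 %f, 1
      br i1 %ready, label %done, label %spin

    done:
      fence acquire                                        ; fence B
      %v = load i32, i32* @data, align 4                   ; sees the store to @data
      ret i32 %v
    }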
10071 '``cmpxchg``' Instruction
10072 ^^^^^^^^^^^^^^^^^^^^^^^^^
10079 cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [syncscope("<target-scope>")] <success ordering> <failure ordering>[, align <alignment>] ; yields { ty, i1 }
10084 The '``cmpxchg``' instruction is used to atomically modify memory. It
10085 loads a value in memory and compares it to a given value. If they are
10086 equal, it tries to store a new value into the memory.
There are three arguments to the '``cmpxchg``' instruction: an address
to operate on, a value to compare to the value currently at that
address, and a new value to place at that address if the compared values
are equal. The type of '``<cmp>``' must be an integer or pointer type whose
bit width is a power of two greater than or equal to eight and less
than or equal to a target-specific size limit. '``<cmp>``' and '``<new>``' must
have the same type, and the type of '``<pointer>``' must be a pointer to
that type. If the ``cmpxchg`` is marked as ``volatile``, then the
optimizer is not allowed to modify the number or order of execution of
this ``cmpxchg`` with other :ref:`volatile operations <volatile>`.
The success and failure :ref:`ordering <ordering>` arguments specify how this
``cmpxchg`` synchronizes with other atomic operations. Both ordering parameters
must be at least ``monotonic``; the failure ordering cannot be either
``release`` or ``acq_rel``.
10107 A ``cmpxchg`` instruction can also take an optional
10108 ":ref:`syncscope <syncscope>`" argument.
The instruction can take an optional ``align`` attribute.
The alignment must be a power of two greater than or equal to the size of the
``<cmp>`` type. If unspecified, the alignment is assumed to be equal to the
size of the ``<cmp>`` type. Note that this default alignment assumption is
different from the alignment used for the load/store instructions when
``align`` isn't specified.
10117 The pointer passed into cmpxchg must have alignment greater than or
10118 equal to the size in memory of the operand.
10123 The contents of memory at the location specified by the '``<pointer>``' operand
10124 is read and compared to '``<cmp>``'; if the values are equal, '``<new>``' is
10125 written to the location. The original value at the location is returned,
10126 together with a flag indicating success (true) or failure (false).
If the cmpxchg operation is marked as ``weak`` then a spurious failure is
permitted: the operation may not write ``<new>`` even if the comparison
matched.
10132 If the cmpxchg operation is strong (the default), the i1 value is 1 if and only
10133 if the value loaded equals ``cmp``.
A successful ``cmpxchg`` is a read-modify-write instruction for the purpose of
identifying release sequences. A failed ``cmpxchg`` is equivalent to an atomic
load with an ordering parameter determined by the second ordering parameter.
10142 .. code-block:: llvm
    entry:
      %orig = load atomic i32, i32* %ptr unordered, align 4                      ; yields i32
      br label %loop

    loop:
      %cmp = phi i32 [ %orig, %entry ], [ %value_loaded, %loop ]
      %squared = mul i32 %cmp, %cmp
      %val_success = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 }
      %value_loaded = extractvalue { i32, i1 } %val_success, 0
      %success = extractvalue { i32, i1 } %val_success, 1
      br i1 %success, label %done, label %loop

    done:
      ; The update succeeded; execution continues here.
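When no retry is needed, a single strong ``cmpxchg`` is enough. The sketch
below assumes a hypothetical lock word that holds 0 when free and 1 when
taken; the function name ``@try_lock`` is likewise made up:

.. code-block:: llvm

    ; Hypothetical helper: try to take a lock once, without retrying.
    define i1 @try_lock(i32* %lock) {
    entry:
      ; Success ordering is acquire; the failure ordering may not be
      ; release or acq_rel, so monotonic is used.
      %res = cmpxchg i32* %lock, i32 0, i32 1 acquire monotonic, align 4  ; yields { i32, i1 }
      %ok = extractvalue { i32, i1 } %res, 1
      ret i1 %ok
    }

Because the operation is strong, ``%ok`` is true exactly when the lock word
was 0. With ``weak`` the flag could also be false spuriously, so a weak
``cmpxchg`` is normally placed inside a retry loop.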
10161 '``atomicrmw``' Instruction
10162 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
10169 atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [syncscope("<target-scope>")] <ordering>[, align <alignment>] ; yields ty
10174 The '``atomicrmw``' instruction is used to atomically modify memory.
There are three arguments to the '``atomicrmw``' instruction: an
operation to apply, an address whose value to modify, and an argument to
the operation. The operation must be one of the following keywords:

-  xchg
-  add
-  sub
-  and
-  nand
-  or
-  xor
-  max
-  min
-  umax
-  umin
-  fadd
-  fsub
For most of these operations, the type of '``<value>``' must be an integer
type whose bit width is a power of two greater than or equal to eight
and less than or equal to a target-specific size limit. For ``xchg``, this
may also be a floating-point type with the same size constraints as
integers. For ``fadd``/``fsub``, this must be a floating-point type. The
type of the '``<pointer>``' operand must be a pointer to that type. If
the ``atomicrmw`` is marked as ``volatile``, then the optimizer is not
allowed to modify the number or order of execution of this
``atomicrmw`` with other :ref:`volatile operations <volatile>`.
The instruction can take an optional ``align`` attribute.
The alignment must be a power of two greater than or equal to the size of the
``<value>`` type. If unspecified, the alignment is assumed to be equal to the
size of the ``<value>`` type. Note that this default alignment assumption is
different from the alignment used for the load/store instructions when
``align`` isn't specified.

An ``atomicrmw`` instruction can also take an optional
":ref:`syncscope <syncscope>`" argument.
10220 The contents of memory at the location specified by the '``<pointer>``'
10221 operand are atomically read, modified, and written back. The original
10222 value at the location is returned. The modification is specified by the
10223 operation argument:
10225 - xchg: ``*ptr = val``
10226 - add: ``*ptr = *ptr + val``
10227 - sub: ``*ptr = *ptr - val``
10228 - and: ``*ptr = *ptr & val``
10229 - nand: ``*ptr = ~(*ptr & val)``
10230 - or: ``*ptr = *ptr | val``
10231 - xor: ``*ptr = *ptr ^ val``
10232 - max: ``*ptr = *ptr > val ? *ptr : val`` (using a signed comparison)
10233 - min: ``*ptr = *ptr < val ? *ptr : val`` (using a signed comparison)
10234 - umax: ``*ptr = *ptr > val ? *ptr : val`` (using an unsigned comparison)
10235 - umin: ``*ptr = *ptr < val ? *ptr : val`` (using an unsigned comparison)
10236 - fadd: ``*ptr = *ptr + val`` (using floating point arithmetic)
10237 - fsub: ``*ptr = *ptr - val`` (using floating point arithmetic)
10242 .. code-block:: llvm
10244 %old = atomicrmw add i32* %ptr, i32 1 acquire ; yields i32
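A few more operations are sketched below as complete functions. All three
function names are hypothetical. In each case the returned value is the
content of memory *before* the modification:

.. code-block:: llvm

    ; Hypothetical ticket dispenser: returns the old counter value.
    define i32 @next_ticket(i32* %counter) {
    entry:
      %old = atomicrmw add i32* %counter, i32 1 seq_cst, align 4
      ret i32 %old
    }

    ; Hypothetical high-water mark: *mark = max(*mark, sample), unsigned.
    define i32 @raise_high_water_mark(i32* %mark, i32 %sample) {
    entry:
      %old = atomicrmw umax i32* %mark, i32 %sample monotonic
      ret i32 %old
    }

    ; Hypothetical accumulator: fadd requires a floating-point operand.
    define float @accumulate(float* %sum, float %x) {
    entry:
      %old = atomicrmw fadd float* %sum, float %x monotonic, align 4
      ret float %old
    }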
10246 .. _i_getelementptr:
10248 '``getelementptr``' Instruction
10249 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10256 <result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
10257 <result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
10258 <result> = getelementptr <ty>, <ptr vector> <ptrval>, [inrange] <vector index type> <idx>
10263 The '``getelementptr``' instruction is used to get the address of a
10264 subelement of an :ref:`aggregate <t_aggregate>` data structure. It performs
10265 address calculation only and does not access memory. The instruction can also
10266 be used to calculate a vector of such addresses.
10271 The first argument is always a type used as the basis for the calculations.
10272 The second argument is always a pointer or a vector of pointers, and is the
10273 base address to start from. The remaining arguments are indices
10274 that indicate which of the elements of the aggregate object are indexed.
10275 The interpretation of each index is dependent on the type being indexed
10276 into. The first index always indexes the pointer value given as the
10277 second argument, the second index indexes a value of the type pointed to
10278 (not necessarily the value directly pointed to, since the first index
10279 can be non-zero), etc. The first type indexed into must be a pointer
10280 value, subsequent types can be arrays, vectors, and structs. Note that
10281 subsequent types being indexed into can never be pointers, since that
10282 would require loading the pointer before continuing calculation.
The type of each index argument depends on the type it is indexing into.
When indexing into a (optionally packed) structure, only ``i32`` integer
**constants** are allowed (when using a vector of indices they must all
be the **same** ``i32`` integer constant). When indexing into an array,
pointer or vector, integers of any width are allowed, and they are not
required to be constant. These integers are treated as signed values
where relevant.
For example, let's consider a C code fragment and how it gets compiled
to LLVM::

    int *foo(struct ST *s) {
      return &s[1].Z.B[5][13];
    }
10312 The LLVM code generated by Clang is:
10314 .. code-block:: llvm
    %struct.RT = type { i8, [10 x [20 x i32]], i8 }
    %struct.ST = type { i32, double, %struct.RT }

    define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp {
    entry:
      %arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
      ret i32* %arrayidx
    }
10328 In the example above, the first index is indexing into the
10329 '``%struct.ST*``' type, which is a pointer, yielding a '``%struct.ST``'
10330 = '``{ i32, double, %struct.RT }``' type, a structure. The second index
10331 indexes into the third element of the structure, yielding a
10332 '``%struct.RT``' = '``{ i8 , [10 x [20 x i32]], i8 }``' type, another
10333 structure. The third index indexes into the second element of the
10334 structure, yielding a '``[10 x [20 x i32]]``' type, an array. The two
10335 dimensions of the array are subscripted into, yielding an '``i32``'
10336 type. The '``getelementptr``' instruction returns a pointer to this
10337 element, thus computing a value of '``i32*``' type.
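To make the address arithmetic concrete, the same computation can be written
as a single byte offset. This is only a sketch under an assumed data layout
(``i32`` is 4 bytes with 4-byte alignment, ``double`` is 8 bytes with 8-byte
alignment, as on typical x86-64 targets); the function name ``@foo_bytes`` is
hypothetical, and the constant would differ on targets with another layout:

.. code-block:: llvm

    %struct.RT = type { i8, [10 x [20 x i32]], i8 }
    %struct.ST = type { i32, double, %struct.RT }

    ; Under the assumed layout:
    ;   sizeof(%struct.RT) = 4 (i8 + padding) + 800 + 4 (i8 + padding) = 808
    ;   sizeof(%struct.ST) = 8 (i32 + padding) + 8 + 808               = 824
    ;   offset = 1*824          ; first index:  s[1]
    ;          + 16             ; second index: field 2 of %struct.ST
    ;          + 4              ; third index:  field 1 of %struct.RT
    ;          + (5*20 + 13)*4  ; array indices [5][13]
    ;          = 1296
    define i32* @foo_bytes(%struct.ST* %s) {
    entry:
      %base = bitcast %struct.ST* %s to i8*
      %addr = getelementptr inbounds i8, i8* %base, i64 1296
      %elt = bitcast i8* %addr to i32*
      ret i32* %elt
    }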
10339 Note that it is perfectly legal to index partially through a structure,
10340 returning a pointer to an inner element. Because of this, the LLVM code
10341 for the given testcase is equivalent to:
10343 .. code-block:: llvm
    define i32* @foo(%struct.ST* %s) {
      %t1 = getelementptr %struct.ST, %struct.ST* %s, i32 1                        ; yields %struct.ST*:%t1
      %t2 = getelementptr %struct.ST, %struct.ST* %t1, i32 0, i32 2                ; yields %struct.RT*:%t2
      %t3 = getelementptr %struct.RT, %struct.RT* %t2, i32 0, i32 1                ; yields [10 x [20 x i32]]*:%t3
      %t4 = getelementptr [10 x [20 x i32]], [10 x [20 x i32]]* %t3, i32 0, i32 5  ; yields [20 x i32]*:%t4
      %t5 = getelementptr [20 x i32], [20 x i32]* %t4, i32 0, i32 13               ; yields i32*:%t5
      ret i32* %t5
    }
10354 If the ``inbounds`` keyword is present, the result value of the
10355 ``getelementptr`` is a :ref:`poison value <poisonvalues>` if one of the
10356 following rules is violated:
10358 * The base pointer has an *in bounds* address of an allocated object, which
10359 means that it points into an allocated object, or to its end. The only
10360 *in bounds* address for a null pointer in the default address-space is the
10361 null pointer itself.
10362 * If the type of an index is larger than the pointer index type, the
10363 truncation to the pointer index type preserves the signed value.
10364 * The multiplication of an index by the type size does not wrap the pointer
10365 index type in a signed sense (``nsw``).
10366 * The successive addition of offsets (without adding the base address) does
10367 not wrap the pointer index type in a signed sense (``nsw``).
10368 * The successive addition of the current address, interpreted as an unsigned
10369 number, and an offset, interpreted as a signed number, does not wrap the
10370 unsigned address space and remains *in bounds* of the allocated object.
10371 As a corollary, if the added offset is non-negative, the addition does not
10372 wrap in an unsigned sense (``nuw``).
10373 * In cases where the base is a vector of pointers, the ``inbounds`` keyword
10374 applies to each of the computations element-wise.
10376 These rules are based on the assumption that no allocated object may cross
10377 the unsigned address space boundary, and no allocated object may be larger
10378 than half the pointer index type space.
If the ``inbounds`` keyword is not present, the offsets are added to the
base address with silently-wrapping two's complement arithmetic. If the
offsets have a different width from the pointer, they are sign-extended
or truncated to the width of the pointer. The result value of the
``getelementptr`` may be outside the object pointed to by the base
pointer. The result value may not necessarily be used to access memory
though, even if it happens to point into allocated storage. See the
:ref:`Pointer Aliasing Rules <pointeraliasing>` section for more
information.
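The difference between the two forms can be sketched on a small global array.
The names ``@arr`` and ``@inbounds_examples`` are invented for this example:

.. code-block:: llvm

    @arr = global [4 x i32] zeroinitializer

    ; Hypothetical function contrasting inbounds and plain getelementptr.
    define void @inbounds_examples() {
    entry:
      ; Points at the last element: in bounds.
      %last = getelementptr inbounds [4 x i32], [4 x i32]* @arr, i64 0, i64 3
      ; Points one past the end: still in bounds, but must not be
      ; loaded from or stored to.
      %end = getelementptr inbounds [4 x i32], [4 x i32]* @arr, i64 0, i64 4
      ; Points beyond the end of the object: the result is a poison value.
      %past = getelementptr inbounds [4 x i32], [4 x i32]* @arr, i64 0, i64 5
      ; Same address computation without inbounds: not poison, but the
      ; result still may not be used to access memory outside @arr.
      %raw = getelementptr [4 x i32], [4 x i32]* @arr, i64 0, i64 5
      ret void
    }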
10390 If the ``inrange`` keyword is present before any index, loading from or
10391 storing to any pointer derived from the ``getelementptr`` has undefined
10392 behavior if the load or store would access memory outside of the bounds of
10393 the element selected by the index marked as ``inrange``. The result of a
10394 pointer comparison or ``ptrtoint`` (including ``ptrtoint``-like operations
10395 involving memory) involving a pointer derived from a ``getelementptr`` with
10396 the ``inrange`` keyword is undefined, with the exception of comparisons
10397 in the case where both operands are in the range of the element selected
10398 by the ``inrange`` keyword, inclusive of the address one past the end of
10399 that element. Note that the ``inrange`` keyword is currently only allowed
10400 in constant ``getelementptr`` expressions.
10402 The getelementptr instruction is often confusing. For some more insight
10403 into how it works, see :doc:`the getelementptr FAQ <GetElementPtr>`.
10408 .. code-block:: llvm
    ; yields [12 x i8]*:aptr
    %aptr = getelementptr {i32, [12 x i8]}, {i32, [12 x i8]}* %saptr, i64 0, i32 1
    ; yields i8*:vptr
    %vptr = getelementptr {i32, <2 x i8>}, {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1
    ; yields i8*:eptr
    %eptr = getelementptr [12 x i8], [12 x i8]* %aptr, i64 0, i32 1
    ; yields i32*:iptr
    %iptr = getelementptr [10 x i32], [10 x i32]* @arr, i16 0, i16 0
10419 Vector of pointers:
10420 """""""""""""""""""
10422 The ``getelementptr`` returns a vector of pointers, instead of a single address,
10423 when one or more of its arguments is a vector. In such cases, all vector
10424 arguments should have the same number of elements, and every scalar argument
10425 will be effectively broadcast into a vector during address calculation.
10427 .. code-block:: llvm
10429 ; All arguments are vectors:
10430 ; A[i] = ptrs[i] + offsets[i]*sizeof(i8)
10431 %A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets
10433 ; Add the same scalar offset to each pointer of a vector:
10434 ; A[i] = ptrs[i] + offset*sizeof(i8)
10435 %A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset
10437 ; Add distinct offsets to the same pointer:
10438 ; A[i] = ptr + offsets[i]*sizeof(i8)
10439 %A = getelementptr i8, i8* %ptr, <4 x i64> %offsets
10441 ; In all cases described above the type of the result is <4 x i8*>
10443 The two following instructions are equivalent:
10445 .. code-block:: llvm
   getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
     <4 x i32> <i32 2, i32 2, i32 2, i32 2>,
     <4 x i32> <i32 1, i32 1, i32 1, i32 1>,
     <4 x i32> %ind4,
     <4 x i64> <i64 13, i64 13, i64 13, i64 13>
10453 getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
10454 i32 2, i32 1, <4 x i32> %ind4, i64 13
Let's look at the C code, where the vector version of ``getelementptr``
makes sense:

.. code-block:: c

   // Let's assume that we vectorize the following loop:
   double *A, *B; int *C;
   for (int i = 0; i < size; ++i) {
     A[i] = B[C[i]];
   }

10467 .. code-block:: llvm
10469 ; get pointers for 8 elements from array B
10470 %ptrs = getelementptr double, double* %B, <8 x i32> %C
10471 ; load 8 elements from array B into A
10472 %A = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> %ptrs,
10473 i32 8, <8 x i1> %mask, <8 x double> %passthru)
10475 Conversion Operations
10476 ---------------------
10478 The instructions in this category are the conversion instructions
10479 (casting) which all take a single operand and a type. They perform
10480 various bit conversions on the operand.
10484 '``trunc .. to``' Instruction
10485 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10492 <result> = trunc <ty> <value> to <ty2> ; yields ty2
10497 The '``trunc``' instruction truncates its operand to the type ``ty2``.
10502 The '``trunc``' instruction takes a value to trunc, and a type to trunc
10503 it to. Both types must be of :ref:`integer <t_integer>` types, or vectors
10504 of the same number of integers. The bit size of the ``value`` must be
10505 larger than the bit size of the destination type, ``ty2``. Equal sized
10506 types are not allowed.
10511 The '``trunc``' instruction truncates the high order bits in ``value``
10512 and converts the remaining bits to ``ty2``. Since the source size must
10513 be larger than the destination size, ``trunc`` cannot be a *no-op cast*.
10514 It will always truncate bits.
10519 .. code-block:: llvm
10521 %X = trunc i32 257 to i8 ; yields i8:1
10522 %Y = trunc i32 123 to i1 ; yields i1:true
10523 %Z = trunc i32 122 to i1 ; yields i1:false
10524 %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>
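Because ``trunc`` keeps only the low-order bits, extracting a higher field
usually pairs it with a shift. The following is a minimal sketch; the function
name ``@high_byte`` is hypothetical and chosen only for illustration.

.. code-block:: llvm

   ; Returns bits 15..8 of %v (illustrative helper, not part of LLVM).
   define i8 @high_byte(i16 %v) {
   entry:
     %shifted = lshr i16 %v, 8            ; move the high byte into bits 7..0
     %hi = trunc i16 %shifted to i8       ; drop the (now zero) upper 8 bits
     ret i8 %hi
   }

Applying ``trunc`` directly to ``%v`` would instead return the low byte.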
10528 '``zext .. to``' Instruction
10529 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10536 <result> = zext <ty> <value> to <ty2> ; yields ty2
10541 The '``zext``' instruction zero extends its operand to type ``ty2``.
10546 The '``zext``' instruction takes a value to cast, and a type to cast it
10547 to. Both types must be of :ref:`integer <t_integer>` types, or vectors of
10548 the same number of integers. The bit size of the ``value`` must be
10549 smaller than the bit size of the destination type, ``ty2``.
10554 The ``zext`` fills the high order bits of the ``value`` with zero bits
10555 until it reaches the size of the destination type, ``ty2``.
10557 When zero extending from i1, the result will always be either 0 or 1.
10562 .. code-block:: llvm
10564 %X = zext i32 257 to i64 ; yields i64:257
10565 %Y = zext i1 true to i32 ; yields i32:1
10566 %Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
10570 '``sext .. to``' Instruction
10571 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10578 <result> = sext <ty> <value> to <ty2> ; yields ty2
10583 The '``sext``' sign extends ``value`` to the type ``ty2``.
10588 The '``sext``' instruction takes a value to cast, and a type to cast it
10589 to. Both types must be of :ref:`integer <t_integer>` types, or vectors of
10590 the same number of integers. The bit size of the ``value`` must be
10591 smaller than the bit size of the destination type, ``ty2``.
10596 The '``sext``' instruction performs a sign extension by copying the sign
10597 bit (highest order bit) of the ``value`` until it reaches the bit size
10598 of the type ``ty2``.
10600 When sign extending from i1, the extension always results in -1 or 0.
10605 .. code-block:: llvm
   %X = sext i8 -1 to i16 ; yields i16:-1 (bit pattern 0xFFFF, 65535 if read as unsigned)
10608 %Y = sext i1 true to i32 ; yields i32:-1
10609 %Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
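The difference between ``zext`` and ``sext`` only shows up when the top bit of
the source value is set. A small sketch, using a hypothetical function
``@widen_diff``, makes this visible by widening the same ``i8`` both ways:

.. code-block:: llvm

   ; Illustrative only: compares the two extensions of the same byte.
   define i32 @widen_diff(i8 %b) {
   entry:
     %z = zext i8 %b to i32      ; upper 24 bits are zero
     %s = sext i8 %b to i32      ; upper 24 bits copy bit 7 of %b
     %d = sub i32 %z, %s         ; 0 if bit 7 is clear, 256 if it is set
     ret i32 %d
   }

For ``%b`` equal to ``-1`` (bit pattern ``0xFF``), ``%z`` is 255 and ``%s`` is
-1, so the function returns 256.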
10611 '``fptrunc .. to``' Instruction
10612 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10619 <result> = fptrunc <ty> <value> to <ty2> ; yields ty2
10624 The '``fptrunc``' instruction truncates ``value`` to type ``ty2``.
10629 The '``fptrunc``' instruction takes a :ref:`floating-point <t_floating>`
10630 value to cast and a :ref:`floating-point <t_floating>` type to cast it to.
10631 The size of ``value`` must be larger than the size of ``ty2``. This
10632 implies that ``fptrunc`` cannot be used to make a *no-op cast*.
10637 The '``fptrunc``' instruction casts a ``value`` from a larger
10638 :ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
10639 <t_floating>` type.
10640 This instruction is assumed to execute in the default :ref:`floating-point
10641 environment <floatenv>`.
10646 .. code-block:: llvm
10648 %X = fptrunc double 16777217.0 to float ; yields float:16777216.0
10649 %Y = fptrunc double 1.0E+300 to half ; yields half:+infinity
10651 '``fpext .. to``' Instruction
10652 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10659 <result> = fpext <ty> <value> to <ty2> ; yields ty2
The '``fpext``' extends a floating-point ``value`` to a larger floating-point
value.
10670 The '``fpext``' instruction takes a :ref:`floating-point <t_floating>`
10671 ``value`` to cast, and a :ref:`floating-point <t_floating>` type to cast it
10672 to. The source type must be smaller than the destination type.
10677 The '``fpext``' instruction extends the ``value`` from a smaller
10678 :ref:`floating-point <t_floating>` type to a larger :ref:`floating-point
10679 <t_floating>` type. The ``fpext`` cannot be used to make a
10680 *no-op cast* because it always changes bits. Use ``bitcast`` to make a
10681 *no-op cast* for a floating-point cast.
10686 .. code-block:: llvm
10688 %X = fpext float 3.125 to double ; yields double:3.125000e+00
10689 %Y = fpext double %X to fp128 ; yields fp128:0xL00000000000000004000900000000000
10691 '``fptoui .. to``' Instruction
10692 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10699 <result> = fptoui <ty> <value> to <ty2> ; yields ty2
10704 The '``fptoui``' converts a floating-point ``value`` to its unsigned
10705 integer equivalent of type ``ty2``.
10710 The '``fptoui``' instruction takes a value to cast, which must be a
10711 scalar or vector :ref:`floating-point <t_floating>` value, and a type to
10712 cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If
10713 ``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``.
10719 The '``fptoui``' instruction converts its :ref:`floating-point
10720 <t_floating>` operand into the nearest (rounding towards zero)
10721 unsigned integer value. If the value cannot fit in ``ty2``, the result
10722 is a :ref:`poison value <poisonvalues>`.
10727 .. code-block:: llvm
10729 %X = fptoui double 123.0 to i32 ; yields i32:123
   %Y = fptoui double 1.0E+300 to i1   ; yields poison (value does not fit in i1)
   %Z = fptoui double 1.04E+17 to i8   ; yields poison (value does not fit in i8)
10733 '``fptosi .. to``' Instruction
10734 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10741 <result> = fptosi <ty> <value> to <ty2> ; yields ty2
10746 The '``fptosi``' instruction converts :ref:`floating-point <t_floating>`
10747 ``value`` to type ``ty2``.
10752 The '``fptosi``' instruction takes a value to cast, which must be a
10753 scalar or vector :ref:`floating-point <t_floating>` value, and a type to
10754 cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If
10755 ``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``.
10761 The '``fptosi``' instruction converts its :ref:`floating-point
10762 <t_floating>` operand into the nearest (rounding towards zero)
10763 signed integer value. If the value cannot fit in ``ty2``, the result
10764 is a :ref:`poison value <poisonvalues>`.
10769 .. code-block:: llvm
10771 %X = fptosi double -123.0 to i32 ; yields i32:-123
   %Y = fptosi double 1.0E-247 to i1   ; yields i1:false (rounds towards zero to 0)
   %Z = fptosi double 1.04E+17 to i8   ; yields poison (value does not fit in i8)
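Since an out-of-range conversion produces a poison value, code that needs a
defined result for every input typically tests the range before using the
converted value. The sketch below is one possible way to do this; the function
name ``@checked_fptosi`` and the fallback value 0 are assumptions made for
illustration.

.. code-block:: llvm

   ; Illustrative only: returns 0 for NaN and for values outside the i32 range.
   define i32 @checked_fptosi(float %f) {
   entry:
     ; Both bounds are powers of two and therefore exact as float constants.
     %ge.min = fcmp oge float %f, -2147483648.0
     %lt.max = fcmp olt float %f, 2147483648.0
     %in.range = and i1 %ge.min, %lt.max    ; false if %f is NaN
     %conv = fptosi float %f to i32         ; may be poison when out of range
     %res = select i1 %in.range, i32 %conv, i32 0
     ret i32 %res
   }

The ``select`` only returns ``%conv`` when the range test succeeded, so the
possibly poison conversion result is never the selected value for
out-of-range inputs.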
10775 '``uitofp .. to``' Instruction
10776 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10783 <result> = uitofp <ty> <value> to <ty2> ; yields ty2
10788 The '``uitofp``' instruction regards ``value`` as an unsigned integer
10789 and converts that value to the ``ty2`` type.
10794 The '``uitofp``' instruction takes a value to cast, which must be a
10795 scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to
``ty2``, which must be a :ref:`floating-point <t_floating>` type. If
``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``.
10803 The '``uitofp``' instruction interprets its operand as an unsigned
10804 integer quantity and converts it to the corresponding floating-point
10805 value. If the value cannot be exactly represented, it is rounded using
10806 the default rounding mode.
10812 .. code-block:: llvm
10814 %X = uitofp i32 257 to float ; yields float:257.0
10815 %Y = uitofp i8 -1 to double ; yields double:255.0
10817 '``sitofp .. to``' Instruction
10818 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10825 <result> = sitofp <ty> <value> to <ty2> ; yields ty2
10830 The '``sitofp``' instruction regards ``value`` as a signed integer and
10831 converts that value to the ``ty2`` type.
10836 The '``sitofp``' instruction takes a value to cast, which must be a
10837 scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to
``ty2``, which must be a :ref:`floating-point <t_floating>` type. If
``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``.
10845 The '``sitofp``' instruction interprets its operand as a signed integer
10846 quantity and converts it to the corresponding floating-point value. If the
value cannot be exactly represented, it is rounded using the default rounding
mode.
10853 .. code-block:: llvm
10855 %X = sitofp i32 257 to float ; yields float:257.0
10856 %Y = sitofp i8 -1 to double ; yields double:-1.0
10860 '``ptrtoint .. to``' Instruction
10861 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10868 <result> = ptrtoint <ty> <value> to <ty2> ; yields ty2
10873 The '``ptrtoint``' instruction converts the pointer or a vector of
10874 pointers ``value`` to the integer (or vector of integers) type ``ty2``.
10879 The '``ptrtoint``' instruction takes a ``value`` to cast, which must be
10880 a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
10881 type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or
10882 a vector of integers type.
10887 The '``ptrtoint``' instruction converts ``value`` to integer type
10888 ``ty2`` by interpreting the pointer value as an integer and either
10889 truncating or zero extending that value to the size of the integer type.
10890 If ``value`` is smaller than ``ty2`` then a zero extension is done. If
10891 ``value`` is larger than ``ty2`` then a truncation is done. If they are
the same size, then nothing is done (*no-op cast*) other than a type
change.
10898 .. code-block:: llvm
10900 %X = ptrtoint i32* %P to i8 ; yields truncation on 32-bit architecture
10901 %Y = ptrtoint i32* %P to i64 ; yields zero extension on 32-bit architecture
10902 %Z = ptrtoint <4 x i32*> %P to <4 x i64>; yields vector zero extension for a vector of addresses on 32-bit architecture
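A common use of ``ptrtoint`` is to compute the distance between two addresses
with ordinary integer arithmetic. The following sketch assumes a target whose
pointers are at most 64 bits wide; the function name ``@byte_distance`` is
hypothetical.

.. code-block:: llvm

   ; Illustrative only: number of bytes from %begin to %end.
   define i64 @byte_distance(i32* %begin, i32* %end) {
   entry:
     %b = ptrtoint i32* %begin to i64   ; no-op or zero extension, per the rules above
     %e = ptrtoint i32* %end to i64
     %bytes = sub i64 %e, %b
     ret i64 %bytes
   }

On a target with 32-bit pointers both conversions zero extend, so the
subtraction is still carried out on the full 64-bit integer values.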
10906 '``inttoptr .. to``' Instruction
10907 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10914 <result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>] ; yields ty2
10919 The '``inttoptr``' instruction converts an integer ``value`` to a
10920 pointer type, ``ty2``.
10925 The '``inttoptr``' instruction takes an :ref:`integer <t_integer>` value to
cast, and a type to cast it to, which must be a :ref:`pointer <t_pointer>`
type.
The optional ``!dereferenceable`` metadata must reference a single metadata
name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
entry.
See ``dereferenceable`` metadata.
The optional ``!dereferenceable_or_null`` metadata must reference a single
metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
``i64`` entry.
See ``dereferenceable_or_null`` metadata.
10942 The '``inttoptr``' instruction converts ``value`` to type ``ty2`` by
10943 applying either a zero extension or a truncation depending on the size
10944 of the integer ``value``. If ``value`` is larger than the size of a
10945 pointer then a truncation is done. If ``value`` is smaller than the size
10946 of a pointer then a zero extension is done. If they are the same size,
10947 nothing is done (*no-op cast*).
10952 .. code-block:: llvm
10954 %X = inttoptr i32 255 to i32* ; yields zero extension on 64-bit architecture
10955 %Y = inttoptr i32 255 to i32* ; yields no-op on 32-bit architecture
10956 %Z = inttoptr i64 0 to i32* ; yields truncation on 32-bit architecture
   %W = inttoptr <4 x i32> %G to <4 x i8*> ; yields truncation of vector G to four pointers
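``inttoptr`` is often paired with ``ptrtoint`` when an address has to be
adjusted with integer operations. The sketch below rounds an address down to a
16-byte boundary; it assumes 64-bit pointers, and the function name
``@align_down_16`` is made up for this example.

.. code-block:: llvm

   ; Illustrative only: clears the low four bits of the address in %p.
   define i8* @align_down_16(i8* %p) {
   entry:
     %addr = ptrtoint i8* %p to i64
     %masked = and i64 %addr, -16         ; -16 is ...11110000 in binary
     %q = inttoptr i64 %masked to i8*     ; same size on this target: no-op cast
     ret i8* %q
   }

Whether the resulting pointer may be dereferenced depends on the pointer
aliasing rules, which this sketch does not address.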
10961 '``bitcast .. to``' Instruction
10962 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10969 <result> = bitcast <ty> <value> to <ty2> ; yields ty2
The '``bitcast``' instruction converts ``value`` to type ``ty2`` without
changing any bits.
10980 The '``bitcast``' instruction takes a value to cast, which must be a
10981 non-aggregate first class value, and a type to cast it to, which must
10982 also be a non-aggregate :ref:`first class <t_firstclass>` type. The
10983 bit sizes of ``value`` and the destination type, ``ty2``, must be
10984 identical. If the source type is a pointer, the destination type must
10985 also be a pointer of the same size. This instruction supports bitwise
10986 conversion of vectors to integers and to vectors of other types (as
10987 long as they have the same size).
10992 The '``bitcast``' instruction converts ``value`` to type ``ty2``. It
10993 is always a *no-op cast* because no bits change with this
10994 conversion. The conversion is done as if the ``value`` had been stored
10995 to memory and read back as type ``ty2``. Pointer (or vector of
10996 pointers) types may only be converted to other pointer (or vector of
10997 pointers) types with the same address space through this instruction.
10998 To convert pointers to other types, use the :ref:`inttoptr <i_inttoptr>`
10999 or :ref:`ptrtoint <i_ptrtoint>` instructions first.
11001 There is a caveat for bitcasts involving vector types in relation to
endianness. For example ``bitcast <2 x i8> <value> to i16`` puts element zero
11003 of the vector in the least significant bits of the i16 for little-endian while
11004 element zero ends up in the most significant bits for big-endian.
11009 .. code-block:: text
   %X = bitcast i8 255 to i8                ; yields i8:-1
   %Y = bitcast i32* %x to i16*             ; yields i16*:%x
   %Z = bitcast <2 x i32> %V to i64         ; yields i64:%V (depends on endianness)
   %W = bitcast <2 x i32*> %V to <2 x i64*> ; yields <2 x i64*>
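A frequent use of ``bitcast`` is to inspect the bit pattern of a
floating-point value with integer instructions. The following sketch, with the
hypothetical name ``@sign_bit_set``, reinterprets a ``float`` as an ``i32`` of
the same size and tests its top bit:

.. code-block:: llvm

   ; Illustrative only: true if the sign bit of %f is set.
   define i1 @sign_bit_set(float %f) {
   entry:
     %bits = bitcast float %f to i32    ; same 32 bits, now typed as an integer
     %neg = icmp slt i32 %bits, 0       ; true exactly when bit 31 is set
     ret i1 %neg
   }

Unlike ``fcmp olt float %f, 0.0``, this test also returns ``true`` for
negative zero and for NaNs whose sign bit is set.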
11016 .. _i_addrspacecast:
11018 '``addrspacecast .. to``' Instruction
11019 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11026 <result> = addrspacecast <pty> <ptrval> to <pty2> ; yields pty2
11031 The '``addrspacecast``' instruction converts ``ptrval`` from ``pty`` in
11032 address space ``n`` to type ``pty2`` in address space ``m``.
11037 The '``addrspacecast``' instruction takes a pointer or vector of pointer value
to cast and a pointer type to cast it to, which must have a different
address space.
11044 The '``addrspacecast``' instruction converts the pointer value
11045 ``ptrval`` to type ``pty2``. It can be a *no-op cast* or a complex
11046 value modification, depending on the target and the address space
11047 pair. Pointer conversions within the same address space must be
11048 performed with the ``bitcast`` instruction. Note that if the address space
conversion is legal then both result and operand refer to the same memory
location.
11055 .. code-block:: llvm
11057 %X = addrspacecast i32* %x to i32 addrspace(1)* ; yields i32 addrspace(1)*:%x
11058 %Y = addrspacecast i32 addrspace(1)* %y to i64 addrspace(2)* ; yields i64 addrspace(2)*:%y
11059 %Z = addrspacecast <4 x i32*> %z to <4 x float addrspace(3)*> ; yields <4 x float addrspace(3)*>:%z
11066 The instructions in this category are the "miscellaneous" instructions,
11067 which defy better classification.
11071 '``icmp``' Instruction
11072 ^^^^^^^^^^^^^^^^^^^^^^
11079 <result> = icmp <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
11084 The '``icmp``' instruction returns a boolean value or a vector of
11085 boolean values based on comparison of its two integer, integer vector,
11086 pointer, or pointer vector operands.
11091 The '``icmp``' instruction takes three operands. The first operand is
11092 the condition code indicating the kind of comparison to perform. It is
11093 not a value, just a keyword. The possible condition codes are:
#. ``eq``: equal
#. ``ne``: not equal
11097 #. ``ugt``: unsigned greater than
11098 #. ``uge``: unsigned greater or equal
11099 #. ``ult``: unsigned less than
11100 #. ``ule``: unsigned less or equal
11101 #. ``sgt``: signed greater than
11102 #. ``sge``: signed greater or equal
11103 #. ``slt``: signed less than
11104 #. ``sle``: signed less or equal
11106 The remaining two arguments must be :ref:`integer <t_integer>` or
11107 :ref:`pointer <t_pointer>` or integer :ref:`vector <t_vector>` typed. They
11108 must also be identical types.
11113 The '``icmp``' compares ``op1`` and ``op2`` according to the condition
11114 code given as ``cond``. The comparison performed always yields either an
11115 :ref:`i1 <t_integer>` or vector of ``i1`` result, as follows:
11117 #. ``eq``: yields ``true`` if the operands are equal, ``false``
11118 otherwise. No sign interpretation is necessary or performed.
11119 #. ``ne``: yields ``true`` if the operands are unequal, ``false``
11120 otherwise. No sign interpretation is necessary or performed.
11121 #. ``ugt``: interprets the operands as unsigned values and yields
11122 ``true`` if ``op1`` is greater than ``op2``.
11123 #. ``uge``: interprets the operands as unsigned values and yields
11124 ``true`` if ``op1`` is greater than or equal to ``op2``.
11125 #. ``ult``: interprets the operands as unsigned values and yields
11126 ``true`` if ``op1`` is less than ``op2``.
11127 #. ``ule``: interprets the operands as unsigned values and yields
11128 ``true`` if ``op1`` is less than or equal to ``op2``.
11129 #. ``sgt``: interprets the operands as signed values and yields ``true``
11130 if ``op1`` is greater than ``op2``.
11131 #. ``sge``: interprets the operands as signed values and yields ``true``
11132 if ``op1`` is greater than or equal to ``op2``.
11133 #. ``slt``: interprets the operands as signed values and yields ``true``
11134 if ``op1`` is less than ``op2``.
11135 #. ``sle``: interprets the operands as signed values and yields ``true``
11136 if ``op1`` is less than or equal to ``op2``.
11138 If the operands are :ref:`pointer <t_pointer>` typed, the pointer values
11139 are compared as if they were integers.
11141 If the operands are integer vectors, then they are compared element by
11142 element. The result is an ``i1`` vector with the same number of elements
11143 as the values being compared. Otherwise, the result is an ``i1``.
11148 .. code-block:: text
11150 <result> = icmp eq i32 4, 5 ; yields: result=false
11151 <result> = icmp ne float* %X, %X ; yields: result=false
11152 <result> = icmp ult i16 4, 5 ; yields: result=true
11153 <result> = icmp sgt i16 4, 5 ; yields: result=false
11154 <result> = icmp ule i16 -4, 5 ; yields: result=false
11155 <result> = icmp sge i16 4, 5 ; yields: result=false
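The choice between the signed and unsigned condition codes matters whenever an
operand may have its top bit set. As a sketch, the hypothetical function
``@in_bounds`` below uses a single unsigned comparison to check
``0 <= %i < %n``, under the assumption that ``%n`` is non-negative when read
as a signed value:

.. code-block:: llvm

   ; Illustrative only: bounds check with one comparison.
   define i1 @in_bounds(i32 %i, i32 %n) {
   entry:
     ; A negative %i is a very large number when interpreted as unsigned,
     ; so it compares greater than any non-negative %n and the test fails.
     %ok = icmp ult i32 %i, %n
     ret i1 %ok
   }

With ``slt`` in place of ``ult``, a negative ``%i`` would pass the test.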
11159 '``fcmp``' Instruction
11160 ^^^^^^^^^^^^^^^^^^^^^^
11167 <result> = fcmp [fast-math flags]* <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
11172 The '``fcmp``' instruction returns a boolean value or vector of boolean
11173 values based on comparison of its operands.
11175 If the operands are floating-point scalars, then the result type is a
11176 boolean (:ref:`i1 <t_integer>`).
11178 If the operands are floating-point vectors, then the result type is a
11179 vector of boolean with the same number of elements as the operands being
11185 The '``fcmp``' instruction takes three operands. The first operand is
11186 the condition code indicating the kind of comparison to perform. It is
11187 not a value, just a keyword. The possible condition codes are:
11189 #. ``false``: no comparison, always returns false
11190 #. ``oeq``: ordered and equal
11191 #. ``ogt``: ordered and greater than
11192 #. ``oge``: ordered and greater than or equal
11193 #. ``olt``: ordered and less than
11194 #. ``ole``: ordered and less than or equal
11195 #. ``one``: ordered and not equal
11196 #. ``ord``: ordered (no nans)
11197 #. ``ueq``: unordered or equal
11198 #. ``ugt``: unordered or greater than
11199 #. ``uge``: unordered or greater than or equal
11200 #. ``ult``: unordered or less than
11201 #. ``ule``: unordered or less than or equal
11202 #. ``une``: unordered or not equal
11203 #. ``uno``: unordered (either nans)
11204 #. ``true``: no comparison, always returns true
11206 *Ordered* means that neither operand is a QNAN while *unordered* means
11207 that either operand may be a QNAN.
Each of the ``op1`` and ``op2`` arguments must be either a :ref:`floating-point
<t_floating>` type or a :ref:`vector <t_vector>` of floating-point type.
11211 They must have identical types.
11216 The '``fcmp``' instruction compares ``op1`` and ``op2`` according to the
11217 condition code given as ``cond``. If the operands are vectors, then the
11218 vectors are compared element by element. Each comparison performed
11219 always yields an :ref:`i1 <t_integer>` result, as follows:
11221 #. ``false``: always yields ``false``, regardless of operands.
11222 #. ``oeq``: yields ``true`` if both operands are not a QNAN and ``op1``
11223 is equal to ``op2``.
11224 #. ``ogt``: yields ``true`` if both operands are not a QNAN and ``op1``
11225 is greater than ``op2``.
11226 #. ``oge``: yields ``true`` if both operands are not a QNAN and ``op1``
11227 is greater than or equal to ``op2``.
11228 #. ``olt``: yields ``true`` if both operands are not a QNAN and ``op1``
11229 is less than ``op2``.
11230 #. ``ole``: yields ``true`` if both operands are not a QNAN and ``op1``
11231 is less than or equal to ``op2``.
11232 #. ``one``: yields ``true`` if both operands are not a QNAN and ``op1``
11233 is not equal to ``op2``.
11234 #. ``ord``: yields ``true`` if both operands are not a QNAN.
#. ``ueq``: yields ``true`` if either operand is a QNAN or ``op1`` is
   equal to ``op2``.
11237 #. ``ugt``: yields ``true`` if either operand is a QNAN or ``op1`` is
11238 greater than ``op2``.
11239 #. ``uge``: yields ``true`` if either operand is a QNAN or ``op1`` is
11240 greater than or equal to ``op2``.
#. ``ult``: yields ``true`` if either operand is a QNAN or ``op1`` is
   less than ``op2``.
11243 #. ``ule``: yields ``true`` if either operand is a QNAN or ``op1`` is
11244 less than or equal to ``op2``.
11245 #. ``une``: yields ``true`` if either operand is a QNAN or ``op1`` is
11246 not equal to ``op2``.
11247 #. ``uno``: yields ``true`` if either operand is a QNAN.
11248 #. ``true``: always yields ``true``, regardless of operands.
11250 The ``fcmp`` instruction can also optionally take any number of
11251 :ref:`fast-math flags <fastmath>`, which are optimization hints to enable
11252 otherwise unsafe floating-point optimizations.
11254 Any set of fast-math flags are legal on an ``fcmp`` instruction, but the
11255 only flags that have any effect on its semantics are those that allow
11256 assumptions to be made about the values of input arguments; namely
11257 ``nnan``, ``ninf``, and ``reassoc``. See :ref:`fastmath` for more information.
11262 .. code-block:: text
11264 <result> = fcmp oeq float 4.0, 5.0 ; yields: result=false
11265 <result> = fcmp one float 4.0, 5.0 ; yields: result=true
11266 <result> = fcmp olt float 4.0, 5.0 ; yields: result=true
11267 <result> = fcmp ueq double 1.0, 2.0 ; yields: result=false
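The ordered and unordered condition codes differ only when an operand is a
NaN. The two hypothetical functions below sketch how this is commonly used:
comparing a value with itself using ``uno`` detects a NaN, and ``une`` treats
a NaN operand as "not equal".

.. code-block:: llvm

   ; Illustrative only: true if %x is a NaN.
   define i1 @is_nan(double %x) {
   entry:
     %nan = fcmp uno double %x, %x
     ret i1 %nan
   }

   ; Illustrative only: true if the operands differ or either one is a NaN.
   define i1 @differ_or_nan(double %a, double %b) {
   entry:
     %r = fcmp une double %a, %b
     ret i1 %r
   }

Replacing ``une`` with ``one`` in the second function would yield ``false``
whenever either operand is a NaN.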
11271 '``phi``' Instruction
11272 ^^^^^^^^^^^^^^^^^^^^^
11279 <result> = phi [fast-math-flags] <ty> [ <val0>, <label0>], ...
11284 The '``phi``' instruction is used to implement the φ node in the SSA
11285 graph representing the function.
11290 The type of the incoming values is specified with the first type field.
11291 After this, the '``phi``' instruction takes a list of pairs as
11292 arguments, with one pair for each predecessor basic block of the current
11293 block. Only values of :ref:`first class <t_firstclass>` type may be used as
the value arguments to the PHI node. Only labels may be used as the
label arguments.
11297 There must be no non-phi instructions between the start of a basic block
and the PHI instructions: i.e. PHI instructions must be first in a basic
block.
11301 For the purposes of the SSA form, the use of each incoming value is
11302 deemed to occur on the edge from the corresponding predecessor block to
11303 the current block (but after any definition of an '``invoke``'
11304 instruction's return value on the same edge).
11306 The optional ``fast-math-flags`` marker indicates that the phi has one
11307 or more :ref:`fast-math-flags <fastmath>`. These are optimization hints
11308 to enable otherwise unsafe floating-point optimizations. Fast-math-flags
11309 are only valid for phis that return a floating-point scalar or vector
type, or an array (nested to any depth) of floating-point scalar or vector
types.
11316 At runtime, the '``phi``' instruction logically takes on the value
11317 specified by the pair corresponding to the predecessor basic block that
11318 executed just prior to the current block.
11323 .. code-block:: llvm
11325 Loop: ; Infinite loop that counts from 0 on up...
11326 %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
11327 %nextindvar = add i32 %indvar, 1
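A complete function shows how the incoming pairs line up with the predecessor
blocks. The following is a minimal sketch of a counted loop; the function name
``@sum_below`` and the block names are chosen for illustration only.

.. code-block:: llvm

   ; Illustrative only: returns 0 + 1 + ... + (%n - 1), with wrapping arithmetic.
   define i32 @sum_below(i32 %n) {
   entry:
     br label %loop

   loop:
     ; One pair per predecessor: %entry on the first iteration, %body afterwards.
     %i = phi i32 [ 0, %entry ], [ %i.next, %body ]
     %acc = phi i32 [ 0, %entry ], [ %acc.next, %body ]
     %done = icmp uge i32 %i, %n
     br i1 %done, label %exit, label %body

   body:
     %acc.next = add i32 %acc, %i
     %i.next = add i32 %i, 1
     br label %loop

   exit:
     ret i32 %acc
   }

Both ``phi`` instructions appear at the start of ``%loop``, before any non-phi
instruction, as required above.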
11332 '``select``' Instruction
11333 ^^^^^^^^^^^^^^^^^^^^^^^^
11340 <result> = select [fast-math flags] selty <cond>, <ty> <val1>, <ty> <val2> ; yields ty
11342 selty is either i1 or {<N x i1>}
11347 The '``select``' instruction is used to choose one value based on a
11348 condition, without IR-level branching.
11353 The '``select``' instruction requires an 'i1' value or a vector of 'i1'
11354 values indicating the condition, and two values of the same :ref:`first
11355 class <t_firstclass>` type.
11357 #. The optional ``fast-math flags`` marker indicates that the select has one or more
11358 :ref:`fast-math flags <fastmath>`. These are optimization hints to enable
11359 otherwise unsafe floating-point optimizations. Fast-math flags are only valid
11360 for selects that return a floating-point scalar or vector type, or an array
11361 (nested to any depth) of floating-point scalar or vector types.
11366 If the condition is an i1 and it evaluates to 1, the instruction returns
the first value argument; otherwise, it returns the second value
argument.
11370 If the condition is a vector of i1, then the value arguments must be
11371 vectors of the same size, and the selection is done element by element.
11373 If the condition is an i1 and the value arguments are vectors of the
11374 same size, then an entire vector is selected.
11379 .. code-block:: llvm
11381 %X = select i1 true, i8 17, i8 42 ; yields i8:17
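``select`` is typically fed by a comparison. The sketch below computes a
signed maximum, first with a scalar ``i1`` condition and then element by
element with a vector condition; the function names ``@smax`` and ``@vsmax``
are hypothetical.

.. code-block:: llvm

   ; Illustrative only: scalar signed maximum.
   define i32 @smax(i32 %a, i32 %b) {
   entry:
     %a.gt = icmp sgt i32 %a, %b
     %max = select i1 %a.gt, i32 %a, i32 %b
     ret i32 %max
   }

   ; Illustrative only: the same operation on each of four lanes.
   define <4 x i32> @vsmax(<4 x i32> %a, <4 x i32> %b) {
   entry:
     %a.gt = icmp sgt <4 x i32> %a, %b
     %max = select <4 x i1> %a.gt, <4 x i32> %a, <4 x i32> %b
     ret <4 x i32> %max
   }

In the vector form each lane of the result is taken from ``%a`` or ``%b``
according to the corresponding lane of the condition.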
11386 '``freeze``' Instruction
11387 ^^^^^^^^^^^^^^^^^^^^^^^^
11394 <result> = freeze ty <val> ; yields ty:result
11399 The '``freeze``' instruction is used to stop propagation of
11400 :ref:`undef <undefvalues>` and :ref:`poison <poisonvalues>` values.
11405 The '``freeze``' instruction takes a single argument.
11410 If the argument is ``undef`` or ``poison``, '``freeze``' returns an
11411 arbitrary, but fixed, value of type '``ty``'.
11412 Otherwise, this instruction is a no-op and returns the input argument.
11413 All uses of a value returned by the same '``freeze``' instruction are
11414 guaranteed to always observe the same value, while different '``freeze``'
11415 instructions may yield different values.
11417 While ``undef`` and ``poison`` pointers can be frozen, the result is a
11418 non-dereferenceable pointer. See the
11419 :ref:`Pointer Aliasing Rules <pointeraliasing>` section for more information.
11420 If an aggregate value or vector is frozen, the operand is frozen element-wise.
11421 The padding of an aggregate isn't considered, since it isn't visible
11422 without storing it into memory and loading it with a different type.
11428 .. code-block:: text
   %w = i32 undef
   %x = freeze i32 %w
   %y = add i32 %w, %w         ; undef
   %z = add i32 %x, %x         ; even number because all uses of %x observe
                               ; the same value
11435 %x2 = freeze i32 %w
11436 %cmp = icmp eq i32 %x, %x2 ; can be true or false
11438 ; example with vectors
11439 %v = <2 x i32> <i32 undef, i32 poison>
11440 %a = extractelement <2 x i32> %v, i32 0 ; undef
11441 %b = extractelement <2 x i32> %v, i32 1 ; poison
11442 %add = add i32 %a, %a ; undef
11444 %v.fr = freeze <2 x i32> %v ; element-wise freeze
11445 %d = extractelement <2 x i32> %v.fr, i32 0 ; not undef
11446 %add.f = add i32 %d, %d ; even number
11448 ; branching on frozen value
11449 %poison = add nsw i1 %k, undef ; poison
11450 %c = freeze i1 %poison
11451 br i1 %c, label %foo, label %bar ; non-deterministic branch to %foo or %bar
11456 '``call``' Instruction
11457 ^^^^^^^^^^^^^^^^^^^^^^
11464 <result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)]
11465 <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]
11470 The '``call``' instruction represents a simple function call.
11475 This instruction requires several arguments:
11477 #. The optional ``tail`` and ``musttail`` markers indicate that the optimizers
11478 should perform tail call optimization. The ``tail`` marker is a hint that
11479 `can be ignored <CodeGenerator.html#sibcallopt>`_. The ``musttail`` marker
11480 means that the call must be tail call optimized in order for the program to
11481 be correct. The ``musttail`` marker provides these guarantees:
11483 #. The call will not cause unbounded stack growth if it is part of a
11484 recursive cycle in the call graph.
11485 #. Arguments with the :ref:`inalloca <attr_inalloca>` or
11486 :ref:`preallocated <attr_preallocated>` attribute are forwarded in place.
11487 #. If the musttail call appears in a function with the ``"thunk"`` attribute
      and the caller and callee both have varargs, then any unprototyped
11489 arguments in register or memory are forwarded to the callee. Similarly,
11490 the return value of the callee is returned to the caller's caller, even
11491 if a void return type is in use.
11493 Both markers imply that the callee does not access allocas from the caller.
11494 The ``tail`` marker additionally implies that the callee does not access
   varargs from the caller. Calls marked ``musttail`` must obey the following
   rules:
11498 - The call must immediately precede a :ref:`ret <i_ret>` instruction,
11499 or a pointer bitcast followed by a ret instruction.
11500 - The ret instruction must return the (possibly bitcasted) value
11501 produced by the call, undef, or void.
11502 - The calling conventions of the caller and callee must match.
11503 - The callee must be varargs iff the caller is varargs. Bitcasting a
11504 non-varargs function to the appropriate varargs type is legal so
11505 long as the non-varargs prefixes obey the other rules.
   - The return type must not undergo automatic conversion to an ``sret`` pointer.
   In addition, if the calling convention is not ``swifttailcc`` or ``tailcc``:
11510 - All ABI-impacting function attributes, such as sret, byval, inreg,
11511 returned, and inalloca, must match.
11512 - The caller and callee prototypes must match. Pointer types of parameters
11513 or return types may differ in pointee type, but not in address space.
11515 On the other hand, if the calling convention is ``swifttailcc`` or ``tailcc``:
11517 - Only these ABI-impacting attributes are allowed: sret, byval,
11518 swiftself, and swiftasync.
11519 - Prototypes are not required to match.
11521 Tail call optimization for calls marked ``tail`` is guaranteed to occur if
11522 the following conditions are met:
11524 - Caller and callee both have the calling convention ``fastcc`` or ``tailcc``.
11525 - The call is in tail position (ret immediately follows call and ret
11526 uses value of call or is void).
11527 - Option ``-tailcallopt`` is enabled,
11528 ``llvm::GuaranteedTailCallOpt`` is ``true``, or the calling convention is ``tailcc``.
11530 - `Platform-specific constraints are
11531 met. <CodeGenerator.html#tailcallopt>`_
11533 #. The optional ``notail`` marker indicates that the optimizers should not add
11534 ``tail`` or ``musttail`` markers to the call. It is used to prevent tail
11535 call optimization from being performed on the call.
11537 #. The optional ``fast-math flags`` marker indicates that the call has one or more
11538 :ref:`fast-math flags <fastmath>`, which are optimization hints to enable
11539 otherwise unsafe floating-point optimizations. Fast-math flags are only valid
11540 for calls that return a floating-point scalar or vector type, or an array
11541 (nested to any depth) of floating-point scalar or vector types.
11543 #. The optional "cconv" marker indicates which :ref:`calling
11544 convention <callingconv>` the call should use. If none is
11545 specified, the call defaults to using C calling conventions. The
11546 calling convention of the call must match the calling convention of
11547 the target function, or else the behavior is undefined.
11548 #. The optional :ref:`Parameter Attributes <paramattrs>` list for return
11549 values. Only '``zeroext``', '``signext``', and '``inreg``' attributes are valid here.
11551 #. The optional addrspace attribute can be used to indicate the address space
11552 of the called function. If it is not specified, the program address space
11553 from the :ref:`datalayout string<langref_datalayout>` will be used.
11554 #. '``ty``': the type of the call instruction itself which is also the
11555 type of the return value. Functions that return no value are marked ``void``.
11557 #. '``fnty``': shall be the signature of the function being called. The
11558 argument types must match the types implied by this signature. This
11559 type can be omitted if the function is not varargs.
11560 #. '``fnptrval``': An LLVM value containing a pointer to a function to
11561 be called. In most cases, this is a direct function call, but
11562 indirect ``call``'s are just as possible, calling an arbitrary pointer to function value.
11564 #. '``function args``': argument list whose types match the function
11565 signature argument types and parameter attributes. All arguments must
11566 be of :ref:`first class <t_firstclass>` type. If the function signature
11567 indicates the function accepts a variable number of arguments, the
11568 extra arguments can be specified.
11569 #. The optional :ref:`function attributes <fnattrs>` list.
11570 #. The optional :ref:`operand bundles <opbundles>` list.
11575 The '``call``' instruction is used to cause control flow to transfer to
11576 a specified function, with its incoming arguments bound to the specified
11577 values. Upon a '``ret``' instruction in the called function, control
11578 flow continues with the instruction after the function call, and the
11579 return value of the function is bound to the result argument.
11584 .. code-block:: llvm
11586 %retval = call i32 @test(i32 %argc)
11587 call i32 (i8*, ...) @printf(i8* %msg, i32 12, i8 42) ; yields i32
11588 %X = tail call i32 @foo() ; yields i32
11589 %Y = tail call fastcc i32 @foo() ; yields i32
11590 call void %foo(i8 signext 97)
11592 %struct.A = type { i32, i8 }
11593 %r = call %struct.A @foo() ; yields { i32, i8 }
11594 %gr = extractvalue %struct.A %r, 0 ; yields i32
11595 %gr1 = extractvalue %struct.A %r, 1 ; yields i8
11596 call void @foo() noreturn ; indicates that @foo never returns normally
11597 %ZZ = call zeroext i32 @bar() ; Return value is zero extended
11599 LLVM treats calls to some functions with names and arguments that match
11600 the standard C99 library as being the C99 library functions, and may
11601 perform optimizations or generate code for them under that assumption.
11602 This is something we'd like to change in the future to provide better
11603 support for freestanding environments and non-C-based languages.
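As an illustration of the ``musttail`` rules listed above, the following
minimal sketch shows a caller forwarding its arguments to a callee with an
identical prototype and calling convention. The function names ``@callee``
and ``@caller`` are hypothetical and chosen only for this example.

.. code-block:: llvm

    declare i32 @callee(i32, i8*)

    define i32 @caller(i32 %x, i8* %p) {
      ; Prototypes and calling conventions match, and the call is
      ; immediately followed by a ret of the call's result.
      %r = musttail call i32 @callee(i32 %x, i8* %p)
      ret i32 %r
    }

Placing any instruction other than an optional pointer bitcast between the
call and the ``ret`` would violate the first of the ``musttail`` rules.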
11607 '``va_arg``' Instruction
11608 ^^^^^^^^^^^^^^^^^^^^^^^^
11615 <resultval> = va_arg <va_list*> <arglist>, <argty>
11620 The '``va_arg``' instruction is used to access arguments passed through
11621 the "variable argument" area of a function call. It is used to implement
11622 the ``va_arg`` macro in C.
11627 This instruction takes a ``va_list*`` value and the type of the
11628 argument. It returns a value of the specified argument type and
11629 increments the ``va_list`` to point to the next argument. The actual
11630 type of ``va_list`` is target specific.
11635 The '``va_arg``' instruction loads an argument of the specified type
11636 from the specified ``va_list`` and causes the ``va_list`` to point to
11637 the next argument. For more information, see the variable argument
11638 handling :ref:`Intrinsic Functions <int_varargs>`.
11640 It is legal for this instruction to be called in a function which does
11641 not take a variable number of arguments, for example, the ``vfprintf`` function.
11644 ``va_arg`` is an LLVM instruction instead of an :ref:`intrinsic
11645 function <intrinsics>` because it takes a type as an argument.
11650 See the :ref:`variable argument processing <int_varargs>` section.
11652 Note that the code generator does not yet fully support va\_arg on many
11653 targets. Also, it does not currently support va\_arg with aggregate
11654 types on any target.
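The point that ``va_arg`` may appear in a function that is not itself
variadic can be sketched as follows. The helper ``@next_i32`` is hypothetical.
It receives a pointer to an already initialized ``va_list``, assumed here to
be passed as ``i8*`` (the real type is target specific), and reads one ``i32``
from it.

.. code-block:: llvm

    define i32 @next_i32(i8* %ap) {
      ; Load the next i32 and advance the va_list that %ap points to.
      %v = va_arg i8* %ap, i32
      ret i32 %v
    }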
11658 '``landingpad``' Instruction
11659 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11666 <resultval> = landingpad <resultty> <clause>+
11667 <resultval> = landingpad <resultty> cleanup <clause>*
11669 <clause> := catch <type> <value>
11670 <clause> := filter <array constant type> <array constant>
11675 The '``landingpad``' instruction is used by `LLVM's exception handling
11676 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11677 is a landing pad --- one where the exception lands, and corresponds to the
11678 code found in the ``catch`` portion of a ``try``/``catch`` sequence. It
11679 defines values supplied by the :ref:`personality function <personalityfn>` upon
11680 re-entry to the function. The ``resultval`` has the type ``resultty``.
11686 ``cleanup`` flag indicates that the landing pad block is a cleanup.
11688 A ``clause`` begins with the clause type --- ``catch`` or ``filter`` --- and
11689 contains the global variable representing the "type" that may be caught
11690 or filtered respectively. Unlike the ``catch`` clause, the ``filter``
11691 clause takes an array constant as its argument. Use
11692 "``[0 x i8**] undef``" for a filter which cannot throw. The
11693 '``landingpad``' instruction must contain *at least* one ``clause`` or
11694 the ``cleanup`` flag.
11699 The '``landingpad``' instruction defines the values which are set by the
11700 :ref:`personality function <personalityfn>` upon re-entry to the function, and
11701 therefore the "result type" of the ``landingpad`` instruction. As with
11702 calling conventions, how the personality function results are
11703 represented in LLVM IR is target specific.
11705 The clauses are applied in order from top to bottom. If two
11706 ``landingpad`` instructions are merged together through inlining, the
11707 clauses from the calling function are appended to the list of clauses.
11708 When the call stack is being unwound due to an exception being thrown,
11709 the exception is compared against each ``clause`` in turn. If it doesn't
11710 match any of the clauses, and the ``cleanup`` flag is not set, then
11711 unwinding continues further up the call stack.
11713 The ``landingpad`` instruction has several restrictions:
11715 - A landing pad block is a basic block which is the unwind destination
11716 of an '``invoke``' instruction.
11717 - A landing pad block must have a '``landingpad``' instruction as its
11718 first non-PHI instruction.
11719 - There can be only one '``landingpad``' instruction within the landing pad block.
11721 - A basic block that is not a landing pad block may not include a
11722 '``landingpad``' instruction.
11727 .. code-block:: llvm
11729 ;; A landing pad which can catch an integer.
11730 %res = landingpad { i8*, i32 }
11731 catch i8** @_ZTIi
11732 ;; A landing pad that is a cleanup.
11733 %res = landingpad { i8*, i32 }
11734 cleanup
11735 ;; A landing pad which can catch an integer and can only throw a double.
11736 %res = landingpad { i8*, i32 }
11737 catch i8** @_ZTIi
11738 filter [1 x i8**] [@_ZTId]
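To show where a landing pad sits relative to its '``invoke``', here is a
minimal sketch of a complete function. It assumes the Itanium C++
personality routine ``@__gxx_personality_v0``. The names ``@may_throw`` and
``@f`` are hypothetical.

.. code-block:: llvm

    declare void @may_throw()
    declare i32 @__gxx_personality_v0(...)

    define void @f() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      invoke void @may_throw()
              to label %cont unwind label %lpad

    cont:
      ret void

    lpad:
      ; First non-PHI instruction of the unwind destination.
      %lp = landingpad { i8*, i32 }
              cleanup
      ; No cleanup work in this sketch: continue unwinding.
      resume { i8*, i32 } %lp
    }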
11742 '``catchpad``' Instruction
11743 ^^^^^^^^^^^^^^^^^^^^^^^^^^
11750 <resultval> = catchpad within <catchswitch> [<args>*]
11755 The '``catchpad``' instruction is used by `LLVM's exception handling
11756 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11757 begins a catch handler --- one where a personality routine attempts to transfer
11758 control to catch an exception.
11763 The ``catchswitch`` operand must always be a token produced by a
11764 :ref:`catchswitch <i_catchswitch>` instruction in a predecessor block. This
11765 ensures that each ``catchpad`` has exactly one predecessor block, and it always
11766 terminates in a ``catchswitch``.
11768 The ``args`` correspond to whatever information the personality routine
11769 requires to know if this is an appropriate handler for the exception. Control
11770 will transfer to the ``catchpad`` if this is the first appropriate handler for the exception.
11773 The ``resultval`` has the type :ref:`token <t_token>` and is used to match the
11774 ``catchpad`` to corresponding :ref:`catchrets <i_catchret>` and other nested EH pads.
11780 When the call stack is being unwound due to an exception being thrown, the
11781 exception is compared against the ``args``. If it doesn't match, control will
11782 not reach the ``catchpad`` instruction. The representation of ``args`` is
11783 entirely target and personality function-specific.
11785 Like the :ref:`landingpad <i_landingpad>` instruction, the ``catchpad``
11786 instruction must be the first non-phi of its parent basic block.
11788 The meaning of the tokens produced and consumed by ``catchpad`` and other "pad"
11789 instructions is described in the
11790 `Windows exception handling documentation\ <ExceptionHandling.html#wineh>`_.
11792 When a ``catchpad`` has been "entered" but not yet "exited" (as
11793 described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
11794 it is undefined behavior to execute a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
11795 that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`.
11800 .. code-block:: text
11803 %cs = catchswitch within none [label %handler0] unwind to caller
11804 ;; A catch block which can catch an integer.
11806 %tok = catchpad within %cs [i8** @_ZTIi]
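The fragment above omits the surrounding blocks. A fuller sketch, assuming
the MSVC C++ personality ``@__CxxFrameHandler3`` and its conventional
catch-all argument list, could look like the following. The names
``@may_throw`` and ``@f`` are hypothetical, and the ``catchpad`` arguments
are personality specific.

.. code-block:: llvm

    declare void @may_throw()
    declare i32 @__CxxFrameHandler3(...)

    define void @f() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      invoke void @may_throw()
              to label %done unwind label %dispatch

    dispatch:
      %cs = catchswitch within none [label %handler0] unwind to caller

    handler0:
      ; The only predecessor of this block ends in the catchswitch above.
      %tok = catchpad within %cs [i8* null, i32 64, i8* null]
      catchret from %tok to label %done

    done:
      ret void
    }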
11810 '``cleanuppad``' Instruction
11811 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11818 <resultval> = cleanuppad within <parent> [<args>*]
11823 The '``cleanuppad``' instruction is used by `LLVM's exception handling
11824 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11825 is a cleanup block --- one where a personality routine attempts to
11826 transfer control to run cleanup actions.
11827 The ``args`` correspond to whatever additional
11828 information the :ref:`personality function <personalityfn>` requires to
11829 execute the cleanup.
11830 The ``resultval`` has the type :ref:`token <t_token>` and is used to
11831 match the ``cleanuppad`` to corresponding :ref:`cleanuprets <i_cleanupret>`.
11832 The ``parent`` argument is the token of the funclet that contains the
11833 ``cleanuppad`` instruction. If the ``cleanuppad`` is not inside a funclet,
11834 this operand may be the token ``none``.
11839 The instruction takes a list of arbitrary values which are interpreted
11840 by the :ref:`personality function <personalityfn>`.
11845 When the call stack is being unwound due to an exception being thrown,
11846 the :ref:`personality function <personalityfn>` transfers control to the
11847 ``cleanuppad`` with the aid of the personality-specific arguments.
11848 As with calling conventions, how the personality function results are
11849 represented in LLVM IR is target specific.
11851 The ``cleanuppad`` instruction has several restrictions:
11853 - A cleanup block is a basic block which is the unwind destination of
11854 an exceptional instruction.
11855 - A cleanup block must have a '``cleanuppad``' instruction as its
11856 first non-PHI instruction.
11857 - There can be only one '``cleanuppad``' instruction within the cleanup block.
11859 - A basic block that is not a cleanup block may not include a
11860 '``cleanuppad``' instruction.
11862 When a ``cleanuppad`` has been "entered" but not yet "exited" (as
11863 described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
11864 it is undefined behavior to execute a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
11865 that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`.
11870 .. code-block:: text
11872 %tok = cleanuppad within %cs []
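A fuller sketch of a cleanup funclet follows, including the ``"funclet"``
operand bundle required on calls made while the pad is entered. It assumes
the MSVC C++ personality ``@__CxxFrameHandler3``. The names ``@may_throw``,
``@release``, and ``@g`` are hypothetical.

.. code-block:: llvm

    declare void @may_throw()
    declare void @release()
    declare i32 @__CxxFrameHandler3(...)

    define void @g() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      invoke void @may_throw()
              to label %done unwind label %cleanup

    cleanup:
      ; Not nested inside another funclet, so the parent is 'none'.
      %tok = cleanuppad within none []
      call void @release() [ "funclet"(token %tok) ]
      cleanupret from %tok unwind to caller

    done:
      ret void
    }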
11876 Intrinsic Functions
11877 ===================
11879 LLVM supports the notion of an "intrinsic function". These functions
11880 have well known names and semantics and are required to follow certain
11881 restrictions. Overall, these intrinsics represent an extension mechanism
11882 for the LLVM language that does not require changing all of the
11883 transformations in LLVM when adding to the language (or the bitcode
11884 reader/writer, the parser, etc...).
11886 Intrinsic function names must all start with an "``llvm.``" prefix. This
11887 prefix is reserved in LLVM for intrinsic names; thus, function names may
11888 not begin with this prefix. Intrinsic functions must always be external
11889 functions: you cannot define the body of intrinsic functions. Intrinsic
11890 functions may only be used in call or invoke instructions: it is illegal
11891 to take the address of an intrinsic function. Additionally, because
11892 intrinsic functions are part of the LLVM language, it is required if any
11893 are added that they be documented here.
11895 Some intrinsic functions can be overloaded, i.e., the intrinsic
11896 represents a family of functions that perform the same operation but on
11897 different data types. Because LLVM can represent over 8 million
11898 different integer types, overloading is used commonly to allow an
11899 intrinsic function to operate on any integer type. One or more of the
11900 argument types or the result type can be overloaded to accept any
11901 integer type. Argument types may also be defined as exactly matching a
11902 previous argument's type or the result type. This allows an intrinsic
11903 function which accepts multiple arguments, but needs all of them to be
11904 of the same type, to only be overloaded with respect to a single
11905 argument or the result.
11907 An overloaded intrinsic has the names of its overloaded argument
11908 types encoded into its function name, each preceded by a period. Only
11909 those types which are overloaded result in a name suffix. Arguments
11910 whose type is matched against another type do not. For example, the
11911 ``llvm.ctpop`` function can take an integer of any width and returns an
11912 integer of exactly the same integer width. This leads to a family of
11913 functions such as ``i8 @llvm.ctpop.i8(i8 %val)`` and
11914 ``i29 @llvm.ctpop.i29(i29 %val)``. Only one type, the return type, is
11915 overloaded, and only one type suffix is required. Because the argument's
11916 type is matched against the return type, it does not require its own name suffix.
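The naming scheme can be sketched with a few declarations of the
``llvm.ctpop`` family and one call. The wrapper function ``@popcount8`` is
hypothetical and exists only to show a use.

.. code-block:: llvm

    ; One suffix per overloaded type: here only the return type.
    declare i8 @llvm.ctpop.i8(i8)
    declare i29 @llvm.ctpop.i29(i29)
    declare <4 x i32> @llvm.ctpop.v4i32(<4 x i32>)

    define i8 @popcount8(i8 %val) {
      %n = call i8 @llvm.ctpop.i8(i8 %val)
      ret i8 %n
    }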
11919 :ref:`Unnamed types <t_opaque>` are encoded as ``s_s``. An overloaded intrinsic
11920 that depends on an unnamed type in one of its overloaded argument types gets an
11921 additional ``.<number>`` suffix. This allows differentiating intrinsics with
11922 different unnamed types as arguments (for example,
11923 ``llvm.ssa.copy.p0s_s.2(%42*)``). The number is tracked in the LLVM module and
11924 it ensures unique names in the module. While linking together two modules, it is
11925 still possible to get a name clash. In that case one of the names will be
11926 changed by getting a new number.
11928 For target developers who are defining intrinsics for back-end code
11929 generation, any intrinsic overloads based solely on the distinction between
11930 integer or floating point types should not be relied upon for correct
11931 code generation. In such cases, the recommended approach for target
11932 maintainers when defining intrinsics is to create separate integer and
11933 FP intrinsics rather than rely on overloading. For example, if different
11934 codegen is required for ``llvm.target.foo(<4 x i32>)`` and
11935 ``llvm.target.foo(<4 x float>)`` then these should be split into
11936 different intrinsics.
11938 To learn how to add an intrinsic function, please see the `Extending
11939 LLVM Guide <ExtendingLLVM.html>`_.
11943 Variable Argument Handling Intrinsics
11944 -------------------------------------
11946 Variable argument support is defined in LLVM with the
11947 :ref:`va_arg <i_va_arg>` instruction and these three intrinsic
11948 functions. These functions are related to the similarly named macros
11949 defined in the ``<stdarg.h>`` header file.
11951 All of these functions operate on arguments that use a target-specific
11952 value type "``va_list``". The LLVM assembly language reference manual
11953 does not define what this type is, so all transformations should be
11954 prepared to handle these functions regardless of the type used.
11956 This example shows how the :ref:`va_arg <i_va_arg>` instruction and the
11957 variable argument handling intrinsic functions are used.
11959 .. code-block:: llvm
11961 ; This struct is different for every platform. For most platforms,
11962 ; it is merely an i8*.
11963 %struct.va_list = type { i8* }
11965 ; For Unix x86_64 platforms, va_list is the following struct:
11966 ; %struct.va_list = type { i32, i32, i8*, i8* }
11968 define i32 @test(i32 %X, ...) {
11969 ; Initialize variable argument processing
11970 %ap = alloca %struct.va_list
11971 %ap2 = bitcast %struct.va_list* %ap to i8*
11972 call void @llvm.va_start(i8* %ap2)
11974 ; Read a single integer argument
11975 %tmp = va_arg i8* %ap2, i32
11977 ; Demonstrate usage of llvm.va_copy and llvm.va_end
11978 %aq = alloca i8*
11979 %aq2 = bitcast i8** %aq to i8*
11980 call void @llvm.va_copy(i8* %aq2, i8* %ap2)
11981 call void @llvm.va_end(i8* %aq2)
11983 ; Stop processing of arguments.
11984 call void @llvm.va_end(i8* %ap2)
11985 ret i32 %tmp
11986 }
11988 declare void @llvm.va_start(i8*)
11989 declare void @llvm.va_copy(i8*, i8*)
11990 declare void @llvm.va_end(i8*)
11994 '``llvm.va_start``' Intrinsic
11995 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12002 declare void @llvm.va_start(i8* <arglist>)
12007 The '``llvm.va_start``' intrinsic initializes ``*<arglist>`` for
12008 subsequent use by ``va_arg``.
12013 The argument is a pointer to a ``va_list`` element to initialize.
12018 The '``llvm.va_start``' intrinsic works just like the ``va_start`` macro
12019 available in C. In a target-dependent way, it initializes the
12020 ``va_list`` element to which the argument points, so that the next call
12021 to ``va_arg`` will produce the first variable argument passed to the
12022 function. Unlike the C ``va_start`` macro, this intrinsic does not need
12023 to know the last argument of the function as the compiler can figure that out.
12026 '``llvm.va_end``' Intrinsic
12027 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12034 declare void @llvm.va_end(i8* <arglist>)
12039 The '``llvm.va_end``' intrinsic destroys ``*<arglist>``, which has been
12040 initialized previously with ``llvm.va_start`` or ``llvm.va_copy``.
12045 The argument is a pointer to a ``va_list`` to destroy.
12050 The '``llvm.va_end``' intrinsic works just like the ``va_end`` macro
12051 available in C. In a target-dependent way, it destroys the ``va_list``
12052 element to which the argument points. Calls to
12053 :ref:`llvm.va_start <int_va_start>` and
12054 :ref:`llvm.va_copy <int_va_copy>` must be matched exactly with calls to ``llvm.va_end``.
12059 '``llvm.va_copy``' Intrinsic
12060 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12067 declare void @llvm.va_copy(i8* <destarglist>, i8* <srcarglist>)
12072 The '``llvm.va_copy``' intrinsic copies the current argument position
12073 from the source argument list to the destination argument list.
12078 The first argument is a pointer to a ``va_list`` element to initialize.
12079 The second argument is a pointer to a ``va_list`` element to copy from.
12084 The '``llvm.va_copy``' intrinsic works just like the ``va_copy`` macro
12085 available in C. In a target-dependent way, it copies the source
12086 ``va_list`` element into the destination ``va_list`` element. This
12087 intrinsic is necessary because the ``llvm.va_start`` intrinsic may be
12088 arbitrarily complex and require, for example, memory allocation.
12090 Accurate Garbage Collection Intrinsics
12091 --------------------------------------
12093 LLVM's support for `Accurate Garbage Collection <GarbageCollection.html>`_
12094 (GC) requires the frontend to generate code containing appropriate intrinsic
12095 calls and select an appropriate GC strategy which knows how to lower these
12096 intrinsics in a manner which is appropriate for the target collector.
12098 These intrinsics allow identification of :ref:`GC roots on the
12099 stack <int_gcroot>`, as well as garbage collector implementations that
12100 require :ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers.
12101 Frontends for type-safe garbage collected languages should generate
12102 these intrinsics to make use of the LLVM garbage collectors. For more
12103 details, see `Garbage Collection with LLVM <GarbageCollection.html>`_.
12105 LLVM provides a second experimental set of intrinsics for describing garbage
12106 collection safepoints in compiled code. These intrinsics are an alternative
12107 to the ``llvm.gcroot`` intrinsics, but are compatible with the ones for
12108 :ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers. The
12109 differences in approach are covered in the `Garbage Collection with LLVM
12110 <GarbageCollection.html>`_ documentation. The intrinsics themselves are
12111 described in :doc:`Statepoints`.
12115 '``llvm.gcroot``' Intrinsic
12116 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12123 declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
12128 The '``llvm.gcroot``' intrinsic declares the existence of a GC root to
12129 the code generator, and allows some metadata to be associated with it.
12134 The first argument specifies the address of a stack object that contains
12135 the root pointer. The second pointer (which must be either a constant or
12136 a global value address) contains the meta-data to be associated with the root.
12142 At runtime, a call to this intrinsic stores a null pointer into the
12143 "ptrloc" location. At compile-time, the code generator generates
12144 information to allow the runtime to find the pointer at GC safe points.
12145 The '``llvm.gcroot``' intrinsic may only be used in a function which
12146 :ref:`specifies a GC algorithm <gc>`.
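A minimal sketch of registering a stack root follows. It assumes the built-in
``"shadow-stack"`` GC strategy and passes no metadata (``null``). The function
name ``@uses_root`` is hypothetical.

.. code-block:: llvm

    declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)

    define void @uses_root() gc "shadow-stack" {
    entry:
      ; The root must be a stack slot, so it is created with alloca.
      %root = alloca i8*
      call void @llvm.gcroot(i8** %root, i8* null)
      ret void
    }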
12150 '``llvm.gcread``' Intrinsic
12151 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12158 declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr)
12163 The '``llvm.gcread``' intrinsic identifies reads of references from heap
12164 locations, allowing garbage collector implementations that require read barriers.
12170 The second argument is the address to read from, which should be an
12171 address allocated from the garbage collector. The first argument is a
12172 pointer to the start of the referenced object, if needed by the language
12173 runtime (otherwise null).
12178 The '``llvm.gcread``' intrinsic has the same semantics as a load
12179 instruction, but may be replaced with substantially more complex code by
12180 the garbage collector runtime, as needed. The '``llvm.gcread``'
12181 intrinsic may only be used in a function which :ref:`specifies a GC algorithm <gc>`.
12186 '``llvm.gcwrite``' Intrinsic
12187 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12194 declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2)
12199 The '``llvm.gcwrite``' intrinsic identifies writes of references to heap
12200 locations, allowing garbage collector implementations that require write
12201 barriers (such as generational or reference counting collectors).
12206 The first argument is the reference to store, the second is the start of
12207 the object to store it to, and the third is the address of the field of
12208 Obj to store to. If the runtime does not require a pointer to the
12209 object, Obj may be null.
12214 The '``llvm.gcwrite``' intrinsic has the same semantics as a store
12215 instruction, but may be replaced with substantially more complex code by
12216 the garbage collector runtime, as needed. The '``llvm.gcwrite``'
12217 intrinsic may only be used in a function which :ref:`specifies a GC algorithm <gc>`.
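The two barrier intrinsics can be sketched together as a reference copy from
one heap field to another. The function ``@copy_field`` and its parameter
names are hypothetical, and the ``"shadow-stack"`` strategy is only an
assumption made so that the function specifies a GC algorithm.

.. code-block:: llvm

    declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr)
    declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2)

    define void @copy_field(i8* %src, i8** %src.field,
                            i8* %dst, i8** %dst.field) gc "shadow-stack" {
      ; Read barrier: behaves like a load from %src.field.
      %ref = call i8* @llvm.gcread(i8* %src, i8** %src.field)
      ; Write barrier: behaves like a store of %ref to %dst.field.
      call void @llvm.gcwrite(i8* %ref, i8* %dst, i8** %dst.field)
      ret void
    }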
12223 'llvm.experimental.gc.statepoint' Intrinsic
12224 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12232 @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
12233 func_type <target>,
12234 i64 <#call args>, i64 <flags>,
12235 ... (call parameters),
12241 The statepoint intrinsic represents a call which is parse-able by the runtime.
12247 The 'id' operand is a constant integer that is reported as the ID
12248 field in the generated stackmap. LLVM does not interpret this
12249 parameter in any way and its meaning is up to the statepoint user to
12250 decide. Note that LLVM is free to duplicate code containing
12251 statepoint calls, and this may transform IR that had a unique 'id' per
12252 lexical call to statepoint to IR that does not.
12254 If 'num patch bytes' is non-zero then the call instruction
12255 corresponding to the statepoint is not emitted and LLVM emits 'num
12256 patch bytes' bytes of nops in its place. LLVM will emit code to
12257 prepare the function arguments and retrieve the function return value
12258 in accordance with the calling convention; the former before the nop
12259 sequence and the latter after the nop sequence. It is expected that
12260 the user will patch over the 'num patch bytes' bytes of nops with a
12261 calling sequence specific to their runtime before executing the
12262 generated machine code. There are no guarantees with respect to the
12263 alignment of the nop sequence. Unlike :doc:`StackMaps`, statepoints do
12264 not have a concept of shadow bytes. Note that semantically the
12265 statepoint still represents a call or invoke to 'target', and the nop
12266 sequence after patching is expected to represent an operation
12267 equivalent to a call or invoke to 'target'.
12269 The 'target' operand is the function actually being called. The
12270 target can be specified as either a symbolic LLVM function, or as an
12271 arbitrary Value of appropriate function type. Note that the function
12272 type must match the signature of the callee and the types of the 'call
12273 parameters' arguments.
12275 The '#call args' operand is the number of arguments to the actual
12276 call. It must exactly match the number of arguments passed in the
12277 'call parameters' variable length section.
12279 The 'flags' operand is used to specify extra information about the
12280 statepoint. This is currently only used to mark certain statepoints
12281 as GC transitions. This operand is a 64-bit integer with the following
12282 layout, where bit 0 is the least significant bit:
12284 +-------+---------------------------------------------------+
12285 | Bit # | Usage                                             |
12286 +=======+===================================================+
12287 |     0 | Set if the statepoint is a GC transition, cleared |
12288 |       | otherwise.                                        |
12289 +-------+---------------------------------------------------+
12290 |  1-63 | Reserved for future use; must be cleared.         |
12291 +-------+---------------------------------------------------+
12293 The 'call parameters' arguments are simply the arguments which need to
12294 be passed to the call target. They will be lowered according to the
12295 specified calling convention and otherwise handled like a normal call
12296 instruction. The number of arguments must exactly match what is
12297 specified in '#call args'. The types must match the signature of 'target'.
12300 The 'call parameter' attributes must be followed by two 'i64 0' constants.
12301 These were originally the length prefixes for 'gc transition parameter' and
12302 'deopt parameter' arguments, but the role of these parameter sets has been
12303 entirely replaced with the corresponding operand bundles. In a future
12304 revision, these now redundant arguments will be removed.
12309 A statepoint is assumed to read and write all memory. As a result,
12310 memory operations can not be reordered past a statepoint. It is
12311 illegal to mark a statepoint as being either 'readonly' or 'readnone'.
12313 Note that legal IR can not perform any memory operation on a 'gc
12314 pointer' argument of the statepoint in a location statically reachable
12315 from the statepoint. Instead, the explicitly relocated value (from a
12316 ``gc.relocate``) must be used.
12318 'llvm.experimental.gc.result' Intrinsic
12319 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12327 @llvm.experimental.gc.result(token %statepoint_token)
12332 ``gc.result`` extracts the result of the original call instruction
12333 which was replaced by the ``gc.statepoint``. The ``gc.result``
12334 intrinsic is actually a family of three intrinsics due to an
12335 implementation limitation. Other than the type of the return value,
12336 the semantics are the same.
12341 The first and only argument is the ``gc.statepoint`` which starts
12342 the safepoint sequence of which this ``gc.result`` is a part.
12343 Despite the typing of this as a generic token, *only* the value defined
12344 by a ``gc.statepoint`` is legal here.
12349 The ``gc.result`` represents the return value of the call target of
12350 the ``statepoint``. The type of the ``gc.result`` must exactly match
12351 the type of the target. If the call target returns void, there will
12352 be no ``gc.result``.
12354 A ``gc.result`` is modeled as a 'readnone' pure function. It has no
12355 side effects since it is just a projection of the return value of the
12356 previous call represented by the ``gc.statepoint``.
12358 'llvm.experimental.gc.relocate' Intrinsic
12359 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12366 declare <pointer type>
12367 @llvm.experimental.gc.relocate(token %statepoint_token,
12369 i32 %pointer_offset)
12374 A ``gc.relocate`` returns the potentially relocated value of a pointer at the safepoint.
12380 The first argument is the ``gc.statepoint`` which starts the
12381 safepoint sequence of which this ``gc.relocate`` is a part.
12382 Despite the typing of this as a generic token, *only* the value defined
12383 by a ``gc.statepoint`` is legal here.
12385 The second and third arguments are both indices into operands of the
12386 corresponding statepoint's :ref:`gc-live <ob_gc_live>` operand bundle.
12388 The second argument is an index which specifies the allocation for the pointer
12389 being relocated. The associated value must be within the object with which the
12390 pointer being relocated is associated. The optimizer is free to change *which*
12391 interior derived pointer is reported, provided that it does not replace an
12392 actual base pointer with another interior derived pointer. Collectors are
12393 allowed to rely on the base pointer operand remaining an actual base pointer if so constructed.
12396 The third argument is an index which specifies the (potentially) derived pointer
12397 being relocated. It is legal for this index to be the same as the second
12398 argument if-and-only-if a base pointer is being relocated.
12403 The return value of ``gc.relocate`` is the potentially relocated value
12404 of the pointer specified by its arguments. It is unspecified how the
12405 value of the returned pointer relates to the argument to the
12406 ``gc.statepoint`` other than that a) it points to the same source
12407 language object with the same offset, and b) the 'based-on'
12408 relationship of the newly relocated pointers is a projection of the
12409 unrelocated pointers. In particular, the integer value of the pointer
12410 returned is unspecified.
12412 A ``gc.relocate`` is modeled as a ``readnone`` pure function. It has no
12413 side effects since it is just a way to extract information about work
12414 done during the actual call modeled by the ``gc.statepoint``.
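
As a minimal sketch of the argument rules above, the following function
relocates a single base pointer across a safepoint. The function names
``@foo`` and ``@relocate_base`` are hypothetical, and the mangled intrinsic
names assume typed pointers in address space 1. Both indices are ``0``
because the only ``gc-live`` operand is itself a base pointer.

.. code-block:: llvm

      declare void @foo()
      declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)
      declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token, i32, i32)

      define i8 addrspace(1)* @relocate_base(i8 addrspace(1)* %obj) gc "statepoint-example" {
      entry:
        ; The statepoint wraps the call to @foo and lists %obj as live.
        %tok = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0) ["gc-live"(i8 addrspace(1)* %obj)]
        ; Base index 0 and derived index 0 both name %obj in the gc-live bundle.
        %obj.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %tok, i32 0, i32 0)
        ret i8 addrspace(1)* %obj.relocated
      }

After the statepoint, only ``%obj.relocated`` should be used; the original
``%obj`` may no longer point to the object.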
12416 .. _gc.get.pointer.base:
12418 'llvm.experimental.gc.get.pointer.base' Intrinsic
12419 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12426 declare <pointer type>
12427 @llvm.experimental.gc.get.pointer.base(
12428 <pointer type> readnone nocapture %derived_ptr)
12429 nounwind readnone willreturn
12434 ``gc.get.pointer.base`` for a derived pointer returns its base pointer.
12439 The only argument is a pointer which is based on some object with
12440 an unknown offset from the base of said object.
12445 This intrinsic is used in the abstract machine model for GC to represent
12446 the base pointer for an arbitrary derived pointer.
12448 This intrinsic is inlined by the :ref:`RewriteStatepointsForGC` pass by
12449 replacing all uses of this callsite with the base pointer value of the
12450 derived pointer. The replacement is done as part of the lowering to the
12451 explicit statepoint model.
12453 The return pointer type must be the same as the type of the parameter.
12456 'llvm.experimental.gc.get.pointer.offset' Intrinsic
12457 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12465 @llvm.experimental.gc.get.pointer.offset(
12466 <pointer type> readnone nocapture %derived_ptr)
12467 nounwind readnone willreturn
12472 ``gc.get.pointer.offset`` for a derived pointer returns the offset from its
12478 The only argument is a pointer which is based on some object with
12479 an unknown offset from the base of said object.
12484 This intrinsic is used in the abstract machine model for GC to represent
12485 the offset of an arbitrary derived pointer from its base pointer.
12487 This intrinsic is inlined by the :ref:`RewriteStatepointsForGC` pass by
12488 replacing all uses of this callsite with the offset of a derived pointer from
12489 its base pointer value. The replacement is done as part of the lowering to the
12490 explicit statepoint model.
12492 In effect, this call computes the difference between the derived pointer and its
12493 base pointer (see :ref:`gc.get.pointer.base`), with both converted by ``ptrtoint``.
12494 However, performing these casts outside the :ref:`RewriteStatepointsForGC` pass
12495 could cause the pointers to be lost to further lowering from the abstract model
12496 to the explicit physical one.
12498 Code Generator Intrinsics
12499 -------------------------
12501 These intrinsics are provided by LLVM to expose special features that
12502 may only be implemented with code generator support.
12504 '``llvm.returnaddress``' Intrinsic
12505 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12512 declare i8* @llvm.returnaddress(i32 <level>)
12517 The '``llvm.returnaddress``' intrinsic attempts to compute a
12518 target-specific value indicating the return address of the current
12519 function or one of its callers.
12524 The argument to this intrinsic indicates which function to return the
12525 address for. Zero indicates the calling function, one indicates its
12526 caller, etc. The argument is **required** to be a constant integer
12532 The '``llvm.returnaddress``' intrinsic either returns a pointer
12533 indicating the return address of the specified call frame, or zero if it
12534 cannot be identified. The value returned by this intrinsic is likely to
12535 be incorrect or 0 for arguments other than zero, so it should only be
12536 used for debugging purposes.
12538 Note that calling this intrinsic does not prevent function inlining or
12539 other aggressive transformations, so the value returned may not be that
12540 of the obvious source-language caller.
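
A minimal sketch of a level-zero query follows. The function name
``@get_return_address`` is hypothetical; the declaration uses the typed
pointer signature shown above.

.. code-block:: llvm

      declare i8* @llvm.returnaddress(i32)

      define i8* @get_return_address() {
      entry:
        ; Level 0 asks for the return address of @get_return_address itself.
        ; The level must be a constant integer.
        %ra = call i8* @llvm.returnaddress(i32 0)
        ret i8* %ra
      }

If ``@get_return_address`` is inlined, the value describes the frame it was
inlined into, which is the caveat noted above.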
12542 '``llvm.addressofreturnaddress``' Intrinsic
12543 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12550 declare i8* @llvm.addressofreturnaddress()
12555 The '``llvm.addressofreturnaddress``' intrinsic returns a target-specific
12556 pointer to the place in the stack frame where the return address of the
12557 current function is stored.
12562 Note that calling this intrinsic does not prevent function inlining or
12563 other aggressive transformations, so the value returned may not be that
12564 of the obvious source-language caller.
12566 This intrinsic is only implemented for x86 and aarch64.
12568 '``llvm.sponentry``' Intrinsic
12569 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12576 declare i8* @llvm.sponentry()
12581 The '``llvm.sponentry``' intrinsic returns the stack pointer value at
12582 the entry of the current function calling this intrinsic.
12587 Note this intrinsic is only verified on AArch64.
12589 '``llvm.frameaddress``' Intrinsic
12590 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12597 declare i8* @llvm.frameaddress(i32 <level>)
12602 The '``llvm.frameaddress``' intrinsic attempts to return the
12603 target-specific frame pointer value for the specified stack frame.
12608 The argument to this intrinsic indicates which function to return the
12609 frame pointer for. Zero indicates the calling function, one indicates
12610 its caller, etc. The argument is **required** to be a constant integer
12616 The '``llvm.frameaddress``' intrinsic either returns a pointer
12617 indicating the frame address of the specified call frame, or zero if it
12618 cannot be identified. The value returned by this intrinsic is likely to
12619 be incorrect or 0 for arguments other than zero, so it should only be
12620 used for debugging purposes.
12622 Note that calling this intrinsic does not prevent function inlining or
12623 other aggressive transformations, so the value returned may not be that
12624 of the obvious source-language caller.
12626 '``llvm.swift.async.context.addr``' Intrinsic
12627 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12634 declare i8** @llvm.swift.async.context.addr()
12639 The '``llvm.swift.async.context.addr``' intrinsic returns a pointer to
12640 the part of the extended frame record containing the asynchronous
12641 context of a Swift execution.
12646 If the caller has a ``swiftasync`` parameter, that argument will initially
12647 be stored at the returned address. If not, it will be initialized to null.
12649 '``llvm.localescape``' and '``llvm.localrecover``' Intrinsics
12650 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12657 declare void @llvm.localescape(...)
12658 declare i8* @llvm.localrecover(i8* %func, i8* %fp, i32 %idx)
12663 The '``llvm.localescape``' intrinsic escapes offsets of a collection of static
12664 allocas, and the '``llvm.localrecover``' intrinsic applies those offsets to a
12665 live frame pointer to recover the address of the allocation. The offset is
12666 computed during frame layout of the caller of ``llvm.localescape``.
12671 All arguments to '``llvm.localescape``' must be pointers to static allocas or
12672 casts of static allocas. Each function can only call '``llvm.localescape``'
12673 once, and it can only do so from the entry block.
12675 The ``func`` argument to '``llvm.localrecover``' must be a constant
12676 bitcasted pointer to a function defined in the current module. The code
12677 generator cannot determine the frame allocation offset of functions defined in
12680 The ``fp`` argument to '``llvm.localrecover``' must be a frame pointer of a
12681 call frame that is currently live. The return value of '``llvm.localaddress``'
12682 is one way to produce such a value, but various runtimes also expose a suitable
12683 pointer in platform-specific ways.
12685 The ``idx`` argument to '``llvm.localrecover``' indicates which alloca passed to
12686 '``llvm.localescape``' to recover. It is zero-indexed.
12691 These intrinsics allow a group of functions to share access to a set of local
12692 stack allocations of one parent function. The parent function may call the
12693 '``llvm.localescape``' intrinsic once from the function entry block, and the
12694 child functions can use '``llvm.localrecover``' to access the escaped allocas.
12695 The '``llvm.localescape``' intrinsic blocks inlining, as inlining changes where
12696 the escaped allocas are allocated, which would break attempts to use
12697 '``llvm.localrecover``'.
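
The parent/child arrangement described above can be sketched as follows. The
function names ``@parent`` and ``@child`` are hypothetical, and the frame
pointer is obtained with ``llvm.localaddress``, which is only one of the
possible sources mentioned above.

.. code-block:: llvm

      declare void @llvm.localescape(...)
      declare i8* @llvm.localrecover(i8*, i8*, i32)
      declare i8* @llvm.localaddress()

      define void @parent() {
      entry:
        %x = alloca i32
        ; Escape the static alloca; it becomes index 0.
        call void (...) @llvm.localescape(i32* %x)
        %fp = call i8* @llvm.localaddress()
        call void @child(i8* %fp)
        ret void
      }

      define void @child(i8* %fp) {
      entry:
        ; Recover the address of @parent's %x from the live frame pointer.
        %x.raw = call i8* @llvm.localrecover(i8* bitcast (void ()* @parent to i8*), i8* %fp, i32 0)
        %x = bitcast i8* %x.raw to i32*
        store i32 42, i32* %x
        ret void
      }
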
12699 '``llvm.seh.try.begin``' and '``llvm.seh.try.end``' Intrinsics
12700 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12707 declare void @llvm.seh.try.begin()
12708 declare void @llvm.seh.try.end()
12713 The '``llvm.seh.try.begin``' and '``llvm.seh.try.end``' intrinsics mark
12714 the boundary of a ``_try`` region for Windows SEH Asynchronous Exception Handling.
12719 When a C function is compiled with the Windows SEH Asynchronous Exception option,
12720 ``-feh_asynch`` (aka MSVC ``-EHa``), these two intrinsics are injected to mark the
12721 ``_try`` boundary and to prevent potential exceptions from being moved across that boundary.
12722 Any set of operations can then be confined to the region by reading their leaf
12723 inputs via volatile loads and writing their root outputs via volatile stores.
12725 '``llvm.seh.scope.begin``' and '``llvm.seh.scope.end``' Intrinsics
12726 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12733 declare void @llvm.seh.scope.begin()
12734 declare void @llvm.seh.scope.end()
12739 The '``llvm.seh.scope.begin``' and '``llvm.seh.scope.end``' intrinsics mark
12740 the boundary of a C++ object lifetime for Windows SEH Asynchronous Exception
12741 Handling (MSVC option ``-EHa``).
12746 LLVM's ordinary exception-handling representation associates EH cleanups and
12747 handlers only with ``invoke`` instructions, which normally correspond only to call sites. To
12748 support arbitrary faulting instructions, it must be possible to recover the current
12749 EH scope for any instruction. Turning every operation in LLVM that could fault
12750 into an ``invoke`` of a new, potentially-throwing intrinsic would require adding a
12751 large number of intrinsics, impede optimization of those operations, and make
12752 compilation slower by introducing many extra basic blocks. These intrinsics can
12753 be used instead to mark the region protected by a cleanup, such as for a local
12754 C++ object with a non-trivial destructor. ``llvm.seh.scope.begin`` is used to mark
12755 the start of the region; it is always called with ``invoke``, with the unwind block
12756 being the desired unwind destination for any potentially-throwing instructions
12757 within the region. ``llvm.seh.scope.end`` is used to mark when the scope ends
12758 and the EH cleanup is no longer required (e.g. because the destructor is being
12761 .. _int_read_register:
12762 .. _int_read_volatile_register:
12763 .. _int_write_register:
12765 '``llvm.read_register``', '``llvm.read_volatile_register``', and '``llvm.write_register``' Intrinsics
12766 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12773 declare i32 @llvm.read_register.i32(metadata)
12774 declare i64 @llvm.read_register.i64(metadata)
12775 declare i32 @llvm.read_volatile_register.i32(metadata)
12776 declare i64 @llvm.read_volatile_register.i64(metadata)
12777 declare void @llvm.write_register.i32(metadata, i32 %value)
12778 declare void @llvm.write_register.i64(metadata, i64 %value)
12784 The '``llvm.read_register``', '``llvm.read_volatile_register``', and
12785 '``llvm.write_register``' intrinsics provide access to the named register.
12786 The register must be valid on the architecture being compiled to. The type
12787 needs to be compatible with the register being read.
12792 The '``llvm.read_register``' and '``llvm.read_volatile_register``' intrinsics
12793 return the current value of the register, where possible. The
12794 '``llvm.write_register``' intrinsic sets the current value of the register,
12797 A call to '``llvm.read_volatile_register``' is assumed to have side-effects
12798 and possibly return a different value each time (e.g. for a timer register).
12800 This is useful to implement named register global variables that need
12801 to always be mapped to a specific register, as is common practice on
12802 bare-metal programs including OS kernels.
12804 The compiler doesn't check for register availability or use of the used
12805 register in surrounding code, including inline assembly. Because of that,
12806 allocatable registers are not supported.
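
As a minimal sketch, the following reads the stack pointer through a named
register. The function name ``@get_stack_pointer`` is hypothetical, and the
register name ``sp`` is an assumption that fits ARM and AArch64; other
targets spell the register differently (for example ``rsp`` on x86_64).

.. code-block:: llvm

      declare i64 @llvm.read_register.i64(metadata)

      define i64 @get_stack_pointer() {
      entry:
        ; The metadata operand names the register to read.
        %sp = call i64 @llvm.read_register.i64(metadata !0)
        ret i64 %sp
      }

      !0 = !{!"sp\00"}
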
12808 Warning: So far it only works with the stack pointer on selected
12809 architectures (ARM, AArch64, PowerPC and x86_64). A significant amount of
12810 work is needed to support other registers and even more so, allocatable
12815 '``llvm.stacksave``' Intrinsic
12816 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12823 declare i8* @llvm.stacksave()
12828 The '``llvm.stacksave``' intrinsic is used to remember the current state
12829 of the function stack, for use with
12830 :ref:`llvm.stackrestore <int_stackrestore>`. This is useful for
12831 implementing language features like scoped automatic variable sized
12837 This intrinsic returns an opaque pointer value that can be passed to
12838 :ref:`llvm.stackrestore <int_stackrestore>`. When an
12839 ``llvm.stackrestore`` intrinsic is executed with a value saved from
12840 ``llvm.stacksave``, it effectively restores the state of the stack to
12841 the state it was in when the ``llvm.stacksave`` intrinsic executed. In
12842 practice, this pops any :ref:`alloca <i_alloca>` blocks from the stack that
12843 were allocated after the ``llvm.stacksave`` was executed.
12845 .. _int_stackrestore:
12847 '``llvm.stackrestore``' Intrinsic
12848 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12855 declare void @llvm.stackrestore(i8* %ptr)
12860 The '``llvm.stackrestore``' intrinsic is used to restore the state of
12861 the function stack to the state it was in when the corresponding
12862 :ref:`llvm.stacksave <int_stacksave>` intrinsic executed. This is
12863 useful for implementing language features like scoped automatic variable
12864 sized arrays in C99.
12869 See the description for :ref:`llvm.stacksave <int_stacksave>`.
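
A minimal sketch of the scoped variable-sized array pattern follows. The
names ``@scoped_vla`` and ``@use_buffer`` are hypothetical.

.. code-block:: llvm

      declare i8* @llvm.stacksave()
      declare void @llvm.stackrestore(i8*)
      declare void @use_buffer(i32*, i32)   ; hypothetical consumer

      define void @scoped_vla(i32 %n) {
      entry:
        ; Remember the stack state before the dynamic allocation.
        %saved = call i8* @llvm.stacksave()
        %buf = alloca i32, i32 %n
        call void @use_buffer(i32* %buf, i32 %n)
        ; Pop %buf (and anything else allocated since the save).
        call void @llvm.stackrestore(i8* %saved)
        ret void
      }

``%buf`` must not be used after the restore, since its storage has been
released.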
12871 .. _int_get_dynamic_area_offset:
12873 '``llvm.get.dynamic.area.offset``' Intrinsic
12874 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12881 declare i32 @llvm.get.dynamic.area.offset.i32()
12882 declare i64 @llvm.get.dynamic.area.offset.i64()
12887 The '``llvm.get.dynamic.area.offset.*``' intrinsic family is used to
12888 get the offset from the native stack pointer to the address of the most
12889 recent dynamic alloca on the caller's stack. These intrinsics are
12890 intended for use in combination with
12891 :ref:`llvm.stacksave <int_stacksave>` to get a
12892 pointer to the most recent dynamic alloca. This is useful, for example,
12893 for AddressSanitizer's stack unpoisoning routines.
12898 These intrinsics return a non-negative integer value that can be used to
12899 get the address of the most recent dynamic alloca, allocated by :ref:`alloca <i_alloca>`
12900 on the caller's stack. In particular, for targets where the stack grows downwards,
12901 adding this offset to the native stack pointer would get the address of the most
12902 recent dynamic alloca. For targets where the stack grows upwards, the situation is a bit more
12903 complicated, because subtracting this value from the stack pointer would get the address
12904 one past the end of the most recent dynamic alloca.
12906 Although for most targets :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>`
12907 returns just a zero, for others, such as PowerPC and PowerPC64, it returns a
12908 compile-time-known constant value.
12910 The return value type of :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>`
12911 must match the target's default address space's (address space 0) pointer type.
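
The combination with ``llvm.stacksave`` described above can be sketched as
follows, assuming a 64-bit target whose stack grows downwards. The names
``@find_dynamic_alloca`` and ``@consume`` are hypothetical.

.. code-block:: llvm

      declare i8* @llvm.stacksave()
      declare i64 @llvm.get.dynamic.area.offset.i64()
      declare void @consume(i8*)   ; hypothetical consumer

      define void @find_dynamic_alloca(i64 %n) {
      entry:
        %buf = alloca i8, i64 %n
        call void @consume(i8* %buf)
        ; Current stack pointer, as an opaque i8*.
        %sp = call i8* @llvm.stacksave()
        ; Offset from the stack pointer to the most recent dynamic alloca.
        %off = call i64 @llvm.get.dynamic.area.offset.i64()
        %addr = getelementptr i8, i8* %sp, i64 %off
        call void @consume(i8* %addr)
        ret void
      }
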
12913 '``llvm.prefetch``' Intrinsic
12914 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12921 declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>)
12926 The '``llvm.prefetch``' intrinsic is a hint to the code generator to
12927 insert a prefetch instruction if supported; otherwise, it is a no-op.
12928 Prefetches have no effect on the behavior of the program but can change
12929 its performance characteristics.
12934 ``address`` is the address to be prefetched, ``rw`` is the specifier
12935 determining if the fetch should be for a read (0) or write (1), and
12936 ``locality`` is a temporal locality specifier ranging from (0), no
12937 locality, to (3), extremely local (keep in cache). The ``cache type``
12938 specifies whether the prefetch is performed on the data (1) or
12939 instruction (0) cache. The ``rw``, ``locality`` and ``cache type``
12940 arguments must be constant integers.
12945 This intrinsic does not modify the behavior of the program. In
12946 particular, prefetches cannot trap and do not produce a value. On
12947 targets that support this intrinsic, the prefetch can provide hints to
12948 the processor cache for better performance.
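
A minimal sketch of a read prefetch into the data cache follows. The function
name ``@prefetch_for_read`` is hypothetical, and the unsuffixed intrinsic name
follows the declaration shown above.

.. code-block:: llvm

      declare void @llvm.prefetch(i8*, i32, i32, i32)

      define void @prefetch_for_read(i8* %p) {
      entry:
        ; rw = 0 (read), locality = 3 (keep in cache), cache type = 1 (data).
        ; The last three arguments must be constants.
        call void @llvm.prefetch(i8* %p, i32 0, i32 3, i32 1)
        ret void
      }
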
12950 '``llvm.pcmarker``' Intrinsic
12951 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12958 declare void @llvm.pcmarker(i32 <id>)
12963 The '``llvm.pcmarker``' intrinsic is a method to export a Program
12964 Counter (PC) in a region of code to simulators and other tools. The
12965 method is target specific, but it is expected that the marker will use
12966 exported symbols to transmit the PC of the marker. The marker makes no
12967 guarantees that it will remain with any specific instruction after
12968 optimizations. It is possible that the presence of a marker will inhibit
12969 optimizations. The intended use is to be inserted after optimizations to
12970 allow correlations of simulation runs.
12975 ``id`` is a numerical id identifying the marker.
12980 This intrinsic does not modify the behavior of the program. Backends
12981 that do not support this intrinsic may ignore it.
12983 '``llvm.readcyclecounter``' Intrinsic
12984 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12991 declare i64 @llvm.readcyclecounter()
12996 The '``llvm.readcyclecounter``' intrinsic provides access to the cycle
12997 counter register (or similar low latency, high accuracy clocks) on those
12998 targets that support it. On X86, it should map to RDTSC. On Alpha, it
12999 should map to RPCC. As the backing counters overflow quickly (on the
13000 order of 9 seconds on Alpha), this should only be used for small
13006 When directly supported, reading the cycle counter should not modify any
13007 memory. Implementations are allowed to either return an application
13008 specific value or a system wide value. On backends without support, this
13009 is lowered to a constant 0.
13011 Note that runtime support may be conditional on the privilege level the code is
13012 running at and the host platform.
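
A minimal sketch of timing a short region follows. The names
``@cycles_for_work`` and ``@work`` are hypothetical.

.. code-block:: llvm

      declare i64 @llvm.readcyclecounter()
      declare void @work()   ; hypothetical region being timed

      define i64 @cycles_for_work() {
      entry:
        %start = call i64 @llvm.readcyclecounter()
        call void @work()
        %end = call i64 @llvm.readcyclecounter()
        ; Elapsed ticks; 0 on backends that lower the intrinsic to a constant 0.
        %elapsed = sub i64 %end, %start
        ret i64 %elapsed
      }
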
13014 '``llvm.clear_cache``' Intrinsic
13015 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13022 declare void @llvm.clear_cache(i8*, i8*)
13027 The '``llvm.clear_cache``' intrinsic ensures visibility of modifications
13028 in the specified range to the execution unit of the processor. On
13029 targets with non-unified instruction and data cache, the implementation
13030 flushes the instruction cache.
13035 On platforms with coherent instruction and data caches (e.g. x86), this
13036 intrinsic is a no-op. On platforms with non-coherent instruction and data
13037 cache (e.g. ARM, MIPS), the intrinsic is lowered either to appropriate
13038 instructions or a system call, if cache flushing requires special
13041 The default behavior is to emit a call to ``__clear_cache`` from the run
13044 This intrinsic does *not* empty the instruction pipeline. Modifications
13045 of the current function are outside the scope of the intrinsic.
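
As a minimal sketch, a code generator that has just written instructions into
a buffer might make them visible like this. The function name
``@publish_code`` and its parameters are hypothetical; the two pointers bound
the modified range.

.. code-block:: llvm

      declare void @llvm.clear_cache(i8*, i8*)

      define void @publish_code(i8* %begin, i8* %end) {
      entry:
        ; Make the bytes in the given range visible to instruction fetch.
        call void @llvm.clear_cache(i8* %begin, i8* %end)
        ret void
      }
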
13047 '``llvm.instrprof.increment``' Intrinsic
13048 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13055 declare void @llvm.instrprof.increment(i8* <name>, i64 <hash>,
13056 i32 <num-counters>, i32 <index>)
13061 The '``llvm.instrprof.increment``' intrinsic can be emitted by a
13062 frontend for use with instrumentation based profiling. These will be
13063 lowered by the ``-instrprof`` pass to generate execution counts of a
13064 program at runtime.
13069 The first argument is a pointer to a global variable containing the
13070 name of the entity being instrumented. This should generally be the
13071 (mangled) function name for a set of counters.
13073 The second argument is a hash value that can be used by the consumer
13074 of the profile data to detect changes to the instrumented source, and
13075 the third is the number of counters associated with ``name``. It is an
13076 error if ``hash`` or ``num-counters`` differ between two instances of
13077 ``instrprof.increment`` that refer to the same name.
13079 The last argument refers to which of the counters for ``name`` should
13080 be incremented. It should be at least 0 and less than ``num-counters``.
13085 This intrinsic represents an increment of a profiling counter. It will
13086 cause the ``-instrprof`` pass to generate the appropriate data
13087 structures and the code to increment the appropriate value, in a
13088 format that can be written out by a compiler runtime and consumed via
13089 the ``llvm-profdata`` tool.
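
A minimal sketch of a function with a single counter follows. The global
``@__profn_foo``, the function ``@foo``, and the hash value ``0`` are
illustrative assumptions, not values a particular frontend would emit.

.. code-block:: llvm

      @__profn_foo = private constant [3 x i8] c"foo"

      declare void @llvm.instrprof.increment(i8*, i64, i32, i32)

      define void @foo() {
      entry:
        ; name = "foo", hash = 0, num-counters = 1, index = 0
        call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_foo, i32 0, i32 0), i64 0, i32 1, i32 0)
        ret void
      }
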
13091 '``llvm.instrprof.increment.step``' Intrinsic
13092 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13099 declare void @llvm.instrprof.increment.step(i8* <name>, i64 <hash>,
13100 i32 <num-counters>,
13101 i32 <index>, i64 <step>)
13106 The '``llvm.instrprof.increment.step``' intrinsic is an extension to
13107 the '``llvm.instrprof.increment``' intrinsic with an additional fifth
13108 argument to specify the step of the increment.
13112 The first four arguments are the same as '``llvm.instrprof.increment``'
13115 The last argument specifies the value of the increment of the counter variable.
13119 See description of '``llvm.instrprof.increment``' intrinsic.
13122 '``llvm.instrprof.value.profile``' Intrinsic
13123 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13130 declare void @llvm.instrprof.value.profile(i8* <name>, i64 <hash>,
13131 i64 <value>, i32 <value_kind>,
13137 The '``llvm.instrprof.value.profile``' intrinsic can be emitted by a
13138 frontend for use with instrumentation based profiling. This will be
13139 lowered by the ``-instrprof`` pass to find out the target values that
13140 instrumented expressions take in a program at runtime.
13145 The first argument is a pointer to a global variable containing the
13146 name of the entity being instrumented. ``name`` should generally be the
13147 (mangled) function name for a set of counters.
13149 The second argument is a hash value that can be used by the consumer
13150 of the profile data to detect changes to the instrumented source. It
13151 is an error if ``hash`` differs between two instances of
13152 ``llvm.instrprof.*`` that refer to the same name.
13154 The third argument is the value of the expression being profiled. The profiled
13155 expression's value should be representable as an unsigned 64-bit value. The
13156 fourth argument represents the kind of value profiling that is being done. The
13157 supported value profiling kinds are enumerated through the
13158 ``InstrProfValueKind`` type declared in the
13159 ``<include/llvm/ProfileData/InstrProf.h>`` header file. The last argument is the
13160 index of the instrumented expression within ``name``. It should be >= 0.
13165 This intrinsic represents the point where a call to a runtime routine
13166 should be inserted for value profiling of target expressions. The ``-instrprof``
13167 pass will generate the appropriate data structures and replace the
13168 ``llvm.instrprof.value.profile`` intrinsic with the call to the profile
13169 runtime library with proper arguments.
13171 '``llvm.thread.pointer``' Intrinsic
13172 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13179 declare i8* @llvm.thread.pointer()
13184 The '``llvm.thread.pointer``' intrinsic returns the value of the thread
13190 The '``llvm.thread.pointer``' intrinsic returns a pointer to the TLS area
13191 for the current thread. The exact semantics of this value are target
13192 specific: it may point to the start of TLS area, to the end, or somewhere
13193 in the middle. Depending on the target, this intrinsic may read a register,
13194 call a helper function, read from an alternate memory space, or perform
13195 other operations necessary to locate the TLS area. Not all targets support
13198 '``llvm.call.preallocated.setup``' Intrinsic
13199 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13206 declare token @llvm.call.preallocated.setup(i32 %num_args)
13211 The '``llvm.call.preallocated.setup``' intrinsic returns a token which can
13212 be used with a call's ``"preallocated"`` operand bundle to indicate that
13213 certain arguments are allocated and initialized before the call.
13218 The '``llvm.call.preallocated.setup``' intrinsic returns a token which is
13219 associated with at most one call. The token can be passed to
13220 '``@llvm.call.preallocated.arg``' to get a pointer to the
13221 corresponding argument. The token must be the parameter to a
13222 ``"preallocated"`` operand bundle for the corresponding call.
13224 Nested calls to '``llvm.call.preallocated.setup``' are allowed, but must
13225 be properly nested. e.g.
13227 .. code-block:: llvm
13229 %t1 = call token @llvm.call.preallocated.setup(i32 0)
13230 %t2 = call token @llvm.call.preallocated.setup(i32 0)
13231 call void @foo() ["preallocated"(token %t2)]
13232 call void @foo() ["preallocated"(token %t1)]
13234 is allowed, but not
13236 .. code-block:: llvm
13238 %t1 = call token @llvm.call.preallocated.setup(i32 0)
13239 %t2 = call token @llvm.call.preallocated.setup(i32 0)
13240 call void @foo() ["preallocated"(token %t1)]
13241 call void @foo() ["preallocated"(token %t2)]
13243 .. _int_call_preallocated_arg:
13245 '``llvm.call.preallocated.arg``' Intrinsic
13246 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13253 declare i8* @llvm.call.preallocated.arg(token %setup_token, i32 %arg_index)
13258 The '``llvm.call.preallocated.arg``' intrinsic returns a pointer to the
13259 corresponding preallocated argument for the preallocated call.
13264 The '``llvm.call.preallocated.arg``' intrinsic returns a pointer to the
13265 ``%arg_index``-th argument with the ``preallocated`` attribute for
13266 the call associated with the ``%setup_token``, which must be from
13267 '``llvm.call.preallocated.setup``'.
13269 A call to '``llvm.call.preallocated.arg``' must have a call site
13270 ``preallocated`` attribute. The type of the ``preallocated`` attribute must
13271 match the type used by the ``preallocated`` attribute of the corresponding
13272 argument at the preallocated call. The type is used in the case that an
13273 ``llvm.call.preallocated.setup`` does not have a corresponding call (e.g. due
13274 to DCE), where otherwise we cannot know how large the arguments are.
13276 It is undefined behavior if this is called with a token from an
13277 '``llvm.call.preallocated.setup``' if another
13278 '``llvm.call.preallocated.setup``' has already been called or if the
13279 preallocated call corresponding to the '``llvm.call.preallocated.setup``'
13280 has already been called.
13282 .. _int_call_preallocated_teardown:
13284 '``llvm.call.preallocated.teardown``' Intrinsic
13285 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13292 declare void @llvm.call.preallocated.teardown(token %setup_token)
13297 The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack
13298 created by a '``llvm.call.preallocated.setup``'.
13303 The token argument must be a token returned by '``llvm.call.preallocated.setup``'.
13305 The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack
13306 allocated by the corresponding '``llvm.call.preallocated.setup``'. Exactly
13307 one of this or the preallocated call must be called to prevent stack leaks.
13308 It is undefined behavior to call both a '``llvm.call.preallocated.teardown``'
13309 and the preallocated call for a given '``llvm.call.preallocated.setup``'.
13311 For example, if the stack is allocated for a preallocated call by a
13312 '``llvm.call.preallocated.setup``', then an initializer function called on an
13313 allocated argument throws an exception, there should be a
13314 '``llvm.call.preallocated.teardown``' in the exception handler to prevent
13317 Following the nesting rules in '``llvm.call.preallocated.setup``', nested
13318 calls to '``llvm.call.preallocated.setup``' and
13319 '``llvm.call.preallocated.teardown``' are allowed but must be properly
13325 .. code-block:: llvm
13327 %cs = call token @llvm.call.preallocated.setup(i32 1)
13328 %x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32)
13329 %y = bitcast i8* %x to i32*
13330 invoke void @constructor(i32* %y) to label %conta unwind label %contb
13332 call void @foo1(i32* preallocated(i32) %y) ["preallocated"(token %cs)]
13335 %s = catchswitch within none [label %catch] unwind to caller
13337 %p = catchpad within %s []
13338 call void @llvm.call.preallocated.teardown(token %cs)
13341 Standard C/C++ Library Intrinsics
13342 ---------------------------------
13344 LLVM provides intrinsics for a few important standard C/C++ library
13345 functions. These intrinsics allow source-language front-ends to pass
13346 information about the alignment of the pointer arguments to the code
13347 generator, providing opportunity for more efficient code generation.
13350 '``llvm.abs.*``' Intrinsic
13351 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13356 This is an overloaded intrinsic. You can use ``llvm.abs`` on any
13357 integer bit width or any vector of integer elements.
13361 declare i32 @llvm.abs.i32(i32 <src>, i1 <is_int_min_poison>)
13362 declare <4 x i32> @llvm.abs.v4i32(<4 x i32> <src>, i1 <is_int_min_poison>)
13367 The '``llvm.abs``' family of intrinsic functions returns the absolute value
13373 The first argument is the value for which the absolute value is to be returned.
13374 This argument may be of any integer type or a vector with integer element type.
13375 The return type must match the first argument type.
13377 The second argument must be a constant and is a flag to indicate whether the
13378 result value of the '``llvm.abs``' intrinsic is a
13379 :ref:`poison value <poisonvalues>` if the argument is statically or dynamically
13380 an ``INT_MIN`` value.
13385 The '``llvm.abs``' intrinsic returns the magnitude (always positive) of the
13386 argument or each element of a vector argument. If the argument is ``INT_MIN``,
13387 then the result is also ``INT_MIN`` if ``is_int_min_poison == 0`` and
13388 ``poison`` otherwise.
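
The effect of the second argument can be sketched with two wrappers. The
function names ``@abs_wrapping`` and ``@abs_poison`` are hypothetical.

.. code-block:: llvm

      declare i32 @llvm.abs.i32(i32, i1)

      define i32 @abs_wrapping(i32 %x) {
      entry:
        ; is_int_min_poison = false: an INT_MIN input yields INT_MIN.
        %r = call i32 @llvm.abs.i32(i32 %x, i1 false)
        ret i32 %r
      }

      define i32 @abs_poison(i32 %x) {
      entry:
        ; is_int_min_poison = true: an INT_MIN input yields poison.
        %r = call i32 @llvm.abs.i32(i32 %x, i1 true)
        ret i32 %r
      }

For any input other than ``INT_MIN``, both functions return the same value;
for example, ``-5`` gives ``5`` in either case.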
13391 '``llvm.smax.*``' Intrinsic
13392 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13397 This is an overloaded intrinsic. You can use ``@llvm.smax`` on any
13398 integer bit width or any vector of integer elements.
13402 declare i32 @llvm.smax.i32(i32 %a, i32 %b)
13403 declare <4 x i32> @llvm.smax.v4i32(<4 x i32> %a, <4 x i32> %b)
13408 Return the larger of ``%a`` and ``%b`` comparing the values as signed integers.
13409 Vector intrinsics operate on a per-element basis. The larger element of ``%a``
13410 and ``%b`` at a given index is returned for that index.
13415 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13416 integer element type. The argument types must match each other, and the return
13417 type must match the argument type.
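
As a minimal sketch of the per-element behavior, the following clamps each
lane of a vector to be at least zero. The function name ``@clamp_to_zero`` is
hypothetical.

.. code-block:: llvm

      declare <4 x i32> @llvm.smax.v4i32(<4 x i32>, <4 x i32>)

      define <4 x i32> @clamp_to_zero(<4 x i32> %v) {
      entry:
        ; Each lane is compared independently against 0 as a signed integer.
        ; For example, <-3, 7, 0, -1> becomes <0, 7, 0, 0>.
        %r = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %v, <4 x i32> zeroinitializer)
        ret <4 x i32> %r
      }
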
13420 '``llvm.smin.*``' Intrinsic
13421 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13426 This is an overloaded intrinsic. You can use ``@llvm.smin`` on any
13427 integer bit width or any vector of integer elements.
13431 declare i32 @llvm.smin.i32(i32 %a, i32 %b)
13432 declare <4 x i32> @llvm.smin.v4i32(<4 x i32> %a, <4 x i32> %b)
13437 Return the smaller of ``%a`` and ``%b`` comparing the values as signed integers.
13438 Vector intrinsics operate on a per-element basis. The smaller element of ``%a``
13439 and ``%b`` at a given index is returned for that index.
13444 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13445 integer element type. The argument types must match each other, and the return
13446 type must match the argument type.
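
The per-element behavior on vectors could look like the following sketch, in
which the function name ``@min_with_zero`` is hypothetical:

.. code-block:: llvm

    declare <4 x i32> @llvm.smin.v4i32(<4 x i32>, <4 x i32>)

    define <4 x i32> @min_with_zero(<4 x i32> %v) {
      ; Each lane is compared independently: lane i of the result is the
      ; smaller (as signed integers) of lane i of %v and 0.
      %r = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %v, <4 x i32> zeroinitializer)
      ret <4 x i32> %r
    }
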
13449 '``llvm.umax.*``' Intrinsic
13450 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13455 This is an overloaded intrinsic. You can use ``@llvm.umax`` on any
13456 integer bit width or any vector of integer elements.
13460 declare i32 @llvm.umax.i32(i32 %a, i32 %b)
13461 declare <4 x i32> @llvm.umax.v4i32(<4 x i32> %a, <4 x i32> %b)
13466 Return the larger of ``%a`` and ``%b`` comparing the values as unsigned
13467 integers. Vector intrinsics operate on a per-element basis. The larger element
13468 of ``%a`` and ``%b`` at a given index is returned for that index.
13473 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13474 integer element type. The argument types must match each other, and the return
13475 type must match the argument type.
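
To illustrate how the signed and unsigned variants differ on the same bit
patterns, here is a small sketch. The function names are hypothetical and the
``i8`` overloads are chosen only to keep the values small:

.. code-block:: llvm

    declare i8 @llvm.smax.i8(i8, i8)
    declare i8 @llvm.umax.i8(i8, i8)

    define i8 @signed_max() {
      ; As signed integers, -1 < 1, so the result is 1.
      %r = call i8 @llvm.smax.i8(i8 -1, i8 1)
      ret i8 %r
    }

    define i8 @unsigned_max() {
      ; As unsigned integers, the bit pattern of -1 is 255, which is larger
      ; than 1, so the result is 255 (printed as i8 -1).
      %r = call i8 @llvm.umax.i8(i8 -1, i8 1)
      ret i8 %r
    }
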
13478 '``llvm.umin.*``' Intrinsic
13479 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13484 This is an overloaded intrinsic. You can use ``@llvm.umin`` on any
13485 integer bit width or any vector of integer elements.
13489 declare i32 @llvm.umin.i32(i32 %a, i32 %b)
13490 declare <4 x i32> @llvm.umin.v4i32(<4 x i32> %a, <4 x i32> %b)
13495 Return the smaller of ``%a`` and ``%b`` comparing the values as unsigned
13496 integers. Vector intrinsics operate on a per-element basis. The smaller element
13497 of ``%a`` and ``%b`` at a given index is returned for that index.
13502 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13503 integer element type. The argument types must match each other, and the return
13504 type must match the argument type.
13509 '``llvm.memcpy``' Intrinsic
13510 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13515 This is an overloaded intrinsic. You can use ``llvm.memcpy`` on any
13516 integer bit width and for different address spaces. Not all targets
13517 support all bit widths however.
13521 declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13522 i32 <len>, i1 <isvolatile>)
13523 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13524 i64 <len>, i1 <isvolatile>)
13529 The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
13530 source location to the destination location.
13532 Note that, unlike the standard libc function, the ``llvm.memcpy.*``
13533 intrinsics do not return a value, take an extra ``isvolatile``
13534 argument and the pointers can be in specified address spaces.
13539 The first argument is a pointer to the destination, the second is a
13540 pointer to the source. The third argument is an integer argument
13541 specifying the number of bytes to copy, and the fourth is a
13542 boolean indicating a volatile access.
13544 The :ref:`align <attr_align>` parameter attribute can be provided
13545 for the first and second arguments.
13547 If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy`` call is
13548 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13549 very cleanly specified and it is unwise to depend on it.
13554 The '``llvm.memcpy.*``' intrinsics copy a block of memory from the source
13555 location to the destination location, which must either be equal or
13556 non-overlapping. It copies "len" bytes of memory over. If the argument is known
13557 to be aligned to some boundary, this can be specified as an attribute on the
13560 If ``<len>`` is 0, it is no-op modulo the behavior of attributes attached to
13562 If ``<len>`` is not a well-defined value, the behavior is undefined.
13563 If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
13564 otherwise the behavior is undefined.
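
A call might look like the following sketch. It assumes the typed-pointer
syntax used in the declarations above, and the function name ``@copy16`` is
made up for illustration:

.. code-block:: llvm

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

    define void @copy16(i8* %dst, i8* %src) {
      ; Copy 16 bytes as a non-volatile operation. The align attributes
      ; state that both pointers are known to be 4-byte aligned.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src,
                                           i64 16, i1 false)
      ret void
    }

The caller is responsible for ensuring that the two 16-byte ranges are either
identical or do not overlap.
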
13566 .. _int_memcpy_inline:
13568 '``llvm.memcpy.inline``' Intrinsic
13569 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13574 This is an overloaded intrinsic. You can use ``llvm.memcpy.inline`` on any
13575 integer bit width and for different address spaces. Not all targets
13576 support all bit widths however.
13580 declare void @llvm.memcpy.inline.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13581 i32 <len>, i1 <isvolatile>)
13582 declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13583 i64 <len>, i1 <isvolatile>)
13588 The '``llvm.memcpy.inline.*``' intrinsics copy a block of memory from the
13589 source location to the destination location and guarantee that no external
13590 functions are called.
13592 Note that, unlike the standard libc function, the ``llvm.memcpy.inline.*``
13593 intrinsics do not return a value, take an extra ``isvolatile``
13594 argument and the pointers can be in specified address spaces.
13599 The first argument is a pointer to the destination, the second is a
13600 pointer to the source. The third argument is a constant integer argument
13601 specifying the number of bytes to copy, and the fourth is a
13602 boolean indicating a volatile access.
13604 The :ref:`align <attr_align>` parameter attribute can be provided
13605 for the first and second arguments.
13607 If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy.inline`` call is
13608 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13609 very cleanly specified and it is unwise to depend on it.
13614 The '``llvm.memcpy.inline.*``' intrinsics copy a block of memory from the
13615 source location to the destination location, which are not allowed to
13616 overlap. It copies "len" bytes of memory over. If the argument is known
13617 to be aligned to some boundary, this can be specified as an attribute on
13619 The behavior of '``llvm.memcpy.inline.*``' is equivalent to the behavior of
13620 '``llvm.memcpy.*``', but the generated code is guaranteed not to call any
13621 external functions.
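
A sketch of a call with the required constant length, again assuming
typed-pointer syntax and a hypothetical function name ``@copy32_inline``:

.. code-block:: llvm

    declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8*, i8*, i64, i1)

    define void @copy32_inline(i8* %dst, i8* %src) {
      ; The length (32) is an immediate constant, as the intrinsic requires.
      call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* %dst, i8* %src,
                                                  i64 32, i1 false)
      ret void
    }
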
13625 '``llvm.memmove``' Intrinsic
13626 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13631 This is an overloaded intrinsic. You can use ``llvm.memmove`` on any integer
13632 bit width and for different address spaces. Not all targets support all
13633 bit widths however.
13637 declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13638 i32 <len>, i1 <isvolatile>)
13639 declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13640 i64 <len>, i1 <isvolatile>)
13645 The '``llvm.memmove.*``' intrinsics move a block of memory from the
13646 source location to the destination location. It is similar to the
13647 '``llvm.memcpy``' intrinsic but allows the two memory locations to
13650 Note that, unlike the standard libc function, the ``llvm.memmove.*``
13651 intrinsics do not return a value, take an extra ``isvolatile``
13652 argument and the pointers can be in specified address spaces.
13657 The first argument is a pointer to the destination, the second is a
13658 pointer to the source. The third argument is an integer argument
13659 specifying the number of bytes to copy, and the fourth is a
13660 boolean indicating a volatile access.
13662 The :ref:`align <attr_align>` parameter attribute can be provided
13663 for the first and second arguments.
13665 If the ``isvolatile`` parameter is ``true``, the ``llvm.memmove`` call
13666 is a :ref:`volatile operation <volatile>`. The detailed access behavior is
13667 not very cleanly specified and it is unwise to depend on it.
13672 The '``llvm.memmove.*``' intrinsics copy a block of memory from the
13673 source location to the destination location, which may overlap. It
13674 copies "len" bytes of memory over. If the argument is known to be
13675 aligned to some boundary, this can be specified as an attribute on
13678 If ``<len>`` is 0, it is no-op modulo the behavior of attributes attached to
13680 If ``<len>`` is not a well-defined value, the behavior is undefined.
13681 If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
13682 otherwise the behavior is undefined.
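
The following sketch shows an overlapping move within a single buffer. The
function name ``@shift_left_by_one`` is hypothetical, and the caller is
assumed to guarantee that ``%buf`` points to at least ``%n + 1`` accessible
bytes:

.. code-block:: llvm

    declare void @llvm.memmove.p0i8.p0i8.i64(i8*, i8*, i64, i1)

    define void @shift_left_by_one(i8* %buf, i64 %n) {
      ; The source starts one byte past the destination, so the two ranges
      ; overlap whenever %n > 1. memmove permits this; memcpy would not.
      %src = getelementptr inbounds i8, i8* %buf, i64 1
      call void @llvm.memmove.p0i8.p0i8.i64(i8* %buf, i8* %src, i64 %n, i1 false)
      ret void
    }
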
13686 '``llvm.memset.*``' Intrinsics
13687 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13692 This is an overloaded intrinsic. You can use ``llvm.memset`` on any integer
13693 bit width and for different address spaces. However, not all targets
13694 support all bit widths.
13698 declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>,
13699 i32 <len>, i1 <isvolatile>)
13700 declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>,
13701 i64 <len>, i1 <isvolatile>)
13706 The '``llvm.memset.*``' intrinsics fill a block of memory with a
13707 particular byte value.
13709 Note that, unlike the standard libc function, the ``llvm.memset``
13710 intrinsic does not return a value and takes an extra volatile
13711 argument. Also, the destination can be in an arbitrary address space.
13716 The first argument is a pointer to the destination to fill, the second
13717 is the byte value with which to fill it, the third argument is an
13718 integer argument specifying the number of bytes to fill, and the fourth
13719 is a boolean indicating a volatile access.
13721 The :ref:`align <attr_align>` parameter attribute can be provided
13722 for the first argument.
13724 If the ``isvolatile`` parameter is ``true``, the ``llvm.memset`` call is
13725 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13726 very cleanly specified and it is unwise to depend on it.
13731 The '``llvm.memset.*``' intrinsics fill "len" bytes of memory starting
13732 at the destination location. If the argument is known to be
13733 aligned to some boundary, this can be specified as an attribute on
13736 If ``<len>`` is 0, it is no-op modulo the behavior of attributes attached to
13738 If ``<len>`` is not a well-defined value, the behavior is undefined.
13739 If ``<len>`` is not zero, ``<dest>`` should be well-defined,
13740 otherwise the behavior is undefined.
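
A minimal sketch of zero-filling a buffer, assuming typed-pointer syntax and a
hypothetical function name ``@zero64``:

.. code-block:: llvm

    declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)

    define void @zero64(i8* %p) {
      ; Fill 64 bytes starting at %p with the byte value 0. The align
      ; attribute states that %p is known to be 8-byte aligned.
      call void @llvm.memset.p0i8.i64(i8* align 8 %p, i8 0, i64 64, i1 false)
      ret void
    }
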
13742 '``llvm.sqrt.*``' Intrinsic
13743 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13748 This is an overloaded intrinsic. You can use ``llvm.sqrt`` on any
13749 floating-point or vector of floating-point type. Not all targets support
13754 declare float @llvm.sqrt.f32(float %Val)
13755 declare double @llvm.sqrt.f64(double %Val)
13756 declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val)
13757 declare fp128 @llvm.sqrt.f128(fp128 %Val)
13758 declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val)
13763 The '``llvm.sqrt``' intrinsics return the square root of the specified value.
13768 The argument and return value are floating-point numbers of the same type.
13773 Return the same value as a corresponding libm '``sqrt``' function but without
13774 trapping or setting ``errno``. For types specified by IEEE-754, the result
13775 matches a conforming libm implementation.
13777 When specified with the fast-math-flag 'afn', the result may be approximated
13778 using a less accurate calculation.
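
As a sketch, the flag is attached to the call itself. The function name
``@approx_sqrt`` is made up for illustration:

.. code-block:: llvm

    declare float @llvm.sqrt.f32(float)

    define float @approx_sqrt(float %x) {
      ; 'afn' permits, but does not require, a less accurate approximation.
      %r = call afn float @llvm.sqrt.f32(float %x)
      ret float %r
    }
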
13780 '``llvm.powi.*``' Intrinsic
13781 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13786 This is an overloaded intrinsic. You can use ``llvm.powi`` on any
13787 floating-point or vector of floating-point type. Not all targets support
13790 Generally, the only supported type for the exponent is the one matching
13791 with the C type ``int``.
13795 declare float @llvm.powi.f32.i32(float %Val, i32 %power)
13796 declare double @llvm.powi.f64.i16(double %Val, i16 %power)
13797 declare x86_fp80 @llvm.powi.f80.i32(x86_fp80 %Val, i32 %power)
13798 declare fp128 @llvm.powi.f128.i32(fp128 %Val, i32 %power)
13799 declare ppc_fp128 @llvm.powi.ppcf128.i32(ppc_fp128 %Val, i32 %power)
13804 The '``llvm.powi.*``' intrinsics return the first operand raised to the
13805 specified (positive or negative) power. The order of evaluation of
13806 multiplications is not defined. When a vector of floating-point type is
13807 used, the second argument remains a scalar integer value.
13812 The second argument is an integer power, and the first is a value to
13813 raise to that power.
13818 This function returns the first value raised to the second power with an
13819 unspecified sequence of rounding operations.
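
For a vector operand the exponent stays a scalar, as in this sketch (the
function name ``@cube_lanes`` is hypothetical):

.. code-block:: llvm

    declare <4 x float> @llvm.powi.v4f32.i32(<4 x float>, i32)

    define <4 x float> @cube_lanes(<4 x float> %v) {
      ; Every lane of %v is raised to the same scalar power, 3.
      %r = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> %v, i32 3)
      ret <4 x float> %r
    }
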
13821 '``llvm.sin.*``' Intrinsic
13822 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13827 This is an overloaded intrinsic. You can use ``llvm.sin`` on any
13828 floating-point or vector of floating-point type. Not all targets support
13833 declare float @llvm.sin.f32(float %Val)
13834 declare double @llvm.sin.f64(double %Val)
13835 declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val)
13836 declare fp128 @llvm.sin.f128(fp128 %Val)
13837 declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val)
13842 The '``llvm.sin.*``' intrinsics return the sine of the operand.
13847 The argument and return value are floating-point numbers of the same type.
13852 Return the same value as a corresponding libm '``sin``' function but without
13853 trapping or setting ``errno``.
13855 When specified with the fast-math-flag 'afn', the result may be approximated
13856 using a less accurate calculation.
13858 '``llvm.cos.*``' Intrinsic
13859 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13864 This is an overloaded intrinsic. You can use ``llvm.cos`` on any
13865 floating-point or vector of floating-point type. Not all targets support
13870 declare float @llvm.cos.f32(float %Val)
13871 declare double @llvm.cos.f64(double %Val)
13872 declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val)
13873 declare fp128 @llvm.cos.f128(fp128 %Val)
13874 declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val)
13879 The '``llvm.cos.*``' intrinsics return the cosine of the operand.
13884 The argument and return value are floating-point numbers of the same type.
13889 Return the same value as a corresponding libm '``cos``' function but without
13890 trapping or setting ``errno``.
13892 When specified with the fast-math-flag 'afn', the result may be approximated
13893 using a less accurate calculation.
13895 '``llvm.pow.*``' Intrinsic
13896 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13901 This is an overloaded intrinsic. You can use ``llvm.pow`` on any
13902 floating-point or vector of floating-point type. Not all targets support
13907 declare float @llvm.pow.f32(float %Val, float %Power)
13908 declare double @llvm.pow.f64(double %Val, double %Power)
13909 declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power)
13910 declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power)
13911 declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 %Power)
13916 The '``llvm.pow.*``' intrinsics return the first operand raised to the
13917 specified (positive or negative) power.
13922 The arguments and return value are floating-point numbers of the same type.
13927 Return the same value as a corresponding libm '``pow``' function but without
13928 trapping or setting ``errno``.
13930 When specified with the fast-math-flag 'afn', the result may be approximated
13931 using a less accurate calculation.
13933 '``llvm.exp.*``' Intrinsic
13934 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13939 This is an overloaded intrinsic. You can use ``llvm.exp`` on any
13940 floating-point or vector of floating-point type. Not all targets support
13945 declare float @llvm.exp.f32(float %Val)
13946 declare double @llvm.exp.f64(double %Val)
13947 declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val)
13948 declare fp128 @llvm.exp.f128(fp128 %Val)
13949 declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val)
13954 The '``llvm.exp.*``' intrinsics compute the base-e exponential of the specified
13960 The argument and return value are floating-point numbers of the same type.
13965 Return the same value as a corresponding libm '``exp``' function but without
13966 trapping or setting ``errno``.
13968 When specified with the fast-math-flag 'afn', the result may be approximated
13969 using a less accurate calculation.
13971 '``llvm.exp2.*``' Intrinsic
13972 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13977 This is an overloaded intrinsic. You can use ``llvm.exp2`` on any
13978 floating-point or vector of floating-point type. Not all targets support
13983 declare float @llvm.exp2.f32(float %Val)
13984 declare double @llvm.exp2.f64(double %Val)
13985 declare x86_fp80 @llvm.exp2.f80(x86_fp80 %Val)
13986 declare fp128 @llvm.exp2.f128(fp128 %Val)
13987 declare ppc_fp128 @llvm.exp2.ppcf128(ppc_fp128 %Val)
13992 The '``llvm.exp2.*``' intrinsics compute the base-2 exponential of the
13998 The argument and return value are floating-point numbers of the same type.
14003 Return the same value as a corresponding libm '``exp2``' function but without
14004 trapping or setting ``errno``.
14006 When specified with the fast-math-flag 'afn', the result may be approximated
14007 using a less accurate calculation.
14009 '``llvm.log.*``' Intrinsic
14010 ^^^^^^^^^^^^^^^^^^^^^^^^^^
14015 This is an overloaded intrinsic. You can use ``llvm.log`` on any
14016 floating-point or vector of floating-point type. Not all targets support
14021 declare float @llvm.log.f32(float %Val)
14022 declare double @llvm.log.f64(double %Val)
14023 declare x86_fp80 @llvm.log.f80(x86_fp80 %Val)
14024 declare fp128 @llvm.log.f128(fp128 %Val)
14025 declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val)
14030 The '``llvm.log.*``' intrinsics compute the base-e logarithm of the specified
14036 The argument and return value are floating-point numbers of the same type.
14041 Return the same value as a corresponding libm '``log``' function but without
14042 trapping or setting ``errno``.
14044 When specified with the fast-math-flag 'afn', the result may be approximated
14045 using a less accurate calculation.
14047 '``llvm.log10.*``' Intrinsic
14048 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14053 This is an overloaded intrinsic. You can use ``llvm.log10`` on any
14054 floating-point or vector of floating-point type. Not all targets support
14059 declare float @llvm.log10.f32(float %Val)
14060 declare double @llvm.log10.f64(double %Val)
14061 declare x86_fp80 @llvm.log10.f80(x86_fp80 %Val)
14062 declare fp128 @llvm.log10.f128(fp128 %Val)
14063 declare ppc_fp128 @llvm.log10.ppcf128(ppc_fp128 %Val)
14068 The '``llvm.log10.*``' intrinsics compute the base-10 logarithm of the
14074 The argument and return value are floating-point numbers of the same type.
14079 Return the same value as a corresponding libm '``log10``' function but without
14080 trapping or setting ``errno``.
14082 When specified with the fast-math-flag 'afn', the result may be approximated
14083 using a less accurate calculation.
14085 '``llvm.log2.*``' Intrinsic
14086 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14091 This is an overloaded intrinsic. You can use ``llvm.log2`` on any
14092 floating-point or vector of floating-point type. Not all targets support
14097 declare float @llvm.log2.f32(float %Val)
14098 declare double @llvm.log2.f64(double %Val)
14099 declare x86_fp80 @llvm.log2.f80(x86_fp80 %Val)
14100 declare fp128 @llvm.log2.f128(fp128 %Val)
14101 declare ppc_fp128 @llvm.log2.ppcf128(ppc_fp128 %Val)
14106 The '``llvm.log2.*``' intrinsics compute the base-2 logarithm of the specified
14112 The argument and return value are floating-point numbers of the same type.
14117 Return the same value as a corresponding libm '``log2``' function but without
14118 trapping or setting ``errno``.
14120 When specified with the fast-math-flag 'afn', the result may be approximated
14121 using a less accurate calculation.
14125 '``llvm.fma.*``' Intrinsic
14126 ^^^^^^^^^^^^^^^^^^^^^^^^^^
14131 This is an overloaded intrinsic. You can use ``llvm.fma`` on any
14132 floating-point or vector of floating-point type. Not all targets support
14137 declare float @llvm.fma.f32(float %a, float %b, float %c)
14138 declare double @llvm.fma.f64(double %a, double %b, double %c)
14139 declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c)
14140 declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c)
14141 declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c)
14146 The '``llvm.fma.*``' intrinsics perform the fused multiply-add operation.
14151 The arguments and return value are floating-point numbers of the same type.
14156 Return the same value as a corresponding libm '``fma``' function but without
14157 trapping or setting ``errno``.
14159 When specified with the fast-math-flag 'afn', the result may be approximated
14160 using a less accurate calculation.
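
A call could be written as in the following sketch, where the function name
``@mul_add`` is made up for illustration:

.. code-block:: llvm

    declare double @llvm.fma.f64(double, double, double)

    define double @mul_add(double %a, double %b, double %c) {
      ; Computes (%a * %b) + %c as a single fused operation, matching what
      ; libm's fma would return.
      %r = call double @llvm.fma.f64(double %a, double %b, double %c)
      ret double %r
    }
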
14162 '``llvm.fabs.*``' Intrinsic
14163 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14168 This is an overloaded intrinsic. You can use ``llvm.fabs`` on any
14169 floating-point or vector of floating-point type. Not all targets support
14174 declare float @llvm.fabs.f32(float %Val)
14175 declare double @llvm.fabs.f64(double %Val)
14176 declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val)
14177 declare fp128 @llvm.fabs.f128(fp128 %Val)
14178 declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val)
14183 The '``llvm.fabs.*``' intrinsics return the absolute value of the
14189 The argument and return value are floating-point numbers of the same
14195 This function returns the same values as the libm ``fabs`` functions
14196 would, and handles error conditions in the same way.
14198 '``llvm.minnum.*``' Intrinsic
14199 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14204 This is an overloaded intrinsic. You can use ``llvm.minnum`` on any
14205 floating-point or vector of floating-point type. Not all targets support
14210 declare float @llvm.minnum.f32(float %Val0, float %Val1)
14211 declare double @llvm.minnum.f64(double %Val0, double %Val1)
14212 declare x86_fp80 @llvm.minnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14213 declare fp128 @llvm.minnum.f128(fp128 %Val0, fp128 %Val1)
14214 declare ppc_fp128 @llvm.minnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14219 The '``llvm.minnum.*``' intrinsics return the minimum of the two
14226 The arguments and return value are floating-point numbers of the same
14232 Follows the IEEE-754 semantics for minNum, except for handling of
14233 signaling NaNs. This matches the behavior of libm's fmin.
14235 If either operand is a NaN, returns the other non-NaN operand. Returns
14236 NaN only if both operands are NaN. The returned NaN is always
14237 quiet. If the operands compare equal, returns a value that compares
14238 equal to both operands. This means that fmin(+/-0.0, +/-0.0) could
14239 return either -0.0 or 0.0.
14241 Unlike the IEEE-754 2008 behavior, this does not distinguish between
14242 signaling and quiet NaN inputs. If a target's implementation follows
14243 the standard and returns a quiet NaN if either input is a signaling
14244 NaN, the intrinsic lowering is responsible for quieting the inputs to
14245 correctly return the non-NaN input (e.g. by using the equivalent of
14246 ``llvm.canonicalize``).
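
The NaN handling described above can be sketched as follows. The function
name ``@min_ignoring_nan`` is hypothetical, and ``0x7FF8000000000000`` is the
IR spelling of a quiet NaN constant:

.. code-block:: llvm

    declare float @llvm.minnum.f32(float, float)

    define float @min_ignoring_nan(float %x) {
      ; One operand is a quiet NaN, so the other operand, %x, is returned.
      ; The result is NaN only if %x is itself NaN.
      %r = call float @llvm.minnum.f32(float 0x7FF8000000000000, float %x)
      ret float %r
    }
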
14249 '``llvm.maxnum.*``' Intrinsic
14250 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14255 This is an overloaded intrinsic. You can use ``llvm.maxnum`` on any
14256 floating-point or vector of floating-point type. Not all targets support
14261 declare float @llvm.maxnum.f32(float %Val0, float %Val1)
14262 declare double @llvm.maxnum.f64(double %Val0, double %Val1)
14263 declare x86_fp80 @llvm.maxnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14264 declare fp128 @llvm.maxnum.f128(fp128 %Val0, fp128 %Val1)
14265 declare ppc_fp128 @llvm.maxnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14270 The '``llvm.maxnum.*``' intrinsics return the maximum of the two
14277 The arguments and return value are floating-point numbers of the same
14282 Follows the IEEE-754 semantics for maxNum except for the handling of
14283 signaling NaNs. This matches the behavior of libm's fmax.
14285 If either operand is a NaN, returns the other non-NaN operand. Returns
14286 NaN only if both operands are NaN. The returned NaN is always
14287 quiet. If the operands compare equal, returns a value that compares
14288 equal to both operands. This means that fmax(+/-0.0, +/-0.0) could
14289 return either -0.0 or 0.0.
14291 Unlike the IEEE-754 2008 behavior, this does not distinguish between
14292 signaling and quiet NaN inputs. If a target's implementation follows
14293 the standard and returns a quiet NaN if either input is a signaling
14294 NaN, the intrinsic lowering is responsible for quieting the inputs to
14295 correctly return the non-NaN input (e.g. by using the equivalent of
14296 ``llvm.canonicalize``).
14298 '``llvm.minimum.*``' Intrinsic
14299 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14304 This is an overloaded intrinsic. You can use ``llvm.minimum`` on any
14305 floating-point or vector of floating-point type. Not all targets support
14310 declare float @llvm.minimum.f32(float %Val0, float %Val1)
14311 declare double @llvm.minimum.f64(double %Val0, double %Val1)
14312 declare x86_fp80 @llvm.minimum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14313 declare fp128 @llvm.minimum.f128(fp128 %Val0, fp128 %Val1)
14314 declare ppc_fp128 @llvm.minimum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14319 The '``llvm.minimum.*``' intrinsics return the minimum of the two
14320 arguments, propagating NaNs and treating -0.0 as less than +0.0.
14326 The arguments and return value are floating-point numbers of the same
14331 If either operand is a NaN, returns NaN. Otherwise returns the lesser
14332 of the two arguments. -0.0 is considered to be less than +0.0 for this
14333 intrinsic. Note that these are the semantics specified in the draft of
14336 '``llvm.maximum.*``' Intrinsic
14337 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14342 This is an overloaded intrinsic. You can use ``llvm.maximum`` on any
14343 floating-point or vector of floating-point type. Not all targets support
14348 declare float @llvm.maximum.f32(float %Val0, float %Val1)
14349 declare double @llvm.maximum.f64(double %Val0, double %Val1)
14350 declare x86_fp80 @llvm.maximum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14351 declare fp128 @llvm.maximum.f128(fp128 %Val0, fp128 %Val1)
14352 declare ppc_fp128 @llvm.maximum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14357 The '``llvm.maximum.*``' intrinsics return the maximum of the two
14358 arguments, propagating NaNs and treating -0.0 as less than +0.0.
14364 The arguments and return value are floating-point numbers of the same
14369 If either operand is a NaN, returns NaN. Otherwise returns the greater
14370 of the two arguments. -0.0 is considered to be less than +0.0 for this
14371 intrinsic. Note that these are the semantics specified in the draft of
14374 '``llvm.copysign.*``' Intrinsic
14375 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14380 This is an overloaded intrinsic. You can use ``llvm.copysign`` on any
14381 floating-point or vector of floating-point type. Not all targets support
14386 declare float @llvm.copysign.f32(float %Mag, float %Sgn)
14387 declare double @llvm.copysign.f64(double %Mag, double %Sgn)
14388 declare x86_fp80 @llvm.copysign.f80(x86_fp80 %Mag, x86_fp80 %Sgn)
14389 declare fp128 @llvm.copysign.f128(fp128 %Mag, fp128 %Sgn)
14390 declare ppc_fp128 @llvm.copysign.ppcf128(ppc_fp128 %Mag, ppc_fp128 %Sgn)
14395 The '``llvm.copysign.*``' intrinsics return a value with the magnitude of the
14396 first operand and the sign of the second operand.
14401 The arguments and return value are floating-point numbers of the same
14407 This function returns the same values as the libm ``copysign``
14408 functions would, and handles error conditions in the same way.
14410 '``llvm.floor.*``' Intrinsic
14411 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14416 This is an overloaded intrinsic. You can use ``llvm.floor`` on any
14417 floating-point or vector of floating-point type. Not all targets support
14422 declare float @llvm.floor.f32(float %Val)
14423 declare double @llvm.floor.f64(double %Val)
14424 declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val)
14425 declare fp128 @llvm.floor.f128(fp128 %Val)
14426 declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val)
14431 The '``llvm.floor.*``' intrinsics return the floor of the operand.
14436 The argument and return value are floating-point numbers of the same
14442 This function returns the same values as the libm ``floor`` functions
14443 would, and handles error conditions in the same way.
14445 '``llvm.ceil.*``' Intrinsic
14446 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14451 This is an overloaded intrinsic. You can use ``llvm.ceil`` on any
14452 floating-point or vector of floating-point type. Not all targets support
14457 declare float @llvm.ceil.f32(float %Val)
14458 declare double @llvm.ceil.f64(double %Val)
14459 declare x86_fp80 @llvm.ceil.f80(x86_fp80 %Val)
14460 declare fp128 @llvm.ceil.f128(fp128 %Val)
14461 declare ppc_fp128 @llvm.ceil.ppcf128(ppc_fp128 %Val)
14466 The '``llvm.ceil.*``' intrinsics return the ceiling of the operand.
14471 The argument and return value are floating-point numbers of the same
14477 This function returns the same values as the libm ``ceil`` functions
14478 would, and handles error conditions in the same way.
14480 '``llvm.trunc.*``' Intrinsic
14481 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14486 This is an overloaded intrinsic. You can use ``llvm.trunc`` on any
14487 floating-point or vector of floating-point type. Not all targets support
14492 declare float @llvm.trunc.f32(float %Val)
14493 declare double @llvm.trunc.f64(double %Val)
14494 declare x86_fp80 @llvm.trunc.f80(x86_fp80 %Val)
14495 declare fp128 @llvm.trunc.f128(fp128 %Val)
14496 declare ppc_fp128 @llvm.trunc.ppcf128(ppc_fp128 %Val)
14501 The '``llvm.trunc.*``' intrinsics return the operand rounded to the
14502 nearest integer not larger in magnitude than the operand.
14507 The argument and return value are floating-point numbers of the same
14513 This function returns the same values as the libm ``trunc`` functions
14514 would, and handles error conditions in the same way.
14516 '``llvm.rint.*``' Intrinsic
14517 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14522 This is an overloaded intrinsic. You can use ``llvm.rint`` on any
14523 floating-point or vector of floating-point type. Not all targets support
14528 declare float @llvm.rint.f32(float %Val)
14529 declare double @llvm.rint.f64(double %Val)
14530 declare x86_fp80 @llvm.rint.f80(x86_fp80 %Val)
14531 declare fp128 @llvm.rint.f128(fp128 %Val)
14532 declare ppc_fp128 @llvm.rint.ppcf128(ppc_fp128 %Val)
14537 The '``llvm.rint.*``' intrinsics return the operand rounded to the
14538 nearest integer. It may raise an inexact floating-point exception if the
14539 operand isn't an integer.
14544 The argument and return value are floating-point numbers of the same
14550 This function returns the same values as the libm ``rint`` functions
14551 would, and handles error conditions in the same way.
14553 '``llvm.nearbyint.*``' Intrinsic
14554 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14559 This is an overloaded intrinsic. You can use ``llvm.nearbyint`` on any
14560 floating-point or vector of floating-point type. Not all targets support
14565 declare float @llvm.nearbyint.f32(float %Val)
14566 declare double @llvm.nearbyint.f64(double %Val)
14567 declare x86_fp80 @llvm.nearbyint.f80(x86_fp80 %Val)
14568 declare fp128 @llvm.nearbyint.f128(fp128 %Val)
14569 declare ppc_fp128 @llvm.nearbyint.ppcf128(ppc_fp128 %Val)
14574 The '``llvm.nearbyint.*``' intrinsics return the operand rounded to the
14580 The argument and return value are floating-point numbers of the same
14586 This function returns the same values as the libm ``nearbyint``
14587 functions would, and handles error conditions in the same way.
14589 '``llvm.round.*``' Intrinsic
14590 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14595 This is an overloaded intrinsic. You can use ``llvm.round`` on any
14596 floating-point or vector of floating-point type. Not all targets support
14601 declare float @llvm.round.f32(float %Val)
14602 declare double @llvm.round.f64(double %Val)
14603 declare x86_fp80 @llvm.round.f80(x86_fp80 %Val)
14604 declare fp128 @llvm.round.f128(fp128 %Val)
14605 declare ppc_fp128 @llvm.round.ppcf128(ppc_fp128 %Val)
14610 The '``llvm.round.*``' intrinsics return the operand rounded to the
14616 The argument and return value are floating-point numbers of the same
14622 This function returns the same values as the libm ``round``
14623 functions would, and handles error conditions in the same way.
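
Because the libm ``round`` functions round halfway cases away from zero, a call can be sketched as below. The function name ``@round_half_away`` is hypothetical.

.. code-block:: llvm

      declare float @llvm.round.f32(float)

      ; Halfway cases move away from zero:
      ;   0.5 -> 1.0, 2.5 -> 3.0, -2.5 -> -3.0, 2.25 -> 2.0
      define float @round_half_away(float %x) {
        %r = call float @llvm.round.f32(float %x)
        ret float %r
      }
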
14625 '``llvm.roundeven.*``' Intrinsic
14626 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.roundeven`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14637 declare float @llvm.roundeven.f32(float %Val)
14638 declare double @llvm.roundeven.f64(double %Val)
14639 declare x86_fp80 @llvm.roundeven.f80(x86_fp80 %Val)
14640 declare fp128 @llvm.roundeven.f128(fp128 %Val)
14641 declare ppc_fp128 @llvm.roundeven.ppcf128(ppc_fp128 %Val)
The '``llvm.roundeven.*``' intrinsics return the operand rounded to the nearest
integer in floating-point format, rounding halfway cases to even (that is, to the
nearest value that is an even integer).
14653 The argument and return value are floating-point numbers of the same type.
This function implements the IEEE-754 operation ``roundToIntegralTiesToEven``. It
also behaves in the same way as the C standard function ``roundeven``, except that
it does not raise floating-point exceptions.
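
The ties-to-even rule is easiest to see on halfway inputs. The sketch below is illustrative only, and the function name ``@round_ties_even`` is hypothetical.

.. code-block:: llvm

      declare double @llvm.roundeven.f64(double)

      ; Halfway cases go to the even neighbor:
      ;   0.5 -> 0.0, 1.5 -> 2.0, 2.5 -> 2.0, -2.5 -> -2.0
      ; Other inputs go to the nearest integer: 2.75 -> 3.0
      define double @round_ties_even(double %x) {
        %r = call double @llvm.roundeven.f64(double %x)
        ret double %r
      }
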
14663 '``llvm.lround.*``' Intrinsic
14664 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14669 This is an overloaded intrinsic. You can use ``llvm.lround`` on any
14670 floating-point type. Not all targets support all types however.
declare i32 @llvm.lround.i32.f32(float %Val)
declare i32 @llvm.lround.i32.f64(double %Val)
declare i32 @llvm.lround.i32.f80(x86_fp80 %Val)
declare i32 @llvm.lround.i32.f128(fp128 %Val)
declare i32 @llvm.lround.i32.ppcf128(ppc_fp128 %Val)

declare i64 @llvm.lround.i64.f32(float %Val)
declare i64 @llvm.lround.i64.f64(double %Val)
declare i64 @llvm.lround.i64.f80(x86_fp80 %Val)
declare i64 @llvm.lround.i64.f128(fp128 %Val)
declare i64 @llvm.lround.i64.ppcf128(ppc_fp128 %Val)
14689 The '``llvm.lround.*``' intrinsics return the operand rounded to the nearest
14690 integer with ties away from zero.
The argument is a floating-point number and the return value is an integer
type.
14702 This function returns the same values as the libm ``lround``
14703 functions would, but without setting errno.
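
A minimal sketch of converting a ``float`` to the nearest ``i32`` is shown below. The function name ``@nearest_i32`` is hypothetical.

.. code-block:: llvm

      declare i32 @llvm.lround.i32.f32(float)

      ; Ties round away from zero:
      ;   2.5 -> 3, -2.5 -> -3, 2.25 -> 2
      define i32 @nearest_i32(float %x) {
        %r = call i32 @llvm.lround.i32.f32(float %x)
        ret i32 %r
      }

The two suffixes in the intrinsic name give the result type first and the argument type second.
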
14705 '``llvm.llround.*``' Intrinsic
14706 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14711 This is an overloaded intrinsic. You can use ``llvm.llround`` on any
14712 floating-point type. Not all targets support all types however.
declare i64 @llvm.llround.i64.f32(float %Val)
declare i64 @llvm.llround.i64.f64(double %Val)
declare i64 @llvm.llround.i64.f80(x86_fp80 %Val)
declare i64 @llvm.llround.i64.f128(fp128 %Val)
declare i64 @llvm.llround.i64.ppcf128(ppc_fp128 %Val)
14725 The '``llvm.llround.*``' intrinsics return the operand rounded to the nearest
14726 integer with ties away from zero.
The argument is a floating-point number and the return value is an integer
type.
14737 This function returns the same values as the libm ``llround``
14738 functions would, but without setting errno.
14740 '``llvm.lrint.*``' Intrinsic
14741 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14746 This is an overloaded intrinsic. You can use ``llvm.lrint`` on any
14747 floating-point type. Not all targets support all types however.
declare i32 @llvm.lrint.i32.f32(float %Val)
declare i32 @llvm.lrint.i32.f64(double %Val)
declare i32 @llvm.lrint.i32.f80(x86_fp80 %Val)
declare i32 @llvm.lrint.i32.f128(fp128 %Val)
declare i32 @llvm.lrint.i32.ppcf128(ppc_fp128 %Val)

declare i64 @llvm.lrint.i64.f32(float %Val)
declare i64 @llvm.lrint.i64.f64(double %Val)
declare i64 @llvm.lrint.i64.f80(x86_fp80 %Val)
declare i64 @llvm.lrint.i64.f128(fp128 %Val)
declare i64 @llvm.lrint.i64.ppcf128(ppc_fp128 %Val)
The '``llvm.lrint.*``' intrinsics return the operand rounded to the nearest
integer.

The argument is a floating-point number and the return value is an integer
type.
14779 This function returns the same values as the libm ``lrint``
14780 functions would, but without setting errno.
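
The libm ``lrint`` functions round according to the current rounding mode. The sketch below assumes the default mode (round to nearest, ties to even); the function name ``@rint_to_i64`` is hypothetical.

.. code-block:: llvm

      declare i64 @llvm.lrint.i64.f64(double)

      ; Under the assumed default rounding mode:
      ;   2.5 -> 2, 3.5 -> 4, -2.5 -> -2
      define i64 @rint_to_i64(double %x) {
        %r = call i64 @llvm.lrint.i64.f64(double %x)
        ret i64 %r
      }
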
14782 '``llvm.llrint.*``' Intrinsic
14783 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14788 This is an overloaded intrinsic. You can use ``llvm.llrint`` on any
14789 floating-point type. Not all targets support all types however.
declare i64 @llvm.llrint.i64.f32(float %Val)
declare i64 @llvm.llrint.i64.f64(double %Val)
declare i64 @llvm.llrint.i64.f80(x86_fp80 %Val)
declare i64 @llvm.llrint.i64.f128(fp128 %Val)
declare i64 @llvm.llrint.i64.ppcf128(ppc_fp128 %Val)
The '``llvm.llrint.*``' intrinsics return the operand rounded to the nearest
integer.

The argument is a floating-point number and the return value is an integer
type.
14814 This function returns the same values as the libm ``llrint``
14815 functions would, but without setting errno.
14817 Bit Manipulation Intrinsics
14818 ---------------------------
14820 LLVM provides intrinsics for a few important bit manipulation
14821 operations. These allow efficient code generation for some algorithms.
14823 '``llvm.bitreverse.*``' Intrinsics
14824 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic function. You can use bitreverse on any
integer type.
14834 declare i16 @llvm.bitreverse.i16(i16 <id>)
14835 declare i32 @llvm.bitreverse.i32(i32 <id>)
14836 declare i64 @llvm.bitreverse.i64(i64 <id>)
14837 declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> <id>)
14842 The '``llvm.bitreverse``' family of intrinsics is used to reverse the
14843 bitpattern of an integer value or vector of integer values; for example
14844 ``0b10110110`` becomes ``0b01101101``.
The ``llvm.bitreverse.iN`` intrinsic returns an iN value that has bit
``M`` in the input moved to bit ``N-M-1`` in the output. The vector
14851 intrinsics, such as ``llvm.bitreverse.v4i32``, operate on a per-element
14852 basis and the element order is not affected.
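
The 8-bit pattern ``0b10110110`` can be checked with a small sketch. It uses the ``i8`` overload, and the function name ``@reverse_example`` is hypothetical.

.. code-block:: llvm

      declare i8 @llvm.bitreverse.i8(i8)

      define i8 @reverse_example() {
        ; 182 is 0b10110110; reversing the bits gives 0b01101101, which is 109.
        %r = call i8 @llvm.bitreverse.i8(i8 182)
        ret i8 %r
      }
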
14854 '``llvm.bswap.*``' Intrinsics
14855 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14860 This is an overloaded intrinsic function. You can use bswap on any
14861 integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).
14865 declare i16 @llvm.bswap.i16(i16 <id>)
14866 declare i32 @llvm.bswap.i32(i32 <id>)
14867 declare i64 @llvm.bswap.i64(i64 <id>)
14868 declare <4 x i32> @llvm.bswap.v4i32(<4 x i32> <id>)
14873 The '``llvm.bswap``' family of intrinsics is used to byte swap an integer
14874 value or vector of integer values with an even number of bytes (positive
14875 multiple of 16 bits).
14880 The ``llvm.bswap.i16`` intrinsic returns an i16 value that has the high
14881 and low byte of the input i16 swapped. Similarly, the ``llvm.bswap.i32``
14882 intrinsic returns an i32 value that has the four bytes of the input i32
14883 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the
14884 returned i32 will have its bytes in 3, 2, 1, 0 order. The
14885 ``llvm.bswap.i48``, ``llvm.bswap.i64`` and other intrinsics extend this
14886 concept to additional even-byte lengths (6 bytes, 8 bytes and more,
14887 respectively). The vector intrinsics, such as ``llvm.bswap.v4i32``,
14888 operate on a per-element basis and the element order is not affected.
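
As an illustrative sketch, the byte orders described above can be written out with concrete constants. The function names are hypothetical, and the hexadecimal values appear only in comments.

.. code-block:: llvm

      declare i16 @llvm.bswap.i16(i16)
      declare i32 @llvm.bswap.i32(i32)

      define i16 @bswap16_example() {
        ; 4660 is 0x1234; swapping the two bytes gives 0x3412, which is 13330.
        %r = call i16 @llvm.bswap.i16(i16 4660)
        ret i16 %r
      }

      define i32 @bswap32_example() {
        ; 305419896 is 0x12345678; the result 0x78563412 is 2018915346.
        %r = call i32 @llvm.bswap.i32(i32 305419896)
        ret i32 %r
      }
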
14890 '``llvm.ctpop.*``' Intrinsic
14891 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14896 This is an overloaded intrinsic. You can use llvm.ctpop on any integer
14897 bit width, or on any vector with integer elements. Not all targets
14898 support all bit widths or vector types, however.
14902 declare i8 @llvm.ctpop.i8(i8 <src>)
14903 declare i16 @llvm.ctpop.i16(i16 <src>)
14904 declare i32 @llvm.ctpop.i32(i32 <src>)
14905 declare i64 @llvm.ctpop.i64(i64 <src>)
14906 declare i256 @llvm.ctpop.i256(i256 <src>)
14907 declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)
The '``llvm.ctpop``' family of intrinsics counts the number of bits set
in a value.
14918 The only argument is the value to be counted. The argument may be of any
14919 integer type, or a vector with integer elements. The return type must
14920 match the argument type.
14925 The '``llvm.ctpop``' intrinsic counts the 1's in a variable, or within
14926 each element of a vector.
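
One common use, sketched here under the assumption that ``%x`` is treated as unsigned, is testing whether a value is a power of two: exactly one bit is set. The function name ``@is_power_of_two`` is hypothetical.

.. code-block:: llvm

      declare i32 @llvm.ctpop.i32(i32)

      ; 8 (0b1000) has one bit set     -> true
      ; 12 (0b1100) has two bits set   -> false
      ; 0 has no bits set              -> false
      define i1 @is_power_of_two(i32 %x) {
        %n = call i32 @llvm.ctpop.i32(i32 %x)
        %p = icmp eq i32 %n, 1
        ret i1 %p
      }
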
14928 '``llvm.ctlz.*``' Intrinsic
14929 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14934 This is an overloaded intrinsic. You can use ``llvm.ctlz`` on any
14935 integer bit width, or any vector whose elements are integers. Not all
14936 targets support all bit widths or vector types, however.
14940 declare i8 @llvm.ctlz.i8 (i8 <src>, i1 <is_zero_undef>)
14941 declare i16 @llvm.ctlz.i16 (i16 <src>, i1 <is_zero_undef>)
14942 declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>)
14943 declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>)
14944 declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>)
14945 declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
14950 The '``llvm.ctlz``' family of intrinsic functions counts the number of
14951 leading zeros in a variable.
14956 The first argument is the value to be counted. This argument may be of
14957 any integer type, or a vector with integer element type. The return
14958 type must match the first argument type.
14960 The second argument must be a constant and is a flag to indicate whether
14961 the intrinsic should ensure that a zero as the first argument produces a
14962 defined result. Historically some architectures did not provide a
14963 defined result for zero values as efficiently, and many algorithms are
14964 now predicated on avoiding zero-value inputs.
14969 The '``llvm.ctlz``' intrinsic counts the leading (most significant)
14970 zeros in a variable, or within each element of the vector. If
14971 ``src == 0`` then the result is the size in bits of the type of ``src``
14972 if ``is_zero_undef == 0`` and ``undef`` otherwise. For example,
14973 ``llvm.ctlz(i32 2) = 30``.
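
A typical use is computing the index of the highest set bit. The sketch below passes ``i1 false`` so that a zero input yields the defined result 32; the function name ``@floor_log2`` is hypothetical.

.. code-block:: llvm

      declare i32 @llvm.ctlz.i32(i32, i1)

      ; For nonzero %x this returns floor(log2(%x)):
      ;   1 -> 0, 2 -> 1, 255 -> 7
      ; For %x == 0 the count is 32, so the function returns -1.
      define i32 @floor_log2(i32 %x) {
        %lz = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
        %log = sub i32 31, %lz
        ret i32 %log
      }

If the caller can guarantee a nonzero input, passing ``i1 true`` instead leaves the zero case undefined, which may allow cheaper code on some targets.
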
14975 '``llvm.cttz.*``' Intrinsic
14976 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14981 This is an overloaded intrinsic. You can use ``llvm.cttz`` on any
14982 integer bit width, or any vector of integer elements. Not all targets
14983 support all bit widths or vector types, however.
14987 declare i8 @llvm.cttz.i8 (i8 <src>, i1 <is_zero_undef>)
14988 declare i16 @llvm.cttz.i16 (i16 <src>, i1 <is_zero_undef>)
14989 declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>)
14990 declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>)
14991 declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>)
14992 declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
The '``llvm.cttz``' family of intrinsic functions counts the number of
trailing zeros.
15003 The first argument is the value to be counted. This argument may be of
15004 any integer type, or a vector with integer element type. The return
15005 type must match the first argument type.
15007 The second argument must be a constant and is a flag to indicate whether
15008 the intrinsic should ensure that a zero as the first argument produces a
15009 defined result. Historically some architectures did not provide a
15010 defined result for zero values as efficiently, and many algorithms are
15011 now predicated on avoiding zero-value inputs.
15016 The '``llvm.cttz``' intrinsic counts the trailing (least significant)
15017 zeros in a variable, or within each element of a vector. If ``src == 0``
15018 then the result is the size in bits of the type of ``src`` if
15019 ``is_zero_undef == 0`` and ``undef`` otherwise. For example,
15020 ``llvm.cttz(2) = 1``.
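
The trailing-zero count is also the index of the lowest set bit. A minimal sketch with a defined result for zero is shown below; the function name ``@lowest_set_bit_index`` is hypothetical.

.. code-block:: llvm

      declare i32 @llvm.cttz.i32(i32, i1)

      ; 2 (0b10) -> 1, 8 (0b1000) -> 3, 12 (0b1100) -> 2
      ; With i1 false, a zero input returns the bit width: 0 -> 32
      define i32 @lowest_set_bit_index(i32 %x) {
        %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
        ret i32 %tz
      }
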
15024 '``llvm.fshl.*``' Intrinsic
15025 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
15030 This is an overloaded intrinsic. You can use ``llvm.fshl`` on any
15031 integer bit width or any vector of integer elements. Not all targets
15032 support all bit widths or vector types, however.
15036 declare i8 @llvm.fshl.i8 (i8 %a, i8 %b, i8 %c)
15037 declare i67 @llvm.fshl.i67(i67 %a, i67 %b, i67 %c)
15038 declare <2 x i32> @llvm.fshl.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
15043 The '``llvm.fshl``' family of intrinsic functions performs a funnel shift left:
15044 the first two values are concatenated as { %a : %b } (%a is the most significant
15045 bits of the wide value), the combined value is shifted left, and the most
15046 significant bits are extracted to produce a result that is the same size as the
15047 original arguments. If the first 2 arguments are identical, this is equivalent
15048 to a rotate left operation. For vector types, the operation occurs for each
15049 element of the vector. The shift argument is treated as an unsigned amount
15050 modulo the element size of the arguments.
15055 The first two arguments are the values to be concatenated. The third
15056 argument is the shift amount. The arguments may be any integer type or a
15057 vector with integer element type. All arguments and the return value must
15058 have the same type.
15063 .. code-block:: text
15065 %r = call i8 @llvm.fshl.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: msb_extract((concat(x, y) << (z % 8)), 8)
15066 %r = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15) ; %r = i8: 128 (0b10000000)
15067 %r = call i8 @llvm.fshl.i8(i8 15, i8 15, i8 11) ; %r = i8: 120 (0b01111000)
15068 %r = call i8 @llvm.fshl.i8(i8 0, i8 255, i8 8) ; %r = i8: 0 (0b00000000)
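
Passing the same value for the first two arguments gives a rotate left, as noted in the overview. The sketch below is illustrative, and the function name ``@rotl8`` is hypothetical.

.. code-block:: llvm

      declare i8 @llvm.fshl.i8(i8, i8, i8)

      ; Rotating 129 (0b10000001) left by 1 gives 3 (0b00000011).
      ; The shift amount is taken modulo 8, so rotating by 9 gives the same result.
      define i8 @rotl8(i8 %x, i8 %n) {
        %r = call i8 @llvm.fshl.i8(i8 %x, i8 %x, i8 %n)
        ret i8 %r
      }
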
15070 '``llvm.fshr.*``' Intrinsic
15071 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
15076 This is an overloaded intrinsic. You can use ``llvm.fshr`` on any
15077 integer bit width or any vector of integer elements. Not all targets
15078 support all bit widths or vector types, however.
15082 declare i8 @llvm.fshr.i8 (i8 %a, i8 %b, i8 %c)
15083 declare i67 @llvm.fshr.i67(i67 %a, i67 %b, i67 %c)
15084 declare <2 x i32> @llvm.fshr.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
15089 The '``llvm.fshr``' family of intrinsic functions performs a funnel shift right:
15090 the first two values are concatenated as { %a : %b } (%a is the most significant
15091 bits of the wide value), the combined value is shifted right, and the least
15092 significant bits are extracted to produce a result that is the same size as the
15093 original arguments. If the first 2 arguments are identical, this is equivalent
15094 to a rotate right operation. For vector types, the operation occurs for each
15095 element of the vector. The shift argument is treated as an unsigned amount
15096 modulo the element size of the arguments.
15101 The first two arguments are the values to be concatenated. The third
15102 argument is the shift amount. The arguments may be any integer type or a
15103 vector with integer element type. All arguments and the return value must
15104 have the same type.
15109 .. code-block:: text
15111 %r = call i8 @llvm.fshr.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: lsb_extract((concat(x, y) >> (z % 8)), 8)
15112 %r = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15) ; %r = i8: 254 (0b11111110)
15113 %r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
15114 %r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
15116 Arithmetic with Overflow Intrinsics
15117 -----------------------------------
15119 LLVM provides intrinsics for fast arithmetic overflow checking.
15121 Each of these intrinsics returns a two-element struct. The first
15122 element of this struct contains the result of the corresponding
15123 arithmetic operation modulo 2\ :sup:`n`\ , where n is the bit width of
15124 the result. Therefore, for example, the first element of the struct
15125 returned by ``llvm.sadd.with.overflow.i32`` is always the same as the
15126 result of a 32-bit ``add`` instruction with the same operands, where
15127 the ``add`` is *not* modified by an ``nsw`` or ``nuw`` flag.
15129 The second element of the result is an ``i1`` that is 1 if the
15130 arithmetic operation overflowed and 0 otherwise. An operation
15131 overflows if, for any values of its operands ``A`` and ``B`` and for
15132 any ``N`` larger than the operands' width, ``ext(A op B) to iN`` is
15133 not equal to ``(ext(A) to iN) op (ext(B) to iN)`` where ``ext`` is
15134 ``sext`` for signed overflow and ``zext`` for unsigned overflow, and
15135 ``op`` is the underlying arithmetic operation.
The behavior of these intrinsics is well-defined for all argument
values.
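
As a worked illustration of the two result elements, consider adding 100 and 100 at 8 bits. The sketch uses the ``i8`` overloads, and the function names are hypothetical.

.. code-block:: llvm

      declare {i8, i1} @llvm.sadd.with.overflow.i8(i8, i8)
      declare {i8, i1} @llvm.uadd.with.overflow.i8(i8, i8)

      define i1 @signed_case() {
        ; 200 does not fit in a signed i8 (maximum 127).
        ; Element 0 is the wrapped sum (bit pattern 0xC8, i.e. -56); element 1 is 1.
        %s = call {i8, i1} @llvm.sadd.with.overflow.i8(i8 100, i8 100)
        %s.ov = extractvalue {i8, i1} %s, 1
        ret i1 %s.ov
      }

      define i1 @unsigned_case() {
        ; 200 fits in an unsigned i8 (maximum 255).
        ; Element 0 has the same bit pattern 0xC8 (200 unsigned); element 1 is 0.
        %u = call {i8, i1} @llvm.uadd.with.overflow.i8(i8 100, i8 100)
        %u.ov = extractvalue {i8, i1} %u, 1
        ret i1 %u.ov
      }

In both cases the first element equals the result of a plain 8-bit ``add``; only the overflow bit differs between the signed and unsigned interpretations.
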
15140 '``llvm.sadd.with.overflow.*``' Intrinsics
15141 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15146 This is an overloaded intrinsic. You can use ``llvm.sadd.with.overflow``
15147 on any integer bit width or vectors of integers.
15151 declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
15152 declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
15153 declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
15154 declare {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15159 The '``llvm.sadd.with.overflow``' family of intrinsic functions perform
15160 a signed addition of the two arguments, and indicate whether an overflow
15161 occurred during the signed summation.
The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
addition.

The '``llvm.sadd.with.overflow``' family of intrinsic functions performs
a signed addition of the two arguments. They return a structure, the
first element of which is the signed sum and the second element
of which is a bit specifying whether the signed summation resulted in an
overflow.
15184 .. code-block:: llvm
15186 %res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
15187 %sum = extractvalue {i32, i1} %res, 0
15188 %obit = extractvalue {i32, i1} %res, 1
15189 br i1 %obit, label %overflow, label %normal
15191 '``llvm.uadd.with.overflow.*``' Intrinsics
15192 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15197 This is an overloaded intrinsic. You can use ``llvm.uadd.with.overflow``
15198 on any integer bit width or vectors of integers.
15202 declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
15203 declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
15204 declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
15205 declare {<4 x i32>, <4 x i1>} @llvm.uadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15210 The '``llvm.uadd.with.overflow``' family of intrinsic functions perform
15211 an unsigned addition of the two arguments, and indicate whether a carry
15212 occurred during the unsigned summation.
The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
addition.
15226 The '``llvm.uadd.with.overflow``' family of intrinsic functions perform
15227 an unsigned addition of the two arguments. They return a structure --- the
15228 first element of which is the sum, and the second element of which is a
15229 bit specifying if the unsigned summation resulted in a carry.
15234 .. code-block:: llvm
15236 %res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
15237 %sum = extractvalue {i32, i1} %res, 0
15238 %obit = extractvalue {i32, i1} %res, 1
15239 br i1 %obit, label %carry, label %normal
15241 '``llvm.ssub.with.overflow.*``' Intrinsics
15242 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15247 This is an overloaded intrinsic. You can use ``llvm.ssub.with.overflow``
15248 on any integer bit width or vectors of integers.
15252 declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b)
15253 declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
15254 declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b)
15255 declare {<4 x i32>, <4 x i1>} @llvm.ssub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15260 The '``llvm.ssub.with.overflow``' family of intrinsic functions perform
15261 a signed subtraction of the two arguments, and indicate whether an
15262 overflow occurred during the signed subtraction.
The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
subtraction.

The '``llvm.ssub.with.overflow``' family of intrinsic functions performs
a signed subtraction of the two arguments. They return a structure, the
first element of which is the difference and the second element of
which is a bit specifying whether the signed subtraction resulted in an
overflow.
15285 .. code-block:: llvm
15287 %res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
15288 %sum = extractvalue {i32, i1} %res, 0
15289 %obit = extractvalue {i32, i1} %res, 1
15290 br i1 %obit, label %overflow, label %normal
15292 '``llvm.usub.with.overflow.*``' Intrinsics
15293 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15298 This is an overloaded intrinsic. You can use ``llvm.usub.with.overflow``
15299 on any integer bit width or vectors of integers.
15303 declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b)
15304 declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
15305 declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
15306 declare {<4 x i32>, <4 x i1>} @llvm.usub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15311 The '``llvm.usub.with.overflow``' family of intrinsic functions perform
15312 an unsigned subtraction of the two arguments, and indicate whether an
15313 overflow occurred during the unsigned subtraction.
The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
subtraction.

The '``llvm.usub.with.overflow``' family of intrinsic functions performs
an unsigned subtraction of the two arguments. They return a structure,
the first element of which is the difference and the second element of
which is a bit specifying whether the unsigned subtraction resulted in an
overflow.
15336 .. code-block:: llvm
15338 %res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
15339 %sum = extractvalue {i32, i1} %res, 0
15340 %obit = extractvalue {i32, i1} %res, 1
15341 br i1 %obit, label %overflow, label %normal
15343 '``llvm.smul.with.overflow.*``' Intrinsics
15344 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15349 This is an overloaded intrinsic. You can use ``llvm.smul.with.overflow``
15350 on any integer bit width or vectors of integers.
15354 declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b)
15355 declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
15356 declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b)
15357 declare {<4 x i32>, <4 x i1>} @llvm.smul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15362 The '``llvm.smul.with.overflow``' family of intrinsic functions perform
15363 a signed multiplication of the two arguments, and indicate whether an
15364 overflow occurred during the signed multiplication.
The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
multiplication.

The '``llvm.smul.with.overflow``' family of intrinsic functions performs
a signed multiplication of the two arguments. They return a structure,
the first element of which is the product and the second element
of which is a bit specifying whether the signed multiplication resulted in an
overflow.
15387 .. code-block:: llvm
15389 %res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
15390 %sum = extractvalue {i32, i1} %res, 0
15391 %obit = extractvalue {i32, i1} %res, 1
15392 br i1 %obit, label %overflow, label %normal
15394 '``llvm.umul.with.overflow.*``' Intrinsics
15395 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15400 This is an overloaded intrinsic. You can use ``llvm.umul.with.overflow``
15401 on any integer bit width or vectors of integers.
15405 declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
15406 declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
15407 declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
15408 declare {<4 x i32>, <4 x i1>} @llvm.umul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
The '``llvm.umul.with.overflow``' family of intrinsic functions performs
an unsigned multiplication of the two arguments, and indicates whether an
overflow occurred during the unsigned multiplication.

The arguments (``%a`` and ``%b``) and the first element of the result structure
may be of integer types of any bit width, but they must have the same
bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
multiplication.
15429 The '``llvm.umul.with.overflow``' family of intrinsic functions perform
15430 an unsigned multiplication of the two arguments. They return a structure ---
15431 the first element of which is the multiplication, and the second
15432 element of which is a bit specifying if the unsigned multiplication
15433 resulted in an overflow.
15438 .. code-block:: llvm
15440 %res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
15441 %sum = extractvalue {i32, i1} %res, 0
15442 %obit = extractvalue {i32, i1} %res, 1
15443 br i1 %obit, label %overflow, label %normal
15445 Saturation Arithmetic Intrinsics
15446 ---------------------------------
Saturation arithmetic is a version of arithmetic in which operations are
limited to a fixed range between a minimum and maximum value. If the result of
an operation is greater than the maximum value, the result is set (or
"clamped") to this maximum. If it is below the minimum, it is clamped to this
minimum.
15455 '``llvm.sadd.sat.*``' Intrinsics
15456 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15461 This is an overloaded intrinsic. You can use ``llvm.sadd.sat``
15462 on any integer bit width or vectors of integers.
15466 declare i16 @llvm.sadd.sat.i16(i16 %a, i16 %b)
15467 declare i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)
15468 declare i64 @llvm.sadd.sat.i64(i64 %a, i64 %b)
15469 declare <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15474 The '``llvm.sadd.sat``' family of intrinsic functions perform signed
15475 saturating addition on the 2 arguments.
15480 The arguments (%a and %b) and the result may be of integer types of any bit
15481 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15482 values that will undergo signed addition.
15487 The maximum value this operation can clamp to is the largest signed value
15488 representable by the bit width of the arguments. The minimum value is the
15489 smallest signed value representable by this bit width.
15495 .. code-block:: llvm
15497 %res = call i4 @llvm.sadd.sat.i4(i4 1, i4 2) ; %res = 3
15498 %res = call i4 @llvm.sadd.sat.i4(i4 5, i4 6) ; %res = 7
15499 %res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 2) ; %res = -2
15500 %res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 -5) ; %res = -8
15503 '``llvm.uadd.sat.*``' Intrinsics
15504 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15509 This is an overloaded intrinsic. You can use ``llvm.uadd.sat``
15510 on any integer bit width or vectors of integers.
15514 declare i16 @llvm.uadd.sat.i16(i16 %a, i16 %b)
15515 declare i32 @llvm.uadd.sat.i32(i32 %a, i32 %b)
15516 declare i64 @llvm.uadd.sat.i64(i64 %a, i64 %b)
15517 declare <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15522 The '``llvm.uadd.sat``' family of intrinsic functions perform unsigned
15523 saturating addition on the 2 arguments.
15528 The arguments (%a and %b) and the result may be of integer types of any bit
15529 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15530 values that will undergo unsigned addition.
15535 The maximum value this operation can clamp to is the largest unsigned value
15536 representable by the bit width of the arguments. Because this is an unsigned
15537 operation, the result will never saturate towards zero.
15543 .. code-block:: llvm
15545 %res = call i4 @llvm.uadd.sat.i4(i4 1, i4 2) ; %res = 3
15546 %res = call i4 @llvm.uadd.sat.i4(i4 5, i4 6) ; %res = 11
15547 %res = call i4 @llvm.uadd.sat.i4(i4 8, i4 8) ; %res = 15
15550 '``llvm.ssub.sat.*``' Intrinsics
15551 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15556 This is an overloaded intrinsic. You can use ``llvm.ssub.sat``
15557 on any integer bit width or vectors of integers.
15561 declare i16 @llvm.ssub.sat.i16(i16 %a, i16 %b)
15562 declare i32 @llvm.ssub.sat.i32(i32 %a, i32 %b)
15563 declare i64 @llvm.ssub.sat.i64(i64 %a, i64 %b)
15564 declare <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15569 The '``llvm.ssub.sat``' family of intrinsic functions perform signed
15570 saturating subtraction on the 2 arguments.
15575 The arguments (%a and %b) and the result may be of integer types of any bit
15576 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15577 values that will undergo signed subtraction.
15582 The maximum value this operation can clamp to is the largest signed value
15583 representable by the bit width of the arguments. The minimum value is the
15584 smallest signed value representable by this bit width.
15590 .. code-block:: llvm
15592 %res = call i4 @llvm.ssub.sat.i4(i4 2, i4 1) ; %res = 1
15593 %res = call i4 @llvm.ssub.sat.i4(i4 2, i4 6) ; %res = -4
15594 %res = call i4 @llvm.ssub.sat.i4(i4 -4, i4 5) ; %res = -8
15595 %res = call i4 @llvm.ssub.sat.i4(i4 4, i4 -5) ; %res = 7
15598 '``llvm.usub.sat.*``' Intrinsics
15599 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15604 This is an overloaded intrinsic. You can use ``llvm.usub.sat``
15605 on any integer bit width or vectors of integers.
15609 declare i16 @llvm.usub.sat.i16(i16 %a, i16 %b)
15610 declare i32 @llvm.usub.sat.i32(i32 %a, i32 %b)
15611 declare i64 @llvm.usub.sat.i64(i64 %a, i64 %b)
15612 declare <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15617 The '``llvm.usub.sat``' family of intrinsic functions perform unsigned
15618 saturating subtraction on the 2 arguments.
15623 The arguments (%a and %b) and the result may be of integer types of any bit
15624 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15625 values that will undergo unsigned subtraction.
15630 The minimum value this operation can clamp to is 0, which is the smallest
15631 unsigned value representable by the bit width of the unsigned arguments.
15632 Because this is an unsigned operation, the result will never saturate towards
15633 the largest possible value representable by this bit width.
15639 .. code-block:: llvm
15641 %res = call i4 @llvm.usub.sat.i4(i4 2, i4 1) ; %res = 1
15642 %res = call i4 @llvm.usub.sat.i4(i4 2, i4 6) ; %res = 0
15645 '``llvm.sshl.sat.*``' Intrinsics
15646 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15651 This is an overloaded intrinsic. You can use ``llvm.sshl.sat``
15652 on integers or vectors of integers of any bit width.
15656 declare i16 @llvm.sshl.sat.i16(i16 %a, i16 %b)
15657 declare i32 @llvm.sshl.sat.i32(i32 %a, i32 %b)
15658 declare i64 @llvm.sshl.sat.i64(i64 %a, i64 %b)
15659 declare <4 x i32> @llvm.sshl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15664 The '``llvm.sshl.sat``' family of intrinsic functions perform signed
15665 saturating left shift on the first argument.
The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. ``%a`` is the value to be
shifted, and ``%b`` is the amount to shift by. If ``%b`` is (statically or
dynamically) equal to or larger than the integer bit width of the arguments,
the result is a :ref:`poison value <poisonvalues>`. If the arguments are
vectors, each vector element of ``%a`` is shifted by the corresponding shift
amount in ``%b``.
15682 The maximum value this operation can clamp to is the largest signed value
15683 representable by the bit width of the arguments. The minimum value is the
15684 smallest signed value representable by this bit width.
15690 .. code-block:: llvm
15692 %res = call i4 @llvm.sshl.sat.i4(i4 2, i4 1) ; %res = 4
15693 %res = call i4 @llvm.sshl.sat.i4(i4 2, i4 2) ; %res = 7
15694 %res = call i4 @llvm.sshl.sat.i4(i4 -5, i4 1) ; %res = -8
15695 %res = call i4 @llvm.sshl.sat.i4(i4 -1, i4 1) ; %res = -2
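
The clamping behaviour in these examples can also be spelled out with ordinary
instructions. The following is only an illustrative sketch for ``i32``, not
necessarily how any backend lowers the intrinsic. The function name
``@sshl_sat_sketch`` is hypothetical, and ``%b`` is assumed to be smaller than
the bit width (otherwise the intrinsic yields poison anyway):

.. code-block:: llvm

   define i32 @sshl_sat_sketch(i32 %a, i32 %b) {
     %shl = shl i32 %a, %b
     ; Shifting back recovers %a only if no significant bits were lost.
     %back = ashr i32 %shl, %b
     %lossless = icmp eq i32 %back, %a
     ; On overflow, clamp towards the sign of the original value.
     %isneg = icmp slt i32 %a, 0
     %limit = select i1 %isneg, i32 -2147483648, i32 2147483647
     %res = select i1 %lossless, i32 %shl, i32 %limit
     ret i32 %res
   }
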
15698 '``llvm.ushl.sat.*``' Intrinsics
15699 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15704 This is an overloaded intrinsic. You can use ``llvm.ushl.sat``
15705 on integers or vectors of integers of any bit width.
15709 declare i16 @llvm.ushl.sat.i16(i16 %a, i16 %b)
15710 declare i32 @llvm.ushl.sat.i32(i32 %a, i32 %b)
15711 declare i64 @llvm.ushl.sat.i64(i64 %a, i64 %b)
15712 declare <4 x i32> @llvm.ushl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15717 The '``llvm.ushl.sat``' family of intrinsic functions perform unsigned
15718 saturating left shift on the first argument.
15723 The arguments (``%a`` and ``%b``) and the result may be of integer types of any
15724 bit width, but they must have the same bit width. ``%a`` is the value to be
shifted, and ``%b`` is the amount to shift by. If ``%b`` is (statically or
dynamically) equal to or larger than the integer bit width of the arguments,
the result is a :ref:`poison value <poisonvalues>`. If the arguments are
vectors, each vector element of ``%a`` is shifted by the corresponding shift
amount in ``%b``.
15734 The maximum value this operation can clamp to is the largest unsigned value
15735 representable by the bit width of the arguments.
15741 .. code-block:: llvm
15743 %res = call i4 @llvm.ushl.sat.i4(i4 2, i4 1) ; %res = 4
15744 %res = call i4 @llvm.ushl.sat.i4(i4 3, i4 3) ; %res = 15
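
For vector operands, the shift and the clamp apply to each element
independently. A small illustration with made-up constants, using the
``v4i32`` overload declared above:

.. code-block:: llvm

   %res = call <4 x i32> @llvm.ushl.sat.v4i32(<4 x i32> <i32 1, i32 1073741824, i32 3, i32 -1>,
                                              <4 x i32> <i32 4, i32 2, i32 0, i32 1>)
   ; %res = <i32 16, i32 -1, i32 3, i32 -1>
   ; (-1 is the all-ones bit pattern, i.e. 4294967295 when read as unsigned)

Elements 1 and 3 overflow 32 bits and are clamped to the largest unsigned
value, while elements 0 and 2 are shifted without loss.
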
15747 Fixed Point Arithmetic Intrinsics
15748 ---------------------------------
15750 A fixed point number represents a real data type for a number that has a fixed
15751 number of digits after a radix point (equivalent to the decimal point '.').
The number of digits after the radix point is referred to as the ``scale``. These
15753 are useful for representing fractional values to a specific precision. The
15754 following intrinsics perform fixed point arithmetic operations on 2 operands
15755 of the same scale, specified as the third argument.
15757 The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
15758 of fixed point numbers through scaled integers. Therefore, fixed point
15759 multiplication can be represented as
15761 .. code-block:: llvm
   %result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)

   ; Expands to
15766 %a2 = sext i4 %a to i8
15767 %b2 = sext i4 %b to i8
   %mul = mul nsw i8 %a2, %b2 ; multiply the widened operands; the signed product always fits in i8
   %scale2 = trunc i32 %scale to i8
   %r = ashr i8 %mul, %scale2 ; this is for a target rounding down towards negative infinity
15771 %result = trunc i8 %r to i4
15773 The ``llvm.*div.fix`` family of intrinsic functions represents a division of
fixed point numbers through scaled integers. Fixed point division can be
represented as
15777 .. code-block:: llvm
   %result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)

   ; Expands to
15782 %a2 = sext i4 %a to i8
15783 %b2 = sext i4 %b to i8
15784 %scale2 = trunc i32 %scale to i8
15785 %a3 = shl i8 %a2, %scale2
15786 %r = sdiv i8 %a3, %b2 ; this is for a target rounding towards zero
15787 %result = trunc i8 %r to i4
15789 For each of these functions, if the result cannot be represented exactly with
15790 the provided scale, the result is rounded. Rounding is unspecified since
15791 preferred rounding may vary for different targets. Rounding is specified
15792 through a target hook. Different pipelines should legalize or optimize this
15793 using the rounding specified by this hook if it is provided. Operations like
15794 constant folding, instruction combining, KnownBits, and ValueTracking should
15795 also use this hook, if provided, and not assume the direction of rounding. A
15796 rounded result must always be within one unit of precision from the true
15797 result. That is, the error between the returned result and the true result must
15798 be less than 1/2^(scale).
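
As a worked illustration of this bound (the operand values are chosen purely
for the example), consider an 8-bit multiplication with a scale of 4, where
one unit of precision is 1/2^4 = 0.0625:

.. code-block:: llvm

   ; %a = 24 represents 24/16 = 1.5, %b = 21 represents 21/16 = 1.3125.
   ; The exact product is 1.96875, i.e. 31.5 units, which is not representable.
   %res = call i8 @llvm.smul.fix.i8(i8 24, i8 21, i32 4) ; %res = 31 (1.9375) or 32 (2.0)

Either permitted result differs from the exact product by 0.03125, which is
less than 0.0625.
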
15801 '``llvm.smul.fix.*``' Intrinsics
15802 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15807 This is an overloaded intrinsic. You can use ``llvm.smul.fix``
15808 on any integer bit width or vectors of integers.
15812 declare i16 @llvm.smul.fix.i16(i16 %a, i16 %b, i32 %scale)
15813 declare i32 @llvm.smul.fix.i32(i32 %a, i32 %b, i32 %scale)
15814 declare i64 @llvm.smul.fix.i64(i64 %a, i64 %b, i32 %scale)
15815 declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
15820 The '``llvm.smul.fix``' family of intrinsic functions perform signed
15821 fixed point multiplication on 2 arguments of the same scale.
The arguments (``%a`` and ``%b``) and the result may be of integer types of any bit
width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the two
15829 values that will undergo signed fixed point multiplication. The argument
15830 ``%scale`` represents the scale of both operands, and must be a constant
15836 This operation performs fixed point multiplication on the 2 arguments of a
15837 specified scale. The result will also be returned in the same scale specified
15838 in the third argument.
15840 If the result value cannot be precisely represented in the given scale, the
15841 value is rounded up or down to the closest representable value. The rounding
15842 direction is unspecified.
15844 It is undefined behavior if the result value does not fit within the range of
15845 the fixed point type.
15851 .. code-block:: llvm
15853 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15854 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15855 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 x -1 = -1.5)
15857 ; The result in the following could be rounded up to -2 or down to -2.5
15858 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 -3, i32 1) ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)
15861 '``llvm.umul.fix.*``' Intrinsics
15862 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15867 This is an overloaded intrinsic. You can use ``llvm.umul.fix``
15868 on any integer bit width or vectors of integers.
15872 declare i16 @llvm.umul.fix.i16(i16 %a, i16 %b, i32 %scale)
15873 declare i32 @llvm.umul.fix.i32(i32 %a, i32 %b, i32 %scale)
15874 declare i64 @llvm.umul.fix.i64(i64 %a, i64 %b, i32 %scale)
15875 declare <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
15880 The '``llvm.umul.fix``' family of intrinsic functions perform unsigned
15881 fixed point multiplication on 2 arguments of the same scale.
The arguments (``%a`` and ``%b``) and the result may be of integer types of any bit
width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the two
15889 values that will undergo unsigned fixed point multiplication. The argument
15890 ``%scale`` represents the scale of both operands, and must be a constant
15896 This operation performs unsigned fixed point multiplication on the 2 arguments of a
15897 specified scale. The result will also be returned in the same scale specified
15898 in the third argument.
15900 If the result value cannot be precisely represented in the given scale, the
15901 value is rounded up or down to the closest representable value. The rounding
15902 direction is unspecified.
15904 It is undefined behavior if the result value does not fit within the range of
15905 the fixed point type.
15911 .. code-block:: llvm
15913 %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15914 %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15916 ; The result in the following could be rounded down to 3.5 or up to 4
15917 %res = call i4 @llvm.umul.fix.i4(i4 15, i4 1, i32 1) ; %res = 7 (or 8) (7.5 x 0.5 = 3.75)
15920 '``llvm.smul.fix.sat.*``' Intrinsics
15921 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15926 This is an overloaded intrinsic. You can use ``llvm.smul.fix.sat``
15927 on any integer bit width or vectors of integers.
15931 declare i16 @llvm.smul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
15932 declare i32 @llvm.smul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
15933 declare i64 @llvm.smul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
15934 declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
15939 The '``llvm.smul.fix.sat``' family of intrinsic functions perform signed
15940 fixed point saturating multiplication on 2 arguments of the same scale.
15945 The arguments (%a and %b) and the result may be of integer types of any bit
15946 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15947 values that will undergo signed fixed point multiplication. The argument
15948 ``%scale`` represents the scale of both operands, and must be a constant
15954 This operation performs fixed point multiplication on the 2 arguments of a
15955 specified scale. The result will also be returned in the same scale specified
15956 in the third argument.
15958 If the result value cannot be precisely represented in the given scale, the
15959 value is rounded up or down to the closest representable value. The rounding
15960 direction is unspecified.
15962 The maximum value this operation can clamp to is the largest signed value
15963 representable by the bit width of the first 2 arguments. The minimum value is the
15964 smallest signed value representable by this bit width.
15970 .. code-block:: llvm
15972 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15973 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15974 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 x -1 = -1.5)
15976 ; The result in the following could be rounded up to -2 or down to -2.5
15977 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -3, i32 1) ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)
15980 %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 2, i32 0) ; %res = 7
15981 %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 4, i32 2) ; %res = 7
15982 %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 5, i32 2) ; %res = -8
15983 %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 -2, i32 1) ; %res = 7
15985 ; Scale can affect the saturation result
15986 %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 0) ; %res = 7 (2 x 4 -> clamped to 7)
15987 %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 1) ; %res = 4 (1 x 2 = 2)
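
The interaction between scaling and clamping can be sketched with ordinary
instructions. This is only an illustration for ``i4`` operands with the scale
fixed at 1 and rounding towards negative infinity. The function name
``@smul_fix_sat_scale1_sketch`` is hypothetical, and real lowerings may differ:

.. code-block:: llvm

   define i4 @smul_fix_sat_scale1_sketch(i4 %a, i4 %b) {
     ; Widen so that the full product is representable.
     %a2 = sext i4 %a to i8
     %b2 = sext i4 %b to i8
     %mul = mul nsw i8 %a2, %b2
     ; Drop one fractional bit (scale = 1); ashr rounds towards negative infinity.
     %r = ashr i8 %mul, 1
     ; Clamp to the i4 range [-8, 7] before truncating.
     %too.big = icmp sgt i8 %r, 7
     %r.hi = select i1 %too.big, i8 7, i8 %r
     %too.small = icmp slt i8 %r.hi, -8
     %r.sat = select i1 %too.small, i8 -8, i8 %r.hi
     %result = trunc i8 %r.sat to i4
     ret i4 %result
   }

With ``%a = -8`` and ``%b = -2`` the scaled product is 8, which is clamped to
7, in line with the saturation example above.
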
15990 '``llvm.umul.fix.sat.*``' Intrinsics
15991 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15996 This is an overloaded intrinsic. You can use ``llvm.umul.fix.sat``
15997 on any integer bit width or vectors of integers.
16001 declare i16 @llvm.umul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
16002 declare i32 @llvm.umul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
16003 declare i64 @llvm.umul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
16004 declare <4 x i32> @llvm.umul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
16009 The '``llvm.umul.fix.sat``' family of intrinsic functions perform unsigned
16010 fixed point saturating multiplication on 2 arguments of the same scale.
16015 The arguments (%a and %b) and the result may be of integer types of any bit
16016 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
16017 values that will undergo unsigned fixed point multiplication. The argument
16018 ``%scale`` represents the scale of both operands, and must be a constant
16024 This operation performs fixed point multiplication on the 2 arguments of a
16025 specified scale. The result will also be returned in the same scale specified
16026 in the third argument.
16028 If the result value cannot be precisely represented in the given scale, the
16029 value is rounded up or down to the closest representable value. The rounding
16030 direction is unspecified.
16032 The maximum value this operation can clamp to is the largest unsigned value
16033 representable by the bit width of the first 2 arguments. The minimum value is the
16034 smallest unsigned value representable by this bit width (zero).
16040 .. code-block:: llvm
16042 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
16043 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
16045 ; The result in the following could be rounded down to 2 or up to 2.5
16046 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 3, i32 1) ; %res = 4 (or 5) (1.5 x 1.5 = 2.25)
16049 %res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 2, i32 0) ; %res = 15 (8 x 2 -> clamped to 15)
16050 %res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 8, i32 2) ; %res = 15 (2 x 2 -> clamped to 3.75)
16052 ; Scale can affect the saturation result
   %res = call i4 @llvm.umul.fix.sat.i4(i4 4, i4 4, i32 0) ; %res = 15 (4 x 4 -> clamped to 15)
   %res = call i4 @llvm.umul.fix.sat.i4(i4 4, i4 4, i32 1) ; %res = 8 (2 x 2 = 4)
16057 '``llvm.sdiv.fix.*``' Intrinsics
16058 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16063 This is an overloaded intrinsic. You can use ``llvm.sdiv.fix``
16064 on any integer bit width or vectors of integers.
16068 declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
16069 declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
16070 declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
16071 declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
16076 The '``llvm.sdiv.fix``' family of intrinsic functions perform signed
16077 fixed point division on 2 arguments of the same scale.
The arguments (``%a`` and ``%b``) and the result may be of integer types of any bit
width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the two
16085 values that will undergo signed fixed point division. The argument
16086 ``%scale`` represents the scale of both operands, and must be a constant
16092 This operation performs fixed point division on the 2 arguments of a
16093 specified scale. The result will also be returned in the same scale specified
16094 in the third argument.
16096 If the result value cannot be precisely represented in the given scale, the
16097 value is rounded up or down to the closest representable value. The rounding
16098 direction is unspecified.
16100 It is undefined behavior if the result value does not fit within the range of
16101 the fixed point type, or if the second argument is zero.
16107 .. code-block:: llvm
16109 %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16110 %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16111 %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
16113 ; The result in the following could be rounded up to 1 or down to 0.5
16114 %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
16117 '``llvm.udiv.fix.*``' Intrinsics
16118 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16123 This is an overloaded intrinsic. You can use ``llvm.udiv.fix``
16124 on any integer bit width or vectors of integers.
16128 declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
16129 declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
16130 declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
16131 declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
16136 The '``llvm.udiv.fix``' family of intrinsic functions perform unsigned
16137 fixed point division on 2 arguments of the same scale.
The arguments (``%a`` and ``%b``) and the result may be of integer types of any bit
width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the two
16145 values that will undergo unsigned fixed point division. The argument
16146 ``%scale`` represents the scale of both operands, and must be a constant
16152 This operation performs fixed point division on the 2 arguments of a
16153 specified scale. The result will also be returned in the same scale specified
16154 in the third argument.
16156 If the result value cannot be precisely represented in the given scale, the
16157 value is rounded up or down to the closest representable value. The rounding
16158 direction is unspecified.
16160 It is undefined behavior if the result value does not fit within the range of
16161 the fixed point type, or if the second argument is zero.
16167 .. code-block:: llvm
16169 %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16170 %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16171 %res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4) ; %res = 2 (0.0625 / 0.5 = 0.125)
16173 ; The result in the following could be rounded up to 1 or down to 0.5
16174 %res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
16177 '``llvm.sdiv.fix.sat.*``' Intrinsics
16178 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16183 This is an overloaded intrinsic. You can use ``llvm.sdiv.fix.sat``
16184 on any integer bit width or vectors of integers.
16188 declare i16 @llvm.sdiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
16189 declare i32 @llvm.sdiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
16190 declare i64 @llvm.sdiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
16191 declare <4 x i32> @llvm.sdiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
16196 The '``llvm.sdiv.fix.sat``' family of intrinsic functions perform signed
16197 fixed point saturating division on 2 arguments of the same scale.
16202 The arguments (%a and %b) and the result may be of integer types of any bit
16203 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
16204 values that will undergo signed fixed point division. The argument
16205 ``%scale`` represents the scale of both operands, and must be a constant
16211 This operation performs fixed point division on the 2 arguments of a
16212 specified scale. The result will also be returned in the same scale specified
16213 in the third argument.
16215 If the result value cannot be precisely represented in the given scale, the
16216 value is rounded up or down to the closest representable value. The rounding
16217 direction is unspecified.
16219 The maximum value this operation can clamp to is the largest signed value
16220 representable by the bit width of the first 2 arguments. The minimum value is the
16221 smallest signed value representable by this bit width.
16223 It is undefined behavior if the second argument is zero.
16229 .. code-block:: llvm
16231 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16232 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16233 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
16235 ; The result in the following could be rounded up to 1 or down to 0.5
16236 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
16239 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 -8, i4 -1, i32 0) ; %res = 7 (-8 / -1 = 8 => 7)
16240 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 4, i4 2, i32 2) ; %res = 7 (1 / 0.5 = 2 => 1.75)
16241 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 -4, i4 1, i32 2) ; %res = -8 (-1 / 0.25 = -4 => -2)
16244 '``llvm.udiv.fix.sat.*``' Intrinsics
16245 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16250 This is an overloaded intrinsic. You can use ``llvm.udiv.fix.sat``
16251 on any integer bit width or vectors of integers.
16255 declare i16 @llvm.udiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
16256 declare i32 @llvm.udiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
16257 declare i64 @llvm.udiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
16258 declare <4 x i32> @llvm.udiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
16263 The '``llvm.udiv.fix.sat``' family of intrinsic functions perform unsigned
16264 fixed point saturating division on 2 arguments of the same scale.
16269 The arguments (%a and %b) and the result may be of integer types of any bit
16270 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
16271 values that will undergo unsigned fixed point division. The argument
16272 ``%scale`` represents the scale of both operands, and must be a constant
16278 This operation performs fixed point division on the 2 arguments of a
16279 specified scale. The result will also be returned in the same scale specified
16280 in the third argument.
16282 If the result value cannot be precisely represented in the given scale, the
16283 value is rounded up or down to the closest representable value. The rounding
16284 direction is unspecified.
16286 The maximum value this operation can clamp to is the largest unsigned value
16287 representable by the bit width of the first 2 arguments. The minimum value is the
16288 smallest unsigned value representable by this bit width (zero).
16290 It is undefined behavior if the second argument is zero.
16295 .. code-block:: llvm
16297 %res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16298 %res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16300 ; The result in the following could be rounded down to 0.5 or up to 1
16301 %res = call i4 @llvm.udiv.fix.sat.i4(i4 3, i4 4, i32 1) ; %res = 1 (or 2) (1.5 / 2 = 0.75)
16304 %res = call i4 @llvm.udiv.fix.sat.i4(i4 8, i4 2, i32 2) ; %res = 15 (2 / 0.5 = 4 => 3.75)
16307 Specialised Arithmetic Intrinsics
16308 ---------------------------------
16310 .. _i_intr_llvm_canonicalize:
16312 '``llvm.canonicalize.*``' Intrinsic
16313 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16320 declare float @llvm.canonicalize.f32(float %a)
16321 declare double @llvm.canonicalize.f64(double %b)
16326 The '``llvm.canonicalize.*``' intrinsic returns the platform specific canonical
16327 encoding of a floating-point number. This canonicalization is useful for
16328 implementing certain numeric primitives such as frexp. The canonical encoding is
16329 defined by IEEE-754-2008 to be:
16333 2.1.8 canonical encoding: The preferred encoding of a floating-point
16334 representation in a format. Applied to declets, significands of finite
16335 numbers, infinities, and NaNs, especially in decimal formats.
16337 This operation can also be considered equivalent to the IEEE-754-2008
16338 conversion of a floating-point value to the same format. NaNs are handled
16339 according to section 6.2.
16341 Examples of non-canonical encodings:
16343 - x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are
16344 converted to a canonical representation per hardware-specific protocol.
16345 - Many normal decimal floating-point numbers have non-canonical alternative
16347 - Some machines, like GPUs or ARMv7 NEON, do not support subnormal values.
16348 These are treated as non-canonical encodings of zero and will be flushed to
16349 a zero of the same sign by this operation.
16351 Note that per IEEE-754-2008 6.2, systems that support signaling NaNs with
16352 default exception handling must signal an invalid exception, and produce a
16355 This function should always be implementable as multiplication by 1.0, provided
16356 that the compiler does not constant fold the operation. Likewise, division by
16357 1.0 and ``llvm.minnum(x, x)`` are possible implementations. Addition with
16358 -0.0 is also sufficient provided that the rounding mode is not -Infinity.
16360 ``@llvm.canonicalize`` must preserve the equality relation. That is:
16362 - ``(@llvm.canonicalize(x) == x)`` is equivalent to ``(x == x)``
16363 - ``(@llvm.canonicalize(x) == @llvm.canonicalize(y))`` is equivalent to
16366 Additionally, the sign of zero must be conserved:
16367 ``@llvm.canonicalize(-0.0) = -0.0`` and ``@llvm.canonicalize(+0.0) = +0.0``
16369 The payload bits of a NaN must be conserved, with two exceptions.
16370 First, environments which use only a single canonical representation of NaN
16371 must perform said canonicalization. Second, SNaNs must be quieted per the
16374 The canonicalization operation may be optimized away if:
16376 - The input is known to be canonical. For example, it was produced by a
16377 floating-point operation that is required by the standard to be canonical.
16378 - The result is consumed only by (or fused with) other floating-point
16379 operations. That is, the bits of the floating-point value are not examined.
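
As an illustration of the second condition, the sketch below inspects the bits
of the result, so the call cannot be dropped unless the input is already known
to be canonical. The function name ``@canonical_bits`` is hypothetical:

.. code-block:: llvm

   declare float @llvm.canonicalize.f32(float)

   define i32 @canonical_bits(float %x) {
     %c = call float @llvm.canonicalize.f32(float %x)
     ; The bitcast examines the encoding, so canonicalization is observable here.
     %bits = bitcast float %c to i32
     ret i32 %bits
   }
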
16381 '``llvm.fmuladd.*``' Intrinsic
16382 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16389 declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
16390 declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
16395 The '``llvm.fmuladd.*``' intrinsic functions represent multiply-add
16396 expressions that can be fused if the code generator determines that (a) the
16397 target instruction set has support for a fused operation, and (b) that the
16398 fused operation is more efficient than the equivalent, separate pair of mul
16399 and add instructions.
16404 The '``llvm.fmuladd.*``' intrinsics each take three arguments: two
16405 multiplicands, a and b, and an addend c.
   %0 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
16416 is equivalent to the expression a \* b + c, except that it is unspecified
16417 whether rounding will be performed between the multiplication and addition
16418 steps. Fusion is not guaranteed, even if the target platform supports it.
16419 If a fused multiply-add is required, the corresponding
16420 :ref:`llvm.fma <int_fma>` intrinsic function should be used instead.
Like '``llvm.fma.*``', this intrinsic never sets errno.
16426 .. code-block:: llvm
16428 %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
16431 Hardware-Loop Intrinsics
16432 ------------------------
LLVM supports several intrinsics to mark a loop as a hardware-loop. They are
hints to the backend, which is required either to lower these intrinsics further
to target-specific instructions or to revert the hardware-loop to a normal loop
if target-specific restrictions are not met and a hardware-loop can't be generated.
16439 These intrinsics may be modified in the future and are not intended to be used
16440 outside the backend. Thus, front-end and mid-level optimizations should not be
16441 generating these intrinsics.
16444 '``llvm.set.loop.iterations.*``' Intrinsic
16445 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16450 This is an overloaded intrinsic.
16454 declare void @llvm.set.loop.iterations.i32(i32)
16455 declare void @llvm.set.loop.iterations.i64(i64)
16460 The '``llvm.set.loop.iterations.*``' intrinsics are used to specify the
16461 hardware-loop trip count. They are placed in the loop preheader basic block and
16462 are marked as ``IntrNoDuplicate`` to avoid optimizers duplicating these
16468 The integer operand is the loop trip count of the hardware-loop, and thus
16469 not e.g. the loop back-edge taken count.
16474 The '``llvm.set.loop.iterations.*``' intrinsics do not perform any arithmetic
16475 on their operand. It's a hint to the backend that can use this to set up the
16476 hardware-loop count with a target specific instruction, usually a move of this
16477 value to a special register or a hardware-loop instruction.
16480 '``llvm.start.loop.iterations.*``' Intrinsic
16481 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16486 This is an overloaded intrinsic.
16490 declare i32 @llvm.start.loop.iterations.i32(i32)
16491 declare i64 @llvm.start.loop.iterations.i64(i64)
The '``llvm.start.loop.iterations.*``' intrinsics are similar to the
'``llvm.set.loop.iterations.*``' intrinsics: they specify the
hardware-loop trip count, but also produce a value identical to the input
that can be used as the input to the loop. They are placed in the loop
preheader basic block, and the output is expected to be the input to the
phi for the induction variable of the loop, which is decremented by
'``llvm.loop.decrement.reg.*``'.
16507 The integer operand is the loop trip count of the hardware-loop, and thus
16508 not e.g. the loop back-edge taken count.
16513 The '``llvm.start.loop.iterations.*``' intrinsics do not perform any arithmetic
16514 on their operand. It's a hint to the backend that can use this to set up the
16515 hardware-loop count with a target specific instruction, usually a move of this
16516 value to a special register or a hardware-loop instruction.
16518 '``llvm.test.set.loop.iterations.*``' Intrinsic
16519 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16524 This is an overloaded intrinsic.
16528 declare i1 @llvm.test.set.loop.iterations.i32(i32)
16529 declare i1 @llvm.test.set.loop.iterations.i64(i64)
The '``llvm.test.set.loop.iterations.*``' intrinsics are used to specify the
loop trip count, and also test that the given count is not zero, allowing
16536 it to control entry to a while-loop. They are placed in the loop preheader's
16537 predecessor basic block, and are marked as ``IntrNoDuplicate`` to avoid
16538 optimizers duplicating these instructions.
16543 The integer operand is the loop trip count of the hardware-loop, and thus
16544 not e.g. the loop back-edge taken count.
16549 The '``llvm.test.set.loop.iterations.*``' intrinsics do not perform any
16550 arithmetic on their operand. It's a hint to the backend that can use this to
16551 set up the hardware-loop count with a target specific instruction, usually a
16552 move of this value to a special register or a hardware-loop instruction.
16553 The result is the conditional value of whether the given count is not zero.
16556 '``llvm.test.start.loop.iterations.*``' Intrinsic
16557 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16562 This is an overloaded intrinsic.
16566 declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
16567 declare {i64, i1} @llvm.test.start.loop.iterations.i64(i64)
16572 The '``llvm.test.start.loop.iterations.*``' intrinsics are similar to the
16573 '``llvm.test.set.loop.iterations.*``' and '``llvm.start.loop.iterations.*``'
16574 intrinsics, used to specify the hardware-loop trip count, but also produce a
16575 value identical to the input that can be used as the input to the loop. The
16576 second i1 output controls entry to a while-loop.
16581 The integer operand is the loop trip count of the hardware-loop, and thus
16582 not e.g. the loop back-edge taken count.
16587 The '``llvm.test.start.loop.iterations.*``' intrinsics do not perform any
16588 arithmetic on their operand. It's a hint to the backend that can use this to
16589 set up the hardware-loop count with a target specific instruction, usually a
16590 move of this value to a special register or a hardware-loop instruction.
16591 The result is a pair of the input and a conditional value of whether the
16592 given count is not zero.
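
The following sketch shows roughly how the two results might be consumed to
guard a while-loop. It is an illustration only: these intrinsics are normally
generated in the backend rather than written by hand, and the function, block
and value names are hypothetical:

.. code-block:: llvm

   declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32)
   declare i32 @llvm.loop.decrement.reg.i32(i32, i32)

   define void @guarded_loop_sketch(i32 %n) {
   entry:
     %pair = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
     %start = extractvalue { i32, i1 } %pair, 0
     %nonzero = extractvalue { i32, i1 } %pair, 1
     ; Skip the loop entirely when the trip count is zero.
     br i1 %nonzero, label %preheader, label %exit

   preheader:
     br label %loop

   loop:
     %count = phi i32 [ %start, %preheader ], [ %next, %loop ]
     ; ... loop body ...
     %next = call i32 @llvm.loop.decrement.reg.i32(i32 %count, i32 1)
     %continue = icmp ne i32 %next, 0
     br i1 %continue, label %loop, label %exit

   exit:
     ret void
   }
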
16595 '``llvm.loop.decrement.reg.*``' Intrinsic
16596 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16601 This is an overloaded intrinsic.
16605 declare i32 @llvm.loop.decrement.reg.i32(i32, i32)
16606 declare i64 @llvm.loop.decrement.reg.i64(i64, i64)
16611 The '``llvm.loop.decrement.reg.*``' intrinsics are used to lower the loop
16612 iteration counter and return an updated value that will be used in the next
16618 Both arguments must have identical integer types. The first operand is the
16619 loop iteration counter. The second operand is the maximum number of elements
16620 processed in an iteration.
16625 The '``llvm.loop.decrement.reg.*``' intrinsics do an integer ``SUB`` of its
16626 two operands, which is not allowed to wrap. They return the remaining number of
16627 iterations still to be executed, and can be used together with a ``PHI``,
16628 ``ICMP`` and ``BR`` to control the number of loop iterations executed. Any
16629 optimisations are allowed to treat it as a ``SUB``, and it is supported by
16630 SCEV, so it's the backend's responsibility to handle cases where it may be
16631 optimised. These intrinsics are marked as ``IntrNoDuplicate`` to avoid
16632 optimizers duplicating these instructions.
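The ``PHI``, ``ICMP`` and ``BR`` combination described above can be sketched as
follows. This is only an illustration, assuming a decrement of 1 per iteration
and an elided loop body; the function name ``@counted_loop`` and the value
names are hypothetical.

.. code-block:: llvm

      declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
      declare i32 @llvm.loop.decrement.reg.i32(i32, i32)

      ; %n is the loop trip count; the loop body itself is elided.
      define void @counted_loop(i32 %n) {
      entry:
        %start = call {i32, i1} @llvm.test.start.loop.iterations.i32(i32 %n)
        %count = extractvalue {i32, i1} %start, 0
        %nonzero = extractvalue {i32, i1} %start, 1
        ; Skip the loop entirely when the trip count is zero.
        br i1 %nonzero, label %loop, label %exit

      loop:
        %remaining = phi i32 [ %count, %entry ], [ %next, %loop ]
        ; Non-wrapping subtraction of 1 from the remaining iteration count.
        %next = call i32 @llvm.loop.decrement.reg.i32(i32 %remaining, i32 1)
        %continue = icmp ne i32 %next, 0
        br i1 %continue, label %loop, label %exit

      exit:
        ret void
      }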
16635 '``llvm.loop.decrement.*``' Intrinsic
16636 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16641 This is an overloaded intrinsic.
16645 declare i1 @llvm.loop.decrement.i32(i32)
16646 declare i1 @llvm.loop.decrement.i64(i64)
16651 The HardwareLoops pass allows the loop decrement value to be specified with an
16652 option. It defaults to a loop decrement value of 1, but it can be an unsigned
16653 integer value provided by this option. The '``llvm.loop.decrement.*``'
16654 intrinsics decrement the loop iteration counter with this value, and return a
16655 false predicate if the loop should exit, and true otherwise.
16656 This is emitted if the loop counter is not updated via a ``PHI`` node, which
16657 can also be controlled with an option.
16662 The integer argument is the loop decrement value used to decrement the loop
16668 The '``llvm.loop.decrement.*``' intrinsics do a ``SUB`` of the loop iteration
16669 counter with the given loop decrement value, and return false if the loop
16670 should exit. This ``SUB`` is not allowed to wrap. The result is a condition
16671 that is used by the conditional branch controlling the loop.
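A minimal sketch of a loop whose counter is not carried by a ``PHI`` node is
shown here. It assumes a decrement value of 1, a trip count ``%n`` known to be
greater than zero, and that the counter was set up with
``llvm.set.loop.iterations``; the function and value names are hypothetical.

.. code-block:: llvm

      declare void @llvm.set.loop.iterations.i32(i32)
      declare i1 @llvm.loop.decrement.i32(i32)

      ; Assumes %n > 0, so the body runs at least once. The body is elided.
      define void @counted_loop_no_phi(i32 %n) {
      entry:
        call void @llvm.set.loop.iterations.i32(i32 %n)
        br label %loop

      loop:
        ; Decrement the implicit loop counter by 1; false means "exit".
        %continue = call i1 @llvm.loop.decrement.i32(i32 1)
        br i1 %continue, label %loop, label %exit

      exit:
        ret void
      }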
16674 Vector Reduction Intrinsics
16675 ---------------------------
16677 Horizontal reductions of vectors can be expressed using the following
16678 intrinsics. Each one takes a vector operand as an input and applies its
16679 respective operation across all elements of the vector, returning a single
16680 scalar result of the same element type.
16682 .. _int_vector_reduce_add:
16684 '``llvm.vector.reduce.add.*``' Intrinsic
16685 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16692 declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
16693 declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %a)
16698 The '``llvm.vector.reduce.add.*``' intrinsics do an integer ``ADD``
16699 reduction of a vector, returning the result as a scalar. The return type matches
16700 the element-type of the vector input.
16704 The argument to this intrinsic must be a vector of integer values.
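As an illustration, the reduction of a ``<4 x i32>`` vector and one possible
scalarized equivalent are sketched below. The function names are hypothetical,
and since integer addition is associative, any other evaluation order gives
the same result.

.. code-block:: llvm

      declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

      define i32 @sum_reduce(<4 x i32> %v) {
        %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
        ret i32 %sum
      }

      ; Computes the same value one lane at a time.
      define i32 @sum_scalarized(<4 x i32> %v) {
        %e0 = extractelement <4 x i32> %v, i32 0
        %e1 = extractelement <4 x i32> %v, i32 1
        %e2 = extractelement <4 x i32> %v, i32 2
        %e3 = extractelement <4 x i32> %v, i32 3
        %s1 = add i32 %e0, %e1
        %s2 = add i32 %s1, %e2
        %s3 = add i32 %s2, %e3
        ret i32 %s3
      }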
16706 .. _int_vector_reduce_fadd:
16708 '``llvm.vector.reduce.fadd.*``' Intrinsic
16709 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16716 declare float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %a)
16717 declare double @llvm.vector.reduce.fadd.v2f64(double %start_value, <2 x double> %a)
16722 The '``llvm.vector.reduce.fadd.*``' intrinsics do a floating-point
16723 ``ADD`` reduction of a vector, returning the result as a scalar. The return type
16724 matches the element-type of the vector input.
16726 If the intrinsic call has the 'reassoc' flag set, then the reduction will not
16727 preserve the associativity of an equivalent scalarized counterpart. Otherwise
16728 the reduction will be *sequential*, thus implying that the operation respects
16729 the associativity of a scalarized reduction. That is, the reduction begins with
16730 the start value and performs an fadd operation with consecutively increasing
16731 vector element indices. See the following pseudocode:
16735 float sequential_fadd(start_value, input_vector)
16736 result = start_value
16737 for i = 0 to length(input_vector)
16738 result = result + input_vector[i]
16744 The first argument to this intrinsic is a scalar start value for the reduction.
16745 The type of the start value matches the element-type of the vector input.
16746 The second argument must be a vector of floating-point values.
16748 To ignore the start value, negative zero (``-0.0``) can be used, as it is
16749 the neutral value of floating point addition.
16756 %unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %input) ; relaxed reduction
16757 %ord = call float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %input) ; sequential reduction
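For a ``<4 x float>`` input, the sequential (non-``reassoc``) form corresponds
to the following scalarized chain of ``fadd`` instructions. This is a sketch of
the evaluation order only; the function name is hypothetical.

.. code-block:: llvm

      define float @sequential_fadd_v4f32(float %start_value, <4 x float> %input) {
        %e0 = extractelement <4 x float> %input, i32 0
        %e1 = extractelement <4 x float> %input, i32 1
        %e2 = extractelement <4 x float> %input, i32 2
        %e3 = extractelement <4 x float> %input, i32 3
        ; Start with the start value, then add lanes in increasing index order.
        %r0 = fadd float %start_value, %e0
        %r1 = fadd float %r0, %e1
        %r2 = fadd float %r1, %e2
        %r3 = fadd float %r2, %e3
        ret float %r3
      }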
16760 .. _int_vector_reduce_mul:
16762 '``llvm.vector.reduce.mul.*``' Intrinsic
16763 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16770 declare i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %a)
16771 declare i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> %a)
16776 The '``llvm.vector.reduce.mul.*``' intrinsics do an integer ``MUL``
16777 reduction of a vector, returning the result as a scalar. The return type matches
16778 the element-type of the vector input.
16782 The argument to this intrinsic must be a vector of integer values.
16784 .. _int_vector_reduce_fmul:
16786 '``llvm.vector.reduce.fmul.*``' Intrinsic
16787 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16794 declare float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %a)
16795 declare double @llvm.vector.reduce.fmul.v2f64(double %start_value, <2 x double> %a)
16800 The '``llvm.vector.reduce.fmul.*``' intrinsics do a floating-point
16801 ``MUL`` reduction of a vector, returning the result as a scalar. The return type
16802 matches the element-type of the vector input.
16804 If the intrinsic call has the 'reassoc' flag set, then the reduction will not
16805 preserve the associativity of an equivalent scalarized counterpart. Otherwise
16806 the reduction will be *sequential*, thus implying that the operation respects
16807 the associativity of a scalarized reduction. That is, the reduction begins with
16808 the start value and performs an fmul operation with consecutively increasing
16809 vector element indices. See the following pseudocode:
16813 float sequential_fmul(start_value, input_vector)
16814 result = start_value
16815 for i = 0 to length(input_vector)
16816 result = result * input_vector[i]
16822 The first argument to this intrinsic is a scalar start value for the reduction.
16823 The type of the start value matches the element-type of the vector input.
16824 The second argument must be a vector of floating-point values.
16826 To ignore the start value, one (``1.0``) can be used, as it is the neutral
16827 value of floating point multiplication.
16834 %unord = call reassoc float @llvm.vector.reduce.fmul.v4f32(float 1.0, <4 x float> %input) ; relaxed reduction
16835 %ord = call float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %input) ; sequential reduction
16837 .. _int_vector_reduce_and:
16839 '``llvm.vector.reduce.and.*``' Intrinsic
16840 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16847 declare i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %a)
16852 The '``llvm.vector.reduce.and.*``' intrinsics do a bitwise ``AND``
16853 reduction of a vector, returning the result as a scalar. The return type matches
16854 the element-type of the vector input.
16858 The argument to this intrinsic must be a vector of integer values.
16860 .. _int_vector_reduce_or:
16862 '``llvm.vector.reduce.or.*``' Intrinsic
16863 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16870 declare i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %a)
16875 The '``llvm.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction
16876 of a vector, returning the result as a scalar. The return type matches the
16877 element-type of the vector input.
16881 The argument to this intrinsic must be a vector of integer values.
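One common use of the bitwise reductions is collapsing a vector of ``i1``
comparison results into a single condition. The sketch below uses the ``or``
reduction to test whether any lane is negative and the ``and`` reduction to
test whether all lanes are; the function names are hypothetical.

.. code-block:: llvm

      declare i1 @llvm.vector.reduce.or.v4i1(<4 x i1>)
      declare i1 @llvm.vector.reduce.and.v4i1(<4 x i1>)

      ; True if at least one lane of %v is negative.
      define i1 @any_negative(<4 x i32> %v) {
        %neg = icmp slt <4 x i32> %v, zeroinitializer
        %any = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %neg)
        ret i1 %any
      }

      ; True only if every lane of %v is negative.
      define i1 @all_negative(<4 x i32> %v) {
        %neg = icmp slt <4 x i32> %v, zeroinitializer
        %all = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %neg)
        ret i1 %all
      }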
16883 .. _int_vector_reduce_xor:
16885 '``llvm.vector.reduce.xor.*``' Intrinsic
16886 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16893 declare i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %a)
16898 The '``llvm.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR``
16899 reduction of a vector, returning the result as a scalar. The return type matches
16900 the element-type of the vector input.
16904 The argument to this intrinsic must be a vector of integer values.
16906 .. _int_vector_reduce_smax:
16908 '``llvm.vector.reduce.smax.*``' Intrinsic
16909 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16916 declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)
16921 The '``llvm.vector.reduce.smax.*``' intrinsics do a signed integer
16922 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
16923 matches the element-type of the vector input.
16927 The argument to this intrinsic must be a vector of integer values.
16929 .. _int_vector_reduce_smin:
16931 '``llvm.vector.reduce.smin.*``' Intrinsic
16932 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16939 declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %a)
16944 The '``llvm.vector.reduce.smin.*``' intrinsics do a signed integer
16945 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
16946 matches the element-type of the vector input.
16950 The argument to this intrinsic must be a vector of integer values.
16952 .. _int_vector_reduce_umax:
16954 '``llvm.vector.reduce.umax.*``' Intrinsic
16955 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16962 declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %a)
16967 The '``llvm.vector.reduce.umax.*``' intrinsics do an unsigned
16968 integer ``MAX`` reduction of a vector, returning the result as a scalar. The
16969 return type matches the element-type of the vector input.
16973 The argument to this intrinsic must be a vector of integer values.
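The signed and unsigned variants differ only in how the lane bits are
interpreted. As a small illustration with hypothetical function names, the
same constant vector gives different maxima:

.. code-block:: llvm

      declare i8 @llvm.vector.reduce.smax.v2i8(<2 x i8>)
      declare i8 @llvm.vector.reduce.umax.v2i8(<2 x i8>)

      ; Signed: the lanes are -1 and 1, so the result is 1.
      define i8 @signed_max() {
        %r = call i8 @llvm.vector.reduce.smax.v2i8(<2 x i8> <i8 -1, i8 1>)
        ret i8 %r
      }

      ; Unsigned: the lanes are 255 and 1, so the result is 255
      ; (the same bit pattern as i8 -1).
      define i8 @unsigned_max() {
        %r = call i8 @llvm.vector.reduce.umax.v2i8(<2 x i8> <i8 -1, i8 1>)
        ret i8 %r
      }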
16975 .. _int_vector_reduce_umin:
16977 '``llvm.vector.reduce.umin.*``' Intrinsic
16978 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16985 declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %a)
16990 The '``llvm.vector.reduce.umin.*``' intrinsics do an unsigned
16991 integer ``MIN`` reduction of a vector, returning the result as a scalar. The
16992 return type matches the element-type of the vector input.
16996 The argument to this intrinsic must be a vector of integer values.
16998 .. _int_vector_reduce_fmax:
17000 '``llvm.vector.reduce.fmax.*``' Intrinsic
17001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17008 declare float @llvm.vector.reduce.fmax.v4f32(<4 x float> %a)
17009 declare double @llvm.vector.reduce.fmax.v2f64(<2 x double> %a)
17014 The '``llvm.vector.reduce.fmax.*``' intrinsics do a floating-point
17015 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
17016 matches the element-type of the vector input.
17018 This instruction has the same comparison semantics as the '``llvm.maxnum.*``'
17019 intrinsic. That is, the result will always be a number unless all elements of
17020 the vector are NaN. For a vector with maximum element magnitude 0.0 and
17021 containing both +0.0 and -0.0 elements, the sign of the result is unspecified.
17023 If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
17024 assume that NaNs are not present in the input vector.
17028 The argument to this intrinsic must be a vector of floating-point values.
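A short sketch of both forms of the call, with hypothetical function names:
the first keeps the default NaN handling described above, the second carries
the ``nnan`` flag.

.. code-block:: llvm

      declare float @llvm.vector.reduce.fmax.v4f32(<4 x float>)

      ; Default semantics: NaN lanes are ignored unless all lanes are NaN.
      define float @max_default(<4 x float> %v) {
        %m = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %v)
        ret float %m
      }

      ; With nnan, the operation may assume that %v contains no NaNs.
      define float @max_no_nans(<4 x float> %v) {
        %m = call nnan float @llvm.vector.reduce.fmax.v4f32(<4 x float> %v)
        ret float %m
      }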
17030 .. _int_vector_reduce_fmin:
17032 '``llvm.vector.reduce.fmin.*``' Intrinsic
17033 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17037 This is an overloaded intrinsic.
17041 declare float @llvm.vector.reduce.fmin.v4f32(<4 x float> %a)
17042 declare double @llvm.vector.reduce.fmin.v2f64(<2 x double> %a)
17047 The '``llvm.vector.reduce.fmin.*``' intrinsics do a floating-point
17048 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
17049 matches the element-type of the vector input.
17051 This instruction has the same comparison semantics as the '``llvm.minnum.*``'
17052 intrinsic. That is, the result will always be a number unless all elements of
17053 the vector are NaN. For a vector with minimum element magnitude 0.0 and
17054 containing both +0.0 and -0.0 elements, the sign of the result is unspecified.
17056 If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
17057 assume that NaNs are not present in the input vector.
17061 The argument to this intrinsic must be a vector of floating-point values.
17063 '``llvm.experimental.vector.insert``' Intrinsic
17064 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17068 This is an overloaded intrinsic. You can use ``llvm.experimental.vector.insert``
17069 to insert a fixed-width vector into a scalable vector, but not the other way
17074 declare <vscale x 4 x float> @llvm.experimental.vector.insert.v4f32(<vscale x 4 x float> %vec, <4 x float> %subvec, i64 %idx)
17075 declare <vscale x 2 x double> @llvm.experimental.vector.insert.v2f64(<vscale x 2 x double> %vec, <2 x double> %subvec, i64 %idx)
17080 The '``llvm.experimental.vector.insert.*``' intrinsics insert a vector into another vector
17081 starting from a given index. The return type matches the type of the vector we
17082 insert into. Conceptually, this can be used to build a scalable vector out of
17083 non-scalable vectors.
17088 The ``vec`` is the vector which ``subvec`` will be inserted into.
17089 The ``subvec`` is the vector that will be inserted.
17091 ``idx`` represents the starting element number at which ``subvec`` will be
17092 inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum
17093 vector length. If ``subvec`` is a scalable vector, ``idx`` is first scaled by
17094 the runtime scaling factor of ``subvec``. The elements of ``vec`` starting at
17095 ``idx`` are overwritten with ``subvec``. Elements ``idx`` through (``idx`` +
17096 num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition
17097 cannot be determined statically but is false at runtime, then the result vector
17101 '``llvm.experimental.vector.extract``' Intrinsic
17102 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17106 This is an overloaded intrinsic. You can use
17107 ``llvm.experimental.vector.extract`` to extract a fixed-width vector from a
17108 scalable vector, but not the other way around.
17112 declare <4 x float> @llvm.experimental.vector.extract.v4f32(<vscale x 4 x float> %vec, i64 %idx)
17113 declare <2 x double> @llvm.experimental.vector.extract.v2f64(<vscale x 2 x double> %vec, i64 %idx)
17118 The '``llvm.experimental.vector.extract.*``' intrinsics extract a vector from
17119 within another vector starting from a given index. The return type must be
17120 explicitly specified. Conceptually, this can be used to decompose a scalable
17121 vector into non-scalable parts.
17126 The ``vec`` is the vector from which we will extract a subvector.
17128 The ``idx`` specifies the starting element number within ``vec`` from which a
17129 subvector is extracted. ``idx`` must be a constant multiple of the known-minimum
17130 vector length of the result type. If the result type is a scalable vector,
17131 ``idx`` is first scaled by the result type's runtime scaling factor. Elements
17132 ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector
17133 indices. If this condition cannot be determined statically but is false at
17134 runtime, then the result vector is undefined. The ``idx`` parameter must be a
17135 vector index constant type (for most targets this will be an integer pointer
17138 '``llvm.experimental.vector.reverse``' Intrinsic
17139 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17143 This is an overloaded intrinsic.
17147 declare <2 x i8> @llvm.experimental.vector.reverse.v2i8(<2 x i8> %a)
17148 declare <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> %a)
17153 The '``llvm.experimental.vector.reverse.*``' intrinsics reverse a vector.
17154 The intrinsic takes a single vector and returns a vector of matching type but
17155 with the original lane order reversed. These intrinsics work for both fixed
17156 and scalable vectors. While this intrinsic is marked as experimental, the
17157 recommended way to express reverse operations for fixed-width vectors is still
17158 to use a shufflevector, as that may allow for more optimization opportunities.
17163 The argument to this intrinsic must be a vector.
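For a fixed-width ``<4 x i32>`` vector, the intrinsic call and the recommended
``shufflevector`` form can be sketched side by side. The function names are
hypothetical.

.. code-block:: llvm

      declare <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32>)

      define <4 x i32> @reverse_intrinsic(<4 x i32> %a) {
        %r = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %a)
        ret <4 x i32> %r
      }

      ; Same lane order, expressed with a constant shuffle mask.
      define <4 x i32> @reverse_shuffle(<4 x i32> %a) {
        %r = shufflevector <4 x i32> %a, <4 x i32> poison,
                           <4 x i32> <i32 3, i32 2, i32 1, i32 0>
        ret <4 x i32> %r
      }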
17165 '``llvm.experimental.vector.splice``' Intrinsic
17166 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17170 This is an overloaded intrinsic.
17174 declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
17175 declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
17180 The '``llvm.experimental.vector.splice.*``' intrinsics construct a vector by
17181 concatenating elements from the first input vector with elements of the second
17182 input vector, returning a vector of the same type as the input vectors. The
17183 signed immediate, modulo the number of elements in the vector, is the index
17184 into the first vector from which to extract the result value. This means
17185 conceptually that for a positive immediate, a vector is extracted from
17186 ``concat(%vec1, %vec2)`` starting at index ``imm``, whereas for a negative
17187 immediate, it extracts ``-imm`` trailing elements from the first vector, and
17188 the remaining elements from ``%vec2``.
17190 These intrinsics work for both fixed and scalable vectors. While this intrinsic
17191 is marked as experimental, the recommended way to express this operation for
17192 fixed-width vectors is still to use a shufflevector, as that may allow for more
17193 optimization opportunities.
17197 .. code-block:: text
17199 llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index
17200 llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing elements
17206 The first two operands are vectors with the same type. The third argument
17207 ``imm`` is the start index, modulo VL, where VL is the runtime vector length of
17208 the source/result vector. The ``imm`` is a signed integer constant in the range
17209 ``-VL <= imm < VL``. For values outside of this range the result is poison.
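For fixed-width vectors, a splice corresponds to a ``shufflevector`` over the
two inputs with consecutive mask indices. The sketch below, with hypothetical
function names, shows ``imm = 1`` and ``imm = -1`` for ``<4 x i32>`` operands.

.. code-block:: llvm

      declare <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32>, <4 x i32>, i32)

      ; imm = 1: lanes 1..3 of %vec1 followed by lane 0 of %vec2.
      define <4 x i32> @splice_pos(<4 x i32> %vec1, <4 x i32> %vec2) {
        %r = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> %vec1, <4 x i32> %vec2, i32 1)
        ret <4 x i32> %r
      }

      define <4 x i32> @splice_pos_shuffle(<4 x i32> %vec1, <4 x i32> %vec2) {
        %r = shufflevector <4 x i32> %vec1, <4 x i32> %vec2,
                           <4 x i32> <i32 1, i32 2, i32 3, i32 4>
        ret <4 x i32> %r
      }

      ; imm = -1: the last lane of %vec1 followed by lanes 0..2 of %vec2.
      define <4 x i32> @splice_neg(<4 x i32> %vec1, <4 x i32> %vec2) {
        %r = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> %vec1, <4 x i32> %vec2, i32 -1)
        ret <4 x i32> %r
      }

      define <4 x i32> @splice_neg_shuffle(<4 x i32> %vec1, <4 x i32> %vec2) {
        %r = shufflevector <4 x i32> %vec1, <4 x i32> %vec2,
                           <4 x i32> <i32 3, i32 4, i32 5, i32 6>
        ret <4 x i32> %r
      }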
17211 '``llvm.experimental.stepvector``' Intrinsic
17212 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17214 This is an overloaded intrinsic. You can use ``llvm.experimental.stepvector``
17215 to generate a vector whose lane values comprise the linear sequence
17216 <0, 1, 2, ...>. It is primarily intended for scalable vectors.
17220 declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
17221 declare <vscale x 8 x i16> @llvm.experimental.stepvector.nxv8i16()
17223 The '``llvm.experimental.stepvector``' intrinsics are used to create vectors
17224 of integers whose elements contain a linear sequence of values starting from 0
17225 with a step of 1. This experimental intrinsic can only be used for vectors
17226 with integer elements that are at least 8 bits in size. If the sequence value
17227 exceeds the allowed limit for the element type then the result for that lane is
17230 These intrinsics work for both fixed and scalable vectors. While this intrinsic
17231 is marked as experimental, the recommended way to express this operation for
17232 fixed-width vectors is still to generate a constant vector instead.
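As an illustration for scalable vectors, the step vector can be added to a
splat of a scalar to obtain per-lane indices ``%base + 0``, ``%base + 1``, and
so on. The function name ``@lane_indices`` is hypothetical, and the splat uses
the usual ``insertelement`` plus ``shufflevector`` idiom.

.. code-block:: llvm

      declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()

      define <vscale x 4 x i32> @lane_indices(i32 %base) {
        ; <0, 1, 2, ...> with as many lanes as the scalable vector has.
        %step = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
        ; Broadcast %base to every lane.
        %ins = insertelement <vscale x 4 x i32> poison, i32 %base, i32 0
        %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison,
                               <vscale x 4 x i32> zeroinitializer
        %idx = add <vscale x 4 x i32> %splat, %step
        ret <vscale x 4 x i32> %idx
      }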
17244 Operations on matrixes requiring shape information (like number of rows/columns
17245 or the memory layout) can be expressed using the matrix intrinsics. These
17246 intrinsics require matrix dimensions to be passed as immediate arguments, and
17247 matrixes are passed and returned as vectors. This means that for a ``R`` x
17248 ``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
17249 corresponding vector, with indices starting at 0. Currently column-major layout
17250 is assumed. The intrinsics support both integer and floating point matrixes.
17253 '``llvm.matrix.transpose.*``' Intrinsic
17254 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17258 This is an overloaded intrinsic.
17262 declare vectorty @llvm.matrix.transpose.*(vectorty %In, i32 <Rows>, i32 <Cols>)
17267 The '``llvm.matrix.transpose.*``' intrinsics treat ``%In`` as a ``<Rows> x
17268 <Cols>`` matrix and return the transposed matrix in the result vector.
17273 The first argument ``%In`` is a vector that corresponds to a ``<Rows> x
17274 <Cols>`` matrix. Thus, arguments ``<Rows>`` and ``<Cols>`` correspond to the
17275 number of rows and columns, respectively, and must be positive, constant
17276 integers. The returned vector must have ``<Rows> * <Cols>`` elements, and have
17277 the same float or integer element type as ``%In``.
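As a worked example of the column-major layout, a 2 x 3 matrix is stored as
``<a00, a10, a01, a11, a02, a12>``, and its 3 x 2 transpose as
``<a00, a01, a02, a10, a11, a12>``. The sketch below shows the intrinsic call
and a ``shufflevector`` that produces the same lane order; the function names
are hypothetical.

.. code-block:: llvm

      declare <6 x float> @llvm.matrix.transpose.v6f32(<6 x float>, i32, i32)

      ; %In holds a 2 x 3 matrix; the result holds the 3 x 2 transpose.
      define <6 x float> @transpose_2x3(<6 x float> %In) {
        %t = call <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> %In, i32 2, i32 3)
        ret <6 x float> %t
      }

      ; Result lane (j * 3 + i) takes source lane (i * 2 + j).
      define <6 x float> @transpose_2x3_shuffle(<6 x float> %In) {
        %t = shufflevector <6 x float> %In, <6 x float> poison,
                           <6 x i32> <i32 0, i32 2, i32 4, i32 1, i32 3, i32 5>
        ret <6 x float> %t
      }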
17279 '``llvm.matrix.multiply.*``' Intrinsic
17280 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17284 This is an overloaded intrinsic.
17288 declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 <OuterRows>, i32 <Inner>, i32 <OuterColumns>)
17293 The '``llvm.matrix.multiply.*``' intrinsics treat ``%A`` as a ``<OuterRows> x
17294 <Inner>`` matrix, ``%B`` as a ``<Inner> x <OuterColumns>`` matrix, and
17295 multiply them. The result matrix is returned in the result vector.
17300 The first vector argument ``%A`` corresponds to a matrix with ``<OuterRows> *
17301 <Inner>`` elements, and the second argument ``%B`` to a matrix with
17302 ``<Inner> * <OuterColumns>`` elements. Arguments ``<OuterRows>``,
17303 ``<Inner>`` and ``<OuterColumns>`` must be positive, constant integers. The
17304 returned vector must have ``<OuterRows> * <OuterColumns>`` elements.
17305 Vectors ``%A``, ``%B``, and the returned vector all have the same float or
17306 integer element type.
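A minimal sketch of multiplying a 2 x 3 matrix by a 3 x 2 matrix follows. The
function name is hypothetical, and the mangled suffix
``.v4f32.v6f32.v6f32`` (result type, then the two operand types) is an
assumption about how the overloaded name is spelled for these vector types.

.. code-block:: llvm

      declare <4 x float> @llvm.matrix.multiply.v4f32.v6f32.v6f32(<6 x float>, <6 x float>, i32, i32, i32)

      ; %A is 2 x 3 (6 elements), %B is 3 x 2 (6 elements), result is 2 x 2.
      define <4 x float> @mul_2x3_3x2(<6 x float> %A, <6 x float> %B) {
        %C = call <4 x float> @llvm.matrix.multiply.v4f32.v6f32.v6f32(
                 <6 x float> %A, <6 x float> %B, i32 2, i32 3, i32 2)
        ret <4 x float> %C
      }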
17309 '``llvm.matrix.column.major.load.*``' Intrinsic
17310 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17314 This is an overloaded intrinsic.
17318 declare vectorty @llvm.matrix.column.major.load.*(
17319 ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)
17324 The '``llvm.matrix.column.major.load.*``' intrinsics load a ``<Rows> x <Cols>``
17325 matrix using a stride of ``%Stride`` to compute the start address of the
17326 different columns. The offset is computed using ``%Stride``'s bitwidth. This
17327 allows for convenient loading of sub matrixes. If ``<IsVolatile>`` is true, the
17328 intrinsic is considered a :ref:`volatile memory access <volatile>`. The result
17329 matrix is returned in the result vector. If the ``%Ptr`` argument is known to
17330 be aligned to some boundary, this can be specified as an attribute on the
17336 The first argument ``%Ptr`` is a pointer type to the returned vector type, and
17337 corresponds to the start address to load from. The second argument ``%Stride``
17338 is a positive, constant integer with ``%Stride >= <Rows>``. ``%Stride`` is used
17339 to compute the column memory addresses. I.e., for a column ``C``, its start
17340 memory address is calculated with ``%Ptr + C * %Stride``. The third argument
17341 ``<IsVolatile>`` is a boolean value. The fourth and fifth arguments,
17342 ``<Rows>`` and ``<Cols>``, correspond to the number of rows and columns,
17343 respectively, and must be positive, constant integers. The returned vector must
17344 have ``<Rows> * <Cols>`` elements.
17346 The :ref:`align <attr_align>` parameter attribute can be provided for the
17347 ``%Ptr`` arguments.
17350 '``llvm.matrix.column.major.store.*``' Intrinsic
17351 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17358 declare void @llvm.matrix.column.major.store.*(
17359 vectorty %In, ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)
17364 The '``llvm.matrix.column.major.store.*``' intrinsics store the ``<Rows> x
17365 <Cols>`` matrix in ``%In`` to memory using a stride of ``%Stride`` between
17366 columns. The offset is computed using ``%Stride``'s bitwidth. If
17367 ``<IsVolatile>`` is true, the intrinsic is considered a
17368 :ref:`volatile memory access <volatile>`.
17370 If the ``%Ptr`` argument is known to be aligned to some boundary, this can be
17371 specified as an attribute on the argument.
17376 The first argument ``%In`` is a vector that corresponds to a ``<Rows> x
17377 <Cols>`` matrix to be stored to memory. The second argument ``%Ptr`` is a
17378 pointer to the vector type of ``%In``, and is the start address of the matrix
17379 in memory. The third argument ``%Stride`` is a positive, constant integer with
17380 ``%Stride >= <Rows>``. ``%Stride`` is used to compute the column memory
17381 addresses. I.e., for a column ``C``, its start memory address is calculated
17382 with ``%Ptr + C * %Stride``. The fourth argument ``<IsVolatile>`` is a boolean
17383 value. The arguments ``<Rows>`` and ``<Cols>`` correspond to the number of rows
17384 and columns, respectively, and must be positive, constant integers.
17386 The :ref:`align <attr_align>` parameter attribute can be provided
17387 for the ``%Ptr`` arguments.
17390 Half Precision Floating-Point Intrinsics
17391 ----------------------------------------
17393 For most target platforms, half precision floating-point is a
17394 storage-only format. This means that it is a dense encoding (in memory)
17395 but does not support computation in the format.
17397 This means that code must first load the half-precision floating-point
17398 value as an i16, then convert it to float with
17399 :ref:`llvm.convert.from.fp16 <int_convert_from_fp16>`. Computation can
17400 then be performed on the float value (including extending to double
17401 etc). To store the value back to memory, it is first converted to float
17402 if needed, then converted to i16 with
17403 :ref:`llvm.convert.to.fp16 <int_convert_to_fp16>`, then stored as an
17406 .. _int_convert_to_fp16:
17408 '``llvm.convert.to.fp16``' Intrinsic
17409 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17416 declare i16 @llvm.convert.to.fp16.f32(float %a)
17417 declare i16 @llvm.convert.to.fp16.f64(double %a)
17422 The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
17423 conventional floating-point type to half precision floating-point format.
17428 The intrinsic function takes a single argument - the value to be
17434 The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
17435 conventional floating-point format to half precision floating-point format. The
17436 return value is an ``i16`` which contains the converted number.
17441 .. code-block:: llvm
17443 %res = call i16 @llvm.convert.to.fp16.f32(float %a)
17444 store i16 %res, i16* @x, align 2
17446 .. _int_convert_from_fp16:
17448 '``llvm.convert.from.fp16``' Intrinsic
17449 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17456 declare float @llvm.convert.from.fp16.f32(i16 %a)
17457 declare double @llvm.convert.from.fp16.f64(i16 %a)
17462 The '``llvm.convert.from.fp16``' intrinsic function performs a
17463 conversion from half precision floating-point format to single precision
17464 floating-point format.
17469 The intrinsic function takes a single argument - the value to be
17475 The '``llvm.convert.from.fp16``' intrinsic function performs a
17476 conversion from half precision floating-point format to single
17477 precision floating-point format. The input half-float value is
17478 represented by an ``i16`` value.
17483 .. code-block:: llvm
17485 %a = load i16, i16* @x, align 2
17486 %res = call float @llvm.convert.from.fp16.f32(i16 %a)
17488 Saturating floating-point to integer conversions
17489 ------------------------------------------------
17491 The ``fptoui`` and ``fptosi`` instructions return a
17492 :ref:`poison value <poisonvalues>` if the rounded-towards-zero value is not
17493 representable by the result type. These intrinsics provide an alternative
17494 conversion, which will saturate towards the smallest and largest representable
17495 integer values instead.
17497 '``llvm.fptoui.sat.*``' Intrinsic
17498 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17503 This is an overloaded intrinsic. You can use ``llvm.fptoui.sat`` on any
17504 floating-point argument type and any integer result type, or vectors thereof.
17505 Not all targets may support all types, however.
17509 declare i32 @llvm.fptoui.sat.i32.f32(float %f)
17510 declare i19 @llvm.fptoui.sat.i19.f64(double %f)
17511 declare <4 x i100> @llvm.fptoui.sat.v4i100.v4f128(<4 x fp128> %f)
17516 This intrinsic converts the argument into an unsigned integer using saturating
17522 The argument may be any floating-point or vector of floating-point type. The
17523 return value may be any integer or vector of integer type. The number of vector
17524 elements in argument and return must be the same.
17529 The conversion to integer is performed subject to the following rules:
17531 - If the argument is any NaN, zero is returned.
17532 - If the argument is smaller than zero (this includes negative infinity),
17534 - If the argument is larger than the largest representable unsigned integer of
17535 the result type (this includes positive infinity), the largest representable
17536 unsigned integer is returned.
17537 - Otherwise, the result of rounding the argument towards zero is returned.
17542 .. code-block:: text
17544 %a = call i8 @llvm.fptoui.sat.i8.f32(float 123.9) ; yields i8: 123
17545 %b = call i8 @llvm.fptoui.sat.i8.f32(float -5.7) ; yields i8: 0
17546 %c = call i8 @llvm.fptoui.sat.i8.f32(float 377.0) ; yields i8: 255
17547 %d = call i8 @llvm.fptoui.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0
17549 '``llvm.fptosi.sat.*``' Intrinsic
17550 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17555 This is an overloaded intrinsic. You can use ``llvm.fptosi.sat`` on any
17556 floating-point argument type and any integer result type, or vectors thereof.
17557 Not all targets may support all types, however.
17561 declare i32 @llvm.fptosi.sat.i32.f32(float %f)
17562 declare i19 @llvm.fptosi.sat.i19.f64(double %f)
17563 declare <4 x i100> @llvm.fptosi.sat.v4i100.v4f128(<4 x fp128> %f)
17568 This intrinsic converts the argument into a signed integer using saturating
17574 The argument may be any floating-point or vector of floating-point type. The
17575 return value may be any integer or vector of integer type. The number of vector
17576 elements in argument and return must be the same.
17581 The conversion to integer is performed subject to the following rules:
17583 - If the argument is any NaN, zero is returned.
17584 - If the argument is smaller than the smallest representable signed integer of
17585 the result type (this includes negative infinity), the smallest
17586 representable signed integer is returned.
17587 - If the argument is larger than the largest representable signed integer of
17588 the result type (this includes positive infinity), the largest representable
17589 signed integer is returned.
17590 - Otherwise, the result of rounding the argument towards zero is returned.
17595 .. code-block:: text
17597 %a = call i8 @llvm.fptosi.sat.i8.f32(float 23.9) ; yields i8: 23
17598 %b = call i8 @llvm.fptosi.sat.i8.f32(float -130.8) ; yields i8: -128
17599 %c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0) ; yields i8: 127
17600 %d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0
17602 .. _dbg_intrinsics:
17604 Debugger Intrinsics
17605 -------------------
17607 The LLVM debugger intrinsics (which all start with the ``llvm.dbg.``
17608 prefix) are described in the `LLVM Source Level
17609 Debugging <SourceLevelDebugging.html#format-common-intrinsics>`_
17612 Exception Handling Intrinsics
17613 -----------------------------
17615 The LLVM exception handling intrinsics (which all start with the
17616 ``llvm.eh.`` prefix) are described in the `LLVM Exception
17617 Handling <ExceptionHandling.html#format-common-intrinsics>`_ document.
17619 Pointer Authentication Intrinsics
17620 ---------------------------------
17622 The LLVM pointer authentication intrinsics (which all start with the
17623 ``llvm.ptrauth.`` prefix) are described in the `Pointer Authentication
17624 <PointerAuth.html#intrinsics>`_ document.
17626 .. _int_trampoline:
17628 Trampoline Intrinsics
17629 ---------------------
17631 These intrinsics make it possible to excise one parameter, marked with
17632 the :ref:`nest <nest>` attribute, from a function. The result is a
17633 callable function pointer lacking the nest parameter - the caller does
17634 not need to provide a value for it. Instead, the value to use is stored
17635 in advance in a "trampoline", a block of memory usually allocated on the
17636 stack, which also contains code to splice the nest value into the
17637 argument list. This is used to implement the GCC nested function address
17640 For example, if the function is ``i32 f(i8* nest %c, i32 %x, i32 %y)``
17641 then the resulting function pointer has signature ``i32 (i32, i32)*``.
17642 It can be created as follows:
.. code-block:: llvm

      %tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
      %tramp1 = getelementptr [10 x i8], [10 x i8]* %tramp, i32 0, i32 0
      ; llvm.init.trampoline returns void, so the call produces no value
      call void @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
      %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
      %fp = bitcast i8* %p to i32 (i32, i32)*
The call ``%val = call i32 %fp(i32 %x, i32 %y)`` is then equivalent to
``%val = call i32 @f(i8* %nval, i32 %x, i32 %y)``.
17657 '``llvm.init.trampoline``' Intrinsic
17658 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17665 declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)
17670 This fills the memory pointed to by ``tramp`` with executable code,
17671 turning it into a trampoline.
17676 The ``llvm.init.trampoline`` intrinsic takes three arguments, all
17677 pointers. The ``tramp`` argument must point to a sufficiently large and
17678 sufficiently aligned block of memory; this memory is written to by the
17679 intrinsic. Note that the size and the alignment are target-specific -
17680 LLVM currently provides no portable way of determining them, so a
17681 front-end that generates this intrinsic needs to have some
17682 target-specific knowledge. The ``func`` argument must hold a function
17683 bitcast to an ``i8*``.
17688 The block of memory pointed to by ``tramp`` is filled with target
17689 dependent code, turning it into a function. Then ``tramp`` needs to be
17690 passed to :ref:`llvm.adjust.trampoline <int_at>` to get a pointer which can
17691 be :ref:`bitcast (to a new function) and called <int_trampoline>`. The new
17692 function's signature is the same as that of ``func`` with any arguments
17693 marked with the ``nest`` attribute removed. At most one such ``nest``
17694 argument is allowed, and it must be of pointer type. Calling the new
17695 function is equivalent to calling ``func`` with the same argument list,
17696 but with ``nval`` used for the missing ``nest`` argument. If, after
17697 calling ``llvm.init.trampoline``, the memory pointed to by ``tramp`` is
17698 modified, then the effect of any later call to the returned function
17699 pointer is undefined.
17703 '``llvm.adjust.trampoline``' Intrinsic
17704 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17711 declare i8* @llvm.adjust.trampoline(i8* <tramp>)
17716 This performs any required machine-specific adjustment to the address of
17717 a trampoline (passed as ``tramp``).
17722 ``tramp`` must point to a block of memory which already has trampoline
17723 code filled in by a previous call to
17724 :ref:`llvm.init.trampoline <int_it>`.
On some architectures the address of the code to be executed needs to be
different from the address where the trampoline is actually stored. This
intrinsic returns the executable address corresponding to ``tramp``
after performing the required machine-specific adjustments. The pointer
returned can then be :ref:`bitcast and executed <int_trampoline>`.
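Putting ``llvm.init.trampoline`` and ``llvm.adjust.trampoline`` together, a
complete module might look like the sketch below. It is illustrative only: the
body of ``@f``, the ``@caller`` function and the constant arguments are
assumptions made for this example, and the 10-byte, 4-byte-aligned buffer is
carried over from the X86-only example in the introduction to this section.

.. code-block:: llvm

      declare void @llvm.init.trampoline(i8*, i8*, i8*)
      declare i8* @llvm.adjust.trampoline(i8*)

      ; Hypothetical nested function; %c receives the value stored in the trampoline.
      define i32 @f(i8* nest %c, i32 %x, i32 %y) {
        %sum = add i32 %x, %y
        ret i32 %sum
      }

      ; Hypothetical caller that builds the trampoline and calls through it.
      define i32 @caller(i8* %nval) {
        %tramp = alloca [10 x i8], align 4   ; size and alignment assumed, X86 only
        %tramp1 = getelementptr [10 x i8], [10 x i8]* %tramp, i32 0, i32 0
        call void @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
        %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
        %fp = bitcast i8* %p to i32 (i32, i32)*
        ; Equivalent to: call i32 @f(i8* %nval, i32 1, i32 2)
        %val = call i32 %fp(i32 1, i32 2)
        ret i32 %val
      }

The memory behind ``%tramp`` is not modified between the call to
``llvm.init.trampoline`` and the call through ``%fp``, which keeps the call
within the defined behavior described for ``llvm.init.trampoline``.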
17738 Vector Predication Intrinsics
17739 -----------------------------
17740 VP intrinsics are intended for predicated SIMD/vector code. A typical VP
17741 operation takes a vector mask and an explicit vector length parameter as in:
17745 <W x T> llvm.vp.<opcode>.*(<W x T> %x, <W x T> %y, <W x i1> %mask, i32 %evl)
The vector mask parameter (``%mask``) always has a vector of ``i1`` type, for
example ``<32 x i1>``. The explicit vector length parameter always has the type
``i32`` and is an unsigned integer value. The explicit vector length parameter
(``%evl``) is in the range:
17754 0 <= %evl <= W, where W is the number of vector elements
17756 Note that for :ref:`scalable vector types <t_vector>` ``W`` is the runtime
17757 length of the vector.
The VP intrinsic has undefined behavior if ``%evl > W``. The explicit vector
length (``%evl``) creates a mask, ``%EVLmask``, with all elements
``0 <= i < %evl`` set to True, and all other lanes ``%evl <= i < W`` set to
False. A new mask ``%M`` is calculated with an element-wise AND from ``%mask``
and ``%EVLmask``:
17766 M = %mask AND %EVLmask
17768 A vector operation ``<opcode>`` on vectors ``A`` and ``B`` calculates:
       A <opcode> B =  {  A[i] <opcode> B[i]   if M[i] = True, and
                       {  undef                otherwise
17778 Some targets, such as AVX512, do not support the %evl parameter in hardware.
17779 The use of an effective %evl is discouraged for those targets. The function
17780 ``TargetTransformInfo::hasActiveVectorLength()`` returns true when the target
17781 has native support for %evl.
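The combined mask described above can be written out in plain IR. The sketch
below assumes a fixed-width operation with ``W = 4``, a ``<4 x i1> %mask`` and
an ``i32 %evl``; the value names are hypothetical, and this is an illustration
of the definition rather than the sequence any particular target or pass is
guaranteed to emit.

.. code-block:: llvm

      ; Broadcast %evl into every lane of a <4 x i32> vector.
      %evl.ins   = insertelement <4 x i32> undef, i32 %evl, i32 0
      %evl.splat = shufflevector <4 x i32> %evl.ins, <4 x i32> undef, <4 x i32> zeroinitializer
      ; Lane i of %EVLmask is true exactly when i < %evl (unsigned comparison).
      %EVLmask   = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %evl.splat
      ; A lane is enabled only if both %mask and %EVLmask enable it.
      %M         = and <4 x i1> %mask, %EVLmask

For instance, with ``%evl = 2`` and ``%mask = <1, 0, 1, 1>``, ``%EVLmask`` is
``<1, 1, 0, 0>`` and ``%M`` is ``<1, 0, 0, 0>``, so only lane 0 is enabled.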
17785 '``llvm.vp.select.*``' Intrinsics
17786 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17790 This is an overloaded intrinsic.
      declare <16 x i32>  @llvm.vp.select.v16i32 (<16 x i1> <condition>, <16 x i32> <on_true>, <16 x i32> <on_false>, i32 <evl>)
      declare <vscale x 4 x i64>  @llvm.vp.select.nxv4i64 (<vscale x 4 x i1> <condition>, <vscale x 4 x i64> <on_true>, <vscale x 4 x i64> <on_false>, i32 <evl>)
17800 The '``llvm.vp.select``' intrinsic is used to choose one value based on a
17801 condition vector, without IR-level branching.
17806 The first operand is a vector of ``i1`` and indicates the condition. The
17807 second operand is the value that is selected where the condition vector is
17808 true. The third operand is the value that is selected where the condition
17809 vector is false. The vectors must be of the same size. The fourth operand is
17810 the explicit vector length.
17812 #. The optional ``fast-math flags`` marker indicates that the select has one or
17813 more :ref:`fast-math flags <fastmath>`. These are optimization hints to
17814 enable otherwise unsafe floating-point optimizations. Fast-math flags are
17815 only valid for selects that return a floating-point scalar or vector type,
17816 or an array (nested to any depth) of floating-point scalar or vector types.
The intrinsic selects lanes from the second and third operand depending on a
condition vector.

All result lanes at positions greater than or equal to ``%evl`` are undefined.
For all lanes below ``%evl`` where the condition vector is true the lane is
taken from the second operand. Otherwise, the lane is taken from the third
operand.
17832 .. code-block:: llvm
17834 %r = call <4 x i32> @llvm.vp.select.v4i32(<4 x i1> %cond, <4 x i32> %on_true, <4 x i32> %on_false, i32 %evl)
17837 ;; Any result is legal on lanes at and above %evl.
17838 %also.r = select <4 x i1> %cond, <4 x i32> %on_true, <4 x i32> %on_false
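To make the lane selection concrete, the sketch below calls the intrinsic with
constant operands. The operand values and the name ``%r.const`` are chosen for
this example only.

.. code-block:: llvm

      %r.const = call <4 x i32> @llvm.vp.select.v4i32(
                     <4 x i1>  <i1 true, i1 false, i1 true, i1 false>,
                     <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
                     <4 x i32> <i32 10, i32 20, i32 30, i32 40>,
                     i32 3)
      ;; Lanes 0 to 2 are below %evl and hold <1, 20, 3>.
      ;; Lane 3 is at %evl, so its value is undefined.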
17844 '``llvm.vp.add.*``' Intrinsics
17845 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17849 This is an overloaded intrinsic.
17853 declare <16 x i32> @llvm.vp.add.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17854 declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17855 declare <256 x i64> @llvm.vp.add.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17860 Predicated integer addition of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17874 The '``llvm.vp.add``' intrinsic performs integer addition (:ref:`add <i_add>`)
17875 of the first and second vector operand on each enabled lane. The result on
17876 disabled lanes is undefined.
17881 .. code-block:: llvm
17883 %r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17884 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17886 %t = add <4 x i32> %a, %b
17887 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
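The interaction of the mask and the explicit vector length can be seen with
constant operands. In the sketch below the operand values and the name
``%r.const`` are assumptions made for illustration.

.. code-block:: llvm

      %r.const = call <4 x i32> @llvm.vp.add.v4i32(
                     <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
                     <4 x i32> <i32 10, i32 20, i32 30, i32 40>,
                     <4 x i1>  <i1 true, i1 false, i1 true, i1 true>,
                     i32 3)
      ;; Lane 0: 1 + 10 = 11
      ;; Lane 1: undefined (mask bit is false)
      ;; Lane 2: 3 + 30 = 33
      ;; Lane 3: undefined (lane index 3 is not below %evl = 3)

The same pattern of enabled and disabled lanes applies to the other binary
vector-predicated intrinsics that take a mask and an explicit vector length.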
17891 '``llvm.vp.sub.*``' Intrinsics
17892 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17896 This is an overloaded intrinsic.
17900 declare <16 x i32> @llvm.vp.sub.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17901 declare <vscale x 4 x i32> @llvm.vp.sub.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17902 declare <256 x i64> @llvm.vp.sub.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17907 Predicated integer subtraction of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17921 The '``llvm.vp.sub``' intrinsic performs integer subtraction
17922 (:ref:`sub <i_sub>`) of the first and second vector operand on each enabled
17923 lane. The result on disabled lanes is undefined.
17928 .. code-block:: llvm
17930 %r = call <4 x i32> @llvm.vp.sub.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17931 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17933 %t = sub <4 x i32> %a, %b
17934 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17940 '``llvm.vp.mul.*``' Intrinsics
17941 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17945 This is an overloaded intrinsic.
      declare <16 x i32>  @llvm.vp.mul.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x i32>  @llvm.vp.mul.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x i64>  @llvm.vp.mul.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17956 Predicated integer multiplication of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17969 The '``llvm.vp.mul``' intrinsic performs integer multiplication
17970 (:ref:`mul <i_mul>`) of the first and second vector operand on each enabled
17971 lane. The result on disabled lanes is undefined.
17976 .. code-block:: llvm
17978 %r = call <4 x i32> @llvm.vp.mul.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17979 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17981 %t = mul <4 x i32> %a, %b
17982 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17987 '``llvm.vp.sdiv.*``' Intrinsics
17988 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17992 This is an overloaded intrinsic.
17996 declare <16 x i32> @llvm.vp.sdiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17997 declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17998 declare <256 x i64> @llvm.vp.sdiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18003 Predicated, signed division of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18017 The '``llvm.vp.sdiv``' intrinsic performs signed division (:ref:`sdiv <i_sdiv>`)
18018 of the first and second vector operand on each enabled lane. The result on
18019 disabled lanes is undefined.
18024 .. code-block:: llvm
18026 %r = call <4 x i32> @llvm.vp.sdiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18027 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18029 %t = sdiv <4 x i32> %a, %b
18030 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18035 '``llvm.vp.udiv.*``' Intrinsics
18036 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18040 This is an overloaded intrinsic.
18044 declare <16 x i32> @llvm.vp.udiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18045 declare <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18046 declare <256 x i64> @llvm.vp.udiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18051 Predicated, unsigned division of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18062 The '``llvm.vp.udiv``' intrinsic performs unsigned division
18063 (:ref:`udiv <i_udiv>`) of the first and second vector operand on each enabled
18064 lane. The result on disabled lanes is undefined.
18069 .. code-block:: llvm
18071 %r = call <4 x i32> @llvm.vp.udiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18072 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18074 %t = udiv <4 x i32> %a, %b
18075 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18081 '``llvm.vp.srem.*``' Intrinsics
18082 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18086 This is an overloaded intrinsic.
18090 declare <16 x i32> @llvm.vp.srem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18091 declare <vscale x 4 x i32> @llvm.vp.srem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18092 declare <256 x i64> @llvm.vp.srem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Predicated computation of the signed remainder of two integer vectors.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18111 The '``llvm.vp.srem``' intrinsic computes the remainder of the signed division
18112 (:ref:`srem <i_srem>`) of the first and second vector operand on each enabled
18113 lane. The result on disabled lanes is undefined.
18118 .. code-block:: llvm
18120 %r = call <4 x i32> @llvm.vp.srem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18121 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18123 %t = srem <4 x i32> %a, %b
18124 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18130 '``llvm.vp.urem.*``' Intrinsics
18131 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18135 This is an overloaded intrinsic.
18139 declare <16 x i32> @llvm.vp.urem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18140 declare <vscale x 4 x i32> @llvm.vp.urem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18141 declare <256 x i64> @llvm.vp.urem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18146 Predicated computation of the unsigned remainder of two integer vectors.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18160 The '``llvm.vp.urem``' intrinsic computes the remainder of the unsigned division
18161 (:ref:`urem <i_urem>`) of the first and second vector operand on each enabled
18162 lane. The result on disabled lanes is undefined.
18167 .. code-block:: llvm
18169 %r = call <4 x i32> @llvm.vp.urem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18170 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18172 %t = urem <4 x i32> %a, %b
18173 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18178 '``llvm.vp.ashr.*``' Intrinsics
18179 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18183 This is an overloaded intrinsic.
18187 declare <16 x i32> @llvm.vp.ashr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18188 declare <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18189 declare <256 x i64> @llvm.vp.ashr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18194 Vector-predicated arithmetic right-shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18208 The '``llvm.vp.ashr``' intrinsic computes the arithmetic right shift
18209 (:ref:`ashr <i_ashr>`) of the first operand by the second operand on each
18210 enabled lane. The result on disabled lanes is undefined.
18215 .. code-block:: llvm
18217 %r = call <4 x i32> @llvm.vp.ashr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18218 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18220 %t = ashr <4 x i32> %a, %b
18221 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18227 '``llvm.vp.lshr.*``' Intrinsics
18228 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18232 This is an overloaded intrinsic.
18236 declare <16 x i32> @llvm.vp.lshr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18237 declare <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18238 declare <256 x i64> @llvm.vp.lshr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18243 Vector-predicated logical right-shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18257 The '``llvm.vp.lshr``' intrinsic computes the logical right shift
18258 (:ref:`lshr <i_lshr>`) of the first operand by the second operand on each
18259 enabled lane. The result on disabled lanes is undefined.
18264 .. code-block:: llvm
18266 %r = call <4 x i32> @llvm.vp.lshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18267 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18269 %t = lshr <4 x i32> %a, %b
18270 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
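The difference between the arithmetic and the logical right shift shows up on
negative inputs. The sketch below uses constant operands, an all-true mask and
an explicit vector length equal to the vector width, so every lane is enabled;
the operand values and the names ``%sra`` and ``%srl`` are chosen for this
example only.

.. code-block:: llvm

      %sra = call <4 x i32> @llvm.vp.ashr.v4i32(
                 <4 x i32> <i32 -8, i32 16, i32 -8, i32 16>,
                 <4 x i32> <i32 1, i32 1, i32 1, i32 1>,
                 <4 x i1>  <i1 true, i1 true, i1 true, i1 true>,
                 i32 4)
      ;; ashr shifts in copies of the sign bit: <-4, 8, -4, 8>

      %srl = call <4 x i32> @llvm.vp.lshr.v4i32(
                 <4 x i32> <i32 -8, i32 16, i32 -8, i32 16>,
                 <4 x i32> <i32 1, i32 1, i32 1, i32 1>,
                 <4 x i1>  <i1 true, i1 true, i1 true, i1 true>,
                 i32 4)
      ;; lshr shifts in zero bits: <2147483644, 8, 2147483644, 8>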
18275 '``llvm.vp.shl.*``' Intrinsics
18276 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18280 This is an overloaded intrinsic.
18284 declare <16 x i32> @llvm.vp.shl.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18285 declare <vscale x 4 x i32> @llvm.vp.shl.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18286 declare <256 x i64> @llvm.vp.shl.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18291 Vector-predicated left shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18305 The '``llvm.vp.shl``' intrinsic computes the left shift (:ref:`shl <i_shl>`) of
18306 the first operand by the second operand on each enabled lane. The result on
18307 disabled lanes is undefined.
18312 .. code-block:: llvm
18314 %r = call <4 x i32> @llvm.vp.shl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18315 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18317 %t = shl <4 x i32> %a, %b
18318 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18323 '``llvm.vp.or.*``' Intrinsics
18324 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18328 This is an overloaded intrinsic.
18332 declare <16 x i32> @llvm.vp.or.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18333 declare <vscale x 4 x i32> @llvm.vp.or.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18334 declare <256 x i64> @llvm.vp.or.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18339 Vector-predicated or.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.or``' intrinsic performs a bitwise or (:ref:`or <i_or>`) of the
first two operands on each enabled lane. The result on disabled lanes is
undefined.
18360 .. code-block:: llvm
18362 %r = call <4 x i32> @llvm.vp.or.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18363 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18365 %t = or <4 x i32> %a, %b
18366 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18371 '``llvm.vp.and.*``' Intrinsics
18372 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18376 This is an overloaded intrinsic.
18380 declare <16 x i32> @llvm.vp.and.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18381 declare <vscale x 4 x i32> @llvm.vp.and.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18382 declare <256 x i64> @llvm.vp.and.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18387 Vector-predicated and.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.and``' intrinsic performs a bitwise and (:ref:`and <i_and>`) of
the first two operands on each enabled lane. The result on disabled lanes is
undefined.
18408 .. code-block:: llvm
18410 %r = call <4 x i32> @llvm.vp.and.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18411 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18413 %t = and <4 x i32> %a, %b
18414 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18419 '``llvm.vp.xor.*``' Intrinsics
18420 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18424 This is an overloaded intrinsic.
18428 declare <16 x i32> @llvm.vp.xor.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18429 declare <vscale x 4 x i32> @llvm.vp.xor.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18430 declare <256 x i64> @llvm.vp.xor.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18435 Vector-predicated, bitwise xor.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18449 The '``llvm.vp.xor``' intrinsic performs a bitwise xor (:ref:`xor <i_xor>`) of
18450 the first two operands on each enabled lane.
18451 The result on disabled lanes is undefined.
18456 .. code-block:: llvm
18458 %r = call <4 x i32> @llvm.vp.xor.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18459 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18461 %t = xor <4 x i32> %a, %b
18462 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18467 '``llvm.vp.fadd.*``' Intrinsics
18468 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18472 This is an overloaded intrinsic.
18476 declare <16 x float> @llvm.vp.fadd.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18477 declare <vscale x 4 x float> @llvm.vp.fadd.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18478 declare <256 x double> @llvm.vp.fadd.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18483 Predicated floating-point addition of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point
type. The third operand is the vector mask and has the same number of elements
as the result vector type. The fourth operand is the explicit vector length of
the operation.
The '``llvm.vp.fadd``' intrinsic performs floating-point addition
(:ref:`fadd <i_fadd>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18505 .. code-block:: llvm
18507 %r = call <4 x float> @llvm.vp.fadd.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18508 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18510 %t = fadd <4 x float> %a, %b
18511 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18516 '``llvm.vp.fsub.*``' Intrinsics
18517 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18521 This is an overloaded intrinsic.
18525 declare <16 x float> @llvm.vp.fsub.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18526 declare <vscale x 4 x float> @llvm.vp.fsub.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18527 declare <256 x double> @llvm.vp.fsub.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18532 Predicated floating-point subtraction of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point
type. The third operand is the vector mask and has the same number of elements
as the result vector type. The fourth operand is the explicit vector length of
the operation.
The '``llvm.vp.fsub``' intrinsic performs floating-point subtraction
(:ref:`fsub <i_fsub>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18554 .. code-block:: llvm
18556 %r = call <4 x float> @llvm.vp.fsub.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18557 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18559 %t = fsub <4 x float> %a, %b
18560 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18565 '``llvm.vp.fmul.*``' Intrinsics
18566 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18570 This is an overloaded intrinsic.
18574 declare <16 x float> @llvm.vp.fmul.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18575 declare <vscale x 4 x float> @llvm.vp.fmul.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18576 declare <256 x double> @llvm.vp.fmul.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18581 Predicated floating-point multiplication of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point
type. The third operand is the vector mask and has the same number of elements
as the result vector type. The fourth operand is the explicit vector length of
the operation.
The '``llvm.vp.fmul``' intrinsic performs floating-point multiplication
(:ref:`fmul <i_fmul>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18603 .. code-block:: llvm
18605 %r = call <4 x float> @llvm.vp.fmul.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18606 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18608 %t = fmul <4 x float> %a, %b
18609 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18614 '``llvm.vp.fdiv.*``' Intrinsics
18615 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18619 This is an overloaded intrinsic.
18623 declare <16 x float> @llvm.vp.fdiv.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18624 declare <vscale x 4 x float> @llvm.vp.fdiv.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18625 declare <256 x double> @llvm.vp.fdiv.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18630 Predicated floating-point division of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point
type. The third operand is the vector mask and has the same number of elements
as the result vector type. The fourth operand is the explicit vector length of
the operation.
The '``llvm.vp.fdiv``' intrinsic performs floating-point division
(:ref:`fdiv <i_fdiv>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18652 .. code-block:: llvm
18654 %r = call <4 x float> @llvm.vp.fdiv.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18655 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18657 %t = fdiv <4 x float> %a, %b
18658 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18663 '``llvm.vp.frem.*``' Intrinsics
18664 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18668 This is an overloaded intrinsic.
18672 declare <16 x float> @llvm.vp.frem.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18673 declare <vscale x 4 x float> @llvm.vp.frem.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18674 declare <256 x double> @llvm.vp.frem.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18679 Predicated floating-point remainder of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point
type. The third operand is the vector mask and has the same number of elements
as the result vector type. The fourth operand is the explicit vector length of
the operation.
The '``llvm.vp.frem``' intrinsic computes the floating-point remainder
(:ref:`frem <i_frem>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18701 .. code-block:: llvm
18703 %r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18704 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18706 %t = frem <4 x float> %a, %b
18707 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18711 .. _int_vp_reduce_add:
18713 '``llvm.vp.reduce.add.*``' Intrinsics
18714 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18718 This is an overloaded intrinsic.
18722 declare i32 @llvm.vp.reduce.add.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18723 declare i16 @llvm.vp.reduce.add.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18728 Predicated integer ``ADD`` reduction of a vector and a scalar starting value,
18729 returning the result as a scalar.
18734 The first operand is the start value of the reduction, which must be a scalar
18735 integer type equal to the result type. The second operand is the vector on
18736 which the reduction is performed and must be a vector of integer values whose
18737 element type is the result/start type. The third operand is the vector mask and
18738 is a vector of boolean values with the same number of elements as the vector
18739 operand. The fourth operand is the explicit vector length of the operation.
18744 The '``llvm.vp.reduce.add``' intrinsic performs the integer ``ADD`` reduction
18745 (:ref:`llvm.vector.reduce.add <int_vector_reduce_add>`) of the vector operand
18746 ``val`` on each enabled lane, adding it to the scalar ``start_value``. Disabled
18747 lanes are treated as containing the neutral value ``0`` (i.e. having no effect
18748 on the reduction operation). If the vector length is zero, the result is equal
18749 to ``start_value``.
18751 To ignore the start value, the neutral value can be used.
18756 .. code-block:: llvm
18758 %r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18759 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18760 ; are treated as though %mask were false for those lanes.
18762 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> zeroinitializer
18763 %reduction = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked.a)
18764 %also.r = add i32 %reduction, %start
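The interaction of ``%mask``, ``%evl`` and ``%start`` can also be checked with a small scalar model. The Python sketch below is illustrative only: ``vp_reduce_add`` is a hypothetical helper, vectors are plain lists, and the result is wrapped to an unsigned ``bits``-wide pattern.

```python
def vp_reduce_add(start, val, mask, evl, bits=32):
    """Illustrative model of llvm.vp.reduce.add (hypothetical helper)."""
    total = start
    for i, x in enumerate(val):
        if i < evl and mask[i]:      # disabled lanes contribute the neutral value 0
            total += x
    return total % (1 << bits)       # wrap like an iN bit pattern

a = [10, 20, 30, 40]
mask = [True, False, True, True]
r1 = vp_reduce_add(5, a, mask, 4)    # 5 + 10 + 30 + 40
r2 = vp_reduce_add(5, a, mask, 2)    # 5 + 10; lane 1 is masked, lanes 2-3 are beyond evl
r3 = vp_reduce_add(5, a, mask, 0)    # vector length zero: result is the start value
r4 = vp_reduce_add(0, a, mask, 4)    # neutral start value: start is effectively ignored
```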
18767 .. _int_vp_reduce_fadd:
18769 '``llvm.vp.reduce.fadd.*``' Intrinsics
18770 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18774 This is an overloaded intrinsic.
18778 declare float @llvm.vp.reduce.fadd.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
18779 declare double @llvm.vp.reduce.fadd.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18784 Predicated floating-point ``ADD`` reduction of a vector and a scalar starting
18785 value, returning the result as a scalar.
18790 The first operand is the start value of the reduction, which must be a scalar
18791 floating-point type equal to the result type. The second operand is the vector
18792 on which the reduction is performed and must be a vector of floating-point
18793 values whose element type is the result/start type. The third operand is the
18794 vector mask and is a vector of boolean values with the same number of elements
18795 as the vector operand. The fourth operand is the explicit vector length of the operation.
18801 The '``llvm.vp.reduce.fadd``' intrinsic performs the floating-point ``ADD``
18802 reduction (:ref:`llvm.vector.reduce.fadd <int_vector_reduce_fadd>`) of the
18803 vector operand ``val`` on each enabled lane, adding it to the scalar
18804 ``start_value``. Disabled lanes are treated as containing the neutral value
18805 ``-0.0`` (i.e. having no effect on the reduction operation). If no lanes are
18806 enabled, the resulting value will be equal to ``start_value``.
18808 To ignore the start value, the neutral value can be used.
18810 See the unpredicated version (:ref:`llvm.vector.reduce.fadd
18811 <int_vector_reduce_fadd>`) for more detail on the semantics of the reduction.
18816 .. code-block:: llvm
18818 %r = call float @llvm.vp.reduce.fadd.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
18819 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18820 ; are treated as though %mask were false for those lanes.
18822 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>
18823 %also.r = call float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %masked.a)
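The choice of ``-0.0`` rather than ``+0.0`` as the neutral value matters for the sign of zero. The Python sketch below is a sequential model under stated assumptions: ``vp_reduce_fadd`` is a hypothetical helper, no reassociation is modelled, and Python doubles stand in for the element type.

```python
import math

def vp_reduce_fadd(start, val, mask, evl):
    """Illustrative sequential model of llvm.vp.reduce.fadd (hypothetical helper)."""
    acc = start
    for i, x in enumerate(val):
        # Disabled lanes contribute the neutral value -0.0.
        acc += x if (i < evl and mask[i]) else -0.0
    return acc

# With every lane disabled, a -0.0 start value survives unchanged ...
r = vp_reduce_fadd(-0.0, [1.0, 2.0], [False, False], 2)
sign_kept = math.copysign(1.0, r) == -1.0
# ... whereas adding +0.0 to -0.0 would produce +0.0.
sign_flipped = math.copysign(1.0, -0.0 + 0.0) == 1.0

total = vp_reduce_fadd(1.0, [2.0, 3.0, 4.0], [True, False, True], 3)  # 1.0 + 2.0 + 4.0
```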
18826 .. _int_vp_reduce_mul:
18828 '``llvm.vp.reduce.mul.*``' Intrinsics
18829 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18833 This is an overloaded intrinsic.
18837 declare i32 @llvm.vp.reduce.mul.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18838 declare i16 @llvm.vp.reduce.mul.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18843 Predicated integer ``MUL`` reduction of a vector and a scalar starting value,
18844 returning the result as a scalar.
18850 The first operand is the start value of the reduction, which must be a scalar
18851 integer type equal to the result type. The second operand is the vector on
18852 which the reduction is performed and must be a vector of integer values whose
18853 element type is the result/start type. The third operand is the vector mask and
18854 is a vector of boolean values with the same number of elements as the vector
18855 operand. The fourth operand is the explicit vector length of the operation.
18860 The '``llvm.vp.reduce.mul``' intrinsic performs the integer ``MUL`` reduction
18861 (:ref:`llvm.vector.reduce.mul <int_vector_reduce_mul>`) of the vector operand ``val``
18862 on each enabled lane, multiplying it by the scalar ``start_value``. Disabled
18863 lanes are treated as containing the neutral value ``1`` (i.e. having no effect
18864 on the reduction operation). If the vector length is zero, the result is the start value.
18867 To ignore the start value, the neutral value can be used.
18872 .. code-block:: llvm
18874 %r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18875 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18876 ; are treated as though %mask were false for those lanes.
18878 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
18879 %reduction = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %masked.a)
18880 %also.r = mul i32 %reduction, %start
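A scalar model makes the neutral value ``1`` and the wrapping behaviour concrete. This Python sketch is illustrative only; ``vp_reduce_mul`` is a hypothetical helper and the ``bits`` parameter is an assumption used to mimic an ``iN`` result.

```python
def vp_reduce_mul(start, val, mask, evl, bits=32):
    """Illustrative model of llvm.vp.reduce.mul (hypothetical helper)."""
    product = start
    for i, x in enumerate(val):
        if i < evl and mask[i]:      # disabled lanes behave like a factor of 1
            product *= x
    return product % (1 << bits)     # wrap like an iN bit pattern

a = [2, 3, 4, 5]
mask = [True, False, True, True]
r1 = vp_reduce_mul(7, a, mask, 3)    # 7 * 2 * 4; lane 3 is beyond evl
r2 = vp_reduce_mul(1, a, mask, 4)    # neutral start value: 2 * 4 * 5
r3 = vp_reduce_mul(7, a, mask, 0)    # vector length zero: result is the start value
r4 = vp_reduce_mul(64, [4, 1, 1, 1], [True] * 4, 4, bits=8)  # 256 wraps to 0 in i8
```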
18882 .. _int_vp_reduce_fmul:
18884 '``llvm.vp.reduce.fmul.*``' Intrinsics
18885 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18889 This is an overloaded intrinsic.
18893 declare float @llvm.vp.reduce.fmul.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
18894 declare double @llvm.vp.reduce.fmul.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18899 Predicated floating-point ``MUL`` reduction of a vector and a scalar starting
18900 value, returning the result as a scalar.
18906 The first operand is the start value of the reduction, which must be a scalar
18907 floating-point type equal to the result type. The second operand is the vector
18908 on which the reduction is performed and must be a vector of floating-point
18909 values whose element type is the result/start type. The third operand is the
18910 vector mask and is a vector of boolean values with the same number of elements
18911 as the vector operand. The fourth operand is the explicit vector length of the operation.
18917 The '``llvm.vp.reduce.fmul``' intrinsic performs the floating-point ``MUL``
18918 reduction (:ref:`llvm.vector.reduce.fmul <int_vector_reduce_fmul>`) of the
18919 vector operand ``val`` on each enabled lane, multiplying it by the scalar
18920 ``start_value``. Disabled lanes are treated as containing the neutral value
18921 ``1.0`` (i.e. having no effect on the reduction operation). If no lanes are
18922 enabled, the resulting value will be equal to the starting value.
18924 To ignore the start value, the neutral value can be used.
18926 See the unpredicated version (:ref:`llvm.vector.reduce.fmul
18927 <int_vector_reduce_fmul>`) for more detail on the semantics.
18932 .. code-block:: llvm
18934 %r = call float @llvm.vp.reduce.fmul.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
18935 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18936 ; are treated as though %mask were false for those lanes.
18938 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
18939 %also.r = call float @llvm.vector.reduce.fmul.v4f32(float %start, <4 x float> %masked.a)
18942 .. _int_vp_reduce_and:
18944 '``llvm.vp.reduce.and.*``' Intrinsics
18945 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18949 This is an overloaded intrinsic.
18953 declare i32 @llvm.vp.reduce.and.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18954 declare i16 @llvm.vp.reduce.and.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18959 Predicated integer ``AND`` reduction of a vector and a scalar starting value,
18960 returning the result as a scalar.
18966 The first operand is the start value of the reduction, which must be a scalar
18967 integer type equal to the result type. The second operand is the vector on
18968 which the reduction is performed and must be a vector of integer values whose
18969 element type is the result/start type. The third operand is the vector mask and
18970 is a vector of boolean values with the same number of elements as the vector
18971 operand. The fourth operand is the explicit vector length of the operation.
18976 The '``llvm.vp.reduce.and``' intrinsic performs the integer ``AND`` reduction
18977 (:ref:`llvm.vector.reduce.and <int_vector_reduce_and>`) of the vector operand
18978 ``val`` on each enabled lane, performing an '``and``' of that with the
18979 scalar ``start_value``. Disabled lanes are treated as containing the neutral
18980 value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
18981 operation). If the vector length is zero, the result is the start value.
18983 To ignore the start value, the neutral value can be used.
18988 .. code-block:: llvm
18990 %r = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18991 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18992 ; are treated as though %mask were false for those lanes.
18994 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
18995 %reduction = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %masked.a)
18996 %also.r = and i32 %reduction, %start
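The statement that ``UINT_MAX`` and ``-1`` name the same neutral value can be seen in a scalar model that works on bit patterns. The Python sketch below is illustrative; ``vp_reduce_and`` is a hypothetical helper and ``bits`` is an assumed parameter selecting the integer width.

```python
def vp_reduce_and(start, val, mask, evl, bits=32):
    """Illustrative model of llvm.vp.reduce.and (hypothetical helper)."""
    all_ones = (1 << bits) - 1                      # UINT_MAX, the bit pattern of -1
    acc = start & all_ones
    for i, x in enumerate(val):
        lane = x if (i < evl and mask[i]) else -1   # neutral value for disabled lanes
        acc &= lane & all_ones
    return acc

a = [0x0F, 0x00, 0x3C, 0xF0]
mask = [True, False, True, True]
r1 = vp_reduce_and(0xFF, a, mask, 3)   # 0xFF & 0x0F & 0x3C
r2 = vp_reduce_and(-1, a, mask, 3)     # neutral start value: 0x0F & 0x3C
r3 = vp_reduce_and(-1, a, mask, 0)     # vector length zero: start value, all ones
```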
18999 .. _int_vp_reduce_or:
19001 '``llvm.vp.reduce.or.*``' Intrinsics
19002 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19006 This is an overloaded intrinsic.
19010 declare i32 @llvm.vp.reduce.or.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19011 declare i16 @llvm.vp.reduce.or.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19016 Predicated integer ``OR`` reduction of a vector and a scalar starting value,
19017 returning the result as a scalar.
19023 The first operand is the start value of the reduction, which must be a scalar
19024 integer type equal to the result type. The second operand is the vector on
19025 which the reduction is performed and must be a vector of integer values whose
19026 element type is the result/start type. The third operand is the vector mask and
19027 is a vector of boolean values with the same number of elements as the vector
19028 operand. The fourth operand is the explicit vector length of the operation.
19033 The '``llvm.vp.reduce.or``' intrinsic performs the integer ``OR`` reduction
19034 (:ref:`llvm.vector.reduce.or <int_vector_reduce_or>`) of the vector operand
19035 ``val`` on each enabled lane, performing an '``or``' of that with the scalar
19036 ``start_value``. Disabled lanes are treated as containing the neutral value
19037 ``0`` (i.e. having no effect on the reduction operation). If the vector length
19038 is zero, the result is the start value.
19040 To ignore the start value, the neutral value can be used.
19045 .. code-block:: llvm
19047 %r = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19048 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19049 ; are treated as though %mask were false for those lanes.
19051 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
19052 %reduction = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %masked.a)
19053 %also.r = or i32 %reduction, %start
19055 .. _int_vp_reduce_xor:
19057 '``llvm.vp.reduce.xor.*``' Intrinsics
19058 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19062 This is an overloaded intrinsic.
19066 declare i32 @llvm.vp.reduce.xor.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19067 declare i16 @llvm.vp.reduce.xor.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19072 Predicated integer ``XOR`` reduction of a vector and a scalar starting value,
19073 returning the result as a scalar.
19079 The first operand is the start value of the reduction, which must be a scalar
19080 integer type equal to the result type. The second operand is the vector on
19081 which the reduction is performed and must be a vector of integer values whose
19082 element type is the result/start type. The third operand is the vector mask and
19083 is a vector of boolean values with the same number of elements as the vector
19084 operand. The fourth operand is the explicit vector length of the operation.
19089 The '``llvm.vp.reduce.xor``' intrinsic performs the integer ``XOR`` reduction
19090 (:ref:`llvm.vector.reduce.xor <int_vector_reduce_xor>`) of the vector operand
19091 ``val`` on each enabled lane, performing an '``xor``' of that with the scalar
19092 ``start_value``. Disabled lanes are treated as containing the neutral value
19093 ``0`` (i.e. having no effect on the reduction operation). If the vector length
19094 is zero, the result is the start value.
19096 To ignore the start value, the neutral value can be used.
19101 .. code-block:: llvm
19103 %r = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19104 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19105 ; are treated as though %mask were false for those lanes.
19107 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
19108 %reduction = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %masked.a)
19109 %also.r = xor i32 %reduction, %start
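The expansion above maps directly onto a fold over the masked lanes. The Python sketch below is illustrative only: ``vp_reduce_xor`` is a hypothetical helper, and the inputs are kept non-negative so that no integer width needs to be modelled.

```python
from functools import reduce
from operator import xor

def vp_reduce_xor(start, val, mask, evl):
    """Illustrative model of llvm.vp.reduce.xor (hypothetical helper)."""
    # Disabled lanes are replaced by the neutral value 0 before folding.
    lanes = [x if (i < evl and mask[i]) else 0 for i, x in enumerate(val)]
    return reduce(xor, lanes, start)

a = [0b1010, 0b0110, 0b1111, 0b0001]
mask = [True, True, False, True]
r1 = vp_reduce_xor(0, a, mask, 4)        # 0b1010 ^ 0b0110 ^ 0b0001
r2 = vp_reduce_xor(0b1101, a, mask, 4)   # the start value cancels the lane result
```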
19112 .. _int_vp_reduce_smax:
19114 '``llvm.vp.reduce.smax.*``' Intrinsics
19115 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19119 This is an overloaded intrinsic.
19123 declare i32 @llvm.vp.reduce.smax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19124 declare i16 @llvm.vp.reduce.smax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19129 Predicated signed-integer ``MAX`` reduction of a vector and a scalar starting
19130 value, returning the result as a scalar.
19136 The first operand is the start value of the reduction, which must be a scalar
19137 integer type equal to the result type. The second operand is the vector on
19138 which the reduction is performed and must be a vector of integer values whose
19139 element type is the result/start type. The third operand is the vector mask and
19140 is a vector of boolean values with the same number of elements as the vector
19141 operand. The fourth operand is the explicit vector length of the operation.
19146 The '``llvm.vp.reduce.smax``' intrinsic performs the signed-integer ``MAX``
19147 reduction (:ref:`llvm.vector.reduce.smax <int_vector_reduce_smax>`) of the
19148 vector operand ``val`` on each enabled lane, taking the maximum of that and
19149 the scalar ``start_value``. Disabled lanes are treated as containing the
19150 neutral value ``INT_MIN`` (i.e. having no effect on the reduction operation).
19151 If the vector length is zero, the result is the start value.
19153 To ignore the start value, the neutral value can be used.
19158 .. code-block:: llvm
19160 %r = call i8 @llvm.vp.reduce.smax.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
19161 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19162 ; are treated as though %mask were false for those lanes.
19164 %masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 -128, i8 -128, i8 -128, i8 -128>
19165 %reduction = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> %masked.a)
19166 %also.r = call i8 @llvm.smax.i8(i8 %reduction, i8 %start)
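For the ``i8`` example above, ``INT_MIN`` is ``-128``. The following Python sketch models the same computation on lists; ``vp_reduce_smax`` is a hypothetical helper and ``bits`` is an assumed parameter used only to derive the neutral value.

```python
def vp_reduce_smax(start, val, mask, evl, bits=8):
    """Illustrative model of llvm.vp.reduce.smax (hypothetical helper)."""
    int_min = -(1 << (bits - 1))         # -128 for i8, the neutral value
    lanes = [x if (i < evl and mask[i]) else int_min
             for i, x in enumerate(val)]
    return max([start] + lanes)

a = [-5, 90, -128, 127]
mask = [True, False, True, True]
r1 = vp_reduce_smax(-100, a, mask, 3)    # lane 1 is masked, lane 3 is beyond evl
r2 = vp_reduce_smax(-128, a, mask, 3)    # neutral start value
r3 = vp_reduce_smax(-100, a, mask, 0)    # vector length zero: result is the start value
r4 = vp_reduce_smax(50, a, mask, 3)      # start value exceeds every enabled lane
```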
19169 .. _int_vp_reduce_smin:
19171 '``llvm.vp.reduce.smin.*``' Intrinsics
19172 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19176 This is an overloaded intrinsic.
19180 declare i32 @llvm.vp.reduce.smin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19181 declare i16 @llvm.vp.reduce.smin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19186 Predicated signed-integer ``MIN`` reduction of a vector and a scalar starting
19187 value, returning the result as a scalar.
19193 The first operand is the start value of the reduction, which must be a scalar
19194 integer type equal to the result type. The second operand is the vector on
19195 which the reduction is performed and must be a vector of integer values whose
19196 element type is the result/start type. The third operand is the vector mask and
19197 is a vector of boolean values with the same number of elements as the vector
19198 operand. The fourth operand is the explicit vector length of the operation.
19203 The '``llvm.vp.reduce.smin``' intrinsic performs the signed-integer ``MIN``
19204 reduction (:ref:`llvm.vector.reduce.smin <int_vector_reduce_smin>`) of the
19205 vector operand ``val`` on each enabled lane, taking the minimum of that and
19206 the scalar ``start_value``. Disabled lanes are treated as containing the
19207 neutral value ``INT_MAX`` (i.e. having no effect on the reduction operation).
19208 If the vector length is zero, the result is the start value.
19210 To ignore the start value, the neutral value can be used.
19215 .. code-block:: llvm
19217 %r = call i8 @llvm.vp.reduce.smin.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
19218 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19219 ; are treated as though %mask were false for those lanes.
19221 %masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 127, i8 127, i8 127, i8 127>
19222 %reduction = call i8 @llvm.vector.reduce.smin.v4i8(<4 x i8> %masked.a)
19223 %also.r = call i8 @llvm.smin.i8(i8 %reduction, i8 %start)
19226 .. _int_vp_reduce_umax:
19228 '``llvm.vp.reduce.umax.*``' Intrinsics
19229 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19233 This is an overloaded intrinsic.
19237 declare i32 @llvm.vp.reduce.umax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19238 declare i16 @llvm.vp.reduce.umax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19243 Predicated unsigned-integer ``MAX`` reduction of a vector and a scalar starting
19244 value, returning the result as a scalar.
19250 The first operand is the start value of the reduction, which must be a scalar
19251 integer type equal to the result type. The second operand is the vector on
19252 which the reduction is performed and must be a vector of integer values whose
19253 element type is the result/start type. The third operand is the vector mask and
19254 is a vector of boolean values with the same number of elements as the vector
19255 operand. The fourth operand is the explicit vector length of the operation.
19260 The '``llvm.vp.reduce.umax``' intrinsic performs the unsigned-integer ``MAX``
19261 reduction (:ref:`llvm.vector.reduce.umax <int_vector_reduce_umax>`) of the
19262 vector operand ``val`` on each enabled lane, taking the maximum of that and
19263 the scalar ``start_value``. Disabled lanes are treated as containing the
19264 neutral value ``0`` (i.e. having no effect on the reduction operation). If the
19265 vector length is zero, the result is the start value.
19267 To ignore the start value, the neutral value can be used.
19272 .. code-block:: llvm
19274 %r = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19275 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19276 ; are treated as though %mask were false for those lanes.
19278 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
19279 %reduction = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %masked.a)
19280 %also.r = call i32 @llvm.umax.i32(i32 %reduction, i32 %start)
19283 .. _int_vp_reduce_umin:
19285 '``llvm.vp.reduce.umin.*``' Intrinsics
19286 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19290 This is an overloaded intrinsic.
19294 declare i32 @llvm.vp.reduce.umin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19295 declare i16 @llvm.vp.reduce.umin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19300 Predicated unsigned-integer ``MIN`` reduction of a vector and a scalar starting
19301 value, returning the result as a scalar.
19307 The first operand is the start value of the reduction, which must be a scalar
19308 integer type equal to the result type. The second operand is the vector on
19309 which the reduction is performed and must be a vector of integer values whose
19310 element type is the result/start type. The third operand is the vector mask and
19311 is a vector of boolean values with the same number of elements as the vector
19312 operand. The fourth operand is the explicit vector length of the operation.
19317 The '``llvm.vp.reduce.umin``' intrinsic performs the unsigned-integer ``MIN``
19318 reduction (:ref:`llvm.vector.reduce.umin <int_vector_reduce_umin>`) of the
19319 vector operand ``val`` on each enabled lane, taking the minimum of that and the
19320 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19321 value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
19322 operation). If the vector length is zero, the result is the start value.
19324 To ignore the start value, the neutral value can be used.
19329 .. code-block:: llvm
19331 %r = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19332 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19333 ; are treated as though %mask were false for those lanes.
19335 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
19336 %reduction = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %masked.a)
19337 %also.r = call i32 @llvm.umin.i32(i32 %reduction, i32 %start)
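Because the comparison is unsigned, the constant ``-1`` in the expansion above acts as the largest possible value. The Python sketch below illustrates this under stated assumptions: ``vp_reduce_umin`` is a hypothetical helper, and every operand is first reinterpreted as an unsigned ``bits``-wide pattern.

```python
def vp_reduce_umin(start, val, mask, evl, bits=32):
    """Illustrative model of llvm.vp.reduce.umin (hypothetical helper)."""
    uint_max = (1 << bits) - 1           # bit pattern of -1, the neutral value
    acc = start & uint_max
    for i, x in enumerate(val):
        lane = (x & uint_max) if (i < evl and mask[i]) else uint_max
        acc = min(acc, lane)
    return acc

r1 = vp_reduce_umin(-1, [7, 3, 9, 1], [True] * 4, 3)   # lane 3 is beyond evl
r2 = vp_reduce_umin(-1, [-2, 5], [True, False], 2)     # -2 reads as 4294967294
r3 = vp_reduce_umin(-1, [7, 3], [False, False], 2)     # no lane enabled
```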
19340 .. _int_vp_reduce_fmax:
19342 '``llvm.vp.reduce.fmax.*``' Intrinsics
19343 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19347 This is an overloaded intrinsic.
19351 declare float @llvm.vp.reduce.fmax.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
19352 declare double @llvm.vp.reduce.fmax.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19357 Predicated floating-point ``MAX`` reduction of a vector and a scalar starting
19358 value, returning the result as a scalar.
19364 The first operand is the start value of the reduction, which must be a scalar
19365 floating-point type equal to the result type. The second operand is the vector
19366 on which the reduction is performed and must be a vector of floating-point
19367 values whose element type is the result/start type. The third operand is the
19368 vector mask and is a vector of boolean values with the same number of elements
19369 as the vector operand. The fourth operand is the explicit vector length of the operation.
19375 The '``llvm.vp.reduce.fmax``' intrinsic performs the floating-point ``MAX``
19376 reduction (:ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>`) of the
19377 vector operand ``val`` on each enabled lane, taking the maximum of that and the
19378 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19379 value (i.e. having no effect on the reduction operation). If the vector length
19380 is zero, the result is the start value.
19382 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
19383 flags are set, the neutral value is ``-QNAN``. If ``nnan`` and ``ninf`` are
19384 both set, then the neutral value is the smallest floating-point value for the
19385 result type. If only ``nnan`` is set then the neutral value is ``-Infinity``.
19387 This intrinsic has the same comparison semantics as the
19388 :ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>` intrinsic (and thus the
19389 '``llvm.maxnum.*``' intrinsic). That is, the result will always be a number
19390 unless all elements of the vector and the starting value are ``NaN``. For a
19391 vector with maximum element magnitude ``0.0`` and containing both ``+0.0`` and
19392 ``-0.0`` elements, the sign of the result is unspecified.
19394 To ignore the start value, the neutral value can be used.
19399 .. code-block:: llvm
19401 %r = call float @llvm.vp.reduce.fmax.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
19402 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19403 ; are treated as though %mask were false for those lanes.
19405 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
19406 %reduction = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %masked.a)
19407 %also.r = call float @llvm.maxnum.f32(float %reduction, float %start)
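The reason a quiet NaN works as the neutral value is that ``maxnum`` ignores a NaN operand when the other operand is a number. The Python sketch below models this for the case without fast-math flags; ``maxnum`` and ``vp_reduce_fmax`` are hypothetical helpers, and signalling NaNs and the sign of zero are not modelled.

```python
import math

def maxnum(x, y):
    """Model of llvm.maxnum: a NaN operand is ignored if the other is a number."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)

def vp_reduce_fmax(start, val, mask, evl):
    """Illustrative model of llvm.vp.reduce.fmax without fast-math flags."""
    acc = start
    for i, x in enumerate(val):
        lane = x if (i < evl and mask[i]) else math.nan   # neutral value
        acc = maxnum(acc, lane)
    return acc

a = [2.5, 8.0, math.nan, 9.0]
mask = [True, False, True, True]
r1 = vp_reduce_fmax(-1.0, a, mask, 3)                 # only lane 0 holds a number
r2 = vp_reduce_fmax(math.nan, a, mask, 3)             # neutral start value
r3 = vp_reduce_fmax(math.nan, [1.0], [False], 1)      # everything is NaN
```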
19410 .. _int_vp_reduce_fmin:
19412 '``llvm.vp.reduce.fmin.*``' Intrinsics
19413 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19417 This is an overloaded intrinsic.
19421 declare float @llvm.vp.reduce.fmin.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
19422 declare double @llvm.vp.reduce.fmin.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19427 Predicated floating-point ``MIN`` reduction of a vector and a scalar starting
19428 value, returning the result as a scalar.
19434 The first operand is the start value of the reduction, which must be a scalar
19435 floating-point type equal to the result type. The second operand is the vector
19436 on which the reduction is performed and must be a vector of floating-point
19437 values whose element type is the result/start type. The third operand is the
19438 vector mask and is a vector of boolean values with the same number of elements
19439 as the vector operand. The fourth operand is the explicit vector length of the operation.
19445 The '``llvm.vp.reduce.fmin``' intrinsic performs the floating-point ``MIN``
19446 reduction (:ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>`) of the
19447 vector operand ``val`` on each enabled lane, taking the minimum of that and the
19448 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19449 value (i.e. having no effect on the reduction operation). If the vector length
19450 is zero, the result is the start value.
19452 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
19453 flags are set, the neutral value is ``+QNAN``. If ``nnan`` and ``ninf`` are
19454 both set, then the neutral value is the largest floating-point value for the
19455 result type. If only ``nnan`` is set then the neutral value is ``+Infinity``.
19457 This intrinsic has the same comparison semantics as the
19458 :ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>` intrinsic (and thus the
19459 '``llvm.minnum.*``' intrinsic). That is, the result will always be a number
19460 unless all elements of the vector and the starting value are ``NaN``. For a
19461 vector with maximum element magnitude ``0.0`` and containing both ``+0.0`` and
19462 ``-0.0`` elements, the sign of the result is unspecified.
19464 To ignore the start value, the neutral value can be used.
19469 .. code-block:: llvm
19471 %r = call float @llvm.vp.reduce.fmin.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
19472 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19473 ; are treated as though %mask were false for those lanes.
19475 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
19476 %reduction = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %masked.a)
19477 %also.r = call float @llvm.minnum.f32(float %reduction, float %start)
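The three neutral values listed for the fast-math flag combinations can be checked against a model of ``minnum``. The Python sketch below is illustrative only: ``fmin_neutral`` and ``minnum`` are hypothetical helpers, Python doubles stand in for the result type, and the flag combination that the text does not describe is rejected rather than guessed.

```python
import math
import sys

def fmin_neutral(nnan=False, ninf=False):
    """Neutral value of the fmin reduction for the documented flag combinations."""
    if nnan and ninf:
        return sys.float_info.max        # largest finite value (of a double here)
    if nnan:
        return math.inf                  # only nnan set
    if ninf:
        raise ValueError("flag combination not described in the text above")
    return math.nan                      # no flags set

def minnum(x, y):
    """Model of llvm.minnum: a NaN operand is ignored if the other is a number."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)

flag_sets = [(False, False), (True, False), (True, True)]
# Each neutral value leaves a finite lane value unchanged.
checks = [minnum(x, fmin_neutral(nnan, ninf)) == x
          for x in (-3.5, 0.0, 1e300)
          for nnan, ninf in flag_sets]
```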
19480 .. _int_get_active_lane_mask:
19482 '``llvm.get.active.lane.mask.*``' Intrinsics
19483 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19487 This is an overloaded intrinsic.
19491 declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %base, i32 %n)
19492 declare <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 %base, i64 %n)
19493 declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %base, i64 %n)
19494 declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %base, i64 %n)
19500 Create a mask representing active and inactive vector lanes.
19506 Both operands have the same scalar integer type. The result is a vector with
19507 the i1 element type.
19512 The '``llvm.get.active.lane.mask.*``' intrinsics are semantically equivalent to:
19517 %m[i] = icmp ult (%base + i), %n
19519 where ``%m`` is a vector (mask) of active/inactive lanes with its elements
19520 indexed by ``i``, and ``%base``, ``%n`` are the two arguments to
19521 ``llvm.get.active.lane.mask.*``, ``icmp`` is an integer compare and ``ult``
19522 the unsigned less-than comparison operator. Overflow cannot occur in
19523 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
19524 numbers and not in machine numbers. If ``%n`` is ``0``, then the result is a
19525 poison value. The above is equivalent to:
19529 %m = @llvm.get.active.lane.mask(%base, %n)
19531 This can, for example, be emitted by the loop vectorizer in which case
19532 ``%base`` is the first element of the vector induction variable (VIV) and
19533 ``%n`` is the loop tripcount. Thus, these intrinsics perform an element-wise
19534 less than comparison of VIV with the loop tripcount, producing a mask of
19535 true/false values representing active/inactive vector lanes, except if the VIV
19536 overflows in which case they return false in the lanes where the VIV overflows.
19537 The arguments are scalar types to accommodate scalable vector types, for which
19538 it is unknown what type the step vector would need in order to enumerate its
19539 lanes without overflow.
19541 This mask ``%m`` can e.g. be used in masked load/store instructions. These
19542 intrinsics provide a hint to the backend: for a vector loop, the
19543 back-edge taken count of the original scalar loop is explicit as the second argument.
19550 .. code-block:: llvm
19552 %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
19553 %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> undef)
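The rule that ``(%base + i)`` is evaluated in integer numbers rather than machine numbers can be illustrated with Python's unbounded integers. In the sketch below, ``get_active_lane_mask`` is a hypothetical helper operating on a fixed lane count, and the poison result for ``%n == 0`` is not modelled.

```python
def get_active_lane_mask(base, n, lanes):
    """Illustrative model: m[i] = (base + i) <u n, without wrap-around."""
    return [base + i < n for i in range(lanes)]

# Last iteration of a loop with trip count 429 and four lanes per iteration.
m1 = get_active_lane_mask(428, 429, 4)

# Near the top of the i32 range the sum does not wrap in this model ...
base, n = 2**32 - 2, 2**32 - 1
m2 = get_active_lane_mask(base, n, 4)

# ... whereas a naive 32-bit machine addition would wrap lanes 2 and 3 to
# small values and wrongly report them as active.
wrapped = [((base + i) % 2**32) < n for i in range(4)]
```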
19556 .. _int_experimental_vp_splice:
19558 '``llvm.experimental.vp.splice``' Intrinsic
19559 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19563 This is an overloaded intrinsic.
19567 declare <2 x double> @llvm.experimental.vp.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm, <2 x i1> %mask, i32 %evl1, i32 %evl2)
19568 declare <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm, <vscale x 4 x i1> %mask, i32 %evl1, i32 %evl2)
19573 The '``llvm.experimental.vp.splice.*``' intrinsic is the vector length
19574 predicated version of the '``llvm.experimental.vector.splice.*``' intrinsic.
19579 The result and the first two arguments ``vec1`` and ``vec2`` are vectors with
19580 the same type. The third argument ``imm`` is an immediate signed integer that
19581 indicates the offset index. The fourth argument ``mask`` is a vector mask and
19582 has the same number of elements as the result. The last two arguments ``evl1``
19583 and ``evl2`` are unsigned integers indicating the explicit vector lengths of
19584 ``vec1`` and ``vec2`` respectively. ``imm``, ``evl1`` and ``evl2`` should
19585 respect the following constraints: ``-evl1 <= imm < evl1``, ``0 <= evl1 <= VL``
19586 and ``0 <= evl2 <= VL``, where ``VL`` is the runtime vector factor. If these
19587 constraints are not satisfied the intrinsic has undefined behaviour.
19592 Effectively, this intrinsic concatenates ``vec1[0..evl1-1]`` and
19593 ``vec2[0..evl2-1]`` and creates the result vector by selecting the elements in a
19594 window of size ``evl2``, starting at index ``imm`` (for a positive immediate) of
19595 the concatenated vector. Elements in the result vector beyond ``evl2`` are
19596 ``undef``. If ``imm`` is negative the starting index is ``evl1 + imm``. The result
19597 vector of active vector length ``evl2`` contains ``evl1 - imm`` (``-imm`` for
19598 negative ``imm``) elements from indices ``[imm..evl1 - 1]``
19599 (``[evl1 + imm..evl1 -1]`` for negative ``imm``) of ``vec1`` followed by the
19600 first ``evl2 - (evl1 - imm)`` (``evl2 + imm`` for negative ``imm``) elements of
19601 ``vec2``. If ``evl1 - imm`` (``-imm``) >= ``evl2``, only the first ``evl2``
19602 elements are considered and the remaining are ``undef``. The lanes in the result
19603 vector disabled by ``mask`` are ``undef``.
19608 .. code-block:: text
19610 llvm.experimental.vp.splice(<A,B,C,D>, <E,F,G,H>, 1, 2, 3) ==> <B, E, F, undef> ; index
19611 llvm.experimental.vp.splice(<A,B,C,D>, <E,F,G,H>, -2, 3, 2) ==> <B, C, undef, undef> ; trailing elements
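The two examples above can be reproduced with a small Python reference model. This is only a sketch: ``vp_splice`` and ``UNDEF`` are names invented here, ``UNDEF`` (``None``) stands in for ``undef``, and the examples are assumed to use an all-true mask since they omit the mask argument.

```python
UNDEF = None  # stand-in for an undef lane

def vp_splice(vec1, vec2, imm, mask, evl1, evl2):
    """Model of the splice: window of size evl2 over vec1[:evl1] + vec2[:evl2]."""
    vl = len(vec1)
    if len(vec2) != vl or len(mask) != vl:
        raise ValueError("vec1, vec2 and mask must have the same length")
    if not (-evl1 <= imm < evl1 and 0 <= evl1 <= vl and 0 <= evl2 <= vl):
        raise ValueError("constraint violated (undefined behaviour in IR)")
    concat = vec1[:evl1] + vec2[:evl2]
    start = imm if imm >= 0 else evl1 + imm
    window = concat[start:start + evl2]
    result = [UNDEF] * vl
    for lane, element in enumerate(window):
        if mask[lane]:  # lanes disabled by the mask stay undef
            result[lane] = element
    return result
```

With ``<A,B,C,D>`` and ``<E,F,G,H>`` as inputs, ``imm = 1``, ``evl1 = 2`` and ``evl2 = 3`` give ``['B', 'E', 'F', None]``, matching the first example.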
19616 '``llvm.vp.load``' Intrinsic
19617 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19621 This is an overloaded intrinsic.
19625 declare <4 x float> @llvm.vp.load.v4f32.p0v4f32(<4 x float>* %ptr, <4 x i1> %mask, i32 %evl)
19626 declare <vscale x 2 x i16> @llvm.vp.load.nxv2i16.p0nxv2i16(<vscale x 2 x i16>* %ptr, <vscale x 2 x i1> %mask, i32 %evl)
19627 declare <8 x float> @llvm.vp.load.v8f32.p1v8f32(<8 x float> addrspace(1)* %ptr, <8 x i1> %mask, i32 %evl)
19628 declare <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p6nxv1i64(<vscale x 1 x i64> addrspace(6)* %ptr, <vscale x 1 x i1> %mask, i32 %evl)
19633 The '``llvm.vp.load.*``' intrinsic is the vector length predicated version of
19634 the :ref:`llvm.masked.load <int_mload>` intrinsic.
19639 The first operand is the base pointer for the load. The second operand is a
19640 vector of boolean values with the same number of elements as the return type.
19641 The third is the explicit vector length of the operation. The return type and
19642 underlying type of the base pointer are the same vector types.
19647 The '``llvm.vp.load``' intrinsic reads a vector from memory in the same way as
19648 the '``llvm.masked.load``' intrinsic, where the mask is taken from the
19649 combination of the '``mask``' and '``evl``' operands in the usual VP way. Of
19650 the '``llvm.masked.load``' operands not set by '``llvm.vp.load``': the
19651 '``passthru``' operand is implicitly ``undef``; the '``alignment``' operand is
19652 taken as the ABI alignment of the return type as specified by the
19653 :ref:`datalayout string<langref_datalayout>`.
19658 .. code-block:: text
19660 %r = call <8 x i8> @llvm.vp.load.v8i8.p0v8i8(<8 x i8>* %ptr, <8 x i1> %mask, i32 %evl)
19661 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
19662 ;; Note that since the alignment is ultimately up to the data layout
19663 ;; string, 8 (the default) is used as an example.
19665 %also.r = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %ptr, i32 8, <8 x i1> %mask, <8 x i8> undef)
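The combination of ``mask`` and ``evl`` can also be sketched in Python. This is a hedged model rather than LLVM code: memory is assumed to be a Python list indexed by element, ``base`` is an index into it, and the names ``vp_effective_mask``, ``vp_load`` and ``UNDEF`` are invented for the illustration. Rejecting an ``evl`` larger than the lane count is likewise an assumption of the sketch.

```python
UNDEF = None  # stand-in for an undef lane

def vp_effective_mask(mask, evl):
    """A lane is active only if its mask bit is set and its index is below evl."""
    if not 0 <= evl <= len(mask):
        raise ValueError("evl must be between 0 and the number of lanes")
    return [bool(bit) and lane < evl for lane, bit in enumerate(mask)]

def vp_load(memory, base, mask, evl):
    """Load active lanes from memory; inactive lanes are undef and never read."""
    active = vp_effective_mask(mask, evl)
    return [memory[base + lane] if on else UNDEF for lane, on in enumerate(active)]
```

Because inactive lanes are never read, ``vp_load([10, 11], 0, [True] * 4, 2)`` succeeds even though lanes 2 and 3 would fall outside the two-element memory.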
19670 '``llvm.vp.store``' Intrinsic
19671 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19675 This is an overloaded intrinsic.
19679 declare void @llvm.vp.store.v4f32.p0v4f32(<4 x float> %val, <4 x float>* %ptr, <4 x i1> %mask, i32 %evl)
19680 declare void @llvm.vp.store.nxv2i16.p0nxv2i16(<vscale x 2 x i16> %val, <vscale x 2 x i16>* %ptr, <vscale x 2 x i1> %mask, i32 %evl)
19681 declare void @llvm.vp.store.v8f32.p1v8f32(<8 x float> %val, <8 x float> addrspace(1)* %ptr, <8 x i1> %mask, i32 %evl)
19682 declare void @llvm.vp.store.nxv1i64.p6nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64> addrspace(6)* %ptr, <vscale x 1 x i1> %mask, i32 %evl)
19687 The '``llvm.vp.store.*``' intrinsic is the vector length predicated version of
19688 the :ref:`llvm.masked.store <int_mstore>` intrinsic.
19693 The first operand is the vector value to be written to memory. The second
19694 operand is the base pointer for the store. It has the same underlying type as
19695 the value operand. The third operand is a vector of boolean values with the
19696 same number of elements as the value operand. The fourth is the explicit vector
19697 length of the operation.
19702 The '``llvm.vp.store``' intrinsic writes a vector to memory in the same way as
19703 the '``llvm.masked.store``' intrinsic, where the mask is taken from the
19704 combination of the '``mask``' and '``evl``' operands in the usual VP way. The
19705 '``alignment``' operand of the '``llvm.masked.store``' intrinsic is not set by
19706 '``llvm.vp.store``': it is taken as the ABI alignment of the type of the
19707 '``value``' operand as specified by the :ref:`datalayout
19708 string<langref_datalayout>`.
19713 .. code-block:: text
19715 call void @llvm.vp.store.v8i8.p0v8i8(<8 x i8> %val, <8 x i8>* %ptr, <8 x i1> %mask, i32 %evl)
19716 ;; For all lanes below %evl, the call above is lane-wise equivalent to the call below.
19717 ;; Note that since the alignment is ultimately up to the data layout
19718 ;; string, 8 (the default) is used as an example.
19720 call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> %val, <8 x i8>* %ptr, i32 8, <8 x i1> %mask)
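A minimal Python sketch of the store side follows, under the same modelling assumptions: memory is a Python list, ``base`` is an element index, and ``vp_store`` is a name invented for this illustration.

```python
def vp_store(memory, value, base, mask, evl):
    """Write only lanes whose mask bit is set and whose index is below evl."""
    if not 0 <= evl <= len(value):
        raise ValueError("evl must be between 0 and the number of lanes")
    for lane, element in enumerate(value):
        if mask[lane] and lane < evl:
            memory[base + lane] = element  # inactive lanes leave memory untouched
```

Storing ``[1, 2, 3, 4]`` with mask ``[True, True, False, True]`` and ``evl = 3`` into zero-initialised memory leaves ``[1, 2, 0, 0]``: lane 2 is masked off and lane 3 lies beyond ``evl``.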
19725 '``llvm.vp.gather``' Intrinsic
19726 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19730 This is an overloaded intrinsic.
19734 declare <4 x double> @llvm.vp.gather.v4f64.v4p0f64(<4 x double*> %ptrs, <4 x i1> %mask, i32 %evl)
19735 declare <vscale x 2 x i8> @llvm.vp.gather.nxv2i8.nxv2p0i8(<vscale x 2 x i8*> %ptrs, <vscale x 2 x i1> %mask, i32 %evl)
19736 declare <2 x float> @llvm.vp.gather.v2f32.v2p2f32(<2 x float addrspace(2)*> %ptrs, <2 x i1> %mask, i32 %evl)
19737 declare <vscale x 4 x i32> @llvm.vp.gather.nxv4i32.nxv4p4i32(<vscale x 4 x i32 addrspace(4)*> %ptrs, <vscale x 4 x i1> %mask, i32 %evl)
19742 The '``llvm.vp.gather.*``' intrinsic is the vector length predicated version of
19743 the :ref:`llvm.masked.gather <int_mgather>` intrinsic.
19748 The first operand is a vector of pointers which holds all memory addresses to
19749 read. The second operand is a vector of boolean values with the same number of
19750 elements as the return type. The third is the explicit vector length of the
19751 operation. The return type and underlying type of the vector of pointers are
19752 the same vector types.
19757 The '``llvm.vp.gather``' intrinsic reads multiple scalar values from memory in
19758 the same way as the '``llvm.masked.gather``' intrinsic, where the mask is taken
19759 from the combination of the '``mask``' and '``evl``' operands in the usual VP
19760 way. Of the '``llvm.masked.gather``' operands not set by '``llvm.vp.gather``':
19761 the '``passthru``' operand is implicitly ``undef``; the '``alignment``' operand
19762 is taken as the ABI alignment of the source addresses as specified by the
19763 :ref:`datalayout string<langref_datalayout>`.
19768 .. code-block:: text
19770 %r = call <8 x i8> @llvm.vp.gather.v8i8.v8p0i8(<8 x i8*> %ptrs, <8 x i1> %mask, i32 %evl)
19771 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
19772 ;; Note that since the alignment is ultimately up to the data layout
19773 ;; string, 8 is used as an example.
19775 %also.r = call <8 x i8> @llvm.masked.gather.v8i8.v8p0i8(<8 x i8*> %ptrs, i32 8, <8 x i1> %mask, <8 x i8> undef)
19778 .. _int_vp_scatter:
19780 '``llvm.vp.scatter``' Intrinsic
19781 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19785 This is an overloaded intrinsic.
19789 declare void @llvm.vp.scatter.v4f64.v4p0f64(<4 x double> %val, <4 x double*> %ptrs, <4 x i1> %mask, i32 %evl)
19790 declare void @llvm.vp.scatter.nxv2i8.nxv2p0i8(<vscale x 2 x i8> %val, <vscale x 2 x i8*> %ptrs, <vscale x 2 x i1> %mask, i32 %evl)
19791 declare void @llvm.vp.scatter.v2f32.v2p2f32(<2 x float> %val, <2 x float addrspace(2)*> %ptrs, <2 x i1> %mask, i32 %evl)
19792 declare void @llvm.vp.scatter.nxv4i32.nxv4p4i32(<vscale x 4 x i32> %val, <vscale x 4 x i32 addrspace(4)*> %ptrs, <vscale x 4 x i1> %mask, i32 %evl)
19797 The '``llvm.vp.scatter.*``' intrinsic is the vector length predicated version of
19798 the :ref:`llvm.masked.scatter <int_mscatter>` intrinsic.
19803 The first operand is a vector value to be written to memory. The second operand
19804 is a vector of pointers, pointing to where the value elements should be stored.
19805 The third operand is a vector of boolean values with the same number of
19806 elements as the value operand. The fourth is the explicit vector length of the
19812 The '``llvm.vp.scatter``' intrinsic writes multiple scalar values to memory in
19813 the same way as the '``llvm.masked.scatter``' intrinsic, where the mask is
19814 taken from the combination of the '``mask``' and '``evl``' operands in the
19815 usual VP way. The '``alignment``' operand of the '``llvm.masked.scatter``'
19816 intrinsic is not set by '``llvm.vp.scatter``': it is taken as the ABI alignment
19817 of the destination addresses as specified by the :ref:`datalayout
19818 string<langref_datalayout>`.
19823 .. code-block:: text
19825 call void @llvm.vp.scatter.v8i8.v8p0i8(<8 x i8> %val, <8 x i8*> %ptrs, <8 x i1> %mask, i32 %evl)
19826 ;; For all lanes below %evl, the call above is lane-wise equivalent to the call below.
19827 ;; Note that since the alignment is ultimately up to the data layout
19828 ;; string, 8 is used as an example.
19830 call void @llvm.masked.scatter.v8i8.v8p0i8(<8 x i8> %val, <8 x i8*> %ptrs, i32 8, <8 x i1> %mask)
19833 .. _int_mload_mstore:
19835 Masked Vector Load and Store Intrinsics
19836 ---------------------------------------
19838 LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.
19842 '``llvm.masked.load.*``' Intrinsics
19843 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19847 This is an overloaded intrinsic. The loaded data is a vector of any integer, floating-point or pointer data type.
19851 declare <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
19852 declare <2 x double> @llvm.masked.load.v2f64.p0v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
19853 ;; The data is a vector of pointers to double
19854 declare <8 x double*> @llvm.masked.load.v8p0f64.p0v8p0f64 (<8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x double*> <passthru>)
19855 ;; The data is a vector of function pointers
19856 declare <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f.p0v8p0f_i32f (<8 x i32 ()*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x i32 ()*> <passthru>)
19861 Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
19867 The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a power of two constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of the '``passthru``' operand are the same vector types.
19872 The '``llvm.masked.load``' intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.
19873 The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes.
19878 %res = call <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru)
19880 ;; The result of the two following instructions is identical aside from potential memory access exception
19881 %loadlal = load <16 x float>, <16 x float>* %ptr, align 4
19882 %res = select <16 x i1> %mask, <16 x float> %loadlal, <16 x float> %passthru
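The difference between the masked load and the load-plus-select sequence can be illustrated with a Python sketch. The assumptions are that memory is a Python list and that an out-of-range index (``IndexError``) plays the role of a memory access exception. Both function names are invented for this illustration.

```python
def masked_load(memory, base, mask, passthru):
    """Read only the enabled lanes; disabled lanes come from passthru."""
    return [memory[base + lane] if on else passthru[lane]
            for lane, on in enumerate(mask)]

def load_then_select(memory, base, mask, passthru):
    """Plain vector load followed by a select: every lane touches memory."""
    loaded = [memory[base + lane] for lane in range(len(mask))]
    return [l if on else p for l, p, on in zip(loaded, passthru, mask)]
```

When the whole vector is in bounds the two functions agree. When the masked-off lanes fall outside the list, only ``load_then_select`` raises.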
19886 '``llvm.masked.store.*``' Intrinsics
19887 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19891 This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type.
19895 declare void @llvm.masked.store.v8i32.p0v8i32 (<8 x i32> <value>, <8 x i32>* <ptr>, i32 <alignment>, <8 x i1> <mask>)
19896 declare void @llvm.masked.store.v16f32.p0v16f32 (<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>)
19897 ;; The data is a vector of pointers to double
19898 declare void @llvm.masked.store.v8p0f64.p0v8p0f64 (<8 x double*> <value>, <8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>)
19899 ;; The data is a vector of function pointers
19900 declare void @llvm.masked.store.v4p0f_i32f.p0v4p0f_i32f (<4 x i32 ()*> <value>, <4 x i32 ()*>* <ptr>, i32 <alignment>, <4 x i1> <mask>)
19905 Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
19910 The first operand is the vector value to be written to memory. The second operand is the base pointer for the store; it has the same underlying type as the value operand. The third operand is the alignment of the destination location. It must be a power of two constant integer value. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
19916 The '``llvm.masked.store``' intrinsic is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked stores and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
19917 The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes.
19921 call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask)
19923 ;; The result of the following instructions is identical aside from potential data races and memory access exceptions
19924 %oldval = load <16 x float>, <16 x float>* %ptr, align 4
19925 %res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval
19926 store <16 x float> %res, <16 x float>* %ptr, align 4
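The same comparison can be sketched in Python for the store, again treating memory as a Python list and using invented function names. The sketch models only which locations are written. It does not model data races.

```python
def masked_store(memory, value, base, mask):
    """Write only the enabled lanes; other locations are never accessed."""
    for lane, on in enumerate(mask):
        if on:
            memory[base + lane] = value[lane]

def load_select_store(memory, value, base, mask):
    """Load-modify-store equivalent: reads and rewrites every lane."""
    old = [memory[base + lane] for lane in range(len(mask))]
    merged = [v if on else o for v, o, on in zip(value, old, mask)]
    for lane, element in enumerate(merged):
        memory[base + lane] = element
```

On in-bounds memory both functions leave the same contents. ``masked_store`` additionally tolerates masked-off lanes that lie outside the list.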
19929 Masked Vector Gather and Scatter Intrinsics
19930 -------------------------------------------
19932 LLVM provides intrinsics for vector gather and scatter operations. They are similar to :ref:`Masked Vector Load and Store <int_mload_mstore>`, except they are designed for arbitrary memory accesses, rather than sequential memory accesses. Gather and scatter also employ a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits are off, no memory is accessed.
19936 '``llvm.masked.gather.*``' Intrinsics
19937 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19941 This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating-point or pointer data type gathered together into one vector.
19945 declare <16 x float> @llvm.masked.gather.v16f32.v16p0f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
19946 declare <2 x double> @llvm.masked.gather.v2f64.v2p1f64 (<2 x double addrspace(1)*> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
19947 declare <8 x float*> @llvm.masked.gather.v8p0f32.v8p0p0f32 (<8 x float**> <ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float*> <passthru>)
19952 Reads scalar values from arbitrary memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers '``ptrs``'. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
19958 The first operand is a vector of pointers which holds all memory addresses to read. The second operand is an alignment of the source addresses. It must be 0 or a power of two constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the vector of pointers and the type of the '``passthru``' operand are the same vector types.
19963 The '``llvm.masked.gather``' intrinsic is designed for conditional reading of multiple scalar values from arbitrary memory locations in a single IR operation. It is useful for targets that support vector masked gathers and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of scalar load operations.
19964 The semantics of this operation are equivalent to a sequence of conditional scalar loads, followed by gathering all loaded values into a single vector. The mask restricts memory access to certain lanes and facilitates vectorization of predicated basic blocks.
19969 %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64 (<4 x double*> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x double> undef)
19971 ;; The gather with all-true mask is equivalent to the following instruction sequence
19972 %ptr0 = extractelement <4 x double*> %ptrs, i32 0
19973 %ptr1 = extractelement <4 x double*> %ptrs, i32 1
19974 %ptr2 = extractelement <4 x double*> %ptrs, i32 2
19975 %ptr3 = extractelement <4 x double*> %ptrs, i32 3
19977 %val0 = load double, double* %ptr0, align 8
19978 %val1 = load double, double* %ptr1, align 8
19979 %val2 = load double, double* %ptr2, align 8
19980 %val3 = load double, double* %ptr3, align 8
19982 %vec0 = insertelement <4 x double> undef, double %val0, i32 0
19983 %vec01 = insertelement <4 x double> %vec0, double %val1, i32 1
19984 %vec012 = insertelement <4 x double> %vec01, double %val2, i32 2
19985 %vec0123 = insertelement <4 x double> %vec012, double %val3, i32 3
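The sequence above covers the all-true mask. For a partially enabled mask, the behaviour can be sketched in Python as a series of conditional scalar loads. The sketch assumes memory is a dictionary from address to value, and ``masked_gather`` is a name invented here.

```python
def masked_gather(memory, ptrs, mask, passthru):
    """Conditionally load one scalar per lane; disabled lanes keep passthru."""
    result = list(passthru)
    for lane, on in enumerate(mask):
        if on:
            result[lane] = memory[ptrs[lane]]  # only enabled addresses are read
    return result
```

An address that appears only in a disabled lane is never looked up, so it need not be valid.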
19989 '``llvm.masked.scatter.*``' Intrinsics
19990 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19994 This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
19998 declare void @llvm.masked.scatter.v8i32.v8p0i32 (<8 x i32> <value>, <8 x i32*> <ptrs>, i32 <alignment>, <8 x i1> <mask>)
19999 declare void @llvm.masked.scatter.v16f32.v16p1f32 (<16 x float> <value>, <16 x float addrspace(1)*> <ptrs>, i32 <alignment>, <16 x i1> <mask>)
20000 declare void @llvm.masked.scatter.v4p0f64.v4p0p0f64 (<4 x double*> <value>, <4 x double**> <ptrs>, i32 <alignment>, <4 x i1> <mask>)
20005 Writes each element from the value vector to the corresponding memory address. The memory addresses are represented as a vector of pointers. Writing is done according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
20010 The first operand is a vector value to be written to memory. The second operand is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value operand. The third operand is an alignment of the destination addresses. It must be 0 or a power of two constant integer value. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
20015 The '``llvm.masked.scatter``' intrinsic is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
20019 ;; This instruction unconditionally stores data vector in multiple addresses
20020 call void @llvm.masked.scatter.v8i32.v8p0i32 (<8 x i32> %value, <8 x i32*> %ptrs, i32 4, <8 x i1> <true, true, .. true>)
20022 ;; It is equivalent to a list of scalar stores
20023 %val0 = extractelement <8 x i32> %value, i32 0
20024 %val1 = extractelement <8 x i32> %value, i32 1
20026 %val7 = extractelement <8 x i32> %value, i32 7
20027 %ptr0 = extractelement <8 x i32*> %ptrs, i32 0
20028 %ptr1 = extractelement <8 x i32*> %ptrs, i32 1
20030 %ptr7 = extractelement <8 x i32*> %ptrs, i32 7
20031 ;; Note: the order of the following stores is important when they overlap:
20032 store i32 %val0, i32* %ptr0, align 4
20033 store i32 %val1, i32* %ptr1, align 4
20035 store i32 %val7, i32* %ptr7, align 4
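The ordering guarantee for overlapping addresses can be illustrated with a short Python sketch. Memory is assumed to be a dictionary from address to value, and ``masked_scatter`` is a name invented for this illustration.

```python
def masked_scatter(memory, value, ptrs, mask):
    """Store enabled lanes in order from the lowest lane to the highest."""
    for lane in range(len(value)):
        if mask[lane]:
            # A later lane overwrites an earlier one at the same address.
            memory[ptrs[lane]] = value[lane]
```

If lanes 0 and 2 both target address 16 and both are enabled, the value from lane 2 is the one that remains in memory.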
20038 Masked Vector Expanding Load and Compressing Store Intrinsics
20039 -------------------------------------------------------------
20041 LLVM provides intrinsics for expanding load and compressing store operations. Data selected from a vector according to a mask is stored in consecutive memory addresses (compressed store), and vice-versa (expanding load). These operations effectively map to "if (cond.i) a[j++] = v.i" and "if (cond.i) v.i = a[j++]" patterns, respectively. Note that when the mask starts with '1' bits followed by '0' bits, these operations are identical to :ref:`llvm.masked.store <int_mstore>` and :ref:`llvm.masked.load <int_mload>`.
20043 .. _int_expandload:
20045 '``llvm.masked.expandload.*``' Intrinsics
20046 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20050 This is an overloaded intrinsic. Several values of integer, floating point or pointer data type are loaded from consecutive memory addresses and stored into the elements of a vector according to the mask.
20054 declare <16 x float> @llvm.masked.expandload.v16f32 (float* <ptr>, <16 x i1> <mask>, <16 x float> <passthru>)
20055 declare <2 x i64> @llvm.masked.expandload.v2i64 (i64* <ptr>, <2 x i1> <mask>, <2 x i64> <passthru>)
20060 Reads a number of scalar values sequentially from memory location provided in '``ptr``' and spreads them in a vector. The '``mask``' holds a bit for each vector lane. The number of elements read from memory is equal to the number of '1' bits in the mask. The loaded elements are positioned in the destination vector according to the sequence of '1' and '0' bits in the mask. E.g., if the mask vector is '10010001', "expandload" reads 3 values from memory addresses ptr, ptr+1, ptr+2 and places them in lanes 0, 3 and 7 accordingly. The masked-off lanes are filled by elements from the corresponding lanes of the '``passthru``' operand.
20066 The first operand is the base pointer for the load. It has the same underlying type as the element of the returned vector. The second operand, mask, is a vector of boolean values with the same number of elements as the return type. The third is a pass-through value that is used to fill the masked-off lanes of the result. The return type and the type of the '``passthru``' operand have the same vector type.
20071 The '``llvm.masked.expandload``' intrinsic is designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes. It is useful for targets that support vector expanding loads and allows vectorizing loops with cross-iteration dependencies, as in the following example:
20075 // In this loop we load from B and spread the elements into array A.
20076 double *A, *B; int *C;
20077 for (int i = 0; i < size; ++i) {
20083 .. code-block:: llvm
20085 ; Load several elements from array B and expand them in a vector.
20086 ; The number of loaded elements is equal to the number of '1' elements in the Mask.
20087 %Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %Mask, <8 x double> undef)
20088 ; Store the result in A
20089 call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> %Tmp, <8 x double>* %Aptr, i32 8, <8 x i1> %Mask)
20091 ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
20092 %MaskI = bitcast <8 x i1> %Mask to i8
20093 %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
20094 %MaskI64 = zext i8 %MaskIPopcnt to i64
20095 %BNextInd = add i64 %BInd, %MaskI64
20098 Other targets may support this intrinsic differently, for example, by lowering it into a sequence of conditional scalar load operations and shuffles.
20099 If all mask elements are '1', the intrinsic behavior is equivalent to the regular unmasked vector load.
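The '10010001' example from the overview can be traced with a small Python reference model. This is a sketch only: memory is assumed to be a Python list, ``base`` an element index, and ``masked_expandload`` a name invented here.

```python
def masked_expandload(memory, base, mask, passthru):
    """Read consecutive elements and place them in the enabled lanes."""
    result = list(passthru)
    next_index = base
    for lane, on in enumerate(mask):
        if on:
            result[lane] = memory[next_index]
            next_index += 1  # memory advances only for enabled lanes
    return result
```

With the mask ``[1, 0, 0, 1, 0, 0, 0, 1]``, three consecutive values are read and placed in lanes 0, 3 and 7. All other lanes keep their ``passthru`` values.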
20101 .. _int_compressstore:
20103 '``llvm.masked.compressstore.*``' Intrinsics
20104 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20108 This is an overloaded intrinsic. A number of scalar values of integer, floating point or pointer data type are collected from an input vector and stored into adjacent memory addresses. A mask defines which elements to collect from the vector.
20112 declare void @llvm.masked.compressstore.v8i32 (<8 x i32> <value>, i32* <ptr>, <8 x i1> <mask>)
20113 declare void @llvm.masked.compressstore.v16f32 (<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)
20118 Selects elements from input vector '``value``' according to the '``mask``'. All selected elements are written into adjacent memory addresses starting at address '``ptr``', from lower to higher. The mask holds a bit for each vector lane, and is used to select elements to be stored. The number of elements to be stored is equal to the number of active bits in the mask.
20123 The first operand is the input vector, from which elements are collected and written to memory. The second operand is the base pointer for the store; it has the same underlying type as the element of the input vector operand. The third operand is the mask, a vector of boolean values. The mask and the input vector must have the same number of vector elements.
20129 The '``llvm.masked.compressstore``' intrinsic is designed for compressing data in memory. It allows collecting elements from possibly non-adjacent lanes of a vector and storing them contiguously in memory in one IR operation. It is useful for targets that support compressing store operations and allows vectorizing loops with cross-iteration dependences, as in the following example:
20133 // In this loop we load elements from A and store them consecutively in B
20134 double *A, *B; int *C;
20135 for (int i = 0; i < size; ++i) {
20141 .. code-block:: llvm
20143 ; Load elements from A.
20144 %Tmp = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* %Aptr, i32 8, <8 x i1> %Mask, <8 x double> undef)
20145 ; Store all selected elements consecutively in array B
20146 call void @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, double* %Bptr, <8 x i1> %Mask)
20148 ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
20149 %MaskI = bitcast <8 x i1> %Mask to i8
20150 %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
20151 %MaskI64 = zext i8 %MaskIPopcnt to i64
20152 %BNextInd = add i64 %BInd, %MaskI64
20155 Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
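A Python reference model of the compressing store might look as follows. It is a sketch under stated assumptions: memory is a Python list, ``base`` is an element index, and ``masked_compressstore`` is a name invented here. The returned element count is a convenience of the sketch, since the intrinsic itself returns ``void``. It corresponds to the population count used above to advance ``%Bptr``.

```python
def masked_compressstore(memory, value, base, mask):
    """Store the enabled lanes contiguously, from lower to higher lane."""
    next_index = base
    for lane, on in enumerate(mask):
        if on:
            memory[next_index] = value[lane]
            next_index += 1
    # Number of elements written, i.e. the number of '1' bits in the mask.
    return next_index - base
```

With the mask ``[1, 0, 0, 1, 0, 0, 0, 1]``, lanes 0, 3 and 7 are written to the first three memory slots and the remaining memory is left unchanged.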
20161 This class of intrinsics provides information about the
20162 :ref:`lifetime of memory objects <objectlifetime>` and ranges where variables
20167 '``llvm.lifetime.start``' Intrinsic
20168 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20175 declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)
20180 The '``llvm.lifetime.start``' intrinsic specifies the start of a memory
20186 The first argument is a constant integer representing the size of the
20187 object, or -1 if it is variable sized. The second argument is a pointer
20193 If ``ptr`` is a stack-allocated object and it points to the first byte of
20194 the object, the object is initially marked as dead.
20195 ``ptr`` is conservatively considered as a non-stack-allocated object if
20196 the stack coloring algorithm that is used in the optimization pipeline cannot
20197 conclude that ``ptr`` is a stack-allocated object.
20199 After '``llvm.lifetime.start``', the stack object that ``ptr`` points to is marked
20200 as alive and has an uninitialized value.
20201 The stack object is marked as dead when either
20202 :ref:`llvm.lifetime.end <int_lifeend>` to the alloca is executed or the
20205 After :ref:`llvm.lifetime.end <int_lifeend>` is called,
20206 '``llvm.lifetime.start``' on the stack object can be called again.
20207 The second '``llvm.lifetime.start``' call marks the object as alive, but it
20208 does not change the address of the object.
20210 If ``ptr`` is a non-stack-allocated object, does not point to the first
20211 byte of the object, or is a stack object that is already alive, the intrinsic
20212 simply fills all bytes of the object with ``poison``.
20217 '``llvm.lifetime.end``' Intrinsic
20218 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20225 declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>)
20230 The '``llvm.lifetime.end``' intrinsic specifies the end of a memory object's
20236 The first argument is a constant integer representing the size of the
20237 object, or -1 if it is variable sized. The second argument is a pointer
20243 If ``ptr`` is a stack-allocated object and it points to the first byte of the
20244 object, the object is dead.
20245 ``ptr`` is conservatively considered as a non-stack-allocated object if
20246 the stack coloring algorithm that is used in the optimization pipeline cannot
20247 conclude that ``ptr`` is a stack-allocated object.
20249 Calling ``llvm.lifetime.end`` on an already dead alloca is a no-op.
20251 If ``ptr`` is a non-stack-allocated object or it does not point to the first
20252 byte of the object, it is equivalent to simply filling all bytes of the object
20256 '``llvm.invariant.start``' Intrinsic
20257 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20261 This is an overloaded intrinsic. The memory object can belong to any address space.
20265 declare {}* @llvm.invariant.start.p0i8(i64 <size>, i8* nocapture <ptr>)
20270 The '``llvm.invariant.start``' intrinsic specifies that the contents of
20271 a memory object will not change.
20276 The first argument is a constant integer representing the size of the
20277 object, or -1 if it is variable sized. The second argument is a pointer
20283 This intrinsic indicates that until an ``llvm.invariant.end`` that uses
20284 the return value, the referenced memory location is constant and
20287 '``llvm.invariant.end``' Intrinsic
20288 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20292 This is an overloaded intrinsic. The memory object can belong to any address space.
20296 declare void @llvm.invariant.end.p0i8({}* <start>, i64 <size>, i8* nocapture <ptr>)
20301 The '``llvm.invariant.end``' intrinsic specifies that the contents of a
20302 memory object are mutable.
20307 The first argument is the value returned by the matching ``llvm.invariant.start`` intrinsic.
20308 The second argument is a constant integer representing the size of the
20309 object, or -1 if it is variable sized, and the third argument is a
20310 pointer to the object.
20315 This intrinsic indicates that the memory is mutable again.
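
A minimal sketch of how the two intrinsics pair up is shown below. The function
name ``@invariant_example`` is hypothetical; the value returned by
``llvm.invariant.start`` is passed as the first argument of
``llvm.invariant.end``.

.. code-block:: llvm

      declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)
      declare void @llvm.invariant.end.p0i8({}*, i64, i8* nocapture)

      define i8 @invariant_example(i8* %p) {
      entry:
        store i8 1, i8* %p
        ; From here until the matching end, the byte at %p does not change.
        %inv = call {}* @llvm.invariant.start.p0i8(i64 1, i8* %p)
        %a = load i8, i8* %p
        %b = load i8, i8* %p
        ; The byte at %p is mutable again after this call.
        call void @llvm.invariant.end.p0i8({}* %inv, i64 1, i8* %p)
        store i8 2, i8* %p
        %sum = add i8 %a, %b
        ret i8 %sum
      }
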
20317 '``llvm.launder.invariant.group``' Intrinsic
20318 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20322 This is an overloaded intrinsic. The memory object can belong to any address
20323 space. The returned pointer must belong to the same address space as the
20328 declare i8* @llvm.launder.invariant.group.p0i8(i8* <ptr>)
20333 The '``llvm.launder.invariant.group``' intrinsic can be used when an invariant
20334 established by ``invariant.group`` metadata no longer holds, to obtain a new
20335 pointer value that carries fresh invariant group information. It is an
20336 experimental intrinsic, which means that its semantics might change in the
20343 The ``llvm.launder.invariant.group`` takes only one argument, which is a pointer
20349 Returns another pointer that aliases its argument but which is considered different
20350 for the purposes of ``load``/``store`` ``invariant.group`` metadata.
20351 It does not read any accessible memory and the execution can be speculated.
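
The following sketch shows one way the intrinsic might be used. The function
name ``@launder_example`` is hypothetical, and the comment in the middle stands
in for whatever code replaces the object behind ``%p``; it is an assumption
made for illustration only.

.. code-block:: llvm

      declare i8* @llvm.launder.invariant.group.p0i8(i8*)

      define i8 @launder_example(i8* %p) {
      entry:
        %a = load i8, i8* %p, !invariant.group !0
        ; ... hypothetical code that replaces the object stored at %p ...
        %q = call i8* @llvm.launder.invariant.group.p0i8(i8* %p)
        ; %q aliases %p but is treated as a different pointer for the purposes
        ; of invariant.group, so %b is not assumed to equal %a.
        %b = load i8, i8* %q, !invariant.group !0
        %r = add i8 %a, %b
        ret i8 %r
      }

      !0 = !{}
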
20353 '``llvm.strip.invariant.group``' Intrinsic
20354 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20358 This is an overloaded intrinsic. The memory object can belong to any address
20359 space. The returned pointer must belong to the same address space as the
20364 declare i8* @llvm.strip.invariant.group.p0i8(i8* <ptr>)
20369 The '``llvm.strip.invariant.group``' intrinsic can be used when an invariant
20370 established by ``invariant.group`` metadata no longer holds, to obtain a new pointer
20371 value that does not carry the invariant information. It is an experimental
20372 intrinsic, which means that its semantics might change in the future.
20378 The ``llvm.strip.invariant.group`` takes only one argument, which is a pointer
20384 Returns another pointer that aliases its argument but which has no associated
20385 ``invariant.group`` metadata.
20386 It does not read any memory and can be speculated.
20392 Constrained Floating-Point Intrinsics
20393 -------------------------------------
20395 These intrinsics are used to provide special handling of floating-point
20396 operations when specific rounding mode or floating-point exception behavior is
20397 required. By default, LLVM optimization passes assume that the rounding mode is
20398 round-to-nearest and that floating-point exceptions will not be monitored.
20399 Constrained FP intrinsics are used to support non-default rounding modes and
20400 accurately preserve exception behavior without compromising LLVM's ability to
20401 optimize FP code when the default behavior is used.
20403 If any FP operation in a function is constrained then they all must be
20404 constrained. This is required for correct LLVM IR. Optimizations that
20405 move code around can miscompile a function that mixes constrained and normal
20406 operations. The correct way to mix constrained and less constrained
20407 operations is to use the rounding mode and exception handling metadata to
20408 mark constrained intrinsics as having LLVM's default behavior.
20410 Each of these intrinsics corresponds to a normal floating-point operation. The
20411 data arguments and the return value are the same as the corresponding FP
20414 The rounding mode argument is a metadata string specifying what
20415 assumptions, if any, the optimizer can make when transforming constant
20416 values. Some constrained FP intrinsics omit this argument. If required
20417 by the intrinsic, this argument must be one of the following strings:
20426 "round.tonearestaway"
20428 If this argument is "round.dynamic" optimization passes must assume that the
20429 rounding mode is unknown and may change at runtime. No transformations that
20430 depend on rounding mode may be performed in this case.
20432 The other possible values for the rounding mode argument correspond to the
20433 similarly named IEEE rounding modes. If the argument is any of these values
20434 optimization passes may perform transformations as long as they are consistent
20435 with the specified rounding mode.
20437 For example, 'x-0'->'x' is not a valid transformation if the rounding mode is
20438 "round.downward" or "round.dynamic" because if the value of 'x' is +0 then
20439 'x-0' should evaluate to '-0' when rounding downward. However, this
20440 transformation is legal for all other rounding modes.
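
The 'x-0' case can be written out as the sketch below. The function name
``@sub_zero`` is hypothetical. With "round.downward", an input of +0 produces
-0, so the call cannot be replaced by ``%x``. The ``strictfp`` attributes are
included because functions that use constrained intrinsics require them.

.. code-block:: llvm

      declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)

      define double @sub_zero(double %x) strictfp {
      entry:
        ; (+0.0) - (+0.0) is -0.0 when rounding downward, so this is not %x.
        %r = call double @llvm.experimental.constrained.fsub.f64(
                            double %x, double 0.0,
                            metadata !"round.downward",
                            metadata !"fpexcept.ignore") strictfp
        ret double %r
      }
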
20442 For values other than "round.dynamic" optimization passes may assume that the
20443 actual runtime rounding mode (as defined in a target-specific manner) matches
20444 the specified rounding mode, but this is not guaranteed. Using a specific
20445 non-dynamic rounding mode which does not match the actual rounding mode at
20446 runtime results in undefined behavior.
20448 The exception behavior argument is a metadata string describing the
20449 floating-point exception semantics that are required for the intrinsic. This argument
20450 must be one of the following strings:
20458 If this argument is "fpexcept.ignore" optimization passes may assume that the
20459 exception status flags will not be read and that floating-point exceptions will
20460 be masked. This allows transformations to be performed that may change the
20461 exception semantics of the original code. For example, FP operations may be
20462 speculatively executed in this case whereas they must not be for either of the
20463 other possible values of this argument.
20465 If the exception behavior argument is "fpexcept.maytrap" optimization passes
20466 must avoid transformations that may raise exceptions that would not have been
20467 raised by the original code (such as speculatively executing FP operations), but
20468 passes are not required to preserve all exceptions that are implied by the
20469 original code. For example, exceptions may be potentially hidden by constant
20472 If the exception behavior argument is "fpexcept.strict" all transformations must
20473 strictly preserve the floating-point exception semantics of the original code.
20474 Any FP exception that would have been raised by the original code must be raised
20475 by the transformed code, and the transformed code must not raise any FP
20476 exceptions that would not have been raised by the original code. This is the
20477 exception behavior argument that will be used if the code being compiled reads
20478 the FP exception status flags, but this mode can also be used with code that
20479 unmasks FP exceptions.
20481 The number and order of floating-point exceptions is NOT guaranteed. For
20482 example, a series of FP operations that each may raise exceptions may be
20483 vectorized into a single instruction that raises each unique exception a single
20486 Proper :ref:`function attributes <fnattrs>` usage is required for the
20487 constrained intrinsics to function correctly.
20489 All function *calls* made in a function that uses constrained
20490 floating-point intrinsics must have the ``strictfp`` attribute.
20492 All function *definitions* that use constrained floating-point intrinsics
20493 must have the ``strictfp`` attribute.
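
As a minimal sketch of these attribute requirements, the function below places
``strictfp`` on both the definition and the call site through an attribute
group. The function name ``@add_strict`` and the group number ``#0`` are
arbitrary choices made for illustration.

.. code-block:: llvm

      declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

      define double @add_strict(double %a, double %b) #0 {
      entry:
        ; The rounding mode is unknown and exceptions are strictly preserved.
        %sum = call double @llvm.experimental.constrained.fadd.f64(
                              double %a, double %b,
                              metadata !"round.dynamic",
                              metadata !"fpexcept.strict") #0
        ret double %sum
      }

      attributes #0 = { strictfp }
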
20495 '``llvm.experimental.constrained.fadd``' Intrinsic
20496 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20504 @llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>,
20505 metadata <rounding mode>,
20506 metadata <exception behavior>)
20511 The '``llvm.experimental.constrained.fadd``' intrinsic returns the sum of its
20518 The first two arguments to the '``llvm.experimental.constrained.fadd``'
20519 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20520 of floating-point values. Both arguments must have identical types.
20522 The third and fourth arguments specify the rounding mode and exception
20523 behavior as described above.
20528 The value produced is the floating-point sum of the two value operands and has
20529 the same type as the operands.
20532 '``llvm.experimental.constrained.fsub``' Intrinsic
20533 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20541 @llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,
20542 metadata <rounding mode>,
20543 metadata <exception behavior>)
20548 The '``llvm.experimental.constrained.fsub``' intrinsic returns the difference
20549 of its two operands.
20555 The first two arguments to the '``llvm.experimental.constrained.fsub``'
20556 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20557 of floating-point values. Both arguments must have identical types.
20559 The third and fourth arguments specify the rounding mode and exception
20560 behavior as described above.
20565 The value produced is the floating-point difference of the two value operands
20566 and has the same type as the operands.
20569 '``llvm.experimental.constrained.fmul``' Intrinsic
20570 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20578 @llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,
20579 metadata <rounding mode>,
20580 metadata <exception behavior>)
20585 The '``llvm.experimental.constrained.fmul``' intrinsic returns the product of
20592 The first two arguments to the '``llvm.experimental.constrained.fmul``'
20593 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20594 of floating-point values. Both arguments must have identical types.
20596 The third and fourth arguments specify the rounding mode and exception
20597 behavior as described above.
20602 The value produced is the floating-point product of the two value operands and
20603 has the same type as the operands.
20606 '``llvm.experimental.constrained.fdiv``' Intrinsic
20607 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20615 @llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,
20616 metadata <rounding mode>,
20617 metadata <exception behavior>)
20622 The '``llvm.experimental.constrained.fdiv``' intrinsic returns the quotient of
20629 The first two arguments to the '``llvm.experimental.constrained.fdiv``'
20630 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20631 of floating-point values. Both arguments must have identical types.
20633 The third and fourth arguments specify the rounding mode and exception
20634 behavior as described above.
20639 The value produced is the floating-point quotient of the two value operands and
20640 has the same type as the operands.
20643 '``llvm.experimental.constrained.frem``' Intrinsic
20644 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20652 @llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
20653 metadata <rounding mode>,
20654 metadata <exception behavior>)
20659 The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder
20660 from the division of its two operands.
20666 The first two arguments to the '``llvm.experimental.constrained.frem``'
20667 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20668 of floating-point values. Both arguments must have identical types.
20670 The third and fourth arguments specify the rounding mode and exception
20671 behavior as described above. The rounding mode argument has no effect, since
20672 the result of frem is never rounded, but the argument is included for
20673 consistency with the other constrained floating-point intrinsics.
20678 The value produced is the floating-point remainder from the division of the two
20679 value operands and has the same type as the operands. The remainder has the
20680 same sign as the dividend.
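
For illustration, a call might look like the sketch below; the function name
``@rem_example`` is hypothetical. Because the remainder takes the sign of the
dividend, operands of -7.0 and 2.0 give -1.0.

.. code-block:: llvm

      declare double @llvm.experimental.constrained.frem.f64(double, double, metadata, metadata)

      define double @rem_example(double %x, double %y) strictfp {
      entry:
        ; The rounding mode argument is required syntactically but has no effect.
        %r = call double @llvm.experimental.constrained.frem.f64(
                            double %x, double %y,
                            metadata !"round.dynamic",
                            metadata !"fpexcept.strict") strictfp
        ret double %r
      }
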
20682 '``llvm.experimental.constrained.fma``' Intrinsic
20683 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20691 @llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>,
20692 metadata <rounding mode>,
20693 metadata <exception behavior>)
20698 The '``llvm.experimental.constrained.fma``' intrinsic returns the result of a
20699 fused-multiply-add operation on its operands.
20704 The first three arguments to the '``llvm.experimental.constrained.fma``'
20705 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
20706 <t_vector>` of floating-point values. All arguments must have identical types.
20708 The fourth and fifth arguments specify the rounding mode and exception behavior
20709 as described above.
20714 The result produced is the product of the first two operands added to the third
20715 operand computed with infinite precision, and then rounded to the target
20718 '``llvm.experimental.constrained.fptoui``' Intrinsic
20719 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20727 @llvm.experimental.constrained.fptoui(<type> <value>,
20728 metadata <exception behavior>)
20733 The '``llvm.experimental.constrained.fptoui``' intrinsic converts a
20734 floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.
20739 The first argument to the '``llvm.experimental.constrained.fptoui``'
20740 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20741 <t_vector>` of floating point values.
20743 The second argument specifies the exception behavior as described above.
20748 The result produced is an unsigned integer converted from the floating
20749 point operand. The value is truncated, so it is rounded towards zero.
20751 '``llvm.experimental.constrained.fptosi``' Intrinsic
20752 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20760 @llvm.experimental.constrained.fptosi(<type> <value>,
20761 metadata <exception behavior>)
20766 The '``llvm.experimental.constrained.fptosi``' intrinsic converts a
20767 :ref:`floating-point <t_floating>` ``value`` to type ``ty2``.
20772 The first argument to the '``llvm.experimental.constrained.fptosi``'
20773 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20774 <t_vector>` of floating point values.
20776 The second argument specifies the exception behavior as described above.
20781 The result produced is a signed integer converted from the floating
20782 point operand. The value is truncated, so it is rounded towards zero.
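
The sketch below shows both floating-point-to-integer conversions on a
``double`` input. The function names are hypothetical, and the ``.i32.f64``
suffixes are an assumption that follows the usual mangling of the result and
operand types. Neither intrinsic takes a rounding mode argument, since the
value is always truncated towards zero.

.. code-block:: llvm

      declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
      declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)

      define i32 @to_unsigned(double %x) strictfp {
      entry:
        %u = call i32 @llvm.experimental.constrained.fptoui.i32.f64(
                         double %x, metadata !"fpexcept.strict") strictfp
        ret i32 %u
      }

      define i32 @to_signed(double %x) strictfp {
      entry:
        %s = call i32 @llvm.experimental.constrained.fptosi.i32.f64(
                         double %x, metadata !"fpexcept.strict") strictfp
        ret i32 %s
      }
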
20784 '``llvm.experimental.constrained.uitofp``' Intrinsic
20785 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20793 @llvm.experimental.constrained.uitofp(<type> <value>,
20794 metadata <rounding mode>,
20795 metadata <exception behavior>)
20800 The '``llvm.experimental.constrained.uitofp``' intrinsic converts an
20801 unsigned integer ``value`` to a floating-point value of type ``ty2``.
20806 The first argument to the '``llvm.experimental.constrained.uitofp``'
20807 intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
20808 <t_vector>` of integer values.
20810 The second and third arguments specify the rounding mode and exception
20811 behavior as described above.
20816 An inexact floating-point exception will be raised if rounding is required.
20817 Any result produced is a floating point value converted from the input
20820 '``llvm.experimental.constrained.sitofp``' Intrinsic
20821 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20829 @llvm.experimental.constrained.sitofp(<type> <value>,
20830 metadata <rounding mode>,
20831 metadata <exception behavior>)
20836 The '``llvm.experimental.constrained.sitofp``' intrinsic converts a
20837 signed integer ``value`` to a floating-point value of type ``ty2``.
20842 The first argument to the '``llvm.experimental.constrained.sitofp``'
20843 intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
20844 <t_vector>` of integer values.
20846 The second and third arguments specify the rounding mode and exception
20847 behavior as described above.
20852 An inexact floating-point exception will be raised if rounding is required.
20853 Any result produced is a floating point value converted from the input
20856 '``llvm.experimental.constrained.fptrunc``' Intrinsic
20857 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20865 @llvm.experimental.constrained.fptrunc(<type> <value>,
20866 metadata <rounding mode>,
20867 metadata <exception behavior>)
20872 The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``
20878 The first argument to the '``llvm.experimental.constrained.fptrunc``'
20879 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20880 <t_vector>` of floating point values. This argument must be larger in size
20883 The second and third arguments specify the rounding mode and exception
20884 behavior as described above.
20889 The result produced is a floating point value truncated to be smaller in size
20892 '``llvm.experimental.constrained.fpext``' Intrinsic
20893 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20901 @llvm.experimental.constrained.fpext(<type> <value>,
20902 metadata <exception behavior>)
20907 The '``llvm.experimental.constrained.fpext``' intrinsic extends a
20908 floating-point ``value`` to a larger floating-point value.
20913 The first argument to the '``llvm.experimental.constrained.fpext``'
20914 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20915 <t_vector>` of floating point values. This argument must be smaller in size
20918 The second argument specifies the exception behavior as described above.
20923 The result produced is a floating point value extended to be larger in size
20924 than the operand. All restrictions that apply to the fpext instruction also
20925 apply to this intrinsic.
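
As a sketch, the function below truncates a ``double`` to ``float`` and then
extends it back. The function name ``@round_trip`` is hypothetical, and the type
suffixes are an assumption that follows the usual mangling of the result and
operand types. Only the truncation takes a rounding mode argument.

.. code-block:: llvm

      declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
      declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)

      define double @round_trip(double %x) strictfp {
      entry:
        %narrow = call float @llvm.experimental.constrained.fptrunc.f32.f64(
                                double %x,
                                metadata !"round.tonearest",
                                metadata !"fpexcept.strict") strictfp
        %wide = call double @llvm.experimental.constrained.fpext.f64.f32(
                               float %narrow,
                               metadata !"fpexcept.strict") strictfp
        ret double %wide
      }
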
20927 '``llvm.experimental.constrained.fcmp``' and '``llvm.experimental.constrained.fcmps``' Intrinsics
20928 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20936 @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
20937 metadata <condition code>,
20938 metadata <exception behavior>)
20940 @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
20941 metadata <condition code>,
20942 metadata <exception behavior>)
20947 The '``llvm.experimental.constrained.fcmp``' and
20948 '``llvm.experimental.constrained.fcmps``' intrinsics return a boolean
20949 value or vector of boolean values based on a comparison of their operands.
20951 If the operands are floating-point scalars, then the result type is a
20952 boolean (:ref:`i1 <t_integer>`).
20954 If the operands are floating-point vectors, then the result type is a
20955 vector of booleans with the same number of elements as the operands being
20958 The '``llvm.experimental.constrained.fcmp``' intrinsic performs a quiet
20959 comparison operation while the '``llvm.experimental.constrained.fcmps``'
20960 intrinsic performs a signaling comparison operation.
20965 The first two arguments to the '``llvm.experimental.constrained.fcmp``'
20966 and '``llvm.experimental.constrained.fcmps``' intrinsics must be
20967 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20968 of floating-point values. Both arguments must have identical types.
20970 The third argument is the condition code indicating the kind of comparison
20971 to perform. It must be a metadata string with one of the following values:
20973 - "``oeq``": ordered and equal
20974 - "``ogt``": ordered and greater than
20975 - "``oge``": ordered and greater than or equal
20976 - "``olt``": ordered and less than
20977 - "``ole``": ordered and less than or equal
20978 - "``one``": ordered and not equal
20979 - "``ord``": ordered (no nans)
20980 - "``ueq``": unordered or equal
20981 - "``ugt``": unordered or greater than
20982 - "``uge``": unordered or greater than or equal
20983 - "``ult``": unordered or less than
20984 - "``ule``": unordered or less than or equal
20985 - "``une``": unordered or not equal
20986 - "``uno``": unordered (either nans)
20988 *Ordered* means that neither operand is a NAN while *unordered* means
20989 that either operand may be a NAN.
20991 The fourth argument specifies the exception behavior as described above.
20996 ``op1`` and ``op2`` are compared according to the condition code given
20997 as the third argument. If the operands are vectors, then the
20998 vectors are compared element by element. Each comparison performed
20999 always yields an :ref:`i1 <t_integer>` result, as follows:
21001 - "``oeq``": yields ``true`` if both operands are not a NAN and ``op1``
21002 is equal to ``op2``.
21003 - "``ogt``": yields ``true`` if both operands are not a NAN and ``op1``
21004 is greater than ``op2``.
21005 - "``oge``": yields ``true`` if both operands are not a NAN and ``op1``
21006 is greater than or equal to ``op2``.
21007 - "``olt``": yields ``true`` if both operands are not a NAN and ``op1``
21008 is less than ``op2``.
21009 - "``ole``": yields ``true`` if both operands are not a NAN and ``op1``
21010 is less than or equal to ``op2``.
21011 - "``one``": yields ``true`` if both operands are not a NAN and ``op1``
21012 is not equal to ``op2``.
21013 - "``ord``": yields ``true`` if both operands are not a NAN.
21014 - "``ueq``": yields ``true`` if either operand is a NAN or ``op1`` is
21016 - "``ugt``": yields ``true`` if either operand is a NAN or ``op1`` is
21017 greater than ``op2``.
21018 - "``uge``": yields ``true`` if either operand is a NAN or ``op1`` is
21019 greater than or equal to ``op2``.
21020 - "``ult``": yields ``true`` if either operand is a NAN or ``op1`` is
21022 - "``ule``": yields ``true`` if either operand is a NAN or ``op1`` is
21023 less than or equal to ``op2``.
21024 - "``une``": yields ``true`` if either operand is a NAN or ``op1`` is
21025 not equal to ``op2``.
21026 - "``uno``": yields ``true`` if either operand is a NAN.
21028 The quiet comparison operation performed by
21029 '``llvm.experimental.constrained.fcmp``' will only raise an exception
21030 if either operand is a SNAN. The signaling comparison operation
21031 performed by '``llvm.experimental.constrained.fcmps``' will raise an
21032 exception if either operand is a NAN (QNAN or SNAN). Such an exception
21033 does not preclude a result being produced (e.g. exception might only
21034 set a flag), therefore the distinction between ordered and unordered
21035 comparisons is also relevant for the
21036 '``llvm.experimental.constrained.fcmps``' intrinsic.
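
The difference between the two intrinsics can be sketched as follows; the
function names are hypothetical. Both functions compute the same ordered
less-than result and differ only in which NAN operands raise an exception.

.. code-block:: llvm

      declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
      declare i1 @llvm.experimental.constrained.fcmps.f64(double, double, metadata, metadata)

      define i1 @less_quiet(double %a, double %b) strictfp {
      entry:
        ; Quiet comparison: raises an exception only if an operand is a SNAN.
        %r = call i1 @llvm.experimental.constrained.fcmp.f64(
                        double %a, double %b,
                        metadata !"olt",
                        metadata !"fpexcept.strict") strictfp
        ret i1 %r
      }

      define i1 @less_signaling(double %a, double %b) strictfp {
      entry:
        ; Signaling comparison: raises an exception if an operand is any NAN.
        %r = call i1 @llvm.experimental.constrained.fcmps.f64(
                        double %a, double %b,
                        metadata !"olt",
                        metadata !"fpexcept.strict") strictfp
        ret i1 %r
      }
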
21038 '``llvm.experimental.constrained.fmuladd``' Intrinsic
21039 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21047 @llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>,
21049 metadata <rounding mode>,
21050 metadata <exception behavior>)
21055 The '``llvm.experimental.constrained.fmuladd``' intrinsic represents
21056 multiply-add expressions that can be fused if the code generator determines
21057 that (a) the target instruction set has support for a fused operation,
21058 and (b) that the fused operation is more efficient than the equivalent,
21059 separate pair of mul and add instructions.
21064 The first three arguments to the '``llvm.experimental.constrained.fmuladd``'
21065 intrinsic must be floating-point or vector of floating-point values.
21066 All three arguments must have identical types.
21068 The fourth and fifth arguments specify the rounding mode and exception behavior
21069 as described above.
21078 %0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c,
21079 metadata <rounding mode>,
21080 metadata <exception behavior>)
21082 is equivalent to the expression:
21086 %0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b,
21087 metadata <rounding mode>,
21088 metadata <exception behavior>)
21089 %1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c,
21090 metadata <rounding mode>,
21091 metadata <exception behavior>)
21093 except that it is unspecified whether rounding will be performed between the
21094 multiplication and addition steps. Fusion is not guaranteed, even if the target
21095 platform supports it.
21096 If a fused multiply-add is required, the corresponding
21097 :ref:`llvm.experimental.constrained.fma <int_fma>` intrinsic function should be
21099 This intrinsic never sets ``errno``, just like '``llvm.experimental.constrained.fma.*``'.
21101 Constrained libm-equivalent Intrinsics
21102 --------------------------------------
21104 In addition to the basic floating-point operations for which constrained
21105 intrinsics are described above, there are constrained versions of various
21106 operations which provide equivalent behavior to a corresponding libm function.
21107 These intrinsics allow the precise behavior of these operations with respect to
21108 rounding mode and exception behavior to be controlled.
21110 As with the basic constrained floating-point intrinsics, the rounding mode
21111 and exception behavior arguments only control the behavior of the optimizer.
21112 They do not change the runtime floating-point environment.
21115 '``llvm.experimental.constrained.sqrt``' Intrinsic
21116 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21124 @llvm.experimental.constrained.sqrt(<type> <op1>,
21125 metadata <rounding mode>,
21126 metadata <exception behavior>)
21131 The '``llvm.experimental.constrained.sqrt``' intrinsic returns the square root
21132 of the specified value, returning the same value as the libm '``sqrt``'
21133 functions would, but without setting ``errno``.
21138 The first argument and the return type are floating-point numbers of the same
21141 The second and third arguments specify the rounding mode and exception
21142 behavior as described above.
21147 This function returns the nonnegative square root of the specified value.
21148 If the value is less than negative zero, a floating-point exception occurs
21149 and the return value is architecture specific.
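
A call to this intrinsic might look like the sketch below; the function name
``@root`` is hypothetical.

.. code-block:: llvm

      declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)

      define double @root(double %x) strictfp {
      entry:
        ; The metadata arguments inform the optimizer only; they do not change
        ; the runtime floating-point environment.
        %r = call double @llvm.experimental.constrained.sqrt.f64(
                            double %x,
                            metadata !"round.dynamic",
                            metadata !"fpexcept.strict") strictfp
        ret double %r
      }
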
21152 '``llvm.experimental.constrained.pow``' Intrinsic
21153 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21161 @llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,
21162 metadata <rounding mode>,
21163 metadata <exception behavior>)
21168 The '``llvm.experimental.constrained.pow``' intrinsic returns the first operand
21169 raised to the (positive or negative) power specified by the second operand.
21174 The first two arguments and the return value are floating-point numbers of the
21175 same type. The second argument specifies the power to which the first argument
21178 The third and fourth arguments specify the rounding mode and exception
21179 behavior as described above.
21184 This function returns the first value raised to the second power,
21185 returning the same values as the libm ``pow`` functions would, and
21186 handles error conditions in the same way.
21189 '``llvm.experimental.constrained.powi``' Intrinsic
21190 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21198 @llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,
21199 metadata <rounding mode>,
21200 metadata <exception behavior>)
21205 The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand
21206 raised to the (positive or negative) power specified by the second operand. The
21207 order of evaluation of multiplications is not defined. When a vector of
21208 floating-point type is used, the second argument remains a scalar integer value.
21214 The first argument and the return value are floating-point numbers of the same
21215 type. The second argument is a 32-bit signed integer specifying the power to
21216 which the first argument should be raised.
21218 The third and fourth arguments specify the rounding mode and exception
21219 behavior as described above.
21224 This function returns the first value raised to the second power with an
21225 unspecified sequence of rounding operations.
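
As a sketch, cubing a value could be written as shown below. The function name
``@cube`` is hypothetical, and the ``.f64`` suffix assumes that the intrinsic is
overloaded only on its floating-point type, as the ``i32`` operand in the
syntax above suggests.

.. code-block:: llvm

      declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)

      define double @cube(double %x) strictfp {
      entry:
        ; The exponent is a scalar i32; the order of the multiplications and the
        ; intermediate roundings is unspecified.
        %r = call double @llvm.experimental.constrained.powi.f64(
                            double %x, i32 3,
                            metadata !"round.dynamic",
                            metadata !"fpexcept.strict") strictfp
        ret double %r
      }
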
21228 '``llvm.experimental.constrained.sin``' Intrinsic
21229 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21237 @llvm.experimental.constrained.sin(<type> <op1>,
21238 metadata <rounding mode>,
21239 metadata <exception behavior>)
21244 The '``llvm.experimental.constrained.sin``' intrinsic returns the sine of the
21250 The first argument and the return type are floating-point numbers of the same
21253 The second and third arguments specify the rounding mode and exception
21254 behavior as described above.
21259 This function returns the sine of the specified operand, returning the
21260 same values as the libm ``sin`` functions would, and handles error
21261 conditions in the same way.
21264 '``llvm.experimental.constrained.cos``' Intrinsic
21265 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21273 @llvm.experimental.constrained.cos(<type> <op1>,
21274 metadata <rounding mode>,
21275 metadata <exception behavior>)
21280 The '``llvm.experimental.constrained.cos``' intrinsic returns the cosine of the
21286 The first argument and the return type are floating-point numbers of the same
21289 The second and third arguments specify the rounding mode and exception
21290 behavior as described above.
21295 This function returns the cosine of the specified operand, returning the
21296 same values as the libm ``cos`` functions would, and handles error
21297 conditions in the same way.
21300 '``llvm.experimental.constrained.exp``' Intrinsic
21301 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21309 @llvm.experimental.constrained.exp(<type> <op1>,
21310 metadata <rounding mode>,
21311 metadata <exception behavior>)
21316 The '``llvm.experimental.constrained.exp``' intrinsic computes the base-e
21317 exponential of the specified value.
21322 The first argument and the return value are floating-point numbers of the same
21325 The second and third arguments specify the rounding mode and exception
21326 behavior as described above.
21331 This function returns the same values as the libm ``exp`` functions
21332 would, and handles error conditions in the same way.
21335 '``llvm.experimental.constrained.exp2``' Intrinsic
21336 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21344 @llvm.experimental.constrained.exp2(<type> <op1>,
21345 metadata <rounding mode>,
21346 metadata <exception behavior>)
21351 The '``llvm.experimental.constrained.exp2``' intrinsic computes the base-2
21352 exponential of the specified value.
21358 The first argument and the return value are floating-point numbers of the same
21361 The second and third arguments specify the rounding mode and exception
21362 behavior as described above.
21367 This function returns the same values as the libm ``exp2`` functions
21368 would, and handles error conditions in the same way.
21371 '``llvm.experimental.constrained.log``' Intrinsic
21372 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21380 @llvm.experimental.constrained.log(<type> <op1>,
21381 metadata <rounding mode>,
21382 metadata <exception behavior>)
21387 The '``llvm.experimental.constrained.log``' intrinsic computes the base-e
21388 logarithm of the specified value.
21393 The first argument and the return value are floating-point numbers of the same
21396 The second and third arguments specify the rounding mode and exception
21397 behavior as described above.
21403 This function returns the same values as the libm ``log`` functions
21404 would, and handles error conditions in the same way.
21407 '``llvm.experimental.constrained.log10``' Intrinsic
21408 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21416 @llvm.experimental.constrained.log10(<type> <op1>,
21417 metadata <rounding mode>,
21418 metadata <exception behavior>)
21423 The '``llvm.experimental.constrained.log10``' intrinsic computes the base-10
21424 logarithm of the specified value.
21429 The first argument and the return value are floating-point numbers of the same
21432 The second and third arguments specify the rounding mode and exception
21433 behavior as described above.
21438 This function returns the same values as the libm ``log10`` functions
21439 would, and handles error conditions in the same way.
21442 '``llvm.experimental.constrained.log2``' Intrinsic
21443 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21451 @llvm.experimental.constrained.log2(<type> <op1>,
21452 metadata <rounding mode>,
21453 metadata <exception behavior>)
21458 The '``llvm.experimental.constrained.log2``' intrinsic computes the base-2
21459 logarithm of the specified value.
21464 The first argument and the return value are floating-point numbers of the same
21467 The second and third arguments specify the rounding mode and exception
21468 behavior as described above.
21473 This function returns the same values as the libm ``log2`` functions
21474 would, and handles error conditions in the same way.
21477 '``llvm.experimental.constrained.rint``' Intrinsic
21478 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21486 @llvm.experimental.constrained.rint(<type> <op1>,
21487 metadata <rounding mode>,
21488 metadata <exception behavior>)
21493 The '``llvm.experimental.constrained.rint``' intrinsic returns the first
21494 operand rounded to the nearest integer. It may raise an inexact floating-point
21495 exception if the operand is not an integer.
21500 The first argument and the return value are floating-point numbers of the same
21503 The second and third arguments specify the rounding mode and exception
21504 behavior as described above.
21509 This function returns the same values as the libm ``rint`` functions
21510 would, and handles error conditions in the same way. The rounding mode is
21511 described, not determined, by the rounding mode argument. The actual rounding
21512 mode is determined by the runtime floating-point environment. The rounding
21513 mode argument is only intended as information to the compiler.
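
The distinction between describing and determining the rounding mode can be
sketched as follows. The function name ``@rint_nearest`` is hypothetical, and
the example assumes the caller guarantees that round-to-nearest is in effect
when the call executes.

.. code-block:: llvm

   declare float @llvm.experimental.constrained.rint.f32(float, metadata, metadata)

   define float @rint_nearest(float %x) strictfp {
   entry:
     ; "round.tonearest" is a statement to the compiler about the runtime
     ; rounding mode; it does not change the mode. The rounding actually
     ; applied comes from the floating-point environment.
     %r = call float @llvm.experimental.constrained.rint.f32(
                         float %x,
                         metadata !"round.tonearest",
                         metadata !"fpexcept.maytrap") strictfp
     ret float %r
   }
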
21516 '``llvm.experimental.constrained.lrint``' Intrinsic
21517 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21525 @llvm.experimental.constrained.lrint(<fptype> <op1>,
21526 metadata <rounding mode>,
21527 metadata <exception behavior>)
21532 The '``llvm.experimental.constrained.lrint``' intrinsic returns the first
21533 operand rounded to the nearest integer. An inexact floating-point exception
21534 will be raised if the operand is not an integer. An invalid exception is
21535 raised if the result is too large to fit into a supported integer type,
21536 and in this case the result is undefined.
21541 The first argument is a floating-point number. The return value is an
21542 integer type. Not all types are supported on all targets. The supported
21543 types are the same as the ``llvm.lrint`` intrinsic and the ``lrint``
21546 The second and third arguments specify the rounding mode and exception
21547 behavior as described above.
21552 This function returns the same values as the libm ``lrint`` functions
21553 would, and handles error conditions in the same way.
21555 The rounding mode is described, not determined, by the rounding mode
21556 argument. The actual rounding mode is determined by the runtime floating-point
21557 environment. The rounding mode argument is only intended as information
If the runtime floating-point environment is using the default rounding mode,
then the results will be the same as those of the ``llvm.lrint`` intrinsic.
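
A minimal sketch of a call, assuming an ``i64`` result and a ``double``
operand (both the overload and the function name ``@lrint_strict`` are chosen
for illustration; the integer types actually supported depend on the target):

.. code-block:: llvm

   declare i64 @llvm.experimental.constrained.lrint.i64.f64(double, metadata, metadata)

   define i64 @lrint_strict(double %x) strictfp {
   entry:
     ; The intrinsic name is mangled with both the result and operand types.
     %r = call i64 @llvm.experimental.constrained.lrint.i64.f64(
                       double %x,
                       metadata !"round.dynamic",
                       metadata !"fpexcept.strict") strictfp
     ret i64 %r
   }
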
21564 '``llvm.experimental.constrained.llrint``' Intrinsic
21565 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21573 @llvm.experimental.constrained.llrint(<fptype> <op1>,
21574 metadata <rounding mode>,
21575 metadata <exception behavior>)
21580 The '``llvm.experimental.constrained.llrint``' intrinsic returns the first
21581 operand rounded to the nearest integer. An inexact floating-point exception
21582 will be raised if the operand is not an integer. An invalid exception is
21583 raised if the result is too large to fit into a supported integer type,
21584 and in this case the result is undefined.
21589 The first argument is a floating-point number. The return value is an
21590 integer type. Not all types are supported on all targets. The supported
21591 types are the same as the ``llvm.llrint`` intrinsic and the ``llrint``
21594 The second and third arguments specify the rounding mode and exception
21595 behavior as described above.
21600 This function returns the same values as the libm ``llrint`` functions
21601 would, and handles error conditions in the same way.
21603 The rounding mode is described, not determined, by the rounding mode
21604 argument. The actual rounding mode is determined by the runtime floating-point
21605 environment. The rounding mode argument is only intended as information
If the runtime floating-point environment is using the default rounding mode,
then the results will be the same as those of the ``llvm.llrint`` intrinsic.
21612 '``llvm.experimental.constrained.nearbyint``' Intrinsic
21613 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21621 @llvm.experimental.constrained.nearbyint(<type> <op1>,
21622 metadata <rounding mode>,
21623 metadata <exception behavior>)
21628 The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
21629 operand rounded to the nearest integer. It will not raise an inexact
21630 floating-point exception if the operand is not an integer.
21636 The first argument and the return value are floating-point numbers of the same
21639 The second and third arguments specify the rounding mode and exception
21640 behavior as described above.
21645 This function returns the same values as the libm ``nearbyint`` functions
21646 would, and handles error conditions in the same way. The rounding mode is
21647 described, not determined, by the rounding mode argument. The actual rounding
21648 mode is determined by the runtime floating-point environment. The rounding
21649 mode argument is only intended as information to the compiler.
21652 '``llvm.experimental.constrained.maxnum``' Intrinsic
21653 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.maxnum(<type> <op1>, <type> <op2>,
21662 metadata <exception behavior>)
21667 The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
21668 of the two arguments.
21673 The first two arguments and the return value are floating-point numbers
21676 The third argument specifies the exception behavior as described above.
21681 This function follows the IEEE-754 semantics for maxNum.
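
For illustration, a call might be written as below. The function name
``@max_strict`` and the ``float`` overload are assumptions for this sketch.
Unlike the intrinsics above, there is no rounding mode operand.

.. code-block:: llvm

   declare float @llvm.experimental.constrained.maxnum.f32(float, float, metadata)

   define float @max_strict(float %a, float %b) strictfp {
   entry:
     ; Only the exception behavior is passed as metadata.
     %m = call float @llvm.experimental.constrained.maxnum.f32(
                         float %a, float %b,
                         metadata !"fpexcept.strict") strictfp
     ret float %m
   }
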
21684 '``llvm.experimental.constrained.minnum``' Intrinsic
21685 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.minnum(<type> <op1>, <type> <op2>,
21694 metadata <exception behavior>)
21699 The '``llvm.experimental.constrained.minnum``' intrinsic returns the minimum
21700 of the two arguments.
21705 The first two arguments and the return value are floating-point numbers
21708 The third argument specifies the exception behavior as described above.
21713 This function follows the IEEE-754 semantics for minNum.
21716 '``llvm.experimental.constrained.maximum``' Intrinsic
21717 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.maximum(<type> <op1>, <type> <op2>,
21726 metadata <exception behavior>)
21731 The '``llvm.experimental.constrained.maximum``' intrinsic returns the maximum
21732 of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
21737 The first two arguments and the return value are floating-point numbers
21740 The third argument specifies the exception behavior as described above.
21745 This function follows semantics specified in the draft of IEEE 754-2018.
21748 '``llvm.experimental.constrained.minimum``' Intrinsic
21749 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.minimum(<type> <op1>, <type> <op2>,
21758 metadata <exception behavior>)
21763 The '``llvm.experimental.constrained.minimum``' intrinsic returns the minimum
21764 of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
21769 The first two arguments and the return value are floating-point numbers
21772 The third argument specifies the exception behavior as described above.
21777 This function follows semantics specified in the draft of IEEE 754-2018.
21780 '``llvm.experimental.constrained.ceil``' Intrinsic
21781 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21789 @llvm.experimental.constrained.ceil(<type> <op1>,
21790 metadata <exception behavior>)
21795 The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
21801 The first argument and the return value are floating-point numbers of the same
21804 The second argument specifies the exception behavior as described above.
21809 This function returns the same values as the libm ``ceil`` functions
21810 would and handles error conditions in the same way.
21813 '``llvm.experimental.constrained.floor``' Intrinsic
21814 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21822 @llvm.experimental.constrained.floor(<type> <op1>,
21823 metadata <exception behavior>)
21828 The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
21834 The first argument and the return value are floating-point numbers of the same
21837 The second argument specifies the exception behavior as described above.
21842 This function returns the same values as the libm ``floor`` functions
21843 would and handles error conditions in the same way.
21846 '``llvm.experimental.constrained.round``' Intrinsic
21847 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21855 @llvm.experimental.constrained.round(<type> <op1>,
21856 metadata <exception behavior>)
21861 The '``llvm.experimental.constrained.round``' intrinsic returns the first
21862 operand rounded to the nearest integer.
21867 The first argument and the return value are floating-point numbers of the same
21870 The second argument specifies the exception behavior as described above.
21875 This function returns the same values as the libm ``round`` functions
21876 would and handles error conditions in the same way.
21879 '``llvm.experimental.constrained.roundeven``' Intrinsic
21880 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21888 @llvm.experimental.constrained.roundeven(<type> <op1>,
21889 metadata <exception behavior>)
21894 The '``llvm.experimental.constrained.roundeven``' intrinsic returns the first
21895 operand rounded to the nearest integer in floating-point format, rounding
21896 halfway cases to even (that is, to the nearest value that is an even integer),
21897 regardless of the current rounding direction.
21902 The first argument and the return value are floating-point numbers of the same
21905 The second argument specifies the exception behavior as described above.
This function implements the IEEE-754 operation ``roundToIntegralTiesToEven``. It
also behaves in the same way as the C standard function ``roundeven`` and can signal
the invalid operation exception for a signaling NaN (SNaN) operand.
21915 '``llvm.experimental.constrained.lround``' Intrinsic
21916 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21924 @llvm.experimental.constrained.lround(<fptype> <op1>,
21925 metadata <exception behavior>)
21930 The '``llvm.experimental.constrained.lround``' intrinsic returns the first
21931 operand rounded to the nearest integer with ties away from zero. It will
21932 raise an inexact floating-point exception if the operand is not an integer.
21933 An invalid exception is raised if the result is too large to fit into a
21934 supported integer type, and in this case the result is undefined.
21939 The first argument is a floating-point number. The return value is an
21940 integer type. Not all types are supported on all targets. The supported
21941 types are the same as the ``llvm.lround`` intrinsic and the ``lround``
21944 The second argument specifies the exception behavior as described above.
21949 This function returns the same values as the libm ``lround`` functions
21950 would and handles error conditions in the same way.
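
A minimal sketch of a call, assuming an ``i32`` result and a ``float`` operand
(the overload and the function name ``@lround_strict`` are illustrative only):

.. code-block:: llvm

   declare i32 @llvm.experimental.constrained.lround.i32.f32(float, metadata)

   define i32 @lround_strict(float %x) strictfp {
   entry:
     ; Ties round away from zero regardless of the dynamic rounding mode,
     ; so only the exception behavior is supplied.
     %r = call i32 @llvm.experimental.constrained.lround.i32.f32(
                       float %x,
                       metadata !"fpexcept.strict") strictfp
     ret i32 %r
   }
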
21953 '``llvm.experimental.constrained.llround``' Intrinsic
21954 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21962 @llvm.experimental.constrained.llround(<fptype> <op1>,
21963 metadata <exception behavior>)
21968 The '``llvm.experimental.constrained.llround``' intrinsic returns the first
21969 operand rounded to the nearest integer with ties away from zero. It will
21970 raise an inexact floating-point exception if the operand is not an integer.
21971 An invalid exception is raised if the result is too large to fit into a
21972 supported integer type, and in this case the result is undefined.
21977 The first argument is a floating-point number. The return value is an
21978 integer type. Not all types are supported on all targets. The supported
21979 types are the same as the ``llvm.llround`` intrinsic and the ``llround``
21982 The second argument specifies the exception behavior as described above.
21987 This function returns the same values as the libm ``llround`` functions
21988 would and handles error conditions in the same way.
21991 '``llvm.experimental.constrained.trunc``' Intrinsic
21992 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22000 @llvm.experimental.constrained.trunc(<type> <op1>,
22001 metadata <exception behavior>)
22006 The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
22007 operand rounded to the nearest integer not larger in magnitude than the
22013 The first argument and the return value are floating-point numbers of the same
22016 The second argument specifies the exception behavior as described above.
22021 This function returns the same values as the libm ``trunc`` functions
22022 would and handles error conditions in the same way.
22024 .. _int_experimental_noalias_scope_decl:
22026 '``llvm.experimental.noalias.scope.decl``' Intrinsic
22027 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22035 declare void @llvm.experimental.noalias.scope.decl(metadata !id.scope.list)
22040 The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a
22041 noalias scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason for the duplication,
22043 the scope might need to be duplicated as well.
22049 The ``!id.scope.list`` argument is metadata that is a list of ``noalias``
22050 metadata references. The format is identical to that required for ``noalias``
22051 metadata. This list must have exactly one element.
22056 The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a
22057 noalias scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason for the duplication,
22059 the scope might need to be duplicated as well.
22061 For example, when the intrinsic is used inside a loop body, and that loop is
22062 unrolled, the associated noalias scope must also be duplicated. Otherwise, the
22063 noalias property it signifies would spill across loop iterations, whereas it
22064 was only valid within a single iteration.
22066 .. code-block:: llvm
; This example shows two possible positions for noalias.decl and how they impact the semantics:
22069 ; If it is outside the loop (Version 1), then %a and %b are noalias across *all* iterations.
22070 ; If it is inside the loop (Version 2), then %a and %b are noalias only within *one* iteration.
define void @decl_in_loop(i8* %a.base, i8* %b.base) {
22073 ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 1: noalias decl outside loop
22077 %a = phi i8* [ %a.base, %entry ], [ %a.inc, %loop ]
22078 %b = phi i8* [ %b.base, %entry ], [ %b.inc, %loop ]
22079 ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 2: noalias decl inside loop
22080 %val = load i8, i8* %a, !alias.scope !2
22081 store i8 %val, i8* %b, !noalias !2
22082 %a.inc = getelementptr inbounds i8, i8* %a, i64 1
22083 %b.inc = getelementptr inbounds i8, i8* %b, i64 1
22084 %cond = call i1 @cond()
22085 br i1 %cond, label %loop, label %exit
22091 !0 = !{!0} ; domain
22092 !1 = !{!1, !0} ; scope
22093 !2 = !{!1} ; scope list
Multiple calls to ``@llvm.experimental.noalias.scope.decl`` for the same scope
22096 are possible, but one should never dominate another. Violations are pointed out
22097 by the verifier as they indicate a problem in either a transformation pass or
22101 Floating Point Environment Manipulation intrinsics
22102 --------------------------------------------------
These functions read or write the floating-point environment, such as the
rounding mode or the state of floating-point exceptions. Altering the
floating-point environment requires special care. See :ref:`Floating Point Environment <floatenv>`.
22108 '``llvm.flt.rounds``' Intrinsic
22109 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22116 declare i32 @llvm.flt.rounds()
22121 The '``llvm.flt.rounds``' intrinsic reads the current rounding mode.
22126 The '``llvm.flt.rounds``' intrinsic returns the current rounding mode.
The encoding of the returned values is the same as the result of ``FLT_ROUNDS``,
as specified by the C standard:
22133 1 - to nearest, ties to even
22134 2 - toward positive infinity
22135 3 - toward negative infinity
22136 4 - to nearest, ties away from zero
Other values may be used to represent additional rounding modes supported by a
target. These values are target-specific.
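
As a small sketch, the returned value can be compared against one of the
encodings listed above. The function name ``@is_round_to_nearest`` is
hypothetical.

.. code-block:: llvm

   declare i32 @llvm.flt.rounds()

   ; Returns true if the current rounding mode is "to nearest, ties to even".
   define i1 @is_round_to_nearest() {
   entry:
     %mode = call i32 @llvm.flt.rounds()
     %is.nearest = icmp eq i32 %mode, 1
     ret i1 %is.nearest
   }
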
22142 '``llvm.set.rounding``' Intrinsic
22143 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22150 declare void @llvm.set.rounding(i32 <val>)
The '``llvm.set.rounding``' intrinsic sets the current rounding mode.
The argument is the required rounding mode. The encoding of the rounding mode
is the same as that used by '``llvm.flt.rounds``'.
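
For example, the following sketch requests rounding toward positive infinity
using the value 2 from the ``llvm.flt.rounds`` encoding. The function name is
hypothetical, and real code that changes the floating-point environment needs
the care described in :ref:`Floating Point Environment <floatenv>`.

.. code-block:: llvm

   declare void @llvm.set.rounding(i32)

   define void @round_toward_positive_infinity() {
   entry:
     ; 2 encodes "toward positive infinity".
     call void @llvm.set.rounding(i32 2)
     ret void
   }
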
The '``llvm.set.rounding``' intrinsic sets the current rounding mode. It is
similar to the C library function ``fesetround``; however, this intrinsic does
not return a value and uses a platform-independent representation of IEEE rounding
22175 This class of intrinsics is designed to be generic and has no specific
22178 '``llvm.var.annotation``' Intrinsic
22179 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22186 declare void @llvm.var.annotation(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
22191 The '``llvm.var.annotation``' intrinsic.
22196 The first argument is a pointer to a value, the second is a pointer to a
22197 global string, the third is a pointer to a global string which is the
22198 source file name, and the last argument is the line number.
22203 This intrinsic allows annotation of local variables with arbitrary
22204 strings. This can be useful for special purpose optimizations that want
22205 to look for these annotations. These have no other defined use; they are
22206 ignored by code generation and optimization.
22208 '``llvm.ptr.annotation.*``' Intrinsic
22209 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22214 This is an overloaded intrinsic. You can use '``llvm.ptr.annotation``' on a
22215 pointer to an integer of any width. *NOTE* you must specify an address space for
22216 the pointer. The identifier for the default address space is the integer
22221 declare i8* @llvm.ptr.annotation.p<address space>i8(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
22222 declare i16* @llvm.ptr.annotation.p<address space>i16(i16* <val>, i8* <str>, i8* <str>, i32 <int>)
22223 declare i32* @llvm.ptr.annotation.p<address space>i32(i32* <val>, i8* <str>, i8* <str>, i32 <int>)
22224 declare i64* @llvm.ptr.annotation.p<address space>i64(i64* <val>, i8* <str>, i8* <str>, i32 <int>)
22225 declare i256* @llvm.ptr.annotation.p<address space>i256(i256* <val>, i8* <str>, i8* <str>, i32 <int>)
22230 The '``llvm.ptr.annotation``' intrinsic.
22235 The first argument is a pointer to an integer value of arbitrary bitwidth
22236 (result of some expression), the second is a pointer to a global string, the
22237 third is a pointer to a global string which is the source file name, and the
22238 last argument is the line number. It returns the value of the first argument.
22243 This intrinsic allows annotation of a pointer to an integer with arbitrary
22244 strings. This can be useful for special purpose optimizations that want to look
22245 for these annotations. These have no other defined use; they are ignored by code
22246 generation and optimization.
22248 '``llvm.annotation.*``' Intrinsic
22249 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22254 This is an overloaded intrinsic. You can use '``llvm.annotation``' on
22255 any integer bit width.
22259 declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32 <int>)
22260 declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32 <int>)
22261 declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32 <int>)
22262 declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32 <int>)
22263 declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32 <int>)
22268 The '``llvm.annotation``' intrinsic.
22273 The first argument is an integer value (result of some expression), the
22274 second is a pointer to a global string, the third is a pointer to a
22275 global string which is the source file name, and the last argument is
22276 the line number. It returns the value of the first argument.
22281 This intrinsic allows annotations to be put on arbitrary expressions
22282 with arbitrary strings. This can be useful for special purpose
22283 optimizations that want to look for these annotations. These have no
22284 other defined use; they are ignored by code generation and optimization.
22286 '``llvm.codeview.annotation``' Intrinsic
22287 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22292 This annotation emits a label at its program point and an associated
22293 ``S_ANNOTATION`` codeview record with some additional string metadata. This is
22294 used to implement MSVC's ``__annotation`` intrinsic. It is marked
22295 ``noduplicate``, so calls to this intrinsic prevent inlining and should be
22296 considered expensive.
22300 declare void @llvm.codeview.annotation(metadata)
22305 The argument should be an MDTuple containing any number of MDStrings.
22307 '``llvm.trap``' Intrinsic
22308 ^^^^^^^^^^^^^^^^^^^^^^^^^
22315 declare void @llvm.trap() cold noreturn nounwind
22320 The '``llvm.trap``' intrinsic.
22330 This intrinsic is lowered to the target dependent trap instruction. If
22331 the target does not have a trap instruction, this intrinsic will be
22332 lowered to a call of the ``abort()`` function.
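
A typical use is to terminate execution on a path that must never be taken.
The following is a sketch; the function ``@checked_div`` is invented for
illustration.

.. code-block:: llvm

   declare void @llvm.trap() cold noreturn nounwind

   define i32 @checked_div(i32 %a, i32 %b) {
   entry:
     %is.zero = icmp eq i32 %b, 0
     br i1 %is.zero, label %fail, label %ok

   fail:
     ; Division by zero: trap instead of continuing.
     call void @llvm.trap()
     unreachable

   ok:
     %q = udiv i32 %a, %b
     ret i32 %q
   }
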
22334 '``llvm.debugtrap``' Intrinsic
22335 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22342 declare void @llvm.debugtrap() nounwind
22347 The '``llvm.debugtrap``' intrinsic.
22357 This intrinsic is lowered to code which is intended to cause an
22358 execution trap with the intention of requesting the attention of a
22361 '``llvm.ubsantrap``' Intrinsic
22362 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22369 declare void @llvm.ubsantrap(i8 immarg) cold noreturn nounwind
22374 The '``llvm.ubsantrap``' intrinsic.
22379 An integer describing the kind of failure detected.
This intrinsic is lowered to code that is intended to cause an execution trap.
Where possible, the argument is embedded into the encoding of that trap so that
different crashes can be told apart.
22388 Equivalent to ``@llvm.trap`` for targets that do not support this behaviour.
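
The sketch below traps on signed overflow. The function name ``@checked_add``
and the failure kind ``0`` are arbitrary choices for this example; the
argument must be an immediate constant.

.. code-block:: llvm

   declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)
   declare void @llvm.ubsantrap(i8 immarg) cold noreturn nounwind

   define i32 @checked_add(i32 %a, i32 %b) {
   entry:
     %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
     %overflow = extractvalue { i32, i1 } %res, 1
     br i1 %overflow, label %fail, label %ok

   fail:
     ; The constant 0 is an arbitrary failure kind chosen for this sketch.
     call void @llvm.ubsantrap(i8 0)
     unreachable

   ok:
     %sum = extractvalue { i32, i1 } %res, 0
     ret i32 %sum
   }
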
22390 '``llvm.stackprotector``' Intrinsic
22391 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22398 declare void @llvm.stackprotector(i8* <guard>, i8** <slot>)
22403 The ``llvm.stackprotector`` intrinsic takes the ``guard`` and stores it
22404 onto the stack at ``slot``. The stack slot is adjusted to ensure that it
22405 is placed on the stack before local variables.
22410 The ``llvm.stackprotector`` intrinsic requires two pointer arguments.
The first argument is the value loaded from the stack guard
``@__stack_chk_guard``. The second argument is an ``alloca`` that has
enough space to hold the value of the guard.
22418 This intrinsic causes the prologue/epilogue inserter to force the position of
22419 the ``AllocaInst`` stack slot to be before local variables on the stack. This is
22420 to ensure that if a local variable on the stack is overwritten, it will destroy
22421 the value of the guard. When the function exits, the guard on the stack is
22422 checked against the original guard by ``llvm.stackprotectorcheck``. If they are
22423 different, then ``llvm.stackprotectorcheck`` causes the program to abort by
22424 calling the ``__stack_chk_fail()`` function.
22426 '``llvm.stackguard``' Intrinsic
22427 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22434 declare i8* @llvm.stackguard()
22439 The ``llvm.stackguard`` intrinsic returns the system stack guard value.
22441 It should not be generated by frontends, since it is only for internal usage.
This intrinsic exists because the IR form of the stack protector is still
supported in FastISel.
22453 On some platforms, the value returned by this intrinsic remains unchanged
22454 between loads in the same thread. On other platforms, it returns the same
22455 global variable value, if any, e.g. ``@__stack_chk_guard``.
22457 Currently some platforms have IR-level customized stack guard loading (e.g.
22458 X86 Linux) that is not handled by ``llvm.stackguard()``, while they should be
22461 '``llvm.objectsize``' Intrinsic
22462 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22469 declare i32 @llvm.objectsize.i32(i8* <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
22470 declare i64 @llvm.objectsize.i64(i8* <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
22475 The ``llvm.objectsize`` intrinsic is designed to provide information to the
22476 optimizer to determine whether a) an operation (like memcpy) will overflow a
22477 buffer that corresponds to an object, or b) that a runtime check for overflow
22478 isn't necessary. An object in this context means an allocation of a specific
22479 class, structure, array, or other object.
22484 The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
22485 pointer to or into the ``object``. The second argument determines whether
22486 ``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
22487 unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
22488 in address space 0 is used as its pointer argument. If it's ``false``,
22489 ``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
22490 the ``null`` is in a non-zero address space or if ``true`` is given for the
22491 third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth
22492 argument to ``llvm.objectsize`` determines if the value should be evaluated at
22495 The second, third, and fourth arguments only accept constants.
22500 The ``llvm.objectsize`` intrinsic is lowered to a value representing the size of
22501 the object concerned. If the size cannot be determined, ``llvm.objectsize``
22502 returns ``i32/i64 -1 or 0`` (depending on the ``min`` argument).
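
As a sketch, the call below queries the size of a 16-byte stack object through
a pointer to its start. The function name ``@size_of_local`` is hypothetical.
An optimizer that can see the ``alloca`` would be expected to fold the call to
16; if the size could not be determined, the result would be -1 because
``min`` is ``false``.

.. code-block:: llvm

   declare i64 @llvm.objectsize.i64(i8*, i1, i1, i1)

   define i64 @size_of_local() {
   entry:
     %buf = alloca [16 x i8]
     %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i64 0, i64 0
     ; min = false, nullunknown = true, dynamic = false (all constants).
     %size = call i64 @llvm.objectsize.i64(i8* %p, i1 false, i1 true, i1 false)
     ret i64 %size
   }
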
22504 '``llvm.expect``' Intrinsic
22505 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
22510 This is an overloaded intrinsic. You can use ``llvm.expect`` on any
22515 declare i1 @llvm.expect.i1(i1 <val>, i1 <expected_val>)
22516 declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>)
22517 declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>)
The ``llvm.expect`` intrinsic provides information about the expected (most
probable) value of ``val``, which can be used by optimizers.
22528 The ``llvm.expect`` intrinsic takes two arguments. The first argument is
22529 a value. The second argument is an expected value.
This intrinsic is lowered to ``val``.
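
A common pattern is to wrap a branch condition so that the optimizer knows
which successor is likely. The following is a sketch; the function
``@load_or_zero`` is invented for illustration.

.. code-block:: llvm

   declare i1 @llvm.expect.i1(i1, i1)

   define i32 @load_or_zero(i32* %p) {
   entry:
     %is.null = icmp eq i32* %p, null
     ; Hint that %is.null is expected to be false, so %fast is the likely path.
     %cond = call i1 @llvm.expect.i1(i1 %is.null, i1 false)
     br i1 %cond, label %slow, label %fast

   fast:
     %v = load i32, i32* %p
     ret i32 %v

   slow:
     ret i32 0
   }
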
22536 '``llvm.expect.with.probability``' Intrinsic
22537 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22542 This intrinsic is similar to ``llvm.expect``. This is an overloaded intrinsic.
22543 You can use ``llvm.expect.with.probability`` on any integer bit width.
22547 declare i1 @llvm.expect.with.probability.i1(i1 <val>, i1 <expected_val>, double <prob>)
22548 declare i32 @llvm.expect.with.probability.i32(i32 <val>, i32 <expected_val>, double <prob>)
22549 declare i64 @llvm.expect.with.probability.i64(i64 <val>, i64 <expected_val>, double <prob>)
The ``llvm.expect.with.probability`` intrinsic provides information about the
expected value of ``val`` with probability (or confidence) ``prob``, which can
be used by optimizers.
22561 The ``llvm.expect.with.probability`` intrinsic takes three arguments. The first
22562 argument is a value. The second argument is an expected value. The third
22563 argument is a probability.
This intrinsic is lowered to ``val``.
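
A minimal sketch follows. The function name ``@is_common_tag`` and the
probability ``0.75`` are assumptions chosen for illustration.

.. code-block:: llvm

   declare i32 @llvm.expect.with.probability.i32(i32, i32, double)

   define i1 @is_common_tag(i32 %tag) {
   entry:
     ; Hint that %tag equals 0 with an assumed probability of 0.75.
     %t = call i32 @llvm.expect.with.probability.i32(i32 %tag, i32 0, double 0.75)
     %is.zero = icmp eq i32 %t, 0
     ret i1 %is.zero
   }
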
22572 '``llvm.assume``' Intrinsic
22573 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22580 declare void @llvm.assume(i1 %cond)
The ``llvm.assume`` intrinsic allows the optimizer to assume that the provided
22586 condition is true. This information can then be used in simplifying other parts
22589 More complex assumptions can be encoded as
22590 :ref:`assume operand bundles <assume_opbundles>`.
22595 The argument of the call is the condition which the optimizer may assume is
22601 The intrinsic allows the optimizer to assume that the provided condition is
22602 always true whenever the control flow reaches the intrinsic call. No code is
22603 generated for this intrinsic, and instructions that contribute only to the
22604 provided condition are not used for code generation. If the condition is
22605 violated during execution, the behavior is undefined.
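
As an illustrative sketch (the function ``@is_negative`` is hypothetical), an
assumption about a value's range can let the optimizer simplify a later
comparison:

.. code-block:: llvm

   declare void @llvm.assume(i1)

   define i1 @is_negative(i32 %n) {
   entry:
     %positive = icmp sgt i32 %n, 0
     call void @llvm.assume(i1 %positive)
     ; Given the assumption above, the optimizer may fold this to false.
     %neg = icmp slt i32 %n, 0
     ret i1 %neg
   }

If ``%n`` is not positive when the call is reached, the behavior is undefined.
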
22607 Note that the optimizer might limit the transformations performed on values
22608 used by the ``llvm.assume`` intrinsic in order to preserve the instructions
22609 only used to form the intrinsic's input argument. This might prove undesirable
22610 if the extra information provided by the ``llvm.assume`` intrinsic does not cause
22611 sufficient overall improvement in code quality. For this reason,
22612 ``llvm.assume`` should not be used to document basic mathematical invariants
22613 that the optimizer can otherwise deduce or facts that are of little use to the
22618 '``llvm.ssa.copy``' Intrinsic
22619 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22626 declare type @llvm.ssa.copy(type %operand) returned(1) readnone
22631 The first argument is an operand which is used as the returned value.
22636 The ``llvm.ssa.copy`` intrinsic can be used to attach information to
22637 operations by copying them and giving them new names. For example,
22638 the PredicateInfo utility uses it to build Extended SSA form, and
22639 attach various forms of information to operands that dominate specific
22640 uses. It is not meant for general use, only for building temporary
22641 renaming forms that require value splits at certain points.
22645 '``llvm.type.test``' Intrinsic
22646 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22653 declare i1 @llvm.type.test(i8* %ptr, metadata %type) nounwind readnone
22659 The first argument is a pointer to be tested. The second argument is a
22660 metadata object representing a :doc:`type identifier <TypeMetadata>`.
22665 The ``llvm.type.test`` intrinsic tests whether the given pointer is associated
22666 with the given type identifier.
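
A minimal sketch of a check that traps on failure is shown below. The type
identifier ``!"_ZTS1A"`` and the function name ``@check_vtable`` are
assumptions for illustration.

.. code-block:: llvm

   declare i1 @llvm.type.test(i8*, metadata) nounwind readnone
   declare void @llvm.trap() cold noreturn nounwind

   define void @check_vtable(i8* %vtable) {
   entry:
     %ok = call i1 @llvm.type.test(i8* %vtable, metadata !"_ZTS1A")
     br i1 %ok, label %cont, label %fail

   fail:
     call void @llvm.trap()
     unreachable

   cont:
     ret void
   }
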
22668 .. _type.checked.load:
22670 '``llvm.type.checked.load``' Intrinsic
22671 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22678 declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type) argmemonly nounwind readonly
22684 The first argument is a pointer from which to load a function pointer. The
22685 second argument is the byte offset from which to load the function pointer. The
22686 third argument is a metadata object representing a :doc:`type identifier
22692 The ``llvm.type.checked.load`` intrinsic safely loads a function pointer from a
22693 virtual table pointer using type metadata. This intrinsic is used to implement
22694 control flow integrity in conjunction with virtual call optimization. The
22695 virtual call optimization pass will optimize away ``llvm.type.checked.load``
22696 intrinsics associated with devirtualized calls, thereby removing the type
22697 check in cases where it is not needed to enforce the control flow integrity
22700 If the given pointer is associated with a type metadata identifier, this
22701 function returns true as the second element of its return value. (Note that
22702 the function may also return true if the given pointer is not associated
22703 with a type metadata identifier.) If the function's return value's second
22704 element is true, the following rules apply to the first element:
22706 - If the given pointer is associated with the given type metadata identifier,
22707 it is the function pointer loaded from the given byte offset from the given
22710 - If the given pointer is not associated with the given type metadata
22711 identifier, it is one of the following (the choice of which is unspecified):
22713 1. The function pointer that would have been loaded from an arbitrarily chosen
22714 (through an unspecified mechanism) pointer associated with the type
22717 2. If the function has a non-void return type, a pointer to a function that
22718 returns an unspecified value without causing side effects.
22720 If the function's return value's second element is false, the value of the
22721 first element is undefined.
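
The rules above might be exercised as in the following sketch. The type
identifier ``!"typeid1"``, the function ``@vcall``, and the assumption that
the loaded slot holds a ``void ()`` function are all illustrative:

.. code-block:: llvm

   declare {i8*, i1} @llvm.type.checked.load(i8*, i32, metadata)
   declare void @llvm.trap()

   define void @vcall(i8* %vtable) {
   entry:
     %pair = call {i8*, i1} @llvm.type.checked.load(i8* %vtable, i32 0, metadata !"typeid1")
     %ok = extractvalue {i8*, i1} %pair, 1
     br i1 %ok, label %cont, label %fail

   cont:
     ; The first element is only meaningful when the second element is true.
     %fptr.i8 = extractvalue {i8*, i1} %pair, 0
     %fptr = bitcast i8* %fptr.i8 to void ()*
     call void %fptr()
     ret void

   fail:
     call void @llvm.trap()
     unreachable
   }
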
22724 '``llvm.arithmetic.fence``' Intrinsic
22725 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22733 @llvm.arithmetic.fence(<type> <op>)
22738 The purpose of the ``llvm.arithmetic.fence`` intrinsic
22739 is to prevent the optimizer from performing fast-math optimizations,
22740 particularly reassociation,
22741 between the argument and the expression that contains the argument.
22742 It can be used to preserve the parentheses in the source language.
22747 The ``llvm.arithmetic.fence`` intrinsic takes only one argument.
22748 The argument and the return value are floating-point numbers,
22749 or vector floating-point numbers, of the same type.
22754 This intrinsic returns the value of its operand. The optimizer can optimize
22755 the argument, but the optimizer cannot hoist any component of the operand
22756 to the containing context, and the optimizer cannot move the calculation of
22757 any expression in the containing context into the operand.
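
For illustration, a source expression such as ``(a + b) + c`` compiled with
reassociation enabled could be represented roughly as follows (the function
name ``@sum`` is an assumption). The fence keeps ``%a + %b`` as a unit, so
the outer addition cannot be reassociated with it:

.. code-block:: llvm

   declare double @llvm.arithmetic.fence.f64(double)

   define double @sum(double %a, double %b, double %c) {
     %t = fadd reassoc double %a, %b
     ; The value of %t passes through unchanged, but the parenthesized
     ; grouping is preserved.
     %fenced = call double @llvm.arithmetic.fence.f64(double %t)
     %r = fadd reassoc double %fenced, %c
     ret double %r
   }
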
22760 '``llvm.donothing``' Intrinsic
22761 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22768 declare void @llvm.donothing() nounwind readnone
22773 The ``llvm.donothing`` intrinsic doesn't perform any operation. It's one of only
22774 three intrinsics (besides ``llvm.experimental.patchpoint`` and
22775 ``llvm.experimental.gc.statepoint``) that can be called with an invoke
22786 This intrinsic does nothing, and it's removed by optimizers and ignored
22789 '``llvm.experimental.deoptimize``' Intrinsic
22790 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22797 declare type @llvm.experimental.deoptimize(...) [ "deopt"(...) ]
22802 This intrinsic, together with :ref:`deoptimization operand bundles
22803 <deopt_opbundles>`, allows frontends to express transfer of control and
22804 frame-local state from the currently executing (typically more specialized,
22805 hence faster) version of a function into another (typically more generic, hence
22808 In languages with a fully integrated managed runtime like Java and JavaScript
22809 this intrinsic can be used to implement "uncommon trap" or "side exit" like
22810 functionality. In unmanaged languages like C and C++, this intrinsic can be
22811 used to represent the slow paths of specialized functions.
22817 The intrinsic takes an arbitrary number of arguments, whose meaning is
22818 decided by the :ref:`lowering strategy<deoptimize_lowering>`.
22823 The ``@llvm.experimental.deoptimize`` intrinsic executes an attached
22824 deoptimization continuation (denoted using a :ref:`deoptimization
22825 operand bundle <deopt_opbundles>`) and returns the value returned by
22826 the deoptimization continuation. Defining the semantic properties of
22827 the continuation itself is out of scope of the language reference --
22828 as far as LLVM is concerned, the deoptimization continuation can
22829 invoke arbitrary side effects, including reading from and writing to
22832 Deoptimization continuations expressed using ``"deopt"`` operand bundles always
22833 continue execution to the end of the physical frame containing them, so all
22834 calls to ``@llvm.experimental.deoptimize`` must be in "tail position":
22836 - ``@llvm.experimental.deoptimize`` cannot be invoked.
22837 - The call must immediately precede a :ref:`ret <i_ret>` instruction.
22838 - The ``ret`` instruction must return the value produced by the
22839 ``@llvm.experimental.deoptimize`` call if there is one, or void.
22841 Note that the above restrictions imply that the return type for a call to
22842 ``@llvm.experimental.deoptimize`` will match the return type of its immediate
22845 The inliner composes the ``"deopt"`` continuations of the caller into the
22846 ``"deopt"`` continuations present in the inlinee, and also updates calls to this
22847 intrinsic to return directly from the frame of the function it inlined into.
22849 All declarations of ``@llvm.experimental.deoptimize`` must share the
22850 same calling convention.
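
A minimal sketch of a call in tail position is shown below. The function
``@fast_path`` and the choice of ``"deopt"`` state are illustrative
assumptions; the meaning of the arguments depends on the lowering strategy:

.. code-block:: llvm

   declare i32 @llvm.experimental.deoptimize.i32(...)

   define i32 @fast_path(i32 %x) {
   entry:
     %ok = icmp ult i32 %x, 100
     br i1 %ok, label %fast, label %slow

   fast:
     %r = add i32 %x, 1
     ret i32 %r

   slow:
     ; The call immediately precedes a ret that returns its result.
     %v = call i32 (...) @llvm.experimental.deoptimize.i32(i32 %x) [ "deopt"(i32 %x) ]
     ret i32 %v
   }
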
22852 .. _deoptimize_lowering:
22857 Calls to ``@llvm.experimental.deoptimize`` are lowered to calls to the
22858 symbol ``__llvm_deoptimize`` (it is the frontend's responsibility to
22859 ensure that this symbol is defined). The call arguments to
22860 ``@llvm.experimental.deoptimize`` are lowered as if they were formal
22861 arguments of the specified types, and not as varargs.
22864 '``llvm.experimental.guard``' Intrinsic
22865 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22872 declare void @llvm.experimental.guard(i1, ...) [ "deopt"(...) ]
22877 This intrinsic, together with :ref:`deoptimization operand bundles
22878 <deopt_opbundles>`, allows frontends to express guards or checks on
22879 optimistic assumptions made during compilation. The semantics of
22880 ``@llvm.experimental.guard`` is defined in terms of
22881 ``@llvm.experimental.deoptimize`` -- its body is defined to be
22884 .. code-block:: text
22886 define void @llvm.experimental.guard(i1 %pred, <args...>) {
22887 %realPred = and i1 %pred, undef
22888 br i1 %realPred, label %continue, label %leave [, !make.implicit !{}]
22891 call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
22899 with the optional ``[, !make.implicit !{}]`` present if and only if it
22900 is present on the call site. For more details on ``!make.implicit``,
22901 see :doc:`FaultMaps`.
22903 In words, ``@llvm.experimental.guard`` executes the attached
22904 ``"deopt"`` continuation if (but **not** only if) its first argument
22905 is ``false``. Since the optimizer is allowed to replace the ``undef``
22906 with an arbitrary value, it can optimize a guard to fail "spuriously",
22907 i.e. without the original condition being false (hence the "not only
22908 if"); and this allows for "check widening" type optimizations.
22910 ``@llvm.experimental.guard`` cannot be invoked.
22912 After ``@llvm.experimental.guard`` was first added, a more general
22913 formulation was found in ``@llvm.experimental.widenable.condition``.
22914 Support for ``@llvm.experimental.guard`` is slowly being rephrased in
22915 terms of this alternate.
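
As a hedged illustration, a bounds check could be expressed with a guard
roughly as follows. The function ``@load_checked`` and the contents of the
``"deopt"`` bundle are assumptions made for the example:

.. code-block:: llvm

   declare void @llvm.experimental.guard(i1, ...)

   define i32 @load_checked(i32* %p, i32 %idx, i32 %len) {
     %in.bounds = icmp ult i32 %idx, %len
     ; Deoptimizes if %in.bounds is false (and possibly spuriously).
     call void (i1, ...) @llvm.experimental.guard(i1 %in.bounds) [ "deopt"(i32 %idx) ]
     %addr = getelementptr i32, i32* %p, i32 %idx
     %v = load i32, i32* %addr
     ret i32 %v
   }
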
22917 '``llvm.experimental.widenable.condition``' Intrinsic
22918 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22925 declare i1 @llvm.experimental.widenable.condition()
22930 This intrinsic represents a "widenable condition", which is a
22931 boolean expression with the following property: whether this
22932 expression is `true` or `false`, the program is correct and
22935 Together with :ref:`deoptimization operand bundles <deopt_opbundles>`,
22936 ``@llvm.experimental.widenable.condition`` allows frontends to
22937 express guards or checks on optimistic assumptions made during
22938 compilation and represent them as branch instructions on special
22941 While this may appear similar in semantics to `undef`, it is very
22942 different in that an invocation produces a particular, singular
22943 value. It is also intended to be lowered late, and remain available
22944 for specific optimizations and transforms that can benefit from its
22945 special properties.
22955 The intrinsic ``@llvm.experimental.widenable.condition()``
22956 returns either `true` or `false`. For each evaluation of a call
22957 to this intrinsic, the program must be valid and correct both if
22958 it returns `true` and if it returns `false`. This allows
22959 transformation passes to replace evaluations of this intrinsic
22960 with either value whenever one is beneficial.
22962 When used in a branch condition, it allows us to choose between
22963 two alternative correct solutions for the same problem, like
22966 .. code-block:: text
22968 %cond = call i1 @llvm.experimental.widenable.condition()
22969 br i1 %cond, label %solution_1, label %solution_2
22972 ; Apply memory-consuming but fast solution for a task.
22975 ; Cheap in memory but slow solution.
22977 Whether the result of the intrinsic call is `true` or `false`,
22978 it should be correct to pick either solution. We can switch
22979 between them by replacing the result of
22980 ``@llvm.experimental.widenable.condition`` with different
22983 This is how it can be used to represent guards as widenable branches:
22985 .. code-block:: text
22988 ; Unguarded instructions
22989 call void @llvm.experimental.guard(i1 %cond, <args...>) ["deopt"(<deopt_args...>)]
22990 ; Guarded instructions
22992 This can be expressed in an alternative, equivalent form as an explicit branch using
22993 ``@llvm.experimental.widenable.condition``:
22995 .. code-block:: text
22998 ; Unguarded instructions
22999 %widenable_condition = call i1 @llvm.experimental.widenable.condition()
23000 %guard_condition = and i1 %cond, %widenable_condition
23001 br i1 %guard_condition, label %guarded, label %deopt
23004 ; Guarded instructions
23007 call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
23009 So the block `guarded` is only reachable when `%cond` is `true`,
23010 and it should be valid to go to the block `deopt` whenever `%cond`
23011 is `true` or `false`.
23013 ``@llvm.experimental.widenable.condition`` will never throw, thus
23014 it cannot be invoked.
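
Putting the pieces together, a complete function using the explicit-branch
form might look like the following sketch. The function ``@guarded_load``
and the empty ``"deopt"`` state are illustrative assumptions:

.. code-block:: llvm

   declare i1 @llvm.experimental.widenable.condition()
   declare i32 @llvm.experimental.deoptimize.i32(...)

   define i32 @guarded_load(i32* %p, i1 %cond) {
   entry:
     %wc = call i1 @llvm.experimental.widenable.condition()
     %guard.cond = and i1 %cond, %wc
     br i1 %guard.cond, label %guarded, label %deopt

   guarded:
     ; Only reachable when %cond is true.
     %v = load i32, i32* %p
     ret i32 %v

   deopt:
     %d = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
     ret i32 %d
   }
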
23019 When ``@llvm.experimental.widenable.condition()`` is used in the
23020 condition of a guard represented as an explicit branch, it is
23021 legal to widen the guard's condition with any additional
23024 Guard widening looks like replacement of
23026 .. code-block:: text
23028 %widenable_cond = call i1 @llvm.experimental.widenable.condition()
23029 %guard_cond = and i1 %cond, %widenable_cond
23030 br i1 %guard_cond, label %guarded, label %deopt
23034 .. code-block:: text
23036 %widenable_cond = call i1 @llvm.experimental.widenable.condition()
23037 %new_cond = and i1 %any_other_cond, %widenable_cond
23038 %new_guard_cond = and i1 %cond, %new_cond
23039 br i1 %new_guard_cond, label %guarded, label %deopt
23041 for this branch. Here `%any_other_cond` is an arbitrarily chosen
23042 well-defined `i1` value. By widening the guard, we may
23043 impose stricter conditions on the `guarded` block and bail out to the
23044 `deopt` block when the new condition is not met.
23049 The default lowering strategy replaces the result of a
23050 call to ``@llvm.experimental.widenable.condition`` with the
23051 constant `true`. However, it is always correct to replace
23052 it with any other `i1` value. Any pass can
23053 freely do so if it can benefit from non-default lowering.
23056 '``llvm.load.relative``' Intrinsic
23057 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23064 declare i8* @llvm.load.relative.iN(i8* %ptr, iN %offset) argmemonly nounwind readonly
23069 This intrinsic loads a 32-bit value from the address ``%ptr + %offset``,
23070 adds ``%ptr`` to that value and returns it. The constant folder specifically
23071 recognizes the form of this intrinsic and the constant initializers it may
23072 load from; if a loaded constant initializer is known to have the form
23073 ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``.
23075 LLVM provides that the calculation of such a constant initializer will
23076 not overflow at link time under the medium code model if ``x`` is an
23077 ``unnamed_addr`` function. However, it does not provide this guarantee for
23078 a constant initializer folded into a function body. This intrinsic can be
23079 used to avoid the possibility of overflows when loading from such a constant.
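
The following sketch shows one way such a relative table might be laid out
and read, using the typed-pointer syntax of this document. The names
``@table``, ``@target`` and ``@lookup`` are hypothetical, and a 64-bit
target is assumed:

.. code-block:: llvm

   ; The single entry stores the 32-bit offset of @target relative to @table.
   @table = constant [1 x i32] [i32 trunc (i64 sub (i64 ptrtoint (void ()* @target to i64), i64 ptrtoint ([1 x i32]* @table to i64)) to i32)]

   declare i8* @llvm.load.relative.i32(i8*, i32)

   define void @target() unnamed_addr {
     ret void
   }

   define i8* @lookup() {
     %base = bitcast [1 x i32]* @table to i8*
     ; Loads the i32 at %base + 0 and adds %base to it, yielding @target.
     %f = call i8* @llvm.load.relative.i32(i8* %base, i32 0)
     ret i8* %f
   }
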
23081 .. _llvm_sideeffect:
23083 '``llvm.sideeffect``' Intrinsic
23084 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23091 declare void @llvm.sideeffect() inaccessiblememonly nounwind
23096 The ``llvm.sideeffect`` intrinsic doesn't perform any operation. Optimizers
23097 treat it as having side effects, so it can be inserted into a loop to
23098 indicate that the loop shouldn't be assumed to terminate (which could
23099 potentially lead to the loop being optimized away entirely), even if it's
23100 an infinite loop with no other side effects.
23110 This intrinsic actually does nothing, but optimizers must assume that it
23111 has externally observable side effects.
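
For example, an intentionally infinite loop with no other side effects could
be written as in this sketch (the function name ``@spin`` is an assumption):

.. code-block:: llvm

   declare void @llvm.sideeffect()

   define void @spin() {
   entry:
     br label %loop

   loop:
     ; Keeps the optimizer from assuming that the loop terminates.
     call void @llvm.sideeffect()
     br label %loop
   }
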
23113 '``llvm.is.constant.*``' Intrinsic
23114 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23119 This is an overloaded intrinsic. You can use ``llvm.is.constant`` with any argument type.
23123 declare i1 @llvm.is.constant.i32(i32 %operand) nounwind readnone
23124 declare i1 @llvm.is.constant.f32(float %operand) nounwind readnone
23125 declare i1 @llvm.is.constant.TYPENAME(TYPE %operand) nounwind readnone
23130 The '``llvm.is.constant``' intrinsic will return true if the argument
23131 is known to be a manifest compile-time constant. It is guaranteed to
23132 fold to either true or false before generating machine code.
23137 This intrinsic generates no code. If its argument is known to be a
23138 manifest compile-time constant value, then the intrinsic will be
23139 converted to a constant true value. Otherwise, it will be converted to
23140 a constant false value.
23142 In particular, note that if the argument is a constant expression
23143 which refers to a global (the address of which *is* a constant, but
23144 not manifest during the compile), then the intrinsic evaluates to
23147 The result also intentionally depends on the result of optimization
23148 passes -- e.g., the result can change depending on whether a
23149 function gets inlined or not. A function's parameters are
23150 obviously not constant. However, a call like
23151 ``llvm.is.constant.i32(i32 %param)`` *can* return true after the
23152 function is inlined, if the value passed to the function parameter was
23155 On the other hand, if constant folding is not run, it will never
23156 evaluate to true, even in simple cases.
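
A hedged sketch of a typical use is shown below: the intrinsic selects
between a path that is expected to fold away and a generic path. The
functions ``@scale`` and ``@scale_slow`` are hypothetical:

.. code-block:: llvm

   declare i1 @llvm.is.constant.i32(i32)
   declare i32 @scale_slow(i32)

   define i32 @scale(i32 %n) {
   entry:
     ; False here, but may become true if @scale is inlined into a caller
     ; that passes a constant.
     %known = call i1 @llvm.is.constant.i32(i32 %n)
     br i1 %known, label %fold, label %generic

   fold:
     %a = mul i32 %n, 3
     ret i32 %a

   generic:
     %b = call i32 @scale_slow(i32 %n)
     ret i32 %b
   }
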
23160 '``llvm.ptrmask``' Intrinsic
23161 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23168 declare ptrty @llvm.ptrmask(ptrty %ptr, intty %mask) readnone speculatable
23173 The first argument is a pointer. The second argument is an integer.
23178 The ``llvm.ptrmask`` intrinsic masks out bits of the pointer according to a mask.
23179 This allows stripping data from tagged pointers without converting them to an
23180 integer (ptrtoint/inttoptr). As a consequence, we can preserve more information
23181 to facilitate alias analysis and underlying-object detection.
23186 The result of ``ptrmask(ptr, mask)`` is equivalent to
23187 ``getelementptr ptr, (ptrtoint(ptr) & mask) - ptrtoint(ptr)``. Both the returned
23188 pointer and the first argument are based on the same underlying object (for more
23189 information on the *based on* terminology see
23190 :ref:`the pointer aliasing rules <pointeraliasing>`). If the bitwidth of the
23191 mask argument does not match the pointer size of the target, the mask is
23192 zero-extended or truncated accordingly.
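
As an illustration, assuming a 64-bit target and a tagging scheme that
stores a tag in the low three bits of a pointer (both assumptions, as is the
name ``@strip_tag``), the tag can be cleared like this:

.. code-block:: llvm

   declare i8* @llvm.ptrmask.p0i8.i64(i8*, i64)

   define i8* @strip_tag(i8* %tagged) {
     ; -8 is ...11111000 in two's complement, so the low 3 bits are cleared.
     %p = call i8* @llvm.ptrmask.p0i8.i64(i8* %tagged, i64 -8)
     ret i8* %p
   }
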
23196 '``llvm.vscale``' Intrinsic
23197 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
23204 declare i32 @llvm.vscale.i32()
23205 declare i64 @llvm.vscale.i64()
23210 The ``llvm.vscale`` intrinsic returns the value for ``vscale`` in scalable
23211 vectors such as ``<vscale x 16 x i8>``.
23216 ``vscale`` is a positive value that is constant throughout program
23217 execution, but is unknown at compile time.
23218 If the result value does not fit in the result type, then the result is
23219 a :ref:`poison value <poisonvalues>`.
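
For instance, the number of elements in a ``<vscale x 16 x i8>`` vector
could be computed at run time as in this sketch (the function name
``@num_elements`` is an assumption):

.. code-block:: llvm

   declare i64 @llvm.vscale.i64()

   define i64 @num_elements() {
     %vs = call i64 @llvm.vscale.i64()
     ; <vscale x 16 x i8> holds vscale * 16 elements.
     %n = mul i64 %vs, 16
     ret i64 %n
   }
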
23222 Stack Map Intrinsics
23223 --------------------
23225 LLVM provides experimental intrinsics to support runtime patching
23226 mechanisms commonly desired in dynamic language JITs. These intrinsics
23227 are described in :doc:`StackMaps`.
23229 Element Wise Atomic Memory Intrinsics
23230 -------------------------------------
23232 These intrinsics are similar to the standard library memory intrinsics except
23233 that they perform memory transfer as a sequence of atomic memory accesses.
23235 .. _int_memcpy_element_unordered_atomic:
23237 '``llvm.memcpy.element.unordered.atomic``' Intrinsic
23238 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23243 This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
23244 any integer bit width and for different address spaces. Not all targets
23245 support all bit widths however.
23249 declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
23252 i32 <element_size>)
23253 declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
23256 i32 <element_size>)
23261 The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
23262 '``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
23263 as arrays with elements that are exactly ``element_size`` bytes, and the copy between
23264 buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
23265 that are a positive integer multiple of the ``element_size`` in size.
23270 The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
23271 intrinsic, with the added constraint that ``len`` is required to be a positive integer
23272 multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
23273 ``element_size``, then the behaviour of the intrinsic is undefined.
23275 ``element_size`` must be a compile-time constant positive power of two no greater than
23276 a target-specific atomic access size limit.
23278 For each of the input pointers, the ``align`` parameter attribute must be specified. It
23279 must be a power of two no less than the ``element_size``. The caller guarantees that
23280 both the source and destination pointers are aligned to that boundary.
23285 The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
23286 memory from the source location to the destination location. These locations are not
23287 allowed to overlap. The memory copy is performed as a sequence of load/store operations
23288 where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
23289 aligned at an ``element_size`` boundary.
23291 The order of the copy is unspecified. The same value may be read from the source
23292 buffer many times, but only one write is issued to the destination buffer per
23293 element. It is well defined to have concurrent reads and writes to both source and
23294 destination provided those reads and writes are unordered atomic when specified.
23296 This intrinsic does not provide any additional ordering guarantees over those
23297 provided by a set of unordered loads from the source location and stores to the
23303 In the most general case, a call to '``llvm.memcpy.element.unordered.atomic.*``' is
23304 lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``, where '*'
23305 is replaced with the actual element size. See :ref:`RewriteStatepointsForGC intrinsic
23306 lowering <RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
23309 The optimizer is allowed to inline the memory copy when it's profitable to do so.
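
A minimal sketch of a call follows, copying 16 bytes as four 4-byte
elements. The function ``@copy16`` is hypothetical, and 4-byte atomic
accesses are assumed to be supported by the target:

.. code-block:: llvm

   declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8*, i8*, i32, i32)

   define void @copy16(i8* %dst, i8* %src) {
     ; len = 16 is a positive multiple of element_size = 4, and both
     ; pointers carry an align attribute of at least element_size.
     call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src, i32 16, i32 4)
     ret void
   }
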
23311 '``llvm.memmove.element.unordered.atomic``' Intrinsic
23312 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23317 This is an overloaded intrinsic. You can use
23318 ``llvm.memmove.element.unordered.atomic`` on any integer bit width and for
23319 different address spaces. Not all targets support all bit widths however.
23323 declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
23326 i32 <element_size>)
23327 declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
23330 i32 <element_size>)
23335 The '``llvm.memmove.element.unordered.atomic.*``' intrinsic is a specialization
23336 of the '``llvm.memmove.*``' intrinsic. It differs in that the ``dest`` and
23337 ``src`` are treated as arrays with elements that are exactly ``element_size``
23338 bytes, and the copy between buffers uses a sequence of
23339 :ref:`unordered atomic <ordering>` load/store operations that are a positive
23340 integer multiple of the ``element_size`` in size.
23345 The first three arguments are the same as they are in the
23346 :ref:`@llvm.memmove <int_memmove>` intrinsic, with the added constraint that
23347 ``len`` is required to be a positive integer multiple of the ``element_size``.
23348 If ``len`` is not a positive integer multiple of ``element_size``, then the
23349 behaviour of the intrinsic is undefined.
23351 ``element_size`` must be a compile-time constant positive power of two no
23352 greater than a target-specific atomic access size limit.
23354 For each of the input pointers the ``align`` parameter attribute must be
23355 specified. It must be a power of two no less than the ``element_size``. Caller
23356 guarantees that both the source and destination pointers are aligned to that
23362 The '``llvm.memmove.element.unordered.atomic.*``' intrinsic copies ``len`` bytes
23363 of memory from the source location to the destination location. These locations
23364 are allowed to overlap. The memory copy is performed as a sequence of load/store
23365 operations where each access is guaranteed to be a multiple of ``element_size``
23366 bytes wide and aligned at an ``element_size`` boundary.
23368 The order of the copy is unspecified. The same value may be read from the source
23369 buffer many times, but only one write is issued to the destination buffer per
23370 element. It is well defined to have concurrent reads and writes to both source
23371 and destination provided those reads and writes are unordered atomic when
23374 This intrinsic does not provide any additional ordering guarantees over those
23375 provided by a set of unordered loads from the source location and stores to the
23381 In the most general case, a call to
23382 '``llvm.memmove.element.unordered.atomic.*``' is lowered to a call to the symbol
23383 ``__llvm_memmove_element_unordered_atomic_*``, where '*' is replaced with the
23384 actual element size. See :ref:`RewriteStatepointsForGC intrinsic lowering
23385 <RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
23388 The optimizer is allowed to inline the memory copy when it's profitable to do so.
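
Because the locations may overlap, the intrinsic can shift data within a
single buffer. The sketch below moves 32 bytes down by 8 bytes using 8-byte
elements; ``@shift_down`` is a hypothetical name, and the caller is assumed
to guarantee that ``%buf`` is 8-byte aligned:

.. code-block:: llvm

   declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8*, i8*, i32, i32)

   define void @shift_down(i8* %buf) {
     %src = getelementptr i8, i8* %buf, i32 8
     ; Source and destination overlap; len = 32 is a multiple of 8.
     call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %buf, i8* align 8 %src, i32 32, i32 8)
     ret void
   }
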
23390 .. _int_memset_element_unordered_atomic:
23392 '``llvm.memset.element.unordered.atomic``' Intrinsic
23393 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23398 This is an overloaded intrinsic. You can use ``llvm.memset.element.unordered.atomic`` on
23399 any integer bit width and for different address spaces. Not all targets
23400 support all bit widths however.
23404 declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* <dest>,
23407 i32 <element_size>)
23408 declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* <dest>,
23411 i32 <element_size>)
23416 The '``llvm.memset.element.unordered.atomic.*``' intrinsic is a specialization of the
23417 '``llvm.memset.*``' intrinsic. It differs in that the ``dest`` is treated as an array
23418 with elements that are exactly ``element_size`` bytes, and the assignment to that array
23419 uses a sequence of :ref:`unordered atomic <ordering>` store operations
23420 that are a positive integer multiple of the ``element_size`` in size.
23425 The first three arguments are the same as they are in the :ref:`@llvm.memset <int_memset>`
23426 intrinsic, with the added constraint that ``len`` is required to be a positive integer
23427 multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
23428 ``element_size``, then the behaviour of the intrinsic is undefined.
23430 ``element_size`` must be a compile-time constant positive power of two no greater than
23431 a target-specific atomic access size limit.
23433 The ``dest`` input pointer must have the ``align`` parameter attribute specified. It
23434 must be a power of two no less than the ``element_size``. Caller guarantees that
23435 the destination pointer is aligned to that boundary.
23440 The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of
23441 memory starting at the destination location to the given ``value``. The memory is
23442 set with a sequence of store operations where each access is guaranteed to be a
23443 multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.
23445 The order of the assignment is unspecified. Only one write is issued to the
23446 destination buffer per element. It is well defined to have concurrent reads and
23447 writes to the destination provided those reads and writes are unordered atomic
23450 This intrinsic does not provide any additional ordering guarantees over those
23451 provided by a set of unordered stores to the destination.
23456 In the most general case, a call to '``llvm.memset.element.unordered.atomic.*``' is
23457 lowered to a call to the symbol ``__llvm_memset_element_unordered_atomic_*``, where '*'
23458 is replaced with the actual element size.
23460 The optimizer is allowed to inline the memory assignment when it's profitable to do so.
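
As a sketch, zeroing 64 bytes with 4-byte element stores might look like
this (the function ``@zero64`` is hypothetical, and 4-byte atomic stores are
assumed to be supported by the target):

.. code-block:: llvm

   declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8*, i8, i32, i32)

   define void @zero64(i8* %dst) {
     ; len = 64 is a positive multiple of element_size = 4.
     call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %dst, i8 0, i32 64, i32 4)
     ret void
   }
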
23462 Objective-C ARC Runtime Intrinsics
23463 ----------------------------------
23465 LLVM provides intrinsics that lower to Objective-C ARC runtime entry points.
23466 LLVM is aware of the semantics of these functions, and optimizes based on that
23467 knowledge. You can read more about the details of Objective-C ARC `here
23468 <https://clang.llvm.org/docs/AutomaticReferenceCounting.html>`_.
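
As a rough illustration of how these intrinsics appear in IR, the sketch
below retains an object for the duration of a call and then releases it. The
functions ``@hold`` and ``@use`` are hypothetical:

.. code-block:: llvm

   declare i8* @llvm.objc.retain(i8*)
   declare void @llvm.objc.release(i8*)
   declare void @use(i8*)

   define void @hold(i8* %obj) {
     %r = call i8* @llvm.objc.retain(i8* %obj)
     call void @use(i8* %r)
     call void @llvm.objc.release(i8* %r)
     ret void
   }
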
23470 '``llvm.objc.autorelease``' Intrinsic
23471 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23477 declare i8* @llvm.objc.autorelease(i8*)
23482 Lowers to a call to `objc_autorelease <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autorelease>`_.
23484 '``llvm.objc.autoreleasePoolPop``' Intrinsic
23485 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23491 declare void @llvm.objc.autoreleasePoolPop(i8*)
23496 Lowers to a call to `objc_autoreleasePoolPop <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-autoreleasepoolpop-void-pool>`_.
23498 '``llvm.objc.autoreleasePoolPush``' Intrinsic
23499 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23505 declare i8* @llvm.objc.autoreleasePoolPush()
23510 Lowers to a call to `objc_autoreleasePoolPush <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-autoreleasepoolpush-void>`_.
23512 '``llvm.objc.autoreleaseReturnValue``' Intrinsic
23513 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23519 declare i8* @llvm.objc.autoreleaseReturnValue(i8*)
23524 Lowers to a call to `objc_autoreleaseReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue>`_.
23526 '``llvm.objc.copyWeak``' Intrinsic
23527 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23533 declare void @llvm.objc.copyWeak(i8**, i8**)
23538 Lowers to a call to `objc_copyWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-copyweak-id-dest-id-src>`_.
23540 '``llvm.objc.destroyWeak``' Intrinsic
23541 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23547 declare void @llvm.objc.destroyWeak(i8**)
23552 Lowers to a call to `objc_destroyWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-destroyweak-id-object>`_.
23554 '``llvm.objc.initWeak``' Intrinsic
23555 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23561 declare i8* @llvm.objc.initWeak(i8**, i8*)
23566 Lowers to a call to `objc_initWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-initweak>`_.
23568 '``llvm.objc.loadWeak``' Intrinsic
23569 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23575 declare i8* @llvm.objc.loadWeak(i8**)
23580 Lowers to a call to `objc_loadWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-loadweak>`_.
23582 '``llvm.objc.loadWeakRetained``' Intrinsic
23583 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23589 declare i8* @llvm.objc.loadWeakRetained(i8**)
23594 Lowers to a call to `objc_loadWeakRetained <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-loadweakretained>`_.
23596 '``llvm.objc.moveWeak``' Intrinsic
23597 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23603 declare void @llvm.objc.moveWeak(i8**, i8**)
23608 Lowers to a call to `objc_moveWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-moveweak-id-dest-id-src>`_.
23610 '``llvm.objc.release``' Intrinsic
23611 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23617 declare void @llvm.objc.release(i8*)
23622 Lowers to a call to `objc_release <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-release-id-value>`_.
23624 '``llvm.objc.retain``' Intrinsic
23625 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23631 declare i8* @llvm.objc.retain(i8*)
23636 Lowers to a call to `objc_retain <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retain>`_.
23638 '``llvm.objc.retainAutorelease``' Intrinsic
23639 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23645 declare i8* @llvm.objc.retainAutorelease(i8*)
23650 Lowers to a call to `objc_retainAutorelease <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautorelease>`_.
23652 '``llvm.objc.retainAutoreleaseReturnValue``' Intrinsic
23653 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23659 declare i8* @llvm.objc.retainAutoreleaseReturnValue(i8*)
23664 Lowers to a call to `objc_retainAutoreleaseReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautoreleasereturnvalue>`_.
23666 '``llvm.objc.retainAutoreleasedReturnValue``' Intrinsic
23667 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23673 declare i8* @llvm.objc.retainAutoreleasedReturnValue(i8*)
23678 Lowers to a call to `objc_retainAutoreleasedReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautoreleasedreturnvalue>`_.
23680 '``llvm.objc.retainBlock``' Intrinsic
23681 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23687 declare i8* @llvm.objc.retainBlock(i8*)
23692 Lowers to a call to `objc_retainBlock <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainblock>`_.
23694 '``llvm.objc.storeStrong``' Intrinsic
23695 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23701 declare void @llvm.objc.storeStrong(i8**, i8*)
23706 Lowers to a call to `objc_storeStrong <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-storestrong-id-object-id-value>`_.
23708 '``llvm.objc.storeWeak``' Intrinsic
23709 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23715 declare i8* @llvm.objc.storeWeak(i8**, i8*)
23720 Lowers to a call to `objc_storeWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-storeweak>`_.
23722 Preserving Debug Information Intrinsics
23723 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23725 These intrinsics carry certain debug information together with
23726 IR-level operations. For example, it may be desirable to
23727 know the structure/union name and the original user-level field
23728 indices. This information is lost in the IR ``getelementptr`` instruction
23729 because IR types differ from debuginfo types and unions
23730 are converted to structs in IR.
23732 '``llvm.preserve.array.access.index``' Intrinsic
23733 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23740 @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(<type> base,
23747 The '``llvm.preserve.array.access.index``' intrinsic returns the getelementptr address
23748 computed from the array base ``base``, the array dimension ``dim``, and the last access index ``index``
23749 into the array. The return type ``ret_type`` is a pointer to the array element.
23750 The array ``dim`` and ``index`` are preserved as call arguments, which is more robust than
23751 a ``getelementptr`` instruction, which may be rewritten by compiler transformations.
23752 Metadata of kind ``llvm.preserve.access.index`` is attached to this call instruction
23753 to provide the array or pointer debuginfo type.
23754 The metadata is a ``DICompositeType`` or ``DIDerivedType`` representing the
23755 debuginfo version of ``type``.
23760 The ``base`` is the array base address. The ``dim`` is the array dimension.
23761 The ``base`` is a pointer if ``dim`` equals 0.
23762 The ``index`` is the last access index into the array or pointer.
23764 The ``base`` argument must be annotated with an :ref:`elementtype
23765 <attr_elementtype>` attribute at the call-site. This attribute specifies the
23766 getelementptr element type.
23771 The '``llvm.preserve.array.access.index``' intrinsic produces the same result
23772 as a getelementptr with base ``base`` and access operands ``{dim's 0's, index}``, that is, ``dim`` zero indices followed by ``index``.
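The equivalence with ``getelementptr`` can be illustrated with a small sketch. The function ``@elem3``, the array type ``[10 x i32]``, and the constant operands are assumptions made for this example. The ``!llvm.preserve.access.index`` attachment and its debuginfo node are omitted for brevity, so this shows only the address computation.

.. code-block:: llvm

    declare i32* @llvm.preserve.array.access.index.p0i32.p0a10i32([10 x i32]*, i32, i32)

    ; Hypothetical example: compute the address of element 3 of a 10-element array.
    define i32* @elem3([10 x i32]* %arr) {
    entry:
      ; dim = 1 (base points to a one-dimensional array), index = 3.
      %p = call i32* @llvm.preserve.array.access.index.p0i32.p0a10i32(
               [10 x i32]* elementtype([10 x i32]) %arr, i32 1, i32 3)

      ; The same address written as a plain getelementptr:
      ; one zero index (because dim = 1) followed by the access index.
      %q = getelementptr [10 x i32], [10 x i32]* %arr, i32 0, i32 3

      ret i32* %p
    }

With ``dim`` equal to 0 the base would instead be a plain element pointer, and the corresponding ``getelementptr`` would take ``index`` as its only index operand.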
23774 '``llvm.preserve.union.access.index``' Intrinsic
23775 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23782 @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(<type> base,
23788 The '``llvm.preserve.union.access.index``' intrinsic carries the debuginfo field index
23789 ``di_index`` and returns the ``base`` address.
23790 Metadata of kind ``llvm.preserve.access.index`` is attached to this call instruction
23791 to provide the union debuginfo type.
23792 The metadata is a ``DICompositeType`` representing the debuginfo version of ``type``.
23793 The return type ``type`` is the same as the ``base`` type.
23798 The ``base`` is the union base address. The ``di_index`` is the field index in debuginfo.
23803 The '``llvm.preserve.union.access.index``' intrinsic returns the ``base`` address.
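A minimal sketch of a call follows, assuming a hypothetical named type ``%union.u`` and a hypothetical function ``@second_member``. The metadata attachment is again left out, so only the pass-through behavior of the address is shown.

.. code-block:: llvm

    ; Hypothetical union lowered to a struct in IR.
    %union.u = type { i32 }

    declare %union.u* @llvm.preserve.union.access.index.p0s_union.us.p0s_union.us(%union.u*, i32)

    define %union.u* @second_member(%union.u* %base) {
    entry:
      ; di_index = 1 records which debuginfo field of the union was accessed.
      ; The returned pointer is the same address as %base.
      %p = call %union.u* @llvm.preserve.union.access.index.p0s_union.us.p0s_union.us(
               %union.u* %base, i32 1)
      ret %union.u* %p
    }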
23805 '``llvm.preserve.struct.access.index``' Intrinsic
23806 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23813 @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(<type> base,
23820 The '``llvm.preserve.struct.access.index``' intrinsic returns the getelementptr address
23821 based on struct base ``base`` and IR struct member index ``gep_index``.
23822 Metadata of kind ``llvm.preserve.access.index`` is attached to this call instruction
23823 to provide the struct debuginfo type.
23824 The metadata is a ``DICompositeType`` representing the debuginfo version of ``type``.
23825 The return type ``ret_type`` is a pointer to the structure member.
23830 The ``base`` is the structure base address. The ``gep_index`` is the struct member index
23831 based on IR structures. The ``di_index`` is the struct member index based on debuginfo.
23833 The ``base`` argument must be annotated with an :ref:`elementtype
23834 <attr_elementtype>` attribute at the call-site. This attribute specifies the
23835 getelementptr element type.
23840 The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
23841 as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
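As a sketch of this equivalence, the example below assumes a hypothetical type ``%struct.s`` with two ``i32`` members and a hypothetical function ``@second_field``. Both indices are 1 here. The metadata attachment is omitted for brevity.

.. code-block:: llvm

    ; Hypothetical two-member structure.
    %struct.s = type { i32, i32 }

    declare i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.ss(%struct.s*, i32, i32)

    define i32* @second_field(%struct.s* %base) {
    entry:
      ; gep_index = 1 selects the IR struct member;
      ; di_index = 1 names the member in the debuginfo type.
      %p = call i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.ss(
               %struct.s* elementtype(%struct.s) %base, i32 1, i32 1)

      ; The same address written as a plain getelementptr with operands {0, gep_index}.
      %q = getelementptr %struct.s, %struct.s* %base, i32 0, i32 1

      ret i32* %p
    }

The two indices are kept as separate arguments because ``gep_index`` refers to the IR structure layout while ``di_index`` refers to the debuginfo member list.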