# SPIR-V Assembly language syntax

## Overview

The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
spec as closely as possible, with one exception aimed at improving the text's
readability.  The `<result-id>` generated by an instruction is moved to the
beginning of that instruction and followed by an `=` sign.  This makes it easier
to distinguish ID definitions from ID uses and to locate value definitions.

Here is an example:

```
     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
```
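
In the binary form, an instruction's `<result-type>` operand, when present, comes before its `<result-id>`. The following Python sketch is illustrative only: the `to_text` function and its two flags are hypothetical and not part of any SPIR-V tool. It takes an operand list in binary order and moves the `<result-id>` to the front of the text line.

```python
def to_text(opcode, operands, has_result_type, has_result_id):
    """Render an instruction whose operands are in binary (spec) order.

    In binary order the <result-type>, if present, comes first and the
    <result-id> follows it. The text form moves the <result-id> to the
    front of the line, followed by '='.
    """
    operands = list(operands)
    if not has_result_id:
        return " ".join([opcode] + operands)
    result_id = operands.pop(1 if has_result_type else 0)
    return " ".join([result_id, "=", opcode] + operands)


# OpFunction operands in binary order: result type, result id, control, type.
print(to_text("OpFunction", ["%1", "%3", "None", "%2"], True, True))
# -> %3 = OpFunction %1 None %2
print(to_text("OpTypeFunction", ["%2", "%1"], False, True))
# -> %2 = OpTypeFunction %1
print(to_text("OpReturn", [], False, False))
# -> OpReturn
```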

A module is a sequence of instructions, separated by whitespace.
An instruction is an opcode name followed by operands, separated by
whitespace.  Typically each instruction is presented on its own line,
but the assembler does not enforce this rule.

The opcode names and expected operands are described in Section 3 of
the SPIR-V specification.  An operand is one of:
* a literal integer: a decimal or hexadecimal integer.
  A hexadecimal integer is indicated by a leading `0x` or `0X`.  A hex
  integer supplied for a signed integer value will be sign-extended.
  For example, `0xffff` supplied as the literal for an `OpConstant`
  on a signed 16-bit integer type will be interpreted as the value `-1`.
* a literal floating point number, in decimal or hexadecimal form.
  See [below](#floats).
* a literal string.
   * A literal string is everything following a double-quote `"` until the
     next un-escaped double-quote. This includes special characters such
     as newlines.
   * A backslash `\` may be used to escape characters in the string. The `\`
     may be used to escape a double-quote or a `\` but is simply ignored when
     preceding any other character.
* a named enumerated value, specific to that operand position.  For example,
  `OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
  `Physical32`), and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
  Named enumerated values are only meaningful in specific positions, and will
  otherwise generate an error.
* a mask expression, consisting of one or more mask enum names separated
  by `|`.  For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
  that combines the `NotNaN`, `NotInf`, and `NSZ` flags.
* an injected immediate integer: `!<integer>`.  See [below](#immediate).
* an ID, e.g. `%foo`. See [below](#id).
* the name of an extended instruction.  For example, `sqrt` in an extended
  instruction such as `%f = OpExtInst %f32 %OpenCLImport sqrt %arg`.
* the name of an opcode, with the `Op` prefix removed, used as the operation
  operand of `OpSpecConstantOp`.  For example, the following indicates the use
  of an integer addition in a specialization constant computation:
  `%sum = OpSpecConstantOp %i32 IAdd %a %b`
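
Some of the literal rules above can be sketched in Python. This is a minimal illustration, not the assembler's code. The helper names are hypothetical, and the flag table contains only the FP Fast Math Mode values from the SPIR-V specification. The sketch shows how `0xffff` sign-extends for a signed 16-bit type (the SPIR-V spec also requires the unused high-order bits of the 32-bit word to be sign-extended for signed types), how a mask expression evaluates, and how the backslash rule applies to literal strings.

```python
# FP Fast Math Mode flag values, as listed in the SPIR-V specification.
FP_FAST_MATH_MODE = {"None": 0x0, "NotNaN": 0x1, "NotInf": 0x2, "NSZ": 0x4,
                     "AllowRecip": 0x8, "Fast": 0x10}


def sign_extend(value, width):
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value


def to_word(value):
    """Keep the low 32 bits, so negative values come out sign-extended."""
    return value & 0xFFFFFFFF


def parse_mask(expr, table):
    """Evaluate a mask expression such as 'NotNaN|NotInf|NSZ' by OR-ing flags."""
    result = 0
    for name in expr.split("|"):
        result |= table[name]  # an unknown name raises KeyError
    return result


def parse_string_literal(text):
    """Parse a literal string that starts at text[0]; return (value, rest).

    A backslash makes the following character literal: it escapes a
    double-quote or a backslash, and is simply dropped before anything else.
    """
    if not text.startswith('"'):
        raise ValueError("a literal string starts with a double-quote")
    chars, i = [], 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            chars.append(text[i + 1])
            i += 2
        elif text[i] == '"':
            return "".join(chars), text[i + 1:]
        else:
            chars.append(text[i])
            i += 1
    raise ValueError("unterminated literal string")


print(sign_extend(0xFFFF, 16))                                  # -1
print(hex(to_word(sign_extend(0xFFFF, 16))))                    # 0xffffffff
print(hex(parse_mask("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE)))  # 0x7
print(parse_string_literal('"say \\"hi\\"\\n" rest'))            # ('say "hi"n', ' rest')
```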

## ID Definitions & Usage
<a name="id"></a>

An ID _definition_ pertains to the `<result-id>` of an instruction, and ID
_usage_ is a use of an ID as an input to an instruction.

An ID in the assembly language begins with `%` and must be followed by a name
consisting of one or more letters, numbers or underscore characters.

For every ID in the assembly program, the assembler generates a unique number
called the ID's internal number. Each ID reference is then translated into its
internal number in the SPIR-V output. Internal numbers are unique within the
compilation unit: no two IDs in the same unit will share internal numbers.

The disassembler generates IDs where the name is always a decimal number
greater than 0.

So the example above can be rewritten using more user-friendly names, as follows:
```
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
```
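
The ID naming rule and the internal numbering described above can be sketched in Python. The sketch makes two assumptions: "letters" means ASCII letters, and internal numbers are assigned from 1 upward in order of first appearance. `IdTable` is a hypothetical helper, and the real assembler's numbering scheme may differ. The only property taken from the text is that no two IDs share a number.

```python
import re

# '%' followed by one or more letters, digits or underscores
# (this sketch assumes ASCII letters only).
ID_PATTERN = re.compile(r"%[A-Za-z0-9_]+")


class IdTable:
    """Hypothetical helper mapping ID names to internal numbers.

    Assumption: numbers are assigned from 1 upward in order of first
    appearance. The only property taken from the text is that no two
    IDs share a number.
    """

    def __init__(self):
        self.numbers = {}

    def number_for(self, token):
        if ID_PATTERN.fullmatch(token) is None:
            raise ValueError(f"not a valid ID: {token!r}")
        if token not in self.numbers:
            self.numbers[token] = len(self.numbers) + 1
        return self.numbers[token]


ids = IdTable()
# Every ID reference in the rewritten example, in textual order.
for name in ["%main", "%main", "%void", "%fnMain", "%void",
             "%main", "%void", "%fnMain", "%lbMain"]:
    ids.number_for(name)
print(ids.numbers)  # {'%main': 1, '%void': 2, '%fnMain': 3, '%lbMain': 4}
```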

## Floating point literals
<a name="floats"></a>

The assembler and disassembler support floating point literals in both
decimal and hexadecimal form.

The syntax for a floating point literal is the same as floating point
constants in the C programming language, except:
* An optional leading minus (`-`) is part of the literal.
* A type specifier suffix (such as `f` or `L`) is not allowed.

Infinity and NaN values are expressed in hexadecimal float literals
by using the maximum representable exponent for the bit width.

For example, in 32-bit floating point, 8 bits are used for the exponent, and the
exponent bias is 127.  The largest exponent field value, 255, therefore
corresponds to an unbiased exponent of 255 - 127 = 128.  So we represent the
infinities and some NaNs as follows:

```
%float32 = OpTypeFloat 32
%inf     = OpConstant %float32 0x1p+128
%neginf  = OpConstant %float32 -0x1p+128
%aNaN    = OpConstant %float32 0x1.8p+128
%moreNaN = OpConstant %float32 -0x1.0002p+128
```
The assembler preserves all the bits of a NaN value.  For example, `%aNaN` in
the previous example is encoded as the word `0x7fc00000`, and `%moreNaN` is
encoded as `0xff800100`.
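
These encodings can be checked by hand. A hex float literal with exponent `+128` sets all eight exponent bits, and its hex fraction digits fill the 23-bit significand field starting from the top. The Python sketch below assumes this IEEE-754 binary32 layout. `hex_float_to_f32_bits` is a hypothetical helper that treats only `0x1.<fraction>p+128` specially and passes other values through Python's `float.fromhex` and `struct`. For the four literals in the example it yields `0x7f800000`, `0xff800000`, `0x7fc00000`, and `0xff800100`.

```python
import re
import struct

HEX_FLOAT = re.compile(r"(-?)0[xX]([0-9a-fA-F]+)(?:\.([0-9a-fA-F]*))?[pP]([+-]?\d+)")


def hex_float_to_f32_bits(text):
    """Return the 32-bit float word for a hexadecimal float literal.

    Literals of the form [-]0x1.<fraction>p+128 get the all-ones exponent
    field, with the fraction digits placed at the top of the 23-bit
    significand. Other values are converted via a Python float (a double)
    and rounded to float32 by struct.
    """
    m = HEX_FLOAT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a hex float literal: {text!r}")
    sign = 1 if m.group(1) else 0
    digits = m.group(3) or ""
    if int(m.group(4)) != 128:
        return struct.unpack("<I", struct.pack("<f", float.fromhex(text)))[0]
    if int(m.group(2), 16) != 1:
        raise ValueError("this sketch expects 0x1.<fraction>p+128")
    fraction = int(digits, 16) if digits else 0
    nbits = 4 * len(digits)  # each hex digit carries 4 bits
    if nbits > 23:
        drop = nbits - 23
        if fraction & ((1 << drop) - 1):
            raise ValueError("fraction does not fit in 23 bits")
        significand = fraction >> drop
    else:
        significand = fraction << (23 - nbits)
    return (sign << 31) | (0xFF << 23) | significand


for literal in ["0x1p+128", "-0x1p+128", "0x1.8p+128", "-0x1.0002p+128"]:
    print(f"{literal:>15} -> 0x{hex_float_to_f32_bits(literal):08x}")
```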

The disassembler prints infinite, NaN, and subnormal values in hexadecimal form.
Zero and normal values are printed in decimal form with enough digits
to preserve all significand bits.
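
For 32-bit floats, nine significant decimal digits are always enough for decimal text to convert back to the same bits. The Python sketch below uses that bound to print a normal value and confirm that the text round-trips. The helper names are hypothetical, and the disassembler's exact output format may differ (for instance, it may print fewer digits when fewer suffice).

```python
import struct


def f32_from_bits(bits):
    """Reinterpret a 32-bit word as a binary32 float (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value):
    """Round a Python float to binary32 and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def format_f32(bits):
    """Decimal text for a zero or normal float32, using 9 significant digits."""
    return f"{f32_from_bits(bits):.9g}"


text = format_f32(0x3DCCCCCD)                    # float32 nearest to 0.1
print(text)                                      # 0.100000001
print(f32_to_bits(float(text)) == 0x3DCCCCCD)    # True
```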

## Arbitrary Integers
<a name="immediate"></a>

When writing tests it can be useful to emit an arbitrary, possibly invalid,
32-bit word into the binary stream at an arbitrary position within the
assembly.  To do so, use the `!` prefix, in the form `!<integer>`.  Here is an
example:

```
OpCapability !0x0000FF00
```

Any token in a valid assembly program may be replaced by `!<integer>`, even
tokens that dictate how the rest of the instruction is parsed.  Consider, for
example, the following assembly program:

```
%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
```

The tokens `OpConstant`, `LocalSize`, and `InputLines` may be replaced by
arbitrary `!<integer>` values, and the assembler will still assemble an output
binary with three instructions.  It will not necessarily be valid SPIR-V, but
it will faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial tokens replaced by
arbitrary integers.  If, say, `OpConstant` becomes a `!<integer>` whose value
differs from the binary representation of `OpConstant` (remember that this
feature is intended for fine-grained control in SPIR-V testing), the assembler
generally has no idea what that value stands for.  So how does it know there is
exactly one `<id>` and three number literals following in that instruction,
before the next one begins?  And if `LocalSize` is replaced by an arbitrary
`!<integer>`, how does it know to take the next three tokens (instead of zero or
one, both of which are possible when it cannot be sure that the token stands
for `LocalSize`)?  The answer is a simple rule governing the parsing of
instructions with `!<integer>` in them:

When a token in the assembly program is a `!<integer>`, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent token not recognized as an OpCode or a `<result-id>` is emitted
into the binary output without any checking; when a recognizable OpCode or a
`<result-id>` is eventually encountered, it begins a new instruction and parsing
returns to normal.  (If a subsequent OpCode is never found, then this alternate
parsing mode handles all the remaining tokens in the program.)

The assembler processes the tokens encountered in alternate parsing mode as
follows:

* If the token is a number literal, the number is interpreted as a 32-bit
  value and output as a single word, since the context that would give its
  type may be lost.  To specify multiple-word literals in alternate parsing
  mode, further `!<integer>` tokens may be required.
  All formats supported by `strtoul()` are accepted.
* If the token is a string literal, the assembler outputs a sequence of words
  representing the string as defined in the SPIR-V specification for Literal
  String.
* If the token is an ID, the assembler outputs the ID's internal number.
* If the token is another `!<integer>`, the assembler outputs that integer.
* Any other token causes the assembler to quit with an error.
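
The alternate parsing mode can be modeled as a loop over tokens. The following Python sketch is a simplified model, not the assembler's implementation. `emit_alternate` and `string_to_words` are hypothetical, and the sketch assumes several simplifications: tokens are pre-split, strings contain no escapes, opcodes are recognized by their `Op` prefix, and number literals are non-negative decimal or `0x` hexadecimal rather than every `strtoul()` format. `string_to_words` follows the Literal String layout in the SPIR-V specification: UTF-8 octets with a nul terminator, zero-padded to a whole number of words, with the first octet in the lowest-order byte.

```python
def string_to_words(value):
    """Literal String layout: UTF-8 octets plus a nul terminator, zero-padded
    to whole words, with the first octet in the lowest-order byte."""
    data = value.encode("utf-8") + b"\0"
    data += b"\0" * (-len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def emit_alternate(tokens, ids):
    """Emit words for tokens in alternate parsing mode.

    Stops at a token starting with "Op" or at a token followed by "=",
    which begin the next instruction. Returns (words, tokens_consumed).
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("Op") or (i + 1 < len(tokens) and tokens[i + 1] == "="):
            return words, i                      # parsing returns to normal
        if tok.startswith("!"):
            words.append(int(tok[1:], 0))        # injected immediate integer
        elif len(tok) >= 2 and tok[0] == tok[-1] == '"':
            words.extend(string_to_words(tok[1:-1]))
        elif tok.startswith("%"):
            words.append(ids[tok])               # the ID's internal number
        elif tok[:1].isdigit():
            value = int(tok, 16) if tok[:2].lower() == "0x" else int(tok, 10)
            if value > 0xFFFFFFFF:
                raise ValueError(f"{tok} does not fit in one word")
            words.append(value)
        else:
            raise ValueError(f"unexpected token {tok!r}")  # e.g. an enum name
    return words, len(tokens)


tokens = ['!262187', '%1', '%2', '"abc"', '!327739', '%1', '%3', '6', '%2']
words, used = emit_alternate(tokens, {"%1": 1, "%2": 2, "%3": 3})
print([hex(w) for w in words], used)
```

An enum name such as `DontInline` raises an error in this model, which matches the rule that any other token is rejected.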

Note that this has some interesting consequences, including:

* When an OpCode is replaced by `!<integer>`, the integer value should encode
  the instruction's word count, as specified in the physical-layout section of
  the SPIR-V specification.

* Consecutive instructions may have their OpCode replaced by `!<integer>` and
  still produce valid SPIR-V.  For example,
  `!262187 %1 %2 "abc" !327739 %1 %3 6 %2` will successfully assemble into
  SPIR-V declaring a constant and a variable in storage class 6 (`Private`,
  originally named `PrivateGlobal`).

* Enums (such as `DontInline` or `SubgroupMemory`) are not handled by the
  alternate parsing mode.  They must be replaced by `!<integer>` for
  successful assembly.

* The `<result-id>` on the left-hand side of an assignment cannot be a
  `!<integer>`. The `<result-id>` can still be manually controlled if desired
  by expressing the entire instruction as `!<integer>` tokens for its opcode and
  operands.

* The `=` sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a `!<integer>`.

* When replacing a named ID with `!<integer>`, it is possible to generate
  unintentionally valid SPIR-V.  If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output.  This may be valid SPIR-V, contrary to the
  presumed intention of the writer.
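
The word-count rule can be checked with a little arithmetic. In the SPIR-V physical layout, an instruction's first word holds the word count in its high 16 bits and the opcode in its low 16 bits. The Python sketch below uses a hypothetical `first_word` helper and a table containing only the three opcodes it needs (`OpCapability` = 17, `OpConstant` = 43, `OpVariable` = 59). It reproduces `!262187` (4 words, `OpConstant`) and `!327739` (5 words, `OpVariable`) from the example of consecutive replaced OpCodes.

```python
# Opcode numbers from the SPIR-V specification for the instructions used here.
OPCODES = {"OpCapability": 17, "OpConstant": 43, "OpVariable": 59}


def first_word(word_count, opcode):
    """Pack the word count (high 16 bits) and opcode (low 16 bits)."""
    if not (0 < word_count <= 0xFFFF and 0 <= opcode <= 0xFFFF):
        raise ValueError("word count and opcode must each fit in 16 bits")
    return (word_count << 16) | opcode


def split_first_word(word):
    """Inverse of first_word: return (word_count, opcode)."""
    return word >> 16, word & 0xFFFF


print(first_word(4, OPCODES["OpConstant"]))          # 262187
print(first_word(5, OPCODES["OpVariable"]))          # 327739
print(hex(first_word(2, OPCODES["OpCapability"])))   # 0x20011
print(split_first_word(262187))                      # (4, 43)
```

The 4-word count for the `OpConstant` covers the opcode word, `%1`, `%2`, and the single word that holds `"abc"` plus its nul terminator.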

## Notes

* Some enumerants cannot be used by name, because the instructions in which
  they are meaningful take an ID reference instead of a literal value.
  For example:
   * Named enumerated value `CmdExecTime` from section 3.30 Kernel
     Profiling Info is used in constructing a mask value supplied as
     an ID for `OpCaptureEventProfilingInfo`.  But no other instruction
     has enough context to bring the enumerant names from section 3.30
     into scope.
   * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
     construct a value supplied as an ID to the Flags argument of
     `OpEnqueueKernel`.
   * Similarly for the names in section 3.25 Memory Semantics.
   * Similarly for the names in section 3.27 Scope.
* Some enumerants cannot be used by name, because they only name values
  returned by an instruction:
   * Enumerants from 3.12 Image Channel Order name possible values returned
     by the `OpImageQueryOrder` instruction.
   * Enumerants from 3.13 Image Channel Data Type name possible values
     returned by the `OpImageQueryFormat` instruction.
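
Because these operands are IDs, a module supplies the enumerant as the value of an integer constant and passes that constant's ID. The Python sketch below computes such values from the SPIR-V specification's Memory Semantics and Scope enumerant tables and prints the corresponding `OpConstant` lines. It includes only a subset of each table, `semantics` is a hypothetical helper, and it assumes a 32-bit unsigned integer type `%uint` declared elsewhere.

```python
# Subsets of the Scope and Memory Semantics enumerants from the SPIR-V spec.
SCOPE = {"CrossDevice": 0, "Device": 1, "Workgroup": 2, "Subgroup": 3, "Invocation": 4}
MEMORY_SEMANTICS = {
    "None": 0x0, "Acquire": 0x2, "Release": 0x4, "AcquireRelease": 0x8,
    "SequentiallyConsistent": 0x10, "UniformMemory": 0x40,
    "SubgroupMemory": 0x80, "WorkgroupMemory": 0x100,
    "CrossWorkgroupMemory": 0x200, "AtomicCounterMemory": 0x400,
    "ImageMemory": 0x800,
}


def semantics(*names):
    """OR together Memory Semantics flags to get the literal for an OpConstant."""
    value = 0
    for name in names:
        value |= MEMORY_SEMANTICS[name]
    return value


sem = semantics("AcquireRelease", "WorkgroupMemory")
print(f"%semantics = OpConstant %uint {sem}")                  # ... %uint 264
print(f"%workgroup = OpConstant %uint {SCOPE['Workgroup']}")   # ... %uint 2
```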