docs/gallium/tgsi.rst

   1 TGSI
   2 ====
   3
   4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
   5 for describing shaders. Since Gallium is inherently shaderful, shaders are
   6 an important part of the API. TGSI is the only intermediate representation
   7 used by all drivers.
   8
   9 Basics
  10 ------
  11
  12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
  13 floating-point four-component vectors. An opcode may have up to one
  14 destination register, known as *dst*, and between zero and three source
  15 registers, called *src0* through *src2*, or simply *src* if there is only
  16 one.
  17
  18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
  19 components as integers. Other instructions permit using registers as
  20 two-component vectors with double precision; see :ref:`doubleopcodes`.
  21
  22 When an instruction has a scalar result, the result is usually copied into
  23 each of the components of *dst*. When this happens, the result is said to be
  24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
  25
  26 Source Modifiers
  27 ^^^^^^^^^^^^^^^^
  28
  29 TGSI supports 32-bit negate and absolute value modifiers on floating-point
  30 inputs, and 32-bit integer negates on some drivers.  The negate applies after
  31 absolute value if both are present.
  32
  33 The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and
  34 TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which
  35 return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source
  36 modifiers.
  37
  38
  39 Other Modifiers
  40 ^^^^^^^^^^^^^^^
  41
  42 The saturate modifier clamps 32-bit destination stores to [0.0, 1.0].
  43
  44 For arithmetic instruction having a precise modifier certain optimizations
  45 which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
  46 optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
  47 MAD instruction.
  48
  49 Instruction Set
  50 ---------------
  51
  52 Core ISA
  53 ^^^^^^^^^^^^^^^^^^^^^^^^^
  54
  55 These opcodes are guaranteed to be available regardless of the driver being
  56 used.
  57
  58 .. opcode:: ARL - Address Register Load
  59
  60 .. math::
  61
  62   dst.x = (int) \lfloor src.x\rfloor
  63
  64   dst.y = (int) \lfloor src.y\rfloor
  65
  66   dst.z = (int) \lfloor src.z\rfloor
  67
  68   dst.w = (int) \lfloor src.w\rfloor
  69
  70
  71 .. opcode:: MOV - Move
  72
  73 .. math::
  74
  75   dst.x = src.x
  76
  77   dst.y = src.y
  78
  79   dst.z = src.z
  80
  81   dst.w = src.w
  82
  83
  84 .. opcode:: LIT - Light Coefficients
  85
  86 .. math::
  87
  88   dst.x &= 1 \\
  89   dst.y &= max(src.x, 0) \\
  90   dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\
  91   dst.w &= 1
  92
  93
  94 .. opcode:: RCP - Reciprocal
  95
  96 This instruction replicates its result.
  97
  98 .. math::
  99
 100   dst = \frac{1}{src.x}
 101
 102
 103 .. opcode:: RSQ - Reciprocal Square Root
 104
 105 This instruction replicates its result. The results are undefined for *src* <= 0.
 106
 107 .. math::
 108
 109   dst = \frac{1}{\sqrt{src.x}}
 110
 111
 112 .. opcode:: SQRT - Square Root
 113
 114 This instruction replicates its result. The results are undefined for *src* < 0.
 115
 116 .. math::
 117
 118   dst = {\sqrt{src.x}}
 119
 120
 121 .. opcode:: EXP - Approximate Exponential Base 2
 122
 123 .. math::
 124
 125   dst.x &= 2^{\lfloor src.x\rfloor} \\
 126   dst.y &= src.x - \lfloor src.x\rfloor \\
 127   dst.z &= 2^{src.x} \\
 128   dst.w &= 1
 129
 130
 131 .. opcode:: LOG - Approximate Logarithm Base 2
 132
 133 .. math::
 134
 135   dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
 136   dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
 137   dst.z &= \log_2{|src.x|} \\
 138   dst.w &= 1
 139
 140
 141 .. opcode:: MUL - Multiply
 142
 143 .. math::
 144
 145   dst.x = src0.x \times src1.x
 146
 147   dst.y = src0.y \times src1.y
 148
 149   dst.z = src0.z \times src1.z
 150
 151   dst.w = src0.w \times src1.w
 152
 153
 154 .. opcode:: ADD - Add
 155
 156 .. math::
 157
 158   dst.x = src0.x + src1.x
 159
 160   dst.y = src0.y + src1.y
 161
 162   dst.z = src0.z + src1.z
 163
 164   dst.w = src0.w + src1.w
 165
 166
 167 .. opcode:: DP3 - 3-component Dot Product
 168
 169 This instruction replicates its result.
 170
 171 .. math::
 172
 173   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 174
 175
 176 .. opcode:: DP4 - 4-component Dot Product
 177
 178 This instruction replicates its result.
 179
 180 .. math::
 181
 182   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 183
 184
 185 .. opcode:: DST - Distance Vector
 186
 187 .. math::
 188
 189   dst.x &= 1\\
 190   dst.y &= src0.y \times src1.y\\
 191   dst.z &= src0.z\\
 192   dst.w &= src1.w
 193
 194
 195 .. opcode:: MIN - Minimum
 196
 197 .. math::
 198
 199   dst.x = min(src0.x, src1.x)
 200
 201   dst.y = min(src0.y, src1.y)
 202
 203   dst.z = min(src0.z, src1.z)
 204
 205   dst.w = min(src0.w, src1.w)
 206
 207
 208 .. opcode:: MAX - Maximum
 209
 210 .. math::
 211
 212   dst.x = max(src0.x, src1.x)
 213
 214   dst.y = max(src0.y, src1.y)
 215
 216   dst.z = max(src0.z, src1.z)
 217
 218   dst.w = max(src0.w, src1.w)
 219
 220
 221 .. opcode:: SLT - Set On Less Than
 222
 223 .. math::
 224
 225   dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
 226
 227   dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
 228
 229   dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
 230
 231   dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
 232
 233
 234 .. opcode:: SGE - Set On Greater Equal Than
 235
 236 .. math::
 237
 238   dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
 239
 240   dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
 241
 242   dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
 243
 244   dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
 245
 246
 247 .. opcode:: MAD - Multiply And Add
 248
 249 Perform a * b + c. The implementation is free to decide whether there is an
 250 intermediate rounding step or not.
 251
 252 .. math::
 253
 254   dst.x = src0.x \times src1.x + src2.x
 255
 256   dst.y = src0.y \times src1.y + src2.y
 257
 258   dst.z = src0.z \times src1.z + src2.z
 259
 260   dst.w = src0.w \times src1.w + src2.w
 261
 262
 263 .. opcode:: LRP - Linear Interpolate
 264
 265 .. math::
 266
 267   dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
 268
 269   dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
 270
 271   dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
 272
 273   dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
 274
 275
 276 .. opcode:: FMA - Fused Multiply-Add
 277
 278 Perform a * b + c with no intermediate rounding step.
 279
 280 .. math::
 281
 282   dst.x = src0.x \times src1.x + src2.x
 283
 284   dst.y = src0.y \times src1.y + src2.y
 285
 286   dst.z = src0.z \times src1.z + src2.z
 287
 288   dst.w = src0.w \times src1.w + src2.w
 289
 290
 291 .. opcode:: FRC - Fraction
 292
 293 .. math::
 294
 295   dst.x = src.x - \lfloor src.x\rfloor
 296
 297   dst.y = src.y - \lfloor src.y\rfloor
 298
 299   dst.z = src.z - \lfloor src.z\rfloor
 300
 301   dst.w = src.w - \lfloor src.w\rfloor
 302
 303
 304 .. opcode:: FLR - Floor
 305
 306 .. math::
 307
 308   dst.x = \lfloor src.x\rfloor
 309
 310   dst.y = \lfloor src.y\rfloor
 311
 312   dst.z = \lfloor src.z\rfloor
 313
 314   dst.w = \lfloor src.w\rfloor
 315
 316
 317 .. opcode:: ROUND - Round
 318
 319 .. math::
 320
 321   dst.x = round(src.x)
 322
 323   dst.y = round(src.y)
 324
 325   dst.z = round(src.z)
 326
 327   dst.w = round(src.w)
 328
 329
 330 .. opcode:: EX2 - Exponential Base 2
 331
 332 This instruction replicates its result.
 333
 334 .. math::
 335
 336   dst = 2^{src.x}
 337
 338
 339 .. opcode:: LG2 - Logarithm Base 2
 340
 341 This instruction replicates its result.
 342
 343 .. math::
 344
 345   dst = \log_2{src.x}
 346
 347
 348 .. opcode:: POW - Power
 349
 350 This instruction replicates its result.
 351
 352 .. math::
 353
 354   dst = src0.x^{src1.x}
 355
 356
 357 .. opcode:: LDEXP - Multiply Number by Integral Power of 2
 358
 359 *src1* is an integer.
 360
 361 .. math::
 362
 363   dst.x = src0.x * 2^{src1.x}
 364   dst.y = src0.y * 2^{src1.y}
 365   dst.z = src0.z * 2^{src1.z}
 366   dst.w = src0.w * 2^{src1.w}
 367
 368
 369 .. opcode:: COS - Cosine
 370
 371 This instruction replicates its result.
 372
 373 .. math::
 374
 375   dst = \cos{src.x}
 376
 377
 378 .. opcode:: DDX, DDX_FINE - Derivative Relative To X
 379
 380 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
 381 advertised. When it is, the fine version guarantees one derivative per row
 382 while DDX is allowed to be the same for the entire 2x2 quad.
 383
 384 .. math::
 385
 386   dst.x = partialx(src.x)
 387
 388   dst.y = partialx(src.y)
 389
 390   dst.z = partialx(src.z)
 391
 392   dst.w = partialx(src.w)
 393
 394
 395 .. opcode:: DDY, DDY_FINE - Derivative Relative To Y
 396
 397 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
 398 advertised. When it is, the fine version guarantees one derivative per column
 399 while DDY is allowed to be the same for the entire 2x2 quad.
 400
 401 .. math::
 402
 403   dst.x = partialy(src.x)
 404
 405   dst.y = partialy(src.y)
 406
 407   dst.z = partialy(src.z)
 408
 409   dst.w = partialy(src.w)
 410
 411
 412 .. opcode:: PK2H - Pack Two 16-bit Floats
 413
 414 This instruction replicates its result.
 415
 416 .. math::
 417
 418   dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16
 419
 420
 421 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
 422
 423 This instruction replicates its result.
 424
 425 .. math::
 426
 427   dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16
 428
 429
 430 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
 431
 432 This instruction replicates its result.
 433
 434 .. math::
 435
 436   dst = f32\_to\_snorm8(src.x) |
 437         (f32\_to\_snorm8(src.y) << 8) |
 438         (f32\_to\_snorm8(src.z) << 16) |
 439         (f32\_to\_snorm8(src.w) << 24)
 440
 441
 442 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
 443
 444 This instruction replicates its result.
 445
 446 .. math::
 447
 448   dst = f32\_to\_unorm8(src.x) |
 449         (f32\_to\_unorm8(src.y) << 8) |
 450         (f32\_to\_unorm8(src.z) << 16) |
 451         (f32\_to\_unorm8(src.w) << 24)
 452
 453
 454 .. opcode:: SEQ - Set On Equal
 455
 456 .. math::
 457
 458   dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
 459
 460   dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
 461
 462   dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
 463
 464   dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
 465
 466
 467 .. opcode:: SGT - Set On Greater Than
 468
 469 .. math::
 470
 471   dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
 472
 473   dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
 474
 475   dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
 476
 477   dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
 478
 479
 480 .. opcode:: SIN - Sine
 481
 482 This instruction replicates its result.
 483
 484 .. math::
 485
 486   dst = \sin{src.x}
 487
 488
 489 .. opcode:: SLE - Set On Less Equal Than
 490
 491 .. math::
 492
 493   dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
 494
 495   dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
 496
 497   dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
 498
 499   dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
 500
 501
 502 .. opcode:: SNE - Set On Not Equal
 503
 504 .. math::
 505
 506   dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
 507
 508   dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
 509
 510   dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
 511
 512   dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
 513
 514
 515 .. opcode:: TEX - Texture Lookup
 516
 517   for array textures *src0.y* contains the slice for 1D,
 518   and *src0.z* contain the slice for 2D.
 519
 520   for shadow textures with no arrays (and not cube map),
 521   *src0.z* contains the reference value.
 522
 523   for shadow textures with arrays, *src0.z* contains
 524   the reference value for 1D arrays, and *src0.w* contains
 525   the reference value for 2D arrays and cube maps.
 526
 527   for cube map array shadow textures, the reference value
 528   cannot be passed in *src0.w*, and TEX2 must be used instead.
 529
 530 .. math::
 531
 532   coord = src0
 533
 534   shadow_ref = src0.z or src0.w (optional)
 535
 536   unit = src1
 537
 538   dst = texture\_sample(unit, coord, shadow_ref)
 539
 540
 541 .. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
 542
 543   this is the same as TEX, but uses another reg to encode the
 544   reference value.
 545
 546 .. math::
 547
 548   coord = src0
 549
 550   shadow_ref = src1.x
 551
 552   unit = src2
 553
 554   dst = texture\_sample(unit, coord, shadow_ref)
 555
 556
 557
 558
 559 .. opcode:: TXD - Texture Lookup with Derivatives
 560
 561 .. math::
 562
 563   coord = src0
 564
 565   ddx = src1
 566
 567   ddy = src2
 568
 569   unit = src3
 570
 571   dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
 572
 573
 574 .. opcode:: TXP - Projective Texture Lookup
 575
 576 .. math::
 577
 578   coord.x = src0.x / src0.w
 579
 580   coord.y = src0.y / src0.w
 581
 582   coord.z = src0.z / src0.w
 583
 584   coord.w = src0.w
 585
 586   unit = src1
 587
 588   dst = texture\_sample(unit, coord)
 589
 590
 591 .. opcode:: UP2H - Unpack Two 16-Bit Floats
 592
 593 .. math::
 594
 595   dst.x = f16\_to\_f32(src0.x \& 0xffff)
 596
 597   dst.y = f16\_to\_f32(src0.x >> 16)
 598
 599   dst.z = f16\_to\_f32(src0.x \& 0xffff)
 600
 601   dst.w = f16\_to\_f32(src0.x >> 16)
 602
 603 .. note::
 604
 605    Considered for removal.
 606
 607 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
 608
 609   TBD
 610
 611 .. note::
 612
 613    Considered for removal.
 614
 615 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
 616
 617   TBD
 618
 619 .. note::
 620
 621    Considered for removal.
 622
 623 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
 624
 625   TBD
 626
 627 .. note::
 628
 629    Considered for removal.
 630
 631
 632 .. opcode:: ARR - Address Register Load With Round
 633
 634 .. math::
 635
 636   dst.x = (int) round(src.x)
 637
 638   dst.y = (int) round(src.y)
 639
 640   dst.z = (int) round(src.z)
 641
 642   dst.w = (int) round(src.w)
 643
 644
 645 .. opcode:: SSG - Set Sign
 646
 647 .. math::
 648
 649   dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
 650
 651   dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
 652
 653   dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
 654
 655   dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
 656
 657
 658 .. opcode:: CMP - Compare
 659
 660 .. math::
 661
 662   dst.x = (src0.x < 0) ? src1.x : src2.x
 663
 664   dst.y = (src0.y < 0) ? src1.y : src2.y
 665
 666   dst.z = (src0.z < 0) ? src1.z : src2.z
 667
 668   dst.w = (src0.w < 0) ? src1.w : src2.w
 669
 670
 671 .. opcode:: KILL_IF - Conditional Discard
 672
 673   Conditional discard.  Allowed in fragment shaders only.
 674
 675 .. math::
 676
 677   if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
 678     discard
 679   endif
 680
 681
 682 .. opcode:: KILL - Discard
 683
 684   Unconditional discard.  Allowed in fragment shaders only.
 685
 686
 687 .. opcode:: DEMOTE - Demote Invocation to a Helper
 688
 689   This demotes the current invocation to a helper, but continues
 690   execution (while KILL may or may not terminate the
 691   invocation). After this runs, all the usual helper invocation rules
 692   apply about discarding buffer and render target writes. This is
 693   useful for having accurate derivatives in the other invocations
 694   which have not been demoted.
 695
 696   Allowed in fragment shaders only.
 697
 698
 699 .. opcode:: READ_HELPER - Reads Invocation Helper Status
 700
 701   This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except
 702   this will read the current value, which might change as a result of
 703   a ``DEMOTE`` instruction.
 704
 705   Allowed in fragment shaders only.
 706
 707
 708 .. opcode:: TXB - Texture Lookup With Bias
 709
 710   for cube map array textures and shadow cube maps, the bias value
 711   cannot be passed in *src0.w*, and TXB2 must be used instead.
 712
 713   if the target is a shadow texture, the reference value is always
 714   in *src.z* (this prevents shadow 3d and shadow 2d arrays from
 715   using this instruction, but this is not needed).
 716
 717 .. math::
 718
 719   coord.x = src0.x
 720
 721   coord.y = src0.y
 722
 723   coord.z = src0.z
 724
 725   coord.w = none
 726
 727   bias = src0.w
 728
 729   unit = src1
 730
 731   dst = texture\_sample(unit, coord, bias)
 732
 733
 734 .. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
 735
 736   this is the same as TXB, but uses another reg to encode the
 737   LOD bias value for cube map arrays and shadow cube maps.
 738   Presumably shadow 2d arrays and shadow 3d targets could use
 739   this encoding too, but this is not legal.
 740
 741   if the target is a shadow cube map array, the reference value is in
 742   *src1.y*.
 743
 744 .. math::
 745
 746   coord = src0
 747
 748   bias = src1.x
 749
 750   unit = src2
 751
 752   dst = texture\_sample(unit, coord, bias)
 753
 754
 755 .. opcode:: DIV - Divide
 756
 757 .. math::
 758
 759   dst.x = \frac{src0.x}{src1.x}
 760
 761   dst.y = \frac{src0.y}{src1.y}
 762
 763   dst.z = \frac{src0.z}{src1.z}
 764
 765   dst.w = \frac{src0.w}{src1.w}
 766
 767
 768 .. opcode:: DP2 - 2-component Dot Product
 769
 770 This instruction replicates its result.
 771
 772 .. math::
 773
 774   dst = src0.x \times src1.x + src0.y \times src1.y
 775
 776
 777 .. opcode:: TEX_LZ - Texture Lookup With LOD = 0
 778
 779   This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
 780   pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
 781   There is no way to override those two in shaders.
 782
 783 .. math::
 784
 785   coord.x = src0.x
 786
 787   coord.y = src0.y
 788
 789   coord.z = src0.z
 790
 791   coord.w = none
 792
 793   lod = 0
 794
 795   unit = src1
 796
 797   dst = texture\_sample(unit, coord, lod)
 798
 799
 800 .. opcode:: TXL - Texture Lookup With explicit LOD
 801
 802   for cube map array textures, the explicit LOD value
 803   cannot be passed in *src0.w*, and TXL2 must be used instead.
 804
 805   if the target is a shadow texture, the reference value is always
 806   in *src.z* (this prevents shadow 3d / 2d array / cube targets from
 807   using this instruction, but this is not needed).
 808
 809 .. math::
 810
 811   coord.x = src0.x
 812
 813   coord.y = src0.y
 814
 815   coord.z = src0.z
 816
 817   coord.w = none
 818
 819   lod = src0.w
 820
 821   unit = src1
 822
 823   dst = texture\_sample(unit, coord, lod)
 824
 825
 826 .. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
 827
 828   this is the same as TXL, but uses another reg to encode the
 829   explicit LOD value.
 830   Presumably shadow 3d / 2d array / cube targets could use
 831   this encoding too, but this is not legal.
 832
 833   if the target is a shadow cube map array, the reference value is in
 834   *src1.y*.
 835
 836 .. math::
 837
 838   coord = src0
 839
 840   lod = src1.x
 841
 842   unit = src2
 843
 844   dst = texture\_sample(unit, coord, lod)
 845
 846
 847 Compute ISA
 848 ^^^^^^^^^^^^^^^^^^^^^^^^
 849
 850 These opcodes are primarily provided for special-use computational shaders.
 851 Support for these opcodes indicated by a special pipe capability bit (TBD).
 852
 853 XXX doesn't look like most of the opcodes really belong here.
 854
 855 .. opcode:: CEIL - Ceiling
 856
 857 .. math::
 858
 859   dst.x = \lceil src.x\rceil
 860
 861   dst.y = \lceil src.y\rceil
 862
 863   dst.z = \lceil src.z\rceil
 864
 865   dst.w = \lceil src.w\rceil
 866
 867
 868 .. opcode:: TRUNC - Truncate
 869
 870 .. math::
 871
 872   dst.x = trunc(src.x)
 873
 874   dst.y = trunc(src.y)
 875
 876   dst.z = trunc(src.z)
 877
 878   dst.w = trunc(src.w)
 879
 880
 881 .. opcode:: MOD - Modulus
 882
 883 .. math::
 884
 885   dst.x = src0.x \bmod src1.x
 886
 887   dst.y = src0.y \bmod src1.y
 888
 889   dst.z = src0.z \bmod src1.z
 890
 891   dst.w = src0.w \bmod src1.w
 892
 893
 894 .. opcode:: UARL - Integer Address Register Load
 895
 896   Moves the contents of the source register, assumed to be an integer, into the
 897   destination register, which is assumed to be an address (ADDR) register.
 898
 899
 900 .. opcode:: TXF - Texel Fetch
 901
 902   As per NV_gpu_shader4, extract a single texel from a specified texture
 903   image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
 904   SHADOW.  *src0* is a
 905   four-component signed integer vector used to identify the single texel
 906   accessed. 3 components + level.  If the texture is multisampled, then
 907   the fourth component indicates the sample, not the mipmap level.
 908   Just like texture instructions, an optional
 909   offset vector is provided, which is subject to various driver restrictions
 910   (regarding range, source of offsets). This instruction ignores the sampler
 911   state.
 912
 913   TXF(uint_vec coord, int_vec offset).
 914
 915
 916 .. opcode:: TXQ - Texture Size Query
 917
 918   As per NV_gpu_program4, retrieve the dimensions of the texture depending on
 919   the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
 920   depth), 1D array (width, layers), 2D array (width, height, layers).
 921   Also return the number of accessible levels (last_level - first_level + 1)
 922   in W.
 923
 924   For components which don't return a resource dimension, their value
 925   is undefined.
 926
 927 .. math::
 928
 929   lod = src0.x
 930
 931   dst.x = texture\_width(unit, lod)
 932
 933   dst.y = texture\_height(unit, lod)
 934
 935   dst.z = texture\_depth(unit, lod)
 936
 937   dst.w = texture\_levels(unit)
 938
 939
 940 .. opcode:: TXQS - Texture Samples Query
 941
 942   This retrieves the number of samples in the texture, and stores it
 943   into the x component as an unsigned integer. The other components are
 944   undefined.  If the texture is not multisampled, this function returns
 945   (1, undef, undef, undef).
 946
 947 .. math::
 948
 949   dst.x = texture\_samples(unit)
 950
 951
 952 .. opcode:: TG4 - Texture Gather
 953
 954   As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
 955   filtering operation and packs them into a single register.  Only works with
 956   2D, 2D array, cubemaps, and cubemaps arrays.  For 2D textures, only the
 957   addressing modes of the sampler and the top level of any mip pyramid are
 958   used. Set W to zero.  It behaves like the TEX instruction, but a filtered
 959   sample is not generated. The four samples that contribute to filtering are
 960   placed into XYZW in clockwise order, starting with the (u,v) texture
 961   coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
 962   where the magnitude of the deltas are half a texel.
 963
 964   PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
 965   depth compares, single component selection, and a non-constant offset. It
 966   doesn't allow support for the GL independent offset to get i0,j0. This would
 967   require another CAP is HW can do it natively. For now we lower that before
 968   TGSI.
 969
 970   PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component
 971   is stored in the sampler source swizzle x.
 972
 973 .. math::
 974
 975    coord = src0
 976
 977    (without TGSI_TG4_COMPONENT_IN_SWIZZLE)
 978    component = src1
 979
 980    dst = texture\_gather4 (unit, coord, component)
 981
 982    (with TGSI_TG4_COMPONENT_IN_SWIZZLE)
 983    dst = texture\_gather4 (unit, coord)
 984    component is encoded in sampler swizzle.
 985
 986 (with SM5 - cube array shadow)
 987
 988 .. math::
 989
 990    coord = src0
 991
 992    compare = src1
 993
 994    dst = texture\_gather (uint, coord, compare)
 995
 996 .. opcode:: LODQ - level of detail query
 997
 998    Compute the LOD information that the texture pipe would use to access the
 999    texture. The Y component contains the computed LOD lambda_prime. The X
1000    component contains the LOD that will be accessed, based on min/max LODs
1001    and mipmap filters.
1002
1003 .. math::
1004
1005    coord = src0
1006
1007    dst.xy = lodq(uint, coord);
1008
1009 .. opcode:: CLOCK - retrieve the current shader time
1010
1011    Invoking this instruction multiple times in the same shader should
1012    cause monotonically increasing values to be returned. The values
1013    are implicitly 64-bit, so if fewer than 64 bits of precision are
1014    available, to provide expected wraparound semantics, the value
1015    should be shifted up so that the most significant bit of the time
1016    is the most significant bit of the 64-bit value.
1017
1018 .. math::
1019
1020    dst.xy = clock()
1021
1022
1023 Integer ISA
1024 ^^^^^^^^^^^^^^^^^^^^^^^^
1025 These opcodes are used for integer operations.
1026 Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1027
1028
1029 .. opcode:: I2F - Signed Integer To Float
1030
1031    Rounding is unspecified (round to nearest even suggested).
1032
1033 .. math::
1034
1035   dst.x = (float) src.x
1036
1037   dst.y = (float) src.y
1038
1039   dst.z = (float) src.z
1040
1041   dst.w = (float) src.w
1042
1043
1044 .. opcode:: U2F - Unsigned Integer To Float
1045
1046    Rounding is unspecified (round to nearest even suggested).
1047
1048 .. math::
1049
1050   dst.x = (float) src.x
1051
1052   dst.y = (float) src.y
1053
1054   dst.z = (float) src.z
1055
1056   dst.w = (float) src.w
1057
1058
1059 .. opcode:: F2I - Float to Signed Integer
1060
1061    Rounding is towards zero (truncate).
1062    Values outside signed range (including NaNs) produce undefined results.
1063
1064 .. math::
1065
1066   dst.x = (int) src.x
1067
1068   dst.y = (int) src.y
1069
1070   dst.z = (int) src.z
1071
1072   dst.w = (int) src.w
1073
1074
1075 .. opcode:: F2U - Float to Unsigned Integer
1076
1077    Rounding is towards zero (truncate).
1078    Values outside unsigned range (including NaNs) produce undefined results.
1079
1080 .. math::
1081
1082   dst.x = (unsigned) src.x
1083
1084   dst.y = (unsigned) src.y
1085
1086   dst.z = (unsigned) src.z
1087
1088   dst.w = (unsigned) src.w
1089
1090
1091 .. opcode:: UADD - Integer Add
1092
1093    This instruction works the same for signed and unsigned integers.
1094    The low 32bit of the result is returned.
1095
1096 .. math::
1097
1098   dst.x = src0.x + src1.x
1099
1100   dst.y = src0.y + src1.y
1101
1102   dst.z = src0.z + src1.z
1103
1104   dst.w = src0.w + src1.w
1105
1106
1107 .. opcode:: UMAD - Integer Multiply And Add
1108
1109    This instruction works the same for signed and unsigned integers.
1110    The multiplication returns the low 32bit (as does the result itself).
1111
1112 .. math::
1113
1114   dst.x = src0.x \times src1.x + src2.x
1115
1116   dst.y = src0.y \times src1.y + src2.y
1117
1118   dst.z = src0.z \times src1.z + src2.z
1119
1120   dst.w = src0.w \times src1.w + src2.w
1121
1122
1123 .. opcode:: UMUL - Integer Multiply
1124
1125    This instruction works the same for signed and unsigned integers.
1126    The low 32bit of the result is returned.
1127
1128 .. math::
1129
1130   dst.x = src0.x \times src1.x
1131
1132   dst.y = src0.y \times src1.y
1133
1134   dst.z = src0.z \times src1.z
1135
1136   dst.w = src0.w \times src1.w
1137
1138
1139 .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1140
1141    The high 32bits of the multiplication of 2 signed integers are returned.
1142
1143 .. math::
1144
1145   dst.x = (src0.x \times src1.x) >> 32
1146
1147   dst.y = (src0.y \times src1.y) >> 32
1148
1149   dst.z = (src0.z \times src1.z) >> 32
1150
1151   dst.w = (src0.w \times src1.w) >> 32
1152
1153
1154 .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1155
1156    The high 32bits of the multiplication of 2 unsigned integers are returned.
1157
1158 .. math::
1159
1160   dst.x = (src0.x \times src1.x) >> 32
1161
1162   dst.y = (src0.y \times src1.y) >> 32
1163
1164   dst.z = (src0.z \times src1.z) >> 32
1165
1166   dst.w = (src0.w \times src1.w) >> 32
1167
1168
1169 .. opcode:: IDIV - Signed Integer Division
1170
1171    TBD: behavior for division by zero.
1172
1173 .. math::
1174
1175   dst.x = \frac{src0.x}{src1.x}
1176
1177   dst.y = \frac{src0.y}{src1.y}
1178
1179   dst.z = \frac{src0.z}{src1.z}
1180
1181   dst.w = \frac{src0.w}{src1.w}
1182
1183
1184 .. opcode:: UDIV - Unsigned Integer Division
1185
1186    For division by zero, 0xffffffff is returned.
1187
1188 .. math::
1189
1190   dst.x = \frac{src0.x}{src1.x}
1191
1192   dst.y = \frac{src0.y}{src1.y}
1193
1194   dst.z = \frac{src0.z}{src1.z}
1195
1196   dst.w = \frac{src0.w}{src1.w}
1197
1198
1199 .. opcode:: UMOD - Unsigned Integer Remainder
1200
1201    If *src1* is zero, 0xffffffff is returned.
1202
1203 .. math::
1204
1205   dst.x = src0.x \bmod src1.x
1206
1207   dst.y = src0.y \bmod src1.y
1208
1209   dst.z = src0.z \bmod src1.z
1210
1211   dst.w = src0.w \bmod src1.w
1212
1213
1214 .. opcode:: NOT - Bitwise Not
1215
1216 .. math::
1217
1218   dst.x = \sim src.x
1219
1220   dst.y = \sim src.y
1221
1222   dst.z = \sim src.z
1223
1224   dst.w = \sim src.w
1225
1226
1227 .. opcode:: AND - Bitwise And
1228
1229 .. math::
1230
1231   dst.x = src0.x \& src1.x
1232
1233   dst.y = src0.y \& src1.y
1234
1235   dst.z = src0.z \& src1.z
1236
1237   dst.w = src0.w \& src1.w
1238
1239
1240 .. opcode:: OR - Bitwise Or
1241
1242 .. math::
1243
1244   dst.x = src0.x | src1.x
1245
1246   dst.y = src0.y | src1.y
1247
1248   dst.z = src0.z | src1.z
1249
1250   dst.w = src0.w | src1.w
1251
1252
1253 .. opcode:: XOR - Bitwise Xor
1254
1255 .. math::
1256
1257   dst.x = src0.x \oplus src1.x
1258
1259   dst.y = src0.y \oplus src1.y
1260
1261   dst.z = src0.z \oplus src1.z
1262
1263   dst.w = src0.w \oplus src1.w
1264
1265
1266 .. opcode:: IMAX - Maximum of Signed Integers
1267
1268 .. math::
1269
1270   dst.x = max(src0.x, src1.x)
1271
1272   dst.y = max(src0.y, src1.y)
1273
1274   dst.z = max(src0.z, src1.z)
1275
1276   dst.w = max(src0.w, src1.w)
1277
1278
1279 .. opcode:: UMAX - Maximum of Unsigned Integers
1280
1281 .. math::
1282
1283   dst.x = max(src0.x, src1.x)
1284
1285   dst.y = max(src0.y, src1.y)
1286
1287   dst.z = max(src0.z, src1.z)
1288
1289   dst.w = max(src0.w, src1.w)
1290
1291
1292 .. opcode:: IMIN - Minimum of Signed Integers
1293
1294 .. math::
1295
1296   dst.x = min(src0.x, src1.x)
1297
1298   dst.y = min(src0.y, src1.y)
1299
1300   dst.z = min(src0.z, src1.z)
1301
1302   dst.w = min(src0.w, src1.w)
1303
1304
1305 .. opcode:: UMIN - Minimum of Unsigned Integers
1306
1307 .. math::
1308
1309   dst.x = min(src0.x, src1.x)
1310
1311   dst.y = min(src0.y, src1.y)
1312
1313   dst.z = min(src0.z, src1.z)
1314
1315   dst.w = min(src0.w, src1.w)
1316
1317
1318 .. opcode:: SHL - Shift Left
1319
1320    The shift count is masked with 0x1f before the shift is applied.
1321
1322 .. math::
1323
1324   dst.x = src0.x << (0x1f \& src1.x)
1325
1326   dst.y = src0.y << (0x1f \& src1.y)
1327
1328   dst.z = src0.z << (0x1f \& src1.z)
1329
1330   dst.w = src0.w << (0x1f \& src1.w)
1331
1332
1333 .. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1334
1335    The shift count is masked with 0x1f before the shift is applied.
1336
1337 .. math::
1338
1339   dst.x = src0.x >> (0x1f \& src1.x)
1340
1341   dst.y = src0.y >> (0x1f \& src1.y)
1342
1343   dst.z = src0.z >> (0x1f \& src1.z)
1344
1345   dst.w = src0.w >> (0x1f \& src1.w)
1346
1347
1348 .. opcode:: USHR - Logical Shift Right
1349
1350    The shift count is masked with 0x1f before the shift is applied.
1351
1352 .. math::
1353
1354   dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
1355
1356   dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
1357
1358   dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
1359
1360   dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
1361
1362
1363 .. opcode:: UCMP - Integer Conditional Move
1364
1365 .. math::
1366
1367   dst.x = src0.x ? src1.x : src2.x
1368
1369   dst.y = src0.y ? src1.y : src2.y
1370
1371   dst.z = src0.z ? src1.z : src2.z
1372
1373   dst.w = src0.w ? src1.w : src2.w
1374
1375
1376
1377 .. opcode:: ISSG - Integer Set Sign
1378
1379 .. math::
1380
1381   dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1382
1383   dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1384
1385   dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1386
1387   dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1388
1389
1390
1391 .. opcode:: FSLT - Float Set On Less Than (ordered)
1392
1393    Same comparison as SLT but returns integer instead of 1.0/0.0 float
1394
1395 .. math::
1396
1397   dst.x = (src0.x < src1.x) ? \sim 0 : 0
1398
1399   dst.y = (src0.y < src1.y) ? \sim 0 : 0
1400
1401   dst.z = (src0.z < src1.z) ? \sim 0 : 0
1402
1403   dst.w = (src0.w < src1.w) ? \sim 0 : 0
1404
1405
1406 .. opcode:: ISLT - Signed Integer Set On Less Than
1407
1408 .. math::
1409
1410   dst.x = (src0.x < src1.x) ? \sim 0 : 0
1411
1412   dst.y = (src0.y < src1.y) ? \sim 0 : 0
1413
1414   dst.z = (src0.z < src1.z) ? \sim 0 : 0
1415
1416   dst.w = (src0.w < src1.w) ? \sim 0 : 0
1417
1418
1419 .. opcode:: USLT - Unsigned Integer Set On Less Than
1420
1421 .. math::
1422
1423   dst.x = (src0.x < src1.x) ? \sim 0 : 0
1424
1425   dst.y = (src0.y < src1.y) ? \sim 0 : 0
1426
1427   dst.z = (src0.z < src1.z) ? \sim 0 : 0
1428
1429   dst.w = (src0.w < src1.w) ? \sim 0 : 0
1430
1431
1432 .. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1433
1434    Same comparison as SGE but returns integer instead of 1.0/0.0 float
1435
1436 .. math::
1437
1438   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1439
1440   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1441
1442   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1443
1444   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1445
1446
1447 .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1448
1449 .. math::
1450
1451   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1452
1453   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1454
1455   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1456
1457   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1458
1459
1460 .. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1461
1462 .. math::
1463
1464   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1465
1466   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1467
1468   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1469
1470   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1471
1472
1473 .. opcode:: FSEQ - Float Set On Equal (ordered)
1474
1475    Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1476
1477 .. math::
1478
1479   dst.x = (src0.x == src1.x) ? \sim 0 : 0
1480
1481   dst.y = (src0.y == src1.y) ? \sim 0 : 0
1482
1483   dst.z = (src0.z == src1.z) ? \sim 0 : 0
1484
1485   dst.w = (src0.w == src1.w) ? \sim 0 : 0
1486
1487
1488 .. opcode:: USEQ - Integer Set On Equal
1489
1490 .. math::
1491
1492   dst.x = (src0.x == src1.x) ? \sim 0 : 0
1493
1494   dst.y = (src0.y == src1.y) ? \sim 0 : 0
1495
1496   dst.z = (src0.z == src1.z) ? \sim 0 : 0
1497
1498   dst.w = (src0.w == src1.w) ? \sim 0 : 0
1499
1500
1501 .. opcode:: FSNE - Float Set On Not Equal (unordered)
1502
1503    Same comparison as SNE but returns integer instead of 1.0/0.0 float
1504
1505 .. math::
1506
1507   dst.x = (src0.x != src1.x) ? \sim 0 : 0
1508
1509   dst.y = (src0.y != src1.y) ? \sim 0 : 0
1510
1511   dst.z = (src0.z != src1.z) ? \sim 0 : 0
1512
1513   dst.w = (src0.w != src1.w) ? \sim 0 : 0
1514
1515
1516 .. opcode:: USNE - Integer Set On Not Equal
1517
1518 .. math::
1519
1520   dst.x = (src0.x != src1.x) ? \sim 0 : 0
1521
1522   dst.y = (src0.y != src1.y) ? \sim 0 : 0
1523
1524   dst.z = (src0.z != src1.z) ? \sim 0 : 0
1525
1526   dst.w = (src0.w != src1.w) ? \sim 0 : 0
1527
1528
1529 .. opcode:: INEG - Integer Negate
1530
1531   Two's complement.
1532
1533 .. math::
1534
1535   dst.x = -src.x
1536
1537   dst.y = -src.y
1538
1539   dst.z = -src.z
1540
1541   dst.w = -src.w
1542
1543
1544 .. opcode:: IABS - Integer Absolute Value
1545
1546 .. math::
1547
1548   dst.x = |src.x|
1549
1550   dst.y = |src.y|
1551
1552   dst.z = |src.z|
1553
1554   dst.w = |src.w|
1555
1556 Bitwise ISA
1557 ^^^^^^^^^^^
1558 These opcodes are used for bit-level manipulation of integers.
1559
1560 .. opcode:: IBFE - Signed Bitfield Extract
1561
1562   Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1563   sign-extends them if the high bit of the extracted window is set.
1564
1565   Pseudocode::
1566
1567     def ibfe(value, offset, bits):
1568       if offset < 0 or bits < 0 or offset + bits > 32:
1569         return undefined
1570       if bits == 0: return 0
1571       # Note: >> sign-extends
1572       return (value << (32 - offset - bits)) >> (32 - bits)
1573
1574 .. opcode:: UBFE - Unsigned Bitfield Extract
1575
1576   Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1577   any sign-extension.
1578
1579   Pseudocode::
1580
1581     def ubfe(value, offset, bits):
1582       if offset < 0 or bits < 0 or offset + bits > 32:
1583         return undefined
1584       if bits == 0: return 0
1585       # Note: >> does not sign-extend
1586       return (value << (32 - offset - bits)) >> (32 - bits)
1587
1588 .. opcode:: BFI - Bitfield Insert
1589
1590   Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1591   of 'insert'.
1592
1593   Pseudocode::
1594
1595     def bfi(base, insert, offset, bits):
1596       if offset < 0 or bits < 0 or offset + bits > 32:
1597         return undefined
1598       # << defined such that mask == ~0 when bits == 32, offset == 0
1599       mask = ((1 << bits) - 1) << offset
1600       return ((insert << offset) & mask) | (base & ~mask)
1601
1602 .. opcode:: BREV - Bitfield Reverse
1603
1604   See SM5 instruction BFREV. Reverses the bits of the argument.
1605
1606 .. opcode:: POPC - Population Count
1607
1608   See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1609
1610 .. opcode:: LSB - Index of lowest set bit
1611
1612   See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1613   bit of the argument. Returns -1 if none are set.
1614
1615 .. opcode:: IMSB - Index of highest non-sign bit
1616
1617   See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1618   non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1619   highest 1 bit for positive numbers). Returns -1 if all bits are the same
1620   (i.e. for inputs 0 and -1).
1621
1622 .. opcode:: UMSB - Index of highest set bit
1623
1624   See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1625   set bit of the argument. Returns -1 if none are set.
1626
1627 Geometry ISA
1628 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1629
1630 These opcodes are only supported in geometry shaders; they have no meaning
1631 in any other type of shader.
1632
1633 .. opcode:: EMIT - Emit
1634
1635   Generate a new vertex for the current primitive into the specified vertex
1636   stream using the values in the output registers.
1637
1638
1639 .. opcode:: ENDPRIM - End Primitive
1640
1641   Complete the current primitive in the specified vertex stream (consisting of
1642   the emitted vertices), and start a new one.
1643
1644
1645 GLSL ISA
1646 ^^^^^^^^^^
1647
1648 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1649 opcodes is determined by a special capability bit, ``GLSL``.
1650 Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1651
1652 .. opcode:: CAL - Subroutine Call
1653
1654   push(pc)
1655   pc = target
1656
1657
1658 .. opcode:: RET - Subroutine Call Return
1659
1660   pc = pop()
1661
1662
1663 .. opcode:: CONT - Continue
1664
1665   Unconditionally moves the point of execution to the instruction after the
1666   last BGNLOOP. The instruction must appear within a BGNLOOP/ENDLOOP.
1667
1668 .. note::
1669
1670    Support for CONT is determined by a special capability bit,
1671    ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1672
1673
1674 .. opcode:: BGNLOOP - Begin a Loop
1675
1676   Start a loop. Must have a matching ENDLOOP.
1677
1678
1679 .. opcode:: BGNSUB - Begin Subroutine
1680
1681   Starts definition of a subroutine. Must have a matching ENDSUB.
1682
1683
1684 .. opcode:: ENDLOOP - End a Loop
1685
1686   End a loop started with BGNLOOP.
1687
1688
1689 .. opcode:: ENDSUB - End Subroutine
1690
1691   Ends definition of a subroutine.
1692
1693
1694 .. opcode:: NOP - No Operation
1695
1696   Do nothing.
1697
1698
1699 .. opcode:: BRK - Break
1700
1701   Unconditionally moves the point of execution to the instruction after the
1702   next ENDLOOP or ENDSWITCH. The instruction must appear within a
1703   BGNLOOP/ENDLOOP or SWITCH/ENDSWITCH.
1704
1705
1706 .. opcode:: IF - Float If
1707
1708   Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
1709
1710     *src0.x* != 0.0
1711
1712   where *src0.x* is interpreted as a floating point register.
1713
1714
1715 .. opcode:: UIF - Bitwise If
1716
1717   Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1718
1719     *src0.x* != 0
1720
1721   where *src0.x* is interpreted as an integer register.
1722
1723
1724 .. opcode:: ELSE - Else
1725
1726   Starts an else block, after an IF or UIF statement.
1727
1728
1729 .. opcode:: ENDIF - End If
1730
1731   Ends an IF or UIF block.
1732
1733
1734 .. opcode:: SWITCH - Switch
1735
1736    Starts a C-style switch expression. The switch consists of one or multiple
1737    CASE statements, and at most one DEFAULT statement. Execution of a statement
1738    ends when a BRK is hit, but just like in C falling through to other cases
1739    without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1740    just as last statement, and fallthrough is allowed into/from it.
1741    CASE *src* arguments are evaluated at bit level against the SWITCH *src* argument.
1742
1743    Example::
1744
1745      SWITCH src[0].x
1746      CASE src[0].x
1747      (some instructions here)
1748      (optional BRK here)
1749      DEFAULT
1750      (some instructions here)
1751      (optional BRK here)
1752      CASE src[0].x
1753      (some instructions here)
1754      (optional BRK here)
1755      ENDSWITCH
1756
1757
1758 .. opcode:: CASE - Switch case
1759
1760    This represents a switch case label. The *src* arg must be an integer immediate.
1761
1762
1763 .. opcode:: DEFAULT - Switch default
1764
1765    This represents the default case in the switch, which is taken if no other
1766    case matches.
1767
1768
1769 .. opcode:: ENDSWITCH - End of switch
1770
1771    Ends a switch expression.
1772
1773
1774 Interpolation ISA
1775 ^^^^^^^^^^^^^^^^^
1776
1777 The interpolation instructions allow an input to be interpolated in a
1778 different way than its declaration. This corresponds to the GLSL 4.00
1779 interpolateAt* functions. The first argument of each of these must come from
1780 ``TGSI_FILE_INPUT``.
1781
1782 .. opcode:: INTERP_CENTROID - Interpolate at the centroid
1783
1784    Interpolates the varying specified by *src0* at the centroid
1785
1786 .. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1787
1788    Interpolates the varying specified by *src0* at the sample id
1789    specified by *src1.x* (interpreted as an integer)
1790
1791 .. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1792
1793    Interpolates the varying specified by *src0* at the offset *src1.xy*
1794    from the pixel center (interpreted as floats)
1795
1796
1797 .. _doubleopcodes:
1798
1799 Double ISA
1800 ^^^^^^^^^^^^^^^
1801
1802 The double-precision opcodes reinterpret four-component vectors into
1803 two-component vectors with doubled precision in each component.
1804
1805 .. opcode:: DABS - Absolute
1806
1807 .. math::
1808
1809   dst.xy = |src0.xy|
1810
1811   dst.zw = |src0.zw|
1812
1813 .. opcode:: DADD - Add
1814
1815 .. math::
1816
1817   dst.xy = src0.xy + src1.xy
1818
1819   dst.zw = src0.zw + src1.zw
1820
1821 .. opcode:: DSEQ - Set on Equal
1822
1823 .. math::
1824
1825   dst.x = src0.xy == src1.xy ? \sim 0 : 0
1826
1827   dst.z = src0.zw == src1.zw ? \sim 0 : 0
1828
1829 .. opcode:: DSNE - Set on Not Equal
1830
1831 .. math::
1832
1833   dst.x = src0.xy != src1.xy ? \sim 0 : 0
1834
1835   dst.z = src0.zw != src1.zw ? \sim 0 : 0
1836
1837 .. opcode:: DSLT - Set on Less than
1838
1839 .. math::
1840
1841   dst.x = src0.xy < src1.xy ? \sim 0 : 0
1842
1843   dst.z = src0.zw < src1.zw ? \sim 0 : 0
1844
1845 .. opcode:: DSGE - Set on Greater equal
1846
1847 .. math::
1848
1849   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1850
1851   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1852
1853 .. opcode:: DFRAC - Fraction
1854
1855 .. math::
1856
1857   dst.xy = src.xy - \lfloor src.xy\rfloor
1858
1859   dst.zw = src.zw - \lfloor src.zw\rfloor
1860
1861 .. opcode:: DTRUNC - Truncate
1862
1863 .. math::
1864
1865   dst.xy = trunc(src.xy)
1866
1867   dst.zw = trunc(src.zw)
1868
1869 .. opcode:: DCEIL - Ceiling
1870
1871 .. math::
1872
1873   dst.xy = \lceil src.xy\rceil
1874
1875   dst.zw = \lceil src.zw\rceil
1876
1877 .. opcode:: DFLR - Floor
1878
1879 .. math::
1880
1881   dst.xy = \lfloor src.xy\rfloor
1882
1883   dst.zw = \lfloor src.zw\rfloor
1884
1885 .. opcode:: DROUND - Fraction
1886
1887 .. math::
1888
1889   dst.xy = round(src.xy)
1890
1891   dst.zw = round(src.zw)
1892
1893 .. opcode:: DSSG - Set Sign
1894
1895 .. math::
1896
1897   dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1898
1899   dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1900
1901 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1902
1903 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1904 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1905 :math:`dst1 \times 2^{dst0} = src` . The results are replicated across
1906 channels.
1907
1908 .. math::
1909
1910   dst0.xy = dst.zw = frac(src.xy)
1911
1912   dst1 = frac(src.xy)
1913
1914
1915 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1916
1917 This opcode is the inverse of :opcode:`DFRACEXP`. The second
1918 source is an integer.
1919
1920 .. math::
1921
1922   dst.xy = src0.xy \times 2^{src1.x}
1923
1924   dst.zw = src0.zw \times 2^{src1.z}
1925
1926 .. opcode:: DMIN - Minimum
1927
1928 .. math::
1929
1930   dst.xy = min(src0.xy, src1.xy)
1931
1932   dst.zw = min(src0.zw, src1.zw)
1933
1934 .. opcode:: DMAX - Maximum
1935
1936 .. math::
1937
1938   dst.xy = max(src0.xy, src1.xy)
1939
1940   dst.zw = max(src0.zw, src1.zw)
1941
1942 .. opcode:: DMUL - Multiply
1943
1944 .. math::
1945
1946   dst.xy = src0.xy \times src1.xy
1947
1948   dst.zw = src0.zw \times src1.zw
1949
1950
1951 .. opcode:: DMAD - Multiply And Add
1952
1953 .. math::
1954
1955   dst.xy = src0.xy \times src1.xy + src2.xy
1956
1957   dst.zw = src0.zw \times src1.zw + src2.zw
1958
1959
1960 .. opcode:: DFMA - Fused Multiply-Add
1961
1962 Perform a * b + c with no intermediate rounding step.
1963
1964 .. math::
1965
1966   dst.xy = src0.xy \times src1.xy + src2.xy
1967
1968   dst.zw = src0.zw \times src1.zw + src2.zw
1969
1970
1971 .. opcode:: DDIV - Divide
1972
1973 .. math::
1974
1975   dst.xy = \frac{src0.xy}{src1.xy}
1976
1977   dst.zw = \frac{src0.zw}{src1.zw}
1978
1979
1980 .. opcode:: DRCP - Reciprocal
1981
1982 .. math::
1983
1984    dst.xy = \frac{1}{src.xy}
1985
1986    dst.zw = \frac{1}{src.zw}
1987
1988 .. opcode:: DSQRT - Square Root
1989
1990 .. math::
1991
1992    dst.xy = \sqrt{src.xy}
1993
1994    dst.zw = \sqrt{src.zw}
1995
1996 .. opcode:: DRSQ - Reciprocal Square Root
1997
1998 .. math::
1999
2000    dst.xy = \frac{1}{\sqrt{src.xy}}
2001
2002    dst.zw = \frac{1}{\sqrt{src.zw}}
2003
2004 .. opcode:: F2D - Float to Double
2005
2006 .. math::
2007
2008    dst.xy = double(src0.x)
2009
2010    dst.zw = double(src0.y)
2011
2012 .. opcode:: D2F - Double to Float
2013
2014 .. math::
2015
2016    dst.x = float(src0.xy)
2017
2018    dst.y = float(src0.zw)
2019
2020 .. opcode:: I2D - Int to Double
2021
2022 .. math::
2023
2024    dst.xy = double(src0.x)
2025
2026    dst.zw = double(src0.y)
2027
2028 .. opcode:: D2I - Double to Int
2029
2030 .. math::
2031
2032    dst.x = int(src0.xy)
2033
2034    dst.y = int(src0.zw)
2035
2036 .. opcode:: U2D - Unsigned Int to Double
2037
2038 .. math::
2039
2040    dst.xy = double(src0.x)
2041
2042    dst.zw = double(src0.y)
2043
2044 .. opcode:: D2U - Double to Unsigned Int
2045
2046 .. math::
2047
2048    dst.x = unsigned(src0.xy)
2049
2050    dst.y = unsigned(src0.zw)
2051
2052 64-bit Integer ISA
2053 ^^^^^^^^^^^^^^^^^^
2054
2055 The 64-bit integer opcodes reinterpret four-component vectors into
2056 two-component vectors with 64-bits in each component.
2057
2058 .. opcode:: I64ABS - 64-bit Integer Absolute Value
2059
2060 .. math::
2061
2062   dst.xy = |src0.xy|
2063
2064   dst.zw = |src0.zw|
2065
2066 .. opcode:: I64NEG - 64-bit Integer Negate
2067
2068   Two's complement.
2069
2070 .. math::
2071
2072   dst.xy = -src.xy
2073
2074   dst.zw = -src.zw
2075
2076 .. opcode:: I64SSG - 64-bit Integer Set Sign
2077
2078 .. math::
2079
2080   dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2081
2082   dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2083
2084 .. opcode:: U64ADD - 64-bit Integer Add
2085
2086 .. math::
2087
2088   dst.xy = src0.xy + src1.xy
2089
2090   dst.zw = src0.zw + src1.zw
2091
2092 .. opcode:: U64MUL - 64-bit Integer Multiply
2093
2094 .. math::
2095
2096   dst.xy = src0.xy * src1.xy
2097
2098   dst.zw = src0.zw * src1.zw
2099
2100 .. opcode:: U64SEQ - 64-bit Integer Set on Equal
2101
2102 .. math::
2103
2104   dst.x = src0.xy == src1.xy ? \sim 0 : 0
2105
2106   dst.z = src0.zw == src1.zw ? \sim 0 : 0
2107
2108 .. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2109
2110 .. math::
2111
2112   dst.x = src0.xy != src1.xy ? \sim 0 : 0
2113
2114   dst.z = src0.zw != src1.zw ? \sim 0 : 0
2115
2116 .. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2117
2118 .. math::
2119
2120   dst.x = src0.xy < src1.xy ? \sim 0 : 0
2121
2122   dst.z = src0.zw < src1.zw ? \sim 0 : 0
2123
2124 .. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2125
2126 .. math::
2127
2128   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2129
2130   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2131
2132 .. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2133
2134 .. math::
2135
2136   dst.x = src0.xy < src1.xy ? \sim 0 : 0
2137
2138   dst.z = src0.zw < src1.zw ? \sim 0 : 0
2139
2140 .. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2141
2142 .. math::
2143
2144   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2145
2146   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2147
2148 .. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2149
2150 .. math::
2151
2152   dst.xy = min(src0.xy, src1.xy)
2153
2154   dst.zw = min(src0.zw, src1.zw)
2155
2156 .. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2157
2158 .. math::
2159
2160   dst.xy = min(src0.xy, src1.xy)
2161
2162   dst.zw = min(src0.zw, src1.zw)
2163
2164 .. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2165
2166 .. math::
2167
2168   dst.xy = max(src0.xy, src1.xy)
2169
2170   dst.zw = max(src0.zw, src1.zw)
2171
2172 .. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2173
2174 .. math::
2175
2176   dst.xy = max(src0.xy, src1.xy)
2177
2178   dst.zw = max(src0.zw, src1.zw)
2179
2180 .. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2181
2182    The shift count is masked with 0x3f before the shift is applied.
2183
2184 .. math::
2185
2186   dst.xy = src0.xy << (0x3f \& src1.x)
2187
2188   dst.zw = src0.zw << (0x3f \& src1.y)
2189
2190 .. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2191
2192    The shift count is masked with 0x3f before the shift is applied.
2193
2194 .. math::
2195
2196   dst.xy = src0.xy >> (0x3f \& src1.x)
2197
2198   dst.zw = src0.zw >> (0x3f \& src1.y)
2199
2200 .. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2201
2202    The shift count is masked with 0x3f before the shift is applied.
2203
2204 .. math::
2205
2206   dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
2207
2208   dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
2209
2210 .. opcode:: I64DIV - 64-bit Signed Integer Division
2211
2212 .. math::
2213
2214   dst.xy = \frac{src0.xy}{src1.xy}
2215
2216   dst.zw = \frac{src0.zw}{src1.zw}
2217
2218 .. opcode:: U64DIV - 64-bit Unsigned Integer Division
2219
2220 .. math::
2221
2222   dst.xy = \frac{src0.xy}{src1.xy}
2223
2224   dst.zw = \frac{src0.zw}{src1.zw}
2225
2226 .. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2227
2228 .. math::
2229
2230   dst.xy = src0.xy \bmod src1.xy
2231
2232   dst.zw = src0.zw \bmod src1.zw
2233
2234 .. opcode:: I64MOD - 64-bit Signed Integer Remainder
2235
2236 .. math::
2237
2238   dst.xy = src0.xy \bmod src1.xy
2239
2240   dst.zw = src0.zw \bmod src1.zw
2241
2242 .. opcode:: F2U64 - Float to 64-bit Unsigned Int
2243
2244 .. math::
2245
2246    dst.xy = (uint64_t) src0.x
2247
2248    dst.zw = (uint64_t) src0.y
2249
2250 .. opcode:: F2I64 - Float to 64-bit Int
2251
2252 .. math::
2253
2254    dst.xy = (int64_t) src0.x
2255
2256    dst.zw = (int64_t) src0.y
2257
2258 .. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2259
2260    This is a zero extension.
2261
2262 .. math::
2263
2264    dst.xy = (int64_t) src0.x
2265
2266    dst.zw = (int64_t) src0.y
2267
2268 .. opcode:: I2I64 - Signed Integer to 64-bit Integer
2269
2270    This is a sign extension.
2271
2272 .. math::
2273
2274    dst.xy = (int64_t) src0.x
2275
2276    dst.zw = (int64_t) src0.y
2277
2278 .. opcode:: D2U64 - Double to 64-bit Unsigned Int
2279
2280 .. math::
2281
2282    dst.xy = (uint64_t) src0.xy
2283
2284    dst.zw = (uint64_t) src0.zw
2285
2286 .. opcode:: D2I64 - Double to 64-bit Int
2287
2288 .. math::
2289
2290    dst.xy = (int64_t) src0.xy
2291
2292    dst.zw = (int64_t) src0.zw
2293
2294 .. opcode:: U642F - 64-bit unsigned integer to float
2295
2296 .. math::
2297
2298    dst.x = (float) src0.xy
2299
2300    dst.y = (float) src0.zw
2301
2302 .. opcode:: I642F - 64-bit Int to Float
2303
2304 .. math::
2305
2306    dst.x = (float) src0.xy
2307
2308    dst.y = (float) src0.zw
2309
2310 .. opcode:: U642D - 64-bit unsigned integer to double
2311
2312 .. math::
2313
2314    dst.xy = (double) src0.xy
2315
2316    dst.zw = (double) src0.zw
2317
2318 .. opcode:: I642D - 64-bit Int to double
2319
2320 .. math::
2321
2322    dst.xy = (double) src0.xy
2323
2324    dst.zw = (double) src0.zw
2325
2326 .. _samplingopcodes:
2327
2328 Resource Sampling Opcodes
2329 ^^^^^^^^^^^^^^^^^^^^^^^^^
2330
2331 Those opcodes follow very closely semantics of the respective Direct3D
2332 instructions. If in doubt double check Direct3D documentation.
2333 Note that the swizzle on SVIEW (src1) determines texel swizzling
2334 after lookup.
2335
2336 .. opcode:: SAMPLE
2337
2338   Using provided address, sample data from the specified texture using the
2339   filtering mode identified by the given sampler. The source data may come from
2340   any resource type other than buffers.
2341
2342   Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2343
2344   Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2345
2346 .. opcode:: SAMPLE_I
2347
2348   Simplified alternative to the SAMPLE instruction.  Using the provided
2349   integer address, SAMPLE_I fetches data from the specified sampler view
2350   without any filtering.  The source data may come from any resource type
2351   other than CUBE.
2352
2353   Syntax: ``SAMPLE_I dst, address, sampler_view``
2354
2355   Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2356
2357   The 'address' is specified as unsigned integers. If the 'address' is out of
2358   range [0...(# texels - 1)] the result of the fetch is always 0 in all
2359   components.  As such the instruction doesn't honor address wrap modes, in
2360   cases where that behavior is desirable 'SAMPLE' instruction should be used.
2361   address.w always provides an unsigned integer mipmap level. If the value is
2362   out of the range then the instruction always returns 0 in all components.
2363   address.yz are ignored for buffers and 1d textures.  address.z is ignored
2364   for 1d texture arrays and 2d textures.
2365
2366   For 1D texture arrays address.y provides the array index (also as unsigned
2367   integer). If the value is out of the range of available array indices
2368   [0... (array size - 1)] then the opcode always returns 0 in all components.
2369   For 2D texture arrays address.z provides the array index, otherwise it
2370   exhibits the same behavior as in the case for 1D texture arrays.  The exact
2371   semantics of the source address are presented in the table below:
2372
2373   +---------------------------+----+-----+-----+---------+
2374   | resource type             | X  |  Y  |  Z  |    W    |
2375   +===========================+====+=====+=====+=========+
2376   | ``PIPE_BUFFER``           | x  |     |     | ignored |
2377   +---------------------------+----+-----+-----+---------+
2378   | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
2379   +---------------------------+----+-----+-----+---------+
2380   | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
2381   +---------------------------+----+-----+-----+---------+
2382   | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
2383   +---------------------------+----+-----+-----+---------+
2384   | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
2385   +---------------------------+----+-----+-----+---------+
2386   | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
2387   +---------------------------+----+-----+-----+---------+
2388   | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
2389   +---------------------------+----+-----+-----+---------+
2390   | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
2391   +---------------------------+----+-----+-----+---------+
2392
2393   Where 'mpl' is a mipmap level and 'idx' is the array index.
2394
2395 .. opcode:: SAMPLE_I_MS
2396
2397   Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2398
2399   Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2400
2401 .. opcode:: SAMPLE_B
2402
2403   Just like the SAMPLE instruction with the exception that an additional bias
2404   is applied to the level of detail computed as part of the instruction
2405   execution.
2406
2407   Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2408
2409   Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2410
2411 .. opcode:: SAMPLE_C
2412
2413   Similar to the SAMPLE instruction but it performs a comparison filter. The
2414   operands to SAMPLE_C are identical to SAMPLE, except that there is an
2415   additional float32 operand, reference value, which must be a register with
2416   single-component, or a scalar literal.  SAMPLE_C makes the hardware use the
2417   current samplers compare_func (in pipe_sampler_state) to compare reference
2418   value against the red component value for the surce resource at each texel
2419   that the currently configured texture filter covers based on the provided
2420   coordinates.
2421
2422   Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2423
2424   Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2425
2426 .. opcode:: SAMPLE_C_LZ
2427
2428   Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2429   for level-zero.
2430
2431   Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2432
2433   Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2434
2435
2436 .. opcode:: SAMPLE_D
2437
2438   SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2439   the source address in the x direction and the y direction are provided by
2440   extra parameters.
2441
2442   Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2443
2444   Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2445
2446 .. opcode:: SAMPLE_L
2447
2448   SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2449   directly as a scalar value, representing no anisotropy.
2450
2451   Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2452
2453   Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2454
2455 .. opcode:: GATHER4
2456
2457   Gathers the four texels to be used in a bi-linear filtering operation and
2458   packs them into a single register.  Only works with 2D, 2D array, cubemaps,
2459   and cubemaps arrays.  For 2D textures, only the addressing modes of the
2460   sampler and the top level of any mip pyramid are used. Set W to zero.  It
2461   behaves like the SAMPLE instruction, but a filtered sample is not
2462   generated. The four samples that contribute to filtering are placed into
2463   XYZW in counter-clockwise order, starting with the (u,v) texture coordinate
2464   delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2465   magnitude of the deltas are half a texel.
2466
2467
2468 .. opcode:: SVIEWINFO
2469
2470   Query the dimensions of a given sampler view.  dst receives width, height,
2471   depth or array size and number of mipmap levels as int4. The dst can have a
2472   writemask which will specify what info is the caller interested in.
2473
2474   Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2475
2476   Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2477
2478   src_mip_level is an unsigned integer scalar. If it's out of range then
2479   returns 0 for width, height and depth/array size but the total number of
2480   mipmap is still returned correctly for the given sampler view.  The returned
2481   width, height and depth values are for the mipmap level selected by the
2482   src_mip_level and are in the number of texels.  For 1d texture array width
2483   is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2484   still in dst.w.  In contrast to d3d10 resinfo, there's no way in the tgsi
2485   instruction encoding to specify the return type (float/rcpfloat/uint), hence
2486   always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2487   resinfo allowing swizzling dst values is ignored (due to the interaction
2488   with rcpfloat modifier which requires some swizzle handling in the state
2489   tracker anyway).
2490
2491 .. opcode:: SAMPLE_POS
2492
2493   Query the position of a sample in the given resource or render target
2494   when per-sample fragment shading is in effect.
2495
2496   Syntax: ``SAMPLE_POS dst, source, sample_index``
2497
2498   dst receives float4 (x, y, undef, undef) indicated where the sample is
2499   located. Sample locations are in the range [0, 1] where 0.5 is the center
2500   of the fragment.
2501
2502   source is either a sampler view (to indicate a shader resource) or temp
2503   register (to indicate the render target).  The source register may have
2504   an optional swizzle to apply to the returned result
2505
2506   sample_index is an integer scalar indicating which sample position is to
2507   be queried.
2508
2509   If per-sample shading is not in effect or the source resource or render
2510   target is not multisampled, the result is (0.5, 0.5, undef, undef).
2511
2512   NOTE: no driver has implemented this opcode yet (and no gallium frontend
2513   emits it).  This information is subject to change.
2514
2515 .. opcode:: SAMPLE_INFO
2516
2517   Query the number of samples in a multisampled resource or render target.
2518
2519   Syntax: ``SAMPLE_INFO dst, source``
2520
2521   dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2522   resource or the render target.
2523
2524   source is either a sampler view (to indicate a shader resource) or temp
2525   register (to indicate the render target).  The source register may have
2526   an optional swizzle to apply to the returned result
2527
2528   If per-sample shading is not in effect or the source resource or render
2529   target is not multisampled, the result is (1, 0, 0, 0).
2530
2531   NOTE: no driver has implemented this opcode yet (and no gallium frontend
2532   emits it).  This information is subject to change.
2533
2534 .. opcode:: LOD - level of detail
2535
2536    Same syntax as the SAMPLE opcode but instead of performing an actual
2537    texture lookup/filter, return the computed LOD information that the
2538    texture pipe would use to access the texture. The Y component contains
2539    the computed LOD lambda_prime. The X component contains the LOD that will
2540    be accessed, based on min/max lod's and mipmap filters.
2541    The Z and W components are set to 0.
2542
2543    Syntax: ``LOD dst, address, sampler_view, sampler``
2544
2545
2546 .. _resourceopcodes:
2547
2548 Resource Access Opcodes
2549 ^^^^^^^^^^^^^^^^^^^^^^^
2550
2551 For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2552
2553 .. opcode:: LOAD - Fetch data from a shader buffer or image
2554
2555                Syntax: ``LOAD dst, resource, address``
2556
2557                Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2558
2559                Using the provided integer address, LOAD fetches data
2560                from the specified buffer or texture without any
2561                filtering.
2562
2563                The 'address' is specified as a vector of unsigned
2564                integers.  If the 'address' is out of range the result
2565                is unspecified.
2566
2567                Only the first mipmap level of a resource can be read
2568                from using this instruction.
2569
2570                For 1D or 2D texture arrays, the array index is
2571                provided as an unsigned integer in address.y or
2572                address.z, respectively.  address.yz are ignored for
2573                buffers and 1D textures.  address.z is ignored for 1D
2574                texture arrays and 2D textures.  address.w is always
2575                ignored.
2576
2577                A swizzle suffix may be added to the resource argument
2578                this will cause the resource data to be swizzled accordingly.
2579
2580 .. opcode:: STORE - Write data to a shader resource
2581
2582                Syntax: ``STORE resource, address, src``
2583
2584                Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2585
2586                Using the provided integer address, STORE writes data
2587                to the specified buffer or texture.
2588
2589                The 'address' is specified as a vector of unsigned
2590                integers.  If the 'address' is out of range the result
2591                is unspecified.
2592
2593                Only the first mipmap level of a resource can be
2594                written to using this instruction.
2595
2596                For 1D or 2D texture arrays, the array index is
2597                provided as an unsigned integer in address.y or
2598                address.z, respectively.  address.yz are ignored for
2599                buffers and 1D textures.  address.z is ignored for 1D
2600                texture arrays and 2D textures.  address.w is always
2601                ignored.
2602
2603 .. opcode:: RESQ - Query information about a resource
2604
2605   Syntax: ``RESQ dst, resource``
2606
2607   Example: ``RESQ TEMP[0], BUFFER[0]``
2608
2609   Returns information about the buffer or image resource. For buffer
2610   resources, the size (in bytes) is returned in the x component. For
2611   image resources, .xyz will contain the width/height/layers of the
2612   image, while .w will contain the number of samples for multi-sampled
2613   images.
2614
2615 .. opcode:: FBFETCH - Load data from framebuffer
2616
2617   Syntax: ``FBFETCH dst, output``
2618
2619   Example: ``FBFETCH TEMP[0], OUT[0]``
2620
2621   This is only valid on ``COLOR`` semantic outputs. Returns the color
2622   of the current position in the framebuffer from before this fragment
2623   shader invocation. May return the same value from multiple calls for
2624   a particular output within a single invocation. Note that result may
2625   be undefined if a fragment is drawn multiple times without a blend
2626   barrier in between.
2627
2628
2629 .. _bindlessopcodes:
2630
2631 Bindless Opcodes
2632 ^^^^^^^^^^^^^^^^
2633
2634 These opcodes are for working with bindless sampler or image handles and
2635 require PIPE_CAP_BINDLESS_TEXTURE.
2636
2637 .. opcode:: IMG2HND - Get a bindless handle for a image
2638
2639   Syntax: ``IMG2HND dst, image``
2640
2641   Example: ``IMG2HND TEMP[0], IMAGE[0]``
2642
2643   Sets 'dst' to a bindless handle for 'image'.
2644
2645 .. opcode:: SAMP2HND - Get a bindless handle for a sampler
2646
2647   Syntax: ``SAMP2HND dst, sampler``
2648
2649   Example: ``SAMP2HND TEMP[0], SAMP[0]``
2650
2651   Sets 'dst' to a bindless handle for 'sampler'.
2652
2653
2654 .. _threadsyncopcodes:
2655
2656 Inter-thread synchronization opcodes
2657 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2658
2659 These opcodes are intended for communication between threads running
2660 within the same compute grid.  For now they're only valid in compute
2661 programs.
2662
2663 .. opcode:: BARRIER - Thread group barrier
2664
2665   ``BARRIER``
2666
2667   This opcode suspends the execution of the current thread until all
2668   the remaining threads in the working group reach the same point of
2669   the program.  Results are unspecified if any of the remaining
2670   threads terminates or never reaches an executed BARRIER instruction.
2671
2672 .. opcode:: MEMBAR - Memory barrier
2673
2674   ``MEMBAR type``
2675
2676   This opcode waits for the completion of all memory accesses based on
2677   the type passed in. The type is an immediate bitfield with the following
2678   meaning:
2679
2680   Bit 0: Shader storage buffers
2681   Bit 1: Atomic buffers
2682   Bit 2: Images
2683   Bit 3: Shared memory
2684   Bit 4: Thread group
2685
2686   These may be passed in in any combination. An implementation is free to not
2687   distinguish between these as it sees fit. However these map to all the
2688   possibilities made available by GLSL.
2689
2690 .. _atomopcodes:
2691
2692 Atomic opcodes
2693 ^^^^^^^^^^^^^^
2694
2695 These opcodes provide atomic variants of some common arithmetic and
2696 logical operations.  In this context atomicity means that another
2697 concurrent memory access operation that affects the same memory
2698 location is guaranteed to be performed strictly before or after the
2699 entire execution of the atomic operation. The resource may be a BUFFER,
2700 IMAGE, HWATOMIC, or MEMORY.  In the case of an image, the offset works
2701 the same as for ``LOAD`` and ``STORE``, specified above. For atomic
2702 counters, the offset is an immediate index to the base HW atomic
2703 counter for this operation.
2704 These atomic operations may only be used with 32-bit integer image formats.
2705
2706 .. opcode:: ATOMUADD - Atomic integer addition
2707
2708   Syntax: ``ATOMUADD dst, resource, offset, src``
2709
2710   Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2711
2712   The following operation is performed atomically:
2713
2714 .. math::
2715
2716   dst_x = resource[offset]
2717
2718   resource[offset] = dst_x + src_x
2719
2720
2721 .. opcode:: ATOMFADD - Atomic floating point addition
2722
2723   Syntax: ``ATOMFADD dst, resource, offset, src``
2724
2725   Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2726
2727   The following operation is performed atomically:
2728
2729 .. math::
2730
2731   dst_x = resource[offset]
2732
2733   resource[offset] = dst_x + src_x
2734
2735
2736 .. opcode:: ATOMXCHG - Atomic exchange
2737
2738   Syntax: ``ATOMXCHG dst, resource, offset, src``
2739
2740   Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2741
2742   The following operation is performed atomically:
2743
2744 .. math::
2745
2746   dst_x = resource[offset]
2747
2748   resource[offset] = src_x
2749
2750
2751 .. opcode:: ATOMCAS - Atomic compare-and-exchange
2752
2753   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2754
2755   Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2756
2757   The following operation is performed atomically:
2758
2759 .. math::
2760
2761   dst_x = resource[offset]
2762
2763   resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2764
2765
2766 .. opcode:: ATOMAND - Atomic bitwise And
2767
2768   Syntax: ``ATOMAND dst, resource, offset, src``
2769
2770   Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2771
2772   The following operation is performed atomically:
2773
2774 .. math::
2775
2776   dst_x = resource[offset]
2777
2778   resource[offset] = dst_x \& src_x
2779
2780
2781 .. opcode:: ATOMOR - Atomic bitwise Or
2782
2783   Syntax: ``ATOMOR dst, resource, offset, src``
2784
2785   Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2786
2787   The following operation is performed atomically:
2788
2789 .. math::
2790
2791   dst_x = resource[offset]
2792
2793   resource[offset] = dst_x | src_x
2794
2795
2796 .. opcode:: ATOMXOR - Atomic bitwise Xor
2797
2798   Syntax: ``ATOMXOR dst, resource, offset, src``
2799
2800   Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2801
2802   The following operation is performed atomically:
2803
2804 .. math::
2805
2806   dst_x = resource[offset]
2807
2808   resource[offset] = dst_x \oplus src_x
2809
2810
2811 .. opcode:: ATOMUMIN - Atomic unsigned minimum
2812
2813   Syntax: ``ATOMUMIN dst, resource, offset, src``
2814
2815   Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2816
2817   The following operation is performed atomically:
2818
2819 .. math::
2820
2821   dst_x = resource[offset]
2822
2823   resource[offset] = (dst_x < src_x ? dst_x : src_x)
2824
2825
2826 .. opcode:: ATOMUMAX - Atomic unsigned maximum
2827
2828   Syntax: ``ATOMUMAX dst, resource, offset, src``
2829
2830   Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2831
2832   The following operation is performed atomically:
2833
2834 .. math::
2835
2836   dst_x = resource[offset]
2837
2838   resource[offset] = (dst_x > src_x ? dst_x : src_x)
2839
2840
2841 .. opcode:: ATOMIMIN - Atomic signed minimum
2842
2843   Syntax: ``ATOMIMIN dst, resource, offset, src``
2844
2845   Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2846
2847   The following operation is performed atomically:
2848
2849 .. math::
2850
2851   dst_x = resource[offset]
2852
2853   resource[offset] = (dst_x < src_x ? dst_x : src_x)
2854
2855
2856 .. opcode:: ATOMIMAX - Atomic signed maximum
2857
2858   Syntax: ``ATOMIMAX dst, resource, offset, src``
2859
2860   Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2861
2862   The following operation is performed atomically:
2863
2864 .. math::
2865
2866   dst_x = resource[offset]
2867
2868   resource[offset] = (dst_x > src_x ? dst_x : src_x)
2869
2870
2871 .. opcode:: ATOMINC_WRAP - Atomic increment + wrap around
2872
2873   Syntax: ``ATOMINC_WRAP dst, resource, offset, src``
2874
2875   Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2876
2877   The following operation is performed atomically:
2878
2879 .. math::
2880
2881   dst_x = resource[offset] + 1
2882
2883   resource[offset] = dst_x <= src_x ? dst_x : 0
2884
2885
2886 .. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around
2887
2888   Syntax: ``ATOMDEC_WRAP dst, resource, offset, src``
2889
2890   Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2891
2892   The following operation is performed atomically:
2893
2894 .. math::
2895
2896   dst_x = resource[offset]
2897
2898   resource[offset] = (dst_x > 0 && dst_x < src_x) ? dst_x - 1 : 0
2899
2900
2901 .. _interlaneopcodes:
2902
2903 Inter-lane opcodes
2904 ^^^^^^^^^^^^^^^^^^
2905
2906 These opcodes reduce the given value across the shader invocations
2907 running in the current SIMD group. Every thread in the subgroup will receive
2908 the same result. The BALLOT operations accept a single-channel argument that
2909 is treated as a boolean and produce a 64-bit value.
2910
2911 .. opcode:: VOTE_ANY - Value is set in any of the active invocations
2912
2913   Syntax: ``VOTE_ANY dst, value``
2914
2915   Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2916
2917
2918 .. opcode:: VOTE_ALL - Value is set in all of the active invocations
2919
2920   Syntax: ``VOTE_ALL dst, value``
2921
2922   Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2923
2924
2925 .. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2926
2927   Syntax: ``VOTE_EQ dst, value``
2928
2929   Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2930
2931
2932 .. opcode:: BALLOT - Lanemask of whether the value is set in each active
2933             invocation
2934
2935   Syntax: ``BALLOT dst, value``
2936
2937   Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2938
2939   When the argument is a constant true, this produces a bitmask of active
2940   invocations. In fragment shaders, this can include helper invocations
2941   (invocations whose outputs and writes to memory are discarded, but which
2942   are used to compute derivatives).
2943
2944
2945 .. opcode:: READ_FIRST - Broadcast the value from the first active
2946             invocation to all active lanes
2947
2948   Syntax: ``READ_FIRST dst, value``
2949
2950   Example: ``READ_FIRST TEMP[0], TEMP[1]``
2951
2952
2953 .. opcode:: READ_INVOC - Retrieve the value from the given invocation
2954             (need not be uniform)
2955
2956   Syntax: ``READ_INVOC dst, value, invocation``
2957
2958   Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2959
2960   invocation.x controls the invocation number to read from for all channels.
2961   The invocation number must be the same across all active invocations in a
2962   sub-group; otherwise, the results are undefined.
2963
2964
2965 Explanation of symbols used
2966 ------------------------------
2967
2968
2969 Functions
2970 ^^^^^^^^^^^^^^
2971
2972
2973   :math:`|x|`       Absolute value of `x`.
2974
2975   :math:`\lceil x \rceil` Ceiling of `x`.
2976
2977   clamp(x,y,z)      Clamp x between y and z.
2978                     (x < y) ? y : (x > z) ? z : x
2979
2980   :math:`\lfloor x\rfloor` Floor of `x`.
2981
2982   :math:`\log_2{x}` Logarithm of `x`, base 2.
2983
2984   max(x,y)          Maximum of x and y.
2985                     (x > y) ? x : y
2986
2987   min(x,y)          Minimum of x and y.
2988                     (x < y) ? x : y
2989
2990   partialx(x)       Derivative of x relative to fragment's X.
2991
2992   partialy(x)       Derivative of x relative to fragment's Y.
2993
2994   pop()             Pop from stack.
2995
2996   :math:`x^y`       `x` to the power `y`.
2997
2998   push(x)           Push x on stack.
2999
3000   round(x)          Round x.
3001
3002   trunc(x)          Truncate x, i.e. drop the fraction bits.
3003
3004
3005 Keywords
3006 ^^^^^^^^^^^^^
3007
3008
3009   discard           Discard fragment.
3010
3011   pc                Program counter.
3012
3013   target            Label of target instruction.
3014
3015
3016 Other tokens
3017 ---------------
3018
3019
3020 Declaration
3021 ^^^^^^^^^^^
3022
3023
3024 Declares a register that is will be referenced as an operand in Instruction
3025 tokens.
3026
3027 File field contains register file that is being declared and is one
3028 of TGSI_FILE.
3029
3030 UsageMask field specifies which of the register components can be accessed
3031 and is one of TGSI_WRITEMASK.
3032
3033 The Local flag specifies that a given value isn't intended for
3034 subroutine parameter passing and, as a result, the implementation
3035 isn't required to give any guarantees of it being preserved across
3036 subroutine boundaries.  As it's merely a compiler hint, the
3037 implementation is free to ignore it.
3038
3039 If Dimension flag is set to 1, a Declaration Dimension token follows.
3040
3041 If Semantic flag is set to 1, a Declaration Semantic token follows.
3042
3043 If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3044
3045 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3046
3047 If Array flag is set to 1, a Declaration Array token follows.
3048
3049 Array Declaration
3050 ^^^^^^^^^^^^^^^^^^^^^^^^
3051
3052 Declarations can optional have an ArrayID attribute which can be referred by
3053 indirect addressing operands. An ArrayID of zero is reserved and treated as
3054 if no ArrayID is specified.
3055
3056 If an indirect addressing operand refers to a specific declaration by using
3057 an ArrayID only the registers in this declaration are guaranteed to be
3058 accessed, accessing any register outside this declaration results in undefined
3059 behavior. Note that for compatibility the effective index is zero-based and
3060 not relative to the specified declaration
3061
3062 If no ArrayID is specified with an indirect addressing operand the whole
3063 register file might be accessed by this operand. This is strongly discouraged
3064 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3065 This is only legal for TEMP and CONST register files.
3066
3067 Declaration Semantic
3068 ^^^^^^^^^^^^^^^^^^^^^^^^
3069
3070 Vertex and fragment shader input and output registers may be labeled
3071 with semantic information consisting of a name and index.
3072
3073 Follows Declaration token if Semantic bit is set.
3074
3075 Since its purpose is to link a shader with other stages of the pipeline,
3076 it is valid to follow only those Declaration tokens that declare a register
3077 either in INPUT or OUTPUT file.
3078
3079 SemanticName field contains the semantic name of the register being declared.
3080 There is no default value.
3081
3082 SemanticIndex is an optional subscript that can be used to distinguish
3083 different register declarations with the same semantic name. The default value
3084 is 0.
3085
3086 The meanings of the individual semantic names are explained in the following
3087 sections.
3088
3089 TGSI_SEMANTIC_POSITION
3090 """"""""""""""""""""""
3091
3092 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3093 output register which contains the homogeneous vertex position in the clip
3094 space coordinate system.  After clipping, the X, Y and Z components of the
3095 vertex will be divided by the W value to get normalized device coordinates.
3096
3097 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3098 fragment shader input (or system value, depending on which one is
3099 supported by the driver) contains the fragment's window position.  The X
3100 component starts at zero and always increases from left to right.
3101 The Y component starts at zero and always increases but Y=0 may either
3102 indicate the top of the window or the bottom depending on the fragment
3103 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3104 The Z coordinate ranges from 0 to 1 to represent depth from the front
3105 to the back of the Z buffer.  The W component contains the interpolated
3106 reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3107 but unlike d3d10 which interpolates the same 1/w but then gives back
3108 the reciprocal of the interpolated value).
3109
3110 Fragment shaders may also declare an output register with
3111 TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
3112 the fragment shader to change the fragment's Z position.
3113
3114
3115
3116 TGSI_SEMANTIC_COLOR
3117 """""""""""""""""""
3118
3119 For vertex shader outputs or fragment shader inputs/outputs, this
3120 label indicates that the register contains an R,G,B,A color.
3121
3122 Several shader inputs/outputs may contain colors so the semantic index
3123 is used to distinguish them.  For example, color[0] may be the diffuse
3124 color while color[1] may be the specular color.
3125
3126 This label is needed so that the flat/smooth shading can be applied
3127 to the right interpolants during rasterization.
3128
3129
3130
3131 TGSI_SEMANTIC_BCOLOR
3132 """"""""""""""""""""
3133
3134 Back-facing colors are only used for back-facing polygons, and are only valid
3135 in vertex shader outputs. After rasterization, all polygons are front-facing
3136 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3137 so all BCOLORs effectively become regular COLORs in the fragment shader.
3138
3139
3140 TGSI_SEMANTIC_FOG
3141 """""""""""""""""
3142
3143 Vertex shader inputs and outputs and fragment shader inputs may be
3144 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3145 a fog coordinate.  Typically, the fragment shader will use the fog coordinate
3146 to compute a fog blend factor which is used to blend the normal fragment color
3147 with a constant fog color.  But fog coord really is just an ordinary vec4
3148 register like regular semantics.
3149
3150
3151 TGSI_SEMANTIC_PSIZE
3152 """""""""""""""""""
3153
3154 Vertex shader input and output registers may be labeled with
3155 TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3156 in the form (S, 0, 0, 1).  The point size controls the width or diameter
3157 of points for rasterization.  This label cannot be used in fragment
3158 shaders.
3159
3160 When using this semantic, be sure to set the appropriate state in the
3161 :ref:`rasterizer` first.
3162
3163
3164 TGSI_SEMANTIC_TEXCOORD
3165 """"""""""""""""""""""
3166
3167 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3168
3169 Vertex shader outputs and fragment shader inputs may be labeled with
3170 this semantic to make them replaceable by sprite coordinates via the
3171 sprite_coord_enable state in the :ref:`rasterizer`.
3172 The semantic index permitted with this semantic is limited to <= 7.
3173
3174 If the driver does not support TEXCOORD, sprite coordinate replacement
3175 applies to inputs with the GENERIC semantic instead.
3176
3177 The intended use case for this semantic is gl_TexCoord.
3178
3179
3180 TGSI_SEMANTIC_PCOORD
3181 """"""""""""""""""""
3182
3183 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3184
3185 Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3186 that the register contains sprite coordinates in the form (x, y, 0, 1), if
3187 the current primitive is a point and point sprites are enabled. Otherwise,
3188 the contents of the register are undefined.
3189
3190 The intended use case for this semantic is gl_PointCoord.
3191
3192
3193 TGSI_SEMANTIC_GENERIC
3194 """""""""""""""""""""
3195
3196 All vertex/fragment shader inputs/outputs not labeled with any other
3197 semantic label can be considered to be generic attributes.  Typical
3198 uses of generic inputs/outputs are texcoords and user-defined values.
3199
3200
3201 TGSI_SEMANTIC_NORMAL
3202 """"""""""""""""""""
3203
3204 Indicates that a vertex shader input is a normal vector.  This is
3205 typically only used for legacy graphics APIs.
3206
3207
3208 TGSI_SEMANTIC_FACE
3209 """"""""""""""""""
3210
3211 This label applies to fragment shader inputs (or system values,
3212 depending on which one is supported by the driver) and indicates that
3213 the register contains front/back-face information.
3214
3215 If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3216 where F will be positive when the fragment belongs to a front-facing polygon,
3217 and negative when the fragment belongs to a back-facing polygon.
3218
3219 If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3220 where F is 0xffffffff when the fragment belongs to a front-facing polygon and
3221 0 when the fragment belongs to a back-facing polygon.
3222
3223
3224 TGSI_SEMANTIC_EDGEFLAG
3225 """"""""""""""""""""""
3226
3227 For vertex shaders, this semantic label indicates that an input or
3228 output is a boolean edge flag.  The register layout is [F, x, x, x]
3229 where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
3230 simply copies the edge flag input to the edgeflag output.
3231
3232 Edge flags are used to control which lines or points are actually
3233 drawn when the polygon mode converts triangles/quads/polygons into
3234 points or lines.
3235
3236
3237 TGSI_SEMANTIC_STENCIL
3238 """""""""""""""""""""
3239
3240 For fragment shaders, this semantic label indicates that an output
3241 is a writable stencil reference value. Only the Y component is writable.
3242 This allows the fragment shader to change the fragments stencilref value.
3243
3244
3245 TGSI_SEMANTIC_VIEWPORT_INDEX
3246 """"""""""""""""""""""""""""
3247
3248 For geometry shaders, this semantic label indicates that an output
3249 contains the index of the viewport (and scissor) to use.
3250 This is an integer value, and only the X component is used.
3251
3252 If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3253 supported, then this semantic label can also be used in vertex or
3254 tessellation evaluation shaders, respectively. Only the value written in the
3255 last vertex processing stage is used.
3256
3257
3258 TGSI_SEMANTIC_LAYER
3259 """""""""""""""""""
3260
3261 For geometry shaders, this semantic label indicates that an output
3262 contains the layer value to use for the color and depth/stencil surfaces.
3263 This is an integer value, and only the X component is used.
3264 (Also known as rendertarget array index.)
3265
3266 If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3267 supported, then this semantic label can also be used in vertex or
3268 tessellation evaluation shaders, respectively. Only the value written in the
3269 last vertex processing stage is used.
3270
3271
3272 TGSI_SEMANTIC_CLIPDIST
3273 """"""""""""""""""""""
3274
3275 Note this covers clipping and culling distances.
3276
3277 When components of vertex elements are identified this way, these
3278 values are each assumed to be a float32 signed distance to a plane.
3279
3280 For clip distances:
3281 Primitive setup only invokes rasterization on pixels for which
3282 the interpolated plane distances are >= 0.
3283
3284 For cull distances:
3285 Primitives will be completely discarded if the plane distance
3286 for all of the vertices in the primitive are < 0.
3287 If a vertex has a cull distance of NaN, that vertex counts as "out"
3288 (as if its < 0);
3289
3290 Multiple clip/cull planes can be implemented simultaneously, by
3291 annotating multiple components of one or more vertex elements with
3292 the above specified semantic.
3293 The limits on both clip and cull distances are bound
3294 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3295 the maximum number of components that can be used to hold the
3296 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3297 which specifies the maximum number of registers which can be
3298 annotated with those semantics.
3299 The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3300 are used to divide up the 2 x vec4 space between clipping and culling.
3301
3302 TGSI_SEMANTIC_SAMPLEID
3303 """"""""""""""""""""""
3304
3305 For fragment shaders, this semantic label indicates that a system value
3306 contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3307 Only the X component is used.  If per-sample shading is not enabled,
3308 the result is (0, undef, undef, undef).
3309
3310 Note that if the fragment shader uses this system value, the fragment
3311 shader is automatically executed at per sample frequency.
3312
3313 TGSI_SEMANTIC_SAMPLEPOS
3314 """""""""""""""""""""""
3315
3316 For fragment shaders, this semantic label indicates that a system
3317 value contains the current sample's position as float4(x, y, undef, undef)
3318 in the render target (i.e.  gl_SamplePosition) when per-fragment shading
3319 is in effect.  Position values are in the range [0, 1] where 0.5 is
3320 the center of the fragment.
3321
3322 Note that if the fragment shader uses this system value, the fragment
3323 shader is automatically executed at per sample frequency.
3324
3325 TGSI_SEMANTIC_SAMPLEMASK
3326 """"""""""""""""""""""""
3327
3328 For fragment shaders, this semantic label can be applied to either a
3329 shader system value input or output.
3330
3331 For a system value, the sample mask indicates the set of samples covered by
3332 the current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
3333
3334 For an output, the sample mask is used to disable further sample processing.
3335
3336 For both, the register type is uint[4] but only the X component is used
3337 (i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3338 to 32x MSAA is supported).
3339
3340 TGSI_SEMANTIC_INVOCATIONID
3341 """"""""""""""""""""""""""
3342
3343 For geometry shaders, this semantic label indicates that a system value
3344 contains the current invocation id (i.e. gl_InvocationID).
3345 This is an integer value, and only the X component is used.
3346
3347 TGSI_SEMANTIC_INSTANCEID
3348 """"""""""""""""""""""""
3349
3350 For vertex shaders, this semantic label indicates that a system value contains
3351 the current instance id (i.e. gl_InstanceID). It does not include the base
3352 instance. This is an integer value, and only the X component is used.
3353
3354 TGSI_SEMANTIC_VERTEXID
3355 """"""""""""""""""""""
3356
3357 For vertex shaders, this semantic label indicates that a system value contains
3358 the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3359 base vertex. This is an integer value, and only the X component is used.
3360
3361 TGSI_SEMANTIC_VERTEXID_NOBASE
3362 """""""""""""""""""""""""""""""
3363
3364 For vertex shaders, this semantic label indicates that a system value contains
3365 the current vertex id without including the base vertex (this corresponds to
3366 d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3367 == TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3368 is used.
3369
3370 TGSI_SEMANTIC_BASEVERTEX
3371 """"""""""""""""""""""""
3372
3373 For vertex shaders, this semantic label indicates that a system value contains
3374 the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3375 this contains the first (or start) value instead.
3376 This is an integer value, and only the X component is used.
3377
3378 TGSI_SEMANTIC_PRIMID
3379 """"""""""""""""""""
3380
3381 For geometry and fragment shaders, this semantic label indicates the value
3382 contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3383 and only the X component is used.
3384 FIXME: This right now can be either a ordinary input or a system value...
3385
3386
3387 TGSI_SEMANTIC_PATCH
3388 """""""""""""""""""
3389
3390 For tessellation evaluation/control shaders, this semantic label indicates a
3391 generic per-patch attribute. Such semantics will not implicitly be per-vertex
3392 arrays.
3393
3394 TGSI_SEMANTIC_TESSCOORD
3395 """""""""""""""""""""""
3396
3397 For tessellation evaluation shaders, this semantic label indicates the
3398 coordinates of the vertex being processed. This is available in XYZ; W is
3399 undefined.
3400
3401 TGSI_SEMANTIC_TESSOUTER
3402 """""""""""""""""""""""
3403
3404 For tessellation evaluation/control shaders, this semantic label indicates the
3405 outer tessellation levels of the patch. Isoline tessellation will only have XY
3406 defined, triangle will have XYZ and quads will have XYZW defined. This
3407 corresponds to gl_TessLevelOuter.
3408
3409 TGSI_SEMANTIC_TESSINNER
3410 """""""""""""""""""""""
3411
3412 For tessellation evaluation/control shaders, this semantic label indicates the
3413 inner tessellation levels of the patch. The X value is only defined for
3414 triangle tessellation, while quads will have XY defined. This is entirely
3415 undefined for isoline tessellation.
3416
3417 TGSI_SEMANTIC_VERTICESIN
3418 """"""""""""""""""""""""
3419
3420 For tessellation evaluation/control shaders, this semantic label indicates the
3421 number of vertices provided in the input patch. Only the X value is defined.
3422
3423 TGSI_SEMANTIC_HELPER_INVOCATION
3424 """""""""""""""""""""""""""""""
3425
3426 For fragment shaders, this semantic indicates whether the current
3427 invocation is covered or not. Helper invocations are created in order
3428 to properly compute derivatives, however it may be desirable to skip
3429 some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3430
3431 TGSI_SEMANTIC_BASEINSTANCE
3432 """"""""""""""""""""""""""
3433
3434 For vertex shaders, the base instance argument supplied for this
3435 draw. This is an integer value, and only the X component is used.
3436
3437 TGSI_SEMANTIC_DRAWID
3438 """"""""""""""""""""
3439
3440 For vertex shaders, the zero-based index of the current draw in a
3441 ``glMultiDraw*`` invocation. This is an integer value, and only the X
3442 component is used.
3443
3444
3445 TGSI_SEMANTIC_WORK_DIM
3446 """"""""""""""""""""""
3447
3448 For compute shaders started via OpenCL this retrieves the work_dim
3449 parameter to the clEnqueueNDRangeKernel call with which the shader
3450 was started.
3451
3452
3453 TGSI_SEMANTIC_GRID_SIZE
3454 """""""""""""""""""""""
3455
3456 For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3457 of a grid of thread blocks.
3458
3459
3460 TGSI_SEMANTIC_BLOCK_ID
3461 """"""""""""""""""""""
3462
3463 For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3464 current block inside of the grid.
3465
3466
3467 TGSI_SEMANTIC_BLOCK_SIZE
3468 """"""""""""""""""""""""
3469
3470 For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3471 of a block in threads.
3472
3473
3474 TGSI_SEMANTIC_THREAD_ID
3475 """""""""""""""""""""""
3476
3477 For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3478 current thread inside of the block.
3479
3480
3481 TGSI_SEMANTIC_SUBGROUP_SIZE
3482 """""""""""""""""""""""""""
3483
3484 This semantic indicates the subgroup size for the current invocation. This is
3485 an integer of at most 64, as it indicates the width of lanemasks. It does not
3486 depend on the number of invocations that are active.
3487
3488
3489 TGSI_SEMANTIC_SUBGROUP_INVOCATION
3490 """""""""""""""""""""""""""""""""
3491
3492 The index of the current invocation within its subgroup.
3493
3494
3495 TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3496 """"""""""""""""""""""""""""""
3497
3498 A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3499 ``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3500
3501
3502 TGSI_SEMANTIC_SUBGROUP_GE_MASK
3503 """"""""""""""""""""""""""""""
3504
3505 A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3506 ``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3507 in arbitrary precision arithmetic.
3508
3509
3510 TGSI_SEMANTIC_SUBGROUP_GT_MASK
3511 """"""""""""""""""""""""""""""
3512
3513 A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3514 ``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3515 in arbitrary precision arithmetic.
3516
3517
3518 TGSI_SEMANTIC_SUBGROUP_LE_MASK
3519 """"""""""""""""""""""""""""""
3520
3521 A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3522 ``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3523
3524
3525 TGSI_SEMANTIC_SUBGROUP_LT_MASK
3526 """"""""""""""""""""""""""""""
3527
3528 A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3529 ``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3530
3531
3532 TGSI_SEMANTIC_VIEWPORT_MASK
3533 """""""""""""""""""""""""""
3534
3535 A bit mask of viewports to broadcast the current primitive to. See
3536 GL_NV_viewport_array2 for more details.
3537
3538
3539 TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL
3540 """"""""""""""""""""""""""""""""""""""
3541
3542 A system value equal to the default_outer_level array set via set_tess_level.
3543
3544
3545 TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL
3546 """"""""""""""""""""""""""""""""""""""
3547
3548 A system value equal to the default_inner_level array set via set_tess_level.
3549
3550
3551 Declaration Interpolate
3552 ^^^^^^^^^^^^^^^^^^^^^^^
3553
3554 This token is only valid for fragment shader INPUT declarations.
3555
3556 The Interpolate field specifies the way input is being interpolated by
3557 the rasterizer and is one of TGSI_INTERPOLATE_*.
3558
3559 The Location field specifies the location inside the pixel that the
3560 interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3561 when per-sample shading is enabled, the implementation may choose to
3562 interpolate at the sample irrespective of the Location field.
3563
3564
3565 Declaration Sampler View
3566 ^^^^^^^^^^^^^^^^^^^^^^^^
3567
3568 Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3569
3570 DCL SVIEW[#], resource, type(s)
3571
3572 Declares a shader input sampler view and assigns it to a SVIEW[#]
3573 register.
3574
3575 resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
3576
3577 type must be 1 or 4 entries (if specifying on a per-component
3578 level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3579
3580 For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3581 which take an explicit SVIEW[#] source register), there may be optionally
3582 SVIEW[#] declarations.  In this case, the SVIEW index is implied by the
3583 SAMP index, and there must be a corresponding SVIEW[#] declaration for
3584 each SAMP[#] declaration.  Drivers are free to ignore this if they wish.
3585 But note in particular that some drivers need to know the sampler type
3586 (float/int/unsigned) in order to generate the correct code, so cases
3587 where integer textures are sampled, SVIEW[#] declarations should be
3588 used.
3589
3590 NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3591 in the same shader.
3592
3593 Declaration Resource
3594 ^^^^^^^^^^^^^^^^^^^^
3595
3596 Follows Declaration token if file is TGSI_FILE_RESOURCE.
3597
3598 DCL RES[#], resource [, WR] [, RAW]
3599
3600 Declares a shader input resource and assigns it to a RES[#]
3601 register.
3602
3603 resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
3604 2DArray.
3605
3606 If the RAW keyword is not specified, the texture data will be
3607 subject to conversion, swizzling and scaling as required to yield
3608 the specified data type from the physical data format of the bound
3609 resource.
3610
3611 If the RAW keyword is specified, no channel conversion will be
3612 performed: the values read for each of the channels (X,Y,Z,W) will
3613 correspond to consecutive words in the same order and format
3614 they're found in memory.  No element-to-address conversion will be
3615 performed either: the value of the provided X coordinate will be
3616 interpreted in byte units instead of texel units.  The result of
3617 accessing a misaligned address is undefined.
3618
3619 Usage of the STORE opcode is only allowed if the WR (writable) flag
3620 is set.
3621
3622 Hardware Atomic Register File
3623 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3624
3625 Hardware atomics are declared as a 2D array with an optional array id.
3626
3627 The first member of the dimension is the buffer resource the atomic
3628 is located in.
3629 The second member is a range into the buffer resource, either for
3630 one or multiple counters. If this is an array, the declaration will have
3631 an unique array id.
3632
3633 Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
3634 DCL HWATOMIC[0][0]
3635 DCL HWATOMIC[0][1]
3636
3637 This declares two atomics, one at the start of the buffer and one in the
3638 second 4 bytes.
3639
3640 DCL HWATOMIC[0][0]
3641 DCL HWATOMIC[1][0]
3642 DCL HWATOMIC[1][1..3], ARRAY(1)
3643
3644 This declares 5 atomics, one in buffer 0 at 0,
3645 one in buffer 1 at 0, and an array of 3 atomics in
3646 the buffer 1, starting at 1.
3647
3648 Properties
3649 ^^^^^^^^^^^^^^^^^^^^^^^^
3650
3651 Properties are general directives that apply to the whole TGSI program.
3652
3653 FS_COORD_ORIGIN
3654 """""""""""""""
3655
3656 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3657 The default value is UPPER_LEFT.
3658
3659 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3660 increase downward and rightward.
3661 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3662 increase upward and rightward.
3663
3664 OpenGL defaults to LOWER_LEFT, and is configurable with the
3665 GL_ARB_fragment_coord_conventions extension.
3666
3667 DirectX 9/10 use UPPER_LEFT.
3668
3669 FS_COORD_PIXEL_CENTER
3670 """""""""""""""""""""
3671
3672 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3673 The default value is HALF_INTEGER.
3674
3675 If HALF_INTEGER, the fractional part of the position will be 0.5
3676 If INTEGER, the fractional part of the position will be 0.0
3677
3678 Note that this does not affect the set of fragments generated by
3679 rasterization, which is instead controlled by half_pixel_center in the
3680 rasterizer.
3681
3682 OpenGL defaults to HALF_INTEGER, and is configurable with the
3683 GL_ARB_fragment_coord_conventions extension.
3684
3685 DirectX 9 uses INTEGER.
3686 DirectX 10 uses HALF_INTEGER.
3687
3688 FS_COLOR0_WRITES_ALL_CBUFS
3689 """"""""""""""""""""""""""
3690 Specifies that writes to the fragment shader color 0 are replicated to all
3691 bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3692 fragData is directed to a single color buffer, but fragColor is broadcast.
3693
3694 VS_PROHIBIT_UCPS
3695 """"""""""""""""""""""""""
3696 If this property is set on the program bound to the shader stage before the
3697 fragment shader, user clip planes should have no effect (be disabled) even if
3698 that shader does not write to any clip distance outputs and the rasterizer's
3699 clip_plane_enable is non-zero.
3700 This property is only supported by drivers that also support shader clip
3701 distance outputs.
3702 This is useful for APIs that don't have UCPs and where clip distances written
3703 by a shader cannot be disabled.
3704
3705 GS_INVOCATIONS
3706 """"""""""""""
3707
3708 Specifies the number of times a geometry shader should be executed for each
3709 input primitive. Each invocation will have a different
3710 TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3711 be 1.
3712
3713 VS_WINDOW_SPACE_POSITION
3714 """"""""""""""""""""""""""
3715 If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3716 is assumed to contain window space coordinates.
3717 Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3718 directly taken from the 4-th component of the shader output.
3719 Naturally, clipping is not performed on window coordinates either.
3720 The effect of this property is undefined if a geometry or tessellation shader
3721 are in use.
3722
3723 TCS_VERTICES_OUT
3724 """"""""""""""""
3725
3726 The number of vertices written by the tessellation control shader. This
3727 effectively defines the patch input size of the tessellation evaluation shader
3728 as well.
3729
3730 TES_PRIM_MODE
3731 """""""""""""
3732
3733 This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
3734 ``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
3735 separate isolines settings, the regular lines is assumed to mean isolines.)
3736
3737 TES_SPACING
3738 """""""""""
3739
3740 This sets the spacing mode of the tessellation generator, one of
3741 ``PIPE_TESS_SPACING_*``.
3742
3743 TES_VERTEX_ORDER_CW
3744 """""""""""""""""""
3745
3746 This sets the vertex order to be clockwise if the value is 1, or
3747 counter-clockwise if set to 0.
3748
3749 TES_POINT_MODE
3750 """"""""""""""
3751
3752 If set to a non-zero value, this turns on point mode for the tessellator,
3753 which means that points will be generated instead of primitives.
3754
3755 NUM_CLIPDIST_ENABLED
3756 """"""""""""""""""""
3757
3758 How many clip distance scalar outputs are enabled.
3759
3760 NUM_CULLDIST_ENABLED
3761 """"""""""""""""""""
3762
3763 How many cull distance scalar outputs are enabled.
3764
3765 FS_EARLY_DEPTH_STENCIL
3766 """"""""""""""""""""""
3767
3768 Whether depth test, stencil test, and occlusion query should run before
3769 the fragment shader (regardless of fragment shader side effects). Corresponds
3770 to GLSL early_fragment_tests.
3771
3772 NEXT_SHADER
3773 """""""""""
3774
3775 Which shader stage will MOST LIKELY follow after this shader when the shader
3776 is bound. This is only a hint to the driver and doesn't have to be precise.
3777 Only set for VS and TES.
3778
3779 CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3780 """""""""""""""""""""""""""""""""""""
3781
3782 Threads per block in each dimension, if known at compile time. If the block size
3783 is known all three should be at least 1. If it is unknown they should all be set
3784 to 0 or not set.
3785
3786 LEGACY_MATH_RULES
3787 """""""""""""""""
3788
3789 The MUL TGSI operation (FP32 multiplication) will return 0 if either
3790 of the operands are equal to 0. That means that 0 * Inf = 0. This
3791 should be set the same way for an entire pipeline. Note that this
3792 applies not only to the literal MUL TGSI opcode, but all FP32
3793 multiplications implied by other operations, such as MAD, FMA, DP2,
3794 DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3795 mismatch between shaders, then it is unspecified whether this behavior
3796 will be enabled.
3797
3798 FS_POST_DEPTH_COVERAGE
3799 """"""""""""""""""""""
3800
3801 When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3802 that have failed the depth/stencil tests. This is only valid when
3803 FS_EARLY_DEPTH_STENCIL is also specified.
3804
3805 LAYER_VIEWPORT_RELATIVE
3806 """""""""""""""""""""""
3807
3808 When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the
3809 current viewport. This is especially useful in conjunction with
3810 TGSI_SEMANTIC_VIEWPORT_MASK.
3811
3812
3813 Texture Sampling and Texture Formats
3814 ------------------------------------
3815
3816 This table shows how texture image components are returned as (x,y,z,w) tuples
3817 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3818 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3819 well.
3820
3821 +--------------------+--------------+--------------------+--------------+
3822 | Texture Components | Gallium      | OpenGL             | Direct3D 9   |
3823 +====================+==============+====================+==============+
3824 | R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
3825 +--------------------+--------------+--------------------+--------------+
3826 | RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
3827 +--------------------+--------------+--------------------+--------------+
3828 | RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
3829 +--------------------+--------------+--------------------+--------------+
3830 | RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
3831 +--------------------+--------------+--------------------+--------------+
3832 | A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
3833 +--------------------+--------------+--------------------+--------------+
3834 | L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
3835 +--------------------+--------------+--------------------+--------------+
3836 | LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
3837 +--------------------+--------------+--------------------+--------------+
3838 | I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
3839 +--------------------+--------------+--------------------+--------------+
3840 | UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
3841 |                    |              | [#envmap-bumpmap]_ |              |
3842 +--------------------+--------------+--------------------+--------------+
3843 | Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
3844 |                    |              | [#depth-tex-mode]_ |              |
3845 +--------------------+--------------+--------------------+--------------+
3846 | S                  | (s, s, s, s) | unknown            | unknown      |
3847 +--------------------+--------------+--------------------+--------------+
3848
3849 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
3850 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3851    or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.