From 91482ef226de7686350202cfbdfda4358d9cea86 Mon Sep 17 00:00:00 2001
From: Ian Romanick <ian.d.romanick@intel.com>
Date: Mon, 20 Jun 2016 16:28:34 -0700
Subject: [PATCH] MESA_shader_integer_functions: Add extension specification

v2: Fix typo in #extension line noticed by Ken.

v3: Update spec status.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
---
 docs/specs/MESA_shader_integer_functions.txt | 520 +++++++++++++++++++++++++++
 1 file changed, 520 insertions(+)
 create mode 100644 docs/specs/MESA_shader_integer_functions.txt

diff --git a/docs/specs/MESA_shader_integer_functions.txt b/docs/specs/MESA_shader_integer_functions.txt
new file mode 100644
index 0000000..58a956f
--- /dev/null
+++ b/docs/specs/MESA_shader_integer_functions.txt
@@ -0,0 +1,520 @@
+Name
+
+    MESA_shader_integer_functions
+
+Name Strings
+
+    GL_MESA_shader_integer_functions
+
+Contact
+
+    Ian Romanick <ian.d.romanick@intel.com>
+
+Contributors
+
+    All the contributors of GL_ARB_gpu_shader5
+
+Status
+
+    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later
+
+Version
+
+    Version 2, July 7, 2016
+
+Number
+
+    TBD
+
+Dependencies
+
+    This extension is written against the OpenGL 3.2 (Compatibility Profile)
+    Specification.
+
+    This extension is written against Version 1.50 (Revision 09) of the OpenGL
+    Shading Language Specification.
+
+    GLSL 1.30 is required.
+
+    This extension interacts with ARB_gpu_shader5.
+
+    This extension interacts with ARB_gpu_shader_fp64.
+
+    This extension interacts with NV_gpu_shader5.
+
+Overview
+
+    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways.  Much of this
+    added functionality requires significant hardware support.  There are many
+    aspects, however, that can be easily implmented on any GPU with "real"
+    integer support (as opposed to simulating integers using floating point
+    calculations).
+
+    This extension provides a set of new features to the OpenGL Shading
+    Language to support capabilities of these GPUs, extending the capabilities
+    of version 1.30 of the OpenGL Shading Language.  Shaders
+    using the new functionality provided by this extension should enable this
+    functionality via the construct
+
+      #extension GL_MESA_shader_integer_functions : require   (or enable)
+
+    This extension provides a variety of new features for all shader types,
+    including:
+
+      * support for implicitly converting signed integer types to unsigned
+        types, as well as more general implicit conversion and function
+        overloading infrastructure to support new data types introduced by
+        other extensions;
+
+      * new built-in functions supporting:
+
+        * splitting a floating-point number into a significand and exponent
+          (frexp), or building a floating-point number from a significand and
+          exponent (ldexp);
+
+        * integer bitfield manipulation, including functions to find the
+          position of the most or least significant set bit, count the number
+          of one bits, and bitfield insertion, extraction, and reversal;
+
+        * extended integer precision math, including add with carry, subtract
+          with borrow, and extenended multiplication;
+
+    The resulting extension is a strict subset of GL_ARB_gpu_shader5.
+
+IP Status
+
+    No known IP claims.
+
+New Procedures and Functions
+
+    None
+
+New Tokens
+
+    None
+
+Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
+(OpenGL Operation)
+
+    None.
+
+Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Rasterization)
+
+    None.
+
+Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Per-Fragment Operations and the Frame Buffer)
+
+    None.
+
+Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Special Functions)
+
+    None.
+
+Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
+(State and State Requests)
+
+    None.
+
+Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
+Specification (Invariance)
+
+    None.
+
+Additions to the AGL/GLX/WGL Specifications
+
+    None.
+
+Modifications to The OpenGL Shading Language Specification, Version 1.50
+(Revision 09)
+
+    Including the following line in a shader can be used to control the
+    language features described in this extension:
+
+      #extension GL_MESA_shader_integer_functions : <behavior>
+
+    where <behavior> is as specified in section 3.3.
+
+    New preprocessor #defines are added to the OpenGL Shading Language:
+
+      #define GL_MESA_shader_integer_functions        1
+
+
+    Modify Section 4.1.10, Implicit Conversions, p. 27
+
+    (modify table of implicit conversions)
+
+                                Can be implicitly
+        Type of expression        converted to
+        ---------------------   -----------------
+        int                     uint, float
+        ivec2                   uvec2, vec2
+        ivec3                   uvec3, vec3
+        ivec4                   uvec4, vec4
+
+        uint                    float
+        uvec2                   vec2
+        uvec3                   vec3
+        uvec4                   vec4
+
+    (modify second paragraph of the section) No implicit conversions are
+    provided to convert from unsigned to signed integer types or from
+    floating-point to integer types.  There are no implicit array or structure
+    conversions.
+
+    (insert before the final paragraph of the section) When performing
+    implicit conversion for binary operators, there may be multiple data types
+    to which the two operands can be converted.  For example, when adding an
+    int value to a uint value, both values can be implicitly converted to uint
+    and float.  In such cases, a floating-point type is chosen if either
+    operand has a floating-point type.  Otherwise, an unsigned integer type is
+    chosen if either operand has an unsigned integer type.  Otherwise, a
+    signed integer type is chosen.
+    
+
+    Modify Section 5.9, Expressions, p. 57
+
+    (modify bulleted list as follows, adding support for implicit conversion
+    between signed and unsigned types)
+
+    Expressions in the shading language are built from the following:
+
+    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
+      types, and all matrix types.
+
+    ...
+
+    * The operator modulus (%) operates on signed or unsigned integer scalars
+      or vectors.  If the fundamental types of the operands do not match, the
+      conversions from Section 4.1.10 "Implicit Conversions" are applied to
+      produce matching types.  ...
+
+
+    Modify Section 6.1, Function Definitions, p. 63
+
+    (modify description of overloading, beginning at the top of p. 64)
+
+     Function names can be overloaded.  The same function name can be used for
+     multiple functions, as long as the parameter types differ.  If a function
+     name is declared twice with the same parameter types, then the return
+     types and all qualifiers must also match, and it is the same function
+     being declared.  For example,
+
+       vec4 f(in vec4 x, out vec4  y);   // (A)
+       vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
+       vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type
+
+       int  f(in vec4 x, out ivec4 y);  // error, only return type differs
+       vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
+       vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs
+
+     When function calls are resolved, an exact type match for all the
+     arguments is sought.  If an exact match is found, all other functions are
+     ignored, and the exact match is used.  If no exact match is found, then
+     the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
+     applied to find a match.  Mismatched types on input parameters (in or
+     inout or default) must have a conversion from the calling argument type
+     to the formal parameter type.  Mismatched types on output parameters (out
+     or inout) must have a conversion from the formal parameter type to the
+     calling argument type.
+
+     If implicit conversions can be used to find more than one matching
+     function, a single best-matching function is sought.  To determine a best
+     match, the conversions between calling argument and formal parameter
+     types are compared for each function argument and pair of matching
+     functions.  After these comparisons are performed, each pair of matching
+     functions are compared.  A function definition A is considered a better
+     match than function definition B if:
+
+       * for at least one function argument, the conversion for that argument
+         in A is better than the corresponding conversion in B; and
+
+       * there is no function argument for which the conversion in B is better
+         than the corresponding conversion in A.
+
+     If a single function definition is considered a better match than every
+     other matching function definition, it will be used.  Otherwise, a
+     semantic error occurs and the shader will fail to compile.
+
+     To determine whether the conversion for a single argument in one match is
+     better than that for another match, the following rules are applied, in
+     order:
+
+       1. An exact match is better than a match involving any implicit
+          conversion.
+
+       2. A match involving an implicit conversion from float to double is
+          better than a match involving any other implicit conversion.
+
+       3. A match involving an implicit conversion from either int or uint to
+          float is better than a match involving an implicit conversion from
+          either int or uint to double.
+
+     If none of the rules above apply to a particular pair of conversions,
+     neither conversion is considered better than the other.
+
+     For the function prototypes (A), (B), and (C) above, the following
+     examples show how the rules apply to different sets of calling argument
+     types:
+
+       f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
+       f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out ivec4 y)
+       f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
+                             //   (C) not relevant, can't convert vec4 to 
+                             //   ivec4.  (A) better than (B) for 2nd
+                             //   argument (rule 2), same on first argument.
+       f(ivec4, vec4);       // NOT matched.  All three match by implicit
+                             //   conversion.  (C) is better than (A) and (B)
+                             //   on the first argument.  (A) is better than
+                             //   (B) and (C).
+
+
+    Modify Section 8.3, Common Functions, p. 84
+
+    (add support for single-precision frexp and ldexp functions)
+
+    Syntax:
+
+      genType frexp(genType x, out genIType exp);
+      genType ldexp(genType x, in genIType exp);
+
+    The function frexp() splits each single-precision floating-point number in
+    <x> into a binary significand, a floating-point number in the range [0.5,
+    1.0), and an integral exponent of two, such that:
+
+      x = significand * 2 ^ exponent
+
+    The significand is returned by the function; the exponent is returned in
+    the parameter <exp>.  For a floating-point value of zero, the significant
+    and exponent are both zero.  For a floating-point value that is an
+    infinity or is not a number, the results of frexp() are undefined.  
+
+    If the input <x> is a vector, this operation is performed in a
+    component-wise manner; the value returned by the function and the value
+    written to <exp> are vectors with the same number of components as <x>.
+
+    The function ldexp() builds a single-precision floating-point number from
+    each significand component in <x> and the corresponding integral exponent
+    of two in <exp>, returning:
+
+      significand * 2 ^ exponent
+
+    If this product is too large to be represented as a single-precision
+    floating-point value, the result is considered undefined.
+
+    If the input <x> is a vector, this operation is performed in a
+    component-wise manner; the value passed in <exp> and returned by the
+    function are vectors with the same number of components as <x>.
+
+
+    (add support for new integer built-in functions)
+
+    Syntax:
+
+      genIType bitfieldExtract(genIType value, int offset, int bits);
+      genUType bitfieldExtract(genUType value, int offset, int bits);
+
+      genIType bitfieldInsert(genIType base, genIType insert, int offset, 
+                              int bits);
+      genUType bitfieldInsert(genUType base, genUType insert, int offset, 
+                              int bits);
+
+      genIType bitfieldReverse(genIType value);
+      genUType bitfieldReverse(genUType value);
+
+      genIType bitCount(genIType value);
+      genIType bitCount(genUType value);
+
+      genIType findLSB(genIType value);
+      genIType findLSB(genUType value);
+
+      genIType findMSB(genIType value);
+      genIType findMSB(genUType value);
+
+    The function bitfieldExtract() extracts bits <offset> through
+    <offset>+<bits>-1 from each component in <value>, returning them in the
+    least significant bits of corresponding component of the result.  For
+    unsigned data types, the most significant bits of the result will be set
+    to zero.  For signed data types, the most significant bits will be set to
+    the value of bit <offset>+<base>-1.  If <bits> is zero, the result will be
+    zero.  The result will be undefined if <offset> or <bits> is negative, or
+    if the sum of <offset> and <bits> is greater than the number of bits used
+    to store the operand.  Note that for vector versions of bitfieldExtract(),
+    a single pair of <offset> and <bits> values is shared for all components.
+
+    The function bitfieldInsert() inserts the <bits> least significant bits of
+    each component of <insert> into the corresponding component of <base>.
+    The result will have bits numbered <offset> through <offset>+<bits>-1
+    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
+    directly from the corresponding bits of <base>.  If <bits> is zero, the
+    result will simply be <base>.  The result will be undefined if <offset> or
+    <bits> is negative, or if the sum of <offset> and <bits> is greater than
+    the number of bits used to store the operand.  Note that for vector
+    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
+    is shared for all components.
+
+    The function bitfieldReverse() reverses the bits of <value>.  The bit
+    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
+    <value>, where <bits> is the total number of bits used to represent
+    <value>.
+
+    The function bitCount() returns the number of one bits in the binary
+    representation of <value>.
+
+    The function findLSB() returns the bit number of the least significant one
+    bit in the binary representation of <value>.  If <value> is zero, -1 will
+    be returned.
+
+    The function findMSB() returns the bit number of the most significant bit
+    in the binary representation of <value>.  For positive integers, the
+    result will be the bit number of the most significant one bit.  For
+    negative integers, the result will be the bit number of the most
+    significant zero bit.  For a <value> of zero or negative one, -1 will be
+    returned.
+
+
+    (support for unsigned integer add/subtract with carry-out)
+
+    Syntax:
+
+      genUType uaddCarry(genUType x, genUType y, out genUType carry);
+      genUType usubBorrow(genUType x, genUType y, out genUType borrow);
+
+    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
+    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
+    the sum was less than 2^32, or one otherwise.
+
+    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
+    <y> from <x>, returning the difference if non-negative or 2^32 plus the
+    difference, otherwise.  The value <borrow> is set to zero if x >= y, or
+    one otherwise.
+
+
+    (support for signed and unsigned multiplies, with 32-bit inputs and a
+     64-bit result spanning two 32-bit outputs)
+
+    Syntax:
+
+      void umulExtended(genUType x, genUType y, out genUType msb, 
+                        out genUType lsb);
+      void imulExtended(genIType x, genIType y, out genIType msb,
+                        out genIType lsb);
+
+    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
+    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
+    32 least significant bits are returned in <lsb>; the 32 most significant
+    bits are returned in <msb>.
+
+
+GLX Protocol
+
+    None.
+
+Dependencies on ARB_gpu_shader_fp64
+
+    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
+    of implicit conversions supported in the OpenGL Shading Language.  If more
+    than one of these extensions is supported, an expression of one type may
+    be converted to another type if that conversion is allowed by any of these
+    specifications.
+
+    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
+    is not supported, the function overloading rule in the GLSL specification
+    preferring promotion an input parameters to smaller type to a larger type
+    is never applicable, as all data types are of the same size.  That rule
+    and the example referring to "double" should be removed.
+
+
+Dependencies on NV_gpu_shader5
+
+    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
+    of implicit conversions supported in the OpenGL Shading Language.  If more
+    than one of these extensions is supported, an expression of one type may
+    be converted to another type if that conversion is allowed by any of these
+    specifications.
+
+    If NV_gpu_shader5 is supported, integer data types are supported with four
+    different precisions (8-, 16, 32-, and 64-bit) and floating-point data
+    types are supported with three different precisions (16-, 32-, and
+    64-bit).  The extension adds the following rule for output parameters,
+    which is similar to the one present in this extension for input
+    parameters:
+
+       5. If the formal parameters in both matches are output parameters, a
+          conversion from a type with a larger number of bits per component is
+          better than a conversion from a type with a smaller number of bits
+          per component.  For example, a conversion from an "int16_t" formal
+          parameter type to "int"  is better than one from an "int8_t" formal
+          parameter type to "int".
+
+    Such a rule is not provided in this extension because there is no
+    combination of types in this extension and ARB_gpu_shader_fp64 where this
+    rule has any effect.
+
+
+Errors
+
+    None
+
+
+New State
+
+    None
+
+New Implementation Dependent State
+
+    None
+
+Issues
+
+    (1) What should this extension be called?
+
+      UNRESOLVED.  This extension borrows from GL_ARB_gpu_shader5, so creating
+      some sort of a play on that name would be viable.  However, nothing in
+      this extension should require SM5 hardware, so such a name would be a
+      little misleading and weird.
+
+      Since the primary purpose is to add integer related functions from
+      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
+      for now.
+
+    (2) Why is some of the formatting in this extension weird?
+
+      RESOLVED: This extension is formatted to minimize the differences (as
+      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
+      specification.
+
+    (3) Should ldexp and frexp be included?
+
+      RESOLVED: Yes.  Few GPUs have native instructions to implement these
+      functions.  These are generally implemented using existing GLSL built-in
+      functions and the other functions provided by this extension.
+
+    (4) Should umulExtended and imulExtended be included?
+
+      RESOLVED: Yes.  These functions should be implementable on any GPU that
+      can support the rest of this extension, but the implementation may be
+      complex.  The implementation on a GPU that only supports 32bit x 32bit =
+      32bit multiplication would be quite expensive.  However, many GPUs
+      (including OpenGL 4.0 GPUs that already support this function) have a
+      32bit x 16bit = 48bit multiplier.  The implementation there is only
+      trivially more expensive than regular 32bit multiplication.
+
+    (5) Should the pack and unpack functions be included?
+
+      RESOLVED: No.  These functions are already available via
+      GL_ARB_shading_language_packing.
+
+    (6) Should the "BitsTo" functions be included?
+
+      RESOLVED: No.  These functions are already available via
+      GL_ARB_shader_bit_encoding.
+
+Revision History
+
+    Rev.      Date     Author    Changes
+    ----  -----------  --------  -----------------------------------------
+     2     7-Jul-2016  idr       Fix typo in #extension line
+     1    20-Jun-2016  idr       Initial version based on GL_ARB_gpu_shader5.
-- 
2.7.4