docs/specs/MESA_shader_integer_functions.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520

Name

    MESA_shader_integer_functions

Name Strings

    GL_MESA_shader_integer_functions

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors

    All the contributors of GL_ARB_gpu_shader5

Status

    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later

Version

    Version 2, July 7, 2016

Number

    TBD

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

Overview

    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways.  Much of this
    added functionality requires significant hardware support.  There are many
    aspects, however, that can be easily implmented on any GPU with "real"
    integer support (as opposed to simulating integers using floating point
    calculations).

    This extension provides a set of new features to the OpenGL Shading
    Language to support capabilities of these GPUs, extending the capabilities
    of version 1.30 of the OpenGL Shading Language.  Shaders
    using the new functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_MESA_shader_integer_functions : require   (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * new built-in functions supporting:

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * extended integer precision math, including add with carry, subtract
          with borrow, and extenended multiplication;

    The resulting extension is a strict subset of GL_ARB_gpu_shader5.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    None

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_MESA_shader_integer_functions : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_MESA_shader_integer_functions        1


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types.  There are no implicit array or structure
    conversions.

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted.  For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    and float.  In such cases, a floating-point type is chosen if either
    operand has a floating-point type.  Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type.  Otherwise, a
    signed integer type is chosen.
    

    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors.  If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types.  ...


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

     Function names can be overloaded.  The same function name can be used for
     multiple functions, as long as the parameter types differ.  If a function
     name is declared twice with the same parameter types, then the return
     types and all qualifiers must also match, and it is the same function
     being declared.  For example,

       vec4 f(in vec4 x, out vec4  y);   // (A)
       vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
       vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type

       int  f(in vec4 x, out ivec4 y);  // error, only return type differs
       vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
       vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

     When function calls are resolved, an exact type match for all the
     arguments is sought.  If an exact match is found, all other functions are
     ignored, and the exact match is used.  If no exact match is found, then
     the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
     applied to find a match.  Mismatched types on input parameters (in or
     inout or default) must have a conversion from the calling argument type
     to the formal parameter type.  Mismatched types on output parameters (out
     or inout) must have a conversion from the formal parameter type to the
     calling argument type.

     If implicit conversions can be used to find more than one matching
     function, a single best-matching function is sought.  To determine a best
     match, the conversions between calling argument and formal parameter
     types are compared for each function argument and pair of matching
     functions.  After these comparisons are performed, each pair of matching
     functions are compared.  A function definition A is considered a better
     match than function definition B if:

       * for at least one function argument, the conversion for that argument
         in A is better than the corresponding conversion in B; and

       * there is no function argument for which the conversion in B is better
         than the corresponding conversion in A.

     If a single function definition is considered a better match than every
     other matching function definition, it will be used.  Otherwise, a
     semantic error occurs and the shader will fail to compile.

     To determine whether the conversion for a single argument in one match is
     better than that for another match, the following rules are applied, in
     order:

       1. An exact match is better than a match involving any implicit
          conversion.

       2. A match involving an implicit conversion from float to double is
          better than a match involving any other implicit conversion.

       3. A match involving an implicit conversion from either int or uint to
          float is better than a match involving an implicit conversion from
          either int or uint to double.

     If none of the rules above apply to a particular pair of conversions,
     neither conversion is considered better than the other.

     For the function prototypes (A), (B), and (C) above, the following
     examples show how the rules apply to different sets of calling argument
     types:

       f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
       f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out ivec4 y)
       f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                             //   (C) not relevant, can't convert vec4 to 
                             //   ivec4.  (A) better than (B) for 2nd
                             //   argument (rule 2), same on first argument.
       f(ivec4, vec4);       // NOT matched.  All three match by implicit
                             //   conversion.  (C) is better than (A) and (B)
                             //   on the first argument.  (A) is better than
                             //   (B) and (C).


    Modify Section 8.3, Common Functions, p. 84

    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>.  For a floating-point value of zero, the significant
    and exponent are both zero.  For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.  

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset, 
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset, 
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of corresponding component of the result.  For
    unsigned data types, the most significant bits of the result will be set
    to zero.  For signed data types, the most significant bits will be set to
    the value of bit <offset>+<base>-1.  If <bits> is zero, the result will be
    zero.  The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand.  Note that for vector versions of bitfieldExtract(),
    a single pair of <offset> and <bits> values is shared for all components.

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>.  If <bits> is zero, the
    result will simply be <base>.  The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand.  Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.

    The function bitfieldReverse() reverses the bits of <value>.  The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>.  If <value> is zero, -1 will
    be returned.

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>.  For positive integers, the
    result will be the bit number of the most significant one bit.  For
    negative integers, the result will be the bit number of the most
    significant zero bit.  For a <value> of zero or negative one, -1 will be
    returned.


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative or 2^32 plus the
    difference, otherwise.  The value <borrow> is set to zero if x >= y, or
    one otherwise.


    (support for signed and unsigned multiplies, with 32-bit inputs and a
     64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb, 
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion an input parameters to smaller type to a larger type
    is never applicable, as all data types are of the same size.  That rule
    and the example referring to "double" should be removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit).  The extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

       5. If the formal parameters in both matches are output parameters, a
          conversion from a type with a larger number of bits per component is
          better than a conversion from a type with a smaller number of bits
          per component.  For example, a conversion from an "int16_t" formal
          parameter type to "int"  is better than one from an "int8_t" formal
          parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Errors

    None


New State

    None

New Implementation Dependent State

    None

Issues

    (1) What should this extension be called?

      UNRESOLVED.  This extension borrows from GL_ARB_gpu_shader5, so creating
      some sort of a play on that name would be viable.  However, nothing in
      this extension should require SM5 hardware, so such a name would be a
      little misleading and weird.

      Since the primary purpose is to add integer related functions from
      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
      for now.

    (2) Why is some of the formatting in this extension weird?

      RESOLVED: This extension is formatted to minimize the differences (as
      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
      specification.

    (3) Should ldexp and frexp be included?

      RESOLVED: Yes.  Few GPUs have native instructions to implement these
      functions.  These are generally implemented using existing GLSL built-in
      functions and the other functions provided by this extension.

    (4) Should umulExtended and imulExtended be included?

      RESOLVED: Yes.  These functions should be implementable on any GPU that
      can support the rest of this extension, but the implementation may be
      complex.  The implementation on a GPU that only supports 32bit x 32bit =
      32bit multiplication would be quite expensive.  However, many GPUs
      (including OpenGL 4.0 GPUs that already support this function) have a
      32bit x 16bit = 48bit multiplier.  The implementation there is only
      trivially more expensive than regular 32bit multiplication.

    (5) Should the pack and unpack functions be included?

      RESOLVED: No.  These functions are already available via
      GL_ARB_shading_language_packing.

    (6) Should the "BitsTo" functions be included?

      RESOLVED: No.  These functions are already available via
      GL_ARB_shader_bit_encoding.

Revision History

    Rev.      Date     Author    Changes
    ----  -----------  --------  -----------------------------------------
     2     7-Jul-2016  idr       Fix typo in #extension line
     1    20-Jun-2016  idr       Initial version based on GL_ARB_gpu_shader5.