Diffstat (limited to 'src/gallium/docs')
-rw-r--r-- | src/gallium/docs/source/conf.py | 4 |
-rw-r--r-- | src/gallium/docs/source/context.rst | 119 |
-rw-r--r-- | src/gallium/docs/source/cso/blend.rst | 45 |
-rw-r--r-- | src/gallium/docs/source/cso/rasterizer.rst | 101 |
-rw-r--r-- | src/gallium/docs/source/cso/sampler.rst | 10 |
-rw-r--r-- | src/gallium/docs/source/distro.rst | 60 |
-rw-r--r-- | src/gallium/docs/source/exts/tgsi.py | 17 |
-rw-r--r-- | src/gallium/docs/source/glossary.rst | 13 |
-rw-r--r-- | src/gallium/docs/source/screen.rst | 230 |
-rw-r--r-- | src/gallium/docs/source/tgsi.rst | 689 |
10 files changed, 968 insertions, 320 deletions
diff --git a/src/gallium/docs/source/conf.py b/src/gallium/docs/source/conf.py index 9b0c86babdb..59c19ed98dd 100644 --- a/src/gallium/docs/source/conf.py +++ b/src/gallium/docs/source/conf.py @@ -16,13 +16,13 @@ import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. -#sys.path.append(os.path.abspath('.')) +sys.path.append(os.path.abspath('exts')) # -- General configuration ----------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. -extensions = ['sphinx.ext.pngmath'] +extensions = ['sphinx.ext.pngmath', 'tgsi'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst index 21f5f9111a0..a7669575b95 100644 --- a/src/gallium/docs/source/context.rst +++ b/src/gallium/docs/source/context.rst @@ -33,7 +33,11 @@ This state describes how resources in various flavours (textures, buffers, surfaces) are bound to the driver. -* ``set_constant_buffer`` +* ``set_constant_buffer`` sets a constant buffer to be used for a given shader + type. index is used to indicate which buffer to set (some apis may allow + multiple ones to be set, and binding a specific one later, though drivers + are mostly restricted to the first one right now). + * ``set_framebuffer_state`` * ``set_fragment_sampler_textures`` * ``set_vertex_sampler_textures`` @@ -47,11 +51,13 @@ These pieces of state are too small, variable, and/or trivial to have CSO objects. They all follow simple, one-method binding calls, e.g. ``set_edgeflags``. 
-* ``set_edgeflags`` * ``set_blend_color`` * ``set_clip_state`` * ``set_polygon_stipple`` -* ``set_scissor_state`` +* ``set_scissor_state`` sets the bounds for the scissor test, which culls + pixels before blending to render targets. If the :ref:`Rasterizer` does + not have the scissor test enabled, then the scissor bounds never need to + be set since they will not be used. * ``set_viewport_state`` * ``set_vertex_elements`` @@ -72,12 +78,67 @@ stencil-only clears of packed depth-stencil buffers. Drawing ^^^^^^^ -``draw_arrays`` +``draw_arrays`` draws a specified primitive. + +This command is equivalent to calling ``draw_arrays_instanced`` +with ``startInstance`` set to 0 and ``instanceCount`` set to 1. -``draw_elements`` +``draw_elements`` draws a specified primitive using an optional +index buffer. + +This command is equivalent to calling ``draw_elements_instanced`` +with ``startInstance`` set to 0 and ``instanceCount`` set to 1. ``draw_range_elements`` +XXX: this is (probably) a temporary entrypoint, as the range +information should be available from the vertex_buffer state. +Using this to quickly evaluate a specialized path in the draw +module. + +``draw_arrays_instanced`` draws multiple instances of the same primitive. + +This command is equivalent to calling ``draw_elements_instanced`` +with ``indexBuffer`` set to NULL and ``indexSize`` set to 0. + +``draw_elements_instanced`` draws multiple instances of the same primitive +using an optional index buffer. + +For instanceID in the range between ``startInstance`` +and ``startInstance``+``instanceCount``-1, inclusive, draw a primitive +specified by ``mode`` and sequential numbers in the range between ``start`` +and ``start``+``count``-1, inclusive. + +If ``indexBuffer`` is not NULL, it specifies an index buffer with index +byte size of ``indexSize``. The sequential numbers are used to lookup +the index buffer and the resulting indices in turn are used to fetch +vertex attributes. 
+ +If ``indexBuffer`` is NULL, the sequential numbers are used directly +as indices to fetch vertex attributes. + +If a given vertex element has ``instance_divisor`` set to 0, it is said +it contains per-vertex data and effective vertex attribute address needs +to be recalculated for every index. + + attribAddr = ``stride`` * index + ``src_offset`` + +If a given vertex element has ``instance_divisor`` set to non-zero, +it is said it contains per-instance data and effective vertex attribute +address needs to recalculated for every ``instance_divisor``-th instance. + + attribAddr = ``stride`` * instanceID / ``instance_divisor`` + ``src_offset`` + +In the above formulas, ``src_offset`` is taken from the given vertex element +and ``stride`` is taken from a vertex buffer associated with the given +vertex element. + +The calculated attribAddr is used as an offset into the vertex buffer to +fetch the attribute data. + +The value of ``instanceID`` can be read in a vertex shader through a system +value register declared with INSTANCEID semantic name. + Queries ^^^^^^^ @@ -87,9 +148,51 @@ draws. Queries may be nested, though no state tracker currently exercises this. Queries can be created with ``create_query`` and deleted with -``destroy_query``. To enable a query, use ``begin_query``, and when finished, -use ``end_query`` to stop the query. Finally, ``get_query_result`` is used -to retrieve the results. +``destroy_query``. To start a query, use ``begin_query``, and when finished, +use ``end_query`` to end the query. + +``get_query_result`` is used to retrieve the results of a query. If +the ``wait`` parameter is TRUE, then the ``get_query_result`` call +will block until the results of the query are ready (and TRUE will be +returned). Otherwise, if the ``wait`` parameter is FALSE, the call +will not block and the return value will be TRUE if the query has +completed or FALSE otherwise. 
+ +A common type of query is the occlusion query which counts the number of +fragments/pixels which are written to the framebuffer (and not culled by +Z/stencil/alpha testing or shader KILL instructions). + + +Conditional Rendering +^^^^^^^^^^^^^^^^^^^^^ + +A drawing command can be skipped depending on the outcome of a query +(typically an occlusion query). The ``render_condition`` function specifies +the query which should be checked prior to rendering anything. + +If ``render_condition`` is called with ``query`` = NULL, conditional +rendering is disabled and drawing takes place normally. + +If ``render_condition`` is called with a non-null ``query`` subsequent +drawing commands will be predicated on the outcome of the query. If +the query result is zero subsequent drawing commands will be skipped. + +If ``mode`` is PIPE_RENDER_COND_WAIT the driver will wait for the +query to complete before deciding whether to render. + +If ``mode`` is PIPE_RENDER_COND_NO_WAIT and the query has not yet +completed, the drawing command will be executed normally. If the query +has completed, drawing will be predicated on the outcome of the query. + +If ``mode`` is PIPE_RENDER_COND_BY_REGION_WAIT or +PIPE_RENDER_COND_BY_REGION_NO_WAIT rendering will be predicated as above +for the non-REGION modes but in the case that an occulusion query returns +a non-zero result, regions which were occluded may be ommitted by subsequent +drawing commands. This can result in better performance with some GPUs. +Normally, if the occlusion query returned a non-zero result subsequent +drawing happens normally so fragments may be generated, shaded and +processed even where they're known to be obscured. 
+ Flushing ^^^^^^^^ diff --git a/src/gallium/docs/source/cso/blend.rst b/src/gallium/docs/source/cso/blend.rst index fd9e4a1e2d5..55c0f328859 100644 --- a/src/gallium/docs/source/cso/blend.rst +++ b/src/gallium/docs/source/cso/blend.rst @@ -6,9 +6,50 @@ Blend This state controls blending of the final fragments into the target rendering buffers. -XXX it is unresolved what behavior should result if blend_enable is off. +Blend Factors +------------- + +The blend factors largely follow the same pattern as their counterparts +in other modern and legacy drawing APIs. + +XXX blurb about dual-source blends Members ------- -XXX undocumented members +independent_blend_enable + If enabled, blend state is different for each render target, and + for each render target set in the respective member of the rt array. + If disabled, blend state is the same for all render targets, and only + the first member of the rt array contains valid data. +logicop_enable + Enables logic ops. Cannot be enabled at the same time as blending, and + is always the same for all render targets. +logicop_func + The logic operation to use if logic ops are enabled. One of PIPE_LOGICOP. +dither + Whether dithering is enabled. +rt + Contains the per-rendertarget blend state. + +Per-rendertarget Members +------------------------ + +blend_enable + If blending is enabled, perform a blend calculation according to blend + functions and source/destination factors. Otherwise, the incoming fragment + color gets passed unmodified (but colormask still applies). +rgb_func + The blend function to use for rgb channels. One of PIPE_BLEND. +rgb_src_factor + The blend source factor to use for rgb channels. One of PIPE_BLENDFACTOR. +rgb_dst_factor + The blend destination factor to use for rgb channels. One of PIPE_BLENDFACTOR. +alpha_func + The blend function to use for the alpha channel. One of PIPE_BLEND. +alpha_src_factor + The blend source factor to use for the alpha channel. One of PIPE_BLENDFACTOR. 
+alpha_dst_factor + The blend destination factor to use for alpha channel. One of PIPE_BLENDFACTOR. +colormask + Bitmask of which channels to write. Combination of PIPE_MASK bits. diff --git a/src/gallium/docs/source/cso/rasterizer.rst b/src/gallium/docs/source/cso/rasterizer.rst index 00d65fc598a..24cc78c68de 100644 --- a/src/gallium/docs/source/cso/rasterizer.rst +++ b/src/gallium/docs/source/cso/rasterizer.rst @@ -7,32 +7,69 @@ The rasterizer state controls the rendering of points, lines and triangles. Attributes include polygon culling state, line width, line stipple, multisample state, scissoring and flat/smooth shading. - Members ------- +bypass_vs_clip_and_viewport +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Whether the entire TCL pipeline should be bypassed. This implies that +vertices are pre-transformed for the viewport, and will not be run +through the vertex shader. + +.. note:: + + Implementations may still clip away vertices that are not in the viewport + when this is set. + flatshade - If set, the provoking vertex of each polygon is used to determine the - color of the entire polygon. If not set, fragment colors will be - interpolated between the vertex colors. - Note that this is separate from the fragment shader input attributes - CONSTANT, LINEAR and PERSPECTIVE. We need the flatshade state at +^^^^^^^^^ + +If set, the provoking vertex of each polygon is used to determine the color +of the entire polygon. If not set, fragment colors will be interpolated +between the vertex colors. + +The actual interpolated shading algorithm is obviously +implementation-dependent, but will usually be Gourard for most hardware. + +.. note:: + + This is separate from the fragment shader input attributes + CONSTANT, LINEAR and PERSPECTIVE. The flatshade state is needed at clipping time to determine how to set the color of new vertices. - Also note that the draw module can implement flat shading by copying - the provoking vertex color to all the other vertices in the primitive. 
+ + :ref:`Draw` can implement flat shading by copying the provoking vertex + color to all the other vertices in the primitive. flatshade_first - Whether the first vertex should be the provoking vertex, for most - primitives. If not set, the last vertex is the provoking vertex. +^^^^^^^^^^^^^^^ + +Whether the first vertex should be the provoking vertex, for most primitives. +If not set, the last vertex is the provoking vertex. + +There are several important exceptions to the specification of this rule. + +* ``PIPE_PRIMITIVE_POLYGON``: The provoking vertex is always the first + vertex. If the caller wishes to change the provoking vertex, they merely + need to rotate the vertices themselves. +* ``PIPE_PRIMITIVE_QUAD``, ``PIPE_PRIMITIVE_QUAD_STRIP``: This option has no + effect; the provoking vertex is always the last vertex. +* ``PIPE_PRIMITIVE_TRIANGLE_FAN``: When set, the provoking vertex is the + second vertex, not the first. This permits each segment of the fan to have + a different color. + +Other Members +^^^^^^^^^^^^^ light_twoside - If set, there are per-vertex back-facing colors. The draw module + If set, there are per-vertex back-facing colors. :ref:`Draw` uses this state along with the front/back information to set the final vertex colors prior to rasterization. front_winding Indicates the window order of front-facing polygons, either PIPE_WINDING_CW or PIPE_WINDING_CCW + cull_mode Indicates which polygons to cull, either PIPE_WINDING_NONE (cull no polygons), PIPE_WINDING_CW (cull clockwise-winding polygons), @@ -68,7 +105,7 @@ line_stipple_enable line_stipple_pattern 16-bit bitfield of on/off flags, used to pattern the line stipple. line_stipple_factor - When drawinga stippled line, each bit in the stipple pattern is + When drawing a stippled line, each bit in the stipple pattern is repeated N times, where N = line_stipple_factor + 1. line_last_pixel Controls whether the last pixel in a line is drawn or not. 
OpenGL @@ -98,7 +135,7 @@ sprite_coord_mode coordinate (0,0,0,1). For PIPE_SPRITE_COORD_UPPER_LEFT, the upper-left vertex will have coordinate (0,0,0,1). - This state is needed by the 'draw' module because that's where each + This state is needed by :ref:`Draw` because that's where each point vertex is converted into four quad vertices. There's no other place to emit the new vertex texture coordinates which are required for sprite rendering. @@ -118,45 +155,9 @@ scissor Whether the scissor test is enabled. multisample - Whether :ref:`MSAA` is enabled. - -bypass_vs_clip_and_viewport - Whether the entire TCL pipeline should be bypassed. This implies that - vertices are pre-transformed for the viewport, and will not be run - through the vertex shader. Note that implementations may still clip away - vertices that are not in the viewport. + Whether :term:`MSAA` is enabled. gl_rasterization_rules Whether the rasterizer should use (0.5, 0.5) pixel centers. When not set, the rasterizer will use (0, 0) for pixel centers. - -Notes ------ - -flatshade -^^^^^^^^^ - -The actual interpolated shading algorithm is obviously -implementation-dependent, but will usually be Gourard for most hardware. - -bypass_vs_clip_and_viewport -^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -When set, this implies that vertices are pre-transformed for the viewport, and -will not be run through the vertex shader. Note that implementations may still -clip away vertices that are not visible. - -flatshade_first -^^^^^^^^^^^^^^^ - -There are several important exceptions to the specification of this rule. - -* ``PIPE_PRIMITIVE_POLYGON``: The provoking vertex is always the first - vertex. If the caller wishes to change the provoking vertex, they merely - need to rotate the vertices themselves. -* ``PIPE_PRIMITIVE_QUAD``, ``PIPE_PRIMITIVE_QUAD_STRIP``: This option has no - effect; the provoking vertex is always the last vertex. 
-* ``PIPE_PRIMITIVE_TRIANGLE_FAN``: When set, the provoking vertex is the - second vertex, not the first. This permits each segment of the fan to have - a different color. diff --git a/src/gallium/docs/source/cso/sampler.rst b/src/gallium/docs/source/cso/sampler.rst index e3f1757f57a..044ffffcb4f 100644 --- a/src/gallium/docs/source/cso/sampler.rst +++ b/src/gallium/docs/source/cso/sampler.rst @@ -12,8 +12,6 @@ with the traditional (S, T, R, Q) notation. Members ------- -XXX undocumented compare_mode, compare_func - wrap_s How to wrap the S coordinate. One of PIPE_TEX_WRAP. wrap_t @@ -27,12 +25,16 @@ min_mip_filter PIPE_TEX_FILTER. mag_img_filter The filter to use when magnifying texels. One of PIPE_TEX_FILTER. +compare_mode + If set to PIPE_TEX_COMPARE_R_TO_TEXTURE, texture output is computed + according to compare_func, using r coord and the texture value as operands. + If set to PIPE_TEX_COMPARE_NONE, no comparison calculation is performed. +compare_func + How the comparison is computed. One of PIPE_FUNC. normalized_coords Whether the texture coordinates are normalized. If normalized, they will always be in [0, 1]. If not, they will be in the range of each dimension of the loaded texture. -prefilter - XXX From the Doxy, "weird sampling state exposed by some APIs." Refine. lod_bias The bias to apply to the level of detail. min_lod diff --git a/src/gallium/docs/source/distro.rst b/src/gallium/docs/source/distro.rst index 33e846e33d2..100afe33972 100644 --- a/src/gallium/docs/source/distro.rst +++ b/src/gallium/docs/source/distro.rst @@ -31,21 +31,6 @@ Wrapper driver. LLVM Softpipe ^^^^^^^^^^^^^ -nVidia nv04 -^^^^^^^^^^^ - -Deprecated. - -nVidia nv10 -^^^^^^^^^^^ - -Deprecated. - -nVidia nv20 -^^^^^^^^^^^ - -Deprecated. - nVidia nv30 ^^^^^^^^^^^ @@ -61,10 +46,7 @@ VMWare SVGA ATI r300 ^^^^^^^^ -AMD/ATI r600 -^^^^^^^^^^^^ - -Highly experimental. +Testing-quality. 
Softpipe ^^^^^^^^ @@ -106,20 +88,50 @@ Xorg XFree86 DDX Auxiliary --------- +OS +^^ + +The OS module contains the abstractions for basic operating system services: + +* memory allocation +* simple message logging +* obtaining run-time configuration option +* threading primitives + +This is the bare minimum required to port Gallium to a new platform. + +The OS module already provides the implementations of these abstractions for +the most common platforms. When targeting an embedded platform no +implementation will be provided -- these must be provided separately. + CSO Cache ^^^^^^^^^ +The CSO cache is used to accelerate preparation of state by saving +driver-specific state structures for later use. + +.. _draw: + Draw ^^^^ +Draw is a software :term:`TCL` pipeline for hardware that lacks vertex shaders +or other essential parts of pre-rasterization vertex preparation. + Gallivm ^^^^^^^ Indices ^^^^^^^ -Pipe Buffer Manager -^^^^^^^^^^^^^^^^^^^ +Indices provides tools for translating or generating element indices for +use with element-based rendering. + +Pipe Buffer Managers +^^^^^^^^^^^^^^^^^^^^ + +Each of these managers provides various services to drivers that are not +fully utilizing a memory manager. Remote Debugger ^^^^^^^^^^^^^^^ @@ -127,12 +139,12 @@ Remote Debugger Runtime Assembly Emission ^^^^^^^^^^^^^^^^^^^^^^^^^ -Surface Context Tracker -^^^^^^^^^^^^^^^^^^^^^^^ - TGSI ^^^^ +The TGSI auxiliary module provides basic utilities for manipulating TGSI +streams. 
+ Translate ^^^^^^^^^ diff --git a/src/gallium/docs/source/exts/tgsi.py b/src/gallium/docs/source/exts/tgsi.py new file mode 100644 index 00000000000..e92cd5c4d1b --- /dev/null +++ b/src/gallium/docs/source/exts/tgsi.py @@ -0,0 +1,17 @@ +# tgsi.py +# Sphinx extension providing formatting for TGSI opcodes +# (c) Corbin Simpson 2010 + +import docutils.nodes +import sphinx.addnodes + +def parse_opcode(env, sig, signode): + opcode, desc = sig.split("-", 1) + opcode = opcode.strip().upper() + desc = " (%s)" % desc.strip() + signode += sphinx.addnodes.desc_name(opcode, opcode) + signode += sphinx.addnodes.desc_annotation(desc, desc) + return opcode + +def setup(app): + app.add_description_unit("opcode", "opcode", "%s (TGSI opcode)", parse_opcode) diff --git a/src/gallium/docs/source/glossary.rst b/src/gallium/docs/source/glossary.rst index 6a9110ce786..0696cb5d277 100644 --- a/src/gallium/docs/source/glossary.rst +++ b/src/gallium/docs/source/glossary.rst @@ -8,3 +8,16 @@ Glossary Multi-Sampled Anti-Aliasing. A basic anti-aliasing technique that takes multiple samples of the depth buffer, and uses this information to smooth the edges of polygons. + + TCL + Transform, Clipping, & Lighting. The three stages of preparation in a + rasterizing pipeline prior to the actual rasterization of vertices into + fragments. + + NPOT + Non-power-of-two. Usually applied to textures which have at least one + dimension which is not a power of two. + + LOD + Level of Detail. Also spelled "LoD." The value that determines when the + switches between mipmaps occur during texture sampling. diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 9631e6967ef..27f65522b69 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -3,6 +3,160 @@ Screen A screen is an object representing the context-independent part of a device. +Useful Flags +------------ + +.. 
_pipe_cap: + +PIPE_CAP +^^^^^^^^ + +Pipe capabilities help expose hardware functionality not explicitly required +by Gallium. For floating-point values, use :ref:`get_paramf`, and for boolean +or integer values, use :ref:`get_param`. + +The integer capabilities: + +* ``MAX_TEXTURE_IMAGE_UNITS``: The maximum number of samplers available. +* ``NPOT_TEXTURES``: Whether :term:`NPOT` textures may have repeat modes, + normalized coordinates, and mipmaps. +* ``TWO_SIDED_STENCIL``: Whether the stencil test can also affect back-facing + polygons. +* ``GLSL``: Deprecated. +* ``DUAL_SOURCE_BLEND``: Whether dual-source blend factors are supported. See + :ref:`Blend` for more information. +* ``ANISOTROPIC_FILTER``: Whether textures can be filtered anisotropically. +* ``POINT_SPRITE``: Whether point sprites are available. +* ``MAX_RENDER_TARGETS``: The maximum number of render targets that may be + bound. +* ``OCCLUSION_QUERY``: Whether occlusion queries are available. +* ``TEXTURE_SHADOW_MAP``: XXX +* ``MAX_TEXTURE_2D_LEVELS``: The maximum number of mipmap levels available + for a 2D texture. +* ``MAX_TEXTURE_3D_LEVELS``: The maximum number of mipmap levels available + for a 3D texture. +* ``MAX_TEXTURE_CUBE_LEVELS``: The maximum number of mipmap levels available + for a cubemap. +* ``TEXTURE_MIRROR_CLAMP``: Whether mirrored texture coordinates with clamp + are supported. +* ``TEXTURE_MIRROR_REPEAT``: Whether mirrored repeating texture coordinates + are supported. +* ``MAX_VERTEX_TEXTURE_UNITS``: The maximum number of samplers addressable + inside the vertex shader. If this is 0, then the vertex shader cannot + sample textures. +* ``TGSI_CONT_SUPPORTED``: Whether the TGSI CONT opcode is supported. +* ``BLEND_EQUATION_SEPARATE``: Whether alpha blend equations may be different + from color blend equations, in :ref:`Blend` state. +* ``SM3``: Whether the vertex shader and fragment shader support equivalent + opcodes to the Shader Model 3 specification. 
XXX oh god this is horrible +* ``MAX_PREDICATE_REGISTERS``: XXX +* ``MAX_COMBINED_SAMPLERS``: The total number of samplers accessible from + the vertex and fragment shader, inclusive. +* ``MAX_CONST_BUFFERS``: Maximum number of constant buffers that can be bound + to any shader stage using ``set_constant_buffer``. If 0 or 1, the pipe will + only permit binding one constant buffer per shader, and the shaders will + not permit two-dimensional access to constants. +* ``MAX_CONST_BUFFER_SIZE``: Maximum byte size of a single constant buffer. +* ``INDEP_BLEND_ENABLE``: Whether per-rendertarget blend enabling and channel + masks are supported. If 0, then the first rendertarget's blend mask is + replicated across all MRTs. +* ``INDEP_BLEND_FUNC``: Whether per-rendertarget blend functions are + available. If 0, then the first rendertarget's blend functions affect all + MRTs. +* ``PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT``: Whether the TGSI property + FS_COORD_ORIGIN with value UPPER_LEFT is supported. +* ``PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT``: Whether the TGSI property + FS_COORD_ORIGIN with value LOWER_LEFT is supported. +* ``PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER``: Whether the TGSI + property FS_COORD_PIXEL_CENTER with value HALF_INTEGER is supported. +* ``PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER``: Whether the TGSI + property FS_COORD_PIXEL_CENTER with value INTEGER is supported. + +The floating-point capabilities: + +* ``MAX_LINE_WIDTH``: The maximum width of a regular line. +* ``MAX_LINE_WIDTH_AA``: The maximum width of a smoothed line. +* ``MAX_POINT_WIDTH``: The maximum width and height of a point. +* ``MAX_POINT_WIDTH_AA``: The maximum width and height of a smoothed point. +* ``MAX_TEXTURE_ANISOTROPY``: The maximum level of anisotropy that can be + applied to anisotropically filtered textures. +* ``MAX_TEXTURE_LOD_BIAS``: The maximum :term:`LOD` bias that may be applied + to filtered textures. 
+* ``GUARD_BAND_LEFT``, ``GUARD_BAND_TOP``, ``GUARD_BAND_RIGHT``, + ``GUARD_BAND_BOTTOM``: XXX + +XXX Is there a better home for this? vvv + +If 0 is returned, the driver is not aware of multiple constant buffers, +supports binding of only one constant buffer, and does not support +two-dimensional CONST register file access in TGSI shaders. + +If a value greater than 0 is returned, the driver can have multiple +constant buffers bound to shader stages. The CONST register file can +be accessed with two-dimensional indices, like in the example below. + +DCL CONST[0][0..7] # declare first 8 vectors of constbuf 0 +DCL CONST[3][0] # declare first vector of constbuf 3 +MOV OUT[0], CONST[0][3] # copy vector 3 of constbuf 0 + +For backwards compatibility, one-dimensional access to CONST register +file is still supported. In that case, the constbuf index is assumed +to be 0. + +.. _pipe_buffer_usage: + +PIPE_BUFFER_USAGE +^^^^^^^^^^^^^^^^^ + +These flags control buffer creation. Buffers may only have one role, so +care should be taken to not allocate a buffer with the wrong usage. + +* ``PIXEL``: This is the flag to use for all textures. +* ``VERTEX``: A vertex buffer. +* ``INDEX``: An element buffer. +* ``CONSTANT``: A buffer of shader constants. + +Buffers are inevitably abstracting the pipe's underlying memory management, +so many of their usage flags can be used to direct the way the buffer is +handled. + +* ``CPU_READ``, ``CPU_WRITE``: Whether the user will map and, in the case of + the latter, write to, the buffer. The convenience flag ``CPU_READ_WRITE`` is + available to signify a read/write buffer. +* ``GPU_READ``, ``GPU_WRITE``: Whether the driver will internally need to + read from or write to the buffer. The latter will only happen if the buffer + is made into a render target. +* ``DISCARD``: When set on a map, the contents of the map will be discarded + beforehand. Cannot be used with ``CPU_READ``. 
+* ``DONTBLOCK``: When set on a map, the map will fail if the buffer cannot be + mapped immediately. +* ``UNSYNCHRONIZED``: When set on a map, any outstanding operations on the + buffer will be ignored. The interaction of any writes to the map and any + operations pending with the buffer are undefined. Cannot be used with + ``CPU_READ``. +* ``FLUSH_EXPLICIT``: When set on a map, written ranges of the map require + explicit flushes using :ref:`buffer_flush_mapped_range`. Requires + ``CPU_WRITE``. + +.. _pipe_texture_usage: + +PIPE_TEXTURE_USAGE +^^^^^^^^^^^^^^^^^^ + +These flags determine the possible roles a texture may be used for during its +lifetime. Texture usage flags are cumulative and may be combined to create a +texture that can be used as multiple things. + +* ``RENDER_TARGET``: A colorbuffer or pixelbuffer. +* ``DISPLAY_TARGET``: A sharable buffer that can be given to another process. +* ``PRIMARY``: A frontbuffer or scanout buffer. +* ``DEPTH_STENCIL``: A depthbuffer, stencilbuffer, or Z buffer. Gallium does + not explicitly provide for stencil-only buffers, so any stencilbuffer + validated here is implicitly also a depthbuffer. +* ``SAMPLER``: A texture that may be sampled from in a fragment or vertex + shader. +* ``DYNAMIC``: A texture that will be mapped frequently. + Methods ------- @@ -18,22 +172,96 @@ get_vendor Returns the screen vendor. +.. _get_param: + get_param ^^^^^^^^^ Get an integer/boolean screen parameter. +**param** is one of the :ref:`PIPE_CAP` names. + +.. _get_paramf: + get_paramf ^^^^^^^^^^ Get a floating-point screen parameter. +**param** is one of the :ref:`PIPE_CAP` names. + +context_create +^^^^^^^^^^^^^^ + +Create a pipe_context. + +**priv** is private data of the caller, which may be put to various +unspecified uses, typically to do with implementing swapbuffers +and/or front-buffer rendering. + is_format_supported ^^^^^^^^^^^^^^^^^^^ See if a format can be used in a specific manner. 
+**usage** is a bitmask of :ref:`PIPE_TEXTURE_USAGE` flags. + +Returns TRUE if all usages can be satisfied. + +.. note:: + + ``PIPE_TEXTURE_USAGE_DYNAMIC`` is not a valid usage. + +.. _texture_create: + texture_create ^^^^^^^^^^^^^^ -Given a template of texture setup, create a BO-backed texture. +Given a template of texture setup, create a buffer and texture. + +texture_blanket +^^^^^^^^^^^^^^^ + +Like :ref:`texture_create`, but use a supplied buffer instead of creating a +new one. + +texture_destroy +^^^^^^^^^^^^^^^ + +Destroy a texture. The buffer backing the texture is destroyed if it has no +more references. + +buffer_map +^^^^^^^^^^ + +Map a buffer into memory. + +**usage** is a bitmask of :ref:`PIPE_BUFFER_USAGE` flags. + +Returns a pointer to the map, or NULL if the mapping failed. + +buffer_map_range +^^^^^^^^^^^^^^^^ + +Map a range of a buffer into memory. + +The returned map is always relative to the beginning of the buffer, not the +beginning of the mapped range. + +.. _buffer_flush_mapped_range: + +buffer_flush_mapped_range +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Flush a range of mapped memory into a buffer. + +The buffer must have been mapped with ``PIPE_BUFFER_USAGE_FLUSH_EXPLICIT``. + +**usage** is a bitmask of :ref:`PIPE_BUFFER_USAGE` flags. + +buffer_unmap +^^^^^^^^^^^^ + +Unmap a buffer from memory. + +Any pointers into the map should be considered invalid and discarded. diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ef068448e83..c292cd37d5c 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -6,6 +6,23 @@ for describing shaders. Since Gallium is inherently shaderful, shaders are an important part of the API. TGSI is the only intermediate representation used by all drivers. +Basics +------ + +All TGSI instructions, known as *opcodes*, operate on arbitrary-precision +floating-point four-component vectors. 
An opcode may have up to one +destination register, known as *dst*, and between zero and three source +registers, called *src0* through *src2*, or simply *src* if there is only +one. + +Some instructions, like :opcode:`I2F`, permit re-interpretation of vector +components as integers. Other instructions permit using registers as +two-component vectors with double precision; see :ref:`Double Opcodes`. + +When an instruction has a scalar result, the result is usually copied into +each of the components of *dst*. When this happens, the result is said to be +*replicated* to *dst*. :opcode:`RCP` is one such instruction. + Instruction Set --------------- @@ -13,7 +30,7 @@ From GL_NV_vertex_program ^^^^^^^^^^^^^^^^^^^^^^^^^ -ARL - Address Register Load +.. opcode:: ARL - Address Register Load .. math:: @@ -26,7 +43,7 @@ ARL - Address Register Load dst.w = \lfloor src.w\rfloor -MOV - Move +.. opcode:: MOV - Move .. math:: @@ -39,7 +56,7 @@ MOV - Move dst.w = src.w -LIT - Light Coefficients +.. opcode:: LIT - Light Coefficients .. math:: @@ -52,33 +69,25 @@ LIT - Light Coefficients dst.w = 1 -RCP - Reciprocal - -.. math:: +.. opcode:: RCP - Reciprocal - dst.x = \frac{1}{src.x} +This instruction replicates its result. - dst.y = \frac{1}{src.x} +.. math:: - dst.z = \frac{1}{src.x} + dst = \frac{1}{src.x} - dst.w = \frac{1}{src.x} +.. opcode:: RSQ - Reciprocal Square Root -RSQ - Reciprocal Square Root +This instruction replicates its result. .. math:: - dst.x = \frac{1}{\sqrt{|src.x|}} - - dst.y = \frac{1}{\sqrt{|src.x|}} - - dst.z = \frac{1}{\sqrt{|src.x|}} + dst = \frac{1}{\sqrt{|src.x|}} - dst.w = \frac{1}{\sqrt{|src.x|}} - -EXP - Approximate Exponential Base 2 +.. opcode:: EXP - Approximate Exponential Base 2 .. math:: @@ -91,7 +100,7 @@ EXP - Approximate Exponential Base 2 dst.w = 1 -LOG - Approximate Logarithm Base 2 +.. opcode:: LOG - Approximate Logarithm Base 2 .. math:: @@ -104,7 +113,7 @@ LOG - Approximate Logarithm Base 2 dst.w = 1 -MUL - Multiply +.. 
opcode:: MUL - Multiply .. math:: @@ -117,7 +126,7 @@ MUL - Multiply dst.w = src0.w \times src1.w -ADD - Add +.. opcode:: ADD - Add .. math:: @@ -130,33 +139,25 @@ ADD - Add dst.w = src0.w + src1.w -DP3 - 3-component Dot Product - -.. math:: +.. opcode:: DP3 - 3-component Dot Product - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +This instruction replicates its result. - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +.. math:: - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +.. opcode:: DP4 - 4-component Dot Product -DP4 - 4-component Dot Product +This instruction replicates its result. .. math:: - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - -DST - Distance Vector +.. opcode:: DST - Distance Vector .. math:: @@ -169,7 +170,7 @@ DST - Distance Vector dst.w = src1.w -MIN - Minimum +.. opcode:: MIN - Minimum .. math:: @@ -182,7 +183,7 @@ MIN - Minimum dst.w = min(src0.w, src1.w) -MAX - Maximum +.. opcode:: MAX - Maximum .. math:: @@ -195,7 +196,7 @@ MAX - Maximum dst.w = max(src0.w, src1.w) -SLT - Set On Less Than +.. opcode:: SLT - Set On Less Than .. math:: @@ -208,7 +209,7 @@ SLT - Set On Less Than dst.w = (src0.w < src1.w) ? 1 : 0 -SGE - Set On Greater Equal Than +.. opcode:: SGE - Set On Greater Equal Than .. 
math:: @@ -221,7 +222,7 @@ SGE - Set On Greater Equal Than dst.w = (src0.w >= src1.w) ? 1 : 0 -MAD - Multiply And Add +.. opcode:: MAD - Multiply And Add .. math:: @@ -234,7 +235,7 @@ MAD - Multiply And Add dst.w = src0.w \times src1.w + src2.w -SUB - Subtract +.. opcode:: SUB - Subtract .. math:: @@ -247,7 +248,7 @@ SUB - Subtract dst.w = src0.w - src1.w -LRP - Linear Interpolate +.. opcode:: LRP - Linear Interpolate .. math:: @@ -260,7 +261,7 @@ LRP - Linear Interpolate dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w -CND - Condition +.. opcode:: CND - Condition .. math:: @@ -273,7 +274,7 @@ CND - Condition dst.w = (src2.w > 0.5) ? src0.w : src1.w -DP2A - 2-component Dot Product And Add +.. opcode:: DP2A - 2-component Dot Product And Add .. math:: @@ -286,7 +287,7 @@ DP2A - 2-component Dot Product And Add dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x -FRAC - Fraction +.. opcode:: FRAC - Fraction .. math:: @@ -299,7 +300,7 @@ FRAC - Fraction dst.w = src.w - \lfloor src.w\rfloor -CLAMP - Clamp +.. opcode:: CLAMP - Clamp .. math:: @@ -312,9 +313,9 @@ CLAMP - Clamp dst.w = clamp(src0.w, src1.w, src2.w) -FLR - Floor +.. opcode:: FLR - Floor -This is identical to ARL. +This is identical to :opcode:`ARL`. .. math:: @@ -327,7 +328,7 @@ This is identical to ARL. dst.w = \lfloor src.w\rfloor -ROUND - Round +.. opcode:: ROUND - Round .. math:: @@ -340,45 +341,33 @@ ROUND - Round dst.w = round(src.w) -EX2 - Exponential Base 2 - -.. math:: +.. opcode:: EX2 - Exponential Base 2 - dst.x = 2^{src.x} +This instruction replicates its result. - dst.y = 2^{src.x} +.. math:: - dst.z = 2^{src.x} + dst = 2^{src.x} - dst.w = 2^{src.x} +.. opcode:: LG2 - Logarithm Base 2 -LG2 - Logarithm Base 2 +This instruction replicates its result. .. math:: - dst.x = \log_2{src.x} - - dst.y = \log_2{src.x} + dst = \log_2{src.x} - dst.z = \log_2{src.x} - dst.w = \log_2{src.x} +.. opcode:: POW - Power - -POW - Power +This instruction replicates its result. .. 
math:: - dst.x = src0.x^{src1.x} - - dst.y = src0.x^{src1.x} - - dst.z = src0.x^{src1.x} + dst = src0.x^{src1.x} - dst.w = src0.x^{src1.x} - -XPD - Cross Product +.. opcode:: XPD - Cross Product .. math:: @@ -391,7 +380,7 @@ XPD - Cross Product dst.w = 1 -ABS - Absolute +.. opcode:: ABS - Absolute .. math:: @@ -404,48 +393,36 @@ ABS - Absolute dst.w = |src.w| -RCC - Reciprocal Clamped +.. opcode:: RCC - Reciprocal Clamped + +This instruction replicates its result. XXX cleanup on aisle three .. math:: - dst.x = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.y = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) + dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - dst.z = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - dst.w = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) +.. opcode:: DPH - Homogeneous Dot Product - -DPH - Homogeneous Dot Product +This instruction replicates its result. .. math:: - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w +.. opcode:: COS - Cosine -COS - Cosine +This instruction replicates its result. .. math:: - dst.x = \cos{src.x} - - dst.y = \cos{src.x} - - dst.z = \cos{src.x} + dst = \cos{src.x} - dst.w = \cos{src.x} - -DDX - Derivative Relative To X +.. opcode:: DDX - Derivative Relative To X .. 
math:: @@ -458,7 +435,7 @@ DDX - Derivative Relative To X dst.w = partialx(src.w) -DDY - Derivative Relative To Y +.. opcode:: DDY - Derivative Relative To Y .. math:: @@ -471,32 +448,32 @@ DDY - Derivative Relative To Y dst.w = partialy(src.w) -KILP - Predicated Discard +.. opcode:: KILP - Predicated Discard discard -PK2H - Pack Two 16-bit Floats +.. opcode:: PK2H - Pack Two 16-bit Floats TBD -PK2US - Pack Two Unsigned 16-bit Scalars +.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars TBD -PK4B - Pack Four Signed 8-bit Scalars +.. opcode:: PK4B - Pack Four Signed 8-bit Scalars TBD -PK4UB - Pack Four Unsigned 8-bit Scalars +.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars TBD -RFL - Reflection Vector +.. opcode:: RFL - Reflection Vector .. math:: @@ -508,145 +485,171 @@ RFL - Reflection Vector dst.w = 1 -Considered for removal. +.. note:: + + Considered for removal. -SEQ - Set On Equal +.. opcode:: SEQ - Set On Equal .. math:: dst.x = (src0.x == src1.x) ? 1 : 0 + dst.y = (src0.y == src1.y) ? 1 : 0 + dst.z = (src0.z == src1.z) ? 1 : 0 + dst.w = (src0.w == src1.w) ? 1 : 0 -SFL - Set On False +.. opcode:: SFL - Set On False + +This instruction replicates its result. .. math:: - dst.x = 0 - dst.y = 0 - dst.z = 0 - dst.w = 0 + dst = 0 + +.. note:: + + Considered for removal. -Considered for removal. -SGT - Set On Greater Than +.. opcode:: SGT - Set On Greater Than .. math:: dst.x = (src0.x > src1.x) ? 1 : 0 + dst.y = (src0.y > src1.y) ? 1 : 0 - dst.z = (src0.z > src1.z) ? 1 : 0 - dst.w = (src0.w > src1.w) ? 1 : 0 + dst.z = (src0.z > src1.z) ? 1 : 0 -SIN - Sine + dst.w = (src0.w > src1.w) ? 1 : 0 -.. math:: - dst.x = \sin{src.x} +.. opcode:: SIN - Sine - dst.y = \sin{src.x} +This instruction replicates its result. - dst.z = \sin{src.x} +.. math:: - dst.w = \sin{src.x} + dst = \sin{src.x} -SLE - Set On Less Equal Than +.. opcode:: SLE - Set On Less Equal Than .. math:: dst.x = (src0.x <= src1.x) ? 1 : 0 + dst.y = (src0.y <= src1.y) ? 
1 : 0 + dst.z = (src0.z <= src1.z) ? 1 : 0 + dst.w = (src0.w <= src1.w) ? 1 : 0 -SNE - Set On Not Equal +.. opcode:: SNE - Set On Not Equal .. math:: dst.x = (src0.x != src1.x) ? 1 : 0 + dst.y = (src0.y != src1.y) ? 1 : 0 + dst.z = (src0.z != src1.z) ? 1 : 0 + dst.w = (src0.w != src1.w) ? 1 : 0 -STR - Set On True +.. opcode:: STR - Set On True + +This instruction replicates its result. .. math:: - dst.x = 1 - dst.y = 1 - dst.z = 1 - dst.w = 1 + dst = 1 -TEX - Texture Lookup +.. opcode:: TEX - Texture Lookup TBD -TXD - Texture Lookup with Derivatives +.. opcode:: TXD - Texture Lookup with Derivatives TBD -TXP - Projective Texture Lookup +.. opcode:: TXP - Projective Texture Lookup TBD -UP2H - Unpack Two 16-Bit Floats +.. opcode:: UP2H - Unpack Two 16-Bit Floats TBD - Considered for removal. +.. note:: + + Considered for removal. -UP2US - Unpack Two Unsigned 16-Bit Scalars +.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars TBD - Considered for removal. +.. note:: + + Considered for removal. -UP4B - Unpack Four Signed 8-Bit Values +.. opcode:: UP4B - Unpack Four Signed 8-Bit Values TBD - Considered for removal. +.. note:: -UP4UB - Unpack Four Unsigned 8-Bit Scalars + Considered for removal. + +.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars TBD - Considered for removal. +.. note:: + + Considered for removal. -X2D - 2D Coordinate Transformation +.. opcode:: X2D - 2D Coordinate Transformation .. math:: dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y + dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w + dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y + dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w -Considered for removal. +.. note:: + + Considered for removal. From GL_NV_vertex_program2 ^^^^^^^^^^^^^^^^^^^^^^^^^^ -ARA - Address Register Add +.. opcode:: ARA - Address Register Add TBD - Considered for removal. +.. note:: -ARR - Address Register Load With Round + Considered for removal. + +.. 
opcode:: ARR - Address Register Load With Round .. math:: @@ -659,26 +662,28 @@ ARR - Address Register Load With Round dst.w = round(src.w) -BRA - Branch +.. opcode:: BRA - Branch pc = target - Considered for removal. +.. note:: + + Considered for removal. -CAL - Subroutine Call +.. opcode:: CAL - Subroutine Call push(pc) pc = target -RET - Subroutine Call Return +.. opcode:: RET - Subroutine Call Return pc = pop() Potential restrictions: * Only occurs at end of function. -SSG - Set Sign +.. opcode:: SSG - Set Sign .. math:: @@ -691,7 +696,7 @@ SSG - Set Sign dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 -CMP - Compare +.. opcode:: CMP - Compare .. math:: @@ -704,7 +709,7 @@ CMP - Compare dst.w = (src0.w < 0) ? src1.w : src2.w -KIL - Conditional Discard +.. opcode:: KIL - Conditional Discard .. math:: @@ -713,7 +718,7 @@ KIL - Conditional Discard endif -SCS - Sine Cosine +.. opcode:: SCS - Sine Cosine .. math:: @@ -726,12 +731,12 @@ SCS - Sine Cosine dst.y = 1 -TXB - Texture Lookup With Bias +.. opcode:: TXB - Texture Lookup With Bias TBD -NRM - 3-component Vector Normalise +.. opcode:: NRM - 3-component Vector Normalise .. math:: @@ -744,7 +749,7 @@ NRM - 3-component Vector Normalise dst.w = 1 -DIV - Divide +.. opcode:: DIV - Divide .. math:: @@ -757,35 +762,31 @@ DIV - Divide dst.w = \frac{src0.w}{src1.w} -DP2 - 2-component Dot Product +.. opcode:: DP2 - 2-component Dot Product -.. math:: +This instruction replicates its result. - dst.x = src0.x \times src1.x + src0.y \times src1.y - - dst.y = src0.x \times src1.x + src0.y \times src1.y - - dst.z = src0.x \times src1.x + src0.y \times src1.y +.. math:: - dst.w = src0.x \times src1.x + src0.y \times src1.y + dst = src0.x \times src1.x + src0.y \times src1.y -TXL - Texture Lookup With LOD +.. opcode:: TXL - Texture Lookup With LOD TBD -BRK - Break +.. opcode:: BRK - Break TBD -IF - If +.. opcode:: IF - If TBD -BGNFOR - Begin a For-Loop +.. 
opcode:: BGNFOR - Begin a For-Loop dst.x = floor(src.x) dst.y = floor(src.y) @@ -798,25 +799,31 @@ BGNFOR - Begin a For-Loop Note: The destination must be a loop register. The source must be a constant register. - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: + + Considered for removal. -REP - Repeat +.. opcode:: REP - Repeat TBD -ELSE - Else +.. opcode:: ELSE - Else TBD -ENDIF - End If +.. opcode:: ENDIF - End If TBD -ENDFOR - End a For-Loop +.. opcode:: ENDFOR - End a For-Loop dst.x = dst.x + dst.z dst.y = dst.y - 1.0 @@ -827,30 +834,48 @@ ENDFOR - End a For-Loop Note: The destination must be a loop register. - Considered for cleanup / removal. +.. note:: -ENDREP - End Repeat + Considered for cleanup. + +.. note:: + + Considered for removal. + +.. opcode:: ENDREP - End Repeat TBD -PUSHA - Push Address Register On Stack +.. opcode:: PUSHA - Push Address Register On Stack push(src.x) push(src.y) push(src.z) push(src.w) - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: + + Considered for removal. -POPA - Pop Address Register From Stack +.. opcode:: POPA - Pop Address Register From Stack dst.w = pop() dst.z = pop() dst.y = pop() dst.x = pop() - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: + + Considered for removal. From GL_NV_gpu_program4 @@ -858,7 +883,7 @@ From GL_NV_gpu_program4 Support for these opcodes indicated by a special pipe capability bit (TBD). -CEIL - Ceiling +.. opcode:: CEIL - Ceiling .. math:: @@ -871,7 +896,7 @@ CEIL - Ceiling dst.w = \lceil src.w\rceil -I2F - Integer To Float +.. opcode:: I2F - Integer To Float .. math:: @@ -884,7 +909,7 @@ I2F - Integer To Float dst.w = (float) src.w -NOT - Bitwise Not +.. opcode:: NOT - Bitwise Not .. math:: @@ -897,7 +922,7 @@ NOT - Bitwise Not dst.w = ~src.w -TRUNC - Truncate +.. opcode:: TRUNC - Truncate .. math:: @@ -910,7 +935,7 @@ TRUNC - Truncate dst.w = trunc(src.w) -SHL - Shift Left +.. 
opcode:: SHL - Shift Left .. math:: @@ -923,7 +948,7 @@ SHL - Shift Left dst.w = src0.w << src1.x -SHR - Shift Right +.. opcode:: SHR - Shift Right .. math:: @@ -936,7 +961,7 @@ SHR - Shift Right dst.w = src0.w >> src1.x -AND - Bitwise And +.. opcode:: AND - Bitwise And .. math:: @@ -949,7 +974,7 @@ AND - Bitwise And dst.w = src0.w & src1.w -OR - Bitwise Or +.. opcode:: OR - Bitwise Or .. math:: @@ -962,7 +987,7 @@ OR - Bitwise Or dst.w = src0.w | src1.w -MOD - Modulus +.. opcode:: MOD - Modulus .. math:: @@ -975,20 +1000,20 @@ MOD - Modulus dst.w = src0.w \bmod src1.w -XOR - Bitwise Xor +.. opcode:: XOR - Bitwise Xor .. math:: - dst.x = src0.x ^ src1.x + dst.x = src0.x \oplus src1.x - dst.y = src0.y ^ src1.y + dst.y = src0.y \oplus src1.y - dst.z = src0.z ^ src1.z + dst.z = src0.z \oplus src1.z - dst.w = src0.w ^ src1.w + dst.w = src0.w \oplus src1.w -SAD - Sum Of Absolute Differences +.. opcode:: SAD - Sum Of Absolute Differences .. math:: @@ -1001,17 +1026,17 @@ SAD - Sum Of Absolute Differences dst.w = |src0.w - src1.w| + src2.w -TXF - Texel Fetch +.. opcode:: TXF - Texel Fetch TBD -TXQ - Texture Size Query +.. opcode:: TXQ - Texture Size Query TBD -CONT - Continue +.. opcode:: CONT - Continue TBD @@ -1020,12 +1045,12 @@ From GL_NV_geometry_program4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -EMIT - Emit +.. opcode:: EMIT - Emit TBD -ENDPRIM - End Primitive +.. opcode:: ENDPRIM - End Primitive TBD @@ -1034,62 +1059,171 @@ From GLSL ^^^^^^^^^^ -BGNLOOP - Begin a Loop +.. opcode:: BGNLOOP - Begin a Loop TBD -BGNSUB - Begin Subroutine +.. opcode:: BGNSUB - Begin Subroutine TBD -ENDLOOP - End a Loop +.. opcode:: ENDLOOP - End a Loop TBD -ENDSUB - End Subroutine +.. opcode:: ENDSUB - End Subroutine TBD -NOP - No Operation +.. opcode:: NOP - No Operation Do nothing. -NRM4 - 4-component Vector Normalise - -.. math:: - - dst.x = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +.. 
opcode:: NRM4 - 4-component Vector Normalise - dst.y = \frac{src.y}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +This instruction replicates its result. - dst.z = \frac{src.z}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +.. math:: - dst.w = \frac{src.w}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} + dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} ps_2_x ^^^^^^^^^^^^ -CALLNZ - Subroutine Call If Not Zero +.. opcode:: CALLNZ - Subroutine Call If Not Zero TBD -IFC - If +.. opcode:: IFC - If TBD -BREAKC - Break Conditional +.. opcode:: BREAKC - Break Conditional TBD +.. _doubleopcodes: + +Double Opcodes +^^^^^^^^^^^^^^^ + +.. opcode:: DADD - Add Double + +.. math:: + + dst.xy = src0.xy + src1.xy + + dst.zw = src0.zw + src1.zw + + +.. opcode:: DDIV - Divide Double + +.. math:: + + dst.xy = src0.xy / src1.xy + + dst.zw = src0.zw / src1.zw + +.. opcode:: DSEQ - Set Double on Equal + +.. math:: + + dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F + + dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F + +.. opcode:: DSLT - Set Double on Less than + +.. math:: + + dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F + + dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F + +.. opcode:: DFRAC - Double Fraction + +.. math:: + + dst.xy = src.xy - \lfloor src.xy\rfloor + + dst.zw = src.zw - \lfloor src.zw\rfloor + + +.. opcode:: DFRACEXP - Convert Double Number to Fractional and Integral Components + +.. math:: + + dst0.xy = frexp(src.xy, dst1.xy) + + dst0.zw = frexp(src.zw, dst1.zw) + +.. opcode:: DLDEXP - Multiply Double Number by Integral Power of 2 + +.. math:: + + dst.xy = ldexp(src0.xy, src1.xy) + + dst.zw = ldexp(src0.zw, src1.zw) + +.. opcode:: DMIN - Minimum Double + +.. math:: + + dst.xy = min(src0.xy, src1.xy) + + dst.zw = min(src0.zw, src1.zw) + +.. opcode:: DMAX - Maximum Double + +.. 
math:: + + dst.xy = max(src0.xy, src1.xy) + + dst.zw = max(src0.zw, src1.zw) + +.. opcode:: DMUL - Multiply Double + +.. math:: + + dst.xy = src0.xy \times src1.xy + + dst.zw = src0.zw \times src1.zw + + +.. opcode:: DMAD - Multiply And Add Doubles + +.. math:: + + dst.xy = src0.xy \times src1.xy + src2.xy + + dst.zw = src0.zw \times src1.zw + src2.zw + + +.. opcode:: DRCP - Reciprocal Double + +.. math:: + + dst.xy = \frac{1}{src.xy} + + dst.zw = \frac{1}{src.zw} + +.. opcode:: DSQRT - Square root double + +.. math:: + + dst.xy = \sqrt{src.xy} + + dst.zw = \sqrt{src.zw} + Explanation of symbols used ------------------------------ @@ -1137,25 +1271,41 @@ Keywords discard Discard fragment. - dst First destination register. + pc Program counter. - dst0 First destination register. + target Label of target instruction. - pc Program counter. - src First source register. +Other tokens +--------------- - src0 First source register. - src1 Second source register. +Declaration +^^^^^^^^^^^ - src2 Third source register. - target Label of target instruction. +Declares a register that will be referenced as an operand in Instruction +tokens. +File field contains register file that is being declared and is one +of TGSI_FILE. -Other tokens ---------------- +UsageMask field specifies which of the register components can be accessed +and is one of TGSI_WRITEMASK. + +Interpolate field is only valid for fragment shader INPUT register files. +It specifies the way input is being interpolated by the rasteriser and is one +of TGSI_INTERPOLATE. + +If Dimension flag is set to 1, a Declaration Dimension token follows. + +If Semantic flag is set to 1, a Declaration Semantic token follows. + +CylindricalWrap bitfield is only valid for fragment shader INPUT register +files. It specifies which register components should be subject to cylindrical +wrapping when interpolating by the rasteriser. 
If TGSI_CYLINDRICAL_WRAP_X +is set to 1, the X component should be interpolated according to cylindrical +wrapping rules. Declaration Semantic @@ -1187,9 +1337,8 @@ are the Cartesian coordinates, and ``w`` is the homogenous coordinate and used for the perspective divide, if enabled. As a vertex shader output, position should be scaled to the viewport. When -used in fragment shaders, position will --- - -XXX --- wait a minute. Should position be in [0,1] for x and y? +used in fragment shaders, position will be in window coordinates. The convention +used depends on the FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER properties. XXX additionally, is there a way to configure the perspective divide? it's accelerated on most chipsets AFAIK... @@ -1266,3 +1415,85 @@ TGSI_SEMANTIC_EDGEFLAG """""""""""""""""""""" XXX no clue + + +Properties +^^^^^^^^^^^^^^^^^^^^^^^^ + + + Properties are general directives that apply to the whole TGSI program. + +FS_COORD_ORIGIN +""""""""""""""" + +Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. +The default value is UPPER_LEFT. + +If UPPER_LEFT, the position will be (0,0) at the upper left corner and +increase downward and rightward. +If LOWER_LEFT, the position will be (0,0) at the lower left corner and +increase upward and rightward. + +OpenGL defaults to LOWER_LEFT, and is configurable with the +GL_ARB_fragment_coord_conventions extension. + +DirectX 9/10 use UPPER_LEFT. + +FS_COORD_PIXEL_CENTER +""""""""""""""""""""" + +Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. +The default value is HALF_INTEGER. + +If HALF_INTEGER, the fractional part of the position will be 0.5. +If INTEGER, the fractional part of the position will be 0.0. + +Note that this does not affect the set of fragments generated by +rasterization, which is instead controlled by gl_rasterization_rules in the +rasterizer. 
+ +OpenGL defaults to HALF_INTEGER, and is configurable with the +GL_ARB_fragment_coord_conventions extension. + +DirectX 9 uses INTEGER. +DirectX 10 uses HALF_INTEGER. + + + +Texture Sampling and Texture Formats +------------------------------------ + +This table shows how texture image components are returned as (x,y,z,w) tuples +by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and +:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as +well. + ++--------------------+--------------+--------------------+--------------+ +| Texture Components | Gallium | OpenGL | Direct3D 9 | ++====================+==============+====================+==============+ +| R | XXX TBD | (r, 0, 0, 1) | (r, 1, 1, 1) | ++--------------------+--------------+--------------------+--------------+ +| RG | XXX TBD | (r, g, 0, 1) | (r, g, 1, 1) | ++--------------------+--------------+--------------------+--------------+ +| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | ++--------------------+--------------+--------------------+--------------+ +| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | ++--------------------+--------------+--------------------+--------------+ +| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | ++--------------------+--------------+--------------------+--------------+ +| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | ++--------------------+--------------+--------------------+--------------+ +| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | ++--------------------+--------------+--------------------+--------------+ +| I | (i, i, i, i) | (i, i, i, i) | N/A | ++--------------------+--------------+--------------------+--------------+ +| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | +| | | [#envmap-bumpmap]_ | | ++--------------------+--------------+--------------------+--------------+ +| Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | +| | | [#depth-tex-mode]_ | | 
++--------------------+--------------+--------------------+--------------+ + +.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt +.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) + or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE. |