summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* mesa: Returns a GL_INVALID_VALUE error on several APIs when buffer size is ↵Eduardo Lima Mitev2015-02-033-12/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | negative Section 2.3.1 (Errors) of the OpenGL 4.5 spec says: "If a negative number is provided where an argument of type sizei or sizeiptr is specified, an INVALID_VALUE error is generated. This patch adds checks for negative buffer size values passed to different APIs. It also moves up the check on other APIs that already had it, making it the first error check performed in the function, for consistency. While there may be other APIs throughtout the code lacking this check (or at least not at the beginning of the function), this patch focuses on the cases that break the dEQP tests listed below. It could be a good excersize for the future to check all other cases, and improve consistency in the order of the checks throughout the whole Mesa code base. This fixes 5 dEQP test: * dEQP-GLES3.functional.negative_api.state.get_attached_shaders * dEQP-GLES3.functional.negative_api.state.get_shader_source * dEQP-GLES3.functional.negative_api.state.get_active_uniform * dEQP-GLES3.functional.negative_api.state.get_active_attrib * dEQP-GLES3.functional.negative_api.shader.program_binary Reviewed-by: Ian Romanick <[email protected]>
* mesa: fix error value in GetFramebufferAttachmentParameteriv for OpenGL ES 3.0Samuel Iglesias Gonsalvez2015-02-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec: "If the default framebuffer is bound to target, then attachment must be BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or STENCIL, identifying the stencil buffer." OpenGL ES 3.0, section 2.5 (GL Errors): "If a command that requires an enumerated value is passed a symbolic constant that is not one of those specified as allowable for that command, an INVALID_ENUM error is generated." Then change the returned error to INVALID_ENUM. Fixes: dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Improve precision of mod(x,y)Iago Toral Quiroga2015-02-035-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement mod(x,y) as y * fract(x/y). This implementation has a down side though: it introduces precision errors due to the fract() operation. Even worse, since the result of fract() is multiplied by y, the larger y gets the larger the precision error we produce, so for large enough numbers the precision loss is significant. Some examples on i965: Operation Precision error ----------------------------------------------------- mod(-1.951171875, 1.9980468750) 0.0000000447 mod(121.57, 13.29) 0.0000023842 mod(3769.12, 321.99) 0.0000762939 mod(3769.12, 1321.99) 0.0001220703 mod(-987654.125, 123456.984375) 0.0160663128 mod( 987654.125, 123456.984375) 0.0312500000 This patch replaces the current lowering pass with a different one (MOD_TO_FLOOR) that follows the recommended implementation in the GLSL man pages: mod(x,y) = x - y * floor(x/y) This implementation eliminates the precision errors at the expense of an additional add instruction on some systems. On systems that can do negate with multiply-add in a single operation this new implementation would come at no additional cost. v2 (Ian Romanick) - Do not clone operands because when they are expressions we would be duplicating them and that can lead to suboptimal code. Fixes the following 16 dEQP tests: dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_* dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_* Reviewed-by: Ian Romanick <[email protected]>
* mesa: Allow querying for GL_PRIMITIVE_RESTART_FIXED_INDEX under GLES 3Eduardo Lima Mitev2015-02-031-0/+1
| | | | | | | | | | | | | | GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX (2.8.1 Transferring Array Elements, page 26) which is not currently possible to query using glGet*() funcs. Fixes 4 dEQP tests: * dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean * dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger * dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64 * dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat Reviewed-by: Ian Romanick <[email protected]>
* i965: Fix negate with unsigned integersIago Toral Quiroga2015-02-032-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For code such as: uint tmp1 = uint(in0); uint tmp2 = -tmp1; float out0 = float(tmp2); We produce code like: mov(8) g5<1>.xF -g9<4,4,1>.xUD which does not produce correct results. This code produces the results we would expect if tmp1 and tmp2 were signed integers instead. It seems that a similar problem was detected and addressed when using negations with unsigned integers as part of condionals, but it looks like the problem has a wider impact than that. This patch fixes the problem by preventing copy-propagation of negated UD registers in all scenarios, not only in conditionals. Fixes the following 24 dEQP tests: dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_* Reviewed-by: Anuj Phogat <[email protected]>
* st/mesa: add EXT_polygon_offset_clamp supportIlia Mirkin2015-02-022-0/+2
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* i965/gen6+: enable EXT_polygon_offset_clampIlia Mirkin2015-02-024-3/+4
| | | | | | | Replace the hard-coded 0's with the context clamp value. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: add support for GL_EXT_polygon_offset_clampIlia Mirkin2015-02-0212-16/+73
| | | | | | | | | | Nothing enables the extension yet, but the values are now available. The spec calls for it to only be exposed for GL 3.3+, which is core-only in mesa. Instead we allow any driver to enable it, including in a compat context for any GL version. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* glapi: add GL_EXT_polygon_offset_clampIlia Mirkin2015-02-023-1/+13
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* i965: Add a better PRM citation for the IMS dimension mangling.Kenneth Graunke2015-02-021-1/+22
| | | | | | | | | | | | | | | Paul originally had to reverse engineer these formulas based on the description about how the sampler works. The description here is not the easiest to follow - especially given that it's from the Sandybridge era, when the hardware only did 4x multisampling. Jordan and I recently found another part of the documentation where they simply state that IMS dimensions must be adjusted by a set of formulas. Quoting this section provides an easy to follow explanation for the code, including 2x/4x/8x/16x. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* swrast: Whitespace fixes.Laura Ekstrand2015-02-023-12/+12
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* DD: Refactor BlitFramebuffer.Laura Ekstrand2015-02-0220-83/+130
| | | | | | | | | In preparation for glBlitNamedFramebuffer, the DD table function BlitFramebuffer needs to accept two arbitrary framebuffer objects rather than assuming ctx->ReadBuffer and ctx->DrawBuffer. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Don't use tiled_memcpy to download from RGBX or BGRX surfacesJason Ekstrand2015-02-022-0/+14
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841 Reviewed-by: Anuj Phogat <[email protected]>
* dir-locals.el: Don't set variables for non-programming modesNeil Roberts2015-02-021-1/+1
| | | | | | | | | | | | | | This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. v2: Apply to all the .dir-locals.el files, not just the one in the root directory. Acked-by: Michel Dänzer <[email protected]>
* i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAYIago Toral Quiroga2015-02-021-1/+6
| | | | | | | | | | | For GL_TEXTURE_1D_ARRAY targets we store the depth of the array in the Height field and leave Depth=1 in the underlying texture object. When we call intel_miptree_copy_teximage in the process of re-creating a miptree (possibily because the number of miplevels has changed) we didn't account for this, so we where only copying texture images for the first slice. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled bufferJason Ekstrand2015-01-311-0/+7
| | | | | | | | | | The GL spec guarantees that glGetTexImage will never get a multisampled texture, but this is not true for glReadPixels. If we get a multisampled buffer, we have to do a multisample resolve on it before we can pull the data down for the user. Since this isn't practical to handle in tiled_memcpy, we just fall back to the other paths that can handle this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable L3 caching of buffer surfaces.Francisco Jerez2015-01-314-9/+3
| | | | | | | | | | | | | | | And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its semantics vary greatly from one generation to another, so it kind of encourages the caller to pass 0 which is the only valid setting across generations. After this commit the hardware-specific code decides what the best cacheability settings are for buffer surfaces, just like we do for textures. This together with some additional changes coming is expected to improve performance of pull constants, buffer textures, atomic counters and image objects on Gen7 and up. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/pixel_read: Properly flip the results for window system buffersJason Ekstrand2015-01-301-0/+15
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841 Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Support a signed linear pitchJason Ekstrand2015-01-302-17/+17
| | | | Reviewed-by: Chad Versace <[email protected]>
* main: Add STENCIL_INDEX formats to base_tex_formatJason Ekstrand2015-01-301-0/+10
| | | | | | | | This fixes a bug on BDW when our meta-based stencil blit path assert-fails due to an invalid internal format even though we do support the ARB_stencil_texturing extension. Reviewed-by: Matt Turner <[email protected]>
* teximage: Don't indent switch casesJason Ekstrand2015-01-301-146/+146
| | | | | | No functional change. Reviewed-by: Matt Turner <[email protected]>
* mesa: remove some dead display list codeBrian Paul2015-01-301-80/+11
| | | | | | | The size of a Node is always four bytes so no need for the old code that was used when sizeof(Node)==8. Reviewed-by: Matt Turner <[email protected]>
* mesa: remove stale comment in dlist.c codeBrian Paul2015-01-301-1/+1
| | | | | | sizeof(Node) is always 4 bytes. Reviewed-by: Matt Turner <[email protected]>
* mesa: s/union gl_dlist_node/Node/ in dlist.c codeBrian Paul2015-01-301-3/+3
| | | | | | Just minor clean-up. Reviewed-by: Matt Turner <[email protected]>
* mesa: fix display list 8-byte alignment issueBrian Paul2015-01-303-7/+74
| | | | | | | | | | | | | | | | | | | | | The _mesa_dlist_alloc() function is only guaranteed to return a pointer with 4-byte alignment. On 64-bit systems which don't support unaligned loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code. The solution is to add a new _mesa_dlist_alloc_aligned() function which will return a pointer to an 8-byte aligned address on 64-bit systems. This is accomplished by inserting a 4-byte NOP instruction in the display list when needed. The only place this actually matters is the VBO code where we need to allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte aligned (just as if it were malloc'd). The gears demo and others hit this bug. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662 Cc: "10.4" <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* i965/skl: Force a BINDING_TABLE_POINTER_* after push constant commandNeil Roberts2015-01-301-0/+7
| | | | | | | | | | | | | According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This patch just makes it set the BRW_NEW_SURFACES state when uploading the push constants to ensure the binding tables will be updated. This fixes the fbo-blending-formats Piglit test and possibly others. Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Don't write depth when decompressing tex-imagesTopi Pohjolainen2015-01-301-1/+1
| | | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta: Don't write depth when generating miptreesTopi Pohjolainen2015-01-301-1/+1
| | | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta/blit: Compile programs with and without depthTopi Pohjolainen2015-01-302-5/+11
| | | | | | | | | | | | | | When color buffers alone are concerned the depth is not needed. No regression on BDW where meta blit is used instead of blorp. I also disabled blorp temporarily for fbo-blits on IVB and saw no regressions there either. I also compared several graphics benchmarks on BDW and saw neither regressions or improvements. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta/blit: Write depth only when asked forTopi Pohjolainen2015-01-301-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implementing an idea from Ken, on i965 the shader program for 2D blits becomes significantly simpler. Before: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q }; mov(8) g123<1>F g2<8,8,1>F { align1 1Q compacted }; mov(8) g124<1>F g3<8,8,1>F { align1 1Q compacted }; mov(8) g125<1>F g4<8,8,1>F { align1 1Q compacted }; mov(8) g126<1>F g5<8,8,1>F { align1 1Q compacted }; mov(8) g127<1>F g2<8,8,1>F { align1 1Q compacted }; nop ; sendc(8) null g123<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT }; After: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted }; send(8) g124<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q }; sendc(8) null g124<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT }; v2 (Matt): Removed unintended white-space change Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta/blit: Add plumbing for shaders without depthTopi Pohjolainen2015-01-304-3/+5
| | | | | | | | | Currently all blit programs are unconditionally compiled with gl_FragDepth. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Mesa: Advertise GL_OES_texture_*float* extensions support with i965.Kalyan Kondapally2015-01-291-0/+5
| | | | | | | | | This patch advertises support for GL_OES_texture_*float* extensions when using i965 drivers. Signed-off-by: Kevin Rogovin <[email protected]> Signed-off-by: Kalyan Kondapally <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* Mesa: Add support for HALF_FLOAT_OES type.Kalyan Kondapally2015-01-294-6/+52
| | | | | | | | | This patch adds needed support for accepting HALF_FLOAT_OES as valid type for TexImage*D and TexSubImage*D when Texture FLoat extensions are supported. Signed-off-by: Kevin Rogovin <[email protected]> Signed-off-by: Kalyan Kondapally <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* Mesa: Add support for GL_OES_texture_*float* extensions.Kalyan Kondapally2015-01-294-0/+132
| | | | | | | | | | | | | | | | | | | This patch series adds support for following GLES2 Texture Float extensions: 1)GL_OES_texture_float, 2)GL_OES_texture_half_float, 3)GL_OES_texture_float_linear, 4)GL_OES_texture_half_float_linear. This patch adds basic infrastructure and needed boolean flags to advertise support for these extensions, by default the support is disabled. Next patch in the series introduces support for HALF_FLOAT_OES token. v4: take assert away and make valid_filter_for_float conditional (Tapani), fix the alphabetical order (Emil) Signed-off-by: Kevin Rogovin <[email protected]> Signed-off-by: Kalyan Kondapally <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Move simple_list.h to src/util.Eric Anholt2015-01-2829-238/+27
| | | | | | We have two copies of it in the tree, I'm going to delete one. Reviewed-by: Marek Olšák <[email protected]>
* Revert "util: Move the alternate fpclassify implementation to util"Jason Ekstrand2015-01-281-1/+50
| | | | | | | | | | | | | This reverts commits d6eb572905e39c36168b8f5da240af961f9dde0a and 58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6. This is no longer necessary as we aren't using it in NIR anymore. Also, it broke the build on some strange systems so let's put it back in querymatrix where it came from. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852 Acked-by: Matt Turner <[email protected]>
* drirc: set allow_glsl_extension_directive_midshader for Dead Island.Sven Arvidsson2015-01-281-0/+4
| | | | | | Signed-off-by: Sven Arvidsson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076 Signed-off-by: Marek Olšák <[email protected]>
* util: Move the alternate fpclassify implementation to utilJason Ekstrand2015-01-281-50/+1
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/tex: Don't create read-write textures with non-renderable formatsJason Ekstrand2015-01-281-0/+5
| | | | | | | | | I haven't actually seen this bug in the wild, but it's possible that someone could ask to do a S3TC PBO download or something. This protects us from accidentally creating a render target with a compressed or otherwise non-renderable format. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8: Include the buffer offset when emitting renderbuffer relocsJason Ekstrand2015-01-281-1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792 Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: improve error messaging for format CSV parserTapani Pälli2015-01-282-2/+7
| | | | | | | | Patch adds 2 error messages that point user directly to fix mispelled or impossible swizzle field for a format. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Implemente a tiled fast-path for glReadPixels and glGetTexImageSisinty Sasmita Patra2015-01-263-1/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy functions. These are the fast paths for glReadPixels and glGetTexImage. On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts amount of time spent in glReadPixels by more than half and reduces the time of the entire test by 10%. v2: Jason Ekstrand <[email protected]> - Refactor to make the functions look more like the old intel_tex_subimage_tiled_memcpy - Don't export the readpixels_tiled_memcpy function - Fix some pointer arithmatic bugs in partial image downloads (using ReadPixels with a non-zero x or y offset) - Fix a bug when ReadPixels is performed on an FBO wrapping a texture miplevel other than zero. v3: Jason Ekstrand <[email protected]> - Better documentation fot the *_tiled_memcpy functions - Add target restrictions for renderbuffers wrapping textures v4: Jason Ekstrand <[email protected]> - Only check the return value of brw_bo_map for error and not bo->virtual v5: Jason Ekstrand <[email protected]> - Don't unnecessarily repeat a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Add tiled-to-linear pathsSisinty Sasmita Patra2015-01-262-0/+281
| | | | | | | | | | | | | | | | | This commit addes tiled copy functions for coping from tiled memory to linear memory. These are very similar to the existing linear-to-tiled paths. v2: Jason Ekstrand <[email protected]> - New commit message - Various whitespace fixes - Added ptrdiff_t casts as done in commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Refactor tiled memcpy functions and move them into their own fileSisinty Sasmita Patra2015-01-264-392/+506
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the the libc provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <[email protected]> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorperated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tex_subimage: Use the fast tiled path for rectangle texturesJason Ekstrand2015-01-261-1/+2
| | | | | | | | There's no reason why we should be doing this for 2D textures and not rectangles. Just a matter of adding another hunk to the condition. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* mesa: simplify detection of fpclassifyFelix Janda2015-01-261-11/+7
| | | | | | Fixes compilation with musl libc. Reviewed-by: Ian Romanick <[email protected]>
* i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.Kenneth Graunke2015-01-261-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | "MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions. Previously, we deleted MOV.nz instructions when the instruction generating the MOV's source also wrote the flag register (as the flag register already contains the desired value). However, we wouldn't delete CMP.nz instructions that served the same purpose. We also didn't attempt true cmod propagation on MOV.nz instructions, while we would for the equivalent CMP.nz form. This patch fixes both limitations, treating both forms equally. CMP.nz instructions will now be deleted (helping the NIR backend), and MOV.nz instructions will have their .nz propagated. No changes in shader-db without NIR. With NIR, total instructions in shared programs: 6006153 -> 5969364 (-0.61%) instructions in affected programs: 2087139 -> 2050350 (-1.76%) helped: 10704 HURT: 0 GAINED: 2 LOST: 2 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: use Python to autogenerate opcode informationConnor Abbott2015-01-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to add keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> v3 Jason Ekstrand <[email protected]> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration
* i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.Matt Turner2015-01-232-0/+24
| | | | | | | | | total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for removing MOV.NZ instructions.Matt Turner2015-01-232-3/+52
| | | | | | | | | | | | | | | | | | | | | | For some reason, we occasionally write the flag register with a MOV.NZ instruction: add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F mov.nz.f0(8) null g26<8,8,1>D A MOV.NZ instruction on the result of a CMP is like comparing for equality with true in C. It's useless. Removing it allows us to generate: add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F total instructions in shared programs: 5955701 -> 5951657 (-0.07%) instructions in affected programs: 302910 -> 298866 (-1.34%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>