path: root/src/compiler/glsl/ir_optimization.h
Commit message | Author | Age | Files | Lines
* glsl: [u/i]mulExtended optimization for GLSL | Sagar Ghuge | 2019-03-04 | 1 | -0/+1
  Optimize mulExtended to use 32x32->64 multiplication. Drivers that are not based on NIR can set the MUL64_TO_MUL_AND_MUL_HIGH lowering flag to keep the old behavior.
  v2: Add missing condition check (Jason Ekstrand)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
  Suggested-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
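The 32x32->64 idea can be illustrated outside the compiler. The following is a minimal Python sketch, not the Mesa implementation; the function names are hypothetical, and masking stands in for fixed-width integer registers. It shows that the high and low halves returned by mulExtended fall out of a single 64-bit product.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def umul_extended(x, y):
    """Return (msb, lsb) of the unsigned 32x32 product (hypothetical helper)."""
    product = (x & MASK32) * (y & MASK32)  # full 64-bit result of one multiply
    return (product >> 32) & MASK32, product & MASK32

def imul_extended(x, y):
    """Signed variant: x and y are signed 32-bit values.

    The result halves are returned as raw 32-bit patterns.
    """
    product = (x * y) & MASK64  # two's complement 64-bit pattern
    return (product >> 32) & MASK32, product & MASK32
```

For example, `umul_extended(0xFFFFFFFF, 0xFFFFFFFF)` gives `(0xFFFFFFFE, 0x00000001)`, the two halves of 0xFFFFFFFE00000001.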
* glsl: use only copy_propagation_elements | Caio Marcelo de Oliveira Filho | 2018-07-27 | 1 | -1/+0
  Now that the elements version handles both cases, remove the non-elements version.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Thomas Helland <[email protected]>
* mesa: include mtypes.h less | Marek Olšák | 2018-04-12 | 1 | -0/+3
  - remove mtypes.h from most header files
  - add main/menums.h for often used definitions
  - remove main/core.h
  v2: fix radv build
  Reviewed-by: Brian Paul <[email protected]>
* glsl: Specify framebuffer fetch coherency mode in lower_blend_equation_advanced(). | Francisco Jerez | 2018-02-24 | 1 | -1/+1
  This requires passing an extra argument to the lowering pass because the KHR_blend_equation_advanced specification doesn't seem to define any mechanism for the implementation to determine at compile-time whether coherent blending can ever be used (not even an "#extension KHR_blend_equation_advanced_coherent" directive seems to be required in the shader source AFAICT).
  In the long run we'll probably want to do state-dependent recompiles based on the value of ctx->Color.BlendCoherent, but right now there would be no benefit from that because the only driver that supports coherent framebuffer fetch is i965 on SKL+ hardware, which are unable to support the non-coherent path for the moment because of texture layout issues, so framebuffer fetch coherency is always enabled for them.
  Reviewed-by: Plamena Manolova <[email protected]>
* glsl: Combine nop-swizzle optimization with swizzle-swizzle optimization | Ian Romanick | 2017-11-08 | 1 | -2/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: <[email protected]>
* glsl: fix derived cs variables | Ilia Mirkin | 2017-10-23 | 1 | -0/+1
  There are two issues with the current implementation. First, it relies on the layout(local_size_*) happening in the same shader as the main function, and secondly it doesn't work for variable group sizes.
  In both cases, the simplest fix is to move the setup of these derived values to a later time, similar to how the gl_VertexID workarounds are done. There already exist system values defined for both of the derived values, so we use them unconditionally, and lower them after linking is performed.
  While we're at it, we move to using gl_LocalGroupSizeARB instead of gl_WorkGroupSize for variable group sizes. Also the dead code elimination avoidance can be removed, since there can be situations where gl_LocalGroupSizeARB is needed but has not been inserted for the shader with main function. As a result, the lowering code has to insert its own copies of the system values if needed.
  Reported-by: Stephane Chevigny <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
  Cc: [email protected]
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl/linker: add check for compute shared memory size | Nicolai Hähnle | 2017-10-10 | 1 | -2/+3
  Unlike uniforms, the limit on shared memory size is not called out explicitly in the list of things that cause linker errors, but presumably that's just an oversight in the spec.
  Fixes dEQP-GLES31.functional.debug.negative_coverage.{callbacks,get_error,log}.compute.exceed_shared_memory_size_limit
  Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Convert lower_variable_index_to_cond_assign to ir_builder | Ian Romanick | 2017-10-02 | 1 | -3/+5
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl: Return ir_variable from compare_index_block | Ian Romanick | 2017-10-02 | 1 | -3/+3
  This is basically a wash now, but it simplifies later patches that convert to using ir_builder.
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Thomas Helland <[email protected]>
* glsl: pass UseSTD430AsDefaultPacking to where it will be used | Timothy Arceri | 2017-08-22 | 1 | -1/+1
  Here we also make use of the UseSTD430AsDefaultPacking constant and call the new get_internal_ifc_packing() helper.
  Reviewed-by: Marek Olšák <[email protected]>
* glsl: lower sqrt(abs()) and inversesqrt(abs()) if requested | Samuel Pitoiset | 2017-03-22 | 1 | -0/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: consistently use ifndef guards over pragma once | Emil Velikov | 2017-03-22 | 1 | -1/+5
  Throughout the glsl headers we had an odd mix of guards: "ifndef", "pragma once", neither, or both.
  Simplify things by using the more common ones (ifndef) and annotating all the sources, barring the generated builtin header - builtin_int64.h.
  The final header - udivmod64.h - is [seemingly] unused and on its way out (a patch to purge it is on the mailing list).
  Signed-off-by: Emil Velikov <[email protected]>
  Acked-by: Vedran Miletić <[email protected]>
  Acked-by: Juha-Pekka Heikkila <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: split DIV_TO_MUL_RCP into single- and double-precision flags | Nicolai Hähnle | 2017-01-23 | 1 | -1/+3
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Iago Toral Quiroga <[email protected]>
  Tested-by: Glenn Kennard <[email protected]>
  Tested-by: James Harvey <[email protected]>
  Cc: 17.0 <[email protected]>
* glsl: Add a lowering pass for 64-bit integer modulus | Ian Romanick | 2017-01-20 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add a lowering pass for 64-bit integer division | Ian Romanick | 2017-01-20 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add a lowering pass for 64-bit integer sign() | Ian Romanick | 2017-01-20 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add a lowering pass for 64-bit integer multiplication | Ian Romanick | 2017-01-20 | 1 | -0/+6
  v2: Rename lower_64bit.cpp and lower_64bit_test.cpp to lower_int64. Suggested by Matt.
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl/lower_if: conditionally lower if-branches based on their size | Marek Olšák | 2016-11-15 | 1 | -1/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl/lower_if: don't lower branches touching tess control outputs | Marek Olšák | 2016-11-15 | 1 | -1/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: record number of components used in each slot for varying packing | Ilia Mirkin | 2016-11-09 | 1 | -1/+3
  Instead of packing varyings into vec4's, keep track of how many components each slot uses and create varyings with matching types. This ensures that we don't end up using more components than the original shader, which is especially important for geometry shader output limits.
  This comes up for NVIDIA hw, where the limit is 1024 output components for a GS, and the hardware complains *loudly* if you even think about going over.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
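To make the component accounting concrete, here is a simplified Python sketch. It is an illustration only: the function name and the flat, in-order packing are assumptions, and the real pass also has to respect interpolation qualifiers and types. It sizes each packed slot by the components actually used instead of always emitting vec4.

```python
# Float vector type for each possible component count of a slot.
GLSL_FLOAT_TYPES = {1: "float", 2: "vec2", 3: "vec3", 4: "vec4"}

def packed_slot_types(component_counts):
    """Pack varyings (given as component counts) densely into 4-wide slots.

    Returns the type of each resulting slot, sized to the components
    it actually holds (hypothetical helper, not a Mesa function).
    """
    remaining = sum(component_counts)
    slot_types = []
    while remaining > 0:
        used = min(remaining, 4)  # a slot holds at most 4 components
        slot_types.append(GLSL_FLOAT_TYPES[used])
        remaining -= used
    return slot_types
```

With inputs of 3, 2, and 2 components, `packed_slot_types([3, 2, 2])` returns `["vec4", "vec3"]`: 7 components in total, where two full vec4 slots would have counted as 8 against the output limit.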
* glsl: use reproducible name for lowered const arrays | Timothy Arceri | 2016-09-27 | 1 | -1/+1
  Otherwise we can end up with mismatching names between the cached binary and the cached metadata.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add a lowering pass to handle advanced blending modes. | Kenneth Graunke | 2016-08-25 | 1 | -0/+1
  Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and need to emulate it in the pixel shader. This lowering pass implements all the necessary math for advanced blending. It fetches the existing framebuffer value using the MESA_shader_framebuffer_fetch built-in variables, and the previous commit's state var uniform to select which equation to use.
  This is done at the GLSL IR level to make it easy for all drivers to implement the GL_KHR_blend_equation_advanced extension and share code.
  Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
  1. Hook up the fb_fetch_output variable
  2. Implement BlendBarrier()
  Then to get KHR_blend_equation_advanced, they simply need to:
  3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
  4. Call this lowering pass.
  Very little driver specific code should be required.
  v2: Handle multiple output variables per render target (which may exist due to ARB_enhanced_layouts), and array variables (even with one render target, we might have out vec4 color[1]), and non-vec4 variables (it's easier than finding spec text to justify not handling it). Thanks to Francisco Jerez for the feedback.
  v3: Lower main returns so that we have a single exit point where we can add our blending epilogue (caught by Francisco Jerez).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* glsl: Add lowering pass for ir_bin_imul_high | Ian Romanick | 2016-07-19 | 1 | -0/+1
  This isn't the lowering pass you want. Most GPUs that can support GLSL 1.30 have a multiply unit that can do something more interesting than 32x32->32. Many have 32x16->48. Any GPU that does, should do the lowering in the backend. This is just the thing that will always work.
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
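One way to obtain the high word using nothing wider than 32x32->32 multiplies is the schoolbook split into 16-bit halves. The Python sketch below illustrates that general technique under the assumption of unsigned operands; it is not claimed to match how the pass itself is written, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def umul_high(x, y):
    """High 32 bits of an unsigned 32x32 product, using only 32-bit math."""
    x_lo, x_hi = x & 0xFFFF, (x >> 16) & 0xFFFF
    y_lo, y_hi = y & 0xFFFF, (y >> 16) & 0xFFFF

    # Each 16x16 partial product fits in 32 bits.
    lo_lo = x_lo * y_lo
    hi_lo = x_hi * y_lo
    lo_hi = x_lo * y_hi
    hi_hi = x_hi * y_hi

    # Sum the middle 16-bit column; at most 3 * 0xFFFF, so no overflow.
    cross = (lo_lo >> 16) + (hi_lo & 0xFFFF) + (lo_hi & 0xFFFF)

    # The carry out of the middle column feeds the high word.
    return (hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (cross >> 16)) & MASK32
```

The result agrees with `(x * y) >> 32` computed at full width, for example `umul_high(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFE`.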
* glsl: Add lowering pass for ir_unop_find_msb | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_find_lsb | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_bitfield_reverse | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_quadop_bitfield_insert | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_triop_bitfield_extract | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_bit_count | Ian Romanick | 2016-07-19 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl/mesa: split gl_shader in two | Timothy Arceri | 2016-06-30 | 1 | -9/+14
  There are two distinctly different uses of this struct. The first is to store GL shader objects. The second is to store information about a shader stage that's been linked.
  The two uses actually share few fields and there is clearly confusion about their use. For example, the linked shaders map one to one with a program so can simply be destroyed along with the program. However, previously we were using reference counting on the linked shaders.
  We were also creating linked shaders with a name even though it is always 0 and called the driver version of the _mesa_new_shader() function unnecessarily for GL shader objects.
  Acked-by: Iago Toral Quiroga <[email protected]>
* glsl: Add an option to clamp block indices when lowering UBO/SSBOs | Jason Ekstrand | 2016-05-23 | 1 | -1/+1
  This prevents array overflow when the block is actually an array of UBOs or SSBOs. On some hardware such as i965, such overflows can cause GPU hangs.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
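The clamping itself amounts to a single min against the last valid element. A minimal Python sketch follows; the function name is hypothetical, and treating the index as an unsigned 32-bit value is an assumption made here so that negative indices also land on the last block.

```python
MASK32 = 0xFFFFFFFF

def clamp_block_index(index, array_size):
    """Clamp a dynamic UBO/SSBO array index into [0, array_size - 1].

    Assumption: the index is reinterpreted as unsigned 32-bit, so a
    negative value becomes very large and clamps to the last element.
    """
    return min(index & MASK32, array_size - 1)
```

An out-of-range access such as `clamp_block_index(7, 4)` then reads block 3 rather than running past the end of the array.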
* glsl: rewrite clip/cull distance lowering pass | Dave Airlie | 2016-05-24 | 1 | -1/+1
  The last version of this broke clipping, and I had to spend some time getting this working properly.
  I had to introduce a third pass to count the clip/cull totals, all due to one messy corner case. We have a piglit test tes-input-gl_ClipDistance.shader_test that doesn't actually output the clip distances; it just passes them like a varying from TCS->TES. The older lowering pass worked, but to lower clip/cull we need to know the total number of clip+culls used to define the new variable correctly, and to offset culls properly.
  This adds an extra pass that works out the sizes for clip/cull, then lowers gl_ClipDistance then gl_CullDistance into the new gl_ClipDistanceMESA. The pass checks using the fixed array sizes code if the array has been referenced, or is actually never used, and ignores it in the latter case.
  Reviewed-by: Ilia Mirkin <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
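The need for the clip total becomes clearer with a small index-mapping sketch. The Python below is illustrative only: the function names are hypothetical, and the vec4-array layout of the combined variable is an assumption not stated in the message above. Cull distances are placed after all clip distances, so their offset depends on the clip count.

```python
def lowered_distance_index(kind, i, clip_size):
    """Map gl_ClipDistance[i] or gl_CullDistance[i] into the combined array.

    Returns (vec4 element, component), assuming the combined variable is
    laid out as an array of vec4 (an assumption of this sketch).
    """
    flat = i if kind == "clip" else clip_size + i  # culls follow the clips
    return flat // 4, flat % 4

def lowered_array_length(clip_size, cull_size):
    """Number of vec4 elements needed for all clip and cull distances."""
    return (clip_size + cull_size + 3) // 4
```

With 3 clip distances and 2 cull distances, gl_CullDistance[1] maps to element 1, component 0, and the combined array needs 2 elements.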
* glsl: Consolidate duplicate copies of constant folding. | Kenneth Graunke | 2016-05-15 | 1 | -0/+2
  We could probably clean this up more (maybe make it a method), but at least there's only one copy of this code now, and that's a start.
  No change in shader-db.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* Revert "glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v4)" | Dave Airlie | 2016-05-14 | 1 | -3/+1
  This reverts commit ad355652c20b245f5f2faa8622e71461e3121a7f.
  This broke a bunch of clip tests.
* glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v4) | Tobias Klausmann | 2016-05-14 | 1 | -1/+3
  This will come in handy when we want to lower gl_CullDistance into gl_CullDistanceMESA.
  [airlied: drop separate APIs for clip/cull - just use single API to call both passes.]
  v3: reexamine my sanity, this was pretty broken, the new code creates one copy of gl_ClipDistanceMESA, as the clip distance varying and lowers everything into that in two passes, one for clips one for culls.
  v4: rework using the passes in clip/cull sizes, instead of the array sizes.
  Signed-off-by: Tobias Klausmann <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Kristian Høgsberg <[email protected]>
* glsl: Add a pass to propagate the "invariant" and "precise" qualifiers | Jason Ekstrand | 2016-03-23 | 1 | -0/+1
  Reviewed-by: Francisco Jerez <[email protected]>
* glsl: disable varying packing when its not safe | Timothy Arceri | 2016-03-18 | 1 | -1/+1
  In GL 4.4+ there is no guarantee that interpolation qualifiers will match between stages so we cannot safely pack varyings using the current packing pass in Mesa.
  We also disable packing on outward facing interfaces for SSO because in ES we need to retain the unpacked varying information for draw time validation. For desktop GL we could allow packing for SSO in versions < 4.4 but it's just safer not to do so.
  We do however enable packing on individual arrays, structs, and matrices as these are required by the transform feedback code and it is still safe to do so.
  Finally we also enable packing when a varying is only used for transform feedback and it's not an SSO.
  This fixes all remaining rendering issues with the dEQP SSO tests; the only issues remaining with those tests are to do with validation.
  Note: There is still one remaining SSO bug that this patch doesn't fix. There is a chance that VS -> TCS will have mismatching interfaces because we pack VS output in case it's used by transform feedback but don't pack TCS input for performance reasons. This patch will make the situation better but doesn't fix it.
  V4: fix out of order function params after rebase, make sure packing still disabled in tess stages. Update comments as to why we disable packing on SSO.
  V3: ES 3.1 *does* require interpolation to match so don't disable packing there. Rebased on master rather than on enhanced layouts component packing series.
  V2: Make is_varying_packing_safe() a function in the varying_matches class, fix spelling (Matt) and make sure to remove the outer array when dealing with Geom and Tess shaders where appropriate. Lastly fix piglit regression in new piglit test and document the undefined behaviour it depends on: arb_separate_shader_objects/execution/vs-gs-linking.shader_test
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: pass disable_varying_packing bool to the lowering pass | Timothy Arceri | 2016-03-18 | 1 | -1/+2
  This will allow us to choose to ignore the disable which will be useful for more fine grained control over when to enable or disable packing.
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Remove 2x16 half-precision pack/unpack opcodes. | Matt Turner | 2016-02-01 | 1 | -9/+6
  i965/fs was the only consumer, and we're now doing the lowering in NIR.
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* glsl: move to compiler/ | Emil Velikov | 2016-01-26 | 1 | -0/+147
  Signed-off-by: Emil Velikov <[email protected]>
  Acked-by: Matt Turner <[email protected]>
  Acked-by: Jose Fonseca <[email protected]>