summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* turnip: Fix GCC compiles.Bas Nieuwenhuizen2019-03-161-6/+3
| | | | | | | | | | Apparently GCC does not consider static const variables to be integer constants, and hence the array size and the static assert result in compile failures. Fixes: 4b9f967cd1a "turnip: add a more complete format table" Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* intel/nir: Lower array-deref-of-vector UBO and SSBO loadsJason Ekstrand2019-03-151-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a serious performance issue with DXVK: https://github.com/doitsujin/dxvk/issues/937 This was caused by a recent change that to improve performance on RADV which back-fired on ANV and killed performance for some apps: https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698 Throwing in this bit of lowering lets us come along and CSE those UBO loads (or copy-prop for SSBO load) and get one load where we previously would have gotten several. VkPipeline-db results on Kaby Lake: total instructions in shared programs: 5115361 -> 5073185 (-0.82%) instructions in affected programs: 1754333 -> 1712157 (-2.40%) helped: 5331 HURT: 63 total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%) cycles in affected programs: 2531058653 -> 2467702029 (-2.50%) helped: 9202 HURT: 4323 total loops in shared programs: 3340 -> 3331 (-0.27%) loops in affected programs: 9 -> 0 helped: 9 HURT: 0 total spills in shared programs: 3246 -> 3053 (-5.95%) spills in affected programs: 384 -> 191 (-50.26%) helped: 10 HURT: 5 total fills in shared programs: 4626 -> 4452 (-3.76%) fills in affected programs: 439 -> 265 (-39.64%) helped: 10 HURT: 5 All of the shaders with hurt spilling were in Rise of the Tomb Raider which also had shaders solidly helped in the spilling department. Not shown in those results (because I've not had success dumping the shaders) is Witcher 3 where this reduces spilling and improves over-all perf by around 20-25%. There were no shader-db changes. Apparently, this just isn't a pattern that happens in OpenGL. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: "19.0" [email protected]
* nir: Add a new pass to lower array dereferences on vectorsJason Ekstrand2019-03-154-0/+202
| | | | | | | | This pass was originally written for lowering TCS output reads and writes but it is also applicable just about anything including UBOs, SSBOs, and shared variables. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/builder: Add a vector extract helperJason Ekstrand2019-03-152-6/+30
| | | | | | | | | This one's a tiny bit better than what we had in spirv_to_nir because it emits a binary tree rather than a linear walk. It also doesn't leave around unneeded bcsel instructions for a constant index and returns an undef for constant OOB access. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* softpipe: Enable PIPE_CAP_MIXED_COLORBUFFER_FORMATSGert Wollny2019-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |    It seems softpipe actually supports this. This change enables the following piglits as passing without regressions in the gpu test set: gl-3.1-mixed-int-float-fbo gl-3.1-mixed-int-float-fbo int_second fbo-blending-format-quirks Changes for deqp: dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_rbo QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_tex QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_rbo_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_tex_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_rbo QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_tex QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_rbo_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_tex_none QualityWarning -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo0_rbo0_tex Fail -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo0_tex_none Fail -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo1_rbo1_rbo1 Fail -> Pass dEQP-GLES3.functional.fragment_out.random.* NotSupported -> Pass dEQP-GLES31.functional.shaders.builtin_functions.common.frexp.*_fragment Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.common.frexp.*_vertex Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.precision.frexp.*_fragment.* Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.precision.frexp.*_vertex.* Fail -> Pass Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/ir3/cp: fix ldib bugRob Clark2019-03-151-0/+6
| | | | | | | Something that we didn't hit earlier because of the extra shr.b Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* gallium/auxiliary/vl: Change weave compute shader implementationJames Zhu2019-03-151-17/+62
| | | | | | | | | | | | | Use 2D_ARRARY instead of RECT to fetch texels for weave compute shader. Problem 2,3: Fixed interpolation issue with weave de-interlace Fixes: 9364d66cb7f7 (Add video compositor compute shader render) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646 Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Tested-by: Bruno Milreu <[email protected]>
* gallium/auxiliary/vl: Change grid settingJames Zhu2019-03-151-4/+5
| | | | | | | | | Using draw area for grid setting instead of destination buffer size. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Tested-by: Bruno Milreu <[email protected]>
* gallium/auxiliary/vl: Increase shader_params sizeJames Zhu2019-03-152-2/+9
| | | | | | | | | Increase shader_params size to pass sampler data to compute shader during weave de-interlace. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Tested-by: Bruno Milreu <[email protected]>
* omx: add a compute path in enc_LoadImage_commonMarek Olšák2019-03-154-37/+196
| | | | Acked-by: Leo Liu <[email protected]>
* omx: clean up enc_LoadImage_commonMarek Olšák2019-03-151-16/+37
| | | | | | | - add *pipe - add documentation Acked-by: Leo Liu <[email protected]>
* gallium: add pipe_grid_info::last_blockMarek Olšák2019-03-158-35/+35
| | | | | | | | | The OpenMAX state tracker will use this. RadeonSI is adapted to use pipe_grid_info::last_block instead of its internal state. Acked-by: Leo Liu <[email protected]>
* nir/xfb: move varyings info out of nir_xfb_infoAlejandro Piñeiro2019-03-153-30/+64
| | | | | | | | | | | | | | | | | | | | | When varyings was added we moved to use to dynamycally allocated pointers, instead of allocating just one block for everything. That breaks some assumptions of some vulkan drivers (like anv), that make serialization and copying easier. And at the same time, varyings are not needed for vulkan. So this commit moves them out. Although it seems a little an overkill, fixing the anv side would require a similar, or more, changes, so in the end it is about to decide where do we want to put our effort. v2: (from Jason review) * Don't use a temp variable on the _create methods, just return result of rzalloc_size * Wrap some lines too long. Fixes: cf0b2ad486c9 ("nir/xfb: adding varyings on nir_xfb_info and gather_info") Reviewed-by: Jason Ekstrand <[email protected]>
* radv: always load 3 channels for formats that need to be shuffledSamuel Pitoiset2019-03-151-9/+14
| | | | | | | | | This fixes a rendering issue with Hellblade and DXVK. Fixes: a66b186bebf ("radv: use typed buffer loads for vertex input fetches") Reported-by: Philip Rebohle <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa: Add assert to _mesa_primitive_restart_index.Mathias Fröhlich2019-03-151-0/+3
| | | | | | | Make sure the inde_size parameter is meant to be in bytes. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Fix GL_PRIMITIVE_RESTART_FIXED_INDEX in display list compiles.Mathias Fröhlich2019-03-151-5/+9
| | | | | | | | The maximum value primitive restart index is different for each index data type. Use the appropriate fixed restart index value. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Fix basevertex handling in display list compiles.Mathias Fröhlich2019-03-151-5/+12
| | | | | | | | | The standard requires that the primitive restart comparison happens before the basevertex value is added. Do this now, drop a reference to the standard why this happens at this place. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Use mapping tools in debug prints.Mathias Fröhlich2019-03-151-45/+12
| | | | | Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Remove _ae_{,un}map_vbos and dependencies.Mathias Fröhlich2019-03-152-100/+0
| | | | | | | | | Since mapping and unmapping the buffer objects in a VAO is handled directly from the VAO, this part of the _NEW_ARRAY state is no longer used. So remove this part of array element state. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Replace _ae_{,un}map_vbos with _mesa_vao_{,un}map_arraysMathias Fröhlich2019-03-152-13/+11
| | | | | | | | | | Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions should provide comparable runtime efficienty to the currently used _ae_{,un}map_vbos functions. So use this functions and enable further cleanup. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Use _mesa_array_element in dlist save.Mathias Fröhlich2019-03-151-4/+19
| | | | | | | | | | Make use of the newly factored out _mesa_array_element function in display list compilation. For now that duplicates out the primitive restart logic. But that turns out to need a fix in display list handling anyhow. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Factor out _mesa_array_element.Mathias Fröhlich2019-03-152-19/+32
| | | | | | | | | The factored out function handles emitting the vertex attributes at the given index. The now public accessible function gets used in the following patches. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* mesa: Implement helper functions to map and unmap a VAO.Mathias Fröhlich2019-03-152-0/+102
| | | | | | | | | | Provide a set of functions that maps or unmaps all VBOs held in a VAO. The functions will be used in the following patches. v2: Update comments. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* st/mesa: Let NIR lower UBO and SSBO access when we have itJason Ekstrand2019-03-152-1/+11
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: Stop setting LowerBuferInterfaceBlocksJason Ekstrand2019-03-152-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access: Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%) instructions in affected programs: 14992 -> 14701 (-1.94%) helped: 19 HURT: 20 total cycles in shared programs: 339220331 -> 339027307 (-0.06%) cycles in affected programs: 79831981 -> 79638957 (-0.24%) helped: 540 HURT: 602 total loops in shared programs: 4402 -> 4348 (-1.23%) loops in affected programs: 186 -> 132 (-29.03%) helped: 27 HURT: 0 total spills in shared programs: 23261 -> 23234 (-0.12%) spills in affected programs: 38 -> 11 (-71.05%) helped: 1 HURT: 0 total fills in shared programs: 31442 -> 31371 (-0.23%) fills in affected programs: 98 -> 27 (-72.45%) helped: 1 HURT: 0 LOST: 12 GAINED: 12 Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%) There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS shaders/non-free/gfxbench4/carchase/368.shader_test CS shaders/non-free/gfxbench4/carchase/370.shader_test CS shaders/non-free/gfxbench4/carchase/372.shader_test CS shaders/non-free/gfxbench4/carchase/376.shader_test CS shaders/non-free/gfxbench4/carchase/378.shader_test CS shaders/non-free/gfxbench4/carchase/380.shader_test CS shaders/non-free/gfxbench4/carchase/382.shader_test CS shaders/non-free/gfxbench4/carchase/384.shader_test CS shaders/non-free/gfxbench4/carchase/388.shader_test CS shaders/non-free/gfxbench4/carchase/4.shader_test CS shaders/non-free/gfxbench4/carchase/6.shader_test CS Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got delete were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Add a pass to lower UBO and SSBO accessJason Ekstrand2019-03-154-0/+305
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Handle unlowered SSBO atomic and array_length intrinsicsJason Ekstrand2019-03-151-0/+112
| | | | | | | | We didn't have any of these before because all NIR consumers always called lower_ubo_references. Soon, we want to pass the derefs straight through to NIR so we need to handle these intrinsics directly. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl/nir: Set explicit types on UBO/SSBO variablesJason Ekstrand2019-03-151-15/+67
| | | | | | | | We want to be able to use variables and derefs for UBO/SSBO access in NIR. In order to do this, the rest of NIR needs to know the type layout information. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Don't lower vector derefs for SSBOs, UBOs, and sharedJason Ekstrand2019-03-151-0/+21
| | | | | | | | | | | | All of these are backed by some sort of memory so if you have multiple threads writing to different components of the same vector at the same time, the load-vec-store pattern that GLSL IR emits won't work. This shouldn't affect any drivers today as they all call GLSL IR lowering which lowers access to these variables to index+offset intrinsics before we get to this point. However, NIR will start handling the derefs itself and won't want the lowering. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/lower_io: Add a new buffer_array_length intrinsic and loweringJason Ekstrand2019-03-152-0/+45
| | | | | Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Rename nir_address_format_vk_index_offset to not be vkJason Ekstrand2019-03-154-10/+10
| | | | | | | | It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/deref: Consider COHERENT decorated var derefs as aliasingJason Ekstrand2019-03-151-4/+47
| | | | | | | | If we get to two deref_var paths with different variables, we usually know they don't alias. However, if both of the paths are marked coherent, we don't have to worry about it. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* compiler/types: Add helpers to get explicit types for standard layoutsJason Ekstrand2019-03-152-16/+191
| | | | | | | | | We also need to modify the current size/align helpers to not blow up when they encounter an explicitly laid out type. Previously we considered using the size/align helpers mutually exclusive with standard layouts but now we just assert that they match. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* compiler/types: Add a C wrapper to get full struct field dataJason Ekstrand2019-03-152-0/+11
| | | | | Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* compiler/types: Add a new is_interface C wrapperJason Ekstrand2019-03-152-0/+7
| | | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/validate: Allow 32-bit boolean load/store intrinsicsJason Ekstrand2019-03-151-0/+6
| | | | | | | | | | | | With UBOs and SSBOs we have boolean types but they're actually 32-bit values. Make the validator a little less strict so that we can do a 32-bit load/store on boolean types. We're about to add a lowering pass called gl_nir_lower_buffers which will lower boolean load/store operations to 32-bit and insert i2b and b2i instructions to convert to/from 1-bit booleans. We want that to be legal. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/validate: Only require bare types to match for copy_derefJason Ekstrand2019-03-153-3/+6
| | | | | | | | | | If we want to be able to use copy_deref instructions on explicitly laid out types, we have to be a little more flexible about what types we allow. Instead, of requiring the types to exactly match, only require the bare types to match. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/algebraic: Add a couple optimizations for iabs and ishrJason Ekstrand2019-03-151-0/+6
| | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 15225213 -> 15222365 (-0.02%) instructions in affected programs: 43524 -> 40676 (-6.54%) helped: 203 HURT: 0 Lots of shaders in Shadow Warrior had this pattern along with Deus Ex, Civ, Shadow of Mordor, and several others. Reviewed-by: Kristian H. Kristensen <[email protected]>
* mesa/st: Fix leaks of TGSI tokens in VP variants.Eric Anholt2019-03-141-14/+20
| | | | | | | | | | Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for the fixed function VPs. v2: drop unused delete_ir() arg. Fixes: 3b4929ec6e64 ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.") Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/st: Make sure that prog_to_nir NIR gets freed.Eric Anholt2019-03-141-0/+6
| | | | | | | | | | GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB programs we need to free the old NIR when PSN is used to set up new NIR in the same gl_program. Additionally, set the base .nir field so that it will get freed by _mesa_delete_program(). Fixes: 3d7611e9a6c6 ("st/nir: use NIR for asm programs") Reviewed-by: Kenneth Graunke <[email protected]>
* panfrost/midgard: Implement fpowAlyssa Rosenzweig2019-03-144-1/+4
| | | | | | | We have a native op for this, which was just found in a disassembly -- so instead of lowering, use it! Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Compute viewport state on the flyAlyssa Rosenzweig2019-03-142-71/+38
| | | | | | | | | | | Previously, we were caching this incorrectly; there's no real reason to given how variable it is (sensitive to changes in viewport, framebuffer dimensions, and scissors) and how cheap it is to recompute. So, just do it on the fly each draw. Fixes glmark-es2 -bshadow and -brefract. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost; Disable AFBC for depth buffersAlyssa Rosenzweig2019-03-142-4/+7
| | | | | | | | | For inexplicable reasons, the depth buffer is faster if kept as linear, whereas the colour buffers are faster if AFBC. Given both code paths are available, we'll choose the faster one of each (which also helps with testing coverage). Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Allocate extra data for depth bufferAlyssa Rosenzweig2019-03-141-0/+5
| | | | | | | It's not clear why the hardware "spills" a little bit, but if we don't do this, we get MMU faults with linear depth buffers. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Comment spelling fixAlyssa Rosenzweig2019-03-141-1/+1
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/mfbd: Respect per-job depth write flagAlyssa Rosenzweig2019-03-144-20/+42
| | | | | | | | | | While a depth buffer may be supplied, it only needs to be written to if the depth writemask is set for any draw AND if the depth buffer is not immediately invalidated (as is the case for scanout). This refactors panfrost_job to provide a depth write requirement, which is now implemented for MFBD depth buffers. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/mfbd: Implement linear depth buffersAlyssa Rosenzweig2019-03-141-10/+9
| | | | | | | | This removes a clunky hack where the depth buffer was enabled during the *clear*, instead of during depth buffer linking. That said, this does not yet support writeback like AFBC depth buffers. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Minor comment cleanup (version detection)Alyssa Rosenzweig2019-03-141-2/+3
| | | | Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove staging MFBDAlyssa Rosenzweig2019-03-142-109/+98
| | | | | | | Same idea as the previous commit, but for the MFBD this time instead of the SFBD. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove staging SFBD for pan_contextAlyssa Rosenzweig2019-03-144-39/+30
| | | | | | | | The fragment framebuffer descriptor should not be a context entry; rather, it should be constructed only at fragment time to keep analysis tractable. Signed-off-by: Alyssa Rosenzweig <[email protected]>