Commit message | Author | Age | Files | Lines
* nir: Add partial redundancy elimination for compares | Ian Romanick | 2019-03-28 | 5 | -0/+414
    This pass attempts to detect code sequences like

        if (x < y) {
            z = y - x;
            ...
        }

    and replace them with sequences like

        t = x - y;
        if (t < 0) {
            z = -t;
            ...
        }

    On architectures where the subtract can generate the flags used by the
    if-statement, this saves an instruction. It's also possible that moving
    an instruction out of the if-statement will allow
    nir_opt_peephole_select to convert the whole thing to a bcsel.

    Currently only floating point compares and adds are supported. Adding
    support for integer will be a challenge due to integer overflow. There
    are a couple of possible solutions, but they may not apply to all
    architectures.

    v2: Fix a typo in the commit message and a couple typos in comments.
    Fix possible NULL pointer deref from result of push_block(). Add
    missing (-A + B) case. Suggested by Caio.

    v3: Fix is_not_const_zero to work correctly with types other than
    nir_type_float32. Suggested by Ken.

    v4: Add some comments explaining how this works. Suggested by Ken.

    Reviewed-by: Kenneth Graunke <[email protected]>
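A plain-C sketch of the rewrite the pass performs, assuming IEEE-754 single-precision arithmetic without flush-to-zero (under that assumption `x < y` holds exactly when `x - y < 0`, and `-(x - y)` equals `y - x`). The function names and the `fallback` parameter are illustrative only; the real pass works on NIR, not C.

```c
/* Before: the compare and the subtract are separate operations. */
static float before_pass(float x, float y, float fallback)
{
    float z = fallback;
    if (x < y) {
        z = y - x;
    }
    return z;
}

/* After: the subtract is hoisted out of the branch, and its result feeds
 * both the condition (t < 0) and the body (z = -t). */
static float after_pass(float x, float y, float fallback)
{
    float t = x - y;
    float z = fallback;
    if (t < 0.0f) {
        z = -t;
    }
    return z;
}
```

On hardware where the subtract sets condition flags, the hoisted `t` lets the branch reuse those flags instead of issuing a separate compare.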
* nir: Add nir_alu_srcs_negative_equal | Ian Romanick | 2019-03-28 | 3 | -0/+192
    v2: Move bug fix in get_neg_instr from the next patch to this patch
    (where it was intended to be in the first place). Noticed by Caio.

    Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add nir_const_value_negative_equal | Ian Romanick | 2019-03-28 | 4 | -0/+398
    v2: Rebase on 1-bit Boolean changes.

    Reviewed-by: Thomas Helland <[email protected]> [v1]
    Reviewed-by: Kenneth Graunke <[email protected]>
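A minimal sketch of the "negative equal" idea for float32 vectors: two constants match when each component of one is the negation of the matching component of the other. `float_vec_negative_equal` is a hypothetical helper; Mesa's `nir_const_value_negative_equal` operates on `nir_const_value` and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when a[i] == -b[i] for every component.
 * NaN components never compare equal, so they always yield false. */
static bool float_vec_negative_equal(const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != -b[i])
            return false;
    }
    return true;
}
```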
* nir/algebraic: Add missing 16-bit extract_[iu]8 patterns | Ian Romanick | 2019-03-28 | 1 | -0/+3
    No shader-db changes on any Intel platform.

    v2: Use a loop to generate patterns. Suggested by Jason.

    v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that
    would replace an extract_i8 with an extract_u8. This broke ~180 tests.
    This bug was introduced in v2.

    Reviewed-by: Matt Turner <[email protected]> [v1]
    Reviewed-by: Dylan Baker <[email protected]> [v2]
    Acked-by: Jason Ekstrand <[email protected]> [v2]
* nir/algebraic: Add missing 64-bit extract_[iu]8 patterns | Ian Romanick | 2019-03-28 | 1 | -0/+3
    No shader-db changes on any Intel platform.

    v2: Use a loop to generate patterns. Suggested by Jason.

    v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that
    would replace an extract_i8 with an extract_u8. This broke ~180 tests.
    This bug was introduced in v2.

    Reviewed-by: Matt Turner <[email protected]> [v1]
    Reviewed-by: Dylan Baker <[email protected]> [v2]
    Acked-by: Jason Ekstrand <[email protected]> [v2]
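A sketch of the extract_u8 / extract_i8 semantics for 16-bit and 64-bit sources, written as plain C: both take byte `n` of the source; `extract_u8` zero-extends it and `extract_i8` sign-extends it. The helper names are hypothetical, and the `int8_t` conversion assumes a two's-complement target.

```c
#include <stdint.h>

/* Byte n of a 64-bit value, zero-extended (extract_u8). */
static uint64_t extract_u8_64(uint64_t x, unsigned n)
{
    return (x >> (8u * n)) & 0xffu;
}

/* Byte n of a 64-bit value, sign-extended (extract_i8). */
static int64_t extract_i8_64(uint64_t x, unsigned n)
{
    return (int64_t)(int8_t)(uint8_t)(x >> (8u * n));
}

/* Same operations for a 16-bit source (n is 0 or 1). */
static uint16_t extract_u8_16(uint16_t x, unsigned n)
{
    return (uint16_t)((x >> (8u * n)) & 0xffu);
}

static int16_t extract_i8_16(uint16_t x, unsigned n)
{
    return (int16_t)(int8_t)(uint8_t)(x >> (8u * n));
}
```

For unsigned values, extracting byte `n` of `x << (8 * k)` gives byte `n - k` of `x` whenever `n >= k`. Patterns that simplify an extract of an `ishl` rely on identities of this kind, and the signed and unsigned variants must not be swapped (the v3 bug above).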
* nir/algebraic: Remove redundant extract_[iu]8 patterns | Ian Romanick | 2019-03-28 | 1 | -14/+4
    No shader-db changes on any Intel platform.

    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
    Acked-by: Jason Ekstrand <[email protected]>
* nir/algebraic: Fix up extract_[iu]8 after loop unrolling | Ian Romanick | 2019-03-28 | 1 | -2/+20
    Skylake, Broadwell, and Haswell had similar results. (Skylake shown)

    total instructions in shared programs: 15256840 -> 15256837 (<.01%)
    instructions in affected programs: 4713 -> 4710 (-0.06%)
    helped: 3
    HURT: 0
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

    total cycles in shared programs: 372286583 -> 372286583 (0.00%)
    cycles in affected programs: 198516 -> 198516 (0.00%)
    helped: 1
    HURT: 1
    helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
    helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
    HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
    HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

    No changes on any other Intel platform.

    v2: Use a loop to generate patterns. Suggested by Jason.

    v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that
    would replace an extract_i8 with an extract_u8. This broke ~180 tests.
    This bug was introduced in v2.

    Reviewed-by: Matt Turner <[email protected]> [v1]
    Reviewed-by: Dylan Baker <[email protected]> [v2]
    Acked-by: Jason Ekstrand <[email protected]> [v2]
* nir/deref: fix struct wrapper casts. (v3) | Dave Airlie | 2019-03-29 | 1 | -2/+36
    llvm/spir-v emits some struct a { struct b {} } wrappers, but instead
    of dereferencing the inner member it casts (struct a) to (struct b).
    Reconstruct struct derefs instead of casts for these.

    v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
    v3: squish more stuff into one function, drop unused temp (Jason)

    Reviewed-by: Jason Ekstrand <[email protected]>
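The cast and the member deref reach the same object because a wrapped struct sits at offset 0 of its wrapper. In C, a pointer to a struct, suitably converted, points to its first member (C11 6.7.2.1). This is a plain-C analogy with hypothetical types, not NIR code. A member is added to `struct b` because an empty struct is not valid C.

```c
#include <stddef.h>

struct b { float v; };
struct a { struct b inner; };   /* a wraps b as its first (only) member */

/* What the frontend emits: a pointer cast from the wrapper to the member type. */
static struct b *via_cast(struct a *p)
{
    return (struct b *)p;
}

/* What the fix reconstructs: an explicit member access. */
static struct b *via_deref(struct a *p)
{
    return &p->inner;
}
```

Using the explicit deref form keeps the access typed as a struct member, which later passes can reason about more easily than a raw cast.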
* i965/blorp: Remove unused parameter from blorp_surf_for_miptree. | Rafael Antognolli | 2019-03-28 | 1 | -24/+12
    It seems pretty useless nowadays.

    Reviewed-by: Jason Ekstrand <[email protected]>
* iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch | Anuj Phogat | 2019-03-28 | 1 | -0/+7
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* iris/icl: Set Enabled Texel Offset Precision Fix bit | Anuj Phogat | 2019-03-28 | 1 | -0/+7
    The hardware specification requires this bit to always be set.
    See Mesa commit 5eb173304bd.

    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* freedreno/ir3: align const size to vec4 | Rob Clark | 2019-03-28 | 1 | -4/+5
    The const size is no longer guaranteed to be vec4-aligned since
    PIPE_CAP_PACKED_UNIFORMS was enabled.

    Fixes: 3c8779af325 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
    Signed-off-by: Rob Clark <[email protected]>
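Aligning a size counted in dwords up to a whole vec4 means rounding it up to the next multiple of 4. A minimal sketch; `align_to_vec4` is a hypothetical helper, not the function used in ir3.

```c
/* Round a size in dwords up to the next multiple of 4 (one vec4 = 4 dwords).
 * Adding 3 and clearing the two low bits works because 4 is a power of two. */
static unsigned align_to_vec4(unsigned dwords)
{
    return (dwords + 3u) & ~3u;
}
```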
* freedreno/ir3: reads/writes to unrelated arrays are not dependent | Rob Clark | 2019-03-28 | 1 | -1/+30
    Signed-off-by: Rob Clark <[email protected]>
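The dependency rule in the subject can be sketched as follows. Two array accesses must stay ordered only when they touch the same array and at least one of them writes. The struct and function are hypothetical models, not ir3's data structures.

```c
#include <stdbool.h>

/* Hypothetical model of an array access seen by a scheduler. */
struct array_access {
    int array_id;   /* which array is accessed */
    bool is_write;  /* true for a store, false for a load */
};

/* Accesses to different arrays never conflict. For the same array, only
 * read-after-read is freely reorderable. */
static bool accesses_depend(struct array_access first, struct array_access second)
{
    if (first.array_id != second.array_id)
        return false;
    return first.is_write || second.is_write;
}
```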
* freedreno/ir3: sched fix | Rob Clark | 2019-03-28 | 1 | -1/+1
    Not sure why new-style frag inputs start triggering this, but we
    probably shouldn't consider srcs from other blocks.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: small cleanup | Rob Clark | 2019-03-28 | 1 | -2/+2
    Signed-off-by: Rob Clark <[email protected]>
* iris: Fix blits with S8_UINT destination | Kenneth Graunke | 2019-03-28 | 1 | -4/+2
    For depth and stencil blits, we always want the main mask to be Z, and
    the secondary pass mask to be S. If asked to blit Z+S to S, we should
    handle the blit in the second pass, which properly gets the stencil
    resources.

    Before, we were trying to handle S as the main mask, and accidentally
    blitting a Z source to an S destination, which doesn't work out well.

    Fixes Piglit's "framebuffer-blit-levels {draw,read} stencil" tests.
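A sketch of the mask split described above. Depth always goes to the main pass and stencil always goes to the secondary pass, so a stencil-only request leaves the main pass empty instead of treating S as the main mask. The enum values and names are hypothetical and are not GL's or iris's definitions.

```c
/* Hypothetical buffer bits (not GL's actual enum values). */
enum { BLIT_DEPTH = 1u << 0, BLIT_STENCIL = 1u << 1 };

struct blit_passes {
    unsigned main_mask;     /* first pass: depth only */
    unsigned stencil_mask;  /* second pass: stencil only */
};

/* Split the requested mask by aspect, never by "whatever comes first". */
static struct blit_passes split_blit_mask(unsigned mask)
{
    struct blit_passes p;
    p.main_mask = mask & BLIT_DEPTH;
    p.stencil_mask = mask & BLIT_STENCIL;
    return p;
}
```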
* st/mesa: Fix blitting from GL_DEPTH_STENCIL to GL_STENCIL_INDEX | Kenneth Graunke | 2019-03-28 | 1 | -0/+1
    Fixes assertion failures in Piglit's "framebuffer-blit-levels
    {draw,read} stencil" tests on iris.

    Also fixes assert failures in frameretrace, which tries to ReadPixels
    the stencil values (only) from a Z24S8 depth/stencil attachment.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/ir3: Add workaround for VS samgq | Kristian H. Kristensen | 2019-03-28 | 6 | -4/+29
    This instruction needs a workaround when used from vertex shaders.

    Fixes:

    dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
    dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
    dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
    dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
    dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
    dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
    dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex

    Signed-off-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Don't access beyond available regs | Kristian H. Kristensen | 2019-03-28 | 1 | -4/+7
    emit_cat5() needs to check if the last optional reg is there before it
    accesses it.

    Signed-off-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
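The fix amounts to bounds-checking an optional operand before reading it. A sketch with a hypothetical instruction struct (the field names `regs_count` and `regs` are assumptions here, not a claim about ir3's layout):

```c
/* Hypothetical instruction with a variable number of registers. */
struct instr {
    unsigned regs_count;
    const int *regs;
};

/* Return the optional register at index idx, or -1 when the instruction
 * does not carry it, instead of reading past the end of regs[]. */
static int optional_reg(const struct instr *in, unsigned idx)
{
    if (idx >= in->regs_count)
        return -1;
    return in->regs[idx];
}
```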
* util/disk_cache: close fd in the fallback path | Eric Engestrom | 2019-03-28 | 1 | -4/+3
    There are multiple `goto path_fail` with an open fd, but none that go
    to `fail:` without going through `path_fail:` first, so let's just
    move the `close(fd)` there.

    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
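The label layout the commit describes can be sketched with POSIX `open`/`close`. Errors after the fd is opened jump to `path_fail`, which closes it and falls through to `fail`. Errors before the open jump straight to `fail`. The function name, the `simulate_error` flag, and the `opened_fd` out-parameter are hypothetical and exist only to make the sketch testable. This is not disk_cache's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

static int open_checked(const char *path, int simulate_error, int *opened_fd)
{
    int fd = open(path, O_RDONLY);
    *opened_fd = fd;        /* exposed only so callers can inspect it */
    if (fd == -1)
        goto fail;          /* nothing to clean up yet */

    if (simulate_error)
        goto path_fail;     /* fd is open here, so it must be closed */

    return fd;              /* success: caller owns the fd */

path_fail:
    close(fd);              /* the single place the fd is released on error */
fail:
    return -1;
}
```

Keeping a single `close(fd)` at `path_fail:` works because every path that reaches `fail:` with an open fd goes through `path_fail:` first.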
* radv: skip updating depth/color metadata for conditional rendering | Samuel Pitoiset | 2019-03-28 | 1 | -3/+3
    I don't think we should update metadata when conditional rendering is
    enabled. For some reason, some CTS tests break only on SI.

    This fixes the following CTS tests on SI:
    dEQP-VK.conditional_rendering.draw_clear.clear.depth.*

    Cc: 19.0 <[email protected]>
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* st/nir: Free the GLSL IR after linking. | Kenneth Graunke | 2019-03-28 | 1 | -0/+4
    i965 does this, and st's tgsi path does this. st/nir did not.

    Cuts 138MB of memory from a DiRT Rally trace, which is about 44% of the
    total GLSL IR memory.

    Reviewed-by: Timothy Arceri <[email protected]>
* radv: enable VK_AMD_gpu_shader_int16 | Samuel Pitoiset | 2019-03-28 | 1 | -0/+1
    This extension allows 16-bit support for Frexp/FrexpStruct.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not lower frexp_exp and frexp_sig | Samuel Pitoiset | 2019-03-28 | 1 | -1/+0
    The hardware has dedicated instructions for both.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_frexp_exp() helper and 16-bit/32-bit support | Samuel Pitoiset | 2019-03-28 | 3 | -3/+33
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_frexp_mant() helper and 16-bit/32-bit support | Samuel Pitoiset | 2019-03-28 | 3 | -2/+31
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* iris: Actually advertise some modifiers | Kenneth Graunke | 2019-03-27 | 1 | -0/+39
    I neglected to fill out this driver function, causing us to advertise
    0 modifiers. Now we advertise the various tilings and let the driver
    pick them.

    I've verified that X tiling works with Weston (by hacking the list to
    skip Y tiling).

    Y+CCS doesn't work yet because it's multiplane and the Gallium dri
    state tracker isn't really prepared for that. Leave it off for now.
* intel/genxml: Media instructions and structures for gen11 | Toni Lönnberg | 2019-03-28 | 1 | -24/+3450
    v2: Lionel Landwerlin <[email protected]>
     - fix missing type
     - fix *_FQM_*/*_QM_* commands
     - shorten some media structs using groups
     - factor out memory attributes
     - switch MI_FLUSH_DW fields to bool

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen10 | Toni Lönnberg | 2019-03-28 | 1 | -24/+3284
    v2: Lionel Landwerlin <[email protected]>
     - fix missing type
     - fix *_FQM_*/*_QM_* commands
     - shorten some media structs using groups
     - factor out memory attributes
     - switch MI_FLUSH_DW fields to bool

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen9 | Toni Lönnberg | 2019-03-28 | 1 | -24/+3090
    v2: Lionel Landwerlin <[email protected]>
     - fix missing type
     - fix *_FQM_*/*_QM_* commands
     - shorten some media structs using groups
     - factor out memory attributes
     - switch MI_FLUSH_DW fields to bool

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen8 | Toni Lönnberg | 2019-03-28 | 1 | -0/+1572
    v2: Lionel Landwerlin <[email protected]>
     - switch MI_FLUSH_DW fields to bool

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7.5 | Toni Lönnberg | 2019-03-28 | 1 | -1/+1291
    v2: Fixed MI_WAIT_FOR_EVENT to be for video also

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen7 | Toni Lönnberg | 2019-03-28 | 1 | -1/+1347
    v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Media instructions and structures for gen6 | Toni Lönnberg | 2019-03-28 | 1 | -1/+1003
    v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also

    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Only handle instructions meant for render engine when generating headers | Toni Lönnberg | 2019-03-28 | 2 | -7/+59
    v2: Fixed the check for engine

    v3: Changed engine into an argument given to the scripts

    Reviewed-by: Lionel Landwerlin <[email protected]>
* softpipe: add indirect store buffer/image unit | Dave Airlie | 2019-03-28 | 1 | -2/+34
    The code to handle indirect indexing of the image unit was missing.

    Fixes piglit
    tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-mixed-const-non-const-uniform-index.shader_test

    Reviewed-by: Roland Scheidegger <[email protected]>
* softpipe/draw: fix vertex id in soft paths. | Dave Airlie | 2019-03-28 | 5 | -11/+19
    This fixes the vertex id fetch in the non-llvm drawing paths. In elt
    mode, the vertex id comes from the elts, not just a linear value.

    Note we don't add basevertex in the elts case, as it appears to be
    already included in the elts (at least, tests fail if I add it).

    Fixes piglit end-primitive tests and some others.

    Reviewed-by: Roland Scheidegger <[email protected]>
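A sketch of the two vertex-id sources described above. For indexed (elt) draws, the id is the fetched element, which per the commit already includes basevertex, so it is not added again. For non-indexed draws, it is a linear count from the first vertex. The function and its parameters are hypothetical, not softpipe's draw-module API.

```c
#include <stddef.h>

/* Vertex id for the i-th vertex of a draw starting at `start`. */
static unsigned vertex_id(const unsigned *elts, unsigned start, unsigned i)
{
    if (elts != NULL)
        return elts[start + i];   /* elt mode: id comes from the index buffer */
    return start + i;             /* linear mode: id is just the position */
}
```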
* freedreno/ir3: Push UBOs to constant file | Kristian H. Kristensen | 2019-03-27 | 5 | -16/+145
    We have a rather big constant file and it seems that the best way to
    use it is to upload all UBOs and lower UBO access to load_uniform.

    Signed-off-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS | Kristian H. Kristensen | 2019-03-27 | 8 | -13/+120
    This commit turns on the gallium cap, adds a pass to lower the load_ubo
    intrinsics for block 0 back to load_uniform intrinsics, and adjusts the
    backend where the cap switches units from vec4s to dwords.

    As we stop using ir3_glsl_type_size() for uniform layout, this also
    corrects an issue where we would allocate a vec4 slot for samplers in
    uniforms, fixing:

    dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
    dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
    dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

    Signed-off-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* st/glsl_to_nir: Calculate num_uniforms from NumParameterValues | Kristian H. Kristensen | 2019-03-27 | 1 | -5/+5
    We don't need to determine the number of uniform slots here, it's
    already available as prog->Parameters->NumParameterValues.

    The way we previously determined the number of slots was also broken
    for PackedDriverUniformStorage, where we would add loc (in dwords) and
    type_size() (in vec4s).

    Signed-off-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
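A small illustration of why adding a dword location to a vec4-counted size goes wrong. A vec4 holds four dwords, so a size in vec4 slots must be multiplied by 4 before it is added to a dword offset. Both helpers are hypothetical and only show the arithmetic.

```c
/* Correct: convert the vec4 slot count to dwords before adding. */
static unsigned end_in_dwords(unsigned loc_dwords, unsigned size_vec4)
{
    return loc_dwords + size_vec4 * 4u;
}

/* Broken: mixes units (dwords + vec4 slots), undercounting by 3 per slot. */
static unsigned end_mixed_units(unsigned loc_dwords, unsigned size_vec4)
{
    return loc_dwords + size_vec4;
}
```

With a uniform at dword 8 that occupies 2 vec4 slots, the correct end is dword 16, while the mixed-unit sum gives 10.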
* intel: Add Elkhart Lake PCI-IDs | Anuj Phogat | 2019-03-27 | 1 | -0/+4
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Add Elkhart Lake device info | Anuj Phogat | 2019-03-27 | 1 | -0/+60
    v2: Fix L3 bank count (Vivek)
        Fix simulator_id and num_eu_per_subslice (Lionel)

    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* radeon/vcn: add H.264 constrained baseline support | Leo Liu | 2019-03-27 | 1 | -0/+1
    Like UVD, VCN supports this profile, so add it.

    Signed-off-by: Leo Liu <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    CC: <[email protected]>
* egl/android: choose node type based on swrast and preprocessor flags | Gurchetan Singh | 2019-03-27 | 1 | -3/+9
    kms_swrast can work with primary nodes out of the box, but also with
    render nodes if the build environment specifies the
    EGL_FORCE_RENDERNODE flag.

    Suggested-by: Emil Velikov <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
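A sketch of the node selection described above. kms_swrast uses a primary node by default and a render node only when the build defines `EGL_FORCE_RENDERNODE`. The function name is hypothetical. The branch that returns a render node for hardware drivers is an assumption and is not stated in the commit.

```c
#include <stdbool.h>

/* Returns true to open a render node, false to open a primary node. */
static bool use_render_node(bool swrast)
{
    if (!swrast)
        return true;    /* assumption: hardware drivers use render nodes */
#ifdef EGL_FORCE_RENDERNODE
    return true;        /* build explicitly forces render nodes for kms_swrast */
#else
    return false;       /* kms_swrast works with primary nodes out of the box */
#endif
}
```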
* egl/android: use software rendering when appropriate | Gurchetan Singh | 2019-03-27 | 1 | -5/+6
    Now the init logic falls back to, or forces, software rendering.

    v2: simplify flow (@eric)

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
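A sketch of "falls back to, or forces, software rendering". Hardware init is skipped entirely when software rendering is forced. Otherwise it is attempted, and a failure falls back to the software path. All names, including the stub init functions, are hypothetical and do not reflect the structure of the EGL Android platform code.

```c
#include <stdbool.h>

enum driver_choice { DRIVER_HARDWARE, DRIVER_SOFTWARE };

static enum driver_choice pick_driver(bool force_software, bool (*hw_init)(void))
{
    /* Short-circuit: hw_init() is not called when software is forced. */
    if (!force_software && hw_init())
        return DRIVER_HARDWARE;
    return DRIVER_SOFTWARE;   /* forced, or hardware init failed */
}

/* Stub initializers used to exercise both outcomes. */
static bool hw_ok(void)   { return true; }
static bool hw_fail(void) { return false; }
```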
* egl/android: use swrast option in droid_load_driver | Gurchetan Singh | 2019-03-27 | 1 | -0/+18
    Load the kms_swrast driver when specified. This doesn't work with
    drm_gralloc.

    v2: remove unneeded line (@eric)
    v3: Remove swrast_loader_extensions (@evelikov)

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* egl/android: plumb swrast option | Gurchetan Singh | 2019-03-27 | 1 | -9/+9
    It's good to have options.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* egl/android: refactor droid_load_driver a bit | Gurchetan Singh | 2019-03-27 | 1 | -24/+20
    This way, we can use primary nodes with kms_swrast too. Also fix up
    some whitespace issues.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* egl/android: droid_open_device_drm_gralloc --> droid_open_device | Gurchetan Singh | 2019-03-27 | 1 | -7/+4
    Makes things easier to follow.

    Suggested-by: Emil Velikov <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* egl/android: move droid_open_device_drm_gralloc down a bit | Gurchetan Singh | 2019-03-27 | 1 | -27/+24
    1) Removes a forward declaration.
    2) Makes next patch easier.

    Suggested-by: Emil Velikov <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>