path: root/src
Commit message | Author | Age | Files | Lines
* mesa: expose AMD_gpu_shader_int64 | Marek Olšák | 2018-08-24 | 8 | -12/+260
    because the closed driver exposes it. It's equivalent to ARB_gpu_shader_int64. In this patch, I did everything the same as we do for ARB_gpu_shader_int64.

    Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose ARB_post_depth_coverage in the Compatibility profile | Marek Olšák | 2018-08-24 | 2 | -1/+2
    It only contains GLSL changes.

    v2: allow the layout qualifier on GLSL <= 1.30
* intel/nir: Enable nir_opt_find_array_copies | Jason Ekstrand | 2018-08-23 | 2 | -13/+28
    We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more.

    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15176942 -> 15176942 (0.00%)
        instructions in affected programs: 0 -> 0
        helped: 0
        HURT: 0

    In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array copy optimization | Jason Ekstrand | 2018-08-23 | 4 | -0/+415
    This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for an OpLoad of a large composite followed by OpStore. It can always be improved later if needed.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
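To illustrate the kind of pattern such a peephole looks for, here is a minimal sketch in plain C. It is not the NIR pass: `struct elem_copy` and `is_whole_array_copy` are hypothetical names, and the sketch assumes the element copies appear in index order, which may be stricter or looser than what the real pass accepts.

```c
#include <stdbool.h>
#include <stddef.h>

/* One element-wise copy: dst[dst_index] = src[src_index].
 * Hypothetical stand-in for a load_deref/store_deref pair. */
struct elem_copy {
   unsigned dst_index;
   unsigned src_index;
};

/* Returns true if the sequence copies elements 0..array_len-1 in order,
 * each to the same index, so that a single whole-array copy could
 * replace it. */
static bool
is_whole_array_copy(const struct elem_copy *copies, size_t num_copies,
                    unsigned array_len)
{
   if (num_copies != array_len)
      return false;

   for (size_t i = 0; i < num_copies; i++) {
      if (copies[i].dst_index != i || copies[i].src_index != i)
         return false;
   }
   return true;
}
```

A sequence such as dst[0] = src[0], dst[1] = src[1], dst[2] = src[2] on three-element arrays passes the check; a sequence that swaps two elements or stops early does not.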
* intel/nir: Use nir_shrink_vec_array_vars | Jason Ekstrand | 2018-08-23 | 1 | -0/+1
    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15177605 -> 15176765 (<.01%)
        instructions in affected programs: 4259 -> 3419 (-19.72%)
        helped: 1
        HURT: 0

        total spills in shared programs: 10954 -> 10855 (-0.90%)
        spills in affected programs: 295 -> 196 (-33.56%)
        helped: 1
        HURT: 0

        total fills in shared programs: 22222 -> 22117 (-0.47%)
        fills in affected programs: 417 -> 312 (-25.18%)
        helped: 1
        HURT: 0

    The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a array-of-vector variable shrinking pass | Jason Ekstrand | 2018-08-23 | 2 | -0/+718
    This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
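The narrowing step can be pictured with a component-usage bitmask. The sketch below is an assumption about how such a shrink could be modelled, not code from the pass: `shrunk_num_components` and `remap_component` are hypothetical helpers, vectors are assumed to have at most four components, and the surviving components are assumed to be packed in their original order.

```c
#include <stdint.h>

/* Number of components kept after narrowing: one per bit set in
 * used_mask (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w). */
static unsigned
shrunk_num_components(uint8_t used_mask)
{
   unsigned count = 0;
   for (unsigned c = 0; c < 4; c++) {
      if (used_mask & (1u << c))
         count++;
   }
   return count;
}

/* Maps an old component index to its index in the narrowed vector, or
 * returns -1 if that component is unused and therefore dropped. */
static int
remap_component(uint8_t used_mask, unsigned old_comp)
{
   if (old_comp >= 4 || !(used_mask & (1u << old_comp)))
      return -1;

   /* New index = number of used components below old_comp. */
   uint8_t below = used_mask & (uint8_t)((1u << old_comp) - 1);
   return (int)shrunk_num_components(below);
}
```

For a vec4 where only x and z are ever accessed (mask 0x5), the narrowed type would have two components, with z moving from index 2 to index 1.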
* intel/nir: Use the new structure and array splitting passes | Jason Ekstrand | 2018-08-23 | 1 | -0/+2
    We call structure splitting once because it is guaranteed to split all the structures in the entire shader in one go. We call array splitting in the loop in case future optimizations turn indirects into direct dereferences and we can split more arrays.

    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15177605 -> 15177605 (0.00%)
        instructions in affected programs: 0 -> 0
        helped: 0
        HURT: 0

    This is unsurprising because nir_lower_vars_to_ssa already effectively does structure and array splitting internally. It doesn't actually split the variables but its ability to reason about aliasing in the presence of arrays and structures and pick out scalars or vectors to be lowered to SSA values is fairly advanced.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array splitting pass | Jason Ekstrand | 2018-08-23 | 2 | -0/+584
    This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables.

    This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency.

    v2 (Jason Ekstrand):
     - Better comments and naming (some from Caio)
     - Rework to use one hash map instead of two

    v2.1 (Jason Ekstrand):
     - Fix a couple of bugs that were added in the rework including one which basically prevented it from running

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
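The aliasing claim about arr[i][0][5] and arr[i][1][j] follows from a simple rule: two paths into the same array variable cannot overlap if any level uses two different constant indices. The sketch below models that rule only; `struct array_index` and `paths_may_alias` are hypothetical names, not NIR data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* One array index in a dereference path. Hypothetical simplification:
 * either a known constant or an indirect (runtime) index. */
struct array_index {
   bool is_indirect;
   unsigned value;   /* only meaningful when !is_indirect */
};

/* Two paths of equal depth into the same variable may alias unless
 * some level has two different constant indices. */
static bool
paths_may_alias(const struct array_index *a, const struct array_index *b,
                size_t depth)
{
   for (size_t i = 0; i < depth; i++) {
      if (!a[i].is_indirect && !b[i].is_indirect &&
          a[i].value != b[i].value)
         return false;
   }
   return true;
}
```

With the paths from the commit message, the second level holds the constants 0 and 1, so the function reports that the two accesses cannot alias even though the first and last levels involve indirects.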
* nir: Add a structure splitting pass | Jason Ekstrand | 2018-08-23 | 4 | -0/+278
    This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/types: Add array_or_matrix helpers | Jason Ekstrand | 2018-08-23 | 2 | -0/+17
    Reviewed-by: Thomas Helland <[email protected]>
* i965: don't include compute resources in "Combined" limits | Kenneth Graunke | 2018-08-23 | 1 | -11/+11
    The combined limits should only include shader stages that can be active at the same time. We don't need to include compute.

    See also cff290df4c09547cd2cb3b129ec59bdebdadba90 for st/mesa.

    Unbreaks i965 from assert failing on driver load since Marek's 45f87a48f94148b484961f18a4f1ccf86f066b1c, which dropped the core Mesa capabilities before adjusting driver limits down to match.
* radeonsi: increase the maximum UBO size to 2 GB | Marek Olšák | 2018-08-23 | 1 | -1/+1
    Same as the closed driver.

    This causes a failure in GL45-CTS.compute_shader.max, which has a trivial bug.

    Tested-by: Dieter Nützel <[email protected]>
* radeonsi: bump MAX_GS_INVOCATIONS | Marek Olšák | 2018-08-23 | 2 | -3/+3
    same as the closed driver

    Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE | Marek Olšák | 2018-08-23 | 19 | -1/+39
    Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONS | Marek Olšák | 2018-08-23 | 19 | -1/+40
    Tested-by: Dieter Nützel <[email protected]>
* tgsi/ureg: don't call tgsi_sanity when it's too slow | Marek Olšák | 2018-08-23 | 1 | -1/+12
    Tested-by: Dieter Nützel <[email protected]>
* st/mesa: fix up uniform limits to be able to expose large UBOs | Marek Olšák | 2018-08-23 | 1 | -8/+23
    Tested-by: Dieter Nützel <[email protected]>
* st/mesa: don't include compute resources in "Combined" limits | Marek Olšák | 2018-08-23 | 1 | -6/+3
    The combined limits should only include shader stages that can be active at the same time.

    Tested-by: Dieter Nützel <[email protected]>
* st/mesa: set ctx->Const.SubPixelBits | Marek Olšák | 2018-08-23 | 1 | -0/+1
    Tested-by: Dieter Nützel <[email protected]>
* glsl: fix error checking against MAX_UNIFORM_LOCATIONS | Marek Olšák | 2018-08-23 | 1 | -2/+6
    Tested-by: Dieter Nützel <[email protected]>
* mesa: make MaxCombinedUniformComponents 64-bit to allow large UBOs | Marek Olšák | 2018-08-23 | 2 | -7/+7
    Tested-by: Dieter Nützel <[email protected]>
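A rough sketch of why a 32-bit field stops being enough once uniform blocks can be 2 GB: the combined component count adds the default-block components to the block count times the block size in 32-bit components. The function name and the sample numbers are illustrative assumptions, not values taken from any driver.

```c
#include <stdint.h>

/* Combined uniform components: default-block components plus, for every
 * uniform block, the maximum block size expressed in 32-bit components. */
static uint64_t
combined_uniform_components(uint32_t default_components,
                            uint32_t num_blocks,
                            uint32_t max_block_size_bytes)
{
   /* Widen before multiplying so the product cannot wrap at 32 bits. */
   return (uint64_t)default_components +
          (uint64_t)num_blocks * (max_block_size_bytes / 4);
}
```

With an assumed 14 blocks of 2 GiB each and 1024 default components, the result is well above what an unsigned 32-bit integer can hold, which is the situation the wider type is meant to cover.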
* mesa: add ctx->Const.MaxGeometryShaderInvocations | Marek Olšák | 2018-08-23 | 5 | -3/+7
    radeonsi wants to report a different value

    Reviewed-by: Ian Romanick <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
* mesa: don't include compute resources in MAX_COMBINED_* limits | Marek Olšák | 2018-08-23 | 1 | -9/+13
    5 is the maximum number of shader stages that can be used by 1 execution call at the same time (e.g. a draw call). The limit ensures that each stage can use all of its binding points.

    Compute is separate and doesn't need the 5x multiplier.

    Tested-by: Dieter Nützel <[email protected]>
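The 5x multiplier amounts to the following arithmetic. This is an illustrative sketch: `max_combined_bindings` and the enum name are hypothetical, and the real limits are set per resource type in Mesa's context constants.

```c
/* Graphics stages that one draw call can use together: vertex,
 * tessellation control, tessellation evaluation, geometry, fragment. */
enum { MAX_SIMULTANEOUS_STAGES = 5 };

/* Combined limit that lets every graphics stage use all of its own
 * binding points at once. Compute runs in a separate dispatch, so it
 * does not contribute to this product. */
static unsigned
max_combined_bindings(unsigned per_stage_limit)
{
   return MAX_SIMULTANEOUS_STAGES * per_stage_limit;
}
```

For an assumed per-stage limit of 16 binding points, the combined limit would be 80.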
* mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES | Marek Olšák | 2018-08-23 | 2 | -2/+5
    same number as our closed GL driver

    v2: don't use MaxArrayLockSize

    Tested-by: Dieter Nützel <[email protected]>
* mesa: remove incorrect change for EXT_disjoint_timer_query | Marek Olšák | 2018-08-23 | 1 | -2/+1
    Reviewed-by: Tapani Pälli <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
* glapi: actually implement GL_EXT_robustness for GLES | Marek Olšák | 2018-08-23 | 1 | -0/+32
    The extension was exposed but not the functions.

    This fixes:
        dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.readn_pixels
        dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformfv
        dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformiv

    Cc: 18.1 18.2 <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* intel/decoder: Decode SFIXED values. | Kenneth Graunke | 2018-08-23 | 1 | -3/+7
    This lets us examine SAMPLER_STATE's LOD Bias field, among other things.

    Reviewed-by: Lionel Landwerlin <[email protected]>
* configure: allow building with python3 | Emil Velikov | 2018-08-23 | 25 | -33/+33
    Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs.

    Note:
     - python3.4 is used as it's the earliest supported version
     - python3 chosen prior to python2

    Signed-off-by: Emil Velikov <[email protected]>
    Acked-by: Eric Engestrom <[email protected]>
* vk/wsi: avoid reading uninitialised memory | Eric Engestrom | 2018-08-23 | 1 | -2/+2
    It will be ignored by x11_swapchain_result() anyway (because reaching the `fail` label without setting `result` means the swapchain status was already a hard error), but the compiler still complains about reading uninitialised memory.

    While at it, drop the unused assignment right before returning.

    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* egl: drop unused _EGL_BUILT_IN_DRIVER_DRI2 | Eric Engestrom | 2018-08-23 | 3 | -4/+1
    Unused since b174a1ae720cb404738c "egl: Simplify the "driver" interface".

    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* radv/gfx9: implement coherent shaders for VK_ACCESS_SHADER_READ_BIT | Samuel Pitoiset | 2018-08-23 | 1 | -1/+20
    Single-sample color and single-sample depth (not stencil) are coherent with shaders.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* meson: Run the install script with Python 3 | Mathieu Bridon | 2018-08-23 | 5 | -0/+5
    The script was being run directly as an executable, and it has a Python 2 shebang.

    Reviewed-by: Eric Engestrom <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* glsl: remove execute bit and shebang from python tests | Emil Velikov | 2018-08-23 | 3 | -3/+0
    Just like the rest of the tree - these should be run either as part of the build system check target, or at the very least with an explicitly versioned python executable.

    Fixes: db8cd8e3677 ("glcpp/tests: Convert shell scripts to a python script")
    Fixes: 97c28cb0823 ("glsl/tests: Convert optimization-test.sh to pure python")
    Fixes: 3b52d292273 ("glsl/tests: reimplement warnings-test in python")
    Signed-off-by: Emil Velikov <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* i965: Enable INTEL_shader_atomic_float_minmax on Gen9+ | Ian Romanick | 2018-08-22 | 1 | -0/+1
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: Sort Gen9+ extension enables | Ian Romanick | 2018-08-22 | 1 | -3/+3
    This is a strictly alphabetic sort, as is done in extensions_table.h.

    There are other options. We should pick one and document it. Right now, this file is chaos.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages | Ian Romanick | 2018-08-22 | 14 | -1/+261
    v2: Split changes to the message type field to another patch. Suggested by Caio.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Expand untyped atomic message type field by a bit | Ian Romanick | 2018-08-22 | 3 | -4/+9
    This is necessary for a new Gen9 message type that will be added in the next patch. There are also Gen8 message types that need the extra bit (mostly for bindless).

    v2: Split off from the next patch. Suggested by Caio.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Silence unused parameter warnings | Ian Romanick | 2018-08-22 | 5 | -7/+5
    src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’:
    src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter]
    __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                ^~~~~
    src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter]
    __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {}
                                                                             ^~

    src/intel/compiler/brw_disasm.c: In function ‘src_ia1’:
    src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter]
    unsigned _reg_file,
             ^~~~~~~~~

    src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’:
    src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter]
    unsigned dims, unsigned size,
                            ^~~~

    v2: Update commit message. brw_fs_generator.cpp warnings were already fixed by another patch. Noticed by Caio.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add floating point atomic min, max, and compare-swap instrinsics | Ian Romanick | 2018-08-22 | 4 | -8/+50
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add floating point atomic add instrinsics | Ian Romanick | 2018-08-22 | 5 | -5/+22
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Add support for lowering shared-variable float atomics | Ian Romanick | 2018-08-22 | 1 | -3/+3
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Add support for lowering SSBO float atomics | Ian Romanick | 2018-08-22 | 1 | -3/+3
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Add built-in functions for INTEL_shader_atomic_float_minmax | Ian Romanick | 2018-08-22 | 1 | -1/+32
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* mesa: Extension boilerplate for INTEL_shader_atomic_float_minmax | Ian Romanick | 2018-08-22 | 4 | -0/+5
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* glsl: Add built-in functions for NV_shader_atomic_float | Ian Romanick | 2018-08-22 | 1 | -3/+48
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* mesa: Extension boilerplate for NV_shader_atomic_float | Ian Romanick | 2018-08-22 | 4 | -0/+5
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* meson: fix egl build for android | Gurchetan Singh | 2018-08-22 | 1 | -0/+1
    Haven't tested this, but we do include loader.h in platform_android.c

    Fixes: c5ec1556859b7d33637c9fad13d3473c7b2f9eb3 ("meson: wire up egl/android")
    Reviewed-by: Dylan Baker <[email protected]>
* meson: fix egl build for surfaceless | Gurchetan Singh | 2018-08-22 | 1 | -0/+1
    Without this, I get:

    > platform_surfaceless.c:38:10: fatal error: 'loader.h' file not found
    > #include "loader.h"
    >          ^~~~~~~~~~
    > 1 error generated.

    Fixes: 108d257a16859898f5ce02f4759c5c58f9b8c050 ("meson: build libEGL")
    Reviewed-by: Dylan Baker <[email protected]>

    v2: Split up patches, modify commit message (Dylan)
* nir: Give end_block its own index | Caio Marcelo de Oliveira Filho | 2018-08-22 | 1 | -1/+4
    Since there's no particular reason for the index to be 0, choose an index that is not used by any other block. This is convenient when we store "per-block" data in an array AND look for the successors' data (e.g. any kind of backwards data-flow analysis).

    v2: Add a note about end_block's index. (Jason)

    Reviewed-by: Jason Ekstrand <[email protected]>
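The convenience described here can be sketched as follows. This is a simplified model, not NIR code: `struct block`, `index_blocks`, and `merge_successor_data` are hypothetical, and giving the end block the index equal to the number of ordinary blocks is an assumption about one natural choice of unused index.

```c
#include <stddef.h>

/* Simplified basic block with up to two successors (NULL when absent). */
struct block {
   unsigned index;
   struct block *succ[2];
};

/* Ordinary blocks get 0..num_blocks-1; the end block gets num_blocks,
 * so a per-block array of num_blocks + 1 entries has a distinct slot
 * for every block. */
static void
index_blocks(struct block **blocks, unsigned num_blocks,
             struct block *end_block)
{
   for (unsigned i = 0; i < num_blocks; i++)
      blocks[i]->index = i;
   end_block->index = num_blocks;
}

/* Backwards data-flow step: OR together the data of all successors,
 * with no special case when a successor is the end block. */
static unsigned
merge_successor_data(const struct block *b, const unsigned *per_block_data)
{
   unsigned merged = 0;
   for (int i = 0; i < 2; i++) {
      if (b->succ[i])
         merged |= per_block_data[b->succ[i]->index];
   }
   return merged;
}
```

If the end block shared index 0 with the first block, the lookup in `merge_successor_data` would silently read the first block's data for any block that falls through to the end.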
* nir: Skip common instructions when comparing deref paths | Caio Marcelo de Oliveira Filho | 2018-08-22 | 1 | -0/+3
    Deref paths may share the same deref instructions in their chains, e.g.

        ssa_100 = deref_var A
        ssa_101 = deref_struct "array_field" of ssa_100
        ssa_102 = deref_array "[1]" of ssa_101
        ssa_103 = deref_struct "field_a" of ssa_102
        ssa_104 = deref_struct "field_a" of ssa_103

    when comparing the two last deref instructions, their paths will share a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next iteration if the deref instructions are the same. Path[0] (the var) is still handled specially, so in the case above, only ssa_101 and ssa_102 will be skipped.

    Reviewed-by: Jason Ekstrand <[email protected]>
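The skip can be modelled as a pointer comparison over two paths. The sketch below is a hedged illustration, not the patch itself: `struct deref` and `first_divergent_link` are hypothetical, and the real comparison loop does more work per link than finding the first difference.

```c
#include <stddef.h>

/* Hypothetical stand-in for a deref instruction; only its identity
 * (its address) matters for this sketch. */
struct deref {
   unsigned id;
};

/* Returns the first position after the variable (position 0, which the
 * caller handles separately) where the two paths stop using the very
 * same instruction. Links before that position need no comparison. */
static size_t
first_divergent_link(struct deref *const *a, struct deref *const *b,
                     size_t len)
{
   size_t i = 1;
   while (i < len && a[i] == b[i])
      i++;
   return i;
}
```

For two four-entry paths that share their first three instructions, the function returns 3, so only the final link of each path has to be compared in detail.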