Commit message | Author | Age | Files | Lines
* Revert recent changes about not including compute in combined limits. | Kenneth Graunke | 2018-08-24 | 3 | -27/+26
  As far as I can tell, no one reviewed these changes, they made i965 assert fail on driver load, and I am not certain they are correct. (Hopefully reverting these does not break radeonsi too badly...)

  The uniform related changes seem fine and reasonable, but the texture image units change is possibly incorrect. According to the OES_tessellation_shader spec issue 5:

      (5) How are aggregate shader limits computed?

      RESOLVED: Following the GL 4.4 model, but we restrict uniform buffer bindings to 12/stage instead of 14, this results in

          MAX_UNIFORM_BUFFER_BINDINGS = 72
              This is 12 bindings/stage * 6 shader stages, allowing a static partitioning of the bindings even though at most 5 stages can appear in a program object).

          MAX_COMBINED_UNIFORM_BLOCKS = 60
              This is 12 blocks/stage * 5 stages, since compute shaders can't be mixed with other stages.

          MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
              This is 16 textures/stage * 6 stages.

  which definitely is including compute shaders in that last limit. Not including compute shaders breaks the following test:

      dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger

  There was enough breakage that I figured we should just send this back to the drawing board.

  Revert "i965: don't include compute resources in "Combined" limits"
  Revert "st/mesa: don't include compute resources in "Combined" limits"
  Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"

  This reverts commit b03dcb1e5f507c5950d0de053a6f76e6306ee71f.
  This reverts commit cff290df4c09547cd2cb3b129ec59bdebdadba90.
  This reverts commit 45f87a48f94148b484961f18a4f1ccf86f066b1c.
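The arithmetic in the quoted spec issue can be checked with a small sketch. The helper `combined_limit` and the enum names are made up for illustration; they are not Mesa identifiers.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not Mesa identifiers. */
enum {
    STAGES_IN_ONE_PROGRAM = 5, /* vertex, tess control, tess eval, geometry, fragment */
    STAGES_TOTAL = 6           /* the five above plus compute */
};

/* Aggregate limit = per-stage limit * number of stages counted. */
static unsigned combined_limit(unsigned per_stage, unsigned stages)
{
    assert(stages == STAGES_IN_ONE_PROGRAM || stages == STAGES_TOTAL);
    return per_stage * stages;
}
```

Under these assumptions, `combined_limit(12, STAGES_TOTAL)` matches the 72 uniform buffer bindings, `combined_limit(12, STAGES_IN_ONE_PROGRAM)` matches the 60 combined uniform blocks, and `combined_limit(16, STAGES_TOTAL)` matches the 96 texture image units, which is why the last limit has to count compute.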
* gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0 | Roland Scheidegger | 2018-08-24 | 1 | -27/+60
  These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders, making it difficult to detect when intrinsics vanish.)

  Luckily the signed ones are still there. I helped convince llvm that removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work: unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface.

  Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Our cmp/select build helpers don't use boolean masks, but that doesn't seem to interfere with llvm's ability to recognize the pattern.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
  Reviewed-by: Jose Fonseca <[email protected]>
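As a rough scalar illustration of what a compare/select fallback for unsigned saturation looks like, here is a minimal sketch. It shows one common formulation only; the exact IR that gallivm emits and that llvm pattern-matches is not reproduced here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add: if the wrapped sum is smaller than an operand,
 * the addition overflowed, so select the maximum value instead. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);          /* wraps modulo 256 */
    uint8_t result = sum < a ? UINT8_MAX : sum;
    /* Cross-check against the widened reference computation. */
    assert(result == ((unsigned)a + b > 255u ? 255u : (unsigned)a + b));
    return result;
}

/* Unsigned saturating subtract: select zero when b would underflow a. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```

In the driver the same compare-then-select shape would be applied per vector lane rather than on scalars.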
* st/mesa: expose KHR_texture_compression_astc_sliced_3d | Marek Olšák | 2018-08-24 | 3 | -3/+6
  This is ASTC 2D LDR allowing texture arrays and 3D, compressing each slice as a separate 2D image.

  Tested by piglit. Trivial.
* st/mesa: expose EXT_disjoint_timer_query | Marek Olšák | 2018-08-24 | 2 | -0/+2
  same cap as ARB_timer_query, no changes needed, tested by piglit
* mesa: expose EXT_vertex_attrib_64bit | Marek Olšák | 2018-08-24 | 4 | -0/+74
  because the closed driver exposes it. It's the same as the ARB extension.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_query_buffer_object | Marek Olšák | 2018-08-24 | 2 | -0/+2
  it's a subset of the ARB extension.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_multi_draw_indirect | Marek Olšák | 2018-08-24 | 3 | -0/+22
  because the closed driver exposes it. This is equivalent to the ARB extension.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose AMD_gpu_shader_int64 | Marek Olšák | 2018-08-24 | 9 | -12/+261
  because the closed driver exposes it.

  It's equivalent to ARB_gpu_shader_int64. In this patch, I did everything the same as we do for ARB_gpu_shader_int64.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: expose ARB_post_depth_coverage in the Compatibility profile | Marek Olšák | 2018-08-24 | 2 | -1/+2
  It only contains GLSL changes.

  v2: allow the layout qualifier on GLSL <= 1.30
* intel/nir: Enable nir_opt_find_array_copies | Jason Ekstrand | 2018-08-23 | 2 | -13/+28
  We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more.

  Shader-db results on Kaby Lake:

      total instructions in shared programs: 15176942 -> 15176942 (0.00%)
      instructions in affected programs: 0 -> 0
      helped: 0
      HURT: 0

  In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array copy optimization | Jason Ekstrand | 2018-08-23 | 4 | -0/+415
  This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for an OpLoad of a large composite followed by OpStore. It can always be improved later if needed.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
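A loose source-level analogy in C (not NIR, and not the pass itself) may help picture the rewrite: a run of per-element copies covering every index is equivalent to one whole-array copy. The type and function names below are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { COPY_LEN = 4 };

typedef struct { float v[4]; } vec4;

/* The shape being recognised: one copy per element, covering every index. */
static void copy_per_element(vec4 dst[COPY_LEN], const vec4 src[COPY_LEN])
{
    assert(dst != NULL && src != NULL);
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}

/* The replacement: a single copy of the entire array. */
static void copy_whole_array(vec4 dst[COPY_LEN], const vec4 src[COPY_LEN])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(vec4) * COPY_LEN);
}
```

Once the copy is expressed as a single operation, later passes can reason about the destination as an alias of the source instead of tracking four unrelated stores.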
* intel/nir: Use nir_shrink_vec_array_vars | Jason Ekstrand | 2018-08-23 | 1 | -0/+1
  Shader-db results on Kaby Lake:

      total instructions in shared programs: 15177605 -> 15176765 (<.01%)
      instructions in affected programs: 4259 -> 3419 (-19.72%)
      helped: 1
      HURT: 0

      total spills in shared programs: 10954 -> 10855 (-0.90%)
      spills in affected programs: 295 -> 196 (-33.56%)
      helped: 1
      HURT: 0

      total fills in shared programs: 22222 -> 22117 (-0.47%)
      fills in affected programs: 417 -> 312 (-25.18%)
      helped: 1
      HURT: 0

  The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a array-of-vector variable shrinking pass | Jason Ekstrand | 2018-08-23 | 2 | -0/+718
  This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use the new structure and array splitting passes | Jason Ekstrand | 2018-08-23 | 1 | -0/+2
  We call structure splitting once because it is guaranteed to split all the structures in the entire shader in one go. We call array splitting in the loop in case future optimizations turn indirects into direct dereferences and we can split more arrays.

  Shader-db results on Kaby Lake:

      total instructions in shared programs: 15177605 -> 15177605 (0.00%)
      instructions in affected programs: 0 -> 0
      helped: 0
      HURT: 0

  This is unsurprising because nir_lower_vars_to_ssa already effectively does structure and array splitting internally. It doesn't actually split the variables but its ability to reason about aliasing in the presence of arrays and structures and pick out scalars or vectors to be lowered to SSA values is fairly advanced.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add an array splitting pass | Jason Ekstrand | 2018-08-23 | 2 | -0/+584
  This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables.

  This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency.

  v2 (Jason Ekstrand):
   - Better comments and naming (some from Caio)
   - Rework to use one hash map instead of two

  v2.1 (Jason Ekstrand):
   - Fix a couple of bugs that were added in the rework including one which basically prevented it from running

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add a structure splitting pass | Jason Ekstrand | 2018-08-23 | 4 | -0/+278
  This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/types: Add array_or_matrix helpers | Jason Ekstrand | 2018-08-23 | 2 | -0/+17
  Reviewed-by: Thomas Helland <[email protected]>
* i965: don't include compute resources in "Combined" limits | Kenneth Graunke | 2018-08-23 | 1 | -11/+11
  The combined limits should only include shader stages that can be active at the same time. We don't need to include compute.

  See also cff290df4c09547cd2cb3b129ec59bdebdadba90 for st/mesa.

  Unbreaks i965 from assert failing on driver load since Marek's 45f87a48f94148b484961f18a4f1ccf86f066b1c, which dropped the core Mesa capabilities before adjusting driver limits down to match.
* radeonsi: increase the maximum UBO size to 2 GB | Marek Olšák | 2018-08-23 | 1 | -1/+1
  Same as the closed driver.

  This causes a failure in GL45-CTS.compute_shader.max, which has a trivial bug.

  Tested-by: Dieter Nützel <[email protected]>
* radeonsi: bump MAX_GS_INVOCATIONS | Marek Olšák | 2018-08-23 | 2 | -3/+3
  same as the closed driver

  Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE | Marek Olšák | 2018-08-23 | 19 | -1/+39
  Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONS | Marek Olšák | 2018-08-23 | 19 | -1/+40
  Tested-by: Dieter Nützel <[email protected]>
* tgsi/ureg: don't call tgsi_sanity when it's too slow | Marek Olšák | 2018-08-23 | 1 | -1/+12
  Tested-by: Dieter Nützel <[email protected]>
* st/mesa: fix up uniform limits to be able to expose large UBOs | Marek Olšák | 2018-08-23 | 1 | -8/+23
  Tested-by: Dieter Nützel <[email protected]>
* st/mesa: don't include compute resources in "Combined" limits | Marek Olšák | 2018-08-23 | 1 | -6/+3
  The combined limits should only include shader stages that can be active at the same time.

  Tested-by: Dieter Nützel <[email protected]>
* st/mesa: set ctx->Const.SubPixelBits | Marek Olšák | 2018-08-23 | 1 | -0/+1
  Tested-by: Dieter Nützel <[email protected]>
* glsl: fix error checking against MAX_UNIFORM_LOCATIONS | Marek Olšák | 2018-08-23 | 1 | -2/+6
  Tested-by: Dieter Nützel <[email protected]>
* mesa: make MaxCombinedUniformComponents 64-bit to allow large UBOs | Marek Olšák | 2018-08-23 | 2 | -7/+7
  Tested-by: Dieter Nützel <[email protected]>
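To see why a 32-bit field stops being enough, here is a minimal sketch based on the OpenGL definition of the per-stage combined uniform component limit (uniform blocks times block size in 4-byte components, plus default-block components). The helper name and the sample values are assumptions for illustration, not Mesa code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combined components for one stage, computed in
 * 64 bits so that very large block sizes do not wrap around. */
static uint64_t combined_uniform_components(uint64_t default_components,
                                            uint32_t max_uniform_blocks,
                                            uint64_t max_block_size_bytes)
{
    assert(max_block_size_bytes % 4 == 0); /* a component is 4 bytes */
    return default_components +
           (uint64_t)max_uniform_blocks * (max_block_size_bytes / 4);
}
```

With sample values of 1024 default components, 12 blocks, and a 2 GB block size, the result is well above `UINT32_MAX`, so storing it in a 32-bit integer would truncate it.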
* mesa: add ctx->Const.MaxGeometryShaderInvocations | Marek Olšák | 2018-08-23 | 5 | -3/+7
  radeonsi wants to report a different value

  Reviewed-by: Ian Romanick <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* mesa: don't include compute resources in MAX_COMBINED_* limits | Marek Olšák | 2018-08-23 | 1 | -9/+13
  5 is the maximum number of shader stages that can be used by 1 execution call at the same time (e.g. a draw call). The limit ensures that each stage can use all of its binding points.

  Compute is separate and doesn't need the 5x multiplier.

  Tested-by: Dieter Nützel <[email protected]>
* mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES | Marek Olšák | 2018-08-23 | 2 | -2/+5
  same number as our closed GL driver

  v2: don't use MaxArrayLockSize

  Tested-by: Dieter Nützel <[email protected]>
* mesa: remove incorrect change for EXT_disjoint_timer_query | Marek Olšák | 2018-08-23 | 1 | -2/+1
  Reviewed-by: Tapani Pälli <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* glapi: actually implement GL_EXT_robustness for GLES | Marek Olšák | 2018-08-23 | 1 | -0/+32
  The extension was exposed but not the functions.

  This fixes:

      dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.readn_pixels
      dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformfv
      dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformiv

  Cc: 18.1 18.2 <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* intel/decoder: Decode SFIXED values. | Kenneth Graunke | 2018-08-23 | 1 | -3/+7
  This lets us examine SAMPLER_STATE's LOD Bias field, among other things.

  Reviewed-by: Lionel Landwerlin <[email protected]>
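Decoding a signed fixed-point field generally means sign-extending the raw bits and dividing by two to the power of the fraction width. The sketch below assumes a two's complement layout of one sign bit, then integer bits, then fraction bits; the function name is hypothetical and this is not the decoder's actual code. The S4.8 width used in the usage note is likewise an assumed example for a LOD bias field.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a two's complement signed fixed-point value stored in the low
 * (1 + int_bits + frac_bits) bits of raw. */
static double decode_sfixed(uint32_t raw, unsigned int_bits, unsigned frac_bits)
{
    unsigned total = 1 + int_bits + frac_bits; /* 1 for the sign bit */
    assert(total >= 2 && total <= 32);

    uint32_t mask = total == 32 ? UINT32_MAX : ((UINT32_C(1) << total) - 1);
    int64_t value = raw & mask;

    /* Sign-extend: if the top bit of the field is set, subtract 2^total. */
    if (value & (INT64_C(1) << (total - 1)))
        value -= INT64_C(1) << total;

    return (double)value / (double)(UINT64_C(1) << frac_bits);
}
```

For an assumed S4.8 field, `decode_sfixed(0x100, 4, 8)` yields 1.0 and `decode_sfixed(0x1000, 4, 8)` yields -16.0, the most negative representable value.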
* travis: use python3 for the autoconf builds | Emil Velikov | 2018-08-23 | 1 | -1/+11
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* configure: allow building with python3 | Emil Velikov | 2018-08-23 | 27 | -39/+37
  Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs.

  Note:
   - python3.4 is used as it's the earliest supported version
   - python3 chosen prior to python2

  Signed-off-by: Emil Velikov <[email protected]>
  Acked-by: Eric Engestrom <[email protected]>
* bin/git_sha1_gen.py: remove execute bit/shebang | Emil Velikov | 2018-08-23 | 1 | -2/+0
  The script is executed explicitly via the build system, which uses PYTHON/prog_python or equivalent.

  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* vk/wsi: avoid reading uninitialised memory | Eric Engestrom | 2018-08-23 | 1 | -2/+2
  It will be ignored by x11_swapchain_result() anyway (because reaching the `fail` label without setting `result` means the swapchain status was already a hard error), but the compiler still complains about reading uninitialised memory.

  While at it, drop the unused assignment right before returning.

  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* egl: drop unused _EGL_BUILT_IN_DRIVER_DRI2 | Eric Engestrom | 2018-08-23 | 3 | -4/+1
  Unused since b174a1ae720cb404738c "egl: Simplify the "driver" interface".

  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* radv/gfx9: implement coherent shaders for VK_ACCESS_SHADER_READ_BIT | Samuel Pitoiset | 2018-08-23 | 1 | -1/+20
  Single-sample color and single-sample depth (not stencil) are coherent with shaders.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* bin/install_megadrivers.py: Remove shebang and executable bit | Mathieu Bridon | 2018-08-23 | 1 | -1/+0
  Since the script is never executed directly, but launched by Meson as an argument to the Python interpreter, those are not needed any more.

  In addition, they are the reason this script was missed when I moved the Meson buildsystem to Python 3, so removing them helps avoid future confusion.

  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* meson: Run the install script with Python 3 | Mathieu Bridon | 2018-08-23 | 5 | -0/+5
  The script was being run directly as an executable, and it has a Python 2 shebang.

  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* glsl: remove execute bit and shebang from python tests | Emil Velikov | 2018-08-23 | 3 | -3/+0
  Just like the rest of the tree - these should be run either as part of the build system check target, or at the very least with an explicitly versioned python executable.

  Fixes: db8cd8e3677 ("glcpp/tests: Convert shell scripts to a python script")
  Fixes: 97c28cb0823 ("glsl/tests: Convert optimization-test.sh to pure python")
  Fixes: 3b52d292273 ("glsl/tests: reimplement warnings-test in python")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* docs: update required mako version | Emil Velikov | 2018-08-23 | 1 | -1/+1
  The requirement was bumped a while back, but we forgot to update the docs.

  Fixes: ed871af91c2 ("configure.ac: raise Mako required version to 0.8.0")
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
* configure: use distutils in ax_check_python_mako_module | Emil Velikov | 2018-08-23 | 1 | -2/+3
  Handling the version comparison by hand is a bad idea. Python has a handy module distutils for that - use it.

  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Dylan Baker <[email protected]>
* configure: enforce python 2.7 with AM_PATH_PYTHON | Emil Velikov | 2018-08-23 | 2 | -3/+6
  Currently we use AC_CHECK_PROGS looking for python2.7, python2 and finally python. That is due to the varying names used across different operating systems.

  Use the handy AM_PATH_PYTHON which finds the correct name and checks for the version.

  Note: python2.7 has been an unofficial requirement for quite some time. Update the docs to reflect that.

  Cc: Dylan Baker <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* i965: Enable INTEL_shader_atomic_float_minmax on Gen9+ | Ian Romanick | 2018-08-22 | 1 | -0/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965: Sort Gen9+ extension enables | Ian Romanick | 2018-08-22 | 1 | -3/+3
  This is a strictly alphabetic sort, as is done in extensions_table.h.

  There are other options. We should pick one and document it. Right now, this file is chaos.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages | Ian Romanick | 2018-08-22 | 14 | -1/+261
  v2: Split changes to the message type field to another patch. Suggested by Caio.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Expand untyped atomic message type field by a bit | Ian Romanick | 2018-08-22 | 3 | -4/+9
  This is necessary for a new Gen9 message type that will be added in the next patch. There are also Gen8 message types that need the extra bit (mostly for bindless).

  v2: Split off from the next patch. Suggested by Caio.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>