path: root/src/amd
Commit message | Author | Age | Files | Lines
* radv: Merge ps_input_cntl computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -83/+79
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vtx_reuse_depth computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -8/+6
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vs state computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+34
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge binning state generation with pm4 emission. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -35/+19
  We don't need the pipeline state struct anymore.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Constify some pipeline helpers. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -6/+6
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add PM4 pregeneration for compute pipelines. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+68
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Record a PM4 sequence for graphics pipeline switches. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -451/+483
  This gives about a 2% performance improvement on Dota 2 for me.
  This is mostly a mechanical copy and replacement, but at bind time we still do:
    1) Some stuff that is only based on num_samples changes.
    2) Some command buffer state setting.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
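  The record-once, copy-at-bind idea behind the PM4 pregeneration commits can be pictured with a minimal C sketch. Everything here is hypothetical (`struct dw_stream`, `pipeline_record`, `pipeline_bind`, the fixed `MAX_DW` size): register values are treated as opaque dwords and real PM4 packet headers are not modeled, so this is an illustration of the pattern, not radv's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size dword stream standing in for a command buffer. */
#define MAX_DW 64

struct dw_stream {
    uint32_t buf[MAX_DW];
    unsigned cdw; /* number of dwords written so far */
};

static void stream_emit(struct dw_stream *s, uint32_t dw)
{
    assert(s->cdw < MAX_DW);
    s->buf[s->cdw++] = dw;
}

/* Hypothetical pipeline object that owns its pre-recorded dwords. */
struct pipeline {
    struct dw_stream pm4;
};

/* Pipeline creation: compute and record the state dwords once. */
static void pipeline_record(struct pipeline *p, const uint32_t *regs, unsigned n)
{
    p->pm4.cdw = 0;
    for (unsigned i = 0; i < n; i++)
        stream_emit(&p->pm4, regs[i]);
}

/* Bind time: no recomputation, just one copy of the recorded dwords. */
static void pipeline_bind(const struct pipeline *p, struct dw_stream *cs)
{
    assert(cs->cdw + p->pm4.cdw <= MAX_DW);
    memcpy(&cs->buf[cs->cdw], p->pm4.buf, p->pm4.cdw * sizeof(uint32_t));
    cs->cdw += p->pm4.cdw;
}
```

  Binding the same pipeline twice appends the same recorded dwords twice; the per-bind work that still depends on num_samples or command buffer state would sit outside `pipeline_bind` in this sketch.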
* radv: Determine unneeded dynamic states. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -38/+64
  This avoids setting or emitting them.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/llvm: bump the number of results to 8. | Dave Airlie | 2018-01-31 | 1 | -1/+1
  This function can be called for a 64-bit dvec4, which means we have to load 8 components.
  This fixes:
    R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
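  The "8 components" figure is plain arithmetic: a dvec4 holds 4 elements of 64 bits, i.e. 256 bits, which is 8 32-bit components. The helper name `dword_components` below is hypothetical and only shows that calculation.

```c
#include <assert.h>

/* Hypothetical helper: number of 32-bit components needed to hold a
 * vector of num_components elements, each bit_size bits wide. */
static unsigned dword_components(unsigned num_components, unsigned bit_size)
{
    assert(bit_size == 32 || bit_size == 64);
    return num_components * bit_size / 32;
}
```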
* nir: add vs_inputs_dual_locations compiler option | Timothy Arceri | 2018-01-30 | 1 | -0/+1
  Allows NIR drivers to use either a single location or dual locations for VS double inputs. i965 uses dual locations for both its OpenGL and Vulkan drivers; for now, gallium OpenGL drivers only use a single location.
  The following patch will also make use of this option when calling nir_shader_gather_info().
  Reviewed-by: Karol Herbst <[email protected]>
* radv/gfx9: fix block compression texture views. (v2) | Dave Airlie | 2018-01-30 | 1 | -4/+49
  This ports a fix from amdvlk to fix the sizing for mip levels when block-compressed images are viewed using uncompressed views.
  My original fix didn't port the clamping, but it looks like the clamping is required to stop the sizing from going too large.
  Fixes: dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
  Doesn't crash DOW3 anymore.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
  Signed-off-by: Dave Airlie <[email protected]>
* radv: Signal fence correctly after sparse binding. | Bas Nieuwenhuizen | 2018-01-29 | 1 | -14/+32
  It did not signal syncobjs in the fence, and it also signalled too early if there was already work on the queue, as we have to wait until that work is done.
  Fixes: d27aaae4d2 "radv: Add external fence support."
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix indentation | Timothy Arceri | 2018-01-29 | 1 | -6/+6
  Reviewed-by: Dave Airlie <[email protected]>
* ac: remove unused nir2llvmtype() | Timothy Arceri | 2018-01-29 | 1 | -22/+0
  The last use of this was removed in the previous patch.
  Reviewed-by: Dave Airlie <[email protected]>
* ac: fix gs load inputs type | Timothy Arceri | 2018-01-29 | 1 | -2/+3
  This fixes the scenario where the input is a struct. With this, Unreal Engine's Elemental demo now works on radeonsi.
  Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: call glsl_get_sampler_dim() only once where possible | Kai Wasserbäch | 2018-01-29 | 1 | -8/+11
  Changes since v1:
    * Rebased on top of e68150de263156a3f3d1b609b6506c5649967f61 and 82adf53308c137ce0dc5f2d5da4e7cc40c5b808c.
  Signed-off-by: Kai Wasserbäch <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* ac: rename and move si_const_array into common code | Marek Olšák | 2018-01-27 | 3 | -13/+16
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move address space definitions to common code | Marek Olšák | 2018-01-27 | 2 | -6/+4
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: don't use byval LLVM qualifier in shaders | Marek Olšák | 2018-01-27 | 4 | -9/+3
  shader-db doesn't show any regression, and 32-bit pointers with byval are declared as VGPRs for some reason.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for SSBOs | Samuel Pitoiset | 2018-01-26 | 1 | -1/+7
  For descriptors.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for UBOs | Samuel Pitoiset | 2018-01-26 | 1 | -1/+7
  UBOs are constant buffers.
  Cc: "18.0" <[email protected]>
  Fixes: 41c36c45 ("amd/common: use ac_build_buffer_load() for emitting UBO loads")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Alex Smith <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set the noalias attribute on input pointers | Samuel Pitoiset | 2018-01-26 | 1 | -0/+1
  This attribute is similar to the definition of restrict in C99, and it might help LLVM.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
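  Since the message compares LLVM's noalias to C99 restrict, a plain C99 example of restrict may help: the qualifier promises that `dst` and `src` never overlap, so the compiler may keep loaded `src` values in registers across stores to `dst`. The function `scale` is hypothetical and unrelated to the ac code; calling it with overlapping pointers would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* restrict: the caller promises dst and src do not alias, which lets the
 * compiler reorder or vectorize the loads and stores freely. */
static void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```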
* ac: only load used channels when sampling buffer views | Samuel Pitoiset | 2018-01-26 | 1 | -1/+4
  This reduces the number of dwords loaded, instead of always using buffer_load_format_xyzw. For example, when only the first channel is used, the driver will emit buffer_load_format_x instead.
  Shader stats for DOW3 (with some local hacky scripts for SPIRV):
    143 shaders in 143 tests
    Totals:
    SGPRS: 5344 -> 5352 (0.15 %)
    VGPRS: 3476 -> 3452 (-0.69 %)
    Spilled SGPRs: 30 -> 29 (-3.33 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 269860 -> 269808 (-0.02 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 1267 -> 1272 (0.39 %)
    Wait states: 0 -> 0 (0.00 %)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
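  One way to pick the narrower load is sketched below, assuming the used channels arrive as a 4-bit write mask (bit 0 = x through bit 3 = w). Because the format loads fetch a contiguous prefix starting at x, the count has to reach the highest used channel, so a mask of only z still needs the xyz variant. The helper names are hypothetical; only the buffer_load_format_* opcode names come from the ISA.

```c
#include <assert.h>

/* Hypothetical helper: number of leading channels (starting at x) that a
 * load must fetch to cover every channel set in mask. */
static unsigned channels_to_load(unsigned mask)
{
    assert(mask != 0 && mask <= 0xf);
    unsigned n = 0;
    while (mask) { /* position of the highest set bit, plus one */
        n++;
        mask >>= 1;
    }
    return n;
}

/* Hypothetical mapping from channel count to the matching opcode name. */
static const char *buffer_load_format_opcode(unsigned mask)
{
    static const char *const names[] = {
        "buffer_load_format_x",   "buffer_load_format_xy",
        "buffer_load_format_xyz", "buffer_load_format_xyzw",
    };
    return names[channels_to_load(mask) - 1];
}
```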
* ac: pass the number of channels to ac_build_buffer_load_format() | Samuel Pitoiset | 2018-01-26 | 3 | -14/+7
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: add ac_build_buffer_load_common() helper | Samuel Pitoiset | 2018-01-26 | 1 | -21/+40
  For both versions of llvm.amdgcn.buffer.load.{format}.*.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: fix RADV_DEBUG=syncshaders on GFX9 | Samuel Pitoiset | 2018-01-26 | 1 | -1/+10
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix a GPU hang with RADV_DEBUG=syncshaders | Samuel Pitoiset | 2018-01-26 | 1 | -8/+7
  The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after a dispatch call (and vice versa for graphics). Something has changed in the kernel driver, because it used to work.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/shader: scan if fragment shaders write memory | Samuel Pitoiset | 2018-01-26 | 5 | -16/+39
  It's better to do that in ac_shader_info.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: only canonicalize 32-bit float min/max outputs on pre-GFX9 | Samuel Pitoiset | 2018-01-26 | 1 | -2/+8
  According to LLVM, only pre-GFX9 targets do not flush denorms for fmin/fmax.
  All dEQP-VK.glsl.builtin.precision.* tests still pass.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
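  The condition in the subject line reduces to a small predicate, sketched here with a hypothetical, simplified `enum chip_gen` (only its ordering relative to GFX9 matters). It shows when the extra canonicalize step would be applied to an fmin/fmax result; it is not the actual ac/nir code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified chip ordering. */
enum chip_gen { GEN_GFX6, GEN_GFX7, GEN_GFX8, GEN_GFX9 };

/* Canonicalize fmin/fmax outputs only for 32-bit floats on pre-GFX9,
 * the targets that (per the commit) do not flush denorms for fmin/fmax. */
static bool needs_minmax_canonicalize(enum chip_gen gen, unsigned bit_size)
{
    assert(bit_size == 16 || bit_size == 32 || bit_size == 64);
    return gen < GEN_GFX9 && bit_size == 32;
}
```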
* radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1) | Dave Airlie | 2018-01-26 | 1 | -1/+2
  This seems to be broken; at least the CTS tests fail.
  This fixes:
    dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_4
    dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_8
  2 samples seem to pass fine. amdvlk doesn't appear to enable TC here, possibly for other reasons.
  This is most likely a hack.
  v1.1: add a bit of explanation text. (Samuel)
  Fixes: ad3d98da9 (radv: enable tc compatible htile for d32s8 also.)
  Signed-off-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
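  The workaround is a single extra condition on TC-compatible HTILE eligibility. The sketch below models only that condition, with hypothetical inputs (`is_d32s8`, `samples`); every other eligibility check radv applies is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate covering only this commit's condition:
 * D32_SFLOAT_S8_UINT with 4 or more samples is excluded, 1x/2x stays allowed. */
static bool allow_tc_compat_htile(bool is_d32s8, unsigned samples)
{
    assert(samples >= 1);
    if (is_d32s8 && samples >= 4)
        return false;
    return true;
}
```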
* ac/nir: add break statements in needs_view_index_sgpr() | Samuel Pitoiset | 2018-01-25 | 1 | -0/+2
  The previous code is correct, but as the first case statement uses a break, keep it consistent.
  CID: 1428579
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add multisample Z optimisation from amdvlk | Dave Airlie | 2018-01-25 | 1 | -0/+3
  This was found while reading amdvlk for other things, in src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: move spi_baryc_cntl to pipeline | Dave Airlie | 2018-01-25 | 3 | -5/+5
  We need to enable the pos float location 2 mode any time we have per-sample shading, not just when it is forced by the frag shader.
  This fixes:
    dEQP-VK.pipeline.multisample.min_sample_shading*
  Fixes: 58c97a079 (radv: enable location at sample when persample is forced.)
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: fix sample_mask_in loading. (v3.1) | Dave Airlie | 2018-01-24 | 4 | -6/+56
  This is ported from radeonsi and fixes:
    dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.bit_*
  v2: don't call this path for radeonsi, it does it in the epilog; use the radeonsi code path.
  v3: handle NULL pCreateInfo->pMultisampleState properly (Samuel)
  v3.1: set ps_iter_samples default to 1 (Bas)
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: bdcbe7c76 (radv: add sample mask input support)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: don't use hw resolves for r16g16 norm formats. | Dave Airlie | 2018-01-24 | 1 | -1/+4
  radeonsi has a workaround for this, but it uses an R16A16 format, which Vulkan doesn't have. We could probably come up with a workaround, but for now just avoid hw resolves.
  Fixes: dEQP-VK.renderpass.suballocation.multisample.r16g16_*norm*
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: 2a04f5481d (radv/meta: select resolve paths)
  Signed-off-by: Dave Airlie <[email protected]>
* radv: don't use hw resolve for integer image formats | Dave Airlie | 2018-01-24 | 1 | -0/+5
  From reading AMDVLK, it currently never uses hw resolve paths. This patch follows radeonsi, which doesn't use hw resolve for integer formats, and does the same for radv.
  This fixes:
    dEQP-VK.renderpass.suballocation.multisample*uint tests.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: 2a04f5481d (radv/meta: select resolve paths)
  Signed-off-by: Dave Airlie <[email protected]>
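  The two resolve restrictions in the commits above (integer formats, and R16G16 UNORM/SNORM) can be combined into one hypothetical predicate. `struct fmt_desc` and `enum num_class` are simplified stand-ins, not radv's format tables; a `false` result means falling back to a shader-based resolve.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numeric class of a format's channels. */
enum num_class { NUM_UNORM, NUM_SNORM, NUM_FLOAT, NUM_UINT, NUM_SINT };

/* Hypothetical format description; assumes all channels share one width. */
struct fmt_desc {
    unsigned nr_channels;
    unsigned channel_bits;
    enum num_class cls;
};

static bool can_use_hw_resolve(const struct fmt_desc *f)
{
    assert(f->nr_channels >= 1 && f->nr_channels <= 4);
    /* Integer formats: follow radeonsi and avoid the hw resolve. */
    if (f->cls == NUM_UINT || f->cls == NUM_SINT)
        return false;
    /* R16G16 UNORM/SNORM: radeonsi's R16A16 trick has no Vulkan format. */
    if (f->nr_channels == 2 && f->channel_bits == 16 &&
        (f->cls == NUM_UNORM || f->cls == NUM_SNORM))
        return false;
    return true;
}
```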
* radv: add fs_key meta format support to resolve passes. | Dave Airlie | 2018-01-24 | 2 | -30/+61
  Some of the hw resolve passes need the SPI color format set up correctly.
  This fixes lots of 16-bit and 32-bit format tests in dEQP-VK.renderpass.suballocation.multisample*
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Fixes: f4e499ec7914 "radv: add initial non-conformant radv vulkan driver"
  Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: Use instance_rate_inputs per attribute, not per variable. | Bas Nieuwenhuizen | 2018-01-23 | 1 | -14/+13
  This did the wrong thing if we had e.g. an array for which only some of the attributes use the instance index. Tripped up some new CTS tests.
  CC: <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
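  The difference between a per-variable and a per-attribute check shows up with an array input that spans several attribute locations. The sketch assumes a bitmask in which bit N marks attribute location N as instanced, in the spirit of instance_rate_inputs; the helper names and the layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-attribute (fixed) check: each array element looks at its own location. */
static bool slot_is_instanced(uint32_t instance_rate_mask,
                              unsigned base_location, unsigned slot)
{
    unsigned loc = base_location + slot;
    assert(loc < 32);
    return (instance_rate_mask >> loc) & 1u;
}

/* Per-variable (buggy) check: one decision for every element of the array. */
static bool variable_is_instanced(uint32_t instance_rate_mask,
                                  unsigned base_location)
{
    assert(base_location < 32);
    return (instance_rate_mask >> base_location) & 1u;
}
```

  With only location 3 instanced and a two-element array starting at location 2, the per-variable check reports "not instanced" for the whole array and misses element 1.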
* ac: fix image load store for GLSL_SAMPLER_DIM_3D | Timothy Arceri | 2018-01-23 | 1 | -1/+3
  Fixes the following piglit tests:
    arb_shader_image_load_store/layer/image3d/layered binding test
    arb_shader_image_load_store/max-size/image3d max size test/2048x8x8x1
    arb_shader_image_load_store/max-size/image3d max size test/8x2048x8x1
    arb_shader_image_load_store/max-size/image3d max size test/8x8x2048x1
    arb_shader_image_load_store/semantics/imageload/vertex shader/rgba32f/image3d test
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: image size builtin for GLSL_SAMPLER_DIM_3D | Timothy Arceri | 2018-01-23 | 1 | -1/+2
  This is what radeonsi does. Fixes the remaining piglit subtest in:
    ./bin/arb_shader_image_size-builtin --quick -auto -fbo
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: fix ac_build_varying_gather_values() for packed layouts | Timothy Arceri | 2018-01-23 | 1 | -1/+1
  This fixes a segfault for varyings not starting at component 0.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: remove arrays when querying sampler info | Timothy Arceri | 2018-01-23 | 1 | -3/+1
  Fixes the following ARB_arrays_of_arrays piglit tests:
    basic-imagestore-const-uniform-index
    basic-imagestore-mixed-const-non-const-uniform-index
    basic-imagestore-mixed-const-non-const-uniform-index2
    basic-imagestore-non-const-uniform-index
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: fix emit vertex stream parameter | Timothy Arceri | 2018-01-23 | 1 | -2/+3
  Fixes the following piglit test on radeonsi:
    ./bin/arb_enhanced_layouts-gs-stream-location-aliasing
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: add support for gl_HelperInvocation | Timothy Arceri | 2018-01-23 | 1 | -0/+14
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/radeonsi: add emit primitive to the abi | Timothy Arceri | 2018-01-23 | 2 | -2/+7
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: add stream handling to visit_end_primitive() | Timothy Arceri | 2018-01-23 | 1 | -4/+3
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/nir/radeonsi: add ARB_shader_ballot support | Timothy Arceri | 2018-01-23 | 1 | -0/+37
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/nir: add ARB_shader_group_vote support | Timothy Arceri | 2018-01-23 | 1 | -0/+15
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: add an option that allows to dump pre-optimization ir | Samuel Pitoiset | 2018-01-22 | 5 | -0/+8
  With RADV_DEBUG=preoptir.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* radv: restore previous stencil reference after depth-stencil clear | Matthew Nicholls | 2018-01-22 | 1 | -0/+6
  Cc: [email protected]
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Alex Smith <[email protected]>