summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* amd/common: determine the ES type (VS or TES) for the GS on GFX9Samuel Pitoiset2018-01-102-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: rework emit_barrier() to not segfault on radeonsiTimothy Arceri2018-01-091-9/+8
| | | | | | | | nir_to_llvm_context will always be NULL for radeonsi so we need work around this. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add load_tess_level() to the abiTimothy Arceri2018-01-092-0/+10
| | | | | | | | | | | | | | | | Fixes the following piglit tests in radeonsi: vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test vs-tes-tessinner-tessouter-inputs-quads.shader_test vs-tes-tessinner-tessouter-inputs-tris.shader_test v2: make use of si_shader_io_get_unique_index_patch() via the helper in the previous patch rather than shader_io_get_unique_index() Reviewed-by: Nicolai Hähnle <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]>
* radv: get InstanceID from VGPR1 (or VGPR2 for tess) instead of VGPR3Samuel Pitoiset2018-01-081-8/+24
| | | | | | | | | VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1 Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: avoid PS partial flushes when viewports/scissors don't changeSamuel Pitoiset2018-01-081-6/+33
| | | | | | | | | | | | | For Vega10 and Raven that need a special workaround for the scissor bug. This seems to give a minor boost for Talos and Dota 2, at least. To reduce the cost of memcmp, the driver checks if it's really useful to do the comparison. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add has_scissor_bug for Vega10 and RavenSamuel Pitoiset2018-01-083-2/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx9: do not load VGPR1 when GS uses points or linesSamuel Pitoiset2018-01-081-1/+3
| | | | | | | | VGPR1 is only needed for topology that needs 3 offsets like triangles or quads. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make shader BOs read-only for the GPUSamuel Pitoiset2018-01-083-1/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make descriptor BOs read-only for the GPUSamuel Pitoiset2018-01-082-3/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make the indirect GFX config BO read-only for the GPUSamuel Pitoiset2018-01-081-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: make IBs read-only for the GPUSamuel Pitoiset2018-01-081-6/+11
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: add RADEON_FLAG_READ_ONLYSamuel Pitoiset2018-01-082-1/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: rework radv_amdgpu_bo_va_op()Samuel Pitoiset2018-01-081-17/+23
| | | | | | | Needed for the following commit. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_fmin/fmax helpersMarek Olšák2018-01-062-15/+22
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: remove unused radv_color_buffer_info::cb_clear_valueXSamuel Pitoiset2018-01-051-2/+0
| | | | | | | Found by inspection. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas nieuwenhuizen <[email protected]>
* radv: enable denorms for 64-bit and 16-bit floatsSamuel Pitoiset2018-01-051-0/+14
| | | | | | | | | | | Similar to RadeonSI. This fixes: dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: correctly detect if we need ring buffersSamuel Pitoiset2018-01-051-7/+9
| | | | | | | | | | | | When allocate_user_sgprs() was called, ctx->stage was actually unset and 0 is for the vertex shader. This doesn't change anything for now because of the spill support thing. Though, the number of user SGPRs has to be fixed for merged shaders on GFX9. It was broken before anyway. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: use ac_image_load when lod is zeroSamuel Pitoiset2018-01-051-1/+3
| | | | | | | | | This might decrease VGPR spilling, because we no longer have to use v4i32 for 2D fetches when level == 0. We now use v2i32 for those cases. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: limit the scissor bug workaround to Vega 10 and RavenSamuel Pitoiset2018-01-051-1/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: rework ac_llvm_extract_elem()Timothy Arceri2018-01-051-3/+3
| | | | | | | Simplifies the logic a little and asserts index is 0. Suggested-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: add load_tess_coord() to the abiTimothy Arceri2018-01-052-7/+17
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: add tcs_rel_ids to the abiTimothy Arceri2018-01-052-8/+8
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add {tcs,tes}_patch_id to the abiTimothy Arceri2018-01-052-8/+8
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: move some helpers to ac_llvm_build.cTimothy Arceri2018-01-053-40/+50
| | | | | | | We will call these from the radeonsi NIR backend. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add store_tcs_outputs() to the abiTimothy Arceri2018-01-052-24/+51
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: call load_tcs_input() via the abiTimothy Arceri2018-01-051-19/+17
| | | | | | | | This also enables some code sharing with tes. V2: drop type param and just use ctx->i32 Reviewed-by: Marek Olšák <[email protected]>
* ac: add load_tes_inputs() to the abiTimothy Arceri2018-01-052-22/+51
| | | | | | V2: drop type param and just use ctx->i32 Reviewed-by: Marek Olšák <[email protected]>
* radv: Use correct flush bits for flushing L2 during CB/DB flushes.Bas Nieuwenhuizen2018-01-041-13/+16
| | | | | | | | | | | | Copied from radeonsi. Putting in the correct metadata flush commands for eventually not flushing L2 on CB/DB switch. Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT at the moment. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Invalidate L1 for VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT.Bas Nieuwenhuizen2018-01-041-1/+1
| | | | | | | These are just shaders reads, so we need to invalidate L1. Fixes: 6dbb0eaccc "radv: handle subpass cache flushes" Reviewed-by: Samuel Pitoiset <[email protected]>
* radv/gfx9: reduce the number of input VGPRs for the GS stageSamuel Pitoiset2018-01-041-1/+14
| | | | | | | This can still be improved, but let's start with this. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: scan if gl_PrimitiveID is used before translating to LLVMSamuel Pitoiset2018-01-045-10/+7
| | | | | | | | | It makes more sense to move all scan stuff in the same place. Also, we don't really need to duplicate the uses_primid field for each stages. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd/common: scan if gl_InvocationID is usedSamuel Pitoiset2018-01-042-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: rename has_sync_file to has_fence_to_handle.Bas Nieuwenhuizen2018-01-042-3/+3
| | | | | | | | | | | sync_files are in linux since 4.7, while the amdgpu fence_to_handle ioctl is only in 4.15. In particular we don't need it for sync_file in radv, because everything happens via syncobjs, which got support earlier than fence_to_handle. Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: Handle loading data from compact arrays.Bas Nieuwenhuizen2018-01-041-6/+7
| | | | | Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
* radv: Allow writing 0 scissors.Bas Nieuwenhuizen2018-01-041-1/+2
| | | | | | | When rasterization is disabled we can have that few. Fixes: 76603aa90b8 "radv: Drop the default viewport when 0 viewports are given." Reviewed-by: Dave Airlie <[email protected]>
* radv: Use correct HTILE expanded words.Bas Nieuwenhuizen2018-01-041-2/+4
| | | | | | | | | | | Seems like users are actually hitting 0xFFFFFFFF actually making things broken for them, and the mad max regression is fixed, so lets put this in once more. v2: Use 0xf for depth-only htile. (Dave) Fixes: af2844116fd "radv: Revert HTILE reset word to 0xFFFFFFFF." Reviewed-by: Dave Airlie <[email protected]>
* ac: rename has_syncobj_wait -> has_syncobj_wait_for_submitMarek Olšák2018-01-044-7/+7
| | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement binning on GFX9.Bas Nieuwenhuizen2017-12-314-6/+348
| | | | | | | | | | | | Overall it does not really help or hurt. The deferred demo gets 1% improvement and some games a 3% decrease, so I don't think this should be enabled by default. But with the code upstream it is easier to experiment with it. v2: Remove initializing the registers from si_emit_config. Reviewed-by: Dave Airlie <[email protected]>
* radv: Add flag for enabling binning.Bas Nieuwenhuizen2017-12-312-0/+9
| | | | | | Letting it be disabled by default. Reviewed-by: Dave Airlie <[email protected]>
* radv: Also set DCC params for sampling for input attachment usage.Bas Nieuwenhuizen2017-12-291-1/+2
| | | | | | | | Those are implemented as texture sampling, so we need to make the texture TC-compatible too. Fixes: 34d23e82ca9 "radv: set some dcc parameters depending on if texture will be sampled" Reviewed-by: Fredrik Höglund <[email protected]>
* radv: Enable DCC with transfers.Bas Nieuwenhuizen2017-12-291-2/+1
| | | | | | | | | Before this DCC was in practice disabled for most games. This enables practical DCC use. Expect a 5-10% perf increase on a bunch of games on vega @ 4k. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Decompress copy destination if formats are incompatible.Bas Nieuwenhuizen2017-12-291-2/+25
| | | | | | | | | | If both source and destination are DCC compressed, and their formats are not compatible, we need to decompress one of them to make sure we can do reinterpretation (which needs src format == dst format) . Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Disable DCC for GENERAL layout and compute transfer dest.Bas Nieuwenhuizen2017-12-294-8/+47
| | | | | | | | | | | | | | | Apps can use this for render feedback loops, where things are defined if they render each pixel only once. However, DCC fails here, as the level of coherence is a block not a pixel, so disable it. This is also going to help implementing other stuff. Even if we optimize this later to only happen if there actually is a loop (if possible at all ...), then the machinery is still useful to exclude images accessible by the SDMA queue when that is implemented. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Don't init DCC metadata during FS resolve.Bas Nieuwenhuizen2017-12-291-5/+0
| | | | | | | | It should already be valid there + the RB will update it during rendering. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Make color meta operations layout aware.Bas Nieuwenhuizen2017-12-295-110/+145
| | | | | | | | | | | | | For fast clear eliminate and decompressions, we always use the most compressed format. For clears, the code already creates a renderpass on demand with the exact same layout as specified. Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Add compute DCC decompress.Bas Nieuwenhuizen2017-12-293-0/+275
| | | | | | | | | | | We do an in place copy where we read compressed and write decompressed. By doing this in sizes that cover entire DCC blocks and waiting for all reads in the block before starting to write we avoid corruption. In the end we clear the DCC metadata to 0xffffffff. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Use the meta fast clear destructor on construction failure.Bas Nieuwenhuizen2017-12-291-6/+3
| | | | | | | | Simplifies failure paths. The caller already calls radv_device_finish_meta_fast_clear_flush_state on failure. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Add GFX DCC decompress.Bas Nieuwenhuizen2017-12-292-12/+83
| | | | | Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radv: Don't enable DCC / TC compat HTILE for storage images.Bas Nieuwenhuizen2017-12-291-5/+6
| | | | | | | | | | | We don't get a layout when binding to a descriptor set, but can assume that the LAYOUT is GENERAL. For DCC stores with the DCC bits set will result in a hang, so better be safe than sorry. Reviewed-by: Dave Airlie <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* Revert "radv/gfx9: fix block compression texture views."Bas Nieuwenhuizen2017-12-291-35/+0
| | | | | | | | | This reverts commit 59515780433837ad3975f8ed20b93cf2fe6870e5. The mentioned commit causes a hang in DoW3 on Vega. Fixes: 59515780433 "radv/gfx9: fix block compression texture views." Acked-by: Dave Airlie <[email protected]>