summaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* ac: treat Mullins as Kabini, remove the enumMarek Olšák2019-05-276-12/+1
| | | | it's the same design
* radv: ignore the loadOp if the first use of an attachment is a resolveSamuel Pitoiset2019-05-271-9/+3
| | | | | | | Based on ANV. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always dirty the framebuffer when restoring a subpassSamuel Pitoiset2019-05-272-2/+4
| | | | | | | | | | | The old code was not wrong because the transitions performed after the resolves should re-emit the framebuffer if needed. This change is mostly a no-op but it improves consistency regarding other meta operations that need to save/restore subpasses. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_clear_htile() helperSamuel Pitoiset2019-05-273-6/+16
| | | | | | | | This helper will be useful for clearing HTILE after some depth/stencil resolves. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: tidy up GetQueryPoolResults for occlusion queriesSamuel Pitoiset2019-05-271-7/+5
| | | | | | | | Just move the block that checks the availability bit into the switch like other query types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Drop imov/fmov in favor of one mov instructionJason Ekstrand2019-05-241-2/+1
| | | | | | | | | | | | | | | | The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Vasily Khoruzhick <[email protected]> Acked-by: Rob Clark <[email protected]>
* nir/builder: Remove the use_fmov parameter from nir_swizzleJason Ekstrand2019-05-242-4/+4
| | | | | | | | | | This flag has caused more confusion than good in most cases. You can validly use imov for floats or fmov for integers because, without source modifiers, neither modify their input in any way. Using imov for floats is more reliable so we go that direction. Reviewed-by: Kristian H. Kristensen <[email protected]> Acked-by: Alyssa Rosenzweig <[email protected]>
* vulkan: fix build dependency issue with generated filesLionel Landwerlin2019-05-221-4/+3
| | | | | | | | | | | | | On machines with many cores, you can run into that issue : ../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory v2: Move declare_dependency around (Eric) Signed-off-by: Lionel Landwerlin <[email protected]> Reported-by: Jan Ziak Cc: <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* radv: do not reset query pool during creationSamuel Pitoiset2019-05-221-3/+0
| | | | | | | | | | | From the Vulkan spec 1.1.108: "After query pool creation, each query must be reset before it is used." So, the driver doesn't need to do this at creation time. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix the sample max distance value for 8xSamuel Pitoiset2019-05-221-1/+1
| | | | | | | It should be 7, not 8. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: emit correct centroid priority based on the number of samplesSamuel Pitoiset2019-05-221-3/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up the sample locations codebaseSamuel Pitoiset2019-05-224-98/+76
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove remaining code related to 16 samplesSamuel Pitoiset2019-05-222-51/+0
| | | | | | | The driver only supports up to 8 samples. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* spirv, radv, anv: Replace ptr_type with addr_formatCaio Marcelo de Oliveira Filho2019-05-201-5/+5
| | | | | | | | | Instead of setting the glsl types of the pointers for each resource, set the nir_address_format, from which we can derive the glsl_type, and in the future the bit pattern representing a NULL pointer. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: decompress FMASK before performing a MSAA decompress using FMASKSamuel Pitoiset2019-05-201-2/+13
| | | | | | | This fixes some CTS failures related to VK_EXT_sample_locations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a workaround for Monster Hunter World and LLVM 7&8Samuel Pitoiset2019-05-175-3/+16
| | | | | | | | | | | | | | The load/store optimizer pass doesn't handle WaW hazards correctly and this is the root cause of the reflection issue with Monster Hunter World. AFAIK, it's the only game that are affected by this issue. This is fixed with LLVM r361008, but we need a workaround for older LLVM versions unfortunately. Cc: "19.0" "19.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: match radeonsi code in ac_shader_binary_read_configMarek Olšák2019-05-161-3/+3
|
* r600+radeonsi: use ctx_query_reset_status on radeonMarek Olšák2019-05-162-3/+0
| | | | This allows a nice cleanup, because the winsys always handles it.
* winsys/amdgpu: add a parallel compute IB coupled with a gfx IBMarek Olšák2019-05-162-0/+9
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* ac: add LLVM code for triangle cullingMarek Olšák2019-05-164-0/+338
| | | | | Tested-by: Dieter Nützel <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* ac: rename SI-CIK-VI to GFX6-GFX7-GFX8Marek Olšák2019-05-1523-199/+199
| | | | | | | | | | | | Acked-by: Dave Airlie <[email protected]> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.
* ac: add comments to chip enumsMarek Olšák2019-05-151-11/+11
| | | | | Reviewed-by: Alex Deucher <[email protected]> (except GFX2 changes) Reviewed-by: Dave Airlie <[email protected]> (except <= GFX5 changes)
* ac: use 1D GEPs for descriptors and constantsMarek Olšák2019-05-143-13/+12
| | | | | | | just a cleanup Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for VK_KHR_uniform_buffer_standard_layoutSamuel Pitoiset2019-05-142-0/+7
| | | | | | | Nothing to do. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Clean up signalled and submitted fields from winsys fences.Bas Nieuwenhuizen2019-05-136-41/+47
| | | | | | | | | Other types like syncobj do not need it, so lets make things a bit more uniform. Also reduce confusion what the signalled/submitted referred to (especially with imported fences) Reviewed-by: Dave Airlie <[email protected]>
* radv: bump reported version to 1.1.107Samuel Pitoiset2019-05-132-50/+1
| | | | | | | | VK_AMD_draw_indirect_count has been promoted with the suffix changed to KHR. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* amd/common: add ac_build_opencoded_fetch_formatNicolai Hähnle2019-05-132-0/+343
| | | | | | | Implement software emulation of buffer_load_format for all types required by vertex buffer fetches. Reviewed-by: Marek Olšák <[email protected]>
* radv: Do not use extra descriptor space for the 3rd plane.Bas Nieuwenhuizen2019-05-123-7/+26
| | | | | | | | | | | | | | | | | | | | | While ImageFormatProperties returns the number of internal descriptors, it turns out that applications do not need to actually allocate more descriptors in the descriptor pool. So if we make descriptors with more planes larger we have to be convervative and always allocate space for the larger descriptors which is a waste given the low usage of this ext. So let us make use of the fact that 3plane formats all have the same formats & dimensions for the last two planes. This way we only need the first half of the descriptor of the 3rd plane and can share the second half of the second plane. This allows us to use 16 bytes for the descriptor which nicely fits into the 16 bytes that are unused right next to the sampler. Fixes: 5564c38212a "radv: Update descriptor sets for multiple planes." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add support for icd loader interface v4.Bas Nieuwenhuizen2019-05-133-2/+66
| | | | | | Adds support for physical device functions unknown to the loader. Acked-by: Samuel Pitoiset <[email protected]>
* radv: clear vertex bindings while resetting command bufferJózef Kucia2019-05-111-1/+2
| | | | | | | | | Only vertex inputs accessed by vertex shader must have valid buffers bound. Signed-off-by: Józef Kucia <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Fixes: 5010436e09f "radv: bail out when binding the same vertex buffers"
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-1/+1
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Initialize lower_flrp_progress everywhereIan Romanick2019-05-091-1/+1
| | | | | | | | | | | | | | | | I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: d41cdef2a59 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989
* radv: fix setting the number of rectangles when it's dyanmicSamuel Pitoiset2019-05-091-4/+6
| | | | | | | | | | We need to know the number of rectangles. This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*. Fixes: 5db0bf99944 ("radv: Implement VK_EXT_discard_rectangles.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: call constant folding before opt algebraicTimothy Arceri2019-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The pattern of calling opt algebraic first seems to have originated in i965. The order in OpenGL drivers generally doesn't matter because the GLSL IR optimisations do constant folding before opt algebraic. However in Vulkan drivers calling opt algebraic first can result in missed constant folding opportunities. vkpipeline-db results (VEGA64): Totals from affected shaders: SGPRS: 3160 -> 3176 (0.51 %) VGPRS: 3588 -> 3580 (-0.22 %) Spilled SGPRs: 52 -> 44 (-15.38 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 12 -> 12 (0.00 %) dwords per thread Code Size: 261812 -> 261036 (-0.30 %) bytes LDS: 7 -> 7 (0.00 %) blocks Max Waves: 346 -> 348 (0.58 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* radv: fix rowPitch for R32G32B32 formats on GFX9Samuel Pitoiset2019-05-061-1/+13
| | | | | | | | | | The pitch is actually the number of components per row. We found the problem when we implemented some meta operations for these formats and the wrong pitch has been confirmed with a small test case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use given stride for images imported from Android.Bas Nieuwenhuizen2019-05-063-0/+35
| | | | | | Handled similarly as radeonsi. I checked the offsets are actually used. Acked-by: Samuel Pitoiset <[email protected]>
* radv: Implement cosited_even sampling.Bas Nieuwenhuizen2019-05-062-2/+83
| | | | | | | | | | Apparently cosited_even was the required one instead of midpoint. This adds slight offset of 0.5 pixels to the coordinates (+ we need the image size to convert to normalized coords) Fixes: 91702374d5d "radv: Add ycbcr lowering pass." Acked-by: Samuel Pitoiset <[email protected]>
* radv: Disable subsampled formats.Bas Nieuwenhuizen2019-05-061-1/+2
| | | | | | | | | | | | | Broken on Polaris and since I discovered NV12 is not subsampled, but a 2-plane format I decided I don't really care. Work to do to re-enable: 1) Figure out which devices support it natively. 2) Write some software emulation for the others. Fixes: 52c1adda21b "radv: Add ycbcr format features." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: apply the indexing workaround for atomic buffer operations on GFX9Samuel Pitoiset2019-05-033-5/+14
| | | | | | | | | | | | | | | | | | | Because the new raw/struct intrinsics are buggy with LLVM 8 (they weren't marked as source of divergence), we fallback to the old instrinsics for atomic buffer operations only. This means we need to apply the indexing workaround for GFX9. The load/store operations still use the new LLVM 8 intrinsics. The fact that we need another workaround is painful but we should be able to clean up that a bit once LLVM 7 support will be dropped. This fixes a GPU hang with AC Odyssey and some rendering problems with Nioh. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110573 Fixes: 31164cf5f70 ("ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix radv_get_aspect_format() for D+S formatsSamuel Pitoiset2019-05-031-0/+2
| | | | | | | | | | | | This restores the previous behaviour before YCBCR landed. For D+S formats, it returns the depth format. This fixes an assertion with Thrones of Britannia. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110540 Fixes: 66507cc6563 ("radv: Add single plane image views & meta operations") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: only need to force emit the TCS regs on Vega10 and Raven1Samuel Pitoiset2019-05-021-2/+2
| | | | | | | | Other GFX9 chips aren't affected. Cc: "19.0" "19.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output bufferSamuel Pitoiset2019-05-023-0/+9
| | | | | | | | | | | | | According to RadeonSI, this seems to be required by the hardware to avoid GPU hangs. I think I just forgot to set that bit when I implemented VK_EXT_transform_feedback. This fixes a GPU hang with Space Engineers and DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291 Fixes: b4eb029062a ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix set_output_usage_mask() with composite and 64-bit typesRhys Perry2019-05-021-4/+17
| | | | | | | | | | | | | | | | It previously used var->type instead of deref_instr->type and didn't handle 64-bit outputs. This fixes lots of transform feedback CTS tests involving transform feedback and geometry shaders (mostly dEQP-VK.transform_feedback.fuzz.random_geometry.*) v2: fix writemask widening when comp != 0 v3: fix 64-bit variables when comp != 0, again Signed-off-by: Rhys Perry <[email protected]> Cc: 19.0 19.1 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: tidy up ac_build_llvm8_tbuffer_{load,store}Samuel Pitoiset2019-05-021-13/+13
| | | | | | | | For consistency with ac_build_llvm8_buffer_{load,store}_common helpers and that will help a bit for removing the vec3 restriction. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: implement a workaround for VK_EXT_conditional_renderingSamuel Pitoiset2019-05-021-3/+44
| | | | | | | | | | | | | | | | | | Per the Vulkan spec 1.1.107, the predicate is a 32-bit value. Though the AMD hardware treats it as a 64-bit value which means it might fail to discard. I don't know why this extension has been drafted like that but this definitely not fit with AMD. The hardware doesn't seem to support a 32-bit value for the predicate, so we need to implement a workaround. This fixes an issue when DXVK enables conditional rendering with RADV, this also fixes the Sasha conditionalrender demo. Fixes: e45ba51ea45 ("radv: add support for VK_EXT_conditional_rendering") Reported-by: Philip Rebohle <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix color conversions for normalized uint/sint formatsSamuel Pitoiset2019-05-021-4/+16
| | | | | | | | | | | | | The hardware actually rounds before conversion. This now matches what values are used when performing fast clears vs slow clears. This fixes a rendering issue with Far Cry 3&4. This also fixes a bunch of CTS tests that use a 8-bit UNORM format (only when the 512*512 image size hint is manually disabled). Cc: "19.0" "19.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not need to force emit the TCS regs on Vega20Samuel Pitoiset2019-05-021-0/+1
| | | | | | | | | This chip doesn't need the fixup. This fixes a bunch of dEQP-VK.tessellation tests and avoid random GPU hangs. Cc: "19.0" "19.1" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Restrict YUVY formats to 1 layer.Bas Nieuwenhuizen2019-05-021-0/+7
| | | | | Fixes: 8bb3cec7c9b "radv: Expose VK_EXT_ycbcr_image_arrays." Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Set is_array in lowered ycbcr tex instructions.Bas Nieuwenhuizen2019-05-021-0/+1
| | | | | | | Fixes array tests. Fixes: 91702374d5d "radv: Add ycbcr lowering pass." Reviewed-by: Samuel Pitoiset <[email protected]>