path: root/src/amd
Commit message | Author | Age | Files | Lines
* vulkan: Add KHR_display extension using DRM [v10] | Keith Packard | 2018-06-19 | 3 | -1/+11
  This adds support for the KHR_display extension to the vulkan WSI layer. Driver support will be added separately.
  v2:
  * Fix double ;; in wsi_common_display.c
  * Move mode list from wsi_display to wsi_display_connector
  * Fix scope for wsi_display_mode and wsi_display_connector allocs
  * Switch all allocations to vk_zalloc instead of vk_alloc.
  * Fix DRM failure in wsi_display_get_physical_device_display_properties.
    When DRM fails, or when we don't have a master fd (presumably due to application errors), just return 0 properties from this function, which is at least a valid response.
  * Use vk_outarray for all property queries.
    This is a bit less error-prone than open-coding the same stuff.
  * Remove VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR from surface caps.
    Until we have multi-plane support, we shouldn't pretend to have any multi-plane semantics, even if undefined.
    Suggested-by: Jason Ekstrand <[email protected]>
  * Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args.
    Suggested-by: Eric Engestrom <[email protected]>
  v3:
  * Add separate 'display_fd' and 'render_fd' arguments to the wsi_device_init API. This allows drivers to use different FDs for the different aspects of the device.
  * Use the largest mode as display size when there is no preferred mode. If the display doesn't provide a preferred mode, we'll assume that the largest supported mode is the "physical size" of the device and report that.
  v4:
  * Make wsi_image_state enumeration values uppercase, following more common Mesa conventions.
  * Remove 'render_fd' from the wsi_device_init API. The wsi_common_display code doesn't use this fd at all, so stop passing it in. This avoids any potential confusion over which fd to use when creating display-relative object handles.
  * Remove the call to wsi_create_prime_image, which would never have been reached as the necessary condition (use_prime_blit) is never set.
  * Whitespace cleanups in wsi_common_display.c.
    Suggested-by: Jason Ekstrand <[email protected]>
  * Add depth/bpp info to available surface formats. Instead of hard-coding depth 24 bpp 32 in the drmModeAddFB call, use the requested format to find suitable values.
  * Destroy kernel buffers and FBs when the swapchain is destroyed. We were leaking both of these kernel objects across swapchain destruction.
  * Note that wsi_display_wait_for_event waits for anything to happen. wsi_display_wait_for_event is simply a yield so that the caller can then check to see if the desired state change has occurred.
  * Record swapchain failures in the chain for later return. If some asynchronous swapchain activity fails, we need to tell the application eventually. Record the failure in the swapchain and report it at the next acquire_next_image or queue_present call.
  * Fix error returns from wsi_display_setup_connector. If a malloc failed, then the result should be VK_ERROR_OUT_OF_HOST_MEMORY. Otherwise, the associated ioctl failed and we're either VT switched away, or our lease has been revoked, in which case we should return VK_ERROR_OUT_OF_DATE_KHR.
  * Make sure both sides of if/else brace use match.
  * Note that we assume drmModeSetCrtc is synchronous. Add a comment explaining why we can idle any previous displayed image as soon as the mode set returns.
  * Note that EACCES from drmModePageFlip means VT inactive. When VT switched away, drmModePageFlip returns EACCES. Poll once a second waiting until we get some other return value back.
  * Clean up after alloc failure in wsi_display_surface_create_swapchain. Destroy any created images, free the swapchain.
  * Remove physical_device from wsi_display_init_wsi. We never need this value, so remove it from the API and from the internal wsi_display structure.
  * Use drmModeAddFB2 in wsi_display_image_init. This takes a DRM format instead of depth/bpp, which provides more control over the format of the data.
  v5:
  * Set the 'currentStackIndex' member of the VkDisplayPlanePropertiesKHR record to zero, instead of indexing across all displays. This value is the stack depth of the plane within an individual display, and as the current code supports only a single plane per display, it should be set to zero for all elements.
    Discovered-by: David Mao <[email protected]>
  v6:
  * Remove 'platform_display' bits from the build and use the existing 'platform_drm' instead.
  v7:
  * Ensure VK_ICD_WSI_PLATFORM_MAX is large enough by setting it to VK_ICD_WSI_PLATFORM_DISPLAY + 1.
  v8:
  * Simplify wsi_device_init failure from wsi_display_init_wsi by using the same pattern as the other wsi layers.
  * Adopt Jason Ekstrand's white space and variable declaration suggestions. Declare variables at first use, eliminate extra whitespace between types and names, add list iterator helpers, switch to lower-case list_ macros.
  * Respond to Jason's April 8 review:
    * Create a function to convert relative to absolute timeouts to catch overflow issues in one place
    * Use VK_NULL_HANDLE to clear prop->currentDisplay
    * Get rid of the available_present_modes array.
    * Return OUT_OF_DATE_KHR when display_queue_next is called after the display has been released.
    * Make errors from mode setting fatal in display_queue_next
    * Remove duplicate pthread_mutex_init call
    * Add wsi_init_pthread_cond_monotonic helper function to isolate pthread error handling from wsi_display_init_wsi
    Suggested-by: Jason Ekstrand <[email protected]>
  v9:
  * Fix vscan handling by using MAX2(vscan, 1) everywhere. Vscan can be zero anywhere, which is treated the same as 1.
    Suggested-by: Jason Ekstrand <[email protected]>
  v10: Respond to Vulkan CTS failures.
  1. Initialize planeReorderPossible in display_properties code
  2. Only report connected displays in get_display_plane_supported_displays
  3. Return VK_ERROR_OUT_OF_HOST_MEMORY when pthread cond initialization fails.
     Signed-off-by: Jason Ekstrand <[email protected]>
  4. Add vkCreateDisplayModeKHR. This doesn't actually create new modes; it only looks to see if the requested parameters match an existing mode and returns that.
     Suggested-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Keith Packard <[email protected]>
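The v8 and v9 notes describe two small helpers: converting a relative timeout into an absolute deadline while catching overflow in one place, and treating a vscan of 0 the same as 1 when computing a mode's refresh rate. What follows is a minimal sketch of both. The function names and the explicit `now` parameter are hypothetical (chosen so the code is testable), and it assumes the mode clock is given in kHz as in DRM mode info. It is an illustration, not the code in wsi_common_display.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a relative timeout (ns) into an absolute
 * deadline, saturating at UINT64_MAX ("wait forever") instead of
 * wrapping around. `now` is passed in so the function is easy to test. */
static uint64_t rel_to_abs_time(uint64_t now, uint64_t rel_time)
{
   /* now + rel_time would overflow: clamp instead of wrapping. */
   if (rel_time > UINT64_MAX - now)
      return UINT64_MAX;
   return now + rel_time;
}

/* Hypothetical helper: refresh rate in Hz, assuming clock_khz is the
 * pixel clock in kHz. A vscan of 0 means the same as 1, hence the
 * MAX2(vscan, 1)-style clamp. */
static double mode_refresh_hz(uint32_t clock_khz, uint32_t htotal,
                              uint32_t vtotal, uint32_t vscan)
{
   assert(htotal != 0 && vtotal != 0);
   uint32_t scan = vscan > 1 ? vscan : 1;
   /* Compute in double so htotal * vtotal * scan cannot overflow. */
   return (double)clock_khz * 1000.0 /
          ((double)htotal * (double)vtotal * (double)scan);
}
```

For example, a mode with a 148500 kHz clock and 2200x1125 totals comes out at 60 Hz whether vscan is 0 or 1, and at 30 Hz with vscan 2.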
* radv: Merge the flush bits of CMASK & DCC clear. | Bas Nieuwenhuizen | 2018-06-19 | 1 | -1/+1
  Probably won't be much different in practice, but still wrong.
  Fixes Coverity issue 1435002.
  Not CC'ing to stable since this is only hit if you enable MSAA DCC via RADV_DEBUG.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Don't check for pipeline being set in draw. | Bas Nieuwenhuizen | 2018-06-19 | 1 | -1/+0
  Draws without pipeline are definitely not allowed.
  Fixes Coverity issue 1434216.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf | Marek Olšák | 2018-06-19 | 11 | -143/+143
  Acked-by: Bas Nieuwenhuizen <[email protected]>
* ac/gpu_info: add radeon_info::num_tcc_blocks | Marek Olšák | 2018-06-19 | 2 | -0/+11
  The values for the radeon winsys were copied from the kernel driver.
  Tested-by: Dieter Nützel <[email protected]>
* ac/surface: Set compressZ for stencil-only surfaces. | Bas Nieuwenhuizen | 2018-06-19 | 1 | -1/+1
  We HTILE compress stencil-only surfaces too.
  CC: 18.1 <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radv: fix reported number of available VGPRs | Eric Engestrom | 2018-06-18 | 1 | -1/+1
  It's a bit late to round up after an integer division: the remainder has already been discarded.
  Fixes: de889794134e6245e08a2 "radv: Implement VK_AMD_shader_info"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Alex Smith <[email protected]>
* radv: Use less conservative approximation for context rolls. | Bas Nieuwenhuizen | 2018-06-18 | 1 | -3/+6
  Drops the number of times we set the scissor by 4x for F1 2017, which results in a consistent performance improvement of about 4%.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix bitwise check | Eric Engestrom | 2018-06-18 | 1 | -1/+1
  Fixes: 922cd38172b8a2bc286bd "radv: implement out-of-order rasterization when it's safe on VI+"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: Clear meminfo to avoid valgrind warning. | Bas Nieuwenhuizen | 2018-06-16 | 1 | -1/+1
  Somehow valgrind misses that the value is initialized by the ioctl.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix emitting the TCS regs on GFX9 | Samuel Pitoiset | 2018-06-16 | 1 | -1/+0
  The primitive ID is NULL and this generates an invalid select instruction which crashes because one operand is NULL.
  This fixes crashes in The Long Journey Home, Quantum Break and Just Cause 3 with DXVK.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756
  CC: <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "radv: always set/load both depth and stencil clear values" | Samuel Pitoiset | 2018-06-15 | 1 | -5/+28
  This fixes a rendering regression with RoTR.
  This reverts commit 4bdad9faddc82a4560603936ce5ade5707ecb254.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't check for linear images in emit_fast_color_clear() | Samuel Pitoiset | 2018-06-15 | 1 | -2/+0
  We don't enable CMASK for linear surfaces and addrlib only enables DCC for tiled surfaces.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow RADV_PERFTEST=dccmsaa on GFX9 | Samuel Pitoiset | 2018-06-15 | 1 | -2/+2
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add RADV_DEBUG=checkir | Samuel Pitoiset | 2018-06-15 | 5 | -3/+11
  This allows running the LLVM verifier pass.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update ZRANGE_PRECISION in radv_update_bound_fast_clear_ds() | Samuel Pitoiset | 2018-06-15 | 1 | -31/+15
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up radv_{set,load}_depth_clear_regs() helpers | Samuel Pitoiset | 2018-06-15 | 3 | -32/+44
  Also replace _regs with _metadata because it makes more sense.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: always set/load both depth and stencil clear values | Samuel Pitoiset | 2018-06-15 | 1 | -28/+5
  I don't think emitting both values matters much, and it makes the code a bit simpler.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update the fast ds clear values only if the image is bound | Samuel Pitoiset | 2018-06-15 | 1 | -5/+32
  It's unnecessary to update the fast depth/stencil clear values if the fast cleared depth/stencil image isn't currently bound.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: clean up radv_{set,load}_color_clear_regs() helpers | Samuel Pitoiset | 2018-06-15 | 3 | -33/+47
  Also replace _regs with _metadata because it makes more sense.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: update the fast color clear values only if the image is bound | Samuel Pitoiset | 2018-06-15 | 1 | -3/+32
  It's unnecessary to update the fast color clear values if the fast cleared color image isn't currently bound.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove multisample bit from shader key. | Dave Airlie | 2018-06-15 | 3 | -4/+0
  This wasn't being used anywhere inside the shader from what I can see.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix output for sparse MRTs. | Bas Nieuwenhuizen | 2018-06-14 | 1 | -9/+10
  We need to init the cb_shader_format correctly with the changed col_format, so this moves the col_format adjustment to before the cb_shader_mask gets generated.
  Fixes: 06d3c650980 "radv: fix a GPU hang when MRTs are sparse"
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903
  CC: 18.1 <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: update the ZRANGE_PRECISION value for the TC-compat bug | Samuel Pitoiset | 2018-06-14 | 1 | -0/+108
  On GFX8+, there is a bug that affects TC-compatible depth surfaces when the ZRange is not reset after LateZ kills pixels.
  The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match the last fast clear value. Because the value is set to 1 by default, we only need to update it when clearing Z to 0.0.
  We also need to set the depth clear regs and to update ZRANGE_PRECISION when initializing a TC-compat depth image to 0.
  Original patch from James Legg.
  This fixes random CTS failures with dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
  CC: <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
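The workaround above boils down to deriving ZRANGE_PRECISION from the last depth clear value: it stays at its default of 1 unless Z was cleared to 0.0. A minimal sketch follows, assuming a hypothetical bit position for the field (check the generated register headers for the real DB_Z_INFO layout). The function names are illustrative, not radv's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for DB_Z_INFO.ZRANGE_PRECISION; the real
 * layout comes from the register headers. */
#define ZRANGE_PRECISION_SHIFT 31
#define ZRANGE_PRECISION_MASK (1u << ZRANGE_PRECISION_SHIFT)

/* The default precision is 1, so only a TC-compatible surface that was
 * fast cleared to 0.0 needs the register rewritten. */
static bool needs_zrange_update(bool tc_compat_htile, float clear_depth)
{
   return tc_compat_htile && clear_depth == 0.0f;
}

/* Make ZRANGE_PRECISION match the clear value: 0 for a clear to 0.0,
 * 1 otherwise. All other bits of db_z_info are preserved. */
static uint32_t set_zrange_precision(uint32_t db_z_info, float clear_depth)
{
   uint32_t precision = clear_depth == 0.0f ? 0u : 1u;
   return (db_z_info & ~ZRANGE_PRECISION_MASK) |
          (precision << ZRANGE_PRECISION_SHIFT);
}
```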
* ac: handle undefined EQAA samples in ac_apply_fmask_to_sample | Marek Olšák | 2018-06-13 | 1 | -2/+4
  RADV might want to use this helper too.
  Tested-by: Dieter Nützel <[email protected]>
* ac/gpu_info: report real total memory sizes | Marek Olšák | 2018-06-13 | 1 | -28/+54
  The change from MIN2 to MAX2 is intentional.
  Cc: 18.1 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't fast clear HTILE for 16-bit depth surfaces on GFX8 | Samuel Pitoiset | 2018-06-13 | 1 | -0/+8
  This causes rendering issues in Shadow Warrior 2 with DXVK.
  Cc: [email protected]
  Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8")
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a workaround for DXVK hangs by setting amdgpu-skip-threshold | Samuel Pitoiset | 2018-06-09 | 1 | -1/+78
  Workaround for a bug in LLVM that causes the GPU to hang in the presence of nested loops because of an exec mask issue. The proper solution is to fix LLVM, but this might require a bunch of work.
  This fixes a bunch of GPU hangs that happen with DXVK.
  Vega10:
  Totals from affected shaders:
  SGPRS: 110456 -> 110456 (0.00 %)
  VGPRS: 122800 -> 122800 (0.00 %)
  Spilled SGPRs: 7478 -> 7478 (0.00 %)
  Spilled VGPRs: 36 -> 36 (0.00 %)
  Code Size: 9901104 -> 9922928 (0.22 %) bytes
  Max Waves: 7143 -> 7143 (0.00 %)
  Code size slightly increases because it inserts more branch instructions, but that's expected. I don't see any real performance changes.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613
  Cc: [email protected]
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix missing ZRANGE_PRECISION(1) for GFX9+ | Samuel Pitoiset | 2018-06-09 | 1 | -1/+2
  ZRANGE_PRECISION(1) seems to be the default optimal value, but it was only set for VI and older chips.
  This fixes a rendering issue with Banished through DXVK, and might fix more than that.
  There is still the ZRANGE_PRECISION bug that we need to handle but that can be fixed later.
  Cc: [email protected]
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix possible truncation of intrinsic name | Timothy Arceri | 2018-06-08 | 1 | -1/+1
  Fixes the gcc warning:
  ‘snprintf’ output between 26 and 33 bytes into a destination of size 32
  Fixes: d5f7ebda3ec0 ("ac: add LLVM build functions for subgroup instrinsics")
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
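GCC emits this warning when it can prove that the formatted output may not fit in the destination buffer. A minimal sketch of the two usual remedies follows: size the buffer for the longest possible output plus the terminating NUL, and check snprintf's return value. The name format and function name are hypothetical and are not the ones used in the ac code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical intrinsic-name builder. snprintf returns the number of
 * characters the complete output needs (excluding the NUL), so a return
 * value >= size means the name was truncated. */
static bool build_intrinsic_name(char *buf, size_t size,
                                 const char *op, unsigned bits)
{
   int n = snprintf(buf, size, "llvm.example.%s.i%u", op, bits);
   return n >= 0 && (size_t)n < size;
}
```

Here "llvm.example.readlane.i32" needs 25 characters, so a 16-byte buffer reports truncation (while still being NUL-terminated), and a 64-byte buffer holds the full name.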
* amd/common: Fix number of coords for getlod. | Bas Nieuwenhuizen | 2018-06-07 | 1 | -3/+18
  The LLVM 6 code reduced it to a non-array call. We need to do that with the new code too.
  This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.
  Fixes: a9a79934412 "amd/common: use the dimension-aware image intrinsics on LLVM 7+"
  Reviewed-by: Dave Airlie <[email protected]>
* radv: fix Coverity no effect control flow issue | Timothy Arceri | 2018-06-07 | 1 | -1/+1
  swizzle is unsigned so "desc->swizzle[c] < 0" is never true.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
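Coverity flags `x < 0` on an unsigned value because the comparison is always false, so whatever it guards is dead code. A minimal sketch of the pattern and of a range check that states the intent follows. The swizzle encoding here (0 to 3 select X/Y/Z/W, larger values are constants or "none") is a hypothetical stand-in, not necessarily the enum radv uses, and this is not the exact one-line fix from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: 0..3 select a source channel, larger values
 * are constants or "unused". */
enum { SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W,
       SWIZZLE_0, SWIZZLE_1, SWIZZLE_NONE };

static bool swizzle_reads_channel(uint8_t swz)
{
   /* A test like `swz < 0` can never be true for an unsigned type, so
    * it never filters anything. Compare against the last channel
    * selector instead. */
   return swz <= SWIZZLE_W;
}
```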
* radv: Use correct color format for fast clears | Philip Rebohle | 2018-06-05 | 1 | -2/+2
  Using the image format is incorrect when the view has a different format than the image. Instead, the view format needs to be used.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  CC: 18.1 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106687
* radv: Do not hardcode fast clear formats. | Bas Nieuwenhuizen | 2018-06-05 | 1 | -180/+73
  except for the odd one out.
  This should support many more formats.
  Reviewed-by: Dave Airlie <[email protected]>
* amd/common: use the dimension-aware image intrinsics on LLVM 7+ | Nicolai Hähnle | 2018-06-04 | 1 | -24/+165
  Requires LLVM trunk r329166.
  Acked-by: Marek Olšák <[email protected]>
* radv: fix a GPU hang when MRTs are sparse | Samuel Pitoiset | 2018-06-04 | 1 | -0/+10
  When the i-th target format is set, all previous target formats must be non-zero to avoid hangs.
  In other words, without this, if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs because the target format of mrt1 is zero.
  This fixes DXVK GPU hangs with "Seven: The Days Long Gone", "GTA V" and probably more games.
  Cc: "18.0" "18.1" <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
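The rule is that every render target below the highest one with a format must also have a non-zero format. It can be sketched as a pass over a packed format word. This sketch assumes 4 bits per render target (as in the SPI_SHADER_COL_FORMAT register) and uses a hypothetical placeholder value for the filler format. Which real format the driver writes into the holes is not stated here.

```c
#include <stdint.h>

#define MAX_RTS 8
/* Hypothetical placeholder for any valid non-zero export format code. */
#define DUMMY_COL_FORMAT 0x1u

/* Fill holes in a packed 4-bits-per-target color format word so that no
 * target below the highest exported one is left at format 0. */
static uint32_t fill_sparse_col_format(uint32_t col_format)
{
   int last = -1;

   /* Find the highest render target with a non-zero format. */
   for (int i = 0; i < MAX_RTS; i++)
      if ((col_format >> (i * 4)) & 0xfu)
         last = i;

   /* Every target below it gets the placeholder if it is still zero. */
   for (int i = 0; i < last; i++)
      if (!((col_format >> (i * 4)) & 0xfu))
         col_format |= DUMMY_COL_FORMAT << (i * 4);

   return col_format;
}
```

For the mrt0/mrt2/mrt3 case in the message, with each exported target using format 0x4, the word 0x4404 becomes 0x4414: mrt1 is no longer zero.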
* radv: Don't pass a TESS_EVAL shader when tesselation is not enabled. | Bas Nieuwenhuizen | 2018-06-04 | 1 | -0/+2
  Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the GEOMETRY shader for the TESS_EVAL shader.
  This would cause the flush_constants code to emit the GEOMETRY constants to the TESS_EVAL registers and then conclude that it did not need to set the GEOMETRY shader registers.
  Fixes: dfff9fb6f8d "radv: Handle GFX9 merged shaders in radv_flush_constants()"
  CC: 18.1 <[email protected]>
  Reviewed-by: Alex Smith <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
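A simplified sketch of the merged-shader lookup with this fix applied follows. On GFX9 a merged stage is stored only under the later stage, so asking for VERTEX may return the TESS_CTRL or GEOMETRY shader, and asking for TESS_EVAL may return GEOMETRY. Without the early return, a pipeline with no tessellation would hand back its GEOMETRY shader for TESS_EVAL, which is the bug described above. The types, enum and function names are hypothetical stand-ins modeled on this description and on the radv_get_shader() helper from the merged-shader consolidation commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stage list and pipeline. */
enum stage { STAGE_VERTEX, STAGE_TESS_CTRL, STAGE_TESS_EVAL,
             STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COUNT };

struct shader { int id; };

struct pipeline {
   /* A merged shader is stored only under the later of its stages. */
   const struct shader *shaders[STAGE_COUNT];
};

static bool pipeline_has_tess(const struct pipeline *p)
{
   return p->shaders[STAGE_TESS_CTRL] != NULL;
}

static const struct shader *get_shader(const struct pipeline *p, enum stage s)
{
   assert(s < STAGE_COUNT);
   if (s == STAGE_VERTEX) {
      /* VS may have been merged into TCS (with tess) or GS (without). */
      if (p->shaders[STAGE_VERTEX])
         return p->shaders[STAGE_VERTEX];
      if (p->shaders[STAGE_TESS_CTRL])
         return p->shaders[STAGE_TESS_CTRL];
      if (p->shaders[STAGE_GEOMETRY])
         return p->shaders[STAGE_GEOMETRY];
   } else if (s == STAGE_TESS_EVAL) {
      /* No tessellation means no TES: never fall back to the GS. */
      if (!pipeline_has_tess(p))
         return NULL;
      if (p->shaders[STAGE_TESS_EVAL])
         return p->shaders[STAGE_TESS_EVAL];
      if (p->shaders[STAGE_GEOMETRY])
         return p->shaders[STAGE_GEOMETRY];
   }
   return p->shaders[s];
}
```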
* radv: Handle GFX9 merged shaders in radv_flush_constants() | Alex Smith | 2018-06-01 | 1 | -1/+8
  This was not previously handled correctly. For example, push_constant_stages might only contain MESA_SHADER_VERTEX because only that stage was changed by CmdPushConstants or CmdBindDescriptorSets.
  In that case, if vertex has been merged with tess control, then the push constant address wouldn't be updated since pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.
  Use radv_get_shader() instead of getting the shader directly so that we get the right shader if merged.
  Also, skip emitting the address redundantly: if two merged stages are set in push_constant_stages, this change would have made the address get emitted twice.
  Signed-off-by: Alex Smith <[email protected]>
  Cc: "18.1" <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Consolidate GFX9 merged shader lookup logic | Alex Smith | 2018-06-01 | 3 | -35/+26
  This was being handled in a few different places; consolidate it into a single radv_get_shader() function.
  Signed-off-by: Alex Smith <[email protected]>
  Cc: "18.1" <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Set active_stages the same whether or not shaders were cached | Alex Smith | 2018-06-01 | 1 | -5/+2
  With GFX9 merged shaders, active_stages would be set to the original stages specified if shaders were not cached, but to the stages still present after merging if they were.
  Be consistent and use the original stages.
  Signed-off-by: Alex Smith <[email protected]>
  Cc: "18.1" <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Add startup debug option. | Bas Nieuwenhuizen | 2018-05-31 | 4 | -2/+50
  This adds a RADV_DEBUG=startup option to dump more info about instance creation and device enumeration.
  A common question end users have is why the driver is not loading for them, and this has two common reasons:
  1) They did not install the driver.
  2) AMDGPU is not used for the card in the kernel.
  This adds some info messages so we can easily get some useful output from end users.
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add option to print errors even in optimized builds. | Bas Nieuwenhuizen | 2018-05-31 | 14 | -96/+108
  Errors are not that common, so we can eat a slight perf hit in having to call a function and do a runtime check.
  In turn, this makes debugging random errors that happen for end users easier, because they don't have to have a debug build on hand.
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Make the sem_info allocate/free functions static. | Bas Nieuwenhuizen | 2018-05-31 | 2 | -15/+9
  They are only used in 1 file.
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Only expose subgroup shuffles on VI+. | Bas Nieuwenhuizen | 2018-05-30 | 1 | -2/+5
  The current implementation depends on bpermute, which is VI+.
  Fixes: f2c6a550611 "radv: enable subgroup capabilities"
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix emitting descriptor pointers with LLVM < 7 | Samuel Pitoiset | 2018-05-30 | 1 | -2/+4
  This was terribly wrong: I forced the use of 32-bit pointers when emitting shader descriptor pointers.
  This fixes GPU hangs with LLVM 5 and 6, because 32-bit pointers are only supported with LLVM 7.
  Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: emit shader descriptor pointers consecutively | Samuel Pitoiset | 2018-05-29 | 1 | -47/+57
  This reduces the number of SET_SH_REG packets which are emitted for applications that use more than one descriptor set per stage.
  We should be able to emit more SET_SH_REG packets consecutively (like push constants and vertex buffers for the vertex stage), but this will be improved later.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
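The benefit of emitting descriptor pointers consecutively can be illustrated by counting packets. One SET_SH_REG packet writes a run of consecutive SH registers, so pointers that sit in adjacent user SGPRs can share one packet instead of each needing its own. The model and function below are hypothetical: they count packets under the stated assumptions and do not reflect how radv actually builds the PM4 stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: pointer i occupies ptr_dwords consecutive user
 * SGPRs starting at sgpr_start[i], with starts sorted ascending. A new
 * SET_SH_REG packet is needed whenever a pointer does not immediately
 * follow the previous one. */
static unsigned count_set_sh_reg_packets(const unsigned *sgpr_start, size_t n,
                                         unsigned ptr_dwords)
{
   unsigned packets = 0;

   assert(ptr_dwords > 0);
   for (size_t i = 0; i < n; i++) {
      /* Start a new packet unless this pointer continues the current run. */
      if (i == 0 || sgpr_start[i] != sgpr_start[i - 1] + ptr_dwords)
         packets++;
   }
   return packets;
}
```

With 64-bit pointers (2 dwords each) at SGPRs 4, 6 and 8, all three fit in one packet. Pointers at 4 and 8 leave a gap and need two.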
* radv: allow radv_emit_shader_pointer_head() to emit more pointers | Samuel Pitoiset | 2018-05-29 | 1 | -3/+5
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: split radv_emit_shader_pointer() | Samuel Pitoiset | 2018-05-29 | 1 | -5/+20
  This will allow emitting consecutive shader pointers to reduce the number of emitted SET_SH_REG packets, which is recommended.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_KHR_draw_indirect_count. | Bas Nieuwenhuizen | 2018-05-28 | 2 | -0/+50
  Literally the same as the AMD ext.
  Passes *indirect_draw_count* CTS tests.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement alternate GFX9 scissor workaround. | Bas Nieuwenhuizen | 2018-05-28 | 1 | -33/+47
  This improves dota2 performance for me by 11% when I force the GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my threadripper), which should be a large part of the radv-amdvlk gap. (With that setting, radv went from 60.3 to 66.6 fps for me, while AMDVLK does about 68 fps.)
  It looks like dota2 renders the GUI with a bunch of draws, with a SetScissors before almost every draw, causing a lot of pipeline stalls.
  I'm not really happy with the duplication of code, but overriding radeon_set_context_reg would also be messy since we have the pre-recorded pipelines and a bunch of si_cmd_buffer code, as well as some memory->context reg loads for which things would be more complicated.
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>