path: root/src/amd
Commit message | Author | Age | Files | Lines
* ac/nir: for ubo load use correct num_components | Dave Airlie | 2017-11-07 | 1 | -1/+1
  I was hacking something stupid in doom, and hit an assert for the bitcast following this, it definitely looks like this should be the number of 32-bit components, not the instr level ones.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
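To make the distinction concrete, here is a minimal sketch of the component arithmetic, not the code from the patch. The helper name `num_dword_components` is hypothetical, and it assumes the load's bit size is 16, 32 or 64.

```c
/* Hypothetical helper: converts a load's component count (counted in
 * units of its own bit size) into the number of 32-bit dwords it
 * occupies, which is what a bitcast to a dword vector needs.
 * Assumes bit_size is 16, 32 or 64. */
static unsigned num_dword_components(unsigned num_components, unsigned bit_size)
{
    /* Round up so that a partially used dword still counts as one. */
    return (num_components * bit_size + 31) / 32;
}
```

Under this model a two-component 64-bit load occupies 4 dwords, while the instruction-level count would be 2, which would leave the bitcast with the wrong vector width.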
* radv: move is_local up to the winsys level. | Dave Airlie | 2017-11-06 | 4 | -3/+6
  We can avoid adding the buffer in the non-local case, this will avoid all the overhead of the indirect call.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: wrap cs_add_buffer in an inline. (v2) | Dave Airlie | 2017-11-06 | 6 | -41/+49
  The next patch will try and avoid calling the indirect function.
  v2: add a missing conversion.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
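The pattern these two patches describe, an inline wrapper that filters out some buffers before reaching the winsys function pointer, can be sketched as follows. All struct and function names are hypothetical simplifications, and the choice to skip local BOs is an assumption about which case avoids the list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the winsys structures. */
struct bo {
    bool is_local;          /* per-BO flag cached at the winsys level */
};

struct cs {
    unsigned num_buffers;   /* buffers recorded for this command stream */
};

struct winsys {
    /* Backend entry point, reached through a function pointer. */
    void (*cs_add_buffer)(struct cs *cs, struct bo *bo, uint8_t priority);
};

/* The flag test is inlined at every call site, so the indirect call is
 * only paid for buffers that actually need to be listed.
 * Assumption: local BOs are the ones that may skip the list. */
static inline void cs_add_buffer(struct winsys *ws, struct cs *cs,
                                 struct bo *bo, uint8_t priority)
{
    if (bo->is_local)
        return;
    ws->cs_add_buffer(cs, bo, priority);
}

/* Example backend used only for illustration. */
static void backend_add_buffer(struct cs *cs, struct bo *bo, uint8_t priority)
{
    (void)bo;
    (void)priority;
    cs->num_buffers++;
}
```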
* radv: when loading regs no need to add buffer | Dave Airlie | 2017-11-06 | 1 | -2/+0
  The function that calls us has just added the buffer to the list already, no need to try and add it again.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: pre-calculate user_data_0 registers and store in pipeline | Dave Airlie | 2017-11-06 | 5 | -52/+55
  There's no point recalculating these the whole time on descriptor emission, just store them at pipeline creation.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
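A sketch of computing per-stage user-data registers once at pipeline creation. It assumes the pre-GFX9 split of hardware stages (LS/HS/ES/GS/VS/PS), where the hardware stage a vertex or tessellation-evaluation shader runs on depends on whether tessellation or geometry shaders are present. The enums, the `struct pipeline` layout and the register offsets are hypothetical placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* API-level shader stages (hypothetical, simplified). */
enum sw_stage { STAGE_VS, STAGE_TCS, STAGE_TES, STAGE_GS, STAGE_FS, STAGE_COUNT };

/* Hardware stages, assuming the pre-GFX9 layout. */
enum hw_stage { HW_LS, HW_HS, HW_ES, HW_GS, HW_VS, HW_PS };

/* Placeholder offsets for each hardware stage's first user-data
 * register; illustrative values only. */
static const uint32_t user_data_0_reg[] = {
    [HW_LS] = 0x500, [HW_HS] = 0x400, [HW_ES] = 0x300,
    [HW_GS] = 0x200, [HW_VS] = 0x100, [HW_PS] = 0x000,
};

/* The mapping depends on the other stages in the pipeline. */
static enum hw_stage hw_stage_for(enum sw_stage s, bool has_gs, bool has_tess)
{
    switch (s) {
    case STAGE_VS:  return has_tess ? HW_LS : (has_gs ? HW_ES : HW_VS);
    case STAGE_TCS: return HW_HS;
    case STAGE_TES: return has_gs ? HW_ES : HW_VS;
    case STAGE_GS:  return HW_GS;
    default:        return HW_PS;
    }
}

struct pipeline {
    uint32_t user_data_0[STAGE_COUNT];  /* filled once at creation */
};

/* Runs once at pipeline creation instead of on every descriptor emit. */
static void pipeline_init_user_data(struct pipeline *p, bool has_gs, bool has_tess)
{
    for (int s = 0; s < STAGE_COUNT; s++)
        p->user_data_0[s] =
            user_data_0_reg[hw_stage_for((enum sw_stage)s, has_gs, has_tess)];
}
```

Descriptor emission then reads `p->user_data_0[stage]` directly instead of repeating the switch.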
* radv: add initial copy descriptor support. (v2) | Dave Airlie | 2017-11-06 | 1 | -2/+53
  It appears the latest dota2 vulkan uses this, and we get a hang in VR mode without it.
  v2: remove finishme I left in after finishing.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Andres Rodriguez <[email protected]>
  Cc: "17.2 17.3" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: move descriptor sets out of cmd_state. | Dave Airlie | 2017-11-06 | 3 | -17/+20
  Instead of storing all the pointers and zeroing them all out, just store a valid bitmask in the state.
  This also moves the CmdBindPipeline path down the cpu usage path for the multithreading demo as it no longer has to traverse MAX_SETS to find the active descriptor sets.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
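A minimal sketch of the valid-bitmask idea, with hypothetical names: entries of the pointer array are only meaningful when their bit is set, so a reset is a single store, and iteration visits only the bound sets instead of scanning all MAX_SETS slots.

```c
#include <stdint.h>

#define MAX_SETS 8

/* Hypothetical descriptor-binding state. */
struct descriptors_state {
    const void *sets[MAX_SETS];
    uint32_t valid;             /* bit i set => sets[i] is meaningful */
};

static void set_descriptor_set(struct descriptors_state *st, unsigned idx,
                               const void *set)
{
    st->sets[idx] = set;
    if (set)
        st->valid |= 1u << idx;
    else
        st->valid &= ~(1u << idx);
}

static void reset_descriptors(struct descriptors_state *st)
{
    st->valid = 0;  /* sets[] is left stale on purpose */
}

/* Writes the indices of bound sets to out[] and returns their count.
 * __builtin_ctz is a GCC/Clang builtin (undefined for 0, which the loop
 * condition rules out). */
static unsigned active_sets(const struct descriptors_state *st,
                            unsigned out[MAX_SETS])
{
    unsigned n = 0;
    uint32_t mask = st->valid;
    while (mask) {
        out[n++] = (unsigned)__builtin_ctz(mask);
        mask &= mask - 1;  /* clear the lowest set bit */
    }
    return n;
}
```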
* radv: add helper for setting a descriptor. | Dave Airlie | 2017-11-06 | 3 | -10/+17
  This is just a simple refactor.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: move vertex binding out of cmd state. | Dave Airlie | 2017-11-06 | 2 | -4/+4
  This isn't required to be cleared, since buffers are only linked by vertex elements, so if elements are clear then no buffers should be referenced.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: reorder cmd_state to remove a hole. | Dave Airlie | 2017-11-06 | 1 | -1/+1
  This just removes a hole in the cmd_state and packs some bools together.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
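To illustrate what a "hole" is, the two hypothetical layouts below hold the same members. In the first, the compiler inserts padding after each `bool` so that the following 64-bit member stays aligned; in the second, the bools are packed together at the end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Padding follows each bool to realign the next uint64_t. */
struct state_with_holes {
    bool     predicating;
    uint64_t dirty;
    bool     push_descriptors_dirty;
    uint64_t flush_bits;
};

/* Same members; the bools share the tail padding instead. */
struct state_packed {
    uint64_t dirty;
    uint64_t flush_bits;
    bool     predicating;
    bool     push_descriptors_dirty;
};
```

On x86-64 Linux, for example, the first struct is 32 bytes and the second 24. A tool such as pahole can list the holes in an existing struct.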
* radv: free attachments on end command buffer. | Dave Airlie | 2017-11-06 | 1 | -0/+2
  If we allocate attachments in the begin command buffer due to the render pass continue bit, we were leaking them.
  Since renderpasses inside a cmd buffer malloc/free these properly, and set to NULL, we just need to call free at end.
  Fixes a memory leak with multithreading demo.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Cc: "17.2 17.3" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
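A simplified model of the ownership rule described above, with hypothetical names. Because every render pass frees its own allocation and resets the pointer to NULL, whatever remains at end time can only be the begin-time allocation, and `free(NULL)` is a no-op when there is none.

```c
#include <stdlib.h>

/* Hypothetical, simplified command-buffer state. */
struct attachment_state {
    unsigned pending_clear_aspects;
};

struct cmd_buffer {
    struct attachment_state *attachments;  /* NULL when nothing is allocated */
};

/* Begin with the render-pass-continue bit: attachments are allocated here. */
static int begin_cmd_buffer_continue(struct cmd_buffer *cmd, unsigned count)
{
    cmd->attachments = calloc(count, sizeof(*cmd->attachments));
    return cmd->attachments ? 0 : -1;
}

/* A render pass recorded inside the buffer frees and resets its own state. */
static void end_render_pass(struct cmd_buffer *cmd)
{
    free(cmd->attachments);
    cmd->attachments = NULL;
}

/* Safe on every path, since free(NULL) does nothing. */
static void end_cmd_buffer(struct cmd_buffer *cmd)
{
    free(cmd->attachments);
    cmd->attachments = NULL;
}
```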
* radv: Optimize calling radv_save_descriptors. | Bas Nieuwenhuizen | 2017-11-04 | 1 | -4/+2
  uint32_t data[MAX_SETS * 2] = {}; was getting executed before the exit and took significant amounts of time. By having the check outside the function, we skip the execution of the clear.
  Reviewed-by: Dave Airlie <[email protected]>
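A minimal sketch of the pattern, assuming a hypothetical `tracing` flag as the early-exit condition. The array initializer inside the callee clears MAX_SETS * 2 dwords on every call, so the cheap check is moved to the call site and the function is only entered when there is work to do.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SETS 8

static unsigned saves;  /* counts how often the expensive path runs */

/* Hypothetical stand-in for the descriptor-saving routine. */
static void save_descriptors(const uint64_t va[MAX_SETS], uint32_t valid)
{
    uint32_t data[MAX_SETS * 2] = { 0 };  /* cleared on every call */
    for (unsigned i = 0; i < MAX_SETS; i++) {
        if (valid & (1u << i)) {
            data[i * 2] = (uint32_t)va[i];
            data[i * 2 + 1] = (uint32_t)(va[i] >> 32);
        }
    }
    (void)data;  /* a driver would copy this into a trace buffer */
    saves++;
}

/* The check lives at the call site, so the clear is skipped entirely
 * when tracing is off. */
static void flush_descriptors(bool tracing, const uint64_t va[MAX_SETS],
                              uint32_t valid)
{
    if (tracing)
        save_descriptors(va, valid);
}
```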
* radv: Use an array to store descriptor sets. | Bas Nieuwenhuizen | 2017-11-04 | 2 | -26/+50
  The vram_list linked list resulted in lots of pointer chasing. Replacing this with an array instead improves descriptor set allocation CPU usage by 3x at least (when also considering the free), because it had to iterate through 300-400 sets on average.
  Not a huge improvement as the pre-improvement CPU usage was only about 2.3% in the busiest thread.
  Reviewed-by: Dave Airlie <[email protected]>
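The array approach can be sketched as a first-fit allocator over an offset-sorted array of entries. This is a hypothetical simplification (fixed capacity, byte sizes, no alignment handling) rather than the radv code: allocation walks the contiguous array looking for a gap, and freeing shifts the tail down, so no list nodes have to be chased.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

struct pool_entry {
    uint64_t offset, size;
};

struct pool {
    uint64_t size;                          /* total bytes in the pool */
    unsigned count;                         /* live entries, sorted by offset */
    struct pool_entry entries[MAX_ENTRIES];
};

/* First-fit allocation. Returns 0 and writes the offset on success. */
static int pool_alloc(struct pool *p, uint64_t size, uint64_t *out_offset)
{
    uint64_t cursor = 0;  /* end of the previous entry */
    unsigned i;

    if (p->count == MAX_ENTRIES)
        return -1;
    for (i = 0; i < p->count; i++) {
        if (p->entries[i].offset - cursor >= size)
            break;  /* the gap before entry i is big enough */
        cursor = p->entries[i].offset + p->entries[i].size;
    }
    if (p->size - cursor < size)
        return -1;  /* no gap and no room at the tail */

    /* Keep the array sorted: shift later entries up by one slot. */
    memmove(&p->entries[i + 1], &p->entries[i],
            (p->count - i) * sizeof(p->entries[0]));
    p->entries[i].offset = cursor;
    p->entries[i].size = size;
    p->count++;
    *out_offset = cursor;
    return 0;
}

static void pool_free(struct pool *p, uint64_t offset)
{
    for (unsigned i = 0; i < p->count; i++) {
        if (p->entries[i].offset == offset) {
            memmove(&p->entries[i], &p->entries[i + 1],
                    (p->count - i - 1) * sizeof(p->entries[0]));
            p->count--;
            return;
        }
    }
}
```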
* ac: remove the remaining duplicate llvm types | Timothy Arceri | 2017-11-03 | 1 | -12/+1
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove usused v4f32 | Timothy Arceri | 2017-11-03 | 1 | -4/+0
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2f32 to the common code and make use of it | Timothy Arceri | 2017-11-03 | 3 | -10/+7
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f16 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -35/+33
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f64 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v8i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -4/+2
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v4i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -9/+7
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v3i32 to the common code and make use of it | Timothy Arceri | 2017-11-03 | 3 | -5/+5
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2i32 to the common code and use it | Timothy Arceri | 2017-11-03 | 3 | -11/+11
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i64 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove unused i16 llvm type | Timothy Arceri | 2017-11-03 | 1 | -2/+0
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac ivoidt llvm type | Timothy Arceri | 2017-11-03 | 1 | -4/+2
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i8 llvm type | Timothy Arceri | 2017-11-03 | 1 | -6/+4
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i1 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -181/+179
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]
  Acked-by: Nicolai Hähnle <[email protected]>
* ac/radeonsi: add support for tex instr without a derefence | Timothy Arceri | 2017-11-03 | 1 | -34/+46
  These are produced by nir_lower_bitmap(), adding the missing derefence would cause other issues that need to be hacked around such as skipping sampler lowering and uniform location assignment, so this change seems the correct way to go.
  Fixes 194 piglit crashes on radeonsi using NIR.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: use the optimal packets order for dispatch calls | Samuel Pitoiset | 2017-11-02 | 1 | -8/+53
  This should reduce the time where compute units are idle, mainly for meta operations because they use a bunch of compute shaders.
  This seems to have a really minor positive effect for Talos, at least.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't expose heaps with 0 memory. | Bas Nieuwenhuizen | 2017-11-02 | 3 | -53/+101
  It confuses CTS.
  This pregenerates the heap info into the physical device, so we can use it for translating contiguous indices into our "standard" ones.
  This also makes the WSI a bit smarter in case the first preferred heap does not exist.
  Reviewed-by: Dave Airlie <[email protected]>
  CC: <[email protected]>
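A sketch of pregenerating the exposed heap list once while skipping empty heaps, so that an exposed (contiguous) index can be translated back to a driver-side heap kind. The three heap kinds and the struct are hypothetical.

```c
#include <stdint.h>

#define MAX_HEAPS 3

/* Hypothetical driver-side ("standard") heap kinds. */
enum heap_kind { HEAP_VRAM, HEAP_VRAM_CPU_ACCESS, HEAP_GTT };

/* Exposed heaps are numbered contiguously from 0; kind[] maps each
 * exposed index back to the driver's own heap kind. */
struct heap_map {
    unsigned count;
    enum heap_kind kind[MAX_HEAPS];
    uint64_t size[MAX_HEAPS];
};

/* Built once (for example at physical-device creation). */
static void build_heap_map(struct heap_map *m, const uint64_t sizes[MAX_HEAPS])
{
    m->count = 0;
    for (unsigned k = 0; k < MAX_HEAPS; k++) {
        if (sizes[k] == 0)
            continue;  /* never expose a heap with 0 bytes */
        m->kind[m->count] = (enum heap_kind)k;
        m->size[m->count] = sizes[k];
        m->count++;
    }
}
```

With no CPU-visible VRAM, for instance, only two heaps are exposed, and exposed index 1 maps back to `HEAP_GTT`.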
* radeonsi: remove 'Authors:' comments | Marek Olšák | 2017-11-02 | 4 | -13/+1
  It's inaccurate. Instead, see the copyright and use "git log" and "git blame" to know the authorship.
  Acked-by: Nicolai Hähnle <[email protected]>
* radv: make radv_bind_descriptor_set() static | Samuel Pitoiset | 2017-11-02 | 2 | -6/+3
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make sure we set buffers as shareable properly. | Dave Airlie | 2017-11-02 | 2 | -2/+7
  This should make sure we don't treat exports buffers as local bos.
  Fixes: a639d40f13 (radv: add support for local bos. (v3))
  Tested-by: Andres Rodriguez <[email protected]>
  Reviewed-by: Andres Rodriguez <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: bail out when binding the same vertex buffers | Samuel Pitoiset | 2017-10-31 | 1 | -2/+16
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: bail out when binding the same index buffer | Samuel Pitoiset | 2017-10-31 | 2 | -0/+14
  DOW3 appears to hit this path.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
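Both early-outs follow the same shape. Here is a minimal sketch for the index-buffer case with hypothetical types; a vertex-buffer variant would compare each binding's buffer and offset the same way before marking anything dirty.

```c
#include <stdint.h>

/* Hypothetical, simplified index-buffer binding state. */
struct buffer {
    uint64_t va;
};

struct index_binding {
    const struct buffer *buffer;
    uint64_t offset;
    int index_type;          /* e.g. 0 = 16-bit, 1 = 32-bit */
    unsigned emit_count;     /* how often new state was flagged for emission */
};

static void bind_index_buffer(struct index_binding *ib, const struct buffer *buf,
                              uint64_t offset, int index_type)
{
    /* Same buffer, offset and type as last time: nothing to re-emit. */
    if (ib->buffer == buf && ib->offset == offset && ib->index_type == index_type)
        return;

    ib->buffer = buf;
    ib->offset = offset;
    ib->index_type = index_type;
    ib->emit_count++;  /* stands in for setting a dirty flag */
}
```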
* radv: use correct alloc function when loading from disk | Timothy Arceri | 2017-10-31 | 1 | -1/+14
  Fixes regression in:
  dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
  Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix -Wformat-security issue | Alex Smith | 2017-10-30 | 1 | -1/+1
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513
  Fixes: de889794134e ("radv: Implement VK_AMD_shader_info")
  Signed-off-by: Alex Smith <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: add cache items to in memory cache when reading from disk | Timothy Arceri | 2017-10-30 | 1 | -70/+71
  Otherwise we will leak them, load duplicates from disk rather than memory and never write items loaded from disk to the apps pipeline cache.
  Fixes: fd24be134ffd 'radv: make use of on-disk cache'
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_AMD_shader_info | Alex Smith | 2017-10-29 | 6 | -35/+171
  This allows an app to query shader statistics and get a disassembly of a shader. RenderDoc git has support for it, so this allows you to view shader disassembly from a capture.
  When this extension is enabled on a device (or when tracing), we now disable pipeline caching, since we don't get the shader debug info when we retrieve cached shaders.
  v2: Improvements to resource usage reporting
  v3: Disassembly string must be null terminated (string_buffer's length does not include the terminator)
  v4: Fixed LDS reporting. (Bas)
  Signed-off-by: Alex Smith <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
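The v3 note concerns the usual two-call size query: the size reported for the disassembly must count the terminating NUL, which a length that excludes it would miss. Below is a simplified sketch without Vulkan types. The function name and the 0/1 return values (loosely standing in for VK_SUCCESS/VK_INCOMPLETE) are hypothetical, and it assumes `text` is NUL-terminated at `len`.

```c
#include <stddef.h>
#include <string.h>

/* Query the size when out == NULL, otherwise copy up to *out_size bytes.
 * Returns 1 if the copy was truncated, 0 otherwise. */
static int copy_disassembly(const char *text, size_t len,
                            char *out, size_t *out_size)
{
    size_t needed = len + 1;  /* len excludes the terminator */

    if (!out) {
        *out_size = needed;
        return 0;
    }
    size_t n = needed < *out_size ? needed : *out_size;
    memcpy(out, text, n);     /* includes the NUL when there is room */
    *out_size = n;
    return n < needed ? 1 : 0;
}
```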
* radv: allow to use a compute shader for resetting the query pool | Samuel Pitoiset | 2017-10-27 | 1 | -7/+9
  Serious Sam Fusion 2017 uses a huge number of occlusion queries, and the allocated query pool buffer is greater than 4096 bytes.
  This slightly improves performance (tested in Ultra) from 117.2 FPS to 119.7 FPS (~+2%) on my RX480.
  This also improves Talos, from 69 FPS to 72/73 FPS (~+5%).
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make radv_fill_buffer() return the needed flush bits | Samuel Pitoiset | 2017-10-27 | 4 | -58/+57
  Only needed when the CS path is used.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
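A sketch of a fill routine that returns the flush bits its caller needs, combining this change with the size-based choice of a compute shader described for the query pool. The 4096-byte threshold echoes the query-pool size quoted in that commit but is an assumption here, as are the bit names and the function itself.

```c
#include <stdint.h>

/* Hypothetical size at or above which a compute shader is used. */
#define BUFFER_OPS_CS_THRESHOLD 4096

/* Hypothetical flush bits the caller may have to apply. */
#define FLUSH_CS_PARTIAL  (1u << 0)  /* wait for the fill shader to finish */
#define FLUSH_INV_VMEM_L1 (1u << 1)  /* make its writes visible to readers */

enum fill_path { FILL_CP_DMA, FILL_COMPUTE };

/* Picks a fill path and returns the flush bits the caller must merge
 * into its pending flushes; in this model only the compute path needs any. */
static uint32_t fill_buffer(uint64_t size, enum fill_path *path)
{
    if (size >= BUFFER_OPS_CS_THRESHOLD) {
        *path = FILL_COMPUTE;
        /* ...dispatch the fill shader here... */
        return FLUSH_CS_PARTIAL | FLUSH_INV_VMEM_L1;
    }
    *path = FILL_CP_DMA;
    /* ...emit the CP DMA packet here... */
    return 0;
}
```

Returning the bits instead of applying them inside the helper lets a caller that issues several fills merge them into a single flush.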
* radv: add support for local bos. (v3) | Dave Airlie | 2017-10-26 | 11 | -22/+51
  This uses the new kernel interfaces for reduced cs overhead,
  We only set the local flag for memory allocations that don't have a dedicated allocation and ones that aren't imports.
  v2: add to all the internal buffer creation paths.
  v3: missed some command submission paths, handle 0/empty bo lists.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: only copy the dynamic states that changed | Samuel Pitoiset | 2017-10-26 | 1 | -23/+69
  When binding a new pipeline, we applied all dynamic states without checking if they really need to be re-emitted. This doesn't seem to be useful for the meta operations because only the viewports/scissors are updated.
  This should reduce the number of commands added to the IB when a new graphics pipeline is bound.
  Also, rename radv_dynamic_state_copy() to radv_bind_dynamic_state() and set the dirty flags directly there.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
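A sketch of the compare-before-copy approach with hypothetical state bits. The per-object mask decides which states a pipeline provides, and only values that actually differ are copied and reported as dirty.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mask/dirty bits, one per dynamic state. */
#define DYN_VIEWPORT   (1u << 0)
#define DYN_SCISSOR    (1u << 1)
#define DYN_LINE_WIDTH (1u << 2)

struct rect {
    float x, y, w, h;
};

struct dynamic_state {
    uint32_t mask;          /* which states this object provides */
    struct rect viewport;
    struct rect scissor;
    float line_width;
};

/* Copies only the states that differ and returns their dirty bits,
 * so unchanged state is not re-emitted. memcmp is conservative here:
 * e.g. -0.0f vs 0.0f counts as a change. */
static uint32_t bind_dynamic_state(struct dynamic_state *dst,
                                   const struct dynamic_state *src)
{
    uint32_t dirty = 0;

    if ((src->mask & DYN_VIEWPORT) &&
        memcmp(&dst->viewport, &src->viewport, sizeof(src->viewport)) != 0) {
        dst->viewport = src->viewport;
        dirty |= DYN_VIEWPORT;
    }
    if ((src->mask & DYN_SCISSOR) &&
        memcmp(&dst->scissor, &src->scissor, sizeof(src->scissor)) != 0) {
        dst->scissor = src->scissor;
        dirty |= DYN_SCISSOR;
    }
    if ((src->mask & DYN_LINE_WIDTH) && dst->line_width != src->line_width) {
        dst->line_width = src->line_width;
        dirty |= DYN_LINE_WIDTH;
    }
    return dirty;
}
```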
* radv: store the dynamic state mask into radv_dynamic_state | Samuel Pitoiset | 2017-10-26 | 3 | -7/+12
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* radv: only emit the depth bounds test values when set dynamically | Samuel Pitoiset | 2017-10-26 | 1 | -2/+1
  The depth bounds test values are either set at pipeline creation or dynamically using vkCmdSetDepthBounds().
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* ac/llvm: drop pointless wrappers around umsb/imsb | Dave Airlie | 2017-10-26 | 1 | -14/+2
  Reviewed-by: Timothy Arceri <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* ac/llvm: consolidate find lsb function. | Dave Airlie | 2017-10-26 | 3 | -29/+36
  This was the same between si and ac.
  Reviewed-by: Timothy Arceri <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
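The semantics behind the find-lsb and umsb helpers, GLSL-style findLSB and unsigned findMSB returning -1 for a zero input, can be sketched in plain C. This is not the ac/llvm code, which emits LLVM IR. The builtins used are GCC/Clang extensions and are undefined for 0, hence the explicit check.

```c
#include <stdint.h>

/* findLSB: index of the lowest set bit, or -1 when no bit is set. */
static int32_t find_lsb(uint32_t x)
{
    return x ? (int32_t)__builtin_ctz(x) : -1;
}

/* Unsigned findMSB: index of the highest set bit, or -1 for 0. */
static int32_t find_umsb(uint32_t x)
{
    return x ? 31 - (int32_t)__builtin_clz(x) : -1;
}
```

For example, `find_lsb(8)` is 3 and `find_umsb(0xF0)` is 7.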
* ac/llvm: drop v4f32empty. (v2) | Dave Airlie | 2017-10-26 | 1 | -12/+0
  This was unused.
  v2: drop args.
  Reviewed-by: Timothy Arceri <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>