path: root/src/amd
Commit message | Author | Age | Files | Lines
* ac: add emit_vertex to the abi | Timothy Arceri | 2017-11-12 | 2 | -5/+10
    Reviewed-by: Marek Olšák <[email protected]>
* radv: Fix architecture in radeon_icd.{arch}.json | Chad Versace | 2017-11-09 | 1 | -1/+1
    Use the host arch, not the target arch. In Meson and in recent Autotools, the host arch is where the binary will be used. The target arch is useful only when compiling a compiler.
    See: http://mesonbuild.com/Cross-compilation.html
    See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
    Reported-by: Eric Engestrom <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add support for all intrinsics. (v2) | Dave Airlie | 2017-11-09 | 1 | -1/+31
    This is derived from tgsi/radeonsi code from the GLSL intrinsics. This should pre-fix radv for the upcoming spirv patches.
    v2: actually use wait_cnt, sleep deprived dad time! (Bas)
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* amd: add amdgpu_asic_addr.h to the sources list | Emil Velikov | 2017-11-08 | 1 | -0/+1
    Otherwise it will be missing from the release tarball
    Fixes: 7f33e94e43a ("amd/addrlib: update to latest version")
    Signed-off-by: Emil Velikov <[email protected]>
* amd/addrlib: update to latest version | Marek Olšák | 2017-11-08 | 31 | -3334/+1354
    This uses C++11 initializer lists. I just overwrote all Mesa files with internal addrlib and discarded hunks that we should probably keep, but I might have missed something.
    The code depending on ADDR_AM_BUILD is removed. We can add it back next time if needed.
    Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: use ac_create_target_machine | Marek Olšák | 2017-11-07 | 2 | -2/+8
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use ac_get_llvm_processor_name | Marek Olšák | 2017-11-07 | 2 | -1/+3
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove unused field in the PCI ID table | Marek Olšák | 2017-11-07 | 1 | -1/+1
    Reviewed-by: Alex Deucher <[email protected]>
* ac/nir: for ubo load use correct num_components | Dave Airlie | 2017-11-07 | 1 | -1/+1
    I was hacking something stupid in doom, and hit an assert for the bitcast following this, it definitely looks like this should be the number of 32-bit components, not the instr level ones.
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: move is_local up to the winsys level. | Dave Airlie | 2017-11-06 | 4 | -3/+6
    We can avoid adding the buffer in the non-local case, this will avoid all the overhead of the indirect call.
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: wrap cs_add_buffer in an inline. (v2) | Dave Airlie | 2017-11-06 | 6 | -41/+49
    The next patch will try and avoid calling the indirect function.
    v2: add a missing conversion.
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: when loading regs no need to add buffer | Dave Airlie | 2017-11-06 | 1 | -2/+0
    The function that calls us has just added the buffer to the list already, no need to try and add it again.
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: pre-calculate user_data_0 registers and store in pipeline | Dave Airlie | 2017-11-06 | 5 | -52/+55
    There's no point recalculating these the whole time on descriptor emission, just store them at pipeline creation.
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: add initial copy descriptor support. (v2) | Dave Airlie | 2017-11-06 | 1 | -2/+53
    It appears the latest dota2 vulkan uses this, and we get a hang in VR mode without it.
    v2: remove finishme I left in after finishing.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Andres Rodriguez <[email protected]>
    Cc: "17.2 17.3" <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: move descriptor sets out of cmd_state. | Dave Airlie | 2017-11-06 | 3 | -17/+20
    Instead of storing all the pointers and zeroing them all out, just store a valid bitmask in the state.
    This also moves the CmdBindPipeline path down the cpu usage path for the multithreading demo as it no longer has to traverse MAX_SETS to find the active descriptor sets.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
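The valid-bitmask idea from the entry above can be sketched in C. This is only an illustration under stated assumptions, not the radv code: `demo_state`, `demo_bind_set`, `demo_reset`, `demo_count_bound` and the value of `DEMO_MAX_SETS` are hypothetical names chosen for the example.

```c
#include <stdint.h>

#define DEMO_MAX_SETS 8 /* hypothetical stand-in for MAX_SETS */

struct demo_state {
    const void *sets[DEMO_MAX_SETS]; /* a slot is meaningful only if its bit is set */
    uint32_t valid;                  /* bit i set => sets[i] is bound */
};

/* Bind a set: store the pointer and mark the slot as valid. */
static void demo_bind_set(struct demo_state *s, unsigned idx, const void *set)
{
    s->sets[idx] = set;
    s->valid |= 1u << idx;
}

/* Reset: one store, instead of zeroing every pointer in the array. */
static void demo_reset(struct demo_state *s)
{
    s->valid = 0;
}

/* Visit only the bound slots by clearing the lowest set bit each iteration,
 * rather than walking all DEMO_MAX_SETS entries. */
static unsigned demo_count_bound(const struct demo_state *s)
{
    unsigned n = 0;
    for (uint32_t m = s->valid; m != 0; m &= m - 1)
        n++;
    return n;
}
```

With this layout, stale pointers left in `sets` after a reset are harmless because readers consult `valid` first.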
* radv: add helper for setting a descriptor. | Dave Airlie | 2017-11-06 | 3 | -10/+17
    This is just a simple refactor.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: move vertex binding out of cmd state. | Dave Airlie | 2017-11-06 | 2 | -4/+4
    This isn't required to be cleared, since buffers are only linked by vertex elements, so if elements are clear then no buffers should be referenced.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: reorder cmd_state to remove a hole. | Dave Airlie | 2017-11-06 | 1 | -1/+1
    This just removes a hole in the cmd_state and packs some bools together.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
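What "removing a hole" means can be shown with a small C sketch. The two structs below are hypothetical and are not the real cmd_state layout; they only illustrate how member order affects padding when small members sit between 64-bit ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: each bool is followed by padding so the
 * next 64-bit member can be aligned. */
struct demo_with_hole {
    bool dirty;
    uint64_t address;
    bool predicated;
    uint64_t size;
};

/* Same members, reordered: the two bools now share one padded tail slot. */
struct demo_packed {
    uint64_t address;
    uint64_t size;
    bool dirty;
    bool predicated;
};

/* Bytes saved by the reordering on the current ABI. */
static size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_with_hole) - sizeof(struct demo_packed);
}
```

The exact sizes depend on the ABI, so the sketch compares the two with `sizeof` instead of hard-coding numbers. A tool such as pahole can report holes like these in real structs.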
* radv: free attachments on end command buffer. | Dave Airlie | 2017-11-06 | 1 | -0/+2
    If we allocate attachments in the begin command buffer due to the render pass continue bit, we were leaking them.
    Since renderpasses inside a cmd buffer malloc/free these properly, and set to NULL, we just need to call free at end.
    Fixes a memory leak with multithreading demo.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Cc: "17.2 17.3" <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: Optimize calling radv_save_descriptors. | Bas Nieuwenhuizen | 2017-11-04 | 1 | -4/+2
    uint32_t data[MAX_SETS * 2] = {}; was getting executed before the exit and took significant amounts of time. By having the check outside the function, we skip the execution of the clear.
    Reviewed-by: Dave Airlie <[email protected]>
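The pattern described in the entry above, hoisting the early-exit check into the caller so the zero-initialised array is never set up on the common path, might look like the following C sketch. All names (`demo_save`, `demo_emit`, `demo_calls`, `DEMO_MAX_SETS`) are hypothetical, and the sketch uses `{0}` rather than the `{}` initializer quoted in the message for portability.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_SETS 8 /* hypothetical stand-in for MAX_SETS */

static unsigned demo_calls; /* counts how often the expensive path ran */

/* The callee no longer checks anything itself, so every call pays for
 * clearing the local array. */
static void demo_save(uint32_t out[DEMO_MAX_SETS * 2])
{
    uint32_t data[DEMO_MAX_SETS * 2] = {0};
    demo_calls++;
    memcpy(out, data, sizeof(data));
}

/* The caller performs the check, so the clear is skipped entirely when
 * the feature (here: tracing) is off. */
static void demo_emit(bool tracing, uint32_t out[DEMO_MAX_SETS * 2])
{
    if (tracing)
        demo_save(out);
}
```

When `tracing` is false, `demo_emit` returns without touching `out` and without entering `demo_save` at all.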
* radv: Use an array to store descriptor sets. | Bas Nieuwenhuizen | 2017-11-04 | 2 | -26/+50
    The vram_list linked list resulted in lots of pointer chasing. Replacing this with an array instead improves descriptor set allocation CPU usage by 3x at least (when also considering the free), because it had to iterate through 300-400 sets on average.
    Not a huge improvement as the pre-improvement CPU usage was only about 2.3% in the busiest thread.
    Reviewed-by: Dave Airlie <[email protected]>
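One way to replace a linked list of allocations with a contiguous array is sketched below in C. It assumes a fixed-capacity array of (offset, size) entries kept sorted by offset and a first-fit search. That is an assumption made for illustration, not a description of the actual radv pool, and every `demo_` name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 16 /* hypothetical capacity */

struct demo_entry {
    uint32_t offset;
    uint32_t size;
};

struct demo_pool {
    uint32_t size;                               /* total bytes in the pool */
    uint32_t count;                              /* live entries */
    struct demo_entry entries[DEMO_MAX_ENTRIES]; /* sorted by offset */
};

/* First-fit allocation over the sorted array.
 * Returns the offset, or UINT32_MAX if nothing fits. */
static uint32_t demo_pool_alloc(struct demo_pool *p, uint32_t size)
{
    uint32_t offset = 0, i;

    if (p->count == DEMO_MAX_ENTRIES)
        return UINT32_MAX;
    for (i = 0; i < p->count; i++) {
        if (p->entries[i].offset - offset >= size)
            break; /* the gap before entry i is large enough */
        offset = p->entries[i].offset + p->entries[i].size;
    }
    if (p->size - offset < size)
        return UINT32_MAX;
    /* Shift the tail up by one slot to keep the array sorted. */
    memmove(&p->entries[i + 1], &p->entries[i],
            (p->count - i) * sizeof(p->entries[0]));
    p->entries[i].offset = offset;
    p->entries[i].size = size;
    p->count++;
    return offset;
}

/* Free by offset: find the entry and close the gap in the array. */
static void demo_pool_free(struct demo_pool *p, uint32_t offset)
{
    for (uint32_t i = 0; i < p->count; i++) {
        if (p->entries[i].offset == offset) {
            memmove(&p->entries[i], &p->entries[i + 1],
                    (p->count - i - 1) * sizeof(p->entries[0]));
            p->count--;
            return;
        }
    }
}
```

The scan is still linear, but it walks adjacent memory instead of following a pointer per node, which is the kind of saving the commit message attributes to dropping the list.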
* ac: remove the remaining duplicate llvm types | Timothy Arceri | 2017-11-03 | 1 | -12/+1
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove unused v4f32 | Timothy Arceri | 2017-11-03 | 1 | -4/+0
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2f32 to the common code and make use of it | Timothy Arceri | 2017-11-03 | 3 | -10/+7
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f16 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -35/+33
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac f64 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v8i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -4/+2
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the common v4i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -9/+7
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v3i32 to the common code and make use of it | Timothy Arceri | 2017-11-03 | 3 | -5/+5
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: add v2i32 to the common code and use it | Timothy Arceri | 2017-11-03 | 3 | -11/+11
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i64 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: remove unused i16 llvm type | Timothy Arceri | 2017-11-03 | 1 | -2/+0
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac ivoidt llvm type | Timothy Arceri | 2017-11-03 | 1 | -4/+2
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i8 llvm type | Timothy Arceri | 2017-11-03 | 1 | -6/+4
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i1 llvm type | Timothy Arceri | 2017-11-03 | 1 | -3/+1
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac: use the ac i32 llvm type | Timothy Arceri | 2017-11-03 | 1 | -181/+179
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* ac/radeonsi: add support for tex instr without a dereference | Timothy Arceri | 2017-11-03 | 1 | -34/+46
    These are produced by nir_lower_bitmap(). Adding the missing dereference would cause other issues that need to be hacked around, such as skipping sampler lowering and uniform location assignment, so this change seems the correct way to go.
    Fixes 194 piglit crashes on radeonsi using NIR.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: use the optimal packets order for dispatch calls | Samuel Pitoiset | 2017-11-02 | 1 | -8/+53
    This should reduce the time where compute units are idle, mainly for meta operations because they use a bunch of compute shaders.
    This seems to have a really minor positive effect for Talos, at least.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Don't expose heaps with 0 memory. | Bas Nieuwenhuizen | 2017-11-02 | 3 | -53/+101
    It confuses CTS. This pregenerates the heap info into the physical device, so we can use it for translating contiguous indices into our "standard" ones.
    This also makes the WSI a bit smarter in case the first preferred heap does not exist.
    Reviewed-by: Dave Airlie <[email protected]>
    CC: <[email protected]>
* radeonsi: remove 'Authors:' comments | Marek Olšák | 2017-11-02 | 4 | -13/+1
    It's inaccurate. Instead, see the copyright and use "git log" and "git blame" to know the authorship.
    Acked-by: Nicolai Hähnle <[email protected]>
* radv: make radv_bind_descriptor_set() static | Samuel Pitoiset | 2017-11-02 | 2 | -6/+3
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make sure we set buffers as shareable properly. | Dave Airlie | 2017-11-02 | 2 | -2/+7
    This should make sure we don't treat exports buffers as local bos.
    Fixes: a639d40f13 (radv: add support for local bos. (v3))
    Tested-by: Andres Rodriguez <[email protected]>
    Reviewed-by: Andres Rodriguez <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: bail out when binding the same vertex buffers | Samuel Pitoiset | 2017-10-31 | 1 | -2/+16
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: bail out when binding the same index buffer | Samuel Pitoiset | 2017-10-31 | 2 | -0/+14
    DOW3 appears to hit this path.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
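The two "bail out when binding the same ..." entries above describe redundant-state filtering. A minimal C sketch of the idea follows. The struct and function are hypothetical and do not mirror the real radv command-buffer state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slice of command-buffer state. */
struct demo_cmd_state {
    const void *index_buffer;
    uint64_t index_offset;
    uint32_t index_type;
    bool dirty_index_buffer; /* set when the binding must be re-emitted */
};

/* Record a new index-buffer binding, returning early when nothing changed
 * so that no dirty flag is raised and no packets are re-emitted later. */
static void demo_bind_index_buffer(struct demo_cmd_state *s, const void *buffer,
                                   uint64_t offset, uint32_t type)
{
    if (s->index_buffer == buffer && s->index_offset == offset &&
        s->index_type == type)
        return; /* same binding as before: nothing to do */

    s->index_buffer = buffer;
    s->index_offset = offset;
    s->index_type = type;
    s->dirty_index_buffer = true;
}
```

Applications that rebind the same buffer before every draw then pay only for the comparison.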
* radv: use correct alloc function when loading from disk | Timothy Arceri | 2017-10-31 | 1 | -1/+14
    Fixes regression in: dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
    Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Fix -Wformat-security issue | Alex Smith | 2017-10-30 | 1 | -1/+1
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513
    Fixes: de889794134e ("radv: Implement VK_AMD_shader_info")
    Signed-off-by: Alex Smith <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: add cache items to in memory cache when reading from disk | Timothy Arceri | 2017-10-30 | 1 | -70/+71
    Otherwise we will leak them, load duplicates from disk rather than memory and never write items loaded from disk to the apps pipeline cache.
    Fixes: fd24be134ffd 'radv: make use of on-disk cache'
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Implement VK_AMD_shader_info | Alex Smith | 2017-10-29 | 6 | -35/+171
    This allows an app to query shader statistics and get a disassembly of a shader. RenderDoc git has support for it, so this allows you to view shader disassembly from a capture.
    When this extension is enabled on a device (or when tracing), we now disable pipeline caching, since we don't get the shader debug info when we retrieve cached shaders.
    v2: Improvements to resource usage reporting
    v3: Disassembly string must be null terminated (string_buffer's length does not include the terminator)
    v4: Fixed LDS reporting. (Bas)
    Signed-off-by: Alex Smith <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
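The v3 note in the entry above concerns an off-by-one that is easy to make when a length excludes the terminator. The C sketch below shows the general shape of the fix. `demo_copy_disassembly` is a hypothetical helper, not the radv or Vulkan entry point.

```c
#include <stddef.h>
#include <string.h>

/* Copy `length` characters of `text` into `out` and terminate the result.
 * `length` does NOT include a terminator, so the caller needs length + 1
 * bytes. Returns the number of bytes required; the copy only happens when
 * `out` is non-NULL and `capacity` is large enough. */
static size_t demo_copy_disassembly(const char *text, size_t length,
                                    char *out, size_t capacity)
{
    size_t required = length + 1; /* room for the trailing '\0' */

    if (out != NULL && capacity >= required) {
        memcpy(out, text, length);
        out[length] = '\0';
    }
    return required;
}
```

Calling it once with a NULL `out` yields the size to allocate, and a second call with a buffer of that size produces a properly terminated string.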
* radv: allow to use a compute shader for resetting the query pool | Samuel Pitoiset | 2017-10-27 | 1 | -7/+9
    Serious Sam Fusion 2017 uses a huge number of occlusion queries, and the allocated query pool buffer is greater than 4096 bytes.
    This slightly improves performance (tested in Ultra) from 117.2 FPS to 119.7 FPS (~+2%) on my RX480.
    This also improves Talos, from 69 FPS to 72/73 FPS (~+5%).
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>