path: root/src/gallium/drivers/radeon
Commit message | Author | Age | Files | Lines
...
* gallium/radeon: make sure the address of separate CMASK is aligned properly | Marek Olšák | 2016-10-26 | 1 | -2/+3
  This should fix random GPU hangs on Hawaii and Fiji.
  Cc: 11.2 12.0 13.0 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: fix incorrect bpe use in si_set_optimal_micro_tile_mode | Marek Olšák | 2016-10-26 | 1 | -7/+7
  Oh my god, I wonder what catastrophic issues this was causing on SI.
  Cc: 13.0 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: import all TGSI->LLVM code from gallium/radeon | Marek Olšák | 2016-10-18 | 5 | -1625/+1
  Acked-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Acked-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: simplify initialization of 64-bit gallivm builders | Marek Olšák | 2016-10-18 | 1 | -18/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Acked-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: remove unused radeon_llvm_reg_index_soa | Marek Olšák | 2016-10-18 | 2 | -7/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: move LLVM ALU codegen into radeonsi | Marek Olšák | 2016-10-18 | 2 | -986/+2
  Acked-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Acked-by: Edward O'Callaghan <[email protected]>
* radeonsi: implement TC-compatible HTILE | Marek Olšák | 2016-10-13 | 3 | -10/+64
  so that decompress blits aren't needed and depth texturing needs less memory bandwidth.
  Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers.
  This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
  I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now)
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: assign a name to LLVM output variables in debug builds | Nicolai Hähnle | 2016-10-10 | 1 | -1/+6
  This can be helpful with R600_DEBUG=preoptir.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: avoid redundant work with overlapping in/out arrays | Nicolai Hähnle | 2016-10-10 | 1 | -1/+4
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: support ARB_compute_variable_group_size | Nicolai Hähnle | 2016-10-10 | 2 | -1/+11
  Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch).
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: add missing zero-init for resource templates | Rob Clark | 2016-10-07 | 1 | -0/+1
  Mostly test code, plus one spot I noticed in r600.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK | Samuel Pitoiset | 2016-10-07 | 1 | -0/+2
  v3:
  - use a new case statement in r600_pipe_common.c
  - fix compilation of softpipe...
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: implement set_device_reset_callback | Nicolai Hähnle | 2016-10-05 | 2 | -0/+34
  Check for device reset on flush. It would be nicer if the kernel just reported this as an error on the submit ioctl (and similarly for fences), but this will do for now.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use the new parent/child pools for transfers | Nicolai Hähnle | 2016-10-05 | 3 | -6/+11
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: optionally run the LLVM IR verifier pass | Nicolai Hähnle | 2016-10-04 | 4 | -2/+17
  This is enabled automatically if shader printing is enabled, or separately by R600_DEBUG=checkir.
  Catch malformed IR before it crashes in a later pass.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics | Nicolai Hähnle | 2016-10-04 | 1 | -2/+2
  Caught by R600_DEBUG=checkir (next commit).
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: unify the creation of basic blocks | Nicolai Hähnle | 2016-10-04 | 1 | -10/+24
  This changes the order of basic blocks to be equal to the order of code in the original TGSI, which is nice for making sense of shader dumps.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: merge branch and loop flow control stacks | Nicolai Hähnle | 2016-10-04 | 2 | -82/+78
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify if/else/endif blocks | Nicolai Hähnle | 2016-10-04 | 2 | -25/+18
  In particular, we no longer emit an else block when there is no ELSE instruction.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: label basic blocks by the corresponding TGSI pc | Nicolai Hähnle | 2016-10-04 | 1 | -0/+17
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: cleanup and fix branch emits | Nicolai Hähnle | 2016-10-04 | 1 | -37/+14
  Some of the existing code is needlessly complicated. The basic principle should be: control-flow opcodes emit branches to properly terminate the current block, _unless_ the current block already has a terminator (which happens if and only if there was a BRK or CONT).
  This also fixes a bug where multiple terminators were created in a block.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
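The principle stated in this commit message can be illustrated with a minimal sketch. The `struct block` type and `emit_default_branch` helper below are hypothetical and are not the LLVM or Mesa data structures; they only model the rule that a branch is emitted unless the block already ends in a terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified basic block (an assumption for illustration,
 * not the LLVM or Mesa representation). */
struct block {
    const struct block *branch_target; /* NULL while the block has no terminator */
    int num_terminators;               /* must never exceed 1 */
};

/* Terminate `cur` with a branch to `target`, unless `cur` was already
 * terminated (for example by an earlier BRK or CONT).
 * Returns true if a branch was emitted. */
static bool emit_default_branch(struct block *cur, const struct block *target)
{
    assert(cur != NULL && target != NULL);

    if (cur->branch_target != NULL)
        return false; /* already terminated: emitting again would be the bug */

    cur->branch_target = target;
    cur->num_terminators++;
    return true;
}
```

Calling the helper twice on the same block leaves exactly one terminator in place, which is the invariant the commit message describes.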
* winsys/radeon: add buffer_get_reloc_offset | Nicolai Hähnle | 2016-10-04 | 3 | -2/+14
  Really fix the bug that was supposed to be fixed by commits 3e7cced4b and a48bf02d: even when virtual addresses are used, the legacy relocation-based method with offsets relative to the kernel's buffer object is used for video submissions.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: move r600_common_context::texture_buffers to r600g | Marek Olšák | 2016-10-04 | 2 | -7/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: track buffer bind history | Marek Olšák | 2016-10-04 | 2 | -0/+2
  similar to gl_buffer_object::UsageHistory
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: inline r600_context_add_resource_size | Marek Olšák | 2016-10-04 | 2 | -21/+13
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: fix crash/regression in performance counters | Nicolai Hähnle | 2016-09-30 | 1 | -0/+9
  Regression introduced by "gallium/radeon: zero all query buffers".
  Cc: Michel Dänzer <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: update documentation of buffer_get_virtual_address | Nicolai Hähnle | 2016-09-30 | 1 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: emit relocations for query fences | Nicolai Hähnle | 2016-09-30 | 3 | -8/+14
  This is only needed for r600 which doesn't have ARB_query_buffer_object and therefore wouldn't really need the fences, but let's be optimistic about filling in this feature gap eventually.
  Cc: Dieter Nützel <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeon/uvd: adjust the buffer offset when relocation is used | Nicolai Hähnle | 2016-09-30 | 1 | -0/+1
  We don't plan to use sub-allocated buffers with UVD, but just in case one slips through, this increases the chances of things working out anyway.
  Reviewed-by: Christian König <[email protected]>
* radeon/vce: adjust the buffer offset when relocation is used | Nicolai Hähnle | 2016-09-30 | 1 | -0/+1
  We don't plan to use sub-allocated buffers with VCE, but just in case one slips through, this increases the chances of things working out anyway.
  Reviewed-by: Christian König <[email protected]>
* radeon/video: don't use sub-allocated buffers | Nicolai Hähnle | 2016-09-30 | 2 | -1/+10
  Cc: Christian König <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97976
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
  Reviewed-by: Christian König <[email protected]>
* gallium/radeon: use smaller buffers for query results | Nicolai Hähnle | 2016-09-29 | 1 | -1/+1
  Most of the time, even the 512 bytes that we now get is more than sufficient (pipeline stats queries are the largest at 184 bytes per shot).
  Reviewed-by: Marek Olšák <[email protected]>
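The sizing argument in this commit message can be checked with a short calculation. The helper name `results_per_buffer` is hypothetical; only the figures 512 and 184 come from the commit message.

```c
#include <assert.h>

/* Number of whole results of `result_size` bytes that fit in a buffer of
 * `buffer_size` bytes. Hypothetical helper, for illustration only. */
static unsigned results_per_buffer(unsigned buffer_size, unsigned result_size)
{
    assert(result_size > 0);
    return buffer_size / result_size;
}
```

With the numbers quoted above, `results_per_buffer(512, 184)` evaluates to 2, so even the largest query type has room for two results in a 512-byte buffer.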
* gallium/radeon/winsyses: add radeon_winsys::min_alloc_size | Nicolai Hähnle | 2016-09-29 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement get_query_result_resource (v2) | Nicolai Hähnle | 2016-09-29 | 4 | -1/+402
  v2: fix a comment (Gustaw Smolarczyk)
  Acked-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: zero all query buffers | Nicolai Hähnle | 2016-09-29 | 2 | -17/+11
  To ensure that fences are properly initialized.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP result | Nicolai Hähnle | 2016-09-29 | 1 | -5/+1
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add query fences and r600_get_hw_query_params | Nicolai Hähnle | 2016-09-29 | 1 | -16/+91
  We will support the waiting option in ARB_query_buffer_object using WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently write their results with the highest bit set, and we can just use that; for others, we have to write a fence explicitly.
  ZPASS_DONE for occlusion queries writes its results with the high bit set, but it writes up to 8 pairs of results (one for each DB). We have to wait for all of these results, so let's just add an explicit fence.
  The new function provides summary information to be used by subsequent patches.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
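The "highest bit as fence" idea from this commit message can be sketched on the CPU side as follows. This is an illustration under stated assumptions, not the driver's code: the results are assumed to be 64-bit values with bit 63 as the ready marker, and `RESULT_READY_BIT`, `MAX_RESULT_VALUES`, `result_is_ready` and `all_results_ready` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each result is a 64-bit value whose top bit marks it as written. */
#define RESULT_READY_BIT  (UINT64_C(1) << 63)
/* Up to 8 pairs of results, one pair per DB, as stated in the commit message. */
#define MAX_RESULT_VALUES (8 * 2)

static bool result_is_ready(uint64_t value)
{
    return (value & RESULT_READY_BIT) != 0;
}

/* A single ready bit is not enough when several values are written
 * independently: every one of them has to be checked. */
static bool all_results_ready(const uint64_t *values, unsigned num_values)
{
    assert(num_values <= MAX_RESULT_VALUES);

    for (unsigned i = 0; i < num_values; i++) {
        if (!result_is_ready(values[i]))
            return false;
    }
    return true;
}
```

Having to poll every value is what makes a single explicit fence dword attractive for occlusion queries: one wait covers all DBs.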
* radeonsi: add save_qbo_state | Nicolai Hähnle | 2016-09-29 | 2 | -0/+10
  Save compute shader state that will be used for the ARB_query_buffer_object implementation.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add r600_gfx_{write,wait}_fence | Nicolai Hähnle | 2016-09-29 | 2 | -0/+57
  For bottom-of-pipe fences inside the gfx command stream.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add barrier_flags to r600_common_screen | Nicolai Hähnle | 2016-09-29 | 1 | -0/+12
  There are driver-specific context flags for barriers that are not covered by the Gallium barrier interfaces.
  The R600 settings of these flags may not be optimal, but we're not going to use them yet anyway.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: Initialize pipe_resource::next to NULL | Michel Dänzer | 2016-09-28 | 2 | -0/+2
  Fixes lots of piglit tests crashing due to using uninitialized memory.
  Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV")
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: add fence and buffer list logic for slab allocated buffers | Nicolai Hähnle | 2016-09-27 | 1 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_FLAG_HANDLE | Nicolai Hähnle | 2016-09-27 | 2 | -0/+3
  When passed to winsys->buffer_create, this flag will indicate that we require a buffer that maps 1:1 with a kernel buffer handle.
  This is currently set for all textures, since textures can potentially be exported to other processes.
  This is not a huge loss, since the main purpose of this patch series is to deal with applications that allocate many small buffers. A hypothetical application with tons of tiny textures might still benefit from not setting this flag, but that's not a use case I'm worried about just now.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add RADEON_USAGE_SYNCHRONIZED | Nicolai Hähnle | 2016-09-27 | 4 | -5/+14
  This is really the behavior we want most of the time, but having a SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that OR'ing different flags together always results in stronger guarantees.
  The parent BOs of sub-allocated buffers will be added unsynchronized.
  Reviewed-by: Marek Olšák <[email protected]>
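The design point about OR'ing flags can be shown with a small sketch. The enum values and helper names below are hypothetical and do not reproduce the real `RADEON_USAGE_*` definitions; they only demonstrate why a positive SYNCHRONIZED bit composes safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the real winsys enum. */
enum buf_usage {
    BUF_USAGE_READ         = 1 << 0,
    BUF_USAGE_WRITE        = 1 << 1,
    BUF_USAGE_SYNCHRONIZED = 1 << 2,
};
#define BUF_USAGE_ALL (BUF_USAGE_READ | BUF_USAGE_WRITE | BUF_USAGE_SYNCHRONIZED)

/* Combine the usage of the same buffer added twice to a command stream. */
static unsigned merge_usage(unsigned a, unsigned b)
{
    assert((a & ~BUF_USAGE_ALL) == 0 && (b & ~BUF_USAGE_ALL) == 0);
    return a | b;
}

static bool needs_sync(unsigned usage)
{
    return (usage & BUF_USAGE_SYNCHRONIZED) != 0;
}
```

If either use asks for synchronization, the merged usage does too. With a negative UNSYNCHRONIZED bit, the same OR would instead let one unsynchronized use silently weaken a synchronized one.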
* radeonsi: prepare 64-bit integer support. (v2) | Dave Airlie | 2016-09-21 | 1 | -7/+62
  v2:
  - no PIPE_CAP_INT64 yet
  - emit DIV/MOD without the divide-by-zero workaround
  Reviewed-by: Marek Olšák <[email protected]> (v1)
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
  Signed-off-by: Nicolai Hähnle <[email protected]>
* radeon/vce: add firmware support for version 52.8.3 | Leo Liu | 2016-09-20 | 1 | -0/+3
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v3 | Tom Stellard | 2016-09-16 | 1 | -1/+5
  This patch switches non-TGSI compute shaders over to using the HSA ABI described here:
  https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md
  The HSA ABI provides a much cleaner interface for compute shaders and allows us to share more code in the compiler with the HSA stack.
  The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet rather than at the beginning of the kernel arguments.
  Typically for HSA, the CP firmware will create the dispatch packet and set up the user sgprs automatically. However, in Mesa we let the driver do this work. The main reason for this is that I haven't researched how to get the CP to do all these things, and I'm not sure if it is supported for all GPUs.
  v2:
  - Add comments explaining why we are setting certain bits of the scratch resource descriptor.
  v3:
  - Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: reload PS inputs with direct indexing at each use (v2) | Marek Olšák | 2016-09-14 | 2 | -6/+30
  The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute.
  26011 shaders in 14651 tests
  Totals:
  SGPRS: 1146340 -> 1132676 (-1.19 %)
  VGPRS: 727371 -> 711730 (-2.15 %)
  Spilled SGPRs: 2218 -> 2078 (-6.31 %)
  Spilled VGPRs: 369 -> 369 (0.00 %)
  Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
  Code Size: 35841268 -> 36009732 (0.47 %) bytes
  LDS: 767 -> 767 (0.00 %) blocks
  Max Waves: 222559 -> 224779 (1.00 %)
  Wait states: 0 -> 0 (0.00 %)
  v2: don't call load_input for fragment shaders in emit_declaration
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set new r600_resource fields correctly in other places too | Marek Olšák | 2016-09-13 | 1 | -0/+11
  This was missed in:
    commit 0d2e43fcb1198a6e67c85feadb1ca8c360ddc284
    Author: Marek Olšák <[email protected]>
    Date: Thu Aug 18 16:30:00 2016 +0200
    gallium/radeon: derive buffer placement and flags only at initialization
  Tested-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon: Don't check DCC on pipe buffers | Jan Vesely | 2016-09-13 | 1 | -3/+4
  Fixes segfaults in EG compute since:
    commit 21de3be8e62b2b093569a99550e6356ed2f106b4
    radeonsi: fix texture format reinterpretation with DCC
  Signed-off-by: Jan Vesely <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>