summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeon
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: don't use allocas for arrays with LLVM 3.8Marek Olšák2016-08-251-1/+3
| | | | | | It crashes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
* gallium/radeon: unify and simplify checking for an empty gfx IBMarek Olšák2016-08-251-10/+19
| | | | | | | We can take advantage of the fact that multi_fence does the obvious thing with NULL fences. This fixes unflushed fences that can get stuck due to empty IBs.
* gallium: add a pipe_context parameter to resource_get_handleMarek Olšák2016-08-251-0/+1
| | | | | | | | radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL interop and this is the only way to make it coherent with the current context. It can optionally be set to NULL. Reviewed-by: Brian Paul <[email protected]>
* radeon/vce: set flag based on dual instance enablementBoyuan Zhang2016-08-191-2/+4
| | | | | | | Set the flag on when dual instance encoding is supported, otherwise set it to off. Signed-off-by: Boyuan Zhang <[email protected]>
* radeonsi: initialize and finalize the LLVM function pass managerMarek Olšák2016-08-181-0/+2
| | | | Reviewed-by: Tom Stellard <[email protected]>
* gallium/radeon: assign the highest priority to scratch; make rings secondMarek Olšák2016-08-171-2/+4
| | | | | | | just FYI, the kernel receives priority/4 Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/winsys: re-number winsys priority flagsMarek Olšák2016-08-171-16/+13
| | | | | | | free 60..63, move CP_DMA up Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: mark shader rings as highest-priority buffersMarek Olšák2016-08-171-1/+1
| | | | | | | and rename the enum Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set SHADER_RW_BUFFER priority for streamout buffersMarek Olšák2016-08-171-2/+2
| | | | | Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use current context for DCC feedback-loop decompress, fixes ElementalMarek Olšák2016-08-172-7/+33
| | | | | | | | | | This is just a workaround. The problem is described in the code. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541 v2: say that it's only between the current context and aux_context Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* gallium/radeon: use unflushed fences for PIPE_QUERY_GPU_FINISHEDMarek Olšák2016-08-171-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use lp_build_alloca_undefNicolai Hähnle2016-08-171-13/+4
| | | | | | | | Avoid building all those store 0 / store undef instruction pairs that end up getting removed anyway. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: protect against out of bounds temporary array accessesNicolai Hähnle2016-08-171-0/+15
| | | | | | | They can lead to VM faults and worse, which goes against the GL robustness promises. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add radeon_llvm_bound_index for bounds checkingNicolai Hähnle2016-08-172-0/+33
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: reduce alloca of temporaries based on usagemaskNicolai Hähnle2016-08-172-10/+54
| | | | | | v2: take actual writemasks into account Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use tgsi_scan_arrays for temp arraysNicolai Hähnle2016-08-172-4/+8
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: allocate temps array info in radeon_llvm_context_initNicolai Hähnle2016-08-172-33/+44
| | | | | | | | | Also, prepare for using tgsi_array_info. This also opens the door for properly handling allocation failures, but I'm leaving that for a separate change. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: always do the full store in store_value_to_arrayNicolai Hähnle2016-08-171-49/+28
| | | | | | | | | Doing the write-back of the temporary vector in radeon_llvm_emit_store makes no sense. This also allows us to get rid of get_alloca_for_array. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common getelementptr logic into get_pointer_into_arrayNicolai Hähnle2016-08-171-39/+66
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: pass indirect register info into get_alloca_for_arrayNicolai Hähnle2016-08-171-5/+6
| | | | | | To have the same signature as get_array_range. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common lookup code into get_temp_array functionNicolai Hähnle2016-08-171-33/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clarify the comment on the array alloca heuristicNicolai Hähnle2016-08-171-10/+19
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: more descriptive names for LLVM temporaries in debug buildsNicolai Hähnle2016-08-171-2/+12
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_store for direct array addressingNicolai Hähnle2016-08-171-7/+0
| | | | | | | We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressingNicolai Hähnle2016-08-171-5/+0
| | | | | | | We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clean up emit_declaration for temporariesNicolai Hähnle2016-08-171-9/+18
| | | | | | | | In the alloca'd array case, no longer create redundant and unused allocas for the individual elements; create getelementptrs instead. Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use unflushed fences for deferred flushes (v2)Marek Olšák2016-08-101-1/+43
| | | | | | | | | | +23% Bioshock Infinite performance. v2: - use the new fence_finish interface - allow deferred fences with multiple contexts - clear the ctx pointer after a deferred flush Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add a pipe_context parameter to fence_finishMarek Olšák2016-08-102-1/+2
| | | | | | | | required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush the context Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add HUD queries for mapped VRAM/GTTMarek Olšák2016-08-102-0/+12
| | | | | | mainly for monitoring visible VRAM congestion Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: track the amount of mapped memoryMarek Olšák2016-08-101-0/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: increase the size of the renderer stringMarek Olšák2016-08-101-1/+1
| | | | | | Mine is longer than 64 bytes. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: implement ARB_clear_texture (v3)Marek Olšák2016-08-101-0/+67
| | | | | | | | | | Some ideas copied from Jakob Sinclair's implementation, but the color clearing is completely different. v2: remove leftover code, disable conditional rendering disable render condition cleanly Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: query ME/PFP/CE firmware versionsNicolai Hähnle2016-08-082-0/+6
| | | | | | | The radeon kernel module doesn't have the firmware query interface, so the corresponding values will remain 0. Reviewed-by: Marek Olšák <[email protected]>
* Revert "gallium/radeon: count contexts"Marek Olšák2016-08-062-4/+0
| | | | | | This reverts commit b403eb338533894ee012a96bf55653996c92ec7c. Not needed.
* gallium/radeon: add cs_get_next_fence winsys callbackMarek Olšák2016-08-061-0/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: count contextsMarek Olšák2016-08-062-0/+4
| | | | | | We don't wanna use unflushed fences when we have multiple contexts. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: count gfx IB flushesMarek Olšák2016-08-061-0/+1
| | | | | | | This will be used as a counter for whether fence_finish needs to flush the IB. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move radeon_winsys::cs_memory_below_limit to driversMarek Olšák2016-08-063-15/+27
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: inline radeon_winsys::query_memory_usageMarek Olšák2016-08-062-3/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/winsyses: expose per-IB used_vram and used_gart to driversMarek Olšák2016-08-061-0/+5
| | | | | | The following patches will use this. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush if sampler views and images use too much memoryMarek Olšák2016-08-061-0/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add r600_resource::vram_usage and gart_usageMarek Olšák2016-08-063-12/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move last_gfx_fence from radeonsi to common codeMarek Olšák2016-08-032-0/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set the last parameter component of llvm.AMDGPU.cubeMarek Olšák2016-08-031-2/+8
| | | | | | LLVM doesn't use it. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.cube* if availableMarek Olšák2016-08-031-4/+28
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.rsq.f64 if availableMarek Olšák2016-08-031-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use v_mad_f32 for fmaMarek Olšák2016-08-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem to use fma for all d3d multiply-add instructions, which is silly. This tries to restore performance for those games. The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't support denormals, which we don't enable anyway, because they are slow too. Also, there is code size reduction: Totals from affected shaders: VGPRS: 109796 -> 109808 (0.01 %) Spilled SGPRs: 29995 -> 30022 (0.09 %) Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13 Code Size: 6667596 -> 6476356 (-2.87 %) bytes Max Waves: 26931 -> 26899 (-0.12 %) I've not actually tested real performance. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/llvm: Use alloca instructions for larger arrays [revert a revert]Marek Olšák2016-07-262-25/+149
| | | | | | This reverts commit f84e9d749fbb6da73a60fb70e6725db773c9b8f8. Bioshock Infinite no longer hangs.
* radeonsi: implement buffer_subdata without indirect callsMarek Olšák2016-07-233-3/+39
| | | | | | There is less noise in CPU profile data now. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: split transfer_inline_write into buffer and texture callbacksMarek Olšák2016-07-233-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | to reduce the call indirections with u_resource_vtbl. The worst call tree you could get was: - u_transfer_inline_write_vtbl - u_default_transfer_inline_write - u_transfer_map_vtbl - driver_transfer_map - u_transfer_unmap_vtbl - driver_transfer_unmap That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites. The new interface is: pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data) pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride) v2: fix whitespace, correct ilo's behavior Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Roland Scheidegger <[email protected]>