path: root/src/gallium/drivers/radeon
Commit message | Author | Age | Files | Lines
...
* gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing | Nicolai Hähnle | 2016-08-17 | 1 | -5/+0
  We can use the pointer stored in the temps array directly.
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clean up emit_declaration for temporaries | Nicolai Hähnle | 2016-08-17 | 1 | -9/+18
  In the alloca'd array case, no longer create redundant and unused allocas
  for the individual elements; create getelementptrs instead.
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: use unflushed fences for deferred flushes (v2) | Marek Olšák | 2016-08-10 | 1 | -1/+43
  +23% Bioshock Infinite performance.
  v2:
  - use the new fence_finish interface
  - allow deferred fences with multiple contexts
  - clear the ctx pointer after a deferred flush
  Reviewed-by: Nicolai Hähnle <[email protected]>
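Editorial sketch of the idea behind deferred (unflushed) fences, combining this commit with the "count gfx IB flushes" commit further down the log. Every name here (`sketch_context`, `sketch_fence`, `sketch_fence_finish`, and so on) is hypothetical and not taken from the driver. The sketch assumes a fence remembers the context and the flush counter at creation time, and that a wait only flushes when the counter has not advanced since then. It models the flush decision only, not the actual wait on the kernel fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the driver's real structures. */
struct sketch_context {
    unsigned num_gfx_flushes;   /* incremented on every gfx IB flush */
};

struct sketch_fence {
    struct sketch_context *ctx; /* non-NULL while the fence may be unflushed */
    unsigned flush_index;       /* num_gfx_flushes when the fence was created */
};

/* Model a gfx IB flush by bumping the counter. */
static void sketch_flush(struct sketch_context *ctx)
{
    assert(ctx != NULL);
    ctx->num_gfx_flushes++;
}

/* Create a fence for a deferred flush: nothing is submitted yet. */
static struct sketch_fence sketch_deferred_fence(struct sketch_context *ctx)
{
    assert(ctx != NULL);
    struct sketch_fence f = { ctx, ctx->num_gfx_flushes };
    return f;
}

/* Returns true if the waiter had to flush the IB before waiting. */
static bool sketch_fence_finish(struct sketch_context *waiter,
                                struct sketch_fence *f)
{
    if (f->ctx != NULL && f->ctx == waiter &&
        f->flush_index == waiter->num_gfx_flushes) {
        /* The IB that carries this fence has not been flushed yet. */
        sketch_flush(waiter);
        f->ctx = NULL; /* clear the ctx pointer after a deferred flush */
        return true;
    }
    return false;
}
```

In this model, a fence whose IB was already flushed by other work costs nothing extra at wait time; only a fence that is still pending in the waiter's own IB triggers a flush.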
* gallium: add a pipe_context parameter to fence_finish | Marek Olšák | 2016-08-10 | 2 | -1/+2
  required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
  the context
  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add HUD queries for mapped VRAM/GTT | Marek Olšák | 2016-08-10 | 2 | -0/+12
  mainly for monitoring visible VRAM congestion
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: track the amount of mapped memory | Marek Olšák | 2016-08-10 | 1 | -0/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: increase the size of the renderer string | Marek Olšák | 2016-08-10 | 1 | -1/+1
  Mine is longer than 64 bytes.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: implement ARB_clear_texture (v3) | Marek Olšák | 2016-08-10 | 1 | -0/+67
  Some ideas copied from Jakob Sinclair's implementation, but the color
  clearing is completely different.
  v2: remove leftover code, disable conditional rendering disable render condition cleanly
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: query ME/PFP/CE firmware versions | Nicolai Hähnle | 2016-08-08 | 2 | -0/+6
  The radeon kernel module doesn't have the firmware query interface, so the
  corresponding values will remain 0.
  Reviewed-by: Marek Olšák <[email protected]>
* Revert "gallium/radeon: count contexts" | Marek Olšák | 2016-08-06 | 2 | -4/+0
  This reverts commit b403eb338533894ee012a96bf55653996c92ec7c.
  Not needed.
* gallium/radeon: add cs_get_next_fence winsys callback | Marek Olšák | 2016-08-06 | 1 | -0/+7
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: count contexts | Marek Olšák | 2016-08-06 | 2 | -0/+4
  We don't wanna use unflushed fences when we have multiple contexts.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: count gfx IB flushes | Marek Olšák | 2016-08-06 | 1 | -0/+1
  This will be used as a counter for whether fence_finish needs to flush
  the IB.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move radeon_winsys::cs_memory_below_limit to drivers | Marek Olšák | 2016-08-06 | 3 | -15/+27
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: inline radeon_winsys::query_memory_usage | Marek Olšák | 2016-08-06 | 2 | -3/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon/winsyses: expose per-IB used_vram and used_gart to drivers | Marek Olšák | 2016-08-06 | 1 | -0/+5
  The following patches will use this.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush if sampler views and images use too much memory | Marek Olšák | 2016-08-06 | 1 | -0/+34
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add r600_resource::vram_usage and gart_usage | Marek Olšák | 2016-08-06 | 3 | -12/+19
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move last_gfx_fence from radeonsi to common code | Marek Olšák | 2016-08-03 | 2 | -0/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set the last parameter component of llvm.AMDGPU.cube | Marek Olšák | 2016-08-03 | 1 | -2/+8
  LLVM doesn't use it.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.cube* if available | Marek Olšák | 2016-08-03 | 1 | -4/+28
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use llvm.amdgcn.rsq.f64 if available | Marek Olšák | 2016-08-03 | 1 | -1/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use v_mad_f32 for fma | Marek Olšák | 2016-08-03 | 1 | -2/+2
  v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem to
  use fma for all d3d multiply-add instructions, which is silly. This tries
  to restore performance for those games.

  The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
  support denormals, which we don't enable anyway, because they are slow too.

  Also, there is code size reduction:

  Totals from affected shaders:
  VGPRS: 109796 -> 109808 (0.01 %)
  Spilled SGPRs: 29995 -> 30022 (0.09 %)
  Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
  Code Size: 6667596 -> 6476356 (-2.87 %) bytes
  Max Waves: 26931 -> 26899 (-0.12 %)

  I've not actually tested real performance.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/llvm: Use alloca instructions for larger arrays [revert a revert] | Marek Olšák | 2016-07-26 | 2 | -25/+149
  This reverts commit f84e9d749fbb6da73a60fb70e6725db773c9b8f8.
  Bioshock Infinite no longer hangs.
* radeonsi: implement buffer_subdata without indirect calls | Marek Olšák | 2016-07-23 | 3 | -3/+39
  There is less noise in CPU profile data now.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: split transfer_inline_write into buffer and texture callbacks | Marek Olšák | 2016-07-23 | 3 | -4/+3
  to reduce the call indirections with u_resource_vtbl.

  The worst call tree you could get was:
  - u_transfer_inline_write_vtbl
  - u_default_transfer_inline_write
  - u_transfer_map_vtbl
  - driver_transfer_map
  - u_transfer_unmap_vtbl
  - driver_transfer_unmap

  That's 6 indirect calls. Some drivers only had 5. The goal is to have
  1 indirect call for drivers that care. The resource type can be determined
  statically at most call sites.

  The new interface is:
  pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
  pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride)

  v2: fix whitespace, correct ilo's behavior
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Acked-by: Roland Scheidegger <[email protected]>
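Editorial sketch of the "one indirect call" goal described in the commit above. The types and functions (`sketch_context`, `sketch_resource`, `sketch_driver_buffer_subdata`, `sketch_upload`) are hypothetical; only the parameter order of `buffer_subdata` follows the interface quoted in the commit message. The assumption is that the context holds a per-resource-type function pointer that the driver fills in, so an upload reaches driver code through a single pointer dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resource with a small CPU-visible backing store. */
struct sketch_resource {
    unsigned char storage[64];
};

/* Hypothetical context: one callback per resource type, set by the driver. */
struct sketch_context {
    void (*buffer_subdata)(struct sketch_context *ctx,
                           struct sketch_resource *res,
                           unsigned usage, unsigned offset,
                           unsigned size, const void *data);
    unsigned indirect_calls; /* instrumentation for this sketch only */
};

/* Driver implementation: writes the data directly, no map/unmap vtbl hops. */
static void sketch_driver_buffer_subdata(struct sketch_context *ctx,
                                         struct sketch_resource *res,
                                         unsigned usage, unsigned offset,
                                         unsigned size, const void *data)
{
    (void)ctx;
    (void)usage;
    assert(offset <= sizeof(res->storage) &&
           size <= sizeof(res->storage) - offset);
    memcpy(res->storage + offset, data, size);
}

/* Call site that statically knows it is dealing with a buffer. */
static void sketch_upload(struct sketch_context *ctx,
                          struct sketch_resource *res,
                          unsigned offset, unsigned size, const void *data)
{
    ctx->indirect_calls++; /* the only function-pointer call on this path */
    ctx->buffer_subdata(ctx, res, 0, offset, size, data);
}
```

A caller would set `ctx.buffer_subdata = sketch_driver_buffer_subdata` once at context creation and then use `sketch_upload` for every buffer write.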
* gallium/radeon: make deferred flushes asynchronous | Marek Olšák | 2016-07-22 | 1 | -0/+2
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag | Marek Olšák | 2016-07-19 | 1 | -2/+1
  always set
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/uvd: add session context buffer for polaris 10/11 v2 | Christian König | 2016-07-18 | 2 | -0/+21
  This way we have unlimited UVD sessions.
  v2: only enable it when kernel supports it as well.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* Revert "radeon/llvm: Use alloca instructions for larger arrays" | Marek Olšák | 2016-07-14 | 2 | -149/+25
  This reverts commit 513fccdfb68e6a71180e21827f071617c93fd09b.
  Bioshock Infinite hangs with that.
* radeon/uvd: fail to create a decoder if RUVD_MSG_CREATE submission fails | Marek Olšák | 2016-07-14 | 1 | -6/+9
  This is the bare minimum for reporting the error to the user.
  Reviewed-by: Christian König <[email protected]>
* gallium/radeon: add a return value to cs_flush | Marek Olšák | 2016-07-14 | 1 | -3/+5
  Required by our UVD code.
  Reviewed-by: Christian König <[email protected]>
* radeon/vce: handle newly added parameters | Boyuan Zhang | 2016-07-14 | 1 | -13/+20
  Replace the previous hardcoded value with newly defined parameters
  Signed-off-by: Boyuan Zhang <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* gallium/radeon: normalize the code style | Marek Olšák | 2016-07-13 | 2 | -338/+286
  no change in behavior
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: just save buffer sizes instead of buffers while recording IBs | Marek Olšák | 2016-07-13 | 2 | -6/+1
  whole buffer objects are not needed
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeon/uvd: simplify sending context buffer message | Christian König | 2016-07-08 | 1 | -4/+1
  Just send it whenever it is allocated.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: fix contex buffer destruction in the error path | Christian König | 2016-07-08 | 1 | -6/+2
  Destroying a not allocated buffer is harmless.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: move polaris fw check into radeon_video.c v2 | Christian König | 2016-07-08 | 2 | -11/+13
  It's actually not very clever to claim to support H.264 and then fail to
  create a decoder.
  v2: prefix FW macro with UVD_.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/video: fix coding style in radeon_video.c v2 | Christian König | 2016-07-08 | 1 | -15/+15
  v2: fix other tabs as well.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeonsi: explicitly choose center locations for 1xAA on Polaris | Nicolai Hähnle | 2016-07-08 | 1 | -0/+7
  Unlike SC, the small primitive filter does not automatically use center
  locations in 1xAA mode, so this is needed to avoid artifacts caused by the
  small primitive filter discarding triangles that it shouldn't.

  As a side effect of how the effective number of samples is now calculated,
  this patch also avoids submitting the sample locations for line/poly
  smoothing when they're not really needed.

  Cc: 12.0 <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeon/llvm: Use alloca instructions for larger arrays | Tom Stellard | 2016-07-06 | 2 | -25/+151
  We were storing arrays in vectors, which was leading to some really bad
  spill code for large arrays. Alloca instructions are a better fit for
  arrays, and LLVM optimizations are more geared toward dealing with allocas
  instead of vectors.

  For arrays that have 16 or fewer 32-bit elements, we will continue to use
  vectors, because this will force LLVM to store them in registers and use
  indirect registers, which is usually faster for small arrays.

  In the future we should use allocas for all arrays and teach LLVM how to
  store allocas in registers.

  This fixes the piglit test:
  spec/glsl-1.50/execution/geometry/max-input-component
  Reviewed-by: Marek Olšák <[email protected]>
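Editorial sketch of the storage decision described in the commit above. The threshold of 16 32-bit elements comes from the commit message; the helper names and the assumption that each declared register contributes four 32-bit channels are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold stated in the commit message: up to 16 elements stay in vectors. */
enum { SKETCH_MAX_VECTOR_ELEMS = 16 };

/* Assumption: every declared register holds four 32-bit channels. */
enum { SKETCH_CHANNELS_PER_REG = 4 };

/* Number of 32-bit elements covered by registers [first, last]. */
static unsigned sketch_array_elems(unsigned first, unsigned last)
{
    assert(first <= last);
    return (last - first + 1) * SKETCH_CHANNELS_PER_REG;
}

/*
 * Returns true when the array should be backed by an alloca (memory),
 * false when it should stay in a vector (registers with indirect access).
 */
static bool sketch_array_uses_alloca(unsigned num_32bit_elems)
{
    return num_32bit_elems > SKETCH_MAX_VECTOR_ELEMS;
}
```

Under these assumptions, a four-register array (16 elements) stays in a vector, while a five-register array (20 elements) is given an alloca.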
* radeon/llvm: Add helpers for loading and storing data from arrays. | Tom Stellard | 2016-07-06 | 1 | -10/+41
  Reviewed-by: Marek Olšák <[email protected]>
* radeon/llvm: Remove uses_temp_indirect_addressing() function | Tom Stellard | 2016-07-06 | 1 | -23/+1
  bld->indirect_files is never set, so this function always returns false.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add depth/stencil_adjusted output to surface computation | Nicolai Hähnle | 2016-07-06 | 2 | -2/+10
  This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
  though it's basically a function of the memory configuration so could
  affect other parts as well.

  Fixes piglit "unaligned-blit * stencil downsample" and various
  "fbo-depth-array *stencil*" tests.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: allocate only the required plane for flushed depth | Nicolai Hähnle | 2016-07-06 | 1 | -3/+34
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: replace is_flushing_texture with db_compatible | Nicolai Hähnle | 2016-07-06 | 2 | -2/+3
  This is a left-over of when I considered generalizing the separate stencil
  support. I do prefer the new name since it emphasizes what flushing vs.
  non-flushing means from a functional point-of-view, namely special handling
  of the texture format.
  v2: adjust r600_init_color_surface as well
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add can_sample_z/s flags for textures | Nicolai Hähnle | 2016-07-06 | 2 | -4/+24
  v2: adjust r600_init_color_surface as well
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon/winsyses: remove unused stencil_offset | Nicolai Hähnle | 2016-07-06 | 1 | -1/+0
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: remove redundant null-pointer check | Nicolai Hähnle | 2016-07-06 | 1 | -2/+1
  v2: keep using r600_texture_reference
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: print StencilLayout only once | Nicolai Hähnle | 2016-07-06 | 1 | -2/+2
  It is the same for all levels.
  Reviewed-by: Marek Olšák <[email protected]>