summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: make more use of si_have_tgsi_computeNicolai Hähnle2016-10-101-3/+1
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: support ARB_compute_variable_group_sizeNicolai Hähnle2016-10-103-16/+42
| | | | | | | | Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix texture border colors for compute shadersMarek Olšák2016-10-051-0/+12
| | | | | | | | There are VM faults without this. Cc: 12.0 <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix interpolateAt opcodes for .zw componentsMarek Olšák2016-10-051-1/+1
| | | | | | | | | | Not returning garbage in .zw seems pretty important. This fixes: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.* Cc: 11.2 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add assertions to validate interpolation flagsMarek Olšák2016-10-051-0/+34
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: interpolate colors after interpolation weight shufflingMarek Olšák2016-10-051-48/+48
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: implement set_device_reset_callbackNicolai Hähnle2016-10-051-0/+3
| | | | | | | | | Check for device reset on flush. It would be nicer if the kernel just reported this as an error on the submit ioctl (and similarly for fences), but this will do for now. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: optionally run the LLVM IR verifier passNicolai Hähnle2016-10-041-7/+21
| | | | | | | | This is enabled automatically if shader printing is enabled, or separately by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later pass. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't declare LDS in PS when ds_bpermute is usedMarek Olšák2016-10-043-4/+7
| | | | | | | | I guess this is not needed because dead code elimination removes the declaration. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interpMarek Olšák2016-10-041-49/+7
| | | | | | | We can finally do this, because the opcodes are scalar now. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: simplify si_llvm_emit_ddxyMarek Olšák2016-10-041-51/+29
| | | | | | | | si_llvm_emit_ddxy is called once per element, so we don't have to generate code for 4 elements at once. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VIMarek Olšák2016-10-041-5/+9
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: use a helper function for BuildGEP(0, x)Marek Olšák2016-10-041-47/+35
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: remove obsolete shader definitionsMarek Olšák2016-10-041-12/+4
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: remove unnecessary #includesMarek Olšák2016-10-0411-23/+0
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: clean up lucky #include dependenciesMarek Olšák2016-10-042-36/+35
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: don't re-create shader PM4 states after scratch buffer updateMarek Olšák2016-10-043-15/+25
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: don't set sampler buffer offsets in create_sampler_viewMarek Olšák2016-10-043-24/+22
| | | | | | | | | do it at bind time, so that pipe_sampler_view is immutable with regard to buffer reallocations and we don't have to remember all existing buffer views. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: optimize si_invalidate_buffer based on bind_historyMarek Olšák2016-10-041-87/+100
| | | | | | | | | Just enclose each section with: if (rbuffer->bind_history & PIPE_BIND_...) Bioshock Infinite: +1% performance Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: track buffer bind historyMarek Olšák2016-10-042-5/+21
| | | | | | | similar to gl_buffer_object::UsageHistory Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: drop support for NULL sampler viewsMarek Olšák2016-10-042-12/+4
| | | | | | | not used anymore. It was used when the polygon stipple texture was constant. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emissionMarek Olšák2016-10-041-7/+10
| | | | | | | We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: move VGT_LS_HS_CONFIG to derived tess_stateMarek Olšák2016-10-043-28/+14
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: don't check PIPE_BARRIER_MAPPED_BUFFERMarek Olšák2016-10-041-4/+3
| | | | | | | Caches are always flushed at IB boundary. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: parse SURFACE_SYNC correctly on CIK-VIMarek Olšák2016-10-041-9/+16
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: Fix primitive restart when index changesJames Legg2016-10-041-7/+7
| | | | | | | | | | | If primitive restart is enabled for two consecutive draws which use different primitive restart indices, then the first draw's primitive restart index was incorrectly used for the second draw. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025 Cc: 11.1 11.2 12.0 <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallium/radeon: emit relocations for query fencesNicolai Hähnle2016-09-301-1/+1
| | | | | | | | | This is only needed for r600 which doesn't have ARB_query_buffer_object and therefore wouldn't really need the fences, but let's be optimistic about filling in this feature gap eventually. Cc: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_query_buffer_object (v2)Nicolai Hähnle2016-09-291-7/+14
| | | | | | | v2: enable only when compute is available Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add save_qbo_stateNicolai Hähnle2016-09-291-0/+12
| | | | | | | | Save compute shader state that will be used for the ARB_query_buffer_object implementation. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers (v2)Nicolai Hähnle2016-09-292-0/+51
| | | | | | | | | | These functions extract the pipe state structure from the current descriptors, for state saving. v2: correctly dereference *buf (Bas) Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add r600_gfx_{write,wait}_fenceNicolai Hähnle2016-09-291-38/+3
| | | | | | | For bottom-of-pipe fences inside the gfx command stream. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add barrier_flags to r600_common_screenNicolai Hähnle2016-09-291-0/+5
| | | | | | | | | | | There are driver-specific context flags for barriers that are not covered by the Gallium barrier interfaces. The R600 settings of these flags may not be optimal, but we're not going to use them yet anyway. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v3Tom Stellard2016-09-161-17/+222
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch switches non-TGSI compute shaders over to using the HSA ABI described here: https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md The HSA ABI provides a much cleaner interface for compute shaders and allows us to share more code in the compiler with the HSA stack. The main changes in this patch are: - We now pass the scratch buffer resource into the shader via user sgprs rather than using relocations. - Grid/Block sizes are now passed to the shader via the dispatch packet rather than at the beginning of the kernel arguments. Typically for HSA, the CP firmware will create the dispatch packet and set up the user sgprs automatically. However, in Mesa we let the driver do this work. The main reason for this is that I haven't researched how to get the CP to do all these things, and I'm not sure if it is supported for all GPUs. v2: - Add comments explaining why we are setting certain bits of the scratch resource descriptor. v3: - Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/compute: Add some more debug printfsTom Stellard2016-09-161-0/+3
|
* radeonsi: reload PS inputs with direct indexing at each use (v2)Marek Olšák2016-09-141-16/+11
| | | | | | | | | | | | | | | | | | | | | The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: get rid of constant buffer preloadingMarek Olšák2016-09-141-24/+14
| | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1152636 -> 1146340 (-0.55 %) VGPRS: 728198 -> 727371 (-0.11 %) Spilled SGPRs: 3776 -> 2218 (-41.26 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35835152 -> 35841268 (0.02 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222372 -> 222559 (0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: get rid of img/buf/sampler descriptor preloading (v2)Marek Olšák2016-09-141-132/+47
| | | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222221 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) v2: merge codepaths where possible Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename get_sampler_desc -> load_sampler_descMarek Olšák2016-09-141-11/+11
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: cosmetic changes in si_shader.cMarek Olšák2016-09-141-3/+5
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: load streamout buffer descriptors before use (v2)Marek Olšák2016-09-141-33/+14
| | | | | | v2: inline the code and remove the conditional that's a no-op now Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix FP64 UBO loads with indirect uniform block indexingMarek Olšák2016-09-131-2/+1
| | | | | | | No known tests. Cc: [email protected] Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up CP DMA emit codeMarek Olšák2016-09-131-84/+60
| | | | | | | | Unify the clear and copy paths, clean up the definitions. It looks more like a rework. It's a preparation for GDS support, which might or might not come. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print the IB and buffer list in VM fault reportsMarek Olšák2016-09-131-1/+2
| | | | | | This is a fallout from reworking the debug flags. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add sampler view BOs to the BO list lastMarek Olšák2016-09-131-7/+10
| | | | | | | | | If si_sampler_view_add_buffer ends up flushing, then the code in begin_new_cs would previously have added the buffer(s) for whatever was previously bound to that slot. Now it would add only the new buffer. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-133-16/+56
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <[email protected]>
* android: add support for libmesa_amdgpu_addrlibMauro Rossi2016-09-131-1/+3
| | | | | | | | | | | Android porting of the following commits: f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place." 69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code" This patch fixes android building errors Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: don't preload constants at the beginning of shadersMarek Olšák2016-09-121-20/+11
| | | | | | | | | | | | | | | | | | | | | | LLVM can CSE the loads, thus we can always re-load constants before each use. The decrease in SGPR spilling is huge. The best improvements are the dumbest ones. 26011 shaders in 14651 tests Totals: SGPRS: 1453346 -> 1251920 (-13.86 %) VGPRS: 742576 -> 728421 (-1.91 %) Spilled SGPRs: 52298 -> 16644 (-68.17 %) Spilled VGPRs: 397 -> 369 (-7.05 %) Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread Code Size: 36136488 -> 36001064 (-0.37 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 219315 -> 222221 (1.33 %) Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: flush TC L2 before using a compute indirect bufferMarek Olšák2016-09-091-2/+10
| | | | | | | There is no known test for this. Cc: 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix the VGT performance tweak for small instancesMarek Olšák2016-09-091-5/+6
| | | | | | | | Based on the VGT spec. The Vulkan driver doesn't do it optimally and they plan to fix it. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove the cache_flush atomMarek Olšák2016-09-097-12/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>