summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: pre-generate shader logs for ddebugMarek Olšák2016-07-264-6/+34
| | | | | | | | This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add empty lines after shader statsMarek Olšák2016-07-261-1/+1
| | | | | | to separate individual shaders dumped consecutively. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move the shader key dumping to si_shader_dumpMarek Olšák2016-07-263-5/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: rework flags for pipe_context::dump_debug_stateMarek Olšák2016-07-261-13/+19
| | | | | | | | The pipelined hang detection mode will not want to dump everything. (and it's also time consuming) It will only dump shaders after a draw call and then dump the status registers separately if a hang is detected. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: ensure sample locations are set for line and polygon smoothingNicolai Hähnle2016-07-231-2/+1
| | | | | | | Since commit d938b8c, the sample locations are no longer set unconditionally, so we need to set the atom to dirty on all chips, not just Polaris. Cc: 12.0 <[email protected]>
* radeonsi: fix Polaris MSAA regressionNicolai Hähnle2016-07-232-15/+20
| | | | | | | | | | | | | The regression was introduced by commit d938b8c. The problem here is that in order to use the small primitive filter, we need to explicitly set the sample locations to 0. But the DB doesn't properly process the change of sample locations without a flush, and so we can end up with incorrect Z values. Instead of doing a flush, just disable the small primitive filter when MSAA is force-disabled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908 Cc: 12.0 <[email protected]>
* radeonsi: implement buffer_subdata without indirect callsMarek Olšák2016-07-231-1/+1
| | | | | | There is less noise in CPU profile data now. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: advertise 8 bits subpixel precision for viewport boundsJózef Kucia2016-07-201-1/+2
| | | | | Signed-off-by: Józef Kucia <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)Józef Kucia2016-07-201-0/+1
| | | | | | | | | | | | This allows Gallium drivers to advertise the subpixel precision for floating point viewports bounds. v2: - Set ViewportSubpixelBits in st_init_limits. Signed-off-by: Józef Kucia <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: emit PS exports lastMarek Olšák2016-07-191-13/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This effectively removes s_waitcnt instructions after FP16 exports. Before: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04 v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06 exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308 v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100 s_endpgm ; BF810000 After: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04 v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308 v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302 v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504 exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706 s_endpgm ; BF810000 Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITSMarek Olšák2016-07-191-2/+6
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: really wait for the second EOP event and not the first oneMarek Olšák2016-07-191-1/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flagMarek Olšák2016-07-191-3/+0
| | | | | | always set Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_queue: add optional cleanup callbackRob Clark2016-07-161-1/+2
| | | | | | | | | | | | Adds a second optional cleanup callback, called after the fence is signaled. This is needed if, for example, the queue has the last reference to the object that embeds the util_queue_fence. In this case we cannot drop the ref in the main callback, since that would result in the fence being destroyed before it is signaled. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove the DRAW_PREAMBLE packetNicolai Hähnle2016-07-163-12/+1
| | | | | | | According to firmware guys, the new sequence that we added for Polaris should work on all CIK parts, and should actually be faster on some parts. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: report accurate SGPR and VGPR spillsMarek Olšák2016-07-132-5/+15
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a workaround for a compute VGPR-usage LLVM bugMarek Olšák2016-07-131-0/+35
| | | | | | | v2: use abort(), describe which LLVM version is affected Cc: 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use LLVMGetTypeKind to tell if an input is an array of descriptorsMarek Olšák2016-07-131-19/+11
| | | | | | just a cleanup Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: replace !tbaa with !invariant.loadMarek Olšák2016-07-131-12/+5
| | | | | | no change in generated code thanks to dereferenceable(n) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set dereferenceable attribute on descriptor arraysMarek Olšák2016-07-131-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows moving the loads arbitrarily in the Sinking pass. 26002 shaders in 14643 tests Totals: SGPRS: 2080160 -> 2080160 (0.00 %) VGPRS: 798875 -> 797826 (-0.13 %) Spilled SGPRs: 108485 -> 79165 (-27.03 %) Spilled VGPRs: 327 -> 327 (0.00 %) Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread Code Size: 36127192 -> 35559780 (-1.57 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 212464 -> 212672 (0.10 %) Wait states: 0 -> 0 (0.00 %) PERCENTAGES / App Shaders SGPRs VGPRs SpillSGPR SpillVGPR Scratch CodeSize MaxWaves Waits (unknown) 4 . . . . . . . . 0ad 6 . . . . . . . . alien_isolation 2938 . 0.04 % -8.53 % . . -0.71 % -0.06 % . anholt 10 . . . . . . . . batman_arkham_origins 589 . -0.58 % -79.54 % . . -6.72 % 0.57 % . bioshock-infinite 1769 . -0.65 % -89.32 % . . -4.73 % 0.48 % . borderlands2 3968 . -0.31 % -51.21 % . . -4.09 % 0.22 % . brutal-legend 338 . -0.03 % -2.95 % . . -0.06 % . . civilization_beyond.. 116 . . -14.17 % . . -0.88 % . . counter_strike_glob.. 1142 . . . . . . . . dirt-showdown 541 . -0.56 % -40.14 % . -3.45 % -1.82 % 0.35 % . dolphin 22 . . . . . 0.16 % . . dota2 1747 . . . . . 0.01 % . . europa_universalis_4 76 . -0.23 % -42.11 % . . -0.96 % . . f1-2015 774 . -0.09 % -28.89 % . . -2.60 % 0.09 % . furmark-0.7.0 4 . . . . . . . . gimark-0.7.0 10 . . . . . . . . glamor 16 . . . . . . . . humus-celshading 4 . . . . . . . . humus-domino 6 . . . . . . . . humus-dynamicbranching 24 . 0.71 % . . . 0.29 % -0.45 % . humus-hdr 10 . . . . . . . . humus-portals 2 . . . . . . . . humus-volumetricfog.. 6 . . . . . . . . left_4_dead_2 1762 . . . . . . . . metro_2033_redux 2670 . -0.10 % -7.15 % . . -0.03 % . . nexuiz 80 . . . . . . . . pixmark-julia-fp32 2 . . . . . . . . pixmark-julia-fp64 2 . . . . . . . . pixmark-piano-0.7.0 2 . . . . . . . . pixmark-volplosion-.. 2 . . . . . . . . plot3d-0.7.0 8 . . . . . . . . portal 474 . . . . . . . . sauerbraten 7 . . . . . . . . serious_sam_3_bfe 392 . . -13.20 % . . -1.81 % . . supertuxkart 4 . . . . . . . . talos_principle 324 . -0.21 % -18.39 % . . -2.73 % 0.14 % . team_fortress_2 808 . . . . . . . . tesseract 430 . 0.08 % -68.57 % . . -0.45 % . . tessmark-0.7.0 6 . . . . . . . . thea 172 . . . . . 0.03 % . . ue4_effects_cave 299 . -0.04 % -10.15 % . . -0.25 % 0.04 % . ue4_elemental 586 . -0.02 % -13.93 % . . -0.13 % 0.02 % . ue4_lightroom_inter.. 74 . -0.17 % -70.00 % . . -1.27 % . . ue4_realistic_rende.. 92 . . -32.58 % . . -0.35 % . . unigine_heaven 322 . 0.12 % -54.17 % . . -1.42 % -0.12 % . unigine_sanctuary 264 . . . . . . . . unigine_tropics 210 . . . . . . . . unigine_valley 278 . -0.15 % -40.74 % . . -2.00 % 0.09 % . unity 72 . . . . . 0.03 % . . warsow 176 . . . . . . . . warzone2100 4 . . . . . 0.13 % . . witcher2 1040 . -0.03 % -86.28 % . . -0.28 % 0.01 % . xcom_enemy_within 1236 . -0.24 % -63.54 % . . -0.93 % 0.18 % . yofrankie 82 . -0.61 % -100.00 % . . -0.83 % 0.41 % . ----------------------------------------------------------------------------------------------------------- Total 26002 . -0.13 % -27.03 % . -0.24 % -1.57 % 0.10 % . Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up shader value metadata codeMarek Olšák2016-07-131-15/+19
| | | | | | | No change in behavior. BTW, tbaa_md_kind == 1, which was the magic number in the code. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove LLVMNoUnwindAttribute usesMarek Olšák2016-07-131-36/+31
| | | | | | always set by gallivm Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix a typo in SI_PARAM_LINEAR_* handlingMarek Olšák2016-07-131-1/+1
| | | | | | introduced in 476e9cee1d0cbe321c401277214e6c36ce5b18c9 Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: just save buffer sizes instead of buffers while recording IBsMarek Olšák2016-07-131-2/+2
| | | | | | whole buffer objects are not needed Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: silence Coverity warningNicolai Hähnle2016-07-132-0/+4
| | | | | | | | Coverity's analysis is too weak to understand that r600_init_flushed_depth(_, _, NULL) only returns true when flushed_depth_texture was assigned a non-NULL value. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix bad assertion in si_emit_sample_maskNicolai Hähnle2016-07-091-1/+2
| | | | | | | The blitter sets mask == 1, which is fine since it doesn't use smoothing. Fixes a regression introduced in commit 5bcfbf91. Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: disable multi-threading when shader dumps are enabledNicolai Hähnle2016-07-081-0/+1
| | | | | | | Otherwise, shader dumps can become interleaved and unusable. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use multi-threaded compilation in debug contextsNicolai Hähnle2016-07-081-4/+4
| | | | | | | | We only have to stay single-threaded when debug output must be synchronous. This yields better parallelism in shader-db runs for me. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: catch a potential state tracker error with non-MSAA FBsNicolai Hähnle2016-07-081-0/+6
| | | | | | At least st/mesa ensures this, so I'd rather not handle deviations in radeonsi. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: explicitly choose center locations for 1xAA on PolarisNicolai Hähnle2016-07-084-18/+41
| | | | | | | | | | | | | Unlike SC, the small primitive filter does not automatically use center locations in 1xAA mode, so this is needed to avoid artifacts caused by the small primitive filter discarding triangles that it shouldn't. As a side effect of how the effective number of samples is now calculated, this patch also avoids submitting the sample locations for line/poly smoothing when they're not really needed. Cc: 12.0 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: decompress to flushed depth texture when requiredNicolai Hähnle2016-07-061-29/+103
| | | | | | v2: s/dirty_level_mask/stencil_dirty_level_mask/ in stencil case Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract DB->CB copy logic into its own functionNicolai Hähnle2016-07-061-36/+61
| | | | | | Also clean up some of the looping. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: sample from flushed depth texture when requiredNicolai Hähnle2016-07-062-8/+46
| | | | | | | Note that this has no effect yet. A case where can_sample_z/s can be false in radeonsi will be added in a later patch. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: replace is_flushing_texture with db_compatibleNicolai Hähnle2016-07-063-6/+8
| | | | | | | | | | | This is a left-over of when I considered generalizing the separate stencil support. I do prefer the new name since it emphasizes what flushing vs. non-flushing means from a functional point-of-view, namely special handling of the texture format. v2: adjust r600_init_color_surface as well Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: correctly mark levels of 3D textures as fully decompressedNicolai Hähnle2016-07-061-2/+2
| | | | | | Account for the fact that max_layer is minified for higher levels. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do compilation from si_create_shader_selector asynchronouslyMarek Olšák2016-07-054-7/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much as time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't lock shader cache mutex during compilationMarek Olšák2016-07-051-6/+16
| | | | | | | | | | to allow multiple shaders to be compiled simultaneously. ALso, shader-db can again use all 4 cores. v2: Remove the pipe_mutex_unlock call in the error path. Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: separate the compilation chunk of si_create_shader_selectorMarek Olšák2016-07-053-80/+110
| | | | | | | The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move LLVMTargetMachineRef creation to a separate functionMarek Olšák2016-07-051-14/+18
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add and use radeon_info::max_alloc_size (v2)Marek Olšák2016-07-051-2/+2
| | | | | | | | | | v2: - squashed the patches - use INT_MAX - clamp max_const_buffer_size - check the DRM version in radeon Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Vedran Miletić <[email protected]>
* radeonsi: print LLVM IRs to ddebug logsMarek Olšák2016-07-054-1/+24
| | | | | | | Getting LLVM IRs of hanging shaders have never been easier. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable string markers and record apitrace call numbersMarek Olšák2016-07-053-1/+24
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove unused code - radeon_llvm_util.*Marek Olšák2016-07-051-1/+0
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove an obsolete commentMarek Olšák2016-07-051-5/+0
| | | | | | It's not true. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't interpolate colors if flatshading is enabledMarek Olšák2016-07-053-2/+14
| | | | | | use v_interp_mov for those Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable the barycentric optimization in all casesMarek Olšák2016-07-053-18/+125
| | | | | | | | Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: compute only one set of interpolation (i,j) when MSAA is disabledMarek Olšák2016-07-053-3/+88
| | | | | | | This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split ps.prolog.force_persample_interp into persp and linear bitsMarek Olšák2016-07-053-45/+64
| | | | | | This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't dump the shader key for non-monolithic shaders earlyMarek Olšák2016-07-051-1/+2
| | | | | | It's always zero. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add and use r600_texture_referenceMarek Olšák2016-06-291-1/+1
| | | | | | Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Vedran Miletić <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>