Commit message | Author | Age | Files | Lines
* util: fix u_fifo_pop() | Rob Clark | 2020-03-30 | 1 | -1/+1
  Seems like no one ever depended on it to actually return false when the fifo is empty.
  Fixes: 6e61d062093 ("util: Add super simple fifo")
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>
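The contract this fix restores can be sketched with a small ring buffer. This is a minimal sketch assuming a hypothetical fixed-capacity layout (`struct sketch_fifo` and its helpers are illustrative names, not the real `util/u_fifo.h` structure): pop reports failure, and leaves the output untouched, when nothing is queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity ring buffer; not Mesa's struct util_fifo. */
#define SKETCH_FIFO_CAP 4

struct sketch_fifo {
   int data[SKETCH_FIFO_CAP];
   size_t head;  /* index of the next element to pop */
   size_t count; /* number of queued elements */
};

/* Returns false when the fifo is full. */
static bool sketch_fifo_push(struct sketch_fifo *f, int v)
{
   if (f->count == SKETCH_FIFO_CAP)
      return false;
   f->data[(f->head + f->count) % SKETCH_FIFO_CAP] = v;
   f->count++;
   return true;
}

/* Returns false (and leaves *out untouched) when the fifo is empty. */
static bool sketch_fifo_pop(struct sketch_fifo *f, int *out)
{
   if (f->count == 0)
      return false;
   *out = f->data[f->head];
   f->head = (f->head + 1) % SKETCH_FIFO_CAP;
   f->count--;
   return true;
}
```

A caller can then drain the queue with `while (sketch_fifo_pop(&f, &v)) { ... }`, which only terminates correctly if the empty case really returns false.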
* freedreno: remove some obsolete debug options | Rob Clark | 2020-03-30 | 2 | -11/+4
  'fraghalf' is unused (superseded by actually lowering output based on the precision information in nir). And glsl140 support in ir3 is long past the experimental stage, so the glsl120 option is no longer needed. So remove them and free up some bits for new things.
  Signed-off-by: Rob Clark <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>
* nir/opt_loop_unroll: Fix has_nested_loop handling | Jason Ekstrand | 2020-03-30 | 1 | -1/+1
  In 87839680c0a48, a very subtle mistake was made with the CFG walking recursion. Instead of setting the local has_nested_loop variable when processing child loops, has_nested_loop_out was passed directly into the process_loop_in_block call. This broke the nested loop detection heuristics and caused loop unrolling to run massively out of control. In particular, it makes the following CTS test compile virtually forever:
    dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
  Fixes: 87839680c0 "nir: Fix breakage of foreach_list_typed_safe..."
  Closes: #2710
  Reviewed-by: Danylo Piliaiev <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
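The difference between the buggy and fixed recursion can be sketched on a toy loop tree. This assumes a hypothetical `struct sketch_loop` (not NIR's `nir_loop`/`nir_cf_node` types); the point is only that each loop must collect its children's reports in a local flag before telling its own caller that it contains a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loop tree; not NIR's control-flow node types. */
struct sketch_loop {
   struct sketch_loop *children[4];
   size_t num_children;
   bool has_nested_loop; /* heuristic input for unrolling decisions */
};

/* Processes `loop` and its children, then reports to the caller that its
 * scope contains at least one loop. */
static void sketch_process_loop(struct sketch_loop *loop, bool *contains_loop_out)
{
   /* Fixed pattern: children report into a flag local to this loop. */
   bool has_nested_loop = false;
   for (size_t i = 0; i < loop->num_children; i++)
      sketch_process_loop(loop->children[i], &has_nested_loop);

   /* The buggy variant passed contains_loop_out straight to the children,
    * so this loop's own flag stayed false and it looked like an innermost
    * loop to the unrolling heuristics. */
   loop->has_nested_loop = has_nested_loop;
   *contains_loop_out = true;
}
```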
* freedreno: Work around UBWC flakiness. | Eric Anholt | 2020-03-30 | 2 | -29/+13
  In trying to track down the new failure in #2670, I found that I could get the flaky test set down to 4 tests, and dropping any remaining test wouldn't trigger the failure (a bad 8x4 block in the middle of dEQP-GLES3.functional.fbo.msaa.4_samples.r16f's render target). Disabling gmem or bypass didn't help, and adding lots of CCU flushing didn't help. What did help was disabling blitting, or this memset to initialize the UBWC area after we (presumably) pull a BO out of the BO cache. My guess is that the 2D blitter can't handle some rare set of state in the flags buffer and emits some garbage.
  I've run 8 gles3 and 7 gles31 runs with this branch now, so hopefully I've got the right set of flakes marked for removal.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2670
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>
* freedreno: Fix detection of being in a blit for acc queries. | Eric Anholt | 2020-03-30 | 10 | -37/+30
  The batch might not have stage == FD_STAGE_BLIT set because fd_blitter_pipe_begin was sticking the stage on some random batch (or none at all) rather than the one that would be used in the meta operation. What we actually wanted to be looking at was set_active_query_state(), which is already called by util_blitter and whose state we just needed to track.
  Fixes piglit occlusion_query_meta_no_fragments.
  I haven't changed query_hw.c's stage handling to clean the rest up because I don't have a db410c/db820c at home to iterate over the piglit tests.
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* freedreno: Rename "is_blit" to "is_discard_blit" | Eric Anholt | 2020-03-30 | 5 | -8/+8
  It's about the special case of an overwrite of a level meaning we can discard old batch contents.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* freedreno/a6xx: Fix timestamp queries. | Eric Anholt | 2020-03-30 | 1 | -4/+10
  We were returning the same kind of result as time_elapsed (an end - start time in ns), which on a timestamp query is approximately zero since begin/end are at the same point in time. What we're supposed to return is a converted-to-ns timestamp based on the GPU clock.
  Remove the _pause() function for time_elapsed to reduce the command stream overhead, and just capture start (which is, unfortunately, going to happen on each tile, and thus the final start value we read will be from the last tile of the frame, not the first).
  Fixes piglit spec/arb_timer_query/query gl_timestamp
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
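The "converted-to-ns timestamp" can be sketched as a tick-to-nanosecond conversion. The 19.2 MHz counter rate below is an assumption for illustration (this log does not state it), and `sketch_ticks_to_ns` is a hypothetical helper; splitting into whole seconds and a remainder keeps the intermediate product from overflowing for large tick counts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the GPU timestamp counter ticks at 19.2 MHz. */
#define SKETCH_GPU_TICK_HZ 19200000ull

/* Convert raw counter ticks to nanoseconds. Whole seconds and the
 * sub-second remainder are scaled separately to avoid overflowing
 * ticks * 1e9 in 64 bits. */
static uint64_t sketch_ticks_to_ns(uint64_t ticks, uint64_t tick_hz)
{
   uint64_t whole_seconds = ticks / tick_hz;
   uint64_t remainder = ticks % tick_hz;
   return whole_seconds * 1000000000ull + remainder * 1000000000ull / tick_hz;
}
```

A timestamp query would return `sketch_ticks_to_ns(sample, SKETCH_GPU_TICK_HZ)` for a single sample, whereas a time_elapsed query returns the difference between two converted samples.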
* freedreno: Count blits in GL_TIME_ELAPSED and perf counter queries. | Eric Anholt | 2020-03-30 | 5 | -6/+8
  Fixes 0 GPU time reported for glBlitFramebuffer in apitrace replay --pgpu.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* freedreno: Associate the acc query bo with the batch. | Eric Anholt | 2020-03-30 | 1 | -0/+2
  Otherwise, a result query with wait won't trigger flushing the batch, and we can end up with zeroed results.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* freedreno: Fix acc query handling in the presence of batch reordering. | Eric Anholt | 2020-03-30 | 4 | -25/+62
  When we switch batches and start a new draw, we need to cap the queries in the previous batch and start queries again in the new one. FD_STAGE_NULL got renamed to 0 so that it would naturally return !is_active and end the queries at the end of the batch.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
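Capping a query in one batch and restarting it in the next works because an accumulating query is a running sum of per-segment deltas. A minimal sketch, assuming a hypothetical `struct sketch_acc_query` (not freedreno's `fd_acc_query`) and per-batch counter values supplied by the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accumulating query: a running total plus the start value
 * of the currently open segment. */
struct sketch_acc_query {
   uint64_t total;
   uint64_t segment_start;
   bool segment_open;
};

static void sketch_resume(struct sketch_acc_query *q, uint64_t counter_now)
{
   assert(!q->segment_open);
   q->segment_start = counter_now;
   q->segment_open = true;
}

static void sketch_pause(struct sketch_acc_query *q, uint64_t counter_now)
{
   assert(q->segment_open);
   q->total += counter_now - q->segment_start;
   q->segment_open = false;
}

/* On a batch switch: cap the open segment in the old batch and start a
 * new one in the new batch, so work in each batch is counted once. */
static void sketch_switch_batch(struct sketch_acc_query *q,
                                uint64_t old_batch_counter,
                                uint64_t new_batch_counter)
{
   if (q->segment_open) {
      sketch_pause(q, old_batch_counter);
      sketch_resume(q, new_batch_counter);
   }
}
```

For example, a segment from 100 to 130 in the old batch plus 500 to 520 in the new batch accumulates to 50.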
* freedreno: Remove the "active" member of queries. | Eric Anholt | 2020-03-30 | 4 | -21/+9
  The state tracker only gets to begin/query/destroy when !active and end when active, so we have no need to try to track this ourselves.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* freedreno: Remove always-true return from per-gen begin_query. | Eric Anholt | 2020-03-30 | 5 | -13/+7
  You should do failure-prone allocation in create_query, not begin, anyway.
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
* util/u_queue: fix race in total_jobs_size access | Rhys Perry | 2020-03-30 | 1 | -2/+2
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  CC: <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>
* glsl: fix race in instance getters | Rhys Perry | 2020-03-30 | 1 | -5/+15
  Insertions can modify entry->data. Seems to fix random Fossilize crashes.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  CC: <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>
* nir: Set UBO alignments in lower_uniforms_to_ubo | Jason Ekstrand | 2020-03-30 | 1 | -0/+2
  Fixes: fb64954d9dd "nir: Validate that memory load/store ops work on..."
  Reviewed-by: Kenneth Graunke <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>
* aco: look at p_{extract,split}_vector's definitions in pred_by_exec_mask() | Rhys Perry | 2020-03-30 | 1 | -2/+5
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333>
* CI: Re-enable Windows VS2019 builds | Daniel Stone | 2020-03-30 | 1 | -1/+2
  The failures are fixed, but I didn't notice this had been silently disabled in !4272. Re-enable the VS2019 build.
  Signed-off-by: Daniel Stone <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4374>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4374>
* nir: Validate that memory load/store ops work on whole bytes | Jason Ekstrand | 2020-03-30 | 1 | -0/+27
  Reviewed-by: Kenneth Graunke <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* anv: Set alignments on descriptor and constant loads | Jason Ekstrand | 2020-03-30 | 1 | -0/+3
  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* nir: Insert b2b1s around booleans in nir_lower_io | Jason Ekstrand | 2020-03-30 | 1 | -0/+15
  By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics generated by nir_lower_io, we can ensure that the intrinsic has the correct destination bit size. Not having the right size can mess up passes which try to optimize access. In particular, it was causing brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans, which meant that boolean uniforms weren't getting pushed as push constants. I don't think this is an actual functional bug anywhere, hence no CC to stable, but it may improve perf somewhere.

  Shader-db results on ICL with iris:

    total instructions in shared programs: 16076707 -> 16075246 (<.01%)
    instructions in affected programs: 129034 -> 127573 (-1.13%)
    helped: 487
    HURT: 0
    helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
    95% mean confidence interval for instructions value: -3.00 -3.00
    95% mean confidence interval for instructions %-change: -1.37% -1.29%
    Instructions are helped.

    total cycles in shared programs: 338015639 -> 337983311 (<.01%)
    cycles in affected programs: 971986 -> 939658 (-3.33%)
    helped: 362
    HURT: 110
    helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
    helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
    HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18
    HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
    95% mean confidence interval for cycles value: -79.97 -57.01
    95% mean confidence interval for cycles %-change: -4.60% -3.47%
    Cycles are helped.

    total sends in shared programs: 815037 -> 814550 (-0.06%)
    sends in affected programs: 5701 -> 5214 (-8.54%)
    helped: 487
    HURT: 0
    LOST: 2
    GAINED: 0

  The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was also one of the most helped titles, with sends shaved off of 134 of its programs. This seems to reduce GPU core clocks by about 4% on the first 1000 frames of the PTS benchmark.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* nir: Use b2b opcodes for shared and constant memory | Jason Ekstrand | 2020-03-30 | 3 | -17/+24
  No shader-db changes on ICL with iris.
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* aco: Implement b2b32 and b2b1 | Jason Ekstrand | 2020-03-30 | 2 | -0/+4
  The implementations here just clone i2b32 and i2b1. This means that b2b32 doesn't technically generate true NIR 0/-1 booleans, but it should be fine, as it's only ever generated for shared variable writes, which will always be consumed by something that then runs it through an i2b again.
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* nir: Add b2b opcodes | Jason Ekstrand | 2020-03-30 | 5 | -2/+22
  These exist to convert between different types of boolean values. In particular, we want to use these for uniform and shared memory operations where we need to convert to a reasonably sized boolean but don't care what its format is, so we don't want to make the back-end insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the format of the uniform boolean to whatever the driver wants. In the case of shared memory, every value in a shared variable comes from the shader, so it's already in the right boolean format.
  The new boolean conversion opcodes get replaced with mov in lower_bool_to_int/float32, so the back-end will hopefully never see them. However, while we're in the middle of optimizing our NIR, they let us have sensible load_uniform/ubo intrinsics and also have the bit size conversion.
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
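The value semantics of the two conversions can be sketched with plain C types standing in for NIR values. This is an illustration only, assuming the "0/-1" convention for 32-bit booleans mentioned in the aco commit (false is 0, true has all bits set); `sketch_b2b32`/`sketch_b2b1` are hypothetical names, not NIR constant-folding code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 1-bit NIR boolean is modelled as `bool`; a 32-bit NIR boolean as a
 * uint32_t holding 0 (false) or all bits set, i.e. -1 (true). */
static uint32_t sketch_b2b32(bool b)
{
   return b ? UINT32_MAX : 0u; /* true -> 0xffffffff (-1), false -> 0 */
}

static bool sketch_b2b1(uint32_t b32)
{
   /* This sketch treats any non-zero pattern as true. */
   return b32 != 0;
}
```

Once lower_bool_to_int32 has turned every boolean into a 32-bit 0/-1 value, both directions carry the same bits, which is why the pass can replace them with a plain mov.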
* intel/nir: Run copy-prop and DCE after lower_bool_to_int32 | Jason Ekstrand | 2020-03-30 | 1 | -0/+2
  No shader-db impact on ICL with iris.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
* etnaviv: compiled_framebuffer_state: get rid of SE_SCISSOR_* | Christian Gmeiner | 2020-03-30 | 2 | -18/+5
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
* etnaviv: s/scissor_s/scissor | Christian Gmeiner | 2020-03-30 | 3 | -7/+7
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
* etnaviv: get rid of struct compiled_scissor_state | Christian Gmeiner | 2020-03-30 | 4 | -31/+15
  We can reuse pipe_scissor_state.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
* etnaviv: do the left shift by 16 at emit time | Christian Gmeiner | 2020-03-30 | 2 | -16/+16
  Also round up the max bounds.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
* etnaviv: rework clipping calculation to be a derived state | Christian Gmeiner | 2020-03-30 | 3 | -43/+48
  This moves the whole clipping calculation out of the emit function.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
* etnaviv: get rid of SE_CLIP_* | Christian Gmeiner | 2020-03-30 | 3 | -28/+14
  The only difference between e.g. SE_SCISSOR_RIGHT and SE_CLIP_RIGHT is the margin value used. With that information, we can remove SE_CLIP_* and apply the different margins at emit time.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
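The idea of keeping one set of bounds and deriving both register values at emit time can be sketched as follows. The margin constants are hypothetical placeholders (the real etnaviv values are not given in this log), and `sketch_emit_right` is an illustrative helper: the stored bound stays a plain pixel coordinate, and only the emitted value gets the 16.16 fixed-point shift plus a per-register margin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical margins; not the real etnaviv constants. */
#define SKETCH_SCISSOR_MARGIN_RIGHT 0x1119u
#define SKETCH_CLIP_MARGIN_RIGHT    0xffffu

/* Derived state keeps an integer pixel bound; the left shift by 16 and
 * the margin are applied only when the register value is emitted. */
static uint32_t sketch_emit_right(uint16_t maxx, uint32_t margin)
{
   return ((uint32_t)maxx << 16) + margin;
}
```

Both `SE_SCISSOR_RIGHT` and `SE_CLIP_RIGHT` style values then come from the same `maxx`, differing only in the low 16 bits contributed by the margin.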
* gitlab-ci: Prune all SCons jobs except scons-win64, and allow failures. | Jose Fonseca | 2020-03-30 | 1 | -17/+2
  Based on the discussion in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4352
  Reviewed-by: Daniel Stone <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4363>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4363>
* nir/algebraic: add fexp2(fmul(flog2(a), 0.5)) -> fsqrt(a) optimization | Samuel Pitoiset | 2020-03-30 | 1 | -0/+1
  Helps some Wolfenstein II and Wolfenstein Youngblood shaders.
  pipeline-db (VEGA10/ACO):
    Totals from affected shaders:
    SGPRS: 17904 -> 17904 (0.00 %)
    VGPRS: 14492 -> 14492 (0.00 %)
    Spilled SGPRs: 20 -> 20 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Code Size: 1753152 -> 1749708 (-0.20 %) bytes
    Max Waves: 2581 -> 2581 (0.00 %)
  pipeline-db (VEGA10/LLVM):
    Totals from affected shaders:
    SGPRS: 26656 -> 26656 (0.00 %)
    VGPRS: 23780 -> 23780 (0.00 %)
    Spilled SGPRs: 2112 -> 2112 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Code Size: 2552712 -> 2549236 (-0.14 %) bytes
    Max Waves: 3359 -> 3359 (0.00 %)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>
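The rewrite is valid because halving a base-2 logarithm and exponentiating again is a square root. For positive a (the derivation below is the real-number identity; GPU rounding of the original exp2/log2 sequence may differ slightly from fsqrt):

```latex
\operatorname{fexp2}\!\left(\operatorname{flog2}(a)\cdot 0.5\right)
  = 2^{\frac{1}{2}\log_2 a}
  = \left(2^{\log_2 a}\right)^{1/2}
  = a^{1/2}
  = \sqrt{a}, \qquad a > 0
```

Under IEEE semantics the edge cases also agree: a = 0 gives log2 = -inf and exp2(-inf) = 0 = sqrt(0), while a < 0 yields NaN on both sides. The pattern replaces three operations (two transcendentals and a multiply) with one.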
* scons: Prune out unnecessary targets. | Jose Fonseca | 2020-03-30 | 29 | -1406/+1
  This prunes out all targets except libgl-gdi, libgl-xlib, and svga, as suggested by Marek Olšák.
  libgl-xlib will be removed once I have had time to confirm that no automated tests we have rely upon it.
  There are also a bunch of Makefile.sources which become orphaned as a result; they are not taken care of in this change.
  v2: Prune remainders of swr support.
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>
* aco: Don't store LS VS outputs to LDS when TCS doesn't need them. | Timur Kristóf | 2020-03-30 | 2 | -2/+14
  Totals:
    Code Size: 254764624 -> 254745104 (-0.01 %) bytes
  Totals from affected shaders:
    VGPRS: 12132 -> 12112 (-0.16 %)
    Code Size: 573364 -> 553844 (-3.40 %) bytes
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: When LS and HS invocations are the same, pass LS outputs in temps. | Timur Kristóf | 2020-03-30 | 1 | -0/+35
  We know that in this case, the LS and HS invocations are working on the exact same vertex, so it's safe to skip the LDS.
  Totals:
    VGPRS: 3960744 -> 3961844 (0.03 %)
    Code Size: 254824300 -> 254764624 (-0.02 %) bytes
    Max Waves: 1053748 -> 1053574 (-0.02 %)
  Totals from affected shaders:
    VGPRS: 26152 -> 27252 (4.21 %)
    Code Size: 1496600 -> 1436924 (-3.99 %) bytes
    Max Waves: 4860 -> 4686 (-3.58 %)
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Extract store_output_to_temps into a separate function. | Timur Kristóf | 2020-03-30 | 1 | -21/+32
  Will be used by LS output stores.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Fix workgroup size calculation. | Timur Kristóf | 2020-03-30 | 5 | -35/+39
  Clear the workgroup size for all supported shader stages. Also, unify the workgroup size calculation across various places.
  As a result, insert_waitcnt can use the proper workgroup size, which means that some waits can be dropped from tessellation shaders. Also, in cases where the previous calculation was wrong, we now insert s_barrier instructions.
  Totals from affected shaders (GFX10):
    Code Size: 340116 -> 338484 (-0.48 %) bytes
  Fixes: a8d15ab6daf0a07476e9dfabe513c0f1e0f3bf82
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Extract setup_tcs_info to a separate function. | Timur Kristóf | 2020-03-30 | 1 | -12/+19
  Will be required by the workgroup size calculation.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Zero-fill undefined elements in create_vec_from_array. | Timur Kristóf | 2020-03-30 | 1 | -7/+15
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Change isel inputs/outputs to a flat array. | Timur Kristóf | 2020-03-30 | 2 | -20/+25
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Treat outputs of the previous stage as inputs of the next stage. | Timur Kristóf | 2020-03-30 | 2 | -21/+28
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* nir: Collect if shader uses cross-invocation or indirect I/O. | Timur Kristóf | 2020-03-30 | 2 | -13/+110
  The following new fields are added to tess shader info:
    * `tcs_cross_invocation_inputs_read`
    * `tcs_cross_invocation_outputs_read`
  These are I/O masks that are subsets of inputs_read and outputs_read, and they record which per-vertex inputs and outputs are read cross-invocation.
  Additionally, the following new fields are added to shader_info:
    * `inputs_read_indirectly`
    * `outputs_accessed_indirectly`
    * `patch_inputs_read_indirectly`
    * `patch_outputs_accessed_indirectly`
  These new fields can be used for optimizing TCS in a back-end compiler. If you can be sure that the TCS doesn't use cross-invocation inputs or outputs, you can choose a different strategy for storing VS and TCS outputs. However, such optimizations might need to be disabled when the inputs/outputs are accessed indirectly due to backend limitations, so this information is also collected.
  Example: RADV currently has to store all VS and TCS outputs in LDS, but for shaders where only inputs and/or outputs belonging to the current invocation ID are used, it could skip storing these in LDS entirely.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
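How a back-end might consult these masks can be sketched per input slot. The struct below is hypothetical (only the field names follow the commit message; it is not Mesa's `shader_info`), and the `ls_hs_invocations_match` parameter stands in for the condition described in the aco commits in this log, where LS and HS invocations work on the same vertex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of shader info; field names follow the commit. */
struct sketch_tcs_io_info {
   uint64_t inputs_read;
   uint64_t tcs_cross_invocation_inputs_read; /* subset of inputs_read */
   uint64_t inputs_read_indirectly;
};

/* Decide whether the VS output in `slot` can bypass LDS. */
static bool sketch_vs_output_can_skip_lds(const struct sketch_tcs_io_info *info,
                                          unsigned slot,
                                          bool ls_hs_invocations_match)
{
   assert(slot < 64);
   const uint64_t bit = UINT64_C(1) << slot;

   if (!(info->inputs_read & bit))
      return true; /* the TCS never reads it, so nothing needs storing */

   if (!ls_hs_invocations_match)
      return false; /* another invocation's vertex: must go through LDS */

   /* Read only by its own invocation and only with direct indexing:
    * could be passed in registers instead of LDS. */
   return !(info->tcs_cross_invocation_inputs_read & bit) &&
          !(info->inputs_read_indirectly & bit);
}
```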
* aco: Use more optimal sequence at the beginning of merged shaders. | Timur Kristóf | 2020-03-30 | 1 | -3/+17
  It can be further optimized in the future, but the new sequence already has a few advantages:
    * Uses fewer instructions
    * Uses even fewer instructions in wave32 mode
    * Doesn't use the VALU at all
  Totals from affected shaders (GFX10):
    VGPRS: 43504 -> 43496 (-0.02 %)
    Code Size: 2436000 -> 2423688 (-0.51 %) bytes
    Max Waves: 8704 -> 8705 (0.01 %)
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Skip 2nd read of merged wave info when TCS in/out vertices are equal. | Timur Kristóf | 2020-03-30 | 2 | -6/+18
  When TCS has an equal number of input and output vertices, the number of VS and TCS invocations (LS and HS) is the same, and the HS invocations operate on the same vertices as the LS.
  When this is the case, this commit removes the else-if between the merged VS and TCS halves, making it possible to schedule and optimize the code across the two halves.
  Totals:
    SGPRS: 5577367 -> 5581735 (0.08 %)
    VGPRS: 3958592 -> 3960752 (0.05 %)
    Code Size: 254867144 -> 254838244 (-0.01 %) bytes
    Max Waves: 1053887 -> 1053747 (-0.01 %)
  Totals from affected shaders:
    SGPRS: 29032 -> 33400 (15.05 %)
    VGPRS: 35664 -> 37824 (6.06 %)
    Code Size: 1979028 -> 1950128 (-1.46 %) bytes
    Max Waves: 7310 -> 7170 (-1.92 %)
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Allow combining LDS loads when loading tess factors. | Timur Kristóf | 2020-03-30 | 1 | -13/+13
  Previously the tess factors were loaded individually, but now they can be loaded using a single LDS load instruction. Note that the inner and outer tess factors are not yet combined.
  Totals (GFX10):
    Code Size: 254896008 -> 254879212 (-0.01 %) bytes
  Totals from affected shaders (GFX10):
    Code Size: 2028352 -> 2011556 (-0.83 %) bytes
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Allow combining TCS output VMEM stores. | Timur Kristóf | 2020-03-30 | 1 | -1/+1
  Some copypasta may have stuck in the code. This was left on false by mistake.
  Totals (GFX10):
    Code Size: 254939248 -> 254896008 (-0.02 %) bytes
  Totals from affected shaders (GFX10):
    VGPRS: 16196 -> 16212 (0.10 %)
    Code Size: 1126332 -> 1083092 (-3.84 %) bytes
    Max Waves: 2336 -> 2334 (-0.09 %)
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Fix handling of tess factors. | Timur Kristóf | 2020-03-30 | 2 | -22/+13
  There is no need to check whether they are written using indirect indices, because all tess factors should be written to VMEM only at the end of the shader.
  No pipeline-db changes.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Extract tcs_driver_location_matches_api_mask to separate function. | Timur Kristóf | 2020-03-30 | 1 | -21/+30
  Also clear up should_write_tcs_output_to_lds a little bit.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* aco: Create null exports in instruction selection instead of assembler. | Timur Kristóf | 2020-03-30 | 3 | -36/+72
  This allows the passes after isel to assume that the exports are always correct, and also allows these null exports to be scheduled later. Additionally, it ensures that the correct exec mask is used for these exports.
  Totals from affected shaders (GFX10):
    SGPRS: 84224 -> 84344 (0.14 %)
    VGPRS: 23088 -> 23076 (-0.05 %)
    Code Size: 882892 -> 894368 (1.30 %) bytes
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
* nir: Fix breakage of foreach_list_typed_safe assumptions in loop unrolling | Danylo Piliaiev | 2020-03-30 | 1 | -12/+70
  foreach_list_typed_safe works with the assumption that even if the current node becomes invalid, the next one will still be valid. However, process_loops broke this assumption, because during iteration, when an immediate child is unrolled, not only the current node but also the one after it could be removed.
  This doesn't cause issues now, but it will cause issues once the undefined behaviour in the foreach* macros is fixed.
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189>
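The "safe" iteration assumption can be sketched on a toy list. This uses a hypothetical singly linked list and macro (not Mesa's `exec_list` or `foreach_list_typed_safe`): the next node is cached before the body runs, so the body may unlink the current node, but it must never remove the cached next node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive singly linked list. */
struct sketch_node {
   int value;
   struct sketch_node *next;
};

/* Cache `nxt` before visiting `cur`, so the body may unlink `cur`. The
 * body must NOT remove the cached `nxt` node, or the loop would continue
 * from a node that is no longer in the list. */
#define sketch_foreach_safe(cur, nxt, head)                               \
   for ((cur) = (head), (nxt) = (cur) ? (cur)->next : NULL;             \
        (cur) != NULL;                                                  \
        (cur) = (nxt), (nxt) = (cur) ? (cur)->next : NULL)

/* Unlinks every node with an even value and returns the new head. */
static struct sketch_node *sketch_remove_even(struct sketch_node *head)
{
   struct sketch_node **link = &head; /* slot that currently points at cur */
   struct sketch_node *cur, *nxt;

   sketch_foreach_safe(cur, nxt, head) {
      if (cur->value % 2 == 0) {
         *link = nxt;      /* unlink cur; the cached nxt stays valid */
         cur->next = NULL; /* cur is detached (a real list might free it) */
      } else {
         link = &cur->next;
      }
   }
   return head;
}
```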