aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4/vc4_program.c
Commit message (Collapse)AuthorAgeFilesLines
* nir: Rework conversion opcodesJason Ekstrand2017-03-141-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
* vc4: Report to shader-db how many threads a fragment shader has.Eric Anholt2017-03-081-0/+7
| | | | | Doing instruction count analysis when we emit the thread switches that will save us from tons of stalls is kind of missing the point.
* Revert "vc4: Lazily emit our FS/VS input loads."Eric Anholt2017-03-081-80/+69
| | | | | | This reverts commit 292c24ddac5acc35676424f05291c101fcd47b3e. It broke a lot of GLES2 deqp, and I see at least one problem that will require some serious rework to fix.
* gallium: s/unsigned/enum pipe_shader_type/ for get_compiler_options()Brian Paul2017-03-081-1/+2
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* vc4: Lazily emit our FS/VS input loads.Eric Anholt2017-02-241-69/+80
| | | | | | | | | | | | | | | | | | | This reduces register pressure in both types of shaders, by reordering the input loads from the var->data.driver_location order to whatever order they appear first in the NIR shader. These instructions aren't reorderable at our QIR scheduling level because the FS takes two in lockstep to do an interpolation, and the VS takes multiple read instructions in a row to get a whole vec4-level attribute read. shader-db impact: total instructions in shared programs: 76666 -> 76590 (-0.10%) instructions in affected programs: 42945 -> 42869 (-0.18%) total max temps in shared programs: 9395 -> 9208 (-1.99%) max temps in affected programs: 2951 -> 2764 (-6.34%) Some programs get their max temps hurt, depending on the order that the load_input intrinsics appear, because we end up being unable to copy propagate an older VPM read into its only use.
* vc4: Refactor the load_input code out of the intrinsic code.Eric Anholt2017-02-241-25/+42
| | | | It's going gain most of ntq_setup_inputs(), so simplify it first.
* vc4: Track the last block we emitted at the top level.Eric Anholt2017-02-241-5/+8
| | | | | This will be used for delaying our VPM reads (which must be unconditional) until just before they're used.
* vc4: Enable glSampleMask() even when !rasterizer->multisample.Eric Anholt2017-02-101-2/+1
| | | | | | | | gallium's blitter expects that it can set the sample mask even when the rasterizer doesn't have the flag on. Between this and the previous test, 10 new ext_framebuffer_multisample tests start passing.
* vc4: Use accurate 1/w in coordinate shader as well as vert shader.Eric Anholt2017-02-101-1/+1
| | | | | We probably shouldn't be emitting different scaled viewport coordinates between vertex and coord.
* vc4: Avoid emitting small immediates for UBO indirect load address guards.Eric Anholt2017-02-101-4/+4
| | | | | | | | | | | | The kernel will reject our shader if we emit one here, and having 4, 8, or 12 as the top end of our UBO clamp rare is enough that it's not worth making the kernel let us. Fixes piglit fs-const-array-of-struct and fs-const-array-of-struct-of-array since recent GLSL linking changes made us get this as an indirect load of a uniform, instead of a tempoary. Cc: "13.0 17.0" <[email protected]>
* vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.Eric Anholt2017-01-281-13/+18
| | | | | | | | | | shader-db results: total instructions in shared programs: 92611 -> 91764 (-0.91%) instructions in affected programs: 27417 -> 26570 (-3.09%) The star is one shader in glmark2's terrain (drops 16% of its instructions), but there are also wins in mupen64plus and glb2.7.
* nir: Rename convert_to_ssa lower_regs_to_ssaJason Ekstrand2016-12-291-1/+1
| | | | This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
* vc4: Enable NIR-based loop unrolling.Eric Anholt2016-12-291-0/+5
| | | | | This successfully unrolls a new shader in GLB2.7, which also gets that shader to successfully compile in multithreaded mode.
* treewide: s/comparitor/comparator/Ilia Mirkin2016-12-121-1/+1
| | | | | | | | | | git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
* vc4: In a loop break/continue, jump if everyone has taken the path.Eric Anholt2016-11-301-10/+17
| | | | | | | | | | | | | | | | | This should be a win for most loops, which tend to have uniform control flow. More importantly, it exposes important information to live variables: that the break/continue here means that our jump target may have access to values that were live on our input. Previously, we were just setting the exec mask and letting control flow fall through, so an intervening def between the break and the end of the loop would appear to live variables as if it screened off the variable, when it didn't actually. Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a perturbing of register allocation caused a live variable to get stomped. Cc: 13.0 <[email protected]>
* vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.Eric Anholt2016-11-291-9/+25
| | | | | For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
* vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).Eric Anholt2016-11-291-1/+1
| | | | | | Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
* gallium: fix more occurences of u_hash.hMarek Olšák2016-11-221-1/+1
| | | | this fixes compile failures since 86514d84e0beec47c82da4888db12bf07f33cb83
* vc4: Try compiling our FSes in multithreaded mode on new kernels.Eric Anholt2016-11-161-2/+13
| | | | | | Multithreaded fragment shaders let us hide texturing latency by a hyperthreading-style switch to another fragment shader. This gets us up to 20% framerate improvements on glmark2 tests.
* vc4: Mark threaded FSes as non-singlethread in the CL.Eric Anholt2016-11-121-0/+2
|
* vc4: Flag the last thread switch in the program as the last.Eric Anholt2016-11-121-0/+11
| | | | | | We don't allow the last thread switch to be inside control flow, to be sure that we hit the last state exactly once. If the last texturing was in control flow, fall back to single threaded.
* vc4: Add THRSW nodes after each tex sample setup in multithreaded mode.Eric Anholt2016-11-121-0/+25
| | | | | This is a suboptimal implementation, but Jonas Pfeil found that it was still a massive performance gain.
* vc4: Clamp the shadow comparison value.Eric Anholt2016-11-091-0/+9
| | | | | | Fixes piglit glsl-fs-shadow2D-clamp-z. Cc: <[email protected]>
* vc4: Don't abort when a shader compile fails.Eric Anholt2016-11-091-4/+14
| | | | | | | | | It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <[email protected]>
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.Eric Anholt2016-11-041-1/+1
| | | | | | | The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here. Cc: <[email protected]>
* vc4: Make sure that vertex shader texture2D() calls use LOD 0.Eric Anholt2016-11-041-0/+10
| | | | | I noticed this while trying to debug glmark2 terrain (which does vertex shader texturing, but no mipmaps on its textures sampled from the VS).
* ralloc: use rzalloc where it's necessaryMarek Olšák2016-10-311-1/+1
| | | | | | | | | | | | | | | | | No change in behavior. ralloc_size is equivalent to rzalloc_size. That will change though. Calls not switched to rzalloc_size: - ralloc_vasprintf - glsl_type::name allocation (it's filled with snprintf) - C++ classes where valgrind didn't show uninitialized values I switched most of non-glsl stuff to rzalloc without checking whether it's really needed. Reviewed-by: Edward O'Callaghan <[email protected]> Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/i965/anv/radv/gallium: make shader info a pointerTimothy Arceri2016-10-261-3/+3
| | | | | | | | | | When restoring something from shader cache we won't have and don't want to create a nir_shader this change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Avoid making temporaries for assignments to NIR registers.Eric Anholt2016-10-211-35/+79
| | | | | | | | | | | | | | | | | | | | | | | Getting stores to NIR regs to not generate new MOVs is tricky, since the result we're trying to store into the NIR reg may have been from a conditional update of a temp, or a series of packed writes. The easiest solution seems to be to require that nir_store_dest()'s arg comes from an SSA temp. This causes us to put in a few more temporary MOVs in the NIR SSA dest case, but copy propagation successfully cleans those up. The shader-db change is modest: total instructions in shared programs: 93774 -> 93598 (-0.19%) instructions in affected programs: 14760 -> 14584 (-1.19%) total estimated cycles in shared programs: 212135 -> 211946 (-0.09%) estimated cycles in affected programs: 27005 -> 26816 (-0.70%) but I was seeing patterns in some register-allocation failures in DEQP tests that looked like the extra MOVs would increase maximum register pressure in loops. Some debug code indicates that that's not the case, though I'm still a bit confused by that result.
* vc4: Restructure the simulator mode.Eric Anholt2016-10-211-3/+0
| | | | | | | | | | | | | Rather than having simulator mode changes scattered around vc4_bufmgr.c and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which then dispatches to a corresponding implementation. This will give the simulator support a centralized place to do tricks like storing most BOs directly in simulator memory rather than copying in and out. This leaves special casing of mmaping BOs and execution, because of the winsys mapping.
* vc4: Fix live intervals analysis for screening defs in if statements.Eric Anholt2016-10-061-1/+4
| | | | | | | | | If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexcuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.Eric Anholt2016-10-061-0/+2
| | | | | Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* nir: Make nir_foo_first/last_cf_node return a block insteadJason Ekstrand2016-10-061-4/+2
| | | | | | | | | | One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <[email protected]> (v1)
* nir: Report progress from nir_lower_phis_to_scalar.Kenneth Graunke2016-09-141-2/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir: Report progress from nir_lower_alu_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Move the render job state into a separate structure.Eric Anholt2016-09-141-1/+2
| | | | | This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Handle discards while in control flow.Eric Anholt2016-08-291-6/+27
| | | | | I missed this while adding loop support because the discard test inside a loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
* vc4: Add support for fddx/fddyEric Anholt2016-08-251-0/+52
| | | | Based vaguely on a patch by jonasarrow on github.
* vc4: Tell state_tracker that we would prefer NIR.Eric Anholt2016-08-221-7/+25
| | | | | | | | | | Before this series, the code generation path was: GLSL IR -> TGSI -> NIR -> NIR clone -> QIR -> QPU Now it's (generally) GLSL IR -> NIR -> NIR clone -> QIR -> QPU
* vc4: Use proper type sizes for uniforms.Eric Anholt2016-08-221-4/+5
|
* vc4: Add VARYING_SLOT_PNTC support.Eric Anholt2016-08-221-4/+5
| | | | We end up with this when doing GLSL-to-NIR.
* nir: Define system values for vc4's blending-lowering arguments.Eric Anholt2016-08-221-25/+31
| | | | | | | | | | | | | In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I was calling the "state uniforms" that I was putting into the NIR fighting with its other lowering passes. Instead of using magic uniform base numbers in the backend, follow the lead of load_user_clip_plane and just define system values for them. v2: Fix unintended change to channel_num, drop unspecified const_index value on blend_const_color_r_float. Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Switch store_output to using nir_lower_io_to_scalar / component.Eric Anholt2016-08-191-5/+13
|
* vc4: Use the intrinsic's first_component for vattr VPM index.Eric Anholt2016-08-191-5/+1
| | | | Avoids another multiplication by 4 of the base in the NIR.
* vc4: Convert to using nir_lower_io_scalar for FS inputs.Eric Anholt2016-08-191-3/+17
| | | | | The scalarizing of FS inputs can be done in a non-driver-dependent manner, so extract it out of the driver.
* vc4: Switch to using the intrinsic accessors.Eric Anholt2016-08-191-9/+10
| | | | | The const_index[] values have always felt magic, and this documents them a bit better.
* ttn: Make FRAG_RESULT_DEPTH be a float variable to match gtn and ptn.Eric Anholt2016-08-191-1/+1
| | | | | | | This lets TTN-using drivers handle FRAG_RESULT_DEPTH the same between all their source paths. Reviewed-by: Rob Clark <[email protected]>
* vc4: Dump the TGSI before trying to convert it to NIR.Eric Anholt2016-08-191-4/+3
| | | | In the case of debugging a crash in TTN, this is nice to have.
* vc4: Move scalarizing and some lowering to link time.Eric Anholt2016-08-041-5/+12
| | | | | | | This works out to be a wash in terms of memory usage: We use more memory to store the separate ALU instructions, but we optimize out a lot of code as well. The main result, though, is that we do more of our work at link time rather than draw time.