path: root/src/gallium/drivers/vc4
Commit message (Author, Age, Files, Lines)
* vc4: Add a small QIR validate pass. (Eric Anholt, 2016-05-06, 4 files, -0/+127)
| This has caught a couple of bugs during loop development so far, and I should probably have written it long ago.
* vc4: Fix the src count on exp2/log2. (Eric Anholt, 2016-05-06, 1 file, -2/+2)
| Found by the upcoming QIR validate pass.
* vc4: Reuse QPU disasm's cond flags in QIR. (Eric Anholt, 2016-05-06, 3 files, -27/+46)
| In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
* vc4: When emitting an instruction to an existing temp, mark it non-SSA. (Eric Anholt, 2016-05-06, 1 file, -0/+2)
| Prevents a bug in the later control-flow support series.
* vc4: Make sure that we don't overwrite the signal for PROG_END. (Eric Anholt, 2016-05-06, 1 file, -0/+8)
| We should have already emitted a NOP due to the last instruction being a TLB or VPM write. However, if you disable dead code elimination then you might get dead code at the end, and that dead code might have the signal bits set to something non-default, at which point you hit an assertion failure.
* vc4: fixup for new nir_foreach_block() (Connor Abbott, 2016-05-05, 4 files, -48/+20)
| Reviewed-by: Eric Anholt <[email protected]>
* vc4: Use NIR lowering for sRGB decode. (Eric Anholt, 2016-05-02, 2 files, -40/+3)
| This should get us the same decode code generated, but with a lot less custom code in the driver.
* vc4: Just use NIR lowering for texture projection. (Eric Anholt, 2016-05-02, 1 file, -15/+3)
| This means doing Newton-Raphson on the RCP, but it's probably actually a good thing to be accurate on.
* vc4: Scalarize phi nodes as well. (Eric Anholt, 2016-05-02, 1 file, -0/+1)
| This makes fewer programs with loops assertion fail, replacing them with the rendering failure warning.
* vc4: Add whitespace after each program stage dump. (Eric Anholt, 2016-05-02, 2 files, -0/+3)
| In particular it's been hard to find the point where we switch from dumping pre-optimization QIR and post-optimization QIR.
* vc4: Remove the CSE pass. (Eric Anholt, 2016-05-02, 4 files, -162/+0)
| It's not doing anything according to shader-db now that we're using NIR. It would have had to be reworked significantly anyway, to handle control flow.
* vc4: Emit only one FRAG_Z or FRAG_W QIR opcode. (Eric Anholt, 2016-05-02, 1 file, -2/+19)
| We were generating piles of FRAG_W for interpolation, only to CSE them away immediately. Since this is the only thing that CSE is doing for us any more, just avoid making the CSE work necessary.
* vc4: Use the NIR cubemap normalization instead of our own. (Eric Anholt, 2016-05-02, 1 file, -6/+1)
| This is one of two uses of the current QIR CSE pass according to shader-db. The NIR pass means that we'll end up doing Newton-Raphson on our RCP, which we weren't doing before, but that's probably actually a good thing.
* vc4: Drop the support for DCE of texture instructions. (Eric Anholt, 2016-05-02, 1 file, -22/+1)
| Now that we're using NIR for our optimization, there's no need for this tricky code.
* nir: Switch the arguments to nir_foreach_function (Jason Ekstrand, 2016-04-28, 4 files, -5/+5)
| This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression:
|
| s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
|
| Reviewed-by: Ian Romanick <[email protected]>
* nir: Switch the arguments to nir_foreach_instr (Jason Ekstrand, 2016-04-28, 4 files, -5/+5)
| This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression:
|
| s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
|
| and similar expressions for nir_foreach_instr_safe etc.
|
| Reviewed-by: Ian Romanick <[email protected]>
* nir: rename lower_flrp to lower_flrp32 (Samuel Iglesias Gonsálvez, 2016-04-28, 1 file, -1/+1)
| A later patch will add lower_flrp64 option to NIR.
|
| Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Make sure we recompile when sample_mask changes. (Eric Anholt, 2016-04-22, 1 file, -0/+1)
| Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted (there is also a bug with RCL tiled blits)
|
| Cc: "11.1 11.2" <[email protected]>
* vc4: Fix validation of full res tile offset if used for non-MSAA. (Eric Anholt, 2016-04-22, 3 files, -2/+14)
| There's no reason we couldn't do non-MSAA full resolution tile buffer load/stores, but we would have claimed buffer overflow was being attempted. Nothing does this currently.
* vc4: Only do MSAA FB operations if the FB is MSAA. (Eric Anholt, 2016-04-22, 1 file, -5/+8)
| I noticed this as a problem with ET:QW traces emitting coverage code when the framebuffer was supposed to be single sampled.
* vc4: Fix tests for format supported with nr_samples == 1. (Eric Anholt, 2016-04-22, 1 file, -3/+4)
| This was a bug from the MSAA enabling. Tests for surfaces with nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly fail out.
|
| Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance.
|
| Cc: "11.1 11.2" <[email protected]>
* vc4: Don't try to blit from MSAA surfaces with mismatched width to dst. (Eric Anholt, 2016-04-22, 1 file, -11/+14)
| I had made the previous blit fix non-MSAA only because I was thinking about how the hardware infers stride from the RENDERING_CONFIG packet. However, I'm also inferring the stride for both MSAA src and dst in vc4_render_cl.c from the width argument in the ioctl.
|
| Fixes 15 EXT_framebuffer_multisample piglit tests.
* gallium: add bool return to pipe_context::end_query (Nicolai Hähnle, 2016-04-21, 1 file, -1/+2)
| Even when begin_query succeeds, there can still be failures in query handling. For example for radeon, additional buffers may have to be allocated when queries span multiple command buffers.
|
| Reviewed-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* gallium: merge PIPE_SWIZZLE_* and UTIL_FORMAT_SWIZZLE_* (Marek Olšák, 2016-04-22, 4 files, -16/+16)
| Use PIPE_SWIZZLE_* everywhere. Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE. The new enum is called pipe_swizzle.
|
| Acked-by: Jose Fonseca <[email protected]>
* nir: rename nir_foreach_block*() to nir_foreach_block*_call() (Connor Abbott, 2016-04-20, 4 files, -4/+4)
| Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Fix fbo-generatemipmap-formats for NPOT. (Eric Anholt, 2016-04-18, 1 file, -0/+20)
| Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but we only get one value to control the stride of the src and dst for single sampled buffers. A RCL tile blit from level != 1 to level == 0 would therefore load from the wrong stride.
* vc4: Remove unused "immediates" field (Eric Anholt, 2016-04-18, 1 file, -1/+0)
| This was for TGSI, which we no longer have to deal with.
* vc4: Add support for rendering to cube map surfaces. (Eric Anholt, 2016-04-18, 1 file, -1/+2)
| We need to fix up the offset to point at the face of the cube. Fixes piglit fbo-cubemap, copyteximage CUBE, and glean's fbo test.
|
| Cc: "11.1 11.2" <[email protected]>
* vc4: Don't flush on read-only access of buffers read by the CL. (Eric Anholt, 2016-04-18, 3 files, -7/+16)
| Fixes piglit mixed-immediate-and-vbo, and may significantly improve performance of applications that store a 4-byte IB in the same VBO as vertex data.
* vc4: Sanity check that flushes don't happen between state emit and draw. (Eric Anholt, 2016-04-18, 1 file, -0/+7)
| Catches the cause of failure in arb_vertex_buffer_object-mixed-immediate-and-vbo. I've had this class of failure before, and it probably won't be the last time.
* vc4: Sanity check strides for imported BOs. (Eric Anholt, 2016-04-18, 1 file, -5/+18)
| If we're going to sample from or render to them at some particular size, we'd better make sure that they actually are that size. Causes some tests under simulation to generate appropriate error messages instead of failures.
* vc4: Fix subimage accesses to LT textures. (Eric Anholt, 2016-04-15, 1 file, -4/+4)
| This code started out like the T case, iterating over utile offsets, but I had partially switched it to iterating over pixel offsets. I hadn't caught this before because it's unusual to do piecemeal uploads to small textures.
|
| Fixes bad text rendering in QT5 apps, which use a 256x16 glyph cache. Also fixes 6 piglit tests related to glTexSubImage() and glGetTexSubImage().
|
| Cc: "11.1 11.2" <[email protected]>
* nir/dead_variables: Configurably work with any variable mode (Jason Ekstrand, 2016-04-13, 1 file, -1/+1)
| The old version of the pass only worked on globals and locals and always left inputs, outputs, uniforms, etc. alone.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Work around hardware limits on the number of verts in a single draw. (Eric Anholt, 2016-04-12, 1 file, -18/+92)
| Fixes rendering failures in glmark2's refract and bump:render-mode=high-poly demos, and partially in its terrain demo.
* gallium: Add capability for ARB_robust_buffer_access_behavior. (Bas Nieuwenhuizen, 2016-04-12, 1 file, -0/+1)
| Signed-off-by: Bas Nieuwenhuizen <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add pipe_context::set_active_query_state for pausing queries (Marek Olšák, 2016-04-12, 1 file, -0/+6)
| Reviewed-by: Roland Scheidegger <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes. (Eric Anholt, 2016-04-08, 4 files, -27/+29)
| This gives us one less set of special instruction generation cases, and instead just the case for returning the correct register to read.
* vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR. (Eric Anholt, 2016-04-08, 5 files, -65/+100)
| This lets us write the Z directly from the FTOI for computed Z, and may let us coalesce color writes in the future.
|
| No change in my shader-db, but clearly drops an instruction in piglit's early-z test.
* vc4: Add a helper function for the construction of qregs. (Eric Anholt, 2016-04-08, 4 files, -12/+13)
| The separate declaration of the struct is not helping clarity, and I was going to be writing a whole lot more of these in the upcoming patches.
* vc4: Add missing scheduling dependency for MS color writes. (Eric Anholt, 2016-04-08, 1 file, -0/+1)
|
* vc4: Drop the multi_instruction distinction for QIR instructions. (Eric Anholt, 2016-04-08, 2 files, -14/+5)
| It wasn't correctly flagged everywhere, and QPU generation now handles the only remaining case that was paying attention to it.
|
| No change on shader-db.
* vc4: Handle SF on instructions that write r4. (Eric Anholt, 2016-04-08, 1 file, -10/+14)
| Normal SFU writes couldn't have SF because they were marked as multi_instruction, but tex_result and tlb_color_read weren't. This ended up not being a problem according to anything in shader-db, but it seems possible.
* vc4: Allow multi-instruction QIR nodes to get VPM optimization. (Eric Anholt, 2016-04-08, 1 file, -2/+2)
| There used to be multi-instruction operations that would use src[] twice, which is why we couldn't do some optimizations on them. This is no longer the case.
|
| total instructions in shared programs: 77973 -> 77969 (-0.01%)
| instructions in affected programs: 84 -> 80 (-4.76%)
| total estimated cycles in shared programs: 234165 -> 234157 (-0.00%)
| estimated cycles in affected programs: 92 -> 84 (-8.70%)
* vc4: Switch to using NIR_PASS macros. (Eric Anholt, 2016-04-08, 5 files, -33/+32)
| This gets us better validation of our NIR transformations.
* vc4: Handle nir_intrinsic_load_user_clip_plane as a vec4. (Eric Anholt, 2016-04-08, 2 files, -20/+12)
| I liked having all my NIR be scalar, but nir_validate() complains that the intrinsic writes 4 components but the destination we set up was only 1 component. I could generate a new scalar variant, but it's a lot easier to just leave it as a vec4. This doesn't hurt codegen since we GC unused uniforms, and UCP dot products use all the components anyway.
* vc4: Emit a warning and proceed for handling loops in NIR. (Rhys Kidd, 2016-04-08, 1 file, -1/+13)
| We don't really support control flow yet, but it's a lot nicer to render something and warn on stderr than to crash.
|
| Fixes the following piglit tests:
| - shaders/complex-loop-analysis-bug
| - shaders/glsl-fs-discard-04
|
| Converts the following piglit tests from crash to fail:
| - shaders/glsl-fs-continue-inside-do-while
| - shaders/glsl-fs-loop
| - shaders/glsl-fs-loop-continue
| - shaders/glsl-fs-loop-nested
| - shaders/glsl-texcoord-array
| - shaders/glsl-vs-continue-inside-do-while
| - shaders/glsl-vs-loop
| - shaders/glsl-vs-loop-continue
| - shaders/glsl-vs-loop-nested
|
| No piglit regressions.
|
| v2 (Eric): Add stronger stderr warning.
|
| Signed-off-by: Rhys Kidd <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* vc4: Add a stub for NIR->QIR of control flow function nodes (Rhys Kidd, 2016-04-08, 1 file, -0/+11)
| We shouldn't have any NIR functions present since all GLSL functions get inlined, but this would be a more informative error if it does happen.
|
| Signed-off-by: Rhys Kidd <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* vc4: Add better debug of NIR->QIR control flow graph failure (Rhys Kidd, 2016-04-08, 1 file, -1/+2)
| Ensure NIR control flow graph nodes that are unhandled in QIR are reported with sufficient verbosity to aid debugging. This improves piglit outputs, amongst other tools.
|
| There are no other remaining uses of assert(0) as a blunt tool within vc4.
|
| Signed-off-by: Rhys Kidd <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* vc4: Remove unused include from vc4_program.c (Rhys Kidd, 2016-04-08, 1 file, -1/+0)
| Found with grep and inspection. Test compiled on RPi hw. Assists any future effort to remove TGSI as an intermediate stage.
|
| Signed-off-by: Rhys Kidd <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* gallium: Add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT (Edward O'Callaghan, 2016-04-07, 1 file, -0/+1)
| Add PIPE_CAP to determine if the GL extension 'GL_ARB_framebuffer_no_attachments' shall be supported.
|
| The driver is required to support 'PIPE_FORMAT_NONE' via its 'is_format_supported()' callback in order to determine the MSAA modes the hardware supports so that values requested from the application using 'GL_ARB_framebuffer_no_attachments' may be quantized to what the hardware expects.
|
| V.2: Fix doc for a more detailed description of the PIPE_CAP and the corresponding GL constant.
| V.3: Renamed and repurposed once again.
| V.4: Remove CAP from cap_mapping array.
|
| [airlied: fix damaged whitespace]
|
| Signed-off-by: Edward O'Callaghan <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
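The two nir_foreach_* argument-swap commits in this log quote the sed expression that generated them. As a rough illustration only, the nir_foreach_function expression can be translated to Python's re module as sketched below. The helper name swap_args, the sample input line, and the variable names shader and func are hypothetical and are not taken from the Mesa tree.

```python
import re

# Python translation of the sed expression quoted in the nir_foreach_function
# commit. sed's \( \) capture groups become plain ( ) here, and the literal
# parentheses of the macro call are escaped instead.
PATTERN = re.compile(r"nir_foreach_function\(([^,]*),\s*([^,]*)\)")


def swap_args(line: str) -> str:
    """Swap the two macro arguments on one source line.

    Hypothetical helper: only the first match is rewritten, mirroring a sed
    s/// command without the g flag.
    """
    return PATTERN.sub(r"nir_foreach_function(\2, \1)", line, count=1)


if __name__ == "__main__":
    # Hypothetical call site written in the old (container, item) order.
    print(swap_args("nir_foreach_function(shader, func) {"))
```

Because both capture groups are [^,]*, a textual rewrite like this only matches call sites whose arguments contain no commas.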