path: root/src/gallium/drivers/vc4
Commit message | Author | Age | Files | Lines
* vc4: Add shader-db dumping of NIR instruction count.Eric Anholt2015-04-011-0/+29
| | | | | | | | I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).
* vc4: Convert to consuming NIR.Eric Anholt2015-04-015-720/+707
| | | | | | | | | | | | | | | | | | | NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in master. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
* vc4: Tell shader-db how big our UBOs are, if present.Eric Anholt2015-04-011-0/+6
| | | | I had regressed them for a while with the NIR work.
* vc4: Drop integer multiplies with 0 to moves of 0.Eric Anholt2015-03-301-0/+8
| | | | | | | | This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)
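The rewrite described above can be sketched on a toy IR. This is only an illustration: `struct inst`, `struct operand` and `simplify_mul_by_zero()` are hypothetical names, not the driver's QIR types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature IR; these are not the driver's real QIR types. */
enum op { OP_MOV, OP_MUL24 };

struct operand {
    bool is_const;   /* true: value is an immediate; false: value is a temp index */
    uint32_t value;
};

struct inst {
    enum op op;
    struct operand src[2];
};

/* Rewrite "x * 0" or "0 * x" into "mov 0".
 * Returns true if the instruction was changed. */
static bool simplify_mul_by_zero(struct inst *inst)
{
    assert(inst != NULL);
    if (inst->op != OP_MUL24)
        return false;

    for (int s = 0; s < 2; s++) {
        if (inst->src[s].is_const && inst->src[s].value == 0) {
            inst->op = OP_MOV;
            inst->src[0] = (struct operand){ .is_const = true, .value = 0 };
            /* The second source is unused by a MOV. */
            inst->src[1] = (struct operand){ .is_const = false, .value = 0 };
            return true;
        }
    }
    return false;
}
```

Once the multiply is a move of a constant, later passes such as copy propagation can presumably remove the instruction entirely, which would account for the instruction-count drop.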
* vc4: Add a constant folding pass.Eric Anholt2015-03-304-0/+113
| | | | | | | | | | | | This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)
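A constant folding pass of this kind might look roughly like the following. It is a minimal sketch on an invented IR (`struct inst`, `fold_constants()` and the opcode names are assumptions for illustration), and it assumes a 24-bit multiply keeps the low 32 bits of the product of the operands' low 24 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature IR; not the driver's real QIR types. */
enum op { OP_MOV, OP_ADD, OP_MUL24 };

struct operand {
    bool is_const;   /* true: value is an immediate; false: value is a temp index */
    uint32_t value;
};

struct inst {
    enum op op;
    struct operand src[2];
};

/* If both sources are constants, evaluate the operation at compile time
 * and replace the instruction with a MOV of the result. */
static bool fold_constants(struct inst *inst)
{
    assert(inst != NULL);
    if (!inst->src[0].is_const || !inst->src[1].is_const)
        return false;

    uint32_t a = inst->src[0].value;
    uint32_t b = inst->src[1].value;
    uint32_t result;

    switch (inst->op) {
    case OP_ADD:
        result = a + b;
        break;
    case OP_MUL24:
        /* Assumed semantics: only the low 24 bits of each operand count. */
        result = (a & 0xffffff) * (b & 0xffffff);
        break;
    default:
        return false;   /* nothing to fold (for example, already a MOV) */
    }

    inst->op = OP_MOV;
    inst->src[0] = (struct operand){ .is_const = true, .value = result };
    inst->src[1] = (struct operand){ .is_const = false, .value = 0 };
    return true;
}
```

Only a couple of opcodes are handled, in keeping with the note above that most other folding ought to happen at the NIR level.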
* vc4: Don't bother masking out the low 24 bits for integer multipliesEric Anholt2015-03-301-12/+8
| | | | | | | | | | The hardware just uses the low 24 bits, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)
* vc4: Make integer multiply use 24 bits for the low parts.Eric Anholt2015-03-301-5/+5
| | | | | The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).
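To see why a 24-bit low part works, here is a sketch of a 32-bit multiply built from 24-bit multiplies. `mul24()` models the assumed hardware behaviour (the product of the operands' low 24 bits, truncated to 32 bits), and `umul32()` is an illustrative composition, not necessarily the exact instruction sequence the driver emits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the assumed hardware multiply: the high 8 bits of each operand
 * are ignored, so callers never need to mask them off first. */
static uint32_t mul24(uint32_t a, uint32_t b)
{
    return (a & 0xffffff) * (b & 0xffffff);
}

/* 32-bit wrapping multiply from 24-bit multiplies.
 * With a = ah * 2^24 + al and b = bh * 2^24 + bl:
 *   a * b = al*bl + (ah*bl + al*bh) * 2^24   (mod 2^32)
 * The ah*bh term is shifted out entirely. */
static uint32_t umul32(uint32_t a, uint32_t b)
{
    uint32_t lo = mul24(a, b);                            /* al * bl */
    uint32_t hi = mul24(a >> 24, b) + mul24(a, b >> 24);  /* ah*bl + al*bh */
    return lo + (hi << 24);
}

/* Usage: the composed multiply matches C's wrapping 32-bit multiply. */
static void umul32_example(void)
{
    assert(umul32(0x12345678u, 0x9abcdef0u) == 0x12345678u * 0x9abcdef0u);
    assert(umul32(0xffffffffu, 0xffffffffu) == 0xffffffffu * 0xffffffffu);
}
```

When either high part is known to be zero (small array indices, for instance), the corresponding cross term is a multiply by zero and can be dropped, which is the case the wider low part makes more common.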
* vc4: Add a dump-the-surface-contents routine.Eric Anholt2015-03-242-0/+101
| | | | | This has been useful once again while trying to debug stride issues between render targets and texturing.
* vc4: Fix pitch alignment of linear textures.Eric Anholt2015-03-241-1/+1
| | | | | Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.
* vc4: Write the alignment of level width consistently in validation.Eric Anholt2015-03-241-2/+2
| | | | | | 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.
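The coincidence can be checked with a small table. The utile widths below are an assumption (a 64-byte utile that is 8 pixels wide at 1 or 2 bytes per pixel, 4 wide at 4, and 2 wide at 8), and `utile_width()` and `aligned_level_width()` are illustrative helpers rather than the validator's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed utile width in pixels for a given bytes-per-pixel (cpp). */
static uint32_t utile_width(int cpp)
{
    switch (cpp) {
    case 1:
    case 2:
        return 8;
    case 4:
        return 4;
    case 8:
        return 2;
    default:
        assert(!"unsupported cpp");
        return 0;
    }
}

/* Round v up to the next multiple of a. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

/* Level width expressed in terms of utiles instead of "16 / cpp". */
static uint32_t aligned_level_width(uint32_t width, int cpp)
{
    return align_up(width, utile_width(cpp));
}
```

Under these assumed widths, `16 / cpp` and `utile_width(cpp)` agree at 4 bytes per pixel but diverge at 1 byte per pixel (16 versus 8), which is why writing the alignment in utiles is the less surprising form.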
* vc4: Fix use of a bool as an enum.Eric Anholt2015-03-241-1/+1
| | | | The enum compared to was 0, so it worked out, but it sure looked wrong.
* vc4: Decide the HW's format before laying out the miptree.Eric Anholt2015-03-241-3/+3
| | | | | | I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.
* vc4: Use our device-specific ioctls for create/mmap.Eric Anholt2015-03-241-15/+36
| | | | | | They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.
* vc4: Make a new #define for making code conditional on the simulator.Eric Anholt2015-03-243-15/+25
| | | | | | I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.
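The pattern is roughly the following. `USE_SIMULATOR`, `struct job` and the two submit helpers are hypothetical stand-ins; only the `if (using_simulator)` idiom comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build flag; the driver's real macro names may differ. */
#ifdef USE_SIMULATOR
#define using_simulator true
#else
#define using_simulator false
#endif

struct job { int id; };

enum backend { BACKEND_KERNEL, BACKEND_SIMULATOR };

/* Stand-ins for the two submission paths. */
static enum backend submit_to_kernel(const struct job *job)
{
    (void)job;
    return BACKEND_KERNEL;
}

static enum backend submit_to_simulator(const struct job *job)
{
    (void)job;
    return BACKEND_SIMULATOR;
}

static enum backend submit(const struct job *job)
{
    assert(job != NULL);
    /* Both branches are parsed and type-checked in every build; the
     * compiler can drop whichever one is constant-false. */
    if (using_simulator)
        return submit_to_simulator(job);
    return submit_to_kernel(job);
}
```

Compared with wrapping each path in `#ifdef`, a change that breaks the hardware path still fails to compile in a simulator build, and vice versa.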
* vc4: Add some useful debug printfs for miptrees.Eric Anholt2015-03-241-0/+37
| | | | I keep rewriting these.
* gallium: implement get_device_vendor() for existing driversGiuseppe Bilotta2015-03-231-0/+1
| | | | | | | | | The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor. Signed-off-by: Giuseppe Bilotta <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* gallium: add FMA and DFMA opcodes (v3)Marek Olšák2015-03-161-0/+1
| | | | | | | | | Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Update to current kernel sources.Eric Anholt2015-02-248-49/+93
| | | | | | New BO create and mmap ioctls are added. The submit ABI gains a flags argument, and the pointers are fixed at 64-bit. Shaders are now fixed at the start of their BOs.
* vc4: Keep an array of pointers to instructions defining the temps around.Eric Anholt2015-02-198-68/+67
| | | | | The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.
* vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.Eric Anholt2015-02-193-45/+49
| | | | | I may want them in optimization passes, and they're not really particular to the program translation stage.
* vc4: Enforce one-uniform-per-instruction after optimization.Eric Anholt2015-02-196-50/+209
| | | | | | | | | | | | | | | This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first. total uniforms in shared programs: 13457 -> 13433 (-0.18%) uniforms in affected programs: 1524 -> 1500 (-1.57%) total instructions in shared programs: 40198 -> 40019 (-0.45%) instructions in affected programs: 6027 -> 5848 (-2.97%) I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.
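One way to picture the "most reused first" choice is a greedy loop like the one below. It is a sketch under simplifying assumptions (at most two sources per instruction, a fixed uniform limit), and `struct inst`, `violates()` and `lower_uniforms()` are illustrative names, not the driver's pass.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNIFORMS 64

struct inst {
    int uniform[2];   /* uniform index read by each source, or -1 if none */
};

/* An instruction breaks the rule if it reads two distinct uniforms that
 * are both still read directly (neither has been moved to a temp). */
static bool violates(const struct inst *inst, const bool *in_temp)
{
    int a = inst->uniform[0];
    int b = inst->uniform[1];

    assert(a < MAX_UNIFORMS && b < MAX_UNIFORMS);
    return a >= 0 && b >= 0 && a != b && !in_temp[a] && !in_temp[b];
}

/* Repeatedly move the uniform that appears in the most violating
 * instructions into a temporary until no instruction violates the rule.
 * Returns the number of uniforms moved. */
static int lower_uniforms(const struct inst *insts, int n, bool *in_temp)
{
    int moved = 0;

    for (;;) {
        int count[MAX_UNIFORMS] = { 0 };
        int best = -1;

        for (int i = 0; i < n; i++) {
            if (!violates(&insts[i], in_temp))
                continue;
            for (int s = 0; s < 2; s++) {
                int u = insts[i].uniform[s];
                count[u]++;
                if (best < 0 || count[u] > count[best])
                    best = u;
            }
        }

        if (best < 0)
            return moved;   /* every instruction reads at most one uniform */

        in_temp[best] = true;   /* one MOV into a temp fixes all of its users */
        moved++;
    }
}
```

For a program whose instructions read uniform pairs (0,1), (0,2) and (0,3), moving uniform 0 alone resolves all three, whereas picking the other operand each time would cost three moves.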
* vc4: Rename add_uniform() to qir_uniform().Eric Anholt2015-02-191-15/+15
|
* vc4: Shut up runtime warnings about new pipe caps.Eric Anholt2015-02-191-0/+2
|
* gallium: add interface and state tracker support for GL_AMD_pinned_memoryMarek Olšák2015-02-171-0/+1
| | | | | | v2: add alignment restrictions to docs, fix indentation in headers Reviewed-by: Christian König <[email protected]>
* vc4: Make SF be a flag on the QIR instructions.Eric Anholt2015-02-128-51/+47
| | | | | | | | | | | | Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%)
* gallium: Add MULTISAMPLE_Z_RESOLVE capAxel Davy2015-02-061-0/+1
| | | | | | | | | | | | | | | | Resolving a multisampled depth texture into a single sampled texture is supported on >= SM4.1 hw. It is possible some previous hw supports it. The ability was tested on radeonsi and nvc0. Apparently it is also supported for radeon >= r700. This patch adds the MULTISAMPLE_Z_RESOLVE cap and adds it to the drivers. It is advertised for drivers for which it is sure the ability is supported. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* gallium: add a cap to determine whether the driver supports offset_clampIlia Mirkin2015-02-021-0/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* dir-locals.el: Don't set variables for non-programming modesNeil Roberts2015-02-022-2/+2
| | | | | | | | | | | | | | This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. v2: Apply to all the .dir-locals.el files, not just the one in the root directory. Acked-by: Michel Dänzer <[email protected]>
* vc4: Kill a bunch of color write calculation when colormask is all off.Eric Anholt2015-02-011-8/+35
| | | | | | | | | | | | | I could have done this in the bit that generates the ANDs and ORs, but it's probably generally useful. Sadly, I still need this even if I move to NIR, because I can't yet express my read of the destination color in NIR, which I would need to move my blend/logicop/colormask handling into NIR. total uniforms in shared programs: 13497 -> 13455 (-0.31%) uniforms in affected programs: 101 -> 59 (-41.58%) total instructions in shared programs: 40797 -> 40296 (-1.23%) instructions in affected programs: 1639 -> 1138 (-30.57%)
* vc4: Dump the VPM read index in QIR disasm.Eric Anholt2015-02-011-4/+9
| | | | | Since the VPM reads have to be in order, it's useful to see their indices in the dump.
* vc4: Fix point size handling when it's the first output.Eric Anholt2015-01-291-1/+1
|
* gallium: Replace u_simple_list.h with util/simple_list.hEric Anholt2015-01-282-7/+2
| | | | | | | The code was exactly the same, except util/ has c++ guards and a struct simple_node declaration. Reviewed-by: Marek Olšák <[email protected]>
* vc4: Fix build since 8ed5305d28d9309d651dfec3fbf4349854694694Eric Anholt2015-01-201-1/+1
|
* vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL.Eric Anholt2015-01-151-1/+79
|
* vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet.Eric Anholt2015-01-151-1/+70
| | | | I wanted to read it, so I wrote parsing.
* vc4: Fix CL dumping trying to dump too far.Eric Anholt2015-01-151-2/+2
| | | | | Execution will end at the cl->next, because that's what ct0ea/ct1ea get programmed to.
* vc4: Fix texture type masking.Eric Anholt2015-01-151-1/+1
| | | | | Everything from ETC1 to RGBA64 was getting its top bit dropped, but we didn't use any of those formats.
* vc4: Colormask should apply after all other fragment ops (like logic op).Eric Anholt2015-01-151-9/+18
| | | | | Theoretically it should apply after dithering as well, but dithering for 565 happens in fixed function in the TLB store.
* vc4: No turning unpack arguments into small immediates.Eric Anholt2015-01-151-0/+3
| | | | | Since unpack only happens on things read from the A register file, we have to leave them as something that can be allocated to A (temp or uniform).
* vc4: Move the tests for src needing to be an A register to vc4_qir.c.Eric Anholt2015-01-153-17/+28
| | | | I want it from another location.
* vc4: Don't swap the raddr on instructions doing unpacks.Eric Anholt2015-01-151-0/+5
| | | | | It would mean different unpacking behavior, since only the A file does unpack (with PM==0).
* vc4: Don't let pairing happen with badly mismatched unpack flags.Eric Anholt2015-01-151-0/+39
| | | | | No difference on shader-db, but prevents definite regressions in the blending changes.
* vc4: Don't let pairing happen with badly mismatched pack flags.Eric Anholt2015-01-151-0/+39
| | | | | No difference on shader-db, but will become more important as I introduce more use of pack flags with the blending changes.
* vc4: Fix early Z behavior on hardware.Eric Anholt2015-01-151-2/+1
| | | | | | It turns out the simulator was not treating this bit the same as the RPi, and I'd forgotten to remove it when turning on early Z. The result was that you'd get big chunks of your rendering missing.
* vc4: Clamp the inputs to the blend equation to [0, 1].Eric Anholt2015-01-111-1/+10
| | | | Fixes the remaining ARB_color_buffer_float rendering tests.
* vc4: Add a little helper for clamping to [0,1].Eric Anholt2015-01-111-4/+10
|
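A clamp helper of this sort is just a max followed by a min. The version below is a plain C sketch operating on floats; the names `saturate()` and `blend_add()` are illustrative, and the real helper would emit the equivalent QIR instructions rather than compute values directly.

```c
#include <assert.h>

static float fmin2(float a, float b) { return a < b ? a : b; }
static float fmax2(float a, float b) { return a > b ? a : b; }

/* Clamp x to [0, 1]: max against 0, then min against 1. */
static float saturate(float x)
{
    return fmin2(fmax2(x, 0.0f), 1.0f);
}

/* Additive blend with both colors clamped before the equation is applied. */
static float blend_add(float src, float src_factor, float dst, float dst_factor)
{
    return saturate(src) * src_factor + saturate(dst) * dst_factor;
}

/* Usage: out-of-range values are pulled back into [0, 1]. */
static void saturate_example(void)
{
    assert(saturate(-0.5f) == 0.0f);
    assert(saturate(1.5f) == 1.0f);
    assert(saturate(0.25f) == 0.25f);
}
```

Clamping before blending matters when a shader writes values outside [0, 1]: without it, a source value of 2.0 with a factor of 0.5 would contribute 1.0 instead of 0.5.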
* vc4: Fix up statechange management for uncompiled/compiled FS/VS.Eric Anholt2015-01-112-11/+10
| | | | | | | | No need to recheck the FS compile when the VS source has changed, but there *is* a need to recheck the VS compile when the compiled FS has changed (since the live inputs may change). Fixes es3conform's blend test.
* vc4: Fix clear color setup for RGB565.Eric Anholt2015-01-111-1/+4
| | | | | | | The util_pack_color() thing only sets up the low bits of the union, so only return them, too. Fixes intermittent failure on fbo-alphatest-formats and es3conform's framebuffer-objects test under simulation.
* vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31.Eric Anholt2015-01-112-38/+11
| | | | | | | | | | | | | Turns out this was harmful in code quality: total instructions in shared programs: 39487 -> 38845 (-1.63%) instructions in affected programs: 22522 -> 21880 (-2.85%) This costs us yet another register, which is painful since it means more programs might fail to compile. However, the alternative was causing us trouble where we'd save/restore r3 while it contained a MIN-ed direct texture offset, causing the kernel to fail to validate our shaders (such as in GLB2.7).
* vc4: Allow dead code elimination of VPM reads.Eric Anholt2015-01-102-1/+44
| | | | | | | | This gets a bunch of dead reads out of the CSes, which don't read most attributes generally. total instructions in shared programs: 39753 -> 39487 (-0.67%) instructions in affected programs: 4721 -> 4455 (-5.63%)
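The general shape of dead code elimination is shown below on a toy IR, iterating until nothing more can be removed. `struct inst` and `eliminate_dead_code()` are hypothetical, and the sketch treats an attribute read like any other side-effect-free instruction; it ignores the constraint that VPM reads happen in order, which a real pass would have to respect.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 64

struct inst {
    int dst;                 /* temp written, or -1 if none */
    int src[2];              /* temps read, or -1 if unused */
    bool has_side_effects;   /* e.g. a store; never removed */
    bool dead;               /* set once the instruction is eliminated */
};

/* Mark instructions dead when their result is never read and they have no
 * side effects. Repeats until a sweep finds nothing new, since removing
 * one instruction can leave its sources unused. Returns the number removed. */
static int eliminate_dead_code(struct inst *insts, int n)
{
    int removed = 0;
    bool progress = true;

    while (progress) {
        int uses[MAX_TEMPS] = { 0 };

        progress = false;

        /* Count reads of each temp by the instructions still alive. */
        for (int i = 0; i < n; i++) {
            if (insts[i].dead)
                continue;
            for (int s = 0; s < 2; s++) {
                int t = insts[i].src[s];
                if (t >= 0) {
                    assert(t < MAX_TEMPS);
                    uses[t]++;
                }
            }
        }

        for (int i = 0; i < n; i++) {
            struct inst *inst = &insts[i];

            if (inst->dead || inst->has_side_effects || inst->dst < 0)
                continue;
            assert(inst->dst < MAX_TEMPS);
            if (uses[inst->dst] == 0) {
                inst->dead = true;
                removed++;
                progress = true;
            }
        }
    }
    return removed;
}
```

In a coordinate shader that only needs position, every attribute read that feeds nothing but unused outputs falls out this way, which fits the drop in instruction counts reported above.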