summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* Fix a few typosZoë Blade2015-04-271-1/+1
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* vc4: Don't try to use color load/stores to blit across format changes.Eric Anholt2015-04-151-0/+3
| | | | | | We could potentially support the right combination of 8888 to 565, but the important thing for now is to not mix up our orderings of 8888. Fixes fbo-copyteximage regressions.
* vc4: Don't try to use color load/stores to do depth/stencil blits.Eric Anholt2015-04-151-0/+3
| | | | | Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which does blits to work around baselevel/lastlevel).
* vc4: Update the shadow texture for public textures on every draw.Eric Anholt2015-04-153-7/+20
| | | | | We don't know who else has written to it, so we'd better update it every time. This makes the gears spin in X again.
* vc4: Hook up VC4_DEBUG=perf to some useful printfs.Eric Anholt2015-04-153-1/+16
|
* vc4: Add a blitter path using just the render thread.Eric Anholt2015-04-131-0/+127
| | | | | This accelerates the path for generating the shadow tiled texture when asked to sample from a raster texture (typical in glamor).
* vc4: Allow submitting jobs with no bin CL in validation.Eric Anholt2015-04-134-11/+19
| | | | | | For blitting, we want to fire off an RCL-only job. This takes a bit of tweaking in our validation and the simulator support (and corresponding new code in the kernel).
* vc4: Move the blit code to a separate file.Eric Anholt2015-04-134-64/+92
| | | | | There will be other blit code showing up, and it seems like the place you'd look.
* vc4: Separate out a bit of code for submitting jobs to the kernel.Eric Anholt2015-04-134-90/+139
| | | | | | I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).
* vc4: When asked to sample from a raster texture, make a shadow tiled copy.Eric Anholt2015-04-131-2/+9
| | | | | | | | | | | So, it turns out my simulator doesn't *quite* match the hardware. And the errata about raster textures tells you most of what's wrong, but there's still stuff wrong after that. Instead, if we're asked to sample from raster, we'll just blit it to a tiled temporary. Raster textures should only be screen scanout, and word is that it's faster to copy to tiled using the tiling engine first than to texture from an entire raster texture, anyway.
* vc4: Fix off-by-one in branch target validation.Eric Anholt2015-04-131-1/+1
|
* vc4: Use NIR-level lowering for idiv.Eric Anholt2015-04-131-11/+1
| | | | This fixes the idiv tests in piglit.
* vc4: Add a bunch of type conversions.Eric Anholt2015-04-131-0/+12
| | | | | | These are required to get piglit's idiv tests working. The unsigned<->float conversions are wrong, but are good enough to get piglit's small ranges of values working.
* vc4: Use the blit interface for updating shadow textures.Eric Anholt2015-04-131-13/+31
| | | | | This lets us plug in a better blit implementation and have it impact the shadow update, too.
* vc4: Remove dead fields from vc4_surface.Eric Anholt2015-04-131-3/+0
|
* vc4: Skip sending down the clear colors if not clearing.Eric Anholt2015-04-131-5/+7
|
* vc4: Sync with kernel changes to relax BCL versus RCL validation.Eric Anholt2015-04-131-22/+3
| | | | There was no reason to tie the two packets' values together.
* vc4: Fix another space allocation mistake.Eric Anholt2015-04-131-0/+1
| | | | | | We're over-allocating our BCL in vc4_draw.c, so this never mattered. However, new RCL-only blit support might end up here without having set up any BCL contents.
* vc4: Add missed accounting for the size of the semaphore.Eric Anholt2015-04-131-0/+2
| | | | This wouldn't have mattered except in the worst case scenario RCL setup.
* vc4: Add support for nir_iabs.Eric Anholt2015-04-021-0/+5
| | | | | Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.
* vc4: Add shader-db dumping of NIR instruction count.Eric Anholt2015-04-011-0/+29
| | | | | | | | I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).
* vc4: Convert to consuming NIR.Eric Anholt2015-04-015-720/+707
| | | | | | | | | | | | | | | | | | | NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
* vc4: Tell shader-db how big our UBOs are, if present.Eric Anholt2015-04-011-0/+6
| | | | I had regressed them for a while with the NIR work.
* vc4: Drop integer multiplies with 0 to moves of 0.Eric Anholt2015-03-301-0/+8
| | | | | | | | This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)
* vc4: Add a constant folding pass.Eric Anholt2015-03-304-0/+113
| | | | | | | | | | | | This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)
* vc4: Don't bother masking out the low 24 bits for integer multipliesEric Anholt2015-03-301-12/+8
| | | | | | | | | | The hardware just uses the low 24 lines, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)
* vc4: Make integer multiply use 24 bits for the low parts.Eric Anholt2015-03-301-5/+5
| | | | | The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).
* vc4: Add a dump-the-surface-contents routine.Eric Anholt2015-03-242-0/+101
| | | | | This has been useful once again while trying to debug stride issues between render targets and texturing.
* vc4: Fix pitch alignment of linear textures.Eric Anholt2015-03-241-1/+1
| | | | | Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.
* vc4: Write the alignment of level width consistently in validation.Eric Anholt2015-03-241-2/+2
| | | | | | 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.
* vc4: Fix use of a bool as an enum.Eric Anholt2015-03-241-1/+1
| | | | The enum compared to was 0, so it worked out, but it sure looked wrong.
* vc4: Decide the HW's format before laying out the miptree.Eric Anholt2015-03-241-3/+3
| | | | | | I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.
* vc4: Use our device-specific ioctls for create/mmap.Eric Anholt2015-03-241-15/+36
| | | | | | They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.
* vc4: Make a new #define for making code conditional on the simulator.Eric Anholt2015-03-243-15/+25
| | | | | | I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.
* vc4: Add some useful debug printfs for miptrees.Eric Anholt2015-03-241-0/+37
| | | | I keep rewriting these.
* gallium: implement get_device_vendor() for existing driversGiuseppe Bilotta2015-03-231-0/+1
| | | | | | | | | The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor. Signed-off-by: Giuseppe Bilotta <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* gallium: add FMA and DFMA opcodes (v3)Marek Olšák2015-03-161-0/+1
| | | | | | | | | Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Update to current kernel sources.Eric Anholt2015-02-248-49/+93
| | | | | | New BO create and mmap ioctls are added. The submit ABI gains a flags argument, and the pointers are fixed at 64-bit. Shaders are now fixed at the start of their BOs.
* vc4: Keep an array of pointers to instructions defining the temps around.Eric Anholt2015-02-198-68/+67
| | | | | The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.
* vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.Eric Anholt2015-02-193-45/+49
| | | | | I may want them in optimization passes, and they're not really particular to the program translation stage.
* vc4: Enforce one-uniform-per-instruction after optimization.Eric Anholt2015-02-196-50/+209
| | | | | | | | | | | | | | | This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first. total uniforms in shared programs: 13457 -> 13433 (-0.18%) uniforms in affected programs: 1524 -> 1500 (-1.57%) total instructions in shared programs: 40198 -> 40019 (-0.45%) instructions in affected programs: 6027 -> 5848 (-2.97%) I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.
* vc4: Rename add_uniform() to qir_uniform().Eric Anholt2015-02-191-15/+15
|
* vc4: Shut up runtime warnings about new pipe caps.Eric Anholt2015-02-191-0/+2
|
* gallium: add interface and state tracker support for GL_AMD_pinned_memoryMarek Olšák2015-02-171-0/+1
| | | | | | v2: add alignment restrictions to docs, fix indentation in headers Reviewed-by: Christian König <[email protected]>
* vc4: Make SF be a flag on the QIR instructions.Eric Anholt2015-02-128-51/+47
| | | | | | | | | | | | Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and kill thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%)
* gallium: Add MULTISAMPLE_Z_RESOLVE capAxel Davy2015-02-061-0/+1
| | | | | | | | | | | | | | | | Resolving a multisampled depth texture into a single sampled texture is supported on >= SM4.1 hw. It is possible some previous hw support it. The ability was tested on radeonsi and nvc0. Apparently is is also supported for radeon >= r700. This patch adds the MULTISAMPLE_Z_RESOLVE cap and add it to the drivers. It is advertised for drivers for which it is sure the ability is supported. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* gallium: add a cap to determine whether the driver supports offset_clampIlia Mirkin2015-02-021-0/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* dir-locals.el: Don't set variables for non-programming modesNeil Roberts2015-02-022-2/+2
| | | | | | | | | | | | | | This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. v2: Apply to all the .dir-locals.el files, not just the one in the root directory. Acked-by: Michel Dänzer <[email protected]>
* vc4: Kill a bunch of color write calculation when colormask is all off.Eric Anholt2015-02-011-8/+35
| | | | | | | | | | | | | I could have done this in the bit that generates the ANDs and ORs, but it's probably generally useful. Sadly, I still need this even if I move to NIR, because I can't yet express my read of the destination color in NIR, which I would need to move my blend/logicop/colormask handling into NIR. total uniforms in shared programs: 13497 -> 13455 (-0.31%) uniforms in affected programs: 101 -> 59 (-41.58%) total instructions in shared programs: 40797 -> 40296 (-1.23%) instructions in affected programs: 1639 -> 1138 (-30.57%)
* vc4: Dump the VPM read index in QIR disasm.Eric Anholt2015-02-011-4/+9
| | | | | Since the VPM reads have to be in order, it's useful to see their indices in the dump.