aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Coalesce MOVs into VPM with the instructions generating the values.Eric Anholt2014-12-184-15/+143
| | | | | total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)
* vc4: Redefine VPM writes as a (destination) QIR register file.Eric Anholt2014-12-173-7/+19
| | | | | This will let me coalesce the VPM writes into the instructions generating the values.
* gallium: remove support for GCC older than 4.2.0Timothy Arceri2014-12-181-1/+1
| | | | | | Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-1713-46/+283
| | | | | | | | | | | | | | | | | | | | | | Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Move follow_movs() to common QIR code.Eric Anholt2014-12-173-11/+12
| | | | I want this from other passes.
* vc4: Fix missing newline for load immediate instruction disasm.Eric Anholt2014-12-171-4/+4
|
* mesa: Remove unnecessary -f from $(RM).Matt Turner2014-12-173-5/+5
| | | | $(RM) includes -f.
* gallium: Add egl and gbm to distribution.Matt Turner2014-12-171-0/+4
|
* targets/xvmc: Add uninstall hooks to handle megadriver hardlinks.Matt Turner2014-12-171-0/+5
|
* targets/vdpau: Add uninstall hooks to handle megadriver hardlinks.Matt Turner2014-12-171-0/+5
|
* targets/vdpau: Add clean-local rule to remove .lib links.Matt Turner2014-12-171-0/+6
|
* vc4: Add a userspace BO cache.Eric Anholt2014-12-174-4/+175
| | | | | | | | | | Since our kernel BOs require CMA allocation, and the use of them requires new mmaps, it's pretty expensive and we should avoid it if possible. Copying my original design for Intel, make a userspace cache that reuses BOs that haven't been shared to other processes but frees BOs that have sat in the cache for over a second. Improves glxgears framerate on RPi by around 30%.
* vc4: Add dmabuf support.Eric Anholt2014-12-174-24/+78
| | | | | | This gets DRI3 working on modesetting with glamor. It's not enabled under simulation, because it looks like handing our dumb-allocated buffers off to the server doesn't actually work for the server's rendering.
* vc4: Drop a weird argument in the BOs-from-handles API.Eric Anholt2014-12-173-7/+5
|
* draw: revert using correct order for prim decomposition.Roland Scheidegger2014-12-171-1/+3
| | | | | | | | | This reverts db3dfcfe90a3d27e6020e0d3642f8ab0330e57be. The commit was correct but we've got some precision problems later in llvmpipe (or possibly in draw clip) due to the vertices coming in in different order, causing some internal test failures. So revert for now. (Will only affect drivers which actually support constant-interpolated attributes and not just flatshading.)
* vc4: Add support for turning add-based MOVs to muls for pairing.Eric Anholt2014-12-161-2/+49
| | | | | total instructions in shared programs: 43053 -> 40795 (-5.24%) instructions in affected programs: 37996 -> 35738 (-5.94%)
* vc4: Add a helper for changing a field in an instruction.Eric Anholt2014-12-162-11/+12
|
* vc4: Fix the name of qpu_waddr_ignores_ws().Eric Anholt2014-12-161-5/+5
| | | | We're deciding about the WS bit, not PM.
* gallium: remove support for GCC older than 4.1.0Timothy Arceri2014-12-172-5/+5
| | | | | | | Signed-off-by: Timothy Arceri <[email protected]> Reviewed-By: Jose Fonseca <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vc4: Add support for enabling early Z discards.Eric Anholt2014-12-161-0/+18
| | | | This is the same basic logic from the original Broadcom driver.
* nvc0: add missed PIPE_CAP_VERTEXID_NOBASEIlia Mirkin2014-12-151-0/+1
| | | | | | Commit ade8b26bf missed adding this cap to nvc0. Signed-off-by: Ilia Mirkin <[email protected]>
* draw: implement support for the VERTEXID_NOBASE and BASEVERTEX semantics.Roland Scheidegger2014-12-164-19/+47
| | | | | | This fixes 4 vertexid related piglit tests with llvmpipe due to switching behavior of vertexid to the one gl expects. (Won't fix non-llvm draw path since we don't get the basevertex currently.)
* gallium: add TGSI_SEMANTIC_VERTEXID_NOBASE and TGSI_SEMANTIC_BASEVERTEXRoland Scheidegger2014-12-1619-6/+84
| | | | | | | | | | | | | | | | | | | Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not supporting vertex ids with base vertex offset applied (so, only support d3d10-style vertex ids) will get such a d3d10-style vertex id instead - with the caveat they'll also need to handle the basevertex system value too (this follows what core mesa already does). Additionally, this is also useful for other state trackers (for instance llvmpipe / draw right now implement the d3d10 behavior on purpose, but with different semantics it can just do both). Doesn't do anything yet. And fix up the docs wrt similar values. v2: incorporate feedback from Brian and others, better names, better docs. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600g/sb: implement r600 gpr index workaround. (v3.1)Dave Airlie2014-12-164-9/+57
| | | | | | | | | | | | | | | | | | | | | | | r600, rv610 and rv630 all have a bug in their GPR indexing and how the hw inserts access to PV. If the base index for the src is the same as the dst gpr in a previous group, then it will use PV instead of using the indexed gpr correctly. The workaround is to insert a NOP when you detect this. v2: add second part of fix detecting DST rel writes followed by same src base index reads. v3: forget adding stuff to structs, just iterate over the previous node group again, makes it more obvious. v3.1: drop local_nop. Fixes ~200 piglit regressions on rv635 since SB was introduced. Reviewed-By: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g/sb: fix issues with loops created for switchVadim Girlin2014-12-165-4/+16
| | | | Signed-off-by: Dave Airlie <[email protected]>
* Revert "r600g/sb: fix issues cause by GLSL switching to loops for switch"Dave Airlie2014-12-161-38/+12
| | | | | | This reverts commit 7b0067d23a6f64cf83c42e7f11b2cd4100c569fe. Vadim's patch fixes this a lot better.
* vc4: Add support for 32-bit signed norm/scaled vertex attrs.Eric Anholt2014-12-152-0/+18
| | | | | 32-bit unsigned would require some adjustments to handle values >= 0x80000000.
* vc4: Add support for 16-bit signed/unsigned norm/scaled vertex attrs.Eric Anholt2014-12-156-6/+94
|
* vc4: Rename the 16-bit unpack #define.Eric Anholt2014-12-152-4/+4
| | | | | It's only an f16 conversion if you're doing a float operation, otherwise it's 16 bit signed to 32-bit signed.
* vc4: Add support for 8-bit unnormalized vertex attrs.Eric Anholt2014-12-156-11/+69
|
* vc4: Refactor vertex attribute conversions a bit.Eric Anholt2014-12-151-25/+40
| | | | There was just way too much indentation.
* vc4: Fix use of r3 as a temp in 8-bit unpacking.Eric Anholt2014-12-151-10/+7
| | | | | We're actually allocating out of r3 now, and I missed it because I'd typed this one as qpu_rn(3) instead of qpu_r3().
* vc4: Rename UNPACK_8* to UNPACK_8*_F.Eric Anholt2014-12-155-20/+20
| | | | | There is an equivalent unpack function without conversion to float if you use an integer operation instead.
* vc4: Add support for UMAD.Eric Anholt2014-12-151-0/+9
|
* vc4: 0-initialize the screen again.Eric Anholt2014-12-151-1/+1
| | | | I typoed this when rebasing the memory leak fixes.
* vc4: Fix leaks of the compiled shaders' keys.Eric Anholt2014-12-141-1/+1
|
* vc4: Fix leaks of the CL contents.Eric Anholt2014-12-142-1/+6
|
* vc4: Fix leak of vc4_bos stashed in the context.Eric Anholt2014-12-141-0/+5
|
* vc4: Fix leak of the compiled shader programs in the cache.Eric Anholt2014-12-143-0/+24
|
* vc4: Fix leak of a copy of the scheduled QPU instructions.Eric Anholt2014-12-141-2/+3
| | | | They're copied into a vc4_bo after compiling is done.
* vc4: Switch to using the util/ hash table.Eric Anholt2014-12-142-54/+33
| | | | | No performance difference on a microbenchmark with norast that should hit it enough to have mattered, n=220.
* vc4: Fix leak of simulator memory on screen cleanup.Eric Anholt2014-12-142-3/+6
|
* vc4: Fix a leak of the simulator's exec BO's actual vc4_bo.Eric Anholt2014-12-141-0/+1
|
* util/hash_table: Rework the API to know about hashingJason Ekstrand2014-12-141-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the hash_table API required the user to do all of the hashing of keys as it passed them in. Since the hashing function is intrinsically tied to the comparison function, it makes sense for the hash table to know about it. Also, it makes for a somewhat clumsy API as the user is constantly calling hashing functions many of which have long names. This is especially bad when the standard call looks something like _mesa_hash_table_insert(ht, _mesa_pointer_hash(key), key, data); In the above case, there is no reason why the hash table shouldn't do the hashing for you. We leave the option for you to do your own hashing if it's more efficient, but it's no longer needed. Also, if you do do your own hashing, the hash table will assert that your hash matches what it expects out of the hashing function. This should make it harder to mess up your hashing. v2: change to call the old entrypoint "pre_hashed" rather than "with_hash", like cworth's equivalent change upstream (change by anholt, acked-in-general by Jason). Signed-off-by: Jason Ekstrand <[email protected]> Signed-off-by: Eric Anholt <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a4xx: mipmapsRob Clark2014-12-134-24/+80
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2014-12-135-12/+20
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add is_a3xx()/is_a4xx() helpersRob Clark2014-12-133-7/+27
| | | | | | | | A bunch of open-coded 'gpu_id > 300's seems like it will eventually cause problems with future generations. There were already a few minor problems with caps for features that still need additional work on a4xx. Signed-off-by: Rob Clark <[email protected]>
* freedreno: helper to calc layer/level offsetRob Clark2014-12-135-21/+31
| | | | | | | Rather than duplicating this everywhere. Especially as on a4xx the layout of layers and levels differs based on texture type. Signed-off-by: Rob Clark <[email protected]>
* util: add missing closing brace for __cplusplusBrian Paul2014-12-121-0/+6
|
* gallium: Remove Android files from distribution.Matt Turner2014-12-1220-27/+12
| | | | Android builds Mesa from git, so there don't need to be in the tarball.