summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Just put USE_VC4_SIMULATOR in DEFINES.Eric Anholt2015-11-222-5/+0
| | | | | | | | In the pipe-loader reworks, it was missed in one of the new directories it was used. Cc: [email protected] Reviewed-by: Emil Velikov <[email protected]>
* vc4: Use nir_channel() to simplify all of our nir_swizzle() cases.Eric Anholt2015-11-212-6/+5
|
* vc4: Fix point size lookup.Eric Anholt2015-11-211-1/+1
| | | | | | | | I think I may have regressed this in the NIR conversion. TGSI-to-NIR is putting the PSIZ in the .x channel, not .w, so we were grabbing some garbage for point size, which ended up meaning just not drawing points. Fixes glean pointAtten and pointsprite.
* vc4: Don't bother lowering uniforms when the same value is used twice.Eric Anholt2015-11-171-13/+33
| | | | | | | | | | | | | DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get the absolute value would get lowered. The lowering doesn't bother to try to restrict the lifetime of the lowered uniforms, so we'd end up register allocation failng due to this on 5 of the tests (More tests still fail in RA, which look like we'll need to reduce lowered uniform lifetimes to fix). No changes on shader-db, though fewer extra MOVs are generated on even glxgears (MOVs pair well enough that it ends up being the same instruction count).
* vc4: Fix uniform reordering to support reading the same uniform twice.Eric Anholt2015-11-171-8/+18
| | | | | This does actually happen in the wild (particularly fabs of a uniform), so we'd like to support it.
* vc4: Fix documentation on vc4_qir_lower_uniforms.c.Eric Anholt2015-11-171-7/+3
|
* vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.Eric Anholt2015-11-175-0/+26
| | | | | | | | It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <[email protected]>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototypeIlia Mirkin2015-11-111-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Avoid loading undefined (newly-allocated) FBO contents.Eric Anholt2015-11-091-0/+17
| | | | | | | Since X has undefined contents in new pixmaps, it will allocate new textures for an FBO and draw to them without an explicit clear. For VC4, it's much faster to emit a clear than the load of the actual undefined memory contents, so just do that instead.
* vc4: Return NULL when we can't make our shadow for a sampler view.Eric Anholt2015-11-091-0/+4
| | | | | | | I'm not sure what the caller does is appropriate (just have a NULL sampler at this slot), but it fixes the immediate crash. Cc: "11.0" <[email protected]>
* vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.Eric Anholt2015-11-092-19/+32
| | | | | | | I was afraid our callers weren't prepared for this, but it looks like at least for resource creation, mesa/st throws an error appropriately. Cc: "11.0" <[email protected]>
* vc4: Add CL dumping for GL_ARRAY_PRIMITIVE.Eric Anholt2015-11-091-1/+16
|
* vc4: Fix a compiler warning.Eric Anholt2015-11-091-1/+1
|
* vc4: When the create ioctl fails, free our cache and try again.Eric Anholt2015-11-041-5/+24
| | | | | | | | | This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <[email protected]>
* vc4: Print the rounded shader size in debug output.Eric Anholt2015-11-041-1/+1
| | | | | It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be suprising.
* vc4: Fix dumping the size of BOs allocated/cached.Eric Anholt2015-11-041-2/+2
| | | | 60MB of cached BOs are a lot less scary than 600MB.
* vc4: Allow user index buffers, to avoid slow readback for shadow IBs.Eric Anholt2015-10-294-10/+25
| | | | | Improves low-settings openarena performance by 31.9975% +/- 0.659931% (n=7).
* gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATSMarek Olšák2015-10-281-0/+1
| | | | | | For ARB_copy_image. Reviewed-by: Brian Paul <[email protected]>
* vc4: Add support for copy propagation with unpack flags present.Eric Anholt2015-10-262-36/+109
| | | | | total instructions in shared programs: 89251 -> 87862 (-1.56%) instructions in affected programs: 52971 -> 51582 (-2.62%)
* vc4: Rewrite the pack instructions as a MOV with a dst pack flagEric Anholt2015-10-263-37/+18
| | | | Another step in reducing the special-casing of instructions.
* vc4: Move dst pack setup out to a helper function with more asserts.Eric Anholt2015-10-261-10/+22
|
* vc4: Switch the unpack ops to being unpack flags on a mov.Eric Anholt2015-10-266-123/+42
| | | | | | | | | | | | This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
* vc4: Drop some confused code about pack/unpack handling.Eric Anholt2015-10-261-23/+4
| | | | | | | | | At one point I thought packs and unpacks were in the same field of the instruction. They aren't. These instructions therefore never cause a pack. total instructions in shared programs: 89472 -> 89390 (-0.09%) instructions in affected programs: 15261 -> 15179 (-0.54%)
* vc4: Reduce MOV special-casing in QIR-to-QPU.Eric Anholt2015-10-261-8/+11
| | | | | I'm going to introduce some more types of MOV, which also want the elision of raw MOVs.
* vc4: Fix up the test for whether the unpack can be from r4.Eric Anholt2015-10-263-8/+27
| | | | We can do 16a/16b from float as well. No difference on shader-db.
* vc4: Don't try to follow MOVs across a pack.Eric Anholt2015-10-261-1/+2
|
* vc4: Only copy propagate raw MOVs.Eric Anholt2015-10-261-6/+1
| | | | No problems being fixed, but needed for the new unpack changes.
* vc4: If a QIR source has an unpack set, print it.Eric Anholt2015-10-263-3/+13
| | | | Not used yet, but will be.
* vc4: Fix names of the 16-bit unpacksEric Anholt2015-10-243-6/+6
| | | | | They're only f16-to-f32 on a float operation, otherwise they're i16-to-i32.
* vc4: Don't try to register coalesce into the VPM across non-raw MOVs.Eric Anholt2015-10-241-1/+1
| | | | | No known bugs, just something I noticed while updating optimization code for other changes.
* vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.Eric Anholt2015-10-241-0/+14
| | | | | | | | | | One instruction instead of four, and it turns out you do this a lot for the Over operator. total uniforms in shared programs: 32168 -> 32087 (-0.25%) uniforms in affected programs: 318 -> 237 (-25.47%) total instructions in shared programs: 89830 -> 89472 (-0.40%) instructions in affected programs: 6434 -> 6076 (-5.56%)
* vc4: Fix the test for skipping raw MOVs.Eric Anholt2015-10-243-1/+10
| | | | | I don't know what previous test was trying to do, but it dates back to the first add of vc4_qpu_emit.c. No change to shader-db.
* vc4: Convert blending to being done in 4x8 unorm normally.Eric Anholt2015-10-235-51/+276
| | | | | | | | | | | | | We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though. total uniforms in shared programs: 32065 -> 32168 (0.32%) uniforms in affected programs: 327 -> 430 (31.50%) total instructions in shared programs: 92644 -> 89830 (-3.04%) instructions in affected programs: 15580 -> 12766 (-18.06%) Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
* vc4: Add QIR/QPU support for the 8-bit vector instructions.Eric Anholt2015-10-234-0/+45
|
* vc4: Don't try to CSE non-SSA instructions.Eric Anholt2015-10-231-0/+1
| | | | | | | This can happen when we're doing destination packing -- we don't know what's in the rest of the register. Signed-off-by: Eric Anholt <[email protected]>
* vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE.Eric Anholt2015-10-231-1/+22
|
* vc4: Add a workaround for HW-2116 (state counter wrap fails).Eric Anholt2015-10-233-6/+40
| | | | | | I haven't proven that this happens (I've got other GPU hangs in the way), but the closed driver also does this and it's documented as an errata.
* vc4: Fix missing \n in a perf_debug().Eric Anholt2015-10-231-1/+1
|
* vc4: Use Rob's NIR-based user clip lowering.Eric Anholt2015-10-234-69/+14
|
* vc4: Also dump the decimation mode for resolved stores.Eric Anholt2015-10-231-2/+4
|
* vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG.Eric Anholt2015-10-231-10/+10
|
* vc4: Add a sentinel after simulator buffers for buffer overflow detection.Eric Anholt2015-10-231-1/+11
| | | | | | | | | This is a little bit like the mprotect-based fencing I've experimented with, but it's simple and low overhead. The downside is that only catches writes, not reads. It didn't catch any bad writes on a current piglit run, but may be useful in the future.
* gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINTMarek Olšák2015-10-201-0/+2
| | | | | | | | | | | | | | This avoids a serious r600g bug leading to a GPU hang. The chances this bug will get fixed are pretty low now. I deeply regret listening to others and not pushing this patch, leaving other users with a GPU-crashing driver. Yes, it should be fixed in the compiler and it's ugly, but users couldn't care less about that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720 Cc: 11.0 10.6 <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* vc4: Switch our vertex attr lowering to being NIR-based.Eric Anholt2015-10-202-143/+200
| | | | | | | | | | This exposes more information to NIR's optimization, and should be particularly useful when we do range-based optimization. total uniforms in shared programs: 32066 -> 32065 (-0.00%) uniforms in affected programs: 21 -> 20 (-4.76%) total instructions in shared programs: 93104 -> 92630 (-0.51%) instructions in affected programs: 31901 -> 31427 (-1.49%)
* vc4: Add limited support for ibfe/ubfe.Eric Anholt2015-10-201-0/+42
| | | | | This is just enough to cover our unpack modes, which will be used by some new NIR-based lowering in the next commit.
* gallium: add PIPE_CAP_SHAREABLE_SHADERSMarek Olšák2015-10-201-0/+1
| | | | | | I'll let drivers figure out how to do it. Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Use nir_foreach_variableBoyan Ding2015-10-203-7/+7
| | | | | Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: use nir two-sided-color loweringBoyan Ding2015-10-062-24/+2
| | | | | | | | Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering). No piglit regression. Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Fix a leak of the last color read/write surface on context destroy.Eric Anholt2015-10-061-0/+3
|
* vc4: Fix a memory leak in the simulator case.Eric Anholt2015-10-061-1/+6
| | | | We validate per draw call, and need to free the shader per draw call, too.