summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Move dst pack setup out to a helper function with more asserts.Eric Anholt2015-10-261-10/+22
|
* vc4: Switch the unpack ops to being unpack flags on a mov.Eric Anholt2015-10-266-123/+42
| | | | | | | | | | | | This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
* vc4: Drop some confused code about pack/unpack handling.Eric Anholt2015-10-261-23/+4
| | | | | | | | | At one point I thought packs and unpacks were in the same field of the instruction. They aren't. These instructions therefore never cause a pack. total instructions in shared programs: 89472 -> 89390 (-0.09%) instructions in affected programs: 15261 -> 15179 (-0.54%)
* vc4: Reduce MOV special-casing in QIR-to-QPU.Eric Anholt2015-10-261-8/+11
| | | | | I'm going to introduce some more types of MOV, which also want the elision of raw MOVs.
* vc4: Fix up the test for whether the unpack can be from r4.Eric Anholt2015-10-263-8/+27
| | | | We can do 16a/16b from float as well. No difference on shader-db.
* vc4: Don't try to follow MOVs across a pack.Eric Anholt2015-10-261-1/+2
|
* vc4: Only copy propagate raw MOVs.Eric Anholt2015-10-261-6/+1
| | | | No problems being fixed, but needed for the new unpack changes.
* vc4: If a QIR source has an unpack set, print it.Eric Anholt2015-10-263-3/+13
| | | | Not used yet, but will be.
* vc4: Fix names of the 16-bit unpacksEric Anholt2015-10-243-6/+6
| | | | | They're only f16-to-f32 on a float operation, otherwise they're i16-to-i32.
* vc4: Don't try to register coalesce into the VPM across non-raw MOVs.Eric Anholt2015-10-241-1/+1
| | | | | No known bugs, just something I noticed while updating optimization code for other changes.
* vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.Eric Anholt2015-10-241-0/+14
| | | | | | | | | | One instruction instead of four, and it turns out you do this a lot for the Over operator. total uniforms in shared programs: 32168 -> 32087 (-0.25%) uniforms in affected programs: 318 -> 237 (-25.47%) total instructions in shared programs: 89830 -> 89472 (-0.40%) instructions in affected programs: 6434 -> 6076 (-5.56%)
* vc4: Fix the test for skipping raw MOVs.Eric Anholt2015-10-243-1/+10
| | | | | I don't know what previous test was trying to do, but it dates back to the first add of vc4_qpu_emit.c. No change to shader-db.
* vc4: Convert blending to being done in 4x8 unorm normally.Eric Anholt2015-10-235-51/+276
| | | | | | | | | | | | | We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though. total uniforms in shared programs: 32065 -> 32168 (0.32%) uniforms in affected programs: 327 -> 430 (31.50%) total instructions in shared programs: 92644 -> 89830 (-3.04%) instructions in affected programs: 15580 -> 12766 (-18.06%) Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
* vc4: Add QIR/QPU support for the 8-bit vector instructions.Eric Anholt2015-10-234-0/+45
|
* vc4: Don't try to CSE non-SSA instructions.Eric Anholt2015-10-231-0/+1
| | | | | | | This can happen when we're doing destination packing -- we don't know what's in the rest of the register. Signed-off-by: Eric Anholt <[email protected]>
* vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE.Eric Anholt2015-10-231-1/+22
|
* vc4: Add a workaround for HW-2116 (state counter wrap fails).Eric Anholt2015-10-233-6/+40
| | | | | | I haven't proven that this happens (I've got other GPU hangs in the way), but the closed driver also does this and it's documented as an errata.
* vc4: Fix missing \n in a perf_debug().Eric Anholt2015-10-231-1/+1
|
* vc4: Use Rob's NIR-based user clip lowering.Eric Anholt2015-10-234-69/+14
|
* vc4: Also dump the decimation mode for resolved stores.Eric Anholt2015-10-231-2/+4
|
* vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG.Eric Anholt2015-10-231-10/+10
|
* vc4: Add a sentinel after simulator buffers for buffer overflow detection.Eric Anholt2015-10-231-1/+11
| | | | | | | | | This is a little bit like the mprotect-based fencing I've experimented with, but it's simple and low overhead. The downside is that only catches writes, not reads. It didn't catch any bad writes on a current piglit run, but may be useful in the future.
* gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINTMarek Olšák2015-10-201-0/+2
| | | | | | | | | | | | | | This avoids a serious r600g bug leading to a GPU hang. The chances this bug will get fixed are pretty low now. I deeply regret listening to others and not pushing this patch, leaving other users with a GPU-crashing driver. Yes, it should be fixed in the compiler and it's ugly, but users couldn't care less about that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720 Cc: 11.0 10.6 <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* vc4: Switch our vertex attr lowering to being NIR-based.Eric Anholt2015-10-202-143/+200
| | | | | | | | | | This exposes more information to NIR's optimization, and should be particularly useful when we do range-based optimization. total uniforms in shared programs: 32066 -> 32065 (-0.00%) uniforms in affected programs: 21 -> 20 (-4.76%) total instructions in shared programs: 93104 -> 92630 (-0.51%) instructions in affected programs: 31901 -> 31427 (-1.49%)
* vc4: Add limited support for ibfe/ubfe.Eric Anholt2015-10-201-0/+42
| | | | | This is just enough to cover our unpack modes, which will be used by some new NIR-based lowering in the next commit.
* gallium: add PIPE_CAP_SHAREABLE_SHADERSMarek Olšák2015-10-201-0/+1
| | | | | | I'll let drivers figure out how to do it. Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Use nir_foreach_variableBoyan Ding2015-10-203-7/+7
| | | | | Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: use nir two-sided-color loweringBoyan Ding2015-10-062-24/+2
| | | | | | | | Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering). No piglit regression. Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Fix a leak of the last color read/write surface on context destroy.Eric Anholt2015-10-061-0/+3
|
* vc4: Fix a memory leak in the simulator case.Eric Anholt2015-10-061-1/+6
| | | | We validate per draw call, and need to free the shader per draw call, too.
* gallium: add per-sample interpolation control into rasterizer statOAeMarek Olšák2015-10-031-0/+1
| | | | | | | | Required by ARB_sample_shading for drivers that don't want a shader variant in st/mesa. Reviewed-by: Ilia Mirkin <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* vc4: Try to pair up instructions when only one of them has PM bitBoyan Ding2015-09-171-47/+76
| | | | | | | | | | | | | Instructions with difference in PM field can actually be paired up if the one without PM doesn't do packing/unpacking and non-NOP packing/unpacking operations from PM instruction aren't added to the other without PM. total instructions in shared programs: 48209 -> 47460 (-1.55%) instructions in affected programs: 11688 -> 10939 (-6.41%) Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: convert from tgsi semantic/index to varying-slotEric Anholt2015-09-167-147/+106
| | | | | | | | | (originally part of previous patch, split out to separate patch by Rob) v2: squash in some fixes from Eric v3: Another fix from Eric for point coords. Signed-off-by: Rob Clark <[email protected]>
* gallium/ttn: Convert to using VARYING_SLOT_* / FRAG_RESULT_*.Eric Anholt2015-09-163-10/+28
| | | | | | | | | | | | | | | This avoids exceeding the size of the .index bitfield since it got truncated, and should make our NIR look more like the NIR that the rest of the NIR developers are working on. v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic() helper, and second patch does the actual conversion. v3: add frag_result_to_tgsi_semantic() helper and don't try to map frag_results to semantic name/index as if they were varying_slot's v4: use VERT_ATTRIB_ for VS inputs v5: Fix vc4 build. Signed-off-by: Rob Clark <[email protected]>
* vc4: Fix build from recent NIR cleanups.Eric Anholt2015-09-141-2/+1
|
* gallium: add PIPE_CAP_TGSI_TXQS to let st know if TXQS is supportedIlia Mirkin2015-09-131-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* vc4: Initialize pack field of qreg to 0 in qir_get_tempBoyan Ding2015-09-041-0/+1
| | | | | | | | | | | | | This avoids generation of undefined packing in qir and qpu instructions, fixing a lot of rendering errors. Fixes 8b36d107fdd (vc4: Pack the unorm-packing bits into a src MUL instruction when possible.) Cc: [email protected] Signed-off-by: Boyan Ding <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nir: Convert the builder to use the new NIR cursor API.Kenneth Graunke2015-08-272-4/+4
| | | | | | | | | | | | | | | | | | The NIR cursor API is exactly what we want for the builder's insertion point. This simplifies the API, the implementation, and is actually more flexible as well. This required a bit of reworking of TGSI->NIR's if/loop stack handling; we now store cursors instead of cf_node_lists, for better or worse. v2: Actually move the cursor in the after_instr case. v3: Take advantage of nir_instr_insert (suggested by Connor). v4: vc4 build fixes (thanks to Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> [v1] Reviewed-by: Jason Ekstrand <[email protected]> [v4] Acked-by: Connor Abbott <[email protected]> [v4]
* gallium: add flags parameter to pipe_screen::context_createMarek Olšák2015-08-262-2/+2
| | | | | | | | This allows creating compute-only and debug contexts. Reviewed-by: Brian Paul <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
* vc4: Actually allow math results to allocate into r4.Eric Anholt2015-08-212-1/+7
| | | | | | | | | | I switched us to tracking whether the results *could* go to r4, but then didn't make a separate register class for the class bits that included r4. Switch the "any" class to actually be "any", and name the "any but r4" class more appropriately. total instructions in shared programs: 96798 -> 94680 (-2.19%) instructions in affected programs: 62736 -> 60618 (-3.38%)
* vc4: Fold the 16-bit integer pack into the instructions generating it.Eric Anholt2015-08-215-30/+22
| | | | | total instructions in shared programs: 97580 -> 96798 (-0.80%) instructions in affected programs: 52826 -> 52044 (-1.48%)
* vc4: Reuse QPU dumping for packing bits in QIR.Eric Anholt2015-08-213-22/+26
|
* vc4: Make _dest variants of qir ALU helpers to provide an explicit dest.Eric Anholt2015-08-212-4/+20
|
* vc4: Use the SSA defs list for figuring out eligible MOVs for copy prop.Eric Anholt2015-08-211-12/+10
| | | | | I thought I'd converted this over previously. It was copy propagating MOVs badly with the new destination packing flags.
* vc4: Add algebraic opt for rcp(1.0).Eric Anholt2015-08-201-0/+8
| | | | | | | | | | | We're generating rcps as part of backend lowering of the packed coordinate in the CS, and we don't want to lower them in NIR because of the extra newton-raphson steps in the common case. However, GLB2.7 is moving a vertex attribute with a 1.0 W component to the position, and that makes us produce some silly RCPs. total instructions in shared programs: 97590 -> 97580 (-0.01%) instructions in affected programs: 74 -> 64 (-13.51%)
* vc4: Allow unpack_8[abcd]_f's src to stay in r4.Eric Anholt2015-08-201-1/+15
| | | | | | | I had QPU emit code to do it, but forgot to flag the register class. total instructions in shared programs: 97974 -> 97590 (-0.39%) instructions in affected programs: 25291 -> 24907 (-1.52%)
* vc4: Pack the unorm-packing bits into a src MUL instruction when possible.Eric Anholt2015-08-205-16/+104
| | | | | | | | Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's only used by the unorm packing and just stuff the pack bits into it. total instructions in shared programs: 98136 -> 97974 (-0.17%) instructions in affected programs: 4149 -> 3987 (-3.90%)
* vc4: Add a QIR helper for whether the op is a MUL type.Eric Anholt2015-08-203-4/+16
|
* vc4: Drop an unused algebraic op.Eric Anholt2015-08-201-9/+0
| | | | NIR now handles this optimization for us.
* vc4: Switch QPU_PACK_SCALED to be two non-SSA instructions.Eric Anholt2015-08-205-21/+19
| | | | | total instructions in shared programs: 98159 -> 98136 (-0.02%) instructions in affected programs: 12279 -> 12256 (-0.19%)