summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* nir: Get rid of *_indirect variants of input/output load/store intrinsicsJason Ekstrand2015-12-104-34/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is some special-casing needed in a competent back-end. However, they can do their special-casing easily enough based on whether or not the offset is a constant. In the mean time, having the *_indirect variants adds special cases a number of places where they don't need to be and, in general, only complicates things. To complicate matters, NIR had no way to convdert an indirect load/store to a direct one in the case that the indirect was a constant so we would still not really get what the back-ends wanted. The best solution seems to be to get rid of the *_indirect variants entirely. This commit is a bunch of different changes squashed together: - nir: Get rid of *_indirect variants of input/output load/store intrinsics - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect - nir/lower_io: Get rid of load/store_foo_indirect - i965/fs: Get rid of load/store_foo_indirect - i965/vec4: Get rid of load/store_foo_indirect - tgsi_to_nir: Get rid of load/store_foo_indirect - ir3/nir: Use the new unified io intrinsics - vc4: Do all uniform loads with byte offsets - vc4/nir: Use the new unified io intrinsics - vc4: Fix load_user_clip_plane crash - vc4: add missing src for store outputs - vc4: Fix state uniforms - nir/lower_clip: Update to the new load/store intrinsics - nir/lower_two_sided_color: Update to the new load intrinsic NIR and i965 changes are Reviewed-by: Kenneth Graunke <[email protected]> NIR indirect declarations and vc4 changes are Reviewed-by: Eric Anholt <[email protected]> ir3 changes are Reviewed-by: Rob Clark <[email protected]> NIR changes are Acked-by: Rob Clark <[email protected]>
* vc4: Enable MSAA.Eric Anholt2015-12-081-2/+3
| | | | | | | | We still have several failures in the newly enabled tests in simulation: sRGB downsampling is done as if it was just linear, stencil blits are not supported on MSAA either, and derivatives are still not supported (breaking some MSAA simulation shaders). So, other than sRGB downsampling quality, things seem to be in good shape.
* vc4: Add support for mapping of MSAA resources.Eric Anholt2015-12-082-8/+105
| | | | | The pipe_transfer_map API requires that we do an implicit downsample/upsample and return a mapping of that.
* vc4: Add support for texel fetches from MSAA resources.Eric Anholt2015-12-085-15/+295
| | | | | | | | This is the core of ARB_texture_multisample. Most of the piglit tests for GL_ARB_texture_multisample require GL 3.0, but exposing support for this lets us use the gallium blitter for multisample resolves. We can sometimes multisample resolve using just the RCL, but that requires that the blit is 1:1, unflipped, and aligned to tile boundaries.
* vc4: Add support for multisample framebuffer operations.Eric Anholt2015-12-087-24/+191
| | | | | | | | This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and GL_SAMPLE_ALPHA_TO_COVAGE. I haven't implemented a dithering function yet, and gallium doesn't give me a good chance to do so for GL_SAMPLE_COVERAGE.
* vc4: Add a workaround for HW-2905, and additional failure I saw with MSAA.Eric Anholt2015-12-081-2/+16
| | | | | | I only stumbled on this while experimenting due to reading about HW-2905. I don't know if the EZ disable in the Z-clear is actually necessary, but go with it for now.
* vc4: Add support for drawing in MSAA.Eric Anholt2015-12-086-50/+148
|
* vc4: Add kernel RCL support for MSAA rendering.Eric Anholt2015-12-085-39/+239
|
* vc4: Rename color_ms_write to color_write.Eric Anholt2015-12-083-22/+21
| | | | | I was thinking this was the only MSAA resolve thing, so it should be noted separately, but actually load/store general also do MSAA resolve.
* vc4: Allow RCL blits to the edge of the surface.Eric Anholt2015-12-081-2/+8
| | | | | | | The recent unaligned fix successfully prevented RCL blits that weren't aligned inside of the surface, but we also want to be able to do RCL blits for the whole surface when the width or height of the surface aren't aligned (we don't care what renders inside of the padding).
* vc4: Add disabled debug printf for describing blits.Eric Anholt2015-12-081-0/+10
| | | | I keep typing variants of this while debugging RCL blits for MSAA.
* vc4: Fix check for tile RCL blits with mismatched y.Eric Anholt2015-12-081-1/+1
| | | | | This was a typo in 3a508a0d94d020d9cd95f8882e9393d83ffac377 that didn't show up in testcases at that moment.
* vc4: Fix compiler warning from size_t change.Eric Anholt2015-12-081-1/+1
| | | | I missed this when bringing over the kernel changes.
* gallium/drivers: Sanitize NULL checks into canonical formEdward O'Callaghan2015-12-061-1/+1
| | | | | | | | | | Use NULL tests of the form `if (ptr)' or `if (!ptr)'. They do not depend on the definition of the symbol NULL. Further, they provide the opportunity for the accidental assignment, are clear and succinct. Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* vc4: Fix accidental scissoring when scissor is disabled.Eric Anholt2015-12-051-5/+23
| | | | | | | | Even if the rasterizer has scissor disabled, we'll have whatever vc4->scissor bounds were last set when someone set up a scissor, so we shouldn't clip to them in that case. Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.
* vc4: Disable RCL blitting when scissors are enabled.Eric Anholt2015-12-051-0/+3
| | | | | | | | We could potentially handle scissored blits when they're tile aligned, but it doesn't seem worth it. If you're doing a scissored blit, you're probably a testcase. Fixes piglit's fbo-scissor-blit fbo
* vc4: Bring over cleanups from submitting to the kernel.Eric Anholt2015-12-054-87/+78
|
* vc4: Add debug dumping of MSAA surfaces.Eric Anholt2015-12-042-6/+145
|
* vc4: Add support for laying out MSAA resources.Eric Anholt2015-12-041-5/+20
| | | | | | For MSAA, we store full resolution tile buffer contents, which have their own tiling format. Since they're full resolution buffers, we have to align their size to full tiles.
* vc4: Add support for storing sample mask.Eric Anholt2015-12-045-0/+24
| | | | | From the API perspective, writing 1 bits can't turn on pixels that were off, so we AND it with the sample mask from the payload.
* vc4: Fix up tile alignment checks for blitting using just an RCL.Eric Anholt2015-12-041-6/+22
| | | | | | | | We were checking that the blit started at 0 and was 1:1, but not that it went to the full width of the surface, or that the width was aligned to a tile. We then told it to blit to the full width/height of the surface, causing contents to be stomped in a bunch of MSAA tests that happen to include half-screen-width blits to 0,0.
* vc4: Add support for loading sample mask.Eric Anholt2015-12-046-1/+19
|
* vc4: Add the RCL to CL debug dumping when in simulator mode.Eric Anholt2015-12-031-0/+6
| | | | | | We can't dump it in the real driver, since the kernel doesn't give us a handle to it (except after a GPU hang, using a root ioctl). In the simulator we can.
* vc4: Just put USE_VC4_SIMULATOR in DEFINES.Eric Anholt2015-11-222-5/+0
| | | | | | | | In the pipe-loader reworks, it was missed in one of the new directories it was used. Cc: [email protected] Reviewed-by: Emil Velikov <[email protected]>
* vc4: Use nir_channel() to simplify all of our nir_swizzle() cases.Eric Anholt2015-11-212-6/+5
|
* vc4: Fix point size lookup.Eric Anholt2015-11-211-1/+1
| | | | | | | | I think I may have regressed this in the NIR conversion. TGSI-to-NIR is putting the PSIZ in the .x channel, not .w, so we were grabbing some garbage for point size, which ended up meaning just not drawing points. Fixes glean pointAtten and pointsprite.
* vc4: Don't bother lowering uniforms when the same value is used twice.Eric Anholt2015-11-171-13/+33
| | | | | | | | | | | | | DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get the absolute value would get lowered. The lowering doesn't bother to try to restrict the lifetime of the lowered uniforms, so we'd end up register allocation failng due to this on 5 of the tests (More tests still fail in RA, which look like we'll need to reduce lowered uniform lifetimes to fix). No changes on shader-db, though fewer extra MOVs are generated on even glxgears (MOVs pair well enough that it ends up being the same instruction count).
* vc4: Fix uniform reordering to support reading the same uniform twice.Eric Anholt2015-11-171-8/+18
| | | | | This does actually happen in the wild (particularly fabs of a uniform), so we'd like to support it.
* vc4: Fix documentation on vc4_qir_lower_uniforms.c.Eric Anholt2015-11-171-7/+3
|
* vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.Eric Anholt2015-11-175-0/+26
| | | | | | | | It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <[email protected]>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototypeIlia Mirkin2015-11-111-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Avoid loading undefined (newly-allocated) FBO contents.Eric Anholt2015-11-091-0/+17
| | | | | | | Since X has undefined contents in new pixmaps, it will allocate new textures for an FBO and draw to them without an explicit clear. For VC4, it's much faster to emit a clear than the load of the actual undefined memory contents, so just do that instead.
* vc4: Return NULL when we can't make our shadow for a sampler view.Eric Anholt2015-11-091-0/+4
| | | | | | | I'm not sure what the caller does is appropriate (just have a NULL sampler at this slot), but it fixes the immediate crash. Cc: "11.0" <[email protected]>
* vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.Eric Anholt2015-11-092-19/+32
| | | | | | | I was afraid our callers weren't prepared for this, but it looks like at least for resource creation, mesa/st throws an error appropriately. Cc: "11.0" <[email protected]>
* vc4: Add CL dumping for GL_ARRAY_PRIMITIVE.Eric Anholt2015-11-091-1/+16
|
* vc4: Fix a compiler warning.Eric Anholt2015-11-091-1/+1
|
* vc4: When the create ioctl fails, free our cache and try again.Eric Anholt2015-11-041-5/+24
| | | | | | | | | This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <[email protected]>
* vc4: Print the rounded shader size in debug output.Eric Anholt2015-11-041-1/+1
| | | | | It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be suprising.
* vc4: Fix dumping the size of BOs allocated/cached.Eric Anholt2015-11-041-2/+2
| | | | 60MB of cached BOs are a lot less scary than 600MB.
* vc4: Allow user index buffers, to avoid slow readback for shadow IBs.Eric Anholt2015-10-294-10/+25
| | | | | Improves low-settings openarena performance by 31.9975% +/- 0.659931% (n=7).
* gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATSMarek Olšák2015-10-281-0/+1
| | | | | | For ARB_copy_image. Reviewed-by: Brian Paul <[email protected]>
* vc4: Add support for copy propagation with unpack flags present.Eric Anholt2015-10-262-36/+109
| | | | | total instructions in shared programs: 89251 -> 87862 (-1.56%) instructions in affected programs: 52971 -> 51582 (-2.62%)
* vc4: Rewrite the pack instructions as a MOV with a dst pack flagEric Anholt2015-10-263-37/+18
| | | | Another step in reducing the special-casing of instructions.
* vc4: Move dst pack setup out to a helper function with more asserts.Eric Anholt2015-10-261-10/+22
|
* vc4: Switch the unpack ops to being unpack flags on a mov.Eric Anholt2015-10-266-123/+42
| | | | | | | | | | | | This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
* vc4: Drop some confused code about pack/unpack handling.Eric Anholt2015-10-261-23/+4
| | | | | | | | | At one point I thought packs and unpacks were in the same field of the instruction. They aren't. These instructions therefore never cause a pack. total instructions in shared programs: 89472 -> 89390 (-0.09%) instructions in affected programs: 15261 -> 15179 (-0.54%)
* vc4: Reduce MOV special-casing in QIR-to-QPU.Eric Anholt2015-10-261-8/+11
| | | | | I'm going to introduce some more types of MOV, which also want the elision of raw MOVs.
* vc4: Fix up the test for whether the unpack can be from r4.Eric Anholt2015-10-263-8/+27
| | | | We can do 16a/16b from float as well. No difference on shader-db.
* vc4: Don't try to follow MOVs across a pack.Eric Anholt2015-10-261-1/+2
|
* vc4: Only copy propagate raw MOVs.Eric Anholt2015-10-261-6/+1
| | | | No problems being fixed, but needed for the new unpack changes.