aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/v3d
Commit message (Collapse)AuthorAgeFilesLines
* v3d: Use the new lower_to_scratch implementation for indirects on temps.Eric Anholt2019-04-121-1/+2
| | | | | | | | | | | | | We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.
* v3d: Detect the correct number of QPUs and use it to fix the spill size.Eric Anholt2019-04-122-4/+10
| | | | | We were missing a * 4 even if the particular hardware matched our assumption.
* v3d: Add Compute Shader compilation support.Eric Anholt2019-04-126-79/+258
| | | | | | | | While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.
* v3d: Drop a note for the future about PIPE_CAP_PACKED_UNIFORMS.Eric Anholt2019-04-121-0/+7
|
* nir/i965/freedreno/vc4: add a bindless bool to type size functionsTimothy Arceri2019-04-121-1/+1
| | | | | | | This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <[email protected]>
* st: Lower uniforms in st in the !PIPE_CAP_PACKED_UNIFORMS case as well.Eric Anholt2019-04-101-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o at the st level instead of the backend, packing uniforms with no padding at all, and lowering to UBOs. Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS leads to the driver needing to either link against the type size function in mesa/st, or duplicating it in the backend. Given that all backends want this lower-io as far as I can tell, just move it to mesa/st to resolve the link issue and avoid the driver author needing to understand st's uniforms layout. Incidentally, fixes uniform layout failures in nouveau in: dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex and I think in Lima as well. v2: fix indents Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Get rid of global registersJason Ekstrand2019-04-091-1/+0
| | | | | | | | | We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Don't try to use the TFU blit path if a scissor is enabled.Eric Anholt2019-04-041-1/+2
| | | | | | | | We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface. Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.") Fixes piglit fbo-scissor-blit.
* v3d: Bump the maximum texture size to 4k for V3D 4.x.Eric Anholt2019-04-043-2/+29
| | | | | | | 4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: [email protected]
* v3d: Add support for handling OOM signals from the simulator.Eric Anholt2019-04-043-14/+78
| | | | | | I have v3d allocating enough initial allocation memory that we've been passing tests without it, but to match kernel behavior more it would be good to actually exercise the OOM path.
* gallium: add writable_bitmask parameter into set_shader_buffersMarek Olšák2019-04-041-1/+2
| | | | | | | to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <[email protected]>
* v3d: Upload all of UBO[0] if any indirect load occurs.Eric Anholt2019-03-211-38/+19
| | | | | | | | | | | | | | | The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).
* v3d: Move constant offsets to UBO addresses into the main uniform stream.Eric Anholt2019-03-212-2/+6
| | | | | | | | | | We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
* v3d: Rename v3d_tmu_config_data to v3d_unit_data.Eric Anholt2019-03-211-4/+4
| | | | | | I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.
* gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.Kenneth Graunke2019-03-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The glMemoryBarrier() function makes shader memory stores ordered with respect to things specified by the given bits. Until now, st/mesa has ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT, saying that drivers should implicitly perform the needed flushing. This seems like a pretty big assumption to make. Instead, this commit opts to translate them to new PIPE_BARRIER bits, and adjusts existing drivers to continue ignoring them (preserving the current behavior). The i965 driver performs actions on these memory barriers. Shader memory stores go through a "data cache" which is separate from the render cache and other read caches (like the texture cache). All memory barriers need to flush the data cache (to ensure shader memory stores are visible), and possibly invalidate read caches (to ensure stale data is no longer visible). The driver implicitly flushes for most caches, but not for data cache, since ARB_shader_image_load_store introduced MemoryBarrier() precisely to order these explicitly. I would like to follow i965's approach in iris, flushing the data cache on any MemoryBarrier() call, so I need st/mesa to actually call the pipe->memory_barrier() callback. Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on the iris driver. Roland said this looks reasonable to him. Reviewed-by: Eric Anholt <[email protected]>
* v3d: Expose the dma-buf modifiers query.Eric Anholt2019-03-191-0/+29
| | | | | | | | | | This allows DRI3 to pick between UIF and raster according to whether we're pageflipping or not and whether the pageflipping display can do UIF, avoiding copies for the windowed/composited case that previously was forced to linear. Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783% +/- 13.1719% (n=3)
* v3d: Allow the UIF modifier with renderonly.Eric Anholt2019-03-191-38/+52
| | | | | | | We ask the other side to make a buffer with the right number of pages, and then just store the UIF in it. This avoids an extra silent copy of the buffer from linear to UIF if it gets used for texturing (X11 copy-based swapbuffers, GL compositors).
* v3d: Always lay out shared tiled buffers with UIF_TOP set.Eric Anholt2019-03-191-4/+6
| | | | | The samplers are already ready for this, we just needed to make sure that layout chose UIF for level 0.
* v3d: Use shared drm_find_modifier utilAlyssa Rosenzweig2019-03-141-15/+3
| | | | | Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* v3d: Fix leak of the renderonly struct on screen destruction.Eric Anholt2019-03-121-0/+1
| | | | | | This makes v3d match vc4's destroy path. Fixes: e113b21cb779 ("v3d: Add renderonly support.")
* v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER.Eric Anholt2019-03-121-0/+3
| | | | | | This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.
* tgsi_to_nir: Produce optimized NIR for a given pipe_screen.Timur Kristóf2019-03-051-1/+1
| | | | | | | | | | | | | | | | | | | With this patch, tgsi_to_nir will output NIR that is tailored to the given pipe, by reading its capabilities and adjusting the NIR code to those capabilities similarly to how glsl_to_nir works. It also adds an optimization loop that brings the output NIR in line with what glsl_to_nir outputs. This is necessary for the same reason why glsl_to_nir has its own optimization loop: currently not every driver does these optimizations yet. For uses which cannot pass a pipe_screen we also keep a variant called tgsi_to_nir_noscreen which keeps the old behavior. Signed-Off-By: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Tested-by: Rob Clark <[email protected]> Acked-By: Eric Anholt <[email protected]>
* v3d: Fix build of NEON code with Mesa's cflags not targeting NEON.Eric Anholt2019-03-011-3/+17
| | | | | | | | v3d may be built as part of a set of drivers in a system not requiring NEON, but we know V3D devices will be paired with CPUs with NEON so we should be able to use this asm. Fixes: 0c05198d6b5b ("v3d: Always enable the NEON utile load/store code.")
* v3d: Stop tracking num_inputs for VPM loads.Eric Anholt2019-02-182-2/+2
| | | | | It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field.Eric Anholt2019-02-181-2/+7
| | | | | | | | | | | Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <[email protected]> Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
* v3d: Sync indirect draws on the last rendering.Eric Anholt2019-02-181-2/+2
| | | | | | Fixes intermittent fails in dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices and others (particularly when run as part of a CTS run)
* v3d: Clear the GMP on initialization of the simulator.Eric Anholt2019-02-181-0/+1
| | | | | | | Otherwise, we might have pages accessible that shouldn't be and miss out on errors. This is unlikely for most tests since v3d_hw_get_mem() is big enough that it'll be a freshly zeroed mmap, but if screens are destroyed and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
* drm-uapi: use local files, not system libdrmEric Engestrom2019-02-144-5/+5
| | | | | | | | | There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* gallium: add PIPE_CAP_MAX_VARYINGSKarol Herbst2019-02-071-0/+3
| | | | | | | | | | | | | | | | | Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
* v3d: Store the actual mask of color buffers present in the key.Eric Anholt2019-02-051-9/+10
| | | | | | | If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
* v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs.Eric Anholt2019-02-051-1/+1
| | | | I was just leaving the other MRT targets than DATA0 out, by accident.
* nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR.Eric Anholt2019-02-052-10/+2
| | | | | | | | | | | | | | Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's. This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example). Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Fix leak in resource setup error pathErnestas Kulik2019-01-291-1/+1
| | | | | | | | | Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource. CID: 1435704 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 45bb8f295710 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
* v3d: Always enable the NEON utile load/store code.Eric Anholt2019-01-291-5/+6
| | | | | | | I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do. Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
* v3d: Create separate sampler states for the various blend formats.Eric Anholt2019-01-273-46/+299
| | | | | | | | | | | | The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.
* v3d: Move the sampler state to the long-lived state uploader.Eric Anholt2019-01-273-6/+13
| | | | | Samplers are small (8-24 bytes), so allocating 4k for them is a huge waste.
* v3d: Use the symbolic names for wrap modes from the XML.Eric Anholt2019-01-271-6/+9
|
* v3d: Fix stencil sampling from a separate-stencil buffer.Eric Anholt2019-01-272-0/+7
| | | | | | | | When the sampler view is in sample-stencil mode, we need to return uint stencil values. To do that, fill in the format table to return R8I, and have the sampler view point at the separate stencil buffer. Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
* v3d: Fix stencil sampling from packed depth/stencil.Eric Anholt2019-01-271-1/+1
| | | | We need to pick the 8-bit unorm value out, not the depth component.
* v3d: Fix release-build warning about utile_h.Eric Anholt2019-01-271-2/+1
|
* v3d: Flush blit jobs immediately after generating them.Eric Anholt2019-01-271-0/+8
| | | | | | | | | | Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the series of texture uploads at the start before texturing occurred would end up all sitting around as cached jobs for reuse. By flushing immediately, peak active BO usage goes from 150M to 40M. We could maybe put some limits on how many jobs we keep around, but blits seem particularly unlikely to get reused for other drawing.
* v3d: Fix BO stats accounting for imported buffers.Eric Anholt2019-01-271-0/+3
|
* v3d: Drop maximum number of texture units down to 16.Eric Anholt2019-01-271-3/+3
| | | | | | This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
* v3d: Avoid duplicating limits defines between gallium and v3d core.Eric Anholt2019-01-276-12/+9
| | | | | We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.
* v3d: Rename gallium-local limits defines from VC5 to V3D.Eric Anholt2019-01-2711-33/+33
| | | | | The compiler has its limits under V3D_* (like most V3D stuff), so sync up with that.
* nir: rename nir_var_function to nir_var_function_tempKarol Herbst2019-01-191-1/+1
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* v3d: Restructure RO allocations using resource_from_handle.Eric Anholt2019-01-161-29/+38
| | | | | | | | | | | | | | | I had bugs in the old path where I was laying out as tiled (so we'd render tiled) but then only allocating space in the shared object for linear rendering. The resource_from_handle makes it so the same layout choices are made in both the import and export scanout cases. Also, fixes a leak of the fd that was tripping up the CTS. Now that we're checking PIPE_BIND_SHARED to choose to use RO, the DRM_FORMAT_MOD_LINEAR check wasn't needed any more. Fixes visual corruption and MMU faults in X in renderonly mode. Fixes: bd09bb1629a7 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
* v3d: If the modifier is not known on BO import, default to linear for RO.Eric Anholt2019-01-161-1/+3
| | | | | | Part of fixing DRI3 rendering with RO on X11. Fixes: e113b21cb779 ("v3d: Add renderonly support.")
* v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.Eric Anholt2019-01-141-1/+1
| | | | | We don't have a way to talk to RO about modifiers it can do yet, so assume the minimum.
* v3d: Add support for shader_image_load_store.Eric Anholt2019-01-145-2/+196
| | | | | | This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).