path: root/src/gallium/drivers/v3d
Commit message | Author | Age | Files | Lines
* v3d: don't emit point coordinates varyings if the FS doesn't read them | Iago Toral Quiroga | 2019-06-07 | 1 | -0/+5
  We still need to emit them in V3D 3.x since there is no mechanism to disable them.
  Reviewed-by: Eric Anholt
* v3d: Use driconf to expose non-MSAA texture limits for Xorg. | Eric Anholt | 2019-05-13 | 4 | -3/+38
  The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA, we can go up to 7680 (actually probably 8138, but that hasn't been validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
* gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE. | Eric Anholt | 2019-05-13 | 1 | -1/+5
  The _LEVELS cap assumes that the max is always a power of two. For V3D 4.2, we can support non-power-of-two, non-MSAA textures up to 7680, which will let X11 support dual 4k displays on newer hardware.
  Reviewed-by: Marek Olšák
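  A minimal sketch of why a level count cannot describe 7680, assuming the usual full-mip-chain relationship (base size = 2^(levels - 1)). The helper names are hypothetical and are not Mesa API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest base size a mip-level count can describe.
 * A full chain of `levels` levels implies a base size of 2^(levels - 1),
 * so the result is always a power of two. */
static uint32_t max_size_from_levels(uint32_t levels)
{
    assert(levels >= 1 && levels <= 32);
    return UINT32_C(1) << (levels - 1);
}

/* Hypothetical helper: number of levels in a full mip chain for a base
 * size, i.e. floor(log2(size)) + 1. */
static uint32_t levels_from_max_size(uint32_t size)
{
    assert(size >= 1);
    uint32_t levels = 0;
    while (size) {
        levels++;
        size >>= 1;
    }
    return levels;
}
```

  Under these assumptions a 7680 limit round-trips through a level count as 4096, which is the information a _SIZE cap preserves.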
* Revert "v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER." | Eric Anholt | 2019-04-26 | 1 | -1/+9
  This reverts commit ccce9409470c1053c40c822d759b9bd417062bc0, leaving a note as to why we had to (corruption in chromium, breaking some GLES3.1 tests).
* v3d: Don't try to update the shadow texture for separate stencil. | Eric Anholt | 2019-04-26 | 1 | -1/+2
  There are two cases where v3d's sampler view's resource doesn't match the base's: shadow textures for sampling from raster, and pointing at the separate depth texture for z32f_s8x24. We only want to update shadow for the first case.
  Fixes dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw when run after the previous testcase.
* v3d: Use _mesa_hash_table_remove_key() where appropriate. | Eric Anholt | 2019-04-26 | 1 | -13/+8
* v3d: Apply the GFXH-930 workaround to the case where the VS loads attrs. | Eric Anholt | 2019-04-26 | 1 | -0/+15
  We were emitting a dummy load for when the VS doesn't load any attributes, but we also need to emit a dummy load for when the render VS loads attributes but the binner VS doesn't.
  Fixes simulator assertion failures and GPU hangs on KHR-GLES31.core.texture_gather.*
* v3d: Fill in the ignored segment size fields to appease new simulator. | Eric Anholt | 2019-04-26 | 1 | -2/+4
  We are assured that the input segment size field is ignored for !separate_segs mode, and now the simulator wants an in-range value set regardless of whether it's functionally ignored or not.
* v3d: Disable SSBOs and atomic counters on vertex shaders. | Eric Anholt | 2019-04-24 | 1 | -0/+3
  The CTS fails on dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex when they are enabled, due to the VS being run for both bin and render. I think this behavior is expected to be valid, but I can't find text in atomic counters or SSBO specs saying so (the closest I found was in shader_image_load_store). Just disable it for now, since the closed source driver doesn't expose vertex atomic counters/SSBOs either.
* Delete autotools | Dylan Baker | 2019-04-15 | 2 | -75/+0
  Acked-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
  Reviewed-by: Eric Engestrom
  Acked-by: Marek Olšák
  Acked-by: Jason Ekstrand
  Acked-by: Bas Nieuwenhuizen
  Acked-by: Matt Turner
* v3d: Use the new lower_to_scratch implementation for indirects on temps. | Eric Anholt | 2019-04-12 | 1 | -1/+2
  We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder.
  Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled.
  The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.
* v3d: Detect the correct number of QPUs and use it to fix the spill size. | Eric Anholt | 2019-04-12 | 2 | -4/+10
  We were missing a * 4 even if the particular hardware matched our assumption.
* v3d: Add Compute Shader compilation support. | Eric Anholt | 2019-04-12 | 6 | -79/+258
  While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging.
  For now this covers just GLES 3.1 compute shaders, not CL kernels.
* v3d: Drop a note for the future about PIPE_CAP_PACKED_UNIFORMS. | Eric Anholt | 2019-04-12 | 1 | -0/+7
* nir/i965/freedreno/vc4: add a bindless bool to type size functions | Timothy Arceri | 2019-04-12 | 1 | -1/+1
  This is required to calculate sizes correctly when we have bindless samplers/images.
  Reviewed-by: Marek Olšák
* st: Lower uniforms in st in the !PIPE_CAP_PACKED_UNIFORMS case as well. | Eric Anholt | 2019-04-10 | 1 | -11/+0
  PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o at the st level instead of the backend, packing uniforms with no padding at all, and lowering to UBOs.
  Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS leads to the driver needing to either link against the type size function in mesa/st, or duplicating it in the backend. Given that all backends want this lower-io as far as I can tell, just move it to mesa/st to resolve the link issue and avoid the driver author needing to understand st's uniforms layout.
  Incidentally, fixes uniform layout failures in nouveau in:
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment
    dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex
  and I think in Lima as well.
  v2: fix indents
  Reviewed-by: Kenneth Graunke
* nir: Get rid of global registers | Jason Ekstrand | 2019-04-09 | 1 | -1/+0
  We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register, so it's all dead code. It's time we bury it.
  Acked-by: Karol Herbst
  Reviewed-by: Kenneth Graunke
* v3d: Don't try to use the TFU blit path if a scissor is enabled. | Eric Anholt | 2019-04-04 | 1 | -1/+2
  We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface.
  Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
  Fixes piglit fbo-scissor-blit.
* v3d: Bump the maximum texture size to 4k for V3D 4.x. | Eric Anholt | 2019-04-04 | 3 | -2/+29
  4.1 and 4.2 both have the same 16k limit, but I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working.
  Cc: [email protected]
* v3d: Add support for handling OOM signals from the simulator. | Eric Anholt | 2019-04-04 | 3 | -14/+78
  I have v3d allocating enough initial allocation memory that we've been passing tests without it, but to match kernel behavior more closely it would be good to actually exercise the OOM path.
* gallium: add writable_bitmask parameter into set_shader_buffers | Marek Olšák | 2019-04-04 | 1 | -1/+2
  to indicate write usage per buffer. This is just a hint (it will be used by radeonsi).
  Reviewed-by: Timothy Arceri
* v3d: Upload all of UBO[0] if any indirect load occurs. | Eric Anholt | 2019-03-21 | 1 | -38/+19
  The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k.
  Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it.
  No change in shader-db. No change on 3DMMES2 (n=15).
* v3d: Move constant offsets to UBO addresses into the main uniform stream. | Eric Anholt | 2019-03-21 | 2 | -2/+6
  We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU.
  shader-db:
    total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
    total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
* v3d: Rename v3d_tmu_config_data to v3d_unit_data. | Eric Anholt | 2019-03-21 | 1 | -4/+4
  I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.
* gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits. | Kenneth Graunke | 2019-03-19 | 1 | -0/+3
  The glMemoryBarrier() function makes shader memory stores ordered with respect to things specified by the given bits. Until now, st/mesa has ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT, saying that drivers should implicitly perform the needed flushing.
  This seems like a pretty big assumption to make. Instead, this commit opts to translate them to new PIPE_BARRIER bits, and adjusts existing drivers to continue ignoring them (preserving the current behavior).
  The i965 driver performs actions on these memory barriers. Shader memory stores go through a "data cache" which is separate from the render cache and other read caches (like the texture cache). All memory barriers need to flush the data cache (to ensure shader memory stores are visible), and possibly invalidate read caches (to ensure stale data is no longer visible). The driver implicitly flushes for most caches, but not for data cache, since ARB_shader_image_load_store introduced MemoryBarrier() precisely to order these explicitly.
  I would like to follow i965's approach in iris, flushing the data cache on any MemoryBarrier() call, so I need st/mesa to actually call the pipe->memory_barrier() callback.
  Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on the iris driver.
  Roland said this looks reasonable to him.
  Reviewed-by: Eric Anholt
* v3d: Expose the dma-buf modifiers query. | Eric Anholt | 2019-03-19 | 1 | -0/+29
  This allows DRI3 to pick between UIF and raster according to whether we're pageflipping or not and whether the pageflipping display can do UIF, avoiding copies for the windowed/composited case that previously was forced to linear.
  Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783% +/- 13.1719% (n=3)
* v3d: Allow the UIF modifier with renderonly. | Eric Anholt | 2019-03-19 | 1 | -38/+52
  We ask the other side to make a buffer with the right number of pages, and then just store the UIF in it. This avoids an extra silent copy of the buffer from linear to UIF if it gets used for texturing (X11 copy-based swapbuffers, GL compositors).
* v3d: Always lay out shared tiled buffers with UIF_TOP set. | Eric Anholt | 2019-03-19 | 1 | -4/+6
  The samplers are already ready for this; we just needed to make sure that layout chose UIF for level 0.
* v3d: Use shared drm_find_modifier util | Alyssa Rosenzweig | 2019-03-14 | 1 | -15/+3
  Signed-off-by: Alyssa Rosenzweig
  Reviewed-by: Eric Anholt
* v3d: Fix leak of the renderonly struct on screen destruction. | Eric Anholt | 2019-03-12 | 1 | -0/+1
  This makes v3d match vc4's destroy path.
  Fixes: e113b21cb779 ("v3d: Add renderonly support.")
* v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER. | Eric Anholt | 2019-03-12 | 1 | -0/+3
  This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.
* tgsi_to_nir: Produce optimized NIR for a given pipe_screen. | Timur Kristóf | 2019-03-05 | 1 | -1/+1
  With this patch, tgsi_to_nir will output NIR that is tailored to the given pipe, by reading its capabilities and adjusting the NIR code to those capabilities similarly to how glsl_to_nir works.
  It also adds an optimization loop that brings the output NIR in line with what glsl_to_nir outputs. This is necessary for the same reason why glsl_to_nir has its own optimization loop: currently not every driver does these optimizations yet.
  For uses which cannot pass a pipe_screen we also keep a variant called tgsi_to_nir_noscreen which keeps the old behavior.
  Signed-Off-By: Timur Kristóf
  Tested-by: Andre Heider
  Tested-by: Rob Clark
  Acked-By: Eric Anholt
* v3d: Fix build of NEON code with Mesa's cflags not targeting NEON. | Eric Anholt | 2019-03-01 | 1 | -3/+17
  v3d may be built as part of a set of drivers in a system not requiring NEON, but we know V3D devices will be paired with CPUs with NEON so we should be able to use this asm.
  Fixes: 0c05198d6b5b ("v3d: Always enable the NEON utile load/store code.")
* v3d: Stop tracking num_inputs for VPM loads. | Eric Anholt | 2019-02-18 | 2 | -2/+2
  It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. | Eric Anholt | 2019-02-18 | 1 | -2/+7
  Apparently we need disable-EZ flagged, not just "does Z writes".
  Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation.
  Signed-off-by: Eric Anholt
  Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
* v3d: Sync indirect draws on the last rendering. | Eric Anholt | 2019-02-18 | 1 | -2/+2
  Fixes intermittent fails in dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices and others (particularly when run as part of a CTS run)
* v3d: Clear the GMP on initialization of the simulator. | Eric Anholt | 2019-02-18 | 1 | -0/+1
  Otherwise, we might have pages accessible that shouldn't be and miss out on errors. This is unlikely for most tests since v3d_hw_get_mem() is big enough that it'll be a freshly zeroed mmap, but if screens are destroyed and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
* drm-uapi: use local files, not system libdrm | Eric Engestrom | 2019-02-14 | 4 | -5/+5
  There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h"
  Signed-off-by: Eric Engestrom
  Reviewed-by: Kristian H. Kristensen
* gallium: add PIPE_CAP_MAX_VARYINGS | Karol Herbst | 2019-02-07 | 1 | -0/+3
  Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
  Fixes KHR-GL45.limits.max_fragment_input_components
  Signed-off-by: Karol Herbst
  [imirkin: rebased, improved docs/commit message]
  Signed-off-by: Ilia Mirkin
  Acked-by: Rob Clark
  Acked-by: Eric Anholt
  Reviewed-by: Marek Olšák
  Cc: 19.0 <[email protected]>
* v3d: Store the actual mask of color buffers present in the key. | Eric Anholt | 2019-02-05 | 1 | -9/+10
  If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
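  The difference between a count of color buffers and a mask of the ones actually present can be sketched as follows. This is only an illustration: the helper name, the `MAX_DRAW_BUFFERS` value, and the use of bare pointers for render targets are assumptions, not the driver's actual key layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DRAW_BUFFERS 4 /* assumption for this sketch */

/* Hypothetical helper: bit i is set only if render target i is bound
 * (non-NULL), so an unbound rt0 never appears in the mask. */
static uint8_t bound_cbufs_mask(const void *const cbufs[], unsigned nr_cbufs)
{
    assert(nr_cbufs <= MAX_DRAW_BUFFERS);

    uint8_t mask = 0;
    for (unsigned i = 0; i < nr_cbufs; i++) {
        if (cbufs[i] != NULL)
            mask |= (uint8_t)(1u << i);
    }
    return mask;
}
```

  With only rt1 bound, a count-derived mask `(1 << 2) - 1` would still include bit 0, while the per-target mask above does not.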
* v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs. | Eric Anholt | 2019-02-05 | 1 | -1/+1
  I was just leaving the other MRT targets than DATA0 out, by accident.
* nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR. | Eric Anholt | 2019-02-05 | 2 | -10/+2
  Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's.
  This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example).
  Reviewed-by: Rob Clark
  Reviewed-by: Kenneth Graunke
* v3d: Fix leak in resource setup error path | Ernestas Kulik | 2019-01-29 | 1 | -1/+1
  Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource.
  CID: 1435704
  Signed-off-by: Ernestas Kulik
  Fixes: 45bb8f295710 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
* v3d: Always enable the NEON utile load/store code. | Eric Anholt | 2019-01-29 | 1 | -5/+6
  I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do.
  Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
* v3d: Create separate sampler states for the various blend formats. | Eric Anholt | 2019-01-27 | 3 | -46/+299
  The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's unorm/snorm/int range by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those.
  We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.
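  A rough sketch of the two ideas in this message, clamping a border color channel to the range of the sampled format and skipping the extra variants for an unused or all-zero border color. The enum, the helper names, and the exact range rules are assumptions for illustration and are not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the range a format can represent. */
enum border_range { RANGE_UNORM, RANGE_SNORM, RANGE_FLOAT };

/* Clamp one border color channel to the range of the sampled format.
 * Assumes unorm is [0, 1], snorm is [-1, 1], and float is unclamped. */
static float clamp_border_channel(float v, enum border_range range)
{
    switch (range) {
    case RANGE_UNORM:
        return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    case RANGE_SNORM:
        return v < -1.0f ? -1.0f : (v > 1.0f ? 1.0f : v);
    default:
        return v;
    }
}

/* Per-format sampler state variants are only worth generating when the
 * border color is actually used and is not 0,0,0,0. */
static int border_color_needs_variants(const float rgba[4], int border_used)
{
    assert(rgba != NULL);
    if (!border_used)
        return 0;
    return rgba[0] != 0.0f || rgba[1] != 0.0f ||
           rgba[2] != 0.0f || rgba[3] != 0.0f;
}
```

  The pre-swizzle for BGRA, A, and LA is left out here, since the message does not spell out the mapping.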
* v3d: Move the sampler state to the long-lived state uploader. | Eric Anholt | 2019-01-27 | 3 | -6/+13
  Samplers are small (8-24 bytes), so allocating 4k for them is a huge waste.
* v3d: Use the symbolic names for wrap modes from the XML. | Eric Anholt | 2019-01-27 | 1 | -6/+9
* v3d: Fix stencil sampling from a separate-stencil buffer. | Eric Anholt | 2019-01-27 | 2 | -0/+7
  When the sampler view is in sample-stencil mode, we need to return uint stencil values. To do that, fill in the format table to return R8I, and have the sampler view point at the separate stencil buffer.
  Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
* v3d: Fix stencil sampling from packed depth/stencil. | Eric Anholt | 2019-01-27 | 1 | -1/+1
  We need to pick the 8-bit unorm value out, not the depth component.
* v3d: Fix release-build warning about utile_h. | Eric Anholt | 2019-01-27 | 1 | -2/+1