aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/v3d
Commit message (Collapse)AuthorAgeFilesLines
* drm-uapi: use local files, not system libdrmEric Engestrom2019-02-144-5/+5
| | | | | | | | | There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* gallium: add PIPE_CAP_MAX_VARYINGSKarol Herbst2019-02-071-0/+3
| | | | | | | | | | | | | | | | | Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
* v3d: Store the actual mask of color buffers present in the key.Eric Anholt2019-02-051-9/+10
| | | | | | | If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
* v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs.Eric Anholt2019-02-051-1/+1
| | | | I was just leaving the other MRT targets than DATA0 out, by accident.
* nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR.Eric Anholt2019-02-052-10/+2
| | | | | | | | | | | | | | Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's. This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example). Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Fix leak in resource setup error pathErnestas Kulik2019-01-291-1/+1
| | | | | | | | | Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource. CID: 1435704 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 45bb8f295710 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
* v3d: Always enable the NEON utile load/store code.Eric Anholt2019-01-291-5/+6
| | | | | | | I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do. Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
* v3d: Create separate sampler states for the various blend formats.Eric Anholt2019-01-273-46/+299
| | | | | | | | | | | | The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.
* v3d: Move the sampler state to the long-lived state uploader.Eric Anholt2019-01-273-6/+13
| | | | | Samplers are small (8-24 bytes), so allocating 4k for them is a huge waste.
* v3d: Use the symbolic names for wrap modes from the XML.Eric Anholt2019-01-271-6/+9
|
* v3d: Fix stencil sampling from a separate-stencil buffer.Eric Anholt2019-01-272-0/+7
| | | | | | | | When the sampler view is in sample-stencil mode, we need to return uint stencil values. To do that, fill in the format table to return R8I, and have the sampler view point at the separate stencil buffer. Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
* v3d: Fix stencil sampling from packed depth/stencil.Eric Anholt2019-01-271-1/+1
| | | | We need to pick the 8-bit unorm value out, not the depth component.
* v3d: Fix release-build warning about utile_h.Eric Anholt2019-01-271-2/+1
|
* v3d: Flush blit jobs immediately after generating them.Eric Anholt2019-01-271-0/+8
| | | | | | | | | | Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the series of texture uploads at the start before texturing occurred would end up all sitting around as cached jobs for reuse. By flushing immediately, peak active BO usage goes from 150M to 40M. We could maybe put some limits on how many jobs we keep around, but blits seem particularly unlikely to get reused for other drawing.
* v3d: Fix BO stats accounting for imported buffers.Eric Anholt2019-01-271-0/+3
|
* v3d: Drop maximum number of texture units down to 16.Eric Anholt2019-01-271-3/+3
| | | | | | This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
* v3d: Avoid duplicating limits defines between gallium and v3d core.Eric Anholt2019-01-276-12/+9
| | | | | We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.
* v3d: Rename gallium-local limits defines from VC5 to V3D.Eric Anholt2019-01-2711-33/+33
| | | | | The compiler has its limits under V3D_* (like most V3D stuff), so sync up with that.
* nir: rename nir_var_function to nir_var_function_tempKarol Herbst2019-01-191-1/+1
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* v3d: Restructure RO allocations using resource_from_handle.Eric Anholt2019-01-161-29/+38
| | | | | | | | | | | | | | | I had bugs in the old path where I was laying out as tiled (so we'd render tiled) but then only allocating space in the shared object for linear rendering. The resource_from_handle makes it so the same layout choices are made in both the import and export scanout cases. Also, fixes a leak of the fd that was tripping up the CTS. Now that we're checking PIPE_BIND_SHARED to choose to use RO, the DRM_FORMAT_MOD_LINEAR check wasn't needed any more. Fixes visual corruption and MMU faults in X in renderonly mode. Fixes: bd09bb1629a7 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
* v3d: If the modifier is not known on BO import, default to linear for RO.Eric Anholt2019-01-161-1/+3
| | | | | | Part of fixing DRI3 rendering with RO on X11. Fixes: e113b21cb779 ("v3d: Add renderonly support.")
* v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.Eric Anholt2019-01-141-1/+1
| | | | | We don't have a way to talk to RO about modifiers it can do yet, so assume the minimum.
* v3d: Add support for shader_image_load_store.Eric Anholt2019-01-145-2/+196
| | | | | | This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).
* v3d: Add SSBO/atomic counters support.Eric Anholt2019-01-146-1/+102
| | | | | So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.
* v3d: Drop the GLSL version level.Eric Anholt2019-01-141-1/+1
| | | | | | This was an arbitrary "we support lots of stuff" value when I started the driver. However, at 400 we expose OES_gpu_shader5, which claims support for dynamically indexing samplers, which the driver doesn't do yet.
* v3d: Add an isr to the simulator to catch GMP violations.Eric Anholt2019-01-143-0/+39
| | | | | | | Otherwise, the simulator raises the GMP interrupt and waits for it to be handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when violation happens gives us a chance to look at the backtrace of whatever thread triggered the violation.
* v3d: Add support for GL_ARB_framebuffer_no_attachments.Eric Anholt2019-01-143-2/+19
| | | | | | Fixes dEQP-GLES31.functional.state_query.integer.max_framebuffer_height_getboolean when GLES3 is enabled.
* v3d: Add support for flushing dirty TMU data at job end.Eric Anholt2019-01-142-0/+20
| | | | This will be needed for SSBOs and image_load_store.
* v3d: Enable GL_ARB_texture_gather on V3D 4.x.Eric Anholt2019-01-081-0/+5
| | | | | This is part of GLES 3.1, and with the NIR lowering we're now passing the GLES31 testcases.
* nir: rename global/local to private/function memoryKarol Herbst2019-01-081-1/+1
| | | | | | | | | | | | | | | | | | the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Fix up VS output setup during precompiles.Eric Anholt2019-01-041-6/+10
| | | | | | | | | I noticed that a VS I was debugging was missing all of its output stores -- outputs_written was for POS, VAR0, VAR3, while the shader's variables were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed to be doing here, but we can just walk the declared variables and avoid both this bug and the emission of extra stvpms for less-than-vec4 varyings.
* v3d: Refactor compiler entrypoints.Eric Anholt2019-01-021-26/+6
| | | | | | Before, I had per-stage entryoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.
* v3d: Don't forget to include RT writes in precompiles.Eric Anholt2019-01-021-0/+10
| | | | | Looking at some assembly dumps for an optimization, we were clearly missing important parts of the shader!
* v3d: Fix segfault when failing to compile a program.Eric Anholt2019-01-021-2/+4
| | | | | | | We'll still fail at draw time, but this avoids a regression in shader-db execution once I enable TLB writes in precompiles. Fixes: b38e4d313fc2 ("v3d: Create a state uploader for packing our shaders together.")
* v3d: Add support for requesting the sample offsets.Eric Anholt2018-12-301-0/+22
|
* v3d: Hook up some shader-db output to GL_ARB_debug_output.Eric Anholt2018-12-301-0/+12
| | | | | | | This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.
* v3d: Add a "precompile" debug flag for shader-db.Eric Anholt2018-12-291-0/+76
| | | | | | | | | I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.
* v3d: Hook up perf_debug() output to GL_ARB_debug output as well.Eric Anholt2018-12-202-0/+3
| | | | | This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.
* v3d: Wire up core pipe_debug_callbackRhys Kidd2018-12-202-0/+14
| | | | | | | This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* v3d: Drop shadow comparison state from shader variant key.Eric Anholt2018-12-201-2/+0
| | | | The shadow state is now in the sampler.
* v3d: Fix simulator mode on i915 render nodes.Eric Anholt2018-12-201-28/+73
| | | | | | i915 render nodes refuse the dumb ioctls, so the simulator would crash on the original non-apitrace shader-db. Replace them with direct i915 calls if we detect that we're on one of their gem fds.
* v3d: Load and store aligned utiles all at once.Eric Anholt2018-12-191-8/+114
| | | | | | This calls the expensive uif offset function once per utile, but it still gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over calling it on each pixel.
* v3d: Implement texture_subdata to reduce teximage upload copies.Eric Anholt2018-12-191-29/+85
| | | | | | | This lets us store the non-PBO glTexImage data directly into the tiled image without making an extra untiled memcpy for the gallium transfer. Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around in the kernel mapping and unmapping the transfer's temporary area.
* v3d: Remove dead prototypes for load/store utile functions.Eric Anholt2018-12-191-2/+0
|
* v3d: Don't try to create shadow tiled temporaries for 1D textures.Eric Anholt2018-12-191-1/+2
| | | | | | | They're raster order anyway, so we'd assertion fail along with wasting bandwidth. Fixes: 6ad9e8690d14 ("v3d: Add support for texturing from linear.")
* v3d: Fix check for TFU job completion in the simulator.Eric Anholt2018-12-191-1/+1
| | | | | | | | | | We're waiting for the jobs-completed count to increment (with wrapping), not to reach its starting state. This mostly ended up working out because the next v3d_hw_tick() for a submit CL would end up doing the TFU operation first, but it did fail when a blit was used for glReadPixels() at the end of a test. Fixes: ee0549ff9ab3 ("v3d: Add the V3D TFU submit interface to the simulator.")
* v3d: Put the dst bo first in the list of BOs for TFU calls.Eric Anholt2018-12-191-2/+2
| | | | | | | | | | | | In the UAPI, the first BO is the destination, and the one the kernel should do an exclusive reservation on. Currently we only do exclusive reservations, anyway. However, in the simulator path I was only copying back the "destination" BO (actually src in this case), and this caused regressions once I fixed the simulator to actually complete TFU before returning (since otherwise, the TFU op would happen at the start of the next CL submit and the draw would get the right contents). Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
* v3d: Drop in a bunch of notes about performance improvement opportunities.Eric Anholt2018-12-142-1/+13
| | | | | | These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.
* v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code.Eric Anholt2018-12-141-1/+3
| | | | | This will be a lot easier than my usual "38400.000000? that looks like a viewport scale" decoding strategy.
* v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms().Eric Anholt2018-12-141-15/+13
| | | | | | Follows 3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top of the loop.") which showed a large performance win for vc4, but also cleans up the code a decent bit.