path: root/src/gallium/drivers/vc4
Commit message | Author | Age | Files | Lines
* util: use C99 declaration in the for-loop hash_table_foreach() macro | Eric Engestrom | 2018-10-25 | 5 | -6/+0
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
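The following minimal C99 sketch shows what a for-init declaration gives a foreach macro. It uses a hypothetical open-addressing table: `struct table`, `table_next_entry()` and `table_foreach()` are made up for this example and are not Mesa's `struct hash_table` API. The macro declares the iterator inside the `for` statement, so callers no longer declare it beforehand, and its scope ends with the loop.

```c
#include <stddef.h>

/* Hypothetical table slot; key == NULL marks an empty slot. */
struct table_entry {
   const void *key;
   void *data;
};

struct table {
   struct table_entry *entries;
   size_t size;
};

/* Return the next occupied slot after `e` (or the first one if e is NULL),
 * or NULL once the end of the table is reached. */
static struct table_entry *
table_next_entry(struct table *t, struct table_entry *e)
{
   e = e ? e + 1 : t->entries;
   for (; e != t->entries + t->size; e++) {
      if (e->key != NULL)
         return e;
   }
   return NULL;
}

/* The iterator is declared in the for-init clause (C99), so no separate
 * "struct table_entry *entry;" line is needed before the loop. */
#define table_foreach(t, entry)                                   \
   for (struct table_entry *entry = table_next_entry((t), NULL); \
        entry != NULL;                                            \
        entry = table_next_entry((t), entry))
```

A caller writes `table_foreach(&t, e) { use(e->data); }` with no pre-declared iterator. Presumably, pre-declared iterator lines like that are what this commit deletes.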
* gallium/ttn: Convert inputs and outputs to derefs of variables. | Eric Anholt | 2018-10-15 | 1 | -3/+4
  This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage.
  Acked-by: Rob Clark <[email protected]>
* gallium/u_transfer_helper: Add support for separate Z24/S8 as well. | Kenneth Graunke | 2018-10-14 | 1 | -1/+2
  u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well.
  This patch adds a new flag to lower all depth/stencil formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.)
  Reviewed-by: Eric Anholt <[email protected]>
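Lowering Z24/S8 means moving bits between one interleaved resource and two separate ones. The following is a minimal sketch that assumes the usual gallium Z24_UNORM_S8_UINT layout: depth in bits 0..23 and stencil in bits 24..31 of each 32-bit word. The function names are hypothetical, and this is not u_transfer_helper's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Split packed Z24_UNORM_S8_UINT texels into a depth buffer (32-bit words
 * with the top byte cleared, i.e. Z24X8) and a separate S8 buffer. */
static void
split_z24s8(const uint32_t *packed, uint32_t *depth, uint8_t *stencil,
            size_t count)
{
   for (size_t i = 0; i < count; i++) {
      depth[i] = packed[i] & 0x00ffffffu;
      stencil[i] = (uint8_t)(packed[i] >> 24);
   }
}

/* Inverse: re-interleave the separate resources when the caller maps the
 * packed format. */
static void
interleave_z24s8(const uint32_t *depth, const uint8_t *stencil,
                 uint32_t *packed, size_t count)
{
   for (size_t i = 0; i < count; i++)
      packed[i] = (depth[i] & 0x00ffffffu) | ((uint32_t)stencil[i] << 24);
}
```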
* vc4: Remove dead i == 0 code from the cos() implementation. | Eric Anholt | 2018-09-21 | 1 | -6/+3
  The loop starts at 1.
* vc4: Fix sin(0.0) and cos(0.0) accuracy to fix SDL rendering rotation. | Eric Anholt | 2018-09-21 | 1 | -26/+40
  SDL has some shaders that compute sin(angle) and cos(angle) for a rotation matrix in the VS, and angle is usually 0.0. Our previous implementation had quite a bit of error around 0.0, causing single-pixel rotations at typical window sizes.
  SDL2 has changed as of August 28th (commit 12156:e5a666405750) to not need sin/cos in the VS, but we should still fix this for existing implementations or similar patterns that other programs may have.
  glsl-cos goes from 32 instructions to 36, but 9 uniforms to 7. glsl-sin goes from 32 instructions to 34, but 8 uniforms to 7. This seems like a fine impact to have for the bugfix.
  Cc: 18.1 18.2 <[email protected]>
  Fixes: https://github.com/anholt/mesa/issues/110
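A simple polynomial design shows the property this fix restores. First, range-reduce the angle to about [-pi, pi]. Then use an odd series for sin and an even series whose constant term is exactly 1 for cos. With that structure, sin(0.0) and cos(0.0) come out exact. This is a hypothetical C sketch of the idea only: the Taylor coefficients and the names `approx_sin()` and `approx_cos()` are chosen here. vc4's actual shader lowering uses different coefficients and instruction sequences.

```c
/* Hypothetical sketch; not vc4's sin/cos lowering. */
static const float TWO_PI = 6.28318530717958647692f;
static const float INV_TWO_PI = 0.15915494309189533577f;

/* Reduce x to roughly [-pi, pi]. At x == 0.0f, k rounds to 0, so r is
 * exactly 0.0f and no reduction error leaks into the series. */
static float
reduce_angle(float x)
{
   float t = x * INV_TWO_PI;
   long k = (long)(t >= 0.0f ? t + 0.5f : t - 0.5f); /* round to nearest */
   return x - (float)k * TWO_PI;
}

/* Odd Taylor series through r^15, Horner's rule in r^2. The result is
 * r * P(r^2), so sin(0) is exactly 0. */
static float
approx_sin(float x)
{
   float r = reduce_angle(x);
   float r2 = r * r;
   float p = -1.0f / 1307674368000.0f;     /* -1/15! */
   p = p * r2 + 1.0f / 6227020800.0f;       /*  1/13! */
   p = p * r2 - 1.0f / 39916800.0f;         /* -1/11! */
   p = p * r2 + 1.0f / 362880.0f;           /*  1/9!  */
   p = p * r2 - 1.0f / 5040.0f;             /* -1/7!  */
   p = p * r2 + 1.0f / 120.0f;              /*  1/5!  */
   p = p * r2 - 1.0f / 6.0f;                /* -1/3!  */
   p = p * r2 + 1.0f;
   return r * p;
}

/* Even Taylor series through r^16. The constant term is exactly 1.0f, so
 * cos(0) is exactly 1 rather than an approximation of sin(pi/2). */
static float
approx_cos(float x)
{
   float r = reduce_angle(x);
   float r2 = r * r;
   float p = 1.0f / 20922789888000.0f;      /*  1/16! */
   p = p * r2 - 1.0f / 87178291200.0f;       /* -1/14! */
   p = p * r2 + 1.0f / 479001600.0f;         /*  1/12! */
   p = p * r2 - 1.0f / 3628800.0f;           /* -1/10! */
   p = p * r2 + 1.0f / 40320.0f;             /*  1/8!  */
   p = p * r2 - 1.0f / 720.0f;               /* -1/6!  */
   p = p * r2 + 1.0f / 24.0f;                /*  1/4!  */
   p = p * r2 - 1.0f / 2.0f;                 /* -1/2!  */
   return p * r2 + 1.0f;
}
```

At 0.0, every higher-order term vanishes, so only the leading factor r (for sin) or the constant 1 (for cos) survives. That is exactly the angle SDL's rotation usually passes in.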
* vc4: Drop a bunch of duplicated gallium PIPE_CAP default code. | Eric Anholt | 2018-09-04 | 1 | -179/+0
  Now that we have the util function for the default values, we can get rid of the boilerplate.
  v2: drop GLSL level in favor of defaults.
  v3: Rebase on new gallium caps
* gallium: Add a helper for implementing PIPE_CAP_* default values. | Eric Anholt | 2018-09-04 | 1 | -2/+2
  One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to.
  v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857)
  v3: Rebase on 3 new gallium caps.
  Reviewed-by: Marek Olšák <[email protected]> (v1)
  Cc: Bruce Cherniak <[email protected]>
  Cc: George Kyriazis <[email protected]>
  Cc: Kenneth Graunke <[email protected]>
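The driver-side pattern described above can be sketched as follows. The enum, `get_param_defaults()` and `mydriver_get_param()` are hypothetical stand-ins for gallium's `enum pipe_cap`, the shared default helper, and a driver's `get_param` hook.

```c
/* Hypothetical, trimmed-down cap list; real gallium has hundreds. */
enum cap {
   CAP_NPOT_TEXTURES,
   CAP_MAX_RENDER_TARGETS,
   CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE,
   CAP_NEW_FEATURE,          /* added after the driver was written */
};

/* Shared helper: the only place a new cap's default has to be added. */
static int
get_param_defaults(enum cap cap)
{
   switch (cap) {
   case CAP_MAX_RENDER_TARGETS:
      return 1;
   default:
      return 0;              /* conservative default: not supported */
   }
}

/* A driver lists only the caps it cares about and defers the rest. */
static int
mydriver_get_param(enum cap cap)
{
   switch (cap) {
   case CAP_NPOT_TEXTURES:
   case CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE:
      return 1;
   default:
      return get_param_defaults(cap);
   }
}
```

A newly added cap such as `CAP_NEW_FEATURE` falls through to the shared default without any edit to the driver.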
* gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. | Kenneth Graunke | 2018-08-24 | 1 | -0/+1
  Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
  This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE | Marek Olšák | 2018-08-23 | 1 | -0/+2
  Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONS | Marek Olšák | 2018-08-23 | 1 | -0/+1
  Tested-by: Dieter Nützel <[email protected]>
* vc4: Implement texture_subdata() to directly upload tiled data. | Eric Anholt | 2018-08-08 | 1 | -1/+39
  This avoids a memcpy into a temporary in the upload path.
  Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)
* vc4: Handle partial loads/stores of tiled textures. | Eric Anholt | 2018-08-08 | 3 | -60/+155
  Previously, we would load out the tile-aligned area, update the raster copy, and store it back. This was a huge cost for XPutImage calls to the screen under glamor. Instead, implement a general load/store path that walks over the source x/y, writing into the corresponding pixel of the destination (using clever math from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/). If things are aligned, we go through the previous utile-at-a-time loop.
  Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
  Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
  Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
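The "clever math" in the linked article lets a loop step through swizzled coordinates without converting back to linear x/y at every pixel. Suppose the x bits occupy the positions in `X_MASK`. Subtracting the mask is the same as setting every non-x bit and adding one, so the carry ripples across the gaps, and `(sx - X_MASK) & X_MASK` therefore adds one to the x field. Below is a minimal sketch of a partial store using that trick. It assumes a hypothetical single-level Morton (Z-order) layout, not vc4's real T/LT tiling.

```c
#include <stdint.h>

/* Hypothetical layout: x in the even address bits, y in the odd bits.
 * This is NOT the vc4 tiling format; it only shows the addressing trick. */
#define X_MASK 0x55555555u
#define Y_MASK 0xaaaaaaaau

/* Spread the low 16 bits of v into the even bit positions. */
static uint32_t
part1by1(uint32_t v)
{
   v &= 0x0000ffffu;
   v = (v | (v << 8)) & 0x00ff00ffu;
   v = (v | (v << 4)) & 0x0f0f0f0fu;
   v = (v | (v << 2)) & 0x33333333u;
   v = (v | (v << 1)) & 0x55555555u;
   return v;
}

/* Copy a w x h rectangle from a linear source (stride in pixels) into the
 * swizzled destination at (x0, y0), stepping the swizzled coordinates
 * directly instead of re-swizzling every pixel. */
static void
store_swizzled_rect(uint32_t *dst, const uint32_t *src, uint32_t src_stride,
                    uint32_t x0, uint32_t y0, uint32_t w, uint32_t h)
{
   uint32_t sx0 = part1by1(x0);        /* swizzled x of the first column */
   uint32_t sy = part1by1(y0) << 1;    /* swizzled y of the current row */

   for (uint32_t y = 0; y < h; y++) {
      uint32_t sx = sx0;
      for (uint32_t x = 0; x < w; x++) {
         dst[sx | sy] = src[y * src_stride + x];
         sx = (sx - X_MASK) & X_MASK;  /* x + 1 in swizzled form */
      }
      sy = (sy - Y_MASK) & Y_MASK;     /* y + 1 in swizzled form */
   }
}
```

Only the rectangle's starting corner is swizzled with `part1by1()`. After that, each step costs one subtract and one AND per coordinate.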
* vc4: Compile the LT image helper per cpp we might load/store. | Eric Anholt | 2018-08-08 | 1 | -2/+31
  For the partial load/store support I'm about to add, we want the memcpy to be compiled out to a single load/store. This should also eliminate the calls to vc4_utile_width/height().
  Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
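Specializing a helper per cpp works because compilers usually lower a `memcpy()` with a compile-time-constant size to a single load and store. Here is a hypothetical sketch of the dispatch shape, with invented function names. Whether each case is actually inlined and specialized depends on the compiler and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Copy `count` pixels of `cpp` bytes each. Inlined into a caller that passes
 * a constant cpp, each memcpy() can become one load/store of that width. */
static inline void
copy_pixels(void *dst, const void *src, uint32_t count, uint32_t cpp)
{
   for (uint32_t i = 0; i < count; i++)
      memcpy((char *)dst + i * cpp, (const char *)src + i * cpp, cpp);
}

/* Dispatch once on cpp so every case gets its own specialized loop; the
 * default case keeps odd sizes working through the generic path. */
static void
copy_pixels_dispatch(void *dst, const void *src, uint32_t count, uint32_t cpp)
{
   switch (cpp) {
   case 1: copy_pixels(dst, src, count, 1); break;
   case 2: copy_pixels(dst, src, count, 2); break;
   case 4: copy_pixels(dst, src, count, 4); break;
   case 8: copy_pixels(dst, src, count, 8); break;
   default: copy_pixels(dst, src, count, cpp); break;
   }
}
```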
* vc4: Refactor to reuse the LT tile walking code. | Eric Anholt | 2018-08-08 | 1 | -24/+34
* vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels. | Eric Anholt | 2018-08-07 | 1 | -1/+2
  We won't have an FD if we're just having the server wait on a fence created by eglCreateSyncKHR(). Our seqno fences will happen in order, so server-side waits are no-ops in that case.
  Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete
  Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* vc4: Ignore samplers for finding uniform offsets. | Eric Anholt | 2018-08-07 | 1 | -3/+14
  Fixes:
    dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
    dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
    dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
    dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex
  Cc: [email protected]
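The following hypothetical sketch assigns offsets while giving samplers no storage. A struct that mixes a sampler array with other members then keeps its later members at the right offsets. The types, and the rule that a sampler occupies zero uniform slots, are assumptions for illustration rather than vc4's data structures.

```c
#include <stddef.h>

/* Hypothetical flattened uniform list. */
enum utype { UTYPE_FLOAT, UTYPE_VEC4, UTYPE_SAMPLER };

struct uniform {
   enum utype type;
   unsigned array_size;      /* 1 for non-arrays */
};

/* 32-bit slots each element occupies in the uniform stream. Samplers are
 * assumed here to be handled as texture state, so they take no slots. */
static unsigned
slots_for(enum utype type)
{
   switch (type) {
   case UTYPE_VEC4:    return 4;
   case UTYPE_FLOAT:   return 1;
   case UTYPE_SAMPLER: return 0;
   }
   return 0;
}

/* Assign each uniform its starting slot, skipping sampler storage so that
 * members after a sampler array are not shifted. */
static void
assign_uniform_offsets(const struct uniform *u, size_t n, unsigned *offsets)
{
   unsigned offset = 0;
   for (size_t i = 0; i < n; i++) {
      offsets[i] = offset;
      offset += slots_for(u[i].type) * u[i].array_size;
   }
}
```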
* vc4: Extend dumping of uniforms in QIR and in the command stream. | Eric Anholt | 2018-08-07 | 3 | -13/+68
  Similar to what I did for V3D, provide some description of the uniforms.
* vc4: Pull uinfo->data[i] dereference out to the top of the loop. | Eric Anholt | 2018-08-07 | 1 | -20/+18
  Reduces the size of vc4_uniforms.o by about 10%. We would basically always end up loading the cacheline of uinfo->data[i] anyway, so it should be good for performance as well as making the code a bit cleaner.
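This refactor is the classic hoisting of a repeated load to the top of the loop body. The following hypothetical sketch shows the resulting shape. The struct and kind values are invented; only the pattern matches the commit.

```c
#include <stdint.h>

/* Hypothetical uniform description. */
struct uniform_list {
   unsigned count;
   const uint32_t *contents;   /* kind of each uniform slot */
   const uint32_t *data;       /* per-slot payload */
};

enum { KIND_CONSTANT, KIND_DOUBLED };

/* uinfo->data[i] is loaded once into a local at the top of the body and
 * reused by every case, instead of being dereferenced in each case. */
static void
write_uniforms(const struct uniform_list *uinfo, uint32_t *out)
{
   for (unsigned i = 0; i < uinfo->count; i++) {
      const uint32_t data = uinfo->data[i];

      switch (uinfo->contents[i]) {
      case KIND_CONSTANT:
         out[i] = data;
         break;
      case KIND_DOUBLED:
         out[i] = data * 2;
         break;
      default:
         out[i] = 0;
         break;
      }
   }
}
```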
* vc4: Make sure to emit a tile coordinates between two MSAA loads. | Eric Anholt | 2018-08-07 | 1 | -12/+11
  The HW only executes a load once the tile coordinates packet happens, and only tracks one at a time, so by emitting our two MSAA loads back to back we would end up with an undefined color or Z buffer. The simulator doesn't seem to care, but sync up the RCL generation with the kernel anyway.
  Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
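The ordering rule can be sketched as an emitter that follows each load with its own tile coordinates packet. The opcodes and buffer here are hypothetical. The real RCL uses the hardware packet encodings and may emit additional packets between the loads, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet opcodes; not the real VC4 encoding. */
enum rcl_op { OP_TILE_COORDINATES, OP_LOAD_COLOR, OP_LOAD_ZS };

struct rcl {
   uint8_t ops[64];            /* no overflow check in this sketch */
   size_t len;
};

static void
emit(struct rcl *cl, enum rcl_op op)
{
   cl->ops[cl->len++] = (uint8_t)op;
}

/* A load only executes when a tile coordinates packet follows it, and only
 * one load is tracked at a time, so each load is followed by its own
 * coordinates packet rather than issuing both loads back to back. */
static void
emit_msaa_loads(struct rcl *cl, int load_color, int load_zs)
{
   if (load_color) {
      emit(cl, OP_LOAD_COLOR);
      emit(cl, OP_TILE_COORDINATES);
   }
   if (load_zs) {
      emit(cl, OP_LOAD_ZS);
      emit(cl, OP_TILE_COORDINATES);
   }
}
```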
* vc4: Respect a sampler view's first_layer field. | Eric Anholt | 2018-08-07 | 1 | -1/+3
  Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
  Cc: [email protected]
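Respecting first_layer comes down to offsetting the sampled base address by the selected layer. The following hypothetical sketch assumes each layer (cube face) sits a fixed `layer_stride` bytes after the previous one. The struct and field names are invented, and vc4's actual resource layout code differs.

```c
#include <stdint.h>

/* Hypothetical resource and view descriptions. */
struct tex_resource {
   uint32_t level_offset[4];   /* byte offset of each miplevel in layer 0 */
   uint32_t layer_stride;      /* bytes between consecutive layers/faces */
};

struct sampler_view {
   unsigned first_level;
   unsigned first_layer;       /* e.g. 1 selects -X in +X, -X, +Y, ... order */
};

/* Base offset to sample from. Ignoring first_layer would make every view
 * start at layer 0 (the +X face), which is the bug described above. */
static uint32_t
view_base_offset(const struct tex_resource *rsc, const struct sampler_view *v)
{
   return rsc->level_offset[v->first_level] +
          v->first_layer * rsc->layer_stride;
}
```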
* vc4: Fix a leak of the no-vertex-elements workaround BO. | Eric Anholt | 2018-08-06 | 1 | -0/+2
  Fixes: bd1925562ad1 ("vc4: Convert the driver to emitting the shader record using pack macros.")
* vc4: Fix context creation when syncobjs aren't supported. | Eric Anholt | 2018-08-06 | 1 | -2/+6
  Noticed when trying to run current Mesa on rpi's downstream kernel.
  Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* vc4: Fix automake linking error. | Juan A. Suarez Romero | 2018-08-01 | 1 | -0/+9
    CXXLD gallium_dri.la
    ../../../../src/gallium/drivers/vc4/.libs/libvc4.a(vc4_cl_dump.o): In function `vc4_dump_cl':
    src/gallium/drivers/vc4/vc4_cl_dump.c:45: undefined reference to `clif_dump_init'
    src/gallium/drivers/vc4/vc4_cl_dump.c:82: undefined reference to `clif_dump_destroy'
    ../../../../src/broadcom/cle/.libs/libbroadcom_cle.a(cle_libbroadcom_cle_la-v3d_decoder.o): In function `v3d_field_iterator_next':
    src/broadcom/cle/v3d_decoder.c:902: undefined reference to `clif_lookup_bo'
  Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107423
  CC: Eric Anholt <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
  Reviewed-by: Andres Gomez <[email protected]>
* gallium: add storage_sample_count parameter into is_format_supported | Marek Olšák | 2018-07-31 | 1 | -0/+4
  Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_FRAMEBUFFER_MSAA_CONSTRAINTS | Marek Olšák | 2018-07-31 | 1 | -0/+1
  Tested-by: Dieter Nützel <[email protected]>
* v3d: Add a separate flag for CLIF ABI output versus human-readable CLs. | Eric Anholt | 2018-07-30 | 1 | -1/+1
  A few of the upcoming changes would make the V3D_DEBUG=cl output less readable, so let's make proper CLIF file production be under a separate V3D_DEBUG=clif flag.
* vc4: Fix meson build when enabled without v3d. | Eric Anholt | 2018-07-29 | 1 | -1/+1
  Reported-by: Rob Clark <[email protected]>
  Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
* v3d: Stop doing pretty-printed colorful booleans in CLIF output. | Eric Anholt | 2018-07-27 | 1 | -1/+1
  The parser wants to see a 1 or 0. We can put "true" and "false" in a comment to clarify that it's a boolean and the parser will skip it.
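The following hypothetical sketch prints a boolean field the way the commit describes. The value appears as a bare 1 or 0 that the parser can read, and the readable word goes in a comment that the parser skips. The exact `name: value` layout is an assumption, not the v3d decoder's real output format.

```c
#include <stdbool.h>
#include <stdio.h>

/* Format a boolean as a parseable 1/0 followed by a skipped comment.
 * Returns the snprintf() result (length of the full string). */
static int
format_bool_field(char *buf, size_t size, const char *name, bool value)
{
   return snprintf(buf, size, "%s: %d /* %s */", name, value ? 1 : 0,
                   value ? "true" : "false");
}
```

For example, `format_bool_field(buf, sizeof(buf), "Enable", true)` produces `Enable: 1 /* true */`.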
* v3d: Move clif dump BO lookup into the clif dumper. | Eric Anholt | 2018-07-27 | 1 | -1/+1
  The clif dumper is going to need information about all of our BOs if we're going to dump them for replay purposes.
* v3d: Pass the whole clif_dump structure to v3d_print_group(). | Eric Anholt | 2018-07-27 | 1 | -1/+6
  To generate CLIF files that the v3dv3 simulator can parse, we're going to need to decode addresses, and for that we'll need the vaddr lookup function from the clif structure from within v3d_decoder.
* vc4: Tell NIR to lower fdiv instructions | Jason Ekstrand | 2018-07-13 | 1 | -0/+1
  This should allow us to use them in nir_lower_tex
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: Switch to using u_transfer_helper for MSAA maps. | Eric Anholt | 2018-07-13 | 2 | -100/+16
  No requirement, just reduces code duplication.
* vc4: Don't automatically reallocate a PERSISTENT-mapped buffer. | Eric Anholt | 2018-07-12 | 1 | -1/+1
  I had mistakenly used the COHERENT flag, which can only be set when PERSISTENT is mapped, but isn't always.
  Fixes: a2014c2eb9e0 ("vc4: Simplify the DISCARD_RANGE handling")
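The corrected test can be sketched with hypothetical flag names that mirror the gallium ones. Reallocating is only safe when the mapping cannot outlive the call. PERSISTENT, not COHERENT, is the flag that says it can.

```c
#include <stdbool.h>

/* Hypothetical transfer flags modeled on the names in the commit. */
enum {
   XFER_WRITE          = 1 << 0,
   XFER_DISCARD_RANGE  = 1 << 1,
   XFER_PERSISTENT     = 1 << 2,
   XFER_COHERENT       = 1 << 3,   /* only valid together with PERSISTENT */
};

/* A persistent mapping must keep pointing at the same BO, so never swap
 * the backing storage underneath it. Testing COHERENT would miss
 * persistent mappings that are not coherent. */
static bool
may_reallocate_bo(unsigned usage)
{
   return (usage & XFER_DISCARD_RANGE) && !(usage & XFER_PERSISTENT);
}
```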
* gallium/util: remove dummy function util_format_is_supported | Marek Olšák | 2018-06-29 | 1 | -2/+1
  Reviewed-by: Eric Engestrom <[email protected]>
* broadcom/vc4: Remove deref chain support from nir_lower_txf_ms. | Eric Anholt | 2018-06-22 | 1 | -1/+0
  Acked-by: Rob Clark <[email protected]>
  Acked-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Dave Airlie <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* st,ir3,radeonsi: push lower_deref_instrs back into driver | Rob Clark | 2018-06-22 | 1 | -1/+0
  vc4+vc5 is not really affected by the deref chain to deref instr conversion, so it no longer needs this pass. For others, now that all the passes mesa/st uses are using deref instructions, push the lowering to deref chains back into driver.
  Signed-off-by: Rob Clark <[email protected]>
  Acked-by: Rob Clark <[email protected]>
  Acked-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Dave Airlie <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* anv,i965,radv,st,ir3: Call nir_lower_deref_instrs | Jason Ekstrand | 2018-06-22 | 1 | -0/+1
  This inserts a call to nir_lower_deref_instrs at every call site of glsl_to_nir, spirv_to_nir, and prog_to_nir.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Acked-by: Rob Clark <[email protected]>
  Acked-by: Bas Nieuwenhuizen <[email protected]>
  Acked-by: Dave Airlie <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* gallium: add scalar isa shader cap | Christian Gmeiner | 2018-06-20 | 1 | -0/+2
  v1 -> v2:
  - nv30 is _NOT_ scalar as suggested by Ilia Mirkin.
  - Change from a screen cap to a shader cap as suggested by Eric Anholt.
  - radeonsi is scalar as suggested by Marek Olšák.
  - Change missing ones to be scalar.
  v2 -> v3:
  - r600 prefers vec4 as suggested by Marek Olšák.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: add support for programmable sample locations | Rhys Perry | 2018-06-14 | 1 | -0/+1
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Brian Paul <[email protected]> (v2)
  Reviewed-by: Marek Olšák <[email protected]> (v2)
* v3d: Be more explicit about include directory from our generated code. | Eric Anholt | 2018-06-05 | 1 | -1/+2
  You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir. nir was fine at that, but automake didn't have it.
  Bugzilla: https://github.com/anholt/mesa/issues/104
* gallium: add PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY | Marek Olšák | 2018-05-29 | 1 | -0/+1
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* gallium/winsys: rename DRM_API_HANDLE_* to WINSYS_HANDLE_* | Dave Airlie | 2018-05-30 | 1 | -5/+5
  This just renames this as we want to add an shm handle which isn't really drm related.
  Originally by: Marc-André Lureau <[email protected]>
  (airlied: I used this sed script instead)
  This was generated with:
  git grep -l 'DRM_API_' | xargs sed -i 's/DRM_API_/WINSYS_/g'
  Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Native fence fd support | Stefan Schake | 2018-05-17 | 6 | -11/+107
  With the syncobj support in place, let's use it to implement the EGL_ANDROID_native_fence_sync extension. This mostly follows previous implementations in freedreno and etnaviv.
  v2:
  - Drop the flags (Eric)
  - Handle in_fence_fd already in job_submit (Eric)
  - Drop extra vc4_fence_context_init (Eric)
  - Dup fds with CLOEXEC (Eric)
  - Mention exact extension name (Eric)
  Signed-off-by: Stefan Schake <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
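The v2 notes mention duplicating fence fds with close-on-exec. POSIX `fcntl()` with `F_DUPFD_CLOEXEC` does this atomically. Here is a minimal sketch; `dup_fence_fd()` is a hypothetical name, not the function the commit adds.

```c
#define _POSIX_C_SOURCE 200809L  /* for F_DUPFD_CLOEXEC */
#include <fcntl.h>
#include <unistd.h>

/* Give the driver its own copy of a fence fd. F_DUPFD_CLOEXEC sets the
 * close-on-exec flag atomically with the duplication, so the copy is never
 * inherited by a child process. Returns -1 on error or for a missing fd. */
static int
dup_fence_fd(int fd)
{
   if (fd < 0)
      return -1;
   return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}
```

The alternative is `dup()` followed by `fcntl(F_SETFD, FD_CLOEXEC)`. That leaves a window in which another thread could fork and exec while the copy is still inheritable.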
* broadcom/vc4: Store job fence in syncobj | Stefan Schake | 2018-05-17 | 3 | -4/+35
  This gives us access to the fence created for the render job.
  v2: Drop flag (Eric)
  Signed-off-by: Stefan Schake <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: Detect syncobj support | Stefan Schake | 2018-05-17 | 2 | -0/+7
  We need to know if the kernel supports syncobj submission since otherwise all the DRM syncobj calls fail.
  v2: Use drmGetCap to detect syncobj support (Eric)
  Signed-off-by: Stefan Schake <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: use util_copy_framebuffer_state | Rob Clark | 2018-05-15 | 1 | -12/+2
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* gallium: add initial support for conservative rasterization | Rhys Perry | 2018-04-30 | 1 | -1/+12
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero | Ian Romanick | 2018-03-29 | 1 | -2/+2
  The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior.
  Signed-off-by: Ian Romanick <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
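The zero-input behavior that the rename calls out follows from the standard bit trick. Here is a minimal sketch with hypothetical names (not the actual bitscan.h declarations), plus a variant that rejects zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* v & (v - 1) clears the lowest set bit, so it is zero exactly when v has
 * at most one bit set. That includes v == 0, hence "_or_zero". */
static inline bool
is_power_of_two_or_zero(uint32_t v)
{
   return (v & (v - 1)) == 0;
}

/* Variant for callers that need a true power of two. */
static inline bool
is_power_of_two_nonzero(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}
```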
* broadcom/vc4: add path to nir_builder.h | Juan A. Suarez Romero | 2018-03-22 | 1 | -1/+1
  As the other VC4 files do. Otherwise, it won't find nir_builder.h
  v2: add path in source code rather than changing autotools (Emil)
  Reviewed-by: Emil Velikov <[email protected]>
* gallium: add packed uniform CAP | Timothy Arceri | 2018-03-20 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>