aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* drm-uapi: use local files, not system libdrmEric Engestrom2019-02-146-8/+8
| | | | | | | | | There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* gallium: add PIPE_CAP_MAX_VARYINGSKarol Herbst2019-02-071-0/+3
| | | | | | | | | | | | | | | | | Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <[email protected]> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: 19.0 <[email protected]>
* vc4: Fix leak in HW queries error pathErnestas Kulik2019-01-291-1/+1
| | | | | | | | | | Reported by Coverity: in the case where there exist hardware and non-hardware queries, the code does not jump to err_free_query and leaks the query. CID: 1430194 Signed-off-by: Ernestas Kulik <[email protected]> Fixes: 9ea90ffb98fb ("broadcom/vc4: Add support for HW perfmon")
* vc4: Enable NEON asm on meson cross-builds.Eric Anholt2019-01-281-4/+6
| | | | | | | | | The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson cross-builds because of the need to run host binaries on the build system. vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my cross-builds. Fixes: ebcb4c2156e9 ("meson: Enable VC4's NEON assembly support.")
* nir: add bit_size parameter to system values with multiple allowed bit sizesKarol Herbst2019-01-211-2/+2
| | | | | | | | | v2: add assert to verify we have at least one valid bit_size v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_function to nir_var_function_tempKarol Herbst2019-01-191-2/+2
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename global/local to private/function memoryKarol Herbst2019-01-081-2/+2
| | | | | | | | | | | | | | | | | | the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Hook up perf_debug() output to GL_ARB_debug_output as well.Eric Anholt2018-12-202-0/+3
| | | | | This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.
* vc4: Wire up core pipe_debug_callbackRhys Kidd2018-12-202-0/+14
| | | | | | | This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Move the utile load/store functions to a header for reuse by v3d.Eric Anholt2018-12-192-202/+11
| | | | | These implementations of whole-utile load/stores would be the same for v3d, though the layouts of blocks of utiles has changed.
* nir/opt_peephole_select: Don't peephole_select expensive math instructionsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loadsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | | | | That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* vc4: Reuse nir_format_convert.h in our blend lowering.Eric Anholt2018-12-171-33/+3
| | | | | These helpers came along after and have effectively the same implementation.
* nir: Add a bool to int32 lowering passJason Ekstrand2018-12-161-0/+2
| | | | | | | | We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* vc4: Use the original bit size when scalarizing uniform loads.Eric Anholt2018-12-161-1/+2
| | | | | | Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Fix a leak of the transfer helper on screen destroy.Eric Anholt2018-12-071-0/+3
| | | | Fixes: d009463a6549 ("vc4: Switch to using u_transfer_helper for MSAA maps.")
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-4/+4
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make nir_lower_clip_vs optionally work with variables.Kenneth Graunke2018-11-191-1/+2
| | | | | | | | | | | | The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <[email protected]>
* vc4: Don't return a vc4 BO handle on a renderonly screen.Eric Anholt2018-11-151-2/+4
| | | | | | The handles exported need to be on the KMS device's fd, anything else is failure. Also, this code is assuming that the scanout resource has been created already, so assert it.
* vc4: Make sure we make ro scanout resources for create_with_modifiers.Eric Anholt2018-11-151-1/+9
| | | | | | | | | | The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or SHARED, with the theory that given that you've got modifiers, that's all you need. However, we were looking at the tmpl.bind for setting up the KMS handle in the renderonly case, so we'd end up trying to use vc4's handle on the hx8357d fd. Fixes: 84ed8b67c56b ("vc4: Set shareable BOs as T tiled if possible")
* vc4: Use the normal simulator ioctl path for CL submit as well.Eric Anholt2018-11-023-13/+5
| | | | The simulator no longer needs to look back into the gallium structs.
* vc4: Maintain a separate GEM mapping of BOs in the simulator.Eric Anholt2018-11-022-42/+58
| | | | This will let us avoid looking back into the gallium driver's vc4_bo.
* vc4: Take advantage of _mesa_hash_table_remove_key() in the simulator.Eric Anholt2018-11-021-4/+2
|
* vc4: Drop the winsys_stride relayout in the simluatorEric Anholt2018-11-015-95/+12
| | | | | | | Since 0c1dd9dee0da ("broadcom/vc4: Allow importing linear BOs with arbitrary offset/stride."), we have the vc4-side BO properly laid out (assuming it's linear) in the winsys BO so that we can skip this extra copy.
* util: Move os_misc to utilDylan Baker2018-10-301-1/+1
| | | | | | | this is needed by u_debug Tested-by: Brian Paul <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Fix unused variable warning.Eric Anholt2018-10-301-1/+0
| | | | Fixes: bb84fa146f22 ("util: use C99 declaration in the for-loop hash_table_foreach() macro")
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-255-6/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* gallium/ttn: Convert inputs and outputs to derefs of variables.Eric Anholt2018-10-151-3/+4
| | | | | | | | | | | This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage. Acked-by: Rob Clark <[email protected]>
* gallium/u_transfer_helper: Add support for separate Z24/S8 as well.Kenneth Graunke2018-10-141-1/+2
| | | | | | | | | | | | | | | | u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well. This patch adds a new flag to lower all depth/stencils formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.) Reviewed-by: Eric Anholt <[email protected]>
* vc4: Remove dead i == 0 code from the cos() implementation.Eric Anholt2018-09-211-6/+3
| | | | The loop starts at 1.
* vc4: Fix sin(0.0) and cos(0.0) accuracy to fix SDL rendering rotation.Eric Anholt2018-09-211-26/+40
| | | | | | | | | | | | | | | | | | SDL has some shaders that compute sin(angle) and cos(angle) for a rotation matrix in the VS, and angle is usually 0.0. Our previous implementation had quite a bit of error around 0.0, causing single-pixel rotations at typical window sizes. SDL2 has changed as of August 28th (commit 12156:e5a666405750) to not need sin/cos in the VS, but we should still fix this for existing implementations or similar patterns that other programs may have. glsl-cos goes from 32 instructions to 36, but 9 uniforms to 7. glsl-sin goes from 32 instructions to 34, but 8 uniforms to 7. This seems like a fine impact to have for the bugfix. Cc: 18.1 18.2 <[email protected]> Fixes: https://github.com/anholt/mesa/issues/110
* vc4: Drop a bunch of duplicated gallium PIPE_CAP default code.Eric Anholt2018-09-041-179/+0
| | | | | | | | Now that we have the util function for the default values, we can get rid of the boilerplate. v2: drop GLSL level in favor of defaults. v3: Rebase on new gallium caps
* gallium: Add a helper for implementing PIPE_CAP_* default values.Eric Anholt2018-09-041-2/+2
| | | | | | | | | | | | | | | | | | One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to. v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857) v3: Rebase on 3 new gallium caps. Reviewed-by: Marek Olšák <[email protected]> (v1) Cc: Bruce Cherniak <[email protected]> Cc: George Kyriazis <[email protected]> Cc: Kenneth Graunke <[email protected]>
* gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.Kenneth Graunke2018-08-241-0/+1
| | | | | | | | | | | | | Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).
* gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZEMarek Olšák2018-08-231-0/+2
| | | | Tested-by: Dieter Nützel <[email protected]>
* gallium: add PIPE_CAP_MAX_GS_INVOCATIONSMarek Olšák2018-08-231-0/+1
| | | | Tested-by: Dieter Nützel <[email protected]>
* vc4: Implement texture_subdata() to directly upload tiled data.Eric Anholt2018-08-081-1/+39
| | | | | | This avoids a memcpy into a temporary in the upload path. Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)
* vc4: Handle partial loads/stores of tiled textures.Eric Anholt2018-08-083-60/+155
| | | | | | | | | | | | | | | | Previously, we would load out the tile-aligned area, update the raster copy, and store it back. This was a huge cost for XPutImage calls to the screen under glamor. Instead, implement a general load/store path that walks over the source x/y writing into the corresponding pixel of the destination (using clever math from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/). If things are aligned, we go through the previous utile-at-a-time loop. Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5) Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11) Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
* vc4: Compile the LT image helper per cpp we might load/store.Eric Anholt2018-08-081-2/+31
| | | | | | | | For the partial load/store support I'm about to add, we want the memcpy to be compiled out to a single load/store. This should also eliminate the calls to vc4_utile_width/height(). Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
* vc4: Refactor to reuse the LT tile walking code.Eric Anholt2018-08-081-24/+34
|
* vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels.Eric Anholt2018-08-071-1/+2
| | | | | | | | | We won't have an FD if we're just having the server wait on a fence created by eglCreateSyncKHR(). Our seqno fences will happen in order, so server-side waits are no-ops in that case. Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* vc4: Ignore samplers for finding uniform offsets.Eric Anholt2018-08-071-3/+14
| | | | | | | | | | Fixes: dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex Cc: [email protected]
* vc4: Extend dumping of uniforms in QIR and in the command stream.Eric Anholt2018-08-073-13/+68
| | | | Similar to what I did for V3D, provide some description of the uniforms.
* vc4: Pull uinfo->data[i] dereference out to the top of the loop.Eric Anholt2018-08-071-20/+18
| | | | | | Reduces the size of vc4_uniforms.o by about 10%. We would basically always end up loading the cachline of uinfo->data[i] anyway, so it should be good for performance as well as making the code a bit cleaner.
* vc4: Make sure to emit a tile coordinates between two MSAA loads.Eric Anholt2018-08-071-12/+11
| | | | | | | | | | The HW only executes a load once the tile coordinates packet happens, and only tracks one at a time, so by emitting our two MSAA loads back to back we would end up with an undefined color or Z buffer. The simulator doesn't seem to care, but sync up the RCL generation with the kernel anyway. Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
* vc4: Respect a sampler view's first_layer field.Eric Anholt2018-08-071-1/+3
| | | | | | | Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture Cc: [email protected]
* vc4: Fix a leak of the no-vertex-elements workaround BO.Eric Anholt2018-08-061-0/+2
| | | | Fixes: bd1925562ad1 ("vc4: Convert the driver to emitting the shader record using pack macros.")
* vc4: Fix context creation when syncobjs aren't supported.Eric Anholt2018-08-061-2/+6
| | | | | | Noticed when trying to run current Mesa on rpi's downstream kernel. Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")
* vc4: Fix automake linking error.Juan A. Suarez Romero2018-08-011-0/+9
| | | | | | | | | | | | | | | CXXLD gallium_dri.la ../../../../src/gallium/drivers/vc4/.libs/libvc4.a(vc4_cl_dump.o): In function `vc4_dump_cl': src/gallium/drivers/vc4/vc4_cl_dump.c:45: undefined reference to `clif_dump_init' src/gallium/drivers/vc4/vc4_cl_dump.c:82: undefined reference to `clif_dump_destroy' ../../../../src/broadcom/cle/.libs/libbroadcom_cle.a(cle_libbroadcom_cle_la-v3d_decoder.o): In function `v3d_field_iterator_next': src/broadcom/cle/v3d_decoder.c:902: undefined reference to `clif_lookup_bo' Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107423 CC: Eric Anholt <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Andres Gomez <[email protected]>