summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* gallium: introduce PIPE_CAP_LOAD_CONSTBUFTimothy Arceri2017-09-151-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Port NEON-code to ARM64Jonas Pfeil2017-08-151-0/+84
| | | | | | | | | | Changed all register and instruction names, works the same. v2: Rebase on build system changes (by anholt) v3: Fix build on clang (by anholt, reported by Rob) Signed-off-by: Jonas Pfeil <[email protected]> Tested-by: Rob Herring <[email protected]>
* broadcom/vc4: Build the vc4_tiling_lt_neon.c with -mfpu=neon on ARM.Eric Anholt2017-08-154-7/+24
| | | | | | | | | | | | | If you don't pass this, the compiler refuses to compile the assembly for pre-v7 CPUs. This also keeps us from building identical, non-NEON code on aarch64 and x86. Fixes: a373f77662c5 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.") v2: Fix Android build by just appending NEON_C_SOURCES when ARCH_ARM_HAVE_NEON. Tested-by: Rob Herring <[email protected]>
* gallium: introduce PIPE_CAP_MEMOBJTimothy Arceri2017-08-031-0/+1
| | | | | | | | | | | | | | This can be used to guard support for EXT_memory_object and related extensions. v2: update gallium docs v3 (Timothy Arceri): - add cap to nv50 Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding capNicolai Hähnle2017-08-021-0/+1
| | | | | | | | v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit in the documentation Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_nir: move nir_lower_io to driversNicolai Hähnle2017-07-311-0/+3
| | | | | | | This allows drivers more freedom in how exactly they want to lower I/O, e.g. first lowering I/O to temporaries. Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: get rid of st_glsl_typesNicolai Hähnle2017-07-311-2/+8
| | | | | | | It's a duplicate of glsl_type::count_attribute_slots. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREFNicolai Hähnle2017-07-311-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Use the RA callback to improve register selection's choices.Eric Anholt2017-07-251-1/+52
| | | | | | | | | | | | | | | We simply pick r4 if available (anything else would force a MOV), then round-robin through accumulators (avoids physical regfile RAW delay slots), then round-robin through the physical regfile. The effect on instruction count is pretty impressive: total instructions in shared programs: 76563 -> 74526 (-2.66%) instructions in affected programs: 66463 -> 64426 (-3.06%) and we could probably do better with a little heuristic of "if we're going to choose a physical reg, and other operands of instructions using this as a src have the same physical regfile, then use the other regfile".
* broadcom/vc4: Scissor blits performed using the rendering engine.Eric Anholt2017-07-251-0/+9
| | | | | | Without this, a BlitFramebuffer would mark the whole framebuffer as being changed (so we emit loads/stores of all of it) rather than just the modified subset.
* broadcom/vc4: Prefer blit via rendering to the software fallback.Eric Anholt2017-07-251-6/+8
| | | | | | | I don't know how I managed to leave this here for so long. Found when working on a 1:1 overlapping blit extension for X11. Cc: [email protected]
* broadcom/vc4: Switch the Viewport Center fields to a fixed-point representation.Eric Anholt2017-07-251-2/+2
| | | | | | This gets us automatic CL decoding to a floating-point value, and drops a magic number from the emit code. 250x250 shader runner tests now say they have a center of 125.0 instead of 2000.
* broadcom/vc4: Use the XML decoder for CL dumping.Eric Anholt2017-07-253-443/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VC4_DEBUG_CL output goes from: 0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING 0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT 0x00000012 0x00000012: 0x12 0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW 0x00000014 0x00000014: 0x00 0x00000015 0x00000015: 0x00 0x00000016 0x00000016: 0x00 0x00000017 0x00000017: 0x00 0x00000018 0x00000018: 0xfa 0x00000019 0x00000019: 0x00 0x0000001a 0x0000001a: 0xfa 0x0000001b 0x0000001b: 0x00 to: 0x00000010 0x00000010: 0x06 Start Tile Binning 0x00000011 0x00000011: 0x38 Primitive List Format Data Type: 1 (16-bit index) Primitive Type: 2 (Triangles List) 0x00000013 0x00000013: 0x66 Clip Window Clip Window Height in pixels: 250 Clip Window Width in pixels: 250 Clip Window Bottom Pixel Coordinate: 0 Clip Window Left Pixel Coordinate: 0 v2: Squash in robher's fixes for Android
* renderonly/etnaviv: stop importing resource from renderonlyLucas Stach2017-07-191-2/+3
| | | | | | | | | | | The current way of importing the resource from renderonly after allocation is opaque and is taking away control from the driver, which it needs in order to implement more advanced scenarios than the simple linear scanout with matching stride alignments. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Acked-by: Daniel Stone <[email protected]>
* Android: Fix vc4 build since XML changes.Rob Herring2017-07-121-1/+1
| | | | | | | | | | | | | | Since commit 7f80a9ff1312 ("vc4: Introduce XML-based packet header generation like Intel's."), the vc4 build on Android is broken: out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found The path of the generated header needs to be fixed since we build out of tree. Acked-by: Eric Anholt <[email protected]> Signed-off-by: Rob Herring <[email protected]>
* vc4: Set shareable BOs as T tiled if possibleEric Anholt2017-07-124-13/+182
| | | | | | | | | | | | | | | | | | | | | | | | X11 and GL compositor performance on VC4 has been terrible because of our SHARED-usage buffers all being forced to linear. This swaps SHARED && !LINEAR buffers over to being tiled. This is an expected win for all GL compositors during rendering (a full copy of each shared texture per draw call), allows X11 to be used with decent performance without a GL compositor, and improves X11 windowed swapbuffers performance as well. It also halves the memory usage of shared buffers that get textured from. The only cost should be idle systems with a scanout-only buffer that isn't flagged as LINEAR, in which case the memory bandwidth cost of scanout goes up ~25%. This implements the EGL_EXT_image_dma_buf_import_modifiers extension, supporting the VC4 T_TILED modifier. v2: Added modifier support to resource creation/import, and advertisement (by daniels). v3: Fix old-kernel fallback path, fix compiler error and warnings, and comment touchups (by anholt). Reviewed-by: Daniel Stone <[email protected]>
* vc4: Use vc4_setup_slices for resource importEric Anholt2017-07-121-33/+19
| | | | | | | | | Rather than open-coding populating the first slice inside resource import, use vc4_setup_slices to do it for us. v2: Rebase on VC4_DEBUG=surf change Reviewed-by: Daniel Stone <[email protected]>
* vc4: Make the miptree debug code available under VC4_DEBUG=surfEric Anholt2017-07-123-5/+6
| | | | | | I kept flipping the bool on for debug, so let's just make it available. Reviewed-by: Daniel Stone <[email protected]>
* vc4: Switch back to using a local copy of vc4_drm.h.Eric Anholt2017-07-122-2/+4
| | | | | | | | | | | Needing to get our uapi header from libdrm has only complicated things. Follow intel's lead and drop our requirement for it. Generated from the same commit mentioned in the README. v2: Update Android.mk as well, move vc4_drm.h reference for distcheck. Reviewed-by: Daniel Stone <[email protected]>
* vc4: Remove a stale comment.Eric Anholt2017-07-121-4/+0
| | | | | The kernel hasn't been synchronous in a couple of years, plus there was synchronization code right there.
* vc4: automake: include vc4_cl_dump.h inJuan A. Suarez Romero2017-07-041-0/+1
| | | | | | | | | Ensure vc4_cl_dump.h and $(BROADCOM_FILES) are distributed in the dist-file. This fixes `make distcheck` Reviewed-by: Emil Velikov <[email protected]>
* vc4: Start using XML unpack functions in CL dump.Eric Anholt2017-06-305-19/+67
| | | | | | For now this is a no-op on the output, but it makes it clear that we've had weird things going on with things like V3D21_CLIPPER_Z_SCALE_AND_OFFSET.
* vc4: Replace a couple of magic numbers with #define usage.Eric Anholt2017-06-301-2/+2
|
* vc4: Move rasterizer state packing to CSO creation time.Eric Anholt2017-06-304-29/+25
| | | | | | | | | | This gets our vc4_emit.c size back down a bit: before: 1020 0 0 1020 3fc src/gallium/drivers/vc4/.libs/vc4_emit.o after: 968 0 0 968 3c8 src/gallium/drivers/vc4/.libs/vc4_emit.o
* vc4: Convert the driver to emitting the shader record using pack macros.Eric Anholt2017-06-303-54/+92
|
* vc4: Simplify pack header usageEric Anholt2017-06-304-35/+28
| | | | | | | | | | | | | Take the CL pointer in, which will be useful for enabling relocs. However, our code expands a bit more: before: 4449 0 0 4449 1161 src/gallium/drivers/vc4/.libs/vc4_draw.o 988 0 0 988 3dc src/gallium/drivers/vc4/.libs/vc4_emit.o after: 4481 0 0 4481 1181 src/gallium/drivers/vc4/.libs/vc4_draw.o 1020 0 0 1020 3fc src/gallium/drivers/vc4/.libs/vc4_emit.o
* vc4: Start using the pack header.Eric Anholt2017-06-304-51/+130
| | | | | | | | | | | | | This slightly inflates the size of the generated code, in exchange for getting us some convenient tools. before: 4389 0 0 4389 1125 src/gallium/drivers/vc4/.libs/vc4_draw.o 808 0 0 808 328 src/gallium/drivers/vc4/.libs/vc4_emit.o after: 4449 0 0 4449 1161 src/gallium/drivers/vc4/.libs/vc4_draw.o 988 0 0 988 3dc src/gallium/drivers/vc4/.libs/vc4_emit.o
* vc4: Introduce XML-based packet header generation like Intel's.Eric Anholt2017-06-301-1/+4
| | | | | | | | | | | | | | | I really liked this idea, as it should help with management of packet parsing tools like the CL dump. The python script is forked off of theirs because our packets are byte-based instead of dwords, and the changes to do so while avoiding performance regressions due to unaligned accesses were quite invasive. v2: Fix Android.mk paths, drop shebang for python script, fix overlap detection. Acked-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Tested-by: Rob Herring <[email protected]>
* Android: use symlinks for driver loadingRob Herring2017-06-291-0/+1
| | | | | | | | | Instead of having special driver loading logic for Android, create symlinks to gallium_dri.so so we can use the standard loading logic. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Rob Herring <[email protected]>
* vc4: Clean up release build warnings using MAYBE_UNUSED.Eric Anholt2017-06-202-6/+5
| | | | | These variables are all used in an assert(), so release builds see no usages.
* vc4: Allow VBOs to be mapped during execution.Eric Anholt2017-06-201-1/+1
| | | | | | | | There's no reason we can't -- the mappings we expose are basically equivalent to persistent/coherent, already. Improves mesa-demos drawoverhead (no state change) performance by 5.21362% +/- 1.25078% (n=11).
* gallium: Add renderonly-based support for pl111+vc4.Eric Anholt2017-06-154-2/+54
| | | | | | | | | | | | | | | | | | | This follows the model of imx (display) and etnaviv (render): pl111 is a display-only device, so when asked to do GL for it, we see if we have a vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do scanout allocations. The difference from etnaviv is that we share the same BO between vc4 and pl111, rather than having a vc4 bo and a pl11 bo and copies between the two. The only mismatch between their requirements is that vc4 requires 4-pixel (at 32bpp) stride alignment, while pl111 requires that stride match width. The kernel will reject any modesets to an incorrect stride, so the 3D driver doesn't need to worry about that. v2: Rebase on Android rework, drop unused include. v3: Fix another Android bug, from Rob Herring's build-testing. Reviewed-by: Christian Gmeiner <[email protected]>
* gallium: add PIPE_CAP_BINDLESS_TEXTURESamuel Pitoiset2017-06-141-0/+1
| | | | | | | | | Whether bindless texture operations are supported by the underlying driver. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: Add a cap to check if the driver supports ARB_post_depth_coverageLyude2017-06-021-0/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Remove dead code in vc4_dump_surface_msaa()Rhys Kidd2017-05-221-6/+0
| | | | | | | | | Coverity caught the use of dead code copy-paste for found_colors[] and num_found_colors. CID: 1341850 Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Don't allocate new BOs to avoid synchronization when they're shared.Eric Anholt2017-05-171-1/+2
| | | | | | | If X11 did a software fallback to the entire screen, we would throw out the BO the screen is scanning out from and allocate a new one. Cc: [email protected]
* vc4: Drop pointless indirections around BO import/export.Eric Anholt2017-05-173-69/+49
| | | | | | I've since found them to be more confusing by adding indirections than clarifying by screening off resources from the handle/fd import/export process.
* vc4: Drop the u_resource_vtbl no-op layer.Eric Anholt2017-05-174-33/+27
| | | | | We only ever attached one vtbl, so it was a waste of space and indirections.
* gallium: add PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTIONMarek Olšák2017-05-171-0/+1
| | | | | | for skipping mapped-buffer checking in every GL draw call Reviewed-by: Nicolai Hähnle <[email protected]>
* Android: push driver build details to driver makefilesRob Herring2017-05-111-0/+4
| | | | | | | | | | | | | src/gallium/targets/dri/Android.mk contains lots of conditional for individual drivers. Let's move these details into the individual driver makefiles. In the process, align the make driver conditionals with automake (i.e. HAVE_GALLIUM_*). Signed-off-by: Rob Herring <[email protected]> [Emil Velikov: add the radeon winsys for radeonsi] Signed-off-by: Emil Velikov <[email protected]>
* gallium: add PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEXMarek Olšák2017-05-101-0/+1
| | | | | | | The next patch will use it. This is really for svga and GL2-level drivers. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: remove pipe_index_buffer and set_index_bufferMarek Olšák2017-05-105-38/+20
| | | | | | | | | | | | | | pipe_draw_info::indexed is replaced with index_size. index_size == 0 means non-indexed. Instead of pipe_index_buffer::offset, pipe_draw_info::start is used. For indexed indirect draws, pipe_draw_info::start is added to the indirect start. This is the only case when "start" affects indirect draws. pipe_draw_info::index is a union. Use either index::resource or index::user depending on the value of pipe_draw_info::has_user_indices. v2: fixes for nine, svga
* gallium: decrease the size of pipe_vertex_buffer - 24 -> 16 bytesMarek Olšák2017-05-101-1/+1
|
* nir: Embed the shader_info in the nir_shader againJason Ekstrand2017-05-092-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since 00620782c92100d77c660f9783504c6d80fa1d58, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Use runtime CPU detection for whether NEON is available.Eric Anholt2017-05-022-14/+16
| | | | | | | | This will allow Raspbian's ARMv6 builds to take advantage of the new NEON code, and could prevent problems if vc4 ends up getting used on a v7 CPU without NEON. v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
* vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.Eric Anholt2017-05-024-8/+31
| | | | | | | | | Android.mk was setting the flag across the entire driver, so we didn't have non-NEON versions getting built. This was going to be a problem with the next commit, when I start auto-detecting NEON support and use the non-NEON version when appropriate. Reviewed-by: Rob Herring <[email protected]>
* vc4: Only build the NEON code on arm32.Eric Anholt2017-05-011-2/+2
| | | | | | | | | | | NEON is sufficiently different on arm64 that we can't just reuse this code. Disable it on arm64 for now. v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build for a v8 CPU. Signed-off-by: Eric Anholt <[email protected]> Cc: <[email protected]>
* gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERSSamuel Pitoiset2017-04-261-0/+1
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: fold u_trim_pipe_prim call from st/mesa to driversMarek Olšák2017-04-201-0/+5
| | | | | | | Most drivers don't need it and shouldn't need it because it can't be used in some cases (indirect draws, primitive restart, count from streamout). Reviewed-by: Brian Paul <[email protected]>
* vc4: Enable V3D 2.6.Eric Anholt2017-04-181-1/+1
| | | | | This version of the chip is present on the Cygnus-based 911360 enterprise phone platform. It appears to be completely backwards compatible.