aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
...
* meson: Use dependencies for nirDylan Baker2018-01-111-3/+4
| | | | | | | | | | | | | | | | | This creates two new internal dependencies, idep_nir_headers and idep_nir. The former encapsulates the generation of nir_opcodes.h and nir_builder_opcodes.h and adding src/compiler/nir as an include path. This ensures that any target that needs nir headers will have the includes and that the generated headers will be generated before the target is build. The second, idep_nir, includes the first and additionally links to libnir. This is intended to make it easier to avoid race conditions in the build when using nir, since the number of consumers for libnir and it's headers are quite high. Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* gallium: plumb context priority through to driverRob Clark2017-12-191-0/+1
| | | | | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Andres Rodriguez <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* meson: define driver dependenciesDylan Baker2017-12-041-0/+5
| | | | | | | | | | | | This allow us to encapsulate the compiler and linkage requirements of each driver in a reusable way. The result will be that each target that needs a specific driver can simply add `driver_<name>` to its dependencies line and the necessary libraries and compiler args will be added. This will allow for a lot of code de-duplication between gallium targets. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* broadcom/vc4: Use a single-entry cached last_hindex value.Eric Anholt2017-12-012-2/+20
| | | | | | | | | Since almost all BOs will be in one CL at a time, this cache will almost always hit except for the first usage of the BO in each CL. This didn't show up as statistically significant on the minetest trace (n=340), but if I lop off the throttled lobe of the bimodal distribution, it very clearly does (0.74731% +/- 0.162093%, n=269).
* broadcom/vc4: Decompose single QUADs to a TRIANGLE_FAN.Eric Anholt2017-12-011-5/+14
| | | | | | | | No significant difference in the minetest replay, but it should reduce overhead by not requiring that we write quad indices to index buffers that we repeatedly re-upload (and making the draw packet smaller, as well). Over the course of the series the actual game seems to be up by 1-2 fps.
* broadcom/vc4: Skip emitting redundant VC4_PACKET_GEM_HANDLES.Eric Anholt2017-12-013-3/+12
| | | | | | | | | Now that there's only one user of it, it's pretty obvious how to avoid emitting redundant ones. This should save a bunch of kernel validation overhead. No statistically sigificant difference on the minetest trace I was looking at (n=169), but the maximum FPS is up by .3%
* broadcom/vc4: Simplify the relocation handling for index buffers.Eric Anholt2017-12-012-17/+17
| | | | | | Originally there was CL code for handling various relocations back when I had relocs for the TSDA/TA buffers. Now that the kernel handles those entirely on its own, I can inline that code into the one place using it.
* broadcom/vc4: Fix handling of GFXH-515 workaround with a start vertex count.Eric Anholt2017-12-011-16/+27
| | | | | | | | | | | | | We failed to take the start into account for how many vertices to draw in this round, so we would end up decrementing count below 0, which as an unsigned number meant we would loop until the CLs soon ran out of space. When I wrote the code I was thinking about how to use the previously emitted shader state (no index bias baked into the elements) by emitting up to 65535 and then only re-emitting with bias for the second wround, but that doesn't work if the start is over 65535. Instead, just delay emitting shader state until we get into the drawarrays GFXH-515 loop and always bake the bias in when we're doing the workaround.
* broadcom/vc4: Fix the scaling factor for the GFXH-515 workaround.Eric Anholt2017-12-011-1/+1
| | | | For triangle strips, we step by max_verts - 2.
* vc4: check preprocessor token existence using #ifdef instead of #ifEric Engestrom2017-11-281-3/+3
| | | | | | | (other uses of USE_VC4_SIMULATOR are already correct) Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: fix indentation in vc4_screen.cAndres Rodriguez2017-11-141-8/+8
| | | | | | | Stumbled into this when adding a new PIPE_CAP. Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: Fix simulator mode for the MADVISE usage.Eric Anholt2017-11-091-0/+4
|
* gallium: add CAPs to support HW atomic counters. (v3)Dave Airlie2017-11-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate to the buffer atomics, so require different limits and code paths. I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it. This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources. I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features. v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode. v3: fix some rebase errors (Gert Wollny) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-By: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* broadcom/vc4: Mark BOs as purgeable when they enter the BO cacheBoris Brezillon2017-11-093-48/+86
| | | | | | | | | | | | | | | | This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all BOs placed in the mesa BO cache as purgeable so that the system can reclaim this memory under memory pressure. v2: - Removed BOs from the cache when they've been purged by the kernel - Check whether the madvise ioctl is supported or not before using it v3: Don't walk the whole list when we find a busy BO (by anholt, acked by Boris) Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: Enable VC4's NEON assembly support.Eric Anholt2017-11-091-0/+13
| | | | | | Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Timothy Arceri <[email protected]>
* gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSETMarek Olšák2017-11-061-0/+1
|
* gallium: add cap for driver specified max combined shader resources.Dave Airlie2017-11-011-0/+1
| | | | | | | | Some hw (evergreen) has a limit on how many combined (images/buffers/mrts) a fragment shader can access. Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* vc4: fix release buildEric Engestrom2017-10-271-6/+6
| | | | | | | | | | | Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need to explicitly compile this code out. Fixes: 3df78928786134874eafa "vc4: Drop reloc_count tracking for debug asserts on non-debug builds." Cc: Eric Anholt <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: Fix aliasing issueStefan Schake2017-10-201-1/+1
| | | | | | | This was causing Android clang version 3.8.256229 to miscompile, presumably due to strict aliasing. Fixes: 14dc281c1332 ("vc4: Enforce one-uniform-per-instruction after optimization.")
* nir: Get rid of nir_shader::stageJason Ekstrand2017-10-201-1/+1
| | | | | | | | It's redundant with nir_shader::info::stage. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* meson: Add support for the vc4 driver.Eric Anholt2017-10-171-0/+101
| | | | Reviewed-by: Dylan Baker <[email protected]>
* broadcom/vc4: Fix false-positive for the tiling ioctls on simulator mode.Eric Anholt2017-10-171-0/+1
| | | | | If there happened to be an ENOENT laying around, we would try using the ioctls later and fail out resource allocation.
* broadcom/vc4: Skip BO labeling when in simulator mode.Eric Anholt2017-10-172-1/+5
| | | | | It was calling down into i915 trying to label the BO, which is definitely not the right thing.
* broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.Eric Anholt2017-10-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6). V3D 3.3 introduces an MMU (no more CMA allocations) and support for GLES3.1. This driver is not currently conformant, though that will be a target as soon as possible. V3D 3.x parts use a new texture tiling layout common across many Broadcom graphics parts including and the HVS scanout engine. It also massively changes the QPU instructions, introducing a common physical register file (no more A/B split) and half-float instructions, while removing the 4x8 unorm instructions in favor of half-float for talking to fixed function interfaces. Because so much has changed, vc5 is implemented in a separate gallium driver, using only the XML code-generation support from vc4. v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit returns. Fix up a bit of MRT setup. Sync the simulator to kernel behavior a bit more. Improve uniform debugging code. Rebase on QIR->VIR rename. Move texture state mostly to the CSOs. Improve cache flushing on the simulator. Fix program deletion use-after-frees. Acked-by: Dave Airlie <[email protected]> (uabi plan) Acked-by: Daniel Vetter <[email protected]> (uabi plan)
* nir: Move vc4's alpha test lowering to core NIR.Eric Anholt2017-10-103-55/+11
| | | | | | | | | | | | | I've been doing this inside of vc4, but vc5 wants it as well and it may be useful for other drivers (Intel has a related path for pre-gen6 with MRT, and freedreno had a TGSI path for it at one point). This required defining a common enum for the standard comparison functions, but other lowering passes are likely to also want that enum. v2: Add to meson.build as well. Acked-by: Rob Clark <[email protected]>
* broadcom/vc4: Expose PIPE_CAP_TILE_RASTER_ORDEREric Anholt2017-10-107-20/+71
| | | | | | | | | | | | | | Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in. v2: Fix on the simulator. v3: Add the cap (disabled) to other drivers, add rst docs for the cap. v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS v5: Split from the core gallium commit, drop some unnecessary code related to glBlitFramebuffer(), fix a crash with clears before state has been bound.
* gallium: Create a new PIPE_CAP_TILE_RASTER_ORDER for vc4.Eric Anholt2017-10-101-0/+1
| | | | | | | | | | | | | | | | Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in. This commit introduces the core gallium support, vc4 changes will follow. v2: Fix on the simulator. v3: Add the cap (disabled) to other drivers, add rst docs for the cap. v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS v5: Drop vc4 changes from this commit, for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> (v3)
* broadcom/vc4: Implement GL_ARB_texture_barrier.Eric Anholt2017-10-102-1/+12
| | | | | | Improves x11perf -copywinwin100 from ~2000/sec to ~4700/sec. More importantly, this is a prerequisite for the new GL_MESA_tile_raster_order extension.
* vc4: Add support for 5551 textures.Eric Anholt2017-10-102-3/+3
| | | | | This keeps us from promoting them up to 8888, at the cost of not being color-renderable.
* gallium: add PIPE_CAP_TGSI_ANY_REG_AS_ADDRESSMarek Olšák2017-10-061-0/+1
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* broadcom: Fix out-of-tree build include pathDaniel Stone2017-10-051-0/+2
| | | | | Reviewed-by: Eric Anholt <[email protected]> Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
* broadcom/vc4: Don't advertise tiled dmabuf modifiers if we can't use themDerek Foreman2017-10-051-13/+18
| | | | | | | | | | | | | | | | If the DRM_VC4_GET_TILING ioctl isn't present then we can't tell if a dmabuf bo is tiled or linear, so will always assume it's linear. By not advertising tiled formats in this situation we ensure the assumption is correct. This fixes a bug where most attempts to render a gl wayland client under weston will result in a client side abort. Signed-off-by: Derek Foreman <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Acked-by: Daniel Stone <[email protected]> (on irc)
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-291-0/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* broadcom/vc4: Fix release buildEric Anholt2017-09-271-1/+1
| | | | | | | | | I remember thinking "gosh, it would be nice if I could do a kernel-style 'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was compiled on both builds", and then forgot to test a release build anyway. Fixes: a8fd58eae596 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.") Reported-by: Derek Foreman <[email protected]>
* vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.Eric Anholt2017-09-274-0/+41
| | | | | | | This has proven to be incredibly useful for debugging CMA allocation failures and driving memory management improvements. However, we don't want to burden entry and exit from the BO cache with the labeling ioctl's overhead on release builds.
* broadcom/vc4: Fix infinite retry in vc4_bo_alloc()Boris Brezillon2017-09-261-6/+4
| | | | | | | | | | | | | | | cleared_and_retried is always reset to false when jumping to the retry label, thus leading to an infinite retry loop. Fix that by moving the cleared_and_retried variable definitions at the beginning of the function. While we're at it, move the create variable with the other local variables and explicitly reset its content in the retry path. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Fixes: 78087676c98aa8884ba92 "vc4: Restructure the simulator mode."
* broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.Eric Anholt2017-09-266-39/+42
| | | | | | | | | | | I was overwriting view->texture with the shadow resource when we need to do shadow copies (retiling or baselevel rebase), but that tripped up some critical new sanity checking in state_tracker (making sure that stObj->pt hasn't changed from view->texture through TexImage-related paths). To avoid that, move the shadow resource to the vc4_sampler_view struct. Fixes: f0ecd36ef8e1 ("st/mesa: add an entirely separate codepath for setting up buffer views")
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-211-0/+1
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Fix use-after-free when deleting a program.Eric Anholt2017-09-181-6/+15
| | | | | | | | | | | By leaving the compiled shader in the context's stage state, the next compile of a new FS would look in the old compiled FS for figuring out whether to set various dirty flags for the VS compile. Clear out the pointer when deleting the program, and make sure that we always mark the state as dirty if the previous program had been lost. Fixes valgrind warnings on glsl-max-varyings. Fixes: 2350569a78c6 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
* broadcom/vc4: Fix crashes since the gallium blitter reworks.Eric Anholt2017-09-181-1/+3
| | | | | Even if we're not clearing color, the blitter has started dereferencing the color value.
* broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.Eric Anholt2017-09-181-24/+32
| | | | | | | | | | | | | The blitter will bind just the depth buffer, which flushes the current job if we had both a color and depth/stencil. If the clear was doing partial depth/stencil (quad-based) and color (tile-based), we'd go on to try to set up the rest of the tile clear in the now flushed job. Instead, move the partial clear up before we start setting up the job for the current FBO state, and re-fetch the job if we're continuing on to a tile-based clear. Fixes valgrind failures in fbo-depthtex. Fixes: 9421a6065c4e ("vc4: Fix fallback to quad clears of depth in GLX.")
* broadcom/vc4: Fix use-after-free for flushing when writing to a texture.Eric Anholt2017-09-181-2/+7
| | | | | | | | I was trying to continue the hash table loop, not the inner loop. This tended to work out, because we would have *just* freed the job struct. Fixes some valgrind failures in fbo-depthtex. Fixes: f597ac396640 ("vc4: Implement job shuffling")
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-181-0/+1
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: introduce PIPE_CAP_LOAD_CONSTBUFTimothy Arceri2017-09-151-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Port NEON-code to ARM64Jonas Pfeil2017-08-151-0/+84
| | | | | | | | | | Changed all register and instruction names, works the same. v2: Rebase on build system changes (by anholt) v3: Fix build on clang (by anholt, reported by Rob) Signed-off-by: Jonas Pfeil <[email protected]> Tested-by: Rob Herring <[email protected]>
* broadcom/vc4: Build the vc4_tiling_lt_neon.c with -mfpu=neon on ARM.Eric Anholt2017-08-154-7/+24
| | | | | | | | | | | | | If you don't pass this, the compiler refuses to compile the assembly for pre-v7 CPUs. This also keeps us from building identical, non-NEON code on aarch64 and x86. Fixes: a373f77662c5 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.") v2: Fix Android build by just appending NEON_C_SOURCES when ARCH_ARM_HAVE_NEON. Tested-by: Rob Herring <[email protected]>
* gallium: introduce PIPE_CAP_MEMOBJTimothy Arceri2017-08-031-0/+1
| | | | | | | | | | | | | | This can be used to guard support for EXT_memory_object and related extensions. v2: update gallium docs v3 (Timothy Arceri): - add cap to nv50 Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding capNicolai Hähnle2017-08-021-0/+1
| | | | | | | | v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit in the documentation Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_nir: move nir_lower_io to driversNicolai Hähnle2017-07-311-0/+3
| | | | | | | This allows drivers more freedom in how exactly they want to lower I/O, e.g. first lowering I/O to temporaries. Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: get rid of st_glsl_typesNicolai Hähnle2017-07-311-2/+8
| | | | | | | It's a duplicate of glsl_type::count_attribute_slots. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>