summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* broadcom/vc4: fix indentation in vc4_screen.cAndres Rodriguez2017-11-141-8/+8
| | | | | | | Stumbled into this when adding a new PIPE_CAP. Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: Fix simulator mode for the MADVISE usage.Eric Anholt2017-11-091-0/+4
|
* gallium: add CAPs to support HW atomic counters. (v3)Dave Airlie2017-11-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate to the buffer atomics, so require different limits and code paths. I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it. This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources. I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features. v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode. v3: fix some rebase errors (Gert Wollny) Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-By: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* broadcom/vc4: Mark BOs as purgeable when they enter the BO cacheBoris Brezillon2017-11-093-48/+86
| | | | | | | | | | | | | | | | This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all BOs placed in the mesa BO cache as purgeable so that the system can reclaim this memory under memory pressure. v2: - Removed BOs from the cache when they've been purged by the kernel - Check whether the madvise ioctl is supported or not before using it v3: Don't walk the whole list when we find a busy BO (by anholt, acked by Boris) Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* meson: Enable VC4's NEON assembly support.Eric Anholt2017-11-091-0/+13
| | | | | | Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Timothy Arceri <[email protected]>
* gallium: add PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSETMarek Olšák2017-11-061-0/+1
|
* gallium: add cap for driver specified max combined shader resources.Dave Airlie2017-11-011-0/+1
| | | | | | | | Some hw (evergreen) has a limit on how many combined (images/buffers/mrts) a fragment shader can access. Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* vc4: fix release buildEric Engestrom2017-10-271-6/+6
| | | | | | | | | | | Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need to explicitly compile this code out. Fixes: 3df78928786134874eafa "vc4: Drop reloc_count tracking for debug asserts on non-debug builds." Cc: Eric Anholt <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* broadcom/vc4: Fix aliasing issueStefan Schake2017-10-201-1/+1
| | | | | | | This was causing Android clang version 3.8.256229 to miscompile, presumably due to strict aliasing. Fixes: 14dc281c1332 ("vc4: Enforce one-uniform-per-instruction after optimization.")
* nir: Get rid of nir_shader::stageJason Ekstrand2017-10-201-1/+1
| | | | | | | | It's redundant with nir_shader::info::stage. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* meson: Add support for the vc4 driver.Eric Anholt2017-10-171-0/+101
| | | | Reviewed-by: Dylan Baker <[email protected]>
* broadcom/vc4: Fix false-positive for the tiling ioctls on simulator mode.Eric Anholt2017-10-171-0/+1
| | | | | If there happened to be an ENOENT laying around, we would try using the ioctls later and fail out resource allocation.
* broadcom/vc4: Skip BO labeling when in simulator mode.Eric Anholt2017-10-172-1/+5
| | | | | It was calling down into i915 trying to label the BO, which is definitely not the right thing.
* broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.Eric Anholt2017-10-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6). V3D 3.3 introduces an MMU (no more CMA allocations) and support for GLES3.1. This driver is not currently conformant, though that will be a target as soon as possible. V3D 3.x parts use a new texture tiling layout common across many Broadcom graphics parts including and the HVS scanout engine. It also massively changes the QPU instructions, introducing a common physical register file (no more A/B split) and half-float instructions, while removing the 4x8 unorm instructions in favor of half-float for talking to fixed function interfaces. Because so much has changed, vc5 is implemented in a separate gallium driver, using only the XML code-generation support from vc4. v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit returns. Fix up a bit of MRT setup. Sync the simulator to kernel behavior a bit more. Improve uniform debugging code. Rebase on QIR->VIR rename. Move texture state mostly to the CSOs. Improve cache flushing on the simulator. Fix program deletion use-after-frees. Acked-by: Dave Airlie <[email protected]> (uabi plan) Acked-by: Daniel Vetter <[email protected]> (uabi plan)
* nir: Move vc4's alpha test lowering to core NIR.Eric Anholt2017-10-103-55/+11
| | | | | | | | | | | | | I've been doing this inside of vc4, but vc5 wants it as well and it may be useful for other drivers (Intel has a related path for pre-gen6 with MRT, and freedreno had a TGSI path for it at one point). This required defining a common enum for the standard comparison functions, but other lowering passes are likely to also want that enum. v2: Add to meson.build as well. Acked-by: Rob Clark <[email protected]>
* broadcom/vc4: Expose PIPE_CAP_TILE_RASTER_ORDEREric Anholt2017-10-107-20/+71
| | | | | | | | | | | | | | Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in. v2: Fix on the simulator. v3: Add the cap (disabled) to other drivers, add rst docs for the cap. v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS v5: Split from the core gallium commit, drop some unnecessary code related to glBlitFramebuffer(), fix a crash with clears before state has been bound.
* gallium: Create a new PIPE_CAP_TILE_RASTER_ORDER for vc4.Eric Anholt2017-10-101-0/+1
| | | | | | | | | | | | | | | | Because vc4 can control the order that tiles are rasterized in, we can use it to implement overlapping blits using normal drawing and GL_ARB_texture_barrier, as long as we can tell the kernel what order to render the tiles in. This commit introduces the core gallium support, vc4 changes will follow. v2: Fix on the simulator. v3: Add the cap (disabled) to other drivers, add rst docs for the cap. v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS v5: Drop vc4 changes from this commit, for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> (v3)
* broadcom/vc4: Implement GL_ARB_texture_barrier.Eric Anholt2017-10-102-1/+12
| | | | | | Improves x11perf -copywinwin100 from ~2000/sec to ~4700/sec. More importantly, this is a prerequisite for the new GL_MESA_tile_raster_order extension.
* vc4: Add support for 5551 textures.Eric Anholt2017-10-102-3/+3
| | | | | This keeps us from promoting them up to 8888, at the cost of not being color-renderable.
* gallium: add PIPE_CAP_TGSI_ANY_REG_AS_ADDRESSMarek Olšák2017-10-061-0/+1
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* broadcom: Fix out-of-tree build include pathDaniel Stone2017-10-051-0/+2
| | | | | Reviewed-by: Eric Anholt <[email protected]> Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
* broadcom/vc4: Don't advertise tiled dmabuf modifiers if we can't use themDerek Foreman2017-10-051-13/+18
| | | | | | | | | | | | | | | | If the DRM_VC4_GET_TILING ioctl isn't present then we can't tell if a dmabuf bo is tiled or linear, so will always assume it's linear. By not advertising tiled formats in this situation we ensure the assumption is correct. This fixes a bug where most attempts to render a gl wayland client under weston will result in a client side abort. Signed-off-by: Derek Foreman <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Acked-by: Daniel Stone <[email protected]> (on irc)
* gallium: add LDEXP TGSI instruction and corresponding capNicolai Hähnle2017-09-291-0/+1
| | | | | Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* broadcom/vc4: Fix release buildEric Anholt2017-09-271-1/+1
| | | | | | | | | I remember thinking "gosh, it would be nice if I could do a kernel-style 'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was compiled on both builds", and then forgot to test a release build anyway. Fixes: a8fd58eae596 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.") Reported-by: Derek Foreman <[email protected]>
* vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.Eric Anholt2017-09-274-0/+41
| | | | | | | This has proven to be incredibly useful for debugging CMA allocation failures and driving memory management improvements. However, we don't want to burden entry and exit from the BO cache with the labeling ioctl's overhead on release builds.
* broadcom/vc4: Fix infinite retry in vc4_bo_alloc()Boris Brezillon2017-09-261-6/+4
| | | | | | | | | | | | | | | cleared_and_retried is always reset to false when jumping to the retry label, thus leading to an infinite retry loop. Fix that by moving the cleared_and_retried variable definitions at the beginning of the function. While we're at it, move the create variable with the other local variables and explicitly reset its content in the retry path. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Fixes: 78087676c98aa8884ba92 "vc4: Restructure the simulator mode."
* broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.Eric Anholt2017-09-266-39/+42
| | | | | | | | | | | I was overwriting view->texture with the shadow resource when we need to do shadow copies (retiling or baselevel rebase), but that tripped up some critical new sanity checking in state_tracker (making sure that stObj->pt hasn't changed from view->texture through TexImage-related paths). To avoid that, move the shadow resource to the vc4_sampler_view struct. Fixes: f0ecd36ef8e1 ("st/mesa: add an entirely separate codepath for setting up buffer views")
* gallium: Add PIPE_SHADER_CAP_INT64_ATOMICSJan Vesely2017-09-211-0/+1
| | | | | | | Denotes availability of 64bit int atomic instructions Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Fix use-after-free when deleting a program.Eric Anholt2017-09-181-6/+15
| | | | | | | | | | | By leaving the compiled shader in the context's stage state, the next compile of a new FS would look in the old compiled FS for figuring out whether to set various dirty flags for the VS compile. Clear out the pointer when deleting the program, and make sure that we always mark the state as dirty if the previous program had been lost. Fixes valgrind warnings on glsl-max-varyings. Fixes: 2350569a78c6 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
* broadcom/vc4: Fix crashes since the gallium blitter reworks.Eric Anholt2017-09-181-1/+3
| | | | | Even if we're not clearing color, the blitter has started dereferencing the color value.
* broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.Eric Anholt2017-09-181-24/+32
| | | | | | | | | | | | | The blitter will bind just the depth buffer, which flushes the current job if we had both a color and depth/stencil. If the clear was doing partial depth/stencil (quad-based) and color (tile-based), we'd go on to try to set up the rest of the tile clear in the now flushed job. Instead, move the partial clear up before we start setting up the job for the current FBO state, and re-fetch the job if we're continuing on to a tile-based clear. Fixes valgrind failures in fbo-depthtex. Fixes: 9421a6065c4e ("vc4: Fix fallback to quad clears of depth in GLX.")
* broadcom/vc4: Fix use-after-free for flushing when writing to a texture.Eric Anholt2017-09-181-2/+7
| | | | | | | | I was trying to continue the hash table loop, not the inner loop. This tended to work out, because we would have *just* freed the job struct. Fixes some valgrind failures in fbo-depthtex. Fixes: f597ac396640 ("vc4: Implement job shuffling")
* gallium: Add PIPE_SHADER_CAP_FP16Jan Vesely2017-09-181-0/+1
| | | | | | | | | Denotes native half precision float operations capability v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16 fix indentation Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: introduce PIPE_CAP_LOAD_CONSTBUFTimothy Arceri2017-09-151-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Port NEON-code to ARM64Jonas Pfeil2017-08-151-0/+84
| | | | | | | | | | Changed all register and instruction names, works the same. v2: Rebase on build system changes (by anholt) v3: Fix build on clang (by anholt, reported by Rob) Signed-off-by: Jonas Pfeil <[email protected]> Tested-by: Rob Herring <[email protected]>
* broadcom/vc4: Build the vc4_tiling_lt_neon.c with -mfpu=neon on ARM.Eric Anholt2017-08-154-7/+24
| | | | | | | | | | | | | If you don't pass this, the compiler refuses to compile the assembly for pre-v7 CPUs. This also keeps us from building identical, non-NEON code on aarch64 and x86. Fixes: a373f77662c5 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.") v2: Fix Android build by just appending NEON_C_SOURCES when ARCH_ARM_HAVE_NEON. Tested-by: Rob Herring <[email protected]>
* gallium: introduce PIPE_CAP_MEMOBJTimothy Arceri2017-08-031-0/+1
| | | | | | | | | | | | | | This can be used to guard support for EXT_memory_object and related extensions. v2: update gallium docs v3 (Timothy Arceri): - add cap to nv50 Signed-off-by: Andres Rodriguez <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding capNicolai Hähnle2017-08-021-0/+1
| | | | | | | | v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit in the documentation Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_nir: move nir_lower_io to driversNicolai Hähnle2017-07-311-0/+3
| | | | | | | This allows drivers more freedom in how exactly they want to lower I/O, e.g. first lowering I/O to temporaries. Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: get rid of st_glsl_typesNicolai Hähnle2017-07-311-2/+8
| | | | | | | It's a duplicate of glsl_type::count_attribute_slots. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_NIR_SAMPLERS_AS_DEREFNicolai Hähnle2017-07-311-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* broadcom/vc4: Use the RA callback to improve register selection's choices.Eric Anholt2017-07-251-1/+52
| | | | | | | | | | | | | | | We simply pick r4 if available (anything else would force a MOV), then round-robin through accumulators (avoids physical regfile RAW delay slots), then round-robin through the physical regfile. The effect on instruction count is pretty impressive: total instructions in shared programs: 76563 -> 74526 (-2.66%) instructions in affected programs: 66463 -> 64426 (-3.06%) and we could probably do better with a little heuristic of "if we're going to choose a physical reg, and other operands of instructions using this as a src have the same physical regfile, then use the other regfile".
* broadcom/vc4: Scissor blits performed using the rendering engine.Eric Anholt2017-07-251-0/+9
| | | | | | Without this, a BlitFramebuffer would mark the whole framebuffer as being changed (so we emit loads/stores of all of it) rather than just the modified subset.
* broadcom/vc4: Prefer blit via rendering to the software fallback.Eric Anholt2017-07-251-6/+8
| | | | | | | I don't know how I managed to leave this here for so long. Found when working on a 1:1 overlapping blit extension for X11. Cc: [email protected]
* broadcom/vc4: Switch the Viewport Center fields to a fixed-point representation.Eric Anholt2017-07-251-2/+2
| | | | | | This gets us automatic CL decoding to a floating-point value, and drops a magic number from the emit code. 250x250 shader runner tests now say they have a center of 125.0 instead of 2000.
* broadcom/vc4: Use the XML decoder for CL dumping.Eric Anholt2017-07-253-443/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VC4_DEBUG_CL output goes from: 0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING 0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT 0x00000012 0x00000012: 0x12 0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW 0x00000014 0x00000014: 0x00 0x00000015 0x00000015: 0x00 0x00000016 0x00000016: 0x00 0x00000017 0x00000017: 0x00 0x00000018 0x00000018: 0xfa 0x00000019 0x00000019: 0x00 0x0000001a 0x0000001a: 0xfa 0x0000001b 0x0000001b: 0x00 to: 0x00000010 0x00000010: 0x06 Start Tile Binning 0x00000011 0x00000011: 0x38 Primitive List Format Data Type: 1 (16-bit index) Primitive Type: 2 (Triangles List) 0x00000013 0x00000013: 0x66 Clip Window Clip Window Height in pixels: 250 Clip Window Width in pixels: 250 Clip Window Bottom Pixel Coordinate: 0 Clip Window Left Pixel Coordinate: 0 v2: Squash in robher's fixes for Android
* renderonly/etnaviv: stop importing resource from renderonlyLucas Stach2017-07-191-2/+3
| | | | | | | | | | | The current way of importing the resource from renderonly after allocation is opaque and is taking away control from the driver, which it needs in order to implement more advanced scenarios than the simple linear scanout with matching stride alignments. Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Acked-by: Daniel Stone <[email protected]>
* Android: Fix vc4 build since XML changes.Rob Herring2017-07-121-1/+1
| | | | | | | | | | | | | | Since commit 7f80a9ff1312 ("vc4: Introduce XML-based packet header generation like Intel's."), the vc4 build on Android is broken: out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found The path of the generated header needs to be fixed since we build out of tree. Acked-by: Eric Anholt <[email protected]> Signed-off-by: Rob Herring <[email protected]>
* vc4: Set shareable BOs as T tiled if possibleEric Anholt2017-07-124-13/+182
| | | | | | | | | | | | | | | | | | | | | | | | X11 and GL compositor performance on VC4 has been terrible because of our SHARED-usage buffers all being forced to linear. This swaps SHARED && !LINEAR buffers over to being tiled. This is an expected win for all GL compositors during rendering (a full copy of each shared texture per draw call), allows X11 to be used with decent performance without a GL compositor, and improves X11 windowed swapbuffers performance as well. It also halves the memory usage of shared buffers that get textured from. The only cost should be idle systems with a scanout-only buffer that isn't flagged as LINEAR, in which case the memory bandwidth cost of scanout goes up ~25%. This implements the EGL_EXT_image_dma_buf_import_modifiers extension, supporting the VC4 T_TILED modifier. v2: Added modifier support to resource creation/import, and advertisement (by daniels). v3: Fix old-kernel fallback path, fix compiler error and warnings, and comment touchups (by anholt). Reviewed-by: Daniel Stone <[email protected]>
* vc4: Use vc4_setup_slices for resource importEric Anholt2017-07-121-33/+19
| | | | | | | | | Rather than open-coding populating the first slice inside resource import, use vc4_setup_slices to do it for us. v2: Rebase on VC4_DEBUG=surf change Reviewed-by: Daniel Stone <[email protected]>