summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/freedreno
Commit message (Collapse)AuthorAgeFilesLines
* nir: Make nir_lower_clip_vs optionally work with variables.Kenneth Graunke2018-11-191-1/+1
| | | | | | | | | | | | The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <[email protected]>
* freedreno/drm: fix unused 'entry' warningsRob Clark2018-11-121-2/+0
| | | | | | | Looks like importing libdrm_freedreno into mesa crossed paths with e27902a2613. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: Clear z32 and separate stencil with blitterKristian H. Kristensen2018-11-062-27/+50
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix VSC bug with larger # of tilesRob Clark2018-11-061-5/+2
| | | | | | | | | | At higher resolutions with the addition of MSAA, the number of tiles can increase to the point where we use more than one VSC pipe per tile. Which would cause us to calculate an out-of-bounds offset for VSC_SIZE_ADDRESS. So don't try to be clever, just always put it at a fixed offset assuming the max 32 VSC pipes in use. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2018-11-067-29/+51
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: Do not link ir3_compiler with valgrind libraries.Vinson Lee2018-10-311-2/+1
| | | | | | | | | | | | | | | | | | | This patch fixes this freedreno autotools build error. CXXLD ir3_compiler /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): In function `_start': (.text+0x0): multiple definition of `_start' /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): relocation R_X86_64_32S against undefined symbol `vgPlain_interim_stack' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_trampoline.o): relocation R_X86_64_32 against `.text' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-dispatch-amd64-linux.o): relocation R_X86_64_32S against symbol `vgPlain_stats__n_xindirs_32' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: final link failed: Nonrepresentable section on output collect2: error: ld returned 1 exit status Fixes: f3cc0d274756 ("freedreno: import libdrm_freedreno + redesign submit") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108595 Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* configure: allow building with python3Emil Velikov2018-10-311-1/+1
| | | | | | | | | | | | | | | Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python2 chosen prior to python3 v2: use python2 by default Cc: Ilia Mirkin <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* freedreno: don't flush when new and old pfb is identicalRob Clark2018-10-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the u_blitter clear path is used (a3xx, a4xx, and some fallback cases on newer gens), util_blitter_restore_fb_state() will set_framebuffer_state() to something that is identical to the current fb state, which triggers an unnecessary flush, and then eventually an assert: (gdb) bt #0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6 #1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322 #2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491 #3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463 #4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452 The assert was introduced in 4b847b38ae3, so from a functionality standpoint this patch fixes that commit. But it should also avoid an unnecessary flush in the 'inorder' case, fixing a performance bug. Fixes: 4b847b38ae3 freedreno: make fd_batch a one-shot thing Signed-off-by: Rob Clark <[email protected]>
* freedreno: dependency tracking for z/s depends on ZSA stateRob Clark2018-10-281-1/+3
| | | | | | | | | | ZSA state can change whether depth or stencil is enabled This plus previous patch fix stk, and various things w/ FD_MESA_DEBUG=inorder Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <[email protected]>
* freedreno: mark all state dirty after switching batchRob Clark2018-10-282-0/+3
| | | | | | | | | The problem isn't directly with ec717fc629 but rather that commit exposes the problem. When we switch batch we cannot assume previous state is clean so we should mark all state dirty. Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: inline draw_impl()Rob Clark2018-10-261-38/+31
| | | | | | | | Now that it is just called once per draw (instead of once for binning and once for draw), let's just inline it. If nothing else, it makes perf-annotate easier to look at. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: small cleanupRob Clark2018-10-261-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move where we handle dirty vbo stateRob Clark2018-10-263-16/+14
| | | | | | | | | Historically this wasn't in fdN_emit_state(), because prior to addition of blitter in a5xx, fdN_emit_state() was also used in the clear path. These days that is only true for a2xx (a3xx and a4xx use u_blitter). So the reason for it not to be in fd6_emit_state() no longer exists. Signed-off-by: Rob Clark <[email protected]>
* freedreno: avoid no-op flushes by re-using last-fenceRob Clark2018-10-265-2/+34
| | | | | | | | Noticed that with webgl (in chromium, at least) we end up generating a lot of no-op submits just to get a fence. Tracking the last fence and returning that if there is no rendering since last flush avoids this. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: Move stencil/depth/alpha state to IBKristian H. Kristensen2018-10-265-15/+55
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Move stencil mask emit to FD_DIRTY_ZSA groupKristian H. Kristensen2018-10-261-5/+6
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Rename FD6_GROUP_ZSA ro FD6_GROUP_LRZKristian H. Kristensen2018-10-262-7/+7
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Move rasterizer state to state objectKristian H. Kristensen2018-10-265-27/+51
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Fix set_blit_scissor helperKristian H. Kristensen2018-10-261-2/+2
| | | | | | | The scissor maxx/maxy are non-inclusive, so don't subtract one from framebuffer width and height. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a2xx: Squash a compiler warningKristian H. Kristensen2018-10-261-2/+2
| | | | | | | | We get a warning here for assigning a const char * pointer to char *swizzle in struct ir2_src_register. The constructor strdups a 4 byte string here, so just memcpy to that instead. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Use fd6_emit_ib from a6xxKristian H. Kristensen2018-10-263-10/+10
| | | | | | Move it to a header and use it where possible to avoid vfunc call. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno: import libdrm_freedreno + redesign submitRob Clark2018-10-2633-73/+3764
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the pursuit of lowering driver overhead, it became clear that some amount of redesign of how libdrm_freedreno constructs the submit ioctl would be needed. In particular, as the gallium driver is starting to make heavier use of CP_SET_DRAW_STATE state groups/objects, the over- head of tracking cmd buffers and relocs becomes too much. And for "streaming" state, which isn't ever reused (like uniform uploads) the overhead of allocating/freeing ringbuffer[1] objects is too high. This redesign makes two main changes: 1) Introduces a fd_submit object for tracking bos and cmds table for the submit ioctl, making ringbuffer objects more light- weight. This was previously done in the ringbuffer. But we have many ringbuffer instances involved in a submit (gmem + draw + potentially 1000's of state-group rbs), and only need a single bos and cmds table. (Reloc table is still per-rb) The submit is also a convenient place for a slab allocator for ringbuffer objects. Other options would have required locking because, while we can guarantee allocations will only happen on a single thread, free's could happen either on the application thread or the flush_queue thread. With the slab allocator in the submit object, any frees that happen on the flush_queue thread happen after we know that the application thread is done with the submit. 2) Introduce a new "softpin" msm_ringbuffer_sp implementation that does not use relocs and only has cmds table entries for IB1 (ie. the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER to from the RB). To do this properly will require some updates on the kernel side, so whether you get the softpin or legacy submit/ringbuffer implementation at runtime depends on your kernel version. To make all these changes in libdrm would basically require adding a libdrm_freedreno2, so this is a good point to just pull the libdrm code into mesa. Plus it allows for using mesa's hashtable, slab allocator, etc. And it lets us have asserts enabled for debug mesa buids but omitted for release builds. And it makes life easier if further API changes become necessary. At this point I haven't tried to pull in the kgsl backend. Although I left the level of vfunc indirection which would make it possible to have other backends. (And this was convenient to keep to allow for the "softpin" ringbuffer to coexist.) NOTE: if bisecting a build error takes you here, try a clean build. There are a bunch of ways things can go wrong if you still have libdrm_freedreno cflags. [1] "ringbuffer" is probably a bad name, the only level of cmdstream buffer that is actually a ring is RB managed by kernel. User- space cmdstream is all IB1/IB2 and state-groups. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno: use fd_bc_alloc_batch instead of fd_batch_create.Hyunjun Ko2018-10-252-2/+2
| | | | | | | | | | Following the commit 2385d7b066 and 8e798e28f7, for resource dependancy tracking. Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo with FD_MESA_DEBUG=inorder Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: take reg->num out of union in ir3_registerHyunjun Ko2018-10-251-5/+6
| | | | | | | | | | To avoid wrong result when identifying the type of register. Ie. If the reg is an array, it might be identified as address or predicate register. Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6 Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: disable unused groupsRob Clark2018-10-252-6/+13
| | | | | | | | Don't leave vsconst/fsconst group enabled if we switch to shader with no uniforms. Fixes: abcdf5627a2 freedreno/a6xx: move const emit to state group Signed-off-by: Rob Clark <[email protected]>
* freedreno: add useful assertRob Clark2018-10-251-1/+3
| | | | | | | Would have been useful to catch the problem fixed in 8e798e28f736e22e9e1e4534ab42a36cde14b142 Signed-off-by: Rob Clark <[email protected]>
* util: use C99 declaration in the for-loop set_foreach() macroEric Engestrom2018-10-253-4/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-252-5/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* ir3_compiler/nir: fix imageSize() for buffer-backed imagesEduardo Lima Mitev2018-10-242-0/+33
| | | | | | | | | | | | | | | | | | | | | | GL_EXT_texture_buffer introduced texture buffers, which can be used in shaders through a new type imageBuffer. Because how image access is implemented in freedreno, calling imageSize on an imageBuffer returns the size in bytes instead of texels, which is incorrect. This patch adds a division of imageSize result by the bytes-per-pixel of the image format, when image is buffer-backed. Fixes all tests under dEQP-GLES31.functional.image_load_store.buffer.image_size.* v2: Pre-compute and submit the log2 of the image format's bpp as shader constant instead of emitting the LOG2 instruction in code. (Rob Clark) v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin) Reviewed-by: Rob Clark <[email protected]>
* ir3/nir: Set up image_dims consts for image_deref_size intrinsic tooEduardo Lima Mitev2018-10-211-0/+1
| | | | | | | | | `nir_intrinsic_image_deref_size` is not being considered during scan for driver constants, so image constants are not emitted if a shader only ever query the size of an image (no load, store, atomic op, etc). This is unlikely, but possible. Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: don't allocate binning rbRob Clark2018-10-171-3/+7
| | | | | | | Now that a single cmdstream is used for both binning and draw passes, we can skip allocation of cmdstream buffer for binning. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: single cmdstream for draw+binningRob Clark2018-10-173-15/+3
| | | | | | | | | | | Now that state which is different for draw vs binning pass is split out into different state-groups with appropriate enable_mask (so the appropriate one is chosen for draw vs binning), switch over to using a single cmdstream for both passes. This should significantly lower draw overhead for CPU bound benchmarks. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: split binning vs draw program stateobj'sRob Clark2018-10-172-4/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: split VBO state into binning/draw variantsRob Clark2018-10-172-1/+8
| | | | | | | | Blob seems to manage to use same input registers for BS (binning pass) vs VS (draw pass) shaders, so it can use the same VBO state for both. We can't quite do that yet, so split them. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move VBO state to stateobjRob Clark2018-10-173-8/+19
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move ZSA state to stateobjRob Clark2018-10-172-19/+39
| | | | | | | Step towards single cmdstream, where we need different state-group-id's for binning vs draw ZSA state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: remove vismode paramRob Clark2018-10-171-14/+2
| | | | | | | We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work for using single cmdstream for both draw and binning passes. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move binning-pass fixup for a6xx+Rob Clark2018-10-171-20/+37
| | | | | | | | | Move this to after ir3_cp (which can add lowered immediates to the const state) for a6xx+, to ensure the uniform state matches between binning and vertex shaders. This way we can emit just a single VS_CONST state- group when we re-use single cmdstream for both binning and draw passes. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: a bit more state emit cleanupRob Clark2018-10-174-37/+27
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move framebuffer state emit to emit_mrt()Rob Clark2018-10-172-29/+24
| | | | | | | No point in checking this per-draw, since framebuffer change means new batch. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: small emit_mrt() cleanupRob Clark2018-10-171-14/+7
| | | | | | | On a6xx, this is only used for pfb->cbufs so we can just directly pass the pfb state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: use program cacheRob Clark2018-10-177-130/+247
| | | | | | | Use the in-memory cache to construct shader program state and re-use it on subsequent draws, to lower driver overhead. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: shader variant cacheRob Clark2018-10-175-0/+214
| | | | | | | | | | Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus shader variant key to a generation specific state object. This could eventually replace the linked list of shader variants, but for now it lets us re-use the work currently done in fdN_program_emit() Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: move binning_pass out of shader variant keyRob Clark2018-10-1721-75/+109
| | | | | | | | | | | Prep work for a following patch, that introduces a cache to map from program state (all shader stages) plus variant key to pre-baked hw state (which could be emit'd via CP_SET_DRAW_STATE, for example). To do that, we really want the variant key to be immutable, and to treat the binning pass shader as an extra shader stage, rather than as a VS variant. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: track # of samplers used by shaderRob Clark2018-10-178-25/+19
| | | | | | | This is useful for a6xx to avoid program state from depending on bound tex/samp state. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: texture state objRob Clark2018-10-176-33/+251
| | | | | | | | | | | Unfortunately gallium doesn't match what the hw wants perfectly here, in using a separate CSO for each texture/sampler. So we have to use a hash table to map the collection of texture/samplers to hw state object. We probably could use separate hw state objects for texture and sampler state, but mesa/st tends to update the tex and samp state together. Signed-off-by: Rob Clark <[email protected]>
* freedreno: add resource seqnoRob Clark2018-10-174-3/+11
| | | | | | | Intended to be something more compact than a 64b pointer, which could be used as a key into hashtables. Prep work for texture state objects. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move const emit to state groupRob Clark2018-10-174-15/+70
| | | | | | | | | | | | | | Eventually we want to move nearly everything, but no other state depends on const state, so this is the easiest one to move first. For webgl aquarium, this reduces GPU load by about 10%, since for each fish it does a uniform upload plus draw.. fish frequently are visible in only a single tile, so this skips the uniform uploads for other tiles. The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems to be work an additional 10% gain for aquarium. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: add infrastructure for CP_DRAW_STATERob Clark2018-10-172-0/+46
| | | | | | | Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE packet if we have any state-groups. Signed-off-by: Rob Clark <[email protected]>
* freedreno: reduce resource dependency tracking overheadRob Clark2018-10-171-42/+67
| | | | Signed-off-by: Rob Clark <[email protected]>