summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* svga: Add missing include guardsMichał Janiszewski2018-10-302-0/+6
| | | | | | Signed-off-by: Michał Janiszewski <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* st/dri: remove leftover local variableEric Engestrom2018-10-301-1/+0
| | | | | | | | Left over from the cleanup in 6ccc435e7ad92bb0ba77d "pipe-loader: move dup(fd) within pipe_loader_drm_probe_fd" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* clover: add missing meson build dependencyEric Engestrom2018-10-291-1/+1
| | | | | | Fixes: 42ea0631f108d82554339 "meson: build clover" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* svga: add missing meson build dependencyEric Engestrom2018-10-291-1/+1
| | | | | | Fixes: a537231b226280bc1e5b7 "meson: build svga driver on linux" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* freedreno: don't flush when new and old pfb is identicalRob Clark2018-10-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the u_blitter clear path is used (a3xx, a4xx, and some fallback cases on newer gens), util_blitter_restore_fb_state() will set_framebuffer_state() to something that is identical to the current fb state, which triggers an unnecessary flush, and then eventually an assert: (gdb) bt #0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6 #1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322 #2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491 #3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463 #4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452 The assert was introduced in 4b847b38ae3, so from a functionality standpoint this patch fixes that commit. But it should also avoid an unnecessary flush in the 'inorder' case, fixing a performance bug. Fixes: 4b847b38ae3 freedreno: make fd_batch a one-shot thing Signed-off-by: Rob Clark <[email protected]>
* freedreno: dependency tracking for z/s depends on ZSA stateRob Clark2018-10-281-1/+3
| | | | | | | | | | ZSA state can change whether depth or stencil is enabled This plus previous patch fix stk, and various things w/ FD_MESA_DEBUG=inorder Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <[email protected]>
* freedreno: mark all state dirty after switching batchRob Clark2018-10-282-0/+3
| | | | | | | | | The problem isn't directly with ec717fc629 but rather that commit exposes the problem. When we switch batch we cannot assume previous state is clean so we should mark all state dirty. Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: inline draw_impl()Rob Clark2018-10-261-38/+31
| | | | | | | | Now that it is just called once per draw (instead of once for binning and once for draw), let's just inline it. If nothing else, it makes perf-annotate easier to look at. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: small cleanupRob Clark2018-10-261-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move where we handle dirty vbo stateRob Clark2018-10-263-16/+14
| | | | | | | | | Historically this wasn't in fdN_emit_state(), because prior to addition of blitter in a5xx, fdN_emit_state() was also used in the clear path. These days that is only true for a2xx (a3xx and a4xx use u_blitter). So the reason for it not to be in fd6_emit_state() no longer exists. Signed-off-by: Rob Clark <[email protected]>
* freedreno: avoid no-op flushes by re-using last-fenceRob Clark2018-10-265-2/+34
| | | | | | | | Noticed that with webgl (in chromium, at least) we end up generating a lot of no-op submits just to get a fence. Tracking the last fence and returning that if there is no rendering since last flush avoids this. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: Move stencil/depth/alpha state to IBKristian H. Kristensen2018-10-265-15/+55
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Move stencil mask emit to FD_DIRTY_ZSA groupKristian H. Kristensen2018-10-261-5/+6
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Rename FD6_GROUP_ZSA ro FD6_GROUP_LRZKristian H. Kristensen2018-10-262-7/+7
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Move rasterizer state to state objectKristian H. Kristensen2018-10-265-27/+51
| | | | Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Fix set_blit_scissor helperKristian H. Kristensen2018-10-261-2/+2
| | | | | | | The scissor maxx/maxy are non-inclusive, so don't subtract one from framebuffer width and height. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a2xx: Squash a compiler warningKristian H. Kristensen2018-10-261-2/+2
| | | | | | | | We get a warning here for assigning a const char * pointer to char *swizzle in struct ir2_src_register. The constructor strdups a 4 byte string here, so just memcpy to that instead. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Use fd6_emit_ib from a6xxKristian H. Kristensen2018-10-263-10/+10
| | | | | | Move it to a header and use it where possible to avoid vfunc call. Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno: import libdrm_freedreno + redesign submitRob Clark2018-10-2634-74/+3765
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the pursuit of lowering driver overhead, it became clear that some amount of redesign of how libdrm_freedreno constructs the submit ioctl would be needed. In particular, as the gallium driver is starting to make heavier use of CP_SET_DRAW_STATE state groups/objects, the over- head of tracking cmd buffers and relocs becomes too much. And for "streaming" state, which isn't ever reused (like uniform uploads) the overhead of allocating/freeing ringbuffer[1] objects is too high. This redesign makes two main changes: 1) Introduces a fd_submit object for tracking bos and cmds table for the submit ioctl, making ringbuffer objects more light- weight. This was previously done in the ringbuffer. But we have many ringbuffer instances involved in a submit (gmem + draw + potentially 1000's of state-group rbs), and only need a single bos and cmds table. (Reloc table is still per-rb) The submit is also a convenient place for a slab allocator for ringbuffer objects. Other options would have required locking because, while we can guarantee allocations will only happen on a single thread, free's could happen either on the application thread or the flush_queue thread. With the slab allocator in the submit object, any frees that happen on the flush_queue thread happen after we know that the application thread is done with the submit. 2) Introduce a new "softpin" msm_ringbuffer_sp implementation that does not use relocs and only has cmds table entries for IB1 (ie. the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER to from the RB). To do this properly will require some updates on the kernel side, so whether you get the softpin or legacy submit/ringbuffer implementation at runtime depends on your kernel version. To make all these changes in libdrm would basically require adding a libdrm_freedreno2, so this is a good point to just pull the libdrm code into mesa. Plus it allows for using mesa's hashtable, slab allocator, etc. And it lets us have asserts enabled for debug mesa buids but omitted for release builds. And it makes life easier if further API changes become necessary. At this point I haven't tried to pull in the kgsl backend. Although I left the level of vfunc indirection which would make it possible to have other backends. (And this was convenient to keep to allow for the "softpin" ringbuffer to coexist.) NOTE: if bisecting a build error takes you here, try a clean build. There are a bunch of ways things can go wrong if you still have libdrm_freedreno cflags. [1] "ringbuffer" is probably a bad name, the only level of cmdstream buffer that is actually a ring is RB managed by kernel. User- space cmdstream is all IB1/IB2 and state-groups. Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* st/nine: Handle window resize when a presentation buffer is usedAxel Davy2018-10-261-1/+30
| | | | | | | | | | | | | | | | | | Usually when a window is resized, the app calls d3d to resize the back buffer to the window size. In some cases, it is not done, and it expects the output resizes to the window size, even if the back buffer size is unchanged. This patch introduces the behaviour when a presentation buffer is used. ID3DPresent_GetWindowInfo is a function available with D3DPresent v1.0, and thus we don't need to check if the function is available. The function had been introduced to implement this very feature. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Reduce MaxSimultaneousTextures to 8Axel Davy2018-10-262-9/+8
| | | | | | | | | | | | Windows drivers don't set this flag (which affects ff) to more than 8. Do the same in case some games check for 8. v2: Remove any dependence on MaxSimultaneousTextures. For non-ff the number of textures is 16 when the device is able of vs/ps3. Add this requirement of 16 textures to the driver requirements. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Enable shadow mapping for ps 1.XAxel Davy2018-10-263-10/+14
| | | | | | | | | | We didn't implement shadow textures for ps 1.X, assuming the case couldn't happen... Well it does. Fixes: https://github.com/iXit/Mesa-3D/issues/261 Signed-off-by: Axel Davy <[email protected]>
* st/nine: Do not set unused states for stateblocksAxel Davy2018-10-261-21/+3
| | | | | | | | A lot of these states are used only for the context, and are unused for stateblocks (which just uses the changed.* fields instead for a lot of them). Signed-off-by: Axel Davy <[email protected]>
* st/nine: Fix aliasing states for stateblocksAxel Davy2018-10-261-2/+1
| | | | | | | | | | | | | | | | | | | | | If NINE_STATE_FF_MATERIAL is set, the stateblock will upload its recorded materials matrix. If NINE_STATE_FF_LIGHTING is set, the lighting set is uploaded. These flags could be set by a NineDevice9_SetTransform call or by setting some states related to ff, but that shouldn't trigger these stateblock behaviours. We don't need to follow the context states dirtied by render states. NINE_STATE_FF_VSTRANSF is exactly the state controlling stateblock updates of transformation matrices, NINE_STATE_FF is too broad. These two changes avoid setting the two mentionned states when we shouldn't. Fixes: https://github.com/iXit/Mesa-3D/issues/320 Signed-off-by: Axel Davy <[email protected]>
* st/nine: Never update device changed.* fieldsAxel Davy2018-10-264-48/+59
| | | | | | | | | The device state changed.* field are never used. These fields are used only for stateblocks. Avoid setting them at all for clarity. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Capture also default matrices for D3DSBT_ALLAxel Davy2018-10-263-24/+41
| | | | | | | | | | We avoid allocating space for never unused matrices. However we must do as if we had captured them. Thus when a D3DSBT_ALL stateblock apply has fewer matrices than device state, allocate the default matrices for the stateblock before applying. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Mark transform matrices dirty for D3DSBT_ALLAxel Davy2018-10-261-1/+12
| | | | | | | | D3DSBT_ALL stateblocks capture the transform matrices. Fixes some d3d test programs not displaying properly. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Don't update unused world matricesAxel Davy2018-10-261-0/+6
| | | | | | | | | | | | | | While to the application we have to track accurately all 256 world matrices (including in stateblocks), hw vertex processing enables to set a limit to the number of world matrices the hardware can access to in the advertised caps, which is 8 for nine. Thus don't bother in the stateblock code to send the updated values for the unreachable matrices. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove two unused states.Axel Davy2018-10-262-3/+1
| | | | | | | NINE_STATE_MATERIAL was used incorrectly at one location. Replace it with the correct state. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove commented nine_context_apply_stateblockAxel Davy2018-10-261-230/+0
| | | | | | | | | | | | At some point the project was to adapt the commented version to csmt. The csmt rework enabled to fix some state aliasing issues between stateblocks and internal state updates. The commented version needs a lot of work to work with that. Just drop it. Signed-off-by: Axel Davy <[email protected]>
* scons/svga: remove opt from the list of valid build typesBrian Paul2018-10-261-2/+0
| | | | | | | | | This reverts commit a5fd54f8bf6713312fa5efd7ef5cd125557a0ffe. The whole point was to add a way to pass -DVMX86_STATS to the build, but we can do that with a command line argument when we invoke scons. Reviewed-by: José Fonseca <[email protected]>
* radeon/vcn: use util function to get h264 profile idcBoyuan Zhang2018-10-261-2/+1
| | | | | | | | Use utility function for converting h264 pipe video profile to profile idc, instead of using array. Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Christian König <christian.koenig at amd.com>
* radeon/vce: use util function to get h264 profile idcBoyuan Zhang2018-10-262-8/+2
| | | | | | | | Use utility function for converting h264 pipe video profile to profile idc, instead of using array. Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Christian König <christian.koenig at amd.com>
* vl: get h264 profile idcBoyuan Zhang2018-10-261-0/+24
| | | | | | | Adding a function for converting h264 pipe video profile to profile idc Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Christian König <christian.koenig at amd.com>
* util: Change remaining uint32 cache ids to sha1David McFarland2018-10-263-67/+69
| | | | | | | | | | | | After discussion with Timothy Arceri. disk_cache_get_function_identifier was using only the first byte of the sha1 build-id. Replace disk_cache_get_function_identifier with implementation from radv_get_build_id. Instead of writing a uint32_t it now writes to a mesa_sha1. All drivers using disk_cache_get_function_identifier are updated accordingly. Reviewed-by: Timothy Arceri <[email protected]> Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")
* freedreno: use fd_bc_alloc_batch instead of fd_batch_create.Hyunjun Ko2018-10-252-2/+2
| | | | | | | | | | Following the commit 2385d7b066 and 8e798e28f7, for resource dependancy tracking. Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo with FD_MESA_DEBUG=inorder Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: take reg->num out of union in ir3_registerHyunjun Ko2018-10-251-5/+6
| | | | | | | | | | To avoid wrong result when identifying the type of register. Ie. If the reg is an array, it might be identified as address or predicate register. Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6 Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: disable unused groupsRob Clark2018-10-252-6/+13
| | | | | | | | Don't leave vsconst/fsconst group enabled if we switch to shader with no uniforms. Fixes: abcdf5627a2 freedreno/a6xx: move const emit to state group Signed-off-by: Rob Clark <[email protected]>
* freedreno: add useful assertRob Clark2018-10-251-1/+3
| | | | | | | Would have been useful to catch the problem fixed in 8e798e28f736e22e9e1e4534ab42a36cde14b142 Signed-off-by: Rob Clark <[email protected]>
* swr/rast: ignore CreateElementUnorderedAtomicMemCpyAlok Hota2018-10-251-1/+2
| | | | | | | | | This function's API changed between LLVM 5 and 6. Compile errors occur when building with LLVM 6+ if LLVM 5 was used for a dist tarball CC: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107865 Reviewed-by: Emil Velikov <[email protected]>
* swr/rast: fix intrinsic/function for LLVM 7 compatibilityAlok Hota2018-10-256-14/+3
| | | | | | | | | | | | Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and removed createInstructionSimplifierPass, which were both removed in LLVM 7.0.0 These changes combine patches we received from the community and our own internal patches Reviewed-by: Bruce Cherniak <[email protected]> Tested-by: Chuck Atkins <[email protected]>
* nvc0: increase NOUVEAU_TRANSFER_PUSHBUF_THRESHOLD to 1024 on Kepler+Rhys Perry2018-10-254-3/+11
| | | | | | | | Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82% FPS improvement with Dirt Rally on my GTX 1060. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* util: use C99 declaration in the for-loop set_foreach() macroEric Engestrom2018-10-254-6/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-2510-20/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* ir3_compiler/nir: fix imageSize() for buffer-backed imagesEduardo Lima Mitev2018-10-242-0/+33
| | | | | | | | | | | | | | | | | | | | | | GL_EXT_texture_buffer introduced texture buffers, which can be used in shaders through a new type imageBuffer. Because how image access is implemented in freedreno, calling imageSize on an imageBuffer returns the size in bytes instead of texels, which is incorrect. This patch adds a division of imageSize result by the bytes-per-pixel of the image format, when image is buffer-backed. Fixes all tests under dEQP-GLES31.functional.image_load_store.buffer.image_size.* v2: Pre-compute and submit the log2 of the image format's bpp as shader constant instead of emitting the LOG2 instruction in code. (Rob Clark) v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin) Reviewed-by: Rob Clark <[email protected]>
* radeonsi: enable vcn jpeg decode for ravenBoyuan Zhang2018-10-231-0/+2
| | | | | | | Enable vcn jpeg decode for raven. Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* winsys/amdgpu: add vcn jpeg cs supportBoyuan Zhang2018-10-231-0/+12
| | | | | | | Add vcn jpeg cs support, align cs by no-op. Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* radeon/vcn: implement jpeg target buffer cmdBoyuan Zhang2018-10-231-1/+72
| | | | | | | | Implement jpeg target buffer cmd by programming registers directly, since there is no firmware for VCN Jpeg decode. Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Leo Liu <[email protected]>
* radeon/vcn: implement jpeg bitstream buffer cmdBoyuan Zhang2018-10-231-1/+45
| | | | | | | | Implement jpeg bitstream buffer cmd by programming registers directly, since there is no firmware for VCN Jpeg decode. Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Leo Liu <[email protected]>
* radeon/uvd: remove get mjpeg slice headerBoyuan Zhang2018-10-231-157/+0
| | | | | | | | Move the previous get_mjpeg_slice_heaeder function and eoi from "radeon/vcn" to "st/va". Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]>