summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTSNicolai Hähnle2016-11-3017-0/+18
| | | | | | | | | | | Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics). Reviewed-by: Marek Olšák <[email protected]>
* swr: [rasterizer jit] use signed integer representation for logic opIlia Mirkin2016-11-291-5/+12
| | | | | | | | | | Instead of (incorrectly) biasing the snorm value to make it look like a unorm, just use signed integer math. This fixes arb_color_buffer_float-render GL_RGBA8_SNORM Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: add missing rgbx8_srgb variantIlia Mirkin2016-11-291-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: reorder renderable formats, add grouping commentsIlia Mirkin2016-11-291-65/+87
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: use util_copy_framebuffer_state helperIlia Mirkin2016-11-291-12/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: enable cubemap arraysIlia Mirkin2016-11-291-1/+1
| | | | | | | Everything is in place for these. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: rearrange caps into limits/supported/unsupported groupsIlia Mirkin2016-11-291-129/+84
| | | | | | | | | | I find this a lot more readable and compact - much easier to scan through the list and see what's on and what's off. No functional change intended. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: only store up to the LOD sizeIlia Mirkin2016-11-291-1/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer common] add SwrTrace() and macrosTim Rowley2016-11-292-15/+95
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: don't fetch 8 dwords for samplerBuffer and imageBufferMarek Olšák2016-11-291-51/+43
| | | | | | | | | | | | | | The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs. Also, the extraction of the descriptor created some really ugly asm code with lots of VALU bitwise ops and v_readfirstlane. Totals from *affected* shaders: SGPRS: 13880 -> 13253 (-4.52 %) VGPRS: 15200 -> 15088 (-0.74 %) Code Size: 499864 -> 459816 (-8.01 %) bytes Max Waves: 1554 -> 1564 (0.64 %) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable XNACK to free 2 SGPRs on APUsMarek Olšák2016-11-291-1/+1
| | | | | | My LLVM commit disables it for dGPUs, but not APUs. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: count and report temp arrays in scratch separatelyMarek Olšák2016-11-292-4/+40
| | | | | | v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: don't try to eliminate trivial VS outputs for PS and CSMarek Olšák2016-11-291-1/+4
| | | | | | PS and CS don't have any param exports, so it's a no-op. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable RB+ blend optimizations for dual source blendingMarek Olšák2016-11-291-0/+11
| | | | | | | | This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blendingMarek Olšák2016-11-291-0/+4
| | | | | | | copied from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always set all blend registersMarek Olšák2016-11-291-5/+5
| | | | | | | better safe than sorry Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set the smallest possible CB_TARGET_MASKMarek Olšák2016-11-291-5/+5
| | | | | | better safe than sorry; set_framebuffer_state always makes this dirty Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't print bodies of header-only packetsMarek Olšák2016-11-291-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print unknown registers with correct formattingMarek Olšák2016-11-291-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: fix hang detection with deferred flushesMarek Olšák2016-11-291-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Add a note for the future about texture latency calculation.Eric Anholt2016-11-291-0/+20
| | | | | | | Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.
* vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.Eric Anholt2016-11-294-29/+37
| | | | | | | | | | | This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy. Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16) total instructions in shared programs: 99810 -> 99059 (-0.75%) instructions in affected programs: 10705 -> 9954 (-7.02%)
* vc4: Restructure VPM write optimization into two passes.Eric Anholt2016-11-291-18/+10
| | | | | For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.
* vc4: Make qir_for_each_inst_inorder() safe against removal.Eric Anholt2016-11-291-1/+1
| | | | | The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
* vc4: Split optimizing VPM writes from VPM reads.Eric Anholt2016-11-295-51/+110
| | | | | | The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
* vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.Eric Anholt2016-11-299-89/+194
| | | | | For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
* vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).Eric Anholt2016-11-2917-36/+34
| | | | | | Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
* vc4: Replace the qinst src[] with a fixed-size array.Eric Anholt2016-11-293-4/+2
| | | | | | This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.
* vc4: Remove qir_inst4().Eric Anholt2016-11-292-25/+0
| | | | | This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
* swr: [rasterizer memory] only clear up to the LOD sizeIlia Mirkin2016-11-281-2/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] hook up stencil clears for ClearTileIlia Mirkin2016-11-281-5/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16Ilia Mirkin2016-11-281-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: don't clear all dirty bits when changing so targetsIlia Mirkin2016-11-281-1/+1
| | | | | | | | | | Among other things, blits would clear existing SO targets which would cause a bunch of updates from u_blitter to be missed. Fixes fbo-scissor-blit fbo, probably among many others. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] fix typo in scissor tile-alignment logicIlia Mirkin2016-11-281-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* st/omx/dec/h264: consider POC as signed instead of unsignedChandu Babu Namburu2016-11-281-3/+3
| | | | | | picture order count can be a negative value Reviewed-by: Christian König <[email protected]>
* freedreno: fix slice size for imported buffersRob Clark2016-11-271-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: make _emit_const() staticRob Clark2016-11-272-5/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: make _emit_const() staticRob Clark2016-11-273-6/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* gm107/ir: optimize 32-bit CONST load to movSamuel Pitoiset2016-11-262-0/+17
| | | | | | | | | This is not allowed for indirect accesses because the source GPR might be erased by a subsequent instruction (WaR hazard) if we don't emit a read dep bar. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: do not combine CONST loadsSamuel Pitoiset2016-11-261-2/+7
| | | | | | | | | | | | | | This will allow to use MOV instead of LD. The main advantage is that MOV doesn't require a read dependency barrier while LD does, and so this will both reduce barriers pressure and the number of stall counts needed to read data from constant memory. This is currently only for user uniform accesses. I should do something similar when loading from the driver constant buffer but it seems like a bit tricky to handle for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* clover: Restore support for LLVM <= 3.9.Vedran Miletić2016-11-242-6/+21
| | | | | | | | | | | | | | | | | | The commit 8e430ff8b060b4e8e922bae24b3c57837da6ea77 broke support for LLVM 3.9 and older versions in Clover. This patch restores it and refactors the support using Clover compatibility layer for LLVM. v2: merged #ifdef blocks v3: added support for LLVM 3.6-3.8 v4: add missing #ifdef around <memory> v5: simplify using templates and lambda Signed-off-by: Vedran Miletić <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98740 Tested-by[v4]: Pierre Moreau <[email protected]> Tested-by: Vinson Lee <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jan Vesely <[email protected]>
* scons: Recognize LLVM_CONFIG environment variable.Vinson Lee2016-11-241-1/+2
| | | | | | Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* util: fix memory leak from the fragment shaders for SINT<->UINT blitsCharmaine Lee2016-11-231-1/+1
| | | | | | This patch deletes those fragment shaders in util_blitter_destroy(). Reviewed-by: Brian Paul <[email protected]>
* swr: clear every layer of the attached surfacesIlia Mirkin2016-11-231-6/+29
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] pipe renderTargetArrayIndex through to clearsIlia Mirkin2016-11-237-20/+35
| | | | | | | | | | Currently clears only operate on the 0th array index (ignoring surface layout parameters). Instead normalize to take a RTAI like all the load/store tile logic does, and use ComputeSurfaceAddress to properly take the surface state's lod/array index into account. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] clear data now comes in as floatIlia Mirkin2016-11-231-10/+4
| | | | | | | | The non-fast-clear path was never updated after clear colors were passed in as floats. Remove the now-harmful conversion from unorm8. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] actually perform clear before store in GetHotTileIlia Mirkin2016-11-231-0/+12
| | | | | | | | | When switching render target array indexes (as might happen in a GS, or in a future change, with layered clears), if the previous state is HOTTILE_CLEAR, we should actually clear the tile before saving it off. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: print new opt flags in si_dump_shader_keyMarek Olšák2016-11-231-0/+9
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a debug flag that disables optimized shader variantsMarek Olšák2016-11-233-0/+7
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer core] fix cast for stencil clear valueTim Rowley2016-11-221-3/+2
| | | | | | | Bad type cast for stencil clear value was picking up structure padding bytes. Reviewed-by: Ilia Mirkin <[email protected]>