path: root/src/gallium
* tgsi_to_nir: Remove dependency on libglsl. (Timur Kristóf, 2019-09-06; 2 files changed, -14/+18)
    This commit removes the GLSL dependency in TTN by manually recording the textures used and calling nir_lower_samplers instead of its GL counterpart.

    Signed-off-by: Timur Kristóf <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
* radeonsi: Release storage for sdma_uploads when the context is destroyed (Gert Wollny, 2019-09-06; 1 file changed, -0/+1)
    This fixes a memory leak in the flush code:

    Direct leak of 128 byte(s) in 1 object(s) allocated from:
        #0 in __interceptor_realloc .../gcc-8.3.0/libsanitizer/asan/asan_malloc_linux.cc:105
        #1 in si_buffer_do_flush_region src/gallium/drivers/radeonsi/si_buffer.c:573
        #2 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:608
        #3 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:597

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* lima/ppir: don't lower phis to scalar (Vasily Khoruzhick, 2019-09-05; 1 file changed, -1/+0)
    Utgard PP is a vec4 architecture, so lowering phis to scalars increases the instruction count and potentially interferes with spilling.

    Tested-by: Andreas Baierl <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Signed-off-by: Vasily Khoruzhick <[email protected]>
* freedreno/a2xx: formats update (Jonathan Marek, 2019-09-06; 4 files changed, -247/+103)
    For render formats, update fd2_pipe2color to only work with HW-supported render formats, and remove the format whitelist in is_format_supported. This patch enables float render formats (which work).

    For vertex/texture formats, use a generic function which translates using the bitsize of the channels. Since we fake support for some vertex formats, check for these in is_format_supported to avoid enabling them as sampler formats.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
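A rough sketch of the "translate using the bitsize of the channels" idea above, built on Mesa's util_format description tables. The hardware format enum and function name are placeholders for illustration, not the real a2xx values or the code in this commit:

    #include "util/u_format.h"   /* util_format_description() */

    /* Placeholder hardware format ids, NOT the real a2xx enum. */
    enum hw_fmt {
       HW_FMT_8_8_8_8,
       HW_FMT_16_16_16_16,
       HW_FMT_32_32_32_32_FLOAT,
       HW_FMT_INVALID,
    };

    /* Sketch: derive a vertex/texture format purely from the channel
     * layout instead of keeping a per-format lookup table.  Real code
     * would handle 1-4 channels and normalized vs. float variants. */
    static enum hw_fmt
    hw_format_from_channels(enum pipe_format format)
    {
       const struct util_format_description *desc = util_format_description(format);

       if (desc->nr_channels != 4)
          return HW_FMT_INVALID;

       switch (desc->channel[0].size) {
       case 8:  return HW_FMT_8_8_8_8;
       case 16: return HW_FMT_16_16_16_16;
       case 32: return HW_FMT_32_32_32_32_FLOAT;
       default: return HW_FMT_INVALID;
       }
    }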
* freedreno/a2xx: fix depth gmem restore (Jonathan Marek, 2019-09-06; 1 file changed, -15/+12)
    Use fd_gmem_restore_format() to avoid trying to use unsupported Z24S8/Z16 render formats for gmem restore. Also apply this change to gmem2mem so it doesn't depend on fd2_pipe2color working with depth formats.

    gmem2mem/mem2gmem also doesn't need to use the swap/swizzle, since dst/src formats are the same.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno/a2xx: implement polygon offset (Jonathan Marek, 2019-09-06; 1 file changed, -0/+12)
    Fixes failures in the following deqp tests:
    dEQP-GLES2.functional.polygon_offset.*

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: fix SRC_ALPHA_SATURATE for alpha blend function (Jonathan Marek, 2019-09-06; 1 file changed, -1/+6)
    Fixes failures in the following deqp tests:
    dEQP-GLES2.functional.fragment_ops.*src_alpha_saturate*

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: ir2: update register state in scalar insert (Jonathan Marek, 2019-09-06; 1 file changed, -0/+6)
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno/a2xx: ir2: fix incorrect instruction reordering (Jonathan Marek, 2019-09-06; 1 file changed, -0/+16)
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno/a2xx: ir2: check opcode on the right instruction in export cp (Jonathan Marek, 2019-09-06; 1 file changed, -1/+1)
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: ir2: fix saturate in cp (Jonathan Marek, 2019-09-06; 1 file changed, -0/+4)
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: ir2: set lower_fdph (Jonathan Marek, 2019-09-06; 1 file changed, -0/+1)
    The fdph opcode is not supported.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: ir2: remove pointcoord y invert (Jonathan Marek, 2019-09-06; 1 file changed, -4/+2)
    Fixes the following deqp test:
    dEQP-GLES2.functional.shaders.builtin_variable.pointcoord

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a2xx: ir2: fix lowering of instructions after float lowering (Jonathan Marek, 2019-09-06; 1 file changed, -3/+2)
    Some instructions generated by int/bool float lowering need to be lowered by opt_algebraic.

    Fixes: 43dbd7d6

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* lima/ppir: don't lower vector {b,f}csel to scalar if condition is scalar (Vasily Khoruzhick, 2019-09-06; 1 file changed, -5/+21)
    Utgard PP has a vector fcsel operation, but its condition is scalar. Add a filter callback that checks whether the {b,f}csel condition is non-scalar, so that {b,f}csel is lowered to scalar only in that case.

    Reviewed-by: Qiang Yu <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Signed-off-by: Vasily Khoruzhick <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalar (Vasily Khoruzhick, 2019-09-06; 7 files changed, -51/+87)
    A set of opcodes doesn't give enough flexibility in certain cases. E.g. Utgard PP has a vector conditional select operation, but its condition is always scalar. Lowering all vector selects to scalar increases the instruction count, so we need a way to filter only those ops that can't be handled by the hardware.

    Reviewed-by: Qiang Yu <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Signed-off-by: Vasily Khoruzhick <[email protected]>
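A rough sketch of the filter-callback idea from the two entries above. The callback convention (return true to lower), the extra-argument form of the pass, and the function name are assumptions here, not the exact API that landed:

    #include "nir.h"

    /* Sketch: assume the filter returns true when an ALU instruction
     * should still be lowered to scalar, and false when the hardware
     * can keep it as a vector op. */
    static bool
    csel_filter_cb(const nir_instr *instr, const void *data)
    {
       if (instr->type != nir_instr_type_alu)
          return false;

       const nir_alu_instr *alu = nir_instr_as_alu((nir_instr *)instr);

       switch (alu->op) {
       case nir_op_bcsel:
       case nir_op_fcsel:
          /* Keep the vector csel when its condition (src 0) is already
           * scalar; lower it otherwise. */
          return nir_src_num_components(alu->src[0].src) > 1;
       default:
          return true;
       }
    }

    /* Usage (assumed signature):
     *    NIR_PASS_V(shader, nir_lower_alu_to_scalar, csel_filter_cb, NULL);
     */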
* lima/ppir: improve regalloc spill cost calculation (Erico Nunes, 2019-09-05; 1 file changed, -5/+49)
    Now that spilling ops can be inserted into existing instructions, it makes sense to increase the cost of spilling registers that would cause the creation of a new instruction. Experimental results showed that penalizing this too heavily caused worse results; however, it is beneficial as a tie-breaker between registers with the same number of components.

    Signed-off-by: Erico Nunes <[email protected]>
    Reviewed-by: Vasily Khoruzhick <[email protected]>
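Purely as an illustration of the heuristic described above (not the ppir code; the struct, names and weights are invented), the cost function ends up shaped roughly like this, with the "new instruction" term small enough to only break ties:

    #include <stdbool.h>

    /* Hypothetical per-register bookkeeping. */
    struct spill_candidate {
       unsigned uses;             /* number of reads/writes                  */
       unsigned num_components;   /* 1..4 on a vec4 architecture             */
       bool needs_new_instr;      /* spill code can't reuse an existing slot
                                     and must create a new instruction       */
    };

    /* Lower cost == better spill candidate.  The new-instruction penalty
     * is deliberately small so it mostly decides between registers with
     * the same number of components instead of dominating the choice. */
    static float
    spill_cost(const struct spill_candidate *c)
    {
       float cost = (float)c->uses / (float)c->num_components;

       if (c->needs_new_instr)
          cost += 0.5f;   /* tie-breaker only; a large penalty performed worse */

       return cost;
    }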
* lima/ppir: optimizations in regalloc spilling code (Erico Nunes, 2019-09-05; 1 file changed, -90/+88)
    Avoid creating unnecessary instructions for the load/store temp nodes when not required, to further reduce register pressure.

    The store_temp operation seems to be unable to do any swizzling: at least the offline shader compiler never outputs instructions accessing swizzled components, and attempting to output that in ppir results in errors. So, force spilled registers to allocate a full vec4 register. This seems to be the optimal way, as it is possible to always keep stores and temps in a single instruction that can be pipelined.

    Signed-off-by: Erico Nunes <[email protected]>
    Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/ppir: mark regalloc created ssa unspillable (Erico Nunes, 2019-09-05; 1 file changed, -0/+1)
    One SSA value created in the spilling code in ppir_update_spilled_src was not properly being marked 'spilled', which made it a candidate for future spilling attempts. Since it was being inserted by the spilling code itself, mark it unspillable to avoid an infinite spilling loop.

    Signed-off-by: Erico Nunes <[email protected]>
    Reviewed-by: Vasily Khoruzhick <[email protected]>
* radeonsi/nir: Don't lower constant arrays to uniforms (Connor Abbott, 2019-09-05; 1 file changed, -0/+5)
    shader-db results:

    Totals:
    SGPRS: 3955968 -> 3954960 (-0.03 %)
    VGPRS: 2220220 -> 2220092 (-0.01 %)
    Spilled SGPRs: 11387 -> 11325 (-0.54 %)
    Spilled VGPRs: 97 -> 97 (0.00 %)
    Private memory VGPRs: 2528 -> 2528 (0.00 %)
    Scratch size: 2656 -> 2656 (0.00 %) dwords per thread
    Code Size: 76002204 -> 75994988 (-0.01 %) bytes
    LDS: 740 -> 740 (0.00 %) blocks
    Max Waves: 772776 -> 772787 (0.00 %)
    Wait states: 0 -> 0 (0.00 %)

    Totals from affected shaders:
    SGPRS: 16840 -> 15832 (-5.99 %)
    VGPRS: 16452 -> 16324 (-0.78 %)
    Spilled SGPRs: 1416 -> 1354 (-4.38 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 2016 -> 2016 (0.00 %)
    Scratch size: 2040 -> 2040 (0.00 %) dwords per thread
    Code Size: 953624 -> 946408 (-0.76 %) bytes
    LDS: 303 -> 303 (0.00 %) blocks
    Max Waves: 1622 -> 1633 (0.68 %)
    Wait states: 0 -> 0 (0.00 %)

    There were a large number of regressions in code size, but they seem to be because NIR unrolls some loop, which results in the table being replaced by a bunch of immediates on multiplies etc. This bloats code size since the table size is now included, but it means there are fewer loads, so it's still a net positive.

    Reviewed-by: Timothy Arceri <[email protected]>
* gallium: Plumb through a way to disable GLSL const lowering (Connor Abbott, 2019-09-05; 3 files changed, -0/+9)
    For radeonsi, we will prefer the NIR pass as it'll generate better code (some index calculation and a single load vs. a load, then index calculation, then another load) and oftentimes NIR optimization can kick in and make all the access indices constant.

    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: Enable nir_opt_large_constants (Connor Abbott, 2019-09-05; 1 file changed, -0/+7)
    vkpipeline-db numbers:

    Totals:
    SGPRS: 1740306 -> 1741322 (0.06 %)
    VGPRS: 1331124 -> 1331712 (0.04 %)
    Spilled SGPRs: 21201 -> 21316 (0.54 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 256 -> 256 (0.00 %) dwords per thread
    Code Size: 79022628 -> 78694788 (-0.41 %) bytes
    LDS: 6500 -> 6500 (0.00 %) blocks
    Max Waves: 301413 -> 301302 (-0.04 %)
    Wait states: 0 -> 0 (0.00 %)

    Totals from affected shaders:
    SGPRS: 53633 -> 54649 (1.89 %)
    VGPRS: 53000 -> 53588 (1.11 %)
    Spilled SGPRs: 3454 -> 3569 (3.33 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 5284232 -> 4956392 (-6.20 %) bytes
    LDS: 2 -> 2 (0.00 %) blocks
    Max Waves: 4239 -> 4128 (-2.62 %)
    Wait states: 0 -> 0 (0.00 %)

    (The biggest VGPR and max wave regression is due to unrolling a loop, which made the scheduler more aggressive, but in this case it's able to effectively hide latency so it's actually probably a win.)

    shader-db numbers with radeonsi NIR:

    Totals:
    SGPRS: 3526496 -> 3526512 (0.00 %)
    VGPRS: 2198576 -> 2198576 (0.00 %)
    Spilled SGPRs: 10463 -> 10463 (0.00 %)
    Spilled VGPRs: 86 -> 86 (0.00 %)
    Private memory VGPRs: 3182 -> 2528 (-20.55 %)
    Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread
    Code Size: 74117280 -> 74106140 (-0.02 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 775846 -> 775844 (-0.00 %)
    Wait states: 0 -> 0 (0.00 %)

    Totals from affected shaders:
    SGPRS: 856 -> 872 (1.87 %)
    VGPRS: 680 -> 680 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 654 -> 0 (-100.00 %)
    Scratch size: 668 -> 0 (-100.00 %) dwords per thread
    Code Size: 49652 -> 38512 (-22.44 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 182 -> 180 (-1.10 %)
    Wait states: 0 -> 0 (0.00 %)

    Reviewed-by: Marek Olšák <[email protected]>
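For context, enabling the pass is essentially a one-liner in a driver's NIR optimization sequence. A minimal sketch, assuming the usual size/align helper and an illustrative byte threshold (not necessarily what this commit uses):

    #include "nir.h"
    /* glsl_get_natural_size_align_bytes() comes from the GLSL type helpers. */

    static bool
    run_large_constants(nir_shader *nir)
    {
       bool progress = false;

       /* Promote large constant-initialized local arrays that are indexed
        * dynamically into a read-only constant data block instead of
        * per-lane scratch/private memory.  Helper and threshold assumed. */
       NIR_PASS(progress, nir, nir_opt_large_constants,
                glsl_get_natural_size_align_bytes, 16);

       return progress;
    }

This is what produces the scratch-size and private-memory reductions reported in the shader-db numbers above.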
* radv/radeonsi: Don't count read-only data when reporting code size (Connor Abbott, 2019-09-05; 1 file changed, -1/+1)
    We usually use these counts as a simple way to figure out if a change reduces the number of instructions or shrinks an instruction. However, since .rodata sections aren't executed, we shouldn't be counting their size for this analysis. Make the linker return the total executable size, and use it to report the more useful size in both drivers.

    Reviewed-by: Marek Olšák <[email protected]>
* clover: Fix build after clang r370122. (Hal Gentz, 2019-09-04; 2 files changed, -2/+16)
    ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’:
    ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:203:81: error: no matching function for call to ‘clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, const char* const*, const char* const*, clang::DiagnosticsEngine&)’
      203 |         c->getInvocation(), copts.data(), copts.data() + copts.size(), diag))
          |                                                                             ^
    In file included from /opt/llvm64/include/clang/Frontend/CompilerInstance.h:15,
                     from ../mesa/src/gallium/state_trackers/clover/llvm/codegen.hpp:37,
                     from ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:49:
    /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate: ‘static bool clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, llvm::ArrayRef<const char*>, clang::DiagnosticsEngine&)’
      157 | static bool CreateFromArgs(CompilerInvocation &Res,
          |             ^~~~~~~~~~~~~~
    /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate expects 3 arguments, 4 provided

    Signed-off-by: Hal Gentz <[email protected]>
    Reviewed-by: Aaron Watry <[email protected]>
* gallium/osmesa: Move 565 format selection checks where the rest are. (Eric Anholt, 2019-09-04; 1 file changed, -4/+2)
    Reviewed-by: Timothy Arceri <[email protected]>
* gallium/osmesa: Fix a race in creating the stmgr. (Eric Anholt, 2019-09-04; 1 file changed, -9/+17)
    Noticed while looking at other OSMesa bugs.

    Reviewed-by: Timothy Arceri <[email protected]>
* gallium/osmesa: Introduce a test. (Eric Anholt, 2019-09-04; 2 files changed, -0/+52)
    Given that we occasionally touch this code and probably nobody really wants to think about it, introduce a minimal test so that we know we haven't completely broken OSMesa.

    Reviewed-by: Timothy Arceri <[email protected]>
* llvmpipe: enable compute shaders if LLVM has coroutines (Dave Airlie, 2019-09-04; 1 file changed, -1/+1)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add local memory allocation path (Dave Airlie, 2019-09-04; 2 files changed, -0/+12)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add compute shader parameter fetching support (Dave Airlie, 2019-09-04; 1 file changed, -0/+54)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add compute shader images support (Dave Airlie, 2019-09-04; 4 files changed, -1/+111)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add ssbo support to compute shaders (Dave Airlie, 2019-09-04; 4 files changed, -0/+61)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add compute sampler + sampler view support. (Dave Airlie, 2019-09-04; 4 files changed, -4/+292)
    This is ported from the fragment shader code.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add support for compute constant buffers. (Dave Airlie, 2019-09-04; 4 files changed, -2/+72)
    This is mostly ported from the fragment shader code.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add compute pipeline statistics support. (Dave Airlie, 2019-09-04; 2 files changed, -1/+3)
    This just adds the CS invocations counter.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add grid launch (Dave Airlie, 2019-09-04; 1 file changed, -0/+76)
    This adds the dispatch code. It creates a job for the number of blocks in the grid, and dispatches them to the threadpool implementation. The threadpool then calls the JIT code to execute the coroutines.

    Reviewed-by: Roland Scheidegger <[email protected]>
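Conceptually, the producer side of this dispatch looks something like the sketch below: compute the total number of work-group blocks and hand the pool one task whose iterations are the block ids. All names here are hypothetical, not the llvmpipe API:

    /* Hypothetical threadpool interface (see the worker-side sketch
     * further down for how the ids get handed out). */
    struct cs_threadpool;
    struct cs_task;
    struct cs_task *cs_threadpool_queue(struct cs_threadpool *pool,
                                        void (*fn)(void *ctx, unsigned block_id),
                                        void *ctx, unsigned count);
    void cs_threadpool_wait(struct cs_threadpool *pool, struct cs_task *task);

    struct grid_launch {
       unsigned grid[3];                         /* work groups per axis */
       void (*jit_func)(void *ctx, unsigned block_id);
       void *ctx;                                /* constants, SSBOs, shared memory, ... */
    };

    static void
    launch_grid(struct cs_threadpool *pool, const struct grid_launch *launch)
    {
       unsigned num_blocks = launch->grid[0] * launch->grid[1] * launch->grid[2];

       /* Queue one task; the pool calls jit_func once per block id in
        * [0, num_blocks), and the JIT code maps the linear id back to
        * (x, y, z) work-group coordinates and runs that block's
        * coroutines. */
       struct cs_task *task =
          cs_threadpool_queue(pool, launch->jit_func, launch->ctx, num_blocks);
       cs_threadpool_wait(pool, task);
    }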
* llvmpipe: add compute shader generation. (Dave Airlie, 2019-09-04; 2 files changed, -0/+337)
    This creates the coroutine execution environment and the main compute shaders that get executed inside it. Each compute shader block is executed in its own coroutine execution shader, with each "thread" being a coroutine executed inside it in sequence.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: introduce variant building infrastructure. (Dave Airlie, 2019-09-04; 1 file changed, -1/+185)
    This doesn't actually build any of the shaders yet, but just builds up the framework necessary to start building the shaders and variants.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: introduce new state dirty tracking for compute. (Dave Airlie, 2019-09-04; 3 files changed, -1/+3)
    Compute doesn't share dirty state with the fragment pipeline, so create a separate path for it.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add initial shader create/bind/destroy variants framework. (Dave Airlie, 2019-09-04; 4 files changed, -0/+118)
    This is mostly a port of the fragment shader framework.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add compute debug option (Dave Airlie, 2019-09-04; 2 files changed, -0/+2)
    Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add compute jit interface. (Dave Airlie, 2019-09-04; 3 files changed, -1/+196)
    This adds the jit interface for compute shaders; it's based on the fragment shader one.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add initial compute state structs (Dave Airlie, 2019-09-04; 3 files changed, -0/+40)
    These mirror the fragment shader structs; this is just a framework.

    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: introduce compute shader context (Dave Airlie, 2019-09-04; 6 files changed, -0/+98)
    The compute shader will need its own context like the frag shader has; this just introduces the framework struct and allocates/frees it in the right places.

    Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add barrier support for compute shaders. (Dave Airlie, 2019-09-04; 2 files changed, -1/+23)
    When the code is executing and hits a barrier, it will suspend the coroutine and return control to the coroutine dispatcher.

    Reviewed-by: Roland Scheidegger <[email protected]>
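As a plain-C picture of that control flow (an analogy only; the real implementation emits LLVM coroutine intrinsics through gallivm, and the names below are invented):

    #include <stdbool.h>

    /* One "thread" of the work group, implemented as a coroutine by the
     * JITed shader. */
    struct cs_thread {
       bool done;      /* ran to completion                     */
       void *handle;   /* opaque coroutine state (illustrative) */
    };

    /* Illustrative resume: returns true once the coroutine has finished,
     * false if it suspended (e.g. at a barrier). */
    static bool coro_resume(void *handle);

    /* The dispatcher resumes every thread of the block once per sweep.
     * A thread that hits a barrier suspends back here; by the time it is
     * resumed on the next sweep, every other thread has also reached the
     * same barrier, which is exactly the semantic a barrier needs. */
    static void
    run_block(struct cs_thread *t, unsigned n)
    {
       unsigned finished = 0;

       while (finished < n) {
          for (unsigned i = 0; i < n; i++) {
             if (t[i].done)
                continue;
             t[i].done = coro_resume(t[i].handle);
             if (t[i].done)
                finished++;
          }
       }
    }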
* llvmpipe: add compute threadpool + mutex (Dave Airlie, 2019-09-04; 6 files changed, -2/+256)
    In order to efficiently run a number of compute blocks, use a threadpool that just allows for jobs with unique sequential ids to be dispatched.

    Reviewed-by: Roland Scheidegger <[email protected]>
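Complementing the grid-launch sketch earlier, the worker side of such a pool hands out the sequential ids under a mutex. A minimal sketch with hypothetical names, not the actual llvmpipe code:

    #include <pthread.h>

    /* Hypothetical work item: the pool calls fn(data, id) once for each
     * id in [0, count), handing ids out sequentially to whichever worker
     * thread is free. */
    struct cs_task {
       void (*fn)(void *data, unsigned id);
       void *data;
       unsigned count;      /* total number of iterations (grid blocks)  */
       unsigned next;       /* next id to hand out, guarded by the mutex */
       unsigned finished;   /* completed iterations                      */
       pthread_mutex_t lock;
       pthread_cond_t done;
    };

    /* Worker loop body: grab the next sequential id, run it, repeat. */
    static void
    cs_task_run(struct cs_task *task)
    {
       for (;;) {
          pthread_mutex_lock(&task->lock);
          if (task->next >= task->count) {
             pthread_mutex_unlock(&task->lock);
             return;
          }
          unsigned id = task->next++;
          pthread_mutex_unlock(&task->lock);

          task->fn(task->data, id);

          pthread_mutex_lock(&task->lock);
          if (++task->finished == task->count)
             pthread_cond_broadcast(&task->done);   /* wake the waiter */
          pthread_mutex_unlock(&task->lock);
       }
    }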
* gallivm: add support for compute shared memory (Dave Airlie, 2019-09-04; 3 files changed, -19/+46)
    Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add new compute related intrinsics (Dave Airlie, 2019-09-04; 2 files changed, -0/+19)
    Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: reorganise jit pointer ordering (Dave Airlie, 2019-09-04; 2 files changed, -31/+31)
    In order to share the texture/image/sampler code with compute shaders, we need to reorganise them to be at the front of the context, the same as draw does for vs/gs sharing.

    Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add coroutine pass manager support (Dave Airlie, 2019-09-04; 3 files changed, -1/+32)
    Coroutines require a proper pass manager, so add the required passes in the correct places.

    Reviewed-by: Roland Scheidegger <[email protected]>