path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* swr/rast: Lower PERMD and PERMPS to x86.George Kyriazis2018-04-184-86/+14
| | | | | | | Add support for providing an emulation callback function for arch/width combinations that don't map cleanly to an x86 intrinsic. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Start refactoring of builder/packetizer.George Kyriazis2018-04-1816-46/+565
| | | | | | | | | | | Move x86 intrinsic lowering to a separate pass. Builder now instantiates generic intrinsics for features not supported by llvm. The separate x86 lowering pass is responsible for lowering to valid x86 for the target SIMD architecture. Currently it's a port of existing code to get it up and running quickly. Will eventually support optimized x86 for AVX, AVX2 and AVX512. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Simplify #define usage in gen source fileGeorge Kyriazis2018-04-181-4/+3
| | | | | | | | | | | | Removed preprocessor defines from structures passed to LLVM jitted code. The Python scripts do not understand preprocessor defines and ignore them, so for fields that are compiled out by a define, the LLVM script accounts for them anyway because it doesn't know what the defines are set to. The sanitize defines for open source are fine in that they're safely used. Reviewed-by: Bruce Cherniak <[email protected]>
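The failure mode described in this commit can be illustrated with a minimal sketch. The parser below is hypothetical (it is not the actual SWR generator script), and the struct and field names are invented for illustration. It skips preprocessor lines without evaluating them, so a field guarded by a define is still counted:

```python
import re

# Hypothetical struct text; the names are illustrative, not taken from SWR.
STRUCT_SRC = """
struct EXAMPLE_STATE
{
    uint32_t width;
#if EXAMPLE_DEBUG_FEATURE
    uint32_t debugCounter;
#endif
    uint32_t height;
};
"""

# Matches simple "type name;" member declarations.
FIELD_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*;")


def naive_fields(src):
    """Collect field names while ignoring preprocessor lines entirely."""
    fields = []
    for line in src.splitlines():
        if line.lstrip().startswith("#"):
            # The define is never evaluated, so the guarded field survives.
            continue
        match = FIELD_RE.match(line)
        if match:
            fields.append(match.group(2))
    return fields


print(naive_fields(STRUCT_SRC))
```

If the C++ compiler drops `debugCounter` but a script like this keeps it, the two sides disagree about the struct layout, which is presumably why the defines were removed from these structures.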
* swr/rast: Move CallPrint() to a separate fileGeorge Kyriazis2018-04-184-21/+56
| | | | | | Needed work for jit code debug. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix name mangling for LLVM pow intrinsicGeorge Kyriazis2018-04-181-1/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add some archrast countersGeorge Kyriazis2018-04-187-4/+53
| | | | | | Hook up archrast counters for shader stats: instructions executed. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Code cleanupGeorge Kyriazis2018-04-181-8/+1
| | | | | | Removing some code that doesn't seem to do anything meaningful. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add "Num Instructions Executed" stats intrinsic.George Kyriazis2018-04-181-7/+21
| | | | | | | Added a SWR_SHADER_STATS structure which is passed to each shader. The stats pass will instrument the shader to populate this. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add MEM_ADD helper function to Builder.George Kyriazis2018-04-182-0/+9
| | | | | | | | Implements mem[offset] += value. This function will be heavily used by all stats intrinsics. Reviewed-by: Bruce Cherniak <[email protected]>
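The semantics of the helper can be modelled in a few lines. This is only a sketch of the arithmetic on a plain Python list; the real MEM_ADD lives in the JIT Builder and operates on generated code, and the `mem_add` name and `stats` list here are assumptions for illustration:

```python
def mem_add(mem, offset, value):
    """Read-modify-write update: mem[offset] += value."""
    mem[offset] = mem[offset] + value
    return mem[offset]


# A stand-in for a stats structure with three counters.
stats = [0, 0, 0]
mem_add(stats, 1, 5)
mem_add(stats, 1, 2)
print(stats)
```

Each stats intrinsic would then reduce to one such update at the offset of its counter.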
* swr/rast: Permute work for simd16George Kyriazis2018-04-187-10/+67
| | | | | | | | Fix slow permutes in PA tri lists under SIMD16 emulation on AVX. Added missing permute (interlane, immediate) to SIMDLIB. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: WIP builder rewrite (2)George Kyriazis2018-04-181-4/+13
| | | | | | | | | | | | | Finish up the remaining explicit intrinsic uses. At this point all explicit Intrinsic::getDeclaration() usage has been replaced with macros auto-generated by gen_llvm_ir_macros.py. Going forward, make sure to only use the intrinsics here, adding new ones as needed. Next step is to remove all references to x86 intrinsics to keep the builder target-independent. Any x86 lowering will be handled by a separate pass. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add autogen of helper llvm intrinsics.George Kyriazis2018-04-1813-126/+130
| | | | | | | | Replace sqrt, maskload, fp min/max, cttz, ctlz with llvm equivalent. Replace AVX maskedstore intrinsic with LLVM intrinsic. Add helper llvm macros for stacksave, stackrestore, popcnt. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: WIP builder rewrite.George Kyriazis2018-04-182-14/+0
| | | | | | Start removing avx2 macros for functionality that exists in llvm. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: LLVM 6 fixGeorge Kyriazis2018-04-181-1/+1
| | | | | | for getting masked gather intrinsic (also compatible with LLVM 4) Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Changes to allow jitter to compile with LLVM5George Kyriazis2018-04-181-1/+17
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add some archrast statsGeorge Kyriazis2018-04-189-11/+105
| | | | | | | | | Add stats for degenerate and backfacing primitive counts. Wire archrast stats for alpha blend and alpha test: pass a value to the jitter and, upon return, have an archrast event increment a value. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Silence some unused variable warningsGeorge Kyriazis2018-04-181-1/+11
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add debug type info for i128George Kyriazis2018-04-181-0/+1
| | | | | | Help support debug info in 16 wide shaders. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Use blend context struct to pass paramsGeorge Kyriazis2018-04-183-49/+62
| | | | | | | Stuff parameters into a blend context struct before passing down through the PFN_BLEND_JIT_FUNC function pointer. Needed for stat changes. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Introduce JIT_MEM_CLIENTGeorge Kyriazis2018-04-183-40/+71
| | | | | | | Add assert for correct usage of memory accesses. v2: reworded commit message; renamed enum more appropriately. Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add some instructions to jitterGeorge Kyriazis2018-04-183-0/+15
| | | | | | VPHADDD, PMAXUD, PMINUD Reviewed-by: Bruce Cherniak <[email protected]>
* meson: Add library versions to swr driversJan Alexander Steffens (heftig)2018-04-171-0/+4
| | | | | | | This is for parity with autotools. Signed-off-by: Jan Alexander Steffens (heftig) <[email protected]> Acked-by: Dylan Baker <[email protected]>
* radeonsi: don't emit partial flushes for internal CS flushes onlyMarek Olšák2018-04-167-11/+14
| | | | | Tested-by: Benedikt Schemmer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement mechanism for IBs without partial flushes at the end (v6)Marek Olšák2018-04-163-16/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (This patch doesn't enable the behavior. It will be enabled in a later commit.) Draw calls from multiple IBs can be executed in parallel. v2: do emit partial flushes on SI. v3: invalidate all shader caches at the beginning of IBs. v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed, only do this for flushes invoked internally. v5: empty IBs should wait for idle if the flush requires it. v6: split the commit. If we artificially limit the number of draw calls per IB to 5, we'll get a lot more IBs, leading to a lot more partial flushes. Let's see how the removal of partial flushes changes GPU utilization in that scenario. With partial flushes (time busy): CP: 99%, SPI: 86%, CB: 73%. Without partial flushes (time busy): CP: 99%, SPI: 93%, CB: 81%. Tested-by: Benedikt Schemmer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
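The busy-time figures quoted in this commit message can be compared directly. The sketch below only restates those numbers (reading the CB value with partial flushes as 73%) and computes the change in percentage points; the dictionary names are illustrative:

```python
# Busy-time percentages quoted in the commit message.
with_flushes = {"CP": 99, "SPI": 86, "CB": 73}
without_flushes = {"CP": 99, "SPI": 93, "CB": 81}

# Change in percentage points per hardware block.
delta = {unit: without_flushes[unit] - with_flushes[unit] for unit in with_flushes}

for unit, points in delta.items():
    print(f"{unit}: {points:+d} points")
```

In this artificial 5-draws-per-IB scenario, CP utilization is unchanged while SPI and CB busy time rise by 7 and 8 points respectively.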
* radeonsi: restore si_emit_cache_flush call at the end of IBsMarek Olšák2018-04-131-0/+2
| | | | Fixes: 918b798668c "radeonsi: make sure CP DMA is idle at the end of IBs"
* gallium: move ddebug, noop, rbug, trace to auxiliary to improve build timesMarek Olšák2018-04-1349-13220/+2
| | | | which also simplifies the build scripts.
* radeonsi: make sure CP DMA is idle at the end of IBsMarek Olšák2018-04-133-2/+16
|
* radeonsi: always prefetch later shaders after the draw packetMarek Olšák2018-04-133-26/+75
| | | | | | | | | so that the draw is started as soon as possible. v2: only prefetch the API VS and VBO descriptors Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: emit shader pointers before cache flushes & waitsMarek Olšák2018-04-131-13/+7
| | | | | | | | This code was written with the constant engine in mind. We can simplify it now. Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi/gfx9: don't use the workaround for gather4 + stencilMarek Olšák2018-04-131-2/+11
| | | | | | | it doesn't seem to be needed. Acked-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: disable TC-compat HTILE on Tonga and IcelandMarek Olšák2018-04-131-0/+7
| | | | | Acked-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: force 2D tiling on VI only when TC-compat HTILE is really enabledMarek Olšák2018-04-131-9/+7
| | | | | | | just pass the flag that indicates it. Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't flush HTILE if there is no HTILE clearMarek Olšák2018-04-131-2/+2
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: merge 2 identical if statements in si_clearMarek Olšák2018-04-131-9/+2
| | | | | | | and other cleanups Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: don't do GFX-specific texture decompression for computeMarek Olšák2018-04-131-10/+10
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* radeonsi: simplify generating the renderer stringMarek Olšák2018-04-131-11/+8
| | | | | | | HAVE_LLVM > 0 is a tautology. Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* broadcom/vc5: Fix a stray '`' in a comment.Eric Anholt2018-04-121-1/+1
|
* broadcom/vc5: Update the UABI for in/out syncobjsEric Anholt2018-04-129-90/+55
| | | | | | | | | This is the ABI I'm hoping to stabilize for merging the driver. seqnos are eliminated, which allows for the GPU scheduler to task-switch between DRM fds even after submission to the kernel. In/out sync objects are introduced, to allow the Android fencing extension (not yet implemented, but should be trivial), and to also allow the driver to tell the kernel to not start a bin until a previous render is complete.
* broadcom/vc5: Drop the finished_seqno optimization.Eric Anholt2018-04-122-11/+0
| | | | | With the DRM scheduler changes, I'm about to remove all seqnos from the UABI.
* broadcom/vc5: Drop the throttling code.Eric Anholt2018-04-121-9/+0
| | | | | Since I'll be using the DRM scheduler, we won't run into the problem of a runaway client starving other clients of GPU time.
* broadcom/vc5: Move flush_last_load into load_general, like for stores.Eric Anholt2018-04-121-28/+29
| | | | | | | This should avoid mistakes with not flushing as we change the series of loads. Already, it fixes a hopefully unreachable case where we were emitting just the TILE_COORDINATES and not the dummy store that needs to go with it.
* broadcom/vc5: Rename read_but_not_cleared to loads_pending.Eric Anholt2018-04-121-13/+13
| | | | | This is a more obvious name for what the variable means, and matches what it's called for stores.
* broadcom/vc5: Refactor the implicit coords/stores_pending logic.Eric Anholt2018-04-121-23/+13
| | | | | Since I just fixed a bug due to forgetting to do these right, do it once in the helper func.
* broadcom/vc5: Emit missing TILE_COORDINATES_IMPLICIT in separate z/s stores.Eric Anholt2018-04-121-5/+16
| | | | | Fixes a simulator assertion failure in KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8
* broadcom/vc5: Add checks that we don't try to do raw Z+S load/stores.Eric Anholt2018-04-121-0/+8
| | | | | | | This was dying in the simulator on GTF-GLES3.gtf.GL3Tests.packed_depth_stencil.packed_depth_stencil_blit. We'll need to do basically the same thing as Z32F/S8 does in the MSAA Z24S8 case.
* broadcom/vc5: Fix MSAA depth/stencil size setup.Eric Anholt2018-04-121-2/+4
| | | | | | | The v3dX(get_internal_type_bpp_for_output_format)() call only handles color output formats (which overlap in enum numbers with depth output formats), so for depth we just need to take the normal cpp times the number of samples.
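The depth sizing rule stated above (the normal cpp times the number of samples) amounts to a single multiplication. A minimal sketch, with a hypothetical function name that is not part of the driver:

```python
def msaa_depth_bytes_per_pixel(cpp, samples):
    """Per-pixel storage for a multisampled depth/stencil buffer.

    cpp is the format's bytes per pixel at one sample; each sample
    stores its own depth/stencil value, so the size scales linearly.
    """
    return cpp * samples


# Example: a 4-byte depth/stencil format with 4 samples per pixel.
print(msaa_depth_bytes_per_pixel(4, 4))
```

This sidesteps the color-format lookup entirely, which matters because the color and depth output format enums overlap in value.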
* radeonsi: use PIPE_FORMAT_P016 format for VP9 profile2Leo Liu2018-04-121-1/+2
| | | | | Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]>
* radeon/vcn: add VP9 profile2 supportLeo Liu2018-04-121-0/+16
| | | | | Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: cap VP9 support to progressive bufferLeo Liu2018-04-121-0/+2
| | | | | Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: cap VP9 support to RavenLeo Liu2018-04-121-0/+4
| | | | | Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]>