summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: replace !tbaa with !invariant.loadMarek Olšák2016-07-131-12/+5
| | | | | | no change in generated code thanks to dereferenceable(n) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set dereferenceable attribute on descriptor arraysMarek Olšák2016-07-131-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows moving the loads arbitrarily in the Sinking pass. 26002 shaders in 14643 tests Totals: SGPRS: 2080160 -> 2080160 (0.00 %) VGPRS: 798875 -> 797826 (-0.13 %) Spilled SGPRs: 108485 -> 79165 (-27.03 %) Spilled VGPRs: 327 -> 327 (0.00 %) Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread Code Size: 36127192 -> 35559780 (-1.57 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 212464 -> 212672 (0.10 %) Wait states: 0 -> 0 (0.00 %) PERCENTAGES / App Shaders SGPRs VGPRs SpillSGPR SpillVGPR Scratch CodeSize MaxWaves Waits (unknown) 4 . . . . . . . . 0ad 6 . . . . . . . . alien_isolation 2938 . 0.04 % -8.53 % . . -0.71 % -0.06 % . anholt 10 . . . . . . . . batman_arkham_origins 589 . -0.58 % -79.54 % . . -6.72 % 0.57 % . bioshock-infinite 1769 . -0.65 % -89.32 % . . -4.73 % 0.48 % . borderlands2 3968 . -0.31 % -51.21 % . . -4.09 % 0.22 % . brutal-legend 338 . -0.03 % -2.95 % . . -0.06 % . . civilization_beyond.. 116 . . -14.17 % . . -0.88 % . . counter_strike_glob.. 1142 . . . . . . . . dirt-showdown 541 . -0.56 % -40.14 % . -3.45 % -1.82 % 0.35 % . dolphin 22 . . . . . 0.16 % . . dota2 1747 . . . . . 0.01 % . . europa_universalis_4 76 . -0.23 % -42.11 % . . -0.96 % . . f1-2015 774 . -0.09 % -28.89 % . . -2.60 % 0.09 % . furmark-0.7.0 4 . . . . . . . . gimark-0.7.0 10 . . . . . . . . glamor 16 . . . . . . . . humus-celshading 4 . . . . . . . . humus-domino 6 . . . . . . . . humus-dynamicbranching 24 . 0.71 % . . . 0.29 % -0.45 % . humus-hdr 10 . . . . . . . . humus-portals 2 . . . . . . . . humus-volumetricfog.. 6 . . . . . . . . left_4_dead_2 1762 . . . . . . . . metro_2033_redux 2670 . -0.10 % -7.15 % . . -0.03 % . . nexuiz 80 . . . . . . . . pixmark-julia-fp32 2 . . . . . . . . pixmark-julia-fp64 2 . . . . . . . . pixmark-piano-0.7.0 2 . . . . . . . . pixmark-volplosion-.. 2 . . . . . . . . plot3d-0.7.0 8 . . . . . . . . portal 474 . . . . . . . . sauerbraten 7 . . . . . . . . serious_sam_3_bfe 392 . . -13.20 % . . -1.81 % . . supertuxkart 4 . . . . . . . . talos_principle 324 . -0.21 % -18.39 % . . -2.73 % 0.14 % . team_fortress_2 808 . . . . . . . . tesseract 430 . 0.08 % -68.57 % . . -0.45 % . . tessmark-0.7.0 6 . . . . . . . . thea 172 . . . . . 0.03 % . . ue4_effects_cave 299 . -0.04 % -10.15 % . . -0.25 % 0.04 % . ue4_elemental 586 . -0.02 % -13.93 % . . -0.13 % 0.02 % . ue4_lightroom_inter.. 74 . -0.17 % -70.00 % . . -1.27 % . . ue4_realistic_rende.. 92 . . -32.58 % . . -0.35 % . . unigine_heaven 322 . 0.12 % -54.17 % . . -1.42 % -0.12 % . unigine_sanctuary 264 . . . . . . . . unigine_tropics 210 . . . . . . . . unigine_valley 278 . -0.15 % -40.74 % . . -2.00 % 0.09 % . unity 72 . . . . . 0.03 % . . warsow 176 . . . . . . . . warzone2100 4 . . . . . 0.13 % . . witcher2 1040 . -0.03 % -86.28 % . . -0.28 % 0.01 % . xcom_enemy_within 1236 . -0.24 % -63.54 % . . -0.93 % 0.18 % . yofrankie 82 . -0.61 % -100.00 % . . -0.83 % 0.41 % . ----------------------------------------------------------------------------------------------------------- Total 26002 . -0.13 % -27.03 % . -0.24 % -1.57 % 0.10 % . Reviewed-by: Nicolai Hähnle <[email protected]>
* gallivm: add helper lp_add_attr_dereferenceableMarek Olšák2016-07-132-0/+14
| | | | | | | | | Not sure if this is the right way to do it, but it seems to work. v2: make it a no-op on LLVM <= 3.5 Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up shader value metadata codeMarek Olšák2016-07-131-15/+19
| | | | | | | No change in behavior. BTW, tbaa_md_kind == 1, which was the magic number in the code. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove LLVMNoUnwindAttribute usesMarek Olšák2016-07-131-36/+31
| | | | | | always set by gallivm Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix a typo in SI_PARAM_LINEAR_* handlingMarek Olšák2016-07-131-1/+1
| | | | | | introduced in 476e9cee1d0cbe321c401277214e6c36ce5b18c9 Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: normalize the code styleMarek Olšák2016-07-132-338/+286
| | | | | | no change in behavior Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: just save buffer sizes instead of buffers while recording IBsMarek Olšák2016-07-135-10/+5
| | | | | | whole buffer objects are not needed Reviewed-by: Nicolai Hähnle <[email protected]>
* Add c99_alloca.h include to fix compilation on CygwinJon Turney2016-07-131-0/+1
| | | | | | | | Fix compilation on Cygwin, since 50b22354, by adding c99_alloca.h include, which should know how to portably make the alloc() prototype available. Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: silence Coverity warningNicolai Hähnle2016-07-132-0/+4
| | | | | | | | Coverity's analysis is too weak to understand that r600_init_flushed_depth(_, _, NULL) only returns true when flushed_depth_texture was assigned a non-NULL value. Reviewed-by: Marek Olšák <[email protected]>
* vc4: Validate QPU uniform pointer updates.Eric Anholt2016-07-121-0/+22
|
* vc4: Add support for NIR loops and break/continue.Eric Anholt2016-07-122-3/+79
|
* vc4: Add support for emitting NIR IF nodes.Eric Anholt2016-07-121-1/+91
|
* vc4: Add support for storing to NIR registers in a non-SSA fashion.Eric Anholt2016-07-122-85/+144
| | | | | | | Previously, there were occasionally NIR registers in our programs, but they were always actually used SSA-only. Now that we're trying to support control flow, we need to actually conditionally move to registers based on whether channels are active or not.
* vc4: Add a flag in the screen to track control flow support.Eric Anholt2016-07-123-1/+14
| | | | | For now it's still always false, but I need it in place for kernel backwards compat support as I extend the backend for control flow.
* vc4: Define a QIR branch instructionEric Anholt2016-07-124-9/+61
| | | | | | This uses the branch condition code in inst->cond to jump to either successor[0] (condition matches) or successor[0] (condition doesn't match).
* vc4: Add kernel support for branching in shader validation.Eric Anholt2016-07-123-17/+280
| | | | | | | | | | | | | | | | | | | | | We're already checking that branch instructions are within the contents of the shader and the proper PROG_END sequence is present. The other thing we need in the presence of branching is to verify that the shader doesn't overflow past the end of the uniforms stream. To do that, we require that at the start of any basic block reading uniforms have the following instructions: load_imm temp, <offset within uniform stream> add unif_addr, temp, unif The instructions are generated by userspace, and the kernel verifies that the load_imm is of the expected offset, and that the add adds it to a uniform. We track which uniform in the stream that is, and at draw call time fix up the uniform stream to have the address of the start of the shader's uniforms for that draw call. Signed-off-by: Eric Anholt <[email protected]>
* vc4: Add a bitmap of branch targets in kernel validation.Eric Anholt2016-07-123-2/+133
| | | | | | This isn't used yet, it's just a first step toward loop validation. During the main parsing of instructions, we need to know when we hit a new basic block so that we can reset validated state.
* vc4: Track the current instruction into the validation_state.Eric Anholt2016-07-121-24/+30
| | | | | This reduces how much we need to pass around as arguments, which was becoming more of a problem with looping validation.
* vc4: Add QPU support for generating BRANCH instructions.Eric Anholt2016-07-125-1/+85
|
* vc4: Print live variable start/ends during QIR dumping.Eric Anholt2016-07-121-0/+45
| | | | | This only happens when live variables are set up, which is not in the normal dump, but is set up when we've failed to register allocate.
* vc4: Implement live intervals using a CFG.Eric Anholt2016-07-126-39/+393
| | | | | Right now our CFG is always a trivial single basic block, but that will change when enable loops.
* vc4: Make vc4_qir_schedule handle each block in the program.Eric Anholt2016-07-121-14/+23
| | | | | | | | Basically we just treat each block independently. The only inter-block scheduling I can think of that would be be interesting would be to move texture result collection to after a short loop/if block that doesn't do texturing. However, the kernel disallows that as part of its security validation.
* vc4: Convert uniforms lowering to work with multiple blocks.Eric Anholt2016-07-121-29/+44
| | | | | | | | | We still decide which uniform to lower based on how many instructions-that-need-lowering use that uniform, but now we emit a new temporary uniform load in each of the basic blocks containing an instruction being lowered. This commit is best reviewed with diff -b.
* vc4: Convert vc4_opt_peephole_sf to work with control flow.Eric Anholt2016-07-121-4/+18
| | | | | | We need to apply the peephole pass to each of the blocks in the program. We don't do dataflow analysis for SF across blocks, but we also don't generate code that would need us to do so.
* vc4: Create a basic block structure and move the instructions into it.Eric Anholt2016-07-126-20/+122
| | | | | | | The optimization passes and scheduling aren't actually ready for multiple blocks with control flow yet (as seen by the "cur_block" references in them instead of iterating over blocks), but this creates the structures necessary for converting them.
* vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places.Eric Anholt2016-07-1212-14/+17
| | | | | | | | We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later)
* vc4: Also enable phi elimination.Eric Anholt2016-07-121-0/+1
| | | | | | | | This avoids a bunch of code gen regressions when enabling loops in vc4. Prior to that, the GLSL that would have generated these optimizable phi nodes was being lowered to csels between either (undef, a) or (a, a), and those were being dealt with by nir_opt_undef and nir_opt_algebraic.
* vc4: fix memory leakEric Engestrom2016-07-121-1/+1
| | | | | | | | The allocation has succeeded by that point, so it needs to be freed. CovID: 1358929 Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Close our screen's fd on screen close.Eric Anholt2016-07-121-0/+3
| | | | | We're passed in a freshly dup()ed fd on screen create, so we should close it on exit. Debugged by Hugh Cole-Baker.
* swr: [rasterizer core] correct MSAA behavior for conservative rasterizationTim Rowley2016-07-123-11/+31
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] conservative rast backend changesTim Rowley2016-07-128-221/+538
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer] buckets cleanupTim Rowley2016-07-124-12/+43
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] make all api functions call GetContextTim Rowley2016-07-121-14/+14
| | | | | | | | Small api cleanup. Make all api functions call GetContext instead of locally casting handle. Makes debugging easier by providing a single point to track context changes. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer] add support for llvm-3.9Tim Rowley2016-07-122-15/+28
| | | | | | v2: use signed compare, remove unneeded vmask Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix llvm-3.7 compileTim Rowley2016-07-121-0/+5
| | | | | | | d3d97f8 broke llvm-3.7, which has a mismatched API for setDataLayout/getDataLayout. Signed-off-by: Tim Rowley <[email protected]>
* st/omx/dec: convert decoder video buffer to progressiveLeo Liu2016-07-122-3/+68
| | | | | | | | | | | | | | | | | | | | | | with encode tunneling The idea of encode tunneling is to use video buffer directly for encoder, but currently the encoder doesn’t support interlaced surface, the OMX decoder set progressive surface before on that purpose. Since now we are polling the driver for interlacing information for decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API), thus breaking the transcode with tunneling. The solution is when with tunnel detected, re-allocate progressive target buffers, and then converting the interlaced decoder results to there. This has been tested with transcode results bit to bit matching as before with surface from progressive to progressive. Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]> Tested-by: Julien Isorce <[email protected]>
* vl/compositor: set layer of y or uv to renderLeo Liu2016-07-122-0/+42
| | | | | | Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]> Tested-by: Julien Isorce <[email protected]>
* vl/compositor: add weave to yuv shaderLeo Liu2016-07-122-0/+43
| | | | | | | | This shader will make interlaced yuv to progressive yuv. Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]> Tested-by: Julien Isorce <[email protected]>
* vl/compositor: move weave shader out from rgb weavingLeo Liu2016-07-122-76/+83
| | | | | | | | We'll use weave shader in the later patch. Signed-off-by: Leo Liu <[email protected]> Acked-by: Christian König <[email protected]> Tested-by: Julien Isorce <[email protected]>
* clover/api: Implement clLinkProgram per-device binary presence validation rule.Francisco Jerez2016-07-111-2/+31
| | | | | Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Add clLinkProgram (CL 1.2).Serge Martin2016-07-111-4/+27
| | | | | | | | | [ Francisco Jerez: Use validate_build_common for error checking, simplify control flow slightly and handle additional exception types. ] Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Trivial cleanups for api/program.cpp.Francisco Jerez2016-07-111-9/+8
| | | | | Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover/core: Remove compiler.hpp.Francisco Jerez2016-07-114-37/+3
| | | | | | | | header_map was the only definition left in compiler.hpp, move it into program.hpp which is its only user in clover/core. Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover/llvm: Get rid of compile_program_llvm().Francisco Jerez2016-07-112-18/+0
| | | | | | | Superseded by compile_program() and link_program(). Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Provide separate program methods for compilation and linking.Francisco Jerez2016-07-113-12/+42
| | | | | | | | [ Serge Martin: Fix inverted opts and log build ctor args. Keep the log related to the build. Fix indentation ] Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Unify program::build_* into a single method returning a struct.Francisco Jerez2016-07-114-50/+39
| | | | | | | | | | | | This gets rid of the program::build_* query methods and replaces them with the program::build() method that returns a single data structure containing all parameters for the last build done on the given target device (including build logs, options and the binary itself). [ Serge Martin: Fix inverted opts and log build ctor args ] Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Change program::build opts argument to std::string.Serge Martin2016-07-112-3/+3
| | | | | Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Define error subclass to signal build option parse failure.Francisco Jerez2016-07-113-3/+11
| | | | | Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>
* clover: Move back to using build_error to signal compilation failure.Francisco Jerez2016-07-115-17/+17
| | | | | | | | | | | | | | This partially reverts 7e0180d57d330bd8d3047e841086712376b2a1cc. Having two different exception subclasses for compilation and linking makes it more difficult to share or move code between the two codepaths, because the exact same function under the same error condition would need to throw one exception or the other depending on what top-level API is being implemented with it. There is little benefit anyway because clCompileProgram() and clLinkProgram() can tell whether they are linking or compiling a program. Reviewed-by: Serge Martin <[email protected]> Tested-by: Jan Vesely <[email protected]>