path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* radeonsi: replace !tbaa with !invariant.load | Marek Olšák | 2016-07-13 | 1 | -12/+5
  no change in generated code thanks to dereferenceable(n)
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set dereferenceable attribute on descriptor arrays | Marek Olšák | 2016-07-13 | 1 | -4/+11
  This allows moving the loads arbitrarily in the Sinking pass.

  26002 shaders in 14643 tests
  Totals:
  SGPRS: 2080160 -> 2080160 (0.00 %)
  VGPRS: 798875 -> 797826 (-0.13 %)
  Spilled SGPRs: 108485 -> 79165 (-27.03 %)
  Spilled VGPRs: 327 -> 327 (0.00 %)
  Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
  Code Size: 36127192 -> 35559780 (-1.57 %) bytes
  LDS: 767 -> 767 (0.00 %) blocks
  Max Waves: 212464 -> 212672 (0.10 %)
  Wait states: 0 -> 0 (0.00 %)

  PERCENTAGES / App       Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR   Scratch  CodeSize  MaxWaves     Waits
  (unknown)                     4         .         .         .         .         .         .         .         .
  0ad                           6         .         .         .         .         .         .         .         .
  alien_isolation            2938         .    0.04 %   -8.53 %         .         .   -0.71 %   -0.06 %         .
  anholt                       10         .         .         .         .         .         .         .         .
  batman_arkham_origins       589         .   -0.58 %  -79.54 %         .         .   -6.72 %    0.57 %         .
  bioshock-infinite          1769         .   -0.65 %  -89.32 %         .         .   -4.73 %    0.48 %         .
  borderlands2               3968         .   -0.31 %  -51.21 %         .         .   -4.09 %    0.22 %         .
  brutal-legend               338         .   -0.03 %   -2.95 %         .         .   -0.06 %         .         .
  civilization_beyond..       116         .         .  -14.17 %         .         .   -0.88 %         .         .
  counter_strike_glob..      1142         .         .         .         .         .         .         .         .
  dirt-showdown               541         .   -0.56 %  -40.14 %         .   -3.45 %   -1.82 %    0.35 %         .
  dolphin                      22         .         .         .         .         .    0.16 %         .         .
  dota2                      1747         .         .         .         .         .    0.01 %         .         .
  europa_universalis_4         76         .   -0.23 %  -42.11 %         .         .   -0.96 %         .         .
  f1-2015                     774         .   -0.09 %  -28.89 %         .         .   -2.60 %    0.09 %         .
  furmark-0.7.0                 4         .         .         .         .         .         .         .         .
  gimark-0.7.0                 10         .         .         .         .         .         .         .         .
  glamor                       16         .         .         .         .         .         .         .         .
  humus-celshading              4         .         .         .         .         .         .         .         .
  humus-domino                  6         .         .         .         .         .         .         .         .
  humus-dynamicbranching       24         .    0.71 %         .         .         .    0.29 %   -0.45 %         .
  humus-hdr                    10         .         .         .         .         .         .         .         .
  humus-portals                 2         .         .         .         .         .         .         .         .
  humus-volumetricfog..         6         .         .         .         .         .         .         .         .
  left_4_dead_2              1762         .         .         .         .         .         .         .         .
  metro_2033_redux           2670         .   -0.10 %   -7.15 %         .         .   -0.03 %         .         .
  nexuiz                       80         .         .         .         .         .         .         .         .
  pixmark-julia-fp32            2         .         .         .         .         .         .         .         .
  pixmark-julia-fp64            2         .         .         .         .         .         .         .         .
  pixmark-piano-0.7.0           2         .         .         .         .         .         .         .         .
  pixmark-volplosion-..         2         .         .         .         .         .         .         .         .
  plot3d-0.7.0                  8         .         .         .         .         .         .         .         .
  portal                      474         .         .         .         .         .         .         .         .
  sauerbraten                   7         .         .         .         .         .         .         .         .
  serious_sam_3_bfe           392         .         .  -13.20 %         .         .   -1.81 %         .         .
  supertuxkart                  4         .         .         .         .         .         .         .         .
  talos_principle             324         .   -0.21 %  -18.39 %         .         .   -2.73 %    0.14 %         .
  team_fortress_2             808         .         .         .         .         .         .         .         .
  tesseract                   430         .    0.08 %  -68.57 %         .         .   -0.45 %         .         .
  tessmark-0.7.0                6         .         .         .         .         .         .         .         .
  thea                        172         .         .         .         .         .    0.03 %         .         .
  ue4_effects_cave            299         .   -0.04 %  -10.15 %         .         .   -0.25 %    0.04 %         .
  ue4_elemental               586         .   -0.02 %  -13.93 %         .         .   -0.13 %    0.02 %         .
  ue4_lightroom_inter..        74         .   -0.17 %  -70.00 %         .         .   -1.27 %         .         .
  ue4_realistic_rende..        92         .         .  -32.58 %         .         .   -0.35 %         .         .
  unigine_heaven              322         .    0.12 %  -54.17 %         .         .   -1.42 %   -0.12 %         .
  unigine_sanctuary           264         .         .         .         .         .         .         .         .
  unigine_tropics             210         .         .         .         .         .         .         .         .
  unigine_valley              278         .   -0.15 %  -40.74 %         .         .   -2.00 %    0.09 %         .
  unity                        72         .         .         .         .         .    0.03 %         .         .
  warsow                      176         .         .         .         .         .         .         .         .
  warzone2100                   4         .         .         .         .         .    0.13 %         .         .
  witcher2                   1040         .   -0.03 %  -86.28 %         .         .   -0.28 %    0.01 %         .
  xcom_enemy_within          1236         .   -0.24 %  -63.54 %         .         .   -0.93 %    0.18 %         .
  yofrankie                    82         .   -0.61 % -100.00 %         .         .   -0.83 %    0.41 %         .
  ---------------------------------------------------------------------------------------------------------------
  Total                     26002         .   -0.13 %  -27.03 %         .   -0.24 %   -1.57 %    0.10 %         .

  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up shader value metadata code | Marek Olšák | 2016-07-13 | 1 | -15/+19
  No change in behavior. BTW, tbaa_md_kind == 1, which was the magic number in the code.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove LLVMNoUnwindAttribute uses | Marek Olšák | 2016-07-13 | 1 | -36/+31
  always set by gallivm
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix a typo in SI_PARAM_LINEAR_* handling | Marek Olšák | 2016-07-13 | 1 | -1/+1
  introduced in 476e9cee1d0cbe321c401277214e6c36ce5b18c9
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: normalize the code style | Marek Olšák | 2016-07-13 | 2 | -338/+286
  no change in behavior
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: just save buffer sizes instead of buffers while recording IBs | Marek Olšák | 2016-07-13 | 3 | -8/+3
  whole buffer objects are not needed
  Reviewed-by: Nicolai Hähnle <[email protected]>
* Add c99_alloca.h include to fix compilation on Cygwin | Jon Turney | 2016-07-13 | 1 | -0/+1
  Fix compilation on Cygwin, broken since 50b22354, by adding the c99_alloca.h include, which should know how to portably make the alloca() prototype available.
  Signed-off-by: Jon Turney <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
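A header like c99_alloca.h exists so that callers can use alloca() without knowing which system header declares it. The sketch below shows the general shape of such a shim; the platform split is an assumption of this example, not a copy of Mesa's header, and reverse_in_place() is a hypothetical caller.

```c
/* Sketch of a portability shim in the spirit of c99_alloca.h
 * (assumed platform split, not Mesa's actual header). */
#if defined(_MSC_VER)
#  include <malloc.h>
#  define alloca _alloca   /* map to MSVC's _alloca() */
#else
#  include <alloca.h>      /* glibc, Cygwin (newlib), most Unix-like libcs */
#endif

#include <assert.h>
#include <string.h>

/* Hypothetical caller: reverses a string in place using a stack scratch
 * buffer.  alloca() memory is released automatically on return. */
static void reverse_in_place(char *s)
{
   size_t n = strlen(s);
   char *tmp = alloca(n + 1);

   for (size_t i = 0; i < n; i++)
      tmp[i] = s[n - 1 - i];
   memcpy(s, tmp, n);
}
```

Without such a shim, a file that calls alloca() only compiles on platforms whose default headers happen to declare it, which is the kind of breakage the commit fixes for Cygwin.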
* radeonsi: silence Coverity warning | Nicolai Hähnle | 2016-07-13 | 2 | -0/+4
  Coverity's analysis is too weak to understand that r600_init_flushed_depth(_, _, NULL) only returns true when flushed_depth_texture was assigned a non-NULL value.
  Reviewed-by: Marek Olšák <[email protected]>
* vc4: Validate QPU uniform pointer updates. | Eric Anholt | 2016-07-12 | 1 | -0/+22
* vc4: Add support for NIR loops and break/continue. | Eric Anholt | 2016-07-12 | 2 | -3/+79
* vc4: Add support for emitting NIR IF nodes. | Eric Anholt | 2016-07-12 | 1 | -1/+91
* vc4: Add support for storing to NIR registers in a non-SSA fashion. | Eric Anholt | 2016-07-12 | 2 | -85/+144
  Previously, there were occasionally NIR registers in our programs, but they were always actually used SSA-only. Now that we're trying to support control flow, we need to actually conditionally move to registers based on whether channels are active or not.
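A minimal sketch of what a non-SSA, per-channel register write means, assuming the 16-wide SIMD model of the VC4 QPU and a hypothetical 16-bit execution mask with one bit per channel. masked_mov() is illustrative only, not vc4 code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 16   /* the VC4 QPU executes 16-wide SIMD */

/* Hypothetical model of a non-SSA register write under control flow:
 * only channels whose bit is set in exec_mask take the new value, while
 * inactive channels keep what the register already held. */
static void masked_mov(uint32_t dst[NUM_CHANNELS],
                       const uint32_t src[NUM_CHANNELS],
                       uint16_t exec_mask)
{
   for (int c = 0; c < NUM_CHANNELS; c++) {
      if (exec_mask & (1u << c))
         dst[c] = src[c];
   }
}
```

For example, inside an if whose condition holds only for channels 0 and 2, the write would use exec_mask 0x0005 and leave the other 14 channels untouched. An SSA value has no "previous contents" to preserve, which is why this only becomes necessary once registers are written conditionally.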
* vc4: Add a flag in the screen to track control flow support. | Eric Anholt | 2016-07-12 | 3 | -1/+14
  For now it's still always false, but I need it in place for kernel backwards compat support as I extend the backend for control flow.
* vc4: Define a QIR branch instruction | Eric Anholt | 2016-07-12 | 4 | -9/+61
  This uses the branch condition code in inst->cond to jump to either successor[0] (condition matches) or successor[1] (condition doesn't match).
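The successor convention can be illustrated with a hypothetical, simplified block structure; the struct and branch_target() below are not vc4's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a block that ends in a branch. */
struct block {
   int index;
   struct block *successor[2];
};

/* successor[0] is taken when the branch condition matches,
 * successor[1] when it does not. */
static struct block *branch_target(const struct block *b, bool cond_matches)
{
   return cond_matches ? b->successor[0] : b->successor[1];
}
```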
* vc4: Add kernel support for branching in shader validation. | Eric Anholt | 2016-07-12 | 3 | -17/+280
  We're already checking that branch instructions are within the contents of the shader and that the proper PROG_END sequence is present. The other thing we need in the presence of branching is to verify that the shader doesn't overflow past the end of the uniforms stream.
  To do that, we require that any basic block reading uniforms start with the following instructions:
      load_imm temp, <offset within uniform stream>
      add unif_addr, temp, unif
  The instructions are generated by userspace, and the kernel verifies that the load_imm is of the expected offset, and that the add adds it to a uniform. We track which uniform in the stream that is, and at draw call time fix up the uniform stream to have the address of the start of the shader's uniforms for that draw call.
  Signed-off-by: Eric Anholt <[email protected]>
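A simplified sketch of the per-block check, written against a hypothetical instruction model rather than real QPU encodings. It assumes 4-byte uniforms laid out in program order, and that the load_imm offset must point just past the uniform consumed by the add itself; the kernel's actual offset rules may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (not real QPU encodings). */
enum op { OP_LOAD_IMM, OP_ADD_UNIF_ADDR, OP_OTHER };

struct inst {
   enum op op;
   int temp;           /* LOAD_IMM: destination temp; ADD_UNIF_ADDR: source temp */
   unsigned imm;       /* LOAD_IMM: immediate value */
   bool reads_uniform; /* true if the instruction consumes one uniform */
};

/* Checks that every basic block reading uniforms starts with
 *    load_imm temp, <offset>
 *    add unif_addr, temp, unif
 * block_start[i] is true at the first instruction of each block. */
static bool validate_uniform_resets(const struct inst *insts, size_t n,
                                    const bool *block_start)
{
   unsigned stream_bytes = 0;   /* uniform bytes consumed so far */

   for (size_t i = 0; i < n; i++) {
      if (block_start[i]) {
         /* Find the end of this block and whether it reads uniforms. */
         size_t end = i + 1;
         bool reads = insts[i].reads_uniform;
         while (end < n && !block_start[end]) {
            if (insts[end].reads_uniform)
               reads = true;
            end++;
         }

         if (reads &&
             (end - i < 2 ||
              insts[i].op != OP_LOAD_IMM ||
              insts[i + 1].op != OP_ADD_UNIF_ADDR ||
              !insts[i + 1].reads_uniform ||
              insts[i + 1].temp != insts[i].temp ||
              insts[i].imm != stream_bytes + 4))
            return false;
      }
      if (insts[i].reads_uniform)
         stream_bytes += 4;
   }
   return true;
}
```

Because each block re-derives the uniform address from a known offset, the number of uniforms consumed no longer depends on which path through the branches was taken.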
* vc4: Add a bitmap of branch targets in kernel validation. | Eric Anholt | 2016-07-12 | 3 | -2/+133
  This isn't used yet, it's just a first step toward loop validation. During the main parsing of instructions, we need to know when we hit a new basic block so that we can reset validated state.
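A sketch of such a pre-pass, under a hypothetical instruction model in which each branch carries an absolute target index (the real QPU encoding differs). find_branch_targets() and the bitmap helpers are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction model: branches carry an absolute target. */
struct inst {
   bool is_branch;
   size_t target;
};

static void bitmap_set(uint32_t *map, size_t bit)
{
   map[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static bool bitmap_test(const uint32_t *map, size_t bit)
{
   return (map[bit / 32] >> (bit % 32)) & 1u;
}

/* Pre-pass: record every branch target in a zero-initialized bitmap with
 * room for n bits.  The main validation pass can then test the bitmap at
 * each instruction and reset its per-block state whenever a new basic
 * block begins.  Returns false if a branch points outside the shader. */
static bool find_branch_targets(const struct inst *insts, size_t n,
                                uint32_t *map)
{
   for (size_t i = 0; i < n; i++) {
      if (!insts[i].is_branch)
         continue;
      if (insts[i].target >= n)
         return false;
      bitmap_set(map, insts[i].target);
   }
   return true;
}
```

A bitmap keeps the lookup during the main pass O(1) per instruction at a cost of one bit per instruction.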
* vc4: Track the current instruction into the validation_state. | Eric Anholt | 2016-07-12 | 1 | -24/+30
  This reduces how much we need to pass around as arguments, which was becoming more of a problem with looping validation.
* vc4: Add QPU support for generating BRANCH instructions. | Eric Anholt | 2016-07-12 | 5 | -1/+85
* vc4: Print live variable start/ends during QIR dumping. | Eric Anholt | 2016-07-12 | 1 | -0/+45
  This only happens when live variables are set up, which is not the case for the normal dump, but they are set up when we've failed to register allocate.
* vc4: Implement live intervals using a CFG. | Eric Anholt | 2016-07-12 | 6 | -39/+393
  Right now our CFG is always a trivial single basic block, but that will change when we enable loops.
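For reference, the textbook backward liveness dataflow over a CFG can be sketched as follows. The block structure, the 32-temp bitset limit, and compute_liveness() are assumptions for illustration; this is not vc4's implementation, which also has to turn the per-block sets into instruction-level start/end points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Hypothetical block summary: bit t stands for temp t (up to 32 temps).
 * use = temps read before being written in the block, def = temps written. */
struct block {
   uint32_t use, def;
   int succs[MAX_SUCCS];      /* -1 means no successor */
   uint32_t live_in, live_out;
};

/* Iterate to a fixed point:
 *   live_out(b) = union of live_in(s) over successors s
 *   live_in(b)  = use(b) | (live_out(b) & ~def(b)) */
static void compute_liveness(struct block *blocks, int n)
{
   for (int b = 0; b < n; b++)
      blocks[b].live_in = blocks[b].live_out = 0;

   bool changed = true;
   while (changed) {
      changed = false;
      /* Reverse order tends to converge faster for a backward problem. */
      for (int b = n - 1; b >= 0; b--) {
         uint32_t out = 0;
         for (int s = 0; s < MAX_SUCCS; s++)
            if (blocks[b].succs[s] >= 0)
               out |= blocks[blocks[b].succs[s]].live_in;
         uint32_t in = blocks[b].use | (out & ~blocks[b].def);
         if (in != blocks[b].live_in || out != blocks[b].live_out) {
            blocks[b].live_in = in;
            blocks[b].live_out = out;
            changed = true;
         }
      }
   }
}
```

With a single trivial block the fixed point is reached immediately; the iteration only matters once loops introduce back edges, where a value live at the loop header must stay live across the whole loop body.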
* vc4: Make vc4_qir_schedule handle each block in the program. | Eric Anholt | 2016-07-12 | 1 | -14/+23
  Basically we just treat each block independently. The only inter-block scheduling I can think of that would be interesting would be to move texture result collection to after a short loop/if block that doesn't do texturing. However, the kernel disallows that as part of its security validation.
* vc4: Convert uniforms lowering to work with multiple blocks. | Eric Anholt | 2016-07-12 | 1 | -29/+44
  We still decide which uniform to lower based on how many instructions-that-need-lowering use that uniform, but now we emit a new temporary uniform load in each of the basic blocks containing an instruction being lowered.
  This commit is best reviewed with diff -b.
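One round of that decision can be sketched as follows. The sketch assumes that an instruction needs lowering when it reads two different uniforms (as we understand it, a QPU instruction can read only one uniform value), and it uses a hypothetical flat instruction array tagged with block indices; the real pass repeats the choice until nothing is left to lower.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNIFORMS 16
#define MAX_BLOCKS 8

/* Hypothetical instruction: owning block plus up to two uniform reads. */
struct inst {
   int block;
   int unif[2];   /* -1 = no uniform in this slot */
};

static bool needs_lowering(const struct inst *in)
{
   return in->unif[0] >= 0 && in->unif[1] >= 0 && in->unif[0] != in->unif[1];
}

/* Pick the uniform referenced by the most instructions needing lowering. */
static int pick_uniform(const struct inst *insts, int n)
{
   int count[MAX_UNIFORMS] = {0};
   int best = -1;

   for (int i = 0; i < n; i++) {
      if (!needs_lowering(&insts[i]))
         continue;
      for (int k = 0; k < 2; k++) {
         int u = insts[i].unif[k];
         count[u]++;
         if (best < 0 || count[u] > count[best])
            best = u;
      }
   }
   return best;
}

/* Mark each block holding an instruction that reads `unif` and needs
 * lowering; one temporary load of `unif` is emitted per marked block.
 * Returns the number of loads to emit. */
static int blocks_needing_load(const struct inst *insts, int n, int unif,
                               bool need[MAX_BLOCKS])
{
   int loads = 0;

   for (int b = 0; b < MAX_BLOCKS; b++)
      need[b] = false;
   for (int i = 0; i < n; i++) {
      const struct inst *in = &insts[i];
      if (needs_lowering(in) && (in->unif[0] == unif || in->unif[1] == unif) &&
          !need[in->block]) {
         need[in->block] = true;
         loads++;
      }
   }
   return loads;
}
```

Emitting the load per block, rather than once at program start, keeps each temporary local to the block that uses it, so the lowering never depends on control flow between blocks.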
* vc4: Convert vc4_opt_peephole_sf to work with control flow. | Eric Anholt | 2016-07-12 | 1 | -4/+18
  We need to apply the peephole pass to each of the blocks in the program. We don't do dataflow analysis for SF across blocks, but we also don't generate code that would need us to do so.
* vc4: Create a basic block structure and move the instructions into it. | Eric Anholt | 2016-07-12 | 6 | -20/+122
  The optimization passes and scheduling aren't actually ready for multiple blocks with control flow yet (as seen by the "cur_block" references in them instead of iterating over blocks), but this creates the structures necessary for converting them.
* vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places. | Eric Anholt | 2016-07-12 | 12 | -14/+17
  We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later.)
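A hypothetical, simplified version of such an in-order iterator over blocks and their instructions. Only the macro names come from the commit; the structs are illustrative, and vc4's real macros walk Mesa's list types instead of the singly linked lists used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified IR: a program is a linked list of blocks,
 * each holding a linked list of instructions. */
struct inst  { int id; struct inst *next; };
struct block { struct inst *insts; struct block *next; };
struct prog  { struct block *blocks; };

#define qir_for_each_block(b, p) \
   for (struct block *b = (p)->blocks; b != NULL; b = b->next)

/* Instructions of a single block, in order. */
#define qir_for_each_inst(i, b) \
   for (struct inst *i = (b)->insts; i != NULL; i = i->next)

/* Every instruction of the program in order: blocks in program order,
 * instructions in order within each block.  Callers write one loop even
 * though two lists are walked, so the storage can change without them. */
#define qir_for_each_inst_inorder(i, p) \
   qir_for_each_block(blk_##i, p) qir_for_each_inst(i, blk_##i)

static int count_insts(struct prog *p)
{
   int n = 0;
   qir_for_each_inst_inorder(inst, p)
      n++;
   return n;
}
```

One caveat of the nested-for design: a break inside the loop body only leaves the inner per-block loop, so code that needs to stop early across blocks has to handle that explicitly.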
* vc4: Also enable phi elimination. | Eric Anholt | 2016-07-12 | 1 | -0/+1
  This avoids a bunch of code gen regressions when enabling loops in vc4. Prior to that, the GLSL that would have generated these optimizable phi nodes was being lowered to csels between either (undef, a) or (a, a), and those were being dealt with by nir_opt_undef and nir_opt_algebraic.
* vc4: fix memory leak | Eric Engestrom | 2016-07-12 | 1 | -1/+1
  The allocation has succeeded by that point, so it needs to be freed.
  CovID: 1358929
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* vc4: Close our screen's fd on screen close. | Eric Anholt | 2016-07-12 | 1 | -0/+3
  We're passed in a freshly dup()ed fd on screen create, so we should close it on exit.
  Debugged by Hugh Cole-Baker.
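The ownership rule can be sketched with a hypothetical screen structure; screen_create() and screen_destroy() are illustrative names, not vc4's functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical screen object that owns the fd it was handed. */
struct screen {
   int fd;
};

/* The caller hands over a freshly dup()ed fd, so the screen owns it... */
static struct screen *screen_create(int fd)
{
   struct screen *s = calloc(1, sizeof(*s));
   if (!s)
      return NULL;
   s->fd = fd;
   return s;
}

/* ...and must close it when the screen goes away; otherwise every
 * create/destroy cycle leaks one descriptor. */
static void screen_destroy(struct screen *s)
{
   close(s->fd);
   free(s);
}
```

Since the caller dup()s before handing the fd over, closing it here never affects the caller's own descriptor.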
* swr: [rasterizer core] correct MSAA behavior for conservative rasterization | Tim Rowley | 2016-07-12 | 3 | -11/+31
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] conservative rast backend changes | Tim Rowley | 2016-07-12 | 8 | -221/+538
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer] buckets cleanup | Tim Rowley | 2016-07-12 | 4 | -12/+43
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] make all api functions call GetContext | Tim Rowley | 2016-07-12 | 1 | -14/+14
  Small api cleanup. Make all api functions call GetContext instead of locally casting the handle. Makes debugging easier by providing a single point to track context changes.
  Signed-off-by: Tim Rowley <[email protected]>
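A sketch of the pattern with hypothetical types: every API entry point converts its handle through one helper, so a single breakpoint or check sees every context access. Only the name GetContext comes from the commit; its signature here, the context struct, and the ApiDraw() entry point are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rasterizer's opaque API handle and its
 * internal context type. */
typedef void *HANDLE;

struct context {
   int draw_count;
};

/* Single conversion point from handle to context: a breakpoint or a
 * sanity check placed here sees every context access through the API. */
static struct context *GetContext(HANDLE h)
{
   assert(h != NULL);
   return (struct context *)h;
}

/* Hypothetical API entry point: calls GetContext() instead of casting
 * the handle locally. */
static void ApiDraw(HANDLE h)
{
   struct context *ctx = GetContext(h);
   ctx->draw_count++;
}
```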
* swr: [rasterizer] add support for llvm-3.9 | Tim Rowley | 2016-07-12 | 2 | -15/+28
  v2: use signed compare, remove unneeded vmask
  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix llvm-3.7 compile | Tim Rowley | 2016-07-12 | 1 | -0/+5
  d3d97f8 broke llvm-3.7, which has a mismatched API for setDataLayout/getDataLayout.
  Signed-off-by: Tim Rowley <[email protected]>
* nvc0: initial support for GP100 GPUs | Ben Skeggs | 2016-07-12 | 4 | -5/+15
  Signed-off-by: Ben Skeggs <[email protected]>
* nvc0: use a define for the driver constant buffer size | Samuel Pitoiset | 2016-07-11 | 7 | -17/+17
  This might avoid mistakes if the size is bumped in the future.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix the driver cb size when draw parameters are used | Samuel Pitoiset | 2016-07-11 | 1 | -2/+2
  The size of the driver constant buffer for each stage should be 2048 and not 512 because it has been increased recently for buffers/images. While we are at it, make the same change for indirect draws. This fixes all ARB_shader_draw_parameters tests on GM107.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: 12.0 <[email protected]>
* nvc0/ir: fix images indirect access on Fermi | Samuel Pitoiset | 2016-07-11 | 1 | -0/+7
  This fixes the following piglits:
    arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index
    arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: 12.0 <[email protected]>
* radeonsi: fix bad assertion in si_emit_sample_mask | Nicolai Hähnle | 2016-07-09 | 1 | -1/+2
  The blitter sets mask == 1, which is fine since it doesn't use smoothing. Fixes a regression introduced in commit 5bcfbf91.
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeon/uvd: simplify sending context buffer message | Christian König | 2016-07-08 | 1 | -4/+1
  Just send it whenever it is allocated.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: fix context buffer destruction in the error path | Christian König | 2016-07-08 | 1 | -6/+2
  Destroying an unallocated buffer is harmless.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: move polaris fw check into radeon_video.c v2 | Christian König | 2016-07-08 | 2 | -11/+13
  It's actually not very clever to claim to support H.264 and then fail to create a decoder.
  v2: prefix FW macro with UVD_.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* radeon/video: fix coding style in radeon_video.c v2 | Christian König | 2016-07-08 | 1 | -15/+15
  v2: fix other tabs as well.
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* svga: simplify/fix 1D/2D array resource copies | Brian Paul | 2016-07-08 | 1 | -26/+12
  Fixes one of the piglit arb_copy_image-targets tests for 1D arrays. Previously, we were applying the 1D array z/face adjustment twice.
  Also simplify the copy_region_vgpu10() function. It never has to copy multiple array layers/slices. The Mesa code for glCopyImageSubData does the loop over slices/faces.
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: remove unused variable | Brian Paul | 2016-07-08 | 1 | -1/+0
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: add dumping for more device commands | Brian Paul | 2016-07-08 | 1 | -155/+724
  Signed-off-by: Brian Paul <[email protected]>
* svga: silence a couple unused variable warnings | Brian Paul | 2016-07-08 | 2 | -1/+3
  Signed-off-by: Brian Paul <[email protected]>
* svga: rebind using render target surfaces in hw draw state | Charmaine Lee | 2016-07-08 | 1 | -6/+6
  Currently, when we rebind framebuffer resources at the beginning of the command buffer, we use the color buffer surfaces saved in the context's hw clear state. But these surfaces can differ from the render target surfaces that were actually emitted: if any of the color buffer surfaces is also used as a shader resource, we create a backed surface for the colliding render target surface. So, to rebind the framebuffer resources correctly, use the render target surfaces saved in the context's hw draw state.
  Tested with Heaven, Lightsmark2008, MTT piglit, glretrace, conform.
  Reviewed-by: Brian Paul <[email protected]>
* svga: invalidate gb surface before it is reused | Charmaine Lee | 2016-07-08 | 3 | -9/+50
  With this patch, a guest-backed surface will be invalidated using the SVGA_3D_CMD_INVALIDATE_GB_SURFACE command before the surface is reused. This fixes the updating dirty image error from the device when a surface is reused.
  v2: Instead of invalidating the surface when it is reused, send the invalidate command before the surface is put into the recycle pool.
  v3: (1) Surface invalidate is a no-op in the Linux winsys, since surface invalidation is not needed for the DMA path.
      (2) Instead of invalidating the surface content in svga_screen_surface_destroy() when a surface is to be destroyed, it is done in svga_screen_cache_flush() when the surface is no longer referenced in a command buffer and is ready to be moved to the unused list. At this point, the surface will be moved to the invalidate list. When the surface invalidation is submitted, the surface will be moved to the unused list.
  Tested with piglit, glretrace.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Sinclair Yeh <[email protected]>
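The v3 lifecycle can be sketched as a small state machine. The states, struct fields, and function names are hypothetical; they model the flow described in the commit message, not svga's actual cache code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states mirroring the lists described in the commit. */
enum surf_state {
   SURF_IN_USE,        /* still referenced by a command buffer */
   SURF_INVALIDATING,  /* on the invalidate list, invalidate command queued */
   SURF_UNUSED         /* on the unused list, safe to hand out again */
};

struct surface {
   enum surf_state state;
   bool referenced;        /* referenced by an unflushed command buffer */
   int invalidates_sent;   /* invalidate commands emitted so far */
};

/* Cache flush: an unreferenced surface is not recycled directly; it moves
 * to the invalidate list and an invalidate command is emitted for it
 * (skipped when the winsys treats invalidation as a no-op). */
static void cache_flush(struct surface *s, bool winsys_needs_invalidate)
{
   if (s->state == SURF_IN_USE && !s->referenced) {
      if (winsys_needs_invalidate)
         s->invalidates_sent++;
      s->state = SURF_INVALIDATING;
   }
}

/* Once the command buffer carrying the invalidate has been submitted,
 * the surface may move to the unused list for reuse. */
static void invalidate_submitted(struct surface *s)
{
   if (s->state == SURF_INVALIDATING)
      s->state = SURF_UNUSED;
}

static bool can_reuse(const struct surface *s)
{
   return s->state == SURF_UNUSED;
}
```

The intermediate state guarantees that a surface can never be handed out again between the moment it stops being referenced and the moment its invalidate command actually reaches the device.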