path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeon/llvm: Use alloca instructions for larger arrays | Tom Stellard | 2016-07-06 | 2 | -25/+151
  We were storing arrays in vectors, which led to some really bad spill code for large arrays. alloca instructions are a better fit for arrays, and LLVM optimizations are geared more toward allocas than vectors.
  For arrays that have 16 or fewer 32-bit elements, we will continue to use vectors, because this forces LLVM to store them in registers and use indirect registers, which is usually faster for small arrays.
  In the future we should use allocas for all arrays and teach LLVM how to store allocas in registers.
  This fixes the piglit test: spec/glsl-1.50/execution/geometry/max-input-component
  Reviewed-by: Marek Olšák <[email protected]>
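The size threshold described in this commit can be sketched as a small selection function. This is only an illustration: `array_storage`, `choose_array_storage`, and `MAX_VECTOR_ARRAY_ELEMS` are hypothetical names, not identifiers from the radeon/llvm code. Only the limit of 16 32-bit elements comes from the commit message.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
enum array_storage {
    ARRAY_STORAGE_VECTOR, /* small array: kept in registers as a vector */
    ARRAY_STORAGE_ALLOCA  /* larger array: backed by an LLVM alloca */
};

/* Limit quoted in the commit message: 16 or fewer 32-bit elements. */
#define MAX_VECTOR_ARRAY_ELEMS 16u

/* Choose how an array of num_elems 32-bit elements would be stored. */
static enum array_storage choose_array_storage(unsigned num_elems)
{
    assert(num_elems > 0);
    return num_elems <= MAX_VECTOR_ARRAY_ELEMS ? ARRAY_STORAGE_VECTOR
                                               : ARRAY_STORAGE_ALLOCA;
}
```

With this rule a 16-element array still takes the vector path, while a 17-element array falls through to the alloca path.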
* radeon/llvm: Add helpers for loading and storing data from arrays. | Tom Stellard | 2016-07-06 | 1 | -10/+41
  Reviewed-by: Marek Olšák <[email protected]>
* radeon/llvm: Remove uses_temp_indirect_addressing() function | Tom Stellard | 2016-07-06 | 1 | -23/+1
  bld->indirect_files is never set, so this function always returns false.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: un-inline pipe_surface_desc | Rob Clark | 2016-07-06 | 1 | -11/+12
  Want to re-use this struct, so un-inline it.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: make util_copy_framebuffer_state(src=NULL) work | Rob Clark | 2016-07-06 | 1 | -11/+26
  Be more consistent with the other u_inlines util_copy_xyz_state() helpers and support NULL src.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
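The NULL-src convention mentioned in this commit can be illustrated with a generic copy helper. The struct and function below are hypothetical stand-ins, not the Gallium types. The sketch also omits the reference counting of surfaces that a real framebuffer state copy would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a state struct; not pipe_framebuffer_state. */
struct example_state {
    unsigned width;
    unsigned height;
    unsigned nr_items;
};

/* Copy src into dst; a NULL src resets dst to an all-zero state. */
static void example_copy_state(struct example_state *dst,
                               const struct example_state *src)
{
    assert(dst != NULL);
    if (src)
        *dst = *src;
    else
        memset(dst, 0, sizeof(*dst));
}
```

Accepting NULL lets callers clear saved state through the same call they use to copy it, instead of special-casing the reset.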
* winsys/amdgpu: avoid flushed depth when possible | Nicolai Hähnle | 2016-07-06 | 1 | -3/+8
  If a depth/stencil texture has no mipmaps, we can always get a layout that is compatible with DB and TC.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add depth/stencil_adjusted output to surface computation | Nicolai Hähnle | 2016-07-06 | 3 | -2/+14
  This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga, though it's basically a function of the memory configuration so could affect other parts as well.
  Fixes piglit "unaligned-blit * stencil downsample" and various "fbo-depth-array *stencil*" tests.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: allocate only the required plane for flushed depth | Nicolai Hähnle | 2016-07-06 | 1 | -3/+34
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: decompress to flushed depth texture when required | Nicolai Hähnle | 2016-07-06 | 1 | -29/+103
  v2: s/dirty_level_mask/stencil_dirty_level_mask/ in stencil case
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract DB->CB copy logic into its own function | Nicolai Hähnle | 2016-07-06 | 1 | -36/+61
  Also clean up some of the looping.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: sample from flushed depth texture when required | Nicolai Hähnle | 2016-07-06 | 2 | -8/+46
  Note that this has no effect yet. A case where can_sample_z/s can be false in radeonsi will be added in a later patch.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: replace is_flushing_texture with db_compatible | Nicolai Hähnle | 2016-07-06 | 9 | -19/+24
  This is a leftover from when I considered generalizing the separate stencil support. I do prefer the new name since it emphasizes what flushing vs. non-flushing means from a functional point of view, namely special handling of the texture format.
  v2: adjust r600_init_color_surface as well
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add can_sample_z/s flags for textures | Nicolai Hähnle | 2016-07-06 | 5 | -24/+34
  v2: adjust r600_init_color_surface as well
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: correctly mark levels of 3D textures as fully decompressed | Nicolai Hähnle | 2016-07-06 | 1 | -2/+2
  Account for the fact that max_layer is minified for higher levels.
  Reviewed-by: Marek Olšák <[email protected]>
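The minification mentioned in this commit follows the usual mipmap rule: each level halves a dimension, never going below 1. The sketch below applies that rule to the depth of a 3D texture. `minify` and `max_layer_for_level` are hypothetical helper names, not the radeonsi functions.

```c
#include <assert.h>

/* Size of a dimension at a mip level: halved per level, never below 1. */
static unsigned minify(unsigned size, unsigned level)
{
    unsigned v;

    assert(level < 32); /* keep the shift well defined */
    v = size >> level;
    return v ? v : 1;
}

/* Index of the last depth slice of a 3D texture at the given level. */
static unsigned max_layer_for_level(unsigned depth0, unsigned level)
{
    assert(depth0 > 0);
    return minify(depth0, level) - 1;
}
```

For a texture that is 8 slices deep, the last layer is 7 at level 0 but only 3 at level 1, so comparing against the level-0 value would never consider a higher level fully covered.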
* gallium/radeon/winsyses: remove unused stencil_offset | Nicolai Hähnle | 2016-07-06 | 3 | -5/+0
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: remove redundant null-pointer check | Nicolai Hähnle | 2016-07-06 | 1 | -2/+1
  v2: keep using r600_texture_reference
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: print StencilLayout only once | Nicolai Hähnle | 2016-07-06 | 1 | -2/+2
  It is the same for all levels.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: flush stdout after printing texture information | Nicolai Hähnle | 2016-07-06 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
* svga: avoid emitting redundant DXSetRenderTargets command | Charmaine Lee | 2016-07-05 | 2 | -18/+32
  Tested with Lightsmark2008, MTT piglit, glretrace, conform.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* radeon/vce: update encRefPic addr and array mode to tiled | Leo Liu | 2016-07-05 | 1 | -0/+1
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* radeon/vce: increase cpb height alignment | Leo Liu | 2016-07-05 | 1 | -1/+1
  Height should be aligned to 2 macroblocks, which makes it safer for tiled mode.
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
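The alignment described in this commit amounts to rounding a height up to a multiple of two macroblocks. The sketch below assumes 16x16-pixel macroblocks, as in H.264. That size, and the names `align_up` and `cpb_aligned_height`, are assumptions for illustration and are not taken from the VCE code.

```c
#include <assert.h>

/* Assumption: 16x16-pixel macroblocks, as in H.264. */
#define MACROBLOCK_SIZE 16u

/* Round value up to a multiple of alignment (a non-zero power of two). */
static unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Height padded to a multiple of two macroblocks (32 pixels here). */
static unsigned cpb_aligned_height(unsigned height)
{
    return align_up(height, 2 * MACROBLOCK_SIZE);
}
```

Under these assumptions a 1080-line frame is padded to 1088 lines either way, but a 720-line frame goes to 736 lines rather than staying at 720 as it would with single-macroblock alignment.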
* swr: automake: don't ship LLVM version specific generated sources | Emil Velikov | 2016-07-05 | 1 | -2/+43
  Otherwise things will fail to build, if the builder is using another version of LLVM.
  v2: annotate all the dependencies of builder_gen.h
  v3: clean the generated files as needed
  v4: comment cleanups (Tim)
  Cc: "12.0" <[email protected]>
  Tested-by: Tim Rowley <[email protected]>
  Tested-by: Chuck Atkins <[email protected]> (v2)
  Reported-by: Chuck Atkins <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* clover: conditionally use MESA_GIT_SHA1 | Emil Velikov | 2016-07-05 | 2 | -2/+8
  Considering how hard/annoying it was for many people's workflows to properly generate the macro, it will be demoted to conditionally available with follow-up commits.
  v2: Kill off gracious blank line (Vedran).
  Cc: [email protected]
  Cc: Vedran Miletić <[email protected]>
  Cc: Francisco Jerez <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]> (v1)
  Reviewed-by: Vedran Miletić <[email protected]>
* nvc0/ir: rename NVE4_SU_INFO_XXX to NVC0_SU_INFO_XXX | Samuel Pitoiset | 2016-07-05 | 1 | -49/+49
  While we are at it, fix a typo inside the comment which describes what those constants are for.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: reset the base offset for indirect images accesses | Samuel Pitoiset | 2016-07-05 | 1 | -2/+4
  In the presence of an indirect image access, the base offset should be zeroed because the stride will be computed twice. This is a pretty rare situation but it can happen when tex.r > 0.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: "11.2 12.0" <[email protected]>
* gm107/ir: fix sign bit emission for FADD32I | Samuel Pitoiset | 2016-07-05 | 1 | -3/+6
  When emitting OP_SUB, the sign bit for FADD and FADD32I is not at the same position. It's at position 45 for FADD but 51 for FADD32I.
  This fixes the following piglit test: tests/spec/arb_fragment_program/fdo30337b.shader_test
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Cc: <[email protected]>
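The difference between the two encodings in this commit comes down to which bit of the instruction word gets set. The sketch below models the instruction as a single 64-bit word, which is an assumption of this example. `set_bit` and `emit_sub_sign` are hypothetical helpers, not the gm107 emitter's functions. Only the positions 45 and 51 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions quoted in the commit message. */
#define FADD_SIGN_BIT    45
#define FADD32I_SIGN_BIT 51

/* Hypothetical helper: set one bit in a 64-bit instruction word. */
static uint64_t set_bit(uint64_t insn, unsigned bit)
{
    assert(bit < 64);
    return insn | (UINT64_C(1) << bit);
}

/* Set the sign bit at the position that matches the encoding form. */
static uint64_t emit_sub_sign(uint64_t insn, int is_32bit_immediate)
{
    return set_bit(insn,
                   is_32bit_immediate ? FADD32I_SIGN_BIT : FADD_SIGN_BIT);
}
```

Using the FADD position for the FADD32I form would set an unrelated bit six places lower, which is the kind of error the commit corrects.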
* vc4: Regularize instruction emit macros | Eric Anholt | 2016-07-04 | 2 | -39/+50
  ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way ALU1 did. This should make the ALU[012] macros much clearer, by moving most of their contents to vc4_qir.c.
* vc4: Enable dead CF elimination. | Eric Anholt | 2016-07-04 | 1 | -0/+1
  Now that we're about to start generating control flow in our NIR, we want this in place. It optimizes things frequently in the CS, when the GL VS has control flow that doesn't affect the vertex position.
* vc4: Optimize out redundant SF updates. | Eric Anholt | 2016-07-04 | 2 | -6/+78
  Tiny change on shader-db currently, but it will be important when we start emitting a lot of SFs from the same variable as part of control flow support.
  total instructions in shared programs: 89463 -> 89430 (-0.04%)
  instructions in affected programs: 1522 -> 1489 (-2.17%)
  total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
  estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
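The percentages in the shader-db lines of this commit are relative changes between the before and after counts. A minimal sketch of that arithmetic follows. `percent_change` is a hypothetical helper, not part of shader-db.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For the affected programs, `percent_change(1522, 1489)` is 33 fewer instructions out of 1522, which rounds to the -2.17% quoted above.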
* vc4: Move SF removal to a separate peephole pass. | Eric Anholt | 2016-07-04 | 5 | -17/+85
  The DCE pass is going to change significantly to handle control flow, while we don't really need to change it for the SF handling. We also need to add some more SF peephole optimization for SF updates generated by control flow support.
  No change on shader-db.
* vc4: DCE instructions with a NULL destination. | Eric Anholt | 2016-07-04 | 1 | -2/+3
  I'm going to add an optimization for redundant SF update removal, which will just remove the SF and leave us (in many cases) with an instruction with a NULL destination and no side effects. Rather than teaching that pass whether the whole instruction can be removed, leave that responsibility to this pass.
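The rule this commit describes can be read as a simple predicate: an instruction with no side effects is dead if its result is discarded or never read. The sketch below uses a hypothetical, simplified instruction record. `example_inst` and `inst_is_dead` are not the QIR structures or the vc4 pass itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; not the QIR structs. */
struct example_inst {
    const char *dst;       /* NULL means the result is discarded */
    bool dst_is_used;      /* does any later instruction read dst? */
    bool has_side_effects; /* e.g. a texturing setup write */
};

/* An instruction is dead when nothing can observe that it ran. */
static bool inst_is_dead(const struct example_inst *inst)
{
    assert(inst != NULL);
    if (inst->has_side_effects)
        return false;
    return inst->dst == NULL || !inst->dst_is_used;
}
```

Checking side effects first is what keeps instructions such as texturing setup alive even though they have no destination.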
* vc4: Mark texturing setup instructions as having side effects. | Eric Anholt | 2016-07-04 | 1 | -5/+5
  We need to not DCE them even though they don't have a destination in QIR. We also shouldn't relocate them in vc4_opt_vpm. Neither of these things happens, but I'm about to make DCE consider instructions with a NULL destination.
* vc4: Fix a pasteo in scheduling condition flag usage. | Eric Anholt | 2016-07-04 | 1 | -1/+1
  Noticed by code inspection. This hasn't been too big of a deal, because our cond usages all start out as adder ops, either MOVs or the FTOI for Z writes. MOVs *can* get converted to mul ops during scheduling, but apparently we hadn't hit this.
* vc4: Drop the dead QIR_PACK() macro. | Eric Anholt | 2016-07-04 | 1 | -8/+0
  This isn't used since we switched to using the dst.pack field instead of custom instructions.
* radeonsi: do compilation from si_create_shader_selector asynchronously | Marek Olšák | 2016-07-05 | 4 | -7/+58
  Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion.
  This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two.
  If an app creates more shaders, at most 4 threads will be used to compile them.
  Debug output disables this for shader stats to be printed in the correct order.
  (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.)
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't lock shader cache mutex during compilation | Marek Olšák | 2016-07-05 | 1 | -6/+16
  to allow multiple shaders to be compiled simultaneously. Also, shader-db can again use all 4 cores.
  v2: Remove the pipe_mutex_unlock call in the error path.
  Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: separate the compilation chunk of si_create_shader_selector | Marek Olšák | 2016-07-05 | 3 | -80/+110
  The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move LLVMTargetMachineRef creation to a separate function | Marek Olšák | 2016-07-05 | 1 | -14/+18
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add and use radeon_info::max_alloc_size (v2) | Marek Olšák | 2016-07-05 | 6 | -10/+16
  v2: - squashed the patches
      - use INT_MAX
      - clamp max_const_buffer_size
      - check the DRM version in radeon
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Vedran Miletić <[email protected]>
* radeonsi: print LLVM IRs to ddebug logs | Marek Olšák | 2016-07-05 | 6 | -1/+26
  Getting LLVM IRs of hanging shaders has never been easier.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable string markers and record apitrace call numbers | Marek Olšák | 2016-07-05 | 3 | -1/+24
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: add an option to dump info about a specific apitrace call | Marek Olšák | 2016-07-05 | 3 | -3/+29
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement pipe_context::generate_mipmap | Marek Olšák | 2016-07-05 | 1 | -1/+52
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: record and dump apitrace call numbers | Marek Olšák | 2016-07-05 | 4 | -1/+31
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* ddebug: implement emit_string_marker | Marek Olšák | 2016-07-05 | 1 | -3/+10
  and remove some obsolete comments
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove unused code - radeon_llvm_util.* | Marek Olšák | 2016-07-05 | 5 | -169/+0
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: keep using v_rcp_f32 for division in future LLVM (v2) | Marek Olšák | 2016-07-05 | 2 | -2/+30
  This will be needed after some LLVM changes that haven't landed yet.
  v2: - use LLVMIsConstant to fix an LLVM assertion failure. LLVMSetMetadata doesn't work with constants.
      - don't set float metadata as string
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove an obsolete comment | Marek Olšák | 2016-07-05 | 1 | -5/+0
  It's not true.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't interpolate colors if flatshading is enabled | Marek Olšák | 2016-07-05 | 3 | -2/+14
  use v_interp_mov for those
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable the barycentric optimization in all cases | Marek Olšák | 2016-07-05 | 3 | -18/+125
  Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA.
  Based on discussion with SPI guys.
  Reviewed-by: Nicolai Hähnle <[email protected]>