path: root/src/gallium
Commit message | Author | Age | Files | Lines
* radeonsi: use llvm.amdgcn.s.barrier instead of llvm.AMDGPU.barrier.local | Nicolai Hähnle | 2016-01-26 | 1 | -1/+3
  The new name for the intrinsic was introduced in LLVM r258558.
  v2: use ternary operator instead of preprocessor
  Reviewed-by: Michel Dänzer <[email protected]> (v1)
  Reviewed-by: Marek Olšák <[email protected]>
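The v2 note above mentions picking the intrinsic with a ternary operator rather than the preprocessor. A minimal sketch of that pattern follows. The function name, the version encoding (major << 8 | minor) and the 3.9 cutoff are assumptions made for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the barrier intrinsic name for a given
 * LLVM version, encoded as (major << 8) | minor. The 0x0309 cutoff is an
 * assumption; the commit only says the rename landed in LLVM r258558. */
static const char *example_barrier_intrinsic(unsigned llvm_version)
{
    /* A ternary keeps both names visible to the compiler, unlike #if. */
    return llvm_version >= 0x0309 ? "llvm.amdgcn.s.barrier"
                                  : "llvm.AMDGPU.barrier.local";
}
```

Because both branches are always compiled, a typo in either string or expression is caught regardless of which LLVM version the build uses.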
* radeonsi: add DCC buffer for sampler views on new CS | Nicolai Hähnle | 2016-01-25 | 1 | -15/+18
  This fixes a VM fault and possible lockup in high memory pressure situations.
  Cc: "11.1" <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: emit rw_buffers for tes_shader only if tes_shader present | Nicolai Hähnle | 2016-01-25 | 1 | -3/+5
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not set the shader->key for gs copy shaders | Nicolai Hähnle | 2016-01-25 | 1 | -1/+0
  The key for a geometry shader would be interpreted as the key for a vertex shader further down the line, which really doesn't make sense. This does not affect the contents of shader->key because geometry shaders don't have any key entries anyway.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: si_llvm_emit_vs_epilogue is never used with gs copy shaders | Nicolai Hähnle | 2016-01-25 | 1 | -2/+3
  Hence remove the misleading branch on is_gs_copy_shader.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move is_gs_copy_shader to si_shader_context | Nicolai Hähnle | 2016-01-25 | 2 | -6/+5
  It is only used during shader creation now, so no need to keep it around afterwards.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: replace use of is_gs_copy_shader in si_shader_vs | Nicolai Hähnle | 2016-01-25 | 1 | -1/+1
  We now have an explicit parameter that contains the same information, and this will allow us to get rid of is_gs_copy_shader in the si_shader struct.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: ensure that VGT_GS_MODE is sent when necessary | Nicolai Hähnle | 2016-01-25 | 1 | -8/+21
  Specifically, when the API switches from using a GS to not using a GS and then back to using the same GS again, we do not have to re-send all the GS state, but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part of the VS state.
  This fixes a rendering bug in Dolphin, but surely other applications are affected as well.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
  Cc: "11.0 11.1" <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract the VGT_GS_MODE calculation into its own function | Nicolai Hähnle | 2016-01-25 | 1 | -19/+28
  Cc: "11.0 11.1" <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* trace: fix a segfault when tracing indirect draw calls | Samuel Pitoiset | 2016-01-24 | 1 | -1/+16
  Like other resources, the indirect draw buffer must be unwrapped.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* winsys/amdgpu: optionally use buffer lists with all allocated buffers | Marek Olšák | 2016-01-23 | 5 | -3/+61
  Set RADEON_ALL_BOS=1 to use it.
  Reviewed-by: Alex Deucher <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* virgl: enable building on Android | Rob Herring | 2016-01-23 | 4 | -0/+78
  This is just a copy-n-paste and rename of vc4 Android makefiles.
  Signed-off-by: Rob Herring <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: add ETC1 support for Stoney | Marek Olšák | 2016-01-22 | 1 | -0/+1
  It's a subset of ETC2. Tested.
  For more information, see page 42 and onward: http://www.graphicshardware.org/previous/www_2007/presentations/strom-etc2-gh07.pdf
  Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2 | Marek Olšák | 2016-01-22 | 1 | -3/+6
  Requested by Matt Arsenault.
  Reviewed-by: Tom Stellard <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add max waves / SIMD to shader stats (v2) | Marek Olšák | 2016-01-22 | 1 | -5/+49
  v2: account for LDS usage in PS
      the limit is per SIMD, not per CU
  Reviewed-by: Nicolai Hähnle <[email protected]>
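A rough sketch of how a per-SIMD wave limit can be derived from a shader's register and LDS usage: each resource caps the number of resident waves, and the smallest cap wins. Every name here is hypothetical, and the limit values are passed in by the caller because the commit message does not state them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-SIMD resource limits, supplied by the caller. */
struct example_simd_limits {
    unsigned max_waves;  /* hard cap on resident waves per SIMD */
    unsigned vgprs;      /* vector registers available per SIMD lane */
    unsigned sgprs;      /* scalar registers available per SIMD */
    unsigned lds_bytes;  /* LDS share treated as belonging to one SIMD */
};

static unsigned example_min(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Returns how many waves fit on one SIMD. A usage of 0 means the
 * resource is not used and therefore does not limit occupancy. */
static unsigned example_max_waves_per_simd(const struct example_simd_limits *lim,
                                           unsigned num_vgprs,
                                           unsigned num_sgprs,
                                           unsigned lds_bytes_per_wave)
{
    unsigned waves;

    assert(lim != NULL);
    waves = lim->max_waves;

    if (num_vgprs)
        waves = example_min(waves, lim->vgprs / num_vgprs);
    if (num_sgprs)
        waves = example_min(waves, lim->sgprs / num_sgprs);
    if (lds_bytes_per_wave)
        waves = example_min(waves, lim->lds_bytes / lds_bytes_per_wave);

    return waves;
}
```

With assumed limits of 10 waves, 256 VGPRs, 512 SGPRs and 16384 LDS bytes, a shader using 64 VGPRs and 48 SGPRs would be capped at 4 waves by its VGPR usage.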
* radeonsi: enable late VS allocation (v3) | Marek Olšák | 2016-01-22 | 1 | -3/+23
  v2: take the number of CUs into account
  v3: change in LS allocation
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: allow using all CUs for tessellation and on-chip GS (v2) | Marek Olšák | 2016-01-22 | 1 | -2/+2
  v2: After more discussion with hw teams, the kernel already contains the optimal settings allowing us to use all CUs.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: fix discard-only fragment shaders (v2)" | Nicolai Hähnle | 2016-01-22 | 1 | -4/+0
  This reverts commit 843855bbf0da2204ce536623ba957bfa83fdbd52.
  It became redundant due to Marek's earlier pushed 8667a1ae which achieves the same thing.
* radeonsi: fix discard-only fragment shaders (v2) | Nicolai Hähnle | 2016-01-22 | 1 | -0/+4
  When a fragment shader is used that has no outputs but does conditional discard (KILL_IF), all fragments are killed without this patch.
  By comparing various register settings, my conclusion is that the exec mask is either not properly forwarded to the DB by NULL exports or ends up being unused, at least when there is _only_ a NULL export (the ISA documentation claims that NULL exports can be used to override a previously exported exec mask).
  Of the various approaches I have tried to work around the problem, this one seems to be the least invasive one.
  v2: take discard by alpha test into account as well
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add ETC2 support for Stoney | Marek Olšák | 2016-01-22 | 2 | -10/+38
  Tested and working.
* radeonsi: implement SAMPLEPOS system value without a constant buffer load | Marek Olšák | 2016-01-22 | 1 | -2/+13
  We always get per-sample input position.
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: compute num_good_compute_units correctly | Marek Olšák | 2016-01-22 | 1 | -10/+5
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: rename max_compute_units -> num_good_compute_units | Marek Olšák | 2016-01-22 | 6 | -11/+11
  radeon sets this correctly, but not amdgpu
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable SPI color outputs the shader doesn't write | Marek Olšák | 2016-01-22 | 2 | -0/+16
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use all SPI color formats | Marek Olšák | 2016-01-22 | 6 | -58/+195
  because not using SPI_SHADER_32_ABGR doubles fill rate.
  We should also get optimal performance if alpha isn't needed or blending isn't enabled.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use 32_AR for alpha-to-coverage without a color buffer | Marek Olšák | 2016-01-22 | 1 | -1/+1
  This avoids the fp16 packing instructions.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add shader conversion code for all SPI color formats | Marek Olšák | 2016-01-22 | 2 | -14/+140
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set CB_SHADER_MASK according to SPI color formats | Marek Olšák | 2016-01-22 | 1 | -16/+35
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc | Marek Olšák | 2016-01-22 | 7 | -38/+91
  This does change the behavior slightly: If a shader writes COLOR[i] and that color buffer isn't bound, the shader will export MRT_NULL instead and discard the IR tree that calculates the output. The only exception is alpha-to-coverage, which requires an alpha export.
  v2:
  - update a comment about 16BPC
  - account for MRTZ when fixing alpha-test/kill
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't enable blending if colormask == 0 | Marek Olšák | 2016-01-22 | 1 | -0/+3
  most likely useless, but doesn't hurt
  Reviewed-by: Nicolai Hähnle <[email protected]>
* targets/dri: android: use WHOLE static libraries | Emil Velikov | 2016-01-22 | 1 | -1/+3
  By using whole static libraries, the Android build system provides a whole-archive-like solution. This means that we don't need to worry about the order of the static libraries and any reverse, recursive or circular dependencies that they have between one another.
  Without this the linker will discard any unused hunks of one library and we'll end up with unresolved symbols, as those are required by another static library. This issue has become more prominent with the introduction of pipe-loader.
  Whole static libraries have been used in i915/i965 for a very long time, so we might as well do the same.
  v2:
  - Better commit message (Ilia)
  - Keep external dependencies as [normal] static libs (Mauro)
  Cc: [email protected]
  Cc: Mauro Rossi <[email protected]>
  Reported-by: Mauro Rossi <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* freedreno/a4xx: Add support for adreno 430 | cstout | 2016-01-21 | 1 | -0/+1
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: make opc array static const | Christian Gmeiner | 2016-01-21 | 1 | -1/+1
  Signed-off-by: Christian Gmeiner <[email protected]>
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: implement emit_string_marker | Rob Clark | 2016-01-21 | 2 | -1/+28
  Writes string to cmdstream in payload of a no-op packet.
  Signed-off-by: Rob Clark <[email protected]>
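The idea of carrying a debug string in the payload of a no-op packet can be sketched as below. This is not the freedreno implementation: the packet header layout, the opcode value and all names are hypothetical, chosen only to show the padding of the string to whole 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical no-op opcode and header layout (opcode in the upper
 * 16 bits, payload length in dwords in the lower 16 bits). */
#define EXAMPLE_NOP_OPCODE 0x10u

static uint32_t example_nop_header(uint32_t payload_dwords)
{
    return (EXAMPLE_NOP_OPCODE << 16) | (payload_dwords & 0xffffu);
}

/* Writes a no-op packet whose payload is the string, zero-padded to a
 * multiple of 4 bytes. Returns the number of dwords written, or 0 if
 * the packet does not fit into the remaining capacity. */
static size_t example_emit_string_marker(uint32_t *cmds, size_t capacity,
                                         const char *str, size_t len)
{
    size_t payload_dwords = (len + 3) / 4;

    assert(cmds != NULL && str != NULL);
    if (payload_dwords > 0xffffu || payload_dwords + 1 > capacity)
        return 0;

    cmds[0] = example_nop_header((uint32_t)payload_dwords);
    memset(&cmds[1], 0, payload_dwords * sizeof(uint32_t)); /* padding */
    memcpy(&cmds[1], str, len);
    return payload_dwords + 1;
}
```

Since the hardware skips the payload of a no-op, the string costs nothing at execution time but shows up when the command stream is dumped and decoded.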
* gallium: add GREMEDY_string_marker | Rob Clark | 2016-01-21 | 16 | -0/+22
  Since the GREMEDY extensions are normally only exposed by the gremedy debugger (and could possibly trigger debug paths in the app), we don't expose the extension by default, but instead only with ST_DEBUG=gremedy.
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* r600g: don't leak driver const buffers | Grazvydas Ignotas | 2016-01-21 | 1 | -0/+6
  The buffers are referenced from r600_update_driver_const_buffers() -> r600_set_constant_buffer() -> u_upload_data(), but nothing ever releases the reference. Similar case with driver_consts.
  Found using valgrind.
  Signed-off-by: Grazvydas Ignotas <[email protected]>
  Cc: <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
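The class of leak described above can be illustrated with a small reference-counting sketch: a slot that takes a reference must drop it again at teardown, otherwise the buffer is never freed. The types and functions below are hypothetical stand-ins, not the r600g or Gallium API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer; not the real Gallium resource type. */
struct example_buffer {
    int refcount;
};

static int example_destroyed; /* counts freed buffers, for illustration */

static struct example_buffer *example_buffer_create(void)
{
    struct example_buffer *buf = calloc(1, sizeof(*buf));
    if (buf)
        buf->refcount = 1; /* the creator holds the first reference */
    return buf;
}

/* Makes *dst point at src, taking a reference on src and dropping the
 * reference previously held through *dst. Passing src == NULL releases. */
static void example_reference(struct example_buffer **dst,
                              struct example_buffer *src)
{
    assert(dst != NULL);
    if (src)
        src->refcount++;
    if (*dst && --(*dst)->refcount == 0) {
        free(*dst);
        example_destroyed++;
    }
    *dst = src;
}
```

A context that stores a buffer in one of its slots would call `example_reference(&slot, NULL)` for every such slot on destruction; skipping that call is the kind of omission valgrind reports as a leak.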
* nv50/ir: 64-bit splitting fixes | Ilia Mirkin | 2016-01-20 | 1 | -1/+2
  Take reading shader outputs into account, and use setFlagsDef for the carry since we rely on having i->flagsDef being set.
  Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: allow carry to be set/read by imad | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: add carry emission to LOP and IADD | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: add ATOM and CCTL support | Ilia Mirkin | 2016-01-20 | 1 | -0/+52
  Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: set LD/ST address width bit | Ilia Mirkin | 2016-01-20 | 1 | -0/+2
  Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix double-wide vm address | Ilia Mirkin | 2016-01-20 | 1 | -0/+4
* gk110/ir: add OP_CCTL handling | Ilia Mirkin | 2016-01-20 | 1 | -0/+35
* gk110/ir: add atomic op emission, fix gmem loads | Ilia Mirkin | 2016-01-20 | 1 | -5/+65
  Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: warn about illegal use of objects in different contexts | Roland Scheidegger | 2016-01-21 | 3 | -1/+32
  Doing that is clearly a bug. We can't quite assert as st/mesa may hit this, but increase at least visibility of it a bit.
  (For the non-refcounted objects it would be illegal too, but we can't detect that unless we'd store the context ourselves. Plus, those don't tend to cause random crashes at context or object destruction time... So just sampler views, surfaces and so targets for now.)
  Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info | Roland Scheidegger | 2016-01-21 | 3 | -3/+11
  I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I actually thought it should not be necessary and a piglit run didn't show any differences, but this shouldn't have been in there. draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER.
  The new polygon-mode-facing test indeed shows why this is necessary, there's lots of invalid reads and writes with valgrind (also crashes without valgrind), because the pre-pipeline vertex size doesn't match the post-pipeline vertex size (note this won't help much with stages which don't have the prepare hook which can grow the vertex size, in particular the wide point stage, but this isn't used by llvmpipe).
  The test still won't pass, of course, but it is only usage of uninitialized values now, which is much less dangerous...
  (Albeit I'm pretty sure for i915 it really is not needed anymore as it doesn't care about the extra outputs and doesn't call draw_prepare_shader_outputs().)
  Reviewed-by: Jose Fonseca <[email protected]>
* nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers | Ilia Mirkin | 2016-01-20 | 1 | -0/+2
  Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm))
  Signed-off-by: Ilia Mirkin <[email protected]>
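The rewrite named in the commit title relies on the identity (a + imm) << s == (a << s) + (imm << s) in wrapping 32-bit arithmetic. The sketch below checks that identity and shows one plausible way a source modifier could break it: if a negation on an ADD source is dropped when the shift is moved inside, the result changes. That failure mode is an assumption for illustration; the commit does not spell out the exact cause. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: shift applied to the sum. shift must be < 32. */
static uint32_t example_shl_of_add(uint32_t a, uint32_t imm, unsigned shift)
{
    assert(shift < 32);
    return (a + imm) << shift;
}

/* Flipped form: shift distributed over both addends. shift must be < 32. */
static uint32_t example_add_of_shl(uint32_t a, uint32_t imm, unsigned shift)
{
    assert(shift < 32);
    return (a << shift) + (imm << shift);
}
```

For a = 5, imm = 3, shift = 2 both forms give 32. With a negated first source, the flipped form only matches if the negation travels with the operand into the shift; applying it to the plain value 5 instead yields a different result.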
* gk110/ir: fix load from shared memory | Ilia Mirkin | 2016-01-20 | 1 | -1/+1
  It was accidentally using the store opcode.
  Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: add partial BAR support | Ilia Mirkin | 2016-01-20 | 1 | -2/+18
  This is enough for the plain TGSI BARRIER implementation.
  Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats | Roland Scheidegger | 2016-01-20 | 1 | -11/+14
  If we have a d24x8 format, there is no stencil. Therefore, we can always clear these bits too, which means this will be some kind of memset rather than read-modify-write.
  This is good for some 7% increase or so in gears with huge window size - seems to have a bigger effect if things aren't in caches.
  Of course, any real app won't spend nearly as much time comparatively in clearing depth buffer in the first place, so the speedup will be much lower.
  Reviewed-by: Jose Fonseca <[email protected]>
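The difference between the two clear strategies can be sketched as follows. This is not the llvmpipe code: the function names are hypothetical, and the layout (depth in the low 24 bits of each 32-bit word, unused bits above) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 24 depth bits in the low bits, 8 unused bits on top. */
#define EXAMPLE_DEPTH_MASK 0x00ffffffu

/* Depth-only clear that preserves the upper 8 bits: every word has to
 * be read, masked and written back. */
static void example_clear_depth_rmw(uint32_t *buf, size_t count, uint32_t z24)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++)
        buf[i] = (buf[i] & ~EXAMPLE_DEPTH_MASK) | (z24 & EXAMPLE_DEPTH_MASK);
}

/* Full-word clear: valid when the upper 8 bits carry no stencil data,
 * so they may be overwritten. This is a pure store with no read. */
static void example_clear_depth_full(uint32_t *buf, size_t count, uint32_t z24)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++)
        buf[i] = z24 & EXAMPLE_DEPTH_MASK;
}
```

Both functions leave identical depth bits behind; they differ only in what ends up in the unused byte, which nothing reads when the format has no stencil.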