summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: remove unused si_prepare_cube_coordsNicolai Hähnle2017-01-132-200/+0
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* amd/common: unify cube map coordinate handling between radeonsi and radvNicolai Hähnle2017-01-133-1/+11
| | | | | | | | | | | | | | | Code is taken from a combination of radv (for the more basic functions, to avoid gallivm dependencies) and radeonsi (for the new and improved derivative calculations). v2: add 0.5 offset to tex coords only after derivative calculation v3: - really only touch the first three coordinates - rebase on the removal of the 1.5 --> 0.5 offset change Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only touch first three coordinates in si_prepare_cube_coordsNicolai Hähnle2017-01-131-12/+1
| | | | | | | | Sourcing coords_arg[4] is actually never correct, since bias is handled differently in tex_fetch_args anyway. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove unused si_llvm_cube_to_2d_coordsNicolai Hähnle2017-01-131-28/+0
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: restrict cube map derivative computations to the correct planeNicolai Hähnle2017-01-131-23/+107
| | | | | | | | | | | | | | | | | | | | As remarked by the comment in the original code, the old algorithm fails when (tc + deriv) points at a different cube face. Instead, simply project the derivative directly to the plane of the selected cube face. The new code is based on exactly differentiating (using the chain rule) the projection onto a plane corresponding to a fixed cube map face (which is still selected in the usual way based on the texture coordinate itself). The computations end up fairly involved, but we do save two reciprocal computations. Fixes GL45-CTS.texture_cube_map_array.sampling. v2: add 0.5 offset to tex coords only after derivative calculation v3: go back to 1.5 offset Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: communicate cube map coordinates more explicitlyNicolai Hähnle2017-01-131-33/+43
| | | | | | | v2: fix compile error that snuck in during rebase Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/debug: move .gitignore for sid_tables.h tooGrazvydas Ignotas2017-01-131-1/+0
| | | | | | | | b838f642 "ac/debug: Move sid_tables.h generation to common code." moved sid_tables.h but forgot the corresponding .gitignore. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac, radeonsi: automake: add missing builddir includeEmil Velikov2017-01-121-0/+1
| | | | | | | | | | The generated file is correctly stored in the builddir as of earlier commit. Yet the commit forgot to add the respective include flag thus the compiler would error out failing to find sid_tables.h Bugzila: https://bugs.freedesktop.org/show_bug.cgi?id=99389 Fixes: d1dc22eb466 "ac: automake: rework sid_tables.h generation" Signed-off-by: Emil Velikov <[email protected]>
* radeonsi: num_records is in units of stride for swizzled buffers even on VINicolai Hähnle2017-01-121-2/+0
| | | | | | The old setting didn't hurt, but this is cleaner. Reviewed-by: Marek Olšák <[email protected]>
* ac/debug: Dump indirect buffers.Bas Nieuwenhuizen2017-01-091-3/+6
| | | | | | | | | | | | | | This is for handling chained command buffers and secondary command buffers. It doesn't handle the trace id for secondary command buffers yet, but I don't think that is possible in general with just writes, as we could call a secondary command buffer multiple times. I think this is good enough for now, as the most useful case is the chaining when we grow an IB. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac/debug: Move IB decode to common code.Bas Nieuwenhuizen2017-01-093-332/+15
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac/debug: Move sid_tables.h generation to common code.Bas Nieuwenhuizen2017-01-093-308/+1
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: fix the Witcher 2 black transitionsMarek Olšák2017-01-091-2/+13
| | | | | | | | v2: do it properly Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238 Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set si_shader_context::input_decls for ranged decls correctlyMarek Olšák2017-01-091-1/+4
| | | | | | | | This has no effect because no code uses those members with ranged decls. Tested-by: Edmondo Tommasina <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUGMarek Olšák2017-01-095-13/+15
| | | | | | Tested-by: Edmondo Tommasina <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add TC L2 prefetch for shaders and VBO descriptorsMarek Olšák2017-01-063-1/+50
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add CP DMA flags for greater control over synchronizationMarek Olšák2017-01-063-16/+31
| | | | | | | for L2 prefetch Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: cleanly communicate which CP DMA packet is firstMarek Olšák2017-01-061-11/+21
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add HUD queries for cache flush statsMarek Olšák2017-01-061-0/+5
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't count fast clears and prefetches into CP DMA statsMarek Olšák2017-01-061-2/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't wait for compute shaders in texture_barrierMarek Olšák2017-01-061-2/+1
| | | | | | | it doesn't interact with compute shaders in any way Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: assume that a TES without POSITION precedes GSMarek Olšák2017-01-061-1/+2
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: unduplicate VS color export codeMarek Olšák2017-01-061-9/+2
| | | | | | | it's exactly the same as the other ones Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up more HAVE_LLVM #ifdefsMarek Olšák2017-01-061-8/+11
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: turn SDMA IBs into de-facto preambles of GFX IBsMarek Olšák2017-01-051-4/+16
| | | | | | | | | | Draw calls no longer flush SDMA IBs. r600_need_dma_space is responsible for synchronizing execution between both IBs. Initial buffer clears and fast clears will stay unflushed in the SDMA IB (up to 64 MB) as long as the GFX IB isn't flushed either. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement SDMA-based buffer clearing for SIMarek Olšák2017-01-052-1/+41
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do all math in bytes in SI DMA codeMarek Olšák2017-01-051-17/+17
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_spaceMarek Olšák2017-01-052-10/+0
| | | | | | | | Call r600_dma_emit_wait_idle only when there is a possibility of a read-after-write hazard. Buffers not yet used by the SDMA IB don't have to wait. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: inline cik_sdma_do_copy_bufferMarek Olšák2017-01-051-28/+16
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: also wait for SDMA in the clear_buffer CPU fallbackMarek Olšák2017-01-051-3/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify r600_resource typecasts in si_clear_bufferMarek Olšák2017-01-051-5/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always use SDMA for big buffer clears and first buffer usesMarek Olšák2017-01-051-0/+20
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement SDMA-based buffer clearing for CIK-VIMarek Olšák2017-01-051-0/+42
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: remove TGSI_OPCODE_ABSMarek Olšák2017-01-051-2/+0
| | | | | | It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update clip_regs if clip_disable changes to fix a hangMarek Olšák2017-01-051-0/+5
| | | | | | | | | | | | | | This seems to fix the GPU hangs caused by: commit ed3190b3f3a776fc8c75b1e6130a88079166d115 Author: Marek Olšák <[email protected]> Date: Sun Nov 13 18:41:43 2016 +0100 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219 Tested-by: Samuel Pitoiset <[email protected]>
* gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELYMarek Olšák2017-01-051-0/+1
| | | | | | Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: capitalize VM hex addr when dumping buffer listSamuel Pitoiset2017-01-041-1/+1
| | | | | | | | Useful when debugging with R600_DEBUG=vm,check_vm to match addr in both outputs. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Bugfix needed for hashcatChristian Inci2016-12-221-5/+7
| | | | | | | | | Hashcat needs MAX_GLOBAL_BUFFERS to be 21 or even 22 for some modes. It'll crash otherwise. I'm adding an assert to see if programs need it to be even higher. Signed-off-by: Christian Inci <[email protected]> [Handle first properly; should be NFC, since clover always uses first == 0.] Signed-off-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix gl_ClipDistance and gl_ClipVertex for pointsNicolai Hähnle2016-12-221-2/+10
| | | | | | | | | | The clipper hardware doesn't consider points as primitives that can be clipped. Simply setting the corresponding cull bits works, and should not have an adverse effect on other primitive types according to the hardware team. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: only set VS_OUT_MISC_SIDE_BUS_ENA when the misc vector is usedNicolai Hähnle2016-12-221-5/+6
| | | | | | | | Should have no effect (other than perhaps on power consumption), but Vulkan does this. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: add Polaris12 support (v3)Junwei Zhang2016-12-212-0/+2
| | | | | | | | | | | v2: use gfxip names for llvm 4.0+ v3: use tonga for llvm <= 3.8, drop gfxip name, we can just change that we change the other asics. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Junwei Zhang <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizesNicolai Hähnle2016-12-121-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required sizeNicolai Hähnle2016-12-122-25/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ringNicolai Hähnle2016-12-124-50/+67
| | | | | | | | | We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguouslyNicolai Hähnle2016-12-121-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is: t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0 t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1 ... t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0 t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1 ... t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL ... t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0 t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1 ... t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively). The vertex streams are laid out sequentially. The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not write non-existent components through the GSVS ringNicolai Hähnle2016-12-121-2/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertexNicolai Hähnle2016-12-121-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streamsNicolai Hähnle2016-12-121-8/+13
| | | | | | | | | | | | SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates. The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ringNicolai Hähnle2016-12-121-16/+25
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0Nicolai Hähnle2016-12-121-12/+19
| | | | | | | | When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout). Reviewed-by: Marek Olšák <[email protected]>