path: root/src/gallium
Commit message  Author  Age  Files  Lines
* radeonsi: add basic code for overrasterizationMarek Olšák2015-03-165-16/+28
| | | | | | | This will be used for line and polygon smoothing. This is GCN-only even though it's in shared code. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: small cleanup in si_shader_selector_keyMarek Olšák2015-03-161-12/+12
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogueMarek Olšák2015-03-161-7/+8
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add support for easy opcodes from ARB_gpu_shader5Marek Olšák2015-03-161-0/+8
| | | | | | | | I have to use the BFE intrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leaves).
* radeonsi: implement bit-finding opcodes from ARB_gpu_shader5Marek Olšák2015-03-161-0/+92
| | | | Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: implement gl_SampleMaskInMarek Olšák2015-03-161-0/+4
| | | | Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: add support for SQRTMarek Olšák2015-03-162-1/+3
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* radeonsi: add support for FMAMarek Olšák2015-03-162-1/+4
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Glenn Kennard <[email protected]>
* gallium/radeon: don't use LLVMReadOnlyAttribute for ALUMarek Olšák2015-03-161-16/+9
| | | | | | | None of the instructions use a pointer argument. (+ small cosmetic changes) Reviewed-by: Tom Stellard <[email protected]>
* tgsi: handle bitwise opcodes in tgsi_opcode_infer_type (v2)Marek Olšák2015-03-161-0/+8
| | | | | | v2: set the same types as the destination type in tgsi_exec Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add FMA and DFMA opcodes (v3)Marek Olšák2015-03-1619-8/+53
| | | | | | | | | Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno: update generated headersRob Clark2015-03-155-6/+6
| | | | | | | Fix a3xx texture layer-size. Signed-off-by: Rob Clark <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno/ir3: remove old compilerRob Clark2015-03-157-1574/+10
| | | | | | | Now that piglit is no longer falling back to old compiler for any tests, we can remove it. Hurray \o/ Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: avoid scheduler deadlockRob Clark2015-03-153-0/+45
| | | | | | | | | | | Deadlock can occur if we schedule an address register write, yet some instructions which depend on that address register value also depend on other unscheduled instructions that depend on a different address register value. To solve this, before scheduling an address register write, ensure that all the other dependencies of the instructions which consume this address register are already scheduled. Signed-off-by: Rob Clark <[email protected]>
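The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct instr`, `other_deps_scheduled()` and `can_schedule_addr_write()` are hypothetical names and a simplified data layout, not the driver's actual scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction; not the driver's real struct. */
struct instr {
    bool scheduled;
    struct instr *address;   /* address-register write this instr needs, or NULL */
    struct instr *srcs[4];   /* other source instructions, NULL if unused */
};

/* True if every non-address source of `user` is already scheduled. */
static bool other_deps_scheduled(const struct instr *user)
{
    for (size_t i = 0; i < 4; i++) {
        const struct instr *src = user->srcs[i];
        if (src && !src->scheduled)
            return false;
    }
    return true;
}

/* An address-register write is only safe to schedule once none of its
 * consumers is still waiting on some other unscheduled instruction. */
static bool can_schedule_addr_write(const struct instr *addr_write,
                                    struct instr *const *all, size_t count)
{
    assert(addr_write != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct instr *user = all[i];
        if (user->address == addr_write && !other_deps_scheduled(user))
            return false;
    }
    return true;
}
```

Delaying the write this way means the address register is never occupied by a value whose consumers cannot yet run, which is the situation the message identifies as the cause of the deadlock.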
* freedreno/ir3: bit of cleanupRob Clark2015-03-153-19/+23
| | | | | | | | Add an array_insert() macro to simplify inserting into dynamically sized arrays, add a comment, and remove unused prototype inherited from the original freedreno.git/fdre-a3xx test code, etc. Signed-off-by: Rob Clark <[email protected]>
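For illustration, a macro of this kind might look like the sketch below. The naming convention (companion variables `arr_count` and `arr_sz`), the initial size of 16 and the `sum_first()` example are assumptions made here, not the definition added by the commit.

```c
#include <stdlib.h>

/* Hypothetical convention: array `arr` comes with companion variables
 * `arr_count` (entries in use) and `arr_sz` (entries allocated). */
#define array_insert(arr, val) do {                                  \
        if (arr ## _count == arr ## _sz) {                           \
            unsigned new_sz_ = arr ## _sz ? 2 * arr ## _sz : 16;     \
            void *tmp_ = realloc(arr, new_sz_ * sizeof((arr)[0]));   \
            if (!tmp_)                                               \
                abort();   /* sketch: treat allocation failure as fatal */ \
            arr = tmp_;                                              \
            arr ## _sz = new_sz_;                                    \
        }                                                            \
        (arr)[arr ## _count++] = (val);                              \
    } while (0)

/* Example use: append 0..n-1 to a growing array and return the sum. */
static int sum_first(int n)
{
    int *vals = NULL;
    unsigned vals_count = 0, vals_sz = 0;
    int sum = 0;

    for (int i = 0; i < n; i++)
        array_insert(vals, i);
    for (unsigned i = 0; i < vals_count; i++)
        sum += vals[i];
    free(vals);
    return sum;
}
```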
* freedreno: fix slice pitch calculationsIlia Mirkin2015-03-131-1/+1
| | | | | | | | | | | | | For example if width were 65, the first slice would get 96 while the second would get 32. However the hardware appears to expect the second pitch to be 64, based on halving the 96 (and aligning up to 32). This fixes texelFetch piglit tests on a3xx below a certain size. Going higher they break again, but most likely due to unrelated reasons. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.4 10.5" <[email protected]> Reviewed-by: Rob Clark <[email protected]>
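The numbers in this message can be reproduced with a small sketch. The helper names are hypothetical, and both functions are assumptions about the shape of the calculation: `pitch_per_level()` aligns each minified width on its own, while `pitch_halving()` halves the previous aligned pitch and aligns again, as the message describes.

```c
#include <assert.h>

/* Round v up to a multiple of a (a must be a power of two). */
static unsigned align_up(unsigned v, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Halve a width `level` times, never going below 1. */
static unsigned minify(unsigned width, unsigned level)
{
    unsigned w = width >> level;
    return w ? w : 1;
}

/* Assumed old behaviour: align each level's own width to 32. */
static unsigned pitch_per_level(unsigned width, unsigned level)
{
    return align_up(minify(width, level), 32);
}

/* Behaviour the hardware appears to expect: start from the aligned
 * level-0 pitch, then halve and re-align for each further level. */
static unsigned pitch_halving(unsigned width, unsigned level)
{
    unsigned pitch = align_up(width, 32);
    for (unsigned i = 0; i < level; i++)
        pitch = align_up(minify(pitch, 1), 32);
    return pitch;
}
```

For a width of 65, `pitch_per_level()` gives 96 and then 32, while `pitch_halving()` gives 96 and then 64, matching the values quoted above.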
* freedreno/a3xx: use the same layer size for all slicesIlia Mirkin2015-03-131-1/+8
| | | | | | | | | | | | | | | | | | We only program in one layer size per texture, so that means that all levels must share one size. This makes the piglit test bin/texelFetch fs sampler2DArray have the same breakage as its non-array version instead of being completely off, and makes bin/ext_texture_array-gen-mipmap start passing. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.4 10.5" <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* util: convert slab macros to inline functionsBrian Paul2015-03-131-2/+11
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) Fix typo in comment introduced by 70dc8aAlexandre Demers2015-03-131-1/+1
| | | | | | | Fix typo in comment introduced by 70dc8a Signed-off-by: Alexandre Demers <[email protected]> Signed-off-by: Jose Fonseca <[email protected]>
* gallivm: Prevent double delete on LLVM 3.6Jose Fonseca2015-03-121-0/+1
| | | | | | | | std::unique_ptr takes ownership of MM, and a double delete could ensue in case of an error, as pointed out by Chris Vine in https://bugs.freedesktop.org/show_bug.cgi?id=89387 Reviewed-by: Chris Vine <[email protected]>
* st/glx: use strdup() instead of _mesa_strdup()Brian Paul2015-03-111-1/+2
| | | | | Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nvc0: fix wrong max value for driver queriesSamuel Pitoiset2015-03-091-4/+3
| | | | | | | | | | | The maximum value of a Gallium HUD's panel is automatically adjusted when the current value is greater than the max. If we set the pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is never adjusted and this results in a flat line instead of a pretty curve which is correctly scaled. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
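A rough model of the effect, assuming the pane simply raises its maximum to the largest sample seen so far. `struct pane`, `pane_add_sample()` and `pane_scaled()` are hypothetical names for this sketch and do not reproduce the real HUD code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pane state; only the field relevant here. */
struct pane {
    uint64_t max_value;
};

/* Raise the pane's maximum whenever a sample exceeds it. */
static void pane_add_sample(struct pane *p, uint64_t value)
{
    assert(p);
    if (value > p->max_value)
        p->max_value = value;
}

/* Height of a sample as a fraction of the pane, in [0, 1]. */
static double pane_scaled(const struct pane *p, uint64_t value)
{
    assert(p);
    return p->max_value ? (double)value / (double)p->max_value : 0.0;
}
```

With an initial maximum of 100, a sample of 400 raises the maximum and later samples are drawn at a sensible height. With an initial maximum of UINT64_MAX no sample can ever exceed it, so every sample scales to almost zero and the graph stays flat.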
* r600g: Use R600_MAX_VIEWPORTS instead of 16Alexandre Demers2015-03-095-12/+14
| | | | | | | | | | | Let's define R600_MAX_VIEWPORTS instead of using 16 here and there in the code when looping through viewports and scissors. It is easier to understand what this number represents. v2: Missed a case where R600_MAX_VIEWPORTS should have been used. Signed-off-by: Alexandre Demers <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r300g: fix sRGB->sRGB blitsMarek Olšák2015-03-091-0/+9
| | | | Cc: 10.5 10.4 <[email protected]>
* r300g: fix a crash when resolving into an sRGB textureMarek Olšák2015-03-091-3/+5
| | | | Cc: 10.5 10.4 <[email protected]>
* r300g: use memset for clearing the shader keyMarek Olšák2015-03-091-1/+2
* r300g: remove the broken SNORM->UNORM shader lowering passMarek Olšák2015-03-093-54/+0
| | | | Not used anymore.
* r300g: fix RGTC1 and LATC1 SNORM formatsMarek Olšák2015-03-092-31/+17
| | | | Cc: 10.5 10.4 <[email protected]>
* r300g: Fix the ATI1N swizzle (RGTC1 and LATC1)Stefan Dösinger2015-03-091-1/+3
| | | | | | | | | | | | | | | | | | | | | This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01 test as well as the precision part of Wine's 3dc format test (fd.o bug 89156). The Z component seems to contain a lower precision version of the result, probably a temporary value from the decompression computation. The Y and W component contain different data that depends on the input values as well, but I could not make sense of them (Not that I tried very hard). GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in piglit, and both formats are affected by a compiler bug if they're sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx, which returns random garbage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156 Signed-off-by: Marek Olšák <[email protected]> Cc: 10.5 10.4 <[email protected]>
* radeonsi: Add additional information to shader dumpsTom Stellard2015-03-091-6/+12
| | | | | | | This adds SGPR count, VGPR count, shader size, LDS size, and scratch usage to shader dumps. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODETom Stellard2015-03-093-1/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG v2Tom Stellard2015-03-091-1/+4
| | | | | | | | | | This means dropping CL_FP_DENORM from the current return value. v2: - Add comments about minimum values for OpenCL 1.2. Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jan Vesely <[email protected]>
* freedreno/ir3: get the # of miplevels from getinfoIlia Mirkin2015-03-091-0/+20
| | | | | | | | | This fixes ARB_texture_query_levels to actually return the desired value. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno/ir3: fix array count returned by TXQIlia Mirkin2015-03-091-2/+42
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno: move fb state copy after checking for size changeIlia Mirkin2015-03-091-2/+2
| | | | | | | Fixes: 1f3ca56b ("freedreno: use util_copy_framebuffer_state()") Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]> Cc: "10.4 10.5" <[email protected]>
* freedreno: replace glsl130 debug flag with glsl120Rob Clark2015-03-082-12/+10
| | | | | | | | | | | | | | Now that relative-dst works, we should never fall back to the old compiler. (Which is almost true, other than a couple edge case sched fails in piglit). So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with a glsl120 flag to force GLSL 120 and !integers. If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a workaround and please let me know. Signed-off-by: Rob Clark <[email protected]>
* gallium/docs: add some freedreno compiler docsRob Clark2015-03-086-1/+457
| | | | | | | | | | | | | | | Enable the 'sphinx.ext.graphviz' extension, and add in a section for driver specific docs, with freedreno compiler docs beneath. The goal is for more complete compiler docs, and hopefully some docs about other parts of the driver (such as how tiling works, etc). Note that there is also a Distribution -> Drivers section. Although that appears to be simply just a list of drivers. Not sure if that should move under the 'Drivers' section or left alone. I did add a one-line section for freedreno in the existing Distribution -> Drivers section. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: relative dstRob Clark2015-03-085-42/+244
| | | | | | | | | To simplify RA, assign arrays that are written to first. Since enough dependency information is in the graph to preserve order of reads and writes of array, so all SSA names for the array collapse into one, just assign the entire thing by array-id. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split out array_fanin() helperRob Clark2015-03-081-17/+30
| | | | | | We'll need this too for relative dst.. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop deref nodesRob Clark2015-03-087-80/+53
| | | | | | | | | | | | | | | | | | | | | The meta-deref instruction doesn't really do what we need for relative destination. Instead, since each instruction can reference at most a single address value, track the dependency on the address register via instr->address. This lets us express the dependency regardless of whether it is used for dst and/or src. The foreach_ssa_src{_n} iterator macros now also iterates the address register so, at least in SSA form, the address register behaves as an additional virtual src to the instruction. Which is pretty much what we want, as far as scheduling/etc. TODO: For now, the foreach_src{_n} iterators are unchanged. We could wrap the address in an ir3_register and make the foreach_src_{_n} iterators behave the same way. But that seems unnecessary at this point, since we mainly care about the address dependency when in SSA form. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: helpful iterator macrosRob Clark2015-03-089-111/+109
| | | | | | | | | I remembered that we are using c99.. which makes some sugary iterator macros easier. So introduce iterator macros to iterate all src registers and all SSA src instructions. The _n variants also return the src #, since there are a handful of places that need this. Signed-off-by: Rob Clark <[email protected]>
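As an illustration of why C99 helps here, a loop variable declared inside the `for` statement lets an iterator macro carry its own counter. The structs and macro bodies below are a simplified sketch written for this example (with `regs[0]` assumed to be the destination); they are not the definitions from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types: regs[0] is the dst, the rest are srcs. */
struct reg {
    int num;
};

struct instr {
    unsigned regs_count;
    struct reg *regs[5];
};

/* Visit every non-NULL src register; `n` is the src index (C99 scoped). */
#define foreach_src_n(srcreg, n, instr)                          \
    for (unsigned n = 0; n + 1 < (instr)->regs_count; n++)       \
        if (((srcreg) = (instr)->regs[n + 1]) != NULL)

/* Same, for callers that do not need the index. */
#define foreach_src(srcreg, instr) foreach_src_n(srcreg, i_, instr)

/* Example use: add up the register numbers of all sources. */
static int sum_srcs(const struct instr *in)
{
    struct reg *src;
    int sum = 0;

    assert(in != NULL);
    foreach_src(src, in)
        sum += src->num;
    return sum;
}
```

Because the macro ends in an `if`, the loop body should not be followed directly by an `else`; wrapping the body in braces avoids that ambiguity.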
* freedreno/ir3: fix register usage calculationsRob Clark2015-03-081-7/+14
| | | | | | | | | For cat1 instructions, use reg() as well for relative src, to ensure proper accounting of register usage. Also, for relative instructions, use reg->size rather than reg->wrmask to determine the number of components read/written. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: couple tweaks for cmdline compilerRob Clark2015-03-081-1/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: split up ssa_dstRob Clark2015-03-081-17/+25
| | | | | | And a couple other trivial renames, to prepare for relative dst. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix failed assert in groupingRob Clark2015-03-081-27/+44
| | | | | | | | | | | | | | | | | | | | | Turns out there are scenarios where we need to insert mov's in "front" of an input. Triggered by shaders like: VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], GENERIC[9] DCL SAMP[0] DCL TEMP[0], LOCAL 0: MOV TEMP[0].xy, IN[1].xyyy 1: MOV TEMP[0].w, IN[1].wwww 2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY 3: MOV OUT[1], TEMP[0] 4: MOV OUT[0], IN[0] 5: END Signed-off-by: Rob Clark <[email protected]>
* r300g: Fix build, invalid extern "C" around header inclusion.Mark Janes2015-03-061-0/+8
| | | | | | | | | | | | A previous patch to fix header inclusion within extern "C" neglected to fix the occurrences of this pattern in r300 files. When the helper to detect this issue was pushed to master, it broke the build for the r300 driver. This patch fixes the r300 build. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477 Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* nouveau: Fix build, invalid extern "C" around header inclusion.Mark Janes2015-03-062-2/+7
| | | | | | | | | | | | A previous patch to fix header inclusion within extern "C" neglected to fix the occurrences of this pattern in nouveau files. When the helper to detect this issue was pushed to master, it broke the build for the nouveau driver. This patch fixes the nouveau build. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477 Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* nv50,nvc0: remove bogus 64_FLOAT formatsIlia Mirkin2015-03-061-5/+0
| | | | | | | There is no HW support for these and the VBO pusher doesn't know about them. No need to, either, since the st will be lowering them to 2x32. Signed-off-by: Ilia Mirkin <[email protected]>
* ilo: clarify valid and preferred tilingsChia-I Wu2015-03-072-15/+29
| | | | | We did it right until the switch to gen_surface_tiling, which has GEN8_TILING_W. Generally, GEN8_TILING_W may be valid but not preferred.
* ilo: clean up Gen6 WAsChia-I Wu2015-03-071-34/+62
| | | | | | Add a helper function for each WA and make PIPE_CONTROL flags match the WA descriptions. Call gen6_wa_pre_pipe_control() only before PIPE_CONTROLs. Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.