summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* freedreno/a3xx: add fake RGTC support (required for GL3)Ilia Mirkin2015-11-185-27/+175
| | | | | | | | | | | | Also throw in LATC while we're at it (same exact format). This could be made more efficient by keeping a shadow compressed texture to use for returning at map time. However... it's not worth it for now... presumably compressed textures are not updated often. Lastly fix up Z32S8 transfers to non-0 layers. Signed-off-by: Ilia Mirkin <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add missing formats to enable ARB_vertex_type_2_10_10_10_revIlia Mirkin2015-11-182-4/+8
| | | | | | | | | The previously RE'd formats were from an ES driver implementing OES_vertex_type_10_10_10_2 and thus backwards. A future change could add the 2_10_10_10 support. Signed-off-by: Ilia Mirkin <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: fix for stk binning pass hangRob Clark2015-11-183-19/+76
| | | | | | | | | | | | | | We'd end up in a state where shader uses no inputs, yet num_elements is greater than zero. Triggered by a TF vertex shader which did: gl_Position = vec4(0.0, 0.0, 0.0, 0.0); resulting in a binning pass variant with no inputs. Includes equiv fix in a4xx, even though we don't have binning-pass enabled yet on a4xx. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: fix GL_POINTS lockup w/ GLESRob Clark2015-11-183-6/+11
| | | | | | | | point_size_per_vertex is always TRUE for GLES, causing us to configure the hw as if gl_PointSize was written, even if it was not. Which makes for grumpy hw. Signed-off-by: Rob Clark <[email protected]>
* llvmpipe: disable VSX in ppc due to LLVM PPC bugOded Gabbay2015-11-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch disables the use of VSX instructions, as they cause some piglit tests to fail For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7 With this patch, ppc64le reaches parity with x86-64 as far as piglit test suite is concerned. v2: - Added check that we have at least LLVM 3.4 - Added the LLVM bug URL as a comment in the code v3: - Only disable VSX if Altivec is supported, because if Altivec support is missing, then VSX support doesn't exist anyway. - Change original patch description. Signed-off-by: Oded Gabbay <[email protected]> Cc: "11.0" <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* nvc0/ir: actually emit AFETCH on keplerIlia Mirkin2015-11-181-0/+3
| | | | | | | | Looks like this was forgotten in the commit which added the AFETCH logic. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* svga: use more VGPU10 formatsBrian Paul2015-11-181-30/+67
| | | | | | | | | | We always want to prefer the VGPU10 formats over the VGPU9 ones when we have VGPU10 support. Original patch by Jose and updated by Brian. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* svga: add/use new svga_sampler_format() functionBrian Paul2015-11-183-0/+30
| | | | | | | | | This is important for the case of sampling from a depth texture. In that case, we need to sample the texture as if it were a single-channel color texture. For other/color formats, we can use the format as-is. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* radeon: count cs dwords separately for query begin and endNicolai Hähnle2015-11-182-15/+21
| | | | | | This will be important for perfcounter queries. Reviewed-by: Marek Olšák <[email protected]>
* radeon: expose r600_query_hw functions for reuseNicolai Hähnle2015-11-182-14/+27
| | | | | Reviewed-by: Marek Olšák <[email protected]> [Fixed a rebase conflict and re-tested before pushing.]
* radeon: implement r600_query_hw_get_result via function pointersNicolai Hähnle2015-11-182-99/+94
| | | | | | We will need the clear_result override for the batch query implementation. Reviewed-by: Marek Olšák <[email protected]>
* radeon: split hw query buffer handling from cs emitNicolai Hähnle2015-11-182-81/+133
| | | | | | | | | The idea here is that driver queries implemented outside of common code will use the same query buffer handling with different logic for starting and stopping the corresponding counters. Reviewed-by: Marek Olšák <[email protected]> [Fixed a rebase conflict and re-tested before pushing.]
* radeon: convert hardware queries to the new styleNicolai Hähnle2015-11-182-148/+172
| | | | | | | | | Move r600_query and r600_query_hw into the header because we will want to reuse the buffer handling and suspend/resume logic outside of the common radeon code. Reviewed-by: Marek Olšák <[email protected]> [Fixed a rebase conflict and re-tested before pushing.]
* radeon: convert software queries to the new styleNicolai Hähnle2015-11-181-172/+194
| | | | | | | | Software queries are all queries that do not require suspend/resume and explicit handling of result buffers. Reviewed-by: Marek Olšák <[email protected]> [Fixed a rebase conflict and re-tested before pushing.]
* radeon: add query handler function pointersNicolai Hähnle2015-11-182-7/+62
| | | | | | | | The goal here is to be able to move the implementation details of hardware- specific queries (in particular, performance counters) out of the common code. Reviewed-by: Marek Olšák <[email protected]> [Fixed a rebase conflict and re-tested before pushing.]
* radeon: move R600_QUERY_* constants into a new query header fileNicolai Hähnle2015-11-184-15/+51
| | | | | | | More query-related structures will have to be moved into their own header file to support hardware-specific performance counters. Reviewed-by: Marek Olšák <[email protected]>
* radeon: cleanup driver query listNicolai Hähnle2015-11-181-29/+55
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeon: move get_driver_query_info to r600_query.cNicolai Hähnle2015-11-183-45/+51
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vc4: Don't bother lowering uniforms when the same value is used twice.Eric Anholt2015-11-171-13/+33
| | | | | | | | | | | | | DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get the absolute value would get lowered. The lowering doesn't bother to try to restrict the lifetime of the lowered uniforms, so we'd end up register allocation failng due to this on 5 of the tests (More tests still fail in RA, which look like we'll need to reduce lowered uniform lifetimes to fix). No changes on shader-db, though fewer extra MOVs are generated on even glxgears (MOVs pair well enough that it ends up being the same instruction count).
* vc4: Fix uniform reordering to support reading the same uniform twice.Eric Anholt2015-11-171-8/+18
| | | | | This does actually happen in the wild (particularly fabs of a uniform), so we'd like to support it.
* vc4: Fix documentation on vc4_qir_lower_uniforms.c.Eric Anholt2015-11-171-7/+3
|
* vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.Eric Anholt2015-11-175-0/+26
| | | | | | | | It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <[email protected]>
* radeonsi: enable optimal raster config setting for fiji (v2)Alex Deucher2015-11-161-3/+9
| | | | | | | | | | | Requires proper kernel tiling configuration so check the tiling config registers. v2: send the right version of the patch Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
* radeonsi: use proper GRBM_GFX_INDEX offset for CI+Alex Deucher2015-11-161-4/+12
| | | | | | | The offset is different on CI and newer. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
* nv50: add missing header into the sources listEmil Velikov2015-11-161-0/+1
| | | | | | Otherwise it won't end up in the tarball. Signed-off-by: Emil Velikov <[email protected]>
* nv50,nvc0: disable render condition around clear_* functionsIlia Mirkin2015-11-144-0/+32
| | | | | | | Only the regular "clear" call is supposed to respect the render condition. The rest should ignore it. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: add support for performance metrics on G84+Samuel Pitoiset2015-11-144-3/+259
| | | | | | | | Currently only one metric is exposed but more will be added later. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: add compute-related MP perf counters on G84+Samuel Pitoiset2015-11-149-2/+548
| | | | | | | | | | | | | | | | | | These compute-related MP performance counters have been reverse engineered using CUPTI which is part of NVIDIA CUDA. As for nvc0, we use a compute kernel to read out those performance counters, and the command stream to configure them. Note that Tesla only exposes 4 MP performance counters, while Fermi has 8. Only G84+ is supported because G80 is an old and weird card. Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64, xonotic-glx, heaven and valley. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: implement a basic compute supportSamuel Pitoiset2015-11-1410-9/+1006
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds the ability to launch simple compute kernels like the one I will use to read out MP performance counters in the upcoming patch. This compute support is based on the work of Francisco Jerez (aka curro) that he did as part of his EVoC project in 2011/2012 to get OpenCL working on Tesla. His original work can be found here: https://github.com/curro/mesa/commits/nv50-compute I did some improvements on the original code, like fixing using both 3D and COMPUTE simultaneously, improving global buffers binding, and making the code closer to what nvc0 already does. This compute support has been tested by Pierre Moreau and myself with some compute kernels. This is a step towards OpenCL. Speaking about this, it seems like compute programs overlap fragment programs when they are used both. To fix this, we need to re-validate fragment programs when binding compute programs and vice versa. Note that, textures, samplers and surfaces still need to be implemented. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: free interpolation parameters in nv50_program_destroy()Samuel Pitoiset2015-11-141-1/+1
| | | | | | | | As for nvc0, we need to free memory allocated by interpolation parameters. This fixes a memory leak spotted by valgrind. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: reduce the number of GPR used when reading MP perf countersSamuel Pitoiset2015-11-141-1/+2
| | | | | | | No need to allocate more GPR than used in the compute kernel which reads MP performance counters on Fermi. Signed-off-by: Samuel Pitoiset <[email protected]>
* nouveau: don't expose HEVC decoding supportIlia Mirkin2015-11-141-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* radeonsi: remove dead code after ES-GS linkage changeMarek Olšák2015-11-133-57/+0
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: link ES-GS just like LS-HSMarek Olšák2015-11-133-39/+19
| | | | | | | | | | | | This reduces the shader key for ES. Use a fixed attrib location based on (semantic name, index). The ESGS item size is determined by the physical index of the highest ES output, so it's almost always larger than before, but I think that shouldn't matter as long as the ESGS ring buffer is large enough. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: calculate optimal GS ring sizes to fix GS hangs on TongaMarek Olšák2015-11-135-47/+113
| | | | | | | | | | | | | | I discovered that increasing the ESGS ring size fixes GS hangs on Tonga, so let's do it properly. There is now a separate init_config_gs_rings state that is not immutable, because GS rings are resized when needed. This also saves some memory. Most apps won't need more than 1MB per ring per shader engine. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename si_update_gs_ringsMarek Olšák2015-11-131-2/+2
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: calculate ESGS_RING_ITEMSIZE in create_shaderMarek Olšák2015-11-132-1/+3
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move maximum gs stream calculation into create_shaderMarek Olšák2015-11-132-16/+7
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up small duplication in si_shader_gsMarek Olšák2015-11-132-6/+8
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: shorten render_cond variable namesMarek Olšák2015-11-135-13/+13
| | | | | | and ..._cond -> ..._invert Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove predicate_drawing flagMarek Olšák2015-11-134-4/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: atomize render condition (SET_PREDICATION)Marek Olšák2015-11-1310-45/+45
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify restoring render condition after flushMarek Olšák2015-11-133-26/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: don't use PREDICATION_OP_CLEARMarek Olšák2015-11-131-36/+24
| | | | | | Not setting the predication bit is sufficient. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify disabling render condition for u_blitterMarek Olšák2015-11-135-23/+22
| | | | | | just disable it by not setting the predication bit Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: don't set predication on non-draw packetsMarek Olšák2015-11-131-8/+8
| | | | | | This has no effect. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: inline the r600_rings structureMarek Olšák2015-11-1324-266/+262
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: prevent recursion in si_context_gfx_flushMarek Olšák2015-11-132-0/+8
| | | | | | The recursion can only occur if you modify need_cs_space to always flush. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove the IB flushing flagMarek Olšák2015-11-134-14/+2
| | | | | | | Not needed anymore. A similar flag will be introduced in the next commit, which will be private in radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_spaceMarek Olšák2015-11-134-15/+14
| | | | | | | | need_cs_space isn't invoked so often and is called before all commands too. This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed dodgy to me. Reviewed-by: Nicolai Hähnle <[email protected]>