summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* st/va: remove unused variable pbuffJuan A. Suarez Romero2016-12-051-1/+0
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Elie Tournier <[email protected]>
* st/va: automake: cleanup C{PP,}FLAGSEmil Velikov2016-12-051-12/+0
| | | | | | | | Remove some transitional left overs from the gallium pipe-loader rework and kill off unneeded AM_CPPFLAGS. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* configure.ac: Move llvm_set_environment_variables higher.Tobias Droste2016-12-0511-11/+11
| | | | | | | | | | | | | | | | This moves the function to get the LLVM environment variables higher in the file. It still needs to be below the "--enable-opencl" because it uses $enable_opencl. It can be called without condition now as it only throws errors if openCL is enabled. v5: HAVE_MESA_LLVM is only used for gallium. Rename it to HAVE_GALLIUM_LLVM. In order to only link LLVM when it is needed, HAVE_GALLIUM_LLVM is only set if "$enable-gallium-llvm" is yes. Signed-off-by: Tobias Droste <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/va: fix gop size for rate controlBoyuan Zhang2016-12-052-6/+14
| | | | | | | | | | | | | The gop_size in rate control is the budget window for internal rate control calculation, and shouldn't always equal to idr period. Define a coefficient to let budget window contains a number of idr period for proper rate control calculation. Adjust the number of i/p frame remaining accordingly. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005 Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Christian König <[email protected]>
* st/va: force to submit two consecutive single jobsBoyuan Zhang2016-12-053-7/+27
| | | | | | | | | | | | | | | The gop_size in rate control is the budget window for internal rate control calculation, and shouldn't always equal to idr period. Define a coefficient to let budget window contains a number of idr period for proper rate control calculation. Adjust the number of i/p frame remaining accordingly. v2: fixed regression issues introduced by previous version Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005 Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Christian König <[email protected]>
* st/vdpau: fix compiler warning in vlVdpVideoMixerRenderNayan Deshmukh2016-12-051-1/+1
| | | | | Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* swr: Fix active_queries countBruce Cherniak2016-12-021-6/+7
| | | | | | | The active_query count was incorrect for query types that don't require a begin_query. Removed the unnecessary assert. Reviewed-by: Tim Rowley <[email protected]>
* swr: Fix type to match parameters of std::max()George Kyriazis2016-12-021-7/+7
| | | | | | Include propagation of comparisons further down. Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] include cstdarg in builder_misc.cppTim Rowley2016-12-021-1/+2
| | | | | | | | Fixes build problem with llvm-svn. v2: use cstdarg instead of stdarg.h Reviewed-by: Bruce Cherniak <[email protected]>
* freedreno: no-op render when we need a fenceRob Clark2016-12-013-7/+40
| | | | | | | | | If app tries to create a fence but there is no rendering to submit, we need a dummy/no-op submit. Use a string-marker for the purpose.. mostly since it avoids needing to realize that the packet format changes in later gen's (so one less place to fixup for a5xx). Signed-off-by: Rob Clark <[email protected]>
* freedreno: native fence fd supportRob Clark2016-12-017-8/+69
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: some fence cleanupRob Clark2016-12-019-27/+23
| | | | | | | | Prep-work for next patch, mostly move to tracking last_fence as a pipe_fence_handle (created now only in fd_gmem_render_tiles()), and a bit of superficial renaming. Signed-off-by: Rob Clark <[email protected]>
* gallium: support for native fence fd'sRob Clark2016-12-0118-2/+91
| | | | | | | This enables gallium support for EGL_ANDROID_native_fence_sync, for drivers which support PIPE_CAP_NATIVE_FENCE_FD. Signed-off-by: Rob Clark <[email protected]>
* gallium: wire up server_wait_syncRob Clark2016-12-012-1/+11
| | | | | | | | This will be needed for explicit synchronization with devices outside the gpu, ie. EGL_ANDROID_native_fence_sync. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi: store writes_primid when scanning tgsiTim Rowley2016-12-012-0/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vc4: Avoid false scheduling dependencies for LOAD_IMMs.Eric Anholt2016-11-302-0/+9
| | | | | | | | | | Noticed in shaders with branching, where we ended up scheduling delay slots near the start of a block for the uniforms reset setup. total instructions in shared programs: 93970 -> 93951 (-0.02%) instructions in affected programs: 3117 -> 3098 (-0.61%) 3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
* vc4: Try to schedule QIR instructions between writing to and reading math.Eric Anholt2016-11-301-0/+22
| | | | | | | | | This helps us get the delay slots between SFU writes and reads filled. total instructions in shared programs: 94494 -> 93970 (-0.55%) instructions in affected programs: 59206 -> 58682 (-0.89%) 3DMMES performance +1.89967% +/- 0.157611% (n=10,9)
* vc4: Improve interleaving of texture coordinates vs results.Eric Anholt2016-11-301-3/+3
| | | | | | | | | | | | | | | | | | | | The latency_between was trying to handle the delay between the coordinate write ("before") and the corresponding sample read ("after"), but we were handing in the two instructions swapped. This meant that we tried to fit things between a tex_s and its *preceding* tex_result. This made us only interleave normal texture coordinates by accident, and pessimized UBO reads by pushing the tex_result collection earlier until there was nothing but it (and then its preceding coordinate setup) left. In addition to latency reduction, things end up packing better (probably due to reduced live ranges of the texture results): total instructions in shared programs: 98121 -> 94775 (-3.41%) instructions in affected programs: 91196 -> 87850 (-3.67%) 3DMMES performance +1.15569% +/- 0.124714% (n=8,10)
* vc4: Fix stray "." on no-op MUL packs.Eric Anholt2016-11-301-6/+6
| | | | | This happened when the PM bit was set for R4 unpacks, where the MUL pack was NOP.
* vc4: Allow merging instructions with SF set where the other writes NOP.Eric Anholt2016-11-301-0/+1
| | | | | | | | | | | | | I'm not sure how I managed to write the SF merge code (7d8b79f398f18ed7bb48a74b1b82950e2f08abad) without allowing merges with NOPs. *Everything* we try to merge with will have a NOP on one or the other side of the instruction, and that's why that commit showed no benefit. total instructions in shared programs: 99347 -> 95128 (-4.25%) instructions in affected programs: 91906 -> 87687 (-4.59%) 3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
* vc4: In a loop break/continue, jump if everyone has taken the path.Eric Anholt2016-11-301-10/+17
| | | | | | | | | | | | | | | | | This should be a win for most loops, which tend to have uniform control flow. More importantly, it exposes important information to live variables: that the break/continue here means that our jump target may have access to values that were live on our input. Previously, we were just setting the exec mask and letting control flow fall through, so an intervening def between the break and the end of the loop would appear to live variables as if it screened off the variable, when it didn't actually. Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a perturbing of register allocation caused a live variable to get stomped. Cc: 13.0 <[email protected]>
* swr: add streamout buffer offset into pBuffer pointerIlia Mirkin2016-11-301-2/+3
| | | | | | | | The buffer_size does not take the offset into account. Just add the offset into the pointer which lines up the structures much better. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: fix assertion for max number of so targetsIlia Mirkin2016-11-301-1/+1
| | | | | | | The number has to be less than or equal to the max, not just less than. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: properly report max number of SO componentsIlia Mirkin2016-11-301-1/+1
| | | | | | | | The components count the number of individual values, not the number of slots. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: turn off queries around blitsIlia Mirkin2016-11-301-1/+9
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: don't advertise stream pause/resumeIlia Mirkin2016-11-301-1/+1
| | | | | | | | | There is no support for resuming streamout. Furthermore, this also controls glDrawTransformFeedback functionality which requires the same ability to query how many primitives were sent out of TF. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: fix range computation for instanced client-side arraysIlia Mirkin2016-11-302-24/+52
| | | | | | | | | | | We need to take the instance divisor and number of instances into account for instanced client-side arrays, rather than the vertex parameters. Loosely based on the comparable nvc0 logic. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] assert when trying to convert an unknown formatIlia Mirkin2016-11-301-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: remove warning about multi-layer surfacesIlia Mirkin2016-11-301-4/+0
| | | | | | | | | | We now support clearing these, and actually rendering to multiple layers would require GS support, which will fail in much more spectacular ways for now. Once that is hooked up, there won't be anything else to do here. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] don't attempt to load another RTAI when storingIlia Mirkin2016-11-301-1/+1
| | | | | | | | | Since we don't pass a renderTargetArrayIndex in, and the current hot tile may be for a different index, we may end up loading the RTAI=0 into the hot tile for no reason. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as wellMarek Olšák2016-12-011-2/+4
| | | | | | | | | | | | Internal docs don't mention it, but they also don't mention that the bug has been fixed (like other CI bugs fixed in VI). Vulkan does this too. v2: also update r600_gfx_write_fence_dwords Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: add a tess+GS hang workaround for VI dGPUsMarek Olšák2016-12-011-2/+10
| | | | | | | ported from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't apply the Z export bug workaround to HainanMarek Olšák2016-12-011-2/+3
| | | | | | not needed Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a tessellation bug workaround for SIMarek Olšák2016-12-011-0/+7
| | | | | Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a TC L1 write corruption workaround for SIMarek Olšák2016-12-011-11/+23
| | | | | Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-014-4/+29
| | | | | | | All codepaths are handled except for clover. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: consolidate max-work-group-size computationMarek Olšák2016-12-011-24/+19
| | | | | | | The next commit will need this. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: fix negative branchesRob Clark2016-11-302-1/+6
| | | | | | | | Looks like immed branch offset size increased again.. making what we think is a small negative number look to hw like a huge positive number. And things go badly when shader tries to jump to hyperspace. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix android build with a5xxRob Clark2016-11-301-0/+1
| | | | | | | | | Android doesn't build all the files that normal linux/autotools build does (mainly standalond ir3_compiler).. but possibly we should pull C_SOURCES + aNxx_SOURCES into a single variable picked up by both Android.mk and Makefile.am? (Suggested by Rob H.) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix discardRob Clark2016-11-301-3/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: initial supportRob Clark2016-11-3033-17/+4470
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-11-3010-100/+4125
| | | | | | Pull in a5xx Signed-off-by: Rob Clark <[email protected]>
* freedreno: make gmem tile size alignment configurableRob Clark2016-11-303-8/+17
| | | | | | | a5xx seems to prefer 64 pixel alignment, in at least some cases. Make this configurable per generation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't offset inloc by 8Rob Clark2016-11-304-27/+15
| | | | | | | | | On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used to add this offset into fs->inputs[n].inloc. But a5xx drops this extra offset-by-8. So instead make inloc zero based and add the offset when we emit OUTLOCn values (for the gen's that need the offset). Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: use new shader linkage helperRob Clark2016-11-301-27/+16
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: use new shader linkage helperRob Clark2016-11-301-27/+16
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add new helper for shader linkageRob Clark2016-11-301-0/+47
| | | | | | | Helps simplify things on a5xx, where pos/psize get added to the vs-out map. And anyways, simplifies a3xx and a4xx. Signed-off-by: Rob Clark <[email protected]>
* gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTSNicolai Hähnle2016-11-3017-0/+18
| | | | | | | | | | | Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics). Reviewed-by: Marek Olšák <[email protected]>
* swr: [rasterizer jit] use signed integer representation for logic opIlia Mirkin2016-11-291-5/+12
| | | | | | | | | | Instead of (incorrectly) biasing the snorm value to make it look like a unorm, just use signed integer math. This fixes arb_color_buffer_float-render GL_RGBA8_SNORM Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: add missing rgbx8_srgb variantIlia Mirkin2016-11-291-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>