path: root/src/gallium
Commit message | Author | Age | Files | Lines
* nv50,nvc0: disable render condition around clear_* functions | Ilia Mirkin | 2015-11-14 | 4 | -0/+32
  Only the regular "clear" call is supposed to respect the render condition. The rest should ignore it.
  Signed-off-by: Ilia Mirkin <[email protected]>
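The rule in this commit (only the plain clear honors the render condition) amounts to a save/disable/restore pattern around every clear_* helper. The following is a minimal sketch of that pattern; `struct ctx`, `set_render_condition()`, `emit_command()` and the counter are hypothetical simplifications, not the nv50/nvc0 driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified context: only what is needed to show the
 * save/disable/restore pattern. These are not the nv50/nvc0 structures. */
struct query { int id; };

struct ctx {
    struct query *cond_query; /* active render-condition query, or NULL */
    bool cond_cond;           /* condition value requested by the app */
    int cond_mode;            /* wait/no-wait mode requested by the app */
    unsigned predicated_ops;  /* commands emitted while a condition was set */
};

static void set_render_condition(struct ctx *c, struct query *q,
                                 bool cond, int mode)
{
    c->cond_query = q;
    c->cond_cond = cond;
    c->cond_mode = mode;
}

/* Stand-in for emitting one hardware command. */
static void emit_command(struct ctx *c)
{
    if (c->cond_query != NULL)
        c->predicated_ops++;
}

/* The regular clear respects the render condition. */
static void clear(struct ctx *c)
{
    emit_command(c);
}

/* clear_* helpers must ignore it: save, disable, emit, restore. */
static void clear_render_target(struct ctx *c)
{
    struct query *saved_query = c->cond_query;
    bool saved_cond = c->cond_cond;
    int saved_mode = c->cond_mode;

    set_render_condition(c, NULL, false, 0);
    emit_command(c);
    set_render_condition(c, saved_query, saved_cond, saved_mode);
}
```

Saving all three pieces of state matters: restoring only the query would silently drop the condition value or wait mode the application asked for.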
* nv50: add support for performance metrics on G84+ | Samuel Pitoiset | 2015-11-14 | 4 | -3/+259
  Currently only one metric is exposed, but more will be added later.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* nv50: add compute-related MP perf counters on G84+ | Samuel Pitoiset | 2015-11-14 | 9 | -2/+548
  These compute-related MP performance counters have been reverse-engineered using CUPTI, which is part of NVIDIA CUDA. As for nvc0, we use a compute kernel to read out those performance counters, and the command stream to configure them.
  Note that Tesla only exposes 4 MP performance counters, while Fermi has 8. Only G84+ is supported because G80 is an old and weird card.
  Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64, xonotic-glx, heaven and valley.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
* nv50: implement a basic compute support | Samuel Pitoiset | 2015-11-14 | 10 | -9/+1006
  This adds the ability to launch simple compute kernels, like the one I will use to read out MP performance counters in the upcoming patch.
  This compute support is based on the work of Francisco Jerez (aka curro), which he did as part of his EVoC project in 2011/2012 to get OpenCL working on Tesla. His original work can be found here: https://github.com/curro/mesa/commits/nv50-compute
  I made some improvements to the original code, such as fixing simultaneous use of 3D and COMPUTE, improving global buffer binding, and making the code closer to what nvc0 already does.
  This compute support has been tested by Pierre Moreau and myself with some compute kernels. This is a step towards OpenCL.
  Speaking of which, it seems that compute programs overlap fragment programs when both are used. To fix this, we need to re-validate fragment programs when binding compute programs and vice versa.
  Note that textures, samplers and surfaces still need to be implemented.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
  Acked-by: Ilia Mirkin <[email protected]>
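The fix proposed at the end of this message, re-validating fragment programs when a compute program is bound and vice versa, is a standard dirty-flag pattern. A minimal sketch, assuming hypothetical flag and function names (`DIRTY_FRAGPROG`, `bind_compute_program()` and so on) rather than the ones nv50 uses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dirty bits; the real nv50 flag names differ. */
#define DIRTY_FRAGPROG (1u << 0)
#define DIRTY_COMPPROG (1u << 1)

struct state {
    uint32_t dirty;
    const void *fragprog;
    const void *compprog;
};

/* Compute and fragment programs can overlap, so binding one invalidates
 * the other as well as itself. */
static void bind_compute_program(struct state *s, const void *prog)
{
    s->compprog = prog;
    s->dirty |= DIRTY_COMPPROG | DIRTY_FRAGPROG;
}

static void bind_fragment_program(struct state *s, const void *prog)
{
    s->fragprog = prog;
    s->dirty |= DIRTY_FRAGPROG | DIRTY_COMPPROG;
}

/* Before a draw, (re-)upload the fragment program if it is dirty. */
static void validate_3d(struct state *s)
{
    if (s->dirty & DIRTY_FRAGPROG) {
        /* upload s->fragprog here */
        s->dirty &= ~DIRTY_FRAGPROG;
    }
}

/* Before a compute launch, do the same for the compute program. */
static void validate_compute(struct state *s)
{
    if (s->dirty & DIRTY_COMPPROG) {
        /* upload s->compprog here */
        s->dirty &= ~DIRTY_COMPPROG;
    }
}
```

Each validation step clears only its own bit, so a compute launch between two draws leaves the fragment program marked for re-upload.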
* nv50: free interpolation parameters in nv50_program_destroy() | Samuel Pitoiset | 2015-11-14 | 1 | -1/+1
  As for nvc0, we need to free the memory allocated for interpolation parameters. This fixes a memory leak spotted by valgrind.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: reduce the number of GPR used when reading MP perf counters | Samuel Pitoiset | 2015-11-14 | 1 | -1/+2
  No need to allocate more GPRs than are used in the compute kernel that reads MP performance counters on Fermi.
  Signed-off-by: Samuel Pitoiset <[email protected]>
* nouveau: don't expose HEVC decoding support | Ilia Mirkin | 2015-11-14 | 1 | -0/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
* radeonsi: remove dead code after ES-GS linkage change | Marek Olšák | 2015-11-13 | 3 | -57/+0
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: link ES-GS just like LS-HS | Marek Olšák | 2015-11-13 | 3 | -39/+19
  This reduces the shader key for ES.
  Use a fixed attrib location based on (semantic name, index). The ESGS item size is determined by the physical index of the highest ES output, so it's almost always larger than before, but I think that shouldn't matter as long as the ESGS ring buffer is large enough.
  Reviewed-by: Michel Dänzer <[email protected]>
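The idea of a fixed location per (semantic name, index), and why the ESGS item size usually grows, can be illustrated as follows. This is a sketch under stated assumptions: the `enum semantic` values, the slot layout in `output_slot()` and the 4-dwords-per-slot size are hypothetical, not radeonsi's actual mapping.

```c
#include <assert.h>

/* Hypothetical semantic names; the driver uses TGSI_SEMANTIC_* values. */
enum semantic { SEM_POSITION, SEM_PSIZE, SEM_GENERIC };

/* Fixed slot for (semantic name, index), independent of which other
 * outputs the ES writes. This particular slot layout is an assumption. */
static unsigned output_slot(enum semantic name, unsigned index)
{
    switch (name) {
    case SEM_POSITION:
        return 0;
    case SEM_PSIZE:
        return 1;
    case SEM_GENERIC:
        assert(index < 32);
        return 2 + index;
    }
    assert(!"unknown semantic");
    return 0;
}

/* ESGS item size in dwords: every slot up to the highest written one is
 * reserved, assuming one vec4 (4 dwords) per slot. */
static unsigned esgs_itemsize_dw(const enum semantic *names,
                                 const unsigned *indices, unsigned count)
{
    unsigned num_slots = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned slot = output_slot(names[i], indices[i]);
        if (slot + 1 > num_slots)
            num_slots = slot + 1;
    }
    return num_slots * 4;
}
```

In this sketch an ES that writes only POSITION and GENERIC[3] still reserves slots 1 through 4, which is the "almost always larger" cost the message accepts in exchange for a smaller shader key.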
* radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga | Marek Olšák | 2015-11-13 | 5 | -47/+113
  I discovered that increasing the ESGS ring size fixes GS hangs on Tonga, so let's do it properly.
  There is now a separate init_config_gs_rings state that is not immutable, because GS rings are resized when needed.
  This also saves some memory. Most apps won't need more than 1MB per ring per shader engine.
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
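The two ideas in this message, sizing the ring from the ES item size and resizing it only when needed, can be sketched like this. The formula in `esgs_ring_size()`, the 256-byte alignment and all names are assumptions for illustration; the message does not spell out radeonsi's exact computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring sizing: room for every in-flight ES wave to write one
 * item per lane, per shader engine. The parameters are inputs here, not
 * Tonga's real limits, and the 256-byte alignment is assumed. */
static uint64_t esgs_ring_size(unsigned waves_per_se, unsigned wave_size,
                               unsigned itemsize_dw, unsigned num_se)
{
    uint64_t per_se = (uint64_t)waves_per_se * wave_size * itemsize_dw * 4;
    per_se = (per_se + 255) & ~(uint64_t)255;
    return per_se * num_se;
}

struct gs_ring {
    uint64_t size;
    unsigned reallocations;
};

/* Grow-only resize: the ring is replaced only when a shader needs more
 * space, which is why the state describing it cannot be immutable. */
static void gs_ring_ensure(struct gs_ring *ring, uint64_t required)
{
    if (required > ring->size) {
        /* free the old buffer, allocate a new one, re-emit ring state */
        ring->size = required;
        ring->reallocations++;
    }
}
```

Growing only on demand keeps memory proportional to the largest ES item size actually seen, rather than a worst-case constant.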
* radeonsi: rename si_update_gs_rings | Marek Olšák | 2015-11-13 | 1 | -2/+2
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader | Marek Olšák | 2015-11-13 | 2 | -1/+3
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move maximum gs stream calculation into create_shader | Marek Olšák | 2015-11-13 | 2 | -16/+7
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up small duplication in si_shader_gs | Marek Olšák | 2015-11-13 | 2 | -6/+8
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: shorten render_cond variable names | Marek Olšák | 2015-11-13 | 5 | -13/+13
  and ..._cond -> ..._invert
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove predicate_drawing flag | Marek Olšák | 2015-11-13 | 4 | -4/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: atomize render condition (SET_PREDICATION) | Marek Olšák | 2015-11-13 | 10 | -45/+45
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify restoring render condition after flush | Marek Olšák | 2015-11-13 | 3 | -26/+5
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: don't use PREDICATION_OP_CLEAR | Marek Olšák | 2015-11-13 | 1 | -36/+24
  Not setting the predication bit is sufficient.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify disabling render condition for u_blitter | Marek Olšák | 2015-11-13 | 5 | -23/+22
  Just disable it by not setting the predication bit.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: don't set predication on non-draw packets | Marek Olšák | 2015-11-13 | 1 | -8/+8
  This has no effect.
  Reviewed-by: Nicolai Hähnle <[email protected]>
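The predication commits above (dropping PREDICATION_OP_CLEAR, disabling the render condition for u_blitter, and not predicating non-draw packets) all rely on one mechanism: a per-packet predicate bit in the type-3 (PKT3) packet header. A minimal sketch, assuming the header layout used by Mesa's PKT3() macro (type in bits 31:30, count in 29:16, opcode in 15:8, predicate in bit 0); `pkt3_header()` and `emit_header()` are hypothetical helpers, and 0x2D in the test is an arbitrary opcode value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type-3 packet header, assuming the PKT3() layout: type in bits 31:30,
 * count in 29:16, opcode in 15:8, predicate in bit 0. */
static uint32_t pkt3_header(uint32_t opcode, uint32_t count, bool predicate)
{
    return (3u << 30) | ((count & 0x3FFFu) << 16) |
           ((opcode & 0xFFu) << 8) | (predicate ? 1u : 0u);
}

/* Only draw packets honor the render condition, and only while it is
 * enabled. Blitter draws pass render_cond_enabled = false; non-draw
 * packets pass is_draw = false. Either way the bit simply stays clear. */
static uint32_t emit_header(uint32_t opcode, uint32_t count,
                            bool is_draw, bool render_cond_enabled)
{
    return pkt3_header(opcode, count, is_draw && render_cond_enabled);
}
```

Because skipping predication is just a cleared bit on individual packets, no extra packet is needed to turn predication off around internal operations.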
* gallium/radeon: inline the r600_rings structure | Marek Olšák | 2015-11-13 | 24 | -266/+262
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: prevent recursion in si_context_gfx_flush | Marek Olšák | 2015-11-13 | 2 | -0/+8
  The recursion can only occur if you modify need_cs_space to always flush.
  Reviewed-by: Nicolai Hähnle <[email protected]>
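A re-entrancy guard of the kind this commit describes can be sketched as follows. The context fields and the 16-dword end-of-IB reservation are hypothetical; the point is that a flush which itself asks for command-stream space must not start a second flush.

```c
#include <assert.h>
#include <stdbool.h>

struct gfx_ctx {
    bool flushing;    /* set while a flush is in progress */
    unsigned flushes; /* number of real flushes performed */
    unsigned cs_used; /* dwords used in the current command stream */
    unsigned cs_max;  /* capacity of the command stream in dwords */
};

static void gfx_flush(struct gfx_ctx *c);

/* Hypothetical need_cs_space(): flush when the stream would overflow. */
static void need_cs_space(struct gfx_ctx *c, unsigned dw)
{
    if (c->cs_used + dw > c->cs_max)
        gfx_flush(c);
}

static void gfx_flush(struct gfx_ctx *c)
{
    if (c->flushing) /* re-entered via need_cs_space(): nothing to do */
        return;
    c->flushing = true;

    /* End-of-IB commands may themselves reserve space, which calls back
     * into need_cs_space() while the stream is still full. */
    need_cs_space(c, 16);

    c->cs_used = 0;
    c->flushes++;
    c->flushing = false;
}
```

Without the `flushing` check, the nested `need_cs_space()` call would see the same full stream and recurse until the stack overflows.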
* gallium/radeon: remove the IB flushing flag | Marek Olšák | 2015-11-13 | 4 | -14/+2
  Not needed anymore. A similar flag will be introduced in the next commit, which will be private in radeonsi.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_space | Marek Olšák | 2015-11-13 | 4 | -15/+14
  need_cs_space isn't invoked so often and is called before all commands too. This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed dodgy to me.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename cache flushing flags once more | Marek Olšák | 2015-11-13 | 7 | -35/+30
  KCACHE, TC L1 and TC L2 are renamed to:
  - SMEM L1
  - VMEM L1
  - GLOBAL L2
  You can easily tell what they are used for now. Shaders must deal with coherency issues between both L1s manually, e.g. by setting GLC=1 or by using s_dcache_*.
  BOTH_ICACHE_KCACHE was an unused definition.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as well | Marek Olšák | 2015-11-13 | 1 | -2/+2
  I missed this in commit c3e527f93d4281ad6e2ca165eaf6ff588e4faefa ("radeonsi: only enable write confirmation on the last CP DMA packet").
  Reviewed-by: Michel Dänzer <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
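The earlier commit referenced here enables write confirmation only on the last CP DMA packet of a split copy. A sketch of that splitting logic, assuming a hypothetical per-packet size limit and flag bit; it does not model how the DISABLE_WR_CONFIRM bit is actually encoded on SI versus CI-VI.

```c
#include <assert.h>
#include <stdint.h>

#define CP_DMA_MAX_BYTES (1u << 21)          /* assumed per-packet limit */
#define CP_DMA_DISABLE_WR_CONFIRM (1u << 0)  /* hypothetical flag bit */

struct cp_dma_packet {
    uint64_t dst;
    uint32_t size;
    uint32_t flags;
};

/* Split a large copy into CP DMA packets. Every packet except the last
 * disables write confirmation, so only the final packet waits for its
 * writes to land. Returns the number of packets written to out[]. */
static unsigned cp_dma_split(uint64_t dst, uint64_t size,
                             struct cp_dma_packet *out, unsigned max_packets)
{
    unsigned n = 0;
    while (size > 0) {
        uint32_t chunk = size > CP_DMA_MAX_BYTES ? CP_DMA_MAX_BYTES
                                                 : (uint32_t)size;
        assert(n < max_packets);
        size -= chunk;
        out[n].dst = dst;
        out[n].size = chunk;
        out[n].flags = size > 0 ? CP_DMA_DISABLE_WR_CONFIRM : 0;
        dst += chunk;
        n++;
    }
    return n;
}
```

Forgetting to set the disable flag on one hardware generation, as this commit fixes for CI-VI, would make every intermediate packet wait for confirmation again.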
* radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney | Marek Olšák | 2015-11-13 | 1 | -0/+3
  Otherwise the SX or CB blocks can go bananas.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Cc: [email protected]
* radeonsi: add glClearBufferSubData acceleration | Marek Olšák | 2015-11-13 | 1 | -0/+60
  8-bit and 16-bit clears which are not aligned to dwords are done in software.
  Reviewed-by: Nicolai Hähnle <[email protected]>
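The dword-alignment rule for 8-bit and 16-bit clears can be sketched as two small helpers: one replicates a narrow clear value into a full dword, the other decides whether the range is aligned enough for the GPU path. The names are hypothetical, and the exact checks radeonsi performs may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Replicate an 8- or 16-bit clear value across a dword so the GPU can
 * fill whole dwords. value_size is in bytes. */
static uint32_t replicate_to_dword(uint32_t value, unsigned value_size)
{
    switch (value_size) {
    case 1:
        return (value & 0xFFu) * 0x01010101u;
    case 2:
        return (value & 0xFFFFu) * 0x00010001u;
    case 4:
        return value;
    default:
        assert(!"unsupported clear value size");
        return 0;
    }
}

/* Use the GPU only if both ends of the range fall on dword boundaries;
 * otherwise the replicated pattern would start mid-element, so fall back
 * to a CPU fill. Offset and size are in bytes. */
static bool clear_can_use_gpu(uint64_t offset, uint64_t size)
{
    return (offset % 4) == 0 && (size % 4) == 0;
}
```

Checking alignment up front keeps the GPU path simple: it only ever writes whole dwords of an already-replicated pattern.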
* radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag | Marek Olšák | 2015-11-13 | 1 | -19/+25
  Buffer clears via transform feedback won't set this.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/u_blitter: add support for multi-dword clear values in clear_buffer | Marek Olšák | 2015-11-13 | 1 | -11/+14
  Reviewed-by: Nicolai Hähnle <[email protected]>
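What a multi-dword clear value means can be shown with a CPU fill that repeats a 1-to-4-dword pattern over a byte range. This is an illustrative sketch with a hypothetical `clear_buffer_cpu()`, not the u_blitter implementation (which, per the messages in this log, works through transform feedback); note that it applies `offset`, the detail the r600g fallback fix in this log is about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill [offset, offset + size) of buf with a clear value of num_dw
 * dwords (1 to 4), repeated. Offset and size are in bytes, and size must
 * be a multiple of the clear value size. */
static void clear_buffer_cpu(uint8_t *buf, uint64_t offset, uint64_t size,
                             const uint32_t *value, unsigned num_dw)
{
    unsigned value_bytes = num_dw * 4;

    assert(num_dw >= 1 && num_dw <= 4);
    assert(size % value_bytes == 0);

    /* Start at buf + offset, not buf: ignoring the offset is exactly the
     * kind of bug that goes unnoticed when offset is usually 0. */
    for (uint64_t i = 0; i < size; i += value_bytes)
        memcpy(buf + offset + i, value, value_bytes);
}
```

Using `memcpy` keeps the pattern in the value's native byte order and avoids alignment assumptions about `buf`.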
* radeonsi: fix a future crash in emit_cb_target_mask | Marek Olšák | 2015-11-13 | 1 | -1/+1
  This can't crash currently, but it would crash if clear_buffer from u_blitter were used with a clean context.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix unaligned clear_buffer fallback | Marek Olšák | 2015-11-13 | 1 | -6/+8
  This is unreachable currently, but it will be used by unaligned 8-bit and 16-bit fills.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: fix clear_buffer fallback with offset != 0 | Marek Olšák | 2015-11-13 | 1 | -0/+1
  Discovered by luck. This code path hasn't been exercised since transform feedback was implemented.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: fix PIPE_QUERY_GPU_FINISHED | Marek Olšák | 2015-11-13 | 1 | -1/+1
  Broken by the addition of r600_multi_fence in 3b37155a68acc351cba86a1fa142bd0de2192d4c.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89014
  Reviewed-by: Michel Dänzer <[email protected]>
* nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATION | Ilia Mirkin | 2015-11-12 | 6 | -0/+6
  Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add support for gl_HelperInvocation semantic | Ilia Mirkin | 2015-11-12 | 3 | -1/+11
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Glenn Kennard <[email protected]>
* st/wgl: add a comment about recursive locking in stw_make_current() | Brian Paul | 2015-11-12 | 1 | -0/+4
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: add a lock assertion in stw_framebuffer_from_hwnd_locked() | Brian Paul | 2015-11-12 | 1 | -0/+1
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: add some mutex checking code | José Fonseca | 2015-11-12 | 1 | -0/+26
  This would have caught the locking bug that was fixed in the earlier "st/wgl: fix locking issue in stw_st_framebuffer_present_locked()" patch.
  v2: minor coding style changes by Brian.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
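Mutex checking of this kind usually means recording the owning thread so code can assert that the current thread holds a lock. The sketch below uses POSIX threads so it is portable; st/wgl itself uses Win32 CRITICAL_SECTION, and `struct checked_mutex` and its functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A mutex that remembers its owner so code can assert "I hold this". */
struct checked_mutex {
    pthread_mutex_t mutex;
    pthread_t owner; /* valid only while locked is true */
    bool locked;
};

static void checked_mutex_init(struct checked_mutex *m)
{
    pthread_mutex_init(&m->mutex, NULL);
    m->locked = false;
}

static void checked_mutex_lock(struct checked_mutex *m)
{
    pthread_mutex_lock(&m->mutex);
    m->owner = pthread_self();
    m->locked = true;
}

static void checked_mutex_unlock(struct checked_mutex *m)
{
    /* Catch unlocking a mutex this thread does not hold. */
    assert(m->locked && pthread_equal(m->owner, pthread_self()));
    m->locked = false;
    pthread_mutex_unlock(&m->mutex);
}

/* Debug-only check. Reads by a non-owning thread race with the owner's
 * writes; production code would need atomics for these fields. */
static bool checked_mutex_held_by_me(const struct checked_mutex *m)
{
    return m->locked && pthread_equal(m->owner, pthread_self());
}
```

A function whose name ends in `_locked` could then begin with `assert(checked_mutex_held_by_me(&m))`, turning a silent locking mistake into an immediate assertion failure in debug builds.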
* st/wgl: rename stw_framebuffer_release() to stw_framebuffer_unlock() | Brian Paul | 2015-11-12 | 5 | -19/+19
  To match the new stw_framebuffer_lock() function.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: reimplement stw_framebuffer::mutex with CRITICAL_SECTION | Brian Paul | 2015-11-12 | 4 | -29/+32
  v2: update comments on the stw_framebuffer::mutex field regarding locking order.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: include u_debug.h | Brian Paul | 2015-11-12 | 3 | -0/+6
  To get the declaration of debug_printf() directly instead of indirectly through os_thread.h.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: reimplement stw_device::fb_mutex with CRITICAL_SECTION | Brian Paul | 2015-11-12 | 3 | -15/+29
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* st/wgl: re-implement stw_device::ctx_mutex with CRITICAL_SECTION | Brian Paul | 2015-11-12 | 3 | -19/+34
  This is Windows-only code, so we can use the native Win32 functions for critical sections. This will also allow us to (cleanly) add some mutex check/debug code in subsequent patches.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* gallium/hud: add cpu graph support for Windows | Brian Paul | 2015-11-12 | 1 | -0/+54
  We support "cpu" but not "cpu#" because there's no good way of querying per-CPU usage. Also, the CPU usage is for the process, not the whole system.
  Original code cobbled together by Brian and then fixed/polished by Jose.
  Signed-off-by: Brian Paul <[email protected]>
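The process-level CPU percentage described here is a ratio of CPU time consumed to wall time elapsed between two samples. A minimal sketch of the arithmetic; the 100-ns unit matches what Win32 GetProcessTimes() reports, but `process_cpu_percent()` is hypothetical and dividing by the number of CPUs is an assumption, not something the message states.

```c
#include <assert.h>
#include <stdint.h>

/* CPU usage of one process between two samples. busy_* is kernel + user
 * time consumed by the process; wall_* is elapsed real time. Both are in
 * the same unit (e.g. 100-ns ticks). Normalizing by num_cpus is an
 * assumption that keeps the result in the 0..100 range. */
static double process_cpu_percent(uint64_t busy_prev, uint64_t busy_now,
                                  uint64_t wall_prev, uint64_t wall_now,
                                  unsigned num_cpus)
{
    uint64_t busy = busy_now - busy_prev;
    uint64_t wall = wall_now - wall_prev;

    if (wall == 0 || num_cpus == 0)
        return 0.0; /* no elapsed time: report idle instead of dividing */
    return 100.0 * (double)busy / ((double)wall * num_cpus);
}
```

Without the per-CPU normalization, a process keeping two cores busy would read as 200%, which would not fit a fixed 0 to 100 HUD graph.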
* nv50,nvc0: add ARB_clear_texture support | Ilia Mirkin | 2015-11-11 | 5 | -7/+101
  Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype | Ilia Mirkin | 2015-11-11 | 18 | -0/+31
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* r600: initialised PGM_RESOURCES_2 for ES/GS | Dave Airlie | 2015-11-12 | 2 | -0/+6
  This fixes the rendering corruption we are seeing with certain geometry shaders.
  Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91780
  Reviewed-by: Alex Deucher <[email protected]>
  Tested / Reviewed-by: Glenn Kennard <[email protected]>
  Cc: "10.6" "11.0" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* st/wgl: clarify code in stw_framebuffer_from_hwnd_locked() | Brian Paul | 2015-11-11 | 1 | -2/+2
  Just a minor code change to make it obvious that NULL is returned when we don't find the given HWND.
  Reviewed-by: Sinclair Yeh <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>