Commit message (Author, Age, Files, Lines)
* draw: use vectorized calculations for fetch (Roland Scheidegger, 2016-11-08, 2 files, -159/+282)
    Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, although that's only really problematic for the stride/index mul; the rest has been pretty much moved outside the shader loop (the mul could actually be optimized away too), where things are still scalar. Because llvm fails completely at the zero-extend widening mul, roll our own even...

    To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch).

    Still uses aos fetch though in the end - mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance).

    Instanced fetch however stays roughly the same as before, except that the same element is no longer fetched multiple times (I've seen a reduction of ~3 times in main shader loop size, apparently because llvm cannot deduce that it's really all the same with a couple of instanced elements).

    Also, for elts gathering, use vectorized code as well - provide a fake elt buffer if there's no valid one bound.

    The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there are just single vertices to handle I would expect it to be faster - there are more opportunities for future improvements by using soa fetch).

    No piglit change.

    Reviewed-by: Jose Fonseca <[email protected]>
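The overflow handling and the "index 0 is always valid" trick described above can be sketched in scalar C. This is only an illustration under assumptions: the function name and parameters are hypothetical, and the real draw code emits vectorized LLVM IR rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: compute the byte offset of a vertex
 * element, falling back to offset 0 when the access would be out of bounds.
 * The fallback is only safe if offset 0 is always readable, which is the
 * role the "fake buffer" plays in the commit message. */
static uint32_t
fetch_offset_or_zero(uint32_t index, uint32_t stride,
                     uint32_t buffer_size, uint32_t elem_size)
{
   assert(elem_size > 0);

   /* Do the index * stride multiply in 64 bits so it cannot wrap. */
   uint64_t offset = (uint64_t)index * stride;

   /* The element must lie entirely inside the buffer. */
   int valid = elem_size <= buffer_size &&
               offset <= (uint64_t)buffer_size - elem_size;

   /* Select with a mask instead of an if/else around the fetch:
    * all ones when valid, zero otherwise. */
   uint32_t mask = (uint32_t)0 - (uint32_t)valid;
   return (uint32_t)offset & mask;
}
```

With a 64-byte buffer of 16-byte elements, index 3 yields offset 48, while index 4 and an index whose product exceeds 32 bits both yield offset 0.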
* gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function (Roland Scheidegger, 2016-11-08, 3 files, -38/+172)
    This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (there's no widening mul in llvm, and it does not recognize the widening mul pattern, so it generates code for a real 64x64->64bit mul, which the cpu can't do natively, in contrast to a 32x32->64bit mul which it could do).

    Reviewed-by: Jose Fonseca <[email protected]>
* i965: Add space before paren (Anuj Phogat, 2016-11-07, 1 file, -1/+1)
    Signed-off-by: Anuj Phogat <[email protected]>
* i965: Remove unnecessary white space (Anuj Phogat, 2016-11-07, 1 file, -1/+1)
    Signed-off-by: Anuj Phogat <[email protected]>
* i965: Fix alpha-to-coverage and alpha test enabled checks (Anuj Phogat, 2016-11-07, 4 files, -12/+16)
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
* mesa: Add helper function _mesa_is_alpha_to_coverage_enabled() (Anuj Phogat, 2016-11-07, 2 files, -0/+16)
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
* mesa: Add helper function _mesa_is_alpha_test_enabled() (Anuj Phogat, 2016-11-07, 2 files, -0/+14)
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
* mesa: Use separate line for function return type (Anuj Phogat, 2016-11-07, 1 file, -1/+2)
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
* nvc0: simplify draw parameters upload for vertex shaders (Samuel Pitoiset, 2016-11-07, 1 file, -8/+6)
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/hud: protect against an initialization race (Steven Toth, 2016-11-07, 4 files, -8/+41)
    In the event that multiple threads attempt to install a graph concurrently, protect the shared list.

    Signed-off-by: Steven Toth <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: close a previously opened handle (Steven Toth, 2016-11-07, 3 files, -1/+6)
    We're missing the closedir() to the matching opendir().

    Signed-off-by: Steven Toth <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
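The opendir()/closedir() pairing can be illustrated with a small POSIX sketch. The helper name is hypothetical and the code is not taken from the HUD sources; it only shows that every successful opendir() is matched by a closedir() before returning.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: count the entries in a directory (including "." and
 * ".." where the system reports them). Returns -1 if it cannot be opened. */
static int
count_dir_entries(const char *path)
{
   assert(path != NULL);

   DIR *dir = opendir(path);
   if (!dir)
      return -1;   /* nothing was opened, so nothing to close */

   int count = 0;
   while (readdir(dir) != NULL)
      count++;

   /* Release the handle; omitting this leaks a file descriptor per call. */
   closedir(dir);
   return count;
}
```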
* gallium/hud: fix a problem where objects are freed while in use. (Steven Toth, 2016-11-07, 4 files, -55/+0)
    Instead of trying to maintain a reference counted list of valid HUD objects and freeing them accordingly, which creates race conditions between unanticipated multiple threads, simply accept they're allocated once and never released until the process terminates.

    They're a shared resource between multiple threads, so accept they're always available for use.

    Signed-off-by: Steven Toth <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: drop current draw/read buffer when ctx is released (Rob Clark, 2016-11-07, 1 file, -0/+4)
    This fixes a problem seen with gallium drivers vs android wallpaper. Basically, what happens is:

        EGLSurface tmpSurface = mEgl.eglCreatePbufferSurface(mEglDisplay, mEglConfig, attribs);
        mEgl.eglMakeCurrent(mEglDisplay, tmpSurface, tmpSurface, mEglContext);

        int[] maxSize = new int[1];
        Rect frame = surfaceHolder.getSurfaceFrame();
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, maxSize, 0);

        mEgl.eglMakeCurrent(mEglDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
        mEgl.eglDestroySurface(mEglDisplay, tmpSurface);

        ... check maxSize vs frame size and bail if needed ...

        mEglSurface = mEgl.eglCreateWindowSurface(mEglDisplay, mEglConfig, surfaceHolder, null);
        ... error checking ...
        mEgl.eglMakeCurrent(mEglDisplay, mEglSurface, mEglSurface, mEglContext);

    When the window-surface is created, it ends up with the same ptr address as the recently freed tmpSurface pbuffer surface. After many levels of indirection, this results in st_framebuffer_validate() ending up with the same/old framebuffer object, and in the end never calling DRIimageLoaderExtension::getBuffers(). Then in droid_swap_buffers(), the dri2_surf is still the old pbuffer surface (with dri2_surf->buffer being NULL, obviously, so when the wallpaper app calls eglSwapBuffers() nothing gets enqueued to the compositor). The result is a black/blank background layer.

    Note that at the EGL layer, when the context is unbound, EGL drops its references to the draw and read buffer as well.

    Signed-off-by: Rob Clark <[email protected]>
    Tested-by: Robert Foss <[email protected]>
    Acked-by: Tapani Pälli <[email protected]>
* clover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2). (Serge Martin, 2016-11-06, 10 files, -11/+35)
    v3 [Francisco Jerez]: Loosely based on Serge's v1 of this patch in order to avoid CL-specific enums in the clover module binary format. In addition to other changes made in v2: Represent the CL program binary type as the section type instead of adding a CL API-specific enum, check that the binary types of the input objects are valid during clLinkProgram(), pass section type as argument to build_module_library() instead of using separate function.

    Reviewed-by: Francisco Jerez <[email protected]>
* clover: add missing clGetDeviceInfo CL1.2 queries (Serge Martin, 2016-11-06, 3 files, -0/+35)
    Reviewed-by: Francisco Jerez <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Vedran Miletić <[email protected]>
* nvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX (Samuel Pitoiset, 2016-11-05, 1 file, -56/+56)
    Instead, hardcode group sigsel because there are a bunch of unknown groups, especially on SM50/SM52.

    Signed-off-by: Samuel Pitoiset <[email protected]>
* gm107/ir: emit RED instead of ATOM when no dst (Samuel Pitoiset, 2016-11-05, 1 file, -1/+28)
    This is similar to NVC0 and GK110 emitters where we emit reduction operations instead of atomic operations when the destination is not used.

    Found after writing some tests which check if performance counters return the expected value. In that case, gred_count returned 0 on gm107 while at least gk106 returned the correct value.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* st/mesa: initialize members of glsl_to_tgsi_instruction in emit_asm() (Brian Paul, 2016-11-05, 1 file, -0/+4)
    This fixes random crashes with MSVC release builds. It seems the members are implicitly initialized to zero with gcc, but not MSVC. In particular, the tex_offset_num_offset field was non-zero causing a loop over the NULL tex_offsets array to crash. Zero-init those fields and a few others to be safe.

    The regression began with acc23b04cfd64e "ralloc: remove memset from ralloc_size".

    Reviewed-by: Marek Olšák <[email protected]>
* android: amd/common: add support for libmesa_amd_common (Mauro Rossi, 2016-11-05, 4 files, -1/+60)
    Fixes the following build error introduced with commit 7115e56 and related amd/common dependencies:

        external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6861: error: undefined reference to 'ac_is_sgpr_param'
        external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6951: error: undefined reference to 'ac_is_sgpr_param'
        clang++: error: linker command failed with exit code 1 (use -v to see invocation)
        ninja: build stopped: subcommand failed.
        build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
        make: *** [ninja_wrapper] Error 1

    Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: don't call surface_best for FMASK (Marek Olšák, 2016-11-05, 1 file, -1/+1)
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98518
    Acked-by: Edward O'Callaghan <[email protected]>
* mesa: Add linear ETC2/EAC to the compressed format list with ES3 compat. (Kenneth Graunke, 2016-11-04, 1 file, -6/+12)
    GL_ARB_ES3_compatibility brings ETC2/EAC formats to desktop GL.

    The meaning of the GL compressed format list is pretty vague - it's supposed to return formats for "general-purpose usage". (GL 4.2 deprecates the list because of this.) Basically everyone interprets this as "linear RGB/RGBA".

    ETC2/EAC meets that criterion, so while we shouldn't be required to add it to the list, there's also little harm in doing so, at least on platforms with native support. I doubt anyone is using this list for much anyway, so even on platforms without native support, it's probably not a big deal.

    Makes the following GL45-CTS.gtf43 tests pass:

    * GL3Tests.eac_compression_r11.gl_compressed_r11_eac
    * GL3Tests.eac_compression_rg11.gl_compressed_rg11_eac
    * GL3Tests.eac_compression_signed_r11.gl_compressed_signed_r11_eac
    * GL3Tests.eac_compression_signed_rg11.gl_compressed_signed_rg11_eac
    * GL3Tests.etc2_compression_rgb8.gl_compressed_rgb8_etc2
    * GL3Tests.etc2_compression_rgb8_pt_alpha1.gl_compressed_rgb8_pt_alpha1_etc2
    * GL3Tests.etc2_compression_rgba8.gl_compressed_rgba8_etc2

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eduardo Lima Mitev <[email protected]>
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain. (Eric Anholt, 2016-11-04, 1 file, -1/+1)
    The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here.

    Cc: <[email protected]>
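A Newton-Raphson step for a reciprocal refines an estimate r of 1/w as r' = r * (2 - w * r), which roughly squares the relative error. The C sketch below shows one such step; the function name is hypothetical and this is not the vc4 compiler code, which emits QPU instructions instead.

```c
#include <assert.h>

/* Hypothetical helper: one Newton-Raphson refinement of an approximate
 * reciprocal. Given estimate ~= 1/w, returns a closer approximation. */
static float
refine_reciprocal(float w, float estimate)
{
   assert(w != 0.0f);
   return estimate * (2.0f - w * estimate);
}
```

For example, refining the estimate 0.3 for 1/3 gives 0.3 * (2 - 0.9) = 0.33, cutting the relative error from about 10% to about 1%.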
* vc4: Make sure that vertex shader texture2D() calls use LOD 0. (Eric Anholt, 2016-11-04, 1 file, -0/+10)
    I noticed this while trying to debug glmark2 terrain (which does vertex shader texturing, but no mipmaps on its textures sampled from the VS).
* radeonsi: fix vertex fetches for 2_10_10_10 formats (Nicolai Hähnle, 2016-11-04, 5 files, -6/+78)
    The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader.

    The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases.

    Reviewed-by: Marek Olšák <[email protected]>
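The arithmetic behind such a workaround can be sketched in scalar C: take the 2-bit alpha value that was fetched as unsigned and reinterpret it as a signed 2-bit integer, then convert according to the format type. The function names are hypothetical, and the actual radeonsi fix is emitted as shader code that this log does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret an unsigned 2-bit value (0..3) as a
 * two's-complement 2-bit integer (-2..1). */
static int32_t
alpha2_as_signed(uint32_t raw)
{
   assert(raw <= 3u);
   /* Flip the sign bit, then subtract its weight: 0,1,2,3 -> 0,1,-2,-1. */
   return (int32_t)(raw ^ 2u) - 2;
}

/* SNORM: signed normalized. With 2 bits the scale is 1, and -2 clamps to -1. */
static float
alpha2_snorm(uint32_t raw)
{
   int32_t v = alpha2_as_signed(raw);
   return v < -1 ? -1.0f : (float)v;
}

/* SSCALED: the signed integer value converted to float without scaling. */
static float
alpha2_sscaled(uint32_t raw)
{
   return (float)alpha2_as_signed(raw);
}
```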
* st/mesa: fix the layer of VDPAU surface samplers (Nicolai Hähnle, 2016-11-04, 3 files, -17/+27)
    A (latent) bug in VDPAU interop was exposed by commit e5cc84dd43be066c1dd418e32f5ad258e31a150a.

    Before that commit, the st_vdpau code created samplers with first_layer == last_layer == 1 that the general texture handling code would immediately delete and re-create, because the layer does not match the information in the GL texture object.

    This was correct behavior at least in the DMABUF case, because the imported resource is supposed to have the correct offset already applied. In the non-DMABUF case, this was just plain wrong but apparently nobody noticed.

    After that commit, the state tracker assumes that an existing sampler is correct at all times. Existing samplers are supposed to be deleted when they may become invalid, and they will be created on-demand. This meant that the sampler with first_layer == last_layer == 1 stuck around, leading to rendering artefacts (on radeonsi), command stream failures (on r600), and assertions (in debug builds everywhere).

    This patch fixes the problem by simply not creating a sampler at all in st_vdpau_map_surface. We rely on the generic texture code to do the right thing, adding the layer_override to make the non-DMABUF case work.

    v2: add the layer_override

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98512
    Cc: 13.0 <[email protected]>
    Cc: Christian König <[email protected]>
    Cc: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]> (v1)
    Reviewed-by: Christian König <[email protected]>
* Revert "st/vdpau: use linear layout for output surfaces" (Dave Airlie, 2016-11-04, 1 file, -1/+1)
    This reverts commit d180de35320eafa3df3d76f0e82b332656530126.

    This is a radeon specific hack that causes problems on nouveau when combined with the SHARED flag later. If radeonsi needs a fix for this, please fix it in the driver.

    [chk] Using linear surfaces for this makes sense because tiling isn't beneficial and the surfaces can potentially be shared with other GPUs using the VDPAU OpenGL interop.

    [airlied] I think we need a flag that isn't SHARED/LINEAR that is more SHARED_OTHER_GPU.

    [mareko] Does radeonsi need PIPE_BIND_VIDEO_DECODE_OUTPUT that it would translate into linear ?

    [mareko] My only concern is decoding performance. If the decoder works in 64x1 blocks, tiling will hurt. That's the theory. I don't know how the decoder works.

    Cc: 12.0 13.0 <[email protected]>
    Acked-by: Christian König <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
    Tested-by: Ilia Mirkin <[email protected]>
    Tested-by: Nayan Deshmukh <[email protected]> (I+A)
* radeonsi: fix an assertion failure in si_decompress_sampler_color_textures (Marek Olšák, 2016-11-04, 1 file, -1/+3)
    This fixes a crash in Deus Ex: Mankind Divided. Release builds were unaffected, so it's not too serious.

    Cc: 11.2 12.0 13.0 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* glx: make interop ABI visible again (Marek Olšák, 2016-11-04, 1 file, -2/+2)
    This was broken when the GLAPI use was removed from mesa_glinterop.h.

    Cc: 12.0 13.0 <[email protected]>
    Acked-by: Alex Deucher <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* egl: make interop ABI visible again (Marek Olšák, 2016-11-04, 1 file, -2/+2)
    This was broken when the GLAPI use was removed from mesa_glinterop.h.

    Cc: 12.0 13.0 <[email protected]>
    Acked-by: Alex Deucher <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* egl: use util/macros.h (Marek Olšák, 2016-11-04, 2 files, -5/+2)
    I need the definition of PUBLIC.

    Cc: 12.0 13.0 <[email protected]>
    Acked-by: Alex Deucher <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* radeonsi: enable GLSL 4.50 (Nicolai Hähnle, 2016-11-04, 1 file, -1/+1)
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: fix dvec[34] loads from SSBO (Nicolai Hähnle, 2016-11-04, 1 file, -6/+4)
    When splitting up loads, we have to add 16 bytes to the offset for the high components, just like already happens for stores.

    Fixes arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-shader.

    Cc: 13.0 <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
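The offset rule can be shown with a CPU-side C sketch: a dvec3 or dvec4 is read as two parts, with the high components (z, w) located 16 bytes after the low components (x, y). The function name is hypothetical and the sketch assumes 8-byte doubles; the real change is in the TGSI instruction generation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: load an ncomp-component double vector from a byte
 * buffer as two separate loads of at most 16 bytes each. */
static void
load_dvec_split(const unsigned char *buf, size_t offset, int ncomp, double *out)
{
   assert(ncomp >= 1 && ncomp <= 4);
   assert(sizeof(double) == 8);

   /* Low part: x and (if present) y, at the base offset. */
   int low = ncomp < 2 ? ncomp : 2;
   memcpy(out, buf + offset, (size_t)low * sizeof(double));

   /* High part: z and (if present) w, 16 bytes further on. */
   if (ncomp > 2)
      memcpy(out + 2, buf + offset + 16, (size_t)(ncomp - 2) * sizeof(double));
}
```

Forgetting the extra 16 bytes on the second load would return x and y again in place of z and w.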
* glsl/cache: correct asprintf error handling (Nicolai Hähnle, 2016-11-04, 1 file, -3/+3)
    From the manpage of asprintf:

       "If memory allocation wasn't possible, or some other error occurs, these functions will return -1, and the contents of strp are undefined."

    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
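In practice this means checking the return value rather than the output pointer. A minimal sketch follows, with a hypothetical helper name that is not taken from the cache code; asprintf() is a GNU/BSD extension, so _GNU_SOURCE is defined before any include for glibc.

```c
#define _GNU_SOURCE   /* must precede every #include for glibc to declare asprintf */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "dir/name" in a newly allocated string.
 * Returns NULL on failure; the caller frees the result with free(). */
static char *
make_cache_path(const char *dir, const char *name)
{
   assert(dir != NULL && name != NULL);

   char *path;
   if (asprintf(&path, "%s/%s", dir, name) == -1) {
      /* On failure the contents of 'path' are undefined:
       * do not test it against NULL, read it, or free it. */
      return NULL;
   }
   return path;
}
```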
* gallium/radeon: Multiply bpe by nsamples in surf_winsys_to_drm (Michel Dänzer, 2016-11-04, 1 file, -2/+5)
    For symmetry with surf_drm_to_winsys.

    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: Use flags parameter in radeon_winsys_surface_init (Michel Dänzer, 2016-11-04, 1 file, -1/+1)
    Fixes valgrind warnings about surf_ws->flags being uninitialized while starting X.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: Only convert stencil info if RADEON_SURF_SBUFFER is set (Michel Dänzer, 2016-11-04, 1 file, -10/+21)
    Fixes valgrind warnings about using uninitialized memory when starting X.

    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: Only loop up to last_level for drm<->winsys conversion (Michel Dänzer, 2016-11-04, 1 file, -2/+2)
    Fixes spurious assertion failure in surf_level_drm_to_winsys when starting X, due to processing a miplevel which was never initialized.

    Fixes: e9c76eeeaa67 ("gallium/radeon: remove radeon_surf_level::pitch_bytes")
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* anv: use limits.h instead of deprecated/obsolete values.h (Tapani Pälli, 2016-11-04, 1 file, -1/+1)
    Mesa uses limits.h elsewhere, and this makes it possible to compile anv_allocator.c on Android.

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* vc4: Add miptree/texture state support for ETC1 compressed textures. (Eric Anholt, 2016-11-03, 5 files, -1/+33)
    The format isn't flagged as enabled at runtime yet, because we need kernel validation support.
* vc4: Fix use of undefined values since the ralloc zeroing changes. (Eric Anholt, 2016-11-03, 1 file, -6/+11)
    reralloc() no longer zeroes the new contents, so switch to using rzalloc_array() instead.
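The underlying hazard is the same as with plain realloc(): the newly added tail of a grown array is uninitialized unless it is cleared explicitly. The sketch below uses the C standard library as a stand-in for ralloc, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow an array from old_count to new_count elements
 * and zero the new tail. Returns NULL on failure, leaving 'ptr' valid. */
static void *
grow_array_zeroed(void *ptr, size_t old_count, size_t new_count,
                  size_t elem_size)
{
   assert(elem_size > 0 && new_count > 0 && new_count >= old_count);

   /* Guard the size computation against overflow. */
   if (new_count > SIZE_MAX / elem_size)
      return NULL;

   void *grown = realloc(ptr, new_count * elem_size);
   if (!grown)
      return NULL;

   /* realloc() preserves the old contents but leaves the rest undefined. */
   memset((char *)grown + old_count * elem_size, 0,
          (new_count - old_count) * elem_size);
   return grown;
}
```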
* nir: Make sure to set the texsrc type in nir drawpixels/bitmap lowering. (Eric Anholt, 2016-11-03, 2 files, -0/+4)
    We were leaving an undefined value since the ralloc zeroing changes. Fixes nir_validate() failures on vc4.

    v2: Fix the color-index case of drawpixels as well.

    Reviewed-by: Rob Clark <[email protected]> (v1)
* draw: fix undefined input handling some more... (Roland Scheidegger, 2016-11-04, 1 file, -50/+54)
    Previous fixes were incomplete - some code still iterated through the number of elements provided by the velem layout instead of the number stored in the key (which is the same as the number defined by the vs). It also accessed the elements from the layout directly instead of those in the key. This mismatch could still cause crashes. (Besides, it is a very good idea to only use data stored in the key anyway.)

    v2: move null format check, remove now unnecessary function parameter, some minor prettify

    Reviewed-by: Jose Fonseca <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: call fflush() after printing error messages (Brian Paul, 2016-11-03, 1 file, -1/+9)
    For Windows. Otherwise, we don't see the message until the program exits.

    Reviewed-by: Charmaine Lee <[email protected]>
* svga: move svga_mark_surfaces_dirty() prototype to svga_surface.h (Brian Paul, 2016-11-03, 3 files, -10/+4)
    Trivial.
* svga: whitespace / formatting clean-up in svga_context.c (Brian Paul, 2016-11-03, 1 file, -28/+34)
    Trivial.
* svga: collect stats for time spent in svga_context_finish() (Brian Paul, 2016-11-03, 1 file, -0/+4)
    This should have appeared with commit "svga: add guest statistic gathering interface" from August 4, but was somehow lost.
* svga: invalidate new surface before it is bound to a render target view (Charmaine Lee, 2016-11-03, 6 files, -3/+42)
    Invalidate a "new" surface before it is bound to a render target view or depth stencil view in order to avoid the unnecessary host side copy of the surface data before it is rendered to. Note that a recycled surface is already invalidated before it is reused.

    Reviewed-by: Brian Paul <[email protected]>
* Revert "svga: use untyped surface formats in most cases" (Charmaine Lee, 2016-11-03, 1 file, -7/+4)
    Using untyped surface formats causes huge performance degradation on Fusion. This reverts commit eb0ced74f6decd1bf1e111b162e1389bede89af6 until the backend has a better solution to address typeless surface formats.
* svga: allow quad blit for more formats (Charmaine Lee, 2016-11-03, 1 file, -1/+136)
    Currently the blitter will fail if the blit format is different from and view-incompatible with the resource format. Instead of punting to a software blit, which will stall the pipeline, we will create a temporary resource to allow the blitter to work.

    Fixes piglit test arb_copy_image-formats. Also tested with MTT piglit, glretrace.

    Reviewed-by: Brian Paul <[email protected]>
* svga: create BGRX render target view for BGRX_UNORM surface (Charmaine Lee, 2016-11-03, 1 file, -1/+2)
    Currently we adjust the view format when we are asked to create a BGRA render target view for a BGRX surface. But we only look for the SVGA3D_B8G8R8X8_TYPELESS surface format. With this patch, we will also check for the SVGA3D_B8G8R8X8_UNORM surface format, and use SVGA3D_B8G8R8X8_UNORM as the view format for that case.

    Reviewed-by: Brian Paul <[email protected]>