path: root/src/mesa
Commit message | Author | Age | Files | Lines
* ff_fragment_shader: Don't do unnecessary (and dangerous) uniform setup. | Paul Berry | 2013-03-19 | 1 | -16/+0
  Previously, right after calling _mesa_glsl_link_shader(), the fixed function fragment shader code made several calls with the ostensible purpose of setting up uniforms for the fragment shader it just created.

  These calls are unnecessary, since _mesa_glsl_link_shader() calls driver->LinkShader(), which takes care of calling these functions (or their equivalent).

  Also, they are dangerous to call after _mesa_glsl_link_shader() has returned, because on back-ends such as i965 which do precompilation, _mesa_glsl_link_shader() may have already cached pointers to the existing uniform structures; attempting to set up the uniforms again invalidates those cached pointers.

  It was only by sheer coincidence that this wasn't manifesting itself as a bug. It turns out that i965's precompile mechanism was always setting bit 0 of brw_wm_prog_key::proj_attrib_mask to 0 for fixed function fragment shaders, but during normal usage this bit usually gets set to 1. As a result, the precompiled shader (with its invalid uniform pointers) was not being used.

  I'm about to introduce some changes that cause bit 0 of proj_attrib_mask to be set consistently between precompilation and normal usage, so to avoid regressions I need to get rid of the dangerous duplicate uniform setup code first.

  Reviewed-by: Ian Romanick <[email protected]>
* i965: Avoid unnecessary copy when depthstencil workaround invoked by clear. | Paul Berry | 2013-03-19 | 10 | -17/+52
  Since apps typically begin rendering with a call to glClear(), it is likely that when brw_workaround_depthstencil_alignment() moves a miplevel to a temporary buffer, it can avoid doing a blit, since the contents of the miplevel are about to be erased.

  This patch adds the necessary plumbing to determine when brw_workaround_depthstencil_alignment() is being called as a consequence of glClear(), and avoids the unnecessary blit when it is safe to do so.

  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>

  v2: Eliminate unnecessary call to _mesa_is_depthstencil_format(). Fix handling of depth buffer in depth/stencil format.

  v3: Use correct bitfields for clear_mask. Fix handling of depth buffer in depth/stencil format when hardware uses separate stencil. When invalidating, make sure we still reassociate the image to the new miptree.

  Reviewed-by: Eric Anholt <[email protected]>
* osmesa: fix out-of-tree build | Andreas Boll | 2013-03-19 | 1 | -0/+1
  Taken from downstream:
  http://anonscm.debian.org/gitweb/?p=pkg-xorg/lib/mesa.git;a=blob;f=debian/patches/14-fix-osmesa-build.diff;h=00581d0e1833c5492d9050e1bf3d5e658cad782e;hb=refs/heads/ubuntu%2B1

  v2: Move the added line immediately after -I$(top_srcdir)/src/mapi

  NOTE: This is a candidate for the 9.1 and 9.0 branches.

  Acked-by: Kenneth Graunke <[email protected]> (v1)
  Reviewed-by: Matt Turner <[email protected]>
* mesa: use ieee fp on s390 and m68k | Andreas Boll | 2013-03-19 | 1 | -1/+2
  Taken from downstream:
  http://anonscm.debian.org/gitweb/?p=pkg-xorg/lib/mesa.git;a=blob;f=debian/patches/02_use-ieee-fp-on-s390-and-m68k.patch;h=d3d6c1d7fec3c72ecf320706167deb61c52636c3;hb=refs/heads/ubuntu%2B1

  Fixes Debian bug #349437.

  Patch written by David Nusinow.

  NOTE: This is a candidate for stable branches.

  Acked-by: Kenneth Graunke <[email protected]>
  Acked-by: Matt Turner <[email protected]>
* glsl_to_tgsi: remove indirect addressing limitations | Christian König | 2013-03-19 | 1 | -33/+3
  They shouldn't be necessary any more.

  Signed-off-by: Christian König <[email protected]>
* glsl_to_tgsi: allocate arrays separately v2 | Christian König | 2013-03-19 | 2 | -31/+59
  Instead of allocating everything as temporaries, use the new array allocation functions.

  v2: fix bug in simplify_cmp, declare arrays on demand

  Signed-off-by: Christian König <[email protected]>
* glsl_to_tgsi: use get_temp for all allocations | Christian König | 2013-03-19 | 1 | -13/+10
  Signed-off-by: Christian König <[email protected]>
* Add dri image entry point for creating image from fd | Kristian Høgsberg | 2013-03-18 | 3 | -4/+94
  Reviewed-by: Ander Conselvan de Oliveira <[email protected]>
* i965/blorp: Add INTEL_DEBUG=blorp flag. | Paul Berry | 2013-03-18 | 3 | -0/+8
  This debug flag prints out the native GEN assembly for a blitting shader produced using BLORP. Hopefully this should be useful in developing additional BLORP features.

  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* i965: Simplify separate stencil check | Paul Berry | 2013-03-16 | 1 | -1/+1
  The only format returned by _mesa_get_format_base_format() that satisfies _mesa_is_depthstencil_format() is GL_DEPTH_STENCIL, so we can simplify the check.

  Reviewed-by: Eric Anholt <[email protected]>
* i965: Apply depthstencil alignment workaround when doing fast clears. | Paul Berry | 2013-03-15 | 1 | -1/+5
  Fast depth clears have the same depth/stencil alignment requirements as other drawing operations. Therefore, we need to call brw_workaround_depthstencil_alignment() from both the clear and drawing paths.

  Without this fix, we get image corruption if the following conditions hold:

  (a) the first ever drawing operation to a depth miplevel (or the first drawing operation after having used the texture for sampling) is a clear,
  (b) the depth miplevel has a size that is eligible for fast depth clears, and
  (c) the depth miplevel has an offset within the miptree that isn't 8x8 aligned.

  Fixes piglit "depthstencil-render-miplevels" tests with size 273.

  NOTE: This is a candidate for stable branches

  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* Replace gl_frag_attrib enum with gl_varying_slot. | Paul Berry | 2013-03-15 | 67 | -768/+706
  This patch makes the following search-and-replace changes:

  gl_frag_attrib -> gl_varying_slot
  FRAG_ATTRIB_* -> VARYING_SLOT_*
  FRAG_BIT_* -> VARYING_BIT_*

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* Get rid of _mesa_frag_attrib_to_vert_result(). | Paul Berry | 2013-03-15 | 2 | -33/+8
  Now that there is no difference between the enums that represent vertex outputs and fragment inputs, there's no need for a conversion function.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* Get rid of _mesa_vert_result_to_frag_attrib(). | Paul Berry | 2013-03-15 | 3 | -37/+26
  Now that there is no difference between the enums that represent vertex outputs and fragment inputs, there's no need for a conversion function. But we still need to be able to detect when a given vertex output has no corresponding fragment input. So it is replaced by a new function, _mesa_varying_slot_in_fs(), which tells whether the given varying slot exists as an FS input or not.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
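The predicate described above can be pictured with a small C sketch. Everything here is hypothetical: the `demo_` enum is a cut-down stand-in for gl_varying_slot, the choice of which slots count as vertex-output-only is an assumption made for illustration, and the body is not the actual _mesa_varying_slot_in_fs() implementation.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for gl_varying_slot. The real enum in
 * mtypes.h has many more entries and its own names and values. */
enum demo_varying_slot {
   DEMO_SLOT_POS,
   DEMO_SLOT_COL0,
   DEMO_SLOT_PSIZ,   /* assumed here to be a vertex output only */
   DEMO_SLOT_EDGE,   /* assumed here to be a vertex output only */
   DEMO_SLOT_TEX0,
   DEMO_SLOT_MAX
};

/* Tell whether a slot exists as a fragment shader input. With one shared
 * enum no value conversion is needed, only this yes/no answer. */
static bool demo_varying_slot_in_fs(enum demo_varying_slot slot)
{
   switch (slot) {
   case DEMO_SLOT_PSIZ:
   case DEMO_SLOT_EDGE:
      return false;
   default:
      return true;
   }
}

/* Derive a fragment-input bitmask from a vertex-output bitmask by
 * dropping the slots that have no fragment-side counterpart. */
static unsigned long long demo_fs_inputs(unsigned long long vs_outputs)
{
   unsigned long long fs_inputs = 0;

   for (int slot = 0; slot < DEMO_SLOT_MAX; slot++) {
      unsigned long long bit = 1ULL << slot;

      if ((vs_outputs & bit) &&
          demo_varying_slot_in_fs((enum demo_varying_slot)slot))
         fs_inputs |= bit;
   }
   return fs_inputs;
}
```

Because both shader stages index the same enum, the mask bits carry over unchanged and only the filtering step remains.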
* mtypes.h: Modify gl_frag_attrib to refer to new gl_varying_slot enum. | Paul Berry | 2013-03-15 | 2 | -26/+34
  This paves the way for eliminating the gl_frag_attrib enum entirely.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* Replace gl_geom_result enum with gl_varying_slot. | Paul Berry | 2013-03-15 | 6 | -52/+22
  This patch makes the following search-and-replace changes:

  gl_geom_result -> gl_varying_slot
  GEOM_RESULT_* -> VARYING_SLOT_*

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* mtypes.h: Modify gl_geom_result to refer to new gl_varying_slot enum. | Paul Berry | 2013-03-15 | 1 | -21/+20
  This paves the way for eliminating the gl_geom_result enum entirely.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* Replace gl_geom_attrib enum with gl_varying_slot. | Paul Berry | 2013-03-15 | 5 | -58/+15
  This patch makes the following search-and-replace changes:

  gl_geom_attrib -> gl_varying_slot
  GEOM_ATTRIB_* -> VARYING_SLOT_*
  GEOM_BIT_* -> VARYING_BIT_*

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* mtypes.h: Modify gl_geom_attrib to refer to new gl_varying_slot enum. | Paul Berry | 2013-03-15 | 1 | -13/+13
  This paves the way for eliminating the gl_geom_attrib enum entirely.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* Replace gl_vert_result enum with gl_varying_slot. | Paul Berry | 2013-03-15 | 40 | -376/+343
  This patch makes the following search-and-replace changes:

  gl_vert_result -> gl_varying_slot
  VERT_RESULT_* -> VARYING_SLOT_*

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* mtypes.h: Modify gl_vert_result to refer to new gl_varying_slot enum. | Paul Berry | 2013-03-15 | 2 | -28/+43
  This paves the way for eliminating the gl_vert_result enum entirely.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* mtypes.h: Add new gl_varying_slot enum, and bitfield defines. | Paul Berry | 2013-03-15 | 1 | -0/+70
  Future patches will make use of the enum. It will eventually take the place of the existing enums gl_vert_result, gl_geom_attrib, gl_geom_result, and gl_frag_attrib, all of which represent essentially the same information but using inconsistent values.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
* i965: Change fragment input related bitfields to 64-bit. | Paul Berry | 2013-03-15 | 5 | -15/+16
  This patch updates the bitfields brw_context::wm.input_size_masks, tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of which are indexed by gl_frag_attrib, from 32-bit to 64-bit.

  This paves the way for supporting geometry shaders, and for merging the gl_frag_attrib and gl_vert_result enums. The combination of these two will require at least 55 bits in the bitfields.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
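To illustrate why a mask indexed by up to 55 slots needs a 64-bit type, here is a minimal C sketch. The `demo_` names are hypothetical and are not the macros or fields used in the driver; the sketch only shows that a bit at index 54 is lost when the mask is narrowed to 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build the bit for a slot in a 64-bit mask.
 * The constant must be 64 bits wide, otherwise shifting by 32 or more
 * is undefined behaviour in C. */
#define DEMO_SLOT_BIT64(slot) (UINT64_C(1) << (slot))

static uint64_t demo_mask_set(uint64_t mask, unsigned slot)
{
   assert(slot < 64);
   return mask | DEMO_SLOT_BIT64(slot);
}

static bool demo_mask_test(uint64_t mask, unsigned slot)
{
   assert(slot < 64);
   return (mask & DEMO_SLOT_BIT64(slot)) != 0;
}

/* What a 32-bit field would keep of the same mask. */
static uint32_t demo_truncate32(uint64_t mask)
{
   return (uint32_t)mask;
}
```

Setting slot 54 and then calling demo_truncate32() yields zero, which is the kind of silent loss the wider fields avoid.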
* driconf: add a miscellaneous section and always_have_depth_buffer option | Brian Paul | 2013-03-15 | 1 | -0/+14
  This option is needed for some applications that neglect to request a depth buffer when choosing a visual/fbconfig. The Linux app Topogun is an example of this problem.
* driconf: reorder options, reformat comments, etc | Brian Paul | 2013-03-15 | 1 | -60/+74
  Move the options into the proper section (Debug, Quality, Performance, etc). Update comments and add some whitespace to improve readability.
* i965: Make INTEL_DEBUG=shader_time use the RAW surface format. | Kenneth Graunke | 2013-03-14 | 2 | -3/+3
  Untyped Atomic Operation messages are illegal for non-RAW formats.

  The IVB hardware proceeds happily (after all, who cares what the format of the surface is if you're doing untyped ops on it?), but later hardware apparently doesn't. The simulator for gen7 does complain, though.

  v2: Rebase against updates to previous patches. (by anholt)

  NOTE: This is a candidate for the 9.1 branch.

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Specialize SURFACE_STATE creation for shader time. | Kenneth Graunke | 2013-03-14 | 4 | -8/+45
  This is basically a copy and paste of gen7_create_constant_surface, but with the parameters filled in to offer a simpler interface. It will diverge shortly.

  I didn't bother adding it to the vtable for now since shader time is only exposed on Gen7+.

  v2: Replace tabs in the new code (by anholt) Add back dropped memset() and add a comment about HSW channel selects.

  NOTE: This is a candidate for the 9.1 branch.

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for Haswell. | Kenneth Graunke | 2013-03-14 | 2 | -4/+12
  Haswell's "Data Cache" data port is a single unit, but split into two SFIDs to allow for more message types without adding more bits in the message descriptor.

  Untyped Atomic Operations are now message 0010 in the second data cache data port, rather than 6 in the first.

  v2: Use the #defines from the previous commit. (by anholt)

  NOTE: This is a candidate for the 9.1 branch.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]> (v1)
* i965: Add definitions for gen7+ data cache messages. | Eric Anholt | 2013-03-14 | 1 | -0/+37
  We were sparsely using some of these message types, but I'll just fill them all in now. It will be used for fixing shader_time on HSW.

  v2: Add missing MEDIA_BLOCK_READ.

  NOTE: This is a candidate for the 9.1 branch.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split shader_time entries into separate cachelines. | Eric Anholt | 2013-03-14 | 4 | -4/+13
  This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS).

  Improves performance of a Minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).

  v2: Add a define for the stride with a comment explaining its units and why.

  Reviewed-by: Kenneth Graunke <[email protected]>
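The idea of spacing entries by a stride can be sketched in C as follows. The 64-byte cacheline size, the 4-byte packed counter size, and all `DEMO_`/`demo_` names are assumptions made for illustration; the commit does not state the actual stride value or its units.

```c
#include <stddef.h>

/* Assumed sizes, for illustration only. */
#define DEMO_CACHELINE_BYTES 64
#define DEMO_PACKED_ENTRY_BYTES 4

/* Byte offset of an entry when entries are packed back to back. */
static size_t demo_packed_offset(unsigned index)
{
   return (size_t)index * DEMO_PACKED_ENTRY_BYTES;
}

/* Byte offset of an entry when each entry gets its own cacheline. */
static size_t demo_strided_offset(unsigned index)
{
   return (size_t)index * DEMO_CACHELINE_BYTES;
}

/* Which cacheline a byte offset falls into. */
static size_t demo_cacheline_of(size_t byte_offset)
{
   return byte_offset / DEMO_CACHELINE_BYTES;
}
```

With the packed layout, entries 0 and 1 share a cacheline, so writers updating them contend for the same line; with the strided layout they land in different lines, at the cost of a larger buffer.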
* mesa: Fix FB blitting in case of zero size src or dst rect | Anuj Phogat | 2013-03-13 | 1 | -1/+3
  Framebuffer blitting operation should be skipped if any of the dimensions (width/height) of src/dst rect is zero.

  V2: Move the dimension check after error checking in _mesa_BlitFramebuffer.

  Fixes: fbblit(negative.nullblit.zeroSize) in Intel oglconform
  https://bugs.freedesktop.org/show_bug.cgi?id=59495

  Note: Candidate for all the stable branches.

  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
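A minimal C sketch of the zero-size check described above. The function name is hypothetical and this is not the code in _mesa_BlitFramebuffer; it only shows the condition, using the (X0, Y0, X1, Y1) corner convention of glBlitFramebuffer.

```c
#include <stdbool.h>

/* Return true when either rectangle has zero width or zero height,
 * in which case the blit has nothing to copy and can be skipped.
 * Flipped rectangles (X1 < X0) have non-zero size and are not skipped. */
static bool demo_blit_is_empty(int srcX0, int srcY0, int srcX1, int srcY1,
                               int dstX0, int dstY0, int dstX1, int dstY1)
{
   return srcX1 == srcX0 || srcY1 == srcY0 ||
          dstX1 == dstX0 || dstY1 == dstY0;
}
```

As the V2 note says, a check like this belongs after the error checks, so that invalid calls still raise their GL errors even when the rectangles are empty.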
* scons: Define PACKAGE_VERSION/BUGREPORT globally. | José Fonseca | 2013-03-13 | 1 | -5/+0
  Fixes the scons build.
* tests: Add $(top_srcdir)/include to AM_CPPFLAGS. | Vinson Lee | 2013-03-12 | 1 | -0/+1
  Fixes this build error with make check.

    CC collision.o
    In file included from ../../../../../src/mesa/main/hash_table.h:34:0,
                     from collision.c:31:
    ../../../../../src/mesa/main/compiler.h:51:53: fatal error: c99_compat.h: No such file or directory

  Signed-off-by: Vinson Lee <[email protected]>
* scons: Define PACKAGE_xxx | José Fonseca | 2013-03-13 | 1 | -0/+5
  Should get the builds going again.
* st/mesa: add PIPE_FORMAT_R16G16B16A16_UNORM renderbuffer support | Brian Paul | 2013-03-12 | 1 | -0/+3
  To allow rendering in 16-bit/channel RGBA buffers.

  Reviewed-by: José Fonseca <[email protected]>
* build: Get rid of dead MESA_ASM_FILES variable | Matt Turner | 2013-03-12 | 1 | -1/+0
  Reviewed-by: Eric Anholt <[email protected]>
* mesa/build: Get rid of dead ALL_FILES variable | Matt Turner | 2013-03-12 | 1 | -6/+0
  Reviewed-by: Eric Anholt <[email protected]>
* xmlpool/.gitignore: Remove 'Makefile' | Matt Turner | 2013-03-12 | 1 | -1/+0
  Handled by top level .gitignore.

  Reviewed-by: Eric Anholt <[email protected]>
* mesa: Use PACKAGE_BUGREPORT macro. | Matt Turner | 2013-03-12 | 1 | -1/+1
  Reviewed-by: Eric Anholt <[email protected]>
* mesa: Remove unused version #defines from version.h. | Matt Turner | 2013-03-12 | 1 | -11/+0
  Reviewed-by: Eric Anholt <[email protected]>
* mesa: Replace MESA_VERSION with PACKAGE_VERSION. | Matt Turner | 2013-03-12 | 5 | -5/+5
  One fewer place to have to update.

  Reviewed-by: Eric Anholt <[email protected]>
* mesa,gallium,egl,mapi: One definition of C99 inline/__func__ to rule them all. | José Fonseca | 2013-03-12 | 1 | -51/+5
  We were in four already...

  NOTE: Candidate for the stable branches.

  Reviewed-by: Brian Paul <[email protected]>
* mesa: Use correct functions for enum conversion. | Vinson Lee | 2013-03-11 | 1 | -2/+2
  Fixes mixing enum types defects reported by Coverity.

  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix typo in doxygen hyperlink | Chad Versace | 2013-03-11 | 1 | -1/+1
  s/brw_state_upload/brw_upload_state/

  Found because the link was broken.

  Signed-off-by: Chad Versace <[email protected]>
* mesa: Reduce memory usage for reg alloc with many graph nodes (part 2). | Eric Anholt | 2013-03-11 | 1 | -4/+8
  After the previous fix that almost removes an allocation of 4*n^2 bytes, we can use a bitset to reduce another allocation from n^2 bytes to n^2/8 bytes.

  Between the previous commit and this one, the peak heap size for an oglconform ARB_fragment_program max instructions test on i965 goes from 4GB to 255MB.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
  Reviewed-by: Kenneth Graunke <[email protected]>
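The n^2 bytes to n^2/8 bytes saving comes from storing one bit per node pair instead of one byte. A minimal C sketch of such a bitset-backed interference matrix follows; the struct and function names are hypothetical and are not the register allocator's actual data structures.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Hypothetical n x n interference matrix, one bit per (a, b) pair. */
struct demo_interference {
   unsigned n;       /* number of graph nodes */
   unsigned *bits;   /* n * n bits, row-major, packed into words */
};

/* Number of words needed for n * n bits (at least one). */
static size_t demo_interference_words(unsigned n)
{
   size_t nbits = (size_t)n * n;
   size_t words = (nbits + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS;
   return words ? words : 1;
}

static bool demo_interference_init(struct demo_interference *g, unsigned n)
{
   g->n = n;
   g->bits = calloc(demo_interference_words(n), sizeof(unsigned));
   return g->bits != NULL;
}

/* Record that nodes a and b interfere (symmetric). */
static void demo_interference_add(struct demo_interference *g,
                                  unsigned a, unsigned b)
{
   assert(a < g->n && b < g->n);
   size_t ab = (size_t)a * g->n + b;
   size_t ba = (size_t)b * g->n + a;
   g->bits[ab / DEMO_WORD_BITS] |= 1u << (ab % DEMO_WORD_BITS);
   g->bits[ba / DEMO_WORD_BITS] |= 1u << (ba % DEMO_WORD_BITS);
}

static bool demo_interference_test(const struct demo_interference *g,
                                   unsigned a, unsigned b)
{
   assert(a < g->n && b < g->n);
   size_t i = (size_t)a * g->n + b;
   return (g->bits[i / DEMO_WORD_BITS] >> (i % DEMO_WORD_BITS)) & 1u;
}
```

For 100 nodes this needs 10000 bits, rounded up to whole words, rather than 10000 bytes; the trade-off is a shift and mask on every lookup.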
* mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1) | Eric Anholt | 2013-03-11 | 1 | -1/+13
  We were allocating an adjacency_list entry for every possible interference that could get created, but that usually doesn't happen. We can save a lot of memory by resizing the array on demand.

  Reviewed-by: Kenneth Graunke <[email protected]>
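Resizing on demand can be sketched in C as a per-node array that starts empty and grows only when an interference is actually recorded. The names, the initial capacity of 4, and the doubling policy are assumptions for illustration, not the allocator's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-node adjacency list that grows as needed. */
struct demo_node {
   unsigned *adjacency;   /* neighbours recorded so far */
   unsigned count;        /* entries in use */
   unsigned capacity;     /* entries allocated */
};

/* Append a neighbour, doubling the array when it is full.
 * Returns false (leaving the list unchanged) if allocation fails. */
static bool demo_node_add_neighbour(struct demo_node *n, unsigned other)
{
   assert(n != NULL);

   if (n->count == n->capacity) {
      unsigned new_capacity = n->capacity ? n->capacity * 2 : 4;
      unsigned *grown = realloc(n->adjacency,
                                new_capacity * sizeof(*grown));
      if (!grown)
         return false;
      n->adjacency = grown;
      n->capacity = new_capacity;
   }

   n->adjacency[n->count++] = other;
   return true;
}
```

A node with few neighbours then costs a handful of entries instead of one slot per possible neighbour, and doubling keeps the amortized cost of an append constant.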
* i965/fs: Improve CSE performance by expiring some available expressions. | Eric Anholt | 2013-03-11 | 1 | -1/+19
  We're already walking the list, and we can easily know when something has no reason to be in the list any longer, so take a brief extra step to reduce our worst-case runtime (an oglconform test that emits the maximum instructions in a fragment program).

  I don't actually know what the worst-case runtime was, because it was too long and I got bored.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve live variables calculation performance. | Eric Anholt | 2013-03-11 | 2 | -26/+32
  We can execute way fewer instructions by doing our boolean manipulation on an "int" of bits at a time, while also reducing our working set size.

  Reduces compile time of L4D2's slowest shader from 4s to 1.1s (-72.4% +/- 0.2%, n=10)

  v2: Remove redundant masking (noted by Ken)

  Reviewed-by: Kenneth Graunke <[email protected]>
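Working on a word of bits at a time can be sketched in C with the textbook liveness equation livein = use | (liveout & ~def). The function name and array layout are hypothetical and this is not the i965 compiler's code; each `unsigned` simply holds the flags for a group of variables, so one bitwise operation replaces a loop over individual booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Update a block's live-in set from its live-out, use and def sets.
 * Every array holds `words` words, one bit per variable.
 * Returns true if any new variable became live, so the caller knows
 * whether the dataflow iteration has to continue. */
static bool demo_update_livein(unsigned *livein, const unsigned *liveout,
                               const unsigned *use, const unsigned *def,
                               size_t words)
{
   bool progress = false;

   for (size_t w = 0; w < words; w++) {
      unsigned new_in = use[w] | (liveout[w] & ~def[w]);

      if (new_in & ~livein[w]) {
         livein[w] |= new_in;
         progress = true;
      }
   }
   return progress;
}
```

Besides cutting the instruction count, the packed representation shrinks the sets by the word width, which matches the smaller working set mentioned above.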
* i965/fs: Also do the gen4 SEND dependency workaround against other SENDs. | Eric Anholt | 2013-03-11 | 1 | -9/+15
  We were handling the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, overwrite the other. This was occurring in XBMC's GLSL shaders.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567

  NOTE: This is a candidate for the stable branches.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch to using sampler LD messages for uniform pull constants. | Eric Anholt | 2013-03-11 | 4 | -52/+50
  When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), this improves performance of my old GLSL demo by 23.3553% +/- 1.42968% (n=7).

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
  Reviewed-by: Kenneth Graunke <[email protected]>