aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_wm.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Add performance debug for shader recompiles.Eric Anholt2012-08-121-0/+84
| | | | | Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add performance debug for register spilling.Eric Anholt2012-08-121-0/+4
| | | | | Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Move DepthMode to texture objectPauli Nieminen2012-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | GL_DEPTH_TEXTURE_MODE isn't meant to be part of sampler state based on compatibility profile specifications. OpenGL specification 4.1 compatibility 20100725 3.9.2: "... The values accepted in the pname parameter are TEXTURE_WRAP_S, TEXTURE_WRAP_T, TEXTURE_WRAP_R, TEXTURE_MIN_- FILTER, TEXTURE_MAG_FILTER, TEXTURE_BORDER_COLOR, TEXTURE_MIN_- LOD, TEXTURE_MAX_LOD, TEXTURE_LOD_BIAS, TEXTURE_COMPARE_MODE, and TEXTURE_COMPARE_FUNC. Texture state listed in table 6.25 but not listed here and in the sampler state in table 6.26 is not part of the sampler state, and remains in the texture object." The list of states is in Table 6.24 "Textures (state per texture object)" instead of 6.25 mentioned in the specification text. Same can be found from 3.3 compatibility specification. Signed-off-by: Pauli Nieminen <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Avoid unnecessary recompiles for shaders that don't use dFdy().Paul Berry2012-07-191-7/+1
| | | | | | | | | | | | The i965 back-end needs to compile dFdy() differently for FBOs and window system framebuffers, because Y coordinates are flipped between the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs). This patch avoids unnecessarily recompiling shaders that don't use dFdy(), by only setting render_to_fbo in the wm program key if the shader actually uses dFdy(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Move loop over texture units into brw_populate_sampler_prog_key.Kenneth Graunke2012-07-121-77/+78
| | | | | | | | | | | The whole reason I avoided this was because it might operate on a brw_vertex_program or a brw_fragment_program. However, that isn't a problem: all we need is the gl_program base type. This avoids awkwardly passing the loop counter 'i' as a parameter, simplifies both callers, and also plumbs prog in place for future use. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Always emit alpha when nr_color_buffers == 0.Kenneth Graunke2012-07-121-3/+1
| | | | | | | | | | | | | | | If alpha-testing is enabled, we need to send alpha down the pipeline even if nr_color_buffers == 0. However, tracking whether alpha-testing is enabled in the WM program key is expensive: it causes us to compile multiple specializations of the same shader, using program cache space. This patch removes the check for alpha-testing, and simply emits alpha whenever nr_color_buffers == 0. We believe this will also be necessary for alpha-to-coverage, and it should add minimal overhead to an uncommon case. Saving the recompiles should more than make up the difference. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Delete previous workaround for textureGrad with shadow samplers.Kenneth Graunke2012-07-121-3/+0
| | | | | | | | | | | | | | | It had many problems: - The shadow comparison was done post-filtering. - It required state-dependent recompiles whenever the comparison function changed. - It didn't even work: many cases hit assertion failures. - I never implemented it for the VS. The new lowering pass which converts textureGrad to textureLod by computing the LOD value works much better. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/msaa: Fix centroid interpolation of unlit pixels.Paul Berry2012-07-021-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM: Barycentric Interpolation Mode): "Errata: When Centroid Barycentric mode is required, HW may produce incorrect interpolation results when a 2X2 pixels have unlit pixels." To work around this problem, after doing centroid interpolation, we replace the centroid-interpolated values for unlit pixels with non-centroid-interpolated values (which are interpolated at pixel centers). This produces correct rendering at the expense of a slight increase in shader execution time. I've conditioned the workaround with a runtime flag (brw->needs_unlit_centroid_workaround) in the hopes that we won't need it in future chip generations. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} {centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation tests pass now. Reviewed-by: Eric Anholt <[email protected]>
* i965/msaa: Add backend support for centroid interpolation.Paul Berry2012-06-251-4/+15
| | | | | | | | | | | | | | | This patch causes the fragment shader to be configured correctly (and the correct code to be generated) for centroid interpolation. This required two changes: brw_compute_barycentric_interp_modes() needs to determine when centroid barycentric coordinates need to be included in the pixel shader thread payload, and fs_visitor::emit_general_interpolation() needs to interpolate using the correct set of barycentric coordinates. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} centroid-edges" on i965. Reviewed-by: Eric Anholt <[email protected]>
* i965: Compute dFdy() correctly for FBOs.Paul Berry2012-06-221-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On i965, dFdx() and dFdy() are computed by taking advantage of the fact that each consecutive set of 4 pixels dispatched to the fragment shader always constitutes a contiguous 2x2 block of pixels in a fixed arrangement known as a "sub-span". So we calculate dFdx() by taking the difference between the values computed for the left and right halves of the sub-span, and we calculate dFdy() by taking the difference between the values computed for the top and bottom halves of the sub-span. However, there's a subtlety when FBOs are in use: since FBOs use a coordinate system where the origin is at the upper left, and window system framebuffers use a coordinate system where the origin is at the lower left, the computation of dFdy() needs to be negated for FBOs. This patch modifies the fragment shader back-ends to negate the value of dFdy() when an FBO is in use. It also modifies the code that populates the program key (brw_wm_populate_key() and brw_fs_precompile()) so that they always record in the program key whether we are rendering to an FBO or to a window system framebuffer; this ensures that the fragment shader will get recompiled when switching between FBO and non-FBO use. This will result in unnecessary recompiles of fragment shaders that don't use dFdy(). To fix that, we will need to adapt the GLSL and NV_fragment_program front-ends to record whether or not a given shader uses dFdy(). I plan to implement this in a future patch series; I've left FIXME comments in the code as a reminder. Fixes Piglit test "fbo-deriv". NOTE: This is a candidate for stable release branches. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't set brw_wm_prog_key::iz_lookup on Gen6+.Kenneth Graunke2012-06-191-18/+20
| | | | | | | | Sandy Bridge and later don't use this field, so there's no point in setting it. It can only cause harmful state-based recompiles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: use _mesa_is_winsys/user_fbo() helpersBrian Paul2012-05-011-1/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Add support for sampling texture buffer objects on gen7+.Eric Anholt2012-04-091-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Compute required barycentric interp modes once at FS compile time.Eric Anholt2012-02-211-4/+5
| | | | | | | Improves VS state change microbenchmark performance 1.78817% +/- 0.556878% (n=25). Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Track fixed-function fragment shader as a shaderIan Romanick2012-01-111-1/+1
| | | | | | | | | | | | Previously the fixed-function fragment shader was tracked as a gl_program. This means that it shows up in the driver as a Mesa IR program instead of as a GLSL IR program. If a driver doesn't generate Mesa IR from the GLSL IR, that program is empty. If the program is empty there is either no rendering or a GPU hang. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Only set brw_wm_prog_key data for samplers used by the WM.Kenneth Graunke2011-12-191-1/+3
| | | | | | | | | This should avoid state-dependent FS recompiles when samplers that are only used by the VS change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Factor out texturing related data from brw_wm_prog_key.Kenneth Graunke2011-12-191-78/+84
| | | | | | | | | | The idea is to reuse this for the VS and (in the future) GS as well. v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> [v1] Reviewed-by: Ian Romanick <[email protected]>
* i965: Always handle GL_DEPTH_TEXTURE_MODE through the shader.Eric Anholt2011-11-291-12/+27
| | | | | | | | | | | | We were already doing it through the shader (layered underneath GL_EXT_texture_swizzle) in the shadow compare case. This avoids having per-format logic for switching out the surface format dependent on the depth mode. v2: Also do the swizzling for DEPTH_STENCIL. oops. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move program compile to emit() time.Eric Anholt2011-10-291-2/+3
| | | | | | | Only 4 other prepare() functions are left, which don't rely on this. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* i965/gen6+: Add support for noperspective interpolation.Paul Berry2011-10-271-4/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This required the following changes: - WM setup now makes the appropriate set of barycentric coordinates (perspective vs. noperspective) available to the fragment shader, based on whether the shader requires perspective interpolation, noperspective interpolation, both, or neither. - The fragment shader backend now uses the appropriate set of barycentric coordiantes when interpolating, based on the interpolation mode returned by ir_variable::determine_interpolation_mode(). - SF setup now uses gl_fragment_program::InterpQualifier to determine which attributes are to be flat shaded (as opposed to the old logic, which only flat shaded colors). - CLIP setup now ensures that the clipper outputs non-perspective barycentric coordinates when they are needed by the fragment shader. Fixes the remaining piglit tests of interpolation qualifiers that were failing: - interpolation-flat-*-smooth-none - interpolation-flat-other-flat-none - interpolation-noperspective-* - interpolation-smooth-gl_*Color-flat-* Reviewed-by: Eric Anholt <[email protected]>
* i965/gen6+: Parameterize barycentric interpolation modes.Paul Berry2011-10-271-10/+32
| | | | | | | | | | | | | | | | | This patch modifies the fragment shader back-end so that instead of using a single delta_x/delta_y register pair to store barycentric coordinates, it uses an array of such register pairs, one for each possible intepolation mode. When setting up the WM, we intstruct it to only provide the barycentric coordinates that are actually needed by the fragment shader--that is computed by brw_compute_barycentric_interp_modes(). Currently this function returns just BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only interpolation mode we support. However, that will change in a later patch. Reviewed-by: Eric Anholt <[email protected]>
* i965: Rename (vs|wm)_max_threads to max_(vs|wm)_threads for consistency.Kenneth Graunke2011-10-251-1/+1
| | | | | | | | | The inconsistency between vs_max_threads and max_vs_entries was rather annoying. I could never seem to remember which one was reversed, which made it harder to find quickly. "Max __ Threads" seems more natural. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Convert from GLboolean to 'bool' from stdbool.h.Kenneth Graunke2011-10-181-2/+2
| | | | | | | | | | | | | | | | | I initially produced the patch using this bash command: for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i 's/GL_FALSE/false/g' $file; done Then I manually added #include <stdbool.h> to fix compilation errors, and converted a few functions back to GLboolean that were used in core Mesa's function pointer table to avoid "incompatible pointer" warnings. Finally, I cleaned up some whitespace issues introduced by the change. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chad Versace <[email protected]> Acked-by: Paul Berry <[email protected]>
* mesa: Use gl_shader_program::_LinkedShaders instead of FragmentProgramIan Romanick2011-10-071-1/+1
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: silence unused var warnings in non-debug buildsBrian Paul2011-10-071-0/+1
| | | | Reviewed-by: Chad Versace <[email protected]>
* i965: Fix Android build by removing relative includesChad Versace2011-08-301-1/+1
| | | | | | | | | | Replace each occurence of #include "../glsl/*.h" with #include "glsl/*.h" Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Fix typo in 2b224d66a01f3ce867fb05558b25749705bbfe7aEric Anholt2011-08-231-1/+1
| | | | | | | | | Unfortunately, since a previous efficiency improvement, we no longer have any open-source testcases producing register spilling, so this code was untested in the fragment shader path. That should change when we get proper temporary array support in the fragment shader. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40194
* i965: Set up allocation of a VS scratch space if required.Eric Anholt2011-08-161-22/+3
|
* i965/fs: Don't allocate the old backend's compile structs for our compile.Eric Anholt2011-08-051-4/+7
| | | | This saves some 35MB when the program only uses GLSL shaders.
* i965/fs: Add support for TXD with shadow comparisons.Kenneth Graunke2011-06-181-0/+2
| | | | | | | | | | | | | | Our hardware doesn't have a sample_d_c message, so we have to do a regular sample_d and emit instructions to manually perform the comparison. This requires a state dependent recompile whenever the sampler's compare mode or function change. This adds the per-sampler comparison functions to brw_wm_prog_key, but only sets them when the sampler's compare mode is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use state streaming on programs, and state base address on gen5+.Eric Anholt2011-06-181-13/+8
| | | | | | | | | | There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
* i965/fs: Do a FS compile up front at link time to produce link errors.Eric Anholt2011-05-271-7/+19
| | | | | | At glLinkShaders time, a fail() call in FS compile in 8-wide (the one that's required to succeed, though we may relax that at some point for pre-Ironlake performance) will now report out as a link error.
* i965/fs: Move the computation of register block count from unit to compile.Eric Anholt2011-05-271-1/+1
| | | | | | | No net code size change, but unit update is down 0.8% code size pre-gen6. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove dead shadowtex_mask entry in the WM key.Eric Anholt2011-05-261-3/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove linear_color for GL_PERSPECTIVE_CORRECTION_HINT.Eric Anholt2011-05-261-4/+0
| | | | | | | | | | From the GL 2.1 spec: "Required perspective-correct interpolation for all fragment attributes except depth in sections 3.4.1 and 3.5.1, effectively making GL PERSPECTIVE CORRECT HINT a no-op." Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add support for correct GL_CLAMP behavior by clamping coordinates.Eric Anholt2011-05-181-0/+10
| | | | | | | | This removes the stupid strict-conformance fallback code I broke when adding ARB_sampler_objects. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572 Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: Get a ralloc context into brw_compile.Kenneth Graunke2011-05-171-7/+8
| | | | | | | | | | | | This would be so much easier if we were using C++; we could simply use constructors and destructors. Instead, we have to update all the callers. While we're at it, ralloc various brw_wm_compile fields rather than explicitly calloc/free'ing them. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove dead entrypoints to state cache, rename the one that's left.Eric Anholt2011-04-291-8/+5
| | | | | | | | | | As we expanded the usage of the state cache, it grew extra functionality. However, with the recent state streaming rework, we're back to the state cache being used only for shader kernels, which is the piece of GPU state that's actually expensive to compute again from scratch, since it involves compiling. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add initial support for 16-wide dispatch on gen6.Eric Anholt2011-04-261-9/+4
| | | | | | | | | At this point it doesn't do uniforms, which have to be laid out the same between 8 and 16. Other than that, it supports everything but flow control, which was the thing that forced us to choose 8-wide for general GLSL support. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Add support for ARB_sampler_objects.Eric Anholt2011-04-231-4/+6
| | | | | | | | | | | | This extension support consists of replacing "gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->". One instance of referencing the texture's base sampler remains in the initial miptree allocation, where I'm not sure we have a clear association with any texture unit. Tested with piglit ARB_sampler_objects/sampler-objects. Reviewed-by: Brian Paul <[email protected]>
* intel: Add support for ARB_color_buffer_float.Eric Anholt2011-04-201-0/+4
| | | | Reviewed-by: Brian Paul <[email protected]>
* i965/fs: Add gen6 register spilling support.Eric Anholt2011-04-171-0/+15
| | | | | | | | | | | Most of this is code movement to get the scratch space allocated in a shared location. Other than that, the only real changes are that the old oword block messages now operate on oword-aligned areas (with new messages for unaligned access, which we don't do), and that the caching control is in the SFID part of the descriptor instead of message control. Fixes glsl-fs-convolution-1.
* mesa: move sampler state into new gl_sampler_object typeBrian Paul2011-04-101-4/+4
| | | | | | gl_texture_object contains an instance of this type for the regular texture object sampling state. glGenSamplers() generates new instances of gl_sampler_object which can override that state with glBindSampler().
* i965: Fix alpha testing when there is no color buffer in the FBO.Eric Anholt2011-03-151-0/+1
| | | | | We were alpha testing against an unwritten value, resulting in garbage. (part of) Bug #35073.
* i965: Fix tex_swizzle when depth mode is GL_REDChad Versace2011-03-141-1/+2
| | | | | | | Change swizzle from (x000) to (x001). Signed-off-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove dead assignmentChad Versace2011-03-141-2/+0
| | | | | | | | The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered dead by the reassignment on line 392. Signed-off-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop the dead tracking of color_regions[].Eric Anholt2011-02-041-1/+2
| | | | We pull the draw regions right out of the renderbuffers these days.
* intel: Set the swizzling for depth textures using the GL_RED depth mode.Eric Anholt2010-12-091-0/+4
| | | | Fixes depth-tex-modes-rg.
* i965: Nuke brw_wm_glsl.c.Eric Anholt2010-12-061-14/+7
| | | | | | | | | | It was only used for gen6 fragment programs (not GLSL shaders) at this point, and it was clearly unsuited to the task -- missing opcodes, corrupted texturing, and assertion failures hit various applications of all sorts. It was easier to patch up the non-glsl for remaining gen6 changes than to make brw_wm_glsl.c complete. Bug #30530
* i965: Move payload reg setup to compile, not lookup time.Eric Anholt2010-12-061-53/+61
| | | | | | | | Payload reg setup on gen6 depends more on the dispatch width as well as the uses_depth, computes_depth, and other flags. That's something we want to decide at compile time, not at cache lookup. As a bonus, the fragment shader program cache lookup should be cheaper now that there's less to compute for the hash key.