| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL_DEPTH_TEXTURE_MODE isn't meant to be part of sampler state based on
compatibility profile specifications.
OpenGL specification 4.1 compatibility 20100725 3.9.2:
"... The values accepted in the pname parameter
are TEXTURE_WRAP_S, TEXTURE_WRAP_T, TEXTURE_WRAP_R, TEXTURE_MIN_-
FILTER, TEXTURE_MAG_FILTER, TEXTURE_BORDER_COLOR, TEXTURE_MIN_-
LOD, TEXTURE_MAX_LOD, TEXTURE_LOD_BIAS, TEXTURE_COMPARE_MODE, and
TEXTURE_COMPARE_FUNC. Texture state listed in table 6.25 but not listed here and
in the sampler state in table 6.26 is not part of the sampler state, and remains in the
texture object."
The list of states is in Table 6.24 "Textures (state per texture
object)" instead of 6.25 mentioned in the specification text.
Same can be found from 3.3 compatibility specification.
Signed-off-by: Pauli Nieminen <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The whole reason I avoided this was because it might operate on a
brw_vertex_program or a brw_fragment_program. However, that isn't a
problem: all we need is the gl_program base type.
This avoids awkwardly passing the loop counter 'i' as a parameter,
simplifies both callers, and also plumbs prog in place for future use.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If alpha-testing is enabled, we need to send alpha down the pipeline
even if nr_color_buffers == 0. However, tracking whether alpha-testing
is enabled in the WM program key is expensive: it causes us to compile
multiple specializations of the same shader, using program cache space.
This patch removes the check for alpha-testing, and simply emits alpha
whenever nr_color_buffers == 0. We believe this will also be necessary
for alpha-to-coverage, and it should add minimal overhead to an uncommon
case. Saving the recompiles should more than make up the difference.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It had many problems:
- The shadow comparison was done post-filtering.
- It required state-dependent recompiles whenever the comparison
function changed.
- It didn't even work: many cases hit assertion failures.
- I never implemented it for the VS.
The new lowering pass which converts textureGrad to textureLod by
computing the LOD value works much better.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):
"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.
However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.
This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.
Fixes Piglit test "fbo-deriv".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Sandy Bridge and later don't use this field, so there's no point in
setting it. It can only cause harmful state-based recompiles.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Improves VS state change microbenchmark performance 1.78817% +/-
0.556878% (n=25).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the fixed-function fragment shader was tracked as a
gl_program. This means that it shows up in the driver as a Mesa IR
program instead of as a GLSL IR program. If a driver doesn't generate
Mesa IR from the GLSL IR, that program is empty. If the program is
empty there is either no rendering or a GPU hang.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This should avoid state-dependent FS recompiles when samplers that are
only used by the VS change.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The idea is to reuse this for the VS and (in the future) GS as well.
v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> [v1]
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were already doing it through the shader (layered underneath
GL_EXT_texture_swizzle) in the shadow compare case. This avoids
having per-format logic for switching out the surface format dependent
on the depth mode.
v2: Also do the swizzling for DEPTH_STENCIL. oops.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Only 4 other prepare() functions are left, which don't rely on this.
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This required the following changes:
- WM setup now makes the appropriate set of barycentric coordinates
(perspective vs. noperspective) available to the fragment shader,
based on whether the shader requires perspective interpolation,
noperspective interpolation, both, or neither.
- The fragment shader backend now uses the appropriate set of
barycentric coordiantes when interpolating, based on the
interpolation mode returned by
ir_variable::determine_interpolation_mode().
- SF setup now uses gl_fragment_program::InterpQualifier to determine
which attributes are to be flat shaded (as opposed to the old logic,
which only flat shaded colors).
- CLIP setup now ensures that the clipper outputs non-perspective
barycentric coordinates when they are needed by the fragment shader.
Fixes the remaining piglit tests of interpolation qualifiers that were
failing:
- interpolation-flat-*-smooth-none
- interpolation-flat-other-flat-none
- interpolation-noperspective-*
- interpolation-smooth-gl_*Color-flat-*
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible intepolation mode.
When setting up the WM, we intstruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The inconsistency between vs_max_threads and max_vs_entries was rather
annoying. I could never seem to remember which one was reversed, which
made it harder to find quickly. "Max __ Threads" seems more natural.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace each occurence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Unfortunately, since a previous efficiency improvement, we no longer
have any open-source testcases producing register spilling, so this
code was untested in the fragment shader path. That should change
when we get proper temporary array support in the fragment shader.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40194
|
| |
|
|
|
|
| |
This saves some 35MB when the program only uses GLSL shaders.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
|
|
|
|
|
|
| |
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
|
|
|
|
|
|
|
| |
No net code size change, but unit update is down 0.8% code size
pre-gen6.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This removes the stupid strict-conformance fallback code I broke when
adding ARB_sampler_objects.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
| |
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
|
|
|
|
|
|
| |
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
|
|
|
|
|
| |
We were alpha testing against an unwritten value, resulting in garbage.
(part of) Bug #35073.
|
|
|
|
|
|
|
| |
Change swizzle from (x000) to (x001).
Signed-off-by: Chad Versace <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered
dead by the reassignment on line 392.
Signed-off-by: Chad Versace <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
We pull the draw regions right out of the renderbuffers these days.
|
|
|
|
| |
Fixes depth-tex-modes-rg.
|
|
|
|
|
|
|
|
|
|
| |
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
|
|
|
|
|
|
|
|
| |
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
|