| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Unused since commit fd104a845.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Except for a couple of explicit uses, _mesa_inv_sqrtf was disabled since
its addition in 2003 (see f9b1e524).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Temporarily disabled since 2003 (see 386578c5b).
This saves us from calling sqrt() 128 times to generate the sqrttab in
one_time_init().
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found by compiler warning:
i830_texstate.c:131:28: warning: argument to 'sizeof' in 'memset' call
is the same expression as the destination; did you mean to
dereference it? [-Wsizeof-pointer-memaccess]
memset(state, 0, sizeof(state));
~~~~~ ^~~~~
On 64-bit systems, memset here would write an extra 4 bytes.
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
To reduce excessive compilation time in release mode.
NOTE: This is a candidate for the 8.0 branch.
Tested-by: Brian Paul <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can potentially cut shader program size by a factor of 4 for 4-wide
execution respectively 2 for 8-wide execution and while this ratios aren't
quite reached for more complex shaders it can be close.
Could not really measure a performance difference so far except for trivial
shaders (glxgears).
There seems to be a fair amount of unnecessary move's generated especially
at the beginning it might be possible to optimize those away somehow.
Things aren't quite as clean, some additional stuff needs to be done for
keeping both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance, looking at the generated
shaders this is actually less than I'd think it would be - both 4 and 8-wide
shaders, despite containing a loop actually have about 10% more instructions
in total, and will have roughly 50% more executed instructions (though mostly
cheap ones). Need to figure out how to reduce overhead...
v2: keep complex interpolation for 4-wide mode, adapt to interface changes.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes piglit vp-address-01 amongst several others.
Signed-off-by: Roy Spliet <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Tested-by: Lucas Stach <[email protected]>
|
|
|
|
|
| |
Fixes rendering glitches in Psychonauts such as Raz's eyes flickering white.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=51962.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This thread count is only supposed to be enabled when "WIZ Hashing Disable in
GT_MODE register enabled." I've always been confused whether that means the
bit in the register should be 1 or 0. For my IVB GT2's register 0x7008 value
of 0x0, this appears to work fine.
Improves l4d2 performance at 640x480 by 0.88 +/- 0.11% (n=88). Improves
performance with rasterization at 1280x1024 by 1.45% +/- 0.36% (n=6).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Fixes piglit layout-std140.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This is a requirement for std140 uniform blocks, and optional for
packed/shared blocks.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This is a requirement for std140 uniform blocks, and optional for
packed/shared blocks.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Now we can actually return information on uniforms in uniform blocks
in the new queries.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we finally have a list of uniform blocks in the linked shader
program, we can tell what their indices are.
Fixes piglit GL_ARB_uniform_buffer_object/getuniformblockindex.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
At this point in the linking, we've totally lost track of the struct
gl_uniform_buffer that this pointed to in the original unlinked
shader, so we do a nasty n^2 walk to find it the new one based on the
variable name.
Note that these point into the shader's list of gl_uniform_buffers,
not the linked program's.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
We'll need to propagate the UBO fields to the uniform storage records
before we can handle the other pnames.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a single entrypoint that maps from a series of names to the
indices of those names within the active uniforms list. Each index is
like glGetUniformLocation()'s return value, except that it doesn't
encode an array offset.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
With the upcoming GL_ARB_uniform_buffer_object changes, the only
other caller that will want the cooked value is state_tracker.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
This attempts error-checking, but the layout isn't done yet.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're going to need this structure to cross-validate the uniform
blocks between shader stages, since unused ir_variables might get
dropped. It's also the place we store the RowMajor qualifier, which
is not part of the GLSL type (since that would cause a bunch of type
equality checks to fail).
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Fixes piglit layout-*-non-uniform and layout-*-within-block.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Someone tried to be clever and "optimized" add_vertex_data2() to just use
two points for the texture coordinates and then reuse individual
components. Sadly this is not how matrix multiplication works.
Fixes rendercheck -t tmcoords
Signed-off-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, on Gen7, when texturing from a depth or stencil surface,
the blorp engine would configure the 3D pipeline as though the input
surface was non-multisampled, and perform the necessary coordinate
transformations in the fragment shader to account for the IMS layout.
This meant outputting a lot of extra fragment shader code, and it
raised some uncertainty about how to deal with very large surfaces.
This patch modifies blorp to configure the 3D pipeline properly for
IMS layout when reading from depth and stencil surfaces.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, on Gen7, compute_msaa_layout_for_pipeline() would verify
that IMS layout is not used. However, now that we configure
SURFACE_STATE correctly for IMS surfaces, IMS layout is available.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies gen7_set_surface_num_multisamples() to set up the
SURFACE_STATE appropriately for texturing from IMS format MSAA
surfaces (which are only used on Gen7 for depth and stencil buffers).
Since the function now sets more than just the number of multisamples,
it's been renamed to gen7_set_surface_msaa().
This will make it possible to remove some kludginess from the blorp
engine.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When downsampling a compressed multisampled surface, we can take a
shortcut to downsample any pixels that were completely covered by a
single primitive. In this case, the first color value we fetch is the
correct final color for the downsampled pixel, so we can skip the rest
of the blending operation.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When downsampling an integer-format buffer on Gen7, we need to use the
"avg" instruction rather than the "add" instruction, to ensure that we
don't overflow the range of 32-bit integers. Also, we need to use the
proper register type (BRW_REGISTER_TYPE_D or BRW_REGISTER_TYPE_UD) for
intermediate color data and for writing to the render target.
Note: this patch causes blorp to use the proper register type for all
operations (downsampling, upsampling, and ordinary blits). Strictly
speaking, this is only necessary for downsampling, because the other
operations exclusively use MOV instructions on the color data. But
it's simpler to use the proper register type in all cases.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When downsampling from an MSAA image to a single-sampled image, it is
inevitable that some loss of numerical precision will occur, since we
have to use 32-bit floating point registers to hold the intermediate
results while blending. However, it seems reasonable to expect that
when all samples corresponding to a given pixel have the exact same
color value, there will be no loss of precision.
Previously, we averaged samples as follows:
blend = (((sample[0] + sample[1]) + sample[2]) + sample[3]) / 4
This had the potential to lose numerical precision when all samples
have the same color value, since ((sample[0] + sample[1]) + sample[2])
may not be precisely representable as a 32-bit float, even if the
individual samples are.
This patch changes the formula to:
blend = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
This avoids any loss of precision in the event that all samples are
the same, by ensuring that each addition operation adds two equal
values.
As a side benefit, this puts the formula in the form we will need in
order to implement correct blending of integer formats.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Ivy Bridge PRM, Vol4 Part3 p152:
"The avg instruction performs component-wise integer average of
src0 and src1 and stores the results in dst. An integer average
uses integer upward rounding. It is equivalent to increment one to
the addition of src0 and src1 and then apply an arithmetic right
shift to this intermediate value."
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
The kill_emitted variable was duplicating the functionality of
gl_fragment_program::UsesKill. There's no need for both.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the code for setting this flag for GLSL programs was
duplicated in three places: brw_link_shader(), glsl_to_tgsi_visitor,
and ir_to_mesa_visitor. In addition to the unnecessary duplication,
there was a performance problem on i965: brw_link_shader() set the
flag before doing its final round of optimizations, which meant that
if the optimizations managed to eliminate all the discard operations,
the flag would still be set, resulting (at least in theory) in slower
performance.
This patch consolidates all of the code that sets UsesKill for GLSL
programs into do_set_program_inouts(), which already is doing a
similar job for UsesDFdy, and which occurs after i965's final round of
optimizations.
Non-GLSL programs (ARB programs and the state tracker's glBitmap
program) are unaffected.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Move it to native_wayland_drm_bufmgr_helper.c which only gets compiled when
wayland is enabled and which already includes the right headers.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cube sampler generates two-dimensional texture coordinates and
hence passes NULL for the array for the third one. The actual 2D
sampler, lower in the pipe, knew not to used that array since it
didn't need it. But the samplers have become single-texel and the
coordinate array dereference has been moved up one step, to a level
where the code does not know only two coordinates are used. Hence the
segfault.
The simplest fix by far is to add a third dummy coordinate array in
the call to the next pipe step, which will be dereferenced to an
harmless 0 which then will be happily ignored by the sampler.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=52250
Signed-off-by: Olivier Galibert <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
We're going to make the public wl_buffer struct as small as possible.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
We also reuse EGL_TEXTURE_RGBA and EGL_TEXTURE_RGB, adding only the new
planar YUV texture formats: EGL_TEXTURE_Y_U_V_WL, EGL_TEXTURE_Y_UV_WL and
EGL_TEXTURE_Y_XUXV_WL.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
| |
Support this query for gallium EGL too.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
Presumably the function didn't exist when we wrote this code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates the ir_set_program_inouts_visitor so that it also
sets gl_fragment_program::UsesDFdy.
This is a bit of a hack (since dFdy() isn't an input or an output),
but there's no other obvious visitor to squeeze this functionality
into, and it would be silly to create a brand new visitor just for
this purpose.
v2: use local 'fprog' var to avoid repeated casting.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This boolean will allow it to avoid unnecessarily recompiling shaders
that don't use dFdy().
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The previous commit implemented the workaround, cited a bug report
about OilRush, but actually only enabled the workaround for the demos.
Turn it on for OilRush too.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unigine Heaven (at least) has a bug where it incorrectly uses the
GL_ARB_blend_func_extended extension.
Dual source blending allows two color outputs per render target;
individual shader outputs can be assigned to be either the first or
second blending input by setting the 'index' via one of two methods:
- An API call: glBindFragDataLocationIndexed()
- The GLSL 'layout' qualifier provided by GL_ARB_explicit_attrib_location
Both of these only work on user defined fragment shader outputs; it's an
error to use either on built-in outputs like gl_FragData.
Unigine uses gl_FragData and gl_FragColor exclusively, and doesn't even
attempt to use either method to set index == 1. However, it does set
the blending function to SRC1 enums, which requires a fragment shader
output with index == 1 or else rendering is undefined.
In other words, enabling ARB_blend_func_extended causes Unigine to
render incorrectly, resulting in an apparent regression, even though our
driver code (as far as I can tell) is perfectly fine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
In a few cases, remove unneeded casts.
And fix a few other const-correctness issues.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|