| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so
GL_SHADER_BINARY_FORMATS should not write any data to the application
buffer.
Fixes piglit test 'arb_get_program_binary-overrun shader'.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we computed dFdy() using the following instruction:
add(8) dst<1>F src<4,4,0)F -src.2<4,4,0>F { align1 1Q }
That had the disadvantage that it computed the same value for all 4
pixels of a 2x2 subspan, which meant that it was less accurate than
dFdx(). This patch changes it to the following instruction when
c->key.high_quality_derivatives is set:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
This gives it comparable accuracy to dFdx().
Unfortunately, align16 instructions can't be compressed, so in SIMD16
shaders, instead of emitting this instruction:
add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }
We need to unroll to two instructions:
add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }
Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy.
Acked-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up inconsistency in enum decoration:
- Use the undecorated enums where possible.
- MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it
has no undecorated equivalent in GL4.
Signed-off-by: Chris Forbes <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The new surface channel select bits allow us to avoid having to
recompile the shader for this workaround.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to use a different surface format for gather4, which is
required for R32G32_FLOAT to work on Gen7.
V4: - Only emit alternate surface state for shaders which will actually
use it.
- Pass a simple 'for_gather' flag rather than a function pointer.
The callee can decide what w/a to apply.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Worst-case is that *every* texunit uses a format that needs overriding.
V4: Place the gather slots last, so shaders which don't use gather don't
get penalized by having a huge binding table.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
gather4 GREEN channel against a surface with format R32G32_FLOAT doesn't work
correctly on IVB. w/a from bspec:
- use R32G32_FLOAT_LD = 0x97 instead, for gather4 only.
- select BLUE channel to read GREEN
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
V4: Only flag quirks if there are any uses of gather in the shader,
to avoid spurious recompiles just because someone happened to use
RG32F.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Pretty much the same as the FS case. Channel select goes in the header,
V2: Less mangling.
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to
SHADER_OPCODE_TG4.
The usual post-sampling swizzle workaround can't work for ir_tg4,
so avoid doing that:
* For R/G/B/A swizzles use the hardware channel select (lives in the
same dword in the header as the texel offset), and then don't do
anything afterward in the shader.
* For 0/1 swizzles blast the appropriate constant over all the output
channels instead of sampling.
V2: Avoid duplicating header enabling block
V3: Avoid sampling at all, for degenerate swizzles.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and
low-level support for emitting it via generate_tex().
V3: Updated for changes in master.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
V2 [Chris Forbes]:
- Add new pattern, fixup parameter reading.
V3: Rebase onto new builtins machinery
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When used with a cube array in VS, failed assertion in ir_validate:
Assignment count of LHS write mask channels enabled not
matching RHS vector size (3 LHS, 4 RHS).
To fix this, swizzle the RHS correctly for the writemask.
This showed up in the ARB_texture_gather tests, which exercise cube
arrays in the VS.
Signed-off-by: Chris Forbes <[email protected]>
Cc: "9.2" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider only the top-left and top-right pixels to approximate DDX in a 2x2
subspan, unless the application requests a more accurate approximation via
GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the
new driconf option disable_derivative_optimization.
This results in a less accurate approximation. However, it improves the
performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0%
confidence) on Haswell. No noticeable image quality difference observed.
The improvement comes from faster sample_d. It seems, on Haswell, some
optimizations are introduced to allow faster sample_d when all pixels in a
subspan have the same derivative. I considered SAMPLE_STATE too, which allows
one to control the quality of sample_d on Haswell. But it gave much worse
image quality without giving better performance comparing to this change.
No piglit quick.tests regression on Haswell (tested with v1).
v2: better guess for precompile program key
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
We have the destination framebuffer object passed in; there's no need to
go digging around in the context.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
All member variables of glsl_to_tgsi_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
All member variables of ir_to_mesa_instruction are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of vec4_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of fs_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of fs_inst are already being initialized from its
constructor, it's not necessary to use rzalloc to allocate its memory,
and doing so makes it more likely that we will start relying on the
allocator to zero out all memory if the class is ever extended with
new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All member variables of ip_record are already being initialized from
its constructor, it's not necessary to use rzalloc to allocate its
memory, and doing so makes it more likely that we will start relying
on the allocator to zero out all memory if the class is ever extended
with new member variables.
That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cfg_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind. Initialize all fields from the
constructor and stop using the zeroing allocator.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bblock_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind. Initialize all fields from the
constructor and stop using the zeroing allocator.
v2: Use zero initialization for numeric types instead of default construction.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vec4_instruction object relies on the memory allocator zeroing out
its contents before it's initialized, which is quite an unusual
practice in the C++ world because it ties objects to some specific
allocation scheme, and gives unpredictable results when an object is
created with a different allocator -- Stack allocation, array
allocation, or aggregation inside a different object are some of the
useful possibilities that come to my mind. Initialize all fields from
the constructor and stop using the zeroing allocator.
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Several C++ source files include "main/uniforms.h" from an extern "C"
block, which is both unnecessary, because "uniforms.h" already checks
for a C++ compiler and sets the right linkage, and incorrect, because
the header file includes other C++ headers ("glsl_types.h" and
"ir_uniform.h") that are supposed to get C++ linkage.
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 247f90c77e8f3894e963d796628246ba0bde27b5 (i965/gs: Set
control data header size/format appropriately for EndPrimitive()), I
incorrectly numbered the DWORDs in the 3DSTATE_GS command starting
from 1 instead of starting from 0. This caused the control data
format to be programmed into the wrong DWORD, resulting in corruption
in some geometry shaders that used an output type of points.
This patch numbers the DWORDs starting from 0, as we do for all other
commands, which causes the control data format to be programmed into
the correct DWORD.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
The spec doesn't say GL_INVALID_VALUE should be raised for bufSize <= 0.
In any case, memcpy(len < 0) will lead to a crash, so don't allow it.
CC: "9.2" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Return bool instead of int. Const-qualify the syncObj. Add some comments.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Error checking bufSize isn't mentioned in the spec, but it is in the
man pages. However, I believe the man page is incorrect. Typically,
GL functions that take GLsizei parameters check that they're positive
or non-negative. Negative values don't make sense here.
A spec bug has been filed with Khronos/ARB.
v2: check for negative values, not <= 0.
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This incorporates Vinson's change to check for a null src pointer as
detected by coverity.
Also, rename the function params to be src/dst, const-qualify src,
and use GL types to match the calling functions. And add some more
comments.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The mesa/drivers/dri/Makefile.am already guards the individual
targets/subdirs with HAVE_*_DRI before including them. Thus making
the additional check within each Makefile.am unnecessary.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Fixes "Resource leak" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The format of the window system framebuffer changed from ARGB8888 to
SARGB8, but we're still supposed to render to it the same as ARGB8888
unless the user flipped the GL_FRAMEBUFFER_SRGB switch.
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for stable branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe this extension was enabled by accident. As far as I can tell,
there has never been any code in Mesa to actually support it. Not only
that, this extension is only useful in the common-lite profile, and Mesa
does the common profile.
This "fixes" the piglit test oes_matrix_get-api.
Signed-off-by: Ian Romanick <[email protected]>
Cc: "9.1 9.2" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the bspec documentation of the SEND instruction:
"destination region cannot cross the 256-bit register boundary."
To avoid violating this restriction when executing SIMD16 texturing
operations (such as those used by blorp), we need to ensure that the
destination of the SEND instruction doesn't exceed 256 bits in size.
An easy way to do this is to set the type of the destination register
to UW (unsigned word), since 16 unsigned words can fit inside a
256-bit register. Fortunately, this has no effect on the sampling
operation, since the sampler always infers the destination data type
from the sampler message rather than from the type of the instruction
operand.
Previously, we did this for texturing operations issued by the vec4
and fs back-ends, but not for blorp. This patch makes blorp use the
same trick.
I haven't observed any behavioural difference on actual hardware due
to this patch, but it avoids a warning from the simulator so it seems
like the right thing to do.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We originally had a path just did the loop and called
ctx->Driver.AllocTextureImageBuffer(), which I moved into Mesa core. But
we can do better, avoiding incorrect miptree size guesses and later
texture validations by just directly allocating the miptree and setting it
to all the images.
v2: drop debug printf.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
| |
I've rewritten a lot of this file.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
| |
No change in copies during a piglit run, but it's one less first_level !=
0 in our codebase.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As long as the baselevel, maxlevel still sit inside the range we had
previously validated, there's no need to reallocate the texture.
I also hope this makes our texture validation logic much more obvious.
It's taken me enough tries to write this change, that's for sure. Reduces
miptree copy count on a piglit run by 1.3%, though the change in amount of
data moved is much smaller.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Given that a teximage that calls us with this flag set will immediately
proceed to allocate the other levels, we can probably just go ahead and
allocate those levels now.
Reduces miptree copies in piglit by about .05%.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the caller shows up with GL_BASE_LEVEL != 0, it doesn't mean that the
texture will over the course of its lifetime have that nonzero baselevel,
it means that the caller is filling the texture from the bottom up for
some reason (one could imagine demand-loading detailed texture layers at
runtime, for example). If we allocate from just the current baselevel, it
means when they come along with the next level up, we'll have to allocate
a new miptree and copy all of our bits out of the first miptree.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's say you started allocating your 2D texture with level 2 of a tree as
a 1x1 image. The driver doesn't know if this means that level 0 is 4x4 or
4x1 or 1x4, so we would just allocate a single 1x1 and let it get copied
in to the real location at texture validate time later.
Since this is just a temporary allocation that *will* get copied, the
extra space allocation of just taking the normal path which will happen to
producing a 4x1 level 0, 2x1 level 1, and 1x1 level 2 is the right way to
go, to reduce complexity in the normal case.
No change in miptree copies over the course of a piglit run.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has no effect currently, because intel_finalize_mipmap_tree() always
makes mt->first_level == tObj->BaseLevel.
The change I made before to handle it
(b1080cfbdb0a084122fcd662cd27b4748c5598fd) got very close to working, but
after fixing some unrelated bugs in the series, it still left
tex-miplevel-selection producing errors when testing textureLod(). The
problem is that for explicit LODs, the sampler's LOD clamping is ignored,
and only the surface's MIP clamping is respected. So we need to use
surface mip clamping, which applies on top of the sampler's mip clamping,
so the sampler change gets backed out.
Now actually tested with a non-regressing series producing a non-zero
computed baselevel.
Reviewed-by: Chad Versace <[email protected]>
|