| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Removes the need of API_DEFINES.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The shadow comparitor needs to be loaded into the Z component of the
last DWord.
Fixes es3conform's shadow_execution_vert and oglconform's
shadow-grad advanced.textureGrad.1D tests on Haswell.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the Ivybridge PRM, Volume 4 Part 1, page 130, in the
section for the sample_d message: "The r coordinate contains the faceid,
and the r gradients are ignored by hardware."
This doesn't match GLSL, which provides gradients for all of the
coordinates. So we would need to do some math to compute the face ID
before using sample_d. We currently don't have any code to do that.
However, we do have a lowering pass that converts textureGrad to
textureLod, which solves this problem. Since textureGrad on three
components is sufficiently obscure, it's not a performance path.
For now, only handle samplerCubeShadow; we need tests for samplerCube
and samplerCubeArray.
Fixes es3conform's shadow_comparison_frag test on Haswell.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The upside is less CPU overhead in fiddling with GL error handling, the
ability to use the constant color write message in most cases, and no GLSL
clear shaders appearing in MESA_GLSL=dump output. The downside is more
batch flushing and a total recompute of GL state at the end of blorp.
However, if we're ever going to use the fast color clear feature of CMS
surfaces, we'll need this anyway since it requires very special state
setup.
This increases the fail rate of some the GLES3conform ARB_sync tests,
because of the initial flush at the start of blorp. The tests already
intermittently failed (because it's just a bad testing procedure), and we
can return it to its previous fail rate by fixing the initial flush.
Improves GLB2.7 performance 0.37% +/- 0.11% (n=71/70, outlier removed).
v2: Rename the key member, use the core helper for sRGB, and use
BRW_MASK_* enums, fix comment and indentation (review by Paul).
v3: Rewrite a comment, drop a silly temporary variable (review by Ken)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: const-qualify ctx, and add a comment about the function (recommended
by Brian and Kenneth).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 performance 0.13% +/- 0.09% (n=104/105, outliers removed).
More importantly, once color glClear()s are done through blorp in the next
commit, this reduces regression in GLES3 conformance tests that rely on
queueing up many glClear()s and having the GPU report being still busy in
an ARB_sync query after that.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The remaining bits happen to do nothing that
_swrast_span_render_start()/finish() don't do.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This could be used by shader-db for hopefully more accurate regression
testing.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.
This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.
This also makes is_control_flow() available in the VS.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.
Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistant values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.
Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Tested-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
The order or arguments matches DirectX, and is backwards from GLSL's
mix() built-in.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63983
|
|
|
|
|
|
|
|
|
|
| |
Only 13 affected programs in shader-db, but they were all helped.
total instructions in shared programs: 368877 -> 368851 (-0.01%)
instructions in affected programs: 1576 -> 1550 (-1.65%)
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Three-source instructions have a vertical stride overloaded to 4, which
prevents directly using vec4 uniforms as arguments. Instead we need to
insert a MOV instruction to do the replication for the three-source
instruction.
With this in place, we can use three-source instructions in the vertex
shader. While some thought needs to go into deciding whether its better
to use a three-source instruction rather than a sequence of equivalent
instructions (when one or more sources are uniforms or immediates), this
will allow us to skip a lot of ugly lowering code and use the BFE and
BFI2 instructions directly.
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are
due to FBO-rendering size predictions). We currently expose
GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm
about to send a patch for removing that silly extension in that case.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
From low to high bits, the sample positions are packed y0,x0,y1,x1...
Fixes arb_texture_multisample-sample-position piglit.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike GEN6, the bits of entry count are distributed like this
width = (entry_count & 0x0000007f); /* bits [6:0] */
height = (entry_count & 0x001fff80) >> 7; /* bits [20:7] */
depth = (entry_count & 0x7fe00000) >> 21; /* bits [30:21] */
The maximum entry count is still limited to 2^27.
This was noted while going over the PRM. No test is impacted, because
1<<20 (the bit that moved) is much larger than GL_UNIFORM_BLOCK_MAX_SIZE,
GL_MAX_TEXTURE_BUFFER_SIZE, or MAX_*_UNIFORM_COMPONENTS.
v2: Explain more in the commit message (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The inverse repeat count should taks up bits 31:15 and is in U1.16. Fixes
the "Restarting lines within a single Begin/End block" subtest of piglit
linestipple, and gets the other failing subtests much closer to passing.
v2: Rewrite commit message with more detailed piglit info (by anholt)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Wrong fields were used when dumping width and height.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Never existed? At least never supported. Doesn't appear in 965, G45,
or ILK documentation.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We don't want to set the flag for Gallium.
I think only swrast needs the flag to be set for occlusion queries.
v2: fix stats_wm updates in i965
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
No driver checks the flag. Nobody uses it.
I also removed the FLUSH_VERTICES calls, because PixelStorei has no effect
on rendering.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_NEW_TRANSFORM_FEEDBACK is not used by core Mesa, so it can be removed.
Instead, an new private flag is added to i965 to serve the same purpose.
If you're new to this:
* When creating a context. you can set private dirty flags
in gl_context::DriverFlags, eg.:
ctx->DriverFlags.NewStateX = BRW_NEW_STATE_X;
* When StateX is changed, core Mesa does:
ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
* When you have to draw, read and clear ctx->NewDriverState.
* Pros: not touching NewState, the driver decides the mapping between
GL states and hw state groups, unlimited number of flags in core Mesa
(still limited number of flags in the driver though)
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Probably a copy-n-paste mistake.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The code doesn't set brw->query.obj to NULL, it sets query->bo to NULL.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Listed in the restrictions section of CMP, but not on the work-arounds
page.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This matches u_minify()'s behavior, for consistency.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit ecdda414d361ab4430fd5747c9217687c1f3d63f.
Commit was supposed to be a simple typo fix. Clearly needs more
investigating.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63688
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes issue identified by Klocwork analysis:
'attribute_map' array elements might be used uninitialized in this
function (vec4_visitor::lower_attributes_to_hw_regs).
The attribute_map array contains the mapping from shader input
attributes to the hardware registers they are stored in.
vec4_vs_visitor::setup_attributes() only populates elements of this
array which, according to core Mesa, are actually used by the shader.
Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses
the array to lower a register access in the shader, it should in
principle only access elements of attribute_map that contain valid
data. However, if a bug ever caused the driver back-end to access an
input that was not flagged as used by core Mesa, then
lower_attributes_to_hw_regs() would access uninitialized memory, which
could cause illegal instructions to get generated, resulting in a
possible GPU hang.
This patch makes the situation more robust by using memset() to
pre-initialize the attribute_map array to zero, so that if such a bug
ever occurred, lower_attributes_to_hw_regs() would generate a (mostly)
harmless access to r0. In addition, it adds assertions to
lower_attributes_to_hw_regs() so that if we do have such a bug, we're
likely to discover it quickly.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
It was all over the formats section I wanted to edit.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I don't see a sensible value to use in this path, but we shouldn't ever
hit this outside of developer new-texture-target enabling.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't want to store this thing in the class, and we do need the
definition to be at the top of the function and held onto until the end
here, so there's not much to do besides (void) reference it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
This was copy and pasted from can_reswizzle_dst(), and we can just fold it
in instead to avoid the warning.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
I think this actually clarifies what's going on in the asserts a bit,
given how many regions we've got floating around.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
It's used in an assert, but we have this as a member of the class anyway.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have this support for firsthalf/sechalf instructions, which would be
called in the !has_compr4 (aka original gen4) 16-wide case. We currently
only support 16-wide for gen5+, so we weren't tripping over this, but it
would have been a problem if we ever try to enable it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a poor substitute for proper global dead code elimination that
could replace both our current paths, but it was very easy to write. It
particularly helps with Valve's shaders that are translated out of DX
assembly, which has been register allocated and thus have a bunch of
unrelated uses of the same variable (some of which get copy-propagated
from and then left for dead).
shader-db results:
total instructions in shared programs: 1735753 -> 1731698 (-0.23%)
instructions in affected programs: 492620 -> 488565 (-0.82%)
v2: Fix comment typo
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This instruction doesn't update its IR destination, it just moves from
payload to f0. This caused the dead code elimination pass I'm adding to
dead-code-eliminate the first step of interpolation.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
These checks were all over, and every time I wrote one I had to try to
decide again what the cases were for partial updates.
v2: Fix inadvertent reladdr check removal.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Also change if (shader) to if (prog) for consistency.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the vec4_visitor and vec4_generator classes are going to be
re-used for geometry shaders, we can't enable their debug
functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add
a debug_flag boolean to these two classes, so that when they're
instantiated the caller can specify whether debug dumps are needed.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|