| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the SF only handled the builtin color varying specially.
This patch generalizes that support to cover user-defined varyings,
driven by the interpolation mode array set up alongside the VUE map.
Based on the following patches from Olivier Galibert:
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024335.html
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024339.html
With this patch, all the GLSL 1.3 interpolation tests that do not clip
(spec/glsl-1.30/execution/interpolation/*-none.shader_test) pass.
V5: Move key.do_flat_shading to brw_sf_compile.has_flat_shading; drop
vestigial hunks.
V6: Real bools.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have ir->print() to do the old declaration of a visitor and having the
IR accept the visitor (yuck!). And now you can call _mesa_print_ir()
safely anywhere that you know what an ir_instruction is.
A couple of missing printf("\n")s are added in error paths -- when an
expression is handed to the visitor, it doesn't print '\n' (since it might
be a step in printing a whole expression tree).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Okay I now understand why Frank would want to run away, this is
my attempt at fixing the CVE out of bounds access to constants
outside the range. This attempt converts any illegal constants
to constant 0 as per the GL spec, and is undefined behaviour.
A future patch should add some debug for users to find this out,
but this needs to be backported to stable branches.
CVE-2013-1872
v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.
v3: don't fail piglit, the whole 8/16 dispatch stuff was over
my head, and I spent a while figuring it out, but this one is
definitely safe, one piglit pass extra on my Ironlake.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 98dfd59a0445666060c97b0dccaf0e9f030b547a.
The patch was clearly not Piglit tested, as it caused at least 225
tests to start crashing with assertion failures. That was before my
desktop tanked and the test run died completely.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is my attempt at fixing this as the CVE is making RH security team
care enough to make me look at this. (please upstream, security fixes are
more important than whatever else you are doing, if for no other reason than
it saves me having to fix stuff I've no real clue about).
Since Frank's original fix was denied, here is my attempt to just
alias all constants that are out of bounds < 0 or > nr_params to constant 0,
hopefully this provides the undefined behaviour idr requires..
CVE-2013-1872
v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we would sometimes not consider a write to a register to
extend the end of the interval, nor would we consider a read before a
write to extend the start. This made for a bunch of complicated logic
related to how to treat the results when dead code might be present.
Instead, just extend the interval and fix dead code elimination to know
how to remove it.
Interestingly, this actually results in a tiny bit more optimization:
total instructions in shared programs: 1391220 -> 1390799 (-0.03%)
instructions in affected programs: 14037 -> 13616 (-3.00%)
v2: Fix a theoretical problem with the simd16 workaround if dst == src,
where we would revert the bump of the live range.
Reviewed-by: Ian Romanick <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't bother scalarizing ir_binop_bfm, since its results are
identical for all channels.
v2: Subtract result of FBH from 31 (unless an error) to convert
MSB counts to LSB counts.
v3: Use op0->clone() in ir_triop_bfi to prevent (var_ref
channel_expressions) from appearing multiple times in the IR.
Reviewed-by: Chris Forbes <[email protected]> [v2]
|
|
|
|
|
|
|
|
|
|
| |
I need this so I can look at vec4 and fs registers' files from the same
.cpp file without namespaces. As far as I can tell we never rely on the
particular numerical values of the files, though I thought it sounded like
a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.
This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.
This also makes is_control_flow() available in the VS.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.
Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistant values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.
Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Tested-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are
due to FBO-rendering size predictions). We currently expose
GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm
about to send a patch for removing that silly extension in that case.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't want to store this thing in the class, and we do need the
definition to be at the top of the function and held onto until the end
here, so there's not much to do besides (void) reference it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
It's used in an assert, but we have this as a member of the class anyway.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have this support for firsthalf/sechalf instructions, which would be
called in the !has_compr4 (aka original gen4) 16-wide case. We currently
only support 16-wide for gen5+, so we weren't tripping over this, but it
would have been a problem if we ever try to enable it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a poor substitute for proper global dead code elimination that
could replace both our current paths, but it was very easy to write. It
particularly helps with Valve's shaders that are translated out of DX
assembly, which has been register allocated and thus have a bunch of
unrelated uses of the same variable (some of which get copy-propagated
from and then left for dead).
shader-db results:
total instructions in shared programs: 1735753 -> 1731698 (-0.23%)
instructions in affected programs: 492620 -> 488565 (-0.82%)
v2: Fix comment typo
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This instruction doesn't update its IR destination, it just moves from
payload to f0. This caused the dead code elimination pass I'm adding to
dead-code-eliminate the first step of interpolation.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
These checks were all over, and every time I wrote one I had to try to
decide again what the cases were for partial updates.
v2: Fix inadvertent reladdr check removal.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Also change if (shader) to if (prog) for consistency.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The next patch is going to change the type of vec4_visitor::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_visitor::prog. However, prog is
already used in backend_visitor (which vec4_visitor derives from).
Since backend_visitor::prog is of type struct gl_shader_program *, it
makes sense to rename it to shader_prog.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous commit removed the last user of this field, so there's no
longer any point in setting it. Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.
Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask. Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles. We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.
Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all. Generally, older
fixed-function programs run fine on modern hardware anyway.
The optimization has existed in some form since the initial commit. When
we rewrote the fragment shader backend, we dropped it for a while. Eric
readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.
Time has passed, so I decided to re-measure this. Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was
no statistically significant difference (n = 37).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.
v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
| |
We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too. We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug. This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).
With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.
v2: Move old offset computation into the pre-gen7 path.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
| |
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency. But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.
Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.
No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We weren't inserting it into the list, so it did nothing. This line was
replaced by the MOV/MUL block above.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader. Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue. This is normally good.
However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp. The frontend emits IR for this just before
FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering:
1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue
This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written. This obviously
led to inaccurate results.
This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections. This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:
1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue
For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This will let us do much better printouts for non-GLSL programs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previous to this patch, when using fixed function fragment shading,
bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask was being set
differently during precompiles and normal usage. During precompiles
it was being set only if the fragment shader reads from window
position (which it never does), so it was always being set to 0.
During normal usage it was being set if the vertex shader writes to
all 4 components of gl_Position (which it usually does), so it was
usually being set to 1. As a result, we were almost always doing an
extra recompile for the fixed function fragment shader.
The recompile was totally unnecessary, though, because
brw_wm_prog_key::proj_attrib_mask is only consulted for
fs_visitor::emit_general_interpolation(), which isn't used for
VARYING_SLOT_POS.
This patch avoids the unnecessary recompile by always setting bit
VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask to 1.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes the following search-and-replace changes:
gl_frag_attrib -> gl_varying_slot
FRAG_ATTRIB_* -> VARYING_SLOT_*
FRAG_BIT_* -> VARYING_BIT_*
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that there is no difference between the enums that represent
vertex outputs and fragment inputs, there's no need for a conversion
function. But we still need to be able to detect when a given vertex
output has no corresponding fragment input. So it is replaced by a
new function, _mesa_varying_slot_in_fs(), which tells whether the
given varying slot exists as an FS input or not.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes the following search-and-replace changes:
gl_vert_result -> gl_varying_slot
VERT_RESULT_* -> VARYING_SLOT_*
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates the bitfields brw_context::wm.input_size_masks,
tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of
which are indexed by gl_frag_attrib, from 32-bit to 64-bit.
This paves the way for supporting geometry shaders, and for merging
the gl_frag_attrib and gl_vert_result enums. The combination of these
two will require at least 55 bits in the bitfields.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids some snooping overhead between EUs processing separate shaders
(so VS versus FS).
Improves performance of a minecraft trace with shader_time by 28.9% +/-
18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).
v2: Add a define for the stride with a comment explaining its units and
why.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were handling the the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs. Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, ovewrite the other. This was occurring in
XBMC's GLSL shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When forcing the compiler to always generate pull constants instead of
push constants (in order to have an easy to use testcase), improves
performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
Reviewed-by: Kenneth Graunke <[email protected]>
|