author | Ian Romanick <[email protected]> | 2018-02-16 17:26:11 -0800 |
---|---|---|
committer | Ian Romanick <[email protected]> | 2018-03-08 15:26:26 -0800 |
commit | 360899d4577a2431dc73b5c702d60ac6bd59ca07 (patch) | |
tree | 8f018c2161d30ef3857510aae54e72a76e6dd2dc /src/intel/compiler | |
parent | 52c7df1643ec9af119fd66f916f7fbdbcc798d2d (diff) |
i965/vec4: Relax writemask condition in CSE
If the previously seen instruction generates more fields than the new
instruction, still allow CSE to happen. This doesn't do much, but it
also enables a couple more shaders in the next patch. It helped quite a
bit in another change series that I have (at least for now) abandoned.
v2: Add some extra commentary about the parameters to instructions_match.
Suggested by Ken.
No changes on Skylake, Broadwell, Iron Lake or GM45.
Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11780295 -> 11780294 (<.01%)
instructions in affected programs: 302 -> 301 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 257308315 -> 257308313 (<.01%)
cycles in affected programs: 2074 -> 2072 (-0.10%)
helped: 1
HURT: 0
Sandy Bridge
total instructions in shared programs: 10506687 -> 10506686 (<.01%)
instructions in affected programs: 335 -> 334 (-0.30%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Diffstat (limited to 'src/intel/compiler')
-rw-r--r-- | src/intel/compiler/brw_vec4_cse.cpp | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/src/intel/compiler/brw_vec4_cse.cpp b/src/intel/compiler/brw_vec4_cse.cpp
index 2e65ef78548..d9f08c96317 100644
--- a/src/intel/compiler/brw_vec4_cse.cpp
+++ b/src/intel/compiler/brw_vec4_cse.cpp
@@ -112,6 +112,14 @@ operands_match(const vec4_instruction *a, const vec4_instruction *b)
    }
 }
 
+/**
+ * Checks if instructions match, exactly for sources, but loosely for
+ * destination writemasks.
+ *
+ * \param 'a' is the generating expression from the AEB entry.
+ * \param 'b' is the second occurrence of the expression that we're
+ * considering eliminating.
+ */
 static bool
 instructions_match(vec4_instruction *a, vec4_instruction *b)
 {
@@ -127,7 +135,7 @@ instructions_match(vec4_instruction *a, vec4_instruction *b)
           a->base_mrf == b->base_mrf &&
           a->header_size == b->header_size &&
           a->shadow_compare == b->shadow_compare &&
-          a->dst.writemask == b->dst.writemask &&
+          ((a->dst.writemask & b->dst.writemask) == a->dst.writemask) &&
           a->force_writemask_all == b->force_writemask_all &&
           a->size_written == b->size_written &&
           a->exec_size == b->exec_size &&
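
Note on the relaxed condition: the new expression is a bitwise subset test. It holds when every channel set in a's writemask is also set in b's, so b may write extra channels. The following is a minimal standalone sketch of that test, not code from the patch. The MASK_* constants, the helper names, and the assumption that a writemask is a 4-bit field with one bit per channel are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-channel bits (x, y, z, w). These names are assumptions
// for illustration and do not appear in the patch.
constexpr uint8_t MASK_X = 1u << 0;
constexpr uint8_t MASK_Y = 1u << 1;
constexpr uint8_t MASK_Z = 1u << 2;
constexpr uint8_t MASK_W = 1u << 3;
constexpr uint8_t MASK_XY = MASK_X | MASK_Y;
constexpr uint8_t MASK_XYZW = MASK_X | MASK_Y | MASK_Z | MASK_W;

// Old condition: the two writemasks must be identical.
constexpr bool writemasks_equal(uint8_t a, uint8_t b)
{
   return a == b;
}

// New condition: every channel in a must also be in b. ANDing with b
// clears any bit of a that b lacks, so the result equals a only when
// a is a subset of b.
constexpr bool writemask_is_subset(uint8_t a, uint8_t b)
{
   return (a & b) == a;
}

// A few worked cases.
void demo()
{
   assert(!writemasks_equal(MASK_XY, MASK_XYZW));    // old test rejects
   assert(writemask_is_subset(MASK_XY, MASK_XYZW));  // new test accepts
   assert(!writemask_is_subset(MASK_XYZW, MASK_XY)); // not symmetric
   assert(writemask_is_subset(MASK_XY, MASK_XY));    // equal masks still match
}
```

The test is not symmetric, so the argument order at the call site matters. Equal masks pass both the old and the new condition, which means every match accepted before the patch is still accepted.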