| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
We're going to redo discard handling to track discards in the other flag
subregister, saving instructions in the discard and allowing predicated
jumps out to the end of the shader.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
We're about to start using the f0.1 subregister.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As far as I can see, the intention of the requirement that we do so is to
prevent instruction prefetch from wandering out into either unmapped memory or
memory with a different caching type, and hanging the chip. The kernel makes
sure that the page after your BO has a valid page of the same caching type,
which meets this requirement, so there's no need to waste space between our
programs (and in instruction cache) on this.
Saves another 9kb instructions in l4d2 shaders.
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Reduces l4d2 program size from 1195kb to 919kb. Improves performance by 0.22%
+/- 0.11% (n=70).
v2: Rebase on compaction v2, fix up flag reg handling (by anholt).
v3: Fix uncompaction of the flag register number.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.
v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's going to get more complicated when we do instruction compaction. This
also introduces putting the program offset in the output.
v2: Use next_insn_offset in brw_get_program(), too.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
I noticed in valgrind that p->single_program_flow was used while
uninitialized. Everything else zeroed out brw_compile, but this is better
API.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
| |
There was a chance for brw_wm_emit.c to screw up and pass (1 << 4) instead of
1, which would get converted to 0 when stored. Instead, use stdbool which
converts nonzero to true/1 like we want.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is to rewrite comparisons like 2 >= x with x <= 2; we want
to simply exchange arguments, not negate the condition. If equality was
part of the original comparison, it should remain part of the swapped
version.
This is the true cause of bug #50298. It didn't manifest itself on
Sandybridge because we embed the conditional modifier in the IF
instruction rather than emitting a CMP. All other platforms use CMP.
It also didn't manifest itself on the master branch because commit
be5f27a84d ("glsl: Refine the loop instruction counting.") papered over
the problem.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50298
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here is the final patch to enable dynamic eu instruction store size:
increase the brw eu instruction store size dynamically instead of just
allocating it statically with a constant limit. This would fix something
that 'GL_MAX_PROGRAM_INSTRUCTIONS_ARB was 16384 while the driver would
limit it to 10000'.
v2: comments from ken, do not hardcode the eu limit to (1024 * 1024)
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dynamic instruction store size is enabled, while after
the brw_IF/ELSE() and before the brw_ENDIF() function, the
eu instruction store base address(p->store) may change.
Thus let if_stack just store the instruction index. This is
somehow more flexible and safe than store the instruction
memory address.
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The codegen backends all had this same tracking, so just do it at the
EU level.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
| |
This is a similar cleanup to what we did for brw_IF(), brw_ELSE(),
brw_ENDIF() handling.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_set_compression_control took a GLboolean as an argument, then
promptly used a switch statement to compare it with various enumeration
values. Clearly it's not actually a boolean.
Introduce a new enumeration type, enum brw_compression, and use that.
Found by converting GLboolean to bool; clang then gave warnings about
switching on a boolean and ultimately duplicated case errors.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace each occurence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This is like what we do with add/mul, but we also have to flip the
conditional test.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Since the 8-wide first-quarter and 16-wide first-half have the same
bit encoding, we now need to track "do you want instruction
compression" in the compile state.
|
|
|
|
| |
Whenever the accumulator results are needed, this bit must be set.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the prog_instruction::Data field was used to map original Mesa
instructions to brw instructions in order to resolve subroutine calls. This
was a rather tangled mess. Plus it's an obstacle to implementing dynamic
allocation/growing of the instruction buffer (it's still a fixed size).
Mesa's GLSL compiler emits a label for each subroutine and CAL instruction.
Now we use those labels to patch the subroutine calls after code generation
has been done. We just keep a list of all CAL instructions that needs patching
and a list of all subroutine labels. It's a simple matter to resolve them.
This also consolidates some redundant post-emit code between brw_vs_emit.c and
brw_wm_glsl.c and removes some loops that cleared the prog_instruction::Data
fields at the end.
Plus, a bunch of new comments.
|
| |
|
|
This driver comes from Tungsten Graphics, with a few further modifications by
Intel.
|