| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Now that loop_controls no longer creates normatively bound loops,
there is no need for ir_loop::normative_bound or the
lower_bounded_loops pass.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the
i965 fs and vec4 visitors) had nearly identical logic for handling
bounded loops. This replaces the duplicate logic with an equivalent
lowering pass that is used by all the back-ends.
Note: on i965, there is a slight increase in instruction count. For
example, a loop like this:
for (int i = 0; i < 100; i++) {
total += i;
}
would previously compile down to this (vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) break(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
After this patch, the "(+f0) break(8)" turns into:
(+f0) if(8)
break(8)
endif(8)
because the back-end isn't smart enough to recognize that "if
(condition) break;" can be done using a conditional break instruction.
However, it should be relatively easy for a future peephole
optimization to properly optimize this.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fast color clears of MSAA buffers work just like fast color clears
with non-MSAA buffers, except that the alignment and scaledown
requirements are different.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will make it easier to add fast color clear support to MSAA
buffers, since they have different alignment and scaling requirements.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we didn't do multisample blorp clears because we couldn't
figure out how to get them to work. The reason for this was because
we weren't setting the brw_blorp_params num_samples field consistently
with dst.num_samples. Now that those two fields have been collapsed
down into one, we can do multisample blorp clears.
However, we need to do a few other pieces of bookkeeping to make them
work correctly in all circumstances:
- Since blorp clears may now operate on multisampled window system
framebuffers, they need to call
intel_renderbuffer_set_needs_downsample() to ensure that a
downsample happens before buffer swap (or glReadPixels()).
- When clearing a layered multisample buffer attachment using UMS or
CMS layout, we need to advance layer by multiples of num_samples
(since each logical layer is associated with num_samples physical
layers).
Note: we still don't do multisample fast color clears; more work needs
to be done to enable those.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch renames the enum that's used to keep track of fast clear
state from "mcs_state" to "fast_clear_state", and it removes the enum
value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA
buffer, so we're not keeping track of fast clear state"). The only
real purpose that enum value was serving was to prevent us from trying
to do fast clear resolves on MSAA buffers, and it's just as easy to
prevent that by checking the buffer's msaa_layout.
This paves the way for implementing fast clears of MSAA buffers.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The hardware blitter doesn't understand multisampled layouts, so
there's no way this could possibly succeed.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "layer" parameters used in blorp, and the
intel_renderbuffer::mt_layer field, represent a physical layer rather
than a logical layer. This is important for 2D multisample arrays on
Gen7+ because the UMS and CMS multisample layouts use N physical
layers to represent each logical layer, where N is the number of
samples.
Also add an assertion to blorp to help catch bugs if we fail to follow
these conventions.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Clarify the fact that we only optimize full buffer clears using fast
color clear, and why.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.
Signed-off-by: Chris Forbes <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.
V3: Use actual MSAA layout from the texture's mt, rather
then computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)
This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously this was only done for render targets.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bspec says:
"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"
I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Haswell needs a copy of the sample mask in 3DSTATE_PS; this makes that
convenient.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
These are identical: main/compiler.h defines INLINE to "inline".
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
sed -i -e 's/GLuint/unsigned/g' -e 's/GLint/int/g' \
-e 's/GLfloat/float/g' -e 's/GLubyte/uint8_t/g' \
-e 's/GLshort/int16_t/g' \
brw_eu* brw_disasm.c brw_structs.h
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Performed via:
$ for file in *; do sed -i 's/ *//g'; done
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Performed via s/ *$//g.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 1550449 -> 1550048 (-0.03%)
instructions in affected programs: 15207 -> 14806 (-2.64%)
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
| |
total instructions in shared programs: 1550713 -> 1550449 (-0.02%)
instructions in affected programs: 7931 -> 7667 (-3.33%)
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
One program affected.
instructions in affected programs: 436 -> 428 (-1.83%)
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:
(+f0) IF
MOV dst0 ...
MOV dst1 ...
MOV dst2 ...
ELSE
MOV dst0 ...
MOV dst1 ...
MOV dst3 ...
ENDIF
This commit extends the peephole to handle this case.
No shader-db changes.
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).
This peephole fixes both of these and operates as a normal optimization
pass.
fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.
total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs: 167120 -> 153824 (-7.96%)
GAINED: 13
LOST: 6
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
| |
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is the last thing that register_coalesce() still handled.
total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
instructions in affected programs: 15758 -> 15606 (-0.96%)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
make_list is just a one-line wrapper and was confusingly called by
NULL objects. E.g., cur_if == NULL; cur_if->make_list(mem_ctx).
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|