v2: Properly use "conditional_mod" pre-Gen6, rather than the incorrectly
copy-and-pasted "BRW_CONDITIONAL_G".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
This will become necessary once we start supporting ARB programs and
fixed function in this backend.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Don't forget to set depth_mt even if !hiz_mt.
Reviewed-by: Kenneth Graunke <[email protected]>
This was missing and got labeled "Something else".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
When copy_array_to_vbo_array encountered an array with src_stride == 0
and dst_stride != 0, we would replicate out the single element to the
whole size (max - min + 1). This is unnecessary: we can simply upload
one copy and set the buffer's stride to 0.
Decreases vertex upload overhead in an upcoming Steam for Linux title.
Prior to this patch, copy_array_to_vbo_array appeared very high in the
profile (Eric quoted 20%). After the patch, it disappeared completely.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
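
A minimal self-contained sketch of the idea (hypothetical names and a plain
malloc stand-in, not the actual brw_draw_upload code):

    #include <cstdlib>
    #include <cstring>

    struct upload_result {
       void *data;
       size_t size;
       unsigned stride;
    };

    /* If the source stride is 0, every element is identical: upload a
     * single copy and give the buffer a stride of 0 so the hardware
     * re-reads that one element, instead of replicating it
     * (max - min + 1) times. */
    static upload_result
    copy_array(const void *src, size_t elem_size, unsigned src_stride,
               unsigned dst_stride, unsigned count)
    {
       upload_result out;
       if (src_stride == 0) {
          out.size = elem_size;
          out.data = malloc(out.size);
          memcpy(out.data, src, elem_size);
          out.stride = 0;
          return out;
       }
       /* General path: repack count elements at dst_stride. */
       out.size = (size_t)dst_stride * count;
       out.data = malloc(out.size);
       for (unsigned i = 0; i < count; i++)
          memcpy((char *)out.data + (size_t)i * dst_stride,
                 (const char *)src + (size_t)i * src_stride, elem_size);
       out.stride = dst_stride;
       return out;
    }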
This essentially reverts the following:
    commit c625aa19cb53ed27f91bfd16fea6ea727e9a5bbd
    Author: Chris Wilson <[email protected]>
    Date:   Fri Feb 18 10:37:43 2011 +0000

        intel: extend current vertex buffers
While working on optimizing an upcoming Steam title, I broke this code.
Eric expressed his doubts about this optimization, and noted that the
original commit offered no performance data.
I ran before and after benchmarks on Xonotic and Citybench, and found
that this code made no difference. So, remove it to reduce complexity
and make future work simpler.
Reviewed-by: Eric Anholt <[email protected]>
This is a leftover from when we had to split those two functions due to
the separate BO validation step.
Reviewed-by: Kenneth Graunke <[email protected]>
"Active" is an already-used term for the query being between
glBeginQuery() and glEndQuery(), while this is tracking whether the
start of the packet pair for emitting state has been inserted into the
current batchbuffer.
Reviewed-by: Kenneth Graunke <[email protected]>
Consider the following code, which reinterprets a register as a
different type:
mov(8) g6<1>F g1.4<0,4,1>.xF
and(8) g5<1>.xUD g6<4,4,1>.xUD 0x7fffffffUD
Copy propagation would notice that we can replace the use of g6 with
g1.4 and eliminate the MOV. Unfortunately, it failed to preserve the UD
type, incorrectly generating:
and(8) g5<1>.xUD g1.4<0,4,1>.xF 0x7fffffffUD
Found while debugging Ian's uncommitted ARB_vertex_program LOG opcode
test with my new Mesa IR -> Vec4 IR translator.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
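
A minimal sketch of the fix's shape (hypothetical types; the real pass works
on the vec4 IR's source registers):

    enum reg_type { TYPE_F, TYPE_UD };

    struct src_reg {
       int file, nr, swizzle;
       reg_type type;
    };

    /* When replacing a use of the MOV's destination with the MOV's
     * source, keep the type the consuming instruction was reading
     * with (UD above), not the type the MOV wrote (F). */
    static void
    propagate_src(src_reg &inst_src, const src_reg &mov_src)
    {
       reg_type saved_type = inst_src.type;
       inst_src = mov_src;         /* take g1.4's file/nr/swizzle */
       inst_src.type = saved_type; /* ...but keep reading it as UD */
    }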
Consider the following code sequence:
mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mov.sat(8) m1<1>.xyF g4<4,4,1>F
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions. Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F
Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination. While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:
mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.
When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source. It ensures that all
necessary channels are available (possibly written by several
instructions). In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.
This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code. However, it should be possible
with GLSL programs as well.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
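
A minimal sketch of the writemask intersection (hypothetical types, not the
actual compute-to-MRF pass):

    struct dst_reg {
       int file, nr;
       unsigned writemask; /* bit 0 = x, 1 = y, 2 = z, 3 = w */
    };

    /* When rewriting scan_inst to write the MRF directly, restrict its
     * writemask to the channels the MOV actually moved: xyzw & xy == xy,
     * preserving the .xy destination instead of clobbering all four. */
    static void
    rewrite_to_mrf(dst_reg &scan_dst, const dst_reg &mov_dst)
    {
       scan_dst.file = mov_dst.file;
       scan_dst.nr = mov_dst.nr;
       scan_dst.writemask &= mov_dst.writemask;
    }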
While copying the values into the batch space, we advance the param
pointer. The debug code then tries to iterate over all the uploaded
values, starting at param...which is now the end of the uploaded data,
rather than the start.
This patch saves a pointer to the start of push constant space before
it gets altered and switches the debug code to use that.
Tested by uncommenting the code and examining the output of
glsl-vs-clamp-1.shader_test. Previously all values appeared to be zero.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
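
A minimal sketch of the bug's shape (hypothetical function, not the actual
push-constant upload code):

    #include <cstdio>

    static void
    upload_push_constants(const float *values, unsigned count, float *batch)
    {
       float *param = batch;
       const float *param_start = param; /* saved before the copy advances it */

       for (unsigned i = 0; i < count; i++)
          *param++ = values[i];

       /* The dump must start at param_start; param now points past the
        * end of the uploaded data, which is why the old debug output
        * never showed the real values. */
       for (unsigned i = 0; i < count; i++)
          printf("param %u: %f\n", i, param_start[i]);
    }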
v2: Fix mangled sentence in the comment, and make the loop exit early.
Reviewed-by: Ian Romanick <[email protected]> (v1)
Given the use case we have of trying to measure timestamps across individual
draw calls, flushing will totally mess up what people are trying to measure.
Reviewed-by: Kenneth Graunke <[email protected]>
The theory I had when I wrote the code was that you wanted to minimize latency
on your queries because the app was going to ask soon. Only, it turns out
that everybody batches up their queries and asks for the results later (often
after the next SwapBuffers!), so this was a pessimization.
Until now, I had no workload where it mattered enough to benchmark. Recently
I started playing some Minecraft, which uses tons of queries to decide whether
to render chunks of the terrain. For that app, avoiding the flush in the
query-generation loop improves performance 22.7% +/- 4.7% (n=3) on an apitrace
capture of it (confirmed in game by watching the fps meter found by pressing
F3, 15/16 -> 20/21 fps).
Reviewed-by: Kenneth Graunke <[email protected]>
I'm amazed that my usual warnings check didn't catch this, and that this
passed piglit.
otherwise some compilers will throw an error:
"error: format not a string literal and no format arguments"
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Now that we've replaced all the variable settings other than reg_width, it's
easy to hang on to this (the expensive part of setting up the allocator).
Reviewed-by: Kenneth Graunke <[email protected]>
This should also reduce register pressure on gen7+, like the previous commit.
Reviewed-by: Kenneth Graunke <[email protected]>
Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
1.0% (n=15), by eliminating register spilling. (tested by smashing the list of
scenes to have all other scenes have 0 duration -- includes additional
rendering of scene description text that normally doesn't appear in that
scene)
v2: Allow allocation of all but g0/g1 of the payload.
v3: Pull count_to_loop_end() out to a helper function.
Reviewed-by: Kenneth Graunke <[email protected]> (v2, recommended v3)
For now, nothing else can get allocated over them, but that will change.
Reviewed-by: Kenneth Graunke <[email protected]>
This was to slot in the magic aligned pairs class, but it got moved to a
descriptive name later.
Reviewed-by: Kenneth Graunke <[email protected]>
Based on split_virtual_grfs(), we choose the same set every time, so set it in
stone. This will help us avoid regenerating the somewhat expensive
class/register set setup every compile.
Reviewed-by: Kenneth Graunke <[email protected]>
This is derived from the FS visitor code for the same, but tracks each channel
separately (otherwise, some typical fill-a-channel-at-a-time patterns would
produce excessive live intervals across loops and cause spilling).
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375
(crash -> failure, can turn into pass by forcing unrolling still)
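
A minimal sketch of per-channel tracking (hypothetical types, not the actual
vec4 live-variable pass; intervals assumed initialized to {-1, -1}):

    struct channel_interval { int start, end; };

    struct vgrf_live {
       channel_interval chan[4]; /* x, y, z, w tracked independently */
    };

    /* Extending only the channels named by chan_mask keeps a
     * fill-one-channel-per-iteration loop from making the whole vec4
     * live across the loop body. */
    static void
    extend_interval(vgrf_live &lv, unsigned chan_mask, int ip)
    {
       for (int c = 0; c < 4; c++) {
          if (!(chan_mask & (1u << c)))
             continue;
          if (lv.chan[c].start < 0)
             lv.chan[c].start = ip;
          lv.chan[c].end = ip;
       }
    }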
These messages always have m0 = g0 and m1 = offset, and write has m2 = data.
Avoids regression in opt_compute_to_mrf() with a change to scratch writes to
set up the data as an MRF write in the IR.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.
Reviewed-by: Kenneth Graunke <[email protected]>
fs_bblock_link -> bblock_link
fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock)
fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg)
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
This will let us reuse brw_fs_cfg.cpp from brw_vec4_*.
Reviewed-by: Kenneth Graunke <[email protected]>
This fixes confusion by the upcoming live variable analysis which saw e.g. use
of temp.w when only temp.xyz were initialized in the basic block, and
concluded that temp.w must have come from outside of the block (even though it
was never initialized anywhere).
Reviewed-by: Kenneth Graunke <[email protected]>
Both callers were doing basically the same thing, just written differently.
Reviewed-by: Kenneth Graunke <[email protected]>
Both callers used (effectively) inst->dst as the argument, so just reference
it.
Reviewed-by: Kenneth Graunke <[email protected]>
This is super basic, but it let me visualize a problem I had with
opt_compute_to_mrf().
Reviewed-by: Kenneth Graunke <[email protected]>
Fixes 51 piglit tests (fbo-clear-formats, and most of the remaining failures
in depthstencil).
Reviewed-by: Kenneth Graunke <[email protected]>
This code is twisty, and the comment before most of the blocks was actually
giving me the opposite impression from its intention: We want to apply as much
of our offset as possible through coarse tile-aligned adjustment, since we can
do so independently per buffer, and apply the minimum we can through
fine-grained drawing offset x/y, since it has to agree between all buffers.
Reviewed-by: Kenneth Graunke <[email protected]>
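
A minimal sketch of the split, assuming a power-of-two tile height
(hypothetical helper, not the actual miptree code):

    /* Apply as much of the y offset as possible per buffer through the
     * coarse, tile-aligned part; only the intra-tile remainder has to
     * go through the fine-grained drawing offset shared by all buffers. */
    static void
    split_tile_offset(unsigned y, unsigned tile_height,
                      unsigned *coarse, unsigned *fine)
    {
       *coarse = y & ~(tile_height - 1); /* per-buffer surface offset */
       *fine = y & (tile_height - 1);    /* shared drawing offset y */
    }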
We now have a case of wanting to do that on gen6+ as well, so make this logic
usable elsewhere.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Now that ARB programs and fixed function are routed through the new
backend, shader might be NULL. Don't do INTEL_DEBUG=perf support in
that case, since it relies on shader->compiled_once.
Since INTEL_DEBUG=perf wasn't previously supported, this maintains the
status quo. It might be nice to support it someday, however.
This could be moved to brw_shader_program instead of brw_shader, but
it appears even prog can be NULL in that case.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
More dead code. I'm not sure what it was for.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
These were only part of NV_fragment_program, so we can kill them.
The fact that PROGRAM_NAMED_PARAM appears in r200_vertprog.c is rather
comedic, but also demonstrates that people just spam the various types
of parameters everywhere because they're confusing.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Nobody uses it anymore.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
We were holding on to this code because we were aware that NWN 1 had some
support for vertex programs -- no other Linux programs I've come across would
use it (since other software also has ARB_vp or GLSL support). Only, it turns
out that NWN doesn't even give us any vertex programs. Given that we have
known issues where the extension has never been fully supported, just give up
on it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46795
Reviewed-by: Brian Paul <[email protected]>
These don't appear in ARB_vp or NV_vp and I missed that fact on the first
pass of removing dead opcodes.
Reviewed-by: Brian Paul <[email protected]>
This should improve our ability to register allocate without spilling.
Unfortunately, due to the live variable analysis being ignorant of loops, we
still have register allocation failures on some programs.
v2: Add more context to the comment explaining the function.
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Before, we'd spill one reg, then continue on without actually register
allocating, then assertion fail when we tried to use a vgrf number as a
register number.
Reviewed-by: Kenneth Graunke <[email protected]>
To validate this code, I ran piglit -t vs quick.tests with the "go spill
everything" debugging code enabled. There was only one regression:
glsl-vs-unroll-explosion simply ran out of registers. This should be
fine in the real world, since no one actually spills every single
register.
NOTE: This is a candidate for the 9.0 branch. Even if it proves to have
bugs, it's likely better than simply failing to compile.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
move_grf_array_access_to_scratch() calculates scratch buffer offsets in
bytes. However, emit_scratch_read/write() expects the base_offset
parameter to be measured in OWords.
As a result, a shader using a scratch read/write offset greater than
zero (in practice, a shader containing more than one variable in
scratch) would use too large an offset, frequently exceeding the
available scratch space.
This patch corrects the mismatch by removing the spurious conversion from
OWords to bytes in move_grf_array_access_to_scratch().
This is based on a patch by Paul Berry.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
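
A worked sketch of the unit mismatch, assuming 16-byte OWords and 32-byte
vec4 register writes (hypothetical helper):

    static const unsigned REG_SIZE = 32;   /* bytes per vec4 register write */
    static const unsigned OWORD_SIZE = 16; /* an OWord is 16 bytes */

    /* The variable in scratch slot 1 lives at OWord offset 2; passing
     * its byte offset (32) where OWords were expected overshoots the
     * available scratch space by a factor of 16. */
    static unsigned
    scratch_offset_owords(unsigned slot)
    {
       return slot * (REG_SIZE / OWORD_SIZE);
    }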
Presumably some of this was used by the old fragment shader backend.
Use a simple chaining hash table for the ACP. This is not really very good,
because we still do a full walk of the table per destination write, but it
still reduces fp-long-alu runtime from 5.3s to 3.9s.
Reviewed-by: Kenneth Graunke <[email protected]>
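
A minimal sketch of such a chained ACP table (hypothetical layout, not the
actual brw_fs_copy_propagation code):

    struct acp_entry {
       unsigned dst_reg, src_reg;
       acp_entry *next;
    };

    static const unsigned ACP_HASH_SIZE = 16;
    static acp_entry *acp[ACP_HASH_SIZE]; /* chains keyed by destination reg */

    /* Lookup at each use is O(chain length); invalidation on a write
     * still walks every entry, which is the remaining cost conceded
     * above. */
    static acp_entry *
    acp_find(unsigned dst_reg)
    {
       for (acp_entry *e = acp[dst_reg % ACP_HASH_SIZE]; e; e = e->next)
          if (e->dst_reg == dst_reg)
             return e;
       return nullptr;
    }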