| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Otherwise, the performance warning for shader recompiles will just say
"something else".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference. So, every time I want to compare assembly, I have to strip
this out.
The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer. Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Avoids regenerating it unnecessarily.
Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.
cfg calculations: 429492 -> 193197 (-55.02%)
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Will let us abstract how the instructions are stored.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parameter is an int16_t, and we're check that it's value will fit in
16-bits. Yes, the value that is stored in 16-bits will surely fit in
16-bits.
brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_inst.h: In function 'brw_inst_set_src1_vstride':
brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
as base internal format and non-zero x, y offsets. Currently x, y
offsets are ignored while updating the texture image.
Fixes Khronos GLES3 CTS tests:
npot_tex_sub_image_2d
npot_tex_sub_image_3d
npot_pbo_tex_sub_image_2d
npot_pbo_tex_sub_image_2d
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92.
Fixes the 11 regressions caused in framebuffer_blit tests in
Khronos GLES3 CTS tests:
Original patch reduced the instruction count but had no performance
benefits. So, it's safe to revert it without causing any performance
regressions.
Signed-off-by: Anuj Phogat <[email protected]>
Acked-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5.
This caused GPU hangs on Ivybridge for some users and huge (80%)
performance regressions across the board on multiple platforms.
We need to find a better solution. I've made several attempts, but none
of them have worked yet. In the meantime, we should revert this.
Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but
that's okay, since we don't expose GL_ARB_gpu_shader5 yet.
Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated
test case on Haswell.
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not clear what abs on logical instructions means on Broadwell, and
it doesn't appear to do anything sensible.
Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code should execute without regard to the currently executing
channels. Asking for gl_SampleID inside control flow might break in
strange ways. It appears to break even at the top of the program in
SIMD16 mode occasionally as well.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gen8_fs_generator uses these to decide whether to set the execution size
to 8 or 16, so we incorrectly made both of these MOVs the full width in
SIMD16 shaders. (It happened to work out on Gen4-7.)
Setting them should also help inform optimization passes what's really
going on, which could help avoid bugs.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both inst->force_uncompressed and inst->force_sechalf mean that the
generated instruction should be uncompressed and have an execution size
of 8. We don't require the visitor to set both flags - setting
inst->force_sechalf by itself is supposed to be enough.
On Gen4-7, guess_execution_size() demoted instructions to 8-wide based
on the default compression state. On Gen8+, we instead set a default
execution size, which worked great...except that we forgot to check
inst->force_sechalf when deciding whether to use 8 or 16.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
Instead of hand-rolling it.
v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader.
Reviewed-by: Ian Romanick <[email protected]> [v1]
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A similar attempt was made in commit 5ff1e446 and was reverted in commit
a39428cf after causing a regression in an ES 3 conformance test. The
test still passes after this commit.
total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
instructions in affected programs: 128247 -> 126278 (-1.54%)
GAINED: 0
LOST: 1
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Should potentially allow a few more cases, while avoiding doing CSE on
texture operations on Gen <= 6 with the MRF.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: lu hua <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise we'd compare uninitialized pointers with NULL and dereference,
leading to crashes.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD8-only for now.
V5: - Fix style complaints
- Move prototype to be with other oddball emit functions
- Use unreachable() instead of assert() where possible
V6: - Describe what is happening with the clamping
- Add reg_width to make some expressions clearer
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The backend will have to do a message send, so we want to keep these in
one piece, just like texture ops.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
V5: - Split into separate opcodes
- Pass message data in src1 immediate
- Put noperspective bit in fs_inst rather than adding any junk to
backend_instruction
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
V3: Rework for brw_inst changes
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
These got lost in the big brw_inst shakeup.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
- Don't try to disassemble send's src1 as a descriptor if it's not an
immediate.
- In the same case, show src1 as an operand (makes it easier to see
bogus register regions, etc -- the hardware is very fussy)
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We'd otherwise go looking into virtual_grf_sizes for things that aren't
in there at all.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
These were looking in the wrong field.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 54e91e7420 introduced a function declaration that uses
brw_context. While brw_context tends to get included in most files, it
is not when compiling intel_asm_annotation.c resulting in the following
warning:
In file included from brw_shader.h:25:0,
from brw_cfg.h:32,
from intel_asm_annotation.c:24:
brw_reg.h:122:39: warning: 'struct brw_context' declared inside
parameter list [enabled by default]
brw_reg.h:122:39: warning: its scope is only this definition or
declaration, which is probably not what you want [enabled by default]
Add a forward-declaration for struct brw_context to avoid the issue.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
... on Gen6+. I'm not actually sure which class Gen6 fits into.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.
Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.
linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F
This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.
After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders
that end with a texture fetch follwed by an fb write are left like this:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000040: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that writes multiple registers. With this,
the above shader becomes:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g113<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
which is the bulk of the shader db results:
total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs: 844 -> 524 (-37.91%)
GAINED: 0
LOST: 0
The optimization also applies to MRT shaders that write the same
color value to multiple RTs, in which case we can eliminate four MOVs in
a similar fashion. See fbo-drawbuffers2-blend in piglit for an example.
No measurable performance impact. No piglit regressions.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.
total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs: 27325 -> 26049 (-4.67%)
|
|
|
|
|
|
|
|
| |
Port of commit b16b3c87 to the vec4 code.
No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
|
|
|
|
| |
Port of commit 219b43c6 to the vec4 code.
|
|
|
|
| |
Port of commit 5daf867f to the vec4 code.
|
|
|
|
|
|
|
|
|
|
|
| |
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
It's C. Compile it as such.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
The #ifndef include guards already said the right thing :)
Reviewed-by: Topi Pohjolainen <[email protected]>
|