| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
This enables a gallium driver not to care about the semantics of
ARB_sample_shading vs ARB_gpu_shader5 sample attributes. When
ARB_sample_shading-style sample shading is enabled, all of the fp inputs
are marked for interpolation at the sample location.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The new location field can be either center, centroid, or sample, which
indicates the location that the shader should interpolate at.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
i965 precompiles shaders at link time, and prints a disassembly if
INTEL_DEBUG=vs,gs,fs, including the shader name. However, blit shaders
were showing up as "unnamed" since we hadn't set a name prior to
linking.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Originally, we didn't have direct accessors for all of the GLSL types,
so the only way to get at them was to use the symbol table. Now, we
can just get at them directly, which is simpler and faster.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
| |
as a candidate for the GL_RGB10_A2 internal texture format.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Spotted by Charmaine Lee.
Cc: "10.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
| |
Spotted by Charmaine Lee.
Cc: "10.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
|
| |
If the driver doesn't implement get_sample_position(), let's return
some non-garbage values.
|
| |
|
|
|
|
|
|
|
|
| |
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
... on Gen6+. I'm not actually sure which class Gen6 fits into.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.
Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.
linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F
This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.
After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders
that end with a texture fetch follwed by an fb write are left like this:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000040: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that writes multiple registers. With this,
the above shader becomes:
0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g113<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
which is the bulk of the shader db results:
total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs: 844 -> 524 (-37.91%)
GAINED: 0
LOST: 0
The optimization also applies to MRT shaders that write the same
color value to multiple RTs, in which case we can eliminate four MOVs in
a similar fashion. See fbo-drawbuffers2-blend in piglit for an example.
No measurable performance impact. No piglit regressions.
Signed-off-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.
total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs: 27325 -> 26049 (-4.67%)
|
|
|
|
|
|
|
|
| |
Port of commit b16b3c87 to the vec4 code.
No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
|
|
|
|
| |
Port of commit 219b43c6 to the vec4 code.
|
|
|
|
| |
Port of commit 5daf867f to the vec4 code.
|
|
|
|
|
|
|
|
|
|
|
| |
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
It's C. Compile it as such.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
The #ifndef include guards already said the right thing :)
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
With a hack to place an exec_node in the struct in C to be at the same
location as the inherited exec_node in C++.
Acked-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Acked-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that this cap is used to determine the availability of both, adjust
its name to reflect the new reality.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The assumption is that any driver capable of emitting layer from the
vertex shader and supporting viewports should be able to also handle
emitting viewport index from the vertex shader.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Tobias Droste <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Tested-by: Tobias Droste <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If multiple viewports are supported, that implies the presence of a GS
and layered rendering, so we can enable ARB_fragment_layer_viewport as
well.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
I noticed this when trying to find comments about pull constant buffers.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I wanted to access this value from stage-generic code, so stop storing it
under two different names.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
If we had some NOS affecting VS compilation that resulted in optimization
changing the set of constants to be uploaded, we might not have reuploaded
the constants.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This changed in the state streaming work years ago.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
I recently moved the code from the mentioned location right into this
file.
Reviewed-by: Kenneth Graunke <[email protected]>
|