| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms. However, we have to
walk through the uniforms and figure out liveness anyway, so we migh as
well record the number of channels used as we go. This may also allow us
to pack things tighter in a few cases.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases. It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL IR vs. NIR shader-db results for vec4 programs on i965:
total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
instructions in affected programs: 1245199 -> 1134225 (-8.91%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on G4x:
total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
instructions in affected programs: 1205599 -> 1094625 (-9.20%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:
total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
instructions in affected programs: 1205503 -> 1094531 (-9.21%)
helped: 7468
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:
total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
instructions in affected programs: 1850547 -> 1621331 (-12.39%)
helped: 14856
HURT: 1481
GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:
Test master-glsl master-nir diff
bench_OglGeomPoint 461.556 463.006 1.450
bench_OglTerrainFlyInst 184.484 187.574 3.090
bench_OglTerrainPanInst 132.412 136.307 3.895
bench_OglTexFilterAniso 19.653 19.645 -0.008
bench_OglTexFilterTri 58.333 58.009 -0.324
bench_OglVSInstancing 65.049 65.327 0.278
bench_trexoff 69.474 69.694 0.220
bench_valley 40.708 41.125 0.417
v2 (Jason Ekstrand):
- Remove more uses of NirOptions as a switch
- New shader-db numbers
- Added benchmark numbers
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With NIR:
instructions in affected programs: 111508 -> 109193 (-2.08%)
helped: 507
Without NIR:
instructions in affected programs: 28763 -> 28474 (-1.00%)
helped: 186
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "11.0 10.6" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0
v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys. So there's no need for a "VUE" key base class.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.
Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter. This eliminates the bogus NULL parameter in the
GS case.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two uses of this flag.
The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances. In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value. Checking if it's
> 0 works for detecting this case.
Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed when debugging things that lead to the next patch.
On G45 (and presumably ILK) this helps register coalescing:
total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the NIR-vec4 pass is enabled, handles uniform and GRF array access
on ARB_vertex_program like it is done on vertex shaders.
When the old IR-vec4 pass is used, emit_program_code() emits pull constant
loads directly instead of using relative addressing, hence to call to
move_uniform_array_access_to_pull_constants() is not needed and it is enough
to call to split_uniform_registers().
The patch also calls to move_grf_array_access_to_scratch() like it is
done for shaders, however I suspect this is a no-op for vertex programs and
we could remove it.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is useful for the upcoming texture support in NIR->vec4 pass,
as we found several cases where the brw_type is available, but not
the glsl_type.
Without this new constructor, the alternative would be:
dst_reg reg(MRF, <reg>)
reg.type = <brw_type>
reg.writemask = <mask>
Adding a new constructor makes code easier to read.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The upcoming introduction of NIR->vec4 pass will require that some NIR
lowering passes are enabled/disabled depending on the type of shader
(scalar vs. vector).
With this patch we pass a 'is_scalar' variable to the process of
constructing the NIR, to let an external context decide how the shader
should be handled.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The NIR->vec4 pass will be activated if both the following conditions are met:
* INTEL_USE_NIR environment variable is defined and is positive (1 or true)
* The stage is vertex shader (support for geometry shaders and
ARB_vertex_program will be added later).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct. It also eliminates a
pointless memcpy() of the key.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer. We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.
Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend.
(The empty class will be going away soon.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication. It'll also allow for a bit more tidying.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes us only issue the performance warning about register
spilling if we actually spilled registers. We also use scratch space
for indirect addressing and the like.
This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
the vec4 backend.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.
This patch passes brw in as intended.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
As of this commit, nothing actually needs the brw_context.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
| |
This way we can stop doing is_gles3 checks inside of the compiler.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, these were pulled out of the GL context conditionally based on
whether we were running ff/ARB or a GLSL program. Now, we just pass them
in so that the visitor doesn't have to grab them itself.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, each shader took 3 shader time indices which were potentially
at arbirary points in the shader time buffer. Now, each shader gets a
single index which refers to 3 consecutive locations in the buffer. This
simplifies some of the logic at the cost of having a magic 3 a few places.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates the options at screen cration time and then we just copy them
into the context at context creation time. We also move is_scalar to the
brw_compiler structure.
We also end up manually setting some values that the core would have set by
default for us. Fortunately, there are only two non-zero shader compiler
option defaults that we aren't overriding anyway so this isn't a big deal.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Ken): Make shader_debug_log a printf-like function.
v3 (Jason): Add a void * to pass the brw_context through
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to store the GS dispatch mode in brw_gs_prog_data while
separately storing the VS dispatch mode in brw_vue_prog_data::simd8.
This patch introduces an enum to represent all possible dispatch modes,
and stores it in brw_vue_prog_data::dispatch_mode, unifying the two.
Based on a suggestion by Matt Turner.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
| |
Whether or not to use NIR is now equivalent to brw->scalar_vs. We can
simplify the logic and make it far less confusing.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The backend_shader class really is a representation of a shader. The fact
that it inherits from ir_visitor is somewhat immaterial.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For scalar GS support, we either need to add a fourth constructor which
takes the GS structures, or combine the existing two and pass the shader
stage.
Given that they're not significantly different, I opted for the latter.
v2: Remove more stuff from the .h file (Jason and Jordan).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Save some CPU cycles by doing 'return progress' rather than
'depth++' in the discard jump special case.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Style fixes.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This doesn't actually enable untyped surface message sends from GRF
yet, the upcoming atomic counter and image intrinsic lowering code
will.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Suggested-by: Kristian Høgsberg <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stage_abbrev and stage_name fields in backend_visitor provide what
we need without any additional effort. It also means we'll get the
right names for compute shaders, SIMD8 geometry shaders, and both kinds
of tessellation shaders.
This does unfortunately change the capitalization of the stage
abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't
seem worth adding code to handle, though.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|