| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.
For example, this is what vec4_visitor::fix_3src_operand() is already
doing:
if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
return src;
Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped: 7930
HURT: 0
Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped: 2533
HURT: 0
v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
Ekstrand suggested
- Clean up the condition
v3:
- Move the check back to the original place, to keep things
tidy, as suggested by Jason Ekstrand
v4:
- Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses unusual stride and subreg_offset
combination breaks when one source is an attribute.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures
to wrap renderbuffer sources/destinations. This caused a bit of a mess in
the Mesa/gallium state tracker because we had to basically undo that
wrapping.
Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer
and gl_texture_image src/dst pointers (one being null, the other non-null)
so the driver can handle renderbuffer vs. texture as needed.
For the i965 driver, we basically moved the code that wrapped textures
around renderbuffers from copyimage.c down into the met and driver code.
The old code in copyimage.c also made some questionable calls to
_mesa_BindTexture(), etc. which weren't undone at the end.
v2 (Jason Ekstrand): Rework the intel bits
v3 (Brian Paul): Update the temporary st_CopyImageSubData() function.
Reviewed-by: Topi Pohjolainen <[email protected]>
Tested-by: Kai Wasserbäch <[email protected]>
Tested-by: Nick Sarnie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
I left a bunch of code indented a level in the previous patch to make
the diff easier to read. But now we should fix that.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:
total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs: 3491 -> 3259 (-6.65%)
helped: 11
HURT: 0
Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.
This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The table used to map the GL primitive to the hw primitive never
changes so make it const.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Defined to 0 in a few places, but it's not used anywhere.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Two drivers use this file, and neither supports quads.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Two drivers use this file, and neither supports quad strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The value passed in count previously was "vertex after the last vertex
to be processed." Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count. That's just silly.
If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.
NOTE: There is some strange formatting left by this patch. That was
done to make it more obvious that the before and after code is
equivalent. These will be fixed in the next patch.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
v2: Fix a remaining (count-start) in render_quad_strip_verts.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]> [v1]
Cc: "10.6 11.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With NIR:
instructions in affected programs: 111508 -> 109193 (-2.08%)
helped: 507
Without NIR:
instructions in affected programs: 28763 -> 28474 (-1.00%)
helped: 186
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Ken):
- Squash together commits for HS, DS, and TE, as well as fixes.
- Add INTEL_MASK variants so we can use SET_FIELD if we want.
- Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match
the documentation.
- Add some more fields from the PRMs.
- Add Broadwell variants.
Signed-off-by: Chris Forbes <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:
1) Build up the value we are propagating as if it were the source of a
single MOV:
2) Check that we can propagate that value
3) Build the final value
Previously everything was somewhat messed up, making the
implementation on some specific cases, like knowing if you can
propagate from a previous instruction even with type mismatches, even
messier (for example, with the need of maintaining more of one
has_source_modifiers). The refactoring clears stuff, and gives
support to this mentioned use case without doing anything extra
(for example, only one has_source_modifiers is used).
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs: 739837 -> 725032 (-2.00%)
helped: 6237
HURT: 0
v2: using 'arg' index to get the from inst was wrong
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
only set the source on a direct copy
v5: change the approach for a refactoring
v6: tweaked comments
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes bugs exposed by commit
2b1cdb0eddb73f62e4848d4b64840067f1f70865 in:
ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_frag
No regressions observed in deqp, CTS or Piglit.
v2: address review feedback from Iago Toral:
- move rho calculation to else branch
- optimize dx and dy calculation
- fix documentation inconsistensies
Signed-off-by: Tapani Pälli <[email protected]>
Signed-off-by: Kevin Rogovin <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91114
Cc: "10.6 11.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The only functional change here is that we now set EmitNoIndirectOutput and
EmitNoIndirectTemp for compute shaders. Compute shaders don't have outputs
per-se and we should have been setting EmitNoIndirectTemp all along.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All SKL SKUs except the lowest one which has half the L3 size actually have 384K
of URB per slice.
For once, I can explain how this mistake was made and how it was missed in
review... Historically when we enable a platform and put the production sizes,
you can simply look at the "smallest" SKU and see what its URB size is (and we
assumed it was the 1 slice variant). Since on newer platforms the URB sizes are
scaled automatically by HW, this was sufficient. On SKL, this is a bit different
as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not
GT1) variant and it has 384K.
There are no Jenkins tests fixed (or regressions) and we don't expect any fixes
here because you can always run with less URB size.
Thanks to Sarah for bringing this to my attention.
Cc: Sarah Sharp <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
compr4 is represented by setting the high bit on the MRF number.
We need to mask it out before sanity checking the register number.
Fixes ~8000 assert fails on Ironlake and G45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631
Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).
Notice that the hardware docs are not clear about this fact:
SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"
However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."
Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:
Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461
Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461
This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.
Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow to use MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.
Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so lets get rid of them all.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Broken by commit c228514c72cb2fd5fb9e510808e29204fc9e7ae1
"dri/common: use sysconfdir when looking for drirc".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92054
Signed-off-by: Marcin Ślusarz <[email protected]>
|
|
|
|
|
|
|
|
| |
Useful when locally installed mesa has more quirks than the system one.
Signed-off-by: Marcin Ślusarz <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types. In this case we want more fine grained control over
which texture projectors get lowered.
v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
current instruction
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. So those kind of instruction could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.
This commit adds support for copy propagation using current instruction
as reference.
Equivalent to commit 472ef9 but for vec4.
v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
solve (among others) the following deqp regression with v2:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is not that it gives register coalescing a little bit of a
helping hand. It doesn't actually fix the coalescing problems, but it
seems to help a good bit.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
instructions in affected programs: 1259166 -> 1196845 (-4.95%)
helped: 11363
HURT: 148
v2 (Jason Ekstrand):
- Run nir_move_vec_src_uses_to_dest after going out of SSA
- New shader-db numbers
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.
fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.
These issues were flagged by:
commit cadd7dd384b33a779d46bd664f456bed4a21a5b7
Author: Jason Ekstrand <[email protected]>
Date: Thu Jul 2 15:41:02 2015 -0700
i965/fs: Add a very basic validation pass
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
| |
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In certain conditions, we have to do bounds-checking in the shader for
image_load_store. The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds. However, we were hard-coding 4 components which may not be
correct. Instead, we should be using size which is the number of
components read.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:
brw_vs.c, brw_wm.c brw_gs.c:
code to drive code generation and implement precompiling and
cache search.
genX_<stage>_state.c
gen specific implementation of the state emission for the shader
stage.
The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).
To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
See similar fix for Readpixels in mesa commit 0d20790. Jason suggested
we need that for TexSubImage as well.
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Loads constants using integer as their register type, like it is
done in FS backend.
No shader-db changes in HSW.
Cc: "10.6 11.0" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91716
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|