| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
When binding a layered texture, the layer is already 0. There's no need
to special case this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19.
No Piglit changes on Sandybridge.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Color clears can be performed via two separate shaders - one is the
generic "meta clear" shader (in meta.c); the other is the i965 specific
"repclear" shader (in brw_meta_fast_clear.c).
Giving them separate names makes them distinguishable when reading
INTEL_DEBUG=shader_time output.
v2: Call it "meta repclear", as suggested by Jason.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
The last patch left the code indented too far.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Paul's original code had emit_control_data_bits() skip the URB write if
vertex_count was 0. This meant wrapping every control data write in a
conditional write.
We accumulate control data bits in a single UD (32-bit) register. For
simple shaders that don't emit many vertices, the control data header
will be <= 32-bits long, so we only need to write it once at the end of
the shader.
For shaders with larger headers, we write out batches of control data
bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0.
On the first EmitVertex() call, the above expression will evaluate to
true simply because vertex_count == 0. But we want to avoid emitting
the control data bits, because we haven't accumulated 32-bits worth yet.
In other words, the vertex_count != 0 check is really only necessary in
the EmitVertex() batching case, not the end-of-thread case.
This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or
multiple streams. The only downside is that a shader which emits no
vertices at all will execute an additional URB write---but such shaders
are pointless and not worth optimizing.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct. It also eliminates a
pointless memcpy() of the key.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer. We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.
Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend.
(The empty class will be going away soon.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication. It'll also allow for a bit more tidying.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes us only issue the performance warning about register
spilling if we actually spilled registers. We also use scratch space
for indirect addressing and the like.
This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
the vec4 backend.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.
This patch passes brw in as intended.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Adding new shader stages to a switch statement is less confusing than an
if-else-if ladder where all but the first case are fragment shader
specific (but don't claim to be).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more
importantly prevents a larger code size regression in the next commit
when the .used field is replaced and calculated on demand.
text data bss dec hex filename
4945468 195152 26192 5166812 4ed6dc i965_dri.so before
4943740 195152 26192 5165084 4ed01c i965_dri.so after
And surround the emit and total fields with #ifdef DEBUG to prevent
such mistakes from happening again.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch can cause an infinite recursion if the previous patch titled, "i965:
Track finished batch state" isn't present (backporters take notice).
v2: Sent out the wrong patch originally. This patches switches the order of
flushes, doing the generic flush before the CC_STATE, and the required
workaround flush afterwards
v3: Only perform workaround for render ring
Add text to the BATCH_RESERVE comments
v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two
blocks.
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
With the exception of gen8, the sole user of the workaround bo are for
emitting pipe controls. Move it out of the purview of the batchbuffer
and into the pipecontrol.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.
On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low 32bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing every 80ns as expected and not stuck on zero (as would be
the case with the buggy kernel/hw.).
Signed-off-by: Chris Wilson <[email protected]>
Cc: Martin Peres <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for. Scalar NIR expects them to be packed,
however.
Fixes 1256 dEQP tests on Broadwell.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Mark Janes <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec. The documentation for most control flow
instructions mentions among other restrictions:
"Instruction compression is not allowed."
This however doesn't have any implications on 16 wide not being
supported, because none of the control flow instructions have
multi-register operands (control flow instructions are not compressed
on more recent hardware either, except maybe SNB's IF with inline
compare). In fact Gen4-5 had 16-wide control flow masks and stacks,
and the spec mentions in several places that control flow instructions
push and pop 16 channels worth of data -- Otherwise there doesn't seem
to be any indication that it shouldn't work.
Causes no piglit regressions, and gives the following shader-db
results on ILK:
total instructions in shared programs: 4711384 -> 4711384 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1215
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the hardware docs for the DO instruction:
"Execution size is ignored for this instruction."
My observation on ILK hardware contradicts the spec though, channels
over the execution size of a DO instruction won't enter the loop, and
channels over the execution size of a WHILE instruction will exit the
loop after the first iteration -- The latter is consistent with the
spec though, there's no claim about the execution size being ignored
for the WHILE instruction so it's not completely unexpected that it
has an influence on the evaluation of EMask.
The execute_size argument of brw_DO() shouldn't have any effect on
Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the
execution size from the matching DO, so this patch should fix them
too. The execution size of BREAK and CONT instructions was already
being set correctly.
Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
conditional and loop instructions 16-wide,
e.g. shaders/glsl-fs-continue-inside-do-while.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware docs don't mention explicitly what these fields should
be, but I've verified experimentally on ILK that using a GRF as
destination causes the register to be corrupted when the execution
size of an ENDIF instruction is higher than 8 -- and because the
destination we were using was g0, eventually a hang.
Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's
actually 5 DWords on Gen6-7. We've been reserving insufficient space
for performance monitoring on Sandybridge, which means it would likely
break if you used that functionality. (Thankfully, no one does...)
Also, the existing number of 146 was the result of me flubbing up the
arithmetic: it should have actually been 140.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.6 10.5" <[email protected]>
|
|
|
|
|
| |
The reason might be that no commands have been submitted before the flush
and the GPU is idle.
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a comment saying that in SIMD16 mode the pixel interpolator
returns coords interleaved 8 channels at a time and that this requires
extra work to support. However, this interleaved format is exactly
what the PLN instruction requires so I don't think anything needs to
be done to support it apart from removing the line to disable it and
to ensure that the message lengths for the send message are correct.
I am more convinced that this is correct because as it says in the
comment this interleaved output is identical to what is given in the
thread payload. The code generated to apply the plane equation to
these coordinates is identical on SIMD16 and SIMD8 except that the
dispatch width is larger which implies no special unmangling is
needed.
Perhaps the confusion stems from the fact that the description of the
PLN instruction in the IVB PRM seems to imply that the src1 inputs are
not interleaved so it wouldn't work. However, in the HSW and BDW PRMs,
the pseudo-code is different and looks like it expects the interleaved
format. Mesa doesn't seem to generate different code on IVB to
uninterleave the payload registers and everything is working so I can
only assume that the PRM is wrong.
I tested the interpolateAt tests on HSW and did a full Piglit run on
IVB on there were no regressions.
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The optimization logic relies on being able to read out constbuf values
from program parameters. However that only works if there's no relative
addressing involved.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91173
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.5 10.6" <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are no color buffer render targets, gen6 and gen7 still
use the first BLEND_STATE element to determine alpha test.
gen6_upload_blend_state was allocating zero elements when
ctx->Color.AlphaEnabled was false.
That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
pointing to random data from some previous brw_state_batch().
That sometimes suppressed depth rendering when those bits
happened to mean COMPAREFUNC_NEVER.
This produced flickering shadows for dota2 reborn.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80500
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These checks were in Mesa prior to commit fbba25bba, but they were
not necessary for the purpose that Mesa intended (check if we could
resolve ReadPixels via memcpy), so that commit took them away.
Unfortunately, it seems that some Gallium drivers rely on these
checks to make the decision of whether they should fallback to Mesa's
implementation of ReadPixels correctly. Michel Dänzer reported that
the following piglit test would fail on radeonsi after commit
fbba25bba:
spec@ext_texture_integer@fbo_integer_readpixels_sint_uint
This patch puts the checks back in Gallium, where they are needed.
Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4b249d2ee (mesa: Handle transferOps in texstore_rgba) introduced
proper transferops handling, but in updating the source to the newly
allocated temporary image neglected to reset the source packing. Set it
to the default which should be appropriate for the floats used.
Fixes: 4b249d2ee (mesa: Handle transferOps in texstore_rgba)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91173
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.5 10.6" <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Although the horizontal and vertical alignment fields are ignored here,
0 is a reserved value for them and may cause undefined behavior. Change
the default value to an abitrary valid one.
v2: add comment about chosen value (Topi).
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
Now that we can create builders with a bigger width than their parent as
long as it's exec_all, we don't need to create the instruction manually.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
is on.
This assertion was meant to catch code inadvertently escaping the
control flow jail determined by the group of channel enable signals
selected by some caller, however it seems useful to be able to
increase the default execution size as long as force_writemask_all is
enabled, because force_writemask_all is an explicit indication that
there is no longer a one-to-one correspondence between channels and
SIMD components so the restriction doesn't apply.
In addition reorder the calls to fs_builder::group and ::exec_all in a
couple of places to make sure that we don't temporarily break this
invariant in the future for instructions with exec_size higher than
the dispatch width.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to require different versions for nouveau and nouveau_vieux.
autoconf will only check for NOUVEAU once if both drivers are enabled,
meaning both version checks don't get executed. Rename the nouveau_vieux
one to NVVIEUX to avoid the issue.
Signed-off-by: Ilia Mirkin <[email protected]>
Tested-by: Alexandre Courbot <[email protected]>
Tested-by: Martin Peres <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Resource list can be created properly only after LinkShader hook
has been called to make sure all dead variables have been removed.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90925
|
|
|
|
| |
PIXEL_X/Y takes a vec2 in the first argument
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths. With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel. Instead, we should be using exec_size.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that all of the non-explicit constructors are gone, we don't need to
guess anymore.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
Previously we used dst.width but the two *should* be the same.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we were just depending on register widths to ensure that
various things were exec_size of 1 etc. Now, we do so explicitly using the
builder.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|