| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Going to need this soon (not going to bother with avx2 intrinsics at this time
but don't want to do workarounds for true vector shifts if llvm itself can use
them just fine and won't need the gazillion instruction emulation).
Not really tested other than my cpu returns 0 for these features...
(I have no idea if llvm actually would emit avx2/xop instructions neither...)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Need to check the wrap mode of the actually used coords not a fixed 2.
While checking more than necessary would only potentially disable aos and
not cause any harm I'm pretty sure for 3d textures it could have caused
assertion failures (if s,t coords have simple filter and r not).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out it is actually very complicated to figure out what a format really
is wrt range, as using channel information for determining unorm/snorm etc.
doesn't work for a bunch of cases - namely compressed, subsampled, other.
Also while here add clamping for uint/sint as well - d3d10 doesn't actually
need this (can only use ld with these formats hence no border) and we could
do this outside the shader for GL easily (due to the fixed texture/sampler
relation) do it here too just so I can forget about it.
v2: move border color clamping out of fetch texel. Also change it to clamp
the whole border vector at once (and use vectorized load of border color),
which saves a couple of instructions - needs some different handling of
mixed signed/unsigned formats so skip the per channel stuff and just derive
this from first channel except for special formats.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's a new debug value used to disable per-quad lod optimizations
in fragment shader (ignored for vs/gs as the results are just too wrong
typically). Also trying to detect if a supplied lod value is really a
scalar (if it's coming from immediate or constant file) in which case
sampler code can use this to stay on per-quad-lod path (in fact for
explicit lod could simplify even further and use same lod for both
quads in the avx case but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.
v2: fix comments, prettify.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The rules were writing files to e.g. util/u_indices_gen.py, but in an
out-of-tree build this directory doesn't exist in the build directory. So,
create the directories just in case.
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Ross Burton <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Code should loop through and cleanup the three (VL_NUM_COMPONENTS) idct
buffers, rather than doing the first one three times.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Check if we have successfully allocated memory.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was looking into some minor 422 issues/discrepencies I noticed long
ago using vdpau on my rv790.
I noticed that there is code that is halving height rather than width -
422 is full height AFAIK.
Making the changes below doesn't actually make any noticable difference
to what I was looking into.
Maybe there are more but here's three I've found so far
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
If clipdistance for one of the vertices is nan (or inf) then the
entire primitive should be discarded.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
That magic wasn't meant to be commited, need to work on some proper fix.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9)
and definitely required by d3d10.
This actually doesn't do it pre-filter but more "in-filter" as otherwise
need to push the comparisons even further down into fetch code and this
also trivially allows using a somewhat cheaper lerp.
Doing it pre-filter would actually have some performance advantage for UNORM
formats (because the comparisons should be done in texture format, we'd only
need to convert the shadow ref coord to texture format once, but in turn would
save converting the per-sample texture values to floats) but this gets a bit
messy as this has implications for border color handling as well (which needs
to be done prior to depth comparisons, hence would also need to convert border
color to texture format too or use some other tricks like doing separate border
color / shadow ref comparison and simply using that result directly when doing
border replacement).
Should make no difference for nearest filtering, and performance for linear
filtering should be mostly the same too (essentially have one more comparison
instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add
special "lerp" which all in all shouldn't be much of a difference).
v2: get rid of old code completely
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
The sample index is always in W.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.
v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use fixed location for the shadow reference value I
want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)
v2: fix up soa split path (unreachable currently but still...)
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calling the prepare outputs cleans up the slot assignments
for outputs, unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, this
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This patch adds the level query support to the video decoders
and uses some more reasonable defaults.
v2: (ck) add commit message
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
| |
Also while here add a bunch of other forgotten (integer) instructions to
tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing
away unused input components), though it may still be incomplete.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because we must maintain an exec_mask even if there's currently nothing
on the mask stack, we can still have an exec_mask at the end of the program.
Effectively, this mask should be set back to default when returning from main.
Without relying on END/RET opcode (I think it's valid to have neither) it is
actually difficult to do this, as there doesn't seem any reasonable place to
do it, so instead let's just say the exec_mask is invalid outside main (which
it really is effectively).
The problem is that geometry shader called end_primitive outside the shader
(in the epilogue), and as a result used a bogus mask, leading to bugs if we
had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
the mask combining function when called from outside the shader.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The code was quite weird, the second comparison was in fact a complete no-op
and we can also do the comparison with the vector directly instead of scalar,
which should not also be faster but it is way more obvious how that mask
is actually going to look like.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of reducing masks to 0/1 simply use the mask directly as -1.
Also use some signed comparison instead of unsigned (as far as I understand
these values have to be (very) small and signed means llvm doesn't have to
apply additional logic to do the unsigned comparisons the cpu can't do).
Saves a couple of instructions in some test geometry shader here.
v2: that was a bit to much optimization, don't skip combining the masks...
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
| |
And dump the variant key too (same as vs does).
Just so I can stop wondering why I see the tgsi dump for fs and vs but not
gs...
|
|
|
|
| |
Was meant to match the name used elsewhere, spotted by Anthony.
|
|
|
|
|
|
|
|
|
|
|
| |
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
| |
Specifically, must return 0 for non-existent mip levels (and non-existent
textures which is an unsolved problem) for everything but total mip count.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
| |
Otherwise gcc might do very unsafe optimizations, spotted by Uros Bizjak.
Hopefully this time it's finally right?
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't be injecting the primitive id's in the pipeline because
by that time the primitives have already been decomposed. To
properly number the primitives we need to handle the adjacency
primitives by hand. This patch moves the prim id injection into
the original primitive assembler and completely removes the
useless pipeline stage.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without reseting the vertex id, with primitives where the same
vertex is used with different primitives (e.g. tri/lines strips)
our vbuf module won't re-emit those vertices with the changed
primitive id. So lets reset the vertex id whenever injecting
new primitive id to make sure that the vertex data is correctly
emitted.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Before inserting new front face and prim id outputs cleanup
the old extra outputs, otherwise our cache will use previous
output slots which will break as soon as outputs of the current
shader don't match the last.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Oops. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=67921
|
| |
|
|
|
|
|
|
|
|
| |
d3d10 specifies ordered comparisons for everything but not_equal which is
unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx).
OpenGL probably doesn't care.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
| |
This is wrong both for OpenGL and d3d. (In fact clamping is a side effect
of converting to depth format, so this should really do quantization too
at least in d3d10 for the comparisons to be truly correct.)
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
| |
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For d3d10 and ARB_robust_buffer_access_behavior, we are required to return
0 for out-of-bounds coordinates (for which we can just enable the code already
there was just disabled). Additionally, also need to return 0 for
out-of-bounds mip level and out-of-bounds layer. This changes the logic
so instead of clamping the level/layer, an out-of-bound mask is computed
instead in this case (actual clamping then can be omitted just like with
coordinates, since we set the fetch offset to zero if that happens anyway).
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While so far this only causes some harmless test failures, there's lots more
cpus with DAZ. All 64bit capable ones can do it (particularly relevant for
AMD cpus as they supported sse3 very very late) but if really necessary we
can check support for that for real with some more magic.
(In fact just about ANY cpu with sse2 can support DAZ, I believe the only
exception are first gen P4 (Willamette) and from those only early steppings
which can't do it it's almost like intel forgot to add it... - a real pity
though docs say you can't just try to set it as they will throw a GPF.)
While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672
it does not fix it. Most likely the tests need fixing as I don't think
there's any guarantee about denorm handling in the reference math library
functions if the flags aren't set to standard values. Nevertheless enabling
DAZ on all cpus which can do it should be the right thing to do.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Should be much faster, seems to work in softpipe.
While here (also it's now disabled) fix up the pow factor - the former value
is what is in GL core it is however not actually accurate to fp32 standard
(as it is 1.0/2.4), and if someone would do all the accurate math there's no
reason to waste 8 mantissa bits or so...
v2: use real table generating function instead of just printing the values
(might take a bit longer as it does calculations on some 3+ million floats
but much more descriptive obviously).
Also fix up another inaccurate pow factor (this time in the python code) -
wondering where the couple one bit errors came from :-(.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
| |
I think it's actually not good enough now...
|
|
|
|
|
|
|
|
|
|
| |
Since llvm -3.4svn r187618, TargetOptions doesn't provide
RealignStack, so only enable it with llvm<3.4
This option must now be specified using function attributes, see LLVM
commit r187618
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
unfilled_stage::face_slot is of type int.
Fixes "Unsigned compared against 0" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
pp_free_bos dereferences ppq without a null check.
Fixes "Dereference before null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|