| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
| |
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nowadays -1 for slots means that the semantic is not present, so
we need to store it in a signed variables, otherwise <0 comparisons
are pointless. Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe, edgeflags don't work wit llvmpipe)
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If gs is null, then freeing state->shader.tokens would result in a null
dereference.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Lets make sure the frontface is 1 for front and -1 for back.
Discussed with Roland and Jose.
Signed-off-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines', unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix it allow draw module to inject a fake face semantic
into outputs from which the backends can figure out the original
frontfacing info of the primitives.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec says that front-face is true if the value is >0 and false
if it's <0. To make sure that we follow the spec, lets just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive num for frontface and
negative for backface.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Cc: [email protected]
[ Francisco Jerez: Fix "PIPE_ENDIAN_SMALL" in the documentation,
define PIPE_ENDIAN_NATIVE. ]
|
|
|
|
| |
Tested-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Never called.
Trivial.
|
|
|
|
|
|
|
|
|
| |
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few cpu's it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at least
if they are used from vertex or geometry shaders (so doesn't apply to lod
bias) this doesn't just affect neighboring pixels.
Some code was already there to handle this so fix it up and enable it.
There will no doubt be a performance hit unfortunately, we could do better
if we'd knew we had a real vector shift instruction (with variable shift
count) but this requires AVX2 on x86 (or a AMD Bulldozer family cpu).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them neither.
Likewise, the size query is still broken just the same.
v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Not needed with do_dead_builtin_varyings.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
b04a295a4a0cd2defe352b3193b5fa79ca8fc9fc removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
not GetQuery time which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was just ignored (unless for some reason like unfilled polys draw was
handling this).
I'm not convinced of that code, putting the float for the clamp in the key
isn't really a good idea. Then again the other floats for depth bias are
already in there too anyway (should probably have a jit_context for the
setup function), so this is just a quick fix.
Also, the "minimum resolvable depth difference" used isn't really right as it
should be calculated according to the z values of the current primitive
and not be a constant (of course, this only makes a difference for float
depth buffers), at least for d3d10, so depth biasing is still not quite right.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
timestamp queries are always binned in an active scene, therefore
always have a result.
|
|
|
|
|
|
|
|
|
| |
If there are queries active the opaque optimization reseting the bin needs to
be disabled.
(Not really tested since the bug was discovered by code inspection not
an actual test failure.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries
still active at the end of a scene, which is also why I cheat a bit
and limit the amount of simultaneously active queries to (arbitrary)
16 (simplifies things because don't have to deal with a real list
that way). I can't think of a reason why you'd really want large
numbers of overlapping/nested queries so it is hopefully fine.
(This only affects queries which need to be binned.)
v2: don't copy remainder of array when deleting an entry simply replace
the deleted entry with the last one (order doesn't matter).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously lp_rast_begin_query commands were always inserted into each bin,
and re-issued if the scene was restarted, while lp_rast_end_query commands
were executed for each still active query at the end of tile rasterization.
Also, the ps_invocations and vis_counter were set to zero when the respective
command was encountered.
This however cannot work for multiple queries of the same type (note that
occlusion counter and occlusion predicate while different type were also
affected).
So, change the logic to always set the ps_invocations and vis_counter to zero
at the start of tile rasterization, and then use "start" and "end" per-thread
query values when encountering the begin/end query commands instead, which
should work for multiple queries of the same type. This also means queries do
not have to be reissued in a new scene, however they still need to be finished
at end of tile rasterization, so a list of queries still active at the end of
a scene needs to be maintained.
Also while here don't bin the queries which don't do anything in rasterization.
(This change does not actually handle multiple queries of the same type yet,
as the list of active queries is just a simple fixed array and setup can still
only have one query active per type.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Squashed commit of the following:
commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:07 2013 -0400
gallivm: Fix big-endian machines
This adds a bit-shift count to the format table, and adds the concept of
vector or bitwise alignment on gathers.
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:07 2013 -0400
llvmpipe: Fix convert_to_blend_type on big-endian
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
util: Convert color pack to packed formats
This fixes them on big-endian.
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
graw-xlib: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
format: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
st/dri: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
st/xlib: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
gbm: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
tests: Convert to packed formats
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <[email protected]>
Date: Tue Jun 18 13:07:37 2013 -0400
gallium: Document packed formats
Signed-off-by: Adam Jackson <[email protected]>
commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
gallium: Introduce 32-bit packed format names
These are for interacting with buffers natively described in terms of
bit shifts, like X11 visuals:
uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);
Define these in terms of (endian-dependent) aliases to the array-style
format names.
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Richard Sandiford <[email protected]>
commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <[email protected]>
Date: Mon Jun 3 12:10:32 2013 -0400
gallium: Document format name conventions
v2:
- Fix a channel name thinko (Michel Dänzer)
- Elaborate on SCALED versus INT
- Add links to DirectX and FOURCC docs
Signed-off-by: Adam Jackson <[email protected]>
commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <[email protected]>
Date: Tue Jun 18 12:25:06 2013 -0400
gallivm: Remove all notion of byte-swapping
Signed-off-by: Adam Jackson <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The result isn't always 0 in this case (depends on query type),
so instead of special casing this just use the ordinary path (should result
in correct values thanks to initialization in query_begin/end), just
skipping the fence wait.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Some APIs require 64bit and at least for 64bit archs the overhead
should be minimal.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Handle PIPE_QUERY_GPU_FINISHED and PIPE_QUERY_TIMESTAMP_DISJOINT, and
also fill out the ps_invocations and c_primitives from the
PIPE_QUERY_PIPELINE_STATISTICS (the others in there should already
be handled). Note that ps_invocations isn't pixel exact, just 16 pixel
exact but I guess it's better than nothing.
Doesn't really seem to work correctly but there's probably bugs elsewhere.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 41966fdb3b71c0b70aeb095e0eb3c5626c144a3a.
While it's a lot cleaner it causes regressions because
the draw interface is always called from the draw functions
of the drivers (because the buffers need to be mapped) which
means that the stream output buffers endup being cleared on
every draw rather than on setting.
Signed-off-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
honor render_condition for clear_render_target and clear_depth_stencil.
Also add minimal support for occlusion predicate, though it can't be active
at the same time as an occlusion query yet.
While here also switchify some large if-else (actually just mutually
exclusive if-if-if...) constructs.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For conditional rendering this makes it possible to skip rendering
if either the predicate is true or false, as supported by d3d10
(in fact previously it was sort of implied skip rendering if predicate
is false for occlusion predicate, and true for so_overflow predicate).
There's no cap bit for this as presumably all drivers could do it trivially
(but this patch does not implement it for the drivers using true
hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL
functionality).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Moves clearing of the draw so target buffers to the draw
module. They had to be cleared in the drivers before
which was quite messy.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use new util_fill_box helper for util_clear_render_target.
(Also fix off-by-one map error.)
v2: handle non-zero z correctly in new helper
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Believe it or not but these two are actually the first two functions which
really belong in this file nowadays.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly just make sure the layer parameter gets passed through to the right
places (and get clamped, can do this at setup time), fix up clears to
clear all layers and disable opaque optimization. Luckily don't need to
touch the jitted code.
(Clears invoked via pipe's clear_render_target method will not work however
since the pipe_util_clear function used for it doesn't handle clearing
multiple layers yet.)
v2: per Brian's suggestion, prettify var initialization and add some comments,
add assertion for impossible layer specification for surface.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
These should just work, required by d3d10. Too large resources will
get thrown out separately anyway.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was always doing per-pixel alignment which isn't necessary, except
for the buffer case (due to the per-element offset). The disabled code
for calculating it was incorrect because it assumed that always the full
block would be fetched, which may not be the case, so fix this up.
The original code failed for instance for r10g10b10a2 the alignment would
have been calculated as 4 (block_width) * 4 (bytes) so 16, but the actual
fetch may have only fetched 2 values at a time, hence only alignment 8 -
it is unclear what exactly would happen in this case (alignment larger
than size to fetch).
So just use the (already calculated) fetch size instead and get alignment
from that which should always work, no matter if fetching 1,2 or 4 pixels.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For rendering to buffers, we cannot have any y alignment.
So make sure that tile clear commands only clear up to the fb width/height,
not more (do this for all resources actually as clearing more seems
pointless for other resources too). For the jit fs function, skip execution
of the lower half of the fragment shader for the 4x4 stamp completely,
for depth/stencil only load/store the values from the first row
(replace other row with undef).
For the blend function, also only load half the values from fs output,
replace the rest with undefs so that everything still operates on the
full 4x4 block to keep code the same between 4x1 and 4x4 (except for
load/store of course which also needs to skip (store) or replace these
values with undefs (load))., at the cost of slightly less optimal code
being produced in some cases.
Also reduce 1d and 1d array alignment too, because they can be handled the
same as buffers so don't need to waste memory.
v2: don't try to run special blend code for 4x1, (very) slightly less
complexity if we just use the same code as for 4x4 which may or may not
make it easier to optimize in the future (as we care a lot more about 4x4
performance than 1d).
v2: don't use undef values for unused fs src outputs with llvm 3.1 as it
apparently can trigger a bug in llvm.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some parameters were used inconsistently, for instance not using
block_width/block_height/block_size for deferring number of pixels
but rather relying on guesses from the number of fragment shaders etc,
so fix this up (no actual change in behavior since the block size stays
fixed). (Though most of the code would work with different block_height,
with three exceptions, one being the hacked r11g11b10 conversions and
twiddle code which only work with block_height 2 not 1, and the last
one being blend vector type not being 128bit wide.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and
1x8f->1x8ub cases, there might be legitimate reasons why we don't have
enough input vectors for a full destination vector, and using pack
intrinsics should still be much better than using generic conversion
(it looks like convert_alpha from the blend code might hit this though
I suspect it could be avoided).
v2: add another test vector format to lp_test_conv so this gets tested.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
One of the assertion made no sense for buffer rendertargets
(due to the union), so drop it. (The same assertion is present already in
the path for texture surfaces later.).
v2: make assertion completely accurate (suggested by Jose).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The overallocation was very bad especially for things like 1d array
textures which got blown up by a factor of 64. (Even ordinary smallish
2d textures benefit a lot from this, a mipmapped 64x64 rgba8 texture
previously used 7*16kB = 112kB instead of now ~22kB.)
4x4 is chosen because this is the size the jit functions run on, so
making it smaller is going to be a bit more complicated.
It is actually not strictly 4x4 pixel, since we'd want to avoid situations
where different threads are rendering to the same cacheline so we keep
cacheline size alignment in x direction (often 64bytes).
To make this work introduce new task width/height parameters and make
sure clears don't clear the whole tile if it's a partial tile. Likewise,
the rasterizer may produce fragments outside the 4x4 blocks present in a
tile, so don't call the jit function for them.
This does not yet fix rendering to buffers (which cannot have any y
alignment at all), and 1d/1d array textures are still overallocated by a
factor of 4.
v2: replace magic number 4 with LP_RASTER_BLOCK_SIZE, fix size of buffers
allocated (needed in case we render to them).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These were mostly just a waste of memory and cache pressure, and were
really only used for debugging.
This change reduces instruction count (as measured by callgrind's Ir
event) of gnome-shell-perf-tool on Ivybridge by 3.5% ± 0.015% (n=20).
Signed-off-by: Adam Jackson <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We need to clamp to make sure invalid shader doesn't crash our
driver. The spec says to return 0-th index for everything that's
out of bounds.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca<[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
draw_find_shader_output like most of the code in draw used to
depend on position always being at output slot 0. which meant
that any other attribute being at 0 could signify an error.
unfortunately position can be at any of the output slots, thus
other attributes can occupy slot 0 and we need to mark the ones
which were not found by something else. This commit changes
draw_find_shader_output so that it returns -1 if it can't
find the given attribute and adjust the code that depended
on it returning >0 whenever it correctly found an attrib.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca<[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Largely related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca<[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gallium supported only a single viewport/scissor combination. This
commit changes the interface to allow us to add support for multiple
viewports/scissors.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: José Fonseca<[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Eliminate the rest of the no longer needed layout logic.
(It is possible some code could be simplified a bit further still.)
Reviewed-by: Jose Fonseca <[email protected]>
|