| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
It was actually slightly buggy (missing initialization / setup not dependent
on new vs albeit I didn't see issues), but the case of non-existing attributes
is now handled by draw emit code so don't need that anymore.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
openarena 16.35 16.7 2.14%
xonotic 4.707 4.97 5.57%
glmark2 didn't show a significant (more than 1%) difference.
v2: Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
glmark2 (score) 139.8 142.7 2.07%
openarena and xonotic didn't show a significant (more than 1%)
difference.
v2: Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
FPS/Score
Name Before After Delta
------------------------------------------------
glmark2 (score) 136.6 139.8 2.34%
openarena 16.14 16.35 1.30%
xonotic 4.655 4.707 1.11%
v2:
- Convert loads to use aligned loads
- Make sure code is build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows the state tracker to know that the various draw parameters
are available in vertex shaders.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_fragment_layer_viewport requires that if a fs reads layer or viewport
index but it wasn't output by gs (or vs with other extensions), then it reads
0. This never worked for llvmpipe, and is surprisingly non-trivial to fix.
The problem is the mechanism to handle non-existing outputs in draw is rather
crude, it will simply redirect them to whatever is at output 0, thus later
stages will just get garbage. So, rather than trying to fix this up (which
looks non-trivial), fix this up in llvmpipe setup by detecting this case there
and output a fixed zero directly.
While here, also optimize the hw vertex layout a bit - previously if the gs
outputted layer (or vp) and the fs read those inputs, we'd add them twice
to the vertex layout, which is unnecessary.
And do some minor cleanup, slots don't require that many bits, there was some
bogus (but harmless) float/int mixup for psize slot too, make the slots all
unsigned (we always put pos at pos zero thus everything else has to be positive
if it exists), and make sure they are properly initialized (layer and vp index
slot were not which looked fishy as they might not have got set back to zero
when changing from a gs which outputs them to one which does not).
This fixes the failures in piglit's arb_fragment_layer_viewport group
(3 each for layer and vp).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d3d10 actually requires using provoking (first) vertex. GL is happy with
any vertex (as long as we say it's undefined in the corresponding queries).
Up to now we actually used vertex 0 for viewport index, and vertex 1 for
layer (for tris), which really didn't make sense (probably a typo). Also,$
since we reorder vertices of clockwise triangle, that actually meant we used
a different vertex depending if the traingle was cw or ccw (still ok by gl).
However, it should be consistent with what draw (clip) does, and using
provoking vertex seems like the sensible choice (draw clip will be fixed
next as it is totally broken there).
While here, also use the correct viewport always even when not needed
in setup (we pass it down to jit fragment shader it might be needed there
for getting correct near/far depth values).
No piglit changes.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
33339775565154040e0c4ea2e196217dccc08cdf added support for ASTC textures to
gallium. They don't have any helpers hooked up for software decoding, however,
so cannot support them in drivers relying on util code for decoding.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removing the fake format helpers (1c7d0a6aa4f5cb38af7e281e1e5437cd1a20f781)
caused this to fail. These formats were never supported, but previously
they would have asserted in the generated jit functions (which, due to lack
of test cases for these formats, were never called) whereas we now assert when
trying to build the jit function. So, skip them completely.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=93092
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
There are some weird problems with 8-wide vectors.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
The decision which method to use is determined by the size of the vector
that is used by the platform.
For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.
For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.
This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).
This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.
The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:
- glxspheres, on ppc64le:
- original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
- with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
- Additional 0.8% performance boost
- glxspheres, on x86-64 without AVX:
- original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
- with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
- Additional 0.74% performance boost
- glmark2, on ppc64le:
- original code: score of 58
- with my change: score of 57
- glmark2, on x86-64 without AVX:
- original code: score of 175
- with the patch: score of 167
- Impact of of -4.5% on performance
- OpenArena, on ppc64le:
- original code: 3398 frames 1719.0 seconds 2.0 fps
255.0/505.9/2773.0/0.0 ms
- with the patch: 3398 frames 1690.4 seconds 2.0 fps
241.0/497.5/2563.0/0.2 ms
- 29 seconds faster with the patch, which is about 2%
- OpenArena, on x86-64 without AVX:
- original code: 3398 frames 239.6 seconds 14.2 fps
38.0/70.5/719.0/14.6 ms
- with the patch: 3398 frames 244.4 seconds 13.9 fps
38.0/71.9/697.0/14.3 ms
- 0.3 fps slower with the patch (about 2%)
Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.
The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.
It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.
This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git
v2: bump extension version,
check extension version before calling get image. (Ian)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
For ARB_copy_image.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources neither
at the resource nor view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
I'll let drivers figure out how to do it.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.
Reviewed-by: Ilia Mirkin <[email protected]>
Acked-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows creating compute-only and debug contexts.
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74329
v2: add a CAP for half floats
drivers should not expose the CAPs if they don't support the formats
v3: update relnotes
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
fence_finish(timeout=0) does the same thing
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
I copied what fence_signalled does.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Tested with Ilia Mirkin's gzdoom.trace and
"arb_uniform_buffer_object-maxuniformblocksize fsexceed" piglit test
without my earlier fix to fail linkage when UBO exceeds
GL_MAX_UNIFORM_BLOCK_SIZE.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Just implement it in terms of util_resource_copy_region(). Both the
original code and util_resource_copy_region() boil down to mapping,
calling util_copy_box() and unmapping.
No piglit regressions. This will also help to implement GL_ARB_copy_image.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pretty trivial, fixes the issue that we're expected to be able to blit
stencil surfaces (as the blit just relies on util blitter code which needs
stencil export to do it).
2 piglits skip->pass, 11 fail->pass
v2: prettify, keep different stencil ref value handling out of depth/stencil
test itself.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Apparently some compilers think we probably wanted to do !(x == y) instead
and issue a warning, so just shut it up... No functional change, obviously.
Cc: <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the functionality was pretty much there, just not tested.
Trivially fix up the missing pieces (take target info from view not
resource), and add some missing bits for cubes.
Also add some minimal debug validation to detect uninitialized target values
in the view...
49 new piglits, 47 pass, 2 fail (both related to fake multisampling,
not texture_view itself). No other piglit changes.
v2: move sampler view validation to sampler view creation, update docs.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
GL_AMD_performance_monitor must return an error when a monitoring
session cannot be started.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just announce support for 4 components.
While here also increase the max/min texel offsets (the limit is completely
artificial, was chosen because that's what other hardware did, however there's
other drivers using larger limits).
Over a thousand little piglits skip->pass.
v2: update docs/GL3.txt
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
if shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions looked quite complicated, even though what they actually did
was trivial (ever since we dropped swizzled rendering). Also drop lookup of
format block per bytes done for each block, and do it once per scene instead.
This improves everybody's favorite "benchmark" by 3% or so, though
lp_rast_shade_quads_all() which calls this shows up still quite high for a
function which does little more than call the jit function.
(This would most likely be much better handled by the jit function itself,
the strides are passed through anyway already, though for being able to
handle layers it would definitely add some complexity.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The only hackish ones are llvmpipe and softpipe, which currently return
the same string as for get_vendor(), while ideally they should return
the CPU vendor.
Signed-off-by: Giuseppe Bilotta <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's 2 reasons why we'd want to use the global context:
1) There still seems to be one memory "leak" left when using multiple llvm
contexts (it is not a true leak as the memory disappears into some still
addressable pool but nevertheless the memory consumption grows). See
http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/
2) These contexts get kinda big - even when disposing modules etc. after
compiling a shader the LLVMContext can easily be over 100kB. So when there's
lots of llvm contexts arounds it adds up.
The downside is that at least right now this is absolutely not thread safe,
so this only works safely in environments where multiple pipe contexts are not
used concurrently.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Matching what we already do with autotools builds.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
where possible.
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.
This is achieved by using the appropriate -Werror= options only on the
places they need to be used.
Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies). I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|