| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the following shader:
vec4 f(vec4 v) { return v; }
vec4 f(vec4 v);
The prototype exactly matches the signature of the earlier definition,
so there's absolutely no point in it. However, it doesn't appear to
be illegal. The GLSL 4.30 specification offers two relevant quotes:
"If a function name is declared twice with the same parameter types,
then the return types and all qualifiers must also match, and it is the
same function being declared."
"User-defined functions can have multiple declarations, but only one
definition."
In this case the same function was declared twice, and there's only one
definition, which fits both pieces of text. There doesn't appear to be
any text saying late prototypes are illegal, so presumably it's valid.
Unfortunately, it currently triggers an assertion failure:
ir_dereference_variable @ <p1> specifies undeclared variable `v' @ <p2>
When we process the second line, we look for an existing exact match so
we can enforce the one-definition rule. We then leave sig set to that
existing function, and hit sig->replace_parameters(&hir_parameters),
unfortunately nuking our existing definition's parameters (which have
actual dereferences) with the prototype's bogus unused parameters.
Simply bailing out and ignoring such late prototypes is the safest
thing to do.
Fixes Piglit's late-proto.vert as well as 3DMark/Ice Storm for Android.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Tom Stellard <[email protected]>
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It guarded the function prototype of pipe_loader_sw_probe, whose use (in
pipe_loader.c) and definition (in pipe_loader_sw.c) were not guarded.
Both are built into libpipe_loader.la if HAVE_LOADER_GALLIUM, which is
enable_gallium_loader in configure.ac.
Tested-by: Tom Stellard <[email protected]>
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Tom Stellard <[email protected]>
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
For consistency, since we already have HAVE_PIPE_LOADER_{SW,DRM}.
Tested-by: Tom Stellard <[email protected]>
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The number of samples is already available in the miptree data
structure, so there's no need to pass it in.
I suspect this may fix a subtle bug because in one case
(intel_renderbuffer_update_wrapper) we were always passing zero for
num_samples, even though the buffer in question was not guaranteed to
be single-sampled. But I wasn't able to find a failing test case.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Technically it's legal for geometry shader to not emit any
vertices. It's silly, but perfectly legal, so lets make draw
stop crashing if it happens.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The upside is less CPU overhead in fiddling with GL error handling, the
ability to use the constant color write message in most cases, and no GLSL
clear shaders appearing in MESA_GLSL=dump output. The downside is more
batch flushing and a total recompute of GL state at the end of blorp.
However, if we're ever going to use the fast color clear feature of CMS
surfaces, we'll need this anyway since it requires very special state
setup.
This increases the fail rate of some the GLES3conform ARB_sync tests,
because of the initial flush at the start of blorp. The tests already
intermittently failed (because it's just a bad testing procedure), and we
can return it to its previous fail rate by fixing the initial flush.
Improves GLB2.7 performance 0.37% +/- 0.11% (n=71/70, outlier removed).
v2: Rename the key member, use the core helper for sRGB, and use
BRW_MASK_* enums, fix comment and indentation (review by Paul).
v3: Rewrite a comment, drop a silly temporary variable (review by Ken)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: const-qualify ctx, and add a comment about the function (recommended
by Brian and Kenneth).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 performance 0.13% +/- 0.09% (n=104/105, outliers removed).
More importantly, once color glClear()s are done through blorp in the next
commit, this reduces regression in GLES3 conformance tests that rely on
queueing up many glClear()s and having the GPU report being still busy in
an ARB_sync query after that.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Collects various statistical information for each shader
and total stats for contexts.
Printed with R600_DEBUG=sb,sbstat
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases we use value::gvn_source field to link values that
are known to be equal before gvn pass (e.g. results of DOT4 in different
slots of the same alu group), but then source value may become dead later
and this confuses further passes.
This patch resets value::gvn_source to NULL in the dce_cleanup pass
if it points to dead value.
Fixes segfault during shader optimization with ETQW.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not a complete register pressure tracking, yet it helps to prevent
register allocation problems in some cases where they were observed.
The problems are uncovered by false dependencies between fetch instructions
introduced by some recent changes in TGSI and/or default backend.
Sometimes we have code like this:
...
SAMPLE R5.xyzw, R5.xyzw
... store R5.xyzw somewhere
MOV R5.x, <next x coord>
MOV R5.y, <next y coord>
SAMPLE R5.xyzw, R5.xyzw
... <may be repeated a lot of times>
With 2D resources, z and w in SAMPLE src reg aren't used and can be simply
masked, but shader backend doesn't have this information, so it's
considered as data dependency by optimization algorithms.
|
| |
|
| |
|
|
|
|
|
|
| |
Optimization is enabled with "R600_DEBUG=sb".
Signed-off-by: Vadim Girlin <[email protected]>
|
| |
|
|
|
|
| |
This prevents the problems when the header is included in C++ code.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This results in more clean shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
| |
The remaining bits happen to do nothing that
_swrast_span_render_start()/finish() don't do.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
It's not really span code ever since we stopped using spans for S8.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case of renering to windows in X, we would render to stale buffers
(or not render at all!) if you hit a MapRenderbuffer as the first thing
done to your window after new buffers are ready to be collected in DRI2.
I think this also covers the weird comment about irb->mt being missing
sometimes.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
I always forget how we do this for compressed textures.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that everything goes through ImageSlices[], we can rely on the
driver's existing texture mapping function.
A big block of code goes away on Radeon that looks like it was to deal with
the validate that happened at SpanRenderStart, which no longer occurs since we
don't need validation for the MapTextureImage hook.
v2: Rewrite comment about ImageSlices, fix duplicated swImages, touch up
unmap loop.
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This code is trying to deal with providing a map in the case that
AllocTexImageBuffer was called, which is hooked up to the swrast variant.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
MapTextureImage has the exact same logic, except it can also handle
swrast-allocated buffers.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For hardware drivers with pitch alignment requirements, a
non-power-of-two-sized texture format won't end up being an integer number
of pixels per row. Also, avoids having to change our units between
MapTextureImage's rowStride and swrast's RowStride.
This doesn't fully convert the compressed texel fetch path, but does make
sure we don't drop any bits (not that we'd expect to).
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This gets us ready for the Map field to die.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a step toward allowing drivers to use their normal mapping paths,
instead of requiring that all slice mappings come from an aligned offset
from the first slice's map.
This incidentally fixes missing slice handling in FXT1 swrast.
v2: Use slice height helper function.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
v2: Move slice height calculation to a helper function (recommeded by Brian).
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This function going to get used a lot more in upcoming patches.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This is the equivalent of intel's
80513ec8b4c812b9c6249cc5824337a5f04ab34c.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
| |
* Haiku now has DEP enabled by default.
|
|
|
|
| |
* Haiku now has DEP enabled by default.
|
|
|
|
| |
* Haiku now has DEP enabled by default.
|
| |
|
|
|
|
|
|
|
| |
This could be used by shader-db for hopefully more accurate regression
testing.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|
|
|
|
|
|
|
|
| |
Improves GLB2.7 trex performance 1.01985% +/- 0.721366% on my IVB (n=10)
and by 3.38771% +/- 0.584241% (n=15) on my HSW, due to a 32x32 ARGB8888
cubemap going from untiled to tiled.
Reviewed-by: Daniel Vetter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|