| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This patch adds liveness analysis to i915g and a couple
optimizations which benefit from it. One interesting
optimization turns (fake) indirect texture accesses into direct
texture accesses (the i915 supports a maximum of 4 indirect
texture accesses). Among other things this fixes a bunch of
piglit tests.
|
|
|
|
|
|
|
| |
Fixes assertion failues in 24 piglit tests with
MESA_GL_VERSION_OVERRIDE=3.0, 12 of which are now passing.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Fixes 20 piglit tests with MESA_GL_VERSION_OVERRIDE=3.0.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Based on calim's original fix in the nine branch.
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: "9.2 and 9.1" <[email protected]>
|
|
|
|
|
|
| |
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=68845
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Cc: "9.2" <[email protected]>
|
|
|
|
| |
Cc: "9.2 and 9.1" <[email protected]>
|
|
|
|
| |
Too many calls into libdrm when a single one is enough.
|
| |
|
|
|
|
| |
Cc: "9.2" <[email protected]>
|
|
|
|
|
|
| |
Fixes crash with Amnesia: The Dark Descent.
Cc: "9.2 and 9.1" <[email protected]>
|
|
|
|
|
|
|
| |
This fixes shaders produced by supertuxkart.
Cc: "9.2" <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Cc: "9.2 and 9.1" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags
and si_emit_cache_flush emits the packets. That's it. The shared radeon code
tells us when the streamout cache should be flushed, so we have to check
the flags anyway.
There is a new atom "cache_flush", because caches must be flushed *after*
resource descriptors are changed in memory.
Functional changes:
* Write caches are flushed at the end of CS and read caches are flushed
at its beginning.
* Sampler view states are removed from si_state, they only held the flush
flags.
* Everytime a shader is changed, the I cache is flushed. Is this needed?
Due to a hw bug, this also flushes the K cache.
* The WRITE_DATA packet is changed to use TC, which fixes a rendering issue
in openarena. I'm not sure how TC interacts with CP DMA, but for now it
seems to work better than any other solution I tried. (BTW CIK allows us
to use TC for CP DMA.)
* Flush the K cache instead of the texture cache when updating resource
descriptors (due to a hw bug, this also flushes the I cache).
I think the K cache flush is correct here, but I'm not sure if the texture
cache should be flushed too (probably not considering we use TC
for WRITE_DATA, but we don't use TC for CP DMA).
* The number of resource contexts is decreased to 16. With all of these cache
changes, 4 doesn't work, but 8 works, which suggests I'm actually doing
the right thing here and the pipeline isn't drained during flushes.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a new "class" si_buffer_resources, which should be good enough for
implementing any kind of buffer bindings (constant buffers, vertex buffers,
streamout buffers, shader storage buffers, etc.)
I don't even keep a copy of pipe_constant_buffer - we don't need it.
The main motivation behind this is to have a well-tested infrastrusture
for setting up streamout buffers.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Also r600_hw_context_priv.h and si_state_streamout.c are removed, because
they are no longer needed.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This streamout state code will be used by radeonsi.
There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:
pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen
The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.
Thanks to Tom Stellard for fixing the build for r600g/compute.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There are drivers not using these optional stages.
Broken by a3ae5dc7dd5c2f8893f86a920247e690e550ebd4.
Cc: [email protected]
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Niels Ole Salscheider <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
pstipple/aaline stages used PIPE_MAX_SAMPLER instead of
PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views.
Now these stages can't actually handle sampler_unit != texture_unit anyway
(they cannot work with d3d10 shaders at all due to using tex not sample
opcodes as "mixed mode" shaders are impossible) but this leads to crashes if
a driver just installs these stages and then more than PIPE_MAX_SAMPLER views
are set even if the stages aren't even used.
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out we don't need to do much extra work for detecting this case,
since we are guaranteed to get a empty static texture state in this case,
hence just rely on format being 0 and return all zero then.
Previously needed dummy textures (would just have crashed on format being 0
otherwise) which cannot return the correct result for size queries and when
sampling textures with wrap modes using border.
As a bonus should hugely increase performance when sampling unbound textures -
too bad it isn't a useful feature :-).
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
| |
Instead of crashing just return all zero.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
No idea if this is working right but copied straight from llvmpipe.
(Not only does this check the so_target but also use buffer->data instead
of buffer for the mapping.)
Just trying to get rid of a segfault testing something else...
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
| |
The only reason this was needed was because the fetch texel function had to
get the (dynamic) border color, but this is now done much earlier.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing
AoS path wouldn't be all that difficult (use all the same logic as SoA) but
considered not worth it for now.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
We need to export at least one color if the shader writes it,
even when nr_cbufs==0.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Fixes "Uninitialized pointer field" defects reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since we can have per-pixel lod we should also honor the filter per-pixel
(in fact we didn't honor it per quad neither in the multiple quad case).
Do this by running the linear path and simply beating the weights into shape
(the sample with the higher weight is the one which should have been chosen
with nearest filtering hence adjust filter weight to 1.0/0.0 based on that).
If all pixels use nearest filter (either min and mag) then still run just a
nearest filter as this is way cheaper (probably around 4 times faster for 2d,
more for 3d case) and it should be relatively rare that pixels really need
different filtering. OTOH if all pixels would require linear don't do anything
special since the linear path with filter adjustments shouldn't really be all
that much more expensive than ordinary linear, and we think it's rare that
min/mag filters are configured differently so there doesn't seem much value
in trying to optimize this further.
This does not yet fix the AoS path (though currently AoS is only used for
single quads hence it could be considered less broken, just never honoring
per-pixel filter decision but doing it per quad).
v2: simplify code a bit (unify min linear and min nearest cases)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While a sqrt here and there shouldn't hurt much (depending on the cpu) it is
possible to completely omit it since rho is only used for calculating lod and
there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for
calculating lod this means we get a simple mul instead of sqrt (in case of
nearest mip filter in fact we don't need to replace the sqrt with something
else at all), only in some not very useful path this doesn't work (combined
brilinear calculation of int level and fractional lod, accurate rho calc but
brilinear filtering seems odd).
Apart from being faster as an added bonus this should increase our crappy
fractional accuracy of lod, since fast_log2 is only good for ~3bits and this
should increase accuracy by one bit (though not used if dimension is just one
as we'd need an extra mul there as we never had the squared rho in the first
place).
v2: use separate ilog2_sqrt function if we have squared rho.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is just preparation for per-pixel (or per-quad in case of multiple quads)
min/mag filter since some assumptions about number of miplevels being equal
to number of lods no longer holds true.
This change does not change behavior yet (though theoretically when forcing
per-element path it might be slower with different min/mag filter since the
code will respect this setting even when there's no mip maps now in this case,
so some lod calcs will be done per-element just ultimately still the same
filter used for all pixels).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Fixes "Missing break in switch" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The downstream android kernel driver is "kgsl", the upstream drm/kms
driver is called "msm". Since libdrm_freedreno handles the differences
between the two, we need to load the same thing for either device.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
There where some small API tweaks in libdrm_freedreno to enable support
for msm drm/kms driver.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We need to set the flag on all the .xyzw components that are written by
the instruction, not just on .x. Otherwise a later use of rN.y (for
example) will not trigger the appropriate sync bit to be set.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Seems like most/all instructions have some restrictions about const src
registers. In seems like the 2 src (cat2) instructions can take at most
one const, and the 3 src (cat3) instructions can take at most one const
in the first 2 arguments. And so on. Handle this properly now.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
And remove libdrm/ from a winsys include statement.
Signed-off-by: Jonathan Gray <[email protected]>
|
|
|
|
|
|
| |
The previous point/line/triangle() functions didn't handle GS primitives.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Otherwise the first few frames have an incorrect reference index.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
Silences Coverity "Out-of-bounds access" defect.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
MSAA was tested by one user on RS690 and it works for him with color
compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM.
Since we don't have hardware documentation about which chipsets actually have
CMASK RAM, I had to take a guess based on the presence of HiZ.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In particular noone is interested in the vertex count, so drop that,
and also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. I am unable for now to figure
out if primitives_storage_needed in SO stats (used for d3d10) should
increase if SO is disabled, though the equivalent num_primitives_generated
used for OpenGL definitely should increase. In any case we were only counting
when SO is active both in softpipe and llvmpipe anyway so don't pretend there's
an independent num_primitives_generated counter which would count always.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just
as before, should eventually fix this by doing either separate counting for this
query or adjust the code so it always counts this even if SO is inactive depending
on what's correct for d3d10.)
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
There's just no way resetting the counters is working with nested/overlapping
queries.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
There's just no way resetting the counters is working with nested/overlapping
queries.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
CC: "9.2" <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
The command is submitted once the event has been triggered, but it might not
have completed yet. Therefore, we have to add it to deps in order to wait on it.
Signed-off-by: Niels Ole Salscheider <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Niels Ole Salscheider <[email protected]>
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
This was broken when arrayid was added.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|