| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
So that pipe->blit can be used for 3D mipmap generation.
|
|
|
|
|
| |
It may cause issues with mipmap generation.
I think it was used to make some piglit tests pass on r300g.
|
|
|
|
|
|
|
|
|
| |
As we do for sampler states in single_sampler_done() and many other
CSO functions.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to call pipe->set_sampler_views() with count being the
maximum of the old number of sampler views and the new number.
This makes sure we null-out any old sampler views.
We already do the same thing for sampler states in single_sampler_done().
Fixes some assertions seen in the VMware driver with XA tracker.
Cc: "10.0" "10.1" <[email protected]>
Reviewed-by: Thomas Hellstrom <[email protected]>
Tested-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
it's useful to know what the llvmbuildstore arguments are going to
be before executing it because it can crash and make sure to
print out the inputs only if we're not generating a gs because
it fetches inputs differently.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to overallocate the output buffer sometimes running out
of memory with applications rendering large geometries. The actual
maximum number of vertices out is simply the maximum number of
primitives in (number of gs invocations) multiplied by the maximum
number of output vertices per gs input primitive (i.e. gs invocation).
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This can get called in some circumstances if both src type and dst type
have same width (seen with float32->unorm32). While this particular case
was bogus anyway let's just fix that as it can work trivially (due to the
way it was called it actually worked anyway apart from the assert).
Reviewed-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As done in draw_pipe_aaline and draw_pipe_aapoint modules.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
Cc: "10.0 10.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Similar to the other cases, shift some weight/coord calculations to int
space. This should be slightly faster (on x86 sse it should actually safe one
instruction, and generally int instructions are cheaper).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code used coords which were calculated as
(int) (f_coord * tex_size * 256) >> 8.
This is not only unnecessarily complex but can give the wrong texel due to
rounding for negative coords (as an example, after denormalization coords
from -1.0 to 0.0 should give -1, but this will give -1 for numbers from
-1.0-1/256 - 0.0-1/256.
Instead, juse use ifloor, dropping the shift stuff.
Unfortunately, this will most likely be slower - with arch rounding available
it shouldn't be too bad (trades a int shift for a round but also saves an int
mul (which is shared by all coords) but otherwise it's a mess.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous method for converting coords to ints was sligthly inaccurate
(effectively losing 1bit from the 8bit lerp weight). This is probably
especially noticeable when trying to draw a pixel-aligned texture.
As an example, for a 100x100 texture after dernormalization the texture
coords in this case would turn up as
0.5, 1.5, 2.5, 3.5, 4.5, ...
After the mul by 256, conversion to int and 128 subtraction, they end up as
0, 256, 512, 768, 1024, ...
which gets us the correct coords/weights of
0/0, 1/0, 2/0, 3/0, 4/0, ...
But even LSB errors (which are unavoidable) in the input coords may cause
these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a
coord/weight of 2/255 instead.
Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be
equally fast on x86 sse though other archs probably suffer a little.
|
|
|
|
|
|
|
|
|
|
|
| |
There is little gain in printing whenever a folder is created.
v2:
- Use $(AM_V_at) over @ to have control in verbose builds.
Suggested by Erik Faye-Lund.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset draw_auto is capable
of rendering from buffers which have not been actually streamed
out to. Our interface didn't allow. This change functionally
shouldn't make any difference to OpenGL where instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Like L4A4.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: rotate in gen_rect_verts instead
v3: clear rotate in vl_compositor_clear_layers,
update calc_drawn_area as well
Signed-off-by: Kusanagi Kouichi <[email protected]>
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fix a leaked vertex shader in u_blitter.c
Signed-off-by: Aaron Watry <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
CC: "10.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because in draw we always inject position at slot 0 whenever
fragment shader would take the maximum number of inputs (32) it
meant that we had PIPE_MAX_ATTRIBS + 1 slots to translate, which
meant that we were crashing with fragment shaders that took
the maximum number of attributes as inputs. The actual max number
of attributes we need to translate thus is PIPE_MAX_ATTRIBS + 1.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Matthew McClure <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
draw_current_shader_* functions return a final output when considering
both the geometry shader and the vertex shader. But when code generating
vertex shader we can not be using output slots from the geometry shader
because, obviously, those can be completely different. This fixes a
number of very non-obvious crashes.
A side-effect of this bug was that sometimes the vertex shading code
could save some random outputs as position/clip when the geometry
shader was writing them and vertex shader had different outputs at
those slots (sometimes writing garbage and sometimes something correct).
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Matthew McClure <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Brian Paul <[email protected]>
Cc: "10.0" "10.1" <[email protected]>
|
|
|
|
|
|
|
| |
This is needed for MIN2/MAX2
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some formats can't be handled - in particular cannot handle ints/uints formats,
which lack the pack_rgba_float/unpack_rgba_float functions. Instead of trying
to call these (and crash) return an error (I'm not sure yet if we should try
to translate such formats too here might not make much sense).
v2: suggested by Jose, use separate checks for pack/unpack of rgba_8unorm and
rgba_float functions (right now if one exists the other should as well).
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Fredrik Höglund <[email protected]>
|
|
|
|
| |
Reviewed-by: Fredrik Höglund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- Use __builtin_bswap64()
- Remove unnecessary mask
- Add util_le64_to_cpu() helper
v3:
- Remove unnecessary AC_SUBST
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- Remove unnecessary AC_SUBST
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The above function implies using the the xlib winsys, which
has additional library dependencies that should not be forced.
Make the software xlib pipe loader optional thus avoid all
the dependency hell. A user that wishes to use the particular
pipe-loader would need to set the following within configure.ac.
enable_gallium_xlib_loader=yes
v2:
- Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems
lacking X11 headers. Spotted by Christian Prochaska.
Tested-by: Tom Stellard <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
v2: Handle null_sw_create failure, add missing function return type
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
| |
Will be used in the following commits.
v2: Link gallium tests against the library.
v3: Handle dri_create_sw_winsys failure
v4: Rebase on top of the targets/xa changes
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]> (v2)
|
|
|
|
|
|
|
|
|
| |
Will be used in the upcoming patches.
v2: handle xlib_create_sw_winsys failure, drop unneeded header
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
|
|
|
|
|
|
|
|
| |
v2: Rebase on top of the rendernode changes.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
Reviewed-by: Francisco Jerez <[email protected]> (v1)
|
|
|
|
|
|
| |
v2: Rebase on top of vl_winsys_xsp.c removal
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]> (v1)
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
defined
Currently HAVE_PIPE_LOADER_XCB is defined, rather than being set to 1/0.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sw pipe-loader implicitly handles winsys_create, thus we
it would make sense to implicitly destroy it upon releasing
the loader.
Currently we leak the sw_winsys when releasing the pipe-loader.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes a build break in state_tracker/st_program.c
Signed-off-by: Jordan Justen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75278
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <[email protected]>
Reviewed-by: Si Chen <[email protected]>
|
|
|
|
|
|
|
|
| |
Build two versions of pipe-loader, with only the client version linking
in x11 client side dependencies. This will allow the XA state tracker
to use pipe-loader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Requested by Marek.
Reviewed-by: Marek Olšák <[email protected]>
Cc: "10.1" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: "10.1" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some situations, it may be desirable to bypass the cache at buffer
creation but to insert the buffer in the cache at buffer destruction.
One such situation is where we already have a kernel representation of a
buffer that we want to use, but we also want to insert it in the cache when
it's freed up.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: "10.1" <[email protected]>
|
|
|
|
|
|
|
|
| |
In some situations it's important to restrict the sizes of buffers that the
cached buffer manager is allowed to return
Signed-off-by: Thomas Hellstrom <[email protected]>
Cc: "10.1" <[email protected]>
|
|
|
|
| |
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- Add missing call to pipe_loader_drm_release()
- Fix render node macros
- Drop render-node configure option
|