| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Cc: [email protected]
[ Francisco Jerez: Fix "PIPE_ENDIAN_SMALL" in the documentation,
define PIPE_ENDIAN_NATIVE. ]
|
|
|
|
| |
Tested-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Never called.
Trivial.
|
|
|
|
|
|
|
|
|
| |
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
|
|
|
|
| |
I broke this with 7948ed1250cae78ae1b22dbce4ab23aceacc6159 for r700 at least.
|
|
|
|
|
|
|
| |
Lets the code compile on non Linux systems.
Signed-off-by: Jonathan Gray <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds H.264 and MPEG2 codec support via VP2, using firmware from the
blob. Acceleration is supported at the bitstream level for H.264 and
IDCT level for MPEG2.
Known issues:
- H.264 interlaced doesn't render properly
- H.264 shows very occasional artifacts on a small fraction of videos
- MPEG2 + VDPAU shows frequent but small artifacts, which aren't there
when using XvMC on the same videos
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but also it tends to increase register usage, so overall positive
effect on performance has to be proven by real benchmarks yet.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
| |
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Update the stale debug code for other changes related to debug output.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Mark values that are members of the 'same register' constraint as
preallocated in ra_init pass, this will prevent incorrect
reallocation in scheduler in some cases.
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
| |
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Fixes "Uninitialized scalar field" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 631c631cbf5b7e84e42a7cfffa1c206d63143370.
https://bugs.freedesktop.org/show_bug.cgi?id=66921
Cc: [email protected]
|
|
|
|
| |
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.
This commit throws that code away and uses a real index buffer instead.
https://bugs.freedesktop.org/show_bug.cgi?id=66558
Cc: [email protected]
|
|
|
|
|
|
| |
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
|
|
|
|
|
|
| |
Fixes "Uninitialized scalar field" reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
| |
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and
can be eliminated in a release build in gen6_pipeline_end(). Move the call
into the assert().
|
|
|
|
|
| |
The checks may seem redundant because cso_context handles them, but
util_blitter does not have access to cso_context.
|
|
|
|
|
| |
Document why certain states need to be saved, and fix a bug when blitting with
scissor enabled.
|
|
|
|
|
|
|
|
|
| |
There are hw bugs. Flush and inv event is sufficient.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66837
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional
kill (if any src component < 0). The later was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.
This patch renames both opcodes:
TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)
Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.
I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
Reviewed-by: Jose Fonseca <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
UVD 2.x doesn't support hardware decoding of MPEG2, just use shader
based decoding for those chipsets.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66450
v2: fix interlacing as well
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
Note: this is a candidate for the 9.1 branch.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
Add the sb CXX files to the Android Makefile and also stop using some
c++11 features.
v2 (Vadim Girlin): use &bc[0] instead of bc.begin()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for some math optimizations that are generally
considered unsafe, that's why they are currently disabled for compute
shaders.
GL requirements are less strict, so they are enabled for
for GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
'sbsafemath' option in R600_DEBUG allows to disable them.
v2 - always set proper src vector size for transformed instructions
- check for clamp modifier in the expr_handler::fold_assoc
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
| |
So that there are at most (2^22 * 6) texels, lower than the 2^26 limit.
|
|
|
|
|
| |
Initialize all 4 channels of undefined registers (that is, TEMPs that are used
before being assigned) in FS.
|
|
|
|
|
|
| |
16 more little piglits.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
| |
One more little piglit.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
| |
The PRM specifies several padding requirements that we failed to honor.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few cpu's it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
It's done automatically for vertex buffers, but not for constant buffers,
textures, and colorbuffers.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
This should increase performance if constant uploads are done with the CP DMA,
because only the cache that needs to be flushed is flushed.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
also flushing any cache in evergreen_emit_cs_shader seems to be superfluous
(we don't flush caches when changing the other shaders either)
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. flush SH with read caches
2. add flag for DB flushes
3. add flag for CB flushes
v2: flush all CBs, remove redundant emit_state variable.
v3: Marek: also set the new flags in r600_context_flush, the CP dma functions,
and texture_barrier, and rename them
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
The winsys should do this, because it measures how much time we spend
in buffer_map doing synchronization, which can be viewed with the gallium
HUD.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
the winsys does this automatically
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.
This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|