| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but also it tends to increase register usage, so overall positive
effect on performance has to be proven by real benchmarks yet.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
| |
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Update the stale debug code for other changes related to debug output.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Mark values that are members of the 'same register' constraint as
preallocated in ra_init pass, this will prevent incorrect
reallocation in scheduler in some cases.
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
| |
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Fixes "Uninitialized scalar field" defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Every function but the above four uses explicitly sized types for their
src and dst arguments. Even fetch_rgba_{s,u}int follows the convention.
Signed-off-by: Emil Velikov <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular
JIT has bit-rotted badly on ARM and doesn't exist on AArch64.)
Signed-off-by: Kyle McMartin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 631c631cbf5b7e84e42a7cfffa1c206d63143370.
https://bugs.freedesktop.org/show_bug.cgi?id=66921
Cc: [email protected]
|
|
|
|
| |
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.
This commit throws that code away and uses a real index buffer instead.
https://bugs.freedesktop.org/show_bug.cgi?id=66558
Cc: [email protected]
|
|
|
|
|
|
|
| |
Some lame compilers can't do exp2f() and as far as I can tell they can't do
exp2() (with doubles) neither so instead of providing some workaround for
that (wouldn't actually be too bad just replace with pow) and since it is
used with a constant only just use the precalculated constant.
|
|
|
|
|
|
| |
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
srgb-to-linear is using 3rd degree polynomial for now which should be _just_
good enough. Reverse is using some rational polynomials and is quite accurate,
though not hooked into llvmpipe's blend code yet and hence unused (untested).
Using a table might also be an option (for srgb-to-linear especially).
This does not enable any new features yet because EXT_texture_srgb was already
supported via util_format fallbacks, but performance was lacking probably due
to the external function call (the table used by the util_format_srgb code may
not be all that much slower on its own).
Some performance figures (taken from modified gloss, replaced both base and
sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge,
the numbers aren't terribly accurate):
normal gloss, aos, 8-wide: 47 fps
normal gloss, aos, 4-wide: 48 fps
normal gloss, forced to soa, 8-wide: 48 fps
normal gloss, forced to soa, 4-wide: 47 fps
patched gloss, old code, soa, 8-wide: 21 fps
patched gloss, old code, soa, 4-wide: 24 fps
patched gloss, new code, soa, 8-wide: 41 fps
patched gloss, new code, soa, 4-wide: 38 fps
So there's a performance hit but it seems acceptable, certainly better
than using the fallback.
Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will
continue to use the old util_format fallback, because I can't be bothered
to write code for formats noone uses anyway (as decoding is done as part of
lp_build_unpack_rgba_soa which can only handle block type width of 32).
Compressed srgb formats should get their own path though eventually (it is
going to be expensive in any case, first decompress, then convert).
No piglit regressions.
v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also
since keeping both linear to srgb functions for now make sure both are
compiled (since they share quite some code just integrate into the same
function).
v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb
path.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had to disable fast rsqrt before because it wasn't precise enough etc.
However in situations when we know we're not going to need more precision
we can still use a fast rsqrt (which can be several times faster than
the quite expensive sqrt). Hence introduce a new helper which does exactly
that - it is probably not useful calling it in some situations if there's
no fast rsqrt available so make it queryable if it's available too.
v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation,
let rsqrt use fast_rsqrt.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
Fixes "Uninitialized scalar field" reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
| |
The original idea was that cs=NULL should be allowed here, but we never used
NULL until 862f69fbe1e54e0e9a3c439450a14f. This fixes a segfault in CoreBreach.
|
|
|
|
|
|
| |
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and
can be eliminated in a release build in gen6_pipeline_end(). Move the call
into the assert().
|
|
|
|
|
| |
The checks may seem redundant because cso_context handles them, but
util_blitter does not have access to cso_context.
|
|
|
|
|
| |
Document why certain states need to be saved, and fix a bug when blitting with
scissor enabled.
|
|
|
|
|
|
|
|
|
| |
There are hw bugs. Flush and inv event is sufficient.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66837
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL spec says that rsq is undefined for src<=0, but the D3D10
spec says it needs to be a NaN, so lets stop taking an absolute
value of the source which completely breaks that behavior. For
the gl program we can simply insert an extra abs instrunction
which produces the desired behavior there.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
So that lp_test_format doesn't fail until we decide what should be done.
|
|
|
|
|
|
|
| |
lp_build_cmp already returns 0 / ~0, so the lp_build_select call is
unnecessary.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional
kill (if any src component < 0). The later was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.
This patch renames both opcodes:
TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)
Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.
I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
KILP is really unconditional fragment kill.
We've had KIL and KILP transposed forever. I'll fix that next.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
To align with the docs and the state tracker.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The code happened to work in the past since the (scalar) src args
effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so
whether you grab the X or Y component doesn't really matter. Just
fixing the code to make it look right.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
| |
v2: explicitly test for BSD/APPLE, #warning for unexpected
environments.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
UVD 2.x doesn't support hardware decoding of MPEG2, just use shader
based decoding for those chipsets.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66450
v2: fix interlacing as well
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
Note: this is a candidate for the 9.1 branch.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
Add the sb CXX files to the Android Makefile and also stop using some
c++11 features.
v2 (Vadim Girlin): use &bc[0] instead of bc.begin()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for some math optimizations that are generally
considered unsafe, that's why they are currently disabled for compute
shaders.
GL requirements are less strict, so they are enabled for
for GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
'sbsafemath' option in R600_DEBUG allows to disable them.
v2 - always set proper src vector size for transformed instructions
- check for clamp modifier in the expr_handler::fold_assoc
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jonathan Gray <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
There is no public symbol in this winsys.
|
|
|
|
| |
So that there are at most (2^22 * 6) texels, lower than the 2^26 limit.
|
|
|
|
|
| |
Initialize all 4 channels of undefined registers (that is, TEMPs that are used
before being assigned) in FS.
|
|
|
|
|
|
| |
16 more little piglits.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
| |
One more little piglit.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems __builtin_ia32_ldmxcsr is only available on gcc and only when
-msse is used. xmmintrin.h/pmmintrin.h provide portable intrinsics, but
these too are only available with gcc when -msse/-msse3 are set.
scons build always sets -msse on x86 builds, but autotools doesn't seem
to.
We could try to get this working on gcc x86 without -msse by emitting
assembly, but I believe that in this day and age we really should be
building Mesa with -msse and -msse2.
|
|
|
|
| |
The PRM specifies several padding requirements that we failed to honor.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few cpu's it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a follow
up commit.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|