| Commit message (Collapse) | Author | Age | Files | Lines |
|
Only 8 out of the up to 13 regs are for source/dest depth, so the name
wasn't particularly appropriate. Note that this doesn't count the
constant or URB payload regs. Also, don't pre-divide by 2, so it's
actually a number of registers.
|
Whenever the accumulator results are needed, this bit must be set.
|
This makes reading the code easier when matching up to the specs,
which also use this format.
|
The SIMD16 message no longer has the goofy interleaved format that
made Compr4 compression necessary before.
|
format ‘%d’ expects type ‘int’, but argument 2 has type ‘long int’
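
The warning above is raised when a `long int` is passed to a `%d` conversion. A minimal sketch of the usual fix follows; the helper name `format_count` is hypothetical and not taken from the Mesa tree. The `l` length modifier makes the conversion match the argument type.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only (not from the Mesa tree).
 * Writing "%d" here would trigger the warning quoted above, because
 * "%d" expects an int while `count` is a long int. */
int format_count(char *buf, size_t size, long count)
{
    /* "%ld" matches the long int argument. */
    return snprintf(buf, size, "count = %ld", count);
}
```

An explicit cast to `int` would also silence the warning, but it may truncate the value on platforms where `long` is wider than `int`.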
|
Otherwise, we might end up with the if stack pointing at the wrong
place. Fixes GPU hang with glsl-vs-if-loop.
|
Fixes glsl-vs-if-nested (70.0 is not <= 70.000648 thanks to the
swizzle bits getting set). Some safety checks are added to make sure
this doesn't happen again as we increase the usage of immediate values
in program generation.
|
We'll need to use the HALT instruction to do this right, like returns
from other functions.
|
Fixes glsl-vs-dot-vec2.
|
This saves an extra message reg move in the program, though I'm not
clear on whether it will have any performance impact other than cache
footprint. It will also fix those math calls on Sandybridge, where
the brw_eu_emit.c brw_math() support relies on the implied move being
used.
|
mtypes.h does not use any symbols from compiler.h.
Also add the required headers for files that depended on symbols from
compiler.h but were indirectly including compiler.h through mtypes.h.
|
Mixing stderr (_mesa_print_program, _mesa_print_instruction,
_mesa_print_alu) with stdout means that when writing both to a file,
there isn't a consistent ordering between the two.
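
The ordering problem arises because, when redirected to a file, stdout is fully buffered while stderr is not, so lines written to the two streams can land in a different order than they were emitted. One way to avoid this, sketched here with hypothetical names (`toy_inst` and `toy_print_inst` are not Mesa symbols), is to have the printing routine take the stream as a parameter so a whole dump goes through a single stream.

```c
#include <stdio.h>

/* Hypothetical instruction record, for illustration only. */
struct toy_inst {
    const char *opcode;
    int dst;
    int src;
};

/* Print one instruction to the caller-supplied stream. Passing the
 * same stream for every call keeps the dump in emission order, even
 * when the output is redirected to a file. Returns the number of
 * characters written, as fprintf does. */
int toy_print_inst(FILE *f, const struct toy_inst *inst)
{
    return fprintf(f, "%s r%d, r%d\n", inst->opcode, inst->dst, inst->src);
}
```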
|
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
|
Also fix up comments, so that the difference between the two passes is
clarified.
|
Several routines directly analyze the grf-to-mrf moves from the Gen
binary code. When possible, the mov is removed and the message
register is written directly by the arithmetic instruction.
Redundant mrf-to-grf moves are also removed (frequently the case, for
example, when sampling many textures with the same uv).
Code was tested with piglit, warsow and nexuiz on an Ironlake
machine. No regression was found there.
Note that the optimizations are *deactivated* on Gen4 and Gen6 since I
did not test them properly yet. There is no reason to expect bugs, but
who knows.
The optimizations are currently done in branch-free programs *only*.
Considering branches is more complicated and there are actually two
paths: one for branch-free programs and one for programs with branches.
Also, some other optimizations should be done during the emission
itself, but considering that some code is shared between vertex shaders
(AOS) and pixel shaders (SOA) and that we may have branches or not, it
is pretty hard to both factorize the code and have one good set of
strategies.
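
The first optimization described above can be illustrated with a toy peephole pass. This is only a sketch under stated assumptions, not the driver's code: the instruction representation and every name (`toy_inst`, `toy_fold_grf_to_mrf`, and so on) are hypothetical, the program is assumed to be branch free, and the check that the GRF is never read again is deliberately conservative.

```c
#include <stddef.h>

/* Hypothetical toy IR, for illustration only. */
enum toy_file { TOY_GRF, TOY_MRF };
enum toy_op { TOY_ADD, TOY_MUL, TOY_MOV };

struct toy_reg {
    enum toy_file file;
    int nr;
};

struct toy_inst {
    enum toy_op op;
    struct toy_reg dst;
    struct toy_reg src[2]; /* MOV uses only src[0] */
};

static int same_reg(struct toy_reg a, struct toy_reg b)
{
    return a.file == b.file && a.nr == b.nr;
}

static int reads_reg(const struct toy_inst *inst, struct toy_reg r)
{
    if (same_reg(inst->src[0], r))
        return 1;
    return inst->op != TOY_MOV && same_reg(inst->src[1], r);
}

/* Fold "op grf_k, ...; mov mrf_n, grf_k" into "op mrf_n, ..." when no
 * later instruction reads grf_k. Compacts the array in place and
 * returns the new instruction count. Assumes a branch-free program. */
size_t toy_fold_grf_to_mrf(struct toy_inst *insts, size_t count)
{
    size_t out = 0;

    for (size_t i = 0; i < count; i++) {
        struct toy_inst cur = insts[i];
        int fold = 0;

        if (out > 0 && cur.op == TOY_MOV &&
            cur.dst.file == TOY_MRF && cur.src[0].file == TOY_GRF &&
            same_reg(insts[out - 1].dst, cur.src[0])) {
            fold = 1;
            /* Conservative: any later read of the GRF blocks the fold. */
            for (size_t j = i + 1; j < count; j++) {
                if (reads_reg(&insts[j], cur.src[0])) {
                    fold = 0;
                    break;
                }
            }
        }

        if (fold)
            insts[out - 1].dst = cur.dst; /* write the MRF directly */
        else
            insts[out++] = cur;
    }
    return out;
}
```

For example, `add g3, g1, g2` followed by `mov m1, g3` collapses to a single `add m1, g1, g2`, provided `g3` is not read later.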
|
Clarifies program assembly, and with a little tweak to always use
constant_map, we could cut down on constant buffer payload.
|
This should be more useful for developers and for bug triaging than
just generating wrong code.
|
Fixes glsl-vs-arrays. Bug #27388.
|
Fixes glsl-vs-point-size.
|
This has confused me twice now. It's a fixed width of 4 (usually a
region description of <4,4,1>), not 1. If it was 1, we'd have been
skipping all over register space.
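
To see why a width of 1 would skip around, here is a small sketch of how a region description maps execution channels to element offsets. It assumes the usual Gen region semantics (vertical stride, width, horizontal stride, all in units of the element size); the function name `region_offset` is hypothetical.

```c
/* Element offset (in units of the element size) of execution channel
 * `chan` within a region <vstride,width,hstride>. Hypothetical helper
 * for illustration; assumes width is non-zero. */
unsigned region_offset(unsigned vstride, unsigned width,
                       unsigned hstride, unsigned chan)
{
    unsigned row = chan / width; /* which row of the region */
    unsigned col = chan % width; /* position within that row */

    return row * vstride + col * hstride;
}
```

With <4,4,1>, channels 0 through 7 map to consecutive offsets 0 through 7. With a width of 1 and the same vertical stride, every channel starts a new row, so channel 5 would land at offset 20 instead of 5.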
|
The ARL value is in increments of vec4 in the register file, but
PROGRAM_TEMPORARY and PROGRAM_INPUT are stored as vec4s interleaved
between the two verts being executed (thus a vec8 each), whereas
PROGRAM_STATE_VAR values are packed vec4s.
Fixes:
glsl-vs-arrays-2
glsl-vs-mov-after-deref
(without regressing glsl-vs-arrays-3)
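
The layout difference described above changes how far one array element is from the next. A rough sketch, assuming 4-byte floats and using hypothetical names (`toy_array_file` and `toy_array_byte_offset` are not driver symbols, and the real driver may scale the address register differently):

```c
/* Hypothetical register-file tags, for illustration only. */
enum toy_array_file {
    TOY_ARRAY_TEMPORARY,
    TOY_ARRAY_INPUT,
    TOY_ARRAY_STATE_VAR
};

/* Byte offset of array element `index`, assuming 4-byte floats.
 * State vars are packed vec4s (16 bytes apart). Temporaries and
 * inputs are interleaved between two vertices, so each element
 * occupies a vec8 (32 bytes). */
unsigned toy_array_byte_offset(enum toy_array_file file, unsigned index)
{
    const unsigned vec4_bytes = 4 * 4;
    unsigned stride = (file == TOY_ARRAY_STATE_VAR) ? vec4_bytes
                                                    : 2 * vec4_bytes;

    return index * stride;
}
```

Under these assumptions, element 3 of a state-var array sits at byte 48, while element 3 of a temporary array sits at byte 96.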
|
The previous support was overly complicated by trying to use the same
1-OWORD message for both offsets.
|
Otherwise, the second half isn't written, and we end up reading back
black.
Fixes the remaining junk drawn in glsl-max-varyings, and will likely
help with a number of large real-world shaders.
|
They go into the render cache, so while we don't care about their
contents after execution, failing to note them could cause the writes
to be flushed over important buffer contents later.
|
| | |
|