| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
In spu_tri.c:setup_sort_vertices() triangles are culled after the
vertices are sorted. This patch moves the check a little earlier
and performs the actual check a little faster through intrinsics and
a little trickery.
Reduced code size and less work is done before a triangle is deemed
OK to skip.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was taking approximately 50 cycles to extract the vertex indices,
calculate the vertex_header pointers and call tri_draw() for each
three vertices - .
Unrolled, it takes less than 100 cycles to extract, unpack,
calculate pointers and call tri_draw() eight times. It does have a
nasty jump-tabled switch. I'm sure that there's a better way...
Code size of spu_render.o gets larger due to the extra constants and
work in the inner loop, there are extra stack saves and loads
because there are more registers in use, and an assert. spu_tri.o
gets a little smaller.
|
|
|
|
| |
Suggested by Jonathan Adamczewski. There may be more places to do this...
|
| |
|
| |
|
|
|
|
|
|
| |
Without the f, the constant is treated as a double, resulting in
slower arithmetic and libgcc conversion calls each time CEILF()
is used.
|
|
|
|
|
|
| |
Put setup.v{min,mid,max,provoke} into a union with qword vertex_headers.
Rewrite vertex sorting to more efficiently handle the packed data items.
Reduces spu_tri.o by ~128 bytes.
|
|
|
|
|
|
|
| |
Put edge.{dx,dy} into a union with a vector and perform subtractions in
setup_sort_vertices() on vectors.
Reduces spu_tri.o by ~300 bytes.
|
|
|
|
|
|
|
| |
Replace int setup.span{left,right}[2] with vec_uint4 setup.span.quad
SIMDize calculate_mask() and inline into into flush_spans()
Set setup.span.quad members using spu_shuffle() or spu_sel().
Reduces spu_tri.o by ~116 bytes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With these changes, the tests/stencil_twoside test now works.
- Eliminate blending from the stencil_twoside test, as it produces an
unneeded dependency on having blending working
- The spe_splat() function will now work if the register being splatted
and the destination register are the same
- Separate fragment code generated for front-facing and back-facing
fragments. Often these are the same; if two-sided stenciling is on,
they can be different. This is easier and faster than generating
code that does both tests and merges the results.
- Fixed a cut/paste bug where if the back Z-pass stencil operation
were different from all the other operations, the back Z-fail
results were incorrect.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two definitive bugs in stenciling were fixed.
The first, reversed registers in the generated Select Bytes (selb)
instruction, caused the stenciling INCR and DECR operations to
fail dramatically, putting new values in where old values were
supposed to be and vice versa.
The second caused stencil tiles to not be read and written from
main memory by the SPUs. A per-spu flag, spu.read_depth, was used
to indicate whether the SPU should be reading depth tiles, and was set
only when depth was enabled. A second flag, spu.read_stencil, was
set when stenciling was enabled, but never referenced.
As stenciling and depth are in the same tiles on the Cell, and there
is no corresponding TAG_WRITE_TILE_STENCIL to complement
TAG_WRITE_TILE_COLOR and TAG_WRITE_TILE_Z, I fixed this by
eliminating the unused "spu.read_stencil", renaming "spu.read_depth"
to "spu.read_depth_stencil", and setting it if either stenciling or
depth is enabled.
I also added an optimization to the fragment ops generation code,
that avoids calculating stencil values and/or stencil writemask
when the stencil operations are all KEEP.
|
| |
|
|
|
|
| |
Results in slightly tighter code.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Lots of restrictions for now (one 2D texture, no mipmaps, etc.) for now
but basic texture demos work.
TEX, TXD, TXP do the same thing for the time being.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This set of code changes are for stencil code generation
support. Both one-sided and two-sided stenciling are supported.
In addition to the raw code generation changes, these changes had
to be made elsewhere in the system:
- Added new "register set" feature to the SPE assembly generation.
A "register set" is a way to allocate multiple registers and free
them all at the same time, delegating register allocation management
to the spe_function unit. It's quite useful in complex register
allocation schemes (like stenciling).
- Added and improved SPE macro calculations.
These are operations between registers and unsigned integer
immediates. In many cases, the calculation can be performed
with a single instruction; the macros will generate the
single instruction if possible, or generate a register load
and register-to-register operation if not. These macro
functions are: spe_load_uint() (which has new ways to
load a value in a single instruction), spe_and_uint(),
spe_xor_uint(), spe_compare_equal_uint(), and spe_compare_greater_uint().
- Added facing to fragment generation. While rendering, the rasterizer
needs to be able to determine front- and back-facing fragments, in order
to correctly apply two-sided stencil. That requires these changes:
- Added front_winding field to the cell_command_render block, so that
the state tracker could communicate to the rasterizer what it
considered to be the front-facing direction.
- Added fragment facing as an input to the fragment function.
- Calculated facing is passed during emit_quad().
|
| |
|
|
|
|
| |
Also remove old code, etc.
|
|
|
|
|
|
| |
TGSI shaders are translated into SPE instructions which are then sent to
the SPEs for execution. Only a few opcodes work, no swizzling yet, no
support for constants/immediates, etc.
|
| |
|
| |
|
|
|
|
|
|
|
| |
Do code generation for alpha test, z test, stencil, blend, colormask
and framebuffer/tile read/write as a single code block.
Ian's previous blend/z/stencil test code is still there but mostly disabled
and will be removed soon.
|
| |
|
|
|
|
| |
Note that SPU vertex transformation is disabled at this time.
|
|
|
|
| |
Also, rename p_tile.[ch] to u_tile.[ch]
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
This also implements code-gen for the float-to-packed color
conversion. It's currently hardcoded for A8R8G8B8, but that can
easily be fixed as soon as other color depths are supported by the
Cell driver.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the constant color blend factor was compiled into the
generated code. This meant that the code had to be regenerated each
time the constant color was changed. This doesn't fit with the model
used in Gallium.
As-is, the code could be better. The constant color is loaded for
every quad processed, even if it is not used. Also, if a lot of (1-x)
blend factors are used, 1.0 will be loaded and reloaded into registers
many times.
|
| |
|
|
|
|
| |
So far this is only tested when GL_BLEND is disabled.
|
|
|
|
|
|
|
|
| |
Alpha test is currently broken because all per-fragment testing occurs
before alpha is calculated.
Stencil test is currently broken because the Z-clear code asserts if
there is a stencil buffer.
|
|
This is in a separate commit to ensure renames are properly preserved.
|