path: root/src/gallium/drivers/cell/spu/spu_tri.c
Commit message | Author | Age | Files | Lines
* cell: perform triangle cull a little earlier | Jonathan Adamczewski | 2009-05-21 | 1 | -31/+74
  In spu_tri.c:setup_sort_vertices() triangles are culled after the vertices are sorted. This patch moves the check a little earlier and performs the actual check a little faster through intrinsics and a little trickery. Reduced code size and less work is done before a triangle is deemed OK to skip.
* cell: unroll inner loop of spu_render.c:cmd_render() | Jonathan Adamczewski | 2009-05-21 | 1 | -22/+18
  It was taking approximately 50 cycles to extract the vertex indices, calculate the vertex_header pointers and call tri_draw() for each set of three vertices. Unrolled, it takes less than 100 cycles to extract, unpack, calculate pointers and call tri_draw() eight times.
  It does have a nasty jump-tabled switch. I'm sure that there's a better way...
  Code size of spu_render.o gets larger due to the extra constants and work in the inner loop, there are extra stack saves and loads because there are more registers in use, and an assert. spu_tri.o gets a little smaller.
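The unrolling described above can be sketched in plain C. This is only an illustration of the general technique (batches of eight plus a fall-through switch for the remainder), not the code from the commit: `draw_tri()`, `render_indexed()`, `VERTEX_FLOATS` and the 16-bit index type are hypothetical names and choices, and the real code unpacks indices with SPU intrinsics.

```c
#include <assert.h>
#include <stddef.h>

#define VERTEX_FLOATS 4          /* hypothetical vertex size, in floats */

static unsigned tris_drawn;      /* hypothetical stand-in for real rasterization */

static void draw_tri(const float *v0, const float *v1, const float *v2)
{
   (void) v0; (void) v1; (void) v2;
   tris_drawn++;
}

/* Draw the triangle at idx[0..2], then advance to the next triangle. */
#define DRAW_NEXT()                                   \
   do {                                               \
      draw_tri(verts + idx[0] * VERTEX_FLOATS,        \
               verts + idx[1] * VERTEX_FLOATS,        \
               verts + idx[2] * VERTEX_FLOATS);       \
      idx += 3;                                       \
   } while (0)

static void render_indexed(const float *verts,
                           const unsigned short *idx, size_t num_indices)
{
   size_t tris = num_indices / 3;

   assert(num_indices % 3 == 0);

   /* Full batches: eight triangles per loop iteration. */
   while (tris >= 8) {
      DRAW_NEXT(); DRAW_NEXT(); DRAW_NEXT(); DRAW_NEXT();
      DRAW_NEXT(); DRAW_NEXT(); DRAW_NEXT(); DRAW_NEXT();
      tris -= 8;
   }

   /* Remainder (0..7 triangles): jump into a fall-through switch. */
   switch (tris) {
   case 7: DRAW_NEXT(); /* fall through */
   case 6: DRAW_NEXT(); /* fall through */
   case 5: DRAW_NEXT(); /* fall through */
   case 4: DRAW_NEXT(); /* fall through */
   case 3: DRAW_NEXT(); /* fall through */
   case 2: DRAW_NEXT(); /* fall through */
   case 1: DRAW_NEXT(); /* fall through */
   default: break;
   }
}
```

Because each step draws the triangle at the current index position and then advances, submission order is preserved in both the batch loop and the remainder switch.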
* cell: use some SPU intrinsics to get slightly better code in eval_inputs() | Brian Paul | 2009-02-16 | 1 | -4/+7
  Suggested by Jonathan Adamczewski. There may be more places to do this...
* cell: new/tighter code for computing fragment program inputs | Brian Paul | 2009-02-15 | 1 | -91/+76
* cell: combine eval_z(), eval_w() functions | Brian Paul | 2009-02-15 | 1 | -20/+27
* cell: Specify constant as float for CEILF(). | Jonathan Adamczewski | 2009-01-14 | 1 | -1/+1
  Without the f, the constant is treated as a double, resulting in slower arithmetic and libgcc conversion calls each time CEILF() is used.
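A minimal sketch of the difference, assuming a CEILF() of roughly this shape (the exact definition in spu_tri.c may differ; `CEILF_DOUBLE`, `CEILF_FLOAT` and `ceil_approx()` are names made up for illustration). In C an unsuffixed floating constant has type double, so adding it to a float promotes the whole expression to double:

```c
#include <assert.h>

/* Unsuffixed constant: (X) is promoted to double before the addition. */
#define CEILF_DOUBLE(X) ((float) (int) ((X) + 0.99999))

/* Suffixed constant: the addition stays in single precision. */
#define CEILF_FLOAT(X)  ((float) (int) ((X) + 0.99999f))

/* This truncating trick only behaves like a ceiling for non-negative input. */
static float ceil_approx(float x)
{
   assert(x >= 0.0f);
   return CEILF_FLOAT(x);
}
```

The type of each sum can be checked with sizeof: `sizeof(1.0f + 0.99999)` is the size of a double, while `sizeof(1.0f + 0.99999f)` is the size of a float.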
* cell: SIMDize sorting in setup_sort_vertices() | Jonathan Adamczewski | 2009-01-05 | 1 | -55/+42
  Put setup.v{min,mid,max,provoke} into a union with qword vertex_headers. Rewrite vertex sorting to more efficiently handle the packed data items. Reduces spu_tri.o by ~128 bytes.
* cell: SIMDize some subtractions | Jonathan Adamczewski | 2009-01-05 | 1 | -8/+10
  Put edge.{dx,dy} into a union with a vector and perform subtractions in setup_sort_vertices() on vectors. Reduces spu_tri.o by ~300 bytes.
* cell: improvements to spu_tri.c | Jonathan Adamczewski | 2009-01-04 | 1 | -42/+52
  Replace int setup.span{left,right}[2] with vec_uint4 setup.span.quad.
  SIMDize calculate_mask() and inline it into flush_spans().
  Set setup.span.quad members using spu_shuffle() or spu_sel().
  Reduces spu_tri.o by ~116 bytes.
* CELL: two-sided stencil fixes | Robert Ellison | 2008-11-11 | 1 | -4/+16
  With these changes, the tests/stencil_twoside test now works.
  - Eliminate blending from the stencil_twoside test, as it produces an unneeded dependency on having blending working
  - The spe_splat() function will now work if the register being splatted and the destination register are the same
  - Separate fragment code generated for front-facing and back-facing fragments. Often these are the same; if two-sided stenciling is on, they can be different. This is easier and faster than generating code that does both tests and merges the results.
  - Fixed a cut/paste bug where if the back Z-pass stencil operation were different from all the other operations, the back Z-fail results were incorrect.
* CELL: stencil bug fixes | Robert Ellison | 2008-10-30 | 1 | -1/+1
  Two definitive bugs in stenciling were fixed.
  The first, reversed registers in the generated Select Bytes (selb) instruction, caused the stenciling INCR and DECR operations to fail dramatically, putting new values in where old values were supposed to be and vice versa.
  The second caused stencil tiles to not be read and written from main memory by the SPUs. A per-spu flag, spu.read_depth, was used to indicate whether the SPU should be reading depth tiles, and was set only when depth was enabled. A second flag, spu.read_stencil, was set when stenciling was enabled, but never referenced. As stenciling and depth are in the same tiles on the Cell, and there is no corresponding TAG_WRITE_TILE_STENCIL to complement TAG_WRITE_TILE_COLOR and TAG_WRITE_TILE_Z, I fixed this by eliminating the unused "spu.read_stencil", renaming "spu.read_depth" to "spu.read_depth_stencil", and setting it if either stenciling or depth is enabled.
  I also added an optimization to the fragment ops generation code, that avoids calculating stencil values and/or stencil writemask when the stencil operations are all KEEP.
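Both fixes can be modelled in a few lines of scalar C. This is a sketch under stated assumptions, not the driver code: `selb32()`, `struct spu_state_sketch` and `update_tile_read_flag()` are hypothetical names, and `selb32()` models the select on a single 32-bit word rather than a full SPU register. It follows the SPU selb rule that a result bit comes from the second operand where the mask bit is 1 and from the first operand where it is 0, which is why swapping the two source registers exchanges old and new values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of selb: take bits from b where mask is 1, else from a. */
static uint32_t selb32(uint32_t a, uint32_t b, uint32_t mask)
{
   return (a & ~mask) | (b & mask);
}

/* Hypothetical slice of the per-SPU state after the rename. */
struct spu_state_sketch {
   bool read_depth_stencil;
};

/* Depth and stencil share tiles, so one flag covers both. */
static void update_tile_read_flag(struct spu_state_sketch *spu,
                                  bool depth_enabled, bool stencil_enabled)
{
   spu->read_depth_stencil = depth_enabled || stencil_enabled;
}
```

With old = 0xAAAAAAAA, new = 0x55555555 and mask = 0xFFFF0000, `selb32(old, new, mask)` gives 0x5555AAAA, while the reversed call `selb32(new, old, mask)` gives 0xAAAA5555.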
* cell: implement KIL instruction | Brian Paul | 2008-10-16 | 1 | -1/+4
* cell: get rid of last usage of float4 union/typedef | Brian Paul | 2008-10-15 | 1 | -34/+29
  Results in slightly tighter code.
* cell: simplify triangle front/back face determination | Brian Paul | 2008-10-15 | 1 | -46/+23
* cell: send rasterizer state to SPUs in proper way, remove front_winding hack | Brian Paul | 2008-10-15 | 1 | -2/+2
* cell: updated vertex dump/debug code | Brian Paul | 2008-10-15 | 1 | -9/+14
* cell: more clean-up in spu_tri.c | Brian Paul | 2008-10-13 | 1 | -84/+16
* cell: remove dead code, clean-up, reformatting | Brian Paul | 2008-10-13 | 1 | -90/+24
* cell: finish-up perspective-corrected interpolation | Brian Paul | 2008-10-13 | 1 | -45/+82
* cell: remove old texture code | Brian Paul | 2008-10-13 | 1 | -66/+1
* cell: updates in response to draw's struct vertex_info changes | Brian Paul | 2008-10-10 | 1 | -2/+2
* cell: implement basic TXP instruction in fragment shaders | Brian Paul | 2008-10-09 | 1 | -1/+1
  Lots of restrictions for now (one 2D texture, no mipmaps, etc.) but basic texture demos work. TEX, TXD, TXP do the same thing for the time being.
* CELL: changes to generate SPU code for stenciling | Robert Ellison | 2008-10-03 | 1 | -5/+30
  This set of code changes is for stencil code generation support. Both one-sided and two-sided stenciling are supported. In addition to the raw code generation changes, these changes had to be made elsewhere in the system:
  - Added new "register set" feature to the SPE assembly generation. A "register set" is a way to allocate multiple registers and free them all at the same time, delegating register allocation management to the spe_function unit. It's quite useful in complex register allocation schemes (like stenciling).
  - Added and improved SPE macro calculations. These are operations between registers and unsigned integer immediates. In many cases, the calculation can be performed with a single instruction; the macros will generate the single instruction if possible, or generate a register load and register-to-register operation if not. These macro functions are: spe_load_uint() (which has new ways to load a value in a single instruction), spe_and_uint(), spe_xor_uint(), spe_compare_equal_uint(), and spe_compare_greater_uint().
  - Added facing to fragment generation. While rendering, the rasterizer needs to be able to determine front- and back-facing fragments, in order to correctly apply two-sided stencil. That requires these changes:
    - Added front_winding field to the cell_command_render block, so that the state tracker could communicate to the rasterizer what it considered to be the front-facing direction.
    - Added fragment facing as an input to the fragment function.
    - Calculated facing is passed during emit_quad().
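The "single instruction if possible" idea for immediates can be sketched as a classifier over 32-bit constants. This is an assumption-laden illustration, not spe_load_uint() itself: `load_uint_instruction_count()` is a hypothetical helper, and the cases it checks are based on the SPU immediate-load instructions il (16-bit sign-extended), ila (18-bit unsigned), ilhu (upper halfword), ilh (halfword replicated) and the ilhu + iohl pair as a fallback. The actual macro may recognize other patterns.

```c
#include <stdint.h>

/* How many instructions a loader might need to put `value` in each word. */
static int load_uint_instruction_count(uint32_t value)
{
   if (value <= 0x3ffffu)
      return 1;                 /* ila: 18-bit unsigned immediate */
   if ((value & 0xffffu) == 0)
      return 1;                 /* ilhu: immediate fills the upper halfword */
   if (value >= 0xffff8000u)
      return 1;                 /* il: negative 16-bit value, sign-extended */
   if ((value >> 16) == (value & 0xffffu))
      return 1;                 /* ilh: same halfword in both halves */
   return 2;                    /* ilhu for the top half, iohl for the bottom */
}
```

For example, 0x12340000 and 0x00ff00ff each fit a single-instruction case, while 0x12345678 falls through to the two-instruction path.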
* cell: evaluate multiple fragment inputs | Brian Paul | 2008-09-12 | 1 | -1/+7
* cell: setup fragment program inputs in SOA format | Brian Paul | 2008-09-12 | 1 | -56/+56
  Also remove old code, etc.
* cell: initial support for fragment shader code generation. | Brian Paul | 2008-09-11 | 1 | -0/+35
  TGSI shaders are translated into SPE instructions which are then sent to the SPEs for execution. Only a few opcodes work, no swizzling yet, no support for constants/immediates, etc.
* cell: asst. clean-up | Brian Paul | 2008-09-11 | 1 | -5/+5
* cell: remove old per-fragment code, replace with all new code | Brian Paul | 2008-09-11 | 1 | -96/+0
* cell: checkpoint commit of new per-fragment processing | Brian Paul | 2008-09-11 | 1 | -0/+30
  Do code generation for alpha test, z test, stencil, blend, colormask and framebuffer/tile read/write as a single code block. Ian's previous blend/z/stencil test code is still there but mostly disabled and will be removed soon.
* cell: comments | Brian Paul | 2008-09-11 | 1 | -1/+4
* cell: asst fixes to get driver building/running again. | Brian | 2008-08-25 | 1 | -0/+1
  Note that SPU vertex transformation is disabled at this time.
* gallium: refactor/replace p_util.h with util/u_memory.h and util/u_math.h | Brian Paul | 2008-08-24 | 1 | -1/+0
  Also, rename p_tile.[ch] to u_tile.[ch]
* cell: more multi-texture fixes (mostly working now) | Brian | 2008-04-01 | 1 | -9/+9
* cell: checkpoint: more multi-texture work | Brian | 2008-04-01 | 1 | -4/+30
* cell: more work for multi-texture support | Brian | 2008-03-31 | 1 | -1/+1
* cell: initial work to support multi-texture | Brian | 2008-03-31 | 1 | -1/+1
* cell: Implement code-gen for logic op | Ian Romanick | 2008-03-26 | 1 | -26/+33
  This also implements code-gen for the float-to-packed color conversion. It's currently hardcoded for A8R8G8B8, but that can easily be fixed as soon as other color depths are supported by the Cell driver.
* cell: Change code-gen for CONST_COLOR blend factor | Ian Romanick | 2008-03-21 | 1 | -0/+2
  Previously the constant color blend factor was compiled into the generated code. This meant that the code had to be regenerated each time the constant color was changed. This doesn't fit with the model used in Gallium.
  As-is, the code could be better. The constant color is loaded for every quad processed, even if it is not used. Also, if a lot of (1-x) blend factors are used, 1.0 will be loaded and reloaded into registers many times.
* cell: Fix bus error when there is no depth buffer | Ian Romanick | 2008-03-20 | 1 | -0/+3
* cell: Use code-gen for alpha blend | Ian Romanick | 2008-03-20 | 1 | -19/+37
  So far this is only tested when GL_BLEND is disabled.
* cell: Initial code-gen for alpha / stencil / depth testing | Ian Romanick | 2008-03-17 | 1 | -14/+9
  Alpha test is currently broken because all per-fragment testing occurs before alpha is calculated. Stencil test is currently broken because the Z-clear code asserts if there is a stencil buffer.
* Code reorganization: move files into their places. | José Fonseca | 2008-02-15 | 1 | -0/+926
  This is in a separate commit to ensure renames are properly preserved.