path: root/src/gallium/auxiliary
Commit message | Author | Age | Files | Lines
* llvmpipe: handle z32s8x24 depth/stencil format | Roland Scheidegger | 2013-05-18 | 2 | -1/+7
  We need to split up the depth and stencil values in this case, and there's some new logic required to handle float depth and stencil simultaneously. Also make sure we get the 64-bit zs clear values and masks propagated correctly.
* gallivm: handle z32s8x24 format for sampling | Roland Scheidegger | 2013-05-18 | 1 | -8/+51
  Since we can sample either depth or stencil but not both, load only the required bits, which makes things a bit easier (special handling is required since the format doesn't fit into 32 bits). The logic for deciding whether depth or stencil should be sampled is a bit odd, but seems to be what other drivers and state trackers do: if the format has both depth and stencil (or just depth), sample depth; sampling stencil requires a sampler view format with only stencil. Also, while here, fix up stencil sampling for other formats as well, though this isn't supported by mesa (ARB_stencil_texturing), and while blits would use it, they don't work either since they'd also need stencil export.
  Reviewed-by: Jose Fonseca
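
The "only load the required bits" idea can be sketched in scalar C. This is an illustration, not the gallivm code: it assumes each texel is a 64-bit unit with a 32-bit float depth in the first dword and the stencil in the low 8 bits of the second dword (the other 24 bits unused), and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers. Assumed layout per 64-bit texel:
 * dword 0 = float depth, dword 1 = stencil in the low 8 bits. */
static float
sample_depth_z32s8x24(const uint32_t *texels, unsigned index)
{
   float z;
   /* Touch only the depth dword of the texel. */
   memcpy(&z, &texels[2 * index], sizeof z);
   return z;
}

static uint8_t
sample_stencil_z32s8x24(const uint32_t *texels, unsigned index)
{
   /* Touch only the stencil dword and mask off the unused bits. */
   return (uint8_t)(texels[2 * index + 1] & 0xffu);
}
```

Either helper reads 32 bits per texel, so neither path needs a 64-bit load.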
* gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path. | José Fonseca | 2013-05-17 | 4 | -240/+184
  This change was meant as a stepping stone to using the PMADDUBSW SSSE3 instruction, but this refactoring by itself yields a 10% speedup on texture-intensive shaders (e.g., Google Earth's ocean water w/o S3TC on an Ivy Bridge machine) while yielding exactly the same results, whereas PMADDUBSW only gave an extra 5%, at the expense of 2 bits of precision in the interpolation.

  I believe that the speedup of this change comes from the reduced register pressure (as 8.8 fixed point intermediates take twice the space of 8-bit unorm). Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code substantially -- it's no longer necessary to have code duplicated for low and high register halves.

  Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never executed (as it is faster on AVX to split the 256-bit wide texture computation into two 128-bit chunks, in order to leverage integer opcodes). This path might be useful in the future, so in order to verify this change did not break that path I had to apply this change:

    @@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
           /*
            * we only try 8-wide sampling with soa as it appears to
            * be a loss with aos with AVX (but it should work).
            * (It should be faster if we'd support avx2)
            */
    -      if (num_quads == 1 || !use_aos) {
    +      if (/* num_quads == 1 || ! */ use_aos) {
              if (num_quads > 1) {
                 if (mip_filter == PIPE_TEX_MIPFILTER_NONE) {
                    LLVMValueRef index0 = lp_build_const_int32(gallivm, 0);
                    /*

  and then run texfilt mesademo:

    LP_NATIVE_VECTOR_WIDTH=256 ./texfilt

  Ran whole piglit without regressions.
  Reviewed-by: Roland Scheidegger
* gallivm: Add and use lp_build_lerp_3d. | José Fonseca | 2013-05-17 | 3 | -26/+60
  Reviewed-by: Roland Scheidegger
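
The commit message does not describe the helper. Assuming it follows the usual trilinear pattern (two bilinear interpolations blended along the third axis), a scalar float sketch could look like the following; the real helper works on LLVM values, and every name and the corner ordering here are hypothetical.

```c
/* Scalar sketch only; names and corner ordering are assumptions. */
static float
lerp_1d(float w, float a, float b)
{
   return a + w * (b - a);
}

static float
lerp_2d(float x, float y, float v00, float v01, float v10, float v11)
{
   /* Interpolate along x on both rows, then along y between the rows. */
   return lerp_1d(y, lerp_1d(x, v00, v01), lerp_1d(x, v10, v11));
}

static float
lerp_3d(float x, float y, float z, const float v[8])
{
   /* v[0..3] is the first slice, v[4..7] the second slice. */
   float slice0 = lerp_2d(x, y, v[0], v[1], v[2], v[3]);
   float slice1 = lerp_2d(x, y, v[4], v[5], v[6], v[7]);
   return lerp_1d(z, slice0, slice1);
}
```

Building the 3D case from the 2D one keeps a single interpolation primitive at the bottom, so any change to how one lerp is computed applies to all three dimensionalities.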
* gallivm: Support pointers in lp_build_print_value(). | José Fonseca | 2013-05-16 | 1 | -0/+2
  Trivial.
* draw: More defensive coding in DRAW_GET_IDX. | José Fonseca | 2013-05-15 | 1 | -2/+2
  Doesn't make a difference ATM, but just in case.
* draw: Fix vsplit regression when the ib can be used directly. | José Fonseca | 2013-05-15 | 1 | -1/+1
  `ib` is no longer offset by `istart`.
  Trivial.
* draw/gs: fix extracting of the clip | Zack Rusin | 2013-05-14 | 1 | -2/+4
  The indices are not consecutive when using the geometry shader, which means we were extracting non-existent values. Create an array of linear indices and always use it instead of the passed indices.
  Found by Jose.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
* draw: try to prevent overflows on index buffers | Zack Rusin | 2013-05-14 | 8 | -54/+110
  Pass in the size of the index buffer, when available, and use it to handle out-of-bounds conditions. The behavior in the case of an overflow needs to be the same as with other overflows in the vertex processing pipeline, meaning that a vertex should still be generated but with all of its attributes set to zero.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
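
A minimal sketch of the behavior described above, in plain C rather than the draw module's own code. The struct, the attribute count and the function name are hypothetical, and the extra check of the index value against the vertex count is an assumption added for illustration.

```c
#include <stddef.h>
#include <string.h>

#define NUM_ATTRIBS 4 /* hypothetical attribute count */

struct vertex {
   float attrib[NUM_ATTRIBS];
};

/* Fetch the vertex for element `i` of an index buffer that holds
 * `index_count` entries. An out-of-bounds read still produces a
 * vertex, but with every attribute set to zero. */
static struct vertex
fetch_indexed_vertex(const unsigned *indices, size_t index_count,
                     const struct vertex *vertices, size_t vertex_count,
                     size_t i)
{
   struct vertex out;
   memset(&out, 0, sizeof out);

   /* Only dereference when both the element and the index are valid. */
   if (i < index_count && indices[i] < vertex_count)
      out = vertices[indices[i]];

   return out;
}
```

Returning a zeroed vertex instead of failing keeps the number of generated vertices unchanged, which is what lets later pipeline stages run as usual.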
* draw: use the total number of vertices for statistics | Zack Rusin | 2013-05-14 | 2 | -2/+2
  The number of vertices to fetch doesn't necessarily equal the total number of input vertices, e.g. we might want to fetch a single vertex but then draw it twice. Let's use the correct number of input vertices in the statistics.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
* draw: don't crash on vertex buffer overflow | Zack Rusin | 2013-05-14 | 10 | -31/+122
  We would crash when stride was bigger than the size of the buffer. The correct behavior is to just fetch zeros in this case. Unfortunately, with user_buffers there's no way to validate the size because currently we're just not getting it. Adjust the draw interface to pass the size along with the mapped buffer, which works perfectly for buffer-backed vertex_buffers and, in the future, will allow us to plumb user_buffer sizes through the same interface.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
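
The "fetch zeros on overflow" rule can be sketched as a bounds-checked copy. This is an illustration under stated assumptions, not the draw module's fetch code: the function name and parameters are hypothetical, and the buffer size is assumed to be known to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Copy `attrib_size` bytes of vertex `index` out of a mapped buffer of
 * `buffer_size` bytes. If the read would run past the end of the buffer
 * (for example because `stride` exceeds the buffer size), leave `out`
 * filled with zeros. */
static void
fetch_attrib(const unsigned char *buffer, size_t buffer_size,
             size_t stride, size_t index, size_t attrib_size,
             unsigned char *out)
{
   memset(out, 0, attrib_size);

   if (attrib_size > buffer_size)
      return;

   /* Compare by division so a large index * stride cannot wrap around. */
   if (stride != 0 && index > (buffer_size - attrib_size) / stride)
      return;

   memcpy(out, buffer + index * stride, attrib_size);
}
```

With an 8-byte buffer and a 16-byte stride, vertex 0 is read normally while vertex 1 comes back as zeros.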
* gallivm/soa: implement indirect addressing in immediates | Zack Rusin | 2013-05-14 | 2 | -2/+82
  The support is analogous to the way we handle indirect addressing in temporaries, except that we don't have to worry about storing (after declarations) and thus we'll be able to keep using the old code when indirect addressing isn't used. In other words, we're still using constants directly, unless the instruction has an immediate register with indirect addressing.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
* draw/gs: don't bind the tgsi state if we're using llvm paths | Zack Rusin | 2013-05-14 | 1 | -1/+6
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
* gallivm: Fix build with LLVM >= 3.4 r181680. | Vinson Lee | 2013-05-14 | 1 | -1/+3
  Tested-by: Laurent Carlier
  Signed-off-by: Vinson Lee
* draw: Fix io_ptr/num_prims name in IR. | José Fonseca | 2013-05-14 | 1 | -1/+1
  Trivial.
* draw/llvm: Add additional llvm optimization passes | Stéphane Marchesin | 2013-05-08 | 1 | -0/+3
  It helps a bit with vertex shader performance on i915g (a couple percent faster with openarena). I have tried most other passes, and they weren't showing any measurable improvement. Note that my vertex shaders didn't have loops, so maybe the loop optimizations could still be useful in the future.
  Reviewed-by: Brian Paul
* tgsi: fix operand type of TGSI_OPCODE_NOT | Chia-I Wu | 2013-05-08 | 2 | -1/+2
  It should be TGSI_TYPE_UNSIGNED, not TGSI_TYPE_FLOAT. Also fixed gallivm's not_emit_cpu() to use the uint build context.
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
* tgsi: refactor tgsi_opcode_infer_src_type() | Chia-I Wu | 2013-05-08 | 1 | -35/+9
  Call tgsi_opcode_infer_type() from tgsi_opcode_infer_src_type().
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
* tgsi: refactor tgsi_opcode_infer_dst_type() | Chia-I Wu | 2013-05-08 | 1 | -25/+35
  Move the body of tgsi_opcode_infer_dst_type() to a new helper function, tgsi_opcode_infer_type(), and call the helper function from tgsi_opcode_infer_dst_type(). The diff looks complicated simply because the code is moved around. A following commit will make tgsi_opcode_infer_src_type() call tgsi_opcode_infer_type().
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
* tgsi: reorder opcodes in opcode type inference | Chia-I Wu | 2013-05-08 | 1 | -24/+24
  Reorder opcodes by their assigned numbers. This makes it easier to see the differences between tgsi_opcode_infer_src_type() and tgsi_opcode_infer_dst_type().
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
* tgsi: clean up exec_tex() | Chia-I Wu | 2013-05-08 | 1 | -168/+52
  Make use of tgsi_util_get_texture_coord_dim() to replace the big switch table. There is a subtle difference with this change. When TXP is used with an array texture, the layer is now also projected. This behavior matches the TGSI doc. Since GLSL does not allow TXP on an array texture, I am not sure which behavior is correct or preferred.
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
* tgsi: add tgsi_util_get_texture_coord_dim() | Chia-I Wu | 2013-05-08 | 2 | -0/+94
  This util function returns the dimension of the texture coordinates for a texture target, and the location of the shadow reference value. For example, when the texture target is TGSI_TEXTURE_SHADOW2D, the dimension of the texture coordinates is 2, and the location of the ref value is 2 (that is, the Z channel).
  Signed-off-by: Chia-I Wu
  Acked-by: Roland Scheidegger
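
A sketch of what such a lookup could look like, using a hypothetical enum and function name rather than the real TGSI definitions or signature. Only the SHADOW2D row comes from the commit message; the use of -1 for "no shadow reference" is an assumption.

```c
enum tex_target {   /* hypothetical subset of texture targets */
   TEX_1D,
   TEX_2D,
   TEX_3D,
   TEX_SHADOW2D
};

/* Return the number of coordinate channels for `target` and store the
 * channel holding the shadow reference value in `*shadow_ref`
 * (-1 when the target has no shadow comparison; an assumed convention). */
static int
texture_coord_dim(enum tex_target target, int *shadow_ref)
{
   *shadow_ref = -1;

   switch (target) {
   case TEX_1D:
      return 1;
   case TEX_2D:
      return 2;
   case TEX_3D:
      return 3;
   case TEX_SHADOW2D:
      *shadow_ref = 2;   /* the Z channel, as in the commit message */
      return 2;
   }
   return 0;
}
```

A caller such as a texture-execution routine can then loop over the returned dimension instead of switching on every target itself.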
* gallivm: Fix build for LLVM < 3.3 | Tom Stellard | 2013-05-06 | 1 | -0/+6
  The C API versions of the LLVM multithreaded functions were added in LLVM 3.3.
* gallivm: Move LLVMStartMultithreaded() static initializer into gallivm | Tom Stellard | 2013-05-06 | 1 | -0/+15
  This does not solve all of the problems with using LLVM in a multithreaded environment, but it should help in some cases.
  Reviewed-by: [email protected]
* draw/pt: adjust overflow calculations | Zack Rusin | 2013-05-03 | 1 | -2/+1
  Gallium lies. buffer_size is not actually buffer_size but available size, which is 'buffer_size - buffer_offset', so by adding the buffer offset we'd incorrectly compute overflow.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
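
The double counting described above can be shown with two small predicates. This is a sketch with hypothetical names, not the draw/pt code; `needed` stands for the number of bytes a fetch requires, measured from the start of the available region.

```c
#include <stdbool.h>
#include <stddef.h>

/* `available_size` already excludes the offset, i.e. it is
 * `buffer_size - buffer_offset`, so it is compared against `needed`
 * directly. */
static bool
overflows(size_t available_size, size_t needed)
{
   return needed > available_size;
}

/* The miscalculation: the offset is counted a second time, so fetches
 * that fit are reported as overflowing. */
static bool
overflows_double_counted(size_t available_size, size_t buffer_offset,
                         size_t needed)
{
   return buffer_offset + needed > available_size;
}
```

For a 64-byte buffer bound at offset 32, the available size is 32 bytes; a 32-byte fetch fits, yet the double-counted variant reports an overflow.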
* tgsi/ureg: make the dst register match the src indirection | Zack Rusin | 2013-05-03 | 2 | -4/+11
  In ureg, src registers could have an indirect register that was either a temp or an addr register, while dst registers allowed only addr. That made moving between them a little difficult, so make them behave the same way and allow temp and addr registers as indirect files for both (tgsi supports it, just ureg didn't).
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
* vl/idct: fix for commit 7d2f2a0c890b1993532a45c8c392c28950ddc06e | Christian König | 2013-05-03 | 3 | -16/+21
  We still need the option for handling 3D textures as well.
  Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=64143
  Signed-off-by: Christian König
* vl/buffers: fix typo in function name | Christian König | 2013-05-03 | 2 | -13/+13
  Signed-off-by: Christian König
* draw: Update for u_assembled_primitive -> u_assembled_prim rename. | José Fonseca | 2013-05-03 | 1 | -1/+1
  Mesa build is too complex to rely on successful builds. On refactorings it is always a good idea to use git grep to prevent missing cases:

    $ git grep u_assembled_primitive
    src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c: u_assembled_primitive(in_prim);
* util/prim: add u_reduced_prims_for_vertices() | Chia-I Wu | 2013-05-03 | 1 | -0/+20
  The function returns the number of reduced/tessellated primitives for the given vertex count.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* util/prim: assorted fixes for u_decomposed_prims_for_vertices() | Chia-I Wu | 2013-05-03 | 1 | -11/+11
  Switch to '>=' for comparisons, and it becomes obvious that the comparison for PIPE_PRIM_QUAD_STRIP was wrong. Add minimum vertex count check for PIPE_PRIM_LINE_LOOP. Return 1 for PIPE_PRIM_POLYGON with 3 vertices.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
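
The '>=' style of minimum-vertex checks can be sketched as follows. The enum and function name are hypothetical and the counts are the conventional ones for each primitive type; apart from the three cases named in the commit message, none of this is taken from the actual u_prim.h code.

```c
enum prim_type {   /* hypothetical subset of primitive types */
   PRIM_LINE_LOOP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP,
   PRIM_QUAD_STRIP,
   PRIM_POLYGON
};

/* Number of primitives that `nr` vertices decompose into. Each case
 * states its minimum vertex count with '>=' so it can be read off
 * directly. */
static unsigned
decomposed_prims_for_vertices(enum prim_type prim, unsigned nr)
{
   switch (prim) {
   case PRIM_LINE_LOOP:
      return nr >= 2 ? nr : 0;             /* one line per vertex */
   case PRIM_TRIANGLES:
      return nr / 3;
   case PRIM_TRIANGLE_STRIP:
      return nr >= 3 ? nr - 2 : 0;
   case PRIM_QUAD_STRIP:
      return nr >= 4 ? (nr - 2) / 2 : 0;   /* first quad needs 4 */
   case PRIM_POLYGON:
      return nr >= 3 ? 1 : 0;              /* 3 vertices give 1 polygon */
   }
   return 0;
}
```

Writing every guard as `nr >= minimum` makes an off-by-one, such as a quad strip that accepts 3 vertices, stand out when the cases are read side by side.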
* util/prim: use vertex count info in u_validate_pipe_prim() | Chia-I Wu | 2013-05-03 | 1 | -32/+2
  As a side effect, primitives with adjacency are now correctly validated.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* util/prim: fix the name of the include guard | Chia-I Wu | 2013-05-03 | 1 | -2/+2
  It should be U_PRIM_H, not U_BLIT_H.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* draw: use u_assembled_prim() instead of u_assembled_primitive() | Chia-I Wu | 2013-05-03 | 3 | -11/+3
  The latter function is also removed as a result of the change.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* util/prim: clean up and add comments | Chia-I Wu | 2013-05-03 | 1 | -60/+107
  Move together (or add) functions to decompose/reduce/assemble a primitive, give them consistent names, and document them. Add u_prim_vertex_count() so that the vertex count information can be used elsewhere. u_assembled_primitive() will be removed in a follow-on commit.
  [olv: fix a warning when -Wold-style-declaration is enabled]
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* util/prim: fix primitive trimming for triangles with adjacency | Chia-I Wu | 2013-05-03 | 1 | -2/+2
  Fix for PIPE_PRIM_TRIANGLES_ADJACENCY and PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY.
  Signed-off-by: Chia-I Wu
  Acked-by: Zack Rusin
* draw/gs: don't crash when vs/gs signatures don't match | Zack Rusin | 2013-05-02 | 1 | -39/+54
  Instead of crashing, just fill zeros at the input slots that don't match. That's the mandated behavior, and it avoids debug asserts.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
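
A rough sketch of the zero-fill rule, assuming inputs and outputs are matched by a (semantic name, semantic index) pair. The structs and the function are hypothetical and much simpler than the draw module's real data layout.

```c
#include <string.h>

struct vec4 {
   float v[4];
};

struct shader_slot {   /* hypothetical signature entry */
   int semantic_name;
   int semantic_index;
};

/* For every GS input, copy the matching VS output. Inputs without a
 * matching VS output are filled with zeros instead of being treated
 * as an error. */
static void
fetch_gs_inputs(const struct shader_slot *vs_out,
                const struct vec4 *vs_values, unsigned num_vs_out,
                const struct shader_slot *gs_in,
                struct vec4 *gs_values, unsigned num_gs_in)
{
   for (unsigned i = 0; i < num_gs_in; i++) {
      memset(&gs_values[i], 0, sizeof gs_values[i]);

      for (unsigned j = 0; j < num_vs_out; j++) {
         if (vs_out[j].semantic_name == gs_in[i].semantic_name &&
             vs_out[j].semantic_index == gs_in[i].semantic_index) {
            gs_values[i] = vs_values[j];
            break;
         }
      }
   }
}
```

Zeroing each slot before searching means the "no match" case needs no separate branch.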
* tgsi: allow negation of all integer types | Zack Rusin | 2013-05-02 | 2 | -3/+2
  It's valid because we reuse certain arithmetic operations for both signed and unsigned types (e.g. uadd, umad, which have somewhat unfortunate naming).
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
* gallivm: Fix build with LLVM 3.3 | Armin K | 2013-05-02 | 1 | -1/+3
  Reviewed-by: Tom Stellard
* gallivm: Fix altivec intrinsics for 8xi16 add/sub | Adam Jackson | 2013-05-02 | 1 | -2/+2
  Signed-off-by: Adam Jackson
* vl/buffer: use 2D_ARRAY instead of 3D textures | Christian König | 2013-05-01 | 3 | -20/+22
  Signed-off-by: Christian König
* vl/compositor: cleanup background clearing | Christian König | 2013-05-01 | 2 | -4/+6
  Add an extra parameter to specify if we should clear the render target.
  Signed-off-by: Christian König
* build: Remove HAVE_PIPE_LOADER_SW. | Matt Turner | 2013-04-30 | 1 | -4/+0
  It guarded the function prototype of pipe_loader_sw_probe, whose use (in pipe_loader.c) and definition (in pipe_loader_sw.c) were not guarded. Both are built into libpipe_loader.la if HAVE_LOADER_GALLIUM, which is enable_gallium_loader in configure.ac.
  Tested-by: Tom Stellard
  Tested-by: Aaron Watry
  Reviewed-by: Brian Paul
* build: Rename PIPE_LOADER_HAVE_XCB to HAVE_PIPE_LOADER_XCB. | Matt Turner | 2013-04-30 | 1 | -2/+2
  For consistency, since we already have HAVE_PIPE_LOADER_{SW,DRM}.
  Tested-by: Tom Stellard
  Tested-by: Aaron Watry
  Reviewed-by: Brian Paul
* draw: don't crash if GS doesn't emit anything | Zack Rusin | 2013-04-27 | 2 | -0/+18
  Technically it's legal for a geometry shader to not emit any vertices. It's silly, but perfectly legal, so let's make draw stop crashing if it happens.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
  Reviewed-by: Roland Scheidegger
* Gallium: Use mmap on Haiku for executable memory vs malloc | Alexander von Gluck IV | 2013-04-29 | 1 | -1/+1
  * Haiku now has DEP enabled by default.
* draw/so: fix overflow calculation | Zack Rusin | 2013-04-27 | 1 | -8/+18
  Only report overflow for missing targets if they're actually being used. If the targets are missing but are not being used by any slot in the stream output declaration, we should correctly just ignore them.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
* draw/so: indicate overflow when buffer is missing | Zack Rusin | 2013-04-27 | 1 | -0/+4
  We were crashing if one of the buffers wasn't set; we should just treat it as an overflow. It's useful when using so statistics because it allows one to figure out how much data would be generated by so without actually writing any of it.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
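
One way to picture "missing buffer counts as overflow, statistics still advance" is the sketch below. The structs, counters and function are hypothetical and do not mirror the draw module's stream output code.

```c
#include <stdbool.h>
#include <stddef.h>

struct so_target {   /* hypothetical stream output target */
   unsigned char *data;
   size_t size;
   size_t used;
};

struct so_stats {    /* hypothetical counters */
   size_t prims_needed;
   size_t prims_written;
};

/* Account for one primitive of `prim_bytes` bytes. A missing target is
 * handled like a full one: nothing is written, but the "needed" counter
 * still advances. Returns false on overflow. */
static bool
so_emit_prim(struct so_target *target, size_t prim_bytes,
             struct so_stats *stats)
{
   stats->prims_needed++;

   if (target == NULL || target->data == NULL ||
       prim_bytes > target->size - target->used)
      return false;

   target->used += prim_bytes;
   stats->prims_written++;
   return true;
}
```

Running a draw with no target bound then leaves `prims_written` at zero while `prims_needed` reports how much output would have been produced.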
* gallivm: fix indirect addressing of temps in soa mode | Zack Rusin | 2013-04-27 | 1 | -0/+11
  We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first vertex/primitive/pixel in the SoA structure and not correctly fetching from all structures.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca
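
The missing offset can be illustrated with a scalar gather. The layout is an assumption made for this sketch (temporaries stored as register, then channel, then lane, with a vector width of 4); the function name is hypothetical and the real code builds LLVM IR instead.

```c
#define LANES 4   /* assumed vector width */
#define CHANS 4   /* x, y, z, w */

/* Gather channel `chan` of a per-lane register index from an SoA array
 * laid out as temps[reg][chan][lane] (assumed layout). */
static void
gather_indirect(const float *temps, const unsigned reg_index[LANES],
                unsigned chan, float out[LANES])
{
   for (unsigned lane = 0; lane < LANES; lane++) {
      unsigned base = (reg_index[lane] * CHANS + chan) * LANES;

      /* Without the "+ lane" term every lane would read the data of
       * lane 0, i.e. the first vertex/primitive/pixel. */
      out[lane] = temps[base + lane];
   }
}
```

Each lane therefore reads its own element of the selected register rather than sharing the first one.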
* tgsi/ureg: Add a function to return the number of outputs | Zack Rusin | 2013-04-26 | 2 | -0/+15
  We already hold the variable; we just weren't providing access to it.
  Signed-off-by: Zack Rusin
  Reviewed-by: José Fonseca