summaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary
Commit message (Collapse)AuthorAgeFilesLines
* gallium/u_blitter: make clearing independent of the colorbuffer formatMarek Olšák2013-06-132-43/+4
| | | | | | | There isn't any difference between 32_FLOAT and 32_*INT in vertex fetching. Both of them don't do any format conversion. Reviewed-by: Brian Paul <[email protected]>
* gallium/u_blitter: make clearing independent of the number of bound colorbuffersMarek Olšák2013-06-134-56/+41
| | | | | | We can use the fragment shader TGSI property WRITES_ALL_CBUFS. Reviewed-by: Brian Paul <[email protected]>
* gallium/util: make WRITES_ALL_CBUFS optional in the passthrough fragment shaderMarek Olšák2013-06-133-5/+10
| | | | Reviewed-by: Brian Paul <[email protected]>
* util: new util_fill_box helperRoland Scheidegger2013-06-132-10/+37
| | | | | | | | | Use new util_fill_box helper for util_clear_render_target. (Also fix off-by-one map error.) v2: handle non-zero z correctly in new helper Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) remove duplicated code block (including comment)Roland Scheidegger2013-06-131-7/+0
|
* draw: implement distance cullingZack Rusin2013-06-108-31/+198
| | | | | | | | | | | Works similarly to clip distance. If the cull distance is negative for all vertices against a specific plane then the primitive is culled. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add a cull distance semanticZack Rusin2013-06-103-1/+7
| | | | | | | | | | | | | cull distance is analogous to clip distance. If a register is given this semantic, then the values in it are assumed to be a float32 distance to a plane. Primitives will be completely discarded if the plane distance for all of the vertices in the primitive are < 0. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: fix clipper invocation statisticsZack Rusin2013-06-105-6/+33
| | | | | | | | | | | | | We need to figure out the number of invocations of the clipper before the emit, because in the emit we are after clipping where the number of primitives will be equal to number of clipper invocations minus the clipped primitives. So our computations were always off by the number of clipped primitives. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: enable user plane clipping when clipdistance is usedZack Rusin2013-06-102-7/+20
| | | | | | | | | | | | | | | | | Draw depended on clip_plane_enable being set in the rasterizer to use clipdistance registers for clipping. That's really unfriendly because it requires that rasterizer state to have variants for every shader out there. Instead of depending on the rasterizer lets extract the info from the available state: if a shader writes clipdistance then we need to use it and we need to clip using a number of planes equal to the number of writen clipdistance components. This way clipdistances just work. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: make sure clipdistances work with geometry shadersZack Rusin2013-06-106-2/+22
| | | | | | | | | | | we were always fetching the info from the vertex shader, but if geometry shader is present it should be used as the source of that info. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: Add A8R8G8B8 to draw_print_arraysRichard Sandiford2013-06-101-0/+7
| | | | | Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]>
* draw: Fix type mismatch between draw_private.h and LLVMRichard Sandiford2013-06-101-1/+1
| | | | | | | | | | | | | | | draw_vertex_buffer declared the size field to be a size_t, but the LLVM code used an int32 instead. This caused problems on big-endian 64-bit targets, because the first 32-bit chunk of the 64-bit size_t was always 0. In one sense size_t seems like a good choice for a size, so one fix would have been to try to get the LLVM code to use the equivalent of size_t too. However, in practice, the size is taken from things like ~0 or width0, both of which are int-sized, so it seemed simpler to make the size field int-sized as well. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]>
* util: Use sizeof(void *) rather than 0 as the fallback cache line sizeRichard Sandiford2013-06-101-0/+5
| | | | | | | | Without this, llvmpipe ends up giving a zero size to all uncompressed textures on non-x86 systems, since align() cannot handle a 0 alignment. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]>
* llvmpipe: Use saturating add/sub for UNORM formatsRichard Sandiford2013-06-101-0/+8
| | | | | | | | | | | | | | | | | | lp_build_add and lp_build_sub have fallback code for cases that cannot be handled by known intrinsics. For UNORM formats, this code was using modulo rather than saturating arithmetic. This fixes some rendering issues for a gnome session on System z. It also fixes various piglit tests on z, such as spec/ARB_color_buffer_float/GL_RGBA8-render. The patch deliberately doesn't tackle the more complicated SNORM case. Tested against piglit on x86_64 and System z with no regressions. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]>
* gallivm: work around slow code generated for interleaving 128bit vectorsRoland Scheidegger2013-06-081-0/+22
| | | | | | | | | | | | | | | We use 128bit vector interleave for untwiddling in the blend code (with 256bit vectors). llvm generates terrible code for this for some reason, so instead of generating a shuffle for 2 128bit vectors use a extract/insert shuffle instead (it only seems to matter we're not using 128bit wide vectors for the shuffle). This decreases instruction count of the blend code generated for a rgba8 render target without blending from 169 to 113 with llvm 3.1 and from 136 to 114 in llvm 3.2/3.3, and I got a ~8% (llvm 3.1) and ~5% (3.2/3.3) performance improvement in gears. (The generated code is still not terribly good as we could actually avoid the interleaving completely but llvm can't know this.) Reviewed-by: Jose Fonseca <[email protected]>
* util: add comment about bogus transfer flagsRoland Scheidegger2013-06-071-0/+1
|
* util: fix util_clear_render_target and util_clear_depth_stencil layer handlingRoland Scheidegger2013-06-071-87/+103
| | | | | | | These functions must clear all bound layers, not just the first. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* util: add util_resource_is_array_texture()Chia-I Wu2013-06-081-1/+19
| | | | | | | | Checking if array_size is greater than 1 is not enough for single-layered array textures. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Remove draw_arrays() and draw_arrays_instanced() functionsArnas Milasevicius2013-06-072-51/+0
| | | | | | | | Moved draw_arrays() to st_draw_feedback.c and removed draw_arrays_instanced(). draw_arrays() was used by nobody else. Now there's just one "draw" entrypoint into the draw module. Signed-off-by: Brian Paul <[email protected]>
* tgsi: replace tgsi_file_names tgsi_file_names[] with tgsi_file_name() functionBrian Paul2013-06-075-14/+26
| | | | | | | | | This change came from the discovery that the STATIC_ASSERT to check that the number of register file strings didn't actually work. Similar changes could be made for the other string arrays in tgsi_string.c Reviewed-by: Jose Fonseca <[email protected]>
* u_vbuf: fix index buffer leakChia-I Wu2013-06-071-0/+3
| | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* draw: trivial fix comment typoRoland Scheidegger2013-06-061-1/+1
|
* gallium/tgsi: add missing string for layer semanticRoland Scheidegger2013-06-063-1/+8
| | | | | | Also report if a shader writes the layer semantic Reviewed-by: Brian Paul <[email protected]>
* mesa: remove outdated version lines in commentsRico Schüller2013-06-055-5/+0
| | | | Signed-off-by: Brian Paul <[email protected]>
* gallium: System z supportRichard Sandiford2013-06-051-1/+1
| | | | | | | | | The main change is to use MCJIT rather than the old JIT, which will never be supported for System z. The endianness part is by example since the patch was tested on a glibc system. Signed-off-by: Richard Sandiford <[email protected]> Signed-off-by: Brian Paul <[email protected]>
* gallivm: enhance special sse2 4x4f and 2x8f -> 1x16ub conversionRoland Scheidegger2013-06-051-32/+58
| | | | | | | | | | | | | There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and 1x8f->1x8ub cases, there might be legitimate reasons why we don't have enough input vectors for a full destination vector, and using pack intrinsics should still be much better than using generic conversion (it looks like convert_alpha from the blend code might hit this though I suspect it could be avoided). v2: add another test vector format to lp_test_conv so this gets tested. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix lp_build_concat_nRoland Scheidegger2013-06-051-1/+5
| | | | | | | The code was designed to handle no-op concat but failed (unless the caller was using same pointer for src and dst). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix out-of-bounds access with mirror_clamp_to_edge address modeRoland Scheidegger2013-06-011-6/+7
| | | | | | | | | | | | | | | | | | | Surprising this bug survived so long, we were missing a clamp (in the linear filtering version). (Valgrind complained a lot about invalid reads with piglit texwrap, I've also seen spurios failures in this test which might have happened due to this. Valgrind probably didn't complain before the alignment reduction in llvmpipe to 4x4 since the test is using tiny textures so the reads were still always well within allocated area.) While here, also do an effective clamp (after half subtraction) of [0,length-0.5] instead of [0, length-1] which saves an instruction (the filtering weight could be different due to this, but only if both texels point to the same max texel so it doesn't matter). (Both changes are borrowed from PIPE_TEX_CLAMP_TO_EDGE case.) Note: This is a candidate for the stable branches. Reviewed-by: Jose Fonseca <[email protected]>
* draw: fix vs/fs input/output mismatchesZack Rusin2013-05-301-0/+7
| | | | | | | | | | | | When we've changed draw_find_shader_output to return -1 instead of 0 on non found attribs we broke the default behavior of draw, which was to always redirect those to the first (0th) slot. To preserve that behavior if draw_emit_vertex_attr notices a mismatched vertex attrib, it just redirects it to the first slot (instead of trying to use negative index in an array). Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw: add cast in debug_printf() to silence warningBrian Paul2013-05-291-1/+1
|
* draw: make sure viewport index is fetched from leading vertexZack Rusin2013-05-256-28/+54
| | | | | | | | | | | | | Viewport index should only be used on a per primitive basis, so instead of fetching it from each vertex, potentially making each vertex in a primitive use a different viewport index, which is obviously broken, make sure that we only fetch from the first vertex in the primitive making the viewport index the same for the entire primtive. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: clamp the viewports to always be between 0 and maxZack Rusin2013-05-255-16/+24
| | | | | | | | | If the viewport index is larger than the PIPE_MAX_VIEWPORTS, then the first (0-th) viewport should be used. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: fixup draw_find_shader_outputZack Rusin2013-05-252-6/+7
| | | | | | | | | | | | | | | | | draw_find_shader_output like most of the code in draw used to depend on position always being at output slot 0. which meant that any other attribute being at 0 could signify an error. unfortunately position can be at any of the output slots, thus other attributes can occupy slot 0 and we need to mark the ones which were not found by something else. This commit changes draw_find_shader_output so that it returns -1 if it can't find the given attribute and adjust the code that depended on it returning >0 whenever it correctly found an attrib. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: implement support for multiple viewportsZack Rusin2013-05-259-33/+105
| | | | | | | | | | | This adds support for multiple viewports to the draw module. Multiple viewports depend on the presence of geometry shaders which can write the viewport index. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: Add support for multiple viewportsZack Rusin2013-05-2513-19/+32
| | | | | | | | | | | | Gallium supported only a single viewport/scissor combination. This commit changes the interface to allow us to add support for multiple viewports/scissors. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: fix build on uclibc systemAnthony G. Basile2013-05-291-4/+2
| | | | | | | | | | | | execinfo.h and debug_symbol_name_glibc() are pure GNU-isms and do not build on uclibc systems. A previous patch addressed this issue, but there was an error. This patch corrects that error. See https://bugs.freedesktop.org/show_bug.cgi?id=51782 https://bugs.gentoo.org/show_bug.cgi?id=469768 Signed-off-by: Anthony G. Basile <[email protected]> Signed-off-by: Brian Paul <[email protected]>
* tgsi: add buffer texture to tgsi_util_get_texture_coord_dim()Chia-I Wu2013-05-272-0/+3
| | | | | | | | TGSI_TEXTURE_BUFFER is one-dimensional. Assert that exec_tex() is never called with TGSI_TEXTURE_BUFFER. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* scons: Don't force stabs debug format for Mingw.José Fonseca2013-05-211-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - recent gdb handles DWARF fine (tested both with version 7.1.90.20100730 from mingw-w64 project, and 7.5-1 from mingw project) - http://people.freedesktop.org/~jrfonseca/bfdhelp/ was updated to handle DWARF - stabs requires ugly hacks to prevent compilation failures - mixing stabs/dwarf prevents proper backtraces (which is inevitable, given that the MinGW C runtime is pre-built with DWARF) For example, without this change I get: (gdb) bt #0 _wassert (_Message=0xf925060 L"Num < NumOperands && \"Invalid child # of SDNode!\"", _File=0xf60b488 L"llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534) at ../../../../mingw-w64-crt/misc/wassert.c:51 #1 0x0368996b in _assert (_Message=0x39d7ee4 "Num < NumOperands && \"Invalid child # of SDNode!\"", _File=0x39d7e94 "llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534) at ../../../../mingw-w64-crt/misc/wassert.c:44 #2 0x00000004 in ?? () #3 0x00000004 in ?? () #4 0x0f60b488 in ?? () #5 0x00000000 in ?? () While with this change I get: (gdb) bt #0 _wassert (_Message=0xfb982e8 L"Num < NumOperands && \"Invalid child # of SDNode!\"", _File=0xefbcb40 L"llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534) at ../../../../mingw-w64-crt/misc/wassert.c:51 #1 0x039c996b in _assert (_Message=0x3d17f24 "Num < NumOperands && \"Invalid child # of SDNode!\"", _File=0x3d17ed4 "llvm/include/llvm/CodeGen/SelectionDAGNodes.h", _Line=534) at ../../../../mingw-w64-crt/misc/wassert.c:44 #2 0x033111cc in getOperand (Num=4, this=<optimized out>) at llvm/include/llvm/CodeGen/SelectionDAGNodes.h:534 #3 getOperand (i=4, this=<optimized out>) at llvm/include/llvm/CodeGen/SelectionDAGNodes.h:779 #4 llvm::SelectionDAG::getNode (this=0xf00cb08, Opcode=79, DL=..., VT=..., N1=..., N2=...) at llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2859 #5 0x03377b20 in llvm::SelectionDAGBuilder::visitExtractElement (this=0xfb45028, I=...) at llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2803 [...] Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: handle z32s8x24 depth/stencil formatRoland Scheidegger2013-05-182-1/+7
| | | | | | | We need to split up the depth and stencil values in this case, and there's some new logic required to handle float depth and stencil simultaneously. Also make sure we get the 64bit zs clear values and masks propagated correctly.
* gallivm: handle z32s8x24 format for samplingRoland Scheidegger2013-05-181-8/+51
| | | | | | | | | | | | | | | Since we can only sample either depth or stencil but not both only load the required bits which makes things a bit easier (it requires special handling since the format doesn't fit into 32bit). The logic for deciding if depth or stencil should be sampled is a bit odd, but seems to be what other drivers and statetrackers do: if it's a format with both depth and stencil (or just with depth) then sample depth, for sampling stencil a sampler view format with only stencil is required. Also while here fix up stencil sampling for other formats as well, though this isn't supported by mesa (ARB_stencil_texturing), and while blits would use it they don't work neither since they'd also need stencil export. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path.José Fonseca2013-05-174-240/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change was meant as a stepping stone to use PMADDUBSW SSSE3 instruction, but actually this refactoring by itself yields a 10% speedup on texture intensive shaders (e.g, Google Earth's ocean water w/o S3TC on a Ivy Bridge machine), while giving yielding exactly the same results, whereas PMADDUBSW only gave an extra 5%, at the expense of 2bits of precision in the interpolation. I belive that the speedup of this change comes from the reduced register pressure (as 8.8 fixed point intermediates take twice the space of 8bit unorm). Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code substantially -- it's no longer necessary to have code duplicated for low and high register halfs. Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never executed (as it is faster on AVX to split the 256bit wide texture computation into two 128bit chunks, in order to leverage integer opcodes). This path might be useful in the future, so in order to verify this change did not break that path I had to apply this change: @@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm, /* * we only try 8-wide sampling with soa as it appears to * be a loss with aos with AVX (but it should work). * (It should be faster if we'd support avx2) */ - if (num_quads == 1 || !use_aos) { + if (/* num_quads == 1 || ! */ use_aos) { if (num_quads > 1) { if (mip_filter == PIPE_TEX_MIPFILTER_NONE) { LLVMValueRef index0 = lp_build_const_int32(gallivm, 0); /* and then run texfilt mesademo: LP_NATIVE_VECTOR_WIDTH=256 ./texfilt Ran whole piglit without regressions. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Add and use lp_build_lerp_3d.José Fonseca2013-05-173-26/+60
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Support pointers in lp_build_print_value().José Fonseca2013-05-161-0/+2
| | | | Trivial.
* draw: More defensive coding in DRAW_GET_IDX.José Fonseca2013-05-151-2/+2
| | | | Doesn't make a difference ATM, but just in case.
* draw: Fix vsplit regression when the ib can be used directly.José Fonseca2013-05-151-1/+1
| | | | | | `ib` no longer is offseted by `istart`. Trivial.
* draw/gs: fix extracting of the clipZack Rusin2013-05-141-2/+4
| | | | | | | | | | The indices are not consecutive when using the geometry shader, which means we were extracting non existing values. Create an array of linear indices and always use it instead of the passed indices. Found by Jose. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw: try to prevent overflows on index buffersZack Rusin2013-05-148-54/+110
| | | | | | | | | | | | Pass in the size of the index buffer, when available, and use it to handle out of bounds conditions. The behavior in the case of an overflow needs to be the same as with other overflows in the vertex processing pipeline meaning that a vertex should still be generated but all attributes in it set to zero. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: use the total number of vertices for statisticsZack Rusin2013-05-142-2/+2
| | | | | | | | | | | the number of vertices to fetch doesn't necessarily equal the total number of input vertices, e.g. we might want to fetch a single vertex but then draw it twice. Lets use the correct number of input vertices in the statistics. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: don't crash on vertex buffer overflowZack Rusin2013-05-1410-31/+122
| | | | | | | | | | | | | | We would crash when stride was bigger than the size of the buffer. The correct behavior is to just fetch zero's in this case. Unfortunatly with user_buffer's there's no way to validate the size because currently we're just not getting it. Adjust the draw interface to pass the size along the mapped buffer, which works perfectly for buffer backed vertex_buffers and, in future, it will allow us to plumb user_buffer sizes through the same interface. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm/soa: implement indirect addressing in immediatesZack Rusin2013-05-142-2/+82
| | | | | | | | | | | | | The support is analogous to the way we handle indirect addressing in temporaries, except that we don't have to worry about storing (after declarations) and thus we'll able to keep using the old code when indirect addressing isn't used. In other words we're still using constants directly, unless the instruction has immediate register with indirect addressing. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>