path: root/src/gallium/auxiliary
Commit message | Author | Age | Files | Lines
* util: Remove check_os_katmai_support. | Vinson Lee | 2010-08-16 | 1 | -119/+1
  check_os_katmai_support checks that the operating system running on an SSE-capable processor supports SSE. This is necessary for unpatched 2.2.x and earlier kernels; 2.4.x and later kernels support SSE.

  check_os_katmai_support disables SSE capabilities on any 32-bit x86 operating system for which it has no code path. Currently, the function handles Linux, Windows, and several BSDs. Mac OS, Cygwin, and Solaris are among the operating systems with no code path.

  Rather than add code for the unhandled operating systems, remove this function altogether. This fixes SSE detection on all recent 32-bit x86 operating systems. It completely breaks functionality on unpatched 2.2.x and earlier kernels, although there are likely no Gallium3D users on such operating systems.
* translate: Move loop variable declaration outside for loop. | Vinson Lee | 2010-08-16 | 1 | -1/+2
  Fixes MSVC build.
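A minimal sketch of the kind of change this describes, assuming the usual cause: MSVC of that era compiled C as C89, which does not allow a declaration inside the `for` statement. The function `sum_values` is hypothetical and is not taken from the translate code.

```c
#include <assert.h>
#include <stddef.h>

/* Sums `count` values. The loop index is declared at the top of the
 * block, which is what a C89 compiler requires. */
static unsigned sum_values(const unsigned *values, size_t count)
{
   size_t i;            /* declared outside the for statement */
   unsigned total = 0;

   assert(values != NULL || count == 0);

   /* C99 would also accept: for (size_t i = 0; i < count; i++) */
   for (i = 0; i < count; i++)
      total += values[i];

   return total;
}
```

The behavior is identical either way; only the placement of the declaration changes.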
* translate: Remove unused temporary register. | José Fonseca | 2010-08-16 | 1 | -1/+0
  Assuming the side-effect of x86_make_reg is also unnecessary.
* translate: Eliminate void pointer arithmetic. | José Fonseca | 2010-08-16 | 1 | -1/+1
  Non-portable.
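For illustration only, a sketch of the portable idiom. Arithmetic directly on a `void *` is a GNU extension, while ISO C requires a pointer to a complete object type. The helper name `advance_bytes` is hypothetical and is not part of translate.

```c
#include <assert.h>
#include <stddef.h>

/* Returns `base` advanced by `offset` bytes. Writing `base + offset`
 * directly on a void pointer is rejected by strict ISO C compilers,
 * so the pointer is cast to a character type first. */
static const void *advance_bytes(const void *base, size_t offset)
{
   assert(base != NULL);
   return (const char *)base + offset;
}
```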
* draw_llvm: fix segfaults on non-SSE2 CPUs where it is disabled (v2) | Luca Barbieri | 2010-08-16 | 3 | -17/+32
  Changes in v2:
  - Change function name

  Currently draw_llvm refuses to create itself on non-SSE2 CPUs due to an alleged LLVM bug. However, this is implemented improperly, because other parts of draw still attempt to access draw->llvm, resulting in segfaults.

  Instead, put the check in debug_get_option_draw_use_llvm, check that before calling draw_llvm_create, and then check whether draw->llvm is non-null everywhere else.
* translate_sse: major rewrite (v5) | Luca Barbieri | 2010-08-16 | 2 | -239/+936
  NOTE: Win64 is untested, and is thus currently disabled. If you have such a system, please enable it and report whether it works. To enable it, change src/gallium/auxiliary/translate/translate.c

  Changes in v5:
  - On Win64, preserve %xmm6 and %xmm7 as required by the ABI
  - Use _WIN64 instead of WIN64

  Changes in v4:
  - Use x86_target() and x86_target_caps()
  - Enable translate_sse in x86-64, but not in Win64

  Changes in v3:
  - Win64 support (untested)
  - Use u_cpu_detect.h constants instead of #ifs

  Changes in v2:
  - Minimize #ifs
  - Give a name to magic number CHANNELS_0001
  - Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
  - Fixed comments

  translate_sse is currently very limited to the point of being useless in essentially all cases. In particular, it only supports some float32 and unorm8 formats and doesn't work on x86-64.

  This commit rewrites it to support:
  1. Dumb memory copy for any pair of identical formats
  2. All formats that are swizzles of each other
  3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
  4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
  5. Support for x86-64 (doesn't take advantage of it in any way though)

  This new translate can even be useful to translate index buffers for cards that lack 8-bit index support.

  It passes the testsuite I wrote, but note that this is a major change, and more testing would be great.
* rtasm: add minimal x86-64 support and new instructions (v5) | Luca Barbieri | 2010-08-16 | 3 | -40/+551
  Changes in v5:
  - Add sse2_movdqa

  Changes in v4:
  - Use _WIN64 instead of WIN64

  Changes in v3:
  - Add target and target caps functions, so that they could be different in principle from the current CPU and they don't need #ifs to check

  Changes in v2:
  - Win64 support (untested)
  - Use u_cpu_detect.h constants instead of #ifs

  This commit adds minimal x86-64 support: only movs between registers are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit operations.

  It also adds several new instructions for the new translate_sse code.

  movdqa
* translate: add support for 8/16-bit indices | Luca Barbieri | 2010-08-16 | 5 | -19/+108
  Currently, only 32-bit indices are supported, but some use cases of translate need support for all index types.
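One way to picture support for several index sizes is a fetch helper that widens every index to 32 bits before it is used. This is only a sketch under that assumption; `fetch_index` and its parameters are hypothetical and do not reflect the actual translate interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads element `i` from an index buffer whose elements are
 * `index_size` bytes wide (1, 2 or 4) and widens it to 32 bits.
 * memcpy is used so the read is valid for any buffer alignment. */
static uint32_t fetch_index(const void *indices, unsigned index_size, size_t i)
{
   const uint8_t *bytes = (const uint8_t *)indices;

   switch (index_size) {
   case 1:
      return bytes[i];
   case 2: {
      uint16_t v;
      memcpy(&v, bytes + i * 2, sizeof v);
      return v;
   }
   case 4: {
      uint32_t v;
      memcpy(&v, bytes + i * 4, sizeof v);
      return v;
   }
   default:
      assert(!"unsupported index size");
      return 0;
   }
}
```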
* translate_sse: remove useless generated function wrappers | Luca Barbieri | 2010-08-16 | 1 | -51/+4
  Currently translate_sse puts two trivial wrappers in the translate vtable.

  These slow it down and enlarge the source code for no gain, except perhaps the ability to set a breakpoint there, so remove them.

  Breakpoints can be set on the caller of the translate functions, with no loss of functionality.
* translate_generic: factor out common code between linear and indexed | Luca Barbieri | 2010-08-16 | 1 | -115/+62
  This moves the common code into a separate ALWAYS_INLINE function.
* translate_generic: use memcpy if possible (v3) | Luca Barbieri | 2010-08-16 | 1 | -33/+75
  Changes in v3:
  - If we can do a copy, don't try to get an emit func, as that can assert(0)

  Changes in v2:
  - Add comment regarding copy_size

  When used in GPU drivers, translate can simultaneously perform a gather operation and convert away from unsupported formats.

  In this use case, input and output formats will often be identical: clearly it would make sense to use a memcpy in this case.

  Instead, translate insists on converting to and from 32-bit floating point numbers.

  This is not only extremely expensive, but it also loses precision for 32/64-bit integers and 64-bit floating point numbers.

  This patch changes translate_generic to just use memcpy if the formats are identical, non-blocked, and with an integral number of bytes per pixel (note that all sensible vertex formats are like this).
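The memcpy path can be sketched as a plain strided gather. This is an illustration under the conditions stated above (identical formats, a whole number of bytes per element), not the translate_generic implementation; `gather_copy` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Gathers `count` elements of `elem_size` bytes each from a strided
 * source into a tightly packed destination. No conversion happens,
 * so every bit of each element is preserved. */
static void gather_copy(void *dst, const void *src, size_t src_stride,
                        size_t elem_size, size_t count)
{
   unsigned char *out = (unsigned char *)dst;
   const unsigned char *in = (const unsigned char *)src;
   size_t i;

   assert(elem_size > 0);

   for (i = 0; i < count; i++)
      memcpy(out + i * elem_size, in + i * src_stride, elem_size);
}
```

Because the bytes are copied verbatim, a 32-bit integer attribute keeps values that a round trip through a 32-bit float would round.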
* draw: Fix polygon edge flags. | Chia-I Wu | 2010-08-16 | 1 | -1/+1
  Fix a copy-and-paste error introduced by f141abdc8fdbff41e16b0ce53fa3fa8fba32a7f9.
* draw: No need to make max_vertices even. | Chia-I Wu | 2010-08-16 | 6 | -30/+0
  A triangle strip alternates the front/back orientation of its triangles. max_vertices was made even so that varray never split a triangle strip at the wrong position. This did not work for triangle strips with adjacency, and it is no longer relevant with vsplit.
* draw: Remove DRAW_PIPE_MAX_VERTICES and DRAW_PIPE_FLAG_MASK. | Chia-I Wu | 2010-08-16 | 6 | -30/+19
  The higher bits of draw elements are no longer used for the stipple or edge flags.
* draw: Add PRIMITIVE macro to vsplit. | Chia-I Wu | 2010-08-16 | 2 | -20/+31
  PRIMITIVE is used by the indexed path to flush the entire primitive with custom vertex count checks. It replaces the existing fast path.
* draw: last_vertex_last is always true for GS and SO. | Chia-I Wu | 2010-08-16 | 2 | -9/+3
  That is, the OpenGL decomposition rule is assumed. There should be a pipe_context state to specify the rules.
* draw: Remove varray and vcache. | Chia-I Wu | 2010-08-16 | 9 | -1284/+2
  They have been deprecated by vsplit.
* draw: Replace vcache by vsplit. | Chia-I Wu | 2010-08-16 | 3 | -26/+4
  vcache decomposes primitives while vsplit splits primitives. Splitting is generally easier to do and is faster. More importantly, vcache depends on flatshade_first to decompose. The outputs may have an incorrect vertex order, which is significant to GS.
* draw: Replace varray by vsplit. | Chia-I Wu | 2010-08-16 | 2 | -8/+9
  vsplit is a superset of varray. Unlike varray, it sets the split flags.
* draw: Add vsplit frontend. | Chia-I Wu | 2010-08-16 | 6 | -1/+695
  vsplit is based on varray. It sets the split flags when a primitive is split. It also has support for indexed primitives.

  For indexed primitives, unlike vcache, vsplit splits the primitives instead of decomposing them.
* draw: Add new util function draw_pt_trim_count. | Chia-I Wu | 2010-08-16 | 3 | -10/+9
  draw_pt_trim_count is renamed from trim in draw_pt.c.
* draw: Simplify frontend interface a little. | Chia-I Wu | 2010-08-16 | 5 | -21/+11
  The run method is simplified to take the start vertex and the vertex count.
* draw: Add prim flags to middle ends. | Chia-I Wu | 2010-08-16 | 7 | -26/+46
  Update the middle end interface to pass the primitive flags from the frontends to the pipeline. No frontend sets the flags yet.
* draw: Add flags to draw_prim_info. | Chia-I Wu | 2010-08-16 | 9 | -11/+36
  A primitive may be split in frontends. The split primitives should convey certain flag bits so that the decomposer can correctly decide the stipple or edge flags.

  This commit adds flags to draw_prim_info and updates the decomposer to honor the flags. Frontends and middle ends will be updated later.
* gallium: Make printing info on debug builds default off | Jakob Bornecrantz | 2010-08-15 | 4 | -4/+4
  This commit silences the printing of most of the debug information when running debug builds. The big culprits are: the tgsi sanity checker, which is run on all shaders in debug builds; all the options; and finally the cpu caps printer.
* gallivm: Remove unnecessary header. | Vinson Lee | 2010-08-14 | 1 | -1/+0
* u_cpu_detect: remove arch and little_endian | Luca Barbieri | 2010-08-14 | 4 | -35/+9
  This logic duplicates the one in p_config.h, so remove it and adjust the only two places that were using it.
* gallivm: Refactor the Newton-Raphson steps, and disable once again. | José Fonseca | 2010-08-14 | 1 | -28/+83
  It causes a very ugly corruption on the Earth's halo on Google Earth.
* Revert "u_blitter: unify clear_depth_stencil and flush_depth_stencil" | Marek Olšák | 2010-08-12 | 2 | -0/+48
  This reverts commit de4784e36505316c2a5ab34cc5b371d17f38d3c5.
* u_blitter: unify clear_depth_stencil and flush_depth_stencil | Marek Olšák | 2010-08-12 | 2 | -48/+0
  No need to enable depth test for clear.
* u_staging: remove useless inline keyword | Luca Barbieri | 2010-08-11 | 1 | -1/+1
* translate: allow clients to ask for supported output formats | Luca Barbieri | 2010-08-11 | 3 | -0/+88
  Currently translate asserts on unsupported output formats, making it impossible to use for some purposes, such as testing whether it actually works on all formats it supports.

  Removing the assert was met with opposition, so this change allows clients to ask whether an output format is supported, and they are thus able to avoid attempting to use it.

  Since this is just an addition to the API, no adverse effect is possible, and it makes the testsuite work again.
* auxiliary: Make u_staging.c MSVC compatible. | Vinson Lee | 2010-08-11 | 1 | -3/+5
  Fixes MSVC build.
* auxiliary: Add u_staging.c to SCons build. | Vinson Lee | 2010-08-11 | 1 | -0/+1
  This is a follow-up to commit b85c71d4e1e4ed788be834dff5b7b3c0cd0402ac which added u_staging.c to make.
* gallivm: Fix and enable the extra Newton/Raphson step in lp_build_rcp(). | José Fonseca | 2010-08-11 | 1 | -2/+2
  Thanks to Michal for spotting this.
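For context, one Newton-Raphson refinement step for a reciprocal takes an estimate x0 of 1/a and computes x1 = x0 * (2 - a * x0). The scalar sketch below shows only that arithmetic. It is not the gallivm code, which emits LLVM IR, and the name `refine_rcp` is hypothetical.

```c
#include <assert.h>

/* One Newton-Raphson step for f(x) = 1/x - a, starting from the
 * estimate x0. The result is closer to 1/a than x0 whenever x0 is
 * already a reasonable approximation. */
static float refine_rcp(float a, float x0)
{
   assert(a != 0.0f);
   return x0 * (2.0f - a * x0);
}
```

Starting from the rough estimate 0.3 for 1/3, a single step gives roughly 0.33.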
* Revert "translate_generic: return NULL instead of assert(0) if format not supported" | Luca Barbieri | 2010-08-11 | 1 | -6/+9
  This reverts commit 16b45ca7cefb3432b4133fe9d0b1dbfe3f286131.

  José Fonseca asked for a revert.

  Note that the testsuite will now segfault since it attempts to test all possible formats.
* translate_generic: fix broken A8R8G8B8_UNORM output | Luca Barbieri | 2010-08-11 | 1 | -3/+9
  translate was attempting to output A8R8G8B8_UNORM as if it were R8G8B8A8_UNORM.

  Now the tests just added pass.
* translate_generic: return NULL instead of assert(0) if format not supported | Luca Barbieri | 2010-08-11 | 1 | -9/+6
  This gives the caller a chance to recover (or crash anyway otherwise).
* auxiliary: fix util_framebuffer_copy | Luca Barbieri | 2010-08-11 | 1 | -2/+4
  util_framebuffer_copy was attempting to copy all elements of the source framebuffer state.

  However, this breaks if the user does not zero initialize the structure.

  Instead, only copy the elements up to nr_cbufs, and clear elements up to dst->nr_cbufs, if the destination was larger than the source.
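The copy rule described here can be sketched with a simplified stand-in structure. `struct fb_state`, `MAX_CBUFS`, and `fb_state_copy` are hypothetical; the real framebuffer state has more fields and holds references to surfaces rather than raw pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CBUFS 8            /* hypothetical limit */

struct fb_state {              /* hypothetical stand-in structure */
   unsigned nr_cbufs;
   void *cbufs[MAX_CBUFS];
};

/* Copies only the valid source slots, then clears any destination
 * slots that were in use before but are not covered by the source.
 * Source slots at or beyond src->nr_cbufs are never read, so they
 * may hold uninitialized data. */
static void fb_state_copy(struct fb_state *dst, const struct fb_state *src)
{
   unsigned i;

   assert(src->nr_cbufs <= MAX_CBUFS && dst->nr_cbufs <= MAX_CBUFS);

   for (i = 0; i < src->nr_cbufs; i++)
      dst->cbufs[i] = src->cbufs[i];
   for (; i < dst->nr_cbufs; i++)
      dst->cbufs[i] = NULL;

   dst->nr_cbufs = src->nr_cbufs;
}
```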
* gallivm: Use lp_build_div instead of lp_build_mul + lp_build_rcp. | José Fonseca | 2010-08-11 | 1 | -2/+1
  Single divide, so let lp_build_div decide how to implement this. This will save a multiplication on architectures which don't have a RCP intrinsic.
* gallivm: Use unsigned shift in lp_build_minify. | José Fonseca | 2010-08-11 | 1 | -1/+1
  Texture dimensions are unsigned.
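As a scalar illustration of the operation, minification halves a dimension once per mipmap level and clamps the result to at least 1. With an unsigned operand the right shift is a logical shift. The sketch below assumes that definition; `minify` is a hypothetical name, and the actual lp_build_minify emits LLVM IR instead.

```c
#include <assert.h>
#include <limits.h>

/* Returns the size of a texture dimension at mipmap `level`,
 * never smaller than 1. `size` is unsigned, so `>>` shifts in
 * zero bits regardless of the top bit. */
static unsigned minify(unsigned size, unsigned level)
{
   unsigned shifted;

   assert(level < sizeof(unsigned) * CHAR_BIT);

   shifted = size >> level;
   return shifted > 0 ? shifted : 1;
}
```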
* util: copy the u_staging commit message to the code | Marek Olšák | 2010-08-11 | 1 | -1/+9
* auxiliary: support for transfers using staging resources | Luca Barbieri | 2010-08-11 | 3 | -0/+123
  Direct3D 10/11 has no concept of transfers. Applications instead create resources with a STAGING or DYNAMIC usage, copy between them and the real resource and use Map to map the STAGING/DYNAMIC resource.

  This util module allows Gallium drivers to be implemented as a Direct3D driver would be: transfers allocate a resource with PIPE_USAGE_STAGING, and copy the data between it and the real resource with resource_copy_region.
* u_surfaces: add util_surfaces_peek | Luca Barbieri | 2010-08-11 | 1 | -0/+13
  Used to find out if a surface exists without creating one.
* u_surfaces: use cso_hash instead of util_hash_table | Luca Barbieri | 2010-08-11 | 2 | -53/+31
  Using cso_hash directly is the right thing since util_hash_table adds useless overhead and is harder to use for this application.
* u_surfaces: fix surface leak due to off by one | Luca Barbieri | 2010-08-11 | 1 | -1/+1
* auxiliary: make primitive splitter assert on unimplemented adjacency prims | Luca Barbieri | 2010-08-11 | 1 | -1/+4
  They are unimplemented, even though the framework makes it possible to implement them well, and nv50 needs them.
* auxiliary: fix u_split_prim naming convention | Luca Barbieri | 2010-08-11 | 1 | -3/+3
  Current practice is to start identifiers with "util_" instead of "u_".
* auxiliary: move Ben Skeggs' primitive splitter to common code | Luca Barbieri | 2010-08-11 | 1 | -0/+102
  This is a simple framework that handles splitting primitives in an abstract way.

  The user has to specify the primitive start, start index and count.

  Then, it can ask the primitive splitter to "draw" a chunk of the primitive, staying under a given vertex/index budget.

  The primitive splitter will then call user-supplied functions to emit a range of vertices/indices, as well as switch the edgeflag on or off.

  This is particularly useful for hardware that either has limits on the vertex count field, or where vertices are pushed on a FIFO or temporary buffer of limited size.

  Note that unlike other splitters, it does not manipulate data in any way, and merely asks a callback to do so, in vertex intervals.
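The callback-driven idea can be sketched for the simplest case, a list of independent primitives (points, lines, or triangles). This is a hedged illustration, not the actual splitter: `split_list`, `emit_range_fn`, and `count_vertices` are hypothetical names, and strips, fans, and edge flags are left out.

```c
#include <assert.h>

/* User-supplied callback that emits `count` vertices from `start`. */
typedef void (*emit_range_fn)(void *priv, unsigned start, unsigned count);

/* Example callback: only accumulates how many vertices were emitted. */
static void count_vertices(void *priv, unsigned start, unsigned count)
{
   (void)start;
   *(unsigned *)priv += count;
}

/* Emits the range [start, start + count) in chunks of at most
 * `budget` vertices, cutting only on primitive boundaries.
 * Returns the number of chunks emitted. The splitter never touches
 * vertex data itself; it only hands ranges to the callback. */
static unsigned split_list(unsigned start, unsigned count,
                           unsigned verts_per_prim, unsigned budget,
                           emit_range_fn emit, void *priv)
{
   unsigned chunks = 0;
   unsigned max_chunk;

   assert(verts_per_prim > 0);
   max_chunk = budget - budget % verts_per_prim;
   assert(max_chunk > 0);

   count -= count % verts_per_prim;   /* drop an incomplete trailing primitive */

   while (count > 0) {
      unsigned n = count < max_chunk ? count : max_chunk;
      emit(priv, start, n);
      start += n;
      count -= n;
      chunks++;
   }
   return chunks;
}
```

With a budget of 7 vertices and triangles as the primitive, each chunk holds at most 6 vertices, so 12 vertices are emitted in two calls.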
* util: Add util_format_srgb(). | José Fonseca | 2010-08-10 | 1 | -0/+38
  To convert RGB -> SRGB format.