path: root/src/gallium/auxiliary
Commit message | Author | Age | Files | Lines
* gallium/util: Fix detection of AVX cpu caps | Andre Heider | 2013-07-23 | 1 | -2/+25
  For AVX it's not sufficient to only rely on the cpuid flags. If the CPU supports these extensions, but the OS doesn't, issuing these insns will trigger an undefined opcode exception.
  In addition to the AVX cpuid bit we also need to:
    * test cpuid for OSXSAVE support
    * XGETBV to check if the OS saves/restores AVX regs on context switches
  See "Detecting Availability and Support" at http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
  Signed-off-by: Andre Heider <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: José Fonseca <[email protected]>
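The two extra checks named in this commit can be sketched as a pure decision function. This is a minimal illustration, not the code from the commit: `avx_usable` and the macro names are hypothetical, and the caller is assumed to have already read ECX from CPUID leaf 1 and, only if OSXSAVE is set, XCR0 via XGETBV. The bit positions follow Intel's documented layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of ECX as returned by CPUID leaf 1. */
#define CPUID_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID_ECX_AVX     (1u << 28) /* CPU implements AVX */

/* Bits of XCR0: which register state the OS saves and restores. */
#define XCR0_SSE_STATE (1u << 1) /* XMM registers */
#define XCR0_AVX_STATE (1u << 2) /* upper halves of YMM registers */

/* Hypothetical helper: decide AVX usability from values read elsewhere.
 * Pass xcr0 = 0 when OSXSAVE is clear, since XGETBV must not be
 * executed in that case. */
static bool
avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
   const uint32_t need_cpu = CPUID_ECX_OSXSAVE | CPUID_ECX_AVX;
   const uint64_t need_os = XCR0_SSE_STATE | XCR0_AVX_STATE;

   if ((cpuid1_ecx & need_cpu) != need_cpu)
      return false; /* CPU lacks AVX, or the OS did not enable XSAVE */

   /* The OS must preserve both XMM and YMM state across context switches. */
   return (xcr0 & need_os) == need_os;
}
```

Keeping the decision separate from the CPUID/XGETBV reads makes the logic testable on any host; the actual register reads need inline assembly or compiler intrinsics and are left out here.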
* util/u_math: Define NAN/INFINITY macros for MSVC. | José Fonseca | 2013-07-20 | 1 | -0/+4
  Untested. But should hopefully fix the build.
* gallivm: add a log function that handles edge cases | Zack Rusin | 2013-07-19 | 2 | -0/+21
  Same as log2_safe, which means that it can handle infs, 0s and nans.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: export unordered/ordered cmp to a common function | Zack Rusin | 2013-07-19 | 1 | -283/+158
  Only the floating point operators change; everything else is the same, so it makes sense to share the code.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle -inf, inf and nan's in sin/cos instructions | Zack Rusin | 2013-07-19 | 2 | -0/+49
  sin/cos for anything not finite is nan and everything else has to be between [-1, 1].
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add a version of log2 which handles edge cases | Zack Rusin | 2013-07-19 | 3 | -6/+65
  That means that if input is:
    - less than zero (to and including -inf) then NaN will be returned
    - equal to zero (-denorm, -0, +0 or +denorm), then -inf will be returned
    - +infinity, then +infinity will be returned
    - NaN, then NaN will be returned
  It's a separate function because the checks are a little bit costly and in most cases are likely unnecessary.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix edge cases in exp2 | Zack Rusin | 2013-07-19 | 1 | -3/+7
  exp(0) has to be exactly 1, exp(-inf) has to be 0, exp(inf) has to be inf and exp(nan) has to be nan, this fixes all of those cases.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle nan's in min/max | Zack Rusin | 2013-07-19 | 6 | -52/+482
  Both D3D10 and OpenCL say that if one of the inputs is nan then the other should be returned. To preserve that behavior the patch fixes both the sse and the non-sse paths in both functions and adds helper code for handling nans.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
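The "return the other operand" rule can be illustrated in scalar C. This is only a sketch of the semantics described above, not the gallivm code, which builds vectorized LLVM IR; the function names are hypothetical.

```c
#include <math.h> /* only for the NAN macro used by callers */

/* NaN is the only floating-point value that compares unequal to itself. */
static int
is_nan_f(float x)
{
   return x != x;
}

/* Minimum where a NaN input yields the other operand. */
static float
min_nan_other(float a, float b)
{
   if (is_nan_f(a))
      return b;
   if (is_nan_f(b))
      return a;
   return a < b ? a : b;
}

/* Maximum with the same NaN rule. */
static float
max_nan_other(float a, float b)
{
   if (is_nan_f(a))
      return b;
   if (is_nan_f(b))
      return a;
   return a > b ? a : b;
}
```

If both inputs are NaN the result is still NaN, since there is no non-NaN operand to return.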
* gallivm: (trivial) simplify lp_build_cos/lp_build_sin a tiny bit | Roland Scheidegger | 2013-07-17 | 1 | -7/+6
  Use "or" instead of "add" (this is a classic select sequence, which at least newer llvm versions can actually recognize (3.2+?), and the "add" might prevent that - and we really don't want an add instead of an or with avx if it isn't recognized (even without avx logic ops might be cheaper)).
  Reviewed-by: Jose Fonseca <[email protected]>
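The "classic select sequence" mentioned here is a bitwise select driven by an all-ones or all-zeros mask. A minimal scalar sketch, assuming a 32-bit lane and a hypothetical helper name:

```c
#include <stdint.h>

/* Bitwise select: picks a where mask bits are 1 and b where they are 0.
 * For a per-lane mask of ~0 or 0 this returns a or b unchanged. */
static uint32_t
select_bits(uint32_t mask, uint32_t a, uint32_t b)
{
   /* The two masked halves never share a set bit, so "or" and "add"
    * produce the same value; "or" keeps the pattern recognizable. */
   return (mask & a) | (~mask & b);
}
```

Because `(mask & a)` and `(~mask & b)` are disjoint, replacing the `|` with `+` would not change the result, which is why the original code worked with an add.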
* util/u_format_s3tc: handle srgb formats correctly. | Roland Scheidegger | 2013-07-17 | 2 | -185/+254
  Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear combinations).
  Refactored some functions a bit so we don't have to duplicate all the code (there's a slight change for packing dxt1_rgb, as there will now always be 4 components initialized and sent to the external compression function so the same code can be used for all; the quite horrid and ad-hoc interface (by now) should always have worked with that).
  Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium/util: use explicitly sized types for {un, }pack_rgba_{s, u}int | Emil Velikov | 2013-07-17 | 2 | -8/+8
  Every function but the above four uses explicitly sized types for their src and dst arguments. Even fetch_rgba_{s,u}int follows the convention.
  Signed-off-by: Emil Velikov <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* llvmpipe: use MCJIT on ARM and AArch64 | Kyle McMartin | 2013-07-17 | 1 | -1/+1
  MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular JIT has bit-rotted badly on ARM and doesn't exist on AArch64.)
  Signed-off-by: Kyle McMartin <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* llvmpipe: support sRGB framebuffers | Roland Scheidegger | 2013-07-16 | 2 | -4/+54
  Just use the new conversion functions to do the work. The way it's plugged into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already.
  Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never worked anyway in the blend code and are thus disabled, and I don't think anyone is interested in L8/L8A8. Would need even more hacks otherwise.
  Unless I'm missing something, this is the last feature except MSAA needed for OpenGL 3.0, and for OpenGL 3.1 as well I believe.
  v2: prettify a bit, use separate function for packing.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) use constant instead of exp2f() function | Roland Scheidegger | 2013-07-14 | 1 | -2/+3
  Some lame compilers can't do exp2f() and as far as I can tell they can't do exp2() (with doubles) either. So instead of providing some workaround for that (it wouldn't actually be too bad, just replace with pow), and since it is used with a constant only, just use the precalculated constant.
* gallivm: handle srgb-to-linear and linear-to-srgb conversions | Roland Scheidegger | 2013-07-13 | 6 | -7/+332
  srgb-to-linear is using 3rd degree polynomial for now which should be _just_ good enough. Reverse is using some rational polynomials and is quite accurate, though not hooked into llvmpipe's blend code yet and hence unused (untested). Using a table might also be an option (for srgb-to-linear especially).
  This does not enable any new features yet because EXT_texture_srgb was already supported via util_format fallbacks, but performance was lacking probably due to the external function call (the table used by the util_format_srgb code may not be all that much slower on its own).
  Some performance figures (taken from modified gloss, replaced both base and sphere texture to use GL_SRGB instead of GL_RGB, measured on 1GHz Sandy Bridge, the numbers aren't terribly accurate):
    normal gloss, aos, 8-wide: 47 fps
    normal gloss, aos, 4-wide: 48 fps
    normal gloss, forced to soa, 8-wide: 48 fps
    normal gloss, forced to soa, 4-wide: 47 fps
    patched gloss, old code, soa, 8-wide: 21 fps
    patched gloss, old code, soa, 4-wide: 24 fps
    patched gloss, new code, soa, 8-wide: 41 fps
    patched gloss, new code, soa, 4-wide: 38 fps
  So there's a performance hit but it seems acceptable, certainly better than using the fallback.
  Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will continue to use the old util_format fallback, because I can't be bothered to write code for formats no one uses anyway (as decoding is done as part of lp_build_unpack_rgba_soa which can only handle block type width of 32). Compressed srgb formats should get their own path though eventually (it is going to be expensive in any case, first decompress, then convert).
  No piglit regressions.
  v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also since keeping both linear to srgb functions for now make sure both are compiled (since they share quite some code just integrate into the same function).
  v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb path.
  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: better support for fast rsqrt | Roland Scheidegger | 2013-07-13 | 2 | -16/+63
  We had to disable fast rsqrt before because it wasn't precise enough etc. However in situations when we know we're not going to need more precision we can still use a fast rsqrt (which can be several times faster than the quite expensive sqrt). Hence introduce a new helper which does exactly that - it is probably not useful calling it in some situations if there's no fast rsqrt available so make it queryable if it's available too.
  v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation, let rsqrt use fast_rsqrt.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
* gallium: fixup definitions of the rsq and sqrt | Zack Rusin | 2013-07-11 | 2 | -13/+8
  GLSL spec says that rsq is undefined for src<=0, but the D3D10 spec says it needs to be a NaN, so let's stop taking an absolute value of the source which completely breaks that behavior. For the gl program we can simply insert an extra abs instruction which produces the desired behavior there.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* util/u_format: Comment out half float denormal test case. | José Fonseca | 2013-07-12 | 1 | -0/+5
  So that lp_test_format doesn't fail until we decide what should be done.
* gallivm: Eliminate redundant lp_build_select calls. | José Fonseca | 2013-07-12 | 1 | -12/+2
  lp_build_cmp already returns 0 / ~0, so the lp_build_select call is unnecessary.
  Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: rename the TGSI fragment kill opcodes | Brian Paul | 2013-07-12 | 14 | -46/+44
  TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The latter was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI.
  This patch renames both opcodes:
    TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
    TGSI_OPCODE_KILP -> KILL (unconditional kill)
  Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere.
  I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
  Reviewed-by: Jose Fonseca <[email protected]>
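The difference between the two renamed opcodes can be stated as two tiny predicates. This is an illustrative sketch of the semantics given in the message, not code from the TGSI interpreter; the function names are hypothetical.

```c
#include <stdbool.h>

/* KILL_IF semantics: discard the fragment if any of the four source
 * components (x, y, z, w) is negative. */
static bool
frag_kill_if(const float src[4])
{
   for (int i = 0; i < 4; i++) {
      if (src[i] < 0.0f)
         return true;
   }
   return false;
}

/* KILL semantics: discard unconditionally, no operand is consulted. */
static bool
frag_kill(void)
{
   return true;
}
```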
* tgsi: fix-up KILP comments | Brian Paul | 2013-07-12 | 2 | -5/+3
  KILP is really unconditional fragment kill. We've had KIL and KILP transposed forever. I'll fix that next.
  Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: exec TGSI_OPCODE_SQRT as a scalar instruction, not vector | Brian Paul | 2013-07-12 | 1 | -1/+1
  To align with the docs and the state tracker.
  Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: use X component of the second operand in exec_scalar_binary() | Brian Paul | 2013-07-12 | 1 | -1/+1
  The code happened to work in the past since the (scalar) src args effectively always have a swizzle of .xxxx, .yyyy, .zzzz, or .wwww so whether you grab the X or Y component doesn't really matter. Just fixing the code to make it look right.
  Reviewed-by: Roland Scheidegger <[email protected]>
* os: add os_get_process_name() function | Brian Paul | 2013-07-12 | 3 | -0/+133
  v2: explicitly test for BSD/APPLE, #warning for unexpected environments.
* hud: silence some MSVC warnings | Brian Paul | 2013-07-12 | 1 | -8/+8
* util: add casts to silence MSVC warnings in u_blit.c | Brian Paul | 2013-07-12 | 1 | -14/+14
* tgsi: s/unsigned/int/ to silence MSVC warning | Brian Paul | 2013-07-12 | 1 | -1/+1
* util/u_math: Use xmmintrin.h whenever possible. | José Fonseca | 2013-07-10 | 1 | -9/+17
  It seems __builtin_ia32_ldmxcsr is only available on gcc and only when -msse is used.
  xmmintrin.h/pmmintrin.h provide portable intrinsics, but these too are only available with gcc when -msse/-msse3 are set.
  scons build always sets -msse on x86 builds, but autotools doesn't seem to.
  We could try to get this working on gcc x86 without -msse by emitting assembly, but I believe that in this day and age we really should be building Mesa with -msse and -msse2.
* util: treat denorm'ed floats like zero | Zack Rusin | 2013-07-09 | 4 | -0/+72
  The D3D10 spec is very explicit about treatment of denorm floats and the behavior is exactly the same for them as it would be for -0 or +0. This makes our shading code match that behavior, since OpenGL doesn't care and on a few CPUs it's faster (worst case the same). Float16 conversions will likely break but we'll fix them in a follow up commit.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
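What "treat denorms like zero" means for a single value can be shown with a software flush. This is only a stand-in to illustrate the semantics, assuming IEEE 754 single precision; it is not necessarily how the commit achieves the behavior (a hardware floating-point control mode is the more likely route), and `flush_denorm` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Replace a denormal float by a zero of the same sign; all other values
 * pass through unchanged. */
static float
flush_denorm(float x)
{
   uint32_t bits;

   memcpy(&bits, &x, sizeof bits); /* well-defined type pun */

   /* An all-zero exponent field means the value is +-0 or a denormal. */
   if ((bits & 0x7f800000u) == 0)
      bits &= 0x80000000u; /* keep only the sign bit */

   memcpy(&x, &bits, sizeof x);
   return x;
}
```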
* gallivm: (trivial) fix using one lod instead of per-quad lod for texel fetch | Roland Scheidegger | 2013-07-05 | 1 | -1/+2
  The logic for choosing number of lods was bogus. (The code should ultimately handle the case of only one lod even with multiple quads but currently can't.)
* gallivm: Remove bogus assert. | José Fonseca | 2013-07-05 | 1 | -4/+1
  It is perfectly valid for the swizzle to be bigger than 2. For example the texel offsets could be
    SAMPLE ..., IMM[0].zzz
  What is not correct is for chan_index to be bigger than 2.
  Trivial.
* gallivm: (trivial) fix bogus assertion for per-element lod with 1d resources | Roland Scheidegger | 2013-07-05 | 2 | -2/+1
  The assertion was always broken but the code unused until enabling the per-element lod code. Fixes piglit texelFetch vs isampler1D and similar tests (only run with GL 3.0 version override).
* gallivm: do per-pixel lod calculations for explicit lod | Roland Scheidegger | 2013-07-04 | 9 | -125/+193
  d3d10 requires per-pixel lod calculations for explicit lod, lod bias and explicit derivatives, and we should probably do it for OpenGL too - at least if they are used from vertex or geometry shaders (so doesn't apply to lod bias) this doesn't just affect neighboring pixels.
  Some code was already there to handle this so fix it up and enable it.
  There will no doubt be a performance hit unfortunately, we could do better if we knew we had a real vector shift instruction (with variable shift count) but this requires AVX2 on x86 (or an AMD Bulldozer family cpu).
  Don't do anything for lod bias and explicit derivatives yet, though no special magic should be needed for them either.
  Likewise, the size query is still broken just the same.
  v2: Use information if lod is a (broadcast) scalar or not. The idea would be to base this on the actual value, for now just pretend it's a scalar in fs and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same code is generated for fs as before).
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: fix overflows in the indexed rendering paths | Zack Rusin | 2013-07-03 | 4 | -43/+159
  The semantics for overflow detection are a bit tricky with indexed rendering. If the base index in the elements array overflows, then the index of the first element should be used; if the index with bias overflows then it should be treated like a normal overflow. Also overflows need to be checked for in all paths that use either the bias or the starting index location.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
* draw/llvm: index overflows if it's greater than elt max | Zack Rusin | 2013-07-03 | 1 | -1/+1
  The comparison, incorrectly, was greater-than-or-equal to elt max.
  Signed-off-by: Zack Rusin <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
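The off-by-one being fixed comes down to which comparison operator is used. A minimal sketch, assuming "elt max" is the largest index that is still valid; the function name is hypothetical and the real check is emitted as LLVM IR.

```c
#include <stdbool.h>
#include <stdint.h>

/* An index overflows only when it is strictly greater than elt_max;
 * an index equal to elt_max is still in range. */
static bool
index_overflows(uint32_t index, uint32_t elt_max)
{
   return index > elt_max; /* the old code effectively used >= here */
}
```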
* postprocess: move second temporary assertion into isolated configuration | Matthew McClure | 2013-07-03 | 1 | -2/+2
  With this patch we will only assert that the second temporary is allocated, when there are more than two active filters.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66423
  Signed-off-by: Brian Paul <[email protected]>
* gallivm: Simplify intrinsic name construction. | José Fonseca | 2013-07-02 | 1 | -23/+10
  Just noticed this could be slightly shortened when fixing MSVC build.
  Trivial.
* gallivm: Fix MSVC build. | José Fonseca | 2013-07-02 | 1 | -8/+7
* gallivm: Fix indirect immediate registers. | José Fonseca | 2013-07-02 | 1 | -2/+2
  If reg->Register.Indirect is true then the immediate is not truly a constant LLVM expression.
  There is no performance regression in using LLVMBuildBitCast, as it will fallback to LLVMConstBitCast internally when the argument is a constant.
  Reviewed-by: Roland Scheidegger <[email protected]>
  Reviewed-by: Zack Rusin <[email protected]>
* draw/translate: fix instancing | Zack Rusin | 2013-06-28 | 13 | -24/+93
  We were incorrectly computing the buffer offset when using the instances. The buffer offset is always equal to:
    start_instance * stride + (instance_num / instance_divisor) * stride
  We were completely ignoring the start instance quite often, producing instances that were completely wrong, e.g. if start instance = 5, instance divisor = 2, then on the first iteration it should be 5 * stride, not (5/2) * stride as we'd have currently; and if start instance = 1, instance divisor = 3, then on the first iteration it should be 1 * stride, not 0 as we'd have.
  This fixes it and adjusts all the code to the changes.
  Signed-off-by: Zack Rusin <[email protected]>
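The offset formula and both worked examples from this message can be checked with a few lines of C. This is a sketch of the arithmetic only; the helper name is hypothetical, the divisor is assumed to be nonzero, and overflow handling is ignored here.

```c
#include <stdint.h>

/* Buffer offset of the attribute data for a given instance:
 *   start_instance * stride + (instance_num / instance_divisor) * stride
 * instance_num counts iterations from 0; instance_divisor must be > 0. */
static uint32_t
instance_offset(uint32_t start_instance, uint32_t instance_num,
                uint32_t instance_divisor, uint32_t stride)
{
   return start_instance * stride +
          (instance_num / instance_divisor) * stride;
}
```

With a stride of 16, the first iteration of the two examples gives `instance_offset(5, 0, 2, 16) == 80` (5 * stride) and `instance_offset(1, 0, 3, 16) == 16` (1 * stride). The start instance is added as is and only the running instance number is divided by the divisor.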
* draw: fix incorrect clipper invocation statistics | Zack Rusin | 2013-06-28 | 1 | -6/+0
  clipper invocations are computed earlier (of course before the emission) so this code was adding bogus numbers to already computed clipper invocations.
  Signed-off-by: Zack Rusin <[email protected]>
* draw/gallivm: export overflow arithmetic to its own file | Zack Rusin | 2013-06-28 | 4 | -44/+234
  We'll be reusing this code so let's put it in a common file and use it in the draw module.
  Signed-off-by: Zack Rusin <[email protected]>
* draw: check for integer overflows in instance computation | Zack Rusin | 2013-06-28 | 2 | -0/+7
  Integers could easily overflow if the starting instance was large enough. Instead of letting bogus counts through, set the instance to max if it overflowed and let our regular buffer overflow computation handle it.
  Signed-off-by: Zack Rusin <[email protected]>
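"Set the instance to max if it overflowed" amounts to a saturating addition. A minimal sketch, assuming 32-bit unsigned instance counters and a hypothetical helper name; the exact expression used in the draw module may differ.

```c
#include <stdint.h>

/* Add two unsigned values, clamping to UINT32_MAX instead of wrapping.
 * The clamped value is guaranteed to fail a later bounds check, so the
 * regular buffer overflow logic can deal with it. */
static uint32_t
add_saturate_u32(uint32_t start_instance, uint32_t instance_num)
{
   uint32_t sum = start_instance + instance_num; /* wraps modulo 2^32 */

   /* After a wrap the sum is smaller than either operand. */
   return sum < start_instance ? UINT32_MAX : sum;
}
```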
* draw: check for an integer overflow when computing stride | Zack Rusin | 2013-06-28 | 1 | -10/+43
  Our buffer overflow arithmetic was susceptible to integer overflows, which caused the buffer overflow logic to break. Let's use the llvm overflow intrinsics to check for integer overflows while computing the stride/needed buffer size.
  Signed-off-by: Zack Rusin <[email protected]>
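The LLVM overflow intrinsics return a result together with an overflow bit. The same idea can be sketched in portable C with a sticky flag that is carried through a chain of operations. The helper names below are hypothetical and this is an analogue of the technique, not the generated IR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checked add: returns the wrapped sum and ORs any overflow into *ofl. */
static uint32_t
checked_add_u32(uint32_t a, uint32_t b, bool *ofl)
{
   uint32_t sum = a + b; /* wraps modulo 2^32 */

   if (sum < a)
      *ofl = true;
   return sum;
}

/* Checked multiply: computes in 64 bits, then tests whether the result
 * still fits in 32 bits. */
static uint32_t
checked_mul_u32(uint32_t a, uint32_t b, bool *ofl)
{
   uint64_t wide = (uint64_t)a * b;

   if (wide > UINT32_MAX)
      *ofl = true;
   return (uint32_t)wide;
}
```

A needed-size computation such as `checked_add_u32(checked_mul_u32(stride, index, &ofl), offset, &ofl)` then needs a single test of `ofl` at the end, because the flag stays set once any step has overflowed.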
* draw: account for elem size when computing overflow | Zack Rusin | 2013-06-28 | 1 | -7/+23
  We weren't taking into account the size of element that is to be fetched, which meant that it was possible to overflow the buffer reads if the stride was very close to the end of the buffer, e.g. stride = 3, buffer size = 4, and the element to be read = 4. This should be properly detected as an overflow.
  Signed-off-by: Zack Rusin <[email protected]>
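The example in this message can be turned into a small bounds check. The sketch below reads the example as a fetch that starts at byte offset 3 in a 4-byte buffer and needs a 4-byte element; that reading, and the helper name, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if reading elem_size bytes starting at offset would run past the
 * end of a buffer of buffer_size bytes. Written so that the check itself
 * cannot wrap around. */
static bool
fetch_overflows(uint32_t offset, uint32_t elem_size, uint32_t buffer_size)
{
   if (elem_size > buffer_size)
      return true; /* element can never fit */
   return offset > buffer_size - elem_size;
}
```

Checking only `offset < buffer_size` would accept offset 3 in a 4-byte buffer, even though the last three bytes of the element lie outside it.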
* st/mesa: handle SNORM formats in generic CopyPixels path | Marek Olšák | 2013-06-30 | 2 | -0/+23
  v2: check desc->is_mixed in util_format_is_snorm
* postprocess: handle partial initialization failures. | Matthew McClure | 2013-06-27 | 7 | -95/+281
  This patch fixes segfaults observed when enabling the post processing features. When the format is not supported, or a texture cannot be created, the code must gracefully handle failure and report the error to the calling code for proper failure handling.
  To accomplish this the following changes were made to the filters.h prototypes:
    - bool return for pp_init_func
    - Added pp_free_func for filter specific resource destruction
  Fixes segfaults from backtraces:
    * util_destroy_blit pp_free
    * u_transfer_inline_write_vtbl pp_jimenezmlaa_init_run pp_init
  This patch also uses tgsi_alloc_tokens to allocate temporary tokens in pp_tgsi_to_state, instead of allocating the array on the stack. This fixes the following stack corruption segfault in pp_run.c:
    * _int_free aaline_delete_fs_state pp_free
  Bug Number: 1021843
  Reviewed-by: Brian Paul <[email protected]>
* hud: add float casts to silence MSVC warnings | Brian Paul | 2013-06-26 | 1 | -49/+49
* hud: include stdio.h since we use fprintf(), fscanf(), etc | Brian Paul | 2013-06-26 | 1 | -0/+2
* hud: add cast to silence MSVC warning | Brian Paul | 2013-06-26 | 1 | -1/+1