summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* draw: always call util_cpu_detect() in draw context creation.Roland Scheidegger2013-07-241-1/+4
| | | | | | | | | | | | | | | | | | | | Since disabling denorms in draw_vbo() we require the util_cpu_caps to be initialized there. Hence add another util_cpu_detect() call in draw_create_context() which should ensure this. (There is another call in draw_get_option_use_llvm() which only gets called with x86 (not x86_64) but calling it always there wouldn't help since it most likely wouldn't get called when compiling without llvm, so leave it alone there.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=66806. (Because util_cpu_caps wasn't initialized when first calling util_fpstate_get() hence it returning zero, but it would later get initialized by rtasm translate code hence when draw call returned it unmasked all exceptions by calling util_fpstate_set(). This was happening only with DRAW_USE_LLVM=0 or not compiling with llvm, otherwise the llvm init code was calling it on time too.) Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]> Tested-by: Vinson Lee <[email protected]>
* gallium/util: Fix detection of AVX cpu capsAndre Heider2013-07-231-2/+25
| | | | | | | | | | | | | | | | | For AVX it's not sufficient to only rely on the cpuid flags. If the CPU supports these extensions, but the OS doesn't, issuing these insns will trigger an undefined opcode exception. In addition to the AVX cpuid bit we also need to: * test cpuid for OSXSAVE support * XGETBV to check if the OS saves/restores AVX regs on context switches See "Detecting Availability and Support" at http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions Signed-off-by: Andre Heider <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* clover: Respect kernel argument alignment restrictions.Francisco Jerez2013-07-222-2/+19
| | | | | Cc: [email protected] Reviewed-by: Tom Stellard <[email protected]>
* clover: Extend kernel arguments for differing host and device data types.Francisco Jerez2013-07-222-4/+56
| | | | | | | Loosely based on a similar patch by Tom Stellard. Cc: [email protected] Reviewed-by: Tom Stellard <[email protected]>
* clover: Byte-swap kernel arguments when host and device endianness differ.Francisco Jerez2013-07-221-37/+65
| | | | | Cc: [email protected] Reviewed-by: Tom Stellard <[email protected]>
* clover: Add kernel argument fields to allow differing host/target data types.Francisco Jerez2013-07-221-2/+23
| | | | | | | Loosely based on a similar patch by Tom Stellard. Cc: [email protected] Reviewed-by: Tom Stellard <[email protected]>
* clover: Pass corresponding module::argument to kernel::argument::bind().Francisco Jerez2013-07-222-84/+61
| | | | | | | | And remove size information from most kernel::argument derived classes, it's no longer going to be necessary. Cc: [email protected] Reviewed-by: Tom Stellard <[email protected]>
* clover: Return correct value for CL_DEVICE_ENDIAN_LITTLETom Stellard2013-07-223-1/+8
| | | | | | | | Query the driver using PIPE_CAP_ENDIANNESS rather than always returning true. Cc: [email protected] Reviewed-by: Francisco Jerez <[email protected]>
* gallium: Add PIPE_CAP_ENDIANNESSTom Stellard2013-07-2214-1/+38
| | | | | | Cc: [email protected] [ Francisco Jerez: Fix "PIPE_ENDIAN_SMALL" in the documentation, define PIPE_ENDIAN_NATIVE. ]
* build: Remove unused EGL_PLATFORMS.Matt Turner2013-07-221-1/+0
|
* llvmpipe: Ensure FTZ/DAZ flags are set on deferred draw flushes.Zack Rusin2013-07-221-0/+8
| | | | Tested-by: José Fonseca <[email protected]>
* llvmpipe: Remove lp_rast_get_num_threads().José Fonseca2013-07-222-11/+0
| | | | | | Never called. Trivial.
* scons: Don't use -z defs ld option on Mac.José Fonseca2013-07-211-1/+2
| | | | Should fix fdo bug 67098.
* util/u_math: Define NAN/INFINITY macros for MSVC.José Fonseca2013-07-201-0/+4
| | | | Untested. But should hopefully fix the build.
* llvmpipe/tests: update arith test to check for edge casesZack Rusin2013-07-191-9/+19
| | | | | | | | | Test infs, zeros and nans with our arith functions to assure correct/defined behavior with those values. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add a log function that handles edge casesZack Rusin2013-07-192-0/+21
| | | | | | | | | Same as log2_safe, which means that it can handle infs, 0s and nans. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: export unordered/ordered cmp to a common functionZack Rusin2013-07-191-283/+158
| | | | | | | | | Only the floating point operarators change everything else is the same so it makes sense to share the code. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle -inf, inf and nan's in sin/cos instructionsZack Rusin2013-07-192-0/+49
| | | | | | | | | sin/cos for anything not finite is nan and everything else has to be between [-1, 1]. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add a version of log2 which handles edge casesZack Rusin2013-07-193-6/+65
| | | | | | | | | | | | | | That means that if input is: * - less than zero (to and including -inf) then NaN will be returned * - equal to zero (-denorm, -0, +0 or +denorm), then -inf will be returned * - +infinity, then +infinity will be returned * - NaN, then NaN will be returned It's a separate function because the checks are a little bit costly and in most cases are likely unnecessary. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix edge cases in exp2Zack Rusin2013-07-191-3/+7
| | | | | | | | | | exp(0) has to be exactly 1, exp(-inf) has to be 0, exp(inf) has to be inf and exp(nan) has to be nan, this fixes all of those cases. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle nan's in min/maxZack Rusin2013-07-196-52/+482
| | | | | | | | | | | Both D3D10 and OpenCL say that if one the inputs is nan then the other should be returned. To preserve that behavior the patch fixes both the sse and the non-sse paths in both functions and adds helper code for handling nans. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* scons: Disallow undefined symbols in Xlib libGL.so.José Fonseca2013-07-191-0/+3
| | | | | | | | It's not the first time that, due to missing build dependencies or incomplete commits, we end up with a broken libGL.so that's missing symbols, causing all tests to fail catastrophically. Instead try to catch this sort of issues earlier.
* llvmpipe: clamp inputs for srgb render buffersRoland Scheidegger2013-07-181-0/+35
| | | | | | | | | | | | | | | Usually with fixed point renderbuffers clamping is done as part of conversion. However, since we blend in float format, we essentially skip all conversion steps pre-blend but since this is still a fixed point renderbuffer we must still clamp the inputs in this case. Makes no difference for piglit though. Obviously we could skip this if fragment color clamping is enabled, but a) this is deprecated in OpenGL (d3d never had it) and b) we don't support it natively so it gets baked into the shader. Also add some comment about logic ops being broken for srgb, luckily no test tries to do that as there's no easy fix... Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* llvmpipe: fix blending with SRC_ALPHA_SATURATE with some formats without alphaRoland Scheidegger2013-07-182-8/+26
| | | | | | | | | | | | | | | | | | We were fixing up the blend factor to ZERO, however this only works correctly with fixed point render buffers where the input values are clamped to 0/1 (because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped inputs). Haven't seen any failure anywhere due to that with fixed point SNORM buffers (which clamp inputs to -1/1) but it should apply there as well (snorm blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all, d3d10 requires them but they are not blendable). Doesn't look like piglit hits this though (some internal testing hits the float case at least). (With legacy OpenGL we could theoretically still use the fixup to zero if the fragment color clamp is enabled, but we can't detect that easily since we don't support native clamping hence it gets baked into the shader.) Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* r600g: use WAIT_3D_IDLE before using CP DMAMarek Olšák2013-07-182-0/+2
| | | | I broke this with 7948ed1250cae78ae1b22dbce4ab23aceacc6159 for r700 at least.
* r300g: make use of gallium's os_get_process_name()Jonathan Gray2013-07-181-1/+6
| | | | | | | Lets the code compile on non Linux systems. Signed-off-by: Jonathan Gray <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* nv50: H.264/MPEG2 decoding support via VP2, available on NV84-NV96, NVA0Ilia Mirkin2013-07-1811-3/+1815
| | | | | | | | | | | | | | Adds H.264 and MPEG2 codec support via VP2, using firmware from the blob. Acceleration is supported at the bitstream level for H.264 and IDCT level for MPEG2. Known issues: - H.264 interlaced doesn't render properly - H.264 shows very occasional artifacts on a small fraction of videos - MPEG2 + VDPAU shows frequent but small artifacts, which aren't there when using XvMC on the same videos Signed-off-by: Ilia Mirkin <[email protected]>
* gallivm: (trivial) simplify lp_build_cos/lp_build_sin a tiny bitRoland Scheidegger2013-07-171-7/+6
| | | | | | | | | Use "or" instead of "add" (this is a classic select sequence, which at least newer llvm versions can actually recognize (3.2+?), and the "add" might prevent that - and we really don't want an add instead of an or with avx if it isn't recognized (even without avx logic ops might be cheaper)). Reviewed-by: Jose Fonseca <[email protected]>
* util/u_format_s3tc: handle srgb formats correctly.Roland Scheidegger2013-07-172-185/+254
| | | | | | | | | | | | | | | | Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear combinations). Refactored some functions a bit so don't have to duplicate all the code (there's a slight change for packing dxt1_rgb, as there will now be always 4 components initialized and sent to the external compression function so the same code can be used for all, the quite horrid and ad-hoc interface (by now) should always have worked with that). Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc. Reviewed-by: Jose Fonseca <[email protected]>
* r600g/sb: improve alu packing on caymanVadim Girlin2013-07-172-15/+89
| | | | | | | | | | | | | | | | Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips. This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet. Some results with bfgminer kernel on cayman: source bytecode: 60 gprs, 3905 alu groups, sbcl before the patch: 45 gprs, 4088 alu groups, sbcl with this patch: 55 gprs, 3474 alu groups. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix handling of new multislot instructions on caymanVadim Girlin2013-07-173-5/+6
| | | | | | | Ex-scalar instructions that became multislot on cayman do replicate result to all channels - handle them similar to DOT4. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix debug dump code in schedulerVadim Girlin2013-07-171-4/+5
| | | | | | Update the stale debug code for other changes related to debug output. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix initial register allocationVadim Girlin2013-07-171-0/+1
| | | | | | | | | | Mark values that are members of the 'same register' constraint as preallocated in ra_init pass, this will prevent incorrect reallocation in scheduler in some cases. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713 Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: move chip & class name functions to sb_contextVadim Girlin2013-07-174-53/+55
| | | | Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: fix handling of PS in source bytecode on caymanVadim Girlin2013-07-171-0/+5
| | | | | | | | | Actually PS doesn't make sense for cayman and isn't even mentioned in cayman docs, but llvm backend currently uses it in bytecode and, assuming that hw seems to be mostly ok with it, this will allow sb to parse such source bytecode correctly. Signed-off-by: Vadim Girlin <[email protected]>
* r600g/sb: Initialize ra_checker member variables.Vinson Lee2013-07-171-1/+1
| | | | | | Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]>
* gallium/util: use explicily sized types for {un, }pack_rgba_{s, u}intEmil Velikov2013-07-172-8/+8
| | | | | | | | Every function but the above four uses explicitly sized types for their src and dst arguments. Even fetch_rgba_{s,u}int follows the convention. Signed-off-by: Emil Velikov <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* llvmpipe: use MCJIT on ARM and AArch64Kyle McMartin2013-07-172-2/+9
| | | | | | | | MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular JIT has bit-rotted badly on ARM and doesn't exist on AArch64.) Signed-off-by: Kyle McMartin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* llvmpipe: support sRGB framebuffersRoland Scheidegger2013-07-164-18/+111
| | | | | | | | | | | | | | | Just use the new conversion functions to do the work. The way it's plugged in into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already. Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never worked anyway in the blend code and are thus disabled, and I don't think anyone is interested in L8/L8A8. Would need even more hacks otherwise. Unless I'm missing something, this is the last feature except MSAA needed for OpenGL 3.0, and for OpenGL 3.1 as well I believe. v2: prettify a bit, use separate function for packing. Reviewed-by: Jose Fonseca <[email protected]>
* Revert "r300g: allow HiZ with a 16-bit zbuffer"Marek Olšák2013-07-151-0/+1
| | | | | | | | This reverts commit 631c631cbf5b7e84e42a7cfffa1c206d63143370. https://bugs.freedesktop.org/show_bug.cgi?id=66921 Cc: [email protected]
* r300g/swtcl: fix a lockup in MSAA resolveMarek Olšák2013-07-151-0/+7
| | | | Cc: [email protected]
* r300g/swtcl: fix geometry corruption by uploading indices to a bufferMarek Olšák2013-07-153-45/+31
| | | | | | | | | | | | | The splitting of a draw call into several draw commands was broken, because the split sometimes took place in the middle of a primitive. The splitting was supposed to be dealing with the case when there are more indices than the maximum size of a CS. This commit throws that code away and uses a real index buffer instead. https://bugs.freedesktop.org/show_bug.cgi?id=66558 Cc: [email protected]
* gallivm: (trivial) use constant instead of exp2f() functionRoland Scheidegger2013-07-141-2/+3
| | | | | | | Some lame compilers can't do exp2f() and as far as I can tell they can't do exp2() (with doubles) neither so instead of providing some workaround for that (wouldn't actually be too bad just replace with pow) and since it is used with a constant only just use the precalculated constant.
* ilo: skip 3DSTATE_INDEX_BUFFER when possibleChia-I Wu2013-07-144-59/+77
| | | | | | When only the offset to the index buffer is changed, we can skip the 3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add (offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
* gallivm: handle srgb-to-linear and linear-to-srgb conversionsRoland Scheidegger2013-07-136-7/+332
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | srgb-to-linear is using 3rd degree polynomial for now which should be _just_ good enough. Reverse is using some rational polynomials and is quite accurate, though not hooked into llvmpipe's blend code yet and hence unused (untested). Using a table might also be an option (for srgb-to-linear especially). This does not enable any new features yet because EXT_texture_srgb was already supported via util_format fallbacks, but performance was lacking probably due to the external function call (the table used by the util_format_srgb code may not be all that much slower on its own). Some performance figures (taken from modified gloss, replaced both base and sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge, the numbers aren't terribly accurate): normal gloss, aos, 8-wide: 47 fps normal gloss, aos, 4-wide: 48 fps normal gloss, forced to soa, 8-wide: 48 fps normal gloss, forced to soa, 4-wide: 47 fps patched gloss, old code, soa, 8-wide: 21 fps patched gloss, old code, soa, 4-wide: 24 fps patched gloss, new code, soa, 8-wide: 41 fps patched gloss, new code, soa, 4-wide: 38 fps So there's a performance hit but it seems acceptable, certainly better than using the fallback. Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will continue to use the old util_format fallback, because I can't be bothered to write code for formats noone uses anyway (as decoding is done as part of lp_build_unpack_rgba_soa which can only handle block type width of 32). Compressed srgb formats should get their own path though eventually (it is going to be expensive in any case, first decompress, then convert). No piglit regressions. v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also since keeping both linear to srgb functions for now make sure both are compiled (since they share quite some code just integrate into the same function). v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb path. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: better support for fast rsqrtRoland Scheidegger2013-07-132-16/+63
| | | | | | | | | | | | | | | We had to disable fast rsqrt before because it wasn't precise enough etc. However in situations when we know we're not going to need more precision we can still use a fast rsqrt (which can be several times faster than the quite expensive sqrt). Hence introduce a new helper which does exactly that - it is probably not useful calling it in some situations if there's no fast rsqrt available so make it queryable if it's available too. v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation, let rsqrt use fast_rsqrt. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* r600g/sb: Initialize ra_constraint::cost.Vinson Lee2013-07-131-1/+1
| | | | | | Fixes "Uninitialized scalar field" reported by Coverity. Signed-off-by: Vinson Lee <[email protected]>
* winsys/radeon: allow a NULL cs pointer in radeon_bo_map to fix a segfaultMarek Olšák2013-07-131-9/+11
| | | | | The original idea was that cs=NULL should be allowed here, but we never used NULL until 862f69fbe1e54e0e9a3c439450a14f. This fixes a segfault in CoreBreach.
* ilo: move a santiy check into its assert()Chia-I Wu2013-07-131-5/+2
| | | | | | The compiler does not know that ilo_3d_pipeline_estimate_size() is pure and can be eliminated in a release build in gen6_pipeline_end(). Move the call into the assert().
* ilo: mark some states dirty when they are really changedChia-I Wu2013-07-131-0/+16
| | | | | The checks may seem redundant because cso_context handles them, but util_blitter does not have access to cso_context.