summaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/util
Commit message (Collapse)AuthorAgeFilesLines
* util: (trivial) fix asm input/output list for fxsaveRoland Scheidegger2013-08-091-1/+1
| | | | | Otherwise gcc might do very unsafe optimizations, spotted by Uros Bizjak. Hopefully this time it's finally right?
* util: (trivial) fix more compile errors in u_cpu_detect (gcc/x86 this time).Dieter Nützel2013-08-091-1/+1
| | | | Oops. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=67921
* util: (trivial) fix compile error with MSVC on x86Roland Scheidegger2013-08-081-1/+1
|
* util: try much harder to set DAZ flagRoland Scheidegger2013-08-083-1/+31
| | | | | | | | | | | | | | | | | | While so far this only causes some harmless test failures, there's lots more cpus with DAZ. All 64bit capable ones can do it (particularly relevant for AMD cpus as they supported sse3 very very late) but if really necessary we can check support for that for real with some more magic. (In fact just about ANY cpu with sse2 can support DAZ, I believe the only exception are first gen P4 (Willamette) and from those only early steppings which can't do it it's almost like intel forgot to add it... - a real pity though docs say you can't just try to set it as they will throw a GPF.) While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672 it does not fix it. Most likely the tests need fixing as I don't think there's any guarantee about denorm handling in the reference math library functions if the flags aren't set to standard values. Nevertheless enabling DAZ on all cpus which can do it should be the right thing to do. Reviewed-by: Jose Fonseca <[email protected]>
* util: implement table-based + linear interpolation linear-to-srgb conversionRoland Scheidegger2013-08-082-11/+102
| | | | | | | | | | | | | | | | | Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the accurate math there's no reason to waste 8 mantissa bits or so... v2: use real table generating function instead of just printing the values (might take a bit longer as it does calculations on some 3+ million floats but much more descriptive obviously). Also fix up another inaccurate pow factor (this time in the python code) - wondering where the couple one bit errors came from :-(. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallium/util: reformat, comment util_get_offset()Brian Paul2013-07-311-3/+7
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* gallium/util: comments, var renaming in u_inlines.hBrian Paul2013-07-311-13/+48
| | | | | | | | | | | | The variable 'usage' was being used for two different things. Sometimes for PIPE_USAGE_x and other times for PIPE_TRANSFER_x. This renames usage to access when we're talking about PIPE_TRANSFER_x flags. Plus, add a bunch of comments to remind us what's going on. Also, use unsigned for PIPE_TRANSFER_x bitmask to be consistent with other places. And add a missing const qualifier. Reviewed-by: Roland Scheidegger <[email protected]>
* st/dri: implement the driconf option force_s3tc_enable properlyMarek Olšák2013-07-301-10/+2
| | | | Reviewed-by: Brian Paul <[email protected]>
* util: don't flush overflowing values to infinity in half-float conversionRoland Scheidegger2013-07-272-9/+17
| | | | | | | | | | | | | | | | | | I am not able to find _any_ rounding behavior specified for OpenGL for float to half-float conversions. However, it is specified for fp11/fp10 which suggests round to next finite value but round-to-zero would also be allowed, but finite values must not be flushed to infinity in either case. Hence I believe it makes sense to do the same for half-floats too. We could probably also use round-to-zero consistently, which is in fact required by d3d10 (but it doesn't seem to matter much). Does not match the mesa core function doing the same though (which is saying it was built to match intel gpus which I don't believe for a second as it would cause failures in d3d10, moreover the PRM (for ivy bridge, not listed in older manuals) while not specifying rounding behavior clearly states finite numbers are never flushed to infinity). Reviewed-by: Jose Fonseca <[email protected]>
* gallium/util: Fix detection of AVX cpu capsAndre Heider2013-07-231-2/+25
| | | | | | | | | | | | | | | | | For AVX it's not sufficient to only rely on the cpuid flags. If the CPU supports these extensions, but the OS doesn't, issuing these insns will trigger an undefined opcode exception. In addition to the AVX cpuid bit we also need to: * test cpuid for OSXSAVE support * XGETBV to check if the OS saves/restores AVX regs on context switches See "Detecting Availability and Support" at http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions Signed-off-by: Andre Heider <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* util/u_math: Define NAN/INFINITY macros for MSVC.José Fonseca2013-07-201-0/+4
| | | | Untested. But should hopefully fix the build.
* util/u_format_s3tc: handle srgb formats correctly.Roland Scheidegger2013-07-172-185/+254
| | | | | | | | | | | | | | | | Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear combinations). Refactored some functions a bit so don't have to duplicate all the code (there's a slight change for packing dxt1_rgb, as there will now be always 4 components initialized and sent to the external compression function so the same code can be used for all, the quite horrid and ad-hoc interface (by now) should always have worked with that). Fixes llvmpipe/softpipe piglit texwrap GL_EXT_texture_sRGB-s3tc. Reviewed-by: Jose Fonseca <[email protected]>
* gallium/util: use explicily sized types for {un, }pack_rgba_{s, u}intEmil Velikov2013-07-172-8/+8
| | | | | | | | Every function but the above four uses explicitly sized types for their src and dst arguments. Even fetch_rgba_{s,u}int follows the convention. Signed-off-by: Emil Velikov <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* util/u_format: Comment out half float denormal test case.José Fonseca2013-07-121-0/+5
| | | | So that lp_test_format doesn't fail until we decide what should be done.
* tgsi: rename the TGSI fragment kill opcodesBrian Paul2013-07-121-5/+5
| | | | | | | | | | | | | | | | | | | | | TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The later was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI. This patch renames both opcodes: TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0) TGSI_OPCODE_KILP -> KILL (unconditional kill) Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere. I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code. Reviewed-by: Jose Fonseca <[email protected]>
* util: add casts to silence MSVC warnings in u_blit.cBrian Paul2013-07-121-14/+14
|
* util/u_math: Use xmmintrin.h whenever possible.José Fonseca2013-07-101-9/+17
| | | | | | | | | | | | | It seems __builtin_ia32_ldmxcsr is only available on gcc and only when -msse is used. xmmintrin.h/pmmintrin.h provide portable intrinsics, but these too are only available with gcc when -msse/-msse3 are set. scons build always sets -msse on x86 builds, but autotools doesn't seem to. We could try to get this working on gcc x86 without -msse by emitting assembly, but I believe that in this day and age we really should be building Mesa with -msse and -msse2.
* util: treat denorm'ed floats like zeroZack Rusin2013-07-092-0/+63
| | | | | | | | | | | | | The D3D10 spec is very explicit about treatment of denorm floats and the behavior is exactly the same for them as it would be for -0 or +0. This makes our shading code match that behavior, since OpenGL doesn't care and on a few cpu's it's faster (worst case the same). Float16 conversions will likely break but we'll fix them in a follow up commit. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw/translate: fix instancingZack Rusin2013-06-281-4/+4
| | | | | | | | | | | | | | | | | | We were incorrectly computing the buffer offset when using the instances. The buffer offset is always equal to: start_instance * stride + (instance_num / instance_divisor) * stride We were completely ignoring the start instance quite often producing instances that completely wrong, e.g. if start instance = 5, instance divisor = 2, then on the first iteration it should be: 5 * stride, not (5/2) * stride as we'd have currently, and if start instance = 1, instance divisor = 3, then on the first iteration it should be: 1 * stride, not 0 as we'd have. This fixes it and adjusts all the code to the changes. Signed-off-by: Zack Rusin <[email protected]>
* st/mesa: handle SNORM formats in generic CopyPixels pathMarek Olšák2013-06-302-0/+23
| | | | v2: check desc->is_mixed in util_format_is_snorm
* util: int/unsigned changes to silence some MSVC warningsBrian Paul2013-06-262-3/+3
|
* util: add some casts to silence some MSVC warningsBrian Paul2013-06-261-2/+2
|
* util: s/int/unsigned/ to silence some MSVC warningsBrian Paul2013-06-261-2/+2
|
* util/debug: Cleanup/improve debug_symbol_name_dbghelp.José Fonseca2013-06-251-78/+161
| | | | | | | | | | | | - use mgwhelp -- the successor for bfdhelp which does not have a hard dependency on BFD, and works on 64bits. - use a macro instead of hand-typing to dispatch DbgHelp functions - dump line numbers - dump module names when symbols are not available - support 64bits. - add comments Reviewed-by: Brian Paul <[email protected]>
* util/debug: Make debug_backtrace_capture work for 64bit windows.José Fonseca2013-06-252-2/+61
| | | | | | Rely on Windows' CaptureStackBackTrace to do the grunt work. Reviewed-by: Brian Paul <[email protected]>
* gallium: Fix llvmpipe on big-endian machinesAdam Jackson2013-06-245-48/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Squashed commit of the following: commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:07 2013 -0400 gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 0d65131649a8aa140e2db228ba779d685c4333e3 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:07 2013 -0400 gallivm: Fix big-endian machines This adds a bit-shift count to the format table, and adds the concept of vector or bitwise alignment on gathers. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 9740bda9b7dc894b629ed38be9b51059ce90818f Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:07 2013 -0400 llvmpipe: Fix convert_to_blend_type on big-endian Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit ae037c2de0f029e4e99371c0de25560484f0d8df Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 util: Convert color pack to packed formats This fixes them on big-endian. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 graw-xlib: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 format: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 417b60bc66eb450e68a92ab0e47f76e292b385e6 Author: Adam Jackson <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 st/dri: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 0934b2e022a5e0847d312c40734e2b44cac52fd8 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 st/xlib: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit a307ea3c3716a706963acce7966b5e405ba11db9 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 gbm: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 tests: Convert to packed formats Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 2f77fe3ee524945eacd546efcac34f7799fb3124 Author: Adam Jackson <[email protected]> Date: Tue Jun 18 13:07:37 2013 -0400 gallium: Document packed formats Signed-off-by: Adam Jackson <[email protected]> commit 1f1017159ce951f922210a430de9229f91f62714 Author: Richard Sandiford <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 gallium: Introduce 32-bit packed format names These are for interacting with buffers natively described in terms of bit shifts, like X11 visuals: uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24); Define these in terms of (endian-dependent) aliases to the array-style format names. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]> commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a Author: Adam Jackson <[email protected]> Date: Mon Jun 3 12:10:32 2013 -0400 gallium: Document format name conventions v2: - Fix a channel name thinko (Michel Dänzer) - Elaborate on SCALED versus INT - Add links to DirectX and FOURCC docs Signed-off-by: Adam Jackson <[email protected]> commit df4d269e7fb62051a3c029b84147465001e5776e Author: Adam Jackson <[email protected]> Date: Tue Jun 18 12:25:06 2013 -0400 gallivm: Remove all notion of byte-swapping Signed-off-by: Adam Jackson <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* util: (trivial) add has_popcnt fieldRoland Scheidegger2013-06-192-0/+2
| | | | | | Not used yet but there's a couple of places in llvmpipe which should use this (occlusion count is currently very inefficent if there's no cpu popcnt instruction).
* gallium: add condition parameter to render_conditionRoland Scheidegger2013-06-184-3/+7
| | | | | | | | | | | | | For conditional rendering this makes it possible to skip rendering if either the predicate is true or false, as supported by d3d10 (in fact previously it was sort of implied skip rendering if predicate is false for occlusion predicate, and true for so_overflow predicate). There's no cap bit for this as presumably all drivers could do it trivially (but this patch does not implement it for the drivers using true hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL functionality). Reviewed-by: Jose Fonseca <[email protected]>
* mesa: Fix bug in unclamped float to ubyte conversion.Manfred Ernst2013-06-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The IEEE float optimized version of UNCLAMPED_FLOAT_TO_UBYTE in macros.h computed incorrect results for inputs in the range 0x3f7f0000 (=0.99609375) to 0x3f7f7f80 (=0.99803924560546875) inclusive. 0x3f7f7f80 is the IEEE float value that results in 254.5 when multiplied by 255. With rounding mode "round to closest even integer", this is the largest float in the range 0.0-1.0 that is converted to 254 by the generic implementation of UNCLAMPED_FLOAT_TO_UBYTE. The IEEE float optimized version incorrectly defined the cut-off for mapping to 255 as 0x3f7f0000 (=255.0/256.0). The same bug was present in the function float_to_ubyte in u_math.h. Fix: The proposed fix replaces the incorrect cut-off value by 0x3f800000, which is the IEEE float representation of 1.0f. 0x3f7f7f81 (or any value in between) would also work, but 1.0f is probably cleaner. The patch does not regress piglit on llvmpipe and on i965 on sandy bridge. Tested-by Stéphane Marchesin <[email protected]> Reviewed-by Stéphane Marchesin <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/u_format: add a new helper for initializing pipe_blit_info::maskMarek Olšák2013-06-132-25/+29
| | | | Reviewed-by: Brian Paul <[email protected]>
* gallium/u_blitter: make clearing independent of the colorbuffer formatMarek Olšák2013-06-132-43/+4
| | | | | | | There isn't any difference between 32_FLOAT and 32_*INT in vertex fetching. Both of them don't do any format conversion. Reviewed-by: Brian Paul <[email protected]>
* gallium/u_blitter: make clearing independent of the number of bound colorbuffersMarek Olšák2013-06-134-56/+41
| | | | | | We can use the fragment shader TGSI property WRITES_ALL_CBUFS. Reviewed-by: Brian Paul <[email protected]>
* gallium/util: make WRITES_ALL_CBUFS optional in the passthrough fragment shaderMarek Olšák2013-06-132-4/+8
| | | | Reviewed-by: Brian Paul <[email protected]>
* util: new util_fill_box helperRoland Scheidegger2013-06-132-10/+37
| | | | | | | | | Use new util_fill_box helper for util_clear_render_target. (Also fix off-by-one map error.) v2: handle non-zero z correctly in new helper Reviewed-by: Jose Fonseca <[email protected]>
* util: Use sizeof(void *) rather than 0 as the fallback cache line sizeRichard Sandiford2013-06-101-0/+5
| | | | | | | | Without this, llvmpipe ends up giving a zero size to all uncompressed textures on non-x86 systems, since align() cannot handle a 0 alignment. Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Richard Sandiford <[email protected]>
* util: add comment about bogus transfer flagsRoland Scheidegger2013-06-071-0/+1
|
* util: fix util_clear_render_target and util_clear_depth_stencil layer handlingRoland Scheidegger2013-06-071-87/+103
| | | | | | | These functions must clear all bound layers, not just the first. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* util: add util_resource_is_array_texture()Chia-I Wu2013-06-081-1/+19
| | | | | | | | Checking if array_size is greater than 1 is not enough for single-layered array textures. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* u_vbuf: fix index buffer leakChia-I Wu2013-06-071-0/+3
| | | | | Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: remove outdated version lines in commentsRico Schüller2013-06-051-1/+0
| | | | Signed-off-by: Brian Paul <[email protected]>
* gallium: Add support for multiple viewportsZack Rusin2013-05-251-4/+4
| | | | | | | | | | | | Gallium supported only a single viewport/scissor combination. This commit changes the interface to allow us to add support for multiple viewports/scissors. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: José Fonseca<[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: fix build on uclibc systemAnthony G. Basile2013-05-291-4/+2
| | | | | | | | | | | | execinfo.h and debug_symbol_name_glibc() are pure GNU-isms and do not build on uclibc systems. A previous patch addressed this issue, but there was an error. This patch corrects that error. See https://bugs.freedesktop.org/show_bug.cgi?id=51782 https://bugs.gentoo.org/show_bug.cgi?id=469768 Signed-off-by: Anthony G. Basile <[email protected]> Signed-off-by: Brian Paul <[email protected]>
* util/prim: add u_reduced_prims_for_vertices()Chia-I Wu2013-05-031-0/+20
| | | | | | | | The function returns the number of reduced/tessellated primitives for the given vertex count. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/prim: assorted fixes for u_decomposed_prims_for_vertices()Chia-I Wu2013-05-031-11/+11
| | | | | | | | | | | Switch to '>=' for comparisons, and it becomes obvious that the comparison for PIPE_PRIM_QUAD_STRIP was wrong. Add minimum vertex count check for PIPE_PRIM_LINE_LOOP. Return 1 for PIPE_PRIM_POLYGON with 3 vertices. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/prim: use vertex count info in u_validate_pipe_prim()Chia-I Wu2013-05-031-32/+2
| | | | | | | As a side effect, primitives with adjacency are now correctly validated. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/prim: fix the name of the include guardChia-I Wu2013-05-031-2/+2
| | | | | | | It should be U_PRIM_H, not U_BLIT_H. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* draw: use u_assembled_prim() instead of u_assembled_primitive()Chia-I Wu2013-05-031-8/+0
| | | | | | | The latter function is also removed as a result of the change. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/prim: clean up and add commentsChia-I Wu2013-05-031-60/+107
| | | | | | | | | | | | | Move together (or add) functions to decompose/reduce/assemble a primitive, give them consistent names, and document them. Add u_prim_vertex_count() so that the vertex count information can be used elsewhere. u_assembled_primitive() will be removed in a folow-on commit. [olv: fix a warning when -Wold-style-declaration is enabled] Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/prim: fix primitive trimming for triangles with adjacencyChia-I Wu2013-05-031-2/+2
| | | | | | | Fix for PIPE_PRIM_TRIANGLES_ADJACENCY and PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Zack Rusin <[email protected]>
* util/u_sse: Fix _mm_shuffle_epi8 prototype for clang.José Fonseca2013-04-251-1/+6
| | | | | Clang does not support __artificial__. Instead match precisely what's in the clang headers.