path: root/src/gallium
Commit message | Author | Age | Files | Lines
* clover: Delete copy constructors and assignment operators in all non-copiable objects. | Francisco Jerez | 2013-10-21 | 11 | -22/+44
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Define a few convenience equality operators. | Francisco Jerez | 2013-10-21 | 10 | -5/+47
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Simplify the platform object by using util/range. | Francisco Jerez | 2013-10-21 | 3 | -28/+8
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Add property list helpers with a syntax consistent with other API objects. | Francisco Jerez | 2013-10-21 | 5 | -50/+91
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch samplers to the new model. | Francisco Jerez | 2013-10-21 | 7 | -53/+53
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch memory objects to the new model. | Francisco Jerez | 2013-10-21 | 9 | -302/+267
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch kernel and program objects to the new model. | Francisco Jerez | 2013-10-21 | 11 | -492/+458
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch command queues to the new model. | Francisco Jerez | 2013-10-21 | 14 | -252/+264
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch event objects to the new model. | Francisco Jerez | 2013-10-21 | 7 | -222/+233
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch context objects to the new model. | Francisco Jerez | 2013-10-21 | 13 | -103/+91
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch device objects to the new model. | Francisco Jerez | 2013-10-21 | 9 | -140/+139
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch platform objects to the new model. | Francisco Jerez | 2013-10-21 | 7 | -46/+47
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Define helper classes for the new object model. | Francisco Jerez | 2013-10-21 | 20 | -107/+398
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Clean up property query functions by using a new property_buffer helper class. | Francisco Jerez | 2013-10-21 | 12 | -263/+547
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Switch to the new utility code. | Francisco Jerez | 2013-10-21 | 17 | -717/+152
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Name include guards consistently. | Francisco Jerez | 2013-10-21 | 17 | -34/+34
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Replace a bunch of double underscores with single underscores. | Francisco Jerez | 2013-10-21 | 27 | -206/+208
| | | | | | | | Identifiers with double underscores are reserved, and using them has undefined behavior according to the C++ spec. It's unlikely to make any difference, but... Tested-by: Tom Stellard <[email protected]>
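For illustration, C++ reserves any identifier that contains a double underscore, as well as any identifier that begins with an underscore followed by an uppercase letter. A minimal sketch of the kind of rename described above follows; the guard and class names are hypothetical and are not taken from clover.

```cpp
// Hypothetical header, for illustration only (not clover code).
#ifndef EXAMPLE_COUNTER_HPP      // instead of a reserved name such as __EXAMPLE_COUNTER_HPP__
#define EXAMPLE_COUNTER_HPP

class counter {
public:
   // Returns the next value in the sequence 1, 2, 3, ...
   int next() { return ++value_; }

private:
   int value_ = 0;               // instead of a reserved name such as __value
};

#endif
```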
* clover: Clean up the event profiling code. | Francisco Jerez | 2013-10-21 | 8 | -121/+228
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Import new utility library. | Francisco Jerez | 2013-10-21 | 12 | -1/+2157
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Use std::numeric_limits<std::size_t>::max() instead of SIZE_MAX | Tom Stellard | 2013-10-21 | 1 | -1/+1
| | | | | | This prevents a build failure on some systems. Reviewed-by: Francisco Jerez <[email protected]>
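The message does not say why SIZE_MAX failed to build. One common cause, stated here as an assumption, is that older C library headers only expose the C99 limit macros to C++ code when __STDC_LIMIT_MACROS is defined before <stdint.h> is first included. The sketch below shows the portable replacement; the constant name max_size is made up for the example.

```cpp
#include <cstddef>
#include <limits>

// Largest value representable in std::size_t, obtained without the
// SIZE_MAX macro. "max_size" is an illustrative name, not a clover one.
constexpr std::size_t max_size = std::numeric_limits<std::size_t>::max();
```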
* llvmpipe: enable seamless cube filtering | Roland Scheidegger | 2013-10-21 | 1 | -1/+1
| | | | | Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: implement seamless cube filtering | Roland Scheidegger | 2013-10-21 | 3 | -40/+368
| | | | For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very "asymmetric" wrt face, x/y under/overflow), further complicated by the fact that if the 4 samples are in a corner (meaning we only have actually 3 samples, and all 3 are on different faces) then falling off the edge is happening _both_ on x and y axis simultaneously. There was a noticeable performance hit in mesa's cubemap demo when seamless filtering was forced on (just below 10 percent or so in a debug build, when disabling all filtering hacks, otherwise it would probably be a bit more) and when always doing the logic, hence use a branch which only does it if any of the pixels in a quad (or in two quads) actually hit this. With that there was no measurable performance hit in the cubemap demo (neither in a debug nor release build), but this will vary (cubemap demo very rarely hits edges). Might also be different on other CPUs, as this forces SoA sampling path which potentially can be quite a bit slower. Note that as for corners, this code gets all the 3 samples which actually exist right, and the 4th texel will simply be the same as one of the others, meaning that filter weights will be a bit wrong. This however should be enough for full OpenGL (but not d3d10) compliance. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* winsys/radeon: cleanup CS offloading | Christian König | 2013-10-21 | 1 | -21/+10
| | | | | | | | | Using atomic function for ncs is superfluous since it is protected by a mutex anyway. Also lock the mutex only once while retrieving the next CS for submission. Signed-off-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
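A minimal sketch of the idea, assuming a simplified queue: the type cs_queue, its members and the use of int as a stand-in for a command stream are all hypothetical and do not mirror the winsys code. Because the counter is only touched while the mutex is held, a plain int is enough, and the mutex is taken once per retrieval.

```cpp
#include <mutex>
#include <queue>

// Hypothetical stand-in for a CS submission queue (not the winsys types).
struct cs_queue {
   std::mutex lock;
   std::queue<int> pending;   // an int stands in for a command stream
   int ncs = 0;               // plain int: only accessed while "lock" is held

   void push(int cs) {
      std::lock_guard<std::mutex> guard(lock);
      pending.push(cs);
      ++ncs;
   }

   // Lock once, fetch the next CS and update the counter in the same
   // critical section. Returns false if nothing is pending.
   bool pop(int &cs) {
      std::lock_guard<std::mutex> guard(lock);
      if (pending.empty())
         return false;
      cs = pending.front();
      pending.pop();
      --ncs;
      return true;
   }
};
```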
* r300g/compiler: Fix unsigned comparison with less than zero | David Heidelberger | 2013-10-21 | 1 | -1/+1
| | | | | | | | | | rc_find_free_temporary_list() returns signed integer (in case of lack of free temporary registers returns -1), so new_index in radeon_rename_regs() should be signed. https://bugs.freedesktop.org/show_bug.cgi?id=54867 Signed-off-by: Marek Olšák <[email protected]>
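The pitfall can be reproduced in a few lines. This is a sketch only: find_free_temporary() is a hypothetical stand-in for rc_find_free_temporary_list(), and the two wrappers exist purely to contrast the unsigned and signed index types.

```cpp
// Hypothetical stand-in for rc_find_free_temporary_list():
// returns the first free slot, or -1 when every slot is in use.
int find_free_temporary(const bool *used, int count) {
   for (int i = 0; i < count; ++i)
      if (!used[i])
         return i;
   return -1;
}

// Buggy variant: the -1 sentinel wraps to a huge unsigned value,
// so the "< 0" test can never be true.
bool rename_fails_unsigned(const bool *used, int count) {
   unsigned new_index = find_free_temporary(used, count);
   return new_index < 0;
}

// Fixed variant: a signed index keeps the -1 sentinel intact.
bool rename_fails_signed(const bool *used, int count) {
   int new_index = find_free_temporary(used, count);
   return new_index < 0;
}
```

Compilers can flag the buggy comparison when the relevant warnings are enabled, for example GCC's -Wtype-limits.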
* r600g/sb: Initialize shader::dce_flags. | Vinson Lee | 2013-10-20 | 1 | -1/+2
| | | | | | | Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Vadim Girlin <[email protected]>
* translate_sse: Fix generated code argument handling for msabi on x86_64 | Jon TURNEY | 2013-10-18 | 1 | -3/+11
| | | | translate_sse.c contains code for msabi on x86_64, but it appears to be untested. Currently arguments 1 and 2 passed to the generated code are moved as 32-bit quantities into the registers used by sysvabi, irrespective of the architecture. Since these may be pointers, they must be moved as 64-bit quantities to avoid truncation. Commit f4dd0991719ef3e2606920c5100b372181c60899 disabled translate_sse.c on MinGW x86_64, I don't know if it was due to this issue, or a different one... Signed-off-by: Jon TURNEY <[email protected]> Reviewed-by: Brian Paul <[email protected]>
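The truncation can be modelled in plain C++ without generating any machine code. The two helpers below are hypothetical and only mimic what a 32-bit versus a 64-bit register move does to a pointer-sized argument on x86_64, where a 32-bit move zero-extends into the full register.

```cpp
#include <cstdint>

// Models a 32-bit register move: only the low 32 bits survive and the
// upper half of the destination is zeroed.
std::uint64_t move_as_32bit(std::uint64_t arg) {
   return static_cast<std::uint32_t>(arg);
}

// Models a 64-bit register move: the value is preserved.
std::uint64_t move_as_64bit(std::uint64_t arg) {
   return arg;
}
```

For an address such as 0x00007fff12345678 the 32-bit path yields 0x12345678, which no longer refers to the original object.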
* rtasm: Cygwin uses the msabi calling convention on x86_64 | Jon TURNEY | 2013-10-18 | 1 | -1/+1
| | | | Cygwin also uses the msabi calling convention on x86_64, not the sysvabi calling convention. Signed-off-by: Jon TURNEY <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* rtasm: The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation which uses mmap() | Jon TURNEY | 2013-10-18 | 1 | -1/+1
| | | | The heap is NX on 64-bit Cygwin, so use the rtasm_exec_malloc() implementation which uses mmap() to allocate an anonymous page with execute permission, rather than the one which just uses malloc(). Signed-off-by: Jon TURNEY <[email protected]> Reviewed-by: Brian Paul <[email protected]>
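As a rough sketch of the mmap() approach on a POSIX system: exec_alloc() and exec_free() are hypothetical helpers and this is not the rtasm implementation. Hardened systems may refuse mappings that are writable and executable at the same time, in which case the helper returns a null pointer.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: request an anonymous private mapping that is
// readable, writable and executable. Returns nullptr on failure.
void *exec_alloc(std::size_t size) {
   void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   return p == MAP_FAILED ? nullptr : p;
}

// Release a mapping obtained from exec_alloc(); a null pointer is ignored.
void exec_free(void *p, std::size_t size) {
   if (p)
      munmap(p, size);
}
```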
* r600g/sb: fix issue with DCE between GVN and GCM (v2) | Vadim Girlin | 2013-10-17 | 4 | -12/+39
| | | | | | | | | | | | | We can't perform DCE using the liveness pass between GVN and GCM because it relies on the correct schedule, but GVN doesn't care about preserving correctness - it's rescheduled later by GCM. This patch makes dce_cleanup pass perform simple DCE between GVN and GCM instead of relying on liveness pass. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088 Signed-off-by: Vadim Girlin <[email protected]>
* Revert "scons: Fix build when rtti is disabled" | José Fonseca | 2013-10-16 | 2 | -7/+4
| | | | This reverts commit 94d05bf87a21bd364e84f699a0064e5fba58a6f9 as it has a few problems: - it breaks windows builds because env[LLVM_CXXFLAGS] is never set there - it is merging not only rtti, but the whole cxxflags (defines etc) which has proven to be a source of troubles (breaks debugging etc.)
* radeonsi: Use 'SI' as the LLVM processor for CIK on LLVM <= 3.3 | Tom Stellard | 2013-10-16 | 1 | -0/+4
| | | | LLVM 3.3 does not know about CIK processors, and the code paths for SI and CIK are the same. Reviewed-by: Marek Olšák <[email protected]> Cc: "9.2" <[email protected]>
* r600g/compute: Improve debugging output | Tom Stellard | 2013-10-16 | 2 | -5/+7
* clover: Link libclc before running any optimizations | Tom Stellard | 2013-10-16 | 1 | -27/+13
| | | | This is required in order for clang to correctly handle the OpenCL C barrier() builtin which has the following restrictions according to the OpenCL 1.1 Specification: If barrier is inside a conditional statement, then all work-items must enter the conditional if any work-item enters the conditional statement and executes the barrier. If barrier is inside a loop, all work-items must execute the barrier for each iteration of the loop before any are allowed to continue execution beyond the barrier. By linking before optimizations, we can replace calls to barrier() with calls to a target specific intrinsic which has the noduplicate attribute. This attribute prevents clang from performing optimizations which could violate the above rules. This attribute must be applied to the call instruction that invokes the function, so it is not enough to add this attribute to the barrier() declaration. As a bonus this will probably speed up compile times since we will no longer need to run link-time optimizations.
* svga: minor fix-ups in svga_get_shader_param() | Brian Paul | 2013-10-16 | 1 | -2/+3
| | | | | Fix debug error message. Add switch case for PIPE_SHADER_COMPUTE. Trivial.
* cso: fix incorrect sampler view count in cso_restore_sampler_views() | Brian Paul | 2013-10-16 | 1 | -3/+6
| | | | | | | | | | | | | | | | | | During the recent bind_sampler_states() interface change in gallium we changed the CSO single_sampler_done() function so that if we were decreasing the number of sampler states bound in the driver, we'd null-out the "extra/old" sampler states to unbind them. See commit 1e2fbf265. However, we didn't make the corresponding fix for sampler views. This caused an assertion to fail in the svga driver which checked that the number of sampler views matched the number of sampler states. This patch fixes cso_restore_sampler_views() so that it nulls-out the extra/old sampler views if the number of new views is less than the number of current/old views. Reviewed-by: Jose Fonseca <[email protected]>
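The null-out step can be sketched as follows. The view_state type, max_views and restore_views() are hypothetical and much simpler than the real CSO context; the point is only that the number of slots handed to the driver must cover the larger of the old and new counts, with the surplus slots set to null.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t max_views = 8;   // illustrative limit, not the gallium one

// Hypothetical, simplified view of the sampler-view state.
struct view_state {
   std::array<const void *, max_views> views{};
   std::size_t count = 0;
};

// Restore "saved" into "cur". Slots beyond saved.count that were bound
// before are nulled out so the driver unbinds them. Returns the number
// of slots that would have to be passed to the driver.
std::size_t restore_views(view_state &cur, const view_state &saved) {
   const std::size_t bind_count = std::max(cur.count, saved.count);
   for (std::size_t i = 0; i < bind_count; ++i)
      cur.views[i] = i < saved.count ? saved.views[i] : nullptr;
   cur.count = saved.count;
   return bind_count;
}
```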
* scons: Fix build when rtti is disabled | Alexander von Gluck IV | 2013-10-15 | 2 | -4/+7
| | | | | | | | | | | | * The rtti fix actually dug up a bug in the scons build scripts. * Autotools took the LLVM cpp and cxx flags, while scons only took the cpp flags. * This grabs the cxx flags and applies them where needed. We may want to make the same change for the llvm cpp flags in scons. * The only linux platform I can find with LLVM no-rtti is Ubuntu. * Fixes bug #70471 Tested-by: Vinson Lee <[email protected]>
* llvmpipe: Advertise PIPE_CAP_DEPTH_CLIP_DISABLE. | José Fonseca | 2013-10-15 | 1 | -1/+1
| | | | | | | | Actually implemented by draw module. Tested piglit ARB_depth_clamp tests, which pass 100%. Trivial.
* draw: make vs_slot signed. | José Fonseca | 2013-10-15 | 1 | -2/+4
| | | | | | Otherwise (vs_slot < 0) will never be true. Trivial.
* swrast: add correct include for out-of-tree builds | Emil Velikov | 2013-10-15 | 1 | -0/+1
| | | | | | | | | | | | | The xmlpool/options.h file was not accessible when building out-of-tree leading to failure. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70378 Reported-by: Fabio Pedretti <[email protected]> Tested-by: Fabio Pedretti <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Andreas Boll <[email protected]>
* build: remove forced -fno-rtti | Alexander von Gluck IV | 2013-10-14 | 1 | -6/+0
| | | | | | | | | | | | | | | | | | | | | | | | * As discussed on the mailing list, forced no-rtti breaks C++ public API's such as the Haiku C++ libGL.so * -fno-rtti *can* be still set however instead of blindly forcing -fno-rtti, we can rely on the llvm-config --cppflags output. If the system llvm is built without rtti (default), the no-rtti flag will be present in llvm-config --cppflags (which we pick up on) If llvm is built with rtti (REQUIRES_RTTI=1), then -fno-rtti is removed from llvm-config --cppflags. * We could selectively add / remove rtti from various components, however mixing rtti and non-rtti code is tricky and could introduce missing symbols. * This needs impact tested. Reviewed-by: Francisco Jerez <[email protected]>
* st/vdpau: add format conversions for GetBitsYCbCr | Grigori Goronzy | 2013-10-13 | 1 | -8/+117
| | | | | | | | Add simple plain C routines for NV12<->YV12 and YUYV<->UYVY conversions. The NV12->YV12 conversion is commonly used, for instance by VLC. Reviewed-by: Christian König <[email protected]>
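To illustrate what an NV12 to YV12 conversion involves: NV12 stores a full-resolution Y plane followed by one interleaved U/V plane, while YV12 stores the Y plane followed by separate V and U planes. The sketch below assumes tightly packed buffers with even dimensions and no row pitch, which the real st/vdpau routines cannot assume; nv12_to_yv12() is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert a tightly packed NV12 image to YV12.
// Assumes width and height are even and rows have no padding.
std::vector<std::uint8_t> nv12_to_yv12(const std::vector<std::uint8_t> &nv12,
                                       std::size_t width, std::size_t height) {
   const std::size_t luma = width * height;
   const std::size_t chroma = luma / 4;   // samples per chroma plane (4:2:0)
   std::vector<std::uint8_t> yv12(luma + 2 * chroma);

   // The Y plane is identical in both layouts.
   std::copy(nv12.begin(), nv12.begin() + luma, yv12.begin());

   // De-interleave UVUV... into a V plane followed by a U plane.
   for (std::size_t i = 0; i < chroma; ++i) {
      yv12[luma + i] = nv12[luma + 2 * i + 1];        // V
      yv12[luma + chroma + i] = nv12[luma + 2 * i];   // U
   }
   return yv12;
}
```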
* radeon: use staging for mapping linear textures | Grigori Goronzy | 2013-10-13 | 1 | -0/+6
| | | | | | | | Textures that likely reside in VRAM, are mapped for reading and don't require direct mapping should be staged into GTT, to avoid bad performance. This fixes readback performance of VDPAU surfaces. Reviewed-by: Marek Olšák <[email protected]>
* radeon/uvd: use PIPE_BIND_LINEAR for video surfaces | Grigori Goronzy | 2013-10-13 | 2 | -7/+7
| | | | | | | This new bind flag forces linear storage, but does not have other side effects like R600_RESOURCE_FLAG_TRANSFER. Reviewed-by: Christian König <[email protected]>
* radeonsi: Allow Sinking pass to move preloaded const/res/sampl | Vincent Lejeune | 2013-10-13 | 2 | -5/+28
| | | | This fixes a crash in Unigine Heaven 3.0, and probably in some other apps.
* radeonsi: pass alpha_ref value to PS in the user sgpr | Vadim Girlin | 2013-10-13 | 3 | -25/+29
| | | | | | | | | | | | Currently it's hardcoded in the shader, so every change requires compilation of the shader variant, killing the performance in Serious Sam 3 and probably other apps. This patch passes alpha_ref in the user sgpr and removes it from the shader key. Signed-off-by: Vadim Girlin <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* r600g: fix tgsi_op2_s with trans-only instructions | Vadim Girlin | 2013-10-13 | 1 | -5/+31
| | | | | | | | | | | | | | | | | | This fixes the issue when dst and src is the same reg and operation on one channel overwrites the source for other channels, e.g.: UMUL TEMP[2].xyz, TEMP[0].xyzz, TEMP[2].xxxx In this example the result of the operation on channel x is written in TEMP[2].x and then used as a second source operand for channels y and z instead of original value in TEMP[2].x. This patch stores the results in temp reg and moves them to dst after performing operation on all channels. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70327 Signed-off-by: Vadim Girlin <[email protected]>
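The hazard in the UMUL example can be reproduced with ordinary arrays. This is a sketch under the assumption that channels are evaluated one after another; vec4 and both functions are hypothetical and unrelated to the r600g code generator.

```cpp
#include <array>

using vec4 = std::array<unsigned, 4>;

// dst.xyz = a.xyz * b.xxxx, writing each channel straight into dst.
// If dst and b are the same register, b.x is clobbered after channel x.
void umul_xyz_in_place(vec4 &dst, const vec4 &a, const vec4 &b) {
   for (int c = 0; c < 3; ++c)
      dst[c] = a[c] * b[0];
}

// Same operation, but results go to a temporary first and are moved to
// dst only after every channel has been computed.
void umul_xyz_via_temp(vec4 &dst, const vec4 &a, const vec4 &b) {
   vec4 tmp = dst;               // keeps the unwritten w channel
   for (int c = 0; c < 3; ++c)
      tmp[c] = a[c] * b[0];
   dst = tmp;
}
```

With TEMP[0] = (2, 3, 4, 0) and TEMP[2] = (5, 0, 0, 0), the in-place version produces y = 3 * 10 instead of 3 * 5, because channel x has already overwritten TEMP[2].x.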
* i915g: Fix assert | Stephane Marchesin | 2013-10-12 | 1 | -1/+1
| | | | | | Now that we support start, assert on start + num < max samplers Reported by xexaxo
* radeon/llvm: show LLVM disassembly when available | Jay Cornwall | 2013-10-12 | 3 | -1/+9
| | | | | | | | With code dump enabled LLVM may generate disassembly during compilation. Show this disassembly when available and prefer it to SI bytecode dump. Reviewed-by: Tom Stellard <[email protected]> Signed-off-by: Jay Cornwall <[email protected]>
* softpipe: fix seamless cube filtering | Roland Scheidegger | 2013-10-12 | 1 | -48/+151
| | | | | | | | | | | | | | | | | | | | | | Fix coord wrapping (and face selection too) in case of edges. Unfortunately, the coord wrapping is way more complicated than what the code did, as it depends on the face and the direction where the texel falls off the face (the logic needed to get this right in fact seems utterly ridiculous). Also fix a bug in (y direction under/overflow) face selection. And get rid of complicated cube corner handling. Just like edge case, the coord wrapping was wrong and it seems very difficult to fix. I'm near certain it can't always work anyway (though ordinary seamless filtering on edge has actually a similar problem but not as severe) because we don't have per-pixel face, hence could have multiple corner texels which would make it very difficult to average the remaining texels correctly. Hence simply pick a texel which would only have fallen off one edge but not both instead, which is not quite accurate but actually I think should be enough to meet OpenGL (but not d3d10) requirements. v2: small fixes suggested by Brian, add some comments. Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: increase fs shader variant instruction cache limit by factor 4 | Roland Scheidegger | 2013-10-12 | 1 | -2/+2
| | | | The previous limit of 128*1024 instructions was reported on IRC to cause frequent recompiles in some apps due to shader variant thrashing, leading to noticeable lags. Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible to reach, since even simple fragment shaders without texturing (glxgears) used more than twice 128 instructions, hence the instruction limit would have always been reached first (excluding things like trivial shaders not writing color). Even with the new limit it is VERY likely the instruction limit is hit first. Should help with such lags due to recompiles (though other shader types have their own limits, LP_MAX_SETUP_VARIANTS and DRAW_MAX_SHADER_VARIANTS, in particular the latter seems a bit small (128)). Reviewed-by: Brian Paul <[email protected]>
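The arithmetic behind that argument can be written out. The constant and function names below are illustrative only; the values 1024 and 128*1024 and the factor of 4 come from the message above, and the 257-instruction shader is an assumed example of "more than twice 128".

```cpp
// Values from the commit message; names are illustrative, not llvmpipe's.
constexpr unsigned max_variants = 1024;
constexpr unsigned old_instr_limit = 128 * 1024;
constexpr unsigned new_instr_limit = 4 * old_instr_limit;

// Number of cached variants of a given size that fit under an
// instruction limit.
constexpr unsigned variants_until_limit(unsigned limit, unsigned instrs_per_variant) {
   return limit / instrs_per_variant;
}

// Average variant size above which the instruction limit trips before
// the variant-count limit does.
constexpr unsigned break_even_size(unsigned limit) {
   return limit / max_variants;
}
```

Under these numbers the old break-even size is 128 instructions per variant and the new one is 512, so a cache of 257-instruction variants hit the old instruction limit after 510 variants, well short of 1024.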