Commit message | Author | Age | Files | Lines
* i965: Fix typo in doxygen hyperlinkChad Versace2013-03-111-1/+1
| | | | | | | | s/brw_state_upload/brw_upload_state/ Found because the link was broken. Signed-off-by: Chad Versace <[email protected]>
* mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).Eric Anholt2013-03-111-4/+8
| | | | | | | | | | | | | After the previous fix that almost removes an allocation of 4*n^2 bytes, we can use a bitset to reduce another allocation from n^2 bytes to n^2/8 bytes. Between the previous commit and this one, the peak heap size for an oglconform ARB_fragment_program max instructions test on i965 goes from 4GB to 255MB. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825 Reviewed-by: Kenneth Graunke <[email protected]>
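The n^2 to n^2/8 saving described above comes from storing one bit per node pair instead of one byte. A minimal sketch of such a packed adjacency matrix follows; the names `adjacency_bits`, `adjacency_set` and `adjacency_test` are hypothetical and are not the register allocator's actual API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not Mesa's actual code. */
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

struct adjacency_bits {
   unsigned n;       /* number of graph nodes */
   unsigned *words;  /* n * n bits, packed into unsigned words */
};

/* Allocate a zeroed n x n bit matrix: roughly n^2 / 8 bytes. */
static inline bool adjacency_init(struct adjacency_bits *a, unsigned n)
{
   size_t bits = (size_t)n * n;
   size_t nwords = (bits + WORD_BITS - 1) / WORD_BITS;

   a->n = n;
   a->words = calloc(nwords ? nwords : 1, sizeof(unsigned));
   return a->words != NULL;
}

/* Mark the pair (i, j) as interfering. */
static inline void adjacency_set(struct adjacency_bits *a, unsigned i, unsigned j)
{
   assert(i < a->n && j < a->n);
   size_t bit = (size_t)i * a->n + j;
   a->words[bit / WORD_BITS] |= 1u << (bit % WORD_BITS);
}

/* Query whether the pair (i, j) has been marked. */
static inline bool adjacency_test(const struct adjacency_bits *a, unsigned i, unsigned j)
{
   assert(i < a->n && j < a->n);
   size_t bit = (size_t)i * a->n + j;
   return (a->words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1u;
}

static inline void adjacency_free(struct adjacency_bits *a)
{
   free(a->words);
   a->words = NULL;
}
```

The cost is a shift and a mask on every lookup in exchange for the eightfold reduction in storage.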
* mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)Eric Anholt2013-03-111-1/+13
| | | | | | | | We were allocating an adjacency_list entry for every possible interference that could get created, but that usually doesn't happen. We can save a lot of memory by resizing the array on demand. Reviewed-by: Kenneth Graunke <[email protected]>
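Resizing on demand, as described above, can be sketched as a growable array that only allocates when an interference is actually recorded. The struct layout, the doubling policy and the initial size of 4 are assumptions for illustration, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node layout for illustration. */
struct node {
   unsigned *adjacency_list;
   unsigned adjacency_count;  /* entries in use */
   unsigned adjacency_size;   /* entries allocated */
};

/* Append `other` to the node's adjacency list, growing the array only
 * when it is full. Returns false if the allocation fails. */
static inline bool node_add_adjacency(struct node *n, unsigned other)
{
   assert(n != NULL);

   if (n->adjacency_count == n->adjacency_size) {
      /* Assumed policy: start at 4 entries, then double. */
      unsigned new_size = n->adjacency_size ? n->adjacency_size * 2 : 4;
      unsigned *grown = realloc(n->adjacency_list, new_size * sizeof(*grown));
      if (!grown)
         return false;
      n->adjacency_list = grown;
      n->adjacency_size = new_size;
   }

   n->adjacency_list[n->adjacency_count++] = other;
   return true;
}
```

A node that never interferes with anything keeps a NULL list and costs no heap memory at all.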
* i965/fs: Improve CSE performance by expiring some available expressions.Eric Anholt2013-03-111-1/+19
| | | | | | | | | | We're already walking the list, and we can easily know when something has no reason to be in the list any longer, so take a brief extra step to reduce our worst-case runtime (an oglconform test that emits the maximum instructions in a fragment program). I don't actually know what the worst-case runtime was, because it was too long and I got bored. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve live variables calculation performance.Eric Anholt2013-03-112-26/+32
| | | | | | | | | | | | We can execute way fewer instructions by doing our boolean manipulation on an "int" of bits at a time, while also reducing our working set size. Reduces compile time of L4D2's slowest shader from 4s to 1.1s (-72.4% +/- 0.2%, n=10) v2: Remove redundant masking (noted by Ken) Reviewed-by: Kenneth Graunke <[email protected]>
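Working on a word of bits at a time means one OR/AND/NOT sequence updates liveness for every variable packed into that word. The sketch below applies the standard dataflow equation livein = use | (liveout & ~def) per word; the function name and array layout are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD ((int)(sizeof(unsigned) * CHAR_BIT))
/* Number of words needed to hold one bit per variable. */
#define WORDS_FOR(num_vars) (((num_vars) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Update a block's live-in set from its live-out, use and def sets,
 * handling BITS_PER_WORD variables per loop iteration.
 * Returns true if any new variable became live-in. */
static inline bool update_livein(unsigned *livein, const unsigned *liveout,
                                 const unsigned *use, const unsigned *def,
                                 int words)
{
   bool progress = false;

   for (int w = 0; w < words; w++) {
      unsigned new_in = use[w] | (liveout[w] & ~def[w]);

      if (new_in & ~livein[w]) {
         livein[w] |= new_in;
         progress = true;
      }
   }

   return progress;
}
```

A fixed-point loop would call this for each block until no call reports progress.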
* i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.Eric Anholt2013-03-111-9/+15
| | | | | | | | | | | | We were handling the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, overwrite the other. This was occurring in XBMC's GLSL shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567 NOTE: This is a candidate for the stable branches. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch to using sampler LD messages for uniform pull constants.Eric Anholt2013-03-114-52/+50
| | | | | | | | | When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), this improves performance of my old GLSL demo by 23.3553% +/- 1.42968% (n=7). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix broken rendering in large shaders with UBO loads.Eric Anholt2013-03-111-0/+2
| | | | | | | | | | The lowering process creates a new vgrf on gen7 that should be represented in live interval analysis. As-is, it was getting a conflicting allocation with gl_FragDepth in the dolphin emulator, producing broken rendering. NOTE: This is a candidate for the 9.1 branch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a comment about an implementation detail.Eric Anholt2013-03-111-0/+4
| | | | | | | | I was going to fix the code above like the previous commit, but we already had that covered (otherwise all our uniform access would have been broken, unlike just pull constants). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix register allocation for uniform pull constants in 16-wide.Eric Anholt2013-03-111-23/+31
| | | | | | | | | | | | | | | We were allowing a compressed instruction to write a register that contained the last use of a uniform pull constant (either UBO load or push constant spillover), so it would get half its values smashed. Since we need to see the actual instruction to decide this, move the pre-gen6 pixel_x/y logic here, which should improve the performance of register allocation since virtual_grf_interferes() is called more than once per instruction. NOTE: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317 Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove some unused debug flags.Eric Anholt2013-03-112-8/+0
| | | | | | | I was looking at the list to see what might be interesting to document for application developers, and it turns out some are completely dead. Reviewed-by: Kenneth Graunke <[email protected]>
* draw/gs: Correctly iterate the emitted primitivesZack Rusin2013-03-071-4/+4
| | | | | | | | | | We were assuming that each emitted primitive had the same number of vertices. That is incorrect. Emitted primitives can have an arbitrary number of vertices. Simply increment index on iteration to fix it. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
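When primitives have differing vertex counts, the start of primitive k cannot be computed as k times a fixed size; it has to be accumulated while iterating. A small sketch of that running index follows. The function and parameter names are hypothetical, and this is one reading of the fix rather than the draw module's code.

```c
#include <assert.h>

/* For each primitive, record the index of its first vertex by keeping a
 * running total of the lengths of the primitives before it. */
static inline void primitive_starts(const unsigned *primitive_lengths,
                                    unsigned num_primitives,
                                    unsigned *starts)
{
   unsigned index = 0;

   assert(primitive_lengths != 0 && starts != 0);

   for (unsigned prim = 0; prim < num_primitives; prim++) {
      starts[prim] = index;
      index += primitive_lengths[prim];  /* lengths may differ per primitive */
   }
}
```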
* tgsi/exec: Correctly reset NumOutputs before parsing the shaderZack Rusin2013-03-071-3/+7
| | | | | | | | | | | | Whenever we're binding the shaders we're incrementing NumOutputs, assuming the parser spots an output declaration, but we were never resetting the variable. That means that each subsequent bind of a geometry shader would add its number of outputs to the number of outputs bound by all previously run shaders and our indexes would get completely messed up. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
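The bug class described above is a counter that accumulates across binds. A minimal sketch of the corrected pattern, with hypothetical types standing in for the TGSI machine and its declarations:

```c
#include <assert.h>

/* Hypothetical stand-ins for the interpreter state and shader declarations. */
enum decl_kind { DECL_INPUT, DECL_OUTPUT };

struct machine {
   unsigned num_outputs;
};

/* Count output declarations for the shader being bound. Resetting the
 * counter first keeps earlier binds from leaking into the total. */
static inline void bind_shader(struct machine *m, const enum decl_kind *decls,
                               unsigned num_decls)
{
   assert(m != 0);

   m->num_outputs = 0;  /* the reset that was missing */

   for (unsigned i = 0; i < num_decls; i++) {
      if (decls[i] == DECL_OUTPUT)
         m->num_outputs++;
   }
}
```

Without the reset line, binding the same two-output shader twice would report four outputs on the second bind.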
* draw/llvm: another quick hack for drawing with no position outputRoland Scheidegger2013-03-111-1/+1
| | | | | | Also need to skip things if we have no cv value but pos value (happens with geometry shaders enabled). Needs a round of cleanup, though.
* softpipe: don't use samplers with prebaked sampler and sampler_view stateRoland Scheidegger2013-03-116-866/+779
| | | | | | | | | | | | | | This is needed for handling the dx10-style sample opcodes. This also simplifies the logic by getting rid of sampler variants completely (sampler_views though OTOH have sort of variants because some of their state is different depending on the shader stage they are bound to). No significant performance difference (openarena run: 840 frames in 459.8 seconds vs. 840 frames in 460.5 seconds). v2: fix reference counting bug spotted by Jose. Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: emit code for SVIEWINFO and SAMPLE_IRoland Scheidegger2013-03-111-3/+10
| | | | | | | | Can handle them since the single sampler interface was introduced. v2: simplify txf/sample_i handling a bit according to Brian's feedback. Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: fix wrong reg used for unit for TGSI_OPCODE_TXFRoland Scheidegger2013-03-111-2/+2
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* r600g/llvm: Fix buildTom Stellard2013-03-111-1/+1
* r600g: add debug options disabling various copy-buffer-related featuresMarek Olšák2013-03-113-2/+11
| | | | This will be invaluable for debugging and bug reports.
* mesa: don't allocate a texture if width or height is 0 in CopyTexImageMarek Olšák2013-03-111-10/+12
| | | | | | NOTE: This is a candidate for the stable branches. Reviewed-by: Brian Paul <[email protected]>
* gallium/util: attempt to fix blitting multisample texture arraysMarek Olšák2013-03-112-2/+2
| | | | We don't have a test for this yet, but obviously the swizzle was wrong.
* r600g: allocate FMASK right after the texture, so that it's aligned with itMarek Olšák2013-03-111-1/+1
| | | | | | This avoids the kernel CS checker errors with MSAA textures. Reviewed-by: Jerome Glisse <[email protected]>
* r600g: remove r600.h, move the stuff elsewhere (mostly to r600_pipe.h)Marek Olšák2013-03-118-167/+126
| | | | Reviewed-by: Jerome Glisse <[email protected]>
* r600g: remove r600_hw_context_priv.h, move the stuff to r600_pipe.hMarek Olšák2013-03-116-46/+13
| | | | Reviewed-by: Jerome Glisse <[email protected]>
* r600g: remove deprecated state management codeMarek Olšák2013-03-1110-560/+2
| | | | | | It's nice to see so much code that did pretty much nothing go away. Reviewed-by: Jerome Glisse <[email protected]>
* r600g: atomize pixel shaderMarek Olšák2013-03-117-207/+83
| | | | Reviewed-by: Jerome Glisse <[email protected]>
* r600g: atomize vertex shaderMarek Olšák2013-03-118-232/+203
| | | | Reviewed-by: Jerome Glisse <[email protected]>
* r600g: inline r600_pipe_shader functionMarek Olšák2013-03-115-58/+51
| | | | | | also change names of other functions, so that they make sense Reviewed-by: Jerome Glisse <[email protected]>
* r600g: dump vertex elements state along with the fetch shaderMarek Olšák2013-03-111-0/+8
* gallium/util: dump instance_divisorMarek Olšák2013-03-111-2/+1
* r600g: remove bytecode dumpingMarek Olšák2013-03-112-240/+0
| | | | Reviewed-by: Tom Stellard <[email protected]>
* r600g: use a single env var R600_DEBUG, disable bytecode dumpingMarek Olšák2013-03-1110-95/+122
| | | | | | | | | | | | | | | | | | | | | | | | | Only the disassembler is used to dump shaders. Here are a few examples of how to use R600_DEBUG. Log compute info: R600_DEBUG=compute. Dump all shaders: R600_DEBUG=fs,vs,gs,ps,cs. Dump pixel shaders only: R600_DEBUG=ps. Disable Hyper-Z: R600_DEBUG=nohyperz. Disable the LLVM backend: R600_DEBUG=nollvm. Or use any combination of the above, or print all options: R600_DEBUG=help. Reviewed-by: Tom Stellard <[email protected]>
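A comma-separated debug variable like this is typically turned into a bitmask by matching each token against a table of names. The sketch below shows one way to do that. The option names come from the commit message, but the `DBG_*` constants, their bit values and `parse_debug_flags` are hypothetical and are not the driver's actual parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; only the option names are from the commit message. */
enum {
   DBG_COMPUTE  = 1u << 0,
   DBG_FS       = 1u << 1,
   DBG_VS       = 1u << 2,
   DBG_GS       = 1u << 3,
   DBG_PS       = 1u << 4,
   DBG_CS       = 1u << 5,
   DBG_NOHYPERZ = 1u << 6,
   DBG_NOLLVM   = 1u << 7
};

struct debug_option {
   const char *name;
   unsigned flag;
};

static const struct debug_option debug_options[] = {
   { "compute",  DBG_COMPUTE },
   { "fs",       DBG_FS },
   { "vs",       DBG_VS },
   { "gs",       DBG_GS },
   { "ps",       DBG_PS },
   { "cs",       DBG_CS },
   { "nohyperz", DBG_NOHYPERZ },
   { "nollvm",   DBG_NOLLVM },
};

/* Parse a comma-separated option string into a bitmask.
 * Unknown and empty tokens are ignored; NULL yields 0. */
static inline unsigned parse_debug_flags(const char *value)
{
   unsigned flags = 0;

   if (value == NULL)
      return 0;

   while (*value != '\0') {
      size_t len = strcspn(value, ",");  /* length of the current token */

      for (size_t i = 0; i < sizeof(debug_options) / sizeof(debug_options[0]); i++) {
         if (strlen(debug_options[i].name) == len &&
             strncmp(value, debug_options[i].name, len) == 0)
            flags |= debug_options[i].flag;
      }

      value += len;
      if (*value == ',')
         value++;  /* skip the separator */
   }

   return flags;
}
```

A caller would pass the result of `getenv("R600_DEBUG")` (from `<stdlib.h>`), which is NULL when the variable is unset.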
* r600g: cleanup #include recursion between r600_pipe.h and evergreen_compute.hMarek Olšák2013-03-117-2/+6
| | | | Reviewed-by: Tom Stellard <[email protected]>
* r600g: don't check for R600_ENABLE_S3TC env varMarek Olšák2013-03-111-10/+3
* glapi/gen: Remove duplicate PYTHON_FLAGSStefan Brüns2013-03-091-7/+7
| | | | | | | PYTHON_GEN calls python with PYTHON_FLAGS Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Stefan Brüns <[email protected]>
* i965: Link i965_dri.so with C++ linker.Frank Henigman2013-03-081-0/+1
| | | | | | Force C++ linking of i965_dri.so by adding a dummy C++ source file. Reviewed-by: Matt Turner <[email protected]>
* gallium/util: Correct shift value for TSC feature detection.Maxence Le Doré2013-03-081-1/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* configure.ac: Build dricommon for DRI gallium driversMatt Turner2013-03-081-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | Commit 67ef7559 added an || test "x$enable_dri" check in an attempt to get the DRI common bits built in some necessary cases. That change was inappropriate as it made these common DRI pieces be built unconditionally, so some builds were broken. Subsequently, commit 998d975e3 changed the "|| test" to a "-a" conjunction within the existing test invocation. This made the '-a "x$enable_dri" = xyes' clause have no effect (as it was inside an enclosing test for the same condition). So the new breakage from commit 67ef7559 was addressed, but the original problems were regressed. The immediately preceding commit removed the redundant condition. Now, finally, this commit fixes the original problem as described in the commit message of 67ef7559: this code should be compiled when using the DRI state tracker. In order to do so, the HAVE_*_DRI conditionals must be moved after the last assignment of HAVE_COMMON_DRI. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61821 Tested-by: Stéphane Marchesin <[email protected]>
* configure.ac: Remove redundant checks of enable_dri.Matt Turner2013-03-081-18/+16
| | | | The whole block is enclosed inside if test "x$enable_dri" = xyes.
* mesa: Allow ETC2/EAC formats with ARB_ES3_compatibility.Matt Turner2013-03-082-2/+2
| | | | | | Fixes piglit's oes_compressed_etc2_texture-miptree tests on Desktop GL. Reported-by: Marek Olšák <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i915g: Use PIPE_FLUSH_END_OF_FRAME to trigger throttlingStéphane Marchesin2013-03-0810-27/+43
| | | | | This helps with jittering: instead of throttling at every command buffer, we only throttle once a frame.
* i915g: Update TODOStéphane Marchesin2013-03-081-12/+1
* docs: document another Viewperf bugBrian Paul2013-03-081-0/+29
* dri/nouveau: fix crash in nouveau_flushJan de Groot2013-03-071-1/+2
| | | | | | https://bugs.freedesktop.org/show_bug.cgi?id=61947 Note: this is a candidate for the stable branches
* draw: add const qualifier to silence compiler warningBrian Paul2013-03-071-1/+1
* llvmpipe: remove the power of two sizeof(struct cmd_block) assertionBrian Paul2013-03-071-7/+0
| | | | | | | It fails on 32-bit systems (I only tested on 64-bit). Power of two size isn't required, so just remove the assertion. Reviewed-by: José Fonseca <[email protected]>
* vbo: fix crash found with shared display listsBrian Paul2013-03-071-1/+1
| | | | | | | | | This fixes a crash when a display list is created in one context but executed from a second one. The vbo_save_context::vertex_store member will be NULL if we never created a display list with the context. Just check for that before dereferencing the pointer. Fixes http://bugzilla.redhat.com/show_bug.cgi?id=918661 Note: This is a candidate for the stable branches.
* mesa: fix glGetInteger*(GL_SAMPLER_BINDING).Alan Hourihane2013-03-073-2/+14
| | | | | | | | | | | If the sampler object has been deleted on another context, an alternative context may reference the old sampler. So ensure the sampler object still exists. Note: this is a candidate for the stable branch. Signed-off-by: Alan Hourihane <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* radeon/llvm: document LLVM commitChristian König2013-03-071-0/+1
| | | | | | We need at least that revision to work correctly now. Signed-off-by: Christian König <[email protected]>
* radeon/llvm: enable LICM and DCE pass v2Christian König2013-03-071-0/+2
| | | | | | | | | | | | | LICM stands for Loop Invariant Code Motion. Instructions that do not depend on the loop index are moved outside of the loop body. DCE is Dead Code Elimination. v2: updated commit msg, thx to Vincent. Signed-off-by: Christian König <[email protected]> Reviewed-by: Vincent Lejeune <vljn at ovi.com> Reviewed-by: Tom Stellard <[email protected]>
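As a source-level illustration of what loop invariant code motion does, the two functions below compute the same sum; the second has the invariant product hoisted out of the loop by hand. The real pass operates on LLVM IR rather than C, so this is only an analogy, and the function names are made up.

```c
#include <assert.h>

/* Before: scale * bias does not depend on i, yet it is written inside
 * the loop body and would be recomputed on every iteration. */
static inline int sum_before_licm(const int *v, int n, int scale, int bias)
{
   int sum = 0;

   for (int i = 0; i < n; i++)
      sum += v[i] + scale * bias;

   return sum;
}

/* After: the invariant expression is computed once, outside the loop. */
static inline int sum_after_licm(const int *v, int n, int scale, int bias)
{
   int sum = 0;
   int invariant = scale * bias;

   for (int i = 0; i < n; i++)
      sum += v[i] + invariant;

   return sum;
}
```

Dead code elimination is the complementary cleanup: an instruction whose result is never used is removed.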