summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nvc0/ir: don't replace load from input in COMPUTE progs with VFETCHChristoph Bumiller2013-03-121-2/+7
|
* nvc0/ir: implement lowering of surface ops for nve4Christoph Bumiller2013-03-128-16/+429
|
* nvc0/ir: add formatted surface load lib code, move to extra headerChristoph Bumiller2013-03-126-149/+1309
| | | | | | OpenGL is nice and makes the user specify a format with an image unit. OpenCL is evil and doesn't, and what's better than adding a huge load of functions that we call indirectly to handle the conversion ?
* nv50/ir: extend moveSources for delta < 0Christoph Bumiller2013-03-122-16/+31
|
* nvc0/ir: lower atomics in s[]Christoph Bumiller2013-03-121-0/+33
|
* nvc0/ir/emit: implement INSBF, EXTBF, PERMT and ATOMChristoph Bumiller2013-03-122-1/+133
|
* nv50/ir/emit: handle OP_ATOMChristoph Bumiller2013-03-121-0/+41
|
* nvc0/ir/target: some ops can't be predicated, e.g. CALLChristoph Bumiller2013-03-121-0/+8
|
* nv50/ir/opt: CALLs cannot loadChristoph Bumiller2013-03-121-0/+3
|
* nv50/ir: add support for indirect BRA,CALLChristoph Bumiller2013-03-125-6/+29
|
* nvc0/ir/emit: implement move to and logic ops on predicatesChristoph Bumiller2013-03-121-0/+45
|
* nvc0/ir/emit: implement surface related opsChristoph Bumiller2013-03-122-0/+301
|
* nv50/ir: initialize CodeEmitters' specialized target fieldsChristoph Bumiller2013-03-123-9/+10
|
* nv50/ir/opt: make optimization aware of atomics, barriers, surface opsChristoph Bumiller2013-03-122-1/+28
|
* nv50/ir: add various new OPs that will be needed for computeChristoph Bumiller2013-03-129-48/+179
|
* nv50/ir: Rename "mkLoad" to "mkLoadv" for consistency.Francisco Jerez2013-03-124-12/+21
|
* nv50/ir: fix comparison of system valuesChristoph Bumiller2013-03-121-0/+3
|
* nv50/ir/tgsi: Translate grid-related system parameters.Francisco Jerez2013-03-121-0/+4
|
* nv50/ir/tgsi: Accept COMPUTE programs.Francisco Jerez2013-03-121-0/+1
|
* nv50/ir/ra: make sure all used function inputs get assigned a regChristoph Bumiller2013-03-121-0/+7
| | | | | | A live range [0, 0) counts as empty. For function inputs this can be a problem, so insert a nop at the beginning to make it [0, 1). This is a bit of a hack but also the most simple solution.
* nv50/ir/ra: also add pre-existing MERGE,SPLIT to constraint listChristoph Bumiller2013-03-121-1/+3
|
* nv50/ir/ra: fix confusion with conditional RegisterSet::occupyChristoph Bumiller2013-03-122-12/+32
|
* nv50/ir/ra: swap copyCompound args if src is compound and dst isn'tChristoph Bumiller2013-03-121-0/+9
|
* nv50/ir/ra: Fix maxGPR calculation for programs with multiple functions.Francisco Jerez2013-03-121-1/+1
|
* nv50/ir/ra: Fix traversal before the beginning of the active list in buildRIG.Francisco Jerez2013-03-121-6/+5
|
* nv50/ir/ra: Fix RegisterSet::occupy(const Value *v).Francisco Jerez2013-03-121-1/+1
|
* nv50/ir/ra: Fix argument const-ness in RegisterSet::idToUnits and idToBytesFrancisco Jerez2013-03-121-2/+2
|
* nv50/ir/opt: Fix tryPropagateBranch for BBs with several exit branches.Francisco Jerez2013-03-121-28/+32
| | | | | Comments and "if (bf->cfg.incidentCount() == 1)" condition added by Christoph Bumiller.
* nv50/ir: Clean up references to function values before destroying them.Francisco Jerez2013-03-121-0/+4
|
* nouveau: Bail out from nouveau_fence_wait if flushing the pushbuf fails.Francisco Jerez2013-03-121-2/+4
|
* mesa: Use correct functions for enum conversion.Vinson Lee2013-03-111-2/+2
| | | | | | | | Fixes mixing enum types defects reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno: gallium driver for adrenoRob Clark2013-03-1149-1/+9973
| | | | | | | | | | | | | | | | | | | Currently works on a220. Others in the a2xx family look pretty similar and should be pretty straightforward to support with the same driver. The a3xx has a new shader ISA, and while many registers appear similar, the register addresses have been completely shuffled around. I am not sure yet whether it is best to support with the same driver, but different compiler, or whether it should be split into a different driver. v1: original v2: build file updates from review comments, and remove GPL licensed header files from msm kernel v3: smarter temp/pred register assignment, fix clear and depth/stencil format issues, resource_transfer fixes, scissor fixes Signed-off-by: Rob Clark <[email protected]>
* d3d1x: Remove.José Fonseca2013-03-12137-27307/+1
| | | | | | Unused/unmaintained. Reviewed-by: Christoph Bumiller <[email protected]>
* nv50: Remove nv0_ir_from_sm4.*José Fonseca2013-03-122-2512/+0
| | | | | | Unused, depends on d3d1x. Reviewed-by: Christoph Bumiller <[email protected]>
* gallivm: clean up passing derivatives aroundRoland Scheidegger2013-03-126-249/+196
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the derivatives were calculated and passed in a packed form to the sample code (for implicit derivatives, explicit derivatives were packed to the same format). There's several reasons why this wasn't such a good idea: 1) the derivatives may not even be needed (not as bad as it sounds since llvm will just throw the calculations needed for them away but still) 2) the special packing format really shouldn't be part of the sampler interface 3) depending what the sample code actually does the derivatives will be processed differently, hence there is no "ideal" packing. For cube maps with explicit derivatives (which we don't do yet) for instance the packing looked downright useless, and for non-isotropic filtering we'd need different calculations too. So, instead just pass the derivatives as is (for explicit derivatives), or let the rho calculating sample code calculate them itself. This still does exactly the same packing stuff for implicit derivatives for now, though explicit ones are handled in a more straightforward manner (quick estimates show performance should be quite similar, though it is much easier to follow and also does the rho calculation per-pixel until the end, which we eventually need for spec compliance anyway). No piglit changes. Reviewed-by: Jose Fonseca <[email protected]>
* i965: Fix typo in doxygen hyperlinkChad Versace2013-03-111-1/+1
| | | | | | | | s/brw_state_upload/brw_upload_state/ Found because the link was broken. Signed-off-by: Chad Versace <[email protected]>
* mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).Eric Anholt2013-03-111-4/+8
| | | | | | | | | | | | | After the previous fix that almost removes an allocation of 4*n^2 bytes, we can use a bitset to reduce another allocation from n^2 bytes to n^2/8 bytes. Between the previous commit and this one, the peak heap size for an oglconform ARB_fragment_program max instructions test on i965 goes from 4GB to 255MB. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825 Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)Eric Anholt2013-03-111-1/+13
| | | | | | | | We were allocating an adjacency_list entry for every possible interference that could get created, but that usually doesn't happen. We can save a lot of memory by resizing the array on demand. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve CSE performance by expiring some available expressions.Eric Anholt2013-03-111-1/+19
| | | | | | | | | | We're already walking the list, and we can easily know when something has no reason to be in the list any longer, so take a brief extra step to reduce our worst-case runtime (an oglconform test that emits the maximum instructions in a fragment program). I don't actually know what the worst-case runtime was, because it was too long and I got bored. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve live variables calculation performance.Eric Anholt2013-03-112-26/+32
| | | | | | | | | | | | We can execute way fewer instructions by doing our boolean manipulation on an "int" of bits at a time, while also reducing our working set size. Reduces compile time of L4D2's slowest shader from 4s to 1.1s (-72.4% +/- 0.2%, n=10) v2: Remove redundant masking (noted by Ken) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.Eric Anholt2013-03-111-9/+15
| | | | | | | | | | | | We were handling the the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, ovewrite the other. This was occurring in XBMC's GLSL shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567 NOTE: This is a candidate for the stable branches. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch to using sampler LD messages for uniform pull constants.Eric Anholt2013-03-114-52/+50
| | | | | | | | | When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), improves performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix broken rendering in large shaders with UBO loads.Eric Anholt2013-03-111-0/+2
| | | | | | | | | | The lowering process creates a new vgrf on gen7 that should be represented in live interval analysis. As-is, it was getting a conflicting allocation with gl_FragDepth in the dolphin emulator, producing broken rendering. NOTE: This is a candidate for the 9.1 branch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a comment about about an implementation detail.Eric Anholt2013-03-111-0/+4
| | | | | | | | I was going to fix the code above like the previous commit, but we already had that covered (otherwise all our uniform access would have been broken, unlike just pull constants). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix register allocation for uniform pull constants in 16-wide.Eric Anholt2013-03-111-23/+31
| | | | | | | | | | | | | | | We were allowing a compressed instruction to write a register that contained the last use of a uniform pull constant (either UBO load or push constant spillover), so it would get half its values smashed. Since we need to see the actual instruction to decide this, move the pre-gen6 pixel_x/y logic here, which should improve the performance of register allocation since virtual_grf_interferes() is called more than once per instruction. NOTE: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317 Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove some unused debug flags.Eric Anholt2013-03-112-8/+0
| | | | | | | I was looking at the list to see what might be interesting to document for application developers, and it turns out some are completely dead. Reviewed-by: Kenneth Graunke <[email protected]>
* draw/gs: Correctly iterate the emitted primitivesZack Rusin2013-03-071-4/+4
| | | | | | | | | | We were assuming that each emitted primitive had the same number of vertices. That is incorrect. Emitted primitives can have arbirtrary number of vertices. Simply increment index on iteration to fix it. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* tgsi/exec: Correctly reset NumOutputs before parsing the shaderZack Rusin2013-03-071-3/+7
| | | | | | | | | | | | Whenever we're binding the shaders we're incrementing NumOutputs, assuming the parser spots an output decleration, but we were never reseting the variable. That means that each subsequent bind of a geometry shader would add its number of output to the number of output bound by all previously ran shaders and our indexes would get completely messed up. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw/llvm: another quick hack for drawing with no position outputRoland Scheidegger2013-03-111-1/+1
| | | | | | Also need to skip things if we have no cv value but pos value (happens with geometry shaders enabled). Needs a round of cleanup, though.
* softpipe: don't use samplers with prebaked sampler and sampler_view stateRoland Scheidegger2013-03-116-866/+779
| | | | | | | | | | | | | | This is needed for handling the dx10-style sample opcodes. This also simplifies the logic by getting rid of sampler variants completely (sampler_views though OTOH have sort of variants because some of their state is different depending on the shader stage they are bound to). No significant performance difference (openarena run: 840 frames in 459.8 seconds vs. 840 frames in 460.5 seconds). v2: fix reference counting bug spotted by Jose. Reviewed-by: Jose Fonseca <[email protected]>