summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* r600g: don't reserve more stack space than required v5Vadim Girlin2013-04-023-56/+142
| | | | | | | | | | | Reduced stack size allows to run more threads in some cases, improving performance for the shaders that use stack (that is, for the shaders with control flow instructions). E.g. with unigine-based apps. v4: implement exact computation taking into account wavefront size v5: add cases for RV620, RS880 Signed-off-by: Vadim Girlin <[email protected]>
* r600g: fix range handling for tgsi input declarations v2Vadim Girlin2013-04-021-3/+7
| | | | Signed-off-by: Vadim Girlin <[email protected]>
* gallium/hud: do .xxxx swizzling for the font texture in the fragment shaderMarek Olšák2013-04-021-6/+30
| | | | | | This allows using L8 and R8 for the font if I8 isn't supported. Tested-by: Brian Paul <[email protected]>
* hud: flush/unmap the vertex buffer before drawingBrian Paul2013-04-021-0/+3
| | | | | | | The VMware svga driver is picky about making sure the VBO is unmapped before drawing. Reviewed-by: Marek Olšák <[email protected]>
* draw: use pipe_transfer_unmap() to match pipe_transfer_map()Brian Paul2013-04-021-1/+1
|
* gallivm: fix signed small float to float conversionRoland Scheidegger2013-04-021-1/+1
| | | | | | Introduced by 5f41e08cf39d585d600aa506cdcd2f5380c60ddd, just a silly typo. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.
* radeonsi: add instance divisor support v3Christian König2013-04-024-59/+94
| | | | | | | | v2: reduce key size, don't copy key around to much. v3: remove key size reduction Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add start instance supportChristian König2013-04-023-10/+21
| | | | | | | | This works different than on R600, we need to add the start instance manually. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Tested-by: Michel Dänzer <[email protected]>
* radeonsi: add instanceid supportChristian König2013-04-024-7/+43
| | | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Tested-by: Michel Dänzer <[email protected]>
* radeon/llvm: move system value fetching to common codeChristian König2013-04-022-12/+12
| | | | | | | | This should be used by both SI and R600. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Tested-by: Michel Dänzer <[email protected]>
* radeonsi: Handle arbitrary 2-byte formats in resource_copy_regionMichel Dänzer2013-04-021-0/+6
| | | | | | | | | | | Fixes mplayer -vo vdpau OSD. NOTE: This is a candidate for the 9.1 branch. Reported-by: Igor Vagulin <[email protected]> Reviewed-by: Christian König <[email protected]> Tested-by: Christian König <[email protected]>
* nvc0: Fix fd leak in nvc0_create_decoderMaarten Lankhorst2013-04-021-2/+2
| | | | | | NOTE: This is a candidate for the 9.0 and 9.1 branches. Signed-off-by: Maarten Lankhorst <[email protected]>
* GLSL: fix lower_jumps to report progress properlyAras Pranckevicius2013-04-011-1/+3
| | | | | | | | | A fix for lower_jumps progress reporting, very much like similar in c1e591eed. NOTE: This is a candidate for stable branches. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Allow CSE on pre-gen7 varying-index uniform loadsEric Anholt2013-04-011-1/+1
| | | | | | | | | | | | | | | All the other expression types allowed here have inst->mlen == 0, and this one has implied MRF writes for all of its payload, so nothing else in the implementation should need to change. Reduces SEND messages for loading from pull constants in kwin's Lanczos shader from 16 to 6. (Due to a deficiency in constant propagation, I can't use the hack I did in the previous commit to test the performance change) Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Use LD messages for pre-gen7 varying-index uniform loadsEric Anholt2013-04-014-67/+84
| | | | | | | | | | This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on my GM45 forced to load all uniforms through the varying-index path), but we get a whole vec4 at a time to reuse in the next commit. v2: Fix comment about channels in the other message. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Don't double-emit SEND dependency workarounds at control flow.Eric Anholt2013-04-011-0/+2
| | | | | | | | | We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Bake regs_written into the IR instead of recomputing it later.Eric Anholt2013-04-017-33/+27
| | | | | | | | | For sampler messages, it depends on the target gen, and on gen4 SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we should. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Clean up the setup of gen4 simd16 message destinations.Eric Anholt2013-04-011-5/+4
| | | | | | | I think this makes it much more obvious what's going on here. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do CSE on gen7's varying-index pull constant loads.Eric Anholt2013-04-011-11/+32
| | | | | | | | | | | | This is our first CSE on a regs_written() > 1 instruction, so it takes a bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader from 12 to 2. v2: Fix compiler warning (false positive on possibly-uninitialized variable) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 Reviewed-by: Kenneth Graunke <[email protected]> (v1) NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Improve performance of varying-index uniform loads on IVB.Eric Anholt2013-04-012-18/+38
| | | | | | | | | | | | | | | | | | | Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet). With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1. v2: Move old offset computation into the pre-gen7 path. Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Avoid inappropriate optimization with regs_written > 1.Eric Anholt2013-04-011-0/+6
| | | | | | | | Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the fragment shader pull constants index by dwords, not vec4s.Eric Anholt2013-04-016-16/+19
| | | | | | | | | | | | | | | | | We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the constant surface interface take a normal byte size.Eric Anholt2013-04-014-17/+16
| | | | | | | | | This puts the rounding-up logic into the function itself instead of all the callers having to manage it. Also drop an "unused" comment in gen4, as the stride *is* used for texbos (and will be for uniforms soon). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move varying uniform offset compuation into the helper func.Eric Anholt2013-04-013-11/+13
| | | | | | | | I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove creation of a MOV instruction that's never used.Eric Anholt2013-04-011-1/+0
| | | | | | | | We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow constant propagation into MACH.Eric Anholt2013-04-011-2/+4
| | | | | | | | This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step. Reviewed-by: Kenneth Graunke <[email protected]>
* r600g/llvm: Update LLVM_REVISION.txtVincent Lejeune2013-04-011-1/+1
|
* r600g/llvm: Use stack_size provided from llvm.Vincent Lejeune2013-04-011-0/+1
|
* r600g/llvm: uses function attribute to pass shader typeVincent Lejeune2013-04-011-0/+1
|
* r600g/llvm: Add support for cf_alu native encodeVincent Lejeune2013-04-013-1/+16
|
* ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.Haixia Shi2013-04-011-2/+4
| | | | | | | | | | | | | | | | | If the active uniform is an array, then the length of the uniform name should include the three extra characters for the "[0]" suffix, which is required by the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform(). This avoids the situation where the output buffer does not have enough space to hold the "[0]" suffix, resulting in an incomplete array specification like "foobar[0". NOTE: This is a candidate for the 9.1 branch. Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728 Signed-off-by: Haixia Shi <[email protected]> Reviewed-by: Stéphane Marchesin <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.Matt Turner2013-04-011-1/+1
| | | | | Reported-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the old brw_optimize() code.Eric Anholt2013-04-013-120/+0
| | | | | | This is now done in the VS backend before instruction emit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add a pass to set dependency control fields on instructions.Eric Anholt2013-04-013-0/+126
| | | | | | | This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Dump shader source for linked shader programs.Eric Anholt2013-04-011-2/+18
| | | | | | | | | | We dump shader source in ir_to_mesa.cpp, and we dump linked programs here, but we had no reference from the linked programs to their source. This was preventing improvement of shader-db to use linked shader programs instead of individual shader files (which is bogus, because it means we optimize out VS outputs, and don't interpolate FS inputs!) Reviewed-by: Kenneth Graunke <[email protected]>
* clover: Fix build with LLVM 3.3Mike Lothian2013-04-011-1/+2
|
* llvmpipe: use triangle subdivision to avoid fixed-point overflow issuesBrian Paul2013-04-013-0/+186
| | | | | | | | | | | | If we're drawing to a surface that's 2048 x 2048 pixels or larger there's danger of fixed-point overflow in the triangle rasterization code. That leads to various rendering glitches. Rather than implement some intricate changes to the rasterization code, simply subdivide triangles into smaller subtriangles to avoid the issue. Only do this when the drawing surface is larger than 2048 by 2048. Reviewed-by: José Fonseca <[email protected]>
* mesa: remove platform checks around __builtin_ffs, __builtin_ffsllBrian Paul2013-04-011-6/+0
| | | | | | | | | | | Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC, not just for specific platforms. Fixes Solaris build. Note: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868 Signed-off-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* docs: add a new page documenting known application issuesBrian Paul2013-04-012-0/+84
| | | | | | Let's try to update this when we find other broken applications... Reviewed-by: José Fonseca <[email protected]>
* drirc: set always_have_depth_buffer for TopogonBrian Paul2013-04-011-0/+6
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Minor comment cleanupAdam Jackson2013-04-011-1/+1
| | | | Signed-off-by: Adam Jackson <[email protected]>
* mesa: fix texture storage multisample prototypes harder.Dave Airlie2013-04-012-4/+4
| | | | | | I just noticed the warnings since I fixed the other bit. Signed-off-by: Dave Airlie <[email protected]>
* r600g/llvm: Update LLVM_REVISIONVincent Lejeune2013-03-311-1/+1
|
* r600g/llvm: use native encode for texVincent Lejeune2013-03-311-23/+27
|
* glapi: fix storage multisample build errorsDave Airlie2013-03-311-2/+2
| | | | | | Reported on #radeon by udovdh Signed-off-by: Dave Airlie <[email protected]>
* docs: mark ARB_texture_storage_multisample doneChris Forbes2013-03-311-1/+1
| | | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: enable ARB_texture_storage_multisample on Gen6+Chris Forbes2013-03-311-0/+1
| | | | | | | | | This can be enabled everywhere that ARB_texture_multisample is supported -- ARB_texture_storage is supported on everything. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: allow multisample texture targets in [Get]TexParameter*Chris Forbes2013-03-311-1/+87
| | | | | | | | | | | | | | | | | | | | ARB_texture_storage_multisample allows texture parameters to be queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets. Some parameters may also be set, with the following exceptions: - TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates INVALID_OPERATION - any state which appears in the `per-sampler` state table may not be set; generates INVALID_OPERATION V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: improve reported function name in Tex*MultisampleChris Forbes2013-03-311-16/+14
| | | | | | | | | Now that there are 4 variants, just pass the function name into teximagemultisample rather than reconstructing it. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add enable bit for ARB_texture_storage_multisampleChris Forbes2013-03-312-0/+2
| | | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>