Commit message (Author, Age, Files, Lines)
* nvc0: fix 128 bit compressed storage type selection (Christoph Bumiller, 2013-04-03, 1 file, -1/+1)
|
* nvc0: place staging textures in GART and map them directly (Christoph Bumiller, 2013-04-03, 7 files, -11/+76)
|
* nv50: account for pesky prefetch in size calculation of linear textures (Christoph Bumiller, 2013-04-03, 1 file, -1/+6)
|
* nvc0: honour scaled coordinates setting for linear textures (Christoph Bumiller, 2013-04-03, 1 file, -6/+5)
|
* nvc0: fix for 2d engine R source formats writing RRR1 and not R001 (Christoph Bumiller, 2013-04-03, 3 files, -52/+148)
|
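The two channel expansions named in the subject differ only in G and B. A minimal sketch, assuming a hypothetical `vec4` struct and helper names (this is not nouveau code), of what GL expects for an R-only source (R001) versus what the commit says the 2D engine wrote (RRR1):

```c
#include <assert.h>

/* Hypothetical RGBA value; not a nouveau type. */
struct vec4 { float x, y, z, w; };

/* GL semantics for an R-only format: missing G/B read as 0, missing A as 1. */
static struct vec4 expand_r001(float r)
{
    struct vec4 v = { r, 0.0f, 0.0f, 1.0f };
    return v;
}

/* Behaviour described in the commit subject: R replicated into G and B. */
static struct vec4 expand_rrr1(float r)
{
    struct vec4 v = { r, r, r, 1.0f };
    return v;
}
```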
* nv50,nvc0: disable DEPTH_RANGE_NEAR/FAR clipping during blit (Christoph Bumiller, 2013-04-03, 3 files, -1/+5)
|   We send position.z == 0, but DEPTH_RANGE may be set to some arbitrary range that does not include 0 (for example, in piglit's hiz tests).
* st/mesa: fix bitmap,drawpix,drawtex for PIPE_CAP_TGSI_TEXCOORD (Christoph Bumiller, 2013-04-03, 3 files, -2/+8)
|   NOTE: Changed the semantic index for the drawtex coordinate to be the texture unit index instead of always 0. Not sure if this is correct, but since the value seems to depend on the unit, it would make sense to use different varying slots.
* nouveau: accelerate buffer copies in resource_copy_region (Christoph Bumiller, 2013-04-03, 5 files, -11/+47)
|
* nvc0: demagic some of the NVE4_COMPUTE_UPLOAD methods (Christoph Bumiller, 2013-04-03, 3 files, -49/+129)
|   It's actually the same as P2MF.
* nvc0: read PM counters for each warp scheduler separately (Christoph Bumiller, 2013-04-03, 2 files, -61/+138)
|
* nvc0: add some metrics to driver specific queries (Christoph Bumiller, 2013-04-03, 2 files, -58/+160)
|
* nvc0: add some driver statistics queries (Christoph Bumiller, 2013-04-03, 14 files, -25/+279)
|
* nvc0: disable compressed storage type 0xdb for now (Christoph Bumiller, 2013-04-03, 1 file, -1/+3)
|   Single-sample color compression doesn't seem that useful anyway.
* nvc0: use correct hw query for PRIMITIVES_GENERATED (Christoph Bumiller, 2013-04-03, 1 file, -4/+7)
|   It was the same as SO_STATISTICS[1] before.
* nvc0: use fence to check state of queries that don't write sequence (Christoph Bumiller, 2013-04-03, 1 file, -1/+5)
|   This still isn't optimal, since the fence will signal a bit late, but it is better than checking on the bo, which may never be ready if it is shared (which is likely).
* gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS (Christoph Bumiller, 2013-04-03, 4 files, -9/+52)
|   Also, renamed "pixels-rendered" to "samples-passed" because the occlusion counter increments even if colour and depth writes are disabled, or (on some implementations) for killed fragments that passed the depth test when PS early_fragment_tests is set.
* gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS (Christoph Bumiller, 2013-04-03, 1 file, -3/+5)
|   Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS (Christoph Bumiller, 2013-04-03, 12 files, -1/+17)
|   Reviewed-by: Marek Olšák <[email protected]>
* i965: Reduce code duplication in handling of depth, stencil, and HiZ. (Paul Berry, 2013-04-02, 5 files, -151/+178)
|   This patch consolidates duplicate code in the brw_depthbuffer and gen7_depthbuffer state atoms. Previously, these state atoms contained 5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for Gen4-6 and 2 for Gen7). Also, a lot of logic for determining the appropriate buffer setup was duplicated between the Gen4-6 and Gen7 functions.
|
|   This refactor splits the code into three separate functions: brw_emit_depthbuffer(), which determines the appropriate buffer setup in a mostly generation-independent way, brw_emit_depth_stencil_hiz(), which emits the appropriate state packets for Gen4-6, and gen7_emit_depth_stencil_hiz(), which emits the appropriate state packets for Gen7.
|
|   Tested using Piglit on Gen5-7 (no regressions).
|
|   v2: Re-word some comments. Fix an assertion that incorrectly prohibited packed depth/stencil formats on Gen6 (these are allowed provided that HiZ is disabled).
|
|   Reviewed-by: Chad Versace <[email protected]>
|   Reviewed-by: Kenneth Graunke <[email protected]>
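The split described above can be sketched as follows. This is an illustration under stated assumptions, not the i965 code: the `depth_setup` struct, the `sketch_*` function names and their parameters are hypothetical. Only the division of labour (one mostly generation-independent setup function handing off to a Gen4-6 or a Gen7 emitter) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified result of the buffer-setup decision. */
struct depth_setup {
    bool has_depth;
    bool has_stencil;
    bool hiz;
};

enum emit_path { EMIT_GEN4_6, EMIT_GEN7 };

/* Stand-in for the role of brw_emit_depth_stencil_hiz(): Gen4-6 packets. */
static enum emit_path sketch_emit_gen4_6(const struct depth_setup *s)
{
    (void)s; /* packet emission omitted in this sketch */
    return EMIT_GEN4_6;
}

/* Stand-in for the role of gen7_emit_depth_stencil_hiz(): Gen7 packets. */
static enum emit_path sketch_emit_gen7(const struct depth_setup *s)
{
    (void)s; /* packet emission omitted in this sketch */
    return EMIT_GEN7;
}

/* Stand-in for the role of brw_emit_depthbuffer(): make the setup decision
 * once, then dispatch to the generation-specific emitter. */
static enum emit_path sketch_emit_depthbuffer(int gen, bool depth, bool stencil,
                                              bool hiz_requested)
{
    struct depth_setup s;
    assert(gen >= 4 && gen <= 7);
    s.has_depth = depth;
    s.has_stencil = stencil;
    s.hiz = depth && hiz_requested; /* HiZ only applies with a depth buffer */
    return gen >= 7 ? sketch_emit_gen7(&s) : sketch_emit_gen4_6(&s);
}
```

The design point is that the per-generation functions no longer repeat the setup logic; they only translate an already-decided setup into packets.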
* Revert "glsl: Replace constant-index vector array accesses with swizzles" (Paul Berry, 2013-04-02, 2 files, -83/+83)
|   This reverts commit dbf94d105a48b7aafb2c8cf64d8b4392d87efea1, which was working around a bug in the handling of array indexing when constant folding built-in functions. Now that the constant folding bug has been fixed, the workaround is no longer needed.
* glsl: Fix array indexing when constant folding built-in functions. (Paul Berry, 2013-04-02, 1 file, -1/+1)
|   Mesa constant-folds built-in functions by using a miniature GLSL interpreter (see ir_function_signature::constant_expression_evaluate_expression_list()). This interpreter had a bug in its handling of array indexing, which caused expressions like "m[i][j]" (where m is a matrix) to be handled incorrectly. Specifically, it incorrectly treated j as indexing into the whole matrix (rather than indexing just into the vector m[i]); as a result the offset computed for m[i] was lost and m[i][j] was treated as m[j][0].
|
|   Fixes piglit tests inverse-mat[234].{vert,frag}.
|
|   NOTE: This is a candidate for the 9.1 and 9.0 branches.
|
|   Reviewed-by: Ian Romanick <[email protected]>
|   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436
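The offset arithmetic behind the bug can be sketched on a flattened matrix. This is a minimal illustration, not Mesa's interpreter code: it assumes a hypothetical column-major `mat3` stored as a flat array (GLSL `m[i]` selects column `i`), and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical flattened column-major mat3: m[i] is column i. */
enum { ROWS = 3, COLS = 3 };

/* Correct: m[i][j] is component j of the column vector m[i]. */
static float index_correct(const float m[COLS * ROWS], int i, int j)
{
    assert(i >= 0 && i < COLS && j >= 0 && j < ROWS);
    int column_offset = i * ROWS; /* offset of m[i] */
    return m[column_offset + j];
}

/* Buggy behaviour described above: j indexes the whole matrix, the offset
 * of m[i] is dropped, and m[i][j] reads m[j][0] instead. */
static float index_buggy(const float m[COLS * ROWS], int i, int j)
{
    (void)i; /* offset of m[i] is lost */
    return m[j * ROWS];
}
```

With the elements numbered 0..8 in storage order, `m[1][2]` should read element 5, while the buggy path reads element 6 (`m[2][0]`).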
* gallivm: bring back optimized but incorrect float to smallfloat optimizations (Roland Scheidegger, 2013-04-02, 1 file, -38/+78)
|   Conceptually the same as previously done in float_to_half. Should cut down the number of instructions from 14 to 10 or so, but will promote some NaNs to Infs, so it's disabled. It gets a bit tricky, though, handling all the cases correctly...
|
|   Passes basic tests either way (though there are no tests testing special cases, but some manual tests injecting them seemed promising).
|
|   v2: style and comment fixes suggested by Jose
|
|   Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: consolidate code for float-to-half and float-to-packed conversion. (Roland Scheidegger, 2013-04-02, 3 files, -108/+102)
|   This replaces the existing float-to-half implementation. There are definitely a couple of differences: the old implementation had unspecified(?) rounding behavior, and could at least in theory construct Inf values out of NaNs. NaNs and Infs should now always be properly propagated, and rounding behavior is now towards zero (note this means too large but non-Infinity values get propagated to max representable value, not Infinity). The implementation will definitely not match util code, however (which does nearest rounding, which also means too large values will get propagated to Infinity).
|
|   Also fix a bogus round mask probably leading to rounding bugs...
|
|   v2: fix a logic bug in handling infs/nans.
|
|   Reviewed-by: Jose Fonseca <[email protected]>
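A scalar sketch of the conversion semantics described above: round toward zero, NaN and Inf propagated, and finite values that are too large clamped to the largest finite half (0x7bff, i.e. 65504). It handles one value in plain C, whereas the gallivm code builds vectorized LLVM IR; `float_to_half_rtz` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an IEEE binary32 float to binary16 bits, rounding toward zero. */
static uint16_t float_to_half_rtz(float f)
{
    uint32_t x;
    assert(sizeof x == sizeof f);
    memcpy(&x, &f, sizeof x); /* reinterpret the float's bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (exp == 0xffu) {
        if (mant != 0) /* NaN: keep it a (quiet) NaN, never Inf */
            return (uint16_t)(sign | 0x7e00u | (mant >> 13));
        return (uint16_t)(sign | 0x7c00u); /* +/-Inf stays Inf */
    }

    int e = (int)exp - 127; /* unbiased exponent */
    if (e > 15) /* too large but finite: clamp to max finite, not Inf */
        return (uint16_t)(sign | 0x7bffu);
    if (e >= -14) /* normal half: rebias exponent, truncate mantissa */
        return (uint16_t)(sign | ((uint32_t)(e + 15) << 10) | (mant >> 13));
    if (e >= -24) /* subnormal half: include the implicit 1, then truncate */
        return (uint16_t)(sign | ((0x800000u | mant) >> (-(e + 1))));
    return sign; /* underflow (including float denormals): signed zero */
}
```

Under nearest rounding, as the commit says the util code uses, 65520.0f would round to Infinity; under round toward zero it stays at 0x7bff.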
* r600g: don't reserve more stack space than required v5 (Vadim Girlin, 2013-04-02, 3 files, -56/+142)
|   A reduced stack size allows more threads to run in some cases, improving performance for shaders that use the stack (that is, shaders with control flow instructions), e.g. unigine-based apps.
|
|   v4: implement exact computation taking into account wavefront size
|   v5: add cases for RV620, RS880
|
|   Signed-off-by: Vadim Girlin <[email protected]>
* r600g: fix range handling for tgsi input declarations v2 (Vadim Girlin, 2013-04-02, 1 file, -3/+7)
|   Signed-off-by: Vadim Girlin <[email protected]>
* gallium/hud: do .xxxx swizzling for the font texture in the fragment shader (Marek Olšák, 2013-04-02, 1 file, -6/+30)
|   This allows using L8 and R8 for the font if I8 isn't supported.
|
|   Tested-by: Brian Paul <[email protected]>
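Why the `.xxxx` swizzle makes the three formats interchangeable for the font: a sketch assuming the usual GL/Gallium channel expansion for I8 (IIII), L8 (LLL1) and R8 (R001). The `vec4` struct, the enum and the function names are hypothetical, not HUD code.

```c
#include <assert.h>

struct vec4 { float x, y, z, w; };

enum font_format { FMT_I8, FMT_L8, FMT_R8 };

/* What a texture fetch returns for a single-channel texel value v. */
static struct vec4 fetch(enum font_format fmt, float v)
{
    switch (fmt) {
    case FMT_I8:
        return (struct vec4){ v, v, v, v };          /* intensity: IIII */
    case FMT_L8:
        return (struct vec4){ v, v, v, 1.0f };       /* luminance: LLL1 */
    case FMT_R8:
    default:
        return (struct vec4){ v, 0.0f, 0.0f, 1.0f }; /* red: R001 */
    }
}

/* The fragment-shader .xxxx swizzle: broadcast the first channel. */
static struct vec4 swizzle_xxxx(struct vec4 c)
{
    return (struct vec4){ c.x, c.x, c.x, c.x };
}
```

After the swizzle, all three formats yield the I8 result, so the shader no longer depends on which format the driver supports.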
* hud: flush/unmap the vertex buffer before drawing (Brian Paul, 2013-04-02, 1 file, -0/+3)
|   The VMware svga driver is picky about making sure the VBO is unmapped before drawing.
|
|   Reviewed-by: Marek Olšák <[email protected]>
* draw: use pipe_transfer_unmap() to match pipe_transfer_map() (Brian Paul, 2013-04-02, 1 file, -1/+1)
|
* gallivm: fix signed small float to float conversion (Roland Scheidegger, 2013-04-02, 1 file, -1/+1)
|   Introduced by 5f41e08cf39d585d600aa506cdcd2f5380c60ddd, just a silly typo.
|
|   Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.
* radeonsi: add instance divisor support v3 (Christian König, 2013-04-02, 4 files, -59/+94)
|   v2: reduce key size, don't copy key around too much.
|   v3: remove key size reduction
|
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add start instance support (Christian König, 2013-04-02, 3 files, -10/+21)
|   This works differently than on R600: we need to add the start instance manually.
|
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Michel Dänzer <[email protected]>
|   Tested-by: Michel Dänzer <[email protected]>
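Together with the instance divisor commit above, the element an attribute fetches can be sketched as below. This assumes GL's instanced-array semantics: for a per-instance attribute the element index is `start_instance + instance_id / divisor`, with `instance_id` counting from 0 within the draw as `gl_InstanceID` does. The helper name is hypothetical and this is not radeonsi code.

```c
#include <assert.h>
#include <stdint.h>

/* Which element of a vertex buffer an attribute reads.
 * divisor == 0: per-vertex attribute, indexed by the vertex index.
 * divisor  > 0: per-instance attribute, advancing once every `divisor`
 *               instances, offset by the draw's start instance (the part
 *               the commit says must be added manually on SI). */
static uint32_t attrib_fetch_index(uint32_t vertex_index, uint32_t instance_id,
                                   uint32_t start_instance, uint32_t divisor)
{
    if (divisor == 0)
        return vertex_index;
    return start_instance + instance_id / divisor;
}
```

For example, with a divisor of 2 and a start instance of 10, instance 5 reads element 12.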
* radeonsi: add instanceid support (Christian König, 2013-04-02, 4 files, -7/+43)
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Michel Dänzer <[email protected]>
|   Tested-by: Michel Dänzer <[email protected]>
* radeon/llvm: move system value fetching to common code (Christian König, 2013-04-02, 2 files, -12/+12)
|   This should be used by both SI and R600.
|
|   Signed-off-by: Christian König <[email protected]>
|   Reviewed-by: Michel Dänzer <[email protected]>
|   Tested-by: Michel Dänzer <[email protected]>
* radeonsi: Handle arbitrary 2-byte formats in resource_copy_region (Michel Dänzer, 2013-04-02, 1 file, -0/+6)
|   Fixes mplayer -vo vdpau OSD.
|
|   NOTE: This is a candidate for the 9.1 branch.
|
|   Reported-by: Igor Vagulin <[email protected]>
|   Reviewed-by: Christian König <[email protected]>
|   Tested-by: Christian König <[email protected]>
* nvc0: Fix fd leak in nvc0_create_decoder (Maarten Lankhorst, 2013-04-02, 1 file, -2/+2)
|   NOTE: This is a candidate for the 9.0 and 9.1 branches.
|
|   Signed-off-by: Maarten Lankhorst <[email protected]>
* GLSL: fix lower_jumps to report progress properly (Aras Pranckevicius, 2013-04-01, 1 file, -1/+3)
|   A fix for lower_jumps progress reporting, very much like the similar fix in c1e591eed.
|
|   NOTE: This is a candidate for stable branches.
|
|   Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Allow CSE on pre-gen7 varying-index uniform loads (Eric Anholt, 2013-04-01, 1 file, -1/+1)
|   All the other expression types allowed here have inst->mlen == 0, and this one has implied MRF writes for all of its payload, so nothing else in the implementation should need to change.
|
|   Reduces SEND messages for loading from pull constants in kwin's Lanczos shader from 16 to 6. (Due to a deficiency in constant propagation, I can't use the hack I did in the previous commit to test the performance change)
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
|   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
|   NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Use LD messages for pre-gen7 varying-index uniform loads (Eric Anholt, 2013-04-01, 4 files, -67/+84)
|   This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on my GM45 forced to load all uniforms through the varying-index path), but we get a whole vec4 at a time to reuse in the next commit.
|
|   v2: Fix comment about channels in the other message.
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
|   NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Don't double-emit SEND dependency workarounds at control flow. (Eric Anholt, 2013-04-01, 1 file, -0/+2)
|   We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done.
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Bake regs_written into the IR instead of recomputing it later. (Eric Anholt, 2013-04-01, 7 files, -33/+27)
|   For sampler messages, it depends on the target gen, and on gen4 SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we should.
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
|   NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Clean up the setup of gen4 simd16 message destinations. (Eric Anholt, 2013-04-01, 1 file, -5/+4)
|   I think this makes it much more obvious what's going on here.
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do CSE on gen7's varying-index pull constant loads. (Eric Anholt, 2013-04-01, 1 file, -11/+32)
|   This is our first CSE on a regs_written() > 1 instruction, so it takes a bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader from 12 to 2.
|
|   v2: Fix compiler warning (false positive on possibly-uninitialized variable)
|
|   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
|   Reviewed-by: Kenneth Graunke <[email protected]> (v1)
|   NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Improve performance of varying-index uniform loads on IVB. (Eric Anholt, 2013-04-01, 2 files, -18/+38)
|   Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet).
|
|   With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This is a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1.
|
|   v2: Move old offset computation into the pre-gen7 path.
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
|   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
|   NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Avoid inappropriate optimization with regs_written > 1. (Eric Anholt, 2013-04-01, 1 file, -0/+6)
|   Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change.
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the fragment shader pull constants index by dwords, not vec4s. (Eric Anholt, 2013-04-01, 6 files, -16/+19)
|   We want to load vec4s, since loading a vec4 instead of a dword adds basically no latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement.
|
|   Note that this change only affects those messages that use the surface format, like sampler LDs, not the untyped data cache loads we've used in other cases.
|
|   No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4).
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
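The difference between the two addressing schemes can be sketched on a plain array. This assumes a hypothetical pull-constant buffer viewed as 32-bit floats and is not i965 code. With vec4 indexing, a variable dword index has to be split into an aligned load plus a component select whose channel is only known at run time. With dword indexing, the load simply starts at the wanted dword.

```c
#include <assert.h>
#include <stdint.h>

/* vec4 addressing: the load must start on a 4-dword boundary, so a
 * variable dword index becomes an aligned load plus a run-time select. */
static float load_vec4_indexed(const float *buf, uint32_t dword_index)
{
    const float *vec = buf + (dword_index / 4) * 4; /* aligned vec4 */
    uint32_t component = dword_index % 4;           /* channel to pick */
    return vec[component];
}

/* dword addressing: the 4-dword load starts at the wanted dword, so the
 * value is always in the first channel and no variable select is needed.
 * Assumes buf is padded so that 4 dwords exist from dword_index onward. */
static float load_dword_indexed(const float *buf, uint32_t dword_index)
{
    const float *vec = buf + dword_index; /* unaligned vec4 */
    return vec[0];
}
```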
* i965: Make the constant surface interface take a normal byte size. (Eric Anholt, 2013-04-01, 4 files, -17/+16)
|   This puts the rounding-up logic into the function itself instead of all the callers having to manage it. Also drop an "unused" comment in gen4, as the stride *is* used for texbos (and will be for uniforms soon).
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
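A sketch of moving the rounding into the callee. The 16-byte element size (one vec4) and the function name are assumptions for illustration; the commit only says the interface now takes a normal byte size and rounds up internally.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed element size: one vec4 of 32-bit values. */
#define CONST_ELEMENT_BYTES 16u

/* Hypothetical helper: callers pass a plain byte size and the helper rounds
 * up to whole elements, so a trailing partial vec4 is still covered. */
static uint32_t const_surface_elements(uint32_t size_bytes)
{
    return (size_bytes + CONST_ELEMENT_BYTES - 1) / CONST_ELEMENT_BYTES;
}
```

Keeping the round-up in one place means no caller can forget it and truncate the last partial element.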
* i965/fs: Move varying uniform offset computation into the helper func. (Eric Anholt, 2013-04-01, 3 files, -11/+13)
|   I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope.
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove creation of a MOV instruction that's never used. (Eric Anholt, 2013-04-01, 1 file, -1/+0)
|   We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above.
|
|   NOTE: This is a candidate for the 9.1 branch.
|   Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow constant propagation into MACH. (Eric Anholt, 2013-04-01, 1 file, -2/+4)
|   This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step.
|
|   Reviewed-by: Kenneth Graunke <[email protected]>
* r600g/llvm: Update LLVM_REVISION.txt (Vincent Lejeune, 2013-04-01, 1 file, -1/+1)
|