summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Use measured Gen7 instruction timings on Gen6.Matt Turner2013-03-291-1/+4
| | | | | | | | | | | | | | | | | | x before + after +------------------------------------------------------------------------------+ | x x + | | xx ++ x + | | xx ++ + xx ++ | |x xxx x+++++ + xxx x*x+*+++ + x +| | |_____|____________A______A____M____M_|_______| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 23 8083.78 8287.83 8205.55 8162.7461 68.307951 + 23 8107.56 8358.74 8224.33 8186.1765 71.506301 No difference proven at 95.0% confidence Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Increase and document MAD latency on Gen7.Matt Turner2013-03-291-4/+18
| | | | | | | 58% of mad(8) generated in shader-db are reading registers from the same bank. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add LRP instruction latency.Matt Turner2013-03-291-0/+26
| | | | | | | | Set its latency to what happens to be the default floating-point instruction latency. One day we may want to handle latency based on register bank information. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add Haswell cycle timingsMatt Turner2013-03-291-9/+9
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Note that write-after-write dependencies are blocking.Matt Turner2013-03-291-1/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Reword comment about the shared mathbox.Matt Turner2013-03-291-4/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* gallivm: consolidate some half-to-float and r11g11b10-to-float codeRoland Scheidegger2013-03-293-63/+52
| | | | | | | Similar enough that we can try to use shared code. v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs. Reviewed-by: Jose Fonseca <[email protected]
* mesa: provide default implementation of QuerySamplesForFormatChris Forbes2013-03-293-1/+21
| | | | | | | | | | | | | | Previously at least i915 failed to provide an implementation, but exposed ARB_internalformat_query anyway, leading to crashes when QueryInternalformativ was called. Default implementation just returns 1 for everything, so is suitable for any driver which does not support multisampling. V2: - Move from intel to core mesa. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nvc0: implement MP performance countersChristoph Bumiller2013-03-298-6/+556
| | | | | | | | | There's more, but this only adds (most) of the counters that are handled directly by the shader processors. The other counter domains are not handled on the multiprocessor and there are no FIFO object methods for configuring them. Instead, they have to be programmed by the kernel via PCOUNTER, and the interface for this isn't in place yet.
* nvc0: enable compression when supportedChristoph Bumiller2013-03-293-3/+13
|
* nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP countChristoph Bumiller2013-03-294-29/+73
|
* nv50,nvc0: fix 3d blits, restore viewport after blitChristoph Bumiller2013-03-292-18/+32
|
* nv50: fix 3D render target setupChristoph Bumiller2013-03-291-2/+10
|
* llvmpipe: put .bmp extension on dumped image filesBrian Paul2013-03-281-2/+2
|
* llvmpipe: add 'f' suffix to 1.0 in fixed_to_float()Brian Paul2013-03-281-1/+1
|
* draw: fix some build breakage when LLVM is not usedBrian Paul2013-03-282-1/+8
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883 Tested-by: Vinson Lee <[email protected]>
* mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printingMarek Olšák2013-03-281-0/+3
| | | | Reviewed-by: Brian Paul <[email protected]>
* i965: Tidy shader time printing code by using printf's field widths.Kenneth Graunke2013-03-281-12/+4
| | | | | | | | | | | | | We can use %-6s%-6s rather than manually counting characters, resulting in much more readable code. This necessitates a small secondary change: using "total fs16" and "" now causes the "" string to be padded out to 6 characters, resulting in too much whitespace. Splitting it into "total" and "fs16" produces the same output as before. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965/vs: Include URB payload setup in shader_time.Eric Anholt2013-03-282-4/+11
| | | | | | | | This much more accurately reflects the cost of the vertex shader, since the payload setup is often a significant fraction of the instructions in the VS. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use a send from a 2-register VGRF for shader time writes.Eric Anholt2013-03-282-14/+13
| | | | | | | This will let us emit it later, after we're setting up MRFs for the URB write. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Teach copy propagation about sends from GRFs.Eric Anholt2013-03-283-7/+29
| | | | | | | This incidentally also teaches it a bit about gen6 math -- we now allow unswizzled, unmodified GRF temps as the sources for math. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.Eric Anholt2013-03-282-20/+45
| | | | | | v2: Fix silly bool handling, and don't add new tabs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Include everything but the final FB write in shader_time.Eric Anholt2013-03-282-5/+15
| | | | | | | | Previously, if you just wrote a constant color to the render target, no time got noted at all. This is convenient for doing single-instruction timings, but not so much for actual program analysis. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch shader_time writes to using GRFs.Eric Anholt2013-03-286-19/+63
| | | | | | | | | This avoids conflicts between shader_time and FB writes, so we can include more of the program under our profiling. This does mean hiding more of the message setup from the optimizer, which doesn't have a way to handle multi-reg sends from GRFs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Provide more detailed information to match shader_time to programs.Eric Anholt2013-03-281-13/+50
| | | | | | | | | Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader names, and I realized that it was really unclear. I'd like to do even better, like noting which one is the clear shader, but that would require exposing the metaops struct to the driver. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Track ARB program state along with GLSL state for shader_time.Eric Anholt2013-03-286-29/+47
| | | | | | This will let us do much better printouts for non-GLSL programs. Reviewed-by: Kenneth Graunke <[email protected]>
* st/dri: fix crash with HUD and single bufferingMarek Olšák2013-03-281-1/+2
|
* st/mesa: remove leftover printfs from ReadPixelsMarek Olšák2013-03-281-3/+0
| | | | Oops, I thought I had removed all debugging code.
* i965/fs: Improve performance of copy propagation dataflow using bitsets.Eric Anholt2013-03-281-33/+34
| | | | | | Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10). Reviewed-by: Kenneth Graunke <[email protected]>
* llvmpipe/draw: Fix texture sampling in geometry shadersZack Rusin2013-03-278-71/+146
| | | | | | | | | We weren't correctly propagating the samplers and sampler views when they were related to geometry shaders. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw/llvm: Cleanup the store debugging codeZack Rusin2013-03-271-8/+5
| | | | | | Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw: Allocate the output buffer for output primitivesZack Rusin2013-03-271-2/+1
| | | | | | | | | | We were allocating the output buffer but using the input primitives. We need to allocate that buffer using the maximum number of output, not input, primitives. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallivm: Implement the breakc instructionZack Rusin2013-03-272-0/+34
| | | | | | | | Required by more modern examples. Like BRK but with a condition. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallivm: implement implicit primitive flushingZack Rusin2013-03-272-0/+15
| | | | | | | | | TGSI semantics currently require an implicit endprim at the end of GS if an ending primitive hasn't been emitted. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallium/llvm: implement geometry shaders in the llvm pathsZack Rusin2013-03-2710-79/+1285
| | | | | | | | | This commits implements code generation of the geometry shaders in the SOA paths. All the code is there but bugs are likely present. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw/gs: Fetch more than one primitive per invocationZack Rusin2013-03-272-13/+48
| | | | | | | | | | Allows executing gs on up to 4 primitives at a time. Will also be required by the llvm code because there we definitely don't want to flush with just a single primitive. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw/gs: Abstract the portions of GS that are tgsi specificZack Rusin2013-03-272-128/+156
| | | | | | | | | To be able to add llvm paths later on we need to have some common interface for them. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* draw/llvm: Remove unused gs_constants from jit_contextZack Rusin2013-03-273-25/+11
| | | | | | | | | The member was never used and we'll need to handle it differently because gs will also need samplers/textures setup. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* graw/gs: add missing max output vertices to all testsZack Rusin2013-03-274-0/+4
| | | | | | | | A few tests were missing this crucial property. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* radeonsi: add cs tracing v3Jerome Glisse2013-03-276-1/+124
| | | | | | | | | | | Same as on r600, trace cs execution by writting cs offset after each states, this allow to pin point lockup inside command stream and narrow down the scope of lockup investigation. v2: Use WRITE_DATA packet instead of WRITE_MEM v3: Remove useless nop packet Signed-off-by: Jerome Glisse <[email protected]>
* mesa: only check sample count if we actually wanted multisamplingChris Forbes2013-03-271-9/+10
| | | | | | | | | | | | | Fixes various test fallout from 90b5a2425a on Pineview, which claims to support ARB_internalformat_query but doesn't actually provide the driverfunc. That driver is still broken [GetInternalformativ will still segfault!] but it was silly to be going through the sample count logic in the nonmultisampling case at all. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radeon/llvm: document LLVM commitChristian König2013-03-261-1/+1
| | | | | | We need at least that revision to work correctly now. Signed-off-by: Christian König <[email protected]>
* radeonsi: add preloading for all samplersChristian König2013-03-261-12/+45
| | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add preloading of all constantsChristian König2013-03-261-16/+51
| | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: mark most intrinsics as readnone/nounwindChristian König2013-03-261-8/+10
| | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: mark all loads as constantChristian König2013-03-261-7/+25
| | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove wqm intrinsicChristian König2013-03-261-9/+0
| | | | | | | Now the backend handles that itself. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* radeon/llvm: remove uneeded inclusionChristian König2013-03-261-1/+0
| | | | | | | The include isn't needed and the file has moved with LLVM master. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* glsl_to_tgsi: avoid creating arrays if driver doesn't support themChristian König2013-03-261-1/+3
| | | | | | Avoid creating arrays if we replace indirect addressing anyway. Signed-off-by: Christian König <[email protected]>
* glsl_to_tgsi: make simplify_cmp work with arraysChristian König2013-03-261-1/+1
| | | | | | | | | Even when we have arrays it is possible for simplify_cmp to work on temps, just not on arrays. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696 Signed-off-by: Christian König <[email protected]>