| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x before
+ after
+------------------------------------------------------------------------------+
| x x + |
| xx ++ x + |
| xx ++ + xx ++ |
|x xxx x+++++ + xxx x*x+*+++ + x +|
| |_____|____________A______A____M____M_|_______| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 23 8083.78 8287.83 8205.55 8162.7461 68.307951
+ 23 8107.56 8358.74 8224.33 8186.1765 71.506301
No difference proven at 95.0% confidence
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
58% of mad(8) generated in shader-db are reading registers from the same
bank.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.
Reviewed-by: Jose Fonseca <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.
Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
V2: - Move from intel to core mesa.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's more, but this only adds (most) of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.
This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace. Splitting it into "total" and "fs16" produces the
same output as before.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will let us emit it later, after we're setting up MRFs for the
URB write.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Fix silly bool handling, and don't add new tabs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear. I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This will let us do much better printouts for non-GLSL programs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
|
|
|
|
| |
Oops, I thought I had removed all debugging code.
|
|
|
|
|
|
| |
Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were allocating the output buffer but using the input
primitives. We need to allocate that buffer using the
maximum number of output, not input, primitives.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Required by more modern examples. Like BRK but with a condition.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commits implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To be able to add llvm paths later on we need to have some common
interface for them.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
A few tests were missing this crucial property.
Signed-off-by: Zack Rusin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Same as on r600, trace cs execution by writting cs offset after each
states, this allow to pin point lockup inside command stream and
narrow down the scope of lockup investigation.
v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.
That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
We need at least that revision to work correctly now.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
Now the backend handles that itself.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
The include isn't needed and the file has moved with LLVM master.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Avoid creating arrays if we replace indirect addressing anyway.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696
Signed-off-by: Christian König <[email protected]>
|