| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.
Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6. (Due to a deficiency in constant propagation, I
can't use the hack I did in the previous commit to test the performance
change)
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
|
| |
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.
v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
| |
We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too. We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.
Reviewed-by: Kenneth Graunke <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
| |
I think this makes it much more obvious what's going on here.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.
v2: Fix compiler warning (false positive on possibly-uninitialized variable)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug. This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).
With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.
v2: Move old offset computation into the pre-gen7 path.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
|
|
|
|
|
|
|
|
| |
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency. But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.
Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.
No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it. Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We weren't inserting it into the list, so it did nothing. This line was
replaced by the MOV/MUL block above.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This happens quite a bit with varying-index uniform loads. We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().
This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".
NOTE: This is a candidate for the 9.1 branch.
Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <[email protected]>
Reviewed-by: Stéphane Marchesin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Reported-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This is now done in the VS backend before instruction emit.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This is a more aggressive version of the old brw_optimize() path. Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source. This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms. Fixes Solaris build.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
| |
I just noticed the warnings since I fixed the other bit.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.
Some parameters may also be set, with the following exceptions:
- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
INVALID_OPERATION
- any state which appears in the `per-sampler` state table may not
be set; generates INVALID_OPERATION
V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:
- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
INVALID_OPERATION.
Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.
V2: - Cover missing error cases (unsized formats; texture object zero)
Signed-off-by: Chris Forbes <[email protected]>
[V1] Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This is about to be used in teximagemultisample() when immutable=true.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
These haven't been used since we deleted NV_vertex_program support.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are intentionally not allocating a slot for gl_ClipVertex. But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
Alessandro Pignotti noted when I added this code in commit
0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for
"if (busy)", so this debug print couldn't happen.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader. Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue. This is normally good.
However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp. The frontend emits IR for this just before
FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering:
1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue
This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written. This obviously
led to inaccurate results.
This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections. This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:
1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue
For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 [mattst88]:
- Rebase.
- #define GL_ARB_texture_query_lod to 1.
- Remove comma after ir_lod in ir.h for MSVC.
- Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
opt_tree_grafting.cpp.
- Rename textureQueryLOD to textureQueryLod, see
https://www.khronos.org/bugzilla/show_bug.cgi?id=821
- Fix ir_reader of (lod ...).
v3 [mattst88]:
- Rename textureQueryLod to textureQueryLOD, pending resolution of
Khronos 821.
- Add ir_lod case to ir_to_mesa.cpp.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x before
+ after
+------------------------------------------------------------------------------+
| x x + |
| xx ++ x + |
| xx ++ + xx ++ |
|x xxx x+++++ + xxx x*x+*+++ + x +|
| |_____|____________A______A____M____M_|_______| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 23 8083.78 8287.83 8205.55 8162.7461 68.307951
+ 23 8107.56 8358.74 8224.33 8186.1765 71.506301
No difference proven at 95.0% confidence
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
58% of mad(8) generated in shader-db are reading registers from the same
bank.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.
Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
V2: - Move from intel to core mesa.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.
This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace. Splitting it into "total" and "fs16" produces the
same output as before.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will let us emit it later, after we're setting up MRFs for the
URB write.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Fix silly bool handling, and don't add new tabs.
Reviewed-by: Kenneth Graunke <[email protected]>
|