| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:
- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
INVALID_OPERATION.
Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.
V2: - Cover missing error cases (unsized formats; texture object zero)
Signed-off-by: Chris Forbes <[email protected]>
[V1] Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This is about to be used in teximagemultisample() when immutable=true.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
These haven't been used since we deleted NV_vertex_program support.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are intentionally not allocating a slot for gl_ClipVertex. But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <[email protected]>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
Alessandro Pignotti noted when I added this code in commit
0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for
"if (busy)", so this debug print couldn't happen.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader. Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue. This is normally good.
However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp. The frontend emits IR for this just before
FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering:
1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue
This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written. This obviously
led to inaccurate results.
This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections. This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:
1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue
For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 [mattst88]:
- Rebase.
- #define GL_ARB_texture_query_lod to 1.
- Remove comma after ir_lod in ir.h for MSVC.
- Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
opt_tree_grafting.cpp.
- Rename textureQueryLOD to textureQueryLod, see
https://www.khronos.org/bugzilla/show_bug.cgi?id=821
- Fix ir_reader of (lod ...).
v3 [mattst88]:
- Rename textureQueryLod to textureQueryLOD, pending resolution of
Khronos 821.
- Add ir_lod case to ir_to_mesa.cpp.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x before
+ after
+------------------------------------------------------------------------------+
| x x + |
| xx ++ x + |
| xx ++ + xx ++ |
|x xxx x+++++ + xxx x*x+*+++ + x +|
| |_____|____________A______A____M____M_|_______| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 23 8083.78 8287.83 8205.55 8162.7461 68.307951
+ 23 8107.56 8358.74 8224.33 8186.1765 71.506301
No difference proven at 95.0% confidence
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
58% of mad(8) generated in shader-db are reading registers from the same
bank.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.
Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
V2: - Move from intel to core mesa.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.
This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace. Splitting it into "total" and "fs16" produces the
same output as before.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will let us emit it later, after we're setting up MRFs for the
URB write.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
v2: Fix silly bool handling, and don't add new tabs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear. I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This will let us do much better printouts for non-GLSL programs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Oops, I thought I had removed all debugging code.
|
|
|
|
|
|
| |
Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.
That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Avoid creating arrays if we replace indirect addressing anyway.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
| |
https://bugs.freedesktop.org/show_bug.cgi?id=62573
Tested-by: Andreas Boll <[email protected]>
|
|
|
|
|
| |
Postprocessing is an internal meta op and should restore the states
it changes.
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
target-specific variables are undefined when used as pre-requisites.
instead, use secondary-expansion.
I noticed this when building the patch:
i965: Add a driconf option to disable flush throttling
Signed-off-by: Adrian Marius Negreanu <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes the arrays in brw_vue_map (which only ever contain
values from -1 to 58) from ints to signed chars. This reduces the
size of the struct from 488 bytes to 136 bytes.
Reviewed-by: Kenneth Graunke <[email protected]>
v2: fix STATIC_ASSERT to use 127 instead of 128.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies post-GS pipeline stages (transform feedback, clip,
sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather
than brw->vs.prog_data->vue_map. This ensures that when geometry
shader support is added, these pipeline stages will consult the
geometry shader output VUE map when appropriate, rather than the
vertex shader output VUE map.
v2: Fixed some stale "CACHE_NEW_VS_PROG" comments.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the GPU pipeline has one active VUE map in effect at any
given time--the one representing the layout of vertex data coming from
the vertex shader. However, when geometry shaders are added, they
will have their own independent VUE map. Later pipeline stages (clip,
sf, fs) will need to consult the geometry shader VUE map if a geometry
shader is in use, and the vertex shader VUE map otherwise.
This patch adds a new field to brw_context, vue_map_geom_out, which
contains the VUE map that should be used by later pipeline stages. It
also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is
signalled whenever the contents of the VUE map changes.
Since we don't support geometry shaders yet, vue_map_geom_out is
currently set only by the brw_vs_prog state atom.
v2: Don't set vue_map_geom_out in do_vs_prog--that's redundant and
possibly problematic for precompiles. Only set it in
brw_upload_vs_prog. Also, make a copy instead of using a
pointer--this makes it possible to detect when the VUE map hasn't
changed, so we can avoid redundant state uploads.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Future patches will allow for there to be separate VUE maps when both
a geometry shader and a vertex shader are in use. When this happens,
we will want to have correspondingly separate outputs_written
bitfields. Moving outputs_written into the VUE map will make this
easy.
For consistency with the terminology used in the VUE map, the bitfield
is renamed to "slots_valid" in the process.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen7 adds mask bits to the message header for a URB write which allow
the write to apply only to certain channels. We don't use this
functionality, so to ensure that the entire write always occurs, we
emit an OR instruction to set the mask bits.
With the advent of geometry shaders, URB writes won't just happen at
the end of a thread; they will happen in mid-thread too. Thus, we can
no longer rely on channel 0 being enabled, so we need to emit the OR
instruction in WE_all mode to ensure that it is executed.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The new name clarifies that it represents *one more* than the maximum
possible brw_varying_slot value.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the terminology "vert_result" from the i965 driver,
replacing it with "varying". The old terminology, "vert_result", was
confusing because (a) it referred to the enum gl_vert_result, which no
longer exists (it was replaced with gl_varying_slot), and (b) it
implied a vertex output, but with the advent of geometry shaders, it
could be either a vertex or a geometry output, depending what shaders
are in use. The generic term "varying" is less confusing.
No functional change.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
v2: Whitespace fixes.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bump MAX_DEPTH_TEXTURE_SAMPLES to match what GetInternalformativ is
claiming. Since that limit is what is actually enforced now, this
doesn't actually change anything except the queried value.
There's still no piglits verifying that multisample depth textures work,
but this works in the Unigine demos.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extends _mesa_check_sample_count() to properly support the
TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets, which
have subtly different limits than renderbuffers.
This resolves the remaining TODO in the implementation of
TexImage*DMultisample.
V2: - Don't introduce spurious block.
- Do this in multisample.c instead.
- Fix typo in error message.
- Inline spec quotes
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pulls the checking of the sample count into a helper function, and
extends the existing logic to include the interactions with both
ARB_texture_multisample and ARB_internalformat_query.
_mesa_check_sample_count() checks a desired sample count against a
a combination of target/internalformat, and returns the error enum
to be produced, if any. Unfortunately the conditions are messy and the
errors vary.
V2: - Tidy up spurious block.
- Move _mesa_check_sample_count() to multisample.c instead; It
doesn't really belong in fbobject.c or teximage.c.
- Inlined spec quotes
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we support ARB_texture_multisample, there are multiple targets
accepted for this query, and they may have target-dependent limits, so
pass the target to the driverfunc.
For example, the sampling hardware may not be able to do general
texelFetch() for some format/sample count combination, but the driver
may still be able to implement a reasonable resolve operation, so it can
be supported for renderbuffers.
V2: - Don't break Gallium compile.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|