| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We always uploaded them together, mostly out of laziness - both required
an additional vertex element. However, gl_VertexID now also requires an
additional vertex buffer for storing gl_BaseVertex; for non-indirect
draws this also means uploading (a small amount of) data. This is extra
overhead we don't need if the shader only uses gl_InstanceID.
In particular, our clear shaders currently use gl_InstanceID for doing
layered clears, but don't need gl_VertexID.
Signed-off-by: Kenneth Graunke <[email protected]>
Cc: "10.3" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the non-indirect draw case, we call intel_upload_data to upload
gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload
buffer, and increments the upload BO reference count.
So, we need to unreference it when making brw->draw.draw_params_bo point
at something else, or else we'll retain a reference to stale upload
buffers and hold on to them forever.
This also means that the indirect case should increment the reference
count on the indirect draw buffer when making brw->draw.draw_params_bo
point at it. That way, both paths increment the reference count, so
we can safely unreference it every time.
Signed-off-by: Kenneth Graunke <[email protected]>
Cc: "10.3" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
The _Enabled property already has the relevant information.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the source is not a GRF, it could have a register >= virtual_grf_count.
Accessing virtual_grf_end with such a register would lead to
out-of-bounds access. Make sure the source is a GRF before accessing
virtual_grf_end.
Fixes Valgrind complaints while compiling some shaders.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage.
From ARB_transform_feedback3:
"When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the Discarding Rasterization stage (see Section 3.x)
right before rasterization. This counter is incremented whether or not
transform feedback is active."
Unfortunately, we don't have any registers that provide the number of primitives
written to a specific stream other than the ones that track the number of
primitives written to transform feedback in the SOL stage, so we can't
implement this exactly as specified.
In the past we implemented this feature by activating the SOL unit even if
transform feeback was disabled, but making it so that all buffers were
disabled and it only recorded statistics, which gave us the right semantics
(see 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5). Unfortunately, this came with
a significant performance impact and had to be reverted.
This new take does not intend to implement the exact semantics required by
the spec, but improves what we have now, since now we return the primitive
count for stream 0 in all cases. With this patch we use
GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
for non-zero streams. This would return the number of primitives written
to transform feedback for each stream instead. Since non-zero streams are
only useful in combination with transform feedback this should not be too
bad, and the only case that I think we would not be supporting would be
the one in which we want to use both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
detect buffer overflow.
This patch also fixes the following piglit test:
arb_gpu_shader5-xfb-streams-without-invocations
This test uses both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
does never hit the overflow case, so both queries are always expected to return
the same value.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently guardband clipping doesn't work like we thought: objects
entirely outside fthe guardband are trivially rejected, regardless of
their relation to the viewport. Normally, the guardband is larger than
the viewport, so this is not a problem. However, when the viewport is
larger than the guardband, this means that we would discard primitives
which were wholly outside of the guardband, but still visible.
We always program the guardband to 8K x 8K to enforce the restriction
that the screenspace bounding box of a single triangle must be no more
than 8K x 8K. So, if the viewport is larger than that, we need to
disable guardband clipping.
Fixes ES3 conformance tests:
- framebuffer_blit_functionality_negative_height_blit
- framebuffer_blit_functionality_negative_width_blit
- framebuffer_blit_functionality_negative_dimensions_blit
- framebuffer_blit_functionality_magnifying_blit
- framebuffer_blit_functionality_multisampled_to_singlesampled_blit
v2: Mention the acronym expansion for TA/TR/MC in the comments.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the (new) piglit tests gles-3.0-drawarrays-vertexid,
gl-3.0-multidrawarrays-vertexid, and gl-3.2-basevertex-vertexid.
Fixes gles3conform failure in:
ES3-CTS.gtf.GL3Tests.transform_feedback.transform_feedback_vertex_id
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80247
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have the data available, we need to expose it to the
shaders. We can reuse the same vertex element that we use for
gl_VertexID, but we need to back it by an actual vertex buffer.
A hardware restriction requires that vertex attributes coming from a
buffer (STORE_SRC) must come before any other types (i.e. STORE_0).
So, we have to make gl_BaseVertex be the .x component of the vertex
attribute. This means moving gl_VertexID to a different component.
I chose to move gl_VertexID and gl_InstanceID to the .z and .w
components, respectively, to make room for gl_BaseInstance in the .y
component (which would also come from a buffer, and therefore be
STORE_SRC).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We'll need to emit another VERTEX_BUFFER_STATE for gl_BaseVertex;
pulling this into a helper function will save us from having to deal
with cross-generation differences in that code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will be used for GL_ARB_shader_draw_parameters, as well as fixing
gl_VertexID, which is supposed to include gl_BaseVertex's value.
For indirect draws, we simply point at the indirect buffer; for normal
draws, we upload the value via the upload buffer.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Let's handle LIBGL_DEBUG env. variable in Mesa in a consistent way.
Fixes: https://bugzilla.novell.com/show_bug.cgi?id=895730
Signed-off-by: Stefan Dirsch <[email protected]>
Reviewed-by: Courtney Goeltzenleuchter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UBO loads can be boolean-valued expressions, too, so we need to handle
them in emit_bool_to_cond_code() and emit_if_gen6().
However, unlike most expressions, it doesn't make sense to evaluate
their operands, then do something with the results. We just want to
evaluate the UBO load as a whole---which performs the read from
memory---then load the boolean result into the flag register.
Instead of adding code to handle it, we can simply bypass the
ir_expression handling, and fall through to the default code, which will
do exactly that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83468
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Matt and I believe that Sandybridge actually uses 0xFFFFFFFF for a
"true" comparison result, similar to Ivybridge. This matches the
internal documentation, and empirical results, but contradicts the PRM.
So, the comment is inaccurate, and we can actually just handle these
directly without ever needing to fall through to the condition code
path.
Also, the vec4 backend has always done it this way, and has apparently
been working fine. This patch makes the FS backend match the vec4
backend's behavior.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added ir_triop_csel, and forgot again
when adding it to emit_bool_to_cond_code.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool on Sandybridge.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Instead we cast backend_visitor::prog for fragment shader specific code paths.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
gl_program* is named prog similar to backend_visitor.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This matches backend_visitor, and will allow gl_program to be named prog.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will allow for stage specific code paths.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The scale factors for the resolve rectangle change for BDW and we have
to look at brw->gen now to figure out how big it should be.
Fixes: https://bugs.freedesktop.org/attachment.cgi?id=105777
Cc: "10.3" <[email protected]>
Signed-off-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Uniform values are in the UNIFORM register file, not the GRF register file.
Looking in virtual_grf_sizes makes no sense and only makes the output of
dump_instructions confusing.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a texture is wrapped in a texture view, we can't trust the format in
the miptree itself. This patch allows us to pass the format seperately
through blorp so we can proprerly handled wrapped textures.
It's worth noting here that we can use the miptree format directly for
depth/stencil formats because they cannot be reinterpreted by a texture
view.
Signed-off-by: Jason Ekstrand <[email protected]>
CC: "10.3" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.
But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)
total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen no where else.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:
brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.
Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The dump() methods don't alter the CFG or basic blocks, so we should
mark them as const. This lets you call them even if you have a const
cfg_t - which is the case in certain portions of the code (such as live
interval handling).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
I think this bug crept in only recently.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
If the ENDIF instruction was the only instruction in its block, we'd
leave the successors of the merged if+jump block in a bad state.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83080
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Reduce fs_visitor's dependence on gl_fragment_program.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This common init routine can be used by constructors for multiple program
types.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 32f2fd1c5d6088692551c80352b7d6fa35b0cd09, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation, (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added it.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "10.3" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
Tested-by: Tapani Pälli <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "10.3" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
Tested-by: Tapani Pälli <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were using the source images level for both source and
destination. Also, we weren't taking the MinLevel from a potential texture
view into account. This commit fixes both problems.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "10.3" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
Tested-by: Tapani Pälli <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|