| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Function _mesa_delete_transform_feedback_object called from within
drivers once driver-specific clean-up has been done. Brings into
conformity with how other GL objects are handled.
CC: Eric Anholt <[email protected]>
CC: Kenneth Graunke <[email protected]>
Signed-off-by: Yevhenii Kolesnikov <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Some leaks detected with GL_KHR_debug on i965.
CC: Timothy Arceri <[email protected]>
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're planning to start managing the PPGTT in userspace in the near
future, rather than relying on the kernel to assign addresses. While
most buffers can go anywhere, some need to be restricted to within 4GB
of a base address.
This commit adds a "memory zone" parameter to the BO allocation
functions, which lets the caller specify which base address the BO will
be associated with, or BRW_MEMZONE_OTHER for the full 48-bit VMA.
Eventually, I hope to create a 4GB memory zone corresponding to each
state base address.
Reviewed-by: Scott D Phillips <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
brw_bo_alloc no longer uses this parameter, so there's no point.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Improves performance of SynMark2 OglGSCloth by a further 9.65%±0.59%
due to the reduction in overwraps of the primitive count buffer that
lead to a CPU stall on previous rendering. Cummulative performance
improvement from the series 81.50% ±0.96% (data gathered on VLV).
Tested-By: Eero Tamminen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
begin/end block.
This allows us to aggregate the primitive counts of a completed
transform feedback begin/end block lazily, which in the most typical
case (where glDrawTransformFeedback is not used) will allow us to
avoid aggregating the primitive counters on the CPU altogether,
preventing a stall on previous rendering during
glBeginTransformFeedback(), which dramatically improves performance of
applications that rely heavily on transform feedback.
Improves performance of SynMark2 OglGSCloth by 65.52% ±0.25% (data
gathered on VLV).
Tested-By: Eero Tamminen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A primitive counter encapsulates a scalar aggregating counter for each
vertex stream along with a section within the primitive tally buffer
which hasn't been read out yet. Defining this as a separate type will
allow us to keep multiple counter objects around for the same
transform feedback object without any code duplication.
Tested-By: Eero Tamminen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
This makes it easier to loop over programs.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can encapsulate the logic for choosing the mapping type. This will
also help when we add WC mappings.
A few functional changes are made in this patch. On non-LLC, what were
previously WB mappings are now GTT mappings (in the prefilling debug
code in brw_performance_query.c; the shader_time code in brw_program.c;
and in the case of an RW mapping in intel_buffer_objects.c).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_bo_map_cpu() took a write_enable arg, but it wasn't always clear
whether we were also planning to read from the buffer. I kept everything
semantically identical by passing only MAP_READ or MAP_READ | MAP_WRITE
depending on the write_enable argument.
The other flags are not used yet, but MAP_ASYNC for instance, will be
used in a later patch to remove the need for a separate
brw_bo_map_unsynchronized() function.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
I'm going to make a new function named brw_bo_map() in a later patch
that is responsible for choosing the mapping type, so this patch clears
the way.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Just return the map from brw_map_bo_*
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This restores the performance warnings removed in:
i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
but adds them for nearly all BO mapping, and also for wait_rendering.
Because we add this to the core bufmgr, we automatically get stall
warnings in all callers, unlike before where only a few callsites used
the wrappers that gave stall warnings.
We also do it a bit differently: we simply measure how long set_domain
takes (the part that stalls), and complain if it's more than 0.01 ms.
We don't bother calling brw_bo_busy(), and we don't measure the mmap
time (which doesn't stall). This should be more accurate.
Reviewed-by: Daniel Vetter <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The bacon is all gone.
This renames both the class and the related functions. We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We'll want to change the implementation of this shortly.
Reviewed-by: Chris Wilson <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Now we can actually test our changes.
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We create the BO when creating a transform feedback object, and only
destroy it when deleting that object. So it won't be NULL.
CID: 1401410
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat
decoding (deleted recently).
While we're changing parameters, delete the wrapper macro and make the
actual function brw_state_batch instead of __brw_state_batch.
This subsumes a patch by Emil Velikov to drop this from BLORP.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're at it, we also change the GEN6 binding macro to be a start
index that gets added to the binding. This makes things a bit more
explicit.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused
GLES 3.0 test. When resuming the transform feedback object, we need to
reset the SVBI counters so we continue writing at the correct point in
the buffer.
Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the
Streamed Vertex Buffer Index (SVBI) counters, which contain a count of
vertices emitted.
Unfortunately, there's no straightforward way to store the current SVBI
counter values to a buffer. They're not available in a register. You
can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another
internal counter which 3DPRIMITIVE can use...but there's no good way to
extract that either.
So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex
numbers. Thankfully, we can reuse most of the existing Gen7+ code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
I'm going to need this in a new Resume hook shortly.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Sandybridge and earlier only have a single counter.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
This way on Sandybridge we'll only do 1 stream worth of math, since
we only have one SO_NUM_PRIMS_WRITTEN counter.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I plan to use these functions on Sandybridge soon. I changed the prefix
on a couple of functions to "brw" instead of "gen7" as in theory they
should be usable all the way back to G45.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.
Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.
Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the infomation about the current program in use which is still
valid.
V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.
Acked-by: Edward O'Callaghan <[email protected]>
To avoid build regressions the following 2 patches were squashed in to
this commit:
mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()
These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.
Reviewed-by: Lionel Landwerlin <[email protected]>
Acked-by: Edward O'Callaghan <[email protected]>
mesa: update active relinked program
This likely fixes a subroutine bug were
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.
Acked-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
| |
gl_shader_program
This will allow us to make the CurrentProgram array store gl_program which allows
us to do a bunch of simplifications.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will help allow us to store gl_program in the CurrentProgram array rather
than gl_shader_program which will allow a bunch of simplifications.
Note that we make LinkedTransformFeedback a pointer so we don't waste
memory creating a struct for each stage. We also store a pointer to
the gl_program that will contain the pointer in gl_shader_program so
we can get easy access to the correct stage.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We have already set the gl_shader_program pointer to the correct
shader program in _mesa_BeginTransformFeedback() so use it.
This is more consistent with how we do it for gen7.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]
|
|
|
|
|
|
|
| |
This will be used in a following patch to implement interface
query support for TRANSFORM_FEEDBACK_BUFFER.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to print the correct binding point when not all
buffers declared in the shader are bound.
For example if we use a single buffer:
layout(xfb_buffer=2, offset=0) out vec4 v;
We now print '2' when the buffer is not bound rather than '0'.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A while back, we moved to directly emitting the Gen7+ state when
constructing the binding tables. These flags are only used on
Gen4-6, which emit all the binding table pointers at once.
We gain nothing by having separate flags, so combine them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sandybridge requires the post-sync non-zero workaround in a ton of
places, and if you ever miss one, the GPU usually hangs.
Currently, we try to track exactly when a workaround flush is
necessary (via the brw->batch.need_workaround_flush flag). This is
tricky to get right, and we've botched it several times in the past.
This patch unconditionally performs the post-sync non-zero flush at the
start of each primitive's state upload (including BLORP). We drop the
needs_workaround_flush flag, and drop all the other callers, as the
flush has already been performed.
We have no data to indicate that simply flushing all the time will
hurt performance, and it has the potential to help stability.
v2: Add post-sync workaround to initial GPU state upload to be extra
cautious (suggested by Chad Versace).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBegintransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's been merged into brw_state_flags::brw for simplicity and
efficiency.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
setup the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part if its run() method.
To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.
Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback, however, in the case of a
user-provided geometry shader, we can't only look into transform feedback
to make that decision.
This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:
bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Basically a sed but shaderapi.c and get.c.
get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior
shaderapi.c => the old api stil update the Shader object directly
V2: formatting improvement
V3 (idr):
* Rebase fixes after a block of code was moved from ir_to_mesa.cpp to
shaderapi.c.
* Trivial reformatting.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
array.
These are replaced with
ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}].
In patches to follow, this will allow us to replace a lot of ad-hoc
logic with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \
-print0 | xargs -0 sed -i \
-e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \
-e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \
-e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g'
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implementing the GetTransformFeedbackVertexCount() driver hook allows
the VBO module to call us with the right number of vertices.
The hardware doesn't directly count the number of vertices written by
SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply
by the number of vertices per primitive.
Unfortunately, counting the number of primitives generated is tricky:
a program might pause a transform feedback operation, start a second one
with a different object, then switch back and resume. Both transform
feedback operations share the SO_NUM_PRIMS_WRITTEN counters.
To work around this, we save the counter values at Begin, Pause, Resume,
and End. This "bookends" each section where transform feedback is
active for the current object. Adding up differences of pairs gives
us the number of primitives generated. (This is similar to what we
do for occlusion queries on platforms without hardware contexts.)
v2: Fix missing parenthesis in assertion (caught by Eric Anholt).
v3: Reuse prim_count_bo rather than freeing it and immediately
allocating a new one (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ARB_transform_feedback2 extension introduces the ability to pause
and resume transform feedback sessions. Although only one can be active
at a time, it's possible to switch between multiple transform feedback
objects while paused.
In order to facilitate this, we need to save/restore the SO_WRITE_OFFSET
registers so that after resuming, the GPU continues writing where it
left off.
This functionality also exists in ES 3.0, but somehow we completely
forgot to implement it.
v2: Reduce alignment from 4096 to 64 (it seemed excessive).
v3: Use I915_GEM_DOMAIN_INSTRUCTION instead of RENDER, for consistency
with other writes. It shouldn't matter on IVB+.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|