| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
It should account for the number of streamout outputs.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Streamout outputs are stored in the ESGS ring.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
It needs more space for multiple streams.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This allocates two BOs for GFX10 NGG streamout.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS as RadeonSI.
That option will be turned on once all issues are resolved.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
For NGG streamout which uses GDS.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Documentation for pipe_context::flush states:
"NOTE: use screen->fence_reference() (or equivalent) to transfer
new fence ref to **fence, to ensure that previous fence is unref'd"
Hence we need to unref previous out_fence.
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A filed of nir_variable.location may be equel to -1.
That may cause copying to invalid address of list-node,
making some internal fields corrupted.
Patch fixes segfault during freeing context due to
corrupted address of ralloc_header.destructor.
v2: copy data if var is constant (Connor Abbott)
CC: Caio Marcelo de Oliveira Filho <[email protected]>
Fixes: b6d475356846 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option strictly allocate the minImageCount given by the
application at swapchain creation.
This works around application that do not deal with the fact that the
implementation allocates more images than the minimum specified.
v2: Add values in default drirc (Bas)
v3: specify engine name/version (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: 19.2 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan applications can register with the following structure :
typedef struct VkApplicationInfo {
VkStructureType sType;
const void* pNext;
const char* pApplicationName;
uint32_t applicationVersion;
const char* pEngineName;
uint32_t engineVersion;
uint32_t apiVersion;
} VkApplicationInfo;
This enables the Vulkan implementations to apply workarounds based off
matching this description.
Here we add a new parameter for matching the driconfig options with
the following :
<device driver="anv">
<application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
<option name="blaaah" value="true" />
</application>
</device>
v2: switch engine name match to use regexps
v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)
v4: Add missing bit that went to the following commit (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Cc: 19.2 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We'll use this later for a new driconfig matching parameter.
v2: Avoid leak in device creation error case (Bas)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Cc: 19.2 <[email protected]>
|
|
|
|
|
|
|
| |
Also move the clearing of the bits out of if/else.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lepton Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was calloc'd to 0 which is PIPE_PRIM_POINTS, which means that we
fail to notice an initial primitive of points being new, and fail at
updating the "primitive is points or lines" field.
We do not need to reset this on device loss because we're tracking
the last primitive mode sent to us on the CPU via draw_vbo, not the
last primitive mode sent to the GPU.
Fixes several tests:
- dEQP-GLES3.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
Fixes: dcfca0af7c5 ("iris: Set XY Clipping correctly.")
|
|
|
|
|
|
|
|
|
|
| |
Add a ppir dummy node for nir_ssa_undef_instr, create a reg for it and mark
it as undefined, so that regalloc can set it non-interfering to avoid
register pressure.
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Vasily Khozuzhick <[email protected]>
Reviewed-by: Erico Nunes <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Andreas Baierl <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Erico Nunes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are about to patch panfrost_flush() to flush all pending batches,
not only the current one. In order to do that, we need to move the
'flush single batch' code to panfrost_batch_submit().
While at it, we get rid of the existing pipelining logic, which is
currently unused and replace it by an unconditional wait at the end of
panfrost_batch_submit(). A new pipeline logic will be introduced later
on.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
panfrost_flush() is about to be reworked to flush all pending batches,
but we want the fence to block on the last one. Let's move the fence
creation logic in panfrost_flush() to prepare for this situation.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
panfrost_draw_vbo() Might call the primeconvert/without_prim_restart
helpers which will enter the ->draw_vbo() again. Let's delay
payloads[].offset_start initialization so we don't initialize them
twice.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
panfrost_attach_vt_xxx() functions are now passed a batch, and the
generated FB desc is kept in panfrost_batch so we can switch FBs
without forcing a flush. The postfix->framebuffer field is restored
on the next attach_vt_framebuffer() call if the batch already has an
FB desc.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
So we can emit SET_VALUE jobs for a batch that's not currently bound
to the context.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We'll soon be able to flush a batch that's not currently bound to the
context, which means ctx->pipe_framebuffer will not necessarily be the
FBO targeted by the wallpaper draw. Let's prepare for this case and
use ctx->wallpaper_batch in panfrost_blit_wallpaper().
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
So we can emit such jobs to a batch that's not currently bound to the
context.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We need that if we want to upload transient buffers to a batch that's
not currently bound to the context, which in turn will be needed if we
want to relax the batch serialization we have right now (only flush
batches when we need to: on a flush request, or when one batch depends
on the result of other batches).
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rename panfrost_is_scanout() into panfrost_batch_is_scanout(), pass it
a batch instead of a context and move the code to pan_job.c.
With this in place, we can now test if a batch is targeting a scanout
FB even if this batch is not bound to the context.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Will be replaced by something similar but using a BOs as keys instead
of resources.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This way we have all the fb_state information directly attached to a
batch and can pass only the batch to functions emitting CMDs, which is
needed if we want to be able to queue CMDs to a batch that's not
currently bound to the context.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Indrajit Das <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mir_foreach_instr_in_block_safe() is based on list_for_each_entry_safe()
which is designed to protect against removal of the current entry, but
removing the entry placed just after the current one will lead to a
use-after-free situation.
Luckily, the midgard_pair_load_store() logic guarantees that the
instruction being removed (if any) is never placed just after ins which
in turn guarantees that the hidden __next variable always points to a
valid object.
Took me a bit of time to realize that this code was safe, so I'm
suggesting to get rid of the inner mir_foreach_instr_in_block_from()
loop and rework the code so that the removed instruction is always the
current one (which is what the list_for_each_entry_safe() API was
initially designed for).
While at it, we also get rid of the unecessary insert(ins)/remove(ins)
dance by simply moving the instruction around.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
list_for_each_entry() does not allow modifying the current item pointer.
Let's rework the skip-instructions logic in schedule_block() to not
break this rule.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The V3D documentation states that primitive counters are reset when
we emit Tile Binning Mode Configuration items, which we do at the start
of each draw call, however, in the actual hardware this doesn't seem to
take effect when transform feedback is not active (this doesn't happen in
the simulator). This causes a problem in the following scenario:
glBeginTransformFeedback()
glDrawArrays()
glPauseTransformFeedback()
glDrawArrays()
glResumeTransformFeedback()
glEndTransformFeedback()
The TF pause will trigger a flush of the primitive counters, which results
in a correct number of primitives up to that point. In theory, the counter
should then be reset when we execute the draw after pausing TF, but that
doesn't happen, and since TF is enabled again by the resume command before
we end recording, by the time we end the transform feedback recording we
again check the counters, but instead of reading 0, we read again the same
value we read at the time we paused, incorrectly accumulating that value
again.
In theory, we should be able to avoid this by using the other method to
reset the primitive counters: using operation 1 instead of 0 when we
flush the counts to the buffer at the time we pause, but again, this
doesn't seem to be work and we still see obsolete counts by the time we
end transform feedback.
This patch fixes the problem by not accumulating TF primitive counts
unless we know we have actually queued draw calls during transform
feedback, since that seems to effectively reset the counters. This should
also be more performant, since it saves unnecessary stalls for the
primitive counters to be updated when we know there haven't been any
new primitives drawn.
Fixes CTS tests:
dEQP-GLES3.functional.transform_feedback.*
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This was updating the counter for the indexed draw path only, but we are
already updating the counter for all paths a bit later, so this is only
duplicating counts for indexed paths.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Fixes: 0f2d1dfe65 ("v3d: use the GPU to record primitives written to transform feedback")
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
So we can better correlate different results to versions of the runner.
Signed-off-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
| |
We haven't updated in a long time, so better do it now and again when
5.3 is released.
Signed-off-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of running it with the Wayland platform, which introduces
unwanted dependencies and complexity.
Makes tests run 30% faster, as well.
Signed-off-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
streamout_buffers is assigned after that function, so the previous
fix was completely wrong. This probably fix something when streamout
buffers and push constants are used/inlined in the same shader.
Fixes: 378e2d24143 ("radv: fix computing number of user SGPRs for streamout buffers")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the UNDEF instruction was added, we didn't do anything special in
split_virtual_grfs. This mean that anything with an UNDEF wasn't
getting split which causes problems for the compiler. Among other
things, it makes RA harder because things are in bigger chunks. It also
meant that dvec4s weren't getting split which means that they are larger
than the maximum register size.
Shader-db results on Kaby Lake:
total instructions in shared programs: 14959202 -> 14960035 (<.01%)
instructions in affected programs: 96197 -> 97030 (0.87%)
helped: 140
HURT: 128
helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45%
HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1
HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50%
95% mean confidence interval for instructions value: -2.96 9.18
95% mean confidence interval for instructions %-change: -0.56% 1.51%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4372 -> 4372 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 352646771 -> 352840997 (0.06%)
cycles in affected programs: 218600800 -> 218795026 (0.09%)
helped: 21167
HURT: 21411
helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10
helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98%
HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10
HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06%
95% mean confidence interval for cycles value: 2.87 6.26
95% mean confidence interval for cycles %-change: 0.40% 0.55%
Cycles are HURT.
total spills in shared programs: 8840 -> 8953 (1.28%)
spills in affected programs: 126 -> 239 (89.68%)
helped: 1
HURT: 2
total fills in shared programs: 21782 -> 21914 (0.61%)
fills in affected programs: 431 -> 563 (30.63%)
helped: 1
HURT: 3
LOST: 0
GAINED: 5
Shader-db results on Haswell:
total instructions in shared programs: 13320918 -> 13320769 (<.01%)
instructions in affected programs: 40998 -> 40849 (-0.36%)
helped: 146
HURT: 56
helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22%
HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26%
95% mean confidence interval for instructions value: -1.26 -0.21
95% mean confidence interval for instructions %-change: -0.62% 0.77%
Inconclusive result (%-change mean confidence interval includes 0).
total loops in shared programs: 4373 -> 4373 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 374518258 -> 374384193 (-0.04%)
cycles in affected programs: 231101954 -> 230967889 (-0.06%)
helped: 21427
HURT: 19438
helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8
helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86%
HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8
HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80%
95% mean confidence interval for cycles value: -4.49 -2.07
95% mean confidence interval for cycles %-change: -0.14% -0.04%
Cycles are helped.
total spills in shared programs: 23406 -> 23411 (0.02%)
spills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
total fills in shared programs: 34845 -> 34850 (0.01%)
fills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
LOST: 0
GAINED: 0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566
Fixes: f4ef34f207d1 "intel/fs: Add an UNDEF instruction to avoid..."
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mesa_texstore_z32f_x24s8 calculates source rowStride at a
pace of 64-bit, this will make inaccuracy offset if the width
of src image is an odd number. Modify src pointer to int_32* as
source image format is gl_float which is 32-bit per pixel.
Reviewed by Ilia Mirkin
Signed-off-by: Jiadong Zhu <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The AnTuTu "garden" benchmark overflows the fixed size constbuffer
stateobject, so lets be more clever and calculate (a potentially
slightly pessimistic) actual size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Accidentally dropped in 4fdd455eeb7cffadee86f06c685005a3b64ce94b.
Fixes: 4fdd455e ("gallium: Require LLVM >= 3.4)
Reported-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 272f9cfe6a19 ("dri: Use DRM_FORMAT_* instead of defining our own copy.")
Reviewed-by: John Stultz <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
fd6_blitter.c:724:31: warning: passing argument 1 of ‘fd_resource_level_linear’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings back the fallback previously present in
st_nir_lookup_parameter_index(): if there's no parameter associated
with the variable, use a parameter from a variable with the same
prefix.
We'll have to sort out something for SPIR-V, but in the meantime let's
fix GLSL.
Fixes: b6384e57f5f ("mesa/st: Lookup parameters without using names")
Reviewed-by: Eric Anholt <[email protected]>
Tested-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This slot is always filled in with __glFillImage.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
"partial" because `nir_intrinsics_h` was missing.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
"partial" because `nir_intrinsics_h` was missing.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As introduced in "v3d: flag dirty state when binding new sampler states"
we need to add support for compute states. New flag VC5_DIRTY_COMPTEX and
VC5_DIRTY_UNCOMPILED_CS are introduced.
Reaching 33 flags at the dirty field forces us to change the type to
uint_64. Flags are reordered and empty continuous bits are available
for future pipeline stages.
v2: Update flag conditions to compile cs shader. (Eric Antholt)
Now dirty flags use uint_64t and flags are reordered.
Added VC5_DIRTY_UNCOMPILED_CS flag.
Reviewed-by: Eric Anholt <[email protected]>
|