| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Having this separate could potentially make programs that rebind atomics
but no other surfaces ever so slightly faster. But it's a tiny amount
of code to add to the existing UBO/SSBO atom, and very related.
The extra atoms have a cost on every draw call, and so dropping some of
them would be nice. This also reclaims a dirty bit.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.
The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes the missing AutomaticSize handling in the ABO code, removes
a bunch of duplicated code, and drops an extra layer of wrapping around
brw_emit_buffer_surface_state().
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This should reduce the overhead of adding a BO to the current
list, especially when the list is huge. Also, when a new pipeline
is bound, we only need to update the descriptor, the buffer objects
should already be in the list.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
I don't think we will need a 64-bit unsigned integer for the
dirty flags in the future, and there is still 20 bits left.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
VkDeviceQueueCreateInfo::queueFamilyIndex is an unsigned 32-bit
integer.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
radv_fill_buffer() ensures that the image BO is added to the list.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
radv_fill_buffer() ensures that the image BO is added to the list.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045feeef8cad155f1c9aa07f166e146e3d00), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).
v2: set it in all shader stages.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.
For images I will port the rest of the code.
Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
We need a similar thing for indirect draws.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Cc: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.
Signed-off-by: Anuj Phogat <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We already have this workaround in OpenGL driver.
See Mesa commit 3cf4fe2219.
Signed-off-by: Anuj Phogat <[email protected]>
Cc: Nanley Chery <[email protected]>
Cc: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Stumbled into this when adding a new PIPE_CAP.
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit e8c9e65185de3e821e1e482e77906d1d51efa3ec.
With the actual bug fixed (by commit 6ac2d1690192), this is not
necessary. I'm doubtful of its correctness in any case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.
For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
|
|
|
|
|
|
|
|
| |
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.
Reviewed-by: Bruce Cherniak <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
Speed up avx512 platforms; fixes performance regression caused
by swithc to simdlib.
Reviewed-by: Bruce Cherniak <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients
will die with a NULL derefence in wl_proxy_add_listener.
Attempt to provide a simple fallback to keep ancient systems working.
Fixes: 6595c699511 ("egl/wayland: Remove more surface specifics from
create_wl_buffer")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519
Signed-off-by: Derek Foreman <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Acked-by: Daniel Stone <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
v2: remove r600_can_dma_copy_buffer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Already implemented for Gallium drivers.
Useful for gbm_bo_(un)map.
Tests:
By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
kmscube --mode=rgba
kmscube --mode=nv12-1img
kmscube --mode=nv12-2img
piglit ext_image_dma_buf_import-refcount -auto
piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto
v2: add early return if (flag & MAP_INTERNAL_MASK)
v3: take input rect into account and test with kmscube and piglit.
v4: handle wraparound and bo reference.
v5: indent, exclude 0 width and height on the boundary, map bo
independently of the image.
Signed-off-by: Julien Isorce <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It seems safe and it improves performance by +4% (73->76).
A drirc based solution is not what we want for now, keep it
simple and improve later if it's really needed.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Introduced on commit 157c9a13414b524ce98ea0ea07fce819efc1ba65
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise we leak it.
Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise we can leak the old syncobj.
Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we just had one hash set for tracking depth and render
caches called brw_context::render_cache. This is less than ideal
because the depth and render caches are separate and we can't track
moves between the depth and the render caches. This limitation led
to some unnecessary flushing around the depth cache. There are cases
(mostly with BLORP) where we can end up touching a depth or stencil
buffer through the render cache. To guard against this, blorp would
unconditionally do a render_cache_set_check_flush on it's destination
which meant that if you did any rendering (including a BLORP operation)
to a given surface and then used it as a blorp destination, you would
end up flushing it out of the render cache before rendering into it.
Things get worse when you dig into the depth/stencil state code for
regular GL draw calls. Because we may end up rendering to a depth
or stencil buffer via BLORP, we did a render_cache_set_check_flush on
all depth and stencil buffers in brw_emit_depthbuffer to ensure that
they got flushed out of the render cache prior to using them for depth
or stencil testing. However, because we also need to track dirtiness
for depth and stencil so that we can implement depth and stencil
texturing correctly, we were adding all depth and stencil buffers to the
render cache set in brw_postdraw_set_buffers_need_resolve. This meant
that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted
(currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would
almost always do a full pipeline stall and render/depth cache flush.
The root cause of both of these problems is that we can't tell the
difference between the render and depth caches in our tracking. This
commit splits our cache tracking into two sets, one for render and one
for depth, and properly handles transitioning between the two. We still
flush all the caches whenever anything needs to be flushed. The idea is
that if we're going to take the hit of a flush and stall, we may as well
flush everything in the hopes that we can avoid a flush by something
else later.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Right now we just always flush the destination for render and aren't
particularly careful about depth or stencil. Soon, flush_for_render
isn't going to do the same thing as flush_for_depth and we may be doing
a good deal less depth flushing so we should be a bit more precise.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
In theory, this will let us track the depth and render caches
separately. Right now, they're just wrappers around
brw_render_cache_set_*
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We may access them as a texture using blorp regardless of whether or not
stencil texturing is enabled.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|