| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For Vega10 and Raven that need a special workaround for the
scissor bug.
This seems to give a minor boost for Talos and Dota 2, at least.
To reduce the cost of memcmp, the driver checks if it's
really useful to do the comparison.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Needed for the following commit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_setname':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:66: undefined reference to `pthread_setname_np'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `thrd_join':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:336: undefined reference to `pthread_join'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:48: undefined reference to `pthread_sigmask'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `thrd_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:296: undefined reference to `pthread_create'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:50: undefined reference to `pthread_sigmask'
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:50: undefined reference to `pthread_sigmask'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `call_once':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:96: undefined reference to `pthread_once'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:84: undefined reference to `pthread_getcpuclockid'
collect2: error: ld returned 1 exit status
Reviewed-by: Adam Jackson <[email protected]>
Signed-off-by: Igor Gnatenko <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.
If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.
This improves performance by ~13% on an internal benchmark on Skylake.
v2: Use anv_cmd_buffer_get_depth_stencil_view().
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Respect the std430 rules for determining offset and size of struct
members when using a std430 buffer. std140 rules lead to wrong buffer
offsets in that case.
Fixes my test case attached in Bugzilla. No piglit changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104492
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
All the functionality is in. Maxwell will take a little bit more
enablement work.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
A part of the driver constbuf area is allocated for bindless images. Any
update requires uploading to all driver constbufs. This also extends the
driver constbuf to 64KB, up from 2KB.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This keeps a list of resident textures (per context), and dumps that
list into the active buffer list when submitting. We also treat bindless
texture fetches slightly differently, wrt the meaning of indirect, and
not requiring the SAMPLER file to be used.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In preparation for bindless images, we have to retrieve the
target/format info from the instruction directly, as there will be no
declaration. Furthermore, for bound images, this information is still
available in the instruction, so we can drop the declaration-based
mechanism entirely.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I'm fairly sure both of the changed sites are OK as-is, but they're
fragile, so this is just safening them up. Since this is happening
pre-ssa, we don't want to be overwriting values that may potentially get
used later on.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This is helpful for bindless, where changing TIC id's is undesirable.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If we free the bo, then the PTE may get deallocated immediately. We have
to make sure that the submission includes a ref to the old bo so that it
remains mapped for the duration of the command execution.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
intel_batchbuffer_emit_float is dead code, it should go.
intel_batchbuffer_emit_dword only had one user, which had bungled using
them by forgetting to call intel_batchbuffer_require_space first. So it
seems wise to delete these unsafe helpers.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
intel_batchbuffer_emit_dword doesn't reserve space for the DWord it
emits. In the past, we had some reserved batch space to ensure this
worked. With the switch to growing batches, we need to actually request
space so that we grow if necessary.
Fixes: 2c46a67b4138631217141f (i965: Delete BATCH_RESERVED handling.)
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
If asserts are disabled, you get pointless warnings about devinfo
being used (it's used to assert on devinfo->gen).
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
The enums are moved to the top and indented like the rest of the file.
Comments are added to split up the function aliases by corresponding
extension. This should make no functional difference.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Found by inspection.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.
This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.
Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
Signed-off-by: Alex Smith <[email protected]>
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to RadeonSI.
This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.
Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Using 4, as it is the default value on mesa. See mesa/main/config.h
and the following commit that introduced the value:
15ac66e331abdab12e882d80a6b4f647bc905298
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
ARB_transform_feedback3 sets a minimum of 1, ARB_gpu_shader5 a minimum
of 4. It shouldn't matter too much, so choosing the later.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Used to handle how many ubo you can define on the context. Minimimum
defined as 36 on ARB_uniform_buffer_object spec, up to 84 on OpenGL
4.6 (12 per stage at each moment).
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Every now and then I execute the standalone compiler, get the
non-version error, and need to remember what I'm doing wrong
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Simplifies the logic a little and asserts index is 0.
Suggested-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows us to pass the llvm param directly rather than looking
it up.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|