| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
The previous code was confused about whether slot_end was inclusive or
exclusive. Make it so that it is inclusive consistently, and use it for
setting the new location. This also avoids discrepancies in how
num_components is calculated vs the more manual approach taken for the
former reserved_slots check.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
This setting is for whether color and alpha have different blend
settings, not for whether blending is enabled on a per-RT basis.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
With ARB_clip_control, GL may also do 0..1 depth clipping, not just
-1..1. This removes clip's reliance on driver type. DX users will need
to be updated to set the new clipHalfZ flag to get proper clipping
functionality.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Most logic op usage is probably going to end up with normalized
textures. Scale the floating point values and convert to integer before
performing the logic operations.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Fixes piglit glsl-fs-shadow2D-clamp-z.
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jonas Pfeil noticed that we were putting passthrough tlb_z writes early in
the shader, despite QIR and QPU scheduling both trying to delay scoreboard
locking for as long as possible.
The problem was that when trying to pair up QPU instructions, at some
point the passthrough tlb_z would be the last one available and it would
get paired, even if the other half would open up other instructions to be
scheduled and we could have paired tlb_z with something later in the
program. Also, since passthrough z is just a mov, it pairs up really
easily.
The proper fix would probably be to flip the order of scheduling
instructions so we went from bottom to top (also relevant for branch delay
slot scheduling).
However, we can do a quick fix here to just not schedule a TLB lock until
there's nothing but TLB left in the program, at a slight instruction cost
(est .61% cycle count in shader-db) but a major fragment shader
parallelism win.
glmark2 results:
texture:texture-filter=linear: +1.24481% +/- 0.626117% (n=15)
bump:bump-render=height: 1.24991% +/- 0.154793% (n=136,133 -- screensaver
outliers removed)
|
| |
|
|
|
|
|
|
|
|
|
| |
It's much better to just skip the draw call entirely. Getting this
information out of register allocation will also be useful for
implementing threaded fragment shaders, which will need to retry
non-threaded if RA fails.
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had missed a bit of errata - PS scratch needs to be computed as if
there were 4 subslices per slice, rather than 3.
Skylake Broxton Kabylake
GT1 GT2 GT3 GT4 2x6 3x6 GT1 GT1.5 GT2 GT3 GT4
Actual Slices 1 1 2 3 1 1 1 1 1 2 3
Total Subslices 3 3 6 9 2 3 2 3 3 6 9
Subsl. for PS Scratch 4 4 8 12 4 4 4 4 4 8 12
Note that Skylake GT1-3 already worked because we allocated 64 * 9
(trying to use a value that would work on GT4, with 9 subslices),
and the actual required values were 64 * 4 or 64 * 8. However, all
others (Skylake GT4, Broxton, and Kabylake GT1-4) underallocated,
which can lead to scratch writes trashing random process memory,
and rendering corruption or GPU hangs.
Fixes GPU hangs and rendering corruption on Skylake GT4 in shaders that
spill. Particularly, dEQP-GLES31.functional.ubo.all_per_block_buffers.*
now runs successfully with no hangs and renders correctly. This may
fix problems on Broxton and Kabylake as well.
Cc: "13.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Half float support already exists for desktop GL. Reuse the
ARB_half_float_vertex enable bit and account for the different enum to
enable the extension for ES2.
Signed-off-by: Kevin Strasser <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 3652d1d5942a857f225700d67ce2c900396982f2.
Self nack/reject on this one. The base.ConfigID is overwritten
immediately after we store the current value, thus one memcpy [further
down] the wrong value will be copied.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on a patch by George Kyriazis but changed to test for
_MSC_VER >= 1800 (Visual Studio 2015).
This fixes the failed CANARY assertion in src/util/ralloc.c:get_header()
on Windows.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98595
Tested-by: Brian Paul <[email protected]>
Signed-off-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Port of the anv commit d96345de989 ("anv: Suffix the intel_icd file with
the host CPU").
v2: s/intel_icd/radeon_icd/ in commit summary (Gražvydas)
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Dave Airlie <[email protected]> (IRC)
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Dave Airlie <[email protected]> (IRC)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan has introduced the consept of .specVersion which can be used to
attribute changes of the said extension.
The current loader does not check the value, thus it have gone unnoticed
that the driver exposes an old version of the following extensions:
VK_KHR_xcb_surface (Rev 6)
VK_KHR_xlib_surface (Rev 6)
VK_KHR_wayland_surface (Rev 5)
- Updated the surface create function to take a pCreateInfo structure
VK_KHR_swapchain (Rev 68)
- Moved the "validity" include for vkAcquireNextImage to be in its proper
place, after the prototype and list of parameters.
...
According to the documentation:
* pname:specVersion is the version of this extension.
It is an integer, incremented with backward compatible changes.
Based on the history of vk.xml the above (latest) revision has been
available since Vulkan 1.0 so even if they were any backwards
incompatible change(s) [as hinted by the revision log] those should be
safe.
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The use of regparm causes an error on arm/arm64 builds with clang.
fastcall is allowed, but still throws a warning. As both options only
have effect on 32-bit x86 builds, limit them to that case.
v2: keep the __i386__ within GCC (Nicolai)
Cc: 13.0 <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Nicolai Hähnle <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if one uses a non-default prefix, the path won't get
propagated and we'll fail at link-time.
A very quick and easy example is to install to /usr/local.
At this point, llvm-config will be picked even without the
--with-llvm-prefix, but regardless of the latter linking will fail.
Currently people can workaround that via LD_LIBRARY_PATH.
Cc: "12.0 13.0" <[email protected]>
Cc: Tom Stellard <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Currently we only saved the id to memcpy the whole _EGLConfig to write
back the exact same id value.
Remove the unneeded and confusing/misleading code.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the linked per-stage shaders are processed, mark any block that has a
field that is accessed as referenced. When combining all the linked
shaders, combine the per-stage stageref masks.
This fixes a number of GLES CTS tests:
ES31-CTS.core.geometry_shader.program_resource.program_resource
ES32-CTS.core.geometry_shader.program_resource.program_resource
ESEXT-CTS.geometry_shader.program_resource.program_resource
piglit.gl45-cts.geometry_shader.program_resource.program_resource
However, it makes quite a few more fail:
ES31-CTS.functional.program_interface_query.buffer_variable.random.6
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.random.6
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float
I have diagnosed the failures, but I'm not sure whether we or the
tests are wrong. After optimizations are applied, all of the tests
are of the form:
buffer X {
float f;
} x;
void main()
{
x.f = x.f;
}
The test then queries that x is referenced by that shader stage. We
eliminate the assignment of x.f to itself, and that removes the last
reference to x. We report that x is not referenced, and the test fails.
I do not know whether or not we are allowed to eliminate that assignment
of x.f to itself.
After discussions with the OpenGL ES group in Khronos, we believe that
Mesa's behavior is correct. I will provide patches to the CTS tests
to Khronos.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Revert the unreachable to assert in
parcel_out_uniform_storage::visit_field. Suggested by Ilia.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It was not obvious from the just the .h file what the hash table
contained. It was also not obvious that get_variable_entry would create
a new entry in the hash table.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Applies on top of v3 of Tom's gallivm change.
v2:
- Tom Stellard: Use enums instread of strings.
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Aaron Watry <[email protected]>
CC: Tom Stellard <[email protected]>
CC: Jan Vesely <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
Fix adding parameter attributes with LLVM < 4.0.
v3:
Fix typo.
Fix parameter index.
Add a gallivm enum for function attributes.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
if a fence is created pre-signaled we should return that
in GetFenceStatus even if it hasn't been submitted.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Gustaw Smolarczyk <[email protected]>
Cc: "13.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bunch of GPU hangs introduced in some CTS
tests like
dEQP-VK.memory.pipeline_barrier.host_write_uniform_buffer.65536
It works around an issue seen in the LLVM backend, but
also makes the radv code work more like the radeonsi stack.
Cc: "13.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is ported from GLSL and converts
if (cond)
discard;
into
discard_if(cond);
This removes a block, but also is needed by radv
to workaround a bug in the LLVM backend.
v2: handle if (a) discard_if(b) (nha)
cleanup and drop pointless loop (Matt)
make sure there are no dependent phis (Eric)
v3: make sure only one instruction in the then block.
v4: remove sneaky tabs, add cursor init (Eric)
Reviewed-by: Eric Anholt <[email protected]>
Cc: "13.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We are going to start lowering to this in NIR code,
so prepare radv for it.
v2: handle conversion to kilp properly (nha)
Cc: "13.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since our surface state buffer is shared by all batches, the kernel does a
full stall and sync with the CPU between batches every time we call
execbuf2 because it refuses to do relocations on an active buffer. Doing
them in userspace and passing the NO_RELOC flag to the kernel allows us to
perform the relocations without stalling.
This improves the performance of Dota 2 by around 30% on a Sky Lake GT2.
v2 (Jason Ekstrand):
- Better comments (Chris Wilson)
- Fixed write_reloc for correct canonical form (Chris Wilson)
v3 (Jason Ekstrand):
- Skip relocations which aren't needed
- Provide an environment variable to always use the kernel
- More comments about correctness (Chris Wilson)
v4 (Jason Ekstrand):
- More comments (Chris Wilson)
v5 (Jason Ekstrand):
- Rebase on top of moving execbuf2 setup go QueueSubmit
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ever since the early days of the Vulkan driver, we've been setting up the
lists of relocations at EndCommandBuffer time. The idea behind this was to
move some of the CPU load out of QueueSubmit which the client is required
to lock around and into command buffer building which could be done in
parallel. Then QueueSubmit basically just becomes a bunch of execbuf2
calls.
Technically, this works. However, when you start to do more in QueueSubmit
than just execbuf2, you start to run into problems. In particular, if a
block pool is resized between EndCommandBuffer and QueueSubmit, the list of
anv_bo's and the execbuf2 object list can get out of sync. This can cause
problems if, for instance, you wanted to do relocations in userspace.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original reason for putting it in the batch_bo was to allow primaries
to share it across secondaries or something like that. However, the
relocation lists in secondary command buffers are are always left alone and
copied into the primary command buffer's relocation list. This means that
the offset really applies at the command buffer level and putting it in the
batch_bo doesn't make sense. This fixes a couple of potential bugs around
re-submission of command buffers that are not likely to be hit but are bugs
none the less.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a little helper struct for storing everything we use to
build an execbuf2 call. Since the add_bo function really has nothing to do
with a command buffer, it makes sense to break it out a bit. This also
reduces some of the churn in the next commit.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The old version wasn't properly handling large addresses where we have to
sign-extend to get it into the "canonical form" expected by the hardware.
Also, the new version is capable of doing a clflush of the newly written
reloc if requested.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since -1 is an invalid GPU address, this lets us know whether or not we
have a valid address for a buffer. We don't get a valid address until the
first time that buffer is used in an execbuf2 ioctl.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation was being overly clever and using the
anv_bo::size field as its mutex. Scratch pool allocations don't happen
often, will happen at most a fixed number of times, and never happen in the
critical path (they only happen in shader compilation). We can make this
much simpler by just using the device mutex. This also means that we can
start using anv_bo_init_new directly on the bo and avoid setting fields
one-at-a-time.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
This ensures that we're always setting all of the fields in anv_bo
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because our relocation processing happens at EndCommandBuffer time and
because RENDER_SURFACE_STATE objects may be shared by batches, we really
have no clue whatsoever what address is actually written to the relocation
offset in the BO. We need to stop making such claims to the kernel and
just let it relocate for us.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This puts the actual execbuf2 call in anv_batch_chain.c along with the
other relocation stuff.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This wrapper ensures that we always update all anv_bo::offset fields based
on the offsets returned by the kernel.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When you fire up Dota2 on Haswell you get spammed with thousands of
"Implement Gen7 HZ ops" finishme's. The point of anv_finishme is to act as
a reminder that there is something left to implement. Printing it once
should be sufficient.
Signed-off-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97447
Signed-off-by: Jordan Justen <[email protected]>
Tested-by: Evan Odabashian <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
Trivial. There's some regressions internally, related to overflow
behavior. I'll have to look at it at another time, some interactions
with vsplit/vcache are actually mind-blowing.
This reverts commit 3fa10ffb496cc4e6d1003891cf0381bb5bec2a74.
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch should have been the part of commit e592f7df.
In a situation when there are multiple render targets with alpha testing
enabled, if fragment shader doesn't write to draw buffer zero, it causes
the GPU hang on SKL. No GPU hang is seen on HSW. Simulator gives a
warning for all gen6+ h/w:
"Illegal render target write message length 0xa expected 0xc"
This patch fixes the GPU hang as well as the simulator warning with
new piglit test fbo-mrt-alphatest-no-buffer-zero-write:
https://patchwork.freedesktop.org/patch/118212
No regressions in Jenkins CI system.
Cc: "12.0 13.0" <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|