| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
We will be using the image layout. Store the full struct directly from
the user.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, this doesn't substantially improve the performance of any
known apps. With Dota 2 on my Sky Lake GT4, it seems to help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's a bit hard to measure because it almost gets lost in the noise,
but this seemed to help Dota 2 by a percent or two on my Broadwell
GT3e desktop.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask. Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction. This, combined with a later commit, seems to help Dota 2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.
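
As a rough illustration of the check described above (a sketch only; the
struct and function names are hypothetical and not taken from the driver),
the "are stencil writes possible" question can be answered from the dynamic
write masks once they are known at command buffer construction time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dynamic per-face stencil write masks. */
struct stencil_masks {
   uint32_t front_write_mask;
   uint32_t back_write_mask;
};

/* There is no stencilWriteEnable bit, so writes are only possible when the
 * stencil test is enabled and at least one face has a non-zero write mask.
 */
static bool
stencil_writes_possible(bool stencil_test_enable,
                        const struct stencil_masks *masks)
{
   assert(masks != NULL);
   if (!stencil_test_enable)
      return false;
   return masks->front_write_mask != 0 || masks->back_write_mask != 0;
}
```

When this returns false the hardware can be told that stencil is read-only,
which is what lets it move depth and stencil work earlier.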
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
| |
Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same
conditions as used in the GL version.
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now on Gen8+ we only allocated a vertex element for
gl_InstanceIndex or gl_VertexIndex when a vertex shader uses
gl_BaseInstanceARB or gl_BaseVertexARB. This is because we would
configure the VF_SGVS packet to make the VF unit write the
gl_InstanceIndex & gl_VertexIndex values right behind the values
computed from the vertex buffers.
In the next commit we will also write the gl_DrawIDARB value. Our
backend expects to pull the gl_DrawIDARB value from the element
following the element containing gl_InstanceIndex, gl_VertexIndex,
gl_BaseInstanceARB and gl_BaseVertexARB (see
vec4_vs_visitor::setup_attributes). Therefore we need to allocate an
element for the SGVS elements as long as at least one of the SGVS
elements is read by the shader. Otherwise our shader will use a
gl_DrawIDARB value pulled from the URB one element too far (most
likely garbage).
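
The allocation rule above can be sketched as follows. This is an
illustration under assumptions, not the driver code: the struct, its field
names and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of which system-generated values the VS reads. */
struct vs_sgvs_usage {
   bool uses_vertexid;
   bool uses_instanceid;
   bool uses_basevertex;
   bool uses_baseinstance;
};

/* An SGVS element is needed as soon as any one of the values is read. */
static bool
needs_sgvs_element(const struct vs_sgvs_usage *u)
{
   assert(u != NULL);
   return u->uses_vertexid || u->uses_instanceid ||
          u->uses_basevertex || u->uses_baseinstance;
}

/* Index of the element the backend would read gl_DrawIDARB from: right
 * after the vertex-buffer elements and the SGVS element, if allocated.
 */
static unsigned
drawid_element_index(unsigned vb_elements, const struct vs_sgvs_usage *u)
{
   return vb_elements + (needs_sgvs_element(u) ? 1 : 0);
}
```

If the SGVS element were skipped while the shader still expected it, the
index computed by the shader and the one written by the driver would differ
by one, which is the "one element too far" case described above.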
v2: Fix my english (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use define for buffer ID (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the same thing we do in the OpenGL driver (comment copied from there).
This is required to ensure that we execute the fragment shader stage when
side-effects (such as image or ssbo stores) are present but there are no
color writes.
I found this while writing a test to check rendering to a framebuffer
without attachments where the fragment shader does not produce any
color outputs but writes to an image via imageStore(). Without this patch
the fragment shader does not execute and the image is not written,
which is not correct.
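
A minimal sketch of the condition, assuming a hypothetical `fs_info`
summary struct (the real driver considers more inputs than the two modeled
here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the fragment shader and pipeline state. */
struct fs_info {
   bool has_color_writes;   /* at least one color output is written */
   bool has_side_effects;   /* image or SSBO stores, atomics, ... */
};

/* The fragment shader must be dispatched when it has side effects, even
 * if nothing it computes ends up in a color attachment.
 */
static bool
fs_dispatch_needed(const struct fs_info *fs)
{
   assert(fs != NULL);
   return fs->has_color_writes || fs->has_side_effects;
}
```

The attachment-less imageStore() test corresponds to
`has_color_writes == false` and `has_side_effects == true`.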
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan rules for point size are a bit wacky. If you only have a
vertex shader and you use points, then you must write PointSize in your
vertex shader. If you have a geometry or tessellation shader, then it's
dependent on the shaderTessellationAndGeometryPointSize device feature.
From the Vulkan 1.0.38 specification:
"shaderTessellationAndGeometryPointSize indicates whether the
PointSize built-in decoration is available in the tessellation
control, tessellation evaluation, and geometry shader stages. If this
feature is not enabled, members decorated with the PointSize built-in
decoration must not be read from or written to and all points written
from a tessellation or geometry shader will have a size of 1.0. This
also indicates whether shader modules can declare the
TessellationPointSize capability for tessellation control and
evaluation shaders, or if the shader modules can declare the
GeometryPointSize capability for geometry shaders. An implementation
supporting this feature must also support one or both of the
tessellationShader or geometryShader features."
In other words, if the feature is disabled (the client can disable
features!) then they don't write PointSize and we provide a 1.0 default
but if the feature is enabled, they do write PointSize and we use the
one they wrote in the shader. There are at least two valid ways we can
implement this:
1) Track whether or not shaderTessellationAndGeometryPointSize is
enabled and set the 3DSTATE_SF bits based on that and what stages
are enabled, ignoring the shader source.
2) Just look at the last geometry stage VUE map and see if they wrote
PointSize and set the 3DSTATE_SF accordingly.
The second solution is the easiest and the most robust against invalid
usage of the Vulkan API, so we choose to go with that one.
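
Option 2 can be sketched like this. The `vue_map` struct, the `SLOT_PSIZ`
value and the enum are hypothetical stand-ins chosen for illustration; only
the decision logic mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot number for PointSize in the VUE. */
#define SLOT_PSIZ 1

/* Hypothetical VUE map: one bit per varying slot the stage writes. */
struct vue_map {
   uint64_t slots_valid;
};

enum point_width_source {
   POINT_WIDTH_FROM_STATE,   /* use the fixed 1.0 default */
   POINT_WIDTH_FROM_VERTEX,  /* use the PointSize written by the shader */
};

/* Look only at what the last geometry stage actually wrote, ignoring
 * whether the device feature was enabled.
 */
static enum point_width_source
choose_point_width_source(const struct vue_map *last_stage_map)
{
   assert(last_stage_map != NULL);
   if (last_stage_map->slots_valid & (UINT64_C(1) << SLOT_PSIZ))
      return POINT_WIDTH_FROM_VERTEX;
   return POINT_WIDTH_FROM_STATE;
}
```

Because the decision depends only on the compiled shader, an application
that enables the feature but never writes PointSize still gets 1.0.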
This fixes all of the dEQP-VK.tessellation.primitive_discard.*point_mode
tests. The tests are also broken because they unconditionally enable
shaderTessellationAndGeometryPointSize if it's supported by the
implementation and then don't write PointSize in the evaluation shader.
However, since this is the "robust against invalid API usage" solution,
the tests happily pass. :-)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This lets us delete a helper from genX_pipeline.c
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This is the same thing we do in the GL driver: the hardware provides gl_Layer
in the VUE header, so when the fragment shader reads it we can't skip it.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far, inputs_read was a bitmap tracking which vertex input locations
were being used.
In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, like any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.
But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.
To avoid handling two different situations in NIR, we just choose the
latter one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.
As an example, if in our GLSL/IR shader we have three attributes:
   layout(location = 0) vec3 attr0;
   layout(location = 1) dvec4 attr1;
   layout(location = 2) dvec3 attr2;
then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in locations 3 and 4.
Checking carefully, basically we are using slots rather than locations
in NIR.
When emitting the vertices, we do an inverse map to know the
corresponding location for each slot.
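
The forward and inverse mappings can be sketched as below. This is an
illustration under assumptions: `vertex_attr` and both functions are
hypothetical names, and the real code works on the inputs_read and
double_inputs_read bitmaps rather than on an attribute array.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one GLSL vertex input. */
struct vertex_attr {
   unsigned location;          /* GL location from the layout qualifier */
   bool is_dvec3_or_dvec4;     /* consumes two slots in NIR */
};

/* GL location -> first NIR slot: every dvec3/dvec4 at a lower location
 * pushes the following attributes one slot further.
 */
static unsigned
location_to_slot(const struct vertex_attr *attrs, unsigned count,
                 unsigned location)
{
   assert(attrs != NULL);
   unsigned slot = location;
   for (unsigned i = 0; i < count; i++) {
      if (attrs[i].location < location && attrs[i].is_dvec3_or_dvec4)
         slot++;
   }
   return slot;
}

/* NIR slot -> GL location (the inverse map used when emitting vertices).
 * Returns UINT_MAX if no attribute covers the slot.
 */
static unsigned
slot_to_location(const struct vertex_attr *attrs, unsigned count,
                 unsigned slot)
{
   for (unsigned i = 0; i < count; i++) {
      unsigned first = location_to_slot(attrs, count, attrs[i].location);
      unsigned width = attrs[i].is_dvec3_or_dvec4 ? 2 : 1;
      if (slot >= first && slot < first + width)
         return attrs[i].location;
   }
   return UINT_MAX;
}
```

With the three attributes from the example, locations 0, 1 and 2 map to
first slots 0, 1 and 3, and slots 3 and 4 both map back to location 2.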
v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.
v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Section 9.4 of the specification says:
When an application attempts to create many pipelines in a single
command, it is possible that some subset may fail creation. In that
case, the corresponding entries in the pPipelines output array will
be filled with VK_NULL_HANDLE values. If any pipeline fails
creation (for example, due to out of memory errors), the
vkCreate*Pipelines commands will return an error code. The
implementation will attempt to create all pipelines, and only
return VK_NULL_HANDLE values for those that actually failed.
Fixes:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
v2: C is hard let's go shopping (Lionel)
v3: Remove unnecessary condition in for loops (Lionel)
v4: Document why we return on first failure (Eduardo)
Move i declaration inside for() (Eduardo)
v5: Move array cleanup out of loop (Jason)
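
A minimal sketch of the loop shape the changelog above hints at (stop on
the first failure, then clean up the rest of the array outside the loop).
It is not the driver code: `pipeline_handle`, `NULL_HANDLE`,
`create_one_fn` and the sample callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle type; NULL_HANDLE plays the role of VK_NULL_HANDLE. */
typedef void *pipeline_handle;
#define NULL_HANDLE NULL

/* Hypothetical per-pipeline constructor: returns 0 on success and only
 * writes *out when it succeeds.
 */
typedef int (*create_one_fn)(unsigned index, pipeline_handle *out);

static int
create_pipelines(unsigned count, create_one_fn create_one,
                 pipeline_handle *pipelines)
{
   assert(create_one != NULL && pipelines != NULL);
   int result = 0;
   unsigned i = 0;

   /* Create pipelines in order and stop at the first failure. */
   for (; i < count; i++) {
      result = create_one(i, &pipelines[i]);
      if (result != 0)
         break;
   }

   /* Every entry that was not created must read back as a null handle. */
   for (; i < count; i++)
      pipelines[i] = NULL_HANDLE;

   return result;
}

/* Sample callback for illustration: fails when asked for index 2. */
static int
create_fails_at_two(unsigned index, pipeline_handle *out)
{
   static int dummy_object;
   if (index == 2)
      return -1;
   *out = &dummy_object;
   return 0;
}
```

Calling `create_pipelines(4, create_fails_at_two, handles)` returns the
error and leaves entries 0 and 1 valid while entries 2 and 3 are null.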
Signed-off-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
It's not used on gen8+ so it causes unused function warnings.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This fixes a "discards const" warning since blend is const.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Apparently the hw wedges otherwise, as mentioned in i965 comments.
Reported-by: Emmanuel Gil Peyrot <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There are a few dynamic bits, namely binding table and sampler addresses,
but most of it is static and really belongs in the pipeline. It certainly
doesn't belong in flush_compute_descriptor_set. We'll use the same state
merging trick we use for gen7 DEPTH_STENCIL.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that we have anv_shader_bin, they're completely redundant with other
information we have in the pipeline. For vertex shaders, we also go
through way too much work to put the offset in one or the other field and
then look at which one we put it in later.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
This moves all the alloc/free in anv to the generic helpers.
Acked-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we don't have meta, we have no need for a gen-agnostic pipeline
create path. We can, instead, just generate one Create*Pipelines function
per gen and be done with it.
Signed-off-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we no longer have meta, all pipelines get created via the normal
Vulkan pipeline creation mechanics. There is no more need for this bit of
extra magic data that we've been passing around.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reproduces this commit:
commit 0fb85ac08d61d365e67c8f79d6955e9f89543560
Author: Kenneth Graunke <[email protected]>
Date: Mon Jun 6 21:37:34 2016 -0700
i965: Use the correct number of threads for compute shaders.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that we're using gen_l3_config.c, we no longer have one set of l3
config functions per gen and we can simplify a bit. Also, we know that
only compute uses SLM so we don't need to look for it in all of the stages.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original pipeline cache that Kristian wrote was based on a now-false
premise that the shaders can be stored in the pipeline cache. The Vulkan
1.0 spec explicitly states that the pipeline cache object is transient and
you are allowed to delete it after using it to create a pipeline with no
ill effects. As nice as Kristian's design was, it doesn't jibe with the
expectation provided by the Vulkan spec.
The new pipeline cache uses reference-counted anv_shader_bin objects that
are backed by a large state pool. The cache itself is just a hash table
mapping key hashes to anv_shader_bin objects. This has the added
advantage of removing one more hand-rolled hash table from mesa.
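
The lifetime rule can be illustrated with a small reference-counting
sketch. The `shader_bin` struct and helpers below are hypothetical and
deliberately simplified (non-atomic counter, no hash table, no state pool);
they only show why destroying the cache cannot invalidate a pipeline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a reference-counted shader. */
struct shader_bin {
   uint32_t ref_cnt;    /* a real driver would use an atomic counter */
   uint32_t key_hash;   /* hash of the key the cache looks it up by */
};

/* Take an additional reference, e.g. when a pipeline starts using it. */
static struct shader_bin *
shader_bin_ref(struct shader_bin *bin)
{
   assert(bin != NULL && bin->ref_cnt > 0);
   bin->ref_cnt++;
   return bin;
}

/* Drop a reference. Returns true when that was the last one and the
 * caller should release the underlying storage.
 */
static bool
shader_bin_unref(struct shader_bin *bin)
{
   assert(bin != NULL && bin->ref_cnt > 0);
   return --bin->ref_cnt == 0;
}
```

The cache holds one reference and each pipeline holds its own, so deleting
the cache only drops the cache's reference and the shader stays alive until
the last pipeline using it is destroyed.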
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97476
Acked-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This solves a race condition where we can end up having different stages
stomp on each other because they're all trying to scratch in the same BO
but they have different views of its layout.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, we also fix up MEDIA_VFE_STATE and rename the field in
3DSTATE_VS on gen6-7.5 to be consistent with the others.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
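
To make the saving concrete, here is a back-of-the-envelope sketch of the
upload size with and without cross-thread constants. The function names and
the dword-based units are assumptions for illustration; hardware alignment
requirements are ignored.

```c
#include <assert.h>

/* Old layout: every thread gets its own copy of all the push data. */
static unsigned
push_dwords_duplicated(unsigned shared_dwords, unsigned per_thread_dwords,
                       unsigned threads)
{
   assert(threads > 0);
   return (shared_dwords + per_thread_dwords) * threads;
}

/* Cross-thread layout: shared data is uploaded once, and only the
 * per-thread part (such as a thread ID) is repeated for each thread.
 */
static unsigned
push_dwords_cross_thread(unsigned shared_dwords, unsigned per_thread_dwords,
                         unsigned threads)
{
   assert(threads > 0);
   return shared_dwords + per_thread_dwords * threads;
}
```

For example, with 16 shared dwords, 1 per-thread dword and 8 threads, the
duplicated layout uploads 136 dwords and the cross-thread layout uploads 24.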
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
If images or shader buffers are used, we will enable the data cache in
the L3 config.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Acked-by: Kristian Høgsberg <[email protected]>
|