| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
anv_format is supposed to have a pointer back to the associated
VkFormat, but we missed this for depth/stencil formats.
This doesn't fix anything afaict, but will be needed for future
changes.
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 465de47bad70 ("anv: associate vulkan formats with aspects")
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
I can find no evidence that removing this is a good idea.
Fixes: 9b116173b6a ("radv: do not emit VGT_FLUSH on GFX10")
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
unorm and snorm require that the border color values are clamped, so when
picking the sampler view copy/clamp the border color from the sampler and
use these adjusted values.
Fixes:
dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
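
The clamping described above can be sketched as follows. This is a minimal
illustration, not the driver code: the enum and function names are
hypothetical, and it assumes the usual ranges of [0, 1] for unorm and
[-1, 1] for snorm formats.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical format classes; a real driver would derive this from the
 * sampler view's format. */
enum sketch_norm_kind { SKETCH_NORM_NONE, SKETCH_NORM_UNORM, SKETCH_NORM_SNORM };

static float
sketch_clampf(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the sampler's border color and clamp the copy for the view's format
 * class, leaving the sampler's own values untouched. */
static void
sketch_adjust_border_color(enum sketch_norm_kind kind,
                           const float in[4], float out[4])
{
   assert(in != out);
   memcpy(out, in, 4 * sizeof(float));
   if (kind == SKETCH_NORM_NONE)
      return;

   const float lo = (kind == SKETCH_NORM_SNORM) ? -1.0f : 0.0f;
   for (int i = 0; i < 4; i++)
      out[i] = sketch_clampf(out[i], lo, 1.0f);
}
```

Working on a copy matters because the same sampler may be paired with views
of different formats.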
|
|
|
|
|
|
|
|
|
|
| |
The value of -0.5f is not small enough to produce negative coordinates,
so lower the minimum clamp value to -1.0f. This fixes a number of tests
from
dEQP-GLES31.functional.texture.border_clamp.*
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When mirroring the texture coordinates, the indices must be mirrored as
well and the half pixel shift must be applied in reverse.
Fixes a number of tests from:
dEQP-GLES31.functional.texture.gather.offset.*
dEQP-GLES31.functional.texture.gather.offsets.*
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
At this point all the draw caches are flushed to the old attached textures,
so the read caches of these textures will need to be updated too.
Fixes:
dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.*
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes a rendering glitch observed in the SDL testscale test, where alpha
blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead
of having no effect.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Update to etna_viv commit a16a418.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Newer GPUs use the half float ALPHA_COLOR_EXT register.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
We need to check rgb_func/alpha_func when determining if blend or separate
alpha is required.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Dividing the fui result by 65535 is obviously wrong, and from testing, on
GC7000L at least there is no division by 65535.
Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
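
To see why dividing the fui result makes little sense, here is a small sketch
that assumes fui is the usual helper that reinterprets the bits of a float as
an unsigned integer. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bit pattern of a float as a 32-bit unsigned integer.
 * No numeric conversion takes place. */
static uint32_t
sketch_float_bits(float f)
{
   uint32_t u;
   assert(sizeof(u) == sizeof(f));
   memcpy(&u, &f, sizeof(u));
   return u;
}
```

The result is an IEEE 754 encoding (sign, exponent, mantissa), so an integer
division of it does not scale the original value; it only scrambles the
encoding.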
|
|
|
|
|
|
|
|
|
| |
This fixes the memory use regression from bug 111107.
Fixes: 726a31df705 ("radv: Add the concept of radv shader binaries.")
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107
|
|
|
|
|
|
|
|
| |
This is ported from AMDVLK. It's probably not required unless
we want to use "real time queues", but it might be nice to just have
it in place.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
This shouldn't matter, but it's good to be correct.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rob Clark thinks this was likely a workaround for our const buffer
update bugs, and now that it's passing tests, we should be able to
drop it.
renderdoc-traces results:
traces/android/clashofclans.rdc: +6.1% +/- 1.1%
traces/android/candycrush.rdc: +5.2% +/- 1.6%
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the bin vs render constlen is fixed, we can skip these waits.
Improves webgl aquarium performance at 10k fish from 27fps to 33.
Some highlights from renderdoc-traces:
traces/android/minecraft.rdc: +17.1% +/- 3.4%
traces/glmark2/ideas-speed=duration.rdc: +11.6% +/- 2.4%
traces/android/candycrush.rdc: +5.4% +/- 1.1%
traces/android/clashofclans.rdc: +4.4% +/- 1.3%
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We actually could go up to vs->constlen in the binning shader on a6xx,
but for sanity let's make sure that we're always under constlen. This
would have caught the bug fixed in 572c76fd8826 ("freedreno: Clamp UBO
uploads to the constlen decided by the shader.")
Reviewed-by: Rob Clark <[email protected]>
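
The idea of keeping uploads under constlen can be sketched like this. The
function name is hypothetical, and the sketch assumes the offset, the count
and constlen are all expressed in the same units (for example vec4 slots).

```c
#include <assert.h>
#include <stdint.h>

/* Return how many constant slots may be uploaded starting at 'offset'
 * without writing past 'constlen'. Returns 0 when the range is entirely
 * outside what the shader uses, so the caller can skip the upload. */
static uint32_t
sketch_clamp_const_upload(uint32_t offset, uint32_t count, uint32_t constlen)
{
   if (offset >= constlen)
      return 0;

   uint32_t room = constlen - offset;
   uint32_t clamped = count < room ? count : room;

   /* The invariant a debug assert in the emit path could enforce. */
   assert(offset + clamped <= constlen);
   return clamped;
}
```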
|
|
|
|
|
|
|
|
|
| |
Fixes constlen overflow in
dEQP-GLES31.functional.shaders.builtin_var.compute.num_work_groups and
dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_32
and probably others.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We already skip the upload if it's unused, due to the constlen >
offset check.
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The set of meta data was removed by commit 8083464. It broke lots of
dEQP tests when running with pbuffer surface type.
Fixes: 80834640137 ("virgl: remove dead code")
Signed-off-by: Lepton Wu <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Chia-I Wu <[email protected]>
|
|
|
|
|
|
|
|
| |
After reset, if valid does not contain the relevant bit the descriptor
can be != NULL but still not be valid.
CC: <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
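
A minimal sketch of the check this implies, with a hypothetical structure
layout: a non-NULL pointer is not enough, the matching bit in the valid mask
has to be set too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bit i of 'valid' says whether descriptor[i] is
 * current. After a reset, 'valid' is cleared but stale pointers may remain. */
struct sketch_desc_state {
   uint32_t valid;
   const void *descriptor[32];
};

static bool
sketch_desc_is_usable(const struct sketch_desc_state *s, unsigned i)
{
   assert(s != NULL && i < 32);
   return (s->valid & (UINT32_C(1) << i)) != 0 && s->descriptor[i] != NULL;
}
```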
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec says:
"timestampComputeAndGraphics specifies support for timestamps on all
graphics and compute queues. If this limit is set to VK_TRUE, all
queues that advertise the VK_QUEUE_GRAPHICS_BIT or
VK_QUEUE_COMPUTE_BIT in the VkQueueFamilyProperties::queueFlags
support VkQueueFamilyProperties::timestampValidBits of at least 36."
On gen7+ this should be true (we only have 32bits of timestamp on
gen6 and below).
Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 802f00219addb3 ("anv/device: Update features and limits")
Reported-by: Timothy Strelchun <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
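
The quoted rule can be written as a small predicate. This is only a sketch:
the struct is a local stand-in so the example builds without the Vulkan
headers, and the two flag values are meant to mirror VK_QUEUE_GRAPHICS_BIT
and VK_QUEUE_COMPUTE_BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_GRAPHICS_BIT 0x1u
#define SKETCH_QUEUE_COMPUTE_BIT  0x2u

/* Stand-in for the relevant VkQueueFamilyProperties fields. */
struct sketch_queue_family {
   uint32_t queue_flags;
   uint32_t timestamp_valid_bits;
};

/* True if every graphics or compute queue family has at least 36 valid
 * timestamp bits, which is what the limit promises when set to VK_TRUE. */
static bool
sketch_timestamp_compute_and_graphics(const struct sketch_queue_family *fams,
                                      size_t count)
{
   assert(fams != NULL || count == 0);
   const uint32_t mask = SKETCH_QUEUE_GRAPHICS_BIT | SKETCH_QUEUE_COMPUTE_BIT;
   for (size_t i = 0; i < count; i++) {
      if ((fams[i].queue_flags & mask) != 0 &&
          fams[i].timestamp_valid_bits < 36)
         return false;
   }
   return true;
}
```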
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now we only supported fast clear colors on the first miplevel and
layer. The main reason for it is that we can't have different fast clear
values at different levels/layers, since the surface state only supports
one clear value.
We can, however, enable it if we make sure we only use the same value
for all levels/layers, and if one of them changes, we resolve all the
others. We already do that for depth fast clears so hopefully it will be
fine for color fast clears too.
v2: Add check for partial clear too (Ken).
Reviewed-by: Kenneth Graunke <[email protected]>
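
The policy described above can be sketched as follows. All names are
hypothetical, a "slice" stands for one level/layer pair, and the resolve is
reduced to a counter so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_SLICES 16

struct sketch_surface {
   unsigned num_slices;
   bool fast_cleared[SKETCH_MAX_SLICES];
   bool has_clear_color;
   float clear_color[4];   /* the single value the surface state can hold */
   unsigned resolves;      /* number of resolves performed, for illustration */
};

/* Fast clear one slice. If the color differs from the one already stored,
 * every other fast-cleared slice is resolved first so that only one clear
 * value is ever live for the whole surface. */
static void
sketch_fast_clear(struct sketch_surface *s, unsigned slice, const float color[4])
{
   assert(s->num_slices <= SKETCH_MAX_SLICES && slice < s->num_slices);

   bool same = s->has_clear_color &&
               memcmp(s->clear_color, color, sizeof(s->clear_color)) == 0;
   if (!same) {
      for (unsigned i = 0; i < s->num_slices; i++) {
         if (i != slice && s->fast_cleared[i]) {
            s->fast_cleared[i] = false;   /* stands in for a real resolve */
            s->resolves++;
         }
      }
      memcpy(s->clear_color, color, sizeof(s->clear_color));
      s->has_clear_color = true;
   }
   s->fast_cleared[slice] = true;
}
```

The bitwise comparison is deliberately conservative: colors that compare
equal as floats but differ in encoding are treated as different and trigger
a resolve.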
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
We want to use this in the transfer code and possibly for fast clears.
|
|
|
|
| |
Placed in common code, the implementation can also be used for Vulkan.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Relax the restriction that all the writes need to be in the first
block: now accept variables that have all the writes in the same
block, and all the reads are dominated by that block.
This lets the pass identify large constants that are local to a helper
function. The writes will be at the place that the function is
inlined, possibly not in the first block (but still all in the same
block).
Results for vkpipeline-db in SKL:
total instructions in shared programs: 3624891 -> 3623145 (-0.05%)
instructions in affected programs: 79416 -> 77670 (-2.20%)
helped: 16
HURT: 0
total cycles in shared programs: 1458149667 -> 1458147273 (<.01%)
cycles in affected programs: 30154164 -> 30151770 (<.01%)
helped: 14
HURT: 2
total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 8813 -> 8745 (-0.77%)
spills in affected programs: 2894 -> 2826 (-2.35%)
helped: 8
HURT: 0
total fills in shared programs: 23470 -> 23392 (-0.33%)
fills in affected programs: 12248 -> 12170 (-0.64%)
helped: 6
HURT: 2
LOST: 0
GAINED: 0
Results for shader-db in SKL with Iris:
total instructions in shared programs: 15379442 -> 15379392 (<.01%)
instructions in affected programs: 837 -> 787 (-5.97%)
helped: 2
HURT: 2
helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27
helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23%
95% mean confidence interval for instructions value: -39.14 14.14
95% mean confidence interval for instructions %-change: -15.51% 6.17%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4880 -> 4880 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 370677237 -> 370676567 (<.01%)
cycles in affected programs: 17852 -> 17182 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347
helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31%
HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
total spills in shared programs: 11772 -> 11772 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 24948 -> 24948 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
Reviewed-by: Jason Ekstrand <[email protected]>
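
The relaxed condition can be sketched with a toy dominator tree. This is not
the NIR pass: the types and names are hypothetical, and the sketch ignores
the ordering of reads and writes inside the writing block itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_block {
   const struct sketch_block *idom; /* immediate dominator; NULL for entry */
};

/* a dominates b if a is b itself or an ancestor of b in the dominator tree. */
static bool
sketch_dominates(const struct sketch_block *a, const struct sketch_block *b)
{
   assert(a != NULL);
   for (; b != NULL; b = b->idom) {
      if (a == b)
         return true;
   }
   return false;
}

/* A variable qualifies if all writes sit in one block and that block
 * dominates the block of every read. */
static bool
sketch_is_candidate(const struct sketch_block *const *writes, size_t num_writes,
                    const struct sketch_block *const *reads, size_t num_reads)
{
   if (num_writes == 0)
      return false;

   const struct sketch_block *write_block = writes[0];
   for (size_t i = 1; i < num_writes; i++) {
      if (writes[i] != write_block)
         return false;
   }
   for (size_t i = 0; i < num_reads; i++) {
      if (!sketch_dominates(write_block, reads[i]))
         return false;
   }
   return true;
}
```

The old rule is the special case where the single writing block is the entry
block, which dominates everything.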
|
|
|
|
|
|
|
|
| |
In many cases, the compiler can just copy-prop the strided MOV whereas
the conversion is a bit trickier. This cuts 5% of the instructions off
of one particular Vulkan CTS test which does lots of load_ssbo.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These seem like obvious enough optimizations in the world of multiple
integer bit sizes. The only known thing which hits these at the moment
is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast
and check for equality. However, it's something that's bound to come up
as we start seeing more integers in shaders.
The optimizations of comparisons of casted values with constants are
something which we would ideally do with range analysis. However,
lacking that, we can do it in opt_algebraic as long as one side is a
constant.
In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along
with the previous commit, reduces the number of instructions emitted on
Skylake from 55328 to 44546, a reduction of 20%.
Acked-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
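
One instance of such a rewrite, shown in plain C rather than in
opt_algebraic syntax: an equality test between a zero-extended 16-bit value
and a 32-bit constant can be decided with a 16-bit compare. The function
names are hypothetical and this covers only the unsigned equality case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference form: up-cast to 32 bits, then compare. */
static bool
sketch_eq_wide(uint16_t a, uint32_t c)
{
   return (uint32_t)a == c;
}

/* Rewritten form: if the constant does not fit in 16 bits the result is
 * always false; otherwise compare at 16 bits and drop the cast. */
static bool
sketch_eq_narrow(uint16_t a, uint32_t c)
{
   if (c > UINT16_MAX)
      return false;
   return a == (uint16_t)c;
}

/* Exhaustively check that both forms agree for one constant. */
static void
sketch_check_constant(uint32_t c)
{
   for (uint32_t a = 0; a <= UINT16_MAX; a++)
      assert(sketch_eq_wide((uint16_t)a, c) == sketch_eq_narrow((uint16_t)a, c));
}
```

Because the constant is known at compile time, the range test folds away and
only one of the two branches survives in the generated code.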
|
|
|
|
|
|
|
|
|
| |
We could, in theory, add the same optimization for 64-bit unpack
operations but that's likely to fight with 64-bit integer lowering on
platforms which require it so it will require more infrastructure before
that will be a good idea.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This helps greatly when debugging algebraic transform generators because
you can now actually see the output and verify that your transforms are
getting generated.
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This fixes some validation errors generated by certain D->W conversions
but is likely not a full solution. Calculating an actual register
stride is a far more complex problem in general and should probably be
handled by the brw_fs_generator.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was checking if the dest or src[0] SSA values were vectors, rather than
whether the ALU op was using the source as a vector resulting in a
nir_fdot4 making it through to vc4 and v3d:
vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5
Fixes: c1cffa4249ca ("nir/alu_to_scalar: Use the new NIR lowering framework")
v2: Use Jason's recommendation to look at input_sizes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Theoretically we would like these split since varyings can have
specially optimized flags (no map, coherent local). For now, since
neither of these flags is particularly meaningful right now, merge them
together instead of special casing varyings_mem.
Saves upwards of 64MB of RAM per context.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the following chain of events:
vkQueuePresent()
<- Surface resize
vkQueuePresent()
We should be able to report SUBOPTIMAL or OUT_OF_DATE on the second
vkQueuePresent() call. Currently we only look at X11 events in the
vkAcquireNextImage() path so we're not able to report this.
This change checks the queue of events and processes any available ones
to update the swapchain status.
v2: Be consistent about reporting the current error state of the
swapchain (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111097
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
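
The behaviour can be sketched without any X11 or Vulkan dependency. Every
name here is hypothetical, the pending events are passed in as an array
instead of being polled from the server, and the sketch assumes a resize
maps to a suboptimal status.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the Vulkan convention: errors are negative, SUBOPTIMAL is a
 * positive success code. */
enum sketch_status {
   SKETCH_OUT_OF_DATE = -1,
   SKETCH_OK = 0,
   SKETCH_SUBOPTIMAL = 1,
};

struct sketch_swapchain {
   enum sketch_status status;
   unsigned width, height;
};

struct sketch_resize_event {
   unsigned width, height;
};

/* Errors are sticky, and SUBOPTIMAL never goes back to OK. */
static enum sketch_status
sketch_merge_status(enum sketch_status cur, enum sketch_status incoming)
{
   if (cur < 0)
      return cur;
   if (incoming < 0)
      return incoming;
   return (cur == SKETCH_SUBOPTIMAL || incoming == SKETCH_SUBOPTIMAL)
             ? SKETCH_SUBOPTIMAL : SKETCH_OK;
}

/* Process whatever events are already queued, then report the status. */
static enum sketch_status
sketch_present(struct sketch_swapchain *sc,
               const struct sketch_resize_event *pending, size_t count)
{
   assert(sc != NULL);
   for (size_t i = 0; i < count; i++) {
      if (pending[i].width != sc->width || pending[i].height != sc->height)
         sc->status = sketch_merge_status(sc->status, SKETCH_SUBOPTIMAL);
   }
   return sc->status;
}
```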
|
|
|
|
|
|
|
| |
Will be useful for testing the legacy path.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This is essentially a port of ed53e61bec9 from LLVMpipe to softpipe,
as it makes things a bit simpler and more performant.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-By: Gert Wollny <[email protected]>
|
|
|
|
|
|
| |
v2: Rebase update after changes on previous patches.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We can use it to get real values for ARB_spirv_extensions methods.
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Arcady Goldmints-Orlov <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a struct to maintain which SPIR-V extensions are supported, and a
utility method to initialize it based on
nir_spirv_supported_capabilities.
v2:
* Fixing code style (Ian Romanick)
* Adding a prefix (spirv) to fill_supported_spirv_extensions (Ian Romanick)
v3: rebase update (nir_spirv_supported_extensions renamed)
v4: include AMD_gcn_shader support
v5: move spirv_fill_supported_spirv_extensions to
src/mesa/main/spirv_extensions.c
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Arcady Goldmints-Orlov <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
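
A rough sketch of what such a struct and its fill helper could look like.
The type, field and enum names are hypothetical and only three extensions
are listed; the real mapping from capabilities to extensions lives in the
patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum sketch_spirv_ext {
   SKETCH_SPV_AMD_gcn_shader,
   SKETCH_SPV_KHR_multiview,
   SKETCH_SPV_KHR_shader_draw_parameters,
   SKETCH_SPV_EXT_COUNT,
};

/* Stand-in for the capability flags a driver advertises. */
struct sketch_caps {
   bool gcn_shader;
   bool multiview;
   bool draw_parameters;
};

struct sketch_supported_exts {
   bool supported[SKETCH_SPV_EXT_COUNT];
   unsigned count;   /* number of entries set in 'supported' */
};

/* Derive the supported-extension table from the capability flags. */
static void
sketch_fill_supported_exts(struct sketch_supported_exts *ext,
                           const struct sketch_caps *cap)
{
   assert(ext != NULL && cap != NULL);
   memset(ext, 0, sizeof(*ext));

   ext->supported[SKETCH_SPV_AMD_gcn_shader] = cap->gcn_shader;
   ext->supported[SKETCH_SPV_KHR_multiview] = cap->multiview;
   ext->supported[SKETCH_SPV_KHR_shader_draw_parameters] = cap->draw_parameters;

   for (unsigned i = 0; i < SKETCH_SPV_EXT_COUNT; i++) {
      if (ext->supported[i])
         ext->count++;
   }
}
```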
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ideally this should be generated somehow. One option would be to gather
all the extension dependencies listed in the core grammar, but there
would be the possibility of not including some of the extensions.
Note that spirv-tools is doing it just slightly better, as it has a
hardcoded list of extensions manually taken from the registry, which
they parse to get the enum and the to_string method (see
generate_grammar_tables.py).
v2:
* Use a macro to improve readability. (Tapani Pälli)
* Add unreachable on the switch, no default (Eric Engestrom)
* No typedef enum (Ian Romanick)
* Sort extensions names (Ian Romanick)
* Don't add extensions unlikely to be supported by Mesa at any point
(Ian Romanick)
v3: rebase update
v4: Include AMD_gcn_shader
v5: move spirv_extensions_to_string to src/mesa/main/spirv_extensions.c
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Arcady Goldmints-Orlov <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
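
The v2 review points (a macro for readability, a switch without a default
case, sorted names) suggest a shape like the sketch below. The enum and
function names are hypothetical, and a plain assert stands in for an
unreachable marker.

```c
#include <assert.h>

enum sketch_spirv_ext {
   SKETCH_SPV_AMD_gcn_shader,
   SKETCH_SPV_KHR_multiview,
   SKETCH_SPV_KHR_shader_draw_parameters,
   SKETCH_SPV_EXT_COUNT,
};

static const char *
sketch_ext_to_string(enum sketch_spirv_ext ext)
{
/* One line per extension: the macro pastes the enum name and builds the
 * string from the same token, so the two cannot drift apart. */
#define STR(x) case SKETCH_SPV_##x: return "SPV_" #x;
   switch (ext) {
   STR(AMD_gcn_shader)
   STR(KHR_multiview)
   STR(KHR_shader_draw_parameters)
   case SKETCH_SPV_EXT_COUNT:
      break;
   }
#undef STR

   /* No default case above, so the compiler can warn about a missing enum
    * value. Reaching this point means the caller passed a bad value. */
   assert(!"unknown SPIR-V extension");
   return "unknown";
}
```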
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Mention extension gap at gl_API.xml (Emil Velikov)
* Bail with INVALID_ENUM if extension not available on getStringi (Emil Velikov)
* Use EXTRA_EXT macro when defining the extension at
get.c/get_hash_params.py (Emil Velikov)
* Rename source files (spirvextensions.[ch] -> spirv_extensions.[ch]) (Ian)
v3:
* Fix GL_PROGRAM_BINARY_FORMATS glGet query, broken by error on a
previous rebase
v4:
* Fix rebase conflicts on getstring.c after
GL_SHADING_LANGUAGE_VERSION query was added
v5:
* Remove src/mapi/glapi/gen/Makefile.am as it no longer exists in
master
Signed-off-by: Alejandro Piñeiro <[email protected]>
Signed-off-by: Arcady Goldmints-Orlov <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
I implemented this extension a while ago but it didn't work
on pre-GFX10 for some reason. Now all CTS tests pass.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Unnecessary.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is unsupported and hangs.
This fixes GPU hangs with
dEQP-VK.tessellation.geometry_interaction.limits.output_required_*.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It should be possible to build it on-demand too but it requires
more work. On GFX10, the GS copy shader is required when tess
is enabled with extreme geometry.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks to Eric Engestrom for pointing out that there was something wrong
with that function.
Fixes: 724a73509e1bc1ce3abf9500e457bb2911b642db ("softpipe: Prepare handling explicit gradients")
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This decoration can be ignored, so we can just skip the next steps.
Otherwise we'd have to also handle it in apply_var_decoration.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Lionel moved brw_timebase_scale to gen_device_info_timebase_scale a few
months ago, so we should just use that, and not our own copy in iris.
|