| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.
Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
(cherry picked from commit e05021ff72abb7de6506c90dd70a9f7ab490bf90)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
(cherry picked from commit 711489648bcce5cd8fcf14e73e5affe069010c01)
Nominated-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
(cherry picked from commit ea421e919ae6e72e1319fb205c42a6fb53ca2f82)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.
v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well. Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
(cherry picked from commit 7b636581253fe858ac883e3d3eec21173ac069d4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should prevent disparity between features Mesa and LLVM
believe are supported by the CPU.
http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990
Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk.
Reviewed-by: Roland Scheidegger <[email protected]>
CC: "10.6 11.0" <[email protected]>
(cherry picked from commit 718249843b915decf8fccec92e466ac1a6219934)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The refactoring commit, c6bf1cd, accidentally reverted cd49b97
and 99b1f47. These changes caused more code to be added to the
function and removed the existing support for ASTC. This patch
reverts those modifications.
v2. Actually include ASTC support again.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92221
Cc: "11.0" <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit f1147a238ab35a56fa7d1c64f6025ff3b909dad8)
[Emil Velikov]
- Drop the KHR_texture_compression_astc_ldr check
- Add texcompress.h include.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92437
CC: "10.6 11.0" <[email protected]>
Signed-off-by: Jose Fonseca <[email protected]>
(cherry picked from commit 04703762e544bc732f6f8b07033221dfbd58159f)
|
|
|
|
|
|
|
| |
The commit base varies greatly between master and 11.0. It seems that
the commit (in it's current form) is not applicable for the branch.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set. This fixes "make clean" on OpenBSD.
v2: use AC_CHECK_PROG
Signed-off-by: Jonathan Gray <[email protected]>
CC: "10.6 11.0" <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit 99c4079c37ac04a0dad4ead3117f786706c80aaf)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch also refactors name length queries which were using array size
in computation, this has to be done in same time to avoid regression in
arb_program_interface_query-resource-query Piglit test.
Fixes rest of the failures with
ES31-CTS.program_interface_query.no-locations
v2: make additional check only for GS inputs
v3: create helper function for resource name length
so that it gets calculated only in one place
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
(cherry picked from commit c0722be9f58ef89dae98d8c459ec4f9589f97748)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)
Signed-off-by: Chris Wilson <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
(cherry picked from commit 70e91d61fde239e8ae58148cacd4ff891126e2aa)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit 4de86e1371b0d59a5b9a787b726be3d373024647)
Nominated-by: Christoph Brill <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0
v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit d4e29af2344c06490913efc35430f93a966061bb)
Nominated-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes assertion failure with new piglit
arb_draw_buffers_blend-state_set_get test.
Cc: [email protected]
Reviewed-by: Jose Fonseca <[email protected]>
(cherry picked from commit e24d04e436ed48d4a0aac90590cbaa40da936208)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.
I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720
Cc: 11.0 10.6 <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
(cherry picked from commit 814f31457e9ae83d4f1e39236f704721b279b73d)
|
|
|
|
|
|
|
| |
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 867284a8f07b69887f8adb109fb6c71156668227)
|
|
|
|
|
|
|
|
|
| |
vlVaCreateImage
Cc: "11.0" <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit 381c17d695b39f9ab501f5aa5a3cc42c8519ac3b)
|
|
|
|
|
|
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Michel Dänzer <[email protected]>
(cherry picked from commit aa060e276c203baf4691d4a4722accd5bdbb8526)
[Emil Velikov: si_shader_destroy() wants the ctx as first argument]
Signed-off-by: Emil Velikov <[email protected]>
Conflicts:
src/gallium/drivers/radeonsi/si_shader.c
|
|
|
|
|
|
|
|
| |
This allows removing FLUSH_VERTICES in MatrixMode.
Cc: [email protected]
Reviewed-by: Brian Paul <[email protected]>
(cherry picked from commit 3c6156a4a7b647cc55cbe3a4c13d53b5ffe505e6)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).
This makes it currently impossible to run glslparsertest tests for
OpenGL ES when using version override.
Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
Before:
"3.1 Mesa 11.1.0-devel (git-24a1a15)"
After:
"OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
v2: only include api prefix for OpenGL ES (Boyan Ding)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Cc: "11.0" <[email protected]>
(cherry picked from commit dc8c221e2890cc9913dfc99e1e0fcb73c89af52c)
|
|
|
|
|
|
|
|
|
| |
Otherwise the mem2gmem blit would see potentially bogus texture
coordinates. Fixes an issue that shows up with glamor.
CC: "11.0" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
(cherry picked from commit 6206da736c84c4f7316ab586c886b4865fda8805)
|
|
|
|
|
|
|
|
|
| |
It fixes a building error of the android 6.0 64-bit target.
Signed-off-by: Chih-Wei Huang <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
(cherry picked from commit 7599f8b167321cb8adb2ba51a53163752b668532)
|
|
|
|
|
|
|
|
|
| |
Note Android version before Lollipop is not supported.
Signed-off-by: Chih-Wei Huang <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
(cherry picked from commit d31005e3e5588b20760c774f14ac0ea80375a181)
|
|
|
|
|
|
|
|
|
|
| |
Cc: "10.6 11.0" <[email protected]>
Fixes: 669cfc267a1 (android: mesa: fix the path of the SSE4_1
optimisations)
Signed-off-by: Chih-Wei Huang <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit 67d8518a0e5a3df400a6e70de667d69e4b6ce9c5)
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.
Fixes Wasteland 2 Director's Cut crash on start.
Cc: [email protected]
Reviewed-by: Brian Paul <[email protected]>
(cherry picked from commit 14f7ce42484c31a45fcb6aabdf503f7496a9a94c)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
switch clause for all loop iterations.
This doesn't fix any known issues but was clearly incorrect.
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit dd293d8aae324ac7b9d5297e33a1e732e1f3f4d3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
expression
Fixes:
ES3-CTS.shaders.negative.constant_sequence
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.vert
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.frag
v2: Fix a couple copy-and-paste mistake in the spec quotations.
Suggested by Matt.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 92635a84a7f464b827baa406578420dd6109e1a4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
operator
This will be used in the next patch to enforce some language sematics.
v2: Fix inverted logic in
ast_function_expression::has_sequence_subexpression. The method
originally had a different name and a different meaning. I fixed the
logic in ast_to_hir.cpp, but I only changed the names in
ast_function.cpp.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Marta Lofstedt <[email protected]> [v1]
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 05e4601c6b9ce456cc4a4c395677a22125d889d2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Combine this check with the existing const and uniform checks. This
change depends on the previous patch (glsl: Only set
ir_variable::constant_value for const-decorated variables).
Fixes:
ES2-CTS.shaders.negative.initialize
ES3-CTS.shaders.negative.initialize
spec/glsl-es-1.00/compiler/global-initializer/from-attribute.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-1.00/compiler/global-initializer/from-global.vert
spec/glsl-es-1.00/compiler/global-initializer/from-global.frag
spec/glsl-es-1.00/compiler/global-initializer/from-varying.frag
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-3.00/compiler/global-initializer/from-in.vert
spec/glsl-es-3.00/compiler/global-initializer/from-in.frag
spec/glsl-es-3.00/compiler/global-initializer/from-global.vert
spec/glsl-es-3.00/compiler/global-initializer/from-global.frag
Note: spec/glsl-es-3.00/compiler/global-initializer/from-sequence.*
still fail because the result of a sequence operator is still considered
to be a constant expression.
Signed-off-by: Ian Romanick <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92304
Reviewed-by: Tapani Pälli <[email protected]> [v1]
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit bb329f2ff6e8bf8910a467b09f69a4d843689617)
|
|
|
|
|
|
|
|
|
|
|
| |
Right now we're also setting for uniforms, and that doesn't seem to hurt
things. The next patch will make general global variables in GLSL ES,
and those definitely should not have constant_value set!
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 3524d6df33b1e3716992f9a555ffb0f7b1ae2f4f)
|
|
|
|
|
|
|
|
|
|
|
|
| |
whether to keep an unused uniform
This even matches the comment "uniform initializers are precious, and
could get used by another stage."
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 5bc68f0f2b80b21997435742af74c49eb72891f7)
|
|
|
|
|
|
|
|
|
| |
initialize uniforms
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 313372cae8e10e4b9a3de093d65c0a0d8954bb0d)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the way layout(binding=xxx) works from GLSL. The old method
just happened to work (and significantly predated support for
layout(binding=xxx)), but future changes will break this.
v2: Remove some stale comments. Suggested by Matt and Chris Forbes.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.6 11.0" <[email protected]>
(cherry picked from commit 8acce5d53af44a3d1d05a26e69559fd35f835de4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In d4a24745 (August 2012), Paul made functions calls not be constant
expressions in GLSL ES 1.00. Since this feature was added in desktop
GLSL 1.20, we believed that it was added in GLSL ES 3.00. That turns
out to be completely wrong. Built-in functions have always been allowed
as constant expressions in GLSL ES, and the patch adds the (many) spec
quotations to prove it.
While we never previously encountered this, a later patch enforces a GLSL
ES 1.00 rule that global variable initializers must be constant
expressions. Without this fix, several dEQP tests fail.
Fixes:
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.vert
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.vert
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.0 10.1 10.2 10.3 10.4 10.5 10.6 11.0" <[email protected]>
Yes, I know we don't maintain stable branches that far back, but that
*is* how far back this bug goes!
(cherry picked from commit 43b07eb60faba1c65fc6f7a99087d051b00e9c0f)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vertex attributes of different categories (constant/per-instance/
per-vertex) go into different buffers for translation, and this is now
properly reflected in the vertex buffers passed to the driver.
Fixes e.g. piglit's point-vertex-id divisor test.
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit 45ed627d894aa4d51682e8b07e7234bbc6e7c02d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial glGetUniformdv support didn't cover all the
casting cases that are apparantly legal, and cts seems to
test for them.
I've updated the piglit test to cover these cases now.
v2: fix indentation - it's all broken in this file (Ilia)
fix src/dst index tracking in light of fp64 support (Ilia)
cc: "11.0" <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
(cherry picked from commit bcfaab38858fdcfbd8ffeaf6b0e3da8a726f02e6)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The point is to avoid having to re-validate all image units when
_NEW_TEXTURE is flagged, which can be expensive if the driver exposes
a large number of image units. This has been reported to fix a 36%
performance regression in the Synmark2 Multithread benchmark on the
i965 driver which exposes 192 image units.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788
Reported-by: Wendy Wang <[email protected]>
Tested-by: Ye Tian <[email protected]>
CC: "11.0" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 7e441bf025cf8c5d088430d546acb4c0ed58d27b)
|
|
|
|
|
|
|
|
|
| |
gl_image_unit::_Valid will be removed in a future commit.
Tested-by: Ye Tian <[email protected]>
CC: "11.0" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 2d97a78b37ddf325d90e056f5eefee0548092530)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The call to _mesa_test_texobj_completeness() is unnecessary if the
texture is already known to be complete. If the texture object is
dirtied in the meantime _BaseComplete and _MipmapComplete will be
reset to false. _mesa_is_image_unit_valid() will start to be called
more frequently in a future commit, so it seems desirable to avoid the
unnecessary work.
Tested-by: Ye Tian <[email protected]>
CC: "11.0" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 25d3338be37ddbfe676716034ec5f29e27323704)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A future commit will remove all texture object-dependent derived state
from the image unit struct to make validation unnecessary on texture
state changes. Instead of checking gl_image_unit::_Valid drivers will
be required to call this function when needed to find out whether an
image unit is in a valid state and whether access from the shader is
allowed.
Tested-by: Ye Tian <[email protected]>
CC: "11.0" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 5152db415f4047569822d648fda09bdde4171d6d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives. The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.
Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage. An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required". The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.
The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.
None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled. We do set the "Accesses UAV" bits
though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:
> 3DSTATE_GS
>[...]
> 12 Accesses UAV
> Format: Enable
> This field must be set when GS has a UAV access.
There are similar comments in the documentation for the other
3DSTATE_*S commands. The "must" part is misleading and unjustified
AFAIK. Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty. There is an exception though. On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.
The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, what will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).
The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only. Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.
Cc: 11.0 <[email protected]>
(cherry picked from commit 5346c1167064d6429c6338974c6342f8346fd34b)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch adds missing type (used with NV_read_depth) so that it gets
handled correctly. This fixes errors seen with following CTS test:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Cc: "11.0" <[email protected]>
(cherry picked from commit d8d0e4a81e42678cc8c8b876dfee24d5c2f4ba38)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
Reviewed-by: Samuel Pitoiset <[email protected]>
(cherry picked from commit 47d11990b2ca3eb666b8ac81fee7f7eb5019eba1)
Squashed with commit
nouveau: avoid emitting new fences unnecessarily
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.
This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: [email protected]
(cherry picked from commit 8053c9208f30964d89dc4e262fdf2148f0664696)
Squashed with commit
nv50,nvc0: don't base decisions on available pushbuf space
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: [email protected]
(cherry picked from commit 9fe458335ffd35366ef0f4b741aad0cdb3503783)
Squashed with commit
nouveau: avoid double-emitting fence
The act of ensuring that there is space can cause a flush to happen,
which will emit the current screen fence. If that is the fence we're
trying to wait on, then it will have been emitted as a result of doing
the PUSH_SPACE. Don't attempt to emit it a second time.
Signed-off-by: Ilia Mirkin <[email protected]>
Fixes: 8053c9208f (nouveau: avoid emitting new fences unnecessarily)
Cc: [email protected]
(cherry picked from commit bf97f8d467ad1d485c2327da3f4fe1f9e1dc7379)
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 30570b262971c881366deab58caf8d8d48d7d79d.
As mentioned by Ilia Mirkin:
Please remove this one from your list of cherry-picked patches. While
it fixes real issues on nv30 (and probably the other generations too),
it appears to introduce some new ones on nvc0. I've figured out what's
causing it, but haven't figured out a proper fix. Not sure I'll be
able to before you do a release.
|