| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an object-level preemption workaround which requires this.
However, even without object-level preemption, we seem to have issues
with geometry flickering when 3D and compute are combined in the same
batch and this appears to fix it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110395
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
(cherry picked from commit b8842bc3128a255677a1a8ea5207df46f8e54a04)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test-case with depth-clear 0.5 and format
MESA_FORMAT_Z24_UNORM_X8_UINT fails due inconsistent
clear-value of 0.4999997.
Maybe its better to improve?
CC: Jason Ekstrand <[email protected]>
Fixes: 0ae9ce0f29ea (i965/clear: Quantize the depth clear value based on the format)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113
Signed-off-by: Sergii Romantsov <[email protected]>
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit a86eccfb78092493b3999849db62613838951756)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.
We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.
This reverts commit 85ecd14ef6a084f5e82860de6dbc79870b335682.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit 7746d4edef5332777bd0206e836617879ad8bb70)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of any enabled VS members from: uses_firstvertex,
uses_baseinstance, uses_drawid, uses_is_indexed_draw
leaks may happens.
Call gen6_upload_push_constants allocates
stage_stat->push_const_bo. It than takes pointer from
push_const_bo to draw_params_bo (in the call
brw_prepare_shader_draw_parameters by brw_upload_data)
and do reference which finally haven't got unreferenced.
Fixes leak:
136 bytes in 1 blocks are definitely lost in loss record 6 of 13
at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0xC2B64B7: bo_alloc_internal (brw_bufmgr.c:596)
by 0xC2B6748: brw_bo_alloc (brw_bufmgr.c:672)
by 0xC314BB3: brw_upload_space (intel_upload.c:88)
by 0xC2EBBC5: gen6_upload_push_constants (gen6_constant_state.c:155)
by 0xC9E4FA6: gen9_upload_vs_push_constants (genX_state_upload.c:3300)
by 0xC2E0EDA: check_and_emit_atom (brw_state_upload.c:540)
by 0xC2E0EDA: brw_upload_pipeline_state (brw_state_upload.c:659)
by 0xC2E0FF1: brw_upload_render_state (brw_state_upload.c:681)
by 0xC2C5D2D: brw_draw_single_prim (brw_draw.c:1052)
by 0xC2C62CB: brw_draw_prims (brw_draw.c:1175)
by 0xC488AD1: vbo_exec_vtx_flush (vbo_exec_draw.c:386)
by 0xC485270: vbo_exec_FlushVertices_internal (vbo_exec_api.c:652)
Reviewed-by: Lionel Landwerlin <[email protected]>
Reported-by: Yevhenii Kolesnikov <[email protected]>
Signed-off-by: Sergii Romantsov <[email protected]>
(cherry picked from commit 1931c97a1dc71f8fb548a23247c2a0dd4793ad3c)
|
|
|
|
|
|
|
|
|
| |
It is similar with YUYV
Fixes: 165e704719b85c ("i965/i915: Add UYVY as the supported format")
Signed-off-by: Haihao Xiang <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit 8ead5bebdb5cedc5250116403166279b1b292a85)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was taking a reference to the 64kB upload buffer and never
returning it, leaking a reference each time this atom triggered.
This leaked lots of 64kB upload BOs, eventually running us out of
of VMA space. This would usually happen when using mpv to watch a
movie, after 20-40 minutes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134
Fixes: 63d7b33f516 i965/cs: Setup surface binding for gl_NumWorkGroups
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 3f60810de0a2960ec15118ef9888d9efc9ea605a)
|
|
|
|
|
|
|
|
|
| |
This ports commit 9e7b0988d6e98690eb8902e477b51713a6ef9cae from anv
to i965. Thanks to Lionel for noticing that it was missing!
Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit d568fcd0a09751cd041cdd46bc585e209e0df394)
|
|
|
|
|
|
|
|
| |
This should happen regardless, but let's be paranoid.
Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 17210c63a91aaf018813b0d336f5f1d4fd87eafb)
|
|
|
|
|
|
|
|
|
|
|
| |
The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.
So we can't use the top page.
Fixes: 01058a55229 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 15f134c62853ed6679435a9e4ae40e3308fc7453)
|
|
|
|
|
|
|
| |
Is no longer used, so we have less occasions where NewState is non zero.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
Both drivers are feature-complete and should be running more-or-less at
perf at this point. Drop the warning.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This way we can mark the dri_drivers and dri_link arrays as temporary,
as all knowledge about them are contained in a single build-file with
clearly visible limited life-span.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Acked-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This patch re-enables fast color clears for GEN11.
It also ensures that we use linear color formats
for sRGB surfaces during fast clears.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 624789e3708c moved the destruction of types out of atexit() and
made use of a ref count instead. This is useful for avoiding a crash
where drivers such as radeonsi are still compiling in a thread when the app
exits and has not called MakeCurrent to change from the current context.
While the above scenario is technically an app bug we shouldn't crash.
However that change caused another race condition between the shader
compilation tread in radeonsi and context teardown functions.
This patch makes two changes to fix this new problem:
First we explicitly call _mesa_destroy_shader_compiler_types() when destroying
the st context rather than calling it indirectly via _mesa_free_context_data().
We do this as we must call it after st_destroy_context_priv() so that we don't
destory the glsl types before the compilation threads finish.
Next wait for the shader threads to finish in si_destroy_context() this
also means we need to call context destroy before destroying the queues
in si_destroy_screen().
Fixes: 624789e3708c ("compiler/glsl: handle case where we have multiple users for types")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I left code indented one level too far in the previous commit to make
the diff easier to review. Drop that extra level now.
Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms. i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.
With samplers and images split out of structures, it becomes impossible
to assign meaningful locations. Consider:
struct S {
sampler2D a;
sampler2D b;
} s[2];
The gl_uniform_storage locations for these follow this map:
0 => a[0], 1 => b[0], 2 => a[0], 3 => b[0].
But the new split variables look like:
sampler2D lowered_a[2];
sampler2D lowered_b[2];
and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[]. So, working with
location becomes effectively impossible.
Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields. gl_nir_lower_samplers_as_derefs
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.
We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_derefs has the opportunity to set proper image
bindings. Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.
Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.
Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.
[1] : https://patchwork.freedesktop.org/series/59494/
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
We can deduct the size from another field, let's just save some space.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
| |
One more thing we want to share between the different APIs.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
| |
We want to reuse this in Anv.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
| |
We'll want to reuse this in our Vulkan extension.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
| |
We'll want to reuse those structures later on.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences. This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies. anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.
This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.
We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity. I also tidied the code a
bit while moving it, now that it all lives in one file.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pipeline statistics queries should not count BLORP's rectangles.
(23) How do operations like Clear, TexSubImage, etc. affect the
results of the newly introduced queries?
DISCUSSION: Implementations might require "helper" rendering
commands be issued to implement certain operations like Clear,
TexSubImage, etc.
RESOLVED: They don't. Only application submitted rendering
commands should have an effect on the results of the queries.
Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: handle atomics as well
make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v3)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler. Break the circular
dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Drivers using genxml will start compilation before generated files are
created, so add a dependency to it.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
This updates allows an MI_LRI to trigger a OA report write in the
global OA buffer. This isn't really useful for us, we just keep close
to the internal public configs.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This register is flagged as IVB only in the documentation.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These are new metrics for Gen8/9 to measure the effect of the PMA
stall workaround fix.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This rework the programming between older pre-production steppings &
new ones.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This unifies some of the programming between pre-production stepping
and production ones.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This makes no difference in term of programming, it's just a cleanup.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Seems there is no sense in blitting 0-sized sources
or destinations.
Additionaly it may cause segfaults for i965.
v2: Function call replaced with inline check
v3: Added check to avoid devision by zero (L. Landwerlin)
v4: Added simillar check for Iris (L. Landwerlin)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110239
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
It seems pretty useless nowadays.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From "Alpha Coverage" section of SKL PRM Volume 7:
"If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
hardware, regardless of the state setting for this feature."
From OpenGL spec 4.6, "15.2 Shader Execution":
"The built-in integer array gl_SampleMask can be used to change
the sample coverage for a fragment from within the shader."
From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
"If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
is generated where each bit is determined by the alpha value at the
corresponding sample location. The temporary coverage value is then
ANDed with the fragment coverage value to generate a new fragment
coverage value."
Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"
Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.
The following formula is used to compute final sample mask:
m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
0x0808 * (m & 2) | 0x0100 * (m & 1)
sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <[email protected]> for creating it.
It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|