| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows most GPU objects to use the full 48-bit address space
offered by Gen8+ platforms, rather than being stuck with 32-bit.
This expands the available GPU memory from 4G to 256TB or so.
A few objects - instruction, scratch, and vertex buffers - need to
remain pinned in the low 4GB of the address space for various reasons.
We default everything to 48-bit but disable it in those cases.
Thanks to Jason Ekstrand for blazing this trail in anv first and
finding the nasty undocumented hardware issues. This patch simply
rips off all of his findings.
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
This makes the name shorter in debug printouts. If "workaround_bo"
is good enough for the code, it's probably good enough for debugging.
|
|
|
|
|
| |
When anything goes wrong with this code, dumping the validation list
is a useful way to figure out what's happening.
|
|
|
|
|
|
|
|
|
|
| |
Fixes: 6c530ad11605
("i965: Reduce passing 2x32b of reloc_domains to 2 bits")
Signed-off-by: Andriy Khulap <[email protected]>
Signed-off-by: Vadym Shovkoplias <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 16631ca30ea6 we fixed gen9 active components to account for padded
inputs in the URB, which we can have with SSO programs. To do that,
instead of going through the bitfield of inputs (which doesn't include
padding information), we compute the number of inputs from the size
of the URB entry.
Unfortunately, there are some special inputs that are not stored in
the URB and that we also need to account for. These special inputs
are identified and handled during calculate_attr_overrides().
Instead of keeping track of the exact number of inputs, we just
program active components for all possible inputs like we do in
anvil.
This fixes a regression in a WebGL program that uses Point Sprite
functionality (specifically, VARYING_SLOT_PNTC).
v2:
- Add 'Fixes' tag (Mark Janes)
- make no_vue_inputs int instead of uint32_t, and add const qualifier
to num_inputs variable (Ian)
v3:
- Do not try to count inputs correctly, just program all input
slots like we do in anvil (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs with over 16 inputs)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a2c1e48f15995a826dc759e064c2603882a37e0c.
On BDWGT3e and KBLGT3e systems, this commit regressed the following
tests:
piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil
|
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we were trusting in the hardware to take the intersection
of the viewport clip with the drawing rectangle. Unfortunately,
3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly
does a full pipeline stall. If we're a bit more careful with our
viewport clipping, we can just re-emit it once at context creation
time.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Gen11 does not support DF, Q, UQ types in hardware. As a result, we have
to disable some GL extensions until they can be reimplemented.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lower_blend_equation_advanced().
This requires passing an extra argument to the lowering pass because
the KHR_blend_equation_advanced specification doesn't seem to define
any mechanism for the implementation to determine at compile-time
whether coherent blending can ever be used (not even an "#extension
KHR_blend_equation_advanced_coherent" directive seems to be required
in the shader source AFAICT).
In the long run we'll probably want to do state-dependent recompiles
based on the value of ctx->Color.BlendCoherent, but right now there
would be no benefit from that because the only driver that supports
coherent framebuffer fetch is i965 on SKL+ hardware, which are unable
to support the non-coherent path for the moment because of texture
layout issues, so framebuffer fetch coherency is always enabled for
them.
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
| |
The changes I had originally planned for the MESA_shader_framebuffer_fetch
extension have been merged into the EXT spec, there's no point in keeping
MESA_shader_framebuffer_fetch extension enables.
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
| |
This GL entry point was renamed to glFramebufferFetchBarrier() in the
EXT extension on request from Khronos members. Update the Mesa
codebase to match the latest spec.
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts two bogus and seemingly useless changes from the commits
referenced below, which broke KHR_blend_equation_advanced (and
EXT_shader_framebuffer_fetch_non_coherent which wasn't exposed yet)
for any kind of render target surface that would cause the
get_isl_surf() call in brw_emit_surface_state() to do anything useful
(notice how the result of get_isl_surf() is completely ignored by the
caller right now), as was the case while using those extensions with
1D array or 3D framebuffers in particular.
Fixes: f5859b45b1686e8116380d87 "i965/miptree: Switch remaining surfaces to isl"
Fixes: bf24c3539e4b6989512968ca "i965/miptree: Clean-up unused"
Cc: [email protected]
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
| |
GLSL 1.40 is required.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 458468c136e "i965: Expose OA counters via INTEL_performance_query"
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT
which has tighter restrictions than just "it's shared". In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined.
The GLX_EXT_texture_from_pixmap extension provides us with an acquire
and release in the form of glXBindTexImageEXT and glXReleaseTexImageEXT.
The extension spec says,
"Rendering to the drawable while it is bound to a texture will leave
the contents of the texture in an undefined state. However, no
synchronization between rendering and texturing is done by GLX. It
is the application's responsibility to implement any synchronization
required."
From the EGL 1.4 spec for eglBindTexImage:
"After eglBindTexImage is called, the specified surface is no longer
available for reading or writing. Any read operation, such as
glReadPixels or eglCopyBuffers, which reads values from any of the
surface’s color buffers or ancillary buffers will produce
indeterminate results. In addition, draw operations that are done
to the surface before its color buffer is released from the texture
produce indeterminate results
In other words, between the bind and release calls, we effectively own
those pixels and can assume, so long as we don't crash, that no one else
is reading from/writing to the surface. The GLX and EGL implementations
call the setTexBuffer2 and releaseTexBuffer function pointers that the
driver can hook.
In theory, this means that, between BindTexImage and ReleaseTexImage, we
own the pixels and it should be safe to track aux usage so we
can avoid redundant resolves so long as we start off with the right
assumption at the start of the bind/release pair.
In practice, however, X11 has slightly different expectations. It's
expected that the server may be drawing to the image at the same time as
the compositor is texturing from it. In that case, the worst expected
outcome should be tearing or partial rendering and not random corruption
like we see when rendering races with scanout with CCS. Fortunately,
the GEM rules about texture/render dependencies save us here. If X11
submits work to write to a pixmap after the compositor has submitted
work to texture from it, GEM inserts a dependency between the compositor
and X11. If X11 is using a high-priority context, this will cause the
compositor to get a temporarily boosted priority while the batch from
X11 is waiting on it. This means that we will never have an actual race
between X11 and the compositor so no corruption can happen.
Unfortunately, however, this means that X11 will likely be rendering to it
between the compositor's BindTexImage and ReleaseTexImage calls. If we
want to avoid strange issues, we need to be a bit careful about
resolves because we can't really transition it away from the "default"
aux usage. The only case where this would practically be a problem is
with image_load_store where we have to do a full resolve in order to use
the image via the data port. Even there it would only be a problem if
batches were split such that X11's rendering happens between the resolve
and the use of it as a storage image. However, the chances of this
happening are very slim so we just emit a warning and hope for the best.
This commit adds a new helper intel_miptree_finish_external which resets
all aux state to whatever ISL says is the right worst-case "default" for
the given modifier. It feels a little awkward to call it "finish"
because it's actually an acquire from the perspective of the driver, but
it matches the semantics of the other prepare/finish functions. This
new helper gets called in intelSetTexBuffer2 instead of make_shareable.
We also add an intelReleaseTexBuffer (we passed NULL to releaseTexBuffer
before) and call intel_miptree_prepare_external in it. This probably
does nothing most of the time but it means that the prepare/finish calls
are properly matched.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work. There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:
1) It may have a different target. The only targets that we can ever
see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
and the difference between the two doesn't matter as far as the
miptree is concerned; genX(update_sampler_state) only looks at the
gl_texture_object and not the miptree when determining whether or
not to use normalized coordinates.
2) It may have a very slightly different format. Again, this doesn't
matter because we've supported texture views for quite some time so
we always look at the gl_texture_object format instead of the
miptree format for hardware setup anyway.
On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers. We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This function is used to determine when we need to re-allocate a
miptree. Since we do nothing different in miptree allocation for
sRGB vs. linear, loosening this should be safe and may lead to less
copying and reallocating in some odd cases.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're about to start letting the intel_obj->_Format be the "real"
texture format. For depth/stencil textures, this may be a combined
depth stencil format. For ETC2 on gen7 and earlier, this will be the
actual ETC2 format. This makes a bit more GL sense but means we have to
be careful in state upload.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility. This lets us push up to 4 UBO ranges.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance. We also need a brand new kernel that supports
context isolation - on older kernels, newly created contexts inherit
register state from whatever happened to be running. So, setting this
would have catastrophic impact on other drivers such as libva, Beignet,
or older Mesa.
See commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b where we did this
once before, but had to revert it in commit 013d33122028f2492da90a03a.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Kernel 4.16 has proper context isolation, which means we can change
the L3 configuration without worrying about that leaking to other
newly created contexts, breaking the assumptions of other userspace.
So, disable our workaround to reprogram it back to the default.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit reworked the checks intel_from_planar() to check the
right individual cases for regular/planar/aux buffers, and do size
checks in all cases.
Unfortunately, the aux size check was broken, and required the aux
surface to be allocated with the correct aux stride, but full image
height (!).
As the ISL aux surface is not recorded in the DRIimage, we cannot easily
access it to check. Instead, store the aux size from when we do have the
ISL surface to hand, and check against that later when we go to access
the aux surface.
Signed-off-by: Daniel Stone <[email protected]>
Fixes: c2c4e5bae3ba ("i965: Fix bugs in intel_from_planar")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Meta is awful and we'd like to stop using it. Implementing this using
BLORP allows us to stop trashing a bunch of GL state every time.
This follows the structure of st_generate_mipmap().
compute_num_levels is lifted directly from there.
Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3)
on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher
resolutions).
v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5
because it's not implemented yet.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From PIPE_CONTROL command description in gfxspecs:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
V2: Move the PIPE_CONTROL to update_renderbuffer_surfaces() in
brw_wm_surface_state.c (Ken).
Fixes a fulsim error and a GPU hang described in below JIRA.
JIRA: MD5-322
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Sampling from hiz is enabled in i965 for GEN9+ but this feature has
been removed from gen11. So, this new flag will be useful to turn
the feature on/off for different gen h/w. It will be used later
in a patch adding device info for gen11.
Suggested-by: Kenneth Graunke <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
SIMD4x2 dispatch mode has been removed in GEN11. We're not using
it anyways in Mesa. Adding few asserts to make it explicit.
Use GEN_GEN macro in place of devinfo->gen (Ken)
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Nothing is changed here from gen10 to gen11. So, just update
the assert.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Gen11 MOCS settings are duplicate of Gen10 MOCS settings.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
This field is removed in gen11+
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These only existed to avoid making people update libdrm for new uABI
headers. A while ago we imported those headers into the Mesa repo,
so the dependency is gone and these are no longer useful.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generators really are never the thing you want. The problem in this case
is that a generator must create a file that contains any file that the
generated target depends on. Since brw_oa.py doesn't generate such a
file the generated sources are not regenerated even if the xml files
they should depend on changes.
While we could change brw_oa.py to write such a file, that's silly, it
depends on itself and the xml file. So we'll just use a custom target
instead, which will have the correct dependency behavior and doesn't
really add that much code.
Fixes: 3218056e0eb3 ("meson: Build i965 and dri stack")
CC: Ian Romanick <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes the build in clang
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105088
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
When initializing mcs, map with MAP_RAW and fill in the linear
map. Removes a place where gtt mapping is used.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
In all current uses, the linear surface is only allocated starting
at (xt1, yt1) anyway, so this improves the calling ergonomics.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TileY's low 6 address bits are: v1 v0 u3 u2 u1 u0
Thus a cache line in the tiled surface is composed of a 2d area of
16x4 bytes of the linear surface.
Add a special case where the area being copied is 4-line aligned
and a multiple of 4-lines so that entire cache lines will be
written at a time.
On Apollolake, this increases tiling throughput to wc maps by
84.0103% +/- 0.862818%
v2: Split [y0, y1) and [y2, y3) loops apart for clarity (Jason Ekstrand)
v3: Don't reset src var (Jason), Ensure y0 <= y1 <= y2 <= y3
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen10 seems pretty stable so far, so there's no reason to keep this
message.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Cc: "18.0" [email protected]
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Looks like one conversion was missed.
Fixes: e149a0253 (mesa,glsl,nir: reduce gl_state_index size to 2 bytes)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105067
Signed-off-by: Dave Airlie <[email protected]>
Tested-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Let's use the new gl_state_index16 type everywhere and remove
the typecasts.
This helps reduce the size of gl_program_parameter.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|