| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a memory leak with sampler state when piglit
is run with HW version 11. Sampler state clean up was incorrectly skipped
in svga_cleanup_sampler_state() for vgpu9.
Tested with piglit.
Reviewed-by: Neha Bhende <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
Left over test code spotted by Sinclair.
Tested with piglit, Google Earth, Lightsmark, Heaven4, glretraces, etc.
Reviewed-by: Sinclair Yeh <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use SVGA3D_QUERYTYPE_MAX instead of SVGA_QUERY_MAX for
svga query type check.
Tested with various OpenGL apps with GALLIUM_HUD set.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
Replace the num-resources-mapped hud with
num-textures-mapped and num-buffers-mapped, so we can
differentiate the map counts for these two different resources.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
This will make it easier to add new hud types.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some OpenGL apps, like Cinebench R15, have many glDrawElements(GL_QUADS)
calls. Since we don't directly support quads we have to convert these
calls into GL_TRIANGLES which involves generating a new index buffer.
This patch saves the new/translated index buffer in the hope that it
can be reused for a later draw call.
Cinebench R15 increases by about 20% with this change.
The NobelClinician Viewer app also hits this code.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a consecutive sequence of drawing commands references the same
vertex/index buffers, there should be no need to rebind the surfaces
for the second and subsequent drawing commands.
Apps that use multiple display lists benefit from this since the vertex
data for several display lists is often stored in one buffer.
In the case of the legacy E&S Glaze demo, this reduces the size of our
command buffers from 91KB to 44KB. One WSI Fusion trace shows a 33%
reduction in command buffer sizes.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, every time we put shader constants into the default constant
buffer we called u_upload_alloc(), which mapped the buffer, and
u_upload_unmap(). We had to unmap the buffer before calling
svga_buffer_handle() to get the winsys handle for the buffer. But we
really only need to do that the first time we reference the const buffer.
Now we try to keep the upload manager's buffer mapped until we fill it or
flush the command buffer.
v2: add additional comment on the buffer unmapping code in
svga_context_flush(), per Charmaine.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
| |
When we migrate a buffer from sw/malloc storage to a hardware buffer,
don't memcpy the whole buffer, just copy the part we've written to.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PredCopyRegion support copy between same type of textures.
Instead of comparing src and dst pipe texture type, compare svga texture
type which can avoid some software fallback.
for example, it avoids a software blit with the Redway3D Aston demo.
Tested piglit tests on VGPU9 and VGPU10 on GL/DX11Renderer, Redway3D Aston demo
v2: some nit pick suggested by Charmaine.
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
This function returns svga texture type for corresponding pipe texture.
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Offset was wrong, it's at bit 8, not 4. Also, uses subr instead
of sub when src2 has neg. Similar to GK110 now.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This needs to be above the switch on API, as that can return true
(valid to render) before this error check even had a chance to run.
Fixes ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos,
which worked before commit 72f1566f90c434c7752d8405193eec68d6743246.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
| |
Fixes four ES32-CTS.context_flags.* tests.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:
https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md
The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.
The main changes in this patch are:
- We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
- Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.
Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically. However, in Mesa we let the driver do
this work. The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
v2:
- Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
v3:
- Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
This fixes 8 fs-interpolateat* piglit crashes on radeonsi, because it can't
handle non-input operands in interpolateAt*.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This fixes 66 CTS tests on st/mesa.
Cc: 12.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
This fix getting the size of a struct arg. vec3 types still work ok.
Only buit-in args need to have power of two alignment, getTypeAllocSize
reports the correct size in all cases.
Acked-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The gallium interface defines these like DX10. Note that OpenGL ignores
these options if MSAA is disabled or the dest buffer doesn't support
MSAA.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
The old comment was a copy and paste mistake. Indent another comment.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
we should not set the alpha_to_coverage or alpha_to_one flags if the
current drawing buffer does not do MSAA.
This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
ETQW is a game that enables GL_SAMPLE_ALPHA_TO_COVERAGE without MSAA.
Shrubs along the side of roads were invisible because fragments with
alpha < 0.5 were being discarded (zero coverage).
v2: remove ctx->DrawBuffer != NULL check.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the input attachments subpass type
to the SPIRV->NIR pass.
v1.1: drop handling from vtn_handle_texture
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPIR-V/Vulkan have a special image type for input attachments
called the subpass type. It has different characteristics than
other images types.
The main one being it can only be an input image to fragment
shaders and loads from it are relative to the frag coord.
This adds support for it to the GLSL types. Unfortunately
we've run out of space in the sampler dim in types, so we
need to use another bit.
v2: Fixup subpass input name (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen6 only has one additional restriction over Gen7+, so we just add it
to the existing gen7 function (which actually covers later gens too).
This should stop FINISHME spew when running GL on Sandybridge.
v2: Fix bytes per block vs. bits per block confusion (Jason) and
rename function to gen6_filter_tiling (Jason and Chad).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Note that ASTC support is not actually mandated for this extension to be
exposed.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the ARB_gpu_shader5 spec:
The built-in functions interpolateAtCentroid() and interpolateAtSample()
will sample variables as though they were declared with the "centroid"
or "sample" qualifiers, respectively.
When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample". In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.
Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample()
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some apps issue redundant glLoadMatrixf() calls with the same matrix.
Try to avoid setting dirty state in that situation.
This reduces the number of constant buffer updates by about half in
ET Quake Wars.
Tested with Piglit, ETQW, Sauerbraten, Google Earth, etc.
Reviewed-by: Charmaine Lee <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Structurally, this is very similar to the existing Apple-DRI code, except I
have chosen to implement this using the __GLXDRIdisplay, etc. vtables (as
suggested originally in [1]), rather than a maze of ifdefs. This also means
that LIBGL_ALWAYS_SOFTWARE and LIBGL_ALWAYS_INDIRECT work as expected.
[1] https://lists.freedesktop.org/archives/mesa-dev/2010-May/000756.html
This adds:
* the Windows-DRI extension protocol headers and the windowsdriproto.pc
file, for use in building the Windows-DRI extension for the X server
* a Windows-DRI extension helper client library
* a Windows-specific DRI implementation for GLX clients
The server is queried for Windows-DRI extension support on the screen before
using it (to detect the case where WGL is disabled or can't be activated).
The server is queried for fbconfigID to pixelformatindex mapping, which is
used to augment glx_config.
The server is queried for a native handle for the drawable (which is of a
different type for windows, pixmaps and pbuffers), which is used to augment
__GLXDRIdrawable.
Various GLX extensions are enabled depending on if the equivalent WGL
extension is available.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is supposed to be exposed with the GL_KHR_robustness extension,
which we support on ES 2.0 and later. On desktop GL, it's also exposed
by GL_ARB_robustness, which is supported by all drivers ("dummy_true").
so we also allow desktop GL.
Fixes:
- ES32-CTS.robust.robustness.noResetNotification
- ES32-CTS.robust.robustness.loseContextOnReset
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Without this bit set, the value in "L3 Atomic Disable" won't get applied by
the hardware so we won't properly get L3 atomic caching.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex and 198 of
the dEQP-VK.image.atomic_operations.* tests on HSW
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT
at the GPU initialization time and never sets it again. The GL driver sets
it every time the framebuffer changes. Originally, blorp set it to the
size of the drawing area but meant we had to set it back in the Vulkan
driver. Instead, we can easily just do that in the GL driver's blorp_exec
implementation and not set it in blorp core.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we relied on a driver hook for 3DSTATE_MULTISAMPLE. However,
now that Vulkan and GL use the same sample positions, we can set up
3DSTATE_MULTISAMPLE directly in blorp and delete the driver hook.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
instructions"
This reverts commit 524fd55d2d973f50a5d8bc2255684610f5faae32.
Reason: https://bugs.freedesktop.org/show_bug.cgi?id=97808
|
|
|
|
|
|
|
|
| |
Non-16B-aligned pull constant loads are unlikely to be particularly
useful given that you can get roughly the same effect by using
swizzles on the result.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
It might be useful to actually handle this once copy propagation
becomes smarter about register-misaligned offsets.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A better fix would be to do something along the lines of the FS
back-end spilling code and emit a scratch read before any instruction
that overwrites the register to spill partially due to a non-zero
sub-register offset. In the meantime mark registers used with a
non-zero sub-register offset as no-spill to prevent the spilling code
from miscompiling the program.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents it from trying to propagate a copy through a
register-misaligned region. MOV instructions with a misaligned
destination shouldn't be treated as a direct GRF copy, because they
only define the destination GRFs partially. Also fix the interference
check implemented with is_channel_updated() to consider overlapping
regions with different register offset to interfere, since the
writemask check implemented in the function is only valid under the
assumption that the source and destination regions are aligned
component by component.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Cmod propagation would misoptimize the program if the destination
offset of the generating instruction wasn't exactly the same as the
source region offset of the copy instruction. In preparation for
adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
register coalesce.
Because the pass already checks that the destination offset of each
'scan_inst' that needs to be rewritten matches 'inst->src[0].offset'
exactly, the final offset of the rewritten instruction is just the
original destination offset of the copy. This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
MOV source.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
| |
check.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For simplicity just assume that two writes to the same GRF with
different sub-GRF offsets will potentially interfere and break the
dependency control chain. This is in preparation for adding sub-GRF
offset support to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
bytes.
This simplifies things slightly and makes the pass more correct in
presence of sub-GRF offsets.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|