| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Trivial.
|
|
|
|
| |
Trivial.
|
|
|
|
| |
Trivial.
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
clock_crystal_freq is always non-zero now.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It looks like there is no way to monitor SDMA busyness on GFX9.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
by setting PIPE_CONTEXT_DEBUG in the caller
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This is overly cautious, but better safe than sorry.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.
In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Both loops now look simple.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
For clarity. It's only used by color interpolation.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This is much simpler.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's for initializing the native (x86) target.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This should fix exports of suballocated buffers.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Fixes performance regression from f50aa21456d - was forcing internal
code generation to target AVX (no gather, etc).
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit d8b2ccdb880f, which causes priglit regressions on GPUs
with SNORM support. We'll have another try at enabling this feature after
the 17.2 branchpoint.
Signed-off-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
A dangling bo object would result in memory corruption while loading a
level in ioquake3_opengl2.
Fixes: 330d0607ed60 (gallium: remove pipe_index_buffer and set_index_buffer)
Suggested-by: Lucas Stach <[email protected]>
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
GC3000 has a new LOG instruction, similar to the new SIN and COS instructions.
Generate the new instruction sequence when appropriate; there are
two occasions, as part of LIT and the generator for the LG2
instruction itself.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If we blit from a rendertarget or a depthstencil buffer there might still
be dirty data in the TS buffer which needs to be flushed out.
Fixes missing shadow tiles in glmark2 shadow.
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before resolving a rendertarget or a depth/stencil resource into a
texture, flush both the color cache and the depth cache together.
It is unclear whether this is necessary for the following stall to
work properly, or whether the depth flush just adds enough time
for the color cache flush to finish before the resolver is started,
but this change removes artifacts that otherwise appear if a texture
is sampled directly after rendering into it.
The test case is a simple QML scene graph with a QtWebEngine based
WebView rendered on top of a blue background:
import QtQuick 2.0
import QtQuick.Window 2.2
import QtWebView 1.1
Window {
Rectangle {
id: background
anchors.fill: parent
color: "blue"
}
WebView {
id: webView
anchors.fill: parent
}
Component.onCompleted: {
webView.url = "<some animated website>"
}
}
If the website is animated, the WebView renders the site contents into
texture tiles and immediately afterwards samples from them to draw the
tiles into the Qt renderbuffer. Without this patch, a small irregular
triangle in the lower right of each browser tile appears solid blue, as
if the texture sampler samples zeroes instead of the website contents,
and the previously rendered blue Rectangle shows through.
Other attempts such as adding a pipeline stall before the color flush or
a TS cache flush afterwards or flushing multiple times, with stalls
before and after each flush, have shown no effect.
Signed-off-by: Philipp Zabel <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Previous check-ins without testing with USE_SIMD16_FRONTEND have
introduced regressions. This fixes the build, not the regressions.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Core will ensure hot tiles are loaded for read and write render targets,
and will skip all output merger for read-only render targets.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
WIP to support read-only render targets.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.
Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We were using the "cp" union fields, which are only valid for compute
shaders. The threads calculation affects the available GPRs, so just
pick a small number for other shader types to avoid limiting available
registers.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The comments are correct - we get -1 and 0. However by adding 1, we
convert this into 0,1. This mostly works for conditionals, but when
negated, this will yield the wrong result. Instead just negate the
values (as they are backwards -- -1 means back instead of front).
Fixes tests/shaders/glsl-fs-frontfacing-not.shader_test and
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing on A530.
The latter also tested on A306 by Rob Clark.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If size of client memory copy is too large, don't copy. The draw will
access user-buffer directly and then block. This is faster and more
efficient than queuing many large client draws.
Applications that still use large client arrays benefit from this. VMD
is an example.
The threshold for this path defaults to 32KB. This value can be
overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.
v2: Use #define for default value, rather than hard-coded constant.
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
Moved reading of environment config options out of
swr_create_screen_internal, into a separate swr_validate_env_options.
This is to keep from cluttering create_screen.
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
Removed the hard-coded constant in favor of a #define. Also removed
TODO comment. The constant value doesn't need an environment
configurable option.
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 7f80a9ff1312 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:
out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found
The path of the generated header needs to be fixed since we build out of
tree.
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (a full
copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
| |
I kept flipping the bool on for debug, so let's just make it available.
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Needing to get our uapi header from libdrm has only complicated things.
Follow intel's lead and drop our requirement for it.
Generated from the same commit mentioned in the README.
v2: Update Android.mk as well, move vc4_drm.h reference for distcheck.
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
| |
The kernel hasn't been synchronous in a couple of years, plus there was
synchronization code right there.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
put the comment before the relevant code. Move declaration of
swizzle_tab var to where it's used.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit bfe1e7737a76e3b046 changed how texture swizzles are set up.
This exposed a latent bug in the VMware driver: we were ignoring
the texture instruction's writemask when applying the 0 and 1
swizzle terms.
This wasn't caught by the Piglit texture swizzle test because it
only exercises fixed function (no write masking).
Fixes issues seen with ETQW apitrace.
CC: <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.
Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.
libGL.so 6464 Kb -> 7000 Kb
libswrAVX.so 10068 Kb -> 5432 Kb
libswrAVX2.so 9828 Kb -> 5200 Kb
Total 26360 Kb -> 17632 Kb
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Use the SWR rasterizer API through the table returned from
SwrGetInterface rather than referencing the functions directly.
This will allow us to move to a model of having the driver dynamically
load the appropriate swr architecture library.
Reviewed-by: Bruce Cherniak <[email protected]>
|