| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".
However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.
Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regression introduced by
ba0274c7d6c3b77a36bbe1b444f427b0c873e2f3
Check the resource exists before assigning it
a flag (and use This->base.resource instead
of pResource, since the former may have a newly
allocate resource, while the latter would be
NULL).
This should reintroduce the behaviour of previous
code.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 1604efa6fda9b780e8537a131ad77f3e83e5a67a,
lconsti and lconstb don't need to be initialized.
Remove some leftovers from the previous code (which
has now invalid use of ARRAY_SIZE on a pointer instead
of an array).
Reported by Coverity.
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
Use uint64_t instead of int64_t in the calculation,
as the result is uint64_t.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
All ARB_enhanced_layouts piglit tests pass without any changes
in our compiler.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These were changed in LLVM r284024.
v2:
- Only use float types for vdata of llvm.amdgcn.image.store. LLVM doesn't
support integer types for this intrinsic.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The identity swizzle should operate exactly
like an .r = R, .g = G, .b = B, .a = A swizzle.
This fixes a bunch of the 16-bit BGRA blit tests
dEQP-VK.api.copy_and_blit.blit_image.all_formats.b4g4r4a4*
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix was found in the radv codebase when running dota2,
no idea if anyone has reported it on anv, but the same problem
occurs.
Once an image is acquired we need to mark it busy.
Acked-by: Edward O'Callaghan <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dota2 does multiple acquires followed by multiple queues,
this bug manifested itself as a hang in the xshmfence code
randomly when dota2 was doing it's menus. It also occured
when running dota2 under phoronix-test-suite.
The fix is once the image is acquired to mark it busy then
so nobody else can acquire. We have to trust vulkan apps
that they will eventually submit it.
Acked-by: Edward O'Callaghan <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
At least set this to not be uninitialised memory.
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a serious performance fix. Discovered by luck.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354
Cc: 12.0 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.
Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.
This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
I don't see a measurable increase in performance though.
(I tested Talos Principle and DiRT: Showdown, the latter is improved by
0.5%, which is almost noise, and it originally used layered Z16,
so at least we know that Z16 promoted to Z32F isn't slower now)
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For performance tuning in drivers. It filters out window system
framebuffers and OpenGL renderbuffers.
radeonsi will use this to guess whether a depth buffer will be read
by a shader. There is no guarantee about what will actually happen.
This is a departure from PIPE_BIND flags which are defined to be strict
but they are useless in practice.
Acked-by: Roland Scheidegger <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Caused by a bad rebase when pushing commit 76a940893.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whether one or two slots are taken up by one API array depends on the
vertex shader, not on how the array is configured. When an array is
set up with fewer components than the shader expects, the high components
are undefined.
Fixes GL45-CTS.vertex_attrib_binding.basic-inputL-case1.
Cc: [email protected]
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug with offsets from uniforms which seems to have only been
noticed as a crash in piglit's
arb_gpu_shader5/compiler/builtin-functions/fs-gatherOffset-uniform-offset.frag
on radeonsi.
Cc: [email protected]
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE. Since only a single
physical device is supported, this is only the case when
pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gallium is completely oblivious to whether the fbo is flipped or not.
Only flip the stipple pattern when the fbo is flipped as well. Otherwise
the driver has no idea when to unflip the pattern.
Fixes bin/gl-2.1-polygon-stipple-fs -fbo
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Tested-by: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Otherwise they'll be missing from the tarball and the build will fail.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Recent fix for non-const offsets broke the case of a single offset (vs 4
offsets). The later code relies on the offs array to contain null values
to tell whether they should be added onto the srcs list.
Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
The offset needs to be properly copied over to the phi value, otherwise
it will get assigned to the base of the merge instead of the proper
location.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
v2: mark llvmpipe & softpipe properly as well (Jason Wood)
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to be able to emit overlapping input and output array
declarations, we flip the logic of emitting those declarations on its
head: rather than iterating over slots and emitting the corresponding
declarations, we iterate over the declarations from GLSL and emit those.
v2: fix some regressions related to structs
v3: fix a regression in geometry and tessellation shader array handling
Acked-by: Edward O'Callaghan <[email protected]> (v2)
Reviewed-by: Dave Airlie <[email protected]> (v2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, a shader may have an input/output array but not use some
entries in the middle. This happens with eON games, for example.
We emit declarations that cover the entire array range even if there are
some unused gaps. This patch now reflects that in the InputsRead etc.
fields to ensure the various input/outputMapping arrays are actually
correct, which will be important when we re-jiggle the way declarations
are emitted.
v2: fix a typo (Edward O'Callaghan)
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimization is incorrect with 64-bit operations, because the
channel-splitting logic in emit_asm ends up being applied twice to
the source operands.
A lucky coincidence of how the writemask test works resulted in this
optimization basically never being applied anyway. As far as I can tell,
the only case where it would (incorrectly) have been applied is something
like
dvec2 d;
float x = (float)d.y;
which nobody seems to have ever done. But the moral equivalent does occur
in one of the component layout piglit test.
Cc: [email protected]
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Empty writemasks mean "copy everything", so we can always just use the number
of vector elements (which uses the GLSL meaning here, i.e. each double is a
single element/writemask bit).
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For specifying an exact location/component.
v2: change the order of parameters (Dave)
Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
Reviewed-by: Dave Airlie <[email protected]> (v1)
|
|
|
|
|
|
|
| |
v2: change the order of parameters (Dave)
Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
Reviewed-by: Dave Airlie <[email protected]> (v1)
|
|
|
|
|
|
|
| |
v2: remove a tautological left-over assert (Marek)
Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
Reviewed-by: Dave Airlie <[email protected]> (v1)
|
|
|
|
|
|
|
|
| |
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This patch requires LLVM r284024 or newer.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
The existing function only worked for integer types.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs :335256 -> 335273 (0.01%)
total local used in shared programs :31968 -> 31968 (0.00%)
local gpr inst bytes
helped 0 41 852 852
hurt 0 44 23 23
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|