| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Turns out we can write to tiled images as well as read. This avoids
having to linearize or do the tiling in the shader.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.
This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.
FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.
This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.
Fixes: e68871f6a ("spirv: Handle constants and types before execution
modes")
v2: (Jason)
* glsl_to_nir: get the info before glsl_to_nir, while all the rest
of the info gathering is happening
* prog_to_nir: gather the info on a general info-gathering pass,
not on variable setup.
v3: (Jason)
* Squash with the patch that removes that info from ir variable
* anv: assert that OriginUpperLeft is true. It should be already
set by spirv_to_nir.
* blorp: set origin_upper_left on its core "compile fragment
shader", not just on some specific places (for this we added an
helper on a previous patch).
* prog_to_nir: no need to gather specifically this fragcoord modes
as the full gl_program shader_info is copied.
* spirv_to_nir: assert that we are a fragment shader when handling
this execution modes.
v4: (reported by failing gitlab pipeline #18750)
* state_tracker: update too due changes on ir.h/gl_program
v5:
* blorp: minor change after change on previous patch
* radeonsi: update due this change.
v6: (Timothy Arceri)
* prog_to_nir: remove extra whitespace
* shader_info: don't use :1 on origin_upper_left
* glsl: program.fs.origin_upper_left/pixel_center_integer can be
move out of the shader list loop
|
|
|
|
|
|
|
|
|
|
|
|
| |
The condition code in extended branches is repeated 8 times for unclear
reasons; accordingly, the code would be disassembled as "unknown5555",
"unknownAAAA", etc. This patch correctly masks off the lower two bits to
find the true code to print, verifying that the code is repeated as
believed to be necessary (providing some assurance for compiler quality
and an assert trip in case we encounter a shader in the wild that breaks
the convention).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
discard and discard_if are both implemented with the branching pipeline
on Midgard; essentially, we branch to the end of the fragment shader in
a special "discard" mode, setting the condition as necessary.
Previously, we hardcoded the form of this instruction, which worked for
very simple shaders but was incorrect for anything remotely interesting.
This patch instead emits logical branches in the IR, which are flattened
to real discard ops the same way other branches are, allowing targets to
be computed correctly.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we only emitted compact branches; however, the offset range
of these branches is too small for many real world shaders. This patch
implements support for emitting extended branches and switches to always
using them for control flow. This incurs a code size and possibly
performance penalty, but expands the range of working shaders and
provides opportunity for further optimization.
Support for emitting compact branches is retained but this code path is
presently unused. In the future, we'll want to heuristically determine
which type of branch should be emitted for optimal codegen.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Midgard features "compact branches" and "extended branches", i.e.
corresponds to short jumps and far jumps. The form of the extended
branch was previously incorrect in the ISA headers; this patch corrects
it and updates the disassembler (simultaneous to preserve
bisectability).
Additionally, we fix some a corner case in the disassembly of extended
branches, and we now prefix extended branches with "brx", to visually
differentiate from compact branches prefixed with "br".
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
An if-else statement is compiled to a conditional branch (from the start
to the second block) and an unconditional branch (from the end of the
first block to the end of the else). We previously incorrectly computed
the block index of the unconditional branch to be exactly one after that
of the conditional branch, valid for a single if-else statement but
nothing fancier. This patch correctly computes the unconditional branch
target, fixing more complex if-else chains.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each Midgard instruction is scheduled to a particular instruction type
("tag"). Presumably the hardware prefetches memory based on tag, so it
is required to report out the first tag to the command stream and the
next tag of a branch target. This procedure was implemented in two
separate parts of the compiler (one time with a slight bug relating to
empty blocks); this patch refactors to unite the two routines and solve
the bug when branching to empty blocks.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Historically, Panfrost debugging entailed the use of the LD_PRELOADable
`panwrap` tool. This setup is a tad fragile; Panfrost can be traced
directly without the intermediate layer. pantrace implements the
quivalent functionality of panwrap into Panfrost proper, allowing dumps
to work regardless of the kernel layer in use.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting
communication between the driver and the kernel. Modern panwrap versions
do no processing of their own; instead, they create a trace directory.
This directory contains the following files:
- control.log: a line-by-line plain text file, denoting important
syscalls (mmaps and job submits) along with their arguments
- memory_*.bin, shader_*.bin: binary dumps of mapped memory
Together, these files contain enough information to reconstruct the
command stream and shaders of (at minimum) a single frame.
The `pandecode` utility takes this directory structure as input,
reconstructing the mapped memory and using the job submit command as an
entrypoint. It then walks the descriptors as the hardware would, parsing
and pretty-printing. Its final output is the pretty-printed command
stream interleaved with the disassembled shaders, suitable for driver
debugging. For instance, the behaviour of two driver versions (one
working, one broken) can be compared by diff'ing their decoded logs.
pandecode/decode.c was originally a part of `panwrap`; it is the oldest
living code in the project. Its history is generally not worth
preserving.
panwrap itself will continue to live downstream for the foreseeable
future, as it is specifically written for the vendor kernel. It is
possible, however, to produce equivalent traces directly from Panfrost,
bypassing the intermediate wrapping layer for well-behaved drivers.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This is not yet functional, but it resolves a crash in various apps and
provides a framework for further work.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use tc.stream_uploader in si buffer_transfer_map if not called from
the driver thread
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
for radeonsi
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
| |
radeonsi will require this. It's a no-op for drivers supporting persistent
mappings.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the 'UNK31' bit (which should probably be called 'BUFFER') for
samplerBuffer case, which increases the size of supported buffer
texture beyond 2^15 elements.
Also need to fix the 2nd coord injected to handle the tex instructions
that take integer coords.
Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071
and similar
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Fixes dEQP-GLES31.functional.image_load_store.{3d,cube}.store.*
and a bunch more
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The variant will be NULL if RA failed. Which isn't ideal, but at least
lets not segfault and bring down the rest of the dEQP run with us.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
While I haven't tested them all, given that they're all using the same
allocation paths and modifiers in the kernel they should be fine to use in
the same way.
v2: Rebase on other kmsro changes.
v3: Skip repeated '[with_gallium_kmsro,' in the meson build.
Acked-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gets stencil and depth resolves working properly.
Fixes:
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Blitter does support it after all. Previous attempt to use R8_UINT
failed because we overwrote the a6xx format in emit_blit_texture(),
but some of the later setup still looked at the gallium format.
If we overwrite it in the pipe_blit_info before we even call into
emit_blit_texture() it works properly.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL
leaves it undefined). Performing fpow lowering in NIR would break this
behavior, preventing us from using prog_to_nir.
According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common
expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>,
which presumably does a zero-wins multiply.
Lowering in NIR results in a non-legacy multiply, where:
pow(0, 0) = 2^(log2(0) * 0)
= 2^(-INF * 0)
= 2^(-NaN)
= -NaN
which isn't the desired result.
This reverts:
- commit d6b75392067712908bdc372f1007e085439bf9f5
(ac/nir: remove emission of nir_op_fpow)
- commit 22430224fec31591432d4a3e65c6f457ba1c1653
(radeonsi/nir: enable lowering of fpow)
and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir
after enabling prog_to_nir in st/mesa later in this series.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results for VEGA64:
Totals from affected shaders:
SGPRS: 1976 -> 1976 (0.00 %)
VGPRS: 1240 -> 1144 (-7.74 %)
Spilled SGPRs: 145 -> 145 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 34632 -> 34604 (-0.08 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 261 -> 285 (9.20 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shader-db results for VEGA64:
Totals from affected shaders:
SGPRS: 791528 -> 792616 (0.14 %)
VGPRS: 421624 -> 410784 (-2.57 %)
Spilled SGPRs: 1639 -> 1674 (2.14 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 16103516 -> 16063696 (-0.25 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 136307 -> 137830 (1.12 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The nine state tracker can produce NIR uniform variables
whose location is explicitly set. radeonsi did not take that
into account when calculating const_file_max, resulting in
rendering glitches. This patch fixes that.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Cc: 18.3 19.0 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: f1374805a86d0d506557 "drm-uapi: use local files, not system libdrm"
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109647
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: f1374805a86d0d506557 "drm-uapi: use local files, not system libdrm"
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109645
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
It's unused in the VS (since we need vattr_sizes[] anyway), so move it to
FS prog data.
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently we need disable-EZ flagged, not just "does Z writes".
Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.
Signed-off-by: Eric Anholt <[email protected]>
Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
|
|
|
|
|
|
| |
Fixes intermittent fails in
dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices
and others (particularly when run as part of a CTS run)
|
|
|
|
|
|
|
| |
Otherwise, we might have pages accessible that shouldn't be and miss out
on errors. This is unlikely for most tests since v3d_hw_get_mem() is big
enough that it'll be a freshly zeroed mmap, but if screens are destroyed
and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since gl_HelperInvocation is lowered to:
!((1 << sample_id) & sample_mask_in))
Not setting these enable bits was causing it be broken. (And probably a
bunch of other stuff too.)
Fixes dEQP-GLES31.functional.shaders.helper_invocation.*
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
This fixes issues where polygons that should be culled (due to negative
w, for instance) may not be.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
v2: Don't check for NULL before free()
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Soon we'll need this logic to deal w/ image/SSBO case, so split out a
helper rather than duplicate the logic.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Images and SSBOs don't map directly to the hw. They end up being part
texture and part something else. Starting with a6xx, the hack used for
a5xx to smash the image tex state into hw texture state starting from
MAX counting down won't work, because we start using tex state also for
SSBO read.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Split out of the main fd6_emit() code, since it was already getting to
be a pretty giant function.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Reduce stack space used by clipper, which had lead to crashes in some
versions for MSVC
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|