| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've played with a few different approaches to tweak instruction
priority according to how much they increase/decrease register pressure,
etc. But nothing seems to change the fact that compared to original
(pre-multiple-block-support) scheduler, in some edge cases we are
generating shaders w/ 5-6x higher register usage.
The problem is that the priority queue approach completely looses the
dependency between instructions, and ends up scheduling all paths at the
same time.
Original reason for switching was that recursive approach relied on
starting from the shader outputs array. But we can achieve more or less
the same thing by starting from the depth-sorted list.
shader-db results:
total instructions in shared programs: 113350 -> 105183 (-7.21%)
total dwords in shared programs: 219328 -> 211168 (-3.72%)
total full registers used in shared programs: 7911 -> 7383 (-6.67%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 21294 -> 21294 (0.00%)
half full const instr dwords
helped 0 322 0 711 215
hurt 0 163 0 38 4
The shaders hurt tend to gain a register or two. While there are also a
lot of helped shaders that only loose a register or two, the more
complex ones tend to loose significanly more registers used. In some
more extreme cases, like glsl-fs-convolution-1.shader_test it is more
like 7 vs 34 registers!
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It causes confusion in sched if we need to split_addr() since otherwise
we wouldn't easily know which block the new addr instr will be scheduled
in. So just side-step the whole situation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We'll need to add similar for ir3_instruction, but following the pattern
to use 'id' seems confusing. Let's just go w/ generic 'data' as the
name.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than assigning inloc up front, when we don't yet know if it will
be unused, assign it last thing before the legalize pass.
Also, realize when inputs are unused (since for frag shader's we can't
rely on them being removed from ir->inputs[]). This doesn't make sense
if we don't also dynamically assign the inloc's, since we could end up
telling the hw the wrong # of varyings (since we currently assume that
the # of varyings and max-inloc are related..)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Rather than magic depth value, which won't be available in later stages.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Same as previous commit, for a4xx.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the interpolation / point-sprite replacement mode setup deal with
varying packing.
In a later commit, we switch to packing just the varying components that
are actually used by the frag shader, so we won't be able to assume
everything is vec4's aligned to vec4. Which would highly confuse the
previous vinterp/vpsrepl logic.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Tested on a4xx. This is part of the builtins added by ARB_gpu_shader5
and GLSL ES 3.10.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
The program emit depends on certain fb details. Make sure those get
updated when the fb changes.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that the hardware wants the integer to be scaled the same way
that the hardware representation is. snorm16 uses one of the float
factors, so this is only relevant for snorm8.
This fixes a number of subcases of
bin/fbo-blending-formats GL_EXT_texture_snorm
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Looks like the width field's bitmask was off-by-one.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: do the same in tgsi_to_nir (Samuel)
v3: added missing cases after rebase (Iago)
v4: Add a blank space after '#' in one of the comments (Matt)
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
| |
We already pass in start_instance in fd4_draw. Expose the extension.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Same fix as on a3xx - set the second (tiny) layer size bitfield to the
smallest level's size so that the hw knows not to minify beyond that.
This fixes texelFetch sampler3D piglits.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
| |
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.
This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Same as commit 84d087aea but for a4xx. The RE'd enums had the same issue
too.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Looks like a4xx hw does this in a more standard way and we don't need to
hack around it like we do on a3xx. Fixes GL_ALPHA formats in
fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
| |
Instead of playing the guessing game as to which texture format reads
from which border color encoding type, just write both of them always.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
There are not native RGBX render formats, so we must manually force
dst_alpha to be one, same as for a3xx.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is the NIR analog to GLSL IR ir_samples_identical.
v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support
(required for GL3)'
TODO some more r/e.. maybe we get lucky and hw supports some of this
directly? For now this will help us enable gl3.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main issue is that the current logic looked into cso->u.tex, which
is the wrong side of the union to look into for texture buffers. While I
was at it, it was easy enough to add the logic to handle offsets
(first_element).
- reduce texture buffer size limit (determined experimentally)
- don't look at first/last levels, instead look at first/last element
- include the first element offset
- set offset alignment to 16 (determined experimentally)
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.
Lastly fix up Z32S8 transfers to non-0 layers.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previously RE'd formats were from an ES driver implementing
OES_vertex_type_10_10_10_2 and thus backwards. A future change could add
the 2_10_10_10 support.
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'd end up in a state where shader uses no inputs, yet num_elements is
greater than zero. Triggered by a TF vertex shader which did:
gl_Position = vec4(0.0, 0.0, 0.0, 0.0);
resulting in a binning pass variant with no inputs.
Includes equiv fix in a4xx, even though we don't have binning-pass
enabled yet on a4xx.
Signed-off-by: Rob Clark <[email protected]>
|