| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The entire goal of intel_miptree_make_shareable() is to permanently
disable the miptree's aux surfaces. So set
intel_mipmap_tree:disable_aux_buffers after the function's done with
discarding down the aux surfaces.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: Nanley Chery <[email protected]
Cc: Haixia Shi <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards using NIR optimisations over GLSL IR
optimisations. Delaying adding built-in uniforms until after
we convert to NIR gives it a chance to optimise them away.
V2: move the new code back to brw_link_shader()
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98297
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This was already set to the same value earlier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was uninitialized, which resulted in weird looking printouts where
it appeared that the TCS output and TES input patch URB entries differed
in SSO/non-SSO layout. There is no "separable" layout for both, as
they're tied together.
It should have no other actual effect.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was a hack which worked around the VS and TCS disagreeing on their
shared interface due to the lack of varying packing. In particular, it
was needed by Piglit's tcs-input-read-array-interface test.
However, that was just one case where things could go awry, so the
previous commit forcibly made interfaces match. This hack is no longer
necessary.
It also seems to be broken, though I'm not sure why. It fixes Piglit
regressions in spec/arb_shader_image_load_store/semantics from commit
ec1f159ac81ed964415d102eed4a0a29be8e7937.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A while ago, I made i965 start compiling shaders independently. The VUE
map layouts were based entirely on each shader's input/output bitfields.
Assuming the interfaces match, this works out well - both sides will
compute the same layout, and outputs are correctly routed to inputs.
At the time, I had assumed that the linker would guarantee that the
interfaces match. While it usually succeeds, it unfortunately seems
to fail in some cases.
For example, Piglit's tcs-input-read-array-interface test has a VS
output array with two elements, but the TCS only reads one. The linker
isn't able to eliminate the unused element from the VS, which makes the
interfaces not match.
Another case is where a shader other than the last writes clip/cull
distances. These should be demoted to ordinary varyings, but they
currently aren't - so we think they still have some special meaning,
and prevent them from being eliminated.
Fixing the linker to guarantee this in all cases is complicated. It
needs to be able to optimize out dead code. It's tied into varying
packing and other messiness. While we can certainly improve it---and
should---I'd rather not rely on it being correct in all cases.
This patch ORs adjacent stages' input/output bitfields together,
ensuring that their interface (and hence VUE map layout) will be
compatible. This should safeguard us against linker insufficiencies.
Fixes line rendering in Dolphin, and the Piglit test based on it:
spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The PRMs for HSW and newer say that other than the opcode and DebugCtrl
bits of the instruction word, the rest must be zero.
By zeroing the instruction word manually, we avoid using any of the
state inherited through brw_codegen.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocating zero URB space is a really bad idea. The hardware has to
give threads a handle to their URB space, and threads have to use that
to terminate the thread. Having it be an empty region just breaks a
lot of assumptions. Hence, why we asserted that it isn't possible.
Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0.
In theory a geometry shader could do SSBO/image access and maybe
still accomplish something. In reality, this is tripped up by
conformance tests.
Gen8+ already avoids this problem by placing the vertex count DWord
in the URB entry header. This fixes things on earlier generations.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Tested-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
This reverts commit 9404439a754e5640ccd98df40fa694835c0d8759. I didn't
intend to push it and it breaks clip and cull distance.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When I originally implemented the ARB_copy_image extension, the fast-path
was written in meta using texture views. This path only worked if both
images were uncompressed color images. All of the other cases fell back to
the blitter or, in the worst case, mapping and memcpy on the CPU. Now that
we have the blorp path, it handles all copies ever and the old meta,
blitter, and CPU paths are only used on gen5 and below. The primary reason
why we needed the meta path (apart from having a slow blitter on later
hardware) was to handle multisampling which gen5 and earlier don't support
anyway. Since the blitter is reasonably fast on gen5, we can just delete
the meta path and get rid of all that terrible code.
If we decide that we're ok with just disabling ARB_copy_image on gen5 and
earlier (I personally am), then we could get rid of another 300 lines or so
of semi-hairy code.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
By using emit_miptree_blit which does chunking, this fixes the blitter path
for the case where the image is too tall to blit normally. We also pull it
into intel_blit as intel_miptree_copy. This matches the naming of the
blorp blit and copy functions brw_blorp_blit and brw_blorp_copy.
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()
We want to do this pass in nir to be able to move loop unrolling
to nir.
There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
Shader-db results BDW:
total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79
total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37
LOST: 2
GAINED: 0
|
|
|
|
|
|
|
|
| |
Otherwise subsequent render cycles keep on using compression
and/or fast clear.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
In commit 45cd76e342d1e8e schedule_instructions(bblock_t *) began
setting bblock_t::cycle_count, but that function was not called on
trivial blocks.
Remove the code to skip trivial blocks so that cycle_count is set.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
schedule_instructions(bblock_t *) isn't called on blocks with a single
instruction, and since it is the only thing that set cycle_count,
cycle_count would be uninitialized.
A non-empty block with bblock_t::cycle_count == 0 is arguably a bug.
That'll be fixed in the next commit.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
| |
This reverts commit b4001af1744a02f472bd1204458662088307981b.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
This matches the capabilities of the hardware.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Offsets that don't fit into 4 bits need to force gather_po to be
selected. Adjust the logic so that this happens.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had an OFFSET_VALUE source for logical texture instructions
that was intended to mean exactly what it says, "offset". In reality, we
only fully used it for tg4 offsets. We used offset_value.file == IMM to
mean, "you have a constant offset, go look in instr->offset" and didn't
actually use the contents of the register at all in that case except for
in nir_emit_texture where we used it as a temporary before we copy it into
instr->offset.
This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to
indirect tg4 offsets only. The nir_emit_texture code is refactored so that
we explicitly build a header_bits value which is placed in instr->offset
and the constant offset values (both for tg4 and regular texture
operations) are used to construct header_bits and don't go through the
offset source at all. Finally, we stop passing offset_value in to
lower_sampler_logical_send_gen5 because we can't do indirect offsets until
gen7 anyway.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On SKL (also fast clear is used for level 0, layer 0):
Manhattan 3.0: 3.88434% +/- 0.814659%
Manhattan 3.0 off: 3.25542% +/- 0.101149%
Trex: 3.43501% +/- 0.31223%
Trex off: 4.13781% +/- 0.0993569%
ON BDW:
Manhattan 3.0: 1.37079% +/- 0.571208%
Manhattan 3.0 off: 1.74029% +/- 0.267499%
v2 (Ben, Matt): Fix rebase error by removing the perf warning
v3 (Topi): Rebased on top of revised eligibility logic
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
One can now also delete intel_get_non_msrt_mcs_alignment().
v2 (Jason): Do not leak aux buf but allocate only after getting
ISL surfaces.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that RESOLVED is not tracked in the map explicitly. Absence
of item implicitly means RESOLVED state.
v2: Added intel_resolve_map_clear() into intel_miptree_release()
v3 (Jason): Properly handle the assumption of resolve map not
containing any items with state RESOLVED. Removed
unnecessary intel_miptree_set_fast_clear_state() call
in brw_blorp_resolve_color() preventing
intel_miptree_set_fast_clear_state() from asserting
against RESOLVED.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Status is still tracked per miptree. Next patch will switch to
resolve map per slice/level.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now fast clear has been supported only for non-layered and
non-mipmapped buffers. However, from gen8 onwards there is hardware
support also for layered/mipmapped. Once this is enabled, fast clear
operations target specific layer/level and call for the state to be
tracked in the same granularity. This is the first step providing
the details from callers to the state tracking.
Patch introduces new interface for reading and writing the state
hiding the upcoming bookkeeping changes in the call sites. There is
bunch of sanity checks added that will be relaxed per hardware
generation later on when the actual functionality is enabled.
v2: Rebased on top current master setting the state in
blorp_surf_for_miptree().
v3: Replace open-coded resolved check in surface state emission
with intel_miptree_has_color_unresolved().
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This patch also introduces getter and setter for fast clear state
preparing for tracking the state per slice.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the status bits for fast clear include the flag telling
if non-multisampled mcs buffer should be used at all. Once the
state tracking is changed to follow individual levels/layers one
still needs to have the mcs enabling information in the miptree.
Therefore simply split it out to its own boolean.
Possible follow-up work is to combine disable_aux_buffers and
no_ccs into single enum.
v2 (Jason): Changed no_msrt_mcs to no_ccs and updated comment
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Make intel_miptree_resolve_color() take start layer and
layer count.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Upcoming patches will introduce fast clear in level/layer
granularity like the driver does already for depth/hiz. This patch
introduces equivalent full resolve option.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Essentially this moves fast clear state update away from surface
state setup into brw_postdraw_set_buffers_need_resolve() that gets
called just after draw submission.
Calling intel_miptree_used_for_rendering() can be drop for gen6
and earlier as it is no-op.
v2: Rebased on top current master setting the state in
blorp_surf_for_miptree().
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes rendering in Dolphin on Vulkan since we enabled clip
distances. (Dolphin on GL has a similar bug because the linker
fails to eliminate unused clip distance built-in arrays, but it
isn't using SSO...so that needs more fixing.)
Also fixes a Piglit test:
spec/glsl-1.50/execution/geometry.clip-distance-vs-gs-out-sso
Signed-off-by: Kenneth Graunke <[email protected]>
Tested-by: Emmanuel Gil Peyrot <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP.
Gen8+ normally uses the new internal signalling mechanism to select the
one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS).
This is a pretty good fit for Vulkan, or even newer GL, where the
bitmask comes entirely from the shader. But with glClipPlane(),
this is dynamic state, and we have to listen to _NEW_TRASNFORM.
Clip plane enables are the only reason the VS/DS/GS atoms need to
listen to _NEW_TRANSFORM. 3DSTATE_CLIP already has to listen to it
in order to support ARB_clip_control settings.
Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows
us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can
re-emit them less often.
Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257%
(n = 38) at 1024x768 on Cherryview.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We can't render to 8x MSAA if the width is greater than 64 bits. (see
brw_render_target_supported)
Fixes ES31-CTS.sample_variables.mask.rgba32f.samples_8.mask_*
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Use PRM citation for SKL now that it is available
- Also return false for gen < 8 mipmapped/arrayed
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
instead of in intel_miptree_init_mcs(). For lossless compression
the status is immediately overwritten in
intel_miptree_alloc_non_msrt_mcs() while the status for
non-compressed non-msaa miptrees is explicitly set in
do_blorp_clear().
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This path is not yet taken for fast cleared or compressed buffers
but later patches will enable it.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This is useful when checking if any slice is in unresolved state.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
And fix a mangled comment while at it.
v2 (Ben): Return the converted color.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was detected when examining CCS_E failures with piglit test:
"fbo-generatemipmap-formats". Test creates a 2D texture with
dimensions 293x277. It manually loops over all levels and calls
glTexImage2D(). Level one triggers creation of full miptree:
intel_alloc_texture_image_buffer() realizes that there is only one
level in the miptree and calls intel_miptree_create_for_teximage()
to re-allocate the miptree with all 9 levels. However, the end result
is a miptree with level zero dimensions of 292x276.
Related, and possibly calling for treatment of its own is mip-map
generation:
After calling glTexImage2D() against every level test continues by
replacing content for levels one to eight with data derived from level
zero by calling glGenerateMipmapEXT(). This results into the miptree
being allocated anew for every level:
Mip-map generation goes thru meta which ends up validating the texture
(brw_validate_textures()->intel_finalize_mipmap_tree()->
intel_miptree_match_image()) where one finds texture with base level
size 292:276. This results into new miptree being created for the npot
size 293:277. Only here intel_finalize_mipmap_tree() is asked for only
one level, and therefore such is created. Generation for level one in
turn finds right base level size but only one level when two is needed.
And the same goes on for all eight levels.
This patch prevents the shrink maintaining the NPOT size of 293x277.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|