| Commit message (Collapse) | Author | Age | Files | Lines |
| |
In commit cda886a4851ab767fba40e8474d6fa8190347e4f, Neil made us stop
advertising RGBX formats on Gen9+, as the hardware apparently no longer
has working fast clear support for those formats. Instead, we just
fall back to RGBA formats, and use SCS to override alpha to 1.0.
This is fine, but had one unintended side effect: it made us fall back
to slow clears when the color mask disables alpha. Normally, we ignore
the color mask for non-existent channels. This includes alpha for XRGB
formats as writing garbage to the X channel is harmless. But, now that
we use RGBA, we think there's a real alpha channel, and can't do the
optimization.
To hack around this, check if _BaseFormat is GL_RGB and ignore alpha.
Improves WebGL Aquarium performance on Skylake GT3e by about 50%
by letting it use repclears instead of slow clears.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
| |
We do this in two steps: first we clip the dst rect and adjust the src
rect accordingly. Then we do it the other way around. In both passes
the adjustment part involves multiplying by a scale factor that can lead
to a small precision loss. This is breaking a few dEQP tests.
Specifically, the problem happens when we need to clip the same coordinate
twice. For example, if srcX0 and dstX0 both need to be clipped we want to
avoid the situation where we clip srcX0 first, then adjust dstX0 accordingly
but then we realize that the resulting dstX0 still needs to be clipped, so
we clip dstX0 and adjust srcX0 again. Each of these two passes can lead
to precision loss. What we want to do here is detect the rect that leads
to the largest clip (accounting for the scale factor involved), clip that
rect and adjust the other one. With this we ensure that the adjusted
coordinate does not need to be clipped again and we can skip a second pass,
improving precision.
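
The single-pass idea can be sketched for one lower edge as follows. This is an illustration under simplifying assumptions (no mirrored rects, a positive scale given in source units per destination unit), and the function and parameter names are hypothetical rather than taken from the patch.

```c
#include <assert.h>

/* Clip the lower edge of a src/dst coordinate pair in a single pass.
 * scale is source units per destination unit. */
static void
clip_lower_edge(float *src0, float *dst0,
                float src_min, float dst_min, float scale)
{
   assert(scale > 0.0f);

   /* How far each coordinate lies outside its bound, in its own units. */
   float clip_src = src_min > *src0 ? src_min - *src0 : 0.0f;
   float clip_dst = dst_min > *dst0 ? dst_min - *dst0 : 0.0f;

   if (clip_src == 0.0f && clip_dst == 0.0f)
      return;

   /* Compare both clips in source units and apply only the larger one,
    * so the adjusted coordinate never needs a second clip. */
   if (clip_src >= clip_dst * scale) {
      *src0 = src_min;
      *dst0 += clip_src / scale;
   } else {
      *dst0 = dst_min;
      *src0 += clip_dst * scale;
   }
}
```

Only one multiplication or division by the scale factor touches each coordinate, which is where the precision gain comes from.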
Fixes the following 4 dEQP tests:
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_linear
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Mark Janes <[email protected]>
| |
This reverts commit b449366587b5f3f64c6fb45fe22c39e4bc8a4309.
I removed the pass thinking that it was now not useful, but that was not
true. I believe I ran shader-db on HSW and saw no results, but HSW does
not use the unlit centroid workaround code and as a result does not emit
redundant MOV_DISPATCH_TO_FLAGS instructions.
On IVB, the shader-db results are:
total instructions in shared programs: 6650806 -> 6646303 (-0.07%)
instructions in affected programs: 106893 -> 102390 (-4.21%)
helped: 793
total cycles in shared programs: 56195538 -> 56103720 (-0.16%)
cycles in affected programs: 873048 -> 781230 (-10.52%)
helped: 553
HURT: 209
On SNB, the shader-db results are:
total instructions in shared programs: 7173074 -> 7168541 (-0.06%)
instructions in affected programs: 119757 -> 115224 (-3.79%)
helped: 799
total cycles in shared programs: 98128032 -> 98072938 (-0.06%)
cycles in affected programs: 1437104 -> 1382010 (-3.83%)
helped: 454
HURT: 237
Reviewed-by: Iago Toral Quiroga <[email protected]>
| |
This was left out from the original gen8 upload introduction.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
There is no initial assignment, thus appending to it does not work.
Fixes: b27c85c4c08 "i965: add build rule for brw_nir_trig_workarounds.c"
Signed-off-by: Emil Velikov <[email protected]>
| |
Add MESA_FORMAT_R8G8B8A8_UNORM and MESA_FORMAT_R8G8B8X8_UNORM formats as
these are the preferred formats for Android.
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Commit bfd17c76c126 ("i965: Port INTEL_PRECISE_TRIG=1 to NIR.") added a
generated file brw_nir_trig_workarounds.c which broke the Android build.
Add the necessary makefiles to the Android build.
Cc: Kenneth Graunke <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Tested-by: Chih-Wei Huang <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
| |
Use the defines Mesa configure sets to indicate presence of the bswap32
builtins. This lets i965 work on OpenBSD again after the changes that
were made in 0a5d8d9af42fd77fce1492d55f958da97816961a.
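
A hedged sketch of the pattern, assuming the configure-provided macro is spelled HAVE___BUILTIN_BSWAP32; the helper name swap32 is hypothetical and the portable fallback is added here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte-swap a 32-bit value, using the compiler
 * builtin only when the build system reported that it exists. */
static inline uint32_t
swap32(uint32_t v)
{
#ifdef HAVE___BUILTIN_BSWAP32   /* assumed name of the configure define */
   return __builtin_bswap32(v);
#else
   /* Portable fallback for toolchains without the builtin. */
   return (v >> 24) |
          ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) |
          (v << 24);
#endif
}
```

Both branches produce the same result, so the code builds wherever either path is available.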
Signed-off-by: Jonathan Gray <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Previously the vertex buffer consisted of eight floats per vertex,
of which six were constants. These can just as easily be provided by
the vertex fetcher, as it is capable of filling vertex elements with
constant one and zero. This reduces the size of the vertex buffer
from 3 * 8 * 4 = 96 to 3 * 2 * 4 = 24 bytes.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Otherwise clearing with blorp will regress performance in some
synthetic test cases.
v2: Used vsize >= 2 instead of vsize > 0, and updated the comment.
Review by Ken in one of the earlier patches revealed this.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
When there is no source, the program does a simple clear or a
resolve. In that case there is no need to program sampling state
or enable pixel kill in the fragment shader.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This partially reverts 2f28a0dc23165123cf1e8b5942acad37878edd8a
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Also add the additional render format check to the same utility.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
On gen8 color resolving won't work anymore if the target isn't
the first entry in the binding table.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Generator is only needed for getting the assembly.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2 (Ken): Moved switch cases for gen8/9 in texel_fetch() to
earlier patch adding gen8/9 sampling support.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2 (Ken): Fix the condition on using meta for stencil blits:
use_blorp -> !use_blorp
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
This patch adds an additional MOV instruction for all blorp programs
that use SHADER_OPCODE_TXF. The alternative is to augment the blorp
program key to tell whether the z-coordinate is needed, add a condition
to the blorp blit compiler, and produce variants with and without the
MOV. This seems like overkill.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
To support cases where gen9 uses an RGBA format to back a
client-requested RGB format, we need a way to force the alpha channel
to one when a user-requested RGB surface is used as a blit source.
v2 (Ken): Use helper for constructing the swizzle (this should be
changed to use brw_get_texture_swizzle() as a follow-up).
Also calculate the swizzle for CopyTexSubImage.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state
Add emission of pma stall.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These
were wrongly introduced in blit-enabling patch.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
v2 (Ken): Use payload directly instead of retyping it into vec8.
Drop the implied header, it isn't used for gen6+ anyway.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.
However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region. This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.
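
The arithmetic behind the overlap can be sketched as below. The helper names are hypothetical and this is not the driver's code; only the 8kB chunk size, the hardcoded start of 2, and the 32kB push constant region come from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define URB_CHUNK_KB 8   /* "VS URB Starting Address" is in 8kB chunks */

/* Hypothetical helper: first chunk that lies past a push constant
 * region of the given size, rounded up to whole chunks. */
static unsigned
vs_urb_start_chunks(unsigned push_constant_kb)
{
   return (push_constant_kb + URB_CHUNK_KB - 1) / URB_CHUNK_KB;
}

/* Hypothetical helper: true when VS URB data starting at
 * vs_start_chunks begins inside the push constant region. */
static bool
vs_urb_overlaps_push_constants(unsigned vs_start_chunks,
                               unsigned push_constant_kb)
{
   return vs_start_chunks * URB_CHUNK_KB < push_constant_kb;
}
```

A start of 2 chunks (16kB) clears a 16kB push constant region but lands inside a 32kB one, which would need a start of 4 chunks.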
v2 (Ken): Better description of the change, and do not change vs_size
from 2 to 1.
Cc: [email protected]
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Currently the size is sizeof(float) times too large: the code reserves
GEN6_BLORP_VBO_SIZE floats, whereas GEN6_BLORP_VBO_SIZE is the size of
the vertex buffer in bytes.
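
To illustrate the mismatch between a byte count and an element count, here is a small sketch. VBO_SIZE_BYTES and the two type names are hypothetical stand-ins, and the value 24 is chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vertex buffer size expressed in bytes. */
#define VBO_SIZE_BYTES 24

static_assert(VBO_SIZE_BYTES % sizeof(float) == 0,
              "byte size must hold a whole number of floats");

/* Oversized: uses the byte count as a float count, so the array
 * occupies sizeof(float) times more memory than intended. */
typedef float oversized_vbo[VBO_SIZE_BYTES];

/* Intended: convert the byte count into a float count first. */
typedef float sized_vbo[VBO_SIZE_BYTES / sizeof(float)];
```

Here sizeof(sized_vbo) equals VBO_SIZE_BYTES, while sizeof(oversized_vbo) is that value multiplied by sizeof(float).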
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.
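
As a rough sketch of the masking described above (the function and parameter names are hypothetical, and how the driver actually derives the per-invocation mask is not shown here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the sample mask input for a fragment
 * shader invocation. invocation_samples has one bit set for each
 * sample this invocation is responsible for. */
static uint32_t
sample_mask_in(uint32_t coverage, uint32_t invocation_samples,
               bool per_sample)
{
   /* In per-sample mode, keep only the covered samples that this
    * invocation is processing; otherwise coverage is used as-is. */
   return per_sample ? (coverage & invocation_samples) : coverage;
}
```

For a fully covered 4x pixel, an invocation handling only sample 2 would see the mask 0x4 rather than 0xF.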
Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."
The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:
"No changes to the fragment alpha or coverage values are made at this
step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
is not one."
oMask output alters coverage masks and can kill pixels. We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.
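
The condition might be sketched like this. The struct and function are simplified, hypothetical stand-ins; only the multisample_fbo field name comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the fragment program key. */
struct wm_key_sketch {
   bool multisample_fbo;
};

/* Hypothetical helper: emit the oMask output only when the shader
 * writes gl_SampleMask and the key says multisampling is in effect. */
static bool
should_emit_omask(const struct wm_key_sketch *key,
                  bool shader_writes_sample_mask)
{
   assert(key != NULL);
   return shader_writes_sample_mask && key->multisample_fbo;
}
```

With multisample_fbo false, a shader that writes gl_SampleMask no longer alters coverage or kills pixels.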
Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188
Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0
Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7
Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch. This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.
The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value. So we
can just eliminate the check and generalize the field.
v2: Also update the Vulkan driver.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
This was only used by the old gl_SampleID calculations. The new code
doesn't need to handle 2x specially.
v2: Delete it from the Vulkan driver, too.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations. We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case. Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.
For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet. However, it works fine on Gen8+. So, land the
code and use it where it's working, and leave a TODO for later.
v2: Fix register types in the comment (caught by Matt Turner!).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>