| |
Without this, we were DCEing flag writes: we didn't think their results
were used, because we didn't understand that an ANY32 predicate actually
reads all the flags.
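A minimal sketch of the idea, assuming a toy flag file with two 16-bit subregisters. The enum, the mask layout and both function names are made up for illustration and are not the compiler's real definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate kinds; the real enum has more members. */
enum pred { PRED_NONE, PRED_NORMAL, PRED_ANY32 };

/* Toy flag file: two 16-bit subregisters, one mask bit each. */
#define ALL_FLAGS 0x3u

/* Mask of flag subregisters read by an instruction predicated with
 * `pred` on subregister `subreg`. */
static unsigned flags_read(enum pred pred, unsigned subreg)
{
   assert(subreg < 2);
   switch (pred) {
   case PRED_NONE:   return 0;
   case PRED_NORMAL: return 1u << subreg;
   case PRED_ANY32:  return ALL_FLAGS;   /* reads every flag */
   }
   return 0;
}

/* A flag write may be eliminated only if no later reader overlaps it. */
static bool flag_write_is_dead(unsigned written, unsigned read_later)
{
   return (written & read_later) == 0;
}
```

With this model, a write to the second subregister stays live when an ANY32-predicated instruction follows, even if that instruction names the first subregister.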
Fixes: df1aec763eb "i965/fs: Define methods to calculate the flag..."
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Maya Rashish <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes invalid close(-1) in the unit tests.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
flrp was missed when the rounding mode was added for the other
instructions.
Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Ian Romanick <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After
1711bf6cf2d ("intel/fs: Generate better code for fsign multiplied by a value"),
resolving the conflicts around setting the rounding mode after the
fused fmul and fsign optimization is not obvious.
Basically, the optimization doesn't really result in a MUL, or any
other operation which would need to have the rounding mode set. Hence,
we set it just before the actual MUL in the handling of fmul.
Fixes: ba1e25e1aa6 ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the MEDIA_VFE_STATE docs:
"Starting with this configuration, the Maximum Number of Threads must
be set to (#EU * 8) for GPGPU dispatches.
Although there are only 7 threads per EU in the configuration, the
FFTID is calculated as if there are 8 threads per EU, which in turn
requires a larger amount of Scratch Space to be allocated by the
driver."
It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern. The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
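A rough sketch of that interpretation, assuming the thread count keeps using the real 7 threads per EU while the scratch size is computed as if there were 8. The macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Threads per EU actually present in this configuration. */
#define HW_THREADS_PER_EU    7u
/* Threads per EU assumed by the FFTID calculation, per the quoted docs. */
#define FFTID_THREADS_PER_EU 8u

/* Thread count programmed for dispatch: left at the real count. */
static uint32_t max_threads_programmed(uint32_t num_eus)
{
   return num_eus * HW_THREADS_PER_EU;
}

/* Scratch allocation: sized for the FFTID range, not the real count. */
static uint64_t scratch_bytes(uint32_t num_eus, uint32_t per_thread_scratch)
{
   assert(per_thread_scratch > 0);
   return (uint64_t)num_eus * FFTID_THREADS_PER_EU * per_thread_scratch;
}
```

For 64 EUs this gives 448 threads but scratch space for 512 thread slots.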
Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.
Fixes: 5ac804bd9ac ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 729de1488f49033bc181b8123af5658228a51bf1.
It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers. The write just becomes a noop, which is why we saw
no performance changes.
I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver. So we might need to fix something before enabling
this. To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.
|
|
|
|
|
|
|
| |
'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Gen11 doesn't require us to bypass the L2 cache for BC* images anymore.
The documentation is a bit hard to follow on this point, but the Windows
driver clearly only applies this workaround on Gen9, and their commit
history indicates that this was an intentional change to drop the
workaround for Gen11+.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Paulo Zanoni <[email protected]>
|
|
|
|
| |
Reviewed-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
| |
We can't really handle it in the little-core 64-bit case but it's not
really needed there. Where we really want this is for when we need to
do 16 -> 8-bit conversions.
Reviewed-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
| |
Because byte immediates aren't a thing on GEN hardware, we return a
signed or unsigned word immediate in the byte case.
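As an illustration only (the struct and helper names here are invented, not the real brw_imm_* helpers), widening a byte value to a word immediate could look like:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical immediate representation: word types only. */
enum imm_type { IMM_W, IMM_UW };

struct imm {
   enum imm_type type;
   uint16_t bits;
};

/* Signed byte: sign-extend into a signed word immediate. */
static struct imm imm_for_int8(int8_t v)
{
   struct imm i = { IMM_W, (uint16_t)(int16_t)v };
   return i;
}

/* Unsigned byte: zero-extend into an unsigned word immediate. */
static struct imm imm_for_uint8(uint8_t v)
{
   struct imm i = { IMM_UW, (uint16_t)v };
   return i;
}
```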
Reviewed-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
into consideration so we don't miss the last channels.
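A simplified sketch of the arithmetic, assuming the offset of a channel group is just group * stride * type size. The helper name is hypothetical and the real generator works on register regions rather than plain byte offsets:

```c
#include <assert.h>

/* Byte offset of channel `group` in a destination whose elements are
 * `type_size` bytes wide and spaced `stride` elements apart. */
static unsigned group_byte_offset(unsigned group, unsigned stride,
                                  unsigned type_size)
{
   assert(stride > 0 && type_size > 0);
   return group * stride * type_size;
}
```

For byte-sized data with a destination stride of 2, channel 8 lands at byte 16; ignoring the stride would give byte 8 and write to the wrong place.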
v2: Assert this is not necessary for the IVB special case (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The new order matches that of the comparison functions accepted by the C
standard library qsort() functions. Being consistent with qsort will
hopefully help avoid developer confusion.
The only current user of the red-black tree is aub_mem.c which is pretty
easy to fix up.
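For reference, a comparison function in the qsort() convention returns a negative value when the first argument sorts before the second, zero when they are equal, and a positive value otherwise. A small sketch (the key type and helper names are chosen for illustration, not taken from the tree code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort()-style comparison of two 64-bit keys. */
static int cmp_u64(const void *a, const void *b)
{
   uint64_t x = *(const uint64_t *)a;
   uint64_t y = *(const uint64_t *)b;
   return (x > y) - (x < y);
}

/* Sorts `keys` in place and returns the smallest one. */
static uint64_t sort_and_get_first(uint64_t *keys, size_t n)
{
   assert(n > 0);
   qsort(keys, n, sizeof(keys[0]), cmp_u64);
   return keys[0];
}
```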
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This effectively splits the instance dispatch table in two: entry
points that take a physical device as their first argument get their
own dispatch table.
As a result, we now have to check both the instance and physical device
dispatch tables, instead of just the instance dispatch table as before.
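A minimal sketch of the two-table lookup, with made-up table contents and stub functions. The real tables are generated and indexed differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*entrypoint_fn)(void);

struct entry {
   const char *name;
   entrypoint_fn fn;
};

/* Stubs standing in for real entry points. */
static void enumerate_physical_devices_stub(void) {}
static void create_device_stub(void) {}

/* Entry points taking a VkInstance as first argument. */
static const struct entry instance_table[] = {
   { "vkEnumeratePhysicalDevices", enumerate_physical_devices_stub },
};

/* Entry points taking a VkPhysicalDevice as first argument. */
static const struct entry physical_device_table[] = {
   { "vkCreateDevice", create_device_stub },
};

static entrypoint_fn lookup(const struct entry *table, size_t n,
                            const char *name)
{
   for (size_t i = 0; i < n; i++) {
      if (strcmp(table[i].name, name) == 0)
         return table[i].fn;
   }
   return NULL;
}

/* Check the instance table first, then the physical device table. */
static entrypoint_fn get_instance_proc(const char *name)
{
   assert(name != NULL);
   entrypoint_fn fn = lookup(instance_table, 1, name);
   if (fn != NULL)
      return fn;
   return lookup(physical_device_table, 1, name);
}
```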
Signed-off-by: Eric Engestrom <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Andres Gomez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.
Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current code can create functions with a width of 32, which is not
supported by our hardware. Add some code to simplify how we express
what we want and prevent such cases.
For some unknown reason, all the tests I could run seem to work even
with these unsupported MOVs.
Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes"
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are cases where we try to generate registers with a stride of
32, while the hardware maximum is just 16. This happens, for example,
when using 8 bit integers on SIMD32. This results in a crash because
the variable 'width' has a value of 32:
../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned
int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid
register width"' failed.
This change prevents the crash and makes the tests pass.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
IMHO the code is easier to understand this way, being explicit that
we're doing exactly the same thing every time.
No functional changes.
v2: Adjust the loop breaking condition (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dealing with uint16_t and uint8_t on SIMD32 we can do all the
operations using just 2 registers, so we don't hit the recursion at
the beginning of emit_scan(). Because of that, we need to actually
compute scan/reduce for channels 31:16.
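To illustrate why the upper half needs its own step, here is a scalar model of an inclusive add-scan over 32 channels done as two 16-channel halves. It only models the data flow and is not the instruction sequence emit_scan() generates. All names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive add-scan over 16 consecutive channels. */
static void scan_add_16(uint16_t *v)
{
   for (unsigned i = 1; i < 16; i++)
      v[i] = (uint16_t)(v[i] + v[i - 1]);
}

/* Scan both halves, then fold the total of channels 15:0 into 31:16. */
static void scan_add_32(uint16_t v[32])
{
   scan_add_16(v);
   scan_add_16(v + 16);
   for (unsigned i = 16; i < 32; i++)
      v[i] = (uint16_t)(v[i] + v[15]);
}

/* Convenience wrapper: scan 32 channels that all hold 1. */
static uint16_t scan_add_32_of_ones(unsigned channel)
{
   uint16_t v[32];
   assert(channel < 32);
   for (unsigned i = 0; i < 32; i++)
      v[i] = 1;
   scan_add_32(v);
   return v[channel];
}
```

If the last step is skipped, channels 31:16 restart from zero instead of continuing from channel 15.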
v2: Still missed instructions (Jason).
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I would like for iris to be able to avoid setting up SURFACE_STATE
for UBOs in the common case where all constants are pushed.
Unfortunately, we don't know up front whether everything will be
pushed: the backend is allowed to demote pushed UBOs to pull loads
fairly late in the process. This is probably desirable though, as
we'd like the backend to be able to re-pull pushed data to break up
long live ranges in response to register pressure.
Here we simply add a "are there any pull loads at all" boolean to
prog_data, which is a bit crude but at least allows us to skip work
in the common "everything pushed" case. We could skip more work by
tracking exactly which UBO surfaces are pulled in a bitmask, but I
wanted to avoid bringing back the old mark_surface_used() mechanism.
Finer-grained tracking could allow us to skip a bit more work when
multiple UBOs are in use and /some/ are 100% pushed, but others are
accessed via pulls. However, I'm not sure how common this is and
it would save at most 4 pull descriptors, so we defer that for now.
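A sketch of how a driver might consume such a flag. The struct, field and function names below are placeholders and not necessarily the ones added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the relevant part of prog_data. */
struct prog_data_sketch {
   bool has_ubo_pull;   /* true if any UBO load is a pull load */
};

/* Number of UBO SURFACE_STATE entries the driver has to set up. */
static unsigned ubo_surface_states_needed(const struct prog_data_sketch *pd,
                                          unsigned bound_ubos)
{
   assert(pd != NULL);
   /* Everything pushed: no surface states are needed at all. */
   return pd->has_ubo_pull ? bound_ubos : 0;
}
```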
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are no color regions (i.e. a depth only pass), we can set
the "Null Render Target" bit in the Gen11 RT write extended message
descriptor to indicate that it should behave as if it's writing to a
null render target, without the need for a binding table entry.
This lets drivers avoid setting up that null RT binding table entry,
but more importantly means the HW doesn't actually have to bother
looking up the surface state.
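Roughly, the extended descriptor could be built as follows. The bit positions used here (render target index at bit 12, src0 alpha present at bit 15, null render target at bit 20) are assumptions for the sake of the example and should be checked against the hardware docs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions in the Gen11 RT write extended descriptor. */
#define EX_DESC_RT_INDEX_SHIFT     12
#define EX_DESC_SRC0_ALPHA_PRESENT (1u << 15)
#define EX_DESC_NULL_RT            (1u << 20)

static uint32_t rt_write_ex_desc(unsigned target, bool src0_alpha_present,
                                 unsigned nr_color_regions)
{
   assert(target < 8);
   uint32_t ex_desc = (uint32_t)target << EX_DESC_RT_INDEX_SHIFT;
   if (src0_alpha_present)
      ex_desc |= EX_DESC_SRC0_ALPHA_PRESENT;
   /* Depth-only pass: act as a null RT, no binding table entry needed. */
   if (nr_color_regions == 0)
      ex_desc |= EX_DESC_NULL_RT;
   return ex_desc;
}
```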
Together with the next patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and
enables the Vulkan and SPIR-V extensions.
Also, notice that this includes the updates applied to the
VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension
VK_KHR_shader_float_controls v4 and Vulkan 1.1.116.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The remove_extra_rounding_modes() optimization will remove duplicated
rounding mode changes.
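As a simplified model of that kind of cleanup (straight-line code only, with invented instruction and enum types, not the actual pass):

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_RND_MODE, OP_OTHER };

struct inst {
   enum op op;
   int mode;   /* only meaningful for OP_RND_MODE */
};

/* Drops rounding mode changes that set the mode already in effect.
 * Compacts `insts` in place and returns the new instruction count. */
static size_t remove_duplicate_rnd_modes(struct inst *insts, size_t n,
                                         int initial_mode)
{
   assert(insts != NULL || n == 0);
   int current = initial_mode;
   size_t out = 0;

   for (size_t i = 0; i < n; i++) {
      if (insts[i].op == OP_RND_MODE) {
         if (insts[i].mode == current)
            continue;            /* redundant: mode is already set */
         current = insts[i].mode;
      }
      insts[out++] = insts[i];
   }
   return out;
}
```

Because such a cleanup exists, emitters can set the rounding mode unconditionally before each instruction that needs it.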
v2:
- Fix bug in the rounding mode change (Alejandro).
v3:
- Fix rounding modes.
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
- Consider nir_op_f2f16 case too (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2:
- Updated to renamed shader info member (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need this function to emit code that later sets up the control
register with the execution mode defined for the shader. Therefore, we
emit it as the first instruction.
v2:
- Fix bug in setting the default mode mask in brw_rnd_mode_from_nir().
- Fix support for rounding modes in brw_rnd_mode_from_nir().
v3:
- Updated to renamed shader info member and enum values (Andres).
v4:
- Add actual emission as first instruction of emit_nir_code (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
register
Before this commit, we had only FPRoundingMode decoration (the per
instruction one) that is applied during the SPIR-V handling. In
vtn_alu we find out the rounding mode, and generate the code
accordingly that later will be used to look for the respective
nir_op_f2f16_{rtz,rtne}.
Per-instruction gets prioritized because we make them explicit
conversions (with RTZ or RTNE nir opcodes) and they will override the
default execution mode defined with float controls. However, we need
to come back to the mode defined by float controls after the execution
of the FP Rounding instruction.
Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be
used to set the default rounding mode and denorm handling for the
whole shader, while the pre-existing SHADER_OPCODE_RND_MODE will be
used as the prioritized rounding mode on a per-instruction basis.
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.
v3:
- Update comment (Caio).
v4:
- Split the patch into the helper and the new opcode (this
one) (Caio).
v5:
- Add an explanation on the actual purpose and priority of the newly
introduced opcode in the commit log (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
controls
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.
v3:
- Update comment (Caio).
v4:
- Split the patch into the helper (this one) and the new
opcode (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The denorm mode is set in the control register, so nothing else needs
to be done.
v2:
- Add an assert to make sure that we realize if this assumption is
broken in the future (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have fsin or fcos trigonometric operations with constant values
as inputs, we will multiply the result by 0.99997 in
brw_nir_apply_trig_workarounds, making the result wrong.
By adjusting the rules so they do not apply to constant values, we let
a later constant folding pass deal with them.
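The intended behaviour can be modelled like this. The helper name is made up, and the real rules live in a NIR algebraic pass rather than in C:

```c
#include <assert.h>
#include <stdbool.h>

/* Scale factor applied by the trig workaround. */
#define TRIG_WORKAROUND_SCALE 0.99997f

/* Applies the workaround to an fsin/fcos result unless the source was
 * a constant, in which case the value is left for constant folding. */
static float trig_workaround(float trig_result, bool src_is_const)
{
   if (src_is_const)
      return trig_result;
   return trig_result * TRIG_WORKAROUND_SCALE;
}
```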
v2:
- Do not early constant fold but only apply the trig workaround for
non constants (Caio).
- Add fixes tag to commit log (Caio).
Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: As suggested by J. Ekstrand, moved the lowering of large
constants to after the lowering of copy_deref is done.
CC: Jason Ekstrand <[email protected]>
CC: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option strictly allocates the minImageCount given by the
application at swapchain creation.
This works around applications that do not deal with the fact that the
implementation allocates more images than the minimum specified.
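A sketch of the selection logic, assuming the implementation otherwise rounds the request up to some preferred count. The function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of swapchain images to allocate.
 * `strict` models the new driconf option. */
static uint32_t swapchain_image_count(uint32_t min_image_count,
                                      uint32_t impl_preferred,
                                      bool strict)
{
   assert(min_image_count > 0);
   if (strict)
      return min_image_count;   /* exactly what the application asked for */
   return min_image_count > impl_preferred ? min_image_count
                                           : impl_preferred;
}
```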
v2: Add values in default drirc (Bas)
v3: specify engine name/version (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: 19.2 <[email protected]>
|
| |
Vulkan applications can register with the following structure:
    typedef struct VkApplicationInfo {
        VkStructureType    sType;
        const void*        pNext;
        const char*        pApplicationName;
        uint32_t           applicationVersion;
        const char*        pEngineName;
        uint32_t           engineVersion;
        uint32_t           apiVersion;
    } VkApplicationInfo;
This enables the Vulkan implementations to apply workarounds based off
matching this description.
Here we add a new parameter for matching the driconfig options with
the following:
    <device driver="anv">
        <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
            <option name="blaaah" value="true" />
        </application>
    </device>
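A self-contained sketch of how such a match could be evaluated with POSIX regex. The function names are invented, and treating each "lo:hi" range as inclusive is an assumption:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if `name` matches the POSIX extended regex `pattern`. */
static bool engine_name_matches(const char *pattern, const char *name)
{
   regex_t re;
   assert(pattern != NULL && name != NULL);
   if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
      return false;   /* invalid pattern: treat as no match */
   /* regexec() returns 0 on a match and REG_NOMATCH on failure. */
   bool matched = regexec(&re, name, 0, NULL, 0) == 0;
   regfree(&re);
   return matched;
}

/* True if `version` falls in any "lo:hi" range of a comma-separated
 * list such as "10:12,40:42" (ranges assumed inclusive). */
static bool version_in_ranges(const char *ranges, uint32_t version)
{
   const char *p = ranges;
   while (*p != '\0') {
      char *end;
      unsigned long lo = strtoul(p, &end, 10);
      if (end == p || *end != ':')
         return false;   /* malformed range */
      p = end + 1;
      unsigned long hi = strtoul(p, &end, 10);
      if (end == p)
         return false;
      if (version >= lo && version <= hi)
         return true;
      if (*end == ',')
         end++;
      else if (*end != '\0')
         return false;
      p = end;
   }
   return false;
}
```

An application entry would then apply when both checks pass for the pEngineName and engineVersion supplied in VkApplicationInfo.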
v2: switch engine name match to use regexps
v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)
v4: Add missing bit that went to the following commit (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Cc: 19.2 <[email protected]>
|
| |
When the UNDEF instruction was added, we didn't do anything special in
split_virtual_grfs. This meant that anything with an UNDEF wasn't
getting split which causes problems for the compiler. Among other
things, it makes RA harder because things are in bigger chunks. It also
meant that dvec4s weren't getting split which means that they are larger
than the maximum register size.
Shader-db results on Kaby Lake:
total instructions in shared programs: 14959202 -> 14960035 (<.01%)
instructions in affected programs: 96197 -> 97030 (0.87%)
helped: 140
HURT: 128
helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45%
HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1
HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50%
95% mean confidence interval for instructions value: -2.96 9.18
95% mean confidence interval for instructions %-change: -0.56% 1.51%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4372 -> 4372 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 352646771 -> 352840997 (0.06%)
cycles in affected programs: 218600800 -> 218795026 (0.09%)
helped: 21167
HURT: 21411
helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10
helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98%
HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10
HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06%
95% mean confidence interval for cycles value: 2.87 6.26
95% mean confidence interval for cycles %-change: 0.40% 0.55%
Cycles are HURT.
total spills in shared programs: 8840 -> 8953 (1.28%)
spills in affected programs: 126 -> 239 (89.68%)
helped: 1
HURT: 2
total fills in shared programs: 21782 -> 21914 (0.61%)
fills in affected programs: 431 -> 563 (30.63%)
helped: 1
HURT: 3
LOST: 0
GAINED: 5
Shader-db results on Haswell:
total instructions in shared programs: 13320918 -> 13320769 (<.01%)
instructions in affected programs: 40998 -> 40849 (-0.36%)
helped: 146
HURT: 56
helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22%
HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26%
95% mean confidence interval for instructions value: -1.26 -0.21
95% mean confidence interval for instructions %-change: -0.62% 0.77%
Inconclusive result (%-change mean confidence interval includes 0).
total loops in shared programs: 4373 -> 4373 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 374518258 -> 374384193 (-0.04%)
cycles in affected programs: 231101954 -> 230967889 (-0.06%)
helped: 21427
HURT: 19438
helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8
helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86%
HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8
HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80%
95% mean confidence interval for cycles value: -4.49 -2.07
95% mean confidence interval for cycles %-change: -0.14% -0.04%
Cycles are helped.
total spills in shared programs: 23406 -> 23411 (0.02%)
spills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
total fills in shared programs: 34845 -> 34850 (0.01%)
fills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
LOST: 0
GAINED: 0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566
Fixes: f4ef34f207d1 "intel/fs: Add an UNDEF instruction to avoid..."
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
Initial benchmarking didn't show any performance benefits. But it might eventually.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change needed to fix the following build error:
In file included from external/mesa/src/intel/vulkan/anv_device.c:43:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 4dcb1ff ("anv: add support for driconf")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
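To make the idea concrete: one RGBA32_UINT texel covers 16 stencil bytes, so the 8-bit clear value is replicated into every byte of the wide clear color. The alignment test below is a deliberately simplified stand-in for "nicely aligned" (the real conditions also depend on tiling), and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One RGBA32_UINT texel: 16 bytes, i.e. 16 stencil values. */
struct rgba32_uint {
   uint32_t c[4];
};

/* Replicate an 8-bit stencil value into every byte of the texel. */
static struct rgba32_uint replicate_stencil(uint8_t value)
{
   uint32_t word = value * 0x01010101u;
   struct rgba32_uint texel = { { word, word, word, word } };
   return texel;
}

/* Simplified check: the cleared span must start and end on a
 * 16-byte boundary to be expressible in whole wide texels. */
static bool span_is_wide_aligned(uint32_t x0, uint32_t x1)
{
   assert(x0 <= x1);
   return x0 % 16 == 0 && x1 % 16 == 0;
}
```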
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.
In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears. i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This isn't known to fix any current bugs but it does prevent a
regression in a subsequent commit.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No option is supported yet, this is just the boilerplate.
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|