| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Take into account offset values less than a full register (32 bytes)
when getting the var from register.
This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).
v2:
- Take in account this offset < 32 in liveness analysis too (Curro)
v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4
== 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
explicit ones
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.
v2:
- Use stride and don't need to fix dst (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.
This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.
v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <[email protected]>
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58.
d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.
Then, 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58 is not needed anymore.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generalize it to lower any unsupported narrower conversion.
v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.
v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
when fixing the conversion.
- Add get_exec_type() function.
v4 (Curro):
- Use get_exec_type() function to get sources' type.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).
v3:
- Added comment explaining the reason (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Suggested-by: Francisco Jerez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.
So when we have something like:
mov(8) g2<1>DF g3<4,4,1>DF
We are not actually moving 8 doubles (our intention), but 4 doubles.
We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)
v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).
v4:
- Fix comment (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The execution data size is the biggest type size of any instruction
operand.
We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).
v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
calculate its size.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced
execution type for integer vector types. Take destination type as
execution type where there is no valid source. Assert-fail if the
deduced execution type is byte. Move into brw_ir_fs.h header for
consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.
v2 (Sam):
- Add comments.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
4-wide DF operations where NibCtrl applies require and execsize of 8
in IvyBridge/BayTrail.
v2:
- Refactor NibCtrl printing (Matt)
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description. The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
fix all of them.
Reviewed-by: Nanley Chery <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.
Reviewed-by: Nanley Chery <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: "13.0 17.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
| |
This makes it much easier to throw together a bit of dynamic state. It
also automatically handles flushing so you don't accidentally forget.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block. Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.
Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e. Should have a comparable effect on other platforms. No
significant regressions.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
This is already invoked in the following VG_NOACCESS_READ() call.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
We're about to replace blorp's emit code with ISL and it emits them in
the other order. This makes diffing the aubs easier.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
This gets rid of one piece of ugliness with the way ISL handles surface
emitting surface states. I've never liked that hand-rolled table but it
was the best we had at the time.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The helper block is extremely general. It takes an string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop. This way we can easily generalize it to emit more different
types of getter functions.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
Instead of figuring it all out ourselves, just use the information given
to us by the client.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure. While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier. This commit
adds a nice little helper struct for getting rid of some of that
complexity.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit b2c97bc789198427043cd902bc76e194e7e81c7d which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms. For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced. More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again. Getting the clflushing both correct
and efficient is very subtle and painful. Instead, let's side-step the
problem by just snooping.
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise linking way fail.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca4 ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
Tested-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.
Fixes 16 tests on BDW:
dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|