path: root/src/mesa/drivers/dri/i965/intel_image.h
author: Iago Toral Quiroga <[email protected]> 2015-08-13 15:36:05 -0700
committer: Samuel Iglesias Gonsálvez <[email protected]> 2017-01-03 11:26:50 +0100
commit: 558f27953101c438747c3e9d3c3f98ce21e79007
tree: 87eb24c967dbd74cea33e126bee73faaa4c3b296 /src/mesa/drivers/dri/i965/intel_image.h
parent: 2d6eee3144ce16b39909522be466bdb3871f4c1b
i965/vec4: add double/float conversion pseudo-opcodes

These need to be emitted as align1 MOVs, since they need to have a stride of 2 on the float register (whether src or dest) so that data from another thread doesn't cross the middle of a SIMD8 register.

v2 (Iago):
- The float-to-double conversion needs to align 32-bit data to 64-bit before doing the conversion. This was doable in align16 when we tried to use an execsize of 4, but with an execsize of 8 we would need another align1 opcode to do that (since we need data to cross the middle of a SIMD register). Just making the opcode handle this internally seems more practical than adding another opcode just for this purpose and having the caller know about this before converting.
- The double-to-float conversion produces 32-bit elements aligned to 64-bit, so we make the opcode re-pack the result to 32-bit and fit it in one register, as expected by SIMD4x2 operation. This still requires that callers reserve two registers for the float data destination, because we need to produce 64-bit aligned data first and repack it later on the same destination register. It saves the need for a separate re-pack opcode, though, so the operation completes in a single opcode. Hopefully that is worth the weirdness of the double register allocation...

Signed-off-by: Connor Abbott <[email protected]>
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

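
The data movement described in the v2 notes can be pictured with a small host-side model. The following is only an illustrative sketch in plain C, not the driver's generator code: the names `d2f_model`, `f2d_model`, and `REG_DWORDS` are hypothetical, and a register is modelled simply as eight 32-bit slots. It shows the float data sitting at a stride of 2 (64-bit aligned) on one side of the conversion, and the double-to-float result being repacked in place on a destination that spans two registers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this model: one register holds eight 32-bit slots. */
#define REG_DWORDS 8

/*
 * Model of the double-to-float pseudo-opcode.
 * dst must have room for 2 * count floats (two registers when count == 8),
 * mirroring the "reserve two registers" requirement in the commit message.
 */
void d2f_model(const double *src, float *dst, size_t count)
{
    assert(count <= REG_DWORDS);

    /* Step 1: convert, writing each 32-bit result at a stride of 2,
     * so every float lands on a 64-bit aligned slot. */
    for (size_t i = 0; i < count; i++)
        dst[2 * i] = (float)src[i];

    /* Step 2: repack in place so the floats are contiguous and fit in
     * the first register. Reading slot 2*i and writing slot i in
     * ascending order never overwrites an element before it is read. */
    for (size_t i = 0; i < count; i++)
        dst[i] = dst[2 * i];
}

/*
 * Model of the float-to-double pseudo-opcode: the packed 32-bit input is
 * first spread out to a stride of 2 (64-bit aligned), then converted.
 */
void f2d_model(const float *src, double *dst, size_t count)
{
    float aligned[2 * REG_DWORDS] = {0};

    assert(count <= REG_DWORDS);

    /* Step 1: align 32-bit data to 64-bit slots. */
    for (size_t i = 0; i < count; i++)
        aligned[2 * i] = src[i];

    /* Step 2: convert each aligned float to a double. */
    for (size_t i = 0; i < count; i++)
        dst[i] = (double)aligned[2 * i];
}
```

In this model a caller converting eight doubles passes a 16-element float buffer to `d2f_model` and reads the packed result from its first eight slots, which is the "double register allocation" the message mentions.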
Diffstat (limited to 'src/mesa/drivers/dri/i965/intel_image.h')
0 files changed, 0 insertions, 0 deletions