mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965/vec4: don't modify regioning parameters to the sources of DF align1 ↵	Samuel Iglesias Gonsálvez	2017-05-03	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: fix register width for DF VGRF and UNIFORM	Samuel Iglesias Gonsálvez	2017-05-03	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On gen7, the swizzles used in DF align16 instructions works for element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for this (scalarize_df()), we need to set it to two again. However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, an uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access to the first 4. This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one. v2: - Remove conditional (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: fix vertical stride to avoid breaking region parameter rule	Samuel Iglesias Gonsálvez	2017-05-03	1	-18/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	intel/fs: Take into account amount of data read in spilling cost heuristic.	Francisco Jerez	2017-04-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit 147e71242ce539ff28e282f009c332818c35f5ac with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.	Francisco Jerez	2017-04-24	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.	Kenneth Graunke	2017-04-24	1	-4/+4
\| \| \| \| \| \| \| \| \|	Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: [email protected] Reviewed-by: Francisco Jerez <[email protected]>
*	nir/i965: add before ffma algebraic opts	Timothy Arceri	2017-04-24	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This shuffles constants down in the reverse of what the previous patch does and applies some simpilifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().	Kenneth Graunke	2017-04-22	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Use correct VertStride on align16 instructions.	Matt Turner	2017-04-14	1	-10/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit c35fa7a, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4/dce: improve track of partial flag register writes	Samuel Iglesias Gonsálvez	2017-04-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: don't do horizontal stride on some register file types	Samuel Iglesias Gonsálvez	2017-04-14	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.	Matt Turner	2017-04-14	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: use vec4_builder to emit instructions in setup_imm_df()	Samuel Iglesias Gonsálvez	2017-04-14	2	-50/+50
\| \| \| \| \| \| \| \|	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: consider subregister offset in live variables	Juan A. Suarez Romero	2017-04-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB	Francisco Jerez	2017-04-14	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \|	On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type	Samuel Iglesias Gonsálvez	2017-04-14	7	-27/+60
\| \| \| \| \| \| \| \| \| \|	This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: split d2x conversion and data gathering from one opcode to two ↵	Samuel Iglesias Gonsálvez	2017-04-14	2	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitely in two different instructions: VEC4_OPCODE_FROM_DOUBLE just do the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT	Juan A. Suarez Romero	2017-04-14	1	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \|	In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: keep original type when dealing with null registers	Juan A. Suarez Romero	2017-04-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep the original type when dealing with null registers. Especially because we do no want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <[email protected]>
*	i965/vec4: split DF instructions and later double its execsize in IVB/BYT	Samuel Iglesias Gonsálvez	2017-04-14	3	-1/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indention and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT	Samuel Iglesias Gonsálvez	2017-04-14	1	-0/+9
\| \| \| \| \| \| \| \| \|	The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: Get 64-bit indirect moves working on IVB.	Francisco Jerez	2017-04-14	1	-2/+25
\| \| \| \|	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	i965: Use source region <1,2,0> when converting to DF.	Matt Turner	2017-04-14	2	-13/+28
\| \| \| \| \| \| \|	Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT	Juan A. Suarez Romero	2017-04-14	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: fix dst stride in IVB/BYT type conversions	Juan A. Suarez Romero	2017-04-14	1	-27/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When converting a DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modity IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <[email protected]> Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: rename lower_d2x to lower_conversions	Samuel Iglesias Gonsálvez	2017-04-14	3	-3/+3
\| \| \| \| \| \| \| \|	v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."	Samuel Iglesias Gonsálvez	2017-04-14	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58. d2x pass fixes SEL instructions when there is a type conversion by doing a SEL without type conversion and then convert the result. This pass also takes into account the non-uniform control flow. Then, 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58 is not needed anymore. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: generalize the legalization d2x pass	Samuel Iglesias Gonsálvez	2017-04-14	2	-37/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generalize it to lower any unsupported narrower conversion. v2 (Curro): - Add supports_type_conversion() - Reuse existing intruction instead of cloning it. - Generalize d2x to narrower and equal size conversions. v3 (Curro): - Make supports_type_conversion() const and improve it. - Use foreach_block_and_inst to process added instructions. - Simplify code. - Add assert and improve comments. - Remove redundant mov. - Remove useless comment. - Remove saturate == false assert and add support for saturation when fixing the conversion. - Add get_exec_type() function. v4 (Curro): - Use get_exec_type() function to get sources' type. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.	Matt Turner	2017-04-14	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. v2: - Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro). v3: - Added comment explaining the reason (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: clamp exec_size when an instruction has a scalar DF source	Samuel Iglesias Gonsálvez	2017-04-14	1	-3/+8
\| \| \| \| \| \| \| \| \| \|	Then the SIMD lowering pass will get rid of any compressed instructions with scalar source (whether force_writemask_all or not) and we avoid hitting the Gen7 region decompression bug. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Suggested-by: Francisco Jerez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: double regioning parameters and execsize for DF in IVB/BYT	Juan A. Suarez Romero	2017-04-14	1	-7/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In IVB and BYT, both regioning parameters and execution sizes are measured as 32-bits element size. So when we have something like: mov(8) g2<1>DF g3<4,4,1>DF We are not actually moving 8 doubles (our intention), but 4 doubles. We need to double the parameters to cope with this issue. However, horizontal strides don't behave as they're supposed to on IVB for DF regions, they will cause each 32-bit half of DF sources to be strided individually, and doubling the value won't make any difference. v2: - Use devinfo directly (Matt). - Use Baytrail instead of Valleview (Matt). - Use IvyBridge instead of Ivy (Matt) - Double the exec_size in code emission (Curro) v3: - Change hstride doubling by an assert and fix commit log (Curro). - Substitute remaining compiler->devinfo by devinfo (Curro). v4: - Fix comment (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: add helper to retrieve instruction execution type	Juan A. Suarez Romero	2017-04-14	3	-5/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The execution data size is the biggest type size of any instruction operand. We will use it to know if the instruction deals with DF, because in Ivy we need to double the execution size and regioning parameters. v2: - Fix typo in commit log (Matt) - Use static inline function instead of fs_inst's method (Curro). - Define the result as a constant (Curro). - Fix indentation (Matt). - Add braces to nested control flow (Matt). v3 (Curro): - Add get_exec_type() and other auxiliary functions and use them to calculate its size. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced execution type for integer vector types. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Move into brw_ir_fs.h header for consistency with the VEC4 back-end. ] Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Handle IVB DF differences in the validator.	Matt Turner	2017-04-14	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \|	On IVB/BYT, region parameters and execution size for DF are in terms of 32-bit elements, so they are doubled. For evaluating the validity of an instruction, we halve them. v2 (Sam): - Add comments. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	i965/disasm: also print nibctrl in IVB for execsize=8	Iago Toral Quiroga	2017-04-14	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	4-wide DF operations where NibCtrl applies require and execsize of 8 in IvyBridge/BayTrail. v2: - Refactor NibCtrl printing (Matt) Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965/fs: Take into account lower frequency of conditional blocks in spilling ↵	Francisco Jerez	2017-04-11	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cost heuristic. The individual branches of an if/else/endif construct will be executed some unknown number of times between 0 and 1 relative to the parent block. Use some factor in between as weight while approximating the cost of spill/fill instructions within a conditional if-else branch. This favors spilling registers used within conditional branches which are likely to be executed less frequently than registers used at the top level. Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on my SKL GT4e. Should have a comparable effect on other platforms. No significant regressions. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Always provide a default LOD of 0 for TXS and TXL	Jason Ekstrand	2017-04-04	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations. This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391 Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0 17.0" <[email protected]>
*	intel/vec4: Add some fall through comments	Jason Ekstrand	2017-04-03	1	-0/+4
\| \| \| \|	Reviewed-by: Matt Turner <[email protected]>
*	i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+	Alejandro Piñeiro	2017-03-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Technically those hw operations are only available on gen7, as gen8+ support the conversion on the MOV. But, when using the builder to implement nir operations (example: nir_op_fquantize2f16), it is not needed to do the gen check. This check is done later, on the final emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the specific operation accordingly. So in the middle, during optimization phases those hw operations can be around for gen8+ too. Without this patch, several (at least 95) vulkan-cts quantize tests crashes when using INTEL_DEBUG=optimizer. For example: dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert v2: simplify the code using GEN_GE (Ilia Mirkin) v3: tweak brw_instruction_name instead of changing opcode_descs table, that is used for validation (Matt Turner) Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Don't emit SEL instructions for type-converting MOVs.	Matt Turner	2017-03-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	SEL can only convert between a few integer types, which we basically never do. Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow Cc: [email protected] Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Acked-by: Francisco Jerez <[email protected]>
*	anv/pipeline: make FragCoord include sample positions when sample shading	Iago Toral Quiroga	2017-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to know if sample shading has been requested during shader compilation since that affects the way fragment coordinates are computed. Notice that the semantics of fragment coordinates only depend on whether sample shading has been requested, not on whether more than one sample will actually be produced (that is, minSampleShading and rasterizationSamples do not affect this behavior). Because this setting affects the code we generate for the shader, we also need to include it in the WM prog key. Notice we don't need to alter the OpenGL code because it doesn't ever use this behavior, so they key's value is always false (the default). Fixes: dEQP-VK.glsl.builtin_var.fragcoord_msaa.* Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Replace OPT_V() with OPT().	Matt Turner	2017-03-23	1	-23/+19
\| \| \| \| \| \| \|	We want to be able to check the progress of each pass and dump the NIR for debugging purposes if it changed. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Return progress from demote_sample_qualifiers().	Matt Turner	2017-03-23	1	-1/+6
\| \| \| \|	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Return progress from move_interpolation_to_top().	Matt Turner	2017-03-23	1	-1/+6
\| \| \| \| \| \|	And mark as static at the same time. Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/compiler: consistently use ifndef guards over pragma once	Emil Velikov	2017-03-22	8	-5/+31
\| \| \| \| \| \| \| \|	Signed-off-by: Emil Velikov <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Acked-by: Vedran Miletić <[email protected]> Acked-by: Juha-Pekka Heikkila <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965: make brw_setup_image_uniform_values static	Emil Velikov	2017-03-22	1	-5/+0
\| \| \| \| \| \| \| \| \| \|	Used only internally. Signed-off-by: Emil Velikov <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Acked-by: Vedran Miletić <[email protected]> Acked-by: Juha-Pekka Heikkila <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
*	nir: Rework conversion opcodes	Jason Ekstrand	2017-03-14	3	-67/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Re-arrange conversion operations	Jason Ekstrand	2017-03-14	1	-36/+31
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/vec4: Get rid of the type parameter from to/from_double	Jason Ekstrand	2017-03-14	2	-24/+15
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/fs: Use num_components from the SSA def in image intrinsics	Jason Ekstrand	2017-03-14	1	-2/+1
\| \| \| \| \| \|	Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
*	intel: fix compiler build	Iago Toral Quiroga	2017-03-13	1	-0/+7
\| \| \| \| \| \| \| \| \|	compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES); Fixes: d0d4a5f43b4 ("i965: split EU defines to brw_eu_defines.h") Reviewed-by: Emil Velikov <[email protected]>