summaryrefslogtreecommitdiffstats
path: root/src/intel/compiler
Commit message (Collapse)AuthorAgeFilesLines
* intel/eu: Set EXECUTE_1 when setting the rounding mode in cr0Jason Ekstrand2018-05-221-0/+2
| | | | | Fixes: d6cd14f2131a5b "i965/fs: Define new shader opcode to..." Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
* i965/compiler: handle conversion to smaller type in the lowering pass for thatIago Toral Quiroga2018-05-052-12/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This rollbacks the revert of this same patch introduced in commit 7b9c15628aae8729118b648f5f473e6ac926b99b. And also squahes the following patch to prevent a piglit regression caused by this change: intel/compiler: Fix lower_conversions for 8-bit types. Author: Jose Maria Casanova Crespo <[email protected]> For 8-bit types the execution type is word. A byte raw MOV has 16-bit execution type and 8-bit destination and it shouldn't be considered a conversion case. So there is no need to change alignment and enter in lower_conversions for these instructions. Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export" that is introduced with this patch from the Vulkan shaderInt16 series: 'i965/compiler: handle conversion to smaller type in the lowering pass for that'. The problem is caused because there is already a case in the driver that injects Byte instructions like this: mov(8) g127<1>UB g2<32,8,4>UB And the aforementioned pass was not accounting for the special handling of the execution size of Byte instructions. This patch fixes this. v2: (Jason Ekstrand) - Simplify is_byte_raw_mov, include reference to PRM and not consider B <-> UB conversions as raw movs. v3: (Matt Turner) - Indentation style fixes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393 Tested-by: Mark Janes <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: handle 16-bit to 64-bit conversions in BSW platformsIago Toral Quiroga2018-05-051-4/+4
| | | | | | | | | | | | | | | | | These are subject to the general restriction that anything that is converted to 64-bit needs to be aligned to 64-bit. We had this already in place for 32-bit to 64-bit conversions, so this patch generalizes the implementation to take effect on any conversion to 64-bit from a source smaller than 64-bit. Fixes assembly validation errors in the following CTS tests in BSW: dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64 dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64 dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64 Tested-by: Mark Janes <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "i965/compiler: handle conversion to smaller type in the lowering ↵Mark Janes2018-05-032-7/+12
| | | | | | | | | pass for that" This reverts commit 96b51537908cd2aace85f54b437eeb72e6346b7e. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393 Reviewed-by: Scott D Phillips <[email protected]>
* intel/compiler: implement 16-bit pack/unpack opcodesIago Toral Quiroga2018-05-031-0/+10
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/lower_64bit_packing: rename the pass to be more genericIago Toral Quiroga2018-05-031-1/+1
| | | | | | It can do 32-bit packing too now. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: fix 16-bit comparisonsIago Toral Quiroga2018-05-031-8/+30
| | | | | | | | | | | | | | NIR assumes that booleans are always 32-bit, but Intel hardware produces 16-bit booleans for 16-bit comparisons. This means that we need to convert the 16-bit result to 32-bit. In the future we want to add an optimization pass to clean this up and hopefully remove the conversions. v2 (Jason): use the type of the source for the temporary and use brw_reg_type_from_bit_size for the conversion to 32-bit. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: lower some 16-bit integer operations to 32-bitIago Toral Quiroga2018-05-031-0/+21
| | | | | | | | | These are not supported in hardware for 16-bit integers. We do the lowering pass after the optimization loop to ensure that we lower ALU operations injected by algebraic optimizations too. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: support negate and abs of half float immediatesJose Maria Casanova Crespo2018-05-031-2/+4
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: fix brw_imm_w for negative 16-bit integersJose Maria Casanova Crespo2018-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | 16-bit immediates need to replicate the 16-bit immediate value in both words of the 32-bit value. This needs to be careful to avoid sign-extension, which the previous implementation was not handling properly. For example, with the previous implementation, storing the value -3 would generate imm.d = 0xfffffffd due to signed integer sign extension, which is not correct. Instead, we should cast to uint16_t, which gives us the correct result: imm.ud = 0xfffdfffd. We only had a couple of cases hitting this path in the driver until now, one with value -1, which would work since all bits are one in this case, and another with value -2 in brw_clip_tri(), which would hit the aforementioned issue (this case only affects gen4 although we are not aware of whether this was causing an actual bug somewhere). v2: Make explicit uint32_t casting for left shift (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]> Cc: "18.0 18.1" <[email protected]>
* intel/compiler: fix 16-bit int brw_negate_immediate and brw_abs_immediateJose Maria Casanova Crespo2018-05-031-4/+8
| | | | | | | | | | | | | | | | | | | | | From Intel Skylake PRM, vol 07, "Immediate" section (page 768): "For a word, unsigned word, or half-float immediate data, software must replicate the same 16-bit immediate value to both the lower word and the high word of the 32-bit immediate field in a GEN instruction." This fixes the int16/uint16 negate and abs immediates that weren't taking into account the replication in lower and upper words. v2: Integer cases are different to Float cases. (Jason Ekstrand) Included reference to PRM (Jose Maria Casanova) v3: Make explicit uint32_t casting for left shift (Jason Ekstrand) Split half float implementation. (Jason Ekstrand) Fix brw_abs_immediate (Jose Maria Casanova) Cc: "18.0 18.1" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: implement nir_instr_type_load_const for 16-bit constantsJose Maria Casanova Crespo2018-05-031-0/+5
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: implement conversions from 16-bit int/float to boolIago Toral Quiroga2018-05-031-5/+11
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: implement conversion between float/int 16-bit typesIago Toral Quiroga2018-05-031-0/+4
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/compiler: handle conversion to smaller type in the lowering pass for thatIago Toral Quiroga2018-05-032-12/+7
| | | | | | | The lowering pass was specialized to act on 64-bit to 32-bit conversions only, but the implementation is valid for other cases. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: fix isign for 16-bit integersIago Toral Quiroga2018-05-031-5/+12
| | | | | | | | | | | | | We need to use 16-bit constants with 16-bit instructions, otherwise we get the following validation error: "Destination stride must be equal to the ratio of the sizes of the execution data type to the destination type" Because the execution data type is 4B due to the 32-bit integer constant. Reviewed-by: Jason Ekstrand <[email protected]>
* intel: activate the gl_BaseVertex loweringAntia Puentes2018-05-025-16/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Surplus code related to the basevertex is removed. The Vertex Elements contain now: * VE 1: <firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is_indexed_draw, 0, 0> Also fixes unreachable message. Fixes OpenGL CTS tests: * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters Fixes Piglit tests: * arb_shader_draw_parameters-drawid-indirect baseinstance * arb_shader_draw_parameters-basevertex Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
* intel: emit is_indexed_draw in the same VE than gl_DrawIDAntia Puentes2018-05-023-8/+19
| | | | | | | | | | | The Vertex Elements are now: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is-indexed-draw, 0, 0> VE1 is it kept as it was before, VE2 additionally contains the new system value. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Add uses_is_indexed_draw flagAntia Puentes2018-05-022-0/+5
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Add scheduler deps for instructions that implicitly read g0Ian Romanick2018-04-242-0/+28
| | | | | | | | | | | | | Otherwise the scheduler can move the writes after the reads. Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95009 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95012 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]> Cc: Clayton A Craft <[email protected]> Cc: [email protected]
* intel/compiler: Silence unused parameter warnings in empty ↵Ian Romanick2018-04-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | vec4_instruction_scheduler methods src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::count_reads_remaining(backend_instruction*)’: src/intel/compiler/brw_schedule_instructions.cpp:764:72: warning: unused parameter ‘be’ [-Wunused-parameter] vec4_instruction_scheduler::count_reads_remaining(backend_instruction *be) ^~ src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::setup_liveness(cfg_t*)’: src/intel/compiler/brw_schedule_instructions.cpp:769:51: warning: unused parameter ‘cfg’ [-Wunused-parameter] vec4_instruction_scheduler::setup_liveness(cfg_t *cfg) ^~~ src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::update_register_pressure(backend_instruction*)’: src/intel/compiler/brw_schedule_instructions.cpp:774:75: warning: unused parameter ‘be’ [-Wunused-parameter] vec4_instruction_scheduler::update_register_pressure(backend_instruction *be) ^~ src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction*)’: src/intel/compiler/brw_schedule_instructions.cpp:779:80: warning: unused parameter ‘be’ [-Wunused-parameter] vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be) ^~ src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::issue_time(backend_instruction*)’: src/intel/compiler/brw_schedule_instructions.cpp:1550:61: warning: unused parameter ‘inst’ [-Wunused-parameter] vec4_instruction_scheduler::issue_time(backend_instruction *inst) ^~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence unused parameter warning in compile_cs_to_nirIan Romanick2018-04-241-4/+3
| | | | | | | | | | src/intel/compiler/brw_fs.cpp: In function ‘nir_shader* compile_cs_to_nir(const brw_compiler*, void*, const brw_cs_prog_key*, brw_cs_prog_data*, const nir_shader*, unsigned int)’: src/intel/compiler/brw_fs.cpp:7205:44: warning: unused parameter ‘prog_data’ [-Wunused-parameter] struct brw_cs_prog_data *prog_data, ^~~~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Silence unused parameter warnings in generate_foo methodsIan Romanick2018-04-242-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all of the fs_generator::generate_foo methods take a fs_inst * as the first parameter, just remove the name to quiet the compiler. src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst*, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src) ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst*)’: src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_discard_jump(fs_inst *inst) ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst*, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_pack_half_2x16_split(fs_inst *inst, ^~~~ src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst*, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter] fs_generator::generate_shader_time_add(fs_inst *inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen*, brw::vec4_instruction*, brw_reg)’: src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter] vec4_instruction *inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen*, brw::vec4_instruction*, brw_reg, brw_reg, brw_reg, brw_reg)’: src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter] vec4_instruction *inst, ^~~~ src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter] struct brw_reg indirect, struct brw_reg length) ^~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*Jason Ekstrand2018-04-231-0/+2
| | | | | | | | | They are send messages and this makes size_read() and mlen agree. For both of these opcodes, the payload is just a dummy so mlen == 1 and this should decrease register pressure a bit. Reviewed-by: Francisco Jerez <[email protected]> Cc: [email protected]
* i965/fs: retype offset_reg to UD at load_ssboJose Maria Casanova Crespo2018-04-201-1/+1
| | | | | | | | | | | | | | All operations with offset_reg at do_vector_read are done with UD type. So copy propagation was not working through the generated MOVs: mov(8) vgrf9:UD, vgrf7:D This change allows removing the MOV generated for reading the first components for 16-bit and 64-bit ssbo reads with non-constant offsets. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel: Handle firstvertex in an identical way to BaseVertexAntia Puentes2018-04-193-0/+8
| | | | | | | | | | | | Until we set gl_BaseVertex to zero for non-indexed draw calls both have an identical value. The Vertex Elements are kept like that: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <Draw ID, 0, 0, 0> v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
* intel/compiler: Add a uses_firstvertex flagNeil Roberts2018-04-192-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: include mtypes.h lessMarek Olšák2018-04-122-0/+2
| | | | | | | | | | - remove mtypes.h from most header files - add main/menums.h for often used definitions - remove main/core.h v2: fix radv build Reviewed-by: Brian Paul <[email protected]>
* intel/compiler: Explicitly cast register type in switchIan Romanick2018-04-061-1/+1
| | | | | | | | | | | | | | | | | | brw_reg::type is "enum brw_reg_type type:4". For whatever reason, GCC is treating this as an int instead of an enum. As a result, it doesn't detect missing switch cases and it doesn't detect that flow can get out of the switch. This silences the warning: src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’: src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* intel: compiler: silence compiler warningLionel Landwerlin2018-04-041-0/+1
| | | | | | | | | | ../src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’: ../src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type] Introduced by 8f83eea71e233 ("i965: Add negative_equals methods"). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir+drivers: add helpers to get # of src/dest componentsRob Clark2018-04-031-6/+5
| | | | | | | | | Add helpers to get the number of src/dest components for an intrinsic, and update spots that were open-coding this logic to use the helpers instead. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel/vec4: Set channel_sizes for MOV_INDIRECT sourcesJason Ekstrand2018-03-301-1/+4
| | | | | | | | | | | Otherwise, any indirect push constant access results in an assertion failure when we start digging through the channel_sizes array. This fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert on Haswell. It should be a harmless no-op for GL since indirect push constants aren't used there. Reviewed-by: Kenneth Graunke <[email protected]> Fixes: e69e5c7006d "i965/vec4: load dvec3/4 uniforms first in the..."
* util: Add and use util_is_power_of_two_nonzeroIan Romanick2018-03-291-2/+2
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to ↵Ian Romanick2018-03-291-2/+2
| | | | | | | | | | | util_is_power_of_two_or_zero The new name make the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* intel/fs: Don't emit a des copy for image ops with has_dest == falseJason Ekstrand2018-03-271-3/+6
| | | | | | | | | | This was causing us to walk dest_components times over a thing with no destination. This happened to work because all of the image intrinsics without a destination also happened to have dest_components == 0. We shouldn't be reading dest_components if has_dest == false. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Fix null destination register in 3-source instructionsIan Romanick2018-03-262-0/+27
| | | | | | | | | | | | | | | | | | | | | | | A recent commit (see below) triggered some cases where conditional modifier propagation and dead code elimination would cause a MAD instruction like the following to be generated: mad.l.f0 null, ... Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases like this in the scalar backend. This commit basically ports that code to the vec4 backend. NOTE: I have sent a couple tests to the piglit list that reproduce this bug *without* the commit mentioned below. This commit fixes those tests. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Tested-by: Tapani Pälli <[email protected]> Cc: [email protected] Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
* i965/vec4: Propagate conditional modifiers from compares to addsIan Romanick2018-03-261-5/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No changes on Broadwell or later as those platforms do not use the vec4 backend. Ivy Bridge and Haswell had similar results. (Ivy Bridge shown) total instructions in shared programs: 11682119 -> 11681056 (<.01%) instructions in affected programs: 150403 -> 149340 (-0.71%) helped: 950 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 1.12 x̃: 1 helped stats (rel) min: 0.23% max: 2.78% x̄: 0.82% x̃: 0.71% 95% mean confidence interval for instructions value: -1.19 -1.04 95% mean confidence interval for instructions %-change: -0.84% -0.79% Instructions are helped. total cycles in shared programs: 257495842 -> 257495238 (<.01%) cycles in affected programs: 270302 -> 269698 (-0.22%) helped: 271 HURT: 13 helped stats (abs) min: 2 max: 14 x̄: 2.42 x̃: 2 helped stats (rel) min: 0.06% max: 1.13% x̄: 0.32% x̃: 0.28% HURT stats (abs) min: 2 max: 12 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.15% max: 1.18% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -2.41 -1.84 95% mean confidence interval for cycles %-change: -0.31% -0.26% Cycles are helped. Sandy Bridge total instructions in shared programs: 10430493 -> 10429727 (<.01%) instructions in affected programs: 120860 -> 120094 (-0.63%) helped: 766 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 2.70% x̄: 0.78% x̃: 0.73% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 146138718 -> 146138446 (<.01%) cycles in affected programs: 244114 -> 243842 (-0.11%) helped: 132 HURT: 0 helped stats (abs) min: 2 max: 4 x̄: 2.06 x̃: 2 helped stats (rel) min: 0.03% max: 0.43% x̄: 0.16% x̃: 0.19% 95% mean confidence interval for cycles value: -2.12 -2.00 95% mean confidence interval for cycles %-change: -0.18% -0.15% Cycles are helped. GM45 and Iron Lake had identical results. (Iron Lake shown) total instructions in shared programs: 7780251 -> 7780248 (<.01%) instructions in affected programs: 175 -> 172 (-1.71%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.49% max: 2.44% x̄: 1.81% x̃: 1.49% total cycles in shared programs: 177851584 -> 177851578 (<.01%) cycles in affected programs: 9796 -> 9790 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.05% max: 0.08% x̄: 0.06% x̃: 0.05% Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Allow cmod propagation when src0 is a uniform or shader inputIan Romanick2018-03-261-1/+2
| | | | | | | | | | No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help more shaders. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Propagate conditional modifiers from compares to addsIan Romanick2018-03-262-5/+400
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The math inside the add and the cmp in this instruction sequence is the same. We can utilize this to eliminate the compare. add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This is reduced to: add.z.f0(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This optimization pass could do even better. The nature of converting vectorized code from the GLSL front end to scalar code in NIR results in sequences like: add(8) g7<1>F g4<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g6<1>F g3<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g3<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g10<1>F (abs)g6<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g4<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g12<1>F (abs)g7<8,8,1>F 3e-37F { align1 1Q }; In this sequence, only the first cmp.z is removed. With different scheduling, all 3 could get removed. Skylake total instructions in shared programs: 14407009 -> 14400173 (-0.05%) instructions in affected programs: 1307274 -> 1300438 (-0.52%) helped: 4880 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.45 -1.35 95% mean confidence interval for instructions %-change: -0.72% -0.69% Instructions are helped. total cycles in shared programs: 532943169 -> 532923528 (<.01%) cycles in affected programs: 14065798 -> 14046157 (-0.14%) helped: 2703 HURT: 339 helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2 helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21% HURT stats (abs) min: 1 max: 739 x̄: 39.86 x̃: 12 HURT stats (rel) min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41% 95% mean confidence interval for cycles value: -8.66 -4.26 95% mean confidence interval for cycles %-change: -0.24% -0.14% Cycles are helped. LOST: 0 GAINED: 1 Broadwell total instructions in shared programs: 14719636 -> 14712949 (-0.05%) instructions in affected programs: 1288188 -> 1281501 (-0.52%) helped: 4845 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.43 -1.33 95% mean confidence interval for instructions %-change: -0.72% -0.68% Instructions are helped. total cycles in shared programs: 559599253 -> 559581699 (<.01%) cycles in affected programs: 13315565 -> 13298011 (-0.13%) helped: 2600 HURT: 269 helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2 helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20% HURT stats (abs) min: 1 max: 790 x̄: 53.07 x̃: 20 HURT stats (rel) min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75% 95% mean confidence interval for cycles value: -8.47 -3.77 95% mean confidence interval for cycles %-change: -0.27% -0.18% Cycles are helped. LOST: 0 GAINED: 8 Haswell total instructions in shared programs: 12978609 -> 12973483 (-0.04%) instructions in affected programs: 932921 -> 927795 (-0.55%) helped: 3480 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58% 95% mean confidence interval for instructions value: -1.53 -1.42 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 410270788 -> 410250531 (<.01%) cycles in affected programs: 10986161 -> 10965904 (-0.18%) helped: 2087 HURT: 254 helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4 helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21% HURT stats (abs) min: 1 max: 519 x̄: 40.49 x̃: 16 HURT stats (rel) min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47% 95% mean confidence interval for cycles value: -12.82 -4.49 95% mean confidence interval for cycles %-change: -0.31% -0.18% Cycles are helped. LOST: 0 GAINED: 5 Ivy Bridge total instructions in shared programs: 11686082 -> 11681548 (-0.04%) instructions in affected programs: 937696 -> 933162 (-0.48%) helped: 3150 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49% 95% mean confidence interval for instructions value: -1.49 -1.38 95% mean confidence interval for instructions %-change: -0.71% -0.67% Instructions are helped. total cycles in shared programs: 257514962 -> 257492471 (<.01%) cycles in affected programs: 11524149 -> 11501658 (-0.20%) helped: 1970 HURT: 239 helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3 helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17% HURT stats (abs) min: 1 max: 1358 x̄: 50.00 x̃: 15 HURT stats (rel) min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65% 95% mean confidence interval for cycles value: -17.01 -3.35 95% mean confidence interval for cycles %-change: -0.33% -0.08% Cycles are helped. LOST: 9 GAINED: 1 Sandy Bridge total instructions in shared programs: 10432841 -> 10429893 (-0.03%) instructions in affected programs: 685071 -> 682123 (-0.43%) helped: 2453 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1 helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46% 95% mean confidence interval for instructions value: -1.23 -1.17 95% mean confidence interval for instructions %-change: -0.67% -0.62% Instructions are helped. total cycles in shared programs: 146133660 -> 146134195 (<.01%) cycles in affected programs: 3991634 -> 3992169 (0.01%) helped: 1237 HURT: 153 helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2 helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14% HURT stats (abs) min: 1 max: 1740 x̄: 59.56 x̃: 12 HURT stats (rel) min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42% 95% mean confidence interval for cycles value: -5.13 5.90 95% mean confidence interval for cycles %-change: -0.17% 0.16% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 GM45 and Iron Lake had similar results (GM45 shown): total instructions in shared programs: 4800332 -> 4798380 (-0.04%) instructions in affected programs: 565995 -> 564043 (-0.34%) helped: 1451 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1 helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31% 95% mean confidence interval for instructions value: -1.40 -1.29 95% mean confidence interval for instructions %-change: -0.50% -0.45% Instructions are helped. total cycles in shared programs: 122032318 -> 122027798 (<.01%) cycles in affected programs: 8334868 -> 8330348 (-0.05%) helped: 1029 HURT: 1 helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2 helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04% HURT stats (abs) min: 38 max: 38 x̄: 38.00 x̃: 38 HURT stats (rel) min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25% 95% mean confidence interval for cycles value: -4.70 -4.08 95% mean confidence interval for cycles %-change: -0.09% -0.08% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Allow cmod propagation when src0 is a uniform or shader inputIan Romanick2018-03-261-1/+2
| | | | | | | | | | No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help about 900 more shaders. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add negative_equals methodsIan Romanick2018-03-267-0/+72
| | | | | | | | | | | | | | | | | | | | | | This method is similar to the existing ::equals methods. Instead of testing that two src_regs are equal to each other, it tests that one is the negation of the other. v2: Simplify various checks based on suggestions from Matt. Use src_reg::type instead of fixed_hw_reg.type in a check. Also suggested by Matt. v3: Rebase on 3 years. Fix some problems with negative_equals with VF constants. Add fs_reg::negative_equals. v4: Replace the existing default case with BRW_REGISTER_TYPE_UB, BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt. Expand the FINISHME comment to better explain why it isn't already finished. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* nir: Rename image intrinsics to image_varJason Ekstrand2018-03-231-23/+23
| | | | | | | | | | | Generated with git grep -l nir_intrinsic_image | xargs \ sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g' and some manual fixing in nir_intrinsics.h Reviewed-by: Timothy Arceri <[email protected]>
* intel/compiler: Readd ICL to test_eu_validate.cppMatt Turner2018-03-221-0/+1
| | | | Now that the PCI IDs are upstream, this can be readded.
* intel/compiler: Skip 64-bit type tests when types not availableMatt Turner2018-03-221-0/+19
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/icl: Clear "null render target" bit in extended message ↵Jason Ekstrand2018-03-222-0/+6
| | | | | | | | | descriptor Otherwise all our render target writes go no where. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/icl: Update the assert in brw_stage_has_packed_dispatch()Anuj Phogat2018-03-221-1/+1
| | | | | | | | Rafael ran piglit with the test code enabled and saw no additional GPU hangs. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Silence compiler warning about promoted_constants.Eric Anholt2018-03-161-1/+1
| | | | | | | | We only have a cfg != NULL if we went through one of the paths that set it, but my compiler doesn't figure that out. Reviewed-by: Lionel Landwerlin <[email protected]> Fixes: 6411defdcd6f ("intel/cs: Re-run final NIR optimizations for each SIMD size")
* intel/compiler: Use gen_get_device_info() in test_eu_validateMatt Turner2018-03-162-39/+18
| | | | | | | | | | | | | Previously the unit test filled out a minimal devinfo struct. A previous patch caused the test to begin assert failing because the devinfo was not complete. Avoid this by using the real mechanism to create devinfo. Note that we have to drop icl from the table, since we now rely on the name -> PCI ID translation done by gen_device_name_to_pci_device_id(), and ICL's PCI IDs are not upstream yet. Fixes: f89e735719a6 ("intel/compiler: Check for unsupported register sizes.") Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: Check for unsupported register sizes.Rafael Antognolli2018-03-161-0/+3
| | | | | | | | | Make sure we don't emit 64 bit types if the hardware doesn't support them. Signed-off-by: Rafael Antognolli <[email protected]> Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* compiler: int8/uint8 supportKarol Herbst2018-03-143-0/+9
| | | | | | | | | | OpenCL kernels also have int8/uint8. v2: remove changes in nir_search as Jason posted a patch for that Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Rob Clark <[email protected]> Signed-off-by: Karol Herbst <[email protected]>