aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel/compiler
Commit message (Collapse)AuthorAgeFilesLines
...
* intel/fs: Use ANY/ALL32 predicates in SIMD32Jason Ekstrand2017-11-071-12/+30
| | | | | | | | | | | | We have ANY/ALL32 predicates and, for the most part, they work just fine. (See the next commit for more details.) Also, due to the way that flag registers are handled in hardware, instruction splitting is able to split the CMP correctly. Specifically, that hardware looks at the execution group and knows to shift it's flag usage up correctly so a 2H instruction will write to f0.1 instead of f0.0. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
* intel/fs: Be more explicit about our placement of [un]zipJason Ekstrand2017-11-071-3/+17
| | | | | | | | | | | | | Before, we were careful to place the zip after the last of the split instructions but did unzip on-demand. This changes things so that the unzips go before all of the split instructions and the unzip comes explicitly after all the split instructions. As a side-effect of this change, we now emit the split instruction from highest SIMD group to lowest instead of low to high. We could have kept the old behavior, but it shouldn't matter and this made the code easier. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org
* intel/fs: Pass builders instead of blocks into emit_[un]zipJason Ekstrand2017-11-071-26/+35
| | | | | | | | | This makes it far more explicit where we're inserting the instructions rather than the magic "before and after" stuff that the emit_[un]zip helpers did based on block and inst. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org
* intel/fs: Use a pure vertical stride for large register stridesJason Ekstrand2017-11-071-3/+13
| | | | | | | | | | | | | | Register strides higher than 4 are uncommon but they can happen. For instance, if you have a 64-bit extract_u8 operation, we turn that into UB -> UQ MOV with a source stride of 8. Our previous calculation would try to generate a stride of <32;8,8>:ub which is invalid because the maximum horizontal stride is 4. To solve this problem, we instead use a stride of <8;1,0>. As noted in the comment, this does not work as a destination but that's ok as very few things actually generate that stride. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: mesa-stable@lists.freedesktop.org
* intel/fs: Don't allocate a param array for zero push constantsJason Ekstrand2017-11-021-1/+8
| | | | | | | | | | Thanks to the ralloc invariant of "any pointer returned from ralloc can be used as a context", calling ralloc_size with a size of zero will cause it to allocate at least a header. If we don't have any push constants, then NULL is perfectly acceptable (and even preferred). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
* intel/fs: Alloc pull constants off mem_ctxJason Ekstrand2017-11-021-1/+1
| | | | | | | | | | It doesn't actually matter since the only user of push constants, i965, ralloc_steals it back to NULL but it's more consistent and probably fixes memory leaks in some error cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org
* intel/compiler: Add functions to get prog_data and prog_key sizes for a stageJordan Justen2017-10-312-0/+42
| | | | | | | | | v2: * Return unsigned instead of size_t. (Ken) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Add union types for prog_data and prog_key stagesJordan Justen2017-10-311-0/+22
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* intel/compiler: Remove final_program_size from brw_compile_*Jordan Justen2017-10-316-38/+14
| | | | | | | | | The caller can now use brw_stage_prog_data::program_size which is set by the brw_compile_* functions. Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* intel/compiler: add new field for storing program sizeCarl Worth2017-10-316-14/+35
| | | | | | | | | | | | This will be used by the on disk shader cache. v2: * Set in brw_compile_* rather than brw_codegen_*. (Jason) Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com> [jordan.l.justen@intel.com: Only add to brw_stage_prog_data] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: remove unused variableEric Engestrom2017-10-301-3/+0
| | | | | | | | Fixes: 2c873060d3578c7004c0 "i965: Delete unused brw_vs_prog_data::nr_attributes field." Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
* glsl: Remove ir_binop_greater and ir_binop_lequal expressionsIan Romanick2017-10-301-4/+0
| | | | | | | | | | | | | | | | | | | NIR does not have these instructions. TGSI and Mesa IR both implement them using < and >=, repsectively. Removing them deletes a bunch of code and means I don't have to add code to the SPIR-V generator for them. v2: Rebase on 2+ years of change... and fix a major bug added in the rebase. text data bss dec hex filename 8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so before 8254235 268856 294072 8817163 868a0b 32-bit i965_dri.so after 7815339 345592 420592 8581523 82f193 64-bit i965_dri.so before 7813995 345560 420592 8580147 82ec33 64-bit i965_dri.so after Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* i965: fix blorp stage_prog_data->param leakTapani Pälli2017-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Patch uses mem_ctx for allocation to ensure param array gets freed later. ==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193 ==6164== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299) ==6164== by 0x12E31C6C: ralloc_size (ralloc.c:121) ==6164== by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095) ==6164== by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715) ==6164== by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229) ==6164== by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570) ==6164== by 0x130C4B07: blorp_compile_fs (blorp.c:194) ==6164== by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79) ==6164== by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332) ==6164== by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261) ==6164== by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326) ==6164== by 0x12EFF72B: brw_clear (brw_clear.c:297) Fixes: 8d90e28839 ("intel/compiler: Allocate pull_param in assign_constant_locations") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org
* i965: Delete brw_wm_prog_key::drawable_height.Kenneth Graunke2017-10-291-1/+0
| | | | | | This has been unused since we switched to nir_lower_wpos_ytransform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* intel/compiler/gen9: Pixel shader header only workaroundTopi Pohjolainen2017-10-281-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes intermittent GPU hangs on Broxton with an Intel internal test case. There are plenty of similar fragment shaders in piglit that do not use any varyings and any uniforms. According to the documentation special timing is needed between pipeline stages. Apparently we just don't hit that with piglit. Even with the failing test case one doesn't always get the hang. Moreover, according to the error states the hang happens significantly later than the execution of the problematic shader. There are multiple render cycles (primitive submissions) in between. I've also seen error states where the ACTHD points outside the batch. Almost as if the hardware writes somewhere that gets used later on. That would also explain why piglit doesn't suffer from this - most tests kick off one render cycle and any corruption is left unseen. v2 (Ken): Instead of enabling push constants, enable one of the inputs (PSIZ). v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe() happy. Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Delete unused brw_vs_prog_data::nr_attributes field.Kenneth Graunke2017-10-272-2/+0
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* intel/compiler: brw_validate_instructions to take const void* instead of void*Kevin Rogovin2017-10-262-2/+2
| | | | | | | The disassembler does not (and should not) be modifying the data. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Call nir_lower_system_values in brw_preprocess_nirJason Ekstrand2017-10-251-0/+2
| | | | Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* intel/eu: Use EXECUTE_1 for JMPIJason Ekstrand2017-10-252-2/+1
| | | | | | | | | | | | The PRM says "The execution size must be 1." In 73137997e23ff6c11, the execution size was set to 1 when it should have been BRW_EXECUTE_1 (which maps to 0). Later, in dc2d3a7f5c217a7cee9, JMPI was used for line AA on gen6 and earlier and we started manually stomping the exeution size to BRW_EXECUTE_1 in the generator. This commit fixes the original bug and makes brw_JMPI just do the right thing. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: 73137997e23ff6c1145d036315d1a9ad96651281
* i965/fs: Add brw_reg_type_from_bit_size utility methodAlejandro Piñeiro2017-10-251-5/+64
| | | | | | | | | | | | | | | Returns the brw_type for a given ssa.bit_size, and a reference type. So if bit_size is 64, and the reference type is BRW_REGISTER_TYPE_F, it returns BRW_REGISTER_TYPE_DF. The same applies if bit_size is 32 and reference type is BRW_REGISTER_TYPE_HF it returns BRW_REGISTER_TYPE_F v2 (Jason Ekstrand): - Use better unreachable() messages - Add Q types Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs/nir: Use the nir_src_bit_size helperJason Ekstrand2017-10-251-9/+3
| | | | Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* intel/fs: Handle flag read/write aliasing in needs_src_copyJason Ekstrand2017-10-251-1/+3
| | | | | | | | | | | | | | | In order to implement the ballot intrinsic, we do a MOV from flag register to some GRF. If that GRF is used in a SEL, cmod propagation helpfully changes it into a MOV from the flag register with a cmod. This is perfectly valid but when lower_simd_width comes along, it simply splits into two instructions which both have conditional modifiers. This is a problem since we're reading the flag register. This commit makes us check whether or not flags_written() overlaps with the flag values that we are reading via the instruction source and, if we have any interference, will force us to emit a copy of the source. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
* intel/nir: Zero local index const struct for valgrind & nir_serializeJordan Justen2017-10-251-0/+1
| | | | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* meson: extract out variable for nir_algebraic.pyRob Clark2017-10-241-1/+1
| | | | | | | | Also needed in freedreno/ir3. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
* i965: Fix memmem compiler warnings.Eric Anholt2017-10-241-1/+2
| | | | | | | | | | | | | | | | | | | gcc is throwing this warning in my meson build: ../src/intel/compiler/brw_eu_validate.c:50:11: warning argument 1 null where non-null expected [-Wnonnull] return memmem(haystack.str, haystack.len, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ needle.str, needle.len) != NULL; ~~~~~~~~~~~~~~~~~~~~~~~ The first check for CONTAINS has a NULL error_msg.str and 0 len. The glibc implementation will exit without looking at any haystack bytes if haystack.len < needle.len, so this was safe, but silence the warning anyway by guarding against implementation variablility. Fixes: 122ef3799d56 ("i965: Only insert error message if not already present") Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use align1 mode on ternary instructions on Gen10+Matt Turner2017-10-201-4/+8
| | | | | | | | | Align1 mode offers some nice features over align16, like access to more data types and the ability to use a 16-bit immediate. This patch does not start using any new features. It just emits ternary instructions in align1 mode. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add align1 ternary instruction emission supportMatt Turner2017-10-201-55/+160
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add align1 ternary instruction disassembler supportMatt Turner2017-10-202-75/+288
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add align1 ternary instruction-word supportMatt Turner2017-10-201-0/+108
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add align1 ternary instruction support to conversion functionsMatt Turner2017-10-204-34/+101
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add align1 ternary instruction field encodingsMatt Turner2017-10-201-0/+35
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add functions to abstract access to 3src register typesMatt Turner2017-10-202-20/+23
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Rename brw_inst's functions that access the 3src register typeMatt Turner2017-10-203-18/+18
| | | | | | | | | Put hw_ in the name so that it's clear these are the hardware encodings. Similar to commit 9fb832332868 ("i965: Rename brw_inst's functions that access the register type") Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Rename brw_inst 3src functions in preparation for align1Matt Turner2017-10-204-86/+92
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Print subreg in units of type-size on ternary instructionsMatt Turner2017-10-201-5/+26
| | | | | | | | The instruction word contains SubRegNum[4:2] so it's in units of dwords (hence the * 4 to get it in terms of bytes). Before this patch, the subreg would have been wrong for DF arguments. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Add functions for brw_reg_type <-> hw 3src typeMatt Turner2017-10-202-0/+58
| | | | Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* i965: Move brw_reg_type_is_floating_point to brw_reg_type.hMatt Turner2017-10-202-13/+15
| | | | | | | I'm going to call this from brw_inst.h, and I don't want to have to include all of brw_reg.h. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
* nir: Get rid of nir_shader::stageJason Ekstrand2017-10-206-21/+21
| | | | | | | | It's redundant with nir_shader::info::stage. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/vec4: remove setting default LOD in the backendSamuel Iglesias Gonsálvez2017-10-202-21/+0
| | | | | | | It is already done in NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965/fs: remove setting default LOD in the backendSamuel Iglesias Gonsálvez2017-10-201-9/+0
| | | | | | | It is already done in NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* i965: Use is_scheduling_barrier instead of schedule_node::is_barrier.Kenneth Graunke2017-10-191-22/+10
| | | | | | | | | | | | | | | | | | | Commit a73116ecc60414ade89802150b tried to make add_barrier_deps() walk to the next barrier, and stop. To accomplish that, it added an is_barrier flag. Unfortunately, this only works half of the time. The issue is that add_barrier_deps() walks both backward (to the previous barrier), and forward (to the next barrier). It also sets is_barrier. Assuming that we're processing instructions in forward order, this means that is_barrier will be set for previous instructions, but not future ones. So we'll never see it, and walk further than we need to. dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 now compiles its shaders in 3.6 seconds instead of 3.3 minutes. Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Pallavi G <pallavi.g@intel.com>
* i965: Move fs_inst::has_side_effects()'s eot check to the parent class.Kenneth Graunke2017-10-195-9/+3
| | | | | | | | | This eliminates a layer of wrapping, and makes a backend_instruction sufficient. The downside is that it exposes 'eot' to the vec4 backend, which it doesn't need, but can basically happily ignore. Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Pallavi G <pallavi.g@intel.com>
* intel/cs: Make thread_local_id a regular builtin paramJason Ekstrand2017-10-123-28/+29
| | | | | | | | | This is a lot more natural than special casing it all over the place. We still have to do a bit of special-casing in assign_constant_locations but it's not special-cased quite as bad as it was before. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Allocate pull_param in assign_constant_locationsJason Ekstrand2017-10-122-6/+14
| | | | | | | | | | | Now that everything is nicely ralloc'd, we can allocate the pull_param array in assign_constant_locations instead of higher up. We can also re-allocate the param array so that it's exactly the needed size. This should save us some memory because we're not allocating the total needed param space for both push and pull. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel: Allocate prog_data::[pull_]param deeper inside the compilerJason Ekstrand2017-10-122-4/+4
| | | | | | | | | | | | | Now that we're always growing the param array as-needed, we can allocate the param array in common code and stop repeating the allocation everywere. In order to keep things sane, we ralloc the [pull_]param array off of the compile context and then steal it back to a NULL context later. This doesn't get us all the way to where prog_data::[pull_]param is purely an out parameter of the back-end compiler but it gets us a lot closer. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/vs: Grow the param array for clip planesJason Ekstrand2017-10-122-0/+14
| | | | | | | | Instead of requiring the caller of brw_compile_vs to figure it out, just grow the param array on-demand. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/cs: Grow prog_data::param on-demand for thread_local_id_indexJason Ekstrand2017-10-122-15/+9
| | | | | | | | | | Instead of making the caller of brw_compile_cs add something to the param array for thread_local_id_index, just add it on-demand in brw_nir_intrinsics and grow the array. This is now safe to do because everyone is now using ralloc for prog_data::param. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Make brw_nir_lower_intrinsics compute-specificJason Ekstrand2017-10-124-18/+12
| | | | | | | | | It's already only ever called from brw_compile_cs and only handles compute intrinsics. Let's just make it CS-specific. We can always make it handle other stages again later if we want. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Add a helper for growing the prog_data::param arrayJason Ekstrand2017-10-121-0/+13
| | | | | Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/compiler: Add a flag for pull constant supportJason Ekstrand2017-10-123-2/+11
| | | | | | | | | | | The Vulkan driver does not support pull constants. It simply limits things such that we can always push everything. Previously, we were determining whether or not to push things based on whether or not the prog_data::pull_param array is non-null. This is rather hackish and about to stop working. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>