path: root/src/intel
Commit message (Author, Age, Files, Lines)
* intel/fs: Don't allocate a param array for zero push constants (Jason Ekstrand, 2017-11-02, 1 file, -1/+8)
  Thanks to the ralloc invariant of "any pointer returned from ralloc can be used as a context", calling ralloc_size with a size of zero will cause it to allocate at least a header. If we don't have any push constants, then NULL is perfectly acceptable (and even preferred).
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
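The header cost described above can be illustrated with a toy allocator. This is a minimal sketch and not Mesa's ralloc: `toy_alloc_size`, `toy_free` and `alloc_param_array` are hypothetical names, and the header layout is an assumption. It only shows why a zero-byte request to a context-capable allocator still allocates a header, and how returning NULL for zero parameters avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header: a context-capable allocator needs bookkeeping in
 * front of every block so the returned pointer can act as a parent. */
struct toy_header {
   struct toy_header *parent;
   struct toy_header *first_child;
   struct toy_header *next_sibling;
};

static void *toy_alloc_size(size_t size)
{
   /* Even for size == 0 this allocates sizeof(struct toy_header) bytes,
    * because the returned pointer must be usable as a context. */
   struct toy_header *h = calloc(1, sizeof(*h) + size);
   return h ? (void *)(h + 1) : NULL;
}

static void toy_free(void *ptr)
{
   if (ptr)
      free((struct toy_header *)ptr - 1);
}

/* Hypothetical helper following the commit's idea: skip the allocation
 * entirely when there are no push constants. */
static unsigned *alloc_param_array(unsigned nr_params)
{
   if (nr_params == 0)
      return NULL; /* NULL is acceptable and avoids a header-only block */
   return toy_alloc_size(nr_params * sizeof(unsigned));
}
```

Callers then only need to treat a NULL array as "no parameters", which matches the commit's remark that NULL is preferred.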
* intel/fs: Alloc pull constants off mem_ctx (Jason Ekstrand, 2017-11-02, 1 file, -1/+1)
  It doesn't actually matter, since the only user of push constants, i965, ralloc_steals it back to NULL, but it's more consistent and probably fixes memory leaks in some error cases.
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
  Cc: [email protected]
* intel: decoder: enable decoding a single field (Lionel Landwerlin, 2017-11-01, 2 files, -0/+52)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: expose missing find_enum() (Lionel Landwerlin, 2017-11-01, 1 file, -0/+2)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: extract field value computation (Lionel Landwerlin, 2017-11-01, 1 file, -30/+37)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: rename field() to field_value() (Lionel Landwerlin, 2017-11-01, 1 file, -18/+18)
  We would like to avoid collisions with variables named field.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: rename internal function to free name (Lionel Landwerlin, 2017-11-01, 1 file, -3/+3)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: simplify field_is_header() (Lionel Landwerlin, 2017-11-01, 2 files, -4/+6)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: common: make intel utils available from C++ (Lionel Landwerlin, 2017-11-01, 3 files, -0/+25)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: remove unused platform field (Lionel Landwerlin, 2017-11-01, 1 file, -2/+0)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: error-decode: implement a rolling window of programs (Lionel Landwerlin, 2017-11-01, 1 file, -14/+23)
  If we have more programs than we can store, aubinator_error_decode will assert. Instead, let's keep a rolling window of programs.
  v2: Fix overflowing issues (Eric Engestrom)
  v3: Go through programs starting at idx_program (Scott)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
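A rolling window like the one described above is essentially a ring buffer. The sketch below assumes a tiny hypothetical capacity and hypothetical names (`add_program`, `find_program`); only `idx_program` comes from the commit message, and the real tool's capacity and lookup order may differ. Here lookups walk from the newest entry back to the oldest, so a reused address resolves to its most recent program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, kept small for illustration. */
#define MAX_NUM_PROGRAMS 4

struct program {
   const char *type;
   uint64_t address;
};

static struct program programs[MAX_NUM_PROGRAMS];
static int idx_program;  /* next slot to write (the oldest entry once full) */
static int num_programs; /* number of valid entries, capped at capacity */

/* Store a program, overwriting the oldest one instead of asserting when full. */
static void add_program(const char *type, uint64_t address)
{
   programs[idx_program] = (struct program){ type, address };
   idx_program = (idx_program + 1) % MAX_NUM_PROGRAMS;
   if (num_programs < MAX_NUM_PROGRAMS)
      num_programs++;
}

/* Look up a program by address, newest entry first. */
static const struct program *find_program(uint64_t address)
{
   for (int i = 0; i < num_programs; i++) {
      int slot = (idx_program - 1 - i + MAX_NUM_PROGRAMS) % MAX_NUM_PROGRAMS;
      if (programs[slot].address == address)
         return &programs[slot];
   }
   return NULL;
}
```

After six insertions into a four-slot window, the first two programs have been evicted and the last four are still found, with no assertion failure.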
* intel: decoder: extract instruction/structs length (Lionel Landwerlin, 2017-11-01, 2 files, -0/+8)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: pack iterator variable declarations (Lionel Landwerlin, 2017-11-01, 1 file, -11/+8)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: simplify creation of struct when 0-allocated (Lionel Landwerlin, 2017-11-01, 1 file, -4/+0)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: add destructor for gen_spec (Lionel Landwerlin, 2017-11-01, 2 files, -102/+91)
  This makes use of ralloc to simplify the destruction. We can also store instructions in hash tables.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: expose helper to test header fields (Lionel Landwerlin, 2017-11-01, 2 files, -3/+4)
  These fields are of little importance as they're used to recognize instructions.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: don't read qword outside instruction/struct limit (Lionel Landwerlin, 2017-11-01, 2 files, -3/+9)
  We used to print invalid data when the last field was being clamped to 32 bits due to the DWord Length of the whole instruction. Here is an example where the decoder read part of the next instruction instead of stopping at the 32-bit limit:
      0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM
      0x000ce0b4: 0x10000002 : Dword 0
          DWord Length: 2
          Store Qword: 0
          Use Global GTT: false
      0x000ce0b8: 0x00045010 : Dword 1
          Core Mode Enable: 0
          Address: 0x00045010
      0x000ce0bc: 0x00000000 : Dword 2
      0x000ce0c0: 0x00000000 : Dword 3
          Immediate Data: 8791026489807077376
  With this change we get the proper value:
      0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM (4 Dwords)
      0x000ce0b4: 0x10000002 : Dword 0
          DWord Length: 2
          Store Qword: 0
          Use Global GTT: false
      0x000ce0b8: 0x00045010 : Dword 1
          Core Mode Enable: 0
          Address: 0x00045010
      0x000ce0bc: 0x00000000 : Dword 2
      0x000ce0c0: 0x00000000 : Dword 3
          Immediate Data: 0
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
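The bogus value in the first dump is what you get by gluing the next instruction's header dword (0x7a000004, a PIPE_CONTROL) onto the high half of Immediate Data. The sketch below contrasts an unclamped 64-bit read with one clamped to the instruction's length in dwords. Both helper names are hypothetical, and `length_dw` is assumed to be the total dword count (4 here, as the fixed output shows), not the raw DWord Length field.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour (hypothetical helper): always read two dwords, even when
 * the second one belongs to the next instruction. */
static uint64_t read_qword_unclamped(const uint32_t *p, unsigned dw)
{
   return (uint64_t)p[dw] | ((uint64_t)p[dw + 1] << 32);
}

/* Clamped read (hypothetical helper): only include the high dword if it
 * still lies inside the instruction, which is `length_dw` dwords long. */
static uint64_t read_qword_clamped(const uint32_t *p, unsigned dw,
                                   unsigned length_dw)
{
   assert(dw < length_dw);
   uint64_t value = p[dw];
   if (dw + 1 < length_dw)
      value |= (uint64_t)p[dw + 1] << 32;
   return value;
}
```

With the batch from the example, the unclamped read at Dword 3 yields 0x7a00000400000000 (8791026489807077376), while the clamped read yields 0.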
* intel: decoder: split out getting the next field and decoding it (Lionel Landwerlin, 2017-11-01, 1 file, -10/+21)
  Due to the new way we handle fields, we must *not* forget the first field when decoding instructions. The issue was that the advance function was called first and skipped the first field.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: move field name copy (Lionel Landwerlin, 2017-11-01, 1 file, -2/+7)
  This should be inside the function that actually decodes fields.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: decoder: reorder iterator init function (Lionel Landwerlin, 2017-11-01, 1 file, -14/+14)
  Making the next change more readable.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel: common: print out all dword with field spanning multiple dwords (Lionel Landwerlin, 2017-11-01, 1 file, -4/+6)
  For example, we were skipping Dword 3 in this PIPE_CONTROL:
      0x000ce130: 0x7a000004: PIPE_CONTROL
          DWord Length: 4
      0x000ce134: 0x00000010 : Dword 1
          Flush LLC: false
          Destination Address Type: 0 (PPGTT)
          LRI Post Sync Operation: 0 (No LRI Operation)
          Store Data Index: 0
          Command Streamer Stall Enable: false
          Global Snapshot Count Reset: false
          TLB Invalidate: false
          Generic Media State Clear: false
          Post Sync Operation: 0 (No Write)
          Depth Stall Enable: false
          Render Target Cache Flush Enable: false
          Instruction Cache Invalidate Enable: false
          Texture Cache Invalidation Enable: false
          Indirect State Pointers Disable: false
          Notify Enable: false
          Pipe Control Flush Enable: false
          DC Flush Enable: false
          VF Cache Invalidation Enable: true
          Constant Cache Invalidation Enable: false
          State Cache Invalidation Enable: false
          Stall At Pixel Scoreboard: false
          Depth Cache Flush Enable: false
      0x000ce138: 0x00000000 : Dword 2
          Address: 0x00000000
      0x000ce140: 0x00000000 : Dword 4
          Immediate Data: 0
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
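The dump above skips Dword 3 because only the first dword of a multi-dword field got a header line. One way to avoid that is to keep a "next dword to print" cursor and advance it through the last dword the field touches. This is a sketch under assumptions: `struct field`, `print_dwords_for_field`, and the bit ranges in the usage below are hypothetical, not the decoder's real definitions or the real PIPE_CONTROL layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical field description: bit positions relative to the start of
 * the instruction, inclusive on both ends. */
struct field {
   const char *name;
   unsigned start, end;
};

/* Print a header for every dword from *next_dw up to and including the last
 * dword the field touches, so the trailing dwords of a field spanning
 * several dwords are not skipped. Returns how many headers were printed. */
static unsigned print_dwords_for_field(const struct field *f,
                                       const uint32_t *dwords,
                                       unsigned *next_dw)
{
   unsigned last_dw = f->end / 32;
   unsigned printed = 0;
   for (; *next_dw <= last_dw; (*next_dw)++, printed++)
      printf("0x%08x : Dword %u\n", (unsigned)dwords[*next_dw], *next_dw);
   return printed;
}
```

For a hypothetical 64-bit Address field at bits 64..127, starting from `*next_dw == 2`, this prints headers for both Dword 2 and Dword 3 and leaves the cursor at 4.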
* intel: decoder: build sorted linked lists of fields (Lionel Landwerlin, 2017-11-01, 2 files, -25/+34)
  The XML files don't always list fields in order, which might confuse our parsing of the commands. Let's keep the fields in order. The easiest way to do this is to use a linked list. It also helps a bit with the iterator.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
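Keeping fields sorted as they are parsed amounts to an insertion into a sorted singly linked list, keyed by the field's start bit. The struct layout and `insert_field_sorted` below are hypothetical simplifications, not the decoder's actual types. The sketch walks a pointer-to-link so that inserting at the head needs no special case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field node; a real decoder field carries more members. */
struct field {
   const char *name;
   unsigned start;      /* first bit of the field */
   struct field *next;
};

/* Insert f into the list at *head, keeping it sorted by start bit
 * regardless of the order in which fields appear in the XML.
 * Fields with equal start bits keep their insertion order. */
static void insert_field_sorted(struct field **head, struct field *f)
{
   struct field **link = head;
   while (*link && (*link)->start <= f->start)
      link = &(*link)->next;
   f->next = *link;
   *link = f;
}
```

An iterator can then simply follow `next` pointers and see fields in bit order.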
* intel: common: expose gen_spec fields (Lionel Landwerlin, 2017-11-01, 2 files, -13/+13)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Scott D Phillips <[email protected]>
* intel/compiler: Add functions to get prog_data and prog_key sizes for a stage (Jordan Justen, 2017-10-31, 2 files, -0/+42)
  v2:
   * Return unsigned instead of size_t. (Ken)
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add union types for prog_data and prog_key stages (Jordan Justen, 2017-10-31, 1 file, -0/+22)
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
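The two commits above pair a union that can hold any stage's prog_data with a per-stage size lookup. The sketch below illustrates that pattern with hypothetical stand-in structs and names (`union any_prog_data`, `prog_data_size`). These are not the real brw_*_prog_data definitions. The function returns `unsigned`, following the v2 note.

```c
#include <assert.h>

/* Hypothetical stages and per-stage structs standing in for the real ones. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COMPUTE, STAGE_COUNT };

struct base_prog_data { unsigned program_size; unsigned nr_params; };
struct vs_prog_data { struct base_prog_data base; unsigned inputs_read; };
struct fs_prog_data { struct base_prog_data base; unsigned num_varying_inputs; unsigned char uses_kill; };
struct cs_prog_data { struct base_prog_data base; unsigned local_size[3]; };

/* Large enough to hold the prog_data of any stage. */
union any_prog_data {
   struct base_prog_data base;
   struct vs_prog_data vs;
   struct fs_prog_data fs;
   struct cs_prog_data cs;
};

/* Size of the stage-specific struct, e.g. to copy exactly that many bytes
 * into or out of a cache blob. */
static unsigned prog_data_size(enum stage stage)
{
   static const unsigned sizes[STAGE_COUNT] = {
      [STAGE_VERTEX]   = sizeof(struct vs_prog_data),
      [STAGE_FRAGMENT] = sizeof(struct fs_prog_data),
      [STAGE_COMPUTE]  = sizeof(struct cs_prog_data),
   };
   assert(stage < STAGE_COUNT);
   return sizes[stage];
}
```

A caller can then declare a `union any_prog_data` on the stack and read back exactly `prog_data_size(stage)` bytes into it without knowing the stage's concrete type at compile time.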
* intel/compiler: Remove final_program_size from brw_compile_* (Jordan Justen, 2017-10-31, 11 files, -71/+40)
  The caller can now use brw_stage_prog_data::program_size, which is set by the brw_compile_* functions.
  Cc: Jason Ekstrand <[email protected]>
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: add new field for storing program size (Carl Worth, 2017-10-31, 6 files, -14/+35)
  This will be used by the on-disk shader cache.
  v2:
   * Set in brw_compile_* rather than brw_codegen_*. (Jason)
  Signed-off-by: Timothy Arceri <[email protected]>
  [[email protected]: Only add to brw_stage_prog_data]
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/isl: Disable some gen10 CCS_E formats for now (Nanley Chery, 2017-10-31, 1 file, -0/+24)
  CannonLake additionally supports R11G11B10_FLOAT and four 10-10-10-2 formats with CCS_E. None of these formats fit within the current blorp_copy framework, so disable them until support is added.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/genxml: Fix decoding of groups with fields smaller than a DWord. (Kenneth Graunke, 2017-10-30, 2 files, -10/+16)
  Groups containing fields smaller than a DWord were not being decoded correctly. For example:
      <group count="32" start="32" size="4">
        <field name="Vertex Element Enables" start="0" end="3" type="uint"/>
      </group>
  gen_field_iterator_next would properly walk over each element of the array, incrementing group_iter, and calling iter_group_offset_bits() to advance to the proper DWord. However, the code to print the actual values only considered iter->field->start/end, which are 0 and 3 in the above example. So it would always fetch bits 3:0 of the current DWord when printing values, instead of advancing to each element of the array, printing bits 0-3, 4-7, 8-11, and so on.
  To fix this, we add new iter->start/end tracking, which properly advances for each instance of a group's field.
  Caught by Matt Turner while working on 3DSTATE_VF_COMPONENT_PACKING, with a patch to convert it to use an array of bitfields (the example above). This also fixes the decoding of 3DSTATE_SBE's "Attribute Active Component Format" fields.
  Reviewed-by: Jordan Justen <[email protected]>
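The fix amounts to computing each group element's own bit range rather than reusing the field's start/end. The element's range is the group start, plus the element index times the group size, plus the field's offset within the element. The sketch below uses hypothetical struct and function names and assumes the field fits within a single dword, as the 4-bit example does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptions of a genxml <group> and a <field> inside it. */
struct group { unsigned count, start, size; }; /* in bits */
struct field { unsigned start, end; };         /* relative to one element */

/* Absolute start/end bits of `f` for element `i` of group `g`. */
static void group_field_bits(const struct group *g, const struct field *f,
                             unsigned i, unsigned *start, unsigned *end)
{
   assert(i < g->count);
   unsigned base = g->start + i * g->size;
   *start = base + f->start;
   *end = base + f->end;
}

/* Extract bits [start, end] from a dword array; the range must not cross
 * a dword boundary in this simplified sketch. */
static uint32_t extract_bits(const uint32_t *dw, unsigned start, unsigned end)
{
   assert(start / 32 == end / 32);
   uint32_t v = dw[start / 32] >> (start % 32);
   unsigned width = end - start + 1;
   return width == 32 ? v : v & ((1u << width) - 1);
}
```

With the `<group count="32" start="32" size="4">` example, element 0 reads bits 32..35, element 1 reads bits 36..39, and element 2 reads bits 40..43. Each element therefore picks out a different nibble of the second dword instead of always reading bits 3:0.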
* intel: common: silence compiler warning (Lionel Landwerlin, 2017-10-30, 1 file, -1/+1)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: remove unused variable (Eric Engestrom, 2017-10-30, 1 file, -3/+0)
  Fixes: 2c873060d3578c7004c0 "i965: Delete unused brw_vs_prog_data::nr_attributes field."
  Cc: Kenneth Graunke <[email protected]>
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
* glsl: Remove ir_binop_greater and ir_binop_lequal expressions (Ian Romanick, 2017-10-30, 1 file, -4/+0)
  NIR does not have these instructions. TGSI and Mesa IR both implement them using < and >=, respectively. Removing them deletes a bunch of code and means I don't have to add code to the SPIR-V generator for them.
  v2: Rebase on 2+ years of change... and fix a major bug added in the rebase.
         text    data     bss     dec     hex filename
      8255291  268856  294072 8818219  868e2b 32-bit i965_dri.so before
      8254235  268856  294072 8817163  868a0b 32-bit i965_dri.so after
      7815339  345592  420592 8581523  82f193 64-bit i965_dri.so before
      7813995  345560  420592 8580147  82ec33 64-bit i965_dri.so after
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
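Implementing > and <= with < and >= only requires swapping the operands: `a > b` is `b < a`, and `a <= b` is `b >= a`. Both identities also hold when an operand is NaN, since both sides are then false. The opcode enum and helpers below are hypothetical and only loosely modeled on GLSL IR binops. The sketch rewrites the two removed comparisons and evaluates both forms.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison opcodes, loosely modeled on GLSL IR binops. */
enum cmp_op { CMP_LESS, CMP_GEQUAL, CMP_GREATER, CMP_LEQUAL };

struct cmp { enum cmp_op op; float a, b; };

/* Rewrite the removed opcodes in terms of the remaining ones by swapping
 * operands: a > b becomes b < a, and a <= b becomes b >= a. */
static struct cmp lower_cmp(struct cmp c)
{
   switch (c.op) {
   case CMP_GREATER: return (struct cmp){ CMP_LESS, c.b, c.a };
   case CMP_LEQUAL:  return (struct cmp){ CMP_GEQUAL, c.b, c.a };
   default:          return c;
   }
}

static bool eval_cmp(struct cmp c)
{
   switch (c.op) {
   case CMP_LESS:    return c.a < c.b;
   case CMP_GEQUAL:  return c.a >= c.b;
   case CMP_GREATER: return c.a > c.b;
   case CMP_LEQUAL:  return c.a <= c.b;
   }
   return false;
}
```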
* i965: fix blorp stage_prog_data->param leak (Tapani Pälli, 2017-10-30, 1 file, -1/+1)
  Patch uses mem_ctx for allocation to ensure the param array gets freed later.
      ==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193
      ==6164==    at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
      ==6164==    by 0x12E31C6C: ralloc_size (ralloc.c:121)
      ==6164==    by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095)
      ==6164==    by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715)
      ==6164==    by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229)
      ==6164==    by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570)
      ==6164==    by 0x130C4B07: blorp_compile_fs (blorp.c:194)
      ==6164==    by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79)
      ==6164==    by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332)
      ==6164==    by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261)
      ==6164==    by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326)
      ==6164==    by 0x12EFF72B: brw_clear (brw_clear.c:297)
  Fixes: 8d90e28839 ("intel/compiler: Allocate pull_param in assign_constant_locations")
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Cc: [email protected]
* i965: Delete brw_wm_prog_key::drawable_height. (Kenneth Graunke, 2017-10-29, 1 file, -1/+0)
  This has been unused since we switched to nir_lower_wpos_ytransform.
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler/gen9: Pixel shader header only workaround (Topi Pohjolainen, 2017-10-28, 1 file, -0/+29)
  Fixes intermittent GPU hangs on Broxton with an Intel internal test case.
  There are plenty of similar fragment shaders in piglit that do not use any varyings or uniforms. According to the documentation, special timing is needed between pipeline stages. Apparently we just don't hit that with piglit. Even with the failing test case, one doesn't always get the hang. Moreover, according to the error states, the hang happens significantly later than the execution of the problematic shader. There are multiple render cycles (primitive submissions) in between. I've also seen error states where the ACTHD points outside the batch. It is almost as if the hardware writes somewhere that gets used later on. That would also explain why piglit doesn't suffer from this: most tests kick off one render cycle and any corruption is left unseen.
  v2 (Ken): Instead of enabling push constants, enable one of the inputs (PSIZ).
  v3 (Ken, Jason): Use LAYER instead, making Vulkan's emit_3dstate_sbe() happy.
  Cc: "17.3 17.2" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* anv: Fix assert about source attrs. (Kenneth Graunke, 2017-10-27, 1 file, -1/+1)
  Asserting slot >= 2 made sense when the URB read offset was always 1 (pair of slots). Commit 566a0c43f0b9fbf5106161471dd5061c7275f761 made it possible to read from the VUE header in slot 0, by adjusting the offset to be 0. So, this assert is now bogus. Use the one from GL.
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Drop URB entry output read handling in 3DSTATE_XS. (Kenneth Graunke, 2017-10-27, 1 file, -26/+0)
  Commit 566a0c43f0b9fbf5106161471dd5061c7275f761 started setting the 3DSTATE_SBE bit to override these values with the ones calculated there. So, they're dead. Stop setting them.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Delete unused brw_vs_prog_data::nr_attributes field. (Kenneth Graunke, 2017-10-27, 2 files, -2/+0)
  Reviewed-by: Matt Turner <[email protected]>
* intel/tools/disasm: correctly observe FILE *out parameter (Kevin Rogovin, 2017-10-26, 1 file, -2/+2)
  Signed-off-by: Kevin Rogovin <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: brw_validate_instructions to take const void* instead of void* (Kevin Rogovin, 2017-10-26, 2 files, -2/+2)
  The disassembler does not (and should not) modify the data.
  Signed-off-by: Kevin Rogovin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* anv/entrypoints: Dump useful data if mako throws an exception (Jason Ekstrand, 2017-10-25, 1 file, -5/+17)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Call nir_lower_system_values in brw_preprocess_nir (Jason Ekstrand, 2017-10-25, 2 files, -2/+2)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Call nir_lower_system_values after brw_preprocess_nir (Jason Ekstrand, 2017-10-25, 1 file, -1/+2)
  We currently have a bug where nir_lower_system_values gets called before nir_lower_var_copies, so it will miss any system value uses which come from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes this problem.
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Cc: [email protected]
* anv/pipeline: Drop nir_lower_clip_cull_distance_arrays (Jason Ekstrand, 2017-10-25, 1 file, -2/+0)
  We already handle it in brw_preprocess_nir.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Dump shader immediately after spirv_to_nir (Jason Ekstrand, 2017-10-25, 1 file, -0/+15)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/eu: Use EXECUTE_1 for JMPI (Jason Ekstrand, 2017-10-25, 2 files, -2/+1)
  The PRM says "The execution size must be 1." In 73137997e23ff6c11, the execution size was set to 1 when it should have been BRW_EXECUTE_1 (which maps to 0). Later, in dc2d3a7f5c217a7cee9, JMPI was used for line AA on gen6 and earlier, and we started manually stomping the execution size to BRW_EXECUTE_1 in the generator. This commit fixes the original bug and makes brw_JMPI just do the right thing.
  Reviewed-by: Matt Turner <[email protected]>
  Fixes: 73137997e23ff6c1145d036315d1a9ad96651281
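The bug above comes from writing a channel count where an encoded value was expected. The commit states that BRW_EXECUTE_1 maps to 0. This sketch assumes the field stores log2 of the channel count for the larger sizes too, and its enum names are hypothetical stand-ins for BRW_EXECUTE_*. Under that assumption, a raw value of 1 encodes two channels, not one.

```c
#include <assert.h>

/* Hypothetical stand-ins for BRW_EXECUTE_*; assumes log2 encoding. */
enum exec_size_enc {
   EXEC_1 = 0, EXEC_2 = 1, EXEC_4 = 2, EXEC_8 = 3, EXEC_16 = 4, EXEC_32 = 5,
};

/* Encode a power-of-two channel count (1..32) as log2(channels). */
static enum exec_size_enc encode_exec_size(unsigned channels)
{
   assert(channels >= 1 && channels <= 32 && (channels & (channels - 1)) == 0);
   unsigned lg = 0;
   while ((1u << lg) < channels)
      lg++;
   return (enum exec_size_enc)lg;
}

/* Decode the field back into a channel count. */
static unsigned decode_exec_size(enum exec_size_enc enc)
{
   return 1u << enc;
}
```

Writing the literal 1 into the field therefore requests a two-channel execution, which is why the generator previously had to stomp it back to BRW_EXECUTE_1.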
* i965/fs: Add brw_reg_type_from_bit_size utility method (Alejandro Piñeiro, 2017-10-25, 1 file, -5/+64)
  Returns the brw_type for a given ssa.bit_size and a reference type. So if bit_size is 64 and the reference type is BRW_REGISTER_TYPE_F, it returns BRW_REGISTER_TYPE_DF. Likewise, if bit_size is 32 and the reference type is BRW_REGISTER_TYPE_HF, it returns BRW_REGISTER_TYPE_F.
  v2 (Jason Ekstrand):
   - Use better unreachable() messages
   - Add Q types
  Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
  Signed-off-by: Alejandro Piñeiro <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
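The mapping described above keeps the kind of the reference type (float, signed, or unsigned) and changes only its width. The sketch below expresses that as a small table. The enum is a hypothetical stand-in for BRW_REGISTER_TYPE_*, and returning `T_INVALID` for unsupported sizes is this sketch's choice; the commit's v2 note mentions `unreachable()` messages instead.

```c
#include <assert.h>

/* Hypothetical stand-in for BRW_REGISTER_TYPE_*, laid out row by row:
 * float, signed int, unsigned int; each row is 16, 32, 64 bits. */
enum reg_type { T_HF, T_F, T_DF, T_W, T_D, T_Q, T_UW, T_UD, T_UQ, T_INVALID };

/* Return the type with the same kind as `ref` but the given bit size. */
static enum reg_type reg_type_from_bit_size(unsigned bit_size, enum reg_type ref)
{
   static const enum reg_type table[3][3] = {
      { T_HF, T_F,  T_DF },
      { T_W,  T_D,  T_Q  },
      { T_UW, T_UD, T_UQ },
   };
   int col = bit_size == 16 ? 0 : bit_size == 32 ? 1 : bit_size == 64 ? 2 : -1;
   if (col < 0 || ref == T_INVALID)
      return T_INVALID;
   return table[ref / 3][col]; /* ref / 3 selects the kind's row */
}
```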
* i965/fs/nir: Use the nir_src_bit_size helper (Jason Ekstrand, 2017-10-25, 1 file, -9/+3)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: Handle flag read/write aliasing in needs_src_copy (Jason Ekstrand, 2017-10-25, 1 file, -1/+3)
  In order to implement the ballot intrinsic, we do a MOV from the flag register to some GRF. If that GRF is used in a SEL, cmod propagation helpfully changes it into a MOV from the flag register with a cmod. This is perfectly valid, but when lower_simd_width comes along, it simply splits it into two instructions, both of which have conditional modifiers. This is a problem since we're reading the flag register.
  This commit makes us check whether flags_written() overlaps with the flag values that we read via the instruction source and, if there is any interference, forces us to emit a copy of the source.
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
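The check described above reduces to intersecting two masks: the flag bits the instruction writes and the flag bits its source reads. This sketch rests on a simplified model that is an assumption. It treats the flag registers as one bit array tracked at byte granularity, and `flag_byte_mask` and `needs_src_copy` here are hypothetical helpers, not the fs_visitor code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: bit k of the returned mask stands for flag byte k,
 * covering flag bits [first_bit, first_bit + num_bits). */
static unsigned flag_byte_mask(unsigned first_bit, unsigned num_bits)
{
   assert(num_bits > 0);
   unsigned first_byte = first_bit / 8;
   unsigned last_byte = (first_bit + num_bits - 1) / 8;
   unsigned mask = 0;
   for (unsigned b = first_byte; b <= last_byte; b++)
      mask |= 1u << b;
   return mask;
}

/* If the flags written by the instruction overlap the flags read through
 * its source, splitting it would let one half clobber what the other half
 * still needs to read, so the source must be copied first. */
static bool needs_src_copy(unsigned written_mask, unsigned src_read_mask)
{
   return (written_mask & src_read_mask) != 0;
}
```

In this model, a SIMD16 write to bits 0..15 and a read of bits 16..31 do not interfere, while a read of bits 8..23 shares a byte with the write and forces a copy.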
* intel/nir: Zero local index const struct for valgrind & nir_serialize (Jordan Justen, 2017-10-25, 1 file, -0/+1)
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>