* intel/compiler: validate conversions between 64-bit and 8-bit types
  Iago Toral Quiroga, 2019-04-18 (2 files, -0/+105)

  v2: - Add some tests with UB type too (Jason)
  v3: - consider implicit conversions from 2src instructions too (Curro).
  v4: - Do not check src1 type in single-source instructions (Curro).

  Reviewed-by: Jason Ekstrand <[email protected]> (v2)

* intel/compiler: validate region restrictions for half-float conversions
  Iago Toral Quiroga, 2019-04-18 (2 files, -1/+270)

  v2: - Consider implicit conversions in 2-src instructions too (Curro)
      - For restrictions that involve destination stride requirements, only
        validate them for Align1, since Align16 always requires packed data.
      - Skip the general rule for the dst/execution type size ratio for
        mixed float instructions on CHV and SKL+; these have their own set
        of rules that will be validated separately.
  v3 (Curro): - Do not check src1 type in single-source instructions.
              - Check restriction on src1.
              - Remove invalid test.

  Reviewed-by: Francisco Jerez <[email protected]>

* intel/compiler: also set F execution type for mixed float mode in BDW
  Iago Toral Quiroga, 2019-04-18 (1 file, -16/+20)

  The section 'Execution Data Types' of the 3D Media GPGPU volume, which
  describes execution types, is exactly the same in BDW and SKL+. Also, this
  section states that there is a single execution type, so it makes sense
  that this is the wider of the two floating point types involved in mixed
  float mode, which is what we do for SKL+ and CHV.

  v2: - Make sure we also account for the destination type in mixed mode
        (Curro).

  Acked-by: Francisco Jerez <[email protected]>

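As a rough illustration of the rule above (hypothetical helper and type
names, not the actual brw code): the single execution type of a mixed HF/F
instruction is simply the wider float type found among the destination and
sources, i.e. F as soon as any operand is F.

    enum reg_type { TYPE_HF = 0, TYPE_F = 1 };

    /* Any F operand widens the execution type of the whole instruction. */
    static enum reg_type
    mixed_float_exec_type(enum reg_type dst, enum reg_type src0,
                          enum reg_type src1)
    {
       return (dst == TYPE_F || src0 == TYPE_F || src1 == TYPE_F)
              ? TYPE_F : TYPE_HF;
    }
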
* intel/compiler: implement SIMD16 restrictions for mixed-float instructions
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+72)

  v2: f32to16/f16to32 can use a :W destination (Curro)
  v3: check destination is packed (Curro).

  Reviewed-by: Francisco Jerez <[email protected]>

* intel/compiler: skip MAD algebraic optimization for half-float or mixed mode
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+4)

  It is very likely that this optimization is never useful and we'll
  probably just end up removing it, so let's not bother adding more cases to
  it for now.

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: remove inexact algebraic optimizations from the backend
  Iago Toral Quiroga, 2019-04-18 (1 file, -38/+1)

  NIR already has these and correctly considers exact/inexact qualification,
  whereas the backend doesn't and can apply the optimizations where it
  shouldn't. This happened to be the case in a handful of Tomb Raider
  shaders, where NIR would skip the optimizations because of a precise
  qualification but the backend would then (incorrectly) apply them anyway.

  Besides this, considering that we are not emitting much math in the
  backend these days, it is unlikely that these optimizations are useful in
  general. A shader-db run confirms that MAD and LRP optimizations, for
  example, were only being triggered in cases where NIR would skip them due
  to precise requirements, so in the near future we might want to remove
  more of these, but for now we just remove the ones that are not completely
  correct.

  Suggested-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: fix cmod propagation for non 32-bit types
  Iago Toral Quiroga, 2019-04-18 (1 file, -4/+9)

  v2: - Do not propagate if the bit-size changes

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: add a brw_reg_type_is_integer helper
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+18)

  v2: - Fixed typo: meant BRW_REGISTER_TYPE_UB instead of
        BRW_REGISTER_TYPE_UV

  Reviewed-by: Jason Ekstrand <[email protected]> (v1)

* intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+26)

  There are no 8-bit immediates, so assert in that case. 16-bit immediates
  are replicated in each word of a 32-bit immediate, so we only need to
  check the lower 16 bits.

  v2: - Fix is_zero with half-float to consider -0 as well (Jason).
      - Fix is_negative_one for word type.

  Reviewed-by: Jason Ekstrand <[email protected]>

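A small self-contained sketch of the checks described above (hypothetical
helper names, not the actual Mesa predicates), assuming IEEE half-float
encoding: because a 16-bit immediate is replicated in both words of the
32-bit immediate slot, only the low word needs to be inspected, and the
half-float zero test has to accept both +0.0 (0x0000) and -0.0 (0x8000).

    #include <stdbool.h>
    #include <stdint.h>

    static bool
    imm16_is_zero_hf(uint32_t imm)
    {
       uint16_t lo = imm & 0xffff;
       return lo == 0x0000 || lo == 0x8000;   /* +0.0 or -0.0 in half-float */
    }

    static bool
    imm16_is_negative_one_w(uint32_t imm)
    {
       return (int16_t)(imm & 0xffff) == -1;  /* 0xffff as a signed word */
    }

    static bool
    imm16_is_negative_one_hf(uint32_t imm)
    {
       return (imm & 0xffff) == 0xbc00;       /* -1.0 in half-float */
    }
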
* intel/compiler: generalize the combine constants pass
  Iago Toral Quiroga, 2019-04-18 (1 file, -22/+212)

  At the very least we need it to handle HF too, since we are doing constant
  propagation for MAD and LRP, which relies on this pass to promote the
  immediates to GRF in the end, but ideally we want it to support even more
  types so we can take advantage of it to improve register pressure in some
  scenarios.

  v2 (Jason): - Support 64-bit types too.
              - Check if we need to set the half-float flag if the immediate
                already existed.
              - Multiply the size of the immediate by the width of the copy

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/eu: force stride of 2 on NULL register for Byte instructions
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+11)

  The hardware only allows a stride of 1 on a Byte destination for raw byte
  MOV instructions. This is required even when the destination is the NULL
  register. Rather than making sure that we emit a proper NULL:B destination
  every time we need one, just fix it at emission time.

  Reviewed-by: Jason Ekstrand <[email protected]>

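A minimal sketch of an emission-time fixup like the one described above,
with made-up structure and enum names standing in for brw_reg: if a
byte-typed destination is the null register and the instruction is not a
raw byte MOV, force a horizontal stride of 2.

    enum reg_file { FILE_GRF, FILE_ARF_NULL };
    enum reg_type { TYPE_B, TYPE_UB, TYPE_D };

    struct dst_reg {
       enum reg_file file;
       enum reg_type type;
       unsigned hstride;   /* 1 = packed, 2 = every other element */
    };

    static void
    fixup_null_byte_dst(struct dst_reg *dst, int is_raw_byte_mov)
    {
       if (dst->file == FILE_ARF_NULL &&
           (dst->type == TYPE_B || dst->type == TYPE_UB) &&
           !is_raw_byte_mov)
          dst->hstride = 2;
    }
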
* intel/compiler: ask for an integer type if requesting an 8-bit type
  Iago Toral Quiroga, 2019-04-18 (1 file, -2/+3)

  v2: - Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason)

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: rework conversion opcodes
  Iago Toral Quiroga, 2019-04-18 (1 file, -19/+22)

  Now that we have the regioning lowering pass we can just put all of these
  opcodes together in a single block and simply assert on the few cases of
  conversion instructions that are not supported in hardware and that should
  be lowered in brw_nir_lower_conversions.

  The only cases that we still handle separately are the conversions from
  float to half-float, since the rounding variants would need to fall
  through and we are already doing this for boolean opcodes (since they need
  to negate); plus there is also a large comment about these opcodes that we
  probably want to keep, so it is just easier to keep these separate.

  Suggested-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
  Iago Toral Quiroga, 2019-04-18 (1 file, -1/+1)

  In particular, we need the same lowerings we use for 16-bit integers.

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: split is_partial_write() into two variants
  Iago Toral Quiroga, 2019-04-18 (11 files, -30/+54)

  This function is used in two different scenarios that for 32-bit
  instructions are the same, but for 16-bit instructions are not.

  One scenario is that in which we are working at a SIMD8 register level and
  we need to know if a register is fully defined or written. This is useful,
  for example, in the context of liveness analysis or register allocation,
  where we work with units of registers.

  The other scenario is that in which we want to know if an instruction is
  writing a full scalar component or just some subset of it. This is useful,
  for example, in the context of some optimization passes like copy
  propagation.

  For 32-bit instructions (or larger), a SIMD8 dispatch will always write at
  least a full SIMD8 register (32B) if the write is not partial. The
  function is_partial_write() checks this to determine if we have a partial
  write. However, when we deal with 16-bit instructions, that logic disables
  some optimizations that should be safe. For example, a SIMD8 16-bit MOV
  will only update half of a SIMD register, but it is still a complete write
  of the variable for a SIMD8 dispatch, so we should not prevent copy
  propagation in this scenario because we don't write all 32 bytes in the
  SIMD register or because the write starts at offset 16B (where we pack
  components Y or W of 16-bit vectors).

  This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
  instructions, which lose a number of optimizations because of this, the
  most important of which is copy propagation.

  This patch splits is_partial_write() into is_partial_reg_write(), which
  represents the current is_partial_write(), useful for things like liveness
  analysis, and is_partial_var_write(), which considers the dispatch size to
  check if we are writing a full variable (rather than a full register) to
  decide if the write is partial or not, which is what we really want in
  many optimization passes.

  Then the patch goes on and rewrites all uses of is_partial_write() to use
  one or the other version. Specifically, we use is_partial_var_write() in
  the following places: copy propagation, cmod propagation, common
  subexpression elimination, saturate propagation and sel peephole.

  Notice that the semantics of is_partial_var_write() exactly match the
  current implementation of is_partial_write() for anything that is 32-bit
  or larger, so no changes are expected for 32-bit instructions.

  Testing against ~5000 tests involving 16-bit instructions in CTS produced
  the following changes in instruction counts:

            | Patched  |  Master  |    %    |
  ===========================================
   SIMD8    | 621,900  | 706,721  | -12.00% |
  ===========================================
   SIMD16   |  93,252  |  93,252  |   0.00% |
  ===========================================

  As expected, the change only affects SIMD8 dispatches.

  Reviewed-by: Topi Pohjolainen <[email protected]>

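The distinction is easier to see in code. The sketch below uses simplified,
made-up instruction fields (the real predicates live on the backend
instruction and inspect the actual register regions); the point is that the
register-level check compares the write against REG_SIZE, while the
variable-level check compares it against the footprint of one variable for
the current dispatch width.

    #include <stdbool.h>

    #define REG_SIZE 32  /* bytes in one SIMD8 hardware register */

    struct inst {
       bool predicated;          /* predicated writes are always partial */
       unsigned dst_offset;      /* byte offset of the destination       */
       unsigned dst_stride;      /* destination stride, 1 == packed      */
       unsigned exec_size;       /* channels written, e.g. 8 or 16       */
       unsigned dst_type_size;   /* bytes per channel, e.g. 2 for HF     */
    };

    /* Partial with respect to whole hardware registers: what liveness
     * analysis and register allocation care about. */
    static bool
    is_partial_reg_write(const struct inst *i)
    {
       return i->predicated ||
              i->dst_stride != 1 ||
              (i->exec_size * i->dst_type_size) % REG_SIZE != 0 ||
              i->dst_offset % REG_SIZE != 0;
    }

    /* Partial with respect to the variable being written: a SIMD8 16-bit
     * MOV fills only half a register but still writes every channel of the
     * variable, so passes like copy propagation can treat it as a full
     * write. */
    static bool
    is_partial_var_write(const struct inst *i, unsigned dispatch_width)
    {
       unsigned var_size = dispatch_width * i->dst_type_size;
       return i->predicated ||
              i->dst_stride != 1 ||
              (i->exec_size * i->dst_type_size) % var_size != 0 ||
              i->dst_offset % var_size != 0;
    }
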
* intel/compiler: workaround for SIMD8 half-float MAD in gen8
  Iago Toral Quiroga, 2019-04-18 (1 file, -11/+28)

  Empirical testing shows that gen8 has a bug where MAD instructions with a
  half-float source starting at a non-zero offset fail to execute properly.

  This scenario usually happened in SIMD8 executions, where we used to pack
  vector components Y and W in the second half of SIMD registers (therefore,
  with a 16B offset). It looks like we are not currently doing this any
  more, but this would handle the situation properly if we ever happen to
  produce code like this again.

  v2 (Jason): - Move this workaround to the lower_regioning pass as an
                additional case to has_invalid_src_region()
              - Do not apply the workaround if the stride of the source
                operand is 0; testing suggests the problem doesn't exist in
                that case.
  v3 (Jason): - We want offset % REG_SIZE > 0, not just offset > 0
              - Use a helper to compute the offset

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)

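A hedged sketch of the check described in v2/v3 above (hypothetical
parameter names; the real check is an extra case in
has_invalid_src_region()): the source needs fixing when it is a half-float
operand of a 3-source float instruction on gen8, has a non-zero stride, and
does not start at a register boundary.

    #include <stdbool.h>

    #define REG_SIZE 32

    static bool
    needs_gen8_hf_mad_workaround(unsigned gen, bool is_3src_float_op,
                                 unsigned src_type_size, unsigned src_stride,
                                 unsigned src_offset_bytes)
    {
       return gen == 8 && is_3src_float_op && src_type_size == 2 &&
              src_stride != 0 &&
              src_offset_bytes % REG_SIZE > 0;
    }
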
* intel/compiler: fix ddy for half-float in Broadwell
  Iago Toral Quiroga, 2019-04-18 (1 file, -2/+15)

  Broadwell has restrictions on Align16 half-float that make the Align16
  implementation of this invalid for that platform. Use the gen11 path for
  this instead, which uses Align1 mode.

  The restriction is not present in Cherryview, gen9 or gen10, where the
  Align16 implementation seems to work just fine.

  v2: - Rework the comment in the code, move the PRM citation from the
        commit message to the comment in the code (Matt)
      - Cherryview isn't affected, only Broadwell (Matt)

  Reviewed-by: Jason Ekstrand <[email protected]> (v1)
  Reviewed-by: Matt Turner <[email protected]>

* intel/compiler: fix ddx and ddy for 16-bit float
  Iago Toral Quiroga, 2019-04-18 (1 file, -19/+18)

  We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector
  components in a single SIMD register, so, for example, component Y of a
  16-bit vec2 starts at byte offset 16B. This means that when we compute the
  offset of the elements to be differentiated we should not stomp whatever
  base offset we have, but instead add to it.

  v2: - Use byte_offset() helper (Jason)
      - Merge the fix for SIMD8: using byte_offset() fixes that too.

  Reviewed-by: Jason Ekstrand <[email protected]> (v1)
  Reviewed-by: Matt Turner <[email protected]>

* intel/compiler: set correct precision fields for 3-source float instructions
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+16)

  Source0 and Destination extract the floating-point precision automatically
  from the SrcType and DstType instruction fields respectively when they are
  set to types :F or :HF. For Source1 and Source2 operands, we use the new
  1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
  means half-precision. Since we always use the type of the destination for
  all operands when we emit 3-source instructions, we only need to set
  Src1Type and Src2Type to 1 when we are emitting a half-precision
  instruction.

  v2: - Set the bit separately for each source based on its type so we can
        do mixed floating-point mode in the future (Topi).
  v3: - Use regular citation style for the comment referencing the PRM
        (Matt).
      - Decided not to add asserts in the emission code to check that only
        mixed HF/F types are used, since such checks would break negative
        tests for brw_eu_validate.c (Matt)

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>

* intel/compiler: allow half-float on 3-source instructions since gen8
  Iago Toral Quiroga, 2019-04-18 (1 file, -1/+2)

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>

* intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits
  Iago Toral Quiroga, 2019-04-18 (1 file, -1/+4)

  We are now using these bits, so don't assert that they are not set. In
  gen8, if these bits are set, compaction is not possible. On gen9 and CHV
  platforms set_3src_control_index() checks these bits (and others) against
  a table to validate if the particular bit combination is eligible for
  compaction or not.

  v2: - Add more detail in the commit message explaining the situation for
        SKL+ and CHV (Jason)

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>

* intel/compiler: add new half-float register type for 3-src instructions
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+4)

  This is available since gen8.

  v2: restore previously existing assertion.
  v3: don't use separate tables for gen7 and gen8, just assert that we don't
      use half-float before gen8 (Matt)

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: add instruction setters for Src1Type and Src2Type.
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+2)

  The original SrcType is a 3-bit field that takes a subset of the types
  supported by the hardware for 3-source instructions. Since gen8, when the
  half-float type was added, 3-source floating point operations can use
  mixed precision mode, where not all the operands have the same
  floating-point precision. While the precision for the first operand is
  taken from the type in SrcType, the bits in Src1Type (bit 36) and Src2Type
  (bit 35) define the precision for the other operands (0: normal precision,
  1: half precision).

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>

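For illustration, a sketch of what such setters do at the encoding level,
using a made-up two-word instruction container instead of brw_inst (the
real setters go through the generic bit-field helpers): Src1Type lives at
bit 36 and Src2Type at bit 35 of the 3-source instruction, with 0 meaning
full precision and 1 meaning half precision.

    #include <stdint.h>

    /* A 3-source instruction is 128 bits; model it as two 64-bit words. */
    struct inst3src { uint64_t dw[2]; };

    static void
    set_bit(struct inst3src *inst, unsigned bit, uint64_t value)
    {
       inst->dw[bit / 64] &= ~(1ull << (bit % 64));
       inst->dw[bit / 64] |= (value & 1ull) << (bit % 64);
    }

    /* 0: normal (F) precision, 1: half (HF) precision. */
    static void set_3src_src1_half(struct inst3src *i, uint64_t hf) { set_bit(i, 36, hf); }
    static void set_3src_src2_half(struct inst3src *i, uint64_t hf) { set_bit(i, 35, hf); }
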
* intel/compiler: drop unnecessary temporary from 32-bit fsign implementation
  Iago Toral Quiroga, 2019-04-18 (1 file, -3/+2)

  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: implement 16-bit fsign
  Iago Toral Quiroga, 2019-04-18 (1 file, -1/+16)

  v2: - make 16-bit be its own separate case (Jason)
  v3: - Drop the result_int temporary (Jason)

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: handle extended math restrictions for half-float
  Iago Toral Quiroga, 2019-04-18 (3 files, -12/+34)

  Extended math with half-float operands is only supported since gen9, but
  it is limited to SIMD8. In gen8 we lower it to 32-bit.

  v2: squashed together the following patches (Jason):
      - intel/compiler: allow extended math functions with HF operands
      - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
      - intel/compiler: extended Math is limited to SIMD8 on half-float

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]> (allow extended math
  functions with HF operands, extended Math is limited to SIMD8 on
  half-float)

* intel/compiler: lower some 16-bit float operations to 32-bit
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+5)

  The hardware doesn't support half-float for these.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: assert restrictions on conversions to half-float
  Iago Toral Quiroga, 2019-04-18 (1 file, -2/+3)

  There are some hardware restrictions that brw_nir_lower_conversions should
  have taken care of before we get here.

  v2: - rebased on top of regioning lowering pass

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: handle b2i/b2f with other integer conversion opcodes
  Iago Toral Quiroga, 2019-04-18 (1 file, -8/+8)

  Since we handle booleans as integers this makes more sense.

  v2: - rebased to incorporate new boolean conversion opcodes
  v3: - rebased on top of the regioning lowering pass

  Reviewed-by: Jason Ekstrand <[email protected]> (v1)
  Reviewed-by: Topi Pohjolainen <[email protected]> (v2)

* intel/compiler: split float to 64-bit opcodes from int to 64-bit
  Iago Toral Quiroga, 2019-04-18 (1 file, -0/+7)

  Going forward, having these split is a bit more convenient since these two
  groups have different restrictions.

  v2: - Rebased on top of new regioning lowering pass.

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
  Reviewed-by: Jason Ekstrand <[email protected]>

* intel/compiler: add a NIR pass to lower conversions
  Iago Toral Quiroga, 2019-04-18 (5 files, -0/+175)

  Some conversions are not directly supported in hardware and need to be
  split into two conversion instructions going through an intermediate type.
  Doing this at the NIR level simplifies things a bit in the backend.

  v2: - Consider fp16 rounding conversion opcodes
      - Properly handle swizzles on conversion sources.
  v3: - Run the pass earlier, right after nir_opt_algebraic_late (Jason)
      - NIR alu output types already have the bit-size (Jason)
      - Use 'is_conversion' to identify conversion operations (Jason)
  v4: - Be careful about the intermediate types we use so we don't lose
        range and avoid incorrect rounding semantics (Jason)

  Reviewed-by: Topi Pohjolainen <[email protected]> (v1)
  Reviewed-by: Jason Ekstrand <[email protected]>

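A conceptual sketch of the splitting such a pass performs, with hypothetical
type descriptors (the actual opcode handling and intermediate-type selection
live in brw_nir_lower_conversions): this sketch assumes that conversions
between 64-bit and 8/16-bit types are the ones that need a 32-bit
intermediate, and naively keeps the destination's base type for that
intermediate; as the v4 note above says, the real pass has to pick the
intermediate so that no range is lost and rounding happens correctly.

    #include <stdbool.h>

    enum base_type { TYPE_INT, TYPE_UINT, TYPE_FLOAT };

    struct alu_type {
       enum base_type base;
       unsigned bits;
    };

    /* Assumed restriction for this sketch: no direct conversion between
     * 64-bit and 8/16-bit types, so those get split in two. */
    static bool
    conversion_needs_split(struct alu_type src, struct alu_type dst)
    {
       return (src.bits == 64 && dst.bits < 32) ||
              (dst.bits == 64 && src.bits < 32);
    }

    /* Naive intermediate: destination base type widened to 32 bits. */
    static struct alu_type
    conversion_intermediate_type(struct alu_type src, struct alu_type dst)
    {
       (void)src;
       struct alu_type t = { dst.base, 32 };
       return t;
    }
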
* Add no_aos_sampling GALLIVM_PERF option
  Dominik Drees, 2019-04-17 (3 files, -4/+11)

  This forces using general sampling and should improve precision and
  performance in some cases.

* ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+
  Samuel Pitoiset, 2019-04-17 (1 file, -14/+34)

  This change requires LLVM r356465.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>

* ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+
  Samuel Pitoiset, 2019-04-17 (1 file, -12/+38)

  This change requires LLVM r356465.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>

* ac: add support for more types with struct/raw LLVM intrinsics
  Samuel Pitoiset, 2019-04-17 (1 file, -20/+26)

  LLVM 9+ now supports 8-bit and 16-bit types.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>

* radv: add VK_KHR_shader_atomic_int64 but disable it for now
  Samuel Pitoiset, 2019-04-17 (3 files, -0/+12)

  No support for 64-bit compare&swap atomic operations.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* ac/nir: add 64-bit SSBO atomic operations support
  Samuel Pitoiset, 2019-04-17 (1 file, -3/+7)

  Except compare&swap, which is still buggy.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap
  Samuel Pitoiset, 2019-04-17 (1 file, -13/+18)

  Use the raw version (i.e. IDXEN=0) because vindex is unused. Use the old
  intrinsic for compare&swap because the new one hangs the GPU for some
  reason.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* gallivm: fix saturated signed add / sub with llvm 9
  Roland Scheidegger, 2019-04-17 (1 file, -0/+14)

  llvm 8 removed the saturated unsigned add / sub x86 sse2 intrinsics, and
  now llvm 9 removed the signed versions as well - they were proposed for
  removal earlier, but the pattern to recognize those was very complex, so
  it wasn't done then.

  However, instead of these arch-specific intrinsics, there are now
  arch-independent intrinsics for saturated add / sub, both for signed and
  unsigned, so use these. They should have only advantages (work with
  arbitrary vector sizes, optimal code for all archs), although I don't know
  how well they work in practice for other archs (at least for x86 they do
  the right thing).

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454
  Reviewed-by: Brian Paul <[email protected]>

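The arch-independent intrinsics referred to above are llvm.sadd.sat /
llvm.ssub.sat (plus the uadd/usub variants). As a plain-C reference for the
per-element semantics gallivm now relies on:

    #include <stdint.h>

    static int16_t
    sadd_sat_i16(int16_t a, int16_t b)
    {
       int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
       if (sum > INT16_MAX) return INT16_MAX;
       if (sum < INT16_MIN) return INT16_MIN;
       return (int16_t)sum;
    }

    static int16_t
    ssub_sat_i16(int16_t a, int16_t b)
    {
       int32_t diff = (int32_t)a - (int32_t)b;
       if (diff > INT16_MAX) return INT16_MAX;
       if (diff < INT16_MIN) return INT16_MIN;
       return (int16_t)diff;
    }
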
* meson: Add dependency on genxml to anvil genfiles
  Juan A. Suarez Romero, 2019-04-17 (1 file, -1/+1)

  This fixes a race condition where anv_gen_files are executed before the
  genxml files are generated, which causes a build failure.

  v2: add dependency on idep_genxml (Lionel)

  Fixes: d1992255bb29054fa51763376d125183a9f602f ("meson: Add build Intel
  "anv" vulkan driver")
  Reviewed-by: Lionel Landwerlin <[email protected]>

* intel/perf: constify accumulator parameter
  Lionel Landwerlin, 2019-04-17 (2 files, -3/+3)

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* intel/perf: drop counter size field
  Lionel Landwerlin, 2019-04-17 (4 files, -9/+26)

  We can deduce the size from another field, so let's just save some space.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

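One way this can work, sketched with made-up enum values (the actual
counter descriptions live in intel/perf): the size is a pure function of
the counter's data type, so storing it separately is redundant.

    #include <stddef.h>
    #include <stdint.h>

    enum counter_data_type {
       COUNTER_TYPE_BOOL32,
       COUNTER_TYPE_UINT32,
       COUNTER_TYPE_FLOAT,
       COUNTER_TYPE_UINT64,
       COUNTER_TYPE_DOUBLE,
    };

    static size_t
    counter_size(enum counter_data_type t)
    {
       switch (t) {
       case COUNTER_TYPE_BOOL32:
       case COUNTER_TYPE_UINT32:
       case COUNTER_TYPE_FLOAT:
          return sizeof(uint32_t);
       case COUNTER_TYPE_UINT64:
          return sizeof(uint64_t);
       case COUNTER_TYPE_DOUBLE:
          return sizeof(double);
       }
       return 0;
    }
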
* i965: perf: add mdapi pipeline statistics queries on gen10/11
  Lionel Landwerlin, 2019-04-17 (2 files, -1/+10)

  The Gen10+ expected format adds an additional counter which we can't
  disclose yet. We can still make the size of the expected query result
  match.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* intel/perf: stub gen10/11 missing definitions
  Lionel Landwerlin, 2019-04-17 (1 file, -0/+4)

  Reviewed-by: Mark Janes <[email protected]>

* i965: move mdapi guid into intel/perf
  Lionel Landwerlin, 2019-04-17 (2 files, -2/+4)

  One more thing we want to share between the different APIs.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* i965: move mdapi result data format to intel/perf
  Lionel Landwerlin, 2019-04-17 (7 files, -98/+138)

  We want to reuse this in Anv.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* i965: move brw_timebase_scale to device info
  Lionel Landwerlin, 2019-04-17 (6 files, -19/+22)

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* i965: move OA accumulation code to intel/perf
  Lionel Landwerlin, 2019-04-17 (5 files, -199/+229)

  We'll want to reuse this in our Vulkan extension.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* i965: move mdapi data structure to intel/perf
  Lionel Landwerlin, 2019-04-17 (3 files, -97/+128)

  We'll want to reuse those structures later on.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>

* i965: extract performance query metrics
  Lionel Landwerlin, 2019-04-17 (31 files, -866/+1098)

  We would like to reuse performance query metrics in other APIs, so make
  the query code that processes raw counters into human-readable values
  API-agnostic.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Mark Janes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>