summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bitsIago Toral Quiroga2019-04-181-1/+4
| | | | | | | | | | | | | | | We are now using these bits, so don't assert that they are not set. In gen8, if these bits are set compaction is not possible. On gen9 and CHV platforms set_3src_control_index() checks these bits (and others) against a table to validate if the particular bit combination is eligible for compaction or not. v2 - Add more detail in the commit message explaining the situation for SKL+ and CHV (Jason) Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: add new half-float register type for 3-src instructionsIago Toral Quiroga2019-04-181-0/+4
| | | | | | | | | | | | This is available since gen8. v2: restore previously existing assertion. v3: don't use separate tables for gen7 and gen8, just assert that we don't use half-float before gen8 (Matt) Reviewed-by: Topi Pohjolainen <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: add instruction setters for Src1Type and Src2Type.Iago Toral Quiroga2019-04-181-0/+2
| | | | | | | | | | | | | | | The original SrcType is a 3-bit field that takes a subset of the types supported for the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use use mixed precision mode, where not all the operands have the same floating-point precision. While the precision for the first operand is taken from the type in SrcType, the bits in Src1Type (bit 36) and Src2Type (bit 35) define the precision for the other operands (0: normal precision, 1: half precision). Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* intel/compiler: drop unnecessary temporary from 32-bit fsign implementationIago Toral Quiroga2019-04-181-3/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: implement 16-bit fsignIago Toral Quiroga2019-04-181-1/+16
| | | | | | | | | | | v2: - make 16-bit be its own separate case (Jason) v3: - Drop the result_int temporary (Jason) Reviewed-by: Topi Pohjolainen <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: handle extended math restrictions for half-floatIago Toral Quiroga2019-04-183-12/+34
| | | | | | | | | | | | | | | Extended math with half-float operands is only supported since gen9, but it is limited to SIMD8. In gen8 we lower it to 32-bit. v2: quashed together the following patches (Jason): - intel/compiler: allow extended math functions with HF operands - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9 - intel/compiler: extended Math is limited to SIMD8 on half-float Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> (allow extended math functions with HF operands, extended Math is limited to SIMD8 on half-float)
* intel/compiler: lower some 16-bit float operations to 32-bitIago Toral Quiroga2019-04-181-0/+5
| | | | | | | The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: assert restrictions on conversions to half-floatIago Toral Quiroga2019-04-181-2/+3
| | | | | | | | | | | There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: handle b2i/b2f with other integer conversion opcodesIago Toral Quiroga2019-04-181-8/+8
| | | | | | | | | | | | | Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand <[email protected]> (v1) Reviewed-by: Topi Pohjolainen <[email protected]> (v2)
* intel/compiler: split float to 64-bit opcodes from int to 64-bitIago Toral Quiroga2019-04-181-0/+7
| | | | | | | | | | | Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: add a NIR pass to lower conversionsIago Toral Quiroga2019-04-185-0/+175
| | | | | | | | | | | | | | | | | | | | | | Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles on conversion sources. v3 - Run the pass earlier, right after nir_opt_algebraic_late (Jason) - NIR alu output types already have the bit-size (Jason) - Use 'is_conversion' to identify conversion operations (Jason) v4: - Be careful about the intermediate types we use so we don't lose range and avoid incorrect rounding semantics (Jason) Reviewed-by: Topi Pohjolainen <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
* Add no_aos_sampling GALLIVM_PERF optionDominik Drees2019-04-173-4/+11
| | | | | This forces using general sampling and should improve precision and performance in some cases.
* ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+Samuel Pitoiset2019-04-171-14/+34
| | | | | | | | This changes requires LLVM r356465. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+Samuel Pitoiset2019-04-171-12/+38
| | | | | | | | This changes requires LLVM r356465. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add support for more types with struct/raw LLVM intrinsicsSamuel Pitoiset2019-04-171-20/+26
| | | | | | | | LLVM 9+ now supports 8-bit and 16-bit types. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: add VK_KHR_shader_atomic_int64 but disable it for nowSamuel Pitoiset2019-04-173-0/+12
| | | | | | | No support for 64-bit compare&swap atomic operations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add 64-bit SSBO atomic operations supportSamuel Pitoiset2019-04-171-3/+7
| | | | | | | | Except compare&swap which is still buggy. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswapSamuel Pitoiset2019-04-171-13/+18
| | | | | | | | | | Use the raw version (ie. IDXEN=0) because vindex is unused. Use the old intrinsic for compare&swap because the new one hangs the GPU for some reasons. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gallivm: fix saturated signed add / sub with llvm 9Roland Scheidegger2019-04-171-0/+14
| | | | | | | | | | | | | | | | llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specific intrinsics, there's now arch-independent intrinsics for saturated add / sub, both for signed and unsigned, so use these. They should have only advantages (work with arbitrary vector sizes, optimal code for all archs), although I don't know how well they work in practice for other archs (at least for x86 they do the right thing). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454 Reviewed-by: Brian Paul <[email protected]>
* meson: Add dependency on genxml to anvil genfilesJuan A. Suarez Romero2019-04-171-1/+1
| | | | | | | | | | | | This fixes a race condition where anv_gen_files are executed before genxml files, which causes a build failure v2: add dependency on idep_genxml (Lionel) Fixes: d1992255bb29054fa51763376d125183a9f602f ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/perf: constify accumlator parameterLionel Landwerlin2019-04-172-3/+3
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* intel/perf: drop counter size fieldLionel Landwerlin2019-04-174-9/+26
| | | | | | | We can deduct the size from another field, let's just save some space. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: perf: add mdapi pipeline statistics queries on gen10/11Lionel Landwerlin2019-04-172-1/+10
| | | | | | | | | The Gen10+ expected format adds an additional counter which we can't disclose yet. We can still make the size of the expected query result match. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* intel/perf: stub gen10/11 missing definitionsLionel Landwerlin2019-04-171-0/+4
| | | | Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi guid into intel/perfLionel Landwerlin2019-04-172-2/+4
| | | | | | | One more thing we want to share between the different APIs. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi result data format to intel/perfLionel Landwerlin2019-04-177-98/+138
| | | | | | | We want to reuse this in Anv. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move brw_timebase_scale to device infoLionel Landwerlin2019-04-176-19/+22
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move OA accumulation code to intel/perfLionel Landwerlin2019-04-175-199/+229
| | | | | | | We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: move mdapi data structure to intel/perfLionel Landwerlin2019-04-173-97/+128
| | | | | | | We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: extract performance query metricsLionel Landwerlin2019-04-1731-866/+1098
| | | | | | | | | | We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human readable values API agnostic. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: store device revision in gen_device_infoLionel Landwerlin2019-04-174-6/+5
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27Topi Pohjolainen2019-04-171-7/+17
| | | | | | | | | Similarly to 1cc17fb731466c68586915acbb916586457b19bc Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* virgl: document potentially failing blitErik Faye-Lund2019-04-171-0/+6
| | | | | | | | | This blit can fail, but this is not new; in the old version we didn't even try to blit in this case. So let's just document the limitation for now, and leave this for another day. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: do color-conversion during when mapping transferErik Faye-Lund2019-04-171-10/+70
| | | | | | | | | | | | | When running on OpenGL ES, we can't just map any format for reading, because of limitations on glReadPixels. So let's fall back to the blit code-path, and translate the pixels to the correct format in the end. This fixes the remaining failures of KHR-GL32.packed_pixels.* apart from the sRGB tests. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: only blit if resource is readErik Faye-Lund2019-04-171-2/+5
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: get readback-formats from hostErik Faye-Lund2019-04-173-0/+44
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* gallium/util: support translating between uint and sint formatsErik Faye-Lund2019-04-171-0/+62
| | | | | | | | Without this, we can't for instance convert between r8_sint and r8g8b8a8_sint. But that's pretty useful, so let's support it as well. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: make sure bind is set for non-buffersErik Faye-Lund2019-04-171-0/+3
| | | | | | | Otherwise, virglrenderer will reject the resource. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: support write-back with staged transfersErik Faye-Lund2019-04-172-22/+49
| | | | | | | | | | | | | | | | We currently don't support writing to resources that uses a temporary staging-resource to resolve the pixels. If a write-bit was set, we forgot to perform a blit back to the old resource, followed by trying to update the wrong resource, which lacks backing-storage. The end-result would be that nothing useful happened. This approach also fixes a few smaller bugs, like using the wrong box (without x y and z zeroed out), which means a partial update of a multisampled texture could result in the wrong part of the texture being updated. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: use pipe_box for blit dst-rectErik Faye-Lund2019-04-171-5/+12
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: rewrite core of virgl_texture_transfer_mapErik Faye-Lund2019-04-171-36/+58
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: return error if allocating resolve_tmp failsErik Faye-Lund2019-04-171-0/+4
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: wait for the right resourceErik Faye-Lund2019-04-171-1/+1
| | | | | | | | In case we're resolving, we need to wait for the resolved resource instead of the original one. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: check for readback on correct resourceErik Faye-Lund2019-04-171-1/+1
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: make unmap queuing a bit more straight-forwardErik Faye-Lund2019-04-171-5/+7
| | | | | | | | | It's hard to read the code that decides if we want to queue up an unmap or destroy the transfer right away. So let's make it a bit simpler, by setting a bool in case we want to queue it. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: simplify virgl_texture_transfer_unmap logicErik Faye-Lund2019-04-171-13/+9
| | | | | | | | There's no reason to keep an extra indentation level here, let's merge the two if-conditions. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: track full virgl_resource instead of just virgl_hw_resErik Faye-Lund2019-04-171-5/+5
| | | | | Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: tmp_resource -> templErik Faye-Lund2019-04-171-4/+3
| | | | | | | | This isn't the temporary resource itself, it's the template that we'll create the resource from. So let's name it appropriately. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove pointless transfer-counterErik Faye-Lund2019-04-174-4/+2
| | | | | | | This is only written to, never read. Let's just get rid of it. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi/nir: fix scanning of bindless imagesTimothy Arceri2019-04-171-38/+37
| | | | Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support")