summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* android: add vulkan build for intelTapani Pälli2017-02-012-0/+227
| | | | | | | | | | | | | | | | | | | | | | | | | fixes to issues spotted by Emil Velikov: - set ANV_TIMESTAMP corretly - fix typo with VULKAN_GEM_FILES v2: update to use Makefile.sources under vulkan instead of having own v3: update to changes to generate from vk.xml (commit c7fc310) v4: remove 'hw' relative path cleanups, remove unnecessary cruft review from Emil Velikov: - move to vulkan folder - remove timestamp gen, no longer necessary - more cleanups Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: use same is_color_attachment trick to discern error casesIlia Mirkin2017-01-311-3/+11
| | | | | | | | | | | All the other calls to retrieve the attachment have been covered except this one - return the proper error for attachment points that are valid enums but out of bound for the driver. Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* anv: Improve flushing around STATE_BASE_ADDRESSJason Ekstrand2017-01-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS actually is. We know from experimentation that we need to flush the render cache prior to emitting STATE_BASE_ADDRESS and invalidate the texture cache afterwards. The only thing the PRM says is that, on gen8+ we're supposed to invalidate the state cache after STATE_BASE_ADDRESS but experimentation has indicated that doing so does nothing whatsoever. Since we don't really know, let's do just a bit more flushing in the hopes that this won't be a problem again. In particular: 1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't really know whether or not it's pipelined. 2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS is a compute shader. 3) Invalidate the state and constant caches after STATE_BASE_ADDRESS because the state may be getting cached there (we don't really know). Reported-by: Mark Janes <[email protected]> Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* anv: Flush render cache before STATE_BASE_ADDRESS on gen7Jason Ekstrand2017-01-311-3/+0
| | | | | | | | | | | We had no good reason for *not* doing this on gen7 before but we didn't know it was needed. Recently, when trying update to Vulkan CTS version 1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear to be STATE_BASE_ADDRESS related. This commit fixes them. Reported-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* isl/formats: Only advertise sampling for A4B4G4R4 on BroadwellJason Ekstrand2017-01-311-2/+3
| | | | | | | | | This causes hangs on Broadwell if you try to render to it. I have no idea how we managed to not hit this earlier. Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* intel/blorp: Handle clearing of A4B4G4R4 on all platformsJason Ekstrand2017-01-311-0/+23
| | | | | | Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* radeonsi: Fix build on LLVM < 3.9 v2Tom Stellard2017-02-011-2/+4
| | | | | | | | | This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb v2: - Use preprocessor macro Tested-by: Mark Janes <[email protected]>
* radv: Enable Float64 support.Bas Nieuwenhuizen2017-02-012-1/+2
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO loads.Bas Nieuwenhuizen2017-02-011-26/+49
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 UBO loads.Bas Nieuwenhuizen2017-02-011-2/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 load/store var.Bas Nieuwenhuizen2017-02-011-53/+48
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO stores.Bas Nieuwenhuizen2017-02-011-3/+14
| | | | | | | No f16 support as I'm not quite sure about alignment yet. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Add core Float64 support.Bas Nieuwenhuizen2017-02-011-44/+129
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* vc4: Enable Neon on arm android buildsRob Herring2017-01-311-0/+2
| | | | | Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: fix arm64 build with NeonRob Herring2017-01-311-1/+1
| | | | | | | | The addition of Neon assembly breaks on arm64 builds because the assembly syntax is different. For now, restrict Neon to ARMv7 builds. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Make Neon inline assembly clang compatibleRob Herring2017-01-311-35/+35
| | | | | | | | | | | | clang throws an error on "%r2" and similar. I couldn't find any documentation on what "%r?" is supposed to mean and I've never seen any use like that as far as I remember. The parameter is supposed to be cpu_stride and just %2/%3 should be sufficient. There's no need for trailing ";" either, so remove those, too. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: Set datalayout on the llvm moduleTom Stellard2017-01-311-0/+6
| | | | | | | | | | This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions. Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling ↵Francisco Jerez2017-01-311-22/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of zero/infinity. See "glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity." for the rationale, but note that the instruction count benefit discussed there is somewhat less important for the SPIRV implementation, because the current code already emitted no control flow instructions -- Still this saves us one hardware instruction per scalar component on Intel SKL hardware. Fixes the following Vulkan CTS tests on Intel hardware: dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4 Note that most of the test-cases above expect IEEE-compliant handling of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so except for the last two the test-cases above weren't expected to pass yet. The reason they do is that the i965 back-end implementation of the NIR fmin and fmax instructions is not quite GLSL-compliant (it complies with IEEE 754 recommendations though), because fmin/fmax of a NaN and a non-NaN argument currently always return the non-NaN argument, which causes atan() to flush NaN to one and return the expected value. The front-end should probably not be relying on this behavior for correctness though because other back-ends are likely to behave differently -- A follow-up patch will handle the atan2(±∞, ±∞) corner cases explicitly. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Rewrite atan2 implementation to fix accuracy and handling of ↵Francisco Jerez2017-01-311-36/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zero/infinity. This addresses several issues of the current atan2 implementation: - Negative zero (and negative denorms which end up getting flushed to zero) isn't handled correctly by the current implementation. The reason is that it does 'y >= 0' and 'x < 0' comparisons to decide on which side of the branch cut the argument is, which causes us to return incorrect results (off by up to 2π) for very small negative values. - There is a serious precision problem for x values of large enough magnitude introduced by the floating point division operation being implemented as a mul+rcp sequence. This can lead to the quotient getting flushed to zero in some cases introducing an error of over 8e6 ULP in the result -- Or in the most catastrophic case will cause us to return NaN instead of the correct value ±π/2 for y=±∞ and x very large. We can fix this easily by scaling down both arguments when the absolute value of the denominator goes above certain threshold. The error of this atan2 implementation remains below 25 ULP in most of its domain except for a neighborhood of y=0 where it reaches a maximum error of about 180 ULP. - It emits a bunch of instructions including no less than three if-else branches per scalar component that don't seem to get optimized out later on. This implementation uses about 13% less instructions on Intel SKL hardware and doesn't emit any control flow instructions. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix nir_op_fsign of absolute value.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | This does point at the front-end emitting silly code that could have been optimized out, but the current fsign implementation would emit bogus IR if abs was set for the argument (because it would apply the abs modifier on an unsigned integer type), and we shouldn't rely on the upper layer's optimization passes for correctness. Reviewed-by: Ian Romanick <[email protected]>
* glsl/ir_builder: Add rcp builder.Francisco Jerez2017-01-312-0/+7
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Fix constant evaluation of the rcp op.Francisco Jerez2017-01-311-1/+1
| | | | | | | | | | Will avoid a regression in a future commit that introduces some additional rcp operations. According to the GLSL 4.10 specification: "Dividing by 0 results in the appropriately signed IEEE Inf." Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* mesa/program: Translate csel operation from GLSL IR.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | | This will be used internally by the GLSL front-end in order to implement some built-in functions. Plumb it through MESA IR for back-ends that rely on this translation pass. v2: Add comment. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* etnaviv: Set SE.CLIP registers, add margins for scissor/clip registersWladimir J. van der Laan2017-01-313-20/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes rendering of full-screen quads (and other screen-filling geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op on other hardware. - It looks like SE_CLIP registers were not set at all. I'm amazed that rendering worked without them. Emit them to avoid issues on gc3000. - Define constants ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119) ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111) ETNA_SE_CLIP_MARGIN_RIGHT (0xffff) ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff) These demarcate the margin (fixp16) between the computed sizes and the value sent to the chip. I have set these to the numbers used by the Vivante driver for gc2000. I am not sure whether any old hardware was relying on the old numbers, or whether those were just a guess. But if so, these need to be moved to the _specs structure. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Generate new sin/cos instructions on GC3000Wladimir J. van der Laan2017-01-313-1/+40
| | | | | | | | | | | | | | | | | | | | | Shaders using sin/cos instructions were not working on GC3000. The reason for this turns out to be that these chips implement sin/cos in a different way (but using the same opcodes): - Need their input scaled by 1/pi instead of 2/pi. - Output an x and y component, which need to be multiplied to get the result. - tex_amode needs to be set to 1. Add a new bit to the compiler specs and generate these instructions as necessary. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* anv/cmd_buffer: Use the proper depth input attachment surface stateNanley Chery2017-01-311-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Commit 2852efcda40274acf3272611c6a3b7731523a72d moved the location of the depth input attachment surface state from the render pass to the image view, but failed to update the surface state location used when emitting the binding table. Fix this by loading the surface state from the correct location. Fixes: dEQP-VK.renderpass.formats.d16_unorm.input.* dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.* dEQP-VK.renderpass.formats.d32_sfloat.input.* dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.* dEQP-VK.renderpass.attachment_allocation.input_output.93 dEQP-VK.renderpass.attachment_allocation.input_output.92 dEQP-VK.renderpass.attachment_allocation.input_output.82 dEQP-VK.renderpass.attachment_allocation.input_output.46 Cc: "17.0" <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* glsl: fix heap-buffer-overflowBartosz Tomczyk2017-01-311-1/+1
| | | | | | | | | The `end+1` skips the ']', whereas the `strlen+1` includes the final '\0' in the move to terminate the string. Cc: [email protected] Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: Cannot render to rb-swapped formatsWladimir J. van der Laan2017-01-311-2/+5
| | | | | | | | | | | | | | Exposing rb swapped (or other swizzled) formats for rendering would involve swizzing in the pixel shader. This is not the case at the moment, so reject requests for creating such surfaces. (GPUs that need an extra resolve step anyway due to multiple pixel pipes, such as gc2000, might also do this swap in the resolve operation. But this would be tricky to keep track of) CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Avoid infinite loop in find_frame()Christian Gmeiner2017-01-311-1/+1
| | | | | | | | | | | | | | | | | | | Use of unsigned loop control variable with '>= 0' would lead to infinite loop. Reported by clang: etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] for (unsigned sp = c->frame_sp; sp >= 0; sp--) ~~ ^ ~ v2: Simply use the same datatype as c->frame_sp is using. CC: <[email protected]> Reported-by: Rhys Kidd <[email protected]> Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Rhys Kidd <[email protected]>
* radv/ac: apply slice rounding to 1d arrays as well.Dave Airlie2017-01-311-5/+15
| | | | | | | | | Fixes: dEQP-VK.glsl.texture_functions.texture.*1darray* Reviewed-by: Bas Nieuwenhuizen <[email protected]> Cc: "17.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/geom: check if esgs and gsvs ring exists before filling geom ringsDave Airlie2017-01-311-3/+6
| | | | | | | | | | | There are some corner cases where you end up with an esgs ring, but no gsvs ring, test for both before dereferencing. Fixes: dEQP-VK.geometry.emit.points_emit_0_end_0 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: enable geometryShader and multiViewport capabilities.Dave Airlie2017-01-311-2/+2
| | | | | | | This enables geometry shader support on radv. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: handle layer export from vs->fs properlyDave Airlie2017-01-313-2/+23
| | | | | | | | Fixes: dEQP-VK.geometry.layered.1d_array.fragment_layer Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: emit esgs itemsize register.Dave Airlie2017-01-311-0/+2
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: handle prim id inputs to fragment shader.Dave Airlie2017-01-311-1/+15
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: emit geometry shaders to hardwareDave Airlie2017-01-311-2/+96
| | | | | | | This emits the compiled geometry shader and other state registers. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: emit geometry ring size and pointers via preamble (v2)Dave Airlie2017-01-313-11/+230
| | | | | | | | | | | | This uses the scratch infrastructure to handle the esgs and gsvs rings. (this replaces the old code that did this with patching). v2: fix correct ring sizes, reset sizes (Bas) Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: add gs ring size calculations to pipeline.Dave Airlie2017-01-312-0/+34
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: add pipeline creation support for geometry shaders (v2.1)Dave Airlie2017-01-313-8/+124
| | | | | | | | | | | This adds gs copy shader support to the pipeline cache, and few geometry related changes. v2: rebase for spill changes. v2.1: fix incorrect pipeline destruction. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle primitive idDave Airlie2017-01-312-1/+11
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle emitting vertex outputs to esgs ring.Dave Airlie2017-01-312-1/+38
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle gs inputsDave Airlie2017-01-311-0/+56
| | | | | | | | This handles geometry shader inputs written by the vertex (es) shader to the esgs ring. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: add geom input support to get deref offset.Dave Airlie2017-01-311-8/+14
| | | | | | | This just adds the API and fixes up the callers. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle invocation and primitive id intrinsicsDave Airlie2017-01-311-0/+9
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle geometry emit vertex and end prim intrinsics.Dave Airlie2017-01-311-0/+126
| | | | | | | | This handles emitting things to the gsvs ring, and sending the correct GS msgs. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle emitting gs epilogueDave Airlie2017-01-311-0/+14
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: add copy shader creationDave Airlie2017-01-312-0/+88
| | | | | | | This create the gs copy shader and compiles it. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: setup function parameters for vs as es and copy shader.Dave Airlie2017-01-311-17/+32
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>