summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radv: handle VK_QUEUE_FAMILY_IGNORED in image transitions (v3)Dave Airlie2017-02-024-12/+15
| | | | | | | | | | | | | The CTS tests at least are using this, and we were totally ignoring it. This hopefully fixes the bouncing multisample CTS tests. v2: get family mask in ignored case from command buffer. v3: only change things in one place, use logic from Bas. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle clip/cull distance sizing in geometry shader outputsDave Airlie2017-02-021-8/+10
| | | | | | | | | | | Otherwise we were writing these as 4 components, and things went bad. Fixes (the remaining): dEQP-VK.clipping.user_defined.*.vert_geom.* Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: add const_index to fetch index for gs inputsDave Airlie2017-02-021-1/+1
| | | | | | | | | | | | This fixes clip distance fetches as they are single item loads with a const_index like float[1]. Fixes: dEQP-VK.clipping.user_defined.*.vert_geom.[0-6] Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi/ac: move frag interp emission code to shared llvm code.Dave Airlie2017-02-023-87/+98
| | | | | | | | This code should be used in radv, so move it to a shared location in advance of doing that. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: inline get_mesa_program()Timothy Arceri2017-02-021-37/+23
| | | | | | | | | | In the past I've gotten this function confused with the one in ir_to_mesa.cpp of the same name. Now that the affected flag setting has move into a helper it makes sense just to inline this remaining code. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: create set_prog_affected_state_flags() helperTimothy Arceri2017-02-021-106/+111
| | | | | | | | This will be used when restoring tgsi from the on-disk shader cache. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: st_atom_shader.c C99 tidy upTimothy Arceri2017-02-021-3/+1
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: remove pre C99 statement block for variable declarationTimothy Arceri2017-02-021-60/+58
| | | | Acked-by: Marek Olšák <[email protected]>
* isl: Add assertions for render target swizzle restrictionsJason Ekstrand2017-02-011-0/+32
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* st/va: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/vdpau: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+4
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radeon/uvd: add h264 constrained baseline supportBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl: add h264 constrained baseline profileBoyuan Zhang2017-02-012-0/+2
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radv: Enable VK_KHR_shader_draw_parameters.Bas Nieuwenhuizen2017-02-012-0/+5
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radv: Pass draw index to shader.Bas Nieuwenhuizen2017-02-011-5/+9
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radv/ac: Add draw index support.Bas Nieuwenhuizen2017-02-011-2/+8
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: Prevent coverity warningRobert Foss2017-02-011-0/+1
| | | | | | | | | | | | Add assert checking that num_sources is never larger than 3. This prevents Coverity from concluding that the unhandled cases of num_sources not being 0-3 are relevant. Coverity-Id: 1399480-1399489 Signed-off-by: Robert Foss <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* spirv: add SPV_KHR_shader_draw_parameters supportLionel Landwerlin2017-02-013-0/+17
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* compiler: add missing enums for debugLionel Landwerlin2017-02-011-0/+2
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* winsys/radeon: Allow visible VRAM size > 256MB with kernel driver >= 2.49Michel Dänzer2017-02-011-1/+6
| | | | | | | | | The kernel driver reports correct values now. Reviewed-by: Christian König <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* android: add vulkan build for intelTapani Pälli2017-02-012-0/+227
| | | | | | | | | | | | | | | | | | | | | | | | | fixes to issues spotted by Emil Velikov: - set ANV_TIMESTAMP corretly - fix typo with VULKAN_GEM_FILES v2: update to use Makefile.sources under vulkan instead of having own v3: update to changes to generate from vk.xml (commit c7fc310) v4: remove 'hw' relative path cleanups, remove unnecessary cruft review from Emil Velikov: - move to vulkan folder - remove timestamp gen, no longer necessary - more cleanups Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: use same is_color_attachment trick to discern error casesIlia Mirkin2017-01-311-3/+11
| | | | | | | | | | | All the other calls to retrieve the attachment have been covered except this one - return the proper error for attachment points that are valid enums but out of bound for the driver. Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* anv: Improve flushing around STATE_BASE_ADDRESSJason Ekstrand2017-01-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS actually is. We know from experimentation that we need to flush the render cache prior to emitting STATE_BASE_ADDRESS and invalidate the texture cache afterwards. The only thing the PRM says is that, on gen8+ we're supposed to invalidate the state cache after STATE_BASE_ADDRESS but experimentation has indicated that doing so does nothing whatsoever. Since we don't really know, let's do just a bit more flushing in the hopes that this won't be a problem again. In particular: 1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't really know whether or not it's pipelined. 2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS is a compute shader. 3) Invalidate the state and constant caches after STATE_BASE_ADDRESS because the state may be getting cached there (we don't really know). Reported-by: Mark Janes <[email protected]> Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* anv: Flush render cache before STATE_BASE_ADDRESS on gen7Jason Ekstrand2017-01-311-3/+0
| | | | | | | | | | | We had no good reason for *not* doing this on gen7 before but we didn't know it was needed. Recently, when trying update to Vulkan CTS version 1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear to be STATE_BASE_ADDRESS related. This commit fixes them. Reported-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* isl/formats: Only advertise sampling for A4B4G4R4 on BroadwellJason Ekstrand2017-01-311-2/+3
| | | | | | | | | This causes hangs on Broadwell if you try to render to it. I have no idea how we managed to not hit this earlier. Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* intel/blorp: Handle clearing of A4B4G4R4 on all platformsJason Ekstrand2017-01-311-0/+23
| | | | | | Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* radeonsi: Fix build on LLVM < 3.9 v2Tom Stellard2017-02-011-2/+4
| | | | | | | | | This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb v2: - Use preprocessor macro Tested-by: Mark Janes <[email protected]>
* radv: Enable Float64 support.Bas Nieuwenhuizen2017-02-012-1/+2
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO loads.Bas Nieuwenhuizen2017-02-011-26/+49
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 UBO loads.Bas Nieuwenhuizen2017-02-011-2/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 load/store var.Bas Nieuwenhuizen2017-02-011-53/+48
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO stores.Bas Nieuwenhuizen2017-02-011-3/+14
| | | | | | | No f16 support as I'm not quite sure about alignment yet. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Add core Float64 support.Bas Nieuwenhuizen2017-02-011-44/+129
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* vc4: Enable Neon on arm android buildsRob Herring2017-01-311-0/+2
| | | | | Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: fix arm64 build with NeonRob Herring2017-01-311-1/+1
| | | | | | | | The addition of Neon assembly breaks on arm64 builds because the assembly syntax is different. For now, restrict Neon to ARMv7 builds. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Make Neon inline assembly clang compatibleRob Herring2017-01-311-35/+35
| | | | | | | | | | | | clang throws an error on "%r2" and similar. I couldn't find any documentation on what "%r?" is supposed to mean and I've never seen any use like that as far as I remember. The parameter is supposed to be cpu_stride and just %2/%3 should be sufficient. There's no need for trailing ";" either, so remove those, too. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: Set datalayout on the llvm moduleTom Stellard2017-01-311-0/+6
| | | | | | | | | | This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions. Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling ↵Francisco Jerez2017-01-311-22/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of zero/infinity. See "glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity." for the rationale, but note that the instruction count benefit discussed there is somewhat less important for the SPIRV implementation, because the current code already emitted no control flow instructions -- Still this saves us one hardware instruction per scalar component on Intel SKL hardware. Fixes the following Vulkan CTS tests on Intel hardware: dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4 Note that most of the test-cases above expect IEEE-compliant handling of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so except for the last two the test-cases above weren't expected to pass yet. The reason they do is that the i965 back-end implementation of the NIR fmin and fmax instructions is not quite GLSL-compliant (it complies with IEEE 754 recommendations though), because fmin/fmax of a NaN and a non-NaN argument currently always return the non-NaN argument, which causes atan() to flush NaN to one and return the expected value. The front-end should probably not be relying on this behavior for correctness though because other back-ends are likely to behave differently -- A follow-up patch will handle the atan2(±∞, ±∞) corner cases explicitly. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Rewrite atan2 implementation to fix accuracy and handling of ↵Francisco Jerez2017-01-311-36/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zero/infinity. This addresses several issues of the current atan2 implementation: - Negative zero (and negative denorms which end up getting flushed to zero) isn't handled correctly by the current implementation. The reason is that it does 'y >= 0' and 'x < 0' comparisons to decide on which side of the branch cut the argument is, which causes us to return incorrect results (off by up to 2π) for very small negative values. - There is a serious precision problem for x values of large enough magnitude introduced by the floating point division operation being implemented as a mul+rcp sequence. This can lead to the quotient getting flushed to zero in some cases introducing an error of over 8e6 ULP in the result -- Or in the most catastrophic case will cause us to return NaN instead of the correct value ±π/2 for y=±∞ and x very large. We can fix this easily by scaling down both arguments when the absolute value of the denominator goes above certain threshold. The error of this atan2 implementation remains below 25 ULP in most of its domain except for a neighborhood of y=0 where it reaches a maximum error of about 180 ULP. - It emits a bunch of instructions including no less than three if-else branches per scalar component that don't seem to get optimized out later on. This implementation uses about 13% less instructions on Intel SKL hardware and doesn't emit any control flow instructions. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix nir_op_fsign of absolute value.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | This does point at the front-end emitting silly code that could have been optimized out, but the current fsign implementation would emit bogus IR if abs was set for the argument (because it would apply the abs modifier on an unsigned integer type), and we shouldn't rely on the upper layer's optimization passes for correctness. Reviewed-by: Ian Romanick <[email protected]>
* glsl/ir_builder: Add rcp builder.Francisco Jerez2017-01-312-0/+7
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Fix constant evaluation of the rcp op.Francisco Jerez2017-01-311-1/+1
| | | | | | | | | | Will avoid a regression in a future commit that introduces some additional rcp operations. According to the GLSL 4.10 specification: "Dividing by 0 results in the appropriately signed IEEE Inf." Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* mesa/program: Translate csel operation from GLSL IR.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | | This will be used internally by the GLSL front-end in order to implement some built-in functions. Plumb it through MESA IR for back-ends that rely on this translation pass. v2: Add comment. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* etnaviv: Set SE.CLIP registers, add margins for scissor/clip registersWladimir J. van der Laan2017-01-313-20/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes rendering of full-screen quads (and other screen-filling geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op on other hardware. - It looks like SE_CLIP registers were not set at all. I'm amazed that rendering worked without them. Emit them to avoid issues on gc3000. - Define constants ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119) ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111) ETNA_SE_CLIP_MARGIN_RIGHT (0xffff) ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff) These demarcate the margin (fixp16) between the computed sizes and the value sent to the chip. I have set these to the numbers used by the Vivante driver for gc2000. I am not sure whether any old hardware was relying on the old numbers, or whether those were just a guess. But if so, these need to be moved to the _specs structure. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* etnaviv: Generate new sin/cos instructions on GC3000Wladimir J. van der Laan2017-01-313-1/+40
| | | | | | | | | | | | | | | | | | | | | Shaders using sin/cos instructions were not working on GC3000. The reason for this turns out to be that these chips implement sin/cos in a different way (but using the same opcodes): - Need their input scaled by 1/pi instead of 2/pi. - Output an x and y component, which need to be multiplied to get the result. - tex_amode needs to be set to 1. Add a new bit to the compiler specs and generate these instructions as necessary. CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* anv/cmd_buffer: Use the proper depth input attachment surface stateNanley Chery2017-01-311-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Commit 2852efcda40274acf3272611c6a3b7731523a72d moved the location of the depth input attachment surface state from the render pass to the image view, but failed to update the surface state location used when emitting the binding table. Fix this by loading the surface state from the correct location. Fixes: dEQP-VK.renderpass.formats.d16_unorm.input.* dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.* dEQP-VK.renderpass.formats.d32_sfloat.input.* dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.* dEQP-VK.renderpass.attachment_allocation.input_output.93 dEQP-VK.renderpass.attachment_allocation.input_output.92 dEQP-VK.renderpass.attachment_allocation.input_output.82 dEQP-VK.renderpass.attachment_allocation.input_output.46 Cc: "17.0" <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* glsl: fix heap-buffer-overflowBartosz Tomczyk2017-01-311-1/+1
| | | | | | | | | The `end+1` skips the ']', whereas the `strlen+1` includes the final '\0' in the move to terminate the string. Cc: [email protected] Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* etnaviv: Cannot render to rb-swapped formatsWladimir J. van der Laan2017-01-311-2/+5
| | | | | | | | | | | | | | Exposing rb swapped (or other swizzled) formats for rendering would involve swizzing in the pixel shader. This is not the case at the moment, so reject requests for creating such surfaces. (GPUs that need an extra resolve step anyway due to multiple pixel pipes, such as gc2000, might also do this swap in the resolve operation. But this would be tricky to keep track of) CC: <[email protected]> Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>