summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* anv: Don't use bogus alpha swizzlesJason Ekstrand2017-02-013-3/+22
| | | | | | | | | | | For RGB formats in Vulkan, we use the corresponding RGBA format with a swizzle of RGB1. While this swizzle is exactly what we want for texturing, it's not allowed for rendering according to the docs. While we haven't been getting hangs or anything, we should probably obey the docs. This commit just sanitizes all render swizzles so that the alpha channel maps to ALPHA. Reviewed-by: Anuj Phogat <[email protected]>
* Add missing copyright header to wayland-egl-priv.hMicah Fedke2017-02-011-0/+27
| | | | | Reviewed-by: Daniel Stone <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* radv: handle VK_QUEUE_FAMILY_IGNORED in image transitions (v3)Dave Airlie2017-02-024-12/+15
| | | | | | | | | | | | | The CTS tests at least are using this, and we were totally ignoring it. This hopefully fixes the bouncing multisample CTS tests. v2: get family mask in ignored case from command buffer. v3: only change things in one place, use logic from Bas. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: handle clip/cull distance sizing in geometry shader outputsDave Airlie2017-02-021-8/+10
| | | | | | | | | | | Otherwise we were writing these as 4 components, and things went bad. Fixes (the remaining): dEQP-VK.clipping.user_defined.*.vert_geom.* Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: add const_index to fetch index for gs inputsDave Airlie2017-02-021-1/+1
| | | | | | | | | | | | This fixes clip distance fetches as they are single item loads with a const_index like float[1]. Fixes: dEQP-VK.clipping.user_defined.*.vert_geom.[0-6] Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radeonsi/ac: move frag interp emission code to shared llvm code.Dave Airlie2017-02-023-87/+98
| | | | | | | | This code should be used in radv, so move it to a shared location in advance of doing that. Reviewed-by: Nicolai Hähnle <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: inline get_mesa_program()Timothy Arceri2017-02-021-37/+23
| | | | | | | | | | In the past I've gotten this function confused with the one in ir_to_mesa.cpp of the same name. Now that the affected flag setting has move into a helper it makes sense just to inline this remaining code. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: create set_prog_affected_state_flags() helperTimothy Arceri2017-02-021-106/+111
| | | | | | | | This will be used when restoring tgsi from the on-disk shader cache. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: st_atom_shader.c C99 tidy upTimothy Arceri2017-02-021-3/+1
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: remove pre C99 statement block for variable declarationTimothy Arceri2017-02-021-60/+58
| | | | Acked-by: Marek Olšák <[email protected]>
* isl: Add assertions for render target swizzle restrictionsJason Ekstrand2017-02-011-0/+32
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* st/va: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/vdpau: add h264 constrained baseline profileBoyuan Zhang2017-02-011-0/+4
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radeon/uvd: add h264 constrained baseline supportBoyuan Zhang2017-02-011-0/+1
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl: add h264 constrained baseline profileBoyuan Zhang2017-02-012-0/+2
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* radv: Enable VK_KHR_shader_draw_parameters.Bas Nieuwenhuizen2017-02-012-0/+5
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radv: Pass draw index to shader.Bas Nieuwenhuizen2017-02-011-5/+9
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radv/ac: Add draw index support.Bas Nieuwenhuizen2017-02-011-2/+8
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: Prevent coverity warningRobert Foss2017-02-011-0/+1
| | | | | | | | | | | | Add assert checking that num_sources is never larger than 3. This prevents Coverity from concluding that the unhandled cases of num_sources not being 0-3 are relevant. Coverity-Id: 1399480-1399489 Signed-off-by: Robert Foss <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* spirv: add SPV_KHR_shader_draw_parameters supportLionel Landwerlin2017-02-013-0/+17
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* compiler: add missing enums for debugLionel Landwerlin2017-02-011-0/+2
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* docs: add news item and link release notes for 13.0.4Emil Velikov2017-02-012-0/+7
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: add sha256 checksums for 13.0.4Emil Velikov2017-02-011-1/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 6bfc352f5a35ab21f012d6d501821ffbf767aab3)
* docs: add release notes for 13.0.4Emil Velikov2017-02-011-0/+254
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 3255d10da4c2703bfdfcefd8f59b0d8f21dbb43f)
* winsys/radeon: Allow visible VRAM size > 256MB with kernel driver >= 2.49Michel Dänzer2017-02-011-1/+6
| | | | | | | | | The kernel driver reports correct values now. Reviewed-by: Christian König <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* android: add vulkan build for intelTapani Pälli2017-02-012-0/+227
| | | | | | | | | | | | | | | | | | | | | | | | | fixes to issues spotted by Emil Velikov: - set ANV_TIMESTAMP corretly - fix typo with VULKAN_GEM_FILES v2: update to use Makefile.sources under vulkan instead of having own v3: update to changes to generate from vk.xml (commit c7fc310) v4: remove 'hw' relative path cleanups, remove unnecessary cruft review from Emil Velikov: - move to vulkan folder - remove timestamp gen, no longer necessary - more cleanups Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: use same is_color_attachment trick to discern error casesIlia Mirkin2017-01-311-3/+11
| | | | | | | | | | | All the other calls to retrieve the attachment have been covered except this one - return the proper error for attachment points that are valid enums but out of bound for the driver. Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* anv: Improve flushing around STATE_BASE_ADDRESSJason Ekstrand2017-01-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS actually is. We know from experimentation that we need to flush the render cache prior to emitting STATE_BASE_ADDRESS and invalidate the texture cache afterwards. The only thing the PRM says is that, on gen8+ we're supposed to invalidate the state cache after STATE_BASE_ADDRESS but experimentation has indicated that doing so does nothing whatsoever. Since we don't really know, let's do just a bit more flushing in the hopes that this won't be a problem again. In particular: 1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't really know whether or not it's pipelined. 2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS is a compute shader. 3) Invalidate the state and constant caches after STATE_BASE_ADDRESS because the state may be getting cached there (we don't really know). Reported-by: Mark Janes <[email protected]> Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* anv: Flush render cache before STATE_BASE_ADDRESS on gen7Jason Ekstrand2017-01-311-3/+0
| | | | | | | | | | | We had no good reason for *not* doing this on gen7 before but we didn't know it was needed. Recently, when trying update to Vulkan CTS version 1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear to be STATE_BASE_ADDRESS related. This commit fixes them. Reported-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* isl/formats: Only advertise sampling for A4B4G4R4 on BroadwellJason Ekstrand2017-01-311-2/+3
| | | | | | | | | This causes hangs on Broadwell if you try to render to it. I have no idea how we managed to not hit this earlier. Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* intel/blorp: Handle clearing of A4B4G4R4 on all platformsJason Ekstrand2017-01-311-0/+23
| | | | | | Tested-by: Mark Janes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* radeonsi: Fix build on LLVM < 3.9 v2Tom Stellard2017-02-011-2/+4
| | | | | | | | | This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb v2: - Use preprocessor macro Tested-by: Mark Janes <[email protected]>
* radv: Enable Float64 support.Bas Nieuwenhuizen2017-02-012-1/+2
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO loads.Bas Nieuwenhuizen2017-02-011-26/+49
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 UBO loads.Bas Nieuwenhuizen2017-02-011-2/+6
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 load/store var.Bas Nieuwenhuizen2017-02-011-53/+48
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Implement Float64 SSBO stores.Bas Nieuwenhuizen2017-02-011-3/+14
| | | | | | | No f16 support as I'm not quite sure about alignment yet. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv/ac: Add core Float64 support.Bas Nieuwenhuizen2017-02-011-44/+129
| | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* vc4: Enable Neon on arm android buildsRob Herring2017-01-311-0/+2
| | | | | Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: fix arm64 build with NeonRob Herring2017-01-311-1/+1
| | | | | | | | The addition of Neon assembly breaks on arm64 builds because the assembly syntax is different. For now, restrict Neon to ARMv7 builds. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Make Neon inline assembly clang compatibleRob Herring2017-01-311-35/+35
| | | | | | | | | | | | clang throws an error on "%r2" and similar. I couldn't find any documentation on what "%r?" is supposed to mean and I've never seen any use like that as far as I remember. The parameter is supposed to be cpu_stride and just %2/%3 should be sufficient. There's no need for trailing ";" either, so remove those, too. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeonsi: Set datalayout on the llvm moduleTom Stellard2017-01-311-0/+6
| | | | | | | | | | This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions. Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).Francisco Jerez2017-01-311-1/+21
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling ↵Francisco Jerez2017-01-311-22/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of zero/infinity. See "glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity." for the rationale, but note that the instruction count benefit discussed there is somewhat less important for the SPIRV implementation, because the current code already emitted no control flow instructions -- Still this saves us one hardware instruction per scalar component on Intel SKL hardware. Fixes the following Vulkan CTS tests on Intel hardware: dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3 dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2 dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4 Note that most of the test-cases above expect IEEE-compliant handling of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so except for the last two the test-cases above weren't expected to pass yet. The reason they do is that the i965 back-end implementation of the NIR fmin and fmax instructions is not quite GLSL-compliant (it complies with IEEE 754 recommendations though), because fmin/fmax of a NaN and a non-NaN argument currently always return the non-NaN argument, which causes atan() to flush NaN to one and return the expected value. The front-end should probably not be relying on this behavior for correctness though because other back-ends are likely to behave differently -- A follow-up patch will handle the atan2(±∞, ±∞) corner cases explicitly. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Rewrite atan2 implementation to fix accuracy and handling of ↵Francisco Jerez2017-01-311-36/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zero/infinity. This addresses several issues of the current atan2 implementation: - Negative zero (and negative denorms which end up getting flushed to zero) isn't handled correctly by the current implementation. The reason is that it does 'y >= 0' and 'x < 0' comparisons to decide on which side of the branch cut the argument is, which causes us to return incorrect results (off by up to 2π) for very small negative values. - There is a serious precision problem for x values of large enough magnitude introduced by the floating point division operation being implemented as a mul+rcp sequence. This can lead to the quotient getting flushed to zero in some cases introducing an error of over 8e6 ULP in the result -- Or in the most catastrophic case will cause us to return NaN instead of the correct value ±π/2 for y=±∞ and x very large. We can fix this easily by scaling down both arguments when the absolute value of the denominator goes above certain threshold. The error of this atan2 implementation remains below 25 ULP in most of its domain except for a neighborhood of y=0 where it reaches a maximum error of about 180 ULP. - It emits a bunch of instructions including no less than three if-else branches per scalar component that don't seem to get optimized out later on. This implementation uses about 13% less instructions on Intel SKL hardware and doesn't emit any control flow instructions. v2: Fix up argument scaling to take into account the range and precision of exotic FP24 hardware. Flip coordinate system for arguments along the vertical line as if they were on the left half-plane in order to avoid division by zero which may give unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in some more comments. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix nir_op_fsign of absolute value.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | This does point at the front-end emitting silly code that could have been optimized out, but the current fsign implementation would emit bogus IR if abs was set for the argument (because it would apply the abs modifier on an unsigned integer type), and we shouldn't rely on the upper layer's optimization passes for correctness. Reviewed-by: Ian Romanick <[email protected]>
* glsl/ir_builder: Add rcp builder.Francisco Jerez2017-01-312-0/+7
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* glsl: Fix constant evaluation of the rcp op.Francisco Jerez2017-01-311-1/+1
| | | | | | | | | | Will avoid a regression in a future commit that introduces some additional rcp operations. According to the GLSL 4.10 specification: "Dividing by 0 results in the appropriately signed IEEE Inf." Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>
* mesa/program: Translate csel operation from GLSL IR.Francisco Jerez2017-01-311-1/+8
| | | | | | | | | | | This will be used internally by the GLSL front-end in order to implement some built-in functions. Plumb it through MESA IR for back-ends that rely on this translation pass. v2: Add comment. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]>