path: root/src
Commit message | Author | Age | Files | Lines
...
* turnip: Disable timestamp queries for now.Eric Anholt2019-11-271-2/+2
| | | | | | | They're not implemented, and not critical to bring up immediately. Avoids failures in the CTS when nothing gets written to the query. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* freedreno/perfcntrs/fdperf: add missing a2xx case in select_counterJonathan Marek2019-11-271-0/+1
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/perfcntrs/fdperf: add missing a20x compatibleJonathan Marek2019-11-271-0/+1
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/perfcntrs/fdperf: fix u64 print on 32-bit buildsJonathan Marek2019-11-271-1/+2
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
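The commit above does not say how the print was fixed. A common cause of this class of bug is formatting a `uint64_t` with `%lu`, which is only 32 bits wide on many 32-bit targets. The sketch below shows the portable form using `PRIu64` from `<inttypes.h>`; the helper name `format_counter` is hypothetical and not taken from fdperf.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not fdperf code): format a 64-bit counter value
 * so that it prints correctly on both 32-bit and 64-bit builds.
 * PRIu64 expands to the right length modifier for uint64_t. */
static int format_counter(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIu64, value);
}
```

Calling `format_counter(buf, sizeof(buf), counter)` fills `buf` with the decimal value regardless of how wide `unsigned long` is on the target.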
* freedreno/perfcntrs: add a2xx MH countersJonathan Marek2019-11-271-4/+186
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/registers: add missing MH perfcounter enum for a2xxJonathan Marek2019-11-271-0/+185
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* v3d: fix indirect BO allocation for uniformsIago Toral Quiroga2019-11-271-3/+8
| | | | | | | | | | | | | | | | | | | We were always ensuring a minimum size of 4 bytes for uniforms for the case where we don't have any, to account for hardware pre-fetching of the uniform stream. However, pre-fetching could also lead to out-of-bounds reads once we have read the last uniform in the stream, so we probably always want the extra 4 bytes, to prevent the kernel from observing invalid memory accesses when the uniform stream sits right at the end of a page. This seems to fix MMU exceptions reported with a Linux 5.4 kernel. Credit goes to Phil Elwell for identifying the problem and narrowing it down to memory accesses in the uniform stream. Reported-by: Phil Elwell <[email protected]> Tested-by: Phil Elwell <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
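The sizing change described above can be sketched as follows. This is a minimal illustration, assuming 4-byte uniforms and a 4-byte prefetch; the function names are hypothetical and do not come from the v3d driver.

```c
#include <stdint.h>

/* Assumed for illustration: the hardware prefetches one 4-byte word
 * past the last uniform it consumes. */
#define EXAMPLE_PREFETCH_BYTES 4u

/* Old behavior as described in the commit message: only guarantee a
 * minimum size of 4 bytes, which covers the "no uniforms" case. */
static uint32_t example_uniform_size_old(uint32_t uniform_count)
{
    uint32_t size = uniform_count * 4u;
    return size < EXAMPLE_PREFETCH_BYTES ? EXAMPLE_PREFETCH_BYTES : size;
}

/* New behavior as described: always pad by the prefetch size, so a
 * read past the last uniform still lands inside the allocation. */
static uint32_t example_uniform_size_new(uint32_t uniform_count)
{
    return uniform_count * 4u + EXAMPLE_PREFETCH_BYTES;
}
```

With three uniforms, the old sizing yields 12 bytes, so a prefetch after the last uniform reads past the allocation; the new sizing yields 16 bytes.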
* radv: enable VK_KHR_shader_subgroup_extended_types on GFX10Samuel Pitoiset2019-11-271-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 8-bit and 16-bit supports to ac_build_permlane16()Samuel Pitoiset2019-11-271-8/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: fix implementation of exclusive scansSamuel Pitoiset2019-11-271-24/+52
| | | | | | | | | | | This implementation is loosely based on ROCm. https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10. Fixes: 227c29a80de ("amd/common/gfx10: implement scan & reduce operations") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix enabling sample shading with SampleID/SamplePositionSamuel Pitoiset2019-11-271-7/+24
| | | | | | | | | | When a fragment shader includes an input variable decorated with SampleId or SamplePosition, sample shading should be enabled because minSampleShadingFactor is expected to be 1.0. Cc: 19.2, 19.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* turnip: fix integer render targetsJonathan Marek2019-11-261-1/+3
| | | | | | | | | | | | | Add missing required bits. Fixes at least: dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint dEQP-VK.renderpass.dedicated_allocation.attachment.4.401 dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* anv: Push constants are relative to dynamic state on IVBJason Ekstrand2019-11-261-0/+17
| | | | | | Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines" Closes: #2136 Reviewed-by: Lionel Landwerlin <[email protected]>
* gallium/auxiliary: Fix uses of gnu struct = {} extensionDylan Baker2019-11-265-8/+8
| | | | | | | | | | Most of these will never actually be compiled on Windows, but in the interest of being able to make using struct foo = {}; an error and to avoid breaking Windows, removing a handful of safe uses seems like a good trade-off. Reviewed-by: Kristian H. Kristensen <[email protected]> Acked-by: Eric Engestrom <[email protected]>
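For context, an empty initializer list (`= {}`) is a GNU extension in C before C23, while `= {0}` is standard C and zero-initializes every member. The sketch below shows the portable spelling; the struct and function names are made up for illustration and are not from gallium/auxiliary.

```c
#include <stddef.h>

/* Hypothetical struct, used only to demonstrate the initializer. */
struct example_state {
    int width;
    int height;
    void *data;
};

static struct example_state example_state_init(void)
{
    /* Standard C: the first member is set to 0 explicitly and all
     * remaining members are zero-initialized implicitly. */
    struct example_state state = {0};
    return state;
}
```

A compiler that rejects the GNU form accepts this one unchanged, which is why the replacement is safe for the uses mentioned in the commit.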
* st/mesa: add st_variant base class to simplify code for shader variantsMarek Olšák2019-11-268-307/+149
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: don't use ** in the st_nir_link_shaders signatureMarek Olšák2019-11-261-20/+20
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: simplify looping over linked shaders when linking NIRMarek Olšák2019-11-261-48/+28
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIRMarek Olšák2019-11-261-2/+2
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: don't call ProgramStringNotify in glsl_to_nirMarek Olšák2019-11-262-13/+16
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: don't use redundant stp->state.ir.nirMarek Olšák2019-11-263-25/+12
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* st/mesa: don't serialize all streamout state if there are no SO outputsMarek Olšák2019-11-261-4/+15
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* iris: Disable VF cache partial address workaround on Gen11+Kenneth Graunke2019-11-262-0/+14
| | | | | | | | | | | | | | | The vertex cache uses the full 48-bit address on Gen11+. See the documentation for 3DSTATE_VERTEX_BUFFERS, which describes the workaround and lists it as pre-Icelake. Interestingly, the docs don't mention index buffers as needing a workaround at all. So either we've been overzealous, or the docs never got updated to record that. Which begs the question of whether the issue there was fixed, if there was one... Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears that it improves performance by about 1-2% on Icelake 8x8 (not frequency locked).
* freedreno: switch to layout helperRob Clark2019-11-2627-199/+190
| | | | | | | | | | | | The slices table and most of the other layout fields in freedreno_resource move into fdl_layout. v2: Changes by anholt to not have duplicate fields, which was introducing a surprising behavior change in resource layout (using the level_linear helper before the setup of the shadowed fields) Reviewed-by: Eric Anholt <[email protected]> Acked-by: Rob Clark <[email protected]>
* freedreno/a6xx: Log the tiling mode in resource layout debug.Eric Anholt2019-11-261-2/+2
| | | | | | | This was important for figuring out what went wrong with the layout refactor. Acked-by: Rob Clark <[email protected]>
* freedreno: Convert the slice struct to the new resource header.Eric Anholt2019-11-2623-71/+43
| | | | | | | | This gets the worst of the sed required for shared resource layout out of the way. The texture layout comment is dropped now that we're referencing the shared header, which has a more complete description. Acked-by: Rob Clark <[email protected]>
* freedreno: Introduce a resource layout header.Eric Anholt2019-11-261-0/+164
| | | | | | | | This will be used for sharing resource layout code between freedreno and tu. Mostly copied from a commit by Rob, with a new location and the slice struct renamed for consistency. Acked-by: Rob Clark <[email protected]>
* freedreno: Introduce a fd_resource_tile_mode() helper.Eric Anholt2019-11-268-36/+24
| | | | | | | | Multiple places were doing the same thing to get the tile mode of a level, so refactor it out. This will make the shared resource helper transition cleaner. Acked-by: Rob Clark <[email protected]>
* freedreno: Introduce a fd_resource_layer_stride() helper.Eric Anholt2019-11-262-11/+15
| | | | | | | This factors out a bit of duplicated code, but will also make the shared resource layout transition process clearer. Acked-by: Rob Clark <[email protected]>
* freedreno: use rsc->slice accessor everywhereRob Clark2019-11-2614-45/+58
| | | | | | | This will make it easier to extract the slice table out into a layout helper. Acked-by: Rob Clark <[email protected]>
* nir: Make algebraic backtrack and reprocess after a replacement.Eric Anholt2019-11-262-22/+97
The algebraic pass was exhibiting O(n^2) behavior in dEQP-GLES2.functional.uniform_api.random.3 and dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with other code-generated tests, and likely real-world loop-unroll cases). In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to transform:

    result = b2f(a == b);
    result *= b2f(c == d);
    ...
    result *= b2f(z == w);

->

    temp = (a == b)
    temp = temp && (c == d)
    ...
    temp = temp && (z == w)
    result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to be matched by the next fmul down on the next time algebraic got run by the optimization loop.

Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse order."), Matt changed algebraic to go bottom-to-top so that we would match the biggest patterns first. This helped his cases, but I believe introduced this failure mode. Instead of reverting that, now that we've got the automaton, we can update the automaton's state recursively and just re-process any instructions whose state has changed (indicating that they might match new things). There's a small chance that the state will hash to the same value and miss out on this round of algebraic, but this seems to be good enough to fix dEQP.

Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):

    Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling outliers removed)
    dEQP-GLES2.functional.uniform_api.random.3 runtime -65.3512% +/- 4.22369% (n=21, was 1.4s)
    dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime -68.8066% +/- 6.49523% (was 4.8s)

v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of tricky code. Runtime of uniform_api.random.3 down -0.790299% +/- 0.244213% compared to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2, fixing infinite loops.

Reviewed-by: Connor Abbott <[email protected]>
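The rewrite rule in the commit above relies on the fact that multiplying booleans converted to 0.0/1.0 gives the same result as converting their logical AND. The sketch below checks that equivalence in plain C. It is only an illustration of the identity, not NIR code, and all names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Boolean-to-float conversion, mirroring what b2f means in the commit
 * message: true becomes 1.0, false becomes 0.0. */
static float example_b2f(bool b)
{
    return b ? 1.0f : 0.0f;
}

/* The "before" form: a chain of float multiplies of converted booleans. */
static float example_product_of_b2f(const bool *conds, size_t count)
{
    float result = 1.0f;
    for (size_t i = 0; i < count; i++)
        result *= example_b2f(conds[i]);
    return result;
}

/* The "after" form: AND the booleans first, convert once at the end. */
static float example_b2f_of_and(const bool *conds, size_t count)
{
    bool temp = true;
    for (size_t i = 0; i < count; i++)
        temp = temp && conds[i];
    return example_b2f(temp);
}
```

Both functions return 1.0 only when every condition is true. The second form needs a single conversion, which is the shape the optimization aims for once each replacement's output is re-processed.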
* nir: Refactor algebraic's block walkEric Anholt2019-11-261-31/+31
| | | | | | | | | My motivation was to clarify the changes in the following commit, but incidentally, it reduces runtime of dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy testcase) by -5.39524% +/- 2.21179% (n=15) Reviewed-by: Connor Abbott <[email protected]>
* nir: Maintain the algebraic automaton's state as we work.Connor Abbott2019-11-262-38/+78
| | | | | | | | In order to have nir_opt_algebraic be able to do further algebraic work on the output of a replacement, we need to maintain the automaton's state. Reviewed-by: Eric Anholt <[email protected]>
* etnaviv: support 3d/array/integer formats in texture descriptorsJonathan Marek2019-11-261-3/+19
| | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: blt: fix partial ZS clears with TSJonathan Marek2019-11-261-4/+7
| | | | | | | | If not all bits are cleared, then BLT needs to be given the current clear value and not the new one. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* aco: don't value-number instructions from within a loop with ones after the loop.Daniel Schürmann2019-11-261-1/+6
| | | | | | | | Fixes: Wolfenstein:Youngblood (w/o shader_ballot) dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod Reviewed-by: Rhys Perry <[email protected]>
* aco: set dlc/glc correctly for image loadsRhys Perry2019-11-261-0/+3
| | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-By: Timur Kristóf <[email protected]>
* aco: allow constant offsets for global/scratch instructions on GFX10Rhys Perry2019-11-262-2/+5
| | | | | | | | | | I don't think the bug applies for global/scratch instructions and load_barycentric_at_sample selection expects this feature to work. Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10. Signed-off-by: Rhys Perry <[email protected]> Reviewed-By: Timur Kristóf <[email protected]>
* radv: Enable VK_KHR_buffer_device_address.Bas Nieuwenhuizen2019-11-262-3/+25
| | | | | | Still no capture/replay or multi device support. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: fix reporting subgroup size with VK_KHR_pipeline_executable_propertiesSamuel Pitoiset2019-11-261-3/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Allocate cmdbuffer space for buffer marker write.Bas Nieuwenhuizen2019-11-261-0/+4
| | | | | Fixes: 946193ae008 "radv: add support for VK_AMD_buffer_marker" Reviewed-by: Samuel Pitoiset <[email protected]>
* r600: Disable eight bit three channel formatsGert Wollny2019-11-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't properly handle this. Disabling this format also for buffers fixes the issue. In addition, disabling also the related RGB8 integer formats for buffers fixes some deqp-gles3 tests: dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d Fixes: 0899bf55 st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM Closes #2118 Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* ac/llvm: fix warning in ac_build_canonicalize()Samuel Pitoiset2019-11-261-1/+1
| | | | | | | | | | | | | ../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’: ../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized] 4567 | return ac_build_intrinsic(ctx, intr, type, params, 1, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4568 | AC_FUNC_ATTR_READNONE); | ~~~~~~~~~~~~~~~~~~~~~~ ../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* mapi: add GetInteger64vEXT with EXT_disjoint_timer_queryTapani Pälli2019-11-263-1/+16
| | | | | | | | | | | | | From EXT_disjoint_timer_query spec: "Interaction: This extension adds GetInteger64vEXT if OpenGL ES 3.0 is not supported" See https://github.com/KhronosGroup/OpenGL-Registry/issues/326. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090 Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vulkan: Update the XML and headers to 1.1.129Jason Ekstrand2019-11-261-48/+273
| | | | Acked-by: Lionel Landwerlin <[email protected]>
* anv/entrypoints: Better handle promoted extensionsJason Ekstrand2019-11-261-9/+25
| | | | | | | | | In the case of promoted extensions we can end up with an entrypoint that we support being an alias of an entrypoint we do not support. For instance, if an extension gets promoted from EXT to KHR, the EXT entrypoints may be aliases of the KHR ones. We want to leave everything as EXT until we get around to advertising the KHR so that we don't break things when we update the XML and headers. Reviewed-by: Lionel Landwerlin <[email protected]>
* vulkan/enum_to_str: Handle out-of-order aliasesJason Ekstrand2019-11-261-3/+21
| | | | | | | | | The current code can only handle enum aliases if the original enum is declared first followed by the alias as we walk the XML in a linear fashion. This commit allows us to handle aliases where the alias declaration comes before the thing it's aliasing. Reviewed-by: Lionel Landwerlin <[email protected]>
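One way to handle an alias that is declared before its target is to collect all declarations first and resolve aliases in a second pass. The sketch below illustrates that idea in C. It is not the generator's actual code, every name in it is hypothetical, and it deliberately does not follow chains of aliases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical declaration record, in the order seen while walking
 * the XML. */
struct example_enum_decl {
    const char *name;
    const char *alias_of; /* NULL for a real enumerant */
    int value;            /* set for real enumerants, filled in for aliases */
};

static const struct example_enum_decl *
example_find_decl(const struct example_enum_decl *decls, size_t count,
                  const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(decls[i].name, name) == 0)
            return &decls[i];
    }
    return NULL;
}

/* Second pass: every declaration is known by now, so an alias can be
 * resolved even when it appears before its target.
 * Returns the number of aliases that could not be resolved. */
static size_t example_resolve_aliases(struct example_enum_decl *decls,
                                      size_t count)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < count; i++) {
        if (decls[i].alias_of == NULL)
            continue;
        const struct example_enum_decl *target =
            example_find_decl(decls, count, decls[i].alias_of);
        if (target != NULL && target->alias_of == NULL)
            decls[i].value = target->value;
        else
            unresolved++;
    }
    return unresolved;
}
```

A single linear walk would fail on the first entry of a list where the alias comes first. Deferring resolution until all names are known removes the ordering requirement.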
* iris: Update SURFACE_STATE addresses when setting sampler viewsKenneth Graunke2019-11-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | We may have replaced the backing storage for a texture buffer while it was unbound, at which point iris_rebind_buffer would not have caught it and updated it. We need to ensure that the current resource's address matches the one our SURFACE_STATE points at. If not, update addresses and re-upload the SURFACE_STATE. Shader images and buffers do not suffer from this problem because we re-stream the surface state on every set call, since there isn't a created CSO object for those with a saved SURFACE_STATE. Constant buffers are also currently re-streamed (we pitch the SURFACE_STATE on every set_constant_buffer call). Surfaces would need this treatment (as they're created CSOs) except that we never swap out their backing storage today (we only do it for buffers), so it's OK for now. Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene). Huge thanks to Andrii Simiklit for tracking down the problem - it was quite difficult to find! Also fixes Andrii's new Piglit test for the bug, 'arb_texture_buffer_object-re-init'. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
* iris: Maintain CPU-side SURFACE_STATE copies for views and surfaces.Kenneth Graunke2019-11-252-55/+136
| | | | | | | | | | | | | | | | | | | | | | When replacing the backing storage for texture buffers, image buffers, and so on, we may need to update the "Surface Base Address" field in any corresponding SURFACE_STATE. This is easier to accomplish if we have a copy on the CPU - we can just compare the current field, update it, and re-upload. This patch adds a CPU-side copy to the new iris_surface_state wrapper struct, and reworks allocation and upload to fill things out on the CPU copy first, then upload that to the GPU when finished. This will be necessary to fix iris_invalidate_resource bugs shortly. Technically, we never replace the backing storage for pipe_surfaces (render targets), so we don't need to make this change there. However, it's nice to have surfaces, sampler views, and image views handled similarly. Plus, if we ever wanted to swap out backing storage for busy textures, we'd need this infrastructure. v2: Properly free memory (caught by Andrii Simiklit)
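The "compare the current field, update it, and re-upload" flow described above can be sketched as follows. This is a simplified model under stated assumptions: the struct, its fields, and the upload counter are hypothetical stand-ins, not the iris driver's iris_surface_state or its upload path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU-side copy of a surface state. Only
 * the base address is modeled here. */
struct example_surface_state {
    uint64_t cpu_base_address; /* address recorded in the CPU copy */
    unsigned upload_count;     /* stand-in for "re-upload to the GPU" */
};

/* Compare the address in the CPU copy with the resource's current
 * address. If they differ, patch the CPU copy and re-upload it.
 * Returns true when a re-upload happened. */
static bool example_update_surface_address(struct example_surface_state *state,
                                           uint64_t current_bo_address)
{
    if (state->cpu_base_address == current_bo_address)
        return false;

    state->cpu_base_address = current_bo_address;
    state->upload_count++;
    return true;
}
```

Keeping the copy on the CPU means the stale address is detected with a plain comparison and no read-back from GPU memory, and the state is re-uploaded only when the backing storage changed.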
* iris: Create an "iris_surface_state" wrapper structKenneth Graunke2019-11-252-27/+36
| | | | | | | Today, we only have a state reference to the GPU buffer containing our uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy soon. Making a wrapper struct means we can talk about both together, and also put both in the field called "surface_state".
* iris: Drop 'old_address' parameter from iris_rebind_bufferKenneth Graunke2019-11-253-7/+6
| | | | | | | We can just compare the VERTEX_BUFFER_STATE address field to the current BO's address. When calling rebind, we've already updated the resource to the new buffer, but the state will have the old address.