path: root/src
Commit message | Author | Age | Files | Lines
* radeon/vcn: Add VP9 8K decode support | Leo Liu | 2019-10-30 | 1 | -1/+1
  Requires an increase of the context buffer size.
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Boyuan Zhang <[email protected]>
* aco: try to group together VMEM loads of the same resource | Rhys Perry | 2019-10-30 | 1 | -10/+56
  v2: remove accidental shaderInt16 change
  v2: simplify can_move_down initialization
  v2: simplify VMEM_CLAUSE_MAX_GRAB_DIST
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: don't schedule instructions through depending VMEM instructions | Daniel Schürmann | 2019-10-30 | 1 | -0/+3
  Previously, the scheduler tried to move up instructions from below depending VMEM instructions only to move them down again when scheduling the VMEM instruction.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: add can_reorder flags to load_ubo and load_constant | Daniel Schürmann | 2019-10-30 | 1 | -5/+9
  These flags were lost in a refactoring. Because of how our scheduler currently works, for now we add back the reorder flag for divergent loads only.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: only skip RAR dependencies if the variable is killed somewhere | Daniel Schürmann | 2019-10-30 | 1 | -21/+46
  This patch changes VMEM scheduling in a way that they can only be moved upwards by previous VMEM instructions but not downwards. This way, it improves the order of VMEM instructions in relation to their users.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: restrict scheduling depending on max_waves | Daniel Schürmann | 2019-10-30 | 1 | -9/+15
  Previously, we allowed all shaders to reduce the number of max_waves to as low as 5. Restricting this for shaders with low register demand increases the total number of waves while the VMEM def-use distances hardly change.
  This patch also changes the max number of move operations per MEM instruction.
  Reviewed-by: Rhys Perry <[email protected]>
* anv: Avoid emitting UBO surface states that won't be used | Jason Ekstrand | 2019-10-30 | 1 | -1/+12
  This shaves around 4-5% off of a CPU-limited example running with the Dawn WebGPU implementation.
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel/vec4: Set brw_stage_prog_data::has_ubo_pull | Jason Ekstrand | 2019-10-30 | 1 | -0/+2
  In 0e4a75f917, Ken added a flag to brw_stage_prog_data which indicates whether any UBO pulls ever occur. Unfortunately, he neglected to set the bit in the vec4 back-end. This was fine at the time because the optimization was intended for iris, which does not support Gen7, and using the vec4 back-end on Gen8+ requires an environment variable. We want to use this in Vulkan, which does support Gen7, so we want the information from the vec4 back-end as well as scalar.
  Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* radv: fix perftest options | Samuel Pitoiset | 2019-10-30 | 1 | -10/+9
  RADV_PERFTEST=outooforder was removed a while ago. This fixes dumping the options into hang reports.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: move nomemorycache debug option at the right place | Samuel Pitoiset | 2019-10-30 | 1 | -1/+1
  Fixes: 6571000071d ("radv: add debug option to turn off in memory cache")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix dumping SPIR-V into hang reports | Samuel Pitoiset | 2019-10-30 | 4 | -5/+13
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa: enable ARB_gpu_shader_int64 in compat profile | Tapani Pälli | 2019-10-30 | 3 | -77/+76
  Signed-off-by: Tapani Pälli <[email protected]>
  Acked-by: Marek Olšák <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: add [Program]Uniform*64ARB display list support | Tapani Pälli | 2019-10-30 | 1 | -0/+979
  This is required for int64 to be enabled in compat profile.
  Signed-off-by: Tapani Pälli <[email protected]>
  Acked-by: Marek Olšák <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* radv: Enable VK_KHR_timeline_semaphore. | Bas Nieuwenhuizen | 2019-10-30 | 2 | -1/+13
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add wait-before-submit support for timelines. | Bas Nieuwenhuizen | 2019-10-30 | 2 | -7/+154
  This is actually a non-threaded implementation. I'd summarize this as event-based submission.
  When a submit happens, we walk a tree of submissions that depend on the syncobj signal operations to be submitted, and if those submissions have no other dependencies we start to execute them immediately.
  Or, well, I still use a list to avoid issues with long chains and the stack size when using recursion.
  Reviewed-by: Samuel Pitoiset <[email protected]>
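The list-instead-of-recursion idea described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the RADV code: `struct submission`, `submit_ready()`, `MAX_DEPENDENTS` and the `submit_order` field are hypothetical names, and the real driver tracks syncobj waits rather than a plain counter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPENDENTS 4 /* hypothetical fixed fan-out, for brevity */

/* Hypothetical deferred submission: it can run once pending_deps is 0. */
struct submission {
    int pending_deps;                            /* waits not yet satisfied */
    struct submission *dependents[MAX_DEPENDENTS];
    size_t num_dependents;
    struct submission *next;                     /* link in the ready list */
    int submit_order;                            /* 0 = not submitted yet */
};

/* Submit `first` (which must have no pending dependencies) and every
 * submission that becomes ready as a result. Ready submissions are queued
 * on an explicit list, so a long dependency chain does not grow the stack.
 * Returns the number of submissions executed. */
static int submit_ready(struct submission *first)
{
    int submitted = 0;
    struct submission *head = first, *tail = first;

    assert(first->pending_deps == 0);
    first->next = NULL;

    while (head) {
        struct submission *s = head;

        s->submit_order = ++submitted; /* stand-in for the real queue submit */

        for (size_t i = 0; i < s->num_dependents; i++) {
            struct submission *d = s->dependents[i];

            assert(d->pending_deps > 0);
            if (--d->pending_deps == 0) {
                /* Last dependency satisfied: append to the ready list. */
                d->next = NULL;
                tail->next = d;
                tail = d;
            }
        }
        head = s->next;
    }
    return submitted;
}
```

A submission with several dependencies is only appended once its last dependency has been submitted, so each submission runs exactly once and always after everything it waits on.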
* radv: Add timelines with a VK_KHR_timeline_semaphore impl. | Bas Nieuwenhuizen | 2019-10-30 | 3 | -60/+504
  This does not fully do wait-before-submit; that will be done in a follow-up patch.
  For kernels without support for timeline syncobjs, this adds an implementation of non-shareable timelines using legacy syncobjs.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add temporary datastructure for submissions. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -28/+142
  So we can defer them.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split semaphore into two parts as enum+union. | Bas Nieuwenhuizen | 2019-10-30 | 2 | -38/+92
  This is in preparation for adding more types.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Always enable syncobj when supported for all fences/semaphores. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -2/+0
  This simplifies code for timeline semaphores by needing to support fewer configurations.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Improve fence signalling in QueueSubmit. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -13/+24
  Only signalling it once.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Do sparse binding in queue submission. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -60/+81
  So we have one place to do queue things if we end up deferring submissions.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out commandbuffer submission. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -163/+187
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Clean up unused variable. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -4/+3
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add an early exit in the secure compile if we already have the cache entries. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -0/+14
  Reviewed-by: Timothy Arceri <[email protected]>
* radv: Compute hashes in secure process for secure compilation. | Bas Nieuwenhuizen | 2019-10-30 | 1 | -0/+23
  To prevent poisoning arbitrary cache entries.
  Reviewed-by: Timothy Arceri <[email protected]>
* zink: drop nop descriptor-updates | Erik Faye-Lund | 2019-10-30 | 1 | -4/+5
  If there's nothing to be done, let's actually do nothing. Seems like a good idea.
  Reviewed-by: Dave Airlie <[email protected]>
* zink: use bitfield for dirty flagging | Erik Faye-Lund | 2019-10-30 | 2 | -7/+6
  Bitfields are a bit more idiomatic than explicit flags, and harder to get wrong.
  Reviewed-by: Dave Airlie <[email protected]>
* zink: use dynamic state for line-width | Erik Faye-Lund | 2019-10-30 | 5 | -13/+17
  This will lead to fewer pipelines in the cache, which is assumed to become our most unavoidable performance bottleneck down the line.
  Reviewed-by: Dave Airlie <[email protected]>
* zink: Use optimal layout instead of general. Reduces valid layer warnings. Fixes RADV image noise. | Duncan Hopkins | 2019-10-30 | 3 | -36/+113
  Reviewed-by: Erik Faye-Lund <[email protected]>
* radv: make use of radv_sc_read() | Timothy Arceri | 2019-10-30 | 3 | -39/+76
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_sc_read() helper | Timothy Arceri | 2019-10-30 | 2 | -0/+42
  This is a function with timeout support for reading from the pipe between processes used for secure compile.
  Initially we hardcode the timeout to 5 seconds. We can adjust the timeout limit in the future if needed.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
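A pipe read with a timeout, as this commit describes, is commonly built on `select()`. The sketch below shows that general pattern only; `read_with_timeout()` is a hypothetical name and signature, not the actual `radv_sc_read()` helper, and applying the timeout per chunk rather than per call is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `size` bytes from `fd`, waiting at most
 * `timeout_sec` seconds for each chunk of data to arrive. Returns false on
 * timeout, error, or if the writer closes the pipe early. */
static bool read_with_timeout(int fd, void *buf, size_t size, int timeout_sec)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL && fd >= 0);

    while (done < size) {
        fd_set fds;
        /* select() may modify the timeval, so reinitialize it each time. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        FD_ZERO(&fds);
        FD_SET(fd, &fds);

        int rc = select(fd + 1, &fds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return false;
        }
        if (rc == 0)
            return false;       /* nothing arrived within the timeout */

        ssize_t n = read(fd, p + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return false;
        }
        if (n == 0)
            return false;       /* EOF: the other end closed the pipe */

        done += (size_t)n;
    }
    return true;
}
```

With the 5-second limit mentioned in the commit message, a caller would pass `5` as `timeout_sec`; a process blocked on a dead peer then fails after the timeout instead of hanging forever.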
* radv: allow select() calls in secure compile | Timothy Arceri | 2019-10-30 | 1 | -1/+5
  This will be used in the following patch to support timeouts for reading the pipe between processes.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mapi: Improve the x86 tsd stubs performance. | Lepton Wu | 2019-10-29 | 1 | -5/+6
  This skips touching %ebx most of the time. glGetString performance increased from 114M/s to 120M/s on my desktop.
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Lepton Wu <[email protected]>
* mapi: Inline call x86_current_tls. | Lepton Wu | 2019-10-29 | 1 | -4/+8
  This saves one return. A simple benchmark that calls glGetString repeatedly on my desktop shows calls per second improving from 123M to 141M.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1997
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Lepton Wu <[email protected]>
* mapi: Clean up entry_patch_public for x86 tls | Lepton Wu | 2019-10-29 | 1 | -10/+7
  Remove hard coded 16 and use entry_generate_or_patch to patch public stubs. The generated code actually is slightly tighter than before since the "nop" instructions before the final "jmp" get removed.
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Lepton Wu <[email protected]>
* mapi: split entry_generate_or_patch for x86 tls | Lepton Wu | 2019-10-29 | 1 | -5/+16
  The code works exactly the same as before. Just split this function out so we can reuse it.
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Lepton Wu <[email protected]>
* mapi: Adapted libglvnd x86 tsd changes | Jonathan Gray | 2019-10-29 | 1 | -5/+11
  The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not generate PIC (position-independent code). This causes text relocations, which cause problems on recent versions of FreeBSD, OpenBSD, and Android.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Lepton Wu <[email protected]>
* spirv: Don't fail if multiple ordering semantics bits are set | Caio Marcelo de Oliveira Filho | 2019-10-29 | 1 | -9/+30
  Vulkan requires that only one bit for the ordering is set, but old versions of GLSLang just set all the bits. This was fixed as part of https://github.com/KhronosGroup/glslang/commit/c51287d744fb6e7e9ccc09f6f8451e6c64b1dad6 but we can still find older versions (or shaders compiled with them) around.
  So instead of failing, emit a warning and fall back to the effective result of any combination of multiple bits: AcquireRelease.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2018
  Reviewed-by: Jason Ekstrand <[email protected]>
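The fallback described in this commit can be sketched as below. The bit values are the ordering bits of the SPIR-V Memory Semantics mask; `normalize_ordering()` and the `SEM_*` names are hypothetical and only illustrate the idea, they are not the code Mesa uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Ordering bits of the SPIR-V Memory Semantics mask (names are local). */
enum {
    SEM_ACQUIRE         = 0x2,
    SEM_RELEASE         = 0x4,
    SEM_ACQUIRE_RELEASE = 0x8,
    SEM_SEQ_CST         = 0x10,
    SEM_ORDER_MASK      = SEM_ACQUIRE | SEM_RELEASE |
                          SEM_ACQUIRE_RELEASE | SEM_SEQ_CST,
};

/* Hypothetical helper: if more than one ordering bit is set, warn and
 * replace the ordering bits with AcquireRelease. All non-ordering bits
 * (storage class semantics and so on) are passed through untouched. */
static uint32_t normalize_ordering(uint32_t semantics)
{
    uint32_t order = semantics & SEM_ORDER_MASK;

    /* x & (x - 1) clears the lowest set bit, so a non-zero result means
     * at least two ordering bits were set. */
    if (order & (order - 1)) {
        fprintf(stderr, "warning: multiple memory ordering bits set (0x%x), "
                        "assuming AcquireRelease\n", (unsigned)order);
        semantics = (semantics & ~(uint32_t)SEM_ORDER_MASK) |
                    SEM_ACQUIRE_RELEASE;
    }

    assert(((semantics & SEM_ORDER_MASK) &
            ((semantics & SEM_ORDER_MASK) - 1)) == 0);
    return semantics;
}
```

For example, a mask with both Acquire and Release set (`0x6`) comes back as `0x8`, while a mask with a single ordering bit, or none, is returned unchanged.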
* intel/isl: Allow stencil buffer to support compression on Gen12+ | Sagar Ghuge | 2019-10-29 | 1 | -2/+3
  v2: (Nanley Chery)
  - Fix commit title
  - Fix comment
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Resolve stencil resource prior to copy or used by CPU | Sagar Ghuge | 2019-10-29 | 1 | -9/+19
  v2: Decide aux usage in get_copy_region_aux_settings (Nanley Chery)
  v3: Use isl_surf_usage_is_stencil function (Nanley Chery)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Prepare resources before stencil blit operation | Sagar Ghuge | 2019-10-29 | 1 | -7/+52
  We have to resolve destination surfaces if we are blitting to and from the same surface.
  v2: Revert unrelated change (Nanley Chery)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Prepare depth resource if clear_depth enable | Sagar Ghuge | 2019-10-29 | 1 | -2/+2
  Avoid preparing the depth resource if we did a fast depth clear before.
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Prepare stencil resource before clear depth stencil | Sagar Ghuge | 2019-10-29 | 1 | -2/+10
  Let the aux surface state tracker track the stencil buffer's aux state while clearing the depth stencil buffer.
  v2: Fix condition check (Nanley Chery)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Resolve stencil buffer lossless compression with WM_HZ_OP packet | Sagar Ghuge | 2019-10-29 | 1 | -9/+13
  Even though stencil buffer compression looks like regular lossless color compression without fast clear support, we have to resolve the stencil buffer with a WM_HZ_OP packet.
  v2: Check if resource is stencil with helper function (Nanley Chery)
  v3: Remove unnecessary included file (Nanley Chery)
  v4: (Nanley Chery)
  - Avoid stencil buffer aux state transition by improving condition check
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Set stencil resolve enable bit | Sagar Ghuge | 2019-10-29 | 1 | -4/+17
  When set, the stencil buffer is filled with the true stencil values and we have to disable the stencil buffer clear enable bit.
  v2: 1) Refactor code a little bit (Nanley Chery)
      2) Fix assertion (Nanley Chery)
  v3: 1) Remove unnecessary assignment (Nanley Chery)
      2) Fix GEN_GEN check (Nanley Chery)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel: Track stencil aux usage on Gen12+ | Sagar Ghuge | 2019-10-29 | 4 | -0/+10
  Set the stencil compression enable and control surface enable bits if stencil buffer lossless compression is enabled.
  v2: Remove unnecessary GEN_GEN check (Nanley Chery)
  v3: (Nanley Chery)
  - Change commit subject tag from intel/isl to intel
  - Keep assignment order correct
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Add helper function for stencil buffer resolve | Sagar Ghuge | 2019-10-29 | 2 | -0/+34
  On Gen12+, the stencil buffer's lossless compression should be resolved with a WM_HZ_OP packet.
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Assign correct view while clearing depth stencil | Sagar Ghuge | 2019-10-29 | 1 | -1/+1
  We never saw any failures regarding this typo, but it's good to assign the correct stencil view while constructing blorp_params.
  Fixes: 0cabf93b80d0 "intel/blorp: Add an entrypoint for clearing depth and stencil"
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* genxml/gen12: Add Stencil Buffer Resolve Enable bit | Sagar Ghuge | 2019-10-29 | 1 | -0/+1
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* iris: Allocate main and aux surfaces together | Nanley Chery | 2019-10-29 | 1 | -34/+21
  On Gen12, the CCS buffer address doesn't have to be referenced in state packets. In the case of a stencil buffer with CCS, the kernel won't know the location of the CCS unless an extra call is made to pin its address. To avoid this extra call, make the CCS part of the main surface.
  v2. Update comment above bo_size. (Jordan)
  Reviewed-by: Jordan Justen <[email protected]>