summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* anv/TODO: Update the HiZ taskNanley Chery2016-10-071-1/+1
| | | | | | Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* anv: Enable fast depth clearsNanley Chery2016-10-072-2/+35
| | | | | | | | | Provides an FPS increase of ~30% on the Sascha triangle and multisampling demos. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* anv/cmd_buffer: Enable rendering to HiZChad Versace2016-10-072-4/+40
| | | | | | | | | | | | | | | | | | | | | Nanley Chery: (rebase) - Resolve conflicts with new anv_batch_emit macro (amend) - Handle a QPitch TODO - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems - Only use HiZ for single-subpass renderpasses - Emit the HiZ instruction before the stencil instruction to follow the optimized clear sequence specified in the PRMs - Don't modify clear params - Enable resolves when a HiZ buffer is used to ensure depth buffer validity Provides an FPS increase of ~15% on the Sascha triangle and multisampling demos. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Add code for performing HZ operationsNanley Chery2016-10-073-0/+197
| | | | | | | | | Create a function that performs one of three HiZ operations - depth/stencil clears, HiZ resolve, and depth resolves. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* anv/image: Memset hiz surfaces to 0 when binding memoryJason Ekstrand2016-10-071-1/+30
| | | | | | | | | Nanley Chery (amend): - Change memset value from 0xff to 0 (a defined value for HiZ). Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Move BindImageMemory to anv_image.cJason Ekstrand2016-10-072-20/+20
| | | | | | Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Allocate hiz surfaceChad Versace2016-10-071-3/+34
| | | | | | | | | | | | | | | | Nanley Chery: (rebase) - Use isl_surf_get_hiz_surf() (amend) - Only add a HiZ surface onto a depth/stencil attachment - Add comment above HiZ surface addition - Hide HiZ behind INTEL_VK_HIZ prior to BDW - Disable HiZ for untested cases - Remove DISABLE_AUX_BIT instead of preventing it from being added Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* anv: Add func anv_image_has_hiz()Chad Versace2016-10-071-0/+10
| | | | | Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Add anv_image::hiz_surfaceChad Versace2016-10-071-0/+2
| | | | | | | Unused. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* isl: Correct a comment in the isl_format enumNanley Chery2016-10-071-1/+1
| | | | | | | | HiZ is not a color surface, but an auxiliary depth surface. Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* gallium: add missing zero-init for resource templatesRob Clark2016-10-0710-0/+17
| | | | | | | Mostly test code, plus one spot I noticed in r600. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno: don't try to shadow layered texturesRob Clark2016-10-071-0/+3
| | | | | | | | We will only hit this with multi-planar YUV external images, so we would probably never hit this code path in the first place. But if we did, it wouldn't do the right thing so just bail. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx+a4xx: fix clip-plane lowering stateRob Clark2016-10-072-0/+6
| | | | | | | If enabled clip-planes have changed, we need to mark program state dirty. Signed-off-by: Rob Clark <[email protected]>
* glsl: Let cache_test build when the shader cache is not enabledIan Romanick2016-10-071-0/+4
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Aaron Watry <[email protected]>
* anv: pipeline cache: fix return value of vkGetPipelineCacheDataLionel Landwerlin2016-10-071-2/+5
| | | | | | | | | | | | | | | | | According to the spec - 9.6. Pipeline Cache : If pDataSize is less than the maximum size that can be retrieved by the pipeline cache, at most pDataSize bytes will be written to pData, and vkGetPipelineCacheData will return VK_INCOMPLETE. Fixes the following test from Vulkan CTS : dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_fragment_stage dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_geometry_stage_fragment_stage dEQP-VK.pipeline.cache.misc_tests.invalid_size_test Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* util: remove unused variableTimothy Arceri2016-10-071-4/+2
| | | | | | Also initialise page at declaration. Reviewed-by: Nicolai Hähnle <[email protected]>
* loader/dri3: import prime buffers in the currently-bound screenMartin Peres2016-10-071-1/+11
| | | | | | | | | | | | | | | | | | This tries to mirrors the codepath taken by DRI2 in IntelSetTexBuffer2() and fixes many applications when using DRI3: - Totem with libva on hw-accelerated decoding - obs-studio, using Window Capture (Xcomposite) as a Source - gstreamer with VAAPI v2: - introduce get_dri_screen() in the dri3 loader's vtable (krh) Tested-by: Timo Aaltonen <[email protected]> Tested-by: Ionut Biru <[email protected]> Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71759 Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Martin Peres <[email protected]>
* loader/dri3: add get_dri_screen() to the vtableMartin Peres2016-10-073-0/+24
| | | | | | | | | This allows querying the current active screen from the loader's common code. Cc: [email protected] Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Martin Peres <[email protected]>
* anv/entrypoints: Save off the entire devinfo rather than a pointerJason Ekstrand2016-10-061-5/+5
| | | | | | | | | Since the gen_device_info structs are no longer just constant memory, a pointer to one is not a pointer to something in the .data section so we shouldn't be storing it in a static variable. Instead, we should just store the entire device_info structure. Signed-off-by: Jason Ekstrand <[email protected]>
* radv: drop all uint for unsigned.Dave Airlie2016-10-071-8/+8
| | | | Signed-off-by: Dave Airlie <[email protected]>
* vc4: Don't worry about partial Z/S clear if the other is already cleared.Eric Anholt2016-10-061-3/+7
| | | | | | | | | We have to be careful to not smash the value they're clearing to, but other than that we're fine. Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z). Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)
* vc4: Try to fix the HW-2116 workaround.Eric Anholt2016-10-061-9/+10
| | | | | | | | | | | | | | | We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround. This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more. Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup. Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)
* vc4: Drop dead argument from vc4_start_draw().Eric Anholt2016-10-061-3/+3
|
* vc4: Fix fallback to quad clears of depth in GLX.Eric Anholt2016-10-064-25/+64
| | | | | The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.
* vc4: Add the format name in miptree_debug.Eric Anholt2016-10-061-2/+4
| | | | | I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.
* vc4: Fix perf debug formatting on partial Z/S clear.Eric Anholt2016-10-061-1/+1
|
* vc4: Drop destination register when it's unused.Eric Anholt2016-10-061-1/+22
| | | | | | | This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation -- the allocator should have always trivially colored these nodes, before. This commit is just to make QIR code failing more intelligible when register allocation fails.
* vc4: Fix live intervals analysis for screening defs in if statements.Eric Anholt2016-10-063-5/+20
| | | | | | | | | If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexcuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix simulator when more than one vc4_screen is opened.Eric Anholt2016-10-063-3/+39
| | | | | | We would assertion fail in setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.Eric Anholt2016-10-061-0/+2
| | | | | Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* anv/cmd_buffer: Move the clear_subpasses calls to set_subpassJason Ekstrand2016-10-061-2/+2
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Don't call set_subpass in a secondaryJason Ekstrand2016-10-064-48/+3
| | | | | | | | | | | | | Initially, we had intended set_subpass to be an interesting function that did whatever (presumably a lot) setup we needed for a subpass. In reality, it just sets a pointer and a dirty bit and then emits depth and stencil state. When we call BeginCommandBuffer on a secondary, there's no point in setting depth and stencil state since it will already be set by the primary. Instead, the only thing we need to do at the start of a secondary is set the subpass pointer and the dirty bit. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Rework descriptor dirtying in set_subpassJason Ekstrand2016-10-061-1/+5
| | | | | | | | | We have a DIRTY_RENDER_TARGETS flag and that makes a lot more sense than just dirtying fragment descriptors. We're checking for it in some of the gen7 code but unfortunately, nothing was setting it and it didn't do what it was supposed to do in cmd_buffer_flush_state. Signed-off-by: Jason Ekstrand <[email protected]>
* anv/wsi: Advertise UNORM formats as well as sRGBJason Ekstrand2016-10-062-0/+5
| | | | | | | | | | | | | | Because WSI images are created with VkImageCreateInfo::flags explicitly set to 0, they don't ever have the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set. This means that you can't create an image view of it with a different format so applications can't render directly in sRGB (without automatic encoding) unless we actually advertise UNORM formats. There are a lot of applications that want to do their own sRGB conversion, so we should allow for that. We do, however, make UNORM come after sRGB in the list so that the default for dumb apps that just grab the first thing is to render in linear and let the sRGB conversion happen automatically. Signed-off-by: Jason Ekstrand <[email protected]>
* radv: fix configure.ac checkDave Airlie2016-10-071-1/+1
| | | | | | This should be positive test. Signed-off-by: Dave Airlie <[email protected]>
* radv: Skip already signalled fences.Gustaw Smolarczyk2016-10-071-3/+3
| | | | | | | | | If the user created a fence with VK_FENCE_CREATE_SIGNALED_BIT set, we shouldn't fail to wait for a fence if it was not submitted since that is not necessary. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: add initial non-conformant radv vulkan driverDave Airlie2016-10-0763-8/+32093
| | | | | | | | | | | | | | | | | | | | | | | This squashes all the radv development up until now into one for merging. History can be found: https://github.com/airlied/mesa/tree/semi-interesting This requires llvm 3.9 and is in no way considered a conformant vulkan implementation. It can run a number of vulkan applications, and supports all GPUs using the amdgpu kernel driver. Thanks to Intel for providing anv and spirv->nir, and Emil Velikov for reviewing build integration. Parts of this are: Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> Authors: Bas Nieuwenhuizen and Dave Airlie Signed-off-by: Dave Airlie <[email protected]>
* nv50/ir: fix wrong check when optimizing MAD to SHLADDSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | | Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced few days ago which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* intel: aubinator: use getopt to parse argumentsLionel Landwerlin2016-10-071-57/+33
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Sirisha Gandikota <[email protected]>
* nvc0: dump program binary only when NV50_PROG_DEBUG is setSamuel Pitoiset2016-10-071-1/+1
| | | | | | | | When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output. Signed-off-by: Samuel Pitoiset <[email protected]>
* nir: Fix the control flow tests for nir_loop_first_block changesJason Ekstrand2016-10-061-1/+1
| | | | | | | | | Commit 2ed17d46de045404042f13c6591895a1cf31b167 changed nir_loop_first_cf_node and friends to return a nir_block instead of a nir_cf_node. This broke one of the NIR control flow tests. Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98128
* docs: mark ARB_compute_variable_group_size as done for nvc0Samuel Pitoiset2016-10-072-1/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0: expose ARB_compute_variable_group_sizeSamuel Pitoiset2016-10-071-2/+6
| | | | | | | | | Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: set number of threads/block for variable local sizeSamuel Pitoiset2016-10-071-0/+2
| | | | | | | | | | | | When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows to use 64 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <[email protected]>
* st/mesa: expose ARB_compute_variable_group_sizeSamuel Pitoiset2016-10-071-0/+22
| | | | | | | | | | | | | This extension is only exposed if the underlying driver supports ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK is set. v3: - initialize max_variable_threads_per_block to 0 v2: - expose the ext based on that new cap Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: add support for dispatching a variable local sizeSamuel Pitoiset2016-10-071-3/+12
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZESamuel Pitoiset2016-10-071-0/+2
| | | | | | | | | gl_LocalGroupSizeARB can be translated into TGSI_SEMANTIC_BLOCK_SIZE which represents the block size in threads. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCKSamuel Pitoiset2016-10-077-1/+15
| | | | | | | | | v3: - use a new case statement in r600_pipe_common.c - fix compilation of softpipe... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: add gl_LocalGroupSizeARB as a system valueSamuel Pitoiset2016-10-072-0/+7
| | | | | | | | v2: - only add it if the ext is enabled (Ilia) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl/linker: handle errors when a variable local size is usedSamuel Pitoiset2016-10-071-2/+23
| | | | | | | | | | | | | Compute shaders can now include a fixed local size as defined by ARB_compute_shader or a variable size as defined by ARB_compute_variable_group_size. v2: - update formatting spec quotations (Ian) - various cosmetic changes (Ian) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>