Commit message (Author, Age, Files, Lines)
* freedreno/a3xx+a4xx: fix clip-plane lowering state (Rob Clark, 2016-10-07, 2 files, -0/+6)
  If enabled clip-planes have changed, we need to mark program state dirty.

  Signed-off-by: Rob Clark <[email protected]>
* glsl: Let cache_test build when the shader cache is not enabled (Ian Romanick, 2016-10-07, 1 file, -0/+4)
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Tested-by: Aaron Watry <[email protected]>
* anv: pipeline cache: fix return value of vkGetPipelineCacheData (Lionel Landwerlin, 2016-10-07, 1 file, -2/+5)
  According to the spec - 9.6. Pipeline Cache:

    If pDataSize is less than the maximum size that can be retrieved by the pipeline cache, at most pDataSize bytes will be written to pData, and vkGetPipelineCacheData will return VK_INCOMPLETE.

  Fixes the following tests from Vulkan CTS:

    dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_fragment_stage
    dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_geometry_stage_fragment_stage
    dEQP-VK.pipeline.cache.misc_tests.invalid_size_test

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
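The spec rule quoted above can be sketched roughly as follows. This is a minimal illustration, not anv's code: `sketch_result`, `sketch_get_cache_data()` and the flat `blob` buffer are hypothetical stand-ins for `VkResult`, `vkGetPipelineCacheData()` and the driver's serialized cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for VK_SUCCESS and VK_INCOMPLETE. */
enum sketch_result { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 };

/*
 * Copy the serialized cache into the caller's buffer.
 * - p_data == NULL: only report the full size in *p_size.
 * - *p_size too small: write what fits and report truncation.
 * - otherwise: write everything and store the byte count in *p_size.
 */
enum sketch_result
sketch_get_cache_data(const uint8_t *blob, size_t blob_size,
                      size_t *p_size, void *p_data)
{
   assert(p_size != NULL);

   if (p_data == NULL) {
      *p_size = blob_size;
      return SKETCH_SUCCESS;
   }

   if (*p_size < blob_size) {
      /* At most *p_size bytes are written; the caller learns it was cut. */
      memcpy(p_data, blob, *p_size);
      return SKETCH_INCOMPLETE;
   }

   memcpy(p_data, blob, blob_size);
   *p_size = blob_size;
   return SKETCH_SUCCESS;
}
```

A caller would typically query the size with a NULL buffer first, allocate, and then call again; a second call with a smaller buffer is the case the CTS tests above exercise.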
* util: remove unused variable (Timothy Arceri, 2016-10-07, 1 file, -4/+2)
  Also initialise page at declaration.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* loader/dri3: import prime buffers in the currently-bound screen (Martin Peres, 2016-10-07, 1 file, -1/+11)
  This tries to mirror the codepath taken by DRI2 in IntelSetTexBuffer2() and fixes many applications when using DRI3:
   - Totem with libva on hw-accelerated decoding
   - obs-studio, using Window Capture (Xcomposite) as a Source
   - gstreamer with VAAPI

  v2:
   - introduce get_dri_screen() in the dri3 loader's vtable (krh)

  Tested-by: Timo Aaltonen <[email protected]>
  Tested-by: Ionut Biru <[email protected]>
  Cc: [email protected]
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71759
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Martin Peres <[email protected]>
* loader/dri3: add get_dri_screen() to the vtable (Martin Peres, 2016-10-07, 3 files, -0/+24)
  This allows querying the current active screen from the loader's common code.

  Cc: [email protected]
  Reviewed-by: Emil Velikov <[email protected]>
  Signed-off-by: Martin Peres <[email protected]>
* anv/entrypoints: Save off the entire devinfo rather than a pointer (Jason Ekstrand, 2016-10-06, 1 file, -5/+5)
  Since the gen_device_info structs are no longer just constant memory, a pointer to one is not a pointer to something in the .data section so we shouldn't be storing it in a static variable. Instead, we should just store the entire device_info structure.

  Signed-off-by: Jason Ekstrand <[email protected]>
* radv: drop all uint for unsigned. (Dave Airlie, 2016-10-07, 1 file, -8/+8)
  Signed-off-by: Dave Airlie <[email protected]>
* vc4: Don't worry about partial Z/S clear if the other is already cleared. (Eric Anholt, 2016-10-06, 1 file, -3/+7)
  We have to be careful to not smash the value they're clearing to, but other than that we're fine.

  Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z).

  Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)
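One way to read the rule described above is sketched below. This is only an interpretation of the commit message, not vc4's code: `job_sketch`, `sketch_try_fast_zs_clear()` and the `SKETCH_CLEAR_*` bits are hypothetical names, and a `false` return stands for "fall back to a quad clear".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CLEAR_DEPTH   (1u << 0)
#define SKETCH_CLEAR_STENCIL (1u << 1)
#define SKETCH_CLEAR_ZS      (SKETCH_CLEAR_DEPTH | SKETCH_CLEAR_STENCIL)

/* Hypothetical per-job clear state. */
struct job_sketch {
   uint32_t cleared;       /* components already fast-cleared in this job */
   float    clear_depth;   /* pending depth clear value */
   uint8_t  clear_stencil; /* pending stencil clear value */
};

/* Returns true if the fast clear path can be used for this request. */
bool
sketch_try_fast_zs_clear(struct job_sketch *job, uint32_t buffers,
                         float depth, uint8_t stencil)
{
   uint32_t zs = buffers & SKETCH_CLEAR_ZS;
   assert(zs != 0);

   /* The component(s) of the combined Z/S buffer that are NOT requested. */
   uint32_t other = SKETCH_CLEAR_ZS & ~zs;

   /* A partial clear is only safe if the other half is already cleared. */
   if ((job->cleared & other) != other)
      return false;

   /* Update only the requested values so the other one is not smashed. */
   if (zs & SKETCH_CLEAR_DEPTH)
      job->clear_depth = depth;
   if (zs & SKETCH_CLEAR_STENCIL)
      job->clear_stencil = stencil;
   job->cleared |= zs;
   return true;
}
```

With this shape, the `glClear(Z|S); glClear(Z)` sequence mentioned above stays on the fast path, and the second call only replaces the depth value.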
* vc4: Try to fix the HW-2116 workaround. (Eric Anholt, 2016-10-06, 1 file, -9/+10)
  We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround.

  This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more.

  Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup.

  Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)
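The counting bug described above follows a common pattern: an increment placed after an early return only runs once. A minimal sketch, with hypothetical names (`batch_sketch`, `sketch_start_draw()`, `sketch_draw()`) rather than the real vc4 structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch state; only the field name draw_calls_queued
 * is taken from the commit message. */
struct batch_sketch {
   bool     draw_started;
   unsigned draw_calls_queued;
};

/* One-time setup per batch. An increment placed at the end of this
 * function would be skipped on every call after the first, because
 * of the early return. */
void
sketch_start_draw(struct batch_sketch *b)
{
   assert(b != NULL);
   if (b->draw_started)
      return;
   b->draw_started = true;
}

/* Counting in the caller makes every draw visible to the workaround,
 * while fast clears (which never reach this function) are not counted. */
void
sketch_draw(struct batch_sketch *b)
{
   sketch_start_draw(b);
   b->draw_calls_queued++;
}
```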
* vc4: Drop dead argument from vc4_start_draw(). (Eric Anholt, 2016-10-06, 1 file, -3/+3)
* vc4: Fix fallback to quad clears of depth in GLX. (Eric Anholt, 2016-10-06, 4 files, -25/+64)
  The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.
* vc4: Add the format name in miptree_debug. (Eric Anholt, 2016-10-06, 1 file, -2/+4)
  I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.
* vc4: Fix perf debug formatting on partial Z/S clear. (Eric Anholt, 2016-10-06, 1 file, -1/+1)
* vc4: Drop destination register when it's unused. (Eric Anholt, 2016-10-06, 1 file, -1/+22)
  This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation -- the allocator should have always trivially colored these nodes, before.

  This commit is just to make the QIR code more intelligible when register allocation fails.
* vc4: Fix live intervals analysis for screening defs in if statements. (Eric Anholt, 2016-10-06, 3 files, -5/+20)
  If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexecuted channels, we don't care what's in there).

  Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix simulator when more than one vc4_screen is opened. (Eric Anholt, 2016-10-06, 3 files, -3/+39)
  We would hit an assertion failure when setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU. (Eric Anholt, 2016-10-06, 1 file, -0/+2)
  Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* anv/cmd_buffer: Move the clear_subpasses calls to set_subpass (Jason Ekstrand, 2016-10-06, 1 file, -2/+2)
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Don't call set_subpass in a secondary (Jason Ekstrand, 2016-10-06, 4 files, -48/+3)
  Initially, we had intended set_subpass to be an interesting function that did whatever setup (presumably a lot) we needed for a subpass. In reality, it just sets a pointer and a dirty bit and then emits depth and stencil state. When we call BeginCommandBuffer on a secondary, there's no point in setting depth and stencil state since it will already be set by the primary. Instead, the only thing we need to do at the start of a secondary is set the subpass pointer and the dirty bit.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* anv/cmd_buffer: Rework descriptor dirtying in set_subpass (Jason Ekstrand, 2016-10-06, 1 file, -1/+5)
  We have a DIRTY_RENDER_TARGETS flag and that makes a lot more sense than just dirtying fragment descriptors. We're checking for it in some of the gen7 code but unfortunately, nothing was setting it and it didn't do what it was supposed to do in cmd_buffer_flush_state.

  Signed-off-by: Jason Ekstrand <[email protected]>
* anv/wsi: Advertise UNORM formats as well as sRGB (Jason Ekstrand, 2016-10-06, 2 files, -0/+5)
  Because WSI images are created with VkImageCreateInfo::flags explicitly set to 0, they don't ever have the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set. This means that you can't create an image view of it with a different format so applications can't render directly in sRGB (without automatic encoding) unless we actually advertise UNORM formats. There are a lot of applications that want to do their own sRGB conversion, so we should allow for that.

  We do, however, make UNORM come after sRGB in the list so that the default for dumb apps that just grab the first thing is to render in linear and let the sRGB conversion happen automatically.

  Signed-off-by: Jason Ekstrand <[email protected]>
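The ordering argument above can be illustrated with a tiny format list. The enum values and helper names here are hypothetical, and the specific B8G8R8A8 pair is an assumption for illustration; the commit message itself only says "UNORM formats as well as sRGB".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the advertised surface formats. */
enum fmt_sketch { FMT_SKETCH_B8G8R8A8_SRGB, FMT_SKETCH_B8G8R8A8_UNORM };

/* sRGB first: an app that blindly takes entry 0 renders in linear
 * and gets the sRGB encoding applied automatically. */
static const enum fmt_sketch sketch_formats[] = {
   FMT_SKETCH_B8G8R8A8_SRGB,
   FMT_SKETCH_B8G8R8A8_UNORM,
};

size_t
sketch_format_count(void)
{
   return sizeof(sketch_formats) / sizeof(sketch_formats[0]);
}

/* What a "grab the first thing" application ends up with. */
enum fmt_sketch
sketch_pick_first(void)
{
   assert(sketch_format_count() > 0);
   return sketch_formats[0];
}

/* An app doing its own sRGB conversion searches for UNORM explicitly. */
bool
sketch_supports(enum fmt_sketch wanted)
{
   for (size_t i = 0; i < sketch_format_count(); i++) {
      if (sketch_formats[i] == wanted)
         return true;
   }
   return false;
}
```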
* radv: fix configure.ac check (Dave Airlie, 2016-10-07, 1 file, -1/+1)
  This should be a positive test.

  Signed-off-by: Dave Airlie <[email protected]>
* radv: Skip already signalled fences. (Gustaw Smolarczyk, 2016-10-07, 1 file, -3/+3)
  If the user created a fence with VK_FENCE_CREATE_SIGNALED_BIT set, we shouldn't fail to wait for a fence if it was not submitted since that is not necessary.

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
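A rough sketch of the wait loop this describes, assuming a fence tracks two flags. `fence_sketch` and `sketch_wait_for_fences()` are hypothetical, and the "GPU completed" step is faked so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence state. */
struct fence_sketch {
   bool submitted; /* a queue submission references this fence */
   bool signalled; /* set at creation (SIGNALED_BIT) or on completion */
};

/* Returns true once every fence is signalled. A false return stands
 * in for "not ready": an unsignalled fence that was never submitted. */
bool
sketch_wait_for_fences(struct fence_sketch *fences, size_t count)
{
   assert(fences != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      /* Check the signalled flag first, so a fence created in the
       * signalled state passes even though nothing was submitted. */
      if (fences[i].signalled)
         continue;

      if (!fences[i].submitted)
         return false;

      /* Placeholder for the real wait on the submission. */
      fences[i].signalled = true;
   }
   return true;
}
```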
* radv: add initial non-conformant radv vulkan driver (Dave Airlie, 2016-10-07, 63 files, -8/+32093)
  This squashes all the radv development up until now into one for merging.

  History can be found at: https://github.com/airlied/mesa/tree/semi-interesting

  This requires llvm 3.9 and is in no way considered a conformant vulkan implementation. It can run a number of vulkan applications, and supports all GPUs using the amdgpu kernel driver.

  Thanks to Intel for providing anv and spirv->nir, and Emil Velikov for reviewing build integration.

  Parts of this are:
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Acked-by: Edward O'Callaghan <[email protected]>

  Authors: Bas Nieuwenhuizen and Dave Airlie
  Signed-off-by: Dave Airlie <[email protected]>
* nv50/ir: fix wrong check when optimizing MAD to SHLADD (Samuel Pitoiset, 2016-10-07, 1 file, -1/+1)
  Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced a few days ago which breaks NV50 because SHLADD is not supported there.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* intel: aubinator: use getopt to parse arguments (Lionel Landwerlin, 2016-10-07, 1 file, -57/+33)
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Sirisha Gandikota <[email protected]>
* nvc0: dump program binary only when NV50_PROG_DEBUG is set (Samuel Pitoiset, 2016-10-07, 1 file, -1/+1)
  When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output.

  Signed-off-by: Samuel Pitoiset <[email protected]>
* nir: Fix the control flow tests for nir_loop_first_block changes (Jason Ekstrand, 2016-10-06, 1 file, -1/+1)
  Commit 2ed17d46de045404042f13c6591895a1cf31b167 changed nir_loop_first_cf_node and friends to return a nir_block instead of a nir_cf_node. This broke one of the NIR control flow tests.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98128
* docs: mark ARB_compute_variable_group_size as done for nvc0 (Samuel Pitoiset, 2016-10-07, 2 files, -1/+2)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* nvc0: expose ARB_compute_variable_group_size (Samuel Pitoiset, 2016-10-07, 1 file, -2/+6)
  Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread.

  v4:
   - use 512 threads on Fermi, 1024 on Kepler+

  Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: set number of threads/block for variable local size (Samuel Pitoiset, 2016-10-07, 1 file, -0/+2)
  When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows the use of 64 GPRs/thread.

  v4:
   - use 512 threads on Fermi, 1024 on Kepler+

  Signed-off-by: Samuel Pitoiset <[email protected]>
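The SIGFPE comes from an integer division by a thread count of 0. A minimal sketch of the guard, assuming the register budget is simply the register file divided by threads per block. `sketch_max_gprs_per_thread()` is a hypothetical helper, and the register file sizes used in the note below (32768 and 65536) are assumptions picked because they reproduce the 64 and 32 GPRs/thread figures quoted in these commits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * fixed_threads is the product of the shader's fixed local size,
 * or 0 when the shader declares a variable local size. In that case
 * the driver's maximum variable block size is used instead, which
 * also avoids dividing by zero.
 */
uint32_t
sketch_max_gprs_per_thread(uint32_t regfile_size, uint32_t fixed_threads,
                           uint32_t max_variable_threads)
{
   uint32_t threads = fixed_threads ? fixed_threads : max_variable_threads;

   assert(threads != 0);
   return regfile_size / threads;
}
```

Under those assumed sizes, 512 threads on Fermi and 1024 on Kepler+ both leave 64 GPRs/thread, while 1024 threads on Fermi would leave only 32.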
* st/mesa: expose ARB_compute_variable_group_size (Samuel Pitoiset, 2016-10-07, 1 file, -0/+22)
  This extension is only exposed if the underlying driver supports ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK is set.

  v3:
   - initialize max_variable_threads_per_block to 0
  v2:
   - expose the ext based on that new cap

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: add support for dispatching a variable local size (Samuel Pitoiset, 2016-10-07, 1 file, -3/+12)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE (Samuel Pitoiset, 2016-10-07, 1 file, -0/+2)
  gl_LocalGroupSizeARB can be translated into TGSI_SEMANTIC_BLOCK_SIZE which represents the block size in threads.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK (Samuel Pitoiset, 2016-10-07, 7 files, -1/+15)
  v3:
   - use a new case statement in r600_pipe_common.c
   - fix compilation of softpipe...

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: add gl_LocalGroupSizeARB as a system value (Samuel Pitoiset, 2016-10-07, 2 files, -0/+7)
  v2:
   - only add it if the ext is enabled (Ilia)

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl/linker: handle errors when a variable local size is used (Samuel Pitoiset, 2016-10-07, 1 file, -2/+23)
  Compute shaders can now include a fixed local size as defined by ARB_compute_shader or a variable size as defined by ARB_compute_variable_group_size.

  v2:
   - update formatting spec quotations (Ian)
   - various cosmetic changes (Ian)

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: reject compute shaders with fixed and variable local size (Samuel Pitoiset, 2016-10-07, 1 file, -0/+14)
  The ARB_compute_variable_group_size specification explains that when a compute shader includes both a fixed and a variable local size, a compile-time error occurs.

  v2:
   - update formatting spec quotations (Ian)

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
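The compile-time rule above amounts to a mutual-exclusion check on two layout declarations. A minimal sketch with hypothetical names (`cs_layout_sketch`, `sketch_check_local_size()`); the error text is illustrative, not the compiler's actual message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a compute shader's input layout qualifiers. */
struct cs_layout_sketch {
   bool has_fixed_local_size; /* layout(local_size_x = ...) in; */
   bool local_size_variable;  /* layout(local_size_variable) in; */
};

/* Returns NULL when the combination is accepted, otherwise an
 * error string describing the compile-time failure. */
const char *
sketch_check_local_size(const struct cs_layout_sketch *cs)
{
   assert(cs != NULL);

   if (cs->has_fixed_local_size && cs->local_size_variable)
      return "compute shader declares both a fixed and a variable local size";

   return NULL;
}
```

This covers only the compile-time conflict; the link-time checks from the linker commit above are out of scope for the sketch.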
* glsl: process local_size_variable input qualifier (Samuel Pitoiset, 2016-10-07, 5 files, -1/+37)
  This is the new layout qualifier introduced by ARB_compute_variable_group_size which allows using a variable work group size.

  v4:
   - add missing '%s' in the monster format string

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: add enable flags for ARB_compute_variable_group_size (Samuel Pitoiset, 2016-10-07, 4 files, -0/+12)
  This also initializes the default values for the standalone compiler.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa/main: add support for ARB_compute_variable_groups_size (Samuel Pitoiset, 2016-10-07, 11 files, -1/+185)
  v5:
   - replace fixed_local_size by !LocalSizeVariable (Nicolai)
  v4:
   - slightly indent spec quotes (Nicolai)
   - drop useless _mesa_has_compute_shaders() check (Nicolai)
   - move the fixed local size outside of the loop (Nicolai)
   - add missing check for invalid use of work group count
  v2:
   - update formatting spec quotations (Ian)
   - move the total_invocations check outside of the loop (Ian)

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* glapi: add entry points for GL_ARB_compute_variable_group_size (Samuel Pitoiset, 2016-10-07, 6 files, -1/+45)
  v2:
   - correctly sort that new extension (Ian)
   - fix up the comment (Ian)

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: optimize sub(a, 0) to a (Karol Herbst, 2016-10-06, 1 file, -0/+3)
  helped some ue4 demos and divinity OS shaders

  total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
  total gprs used in shared programs    : 379273 -> 379273 (0.00%)
  total local used in shared programs   : 9505 -> 9505 (0.00%)
  total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

            local    gpr   inst  bytes
  helped        0      0     33     33
  hurt          0      0      0      0

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Pierre Moreau <[email protected]>
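The algebraic simplification named in the title can be sketched on a toy instruction representation. The types and `sketch_opt_sub_zero()` are hypothetical and far simpler than the nv50 IR; the sketch covers integer immediates only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_sketch { OP_SKETCH_MOV, OP_SKETCH_SUB };

/* A source operand: either an immediate or a register number. */
struct value_sketch {
   bool is_imm;
   int  imm;
   int  reg;
};

struct insn_sketch {
   enum op_sketch      op;
   struct value_sketch src[2];
   int                 nsrc;
};

/* Rewrites "sub dst, a, 0" into "mov dst, a". Returns true if the
 * instruction was changed. Note that "sub dst, 0, a" is a negation
 * and must be left alone. */
bool
sketch_opt_sub_zero(struct insn_sketch *i)
{
   assert(i != NULL);

   if (i->op != OP_SKETCH_SUB || i->nsrc != 2)
      return false;
   if (!i->src[1].is_imm || i->src[1].imm != 0)
      return false;

   i->op = OP_SKETCH_MOV;
   i->nsrc = 1;
   return true;
}
```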
* st/mesa: move all sampler view code into new st_sampler_view.[ch] files (Brian Paul, 2016-10-06, 13 files, -493/+589)
  Previously, the sampler view code was scattered across several different files.

  Note, the previous REALLOC(), FREE() for st_texture_object::sampler_views are replaced by realloc(), free() to avoid conflicting macros in Mesa vs. Gallium.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Acked-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: optimize pipe_sampler_view validation (Brian Paul, 2016-10-06, 4 files, -30/+101)
  Before, st_get_texture_sampler_view_from_stobj() did a lot of work to check if the texture parameters matched the sampler view (format, swizzle, min/max lod, first/last layer, etc). We did this every time we validated the texture state.

  Now, we use a ctx->Driver.TexParameter() callback and a couple other checks to proactively release texture views when we know that view-related parameters have changed. Then, the validation step is simplified:
   - Search the texture's list of sampler views (just match the context).
   - If found, we're done.
   - Else, create a new sampler view.

  There will never be old, out-of-date sampler views attached to texture objects that we have to test.

  Most apps create textures and set the texture parameters once. This makes sampler view validation much cheaper for that case.

  Note that the old texture/sampler comparison code has been converted into a set of assertions to verify that the sampler view is in fact consistent with the texture parameters. This should help to spot any potential regressions.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
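The "invalidate on change, then just look up" scheme described above might be sketched like this. All names (`texobj_sketch`, `sketch_get_view()`, `sketch_tex_parameter_changed()`) and the fixed-size array are hypothetical simplifications, not the state tracker's code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VIEWS 4

/* Hypothetical sampler view: remembers which context created it. */
struct view_sketch {
   const void *context;
   int         id;
};

struct texobj_sketch {
   struct view_sketch views[SKETCH_MAX_VIEWS];
   size_t             num_views;
   int                next_id;
};

/* Validation step: match on context only; create a view if none exists.
 * Returns NULL if the fixed-size list is full. */
struct view_sketch *
sketch_get_view(struct texobj_sketch *tex, const void *ctx)
{
   assert(tex != NULL);

   for (size_t i = 0; i < tex->num_views; i++) {
      if (tex->views[i].context == ctx)
         return &tex->views[i];
   }

   if (tex->num_views == SKETCH_MAX_VIEWS)
      return NULL;

   struct view_sketch *view = &tex->views[tex->num_views++];
   view->context = ctx;
   view->id = tex->next_id++;
   return view;
}

/* Parameter-change callback: release the views up front, so the
 * validation step never has to compare a view against parameters. */
void
sketch_tex_parameter_changed(struct texobj_sketch *tex)
{
   assert(tex != NULL);
   tex->num_views = 0;
}
```

The cost moves from every validation to the rare parameter change, which matches the commit's observation that most apps set texture parameters once.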
* mesa: call ctx->Driver.TexParameter() in texture_buffer_range() (Brian Paul, 2016-10-06, 1 file, -0/+13)
  To inform drivers of texture buffer offset/size changes, as we do for other texture object parameters.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: consolidate view format setup code (Brian Paul, 2016-10-06, 1 file, -34/+54)
  Before, we had code to compute the sampler view's format spread across two different functions: in update_single_texture() and st_get_texture_sampler_view_from_stobj(). Now it's all in one new function.

  Also, use _mesa_texture_base_format() to simplify the code.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: add some const qualifiers in st_atom_texture.c (Brian Paul, 2016-10-06, 1 file, -3/+5)
  And minor code reformatting.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: simplify some code in get_texture_format_swizzle() (Brian Paul, 2016-10-06, 1 file, -5/+5)
  There's no need to cast to st_texture_image. Just use gl_texture_image.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>