mesa.git

	Commit message	Author	Date	Files	Lines
*	radv: Clamp gfx9 image view extents to the allocated image extents.	Bas Nieuwenhuizen	2018-11-27	1	-4/+2
	Mirrors AMDVLK. It looks like going beyond the aligned height actually starts to change the addressing. The extra miplevels seem to work with this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245 Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)" Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
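A minimal sketch of the clamping described in the commit above, assuming a hypothetical `extent3d` struct and `clamp_view_extent()` helper rather than RADV's actual descriptor code: every dimension of the view extent is limited to the allocated image extent, so the view cannot reach past the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent type; RADV's real code works on its own image and
 * surface structures, which this sketch does not model. */
struct extent3d {
    uint32_t width, height, depth;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Clamp each dimension of a view extent (for example one rescaled for a
 * block-compressed view format) to the extent the image was allocated with. */
static struct extent3d clamp_view_extent(struct extent3d view,
                                         struct extent3d allocated)
{
    /* An allocated image is never empty. */
    assert(allocated.width > 0 && allocated.height > 0 && allocated.depth > 0);

    struct extent3d out = {
        .width = min_u32(view.width, allocated.width),
        .height = min_u32(view.height, allocated.height),
        .depth = min_u32(view.depth, allocated.depth),
    };
    return out;
}
```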
*	radv: Fix opaque metadata descriptor last layer.	Bas Nieuwenhuizen	2018-11-26	1	-1/+1
	We used the layer count, which results in an off-by-one error. Not sure this really affects anything. Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
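The off-by-one is easiest to see in isolation. A minimal sketch, assuming the descriptor's last-layer field holds an inclusive index; the helper name `last_array_layer()` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: an inclusive "last layer" index is
 * base + count - 1, not the layer count itself. */
static uint32_t last_array_layer(uint32_t base_layer, uint32_t layer_count)
{
    assert(layer_count > 0); /* an image always has at least one layer */
    return base_layer + layer_count - 1;
}
```

With a single-layer image, writing the layer count (1) would claim a second layer that does not exist; the inclusive index is 0.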
*	radv: ignore subpass self-dependencies for CreateRenderPass() too	Samuel Pitoiset	2018-11-23	1	-0/+10
	We really need to refactor this... Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: remove useless sync before CmdClear{Color,DepthStencil}Image()	Samuel Pitoiset	2018-11-23	1	-6/+2
	We don't need to flush anything before these two commands either. This is because they have to be externally synchronized, so the app should have called CmdPipelineBarrier() beforehand and the driver should have flushed the caches. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: remove useless sync after CmdClear{Color,DepthStencil}Image()	Samuel Pitoiset	2018-11-22	1	-4/+0
	'post_flush' is only set to NULL for the normal clear path (i.e., only vkCmdClearColorImage() and vkCmdClearDepthStencilImage() are affected). Because these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT, it's useless to set those flags internally. VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded by RADV_CMD_FLAG_INV_GLOBAL_L2. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: only sync CP DMA for transfer operations or bottom pipe	Samuel Pitoiset	2018-11-21	1	-2/+6
	CP DMA can only be busy when the driver copies buffers. The only affected Vulkan commands are vkCmdCopyBuffer() and vkCmdUpdateBuffer() (because we fall back to a copy depending on a threshold). Clear operations are currently not affected because the driver always syncs after the last DMA operation. Per the spec, these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
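The decision in the commit above reduces to a stage-mask test. A minimal sketch, assuming a hypothetical helper `barrier_needs_cp_dma_sync()` that looks only at a barrier's source stage mask; the constants mirror VkPipelineStageFlagBits values from the Vulkan specification so the sketch compiles without the Vulkan headers. It deliberately checks only the two stages named in the commit title and does not model how RADV treats broader masks such as ALL_COMMANDS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values of VkPipelineStageFlagBits from the Vulkan specification,
 * redefined so this sketch builds without vulkan.h. */
#define STAGE_FRAGMENT_SHADER_BIT 0x00000080u
#define STAGE_TRANSFER_BIT        0x00001000u
#define STAGE_BOTTOM_OF_PIPE_BIT  0x00002000u

/* Hypothetical helper: CP DMA is only busy after buffer copies, which are
 * transfer operations, so a barrier only needs to wait for CP DMA when its
 * source stage mask includes TRANSFER or BOTTOM_OF_PIPE. */
static bool barrier_needs_cp_dma_sync(uint32_t src_stage_mask)
{
    const uint32_t dma_stages = STAGE_TRANSFER_BIT | STAGE_BOTTOM_OF_PIPE_BIT;
    return (src_stage_mask & dma_stages) != 0;
}
```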
*	radv: ignore subpass self-dependencies	Samuel Pitoiset	2018-11-21	1	-0/+10
	They are unnecessary because they only serve to allow the app to call vkCmdPipelineBarrier() inside the render pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
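A minimal sketch of "ignoring" self-dependencies when walking a render pass's dependency list, assuming a hypothetical trimmed-down dependency struct and a hypothetical `count_effective_deps()` helper (not RADV's actual code). A dependency whose source and destination subpass are the same is skipped, because any synchronization it permits is expressed by the barrier the app records inside that subpass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBPASS_EXTERNAL (~0u) /* same value as VK_SUBPASS_EXTERNAL */

/* Hypothetical, trimmed-down version of VkSubpassDependency. */
struct subpass_dep {
    uint32_t src_subpass;
    uint32_t dst_subpass;
};

/* Count the dependencies that would be turned into driver-side barriers.
 * Self-dependencies (src == dst) are skipped. */
static size_t count_effective_deps(const struct subpass_dep *deps, size_t n)
{
    assert(deps != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (deps[i].src_subpass == deps[i].dst_subpass)
            continue; /* self-dependency: ignore */
        count++;
    }
    return count;
}
```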
*	ac: handle cast derefs	Dave Airlie	2018-11-21	1	-0/+3
	Just return the same value for now. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: handle loading from shared pointers	Dave Airlie	2018-11-21	1	-9/+18
	We won't have a var to load from, so don't try to do the required processing if we don't need it. This avoids crashes in: dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac: avoid casting pointers on bcsel and stores	Dave Airlie	2018-11-21	3	-3/+14
	For variable pointers we really don't want to cast the pointers to int without a good reason, so just add a wrapper for bcsel loading and result storing. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac/nir: fix intrinsic name string size in visit_image_atomic()	Samuel Pitoiset	2018-11-20	1	-1/+1
	Fixes an assertion in SoTTR. Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: Use structured intrinsics instead of indexing workaround for GFX9.	Bas Nieuwenhuizen	2018-11-19	3	-8/+75
	These force the index to be used in the instruction, so we don't need the workaround.
	Totals:
	SGPRS: 1321642 -> 1321802 (0.01 %)
	VGPRS: 943664 -> 943788 (0.01 %)
	Spilled SGPRs: 28468 -> 28480 (0.04 %)
	Spilled VGPRs: 88 -> 89 (1.14 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 80 -> 80 (0.00 %) dwords per thread
	Code Size: 52415292 -> 52338932 (-0.15 %) bytes
	LDS: 400 -> 400 (0.00 %) blocks
	Max Waves: 233903 -> 233803 (-0.04 %)
	Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
	SGPRS: 238344 -> 238504 (0.07 %)
	VGPRS: 232732 -> 232856 (0.05 %)
	Spilled SGPRs: 13125 -> 13137 (0.09 %)
	Spilled VGPRs: 88 -> 89 (1.14 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 80 -> 80 (0.00 %) dwords per thread
	Code Size: 15752712 -> 15676352 (-0.48 %) bytes
	LDS: 139 -> 139 (0.00 %) blocks
	Max Waves: 31680 -> 31580 (-0.32 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Samuel Pitoiset <[email protected]>
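The percentages in these totals are consistent with a plain relative change, (after - before) / before × 100, rounded to two decimals. For example, Code Size gives (52338932 - 52415292) / 52415292 × 100 ≈ -0.146 %, printed as -0.15 %. A small C check of that formula (the `percent_change()` name and the handling of a zero baseline are choices made for this sketch, not taken from the reporting tool):

```c
#include <assert.h>
#include <math.h>

/* Relative change in percent: (after - before) / before * 100.
 * 0 -> 0 is reported as 0 %; any other change from 0 returns INFINITY,
 * which is a choice made for this sketch only. */
static double percent_change(double before, double after)
{
    if (before == 0.0)
        return after == 0.0 ? 0.0 : INFINITY;
    return (after - before) / before * 100.0;
}
```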
*	radv: implement fast HTILE clears for depth or stencil only on GFX9	Samuel Pitoiset	2018-11-19	2	-5/+269
	This allows fast-clearing the depth part (or the stencil part) of a depth+stencil surface when HTILE is enabled. I didn't test on GFX8, so it's currently disabled there. This gives a very nice boost, for example when clearing the depth aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster): BEFORE: 235 us AFTER: 13 us Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
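The quoted "18x faster" follows directly from the two timings:

```latex
\text{speedup} = \frac{235\ \mu\mathrm{s}}{13\ \mu\mathrm{s}} \approx 18.1
```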
*	radv: rewrite the condition that checks allowed depth/stencil values	Samuel Pitoiset	2018-11-19	1	-8/+4
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: check allowed fast HTILE clears a bit earlier	Samuel Pitoiset	2018-11-19	1	-0/+5
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpers	Samuel Pitoiset	2018-11-19	1	-2/+16
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: add radv_get_htile_fast_clear_value() helper	Samuel Pitoiset	2018-11-19	1	-3/+18
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: remove unnecessary goto in the fast clear paths	Samuel Pitoiset	2018-11-19	1	-28/+24
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/winsys: remove the max IBs per submit limit for the sysmem path	Samuel Pitoiset	2018-11-19	1	-17/+29
	This path will eventually be improved, but as it's only used on SI (or with RADV_DEBUG=noibs), I'm not sure that matters much. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/winsys: remove the max IBs per submit limit for the fallback path	Samuel Pitoiset	2018-11-19	1	-48/+55
	The chained submission is the fastest path and it should now be used more often than before. This removes some EOP events. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressions	Samuel Pitoiset	2018-11-19	1	-5/+8
	DCC and FMASK also imply a fast-clear eliminate, so it should be safe to reset the predicate unconditionally. We still only skip FMASK or CMASK decompressions for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: tidy up radv_set_dcc_need_cmask_elim_pred()	Samuel Pitoiset	2018-11-19	5	-15/+14
	This is just a small cleanup. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: enable primitive binning by default	Samuel Pitoiset	2018-11-16	2	-7/+3
	After doing a bunch of benchmarks, primitive binning helps some games like The Talos Principle (+5%) or Serious Sam 2017 (+3%). For other titles, it either doesn't change anything or hurts very slightly (less than 1%). This only affects GFX9. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: add a debug option for disabling primitive binning	Samuel Pitoiset	2018-11-16	2	-0/+2
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT"	Connor Abbott	2018-11-16	1	-4/+2
	This reverts commit 647c2b90e96a9ab8571baf958a7c67c1e816911a. There was one recently introduced bug in ac for dvec3 loads, but the other test failures were actually bugs in the tests. See https://github.com/KhronosGroup/VK-GL-CTS/commit/9429e621c48848d224e35f30a1ae45a4a079922c Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	nir: replace nir_load_system_value calls with appropriate builder functions	Karol Herbst	2018-11-14	6	-24/+24
	This helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
*	radv: make use of nir_move_out_const_to_consumer()	Timothy Arceri	2018-11-14	1	-0/+4
	vkpipeline-db results:
	Totals from affected shaders:
	SGPRS: 28400 -> 28576 (0.62 %)
	VGPRS: 27916 -> 27692 (-0.80 %)
	Spilled SGPRs: 140 -> 138 (-1.43 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 1534456 -> 1520560 (-0.91 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 3541 -> 3582 (1.16 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9	Samuel Pitoiset	2018-11-13	2	-3/+21
	Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE	Samuel Pitoiset	2018-11-13	1	-1/+1
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: binding streamout buffers doesn't change context regs	Samuel Pitoiset	2018-11-13	1	-2/+7
	Cc: 18.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: make use of num_good_cu_per_sh in si_emit_graphics() too	Samuel Pitoiset	2018-11-12	1	-2/+1
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: clean up setting partial_es_wave for distributed tess on VI	Samuel Pitoiset	2018-11-12	1	-7/+4
	Only needed when the pipeline actually uses tessellation. I don't think that changes anything, except improving readability. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: cleanup and document a Hawaii bug with offchip buffers	Samuel Pitoiset	2018-11-12	1	-9/+8
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac/surface: remove the overallocation workaround for Vega12	Marek Olšák	2018-11-09	1	-4/+0
	Not needed anymore (probably since the tile_swizzle fix). Reviewed-by: Samuel Pitoiset <[email protected]>
*	radv: include LLVM IR in the VK_AMD_shader_info "disassembly"	Nicolai Hähnle	2018-11-09	1	-0/+1
	Helpful for debugging compiler backend problems: this allows us to easily retrieve the LLVM IR from RenderDoc. Reviewed-by: Samuel Pitoiset <[email protected]>
*	radv: fix GPU hangs when loading depth/stencil clear values on SI/CIK	Samuel Pitoiset	2018-11-08	1	-5/+19
	HTILE is supported on these chips; not sure how I missed that. This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used. Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values") Signed-off-by: Samuel Pitoiset <[email protected]>
*	radv: use LOAD_CONTEXT_REG when loading fast clear values	Samuel Pitoiset	2018-11-08	2	-19/+27
	This avoids syncing the Micro Engine. This is currently only supported on VI+. There is probably a way to use LOAD_CONTEXT_REG on previous chips, but that could be done later. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
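Taken together, the two LOAD_CONTEXT_REG commits amount to a per-generation choice. A minimal sketch, assuming a hypothetical `chip_gen` enum ordered oldest first and a hypothetical `pick_clear_value_load()` helper; the real RADV code emits PM4 packets and is not modeled here.

```c
#include <assert.h>

/* Hypothetical chip generations, oldest first
 * (SI = GFX6, CIK = GFX7, VI = GFX8). */
enum chip_gen { GEN_SI, GEN_CIK, GEN_VI, GEN_GFX9 };

/* How the fast clear value is loaded into the context registers. */
enum clear_value_load {
    CLEAR_LOAD_CONTEXT_REG, /* LOAD_CONTEXT_REG packet: no ME sync needed */
    CLEAR_LOAD_PFP_SYNC_ME, /* older path, followed by PFP_SYNC_ME */
};

/* VI+ uses LOAD_CONTEXT_REG; SI/CIK (which also support HTILE) keep the
 * PFP_SYNC_ME path, as restored by the follow-up fix. */
static enum clear_value_load pick_clear_value_load(enum chip_gen gen)
{
    return gen >= GEN_VI ? CLEAR_LOAD_CONTEXT_REG : CLEAR_LOAD_PFP_SYNC_ME;
}
```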
*	radv: only expose VK_SUBGROUP_FEATURE_ARITHMETIC_BIT for VI+	Samuel Pitoiset	2018-11-08	1	-1/+1
	Inclusive and exclusive scans are missing because older chips don't have llvm.amdgcn.update.dpp. This fixes crashes with dEQP-VK.subgroups.arithmetic.*. CC: [email protected] Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
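For reference, a serial C model of what the missing operations compute (for example GLSL `subgroupInclusiveAdd()` and `subgroupExclusiveAdd()`), with element i standing in for invocation i. This only illustrates the semantics; it says nothing about how RADV lowers scans with DPP on VI+.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive scan: out[i] = in[0] + ... + in[i]. */
static void inclusive_scan_add(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc; /* includes in[i] itself */
    }
}

/* Exclusive scan: out[i] = in[0] + ... + in[i - 1], and out[0] is the
 * identity of addition (0). */
static void exclusive_scan_add(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc; /* excludes in[i] */
        acc += in[i];
    }
}
```

With an input of {1, 2, 3, 4}, the inclusive scan yields {1, 3, 6, 10} and the exclusive scan yields {0, 1, 3, 6}.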
*	radv: disable conditional rendering for vkCmdCopyQueryPoolResults()	Samuel Pitoiset	2018-11-07	1	-0/+10
	VK_EXT_conditional_rendering says that copy commands should not be affected by conditional rendering. Cc: 18.2 18.3 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	radv: allocate enough space in CS when copying query results with compute	Samuel Pitoiset	2018-11-07	1	-0/+4
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	ac/nir_to_llvm: fix b2f for f64	Timothy Arceri	2018-11-07	1	-3/+12
	Fixes: d7e0d47b9de3 ("nir: Add a bunch of b2[if] optimizations") Reviewed-by: Dave Airlie <[email protected]>
*	radv: more use of radv_cp_wait_mem()	Samuel Pitoiset	2018-11-05	1	-22/+9
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	radv: replace si_emit_wait_fence() with radv_cp_wait_mem()	Samuel Pitoiset	2018-11-05	4	-10/+14
	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	radv: add missing TFB queries support to CmdCopyQueryPoolResults()	Samuel Pitoiset	2018-11-05	2	-0/+278
	Cc: 18.3 <[email protected]> Fixes: b4eb029062a ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	radv: remove useless sync after copying query results with compute	Samuel Pitoiset	2018-11-05	1	-4/+0
	The spec says: "vkCmdCopyQueryPoolResults is considered to be a transfer operation, and its writes to buffer memory must be synchronized using VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before using the results." VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. So, it's useless to set those flags internally. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	android: radv: add libmesa_git_sha1 static dependency	Mauro Rossi	2018-11-03	1	-1/+2
	libmesa_git_sha1 is added as a whole static dependency to get the git_sha1.h header and avoid the following build error: external/mesa/src/amd/vulkan/radv_device.c:46:10: fatal error: 'git_sha1.h' file not found ^ 1 error generated. Fixes: 9d40ec2cf6 ("radv: Add support for VK_KHR_driver_properties.") Signed-off-by: Mauro Rossi <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	amd: Make vgpr-spilling depend on llvm version	Jan Vesely	2018-11-02	1	-1/+2
	The option was removed in LLVM r345763. Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	radv: fix begin/end transform feedback with 0 counter buffers.	Dave Airlie	2018-11-02	1	-12/+16
	If the user gives 0 counterBuffers, the driver should still enable transform feedback on all targets. This changes the driver to always enable xfb, and use counter buffers where one is defined for the target in question. Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <[email protected]>
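A minimal sketch of the new begin-transform-feedback logic, under several stated assumptions: a hypothetical per-target state struct, four transform feedback targets, `firstCounterBuffer` taken as 0, and counter buffers represented by GPU addresses where 0 stands in for VK_NULL_HANDLE. Every target is enabled; only targets with a counter buffer resume from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XFB_TARGETS 4 /* assumption: four transform feedback buffers */

/* Hypothetical per-target state computed when transform feedback begins. */
struct xfb_target_state {
    bool enabled;        /* always true: xfb is enabled on every target */
    bool uses_counter;   /* resume from a saved offset? */
    uint64_t counter_va; /* GPU address of the counter, 0 if none */
};

/* counter_vas may be NULL (no counter buffers at all); an entry of 0 means
 * that target has no counter buffer. */
static void begin_xfb(struct xfb_target_state state[MAX_XFB_TARGETS],
                      uint32_t counter_buffer_count,
                      const uint64_t *counter_vas)
{
    for (uint32_t i = 0; i < MAX_XFB_TARGETS; i++) {
        state[i].enabled = true;
        bool has_counter = counter_vas != NULL && i < counter_buffer_count &&
                           counter_vas[i] != 0;
        state[i].uses_counter = has_counter;
        state[i].counter_va = has_counter ? counter_vas[i] : 0;
    }
}
```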
*	radv: apply xfb buffer offset at buffer binding time not later. (v2)	Dave Airlie	2018-11-02	1	-2/+4
	In order to handle pause/resume properly, the offset should be added to the buffer binding, not to the begin/end paths. v2: don't add offset to size. Fixes ext_transform_feedback-alignment* under zink. Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <[email protected]>
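A minimal sketch of folding the offset in once, at bind time, assuming a hypothetical `xfb_binding` struct and `bind_xfb_buffer()` helper. Begin, end, pause and resume would then use `va` as-is, and the size stays untouched (the v2 change), since in VK_EXT_transform_feedback a binding's size is measured from its offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical streamout binding recorded when buffers are bound. */
struct xfb_binding {
    uint64_t va;   /* buffer base address + offset */
    uint64_t size; /* size as given by the app; not adjusted by the offset */
};

static struct xfb_binding bind_xfb_buffer(uint64_t buffer_va, uint64_t offset,
                                          uint64_t size)
{
    struct xfb_binding b = {
        .va = buffer_va + offset, /* apply the offset once, here */
        .size = size,             /* already relative to the offset */
    };
    return b;
}
```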
*	radv: set PA_SU_PRIM_FILTER_CNTL optimally	Samuel Pitoiset	2018-11-01	1	-0/+9
	Ported from RadeonSI. It's always TRUE for CIK+ because RADV doesn't support 16 samples. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>