mesa.git

	Commit message	Author	Age	Files	Lines
*	swr: reorder renderable formats, add grouping comments	Ilia Mirkin	2016-11-29	1	-65/+87
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tim Rowley <[email protected]>
*	swr: use util_copy_framebuffer_state helper	Ilia Mirkin	2016-11-29	1	-12/+1
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tim Rowley <[email protected]>
*	swr: enable cubemap arrays	Ilia Mirkin	2016-11-29	1	-1/+1
	Everything is in place for these.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tim Rowley <[email protected]>
*	swr: rearrange caps into limits/supported/unsupported groups	Ilia Mirkin	2016-11-29	1	-129/+84
	I find this a lot more readable and compact - much easier to scan through the list and see what's on and what's off. No functional change intended.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tim Rowley <[email protected]>
*	swr: only store up to the LOD size	Ilia Mirkin	2016-11-29	1	-1/+3
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tim Rowley <[email protected]>
*	swr: [rasterizer common] add SwrTrace() and macros	Tim Rowley	2016-11-29	2	-15/+95
	Reviewed-by: Bruce Cherniak <[email protected]>
*	radeonsi: don't fetch 8 dwords for samplerBuffer and imageBuffer	Marek Olšák	2016-11-29	1	-51/+43
	The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs. Also, the extraction of the descriptor created some really ugly asm code with lots of VALU bitwise ops and v_readfirstlane.
	Totals from affected shaders:
	SGPRS: 13880 -> 13253 (-4.52 %)
	VGPRS: 15200 -> 15088 (-0.74 %)
	Code Size: 499864 -> 459816 (-8.01 %) bytes
	Max Waves: 1554 -> 1564 (0.64 %)
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: disable XNACK to free 2 SGPRs on APUs	Marek Olšák	2016-11-29	1	-1/+1
	My LLVM commit disables it for dGPUs, but not APUs.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: count and report temp arrays in scratch separately	Marek Olšák	2016-11-29	2	-4/+40
	v2: only do this if debug output of shader dumping is enabled
	Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
*	radeonsi: don't try to eliminate trivial VS outputs for PS and CS	Marek Olšák	2016-11-29	1	-1/+4
	PS and CS don't have any param exports, so it's a no-op.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: disable RB+ blend optimizations for dual source blending	Marek Olšák	2016-11-29	1	-0/+11
	This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing.
	Cc: 13.0 <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending	Marek Olšák	2016-11-29	1	-0/+4
	copied from Vulkan
	Cc: 13.0 <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: always set all blend registers	Marek Olšák	2016-11-29	1	-5/+5
	better safe than sorry
	Cc: 13.0 <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: set the smallest possible CB_TARGET_MASK	Marek Olšák	2016-11-29	1	-5/+5
	better safe than sorry; set_framebuffer_state always makes this dirty
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't print bodies of header-only packets	Marek Olšák	2016-11-29	1	-0/+4
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: print unknown registers with correct formatting	Marek Olšák	2016-11-29	1	-1/+2
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	ddebug: fix hang detection with deferred flushes	Marek Olšák	2016-11-29	1	-1/+1
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radv: set spi_baryc_cntl.pos_float_location to 0	Dave Airlie	2016-11-29	1	-1/+1
	This fixes:
	dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_sample_position.*
	This should probably be 2 when sample shading is enabled, but I'm not sure.
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
*	radv: force persample shading when required.	Dave Airlie	2016-11-29	4	-6/+23
	We need to force persample shading when
	a) shader uses sample_id
	b) shader uses sample_position
	c) shader uses sample qualifier.
	Also since ps_iter_samples can now change independently of the rasterizer samples we need to move setting the regs more often.
	This fixes:
	dEQP-VK.pipeline.multisample_interpolation.centroid_interpolate_at_consistency.*
	dEQP-VK.pipeline.multisample_interpolation.centroid_qualifier_inside_primitive.137_191_1.*
	dEQP-VK.pipeline.multisample_interpolation.sample_interpolate_at_distinct_values.*
	dEQP-VK.pipeline.multisample_interpolation.sample_qualifier_distinct_values.128_128_1.*
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
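The three conditions in the radv message above amount to a single predicate. A minimal sketch in C, assuming a hypothetical `fs_info` struct with one flag per condition (the struct, its fields, and the function name are illustrative and are not radv's actual code):

```c
#include <stdbool.h>

/* Hypothetical summary of what a fragment shader uses; not radv's real struct. */
struct fs_info {
    bool uses_sample_id;        /* a) shader reads the sample ID */
    bool uses_sample_position;  /* b) shader reads the sample position */
    bool uses_sample_qualifier; /* c) an input uses the sample qualifier */
};

/* Per-sample shading must be forced if any one of the three conditions holds. */
static bool needs_persample_shading(const struct fs_info *info)
{
    return info->uses_sample_id ||
           info->uses_sample_position ||
           info->uses_sample_qualifier;
}
```

Because the result depends on the shader rather than only on the rasterizer sample count, it has to be re-evaluated whenever the bound fragment shader changes, which matches the message's note about setting the registers more often.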
*	nir: print var binding in dumps.	Dave Airlie	2016-11-29	1	-1/+1
	This is only useful for SPIR-V shaders, but I keep finding myself having to add it.
	Reviewed-by: Kenneth Graunke <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
*	docs: fix small typo	Eric Engestrom	2016-11-29	1	-1/+1
	Fixes: ba28f2136febca32fe56 ("docs: add note about r-b/other tags when resending")
	Signed-off-by: Eric Engestrom <[email protected]>
	Reviewed-by: Emil Velikov <[email protected]>
*	i965/sched: Schedule trivial blocks.	Matt Turner	2016-11-29	1	-3/+0
	In commit 45cd76e342d1e8e schedule_instructions(bblock_t *) began setting bblock_t::cycle_count, but that function was not called on trivial blocks. Remove the code to skip trivial blocks so that cycle_count is set.
	Reviewed-by: Francisco Jerez <[email protected]>
*	i965/sched: Make 'time' a local variable.	Matt Turner	2016-11-29	1	-3/+1
	Reviewed-by: Francisco Jerez <[email protected]>
*	i965/cfg: Initialize bblock_t::cycle_count.	Matt Turner	2016-11-29	1	-1/+1
	schedule_instructions(bblock_t *) isn't called on blocks with a single instruction, and since it is the only thing that set cycle_count, cycle_count would be uninitialized.
	A non-empty block with bblock_t::cycle_count == 0 is arguably a bug. That'll be fixed in the next commit.
	Reviewed-by: Francisco Jerez <[email protected]>
*	i965/cfg: Initialize cfg_t::cycle_count.	Matt Turner	2016-11-29	2	-1/+2
	This reverts commit b4001af1744a02f472bd1204458662088307981b.
	Reviewed-by: Francisco Jerez <[email protected]>
*	ac/nir: Fix accessing an uninitialized value.	Bas Nieuwenhuizen	2016-11-29	1	-1/+2
	Signed-off-by: Bas Nieuwenhuizen <[email protected]>
	Reviewed-by: Dave Airlie <[email protected]>
*	radv: Initialize the shader_stats_dump flag.	Bas Nieuwenhuizen	2016-11-29	1	-0/+1
	Meta was using it before it was set. I suspect we typically don't want to dump meta shaders, so just set it to false in the beginning.
	Signed-off-by: Bas Nieuwenhuizen <[email protected]>
	Reviewed-by: Dave Airlie <[email protected]>
*	vc4: Add a note for the future about texture latency calculation.	Eric Anholt	2016-11-29	1	-0/+20
	Debugging a shader-db reported cycle count regression from the tex coalescing, I eventually figured out that the texture latencies were totally bogus. Really fixing it will probably involve mirroring vc4_qir_schedule.c's texture fifo management here.
*	vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.	Eric Anholt	2016-11-29	4	-29/+37
	This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy.
	Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)
	total instructions in shared programs: 99810 -> 99059 (-0.75%)
	instructions in affected programs: 10705 -> 9954 (-7.02%)
*	vc4: Restructure VPM write optimization into two passes.	Eric Anholt	2016-11-29	1	-18/+10
	For texturing, there won't be a fixed limit on how many writes there are, so we need to compute uses up front.
*	vc4: Make qir_for_each_inst_inorder() safe against removal.	Eric Anholt	2016-11-29	1	-1/+1
	The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
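The log does not show how the macro was changed. A common way to make a list iterator safe against removal is to save the successor before the loop body runs. The following is a minimal sketch of that general technique in C, using a hypothetical intrusive list (`struct node`, `list_for_each_safe`, and the helpers are illustrative names, not vc4's or Mesa's actual list API):

```c
#include <stddef.h>

/* Hypothetical intrusive doubly-linked list; not vc4's actual list type. */
struct node {
    struct node *prev, *next;
    int value;
};

/* Initialize an empty circular list whose head is a sentinel node. */
static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and clear its pointers, as a freed node would be unusable. */
static void list_remove(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/* Unsafe: reads pos->next after the body, so removing pos breaks the walk. */
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* Safe: the successor is saved in tmp before the body runs. */
#define list_for_each_safe(pos, tmp, head) \
    for (pos = (head)->next, tmp = pos->next; pos != (head); \
         pos = tmp, tmp = pos->next)

/* Remove every node with an odd value; returns how many were removed. */
static int remove_odd(struct node *head)
{
    struct node *pos, *tmp;
    int removed = 0;

    list_for_each_safe(pos, tmp, head) {
        if (pos->value & 1) {
            list_remove(pos);
            removed++;
        }
    }
    return removed;
}
```

With the unsafe variant, `remove_odd` would dereference the cleared `next` pointer of a removed node, which is the kind of segfault the message describes.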
*	vc4: Split optimizing VPM writes from VPM reads.	Eric Anholt	2016-11-29	5	-51/+110
	The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
*	vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.	Eric Anholt	2016-11-29	9	-89/+194
	For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
*	vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).	Eric Anholt	2016-11-29	17	-36/+34
	Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
*	vc4: Replace the qinst src[] with a fixed-size array.	Eric Anholt	2016-11-29	3	-4/+2
	This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.
*	vc4: Remove qir_inst4().	Eric Anholt	2016-11-29	2	-25/+0
	This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
*	anv: bump the texture gather offset limits	Ilia Mirkin	2016-11-29	1	-2/+2
	This matches what NVIDIA and AMD hardware expose, as well as what Intel hardware supports.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/gen7: expose larger gather offsets	Ilia Mirkin	2016-11-29	1	-2/+7
	This matches the capabilities of the hardware.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: support constant gather offsets larger than 4 bits	Ilia Mirkin	2016-11-29	4	-12/+24
	Offsets that don't fit into 4 bits need to force gather_po to be selected. Adjust the logic so that this happens.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
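The selection rule in the message above can be sketched as a range check. This is an illustration only: it assumes the inline offset field is a signed 4-bit value, so the representable range is [-8, 7], and both function names are hypothetical rather than taken from the i965 source.

```c
#include <stdbool.h>

/* Assumption: the inline offset field is a signed 4-bit value, i.e. [-8, 7]. */
static bool fits_in_4_bits(int offset)
{
    return offset >= -8 && offset <= 7;
}

/* Hypothetical helper: true when any constant offset component is too large
 * for the inline field, so the gather_po path has to be selected instead. */
static bool needs_gather_po(const int *offsets, int count)
{
    for (int i = 0; i < count; i++) {
        if (!fits_in_4_bits(offsets[i]))
            return true;
    }
    return false;
}
```

For example, a constant offset of (-8, 7) would stay on the inline path, while (3, 8) would not fit and would select gather_po under this assumption.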
*	i965/fs: Refactor handling of constant tg4 offsets	Jason Ekstrand	2016-11-29	3	-34/+29
	Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset.
	This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	radv: Use different intrinsic for ubo loads.	Bas Nieuwenhuizen	2016-11-29	1	-1/+29
	Not sure about the deprecation path, but this intrinsic can be lowered to SMEM loads. This results in a significant Talos performance improvement.
	v2: Fix for LLVM attribute changes.
	Signed-off-by: Bas Nieuwenhuizen <[email protected]>
	Reviewed-by: Dave Airlie <[email protected]>
*	mesa: fix active subroutine uniforms properly	Timothy Arceri	2016-11-29	3	-104/+10
	07fe2d565b introduced a big hack in order to return NumSubroutineUniforms when querying ACTIVE_RESOURCES for <shader>_SUBROUTINE_UNIFORM interfaces. However, this is the wrong fix: we are meant to return the number of active resources, i.e. the count of subroutine uniforms in the resource list, which is what the code was previously doing. Anything else will cause trouble when trying to retrieve the resource properties based on the ACTIVE_RESOURCES count.
	The real problem is that NumSubroutineUniforms was counting array elements as separate uniforms, but the innermost array is always considered a single uniform, so we fix that count instead. It was counted incorrectly in 7fa0250f9.
	Ideally we could probably remove NumSubroutineUniforms completely and just compute its value when needed from the resource list, but this works for now.
	Reviewed-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Tapani Pälli <[email protected]>
	Cc: 13.0 <[email protected]>
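The counting rule described in the message above can be illustrated with a small sketch. It is one reading of the message, not the actual Mesa code: the `subroutine_uniform_decl` struct and both functions are hypothetical, and the sketch assumes that only the dimensions outside the innermost array multiply the count.

```c
/* Hypothetical description of one subroutine uniform declaration. */
struct subroutine_uniform_decl {
    unsigned num_dims;     /* 0 for a non-array uniform */
    unsigned dim_sizes[4]; /* outermost dimension first */
};

/* Count as the message describes: the innermost array is a single uniform,
 * so only the outer dimensions (if any) multiply the count. */
static unsigned count_uniforms(const struct subroutine_uniform_decl *d)
{
    unsigned count = 1;

    for (unsigned i = 0; i + 1 < d->num_dims; i++)
        count *= d->dim_sizes[i];
    return count;
}

/* The miscount: every array element is treated as a separate uniform. */
static unsigned count_elements(const struct subroutine_uniform_decl *d)
{
    unsigned count = 1;

    for (unsigned i = 0; i < d->num_dims; i++)
        count *= d->dim_sizes[i];
    return count;
}
```

Under these assumptions a declaration such as `u[4]` is one uniform rather than four, and `u[2][3]` is two uniforms rather than six.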
*	anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculation	Jason Ekstrand	2016-11-28	1	-3/+6
	The 1-D special case doesn't actually apply to depth or HiZ. I discovered this while converting BLORP over to genxml and ISL. The reason is that the 1-D special case only applies to the new Sky Lake 1-D layout which is only used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers, the old gen4 2-D layout is used and the QPitch should be in rows.
	Reviewed-by: Nanley Chery <[email protected]>
	Cc: "13.0" <[email protected]>
*	anv/cmd_buffer: Set the correct surface type for depth/stencil	Jason Ekstrand	2016-11-28	1	-2/+53
	Reviewed-by: Nanley Chery <[email protected]>
*	anv: enable drawIndirectFirstInstance	Ilia Mirkin	2016-11-28	1	-1/+1
	This was already piped through in the CmdDraw(Indexed)Indirect handling.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: expose depthBiasClamp, it is already set	Ilia Mirkin	2016-11-28	1	-1/+1
	The gen7/8_cmd_buffer logic already sets the clamp, and it's piped through via the dynamic state.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	anv: bump maxFramebufferLayers to 2048	Ilia Mirkin	2016-11-28	1	-1/+1
	This matches maxImageArrayLayers, as well as the same setting in the GL frontend.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	anv: enable storage image extended formats	Ilia Mirkin	2016-11-28	1	-1/+1
	These are all regularly available in desktop GL, so the backend fully supports them.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	anv: expose imageCubeArray functionality	Ilia Mirkin	2016-11-28	1	-1/+1
	This appears to be fully supported already.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	radv: set maxFragmentDualSrcAttachments to 1	Dave Airlie	2016-11-29	1	-1/+1
	Reported-by: Ilia Mirkin <[email protected]>
	Cc: "13.0" <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>