mesa.git

	Commit message	Author	Age	Files	Lines
*	i965: skip reading unused slots at the begining of the URB for the FS	Iago Toral Quiroga	2017-10-02	3	-14/+38
	We can start reading the URB at the first offset that contains varyings that are actually read in the URB. We still need to make sure that we read at least one varying to honor hardware requirements. This helps alleviate a problem introduced with 99df02ca26f61 for separate shader objects: without separate shader objects we assign locations sequentially; however, since that commit we have changed the method for SSO so that the VUE slot assigned depends on the number of builtin slots plus the location assigned to the varying. This fixed layout is intended to help SSO programs by avoiding on-the-fly recompiles when swapping out shaders; however, it also means that if a varying uses a large location number close to the maximum allowed by the SF/FS units (31), then the offset introduced by the number of builtin slots can push the location outside the range and trigger an assertion. This problem is affecting at least the following CTS tests for enhanced layouts: KHR-GL45.enhanced_layouts.varying_array_components, KHR-GL45.enhanced_layouts.varying_array_locations, KHR-GL45.enhanced_layouts.varying_components, KHR-GL45.enhanced_layouts.varying_locations, which use SSO and the location layout qualifier to select such location numbers explicitly. This change helps these tests because for SSO we always have to include things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely to read them, so by doing this we free builtin slots from the fixed VUE layout and we keep the tests from crashing in this scenario.
Of course, this is not a proper fix: we'd still run into problems if someone tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or gl_CullDistance in the FS, but that would be a much less common bug and we can probably wait to see if anyone actually runs into that situation in a real world scenario before making the decision that more aggressive changes are required to support this without reverting 99df02ca26f61. v2: - Add a debug message when we skip clip distances (Ilia) - we also need to account for this when we compute the urb setup for the fragment shader stage, so add a compiler util to compute the first slot that we need to read from the URB instead of replicating the logic in both places. v3: - Make the util more generic so it can account for all unused slots at the beginning of the URB, that will make it more useful (Ken). - Drop the debug message, it was not what Ilia was asking for. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
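The idea of skipping unused slots at the start of the URB can be sketched roughly as below. This is an illustration only, not the code from the commit: `first_slot_read`, the `slot_to_varying` table and the 64-bit `inputs_read` mask are hypothetical names chosen for the example, and any hardware alignment requirement on the read offset is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: slot_to_varying[i] is the varying location stored in
 * VUE slot i, or -1 if the slot is empty. inputs_read has one bit per varying
 * location that the fragment shader actually reads. */
static int first_slot_read(uint64_t inputs_read,
                           const int *slot_to_varying, int num_slots)
{
    assert(num_slots >= 0);
    for (int i = 0; i < num_slots; i++) {
        int varying = slot_to_varying[i];
        if (varying >= 0 && varying < 64 &&
            (inputs_read & (UINT64_C(1) << varying)))
            return i;
    }
    /* Nothing is read: fall back to slot 0 so that at least one slot is
     * still fetched, as the commit message requires. */
    return 0;
}
```

With a layout whose first three slots hold builtins the shader never reads, the sketch returns 3, so the read would start past them.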
*	i965: Normalize types for FBL, FBH, etc	Matt Turner	2017-09-30	2	-15/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Allows the instructions to be compacted. The documentation claims that some of these only accept UD types, even though the type doesn't change the operation performed. Just normalize the types to ensure we get instruction compaction. The only functional changes are for FBL and CBIT (always use UD types) and FBH (always use the same types). Reviewed-by: Kenneth Graunke <[email protected]>
*	radeonsi: don't use the template keyword	Marek Olšák	2017-09-30	1	-7/+7
\| \| \| \| \| \|	for C++ editors Reviewed-by: Brian Paul <[email protected]>
*	glx: don't use the template keyword	Marek Olšák	2017-09-30	1	-3/+3
\| \| \| \| \| \|	for C++ editors Reviewed-by: Brian Paul <[email protected]>
*	gallium/vl: don't use the template keyword	Marek Olšák	2017-09-30	1	-14/+14
\| \| \| \| \| \|	for C++ editors Reviewed-by: Brian Paul <[email protected]>
*	egl/dri2: don't use the template keyword	Marek Olšák	2017-09-30	1	-3/+3
\| \| \| \| \| \|	for C++ editors Reviewed-by: Brian Paul <[email protected]>
*	radeonsi/uvd: clean up si_video_buffer_create	Benedikt Schemmer	2017-09-30	1	-30/+17
	V2: remove code duplication and one unnecessary variable, minor whitespace fix Signed-off-by: Marek Olšák <[email protected]>
*	radeonsi/uvd: fix planar formats broken since f70f6baaa3bb0f8b280ac2eaea69bb	Marek Olšák	2017-09-30	1	-3/+8
\| \| \| \| \|	Tested-by: Benedikt Schemmer <[email protected]> Reviewed-by: Christian König <[email protected]>
*	gallium: add new LOD opcode	Roland Scheidegger	2017-09-30	5	-5/+74
\| \| \| \| \| \| \| \| \| \|	The operation performed is all the same as LODQ, but with the usual differences between dx10 and GL texture opcodes, that is separate resource and sampler indices (plus result swizzling, and setting z/w channels to zero). Reviewed-by: Jose Fonseca <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
*	drirc: whitelist glthread for Outlast	Kamil Páral	2017-09-29	1	-0/+3
\| \| \| \| \|	FPS increase 10-20% in starting locations on Core i5-4570 + Radeon R9 270.
*	st/va: add dst rect to avoid scale on deint	Leo Liu	2017-09-29	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	For 1080p video transcode, the height will be scaled to 1088 when deint to progressive buffer. Set dst rect to make sure no scale. Fixes: 3ad8687 "st/va: use new vl_compositor_yuv_deint_full() to deint" Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Acked-by: Andy Furniss <[email protected]>
*	radeonsi: emit DLDEXP and DFRACEXP TGSI opcodes	Nicolai Hähnle	2017-09-29	2	-1/+26
\| \| \| \| \| \| \| \| \|	Note: this causes spurious regressions in some current piglit tests, because the tests incorrectly assume that there is no denorm support for doubles. I'm going to send out a fix for those tests as well. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	radeonsi: emit LDEXP opcode	Nicolai Hähnle	2017-09-29	2	-1/+3
\| \| \| \| \| \| \| \|	The LLVM intrinsic has existed for a long time. The current name was established in LLVM 3.9. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	st/glsl_to_tgsi: use LDEXP when available	Nicolai Hähnle	2017-09-29	1	-3/+7
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	gallium: add LDEXP TGSI instruction and corresponding cap	Nicolai Hähnle	2017-09-29	20	-3/+50
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	tgsi: infer that dst[1] of DFRACEXP is an integer	Nicolai Hähnle	2017-09-29	5	-6/+9
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	gallivm: add support for TGSI instructions with two outputs	Nicolai Hähnle	2017-09-29	3	-1/+31
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	gallivm: add dst register index to lp_build_tgsi_context::emit_store	Nicolai Hähnle	2017-09-29	6	-20/+27
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	tgsi: clarify the semantics of DFRACEXP	Nicolai Hähnle	2017-09-29	4	-22/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The status quo is quite the mess: 1. tgsi_exec will do a per-channel computation, and store the dst[0] result (significand) correctly for each channel. The dst[1] result (exponent) will be written to the first bit set in the writemask. So per-component calculation only works partially. 2. r600 will only do a single computation. It will replicate the exponent but not the significand. 3. The docs pretend that there's per-component calculation, but even get dst[0] and dst[1] confused. 4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions, and kind-of assumes that everything is replicated, generating this for the dvec4 case: DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw Settle on the simplest behavior, which is single-component calculation with replication, document it, and adjust tgsi_exec and r600. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
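The "single-component calculation with replication" behavior described above might look like the following sketch. The struct layout and the name `dfracexp_replicated` are assumptions made for illustration (two doubles per 4-channel register, four integer channels for the exponent); it is not the tgsi_exec or r600 code, and it only covers finite inputs.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical result layout: a 4-channel register holds two doubles
 * (the xy and zw pairs), while the integer exponent register has four
 * channels. */
struct dfracexp_result {
    double significand[2];
    int exponent[4];
};

/* Compute frexp() once on a single source component and replicate both
 * results across every destination channel. */
static struct dfracexp_result dfracexp_replicated(double src)
{
    struct dfracexp_result r;
    int e = 0;

    assert(isfinite(src)); /* the sketch does not model inf/NaN inputs */
    double frac = frexp(src, &e);

    for (int i = 0; i < 2; i++)
        r.significand[i] = frac;
    for (int i = 0; i < 4; i++)
        r.exponent[i] = e;
    return r;
}
```

A writemask would then select which of the replicated channels actually land in dst[0] and dst[1].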
*	tgsi: fix the documentation of DLDEXP	Nicolai Hähnle	2017-09-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	Sourcing the exponent for the zw destination pair from Z is consistent with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always generates per-channel instructions anyway. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	tgsi: infer that DLDEXP's second source has an integer type	Nicolai Hähnle	2017-09-29	4	-7/+11
\| \| \| \| \|	Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	glsl/lower_instruction: handle denorms and overflow in ldexp correctly	Nicolai Hähnle	2017-09-29	1	-64/+107
\| \| \| \| \| \| \| \| \| \| \| \| \|	GLSL ES requires both, and while GLSL explicitly doesn't require correct overflow handling, it does appear to require handling input inf/denorms correctly. Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.* Cc: [email protected] Acked-by: Matt Turner <[email protected]> Acked-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	util/queue: fix a race condition in the fence code	Nicolai Hähnle	2017-09-29	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A tempting alternative fix would be adding a lock/unlock pair in util_queue_fence_is_signalled. However, that wouldn't actually improve anything in the semantics of util_queue_fence_is_signalled, while making that test much more heavy-weight. So this lock/unlock pair in util_queue_fence_destroy for "flushing out" other threads that may still be in util_queue_fence_signal looks like the better fix. v2: rephrase the comment Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Gustaw Smolarczyk <[email protected]>
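The lock/unlock pair used to "flush out" a signalling thread can be sketched with plain pthreads as below. The `fence_*` names and the struct are hypothetical stand-ins, not the util_queue code; the sketch assumes the signalled flag is read without the mutex, which is what makes the extra pair in the destroy path necessary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fence {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    atomic_bool signalled;
};

static void fence_init(struct fence *f)
{
    pthread_mutex_init(&f->mutex, NULL);
    pthread_cond_init(&f->cond, NULL);
    atomic_init(&f->signalled, false);
}

static void fence_signal(struct fence *f)
{
    pthread_mutex_lock(&f->mutex);
    atomic_store(&f->signalled, true);
    pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->mutex);
}

/* Cheap check that deliberately does not take the mutex. */
static bool fence_is_signalled(struct fence *f)
{
    return atomic_load(&f->signalled);
}

static void fence_destroy(struct fence *f)
{
    assert(fence_is_signalled(f));
    /* Another thread may have set the flag but still be inside
     * fence_signal holding the mutex. Taking and releasing the mutex
     * here waits for it to leave before the mutex is destroyed. */
    pthread_mutex_lock(&f->mutex);
    pthread_mutex_unlock(&f->mutex);
    pthread_cond_destroy(&f->cond);
    pthread_mutex_destroy(&f->mutex);
}
```

Keeping the cost in the destroy path leaves the is-signalled check as light as it was, which matches the trade-off the commit message argues for.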
*	r600: cleanup set_occlusion_query_state	Nicolai Hähnle	2017-09-29	3	-14/+3
\| \| \| \| \| \| \| \| \| \| \|	This fixes a warning caused by the fork (note the change in the function signature): ../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c: In function ‘r600_init_common_state_functions’: ../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c:2974:36: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types] rctx->b.set_occlusion_query_state = r600_set_occlusion_query_state; Reviewed-by: Marek Olšák <[email protected]>
*	r300: add missing case PIPE_SHADER_CAP_INT64_ATOMICS	Nicolai Hähnle	2017-09-29	1	-0/+1
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: fix border color translation for integer textures	Nicolai Hähnle	2017-09-29	3	-29/+60
\| \| \| \| \| \| \| \| \| \|	This fixes the extremely unlikely case that an application uses 0x80000000 or 0x3f800000 as border color for an integer texture and helps in the also, but perhaps slightly less, unlikely case that 1 is used as a border color. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	radeonsi: clamp border colors for upgraded depth textures	Nicolai Hähnle	2017-09-29	1	-59/+60
\| \| \| \| \| \| \| \| \| \| \| \| \|	The hardware does this automatically for unorm formats, but we need to do it manually for unorm depth formats that have been upgraded to Z32_FLOAT. Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth and others. Fixes: d4d9ec55c589 ("radeonsi: implement TC-compatible HTILE") Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	radeonsi: clamp depth comparison value only for fixed point formats	Nicolai Hähnle	2017-09-29	7	-14/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hardware usually does this automatically. However, we upgrade depth to Z32_FLOAT to enable TC-compatible HTILE, which means the hardware no longer clamps the comparison value for us. The only way to tell in the shader whether a clamp is required seems to be to communicate an additional bit in the descriptor table. While VI has some unused bits in the resource descriptor, those bits have unfortunately all been used in gfx9. So we use an unused bit in the sampler state instead. Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f and many other tests in dEQP-GLES3.functional.texture.shadow.* Fixes: d4d9ec55c589 ("radeonsi: implement TC-compatible HTILE") Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	radeonsi/gfx9: fix geometry shaders without output vertices	Nicolai Hähnle	2017-09-29	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \|	Not that those are super common or useful, but hey! Fun corner cases of the API... Fixes dEQP-GLES31.functional.geometry_shading.emit.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	amd/common: save an instruction in the build_cube_select sequence	Nicolai Hähnle	2017-09-29	1	-5/+6
\| \| \| \| \| \| \|	Avoid a v_cndmask: the absolute value is free due to input modifiers. Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	amd/common: fix build_cube_select	Nicolai Hähnle	2017-09-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Fix the custom cube coord selection sequence to be identical to the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling with user-provided derivatives. Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	st/glsl_to_tgsi: fix conditional assignments to packed shader outputs	Nicolai Hähnle	2017-09-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Overriding the default (no-op) swizzle is clearly counter-productive, since the whole point is putting the destination register as one of the source operands so that it remains unmodified when the assignment condition is false. Fragment depth and stencil outputs are a special case due to how their source swizzles are manipulated in translate_src when compiling to TGSI. Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
*	st/glsl_to_tgsi: fix a use-after-free in merge_two_dsts	Nicolai Hähnle	2017-09-29	1	-1/+2
	Found by address sanitizer. The loop here tries to be safe, but in doing so, it ends up doing exactly the wrong thing: the safe foreach is for when the loop variable (inst) could be deleted and nothing else. However, this particular loop can delete inst's successor, but not inst itself. Fixes: 8c6a0ebaad72 ("st/mesa: add st fp64 support (v7.1)") Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]>
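The hazard is easy to reproduce on a small linked list. A "safe" iteration caches the next pointer before running the loop body, so if the body frees the successor, the cached pointer dangles. The sketch below uses a hypothetical singly linked `struct node` rather than Mesa's list macros, and shows the pattern that works when only the successor can be removed: read `next` after the body has run.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Removes each node's successor while it carries the same value.
 * The loop never caches n->next across the body, so a freed successor
 * is never dereferenced. */
static void merge_duplicates(struct node *head)
{
    for (struct node *n = head; n != NULL; n = n->next) {
        while (n->next != NULL && n->next->value == n->value) {
            struct node *dead = n->next;
            n->next = dead->next;
            free(dead);
        }
    }
}

static int list_length(const struct node *head)
{
    int len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

static void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```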
*	radeonsi: move descriptor logs to after corresponding draw/compute packet	Nicolai Hähnle	2017-09-29	2	-8/+6
\| \| \| \| \| \| \| \|	It has to happen after descriptor uploads since otherwise we'll print out the wrong GPU list / incorrectly claim descriptor corruption. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	amd/common: remove ac_shader_abi::chip_class	Nicolai Hähnle	2017-09-29	3	-15/+10
\| \| \| \| \| \| \|	Redundant with the recently added ac_llvm_context::chip_class. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: fix a comment	Nicolai Hähnle	2017-09-29	1	-1/+1
\| \| \| \| \|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	i965/fs: force pull model for 64-bit GS inputs	Iago Toral Quiroga	2017-09-29	2	-7/+15
	Triggering the push model when 64-bit inputs are involved is not easy due to the constraints on the maximum number of registers that we allow for this mode; however, for GS with 'points' primitive type and just a couple of double varyings we can trigger this and it just doesn't work because the implementation is not 64-bit aware at all. For now, let's make sure that we don't attempt this model with 64-bit inputs and we always fall back to pull model for them. Also, don't enable the VUE handles in the thread payload on the fly when we find an input for which we need the pull model, this is not safe: if we need to resort to the pull model we need to account for that when we set up the thread payload so we compute the first non-payload register properly. If we didn't do that correctly and we enable it on-the-fly here then we will end up with VUE handles on the first non-payload register which will probably lead to GPU hangs. Instead, always enable the VUE handles for the pull model so we can safely use them when needed. The GS is going to resort to pull model almost in every situation anyway, so this shouldn't make a significant difference and it makes things easier and safer. v2: Always enable the VUE handles for pull model, this is easier and safer and the GS is going to fall back to pull model almost always anyway (Ken) v3: Only clamp the URB read length if we are over the maximum reserved for push inputs as we were doing in the original code (Ken). v4: No need to clamp the urb read length if invocations > 1 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/link: Use prog->nir instead of creating a temporary	Jason Ekstrand	2017-09-28	1	-4/+3
\| \| \| \| \| \| \| \| \|	This way, when NIR_PASS_V makes a clone of the shader (for testing nir_clone), the new and lowered version gets re-assigned to prog->nir. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/link: Make more use of NIR_PASS	Jason Ekstrand	2017-09-28	1	-6/+6
\| \| \| \| \| \|	[[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/link: Make better use of temporary variables	Jason Ekstrand	2017-09-28	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \|	The way NIR_PASS works (and, by extension, nir_optimize) is that they may clone the shader and throw the old one away. (We use this for testing nir_clone.) It's better if we just make a temporary variable, use it for everything, and re-assign to the gl_program at the end. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	util: fix in-class initialization of static member	Thomas Helland	2017-09-28	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a compile error with G++ 4.4 string_buffer_test.cpp:43: error: ISO C++ forbids initialization of member ‘str1’ string_buffer_test.cpp:43: error: making ‘str1’ static string_buffer_test.cpp:43: error: invalid in-class initialization of static data member of non-integral type ‘const char*’ Tested-by: Vinson Lee <vlee at freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103002
*	meson: remove duplicate libisl dependency in anv	Dylan Baker	2017-09-28	1	-1/+1
\| \| \| \| \|	Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
*	svga: add missing PIPE_SHADER_CAP_INT64_ATOMICS switch cases	Brian Paul	2017-09-28	1	-0/+2
\| \| \| \| \| \|	Silences a compiler warning. Reviewed-by: Roland Scheidegger <[email protected]>
*	svga: trivial whitespace clean-ups in svga_screen.c	Brian Paul	2017-09-28	1	-11/+13
*	gallium/util: use new util_vasprintf() function	Brian Paul	2017-09-28	1	-1/+2
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	util: add util_vasprintf() for Windows (v2)	Brian Paul	2017-09-28	1	-0/+22
\| \| \| \| \| \| \| \|	We don't have vasprintf() on Windows so we need to implement it ourselves. v2: compute actual length of output string, per Nicolai Hähnle. Reviewed-by: Nicolai Hähnle <[email protected]>
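One common way to build a vasprintf-style helper is to measure the output first and then format into a buffer of exactly that size. The sketch below assumes a C99-conforming `vsnprintf` that returns the required length when given a NULL buffer; the actual util_vasprintf() may compute the length differently, and `sketch_vasprintf` and `sketch_asprintf` are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated string stored in *out.
 * Returns the string length, or -1 on failure (with *out set to NULL). */
static int sketch_vasprintf(char **out, const char *fmt, va_list args)
{
    assert(out != NULL && fmt != NULL);

    /* The argument list is consumed twice, so measure using a copy. */
    va_list args_copy;
    va_copy(args_copy, args);
    int len = vsnprintf(NULL, 0, fmt, args_copy);
    va_end(args_copy);

    if (len < 0) {
        *out = NULL;
        return -1;
    }

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        *out = NULL;
        return -1;
    }

    vsnprintf(buf, (size_t)len + 1, fmt, args);
    *out = buf;
    return len;
}

/* Variadic convenience wrapper around sketch_vasprintf. */
static int sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int len = sketch_vasprintf(out, fmt, args);
    va_end(args);
    return len;
}
```

The caller owns the returned buffer and releases it with `free()`.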
*	st/mesa: don't call close() on Windows	Brian Paul	2017-09-28	1	-0/+2
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	svga: start advertising PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION	Neha Bhende	2017-09-28	1	-1/+3
	Since our driver supports arb_provoking_vertex, we can start advertising PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION. Fixes the ./clipflat and ./arb-provoking-vertex-render piglit tests. Tested with piglit and glretrace on Hw 11 and Hw 13. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
*	mesa: fix texture updates for ATI_fragment_shader	Marek Olšák	2017-09-28	1	-3/+5
\| \| \| \| \| \|	Cc: 17.1 17.2 <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	etnaviv: optimize RS transfers	Lucas Stach	2017-09-28	1	-4/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we are blitting the whole resource when the RS is used to de-/tile a resource. This can be very inefficient for large resources where the transfer is only changing a small part of the resource (happens a lot with glTexSubImage2D). Optimize this by only blitting the tile aligned subregion of the resource, which the transfer is going to change. Signed-off-by: Lucas Stach <[email protected]> Reviewed-By: Wladimir J. van der Laan <[email protected]>
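Restricting a blit to the tile-aligned subregion comes down to growing the transfer box outwards to the nearest tile boundaries. The sketch below is an assumption-laden illustration: `struct box`, `align_box_to_tiles` and the tile dimensions passed in are hypothetical, and a real driver would also clamp the result to the resource size.

```c
#include <assert.h>

struct box {
    unsigned x, y, width, height;
};

/* Expands b so that all four edges fall on tile boundaries. */
static struct box align_box_to_tiles(struct box b,
                                     unsigned tile_w, unsigned tile_h)
{
    assert(tile_w > 0 && tile_h > 0);

    /* Round the top-left corner down to a tile boundary. */
    unsigned x0 = b.x - b.x % tile_w;
    unsigned y0 = b.y - b.y % tile_h;

    /* Round the bottom-right corner up to a tile boundary. */
    unsigned x1 = b.x + b.width;
    unsigned y1 = b.y + b.height;
    x1 += (tile_w - x1 % tile_w) % tile_w;
    y1 += (tile_h - y1 % tile_h) % tile_h;

    struct box out = { x0, y0, x1 - x0, y1 - y0 };
    return out;
}
```

For a small glTexSubImage2D-style update, the aligned box covers only the few tiles the transfer touches instead of the whole resource.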