mesa.git

	Commit message	Author	Age	Files	Lines
*	gallium/radeon: assign the highest priority to scratch; make rings second	Marek Olšák	2016-08-17	2	-4/+6
	just FYI, the kernel receives priority/4
	Acked-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
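The note about the kernel receiving priority/4 can be illustrated with a small sketch. It assumes the winsys priority is an integer in 0..63 (suggested by the "free 60..63" remark in the re-numbering commit) and that the division truncates as C integer division does; `kernel_priority` is a hypothetical name, not a Mesa function.

```python
def kernel_priority(winsys_priority: int) -> int:
    """Map an assumed 0..63 winsys priority to the value the kernel would see."""
    if not 0 <= winsys_priority <= 63:
        raise ValueError("priority outside the assumed 0..63 range")
    # Integer division: four adjacent winsys priorities share one kernel level.
    return winsys_priority // 4


# Under these assumptions the 64 winsys values collapse to 16 kernel levels.
levels = {kernel_priority(p) for p in range(64)}
print(len(levels))
```

Under this assumption, two buffers only get different kernel priorities if their winsys priorities fall into different groups of four.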
*	gallium/winsys: re-number winsys priority flags	Marek Olšák	2016-08-17	1	-16/+13
	free 60..63, move CP_DMA up
	Acked-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: mark shader rings as highest-priority buffers	Marek Olšák	2016-08-17	5	-7/+7
	and rename the enum
	Acked-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: set SHADER_RW_BUFFER priority for streamout buffers	Marek Olšák	2016-08-17	2	-4/+6
	Acked-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use current context for DCC feedback-loop decompress, fixes Elemental	Marek Olšák	2016-08-17	4	-16/+38
	This is just a workaround. The problem is described in the code.
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541
	v2: say that it's only between the current context and aux_context
	Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
*	radeonsi: simplify CB_TARGET_MASK logic	Marek Olšák	2016-08-17	1	-14/+7
	we can now rely on CB_COLORn_INFO to disable empty slots.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't set CB_COLOR1_INFO for dual src blending	Marek Olšák	2016-08-17	1	-7/+0
	Vulkan doesn't do this. The reason may be that CB_COLOR1_INFO.SOURCE_FORMAT from NI was moved to SPI_SHADER_COL_FORMAT for SI. I asked CB guys about this 2 days ago and they still haven't replied.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound	Marek Olšák	2016-08-17	2	-11/+7
	All VP DX9 ports benefit from this.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: use unflushed fences for PIPE_QUERY_GPU_FINISHED	Marek Olšák	2016-08-17	1	-2/+2
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: use lp_build_alloca_undef	Nicolai Hähnle	2016-08-17	1	-13/+4
	Avoid building all those store 0 / store undef instruction pairs that end up getting removed anyway.
	Reviewed-by: Roland Scheidegger <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallivm: add lp_build_alloca_undef	Nicolai Hähnle	2016-08-17	2	-0/+24
	Reviewed-by: Roland Scheidegger <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallivm: add create_builder_at_entry helper function	Nicolai Hähnle	2016-08-17	1	-23/+22
	Reduces code duplication.
	Reviewed-by: Roland Scheidegger <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: protect against out of bounds temporary array accesses	Nicolai Hähnle	2016-08-17	1	-0/+15
	They can lead to VM faults and worse, which goes against the GL robustness promises.
	Reviewed-by: Marek Olšák <[email protected]>
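As a rough illustration of what bounding an indirect array index can look like, the sketch below forces an unsigned 32-bit index into [0, num - 1]. It is not taken from the patch: `bound_index` is a hypothetical helper, and the choice of a bitwise AND for power-of-two sizes and a clamp otherwise is an assumption about one reasonable strategy.

```python
def bound_index(index: int, num: int) -> int:
    """Force an index into [0, num - 1] so it cannot address outside the array."""
    if num <= 0:
        raise ValueError("array must have at least one element")
    index &= 0xFFFFFFFF          # treat the value as an unsigned 32-bit integer
    max_index = num - 1
    if (num & (num - 1)) == 0:   # power-of-two size: a mask is enough
        return index & max_index
    # Otherwise clamp: out-of-range indices all land on the last element.
    return index if index <= max_index else max_index


print(bound_index(7, 6))   # out of range for a 6-element array
print(bound_index(3, 6))   # in range, returned unchanged
```

Either branch keeps the access inside the allocation; which element an out-of-range index ends up reading is unspecified, but it can no longer fault.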
*	gallium/radeon: add radeon_llvm_bound_index for bounds checking	Nicolai Hähnle	2016-08-17	3	-18/+34
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: reduce alloca of temporaries based on usagemask	Nicolai Hähnle	2016-08-17	2	-10/+54
	v2: take actual writemasks into account
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: use tgsi_scan_arrays for temp arrays	Nicolai Hähnle	2016-08-17	3	-5/+10
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: allocate temps array info in radeon_llvm_context_init	Nicolai Hähnle	2016-08-17	3	-36/+47
	Also, prepare for using tgsi_array_info. This also opens the door for properly handling allocation failures, but I'm leaving that for a separate change.
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: always do the full store in store_value_to_array	Nicolai Hähnle	2016-08-17	1	-49/+28
	Doing the write-back of the temporary vector in radeon_llvm_emit_store makes no sense. This also allows us to get rid of get_alloca_for_array.
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: extract common getelementptr logic into get_pointer_into_array	Nicolai Hähnle	2016-08-17	1	-39/+66
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: pass indirect register info into get_alloca_for_array	Nicolai Hähnle	2016-08-17	1	-5/+6
	To have the same signature as get_array_range.
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: extract common lookup code into get_temp_array function	Nicolai Hähnle	2016-08-17	1	-33/+40
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: clarify the comment on the array alloca heuristic	Nicolai Hähnle	2016-08-17	1	-10/+19
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: more descriptive names for LLVM temporaries in debug builds	Nicolai Hähnle	2016-08-17	1	-2/+12
	Reviewed-by: Tom Stellard <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing	Nicolai Hähnle	2016-08-17	1	-7/+0
	We can use the pointer stored in the temps array directly.
	Reviewed-by: Tom Stellard <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing	Nicolai Hähnle	2016-08-17	1	-5/+0
	We can use the pointer stored in the temps array directly.
	Reviewed-by: Tom Stellard <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/radeon: clean up emit_declaration for temporaries	Nicolai Hähnle	2016-08-17	1	-9/+18
	In the alloca'd array case, no longer create redundant and unused allocas for the individual elements; create getelementptrs instead.
	Reviewed-by: Tom Stellard <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	st_glsl_to_tgsi: use calloc the way it's meant to be used	Nicolai Hähnle	2016-08-17	1	-1/+1
	Reviewed-by: Marek Olšák <[email protected]>
*	tgsi/scan: add tgsi_scan_arrays	Nicolai Hähnle	2016-08-17	2	-0/+93
	Reviewed-by: Marek Olšák <[email protected]>
*	glsl: Add missing ir_quadop_vector constant evaluation for Boolean types	Ian Romanick	2016-08-17	1	-0/+3
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	glsl: Fix typo in ir_unop_f2u implementation	Ian Romanick	2016-08-17	1	-1/+1
	This won't affect the output, but it was, technically, wrong.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	glsl: Fix typo in ir_unop_b2i implementation	Ian Romanick	2016-08-17	1	-1/+1
	This won't affect the output, but it was, technically, wrong.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	glsl: Don't support integer types for operations that can't handle them	Ian Romanick	2016-08-17	2	-14/+2
	ir_unop_fract already forbade integer types in ir_validate. ir_unop_rcp, ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	glsl: Don't support ir_unop_abs or ir_unop_sign for unsigned integers	Ian Romanick	2016-08-17	2	-6/+9
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	nir/algebraic: Optimize common array indexing sequence	Ian Romanick	2016-08-17	1	-0/+11
	Some shaders include code that looks like:

	    uniform int i;
	    uniform vec4 bones[...];

	    foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);

	CSE would do some work on this:

	    x = i * 3
	    foo(bones[x], bones[x + 1], bones[x + 2]);

	The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like

	    x = i * 3
	    foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);

	Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work.

	    x = i * 48;
	    foo(bones[x], bones[x + 16], bones[x + 32]);

	So, ~6 instructions becomes ~3.

	Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking at that change in the generated code.

	G45
	total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
	instructions in affected programs: 177460 -> 166690 (-6.07%)
	helped: 894
	HURT: 0
	total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
	cycles in affected programs: 3936648 -> 3892638 (-1.12%)
	helped: 894
	HURT: 0

	Ironlake
	total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
	instructions in affected programs: 177460 -> 166690 (-6.07%)
	helped: 894
	HURT: 0
	total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
	cycles in affected programs: 3936648 -> 3892638 (-1.12%)
	helped: 894
	HURT: 0

	Sandy Bridge
	total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
	instructions in affected programs: 432715 -> 414079 (-4.31%)
	helped: 2795
	HURT: 0
	total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
	cycles in affected programs: 6114626 -> 6037854 (-1.26%)
	helped: 2478
	HURT: 317

	Ivy Bridge
	total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
	instructions in affected programs: 388234 -> 372666 (-4.01%)
	helped: 2795
	HURT: 0
	total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
	cycles in affected programs: 1972658 -> 1854360 (-6.00%)
	helped: 2458
	HURT: 307

	Haswell
	total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
	instructions in affected programs: 388234 -> 372666 (-4.01%)
	helped: 2795
	HURT: 0
	total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
	cycles in affected programs: 1891820 -> 1773958 (-6.23%)
	helped: 2459
	HURT: 261

	Broadwell
	total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
	instructions in affected programs: 658784 -> 642193 (-2.52%)
	helped: 2795
	HURT: 5
	total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
	cycles in affected programs: 2873304 -> 2820616 (-1.83%)
	helped: 2275
	HURT: 415

	Skylake
	total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
	instructions in affected programs: 682625 -> 666034 (-2.43%)
	helped: 2795
	HURT: 5
	total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
	cycles in affected programs: 3192120 -> 3151302 (-1.28%)
	helped: 2271
	HURT: 425

	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Thomas Helland <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
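The index arithmetic described in this commit message can be checked with a short sketch. It only verifies the identity ((i * 3) + k) << 4 == (i * 48) + (k * 16) on plain integers; `indices_before` and `indices_after` are hypothetical names, and nothing here reproduces the actual nir_opt_algebraic patterns.

```python
def indices_before(i: int) -> list[int]:
    """Offsets as computed after CSE but before the rewrite: shift each index."""
    x = i * 3
    return [(x + k) << 4 for k in range(3)]


def indices_after(i: int) -> list[int]:
    """Offsets after folding the shift into the multiply and the constants."""
    x = i * 48          # 3 << 4
    return [x, x + 16, x + 32]


# The two forms agree for every index tried here.
for i in range(1000):
    assert indices_before(i) == indices_after(i)
print(indices_after(2))
```

The rewritten form needs one multiply and two adds, while the original needs a multiply, two adds, and three shifts, which matches the "~6 instructions becomes ~3" estimate in the message.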
*	glx: Don't use current context in __glXSendError	Michel Dänzer	2016-08-17	1	-3/+1
	There's no guarantee that there is one, and we don't need one anyway.
	Fixes piglit tests:
	glx@glx-fbconfig-bad
	glx@glx_ext_import_context@import context, multi process
	glx@glx_ext_import_context@import context, single process
	Fixes: 2e3f067458e4 ("glx: fix error code when there is no context bound")
	Cc: "11.2" <[email protected]>
	Reviewed-by: Tapani Pälli <[email protected]>
*	nv50/ir: fix bb positions after exit instructions	Ilia Mirkin	2016-08-16	1	-3/+10
	It's fairly rare that the BB layout puts BBs after the exit block, which is likely the reason these issues lingered for so long. This fixes a fraction of issues with the giant pixmark piano shader.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Samuel Pitoiset <[email protected]>
	Cc: <[email protected]>
*	nv50/ir: properly clear upper bits of a bitset fill	Ilia Mirkin	2016-08-16	1	-2/+2
	Found by inspection. In practice, val is always == 0, so this never got triggered.
	Signed-off-by: Ilia Mirkin <[email protected]>
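To show why the upper bits matter only for a non-zero fill value, here is a minimal sketch of filling a bitset stored in 32-bit words. `bitset_fill` and the word layout are assumptions for illustration and do not reproduce the nv50/ir code.

```python
WORD_BITS = 32


def bitset_fill(size: int, val: bool) -> list[int]:
    """Return the words of a size-bit set with every valid bit set to val."""
    num_words = (size + WORD_BITS - 1) // WORD_BITS
    words = [0xFFFFFFFF if val else 0] * num_words
    used_in_last = size % WORD_BITS
    if val and used_in_last:
        # Bits at positions >= size are not part of the set; keep them zero
        # so later whole-word operations do not see stray ones.
        words[-1] &= (1 << used_in_last) - 1
    return words


print([hex(w) for w in bitset_fill(40, True)])
```

When `val` is false the fill already leaves the unused bits zero, which is consistent with the remark that the problem was never triggered in practice.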
*	i965/fs: Estimate maximum sampler message execution size more accurately.	Francisco Jerez	2016-08-16	1	-37/+72
	The current logic used to determine the execution size of sampler messages was based on special-casing several argument and opcode combinations, which unsurprisingly missed the possibility that some messages could exceed the payload size limit or not depending on the number of coordinate components present. In particular:

	 - The TXL, TXB and TEX messages (the latter on non-FS stages only) would attempt to use SIMD16 on Gen7+ hardware even if a shadow reference was present and the texture was a cubemap array, causing it to overflow the maximum supported sampler payload size and crash.

	 - The TG4_OFFSET message with shadow comparison was falling back to SIMD8 regardless of the number of coordinate components, which is unnecessary when two coordinates or less are present.

	Both cases have been handled incorrectly ever since cubemap arrays and texture gather were respectively enabled (the current logic used by the SIMD lowering pass is almost unchanged from the previous no16 fall-back logic used pre-SIMD lowering times).

	Fixes the following GL4.5 conformance test on Gen7-8 (the bug also affects Gen9+ in principle, but SKL passes the test by luck because it manages to use the TXL_LZ message instead of TXL):

	GL45-CTS.texture_cube_map_array.sampling

	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
	Reviewed-by: Kenneth Graunke <[email protected]>
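The idea of deriving the execution size from the payload size, rather than from a list of special cases, can be sketched as follows. The model is an assumption made for illustration: each payload component is taken to occupy one register at SIMD8 and two at SIMD16, and `MAX_PAYLOAD_REGS = 11` is an assumed limit; `max_sampler_exec_size` is a hypothetical name, not the i965 function.

```python
MAX_PAYLOAD_REGS = 11  # assumed maximum sampler payload size, in registers


def max_sampler_exec_size(num_components: int, requested: int = 16) -> int:
    """Pick the widest SIMD width whose payload still fits the assumed limit."""
    regs_at_simd16 = 2 * num_components   # two registers per component at SIMD16
    widest = 16 if regs_at_simd16 <= MAX_PAYLOAD_REGS else 8
    return min(requested, widest)


# A message with many components must drop to SIMD8; a smaller one need not.
print(max_sampler_exec_size(6))
print(max_sampler_exec_size(5))
```

Counting components directly covers both cases from the message: a message with too many components is narrowed, and one with few components is no longer narrowed unnecessarily.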
*	i965/fs: Return zero from fs_inst::components_read for non-present sources.	Francisco Jerez	2016-08-16	1	-2/+5
	This makes it easier for the caller to find out how many scalar components are actually read by the instruction. As a bonus we no longer need to special-case BAD_FILE in the implementation of fs_inst::regs_read.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Lower TEX to TXL during NIR translation.	Francisco Jerez	2016-08-16	2	-14/+6
	This simplifies the code slightly and will allow the SIMD lowering pass to find out easily what the actual texturing opcode is in order to determine the maximum execution size of texturing instructions.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	freedreno/a3xx: fix generic clear path	Rob Clark	2016-08-16	1	-0/+1
	Signed-off-by: Rob Clark <[email protected]>
*	st/mesa: use pipe var instead of st->pipe in st_create_context_priv()	Brian Paul	2016-08-16	1	-4/+4
	As is done in most other places in the function.
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: remove unused u_clear.h file	Brian Paul	2016-08-16	2	-65/+0
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/i915: inline the util_clear() code into i915_clear_blitter()	Brian Paul	2016-08-16	1	-3/+21
	This is the only place the util_clear() function was used.
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/util: minor reformatting in u_box.h	Brian Paul	2016-08-16	1	-29/+13
	Reviewed-by: Marek Olšák <[email protected]>
*	svga: remove unused var in svga_mark_surfaces_dirty()	Brian Paul	2016-08-16	1	-1/+0
	Signed-off-by: Brian Paul <[email protected]>
*	svga: avoid a calloc in svga_buffer_transfer_map()	Brian Paul	2016-08-16	1	-1/+3
	Just initialize the two other pipe_transfer fields explicitly.
	Reviewed-by: Charmaine Lee <[email protected]>
*	svga: don't call os_get_time() when not needed by Gallium HUD	Brian Paul	2016-08-16	5	-11/+26
	The calls to os_get_time() were showing up higher than expected in profiles.
	Reviewed-by: Charmaine Lee <[email protected]>
*	svga: remove unneeded memset() call in draw_vgpu10()	Brian Paul	2016-08-16	1	-2/+1
	All three fields of the vbuffer_attrs[] array are assigned in the following loop. The remaining elements of the array are not used.
	Tested with full Piglit run, Heaven 4.0, etc.
	Reviewed-by: Charmaine Lee <[email protected]>
*	svga: reduce looping in svga_mark_surfaces_dirty()	Brian Paul	2016-08-16	1	-1/+1
	We don't need to loop over the max number of color buffers, just the current number (which is usually one).
	Tested with full Piglit run, Heaven 4.0, etc.
	Reviewed-by: Charmaine Lee <[email protected]>