mesa.git

	Commit message	Author	Age	Files	Lines
*	st/mesa: Disable blending for integer formats.	Kenneth Graunke	2018-08-29	1	-0/+1
\| \| \| \| \| \| \| \| \|	Blending isn't valid for integer formats. Rather than having drivers worry about this, just disable blending in this case. This hopefully will increase hits in the CSO cache as well, by eliminating most of the meaningless fields in this case. Reviewed-by: Marek Olšák <[email protected]>
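A minimal C sketch of the idea. It assumes a simplified per-render-target blend state, and the struct and function names are hypothetical (this is not gallium's pipe_blend_state). For a pure-integer color buffer, blending is forced off and the now-meaningless fields are zeroed, so states that differ only in those fields compare equal in a state cache.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-render-target blend state. */
struct blend_rt_state {
   bool blend_enable;
   unsigned rgb_func, rgb_src_factor, rgb_dst_factor;
   unsigned alpha_func, alpha_src_factor, alpha_dst_factor;
   unsigned colormask;
};

/* Blending does not apply to integer formats: disable it and zero the
 * unused fields so equivalent states hash/compare identically. The
 * color write mask still applies, so it is preserved. */
static void
canonicalize_blend_for_integer(struct blend_rt_state *rt, bool is_integer_format)
{
   if (!is_integer_format)
      return;

   unsigned colormask = rt->colormask;
   *rt = (struct blend_rt_state){ 0 };  /* blend_enable = false, rest zeroed */
   rt->colormask = colormask;
}
```

Zeroing the remaining fields is our assumption about where the cache benefit comes from. The actual commit adds a single line.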
*	svga: add missing switch cases for shadow textures	Brian Paul	2018-08-29	2	-0/+5
\| \| \| \| \| \| \|	This doesn't seem to make any difference in testing, but it fixes a failed assertion when dumping sm3 shaders. Reviewed-by: Charmaine Lee <[email protected]>
*	svga: fix vgpu9 sprite coordinate bug	Brian Paul	2018-08-29	3	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for vgpu9. We can use the rasterizer sprite_coord_enable bitfield as-is. We need to index into it using the TGSI semantic index, not the register index. This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests. Testing done: Piglit, Mesa sprite demos Reviewed-by: Charmaine Lee <[email protected]>
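A hedged sketch of the lookup described above. It assumes the rasterizer's sprite_coord_enable is a 32-bit mask with bit n set for GENERIC[n], and the function name is hypothetical. The bit is selected by the input's TGSI semantic index, never by its register index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Should the generic input with this TGSI semantic index be replaced by
 * the point sprite coordinate? The mask is keyed by semantic index
 * (GENERIC[n] -> bit n). Register indices can differ from semantic
 * indices whenever inputs are declared with gaps or out of order. */
static bool
input_uses_sprite_coord(uint32_t sprite_coord_enable, unsigned semantic_index)
{
   return semantic_index < 32 &&
          ((sprite_coord_enable >> semantic_index) & 1u) != 0;
}
```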
*	svga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue	Brian Paul	2018-08-29	3	-20/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flag_rect and flag_buffer fields didn't sufficiently capture the state changes needed for those resource types. For example, if a texture binding was changed from a 500x500 rect texture to a 400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS. But we need to do that to emit the new texcoord scale factors to the constant buffers. Rather than track the sizes of all bound resources, just set the flag if the resource is a rect. Same story with texture buffers. Also, since rect/buffer textures are usable with VS/GS shaders, add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting VS/GS constants. This seems to help with XFCE / xfwm4 desktop scaling. VMware issue 2156696. Reviewed-by: Charmaine Lee <[email protected]>
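The conservative approach can be sketched as follows. The enum, the flag macro and the function are hypothetical stand-ins, not the svga code. Instead of remembering the size of every bound resource and comparing, any binding of a rect or buffer texture marks the texture constants dirty, because their scale factors or sizes live in the constant buffer.

```c
/* Hypothetical subset of texture targets. */
enum tex_target { TEX_2D, TEX_RECT, TEX_BUFFER };

/* Stand-in for SVGA_NEW_TEXTURE_CONSTS. */
#define NEW_TEXTURE_CONSTS (1u << 0)

/* Dirty flags to raise when a texture of the given target is bound.
 * Rect/buffer textures always force a constant-buffer update, which
 * trades a few redundant uploads for not tracking resource sizes. */
static unsigned
dirty_flags_for_binding(enum tex_target new_target)
{
   return (new_target == TEX_RECT || new_target == TEX_BUFFER)
             ? NEW_TEXTURE_CONSTS : 0u;
}
```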
*	svga: minor improvements in svga_state_constants.c	Brian Paul	2018-08-29	1	-17/+13
\| \| \| \| \| \| \| \| \| \|	Add const qualifiers. Add 'f' suffix on floats to avoid double promotion. Remove unneeded shader type assertion since the switch statement handled it already. Reviewed-by: Charmaine Lee <[email protected]>
*	anv: Free the app and engine name	Jason Ekstrand	2018-08-29	1	-0/+3
\| \| \| \| \|	Fixes: 8c048af5890d4 "anv: Copy the appliation info into the instance" Reviewed-by: Lionel Landwerlin <[email protected]>
*	nv50/ir: silence partitionLoadStore() unused function warning	Rhys Kidd	2018-08-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Move this now-unused function into the existing comment block, which contains its only prior use. ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning: unused function 'partitionLoadStore' [-Wunused-function] partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask) Fixes: 86e4440361 ("nouveau: codegen: Disable more old resource handling code") Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
*	glsl/linker: Link all out vars from shader objects on a single stage	vadym.shovkoplias	2018-08-29	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \|	During intra-stage linking, some out variables can be dropped because they are not used in the shader containing the main function. But these out vars can be referenced by later stages, which can lead to further linking errors. Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731
*	anv: blorp: support multiple aspect blits	Lionel Landwerlin	2018-08-29	1	-70/+75
\| \| \| \| \| \| \| \| \| \|	Newer blit tests enable depth & stencil blits. We currently don't support them, but we can by iterating over the aspect mask (copying some logic from the CopyImage function). Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 9f44745eca0e41 ("anv: Use blorp to implement VkBlitImage") Reviewed-by: Jason Ekstrand <[email protected]>
*	mesa: allow GL_UNSIGNED_BYTE type for SNORM reads	Tapani Pälli	2018-08-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenGL ES spec states: "For normalized fixed-point rendering surfaces, the combination format RGBA and type UNSIGNED_BYTE is accepted." This fixes the following failing VK-GL-CTS tests: KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm KHR-GLES3.packed_pixels.rectangle.rgba8_snorm KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm Signed-off-by: Tapani Pälli <[email protected]> https://bugs.freedesktop.org/show_bug.cgi?id=107658 Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Tested-by: Andres Gomez <[email protected]>
*	nir: add loop unroll support for wrapper loops	Timothy Arceri	2018-08-29	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for unrolling the classic do { /* ... */ } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]>
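To illustrate the pattern in plain C rather than NIR (all names here are hypothetical), a do { ... } while (false) body executes exactly once. Unrolling it therefore amounts to emitting the body once, with any break turned into a jump past the rest of the body.

```c
#include <stdbool.h>

/* Classic multi-statement macro wrapper: the loop runs exactly once. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (false)

/* A wrapper loop whose body contains a break... */
static int clamp_wrapped(int x)
{
   int r = x;
   do {
      if (x < 0) { r = 0; break; }
      if (x > 255) r = 255;
   } while (false);
   return r;
}

/* ...and the straight-line code it unrolls to: the break becomes
 * "skip the remainder of the body", i.e. an else branch. */
static int clamp_unrolled(int x)
{
   int r = x;
   if (x < 0) {
      r = 0;
   } else if (x > 255) {
      r = 255;
   }
   return r;
}
```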
*	nir/opt_loop_unroll: Remove unneeded phis if we make progress	Timothy Arceri	2018-08-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \|	Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but it's possible that we could also see the same problem with the existing unrolling passes. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: add complex_loop bool to loop info	Timothy Arceri	2018-08-29	2	-2/+12
\| \| \| \| \| \| \| \| \| \| \|	In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop, we need to be sure we didn't encounter any other complex behaviour, such as continues, nested breaks, etc., during analysis. This will be used in the following patch. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: always attempt to find loop terminators	Timothy Arceri	2018-08-29	1	-7/+7
\| \| \| \| \| \| \|	This will help later patches with unrolling loops that end with a break, i.e. loops that always exit on their first iteration. Reviewed-by: Jason Ekstrand <[email protected]>
*	ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI	Marek Olšák	2018-08-28	1	-2/+2
\| \| \| \| \| \| \|	This fixes VM faults and corruption. Cc: 18.1 18.2 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1	Ian Romanick	2018-08-28	1	-3/+16
\| \| \| \| \| \| \|	No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1	Ian Romanick	2018-08-28	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Refactor image atomics to be a bit more like other atomics	Ian Romanick	2018-08-28	1	-40/+44
\| \| \| \| \| \| \|	This greatly simplifies the next patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1	Ian Romanick	2018-08-28	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> [v1] Reviewed-by: Jason Ekstrand <[email protected]>
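The opcode selection these three commits describe (v2 moves it into a separate function) can be sketched like this. The enum values are hypothetical stand-ins for BRW_AOP_ADD/INC/DEC, and the function is not the i965 code itself. As we understand the Intel data-port atomics, INC and DEC carry no data operand, so an add of a constant +1 or -1 can drop its source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BRW_AOP_* message opcodes. */
enum aop { AOP_ADD, AOP_INC, AOP_DEC };

/* Pick the atomic opcode for atomicAdd(). Only a compile-time constant
 * operand of exactly +1 or -1 is rewritten; anything else stays ADD. */
static enum aop
select_add_opcode(bool operand_is_const, int32_t value)
{
   if (operand_is_const) {
      if (value == 1)
         return AOP_INC;
      if (value == -1)
         return AOP_DEC;
   }
   return AOP_ADD;
}
```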
*	intel/compiler: Silence unused parameter warnings in brw_eu.h	Ian Romanick	2018-08-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All of the other brw_*_desc functions take a devinfo parameter, and all of the others at least have an assert that uses it. Keep the parameter, but mark it as unused. Silences 37 warnings like: In file included from src/intel/common/gen_disasm.c:27:0: src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’: src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_pixel_interp_desc(const struct gen_device_info *devinfo, ^~~~~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	i965: enable AMD_depth_clamp_separate	Sagar Ghuge	2018-08-28	1	-0/+1
\| \| \| \| \|	Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: add functional changes for AMD_depth_clamp_separate	Sagar Ghuge	2018-08-28	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen >= 9 has the ability to control clamping of depth values separately at the near and far planes. z_w is clamped to the range [min(n,f), 0] if clamping at the near plane is enabled, [0, max(n,f)] if clamping at the far plane is enabled, and [min(n,f), max(n,f)] if clamping at both planes is enabled. v2: 1) Use better coding style (Ian Romanick) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
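The range selection stated in the message above can be written out as a small C function. The struct and function names are hypothetical, and n and f are the glDepthRange near and far values. It encodes the three cases exactly as the commit message states them and does not attempt to reproduce the i965 state-upload code.

```c
#include <stdbool.h>

struct depth_range { float min, max; };

static float min2f(float a, float b) { return a < b ? a : b; }
static float max2f(float a, float b) { return a > b ? a : b; }

/* Fill *out with the z_w clamp range for the enabled planes.
 * Returns false when neither plane clamps (no range applies). */
static bool
depth_clamp_range(float n, float f, bool clamp_near, bool clamp_far,
                  struct depth_range *out)
{
   if (clamp_near && clamp_far) {
      out->min = min2f(n, f);
      out->max = max2f(n, f);
   } else if (clamp_near) {
      out->min = min2f(n, f);
      out->max = 0.0f;
   } else if (clamp_far) {
      out->min = 0.0f;
      out->max = max2f(n, f);
   } else {
      return false;
   }
   return true;
}
```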
*	mesa: add EXTRA_EXT for AMD_depth_clamp_separate	Sagar Ghuge	2018-08-28	2	-0/+6
\| \| \| \| \| \|	Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	mesa: add support for GL_AMD_depth_clamp_separate tokens	Sagar Ghuge	2018-08-28	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \|	_mesa_set_enable() and _mesa_IsEnabled() extended to accept two new tokens, GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD. v2: Remove unnecessary parentheses (Marek Olsak) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	mesa: Add support for AMD_depth_clamp_separate	Sagar Ghuge	2018-08-28	12	-33/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens. Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it, as suggested by Marek Olsak. A driver that enables AMD_depth_clamp_separate will only ever look at DepthClampNear and DepthClampFar, as suggested by Ian Romanick. v2: 1) Remove unnecessary parentheses (Marek Olsak) 2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE GL_DEPTH_CLAMP only (Marek Olsak) 3) Clamp against near and far plane separately (Marek Olsak) 4) Clip point separately for near and far Z clipping plane (Marek Olsak) v3: Clamp raster position zw to the range [min(n,f), 0] for near plane and [0, max(n,f)] for far plane (Marek Olsak) v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	mesa: Add types for AMD_depth_clamp_separate.	Sagar Ghuge	2018-08-28	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Add some basic types and storage for the AMD_depth_clamp_separate extension. v2: 1) Drop unnecessary definition (Marek Olsak) 2) Expose extension in compatibility profile (Marek Olsak) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	glapi: define AMD_depth_clamp_separate	Sagar Ghuge	2018-08-28	4	-0/+20
\| \| \| \| \| \|	Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	anv: Claim to support depthBounds for ID games	Jason Ekstrand	2018-08-28	1	-0/+9
\| \| \| \| \|	Cc: "18.2" <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	anv: Copy the appliation info into the instance	Jason Ekstrand	2018-08-28	2	-8/+30
\| \| \| \| \|	Cc: "18.2" <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	vulkan/alloc: Add a vk_strdup helper	Jason Ekstrand	2018-08-28	1	-0/+17
\| \| \| \| \|	Cc: "18.2" <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	meson: Actually load translation files	Dylan Baker	2018-08-28	1	-1/+4
\| \| \| \| \| \| \| \| \|	Currently we run the script but don't actually load any files, even in a tarball where they exist. Fixes: 3218056e0eb375eeda470058d06add1532acd6d4 ("meson: Build i965 and dri stack") Reviewed-by: Eric Engestrom <[email protected]>
*	nir: Remove outdated comment	Caio Marcelo de Oliveira Filho	2018-08-28	1	-3/+0
\| \| \| \|	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add INTEL_fragment_shader_ordering support.	Kevin Rogovin	2018-08-28	2	-0/+2
\| \| \| \| \| \| \| \| \| \|	Adds support for INTEL_fragment_shader_ordering. We achieve fragment ordering by using the same instruction as for beginInvocationInterlockARB(), i.e. by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Plamena Manolova <[email protected]>
*	mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering	Kevin Rogovin	2018-08-28	8	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extension provides a new GLSL built-in function, beginFragmentShaderOrderingINTEL(), that guarantees (in the wording of the GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to the same xy window coordinates (and the same sample when per-sample shading is active) complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrier (instead of defining a critical section) and that can be called under arbitrary control flow from any function (in contrast, the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow). Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Plamena Manolova <[email protected]>
*	i965/gen6/xfb: handle case where transform feedback is not active	Andrii Simiklit	2018-08-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	When the SVBI Payload Enable is false, I guess the register R1.4, which contains the Maximum Streamed Vertex Buffer Index, is filled with zero, and the GS stops writing transform feedback when transform feedback is not active. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579 Signed-off-by: Andrii Simiklit <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	virgl: add debug-switch to output TGSI	Erik Faye-Lund	2018-08-28	3	-0/+5
\| \| \| \| \| \| \| \|	This is quite useful for debugging shader-transpiling issues in virglrenderer. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
*	virgl: introduce $VIRGL_DEBUG=verbose	Erik Faye-Lund	2018-08-28	4	-3/+18
\| \| \| \| \| \| \| \| \| \|	This adds an environment variable that can be used for driver-specific flags, as well as a flag for it to enable verbose output. While we're at it, quiet some overly chatty debug output by default. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
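A standalone sketch of how a comma-separated flag variable such as VIRGL_DEBUG might be parsed. The flag names, bit values and function are hypothetical. Mesa has its own debug-option helpers in util/u_debug.h, and this sketch deliberately does not use them. A caller would pass getenv("VIRGL_DEBUG").

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits. */
#define VIRGL_DEBUG_VERBOSE (1u << 0)
#define VIRGL_DEBUG_TGSI    (1u << 1)

struct debug_flag { const char *name; unsigned bit; };

static const struct debug_flag debug_flag_table[] = {
   { "verbose", VIRGL_DEBUG_VERBOSE },
   { "tgsi",    VIRGL_DEBUG_TGSI },
};

/* Parse a list such as "verbose,tgsi". Unknown names are ignored and
 * only whole-word matches count ("verbosex" sets nothing). */
static unsigned
parse_debug_flags(const char *s)
{
   unsigned result = 0;
   if (!s)
      return 0;
   while (*s) {
      size_t len = strcspn(s, ",");
      for (size_t i = 0; i < sizeof(debug_flag_table) / sizeof(debug_flag_table[0]); i++) {
         if (strlen(debug_flag_table[i].name) == len &&
             strncmp(s, debug_flag_table[i].name, len) == 0)
            result |= debug_flag_table[i].bit;
      }
      s += len;
      if (*s == ',')
         s++;
   }
   return result;
}
```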
*	virgl: replace fprintf-call with debug_printf	Erik Faye-Lund	2018-08-28	1	-1/+1
\| \| \| \| \| \| \| \|	This is the only direct call-site for fprintf in virgl; all other call-sites call debug_printf instead. So let's follow suit here. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
*	virgl: delete commented out fprintf-call	Erik Faye-Lund	2018-08-28	1	-1/+0
\| \| \| \| \| \| \|	This is just debug-cruft left over. Let's just get rid of it. Signed-off-by: Erik Faye-Lund <[email protected]> Reviewed-By: Gert Wollny <[email protected]>
*	intel/eu: print bytes instead of 32 bit hex value	Sagar Ghuge	2018-08-27	1	-17/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	INTEL_DEBUG=hex prints 32-bit hex values, and due to the CPU's endianness the byte order appears reversed. In order to disassemble binary files, print each byte instead of a 32-bit hex value. v2: Print blank spaces in order to vertically align the hex output of compacted instructions with that of uncompacted instructions. (Matt Turner) v3: Fix line wrap at correct length Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
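Printing each byte in memory order can be sketched as follows. The function name is hypothetical and the alignment padding from v2 is omitted. On a little-endian host, printing a uint32_t word with %08x shows its bytes reversed relative to the binary, which is the problem described above. Walking the bytes directly avoids that.

```c
#include <stdio.h>

/* Format 'size' bytes of an instruction as "xx xx xx ..." in memory
 * order. Output is truncated (but stays NUL-terminated) if out_size is
 * too small to hold every byte. */
static void
format_inst_bytes(const void *inst, size_t size, char *out, size_t out_size)
{
   const unsigned char *bytes = inst;
   size_t pos = 0;

   if (out_size == 0)
      return;
   out[0] = '\0';
   for (size_t i = 0; i < size; i++) {
      int n = snprintf(out + pos, out_size - pos, i ? " %02x" : "%02x", bytes[i]);
      if (n < 0 || (size_t)n >= out_size - pos)
         break;  /* no room for this byte */
      pos += (size_t)n;
   }
}
```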
*	intel: decoder: handle 0 sized structs	Lionel Landwerlin	2018-08-27	1	-4/+8
\| \| \| \| \| \| \| \| \| \|	Gen7.5 has a BLEND_STATE of size 0 which includes a variable length group. We did not deal with that very well, leading to an endless loop. Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544 Reviewed-by: Jason Ekstrand <[email protected]>
*	nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+	Rhys Perry	2018-08-27	2	-10/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060. total instructions in shared programs : 5787979 -> 5748677 (-0.68%) total gprs used in shared programs : 669901 -> 669373 (-0.08%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21064 (-0.02%) local shared gpr inst bytes helped 1 0 152 274 274 hurt 0 0 0 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
*	nv50/ir: optimize multiplication by 16-bit immediates into two xmads	Rhys Perry	2018-08-27	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than the usual three that would be created. total instructions in shared programs : 5796385 -> 5786560 (-0.17%) total gprs used in shared programs : 670103 -> 669968 (-0.02%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21068 (-0.45%) local shared gpr inst bytes helped 1 0 64 1040 1040 hurt 0 0 27 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
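A simplified C model of the arithmetic behind the XMAD commits. The xmad() helper is an assumption: a 16x16-bit unsigned multiply plus a 32-bit addend, with an optional 16-bit left shift of the product. As we understand it, the real GM107+ XMAD has more modes. A general 32-bit multiply needs three such ops (the cross products plus the low product, with the high-by-high term vanishing mod 2^32). A 16-bit immediate has no high half, so two suffice.

```c
#include <stdint.h>

/* Hypothetical XMAD model: (a * b) [<< 16] + c, all mod 2^32. */
static uint32_t
xmad(uint16_t a, uint16_t b, uint32_t c, int shift_product)
{
   uint32_t p = (uint32_t)a * b;
   if (shift_product)
      p <<= 16;
   return p + c;
}

/* General case, three xmads:
 * a*b = ((a_hi*b_lo + a_lo*b_hi) << 16) + a_lo*b_lo  (mod 2^32). */
static uint32_t
mul32_via_xmad(uint32_t a, uint32_t b)
{
   uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
   uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);
   uint32_t t = xmad(a_hi, b_lo, 0, 1);  /* (a_hi*b_lo) << 16 */
   t = xmad(a_lo, b_hi, t, 1);           /* + (a_lo*b_hi) << 16 */
   return xmad(a_lo, b_lo, t, 0);        /* + a_lo*b_lo */
}

/* 16-bit immediate, two xmads: x*imm = (x_hi*imm << 16) + x_lo*imm. */
static uint32_t
mul_by_imm16(uint32_t x, uint16_t imm)
{
   uint32_t t = xmad((uint16_t)(x >> 16), imm, 0, 1);
   return xmad((uint16_t)x, imm, t, 0);
}
```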
*	nv50/ir: optimize near power-of-twos into shladd	Rhys Perry	2018-08-27	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	total instructions in shared programs : 5819319 -> 5796385 (-0.39%) total gprs used in shared programs : 670571 -> 670103 (-0.07%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 318 1758 1758 hurt 0 0 63 0 0 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
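A sketch of the near-power-of-two rewrite, assuming shladd(a, s, b) computes (a << s) + b on 32-bit values. All helper names are hypothetical. Multiplying by 2^k + 1 becomes shladd(x, k, x), and multiplying by 2^k - 1 becomes shladd(x, k, -x). Whether the real pass handles the 2^k - 1 case with a negated addend is our assumption. Exact powers of two are left to the plain-shift path.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t shladd(uint32_t a, unsigned s, uint32_t b) { return (a << s) + b; }
static bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }
static unsigned log2u(uint32_t v) { unsigned k = 0; while (v >>= 1) k++; return k; }

/* Express x * imm as a single shladd when imm is 2^k + 1 or 2^k - 1.
 * Returns false (leaving *result untouched) otherwise. */
static bool
mul_near_pow2(uint32_t x, uint32_t imm, uint32_t *result)
{
   if (imm > 1 && is_pow2(imm - 1)) {            /* imm = 2^k + 1 */
      *result = shladd(x, log2u(imm - 1), x);
      return true;
   }
   if (imm < UINT32_MAX && is_pow2(imm + 1)) {   /* imm = 2^k - 1 */
      *result = shladd(x, log2u(imm + 1), 0u - x);  /* (x << k) - x */
      return true;
   }
   return false;
}
```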
*	nv50/ir: move a * b -> a << log2(b) code into createMul()	Rhys Perry	2018-08-27	1	-15/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this commit, OP_MAD is handled on nv50 too. This commit is also useful for later commits. Also, instead of creating a shladd, it relies on LateAlgebraicOpt to create one. This simplifies the code and helps shader-db slightly overall. total instructions in shared programs : 5820882 -> 5819319 (-0.03%) total gprs used in shared programs : 670595 -> 670571 (-0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 18 230 230 hurt 0 0 8 263 263 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
*	nv50/ir: optimize imul/imad to xmads	Rhys Perry	2018-08-27	2	-1/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This hits the shader-db numbers a good bit, though a few xmads are way faster than an imul or imad, and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter and less register-heavy instructions than the xmads. total instructions in shared programs : 5768871 -> 5820882 (0.90%) total gprs used in shared programs : 669919 -> 670595 (0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21164 (0.46%) local shared gpr inst bytes helped 0 0 38 0 0 hurt 1 0 365 3076 3076 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
*	gm107/ir: add support for OP_XMAD on GM107+	Rhys Perry	2018-08-27	3	-1/+71
\| \| \| \| \| \| \| \|	v4: make the immediate field 16 bits v5: don't ever emit h1 flags for immediates Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
*	nv50/ir: add preliminary support for OP_XMAD	Rhys Perry	2018-08-27	7	-5/+85
\| \| \| \| \| \| \| \|	v4: remove uint16_t(...) v4: don't allow immediates outside [0,65535] in insnCanLoad() Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Karol Herbst <[email protected]>
*	glsl/linker: Allow unused in blocks which are not declared on previous stage	vadym.shovkoplias	2018-08-27	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From Section 4.3.4 (Inputs) of the GLSL 1.50 spec: "Only the input variables that are actually read need to be written by the previous stage; it is allowed to have superfluous declarations of input variables." Fixes: * interstage-multiple-shader-objects.shader_test v2: Update comment in ir.h since the usage of "used" field has been extended. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247 Signed-off-by: Vadym Shovkoplias <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	nir: Pull block_ends_in_jump into nir.h	Jason Ekstrand	2018-08-27	3	-23/+13
\| \| \| \| \| \| \|	We had two different implementations in different files. May as well have one and put it in nir.h. Reviewed-by: Timothy Arceri <[email protected]>