mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	st/mesa: completely rewrite state atoms	Marek Olšák	2016-07-30	33	-516/+381
	The goal is to do this in st_validate_state:
	    while (dirty)
	        atoms[u_bit_scan(&dirty)]->update(st);
	That implies that atoms can't specify which flags they consume. There is exactly one ST_NEW_* flag for each atom (58 flags in total). There are macros that combine multiple flags into one for easier use.
	All _NEW_* flags are translated into ST_NEW_* flags in st_invalidate_state. st/mesa doesn't keep the _NEW_* flags after that.
	torcs is 2% faster between the previous patch and the end of this series.
	v2: add st_atom_list.h to Makefile.sources
	Reviewed-by: Nicolai Hähnle <[email protected]>
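The dispatch loop quoted in this message can be sketched in isolation. This is a minimal illustration, not the st/mesa code: `sketch_state`, `sketch_atom`, `sketch_bit_scan`, and the three atoms are hypothetical stand-ins, and a 64-bit mask is assumed because the message mentions 58 flags.

```c
#include <stdint.h>

/* Hypothetical stand-in for the state tracker context: it only records
 * the order in which atoms were updated. */
struct sketch_state {
   int order[8];
   int count;
};

/* Hypothetical atom: one update callback, one dirty bit per atom. */
struct sketch_atom {
   void (*update)(struct sketch_state *st);
};

static void update_blend(struct sketch_state *st)    { st->order[st->count++] = 0; }
static void update_scissor(struct sketch_state *st)  { st->order[st->count++] = 1; }
static void update_viewport(struct sketch_state *st) { st->order[st->count++] = 2; }

static const struct sketch_atom atom_blend    = { update_blend };
static const struct sketch_atom atom_scissor  = { update_scissor };
static const struct sketch_atom atom_viewport = { update_viewport };

/* Bit i of the dirty mask corresponds to atoms[i]. */
static const struct sketch_atom *const atoms[] = {
   &atom_blend, &atom_scissor, &atom_viewport,
};

/* Return the index of the lowest set bit and clear it.
 * Precondition: *mask != 0. */
static int sketch_bit_scan(uint64_t *mask)
{
   int i = 0;
   while (((*mask >> i) & 1u) == 0)
      i++;
   *mask &= ~((uint64_t)1 << i);
   return i;
}

/* Run exactly the atoms whose bit is set, lowest bit first.
 * Precondition: only bits with a matching atoms[] entry are set. */
static void sketch_validate_state(struct sketch_state *st, uint64_t dirty)
{
   while (dirty)
      atoms[sketch_bit_scan(&dirty)]->update(st);
}
```

With a dirty mask of 0x5, only the first and third atoms run, in that order. The loop never inspects which flags an atom consumes, which is why the scheme needs a one-to-one mapping between flags and atoms.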
*	st/mesa: remove st_tracked_state::name	Marek Olšák	2016-07-30	20	-58/+0
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	st/mesa: remove atom debugging code	Marek Olšák	2016-07-30	1	-67/+3
	This won't be needed after the rewrite.
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	i965: Fix move_interpolation_to_top() pass.	Kenneth Graunke	2016-07-29	1	-21/+29
	The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae was entirely broken. A missing "break" made the load_interpolated_input case always fall through to "default" and hit a "continue", making it not actually move any load_interpolated_input intrinsics at all. It would only move the simple load_barycentric_* intrinsics, which don't emit any code anyway, making it basically useless.
	The initial version I sent of the pass worked, but I apparently failed to verify that the simplified version in v2 actually worked.
	With the obvious fix applied (so we actually tried to move load_interpolated_input intrinsics), I discovered a second bug: we weren't moving the offset SSA def to the top, breaking SSA validation.
	The new version of the pass actually moves load_interpolated_input intrinsics and all their dependencies, as intended.
	Papers over GPU hangs on Ivybridge and Baytrail caused by the recent NIR FS input rework by restoring the old behavior. (I'm not honestly sure why they hang with PLN not at the top.)
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
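The first bug described here, a missing "break" that falls through to a "continue", can be reproduced in a few lines. This is a simplified sketch and not the i965 pass: the enum values and both counting functions are hypothetical, and "moving" an instruction is reduced to counting it.

```c
/* Hypothetical, reduced set of instruction kinds. */
enum sketch_intrinsic {
   SKETCH_LOAD_INTERPOLATED_INPUT,
   SKETCH_LOAD_BARYCENTRIC_PIXEL,
   SKETCH_OTHER,
};

/* Buggy variant: counts how many instructions would be moved. */
static int sketch_count_moved_buggy(const enum sketch_intrinsic *instrs, int n)
{
   int moved = 0;
   for (int i = 0; i < n; i++) {
      switch (instrs[i]) {
      case SKETCH_LOAD_BARYCENTRIC_PIXEL:
         break;
      case SKETCH_LOAD_INTERPOLATED_INPUT:
         /* BUG: the "break" is missing, so this case reaches "continue". */
         /* falls through */
      default:
         continue; /* skips the "move" below for this instruction */
      }
      moved++;
   }
   return moved;
}

/* Fixed variant: the interpolated input case leaves the switch normally. */
static int sketch_count_moved_fixed(const enum sketch_intrinsic *instrs, int n)
{
   int moved = 0;
   for (int i = 0; i < n; i++) {
      switch (instrs[i]) {
      case SKETCH_LOAD_BARYCENTRIC_PIXEL:
         break;
      case SKETCH_LOAD_INTERPOLATED_INPUT:
         break;
      default:
         continue;
      }
      moved++;
   }
   return moved;
}
```

For a stream containing two interpolated input loads, one barycentric load, and one unrelated instruction, the buggy variant moves only the barycentric load, while the fixed variant moves all three loads.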
*	st_glsl_to_tgsi: only skip over slots of an input array that are present	Nicolai Hähnle	2016-07-28	1	-1/+5
	When an application declares varying arrays but does not actually do any indirect indexing, some array indices may end up unused in the consuming shader, so the number of input slots that correspond to the array ends up less than the array_size.
	Cc: [email protected]
	Reviewed-by: Marek Olšák <[email protected]>
*	i965: remove unnecessary null check	Timothy Arceri	2016-07-28	1	-4/+1
	We would have hit a segfault already if this could be null.
	Fixes Coverity warning spotted by Matt.
	Reviewed-by: Matt Turner <[email protected]>
*	vbo: Fix handling of POS/GENERIC0 attributes.	Mathias Fröhlich	2016-07-27	1	-3/+16
	In case of split primitives we need to restore the original setting of the vtx.attrsz array to make immediate mode attribute array tracking work.
	v2: Use bool instead of boolean.
	Signed-off-by: Mathias Fröhlich <[email protected]>
	Reviewed-by: Brian Paul <[email protected]>
	Tested-by: Brian Paul <[email protected]>
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96950
*	mesa: standardize naming Mesa3D, MESA -> Mesa	Vedran Miletić	2016-07-26	1	-1/+1
	Signed-off-by: Vedran Miletić <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	mesa: Make MESA_SHADER_CAPTURE_PATH skip shaders with Name == -1.	Kenneth Graunke	2016-07-26	1	-1/+1
	Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta shaders that we don't really want to capture.
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	mesa: Avoid aliasing violation in uniform_query.cpp.	Matt Turner	2016-07-26	1	-14/+31
	Reviewed-by: Brian Paul <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
*	mesa: Avoid aliasing violation in FXT1.	Matt Turner	2016-07-26	1	-2/+2
	Reviewed-by: Brian Paul <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
*	swrast: Avoid aliasing violation.	Matt Turner	2016-07-26	1	-2/+2
	Reviewed-by: Brian Paul <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
*	glsl: Separate overlapping sentinel nodes in exec_list.	Matt Turner	2016-07-26	3	-3/+3
	I do appreciate the cleverness, but unfortunately it prevents a lot more cleverness in the form of additional compiler optimizations brought on by -fstrict-aliasing.
	No difference in OglBatch7 (n=20).
	Co-authored-by: Davin McCall <[email protected]>
	Reviewed-by: Ian Romanick <[email protected]>
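A list with two separate sentinel nodes, as the commit title describes, might look like the sketch below. The type and function names are hypothetical and the layout is an assumption based on the title alone; it is not copied from exec_list. The point is that each sentinel is a complete node of its own, so no pointer field is read through two different struct types.

```c
#include <stddef.h>

struct sketch_node {
   struct sketch_node *next;
   struct sketch_node *prev;
};

/* Two distinct sentinels: no field of one overlaps a field of the other. */
struct sketch_list {
   struct sketch_node head_sentinel;
   struct sketch_node tail_sentinel;
};

static void sketch_list_init(struct sketch_list *l)
{
   l->head_sentinel.next = &l->tail_sentinel;
   l->head_sentinel.prev = NULL;
   l->tail_sentinel.next = NULL;
   l->tail_sentinel.prev = &l->head_sentinel;
}

/* Insert n just before the tail sentinel. */
static void sketch_list_push_tail(struct sketch_list *l, struct sketch_node *n)
{
   n->next = &l->tail_sentinel;
   n->prev = l->tail_sentinel.prev;
   n->prev->next = n;
   l->tail_sentinel.prev = n;
}

/* Count real nodes; the tail sentinel is the only node whose next is NULL. */
static int sketch_list_length(const struct sketch_list *l)
{
   int len = 0;
   for (const struct sketch_node *n = l->head_sentinel.next;
        n->next != NULL; n = n->next)
      len++;
   return len;
}
```

The cost of this layout is one extra pointer per list compared with a scheme that shares a field between the two sentinels.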
*	i965/miptree: Stop multiplying cube depth by 6 in HiZ calculations	Jason Ekstrand	2016-07-26	1	-17/+2
	intel_mipmap_tree::logical_depth0 is now in number of 2D slices so we no longer need to be multiplying by 6.
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Anuj Phogat <[email protected]>
	Cc: "12.0" <[email protected]>
*	i965/miptree/isl: Stop multiplying depth by 6 for cubes	Jason Ekstrand	2016-07-26	1	-5/+0
	Now that the logical_depth0 field is in number of 2D slices, we don't need to be multiplying by 6 when creating the surface. It wasn't hurting anything primarily because we get the actual length from the view which was already handling it correctly.
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Anuj Phogat <[email protected]>
*	i965/blorp/gen8: Stop multiplying depth by 6 for cubes	Jason Ekstrand	2016-07-26	1	-4/+1
	intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need for us to multiply by 6 when we go to fill out a blorp surface state.
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Anuj Phogat <[email protected]>
*	main: memcpy larger chunks in _mesa_propagate_uniforms_to_driver_storage	Nils Wallménius	2016-07-25	1	-6/+23
	When possible, do the memcpy on larger blocks. This reduces cycles spent in _mesa_propagate_uniforms_to_driver_storage from 1.51% to 0.62% according to perf during the Unigine Heaven benchmark. It did not affect the framerate of the benchmark. The system used for testing was an i5 6600K with a Radeon R9 380.
	Piglit hangs randomly on this system both with and without the patch so I could not make a comparison.
	v2: fixed whitespace
	Signed-off-by: Nils Wallménius <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
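The idea of copying larger blocks when possible can be sketched as follows. This is an illustration under assumptions, not the code from the patch: `sketch_copy_vectors` and its parameters are hypothetical, the source is assumed to be tightly packed, and the return value exists only to make the number of memcpy calls visible.

```c
#include <stddef.h>
#include <string.h>

/* Copy `count` vectors of `vec_bytes` bytes each from a tightly packed
 * source into a destination whose vectors start `dst_stride` bytes apart.
 * Returns the number of memcpy calls issued. */
static int sketch_copy_vectors(unsigned char *dst, size_t dst_stride,
                               const unsigned char *src, size_t vec_bytes,
                               size_t count)
{
   if (dst_stride == vec_bytes) {
      /* Destination is packed the same way: one large copy is enough. */
      memcpy(dst, src, vec_bytes * count);
      return 1;
   }

   /* Layouts differ: fall back to one copy per vector. */
   for (size_t i = 0; i < count; i++)
      memcpy(dst + i * dst_stride, src + i * vec_bytes, vec_bytes);
   return (int)count;
}
```

When the strides match, the per-vector loop collapses into a single call. That kind of saving is consistent with the message's report of fewer cycles in the function with no change in framerate.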
*	glsl: reuse main extension table to appropriately restrict extensions	Ilia Mirkin	2016-07-23	5	-26/+62
	Previously we were only restricting based on ES/non-ES-ness and whether the overall enable bit had been flipped on. However we have been adding more fine-grained restrictions, such as based on compat profiles, as well as specific ES versions. Most of the time this doesn't matter, but it can create awkward situations and duplication of logic.
	Here we separate the main extension table into a separate object file, linked to the glsl compiler, which makes use of it with a custom function which takes the ES-ness of the shader into account (thus allowing desktop shaders to properly use ES extensions that would otherwise have been disallowed.)
	We can also now use this logic to generate #define's for all supported extensions automatically, removing the duplicate (and often inaccurate) list in glcpp.
	The effect of this change should be nil in most cases. However in some situations, extensions like GL_ARB_gpu_shader5 which were formerly available in compat contexts on the GLSL side of things will now become inaccessible.
	This regresses two ES CTS tests:
	    ES3-CTS.shaders.shader_integer_mix.define
	    ES31-CTS.shader_integer_mix.define
	however that is due to them using #version 100 instead of 300 es. As the extension is only defined for ES3, I believe this is the correct behavior.
	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]> (v2)
	v2 -> v3: integrate glcpp defines into the same mechanism
*	gallium: split transfer_inline_write into buffer and texture callbacks	Marek Olšák	2016-07-23	2	-9/+6
	to reduce the call indirections with u_resource_vtbl.
	The worst call tree you could get was:
	- u_transfer_inline_write_vtbl
	- u_default_transfer_inline_write
	- u_transfer_map_vtbl
	- driver_transfer_map
	- u_transfer_unmap_vtbl
	- driver_transfer_unmap
	That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites.
	The new interface is:
	    pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
	    pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride)
	v2: fix whitespace, correct ilo's behavior
	Reviewed-by: Nicolai Hähnle <[email protected]>
	Acked-by: Roland Scheidegger <[email protected]>
*	mesa: Don't call GenerateMipmap if Width or Height == 0.	Kenneth Graunke	2016-07-22	1	-0/+5
	One of the WebGL 2.0 conformance tests is trying to call glGenerateMipmaps with a width and height of 0. With the meta implementation, this generates a "framebuffer attachment incomplete" status, and falls back to the CPU path, calling MapTextureImage. Except that there's no actual texture to map, and we assert fail.
	There's no work to do in this case. The test expects it to succeed, so just return early with no error and avoid hassling the driver.
	Cc: [email protected]
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96911
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Anuj Phogat <[email protected]>
*	i965: Get rid of the do_lower_unnormalized_offsets pass	Jason Ekstrand	2016-07-22	4	-109/+0
	We can do this in NIR now. No need to keep a GLSL pass lying around for it.
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
	Cc: "12.0" <[email protected]>
*	i965/nir: Enable NIR lowering of txf and rect offsets	Jason Ekstrand	2016-07-22	1	-0/+2
	This fixes the following piglit tests on gen6+:
	    tex-miplevel-selection textureProjGradOffset 2DRect
	    tex-miplevel-selection textureGradOffset 2DRect
	    tex-miplevel-selection textureGradOffset 2DRectShadow
	    tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4
	    tex-miplevel-selection textureProjGradOffset 2DRectShadow
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
	Cc: "12.0" <[email protected]>
*	gallium: add PIPE_FLUSH_DEFERRED	Marek Olšák	2016-07-22	1	-1/+1
	There are 2 uses:
	- Asynchronous flushing for multithreaded drivers.
	- Return a fence without flushing (mid-command-buffer fence). The driver can defer flushing until fence_finish is called.
	This is required to make Bioshock Infinite faster, which creates 1000 fences (flushes) per frame.
	Reviewed-by: Edward O'Callaghan <[email protected]>
	Reviewed-by: Rob Clark <[email protected]>
*	i965: fix varying output setup	Timothy Arceri	2016-07-23	1	-1/+1
	Since 7f53fead5c we treat every location as using all four components so we only need special handling for doubles when they cross multiple locations.
	This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations where the outputs array would overflow when a dmat2 was stored at the max varying location, i.e. 30.
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	mesa: Add GL_BGRA_EXT to the list of GenerateMipmap internal formats.	Kenneth Graunke	2016-07-21	1	-0/+5
	The GL_EXT_texture_format_BGRA8888 extension specification defines a GL_BGRA_EXT unsized internal format (which is a little odd - usually BGRA is a pixel transfer format). The extension is written against the ES 1.0 specification, so it's a little hard to map, but I believe it's effectively adding it to the table used here, so we should allow it here as well.
	Note that GL_EXT_texture_format_BGRA8888 is always enabled (dummy_true), so we don't need to check if it's enabled here.
	This fixes mipmap generation in Skia and ChromeOS.
	Signed-off-by: Kenneth Graunke <[email protected]>
	References: https://bugs.chromium.org/p/chromium/issues/detail?id=630371
	Reviewed-by: Ian Romanick <[email protected]>
	Reported-by: Stéphane Marchesin <[email protected]>
	Cc: [email protected]
*	i965: Fix "operation operation" in comment.	Kenneth Graunke	2016-07-21	1	-1/+1
	From the redundant redundant department.
	Reported-by: Michael Schellenberger Costa <[email protected]>
*	i965: Fix shared atomic intrinsics to pay attention to base.	Kenneth Graunke	2016-07-21	1	-1/+12
	Cc: "12.0" <[email protected]>
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Timothy Arceri <[email protected]>
*	i965: Include VUE handles for GS with invocations > 1.	Kenneth Graunke	2016-07-21	1	-1/+1
	We always resort to the pull model for instanced GS inputs. So, we'd better include the VUE handles, or else we can't actually pull anything.
	Ian reports that on his branch with OES_geometry_shader enabled, this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests:
	- instanced.draw_2_instances_geometry_2_invocations
	- instanced.draw_2_instances_geometry_8_invocations
	- instanced.draw_4_instances_geometry_2_invocations
	- instanced.draw_4_instances_geometry_8_invocations
	- instanced.draw_8_instances_geometry_2_invocations
	- instanced.draw_8_instances_geometry_8_invocations
	- instanced.geometry_2_invocations
	- instanced.geometry_32_invocations
	- instanced.geometry_8_invocations
	- instanced.geometry_max_invocations
	- instanced.geometry_output_different_2_invocations
	- instanced.geometry_output_different_32_invocations
	- instanced.geometry_output_different_8_invocations
	- instanced.geometry_output_different_max_invocations
	- instanced.invocation_output_vary_by_attribute
	- instanced.invocation_output_vary_by_texture
	- instanced.invocation_output_vary_by_uniform
	- query.primitives_generated_instanced
	Cc: [email protected]
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Tested-by: Ian Romanick <[email protected]>
*	i965: print error messages if gs fails to compile	Timothy Arceri	2016-07-21	1	-0/+6
	We do this for all other stages.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: enable GL4.4 for Gen8+	Timothy Arceri	2016-07-21	2	-2/+2
	Acked-by: Kenneth Graunke <[email protected]>
*	i965: enable ARB_enhanced_layouts for gen6+	Timothy Arceri	2016-07-21	1	-1/+1
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965/vec4: add packing support for tcs load outputs	Timothy Arceri	2016-07-21	3	-7/+17
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965/vec4: add support for packing tes inputs	Timothy Arceri	2016-07-21	1	-4/+10
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Alejandro Piñeiro <[email protected]>
*	i965/vec4: add support for packing tcs outputs	Timothy Arceri	2016-07-21	1	-0/+7
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965/vec4: support packing tcs inputs	Timothy Arceri	2016-07-21	2	-2/+7
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965/vec4: add component packing for gs	Timothy Arceri	2016-07-21	1	-0/+2
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965/vec4: add support for packing vs/gs/tes outputs	Timothy Arceri	2016-07-21	3	-4/+45
	Here we create a new output_generic_reg array with the ability to store the dst_reg for each component of user defined varyings. This is needed as the previous code only stored the dst_reg based on the varying location which meant packed varyings would overwrite each other.
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Alejandro Piñeiro <[email protected]>
*	i965/vec4: add support for packing inputs	Timothy Arceri	2016-07-21	1	-0/+2
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965: add helper for creating packing writemask	Timothy Arceri	2016-07-21	1	-0/+7
	For example where n=3 first_component=1 this will give us 0xE (WRITEMASK_YZW).
	V2: Add assert to check first component is <= 4 (Suggested by Ken)
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
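The worked example in this message (n=3, first_component=1 gives 0xE) follows from a mask of n low bits shifted up by first_component. A minimal sketch, with a hypothetical function name and an assertion that is an assumption modelled on the v2 note rather than the exact check in the patch:

```c
#include <assert.h>

/* Build a 4-bit XYZW writemask covering `n` consecutive components,
 * starting at `first_component` (0 = X, 1 = Y, 2 = Z, 3 = W). */
static unsigned sketch_writemask_for_packing(unsigned n,
                                             unsigned first_component)
{
   /* The selected components must fit inside one vec4. */
   assert(n >= 1 && first_component + n <= 4);

   /* (1 << n) - 1 sets the n lowest bits; the shift moves them into place. */
   return ((1u << n) - 1u) << first_component;
}
```

For n=3 and first_component=1: (1 << 3) - 1 is 0x7, and 0x7 << 1 is 0xE, which selects Y, Z, and W.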
*	i965: add helpers for creating component layout swizzle	Timothy Arceri	2016-07-21	1	-0/+3
	This will be used to swizzle components to the beginning or end of the vector based on the component layout qualifier and whether we are doing a load or store.
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Edward O'Callaghan <[email protected]>
*	i965: enable ARB_enhanced_layouts for gen8+	Timothy Arceri	2016-07-21	1	-0/+1
	Acked-by: Edward O'Callaghan <[email protected]>
*	i965: add component packing support for load_output intrinsics	Timothy Arceri	2016-07-21	1	-5/+33
	Here we use the component qualifier (which is the first component) as an offset when loading output varyings.
	Reviewed-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: enable component packing for vs and fs	Timothy Arceri	2016-07-21	4	-25/+16
	Rather than trying to work out the total number of components used at a location we simply treat all outputs as vec4s. This removes the need for complex code looping over varyings to match packed locations and the need for storing the total number of components used at each location.
	Reviewed-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: bring back type_size_vec4_times_4()	Timothy Arceri	2016-07-21	2	-0/+14
	We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s.
	Reviewed-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move VS load_input handling to nir_emit_vs_intrinsic().	Kenneth Graunke	2016-07-20	1	-31/+30
	TCS/TES/GS and now FS all handle these in stage-specific functions. CS don't have inputs, so VS was the only one left using this code. Move it to the VS-specific function for clarity.
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.	Kenneth Graunke	2016-07-20	4	-10/+0
	We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload.
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Chris Forbes <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Rewrite FS input handling to use the new NIR intrinsics.	Kenneth Graunke	2016-07-20	5	-341/+270
	This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed.
	This should unblock Timothy's work on GL_ARB_enhanced_layouts.
	Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code.
	Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster.
	On Broadwell:
	    total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
	    instructions in affected programs: 145902 -> 145721 (-0.12%)
	    helped: 422
	    HURT: 209
	    total spills in shared programs: 2849 -> 2899 (1.76%)
	    spills in affected programs: 760 -> 810 (6.58%)
	    helped: 0
	    HURT: 10
	    total fills in shared programs: 3910 -> 3950 (1.02%)
	    fills in affected programs: 617 -> 657 (6.48%)
	    helped: 0
	    HURT: 10
	    LOST: 3
	    GAINED: 3
	The differences mostly appear to be slight changes in MOVs.
	v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.)
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Chris Forbes <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
*	i965: Move load_interpolated_input/barycentric_* intrinsics to the top.	Kenneth Graunke	2016-07-20	1	-0/+64
	Currently, i965 interpolates all FS inputs at the top of the program. This has advantages and disadvantages, but I'd like to keep that policy while reworking this code. We can consider changing it independently.
	The next patch will make the compiler generate PLN instructions "on the fly", when it encounters an input load intrinsic, rather than doing it for all inputs at the start of the program.
	To emulate this behavior, we introduce an ugly pass to move all NIR load_interpolated_input and payload-based (not interpolator message) load_barycentric_* intrinsics to the shader's start block.
	This helps avoid regressions in shader-db for cases such as:
	    if (...) {
	        ...load some input...
	    } else {
	        ...load that same input...
	    }
	which CSE can't handle, because there's no dominance relationship between the two loads. Because the start block dominates all others, we can CSE all inputs and emit PLNs exactly once, as we did before.
	Ideally, global value numbering would eliminate these redundant loads, while not forcing them all the way to the start block. When that lands, we should consider dropping this hacky pass.
	Again, this pass currently does nothing, as i965 doesn't generate these intrinsics yet. But it will shortly, and I figured I'd separate this code as it's relatively self-contained.
	v2: Dramatically simplify pass - instead of creating new instructions, just remove/re-insert their list nodes (suggested by Jason Ekstrand).
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Chris Forbes <[email protected]> [v1]
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add a pass to demote sample interpolation intrinsics.	Kenneth Graunke	2016-07-20	1	-0/+44
	When working with a non-multisampled render target, asking for "sample" interpolation locations doesn't make sense. We demote them to centroid.
	In a couple of patches, brw_compute_barycentric_modes will begin looking at these intrinsics to determine the barycentric modes. fs_visitor also will use them to code-generate pixel interpolator messages or payload references. Handling the "but what if it's not MSAA?" logic ahead of time in a NIR pass simplifies things and prevents duplicated logic.
	This patch doesn't actually do anything useful yet as we don't generate these intrinsics. I decided to keep it separate as it's self-contained, in the hopes of shrinking the "convert everything" patch for reviewers.
	Signed-off-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Chris Forbes <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Stop munging cube array lengths by 6	Jason Ekstrand	2016-07-20	5	-38/+11
	From the Sky Lake PRM:
	    "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port Surfaces, the range of this field is [0,340], indicating the number of cube array elements (equal to the number of underlying 2D array elements divided by 6). For other surfaces, this field must be zero."
	In other words, the depth field for cube maps is in number of cubes not number of 2-D slices so we need to divide by 6. ISL will do this correctly for us assuming that we provide it with the correct array bounds which it expects to be in 2-D slices. It appears as if we've been doing this wrong ever since we first added cube map arrays for Sandy Bridge and the change to ISL made things slightly worse. While we're at it, we now need to remove the shader hacks we've always done since they were only needed because we were setting the depth field six times too large.
	v2: Fix the vec4 backend as well (not sure how I missed this).
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
	Reviewed-by: Chris Forbes <[email protected]>
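The unit conversion described in the PRM quote is a division by the six faces of a cube. A minimal sketch with a hypothetical helper name; it illustrates only the arithmetic, not how ISL or i965 program the surface state field:

```c
#include <assert.h>

/* Convert a cube (array) surface's length from 2D slices to whole cubes. */
static unsigned sketch_cube_elements(unsigned num_2d_slices)
{
   /* Every cube contributes exactly 6 faces, so the slice count of a
    * cube surface is assumed to be a multiple of 6. */
   assert(num_2d_slices % 6 == 0);
   return num_2d_slices / 6;
}
```

A cube map array stored as 12 2D slices therefore has 2 cube elements. Passing the slice count where a cube count is expected yields a value six times too large, which is the error the message describes.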