mesa.git

	Commit message	Author	Age	Files	Lines
*	dri/common: remove unused libdri_test_stubs.la	Emil Velikov	2016-06-13	3	-104/+1
\| \| \| \| \| \| \| \| \| \|	... and associated file(s). No longer needed since commit 057259655e7 ("i965: Don't link libmesa or libdri_test_stubs into tests") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Use the correct number of threads for compute shaders.	Kenneth Graunke	2016-06-12	1	-1/+3
	We were programming the number of threads per subslice, when we should have been programming the total number of threads on the GPU as a whole. Thanks to Curro and Jordan for helping track this down!
	On Skylake GT3e:
	- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
	- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
	- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
	On Broadwell GT2:
	- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
	- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
	- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/- 0.255654% (n=25).
	On Haswell GT3e:
	- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode) by roughly 1.10x.
	- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
	- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/- 0.432771% (n=64).
	On Ivybridge GT2:
	- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode) by roughly 1.03x.
	- Improves performance in Synmark's Gl43CSDof by roughly 1.25x.
	- No change in Synmark's Gl43CSCloth (n=28).
	Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Assert that the scratch spaces are in range.	Kenneth Graunke	2016-06-12	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I don't know that anything actually guarantees this, but if we exceed the limits, we may end up overflowing and trashing random buffers that happen to be nearby in the VMA space, leading to rendering corruption, hangs, or worse. We should really fix this properly. However, the pitfall has existed for ages, so for now we should at least detect it. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Fix CS scratch size calculations on Ivybridge and Baytrail.	Kenneth Graunke	2016-06-12	2	-2/+10
\| \| \| \| \| \| \| \| \|	These are linear, not powers of two, and much more limited. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Fix Haswell CS per-thread scratch space encoding.	Kenneth Graunke	2016-06-12	2	-3/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most scratch stages use power of two sizes, in kilobytes, where 0 means 1kB. But compute shaders on Haswell have a minimum of 2kB, and use a representation where 0 = 2kB. This meant that we were effectively telling the hardware to allocate each thread twice as much space as we meant to, while simultaneously not allocating that much space in the buffer, leading to overflows. Note that the existing code is completely wrong for Ivybridge, but that will take additional work to sort out, so I've left it as is for now. A subsequent commit will take care of that. Together with the previous patches, this fixes rendering corruption on Synmark's Gl43CSDof on Haswell. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
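	The two encodings described above can be sketched in C as follows. This is only an illustration of the arithmetic, not the driver's code: the function names are hypothetical, and the clamp to 2kB in the Haswell variant is an assumption drawn from the stated minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a non-zero power-of-two value. */
static inline unsigned log2_pow2(uint32_t v)
{
   unsigned n = 0;
   assert(v != 0 && (v & (v - 1)) == 0);
   while (v > 1) {
      v >>= 1;
      n++;
   }
   return n;
}

/* Most stages: power-of-two size in kB, field value 0 means 1kB. */
static inline unsigned encode_scratch_generic(uint32_t bytes)
{
   assert(bytes >= 1024);
   return log2_pow2(bytes) - 10;
}

/* Haswell compute: minimum of 2kB, field value 0 means 2kB.
 * Clamping smaller requests up to 2kB is an assumption. */
static inline unsigned encode_scratch_hsw_cs(uint32_t bytes)
{
   if (bytes < 2048)
      bytes = 2048;
   return log2_pow2(bytes) - 11;
}
```

	With these helpers, a 2kB request encodes as 1 in the generic form but as 0 in the Haswell compute form, which is the off-by-one (a factor of two in size) that the commit message describes.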
*	i965: Account for poor address calculations in Haswell CS scratch size.	Kenneth Graunke	2016-06-12	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Curro figured this out by investigating the simulator. Apparently there's also a workaround in the Windows driver. I'm not sure it's actually documented anywhere. We were underallocating the scratch buffer by a factor of 128/70. v2: Rename threads_per_subslice to scratch_ids_per_subslice (suggested by Jordan Justen). Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Allocate scratch space for the maximum number of compute threads.	Kenneth Graunke	2016-06-12	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were allocating enough space for the number of threads per subslice, when we should have been allocating space for the number of threads in the entire GPU. Even though we currently run with a reduced thread count (due to a bug), we might still overflow the scratch buffer because the address calculation is based on the FFTID, which can depend on exactly which threads, EUs, and threads are executing. We need to allocate enough for every possible thread that could run. Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+. Earlier platforms need additional bug fixes. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
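	A minimal sketch of the sizing rule, assuming a hypothetical device-info struct (the field names here are illustrative and are not taken from the driver): the buffer has to cover every thread on the GPU, not just one subslice's worth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device description; field names are assumptions. */
struct gpu_info {
   unsigned subslice_total;          /* subslices on the whole GPU */
   unsigned cs_threads_per_subslice; /* compute threads per subslice */
};

/* Scratch bytes needed so that any thread that could run has its own slot. */
static inline uint64_t cs_scratch_bytes(const struct gpu_info *info,
                                        uint32_t per_thread_bytes)
{
   assert(info);
   uint64_t total_threads =
      (uint64_t)info->subslice_total * info->cs_threads_per_subslice;
   return total_threads * per_thread_bytes;
}
```

	The numbers fed into such a function are platform specific; the sketch only shows that the multiplier is the GPU-wide thread count.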
*	i965: Set subslice_total on Gen7/7.5 platforms.	Kenneth Graunke	2016-06-12	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	We'll use this for compute shader thread counts and scratch space calculations shortly. Note that subslices are referred to as "half slices" on Ivybridge. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Fix shared local memory size for Gen9+.	Kenneth Graunke	2016-06-12	2	-9/+35
	Skylake changes the representation of shared local memory size:
	 Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
	 -------------------------------------------------------------------
	 Gen7-8 |    0 | none | none |    1 |    2 |     4 |     8 |    16 |
	 -------------------------------------------------------------------
	 Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |
	The old formula would substantially underallocate the amount of space. This fixes GPU hangs on Skylake when running with full thread counts.
	v2: Fix the Vulkan driver too, use a helper function, and fix the table in the comments and commit message.
	Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
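	The size table in this commit message can be expressed as a small helper. The sketch below is not the helper the commit added; the function name is hypothetical, and rounding 1kB and 2kB requests up to 4kB on Gen7-8 (where the table lists "none") is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for the shared local memory size field. */
static inline unsigned encode_slm_size(unsigned gen, uint32_t bytes)
{
   if (bytes == 0)
      return 0;
   assert(bytes <= 64 * 1024);

   if (gen >= 9) {
      /* Gen9+: 1kB -> 1, 2kB -> 2, 4kB -> 3, ..., 64kB -> 7. */
      uint32_t size = 1024;
      unsigned encoded = 1;
      while (size < bytes) {
         size <<= 1;
         encoded++;
      }
      return encoded;
   } else {
      /* Gen7-8: size in 4kB units, power of two: 4kB -> 1, ..., 64kB -> 16.
       * Smaller requests are rounded up to 4kB (assumption). */
      uint32_t size = 4096;
      while (size < bytes)
         size <<= 1;
      return size / 4096;
   }
}
```

	Passing a Gen9 size through the Gen7-8 branch shows the underallocation: 64kB would be written as 16, which is outside the Gen9 range of 0 to 7.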
*	i965/gen9: Don't change halign and valign to fit in fast copy blit	Anuj Phogat	2016-06-09	1	-4/+2
\| \| \| \| \| \| \| \| \|	An update in graphics specs has deleted the halign and valign fields from XY_FAST_COPY_BLT command. See mesa commit 97f0f91. Cc: Ben Widawsky <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
*	i965: Emit surface states for extra planes prior to gen8	Jason Ekstrand	2016-06-08	2	-0/+18
\| \| \| \| \| \| \| \| \| \|	When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8 but not for gen7 or earlier. It all works, we just need to emit the states for the extra planes. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: "12.0" <[email protected]>
*	glsl/types: rename is_dual_slot_double to is_dual_slot_64bit.	Dave Airlie	2016-06-09	1	-2/+2
\| \| \| \| \| \| \|	In the future int64 support will have the same requirements. Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965: Integrate precise trig into configuration infrastructure	Gurchetan Singh	2016-06-07	4	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this change, to enable precise SIN and COS instructions on Intel hardware, one can put <option name="precise_trig" value="true"/> in the proper drirc file. V2: Make option name more generic Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Stephane Marchesin <[email protected]>
*	i965/gen8: fix cull distance emission for tessellation shaders.	Dave Airlie	2016-06-07	1	-3/+5
\| \| \| \| \| \| \| \| \| \|	This fixes some cases of: GL45-CTS.cull_distance.functional on Skylake. Reviewed-by: Chris Forbes <[email protected]> Cc: "12.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	mesa: hook up core bits of GL_ARB_shader_group_vote	Ilia Mirkin	2016-06-06	1	-0/+5
\| \| \| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
*	i965/gs/scalar: Fix load input for doubles	Samuel Iglesias Gonsálvez	2016-06-06	1	-18/+54
\| \| \| \| \| \|	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Cc: "12.0" <[email protected]>
*	i965/fs: fix offset when loading double vector input varyings	Samuel Iglesias Gonsálvez	2016-06-06	1	-1/+21
	When we are not packing a double input varying, we might need to read its data at an offset that is not 64-bit aligned, so we read the wrong data. This happens when using explicit locations in varyings, because Mesa disables varying packing in that case. const_index is in 32-bit units, but offset() multiplies it by the destination type size. When operating on double input varyings, the const_index value might not be aligned to 64 bits. To fix it, we load the double vector as if it were a float-based vector with twice the number of components. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Cc: "12.0" <[email protected]>
*	i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings	Samuel Iglesias Gonsálvez	2016-06-06	1	-1/+12
	Data starts at suboffset 3 in 32-bit units (12 bytes), so it is not 64-bit aligned and the current implementation fails to read the data properly. Instead, when there is a double input varying, read it as a vector of floats with twice the number of components. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Cc: "12.0" <[email protected]>
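	The idea of reading a double as twice as many 32-bit components can be illustrated on the CPU side. This is an analogy in plain C under assumed names, not the backend IR code: a 64-bit value stored at a 32-bit-aligned offset (such as dword 3) is fetched as two 32-bit words and reassembled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a double that starts at an arbitrary
 * dword offset by loading its two 32-bit halves separately. */
static inline double load_double_from_dwords(const uint32_t *dwords,
                                             unsigned dword_offset)
{
   uint32_t halves[2] = { dwords[dword_offset], dwords[dword_offset + 1] };
   double value;

   /* Reassemble the 64-bit value from the two 32-bit components. */
   assert(sizeof(value) == sizeof(halves));
   memcpy(&value, halves, sizeof(value));
   return value;
}
```

	Because every access is 32 bits wide, the offset only needs 32-bit alignment, which is what makes the approach work for offsets such as 3.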
*	i965: don't use NumLayers for 3D textures.	Dave Airlie	2016-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	For 3D textures we shouldn't be using NumLayers, we need to get it from the depth. This fixes: GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: "12.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965/ps_state: Use wm_prog_data.has_side_effects	Jason Ekstrand	2016-06-03	2	-9/+6
\| \| \| \| \|	Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs Add a wm_prog_data bit for has_side_effects	Jason Ekstrand	2016-06-03	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \|	This is more accurate than calling _mesa_active_fragment_shader_has_side_effects because it looks at whether or not the SSBOs, images, or atomic buffers are actually written rather than just existing in the program. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
*	Revert "i965/fs: Allow scalar source regions on SNB math instructions."	Francisco Jerez	2016-06-03	3	-7/+14
	This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.
	Apparently the hardware spec text I quoted in the commit message was outright lying about scalar source math being supported on SNB; the hardware seems to load 32 contiguous bits of data for each channel regardless of the regioning mode.
	Fixes regressions in the following CTS tests (which we didn't catch early due to CTS being temporarily disabled in our CI system):
	- es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
	- es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
	- es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
	- es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
	- es2-cts.gtf.gl.cos.cos_float_frag_xvary
	- es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
	- es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
	- es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
	- es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
	- es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
	Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346 Reported-by: Mark Janes <[email protected]> Acked-by: Matt Turner <[email protected]>
*	i965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).	Francisco Jerez	2016-06-03	1	-0/+12
	The conditional mod of these instructions determines the semantics of the comparison itself (rather than being evaluated based on the result of the instruction as is usually the case for most other instructions that allow conditional mods), so it's in general not legal to propagate a conditional mod into a CMP instruction. This prevents cmod propagation from (mis)optimizing:
	   cmp.z.f0 tmp, ...
	   mov.z.f0 null, tmp
	into:
	   cmp.z.f0 tmp, ...
	which gives the negation of the flag result of the original sequence.
	I originally noticed this while working on SIMD32 in the scalar back-end, but the same scenario is likely to be possible in vec4 programs so this commit ports the bugfix with the same name from the scalar back-end to the vec4 cmod propagation pass.
	Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
*	isl: add support for Android libmesa_isl static library	Mauro Rossi	2016-06-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	isl library is needed to build i965, libmesa_isl static library is added to fix related Android building errors. Any attempt to build libmesa_genxml as phony package module failed to deliver gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9} Due to constraints in Android Build System, libmesa_genxml is built as static, at least one source is needed, so dummy.c is autogenerated for this scope, libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES, to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers. Cc: <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	i965: Add _NEW_POINT to a couple of comments.	Kenneth Graunke	2016-06-02	3	-3/+3
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/fs: Reindent emit_zip().	Francisco Jerez	2016-06-02	1	-14/+14
\| \| \| \| \|	Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Skip SIMD lowering destination zipping if possible.	Francisco Jerez	2016-06-02	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Skipping the temporary allocation and copy instructions is easy (just return dst), but the conditions used to find out whether the copy can be optimized out safely without breaking the program are rather complex: The destination must be exactly one component of at most the execution width of the lowered instruction, and all source regions of the instruction must be either fully disjoint from the destination or be aligned with it group by group. v2: Don't handle partial source-destination overlap for simplicity (Jason). No instruction count regressions with respect to v1 in either shader-db or the few FP64 shader_runner test-cases with partial overlap I've checked manually. Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
*	blorp: Fix 16x multisample scaled blits	Anuj Phogat	2016-06-02	1	-7/+10
\| \| \| \| \| \| \| \| \|	Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled (with added 16x sample support) now passes with this patch. Cc: "12.0" <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add missing types to type_sz().	Matt Turner	2016-06-02	1	-1/+5
\| \| \| \| \| \| \| \|	Coverity warns in multiple places about the potential for division by zero, caused by this function's default case. Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation	Alejandro Piñeiro	2016-06-02	1	-71/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	Right now the implementation only checks if the internalformat is supported or not. But that implementation is wrong, returning unsupported for some internalformats. Additionally, checking if the internalformat is supported or not is already done at mesa/main before calling the driver hook, so this new check is not needed. Acked-by: Eduardo Lima <[email protected]> Acked-by: Antia Puentes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965/eu: use simd8 when exec_size != EXECUTE_16	Alejandro Piñeiro	2016-06-02	1	-2/+2
	Among other things, fix a GPU hang when using INTEL_DEBUG=shader_time for any shader. Signed-off-by: Jason Ekstrand <[email protected]> Signed-off-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Remove old CS local ID handling	Jordan Justen	2016-06-01	6	-120/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	The old method pushed data for each channels uvec3 data of gl_LocalInvocationID. The new method pushes 1 dword of data that is a 'thread local ID' value. Based on that value, we can generate gl_LocalInvocationIndex and gl_LocalInvocationID with some calculations. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Enable cross-thread constants and compact local IDs for hsw+	Jordan Justen	2016-06-01	3	-14/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. One complication is that cross-thread constants are loaded into registers before per-thread constants. Previously, our local IDs were loaded before the uniform data and treated as 'payload' data, even though they were actually pushed into the registers like the other uniform data. Therefore, in this patch we simultaneously enable a newer layout where each thread now uses a single uniform slot for a unique local ID for the thread. This uniform is handled specially to make sure it is added last into the uniform push constant registers. This minimizes our usage of push constant registers, and maximizes our ability to use cross-thread constants for registers. To swap from the old to the new layout, we also need to flip some lowering pass switches to let our driver handle the lowering instead. We also no longer force thread_local_id_index to -1. v4: * Minimize size of patch that switches from the old local ID layout to the new layout (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Support new local ID push constant & cross-thread constants	Jordan Justen	2016-06-01	2	-45/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. We also support per-thread data which allows us to store a per-thread ID in one of the uniforms that can be used to calculate the gl_LocalInvocationIndex and gl_LocalInvocationID variables. v4: * Support the old local ID push constant layout as well (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add CS push constant info to brw_cs_prog_data	Jordan Justen	2016-06-01	2	-0/+73
	We need information about push constants in a few places for the GL driver, and another couple places for the vulkan driver. When we add support for uploading both a common (cross-thread) set of push constants, combined with the previous per-thread push constant data, things are going to get even more complicated. To simplify things, we add push constant info into the cs prog_data struct.
	The cross-thread constant support is added as of Haswell. To support it we need to make sure all push constants with uniform values are added to earlier registers. The register that varies per thread and holds the thread invocation's unique local ID needs to be added last.
	For now we add the code that would calculate cross-thread constant information for hsw+, but we force it (cross_thread_supported) off until the other parts of the driver support it.
	v4:
	* Support older local ID push constant layout as well. (Jason)
	Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Store number of threads in brw_cs_prog_data	Jordan Justen	2016-06-01	3	-25/+23
\| \| \| \| \| \|	Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add nir based intrinsic lowering and thread ID uniform	Jordan Justen	2016-06-01	4	-0/+190
	We add a lowering pass for nir intrinsics. This pass can replace nir intrinsics with driver specific nir lower code. We lower the gl_LocalInvocationIndex intrinsic based on a uniform which is loaded with a thread specific ID. We also lower the gl_LocalInvocationID based on gl_LocalInvocationIndex.
	v2:
	* Create variable during lowering pass. (Ken)
	v3:
	* Don't create a variable, but instead just insert an intrinsic call to load a uniform from the allocated location. (Jason)
	v4:
	* Don't run this pass if thread_local_id_index < 0
	Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
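	The arithmetic behind this lowering can be sketched in plain C. The sketch assumes that the per-thread uniform holds a base ID and that the channel number within the thread is added to it; that split, and all names below, are assumptions made for illustration. Recovering the three-component ID from the flat index follows from the GLSL definition of gl_LocalInvocationIndex (z * size_x * size_y + y * size_x + x).

```c
#include <assert.h>

struct uvec3 {
   unsigned x, y, z;
};

/* Flat index of an invocation: per-thread base ID plus the SIMD
 * channel number (assumed layout, for illustration only). */
static inline unsigned local_invocation_index(unsigned thread_local_base_id,
                                              unsigned channel)
{
   return thread_local_base_id + channel;
}

/* Invert index = z * size_x * size_y + y * size_x + x. */
static inline struct uvec3 local_invocation_id(unsigned index,
                                               unsigned size_x,
                                               unsigned size_y)
{
   struct uvec3 id;

   assert(size_x > 0 && size_y > 0);
   id.x = index % size_x;
   id.y = (index / size_x) % size_y;
   id.z = index / (size_x * size_y);
   return id;
}
```

	Only one dword per thread has to be pushed in this scheme; everything else is derived from it and from the local workgroup size.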
*	i965: Put CS local thread ID uniform in last push register	Jordan Justen	2016-06-01	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This thread ID uniform will be used to compute the gl_LocalInvocationIndex and gl_LocalInvocationID values. It is important for this uniform to be added in the last push constant register. fs_visitor::assign_constant_locations is updated to make sure this happens. The reason this is important is that the cross-thread push constant registers are loaded first, and the per-thread push constant registers are loaded after that. (Broadwell adds another push constant upload mechanism which reverses this order, but we are ignoring this for now.) v2: * Add variable in intrinsics lowering pass * Make sure the ID is pushed last in assign_constant_locations, and that we save a spot for the ID in the push constants v3: * Simplify code based with Jason's suggestions. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add uniform for a CS thread local base ID	Jordan Justen	2016-06-01	3	-1/+21
\| \| \| \| \| \| \| \| \| \| \| \|	v4: * Force thread_local_id_index to -1 for now, and have fs_visitor::setup_cs_payload look at thread_local_id_index. This enables us to more easily cut over from the old local ID layout to the new layout, as suggested by Jason. Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add nir channel_num system value	Jordan Justen	2016-06-01	1	-0/+15
\| \| \| \| \| \| \| \| \|	v2: * simd16/32 fixes (curro) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Make lowering gl_LocalInvocationIndex optional	Jordan Justen	2016-06-01	1	-1/+2
\| \| \| \| \| \|	Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	glsl: Add glsl LowerCsDerivedVariables option	Jordan Justen	2016-06-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	v2: * Move lower flag to context constants. (Ken) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Copy the offset when lowering logical pull constant sends	Jason Ekstrand	2016-06-01	1	-0/+8
\| \| \| \| \| \| \| \|	This fixes 64 Vulkan CTS tests per gen Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299 Reviewed-by: Francisco Jerez <[email protected]> Cc: "12.0" <[email protected]>
*	i965: Fix isoline reads in scalar TES.	Kenneth Graunke	2016-06-01	1	-1/+1
	Isolines aren't reversed. Commit 5b2d8c2273c6f fixed this for the vec4 TES backend, but not the scalar one. Found while debugging GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: [email protected]
*	i965: If control_data_header_size_bits is zero, don't do EndPrimitive	Ian Romanick	2016-06-01	2	-0/+6
\| \| \| \| \| \| \| \|	This can occur when max_vertices=0 is explicitly specified. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
*	i965: Add norbc debug option	Topi Pohjolainen	2016-06-01	3	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This INTEL_DEBUG option disables lossless compression (also known as render buffer compression). v2: (Matt) Use likely(!lossless_compression_disabled) instead of !likely(lossless_compression_disabled) (Grazvydas) Update docs/envvars.html Cc: "12.0" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
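	The v2 note about likely() is about where the negation sits relative to the branch hint. A minimal sketch, assuming a GCC or Clang style compiler where likely() wraps __builtin_expect (the function and parameter names are hypothetical):

```c
#include <assert.h>

/* Branch hint macro in the common GCC/Clang form (assumed here). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Returns 1 when compression should be used, 0 otherwise. */
static inline int compression_enabled(int lossless_compression_disabled)
{
   /* Hint: the common case is "not disabled". Writing
    * !likely(lossless_compression_disabled) would yield the same
    * result but hint that "disabled" is the common case. */
   if (likely(!lossless_compression_disabled))
      return 1;
   return 0;
}
```

	Both spellings evaluate to the same truth value; only the hint given to the optimizer differs.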
*	i965/gen9: Configure rbc buffers as plain for non-rbc tex views	Topi Pohjolainen	2016-06-01	2	-3/+48
	Fixes rendering in Shadow of Mordor with rbc. The application writes an RGBA_UNORM texture, filling it with values that it later wants to treat as SRGB_ALPHA. The Intel driver enables lossless compression for the buffer at the time of writing. However, the driver fails to make sure the buffer can be sampled as something else later on, and unfortunately there is a hardware restriction on using lossless compression with srgb formats which appears to extend to the sampling engine as well. Requesting srgb to linear conversion on top of a compressed buffer results in color values that are pretty much garbage. Fortunately none of the tracked benchmarks showed a regression with this. v2 (Matt): Add missing space Cc: "12.0" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix the passthrough TCS for isolines.	Kenneth Graunke	2016-05-31	1	-7/+12
	We weren't setting up several of the uniform values for the patch header, so we'd crash when uploading push constants. We at least need to initialize them to zero. We also had the isoline parameters reversed, so it would also render incorrectly (if it didn't crash). Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in GL44-CTS.tessellation_shader.single.max_patch_vertices.
	(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Cc: [email protected]
*	i965/xfb: skip components in correct buffer.	Dave Airlie	2016-06-01	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \|	The driver was adding the skip components but always for buffer 0. This fixes: GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0 11.2" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965/fs: Allow scalar source regions on SNB math instructions.	Francisco Jerez	2016-05-31	3	-17/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I haven't found any evidence that this isn't supported by the hardware, in fact according to the SNB hardware spec: "The supported regioning modes for math instructions are align16, align1 with the following restrictions: - Scalar source is supported. [...] - Source and destination offset must be the same, except the case of scalar source." Cc: "12.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>