mesa.git

	Commit message	Author	Age	Files	Lines
*	i965/fs: Fix hang on IVB and VLV with image format mismatch.	Francisco Jerez	2015-09-28	1	-4/+38
|	IVB and VLV hang sporadically when an untyped surface read or write message is used to access a surface of format other than RAW, as may happen when there is a mismatch between the format qualifier of the image uniform and the format of the actual image bound to the pipeline. According to the spec this condition gives undefined results but may not lead to program termination (which is one of the possible outcomes of the hang). Fix it by checking at runtime whether the surface is of the right type. Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit subtest. Reported-by: Mark Janes <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718 CC: [email protected] Reviewed-by: Ian Romanick <[email protected]>
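The runtime guard described above can be modeled on the CPU as follows. This is only a sketch: in the driver the format test is emitted into the shader itself, and the format enum, `struct surface`, and `guarded_untyped_read()` are hypothetical names, not Mesa code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tags; only RAW is valid for untyped messages. */
enum surf_format { SURF_FORMAT_RAW, SURF_FORMAT_R32_UINT, SURF_FORMAT_RGBA8_UNORM };

/* Hypothetical CPU-side stand-in for a bound surface. */
struct surface {
   enum surf_format format;
   const uint32_t *data;
   uint32_t size_dw;            /* size in dwords */
};

/* Model of the guarded access: the format is tested at run time, before
 * the untyped read is issued. A mismatch yields an undefined-but-harmless
 * result (zero here) instead of a message that could hang the GPU. */
static uint32_t
guarded_untyped_read(const struct surface *surf, uint32_t offset_dw)
{
   assert(surf != NULL);
   if (surf->format != SURF_FORMAT_RAW)
      return 0;                 /* skip the message entirely */
   if (offset_dw >= surf->size_dw)
      return 0;                 /* out of bounds also reads as zero */
   return surf->data[offset_dw];
}
```

Returning zero is an arbitrary choice; the spec only requires that the result not terminate the program.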
*	i965/gs: Optimize away the EOT write on Gen8+ with static vertex count.	Kenneth Graunke	2015-09-26	1	-0/+15
|	With static vertex counts, the final EOT write doesn't actually write any data - it's just there to end the thread. Typically, the last thing before ending the thread will be an EmitVertex() call, resulting in a URB write. We can just set EOT on that. Note that this isn't always possible - there might be an intervening SSBO write/image store, or the URB write may have been in a loop. shader-db statistics for geometry shaders only: total instructions in shared programs: 3173 -> 3149 (-0.76%) instructions in affected programs: 176 -> 152 (-13.64%) helped: 8 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
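A minimal sketch of the peephole, assuming a flat instruction array whose last entry is the data-less thread-end write. The opcode names, `struct inst`, its `in_loop` flag, and `fold_eot()` are hypothetical simplifications, not the backend's real IR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum opcode { OP_URB_WRITE, OP_SSBO_WRITE, OP_THREAD_END, OP_OTHER };

struct inst {
   enum opcode op;
   bool eot;        /* end-of-thread bit on this send */
   bool in_loop;    /* hypothetical: instruction sits inside a loop */
};

/* If the final THREAD_END is directly preceded by a URB write that is not
 * in a loop, set EOT on that write and drop the THREAD_END.
 * Returns the new instruction count. */
static size_t
fold_eot(struct inst *insts, size_t n)
{
   assert(insts != NULL);
   if (n < 2 || insts[n - 1].op != OP_THREAD_END)
      return n;
   struct inst *prev = &insts[n - 2];
   if (prev->op != OP_URB_WRITE || prev->in_loop)
      return n;      /* e.g. an SSBO write or image store intervenes */
   prev->eot = true;
   return n - 1;
}
```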
*	i965/gs: Allow src0 immediates in GS_OPCODE_SET_WRITE_OFFSET.	Kenneth Graunke	2015-09-26	2	-2/+14
|	GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special strides. We can easily make the generator handle constant src[0] arguments by instead generating a MOV with the product of both operands. This isn't necessarily a win in and of itself - instead of a MUL, we generate a MOV, which should be basically the same cost. However, we can probably avoid the earlier MOV to put src[0] into a register. shader-db statistics for geometry shaders only: total instructions in shared programs: 3207 -> 3173 (-1.06%) instructions in affected programs: 3207 -> 3173 (-1.06%) helped: 11 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
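The generator-side folding can be sketched like this, assuming simplified operand and instruction structs. `struct operand`, `struct gen_inst`, and `generate_set_write_offset()` are hypothetical, and the special strides of the real opcode are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct operand { bool is_imm; uint32_t imm; int reg; };

enum gen_op { GEN_MUL, GEN_MOV };

struct gen_inst { enum gen_op op; struct operand src0, src1; };

/* src1 is always an immediate for this opcode. When src0 is one too,
 * emit a MOV of the precomputed product instead of a MUL. */
static struct gen_inst
generate_set_write_offset(struct operand src0, struct operand src1)
{
   assert(src1.is_imm);
   struct gen_inst inst;
   if (src0.is_imm) {
      inst.op = GEN_MOV;
      inst.src0 = (struct operand){ .is_imm = true, .imm = src0.imm * src1.imm, .reg = -1 };
      inst.src1 = (struct operand){ 0 };   /* unused by MOV */
   } else {
      inst.op = GEN_MUL;
      inst.src0 = src0;
      inst.src1 = src1;
   }
   return inst;
}
```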
*	i965: Implement "Static Vertex Count" geometry shader optimization.	Kenneth Graunke	2015-09-26	5	-4/+28
|	Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex Count" fields, which control a new optimization. Normally, geometry shaders can output arbitrary numbers of vertices, which means that resource allocation has to be done on the fly. However, if the number of vertices is statically known, the hardware can pre-allocate resources up front, which is more efficient. Thanks to the new NIR GS intrinsics, this is easy. We just call the function introduced in the previous commit to get the vertex count. If it obtains a count, we stop emitting the extra 32-bit "Vertex Count" field in the VUE, and instead fill out the 3DSTATE_GS fields. Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91) on my Lenovo X250 laptop (Broadwell GT2) at 1024x768. shader-db statistics for geometry shaders only: total instructions in shared programs: 3227 -> 3207 (-0.62%) instructions in affected programs: 242 -> 222 (-8.26%) helped: 10 v2: Don't break non-NIR paths (just skip this optimization). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
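The decision described above reduces to a simple branch. In this sketch, `struct gs_config` and `configure_gs()` are hypothetical; the fields only mirror the names of the 3DSTATE_GS controls and are not the real packet layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the state programmed for the GS stage. */
struct gs_config {
   bool static_output;            /* models "Static Output" */
   unsigned static_vertex_count;  /* models "Static Vertex Count" */
   bool vue_has_vertex_count;     /* extra 32-bit count emitted in the VUE */
};

/* static_count == -1 means the compiler could not prove a fixed count. */
static struct gs_config
configure_gs(int static_count)
{
   assert(static_count >= -1);
   struct gs_config cfg = { false, 0, true };
   if (static_count >= 0) {
      cfg.static_output = true;
      cfg.static_vertex_count = (unsigned)static_count;
      cfg.vue_has_vertex_count = false;   /* hardware already knows it */
   }
   return cfg;
}
```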
*	i965: Move GS_THREAD_END mlen calculations out of the generator.	Kenneth Graunke	2015-09-26	2	-2/+2
|	The visitor was setting a mlen that was wrong for Broadwell, but the generator was ignoring it and doing the right thing regardless. We may as well move the logic fully into the visitor. This will be useful in the next commit as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Simplify handling of VUE map changes.	Kenneth Graunke	2015-09-26	4	-42/+17
|	The old code was disastrously complex - spread across multiple atoms which may not even run, inspecting the dirty bits to try and decide whether it was necessary to do checks...storing VS information in brw_context...extra flagging... This code tripped me and Carl up very badly when working on the shader cache code. It's very fragile and hard to maintain. Now that geometry shaders only depend on their inputs and don't have to worry about the VS VUE map, we can dramatically simplify this: just compute the VUE map coming out of the geometry shader stage in brw_upload_programs. If it changes, flag it. Done. v2: Also check vue_map.separable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
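The "compute, compare, flag" approach can be sketched as below, assuming the comparison looks at the valid-slot mask and the separable flag mentioned in v2. `struct vue_map_summary`, the dirty bit, and `update_geom_out_vue_map()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced VUE map: just the fields the check compares. */
struct vue_map_summary {
   uint64_t slots_valid;   /* bitmask of varyings present */
   bool separable;         /* laid out for separate shader objects */
};

#define DIRTY_GEOM_OUT_VUE_MAP (1u << 0)   /* hypothetical dirty bit */

/* One comparison in one place replaces the old cross-atom bookkeeping. */
static void
update_geom_out_vue_map(struct vue_map_summary *last,
                        const struct vue_map_summary *cur,
                        uint32_t *dirty)
{
   assert(last != NULL && cur != NULL && dirty != NULL);
   if (last->slots_valid != cur->slots_valid ||
       last->separable != cur->separable) {
      *last = *cur;
      *dirty |= DIRTY_GEOM_OUT_VUE_MAP;
   }
}
```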
*	i965/gs: Remove the dependency on the VS VUE map.	Kenneth Graunke	2015-09-26	2	-11/+14
|	Because we only support geometry shaders in core profile, we can safely ignore any driver-extending of VS outputs. Those are: - Legacy userclipping (doesn't exist in core profile) - Edgeflag copying (Gen4-5 only, no GS support) - Point coord replacement (Gen4-5 only, no GS support) - front/back color hacks (Gen4-5 only, no GS support) v2: Rebase; leave a comment about why SSO works. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Don't re-layout varyings for separate shader programs.	Kenneth Graunke	2015-09-26	5	-18/+67
|	Previously, our VUE map code always assigned slots to varyings sequentially, in one contiguous block. This was a bad fit for separate shaders - the GS input layout depended on the VS output layout, so if we swapped out vertex shaders, we might have to recompile the GS on the fly - which rather defeats the point of using separate shader objects. (Tessellation would suffer from this as well - we could have to recompile the HS, DS, and GS.) Instead, this patch makes the VUE map for separate shaders use a fixed layout, based on the input/output variable's location field. (This is either specified by layout(location = ...) or assigned by the linker.) Corresponding inputs/outputs will match up by location; if there's a mismatch, we're allowed to have undefined behavior. This may be less efficient - depending on what locations were chosen, we may have empty padding slots in the VUE. But applications presumably use small consecutive integers for locations, so it hopefully won't be much worse in practice. 3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions. This seems like a small price to pay for avoiding recompiles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
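The difference between the packed and the location-based layout can be sketched as follows. It assumes a slot-to-location table pre-filled with a padding marker (as in the follow-up commits) and a slot assigner that takes an explicit slot. `SLOT_PAD`, `struct vue_layout`, `assign_slot()`, and `build_vue_layout()` are hypothetical, and the fixed header slots of a real VUE are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VUE_SLOTS 16
#define SLOT_PAD (-1)   /* hypothetical stand-in for BRW_VARYING_SLOT_PAD */

struct vue_layout { int slot_to_location[MAX_VUE_SLOTS]; int num_slots; };

/* Assign a varying to an explicit slot rather than the "next" one. */
static void
assign_slot(struct vue_layout *m, int location, int slot)
{
   assert(slot >= 0 && slot < MAX_VUE_SLOTS);
   m->slot_to_location[slot] = location;
   if (slot + 1 > m->num_slots)
      m->num_slots = slot + 1;
}

/* locations: the varying locations in use, in ascending order. */
static void
build_vue_layout(struct vue_layout *m, const int *locations, int count,
                 bool separable)
{
   for (int i = 0; i < MAX_VUE_SLOTS; i++)
      m->slot_to_location[i] = SLOT_PAD;   /* holes stay padding */
   m->num_slots = 0;
   for (int i = 0; i < count; i++) {
      /* Packed: slot depends on the other stage's outputs.
       * Separable: slot == location, so any matching shader agrees. */
      int slot = separable ? locations[i] : i;
      assign_slot(m, locations[i], slot);
   }
}
```

With locations {0, 3}, the packed map uses 2 slots while the separable map uses 4, two of them padding. That is the efficiency cost the message mentions.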
*	i965/vue: Make assign_vue_map() take an explicit slot.	Kenneth Graunke	2015-09-26	1	-16/+19
|	Our plan of assigning consecutive slots doesn't work properly for separate shader objects - at least, if we want to avoid recompiling them whenever the interface changes. As a first step, make assign_vue_map take an explicit slot parameter, rather than implicitly incrementing it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Initialize unused VUE map slots to BRW_VARYING_SLOT_PAD.	Kenneth Graunke	2015-09-26	1	-1/+1
|	Nothing actually relies on unused slots being initialized to BRW_VARYING_SLOT_COUNT. Soon, we're going to have VUE maps with holes in them, at which point pre-filling with BRW_VARYING_SLOT_PAD makes a lot more sense. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Fix BRW_VARYING_SLOT_PAD handling in the scalar VS backend.	Kenneth Graunke	2015-09-26	1	-4/+2
|	We can't just break for padding slots. Instead, treat them like unwritten output variables, so we handle flushing and incrementing urb_offset correctly. Paul introduced the concept of padding slots back in 2011, but we've never actually used them for anything. So it's unsurprising that the scalar VS backend didn't handle them quite right. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
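A minimal sketch of why padding must advance the offset rather than end the loop, assuming one URB slot per VUE slot and ignoring message-length flushing. `SLOT_PAD` and `emit_vue_outputs()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT_PAD (-1)   /* hypothetical padding marker */

/* Records the URB offset of every written varying and returns the final
 * offset. Padding slots are treated like unwritten outputs: nothing is
 * stored, but urb_offset still advances so later slots land correctly. */
static int
emit_vue_outputs(const int *slot_to_varying, int num_slots,
                 const bool *varying_written, int *stored, int *num_stored)
{
   assert(num_slots >= 0);
   int urb_offset = 0;
   *num_stored = 0;
   for (int slot = 0; slot < num_slots; slot++) {
      int varying = slot_to_varying[slot];
      if (varying != SLOT_PAD && varying_written[varying])
         stored[(*num_stored)++] = urb_offset;
      /* no "break" on padding: always advance */
      urb_offset++;
   }
   return urb_offset;
}
```

Breaking at the padding slot would have dropped varying 1 in the test map `{0, PAD, 1}`; continuing stores it at offset 2.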
*	i965: Enable ARB_shader_storage_buffer_object extension for gen7+	Samuel Iglesias Gonsalvez	2015-09-25	1	-0/+1
|	Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/vec4: Implement nir_intrinsic_ssbo_atomic_*	Iago Toral Quiroga	2015-09-25	2	-0/+79
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/fs: Implement nir_intrinsic_ssbo_atomic_*	Iago Toral Quiroga	2015-09-25	2	-0/+79
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/vec4: Implement nir_intrinsic_load_ssbo	Iago Toral Quiroga	2015-09-25	1	-0/+54
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/fs: Implement nir_intrinsic_load_ssbo	Iago Toral Quiroga	2015-09-25	1	-0/+62
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/vec4: Implement nir_intrinsic_store_ssbo	Iago Toral Quiroga	2015-09-25	1	-0/+148
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/nir/fs: Implement nir_intrinsic_store_ssbo	Iago Toral Quiroga	2015-09-25	1	-0/+71
|	Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: Import surface message builder functions.	Francisco Jerez	2015-09-25	2	-0/+273
|	Implement helper functions that can be used to construct and send untyped and typed surface read, write and atomic messages to the shared dataport unit. v2: Split from the FS implementation. v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: Import helpers to convert vectors into arrays and back.	Francisco Jerez	2015-09-25	3	-0/+130
|	These functions handle the conversion of a vec4 into the form expected by the dataport unit in message and message return payloads. The conversion is not always trivial because some messages don't support SIMD4x2 for some generations, in which case a strided copy may be necessary. v2: Split from the FS implementation. v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: Introduce VEC4 IR builder.	Francisco Jerez	2015-09-25	2	-0/+603
|	See "i965/fs: Introduce FS IR builder." for the rationale. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/wm: surfaces should have the API buffer size, not the drm buffer size	Samuel Iglesias Gonsalvez	2015-09-25	1	-2/+2
|	The returned drm buffer object has a size multiple of 4096 but that should not be exposed to the API user, which is working with a different size. As far as I can see this problem is only visible in the calculation of the length of unsized arrays used in SSBOs, as the implementation of this needs to query the underlying buffer size via a message. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
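The arithmetic behind the bug can be shown with a small sketch. It assumes the 4096-byte rounding described above and an unsized array of 4-byte elements at offset 0. `bo_alloc_size()` and `elements_visible()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define BO_PAGE_SIZE 4096u   /* allocation granularity described above */

/* Size the drm buffer object actually gets: rounded up to a page. */
static uint64_t
bo_alloc_size(uint64_t api_size)
{
   assert(api_size > 0);
   return (api_size + BO_PAGE_SIZE - 1) & ~(uint64_t)(BO_PAGE_SIZE - 1);
}

/* Element count of an unsized array of 4-byte elements at offset 0,
 * derived from whatever size the surface state reports. */
static uint64_t
elements_visible(uint64_t surface_size)
{
   return surface_size / 4;
}
```

For a 100-byte buffer, programming the API size gives a length of 25, while programming the drm size would report 1024.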
*	i965/wm: emit null buffer surfaces when null buffers are attached	Samuel Iglesias Gonsalvez	2015-09-25	1	-18/+26
|	Otherwise we can expect odd things to happen if, for example, we ask for the size of the attached buffer from shader code, since that might query this value from the surface we uploaded and get random results. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/fs/nir: implement nir_intrinsic_get_buffer_size	Samuel Iglesias Gonsalvez	2015-09-25	1	-0/+24
|	v2: - Remove inst->regs_written assignment as the instruction only writes to one register. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZE	Samuel Iglesias Gonsalvez	2015-09-25	5	-0/+55
|	Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4/nir: implement nir_intrinsic_get_buffer_size	Samuel Iglesias Gonsalvez	2015-09-25	1	-0/+26
|	Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZE	Samuel Iglesias Gonsalvez	2015-09-25	5	-0/+44
|	Notice that Skylake needs to include a header in the sampler message so it will need some tweaks to work there. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	glsl: Add parser/compiler support for unsized array's length()	Samuel Iglesias Gonsalvez	2015-09-25	2	-0/+10
|	The unsized array length is computed with the following formula: array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0) Of these, only the buffer size needs to be provided by the backends, the frontend already knows the values of the two other variables. This patch identifies the cases where we need to get the length of an unsized array, injecting ir_unop_ssbo_unsized_array_length expressions that will be lowered (in a later patch) to inject the formula mentioned above. It also adds the ir_unop_get_buffer_size expression that drivers will implement to provide the buffer length. v2: - Do not define a triop that will force backends to implement the entire formula, they should only need to provide the buffer size since the other values are known by the frontend (Curro). v3: - Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead of using state->ARB_shader_storage_buffer_object_enable (Tapani). Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
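The length formula above translates directly into C. In this sketch, `unsized_array_length()` is a hypothetical helper; signed 64-bit arithmetic is used so that a buffer smaller than the array offset yields a negative quotient that the clamp turns into 0.

```c
#include <assert.h>
#include <stdint.h>

/* array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0)
 * Only buffer_size comes from the backend; offset and stride are known
 * to the frontend. */
static int64_t
unsized_array_length(int64_t buffer_size, int64_t array_offset,
                     int64_t array_stride)
{
   assert(array_stride > 0);
   int64_t len = (buffer_size - array_offset) / array_stride;
   return len > 0 ? len : 0;
}
```

For example, a 100-byte buffer with the array at offset 16 and a 4-byte stride gives (100 - 16) / 4 = 21 elements.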
*	i965/fs: Do not split buffer variables	Iago Toral Quiroga	2015-09-25	1	-0/+1
|	Buffer variables are the same as uniforms, except that they are read/write, so we want the same treatment. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: handle visiting of ir_var_shader_storage variables	Iago Toral Quiroga	2015-09-25	1	-2/+3
|	Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Upload Shader Storage Buffer Object surfaces	Iago Toral Quiroga	2015-09-25	2	-13/+57
|	Since these are a special kind of UBOs we emit them together reusing the same infrastructure, however, we use a RAW surface so we can reuse existing untyped read/write/atomic messages which include a pixel mask header that we need to set to obtain correct behavior with helper invocations of the fragment shader. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Set MaxShaderStorageBuffers for compute shaders	Iago Toral Quiroga	2015-09-25	1	-0/+3
|	v2: - Set it after the driver's MaxShaderStorageBuffers value assignment. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: set ARB_shader_storage_buffer_object related constant values	Samuel Iglesias Gonsalvez	2015-09-25	1	-0/+12
|	v2: - Add tessellation shader constants assignment v3: - Set MaxShaderStorageBufferBindings to 36. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Implement DriverFlags.NewShaderStorageBuffer	Iago Toral Quiroga	2015-09-25	2	-0/+3
|	We use the same dirty state for SSBOs and UBOs because they share the same infrastructure. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Use 64-byte offset alignment for shader storage buffers	Iago Toral Quiroga	2015-09-25	1	-0/+9
|	This should be a cacheline (64 bytes) so that we can safely have the CPU and GPU writing the same SSBO on non-cachecoherent systems (our Atom CPUs). With UBOs, the GPU never writes, so there's no problem. For an SSBO, the GPU and the CPU can be updating disjoint regions of the buffer simultaneously and that will break if the regions overlap the same cacheline. v2: - Use cacheline size (64 bytes) instead of 16 bytes (Kristian). - Update commit log and add a comment in the code explaining why we use cacheline size (Ben). Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
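The cacheline-overlap hazard can be checked with a few lines of arithmetic. This sketch assumes 64-byte cachelines and half-open byte ranges. `regions_share_cacheline()` is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE 64u

/* True if byte ranges [a_start, a_start + a_size) and
 * [b_start, b_start + b_size) touch at least one common cacheline. */
static bool
regions_share_cacheline(uint64_t a_start, uint64_t a_size,
                        uint64_t b_start, uint64_t b_size)
{
   assert(a_size > 0 && b_size > 0);
   uint64_t a_first = a_start / CACHELINE;
   uint64_t a_last = (a_start + a_size - 1) / CACHELINE;
   uint64_t b_first = b_start / CACHELINE;
   uint64_t b_last = (b_start + b_size - 1) / CACHELINE;
   return a_first <= b_last && b_first <= a_last;
}
```

Disjoint ranges at bytes 0 and 40 still share line 0. If every binding starts on a 64-byte boundary, disjoint bindings can never share a line.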
*	i965/cs: Implement DispatchComputeIndirect support	Jordan Justen	2015-09-24	3	-4/+60
|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: check swizzle before discarding a uniform on a 3src operand	Alejandro Piñeiro	2015-09-24	1	-3/+6
|	Without this commit, copy propagation is discarded if it involves a uniform with an instruction that has 3 sources. But 3 sourced instructions can access scalar values. For example, this is what vec4_visitor::fix_3src_operand() is already doing: if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle)) return src; Shader-db results (unfiltered) on NIR: total instructions in shared programs: 6259650 -> 6241985 (-0.28%) instructions in affected programs: 812755 -> 795090 (-2.17%) helped: 7930 HURT: 0 Shader-db results (unfiltered) on IR: total instructions in shared programs: 6445822 -> 6441788 (-0.06%) instructions in affected programs: 296630 -> 292596 (-1.36%) helped: 2533 HURT: 0 v2: - Updated commit message, using Matt Turner suggestions - Move the check after we've created the final value, as Jason Ekstrand suggested - Clean up the condition v3: - Move the check back to the original place, to keep things tidy, as suggested by Jason Ekstrand v4: - Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand Reviewed-by: Matt Turner <[email protected]>
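A uniform can feed a 3-source instruction only when its swizzle replicates one channel. The check can be sketched as below, assuming the usual 2-bits-per-channel packing. `SWIZZLE4`, `GET_SWZ`, and `is_single_value_swizzle()` are local stand-ins written for illustration, not the brw_reg.h definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Local 2-bits-per-channel swizzle packing (x = 0 ... w = 3). */
#define SWIZZLE4(a, b, c, d) ((a) | ((b) << 2) | ((c) << 4) | ((d) << 6))
#define GET_SWZ(swz, chan)   (((swz) >> ((chan) * 2)) & 0x3)

/* True if all four channels select the same source component, e.g. .yyyy. */
static bool
is_single_value_swizzle(unsigned swz)
{
   assert(swz <= 0xffu);
   return GET_SWZ(swz, 0) == GET_SWZ(swz, 1) &&
          GET_SWZ(swz, 1) == GET_SWZ(swz, 2) &&
          GET_SWZ(swz, 2) == GET_SWZ(swz, 3);
}
```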
*	i965: Respect stride and subreg_offset for ATTR registers	Kristian Høgsberg Kristensen	2015-09-24	1	-1/+4
|	When we assign hw regs to attributes, we don't incorporate the stride and subreg_offset from the fs_reg. It's rarely used, but the integer multiplication lowering uses an unusual stride and subreg_offset combination that breaks when one source is an attribute. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970 Cc: "10.6 11.0" <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	mesa: rework Driver.CopyImageSubData() and related code	Brian Paul	2015-09-24	1	-23/+57
|	Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures to wrap renderbuffer sources/destinations. This caused a bit of a mess in the Mesa/gallium state tracker because we had to basically undo that wrapping. Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer and gl_texture_image src/dst pointers (one being null, the other non-null) so the driver can handle renderbuffer vs. texture as needed. For the i965 driver, we basically moved the code that wrapped textures around renderbuffers from copyimage.c down into the meta and driver code. The old code in copyimage.c also made some questionable calls to _mesa_BindTexture(), etc. which weren't undone at the end. v2 (Jason Ekstrand): Rework the intel bits v3 (Brian Paul): Update the temporary st_CopyImageSubData() function. Reviewed-by: Topi Pohjolainen <[email protected]> Tested-by: Kai Wasserbäch <[email protected]> Tested-by: Nick Sarnie <[email protected]>
*	i965: add ARB_texture_barrier support	Ilia Mirkin	2015-09-23	2	-0/+10
|	Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/gs: Fix extra level of indentation left by the previous commit.	Kenneth Graunke	2015-09-23	2	-115/+111
|	I left a bunch of code indented a level in the previous patch to make the diff easier to read. But now we should fix that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/gs: Use new NIR intrinsics.	Kenneth Graunke	2015-09-23	4	-26/+48
|	By performing the vertex counting in NIR, we're able to elide a ton of useless safety checks around every EmitVertex() call: total instructions in shared programs: 3952 -> 3720 (-5.87%) instructions in affected programs: 3491 -> 3259 (-6.65%) helped: 11 HURT: 0 Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621) on Haswell GT3e at 1024x768. This should also make it easier to implement Broadwell's "Static Vertex Count" feature someday. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed	Antia Puentes	2015-09-23	2	-3/+12
|	Gen6 MATH instructions cannot execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 Reviewed-by: Matt Turner <[email protected]>
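The restriction above becomes one extra veto in the register coalescer. This sketch assumes the candidate has already been summarized into a struct and covers only this rule; all other coalescing checks are omitted. `struct coalesce_candidate` and `can_coalesce()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define WRITEMASK_XYZW 0xfu

/* Hypothetical summary of a coalescing candidate. */
struct coalesce_candidate {
   int gen;                 /* hardware generation */
   bool is_math;            /* instruction being rewritten is a MATH op */
   bool needs_reswizzle;    /* coalescing would change its swizzle */
   unsigned writemask;      /* writemask coalescing would require */
};

static bool
can_coalesce(const struct coalesce_candidate *c)
{
   assert(c->writemask <= WRITEMASK_XYZW);
   /* Gen6 MATH cannot run in align16: no swizzles, no partial writemasks. */
   if (c->gen == 6 && c->is_math &&
       (c->needs_reswizzle || c->writemask != WRITEMASK_XYZW))
      return false;
   return true;
}
```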
*	i965/vec4: Detect and delete useless MOVs.	Matt Turner	2015-09-22	1	-0/+22
|	With NIR: instructions in affected programs: 111508 -> 109193 (-2.08%) helped: 507 Without NIR: instructions in affected programs: 28763 -> 28474 (-1.00%) helped: 186 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
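The commit message does not spell out what makes a MOV "useless". A plausible criterion, which this sketch assumes rather than takes from the patch, is: same register and type on both sides, no modifiers, and every written channel reading its own component. `struct reg`, `struct mov_inst`, and `is_useless_mov()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define GET_SWZ(swz, chan) (((swz) >> ((chan) * 2)) & 0x3)
#define SWIZZLE_XYZW 0xe4u   /* x=0, y=1, z=2, w=3, 2 bits each */

struct reg { int file; int nr; int type; };

struct mov_inst {
   struct reg dst, src;
   unsigned writemask;   /* bit c set = channel c written */
   unsigned swizzle;     /* source swizzle, 2 bits per channel */
   bool saturate, negate, abs;
};

/* A MOV is a no-op if it copies a register onto itself unchanged. */
static bool
is_useless_mov(const struct mov_inst *m)
{
   assert(m->writemask <= 0xfu);
   if (m->saturate || m->negate || m->abs)
      return false;
   if (m->dst.file != m->src.file || m->dst.nr != m->src.nr ||
       m->dst.type != m->src.type)
      return false;
   for (unsigned c = 0; c < 4; c++) {
      if ((m->writemask & (1u << c)) && GET_SWZ(m->swizzle, c) != c)
         return false;   /* channel c would receive another component */
   }
   return true;
}
```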
*	i965/vec4: Add support for fdph_replicated	Jason Ekstrand	2015-09-22	1	-0/+5
|	Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add defines for tessellation stages	Chris Forbes	2015-09-22	1	-0/+72
|	v2 (Ken): - Squash together commits for HS, DS, and TE, as well as fixes. - Add INTEL_MASK variants so we can use SET_FIELD if we want. - Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match the documentation. - Add some more fields from the PRMs. - Add Broadwell variants. Signed-off-by: Chris Forbes <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
*	i965/vec4: refactor brw_vec4_copy_propagation.	Alejandro Piñeiro	2015-09-22	1	-14/+18
|	Now it is more similar to brw_fs_copy_propagation, with three clear stages: 1) Build up the value we are propagating as if it were the source of a single MOV; 2) Check that we can propagate that value; 3) Build the final value. Previously everything was somewhat tangled, which made implementing some specific cases, like knowing whether you can propagate from a previous instruction even with type mismatches, even messier (for example, it required maintaining more than one has_source_modifiers). The refactoring cleans this up and supports that use case without doing anything extra (for example, only one has_source_modifiers is used). Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1683842 -> 1669037 (-0.88%) instructions in affected programs: 739837 -> 725032 (-2.00%) helped: 6237 HURT: 0 v2: using 'arg' index to get the from inst was wrong v3: rebased against last change on the previous patch of the series v4: don't need to track instructions on struct copy_entry, as we only set the source on a direct copy v5: change the approach for a refactoring v6: tweaked comments Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: fix textureGrad for cubemaps	Tapani Pälli	2015-09-22	1	-19/+182
|	Fixes bugs exposed by commit 2b1cdb0eddb73f62e4848d4b64840067f1f70865 in: ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_frag No regressions observed in deqp, CTS or Piglit. v2: address review feedback from Iago Toral: - move rho calculation to else branch - optimize dx and dy calculation - fix documentation inconsistencies Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91114 Cc: "10.6 11.0" <[email protected]>
*	i965: Clean up GLSL compiler option setup	Jason Ekstrand	2015-09-21	1	-26/+20
|	The only functional change here is that we now set EmitNoIndirectOutput and EmitNoIndirectTemp for compute shaders. Compute shaders don't have outputs per se and we should have been setting EmitNoIndirectTemp all along. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/skl: Use larger URB size where available.	Ben Widawsky	2015-09-21	1	-1/+2
|	All SKL SKUs except the lowest one which has half the L3 size actually have 384K of URB per slice. For once, I can explain how this mistake was made and how it was missed in review... Historically when we enable a platform and put the production sizes, you can simply look at the "smallest" SKU and see what its URB size is (and we assumed it was the 1 slice variant). Since on newer platforms the URB sizes are scaled automatically by HW, this was sufficient. On SKL, this is a bit different as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not GT1) variant and it has 384K. There are no Jenkins tests fixed (or regressions) and we don't expect any fixes here because you can always run with less URB size. Thanks to Sarah for bringing this to my attention. Cc: Sarah Sharp <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Jordan Justen <[email protected]>