path: root/src/mesa/drivers/dri/i965/brw_vec4.cpp
Commit message | Author | Age | Files | Lines
* i965/vs: Fix regression on pre-gen6 with no VS uniforms in use. | Eric Anholt | 2013-08-30 | 1 | -0/+1
  df06745c5adb524e15d157f976c08f1718f08efa made it so that we didn't allocate extra uniform space for unused clip planes, which also incidentally made us not allocate any space at all, which we were relying on for this no-uniforms case. Instead of putting the knowledge of this special HW exception into the thing that normally preallocates prog_data for us, just allocate it here.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68766
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Add GS_OPCODE_THREAD_END. | Paul Berry | 2013-08-23 | 1 | -0/+1
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_URB_WRITE. | Paul Berry | 2013-08-23 | 1 | -0/+2
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Virtualize setup_payload instead of setup_attributes. | Paul Berry | 2013-08-23 | 1 | -1/+1
  When I initially generalized the vec4_visitor class in preparation for geometry shaders, I assumed that the setup_attributes() function would need to be different between vertex and geometry shaders, but its caller, setup_payload(), could be shared. So I made setup_attributes() a virtual function.

  It turns out this isn't true; setup_payload() needs to be different too, since the geometry shader payload sometimes includes an extra register (primitive ID) that has to come before uniforms. So setup_payload() needs to be the virtual function instead of setup_attributes().

  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
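A minimal sketch of the design described in this commit, assuming a heavily simplified visitor: every class and member name below is hypothetical (none of this is the Mesa code), attributes are left out, and r0 is assumed to hold a fixed thread header. It only illustrates why the virtual function has to be the one that lays out the whole payload: the geometry shader variant must place the primitive ID before the uniforms.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the shared vec4 visitor base class.
class payload_visitor_sketch {
public:
   explicit payload_visitor_sketch(int num_uniform_regs)
      : uniform_regs(num_uniform_regs)
   {
      assert(num_uniform_regs >= 0);
   }
   virtual ~payload_visitor_sketch() = default;

   // Lays out the thread payload and returns the first register after it.
   // Virtual because the register order differs per shader stage.
   virtual int setup_payload() = 0;

   int first_uniform_reg = -1;   // recorded by setup_uniforms()

protected:
   // Shared helper: uniforms start at `reg` and occupy uniform_regs registers.
   int setup_uniforms(int reg)
   {
      first_uniform_reg = reg;
      return reg + uniform_regs;
   }

private:
   int uniform_regs;
};

// Vertex shader variant: uniforms directly follow r0.
class vs_payload_sketch : public payload_visitor_sketch {
public:
   explicit vs_payload_sketch(int num_uniform_regs)
      : payload_visitor_sketch(num_uniform_regs) {}

   int setup_payload() override
   {
      int reg = 1;                // r0 is assumed to be the thread header
      return setup_uniforms(reg);
   }
};

// Geometry shader variant: an optional primitive ID register comes first.
class gs_payload_sketch : public payload_visitor_sketch {
public:
   gs_payload_sketch(int num_uniform_regs, bool uses_primitive_id)
      : payload_visitor_sketch(num_uniform_regs),
        uses_primitive_id(uses_primitive_id) {}

   int setup_payload() override
   {
      int reg = 1;                // r0 is assumed to be the thread header
      if (uses_primitive_id)
         reg++;                   // primitive ID sits in r1, before uniforms
      return setup_uniforms(reg);
   }

private:
   bool uses_primitive_id;
};
```

With two uniform registers, the vertex shader sketch starts its uniforms at r1, while a geometry shader sketch that reads the primitive ID starts them at r2.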
* i965/vec4: Allow for dispatch_grf_start_reg to vary. | Paul Berry | 2013-08-23 | 1 | -1/+3
  Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data).

  For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders.

  For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behaviour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1.

  This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data.

  This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1.

  v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint".

  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
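The record-then-consult flow in this commit can be sketched roughly as follows. All names are hypothetical stand-ins, and the packing of two vec4 uniforms per hardware register is an assumption made for illustration, not something the commit message states.

```cpp
#include <cassert>

// Hypothetical stand-in for the vec4-generic program data.
struct vec4_prog_data_sketch {
   unsigned dispatch_grf_start_reg = 0;
};

// Hypothetical concrete register: register number plus a float offset
// inside that register.
struct hw_reg_sketch {
   unsigned nr;
   unsigned subnr;
};

// Visitor side: record where push constants start and return the first
// register after them. Assumes two vec4 uniforms fit in one register.
unsigned setup_uniforms_sketch(vec4_prog_data_sketch &prog_data,
                               unsigned payload_reg, unsigned uniform_vec4s)
{
   prog_data.dispatch_grf_start_reg = payload_reg;
   return payload_reg + (uniform_vec4s + 1) / 2;
}

// Generator side: turn an abstract UNIFORM index into a concrete register
// using the recorded start register instead of a hard-coded 1.
hw_reg_sketch uniform_to_hw_reg_sketch(const vec4_prog_data_sketch &prog_data,
                                       unsigned uniform_index)
{
   assert(prog_data.dispatch_grf_start_reg >= 1);  // r0 is never a uniform
   return { prog_data.dispatch_grf_start_reg + uniform_index / 2,
            (uniform_index % 2) * 4 };
}
```

In this sketch a start register of 2 (the gl_PrimitiveIDIn case) places uniform 3 in the second half of r3; with a start register of 1 the same uniform would land in r2.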
* i965/vec4: Move vec4 data structures and functions to brw_vec4.{cpp,h}. | Paul Berry | 2013-08-23 | 1 | -0/+27
  This patch moves the following things into brw_vec4.{cpp,h}:

  - struct brw_vec4_compile
  - struct brw_vec4_prog_key
  - brw_vec4_prog_data_compare()
  - brw_vec4_prog_data_free()

  This will allow us to avoid having to include brw_vs.h in geometry-shader-specific files.

  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Stop including brw_vs.h from brw_vec4.h. | Paul Berry | 2013-08-23 | 1 | -0/+1
  This is backwards from what we are going to want in the long term, which is:

  - brw_vec4.h declares general-purpose vec4 infrastructure needed by both VS and GS
  - brw_vs.h includes brw_vec4.h and adds VS-specific parts.
  - brw_gs.h includes brw_vec4.h and adds GS-specific parts.

  Note that at the moment brw_vec4.h contains a fair amount of VS-specific declarations--I plan to address that in a later patch.

  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Plumb brw_vec4_prog_data into vec4_generator(). | Kenneth Graunke | 2013-08-19 | 1 | -1/+1
  This will be useful for the next commit.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -5/+5
  Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::perf_debug to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -3/+2
  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -3/+3
  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* glsl: Remove ir_print_visitor.h includes and usage | Eric Anholt | 2013-06-21 | 1 | -1/+0
  We have ir->print() to replace the old pattern of declaring a visitor and having the IR accept the visitor (yuck!). And now you can call _mesa_print_ir() safely anywhere that you know what an ir_instruction is.

  A couple of missing printf("\n")s are added in error paths -- when an expression is handed to the visitor, it doesn't print '\n' (since it might be a step in printing a whole expression tree).

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/vs: Fix implied_mrf_writes() for integer division pre-gen6. | Eric Anholt | 2013-05-29 | 1 | -0/+2
  Previously it would fail an assertion in debug builds (though the correct value was returned in a non-debug build). Marking it as a candidate for stable even though it has no current consumers in the stable branches, in case one shows up in a later backport.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727
  NOTE: This is a candidate for stable branches.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make virtual grf live intervals actually cover their used range. | Eric Anholt | 2013-05-09 | 1 | -4/+7
  This is the same change as the previous commit to the FS. A very few VSes are regressed by 1 or 2 instructions, which look recoverable with a bit more dead code elimination.

  Reviewed-by: Ian Romanick <[email protected]>
* i965/vs: Add instruction scheduling. | Eric Anholt | 2013-05-02 | 1 | -0/+9
  While this is ignorant of dependency control, it's still good for a 0.39% +/- 0.08% performance improvement on GLBenchmark 2.7 (n=548)

  v2: Rewrite as a subclass of the base class for the FS instruction scheduler, inheriting the same latency information.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Make dump_instructions be a virtual method of the visitor. | Eric Anholt | 2013-05-02 | 1 | -12/+3
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Move is_math/is_tex/is_control_flow() to backend_instruction. | Kenneth Graunke | 2013-04-29 | 1 | -26/+0
  These are entirely based on the opcode, which is available in backend_instruction. It makes sense to only implement them in one place.

  This changes the VS implementation of is_tex() slightly, which now accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those aren't generated in the VS anyway, it should be fine.

  This also makes is_control_flow() available in the VS.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Fix hypothetical use of uninitialized data in attribute_map[]. | Paul Berry | 2013-04-17 | 1 | -0/+11
  Fixes issue identified by Klocwork analysis:

  'attribute_map' array elements might be used uninitialized in this function (vec4_visitor::lower_attributes_to_hw_regs).

  The attribute_map array contains the mapping from shader input attributes to the hardware registers they are stored in. vec4_vs_visitor::setup_attributes() only populates elements of this array which, according to core Mesa, are actually used by the shader. Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses the array to lower a register access in the shader, it should in principle only access elements of attribute_map that contain valid data. However, if a bug ever caused the driver back-end to access an input that was not flagged as used by core Mesa, then lower_attributes_to_hw_regs() would access uninitialized memory, which could cause illegal instructions to get generated, resulting in a possible GPU hang.

  This patch makes the situation more robust by using memset() to pre-initialize the attribute_map array to zero, so that if such a bug ever occurred, lower_attributes_to_hw_regs() would generate a (mostly) harmless access to r0. In addition, it adds assertions to lower_attributes_to_hw_regs() so that if we do have such a bug, we're likely to discover it quickly.

  Reviewed-by: Jordan Justen <[email protected]>
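The defensive pattern described in this commit might look roughly like the sketch below. The function names, the array size and the bitmask of used inputs are hypothetical simplifications; the sketch also assumes attribute registers start at r1 or later, so that 0 can serve as the "never assigned" marker.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical attribute count for the sketch.
enum { ATTR_COUNT_SKETCH = 8 };

// Populate the map only for inputs flagged as used, after zeroing it so
// that unused slots hold a well-defined value (r0) instead of garbage.
void setup_attributes_sketch(int (&attribute_map)[ATTR_COUNT_SKETCH],
                             unsigned inputs_read, int first_reg)
{
   assert(first_reg >= 1);   // 0 is reserved as the "unassigned" marker
   std::memset(attribute_map, 0, sizeof(attribute_map));

   int reg = first_reg;
   for (int i = 0; i < ATTR_COUNT_SKETCH; i++) {
      if (inputs_read & (1u << i))
         attribute_map[i] = reg++;
   }
}

// Look up the hardware register for an attribute. In debug builds the
// assertion catches accesses to attributes that were never assigned; in
// release builds such an access degrades to a read of r0.
int lower_attribute_to_hw_reg_sketch(const int (&attribute_map)[ATTR_COUNT_SKETCH],
                                     int attr)
{
   assert(attr >= 0 && attr < ATTR_COUNT_SKETCH);
   int grf = attribute_map[attr];
   assert(grf != 0);
   return grf;
}
```

Zeroing first and asserting later covers both build types: a release build falls back to a mostly harmless register, and a debug build stops at the first bad lookup.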
* i965: Fix a warning in the release build. | Eric Anholt | 2013-04-12 | 1 | -2/+1
  This was copy and pasted from can_reswizzle_dst(), and we can just fold it in instead to avoid the warning.

  Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Print error if vertex shader fails to compile. | Matt Turner | 2013-04-11 | 1 | -0/+4
  Reviewed-by: Eric Anholt <[email protected]>
* i965: NULL check prog on shader compilation failure. | Matt Turner | 2013-04-11 | 1 | -3/+5
  Also change if (shader) to if (prog) for consistency.

  Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Don't hardcode DEBUG_VS in generic vec4 code. | Paul Berry | 2013-04-11 | 1 | -1/+2
  Since the vec4_visitor and vec4_generator classes are going to be re-used for geometry shaders, we can't enable their debug functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add a debug_flag boolean to these two classes, so that when they're instantiated the caller can specify whether debug dumps are needed.

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Generalize attribute setup code in preparation for GS. | Paul Berry | 2013-04-11 | 1 | -22/+32
  This patch introduces a new function, vec4_visitor::lower_attributes_to_hw_regs(), which replaces registers of type ATTR in the instruction stream with the hardware registers that store those attributes. This logic will need to be common between the vertex and geometry shaders.

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Generalize data structures pointed to by vec4_generator. | Paul Berry | 2013-04-11 | 1 | -1/+1
  This patch removes the following field from vec4_generator, since it is not used:

  - struct brw_vs_compile *c

  And changes the following field:

  - struct gl_vertex_program *vp => struct gl_program *prog

  With these changes, vec4_generator no longer refers to any VS-specific data structures. This will pave the way for re-using it for geometry shaders.

  Reviewed-by: Jordan Justen <[email protected]>

  v2: Use the name "prog" rather than "p".

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: move VS-specific data members to vs_vec4_visitor. | Paul Berry | 2013-04-11 | 1 | -19/+19
  This patch moves the following data structures from vec4_visitor to vec4_vs_visitor, since they contain VS-specific data:

  - struct brw_vs_compile *c (renamed to vs_compile)
  - struct brw_vs_prog_data *prog_data (renamed to vs_prog_data)
  - src_reg *vp_temp_regs
  - src_reg vp_addr_reg

  Since brw_vs_compile and brw_vs_prog_data also contain vec4-generic data, the following pointers are added to the base class, to allow it to access the vec4-generic portions of these data structures:

  - struct brw_vec4_compile *c
  - struct brw_vec4_prog_key *key
  - struct brw_vec4_prog_data *prog_data

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>

  v2: Use shorter names in the base class and longer names in the derived class.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make some vec4_visitor functions virtual. | Paul Berry | 2013-04-11 | 1 | -4/+4
  This patch makes the following vec4_visitor functions virtual, since they will need to be implemented differently for vertex and geometry shaders. Some of the functions are renamed to reflect their generic purpose, rather than their VS-specific behaviour:

  - setup_attributes
  - emit_attribute_fixups (renamed to emit_prolog)
  - emit_vertex_program_code (renamed to emit_program_code)
  - emit_urb_writes (renamed to emit_thread_end)

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make vec4_vs_visitor class derived from vec4_visitor. | Paul Berry | 2013-04-11 | 1 | -1/+1
  This patch just creates the derived class; later patches will migrate VS-specific functions and data structures from the base class into the derived class.

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts. | Paul Berry | 2013-04-11 | 1 | -16/+18
  This will allow the generic parts to be re-used for geometry shaders.

  Reviewed-by: Jordan Justen <[email protected]>

  v2: Put urb_read_length and urb_entry_size in the generic struct.

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: split brw_vs_prog_key into generic and VS-specific parts. | Paul Berry | 2013-04-11 | 1 | -1/+1
  This will allow the generic parts to be re-used for geometry shaders.

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Remove brw_vs_prog_data pointer from brw_vs_compile. | Paul Berry | 2013-04-11 | 1 | -9/+10
  In patches that follow, we'll be splitting structs brw_vs_prog_data and brw_vs_compile into a vec4-generic base struct and a VS-specific derived struct (this will allow the vec4-generic code to be re-used for geometry shaders). Having brw_vs_compile point to brw_vs_prog_data makes it difficult to do this cleanly.

  Fortunately most of the functions that use brw_vs_compile (those in the vec4_visitor class) already have access to brw_vs_prog_data through a separate pointer (vec4_visitor::prog_data). So all we have to do is use that pointer consistently, and plumb prog_data through the few remaining functions that need access to it.

  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make type of vec4_visitor::vp more generic. | Paul Berry | 2013-04-11 | 1 | -3/+3
  The vec4_visitor functions don't use any VS specific data from vec4_visitor::vp. So rename it to "prog" and change its type from struct gl_vertex_program * to struct gl_program *. This will allow the code to be re-used for geometry shaders.

  Reviewed-by: Jordan Justen <[email protected]>

  v2: Use the name "prog" rather than "p".

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename backend_visitor::prog to shader_prog. | Paul Berry | 2013-04-11 | 1 | -2/+2
  The next patch is going to change the type of vec4_visitor::vp from struct gl_vertex_program * to struct gl_program *, and rename it. The sensible name to change it to is vec4_visitor::prog. However, prog is already used in backend_visitor (which vec4_visitor derives from). Since backend_visitor::prog is of type struct gl_shader_program *, it makes sense to rename it to shader_prog.

  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use GRFs for pull constant offsets on gen7. | Eric Anholt | 2013-04-10 | 1 | -1/+7
  This allows the computation of the offset to get written directly into the message source.

  shader-db results:
  total instructions in shared programs: 3308390 -> 3283025 (-0.77%)
  instructions in affected programs: 442998 -> 417633 (-5.73%)

  No difference in GLB2.7 low res (n=9).

  Reviewed-by: Matt Turner <[email protected]>
* i965/vs: When asked to make a dst_reg for a src.xxxx, just write to src.x. | Eric Anholt | 2013-04-10 | 1 | -1/+8
  We have several places in our pull constant handling where we make a temporary src_reg for an int, and then turn it into a dst. In doing so, we were writing to the dst.xyzw, so we never register coalesced it with a later mov from dst.x to real_dst.x. These extra channels written would be removed if we had channel-wise DCE in the backend, but we don't. Fix it for now by just not writing these extra channels that won't get used.

  Reviewed-by: Matt Turner <[email protected]>
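A rough sketch of the conversion this commit describes, using hypothetical types and constants rather than the driver's own: a source register whose swizzle replicates the x channel becomes a destination that writes only x, and every other swizzle keeps the full xyzw writemask. The 2-bits-per-channel swizzle encoding is an assumption made for the example.

```cpp
#include <cassert>

// Hypothetical writemask bits: one bit per channel, x in bit 0.
constexpr unsigned WRITEMASK_X_SKETCH = 0x1;
constexpr unsigned WRITEMASK_XYZW_SKETCH = 0xf;

// Hypothetical swizzle encoding: four 2-bit channel selects (0 = x ... 3 = w).
constexpr unsigned swizzle4_sketch(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return x | (y << 2) | (z << 4) | (w << 6);
}
constexpr unsigned SWIZZLE_XXXX_SKETCH = swizzle4_sketch(0, 0, 0, 0);

struct src_reg_sketch {
   int reg;
   unsigned swizzle;
};

struct dst_reg_sketch {
   int reg;
   unsigned writemask;
};

// Convert a source register into a destination register. Only the scalar
// src.xxxx case is special-cased; any other swizzle writes all channels.
dst_reg_sketch dst_from_src_sketch(const src_reg_sketch &src)
{
   assert(src.swizzle <= 0xff);   // four 2-bit selects fit in 8 bits
   unsigned writemask = (src.swizzle == SWIZZLE_XXXX_SKETCH)
                           ? WRITEMASK_X_SKETCH
                           : WRITEMASK_XYZW_SKETCH;
   return { src.reg, writemask };
}
```

Restricting the write to x is what lets a later pass see that only one channel of the temporary is live, which is the precondition for coalescing it with the following mov.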
* i965/vs: Add a pass to set dependency control fields on instructions. | Eric Anholt | 2013-04-01 | 1 | -0/+109
  This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add names for all instructions to dump_instruction() in FS and VS. | Eric Anholt | 2013-03-29 | 1 | -6/+1
  I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Include URB payload setup in shader_time. | Eric Anholt | 2013-03-28 | 1 | -3/+0
  This much more accurately reflects the cost of the vertex shader, since the payload setup is often a significant fraction of the instructions in the VS.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use a send from a 2-register VGRF for shader time writes. | Eric Anholt | 2013-03-28 | 1 | -12/+12
  This will let us emit it later, after we're setting up MRFs for the URB write.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Teach copy propagation about sends from GRFs. | Eric Anholt | 2013-03-28 | 1 | -0/+12
  This incidentally also teaches it a bit about gen6 math -- we now allow unswizzled, unmodified GRF temps as the sources for math.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs. | Eric Anholt | 2013-03-28 | 1 | -20/+44
  v2: Fix silly bool handling, and don't add new tabs.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Track ARB program state along with GLSL state for shader_time. | Eric Anholt | 2013-03-28 | 1 | -11/+2
  This will let us do much better printouts for non-GLSL programs.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add IR dumping for immediates. | Kenneth Graunke | 2013-03-20 | 1 | -0/+16
  This makes dump_instructions more useful.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Split shader_time entries into separate cachelines. | Eric Anholt | 2013-03-14 | 1 | -1/+1
  This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS).

  Improves performance of a minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).

  v2: Add a define for the stride with a comment explaining its units and why.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make perf_debug() output to GL_ARB_debug_output in a debug context. | Eric Anholt | 2013-03-05 | 1 | -2/+2
  I tried to ensure that performance in the non-debug case doesn't change (we still just check one condition up front), and I think the impact is small enough in the debug context case to warrant including all of it.

  Reviewed-by: Jordan Justen <[email protected]>
* i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS | Chris Forbes | 2013-03-02 | 1 | -0/+1
  This is very similar to the TXF opcode, but lowers to `ld2dms` rather than `ld` on Gen7.

  V4:
  - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks it actually writes the correct number of registers. Otherwise in nontrivial shaders some of the registers tend to get clobbered, producing bad results.

  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/vs/gen7: Allow MATH instructions to have MRF as a destination | Matt Turner | 2013-02-28 | 1 | -1/+1
  total instructions in shared programs: 346873 -> 346847 (-0.01%)
  instructions in affected programs: 364 -> 338 (-7.14%)

  (All affected shaders are from Lightsmark)

  Reviewed-by: Eric Anholt <[email protected]>
* i965: Add asserts to check that we don't realloc ParameterValues. | Eric Anholt | 2012-12-28 | 1 | -0/+9
  Things are even more restrictive than they used to be, so I've made mistakes in this area.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too. | Eric Anholt | 2012-12-14 | 1 | -58/+71
  No statistically significant performance difference on glbenchmark 2.7 (n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8% (n=5), but that's only about 0.3% of all cycles spent according to the fixed shader_time.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling" | Eric Anholt | 2012-12-14 | 1 | -9/+90
  The way our visitor works, scalar expression/swizzle results that get stored in channels other than .x will have an intermediate MOV from their result in the .x channel to the real .y (or whatever) channel, and similarly for vec2/vec3 results.

  By knowing how to adjust DP4-type instructions for optimizing out a swizzled MOV, we can reduce instructions in common matrix multiplication cases.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Scale shader_time to compensate for resets. | Eric Anholt | 2012-12-14 | 1 | -1/+3
  Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen.

  Note that this will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.
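One way to read "linearly scaling according to the number of resets" is sketched below. The function name, its parameters and the exact formula are assumptions for illustration; the commit message does not spell out the arithmetic. The idea is that if only `written` invocations out of `written + reset` produced a timing sample, the accumulated time is scaled up by that ratio.

```cpp
#include <cstdint>

// Hypothetical helper: scale accumulated shader time so that invocations
// lost to timer resets are accounted for at the average observed cost.
//   time    - cycles accumulated from invocations that recorded a sample
//   written - number of invocations that recorded a sample
//   reset   - number of invocations discarded because of a reset
double scale_for_resets_sketch(uint64_t time, uint64_t written, uint64_t reset)
{
   if (written == 0)
      return 0.0;   // no samples at all, so nothing to extrapolate from

   return static_cast<double>(time) *
          static_cast<double>(written + reset) /
          static_cast<double>(written);
}
```

As the commit notes, this kind of scaling assumes discarded invocations cost the same on average as recorded ones, so it underestimates shaders whose long invocations are the ones that reset.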