path: root/src/mesa/drivers/dri/i965/brw_vec4.cpp
Commit message | Author | Age | Files | Lines
* i965/vec4: Fix confusion between SWIZZLE and BRW_SWIZZLE macros. | Francisco Jerez | 2014-02-12 | 1 | -1/+1
  Most of the VEC4 back-end agrees on src_reg::swizzle being one of the BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we use Mesa's SWIZZLE macros. There is even a doxygen comment saying that Mesa's macros are the right ones. They are incompatible swizzle representations (3 bits vs. 2 bits per component), and the code using Mesa's works by pure luck. Fix it.
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
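The incompatibility described in the entry above can be illustrated with a minimal sketch. The two packers below assume the layouts named in the commit message (3 bits per component for the Mesa-style encoding, 2 bits per component for the BRW-style encoding); the function names are hypothetical stand-ins for illustration, not the real macros.

```cpp
#include <cassert>
#include <cstdint>

// Component selectors shared by both encodings in this sketch.
enum : uint32_t { X = 0, Y = 1, Z = 2, W = 3 };

// Hypothetical stand-in for a Mesa-style swizzle: 3 bits per component.
constexpr uint32_t mesa_style_swizzle(uint32_t a, uint32_t b,
                                      uint32_t c, uint32_t d)
{
   return a | (b << 3) | (c << 6) | (d << 9);
}

// Hypothetical stand-in for a BRW-style swizzle: 2 bits per component.
constexpr uint32_t brw_style_swizzle(uint32_t a, uint32_t b,
                                     uint32_t c, uint32_t d)
{
   return a | (b << 2) | (c << 4) | (d << 6);
}

// Decode component `i` (0..3) under the assumption that `swz` uses the
// 2-bit BRW-style layout.
inline uint32_t brw_style_get(uint32_t swz, unsigned i)
{
   assert(i < 4);
   return (swz >> (2 * i)) & 3;
}

// XXXX is all zero bits in both layouts, so mixing the encodings goes
// unnoticed for that particular value.
static_assert(mesa_style_swizzle(X, X, X, X) == brw_style_swizzle(X, X, X, X),
              "XXXX encodes identically");

// The identity swizzle XYZW does not survive the mix-up.
static_assert(mesa_style_swizzle(X, Y, Z, W) != brw_style_swizzle(X, Y, Z, W),
              "XYZW encodes differently");
```

Reading a 3-bit-packed XYZW through the 2-bit decoder yields Z for the second component rather than Y, which shows how code mixing the two representations could silently select the wrong channel while still working for a few lucky values.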
* i965: Fix register types in dump_instructions(). | Kenneth Graunke | 2014-02-05 | 1 | -1/+1
  This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract type that doesn't match the hardware description. dump_instruction() was using reg_encoding[] from brw_disasm.c, which no longer matches (and was incorrect for Gen8+ anyway).
  This patch introduces a new function to convert the abstract enum values into the letter suffix we expect.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reported-by: Matt Turner <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: rename tex_ms to tex_cms | Topi Pohjolainen | 2014-01-23 | 1 | -1/+1
  Prepares for the introduction of non-compressed multi-sampled lookup used in the blorp programs.
  v2: now also taking into account gen8
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: Print reg_offset for vgrf of size > 1 in dump_instruction(). | Matt Turner | 2014-01-21 | 1 | -1/+1
  Previously we wouldn't print the +0 for the first part of a VGRF of size greater than 1.
  Reviewed-by: Jordan Justen <[email protected]>
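The printing rule in the entry above can be sketched as follows. This is only an illustration: the `vgrf` prefix, the helper name `format_vgrf`, and its parameters are assumptions, and the only detail taken from the commit message is that a `+offset` suffix (including `+0`) should appear whenever the VGRF is larger than one register.

```cpp
#include <cassert>
#include <string>

// Format a virtual GRF reference for a debug dump (hypothetical helper).
//   reg        - virtual register number
//   reg_offset - which register of a multi-register VGRF is referenced
//   size       - number of registers the VGRF occupies
inline std::string format_vgrf(unsigned reg, unsigned reg_offset, unsigned size)
{
   assert(reg_offset < size);

   std::string out = "vgrf" + std::to_string(reg);

   // Print the offset for every part of a multi-register VGRF, including
   // the first one, so that "+0" is no longer omitted.
   if (reg_offset != 0 || size > 1)
      out += "+" + std::to_string(reg_offset);

   return out;
}
```

With this rule a single-register VGRF still prints without a suffix, while the first register of a two-register VGRF prints with an explicit `+0`.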
* i965: Add GS support to INTEL_DEBUG=shader_time. | Paul Berry | 2014-01-21 | 1 | -3/+3
  Previously, time spent in geometry shaders would be counted as part of the vertex shader time.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Create a new vec4 backend for Broadwell. | Kenneth Graunke | 2014-01-18 | 1 | -5/+11
  This replaces the old vec4_generator backend.
  v2: Port to use the C-based instruction representation. Also, remove Geometry Shader offset hacks - the visitor will handle those instead of this code.
  v3: Texturing fixes (including adding textureGather support).
  v4: Pass brw_context to gen8_instruction functions as required.
  v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes (caught by Eric). Simplify ADDC/SUBB handling; add comments to gen8_set_dp_message calls (suggested by Matt).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Stop doing our optimization on a copy of the GLSL IR. | Eric Anholt | 2014-01-17 | 1 | -2/+2
  The original intent was that we'd keep a driver-private copy, and there would be the normal copy for swrast to make use of without the tuning (or anything more invasive we might do) specific to i965. Only, we don't generate swrast code any more, because swrast can't render current shaders anyway. Thus, our private copy is rather a waste, and we can just do our backend-specific operations on the linked shader.
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Add shader opcode for sampling MCS surface | Chris Forbes | 2013-12-07 | 1 | -0/+1
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Print conditional mod in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -1/+5
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Print argument types in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -1/+5
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Don't print swizzles for immediate values. | Matt Turner | 2013-12-04 | 1 | -4/+6
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Print negate and absolute value for src args. | Matt Turner | 2013-12-04 | 1 | -0/+7
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add support for printing HW_REGs in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -0/+60
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Don't print extra (null) arguments in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -2/+2
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Clean up cfg_t constructors. | Matt Turner | 2013-12-04 | 1 | -1/+1
  parent_mem_ctx was unused since db47074a, so remove the two wrappers around create() and make create() the constructor.
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Add a pass to remove dead control flow. | Matt Turner | 2013-11-20 | 1 | -0/+2
  Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.
  total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
  instructions in affected programs: 157 -> 151 (-3.82%)
  (no change in vertex shaders)
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
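The transformation named in the entry above can be sketched on a flat opcode list. The `Op` enum and the function name are hypothetical simplifications for illustration; the real pass operates on the back-end's own instruction representation, and repeating until nothing changes is a choice made for this sketch so that an outer block emptied by an earlier removal is caught too.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified opcode set; real back-end instructions carry
// far more state than a bare opcode.
enum class Op { IF, ELSE, ENDIF, MOV, ADD };

// Remove IF/ENDIF and IF/ELSE/ENDIF sequences that have no instructions
// in between. Returns true if anything was removed.
inline bool remove_dead_control_flow(std::vector<Op> &insts)
{
   bool progress = false;
   bool changed = true;

   while (changed) {
      changed = false;

      for (std::size_t i = 0; i + 1 < insts.size(); i++) {
         if (insts[i] != Op::IF)
            continue;

         std::size_t len = 0;
         if (insts[i + 1] == Op::ENDIF) {
            len = 2;   // IF immediately followed by ENDIF
         } else if (i + 2 < insts.size() &&
                    insts[i + 1] == Op::ELSE &&
                    insts[i + 2] == Op::ENDIF) {
            len = 3;   // IF, ELSE, ENDIF with empty branches
         }

         if (len == 0)
            continue;

         assert(i + len <= insts.size());
         insts.erase(insts.begin() + i, insts.begin() + i + len);
         progress = changed = true;
         break;   // restart the scan; indices have shifted
      }
   }

   return progress;
}
```

For example, the sequence MOV, IF, IF, ELSE, ENDIF, ENDIF, ADD collapses to MOV, ADD in two rounds: the inner empty IF/ELSE/ENDIF goes first, which leaves an empty outer IF/ENDIF for the second round.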
* i965/vec4: Add invalidate_live_intervals method. | Matt Turner | 2013-11-20 | 1 | -4/+4
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a 'has_side_effects' back-end instruction predicate. | Francisco Jerez | 2013-11-04 | 1 | -1/+1
  This patch fixes the three dead code elimination passes and the VEC4/FS instruction scheduling passes so they leave instructions with side effects alone.
  At some point it might be interesting to have the instruction scheduler calculate the exact memory dependencies between atomic ops, but they're rare enough that it seems unlikely that it will make any practical difference.
  Reviewed-by: Paul Berry <[email protected]>
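How a dead code elimination pass might consult such a predicate is sketched below. The `Inst` record, its fields, and the function name are hypothetical and exist only for this illustration; the one idea taken from the commit message is that an instruction with side effects is left alone even when its result is unused.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record for illustration only.
struct Inst {
   int id;                  // label used to identify the instruction
   bool result_is_used;     // whether any later instruction reads the result
   bool has_side_effects;   // e.g. an atomic operation on memory
};

// Remove instructions whose result is never used, but keep any
// instruction flagged as having side effects. Returns the number removed.
inline std::size_t dead_code_eliminate(std::vector<Inst> &insts)
{
   const std::size_t before = insts.size();

   insts.erase(std::remove_if(insts.begin(), insts.end(),
                              [](const Inst &inst) {
                                 return !inst.result_is_used &&
                                        !inst.has_side_effects;
                              }),
               insts.end());

   assert(insts.size() <= before);
   return before - insts.size();
}
```

Keeping the decision in a single predicate means every pass that deletes or reorders instructions can ask the same question instead of maintaining its own opcode list.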
* i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE | Eric Anholt | 2013-10-30 | 1 | -2/+2
  I'm going to be introducing gen7 variants, and the previous naming was going to get confusing.
  Reviewed-by: Paul Berry <[email protected]>
* i965/gen7: Implement code generation for untyped surface read instructions. | Francisco Jerez | 2013-10-29 | 1 | -0/+1
* i965/gen7: Implement code generation for untyped atomic instructions. | Francisco Jerez | 2013-10-29 | 1 | -0/+2
  Reviewed-by: Paul Berry <[email protected]>
* i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets. | Chris Forbes | 2013-10-26 | 1 | -0/+1
  The generator code ends up clearer this way than if we had to sniff via the message length. Implemented via the gather4_po message in hardware, which is present in Gen7 and later.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: if register allocation fails, don't try to schedule. | Paul Berry | 2013-10-24 | 1 | -1/+1
  Otherwise the scheduler would be invoked with prog_data->total_grf == 0, causing havoc.
  In a future patch, this will allow us to try compiling a geometry shader in DUAL_OBJECT mode with spilling disabled, and then fall back to DUAL_INSTANCED mode if that failed.
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability for attributes to be interleaved. | Paul Berry | 2013-10-24 | 1 | -4/+24
  When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format.
  Reviewed-by: Eric Anholt <[email protected]>
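The interleaved layout described in the entry above can be pictured with a small index calculation. This is a sketch under stated assumptions: the type and function names are hypothetical, and the non-interleaved case is assumed here to use one payload register per input; only the "pair of inputs per payload register" rule comes from the commit message.

```cpp
#include <cassert>

// Where one input lands in the thread payload (hypothetical type).
struct AttrLocation {
   unsigned grf;    // payload register number
   unsigned half;   // 0 = first half of the register, 1 = second half
};

// Map an input index to its payload location.
//   first_attr_grf - register holding input 0
//   attr_index     - zero-based input index
//   interleaved    - true when two inputs share each payload register
inline AttrLocation locate_attribute(unsigned first_attr_grf,
                                     unsigned attr_index,
                                     bool interleaved)
{
   AttrLocation loc;

   if (interleaved) {
      // Two inputs per register: even indices in the first half,
      // odd indices in the second half.
      loc.grf = first_attr_grf + attr_index / 2;
      loc.half = attr_index % 2;
   } else {
      // Assumed layout for this sketch: one input per register.
      loc.grf = first_attr_grf + attr_index;
      loc.half = 0;
   }

   assert(loc.half < 2);
   return loc;
}
```

Under these assumptions four inputs starting at register 4 occupy registers 4 through 7 in the plain layout but only registers 4 and 5 when interleaved, which is the register saving the commit message refers to.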
* i965/vec4: Extract function to set up vec4 prog key for precompiling. | Paul Berry | 2013-10-24 | 1 | -0/+22
  This will allow us to re-use it for precompiling geometry shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove uses_clip_distance from program key. | Paul Berry | 2013-10-24 | 1 | -1/+1
  This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state. Change the code to just refer to gl_program::UsesClipDistanceOut directly.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Move the common binding table offset code to brw_shader.cpp. | Eric Anholt | 2013-10-15 | 1 | -32/+1
  Now that both vec4 and fs are dynamically assigning offsets, a lot of the code is the same.
  v2: Avoid passing around the next offset through the class. (Review by Paul)
  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Dynamically assign the VS/GS binding table offsets. | Eric Anholt | 2013-10-15 | 1 | -5/+24
  Note that the dropped comment in brw_context.h is mostly (better written) in brw_binding_table.c as well.
  Reviewed-by: Paul Berry <[email protected]>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information. | Eric Anholt | 2013-10-15 | 1 | -1/+15
  It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go.
  v2: Rename size to size_bytes to be more explicit.
  Reviewed-by: Paul Berry <[email protected]>
* i965: Don't copy prop source mods into instructions that can't take them. | Matt Turner | 2013-10-14 | 1 | -0/+3
* i965: Fixup for don't dead-code eliminate instructions that write to the accumulator. | Matt Turner | 2013-10-07 | 1 | -2/+1
  Accidentally pushed an old version of the patch.
  v2: Set destination register using brw_null_reg().
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't dead-code eliminate instructions that write to the accumulator. | Matt Turner | 2013-10-07 | 1 | -1/+15
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add SHADER_OPCODE_TG4 | Chris Forbes | 2013-10-03 | 1 | -0/+1
  Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex().
  V3: Updated for changes in master.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code. | Kenneth Graunke | 2013-08-30 | 1 | -2/+6
  It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF. VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the GRF as src[1].
  To be safe, loop over all the source registers and mark any GRFs. We probably won't ever have more than one, but it's simpler to just check all three rather than attempting to bail early.
  Fixes assertion failures in Unigine Sanctuary since we started making register allocation rely on split_virtual_grfs working. (The register classes were actually sufficient, we were just interpreting an IMM as a virtual GRF number.)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Cc: [email protected]
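The "check all three sources" approach from the entry above can be sketched like this. The types, field names, and the function name are hypothetical, and the sketch assumes that "marking" a GRF means excluding it from splitting; the part taken from the commit message is that every source is inspected and only sources in the GRF file are treated as virtual GRF numbers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified register files and instruction record.
enum class File { BAD, GRF, IMM, UNIFORM };

struct Src {
   File file;
   unsigned reg;   // virtual GRF number, or raw bits for other files
};

struct Inst {
   bool is_send_from_grf;
   Src src[3];
};

// Return a per-VGRF flag saying whether the register may be split.
// Any VGRF used as a source of a send-from-GRF instruction is marked
// as not splittable (an assumption of this sketch).
inline std::vector<bool> find_splittable_grfs(const std::vector<Inst> &insts,
                                              unsigned num_vgrfs)
{
   std::vector<bool> may_split(num_vgrfs, true);

   for (const Inst &inst : insts) {
      if (!inst.is_send_from_grf)
         continue;

      // Check every source instead of assuming src[0] is the GRF;
      // an immediate in src[0] must not be read as a register number.
      for (const Src &s : inst.src) {
         if (s.file == File::GRF) {
            assert(s.reg < num_vgrfs);
            may_split[s.reg] = false;
         }
      }
   }

   return may_split;
}
```

In this sketch an instruction whose first source is an immediate holding the value 99 and whose second source is VGRF 2 marks only VGRF 2, whereas indexing by src[0] unconditionally would have read 99 as a register number.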
* i965/vs: Fix regression on pre-gen6 with no VS uniforms in use. | Eric Anholt | 2013-08-30 | 1 | -0/+1
  df06745c5adb524e15d157f976c08f1718f08efa made it so that we didn't allocate extra uniform space for unused clip planes, which also incidentally made us not allocate any space at all, which we were relying on for this no-uniforms case. Instead of putting the knowledge of this special HW exception into the thing that normally preallocates prog_data for us, just allocate it here.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68766
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Add GS_OPCODE_THREAD_END. | Paul Berry | 2013-08-23 | 1 | -0/+1
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_URB_WRITE. | Paul Berry | 2013-08-23 | 1 | -0/+2
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Virtualize setup_payload instead of setup_attributes. | Paul Berry | 2013-08-23 | 1 | -1/+1
  When I initially generalized the vec4_visitor class in preparation for geometry shaders, I assumed that the setup_attributes() function would need to be different between vertex and geometry shaders, but its caller, setup_payload(), could be shared. So I made setup_attributes() a virtual function.
  It turns out this isn't true; setup_payload() needs to be different too, since the geometry shader payload sometimes includes an extra register (primitive ID) that has to come before uniforms. So setup_payload() needs to be the virtual function instead of setup_attributes().
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Allow for dispatch_grf_start_reg to vary. | Paul Berry | 2013-08-23 | 1 | -1/+3
  Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data).
  For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behaviour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1.
  This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data.
  This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1.
  v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint".
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
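The flow described in the entry above, recording the start register and consulting it when lowering a UNIFORM register, can be sketched as below. All names here are hypothetical, and two details are assumptions of the sketch rather than statements from the commit message: that a geometry shader which does not read the primitive ID uses a start register of 1, and that two vec4 uniforms are packed into each hardware register.

```cpp
#include <cassert>

// A concrete hardware register reference (hypothetical type).
struct HwReg {
   unsigned nr;             // hardware register number
   unsigned float_offset;   // offset within the register, in floats
};

// Pick the register where URB-sourced data starts. Vertex shaders use 1;
// a geometry shader that reads the primitive ID uses 2, leaving R1 for
// the primitive ID.
inline unsigned choose_dispatch_grf_start_reg(bool is_geometry_shader,
                                              bool reads_primitive_id)
{
   return (is_geometry_shader && reads_primitive_id) ? 2u : 1u;
}

// Convert an abstract uniform index into a hardware register, using the
// recorded start register instead of a hard-coded 1.
// Assumes two vec4 uniforms per hardware register.
inline HwReg uniform_to_hw_reg(unsigned dispatch_grf_start_reg,
                               unsigned vec4_index)
{
   assert(dispatch_grf_start_reg >= 1);
   return HwReg{dispatch_grf_start_reg + vec4_index / 2,
                (vec4_index % 2) * 4};
}
```

Under these assumptions uniform 3 resolves to register 2 for a vertex shader and to register 3 for a geometry shader that reads the primitive ID, without the lowering code needing to know which stage it is compiling.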
* i965/vec4: Move vec4 data structures and functions to brw_vec4.{cpp,h}. | Paul Berry | 2013-08-23 | 1 | -0/+27
  This patch moves the following things into brw_vec4.{cpp,h}:
  - struct brw_vec4_compile
  - struct brw_vec4_prog_key
  - brw_vec4_prog_data_compare()
  - brw_vec4_prog_data_free()
  This will allow us to avoid having to include brw_vs.h in geometry-shader-specific files.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Stop including brw_vs.h from brw_vec4.h. | Paul Berry | 2013-08-23 | 1 | -0/+1
  This is backwards from what we are going to want in the long term, which is:
  - brw_vec4.h declares general-purpose vec4 infrastructure needed by both VS and GS
  - brw_vs.h includes brw_vec4.h and adds VS-specific parts.
  - brw_gs.h includes brw_vec4.h and adds GS-specific parts.
  Note that at the moment brw_vec4.h contains a fair amount of VS-specific declarations--I plan to address that in a later patch.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Plumb brw_vec4_prog_data into vec4_generator(). | Kenneth Graunke | 2013-08-19 | 1 | -1/+1
  This will be useful for the next commit.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -5/+5
  Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::perf_debug to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -3/+2
  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -3/+3
  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Chris Forbes <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
* glsl: Remove ir_print_visitor.h includes and usage | Eric Anholt | 2013-06-21 | 1 | -1/+0
  We have ir->print() to do the old declaration of a visitor and having the IR accept the visitor (yuck!). And now you can call _mesa_print_ir() safely anywhere that you know what an ir_instruction is.
  A couple of missing printf("\n")s are added in error paths -- when an expression is handed to the visitor, it doesn't print '\n' (since it might be a step in printing a whole expression tree).
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/vs: Fix implied_mrf_writes() for integer division pre-gen6. | Eric Anholt | 2013-05-29 | 1 | -0/+2
  Previously it would assertion fail in debug builds (though the correct value was returned in a non-debug build). Marking it as a candidate for stable even though it has no current consumers in the stable branches, in case one shows up in a later backport.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727
  NOTE: This is a candidate for stable branches.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make virtual grf live intervals actually cover their used range. | Eric Anholt | 2013-05-09 | 1 | -4/+7
  This is the same change as the previous commit to the FS. A very few VSes are regressed by 1 or 2 instructions, which look recoverable with a bit more dead code elimination.
  Reviewed-by: Ian Romanick <[email protected]>
* i965/vs: Add instruction scheduling. | Eric Anholt | 2013-05-02 | 1 | -0/+9
  While this is ignorant of dependency control, it's still good for a 0.39% +/- 0.08% performance improvement on GLBenchmark 2.7 (n=548)
  v2: Rewrite as a subclass of the base class for the FS instruction scheduler, inheriting the same latency information.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Make dump_instructions be a virtual method of the visitor. | Eric Anholt | 2013-05-02 | 1 | -12/+3
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>