path: root/src/mesa/drivers/dri/i965/brw_vec4.h
Commit message  Author  Age  Files  Lines
* i965/vec4: Emit smarter code for b2f of a comparisonIan Romanick2014-06-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Previously we would emit the comparison, emit an AND to mask off extra bits from the comparison result, then convert the result to float. Now, do the comparison, then use a cleverly constructed SEL to pick either 0.0f or 1.0f. No piglit regressions on Ivybridge. total instructions in shared programs: 1642311 -> 1639449 (-0.17%) instructions in affected programs: 136533 -> 133671 (-2.10%) GAINED: 0 LOST: 0 Programs that are affected appear to save between 1 and 5 instructions (just by skimming the output from shader-db report.py). v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove extraneous fix_3src_operand (suggested by Matt). The latter change required swapping the order of the operands and using predicate_inverse. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
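The two instruction sequences described in this commit can be pictured with a scalar analogy. This is only a sketch, not the vec4 backend code: the function names are hypothetical, and the all-ones encoding of a true comparison result is an assumption made for illustration.

```cpp
#include <cstdint>

// Old sequence (scalar analogy): CMP, AND to mask off extra bits,
// then an integer-to-float conversion.
inline float b2f_via_mask(float a, float b) {
    uint32_t cmp = (a < b) ? 0xffffffffu : 0u;  // assumed all-ones "true"
    uint32_t bit = cmp & 1u;                    // mask off the extra bits
    return static_cast<float>(bit);             // convert 0/1 to 0.0f/1.0f
}

// New sequence (scalar analogy): do the comparison, then a single
// select picks one of the two float constants.
inline float b2f_via_select(float a, float b) {
    bool flag = (a < b);
    return flag ? 1.0f : 0.0f;
}
```

Both functions return the same value for every input; the second simply needs one fewer step after the comparison.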
* i965/vec4: Combine generate_math[12]_gen6 methods.Kenneth Graunke2014-06-101-7/+4
| | | | | | | | | These are trivial to combine: we should just avoid checking the second operand if it's brw_null_reg. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Drop the generate_math2_gen7() method.Kenneth Graunke2014-06-101-4/+0
| | | | | | | | | It's now a single line of code, so we may as well fold it into the caller. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Make src_reg::equals() take a constant reference, not a pointer.Kenneth Graunke2014-06-101-1/+1
| | | | | | | This is more typical C++ style. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move annotation info into generate code.Matt Turner2014-06-021-4/+2
| | | | | | Suggested by Ken as a way to cut down lines of code. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Give dump_instruction() a FILE* argument.Matt Turner2014-06-011-0/+1
| | | | | | | Use function overloading rather than default arguments, since gdb doesn't know about default arguments. Reviewed-by: Kenneth Graunke <[email protected]>
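The overloading pattern mentioned here might look roughly like the following. The `instruction` struct and the printed text are hypothetical stand-ins, not the real i965 types; only the shape of the two overloads is the point.

```cpp
#include <cstdio>

struct instruction { int opcode; };  // hypothetical stand-in type

// Full version: the caller chooses the output stream.
inline int dump_instruction(const instruction &inst, FILE *file) {
    return std::fprintf(file, "opcode %d\n", inst.opcode);
}

// One-argument overload instead of `FILE *file = stderr`.
// A debugger can call this form directly, since it is a real function
// rather than a default argument filled in by the compiler.
inline int dump_instruction(const instruction &inst) {
    return dump_instruction(inst, stderr);
}
```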
* i965: Print disassembly after compaction.Matt Turner2014-05-241-8/+5
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Port untyped atomic message support to Broadwell.Kenneth Graunke2014-05-011-0/+4
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77221 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Port untyped surface reads support to Broadwell.Kenneth Graunke2014-05-011-0/+3
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77221 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Drop mark_surface_used from gen8 generators.Kenneth Graunke2014-05-011-2/+0
| | | | | | | | Francisco made brw_mark_surface_used a freestanding function in commit a32817f3c248125fb537c3a915566445e5600d45. We should use it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Remove 'mul_arg' from try_emit_mad().Matt Turner2014-04-301-1/+1
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add is_accumulator() function.Juha-Pekka Heikkila2014-04-161-0/+2
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Juha-Pekka Heikkila <[email protected]>
* i965/vec4: Add is_null() method to dst_reg.Matt Turner2014-03-241-0/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Rename depends_on_flags() to reads_flag().Matt Turner2014-03-241-1/+1
| | | | | | To be consistent with the fs backend. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add and use vec4_instruction::writes_flag().Matt Turner2014-03-241-0/+5
| | | | | | | | To be consistent with the fs backend. Also the instruction scheduler incorrectly considered SEL with a conditional modifier to read the flag register. Reviewed-by: Eric Anholt <[email protected]>
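A minimal sketch of the flag queries, under stated assumptions: the enum values and the struct are hypothetical, and the rule that SEL with a conditional modifier neither reads nor writes the flag register is inferred from the commit message rather than copied from the driver.

```cpp
enum opcode_sketch { OP_ADD, OP_CMP, OP_SEL };            // hypothetical subset
enum cond_mod_sketch { COND_NONE = 0, COND_GE, COND_L };  // hypothetical subset

struct vec4_instruction_sketch {
    opcode_sketch op;
    cond_mod_sketch conditional_mod;
    bool predicated;

    // Assumption: only predicated instructions read the flag.
    bool reads_flag() const { return predicated; }

    // Assumption: a conditional modifier writes the flag, except on SEL,
    // where it only selects the comparison used to choose an operand.
    bool writes_flag() const {
        return conditional_mod != COND_NONE && op != OP_SEL;
    }
};
```

With queries like these, a scheduler does not need to add a flag dependency for an unpredicated SEL that carries a conditional modifier.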
* i965/vec4: Add missing doxygen close brace.Matt Turner2014-03-241-0/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Merge resolving of shader program sourceTopi Pohjolainen2014-03-051-1/+1
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Mark invariant members as constants in vec4_visitorTopi Pohjolainen2014-03-051-3/+3
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Allocate vec4_visitor's uniform_size and uniform_vector_size arrays dynamically.Petri Latvala2014-02-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | v2: Don't add function parameters, pass the required size in prog_data->nr_params. v3: - Use the name uniform_array_size instead of uniform_param_count. - Round up when dividing param_count by 4. - Use MAX2() instead of taking the maximum by hand. - Don't crash if prog_data passed to vec4_visitor constructor is NULL v4: Rebase for current master v5 (idr): Trivial whitespace change. Signed-off-by: Petri Latvala <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71254 Cc: "10.1" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
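The v3 notes (round up when dividing by 4, take a maximum, tolerate a NULL prog_data) could be sketched as below. The struct, the function name, and the lower bound of 1 are assumptions for illustration; `std::max` stands in for Mesa's MAX2() macro.

```cpp
#include <algorithm>

struct prog_data_sketch { unsigned nr_params; };  // hypothetical stand-in

// Number of vec4 slots to allocate for nr_params scalar parameters.
inline unsigned uniform_array_size_for(const prog_data_sketch *prog_data) {
    if (!prog_data)
        return 1;                                     // don't crash on NULL
    unsigned vec4_slots = (prog_data->nr_params + 3) / 4;  // round up
    return std::max(vec4_slots, 1u);                  // assumed minimum of 1
}
```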
* i965/vec4: Handle ir_triop_lrp on Gen4-5 as well.Kenneth Graunke2014-02-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | When the vec4 backend encountered an ir_triop_lrp, it always emitted an actual LRP instruction, which only exists on Gen6+. Gen4-5 used lower_instructions() to decompose ir_triop_lrp at the IR level. Since commit 8d37e9915a3b21 ("glsl: Optimize open-coded lrp into lrp."), we've had a bug where lower_instructions translates ir_triop_lrp into arithmetic, but opt_algebraic reassembles it back into a lrp. To avoid this ordering concern, just handle ir_triop_lrp in the backend. The FS backend already does this, so we may as well do likewise. v2: Add a comment reminding us that we could emit better assembly if we implemented the infrastructure necessary to support using MAC. (Assembly code provided by Eric Anholt). Cc: "10.1" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Eric Anholt <[email protected]>
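For reference, the arithmetic that a linear interpolation decomposes into when no LRP instruction is available is the usual GLSL mix() formula. The scalar function below is a hypothetical illustration of that formula, not the instruction sequence the backend emits.

```cpp
// lrp(x, y, a) = x * (1 - a) + y * a, computed per component.
inline float lrp_lowered(float x, float y, float a) {
    float one_minus_a = 1.0f - a;     // ADD with a negated source
    return x * one_minus_a + y * a;   // two MULs and an ADD
}
```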
* i965: support gl_InvocationID for gen7Jordan Justen2014-02-201-0/+1
| | | | | | | | | | | | | v2: * Make gl_InvocationID a system value v3: * Properly shift from R0.1 into DST.4 by adding GS_OPCODE_GET_INSTANCE_ID Signed-off-by: Jordan Justen <[email protected]> Acked-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vec4: Trivial improvements to the with_writemask() function.Francisco Jerez2014-02-191-2/+8
| | | | | | | | | | | | | | Add assertion that the register is not in the HW_REG or IMM file, calculate the conjunction of the old and new mask instead of replacing the old [consistent with the behavior of brw_writemask(), causes no functional changes right now], make it static inline to let the compiler do a slightly better job at optimizing things, and shorten its name. v2: Assert that the new writemask is not zero to avoid undefined hardware behaviour. Reviewed-by: Paul Berry <[email protected]>
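The behaviour described above (intersect the old and new masks, reject fixed and immediate registers, reject an empty result) can be sketched as follows. The types are simplified, hypothetical stand-ins for the real register classes.

```cpp
#include <cassert>

enum reg_file_sketch { GRF, MRF, HW_REG, IMM };  // hypothetical subset

struct dst_reg_sketch {
    reg_file_sketch file;
    unsigned writemask;  // one bit per channel: x=1, y=2, z=4, w=8
};

// Returns a copy of `reg` whose writemask is the conjunction of the
// existing mask and `mask`; the argument itself is left unchanged.
static inline dst_reg_sketch writemask(dst_reg_sketch reg, unsigned mask) {
    assert(reg.file != HW_REG && reg.file != IMM);
    assert((reg.writemask & mask) != 0);  // an empty mask is not allowed
    reg.writemask &= mask;
    return reg;
}
```

Because the masks are intersected rather than replaced, applying the helper twice can only narrow the set of written channels.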
* i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs.Francisco Jerez2014-02-191-0/+14
| | | | | | | | | | | | | | | And define non-mutating helper functions to retype fixed and normal regs with a common interface. At some point we may want to get rid of ::fixed_hw_reg completely and have fixed regs use the normal register data members (e.g. backend_reg::reg to select a fixed GRF number, src_reg::swizzle to store the swizzle, etc.), I have the feeling that this is not the last headache we're going to get because of the multiple ways to represent the same thing and the different register interface depending on the file a register is stored in... Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add non-mutating helper functions to modify src_reg::swizzle and ::negate.Francisco Jerez2014-02-191-0/+24
| | | | | | Reviewed-by: Paul Berry <[email protected]>
* i965: Add non-mutating helper functions to modify the register offset.Francisco Jerez2014-02-191-0/+16
| | | | | | | Yes, we could avoid having four copies of essentially the same code by using templates here. Reviewed-by: Paul Berry <[email protected]>
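The template alternative that the message alludes to might look like this. It is a sketch only: the two register structs are hypothetical, and the commit itself says it kept separate copies instead.

```cpp
struct src_reg_sketch { unsigned reg; unsigned reg_offset; };  // hypothetical
struct dst_reg_sketch { unsigned reg; unsigned reg_offset; };  // hypothetical

// Non-mutating: takes the register by value and returns an adjusted copy,
// so the caller's register is never modified.
template <typename Reg>
inline Reg offset(Reg reg, unsigned delta) {
    reg.reg_offset += delta;
    return reg;
}
```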
* i965: Unify fs_generator:: and vec4_generator::mark_surface_used as a free function.Francisco Jerez2014-02-191-2/+0
| | | | | | | | This way it can be used anywhere. I need it from the visitor. Reviewed-by: Paul Berry <[email protected]>
* i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data.Francisco Jerez2014-02-191-3/+0
| | | | | | | | | | | | | There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare. Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add constructor of src_reg from a fixed hardware reg.Francisco Jerez2014-02-191-0/+1
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Fix confusion between SWIZZLE and BRW_SWIZZLE macros.Francisco Jerez2014-02-121-1/+1
| | | | | | | | | | | | Most of the VEC4 back-end agrees on src_reg::swizzle being one of the BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we use Mesa's SWIZZLE macros. There is even a doxygen comment saying that Mesa's macros are the right ones. They are incompatible swizzle representations (3 bits vs. 2 bits per component), and the code using Mesa's works by pure luck. Fix it. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Paul Berry <[email protected]>
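To make the incompatibility concrete, the two packings can be written side by side. The function names are hypothetical; the bit layouts follow the 3-bit and 2-bit per component description in the message.

```cpp
// Mesa-style packing: 3 bits per component.
constexpr unsigned mesa_swizzle4(unsigned a, unsigned b,
                                 unsigned c, unsigned d) {
    return (a << 0) | (b << 3) | (c << 6) | (d << 9);
}

// BRW-style packing: 2 bits per component.
constexpr unsigned brw_swizzle4(unsigned a, unsigned b,
                                unsigned c, unsigned d) {
    return (a << 0) | (b << 2) | (c << 4) | (d << 6);
}
```

With X=0, Y=1, Z=2, W=3, the identity swizzle XYZW packs to different values under the two schemes, while a swizzle made only of X components packs to zero in both, so some inputs happen to agree even though the encodings do not.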
* i965/vec4: Emit shader w/a for Gen6 gatherChris Forbes2014-02-081-0/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove *_generator::shader field; use prog field instead.Paul Berry2014-01-231-1/+0
| | | | | | | | | | | | | | | The "shader" field in fs_generator, vec4_generator, and gen8_generator was only used for one purpose; to figure out if we were compiling an assembly program or a GLSL shader (shader is NULL for assembly programs). And it wasn't being used properly: in vec4 shaders we were always initializing it based on prog->_LinkedShaders[MESA_SHADER_FRAGMENT], regardless of whether we were compiling a geometry shader or a vertex shader. This patch simplifies things by using the "prog" field instead; this is also NULL for assembly programs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add GS support to INTEL_DEBUG=shader_time.Paul Berry2014-01-211-1/+8
| | | | | | | Previously, time spent in geometry shaders would be counted as part of the vertex shader time. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Create a new vec4 backend for Broadwell.Kenneth Graunke2014-01-181-0/+61
| | | | | | | | | | | | | | | | | | | | This replaces the old vec4_generator backend. v2: Port to use the C-based instruction representation. Also, remove Geometry Shader offset hacks - the visitor will handle those instead of this code. v3: Texturing fixes (including adding textureGather support). v4: Pass brw_context to gen8_instruction functions as required. v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes (caught by Eric). Simplify ADDC/SUBB handling; add comments to gen8_set_dp_message calls (suggested by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Sample from MCS surface when requiredChris Forbes2013-12-071-0/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add invalidate_live_intervals method.Matt Turner2013-11-201-0/+1
| | | | | Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen7: Handle atomic instructions from the VEC4 back-end.Francisco Jerez2013-11-041-0/+9
| | | | | | | | | | | | | This can deal with all the 15 32-bit untyped atomic operations the hardware supports, but only INC and PREDEC are going to be exposed through the API for now. v2: Represent atomics as GLSL intrinsics. Add support for variably indexed atomic counter arrays. v3: Add comment on why we don't need to assign uniform storage for atomic counters. Reviewed-by: Paul Berry <[email protected]>
* i965/gen7: Implement code generation for untyped surface read instructions.Francisco Jerez2013-10-291-0/+4
|
* i965/gen7: Implement code generation for untyped atomic instructions.Francisco Jerez2013-10-291-0/+5
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add the ability to suppress register spilling.Paul Berry2013-10-241-1/+8
| | | | | | | | | In future patches, this will allow us to first try compiling a geometry shader in DUAL_OBJECT mode (which is more efficient but uses more registers) and then if spilling is required, fall back on DUAL_INSTANCED mode. Reviewed-by: Eric Anholt <[email protected]>
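The fallback strategy this enables can be sketched as a two-step attempt. Everything here is hypothetical: `compile_gs`, the `try_compile` callback, and its `no_spills` parameter only illustrate the control flow, not the driver's interfaces.

```cpp
enum dispatch_mode { DUAL_OBJECT, DUAL_INSTANCED };

struct compile_result {
    bool ok;
    dispatch_mode mode;
};

// try_compile(mode, no_spills) is assumed to return false when the shader
// would need to spill registers and no_spills is set.
template <typename CompileFn>
inline compile_result compile_gs(CompileFn try_compile) {
    // First attempt: the more efficient mode, with spilling suppressed.
    if (try_compile(DUAL_OBJECT, true))
        return {true, DUAL_OBJECT};
    // Fallback: the mode that needs fewer registers, spilling allowed.
    return {try_compile(DUAL_INSTANCED, false), DUAL_INSTANCED};
}
```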
* i965/vec4: Add the ability for attributes to be interleaved.Paul Berry2013-10-241-1/+2
| | | | | | | | | | | | When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Extract function to set up vec4 prog key for precompiling.Paul Berry2013-10-241-0/+4
| | | | | | | | This will allow us to re-use it for precompiling geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove uses_clip_distance from program key.Paul Berry2013-10-241-6/+0
| | | | | | | | | | This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state. Change the code to just refer to gl_program::UsesClipDistanceOut directly. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move the common binding table offset code to brw_shader.cpp.Eric Anholt2013-10-151-1/+0
| | | | | | | | | | Now that both vec4 and fs are dynamically assigning offsets, a lot of the code is the same. v2: Avoid passing around the next offset through the class. (Review by Paul) Reviewed-by: Paul Berry <[email protected]>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.Eric Anholt2013-10-151-0/+1
| | | | | | | | | | | It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <[email protected]>
* i965: Always have the struct gl_program * in the backend visitor.Eric Anholt2013-10-151-1/+0
| | | | | | | vec4 already had it, so put it in the FS, too. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Remove the "ARF" register file.Matt Turner2013-10-071-1/+1
| | | | | | | | The registers in the architecture register file don't share much in common, so there's no point in grouping them together. Use the HW_REG class instead. The vec4 backend already does this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generate code for ir_binop_carry and ir_binop_borrow.Matt Turner2013-10-071-0/+2
| | | | | | Using the ADDC and SUBB instructions on Gen7. Reviewed-by: Kenneth Graunke <[email protected]>
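In scalar terms, the carry and borrow operations produce the following results for 32-bit unsigned operands. The function names are hypothetical, and this shows the arithmetic meaning only, not the generated ADDC/SUBB code.

```cpp
#include <cstdint>

// Carry out of x + y: 1 if the 32-bit sum wrapped around, else 0.
inline uint32_t carry(uint32_t x, uint32_t y) {
    uint32_t sum = static_cast<uint32_t>(x + y);
    return sum < x ? 1u : 0u;
}

// Borrow of x - y: 1 if y is larger than x, else 0.
inline uint32_t borrow(uint32_t x, uint32_t y) {
    return x < y ? 1u : 0u;
}
```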
* i965: Add UD null register helpers.Matt Turner2013-10-071-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for ir_tg4Chris Forbes2013-10-031-0/+1
| | | | | | | | Pretty much the same as the FS case. Channel select goes in the header. V2: Less mangling. V3: Avoid sampling at all, for degenerate swizzles. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Initialize all member variables of vec4_instruction on construction.Francisco Jerez2013-10-011-1/+1
| | | | | | | | | | | | | | | The vec4_instruction object relies on the memory allocator zeroing out its contents before it's initialized, which is quite an unusual practice in the C++ world because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- Stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Initialize all fields from the constructor and stop using the zeroing allocator. Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
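The approach described above, where the constructor sets every member instead of relying on zeroed memory, might be sketched like this. The struct and its fields are a hypothetical, much reduced stand-in for vec4_instruction.

```cpp
struct instruction_sketch {
    int opcode;
    bool saturate;
    unsigned conditional_mod;
    unsigned predicate;

    // Every field gets an explicit value, so the object is valid however
    // it is allocated: on the stack, in an array, or inside another object.
    explicit instruction_sketch(int op)
        : opcode(op), saturate(false), conditional_mod(0), predicate(0) {}
};
```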