aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vec4.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/vs: Sample from MCS surface when requiredChris Forbes2013-12-071-0/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add invalidate_live_intervals method.Matt Turner2013-11-201-0/+1
| | | | | Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen7: Handle atomic instructions from the VEC4 back-end.Francisco Jerez2013-11-041-0/+9
| | | | | | | | | | | | | This can deal with all the 15 32-bit untyped atomic operations the hardware supports, but only INC and PREDEC are going to be exposed through the API for now. v2: Represent atomics as GLSL intrinsics. Add support for variably indexed atomic counter arrays. v3: Add comment on why we don't need to assign uniform storage for atomic counters. Reviewed-by: Paul Berry <[email protected]>
* i965/gen7: Implement code generation for untyped surface read instructions.Francisco Jerez2013-10-291-0/+4
|
* i965/gen7: Implement code generation for untyped atomic instructions.Francisco Jerez2013-10-291-0/+5
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add the ability to suppress register spilling.Paul Berry2013-10-241-1/+8
| | | | | | | | | In future patches, this will allow us to first try compiling a geometry shader in DUAL_OBJECT mode (which is more efficient but uses more registers) and then if spilling is required, fall back on DUAL_INSTANCED mode. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add the ability for attributes to be interleaved.Paul Berry2013-10-241-1/+2
| | | | | | | | | | | | When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format. Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Extract function to set up vec4 prog key for precompiling.Paul Berry2013-10-241-0/+4
| | | | | | | | This will allow us to re-use it for precompiling geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove uses_clip_distance from program key.Paul Berry2013-10-241-6/+0
| | | | | | | | | | This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state. Change the code to just refer to gl_program::UsesClipDistanceOut directly. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move the common binding table offset code to brw_shader.cpp.Eric Anholt2013-10-151-1/+0
| | | | | | | | | | Now that both vec4 and fs are dynamically assigning offsets, a lot of the code is the same. v2: Avoid passing around the next offset through the class. (Review by Paul) Reviewed-by: Paul Berry <[email protected]>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.Eric Anholt2013-10-151-0/+1
| | | | | | | | | | | It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <[email protected]>
* i965: Always have the struct gl_program * in the backend visitor.Eric Anholt2013-10-151-1/+0
| | | | | | | vec4 already had it, so put it in the FS, too. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Remove the "ARF" register file.Matt Turner2013-10-071-1/+1
| | | | | | | | The registers in the architecture register file don't share much in common, so there's no point in grouping them together. Use the HW_REG class instead. The vec4 backend already does this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generate code for ir_binop_carry and ir_binop_borrow.Matt Turner2013-10-071-0/+2
| | | | | | Using the ADDC and SUBB instructions on Gen7. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add UD null register helpers.Matt Turner2013-10-071-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for ir_tg4Chris Forbes2013-10-031-0/+1
| | | | | | | | | | Pretty much the same as the FS case. Channel select goes in the header, V2: Less mangling. V3: Avoid sampling at all, for degenerate swizzles. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Initialize all member variables of vec4_instruction on construction.Francisco Jerez2013-10-011-1/+1
| | | | | | | | | | | | | | | The vec4_instruction object relies on the memory allocator zeroing out its contents before it's initialized, which is quite an unusual practice in the C++ world because it ties objects to some specific allocation scheme, and gives unpredictable results when an object is created with a different allocator -- Stack allocation, array allocation, or aggregation inside a different object are some of the useful possibilities that come to my mind. Initialize all fields from the constructor and stop using the zeroing allocator. Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965, mesa: Use the new DECLARE_R[Z]ALLOC_CXX_OPERATORS macros.Kenneth Graunke2013-09-211-33/+3
| | | | | | | | | | | | | | | | These classes declared a placement new operator, but didn't declare a delete operator. Switching to the macro gives them a delete operator, which probably is a good idea anyway. This also eliminates a lot of boilerplate. v2: Properly use RZALLOC in Mesa IR/TGSI translators. Caught by Eric and Chad. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add the ability to emit opcodes with just a dst register.Paul Berry2013-09-111-0/+2
| | | | | | | This is needed for GS_OPCODE_PREPARE_CHANNEL_MASKS. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Add opcodes needed for EndPrimitive().Paul Berry2013-09-111-0/+2
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Make with_writemask() non-static.Paul Berry2013-09-051-0/+3
| | | | | | | This will allow it to be shared between brw_vec4_visitor.cpp and brw_vec4_vs_visitor.cpp (which will be created in the next patch). Acked-by: Kenneth Graunke <[email protected]>
* i965/vs: Move vs-specific code out of brw_vec4.h.Paul Berry2013-09-051-32/+0
| | | | | | | Now brw_vec4.h contains only code that is shared between the vertex and geometry shaders. Acked-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for translating ir_triop_fma into MAD.Matt Turner2013-08-271-0/+1
| | | | | Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vs: Expose the payload registers to the register allocator.Kenneth Graunke2013-08-261-0/+2
| | | | | | | | | | | | | | | For now, nothing else can get allocated over them. That may change at some point in the future. This also means that base_reg_count can be computed without knowing the number of registers used for the payload, which is required if we want to allocate the register set once at context creation time. See commit 551e1cd44f6857f7e29ea4c8f892da5a97844377, which implemented virtually identical code in the FS backend. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Allow C++ type safety in the use of enum brw_urb_write_flags.Paul Berry2013-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (From a suggestion by Francisco Jerez) If an enum represents a bitfield of flags, e.g.: enum E { A = 1, B = 2, C = 4, D = 8, }; then C++ normally prohibits statements like this: enum E x = A | B; because A and B are implicitly converted to ints before OR-ing them, and an int can't be stored in an enum without a type cast. C, on the other hand, allows an int to be implicitly converted to an enum without casting. In the past we've dealt with this situation by storing flag bitfields as ints. This avoids ugly casting at the expense of some type safety that C++ would normally have offered (e.g. we get no warning if we accidentally use the wrong enum type). However, we can get the best of both worlds if we override the | operator. The ugly casting is confined to the operator overload, and we still get the benefit of C++ making sure we don't use the wrong enum type. v2: Remove unnecessary comment and unnecessary use of "enum" keyword. Use static_cast. Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Remove redundant (and uninitialized) field vec4_generator::ctx.Paul Berry2013-08-261-1/+0
| | | | | | | | | | | | | We never noticed that this field was uninitialized because it is only used in an error path that reports internal Mesa errors. But it's silly to have it around anyway because &brw->ctx is equivalent. Should fix Coverity defect CID 1063351: Uninitialized pointer field (UNINIT_CTOR) /src/mesa/drivers/dri/i965/brw_vec4_emit.cpp: 148 Reviewed-by: Ian Romanick <[email protected]>
* i965/gs: Add GS_OPCODE_SET_DWORD_2_IMMED.Paul Berry2013-08-231-0/+1
| | | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_SET_VERTEX_COUNT.Paul Berry2013-08-231-0/+2
| | | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_SET_WRITE_OFFSET.Paul Berry2013-08-231-0/+3
| | | | | | | | v2: Added a comment to vec4_generator::generate_gs_set_write_offset(). Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_THREAD_END.Paul Berry2013-08-231-0/+1
| | | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Add GS_OPCODE_URB_WRITE.Paul Berry2013-08-231-1/+2
| | | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Combine 4 boolean args of brw_urb_WRITE into a flags bitfield.Paul Berry2013-08-231-1/+1
| | | | | | | | | | | | The arguments to brw_urb_WRITE() were getting pretty unwieldy, and we have to add more flags to support geometry shaders anyhow. Also plumb these flags through brw_clip_emit_vue(), brw_set_urb_message(), and the vec4_instruction class. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Virtualize setup_payload instead of setup_attributes.Paul Berry2013-08-231-3/+3
| | | | | | | | | | | | | | | | | | | When I initially generalized the vec4_visitor class in preparation for geometry shaders, I assumed that the setup_attributes() function would need to be different between vertex and geometry shaders, but its caller, setup_payload(), could be shared. So I made setup_attributes() a virtual function. It turns out this isn't true; setup_payload() needs to be different too, since the geometry shader payload sometimes includes an extra register (primitive ID) that has to come before uniforms. So setup_payload() needs to be the virtual function instead of setup_attributes(). Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Allow for dispatch_grf_start_reg to vary.Paul Berry2013-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data). For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behvaiour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1. This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data. This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1. v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint". Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Move vec4 data structures and functions to brw_vec4.{cpp,h}.Paul Berry2013-08-231-2/+44
| | | | | | | | | | | | | | | | This patch moves the following things into brw_vec4.{cpp,h}: - struct brw_vec4_compile - struct brw_vec4_prog_key - brw_vec4_prog_data_compare() - brw_vec4_prog_data_free() This will allow us to avoid having to include brw_vs.h in geometry-shader-specific files. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make brw_{shader,vec4}.h safe to include from C.Paul Berry2013-08-231-1/+9
| | | | | | | | | | | | | | The patch that follows will move the definition of struct brw_vec4_prog_key from brw_vs.h to brw_vec4.h, making it necessary for brw_vs.h to include brw_vec4.h (because brw_vs.h defines struct brw_vs_prog_key, which contains brw_vec4_prog_key as a member). Since brw_vs.h is included from C source files, that means that brw_vec4.h will need to be safe to include from C. Same for brw_shader.h, since it is included by brw_vec4.h. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Stop including brw_vs.h from brw_vec4.h.Paul Berry2013-08-231-1/+4
| | | | | | | | | | | | | | | | This is backwards from what we are going to want in the long term, which is: - brw_vec4.h declares general-purpose vec4 infrastructure needed by both VS and GS - brw_vs.h includes brw_vec4.h and adds VS-specific parts. - brw_gs.h includes brw_vec4.h and adds GS-specific parts. Note that at the moment brw_vec.h contains a fair amount of VS-specific declarations--I plan to address that in a later patch. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Rework binding table size calculation.Kenneth Graunke2013-08-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | Unlike the FS, the VS backend already computed the binding table size. However, it did so poorly: after compilation, it looked to see if any pull constants/textures/UBOs were in use, and set num_surfaces to the maximum surface index for that category. If the VS only used a single texture or UBO, this overcounted by quite a bit. The shader time surface was also noted at state upload time (during drawing), not at compile time, which is inefficient. I believe it also had an off by one error. This patch computes it accurately, while also simplifying the code. It also renames num_surfaces to binding_table_size, since num_surfaces wasn't actually the number of surfaces used. For example, a VS that used one UBO and no other surfaces would have set num_surfaces to SURF_INDEX_VS_UBO(1) == 18, rather than 1. A bit of a misnomer there. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/vs: Plumb brw_vec4_prog_data into vec4_generator().Kenneth Graunke2013-08-191-0/+3
| | | | | | | This will be useful for the next commit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: add new VS_OPCODE_UNPACK_FLAGS_SIMD4X2Chris Forbes2013-08-161-1/+3
| | | | | | | | | | | | | | Splits the bottom 8 bits of f0.0 for further wrangling in a SIMD4x2 program. The 4 bits corresponding to the channels in each program flow are copied to the LSBs of dst.x visible to each flow. This is useful for working with clipping flags in the VS. V3: - Fixup immediate types - Teach scheduler about the hidden dep on flags Signed-off-by: Chris Forbes <[email protected]> V2: Reviewed-by: Paul Berry <[email protected]>
* i965/vs: add vec4_instruction::depends_on_flagsChris Forbes2013-08-161-0/+5
| | | | | | | We're about to have an instruction that depends on the flags but isn't predicated. This lays the groundwork. Signed-off-by: Chris Forbes <[email protected]>
* i965/vs: Do legacy clip lowering earlierChris Forbes2013-08-161-1/+1
| | | | | | | | | We need to produce clip flags for the vertex header on Gen4/5, so clip plane lowering has to be done before we try to emit the flags/psiz attribute. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* glsl: add ir_emit_vertex and ir_end_primitive instruction typesBryan Cain2013-08-011-0/+2
| | | | | | | | | | | | | | These correspond to the EmitVertex and EndPrimitive functions in GLSL. v2 (Paul Berry <[email protected]>): Add stub implementations of new pure visitor functions to i965's vec4_visitor and fs_visitor classes. v3 (Paul Berry <[email protected]>): Rename classes to be more consistent with the names used in the GL spec. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Delete intel_context entirely.Kenneth Graunke2013-07-091-1/+0
| | | | | | | | | | This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Remove pointless intel_context parameter from try_copy_propagate.Kenneth Graunke2013-07-091-2/+1
| | | | | | | | | It's already part of the visitor class. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965/vs: Use the MAD instruction when possible.Eric Anholt2013-06-101-0/+1
| | | | | | | | | | | | | | This is different from how we do it in the FS - we are using MAD even when some of the args are constants, because with the relatively unrestrained ability to schedule a MOV to prepare a temporary with that data, we can get lower latency for the sequence of instructions. No significant performance difference on GLB2.7 trex (n=33/34), though it doesn't have that many MADs. I noticed MAD opportunities while reading the code for the DOTA2 bug. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Make virtual grf live intervals actually cover their used range.Eric Anholt2013-05-091-2/+2
| | | | | | | | This is the same change as the previous commit to the FS. A very few VSes are regressed by 1 or 2 instructions, which look recoverable with a bit more dead code elimination. Reviewed-by: Ian Romanick <[email protected]>
* i965/vs: Add support for bit instructions.Matt Turner2013-05-061-0/+7
| | | | | | | | | | | v2: Rebase on LRP addition. Use fix_3src_operand() when emitting BFE and BFI2. Add BFE and BFI2 to is_3src_inst check in brw_vec4_copy_propagation.cpp. Subtract result of FBH from 31 (unless an error) to convert MSB counts to LSB counts Reviewed-by: Chris Forbes <[email protected]>
* i965/vs: Add instruction scheduling.Eric Anholt2013-05-021-0/+1
| | | | | | | | | | | While this is ignorant of dependency control, it's still good for a 0.39% +/- 0.08% performance improvement on GLBenchmark 2.7 (n=548) v2: Rewrite as a subclass of the base class for the FS instruction scheduler, inheriting the same latency information. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Share the register file enum between the two backends.Eric Anholt2013-05-021-11/+0
| | | | | | | | | | I need this so I can look at vec4 and fs registers' files from the same .cpp file without namespaces. As far as I can tell we never rely on the particular numerical values of the files, though I thought it sounded like a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>