aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vec4.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/vec4: Inline get_pull_constant_offsetJason Ekstrand2016-04-131-2/+0
| | | | | | | It's not really doing enough anymore to justify a helper function. Reviewed-by: Eduardo Lima Mitev <[email protected]> Reveiewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4/nir: remove emit_untyped_surface_read and emit_untyped_atomic at ↵Alejandro Piñeiro2016-03-081-7/+0
| | | | | | | | | | | | | brw_vec4_visitor surface_access emit_untyped_read and emit_untyped_atomic provides the same functionality. v2: surface parameter of emit_untyped_atomic is a const, no need to specify default predicate on emit_untyped_atomic, use retype (Francisco Jerez). Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: add opportunistic behaviour to opt_vector_float()Juan A. Suarez Romero2016-03-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | opt_vector_float() transforms several scalar MOV operations to a single vectorial MOV. This is done when those MOV covers all the components of the destination register. So something like: mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf3.0.z:D, 0D is transformed in: mov vgrf3.0:F, [0F, 0F, 0F, 1F] But there are cases where not all the components are written. For example, in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf4.0.xy:D, 1065353216D mov vgrf4.0.w:D, 0D mov vgrf6.0:UD, u4.xyzw:UD Nor vgrf3 nor vgrf4 .z components are written, so the optimization is not applied. But it could be applied anyway with the components covered, using a writemask to select the ones written. So we could transform it in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F] mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F] mov vgrf6.0:UD, u4.xyzw:UD This commit does precisely that: opportunistically apply opt_vector_float() when possible. total instructions in shared programs: 7124660 -> 7114784 (-0.14%) instructions in affected programs: 443078 -> 433202 (-2.23%) helped: 4998 HURT: 0 total cycles in shared programs: 64757760 -> 64728016 (-0.05%) cycles in affected programs: 1401686 -> 1371942 (-2.12%) helped: 3243 HURT: 38 v2: change vectorize_mov() signature (Matt). v3: take in account predicates (Juan). v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]>
* i965: Lower min/max after optimization on Gen4/5.Matt Turner2016-02-171-0/+2
| | | | | | | | | | | | | | | | | | | Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more. On Ironlake: total instructions in shared programs: 6426035 -> 6422753 (-0.05%) instructions in affected programs: 326604 -> 323322 (-1.00%) helped: 1411 total cycles in shared programs: 129184700 -> 129101586 (-0.06%) cycles in affected programs: 18950290 -> 18867176 (-0.44%) helped: 2419 HURT: 328 Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Plumb separate surfaces and samplers through from NIRJason Ekstrand2016-02-091-1/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: move to compiler/Emil Velikov2016-01-261-1/+1
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* nir: move to compiler/Emil Velikov2016-01-261-1/+1
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* i965: Add tessellation control shaders.Kenneth Graunke2015-12-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Optimize predicate handling for any/all.Matt Turner2015-12-181-0/+2
| | | | | | | | | | | | | | | | | | For a select whose condition is any(v), instead of emitting cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D mov(8) g7<1>.xUD 0x00000000UD (+f0.any4h) mov(8) g7<1>.xUD 0xffffffffUD cmp.nz.f0(8) null<1>D g7<4,4,1>.xD 0D (+f0) sel(8) g8<1>UD g4<4,4,1>UD g3<4,4,1>UD we now emit cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D (+f0.any4h) sel(8) g9<1>UD g4<4,4,1>UD g3<4,4,1>UD Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Get rid of the nir_inputs arrayJason Ekstrand2015-12-031-2/+0
| | | | | | | | It's not really buying us anything at this point. It's just a way of remapping one offset namespace onto another. We can just use the location namespace the whole way through. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-13/+1
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Push down inclusion of brw_program.h.Matt Turner2015-11-241-1/+0
| | | | | | | We were including it in headers, which then caused it to be included in tons of places it wasn't needed. Reviewed-by: Ian Romanick <[email protected]>
* i965: Use NIR for lowering texture swizzleJason Ekstrand2015-11-231-4/+0
| | | | | | | | | | Now that nir_lower_tex can do texture swizzle lowering, we can use that instead of repeating more-or-less the same code in both backends. This both allows us to share code and means that things like the tg4 work-arounds are somewhat simpler because they don't have to take the swizzle into account. Reviewed-by: Connor Abbott <[email protected]>
* i965/vec4: Move vec4_generator class definition into the .cpp file.Kenneth Graunke2015-10-291-111/+0
| | | | | | | | | The public API for the generator is brw_vec4_generate_code(); nobody actually needs to use the class. This means we can extend it without triggering the recompiles associated with altering brw_vec4.h. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Wrap vec4_generator in a C function.Kenneth Graunke2015-10-291-0/+9
| | | | | | | | | vec4_generator is a class for convenience, but only exports a single method as its public API. It makes much more sense to just export a single function. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.Kenneth Graunke2015-10-291-0/+1
| | | | | | | | | | | | | | | | | This patch makes the visitor convert registers to the HW_REG file at the very end, after register allocation, post-RA scheduling, and dependency control flagging. After that, everything is in fixed brw_regs. This simplifies the code generator, as it can just use the hardware registers rather than having to interpret our abstract files. In particular, interpreting the UNIFORM file meant reading prog_data to figure out where push constants are supposed to start. Having the part of the code that performs register allocation also translate everything to hardware registers seems sensible. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: adding vec4_cmod_propagation optimizationAlejandro Piñeiro2015-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | | vec4 port of fs_cmod_propagation. Shader-db results (no vec4 grepping): total instructions in shared programs: 6240413 -> 6235841 (-0.07%) instructions in affected programs: 401933 -> 397361 (-1.14%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 2265 HURT: 0 v2: remove extra space and combine two if blocks, as suggested by Matt Turner v3: add condition check to bail out if current inst and inst being scanned has different writemask, as pointed by Matt Turner v3: updated shader-db numbers v4: remove block from foreach_inst_in_block_*_starting_from after commit 801f151917fedb13c5c6e96281a18d833dd6901f Reviewed-by: Matt Turner <[email protected]>
* i965: Use a const nir_shader in backend_shaderJason Ekstrand2015-10-191-1/+1
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Remove gl_program and gl_shader_program from the generatorJason Ekstrand2015-10-191-7/+3
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make vec4_visitor's destructor virtualIago Toral Quiroga2015-10-051-1/+1
| | | | | | | | | | | | | | | | | | We need a virtual destructor when at least one of the class' methods is virtual. Failure to do so might lead to undefined behavior when destructing derived classes. Fixes the following warning: brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)': brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor] delete gs; Curro: This shouldn't be causing any actual bugs at the moment because gen6_gs_visitor is the only subclass of vec4_visitor destroyed through a pointer of a base class (vec4_gs_visitor *) and its destructor is basically the same as its parent's. Anyway it seems sensible to change this so it doesn't bite us in the future. Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Remove more dead visitor/vertex program code.Matt Turner2015-10-041-8/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/shader: Get rid of the shader, prog, and shader_prog fieldsJason Ekstrand2015-10-021-3/+1
| | | | | | | | | | Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transfom feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs,vec4: Get rid of the sanity_param_countJason Ekstrand2015-10-021-2/+0
| | | | | | | | It doesn't exist for anything other than an assert that, as far as I can tell, isn't possible to trip. Soon, we will remove prog from the visitor entirely and this will become even more impossible to hit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/backend_shader: Add a field to store the NIR shaderJason Ekstrand2015-10-021-3/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move binding table setup to codegen time.Jason Ekstrand2015-10-021-1/+0
| | | | | | | | | Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Pull GLSL uniform handling into a common functionJason Ekstrand2015-10-021-2/+0
| | | | | | | | The way we deal with GLSL uniforms and builtins is basically the same in both the vec4 and the fs backend. This commit takes the best parts of both implementations and pulls the common code into a shared helper function. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/shader: Get rid of the setup_vec4_uniform_value helperJason Ekstrand2015-10-021-3/+0
| | | | | | It's not used by anything anymore Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Get rid of the uniform_vector_size arrayJason Ekstrand2015-10-021-2/+1
| | | | | | | The uniform_vector_size array was only ever used by pack_uniform_registers which no longer needs it. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Delete the old vec4_vp codeJason Ekstrand2015-10-021-1/+0
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Delete the old ir_visitor codeJason Ekstrand2015-10-021-72/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/nir/vec4: Implement nir_intrinsic_ssbo_atomic_*Iago Toral Quiroga2015-09-251-0/+1
| | | | Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+6
| | | | | | | | Notice that Skylake needs to include a header in the sampler message so it will need some tweaks to work there. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4_nir: Use partial SSA form rather than full non-SSAJason Ekstrand2015-09-151-0/+1
| | | | | | | | | We made this switch in the FS backend some time ago and it seems to make a number of things a bit easier. In particular, supporting SSA values takes very little work in the backend and allows us to take advantage of the majority of the SSA information even after we've gotten rid of Phi nodes. Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Remove the brw_vue_prog_key base class.Kenneth Graunke2015-09-031-7/+1
| | | | | | | | | The legacy userclip fields are only used for the vertex shader, and at that point there's only program_string_id and the tex struct, which are common to all keys. So there's no need for a "VUE" key base class. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Virtualize vec4_visitor::emit_urb_slot().Kenneth Graunke2015-09-031-1/+1
| | | | | | | | | | | | This avoids a downcast of key, which won't exist in the base class soon. I'm not a huge fan of this patch, but given that we're currently using inheritance, this seems like the "right" way to do it. The alternative is to make key a void pointer in the parent class and continue downcasting. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Store a key_tex pointer in vec4_visitor.Kenneth Graunke2015-09-031-0/+1
| | | | | | | | | I'm about to remove the base class for VS/GS/HS/DS program keys, at which point we won't be able to use key->tex anymore. Instead, we'll need to store a direct pointer (like we do in the FS backend). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Move legacy clip plane handling to vec4_vs_visitor.Kenneth Graunke2015-09-031-3/+1
| | | | | | | | | | | | | This is now only used for the vertex shader, so it makes sense to get it out of any paths run by the geometry shader. Instead of passing the gl_clip_plane array into the run() method (which is shared among all subclasses), we add it as a vec4_vs_visitor constructor parameter. This eliminates the bogus NULL parameter in the GS case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4_nir: Get rid of the uniform_driver_location trackingJason Ekstrand2015-08-251-1/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move type_size() methods out of visitor classes.Kenneth Graunke2015-08-251-1/+0
| | | | | | | | I want to use C function pointers to these, and they don't use anything in the visitor classes anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make setup_vec4_uniform_value and _image_uniform_values take an offsetJason Ekstrand2015-08-251-1/+2
| | | | | | | | This way they don't implicitly increment the uniforms variable and don't have to be called in-sequence during uniform setup. Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename setup_vector_uniform_values to setup_vec4_uniform_valueJason Ekstrand2015-08-251-2/+2
| | | | | | | The new name more accurately represents what it does: Set up a single vec4 uniform value. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4_visitor: Make some function arguments const referencesJason Ekstrand2015-08-101-3/+3
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4_nir: Do boolean source modifier resolves on BDW+Jason Ekstrand2015-08-101-0/+1
| | | | | | | | | | On BDW+, the negation source modifier on NOT, AND, OR, and XOR, is actually a boolean negate and not an integer negate. However, NIR's soruce modifiers are the integer version. We have to resolve it with a MOV prior to emitting the actual instruction. This is basically the same thing we do in the FS backend. Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Refactor ir_emit_vertex and ir_end_primitiveIago Toral Quiroga2015-08-031-0/+2
| | | | | | | | So the implementation is independent of GLSL IR and the visit methods of the vec4 visitor. This way we will be able to reuse that implementation directly from the NIR vec4 backend. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir/vec4: Refactor visit(ir_texture *ir)Alejandro Piñeiro2015-08-031-0/+14
| | | | | | | | Splitted in two. The emission is moved to a new vec4_visitor method, vec4_visitor::emit_texture, ir order to be reused on the nir path. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Change vec4_visitor::swizzle_result() method to allow reuseAlejandro Piñeiro2015-08-031-1/+3
| | | | | | | | This patch changes the signature of swizzle_result() to accept lower level arguments. The purpose is to reuse it in the upcoming NIR->vec4 pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Change vec4_visitor::gather_channel() method to allow reuseEduardo Lima Mitev2015-08-031-1/+1
| | | | | | | | This patch changes the signature of gather_channel() to accept the gather component directly instead of fetching it internally from ir_texture. This will allow reuse in the upcoming NIR->vec4 pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Change vec4_visitor::emit_mcs_fetch() method to allow reuseEduardo Lima Mitev2015-08-031-1/+2
| | | | | | | This patch changes the signature of emit_mcs_fetch() to accept lower level arguments. The purpose is to reuse it in the upcoming NIR->vec4 pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Move is_high_sample() method to vec4_visitor classEduardo Lima Mitev2015-08-031-0/+1
| | | | | | | | The is_high_sample() method is currently accessible only in the implementation of vec4_visitor. Since we need to reuse it in the upcoming NIR->vec4 pass, lets make it a method of the class instead. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Return the emitted instruction in emit_lrp()Antia Puentes2015-08-031-2/+2
| | | | | | | Needed in the NIR backend to set the "saturate" value of the instruction. Reviewed-by: Jason Ekstrand <[email protected]>