path: root/src/mesa/drivers/dri/i965/brw_vec4.h
Commit message  Author  Age  Files  Lines
* i965/vec4: add support for packing vs/gs/tes outputsTimothy Arceri2016-07-211-0/+3
    Here we create a new output_generic_reg array with the ability to store the dst_reg for each component of user-defined varyings. This is needed because the previous code only stored the dst_reg based on the varying location, which meant packed varyings would overwrite each other.

    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Alejandro Piñeiro <[email protected]>
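To illustrate why per-component storage avoids the overwrite, here is a minimal sketch. It is not the Mesa code: `DstReg`, `OutputTable`, and the array bound `kMaxGenericVaryings` are hypothetical stand-ins chosen for the example, and only the idea of indexing by (location, component) comes from the commit message.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the backend's dst_reg; only a register number is kept.
struct DstReg {
    int nr = -1;  // -1 means "no register assigned yet"
};

constexpr int kMaxGenericVaryings = 32;  // assumed bound, for illustration only
constexpr int kComponents = 4;           // x, y, z, w

// One slot per (location, component) instead of one slot per location, so two
// varyings packed into the same location no longer overwrite each other.
struct OutputTable {
    std::array<std::array<DstReg, kComponents>, kMaxGenericVaryings> generic{};

    void set(int location, int component, DstReg reg) {
        generic[location][component] = reg;
    }

    const DstReg &get(int location, int component) const {
        return generic[location][component];
    }
};
```

With a table like this, a varying packed into components x and y of a location and another packed into z and w of the same location each keep their own register.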
* i965: Stop munging cube array lengths by 6Jason Ekstrand2016-07-201-1/+0
    From the Sky Lake PRM:

        "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port Surfaces, the range of this field is [0,340], indicating the number of cube array elements (equal to the number of underlying 2D array elements divided by 6). For other surfaces, this field must be zero."

    In other words, the depth field for cube maps is in number of cubes, not number of 2-D slices, so we need to divide by 6. ISL will do this correctly for us, assuming that we provide it with the correct array bounds, which it expects to be in 2-D slices. It appears as if we've been doing this wrong ever since we first added cube map arrays for Sandy Bridge, and the change to ISL made things slightly worse. While we're at it, we now need to remove the shader hacks we've always done, since they were only needed because we were setting the depth field six times too large.

    v2: Fix the vec4 backend as well (not sure how I missed this).

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
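The unit conversion described in the PRM quote can be sketched as follows. The helper name `cube_array_elements` is hypothetical and is not an ISL or i965 function; it only shows the arithmetic of converting a count of 2-D slices into a count of cubes.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFacesPerCube = 6;

// Hypothetical helper: converts an array length expressed in 2-D slices into
// the number of cube array elements that the surface depth field expects.
inline uint32_t cube_array_elements(uint32_t slices_2d) {
    // A cube array always contains a whole number of cubes.
    assert(slices_2d % kFacesPerCube == 0);
    return slices_2d / kFacesPerCube;
}
```

For example, a cube map array backed by 24 two-dimensional slices holds 4 cubes, so the field would be programmed from 4 rather than 24.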
* i965: enable the emission of the DIM instructionSamuel Iglesias Gonsálvez2016-07-141-0/+2
    v2 (Matt):
    - Take a DF source argument for the DIM instruction emission in the visitors.
    - Indentation.

    Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Pass nir_src/nir_dest by reference.Matt Turner2016-05-201-6/+6
    Cuts 6K of .text.

       text     data    bss      dec     hex  filename
    5772372   264648  29320  6066340  5c90a4  lib/i965_dri.so before
    5766074   264648  29320  6060042  5c780a  lib/i965_dri.so after

    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Silence unused parameter warningsIan Romanick2016-05-181-2/+1
    The only place that actually used the type parameter was the GS visitor, and it was always passed glsl_type::int. Just remove the parameter.

    brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
        const glsl_type *type)
                         ^

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* i965: Fold vectorize_mov() back into the one caller.Kenneth Graunke2016-04-201-4/+0
    After the previous patch, this helper is only called in one place. So, just fold it back in - there are a lot of parameters here and not much code.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Get rid of the uniform_size arrayJason Ekstrand2016-04-141-2/+0
    Reviewed-by: Kristian Høgsberg <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constantsJason Ekstrand2016-04-141-1/+2
    This commit moves us to an instruction-based model rather than a register-based model for indirects. This is more accurate anyway, as we have to emit instructions to resolve the reladdr. It's also a lot simpler because it gets rid of the recursive reladdr problem by design.

    One side-effect of this is that we need a whole new algorithm in move_uniform_array_access_to_pull_constants. This new algorithm is much more straightforward than the old one and is fairly similar to what we're already doing in the FS backend.

    Reviewed-by: Kristian Høgsberg <[email protected]>
    Acked-by: Kenneth Graunke <[email protected]>
* i965/vec4: Inline get_pull_constant_offsetJason Ekstrand2016-04-131-2/+0
    It's not really doing enough anymore to justify a helper function.

    Reviewed-by: Eduardo Lima Mitev <[email protected]>
    Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4/nir: remove emit_untyped_surface_read and emit_untyped_atomic at brw_vec4_visitorAlejandro Piñeiro2016-03-081-7/+0
    The surface_access emit_untyped_read and emit_untyped_atomic functions provide the same functionality.

    v2: surface parameter of emit_untyped_atomic is a const, no need to specify default predicate on emit_untyped_atomic, use retype (Francisco Jerez).

    Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: add opportunistic behaviour to opt_vector_float()Juan A. Suarez Romero2016-03-041-0/+4
    opt_vector_float() transforms several scalar MOV operations into a single vector MOV. This is done when those MOVs cover all the components of the destination register. So something like:

        mov vgrf3.0.xy:D, 0D
        mov vgrf3.0.w:D, 1065353216D
        mov vgrf3.0.z:D, 0D

    is transformed into:

        mov vgrf3.0:F, [0F, 0F, 0F, 1F]

    But there are cases where not all the components are written. For example, in:

        mov vgrf2.0.x:D, 1073741824D
        mov vgrf3.0.xy:D, 0D
        mov vgrf3.0.w:D, 1065353216D
        mov vgrf4.0.xy:D, 1065353216D
        mov vgrf4.0.w:D, 0D
        mov vgrf6.0:UD, u4.xyzw:UD

    Neither the vgrf3 nor the vgrf4 .z component is written, so the optimization is not applied. But it could be applied anyway to the components that are covered, using a writemask to select the ones written. So we could transform it into:

        mov vgrf2.0.x:D, 1073741824D
        mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
        mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
        mov vgrf6.0:UD, u4.xyzw:UD

    This commit does precisely that: opportunistically apply opt_vector_float() when possible.

    total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
    instructions in affected programs: 443078 -> 433202 (-2.23%)
    helped: 4998
    HURT: 0

    total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
    cycles in affected programs: 1401686 -> 1371942 (-2.12%)
    helped: 3243
    HURT: 38

    v2: change vectorize_mov() signature (Matt).
    v3: take into account predicates (Juan).
    v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues.

    Reviewed-by: Matt Turner <[email protected]>
    Signed-off-by: Juan A. Suarez Romero <[email protected]>
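The merging idea can be sketched in a few lines of C++. This is a simplified model, not the actual opt_vector_float() pass: the types `ScalarMov` and `VectorMov` and the function `merge_movs` are hypothetical, immediates are modelled as plain floats rather than the hardware's restricted vector-float encoding, and predicates, register types, and intervening instructions are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Writemask bits for the four vec4 components.
enum : uint8_t { X = 1, Y = 2, Z = 4, W = 8 };

// Hypothetical model of "mov vgrfN.<mask>, <immediate>".
struct ScalarMov {
    int dst;            // destination virtual register number
    uint8_t writemask;  // components written by this MOV
    float imm;          // immediate value, already interpreted as a float
};

// Hypothetical model of the merged "mov vgrfN.<mask>:F, [a, b, c, d]".
struct VectorMov {
    int dst;
    uint8_t writemask;         // union of the merged writemasks
    std::array<float, 4> imm;  // per-component immediates
};

// Merge runs of consecutive immediate MOVs that target the same register.
// Components that are never written stay out of the writemask, so the merged
// MOV does not need to cover all four components.
std::vector<VectorMov> merge_movs(const std::vector<ScalarMov> &in) {
    std::vector<VectorMov> out;
    for (const ScalarMov &m : in) {
        if (out.empty() || out.back().dst != m.dst)
            out.push_back(VectorMov{m.dst, 0, {{0.0f, 0.0f, 0.0f, 0.0f}}});

        VectorMov &v = out.back();
        v.writemask |= m.writemask;
        for (int c = 0; c < 4; ++c) {
            if (m.writemask & (1u << c))
                v.imm[c] = m.imm;
        }
    }
    return out;
}
```

Feeding it the vgrf3 pair from the message (`.xy` set to 0 and `.w` set to 1.0) yields a single entry with writemask `xyw` and values `[0, 0, 0, 1]`, matching the transformed listing above.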
* i965: Lower min/max after optimization on Gen4/5.Matt Turner2016-02-171-0/+2
    Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more.

    On Ironlake:

    total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
    instructions in affected programs: 326604 -> 323322 (-1.00%)
    helped: 1411

    total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
    cycles in affected programs: 18950290 -> 18867176 (-0.44%)
    helped: 2419
    HURT: 328

    Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Plumb separate surfaces and samplers through from NIRJason Ekstrand2016-02-091-1/+2
    Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: move to compiler/Emil Velikov2016-01-261-1/+1
    Signed-off-by: Emil Velikov <[email protected]>
    Acked-by: Matt Turner <[email protected]>
    Acked-by: Jose Fonseca <[email protected]>
* nir: move to compiler/Emil Velikov2016-01-261-1/+1
    Signed-off-by: Emil Velikov <[email protected]>
    Acked-by: Matt Turner <[email protected]>
    Acked-by: Jose Fonseca <[email protected]>
* i965: Add tessellation control shaders.Kenneth Graunke2015-12-221-0/+1
    The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables.

    One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES.

    We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread.

    v2: Update comments (requested by Jordan Justen).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Optimize predicate handling for any/all.Matt Turner2015-12-181-0/+2
    For a select whose condition is any(v), instead of emitting

        cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D
        mov(8) g7<1>.xUD 0x00000000UD
        (+f0.any4h) mov(8) g7<1>.xUD 0xffffffffUD
        cmp.nz.f0(8) null<1>D g7<4,4,1>.xD 0D
        (+f0) sel(8) g8<1>UD g4<4,4,1>UD g3<4,4,1>UD

    we now emit

        cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D
        (+f0.any4h) sel(8) g9<1>UD g4<4,4,1>UD g3<4,4,1>UD

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Get rid of the nir_inputs arrayJason Ekstrand2015-12-031-2/+0
    It's not really buying us anything at this point. It's just a way of remapping one offset namespace onto another. We can just use the location namespace the whole way through.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-13/+1
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Push down inclusion of brw_program.h.Matt Turner2015-11-241-1/+0
    We were including it in headers, which then caused it to be included in tons of places it wasn't needed.

    Reviewed-by: Ian Romanick <[email protected]>
* i965: Use NIR for lowering texture swizzleJason Ekstrand2015-11-231-4/+0
    Now that nir_lower_tex can do texture swizzle lowering, we can use that instead of repeating more-or-less the same code in both backends. This both allows us to share code and means that things like the tg4 work-arounds are somewhat simpler because they don't have to take the swizzle into account.

    Reviewed-by: Connor Abbott <[email protected]>
* i965/vec4: Move vec4_generator class definition into the .cpp file.Kenneth Graunke2015-10-291-111/+0
    The public API for the generator is brw_vec4_generate_code(); nobody actually needs to use the class. This means we can extend it without triggering the recompiles associated with altering brw_vec4.h.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Wrap vec4_generator in a C function.Kenneth Graunke2015-10-291-0/+9
    vec4_generator is a class for convenience, but only exports a single method as its public API. It makes much more sense to just export a single function.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
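The pattern of hiding a class behind one exported function can be sketched as below. This is a generic illustration, not the Mesa generator: `Generator`, `generate_code`, and the toy "add a base offset" behaviour are all hypothetical. Only the free function would need a declaration in a shared header, so the class can change without forcing dependents to recompile.

```cpp
#include <cassert>
#include <vector>

namespace {

// Hypothetical implementation class, visible only inside this translation unit.
class Generator {
public:
    explicit Generator(int base) : base_(base) {}

    // Toy "code generation": offsets every input value by the base.
    std::vector<int> generate(const std::vector<int> &insts) const {
        std::vector<int> out;
        out.reserve(insts.size());
        for (int inst : insts)
            out.push_back(base_ + inst);
        return out;
    }

private:
    int base_;
};

} // namespace

// The single public entry point; callers never see the Generator class.
std::vector<int> generate_code(int base, const std::vector<int> &insts) {
    Generator gen(base);
    return gen.generate(insts);
}
```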
* i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.Kenneth Graunke2015-10-291-0/+1
    This patch makes the visitor convert registers to the HW_REG file at the very end, after register allocation, post-RA scheduling, and dependency control flagging. After that, everything is in fixed brw_regs.

    This simplifies the code generator, as it can just use the hardware registers rather than having to interpret our abstract files. In particular, interpreting the UNIFORM file meant reading prog_data to figure out where push constants are supposed to start.

    Having the part of the code that performs register allocation also translate everything to hardware registers seems sensible.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: adding vec4_cmod_propagation optimizationAlejandro Piñeiro2015-10-221-0/+1
    vec4 port of fs_cmod_propagation.

    Shader-db results (no vec4 grepping):

    total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
    instructions in affected programs: 401933 -> 397361 (-1.14%)
    total loops in shared programs: 1979 -> 1979 (0.00%)
    helped: 2265
    HURT: 0

    v2: remove extra space and combine two if blocks, as suggested by Matt Turner
    v3: add condition check to bail out if the current inst and the inst being scanned have different writemasks, as pointed out by Matt Turner
    v3: updated shader-db numbers
    v4: remove block from foreach_inst_in_block_*_starting_from after commit 801f151917fedb13c5c6e96281a18d833dd6901f

    Reviewed-by: Matt Turner <[email protected]>
* i965: Use a const nir_shader in backend_shaderJason Ekstrand2015-10-191-1/+1
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Remove gl_program and gl_shader_program from the generatorJason Ekstrand2015-10-191-7/+3
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make vec4_visitor's destructor virtualIago Toral Quiroga2015-10-051-1/+1
    We need a virtual destructor when at least one of the class' methods is virtual. Failure to do so might lead to undefined behavior when destructing derived classes. Fixes the following warning:

    brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
    brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
        delete gs;

    Curro: This shouldn't be causing any actual bugs at the moment because gen6_gs_visitor is the only subclass of vec4_visitor destroyed through a pointer of a base class (vec4_gs_visitor *) and its destructor is basically the same as its parent's. Anyway it seems sensible to change this so it doesn't bite us in the future.

    Reviewed-by: Francisco Jerez <[email protected]>
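The rule behind this warning can be shown with a small standalone example. The class names `Visitor` and `GsVisitor` and the counter `derived_dtor_calls` are hypothetical and only loosely mirror the visitor hierarchy; the point is that deleting a derived object through a base pointer is well defined only when the base destructor is virtual.

```cpp
#include <cassert>

// Counts how many times the derived destructor has run (illustration only).
static int derived_dtor_calls = 0;

struct Visitor {
    // Virtual destructor: deleting through a Visitor* runs the destructor of
    // the dynamic type. Without 'virtual', such a delete is undefined behavior.
    virtual ~Visitor() {}
    virtual void run() {}
};

struct GsVisitor : Visitor {
    ~GsVisitor() override { ++derived_dtor_calls; }
    void run() override {}
};

// Destroys a visitor through a pointer to the base class.
void destroy(Visitor *v) {
    delete v;
}
```

Calling `destroy(new GsVisitor())` runs `~GsVisitor()` first and then `~Visitor()`, which is what the compiler cannot guarantee when the base destructor is non-virtual.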
* i965/vec4: Remove more dead visitor/vertex program code.Matt Turner2015-10-041-8/+0
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/shader: Get rid of the shader, prog, and shader_prog fieldsJason Ekstrand2015-10-021-3/+1
    Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transform feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs,vec4: Get rid of the sanity_param_countJason Ekstrand2015-10-021-2/+0
    It doesn't exist for anything other than an assert that, as far as I can tell, isn't possible to trip. Soon, we will remove prog from the visitor entirely and this will become even more impossible to hit.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/backend_shader: Add a field to store the NIR shaderJason Ekstrand2015-10-021-3/+3
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move binding table setup to codegen time.Jason Ekstrand2015-10-021-1/+0
    Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Pull GLSL uniform handling into a common functionJason Ekstrand2015-10-021-2/+0
    The way we deal with GLSL uniforms and builtins is basically the same in both the vec4 and the fs backend. This commit takes the best parts of both implementations and pulls the common code into a shared helper function.

    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/shader: Get rid of the setup_vec4_uniform_value helperJason Ekstrand2015-10-021-3/+0
    It's not used by anything anymore.

    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Get rid of the uniform_vector_size arrayJason Ekstrand2015-10-021-2/+1
    The uniform_vector_size array was only ever used by pack_uniform_registers, which no longer needs it.

    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Delete the old vec4_vp codeJason Ekstrand2015-10-021-1/+0
    Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Delete the old ir_visitor codeJason Ekstrand2015-10-021-72/+1
    Reviewed-by: Matt Turner <[email protected]>
* i965/nir/vec4: Implement nir_intrinsic_ssbo_atomic_*Iago Toral Quiroga2015-09-251-0/+1
    Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+6
    Note that Skylake needs to include a header in the sampler message, so it will need some tweaks to work there.

    Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
    Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4_nir: Use partial SSA form rather than full non-SSAJason Ekstrand2015-09-151-0/+1
    We made this switch in the FS backend some time ago and it seems to make a number of things a bit easier. In particular, supporting SSA values takes very little work in the backend and allows us to take advantage of the majority of the SSA information even after we've gotten rid of Phi nodes.

    Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Remove the brw_vue_prog_key base class.Kenneth Graunke2015-09-031-7/+1
    The legacy userclip fields are only used for the vertex shader, and at that point there's only program_string_id and the tex struct, which are common to all keys. So there's no need for a "VUE" key base class.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* i965: Virtualize vec4_visitor::emit_urb_slot().Kenneth Graunke2015-09-031-1/+1
    This avoids a downcast of key, which won't exist in the base class soon.

    I'm not a huge fan of this patch, but given that we're currently using inheritance, this seems like the "right" way to do it. The alternative is to make key a void pointer in the parent class and continue downcasting.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* i965: Store a key_tex pointer in vec4_visitor.Kenneth Graunke2015-09-031-0/+1
    I'm about to remove the base class for VS/GS/HS/DS program keys, at which point we won't be able to use key->tex anymore. Instead, we'll need to store a direct pointer (like we do in the FS backend).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* i965: Move legacy clip plane handling to vec4_vs_visitor.Kenneth Graunke2015-09-031-3/+1
    This is now only used for the vertex shader, so it makes sense to get it out of any paths run by the geometry shader.

    Instead of passing the gl_clip_plane array into the run() method (which is shared among all subclasses), we add it as a vec4_vs_visitor constructor parameter. This eliminates the bogus NULL parameter in the GS case.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4_nir: Get rid of the uniform_driver_location trackingJason Ekstrand2015-08-251-1/+0
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move type_size() methods out of visitor classes.Kenneth Graunke2015-08-251-1/+0
    I want to use C function pointers to these, and they don't use anything in the visitor classes anyway.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make setup_vec4_uniform_value and _image_uniform_values take an offsetJason Ekstrand2015-08-251-1/+2
    This way they don't implicitly increment the uniforms variable and don't have to be called in sequence during uniform setup.

    Reviewed-by: Francisco Jerez <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename setup_vector_uniform_values to setup_vec4_uniform_valueJason Ekstrand2015-08-251-2/+2
    The new name more accurately represents what it does: set up a single vec4 uniform value.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4_visitor: Make some function arguments const referencesJason Ekstrand2015-08-101-3/+3
    Reviewed-by: Matt Turner <[email protected]>