path: root/src/mesa/drivers/dri/i965/brw_nir.c
Commit message | Author | Age | Files | Lines
* i965: Add scalar GS input lowering code. | Kenneth Graunke | 2015-11-03 | 1 | -5/+39
    We really ought to compute the VUE map at link time and stash it, rather than recomputing it here, but with the mess of program structures I wasn't sure where to put it. We can improve that later.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Kristian Høgsberg <[email protected]>
* glsl: keep track of intra-stage indices for atomics | Timothy Arceri | 2015-10-27 | 1 | -2/+4
    This is more efficient, as it means we no longer have to upload the same set of ABO surfaces to all stages in the program.
    This also fixes a bug where, since commit c0cd5b, var->data.binding was being used as a replacement for the atomic buffer index. The two don't have to be the same value; they just happened to end up the same when binding is 0.
    Reviewed-by: Francisco Jerez <[email protected]>
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
    Cc: Ilia Mirkin <[email protected]>
    Cc: Alejandro Piñeiro <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
* i965/nir: Switch on shader stage in nir_lower_outputs(). | Kenneth Graunke | 2015-10-17 | 1 | -5/+21
    VS, GS, and FS continue doing the same thing they did before. We can simplify the FS code a bit because it is always scalar.
    Compute shaders now assert that there are no outputs instead of doing a loop over 0 outputs.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Ignore compute shaders in brw_nir_lower_inputs | Jordan Justen | 2015-10-14 | 1 | -0/+4
    The commit shown below caused compute shaders to hit the unreachable in the default of the switch block. Since compute shaders don't have any inputs, we can make brw_nir_lower_inputs a no-op for CS.
    commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
    Author: Kenneth Graunke <[email protected]>
    Date: Fri Aug 14 15:15:11 2015 -0700
    i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Simplify FS in brw_nir_lower_inputs to only support scalar mode | Jordan Justen | 2015-10-14 | 1 | -1/+2
    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Simplify fs_visitor's ATTR file. | Kenneth Graunke | 2015-10-12 | 1 | -0/+40
    Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of compilation, assign_vs_urb_setup() translated those into GRF units, and converted ATTR to HW_REGs.
    This patch moves the translation earlier, making ATTR work in terms of GRF units from the beginning. assign_vs_urb_setup() simply has to add the number of payload registers and push constants to obtain the final hardware GRF number. (We can't do this earlier as those values aren't known.)
    ATTR still supports reg_offset; however, it's simply added to reg. It's not clear whether this is valuable or not.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
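The register arithmetic described in this message can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: `toy_final_grf`, its parameter names, and `TOY_GRF_COUNT` are hypothetical and do not appear in the driver; the real work happens in assign_vs_urb_setup().

```c
#include <assert.h>

/* Assumption for this sketch: a register file of 128 GRFs. */
#define TOY_GRF_COUNT 128u

/* Hypothetical helper: turn an ATTR register that is already expressed
 * in GRF units into a final hardware GRF number by adding the values
 * that are only known late (payload and push constant registers).
 * reg_offset is simply added to reg, as the commit message states. */
static unsigned
toy_final_grf(unsigned attr_reg, unsigned reg_offset,
              unsigned payload_regs, unsigned push_constant_regs)
{
   unsigned grf = payload_regs + push_constant_regs + attr_reg + reg_offset;
   assert(grf < TOY_GRF_COUNT);
   return grf;
}
```

With one payload register and four push constant registers, ATTR register 2 with reg_offset 1 would land at GRF 8 in this toy model.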
* i965/vs: Map scalar VS input locations properly; avoid tons of MOVs. | Kenneth Graunke | 2015-10-10 | 1 | -1/+22
    Previously, we used nir_lower_io with the scalar type_size function, which mapped VERT_ATTRIB_* locations to...some numbers. Then, in fs_visitor::nir_setup_inputs(), we created temporaries indexed by those numbers, and emitted MOVs from the actual ATTR registers to those temporaries.
    Virtually all of these were copy propagated away, but it's still ugly.
    This patch reworks our input lowering to produce NIR load_input intrinsics that properly index into the ATTR file, so we can access it directly.
    No changes in shader-db.
    v2: Fix unreachable() message (Ken), update commit message (Matt).
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* nir: Introduce new nir_intrinsic_load_per_vertex_input intrinsics. | Kenneth Graunke | 2015-10-04 | 1 | -2/+11
    Geometry and tessellation shaders process multiple vertices; their inputs are arrays indexed by the vertex number. While GLSL makes this look like a normal array, it can be very different behind the scenes.
    On Intel hardware, all inputs for a particular vertex are stored together - as if they were grouped into a single struct. This means that consecutive elements of these top-level arrays are not contiguous. In fact, they may sometimes be in completely disjoint memory segments.
    NIR's existing load_input intrinsics are awkward for this case, as they distill everything down to a single offset. We'd much rather keep the vertex ID separate, but build up an offset as normal beyond that.
    This patch introduces new nir_intrinsic_load_per_vertex_input intrinsics to handle this case. They work like ordinary load_input intrinsics, but have an extra source (src[0]) which represents the outermost array index.
    v2: Rebase on earlier refactors.
    v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
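To make the layout argument concrete, here is a small standalone C sketch of the simplest case, where each vertex owns one contiguous block of input slots. The names `toy_per_vertex_slot` and `slots_per_vertex` are hypothetical; this models the idea of keeping the vertex index separate from the offset, not the actual hardware layout (which, as the message notes, may even use disjoint memory segments) and not the NIR intrinsic itself.

```c
#include <assert.h>

/* Toy model: vertex v owns slots [v * slots_per_vertex,
 * (v + 1) * slots_per_vertex). Element [v] of a per-vertex input array
 * lives inside vertex v's block, so consecutive array elements are a
 * whole block apart rather than adjacent. */
static unsigned
toy_per_vertex_slot(unsigned vertex_index, unsigned slots_per_vertex,
                    unsigned offset_in_vertex)
{
   assert(offset_in_vertex < slots_per_vertex);
   return vertex_index * slots_per_vertex + offset_in_vertex;
}
```

Keeping `vertex_index` as its own argument mirrors the extra src[0] source described above: the backend can choose how to locate a vertex's block, while the remaining offset is built up as for an ordinary input.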
* i965: Use nir_foreach_variable | Jason Ekstrand | 2015-10-02 | 1 | -1/+1
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Remove the prog parameter from brw_nir_lower_inputs | Jason Ekstrand | 2015-10-02 | 1 | -4/+2
    Reviewed-by: Kenneth Graunke <[email protected]>
* nir/glsl: Take a gl_shader_program and a stage rather than a gl_shader | Jason Ekstrand | 2015-10-02 | 1 | -2/+1
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Refactor input/output lowering setup into helpers. | Kenneth Graunke | 2015-10-01 | 1 | -20/+26
    The code for input lowering is going to get significantly more complicated shortly, so I wanted to pull it out. Vertex shader inputs are handled nearly identically regardless of vec4/scalar mode, so I opted to not split that.
    I thought about having each function actually do the lowering, but one pass through nir_lower_io that handles all types (which weren't handled earlier) is probably more efficient.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* nir: Allow nir_lower_io() to only lower one type of variable. | Kenneth Graunke | 2015-10-01 | 1 | -2/+2
    We may want to use different type_size functions for (e.g.) inputs vs. uniforms. Passing in -1 for mode ignores this, handling all modes as before.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
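The "-1 means all modes" convention can be sketched as a simple filter. This is a hypothetical illustration: `toy_var_mode` and `toy_should_lower` are invented names, and the real pass works on NIR variable modes rather than this toy enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variable modes for this sketch only. */
enum toy_var_mode {
   TOY_MODE_INPUT,
   TOY_MODE_OUTPUT,
   TOY_MODE_UNIFORM,
};

/* Returns true if a variable of mode `var_mode` should be lowered when
 * the caller asked for `requested_mode`. Passing -1 selects every
 * mode, which preserves the old lower-everything behaviour. */
static bool
toy_should_lower(enum toy_var_mode var_mode, int requested_mode)
{
   assert(requested_mode >= -1);
   return requested_mode == -1 || (int)var_mode == requested_mode;
}
```

A caller could then run the pass once per mode with a different size function each time, or once with -1 when a single size function is enough.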
* i965/gs: Use new NIR intrinsics. | Kenneth Graunke | 2015-09-23 | 1 | -0/+5
    By performing the vertex counting in NIR, we're able to elide a ton of useless safety checks around every EmitVertex() call:
    total instructions in shared programs: 3952 -> 3720 (-5.87%)
    instructions in affected programs: 3491 -> 3259 (-6.65%)
    helped: 11
    HURT: 0
    Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621) on Haswell GT3e at 1024x768.
    This should also make it easier to implement Broadwell's "Static Vertex Count" feature someday.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_tex: support projector lowering per sampler type | Rob Clark | 2015-09-18 | 1 | -1/+4
    Some hardware, such as adreno a3xx, supports txp on some but not all sampler types. In this case we want more fine grained control over which texture projectors get lowered.
    v2: split out nir_lower_tex_options struct to make it easier to add the additional parameters coming in the following patches
    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* nir: rename nir_lower_tex_projector | Rob Clark | 2015-09-18 | 1 | -1/+1
    Since the following patches will add additional tex-lowering related functionality, which doesn't make sense to split out into a separate pass (as they would require duplication of the projector lowering logic), let's give this pass a more generic name.
    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use nir_move_vec_src_uses_to_dest | Jason Ekstrand | 2015-09-17 | 1 | -0/+3
    The idea here is that it gives register coalescing a little bit of a helping hand. It doesn't actually fix the coalescing problems, but it seems to help a good bit.
    Shader-db results for vec4 programs on Haswell:
    total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
    instructions in affected programs: 1259166 -> 1196845 (-4.95%)
    helped: 11363
    HURT: 148
    v2 (Jason Ekstrand):
     - Run nir_move_vec_src_uses_to_dest after going out of SSA
     - New shader-db numbers
    Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965/vec4_nir: Use partial SSA form rather than full non-SSA | Jason Ekstrand | 2015-09-15 | 1 | -1/+1
    We made this switch in the FS backend some time ago and it seems to make a number of things a bit easier. In particular, supporting SSA values takes very little work in the backend and allows us to take advantage of the majority of the SSA information even after we've gotten rid of Phi nodes.
    Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965/nir: enable the dead control flow optimization | Connor Abbott | 2015-09-01 | 1 | -0/+2
    total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
    instructions in affected programs: 3054 -> 2884 (-5.57%)
    helped: 29
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Make use of nir_opt_undef | Boyan Ding | 2015-08-27 | 1 | -0/+2
    Shader-db result on Ivy Bridge:
    total instructions in shared programs: 145484 -> 145445 (-0.03%)
    instructions in affected programs: 225 -> 186 (-17.33%)
    helped: 5
    HURT: 0
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Thomas Helland <[email protected]>
    Signed-off-by: Boyan Ding <[email protected]>
* nir: Use nir_shader::stage rather than passing it around. | Kenneth Graunke | 2015-08-25 | 1 | -1/+1
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Rework uniform handling | Jason Ekstrand | 2015-08-25 | 1 | -4/+3
    Previously, we treated the entire UNIFORM file as if it had two elements: One for direct things and one for indirect. This is substantially different from how the old visitor code handled it where each element was effectively its own uniform. This commit makes the NIR path more like the old ir_visitor path where each uniform is separate. This should allow us to more easily make decisions about what to push.
    Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Pass a type_size() function pointer into nir_lower_io(). | Kenneth Graunke | 2015-08-25 | 1 | -6/+10
    Previously, there were four type_size() functions in play - the i965 compiler backend defined scalar and vec4 type_size() functions, and nir_lower_io contained its own similar functions.
    In fact, the i965 driver used nir_lower_io() and then looped over the components using its own type_size - meaning both were in play. The two are /basically/ the same, but not exactly in obscure cases like subroutines and images.
    This patch removes nir_lower_io's functions, and instead makes the driver supply a function pointer. This gives the driver ultimate flexibility in deciding how it wants to count things, reduces code duplication, and improves consistency.
    v2 (Jason Ekstrand):
     - One side-effect of passing in a function pointer is that nir_lower_io is now aware of and properly allocates space for image uniforms, allowing us to drop hacks in the backend
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    v2 Reviewed-by: Kenneth Graunke <[email protected]>
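As a rough illustration of the function-pointer design described in this message, the following standalone C sketch assigns consecutive locations to a list of variables using a caller-supplied size callback. Every name here (`toy_type`, `toy_var`, `toy_type_size_fn`, `toy_assign_locations`, `type_size_scalar`, `type_size_vec4`) is hypothetical; the real nir_lower_io() works on GLSL types and NIR variables, and the two callbacks below are simplified guesses at what scalar and vec4 counting mean.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GLSL type: a vector of `components`
 * scalars, repeated `array_len` times (1 for non-arrays). */
struct toy_type {
   unsigned components;
   unsigned array_len;
};

struct toy_var {
   struct toy_type type;
   unsigned driver_location; /* filled in by toy_assign_locations() */
};

/* Callback signature: how many slots does a type occupy? */
typedef unsigned (*toy_type_size_fn)(const struct toy_type *);

/* Scalar counting: one slot per component. */
static unsigned
type_size_scalar(const struct toy_type *t)
{
   return t->components * t->array_len;
}

/* vec4 counting: every array element is padded to one vec4 slot. */
static unsigned
type_size_vec4(const struct toy_type *t)
{
   return t->array_len;
}

/* Assign consecutive locations using the driver-supplied callback.
 * Returns the total number of slots used. */
static unsigned
toy_assign_locations(struct toy_var *vars, size_t count,
                     toy_type_size_fn type_size)
{
   assert(type_size != NULL);
   unsigned next = 0;
   for (size_t i = 0; i < count; i++) {
      vars[i].driver_location = next;
      next += type_size(&vars[i].type);
   }
   return next;
}
```

The shared pass never needs to know how a backend counts; swapping `type_size_scalar` for `type_size_vec4` changes the layout without touching `toy_assign_locations()`.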
* i965/nir: Do not scalarize phis in non-scalar setups | Iago Toral Quiroga | 2015-08-03 | 1 | -2/+6
    Significantly reduces register pressure in some piglit tests.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Add new utility method brw_glsl_base_type_for_nir_type() | Eduardo Lima Mitev | 2015-08-03 | 1 | -0/+21
    This method returns the glsl_base_type corresponding to a nir_alu_type. It factors out code currently present in fs_nir so that it can be reused in vec4_nir for its upcoming emit_texture support.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir/vec4: Lower "vecN" instructions and mark them unreachable | Antia Puentes | 2015-08-03 | 1 | -0/+5
    This enables NIR pass "lower_vec_to_movs" on shaders that work on vec4.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Disable alu_to_scalar pass on non-scalar shaders | Alejandro Piñeiro | 2015-08-03 | 1 | -6/+10
    Disables nir_lower_alu_to_scalar when the shader stage being processed works on vec4 vectors, like the upcoming NIR->vec4 backend.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir/vec4: Implement store_output intrinsic | Eduardo Lima Mitev | 2015-08-03 | 1 | -1/+4
    This implementation is based on the current URB setup in vec4_visitor, which requires the output register to be stored in the output_reg array at variable's original shader location index.
    But since nir_lower_io() pass uses the value in var->data.driver_location, we need to put there var->data.location instead, prior to calling nir_lower_io(), so that we end up with the correct index in const_index[0].
    The driver_location is not used at all, so this patch also disables the nir_assign_var_locations pass on non-scalar shaders.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Move brw_type_for_nir_type() to brw_nir to allow reuse | Eduardo Lima Mitev | 2015-08-03 | 1 | -0/+18
    Upcoming NIR->vec4 pass can benefit from this method, so let's move it up.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir/vec4: Implement load_const intrinsic | Eduardo Lima Mitev | 2015-08-03 | 1 | -1/+1
    Similar to fs_nir backend, a nir_local_values map will be filled with newly allocated registers as the load_const intrinsic instructions are processed. Later, get_nir_src() will fetch the registers from this map for sources that are ssa.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Do not assign direct uniform locations first for vec4-based shaders | Iago Toral Quiroga | 2015-08-03 | 1 | -4/+10
    In the vec4 backend we want uniform locations to be assigned consecutively since that way the offsets produced by nir_lower_io are exactly what we need to implement nir_intrinsic_load_uniform. Otherwise we would need a mapping to match the output of nir_lower_io to the actual uniform registers we need to use.
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir/nir_lower_io: Add vec4 support | Iago Toral Quiroga | 2015-08-03 | 1 | -6/+8
    The current implementation operates in scalar mode only, so add a vec4 mode where types are padded to vec4 sizes.
    This will be useful in the i965 driver for its vec4 nir backend (and possibly other drivers that have vec4-based shaders).
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Pass a is_scalar boolean to brw_create_nir() | Eduardo Lima Mitev | 2015-08-03 | 1 | -1/+2
    The upcoming introduction of NIR->vec4 pass will require that some NIR lowering passes are enabled/disabled depending on the type of shader (scalar vs. vector).
    With this patch we pass an 'is_scalar' variable to the process of constructing the NIR, to let an external context decide how the shader should be handled.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: use SSA values directly | Connor Abbott | 2015-06-30 | 1 | -1/+1
    Before, we would use registers, but set a magical "parent_instr" field to indicate that it was actually purely an SSA value (i.e., it wasn't involved in any phi nodes). Instead, just use SSA values directly, which lets us get rid of the hack and reduces memory usage since we're not allocating a nir_register for every value. It also makes our handling of load_const more consistent compared to the other instructions.
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir/from_ssa: add a flag to not convert everything from SSA | Connor Abbott | 2015-06-30 | 1 | -1/+1
    We already don't convert constants out of SSA, and in our backend we'd like to have only one way of saying something is still in SSA.
    The one tricky part about this is that we may now leave some undef instructions around if they aren't part of a phi-web, so we have to be more careful about deleting them.
    v2: rename and flip meaning of flag (Jason)
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Don't count NIR instructions for shader-db. | Kenneth Graunke | 2015-06-23 | 1 | -31/+0
    Matt, Jason, and I haven't found this useful in a long time.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Re-index SSA definitions before printing NIR code. | Kenneth Graunke | 2015-06-11 | 1 | -0/+6
    This makes the SSA definitions use sequential numbers (0, 1, 2, ...) instead of seemingly random ones. There's not much point normally, but it makes debug output much easier to read.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
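The renumbering idea is simple enough to show in a toy form. The sketch below is an assumption-laden illustration, not the NIR helper the driver calls: `toy_ssa_def` and `toy_reindex_ssa_defs` are hypothetical names, and the definitions are assumed to be supplied in program order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SSA definition carrying only its printable index. */
struct toy_ssa_def {
   unsigned index;
};

/* Walk the definitions in program order and hand out 0, 1, 2, ...
 * Returns the number of indices assigned. */
static unsigned
toy_reindex_ssa_defs(struct toy_ssa_def **defs, size_t count)
{
   unsigned next = 0;
   for (size_t i = 0; i < count; i++) {
      assert(defs[i] != NULL);
      defs[i]->index = next++;
   }
   return next;
}
```

After optimization passes have deleted and created values, the surviving indices can look arbitrary (say 17, 4, 92); one pass like this makes a printed dump read 0, 1, 2 in order.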
* prog_to_nir: Use a variable for uniform data | Jason Ekstrand | 2015-05-23 | 1 | -12/+3
    Previously, the prog_to_nir pass was directly generating uniform load/store intrinsics. This converts it to use a single giant "parameters" variable and we now depend on lowering to get the uniform load/store intrinsics. One advantage of this is that we now have one code-path after we do the initial conversion into NIR.
    No shader-db changes.
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Create NIR during LinkShader() and ProgramStringNotify(). | Kenneth Graunke | 2015-04-11 | 1 | -0/+213
    Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile.
    We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time.
    Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already.
    shader-db runs ~9.4% faster on my i7-5600U, with a release build.
    v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>