aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vec4.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/vec4: Get rid of the uniform_vector_size arrayJason Ekstrand2015-10-021-1/+0
| | | | | | | The uniform_vector_size array was only ever used by pack_uniform_registers which no longer needs it. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Use the actual channels used in pack_uniform_registersJason Ekstrand2015-10-021-14/+37
| | | | | | | | | | Previously, pack_uniform_registers worked based on the size of the uniform as given to us when we initially set up the uniforms. However, we have to walk through the uniforms and figure out liveness anyway, so we migh as well record the number of channels used as we go. This may also allow us to pack things tighter in a few cases. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vs: Move lazy NIR creation to codegen_vs_progJason Ekstrand2015-10-021-12/+0
| | | | | | | | The next commit will add code to codegen_vs_prog that requires the NIR shader to be there in all cases. It doesn't hurt anything to just move it from brw_vs_emit to its only caller. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Always use NIRJason Ekstrand2015-10-021-31/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GLSL IR vs. NIR shader-db results for vec4 programs on i965: total instructions in shared programs: 1499328 -> 1388354 (-7.40%) instructions in affected programs: 1245199 -> 1134225 (-8.91%) helped: 7469 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on G4x: total instructions in shared programs: 1436799 -> 1325825 (-7.72%) instructions in affected programs: 1205599 -> 1094625 (-9.20%) helped: 7469 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake: total instructions in shared programs: 1436654 -> 1325682 (-7.72%) instructions in affected programs: 1205503 -> 1094531 (-9.21%) helped: 7468 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge: total instructions in shared programs: 2016249 -> 1787033 (-11.37%) instructions in affected programs: 1850547 -> 1621331 (-12.39%) helped: 14856 HURT: 1481 GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 GLSL IR vs. NIR shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 I also ran our full suite of benchmarks on a Haswell and had the following statistically significant (according to ministat) changes: Test master-glsl master-nir diff bench_OglGeomPoint 461.556 463.006 1.450 bench_OglTerrainFlyInst 184.484 187.574 3.090 bench_OglTerrainPanInst 132.412 136.307 3.895 bench_OglTexFilterAniso 19.653 19.645 -0.008 bench_OglTexFilterTri 58.333 58.009 -0.324 bench_OglVSInstancing 65.049 65.327 0.278 bench_trexoff 69.474 69.694 0.220 bench_valley 40.708 41.125 0.417 v2 (Jason Ekstrand): - Remove more uses of NirOptions as a switch - New shader-db numbers - Added benchmark numbers Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+1
| | | | | | | | Notice that Skylake needs to include a header in the sampler message so it will need some tweaks to work there. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask neededAntia Puentes2015-09-231-2/+10
| | | | | | | | Gen6 MATH instructions can not execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Detect and delete useless MOVs.Matt Turner2015-09-221-0/+22
| | | | | | | | | | | | | | | With NIR: instructions in affected programs: 111508 -> 109193 (-2.08%) helped: 507 Without NIR: instructions in affected programs: 28763 -> 28474 (-1.00%) helped: 186 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move perf_debug code to brw_codegen_*_prog()Kristian Høgsberg Kristensen2015-09-141-19/+0
| | | | | | | | | | We're trying to avoid a libdrm dependency in the core compiler, so let's move the perf_debug code one level up from the brw_*_emit() helpers to the brw_codegen_*_prog() helpers. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
* i965/vec4: Fix saturation errors when coalescing registersAntia Puentes2015-09-141-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the register types do not match and the instruction that contains the final destination is saturated, register coalescing generated non-equivalent code. This did not happen when using IR because types usually matched, but it is visible in nir-vec4. For example, mov vgrf7:D vgrf2:D mov.sat m4:F vgrf7:F is coalesced to: mov.sat m4:D vgrf2:D The patch prevents coalescing in such scenario, unless the instruction we want to coalesce into is a MOV (without type conversion implied). In that case, the patch sets the register types to the type of the final destination. Shader-db results in HSW (only vec4 instructions shown): total instructions in shared programs: 1754415 -> 1754416 (0.00%) instructions in affected programs: 74 -> 75 (1.35%) helped: 0 HURT: 1 GAINED: 0 LOST: 0 Only one extra instruction in one of the shaders, that comes from eliminating a saturation error by preventing register coalesce. Cc: "10.6 11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Don't reswizzle hardware registersJason Ekstrand2015-09-121-0/+8
| | | | | | Cc: "11.0 10.6" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: check writemask when bailing out at register coalesceAlejandro Piñeiro2015-09-111-4/+6
| | | | | | | | | | | | | | | | | | | | | | | opt_register_coalesce stopped to check previous instructions to coalesce with if somebody else was writing on the same destination. This can be optimized to check if somebody else was writing to the same channels of the same destination using the writemask. Shader DB results (taking into account only vec4): total instructions in shared programs: 1781593 -> 1734957 (-2.62%) instructions in affected programs: 1238390 -> 1191754 (-3.77%) helped: 12782 HURT: 0 GAINED: 0 LOST: 0 v2: removed some parenthesis, fixed indentation, as suggested by Matt Turner v3: added brackets, for consistency, as suggested by Eduardo Lima Reviewed-by: Matt Turner <[email protected]>
* i965: add support for textureSamples functionIlia Mirkin2015-09-101-0/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> [v2: kayden-supplied code in fs_nir replacing need for logical opcode] Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a debug option for spilling everything in vec4 codeIago Toral Quiroga2015-09-041-1/+1
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965: Remove the brw_vue_prog_key base class.Kenneth Graunke2015-09-031-12/+0
| | | | | | | | | The legacy userclip fields are only used for the vertex shader, and at that point there's only program_string_id and the tex struct, which are common to all keys. So there's no need for a "VUE" key base class. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Move legacy clip plane handling to vec4_vs_visitor.Kenneth Graunke2015-09-031-6/+4
| | | | | | | | | | | | | This is now only used for the vertex shader, so it makes sense to get it out of any paths run by the geometry shader. Instead of passing the gl_clip_plane array into the run() method (which is shared among all subclasses), we add it as a vec4_vs_visitor constructor parameter. This eliminates the bogus NULL parameter in the GS case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Delete the brw_vue_program_key::userclip_active flag.Kenneth Graunke2015-09-031-1/+1
| | | | | | | | | | | | | | | | | There are two uses of this flag. The primary use is checking whether we need to emit code to convert legacy gl_ClipVertex/gl_Position clipping to clip distances. In this case, we also have to upload the clip planes as uniforms, which means setting nr_userclip_plane_consts to a positive value. Checking if it's > 0 works for detecting this case. Gen4-5 also wants to know whether we're doing clipping at all, so it can emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0] is set to a real register suffices for this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4: fill src_reg type using the constructor type parameterAlejandro Piñeiro2015-09-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | The src_reg constructor that received the glsl_type was using it only to build the swizzle, but not to fill this->type as dst_reg is doing. This caused some type mismatch between movs and alu operations on the NIR path, so copy propagation optimization was not applied to remove unneeded movs if negate modifier was involved. This was first detected on minus (negate+add) operations. Shader DB results (taking into account only vec4): total instructions in shared programs: 20019 -> 19934 (-0.42%) instructions in affected programs: 2918 -> 2833 (-2.91%) helped: 79 HURT: 0 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <[email protected]>
* i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.Matt Turner2015-08-281-2/+3
| | | | | | | | | | | | | | Noticed when debugging things that lead to the next patch. On G45 (and presumably ILK) this helps register coalescing: total instructions in shared programs: 4077373 -> 4077340 (-0.00%) instructions in affected programs: 43751 -> 43718 (-0.08%) helped: 52 HURT: 2 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Handle uniform and GRF array access on vertex programs (NIR)Antia Puentes2015-08-031-1/+1
| | | | | | | | | | | | | | | | When the NIR-vec4 pass is enabled, handles uniform and GRF array access on ARB_vertex_program like it is done on vertex shaders. When the old IR-vec4 pass is used, emit_program_code() emits pull constant loads directly instead of using relative addressing, hence to call to move_uniform_array_access_to_pull_constants() is not needed and it is enough to call to split_uniform_registers(). The patch also calls to move_grf_array_access_to_scratch() like it is done for shaders, however I suspect this is a no-op for vertex programs and we could remove it. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Enable NIR-vec4 pass on ARB_vertex_programsAntia Puentes2015-08-031-23/+24
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Enable NIR-vec4 pass on geometry shadersIago Toral Quiroga2015-08-031-1/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Add a new dst_reg constructor accepting a brw_reg_typeAlejandro Piñeiro2015-08-031-0/+11
| | | | | | | | | | | | | | | This is useful for the upcoming texture support in NIR->vec4 pass, as we found several cases where the brw_type is available, but not the glsl_type. Without this new constructor, the alternative would be: dst_reg reg(MRF, <reg>) reg.type = <brw_type> reg.writemask = <mask> Adding a new constructor makes code easier to read. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: Pass a is_scalar boolean to brw_create_nir()Eduardo Lima Mitev2015-08-031-1/+1
| | | | | | | | | | | | The upcoming introduction of NIR->vec4 pass will require that some NIR lowering passes are enabled/disabled depending on the type of shader (scalar vs. vector). With this patch we pass a 'is_scalar' variable to the process of constructing the NIR, to let an external context decide how the shader should be handled. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir/vec4: Select between new nir_vec4 or current vec4_visitor code-pathsEduardo Lima Mitev2015-08-031-4/+14
| | | | | | | | | | The NIR->vec4 pass will be activated if both the following conditions are met: * INTEL_USE_NIR environment variable is defined and is positive (1 or true) * The stage is vertex shader (support for geometry shaders and ARB_vertex_program will be added later). Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vs: Get rid of brw_vs_compile completely.Kenneth Graunke2015-07-091-19/+19
| | | | | | | | | After tearing it out another level or two, and just passing the key and vp directly, we can finally remove this struct. It also eliminates a pointless memcpy() of the key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.Kenneth Graunke2015-07-091-2/+2
| | | | | | | | | At this point, the brw_vs_compile structure only contains the key and gl_vertex_program pointer. We may as well pass and store them directly; it's simpler and more convenient (key-> instead of vs_compile->key...). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Move c->last_scratch into vec4_visitor.Kenneth Graunke2015-07-091-2/+2
| | | | | | | | | | | | Nothing outside of vec4_visitor uses it, so we may as well keep it internal. Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend. (The empty class will be going away soon.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Move total_scratch calculation into the visitor.Kenneth Graunke2015-07-091-2/+5
| | | | | | | | This is more consistent with how we do it in the FS backend, and reduces a tiny bit of duplication. It'll also allow for a bit more tidying. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Move perf_debug about register spilling into the visitor.Kenneth Graunke2015-07-091-3/+13
| | | | | | | | | | | | This patch makes us only issue the performance warning about register spilling if we actually spilled registers. We also use scratch space for indirect addressing and the like. This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for the vec4 backend. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/vec4: Plumb log_data through so the backend_shader field gets set.Kenneth Graunke2015-07-091-1/+1
| | | | | | | | | | Jason plumbed this through a while back in the FS backend, but apparently we were just passing NULL in the vec4 backend. This patch passes brw in as intended. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Remove the brw_context from the visitorsJason Ekstrand2015-06-231-2/+4
| | | | | | | As of this commit, nothing actually needs the brw_context. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4_vs: Add an explicit use_legacy_snorm_formula flagJason Ekstrand2015-06-231-1/+3
| | | | | | | This way we can stop doing is_gles3 checks inside of the compiler. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vs: Pass the current set of clip planes through run() and run_vs()Jason Ekstrand2015-06-231-4/+4
| | | | | | | | | Previously, these were pulled out of the GL context conditionally based on whether we were running ff/ARB or a GLSL program. Now, we just pass them in so that the visitor doesn't have to grab them itself. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Pull calls to get_shader_time_index out of the visitorJason Ekstrand2015-06-231-12/+13
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use a single index per shader for shader_time.Jason Ekstrand2015-06-231-8/+10
| | | | | | | | | | Previously, each shader took 3 shader time indices which were potentially at arbirary points in the shader time buffer. Now, each shader gets a single index which refers to 3 consecutive locations in the buffer. This simplifies some of the logic at the cost of having a magic 3 a few places. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add compiler options to brw_compilerJason Ekstrand2015-06-231-1/+1
| | | | | | | | | | | | | This creates the options at screen cration time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Remove the dependance on brw_context from the generatorsJason Ekstrand2015-06-231-2/+2
| | | | | Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Plumb compiler debug logging through a function pointer in brw_compilerJason Ekstrand2015-06-231-2/+4
| | | | | | | | v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Create a shader_dispatch_mode enum to replace VS/GS fields.Kenneth Graunke2015-06-011-1/+4
| | | | | | | | | | | | | We used to store the GS dispatch mode in brw_gs_prog_data while separately storing the VS dispatch mode in brw_vue_prog_data::simd8. This patch introduces an enum to represent all possible dispatch modes, and stores it in brw_vue_prog_data::dispatch_mode, unifying the two. Based on a suggestion by Matt Turner. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/vs: Rework the logic for generating NIR from ARB vertex programsJason Ekstrand2015-05-281-12/+11
| | | | | | | | Whether or not to use NIR is now equivalent to brw->scalar_vs. We can simplify the logic and make it far less confusing. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename backend_visitor to backend_shaderJason Ekstrand2015-05-281-2/+2
| | | | | | | | The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Combine the fs_visitor constructors.Kenneth Graunke2015-05-141-1/+2
| | | | | | | | | | | | | | For scalar GS support, we either need to add a fourth constructor which takes the GS structures, or combine the existing two and pass the shader stage. Given that they're not significantly different, I opted for the latter. v2: Remove more stuff from the .h file (Jason and Jordan). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Change header_present to header_size in backend_instructionJason Ekstrand2015-05-061-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.Francisco Jerez2015-05-041-0/+41
| | | | | | | | v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+10
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <[email protected]>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+6
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+2
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/vec4: Add support for untyped surface message sends from GRF.Francisco Jerez2015-05-041-3/+4
| | | | | | | | | This doesn't actually enable untyped surface message sends from GRF yet, the upcoming atomic counter and image intrinsic lowering code will. Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS.Jordan Justen2015-05-021-12/+1
| | | | | | Suggested-by: Kristian Høgsberg <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Unhardcode a few more stage names and abbreviations.Kenneth Graunke2015-04-301-4/+2
| | | | | | | | | | | | | | | The stage_abbrev and stage_name fields in backend_visitor provide what we need without any additional effort. It also means we'll get the right names for compute shaders, SIMD8 geometry shaders, and both kinds of tessellation shaders. This does unfortunately change the capitalization of the stage abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't seem worth adding code to handle, though. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>