path: root/src/mesa/drivers/dri/i965/brw_vec4.cpp
Commit message, Author, Age, Files, Lines
* i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.Francisco Jerez2015-05-041-0/+41
| | | | | | | | v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+10
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <[email protected]>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+6
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+2
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/vec4: Add support for untyped surface message sends from GRF.Francisco Jerez2015-05-041-3/+4
| | | | | | | | | This doesn't actually enable untyped surface message sends from GRF yet, the upcoming atomic counter and image intrinsic lowering code will. Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS.Jordan Justen2015-05-021-12/+1
| | | | | | Suggested-by: Kristian Høgsberg <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Unhardcode a few more stage names and abbreviations.Kenneth Graunke2015-04-301-4/+2
| | | | | | | | | | | | | | | The stage_abbrev and stage_name fields in backend_visitor provide what we need without any additional effort. It also means we'll get the right names for compute shaders, SIMD8 geometry shaders, and both kinds of tessellation shaders. This does unfortunately change the capitalization of the stage abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't seem worth adding code to handle, though. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vs: Remove unnecessary NULL check on generate_code() result.Kenneth Graunke2015-04-271-2/+1
| | | | | | | | | Code generation is not allowed to fail for any reason - in fact, fs_generator has no mechanism for failing. The visitor is responsible for that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add a devinfo field to backend_visitor and use it for gen checksJason Ekstrand2015-04-221-11/+11
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Create NIR during LinkShader() and ProgramStringNotify().Kenneth Graunke2015-04-111-3/+14
| | | | | | | | | | | | | | | | | | | | | | | Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile. We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time. Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already. shader-db runs ~9.4% faster on my i7-5600U, with a release build. v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Check the INTEL_USE_NIR environment variable once at context creationJason Ekstrand2015-04-031-1/+3
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965/nir: Use NIR for ARB_vertex_program support on Gen8+.Kenneth Graunke2015-03-271-4/+10
| | | | | | | | | | | Everything is already in place; we simply have to take the scalar code generation path. This gives us SIMD8 VS programs, instead of SIMD4x2. v2: Rebase on the patch that drops brw->gen >= 8. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* i965: Drop unnecessary brw->gen >= 8 check from scalar VS code.Kenneth Graunke2015-03-251-1/+1
| | | | | | | brw->scalar_vs already implies that brw->gen >= 8. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Define helpers to calculate the common live interval of a range of variables.Francisco Jerez2015-03-231-4/+2
| | | | | | | These will be especially useful when we start keeping track of liveness information for each subregister. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix handling of multiple register reads and writes in split_virtual_grfs().Francisco Jerez2015-03-231-9/+6
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix handling of multiple register reads and writes in opt_register_coalesce().Francisco Jerez2015-03-231-14/+10
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Some more trivial swizzle clean-up.Francisco Jerez2015-03-231-6/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Improve src_reg/dst_reg conversion constructors.Francisco Jerez2015-03-231-26/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies the src_reg/dst_reg conversion constructors using the swizzle utils introduced in a previous patch. It also makes them more useful by changing their semantics slightly: dst_reg(src_reg) used to set the writemask to XYZW if the src_reg swizzle was anything other than XXXX, which was almost certainly not what the caller intended if the swizzle was non-trivial. After this patch the same components that are present in the swizzle will be enabled in the resulting writemask. src_reg(dst_reg) used to set the first components of the swizzle to the enabled components of the writemask and then replicate the last enabled component to fill the swizzle, which, in cases where the writemask didn't have exactly the first n components set, would in general not be compatible with the original dst_reg. E.g.: | ADD(tmp, src_reg(tmp), src_reg(1)); would *not* do what one would expect (add one to each of the enabled components of tmp) if tmp didn't have a writemask of the described form (e.g. YZ, YW, XZW would all fail). This pattern actually occurs in many different places in the VEC4 back-end, it's a wonder that it hasn't caused piglit failures until now. After this patch src_reg(dst_reg) will construct a swizzle with each enabled component at its natural position (e.g. Y at the second position, Z at the third, and so on). The resulting swizzle will behave like the identity when used in any instruction with the original writemask. I've manually verified that *none* of the callers of both conversion constructors were relying on the previous broken semantics. There are no piglit regressions on any generation. Reviewed-by: Matt Turner <[email protected]>
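The difference between the old and new src_reg(dst_reg) semantics described above can be sketched in isolation. This is a minimal model, not the Mesa code: swizzles are plain arrays of component indices rather than packed BRW swizzles, and the names `packed_swizzle_for_mask`, `natural_swizzle_for_mask`, `mask_for_swizzle` and `is_identity_under_mask` are hypothetical helpers invented for this illustration.

```cpp
#include <array>
#include <cassert>

// Assumed model: component indices X=0, Y=1, Z=2, W=3; writemask bit c enables
// component c. A swizzle is four source component indices, one per channel.
using Swizzle = std::array<unsigned, 4>;

// Old behaviour as described in the commit message: pack the enabled
// components into the first positions, then replicate the last enabled one.
Swizzle packed_swizzle_for_mask(unsigned mask) {
    Swizzle swz{};
    unsigned n = 0, last = 0;
    for (unsigned c = 0; c < 4; c++) {
        if (mask & (1u << c)) {
            swz[n++] = c;
            last = c;
        }
    }
    for (; n < 4; n++)
        swz[n] = last;
    return swz;
}

// New behaviour as described: every enabled component sits at its natural
// position; disabled positions repeat a neighbouring enabled component.
Swizzle natural_swizzle_for_mask(unsigned mask) {
    unsigned last = 0;
    for (unsigned c = 0; c < 4; c++) {
        if (mask & (1u << c)) {  // start from the first enabled component
            last = c;
            break;
        }
    }
    Swizzle swz{};
    for (unsigned c = 0; c < 4; c++) {
        if (mask & (1u << c))
            last = c;
        swz[c] = last;
    }
    return swz;
}

// New dst_reg(src_reg) behaviour as described: enable exactly the components
// that appear in the swizzle.
unsigned mask_for_swizzle(const Swizzle &swz) {
    unsigned mask = 0;
    for (unsigned c : swz)
        mask |= 1u << c;
    return mask;
}

// A swizzle acts like the identity under a writemask when every enabled
// channel reads its own component.
bool is_identity_under_mask(const Swizzle &swz, unsigned mask) {
    for (unsigned c = 0; c < 4; c++) {
        if ((mask & (1u << c)) && swz[c] != c)
            return false;
    }
    return true;
}
```

With a YZ writemask, the packed form yields YZZZ, so channel Y of the destination would read Z; the natural form yields YYZZ, which is the identity on the enabled channels. For XYZ both forms agree, which is consistent with the message's remark that only masks not made of the first n components were affected.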
* i965/vec4: Pass argument by reference to src_reg/dst_reg conversion constructors.Francisco Jerez2015-03-231-2/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove swizzle_for_size() in favour of brw_swizzle_for_size().Francisco Jerez2015-03-231-21/+1
| | | | | | | | | | It could be objected that swizzle_for_size() is "faster" than brw_swizzle_for_size(). It's not measurably better in any reasonable CPU-bound benchmark on VLV according to the Finnish benchmarking system (including the SynMark2 DrvShComp shader compilation benchmark). Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Simplify opt_register_coalesce() using the swizzle utils.Francisco Jerez2015-03-231-26/+7
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Simplify reswizzle() using the swizzle utils.Francisco Jerez2015-03-231-29/+11
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Simplify opt_reduce_swizzle() using the swizzle utils.Francisco Jerez2015-03-231-44/+7
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix signedness of dst_reg::writemask.Francisco Jerez2015-03-231-1/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Print spills:fills and number of promoted constants.Matt Turner2015-03-191-1/+2
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4: Handle saturate in dump_instruction().Matt Turner2015-03-051-0/+2
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Don't attempt to reduce swizzles of send from GRF instructions.Francisco Jerez2015-02-191-1/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Optimize multiplication by -1 into a negated MOV.Matt Turner2015-02-151-0/+5
| | | | | | | instructions in affected programs: 968 -> 942 (-2.69%) helped: 4 Reviewed-by: Ian Romanick <[email protected]>
* i965: Quiet another compiler warning about uninitialized values.Eric Anholt2015-02-121-2/+2
| | | | | | | | The compiler can't tell that we're always going to hit the first if block on the first time through the loop. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Don't set any dependency control bits for F32TO16 on Gen8.Francisco Jerez2015-02-101-0/+5
| | | | | | It's expanded to several instructions. Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Init mlen for several send from GRF instructions.Francisco Jerez2015-02-101-1/+3
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix the scheduler to take into account reads and writes of multiple registers.Francisco Jerez2015-02-101-0/+18
| | | | | v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt) Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Make vec4_visitor::implied_mrf_writes() return zero for sends from GRF.Francisco Jerez2015-02-101-1/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Implement equals() method for dst_reg too.Francisco Jerez2015-02-101-0/+16
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Move up fs_inst::flag_subreg to backend_instruction.Francisco Jerez2015-02-101-2/+9
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Factor out virtual GRF allocation to a separate object.Francisco Jerez2015-02-101-10/+10
| | | | | | | | | | | | | Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <[email protected]>
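The idea of a visitor-independent allocator that keeps its book-keeping arrays with realloc()/free() might look roughly like the sketch below. The class name `VgrfAllocator` and all of its members are hypothetical and chosen for illustration; the actual object introduced by this commit may be laid out differently.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-alone virtual GRF allocator: it only tracks how many
// registers each virtual GRF spans and where it would start if all of them
// were laid out back to back. It knows nothing about any visitor class.
class VgrfAllocator {
public:
    VgrfAllocator() = default;
    VgrfAllocator(const VgrfAllocator &) = delete;
    VgrfAllocator &operator=(const VgrfAllocator &) = delete;

    ~VgrfAllocator() {
        std::free(sizes_);
        std::free(offsets_);
    }

    // Reserve a new virtual GRF of `size` registers and return its index.
    unsigned allocate(unsigned size) {
        if (count_ == capacity_)
            grow();
        sizes_[count_] = size;
        offsets_[count_] = total_size_;
        total_size_ += size;
        return count_++;
    }

    unsigned size(unsigned i) const { return sizes_[i]; }
    unsigned offset(unsigned i) const { return offsets_[i]; }
    unsigned count() const { return count_; }
    unsigned total_size() const { return total_size_; }

private:
    // Double the capacity of both book-keeping arrays.
    void grow() {
        const unsigned new_capacity = capacity_ ? capacity_ * 2 : 16;
        sizes_ = grow_array(sizes_, new_capacity);
        offsets_ = grow_array(offsets_, new_capacity);
        capacity_ = new_capacity;
    }

    static unsigned *grow_array(unsigned *old, unsigned n) {
        void *p = std::realloc(old, n * sizeof(unsigned));
        if (!p)
            std::abort();  // this sketch does not try to recover from OOM
        return static_cast<unsigned *>(p);
    }

    unsigned *sizes_ = nullptr;
    unsigned *offsets_ = nullptr;
    unsigned count_ = 0;
    unsigned capacity_ = 0;
    unsigned total_size_ = 0;
};
```

Any code that manipulates the IR could then hold a reference to such an object and call `allocate()` without depending on a particular visitor, which is the layering benefit the message describes.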
* i965/vec4: Correct MUL destination hazardBen Widawsky2015-02-061-4/+4
| | | | | | | | | | | | | | | | | | | | | | As it turns out, we were over-thinking the cause of the hang on Cherryview. It's simply errata for Cherryview. commit 88fea85f09e2252035bec66ab26c375b45b000f5 Author: Ben Widawsky <[email protected]> Date: Fri Nov 21 10:47:41 2014 -0800 i965/vec4/gen8: Handle the MUL dest hazard exception This is an explanation to why we never saw the hang on BDW. NOTE: The problem the original patch was trying to fix does still exist. It will have to be fixed at some point. v2: Modify commit message, s/CHV/BDW Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212 Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.Matt Turner2015-01-231-0/+12
| | | | | | | | | total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
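The rewrite rests on the fact that -|x| can never be positive, so -|x| >= 0 holds exactly when x is zero. A small host-side check, modelling only the float case with ordinary IEEE comparisons (the function names are hypothetical and do not correspond to anything in the driver):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Models CMP.GE -(abs)reg 0 for a float register value.
bool cmp_ge_neg_abs(float x) { return -std::fabs(x) >= 0.0f; }

// Models CMP.Z reg 0 for the same value.
bool cmp_z(float x) { return x == 0.0f; }

// True when both forms give the same answer for x.
bool rewrite_agrees(float x) { return cmp_ge_neg_abs(x) == cmp_z(x); }
```

Both forms are true for +0.0 and -0.0 and false for every other value, including infinities and NaN, since any ordered comparison involving NaN is false.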
* i965/vec4: Make sure that imm writes are to registers in the same file.Matt Turner2015-01-151-2/+8
| | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887
* i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output.Kenneth Graunke2015-01-141-1/+1
| | | | | | | | | We were happily printing "Native code for unnamed vertex shader" and "VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output, as well as the KHR_debug output used by shader-db. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Pass a shader stage abbreviation to fs_generator().Kenneth Graunke2015-01-141-1/+1
| | | | | | | | | | | A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well. shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.Kenneth Graunke2015-01-041-1/+3
| | | | | | | | | | | | | | | Gen7.5+ platforms that support the "Shader Channel Select" feature leave key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE is GL_ALPHA (which is really uncommon). So, the precompile should leave them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well. We didn't notice this because prog->ShadowSamplers is not set correctly. The next patch will fix that problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix INTEL_DEBUG=optimizer with VF types.Kenneth Graunke2015-01-031-1/+1
| | | | | | | Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.Kenneth Graunke2015-01-031-8/+12
| | | | | | | | | | | | In order to support calling opt_vector_float() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We've used that elsewhere already. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
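For readers unfamiliar with statement expressions, the following sketch shows why the construct lets a pass macro be used inside a condition. It relies on the GNU extension documented at the URL above (accepted by GCC and Clang, not by standard C++), and the macro body, `stub_pass` and `run_passes` are simplified assumptions rather than the driver's real OPT() definition.

```cpp
#include <cassert>

// Simplified stand-in for an optimization pass: it just reports the
// progress value it was told to report.
static bool stub_pass(bool makes_progress) { return makes_progress; }

// A statement expression evaluates to its last expression, so the macro can
// update the surrounding locals and still yield a bool to the caller.
#define OPT(pass, ...) ({                        \
      pass_num++;                                \
      bool this_progress = pass(__VA_ARGS__);    \
      progress = progress || this_progress;      \
      this_progress;                             \
   })

struct PassSummary {
    bool progress;   // did any pass report progress?
    int pass_num;    // how many passes were run
    int followups;   // how many times the conditional clean-up ran
};

PassSummary run_passes(bool first_pass_makes_progress) {
    bool progress = false;
    int pass_num = 0;
    int followups = 0;

    // The macro's value drives the condition, mirroring the
    // "run clean-up passes only if the first pass did something" pattern.
    if (OPT(stub_pass, first_pass_makes_progress)) {
        OPT(stub_pass, false);
        followups++;
    }
    return PassSummary{progress, pass_num, followups};
}
```

Had OPT() been a plain do { ... } while (0) block, it could not appear inside the if condition, because such a block has no value.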
* i965/vec4: Do separate copy followed by constant propagation after ↵Matt Turner2014-12-291-1/+2
| | | | | | | | | | | | | | | | | opt_vector_float(). total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().Matt Turner2014-12-291-1/+5
| | | | | | | total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add pass to gather constants into a vector-float MOV.Matt Turner2014-12-291-0/+61
| | | | | | | | | | Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) Reviewed-by: Ian Romanick <[email protected]>
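The gathering condition stated above (consecutive instructions, same destination, all four channels written between them) can be illustrated with a toy model. The `MovImm` and `Gathered` types and `gather_vector_mov` are hypothetical; the sketch also ignores the additional requirement that each constant be representable as a vector-float immediate, which the real pass has to check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: MOV of one float immediate into the channels selected by
// `writemask` (bit 0 = X ... bit 3 = W) of virtual register `dst`.
struct MovImm {
    int dst;
    unsigned writemask;
    float value;
};

struct Gathered {
    bool ok;                      // true if a full vec4 was collected
    std::size_t count;            // number of instructions consumed
    std::array<float, 4> values;  // per-channel constants
};

// Starting at `start`, walk consecutive MOVs to the same destination and
// stop as soon as they collectively cover all four channels.
Gathered gather_vector_mov(const std::vector<MovImm> &insts, std::size_t start) {
    Gathered g{};
    if (start >= insts.size())
        return g;

    const int dst = insts[start].dst;
    unsigned written = 0;
    for (std::size_t i = start; i < insts.size() && insts[i].dst == dst; i++) {
        for (unsigned c = 0; c < 4; c++) {
            if (insts[i].writemask & (1u << c))
                g.values[c] = insts[i].value;  // later writes win
        }
        written |= insts[i].writemask & 0xFu;
        g.count++;
        if (written == 0xFu) {
            g.ok = true;
            return g;
        }
    }
    g.count = 0;  // run ended before every channel was written
    return g;
}
```

A successful result stands for the single MOV of a four-component constant that would replace the `count` gathered instructions.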
* i965: Add support for saturating immediates.Matt Turner2014-12-291-0/+16
| | | | | | | I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add fs_reg/src_reg constructors that take vf[4].Matt Turner2014-12-291-0/+9
| | | | | | | | | | Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <[email protected]>
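The trade-off mentioned above, piecing four byte-sized values together versus copying them in one go, can be sketched as follows. The function names are hypothetical, the byte values in any usage are arbitrary placeholders rather than meaningful vector-float encodings, and the two forms only produce the same word on a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assemble the 32-bit immediate from four separate bytes with shifts and ORs.
std::uint32_t pack_vf_shift(std::uint8_t vf0, std::uint8_t vf1,
                            std::uint8_t vf2, std::uint8_t vf3) {
    return (std::uint32_t(vf0) << 0) | (std::uint32_t(vf1) << 8) |
           (std::uint32_t(vf2) << 16) | (std::uint32_t(vf3) << 24);
}

// Copy an already-built array of four bytes into the immediate in one step.
std::uint32_t pack_vf_memcpy(const std::uint8_t vf[4]) {
    std::uint32_t ud;
    std::memcpy(&ud, vf, sizeof(ud));
    return ud;
}

// The memcpy form matches the shift form only when the lowest-addressed
// byte is the least significant one.
bool host_is_little_endian() {
    const std::uint32_t one = 1;
    std::uint8_t first_byte;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}
```

When a caller has already generated its four values into an array, the array form avoids unpacking them into four separate arguments only to have them recombined.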
* i965/brw_reg: struct constructor now needs explicit negate and abs values.Andres Gomez2014-12-151-0/+2
| | | | | | | | | | | | | | | | | | | We were assuming, when constructing a new brw_reg struct, that the negate and abs register modifiers would not be present by default in the new register. Now, we force explicitly setting these values when constructing a new register. This will avoid problems like forgetting to properly set them when we are using a previous register to generate this new register, as it was happening in the dFdx and dFdy generation functions. Fixes piglit test shaders/glsl-deriv-varyings Cc: "10.4 10.3" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991 Reviewed-by: Matt Turner <[email protected]>