aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Allow spilling for SIMD16 compute shadersJordan Justen2016-03-081-0/+1
| | | | | | | | | | | | | | | | For fragment shaders, we can always use a SIMD8 program. Therefore, if we detect spilling with a SIMD16 program, then it is better to skip generating a SIMD16 program to only rely on a SIMD8 program. Unfortunately, this doesn't work for compute shaders. For a compute shader, we may be required to use SIMD16 if the local workgroup size is bigger than a certain size. For example, on gen7, if the local workgroup size is larger than 512, then a SIMD16 program is required. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93840 Signed-off-by: Jordan Justen <[email protected]> Cc: "11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Optimize float conversions of byte/word extract.Matt Turner2016-03-041-0/+2
| | | | | | | | | | | | | | | | | | instructions in affected programs: 31535 -> 29966 (-4.98%) helped: 23 cycles in affected programs: 272648 -> 266022 (-2.43%) helped: 14 HURT: 1 The patch decreases the number of instructions in the two Unigine programs by: #1721: 4374 -> 4155 instructions (-5.01%) #1706: 3582 -> 3363 instructions (-6.11%) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Lower min/max after optimization on Gen4/5.Matt Turner2016-02-171-0/+1
| | | | | | | | | | | | | | | | | | | Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more. On Ironlake: total instructions in shared programs: 6426035 -> 6422753 (-0.05%) instructions in affected programs: 326604 -> 323322 (-1.00%) helped: 1411 total cycles in shared programs: 129184700 -> 129101586 (-0.06%) cycles in affected programs: 18950290 -> 18867176 (-0.44%) helped: 2419 HURT: 328 Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: Refactor setup_payload_gen6 to assume FSJason Ekstrand2016-02-111-2/+2
| | | | | | | | | It's extremely FS specific so the fact that we have a stage check in the middle of it is rather bogus. While were here, we rename setup_payload_gen4 and setup_payload_gen6 to make it obvious that they are both FS specific. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Plumb separate surfaces and samplers through from NIRJason Ekstrand2016-02-091-0/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Separate the sampler from the surface in generate_texJason Ekstrand2016-02-091-0/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: move to compiler/Emil Velikov2016-01-261-1/+1
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* nir: move to compiler/Emil Velikov2016-01-261-1/+1
| | | | | | Signed-off-by: Emil Velikov <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jose Fonseca <[email protected]>
* i965/fs/generator: Take an actual shader stage rather than a stringJason Ekstrand2016-01-151-2/+2
| | | | | | Cc: "11.1" <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add tessellation evaluation shadersKenneth Graunke2015-12-221-1/+9
| | | | | | | | | | | | | | | | | | | The TES is essentially a post-tessellator VS, which has access to the entire TCS output patch, and a special gl_TessCoord input. Otherwise, they're very straightforward. This patch implements SIMD8 tessellation evaluation shaders for Gen8+. The tessellator can generate a lot of geometry, so operating in SIMD8 mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only 2 vertices per thread). I have another patch which implements SIMD4x2 mode for older hardware (or via an environment variable override). We currently handle all inputs via the pull model. v2: Improve comments (suggested by Jordan Justen). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nir: Get rid of *_indirect variants of input/output load/store intrinsicsJason Ekstrand2015-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is some special-casing needed in a competent back-end. However, they can do their special-casing easily enough based on whether or not the offset is a constant. In the mean time, having the *_indirect variants adds special cases a number of places where they don't need to be and, in general, only complicates things. To complicate matters, NIR had no way to convdert an indirect load/store to a direct one in the case that the indirect was a constant so we would still not really get what the back-ends wanted. The best solution seems to be to get rid of the *_indirect variants entirely. This commit is a bunch of different changes squashed together: - nir: Get rid of *_indirect variants of input/output load/store intrinsics - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect - nir/lower_io: Get rid of load/store_foo_indirect - i965/fs: Get rid of load/store_foo_indirect - i965/vec4: Get rid of load/store_foo_indirect - tgsi_to_nir: Get rid of load/store_foo_indirect - ir3/nir: Use the new unified io intrinsics - vc4: Do all uniform loads with byte offsets - vc4/nir: Use the new unified io intrinsics - vc4: Fix load_user_clip_plane crash - vc4: add missing src for store outputs - vc4: Fix state uniforms - nir/lower_clip: Update to the new load/store intrinsics - nir/lower_two_sided_color: Update to the new load intrinsic NIR and i965 changes are Reviewed-by: Kenneth Graunke <[email protected]> NIR indirect declarations and vc4 changes are Reviewed-by: Eric Anholt <[email protected]> ir3 changes are Reviewed-by: Rob Clark <[email protected]> NIR changes are Acked-by: Rob Clark <[email protected]>
* i965/nir: Implement shared variable atomic operationsJordan Justen2015-12-091-0/+2
| | | | | | | | | v3: * Update based on latest SSBO code (Iago) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Fix scalar vertex shader struct outputs.Kenneth Graunke2015-11-251-0/+2
| | | | | | | | | | | | | | | | | | | | While we correctly set output[] for composite varyings, we set completely bogus values for output_components[], making emit_urb_writes() output zeros instead of the actual values. Unfortunately, our simple approach goes out the window, and we need to recurse into structs to get the proper value of vector_elements for each field. Together with the previous patch, this fixes rendering in an upcoming game from Feral Interactive. v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt). Cc: "11.1 11.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix fragment shader struct inputs.Kenneth Graunke2015-11-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Apparently we have literally no support for FS varying struct inputs. This is somewhat surprising, given that we've had tests for that very feature that have been passing for a long time. Normally, varying packing splits up structures for us, so we don't see them in the backend. However, with SSO, varying packing isn't around to save us, and we get actual structs that we have to handle. This patch changes fs_visitor::emit_general_interpolation() to work recursively, properly handling nested structs/arrays/and so on. (It's easier to read with diff -b, as indentation changes.) When using the vec4 VS backend, this fixes rendering in an upcoming game from Feral Interactive. (The scalar VS backend requires additional bug fixes in the next patch.) v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt). Cc: "11.1 11.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-20/+0
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Use NIR for lowering texture swizzleJason Ekstrand2015-11-231-4/+0
| | | | | | | | | | Now that nir_lower_tex can do texture swizzle lowering, we can use that instead of repeating more-or-less the same code in both backends. This both allows us to share code and means that things like the tg4 work-arounds are somewhat simpler because they don't have to take the swizzle into account. Reviewed-by: Connor Abbott <[email protected]>
* i965: Use nir_lower_tex for texture coordinate loweringJason Ekstrand2015-11-231-3/+0
| | | | | | | | | | Previously, we had a rescale_texcoords helper in the FS backend for handling rescaling of texture coordinates. Now that we can do variants in NIR, we can use nir_lower_tex to do the rescaling for us. This allows us to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and GL_CLAMP handling in vertex and geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Allow indirect GS input indexing in the scalar backend.Kenneth Graunke2015-11-181-1/+2
| | | | | | | | | | | | | | | | | | This allows arbitrary non-constant indices on GS input arrays, both for the vertex index, and any array offsets beyond that. All indirects are handled via the pull model. We could potentially handle indirect addressing of pushed data as well, but it would add additional code complexity, and we usually have to pull inputs anyway due to the sheer volume of input data. Plus, marking pushed inputs as live due to indirect addressing could exacerbate register pressure problems pretty badly. We'd need to be careful. v2: Use updated MOV_INDIRECT opcode. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Abdiel Janulgue <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Introduce a MOV_INDIRECT opcode.Kenneth Graunke2015-11-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The geometry and tessellation control shader stages both read from multiple URB entries (one per vertex). The thread payload contains several URB handles which reference these separate memory segments. In GLSL, these inputs are represented as per-vertex arrays; the outermost array index selects which vertex's inputs to read. This array index does not necessarily need to be constant. To handle that, we need to use indirect addressing on GRFs to select which of the thread payload registers has the appropriate URB handle. (This is before we can even think about applying the pull model!) This patch introduces a new opcode which performs a MOV from a source using VxH indirect addressing (which allows each of the 8 SIMD channels to select distinct data.) Based on a patch by Jason Ekstrand. v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it a bit more generic. Use regs_read() instead of hacking up the register allocator. (Suggested by Jason Ekstrand.) v3: Fix regs_read() to be more accurate for small unaligned regions. Also rebase on Matt's work. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> [v3] Reviewed-by: Abdiel Janulgue <[email protected]> [v1]
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-2/+3
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-1/+1
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split nir_emit_intrinsic by stage with a general fallback.Kenneth Graunke2015-11-111-0/+8
| | | | | | | | | | | | | | | | | | | | | | Many intrinsics only apply to a particular stage (such as discard). In other cases, we may want to interpret them differently based on the stage (such as load_primitive_id or load_input). The current method isn't that pretty - we handle all intrinsics in one giant function. Sometimes we assert on stage, sometimes we forget. Different behaviors are handled via if-ladders based on stage. This commit introduces new nir_emit_<stage>_intrinsic() functions, and makes nir_emit_instr() call those. In turn, those fall back to the generic nir_emit_intrinsic() function for cases they don't want to handle specially. This makes it clear which intrinsics only exist in one stage, and makes it easy to handle inputs/outputs differently for various stages. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add scalar geometry shader support.Kenneth Graunke2015-11-031-1/+16
| | | | | | | | | | | | | | | | | | | | This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg) - Handle stride and subreg_offset correctly for ATTRs; use a helper. - Fix missing emit_shader_time_end() call. - Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end(). - Use proper D/UD type in intexp2(). - Fix "EndPrimitve" and "to that" typos. - Assert that invocations == 1 so we know this is missing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Fix the fs_visitor GS constructor to take shader_time_index.Kenneth Graunke2015-11-031-1/+2
| | | | | | | | Jason reworked this so it isn't simply ST_GS anymore...it's either -1 (not enabled) or an actual offset. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: split out calculation of payload live rangesConnor Abbott2015-10-301-0/+2
| | | | | | | | We'll need this for the scheduler too, since it wants to know when the live ranges of payload registers end in order to model them in our register pressure calculations. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Implement ARB_shader_stencil_export (gen9+)Ben Widawsky2015-10-211-0/+3
| | | | | | | | | | | | | v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a fs_visitor constructor that takes a brw_gs_compile.Kenneth Graunke2015-10-211-1/+10
| | | | | | | | | | | Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains the input VUE map and information about the control data headers. Passing this in allows us to share that code in brw_gs.c, and calculate them before deciding on vec4 vs. scalar mode, as it's independent of that choice. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode.Kenneth Graunke2015-10-211-0/+1
| | | | | | | | | | | | In scalar mode, geometry shader inputs can easily take up hundreds of registers. This makes pushing VUE entries impractical; we'll need to resort to the pull model in some cases. To support this, we introduce a new opcode corresponding to the "URB Read SIMD8" message. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Use a const nir_shader in backend_shaderJason Ekstrand2015-10-191-1/+1
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Remove the gl_program from the generatorJason Ekstrand2015-10-191-3/+0
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* nir: remove dependency on glslRob Clark2015-10-161-1/+1
| | | | | | | | | | | | | | | Move glsl_types into NIR, now that the dependency on glsl_symbol_table has been split out. Possibly makes sense to rename things at this point, but if we do that I'd like to keep it split out into a separate patch to make git history easier to follow (IMHO). v2: fix android build v3: I f***ing hate scons.. but at least it builds Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* i965/shader: Get rid of the shader, prog, and shader_prog fieldsJason Ekstrand2015-10-021-2/+2
| | | | | | | | | | Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transfom feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs,vec4: Get rid of the sanity_param_countJason Ekstrand2015-10-021-1/+0
| | | | | | | | It doesn't exist for anything other than an assert that, as far as I can tell, isn't possible to trip. Soon, we will remove prog from the visitor entirely and this will become even more impossible to hit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move sampler unit lookup into rescale_texcoordJason Ekstrand2015-10-021-3/+2
| | | | | | | | | The texunit variable we create and assign in nir_emit_texture gets passed through two more layers of function calls before it gets to its sole use in rescale_texcoord. The best part is that we already pass the sampler into rescale_texcoord so we can just look it up there. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/backend_shader: Add a field to store the NIR shaderJason Ekstrand2015-10-021-4/+4
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move binding table setup to codegen time.Jason Ekstrand2015-10-021-2/+0
| | | | | | | | | Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Pull GLSL uniform handling into a common functionJason Ekstrand2015-10-021-2/+0
| | | | | | | | The way we deal with GLSL uniforms and builtins is basically the same in both the vec4 and the fs backend. This commit takes the best parts of both implementations and pulls the common code into a shared helper function. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/shader: Get rid of the setup_vec4_uniform_value helperJason Ekstrand2015-10-021-4/+0
| | | | | | It's not used by anything anymore Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/cs: Add a binding table entry for gl_NumWorkGroupsJordan Justen2015-09-291-1/+2
| | | | | | | | | | | | | If glDispatchComputeIndirect is used, then the value for this variable must be read from the indirect BO. To allow the same generated code to support indirect and glDispatchCompute, we will also setup a BO for the number of work groups using the intel_upload_data mechanism. This will only be required if the gl_NumWorkGroups variable is accessed. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/nir/fs: Implement nir_intrinsic_ssbo_atomic_*Iago Toral Quiroga2015-09-251-0/+2
| | | | Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+3
| | | | | Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Add a very basic validation passJason Ekstrand2015-09-151-0/+1
| | | | | | | | Currently the validation pass only validates that regs_read and regs_written are consistent with the sizes of VGRF's. We can add more as we find it to be useful. Reviewed-by: Matt Turner <[email protected]>
* i965/cs: Initialize gl_WorkGroupID variable from payloadJordan Justen2015-09-131-0/+1
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/cs: Initialize gl_LocalInvocationID from payloadJordan Justen2015-09-131-0/+1
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/cs: Reserve local invocation id in payload regsJordan Justen2015-09-131-0/+1
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Move brw_setup_tex_for_precompile to brw_program.[ch].Kenneth Graunke2015-09-031-3/+0
| | | | | | | | | | | | This living in brw_fs.{h,cpp} is a historical artifact of us supporting texturing for fragment shaders before any other stages. It's kind of awkward given that we use it for all stages. This avoids having to include brw_fs.h in geometry shader code in order to access this function. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Remove fs_visitor::try_replace_with_sel().Matt Turner2015-08-281-1/+0
| | | | | | | No shader-db changes on g4x, snb, hsw, or bdw. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Combine assign_constant_locations and ↵Jason Ekstrand2015-08-251-1/+0
| | | | | | | | | | move_uniform_array_access_to_pull_constants The comment above move_uniform_array_access_to_pull_constants was completely bogus because it has nothing to do with lowering instructions. Instead, it's assiging locations of pull constants. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rework uniform handlingJason Ekstrand2015-08-251-3/+0
| | | | | | | | | | | Previously, we treated the entire UNIFORM file as if it had two elements: One for direct things and one for indirect. This is substantially different from how the old visitor code handled it where each element was effectively its own uniform. This commit makes the NIR path more like the old ir_visitor path where each uniform is separate. This should allow us to more easily make decisions about what to push. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move type_size() methods out of visitor classes.Kenneth Graunke2015-08-251-1/+0
| | | | | | | | I want to use C function pointers to these, and they don't use anything in the visitor classes anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>