summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* i965: Remove artificial dependency between math instructions.Matt Turner2014-07-081-1/+2
| | | | | | ... on Gen6+. I'm not actually sure which class Gen6 fits into. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Track dependencies in instruction scheduling per reg offset.Matt Turner2014-07-081-8/+15
| | | | | | | | | | | | | | | | | | | | | Previously instruction scheduling tracked dependencies on a per-register basis. This meant that there was an artificial dependency between interpolation instructions writing into the same virtual register. Instruction scheduling would insert a number of instructions between the two instructions in this example, when they are actually independent. linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F This lead to cases where the first texture coordinate is interpolated at the beginning of the shader, but the second is done immediately before the texture operation that uses it as a source. After this change, the artificial dependency is removed and the interpolation instructions are scheduled together. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Extend compute-to-mrf pass to understand blocks of MOVsKristian Høgsberg2014-07-071-10/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders that end with a texture fetch follwed by an fb write are left like this: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; This patch lets compute-to-mrf recognize blocks of MOVs and match them to instructions (typically SEND) that writes multiple registers. With this, the above shader becomes: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; which is the bulk of the shader db results: total instructions in shared programs: 987040 -> 986720 (-0.03%) instructions in affected programs: 844 -> 524 (-37.91%) GAINED: 0 LOST: 0 The optimization also applies to MRT shaders that write the same color value to multiple RTs, in which case we can eliminate four MOVs in a similar fashion. See fbo-drawbuffers2-blend in piglit for an example. No measurable performance impact. No piglit regressions. Signed-off-by: Kristian Høgsberg <[email protected]>
* i965/fs: Disable unlit_centroid_workaround on Haswell.Matt Turner2014-07-061-2/+4
| | | | | | | | Although the HSW PRM shows it, the BSpec lists this workaround as being for Ivybridge only. total instructions in shared programs: 1994951 -> 1993675 (-0.06%) instructions in affected programs: 27325 -> 26049 (-4.67%)
* i965/vec4: Perform CSE on CMP(N) instructions.Matt Turner2014-07-061-1/+16
| | | | | | | | Port of commit b16b3c87 to the vec4 code. No shader-db improvements, but might as well. The fs backend saw an improvement because it's scalar and multiple identical CMP instructions were generated by the SEL peepholes.
* i965/vec4: Don't emit null MOVs in CSE.Matt Turner2014-07-061-5/+7
| | | | Port of commit 219b43c6 to the vec4 code.
* i965/vec4: Improve CSE performance by expiring some available expressions.Matt Turner2014-07-061-0/+20
| | | | Port of commit 5daf867f to the vec4 code.
* i965/vec4: Add basic common subexpression elimination.Kenneth Graunke2014-07-064-0/+236
| | | | | | | | | | | [mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before. total instructions in shared programs: 1995633 -> 1995185 (-0.02%) instructions in affected programs: 14410 -> 13962 (-3.11%) Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix warnings introduced in commit e24ef5ab.Matt Turner2014-07-061-2/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move assembly annotation functions to intel_asm_annotation.c.Matt Turner2014-07-054-61/+67
| | | | | | It's C. Compile it as such. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename intel_asm_printer -> intel_asm_annotation.Matt Turner2014-07-058-7/+7
| | | | | | The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make backend_instruction usable from C.Matt Turner2014-07-051-4/+7
| | | | | | | With a hack to place an exec_node in the struct in C to be at the same location as the inherited exec_node in C++. Acked-by: Topi Pohjolainen <[email protected]>
* i965/cfg: Make cfg_t usable from C.Matt Turner2014-07-053-8/+6
| | | | Acked-by: Topi Pohjolainen <[email protected]>
* i965: Repack backend_instruction struct.Matt Turner2014-07-051-7/+5
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_predicate enum.Matt Turner2014-07-056-31/+35
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_conditional_mod enum.Matt Turner2014-07-0518-43/+54
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move common fields into backend_instruction.Matt Turner2014-07-053-25/+13
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use enum brw_reg_type for register types.Matt Turner2014-07-057-13/+14
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move is_zero/one/null/accumulator into backend_reg.Matt Turner2014-07-056-93/+44
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a common backend_reg class.Matt Turner2014-07-054-42/+36
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Drop imm union from visitor register classes.Matt Turner2014-07-052-14/+0
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use immediate storage in brw_reg for visitor regs.Matt Turner2014-07-056-41/+37
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* gallium: rename PIPE_CAP_TGSI_VS_LAYER to also have _VIEWPORTIlia Mirkin2014-07-032-2/+2
| | | | | | | | | Now that this cap is used to determine the availability of both, adjust its name to reflect the new reality. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: enable AMD_vertex_shader_viewport_indexIlia Mirkin2014-07-032-0/+6
| | | | | | | | | | The assumption is that any driver capable of emitting layer from the vertex shader and supporting viewports should be able to also handle emitting viewport index from the vertex shader. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Tested-by: Tobias Droste <[email protected]>
* i965: expose AMD_vertex_shader_viewport_index on gen7+Ilia Mirkin2014-07-021-1/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: add support for AMD_vertex_shader_viewport_indexIlia Mirkin2014-07-022-0/+2
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Tested-by: Tobias Droste <[email protected]>
* mesa/st: enable ARB_fragment_layer_viewportIlia Mirkin2014-07-021-0/+1
| | | | | | | | | | If multiple viewports are supported, that implies the presence of a GS and layered rendering, so we can enable ARB_fragment_layer_viewport as well. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* i965/gen6: Add a spec citation about push constant packet requirements.Eric Anholt2014-07-021-1/+8
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a comment about null renderbuffer surfaces and why they exist.Eric Anholt2014-07-023-2/+22
| | | | | | I noticed this when trying to find comments about pull constant buffers. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Update a ton of comments about constant buffers.Eric Anholt2014-07-024-32/+74
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Merge VS/GS and WM pull constant buffer upload paths.Eric Anholt2014-07-024-53/+42
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6+: Merge VS/GS and WM push constant buffer upload paths.Eric Anholt2014-07-024-66/+52
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.Eric Anholt2014-07-0214-35/+36
| | | | | | | I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix state flags for gen4/5 CURBE.Eric Anholt2014-07-021-8/+8
| | | | | | | | If we had some NOS affecting VS compilation that resulted in optimization changing the set of constants to be uploaded, we might not have reuploaded the constants. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove a dead define.Eric Anholt2014-07-021-2/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reuse libdrm's header for AUB definitions.Eric Anholt2014-07-024-71/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix stale comments about the state cache.Eric Anholt2014-07-022-2/+9
| | | | | | This changed in the state streaming work years ago. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix stale binding table comment.Eric Anholt2014-07-021-2/+0
| | | | | | | I recently moved the code from the mentioned location right into this file. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop the memcmp for finding duplicated CURBE uploads.Eric Anholt2014-07-024-50/+2
| | | | | | | | | | | | | | At this point, the extra copy of the data and memcmp are as expensive as just re-uploading. Note: now that we'll always upload, and brw_constant_buffer watches BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo at batch reset time. No significant performance difference on glamor copywinwin10 (n=55), despite that test having a 98% hit rate on the cache. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reuse intel_upload.c for gen4/5 constant buffers.Eric Anholt2014-07-023-31/+7
| | | | | | No performance difference on glamor with copywinwin10 (n=40) on my gm45. Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: add support for indirect drawingChristoph Bumiller2014-07-023-1/+14
|
* xmlconfig/dri: bool -> unsigned charDave Airlie2014-07-023-10/+8
| | | | | | | | | Drop stdbool, due to the X server being a pain and having struct members called bool, although I've sent a patch to fix that we should retain stupidity here. Use unsigned char which is what GLboolean is anyways. Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Update discard jump to preserve uniform loads via sampler.Cody Northrop2014-07-011-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 17c7ead7 exposed a bug in how uniform loading happens in the presence of discard. It manifested itself in an application as randomly incorrect pixels on the borders of conditional areas. This is due to how discards jump to the end of the shader incorrectly for some channels. The current implementation checks each 2x2 subspan to preserve derivatives. When uniform loading via samplers was turned on, it uses a full execution mask, as stated in lower_uniform_pull_constant_loads(), and only populates four channels of the destination (see generate_uniform_pull_constant_load_gen7()). It happens incorrectly when the first subspan has been jumped over. The series that implemented this optimization was done before the changes to use samplers for uniform loads. Uniform sampler loads use special execution masks and only populate four channels, so we can't jump over those or corruption ensues. This fix only jumps to the end of the shader if all relevant channels are disabled, i.e. all 8 or 16, depending on dispatch. This preserves the original GLbenchmark 2.7 speedup noted in commit beafced2. It changes the shader assembly accordingly: before : (-f0.1.any4h) halt(8) 17 2 null { align1 WE_all 1Q }; after(8) : (-f0.1.any8h) halt(8) 17 2 null { align1 WE_all 1Q }; after(16): (-f0.1.any16h) halt(16) 17 2 null { align1 WE_all 1H }; v2: Cleaned up comments and conditional ordering. v3: Fix typo. Signed-off-by: Cody Northrop <[email protected]> Reviewed-by: Mike Stroyan <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
* i965/fs: Mark case unreachable to silence warning.Matt Turner2014-07-011-0/+2
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Use unreachable() instead of unconditional assert().Matt Turner2014-07-0155-324/+182
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Make unreachable macro take a string argument.Matt Turner2014-07-016-14/+16
| | | | | | To aid in debugging. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Remove useless conditionals.Matt Turner2014-07-011-6/+3
| | | | | Setting a couple of bits is the same cost or less as conditionally setting a couple of bits.
* i965/fs: Pass cfg to calculate_live_intervals().Matt Turner2014-07-016-12/+15
| | | | | | | We've often created the CFG immediately before, so use it when available. Reviewed-by: Ian Romanick <[email protected]>
* i965: Mark fields in the live interval classes protected.Matt Turner2014-07-012-16/+19
| | | | | | | | cfg, for instance, is a pointer to a local variable in calculate_live_intervals, certainly not valid after that function has returned. Reviewed-by: Ian Romanick <[email protected]>
* i965: Use typed foreach_in_list_safe instead of foreach_list_safe.Matt Turner2014-07-0111-55/+20
| | | | Acked-by: Ian Romanick <[email protected]>