path: root/src/mesa
Commit message (Author, Age, Files, Lines)
* st/mesa: add PIPE_FORMAT_R10G10B10A2_UNORM to format_map table (Brian Paul, 2014-07-09, 1 file, -1/+2)
    as a candidate for the GL_RGB10_A2 internal texture format.

    Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: add some missing MESA/PIPE_FORMAT_R10G10B10A2_UNORM switch cases (Brian Paul, 2014-07-09, 1 file, -0/+4)
    Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: fix geometry shader memory leak (Brian Paul, 2014-07-09, 1 file, -0/+1)
    Spotted by Charmaine Lee.

    Cc: "10.2" <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Charmaine Lee <[email protected]>
* mesa: fix geometry shader memory leaks (Brian Paul, 2014-07-09, 2 files, -0/+4)
    Spotted by Charmaine Lee.

    Cc: "10.2" <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: minor simplification of some state atom assignments (Brian Paul, 2014-07-09, 2 files, -7/+4)
* st/mesa: minor fix-up in st_GetSamplePosition() (Brian Paul, 2014-07-09, 1 file, -2/+4)
    If the driver doesn't implement get_sample_position(), let's return
    some non-garbage values.
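The fallback described in the st_GetSamplePosition() commit can be sketched roughly as follows. This is a minimal illustration, not the actual Mesa code: the `sketch_driver` struct, the function names, and the choice of the pixel centre (0.5, 0.5) as the default are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver vtable with an optional hook. */
struct sketch_driver {
   void (*get_sample_position)(unsigned sample_count,
                               unsigned sample_index,
                               float *out_xy);
};

/* Query a sample position, falling back to a fixed value when the
 * driver leaves the hook unimplemented (NULL). */
static void
sketch_get_sample_position(const struct sketch_driver *drv,
                           unsigned sample_count,
                           unsigned sample_index,
                           float out_xy[2])
{
   assert(drv != NULL && out_xy != NULL);

   if (drv->get_sample_position) {
      drv->get_sample_position(sample_count, sample_index, out_xy);
   } else {
      /* Assumption: the pixel centre is a sensible non-garbage default. */
      out_xy[0] = 0.5f;
      out_xy[1] = 0.5f;
   }
}
```

The point of the pattern is that the caller's output buffer is always written, so it never carries uninitialized values back to the application.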
* mesa: use float to silence MSVC warning in _mesa_GetMultisamplefv() (Brian Paul, 2014-07-09, 1 file, -1/+1)
* i965/disasm: Fix disassembly of the any16h/all16h predicates. (Kenneth Graunke, 2014-07-08, 1 file, -1/+1)
    BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
    "all16h", and ALL16H would probably print as "(null)".

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
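A symptom like the one in the any16h/all16h commit is typical of a name table built with designated initializers in which one index is used twice: the later entry overwrites the earlier one, and the skipped slot stays NULL. Whether that was the exact cause here is an assumption; the sketch below uses made-up `sketch_*` names and enum values, not the real i965 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate enum; the values are illustrative only. */
enum sketch_predicate {
   SKETCH_PREDICATE_NONE,
   SKETCH_PREDICATE_NORMAL,
   SKETCH_PREDICATE_ANY16H,
   SKETCH_PREDICATE_ALL16H,
   SKETCH_PREDICATE_COUNT
};

/* One entry per enum value. Writing [SKETCH_PREDICATE_ANY16H] twice
 * instead of ANY16H and ALL16H would make ANY16H print as ".all16h"
 * and leave the ALL16H slot NULL. */
static const char *const sketch_predicate_names[SKETCH_PREDICATE_COUNT] = {
   [SKETCH_PREDICATE_NONE]   = "",
   [SKETCH_PREDICATE_NORMAL] = "",
   [SKETCH_PREDICATE_ANY16H] = ".any16h",
   [SKETCH_PREDICATE_ALL16H] = ".all16h",
};

static const char *
sketch_predicate_name(enum sketch_predicate p)
{
   assert((unsigned)p < SKETCH_PREDICATE_COUNT);
   return sketch_predicate_names[p];
}

/* Returns 1 when no slot was left NULL by a missing initializer. */
static int
sketch_predicate_table_is_complete(void)
{
   for (unsigned i = 0; i < SKETCH_PREDICATE_COUNT; i++) {
      if (sketch_predicate_names[i] == NULL)
         return 0;
   }
   return 1;
}
```

A completeness check of this kind, run once in a debug build, is one cheap way to catch a table slot that was never initialized.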
* i965: Remove artificial dependency between math instructions. (Matt Turner, 2014-07-08, 1 file, -1/+2)
    ... on Gen6+. I'm not actually sure which class Gen6 fits into.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Track dependencies in instruction scheduling per reg offset. (Matt Turner, 2014-07-08, 1 file, -8/+15)
    Previously instruction scheduling tracked dependencies on a
    per-register basis. This meant that there was an artificial
    dependency between interpolation instructions writing into the same
    virtual register. Instruction scheduling would insert a number of
    instructions between the two instructions in this example, when they
    are actually independent.

       linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
       linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F

    This led to cases where the first texture coordinate is interpolated
    at the beginning of the shader, but the second is done immediately
    before the texture operation that uses it as a source.

    After this change, the artificial dependency is removed and the
    interpolation instructions are scheduled together.

    Reviewed-by: Kenneth Graunke <[email protected]>
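The difference between per-register and per-offset tracking can be shown with a toy model. The sketch below is not the scheduler's code: it only counts write-after-write dependencies over a linear list of writes, and every name and limit in it (`sketch_write`, `SKETCH_MAX_VGRFS`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_VGRFS   16
#define SKETCH_MAX_OFFSETS 4

/* A write to one register-sized slice of a virtual register. */
struct sketch_write {
   unsigned vgrf;
   unsigned reg_offset;
};

/* Count write-after-write dependencies. With per_offset == false all
 * offsets of a vgrf share one tracking slot, so two writes to
 * different offsets of the same vgrf look dependent. */
static unsigned
sketch_count_waw_deps(const struct sketch_write *writes, unsigned count,
                      bool per_offset)
{
   int last_writer[SKETCH_MAX_VGRFS][SKETCH_MAX_OFFSETS];
   unsigned deps = 0;

   for (unsigned v = 0; v < SKETCH_MAX_VGRFS; v++) {
      for (unsigned o = 0; o < SKETCH_MAX_OFFSETS; o++)
         last_writer[v][o] = -1;
   }

   for (unsigned i = 0; i < count; i++) {
      assert(writes[i].vgrf < SKETCH_MAX_VGRFS);
      assert(writes[i].reg_offset < SKETCH_MAX_OFFSETS);

      unsigned slot = per_offset ? writes[i].reg_offset : 0;
      if (last_writer[writes[i].vgrf][slot] >= 0)
         deps++;
      last_writer[writes[i].vgrf][slot] = (int)i;
   }
   return deps;
}
```

Feeding it the two linterp writes from the commit message (vgrf8+0 and vgrf8+1) gives one dependency under per-register tracking and none under per-offset tracking, which is the artificial edge the commit removes.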
* i965: Extend compute-to-mrf pass to understand blocks of MOVs (Kristian Høgsberg, 2014-07-07, 1 file, -10/+53)
    The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders
    that end with a texture fetch followed by an fb write are left like this:

       0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
       0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

    This patch lets compute-to-mrf recognize blocks of MOVs and match them
    to instructions (typically SEND) that write multiple registers. With
    this, the above shader becomes:

       0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
       0x00000010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
       0x00000020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

    which is the bulk of the shader db results:

       total instructions in shared programs: 987040 -> 986720 (-0.03%)
       instructions in affected programs:     844 -> 524 (-37.91%)
       GAINED:                                0
       LOST:                                  0

    The optimization also applies to MRT shaders that write the same
    color value to multiple RTs, in which case we can eliminate four MOVs
    in a similar fashion. See fbo-drawbuffers2-blend in piglit for an
    example.

    No measurable performance impact. No piglit regressions.

    Signed-off-by: Kristian Høgsberg <[email protected]>
* i965/fs: Disable unlit_centroid_workaround on Haswell. (Matt Turner, 2014-07-06, 1 file, -2/+4)
    Although the HSW PRM shows it, the BSpec lists this workaround as
    being for Ivybridge only.

       total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
       instructions in affected programs:     27325 -> 26049 (-4.67%)
* i965/vec4: Perform CSE on CMP(N) instructions. (Matt Turner, 2014-07-06, 1 file, -1/+16)
    Port of commit b16b3c87 to the vec4 code.

    No shader-db improvements, but might as well. The fs backend saw an
    improvement because it's scalar and multiple identical CMP
    instructions were generated by the SEL peepholes.
* i965/vec4: Don't emit null MOVs in CSE. (Matt Turner, 2014-07-06, 1 file, -5/+7)
    Port of commit 219b43c6 to the vec4 code.
* i965/vec4: Improve CSE performance by expiring some available expressions. (Matt Turner, 2014-07-06, 1 file, -0/+20)
    Port of commit 5daf867f to the vec4 code.
* i965/vec4: Add basic common subexpression elimination. (Kenneth Graunke, 2014-07-06, 4 files, -0/+236)
    [mattst88]: Modified to perform CSE on instructions with the same
    writemask. Offered no improvement before.

       total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
       instructions in affected programs:     14410 -> 13962 (-3.11%)

    Reviewed-by: Matt Turner <[email protected]>
    Signed-off-by: Kenneth Graunke <[email protected]>
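The idea behind the vec4 CSE commits, including the writemask condition mentioned in the [mattst88] note, can be sketched with a tiny local pass. This is not the i965 implementation: the `sketch_inst` layout and opcodes are hypothetical, and the sketch assumes every register is written at most once, so no invalidation or expiry of available expressions is needed.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_op { SKETCH_OP_ADD, SKETCH_OP_MUL, SKETCH_OP_MOV };

/* Hypothetical instruction: dst = op(src0, src1) under a writemask. */
struct sketch_inst {
   enum sketch_op op;
   int dst;
   int src0;
   int src1;
   unsigned writemask;
};

/* Two instructions compute the same value only if opcode, sources and
 * writemask all match. */
static bool
sketch_same_expr(const struct sketch_inst *a, const struct sketch_inst *b)
{
   return a->op == b->op &&
          a->src0 == b->src0 &&
          a->src1 == b->src1 &&
          a->writemask == b->writemask;
}

/* Rewrite each repeated expression as a MOV from the first result.
 * Returns the number of instructions rewritten. */
static unsigned
sketch_local_cse(struct sketch_inst *insts, unsigned count)
{
   unsigned replaced = 0;

   for (unsigned i = 0; i < count; i++) {
      if (insts[i].op == SKETCH_OP_MOV)
         continue;

      for (unsigned j = 0; j < i; j++) {
         if (insts[j].op != SKETCH_OP_MOV &&
             sketch_same_expr(&insts[j], &insts[i])) {
            insts[i].op = SKETCH_OP_MOV;
            insts[i].src0 = insts[j].dst;
            insts[i].src1 = -1; /* unused by MOV in this sketch */
            replaced++;
            break;
         }
      }
   }
   return replaced;
}
```

A real pass also has to drop an available expression once one of its operands is overwritten, which is what the "expiring some available expressions" commit is concerned with; the single-assignment assumption sidesteps that here.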
* i965: Fix warnings introduced in commit e24ef5ab. (Matt Turner, 2014-07-06, 1 file, -2/+1)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move assembly annotation functions to intel_asm_annotation.c. (Matt Turner, 2014-07-05, 4 files, -61/+67)
    It's C. Compile it as such.

    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename intel_asm_printer -> intel_asm_annotation. (Matt Turner, 2014-07-05, 8 files, -7/+7)
    The #ifndef include guards already said the right thing :)

    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make backend_instruction usable from C. (Matt Turner, 2014-07-05, 1 file, -4/+7)
    With a hack to place an exec_node in the struct in C to be at the same
    location as the inherited exec_node in C++.

    Acked-by: Topi Pohjolainen <[email protected]>
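The C half of a layout trick like the one in the backend_instruction commit might look like the sketch below. It uses hypothetical `sketch_*` types rather than Mesa's, and it assumes that the C++ side places a single non-virtual base class at offset 0, which common ABIs do but which the C standard cannot promise on C++'s behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, standing in for the inherited base. */
struct sketch_node {
   struct sketch_node *next;
   struct sketch_node *prev;
};

/* In C the node is an explicit first member, so it sits where the
 * base-class subobject is assumed to sit in the C++ view. */
struct sketch_instruction {
   struct sketch_node link; /* must remain the first member */
   int opcode;
};

/* Fail the build if someone moves the node away from offset 0. */
_Static_assert(offsetof(struct sketch_instruction, link) == 0,
               "link must be the first member");

/* C guarantees a pointer to a struct's first member converts back to
 * a pointer to the struct itself. */
static struct sketch_instruction *
sketch_inst_from_node(struct sketch_node *node)
{
   assert(node != NULL);
   return (struct sketch_instruction *)node;
}
```

The static assertion is the part worth copying: it turns a silent layout mismatch between the two views of the struct into a compile error on the C side.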
* i965/cfg: Make cfg_t usable from C. (Matt Turner, 2014-07-05, 3 files, -8/+6)
    Acked-by: Topi Pohjolainen <[email protected]>
* i965: Repack backend_instruction struct. (Matt Turner, 2014-07-05, 1 file, -7/+5)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_predicate enum. (Matt Turner, 2014-07-05, 6 files, -31/+35)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_conditional_mod enum. (Matt Turner, 2014-07-05, 18 files, -43/+54)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move common fields into backend_instruction. (Matt Turner, 2014-07-05, 3 files, -25/+13)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use enum brw_reg_type for register types. (Matt Turner, 2014-07-05, 7 files, -13/+14)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move is_zero/one/null/accumulator into backend_reg. (Matt Turner, 2014-07-05, 6 files, -93/+44)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a common backend_reg class. (Matt Turner, 2014-07-05, 4 files, -42/+36)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Drop imm union from visitor register classes. (Matt Turner, 2014-07-05, 2 files, -14/+0)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use immediate storage in brw_reg for visitor regs. (Matt Turner, 2014-07-05, 6 files, -41/+37)
    Reviewed-by: Topi Pohjolainen <[email protected]>
* gallium: rename PIPE_CAP_TGSI_VS_LAYER to also have _VIEWPORT (Ilia Mirkin, 2014-07-03, 2 files, -2/+2)
    Now that this cap is used to determine the availability of both,
    adjust its name to reflect the new reality.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: enable AMD_vertex_shader_viewport_index (Ilia Mirkin, 2014-07-03, 2 files, -0/+6)
    The assumption is that any driver capable of emitting layer from the
    vertex shader and supporting viewports should be able to also handle
    emitting viewport index from the vertex shader.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Tested-by: Tobias Droste <[email protected]>
* i965: expose AMD_vertex_shader_viewport_index on gen7+ (Ilia Mirkin, 2014-07-02, 1 file, -1/+3)
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
* mesa: add support for AMD_vertex_shader_viewport_index (Ilia Mirkin, 2014-07-02, 2 files, -0/+2)
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
    Tested-by: Tobias Droste <[email protected]>
* mesa/st: enable ARB_fragment_layer_viewport (Ilia Mirkin, 2014-07-02, 1 file, -0/+1)
    If multiple viewports are supported, that implies the presence of a GS
    and layered rendering, so we can enable ARB_fragment_layer_viewport as
    well.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* i965/gen6: Add a spec citation about push constant packet requirements. (Eric Anholt, 2014-07-02, 1 file, -1/+8)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a comment about null renderbuffer surfaces and why they exist. (Eric Anholt, 2014-07-02, 3 files, -2/+22)
    I noticed this when trying to find comments about pull constant
    buffers.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Update a ton of comments about constant buffers. (Eric Anholt, 2014-07-02, 4 files, -32/+74)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Merge VS/GS and WM pull constant buffer upload paths. (Eric Anholt, 2014-07-02, 4 files, -53/+42)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6+: Merge VS/GS and WM push constant buffer upload paths. (Eric Anholt, 2014-07-02, 4 files, -66/+52)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data. (Eric Anholt, 2014-07-02, 14 files, -35/+36)
    I wanted to access this value from stage-generic code, so stop storing
    it under two different names.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix state flags for gen4/5 CURBE. (Eric Anholt, 2014-07-02, 1 file, -8/+8)
    If we had some NOS affecting VS compilation that resulted in
    optimization changing the set of constants to be uploaded, we might
    not have reuploaded the constants.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove a dead define. (Eric Anholt, 2014-07-02, 1 file, -2/+0)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reuse libdrm's header for AUB definitions. (Eric Anholt, 2014-07-02, 4 files, -71/+7)
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix stale comments about the state cache. (Eric Anholt, 2014-07-02, 2 files, -2/+9)
    This changed in the state streaming work years ago.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix stale binding table comment. (Eric Anholt, 2014-07-02, 1 file, -2/+0)
    I recently moved the code from the mentioned location right into this
    file.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop the memcmp for finding duplicated CURBE uploads. (Eric Anholt, 2014-07-02, 4 files, -50/+2)
    At this point, the extra copy of the data and memcmp are as expensive
    as just re-uploading.

    Note: now that we'll always upload, and brw_constant_buffer watches
    BRW_NEW_BATCH anyway, we don't need to explicitly unref the old
    curbe_bo at batch reset time.

    No significant performance difference on glamor copywinwin10 (n=55),
    despite that test having a 98% hit rate on the cache.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reuse intel_upload.c for gen4/5 constant buffers. (Eric Anholt, 2014-07-02, 3 files, -31/+7)
    No performance difference on glamor with copywinwin10 (n=40) on my
    gm45.

    Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: add support for indirect drawing (Christoph Bumiller, 2014-07-02, 3 files, -1/+14)
* xmlconfig/dri: bool -> unsigned char (Dave Airlie, 2014-07-02, 3 files, -10/+8)
    Drop stdbool, due to the X server being a pain and having struct
    members called bool. Although I've sent a patch to fix that, we
    should retain stupidity here.

    Use unsigned char, which is what GLboolean is anyways.

    Signed-off-by: Dave Airlie <[email protected]>