summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* glsl/glcpp: Fix preprocessor error condition for macro redefinitionAnuj Phogat2014-07-091-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch specifically fixes redefinition condition for white space changes. #define and #undef functionality in GLSL follows the standard for C++ preprocessors for macro definitions. From https://gcc.gnu.org/onlinedocs/cpp/Undefining-and-Redefining-Macros.html: These definitions are effectively the same: #define FOUR (2 + 2) #define FOUR (2 + 2) #define FOUR (2 /* two */ + 2) but these are not: #define FOUR (2 + 2) #define FOUR ( 2+2 ) #define FOUR (2 * 2) #define FOUR(score,and,seven,years,ago) (2 + 2) Fixes Khronos GLES3 CTS tests; invalid_object_whitespace_vertex invalid_object_whitespace_fragment Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Carl Worth <[email protected]>
* glsl/glcpp: Add test to ensure compiler won't allow #undef for some builtinsCarl Worth2014-07-092-0/+10
| | | | | | | Currently verifying that an #undef of __FILE__, __LINE__, or __VERSION__ will generate an error. Reviewed-by: Anuj Phogat <[email protected]>
* glsl/glcpp: Do not allow undefining the built-in macrosAnuj Phogat2014-07-091-0/+6
| | | | | | | | | | | | | | | | | Fixes piglit tests in spec/glsl-es-3.00/compile: undef-__FILE__.vert undef-GL_ES.vert undef-__LINE__.vert undef-__VERSION__.vert Also, fixes Khronos GLES3 CTS tests: undefine_invalid_object_1_vertex undefine_invalid_object_1_fragment undefine_invalid_object_2_vertex undefine_invalid_object_2_fragment Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Carl Worth <[email protected]>
* gallium/u_blitter: fix some shader memory leaksBrian Paul2014-07-091-0/+9
| | | | | | | The _msaa shaders weren't getting freed. Cc: "10.2" <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi: properly parse indirect dimension references (e.g. for UBOs)Ilia Mirkin2014-07-091-0/+7
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* radeonsi: fix order of r600_need_dma_space and r600_context_bo_relocChristian König2014-07-091-1/+2
| | | | | Signed-off-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: fix geometry shader memory leakBrian Paul2014-07-091-0/+1
| | | | | | | | Spotted by Charmaine Lee. Cc: "10.2" <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* mesa: fix geometry shader memory leaksBrian Paul2014-07-092-0/+4
| | | | | | | Spotted by Charmaine Lee. Cc: "10.2" <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: minor simplification of some state atom assignmentsBrian Paul2014-07-092-7/+4
|
* st/mesa: minor fix-up in st_GetSamplePosition()Brian Paul2014-07-091-2/+4
| | | | | If the driver doesn't implement get_sample_position(), let's return some non-garbage values.
* mesa: use float to silence MSVC warning in _mesa_GetMultisamplefv()Brian Paul2014-07-091-1/+1
|
* nvc0: allocate more space before a counter is configuredSamuel Pitoiset2014-07-081-2/+3
| | | | | | | | | On nvc0, a counter can have up to 6 sources instead of only one for nve4+. This fixes a crash when a counter uses more than one source. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: use unordered_set instead of list to keep track of var usesTobias Klausmann2014-07-084-9/+10
| | | | | | | | | | | The set of variable uses does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes. This shortens runtime of piglit test fp-long-alu to ~22s from ~4h Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965/disasm: Fix disassembly of the any16h/all16h predicates.Kenneth Graunke2014-07-081-1/+1
| | | | | | | | BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as "all16h", and ALL16H would probably print as "(null)". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix the foreach_in_list_reverse macro.Kenneth Graunke2014-07-081-3/+3
| | | | | | | | | | | We clearly don't want to start at the head and walk backwards; we want to start at the last real element before the tail sentinel. If the list is empty, tail_pred will be the head sentinel, and we'll stop. Nothing uses this function, so I guess nobody noticed it was broken. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* radeonsi: mark MSAA config state as dirty at the beginning of CSMarek Olšák2014-07-081-0/+1
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81020 Reviewed-by: Alex Deucher <[email protected]>
* gallium: fix u_default_transfer_inline_write for texturesMarek Olšák2014-07-081-2/+2
| | | | | | | | | This doesn't fix any known issue. In fact, radeon drivers ignore all the discard flags for textures and implicitly do "discard range" for any write transfer. Cc: [email protected] Reviewed-by: Roland Scheidegger <[email protected]>
* i965: Remove artificial dependency between math instructions.Matt Turner2014-07-081-1/+2
| | | | | | ... on Gen6+. I'm not actually sure which class Gen6 fits into. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Track dependencies in instruction scheduling per reg offset.Matt Turner2014-07-081-8/+15
| | | | | | | | | | | | | | | | | | | | | Previously instruction scheduling tracked dependencies on a per-register basis. This meant that there was an artificial dependency between interpolation instructions writing into the same virtual register. Instruction scheduling would insert a number of instructions between the two instructions in this example, when they are actually independent. linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F This lead to cases where the first texture coordinate is interpolated at the beginning of the shader, but the second is done immediately before the texture operation that uses it as a source. After this change, the artificial dependency is removed and the interpolation instructions are scheduled together. Reviewed-by: Kenneth Graunke <[email protected]>
* ilo: fix fence reference countingChia-I Wu2014-07-081-12/+9
| | | | The old code was complicated, and was wrong when *ptr is NULL.
* i965: Extend compute-to-mrf pass to understand blocks of MOVsKristian Høgsberg2014-07-071-10/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders that end with a texture fetch follwed by an fb write are left like this: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; This patch lets compute-to-mrf recognize blocks of MOVs and match them to instructions (typically SEND) that writes multiple registers. With this, the above shader becomes: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; which is the bulk of the shader db results: total instructions in shared programs: 987040 -> 986720 (-0.03%) instructions in affected programs: 844 -> 524 (-37.91%) GAINED: 0 LOST: 0 The optimization also applies to MRT shaders that write the same color value to multiple RTs, in which case we can eliminate four MOVs in a similar fashion. See fbo-drawbuffers2-blend in piglit for an example. No measurable performance impact. No piglit regressions. Signed-off-by: Kristian Høgsberg <[email protected]>
* nvc0/ir: fill offset in properly for TXDIlia Mirkin2014-07-081-13/+43
| | | | | | | | Apparently TXD wants its offset differently than TEX, accepting it in the upper bits of the layer index. Unclear what happens when this is combined with indirect sampler indexing. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: use manual TXD when offsets are involvedIlia Mirkin2014-07-081-1/+2
| | | | | | | | | | Something about how we're implementing offsets for TXD is wrong, just flip to the generic quadop-based implementation in that case. This is the minimal fix appropriate for backporting. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nvc0/ir: do quadops on the right texture coordinates for TXDIlia Mirkin2014-07-081-2/+3
| | | | | | | | handleTEX moves the layer as the first argument. This makes sure that the quadops deal with the texture coordinates. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nv50/ir: ignore bias for samplerCubeShadow on nv50Ilia Mirkin2014-07-081-0/+10
| | | | | | | | Unfortunately there's no good way to do this on the nv50 shader isa. Dropping the bias seems preferable to doing the compare post-filtering. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* nv50/ir: retrieve shadow compare from first argIlia Mirkin2014-07-081-1/+1
| | | | | | | | This can only happen with texture(samplerCubeShadow, bias), where the compare will be in the first argument. Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* i965/fs: Disable unlit_centroid_workaround on Haswell.Matt Turner2014-07-061-2/+4
| | | | | | | | Although the HSW PRM shows it, the BSpec lists this workaround as being for Ivybridge only. total instructions in shared programs: 1994951 -> 1993675 (-0.06%) instructions in affected programs: 27325 -> 26049 (-4.67%)
* i965/vec4: Perform CSE on CMP(N) instructions.Matt Turner2014-07-061-1/+16
| | | | | | | | Port of commit b16b3c87 to the vec4 code. No shader-db improvements, but might as well. The fs backend saw an improvement because it's scalar and multiple identical CMP instructions were generated by the SEL peepholes.
* i965/vec4: Don't emit null MOVs in CSE.Matt Turner2014-07-061-5/+7
| | | | Port of commit 219b43c6 to the vec4 code.
* i965/vec4: Improve CSE performance by expiring some available expressions.Matt Turner2014-07-061-0/+20
| | | | Port of commit 5daf867f to the vec4 code.
* i965/vec4: Add basic common subexpression elimination.Kenneth Graunke2014-07-064-0/+236
| | | | | | | | | | | [mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before. total instructions in shared programs: 1995633 -> 1995185 (-0.02%) instructions in affected programs: 14410 -> 13962 (-3.11%) Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix warnings introduced in commit e24ef5ab.Matt Turner2014-07-061-2/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* gallium/radeon: use PRIX64 instead of PRIu64Christian König2014-07-062-2/+2
| | | | | | | We want hex values here, not decimals. Signed-off-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965: Move assembly annotation functions to intel_asm_annotation.c.Matt Turner2014-07-054-61/+67
| | | | | | It's C. Compile it as such. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename intel_asm_printer -> intel_asm_annotation.Matt Turner2014-07-058-7/+7
| | | | | | The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make backend_instruction usable from C.Matt Turner2014-07-051-4/+7
| | | | | | | With a hack to place an exec_node in the struct in C to be at the same location as the inherited exec_node in C++. Acked-by: Topi Pohjolainen <[email protected]>
* i965/cfg: Make cfg_t usable from C.Matt Turner2014-07-053-8/+6
| | | | Acked-by: Topi Pohjolainen <[email protected]>
* i965: Repack backend_instruction struct.Matt Turner2014-07-051-7/+5
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_predicate enum.Matt Turner2014-07-056-31/+35
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_conditional_mod enum.Matt Turner2014-07-0518-43/+54
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move common fields into backend_instruction.Matt Turner2014-07-053-25/+13
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use enum brw_reg_type for register types.Matt Turner2014-07-057-13/+14
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move is_zero/one/null/accumulator into backend_reg.Matt Turner2014-07-056-93/+44
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a common backend_reg class.Matt Turner2014-07-054-42/+36
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Drop imm union from visitor register classes.Matt Turner2014-07-052-14/+0
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use immediate storage in brw_reg for visitor regs.Matt Turner2014-07-056-41/+37
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* glsl: Fix merging of layout(invocations) with other qualifiersChris Forbes2014-07-051-0/+10
| | | | | | | | | | | | | | | | | If another layout qualifier appeared to the left of `invocations` in the GS input layout declaration, the invocation count would be dropped on the floor. Fixes the piglit tests: spec/ARB_transform_feedback3/arb_transform_feedback3-ext_interleaved_two_bufs_gs_max spec/ARB_gpu_shader5/arb_gpu_shader5-invocation-id spec/ARB_gpu_shader5/compiler/correct-multiple-layout-qualifier-invocations.geom spec/ARB_gpu_shader5/execution/invocations-conflicting Signed-off-by: Chris Forbes <[email protected]> Tested-by: Ilia Mirkin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nvc0: add a memory barrier when there are persistent UBOsIlia Mirkin2014-07-035-4/+57
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2" <[email protected]>
* nv50: do an explicit flush on draw when there are persistent buffersIlia Mirkin2014-07-033-2/+50
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2" <[email protected]>
* nv50: disable dedicated ubo upload methodIlia Mirkin2014-07-031-0/+7
| | | | | | | | | | The hardware allows multiple simultaneous renders with the same memory-backed constbufs but with each invocation having different values. However in order for that to work, the data has to be streamed in via the right constbuf slot. We weren't doing that for UBOs. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2 10.1" <[email protected]>