path: root/src/mesa/drivers/dri/i965
Commit message | Author | Age | Files | Lines
* i965: Add missing persample_shading field to brw_wm_debug_recompile. | Kenneth Graunke | 2014-07-21 | 1 | -0/+2
| | | | | | | | Otherwise, the performance warning for shader recompiles will just say "something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/disasm: Don't disassemble the URB complete field on Broadwell. | Kenneth Graunke | 2014-07-21 | 1 | -2/+4
| | | | | | | | It doesn't exist, so attempting to read it will trigger generation assertions in the brw_inst API. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Disable hex offset printing in disassembly. | Kenneth Graunke | 2014-07-21 | 1 | -1/+2
| | | | | | | | | | | | | | | Printing the hex offsets makes it basically impossible to diff assembly: if you add even a single instruction, the entire shader shows up as a difference. So, every time I want to compare assembly, I have to strip this out. The hex offsets might be useful when debugging compaction, or when inspecting the program cache buffer. Since it's occasionally useful, but uncommon, this patch disables it by default, but makes it easy to re-enable it temporarily when the need arises. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Use foreach_inst_in_block a couple more places. | Matt Turner | 2014-07-21 | 2 | -8/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Replace cfg instances with calls to calculate_cfg(). | Matt Turner | 2014-07-21 | 5 | -22/+22
| | | | | | | | | | | Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <[email protected]>
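The saving reported in this commit comes from building the control flow graph on demand and reusing it until something invalidates it. A minimal sketch of that caching pattern in C follows. All `sketch_*` names and the placeholder graph are assumptions made for illustration; the driver's actual calculate_cfg() may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for a control flow graph; hypothetical, not the i965 type. */
struct sketch_cfg {
   int num_blocks;
};

/* Hypothetical visitor that owns a lazily built graph. */
struct sketch_visitor {
   struct sketch_cfg *cfg; /* NULL until first requested or after invalidation */
   int cfg_builds;         /* counts how often the graph was actually rebuilt */
};

/* Build the graph only if no valid copy exists. */
static void
sketch_calculate_cfg(struct sketch_visitor *v)
{
   if (v->cfg != NULL)
      return; /* still valid: reuse instead of regenerating */

   v->cfg = malloc(sizeof(*v->cfg));
   assert(v->cfg != NULL);
   v->cfg->num_blocks = 1; /* stand-in for the real construction work */
   v->cfg_builds++;
}

/* Passes that add or remove instructions would call this. */
static void
sketch_invalidate_cfg(struct sketch_visitor *v)
{
   free(v->cfg);
   v->cfg = NULL;
}
```

Two consecutive calls to sketch_calculate_cfg() build the graph once; a rebuild only happens after sketch_invalidate_cfg().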
* i965/cfg: Add a foreach_block_and_inst macro. | Matt Turner | 2014-07-21 | 1 | -0/+4
| | | | | | Will let us abstract how the instructions are stored. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add cfg to backend_visitor. | Matt Turner | 2014-07-21 | 9 | -33/+48
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Silence unused parameter warning | Ian Romanick | 2014-07-19 | 1 | -1/+1
| | | | | | | brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence 'comparison is always true' warning | Ian Romanick | 2014-07-19 | 1 | -2/+0
    The parameter is an int16_t, and we're checking that its value will fit in 16 bits. Yes, the value that is stored in 16 bits will surely fit in 16 bits.

    brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
    brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
    brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
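The warning fires because a range check on a value that is already stored in an int16_t can never fail. A short sketch of where such a check is still meaningful, namely on the wider type before narrowing. The function names are hypothetical and are not the brw_inst.h code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration. A test such as
 *    value >= INT16_MIN && value <= INT16_MAX
 * is always true once value has type int16_t, which is what
 * -Wtype-limits reports. It only carries information while the
 * value is still held in a wider type. */
static bool
fits_in_int16(int32_t wide_value)
{
   return wide_value >= INT16_MIN && wide_value <= INT16_MAX;
}

/* Narrow a 32-bit count to 16 bits, checking the range first. */
static int16_t
narrow_jump_count(int32_t wide_value)
{
   assert(fits_in_int16(wide_value)); /* checked before narrowing */
   return (int16_t)wide_value;
}
```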
* i965: Silence many unused parameter warnings | Ian Romanick | 2014-07-19 | 1 | -0/+10
| | | | | | | | brw_inst.h: In function 'brw_inst_set_src1_vstride': brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpy | Jason Ekstrand | 2014-07-17 | 1 | -0/+11
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Improve debug output in intelTexImage and intelTexSubimage | Jason Ekstrand | 2014-07-17 | 2 | -1/+9
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Fix z_offset computation in intel_miptree_unmap_depthstencil() | Anuj Phogat | 2014-07-17 | 1 | -2/+2
| | | | | | | | | | | | | | | | The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL as base internal format and non-zero x, y offsets. Currently x, y offsets are ignored while updating the texture image. Fixes Khronos GLES3 CTS tests: npot_tex_sub_image_2d npot_tex_sub_image_3d npot_pbo_tex_sub_image_2d npot_pbo_tex_sub_image_2d Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs" | Anuj Phogat | 2014-07-17 | 1 | -53/+10
| | | | | | | | | | | | | | This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92. Fixes the 11 regressions caused in framebuffer_blit tests in Khronos GLES3 CTS tests: Original patch reduced the instruction count but had no performance benefits. So, it's safe to revert it without causing any performance regressions. Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Kristian Høgsberg <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams." | Kenneth Graunke | 2014-07-16 | 2 | -26/+7
| | | | | | | | | | | | | | | | This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5. This caused GPU hangs on Ivybridge for some users and huge (80%) performance regressions across the board on multiple platforms. We need to find a better solution. I've made several attempts, but none of them have worked yet. In the meantime, we should revert this. Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but that's okay, since we don't expose GL_ARB_gpu_shader5 yet. Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated test case on Haswell.
* i965: Don't copy propagate abs into Broadwell logic instructions. | Kenneth Graunke | 2014-07-15 | 2 | -12/+6
| | | | | | | | | | | | It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Use WE_all for gl_SampleID header register munging. | Kenneth Graunke | 2014-07-15 | 1 | -5/+9
| | | | | | | | | | | | This code should execute without regard to the currently executing channels. Asking for gl_SampleID inside control flow might break in strange ways. It appears to break even at the top of the program in SIMD16 mode occasionally as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965/fs: Set force_uncompressed and force_sechalf on samplepos setup. | Kenneth Graunke | 2014-07-15 | 1 | -6/+8
| | | | | | | | | | | | | | gen8_fs_generator uses these to decide whether to set the execution size to 8 or 16, so we incorrectly made both of these MOVs the full width in SIMD16 shaders. (It happened to work out on Gen4-7.) Setting them should also help inform optimization passes what's really going on, which could help avoid bugs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Set execution size to 8 for instructions with force_sechalf set. | Kenneth Graunke | 2014-07-15 | 1 | -1/+1
| | | | | | | | | | | | | | | | | Both inst->force_uncompressed and inst->force_sechalf mean that the generated instruction should be uncompressed and have an execution size of 8. We don't require the visitor to set both flags - setting inst->force_sechalf by itself is supposed to be enough. On Gen4-7, guess_execution_size() demoted instructions to 8-wide based on the default compression state. On Gen8+, we instead set a default execution size, which worked great...except that we forgot to check inst->force_sechalf when deciding whether to use 8 or 16. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
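The rule stated in this message, that either flag on its own is enough to force an 8-wide instruction, can be written as a single decision function. The sketch below is a hypothetical C rendering: `struct sketch_inst` and `default_exec_size` are stand-ins invented for illustration and do not reproduce the generator's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags named in the commit message. */
struct sketch_inst {
   bool force_uncompressed;
   bool force_sechalf;
};

/* Pick the execution size for an instruction in a shader compiled at the
 * given dispatch width. Either flag alone means "uncompressed, 8 channels";
 * forgetting to test force_sechalf is the bug the commit describes. */
static unsigned
default_exec_size(const struct sketch_inst *inst, unsigned dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);

   if (dispatch_width == 16 &&
       !inst->force_uncompressed && !inst->force_sechalf)
      return 16;

   return 8;
}
```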
* exec_list: Make various places use the new length() method. | Connor Abbott | 2014-07-15 | 2 | -7/+2
| | | | | | | | | | Instead of hand-rolling it. v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader. Reviewed-by: Ian Romanick <[email protected]> [v1] Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* i965/fs: Relax interference check in register coalescing. | Matt Turner | 2014-07-15 | 1 | -11/+12
    A similar attempt was made in commit 5ff1e446 and was reverted in commit a39428cf after causing a regression in an ES 3 conformance test. The test still passes after this commit.

    total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
    instructions in affected programs: 128247 -> 126278 (-1.54%)
    GAINED: 0
    LOST: 1

    Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Perform CSE on sends-from-GRF rather than textures. | Matt Turner | 2014-07-15 | 1 | -1/+1
| | | | | | | | | Should potentially allow a few more cases, while avoiding doing CSE on texture operations on Gen <= 6 with the MRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211 Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: lu hua <[email protected]>
* i965: Initialize new chunks of realloc'd memory. | Matt Turner | 2014-07-15 | 1 | -0/+4
| | | | | | | Otherwise we'd compare uninitialized pointers with NULL and dereference, leading to crashes. Reviewed-by: Kenneth Graunke <[email protected]>
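This fix reflects a general C pitfall: realloc() preserves the old contents of a buffer but leaves any newly added tail uninitialized, so a pointer table that is later compared against NULL has to be cleared by hand. A minimal sketch follows, assuming a hypothetical `grow_ptr_array` helper that is not taken from the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow an array of pointers from old_count to
 * new_count slots and clear only the new tail, so that later
 * "slot == NULL" checks read well-defined values. */
static void **
grow_ptr_array(void **array, size_t old_count, size_t new_count)
{
   assert(new_count > 0 && new_count >= old_count);

   void **grown = realloc(array, new_count * sizeof(*grown));
   if (grown == NULL)
      return NULL; /* the caller still owns the original array */

   /* The first old_count slots are preserved; the rest is garbage until
    * cleared. All-bits-zero is assumed to be a null pointer here, which
    * holds on mainstream platforms. */
   memset(grown + old_count, 0, (new_count - old_count) * sizeof(*grown));
   return grown;
}
```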
* i965/vec4: Invalidate live intervals in opt_cse, not _local. | Matt Turner | 2014-07-14 | 1 | -3/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Move aeb list into opt_cse_local. | Matt Turner | 2014-07-14 | 2 | -7/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Invalidate live intervals in opt_cse, not _local. | Matt Turner | 2014-07-14 | 1 | -3/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move aeb list into opt_cse_local. | Matt Turner | 2014-07-14 | 2 | -7/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Silence warnings about unhandled interpolation ops | Chris Forbes | 2014-07-13 | 1 | -0/+3
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965/fs: add support for ir_*_interpolate_at_* expressions | Chris Forbes | 2014-07-13 | 2 | -2/+150
    SIMD8-only for now.

    V5: - Fix style complaints
        - Move prototype to be with other oddball emit functions
        - Use unreachable() instead of assert() where possible

    V6: - Describe what is happening with the clamping
        - Add reg_width to make some expressions clearer

    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Skip channel expressions splitting for interpolation | Chris Forbes | 2014-07-13 | 1 | -0/+25
| | | | | | | | The backend will have to do a message send, so we want to keep these in one piece, just like texture ops. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add generator support for pixel interpolator query | Chris Forbes | 2014-07-13 | 4 | -0/+59
    V5: - Split into separate opcodes
        - Pass message data in src1 immediate
        - Put noperspective bit in fs_inst rather than adding any junk to backend_instruction

    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add low-level support for send to pixel interpolator | Chris Forbes | 2014-07-13 | 2 | -0/+38
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: add support for pixel interpolator messages | Chris Forbes | 2014-07-13 | 1 | -0/+17
| | | | | | | V3: Rework for brw_inst changes Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add message descriptor bit definitions for pixel interpolator | Chris Forbes | 2014-07-13 | 2 | -0/+16
| | | | | | | These got lost in the big brw_inst shakeup. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Disassemble indirect sends more properly | Chris Forbes | 2014-07-12 | 1 | -162/+174
    - Don't try to disassemble send's src1 as a descriptor if it's not an immediate.
    - In the same case, show src1 as an operand (makes it easier to see bogus register regions, etc -- the hardware is very fussy)

    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Avoid crashing while dumping vec4 insn operands | Chris Forbes | 2014-07-12 | 1 | -1/+4
| | | | | | | | We'd otherwise go looking into virtual_grf_sizes for things that aren't in there at all. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix two broken asserts in brw_eu_emit | Chris Forbes | 2014-07-12 | 1 | -2/+2
| | | | | | | These were looking in the wrong field. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: forward-declare struct brw_context in brw_reg.h | Ilia Mirkin | 2014-07-09 | 1 | -0/+2
    Commit 54e91e7420 introduced a function declaration that uses brw_context. While brw_context tends to get included in most files, it is not when compiling intel_asm_annotation.c resulting in the following warning:

    In file included from brw_shader.h:25:0,
                     from brw_cfg.h:32,
                     from intel_asm_annotation.c:24:
    brw_reg.h:122:39: warning: 'struct brw_context' declared inside parameter list [enabled by default]
    brw_reg.h:122:39: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

    Add a forward-declaration for struct brw_context to avoid the issue.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Fix disassembly of the any16h/all16h predicates. | Kenneth Graunke | 2014-07-08 | 1 | -1/+1
| | | | | | | | BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as "all16h", and ALL16H would probably print as "(null)". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Remove artificial dependency between math instructions. | Matt Turner | 2014-07-08 | 1 | -1/+2
| | | | | | ... on Gen6+. I'm not actually sure which class Gen6 fits into. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Track dependencies in instruction scheduling per reg offset. | Matt Turner | 2014-07-08 | 1 | -8/+15
    Previously instruction scheduling tracked dependencies on a per-register basis. This meant that there was an artificial dependency between interpolation instructions writing into the same virtual register. Instruction scheduling would insert a number of instructions between the two instructions in this example, when they are actually independent.

    linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
    linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F

    This led to cases where the first texture coordinate is interpolated at the beginning of the shader, but the second is done immediately before the texture operation that uses it as a source.

    After this change, the artificial dependency is removed and the interpolation instructions are scheduled together.

    Reviewed-by: Kenneth Graunke <[email protected]>
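The difference between per-register and per-offset tracking can be shown with a small last-writer table. The sketch below is only an illustration in C under assumed names (`sketch_tracker`, `sketch_record_write`) and fixed table sizes; it is not the scheduler's data structure.

```c
#include <assert.h>

#define SKETCH_MAX_REGS    16
#define SKETCH_MAX_OFFSETS 4

/* Hypothetical tracker: remembers which instruction last wrote each
 * (register, offset) slot, or -1 if nothing has written it yet. */
struct sketch_tracker {
   int last_write[SKETCH_MAX_REGS][SKETCH_MAX_OFFSETS];
};

static void
sketch_tracker_init(struct sketch_tracker *t)
{
   for (int reg = 0; reg < SKETCH_MAX_REGS; reg++)
      for (int off = 0; off < SKETCH_MAX_OFFSETS; off++)
         t->last_write[reg][off] = -1;
}

/* Record that instruction ip writes reg+offset. Returns the index of the
 * earlier instruction it must stay ordered after, or -1 if there is none.
 * Keying on the offset as well as the register is what keeps writes to
 * vgrf8+0 and vgrf8+1 independent of each other. */
static int
sketch_record_write(struct sketch_tracker *t, int reg, int offset, int ip)
{
   assert(reg >= 0 && reg < SKETCH_MAX_REGS);
   assert(offset >= 0 && offset < SKETCH_MAX_OFFSETS);

   int previous_writer = t->last_write[reg][offset];
   t->last_write[reg][offset] = ip;
   return previous_writer;
}
```

With a table indexed by register alone, the second linterp in the example above would report the first as a dependency; with the offset in the key it reports none.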
* i965: Extend compute-to-mrf pass to understand blocks of MOVs | Kristian Høgsberg | 2014-07-07 | 1 | -10/+53
    The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders that end with a texture fetch followed by an fb write are left like this:

    0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
    0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

    This patch lets compute-to-mrf recognize blocks of MOVs and match them to instructions (typically SEND) that write multiple registers. With this, the above shader becomes:

    0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
    0x00000010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
    0x00000020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

    which is the bulk of the shader db results:

    total instructions in shared programs: 987040 -> 986720 (-0.03%)
    instructions in affected programs: 844 -> 524 (-37.91%)
    GAINED: 0
    LOST: 0

    The optimization also applies to MRT shaders that write the same color value to multiple RTs, in which case we can eliminate four MOVs in a similar fashion. See fbo-drawbuffers2-blend in piglit for an example.

    No measurable performance impact. No piglit regressions.

    Signed-off-by: Kristian Høgsberg <[email protected]>
* i965/fs: Disable unlit_centroid_workaround on Haswell. | Matt Turner | 2014-07-06 | 1 | -2/+4
    Although the HSW PRM shows it, the BSpec lists this workaround as being for Ivybridge only.

    total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
    instructions in affected programs: 27325 -> 26049 (-4.67%)
* i965/vec4: Perform CSE on CMP(N) instructions. | Matt Turner | 2014-07-06 | 1 | -1/+16
| | | | | | | | Port of commit b16b3c87 to the vec4 code. No shader-db improvements, but might as well. The fs backend saw an improvement because it's scalar and multiple identical CMP instructions were generated by the SEL peepholes.
* i965/vec4: Don't emit null MOVs in CSE. | Matt Turner | 2014-07-06 | 1 | -5/+7
| | | | Port of commit 219b43c6 to the vec4 code.
* i965/vec4: Improve CSE performance by expiring some available expressions. | Matt Turner | 2014-07-06 | 1 | -0/+20
| | | | Port of commit 5daf867f to the vec4 code.
* i965/vec4: Add basic common subexpression elimination. | Kenneth Graunke | 2014-07-06 | 4 | -0/+236
    [mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before.

    total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
    instructions in affected programs: 14410 -> 13962 (-3.11%)

    Reviewed-by: Matt Turner <[email protected]>
    Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix warnings introduced in commit e24ef5ab. | Matt Turner | 2014-07-06 | 1 | -2/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move assembly annotation functions to intel_asm_annotation.c. | Matt Turner | 2014-07-05 | 4 | -61/+67
| | | | | | It's C. Compile it as such. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename intel_asm_printer -> intel_asm_annotation. | Matt Turner | 2014-07-05 | 8 | -7/+7
| | | | | | The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <[email protected]>