aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965: Make can_do_source_mods() a member of the instruction classes.Matt Turner2014-06-256-14/+12
| | | | | | | Pretty nonsensical to have it as a method of the visitor just for access to brw. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Don't fix_math_operand() on Gen >= 8.Matt Turner2014-06-241-2/+4
| | | | | | Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Don't fix_math_operand() on Gen >= 8.Matt Turner2014-06-241-2/+6
| | | | | | | | The emit_math?_gen? functions serve to implement workarounds for the math instruction, none of which exist on Gen8+. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Don't return void from a void function.Matt Turner2014-06-241-4/+4
| | | | | | Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Don't emit SURFACE_STATEs for gather workarounds on Broadwell.Kenneth Graunke2014-06-232-8/+15
| | | | | | | | | | | | | As far as I can tell, Broadwell doesn't need any of the SURFACE_STATE workarounds for textureGather() bugs, so there's no need to emit a second set of identical copies. To keep things simple, just point the gather surface index base to the same place as the texture surface index base. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: "10.2" <[email protected]>
* i965: Allow the blorp blit between BGR and RGBNeil Roberts2014-06-231-7/+21
| | | | | | | | | | | | | | | | | | | | Previously the blorp blitter would only be used if the format is identical or there is only a difference between whether there is an alpha component or not. This patch makes it also allow the blorp blitter if the only difference is the ordering of the RGB components (ie, RGB or BGR). This is particularly useful since commit 61e264f4fcdba3623 because Mesa now prefers RGB ordering for textures but the window system buffers are still created as BGR. That means that the blorp blitter won't be used for the (probably) common case of blitting from a texture to the window system buffer. This doesn't cause any regressions in the FBO piglit tests on Haswell. On Sandybridge it causes the fbo-blit-stretch test to fail but that is only because it was failing anyway before the above commit and that commit hid the problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365 Reviewed-by: Matt Turner <[email protected]>
* i915: Fix gen2 texblend setupVille Syrjälä2014-06-231-1/+1
| | | | | | | | | | | | | | | | | Fix an off by one in the texture unit walk during texblend setup on gen2. This caused the last enabled texunit to be skipped resulting in totally messed up texturing. This is a regression introduced here: commit 1ad443ecdd694dd9bf3c4a5050d749fb80db6fa2 Author: Eric Anholt <[email protected]> Date: Wed Apr 23 15:35:27 2014 -0700 i915: Redo texture unit walking on i830. Reviewed-by: Ian Romanick <[email protected]> Cc: "10.2" <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
* i965: Save meta stencil blit programs in the context.Kenneth Graunke2014-06-212-8/+13
| | | | | | | | | | | | | | | | | | | | | | | When the last context in a share group is destroyed, the hash table containing all of the shader programs (ctx->Shared->ShaderObjects) is destroyed, throwing away all of the shader programs. Using a static variable to store program IDs ends up holding on to them after this, so we think we still have a compiled program, when it actually got destroyed. _mesa_UseProgram then hits GL errors, since no program by that ID exists. Instead, store the program IDs in the context, so we know to recompile if our context gets destroyed and the application creates another one. Fixes es3conform tests when run without -minfmt (where it creates separate contexts for testing each visual). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77865 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.2" <[email protected]>
* meta: Respect the driver's maximum number of draw buffersIan Romanick2014-06-181-2/+2
| | | | | | | | | | | | | | | Commit c1c1cf5f9 added infrastructure for saving and restoring draw buffer state. However, it universially used MAX_DRAW_BUFFERS, but many drivers support far fewer than that at limit. For example, the radeon and i915 drivers only support 1. Using MAX_DRAW_BUFFERS causes meta to generate GL errors. Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80115 Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Kenneth Graunke <[email protected]> [on Broadwell] Tested-by: [email protected] Cc: "10.2" <[email protected]>
* i965/vec4: unit test for copy propagation and writemaskChia-I Wu2014-06-181-0/+30
| | | | | | | | This unit test demonstrates a subtle bug fixed by 4ddf51db6af36736d5d42c1043eeea86e47459ce. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4/gs: Silence warning about unused 'success' in release build.Matt Turner2014-06-171-0/+1
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/disasm: Mark three_source_reg_encoding[] static.Matt Turner2014-06-171-1/+1
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/blorp: Remove unused 'brw' member.Matt Turner2014-06-171-2/+0
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/blorp: Mark branch unreachable to silence uninitialized var warning.Matt Turner2014-06-171-0/+1
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Silence warning about unused brw in release builds.Matt Turner2014-06-171-2/+1
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Mark backend_instruction and bblock_t as structs.Matt Turner2014-06-172-2/+2
| | | | | | | | | They have to be marked as structs for C code elsewhere. bblock_t is already defined as a struct, and all of backend_instruction's fields are public anyway. Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Use standard SSE intrinsics instead of gcc built-ins.Matt Turner2014-06-171-5/+7
| | | | | | | | Let's this file compile with clang. Reviewed-by: Frank Henigman <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Mark default case unreachable to silence warning.Matt Turner2014-06-171-0/+1
| | | | | | | | Warned about 'coord' being undefined in the default case, which is unreachable. Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* Revert "i965: Add 'wait' instruction support"Matt Turner2014-06-173-34/+0
| | | | | | This reverts commit 20be3ff57670529a410b30a1008a71e768d08428. No evidence of ever being used.
* i965/fs: Optimize SEL with the same sources into a MOV.Matt Turner2014-06-171-1/+7
| | | | | | instructions in affected programs: 474 -> 462 (-2.53%) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Perform CSE on texture operations.Matt Turner2014-06-171-1/+10
| | | | | | | | Helps Unigine Tropics and some (old) gstreamer shaders in shader-db. instructions in affected programs: 792 -> 744 (-6.06%) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Copy propagate from load_payload.Matt Turner2014-06-171-0/+22
| | | | | But only into non-load_payload instructions. Otherwise we would prevent register coalescing from combining identical payloads.
* i965/fs: Perform CSE on load_payload instructions if it's not a copy.Matt Turner2014-06-171-0/+18
| | | | | | | | | | | | | | Since CSE creates instructions, if we let CSE generate things register coalescing can't remove, bad things will happen. Only let CSE combine non-copy load_payloads. E.g., allow CSE to handle this load_payload vgrf4+0, vgrf5, vgrf6 but not this load_payload vgrf4+0, vgrf5+0, vgrf5+1
* i965/fs: Support register coalescing on LOAD_PAYLOAD operands.Matt Turner2014-06-171-10/+54
|
* i965/fs: Emit load_payload instead of multiple MOVs for large VGRFs.Matt Turner2014-06-171-12/+21
|
* i965/fs: Only consider real sources when comparing instructions.Matt Turner2014-06-171-4/+15
|
* i965/fs: Apply cube map array fixup and restore the payload.Matt Turner2014-06-171-1/+14
| | | | | So that we don't have partial writes to a large VGRF. Will be cleaned up by register coalescing.
* i965/fs: Use LOAD_PAYLOAD in emit_texture_gen7().Matt Turner2014-06-171-62/+73
|
* i965/fs: Lower LOAD_PAYLOAD and clean up.Matt Turner2014-06-172-0/+39
| | | | Clean up with with register_coalesce()/dead_code_eliminate().
* i965/fs: Add SHADER_OPCODE_LOAD_PAYLOAD.Matt Turner2014-06-175-0/+33
| | | | | | Will be used to simplify the handling of large virtual GRFs in SSA form. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use 8x4 aligned rectangles for HiZ operations on Broadwell.Kenneth Graunke2014-06-161-4/+16
| | | | | | | | | | | | | | | | | Like on Haswell, we need to use 8x4 aligned rectangle primitives for hierarchical depth buffer resolves and depth clears. See the comments in brw_blorp.cpp's brw_hiz_op_params() constructor. (The Broadwell documentation confirms that this is still necessary.) This patch makes the Broadwell code follow the same behavior as Chad and Jordan's Gen7 BLORP code. Based on a patch by Topi Pohjolainen. This fixes es3conform's framebuffer_blit_functionality_scissor_blit test, with no Piglit regressions. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: "10.2" <[email protected]>
* i965: Make INTEL_DEBUG=mip print out whether HiZ is enabled.Kenneth Graunke2014-06-161-0/+2
| | | | | | | | | We only enable HiZ for miplevels which are aligned on 8x4 blocks. When debugging HiZ failures, it's useful to know whether a particular miplevel is using HiZ or not. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/cs: Use override structure rather than separate env varJordan Justen2014-06-162-4/+2
| | | | | | | | | | | | In 25268b93, we added a new environment variable (INTEL_COMPUTE_SHADER) to allow some constant values to be upgraded for the ARB_compute_shader extension. Now, we can look to see if the extension was enabled via the MESA_EXTENSION_OVERRIDE environment variable. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* Enable GL_ARB_explicit_uniform_location in the drivers.Tapani Pälli2014-06-162-0/+2
| | | | | | | v2: enable also for i915 (Ian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Petri Latvala <[email protected]>
* i965/vec4: Use the sampler for pull constant loads on Broadwell.Kenneth Graunke2014-06-151-8/+8
| | | | | | | | | | | | | | | | | | | | | We've used the LD sampler message for pull constant loads on earlier hardware for some time, and also were already using it for the FS on Broadwell. This patch makes us use it for Broadwell VS/GS as well. I believe that when I wrote this code in 2012, we still used the data port in some cases, and I somehow neglected to convert it while rebasing. Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821% (n = 17). Many other applications should benefit similarly: this speeds up uniform array access in the VS, which is commonly used for skinning shaders, among other things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Ben Widawsky <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing newlines to a few perf_debug messages.Kenneth Graunke2014-06-151-2/+2
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Drop Broadwell perf_debugs about missing MOCS that aren't missing.Kenneth Graunke2014-06-152-4/+0
| | | | | | | | | I actually added MOCS support for these things, but forgot to delete the corresponding perf_debug() warnings. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing MOCS setup for 3DSTATE_INDEX_BUFFER on Broadwell.Kenneth Graunke2014-06-151-3/+1
| | | | | | | | Somehow I missed this when adding all of the other MOCS values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/vec4: Fix dead code elimination for VGRFs of size > 1.Kenneth Graunke2014-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | When faced with code such as: mov vgrf31.0:UD, 960D mov vgrf31.1:UD, vgrf30.xxxx:UD The dead code eliminator didn't consider reg_offsets, so it decided that the second instruction was writing was writing to the same register as the first one, and eliminated the first one. But they're actually different registers. This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above code, vgrf31.0 represents the offset into the shader_time buffer where the data should be written, and vgrf31.1 represents the actual time data. With a completely undefined offset, results were...unexpected. I think this is probably one of the few cases (maybe only case) where we generate multiple MOVs to a large VGRF. Normally, we just use them as texturing results; the other SEND-from-GRF uses a size 1 VGRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965: Add SHADER_OPCODE_SHADER_TIME_ADD to dump_instructions() decode.Kenneth Graunke2014-06-151-0/+2
| | | | | | | "shader_time_add" is a lot more informative than "op152". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa/drivers: Fix clang constant-logical-operand warnings.Vinson Lee2014-06-144-13/+13
| | | | | | | | | | | | | | | | This patch fixes several clang constant-logical-operand warnings such as the following. ../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: warning: use of logical '||' with constant operand [-Wconstant-logical-operand] if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL) ^ ~~~~~~~~~~~ ../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: note: use '|' for a bitwise operation if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL) ^~ | Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta_blit: properly compute texture width for the CopyTexSubImage fallbackJason Ekstrand2014-06-131-1/+1
| | | | | | | Cc: "10.2" <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Set the fast clear color value for texture surfacesNeil Roberts2014-06-122-2/+6
| | | | | | | | | | | | | | When a multisampled texture is used for sampling the fast clear color value needs to be programmed into the surface state. This was being left as all zeroes so if the surface was cleared to a value other than black then it wouldn't work properly. This doesn't matter for single-sample textures because in that case the MCS buffer is resolved before it is used as a texture source. https://bugs.freedesktop.org/show_bug.cgi?id=79729 Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.1 10.2" <[email protected]>
* i965: Fix disassembly of BLORP clear programs.Kenneth Graunke2014-06-121-1/+1
| | | | | | | Too many levels of indirection. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Move FB write default state mashing in a level.Kenneth Graunke2014-06-121-7/+7
| | | | | | | | | | We only need to alter the default state if we're emitting MOVs for header related fields. So, we can simply move the push/pop of state in to the if (header_present) block, bypassing it in the common case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Fix Haswell discard regressions since Gen4-5 line AA fix.Kenneth Graunke2014-06-121-2/+7
| | | | | | | | | | | | | | | | In commit dc2d3a7f5c217a7cee92380fbf503924a9591bea, Iago accidentally moved fire_fb_write() above the brw_pop_insn_state(), which caused the SEND to lose its predication and change from WE_normal to WE_all. Haswell uses predicated SENDs for discards, so this broke Piglit's tests for discards. We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked, but the actual FB write itself should respect those. So, pop state first, and force it again around the single MOV. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Use brw->gen in some generation checks.Matt Turner2014-06-115-11/+17
| | | | | | | Will simplify the automated conversion if we want to allow compiling the driver for a single generation. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Clean up tabs in brw_fs_cse.cpp.Matt Turner2014-06-111-43/+43
| | | | I'm adding vec4 CSE, and I want to diff the files.
* meta: save and restore swizzle for _GenerateMipmapRobert Bragg2014-06-111-0/+12
| | | | | | | | | | This makes sure to use a no-op swizzle while iteratively rendering each level of a mipmap otherwise we may loose components and effectively apply the swizzle twice by the time these levels are sampled. Signed-off-by: Robert Bragg <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Emit smarter code for b2f of a comparisonIan Romanick2014-06-112-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | Previously we would emit the comparison, emit an AND to mask off extra bits from the comparison result, then convert the result to float. Now, do the comparison, then use a cleverly constructed SEL to pick either 0.0f or 1.0f. No piglit regressions on Ivybridge. total instructions in shared programs: 1642311 -> 1639449 (-0.17%) instructions in affected programs: 136533 -> 133671 (-2.10%) GAINED: 0 LOST: 0 Programs that are affected appear to save between 1 and 5 instuctions (just by skimming the output from shader-db report.py. v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove extraneous fix_3src_operand (suggested by Matt). The latter change required swapping the order of the operands and using predicate_inverse. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>