path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* glsl/loops: Get rid of lower_bounded_loops and ir_loop::normative_bound. | Paul Berry | 2013-12-09 | 3 | -12/+0
  Now that loop_controls no longer creates normatively bound loops, there is no need for ir_loop::normative_bound or the lower_bounded_loops pass.
  Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: replace loop controls with a normative bound. | Paul Berry | 2013-12-09 | 2 | -4/+8
  This patch replaces the ir_loop fields "from", "to", "increment", "counter", and "cmp" with a single integer ("normative_bound") that serves the same purpose.

  I've used the name "normative_bound" to emphasize the fact that the back-end is required to emit code to prevent the loop from running more than normative_bound times. (By contrast, an "informative" bound would be a bound that is informational only).
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* glsl/loops: consolidate bounded loop handling into a lowering pass. | Paul Berry | 2013-12-09 | 3 | -63/+6
  Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the i965 fs and vec4 visitors) had nearly identical logic for handling bounded loops. This replaces the duplicate logic with an equivalent lowering pass that is used by all the back-ends.

  Note: on i965, there is a slight increase in instruction count. For example, a loop like this:

      for (int i = 0; i < 100; i++) {
        total += i;
      }

  would previously compile down to this (vec4) native code:

            mov(8) g4<1>.xD 0D
            mov(8) g8<1>.xD 0D
      loop:
            cmp.ge.f0(8) null g8<4;4,1>.xD 100D
            (+f0) break(8)
            add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
            add(8) g8<1>.xD g8<4;4,1>.xD 1D
            add(8) g4<1>.xD g4<4;4,1>.xD 1D
            while(8) loop

  After this patch, the "(+f0) break(8)" turns into:

            (+f0) if(8)
            break(8)
            endif(8)

  because the back-end isn't smart enough to recognize that "if (condition) break;" can be done using a conditional break instruction. However, it should be relatively easy for a future peephole optimization to properly optimize this.
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
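A rough model of the lowering described in this commit, written as a Python sketch rather than Mesa code. It assumes a toy loop record whose fields mirror the ir_loop controls named in the log ("from" is spelled `start` because `from` is a Python keyword). `BoundedLoop` and `lower_bounded_loop` are hypothetical names, and the real pass operates on GLSL IR, not strings.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedLoop:
    """Toy stand-in for a loop that carries implicit loop controls."""
    counter: str
    start: int       # the "from" control
    to: int
    increment: int
    cmp: str         # comparison that terminates the loop, e.g. ">="
    body: list = field(default_factory=list)


def lower_bounded_loop(loop):
    """Rewrite implicit loop controls as explicit pseudo-statements."""
    out = [f"{loop.counter} = {loop.start}", "loop {"]
    # The bound becomes an ordinary "if (...) break" at the top of the body,
    # which is why the back-end now sees if/break/endif.
    out.append(f"  if ({loop.counter} {loop.cmp} {loop.to}) break")
    out.extend(f"  {stmt}" for stmt in loop.body)
    out.append(f"  {loop.counter} += {loop.increment}")
    out.append("}")
    return out


lowered = lower_bounded_loop(BoundedLoop("i", 0, 100, 1, ">=", ["total += i"]))
```

Once the bound is an explicit conditional break, a back-end needs no special-case loop handling, at the cost of the extra if/endif pair noted in the message.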
* i965/gen7+: Implement fast color clears for MSAA buffers. | Paul Berry | 2013-12-09 | 1 | -44/+87
  Fast color clears of MSAA buffers work just like fast color clears with non-MSAA buffers, except that the alignment and scaledown requirements are different.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Refactor code for computing fast clear align/scaledown factors. | Paul Berry | 2013-12-09 | 1 | -19/+23
  This will make it easier to add fast color clear support to MSAA buffers, since they have different alignment and scaling requirements.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: allow multisample blorp clears | Paul Berry | 2013-12-09 | 1 | -18/+13
  Previously, we didn't do multisample blorp clears because we couldn't figure out how to get them to work. The reason was that we weren't setting the brw_blorp_params num_samples field consistently with dst.num_samples. Now that those two fields have been collapsed down into one, we can do multisample blorp clears.

  However, we need to do a few other pieces of bookkeeping to make them work correctly in all circumstances:

  - Since blorp clears may now operate on multisampled window system framebuffers, they need to call intel_renderbuffer_set_needs_downsample() to ensure that a downsample happens before buffer swap (or glReadPixels()).

  - When clearing a layered multisample buffer attachment using UMS or CMS layout, we need to advance layer by multiples of num_samples (since each logical layer is associated with num_samples physical layers).

  Note: we still don't do multisample fast color clears; more work needs to be done to enable those.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Get rid of redundant num_samples blorp param. | Paul Berry | 2013-12-09 | 5 | -15/+12
  Previously, brw_blorp_params contained two fields for determining sample count: num_samples (which determined the multisample configuration of the rendering pipeline) and dst.num_samples (which determined the multisample configuration of the render target surface). This was redundant, since both fields had to be set to the same value to avoid rendering errors.

  This patch eliminates num_samples to avoid future confusion.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen7+: Disentangle MSAA layout from fast clear state. | Paul Berry | 2013-12-09 | 3 | -48/+47
  This patch renames the enum that's used to keep track of fast clear state from "mcs_state" to "fast_clear_state", and it removes the enum value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA buffer, so we're not keeping track of fast clear state"). The only real purpose that enum value was serving was to prevent us from trying to do fast clear resolves on MSAA buffers, and it's just as easy to prevent that by checking the buffer's msaa_layout.

  This paves the way for implementing fast clears of MSAA buffers.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't try to use HW blitter for glCopyPixels() when multisampled. | Paul Berry | 2013-12-09 | 1 | -0/+5
  The hardware blitter doesn't understand multisampled layouts, so there's no way this could possibly succeed.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Document conventions for counting layers in 2D multisample buffers. | Paul Berry | 2013-12-09 | 4 | -0/+27
  The "layer" parameters used in blorp, and the intel_renderbuffer::mt_layer field, represent a physical layer rather than a logical layer. This is important for 2D multisample arrays on Gen7+ because the UMS and CMS multisample layouts use N physical layers to represent each logical layer, where N is the number of samples.

  Also add an assertion to blorp to help catch bugs if we fail to follow these conventions.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
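The layer convention can be illustrated with a small Python sketch. It assumes layouts are identified by plain strings, that the first physical layer of a logical layer is `logical_layer * num_samples`, and that every layout other than UMS/CMS maps layers one-to-one. `physical_layer` is a hypothetical helper, not a driver function.

```python
def physical_layer(logical_layer, num_samples, layout):
    """Map a logical array layer to its first physical layer.

    Under the UMS and CMS layouts each logical layer occupies
    num_samples consecutive physical layers; any other layout is
    assumed here to use one physical layer per logical layer.
    """
    if layout in ("UMS", "CMS"):
        return logical_layer * num_samples
    return logical_layer


# Stepping through a 4x multisampled array advances by num_samples each time.
cms_layers = [physical_layer(i, 4, "CMS") for i in range(3)]
```

Under this assumption, a physical layer index passed for a UMS/CMS buffer should always be a multiple of the sample count, which is the kind of invariant an assertion can check.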
* i965/blorp: Improve fast color clear comment. | Paul Berry | 2013-12-09 | 1 | -1/+12
  Clarify the fact that we only optimize full buffer clears using fast color clear, and why.
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't flag gather quirks for Gen8+ | Chris Forbes | 2013-12-07 | 1 | -1/+1
  My understanding is that Broadwell retains the same SCS mechanism that Haswell has, so even if the underlying issue with this format is not fixed, the w/a will be applied in SCS rather than needing shader code.
  Signed-off-by: Chris Forbes <[email protected]>
  Cc: Kenneth Graunke <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/Gen7: Allow CMS layout for multisample textures | Chris Forbes | 2013-12-07 | 1 | -17/+1
  Now that all the pieces are in place, this should provide a nice performance boost for apps using multisample textures.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Sample from MCS surface when required | Chris Forbes | 2013-12-07 | 2 | -7/+40
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Sample from MCS surface when required | Chris Forbes | 2013-12-07 | 3 | -10/+41
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: Add shader opcode for sampling MCS surface | Chris Forbes | 2013-12-07 | 6 | -0/+16
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/Gen7: Include bitfield in the sampler key for CMS layout | Chris Forbes | 2013-12-07 | 2 | -0/+18
  We need to emit extra shader code in this case to sample the MCS surface first; we can't just blindly do this all the time since IVB will sometimes try to access the MCS surface even if disabled.

  V3: Use actual MSAA layout from the texture's mt, rather than computing what would have been used based on the format. This is simpler and less fragile - there's at least one case where we might want to have a texture's MSAA layout change based on what the app does (CMS SINT falling back to UMS if the app ever attempts to render to it with a channel disabled.)

  This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain an implementation detail of the miptree code.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965/Gen7: Move decision to allocate MCS surface into intel_mipmap_create | Chris Forbes | 2013-12-07 | 1 | -6/+8
  This gives us correct behavior for both renderbuffers (which previously worked) and multisample textures (which would never get an MCS surface allocated, even if CMS layout was selected).
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/Gen7: emit mcs info for multisample textures | Chris Forbes | 2013-12-07 | 1 | -0/+5
  Previously this was only done for render targets.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/wm: Set copy of sample mask in 3DSTATE_PS correctly for Haswell | Chris Forbes | 2013-12-07 | 1 | -2/+7
  The bspec says:

      "SW must program the sample mask value in this field so that it matches with 3DSTATE_SAMPLE_MASK"

  I haven't observed this to actually fix anything, but stumbled across it while adding the rest of the support for CMS layout for multisample textures.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: refactor sample mask calculation | Chris Forbes | 2013-12-07 | 4 | -33/+41
  Haswell needs a copy of the sample mask in 3DSTATE_PS; this makes that convenient.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Replace non-standard INLINE macro with "inline". | Kenneth Graunke | 2013-12-05 | 6 | -22/+22
  These are identical: main/compiler.h defines INLINE to "inline".
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Don't use GL types in files shared with intel-gpu-tools. | Kenneth Graunke | 2013-12-05 | 6 | -1035/+1035
      sed -i -e 's/GLuint/unsigned/g' -e 's/GLint/int/g' \
             -e 's/GLfloat/float/g' -e 's/GLubyte/uint8_t/g' \
             -e 's/GLshort/int16_t/g' \
             brw_eu* brw_disasm.c brw_structs.h
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver. | Kenneth Graunke | 2013-12-05 | 72 | -621/+621
  Performed via:

      $ for file in *; do sed -i 's/ *$//g' "$file"; done
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Drop trailing whitespace from files shared with intel-gpu-tools. | Kenneth Graunke | 2013-12-05 | 5 | -276/+276
  Performed via s/ *$//g.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Emit better code for ir_unop_sign. | Matt Turner | 2013-12-04 | 2 | -15/+49
  total instructions in shared programs: 1550449 -> 1550048 (-0.03%)
  instructions in affected programs: 15207 -> 14806 (-2.64%)
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF. | Matt Turner | 2013-12-04 | 4 | -0/+99
  total instructions in shared programs: 1550713 -> 1550449 (-0.02%)
  instructions in affected programs: 7931 -> 7667 (-3.33%)
  Reviewed-by: Paul Berry <[email protected]>
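The idea behind flattening IF/BREAK/ENDIF can be shown with a toy Python sketch. It assumes instructions are `(opcode, predicate)` tuples, which is not the i965 back-end's representation, and `flatten_if_break` is a hypothetical name. The sketch only handles the exact three-instruction pattern with an unpredicated BREAK.

```python
def flatten_if_break(insts):
    """Replace IF / BREAK / ENDIF with a single predicated BREAK.

    insts is a list of (opcode, predicate) tuples, where predicate is
    None for unpredicated instructions.
    """
    out = []
    i = 0
    while i < len(insts):
        window = insts[i:i + 3]
        opcodes = [op for op, _ in window]
        if opcodes == ["IF", "BREAK", "ENDIF"] and window[1][1] is None:
            # Move the IF's predicate onto the BREAK and drop IF/ENDIF.
            out.append(("BREAK", window[0][1]))
            i += 3
        else:
            out.append(insts[i])
            i += 1
    return out


before = [("CMP", None), ("IF", "+f0"), ("BREAK", None),
          ("ENDIF", None), ("ADD", None)]
after = flatten_if_break(before)
```

Each match removes two instructions, which is consistent in spirit with the small instruction-count reduction reported in the message.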
* i965/fs: Emit a MOV instead of a SEL if the sources are the same. | Matt Turner | 2013-12-04 | 1 | -19/+23
  One program affected.
  instructions in affected programs: 436 -> 428 (-1.83%)
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Extend SEL peephole to handle only matching MOVs. | Matt Turner | 2013-12-04 | 1 | -3/+2
  Before this patch, the following code would not be optimized even though the first two instructions were common to the then and else blocks:

      (+f0) IF
      MOV dst0 ...
      MOV dst1 ...
      MOV dst2 ...
      ELSE
      MOV dst0 ...
      MOV dst1 ...
      MOV dst3 ...
      ENDIF

  This commit extends the peephole to handle this case.

  No shader-db changes.
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
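A toy Python sketch of the matching-prefix idea, under loud assumptions: MOVs are modeled as `(dst, src)` pairs, the source names `a` through `f` stand in for the operands elided as "..." above, and hoisting the SELs ahead of the IF is assumed to be safe (the real pass has to verify this). `sel_peephole` is a hypothetical name, not the driver's function.

```python
def sel_peephole(then_movs, else_movs):
    """Turn the leading MOVs with matching destinations into SELs.

    Returns (sels, remaining_then, remaining_else). Each SEL is
    ("SEL", dst, then_src, else_src) and is meant to be predicated
    on the IF's condition.
    """
    sels = []
    matched = 0
    for (t_dst, t_src), (e_dst, e_src) in zip(then_movs, else_movs):
        if t_dst != e_dst:
            break  # stop at the first pair that writes different registers
        sels.append(("SEL", t_dst, t_src, e_src))
        matched += 1
    return sels, then_movs[matched:], else_movs[matched:]


then_block = [("dst0", "a"), ("dst1", "b"), ("dst2", "c")]
else_block = [("dst0", "d"), ("dst1", "e"), ("dst3", "f")]
sels, rest_then, rest_else = sel_peephole(then_block, else_block)
```

For the example in the message, the first two MOV pairs become SELs and only the dst2/dst3 MOVs remain inside the IF/ELSE.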
* i965/fs: New peephole optimization to generate SEL. | Matt Turner | 2013-12-04 | 4 | -0/+225
  fs_visitor::try_replace_with_sel optimizes only if statements whose "then" and "else" bodies contain a single MOV instruction. It also could not handle constant arguments, since they cause an extra MOV immediate to be generated (since we haven't run constant propagation, there are more than the single MOV).

  This peephole fixes both of these and operates as a normal optimization pass.

  fs_visitor::try_replace_with_sel is still arguably necessary, since it runs before pull constant loads are lowered.

  total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
  instructions in affected programs: 167120 -> 153824 (-7.96%)
  GAINED: 13
  LOST: 6
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Add SEL() convenience function. | Matt Turner | 2013-12-04 | 2 | -0/+2
  Reviewed-by: Paul Berry <[email protected]>
* i965: Print conditional mod in dump_instruction(). | Matt Turner | 2013-12-04 | 2 | -2/+6
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Externalize conditional_modifier for use in dump_instruction(). | Matt Turner | 2013-12-04 | 2 | -1/+2
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Print argument types in dump_instruction(). | Matt Turner | 2013-12-04 | 2 | -2/+10
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Externalize reg_encoding for use in dump_instruction(). | Matt Turner | 2013-12-04 | 2 | -1/+2
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Don't print swizzles for immediate values. | Matt Turner | 2013-12-04 | 1 | -4/+6
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Print negate and absolute value for src args. | Matt Turner | 2013-12-04 | 1 | -0/+7
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add support for printing HW_REGs in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -0/+60
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Print ARF registers properly in dump_instruction(). | Matt Turner | 2013-12-04 | 1 | -2/+46
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Don't print extra (null) arguments in dump_instruction(). | Matt Turner | 2013-12-04 | 2 | -4/+4
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Allow commuting the operands of ADDC for const propagation. | Matt Turner | 2013-12-04 | 2 | -2/+2
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Rename register_coalesce_2() -> register_coalesce(). | Matt Turner | 2013-12-04 | 2 | -6/+6
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Remove now useless register_coalesce() pass. | Matt Turner | 2013-12-04 | 2 | -148/+0
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Let register_coalesce_2() eliminate self-moves. | Matt Turner | 2013-12-04 | 1 | -1/+2
  This is the last thing that register_coalesce() still handled.

  total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
  instructions in affected programs: 15758 -> 15606 (-0.96%)
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Allow constant propagation into ASR and BFI1. | Matt Turner | 2013-12-04 | 2 | -0/+4
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Document cur_* variables. | Matt Turner | 2013-12-04 | 1 | -2/+5
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Remove ip & cur from brw_cfg. | Matt Turner | 2013-12-04 | 2 | -26/+17
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Clean up cfg_t constructors. | Matt Turner | 2013-12-04 | 9 | -23/+9
  parent_mem_ctx was unused since db47074a, so remove the two wrappers around create() and make create() the constructor.
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Throw out confusing make_list method. | Matt Turner | 2013-12-04 | 2 | -15/+7
  make_list is just a one-line wrapper and was confusingly called by NULL objects. E.g., cur_if == NULL; cur_if->make_list(mem_ctx).
  Reviewed-by: Eric Anholt <[email protected]>
* i965/cfg: Include only needed headers. | Matt Turner | 2013-12-04 | 2 | -2/+3
  Reviewed-by: Eric Anholt <[email protected]>